Merge tag 'selinux-pr-20230626' of git://git.kernel.org/pub/scm/linux/kernel/git...
author Linus Torvalds <torvalds@linux-foundation.org>
Wed, 28 Jun 2023 00:18:48 +0000 (17:18 -0700)
committer Linus Torvalds <torvalds@linux-foundation.org>
Wed, 28 Jun 2023 00:18:48 +0000 (17:18 -0700)
Pull selinux updates from Paul Moore:

 - Thanks to help from the MPTCP folks, it looks like we have finally
   sorted out a proper solution to the MPTCP socket labeling issue; see
   the new security_mptcp_add_subflow() LSM hook and the illustrative
   sketch following this list.

 - Fix the labeled NFS handling such that a labeled NFS share mounted
   prior to the initial SELinux policy load is properly labeled once a
   policy is loaded; more information in the commit description.

 - Two patches to security/selinux/Makefile: the first took the cleanups
   in v6.4 a bit further, and the second removed the grouped targets
   usage as that syntax isn't properly supported prior to make v4.3 (a
   short illustration follows this list).

 - Deprecate the "fs" object context type in SELinux policies. The fs
   object context type was an old vestige that was introduced back in
   v2.6.12-rc2 but never really used.

 - A number of small changes that remove dead code, clean up some
   awkward bits, and generally improve the quality of the code. See the
   individual commit descriptions for more information.
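
Purely as an illustrative sketch of the MPTCP item above (the struct and
field names are assumptions for illustration, not the exact merged code),
the new hook gives the LSM a chance to label a freshly created MPTCP
subflow socket from its owning MPTCP socket, rather than letting it
inherit the label of the kernel context that created it:

    /* sketch: copy the parent MPTCP socket's label onto a new subflow */
    static int example_mptcp_add_subflow(struct sock *sk, struct sock *ssk)
    {
            struct sk_security_struct *msk_sec = sk->sk_security;   /* parent MPTCP socket */
            struct sk_security_struct *ssk_sec = ssk->sk_security;  /* new subflow socket */

            ssk_sec->sclass = msk_sec->sclass;   /* socket security class */
            ssk_sec->sid    = msk_sec->sid;      /* security label (SID) */
            return 0;
    }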
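
For the Makefile item, "grouped targets" refers to GNU make's "&:" syntax,
which declares that a single recipe invocation produces all of the listed
files; that syntax only exists in GNU make 4.3 and later, hence its
removal. A minimal illustration with made-up file names:

    # GNU make >= 4.3 grouped target: the recipe runs once and is
    # understood to generate both outputs
    out/a.h out/b.h &: gen.sh
            ./gen.sh

    # pre-4.3 compatible spelling: an ordinary multi-target rule, where
    # make may run the recipe separately for each requested output
    out/a.h out/b.h: gen.sh
            ./gen.sh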

* tag 'selinux-pr-20230626' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: avoid bool as identifier name
  selinux: fix Makefile for versions of make < v4.3
  selinux: make labeled NFS work when mounted before policy load
  selinux: cleanup exit_sel_fs() declaration
  selinux: deprecated fs ocon
  selinux: make header files self-including
  selinux: keep context struct members in sync
  selinux: Implement mptcp_add_subflow hook
  security, lsm: Introduce security_mptcp_add_subflow()
  selinux: small cleanups in selinux_audit_rule_init()
  selinux: declare read-only data arrays const
  selinux: retain const qualifier on string literal in avtab_hash_eval()
  selinux: drop return at end of void function avc_insert()
  selinux: avc: drop unused function avc_disable()
  selinux: adjust typos in comments
  selinux: do not leave dangling pointer behind
  selinux: more Makefile tweaks

3329 files changed:
.gitattributes
.mailmap
CREDITS
Documentation/RCU/Design/Requirements/Requirements.rst
Documentation/RCU/whatisRCU.rst
Documentation/admin-guide/bcache.rst
Documentation/admin-guide/cgroup-v2.rst
Documentation/admin-guide/cifs/changes.rst
Documentation/admin-guide/cifs/usage.rst
Documentation/admin-guide/kernel-parameters.txt
Documentation/admin-guide/perf/hisi-pmu.rst
Documentation/admin-guide/quickly-build-trimmed-linux.rst
Documentation/arch/arm/arm.rst [moved from Documentation/arm/arm.rst with 100% similarity]
Documentation/arch/arm/booting.rst [moved from Documentation/arm/booting.rst with 100% similarity]
Documentation/arch/arm/cluster-pm-race-avoidance.rst [moved from Documentation/arm/cluster-pm-race-avoidance.rst with 100% similarity]
Documentation/arch/arm/features.rst [moved from Documentation/arm/features.rst with 100% similarity]
Documentation/arch/arm/firmware.rst [moved from Documentation/arm/firmware.rst with 100% similarity]
Documentation/arch/arm/google/chromebook-boot-flow.rst [moved from Documentation/arm/google/chromebook-boot-flow.rst with 100% similarity]
Documentation/arch/arm/index.rst [moved from Documentation/arm/index.rst with 100% similarity]
Documentation/arch/arm/interrupts.rst [moved from Documentation/arm/interrupts.rst with 100% similarity]
Documentation/arch/arm/ixp4xx.rst [moved from Documentation/arm/ixp4xx.rst with 100% similarity]
Documentation/arch/arm/kernel_mode_neon.rst [moved from Documentation/arm/kernel_mode_neon.rst with 100% similarity]
Documentation/arch/arm/kernel_user_helpers.rst [moved from Documentation/arm/kernel_user_helpers.rst with 100% similarity]
Documentation/arch/arm/keystone/knav-qmss.rst [moved from Documentation/arm/keystone/knav-qmss.rst with 100% similarity]
Documentation/arch/arm/keystone/overview.rst [moved from Documentation/arm/keystone/overview.rst with 100% similarity]
Documentation/arch/arm/marvell.rst [moved from Documentation/arm/marvell.rst with 100% similarity]
Documentation/arch/arm/mem_alignment.rst [moved from Documentation/arm/mem_alignment.rst with 100% similarity]
Documentation/arch/arm/memory.rst [moved from Documentation/arm/memory.rst with 100% similarity]
Documentation/arch/arm/microchip.rst [moved from Documentation/arm/microchip.rst with 100% similarity]
Documentation/arch/arm/netwinder.rst [moved from Documentation/arm/netwinder.rst with 100% similarity]
Documentation/arch/arm/nwfpe/index.rst [moved from Documentation/arm/nwfpe/index.rst with 100% similarity]
Documentation/arch/arm/nwfpe/netwinder-fpe.rst [moved from Documentation/arm/nwfpe/netwinder-fpe.rst with 100% similarity]
Documentation/arch/arm/nwfpe/notes.rst [moved from Documentation/arm/nwfpe/notes.rst with 100% similarity]
Documentation/arch/arm/nwfpe/nwfpe.rst [moved from Documentation/arm/nwfpe/nwfpe.rst with 100% similarity]
Documentation/arch/arm/nwfpe/todo.rst [moved from Documentation/arm/nwfpe/todo.rst with 100% similarity]
Documentation/arch/arm/omap/dss.rst [moved from Documentation/arm/omap/dss.rst with 100% similarity]
Documentation/arch/arm/omap/index.rst [moved from Documentation/arm/omap/index.rst with 100% similarity]
Documentation/arch/arm/omap/omap.rst [moved from Documentation/arm/omap/omap.rst with 100% similarity]
Documentation/arch/arm/omap/omap_pm.rst [moved from Documentation/arm/omap/omap_pm.rst with 100% similarity]
Documentation/arch/arm/porting.rst [moved from Documentation/arm/porting.rst with 100% similarity]
Documentation/arch/arm/pxa/mfp.rst [moved from Documentation/arm/pxa/mfp.rst with 100% similarity]
Documentation/arch/arm/sa1100/assabet.rst [moved from Documentation/arm/sa1100/assabet.rst with 100% similarity]
Documentation/arch/arm/sa1100/cerf.rst [moved from Documentation/arm/sa1100/cerf.rst with 100% similarity]
Documentation/arch/arm/sa1100/index.rst [moved from Documentation/arm/sa1100/index.rst with 100% similarity]
Documentation/arch/arm/sa1100/lart.rst [moved from Documentation/arm/sa1100/lart.rst with 100% similarity]
Documentation/arch/arm/sa1100/serial_uart.rst [moved from Documentation/arm/sa1100/serial_uart.rst with 100% similarity]
Documentation/arch/arm/samsung/bootloader-interface.rst [moved from Documentation/arm/samsung/bootloader-interface.rst with 100% similarity]
Documentation/arch/arm/samsung/clksrc-change-registers.awk [moved from Documentation/arm/samsung/clksrc-change-registers.awk with 100% similarity]
Documentation/arch/arm/samsung/gpio.rst [moved from Documentation/arm/samsung/gpio.rst with 100% similarity]
Documentation/arch/arm/samsung/index.rst [moved from Documentation/arm/samsung/index.rst with 100% similarity]
Documentation/arch/arm/samsung/overview.rst [moved from Documentation/arm/samsung/overview.rst with 100% similarity]
Documentation/arch/arm/setup.rst [moved from Documentation/arm/setup.rst with 100% similarity]
Documentation/arch/arm/spear/overview.rst [moved from Documentation/arm/spear/overview.rst with 100% similarity]
Documentation/arch/arm/sti/overview.rst [moved from Documentation/arm/sti/overview.rst with 100% similarity]
Documentation/arch/arm/sti/stih407-overview.rst [moved from Documentation/arm/sti/stih407-overview.rst with 100% similarity]
Documentation/arch/arm/sti/stih418-overview.rst [moved from Documentation/arm/sti/stih418-overview.rst with 100% similarity]
Documentation/arch/arm/stm32/overview.rst [moved from Documentation/arm/stm32/overview.rst with 100% similarity]
Documentation/arch/arm/stm32/stm32-dma-mdma-chaining.rst [moved from Documentation/arm/stm32/stm32-dma-mdma-chaining.rst with 100% similarity]
Documentation/arch/arm/stm32/stm32f429-overview.rst [moved from Documentation/arm/stm32/stm32f429-overview.rst with 100% similarity]
Documentation/arch/arm/stm32/stm32f746-overview.rst [moved from Documentation/arm/stm32/stm32f746-overview.rst with 100% similarity]
Documentation/arch/arm/stm32/stm32f769-overview.rst [moved from Documentation/arm/stm32/stm32f769-overview.rst with 100% similarity]
Documentation/arch/arm/stm32/stm32h743-overview.rst [moved from Documentation/arm/stm32/stm32h743-overview.rst with 100% similarity]
Documentation/arch/arm/stm32/stm32h750-overview.rst [moved from Documentation/arm/stm32/stm32h750-overview.rst with 100% similarity]
Documentation/arch/arm/stm32/stm32mp13-overview.rst [moved from Documentation/arm/stm32/stm32mp13-overview.rst with 100% similarity]
Documentation/arch/arm/stm32/stm32mp151-overview.rst [moved from Documentation/arm/stm32/stm32mp151-overview.rst with 100% similarity]
Documentation/arch/arm/stm32/stm32mp157-overview.rst [moved from Documentation/arm/stm32/stm32mp157-overview.rst with 100% similarity]
Documentation/arch/arm/sunxi.rst [moved from Documentation/arm/sunxi.rst with 100% similarity]
Documentation/arch/arm/sunxi/clocks.rst [moved from Documentation/arm/sunxi/clocks.rst with 100% similarity]
Documentation/arch/arm/swp_emulation.rst [moved from Documentation/arm/swp_emulation.rst with 100% similarity]
Documentation/arch/arm/tcm.rst [moved from Documentation/arm/tcm.rst with 100% similarity]
Documentation/arch/arm/uefi.rst [moved from Documentation/arm/uefi.rst with 100% similarity]
Documentation/arch/arm/vfp/release-notes.rst [moved from Documentation/arm/vfp/release-notes.rst with 100% similarity]
Documentation/arch/arm/vlocks.rst [moved from Documentation/arm/vlocks.rst with 100% similarity]
Documentation/arch/index.rst
Documentation/arch/x86/resctrl.rst
Documentation/arm64/acpi_object_usage.rst
Documentation/arm64/arm-acpi.rst
Documentation/arm64/booting.rst
Documentation/arm64/cpu-feature-registers.rst
Documentation/arm64/elf_hwcaps.rst
Documentation/arm64/index.rst
Documentation/arm64/kdump.rst [new file with mode: 0644]
Documentation/arm64/memory.rst
Documentation/arm64/ptdump.rst [new file with mode: 0644]
Documentation/arm64/silicon-errata.rst
Documentation/block/index.rst
Documentation/block/request.rst [deleted file]
Documentation/cdrom/index.rst
Documentation/conf.py
Documentation/core-api/cpu_hotplug.rst
Documentation/core-api/kernel-api.rst
Documentation/core-api/pin_user_pages.rst
Documentation/core-api/this_cpu_ops.rst
Documentation/core-api/workqueue.rst
Documentation/crypto/async-tx-api.rst
Documentation/dev-tools/kselftest.rst
Documentation/dev-tools/kunit/architecture.rst
Documentation/dev-tools/kunit/start.rst
Documentation/dev-tools/kunit/usage.rst
Documentation/devicetree/bindings/arm/xen.txt
Documentation/devicetree/bindings/ata/ahci-common.yaml
Documentation/devicetree/bindings/ata/ceva,ahci-1v84.yaml
Documentation/devicetree/bindings/cache/qcom,llcc.yaml
Documentation/devicetree/bindings/clock/canaan,k210-clk.yaml
Documentation/devicetree/bindings/display/msm/dsi-controller-main.yaml
Documentation/devicetree/bindings/firmware/qcom,scm.yaml
Documentation/devicetree/bindings/fpga/lattice,sysconfig.yaml
Documentation/devicetree/bindings/fpga/microchip,mpf-spi-fpga-mgr.yaml
Documentation/devicetree/bindings/i2c/opencores,i2c-ocores.yaml
Documentation/devicetree/bindings/i3c/silvaco,i3c-master.yaml
Documentation/devicetree/bindings/iio/adc/nxp,imx8qxp-adc.yaml
Documentation/devicetree/bindings/iio/adc/renesas,rcar-gyroadc.yaml
Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml
Documentation/devicetree/bindings/interrupt-controller/loongson,eiointc.yaml [new file with mode: 0644]
Documentation/devicetree/bindings/media/i2c/ovti,ov2685.yaml
Documentation/devicetree/bindings/memory-controllers/nuvoton,npcm-memory-controller.yaml [new file with mode: 0644]
Documentation/devicetree/bindings/mfd/canaan,k210-sysctl.yaml
Documentation/devicetree/bindings/net/can/st,stm32-bxcan.yaml
Documentation/devicetree/bindings/net/realtek-bluetooth.yaml
Documentation/devicetree/bindings/pci/fsl,imx6q-pcie-common.yaml
Documentation/devicetree/bindings/pci/fsl,imx6q-pcie-ep.yaml
Documentation/devicetree/bindings/pci/fsl,imx6q-pcie.yaml
Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml
Documentation/devicetree/bindings/pinctrl/canaan,k210-fpioa.yaml
Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.yaml
Documentation/devicetree/bindings/power/qcom,rpmpd.yaml
Documentation/devicetree/bindings/reset/canaan,k210-rst.yaml
Documentation/devicetree/bindings/riscv/canaan.yaml
Documentation/devicetree/bindings/serial/8250_omap.yaml
Documentation/devicetree/bindings/sound/tas2562.yaml
Documentation/devicetree/bindings/sound/tas2770.yaml
Documentation/devicetree/bindings/sound/tas27xx.yaml
Documentation/devicetree/bindings/sound/tlv320aic32x4.txt
Documentation/devicetree/bindings/thermal/armada-thermal.txt
Documentation/devicetree/bindings/thermal/brcm,bcm2835-thermal.txt [deleted file]
Documentation/devicetree/bindings/thermal/brcm,bcm2835-thermal.yaml [new file with mode: 0644]
Documentation/devicetree/bindings/thermal/qcom-tsens.yaml
Documentation/devicetree/bindings/timer/brcm,kona-timer.txt [deleted file]
Documentation/devicetree/bindings/timer/brcm,kona-timer.yaml [new file with mode: 0644]
Documentation/devicetree/bindings/timer/loongson,ls1x-pwmtimer.yaml [new file with mode: 0644]
Documentation/devicetree/bindings/timer/ralink,rt2880-timer.yaml [new file with mode: 0644]
Documentation/devicetree/bindings/usb/cdns,usb3.yaml
Documentation/devicetree/bindings/usb/snps,dwc3.yaml
Documentation/devicetree/usage-model.rst
Documentation/doc-guide/sphinx.rst
Documentation/driver-api/basics.rst
Documentation/driver-api/edac.rst
Documentation/filesystems/directory-locking.rst
Documentation/filesystems/fsverity.rst
Documentation/filesystems/index.rst
Documentation/filesystems/ramfs-rootfs-initramfs.rst
Documentation/filesystems/sharedsubtree.rst
Documentation/filesystems/smb/cifsroot.rst [moved from Documentation/filesystems/cifs/cifsroot.rst with 97% similarity]
Documentation/filesystems/smb/index.rst [moved from Documentation/filesystems/cifs/index.rst with 100% similarity]
Documentation/filesystems/smb/ksmbd.rst [moved from Documentation/filesystems/cifs/ksmbd.rst with 100% similarity]
Documentation/fpga/index.rst
Documentation/locking/index.rst
Documentation/maintainer/configure-git.rst
Documentation/mm/page_table_check.rst
Documentation/mm/page_tables.rst
Documentation/netlink/specs/ethtool.yaml
Documentation/netlink/specs/handshake.yaml
Documentation/networking/bonding.rst
Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst
Documentation/networking/index.rst
Documentation/networking/ip-sysctl.rst
Documentation/networking/tls-handshake.rst
Documentation/networking/x25-iface.rst
Documentation/pcmcia/index.rst
Documentation/process/2.Process.rst
Documentation/process/changes.rst
Documentation/process/handling-regressions.rst
Documentation/process/maintainer-netdev.rst
Documentation/process/maintainer-tip.rst
Documentation/process/submitting-patches.rst
Documentation/riscv/patch-acceptance.rst
Documentation/rust/quick-start.rst
Documentation/s390/vfio-ap.rst
Documentation/scheduler/sched-deadline.rst
Documentation/staging/crc32.rst
Documentation/subsystem-apis.rst
Documentation/timers/index.rst
Documentation/trace/histogram.rst
Documentation/trace/user_events.rst
Documentation/translations/zh_CN/arch/arm/Booting [moved from Documentation/translations/zh_CN/arm/Booting with 98% similarity]
Documentation/translations/zh_CN/arch/arm/kernel_user_helpers.txt [moved from Documentation/translations/zh_CN/arm/kernel_user_helpers.txt with 98% similarity]
Documentation/translations/zh_CN/devicetree/usage-model.rst
Documentation/userspace-api/ioctl/ioctl-number.rst
Documentation/virt/guest-halt-polling.rst
Documentation/virt/kvm/halt-polling.rst
Documentation/virt/kvm/locking.rst
Documentation/virt/kvm/ppc-pv.rst
Documentation/virt/kvm/vcpu-requests.rst
Documentation/virt/paravirt_ops.rst
MAINTAINERS
Makefile
arch/Kconfig
arch/alpha/include/asm/atomic.h
arch/alpha/include/asm/bugs.h [deleted file]
arch/alpha/kernel/osf_sys.c
arch/alpha/kernel/setup.c
arch/arc/include/asm/atomic-spinlock.h
arch/arc/include/asm/atomic.h
arch/arc/include/asm/atomic64-arcv2.h
arch/arm/Kconfig
arch/arm/boot/compressed/atags_to_fdt.c
arch/arm/boot/compressed/fdt_check_mem_start.c
arch/arm/boot/compressed/misc.c
arch/arm/boot/compressed/misc.h
arch/arm/boot/dts/am57xx-cl-som-am57x.dts
arch/arm/boot/dts/at91-sama7g5ek.dts
arch/arm/boot/dts/at91sam9261ek.dts
arch/arm/boot/dts/imx6qdl-mba6.dtsi
arch/arm/boot/dts/imx6ull-dhcor-som.dtsi
arch/arm/boot/dts/imx7d-pico-hobbit.dts
arch/arm/boot/dts/imx7d-sdb.dts
arch/arm/boot/dts/omap3-cm-t3x.dtsi
arch/arm/boot/dts/omap3-devkit8000-lcd-common.dtsi
arch/arm/boot/dts/omap3-lilly-a83x.dtsi
arch/arm/boot/dts/omap3-overo-common-lcd35.dtsi
arch/arm/boot/dts/omap3-overo-common-lcd43.dtsi
arch/arm/boot/dts/omap3-pandora-common.dtsi
arch/arm/boot/dts/omap5-cm-t54.dts
arch/arm/boot/dts/qcom-apq8026-asus-sparrow.dts
arch/arm/boot/dts/qcom-apq8026-huawei-sturgeon.dts
arch/arm/boot/dts/qcom-apq8026-lg-lenok.dts
arch/arm/boot/dts/qcom-apq8064.dtsi
arch/arm/boot/dts/qcom-apq8084.dtsi
arch/arm/boot/dts/qcom-ipq4019.dtsi
arch/arm/boot/dts/qcom-ipq8064.dtsi
arch/arm/boot/dts/qcom-mdm9615-wp8548-mangoh-green.dts
arch/arm/boot/dts/qcom-msm8660.dtsi
arch/arm/boot/dts/qcom-msm8960.dtsi
arch/arm/boot/dts/qcom-msm8974-lge-nexus5-hammerhead.dts
arch/arm/boot/dts/qcom-msm8974-sony-xperia-rhine.dtsi
arch/arm/boot/dts/qcom-msm8974.dtsi
arch/arm/boot/dts/qcom-msm8974pro-oneplus-bacon.dts
arch/arm/boot/dts/qcom-msm8974pro-samsung-klte.dts
arch/arm/boot/dts/qcom-msm8974pro-sony-xperia-shinano-castor.dts
arch/arm/boot/dts/stm32f429.dtsi
arch/arm/boot/dts/stm32f7-pinctrl.dtsi
arch/arm/boot/dts/vexpress-v2p-ca5s.dts
arch/arm/common/mcpm_entry.c
arch/arm/common/mcpm_head.S
arch/arm/common/vlock.S
arch/arm/include/asm/arm_pmuv3.h
arch/arm/include/asm/assembler.h
arch/arm/include/asm/atomic.h
arch/arm/include/asm/bugs.h
arch/arm/include/asm/ftrace.h
arch/arm/include/asm/mach/arch.h
arch/arm/include/asm/page.h
arch/arm/include/asm/ptrace.h
arch/arm/include/asm/setup.h
arch/arm/include/asm/signal.h
arch/arm/include/asm/smp.h
arch/arm/include/asm/spectre.h
arch/arm/include/asm/suspend.h
arch/arm/include/asm/sync_bitops.h
arch/arm/include/asm/syscalls.h [new file with mode: 0644]
arch/arm/include/asm/tcm.h
arch/arm/include/asm/traps.h
arch/arm/include/asm/unwind.h
arch/arm/include/asm/vdso.h
arch/arm/include/asm/vfp.h
arch/arm/include/uapi/asm/setup.h
arch/arm/kernel/atags_parse.c
arch/arm/kernel/bugs.c
arch/arm/kernel/entry-armv.S
arch/arm/kernel/fiq.c
arch/arm/kernel/head-inflate-data.c
arch/arm/kernel/head.h [new file with mode: 0644]
arch/arm/kernel/module.c
arch/arm/kernel/setup.c
arch/arm/kernel/signal.c
arch/arm/kernel/smp.c
arch/arm/kernel/sys_arm.c
arch/arm/kernel/sys_oabi-compat.c
arch/arm/kernel/traps.c
arch/arm/kernel/unwind.c
arch/arm/kernel/vdso.c
arch/arm/lib/bitops.h
arch/arm/lib/testchangebit.S
arch/arm/lib/testclearbit.S
arch/arm/lib/testsetbit.S
arch/arm/mach-at91/pm.c
arch/arm/mach-exynos/common.h
arch/arm/mach-mxs/mach-mxs.c
arch/arm/mach-omap1/board-ams-delta.c
arch/arm/mach-omap1/board-nokia770.c
arch/arm/mach-omap1/board-osk.c
arch/arm/mach-omap1/board-palmte.c
arch/arm/mach-omap1/board-sx1.c
arch/arm/mach-omap1/irq.c
arch/arm/mach-pxa/gumstix.c
arch/arm/mach-pxa/pxa25x.c
arch/arm/mach-pxa/pxa27x.c
arch/arm/mach-pxa/spitz.c
arch/arm/mach-sa1100/jornada720_ssp.c
arch/arm/mach-sti/Kconfig
arch/arm/mm/Kconfig
arch/arm/mm/dma-mapping.c
arch/arm/mm/fault.h
arch/arm/mm/flush.c
arch/arm/mm/mmu.c
arch/arm/mm/nommu.c
arch/arm/mm/tcm.h [deleted file]
arch/arm/probes/kprobes/checkers-common.c
arch/arm/probes/kprobes/core.c
arch/arm/probes/kprobes/opt-arm.c
arch/arm/probes/kprobes/test-core.c
arch/arm/probes/kprobes/test-core.h
arch/arm/tools/mach-types
arch/arm/vdso/vgettimeofday.c
arch/arm/vfp/entry.S
arch/arm/vfp/vfphw.S
arch/arm/vfp/vfpmodule.c
arch/arm64/Kconfig
arch/arm64/boot/dts/arm/foundation-v8.dtsi
arch/arm64/boot/dts/arm/rtsm_ve-aemv8a.dts
arch/arm64/boot/dts/arm/vexpress-v2f-1xv7-ca53x2.dts
arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi
arch/arm64/boot/dts/freescale/imx8-ss-dma.dtsi
arch/arm64/boot/dts/freescale/imx8mn-beacon-baseboard.dtsi
arch/arm64/boot/dts/freescale/imx8mn-var-som.dtsi
arch/arm64/boot/dts/freescale/imx8mn.dtsi
arch/arm64/boot/dts/freescale/imx8mp.dtsi
arch/arm64/boot/dts/freescale/imx8qm-mek.dts
arch/arm64/boot/dts/freescale/imx8x-colibri-eval-v3.dtsi
arch/arm64/boot/dts/freescale/imx8x-colibri-iris.dtsi
arch/arm64/boot/dts/freescale/imx8x-colibri.dtsi
arch/arm64/boot/dts/qcom/ipq5332.dtsi
arch/arm64/boot/dts/qcom/ipq6018.dtsi
arch/arm64/boot/dts/qcom/ipq8074.dtsi
arch/arm64/boot/dts/qcom/ipq9574.dtsi
arch/arm64/boot/dts/qcom/msm8916.dtsi
arch/arm64/boot/dts/qcom/msm8953.dtsi
arch/arm64/boot/dts/qcom/msm8976.dtsi
arch/arm64/boot/dts/qcom/msm8994.dtsi
arch/arm64/boot/dts/qcom/msm8996.dtsi
arch/arm64/boot/dts/qcom/msm8998.dtsi
arch/arm64/boot/dts/qcom/qcm2290.dtsi
arch/arm64/boot/dts/qcom/qcs404.dtsi
arch/arm64/boot/dts/qcom/qdu1000.dtsi
arch/arm64/boot/dts/qcom/sa8155p-adp.dts
arch/arm64/boot/dts/qcom/sa8155p.dtsi [new file with mode: 0644]
arch/arm64/boot/dts/qcom/sa8775p.dtsi
arch/arm64/boot/dts/qcom/sc7180-idp.dts
arch/arm64/boot/dts/qcom/sc7180-lite.dtsi
arch/arm64/boot/dts/qcom/sc7180-trogdor.dtsi
arch/arm64/boot/dts/qcom/sc7180.dtsi
arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi
arch/arm64/boot/dts/qcom/sc7280-idp.dtsi
arch/arm64/boot/dts/qcom/sc7280-qcard.dtsi
arch/arm64/boot/dts/qcom/sc7280.dtsi
arch/arm64/boot/dts/qcom/sc8280xp.dtsi
arch/arm64/boot/dts/qcom/sdm630.dtsi
arch/arm64/boot/dts/qcom/sdm670.dtsi
arch/arm64/boot/dts/qcom/sdm845.dtsi
arch/arm64/boot/dts/qcom/sm6115.dtsi
arch/arm64/boot/dts/qcom/sm6125.dtsi
arch/arm64/boot/dts/qcom/sm6350.dtsi
arch/arm64/boot/dts/qcom/sm6375-sony-xperia-murray-pdx225.dts
arch/arm64/boot/dts/qcom/sm6375.dtsi
arch/arm64/boot/dts/qcom/sm8150.dtsi
arch/arm64/boot/dts/qcom/sm8250-xiaomi-elish-boe.dts
arch/arm64/boot/dts/qcom/sm8250-xiaomi-elish-csot.dts
arch/arm64/boot/dts/qcom/sm8350.dtsi
arch/arm64/boot/dts/qcom/sm8450.dtsi
arch/arm64/boot/dts/qcom/sm8550.dtsi
arch/arm64/boot/dts/rockchip/rk3308.dtsi
arch/arm64/boot/dts/rockchip/rk3328-rock64.dts
arch/arm64/boot/dts/rockchip/rk3328.dtsi
arch/arm64/boot/dts/rockchip/rk3566-soquartz-cm4.dts
arch/arm64/boot/dts/rockchip/rk3566-soquartz.dtsi
arch/arm64/boot/dts/rockchip/rk3568-nanopi-r5c.dts
arch/arm64/boot/dts/rockchip/rk3568-nanopi-r5s.dts
arch/arm64/boot/dts/rockchip/rk3568.dtsi
arch/arm64/boot/dts/rockchip/rk356x.dtsi
arch/arm64/boot/dts/rockchip/rk3588s.dtsi
arch/arm64/hyperv/mshyperv.c
arch/arm64/include/asm/alternative-macros.h
arch/arm64/include/asm/alternative.h
arch/arm64/include/asm/arch_timer.h
arch/arm64/include/asm/archrandom.h
arch/arm64/include/asm/arm_pmuv3.h
arch/arm64/include/asm/asm-uaccess.h
arch/arm64/include/asm/atomic.h
arch/arm64/include/asm/atomic_ll_sc.h
arch/arm64/include/asm/atomic_lse.h
arch/arm64/include/asm/cmpxchg.h
arch/arm64/include/asm/compat.h
arch/arm64/include/asm/cpu.h
arch/arm64/include/asm/cpufeature.h
arch/arm64/include/asm/cputype.h
arch/arm64/include/asm/efi.h
arch/arm64/include/asm/el2_setup.h
arch/arm64/include/asm/esr.h
arch/arm64/include/asm/exception.h
arch/arm64/include/asm/hw_breakpoint.h
arch/arm64/include/asm/hwcap.h
arch/arm64/include/asm/io.h
arch/arm64/include/asm/irqflags.h
arch/arm64/include/asm/kernel-pgtable.h
arch/arm64/include/asm/kvm_arm.h
arch/arm64/include/asm/kvm_asm.h
arch/arm64/include/asm/kvm_host.h
arch/arm64/include/asm/kvm_pgtable.h
arch/arm64/include/asm/lse.h
arch/arm64/include/asm/memory.h
arch/arm64/include/asm/mmu_context.h
arch/arm64/include/asm/module.h
arch/arm64/include/asm/module.lds.h
arch/arm64/include/asm/percpu.h
arch/arm64/include/asm/pgtable-hwdef.h
arch/arm64/include/asm/pgtable-prot.h
arch/arm64/include/asm/scs.h
arch/arm64/include/asm/smp.h
arch/arm64/include/asm/spectre.h
arch/arm64/include/asm/syscall_wrapper.h
arch/arm64/include/asm/sysreg.h
arch/arm64/include/asm/traps.h
arch/arm64/include/asm/uaccess.h
arch/arm64/include/uapi/asm/hwcap.h
arch/arm64/kernel/Makefile
arch/arm64/kernel/alternative.c
arch/arm64/kernel/cpufeature.c
arch/arm64/kernel/cpuidle.c
arch/arm64/kernel/cpuinfo.c
arch/arm64/kernel/entry-common.c
arch/arm64/kernel/entry.S
arch/arm64/kernel/fpsimd.c
arch/arm64/kernel/ftrace.c
arch/arm64/kernel/head.S
arch/arm64/kernel/hibernate.c
arch/arm64/kernel/hw_breakpoint.c
arch/arm64/kernel/hyp-stub.S
arch/arm64/kernel/idreg-override.c
arch/arm64/kernel/kaslr.c
arch/arm64/kernel/kuser32.S
arch/arm64/kernel/module-plts.c
arch/arm64/kernel/module.c
arch/arm64/kernel/mte.c
arch/arm64/kernel/setup.c
arch/arm64/kernel/signal.c
arch/arm64/kernel/smp.c
arch/arm64/kernel/syscall.c
arch/arm64/kernel/traps.c
arch/arm64/kernel/vdso.c
arch/arm64/kvm/debug.c
arch/arm64/kvm/fpsimd.c
arch/arm64/kvm/hyp/include/hyp/switch.h
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
arch/arm64/kvm/hyp/nvhe/debug-sr.c
arch/arm64/kvm/hyp/nvhe/mem_protect.c
arch/arm64/kvm/hyp/nvhe/switch.c
arch/arm64/kvm/hyp/pgtable.c
arch/arm64/kvm/hyp/vhe/switch.c
arch/arm64/kvm/inject_fault.c
arch/arm64/kvm/pmu-emul.c
arch/arm64/kvm/pmu.c
arch/arm64/kvm/sys_regs.c
arch/arm64/kvm/vgic/vgic-init.c
arch/arm64/kvm/vgic/vgic-its.c
arch/arm64/kvm/vgic/vgic-kvm-device.c
arch/arm64/kvm/vgic/vgic-mmio-v3.c
arch/arm64/kvm/vgic/vgic-mmio.c
arch/arm64/kvm/vgic/vgic-v2.c
arch/arm64/kvm/vgic/vgic-v3.c
arch/arm64/kvm/vgic/vgic-v4.c
arch/arm64/kvm/vmid.c
arch/arm64/lib/xor-neon.c
arch/arm64/mm/context.c
arch/arm64/mm/copypage.c
arch/arm64/mm/fault.c
arch/arm64/mm/flush.c
arch/arm64/mm/init.c
arch/arm64/mm/kasan_init.c
arch/arm64/mm/mmu.c
arch/arm64/mm/proc.S
arch/arm64/tools/cpucaps
arch/arm64/tools/gen-cpucaps.awk
arch/arm64/tools/sysreg
arch/csky/Kconfig
arch/csky/include/asm/atomic.h
arch/csky/include/asm/smp.h
arch/csky/kernel/smp.c
arch/hexagon/include/asm/atomic.h
arch/ia64/Kconfig
arch/ia64/include/asm/atomic.h
arch/ia64/include/asm/bugs.h [deleted file]
arch/ia64/kernel/setup.c
arch/loongarch/Kconfig
arch/loongarch/include/asm/atomic.h
arch/loongarch/include/asm/bugs.h [deleted file]
arch/loongarch/include/asm/loongarch.h
arch/loongarch/include/asm/pgtable-bits.h
arch/loongarch/include/asm/pgtable.h
arch/loongarch/kernel/hw_breakpoint.c
arch/loongarch/kernel/perf_event.c
arch/loongarch/kernel/setup.c
arch/loongarch/kernel/time.c
arch/loongarch/kernel/unaligned.c
arch/m68k/Kconfig
arch/m68k/configs/amiga_defconfig
arch/m68k/configs/apollo_defconfig
arch/m68k/configs/atari_defconfig
arch/m68k/configs/bvme6000_defconfig
arch/m68k/configs/hp300_defconfig
arch/m68k/configs/mac_defconfig
arch/m68k/configs/multi_defconfig
arch/m68k/configs/mvme147_defconfig
arch/m68k/configs/mvme16x_defconfig
arch/m68k/configs/q40_defconfig
arch/m68k/configs/sun3_defconfig
arch/m68k/configs/sun3x_defconfig
arch/m68k/configs/virt_defconfig
arch/m68k/include/asm/atomic.h
arch/m68k/include/asm/bugs.h [deleted file]
arch/m68k/kernel/setup_mm.c
arch/m68k/kernel/signal.c
arch/mips/Kconfig
arch/mips/alchemy/common/dbdma.c
arch/mips/bmips/setup.c
arch/mips/cavium-octeon/smp.c
arch/mips/include/asm/atomic.h
arch/mips/include/asm/bugs.h
arch/mips/include/asm/mach-loongson32/loongson1.h
arch/mips/include/asm/mach-loongson32/regs-pwm.h [deleted file]
arch/mips/include/asm/smp-ops.h
arch/mips/kernel/cpu-probe.c
arch/mips/kernel/setup.c
arch/mips/kernel/smp-bmips.c
arch/mips/kernel/smp-cps.c
arch/mips/kernel/smp.c
arch/mips/loongson32/Kconfig
arch/mips/loongson32/common/time.c
arch/mips/loongson64/smp.c
arch/nios2/boot/dts/10m50_devboard.dts
arch/nios2/boot/dts/3c120_devboard.dts
arch/nios2/kernel/cpuinfo.c
arch/nios2/kernel/setup.c
arch/openrisc/include/asm/atomic.h
arch/parisc/Kconfig
arch/parisc/Kconfig.debug
arch/parisc/include/asm/assembly.h
arch/parisc/include/asm/atomic.h
arch/parisc/include/asm/bugs.h [deleted file]
arch/parisc/include/asm/cacheflush.h
arch/parisc/include/asm/pgtable.h
arch/parisc/include/asm/spinlock.h
arch/parisc/include/asm/spinlock_types.h
arch/parisc/kernel/alternative.c
arch/parisc/kernel/cache.c
arch/parisc/kernel/kexec.c
arch/parisc/kernel/pci-dma.c
arch/parisc/kernel/process.c
arch/parisc/kernel/smp.c
arch/parisc/kernel/traps.c
arch/powerpc/Kconfig
arch/powerpc/boot/Makefile
arch/powerpc/crypto/Kconfig
arch/powerpc/crypto/Makefile
arch/powerpc/crypto/aes-gcm-p10-glue.c
arch/powerpc/crypto/aesp10-ppc.pl [moved from arch/powerpc/crypto/aesp8-ppc.pl with 99% similarity]
arch/powerpc/crypto/ghashp10-ppc.pl [moved from arch/powerpc/crypto/ghashp8-ppc.pl with 97% similarity]
arch/powerpc/include/asm/atomic.h
arch/powerpc/include/asm/bugs.h [deleted file]
arch/powerpc/include/asm/iommu.h
arch/powerpc/include/asm/pgtable.h
arch/powerpc/kernel/dma-iommu.c
arch/powerpc/kernel/iommu.c
arch/powerpc/kernel/isa-bridge.c
arch/powerpc/kernel/smp.c
arch/powerpc/kernel/tau_6xx.c
arch/powerpc/mm/book3s64/radix_pgtable.c
arch/powerpc/mm/book3s64/radix_tlb.c
arch/powerpc/net/bpf_jit_comp.c
arch/powerpc/platforms/Kconfig
arch/powerpc/platforms/powermac/setup.c
arch/powerpc/platforms/powernv/pci.c
arch/powerpc/platforms/pseries/dlpar.c
arch/powerpc/platforms/pseries/iommu.c
arch/powerpc/purgatory/Makefile
arch/powerpc/xmon/xmon.c
arch/riscv/Kconfig
arch/riscv/errata/Makefile
arch/riscv/include/asm/atomic.h
arch/riscv/include/asm/hugetlb.h
arch/riscv/include/asm/kfence.h
arch/riscv/include/asm/perf_event.h
arch/riscv/include/asm/pgtable.h
arch/riscv/include/asm/smp.h
arch/riscv/kernel/Makefile
arch/riscv/kernel/cpu-hotplug.c
arch/riscv/kernel/pi/Makefile
arch/riscv/kernel/probes/Makefile
arch/riscv/kernel/vmlinux.lds.S
arch/riscv/mm/hugetlbpage.c
arch/riscv/mm/init.c
arch/riscv/purgatory/Makefile
arch/s390/Kconfig
arch/s390/boot/vmem.c
arch/s390/configs/debug_defconfig
arch/s390/configs/defconfig
arch/s390/configs/zfcpdump_defconfig
arch/s390/crypto/chacha-glue.c
arch/s390/crypto/paes_s390.c
arch/s390/include/asm/asm-prototypes.h
arch/s390/include/asm/cmpxchg.h
arch/s390/include/asm/compat.h
arch/s390/include/asm/cpacf.h
arch/s390/include/asm/cpu_mf.h
arch/s390/include/asm/os_info.h
arch/s390/include/asm/percpu.h
arch/s390/include/asm/pgtable.h
arch/s390/include/asm/physmem_info.h
arch/s390/include/asm/pkey.h
arch/s390/include/asm/timex.h
arch/s390/include/uapi/asm/pkey.h
arch/s390/include/uapi/asm/statfs.h
arch/s390/kernel/Makefile
arch/s390/kernel/crash_dump.c
arch/s390/kernel/ipl.c
arch/s390/kernel/module.c
arch/s390/kernel/perf_cpum_cf.c
arch/s390/kernel/perf_cpum_sf.c
arch/s390/kernel/perf_pai_crypto.c
arch/s390/kernel/perf_pai_ext.c
arch/s390/kernel/time.c
arch/s390/kernel/topology.c
arch/s390/lib/Makefile
arch/s390/lib/tishift.S [new file with mode: 0644]
arch/s390/mm/pageattr.c
arch/s390/mm/vmem.c
arch/s390/purgatory/Makefile
arch/sh/Kconfig
arch/sh/include/asm/atomic-grb.h
arch/sh/include/asm/atomic-irq.h
arch/sh/include/asm/atomic-llsc.h
arch/sh/include/asm/atomic.h
arch/sh/include/asm/bugs.h [deleted file]
arch/sh/include/asm/processor.h
arch/sh/kernel/idle.c
arch/sh/kernel/setup.c
arch/sparc/Kconfig
arch/sparc/include/asm/atomic_32.h
arch/sparc/include/asm/atomic_64.h
arch/sparc/include/asm/bugs.h [deleted file]
arch/sparc/kernel/setup_32.c
arch/um/Kconfig
arch/um/drivers/Makefile
arch/um/drivers/harddog.h [new file with mode: 0644]
arch/um/drivers/harddog_kern.c
arch/um/drivers/harddog_user.c
arch/um/drivers/harddog_user_exp.c [new file with mode: 0644]
arch/um/drivers/ubd_kern.c
arch/um/include/asm/bugs.h [deleted file]
arch/um/kernel/um_arch.c
arch/x86/Kconfig
arch/x86/Kconfig.cpu
arch/x86/Makefile
arch/x86/Makefile.postlink [new file with mode: 0644]
arch/x86/boot/Makefile
arch/x86/boot/compressed/Makefile
arch/x86/boot/compressed/efi.h
arch/x86/boot/compressed/error.c
arch/x86/boot/compressed/error.h
arch/x86/boot/compressed/kaslr.c
arch/x86/boot/compressed/mem.c [new file with mode: 0644]
arch/x86/boot/compressed/misc.c
arch/x86/boot/compressed/misc.h
arch/x86/boot/compressed/sev.c
arch/x86/boot/compressed/sev.h [new file with mode: 0644]
arch/x86/boot/compressed/tdx-shared.c [new file with mode: 0644]
arch/x86/boot/compressed/tdx.c
arch/x86/boot/cpu.c
arch/x86/coco/core.c
arch/x86/coco/tdx/Makefile
arch/x86/coco/tdx/tdx-shared.c [new file with mode: 0644]
arch/x86/coco/tdx/tdx.c
arch/x86/crypto/aria-aesni-avx-asm_64.S
arch/x86/entry/thunk_64.S
arch/x86/entry/vdso/vgetcpu.c
arch/x86/events/amd/core.c
arch/x86/events/amd/ibs.c
arch/x86/events/core.c
arch/x86/events/intel/core.c
arch/x86/events/intel/ds.c
arch/x86/events/intel/uncore_snbep.c
arch/x86/hyperv/hv_init.c
arch/x86/hyperv/hv_vtl.c
arch/x86/hyperv/ivm.c
arch/x86/include/asm/Kbuild
arch/x86/include/asm/alternative.h
arch/x86/include/asm/apic.h
arch/x86/include/asm/apicdef.h
arch/x86/include/asm/atomic.h
arch/x86/include/asm/atomic64_32.h
arch/x86/include/asm/atomic64_64.h
arch/x86/include/asm/bugs.h
arch/x86/include/asm/cmpxchg.h
arch/x86/include/asm/cmpxchg_32.h
arch/x86/include/asm/cmpxchg_64.h
arch/x86/include/asm/coco.h
arch/x86/include/asm/cpu.h
arch/x86/include/asm/cpufeature.h
arch/x86/include/asm/cpumask.h
arch/x86/include/asm/doublefault.h
arch/x86/include/asm/efi.h
arch/x86/include/asm/fpu/api.h
arch/x86/include/asm/fpu/sched.h
arch/x86/include/asm/ftrace.h
arch/x86/include/asm/mce.h
arch/x86/include/asm/mem_encrypt.h
arch/x86/include/asm/mshyperv.h
arch/x86/include/asm/mtrr.h
arch/x86/include/asm/nops.h
arch/x86/include/asm/nospec-branch.h
arch/x86/include/asm/orc_header.h [new file with mode: 0644]
arch/x86/include/asm/percpu.h
arch/x86/include/asm/perf_event.h
arch/x86/include/asm/pgtable.h
arch/x86/include/asm/pgtable_64.h
arch/x86/include/asm/pgtable_types.h
arch/x86/include/asm/processor.h
arch/x86/include/asm/realmode.h
arch/x86/include/asm/sev-common.h
arch/x86/include/asm/sev.h
arch/x86/include/asm/shared/tdx.h
arch/x86/include/asm/sigframe.h
arch/x86/include/asm/smp.h
arch/x86/include/asm/syscall.h
arch/x86/include/asm/tdx.h
arch/x86/include/asm/tlbflush.h
arch/x86/include/asm/topology.h
arch/x86/include/asm/tsc.h
arch/x86/include/asm/unaccepted_memory.h [new file with mode: 0644]
arch/x86/include/asm/unwind_hints.h
arch/x86/include/asm/uv/uv_hub.h
arch/x86/include/asm/uv/uv_mmrs.h
arch/x86/include/asm/vdso/gettimeofday.h
arch/x86/include/asm/vmx.h
arch/x86/include/asm/x86_init.h
arch/x86/include/uapi/asm/mtrr.h
arch/x86/kernel/Makefile
arch/x86/kernel/acpi/sleep.c
arch/x86/kernel/acpi/sleep.h
arch/x86/kernel/alternative.c
arch/x86/kernel/amd_nb.c
arch/x86/kernel/apic/apic.c
arch/x86/kernel/apic/x2apic_phys.c
arch/x86/kernel/apic/x2apic_uv_x.c
arch/x86/kernel/callthunks.c
arch/x86/kernel/cpu/Makefile
arch/x86/kernel/cpu/bugs.c
arch/x86/kernel/cpu/cacheinfo.c
arch/x86/kernel/cpu/common.c
arch/x86/kernel/cpu/cpu.h
arch/x86/kernel/cpu/mce/amd.c
arch/x86/kernel/cpu/mce/core.c
arch/x86/kernel/cpu/microcode/amd.c
arch/x86/kernel/cpu/mtrr/Makefile
arch/x86/kernel/cpu/mtrr/amd.c
arch/x86/kernel/cpu/mtrr/centaur.c
arch/x86/kernel/cpu/mtrr/cleanup.c
arch/x86/kernel/cpu/mtrr/cyrix.c
arch/x86/kernel/cpu/mtrr/generic.c
arch/x86/kernel/cpu/mtrr/legacy.c [new file with mode: 0644]
arch/x86/kernel/cpu/mtrr/mtrr.c
arch/x86/kernel/cpu/mtrr/mtrr.h
arch/x86/kernel/cpu/resctrl/rdtgroup.c
arch/x86/kernel/cpu/sgx/encl.c
arch/x86/kernel/cpu/topology.c
arch/x86/kernel/doublefault_32.c
arch/x86/kernel/dumpstack.c
arch/x86/kernel/fpu/context.h
arch/x86/kernel/fpu/core.c
arch/x86/kernel/fpu/init.c
arch/x86/kernel/ftrace.c
arch/x86/kernel/head32.c
arch/x86/kernel/head_32.S
arch/x86/kernel/head_64.S
arch/x86/kernel/irq.c
arch/x86/kernel/itmt.c
arch/x86/kernel/kvmclock.c
arch/x86/kernel/nmi.c
arch/x86/kernel/platform-quirks.c
arch/x86/kernel/process.c
arch/x86/kernel/pvclock.c
arch/x86/kernel/setup.c
arch/x86/kernel/sev-shared.c
arch/x86/kernel/sev.c
arch/x86/kernel/signal.c
arch/x86/kernel/smp.c
arch/x86/kernel/smpboot.c
arch/x86/kernel/topology.c
arch/x86/kernel/tsc.c
arch/x86/kernel/tsc_sync.c
arch/x86/kernel/unwind_orc.c
arch/x86/kernel/vmlinux.lds.S
arch/x86/kernel/x86_init.c
arch/x86/kvm/cpuid.c
arch/x86/kvm/lapic.c
arch/x86/kvm/mmu/mmu.c
arch/x86/kvm/svm/svm.c
arch/x86/kvm/vmx/sgx.c
arch/x86/kvm/x86.c
arch/x86/lib/Makefile
arch/x86/lib/cmpxchg16b_emu.S
arch/x86/lib/cmpxchg8b_emu.S
arch/x86/lib/copy_user_64.S
arch/x86/lib/csum-partial_64.c
arch/x86/lib/getuser.S
arch/x86/lib/memmove_64.S
arch/x86/lib/msr.c
arch/x86/lib/putuser.S
arch/x86/lib/retpoline.S
arch/x86/lib/usercopy_64.c
arch/x86/math-emu/fpu_entry.c
arch/x86/mm/highmem_32.c
arch/x86/mm/init.c
arch/x86/mm/init_32.c
arch/x86/mm/kaslr.c
arch/x86/mm/mem_encrypt_amd.c
arch/x86/mm/mem_encrypt_identity.c
arch/x86/mm/pat/set_memory.c
arch/x86/mm/pgtable.c
arch/x86/net/bpf_jit_comp.c
arch/x86/pci/ce4100.c
arch/x86/pci/xen.c
arch/x86/platform/efi/efi.c
arch/x86/platform/olpc/olpc_dt.c
arch/x86/power/cpu.c
arch/x86/purgatory/Makefile
arch/x86/realmode/init.c
arch/x86/realmode/rm/trampoline_64.S
arch/x86/video/fbdev.c
arch/x86/xen/efi.c
arch/x86/xen/enlighten_hvm.c
arch/x86/xen/enlighten_pv.c
arch/x86/xen/mmu_pv.c
arch/x86/xen/setup.c
arch/x86/xen/smp.h
arch/x86/xen/smp_hvm.c
arch/x86/xen/smp_pv.c
arch/x86/xen/time.c
arch/x86/xen/xen-ops.h
arch/xtensa/Kconfig
arch/xtensa/Kconfig.debug
arch/xtensa/boot/boot-redboot/Makefile
arch/xtensa/include/asm/asm-prototypes.h [new file with mode: 0644]
arch/xtensa/include/asm/asmmacro.h
arch/xtensa/include/asm/atomic.h
arch/xtensa/include/asm/bugs.h [deleted file]
arch/xtensa/include/asm/core.h
arch/xtensa/include/asm/ftrace.h
arch/xtensa/include/asm/platform.h
arch/xtensa/include/asm/string.h
arch/xtensa/include/asm/traps.h
arch/xtensa/kernel/align.S
arch/xtensa/kernel/mcount.S
arch/xtensa/kernel/platform.c
arch/xtensa/kernel/setup.c
arch/xtensa/kernel/signal.c
arch/xtensa/kernel/stacktrace.c
arch/xtensa/kernel/time.c
arch/xtensa/kernel/traps.c
arch/xtensa/kernel/xtensa_ksyms.c
arch/xtensa/lib/Makefile
arch/xtensa/lib/ashldi3.S
arch/xtensa/lib/ashrdi3.S
arch/xtensa/lib/bswapdi2.S [new file with mode: 0644]
arch/xtensa/lib/bswapsi2.S [new file with mode: 0644]
arch/xtensa/lib/checksum.S
arch/xtensa/lib/divsi3.S
arch/xtensa/lib/lshrdi3.S
arch/xtensa/lib/memcopy.S
arch/xtensa/lib/memset.S
arch/xtensa/lib/modsi3.S
arch/xtensa/lib/mulsi3.S
arch/xtensa/lib/strncpy_user.S
arch/xtensa/lib/strnlen_user.S
arch/xtensa/lib/udivsi3.S
arch/xtensa/lib/umodsi3.S
arch/xtensa/lib/umulsidi3.S
arch/xtensa/lib/usercopy.S
arch/xtensa/mm/kasan_init.c
arch/xtensa/mm/misc.S
arch/xtensa/platforms/iss/setup.c
arch/xtensa/platforms/iss/simdisk.c
arch/xtensa/platforms/xt2000/setup.c
arch/xtensa/platforms/xtfpga/setup.c
block/Makefile
block/bdev.c
block/bfq-iosched.c
block/bio.c
block/blk-cgroup-fc-appid.c
block/blk-cgroup.c
block/blk-core.c
block/blk-flush.c
block/blk-ioc.c
block/blk-iocost.c
block/blk-ioprio.c
block/blk-map.c
block/blk-mq-debugfs.c
block/blk-mq-sched.h
block/blk-mq-tag.c
block/blk-mq.c
block/blk-mq.h
block/blk-rq-qos.c
block/blk-settings.c
block/blk-wbt.c
block/blk-zoned.c
block/blk.h
block/bsg-lib.c
block/bsg.c
block/disk-events.c
block/early-lookup.c [new file with mode: 0644]
block/elevator.c
block/fops.c
block/genhd.c
block/ioctl.c
block/mq-deadline.c
block/partitions/amiga.c
block/partitions/core.c
crypto/asymmetric_keys/public_key.c
drivers/accel/ivpu/Kconfig
drivers/accel/ivpu/ivpu_hw_mtl.c
drivers/accel/ivpu/ivpu_hw_mtl_reg.h
drivers/accel/ivpu/ivpu_ipc.c
drivers/accel/ivpu/ivpu_job.c
drivers/accel/ivpu/ivpu_mmu.c
drivers/accel/qaic/qaic_control.c
drivers/accel/qaic/qaic_data.c
drivers/accel/qaic/qaic_drv.c
drivers/acpi/acpi_ffh.c
drivers/acpi/acpi_lpss.c
drivers/acpi/acpi_pad.c
drivers/acpi/acpica/achware.h
drivers/acpi/apei/apei-internal.h
drivers/acpi/apei/bert.c
drivers/acpi/apei/ghes.c
drivers/acpi/arm64/Makefile
drivers/acpi/arm64/agdi.c
drivers/acpi/arm64/apmt.c
drivers/acpi/arm64/init.c [new file with mode: 0644]
drivers/acpi/arm64/init.h [new file with mode: 0644]
drivers/acpi/arm64/iort.c
drivers/acpi/bus.c
drivers/acpi/button.c
drivers/acpi/ec.c
drivers/acpi/nfit/nfit.h
drivers/acpi/processor_idle.c
drivers/acpi/resource.c
drivers/acpi/scan.c
drivers/acpi/sleep.c
drivers/acpi/thermal.c
drivers/acpi/tiny-power-button.c
drivers/acpi/video_detect.c
drivers/acpi/x86/s2idle.c
drivers/acpi/x86/utils.c
drivers/android/binder.c
drivers/android/binder_alloc.c
drivers/android/binder_alloc.h
drivers/android/binder_alloc_selftest.c
drivers/ata/libata-core.c
drivers/ata/libata-eh.c
drivers/ata/libata-scsi.c
drivers/auxdisplay/ht16k33.c
drivers/auxdisplay/lcd2s.c
drivers/base/cacheinfo.c
drivers/base/class.c
drivers/base/dd.c
drivers/base/firmware_loader/main.c
drivers/base/node.c
drivers/base/power/domain.c
drivers/base/power/wakeup.c
drivers/base/regmap/Kconfig
drivers/base/regmap/regcache-maple.c
drivers/base/regmap/regcache.c
drivers/base/regmap/regmap-sdw.c
drivers/base/regmap/regmap-spi-avmm.c
drivers/base/regmap/regmap.c
drivers/block/amiflop.c
drivers/block/aoe/aoeblk.c
drivers/block/aoe/aoechr.c
drivers/block/ataflop.c
drivers/block/brd.c
drivers/block/drbd/drbd_bitmap.c
drivers/block/drbd/drbd_main.c
drivers/block/drbd/drbd_nl.c
drivers/block/drbd/drbd_receiver.c
drivers/block/floppy.c
drivers/block/loop.c
drivers/block/mtip32xx/mtip32xx.c
drivers/block/nbd.c
drivers/block/null_blk/main.c
drivers/block/pktcdvd.c
drivers/block/rbd.c
drivers/block/rnbd/Makefile
drivers/block/rnbd/rnbd-clt-sysfs.c
drivers/block/rnbd/rnbd-clt.c
drivers/block/rnbd/rnbd-common.c [deleted file]
drivers/block/rnbd/rnbd-proto.h
drivers/block/rnbd/rnbd-srv-sysfs.c
drivers/block/rnbd/rnbd-srv.c
drivers/block/rnbd/rnbd-srv.h
drivers/block/sunvdc.c
drivers/block/swim.c
drivers/block/swim3.c
drivers/block/ublk_drv.c
drivers/block/virtio_blk.c
drivers/block/xen-blkback/xenbus.c
drivers/block/xen-blkfront.c
drivers/block/z2ram.c
drivers/block/zram/zram_drv.c
drivers/bluetooth/btnxpuart.c
drivers/bluetooth/hci_qca.c
drivers/cdrom/cdrom.c
drivers/cdrom/gdrom.c
drivers/char/agp/parisc-agp.c
drivers/char/random.c
drivers/char/tpm/tpm-chip.c
drivers/char/tpm/tpm-interface.c
drivers/char/tpm/tpm_tis.c
drivers/char/tpm/tpm_tis_core.c
drivers/char/tpm/tpm_tis_core.h
drivers/clk/clk-composite.c
drivers/clk/clk-loongson2.c
drivers/clk/imx/clk-imx1.c
drivers/clk/imx/clk-imx27.c
drivers/clk/imx/clk-imx31.c
drivers/clk/imx/clk-imx35.c
drivers/clk/mediatek/clk-mt8365.c
drivers/clk/pxa/clk-pxa3xx.c
drivers/clocksource/Kconfig
drivers/clocksource/Makefile
drivers/clocksource/arm_arch_timer.c
drivers/clocksource/hyperv_timer.c
drivers/clocksource/ingenic-timer.c
drivers/clocksource/timer-cadence-ttc.c
drivers/clocksource/timer-imx-gpt.c
drivers/clocksource/timer-loongson1-pwm.c [new file with mode: 0644]
drivers/cpufreq/Kconfig
drivers/cpufreq/Kconfig.x86
drivers/cpufreq/acpi-cpufreq.c
drivers/cpufreq/amd-pstate.c
drivers/cpufreq/cpufreq.c
drivers/cpufreq/intel_pstate.c
drivers/cpufreq/pcc-cpufreq.c
drivers/cpuidle/cpuidle.c
drivers/cpuidle/poll_state.c
drivers/crypto/allwinner/sun4i-ss/sun4i-ss-cipher.c
drivers/crypto/allwinner/sun4i-ss/sun4i-ss-core.c
drivers/crypto/allwinner/sun4i-ss/sun4i-ss-hash.c
drivers/crypto/allwinner/sun4i-ss/sun4i-ss.h
drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c
drivers/crypto/allwinner/sun8i-ce/sun8i-ce-core.c
drivers/crypto/allwinner/sun8i-ce/sun8i-ce-hash.c
drivers/crypto/allwinner/sun8i-ce/sun8i-ce-prng.c
drivers/crypto/allwinner/sun8i-ce/sun8i-ce-trng.c
drivers/crypto/allwinner/sun8i-ss/sun8i-ss-cipher.c
drivers/crypto/allwinner/sun8i-ss/sun8i-ss-core.c
drivers/crypto/allwinner/sun8i-ss/sun8i-ss-hash.c
drivers/crypto/allwinner/sun8i-ss/sun8i-ss-prng.c
drivers/crypto/marvell/octeontx2/otx2_cptpf_main.c
drivers/crypto/marvell/octeontx2/otx2_cptvf_main.c
drivers/cxl/core/mbox.c
drivers/cxl/core/pci.c
drivers/cxl/core/port.c
drivers/cxl/cxl.h
drivers/cxl/cxlmem.h
drivers/cxl/cxlpci.h
drivers/cxl/mem.c
drivers/cxl/pci.c
drivers/cxl/port.c
drivers/devfreq/exynos-bus.c
drivers/devfreq/mtk-cci-devfreq.c
drivers/dma-buf/udmabuf.c
drivers/dma/at_hdmac.c
drivers/dma/at_xdmac.c
drivers/dma/idxd/cdev.c
drivers/dma/pl330.c
drivers/dma/ti/k3-udma.c
drivers/edac/Kconfig
drivers/edac/Makefile
drivers/edac/amd64_edac.c
drivers/edac/amd64_edac.h
drivers/edac/mce_amd.c
drivers/edac/npcm_edac.c [new file with mode: 0644]
drivers/edac/qcom_edac.c
drivers/edac/thunderx_edac.c
drivers/firewire/net.c
drivers/firmware/arm_ffa/bus.c
drivers/firmware/arm_ffa/driver.c
drivers/firmware/arm_scmi/raw_mode.c
drivers/firmware/cirrus/cs_dsp.c
drivers/firmware/efi/Kconfig
drivers/firmware/efi/Makefile
drivers/firmware/efi/efi.c
drivers/firmware/efi/libstub/Makefile
drivers/firmware/efi/libstub/Makefile.zboot
drivers/firmware/efi/libstub/bitmap.c [new file with mode: 0644]
drivers/firmware/efi/libstub/efistub.h
drivers/firmware/efi/libstub/find.c [new file with mode: 0644]
drivers/firmware/efi/libstub/unaccepted_memory.c [new file with mode: 0644]
drivers/firmware/efi/libstub/x86-stub.c
drivers/firmware/efi/unaccepted_memory.c [new file with mode: 0644]
drivers/firmware/iscsi_ibft_find.c
drivers/firmware/sysfb_simplefb.c
drivers/gpio/Kconfig
drivers/gpio/gpio-f7188x.c
drivers/gpio/gpio-mockup.c
drivers/gpio/gpio-sifive.c
drivers/gpio/gpio-sim.c
drivers/gpio/gpiolib.c
drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.h
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c
drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c
drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
drivers/gpu/drm/amd/amdgpu/nv.c
drivers/gpu/drm/amd/amdgpu/psp_v10_0.c
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
drivers/gpu/drm/amd/amdgpu/soc15.c
drivers/gpu/drm/amd/amdgpu/soc21.c
drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
drivers/gpu/drm/amd/amdgpu/vi.c
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
drivers/gpu/drm/amd/display/dc/core/dc.c
drivers/gpu/drm/amd/display/dc/core/dc_resource.c
drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
drivers/gpu/drm/amd/display/dc/dcn314/dcn314_hwseq.c
drivers/gpu/drm/amd/display/dc/dcn314/dcn314_hwseq.h
drivers/gpu/drm/amd/display/dc/dcn314/dcn314_init.c
drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c
drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.h
drivers/gpu/drm/amd/display/dc/link/link_detection.c
drivers/gpu/drm/amd/display/dc/link/link_validation.c
drivers/gpu/drm/amd/pm/amdgpu_dpm.c
drivers/gpu/drm/amd/pm/amdgpu_pm.c
drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_5_ppt.c
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
drivers/gpu/drm/ast/ast_dp.c
drivers/gpu/drm/ast/ast_drv.h
drivers/gpu/drm/ast/ast_main.c
drivers/gpu/drm/ast/ast_mode.c
drivers/gpu/drm/ast/ast_post.c
drivers/gpu/drm/bridge/ti-sn65dsi86.c
drivers/gpu/drm/display/drm_dp_mst_topology.c
drivers/gpu/drm/drm_fb_helper.c
drivers/gpu/drm/drm_managed.c
drivers/gpu/drm/drm_mipi_dsi.c
drivers/gpu/drm/drm_panel_orientation_quirks.c
drivers/gpu/drm/exynos/exynos_drm_g2d.c
drivers/gpu/drm/exynos/exynos_drm_g2d.h
drivers/gpu/drm/exynos/exynos_drm_vidi.c
drivers/gpu/drm/i915/Kconfig
drivers/gpu/drm/i915/display/intel_atomic_plane.c
drivers/gpu/drm/i915/display/intel_cdclk.c
drivers/gpu/drm/i915/display/intel_display.c
drivers/gpu/drm/i915/display/intel_dp.c
drivers/gpu/drm/i915/display/intel_dp_aux.c
drivers/gpu/drm/i915/display/intel_hdcp.c
drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
drivers/gpu/drm/i915/gt/selftest_execlists.c
drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
drivers/gpu/drm/i915/i915_pci.c
drivers/gpu/drm/i915/i915_perf.c
drivers/gpu/drm/lima/lima_sched.c
drivers/gpu/drm/mgag200/mgag200_mode.c
drivers/gpu/drm/msm/adreno/a6xx_gmu.c
drivers/gpu/drm/msm/adreno/a6xx_gpu.c
drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_3_0_msm8998.h
drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_0_sm8150.h
drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_1_sc8180x.h
drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_6_0_sm8250.h
drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_6_2_sc7180.h
drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_6_3_sm6115.h
drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_6_5_qcm2290.h
drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_7_0_sm8350.h
drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_7_2_sc7280.h
drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_8_0_sc8280xp.h
drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_8_1_sm8450.h
drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_9_0_sm8550.h
drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
drivers/gpu/drm/msm/disp/dpu1/dpu_hw_wb.c
drivers/gpu/drm/msm/disp/dpu1/dpu_hwio.h
drivers/gpu/drm/msm/dp/dp_audio.c
drivers/gpu/drm/msm/dp/dp_audio.h
drivers/gpu/drm/msm/dp/dp_catalog.c
drivers/gpu/drm/msm/dp/dp_catalog.h
drivers/gpu/drm/msm/dp/dp_display.c
drivers/gpu/drm/msm/msm_atomic.c
drivers/gpu/drm/msm/msm_drv.c
drivers/gpu/drm/msm/msm_gem.c
drivers/gpu/drm/msm/msm_gem_submit.c
drivers/gpu/drm/msm/msm_iommu.c
drivers/gpu/drm/nouveau/include/nvif/if0012.h
drivers/gpu/drm/nouveau/nouveau_acpi.c
drivers/gpu/drm/nouveau/nouveau_connector.c
drivers/gpu/drm/nouveau/nouveau_drm.c
drivers/gpu/drm/nouveau/nvkm/engine/disp/outp.h
drivers/gpu/drm/nouveau/nvkm/engine/disp/uoutp.c
drivers/gpu/drm/pl111/pl111_display.c
drivers/gpu/drm/pl111/pl111_drm.h
drivers/gpu/drm/pl111/pl111_drv.c
drivers/gpu/drm/pl111/pl111_versatile.c
drivers/gpu/drm/radeon/radeon_fbdev.c
drivers/gpu/drm/radeon/radeon_gem.c
drivers/gpu/drm/radeon/radeon_irq_kms.c
drivers/gpu/drm/scheduler/sched_main.c
drivers/gpu/drm/vmwgfx/vmwgfx_msg_x86.h
drivers/greybus/connection.c
drivers/greybus/svc.c
drivers/hid/hid-google-hammer.c
drivers/hid/hid-ids.h
drivers/hid/hid-logitech-hidpp.c
drivers/hid/wacom_sys.c
drivers/hid/wacom_wac.c
drivers/hv/channel_mgmt.c
drivers/hv/hv_common.c
drivers/hv/vmbus_drv.c
drivers/hwmon/k10temp.c
drivers/hwtracing/coresight/coresight-etm-perf.c
drivers/hwtracing/coresight/coresight-tmc-etr.c
drivers/hwtracing/coresight/coresight-trbe.c
drivers/hwtracing/coresight/coresight-trbe.h
drivers/i2c/busses/i2c-designware-core.h
drivers/i2c/busses/i2c-designware-slave.c
drivers/i2c/busses/i2c-img-scb.c
drivers/i2c/busses/i2c-imx-lpi2c.c
drivers/i2c/busses/i2c-mchp-pci1xxxx.c
drivers/i2c/busses/i2c-mv64xxx.c
drivers/i2c/busses/i2c-qup.c
drivers/i2c/busses/i2c-sprd.c
drivers/idle/intel_idle.c
drivers/iio/accel/kionix-kx022a.c
drivers/iio/accel/st_accel_core.c
drivers/iio/adc/ad4130.c
drivers/iio/adc/ad7192.c
drivers/iio/adc/ad_sigma_delta.c
drivers/iio/adc/imx93_adc.c
drivers/iio/adc/mt6370-adc.c
drivers/iio/adc/mxs-lradc-adc.c
drivers/iio/adc/palmas_gpadc.c
drivers/iio/adc/stm32-adc.c
drivers/iio/addac/ad74413r.c
drivers/iio/dac/Makefile
drivers/iio/dac/mcp4725.c
drivers/iio/imu/inv_icm42600/inv_icm42600_buffer.c
drivers/iio/industrialio-gts-helper.c
drivers/iio/light/rohm-bu27034.c
drivers/iio/light/vcnl4035.c
drivers/iio/magnetometer/tmag5273.c
drivers/infiniband/core/cma.c
drivers/infiniband/core/uverbs_cmd.c
drivers/infiniband/core/uverbs_main.c
drivers/infiniband/hw/bnxt_re/bnxt_re.h
drivers/infiniband/hw/bnxt_re/ib_verbs.c
drivers/infiniband/hw/bnxt_re/main.c
drivers/infiniband/hw/bnxt_re/qplib_fp.c
drivers/infiniband/hw/bnxt_re/qplib_res.c
drivers/infiniband/hw/bnxt_re/qplib_sp.c
drivers/infiniband/hw/efa/efa_verbs.c
drivers/infiniband/hw/hns/hns_roce_hw_v2.c
drivers/infiniband/hw/hns/hns_roce_hw_v2.h
drivers/infiniband/hw/hns/hns_roce_mr.c
drivers/infiniband/hw/irdma/verbs.c
drivers/infiniband/hw/mlx5/counters.c
drivers/infiniband/hw/mlx5/fs.c
drivers/infiniband/hw/mlx5/fs.h
drivers/infiniband/hw/mlx5/main.c
drivers/infiniband/hw/mlx5/mlx5_ib.h
drivers/infiniband/hw/mlx5/qp.c
drivers/infiniband/sw/rxe/rxe_comp.c
drivers/infiniband/sw/rxe/rxe_cq.c
drivers/infiniband/sw/rxe/rxe_net.c
drivers/infiniband/sw/rxe/rxe_qp.c
drivers/infiniband/sw/rxe/rxe_recv.c
drivers/infiniband/sw/rxe/rxe_req.c
drivers/infiniband/sw/rxe/rxe_resp.c
drivers/infiniband/sw/rxe/rxe_verbs.c
drivers/infiniband/ulp/isert/ib_isert.c
drivers/infiniband/ulp/rtrs/rtrs-clt.c
drivers/infiniband/ulp/rtrs/rtrs.c
drivers/input/input.c
drivers/input/joystick/xpad.c
drivers/input/misc/soc_button_array.c
drivers/input/mouse/elantech.c
drivers/input/touchscreen/cyttsp5.c
drivers/input/touchscreen/sun4i-ts.c
drivers/iommu/Kconfig
drivers/iommu/amd/amd_iommu.h
drivers/iommu/amd/amd_iommu_types.h
drivers/iommu/amd/init.c
drivers/iommu/amd/iommu.c
drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
drivers/iommu/intel/irq_remapping.c
drivers/iommu/mtk_iommu.c
drivers/iommu/rockchip-iommu.c
drivers/irqchip/irq-clps711x.c
drivers/irqchip/irq-ftintc010.c
drivers/irqchip/irq-gic-common.c
drivers/irqchip/irq-gic-common.h
drivers/irqchip/irq-gic-v3-its.c
drivers/irqchip/irq-gic-v3.c
drivers/irqchip/irq-jcore-aic.c
drivers/irqchip/irq-loongson-eiointc.c
drivers/irqchip/irq-loongson-liointc.c
drivers/irqchip/irq-loongson-pch-pic.c
drivers/irqchip/irq-mbigen.c
drivers/irqchip/irq-meson-gpio.c
drivers/irqchip/irq-mips-gic.c
drivers/irqchip/irq-mmp.c
drivers/irqchip/irq-mxs.c
drivers/irqchip/irq-stm32-exti.c
drivers/leds/rgb/leds-qcom-lpg.c
drivers/mailbox/mailbox-test.c
drivers/md/bcache/bcache.h
drivers/md/bcache/btree.c
drivers/md/bcache/btree.h
drivers/md/bcache/request.c
drivers/md/bcache/stats.h
drivers/md/bcache/super.c
drivers/md/bcache/sysfs.c
drivers/md/bcache/sysfs.h
drivers/md/bcache/writeback.c
drivers/md/dm-cache-metadata.c
drivers/md/dm-cache-target.c
drivers/md/dm-clone-target.c
drivers/md/dm-core.h
drivers/md/dm-crypt.c
drivers/md/dm-era-target.c
drivers/md/dm-init.c
drivers/md/dm-integrity.c
drivers/md/dm-ioctl.c
drivers/md/dm-raid.c
drivers/md/dm-snap.c
drivers/md/dm-table.c
drivers/md/dm-thin-metadata.c
drivers/md/dm-thin.c
drivers/md/dm-verity-fec.c
drivers/md/dm-verity-target.c
drivers/md/dm-zoned-metadata.c
drivers/md/dm.c
drivers/md/dm.h
drivers/md/md-autodetect.c
drivers/md/md-bitmap.c
drivers/md/md-bitmap.h
drivers/md/md-cluster.c
drivers/md/md-multipath.c
drivers/md/md.c
drivers/md/md.h
drivers/md/raid1-10.c
drivers/md/raid1.c
drivers/md/raid1.h
drivers/md/raid10.c
drivers/md/raid10.h
drivers/md/raid5-cache.c
drivers/md/raid5-ppl.c
drivers/md/raid5.c
drivers/md/raid5.h
drivers/media/cec/core/cec-adap.c
drivers/media/cec/core/cec-core.c
drivers/media/cec/core/cec-priv.h
drivers/media/dvb-core/dvb_ca_en50221.c
drivers/media/dvb-core/dvb_demux.c
drivers/media/dvb-core/dvb_frontend.c
drivers/media/dvb-core/dvb_net.c
drivers/media/dvb-core/dvbdev.c
drivers/media/dvb-frontends/mn88443x.c
drivers/media/pci/netup_unidvb/netup_unidvb_core.c
drivers/media/platform/amphion/vpu_core.c
drivers/media/platform/amphion/vpu_v4l2.c
drivers/media/platform/chips-media/coda-common.c
drivers/media/platform/mediatek/mdp3/mtk-mdp3-comp.c
drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_stateful.c
drivers/media/platform/nxp/imx8-isi/imx8-isi-core.c
drivers/media/platform/nxp/imx8-isi/imx8-isi-hw.c
drivers/media/platform/qcom/camss/camss-video.c
drivers/media/platform/renesas/rcar-vin/rcar-dma.c
drivers/media/platform/verisilicon/hantro_v4l2.c
drivers/media/usb/dvb-usb-v2/ce6230.c
drivers/media/usb/dvb-usb-v2/ec168.c
drivers/media/usb/dvb-usb-v2/rtl28xxu.c
drivers/media/usb/dvb-usb/az6027.c
drivers/media/usb/dvb-usb/digitv.c
drivers/media/usb/dvb-usb/dw2102.c
drivers/media/usb/pvrusb2/Kconfig
drivers/media/usb/ttusb-dec/ttusb_dec.c
drivers/media/usb/uvc/uvc_driver.c
drivers/media/v4l2-core/v4l2-mc.c
drivers/misc/eeprom/Kconfig
drivers/misc/fastrpc.c
drivers/misc/lkdtm/bugs.c
drivers/mmc/core/block.c
drivers/mmc/core/pwrseq_sd8787.c
drivers/mmc/host/bcm2835.c
drivers/mmc/host/litex_mmc.c
drivers/mmc/host/meson-gx-mmc.c
drivers/mmc/host/mmci.c
drivers/mmc/host/mtk-sd.c
drivers/mmc/host/mvsdio.c
drivers/mmc/host/omap.c
drivers/mmc/host/omap_hsmmc.c
drivers/mmc/host/owl-mmc.c
drivers/mmc/host/sdhci-acpi.c
drivers/mmc/host/sdhci-cadence.c
drivers/mmc/host/sdhci-esdhc-imx.c
drivers/mmc/host/sdhci-msm.c
drivers/mmc/host/sdhci-spear.c
drivers/mmc/host/sh_mmcif.c
drivers/mmc/host/sunxi-mmc.c
drivers/mmc/host/usdhi6rol0.c
drivers/mmc/host/vub300.c
drivers/mtd/devices/block2mtd.c
drivers/mtd/mtd_blkdevs.c
drivers/mtd/mtdblock.c
drivers/mtd/mtdchar.c
drivers/mtd/nand/raw/ingenic/ingenic_ecc.h
drivers/mtd/nand/raw/marvell_nand.c
drivers/mtd/spi-nor/core.c
drivers/mtd/spi-nor/spansion.c
drivers/mtd/ubi/block.c
drivers/net/bonding/bond_main.c
drivers/net/bonding/bond_netlink.c
drivers/net/bonding/bond_options.c
drivers/net/can/Kconfig
drivers/net/can/bxcan.c
drivers/net/can/dev/skb.c
drivers/net/can/kvaser_pciefd.c
drivers/net/dsa/lan9303-core.c
drivers/net/dsa/mt7530.c
drivers/net/dsa/mt7530.h
drivers/net/dsa/mv88e6xxx/chip.c
drivers/net/dsa/mv88e6xxx/port.h
drivers/net/dsa/ocelot/felix_vsc9959.c
drivers/net/dsa/qca/Kconfig
drivers/net/dsa/rzn1_a5psw.c
drivers/net/dsa/rzn1_a5psw.h
drivers/net/ethernet/3com/3c515.c
drivers/net/ethernet/3com/3c589_cs.c
drivers/net/ethernet/8390/ne.c
drivers/net/ethernet/8390/smc-ultra.c
drivers/net/ethernet/8390/wd.c
drivers/net/ethernet/amd/lance.c
drivers/net/ethernet/amd/pds_core/dev.c
drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
drivers/net/ethernet/broadcom/bcmsysport.c
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
drivers/net/ethernet/broadcom/bnxt/bnxt.c
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
drivers/net/ethernet/broadcom/genet/bcmgenet.c
drivers/net/ethernet/broadcom/genet/bcmgenet.h
drivers/net/ethernet/broadcom/genet/bcmmii.c
drivers/net/ethernet/cavium/thunder/thunder_bgx.c
drivers/net/ethernet/cirrus/cs89x0.c
drivers/net/ethernet/emulex/benet/be_main.c
drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c
drivers/net/ethernet/freescale/enetc/enetc.c
drivers/net/ethernet/freescale/enetc/enetc_qos.c
drivers/net/ethernet/freescale/fec_main.c
drivers/net/ethernet/google/gve/gve_main.c
drivers/net/ethernet/hisilicon/hns3/hns3_common/hclge_comm_cmd.c
drivers/net/ethernet/hisilicon/hns3/hns3_common/hclge_comm_cmd.h
drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.h
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
drivers/net/ethernet/intel/iavf/iavf.h
drivers/net/ethernet/intel/iavf/iavf_main.c
drivers/net/ethernet/intel/iavf/iavf_register.h
drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
drivers/net/ethernet/intel/ice/ice_common.c
drivers/net/ethernet/intel/ice/ice_common.h
drivers/net/ethernet/intel/ice/ice_dcb_lib.c
drivers/net/ethernet/intel/ice/ice_gnss.c
drivers/net/ethernet/intel/ice/ice_gnss.h
drivers/net/ethernet/intel/ice/ice_lib.c
drivers/net/ethernet/intel/ice/ice_main.c
drivers/net/ethernet/intel/ice/ice_sriov.c
drivers/net/ethernet/intel/ice/ice_txrx.c
drivers/net/ethernet/intel/ice/ice_txrx.h
drivers/net/ethernet/intel/ice/ice_vf_lib.c
drivers/net/ethernet/intel/ice/ice_vf_lib.h
drivers/net/ethernet/intel/ice/ice_virtchnl.c
drivers/net/ethernet/intel/igb/e1000_mac.c
drivers/net/ethernet/intel/igb/igb_ethtool.c
drivers/net/ethernet/intel/igb/igb_main.c
drivers/net/ethernet/intel/igc/igc_main.c
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
drivers/net/ethernet/marvell/octeon_ep/octep_main.c
drivers/net/ethernet/marvell/octeontx2/af/rvu.c
drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_hash.c
drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c
drivers/net/ethernet/mediatek/mtk_eth_soc.c
drivers/net/ethernet/mediatek/mtk_wed.c
drivers/net/ethernet/mellanox/mlx5/core/cmd.c
drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
drivers/net/ethernet/mellanox/mlx5/core/en.h
drivers/net/ethernet/mellanox/mlx5/core/en/params.c
drivers/net/ethernet/mellanox/mlx5/core/en/params.h
drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c
drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h
drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.c
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.h
drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.h
drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
drivers/net/ethernet/mellanox/mlx5/core/en_common.c
drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
drivers/net/ethernet/mellanox/mlx5/core/eq.c
drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c
drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h
drivers/net/ethernet/mellanox/mlx5/core/main.c
drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
drivers/net/ethernet/mellanox/mlx5/core/mlx5_irq.h
drivers/net/ethernet/mellanox/mlx5/core/mr.c
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
drivers/net/ethernet/mellanox/mlx5/core/steering/dr_cmd.c
drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ptrn.c
drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c
drivers/net/ethernet/mellanox/mlx5/core/steering/fs_dr.c
drivers/net/ethernet/mellanox/mlx5/core/steering/fs_dr.h
drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h
drivers/net/ethernet/mellanox/mlx5/core/thermal.c
drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
drivers/net/ethernet/microchip/lan966x/lan966x_main.c
drivers/net/ethernet/microsoft/mana/mana_en.c
drivers/net/ethernet/microsoft/mana/mana_ethtool.c
drivers/net/ethernet/mscc/vsc7514_regs.c
drivers/net/ethernet/netronome/nfp/nic/main.h
drivers/net/ethernet/nvidia/forcedeth.c
drivers/net/ethernet/qlogic/qed/qed_l2.c
drivers/net/ethernet/qlogic/qede/qede.h
drivers/net/ethernet/qlogic/qede/qede_ethtool.c
drivers/net/ethernet/qlogic/qede/qede_main.c
drivers/net/ethernet/qualcomm/qca_spi.c
drivers/net/ethernet/realtek/r8169_main.c
drivers/net/ethernet/renesas/rswitch.c
drivers/net/ethernet/sfc/ef10.c
drivers/net/ethernet/sfc/ef100_netdev.c
drivers/net/ethernet/sfc/ef100_nic.c
drivers/net/ethernet/sfc/ef100_tx.c
drivers/net/ethernet/sfc/ef100_tx.h
drivers/net/ethernet/sfc/efx_channels.c
drivers/net/ethernet/sfc/efx_devlink.c
drivers/net/ethernet/sfc/siena/efx_channels.c
drivers/net/ethernet/sfc/tc.c
drivers/net/ethernet/sfc/tx_common.c
drivers/net/ethernet/sfc/tx_common.h
drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c
drivers/net/ethernet/stmicro/stmmac/dwmac4.h
drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
drivers/net/ethernet/stmicro/stmmac/stmmac_xdp.c
drivers/net/ethernet/sun/cassini.c
drivers/net/ethernet/ti/am65-cpsw-nuss.c
drivers/net/ieee802154/adf7242.c
drivers/net/ieee802154/mac802154_hwsim.c
drivers/net/ipa/ipa_endpoint.c
drivers/net/ipvlan/ipvlan_core.c
drivers/net/ipvlan/ipvlan_l3s.c
drivers/net/macsec.c
drivers/net/mdio/mdio-i2c.c
drivers/net/mdio/mdio-mvusb.c
drivers/net/pcs/pcs-xpcs.c
drivers/net/phy/bcm-phy-lib.h
drivers/net/phy/bcm7xxx.c
drivers/net/phy/dp83867.c
drivers/net/phy/mdio_bus.c
drivers/net/phy/mscc/mscc.h
drivers/net/phy/mscc/mscc_main.c
drivers/net/phy/mxl-gpy.c
drivers/net/phy/phy_device.c
drivers/net/phy/phylink.c
drivers/net/tap.c
drivers/net/team/team.c
drivers/net/tun.c
drivers/net/usb/cdc_ncm.c
drivers/net/usb/qmi_wwan.c
drivers/net/virtio_net.c
drivers/net/wan/lapbether.c
drivers/net/wireless/ath/ath10k/qmi.c
drivers/net/wireless/ath/ath11k/qmi.c
drivers/net/wireless/ath/ath12k/qmi.c
drivers/net/wireless/broadcom/b43/b43.h
drivers/net/wireless/broadcom/b43legacy/b43legacy.h
drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.c
drivers/net/wireless/intel/iwlwifi/fw/acpi.c
drivers/net/wireless/intel/iwlwifi/fw/dbg.c
drivers/net/wireless/intel/iwlwifi/mvm/d3.c
drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c
drivers/net/wireless/intel/iwlwifi/mvm/fw.c
drivers/net/wireless/intel/iwlwifi/mvm/link.c
drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c
drivers/net/wireless/intel/iwlwifi/mvm/mld-sta.c
drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
drivers/net/wireless/intel/iwlwifi/mvm/nvm.c
drivers/net/wireless/intel/iwlwifi/mvm/rfi.c
drivers/net/wireless/intel/iwlwifi/mvm/rs.c
drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c
drivers/net/wireless/intel/iwlwifi/mvm/sta.c
drivers/net/wireless/intel/iwlwifi/mvm/tx.c
drivers/net/wireless/intel/iwlwifi/pcie/drv.c
drivers/net/wireless/intel/iwlwifi/pcie/trans.c
drivers/net/wireless/marvell/mwifiex/cfg80211.c
drivers/net/wireless/marvell/mwifiex/main.c
drivers/net/wireless/mediatek/mt76/mt7615/mac.c
drivers/net/wireless/mediatek/mt76/mt76_connac2_mac.h
drivers/net/wireless/mediatek/mt76/mt76_connac_mac.c
drivers/net/wireless/mediatek/mt76/mt7996/mac.c
drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
drivers/net/wireless/realtek/rtw88/mac80211.c
drivers/net/wireless/realtek/rtw88/main.c
drivers/net/wireless/realtek/rtw88/main.h
drivers/net/wireless/realtek/rtw88/ps.c
drivers/net/wireless/realtek/rtw88/ps.h
drivers/net/wireless/realtek/rtw88/sdio.c
drivers/net/wireless/realtek/rtw88/usb.h
drivers/net/wireless/realtek/rtw89/core.c
drivers/net/wireless/realtek/rtw89/mac.c
drivers/net/wireless/realtek/rtw89/mac.h
drivers/net/wireless/realtek/rtw89/mac80211.c
drivers/net/wireless/realtek/rtw89/ps.c
drivers/net/wireless/realtek/rtw89/ps.h
drivers/net/wireless/realtek/rtw89/rtw8852b.c
drivers/net/wireless/virtual/mac80211_hwsim.c
drivers/net/wwan/iosm/iosm_ipc_imem.c
drivers/net/wwan/iosm/iosm_ipc_imem_ops.c
drivers/net/wwan/iosm/iosm_ipc_imem_ops.h
drivers/net/wwan/iosm/iosm_ipc_mux_codec.c
drivers/net/wwan/iosm/iosm_ipc_mux_codec.h
drivers/net/wwan/t7xx/t7xx_hif_cldma.c
drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
drivers/net/wwan/t7xx/t7xx_pci.c
drivers/net/wwan/t7xx/t7xx_pci.h
drivers/nfc/fdp/fdp.c
drivers/nfc/nfcsim.c
drivers/nubus/nubus.c
drivers/nubus/proc.c
drivers/nvme/host/Makefile
drivers/nvme/host/auth.c
drivers/nvme/host/constants.c
drivers/nvme/host/core.c
drivers/nvme/host/fabrics.c
drivers/nvme/host/fabrics.h
drivers/nvme/host/hwmon.c
drivers/nvme/host/ioctl.c
drivers/nvme/host/multipath.c
drivers/nvme/host/nvme.h
drivers/nvme/host/pci.c
drivers/nvme/host/rdma.c
drivers/nvme/host/sysfs.c [new file with mode: 0644]
drivers/nvme/host/tcp.c
drivers/nvme/target/fabrics-cmd-auth.c
drivers/nvme/target/fcloop.c
drivers/nvme/target/io-cmd-bdev.c
drivers/nvme/target/nvmet.h
drivers/nvme/target/passthru.c
drivers/of/overlay.c
drivers/pci/controller/pci-hyperv.c
drivers/pci/quirks.c
drivers/perf/Kconfig
drivers/perf/Makefile
drivers/perf/apple_m1_cpu_pmu.c
drivers/perf/arm-cci.c
drivers/perf/arm-cmn.c
drivers/perf/arm_cspmu/Kconfig
drivers/perf/arm_cspmu/arm_cspmu.c
drivers/perf/arm_cspmu/arm_cspmu.h
drivers/perf/arm_dmc620_pmu.c
drivers/perf/arm_pmu.c
drivers/perf/arm_pmuv3.c
drivers/perf/fsl_imx9_ddr_perf.c [new file with mode: 0644]
drivers/perf/hisilicon/Makefile
drivers/perf/hisilicon/hisi_pcie_pmu.c
drivers/perf/hisilicon/hisi_uncore_pa_pmu.c
drivers/perf/hisilicon/hisi_uncore_pmu.c
drivers/perf/hisilicon/hisi_uncore_pmu.h
drivers/perf/hisilicon/hisi_uncore_uc_pmu.c [new file with mode: 0644]
drivers/perf/qcom_l2_pmu.c
drivers/phy/amlogic/phy-meson-g12a-mipi-dphy-analog.c
drivers/phy/mediatek/phy-mtk-hdmi-mt8195.c
drivers/phy/qualcomm/phy-qcom-qmp-combo.c
drivers/phy/qualcomm/phy-qcom-qmp-pcie-msm8996.c
drivers/phy/qualcomm/phy-qcom-snps-femto-v2.c
drivers/pinctrl/meson/pinctrl-meson-axg.c
drivers/pinctrl/pinctrl-amd.c
drivers/platform/chrome/cros_ec_i2c.c
drivers/platform/chrome/cros_ec_lpc.c
drivers/platform/chrome/cros_ec_spi.c
drivers/platform/chrome/cros_hps_i2c.c
drivers/platform/chrome/cros_typec_switch.c
drivers/platform/mellanox/mlxbf-pmc.c
drivers/platform/mellanox/mlxbf-tmfifo.c
drivers/platform/surface/aggregator/controller.c
drivers/platform/surface/surface_aggregator_tabletsw.c
drivers/platform/x86/amd/pmc.c
drivers/platform/x86/amd/pmf/core.c
drivers/platform/x86/asus-nb-wmi.c
drivers/platform/x86/hp/hp-wmi.c
drivers/platform/x86/intel/ifs/load.c
drivers/platform/x86/intel/int3472/clk_and_regulator.c
drivers/platform/x86/intel/speed_select_if/isst_if_common.c
drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.c
drivers/platform/x86/intel_scu_pcidrv.c
drivers/platform/x86/thinkpad_acpi.c
drivers/platform/x86/touchscreen_dmi.c
drivers/power/supply/ab8500_btemp.c
drivers/power/supply/ab8500_fg.c
drivers/power/supply/axp288_fuel_gauge.c
drivers/power/supply/bq24190_charger.c
drivers/power/supply/bq25890_charger.c
drivers/power/supply/bq27xxx_battery.c
drivers/power/supply/bq27xxx_battery_i2c.c
drivers/power/supply/mt6360_charger.c
drivers/power/supply/power_supply_core.c
drivers/power/supply/power_supply_leds.c
drivers/power/supply/power_supply_sysfs.c
drivers/power/supply/rt9467-charger.c
drivers/power/supply/sbs-charger.c
drivers/power/supply/sc27xx_fuel_gauge.c
drivers/powercap/Kconfig
drivers/powercap/Makefile
drivers/powercap/intel_rapl_common.c
drivers/powercap/intel_rapl_msr.c
drivers/powercap/intel_rapl_tpmi.c [new file with mode: 0644]
drivers/pwm/pwm-atmel.c
drivers/pwm/pwm-pxa.c
drivers/ras/debugfs.c
drivers/regulator/core.c
drivers/regulator/mt6359-regulator.c
drivers/regulator/pca9450-regulator.c
drivers/regulator/qcom-rpmh-regulator.c
drivers/s390/block/dasd.c
drivers/s390/block/dasd_eckd.c
drivers/s390/block/dasd_genhd.c
drivers/s390/block/dasd_int.h
drivers/s390/block/dasd_ioctl.c
drivers/s390/block/dcssblk.c
drivers/s390/char/zcore.c
drivers/s390/cio/device.c
drivers/s390/cio/qdio.h
drivers/s390/cio/vfio_ccw_drv.c
drivers/s390/cio/vfio_ccw_private.h
drivers/s390/crypto/pkey_api.c
drivers/s390/crypto/vfio_ap_ops.c
drivers/s390/crypto/vfio_ap_private.h
drivers/s390/net/ism_drv.c
drivers/scsi/NCR5380.c
drivers/scsi/aacraid/aacraid.h
drivers/scsi/aacraid/commsup.c
drivers/scsi/aacraid/linit.c
drivers/scsi/aacraid/src.c
drivers/scsi/ch.c
drivers/scsi/lpfc/lpfc_bsg.c
drivers/scsi/qla2xxx/qla_def.h
drivers/scsi/qla2xxx/qla_init.c
drivers/scsi/qla2xxx/qla_inline.h
drivers/scsi/qla2xxx/qla_isr.c
drivers/scsi/scsi_bsg.c
drivers/scsi/scsi_ioctl.c
drivers/scsi/scsi_lib.c
drivers/scsi/sd.c
drivers/scsi/sg.c
drivers/scsi/sr.c
drivers/scsi/st.c
drivers/scsi/stex.c
drivers/scsi/storvsc_drv.c
drivers/soc/fsl/qe/Kconfig
drivers/soc/qcom/Makefile
drivers/soc/qcom/icc-bwmon.c
drivers/soc/qcom/ramp_controller.c
drivers/soc/qcom/rmtfs_mem.c
drivers/soc/qcom/rpmh-rsc.c
drivers/soc/qcom/rpmhpd.c
drivers/soundwire/dmi-quirks.c
drivers/soundwire/qcom.c
drivers/soundwire/stream.c
drivers/spi/spi-cadence-quadspi.c
drivers/spi/spi-cadence.c
drivers/spi/spi-dw-mmio.c
drivers/spi/spi-fsl-dspi.c
drivers/spi/spi-fsl-lpspi.c
drivers/spi/spi-geni-qcom.c
drivers/spi/spi-mt65xx.c
drivers/spi/spi-qup.c
drivers/staging/media/atomisp/i2c/atomisp-ov2680.c
drivers/staging/media/imx/imx8mq-mipi-csi2.c
drivers/staging/octeon/TODO
drivers/target/iscsi/iscsi_target.c
drivers/target/iscsi/iscsi_target_login.c
drivers/target/iscsi/iscsi_target_nego.c
drivers/target/iscsi/iscsi_target_util.c
drivers/target/iscsi/iscsi_target_util.h
drivers/target/target_core_iblock.c
drivers/target/target_core_pscsi.c
drivers/target/target_core_transport.c
drivers/tee/amdtee/amdtee_if.h
drivers/tee/amdtee/call.c
drivers/tee/optee/smc_abi.c
drivers/thermal/Kconfig
drivers/thermal/amlogic_thermal.c
drivers/thermal/armada_thermal.c
drivers/thermal/imx8mm_thermal.c
drivers/thermal/imx_sc_thermal.c
drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.h
drivers/thermal/intel/int340x_thermal/int3400_thermal.c
drivers/thermal/intel/int340x_thermal/processor_thermal_rapl.c
drivers/thermal/intel/intel_soc_dts_iosf.c
drivers/thermal/k3_bandgap.c
drivers/thermal/mediatek/auxadc_thermal.c
drivers/thermal/mediatek/lvts_thermal.c
drivers/thermal/qcom/qcom-spmi-adc-tm5.c
drivers/thermal/qcom/qcom-spmi-temp-alarm.c
drivers/thermal/qcom/tsens-v0_1.c
drivers/thermal/qcom/tsens-v1.c
drivers/thermal/qcom/tsens.c
drivers/thermal/qcom/tsens.h
drivers/thermal/qoriq_thermal.c
drivers/thermal/rcar_gen3_thermal.c
drivers/thermal/st/st_thermal.c
drivers/thermal/st/st_thermal.h
drivers/thermal/st/st_thermal_memmap.c
drivers/thermal/sun8i_thermal.c
drivers/thermal/tegra/tegra30-tsensor.c
drivers/thermal/thermal-generic-adc.c
drivers/thermal/thermal_core.h
drivers/thermal/thermal_hwmon.c
drivers/thermal/ti-soc-thermal/ti-thermal-common.c
drivers/thunderbolt/dma_test.c
drivers/thunderbolt/nhi.c
drivers/thunderbolt/nhi_regs.h
drivers/thunderbolt/tb.c
drivers/thunderbolt/tunnel.c
drivers/tty/serial/8250/8250_bcm7271.c
drivers/tty/serial/8250/8250_exar.c
drivers/tty/serial/8250/8250_pci.c
drivers/tty/serial/8250/8250_port.c
drivers/tty/serial/8250/8250_tegra.c
drivers/tty/serial/Kconfig
drivers/tty/serial/arc_uart.c
drivers/tty/serial/cpm_uart/cpm_uart.h
drivers/tty/serial/fsl_lpuart.c
drivers/tty/serial/lantiq.c
drivers/tty/serial/qcom_geni_serial.c
drivers/tty/tty_io.c
drivers/tty/vt/vc_screen.c
drivers/ufs/core/ufs-mcq.c
drivers/ufs/core/ufshcd.c
drivers/usb/cdns3/cdns3-gadget.c
drivers/usb/class/usbtmc.c
drivers/usb/core/buffer.c
drivers/usb/core/devio.c
drivers/usb/dwc3/core.c
drivers/usb/dwc3/core.h
drivers/usb/dwc3/debugfs.c
drivers/usb/dwc3/dwc3-qcom.c
drivers/usb/dwc3/gadget.c
drivers/usb/gadget/function/f_fs.c
drivers/usb/gadget/function/u_ether.c
drivers/usb/gadget/udc/amd5536udc_pci.c
drivers/usb/gadget/udc/core.c
drivers/usb/gadget/udc/renesas_usb3.c
drivers/usb/host/uhci-pci.c
drivers/usb/host/xhci-pci.c
drivers/usb/host/xhci-ring.c
drivers/usb/host/xhci.h
drivers/usb/serial/option.c
drivers/usb/storage/scsiglue.c
drivers/usb/typec/altmodes/displayport.c
drivers/usb/typec/pd.c
drivers/usb/typec/tipd/core.c
drivers/usb/typec/ucsi/ucsi.c
drivers/vdpa/mlx5/net/mlx5_vnet.c
drivers/vdpa/vdpa_user/vduse_dev.c
drivers/vfio/vfio_iommu_type1.c
drivers/vhost/net.c
drivers/vhost/vdpa.c
drivers/vhost/vhost.c
drivers/vhost/vhost.h
drivers/video/fbdev/68328fb.c
drivers/video/fbdev/Kconfig
drivers/video/fbdev/arcfb.c
drivers/video/fbdev/atmel_lcdfb.c
drivers/video/fbdev/aty/atyfb_base.c
drivers/video/fbdev/au1100fb.c
drivers/video/fbdev/au1200fb.c
drivers/video/fbdev/broadsheetfb.c
drivers/video/fbdev/bw2.c
drivers/video/fbdev/cg14.c
drivers/video/fbdev/controlfb.c
drivers/video/fbdev/core/bitblit.c
drivers/video/fbdev/core/fbmem.c
drivers/video/fbdev/core/modedb.c
drivers/video/fbdev/g364fb.c
drivers/video/fbdev/hgafb.c
drivers/video/fbdev/hpfb.c
drivers/video/fbdev/i810/i810_dvt.c
drivers/video/fbdev/imsttfb.c
drivers/video/fbdev/macfb.c
drivers/video/fbdev/matrox/matroxfb_maven.c
drivers/video/fbdev/maxinefb.c
drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td043mtea1.c
drivers/video/fbdev/p9100.c
drivers/video/fbdev/platinumfb.c
drivers/video/fbdev/sa1100fb.c
drivers/video/fbdev/ssd1307fb.c
drivers/video/fbdev/stifb.c
drivers/video/fbdev/udlfb.c
drivers/video/fbdev/valkyriefb.c
drivers/video/fbdev/vfb.c
drivers/virt/acrn/ioreq.c
drivers/virt/coco/sev-guest/Kconfig
drivers/xen/pvcalls-back.c
fs/9p/vfs_file.c
fs/Kconfig
fs/Makefile
fs/adfs/file.c
fs/affs/file.c
fs/afs/dir.c
fs/afs/file.c
fs/afs/vl_probe.c
fs/afs/write.c
fs/aio.c
fs/autofs/root.c
fs/bfs/file.c
fs/btrfs/async-thread.c
fs/btrfs/async-thread.h
fs/btrfs/backref.c
fs/btrfs/backref.h
fs/btrfs/bio.c
fs/btrfs/bio.h
fs/btrfs/block-group.c
fs/btrfs/block-group.h
fs/btrfs/block-rsv.c
fs/btrfs/block-rsv.h
fs/btrfs/btrfs_inode.h
fs/btrfs/check-integrity.c
fs/btrfs/compression.c
fs/btrfs/compression.h
fs/btrfs/ctree.c
fs/btrfs/ctree.h
fs/btrfs/defrag.c
fs/btrfs/delayed-ref.c
fs/btrfs/delayed-ref.h
fs/btrfs/dev-replace.c
fs/btrfs/discard.c
fs/btrfs/discard.h
fs/btrfs/disk-io.c
fs/btrfs/disk-io.h
fs/btrfs/extent-io-tree.c
fs/btrfs/extent-io-tree.h
fs/btrfs/extent-tree.c
fs/btrfs/extent-tree.h
fs/btrfs/extent_io.c
fs/btrfs/extent_io.h
fs/btrfs/extent_map.c
fs/btrfs/extent_map.h
fs/btrfs/file-item.c
fs/btrfs/file-item.h
fs/btrfs/file.c
fs/btrfs/free-space-cache.c
fs/btrfs/free-space-cache.h
fs/btrfs/free-space-tree.c
fs/btrfs/free-space-tree.h
fs/btrfs/fs.h
fs/btrfs/inode-item.h
fs/btrfs/inode.c
fs/btrfs/ioctl.c
fs/btrfs/locking.c
fs/btrfs/lzo.c
fs/btrfs/messages.c
fs/btrfs/messages.h
fs/btrfs/misc.h
fs/btrfs/ordered-data.c
fs/btrfs/ordered-data.h
fs/btrfs/print-tree.c
fs/btrfs/print-tree.h
fs/btrfs/qgroup.c
fs/btrfs/raid56.c
fs/btrfs/raid56.h
fs/btrfs/relocation.c
fs/btrfs/relocation.h
fs/btrfs/scrub.c
fs/btrfs/send.c
fs/btrfs/subpage.c
fs/btrfs/subpage.h
fs/btrfs/super.c
fs/btrfs/tests/extent-io-tests.c
fs/btrfs/transaction.c
fs/btrfs/transaction.h
fs/btrfs/tree-checker.c
fs/btrfs/tree-checker.h
fs/btrfs/tree-log.c
fs/btrfs/tree-log.h
fs/btrfs/tree-mod-log.c
fs/btrfs/volumes.c
fs/btrfs/volumes.h
fs/btrfs/zlib.c
fs/btrfs/zoned.c
fs/btrfs/zoned.h
fs/btrfs/zstd.c
fs/buffer.c
fs/cachefiles/namei.c
fs/ceph/caps.c
fs/ceph/file.c
fs/ceph/mds_client.c
fs/ceph/snap.c
fs/char_dev.c
fs/coda/file.c
fs/coredump.c
fs/cramfs/inode.c
fs/crypto/fscrypt_private.h
fs/crypto/hooks.c
fs/d_path.c
fs/direct-io.c
fs/ecryptfs/file.c
fs/erofs/Kconfig
fs/erofs/Makefile
fs/erofs/compress.h
fs/erofs/data.c
fs/erofs/decompressor.c
fs/erofs/internal.h
fs/erofs/super.c
fs/erofs/utils.c
fs/erofs/xattr.c
fs/erofs/zdata.c
fs/erofs/zmap.c
fs/eventfd.c
fs/eventpoll.c
fs/exfat/file.c
fs/ext2/file.c
fs/ext4/balloc.c
fs/ext4/ext4.h
fs/ext4/extents_status.c
fs/ext4/file.c
fs/ext4/fsync.c
fs/ext4/hash.c
fs/ext4/ialloc.c
fs/ext4/inline.c
fs/ext4/inode.c
fs/ext4/ioctl.c
fs/ext4/mballoc.c
fs/ext4/migrate.c
fs/ext4/mmp.c
fs/ext4/namei.c
fs/ext4/super.c
fs/ext4/xattr.c
fs/f2fs/file.c
fs/f2fs/namei.c
fs/f2fs/super.c
fs/fat/file.c
fs/file_table.c
fs/fs_context.c
fs/fuse/file.c
fs/gfs2/file.c
fs/gfs2/ops_fstype.c
fs/gfs2/super.c
fs/hfs/inode.c
fs/hfsplus/inode.c
fs/hostfs/hostfs.h
fs/hostfs/hostfs_kern.c
fs/hostfs/hostfs_user.c
fs/hpfs/file.c
fs/inode.c
fs/internal.h
fs/iomap/buffered-io.c
fs/iomap/direct-io.c
fs/jffs2/build.c
fs/jffs2/file.c
fs/jffs2/xattr.c
fs/jffs2/xattr.h
fs/jfs/file.c
fs/jfs/jfs_logmgr.c
fs/jfs/namei.c
fs/kernfs/file.c
fs/lockd/svc.c
fs/minix/file.c
fs/namei.c
fs/namespace.c
fs/nfs/blocklayout/dev.c
fs/nfs/dir.c
fs/nfs/file.c
fs/nfs/internal.h
fs/nfs/nfs4file.c
fs/nfs/nfs4proc.c
fs/nfsd/cache.h
fs/nfsd/export.c
fs/nfsd/nfs3proc.c
fs/nfsd/nfs3xdr.c
fs/nfsd/nfs4xdr.c
fs/nfsd/nfscache.c
fs/nfsd/nfsctl.c
fs/nfsd/nfsfh.c
fs/nfsd/nfsproc.c
fs/nfsd/nfssvc.c
fs/nfsd/nfsxdr.c
fs/nfsd/trace.h
fs/nfsd/vfs.c
fs/nfsd/vfs.h
fs/nilfs2/btnode.c
fs/nilfs2/file.c
fs/nilfs2/inode.c
fs/nilfs2/page.c
fs/nilfs2/segbuf.c
fs/nilfs2/segment.c
fs/nilfs2/sufile.c
fs/nilfs2/super.c
fs/nilfs2/the_nilfs.c
fs/no-block.c [deleted file]
fs/notify/inotify/inotify_fsnotify.c
fs/ntfs/attrib.c
fs/ntfs/compress.c
fs/ntfs/file.c
fs/ntfs/mft.c
fs/ntfs/super.c
fs/ntfs3/file.c
fs/ocfs2/cluster/heartbeat.c
fs/ocfs2/file.c
fs/ocfs2/ocfs2_trace.h
fs/ocfs2/super.c
fs/omfs/file.c
fs/open.c
fs/orangefs/file.c
fs/overlayfs/file.c
fs/overlayfs/overlayfs.h
fs/pipe.c
fs/pnode.c
fs/pnode.h
fs/proc/inode.c
fs/proc/meminfo.c
fs/proc/proc_sysctl.c
fs/proc_namespace.c
fs/pstore/blk.c
fs/ramfs/file-mmu.c
fs/ramfs/file-nommu.c
fs/read_write.c
fs/readdir.c
fs/reiserfs/file.c
fs/reiserfs/journal.c
fs/reiserfs/reiserfs.h
fs/remap_range.c
fs/romfs/mmap-nommu.c
fs/smb/Kconfig [new file with mode: 0644]
fs/smb/Makefile [new file with mode: 0644]
fs/smb/client/Kconfig [moved from fs/cifs/Kconfig with 100% similarity]
fs/smb/client/Makefile [moved from fs/cifs/Makefile with 100% similarity]
fs/smb/client/asn1.c [moved from fs/cifs/asn1.c with 100% similarity]
fs/smb/client/cached_dir.c [moved from fs/cifs/cached_dir.c with 100% similarity]
fs/smb/client/cached_dir.h [moved from fs/cifs/cached_dir.h with 100% similarity]
fs/smb/client/cifs_debug.c [moved from fs/cifs/cifs_debug.c with 95% similarity]
fs/smb/client/cifs_debug.h [moved from fs/cifs/cifs_debug.h with 100% similarity]
fs/smb/client/cifs_dfs_ref.c [moved from fs/cifs/cifs_dfs_ref.c with 100% similarity]
fs/smb/client/cifs_fs_sb.h [moved from fs/cifs/cifs_fs_sb.h with 100% similarity]
fs/smb/client/cifs_ioctl.h [moved from fs/cifs/cifs_ioctl.h with 100% similarity]
fs/smb/client/cifs_spnego.c [moved from fs/cifs/cifs_spnego.c with 100% similarity]
fs/smb/client/cifs_spnego.h [moved from fs/cifs/cifs_spnego.h with 100% similarity]
fs/smb/client/cifs_spnego_negtokeninit.asn1 [moved from fs/cifs/cifs_spnego_negtokeninit.asn1 with 100% similarity]
fs/smb/client/cifs_swn.c [moved from fs/cifs/cifs_swn.c with 100% similarity]
fs/smb/client/cifs_swn.h [moved from fs/cifs/cifs_swn.h with 100% similarity]
fs/smb/client/cifs_unicode.c [moved from fs/cifs/cifs_unicode.c with 100% similarity]
fs/smb/client/cifs_unicode.h [moved from fs/cifs/cifs_unicode.h with 100% similarity]
fs/smb/client/cifs_uniupr.h [moved from fs/cifs/cifs_uniupr.h with 100% similarity]
fs/smb/client/cifsacl.c [moved from fs/cifs/cifsacl.c with 100% similarity]
fs/smb/client/cifsacl.h [moved from fs/cifs/cifsacl.h with 100% similarity]
fs/smb/client/cifsencrypt.c [moved from fs/cifs/cifsencrypt.c with 99% similarity]
fs/smb/client/cifsfs.c [moved from fs/cifs/cifsfs.c with 98% similarity]
fs/smb/client/cifsfs.h [moved from fs/cifs/cifsfs.h with 98% similarity]
fs/smb/client/cifsglob.h [moved from fs/cifs/cifsglob.h with 98% similarity]
fs/smb/client/cifspdu.h [moved from fs/cifs/cifspdu.h with 99% similarity]
fs/smb/client/cifsproto.h [moved from fs/cifs/cifsproto.h with 99% similarity]
fs/smb/client/cifsroot.c [moved from fs/cifs/cifsroot.c with 100% similarity]
fs/smb/client/cifssmb.c [moved from fs/cifs/cifssmb.c with 100% similarity]
fs/smb/client/connect.c [moved from fs/cifs/connect.c with 98% similarity]
fs/smb/client/dfs.c [moved from fs/cifs/dfs.c with 99% similarity]
fs/smb/client/dfs.h [moved from fs/cifs/dfs.h with 100% similarity]
fs/smb/client/dfs_cache.c [moved from fs/cifs/dfs_cache.c with 100% similarity]
fs/smb/client/dfs_cache.h [moved from fs/cifs/dfs_cache.h with 100% similarity]
fs/smb/client/dir.c [moved from fs/cifs/dir.c with 100% similarity]
fs/smb/client/dns_resolve.c [moved from fs/cifs/dns_resolve.c with 100% similarity]
fs/smb/client/dns_resolve.h [moved from fs/cifs/dns_resolve.h with 100% similarity]
fs/smb/client/export.c [moved from fs/cifs/export.c with 100% similarity]
fs/smb/client/file.c [moved from fs/cifs/file.c with 99% similarity]
fs/smb/client/fs_context.c [moved from fs/cifs/fs_context.c with 99% similarity]
fs/smb/client/fs_context.h [moved from fs/cifs/fs_context.h with 100% similarity]
fs/smb/client/fscache.c [moved from fs/cifs/fscache.c with 100% similarity]
fs/smb/client/fscache.h [moved from fs/cifs/fscache.h with 100% similarity]
fs/smb/client/inode.c [moved from fs/cifs/inode.c with 100% similarity]
fs/smb/client/ioctl.c [moved from fs/cifs/ioctl.c with 98% similarity]
fs/smb/client/link.c [moved from fs/cifs/link.c with 100% similarity]
fs/smb/client/misc.c [moved from fs/cifs/misc.c with 100% similarity]
fs/smb/client/netlink.c [moved from fs/cifs/netlink.c with 100% similarity]
fs/smb/client/netlink.h [moved from fs/cifs/netlink.h with 100% similarity]
fs/smb/client/netmisc.c [moved from fs/cifs/netmisc.c with 100% similarity]
fs/smb/client/nterr.c [moved from fs/cifs/nterr.c with 100% similarity]
fs/smb/client/nterr.h [moved from fs/cifs/nterr.h with 100% similarity]
fs/smb/client/ntlmssp.h [moved from fs/cifs/ntlmssp.h with 100% similarity]
fs/smb/client/readdir.c [moved from fs/cifs/readdir.c with 100% similarity]
fs/smb/client/rfc1002pdu.h [moved from fs/cifs/rfc1002pdu.h with 100% similarity]
fs/smb/client/sess.c [moved from fs/cifs/sess.c with 100% similarity]
fs/smb/client/smb1ops.c [moved from fs/cifs/smb1ops.c with 99% similarity]
fs/smb/client/smb2file.c [moved from fs/cifs/smb2file.c with 100% similarity]
fs/smb/client/smb2glob.h [moved from fs/cifs/smb2glob.h with 100% similarity]
fs/smb/client/smb2inode.c [moved from fs/cifs/smb2inode.c with 100% similarity]
fs/smb/client/smb2maperror.c [moved from fs/cifs/smb2maperror.c with 100% similarity]
fs/smb/client/smb2misc.c [moved from fs/cifs/smb2misc.c with 100% similarity]
fs/smb/client/smb2ops.c [moved from fs/cifs/smb2ops.c with 99% similarity]
fs/smb/client/smb2pdu.c [moved from fs/cifs/smb2pdu.c with 99% similarity]
fs/smb/client/smb2pdu.h [moved from fs/cifs/smb2pdu.h with 100% similarity]
fs/smb/client/smb2proto.h [moved from fs/cifs/smb2proto.h with 100% similarity]
fs/smb/client/smb2status.h [moved from fs/cifs/smb2status.h with 100% similarity]
fs/smb/client/smb2transport.c [moved from fs/cifs/smb2transport.c with 100% similarity]
fs/smb/client/smbdirect.c [moved from fs/cifs/smbdirect.c with 100% similarity]
fs/smb/client/smbdirect.h [moved from fs/cifs/smbdirect.h with 100% similarity]
fs/smb/client/smbencrypt.c [moved from fs/cifs/smbencrypt.c with 98% similarity]
fs/smb/client/smberr.h [moved from fs/cifs/smberr.h with 100% similarity]
fs/smb/client/trace.c [moved from fs/cifs/trace.c with 100% similarity]
fs/smb/client/trace.h [moved from fs/cifs/trace.h with 100% similarity]
fs/smb/client/transport.c [moved from fs/cifs/transport.c with 99% similarity]
fs/smb/client/unc.c [moved from fs/cifs/unc.c with 100% similarity]
fs/smb/client/winucase.c [moved from fs/cifs/winucase.c with 100% similarity]
fs/smb/client/xattr.c [moved from fs/cifs/xattr.c with 100% similarity]
fs/smb/common/Makefile [moved from fs/smbfs_common/Makefile with 59% similarity]
fs/smb/common/arc4.h [moved from fs/smbfs_common/arc4.h with 100% similarity]
fs/smb/common/cifs_arc4.c [moved from fs/smbfs_common/cifs_arc4.c with 100% similarity]
fs/smb/common/cifs_md4.c [moved from fs/smbfs_common/cifs_md4.c with 100% similarity]
fs/smb/common/md4.h [moved from fs/smbfs_common/md4.h with 100% similarity]
fs/smb/common/smb2pdu.h [moved from fs/smbfs_common/smb2pdu.h with 100% similarity]
fs/smb/common/smbfsctl.h [moved from fs/smbfs_common/smbfsctl.h with 100% similarity]
fs/smb/server/Kconfig [moved from fs/ksmbd/Kconfig with 100% similarity]
fs/smb/server/Makefile [moved from fs/ksmbd/Makefile with 100% similarity]
fs/smb/server/asn1.c [moved from fs/ksmbd/asn1.c with 100% similarity]
fs/smb/server/asn1.h [moved from fs/ksmbd/asn1.h with 100% similarity]
fs/smb/server/auth.c [moved from fs/ksmbd/auth.c with 99% similarity]
fs/smb/server/auth.h [moved from fs/ksmbd/auth.h with 100% similarity]
fs/smb/server/connection.c [moved from fs/ksmbd/connection.c with 96% similarity]
fs/smb/server/connection.h [moved from fs/ksmbd/connection.h with 100% similarity]
fs/smb/server/crypto_ctx.c [moved from fs/ksmbd/crypto_ctx.c with 100% similarity]
fs/smb/server/crypto_ctx.h [moved from fs/ksmbd/crypto_ctx.h with 100% similarity]
fs/smb/server/glob.h [moved from fs/ksmbd/glob.h with 100% similarity]
fs/smb/server/ksmbd_netlink.h [moved from fs/ksmbd/ksmbd_netlink.h with 100% similarity]
fs/smb/server/ksmbd_spnego_negtokeninit.asn1 [moved from fs/ksmbd/ksmbd_spnego_negtokeninit.asn1 with 100% similarity]
fs/smb/server/ksmbd_spnego_negtokentarg.asn1 [moved from fs/ksmbd/ksmbd_spnego_negtokentarg.asn1 with 100% similarity]
fs/smb/server/ksmbd_work.c [moved from fs/ksmbd/ksmbd_work.c with 100% similarity]
fs/smb/server/ksmbd_work.h [moved from fs/ksmbd/ksmbd_work.h with 100% similarity]
fs/smb/server/mgmt/ksmbd_ida.c [moved from fs/ksmbd/mgmt/ksmbd_ida.c with 100% similarity]
fs/smb/server/mgmt/ksmbd_ida.h [moved from fs/ksmbd/mgmt/ksmbd_ida.h with 100% similarity]
fs/smb/server/mgmt/share_config.c [moved from fs/ksmbd/mgmt/share_config.c with 100% similarity]
fs/smb/server/mgmt/share_config.h [moved from fs/ksmbd/mgmt/share_config.h with 100% similarity]
fs/smb/server/mgmt/tree_connect.c [moved from fs/ksmbd/mgmt/tree_connect.c with 100% similarity]
fs/smb/server/mgmt/tree_connect.h [moved from fs/ksmbd/mgmt/tree_connect.h with 100% similarity]
fs/smb/server/mgmt/user_config.c [moved from fs/ksmbd/mgmt/user_config.c with 100% similarity]
fs/smb/server/mgmt/user_config.h [moved from fs/ksmbd/mgmt/user_config.h with 100% similarity]
fs/smb/server/mgmt/user_session.c [moved from fs/ksmbd/mgmt/user_session.c with 100% similarity]
fs/smb/server/mgmt/user_session.h [moved from fs/ksmbd/mgmt/user_session.h with 100% similarity]
fs/smb/server/misc.c [moved from fs/ksmbd/misc.c with 100% similarity]
fs/smb/server/misc.h [moved from fs/ksmbd/misc.h with 100% similarity]
fs/smb/server/ndr.c [moved from fs/ksmbd/ndr.c with 100% similarity]
fs/smb/server/ndr.h [moved from fs/ksmbd/ndr.h with 100% similarity]
fs/smb/server/nterr.h [moved from fs/ksmbd/nterr.h with 100% similarity]
fs/smb/server/ntlmssp.h [moved from fs/ksmbd/ntlmssp.h with 100% similarity]
fs/smb/server/oplock.c [moved from fs/ksmbd/oplock.c with 95% similarity]
fs/smb/server/oplock.h [moved from fs/ksmbd/oplock.h with 99% similarity]
fs/smb/server/server.c [moved from fs/ksmbd/server.c with 96% similarity]
fs/smb/server/server.h [moved from fs/ksmbd/server.h with 100% similarity]
fs/smb/server/smb2misc.c [moved from fs/ksmbd/smb2misc.c with 93% similarity]
fs/smb/server/smb2ops.c [moved from fs/ksmbd/smb2ops.c with 100% similarity]
fs/smb/server/smb2pdu.c [moved from fs/ksmbd/smb2pdu.c with 98% similarity]
fs/smb/server/smb2pdu.h [moved from fs/ksmbd/smb2pdu.h with 100% similarity]
fs/smb/server/smb_common.c [moved from fs/ksmbd/smb_common.c with 98% similarity]
fs/smb/server/smb_common.h [moved from fs/ksmbd/smb_common.h with 99% similarity]
fs/smb/server/smbacl.c [moved from fs/ksmbd/smbacl.c with 99% similarity]
fs/smb/server/smbacl.h [moved from fs/ksmbd/smbacl.h with 100% similarity]
fs/smb/server/smbfsctl.h [moved from fs/ksmbd/smbfsctl.h with 98% similarity]
fs/smb/server/smbstatus.h [moved from fs/ksmbd/smbstatus.h with 99% similarity]
fs/smb/server/transport_ipc.c [moved from fs/ksmbd/transport_ipc.c with 100% similarity]
fs/smb/server/transport_ipc.h [moved from fs/ksmbd/transport_ipc.h with 100% similarity]
fs/smb/server/transport_rdma.c [moved from fs/ksmbd/transport_rdma.c with 100% similarity]
fs/smb/server/transport_rdma.h [moved from fs/ksmbd/transport_rdma.h with 100% similarity]
fs/smb/server/transport_tcp.c [moved from fs/ksmbd/transport_tcp.c with 100% similarity]
fs/smb/server/transport_tcp.h [moved from fs/ksmbd/transport_tcp.h with 100% similarity]
fs/smb/server/unicode.c [moved from fs/ksmbd/unicode.c with 100% similarity]
fs/smb/server/unicode.h [moved from fs/ksmbd/unicode.h with 100% similarity]
fs/smb/server/uniupr.h [moved from fs/ksmbd/uniupr.h with 100% similarity]
fs/smb/server/vfs.c [moved from fs/ksmbd/vfs.c with 95% similarity]
fs/smb/server/vfs.h [moved from fs/ksmbd/vfs.h with 94% similarity]
fs/smb/server/vfs_cache.c [moved from fs/ksmbd/vfs_cache.c with 99% similarity]
fs/smb/server/vfs_cache.h [moved from fs/ksmbd/vfs_cache.h with 100% similarity]
fs/smb/server/xattr.h [moved from fs/ksmbd/xattr.h with 100% similarity]
fs/splice.c
fs/statfs.c
fs/super.c
fs/sysv/dir.c
fs/sysv/file.c
fs/sysv/itree.c
fs/sysv/namei.c
fs/ubifs/file.c
fs/udf/file.c
fs/udf/namei.c
fs/ufs/file.c
fs/userfaultfd.c
fs/vboxsf/file.c
fs/verity/Kconfig
fs/verity/enable.c
fs/verity/fsverity_private.h
fs/verity/hash_algs.c
fs/verity/measure.c
fs/verity/open.c
fs/verity/read_metadata.c
fs/verity/signature.c
fs/verity/verify.c
fs/xattr.c
fs/xfs/libxfs/xfs_ag.c
fs/xfs/libxfs/xfs_alloc.c
fs/xfs/libxfs/xfs_alloc.h
fs/xfs/libxfs/xfs_bmap.c
fs/xfs/libxfs/xfs_bmap_btree.c
fs/xfs/libxfs/xfs_ialloc.c
fs/xfs/libxfs/xfs_log_format.h
fs/xfs/libxfs/xfs_refcount.c
fs/xfs/libxfs/xfs_trans_inode.c
fs/xfs/scrub/bmap.c
fs/xfs/scrub/common.c
fs/xfs/scrub/common.h
fs/xfs/scrub/fscounters.c
fs/xfs/scrub/scrub.c
fs/xfs/scrub/scrub.h
fs/xfs/scrub/trace.h
fs/xfs/xfs_bmap_util.c
fs/xfs/xfs_buf_item.c
fs/xfs/xfs_file.c
fs/xfs/xfs_filestream.c
fs/xfs/xfs_fsops.c
fs/xfs/xfs_icache.c
fs/xfs/xfs_icache.h
fs/xfs/xfs_inode.c
fs/xfs/xfs_inode.h
fs/xfs/xfs_inode_item.c
fs/xfs/xfs_inode_item.h
fs/xfs/xfs_iomap.c
fs/xfs/xfs_log_recover.c
fs/xfs/xfs_mount.h
fs/xfs/xfs_reflink.c
fs/xfs/xfs_super.c
fs/xfs/xfs_trace.h
fs/xfs/xfs_trans.c
fs/zonefs/file.c
fs/zonefs/super.c
fs/zonefs/zonefs.h
include/acpi/acpi_bus.h
include/acpi/acpixf.h
include/acpi/actbl.h
include/asm-generic/atomic.h
include/asm-generic/bitops/atomic.h
include/asm-generic/bitops/lock.h
include/asm-generic/bugs.h [deleted file]
include/asm-generic/percpu.h
include/asm-generic/vmlinux.lds.h
include/clocksource/hyperv_timer.h
include/crypto/b128ops.h
include/drm/display/drm_dp.h
include/drm/display/drm_dp_helper.h
include/drm/drm_managed.h
include/dt-bindings/power/qcom-rpmpd.h
include/kunit/resource.h
include/kunit/test.h
include/linux/acpi.h
include/linux/acpi_agdi.h [deleted file]
include/linux/acpi_apmt.h [deleted file]
include/linux/acpi_iort.h
include/linux/amd-pstate.h
include/linux/arm_ffa.h
include/linux/atomic/atomic-arch-fallback.h
include/linux/atomic/atomic-instrumented.h
include/linux/atomic/atomic-long.h
include/linux/audit.h
include/linux/audit_arch.h
include/linux/bio.h
include/linux/blk-mq.h
include/linux/blk_types.h
include/linux/blkdev.h
include/linux/blktrace_api.h
include/linux/bsg.h
include/linux/cdrom.h
include/linux/cgroup.h
include/linux/compiler.h
include/linux/compiler_attributes.h
include/linux/context_tracking.h
include/linux/context_tracking_state.h
include/linux/cper.h
include/linux/cpu.h
include/linux/cpufreq.h
include/linux/cpuhotplug.h
include/linux/cpumask.h
include/linux/cpuset.h
include/linux/devfreq.h
include/linux/device-mapper.h
include/linux/device/class.h
include/linux/device/driver.h
include/linux/dim.h
include/linux/dmar.h
include/linux/efi.h
include/linux/err.h
include/linux/eventfd.h
include/linux/firewire.h
include/linux/fs.h
include/linux/fsnotify.h
include/linux/fsverity.h
include/linux/gpio/driver.h
include/linux/highmem.h
include/linux/if_team.h
include/linux/if_vlan.h
include/linux/iio/iio-gts-helper.h
include/linux/intel_rapl.h
include/linux/io.h
include/linux/io_uring.h
include/linux/io_uring_types.h
include/linux/irq.h
include/linux/irqchip/mmp.h [deleted file]
include/linux/irqchip/mxs.h [deleted file]
include/linux/irqdesc.h
include/linux/iscsi_ibft.h
include/linux/jump_label.h
include/linux/kthread.h
include/linux/libata.h
include/linux/lockdep.h
include/linux/lockdep_types.h
include/linux/math64.h
include/linux/mlx5/driver.h
include/linux/mlx5/mlx5_ifc.h
include/linux/mm.h
include/linux/mmzone.h
include/linux/mount.h
include/linux/msi.h
include/linux/mtd/blktrans.h
include/linux/netdevice.h
include/linux/notifier.h
include/linux/nubus.h
include/linux/nvme-fc-driver.h
include/linux/olpc-ec.h
include/linux/page-flags.h
include/linux/pci_ids.h
include/linux/pe.h
include/linux/percpu-defs.h
include/linux/perf/arm_pmu.h
include/linux/perf_event.h
include/linux/phy.h
include/linux/pipe_fs_i.h
include/linux/pktcdvd.h
include/linux/power/bq27xxx_battery.h
include/linux/proc_fs.h
include/linux/rbtree_latch.h
include/linux/rcupdate.h
include/linux/regulator/pca9450.h
include/linux/root_dev.h
include/linux/sched.h
include/linux/sched/clock.h
include/linux/sched/sd_flags.h
include/linux/sched/signal.h
include/linux/sched/task.h
include/linux/sched/topology.h
include/linux/sched/vhost_task.h
include/linux/seqlock.h
include/linux/shrinker.h
include/linux/skbuff.h
include/linux/skmsg.h
include/linux/slub_def.h
include/linux/soc/qcom/llcc-qcom.h
include/linux/splice.h
include/linux/srcu.h
include/linux/sunrpc/svc.h
include/linux/sunrpc/svc_rdma.h
include/linux/sunrpc/svc_xprt.h
include/linux/sunrpc/svcsock.h
include/linux/sunrpc/xdr.h
include/linux/surface_aggregator/device.h
include/linux/suspend.h
include/linux/syscalls.h
include/linux/time_namespace.h
include/linux/tpm.h
include/linux/trace_events.h
include/linux/types.h
include/linux/uio.h
include/linux/usb/composite.h
include/linux/usb/hcd.h
include/linux/user_events.h
include/linux/watch_queue.h
include/linux/workqueue.h
include/media/dvb_net.h
include/media/dvbdev.h
include/media/v4l2-subdev.h
include/net/bluetooth/hci.h
include/net/bluetooth/hci_core.h
include/net/bonding.h
include/net/dsa.h
include/net/handshake.h
include/net/ip.h
include/net/mana/mana.h
include/net/neighbour.h
include/net/netfilter/nf_flow_table.h
include/net/netfilter/nf_tables.h
include/net/netns/ipv6.h
include/net/nexthop.h
include/net/page_pool.h
include/net/ping.h
include/net/pkt_sched.h
include/net/rpl.h
include/net/sch_generic.h
include/net/sock.h
include/net/tcp.h
include/net/tls.h
include/net/xfrm.h
include/rdma/ib_addr.h
include/scsi/scsi_ioctl.h
include/soc/imx/timer.h [deleted file]
include/sound/hda-mlink.h
include/sound/soc-acpi.h
include/sound/soc-dpcm.h
include/target/iscsi/iscsi_target_core.h
include/trace/events/block.h
include/trace/events/btrfs.h
include/trace/events/csd.h [new file with mode: 0644]
include/trace/events/rpcrdma.h
include/trace/events/sunrpc.h
include/trace/events/timer.h
include/trace/events/writeback.h
include/uapi/linux/affs_hardblocks.h
include/uapi/linux/bpf.h
include/uapi/linux/ethtool_netlink.h
include/uapi/linux/eventfd.h [new file with mode: 0644]
include/uapi/linux/handshake.h
include/uapi/linux/in.h
include/uapi/linux/io_uring.h
include/uapi/linux/mount.h
include/uapi/linux/pktcdvd.h
include/uapi/linux/types.h
include/uapi/linux/ublk_cmd.h
include/uapi/linux/vfio.h
include/uapi/sound/skl-tplg-interface.h
include/uapi/sound/sof/tokens.h
include/ufs/ufshcd.h
include/xen/events.h
include/xen/xen.h
init/do_mounts.c
init/do_mounts.h
init/do_mounts_initrd.c
init/main.c
io_uring/cancel.c
io_uring/epoll.c
io_uring/filetable.c
io_uring/filetable.h
io_uring/io-wq.c
io_uring/io_uring.c
io_uring/io_uring.h
io_uring/msg_ring.c
io_uring/net.c
io_uring/poll.c
io_uring/poll.h
io_uring/rsrc.c
io_uring/rw.c
io_uring/rw.h
io_uring/sqpoll.c
io_uring/tctx.c
io_uring/timeout.c
io_uring/uring_cmd.c
kernel/audit.h
kernel/bpf/btf.c
kernel/bpf/hashtab.c
kernel/bpf/map_in_map.c
kernel/bpf/offload.c
kernel/bpf/syscall.c
kernel/bpf/verifier.c
kernel/cgroup/cgroup-internal.h
kernel/cgroup/cgroup-v1.c
kernel/cgroup/cgroup.c
kernel/cgroup/cpuset.c
kernel/cgroup/legacy_freezer.c
kernel/cgroup/misc.c
kernel/cgroup/rdma.c
kernel/context_tracking.c
kernel/cpu.c
kernel/events/core.c
kernel/exit.c
kernel/fork.c
kernel/irq/chip.c
kernel/irq/debugfs.c
kernel/irq/internals.h
kernel/irq/irqdesc.c
kernel/irq/irqdomain.c
kernel/irq/msi.c
kernel/irq/resend.c
kernel/kexec_file.c
kernel/kthread.c
kernel/locking/lockdep.c
kernel/locking/locktorture.c
kernel/locking/rwsem.c
kernel/module/decompress.c
kernel/module/main.c
kernel/module/stats.c
kernel/power/hibernate.c
kernel/power/main.c
kernel/power/power.h
kernel/power/snapshot.c
kernel/power/swap.c
kernel/printk/printk.c
kernel/rcu/Kconfig
kernel/rcu/rcu.h
kernel/rcu/rcuscale.c
kernel/rcu/tasks.h
kernel/rcu/tree.c
kernel/rcu/tree_exp.h
kernel/rcu/tree_nocb.h
kernel/rcu/tree_plugin.h
kernel/sched/clock.c
kernel/sched/core.c
kernel/sched/cpufreq_schedutil.c
kernel/sched/deadline.c
kernel/sched/debug.c
kernel/sched/fair.c
kernel/sched/psi.c
kernel/sched/sched.h
kernel/sched/topology.c
kernel/sched/wait.c
kernel/signal.c
kernel/smp.c
kernel/smpboot.c
kernel/softirq.c
kernel/time/alarmtimer.c
kernel/time/hrtimer.c
kernel/time/posix-timers.c
kernel/time/sched_clock.c
kernel/time/tick-broadcast.c
kernel/time/tick-common.c
kernel/time/tick-sched.c
kernel/time/timekeeping.c
kernel/trace/bpf_trace.c
kernel/trace/fprobe.c
kernel/trace/rethook.c
kernel/trace/trace.c
kernel/trace/trace_events.c
kernel/trace/trace_events_hist.c
kernel/trace/trace_events_user.c
kernel/trace/trace_osnoise.c
kernel/trace/trace_output.c
kernel/trace/trace_probe.h
kernel/trace/trace_selftest.c
kernel/vhost_task.c
kernel/watch_queue.c
kernel/workqueue.c
kernel/workqueue_internal.h
lib/Kconfig.debug
lib/Makefile
lib/checksum_kunit.c [new file with mode: 0644]
lib/cpu_rmap.c
lib/crypto/curve25519-hacl64.c
lib/crypto/poly1305-donna64.c
lib/debugobjects.c
lib/dim/dim.c
lib/dim/net_dim.c
lib/dim/rdma_dim.c
lib/iov_iter.c
lib/kunit/executor_test.c
lib/kunit/kunit-example-test.c
lib/kunit/kunit-test.c
lib/kunit/resource.c
lib/kunit/test.c
lib/maple_tree.c
lib/radix-tree.c
lib/radix-tree.h [new file with mode: 0644]
lib/raid6/neon.h [new file with mode: 0644]
lib/raid6/neon.uc
lib/raid6/recov_neon.c
lib/raid6/recov_neon_inner.c
lib/test_firmware.c
lib/test_vmalloc.c
lib/xarray.c
mm/Kconfig.debug
mm/damon/core.c
mm/filemap.c
mm/gup.c
mm/gup_test.c
mm/internal.h
mm/kfence/kfence.h
mm/khugepaged.c
mm/memblock.c
mm/memfd.c
mm/mm_init.c
mm/mmap.c
mm/mprotect.c
mm/page_alloc.c
mm/page_io.c
mm/page_table_check.c
mm/shmem.c
mm/shrinker_debug.c
mm/slab.h
mm/slub.c
mm/swapfile.c
mm/vmalloc.c
mm/vmscan.c
mm/vmstat.c
mm/zsmalloc.c
mm/zswap.c
net/8021q/vlan_dev.c
net/atm/resources.c
net/batman-adv/distributed-arp-table.c
net/bluetooth/hci_conn.c
net/bluetooth/hci_core.c
net/bluetooth/hci_event.c
net/bluetooth/hci_sync.c
net/bluetooth/l2cap_core.c
net/bridge/br_forward.c
net/bridge/br_private_tunnel.h
net/can/isotp.c
net/can/j1939/main.c
net/can/j1939/socket.c
net/core/datagram.c
net/core/dev.c
net/core/page_pool.c
net/core/rtnetlink.c
net/core/skbuff.c
net/core/skmsg.c
net/core/sock.c
net/core/sock_map.c
net/core/stream.c
net/dccp/proto.c
net/devlink/core.c
net/devlink/devl_internal.h
net/devlink/leftover.c
net/dsa/dsa.c
net/handshake/handshake-test.c
net/handshake/netlink.c
net/handshake/tlshd.c
net/ieee802154/trace.h
net/ipv4/af_inet.c
net/ipv4/esp4_offload.c
net/ipv4/inet_connection_sock.c
net/ipv4/ip_sockglue.c
net/ipv4/raw.c
net/ipv4/sysctl_net_ipv4.c
net/ipv4/tcp.c
net/ipv4/tcp_bpf.c
net/ipv4/tcp_input.c
net/ipv4/tcp_ipv4.c
net/ipv4/tcp_offload.c
net/ipv4/tcp_timer.c
net/ipv4/udp.c
net/ipv4/udplite.c
net/ipv4/xfrm4_input.c
net/ipv6/esp6_offload.c
net/ipv6/exthdrs.c
net/ipv6/exthdrs_core.c
net/ipv6/ip6_fib.c
net/ipv6/ip6_gre.c
net/ipv6/ping.c
net/ipv6/raw.c
net/ipv6/route.c
net/ipv6/udplite.c
net/ipv6/xfrm6_input.c
net/key/af_key.c
net/llc/af_llc.c
net/mac80211/cfg.c
net/mac80211/chan.c
net/mac80211/he.c
net/mac80211/ieee80211_i.h
net/mac80211/link.c
net/mac80211/mlme.c
net/mac80211/rx.c
net/mac80211/trace.h
net/mac80211/tx.c
net/mac80211/util.c
net/mac802154/trace.h
net/mptcp/pm.c
net/mptcp/pm_netlink.c
net/mptcp/pm_userspace.c
net/mptcp/protocol.c
net/mptcp/protocol.h
net/mptcp/subflow.c
net/netfilter/core.c
net/netfilter/ipset/ip_set_core.c
net/netfilter/ipvs/ip_vs_xmit.c
net/netfilter/nf_conntrack_core.c
net/netfilter/nf_conntrack_netlink.c
net/netfilter/nf_conntrack_standalone.c
net/netfilter/nf_flow_table_core.c
net/netfilter/nf_flow_table_ip.c
net/netfilter/nf_tables_api.c
net/netfilter/nfnetlink.c
net/netfilter/nfnetlink_osf.c
net/netfilter/nft_bitwise.c
net/netfilter/nft_chain_filter.c
net/netfilter/nft_immediate.c
net/netfilter/nft_set_bitmap.c
net/netfilter/nft_set_hash.c
net/netfilter/nft_set_pipapo.c
net/netfilter/nft_set_rbtree.c
net/netfilter/xt_osf.c
net/netlabel/netlabel_kapi.c
net/netlink/af_netlink.c
net/netrom/nr_subr.c
net/nsh/nsh.c
net/openvswitch/datapath.c
net/openvswitch/vport.c
net/packet/af_packet.c
net/packet/diag.c
net/qrtr/ns.c
net/rxrpc/af_rxrpc.c
net/rxrpc/ar-internal.h
net/rxrpc/local_event.c
net/sched/act_ct.c
net/sched/act_pedit.c
net/sched/act_police.c
net/sched/cls_api.c
net/sched/cls_flower.c
net/sched/cls_u32.c
net/sched/sch_api.c
net/sched/sch_fq_pie.c
net/sched/sch_generic.c
net/sched/sch_ingress.c
net/sched/sch_mq.c
net/sched/sch_mqprio.c
net/sched/sch_netem.c
net/sched/sch_pie.c
net/sched/sch_red.c
net/sched/sch_sfq.c
net/sched/sch_taprio.c
net/sched/sch_teql.c
net/sctp/sm_sideeffect.c
net/sctp/sm_statefuns.c
net/sctp/transport.c
net/smc/af_smc.c
net/smc/smc_close.c
net/smc/smc_core.c
net/smc/smc_llc.c
net/smc/smc_rx.c
net/smc/smc_tx.c
net/socket.c
net/sunrpc/auth_gss/gss_krb5_crypto.c
net/sunrpc/sched.c
net/sunrpc/svc.c
net/sunrpc/svc_xprt.c
net/sunrpc/svcsock.c
net/sunrpc/xdr.c
net/sunrpc/xprtrdma/svc_rdma_backchannel.c
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
net/sunrpc/xprtrdma/svc_rdma_rw.c
net/sunrpc/xprtrdma/svc_rdma_sendto.c
net/sunrpc/xprtrdma/svc_rdma_transport.c
net/tipc/bearer.c
net/tipc/bearer.h
net/tipc/link.c
net/tipc/socket.c
net/tipc/udp_media.c
net/tls/tls.h
net/tls/tls_device.c
net/tls/tls_main.c
net/tls/tls_strp.c
net/tls/tls_sw.c
net/unix/af_unix.c
net/vmw_vsock/af_vsock.c
net/vmw_vsock/virtio_transport_common.c
net/wireless/core.c
net/wireless/nl80211.c
net/wireless/rdev-ops.h
net/wireless/reg.c
net/wireless/scan.c
net/wireless/util.c
net/xfrm/xfrm_device.c
net/xfrm/xfrm_input.c
net/xfrm/xfrm_policy.c
net/xfrm/xfrm_user.c
rust/alloc/README.md
rust/alloc/alloc.rs
rust/alloc/boxed.rs
rust/alloc/collections/mod.rs
rust/alloc/lib.rs
rust/alloc/raw_vec.rs
rust/alloc/slice.rs
rust/alloc/vec/drain.rs
rust/alloc/vec/drain_filter.rs
rust/alloc/vec/into_iter.rs
rust/alloc/vec/is_zero.rs
rust/alloc/vec/mod.rs
rust/alloc/vec/set_len_on_drop.rs
rust/alloc/vec/spec_extend.rs
rust/bindings/bindings_helper.h
rust/bindings/lib.rs
rust/helpers.c
rust/kernel/build_assert.rs
rust/kernel/error.rs
rust/kernel/init.rs
rust/kernel/init/macros.rs
rust/kernel/lib.rs
rust/kernel/std_vendor.rs
rust/kernel/str.rs
rust/kernel/sync/arc.rs
rust/kernel/task.rs
rust/kernel/types.rs
rust/macros/helpers.rs
rust/macros/pin_data.rs
rust/macros/quote.rs
rust/uapi/lib.rs
samples/bpf/hbm.c
scripts/Makefile.build
scripts/atomic/atomic-tbl.sh
scripts/atomic/atomics.tbl
scripts/atomic/fallbacks/acquire
scripts/atomic/fallbacks/add_negative
scripts/atomic/fallbacks/add_unless
scripts/atomic/fallbacks/andnot
scripts/atomic/fallbacks/cmpxchg [new file with mode: 0644]
scripts/atomic/fallbacks/dec
scripts/atomic/fallbacks/dec_and_test
scripts/atomic/fallbacks/dec_if_positive
scripts/atomic/fallbacks/dec_unless_positive
scripts/atomic/fallbacks/fence
scripts/atomic/fallbacks/fetch_add_unless
scripts/atomic/fallbacks/inc
scripts/atomic/fallbacks/inc_and_test
scripts/atomic/fallbacks/inc_not_zero
scripts/atomic/fallbacks/inc_unless_negative
scripts/atomic/fallbacks/read_acquire
scripts/atomic/fallbacks/release
scripts/atomic/fallbacks/set_release
scripts/atomic/fallbacks/sub_and_test
scripts/atomic/fallbacks/try_cmpxchg
scripts/atomic/fallbacks/xchg [new file with mode: 0644]
scripts/atomic/gen-atomic-fallback.sh
scripts/atomic/gen-atomic-instrumented.sh
scripts/atomic/gen-atomic-long.sh
scripts/atomic/kerneldoc/add [new file with mode: 0644]
scripts/atomic/kerneldoc/add_negative [new file with mode: 0644]
scripts/atomic/kerneldoc/add_unless [new file with mode: 0644]
scripts/atomic/kerneldoc/and [new file with mode: 0644]
scripts/atomic/kerneldoc/andnot [new file with mode: 0644]
scripts/atomic/kerneldoc/cmpxchg [new file with mode: 0644]
scripts/atomic/kerneldoc/dec [new file with mode: 0644]
scripts/atomic/kerneldoc/dec_and_test [new file with mode: 0644]
scripts/atomic/kerneldoc/dec_if_positive [new file with mode: 0644]
scripts/atomic/kerneldoc/dec_unless_positive [new file with mode: 0644]
scripts/atomic/kerneldoc/inc [new file with mode: 0644]
scripts/atomic/kerneldoc/inc_and_test [new file with mode: 0644]
scripts/atomic/kerneldoc/inc_not_zero [new file with mode: 0644]
scripts/atomic/kerneldoc/inc_unless_negative [new file with mode: 0644]
scripts/atomic/kerneldoc/or [new file with mode: 0644]
scripts/atomic/kerneldoc/read [new file with mode: 0644]
scripts/atomic/kerneldoc/set [new file with mode: 0644]
scripts/atomic/kerneldoc/sub [new file with mode: 0644]
scripts/atomic/kerneldoc/sub_and_test [new file with mode: 0644]
scripts/atomic/kerneldoc/try_cmpxchg [new file with mode: 0644]
scripts/atomic/kerneldoc/xchg [new file with mode: 0644]
scripts/atomic/kerneldoc/xor [new file with mode: 0644]
scripts/gdb/linux/constants.py.in
scripts/gfp-translate
scripts/kernel-doc
scripts/min-tool-version.sh
scripts/mod/modpost.c
scripts/orc_hash.sh [new file with mode: 0644]
security/integrity/ima/ima_api.c
security/landlock/Kconfig
sound/core/oss/pcm_plugin.h
sound/core/seq/oss/seq_oss_midi.c
sound/firewire/digi00x/digi00x-stream.c
sound/hda/hdac_device.c
sound/isa/gus/gus_pcm.c
sound/pci/cmipci.c
sound/pci/cs46xx/cs46xx_lib.c
sound/pci/hda/hda_codec.c
sound/pci/hda/hda_generic.c
sound/pci/hda/patch_ca0132.c
sound/pci/hda/patch_hdmi.c
sound/pci/hda/patch_realtek.c
sound/pci/ice1712/aureon.c
sound/pci/ice1712/ice1712.c
sound/pci/ice1712/ice1724.c
sound/pci/ymfpci/ymfpci_main.c
sound/soc/amd/ps/pci-ps.c
sound/soc/amd/ps/ps-pdm-dma.c
sound/soc/amd/yc/acp6x-mach.c
sound/soc/codecs/cs35l41-lib.c
sound/soc/codecs/cs35l56.c
sound/soc/codecs/lpass-tx-macro.c
sound/soc/codecs/max98363.c
sound/soc/codecs/nau8824.c
sound/soc/codecs/rt5682-i2c.c
sound/soc/codecs/rt5682.c
sound/soc/codecs/rt5682.h
sound/soc/codecs/ssm2602.c
sound/soc/codecs/wcd938x-sdw.c
sound/soc/codecs/wsa881x.c
sound/soc/codecs/wsa883x.c
sound/soc/dwc/dwc-i2s.c
sound/soc/fsl/fsl_micfil.c
sound/soc/fsl/fsl_sai.c
sound/soc/fsl/fsl_sai.h
sound/soc/generic/simple-card-utils.c
sound/soc/generic/simple-card.c
sound/soc/intel/avs/apl.c
sound/soc/intel/avs/avs.h
sound/soc/intel/avs/board_selection.c
sound/soc/intel/avs/control.c
sound/soc/intel/avs/dsp.c
sound/soc/intel/avs/messages.h
sound/soc/intel/avs/path.h
sound/soc/intel/avs/pcm.c
sound/soc/intel/avs/probes.c
sound/soc/intel/boards/sof_sdw.c
sound/soc/jz4740/jz4740-i2s.c
sound/soc/mediatek/mt8186/mt8186-afe-clk.c
sound/soc/mediatek/mt8186/mt8186-afe-clk.h
sound/soc/mediatek/mt8186/mt8186-afe-pcm.c
sound/soc/mediatek/mt8186/mt8186-audsys-clk.c
sound/soc/mediatek/mt8186/mt8186-audsys-clk.h
sound/soc/mediatek/mt8188/mt8188-afe-clk.c
sound/soc/mediatek/mt8188/mt8188-afe-clk.h
sound/soc/mediatek/mt8188/mt8188-afe-pcm.c
sound/soc/mediatek/mt8188/mt8188-audsys-clk.c
sound/soc/mediatek/mt8188/mt8188-audsys-clk.h
sound/soc/mediatek/mt8195/mt8195-afe-clk.c
sound/soc/mediatek/mt8195/mt8195-afe-clk.h
sound/soc/mediatek/mt8195/mt8195-afe-pcm.c
sound/soc/mediatek/mt8195/mt8195-audsys-clk.c
sound/soc/mediatek/mt8195/mt8195-audsys-clk.h
sound/soc/soc-pcm.c
sound/soc/sof/amd/acp-ipc.c
sound/soc/sof/debug.c
sound/soc/sof/intel/hda-mlink.c
sound/soc/sof/ipc3-topology.c
sound/soc/sof/ipc4-topology.c
sound/soc/sof/pcm.c
sound/soc/sof/pm.c
sound/soc/sof/sof-client-probes.c
sound/soc/sof/topology.c
sound/soc/tegra/tegra_pcm.c
sound/usb/format.c
sound/usb/pcm.c
sound/usb/quirks.c
tools/arch/arm64/include/uapi/asm/kvm.h
tools/arch/x86/include/asm/cpufeatures.h
tools/arch/x86/include/asm/disabled-features.h
tools/arch/x86/include/asm/msr-index.h
tools/arch/x86/include/asm/nops.h
tools/arch/x86/include/uapi/asm/kvm.h
tools/arch/x86/include/uapi/asm/prctl.h
tools/arch/x86/include/uapi/asm/unistd_32.h
tools/arch/x86/kcpuid/.gitignore [new file with mode: 0644]
tools/arch/x86/kcpuid/kcpuid.c
tools/arch/x86/lib/memcpy_64.S
tools/arch/x86/lib/memset_64.S
tools/gpio/lsgpio.c
tools/include/asm/alternative.h
tools/include/linux/coresight-pmu.h
tools/include/nolibc/Makefile
tools/include/nolibc/arch-aarch64.h
tools/include/nolibc/arch-arm.h
tools/include/nolibc/arch-i386.h
tools/include/nolibc/arch-loongarch.h
tools/include/nolibc/arch-mips.h
tools/include/nolibc/arch-riscv.h
tools/include/nolibc/arch-s390.h
tools/include/nolibc/arch-x86_64.h
tools/include/nolibc/arch.h
tools/include/nolibc/compiler.h [new file with mode: 0644]
tools/include/nolibc/nolibc.h
tools/include/nolibc/stackprotector.h
tools/include/nolibc/stdint.h
tools/include/nolibc/stdio.h
tools/include/nolibc/stdlib.h
tools/include/nolibc/string.h
tools/include/nolibc/sys.h
tools/include/nolibc/types.h
tools/include/nolibc/unistd.h
tools/include/uapi/drm/drm.h
tools/include/uapi/drm/i915_drm.h
tools/include/uapi/linux/bpf.h
tools/include/uapi/linux/const.h
tools/include/uapi/linux/in.h
tools/include/uapi/linux/kvm.h
tools/include/uapi/linux/prctl.h
tools/include/uapi/sound/asound.h
tools/lib/bpf/libbpf.c
tools/lib/bpf/libbpf_probes.c
tools/lib/subcmd/parse-options.h
tools/lib/subcmd/subcmd-util.h
tools/net/ynl/lib/ynl.py
tools/objtool/Documentation/objtool.txt
tools/objtool/arch/powerpc/include/arch/elf.h
tools/objtool/arch/x86/decode.c
tools/objtool/arch/x86/include/arch/elf.h
tools/objtool/arch/x86/special.c
tools/objtool/builtin-check.c
tools/objtool/check.c
tools/objtool/elf.c
tools/objtool/include/objtool/builtin.h
tools/objtool/include/objtool/cfi.h
tools/objtool/include/objtool/elf.h
tools/objtool/include/objtool/warn.h
tools/objtool/noreturns.h [new file with mode: 0644]
tools/objtool/orc_gen.c
tools/objtool/special.c
tools/perf/Makefile.config
tools/perf/Makefile.perf
tools/perf/arch/arm/util/cs-etm.c
tools/perf/arch/arm/util/pmu.c
tools/perf/arch/arm64/util/header.c
tools/perf/arch/arm64/util/pmu.c
tools/perf/arch/s390/entry/syscalls/syscall.tbl
tools/perf/arch/x86/include/arch-tests.h
tools/perf/arch/x86/tests/Build
tools/perf/arch/x86/tests/amd-ibs-via-core-pmu.c [new file with mode: 0644]
tools/perf/arch/x86/tests/arch-tests.c
tools/perf/bench/mem-memcpy-x86-64-asm-def.h
tools/perf/bench/mem-memcpy-x86-64-asm.S
tools/perf/bench/mem-memset-x86-64-asm-def.h
tools/perf/bench/mem-memset-x86-64-asm.S
tools/perf/builtin-ftrace.c
tools/perf/builtin-script.c
tools/perf/builtin-stat.c
tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json
tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json
tools/perf/pmu-events/arch/x86/broadwellde/bdwde-metrics.json
tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json
tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json
tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json
tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json
tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json
tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json
tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json
tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json
tools/perf/pmu-events/jevents.py
tools/perf/pmu-events/pmu-events.h
tools/perf/tests/attr.py
tools/perf/tests/attr/base-stat
tools/perf/tests/attr/test-stat-default
tools/perf/tests/attr/test-stat-detailed-1
tools/perf/tests/attr/test-stat-detailed-2
tools/perf/tests/attr/test-stat-detailed-3
tools/perf/tests/expr.c
tools/perf/tests/parse-metric.c
tools/perf/tests/shell/stat.sh
tools/perf/tests/shell/test_intel_pt.sh
tools/perf/tests/shell/test_java_symbol.sh
tools/perf/trace/beauty/arch_prctl.c
tools/perf/trace/beauty/x86_arch_prctl.sh
tools/perf/util/Build
tools/perf/util/bpf_skel/lock_contention.bpf.c
tools/perf/util/bpf_skel/sample_filter.bpf.c
tools/perf/util/bpf_skel/vmlinux.h
tools/perf/util/cs-etm.h
tools/perf/util/evsel.c
tools/perf/util/evsel.h
tools/perf/util/expr.y
tools/perf/util/metricgroup.c
tools/perf/util/parse-events.c
tools/perf/util/stat-display.c
tools/perf/util/stat-shadow.c
tools/perf/util/symbol-elf.c
tools/power/cpupower/lib/powercap.c
tools/power/cpupower/utils/idle_monitor/mperf_monitor.c
tools/testing/cxl/Kbuild
tools/testing/cxl/test/mem.c
tools/testing/cxl/test/mock.c
tools/testing/kunit/kunit_kernel.py
tools/testing/kunit/mypy.ini [new file with mode: 0644]
tools/testing/kunit/run_checks.py
tools/testing/radix-tree/Makefile
tools/testing/selftests/alsa/pcm-test.c
tools/testing/selftests/arm64/abi/hwcap.c
tools/testing/selftests/arm64/abi/ptrace.c
tools/testing/selftests/arm64/signal/.gitignore
tools/testing/selftests/arm64/signal/test_signals_utils.c
tools/testing/selftests/arm64/signal/testcases/tpidr2_restore.c [new file with mode: 0644]
tools/testing/selftests/bpf/Makefile
tools/testing/selftests/bpf/prog_tests/inner_array_lookup.c [new file with mode: 0644]
tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h [new file with mode: 0644]
tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
tools/testing/selftests/bpf/prog_tests/subprogs_extable.c [new file with mode: 0644]
tools/testing/selftests/bpf/progs/inner_array_lookup.c [new file with mode: 0644]
tools/testing/selftests/bpf/progs/test_sockmap_drop_prog.c [new file with mode: 0644]
tools/testing/selftests/bpf/progs/test_sockmap_kern.h
tools/testing/selftests/bpf/progs/test_sockmap_pass_prog.c [new file with mode: 0644]
tools/testing/selftests/bpf/progs/test_subprogs_extable.c [new file with mode: 0644]
tools/testing/selftests/bpf/progs/verifier_spill_fill.c
tools/testing/selftests/clone3/clone3.c
tools/testing/selftests/cpufreq/config
tools/testing/selftests/drivers/net/bonding/bond_options.sh
tools/testing/selftests/drivers/net/bonding/bond_topo_3d1c.sh
tools/testing/selftests/ftrace/Makefile
tools/testing/selftests/ftrace/ftracetest
tools/testing/selftests/ftrace/ftracetest-ktap [new file with mode: 0755]
tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_opt_types.tc [new file with mode: 0644]
tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-stack-legacy.tc [new file with mode: 0644]
tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-stack.tc
tools/testing/selftests/gpio/gpio-sim.sh
tools/testing/selftests/kselftest/runner.sh
tools/testing/selftests/kselftest_harness.h
tools/testing/selftests/kvm/Makefile
tools/testing/selftests/kvm/aarch64/get-reg-list.c
tools/testing/selftests/kvm/x86_64/recalc_apic_map_test.c [new file with mode: 0644]
tools/testing/selftests/landlock/config
tools/testing/selftests/landlock/config.um [new file with mode: 0644]
tools/testing/selftests/landlock/fs_test.c
tools/testing/selftests/media_tests/video_device_test.c
tools/testing/selftests/mm/Makefile
tools/testing/selftests/net/.gitignore
tools/testing/selftests/net/fcnal-test.sh
tools/testing/selftests/net/fib_nexthops.sh
tools/testing/selftests/net/fib_tests.sh
tools/testing/selftests/net/forwarding/hw_stats_l3.sh
tools/testing/selftests/net/forwarding/lib.sh
tools/testing/selftests/net/forwarding/mirror_gre_bridge_1d.sh
tools/testing/selftests/net/forwarding/mirror_gre_bridge_1q.sh
tools/testing/selftests/net/mptcp/Makefile
tools/testing/selftests/net/mptcp/config
tools/testing/selftests/net/mptcp/diag.sh
tools/testing/selftests/net/mptcp/mptcp_connect.sh
tools/testing/selftests/net/mptcp/mptcp_join.sh
tools/testing/selftests/net/mptcp/mptcp_lib.sh [new file with mode: 0644]
tools/testing/selftests/net/mptcp/mptcp_sockopt.c
tools/testing/selftests/net/mptcp/mptcp_sockopt.sh
tools/testing/selftests/net/mptcp/pm_netlink.sh
tools/testing/selftests/net/mptcp/simult_flows.sh
tools/testing/selftests/net/mptcp/userspace_pm.sh
tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh
tools/testing/selftests/net/tls.c
tools/testing/selftests/net/vrf-xfrm-tests.sh
tools/testing/selftests/netfilter/nft_flowtable.sh
tools/testing/selftests/nolibc/.gitignore
tools/testing/selftests/nolibc/Makefile
tools/testing/selftests/nolibc/nolibc-test.c
tools/testing/selftests/pidfd/pidfd.h
tools/testing/selftests/pidfd/pidfd_fdinfo_test.c
tools/testing/selftests/pidfd/pidfd_test.c
tools/testing/selftests/prctl/set-anon-vma-name-test.c
tools/testing/selftests/ptp/testptp.c
tools/testing/selftests/rcutorture/bin/functions.sh
tools/testing/selftests/rcutorture/configs/rcu/BUSTED-BOOST.boot
tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
tools/testing/selftests/run_kselftest.sh
tools/testing/selftests/sgx/Makefile
tools/testing/selftests/tc-testing/config
tools/testing/selftests/tc-testing/tc-tests/qdiscs/sfb.json
tools/testing/selftests/tc-testing/tdc.sh
tools/testing/selftests/user_events/dyn_test.c
tools/testing/selftests/user_events/ftrace_test.c
tools/testing/selftests/user_events/perf_test.c
tools/testing/selftests/vDSO/vdso_test_clock_getres.c
tools/virtio/ringtest/.gitignore [new file with mode: 0644]
tools/virtio/ringtest/main.h
tools/virtio/virtio-trace/README
tools/virtio/virtio-trace/trace-agent.c
tools/workqueue/wq_monitor.py [new file with mode: 0644]
virt/kvm/kvm_main.c

index c9ba5bf..2325c52 100644 (file)
@@ -2,3 +2,4 @@
 *.[ch] diff=cpp
 *.dts diff=dts
 *.dts[io] diff=dts
+*.rs diff=rust
index 71127b2..4d71480 100644 (file)
--- a/.mailmap
+++ b/.mailmap
@@ -70,6 +70,8 @@ Baolin Wang <baolin.wang@linux.alibaba.com> <baolin.wang@unisoc.com>
 Baolin Wang <baolin.wang@linux.alibaba.com> <baolin.wang7@gmail.com>
 Bart Van Assche <bvanassche@acm.org> <bart.vanassche@sandisk.com>
 Bart Van Assche <bvanassche@acm.org> <bart.vanassche@wdc.com>
+Ben Dooks <ben-linux@fluff.org> <ben.dooks@simtec.co.uk>
+Ben Dooks <ben-linux@fluff.org> <ben.dooks@sifive.com>
 Ben Gardner <bgardner@wabtec.com>
 Ben M Cahill <ben.m.cahill@intel.com>
 Ben Widawsky <bwidawsk@kernel.org> <ben@bwidawsk.net>
@@ -181,6 +183,8 @@ Henrik Rydberg <rydberg@bitmath.org>
 Herbert Xu <herbert@gondor.apana.org.au>
 Huacai Chen <chenhuacai@kernel.org> <chenhc@lemote.com>
 Huacai Chen <chenhuacai@kernel.org> <chenhuacai@loongson.cn>
+J. Bruce Fields <bfields@fieldses.org> <bfields@redhat.com>
+J. Bruce Fields <bfields@fieldses.org> <bfields@citi.umich.edu>
 Jacob Shin <Jacob.Shin@amd.com>
 Jaegeuk Kim <jaegeuk@kernel.org> <jaegeuk@google.com>
 Jaegeuk Kim <jaegeuk@kernel.org> <jaegeuk.kim@samsung.com>
@@ -233,6 +237,7 @@ Jisheng Zhang <jszhang@kernel.org> <Jisheng.Zhang@synaptics.com>
 Johan Hovold <johan@kernel.org> <jhovold@gmail.com>
 Johan Hovold <johan@kernel.org> <johan@hovoldconsulting.com>
 John Crispin <john@phrozen.org> <blogic@openwrt.org>
+John Keeping <john@keeping.me.uk> <john@metanate.com>
 John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
 John Stultz <johnstul@us.ibm.com>
 <jon.toppins+linux@gmail.com> <jtoppins@cumulusnetworks.com>
@@ -364,6 +369,11 @@ Nicolas Pitre <nico@fluxnic.net> <nico@linaro.org>
 Nicolas Saenz Julienne <nsaenz@kernel.org> <nsaenzjulienne@suse.de>
 Nicolas Saenz Julienne <nsaenz@kernel.org> <nsaenzjulienne@suse.com>
 Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
+Nikolay Aleksandrov <razor@blackwall.org> <naleksan@redhat.com>
+Nikolay Aleksandrov <razor@blackwall.org> <nikolay@redhat.com>
+Nikolay Aleksandrov <razor@blackwall.org> <nikolay@cumulusnetworks.com>
+Nikolay Aleksandrov <razor@blackwall.org> <nikolay@nvidia.com>
+Nikolay Aleksandrov <razor@blackwall.org> <nikolay@isovalent.com>
 Oleksandr Natalenko <oleksandr@natalenko.name> <oleksandr@redhat.com>
 Oleksij Rempel <linux@rempel-privat.de> <bug-track@fisher-privat.net>
 Oleksij Rempel <linux@rempel-privat.de> <external.Oleksij.Rempel@de.bosch.com>
diff --git a/CREDITS b/CREDITS
index 2d9da9a..8b48820 100644 (file)
--- a/CREDITS
+++ b/CREDITS
@@ -383,6 +383,12 @@ E: tomas@nocrew.org
 W: http://tomas.nocrew.org/
 D: dsp56k device driver
 
+N: Srivatsa S. Bhat
+E: srivatsa@csail.mit.edu
+D: Maintainer of Generic Paravirt-Ops subsystem
+D: Maintainer of VMware hypervisor interface
+D: Maintainer of VMware virtual PTP clock driver (ptp_vmw)
+
 N: Ross Biro
 E: ross.biro@gmail.com
 D: Original author of the Linux networking code
@@ -1706,6 +1712,10 @@ S: Panoramastrasse 18
 S: D-69126 Heidelberg
 S: Germany
 
+N: Neil Horman
+M: nhorman@tuxdriver.com
+D: SCTP protocol maintainer.
+
 N: Simon Horman
 M: horms@verge.net.au
 D: Renesas ARM/ARM64 SoC maintainer
index 49387d8..f3b6052 100644 (file)
@@ -2071,41 +2071,7 @@ call.
 
 Because RCU avoids interrupting idle CPUs, it is illegal to execute an
 RCU read-side critical section on an idle CPU. (Kernels built with
-``CONFIG_PROVE_RCU=y`` will splat if you try it.) The RCU_NONIDLE()
-macro and ``_rcuidle`` event tracing is provided to work around this
-restriction. In addition, rcu_is_watching() may be used to test
-whether or not it is currently legal to run RCU read-side critical
-sections on this CPU. I learned of the need for diagnostics on the one
-hand and RCU_NONIDLE() on the other while inspecting idle-loop code.
-Steven Rostedt supplied ``_rcuidle`` event tracing, which is used quite
-heavily in the idle loop. However, there are some restrictions on the
-code placed within RCU_NONIDLE():
-
-#. Blocking is prohibited. In practice, this is not a serious
-   restriction given that idle tasks are prohibited from blocking to
-   begin with.
-#. Although nesting RCU_NONIDLE() is permitted, they cannot nest
-   indefinitely deeply. However, given that they can be nested on the
-   order of a million deep, even on 32-bit systems, this should not be a
-   serious restriction. This nesting limit would probably be reached
-   long after the compiler OOMed or the stack overflowed.
-#. Any code path that enters RCU_NONIDLE() must sequence out of that
-   same RCU_NONIDLE(). For example, the following is grossly
-   illegal:
-
-      ::
-
-         1     RCU_NONIDLE({
-         2       do_something();
-         3       goto bad_idea;  /* BUG!!! */
-         4       do_something_else();});
-         5   bad_idea:
-
-
-   It is just as illegal to transfer control into the middle of
-   RCU_NONIDLE()'s argument. Yes, in theory, you could transfer in
-   as long as you also transferred out, but in practice you could also
-   expect to get sharply worded review comments.
+``CONFIG_PROVE_RCU=y`` will splat if you try it.)
 
 It is similarly socially unacceptable to interrupt an ``nohz_full`` CPU
 running in userspace. RCU must therefore track ``nohz_full`` userspace
index 8eddef2..e488c8e 100644 (file)
@@ -1117,7 +1117,6 @@ All: lockdep-checked RCU utility APIs::
 
        RCU_LOCKDEP_WARN
        rcu_sleep_check
-       RCU_NONIDLE
 
 All: Unchecked RCU-protected pointer access::
 
index bb5032a..6fdb495 100644 (file)
@@ -508,9 +508,6 @@ cache_miss_collisions
   cache miss, but raced with a write and data was already present (usually 0
   since the synchronization for cache misses was rewritten)
 
-cache_readaheads
-  Count of times readahead occurred.
-
 Sysfs - cache set
 ~~~~~~~~~~~~~~~~~
 
index f67c082..9badcb2 100644 (file)
@@ -1213,23 +1213,25 @@ PAGE_SIZE multiple when read back.
        A read-write single value file which exists on non-root
        cgroups.  The default is "max".
 
-       Memory usage throttle limit.  This is the main mechanism to
-       control memory usage of a cgroup.  If a cgroup's usage goes
+       Memory usage throttle limit.  If a cgroup's usage goes
        over the high boundary, the processes of the cgroup are
        throttled and put under heavy reclaim pressure.
 
        Going over the high limit never invokes the OOM killer and
-       under extreme conditions the limit may be breached.
+       under extreme conditions the limit may be breached. The high
+       limit should be used in scenarios where an external process
+       monitors the limited cgroup to alleviate heavy reclaim
+       pressure.
 
   memory.max
        A read-write single value file which exists on non-root
        cgroups.  The default is "max".
 
-       Memory usage hard limit.  This is the final protection
-       mechanism.  If a cgroup's memory usage reaches this limit and
-       can't be reduced, the OOM killer is invoked in the cgroup.
-       Under certain circumstances, the usage may go over the limit
-       temporarily.
+       Memory usage hard limit.  This is the main mechanism to limit
+       memory usage of a cgroup.  If a cgroup's memory usage reaches
+       this limit and can't be reduced, the OOM killer is invoked in
+       the cgroup. Under certain circumstances, the usage may go
+       over the limit temporarily.
 
        In default configuration regular 0-order allocations always
        succeed unless OOM killer chooses current task as a victim.
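
A minimal sketch of how the two limits discussed above are typically applied
from a shell, assuming cgroup v2 is mounted at /sys/fs/cgroup and a cgroup
named "example" already exists (the name and values are illustrative only)::

    # throttle "example" once it crosses roughly 1 GiB, hard-cap it at 2 GiB
    echo 1G > /sys/fs/cgroup/example/memory.high
    echo 2G > /sys/fs/cgroup/example/memory.max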
@@ -1238,10 +1240,6 @@ PAGE_SIZE multiple when read back.
        Caller could retry them differently, return into userspace
        as -ENOMEM or silently ignore in cases like disk readahead.
 
-       This is the ultimate protection mechanism.  As long as the
-       high limit is used and monitored properly, this limit's
-       utility is limited to providing the final safety net.
-
   memory.reclaim
        A write-only nested-keyed file which exists for all cgroups.
 
@@ -2024,31 +2022,33 @@ that attribute:
   no-change
        Do not modify the I/O priority class.
 
-  none-to-rt
-       For requests that do not have an I/O priority class (NONE),
-       change the I/O priority class into RT. Do not modify
-       the I/O priority class of other requests.
+  promote-to-rt
+       For requests that have a non-RT I/O priority class, change it into RT.
+       Also change the priority level of these requests to 4. Do not modify
+       the I/O priority of requests that have priority class RT.
 
   restrict-to-be
        For requests that do not have an I/O priority class or that have I/O
-       priority class RT, change it into BE. Do not modify the I/O priority
-       class of requests that have priority class IDLE.
+       priority class RT, change it into BE. Also change the priority level
+       of these requests to 0. Do not modify the I/O priority class of
+       requests that have priority class IDLE.
 
   idle
        Change the I/O priority class of all requests into IDLE, the lowest
        I/O priority class.
 
+  none-to-rt
+       Deprecated. Just an alias for promote-to-rt.
+
 The following numerical values are associated with the I/O priority policies:
 
-+-------------+---+
-| no-change   | 0 |
-+-------------+---+
-| none-to-rt  | 1 |
-+-------------+---+
-| rt-to-be    | 2 |
-+-------------+---+
-| all-to-idle | 3 |
-+-------------+---+
++----------------+---+
+| no-change      | 0 |
++----------------+---+
+| rt-to-be       | 2 |
++----------------+---+
+| all-to-idle    | 3 |
++----------------+---+
 
 The numerical value that corresponds to each I/O priority class is as follows:
 
@@ -2064,9 +2064,13 @@ The numerical value that corresponds to each I/O priority class is as follows:
 
 The algorithm to set the I/O priority class for a request is as follows:
 
-- Translate the I/O priority class policy into a number.
-- Change the request I/O priority class into the maximum of the I/O priority
-  class policy number and the numerical I/O priority class.
+- If I/O priority class policy is promote-to-rt, change the request I/O
+  priority class to IOPRIO_CLASS_RT and change the request I/O priority
+  level to 4.
+- If the I/O priority class policy is not promote-to-rt, translate the I/O priority
+  class policy into a number, then change the request I/O priority class
+  into the maximum of the I/O priority class policy number and the numerical
+  I/O priority class.
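
As an illustration of the promote-to-rt policy described above, and assuming
cgroup v2 is mounted at /sys/fs/cgroup with an existing cgroup named "example",
the policy could be selected and read back with::

    echo promote-to-rt > /sys/fs/cgroup/example/io.prio.class
    cat /sys/fs/cgroup/example/io.prio.class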
 
 PID
 ---
@@ -2439,7 +2443,7 @@ Miscellaneous controller provides 3 interface files. If two misc resources (res_
          res_b 10
 
   misc.current
-        A read-only flat-keyed file shown in the non-root cgroups.  It shows
+        A read-only flat-keyed file shown in all cgroups.  It shows
         the current usage of the resources in the cgroup and its children.::
 
          $ cat misc.current
index 3147bba..8c42c4d 100644 (file)
@@ -5,5 +5,5 @@ Changes
 See https://wiki.samba.org/index.php/LinuxCIFSKernel for summary
 information about fixes/improvements to CIFS/SMB2/SMB3 support (changes
 to cifs.ko module) by kernel version (and cifs internal module version).
-This may be easier to read than parsing the output of "git log fs/cifs"
-by release.
+This may be easier to read than parsing the output of
+"git log fs/smb/client" by release.
index 2e151cd..5f936b4 100644 (file)
@@ -45,7 +45,7 @@ Installation instructions
 
 If you have built the CIFS vfs as module (successfully) simply
 type ``make modules_install`` (or if you prefer, manually copy the file to
-the modules directory e.g. /lib/modules/2.4.10-4GB/kernel/fs/cifs/cifs.ko).
+the modules directory e.g. /lib/modules/6.3.0-060300-generic/kernel/fs/smb/client/cifs.ko).
 
 If you have built the CIFS vfs into the kernel itself, follow the instructions
 for your distribution on how to install a new kernel (usually you
@@ -66,15 +66,15 @@ If cifs is built as a module, then the size and number of network buffers
 and maximum number of simultaneous requests to one server can be configured.
 Changing these from their defaults is not recommended. By executing modinfo::
 
-       modinfo kernel/fs/cifs/cifs.ko
+       modinfo <path to cifs.ko>
 
-on kernel/fs/cifs/cifs.ko the list of configuration changes that can be made
+on kernel/fs/smb/client/cifs.ko the list of configuration changes that can be made
 at module initialization time (by running insmod cifs.ko) can be seen.
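
For example, assuming the module was installed for the currently running
kernel, one plausible invocation is::

    modinfo /lib/modules/$(uname -r)/kernel/fs/smb/client/cifs.ko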
 
 Recommendations
 ===============
 
-To improve security the SMB2.1 dialect or later (usually will get SMB3) is now
+To improve security the SMB2.1 dialect or later (usually SMB3.1.1 will be negotiated) is now
 the new default. To use old dialects (e.g. to mount Windows XP) use "vers=1.0"
 on mount (or vers=2.0 for Windows Vista).  Note that the CIFS (vers=1.0) is
 much older and less secure than the default dialect SMB3 which includes
index 9e5bab2..487d5da 100644 (file)
        arm64.nosme     [ARM64] Unconditionally disable Scalable Matrix
                        Extension support
 
+       arm64.nomops    [ARM64] Unconditionally disable Memory Copy and Memory
+                       Set instructions support
+
        ataflop=        [HW,M68k]
 
        atarimouse=     [HW,MOUSE] Atari Mouse
                        Format:
                        <first_slot>,<last_slot>,<port>,<enum_bit>[,<debug>]
 
-       cpu0_hotplug    [X86] Turn on CPU0 hotplug feature when
-                       CONFIG_BOOTPARAM_HOTPLUG_CPU0 is off.
-                       Some features depend on CPU0. Known dependencies are:
-                       1. Resume from suspend/hibernate depends on CPU0.
-                       Suspend/hibernate will fail if CPU0 is offline and you
-                       need to online CPU0 before suspend/hibernate.
-                       2. PIC interrupts also depend on CPU0. CPU0 can't be
-                       removed if a PIC interrupt is detected.
-                       It's said poweroff/reboot may depend on CPU0 on some
-                       machines although I haven't seen such issues so far
-                       after CPU0 is offline on a few tested machines.
-                       If the dependencies are under your control, you can
-                       turn on cpu0_hotplug.
-
        cpuidle.off=1   [CPU_IDLE]
                        disable the cpuidle sub-system
 
                        on every CPU online, such as boot, and resume from suspend.
                        Default: 10000
 
+       cpuhp.parallel=
+                       [SMP] Enable/disable parallel bringup of secondary CPUs
+                       Format: <bool>
+                       Default is enabled if CONFIG_HOTPLUG_PARALLEL=y. Otherwise
+                       the parameter has no effect.
+
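A hedged illustration: with CONFIG_HOTPLUG_PARALLEL=y, parallel bringup could
be switched off from the kernel command line with::

    cpuhp.parallel=0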
        crash_kexec_post_notifiers
                        Run kdump after running panic-notifiers and dumping
                        kmsg. This only for the users who doubt kdump always
                        disable
                          Do not enable intel_pstate as the default
                          scaling driver for the supported processors
+                        active
+                          Use intel_pstate driver to bypass the scaling
+                          governors layer of cpufreq and provides it own
+                          algorithms for p-state selection. There are two
+                          P-state selection algorithms provided by
+                          intel_pstate in the active mode: powersave and
+                          performance.  The way they both operate depends
+                          on whether or not the hardware managed P-states
+                          (HWP) feature has been enabled in the processor
+                          and possibly on the processor model.
                        passive
                          Use intel_pstate as a scaling driver, but configure it
                          to work with generic cpufreq governors (instead of
                        If the value is 0 (the default), KVM will pick a period based
                        on the ratio, such that a page is zapped after 1 hour on average.
 
-       kvm-amd.nested= [KVM,AMD] Allow nested virtualization in KVM/SVM.
-                       Default is 1 (enabled)
+       kvm-amd.nested= [KVM,AMD] Control nested virtualization feature in
+                       KVM/SVM. Default is 1 (enabled).
 
-       kvm-amd.npt=    [KVM,AMD] Disable nested paging (virtualized MMU)
-                       for all guests.
-                       Default is 1 (enabled) if in 64-bit or 32-bit PAE mode.
+       kvm-amd.npt=    [KVM,AMD] Control KVM's use of Nested Page Tables,
+                       a.k.a. Two-Dimensional Page Tables. Default is 1
+                       (enabled). Disabled by KVM if hardware lacks support
+                       for NPT.
 
        kvm-arm.mode=
                        [KVM,ARM] Select one of KVM/arm64's modes of operation.
                        Format: <integer>
                        Default: 5
 
-       kvm-intel.ept=  [KVM,Intel] Disable extended page tables
-                       (virtualized MMU) support on capable Intel chips.
-                       Default is 1 (enabled)
+       kvm-intel.ept=  [KVM,Intel] Control KVM's use of Extended Page Tables,
+                       a.k.a. Two-Dimensional Page Tables.  Default is 1
+                       (enabled). Disabled by KVM if hardware lacks support
+                       for EPT.
 
        kvm-intel.emulate_invalid_guest_state=
-                       [KVM,Intel] Disable emulation of invalid guest state.
-                       Ignored if kvm-intel.enable_unrestricted_guest=1, as
-                       guest state is never invalid for unrestricted guests.
-                       This param doesn't apply to nested guests (L2), as KVM
-                       never emulates invalid L2 guest state.
-                       Default is 1 (enabled)
+                       [KVM,Intel] Control whether to emulate invalid guest
+                       state. Ignored if kvm-intel.enable_unrestricted_guest=1,
+                       as guest state is never invalid for unrestricted
+                       guests. This param doesn't apply to nested guests (L2),
+                       as KVM never emulates invalid L2 guest state.
+                       Default is 1 (enabled).
 
        kvm-intel.flexpriority=
-                       [KVM,Intel] Disable FlexPriority feature (TPR shadow).
-                       Default is 1 (enabled)
+                       [KVM,Intel] Control KVM's use of FlexPriority feature
+                       (TPR shadow). Default is 1 (enabled). Disabled by KVM if
+                       hardware lacks support for it.
 
        kvm-intel.nested=
-                       [KVM,Intel] Enable VMX nesting (nVMX).
-                       Default is 0 (disabled)
+                       [KVM,Intel] Control nested virtualization feature in
+                       KVM/VMX. Default is 1 (enabled).
 
        kvm-intel.unrestricted_guest=
-                       [KVM,Intel] Disable unrestricted guest feature
-                       (virtualized real and unpaged mode) on capable
-                       Intel chips. Default is 1 (enabled)
+                       [KVM,Intel] Control KVM's use of unrestricted guest
+                       feature (virtualized real and unpaged mode). Default
+                       is 1 (enabled). Disabled by KVM if EPT is disabled or
+                       hardware lacks support for it.
 
        kvm-intel.vmentry_l1d_flush=[KVM,Intel] Mitigation for L1 Terminal Fault
                        CVE-2018-3620.
 
                        Default is cond (do L1 cache flush in specific instances)
 
-       kvm-intel.vpid= [KVM,Intel] Disable Virtual Processor Identification
-                       feature (tagged TLBs) on capable Intel chips.
-                       Default is 1 (enabled)
+       kvm-intel.vpid= [KVM,Intel] Control KVM's use of Virtual Processor
+                       Identification feature (tagged TLBs). Default is 1
+                       (enabled). Disabled by KVM if hardware lacks support
+                       for it.
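As a sketch of how these module parameters are usually passed, either on the
kernel command line or via modprobe configuration (the values below are
arbitrary examples)::

    # kernel command line
    kvm-intel.nested=0 kvm-intel.vpid=1
    # or in /etc/modprobe.d/kvm.conf
    options kvm_intel nested=0 vpid=1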
 
        l1d_flush=      [X86,INTEL]
                        Control mitigation for L1D based snooping vulnerability.
                        [HW] Make the MicroTouch USB driver use raw coordinates
                        ('y', default) or cooked coordinates ('n')
 
+       mtrr=debug      [X86]
+                       Enable printing debug information related to MTRR
+                       registers at boot time.
+
        mtrr_chunk_size=nn[KMG] [X86]
                        used for mtrr cleanup. It is largest continuous chunk
                        that could hold holes aka. UC entries.
                        the propagation of recent CPU-hotplug changes up
                        the rcu_node combining tree.
 
-       rcutree.use_softirq=    [KNL]
-                       If set to zero, move all RCU_SOFTIRQ processing to
-                       per-CPU rcuc kthreads.  Defaults to a non-zero
-                       value, meaning that RCU_SOFTIRQ is used by default.
-                       Specify rcutree.use_softirq=0 to use rcuc kthreads.
-
-                       But note that CONFIG_PREEMPT_RT=y kernels disable
-                       this kernel boot parameter, forcibly setting it
-                       to zero.
-
-       rcutree.rcu_fanout_exact= [KNL]
-                       Disable autobalancing of the rcu_node combining
-                       tree.  This is used by rcutorture, and might
-                       possibly be useful for architectures having high
-                       cache-to-cache transfer latencies.
-
-       rcutree.rcu_fanout_leaf= [KNL]
-                       Change the number of CPUs assigned to each
-                       leaf rcu_node structure.  Useful for very
-                       large systems, which will choose the value 64,
-                       and for NUMA systems with large remote-access
-                       latencies, which will choose a value aligned
-                       with the appropriate hardware boundaries.
-
-       rcutree.rcu_min_cached_objs= [KNL]
-                       Minimum number of objects which are cached and
-                       maintained per one CPU. Object size is equal
-                       to PAGE_SIZE. The cache allows to reduce the
-                       pressure to page allocator, also it makes the
-                       whole algorithm to behave better in low memory
-                       condition.
-
-       rcutree.rcu_delay_page_cache_fill_msec= [KNL]
-                       Set the page-cache refill delay (in milliseconds)
-                       in response to low-memory conditions.  The range
-                       of permitted values is in the range 0:100000.
-
        rcutree.jiffies_till_first_fqs= [KNL]
                        Set delay from grace-period initialization to
                        first attempt to force quiescent states.
                        When RCU_NOCB_CPU is set, also adjust the
                        priority of NOCB callback kthreads.
 
-       rcutree.rcu_divisor= [KNL]
-                       Set the shift-right count to use to compute
-                       the callback-invocation batch limit bl from
-                       the number of callbacks queued on this CPU.
-                       The result will be bounded below by the value of
-                       the rcutree.blimit kernel parameter.  Every bl
-                       callbacks, the softirq handler will exit in
-                       order to allow the CPU to do other work.
-
-                       Please note that this callback-invocation batch
-                       limit applies only to non-offloaded callback
-                       invocation.  Offloaded callbacks are instead
-                       invoked in the context of an rcuoc kthread, which
-                       scheduler will preempt as it does any other task.
-
        rcutree.nocb_nobypass_lim_per_jiffy= [KNL]
                        On callback-offloaded (rcu_nocbs) CPUs,
                        RCU reduces the lock contention that would
                        the ->nocb_bypass queue.  The definition of "too
                        many" is supplied by this kernel boot parameter.
 
-       rcutree.rcu_nocb_gp_stride= [KNL]
-                       Set the number of NOCB callback kthreads in
-                       each group, which defaults to the square root
-                       of the number of CPUs.  Larger numbers reduce
-                       the wakeup overhead on the global grace-period
-                       kthread, but increases that same overhead on
-                       each group's NOCB grace-period kthread.
-
        rcutree.qhimark= [KNL]
                        Set threshold of queued RCU callbacks beyond which
                        batch limiting is disabled.
                        on rcutree.qhimark at boot time and to zero to
                        disable more aggressive help enlistment.
 
+       rcutree.rcu_delay_page_cache_fill_msec= [KNL]
+                       Set the page-cache refill delay (in milliseconds)
+                       in response to low-memory conditions.  The range
+                       of permitted values is in the range 0:100000.
+
+       rcutree.rcu_divisor= [KNL]
+                       Set the shift-right count to use to compute
+                       the callback-invocation batch limit bl from
+                       the number of callbacks queued on this CPU.
+                       The result will be bounded below by the value of
+                       the rcutree.blimit kernel parameter.  Every bl
+                       callbacks, the softirq handler will exit in
+                       order to allow the CPU to do other work.
+
+                       Please note that this callback-invocation batch
+                       limit applies only to non-offloaded callback
+                       invocation.  Offloaded callbacks are instead
+                       invoked in the context of an rcuoc kthread, which the
+                       scheduler will preempt as it does any other task.
+
+       rcutree.rcu_fanout_exact= [KNL]
+                       Disable autobalancing of the rcu_node combining
+                       tree.  This is used by rcutorture, and might
+                       possibly be useful for architectures having high
+                       cache-to-cache transfer latencies.
+
+       rcutree.rcu_fanout_leaf= [KNL]
+                       Change the number of CPUs assigned to each
+                       leaf rcu_node structure.  Useful for very
+                       large systems, which will choose the value 64,
+                       and for NUMA systems with large remote-access
+                       latencies, which will choose a value aligned
+                       with the appropriate hardware boundaries.
+
+       rcutree.rcu_min_cached_objs= [KNL]
+                       Minimum number of objects which are cached and
+                       maintained per one CPU. Object size is equal
+                       to PAGE_SIZE. The cache helps reduce pressure on
+                       the page allocator and makes the whole algorithm
+                       behave better under low-memory conditions.
+
+       rcutree.rcu_nocb_gp_stride= [KNL]
+                       Set the number of NOCB callback kthreads in
+                       each group, which defaults to the square root
+                       of the number of CPUs.  Larger numbers reduce
+                       the wakeup overhead on the global grace-period
+                       kthread, but increase that same overhead on
+                       each group's NOCB grace-period kthread.
+
        rcutree.rcu_kick_kthreads= [KNL]
                        Cause the grace-period kthread to get an extra
                        wake_up() if it sleeps three times longer than
                        This wake_up() will be accompanied by a
                        WARN_ONCE() splat and an ftrace_dump().
 
+       rcutree.rcu_resched_ns= [KNL]
+                       Limit the time spent invoking a batch of RCU
+                       callbacks to the specified number of nanoseconds.
+                       By default, this limit is checked only once
+                       every 32 callbacks in order to limit the pain
+                       inflicted by local_clock() overhead.
+
        rcutree.rcu_unlock_delay= [KNL]
                        In CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels,
                        this specifies an rcu_read_unlock()-time delay
                        rcu_node tree with an eye towards determining
                        why a new grace period has not yet started.
 
+       rcutree.use_softirq=    [KNL]
+                       If set to zero, move all RCU_SOFTIRQ processing to
+                       per-CPU rcuc kthreads.  Defaults to a non-zero
+                       value, meaning that RCU_SOFTIRQ is used by default.
+                       Specify rcutree.use_softirq=0 to use rcuc kthreads.
+
+                       But note that CONFIG_PREEMPT_RT=y kernels disable
+                       this kernel boot parameter, forcibly setting it
+                       to zero.
+
        rcuscale.gp_async= [KNL]
                        Measure performance of asynchronous
                        grace-period primitives such as call_rcu().
 
        rcutorture.stall_cpu_block= [KNL]
                        Sleep while stalling if set.  This will result
-                       in warnings from preemptible RCU in addition
-                       to any other stall-related activity.
+                       in warnings from preemptible RCU in addition to
+                       any other stall-related activity.  Note that
+                       in kernels built with CONFIG_PREEMPTION=n and
+                       CONFIG_PREEMPT_COUNT=y, this parameter will
+                       cause the CPU to pass through a quiescent state.
+                       Given CONFIG_PREEMPTION=n, this will suppress
+                       RCU CPU stall warnings, but will instead result
+                       in scheduling-while-atomic splats.
+
+                       Use of this module parameter results in splats.
+
 
        rcutorture.stall_cpu_holdoff= [KNL]
                        Time to wait (s) after boot before inducing stall.
                        port and the regular usb controller gets disabled.
 
        root=           [KNL] Root filesystem
-                       See name_to_dev_t comment in init/do_mounts.c.
+                       Usually this is a block device specifier of some kind,
+                       see the early_lookup_bdev comment in
+                       block/early-lookup.c for details.
+                       Alternatively this can be "ram" for the legacy initial
+                       ramdisk, "nfs" and "cifs" for root on a network file
+                       system, or "mtd" and "ubi" for mounting from raw flash.
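Two common forms, assuming the root filesystem lives on the second partition
of the first SATA disk or on a GPT partition identified by its partition UUID
(the UUID below is made up)::

    root=/dev/sda2
    root=PARTUUID=12345678-abcd-4321-dcba-0123456789ab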
 
        rootdelay=      [KNL] Delay (in seconds) to pause before attempting to
                        mount the root filesystem
        unknown_nmi_panic
                        [X86] Cause panic on unknown NMI.
 
+       unwind_debug    [X86-64]
+                       Enable unwinder debug output.  This can be
+                       useful for debugging certain unwinder error
+                       conditions, including corrupt stacks and
+                       bad/missing unwinder metadata.
+
        usbcore.authorized_default=
                        [USB] Default USB device authorization:
                        (default -1 = authorized except for wireless USB,
                        it can be updated at runtime by writing to the
                        corresponding sysfs file.
 
+       workqueue.cpu_intensive_thresh_us=
+                       Per-cpu work items which run for longer than this
+                       threshold are automatically considered CPU intensive
+                       and excluded from concurrency management to prevent
+                       them from noticeably delaying other per-cpu work
+                       items. Default is 10000 (10ms).
+
+                       If CONFIG_WQ_CPU_INTENSIVE_REPORT is set, the kernel
+                       will report the work functions which violate this
+                       threshold repeatedly. They are likely good
+                       candidates for using WQ_UNBOUND workqueues instead.
+
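For instance, lowering the threshold to 5ms on the kernel command line (the
value is chosen purely for illustration)::

    workqueue.cpu_intensive_thresh_us=5000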
        workqueue.disable_numa
                        By default, all work items queued to unbound
                        workqueues are affine to the NUMA nodes they're
index 5469793..e0174d2 100644 (file)
@@ -56,14 +56,14 @@ Example usage of perf::
 For HiSilicon uncore PMU v2 whose identifier is 0x30, the topology is the same
 as PMU v1, but some new functions are added to the hardware.
 
-(a) L3C PMU supports filtering by core/thread within the cluster which can be
+1. L3C PMU supports filtering by core/thread within the cluster which can be
 specified as a bitmap::
 
   $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_core=0x3/ sleep 5
 
 This will only count the operations from core/thread 0 and 1 in this cluster.
 
-(b) Tracetag allow the user to chose to count only read, write or atomic
+2. Tracetag allows the user to choose to count only read, write or atomic
 operations via the tt_req parameeter in perf. The default value counts all
 operations. tt_req is 3bits, 3'b100 represents read operations, 3'b101
 represents write operations, 3'b110 represents atomic store operations and
@@ -73,14 +73,16 @@ represents write operations, 3'b110 represents atomic store operations and
 
 This will only count the read operations in this cluster.
 
-(c) Datasrc allows the user to check where the data comes from. It is 5 bits.
+3. Datasrc allows the user to check where the data comes from. It is 5 bits.
 Some important codes are as follows:
-5'b00001: comes from L3C in this die;
-5'b01000: comes from L3C in the cross-die;
-5'b01001: comes from L3C which is in another socket;
-5'b01110: comes from the local DDR;
-5'b01111: comes from the cross-die DDR;
-5'b10000: comes from cross-socket DDR;
+
+- 5'b00001: comes from L3C in this die;
+- 5'b01000: comes from L3C in the cross-die;
+- 5'b01001: comes from L3C which is in another socket;
+- 5'b01110: comes from the local DDR;
+- 5'b01111: comes from the cross-die DDR;
+- 5'b10000: comes from cross-socket DDR;
+
 etc, it is mainly helpful to find that the data source is nearest from the CPU
 cores. If datasrc_cfg is used in the multi-chips, the datasrc_skt shall be
 configured in perf command::
@@ -88,15 +90,25 @@ configured in perf command::
   $# perf stat -a -e hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xE/,
   hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xF/ sleep 5
 
-(d)Some HiSilicon SoCs encapsulate multiple CPU and IO dies. Each CPU die
+4. Some HiSilicon SoCs encapsulate multiple CPU and IO dies. Each CPU die
 contains several Compute Clusters (CCLs). The I/O dies are called Super I/O
 clusters (SICL) containing multiple I/O clusters (ICLs). Each CCL/ICL in the
 SoC has a unique ID. Each ID is 11bits, include a 6-bit SCCL-ID and 5-bit
 CCL/ICL-ID. For I/O die, the ICL-ID is followed by:
-5'b00000: I/O_MGMT_ICL;
-5'b00001: Network_ICL;
-5'b00011: HAC_ICL;
-5'b10000: PCIe_ICL;
+
+- 5'b00000: I/O_MGMT_ICL;
+- 5'b00001: Network_ICL;
+- 5'b00011: HAC_ICL;
+- 5'b10000: PCIe_ICL;
+
+5. uring_channel: UC PMU events 0x47~0x59 support filtering by tx request
+uring channel. It is 2 bits. Some important codes are as follows:
+
+- 2'b11: count the events which are sent to the uring_ext (MATA) channel;
+- 2'b01: is the same as 2'b11;
+- 2'b10: count the events which are sent to the uring (non-MATA) channel;
+- 2'b00: default value, count the events which are sent to both the uring and
+  uring_ext channels;
 
 Users could configure IDs to count data come from specific CCL/ICL, by setting
 srcid_cmd & srcid_msk, and data desitined for specific CCL/ICL by setting
index ff4f4cc..f08149b 100644 (file)
@@ -215,12 +215,14 @@ again.
    reduce the compile time enormously, especially if you are running an
    universal kernel from a commodity Linux distribution.
 
-   There is a catch: the make target 'localmodconfig' will disable kernel
-   features you have not directly or indirectly through some program utilized
-   since you booted the system. You can reduce or nearly eliminate that risk by
-   using tricks outlined in the reference section; for quick testing purposes
-   that risk is often negligible, but it is an aspect you want to keep in mind
-   in case your kernel behaves oddly.
+   There is a catch: 'localmodconfig' is likely to disable kernel features you
+   did not use since you booted your Linux -- like drivers for currently
+   disconnected peripherals or virtualization software you haven't used yet.
+   You can reduce or nearly eliminate that risk with tricks the reference
+   section outlines; when building a kernel just for quick testing purposes it
+   is often negligible if such features are missing. But you should keep that
+   aspect in mind when using a kernel built with this make target, as it might
+   be the reason why something you only use occasionally stopped working.
 
    [:ref:`details<configuration>`]
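
One such trick, sketched here under the assumption that all hardware you care
about has been in use at least once since boot, is to capture the module list
and feed it to localmodconfig::

    lsmod > ~/lsmod-snapshot
    make LSMOD=~/lsmod-snapshot localmodconfig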
 
@@ -271,6 +273,9 @@ again.
    does nothing at all; in that case you have to manually install your kernel,
    as outlined in the reference section.
 
+   If you are running an immutable Linux distribution, check its documentation
+   and the web to find out how to install your own kernel there.
+
    [:ref:`details<install>`]
 
 .. _another_sbs:
@@ -291,29 +296,29 @@ again.
    version you care about, as git otherwise might retrieve the entire commit
    history::
 
-     git fetch --shallow-exclude=v6.1 origin
-
-   If you modified the sources (for example by applying a patch), you now need
-   to discard those modifications; that's because git otherwise will not be able
-   to switch to the sources of another version due to potential conflicting
-   changes::
-
-     git reset --hard
+     git fetch --shallow-exclude=v6.0 origin
 
-   Now checkout the version you are interested in, as explained above::
+   Now switch to the version you are interested in -- but be aware the command
+   used here will discard any modifications you performed, as they would
+   conflict with the sources you want to checkout::
 
-     git checkout --detach origin/master
+     git checkout --force --detach origin/master
 
    At this point you might want to patch the sources again or set/modify a build
-   tag, as explained earlier; afterwards adjust the build configuration to the
-   new codebase and build your next kernel::
+   tag, as explained earlier. Afterwards adjust the build configuration to the
+   new codebase using olddefconfig, which will now adjust the configuration file
+   you prepared earlier using localmodconfig (~/linux/.config) for your next
+   kernel::
 
      # reminder: if you want to apply patches, do it at this point
      # reminder: you might want to update your build tag at this point
      make olddefconfig
+
+   Now build your kernel::
+
      make -j $(nproc --all)
 
-   Install the kernel as outlined above::
+   Afterwards install the kernel as outlined above::
 
      command -v installkernel && sudo make modules_install install
 
@@ -584,11 +589,11 @@ versions and individual commits at hand at any time::
     curl -L \
       https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/clone.bundle \
       -o linux-stable.git.bundle
-    git clone clone.bundle ~/linux/
+    git clone linux-stable.git.bundle ~/linux/
     rm linux-stable.git.bundle
     cd ~/linux/
-    git remote set-url origin
-    https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
+    git remote set-url origin \
+      https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
     git fetch origin
     git checkout --detach origin/master
 
index 80ee310..21e3d0b 100644 (file)
@@ -10,7 +10,7 @@ implementation.
    :maxdepth: 2
 
    arc/index
-   ../arm/index
+   arm/index
    ../arm64/index
    ia64/index
    ../loongarch/index
index 387ccbc..cb05d90 100644 (file)
@@ -287,6 +287,13 @@ Removing a directory will move all tasks and cpus owned by the group it
 represents to the parent. Removing one of the created CTRL_MON groups
 will automatically remove all MON groups below it.
 
+Moving MON group directories to a new parent CTRL_MON group is supported
+for the purpose of changing the resource allocations of a MON group
+without impacting its monitoring data or assigned tasks. This operation
+is not allowed for MON groups which monitor CPUs. No other move
+operation is currently allowed other than simply renaming a CTRL_MON or
+MON group.
+
 All groups contain the following files:
 
 "tasks":
index 484ef96..1da2220 100644 (file)
@@ -17,16 +17,37 @@ For ACPI on arm64, tables also fall into the following categories:
 
        -  Recommended: BERT, EINJ, ERST, HEST, PCCT, SSDT
 
-       -  Optional: BGRT, CPEP, CSRT, DBG2, DRTM, ECDT, FACS, FPDT, IBFT,
-          IORT, MCHI, MPST, MSCT, NFIT, PMTT, RASF, SBST, SLIT, SPMI, SRAT,
-          STAO, TCPA, TPM2, UEFI, XENV
+       -  Optional: AGDI, BGRT, CEDT, CPEP, CSRT, DBG2, DRTM, ECDT, FACS, FPDT,
+          HMAT, IBFT, IORT, MCHI, MPAM, MPST, MSCT, NFIT, PMTT, PPTT, RASF, SBST,
+          SDEI, SLIT, SPMI, SRAT, STAO, TCPA, TPM2, UEFI, XENV
 
-       -  Not supported: BOOT, DBGP, DMAR, ETDT, HPET, IVRS, LPIT, MSDM, OEMx,
-          PSDT, RSDT, SLIC, WAET, WDAT, WDRT, WPBT
+       -  Not supported: AEST, APMT, BOOT, DBGP, DMAR, ETDT, HPET, IVRS, LPIT,
+          MSDM, OEMx, PDTT, PSDT, RAS2, RSDT, SLIC, WAET, WDAT, WDRT, WPBT
 
 ====== ========================================================================
 Table  Usage for ARMv8 Linux
 ====== ========================================================================
+AEST   Signature Reserved (signature == "AEST")
+
+       **Arm Error Source Table**
+
+       This table informs the OS of any error nodes in the system that are
+       compliant with the Arm RAS architecture.
+
+AGDI   Signature Reserved (signature == "AGDI")
+
+       **Arm Generic diagnostic Dump and Reset Device Interface Table**
+
+       This table describes a non-maskable event that is used by the platform
+       firmware to request the OS to generate a diagnostic dump and reset the device.
+
+APMT   Signature Reserved (signature == "APMT")
+
+       **Arm Performance Monitoring Table**
+
+       This table describes the properties of PMU support implemented by
+       components in the system.
+
 BERT   Section 18.3 (signature == "BERT")
 
        **Boot Error Record Table**
@@ -47,6 +68,13 @@ BGRT   Section 5.2.22 (signature == "BGRT")
        Optional, not currently supported, with no real use-case for an
        ARM server.
 
+CEDT   Signature Reserved (signature == "CEDT")
+
+       **CXL Early Discovery Table**
+
+       This table allows the OS to discover any CXL Host Bridges and the Host
+       Bridge registers.
+
 CPEP   Section 5.2.18 (signature == "CPEP")
 
        **Corrected Platform Error Polling table**
@@ -184,6 +212,15 @@ HEST   Section 18.3.2 (signature == "HEST")
        Must be supplied if RAS support is provided by the platform.  It
        is recommended this table be supplied.
 
+HMAT   Section 5.2.28 (signature == "HMAT")
+
+       **Heterogeneous Memory Attribute Table**
+
+       This table describes the memory attributes, such as memory side cache
+       attributes and bandwidth and latency details, related to Memory Proximity
+       Domains. The OS uses this information to optimize the system memory
+       configuration.
+
 HPET   Signature Reserved (signature == "HPET")
 
        **High Precision Event timer Table**
@@ -241,6 +278,13 @@ MCHI   Signature Reserved (signature == "MCHI")
 
        Optional, not currently supported.
 
+MPAM   Signature Reserved (signature == "MPAM")
+
+       **Memory Partitioning And Monitoring table**
+
+       This table allows the OS to discover the MPAM controls implemented by
+       the subsystems.
+
 MPST   Section 5.2.21 (signature == "MPST")
 
        **Memory Power State Table**
@@ -281,18 +325,39 @@ PCCT   Section 14.1 (signature == "PCCT)
        Recommend for use on arm64; use of PCC is recommended when using CPPC
        to control performance and power for platform processors.
 
+PDTT   Section 5.2.29 (signature == "PDTT")
+
+       **Platform Debug Trigger Table**
+
+       This table describes PCC channels used to gather debug logs of
+       non-architectural features.
+
+
 PMTT   Section 5.2.21.12 (signature == "PMTT")
 
        **Platform Memory Topology Table**
 
        Optional, not currently supported.
 
+PPTT   Section 5.2.30 (signature == "PPTT")
+
+       **Processor Properties Topology Table**
+
+       This table provides the processor and cache topology.
+
 PSDT   Section 5.2.11.3 (signature == "PSDT")
 
        **Persistent System Description Table**
 
        Obsolete table, will not be supported.
 
+RAS2   Section 5.2.21 (signature == "RAS2")
+
+       **RAS Features 2 table**
+
+       This table provides interfaces for the RAS capabilities implemented in
+       the platform.
+
 RASF   Section 5.2.20 (signature == "RASF")
 
        **RAS Feature table**
@@ -318,6 +383,12 @@ SBST   Section 5.2.14 (signature == "SBST")
 
        Optional, not currently supported.
 
+SDEI   Signature Reserved (signature == "SDEI")
+
+       **Software Delegated Exception Interface table**
+
+       This table advertises the presence of the SDEI interface.
+
 SLIC   Signature Reserved (signature == "SLIC")
 
        **Software LIcensing table**
index 47ecb99..37ec5e9 100644 (file)
@@ -1,40 +1,41 @@
-=====================
-ACPI on ARMv8 Servers
-=====================
-
-ACPI can be used for ARMv8 general purpose servers designed to follow
-the ARM SBSA (Server Base System Architecture) [0] and SBBR (Server
-Base Boot Requirements) [1] specifications.  Please note that the SBBR
-can be retrieved simply by visiting [1], but the SBSA is currently only
-available to those with an ARM login due to ARM IP licensing concerns.
-
-The ARMv8 kernel implements the reduced hardware model of ACPI version
+===================
+ACPI on Arm systems
+===================
+
+ACPI can be used for Armv8 and Armv9 systems designed to follow
+the BSA (Arm Base System Architecture) [0] and BBR (Arm
+Base Boot Requirements) [1] specifications.  Both BSA and BBR are publicly
+accessible documents.
+Arm Servers, in addition to being BSA compliant, comply with a set
+of rules defined in SBSA (Server Base System Architecture) [2].
+
+The Arm kernel implements the reduced hardware model of ACPI version
 5.1 or later.  Links to the specification and all external documents
 it refers to are managed by the UEFI Forum.  The specification is
 available at http://www.uefi.org/specifications and documents referenced
 by the specification can be found via http://www.uefi.org/acpi.
 
-If an ARMv8 system does not meet the requirements of the SBSA and SBBR,
+If an Arm system does not meet the requirements of the BSA and BBR,
 or cannot be described using the mechanisms defined in the required ACPI
 specifications, then ACPI may not be a good fit for the hardware.
 
 While the documents mentioned above set out the requirements for building
-industry-standard ARMv8 servers, they also apply to more than one operating
+industry-standard Arm systems, they also apply to more than one operating
 system.  The purpose of this document is to describe the interaction between
-ACPI and Linux only, on an ARMv8 system -- that is, what Linux expects of
+ACPI and Linux only, on an Arm system -- that is, what Linux expects of
 ACPI and what ACPI can expect of Linux.
 
 
-Why ACPI on ARM?
+Why ACPI on Arm?
 ----------------
 Before examining the details of the interface between ACPI and Linux, it is
 useful to understand why ACPI is being used.  Several technologies already
 exist in Linux for describing non-enumerable hardware, after all.  In this
-section we summarize a blog post [2] from Grant Likely that outlines the
-reasoning behind ACPI on ARMv8 servers.  Actually, we snitch a good portion
+section we summarize a blog post [3] from Grant Likely that outlines the
+reasoning behind ACPI on Arm systems.  Actually, we snitch a good portion
 of the summary text almost directly, to be honest.
 
-The short form of the rationale for ACPI on ARM is:
+The short form of the rationale for ACPI on Arm is:
 
 -  ACPI’s byte code (AML) allows the platform to encode hardware behavior,
    while DT explicitly does not support this.  For hardware vendors, being
@@ -47,7 +48,7 @@ The short form of the rationale for ACPI on ARM is:
 
 -  In the enterprise server environment, ACPI has established bindings (such
    as for RAS) which are currently used in production systems.  DT does not.
-   Such bindings could be defined in DT at some point, but doing so means ARM
+   Such bindings could be defined in DT at some point, but doing so means Arm
    and x86 would end up using completely different code paths in both firmware
    and the kernel.
 
@@ -108,7 +109,7 @@ recent version of the kernel.
 
 Relationship with Device Tree
 -----------------------------
-ACPI support in drivers and subsystems for ARMv8 should never be mutually
+ACPI support in drivers and subsystems for Arm should never be mutually
 exclusive with DT support at compile time.
 
 At boot time the kernel will only use one description method depending on
@@ -121,11 +122,11 @@ time).
 
 Booting using ACPI tables
 -------------------------
-The only defined method for passing ACPI tables to the kernel on ARMv8
+The only defined method for passing ACPI tables to the kernel on Arm
 is via the UEFI system configuration table.  Just so it is explicit, this
 means that ACPI is only supported on platforms that boot via UEFI.
 
-When an ARMv8 system boots, it can either have DT information, ACPI tables,
+When an Arm system boots, it can either have DT information, ACPI tables,
 or in some very unusual cases, both.  If no command line parameters are used,
 the kernel will try to use DT for device enumeration; if there is no DT
 present, the kernel will try to use ACPI tables, but only if they are present.
@@ -169,7 +170,7 @@ hardware reduced mode must be set to zero.
 
 For the ACPI core to operate properly, and in turn provide the information
 the kernel needs to configure devices, it expects to find the following
-tables (all section numbers refer to the ACPI 6.1 specification):
+tables (all section numbers refer to the ACPI 6.5 specification):
 
     -  RSDP (Root System Description Pointer), section 5.2.5
 
@@ -184,20 +185,76 @@ tables (all section numbers refer to the ACPI 6.1 specification):
 
     -  GTDT (Generic Timer Description Table), section 5.2.24
 
+    -  PPTT (Processor Properties Topology Table), section 5.2.30
+
+    -  DBG2 (DeBuG port table 2), section 5.2.6, specifically Table 5-6.
+
+    -  APMT (Arm Performance Monitoring unit Table), section 5.2.6, specifically Table 5-6.
+
+    -  AGDI (Arm Generic diagnostic Dump and Reset Device Interface Table), section 5.2.6, specifically Table 5-6.
+
     -  If PCI is supported, the MCFG (Memory mapped ConFiGuration
-       Table), section 5.2.6, specifically Table 5-31.
+       Table), section 5.2.6, specifically Table 5-6.
 
     -  If booting without a console=<device> kernel parameter is
        supported, the SPCR (Serial Port Console Redirection table),
-       section 5.2.6, specifically Table 5-31.
+       section 5.2.6, specifically Table 5-6.
 
     -  If necessary to describe the I/O topology, SMMUs and GIC ITSs,
        the IORT (Input Output Remapping Table, section 5.2.6, specifically
-       Table 5-31).
+       Table 5-6).
+
+    -  If NUMA is supported, the following tables are required:
+
+       - SRAT (System Resource Affinity Table), section 5.2.16
+
+       - SLIT (System Locality distance Information Table), section 5.2.17
+
+    -  If NUMA is supported, and the system contains heterogeneous memory,
+       the HMAT (Heterogeneous Memory Attribute Table), section 5.2.28.
+
+    -  If the ACPI Platform Error Interfaces are required, the following
+       tables are conditionally required:
+
+       - BERT (Boot Error Record Table, section 18.3.1)
+
+       - EINJ (Error INJection table, section 18.6.1)
+
+       - ERST (Error Record Serialization Table, section 18.5)
+
+       - HEST (Hardware Error Source Table, section 18.3.2)
+
+       - SDEI (Software Delegated Exception Interface table, section 5.2.6,
+         specifically Table 5-6)
+
+       - AEST (Arm Error Source Table, section 5.2.6,
+         specifically Table 5-6)
+
+       - RAS2 (ACPI RAS2 feature table, section 5.2.21)
+
+    -  If the system contains controllers using PCC channel, the
+       PCCT (Platform Communications Channel Table), section 14.1
+
+    -  If the system contains a controller that captures board-level system
+       state and communicates with the host via PCC, the PDTT (Platform Debug
+       Trigger Table), section 5.2.29.
+
+    -  If NVDIMM is supported, the NFIT (NVDIMM Firmware Interface Table), section 5.2.26
+
+    -  If video framebuffer is present, the BGRT (Boot Graphics Resource Table), section 5.2.23
+
+    -  If IPMI is implemented, the SPMI (Server Platform Management Interface),
+       section 5.2.6, specifically Table 5-6.
+
+    -  If the system contains a CXL Host Bridge, the CEDT (CXL Early Discovery
+       Table), section 5.2.6, specifically Table 5-6.
+
+    -  If the system supports MPAM, the MPAM (Memory Partitioning And Monitoring table), section 5.2.6,
+       specifically Table 5-6.
+
+    -  If the system lacks persistent storage, the IBFT (ISCSI Boot Firmware
+       Table), section 5.2.6, specifically Table 5-6.
 
-    -  If NUMA is supported, the SRAT (System Resource Affinity Table)
-       and SLIT (System Locality distance Information Table), sections
-       5.2.16 and 5.2.17, respectively.
 
 If the above tables are not all present, the kernel may or may not be
 able to boot properly since it may not be able to configure all of the
@@ -269,16 +326,14 @@ Drivers should look for device properties in the _DSD object ONLY; the _DSD
 object is described in the ACPI specification section 6.2.5, but this only
 describes how to define the structure of an object returned via _DSD, and
 how specific data structures are defined by specific UUIDs.  Linux should
-only use the _DSD Device Properties UUID [5]:
+only use the _DSD Device Properties UUID [4]:
 
    - UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301
 
-   - https://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf
-
-The UEFI Forum provides a mechanism for registering device properties [4]
-so that they may be used across all operating systems supporting ACPI.
-Device properties that have not been registered with the UEFI Forum should
-not be used.
+Common device properties can be registered by creating a pull request to [4] so
+that they may be used across all operating systems supporting ACPI.
+Device properties that have not been registered with the UEFI Forum can be used
+but not as "uefi-" common properties.
 
 Before creating new device properties, check to be sure that they have not
 been defined before and either registered in the Linux kernel documentation
@@ -306,7 +361,7 @@ process.
 
 Once registration and review have been completed, the kernel provides an
 interface for looking up device properties in a manner independent of
-whether DT or ACPI is being used.  This API should be used [6]; it can
+whether DT or ACPI is being used.  This API should be used [5]; it can
 eliminate some duplication of code paths in driver probing functions and
 discourage divergence between DT bindings and ACPI device properties.
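+
+As a brief, illustrative sketch of using this API from a driver probe path
+(the device and the "vendor,led-count" / "vendor,active-low" property names
+are hypothetical, not defined anywhere in the kernel)::
+
+    #include <linux/device.h>
+    #include <linux/property.h>
+
+    static int example_probe(struct device *dev)
+    {
+            u32 led_count;
+            int ret;
+
+            /* Works whether the property came from DT or from ACPI _DSD. */
+            ret = device_property_read_u32(dev, "vendor,led-count", &led_count);
+            if (ret)
+                    return ret;
+
+            if (device_property_present(dev, "vendor,active-low"))
+                    dev_info(dev, "LEDs are active low\n");
+
+            return 0;
+    }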
 
@@ -448,15 +503,15 @@ ASWG
 ----
 The ACPI specification changes regularly.  During the year 2014, for instance,
 version 5.1 was released and version 6.0 substantially completed, with most of
-the changes being driven by ARM-specific requirements.  Proposed changes are
+the changes being driven by Arm-specific requirements.  Proposed changes are
 presented and discussed in the ASWG (ACPI Specification Working Group) which
 is a part of the UEFI Forum.  The current version of the ACPI specification
-is 6.1 release in January 2016.
+is 6.5, released in August 2022.
 
 Participation in this group is open to all UEFI members.  Please see
 http://www.uefi.org/workinggroup for details on group membership.
 
-It is the intent of the ARMv8 ACPI kernel code to follow the ACPI specification
+It is the intent of the Arm ACPI kernel code to follow the ACPI specification
 as closely as possible, and to only implement functionality that complies with
 the released standards from UEFI ASWG.  As a practical matter, there will be
 vendors that provide bad ACPI tables or violate the standards in some way.
@@ -470,12 +525,12 @@ likely be willing to assist in submitting ECRs.
 
 Linux Code
 ----------
-Individual items specific to Linux on ARM, contained in the Linux
+Individual items specific to Linux on Arm, contained in the Linux
 source code, are in the list that follows:
 
 ACPI_OS_NAME
                        This macro defines the string to be returned when
-                       an ACPI method invokes the _OS method.  On ARM64
+                       an ACPI method invokes the _OS method.  On Arm
                        systems, this macro will be "Linux" by default.
                        The command line parameter acpi_os=<string>
                        can be used to set it to some other value.  The
@@ -490,31 +545,23 @@ Documentation/arm64/acpi_object_usage.rst.
 
 References
 ----------
-[0] http://silver.arm.com
-    document ARM-DEN-0029, or newer:
-    "Server Base System Architecture", version 2.3, dated 27 Mar 2014
+[0] https://developer.arm.com/documentation/den0094/latest
+    Document Arm-DEN-0094: "Arm Base System Architecture", version 1.0C, dated 6 Oct 2022
+
+[1] https://developer.arm.com/documentation/den0044/latest
+    Document Arm-DEN-0044: "Arm Base Boot Requirements", version 2.0G, dated 15 Apr 2022
 
-[1] http://infocenter.arm.com/help/topic/com.arm.doc.den0044a/Server_Base_Boot_Requirements.pdf
-    Document ARM-DEN-0044A, or newer: "Server Base Boot Requirements, System
-    Software on ARM Platforms", dated 16 Aug 2014
+[2] https://developer.arm.com/documentation/den0029/latest
+    Document Arm-DEN-0029: "Arm Server Base System Architecture", version 7.1, dated 06 Oct 2022
 
-[2] http://www.secretlab.ca/archives/151,
+[3] http://www.secretlab.ca/archives/151,
     10 Jan 2015, Copyright (c) 2015,
     Linaro Ltd., written by Grant Likely.
 
-[3] AMD ACPI for Seattle platform documentation
-    http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Seattle_ACPI_Guide.pdf
-
-
-[4] http://www.uefi.org/acpi
-    please see the link for the "ACPI _DSD Device
-    Property Registry Instructions"
-
-[5] http://www.uefi.org/acpi
-    please see the link for the "_DSD (Device
-    Specific Data) Implementation Guide"
+[4] _DSD (Device Specific Data) Implementation Guide
+    https://github.com/UEFI/DSD-Guide/blob/main/dsd-guide.pdf
 
-[6] Kernel code for the unified device
+[5] Kernel code for the unified device
     property interface can be found in
     include/linux/property.h and drivers/base/property.c.
 
index ffeccdd..b57776a 100644 (file)
@@ -379,6 +379,38 @@ Before jumping into the kernel, the following conditions must be met:
 
     - SMCR_EL2.EZT0 (bit 30) must be initialised to 0b1.
 
+  For CPUs with Memory Copy and Memory Set instructions (FEAT_MOPS):
+
+  - If the kernel is entered at EL1 and EL2 is present:
+
+    - HCRX_EL2.MSCEn (bit 11) must be initialised to 0b1.
+
+  For CPUs with the Extended Translation Control Register feature (FEAT_TCR2):
+
+  - If EL3 is present:
+
+    - SCR_EL3.TCR2En (bit 43) must be initialised to 0b1.
+
+  - If the kernel is entered at EL1 and EL2 is present:
+
+    - HCRX_EL2.TCR2En (bit 14) must be initialised to 0b1.
+
+  For CPUs with the Stage 1 Permission Indirection Extension feature (FEAT_S1PIE):
+
+  - If EL3 is present:
+
+    - SCR_EL3.PIEn (bit 45) must be initialised to 0b1.
+
+  - If the kernel is entered at EL1 and EL2 is present:
+
+    - HFGRTR_EL2.nPIR_EL1 (bit 58) must be initialised to 0b1.
+
+    - HFGWTR_EL2.nPIR_EL1 (bit 58) must be initialised to 0b1.
+
+    - HFGRTR_EL2.nPIRE0_EL1 (bit 57) must be initialised to 0b1.
+
+    - HFGWTR_EL2.nPIRE0_EL1 (bit 57) must be initialised to 0b1.
+
 The requirements described above for CPU mode, caches, MMUs, architected
 timers, coherency and system registers apply to all CPUs.  All CPUs must
 enter the kernel in the same exception level.  Where the values documented
index c7adc78..4e4625f 100644 (file)
@@ -288,6 +288,8 @@ infrastructure:
      +------------------------------+---------+---------+
      | Name                         |  bits   | visible |
      +------------------------------+---------+---------+
+     | MOPS                         | [19-16] |    y    |
+     +------------------------------+---------+---------+
      | RPRES                        | [7-4]   |    y    |
      +------------------------------+---------+---------+
      | WFXT                         | [3-0]   |    y    |
index 83e57e4..8f847d0 100644 (file)
@@ -302,6 +302,9 @@ HWCAP2_SMEB16B16
 HWCAP2_SMEF16F16
     Functionality implied by ID_AA64SMFR0_EL1.F16F16 == 0b1
 
+HWCAP2_MOPS
+    Functionality implied by ID_AA64ISAR2_EL1.MOPS == 0b0001.
+
 4. Unused AT_HWCAP bits
 -----------------------
 
index ae21f81..d08e924 100644 (file)
@@ -15,11 +15,13 @@ ARM64 Architecture
     cpu-feature-registers
     elf_hwcaps
     hugetlbpage
+    kdump
     legacy_instructions
     memory
     memory-tagging-extension
     perf
     pointer-authentication
+    ptdump
     silicon-errata
     sme
     sve
diff --git a/Documentation/arm64/kdump.rst b/Documentation/arm64/kdump.rst
new file mode 100644 (file)
index 0000000..56a89f4
--- /dev/null
@@ -0,0 +1,92 @@
+=======================================
+crashkernel memory reservation on arm64
+=======================================
+
+Author: Baoquan He <bhe@redhat.com>
+
+The kdump mechanism is used to capture the vmcore of a crashed kernel so
+that it can be analyzed later. To make this possible, memory is reserved
+in advance so that a kdump kernel can be pre-loaded and booted when the
+first kernel crashes.
+
+The reserved memory needs to be large enough to minimally accommodate the
+kdump kernel and the user space programs needed for vmcore collection.
+
+Kernel parameter
+================
+
+Through the kernel parameters below, memory can be reserved during the
+early stage of the first kernel's boot so that a contiguous, sufficiently
+large chunk of memory can be found. The low memory reservation needs to
+be considered if the crashkernel is reserved from the high memory area.
+
+- crashkernel=size@offset
+- crashkernel=size
+- crashkernel=size,high crashkernel=size,low
+
+Low memory and high memory
+==========================
+
+For kdump reservations, low memory is the memory area under a specific
+limit, usually decided by the accessible address bits of the DMA-capable
+devices needed by the kdump kernel to run. Those devices not related to
+vmcore dumping can be ignored. On arm64, the low memory upper bound is
+not fixed: it is 1G on the RPi4 platform but 4G on most other systems.
+On special kernels built with CONFIG_ZONE_(DMA|DMA32) disabled, the
+whole system RAM is low memory. Outside of the low memory described
+above, the rest of system RAM is considered high memory.
+
+Implementation
+==============
+
+1) crashkernel=size@offset
+--------------------------
+
+The crashkernel memory is reserved at the user-specified region; the
+reservation fails if that region is already occupied.
+
+
+2) crashkernel=size
+-------------------
+
+The crashkernel memory region will be reserved in any available position
+according to the search order:
+
+Firstly, the kernel searches the low memory area for an available region
+with the specified size.
+
+If searching for low memory fails, the kernel falls back to searching
+the high memory area for an available region of the specified size. If
+the reservation in high memory succeeds, a default size reservation in
+the low memory will be done. Currently the default size is 128M,
+sufficient for the low memory needs of the kdump kernel.
+
+Note: crashkernel=size is the recommended option for crashkernel
+reservations: the user does not need to know the system memory layout
+of a specific platform.
+
+3) crashkernel=size,high crashkernel=size,low
+---------------------------------------------
+
+crashkernel=size,(high|low) are an important supplement to
+crashkernel=size. They allow the user to specify how much memory needs
+to be allocated from the high memory and low memory respectively. On
+many systems the low memory is precious and crashkernel reservations
+from this area should be kept to a minimum.
+
+To reserve memory for crashkernel=size,high, searching is first
+attempted from the high memory region. If the reservation succeeds, the
+low memory reservation will be done subsequently.
+
+If the reservation from high memory fails, the kernel falls back to
+searching the low memory for a region of the size specified in
+crashkernel=,high. If it succeeds, no further low memory reservation is
+needed.
+
+Notes:
+
+- If crashkernel=,low is not specified, the default low memory
+  reservation will be done automatically.
+
+- If crashkernel=0,low is specified, the low memory reservation is
+  intentionally omitted.
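+
+As an illustrative example (the sizes are arbitrary and depend on the
+platform and the kdump user space in use), a system that wants most of the
+crash memory taken from the high memory area while capping the low memory
+reservation might boot with::
+
+    crashkernel=2G,high crashkernel=256M,low
+
+whereas a plain ``crashkernel=512M`` lets the kernel pick a suitable
+region automatically.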
index 2a641ba..55a55f3 100644 (file)
@@ -33,8 +33,8 @@ AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit)::
   0000000000000000     0000ffffffffffff         256TB          user
   ffff000000000000     ffff7fffffffffff         128TB          kernel logical memory map
  [ffff600000000000     ffff7fffffffffff]         32TB          [kasan shadow region]
-  ffff800000000000     ffff800007ffffff         128MB          modules
-  ffff800008000000     fffffbffefffffff         124TB          vmalloc
+  ffff800000000000     ffff80007fffffff           2GB          modules
+  ffff800080000000     fffffbffefffffff         124TB          vmalloc
   fffffbfff0000000     fffffbfffdffffff         224MB          fixed mappings (top down)
   fffffbfffe000000     fffffbfffe7fffff           8MB          [guard region]
   fffffbfffe800000     fffffbffff7fffff          16MB          PCI I/O space
@@ -50,8 +50,8 @@ AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support):
   0000000000000000     000fffffffffffff           4PB          user
   fff0000000000000     ffff7fffffffffff          ~4PB          kernel logical memory map
  [fffd800000000000     ffff7fffffffffff]        512TB          [kasan shadow region]
-  ffff800000000000     ffff800007ffffff         128MB          modules
-  ffff800008000000     fffffbffefffffff         124TB          vmalloc
+  ffff800000000000     ffff80007fffffff           2GB          modules
+  ffff800080000000     fffffbffefffffff         124TB          vmalloc
   fffffbfff0000000     fffffbfffdffffff         224MB          fixed mappings (top down)
   fffffbfffe000000     fffffbfffe7fffff           8MB          [guard region]
   fffffbfffe800000     fffffbffff7fffff          16MB          PCI I/O space
diff --git a/Documentation/arm64/ptdump.rst b/Documentation/arm64/ptdump.rst
new file mode 100644 (file)
index 0000000..5dcfc5d
--- /dev/null
@@ -0,0 +1,96 @@
+======================
+Kernel page table dump
+======================
+
+ptdump is a debugfs interface that provides a detailed dump of the
+kernel page tables. It offers a comprehensive overview of the kernel
+virtual memory layout as well as the attributes associated with the
+various regions in a human-readable format. It is useful to dump the
+kernel page tables to verify permissions and memory types. Examining the
+page table entries and permissions helps identify potential security
+vulnerabilities such as mappings with overly permissive access rights or
+improper memory protections.
+
+Memory hotplug allows dynamic expansion or contraction of available
+memory without requiring a system reboot. To maintain the consistency
+and integrity of the memory management data structures, arm64 makes use
+of the ``mem_hotplug_lock`` semaphore in write mode. Additionally, in
+read mode, ``mem_hotplug_lock`` supports an efficient implementation of
+``get_online_mems()`` and ``put_online_mems()``. These protect the
+offlining of memory being accessed by the ptdump code.
+
+In order to dump the kernel page tables, enable the following
+configurations and mount debugfs::
+
+ CONFIG_GENERIC_PTDUMP=y
+ CONFIG_PTDUMP_CORE=y
+ CONFIG_PTDUMP_DEBUGFS=y
+
+ mount -t debugfs nodev /sys/kernel/debug
+ cat /sys/kernel/debug/kernel_page_tables
+
+Each entry in the output of ``cat /sys/kernel/debug/kernel_page_tables``
+shows the virtual address range it covers, the size of that memory
+region, its level in the page table hierarchy and, finally, the
+attributes associated with the pages. The page attributes describe the
+access permissions, execution capability, type of mapping (a leaf-level
+PTE or a block-level PGD, PMD or PUD) and access status of a page within
+kernel memory. Assessing these attributes can assist in understanding
+the memory layout, access patterns and security characteristics of the
+kernel pages.
+
+Kernel virtual memory layout example::
+
+ start address        end address         size             attributes
+ +---------------------------------------------------------------------------------------+
+ | ---[ Linear Mapping start ]---------------------------------------------------------- |
+ | ..................                                                                    |
+ | 0xfff0000000000000-0xfff0000000210000  2112K PTE RW NX SHD AF  UXN  MEM/NORMAL-TAGGED |
+ | 0xfff0000000210000-0xfff0000001c00000 26560K PTE ro NX SHD AF  UXN  MEM/NORMAL        |
+ | ..................                                                                    |
+ | ---[ Linear Mapping end ]------------------------------------------------------------ |
+ +---------------------------------------------------------------------------------------+
+ | ---[ Modules start ]----------------------------------------------------------------- |
+ | ..................                                                                    |
+ | 0xffff800000000000-0xffff800008000000   128M PTE                                      |
+ | ..................                                                                    |
+ | ---[ Modules end ]------------------------------------------------------------------- |
+ +---------------------------------------------------------------------------------------+
+ | ---[ vmalloc() area ]---------------------------------------------------------------- |
+ | ..................                                                                    |
+ | 0xffff800008010000-0xffff800008200000  1984K PTE ro x  SHD AF       UXN  MEM/NORMAL   |
+ | 0xffff800008200000-0xffff800008e00000    12M PTE ro x  SHD AF  CON  UXN  MEM/NORMAL   |
+ | ..................                                                                    |
+ | ---[ vmalloc() end ]----------------------------------------------------------------- |
+ +---------------------------------------------------------------------------------------+
+ | ---[ Fixmap start ]------------------------------------------------------------------ |
+ | ..................                                                                    |
+ | 0xfffffbfffdb80000-0xfffffbfffdb90000    64K PTE ro x  SHD AF  UXN  MEM/NORMAL        |
+ | 0xfffffbfffdb90000-0xfffffbfffdba0000    64K PTE ro NX SHD AF  UXN  MEM/NORMAL        |
+ | ..................                                                                    |
+ | ---[ Fixmap end ]-------------------------------------------------------------------- |
+ +---------------------------------------------------------------------------------------+
+ | ---[ PCI I/O start ]----------------------------------------------------------------- |
+ | ..................                                                                    |
+ | 0xfffffbfffe800000-0xfffffbffff800000    16M PTE                                      |
+ | ..................                                                                    |
+ | ---[ PCI I/O end ]------------------------------------------------------------------- |
+ +---------------------------------------------------------------------------------------+
+ | ---[ vmemmap start ]----------------------------------------------------------------- |
+ | ..................                                                                    |
+ | 0xfffffc0002000000-0xfffffc0002200000     2M PTE RW NX SHD AF  UXN  MEM/NORMAL        |
+ | 0xfffffc0002200000-0xfffffc0020000000   478M PTE                                      |
+ | ..................                                                                    |
+ | ---[ vmemmap end ]------------------------------------------------------------------- |
+ +---------------------------------------------------------------------------------------+
+
+``cat /sys/kernel/debug/kernel_page_tables`` output::
+
+ 0xfff0000001c00000-0xfff0000080000000     2020M PTE  RW NX SHD AF   UXN    MEM/NORMAL-TAGGED
+ 0xfff0000080000000-0xfff0000800000000       30G PMD
+ 0xfff0000800000000-0xfff0000800700000        7M PTE  RW NX SHD AF   UXN    MEM/NORMAL-TAGGED
+ 0xfff0000800700000-0xfff0000800710000       64K PTE  ro NX SHD AF   UXN    MEM/NORMAL-TAGGED
+ 0xfff0000800710000-0xfff0000880000000  2089920K PTE  RW NX SHD AF   UXN    MEM/NORMAL-TAGGED
+ 0xfff0000880000000-0xfff0040000000000     4062G PMD
+ 0xfff0040000000000-0xffff800000000000     3964T PGD
index 9e311bc..d6430ad 100644 (file)
@@ -214,3 +214,7 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | Fujitsu        | A64FX           | E#010001        | FUJITSU_ERRATUM_010001      |
 +----------------+-----------------+-----------------+-----------------------------+
+
++----------------+-----------------+-----------------+-----------------------------+
+| ASR            | ASR8601         | #8601001        | N/A                         |
++----------------+-----------------+-----------------+-----------------------------+
index 1029531..9fea696 100644 (file)
@@ -18,7 +18,6 @@ Block
    kyber-iosched
    null_blk
    pr
-   request
    stat
    switching-sched
    writeback_cache_control
diff --git a/Documentation/block/request.rst b/Documentation/block/request.rst
deleted file mode 100644 (file)
index 747021e..0000000
+++ /dev/null
@@ -1,99 +0,0 @@
-============================
-struct request documentation
-============================
-
-Jens Axboe <jens.axboe@oracle.com> 27/05/02
-
-
-.. FIXME:
-   No idea about what does mean - seems just some noise, so comment it
-
-   1.0
-   Index
-
-   2.0 Struct request members classification
-
-       2.1 struct request members explanation
-
-   3.0
-
-
-   2.0
-
-
-
-Short explanation of request members
-====================================
-
-Classification flags:
-
-       =       ====================
-       D       driver member
-       B       block layer member
-       I       I/O scheduler member
-       =       ====================
-
-Unless an entry contains a D classification, a device driver must not access
-this member. Some members may contain D classifications, but should only be
-access through certain macros or functions (eg ->flags).
-
-<linux/blkdev.h>
-
-=============================== ======= =======================================
-Member                         Flag    Comment
-=============================== ======= =======================================
-struct list_head queuelist     BI      Organization on various internal
-                                       queues
-
-``void *elevator_private``     I       I/O scheduler private data
-
-unsigned char cmd[16]          D       Driver can use this for setting up
-                                       a cdb before execution, see
-                                       blk_queue_prep_rq
-
-unsigned long flags            DBI     Contains info about data direction,
-                                       request type, etc.
-
-int rq_status                  D       Request status bits
-
-kdev_t rq_dev                  DBI     Target device
-
-int errors                     DB      Error counts
-
-sector_t sector                        DBI     Target location
-
-unsigned long hard_nr_sectors  B       Used to keep sector sane
-
-unsigned long nr_sectors       DBI     Total number of sectors in request
-
-unsigned long hard_nr_sectors  B       Used to keep nr_sectors sane
-
-unsigned short nr_phys_segments        DB      Number of physical scatter gather
-                                       segments in a request
-
-unsigned short nr_hw_segments  DB      Number of hardware scatter gather
-                                       segments in a request
-
-unsigned int current_nr_sectors        DB      Number of sectors in first segment
-                                       of request
-
-unsigned int hard_cur_sectors  B       Used to keep current_nr_sectors sane
-
-int tag                                DB      TCQ tag, if assigned
-
-``void *special``              D       Free to be used by driver
-
-``char *buffer``               D       Map of first segment, also see
-                                       section on bouncing SECTION
-
-``struct completion *waiting`` D       Can be used by driver to get signalled
-                                       on request completion
-
-``struct bio *bio``            DBI     First bio in request
-
-``struct bio *biotail``                DBI     Last bio in request
-
-``struct request_queue *q``    DB      Request queue this request belongs to
-
-``struct request_list *rl``    B       Request list this request came from
-=============================== ======= =======================================
index e87a878..e9b022d 100644 (file)
@@ -1,8 +1,8 @@
 .. SPDX-License-Identifier: GPL-2.0
 
-=====
-cdrom
-=====
+======
+CD-ROM
+======
 
 .. toctree::
     :maxdepth: 1
index 37314af..d4fdf6a 100644 (file)
@@ -74,6 +74,7 @@ if major >= 3:
             "__percpu",
             "__rcu",
             "__user",
+            "__force",
 
             # include/linux/compiler_attributes.h:
             "__alias",
index f75778d..e6f5bc3 100644 (file)
@@ -127,17 +127,8 @@ bring CPU4 back online::
  $ echo 1 > /sys/devices/system/cpu/cpu4/online
  smpboot: Booting Node 0 Processor 4 APIC 0x1
 
-The CPU is usable again. This should work on all CPUs. CPU0 is often special
-and excluded from CPU hotplug. On X86 the kernel option
-*CONFIG_BOOTPARAM_HOTPLUG_CPU0* has to be enabled in order to be able to
-shutdown CPU0. Alternatively the kernel command option *cpu0_hotplug* can be
-used. Some known dependencies of CPU0:
-
-* Resume from hibernate/suspend. Hibernate/suspend will fail if CPU0 is offline.
-* PIC interrupts. CPU0 can't be removed if a PIC interrupt is detected.
-
-Please let Fenghua Yu <fenghua.yu@intel.com> know if you find any dependencies
-on CPU0.
+The CPU is usable again. This should work on all CPUs, but CPU0 is often special
+and excluded from CPU hotplug.
 
 The CPU hotplug coordination
 ============================
index 9b3f3e5..f2bcc5a 100644 (file)
@@ -96,6 +96,12 @@ Command-line Parsing
 .. kernel-doc:: lib/cmdline.c
    :export:
 
+Error Pointers
+--------------
+
+.. kernel-doc:: include/linux/err.h
+   :internal:
+
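+As a brief, illustrative sketch of the pattern these helpers support
+(``struct foo`` and the functions below are hypothetical)::
+
+    #include <linux/err.h>
+    #include <linux/slab.h>
+
+    struct foo { int val; };
+
+    /* Return a valid pointer or an encoded errno in the same value. */
+    static struct foo *foo_create(void)
+    {
+            struct foo *f = kzalloc(sizeof(*f), GFP_KERNEL);
+
+            if (!f)
+                    return ERR_PTR(-ENOMEM);
+            return f;
+    }
+
+    static int foo_init(void)
+    {
+            struct foo *f = foo_create();
+
+            if (IS_ERR(f))
+                    return PTR_ERR(f);      /* decode the errno */
+            /* ... use f ... */
+            return 0;
+    }
+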
 Sorting
 -------
 
@@ -412,3 +418,15 @@ Read-Copy Update (RCU)
 .. kernel-doc:: include/linux/rcu_sync.h
 
 .. kernel-doc:: kernel/rcu/sync.c
+
+.. kernel-doc:: kernel/rcu/tasks.h
+
+.. kernel-doc:: kernel/rcu/tree_stall.h
+
+.. kernel-doc:: include/linux/rcupdate_trace.h
+
+.. kernel-doc:: include/linux/rcupdate_wait.h
+
+.. kernel-doc:: include/linux/rcuref.h
+
+.. kernel-doc:: include/linux/rcutree.h
index 9fb0b10..d3c1f6d 100644 (file)
@@ -112,6 +112,12 @@ pages:
 This also leads to limitations: there are only 31-10==21 bits available for a
 counter that increments 10 bits at a time.
 
+* Because of that limitation, special handling is applied to the zero pages
+  when using FOLL_PIN.  We only pretend to pin a zero page - we don't alter its
+  refcount or pincount at all (it is permanent, so there's no need).  The
+  unpinning functions also don't do anything to a zero page.  This is
+  transparent to the caller.
+
 * Callers must specifically request "dma-pinned tracking of pages". In other
   words, just calling get_user_pages() will not suffice; a new set of functions,
   pin_user_page() and related, must be used.
index 5cb8b88..91acbcf 100644 (file)
@@ -53,7 +53,6 @@ preemption and interrupts::
        this_cpu_add_return(pcp, val)
        this_cpu_xchg(pcp, nval)
        this_cpu_cmpxchg(pcp, oval, nval)
-       this_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
        this_cpu_sub(pcp, val)
        this_cpu_inc(pcp)
        this_cpu_dec(pcp)
@@ -242,7 +241,6 @@ safe::
        __this_cpu_add_return(pcp, val)
        __this_cpu_xchg(pcp, nval)
        __this_cpu_cmpxchg(pcp, oval, nval)
-       __this_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
        __this_cpu_sub(pcp, val)
        __this_cpu_inc(pcp)
        __this_cpu_dec(pcp)
index 8ec4d62..a4c9b9d 100644 (file)
@@ -348,6 +348,37 @@ Guidelines
   level of locality in wq operations and work item execution.
 
 
+Monitoring
+==========
+
+Use tools/workqueue/wq_monitor.py to monitor workqueue operations: ::
+
+  $ tools/workqueue/wq_monitor.py events
+                              total  infl  CPUtime  CPUhog  CMwake  mayday rescued
+  events                      18545     0      6.1       0       5       -       -
+  events_highpri                  8     0      0.0       0       0       -       -
+  events_long                     3     0      0.0       0       0       -       -
+  events_unbound              38306     0      0.1       -       -       -       -
+  events_freezable                0     0      0.0       0       0       -       -
+  events_power_efficient      29598     0      0.2       0       0       -       -
+  events_freezable_power_        10     0      0.0       0       0       -       -
+  sock_diag_events                0     0      0.0       0       0       -       -
+
+                              total  infl  CPUtime  CPUhog  CMwake  mayday rescued
+  events                      18548     0      6.1       0       5       -       -
+  events_highpri                  8     0      0.0       0       0       -       -
+  events_long                     3     0      0.0       0       0       -       -
+  events_unbound              38322     0      0.1       -       -       -       -
+  events_freezable                0     0      0.0       0       0       -       -
+  events_power_efficient      29603     0      0.2       0       0       -       -
+  events_freezable_power_        10     0      0.0       0       0       -       -
+  sock_diag_events                0     0      0.0       0       0       -       -
+
+  ...
+
+See the command's help message for more info.
+
+
 Debugging
 =========
 
@@ -387,6 +418,7 @@ the stack trace of the offending worker thread. ::
 The work item's function should be trivially visible in the stack
 trace.
 
+
 Non-reentrance Conditions
 =========================
 
index bfc7739..27c146b 100644 (file)
@@ -66,7 +66,7 @@ features surfaced as a result:
 ::
 
   struct dma_async_tx_descriptor *
-  async_<operation>(<op specific parameters>, struct async_submit ctl *submit)
+  async_<operation>(<op specific parameters>, struct async_submit_ctl *submit)
 
 3.2 Supported operations
 ------------------------
index 12b575b..dd214af 100644 (file)
@@ -168,6 +168,28 @@ the `-t` option for specific single tests. Either can be used multiple times::
 
 For other features see the script usage output, seen with the `-h` option.
 
+Timeout for selftests
+=====================
+
+Selftests are designed to be quick, so a default timeout of 45 seconds is
+used for each test. Tests can override the default timeout by adding a
+settings file in their directory and setting a timeout variable there to
+the desired upper limit for the test. Only a few tests override the
+timeout with a value higher than 45 seconds; selftests strives to keep it
+that way. Timeouts in selftests are not considered fatal because the
+system under which a test runs may change and this can also modify the
+expected time it takes to run a test. If you have control over the systems
+which will run the tests, you can configure a test runner on those systems
+to use a greater or lower timeout on the command line with the `-o` or
+`--override-timeout` argument. For example, to use 165 seconds instead
+one would use:
+
+   $ ./run_kselftest.sh --override-timeout 165
+
+You can look at the TAP output to see if you ran into the timeout. Test
+runners which know a test must run under a specific time can then
+optionally treat these timeouts as fatal.
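+
+For example, a test directory could raise its own limit with a settings
+file (e.g. tools/testing/selftests/<your-test>/settings; the path and
+value here are illustrative) containing::
+
+   timeout=165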
+
 Packaging selftests
 ===================
 
index e95ab05..f335f88 100644 (file)
@@ -119,9 +119,9 @@ All expectations/assertions are formatted as:
          terminated immediately.
 
                - Assertions call the function:
-                 ``void __noreturn kunit_abort(struct kunit *)``.
+                 ``void __noreturn __kunit_abort(struct kunit *)``.
 
-               - ``kunit_abort`` calls the function:
+               - ``__kunit_abort`` calls the function:
                  ``void __noreturn kunit_try_catch_throw(struct kunit_try_catch *try_catch)``.
 
                - ``kunit_try_catch_throw`` calls the function:
index c736613..a982353 100644 (file)
@@ -250,15 +250,20 @@ Now we are ready to write the test cases.
        };
        kunit_test_suite(misc_example_test_suite);
 
+       MODULE_LICENSE("GPL");
+
 2. Add the following lines to ``drivers/misc/Kconfig``:
 
 .. code-block:: kconfig
 
        config MISC_EXAMPLE_TEST
                tristate "Test for my example" if !KUNIT_ALL_TESTS
-               depends on MISC_EXAMPLE && KUNIT=y
+               depends on MISC_EXAMPLE && KUNIT
                default KUNIT_ALL_TESTS
 
+Note: If your test does not support being built as a loadable module (which is
+discouraged), replace tristate with bool, and depend on KUNIT=y instead of KUNIT.
+
 3. Add the following lines to ``drivers/misc/Makefile``:
 
 .. code-block:: make
index 9faf2b4..c27e164 100644 (file)
@@ -121,6 +121,12 @@ there's an allocation error.
    ``return`` so they only work from the test function. In KUnit, we stop the
    current kthread on failure, so you can call them from anywhere.
 
+.. note::
+   Warning: There is an exception to the above rule. You shouldn't use assertions
+   in the suite's exit() function, or in the free function for a resource. These
+   run when a test is shutting down, and an assertion here prevents further
+   cleanup code from running, potentially leading to a memory leak.
+
 Customizing error messages
 --------------------------
 
@@ -160,7 +166,12 @@ many similar tests. In order to reduce duplication in these closely related
 tests, most unit testing frameworks (including KUnit) provide the concept of a
 *test suite*. A test suite is a collection of test cases for a unit of code
 with optional setup and teardown functions that run before/after the whole
-suite and/or every test case. For example:
+suite and/or every test case.
+
+.. note::
+   A test case will only run if it is associated with a test suite.
+
+For example:
 
 .. code-block:: c
 
@@ -190,7 +201,10 @@ after everything else. ``kunit_test_suite(example_test_suite)`` registers the
 test suite with the KUnit test framework.
 
 .. note::
-   A test case will only run if it is associated with a test suite.
+   The ``exit`` and ``suite_exit`` functions will run even if ``init`` or
+   ``suite_init`` fail. Make sure that they can handle any inconsistent
+   state which may result from ``init`` or ``suite_init`` encountering errors
+   or exiting early.
 
 ``kunit_test_suite(...)`` is a macro which tells the linker to put the
 specified test suite in a special linker section so that it can be run by KUnit
@@ -601,6 +615,57 @@ For example:
                KUNIT_ASSERT_STREQ(test, buffer, "");
        }
 
+Registering Cleanup Actions
+---------------------------
+
+If you need to perform some cleanup beyond simple use of ``kunit_kzalloc``,
+you can register a custom "deferred action", which is a cleanup function
+run when the test exits (whether cleanly, or via a failed assertion).
+
+Actions are simple functions with no return value, and a single ``void*``
+context argument, and fulfill the same role as "cleanup" functions in Python
+and Go tests, "defer" statements in languages which support them, and
+(in some cases) destructors in RAII languages.
+
+These are very useful for unregistering things from global lists, closing
+files or other resources, or freeing resources.
+
+For example:
+
+.. code-block:: C
+
+       static void cleanup_device(void *ctx)
+       {
+               struct device *dev = (struct device *)ctx;
+
+               device_unregister(dev);
+       }
+
+       void example_device_test(struct kunit *test)
+       {
+               struct my_device dev;
+
+               device_register(&dev);
+
+               kunit_add_action(test, &cleanup_device, &dev);
+       }
+
+Note that, for functions like device_unregister which only accept a single
+pointer-sized argument, it's possible to directly cast that function to
+a ``kunit_action_t`` rather than writing a wrapper function, for example:
+
+.. code-block:: C
+
+       kunit_add_action(test, (kunit_action_t *)&device_unregister, &dev);
+
+``kunit_add_action`` can fail if, for example, the system is out of memory.
+You can use ``kunit_add_action_or_reset`` instead which runs the action
+immediately if it cannot be deferred.
+
+If you need more control over when the cleanup function is called, you
+can trigger it early using ``kunit_release_action``, or cancel it entirely
+with ``kunit_remove_action``.
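+
+As a minimal sketch combining these calls (``free_buffer`` and the buffer
+itself are illustrative, not existing KUnit helpers):
+
+.. code-block:: C
+
+	static void free_buffer(void *ctx)
+	{
+		kfree(ctx);
+	}
+
+	static void example_action_test(struct kunit *test)
+	{
+		void *buf = kmalloc(16, GFP_KERNEL);
+
+		KUNIT_ASSERT_NOT_ERR_OR_NULL(test, buf);
+
+		/* Frees buf immediately if the action cannot be registered. */
+		KUNIT_ASSERT_EQ(test, 0,
+				kunit_add_action_or_reset(test, &free_buffer, buf));
+
+		/* Run (and remove) the cleanup early once buf is no longer needed. */
+		kunit_release_action(test, &free_buffer, buf);
+	}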
+
 
 Testing Static Functions
 ------------------------
index 61d77ac..f925290 100644 (file)
@@ -56,7 +56,7 @@ hypervisor {
 };
 
 The format and meaning of the "xen,uefi-*" parameters are similar to those in
-Documentation/arm/uefi.rst, which are provided by the regular UEFI stub. However
+Documentation/arch/arm/uefi.rst, which are provided by the regular UEFI stub. However
 they differ because they are provided by the Xen hypervisor, together with a set
 of UEFI runtime services implemented via hypercalls, see
 http://xenbits.xen.org/docs/unstable/hypercall/x86_64/include,public,platform.h.html.
index 7fdf409..38770c4 100644 (file)
@@ -8,7 +8,7 @@ title: Common Properties for Serial ATA AHCI controllers
 
 maintainers:
   - Hans de Goede <hdegoede@redhat.com>
-  - Damien Le Moal <damien.lemoal@opensource.wdc.com>
+  - Damien Le Moal <dlemoal@kernel.org>
 
 description:
   This document defines device tree properties for a common AHCI SATA
index 9b31f86..71364c6 100644 (file)
@@ -32,7 +32,7 @@ properties:
     maxItems: 1
 
   iommus:
-    maxItems: 1
+    maxItems: 4
 
   power-domains:
     maxItems: 1
index d8b9194..44892aa 100644 (file)
@@ -129,6 +129,7 @@ allOf:
               - qcom,sm8250-llcc
               - qcom,sm8350-llcc
               - qcom,sm8450-llcc
+              - qcom,sm8550-llcc
     then:
       properties:
         reg:
index 998e5cc..380cb6d 100644 (file)
@@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
 title: Canaan Kendryte K210 Clock
 
 maintainers:
-  - Damien Le Moal <damien.lemoal@wdc.com>
+  - Damien Le Moal <dlemoal@kernel.org>
 
 description: |
   Canaan Kendryte K210 SoC clocks driver bindings. The clock
index e6c1ebf..130e16d 100644 (file)
@@ -82,6 +82,18 @@ properties:
       Indicates if the DSI controller is driving a panel which needs
       2 DSI links.
 
+  qcom,master-dsi:
+    type: boolean
+    description: |
+      Indicates if the DSI controller is the master DSI controller when
+      qcom,dual-dsi-mode enabled.
+
+  qcom,sync-dual-dsi:
+    type: boolean
+    description: |
+      Indicates if the DSI controller needs to sync the other DSI controller
+      with MIPI DCS commands when qcom,dual-dsi-mode enabled.
+
   assigned-clocks:
     minItems: 2
     maxItems: 4
index 367d04a..83381f3 100644 (file)
@@ -71,6 +71,8 @@ properties:
     minItems: 1
     maxItems: 3
 
+  dma-coherent: true
+
   interconnects:
     maxItems: 1
 
index 4fb05eb..164331e 100644 (file)
@@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
 title: Lattice Slave SPI sysCONFIG FPGA manager
 
 maintainers:
-  - Ivan Bornyakov <i.bornyakov@metrotek.ru>
+  - Vladimir Georgiev <v.georgiev@metrotek.ru>
 
 description: |
   Lattice sysCONFIG port, which is used for FPGA configuration, among others,
index 527532f..a157eec 100644 (file)
@@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
 title: Microchip Polarfire FPGA manager.
 
 maintainers:
-  - Ivan Bornyakov <i.bornyakov@metrotek.ru>
+  - Vladimir Georgiev <v.georgiev@metrotek.ru>
 
 description:
   Device Tree Bindings for Microchip Polarfire FPGA Manager using slave SPI to
index 85d9efb..d9ef867 100644 (file)
@@ -60,6 +60,7 @@ properties:
     default: 0
 
   regstep:
+    $ref: /schemas/types.yaml#/definitions/uint32
     description: |
       deprecated, use reg-shift above
     deprecated: true
index 62f3ca6..32c821f 100644 (file)
@@ -44,7 +44,7 @@ required:
   - clock-names
   - clocks
 
-additionalProperties: true
+unevaluatedProperties: false
 
 examples:
   - |
index 63369ba..0a192ca 100644 (file)
@@ -39,6 +39,12 @@ properties:
   power-domains:
     maxItems: 1
 
+  vref-supply:
+    description: |
+      External ADC reference voltage supply on VREFH pad. If VERID[MVI] is
+      set, additional internal reference voltages are selectable.
+      VREFH1 is always from VREFH pad.
+
   "#io-channel-cells":
     const: 1
 
@@ -72,6 +78,7 @@ examples:
             assigned-clocks = <&clk IMX_SC_R_ADC_0>;
             assigned-clock-rates = <24000000>;
             power-domains = <&pd IMX_SC_R_ADC_0>;
+            vref-supply = <&reg_1v8>;
             #io-channel-cells = <1>;
         };
     };
index 1c7aee5..36dff32 100644 (file)
@@ -90,7 +90,7 @@ patternProperties:
             of the MAX chips to the GyroADC, while MISO line of each Maxim
             ADC connects to a shared input pin of the GyroADC.
         enum:
-          - adi,7476
+          - adi,ad7476
           - fujitsu,mb88101a
           - maxim,max1162
           - maxim,max11100
index 9211726..39e64c7 100644 (file)
@@ -166,6 +166,12 @@ properties:
   resets:
     maxItems: 1
 
+  mediatek,broken-save-restore-fw:
+    type: boolean
+    description:
+      Asserts that the firmware on this device has issues saving and restoring
+      GICR registers when the GIC redistributors are powered off.
+
 dependencies:
   mbi-ranges: [ msi-controller ]
   msi-controller: [ mbi-ranges ]
diff --git a/Documentation/devicetree/bindings/interrupt-controller/loongson,eiointc.yaml b/Documentation/devicetree/bindings/interrupt-controller/loongson,eiointc.yaml
new file mode 100644 (file)
index 0000000..393c128
--- /dev/null
@@ -0,0 +1,59 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/interrupt-controller/loongson,eiointc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Loongson Extended I/O Interrupt Controller
+
+maintainers:
+  - Binbin Zhou <zhoubinbin@loongson.cn>
+
+description: |
+  This interrupt controller is found on the Loongson-3 family chips and
+  Loongson-2K series chips and is used to distribute interrupts directly to
+  individual cores without forwarding them through the HT's interrupt line.
+
+allOf:
+  - $ref: /schemas/interrupt-controller.yaml#
+
+properties:
+  compatible:
+    enum:
+      - loongson,ls2k0500-eiointc
+      - loongson,ls2k2000-eiointc
+
+  reg:
+    maxItems: 1
+
+  interrupts:
+    maxItems: 1
+
+  interrupt-controller: true
+
+  '#interrupt-cells':
+    const: 1
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - interrupt-controller
+  - '#interrupt-cells'
+
+unevaluatedProperties: false
+
+examples:
+  - |
+    eiointc: interrupt-controller@1fe11600 {
+      compatible = "loongson,ls2k0500-eiointc";
+      reg = <0x1fe10000 0x10000>;
+
+      interrupt-controller;
+      #interrupt-cells = <1>;
+
+      interrupt-parent = <&cpuintc>;
+      interrupts = <3>;
+    };
+
+...
index 8b38931..e2ffe0a 100644 (file)
@@ -49,6 +49,7 @@ properties:
 
         properties:
           data-lanes:
+            minItems: 1
             maxItems: 2
 
         required:
diff --git a/Documentation/devicetree/bindings/memory-controllers/nuvoton,npcm-memory-controller.yaml b/Documentation/devicetree/bindings/memory-controllers/nuvoton,npcm-memory-controller.yaml
new file mode 100644 (file)
index 0000000..ac1a5a1
--- /dev/null
@@ -0,0 +1,50 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/memory-controllers/nuvoton,npcm-memory-controller.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Nuvoton NPCM Memory Controller
+
+maintainers:
+  - Marvin Lin <kflin@nuvoton.com>
+  - Stanley Chu <yschu@nuvoton.com>
+
+description: |
+  The Nuvoton BMC SoC supports DDR4 memory with or without ECC (error correction
+  check).
+
+  The memory controller supports single bit error correction, double bit error
+  detection (in-line ECC in which a section (1/8th) of the memory device used to
+  store data is used for ECC storage).
+
+  Note, the bootloader must configure ECC mode for the memory controller.
+
+properties:
+  compatible:
+    enum:
+      - nuvoton,npcm750-memory-controller
+      - nuvoton,npcm845-memory-controller
+
+  reg:
+    maxItems: 1
+
+  interrupts:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - interrupts
+
+additionalProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/interrupt-controller/arm-gic.h>
+
+    mc: memory-controller@f0824000 {
+        compatible = "nuvoton,npcm750-memory-controller";
+        reg = <0xf0824000 0x1000>;
+        interrupts = <GIC_SPI 25 IRQ_TYPE_LEVEL_HIGH>;
+    };
index 8459d36..3b3beab 100644 (file)
@@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
 title: Canaan Kendryte K210 System Controller
 
 maintainers:
-  - Damien Le Moal <damien.lemoal@wdc.com>
+  - Damien Le Moal <dlemoal@kernel.org>
 
 description:
   Canaan Inc. Kendryte K210 SoC system controller which provides a
index 769fa5c..de1d429 100644 (file)
@@ -21,11 +21,22 @@ properties:
 
   st,can-primary:
     description:
-      Primary and secondary mode of the bxCAN peripheral is only relevant
-      if the chip has two CAN peripherals. In that case they share some
-      of the required logic.
+      Primary mode of the bxCAN peripheral is only relevant if the chip has
+      two CAN peripherals in dual CAN configuration. In that case they share
+      some of the required logic.
+      Not to be used if the peripheral is in single CAN configuration.
       To avoid misunderstandings, it should be noted that ST documentation
-      uses the terms master/slave instead of primary/secondary.
+      uses the term master instead of primary.
+    type: boolean
+
+  st,can-secondary:
+    description:
+      Secondary mode of the bxCAN peripheral is only relevant if the chip
+      has two CAN peripherals in dual CAN configuration. In that case they
+      share some of the required logic.
+      Not to be used if the peripheral is in single CAN configuration.
+      To avoid misunderstandings, it should be noted that ST documentation
+      uses the term slave instead of secondary.
     type: boolean
 
   reg:
index 8cc2b99..043e118 100644 (file)
@@ -11,7 +11,7 @@ maintainers:
   - Alistair Francis <alistair@alistair23.me>
 
 description:
-  RTL8723CS/RTL8723CS/RTL8821CS/RTL8822CS is a WiFi + BT chip. WiFi part
+  RTL8723BS/RTL8723CS/RTL8821CS/RTL8822CS is a WiFi + BT chip. WiFi part
   is connected over SDIO, while BT is connected over serial. It speaks
   H5 protocol with few extra commands to upload firmware and change
   module speed.
@@ -27,7 +27,7 @@ properties:
       - items:
           - enum:
               - realtek,rtl8821cs-bt
-          - const: realtek,rtl8822cs-bt
+          - const: realtek,rtl8723bs-bt
 
   device-wake-gpios:
     maxItems: 1
index 9bff8ec..d91b639 100644 (file)
@@ -17,20 +17,11 @@ description:
 properties:
   clocks:
     minItems: 3
-    items:
-      - description: PCIe bridge clock.
-      - description: PCIe bus clock.
-      - description: PCIe PHY clock.
-      - description: Additional required clock entry for imx6sx-pcie,
-           imx6sx-pcie-ep, imx8mq-pcie, imx8mq-pcie-ep.
+    maxItems: 4
 
   clock-names:
     minItems: 3
-    items:
-      - const: pcie
-      - const: pcie_bus
-      - enum: [ pcie_phy, pcie_aux ]
-      - enum: [ pcie_inbound_axi, pcie_aux ]
+    maxItems: 4
 
   num-lanes:
     const: 1
index f4a328e..ee155ed 100644 (file)
@@ -31,6 +31,19 @@ properties:
       - const: dbi
       - const: addr_space
 
+  clocks:
+    minItems: 3
+    items:
+      - description: PCIe bridge clock.
+      - description: PCIe bus clock.
+      - description: PCIe PHY clock.
+      - description: Additional required clock entry for imx6sx-pcie,
+           imx6sx-pcie-ep, imx8mq-pcie, imx8mq-pcie-ep.
+
+  clock-names:
+    minItems: 3
+    maxItems: 4
+
   interrupts:
     items:
       - description: builtin eDMA interrupter.
@@ -49,6 +62,31 @@ required:
 allOf:
   - $ref: /schemas/pci/snps,dw-pcie-ep.yaml#
   - $ref: /schemas/pci/fsl,imx6q-pcie-common.yaml#
+  - if:
+      properties:
+        compatible:
+          enum:
+            - fsl,imx8mq-pcie-ep
+    then:
+      properties:
+        clocks:
+          minItems: 4
+        clock-names:
+          items:
+            - const: pcie
+            - const: pcie_bus
+            - const: pcie_phy
+            - const: pcie_aux
+    else:
+      properties:
+        clocks:
+          maxItems: 3
+        clock-names:
+          items:
+            - const: pcie
+            - const: pcie_bus
+            - const: pcie_aux
+
 
 unevaluatedProperties: false
 
index 2443641..81bbb87 100644 (file)
@@ -40,6 +40,19 @@ properties:
       - const: dbi
       - const: config
 
+  clocks:
+    minItems: 3
+    items:
+      - description: PCIe bridge clock.
+      - description: PCIe bus clock.
+      - description: PCIe PHY clock.
+      - description: Additional required clock entry for imx6sx-pcie,
+           imx6sx-pcie-ep, imx8mq-pcie, imx8mq-pcie-ep.
+
+  clock-names:
+    minItems: 3
+    maxItems: 4
+
   interrupts:
     items:
       - description: builtin MSI controller.
@@ -77,6 +90,70 @@ required:
 allOf:
   - $ref: /schemas/pci/snps,dw-pcie.yaml#
   - $ref: /schemas/pci/fsl,imx6q-pcie-common.yaml#
+  - if:
+      properties:
+        compatible:
+          enum:
+            - fsl,imx6sx-pcie
+    then:
+      properties:
+        clocks:
+          minItems: 4
+        clock-names:
+          items:
+            - const: pcie
+            - const: pcie_bus
+            - const: pcie_phy
+            - const: pcie_inbound_axi
+
+  - if:
+      properties:
+        compatible:
+          enum:
+            - fsl,imx8mq-pcie
+    then:
+      properties:
+        clocks:
+          minItems: 4
+        clock-names:
+          items:
+            - const: pcie
+            - const: pcie_bus
+            - const: pcie_phy
+            - const: pcie_aux
+
+  - if:
+      properties:
+        compatible:
+          enum:
+            - fsl,imx6q-pcie
+            - fsl,imx6qp-pcie
+            - fsl,imx7d-pcie
+    then:
+      properties:
+        clocks:
+          maxItems: 3
+        clock-names:
+          items:
+            - const: pcie
+            - const: pcie_bus
+            - const: pcie_phy
+
+  - if:
+      properties:
+        compatible:
+          enum:
+            - fsl,imx8mm-pcie
+            - fsl,imx8mp-pcie
+    then:
+      properties:
+        clocks:
+          maxItems: 3
+        clock-names:
+          items:
+            - const: pcie
+            - const: pcie_bus
+            - const: pcie_aux
 
 unevaluatedProperties: false
 
index 80a9238..e9fad4b 100644 (file)
@@ -4,7 +4,7 @@
 $id: http://devicetree.org/schemas/perf/fsl-imx-ddr.yaml#
 $schema: http://devicetree.org/meta-schemas/core.yaml#
 
-title: Freescale(NXP) IMX8 DDR performance monitor
+title: Freescale(NXP) IMX8/9 DDR performance monitor
 
 maintainers:
   - Frank Li <frank.li@nxp.com>
@@ -19,6 +19,7 @@ properties:
           - fsl,imx8mm-ddr-pmu
           - fsl,imx8mn-ddr-pmu
           - fsl,imx8mp-ddr-pmu
+          - fsl,imx93-ddr-pmu
       - items:
           - enum:
               - fsl,imx8mm-ddr-pmu
index 7f4f36a..739a08f 100644 (file)
@@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
 title: Canaan Kendryte K210 FPIOA
 
 maintainers:
-  - Damien Le Moal <damien.lemoal@wdc.com>
+  - Damien Le Moal <dlemoal@kernel.org>
 
 description:
   The Canaan Kendryte K210 SoC Fully Programmable IO Array (FPIOA)
index c91d3e3..80f9606 100644 (file)
@@ -144,8 +144,9 @@ $defs:
         enum: [0, 1, 2, 3, 4, 5, 6, 7]
 
       qcom,paired:
-        - description:
-            Indicates that the pin should be operating in paired mode.
+        type: boolean
+        description:
+          Indicates that the pin should be operating in paired mode.
 
     required:
       - pins
index afad313..f9c211a 100644 (file)
@@ -29,6 +29,7 @@ properties:
       - qcom,qcm2290-rpmpd
       - qcom,qcs404-rpmpd
       - qcom,qdu1000-rpmhpd
+      - qcom,sa8155p-rpmhpd
       - qcom,sa8540p-rpmhpd
       - qcom,sa8775p-rpmhpd
       - qcom,sdm660-rpmpd
index ee8a2dc..0c01359 100644 (file)
@@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
 title: Canaan Kendryte K210 Reset Controller
 
 maintainers:
-  - Damien Le Moal <damien.lemoal@wdc.com>
+  - Damien Le Moal <dlemoal@kernel.org>
 
 description: |
   Canaan Kendryte K210 reset controller driver which supports the SoC
index f8f3f28..41fd11f 100644 (file)
@@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
 title: Canaan SoC-based boards
 
 maintainers:
-  - Damien Le Moal <damien.lemoal@wdc.com>
+  - Damien Le Moal <dlemoal@kernel.org>
 
 description:
   Canaan Kendryte K210 SoC-based boards
index eb3488d..6a7be42 100644 (file)
@@ -70,6 +70,7 @@ properties:
   dsr-gpios: true
   rng-gpios: true
   dcd-gpios: true
+  rs485-rts-active-high: true
   rts-gpio: true
   power-domains: true
   clock-frequency: true
index a5bb561..31a3024 100644 (file)
@@ -55,7 +55,9 @@ properties:
     description: TDM TX current sense time slot.
 
   '#sound-dai-cells':
-    const: 1
+    # The codec has a single DAI, the #sound-dai-cells=<1>; case is left in for backward
+    # compatibility but is deprecated.
+    enum: [0, 1]
 
 required:
   - compatible
@@ -72,7 +74,7 @@ examples:
      codec: codec@4c {
        compatible = "ti,tas2562";
        reg = <0x4c>;
-       #sound-dai-cells = <1>;
+       #sound-dai-cells = <0>;
        interrupt-parent = <&gpio1>;
        interrupts = <14>;
        shutdown-gpios = <&gpio1 15 0>;
index 26088ad..8908bf1 100644 (file)
@@ -57,7 +57,9 @@ properties:
       - 1 # Falling edge
 
   '#sound-dai-cells':
-    const: 1
+    # The codec has a single DAI, the #sound-dai-cells=<1>; case is left in for backward
+    # compatibility but is deprecated.
+    enum: [0, 1]
 
 required:
   - compatible
@@ -74,7 +76,7 @@ examples:
      codec: codec@41 {
        compatible = "ti,tas2770";
        reg = <0x41>;
-       #sound-dai-cells = <1>;
+       #sound-dai-cells = <0>;
        interrupt-parent = <&gpio1>;
        interrupts = <14>;
        reset-gpio = <&gpio1 15 0>;
index 8cba013..a876545 100644 (file)
@@ -50,7 +50,9 @@ properties:
     description: TDM TX voltage sense time slot.
 
   '#sound-dai-cells':
-    const: 1
+    # The codec has a single DAI, the #sound-dai-cells=<1>; case is left in for backward
+    # compatibility but is deprecated.
+    enum: [0, 1]
 
 required:
   - compatible
@@ -67,7 +69,7 @@ examples:
      codec: codec@38 {
        compatible = "ti,tas2764";
        reg = <0x38>;
-       #sound-dai-cells = <1>;
+       #sound-dai-cells = <0>;
        interrupt-parent = <&gpio1>;
        interrupts = <14>;
        reset-gpios = <&gpio1 15 0>;
index f59125b..0b4e21b 100644 (file)
@@ -8,7 +8,7 @@ Required properties:
        "ti,tlv320aic32x6" TLV320AIC3206, TLV320AIC3256
        "ti,tas2505" TAS2505, TAS2521
  - reg: I2C slave address
- - supply-*: Required supply regulators are:
+ - *-supply: Required supply regulators are:
     "iov" - digital IO power supply
     "ldoin" - LDO power supply
     "dv" - Digital core power supply
index b0bee7e..ab8b8fc 100644 (file)
@@ -8,6 +8,7 @@ Required properties:
     * marvell,armada380-thermal
     * marvell,armadaxp-thermal
     * marvell,armada-ap806-thermal
+    * marvell,armada-ap807-thermal
     * marvell,armada-cp110-thermal
 
 Note: these bindings are deprecated for AP806/CP110 and should instead
diff --git a/Documentation/devicetree/bindings/thermal/brcm,bcm2835-thermal.txt b/Documentation/devicetree/bindings/thermal/brcm,bcm2835-thermal.txt
deleted file mode 100644 (file)
index a3e9ec5..0000000
+++ /dev/null
@@ -1,41 +0,0 @@
-Binding for Thermal Sensor driver for BCM2835 SoCs.
-
-Required parameters:
--------------------
-
-compatible:            should be one of: "brcm,bcm2835-thermal",
-                       "brcm,bcm2836-thermal" or "brcm,bcm2837-thermal"
-reg:                   Address range of the thermal registers.
-clocks:                Phandle of the clock used by the thermal sensor.
-#thermal-sensor-cells: should be 0 (see Documentation/devicetree/bindings/thermal/thermal-sensor.yaml)
-
-Example:
-
-thermal-zones {
-       cpu_thermal: cpu-thermal {
-               polling-delay-passive = <0>;
-               polling-delay = <1000>;
-
-               thermal-sensors = <&thermal>;
-
-               trips {
-                       cpu-crit {
-                               temperature     = <80000>;
-                               hysteresis      = <0>;
-                               type            = "critical";
-                       };
-               };
-
-               coefficients = <(-538)  407000>;
-
-               cooling-maps {
-               };
-       };
-};
-
-thermal: thermal@7e212000 {
-       compatible = "brcm,bcm2835-thermal";
-       reg = <0x7e212000 0x8>;
-       clocks = <&clocks BCM2835_CLOCK_TSENS>;
-       #thermal-sensor-cells = <0>;
-};
diff --git a/Documentation/devicetree/bindings/thermal/brcm,bcm2835-thermal.yaml b/Documentation/devicetree/bindings/thermal/brcm,bcm2835-thermal.yaml
new file mode 100644 (file)
index 0000000..2b6026d
--- /dev/null
@@ -0,0 +1,48 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/thermal/brcm,bcm2835-thermal.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Broadcom BCM2835 thermal sensor
+
+maintainers:
+  - Stefan Wahren <stefan.wahren@i2se.com>
+
+allOf:
+  - $ref: thermal-sensor.yaml#
+
+properties:
+  compatible:
+    enum:
+      - brcm,bcm2835-thermal
+      - brcm,bcm2836-thermal
+      - brcm,bcm2837-thermal
+
+  reg:
+    maxItems: 1
+
+  clocks:
+    maxItems: 1
+
+  "#thermal-sensor-cells":
+    const: 0
+
+unevaluatedProperties: false
+
+required:
+  - compatible
+  - reg
+  - clocks
+  - '#thermal-sensor-cells'
+
+examples:
+  - |
+    #include <dt-bindings/clock/bcm2835.h>
+
+    thermal@7e212000 {
+      compatible = "brcm,bcm2835-thermal";
+      reg = <0x7e212000 0x8>;
+      clocks = <&clocks BCM2835_CLOCK_TSENS>;
+      #thermal-sensor-cells = <0>;
+    };
index d1ec963..27e9e16 100644 (file)
@@ -29,6 +29,8 @@ properties:
         items:
           - enum:
               - qcom,mdm9607-tsens
+              - qcom,msm8226-tsens
+              - qcom,msm8909-tsens
               - qcom,msm8916-tsens
               - qcom,msm8939-tsens
               - qcom,msm8974-tsens
@@ -48,6 +50,7 @@ properties:
               - qcom,msm8953-tsens
               - qcom,msm8996-tsens
               - qcom,msm8998-tsens
+              - qcom,qcm2290-tsens
               - qcom,sc7180-tsens
               - qcom,sc7280-tsens
               - qcom,sc8180x-tsens
@@ -56,6 +59,7 @@ properties:
               - qcom,sdm845-tsens
               - qcom,sm6115-tsens
               - qcom,sm6350-tsens
+              - qcom,sm6375-tsens
               - qcom,sm8150-tsens
               - qcom,sm8250-tsens
               - qcom,sm8350-tsens
@@ -67,6 +71,12 @@ properties:
         enum:
           - qcom,ipq8074-tsens
 
+      - description: v2 of TSENS with combined interrupt
+        items:
+          - enum:
+              - qcom,ipq9574-tsens
+          - const: qcom,ipq8074-tsens
+
   reg:
     items:
       - description: TM registers
@@ -223,12 +233,7 @@ allOf:
           contains:
             enum:
               - qcom,ipq8064-tsens
-              - qcom,mdm9607-tsens
-              - qcom,msm8916-tsens
               - qcom,msm8960-tsens
-              - qcom,msm8974-tsens
-              - qcom,msm8976-tsens
-              - qcom,qcs404-tsens
               - qcom,tsens-v0_1
               - qcom,tsens-v1
     then:
@@ -244,22 +249,7 @@ allOf:
       properties:
         compatible:
           contains:
-            enum:
-              - qcom,msm8953-tsens
-              - qcom,msm8996-tsens
-              - qcom,msm8998-tsens
-              - qcom,sc7180-tsens
-              - qcom,sc7280-tsens
-              - qcom,sc8180x-tsens
-              - qcom,sc8280xp-tsens
-              - qcom,sdm630-tsens
-              - qcom,sdm845-tsens
-              - qcom,sm6350-tsens
-              - qcom,sm8150-tsens
-              - qcom,sm8250-tsens
-              - qcom,sm8350-tsens
-              - qcom,sm8450-tsens
-              - qcom,tsens-v2
+            const: qcom,tsens-v2
     then:
       properties:
         interrupts:
diff --git a/Documentation/devicetree/bindings/timer/brcm,kona-timer.txt b/Documentation/devicetree/bindings/timer/brcm,kona-timer.txt
deleted file mode 100644 (file)
index 39adf54..0000000
+++ /dev/null
@@ -1,25 +0,0 @@
-Broadcom Kona Family timer
------------------------------------------------------
-This timer is used in the following Broadcom SoCs:
- BCM11130, BCM11140, BCM11351, BCM28145, BCM28155
-
-Required properties:
-- compatible : "brcm,kona-timer"
-- DEPRECATED: compatible : "bcm,kona-timer"
-- reg : Register range for the timer
-- interrupts : interrupt for the timer
-- clocks: phandle + clock specifier pair of the external clock
-- clock-frequency: frequency that the clock operates
-
-Only one of clocks or clock-frequency should be specified.
-
-Refer to clocks/clock-bindings.txt for generic clock consumer properties.
-
-Example:
-       timer@35006000 {
-               compatible = "brcm,kona-timer";
-               reg = <0x35006000 0x1000>;
-               interrupts = <0x0 7 0x4>;
-               clocks = <&hub_timer_clk>;
-       };
-
diff --git a/Documentation/devicetree/bindings/timer/brcm,kona-timer.yaml b/Documentation/devicetree/bindings/timer/brcm,kona-timer.yaml
new file mode 100644 (file)
index 0000000..d6af838
--- /dev/null
@@ -0,0 +1,52 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/timer/brcm,kona-timer.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Broadcom Kona family timer
+
+maintainers:
+  - Florian Fainelli <f.fainelli@gmail.com>
+
+properties:
+  compatible:
+    const: brcm,kona-timer
+
+  reg:
+    maxItems: 1
+
+  interrupts:
+    maxItems: 1
+
+  clocks:
+    maxItems: 1
+
+  clock-frequency: true
+
+oneOf:
+  - required:
+      - clocks
+  - required:
+      - clock-frequency
+
+required:
+  - compatible
+  - reg
+  - interrupts
+
+additionalProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/clock/bcm281xx.h>
+    #include <dt-bindings/interrupt-controller/arm-gic.h>
+    #include <dt-bindings/interrupt-controller/irq.h>
+
+    timer@35006000 {
+        compatible = "brcm,kona-timer";
+        reg = <0x35006000 0x1000>;
+        interrupts = <GIC_SPI 7 IRQ_TYPE_LEVEL_HIGH>;
+        clocks = <&aon_ccu BCM281XX_AON_CCU_HUB_TIMER>;
+    };
+...
diff --git a/Documentation/devicetree/bindings/timer/loongson,ls1x-pwmtimer.yaml b/Documentation/devicetree/bindings/timer/loongson,ls1x-pwmtimer.yaml
new file mode 100644 (file)
index 0000000..ad61ae5
--- /dev/null
@@ -0,0 +1,48 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/timer/loongson,ls1x-pwmtimer.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Loongson-1 PWM timer
+
+maintainers:
+  - Keguang Zhang <keguang.zhang@gmail.com>
+
+description:
+  Loongson-1 PWM timer can be used as a system clock source
+  and for clock event timers.
+
+properties:
+  compatible:
+    const: loongson,ls1b-pwmtimer
+
+  reg:
+    maxItems: 1
+
+  clocks:
+    maxItems: 1
+
+  interrupts:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - clocks
+  - interrupts
+
+additionalProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/clock/loongson,ls1x-clk.h>
+    #include <dt-bindings/interrupt-controller/irq.h>
+    clocksource: timer@1fe5c030 {
+        compatible = "loongson,ls1b-pwmtimer";
+        reg = <0x1fe5c030 0x10>;
+
+        clocks = <&clkc LS1X_CLKID_APB>;
+        interrupt-parent = <&intc0>;
+        interrupts = <20 IRQ_TYPE_LEVEL_HIGH>;
+    };
diff --git a/Documentation/devicetree/bindings/timer/ralink,rt2880-timer.yaml b/Documentation/devicetree/bindings/timer/ralink,rt2880-timer.yaml
new file mode 100644 (file)
index 0000000..daa7832
--- /dev/null
@@ -0,0 +1,44 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/timer/ralink,rt2880-timer.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Timer present in Ralink family SoCs
+
+maintainers:
+  - Sergio Paracuellos <sergio.paracuellos@gmail.com>
+
+properties:
+  compatible:
+    const: ralink,rt2880-timer
+
+  reg:
+    maxItems: 1
+
+  clocks:
+    maxItems: 1
+
+  interrupts:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - clocks
+  - interrupts
+
+additionalProperties: false
+
+examples:
+  - |
+    timer@100 {
+        compatible = "ralink,rt2880-timer";
+        reg = <0x100 0x20>;
+
+        clocks = <&sysc 3>;
+
+        interrupt-parent = <&intc>;
+        interrupts = <1>;
+    };
+...
index cae46c4..69a93a0 100644 (file)
@@ -64,7 +64,7 @@ properties:
     description:
       size of memory intended as internal memory for endpoints
       buffers expressed in KB
-    $ref: /schemas/types.yaml#/definitions/uint32
+    $ref: /schemas/types.yaml#/definitions/uint16
 
   cdns,phyrst-a-enable:
     description: Enable resetting of PHY if Rx fail is detected
index 50edc4d..4f76259 100644 (file)
@@ -287,7 +287,7 @@ properties:
     description:
       High-Speed PHY interface selection between UTMI+ and ULPI when the
       DWC_USB3_HSPHY_INTERFACE has value 3.
-    $ref: /schemas/types.yaml#/definitions/uint8
+    $ref: /schemas/types.yaml#/definitions/string
     enum: [utmi, ulpi]
 
   snps,quirk-frame-length-adjustment:
index b6a2879..0717426 100644 (file)
@@ -415,6 +415,6 @@ When using the DT, this creates problems for of_platform_populate()
 because it must decide whether to register each node as either a
 platform_device or an amba_device.  This unfortunately complicates the
 device creation model a little bit, but the solution turns out not to
-be too invasive.  If a node is compatible with "arm,amba-primecell", then
+be too invasive.  If a node is compatible with "arm,primecell", then
 of_platform_populate() will register it as an amba_device instead of a
 platform_device.
index 23edb42..cd8ad79 100644 (file)
@@ -313,9 +313,18 @@ the documentation build system will automatically turn a reference to
 function name exists.  If you see ``c:func:`` use in a kernel document,
 please feel free to remove it.
 
+Tables
+------
+
+reStructuredText provides several options for table syntax. Kernel style for
+tables is to prefer *simple table* syntax or *grid table* syntax. See the
+`reStructuredText user reference for table syntax`_ for more details.
+
+.. _reStructuredText user reference for table syntax:
+   https://docutils.sourceforge.io/docs/user/rst/quickref.html#tables
 
 list tables
------------
+~~~~~~~~~~~
 
 The list-table formats can be useful for tables that are not easily laid
 out in the usual Sphinx ASCII-art formats.  These formats are nearly
index 4b4d8e2..7671b53 100644 (file)
@@ -84,7 +84,13 @@ Reference counting
 Atomics
 -------
 
-.. kernel-doc:: arch/x86/include/asm/atomic.h
+.. kernel-doc:: include/linux/atomic/atomic-instrumented.h
+   :internal:
+
+.. kernel-doc:: include/linux/atomic/atomic-arch-fallback.h
+   :internal:
+
+.. kernel-doc:: include/linux/atomic/atomic-long.h
    :internal:
 
 Kernel objects manipulation
index b8c742a..f4f044b 100644 (file)
@@ -106,6 +106,16 @@ will occupy those chip-select rows.
 This term is avoided because it is unclear when needing to distinguish
 between chip-select rows and socket sets.
 
+* High Bandwidth Memory (HBM)
+
+HBM is a new memory type with low power consumption and ultra-wide
+communication lanes. It uses vertically stacked memory chips (DRAM dies)
+interconnected by microscopic wires called "through-silicon vias," or
+TSVs.
+
+Several stacks of HBM chips connect to the CPU or GPU through an ultra-fast
+interconnect called the "interposer". Therefore, HBM's characteristics
+are nearly indistinguishable from on-chip integrated RAM.
 
 Memory Controllers
 ------------------
@@ -176,3 +186,113 @@ nodes::
        the L1 and L2 directories would be "edac_device_block's"
 
 .. kernel-doc:: drivers/edac/edac_device.h
+
+
+Heterogeneous system support
+----------------------------
+
+An AMD heterogeneous system is built by connecting the data fabrics of
+both CPUs and GPUs via custom xGMI links. Thus, the data fabric on the
+GPU nodes can be accessed the same way as the data fabric on CPU nodes.
+
+The MI200 accelerators are data center GPUs. They have 2 data fabrics,
+and each GPU data fabric contains four Unified Memory Controllers (UMC).
+Each UMC contains eight channels. Each UMC channel controls one 128-bit
+HBM2e (2GB) channel (equivalent to 8 X 2GB ranks).  This creates a total
+DRAM data bus width of 4096 bits.
+
+While the UMC is interfacing a 16GB (8high X 2GB DRAM) HBM stack, each UMC
+channel is interfacing 2GB of DRAM (represented as rank).
+
+Memory controllers on AMD GPU nodes can be represented in EDAC as follows:
+
+       GPU DF / GPU Node -> EDAC MC
+       GPU UMC           -> EDAC CSROW
+       GPU UMC channel   -> EDAC CHANNEL
+
+For example: a heterogeneous system with 1 AMD CPU is connected to
+4 MI200 (Aldebaran) GPUs using xGMI.
+
+Some more heterogeneous hardware details:
+
+- The CPU UMC (Unified Memory Controller) is mostly the same as the GPU UMC.
+  They have chip selects (csrows) and channels. However, the layouts are different
+  for performance, physical layout, or other reasons.
+- CPU UMCs use 1 channel, so in this case UMC = EDAC channel. This follows the
+  marketing speak of "the CPU has X memory channels", etc.
+- CPU UMCs use up to 4 chip selects, so UMC chip select = EDAC CSROW.
+- GPU UMCs use 1 chip select, so UMC = EDAC CSROW.
+- GPU UMCs use 8 channels, so UMC channel = EDAC channel.
+
+The EDAC subsystem provides a mechanism to handle AMD heterogeneous
+systems by calling system specific ops for both CPUs and GPUs.
+
+AMD GPU nodes are enumerated in sequential order based on the PCI
+hierarchy, and the first GPU node is assumed to have a Node ID value
+following those of the CPU nodes after the latter are fully populated::
+
+       $ ls /sys/devices/system/edac/mc/
+               mc0   - CPU MC node 0
+               mc1  |
+               mc2  |- GPU card[0] => node 0(mc1), node 1(mc2)
+               mc3  |
+               mc4  |- GPU card[1] => node 0(mc3), node 1(mc4)
+               mc5  |
+               mc6  |- GPU card[2] => node 0(mc5), node 1(mc6)
+               mc7  |
+               mc8  |- GPU card[3] => node 0(mc7), node 1(mc8)
+
+For example, a heterogeneous system with one AMD CPU is connected to
+four MI200 (Aldebaran) GPUs using xGMI. This topology can be represented
+via the following sysfs entries::
+
+       /sys/devices/system/edac/mc/..
+
+       CPU                     # CPU node
+       ├── mc 0
+
+       GPU Nodes are enumerated sequentially after CPU nodes have been populated
+       GPU card 1              # Each MI200 GPU has 2 nodes/mcs
+       ├── mc 1          # GPU node 0 == mc1, Each MC node has 4 UMCs/CSROWs
+       │   ├── csrow 0               # UMC 0
+       │   │   ├── channel 0     # Each UMC has 8 channels
+       │   │   ├── channel 1   # size of each channel is 2 GB, so each UMC has 16 GB
+       │   │   ├── channel 2
+       │   │   ├── channel 3
+       │   │   ├── channel 4
+       │   │   ├── channel 5
+       │   │   ├── channel 6
+       │   │   ├── channel 7
+       │   ├── csrow 1               # UMC 1
+       │   │   ├── channel 0
+       │   │   ├── ..
+       │   │   ├── channel 7
+       │   ├── ..            ..
+       │   ├── csrow 3               # UMC 3
+       │   │   ├── channel 0
+       │   │   ├── ..
+       │   │   ├── channel 7
+       │   ├── rank 0
+       │   ├── ..            ..
+       │   ├── rank 31               # total 32 ranks/dimms from 4 UMCs
+       ├
+       ├── mc 2          # GPU node 1 == mc2
+       │   ├── ..            # each GPU has total 64 GB
+
+       GPU card 2
+       ├── mc 3
+       │   ├── ..
+       ├── mc 4
+       │   ├── ..
+
+       GPU card 3
+       ├── mc 5
+       │   ├── ..
+       ├── mc 6
+       │   ├── ..
+
+       GPU card 4
+       ├── mc 7
+       │   ├── ..
+       ├── mc 8
+       │   ├── ..
index 504ba94..dccd61c 100644 (file)
@@ -22,12 +22,11 @@ exclusive.
 3) object removal.  Locking rules: caller locks parent, finds victim,
 locks victim and calls the method.  Locks are exclusive.
 
-4) rename() that is _not_ cross-directory.  Locking rules: caller locks
-the parent and finds source and target.  In case of exchange (with
-RENAME_EXCHANGE in flags argument) lock both.  In any case,
-if the target already exists, lock it.  If the source is a non-directory,
-lock it.  If we need to lock both, lock them in inode pointer order.
-Then call the method.  All locks are exclusive.
+4) rename() that is _not_ cross-directory.  Locking rules: caller locks the
+parent and finds source and target.  We lock both (provided they exist).  If we
+need to lock two inodes of different type (dir vs non-dir), we lock the
+directory first.  If we need to lock two inodes of the same type, lock them
+in inode pointer order.  Then call the method.  All locks are exclusive.
 NB: we might get away with locking the source (and target in exchange
 case) shared.
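
A minimal sketch of this pairwise ordering rule (directory before
non-directory, otherwise inode pointer order), using toy stand-in types
rather than the kernel's real inode and locking primitives::

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy stand-ins; the real kernel types and primitives differ. */
    struct inode {
            bool is_dir;
            const char *name;
    };

    static void inode_lock(struct inode *inode)
    {
            printf("lock %s\n", inode->name);
    }

    /*
     * Lock two inodes per the rule above: a directory is locked before a
     * non-directory; inodes of the same kind are locked in pointer order.
     */
    static void lock_pair_in_order(struct inode *a, struct inode *b)
    {
            bool swap_them = (a->is_dir != b->is_dir) ? !a->is_dir : (a > b);

            if (swap_them) {
                    struct inode *tmp = a;
                    a = b;
                    b = tmp;
            }
            inode_lock(a);
            inode_lock(b);
    }

    int main(void)
    {
            struct inode dir  = { .is_dir = true,  .name = "dir"  };
            struct inode file = { .is_dir = false, .name = "file" };

            lock_pair_in_order(&file, &dir);    /* locks "dir" first */
            return 0;
    }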
 
@@ -44,15 +43,17 @@ All locks are exclusive.
 rules:
 
        * lock the filesystem
-       * lock parents in "ancestors first" order.
+       * lock parents in "ancestors first" order. If one is not ancestor of
+         the other, lock them in inode pointer order.
        * find source and target.
        * if old parent is equal to or is a descendent of target
          fail with -ENOTEMPTY
        * if new parent is equal to or is a descendent of source
          fail with -ELOOP
-       * If it's an exchange, lock both the source and the target.
-       * If the target exists, lock it.  If the source is a non-directory,
-         lock it.  If we need to lock both, do so in inode pointer order.
+       * Lock both the source and the target provided they exist. If we
+         need to lock two inodes of different type (dir vs non-dir), we lock
+         the directory first. If we need to lock two inodes of the same type,
+         lock them in inode pointer order.
        * call the method.
 
 All ->i_rwsem are taken exclusive.  Again, we might get away with locking
@@ -66,8 +67,9 @@ If no directory is its own ancestor, the scheme above is deadlock-free.
 
 Proof:
 
-       First of all, at any moment we have a partial ordering of the
-       objects - A < B iff A is an ancestor of B.
+       First of all, at any moment we have a linear ordering of the
+       objects - A < B iff (A is an ancestor of B) or (B is not an ancestor
+        of A and ptr(A) < ptr(B)).
 
        That ordering can change.  However, the following is true:
 
index ede672d..cb845e8 100644 (file)
@@ -38,20 +38,14 @@ fail at runtime.
 Use cases
 =========
 
-By itself, the base fs-verity feature only provides integrity
-protection, i.e. detection of accidental (non-malicious) corruption.
+By itself, fs-verity only provides integrity protection, i.e.
+detection of accidental (non-malicious) corruption.
 
 However, because fs-verity makes retrieving the file hash extremely
 efficient, it's primarily meant to be used as a tool to support
 authentication (detection of malicious modifications) or auditing
 (logging file hashes before use).
 
-Trusted userspace code (e.g. operating system code running on a
-read-only partition that is itself authenticated by dm-verity) can
-authenticate the contents of an fs-verity file by using the
-`FS_IOC_MEASURE_VERITY`_ ioctl to retrieve its hash, then verifying a
-digital signature of it.
-
 A standard file hash could be used instead of fs-verity.  However,
 this is inefficient if the file is large and only a small portion may
 be accessed.  This is often the case for Android application package
@@ -69,24 +63,31 @@ still be used on read-only filesystems.  fs-verity is for files that
 must live on a read-write filesystem because they are independently
 updated and potentially user-installed, so dm-verity cannot be used.
 
-The base fs-verity feature is a hashing mechanism only; actually
-authenticating the files may be done by:
-
-* Userspace-only
-
-* Builtin signature verification + userspace policy
-
-  fs-verity optionally supports a simple signature verification
-  mechanism where users can configure the kernel to require that
-  all fs-verity files be signed by a key loaded into a keyring;
-  see `Built-in signature verification`_.
-
-* Integrity Measurement Architecture (IMA)
-
-  IMA supports including fs-verity file digests and signatures in the
-  IMA measurement list and verifying fs-verity based file signatures
-  stored as security.ima xattrs, based on policy.
-
+fs-verity does not mandate a particular scheme for authenticating its
+file hashes.  (Similarly, dm-verity does not mandate a particular
+scheme for authenticating its block device root hashes.)  Options for
+authenticating fs-verity file hashes include:
+
+- Trusted userspace code.  Often, the userspace code that accesses
+  files can be trusted to authenticate them.  Consider e.g. an
+  application that wants to authenticate data files before using them,
+  or an application loader that is part of the operating system (which
+  is already authenticated in a different way, such as by being loaded
+  from a read-only partition that uses dm-verity) and that wants to
+  authenticate applications before loading them.  In these cases, this
+  trusted userspace code can authenticate a file's contents by
+  retrieving its fs-verity digest using `FS_IOC_MEASURE_VERITY`_, then
+  verifying a signature of it using any userspace cryptographic
+  library that supports digital signatures.
+
+- Integrity Measurement Architecture (IMA).  IMA supports fs-verity
+  file digests as an alternative to its traditional full file digests.
+  "IMA appraisal" enforces that files contain a valid, matching
+  signature in their "security.ima" extended attribute, as controlled
+  by the IMA policy.  For more information, see the IMA documentation.
+
+- Trusted userspace code in combination with `Built-in signature
+  verification`_.  This approach should be used only with great care.
 
 User API
 ========
@@ -111,8 +112,7 @@ follows::
     };
 
 This structure contains the parameters of the Merkle tree to build for
-the file, and optionally contains a signature.  It must be initialized
-as follows:
+the file.  It must be initialized as follows:
 
 - ``version`` must be 1.
 - ``hash_algorithm`` must be the identifier for the hash algorithm to
@@ -129,12 +129,14 @@ as follows:
   file or device.  Currently the maximum salt size is 32 bytes.
 - ``salt_ptr`` is the pointer to the salt, or NULL if no salt is
   provided.
-- ``sig_size`` is the size of the signature in bytes, or 0 if no
-  signature is provided.  Currently the signature is (somewhat
-  arbitrarily) limited to 16128 bytes.  See `Built-in signature
-  verification`_ for more information.
-- ``sig_ptr``  is the pointer to the signature, or NULL if no
-  signature is provided.
+- ``sig_size`` is the size of the builtin signature in bytes, or 0 if no
+  builtin signature is provided.  Currently the builtin signature is
+  (somewhat arbitrarily) limited to 16128 bytes.
+- ``sig_ptr``  is the pointer to the builtin signature, or NULL if no
+  builtin signature is provided.  A builtin signature is only needed
+  if the `Built-in signature verification`_ feature is being used.  It
+  is not needed for IMA appraisal, and it is not needed if the file
+  signature is being handled entirely in userspace.
 - All reserved fields must be zeroed.
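
A minimal userspace sketch of filling in this structure and calling the
ioctl, assuming SHA-256, a 4096-byte Merkle tree block size, no salt and
no builtin signature::

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/fsverity.h>

    int main(int argc, char **argv)
    {
            struct fsverity_enable_arg arg;
            int fd;

            if (argc != 2) {
                    fprintf(stderr, "usage: %s FILE\n", argv[0]);
                    return 1;
            }

            fd = open(argv[1], O_RDONLY);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            memset(&arg, 0, sizeof(arg));   /* all reserved fields zeroed */
            arg.version = 1;
            arg.hash_algorithm = FS_VERITY_HASH_ALG_SHA256;
            arg.block_size = 4096;

            if (ioctl(fd, FS_IOC_ENABLE_VERITY, &arg) != 0) {
                    perror("FS_IOC_ENABLE_VERITY");
                    close(fd);
                    return 1;
            }

            close(fd);
            return 0;
    }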
 
 FS_IOC_ENABLE_VERITY causes the filesystem to build a Merkle tree for
@@ -158,7 +160,7 @@ fatal signal), no changes are made to the file.
 FS_IOC_ENABLE_VERITY can fail with the following errors:
 
 - ``EACCES``: the process does not have write access to the file
-- ``EBADMSG``: the signature is malformed
+- ``EBADMSG``: the builtin signature is malformed
 - ``EBUSY``: this ioctl is already running on the file
 - ``EEXIST``: the file already has verity enabled
 - ``EFAULT``: the caller provided inaccessible memory
@@ -168,10 +170,10 @@ FS_IOC_ENABLE_VERITY can fail with the following errors:
   reserved bits are set; or the file descriptor refers to neither a
   regular file nor a directory.
 - ``EISDIR``: the file descriptor refers to a directory
-- ``EKEYREJECTED``: the signature doesn't match the file
-- ``EMSGSIZE``: the salt or signature is too long
-- ``ENOKEY``: the fs-verity keyring doesn't contain the certificate
-  needed to verify the signature
+- ``EKEYREJECTED``: the builtin signature doesn't match the file
+- ``EMSGSIZE``: the salt or builtin signature is too long
+- ``ENOKEY``: the ".fs-verity" keyring doesn't contain the certificate
+  needed to verify the builtin signature
 - ``ENOPKG``: fs-verity recognizes the hash algorithm, but it's not
   available in the kernel's crypto API as currently configured (e.g.
   for SHA-512, missing CONFIG_CRYPTO_SHA512).
@@ -180,8 +182,8 @@ FS_IOC_ENABLE_VERITY can fail with the following errors:
   support; or the filesystem superblock has not had the 'verity'
   feature enabled on it; or the filesystem does not support fs-verity
   on this file.  (See `Filesystem support`_.)
-- ``EPERM``: the file is append-only; or, a signature is required and
-  one was not provided.
+- ``EPERM``: the file is append-only; or, a builtin signature is
+  required and one was not provided.
 - ``EROFS``: the filesystem is read-only
 - ``ETXTBSY``: someone has the file open for writing.  This can be the
   caller's file descriptor, another open file descriptor, or the file
@@ -270,9 +272,9 @@ This ioctl takes in a pointer to the following structure::
 - ``FS_VERITY_METADATA_TYPE_DESCRIPTOR`` reads the fs-verity
   descriptor.  See `fs-verity descriptor`_.
 
-- ``FS_VERITY_METADATA_TYPE_SIGNATURE`` reads the signature which was
-  passed to FS_IOC_ENABLE_VERITY, if any.  See `Built-in signature
-  verification`_.
+- ``FS_VERITY_METADATA_TYPE_SIGNATURE`` reads the builtin signature
+  which was passed to FS_IOC_ENABLE_VERITY, if any.  See `Built-in
+  signature verification`_.
 
 The semantics are similar to those of ``pread()``.  ``offset``
 specifies the offset in bytes into the metadata item to read from, and
@@ -299,7 +301,7 @@ FS_IOC_READ_VERITY_METADATA can fail with the following errors:
   overflowed
 - ``ENODATA``: the file is not a verity file, or
   FS_VERITY_METADATA_TYPE_SIGNATURE was requested but the file doesn't
-  have a built-in signature
+  have a builtin signature
 - ``ENOTTY``: this type of filesystem does not implement fs-verity, or
   this ioctl is not yet implemented on it
 - ``EOPNOTSUPP``: the kernel was not configured with fs-verity
@@ -347,8 +349,8 @@ non-verity one, with the following exceptions:
   with EIO (for read()) or SIGBUS (for mmap() reads).
 
 - If the sysctl "fs.verity.require_signatures" is set to 1 and the
-  file is not signed by a key in the fs-verity keyring, then opening
-  the file will fail.  See `Built-in signature verification`_.
+  file is not signed by a key in the ".fs-verity" keyring, then
+  opening the file will fail.  See `Built-in signature verification`_.
 
 Direct access to the Merkle tree is not supported.  Therefore, if a
 verity file is copied, or is backed up and restored, then it will lose
@@ -433,20 +435,25 @@ root hash as well as other fields such as the file size::
 Built-in signature verification
 ===============================
 
-With CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y, fs-verity supports putting
-a portion of an authentication policy (see `Use cases`_) in the
-kernel.  Specifically, it adds support for:
+CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y adds supports for in-kernel
+verification of fs-verity builtin signatures.
+
+**IMPORTANT**!  Please take great care before using this feature.
+It is not the only way to do signatures with fs-verity, and the
+alternatives (such as userspace signature verification, and IMA
+appraisal) can be much better.  It's also easy to fall into a trap
+of thinking this feature solves more problems than it actually does.
+
+Enabling this option adds the following:
 
-1. At fs-verity module initialization time, a keyring ".fs-verity" is
-   created.  The root user can add trusted X.509 certificates to this
-   keyring using the add_key() system call, then (when done)
-   optionally use keyctl_restrict_keyring() to prevent additional
-   certificates from being added.
+1. At boot time, the kernel creates a keyring named ".fs-verity".  The
+   root user can add trusted X.509 certificates to this keyring using
+   the add_key() system call.
 
 2. `FS_IOC_ENABLE_VERITY`_ accepts a pointer to a PKCS#7 formatted
    detached signature in DER format of the file's fs-verity digest.
-   On success, this signature is persisted alongside the Merkle tree.
-   Then, any time the file is opened, the kernel will verify the
+   On success, the ioctl persists the signature alongside the Merkle
+   tree.  Then, any time the file is opened, the kernel verifies the
    file's actual digest against this signature, using the certificates
    in the ".fs-verity" keyring.
 
@@ -454,8 +461,8 @@ kernel.  Specifically, it adds support for:
    When set to 1, the kernel requires that all verity files have a
    correctly signed digest as described in (2).
 
-fs-verity file digests must be signed in the following format, which
-is similar to the structure used by `FS_IOC_MEASURE_VERITY`_::
+The signature described in (2) must be a signature of the fs-verity
+file digest in the following format::
 
     struct fsverity_formatted_digest {
             char magic[8];                  /* must be "FSVerity" */
@@ -464,13 +471,66 @@ is similar to the structure used by `FS_IOC_MEASURE_VERITY`_::
             __u8 digest[];
     };
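
A rough sketch of assembling this blob in userspace, given a digest
obtained with `FS_IOC_MEASURE_VERITY`_, before passing it to a signing
tool; the two 16-bit fields are little-endian::

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /*
     * Build the fsverity_formatted_digest blob shown above.  The digest,
     * its algorithm and its size are assumed to come from
     * FS_IOC_MEASURE_VERITY.  Returns the total blob size.
     */
    static size_t format_digest(uint8_t *out, uint16_t digest_algorithm,
                                const uint8_t *digest, uint16_t digest_size)
    {
            memcpy(out, "FSVerity", 8);             /* magic */
            out[8]  = digest_algorithm & 0xff;      /* __le16 digest_algorithm */
            out[9]  = digest_algorithm >> 8;
            out[10] = digest_size & 0xff;           /* __le16 digest_size */
            out[11] = digest_size >> 8;
            memcpy(out + 12, digest, digest_size);  /* digest bytes */
            return 12 + digest_size;
    }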
 
-fs-verity's built-in signature verification support is meant as a
-relatively simple mechanism that can be used to provide some level of
-authenticity protection for verity files, as an alternative to doing
-the signature verification in userspace or using IMA-appraisal.
-However, with this mechanism, userspace programs still need to check
-that the verity bit is set, and there is no protection against verity
-files being swapped around.
+That's it.  It should be emphasized again that fs-verity builtin
+signatures are not the only way to do signatures with fs-verity.  See
+`Use cases`_ for an overview of ways in which fs-verity can be used.
+fs-verity builtin signatures have some major limitations that should
+be carefully considered before using them:
+
+- Builtin signature verification does *not* make the kernel enforce
+  that any files actually have fs-verity enabled.  Thus, it is not a
+  complete authentication policy.  Currently, if it is used, the only
+  way to complete the authentication policy is for trusted userspace
+  code to explicitly check whether files have fs-verity enabled with a
+  signature before they are accessed.  (With
+  fs.verity.require_signatures=1, just checking whether fs-verity is
+  enabled suffices.)  But, in this case the trusted userspace code
+  could just store the signature alongside the file and verify it
+  itself using a cryptographic library, instead of using this feature.
+
+- A file's builtin signature can only be set at the same time that
+  fs-verity is being enabled on the file.  Changing or deleting the
+  builtin signature later requires re-creating the file.
+
+- Builtin signature verification uses the same set of public keys for
+  all fs-verity enabled files on the system.  Different keys cannot be
+  trusted for different files; each key is all or nothing.
+
+- The sysctl fs.verity.require_signatures applies system-wide.
+  Setting it to 1 only works when all users of fs-verity on the system
+  agree that it should be set to 1.  This limitation can prevent
+  fs-verity from being used in cases where it would be helpful.
+
+- Builtin signature verification can only use signature algorithms
+  that are supported by the kernel.  For example, the kernel does not
+  yet support Ed25519, even though this is often the signature
+  algorithm that is recommended for new cryptographic designs.
+
+- fs-verity builtin signatures are in PKCS#7 format, and the public
+  keys are in X.509 format.  These formats are commonly used,
+  including by some other kernel features (which is why the fs-verity
+  builtin signatures use them), and are very feature rich.
+  Unfortunately, history has shown that code that parses and handles
+  these formats (which are from the 1990s and are based on ASN.1)
+  often has vulnerabilities as a result of their complexity.  This
+  complexity is not inherent to the cryptography itself.
+
+  fs-verity users who do not need advanced features of X.509 and
+  PKCS#7 should strongly consider using simpler formats, such as plain
+  Ed25519 keys and signatures, and verifying signatures in userspace.
+
+  fs-verity users who choose to use X.509 and PKCS#7 anyway should
+  still consider that verifying those signatures in userspace is more
+  flexible (for other reasons mentioned earlier in this document) and
+  eliminates the need to enable CONFIG_FS_VERITY_BUILTIN_SIGNATURES
+  and its associated increase in kernel attack surface.  In some cases
+  it can even be necessary, since advanced X.509 and PKCS#7 features
+  do not always work as intended with the kernel.  For example, the
+  kernel does not check X.509 certificate validity times.
+
+  Note: IMA appraisal, which supports fs-verity, does not use PKCS#7
+  for its signatures, so it partially avoids the issues discussed
+  here.  IMA appraisal does use X.509.
 
 Filesystem support
 ==================
index fbb2b5a..eb252fc 100644 (file)
@@ -72,7 +72,6 @@ Documentation for filesystem implementations.
    befs
    bfs
    btrfs
-   cifs/index
    ceph
    coda
    configfs
@@ -111,6 +110,7 @@ Documentation for filesystem implementations.
    ramfs-rootfs-initramfs
    relay
    romfs
+   smb/index
    spufs/index
    squashfs
    sysfs
index 1649606..447f767 100644 (file)
@@ -6,8 +6,7 @@ Ramfs, rootfs and initramfs
 
 October 17, 2005
 
-Rob Landley <rob@landley.net>
-=============================
+:Author: Rob Landley <rob@landley.net>
 
 What is ramfs?
 --------------
index d833953..1cf5648 100644 (file)
@@ -147,6 +147,7 @@ replicas continue to be exactly same.
 
 
 3) Setting mount states
+-----------------------
 
        The mount command (util-linux package) can be used to set mount
        states::
@@ -612,6 +613,7 @@ replicas continue to be exactly same.
 
 
 6) Quiz
+-------
 
        A. What is the result of the following command sequence?
 
@@ -673,6 +675,7 @@ replicas continue to be exactly same.
                /mnt/1/test be?
 
 7) FAQ
+------
 
        Q1. Why is bind mount needed? How is it different from symbolic links?
                symbolic links can get stale if the destination mount gets
@@ -841,6 +844,7 @@ replicas continue to be exactly same.
                             tmp  usr tmp usr tmp usr
 
 8) Implementation
+-----------------
 
 8A) Datastructure
 
similarity index 97%
rename from Documentation/filesystems/cifs/cifsroot.rst
rename to Documentation/filesystems/smb/cifsroot.rst
index 4930bb4..bf2d9db 100644 (file)
@@ -59,7 +59,7 @@ the root file system via SMB protocol.
 Enables the kernel to mount the root file system via SMB that are
 located in the <server-ip> and <share> specified in this option.
 
-The default mount options are set in fs/cifs/cifsroot.c.
+The default mount options are set in fs/smb/client/cifsroot.c.
 
 server-ip
        IPv4 address of the server.
index f80f956..43c9688 100644 (file)
@@ -1,7 +1,7 @@
 .. SPDX-License-Identifier: GPL-2.0
 
 ====
-fpga
+FPGA
 ====
 
 .. toctree::
index 7003bd5..6a9ea96 100644 (file)
@@ -1,7 +1,7 @@
 .. SPDX-License-Identifier: GPL-2.0
 
 =======
-locking
+Locking
 =======
 
 .. toctree::
index 80ae503..ec0ddfb 100644 (file)
@@ -56,7 +56,7 @@ by adding the following hook into your git:
        $ cat >.git/hooks/applypatch-msg <<'EOF'
        #!/bin/sh
        . git-sh-setup
-       perl -pi -e 's|^Message-Id:\s*<?([^>]+)>?$|Link: https://lore.kernel.org/r/$1|g;' "$1"
+       perl -pi -e 's|^Message-I[dD]:\s*<?([^>]+)>?$|Link: https://lore.kernel.org/r/$1|g;' "$1"
        test -x "$GIT_DIR/hooks/commit-msg" &&
                exec "$GIT_DIR/hooks/commit-msg" ${1+"$@"}
        :
index cfd8f41..c12838c 100644 (file)
@@ -52,3 +52,22 @@ Build kernel with:
 
 Optionally, build kernel with PAGE_TABLE_CHECK_ENFORCED in order to have page
 table support without extra kernel parameter.
+
+Implementation notes
+====================
+
+We specifically decided not to use VMA information in order to avoid relying on
+MM states (except for limited "struct page" info). The page table check is a
+state machine separate from the Linux-MM one; it verifies that the user
+accessible pages are not falsely shared.
+
+PAGE_TABLE_CHECK depends on EXCLUSIVE_SYSTEM_RAM. The reason is that without
+EXCLUSIVE_SYSTEM_RAM, users are allowed to map arbitrary physical memory
+regions into the userspace via /dev/mem. At the same time, pages may change
+their properties (e.g., from anonymous pages to named pages) while they are
+still being mapped in the userspace, leading to "corruption" detected by the
+page table check.
+
+Even with EXCLUSIVE_SYSTEM_RAM, I/O pages may be still allowed to be mapped via
+/dev/mem. However, these pages are always considered as named pages, so they
+won't break the logic used in the page table check.
index 9693957..7840c18 100644 (file)
@@ -3,3 +3,152 @@
 ===========
 Page Tables
 ===========
+
+Paged virtual memory was invented along with virtual memory as a concept in
+1962 on the Ferranti Atlas Computer which was the first computer with paged
+virtual memory. The feature migrated to newer computers and became a de facto
+feature of all Unix-like systems as time went by. In 1985 the feature was
+included in the Intel 80386, which was the CPU Linux 1.0 was developed on.
+
+Page tables map virtual addresses as seen by the CPU into physical addresses
+as seen on the external memory bus.
+
+Linux defines page tables as a hierarchy which is currently five levels in
+height. The architecture code for each supported architecture will then
+map this to the restrictions of the hardware.
+
+The physical address corresponding to the virtual address is often referenced
+by the underlying physical page frame. The **page frame number** or **pfn**
+is the physical address of the page (as seen on the external memory bus)
+divided by `PAGE_SIZE`.
+
+Physical memory address 0 will be *pfn 0* and the highest pfn will be
+the last page of physical memory the external address bus of the CPU can
+address.
+
+With a page granularity of 4KB and an address range of 32 bits, pfn 0 is at
+address 0x00000000, pfn 1 is at address 0x00001000, pfn 2 is at 0x00002000
+and so on until we reach pfn 0xfffff at 0xfffff000. With 16KB pages the page
+base addresses are at 0x00004000, 0x00008000 ... 0xffffc000 and pfns go from
+0 to 0x3ffff.
+
+As you can see, with 4KB pages the page base address uses bits 12-31 of the
+address, and this is why `PAGE_SHIFT` in this case is defined as 12 and
+`PAGE_SIZE` is usually defined in terms of the page shift as `(1 << PAGE_SHIFT)`.
+
+Over time a deeper hierarchy has been developed in response to increasing memory
+sizes. When Linux was created, 4KB pages and a single page table called
+`swapper_pg_dir` with 1024 entries was used, covering 4MB which coincided with
+the fact that Torvalds' first computer had 4MB of physical memory. Entries in
+this single table were referred to as *PTE*:s - page table entries.
+
+The software page table hierarchy reflects the fact that page table hardware has
+become hierarchical and that in turn is done to save page table memory and
+speed up mapping.
+
+One could of course imagine a single, linear page table with enormous amounts
+of entries, breaking down the whole memory into single pages. Such a page table
+would be very sparse, because large portions of the virtual memory usually
+remain unused. By using hierarchical page tables, large holes in the virtual
+address space do not waste valuable page table memory, because it suffices
+to mark large areas as unmapped at a higher level in the page table hierarchy.
+
+Additionally, on modern CPUs, a higher level page table entry can point directly
+to a physical memory range, which allows mapping a contiguous range of several
+megabytes or even gigabytes in a single high-level page table entry, taking
+shortcuts in mapping virtual memory to physical memory: there is no need to
+traverse deeper in the hierarchy when you find a large mapped range like this.
+
+The page table hierarchy has now developed into this::
+
+  +-----+
+  | PGD |
+  +-----+
+     |
+     |   +-----+
+     +-->| P4D |
+         +-----+
+            |
+            |   +-----+
+            +-->| PUD |
+                +-----+
+                   |
+                   |   +-----+
+                   +-->| PMD |
+                       +-----+
+                          |
+                          |   +-----+
+                          +-->| PTE |
+                              +-----+
+
+
+Symbols on the different levels of the page table hierarchy have the following
+meaning beginning from the bottom:
+
+- **pte**, `pte_t`, `pteval_t` = **Page Table Entry** - mentioned earlier.
+  The *pte* is an array of `PTRS_PER_PTE` elements of the `pteval_t` type, each
+  mapping a single page of virtual memory to a single page of physical memory.
+  The architecture defines the size and contents of `pteval_t`.
+
+  A typical example is that the `pteval_t` is a 32- or 64-bit value with the
+  upper bits being a **pfn** (page frame number), and the lower bits being some
+  architecture-specific bits such as memory protection.
+
+  The **entry** part of the name is a bit confusing because while in Linux 1.0
+  this did refer to a single page table entry in the single top level page
+  table, it was retrofitted to be an array of mapping elements when two-level
+  page tables were first introduced, so the *pte* is the lowermost page
+  *table*, not a page table *entry*.
+
+- **pmd**, `pmd_t`, `pmdval_t` = **Page Middle Directory**, the hierarchy right
+  above the *pte*, with `PTRS_PER_PMD` references to the *pte*:s.
+
+- **pud**, `pud_t`, `pudval_t` = **Page Upper Directory** was introduced after
+  the other levels to handle 4-level page tables. It is potentially unused,
+  or *folded* as we will discuss later.
+
+- **p4d**, `p4d_t`, `p4dval_t` = **Page Level 4 Directory** was introduced to
+  handle 5-level page tables after the *pud* was introduced. Now it was clear
+  that we needed to replace *pgd*, *pmd*, *pud* etc with a figure indicating the
+  directory level and that we cannot go on with ad hoc names any more. This
+  is only used on systems which actually have 5 levels of page tables, otherwise
+  it is folded.
+
+- **pgd**, `pgd_t`, `pgdval_t` = **Page Global Directory** - the top level of
+  the Linux kernel page table hierarchy. The PGD for kernel memory is still
+  found in `swapper_pg_dir`, but each userspace process in the system also has
+  its own memory context and thus its own *pgd*, found in `struct mm_struct`,
+  which in turn is referenced from each `struct task_struct`. So tasks have a
+  memory context in the form of a `struct mm_struct` and this in turn has a
+  `pgd_t *pgd` pointer to the corresponding page global directory.
+
+To repeat: each level in the page table hierarchy is an *array of pointers*, so
+the **pgd** contains `PTRS_PER_PGD` pointers to the next level below, **p4d**
+contains `PTRS_PER_P4D` pointers to **pud** items and so on. The number of
+pointers on each level is architecture-defined.::
+
+        PMD
+  --> +-----+           PTE
+      | ptr |-------> +-----+
+      | ptr |-        | ptr |-------> PAGE
+      | ptr | \       | ptr |
+      | ptr |  \        ...
+      | ... |   \
+      | ptr |    \         PTE
+      +-----+     +----> +-----+
+                         | ptr |-------> PAGE
+                         | ptr |
+                           ...
+
+
+Page Table Folding
+==================
+
+If the architecture does not use all the page table levels, they can be *folded*
+which means skipped, and all operations performed on page tables will be
+compile-time augmented to just skip a level when accessing the next lower
+level.
+
+Page table handling code that wishes to be architecture-neutral, such as the
+virtual memory manager, will need to be written so that it traverses all of the
+currently five levels. This style should also be preferred for
+architecture-specific code, so as to be robust to future changes.
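
The pfn arithmetic described earlier can be illustrated with a small
userspace sketch assuming 4KB pages (a `PAGE_SHIFT` of 12); the kernel's
real definitions live in the per-architecture headers::

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT  12
    #define PAGE_SIZE   (1UL << PAGE_SHIFT)

    static uint64_t phys_to_pfn(uint64_t paddr)
    {
            return paddr >> PAGE_SHIFT;     /* same as paddr / PAGE_SIZE */
    }

    static uint64_t pfn_to_phys(uint64_t pfn)
    {
            return pfn << PAGE_SHIFT;       /* base address of the page */
    }

    int main(void)
    {
            /* pfn 2 covers physical addresses 0x2000..0x2fff with 4KB pages. */
            printf("pfn of 0x2fff: %llu\n",
                   (unsigned long long)phys_to_pfn(0x2fff));
            printf("base of pfn 2: 0x%llx\n",
                   (unsigned long long)pfn_to_phys(2));
            return 0;
    }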
index 129f413..4846345 100644 (file)
@@ -61,22 +61,6 @@ attribute-sets:
         nested-attributes: bitset-bits
 
   -
-    name: u64-array
-    attributes:
-      -
-        name: u64
-        type: nest
-        multi-attr: true
-        nested-attributes: u64
-  -
-    name: s32-array
-    attributes:
-      -
-        name: s32
-        type: nest
-        multi-attr: true
-        nested-attributes: s32
-  -
     name: string
     attributes:
       -
@@ -239,7 +223,7 @@ attribute-sets:
         name: tx-min-frag-size
         type: u32
       -
-        name: tx-min-frag-size
+        name: rx-min-frag-size
         type: u32
       -
         name: verify-enabled
@@ -310,7 +294,7 @@ attribute-sets:
         name: master-slave-state
         type: u8
       -
-        name: master-slave-lanes
+        name: lanes
         type: u32
       -
         name: rate-matching
@@ -338,7 +322,7 @@ attribute-sets:
         name: ext-substate
         type: u8
       -
-        name: down-cnt
+        name: ext-down-cnt
         type: u32
   -
     name: debug
@@ -593,7 +577,7 @@ attribute-sets:
         name: phc-index
         type: u32
   -
-    name: cable-test-nft-nest-result
+    name: cable-test-ntf-nest-result
     attributes:
       -
         name: pair
@@ -602,7 +586,7 @@ attribute-sets:
         name: code
         type: u8
   -
-    name: cable-test-nft-nest-fault-length
+    name: cable-test-ntf-nest-fault-length
     attributes:
       -
         name: pair
@@ -611,16 +595,16 @@ attribute-sets:
         name: cm
         type: u32
   -
-    name: cable-test-nft-nest
+    name: cable-test-ntf-nest
     attributes:
       -
         name: result
         type: nest
-        nested-attributes: cable-test-nft-nest-result
+        nested-attributes: cable-test-ntf-nest-result
       -
         name: fault-length
         type: nest
-        nested-attributes: cable-test-nft-nest-fault-length
+        nested-attributes: cable-test-ntf-nest-fault-length
   -
     name: cable-test
     attributes:
@@ -634,7 +618,7 @@ attribute-sets:
       -
         name: nest
         type: nest
-        nested-attributes: cable-test-nft-nest
+        nested-attributes: cable-test-ntf-nest
   -
     name: cable-test-tdr-cfg
     attributes:
@@ -705,16 +689,16 @@ attribute-sets:
         type: u8
       -
         name: corrected
-        type: nest
-        nested-attributes: u64-array
+        type: binary
+        sub-type: u64
       -
         name: uncorr
-        type: nest
-        nested-attributes: u64-array
+        type: binary
+        sub-type: u64
       -
         name: corr-bits
-        type: nest
-        nested-attributes: u64-array
+        type: binary
+        sub-type: u64
   -
     name: fec
     attributes:
@@ -792,7 +776,7 @@ attribute-sets:
         name: hist-bkt-hi
         type: u32
       -
-        name: hist-bkt-val
+        name: hist-val
         type: u64
   -
     name: stats
@@ -827,8 +811,8 @@ attribute-sets:
         type: u32
       -
         name: index
-        type: nest
-        nested-attributes: s32-array
+        type: binary
+        sub-type: s32
   -
     name: module
     attributes:
@@ -981,7 +965,7 @@ operations:
             - duplex
             - master-slave-cfg
             - master-slave-state
-            - master-slave-lanes
+            - lanes
             - rate-matching
       dump: *linkmodes-get-op
     -
@@ -1015,7 +999,7 @@ operations:
             - sqi-max
             - ext-state
             - ext-substate
-            - down-cnt
+            - ext-down-cnt
       dump: *linkstate-get-op
     -
       name: debug-get
@@ -1367,7 +1351,7 @@ operations:
         reply:
           attributes:
             - header
-            - cable-test-nft-nest
+            - cable-test-ntf-nest
     -
       name: cable-test-tdr-act
       doc: Cable test TDR.
@@ -1555,7 +1539,7 @@ operations:
             - hkey
       dump: *rss-get-op
     -
-      name: plca-get
+      name: plca-get-cfg
       doc: Get PLCA params.
 
       attribute-set: plca
@@ -1577,7 +1561,7 @@ operations:
             - burst-tmr
       dump: *plca-get-op
     -
-      name: plca-set
+      name: plca-set-cfg
       doc: Set PLCA params.
 
       attribute-set: plca
@@ -1601,7 +1585,7 @@ operations:
     -
       name: plca-ntf
       doc: Notification for change in PLCA params.
-      notify: plca-get
+      notify: plca-get-cfg
     -
       name: mm-get
       doc: Get MAC Merge configuration and state
index 614f1a5..6d89e30 100644 (file)
@@ -68,6 +68,9 @@ attribute-sets:
         type: nest
         nested-attributes: x509
         multi-attr: true
+      -
+        name: peername
+        type: string
   -
     name: done
     attributes:
@@ -105,6 +108,7 @@ operations:
             - auth-mode
             - peer-identity
             - certificate
+            - peername
     -
       name: done
       doc: Handler reports handshake completion
index adc4bf4..28925e1 100644 (file)
@@ -776,10 +776,11 @@ peer_notif_delay
        Specify the delay, in milliseconds, between each peer
        notification (gratuitous ARP and unsolicited IPv6 Neighbor
        Advertisement) when they are issued after a failover event.
-       This delay should be a multiple of the link monitor interval
-       (arp_interval or miimon, whichever is active). The default
-       value is 0 which means to match the value of the link monitor
-       interval.
+       This delay should be a multiple of the MII link monitor interval
+       (miimon).
+
+       The valid range is 0 - 300000. The default value is 0, which means
+       to match the value of the MII link monitor interval.
 
 prio
        Slave priority. A higher number means higher priority.
index 3a7a714..3354ca3 100644 (file)
@@ -40,6 +40,7 @@ flow_steering_mode: Device flow steering mode
 ---------------------------------------------
 The flow steering mode parameter controls the flow steering mode of the driver.
 Two modes are supported:
+
 1. 'dmfs' - Device managed flow steering.
 2. 'smfs' - Software/Driver managed flow steering.
 
@@ -99,6 +100,7 @@ between representors and stacked devices.
 By default metadata is enabled on the supported devices in E-switch.
 Metadata is applicable only for E-switch in switchdev mode and
 users may disable it when NONE of the below use cases will be in use:
+
 1. HCA is in Dual/multi-port RoCE mode.
 2. VF/SF representor bonding (Usually used for Live migration)
 3. Stacked devices
@@ -180,7 +182,8 @@ User commands examples:
 
     $ devlink health diagnose pci/0000:82:00.0 reporter tx
 
-NOTE: This command has valid output only when interface is up, otherwise the command has empty output.
+.. note::
+   This command has valid output only when the interface is up; otherwise, the output is empty.
 
 - Show number of tx errors indicated, number of recover flows ended successfully,
   is autorecover enabled and graceful period from last recover::
@@ -232,8 +235,9 @@ User commands examples:
 
     $ devlink health dump show pci/0000:82:00.0 reporter fw
 
-NOTE: This command can run only on the PF which has fw tracer ownership,
-running it on other PF or any VF will return "Operation not permitted".
+.. note::
+   This command can run only on the PF which has fw tracer ownership,
+   running it on other PF or any VF will return "Operation not permitted".
 
 fw fatal reporter
 -----------------
@@ -256,7 +260,8 @@ User commands examples:
 
     $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal
 
-NOTE: This command can run only on PF.
+.. note::
+   This command can run only on PF.
 
 vnic reporter
 -------------
@@ -265,28 +270,37 @@ It is responsible for querying the vnic diagnostic counters from fw and displayi
 them in realtime.
 
 Description of the vnic counters:
-total_q_under_processor_handle: number of queues in an error state due to
-an async error or errored command.
-send_queue_priority_update_flow: number of QP/SQ priority/SL update
-events.
-cq_overrun: number of times CQ entered an error state due to an
-overflow.
-async_eq_overrun: number of times an EQ mapped to async events was
-overrun.
-comp_eq_overrun: number of times an EQ mapped to completion events was
-overrun.
-quota_exceeded_command: number of commands issued and failed due to quota
-exceeded.
-invalid_command: number of commands issued and failed dues to any reason
-other than quota exceeded.
-nic_receive_steering_discard: number of packets that completed RX flow
-steering but were discarded due to a mismatch in flow table.
+
+- total_q_under_processor_handle
+        number of queues in an error state due to
+        an async error or errored command.
+- send_queue_priority_update_flow
+        number of QP/SQ priority/SL update events.
+- cq_overrun
+        number of times CQ entered an error state due to an overflow.
+- async_eq_overrun
+        number of times an EQ mapped to async events was overrun.
+- comp_eq_overrun
+        number of times an EQ mapped to completion events was overrun.
+- quota_exceeded_command
+        number of commands issued and failed due to quota exceeded.
+- invalid_command
+        number of commands issued and failed due to any reason other than quota
+        exceeded.
+- nic_receive_steering_discard
+        number of packets that completed RX flow
+        steering but were discarded due to a mismatch in flow table.
 
 User commands examples:
-- Diagnose PF/VF vnic counters
+
+- Diagnose PF/VF vnic counters::
+
         $ devlink health diagnose pci/0000:82:00.1 reporter vnic
+
 - Diagnose representor vnic counters (performed by supplying devlink port of the
-  representor, which can be obtained via devlink port command)
+  representor, which can be obtained via devlink port command)::
+
         $ devlink health diagnose pci/0000:82:00.1/65537 reporter vnic
 
-NOTE: This command can run over all interfaces such as PF/VF and representor ports.
+.. note::
+   This command can run over all interfaces such as PF/VF and representor ports.
index a164ff0..5b75c3f 100644 (file)
@@ -116,8 +116,8 @@ Contents:
    udplite
    vrf
    vxlan
-   x25-iface
    x25
+   x25-iface
    xfrm_device
    xfrm_proc
    xfrm_sync
index 6ec06a3..80b8f73 100644 (file)
@@ -1352,8 +1352,8 @@ ping_group_range - 2 INTEGERS
        Restrict ICMP_PROTO datagram sockets to users in the group range.
        The default is "1 0", meaning, that nobody (not even root) may
        create ping sockets.  Setting it to "100 100" would grant permissions
-       to the single group. "0 4294967295" would enable it for the world, "100
-       4294967295" would enable it for the users, but not daemons.
+       to the single group. "0 4294967294" would enable it for the world, "100
+       4294967294" would enable it for the users, but not daemons.
 
 tcp_early_demux - BOOLEAN
        Enable early demux for established TCP sockets.
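
To make the ping_group_range semantics above concrete, here is a minimal
userspace sketch. It is illustrative only and not part of any patch in this
series; it assumes nothing beyond the standard socket API. Unprivileged
ICMP "ping" sockets are exactly what this sysctl gates::

    /* Create an unprivileged ICMP echo socket. socket() typically fails
     * with EACCES when the caller's group id lies outside
     * net.ipv4.ping_group_range.
     */
    #include <stdio.h>
    #include <errno.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);

        if (fd < 0) {
            fprintf(stderr, "ping socket: %s\n", strerror(errno));
            return 1;
        }
        printf("unprivileged ICMP socket created for gid %u\n",
               (unsigned int)getgid());
        close(fd);
        return 0;
    }
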
index a2817a8..6f5ea16 100644 (file)
@@ -53,6 +53,7 @@ fills in a structure that contains the parameters of the request:
         struct socket   *ta_sock;
         tls_done_func_t ta_done;
         void            *ta_data;
+        const char      *ta_peername;
         unsigned int    ta_timeout_ms;
         key_serial_t    ta_keyring;
         key_serial_t    ta_my_cert;
@@ -71,6 +72,10 @@ instantiated a struct file in sock->file.
 has completed. Further explanation of this function is in the "Handshake
 Completion" sesction below.
 
+The consumer can provide a NUL-terminated hostname in the @ta_peername
+field that is sent as part of ClientHello. If no peername is provided,
+the DNS hostname associated with the server's IP address is used instead.
+
 The consumer can fill in the @ta_timeout_ms field to force the servicing
 handshake agent to exit after a number of milliseconds. This enables the
 socket to be fully closed once both the kernel and the handshake agent
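
To illustrate how a kernel consumer might use the new @ta_peername field, here
is a minimal sketch. It is not part of the patch: it assumes the
tls_client_hello_x509() upcall declared in include/net/handshake.h, and
"struct my_conn" together with its fields and my_done() are invented purely
for illustration::

    #include <net/handshake.h>

    /* Completion callback invoked once the user-space handshake agent
     * reports that the handshake is finished.
     */
    static void my_done(void *data, int status, key_serial_t peerid)
    {
        struct my_conn *conn = data;            /* hypothetical structure */

        conn->tls_status = status;              /* hypothetical field */
    }

    static int my_start_tls(struct my_conn *conn, struct socket *sock)
    {
        struct tls_handshake_args args = {
            .ta_sock       = sock,
            .ta_done       = my_done,
            .ta_data       = conn,
            .ta_peername   = "server.example.com",
            .ta_timeout_ms = 10000,              /* 10 second timeout */
            .ta_keyring    = conn->keyring,      /* hypothetical field */
            .ta_my_cert    = conn->cert,         /* hypothetical field */
            .ta_my_privkey = conn->privkey,      /* hypothetical field */
        };

        return tls_client_hello_x509(&args, GFP_KERNEL);
    }
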
index f34e9ec..285cefc 100644 (file)
@@ -1,8 +1,7 @@
 .. SPDX-License-Identifier: GPL-2.0
 
-============================-
 X.25 Device Driver Interface
-============================-
+============================
 
 Version 1.1
 
index 7ae1f62..8067236 100644 (file)
@@ -1,7 +1,7 @@
 .. SPDX-License-Identifier: GPL-2.0
 
 ======
-pcmcia
+PCMCIA
 ======
 
 .. toctree::
index 6a919cf..613a01d 100644 (file)
@@ -434,9 +434,10 @@ There are a few hints which can help with linux-kernel survival:
   questions.  Some developers can get impatient with people who clearly
   have not done their homework.
 
-- Avoid top-posting (the practice of putting your answer above the quoted
-  text you are responding to).  It makes your response harder to read and
-  makes a poor impression.
+- Use interleaved ("inline") replies, which make your response easier to
+  read; that is, avoid top-posting -- the practice of putting your answer
+  above the quoted text you are responding to. For more details, see
+  :ref:`Documentation/process/submitting-patches.rst <interleaved_replies>`.
 
 - Ask on the correct mailing list.  Linux-kernel may be the general meeting
   point, but it is not the best place to find developers from all
index ef54086..5cf6a5f 100644 (file)
@@ -31,7 +31,7 @@ you probably needn't concern yourself with pcmciautils.
 ====================== ===============  ========================================
 GNU C                  5.1              gcc --version
 Clang/LLVM (optional)  11.0.0           clang --version
-Rust (optional)        1.62.0           rustc --version
+Rust (optional)        1.68.2           rustc --version
 bindgen (optional)     0.56.0           bindgen --version
 GNU make               3.82             make --version
 bash                   4.2              bash --version
index abb741b..5d3c3de 100644 (file)
@@ -129,88 +129,132 @@ tools and scripts used by other kernel developers or Linux distributions; one of
 these tools is regzbot, which heavily relies on the "Link:" tags to associate
 reports for regression with changes resolving them.
 
-Prioritize work on fixing regressions
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-You should fix any reported regression as quickly as possible, to provide
-affected users with a solution in a timely manner and prevent more users from
-running into the issue; nevertheless developers need to take enough time and
-care to ensure regression fixes do not cause additional damage.
-
-In the end though, developers should give their best to prevent users from
-running into situations where a regression leaves them only three options: "run
-a kernel with a regression that seriously impacts usage", "continue running an
-outdated and thus potentially insecure kernel version for more than two weeks
-after a regression's culprit was identified", and "downgrade to a still
-supported kernel series that lack required features".
-
-How to realize this depends a lot on the situation. Here are a few rules of
-thumb for you, in order or importance:
-
- * Prioritize work on handling regression reports and fixing regression over all
-   other Linux kernel work, unless the latter concerns acute security issues or
-   bugs causing data loss or damage.
-
- * Always consider reverting the culprit commits and reapplying them later
-   together with necessary fixes, as this might be the least dangerous and
-   quickest way to fix a regression.
-
- * Developers should handle regressions in all supported kernel series, but are
-   free to delegate the work to the stable team, if the issue probably at no
-   point in time occurred with mainline.
-
- * Try to resolve any regressions introduced in the current development before
-   its end. If you fear a fix might be too risky to apply only days before a new
-   mainline release, let Linus decide: submit the fix separately to him as soon
-   as possible with the explanation of the situation. He then can make a call
-   and postpone the release if necessary, for example if multiple such changes
-   show up in his inbox.
-
- * Address regressions in stable, longterm, or proper mainline releases with
-   more urgency than regressions in mainline pre-releases. That changes after
-   the release of the fifth pre-release, aka "-rc5": mainline then becomes as
-   important, to ensure all the improvements and fixes are ideally tested
-   together for at least one week before Linus releases a new mainline version.
-
- * Fix regressions within two or three days, if they are critical for some
-   reason -- for example, if the issue is likely to affect many users of the
-   kernel series in question on all or certain architectures. Note, this
-   includes mainline, as issues like compile errors otherwise might prevent many
-   testers or continuous integration systems from testing the series.
-
- * Aim to fix regressions within one week after the culprit was identified, if
-   the issue was introduced in either:
-
-    * a recent stable/longterm release
-
-    * the development cycle of the latest proper mainline release
-
-   In the latter case (say Linux v5.14), try to address regressions even
-   quicker, if the stable series for the predecessor (v5.13) will be abandoned
-   soon or already was stamped "End-of-Life" (EOL) -- this usually happens about
-   three to four weeks after a new mainline release.
-
- * Try to fix all other regressions within two weeks after the culprit was
-   found. Two or three additional weeks are acceptable for performance
-   regressions and other issues which are annoying, but don't prevent anyone
-   from running Linux (unless it's an issue in the current development cycle,
-   as those should ideally be addressed before the release). A few weeks in
-   total are acceptable if a regression can only be fixed with a risky change
-   and at the same time is affecting only a few users; as much time is
-   also okay if the regression is already present in the second newest longterm
-   kernel series.
-
-Note: The aforementioned time frames for resolving regressions are meant to
-include getting the fix tested, reviewed, and merged into mainline, ideally with
-the fix being in linux-next at least briefly. This leads to delays you need to
-account for.
-
-Subsystem maintainers are expected to assist in reaching those periods by doing
-timely reviews and quick handling of accepted patches. They thus might have to
-send git-pull requests earlier or more often than usual; depending on the fix,
-it might even be acceptable to skip testing in linux-next. Especially fixes for
-regressions in stable and longterm kernels need to be handled quickly, as fixes
-need to be merged in mainline before they can be backported to older series.
+Expectations and best practices for fixing regressions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As a Linux kernel developer, you are expected to give your best to prevent
+situations where a regression caused by a recent change of yours leaves users
+only these options:
+
+ * Run a kernel with a regression that impacts usage.
+
+ * Switch to an older or newer kernel series.
+
+ * Continue running an outdated and thus potentially insecure kernel for more
+   than three weeks after the regression's culprit was identified. Ideally it
+   should be less than two. And it ought to be just a few days, if the issue is
+   severe or affects many users -- either in general or in prevalent
+   environments.
+
+How to realize that in practice depends on various factors. Use the following
+rules of thumb as a guide.
+
+In general:
+
+ * Prioritize work on regressions over all other Linux kernel work, unless the
+   latter concerns a severe issue (e.g. acute security vulnerability, data loss,
+   bricked hardware, ...).
+
+ * Expedite fixing mainline regressions that recently made it into a proper
+   mainline, stable, or longterm release (either directly or via backport).
+
+ * Do not consider regressions from the current cycle as something that can wait
+   till the end of the cycle, as the issue might discourage or prevent users and
+   CI systems from testing mainline now or generally.
+
+ * Work with the required care to avoid additional or bigger damage, even if
+   resolving an issue then might take longer than outlined below.
+
+On timing once the culprit of a regression is known:
+
+ * Aim to mainline a fix within two or three days, if the issue is severe or
+   bothering many users -- either in general or in prevalent conditions like a
+   particular hardware environment, distribution, or stable/longterm series.
+
+ * Aim to mainline a fix by Sunday after the next, if the culprit made it
+   into a recent mainline, stable, or longterm release (either directly or via
+   backport); if the culprit became known early during a week and is simple to
+   resolve, try to mainline the fix within the same week.
+
+ * For other regressions, aim to mainline fixes before the hindmost Sunday
+   within the next three weeks. One or two Sundays later are acceptable, if the
+   regression is something people can live with easily for a while -- like a
+   mild performance regression.
+
+ * It's strongly discouraged to delay mainlining regression fixes till the next
+   merge window, except when the fix is extraordinarily risky or when the
+   culprit was mainlined more than a year ago.
+
+On procedure:
+
+ * Always consider reverting the culprit, as it's often the quickest and least
+   dangerous way to fix a regression. Don't worry about mainlining a fixed
+   variant later: that should be straightforward, as most of the code went
+   through review once already.
+
+ * Try to resolve any regressions introduced in mainline during the past
+   twelve months before the current development cycle ends: Linus wants such
+   regressions to be handled like those from the current cycle, unless fixing
+   bears unusual risks.
+
+ * Consider CCing Linus on discussions or patch review, if a regression seems
+   tangly. Do the same in precarious or urgent cases -- especially if the
+   subsystem maintainer might be unavailable. Also CC the stable team, when you
+   know such a regression made it into a mainline, stable, or longterm release.
+
+ * For urgent regressions, consider asking Linus to pick up the fix straight
+   from the mailing list: he is totally fine with that for uncontroversial
+   fixes. Ideally though such requests should happen in accordance with the
+   subsystem maintainers or come directly from them.
+
+ * In case you are unsure if a fix is worth the risk of applying just days before
+   a new mainline release, send Linus a mail with the usual lists and people in
+   CC; in it, summarize the situation while asking him to consider picking up
+   the fix straight from the list. He can then make the call himself and, when
+   needed, even postpone the release. Such requests again should ideally happen
+   in accordance with the subsystem maintainers or come directly from them.
+
+Regarding stable and longterm kernels:
+
+ * You are free to leave regressions to the stable team, if they at no point in
+   time occurred with mainline or were fixed there already.
+
+ * If a regression made it into a proper mainline release during the past
+   twelve months, make sure to tag the fix with "Cc: stable@vger.kernel.org", as a
+   "Fixes:" tag alone does not guarantee a backport. Please add the same tag,
+   in case you know the culprit was backported to stable or longterm kernels.
+
+ * When receiving reports about regressions in recent stable or longterm kernel
+   series, please evaluate at least briefly if the issue might happen in current
+   mainline as well -- and if that seems likely, take hold of the report. If in
+   doubt, ask the reporter to check mainline.
+
+ * Whenever you want to swiftly resolve a regression that recently also made it
+   into a proper mainline, stable, or longterm release, fix it quickly in
+   mainline; when appropriate, involve Linus to fast-track the fix (see
+   above). That's because the stable team normally neither reverts nor fixes
+   any changes that cause the same problems in mainline.
+
+ * In case of urgent regression fixes, you might want to ensure prompt
+   backporting by dropping the stable team a note once the fix is mainlined;
+   this is especially advisable during merge windows and shortly thereafter, as
+   the fix otherwise might land at the end of a huge patch queue.
+
+On patch flow:
+
+ * Developers, when trying to reach the time periods mentioned above, remember
+   to account for the time it takes to get fixes tested, reviewed, and merged by
+   Linus, ideally with them being in linux-next at least briefly. Hence, if a
+   fix is urgent, make it obvious to ensure others handle it appropriately.
+
+ * Reviewers, you are kindly asked to assist developers in reaching the time
+   periods mentioned above by reviewing regression fixes in a timely manner.
+
+ * Subsystem maintainers, you likewise are encouraged to expedite the handling
+   of regression fixes. Thus evaluate if skipping linux-next is an option for
+   the particular fix. Also consider sending git pull requests more often than
+   usual when needed. And try to avoid holding onto regression fixes over
+   weekends -- especially when the fix is marked for backporting.
 
 
 More aspects regarding regressions developers should be aware of
index f73ac9e..83614ce 100644 (file)
@@ -127,13 +127,32 @@ the value of ``Message-ID`` to the URL above.
 Updating patch status
 ~~~~~~~~~~~~~~~~~~~~~
 
-It may be tempting to help the maintainers and update the state of your
-own patches when you post a new version or spot a bug. Please **do not**
-do that.
-Interfering with the patch status on patchwork will only cause confusion. Leave
-it to the maintainer to figure out what is the most recent and current
-version that should be applied. If there is any doubt, the maintainer
-will reply and ask what should be done.
+Contributors and reviewers do not have the permissions to update patch
+state directly in patchwork. Patchwork doesn't expose much information
+about the history of the state of patches; therefore, having multiple
+people update the state leads to confusion.
+
+Instead of delegating patchwork permissions, netdev uses a simple mail
+bot which looks for special commands/lines within the emails sent to
+the mailing list. For example, to mark a series as Changes Requested
+one needs to send the following line anywhere in the email thread::
+
+  pw-bot: changes-requested
+
+As a result, the bot will set the entire series to Changes Requested.
+This may be useful when the author discovers a bug in their own series
+and wants to prevent it from getting applied.
+
+The use of the bot is entirely optional; if in doubt, ignore its existence
+completely. Maintainers will classify and update the state of the patches
+themselves. No email should ever be sent to the list with the main purpose
+of communicating with the bot; the bot commands should be seen as metadata.
+
+The use of the bot is restricted to authors of the patches (the ``From:``
+header on patch submission and command must match!), maintainers themselves,
+and a handful of senior reviewers. The bot records its activity here:
+
+  https://patchwork.hopto.org/pw-bot.html
 
 Review timelines
 ~~~~~~~~~~~~~~~~
index 178c95f..93d8a79 100644 (file)
@@ -421,6 +421,9 @@ allowing themselves a breath. Please respect that.
 The release candidate -rc1 is the starting point for new patches to be
 applied which are targeted for the next merge window.
 
+So-called _urgent_ branches will be merged into mainline during the
+stabilization phase of each release.
+
 
 Git
 ^^^
index 486875f..efac910 100644 (file)
@@ -331,6 +331,31 @@ explaining difference against previous submission (see
 See Documentation/process/email-clients.rst for recommendations on email
 clients and mailing list etiquette.
 
+.. _interleaved_replies:
+
+Use trimmed interleaved replies in email discussions
+----------------------------------------------------
+Top-posting is strongly discouraged in Linux kernel development
+discussions. Interleaved (or "inline") replies make conversations much
+easier to follow. For more details see:
+https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
+
+As is frequently quoted on the mailing list::
+
+  A: http://en.wikipedia.org/wiki/Top_post
+  Q: Were do I find info about this thing called top-posting?
+  A: Because it messes up the order in which people normally read text.
+  Q: Why is top-posting such a bad thing?
+  A: Top-posting.
+  Q: What is the most annoying thing in e-mail?
+
+Similarly, please trim all unneeded quotations that aren't relevant
+to your reply. This makes responses easier to find, and saves time and
+space. For more details see: http://daringfireball.net/2007/07/on_top ::
+
+  A: No.
+  Q: Should I include quotations after my reply?
+
 .. _resend_reminders:
 
 Don't get discouraged - or impatient
index 07d5a56..634aa22 100644 (file)
@@ -16,6 +16,24 @@ tested code over experimental code.  We wish to extend these same
 principles to the RISC-V-related code that will be accepted for
 inclusion in the kernel.
 
+Patchwork
+---------
+
+RISC-V has a patchwork instance, where the status of patches can be checked:
+
+  https://patchwork.kernel.org/project/linux-riscv/list/
+
+If your patch does not appear in the default view, the RISC-V maintainers have
+likely either requested changes, or expect it to be applied to another tree.
+
+Automation runs against this patchwork instance, building/testing patches as
+they arrive. The automation applies patches against the current HEAD of the
+RISC-V `for-next` and `fixes` branches, depending on whether the patch has been
+detected as a fix. Failing those, it will use the RISC-V `master` branch.
+The exact commit to which a series has been applied will be noted on patchwork.
+Patches for which any of the checks fail are unlikely to be applied and in most
+cases will need to be resubmitted.
+
 Submit Checklist Addendum
 -------------------------
 We'll only accept patches for new modules or extensions if the
index 13b7744..a893151 100644 (file)
@@ -38,9 +38,9 @@ and run::
 
        rustup override set $(scripts/min-tool-version.sh rustc)
 
-Otherwise, fetch a standalone installer or install ``rustup`` from:
+Otherwise, fetch a standalone installer from:
 
-       https://www.rust-lang.org
+       https://forge.rust-lang.org/infra/other-installation-methods.html#standalone
 
 
 Rust standard library source
index d46e98c..bb3f4c4 100644 (file)
@@ -551,7 +551,6 @@ These are the steps:
    * IOMMU_SUPPORT
    * S390
    * ZCRYPT
-   * S390_AP_IOMMU
    * VFIO
    * KVM
 
index 9d9be52..9fe4846 100644 (file)
@@ -203,12 +203,15 @@ Deadline Task Scheduling
   - Total bandwidth (this_bw): this is the sum of all tasks "belonging" to the
     runqueue, including the tasks in Inactive state.
 
+  - Maximum usable bandwidth (max_bw): This is the maximum bandwidth usable by
+    deadline tasks and is currently set to the RT capacity.
+
 
  The algorithm reclaims the bandwidth of the tasks in Inactive state.
  It does so by decrementing the runtime of the executing task Ti at a pace equal
  to
 
-           dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt
+           dq = -(max{ Ui, (Umax - Uinact - Uextra) } / Umax) dt
 
  where:
 
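
As a purely illustrative application of the corrected pace (numbers invented
for the example, not taken from the document): with Umax = 0.95, Ui = 0.3,
Uinact = 0.2 and Uextra = 0.1, the running task's runtime is decremented as::

           dq = -(max{ 0.3, (0.95 - 0.2 - 0.1) } / 0.95) dt
              = -(0.65 / 0.95) dt
              ~= -0.684 dt
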
index 8a6860f..7542220 100644 (file)
@@ -1,5 +1,5 @@
 =================================
-brief tutorial on CRC computation
+Brief tutorial on CRC computation
 =================================
 
 A CRC is a long-division remainder.  You add the CRC to the message,
index b51f385..02d6dc3 100644 (file)
@@ -10,6 +10,30 @@ is taken directly from the kernel source, with supplemental material added
 as needed (or at least as we managed to add it — probably *not* all that is
 needed).
 
+Human interfaces
+----------------
+
+.. toctree::
+   :maxdepth: 1
+
+   input/index
+   hid/index
+   sound/index
+   gpu/index
+   fb/index
+
+Storage interfaces
+------------------
+
+.. toctree::
+   :maxdepth: 1
+
+   filesystems/index
+   block/index
+   cdrom/index
+   scsi/index
+   target/index
+
 **Fixme**: much more organizational work is needed here.
 
 .. toctree::
@@ -19,12 +43,8 @@ needed).
    core-api/index
    locking/index
    accounting/index
-   block/index
-   cdrom/index
    cpu-freq/index
-   fb/index
    fpga/index
-   hid/index
    i2c/index
    iio/index
    isdn/index
@@ -34,25 +54,19 @@ needed).
    networking/index
    pcmcia/index
    power/index
-   target/index
    timers/index
    spi/index
    w1/index
    watchdog/index
    virt/index
-   input/index
    hwmon/index
-   gpu/index
    accel/index
    security/index
-   sound/index
    crypto/index
-   filesystems/index
    mm/index
    bpf/index
    usb/index
    PCI/index
-   scsi/index
    misc-devices/index
    scheduler/index
    mhi/index
index df510ad..983f91f 100644 (file)
@@ -1,7 +1,7 @@
 .. SPDX-License-Identifier: GPL-2.0
 
 ======
-timers
+Timers
 ======
 
 .. toctree::
index 479c9ea..3c9b263 100644 (file)
@@ -35,7 +35,7 @@ Documentation written by Tom Zanussi
   in place of an explicit value field - this is simply a count of
   event hits.  If 'values' isn't specified, an implicit 'hitcount'
   value will be automatically created and used as the only value.
-  Keys can be any field, or the special string 'stacktrace', which
+  Keys can be any field, or the special string 'common_stacktrace', which
   will use the event's kernel stacktrace as the key.  The keywords
   'keys' or 'key' can be used to specify keys, and the keywords
   'values', 'vals', or 'val' can be used to specify values.  Compound
@@ -54,7 +54,7 @@ Documentation written by Tom Zanussi
   'compatible' if the fields named in the trigger share the same
   number and type of fields and those fields also have the same names.
   Note that any two events always share the compatible 'hitcount' and
-  'stacktrace' fields and can therefore be combined using those
+  'common_stacktrace' fields and can therefore be combined using those
   fields, however pointless that may be.
 
   'hist' triggers add a 'hist' file to each event's subdirectory.
@@ -547,9 +547,9 @@ Extended error information
   the hist trigger display symbolic call_sites, we can have the hist
   trigger additionally display the complete set of kernel stack traces
   that led to each call_site.  To do that, we simply use the special
-  value 'stacktrace' for the key parameter::
+  value 'common_stacktrace' for the key parameter::
 
-    # echo 'hist:keys=stacktrace:values=bytes_req,bytes_alloc:sort=bytes_alloc' > \
+    # echo 'hist:keys=common_stacktrace:values=bytes_req,bytes_alloc:sort=bytes_alloc' > \
            /sys/kernel/tracing/events/kmem/kmalloc/trigger
 
   The above trigger will use the kernel stack trace in effect when an
@@ -561,9 +561,9 @@ Extended error information
   every callpath to a kmalloc for a kernel compile)::
 
     # cat /sys/kernel/tracing/events/kmem/kmalloc/hist
-    # trigger info: hist:keys=stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active]
+    # trigger info: hist:keys=common_stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active]
 
-    { stacktrace:
+    { common_stacktrace:
          __kmalloc_track_caller+0x10b/0x1a0
          kmemdup+0x20/0x50
          hidraw_report_event+0x8a/0x120 [hid]
@@ -581,7 +581,7 @@ Extended error information
          cpu_startup_entry+0x315/0x3e0
          rest_init+0x7c/0x80
     } hitcount:          3  bytes_req:         21  bytes_alloc:         24
-    { stacktrace:
+    { common_stacktrace:
          __kmalloc_track_caller+0x10b/0x1a0
          kmemdup+0x20/0x50
          hidraw_report_event+0x8a/0x120 [hid]
@@ -596,7 +596,7 @@ Extended error information
          do_IRQ+0x5a/0xf0
          ret_from_intr+0x0/0x30
     } hitcount:          3  bytes_req:         21  bytes_alloc:         24
-    { stacktrace:
+    { common_stacktrace:
          kmem_cache_alloc_trace+0xeb/0x150
          aa_alloc_task_context+0x27/0x40
          apparmor_cred_prepare+0x1f/0x50
@@ -608,7 +608,7 @@ Extended error information
     .
     .
     .
-    { stacktrace:
+    { common_stacktrace:
          __kmalloc+0x11b/0x1b0
          i915_gem_execbuffer2+0x6c/0x2c0 [i915]
          drm_ioctl+0x349/0x670 [drm]
@@ -616,7 +616,7 @@ Extended error information
          SyS_ioctl+0x81/0xa0
          system_call_fastpath+0x12/0x6a
     } hitcount:      17726  bytes_req:   13944120  bytes_alloc:   19593808
-    { stacktrace:
+    { common_stacktrace:
          __kmalloc+0x11b/0x1b0
          load_elf_phdrs+0x76/0xa0
          load_elf_binary+0x102/0x1650
@@ -625,7 +625,7 @@ Extended error information
          SyS_execve+0x3a/0x50
          return_from_execve+0x0/0x23
     } hitcount:      33348  bytes_req:   17152128  bytes_alloc:   20226048
-    { stacktrace:
+    { common_stacktrace:
          kmem_cache_alloc_trace+0xeb/0x150
          apparmor_file_alloc_security+0x27/0x40
          security_file_alloc+0x16/0x20
@@ -636,7 +636,7 @@ Extended error information
          SyS_open+0x1e/0x20
          system_call_fastpath+0x12/0x6a
     } hitcount:    4766422  bytes_req:    9532844  bytes_alloc:   38131376
-    { stacktrace:
+    { common_stacktrace:
          __kmalloc+0x11b/0x1b0
          seq_buf_alloc+0x1b/0x50
          seq_read+0x2cc/0x370
@@ -1026,7 +1026,7 @@ Extended error information
   First we set up an initially paused stacktrace trigger on the
   netif_receive_skb event::
 
-    # echo 'hist:key=stacktrace:vals=len:pause' > \
+    # echo 'hist:key=common_stacktrace:vals=len:pause' > \
            /sys/kernel/tracing/events/net/netif_receive_skb/trigger
 
   Next, we set up an 'enable_hist' trigger on the sched_process_exec
@@ -1060,9 +1060,9 @@ Extended error information
     $ wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.19.xz
 
     # cat /sys/kernel/tracing/events/net/netif_receive_skb/hist
-    # trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused]
+    # trigger info: hist:keys=common_stacktrace:vals=len:sort=hitcount:size=2048 [paused]
 
-    { stacktrace:
+    { common_stacktrace:
          __netif_receive_skb_core+0x46d/0x990
          __netif_receive_skb+0x18/0x60
          netif_receive_skb_internal+0x23/0x90
@@ -1079,7 +1079,7 @@ Extended error information
          kthread+0xd2/0xf0
          ret_from_fork+0x42/0x70
     } hitcount:         85  len:      28884
-    { stacktrace:
+    { common_stacktrace:
          __netif_receive_skb_core+0x46d/0x990
          __netif_receive_skb+0x18/0x60
          netif_receive_skb_internal+0x23/0x90
@@ -1097,7 +1097,7 @@ Extended error information
          irq_thread+0x11f/0x150
          kthread+0xd2/0xf0
     } hitcount:         98  len:     664329
-    { stacktrace:
+    { common_stacktrace:
          __netif_receive_skb_core+0x46d/0x990
          __netif_receive_skb+0x18/0x60
          process_backlog+0xa8/0x150
@@ -1115,7 +1115,7 @@ Extended error information
          inet_sendmsg+0x64/0xa0
          sock_sendmsg+0x3d/0x50
     } hitcount:        115  len:      13030
-    { stacktrace:
+    { common_stacktrace:
          __netif_receive_skb_core+0x46d/0x990
          __netif_receive_skb+0x18/0x60
          netif_receive_skb_internal+0x23/0x90
@@ -1142,14 +1142,14 @@ Extended error information
   into the histogram.  In order to avoid having to set everything up
   again, we can just clear the histogram first::
 
-    # echo 'hist:key=stacktrace:vals=len:clear' >> \
+    # echo 'hist:key=common_stacktrace:vals=len:clear' >> \
            /sys/kernel/tracing/events/net/netif_receive_skb/trigger
 
   Just to verify that it is in fact cleared, here's what we now see in
   the hist file::
 
     # cat /sys/kernel/tracing/events/net/netif_receive_skb/hist
-    # trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused]
+    # trigger info: hist:keys=common_stacktrace:vals=len:sort=hitcount:size=2048 [paused]
 
     Totals:
         Hits: 0
@@ -1485,12 +1485,12 @@ Extended error information
 
   And here's an example that shows how to combine histogram data from
   any two events even if they don't share any 'compatible' fields
-  other than 'hitcount' and 'stacktrace'.  These commands create a
+  other than 'hitcount' and 'common_stacktrace'.  These commands create a
   couple of triggers named 'bar' using those fields::
 
-    # echo 'hist:name=bar:key=stacktrace:val=hitcount' > \
+    # echo 'hist:name=bar:key=common_stacktrace:val=hitcount' > \
            /sys/kernel/tracing/events/sched/sched_process_fork/trigger
-    # echo 'hist:name=bar:key=stacktrace:val=hitcount' > \
+    # echo 'hist:name=bar:key=common_stacktrace:val=hitcount' > \
           /sys/kernel/tracing/events/net/netif_rx/trigger
 
   And displaying the output of either shows some interesting if
@@ -1501,16 +1501,16 @@ Extended error information
 
     # event histogram
     #
-    # trigger info: hist:name=bar:keys=stacktrace:vals=hitcount:sort=hitcount:size=2048 [active]
+    # trigger info: hist:name=bar:keys=common_stacktrace:vals=hitcount:sort=hitcount:size=2048 [active]
     #
 
-    { stacktrace:
+    { common_stacktrace:
              kernel_clone+0x18e/0x330
              kernel_thread+0x29/0x30
              kthreadd+0x154/0x1b0
              ret_from_fork+0x3f/0x70
     } hitcount:          1
-    { stacktrace:
+    { common_stacktrace:
              netif_rx_internal+0xb2/0xd0
              netif_rx_ni+0x20/0x70
              dev_loopback_xmit+0xaa/0xd0
@@ -1528,7 +1528,7 @@ Extended error information
              call_cpuidle+0x3b/0x60
              cpu_startup_entry+0x22d/0x310
     } hitcount:          1
-    { stacktrace:
+    { common_stacktrace:
              netif_rx_internal+0xb2/0xd0
              netif_rx_ni+0x20/0x70
              dev_loopback_xmit+0xaa/0xd0
@@ -1543,7 +1543,7 @@ Extended error information
              SyS_sendto+0xe/0x10
              entry_SYSCALL_64_fastpath+0x12/0x6a
     } hitcount:          2
-    { stacktrace:
+    { common_stacktrace:
              netif_rx_internal+0xb2/0xd0
              netif_rx+0x1c/0x60
              loopback_xmit+0x6c/0xb0
@@ -1561,7 +1561,7 @@ Extended error information
              sock_sendmsg+0x38/0x50
              ___sys_sendmsg+0x14e/0x270
     } hitcount:         76
-    { stacktrace:
+    { common_stacktrace:
              netif_rx_internal+0xb2/0xd0
              netif_rx+0x1c/0x60
              loopback_xmit+0x6c/0xb0
@@ -1579,7 +1579,7 @@ Extended error information
              sock_sendmsg+0x38/0x50
              ___sys_sendmsg+0x269/0x270
     } hitcount:         77
-    { stacktrace:
+    { common_stacktrace:
              netif_rx_internal+0xb2/0xd0
              netif_rx+0x1c/0x60
              loopback_xmit+0x6c/0xb0
@@ -1597,7 +1597,7 @@ Extended error information
              sock_sendmsg+0x38/0x50
              SYSC_sendto+0xef/0x170
     } hitcount:         88
-    { stacktrace:
+    { common_stacktrace:
              kernel_clone+0x18e/0x330
              SyS_clone+0x19/0x20
              entry_SYSCALL_64_fastpath+0x12/0x6a
@@ -1949,7 +1949,7 @@ uninterruptible state::
 
   # cd /sys/kernel/tracing
   # echo 's:block_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events
-  # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace  if prev_state == 2' >> events/sched/sched_switch/trigger
+  # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=common_stacktrace  if prev_state == 2' >> events/sched/sched_switch/trigger
   # echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(block_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger
   # echo 1 > events/synthetic/block_lat/enable
   # cat trace
index f79987e..e7b0731 100644 (file)
@@ -14,10 +14,6 @@ Programs can view status of the events via
 /sys/kernel/tracing/user_events_status and can both register and write
 data out via /sys/kernel/tracing/user_events_data.
 
-Programs can also use /sys/kernel/tracing/dynamic_events to register and
-delete user based events via the u: prefix. The format of the command to
-dynamic_events is the same as the ioctl with the u: prefix applied.
-
 Typically programs will register a set of events that they wish to expose to
 tools that can read trace_events (such as ftrace and perf). The registration
 process tells the kernel which address and bit to reflect if any tool has
@@ -144,6 +140,9 @@ its name. Delete will only succeed if there are no references left to the
 event (in both user and kernel space). User programs should use a separate file
 to request deletes than the one used for registration due to this.
 
+**NOTE:** By default, events will auto-delete when there are no references left
+to the event. Flags in the future may change this logic.
+
 Unregistering
 -------------
 If after registering an event it is no longer wanted to be updated then it can
@@ -1,4 +1,4 @@
-Chinese translated version of Documentation/arm/booting.rst
+Chinese translated version of Documentation/arch/arm/booting.rst
 
 If you have any comment or update to the content, please contact the
 original document maintainer directly.  However, if you have a problem
@@ -9,7 +9,7 @@ or if there is a problem with the translation.
 Maintainer: Russell King <linux@arm.linux.org.uk>
 Chinese maintainer: Fu Wei <tekkamanninja@gmail.com>
 ---------------------------------------------------------------------
-Documentation/arm/booting.rst 的中文翻译
+Documentation/arch/arm/booting.rst 的中文翻译
 
 如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文
 交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻
@@ -1,4 +1,4 @@
-Chinese translated version of Documentation/arm/kernel_user_helpers.rst
+Chinese translated version of Documentation/arch/arm/kernel_user_helpers.rst
 
 If you have any comment or update to the content, please contact the
 original document maintainer directly.  However, if you have a problem
@@ -10,7 +10,7 @@ Maintainer: Nicolas Pitre <nicolas.pitre@linaro.org>
                Dave Martin <dave.martin@linaro.org>
 Chinese maintainer: Fu Wei <tekkamanninja@gmail.com>
 ---------------------------------------------------------------------
-Documentation/arm/kernel_user_helpers.rst 的中文翻译
+Documentation/arch/arm/kernel_user_helpers.rst 的中文翻译
 
 如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文
 交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻
index c6aee82..19ba4ae 100644 (file)
@@ -325,6 +325,6 @@ Primecell设备。然而,棘手的一点是,AMBA总线上的所有设备并
 
 当使用DT时,这给of_platform_populate()带来了问题,因为它必须决定是否将
 每个节点注册为platform_device或amba_device。不幸的是,这使设备创建模型
-变得有点复杂,但解决方案原来并不是太具有侵略性。如果一个节点与“arm,amba-primecell”
+变得有点复杂,但解决方案原来并不是太具有侵略性。如果一个节点与“arm,primecell”
 兼容,那么of_platform_populate()将把它注册为amba_device而不是
 platform_device。
index 176e8fc..4f7b23f 100644 (file)
@@ -363,7 +363,7 @@ Code  Seq#    Include File                                           Comments
 0xCC  00-0F  drivers/misc/ibmvmc.h                                   pseries VMC driver
 0xCD  01     linux/reiserfs_fs.h
 0xCE  01-02  uapi/linux/cxl_mem.h                                    Compute Express Link Memory Devices
-0xCF  02     fs/cifs/ioctl.c
+0xCF  02     fs/smb/client/cifs_ioctl.h
 0xDB  00-0F  drivers/char/mwave/mwavepub.h
 0xDD  00-3F                                                          ZFCP device driver see drivers/s390/scsi/
                                                                      <mailto:aherrman@de.ibm.com>
index b4e7479..922291d 100644 (file)
@@ -72,7 +72,7 @@ high once achieves global guest_halt_poll_ns value).
 
 Default: Y
 
-The module parameters can be set from the debugfs files in::
+The module parameters can be set from the sysfs files in::
 
        /sys/module/haltpoll/parameters/
 
index 3fae39b..4f1a1b2 100644 (file)
@@ -112,11 +112,11 @@ powerpc kvm-hv case.
 |                      | function.                 |                         |
 +-----------------------+---------------------------+-------------------------+
 
-These module parameters can be set from the debugfs files in:
+These module parameters can be set from the sysfs files in:
 
        /sys/module/kvm/parameters/
 
-Note: that these module parameters are system wide values and are not able to
+Note: these module parameters are system-wide values and are not able to
       be tuned on a per vm basis.
 
 Any changes to these parameters will be picked up by new and existing vCPUs the
@@ -142,12 +142,12 @@ Further Notes
   global max polling interval (halt_poll_ns) then the host will always poll for the
   entire block time and thus cpu utilisation will go to 100%.
 
-- Halt polling essentially presents a trade off between power usage and latency and
+- Halt polling essentially presents a trade-off between power usage and latency and
   the module parameters should be used to tune the affinity for this. Idle cpu time is
   essentially converted to host kernel time with the aim of decreasing latency when
   entering the guest.
 
 - Halt polling will only be conducted by the host when no other tasks are runnable on
   that cpu, otherwise the polling will cease immediately and schedule will be invoked to
-  allow that other task to run. Thus this doesn't allow a guest to denial of service the
-  cpu.
+  allow that other task to run. Thus this doesn't allow a guest to cause a denial
+  of service on the cpu.
index 8c77554..3a034db 100644 (file)
@@ -67,7 +67,7 @@ following two cases:
 2. Write-Protection: The SPTE is present and the fault is caused by
    write-protect. That means we just need to change the W bit of the spte.
 
-What we use to avoid all the race is the Host-writable bit and MMU-writable bit
+What we use to avoid all the races is the Host-writable bit and MMU-writable bit
 on the spte:
 
 - Host-writable means the gfn is writable in the host kernel page tables and in
@@ -130,7 +130,7 @@ to gfn.  For indirect sp, we disabled fast page fault for simplicity.
 A solution for indirect sp could be to pin the gfn, for example via
 kvm_vcpu_gfn_to_pfn_atomic, before the cmpxchg.  After the pinning:
 
-- We have held the refcount of pfn that means the pfn can not be freed and
+- We have held the refcount of pfn; that means the pfn can not be freed and
   be reused for another gfn.
 - The pfn is writable and therefore it cannot be shared between different gfns
   by KSM.
@@ -186,22 +186,22 @@ writable between reading spte and updating spte. Like below case:
 The Dirty bit is lost in this case.
 
 In order to avoid this kind of issue, we always treat the spte as "volatile"
-if it can be updated out of mmu-lock, see spte_has_volatile_bits(), it means,
+if it can be updated out of mmu-lock [see spte_has_volatile_bits()]; it means
 the spte is always atomically updated in this case.
 
 3) flush tlbs due to spte updated
 
-If the spte is updated from writable to readonly, we should flush all TLBs,
+If the spte is updated from writable to read-only, we should flush all TLBs,
 otherwise rmap_write_protect will find a read-only spte, even though the
 writable spte might be cached on a CPU's TLB.
 
 As mentioned before, the spte can be updated to writable out of mmu-lock on
-fast page fault path, in order to easily audit the path, we see if TLBs need
-be flushed caused by this reason in mmu_spte_update() since this is a common
+fast page fault path. In order to easily audit the path, we check whether TLBs
+need to be flushed for this reason in mmu_spte_update(), since this is a common
 function to update spte (present -> present).
 
 Since the spte is "volatile" if it can be updated out of mmu-lock, we always
-atomically update the spte, the race caused by fast page fault can be avoided,
+atomically update the spte and the race caused by fast page fault can be avoided.
 See the comments in spte_has_volatile_bits() and mmu_spte_update().
 
 Lockless Access Tracking:
@@ -283,9 +283,9 @@ time it will be set using the Dirty tracking mechanism described above.
 :Arch:         x86
 :Protects:     wakeup_vcpus_on_cpu
 :Comment:      This is a per-CPU lock and it is used for VT-d posted-interrupts.
-               When VT-d posted-interrupts is supported and the VM has assigned
+               When VT-d posted-interrupts are supported and the VM has assigned
                devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu
-               protected by blocked_vcpu_on_cpu_lock, when VT-d hardware issues
+               protected by blocked_vcpu_on_cpu_lock. When VT-d hardware issues
                wakeup notification event since external interrupts from the
                assigned devices happens, we will find the vCPU on the list to
                wakeup.
index 5fdb907..740d03d 100644 (file)
@@ -89,7 +89,7 @@ also define a new hypercall feature to indicate that the host can give you more
 registers. Only if the host supports the additional features, make use of them.
 
 The magic page layout is described by struct kvm_vcpu_arch_shared
-in arch/powerpc/include/asm/kvm_para.h.
+in arch/powerpc/include/uapi/asm/kvm_para.h.
 
 Magic page features
 ===================
@@ -112,7 +112,7 @@ Magic page flags
 ================
 
 In addition to features that indicate whether a host is capable of a particular
-feature we also have a channel for a guest to tell the guest whether it's capable
+feature, we also have a channel for a guest to tell the host whether it's capable
 of something. This is what we call "flags".
 
 Flags are passed to the host in the low 12 bits of the Effective Address.
@@ -139,7 +139,7 @@ Patched instructions
 ====================
 
 The "ld" and "std" instructions are transformed to "lwz" and "stw" instructions
-respectively on 32 bit systems with an added offset of 4 to accommodate for big
+respectively on 32-bit systems with an added offset of 4 to accommodate for big
 endianness.
 
 The following is a list of mapping the Linux kernel performs when running as
@@ -210,7 +210,7 @@ available on all targets.
 2) PAPR hypercalls
 
 PAPR hypercalls are needed to run server PowerPC PAPR guests (-M pseries in QEMU).
-These are the same hypercalls that pHyp, the POWER hypervisor implements. Some of
+These are the same hypercalls that pHyp, the POWER hypervisor, implements. Some of
 them are handled in the kernel, some are handled in user space. This is only
 available on book3s_64.
 
index 87f04c1..06718b9 100644 (file)
@@ -101,7 +101,7 @@ also be used, e.g. ::
 
 However, VCPU request users should refrain from doing so, as it would
 break the abstraction.  The first 8 bits are reserved for architecture
-independent requests, all additional bits are available for architecture
+independent requests; all additional bits are available for architecture
 dependent requests.
 
 Architecture Independent Requests
@@ -151,8 +151,8 @@ KVM_REQUEST_NO_WAKEUP
 
   This flag is applied to requests that only need immediate attention
   from VCPUs running in guest mode.  That is, sleeping VCPUs do not need
-  to be awaken for these requests.  Sleeping VCPUs will handle the
-  requests when they are awaken later for some other reason.
+  to be awakened for these requests.  Sleeping VCPUs will handle the
+  requests when they are awakened later for some other reason.
 
 KVM_REQUEST_WAIT
 
index 6b789d2..62d867e 100644 (file)
@@ -5,31 +5,31 @@ Paravirt_ops
 ============
 
 Linux provides support for different hypervisor virtualization technologies.
-Historically different binary kernels would be required in order to support
-different hypervisors, this restriction was removed with pv_ops.
+Historically, different binary kernels would be required in order to support
+different hypervisors; this restriction was removed with pv_ops.
 Linux pv_ops is a virtualization API which enables support for different
 hypervisors. It allows each hypervisor to override critical operations and
 allows a single kernel binary to run on all supported execution environments
 including native machine -- without any hypervisors.
 
 pv_ops provides a set of function pointers which represent operations
-corresponding to low level critical instructions and high level
-functionalities in various areas. pv-ops allows for optimizations at run
-time by enabling binary patching of the low-ops critical operations
+corresponding to low-level critical instructions and high-level
+functionalities in various areas. pv_ops allows for optimizations at run
+time by enabling binary patching of the low-level critical operations
 at boot time.
 
 pv_ops operations are classified into three categories:
 
 - simple indirect call
-   These operations correspond to high level functionality where it is
+   These operations correspond to high-level functionality where it is
    known that the overhead of indirect call isn't very important.
 
 - indirect call which allows optimization with binary patch
-   Usually these operations correspond to low level critical instructions. They
+   Usually these operations correspond to low-level critical instructions. They
    are called frequently and are performance critical. The overhead is
    very important.
 
 - a set of macros for hand written assembly code
    Hand written assembly codes (.S files) also need paravirtualization
-   because they include sensitive instructions or some of code paths in
+   because they include sensitive instructions or some code paths in
    them are very performance critical.
index 7e0b87d..60ee351 100644 (file)
@@ -273,8 +273,8 @@ ABI/API
 L:     linux-api@vger.kernel.org
 F:     include/linux/syscalls.h
 F:     kernel/sys_ni.c
-X:     include/uapi/
 X:     arch/*/include/uapi/
+X:     include/uapi/
 
 ABIT UGURU 1,2 HARDWARE MONITOR DRIVER
 M:     Hans de Goede <hdegoede@redhat.com>
@@ -406,12 +406,6 @@ L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 S:     Maintained
 F:     drivers/acpi/arm64
 
-ACPI SERIAL MULTI INSTANTIATE DRIVER
-M:     Hans de Goede <hdegoede@redhat.com>
-L:     platform-driver-x86@vger.kernel.org
-S:     Maintained
-F:     drivers/platform/x86/serial-multi-instantiate.c
-
 ACPI PCC(Platform Communication Channel) MAILBOX DRIVER
 M:     Sudeep Holla <sudeep.holla@arm.com>
 L:     linux-acpi@vger.kernel.org
@@ -430,6 +424,12 @@ B: https://bugzilla.kernel.org
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
 F:     drivers/acpi/pmic/
 
+ACPI SERIAL MULTI INSTANTIATE DRIVER
+M:     Hans de Goede <hdegoede@redhat.com>
+L:     platform-driver-x86@vger.kernel.org
+S:     Maintained
+F:     drivers/platform/x86/serial-multi-instantiate.c
+
 ACPI THERMAL DRIVER
 M:     Rafael J. Wysocki <rafael@kernel.org>
 R:     Zhang Rui <rui.zhang@intel.com>
@@ -823,6 +823,13 @@ L: linux-crypto@vger.kernel.org
 S:     Maintained
 F:     drivers/crypto/allwinner/
 
+ALLWINNER DMIC DRIVERS
+M:     Ban Tao <fengzheng923@gmail.com>
+L:     alsa-devel@alsa-project.org (moderated for non-subscribers)
+S:     Maintained
+F:     Documentation/devicetree/bindings/sound/allwinner,sun50i-h6-dmic.yaml
+F:     sound/soc/sunxi/sun50i-dmic.c
+
 ALLWINNER HARDWARE SPINLOCK SUPPORT
 M:     Wilken Gottwalt <wilken.gottwalt@posteo.net>
 S:     Maintained
@@ -844,13 +851,6 @@ L: linux-media@vger.kernel.org
 S:     Maintained
 F:     drivers/staging/media/sunxi/cedrus/
 
-ALLWINNER DMIC DRIVERS
-M:     Ban Tao <fengzheng923@gmail.com>
-L:     alsa-devel@alsa-project.org (moderated for non-subscribers)
-S:     Maintained
-F:     Documentation/devicetree/bindings/sound/allwinner,sun50i-h6-dmic.yaml
-F:     sound/soc/sunxi/sun50i-dmic.c
-
 ALPHA PORT
 M:     Richard Henderson <richard.henderson@linaro.org>
 M:     Ivan Kokshaysky <ink@jurassic.park.msu.ru>
@@ -956,7 +956,8 @@ F:  Documentation/networking/device_drivers/ethernet/amazon/ena.rst
 F:     drivers/net/ethernet/amazon/
 
 AMAZON RDMA EFA DRIVER
-M:     Gal Pressman <galpress@amazon.com>
+M:     Michael Margolin <mrgolin@amazon.com>
+R:     Gal Pressman <gal.pressman@linux.dev>
 R:     Yossi Leybovich <sleybo@amazon.com>
 L:     linux-rdma@vger.kernel.org
 S:     Supported
@@ -1026,6 +1027,16 @@ F:       drivers/char/hw_random/geode-rng.c
 F:     drivers/crypto/geode*
 F:     drivers/video/fbdev/geode/
 
+AMD HSMP DRIVER
+M:     Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
+R:     Carlos Bilbao <carlos.bilbao@amd.com>
+L:     platform-driver-x86@vger.kernel.org
+S:     Maintained
+F:     Documentation/arch/x86/amd_hsmp.rst
+F:     arch/x86/include/asm/amd_hsmp.h
+F:     arch/x86/include/uapi/asm/amd_hsmp.h
+F:     drivers/platform/x86/amd/hsmp.c
+
 AMD IOMMU (AMD-VI)
 M:     Joerg Roedel <joro@8bytes.org>
 R:     Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
@@ -1049,6 +1060,13 @@ F:       drivers/gpu/drm/amd/include/vi_structs.h
 F:     include/uapi/linux/kfd_ioctl.h
 F:     include/uapi/linux/kfd_sysfs.h
 
+AMD MP2 I2C DRIVER
+M:     Elie Morisse <syniurge@gmail.com>
+M:     Shyam Sundar S K <shyam-sundar.s-k@amd.com>
+L:     linux-i2c@vger.kernel.org
+S:     Maintained
+F:     drivers/i2c/busses/i2c-amd-mp2*
+
 AMD PDS CORE DRIVER
 M:     Shannon Nelson <shannon.nelson@amd.com>
 M:     Brett Creeley <brett.creeley@amd.com>
@@ -1058,18 +1076,6 @@ F:       Documentation/networking/device_drivers/ethernet/amd/pds_core.rst
 F:     drivers/net/ethernet/amd/pds_core/
 F:     include/linux/pds/
 
-AMD SPI DRIVER
-M:     Sanjay R Mehta <sanju.mehta@amd.com>
-S:     Maintained
-F:     drivers/spi/spi-amd.c
-
-AMD MP2 I2C DRIVER
-M:     Elie Morisse <syniurge@gmail.com>
-M:     Shyam Sundar S K <shyam-sundar.s-k@amd.com>
-L:     linux-i2c@vger.kernel.org
-S:     Maintained
-F:     drivers/i2c/busses/i2c-amd-mp2*
-
 AMD PMC DRIVER
 M:     Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
 L:     platform-driver-x86@vger.kernel.org
@@ -1083,16 +1089,6 @@ S:       Maintained
 F:     Documentation/ABI/testing/sysfs-amd-pmf
 F:     drivers/platform/x86/amd/pmf/
 
-AMD HSMP DRIVER
-M:     Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
-R:     Carlos Bilbao <carlos.bilbao@amd.com>
-L:     platform-driver-x86@vger.kernel.org
-S:     Maintained
-F:     Documentation/arch/x86/amd_hsmp.rst
-F:     arch/x86/include/asm/amd_hsmp.h
-F:     arch/x86/include/uapi/asm/amd_hsmp.h
-F:     drivers/platform/x86/amd/hsmp.c
-
 AMD POWERPLAY AND SWSMU
 M:     Evan Quan <evan.quan@amd.com>
 L:     amd-gfx@lists.freedesktop.org
@@ -1121,13 +1117,6 @@ M:       Tom Lendacky <thomas.lendacky@amd.com>
 S:     Supported
 F:     arch/arm64/boot/dts/amd/
 
-AMD XGBE DRIVER
-M:     "Shyam Sundar S K" <Shyam-sundar.S-k@amd.com>
-L:     netdev@vger.kernel.org
-S:     Supported
-F:     arch/arm64/boot/dts/amd/amd-seattle-xgbe*.dtsi
-F:     drivers/net/ethernet/amd/xgbe/
-
 AMD SENSOR FUSION HUB DRIVER
 M:     Basavaraj Natikar <basavaraj.natikar@amd.com>
 L:     linux-input@vger.kernel.org
@@ -1135,6 +1124,18 @@ S:       Maintained
 F:     Documentation/hid/amd-sfh*
 F:     drivers/hid/amd-sfh-hid/
 
+AMD SPI DRIVER
+M:     Sanjay R Mehta <sanju.mehta@amd.com>
+S:     Maintained
+F:     drivers/spi/spi-amd.c
+
+AMD XGBE DRIVER
+M:     "Shyam Sundar S K" <Shyam-sundar.S-k@amd.com>
+L:     netdev@vger.kernel.org
+S:     Supported
+F:     arch/arm64/boot/dts/amd/amd-seattle-xgbe*.dtsi
+F:     drivers/net/ethernet/amd/xgbe/
+
 AMLOGIC DDR PMU DRIVER
 M:     Jiucheng Xu <jiucheng.xu@amlogic.com>
 L:     linux-amlogic@lists.infradead.org
@@ -1169,6 +1170,14 @@ T:       git git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
 F:     drivers/net/amt.c
 
+ANALOG DEVICES INC AD3552R DRIVER
+M:     Nuno Sá <nuno.sa@analog.com>
+L:     linux-iio@vger.kernel.org
+S:     Supported
+W:     https://ez.analog.com/linux-software-drivers
+F:     Documentation/devicetree/bindings/iio/dac/adi,ad3552r.yaml
+F:     drivers/iio/dac/ad3552r.c
+
 ANALOG DEVICES INC AD4130 DRIVER
 M:     Cosmin Tanislav <cosmin.tanislav@analog.com>
 L:     linux-iio@vger.kernel.org
@@ -1194,14 +1203,6 @@ W:       https://ez.analog.com/linux-software-drivers
 F:     Documentation/devicetree/bindings/iio/adc/adi,ad7292.yaml
 F:     drivers/iio/adc/ad7292.c
 
-ANALOG DEVICES INC AD3552R DRIVER
-M:     Nuno Sá <nuno.sa@analog.com>
-L:     linux-iio@vger.kernel.org
-S:     Supported
-W:     https://ez.analog.com/linux-software-drivers
-F:     Documentation/devicetree/bindings/iio/dac/adi,ad3552r.yaml
-F:     drivers/iio/dac/ad3552r.c
-
 ANALOG DEVICES INC AD7293 DRIVER
 M:     Antoniu Miclaus <antoniu.miclaus@analog.com>
 L:     linux-iio@vger.kernel.org
@@ -1210,23 +1211,6 @@ W:       https://ez.analog.com/linux-software-drivers
 F:     Documentation/devicetree/bindings/iio/dac/adi,ad7293.yaml
 F:     drivers/iio/dac/ad7293.c
 
-ANALOG DEVICES INC AD7768-1 DRIVER
-M:     Michael Hennerich <Michael.Hennerich@analog.com>
-L:     linux-iio@vger.kernel.org
-S:     Supported
-W:     https://ez.analog.com/linux-software-drivers
-F:     Documentation/devicetree/bindings/iio/adc/adi,ad7768-1.yaml
-F:     drivers/iio/adc/ad7768-1.c
-
-ANALOG DEVICES INC AD7780 DRIVER
-M:     Michael Hennerich <Michael.Hennerich@analog.com>
-M:     Renato Lui Geh <renatogeh@gmail.com>
-L:     linux-iio@vger.kernel.org
-S:     Supported
-W:     https://ez.analog.com/linux-software-drivers
-F:     Documentation/devicetree/bindings/iio/adc/adi,ad7780.yaml
-F:     drivers/iio/adc/ad7780.c
-
 ANALOG DEVICES INC AD74115 DRIVER
 M:     Cosmin Tanislav <cosmin.tanislav@analog.com>
 L:     linux-iio@vger.kernel.org
@@ -1244,6 +1228,23 @@ F:       Documentation/devicetree/bindings/iio/addac/adi,ad74413r.yaml
 F:     drivers/iio/addac/ad74413r.c
 F:     include/dt-bindings/iio/addac/adi,ad74413r.h
 
+ANALOG DEVICES INC AD7768-1 DRIVER
+M:     Michael Hennerich <Michael.Hennerich@analog.com>
+L:     linux-iio@vger.kernel.org
+S:     Supported
+W:     https://ez.analog.com/linux-software-drivers
+F:     Documentation/devicetree/bindings/iio/adc/adi,ad7768-1.yaml
+F:     drivers/iio/adc/ad7768-1.c
+
+ANALOG DEVICES INC AD7780 DRIVER
+M:     Michael Hennerich <Michael.Hennerich@analog.com>
+M:     Renato Lui Geh <renatogeh@gmail.com>
+L:     linux-iio@vger.kernel.org
+S:     Supported
+W:     https://ez.analog.com/linux-software-drivers
+F:     Documentation/devicetree/bindings/iio/adc/adi,ad7780.yaml
+F:     drivers/iio/adc/ad7780.c
+
 ANALOG DEVICES INC ADA4250 DRIVER
 M:     Antoniu Miclaus <antoniu.miclaus@analog.com>
 L:     linux-iio@vger.kernel.org
@@ -1294,10 +1295,10 @@ F:      drivers/iio/imu/adis16460.c
 ANALOG DEVICES INC ADIS16475 DRIVER
 M:     Nuno Sa <nuno.sa@analog.com>
 L:     linux-iio@vger.kernel.org
-W:     https://ez.analog.com/linux-software-drivers
 S:     Supported
-F:     drivers/iio/imu/adis16475.c
+W:     https://ez.analog.com/linux-software-drivers
 F:     Documentation/devicetree/bindings/iio/imu/adi,adis16475.yaml
+F:     drivers/iio/imu/adis16475.c
 
 ANALOG DEVICES INC ADM1177 DRIVER
 M:     Michael Hennerich <Michael.Hennerich@analog.com>
@@ -1315,21 +1316,21 @@ W:      https://ez.analog.com/linux-software-drivers
 F:     Documentation/devicetree/bindings/iio/frequency/adi,admv1013.yaml
 F:     drivers/iio/frequency/admv1013.c
 
-ANALOG DEVICES INC ADMV8818 DRIVER
+ANALOG DEVICES INC ADMV1014 DRIVER
 M:     Antoniu Miclaus <antoniu.miclaus@analog.com>
 L:     linux-iio@vger.kernel.org
 S:     Supported
 W:     https://ez.analog.com/linux-software-drivers
-F:     Documentation/devicetree/bindings/iio/filter/adi,admv8818.yaml
-F:     drivers/iio/filter/admv8818.c
+F:     Documentation/devicetree/bindings/iio/frequency/adi,admv1014.yaml
+F:     drivers/iio/frequency/admv1014.c
 
-ANALOG DEVICES INC ADMV1014 DRIVER
+ANALOG DEVICES INC ADMV8818 DRIVER
 M:     Antoniu Miclaus <antoniu.miclaus@analog.com>
 L:     linux-iio@vger.kernel.org
 S:     Supported
 W:     https://ez.analog.com/linux-software-drivers
-F:     Documentation/devicetree/bindings/iio/frequency/adi,admv1014.yaml
-F:     drivers/iio/frequency/admv1014.c
+F:     Documentation/devicetree/bindings/iio/filter/adi,admv8818.yaml
+F:     drivers/iio/filter/admv8818.c
 
 ANALOG DEVICES INC ADP5061 DRIVER
 M:     Michael Hennerich <Michael.Hennerich@analog.com>
@@ -1351,8 +1352,8 @@ M:        Lars-Peter Clausen <lars@metafoo.de>
 L:     linux-media@vger.kernel.org
 S:     Supported
 W:     https://ez.analog.com/linux-software-drivers
-F:     drivers/media/i2c/adv7180.c
 F:     Documentation/devicetree/bindings/media/i2c/adv7180.yaml
+F:     drivers/media/i2c/adv7180.c
 
 ANALOG DEVICES INC ADV748X DRIVER
 M:     Kieran Bingham <kieran.bingham@ideasonboard.com>
@@ -1371,8 +1372,8 @@ ANALOG DEVICES INC ADV7604 DRIVER
 M:     Hans Verkuil <hverkuil-cisco@xs4all.nl>
 L:     linux-media@vger.kernel.org
 S:     Maintained
-F:     drivers/media/i2c/adv7604*
 F:     Documentation/devicetree/bindings/media/i2c/adv7604.yaml
+F:     drivers/media/i2c/adv7604*
 
 ANALOG DEVICES INC ADV7842 DRIVER
 M:     Hans Verkuil <hverkuil-cisco@xs4all.nl>
@@ -1384,8 +1385,8 @@ ANALOG DEVICES INC ADXRS290 DRIVER
 M:     Nishant Malpani <nish.malpani25@gmail.com>
 L:     linux-iio@vger.kernel.org
 S:     Supported
-F:     drivers/iio/gyro/adxrs290.c
 F:     Documentation/devicetree/bindings/iio/gyroscope/adi,adxrs290.yaml
+F:     drivers/iio/gyro/adxrs290.c
 
 ANALOG DEVICES INC ASOC CODEC DRIVERS
 M:     Lars-Peter Clausen <lars@metafoo.de>
@@ -1600,7 +1601,7 @@ F:        drivers/media/i2c/ar0521.c
 
 ARASAN NAND CONTROLLER DRIVER
 M:     Miquel Raynal <miquel.raynal@bootlin.com>
-M:     Naga Sureshkumar Relli <nagasure@xilinx.com>
+R:     Michal Simek <michal.simek@amd.com>
 L:     linux-mtd@lists.infradead.org
 S:     Maintained
 F:     Documentation/devicetree/bindings/mtd/arasan,nand-controller.yaml
@@ -1625,6 +1626,17 @@ S:       Maintained
 F:     drivers/net/arcnet/
 F:     include/uapi/linux/if_arcnet.h
 
+ARM AND ARM64 SoC SUB-ARCHITECTURES (COMMON PARTS)
+M:     Arnd Bergmann <arnd@arndb.de>
+M:     Olof Johansson <olof@lixom.net>
+M:     soc@kernel.org
+L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
+S:     Maintained
+C:     irc://irc.libera.chat/armlinux
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git
+F:     arch/arm/boot/dts/Makefile
+F:     arch/arm64/boot/dts/Makefile
+
 ARM ARCHITECTED TIMER DRIVER
 M:     Mark Rutland <mark.rutland@arm.com>
 M:     Marc Zyngier <maz@kernel.org>
@@ -1666,10 +1678,7 @@ F:       drivers/power/reset/arm-versatile-reboot.c
 F:     drivers/soc/versatile/
 
 ARM KOMEDA DRM-KMS DRIVER
-M:     James (Qian) Wang <james.qian.wang@arm.com>
 M:     Liviu Dudau <liviu.dudau@arm.com>
-M:     Mihail Atanassov <mihail.atanassov@arm.com>
-L:     Mali DP Maintainers <malidp@foss.arm.com>
 S:     Supported
 T:     git git://anongit.freedesktop.org/drm/drm-misc
 F:     Documentation/devicetree/bindings/display/arm,komeda.yaml
@@ -1690,8 +1699,6 @@ F:        include/uapi/drm/panfrost_drm.h
 
 ARM MALI-DP DRM DRIVER
 M:     Liviu Dudau <liviu.dudau@arm.com>
-M:     Brian Starkey <brian.starkey@arm.com>
-L:     Mali DP Maintainers <malidp@foss.arm.com>
 S:     Supported
 T:     git git://anongit.freedesktop.org/drm/drm-misc
 F:     Documentation/devicetree/bindings/display/arm,malidp.yaml
@@ -1738,22 +1745,6 @@ S:       Odd Fixes
 F:     drivers/amba/
 F:     include/linux/amba/bus.h
 
-ARM PRIMECELL PL35X NAND CONTROLLER DRIVER
-M:     Miquel Raynal <miquel.raynal@bootlin.com>
-M:     Naga Sureshkumar Relli <nagasure@xilinx.com>
-L:     linux-mtd@lists.infradead.org
-S:     Maintained
-F:     Documentation/devicetree/bindings/mtd/arm,pl353-nand-r2p1.yaml
-F:     drivers/mtd/nand/raw/pl35x-nand-controller.c
-
-ARM PRIMECELL PL35X SMC DRIVER
-M:     Miquel Raynal <miquel.raynal@bootlin.com>
-M:     Naga Sureshkumar Relli <nagasure@xilinx.com>
-L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
-S:     Maintained
-F:     Documentation/devicetree/bindings/memory-controllers/arm,pl35x-smc.yaml
-F:     drivers/memory/pl353-smc.c
-
 ARM PRIMECELL CLCD PL110 DRIVER
 M:     Russell King <linux@armlinux.org.uk>
 S:     Odd Fixes
@@ -1771,6 +1762,22 @@ S:       Odd Fixes
 F:     drivers/mmc/host/mmci.*
 F:     include/linux/amba/mmci.h
 
+ARM PRIMECELL PL35X NAND CONTROLLER DRIVER
+M:     Miquel Raynal <miquel.raynal@bootlin.com>
+R:     Michal Simek <michal.simek@amd.com>
+L:     linux-mtd@lists.infradead.org
+S:     Maintained
+F:     Documentation/devicetree/bindings/mtd/arm,pl353-nand-r2p1.yaml
+F:     drivers/mtd/nand/raw/pl35x-nand-controller.c
+
+ARM PRIMECELL PL35X SMC DRIVER
+M:     Miquel Raynal <miquel.raynal@bootlin.com>
+R:     Michal Simek <michal.simek@amd.com>
+L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
+S:     Maintained
+F:     Documentation/devicetree/bindings/memory-controllers/arm,pl35x-smc.yaml
+F:     drivers/memory/pl353-smc.c
+
 ARM PRIMECELL SSP PL022 SPI DRIVER
 M:     Linus Walleij <linus.walleij@linaro.org>
 L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
@@ -1807,17 +1814,6 @@ F:       Documentation/devicetree/bindings/iommu/arm,smmu*
 F:     drivers/iommu/arm/
 F:     drivers/iommu/io-pgtable-arm*
 
-ARM AND ARM64 SoC SUB-ARCHITECTURES (COMMON PARTS)
-M:     Arnd Bergmann <arnd@arndb.de>
-M:     Olof Johansson <olof@lixom.net>
-M:     soc@kernel.org
-L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
-S:     Maintained
-C:     irc://irc.libera.chat/armlinux
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git
-F:     arch/arm/boot/dts/Makefile
-F:     arch/arm64/boot/dts/Makefile
-
 ARM SUB-ARCHITECTURES
 L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 S:     Maintained
@@ -1869,9 +1865,9 @@ M:        Chen-Yu Tsai <wens@csie.org>
 M:     Jernej Skrabec <jernej.skrabec@gmail.com>
 M:     Samuel Holland <samuel@sholland.org>
 L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
+L:     linux-sunxi@lists.linux.dev
 S:     Maintained
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux.git
-L:     linux-sunxi@lists.linux.dev
 F:     arch/arm/mach-sunxi/
 F:     arch/arm64/boot/dts/allwinner/
 F:     drivers/clk/sunxi-ng/
@@ -1934,6 +1930,15 @@ F:       arch/arm/mach-alpine/
 F:     arch/arm64/boot/dts/amazon/
 F:     drivers/*/*alpine*
 
+ARM/APPLE MACHINE SOUND DRIVERS
+M:     Martin Povišer <povik+lin@cutebit.org>
+L:     asahi@lists.linux.dev
+L:     alsa-devel@alsa-project.org (moderated for non-subscribers)
+S:     Maintained
+F:     Documentation/devicetree/bindings/sound/apple,*
+F:     sound/soc/apple/*
+F:     sound/soc/codecs/cs42l83-i2c.c
+
 ARM/APPLE MACHINE SUPPORT
 M:     Hector Martin <marcan@marcan.st>
 M:     Sven Peter <sven@svenpeter.dev>
@@ -1961,7 +1966,7 @@ F:        Documentation/devicetree/bindings/nvmem/apple,efuses.yaml
 F:     Documentation/devicetree/bindings/pci/apple,pcie.yaml
 F:     Documentation/devicetree/bindings/pinctrl/apple,pinctrl.yaml
 F:     Documentation/devicetree/bindings/power/apple*
-F:     Documentation/devicetree/bindings/pwm/pwm-apple.yaml
+F:     Documentation/devicetree/bindings/pwm/apple,s5l-fpwm.yaml
 F:     Documentation/devicetree/bindings/watchdog/apple,wdt.yaml
 F:     arch/arm64/boot/dts/apple/
 F:     drivers/bluetooth/hci_bcm4377.c
@@ -1985,15 +1990,6 @@ F:       include/dt-bindings/pinctrl/apple.h
 F:     include/linux/apple-mailbox.h
 F:     include/linux/soc/apple/*
 
-ARM/APPLE MACHINE SOUND DRIVERS
-M:     Martin Povišer <povik+lin@cutebit.org>
-L:     asahi@lists.linux.dev
-L:     alsa-devel@alsa-project.org (moderated for non-subscribers)
-S:     Maintained
-F:     Documentation/devicetree/bindings/sound/apple,*
-F:     sound/soc/apple/*
-F:     sound/soc/codecs/cs42l83-i2c.c
-
 ARM/ARTPEC MACHINE SUPPORT
 M:     Jesper Nilsson <jesper.nilsson@axis.com>
 M:     Lars Persson <lars.persson@axis.com>
@@ -2109,19 +2105,19 @@ S:      Maintained
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git
 F:     Documentation/ABI/testing/sysfs-bus-coresight-devices-*
 F:     Documentation/devicetree/bindings/arm/arm,coresight-*.yaml
-F:     Documentation/devicetree/bindings/arm/qcom,coresight-*.yaml
 F:     Documentation/devicetree/bindings/arm/arm,embedded-trace-extension.yaml
 F:     Documentation/devicetree/bindings/arm/arm,trace-buffer-extension.yaml
+F:     Documentation/devicetree/bindings/arm/qcom,coresight-*.yaml
 F:     Documentation/trace/coresight/*
 F:     drivers/hwtracing/coresight/*
 F:     include/dt-bindings/arm/coresight-cti-dt.h
 F:     include/linux/coresight*
 F:     samples/coresight/*
-F:     tools/perf/tests/shell/coresight/*
 F:     tools/perf/arch/arm/util/auxtrace.c
 F:     tools/perf/arch/arm/util/cs-etm.c
 F:     tools/perf/arch/arm/util/cs-etm.h
 F:     tools/perf/arch/arm/util/pmu.c
+F:     tools/perf/tests/shell/coresight/*
 F:     tools/perf/util/cs-etm-decoder/*
 F:     tools/perf/util/cs-etm.*
 
@@ -2156,9 +2152,9 @@ F:        Documentation/devicetree/bindings/leds/cznic,turris-omnia-leds.yaml
 F:     Documentation/devicetree/bindings/watchdog/armada-37xx-wdt.txt
 F:     drivers/bus/moxtet.c
 F:     drivers/firmware/turris-mox-rwtm.c
+F:     drivers/gpio/gpio-moxtet.c
 F:     drivers/leds/leds-turris-omnia.c
 F:     drivers/mailbox/armada-37xx-rwtm-mailbox.c
-F:     drivers/gpio/gpio-moxtet.c
 F:     drivers/watchdog/armada_37xx_wdt.c
 F:     include/dt-bindings/bus/moxtet.h
 F:     include/linux/armada-37xx-rwtm-mailbox.h
@@ -2188,10 +2184,10 @@ R:      NXP Linux Team <linux-imx@nxp.com>
 L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 S:     Maintained
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux.git
-X:     drivers/media/i2c/
 F:     arch/arm64/boot/dts/freescale/
 X:     arch/arm64/boot/dts/freescale/fsl-*
 X:     arch/arm64/boot/dts/freescale/qoriq-*
+X:     drivers/media/i2c/
 N:     imx
 N:     mxs
 
@@ -2245,12 +2241,12 @@ ARM/HPE GXP ARCHITECTURE
 M:     Jean-Marie Verdun <verdun@hpe.com>
 M:     Nick Hawkins <nick.hawkins@hpe.com>
 S:     Maintained
-F:     Documentation/hwmon/gxp-fan-ctrl.rst
 F:     Documentation/devicetree/bindings/arm/hpe,gxp.yaml
 F:     Documentation/devicetree/bindings/hwmon/hpe,gxp-fan-ctrl.yaml
 F:     Documentation/devicetree/bindings/i2c/hpe,gxp-i2c.yaml
 F:     Documentation/devicetree/bindings/spi/hpe,gxp-spifi.yaml
 F:     Documentation/devicetree/bindings/timer/hpe,gxp-timer.yaml
+F:     Documentation/hwmon/gxp-fan-ctrl.rst
 F:     arch/arm/boot/dts/hpe-bmc*
 F:     arch/arm/boot/dts/hpe-gxp*
 F:     arch/arm/mach-hpe/
@@ -2275,9 +2271,9 @@ M:        Krzysztof Halasa <khalasa@piap.pl>
 L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 S:     Maintained
 F:     Documentation/devicetree/bindings/arm/intel-ixp4xx.yaml
-F:     Documentation/devicetree/bindings/memory-controllers/intel,ixp4xx-expansion*
 F:     Documentation/devicetree/bindings/gpio/intel,ixp4xx-gpio.txt
 F:     Documentation/devicetree/bindings/interrupt-controller/intel,ixp4xx-interrupt.yaml
+F:     Documentation/devicetree/bindings/memory-controllers/intel,ixp4xx-expansion*
 F:     Documentation/devicetree/bindings/timer/intel,ixp4xx-timer.yaml
 F:     arch/arm/boot/dts/intel-ixp*
 F:     arch/arm/mach-ixp4xx/
@@ -2434,6 +2430,15 @@ X:       drivers/net/wireless/atmel/
 N:     at91
 N:     atmel
 
+ARM/MICROCHIP (ARM64) SoC support
+M:     Conor Dooley <conor@kernel.org>
+M:     Nicolas Ferre <nicolas.ferre@microchip.com>
+M:     Claudiu Beznea <claudiu.beznea@microchip.com>
+L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
+S:     Supported
+T:     git https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux.git
+F:     arch/arm64/boot/dts/microchip/
+
 ARM/Microchip Sparx5 SoC support
 M:     Lars Povlsen <lars.povlsen@microchip.com>
 M:     Steen Hegelund <Steen.Hegelund@microchip.com>
@@ -2441,22 +2446,14 @@ M:      Daniel Machon <daniel.machon@microchip.com>
 M:     UNGLinuxDriver@microchip.com
 L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 S:     Supported
-T:     git git://github.com/microchip-ung/linux-upstream.git
-F:     arch/arm64/boot/dts/microchip/
+F:     arch/arm64/boot/dts/microchip/sparx*
 F:     drivers/net/ethernet/microchip/vcap/
 F:     drivers/pinctrl/pinctrl-microchip-sgpio.c
 N:     sparx5
 
-Microchip Timer Counter Block (TCB) Capture Driver
-M:     Kamel Bouhara <kamel.bouhara@bootlin.com>
-L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
-L:     linux-iio@vger.kernel.org
-S:     Maintained
-F:     drivers/counter/microchip-tcb-capture.c
-
-ARM/MILBEAUT ARCHITECTURE
-M:     Taichi Sugaya <sugaya.taichi@socionext.com>
-M:     Takao Orito <orito.takao@socionext.com>
+ARM/MILBEAUT ARCHITECTURE
+M:     Taichi Sugaya <sugaya.taichi@socionext.com>
+M:     Takao Orito <orito.takao@socionext.com>
 L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 S:     Maintained
 F:     arch/arm/boot/dts/milbeaut*
@@ -2525,8 +2522,8 @@ F:        Documentation/devicetree/bindings/rtc/nuvoton,nct3018y.yaml
 F:     arch/arm/boot/dts/nuvoton-npcm*
 F:     arch/arm/mach-npcm/
 F:     arch/arm64/boot/dts/nuvoton/
-F:     drivers/*/*npcm*
 F:     drivers/*/*/*npcm*
+F:     drivers/*/*npcm*
 F:     drivers/rtc/rtc-nct3018y.c
 F:     include/dt-bindings/clock/nuvoton,npcm7xx-clock.h
 F:     include/dt-bindings/clock/nuvoton,npcm845-clk.h
@@ -2569,6 +2566,12 @@ F:       arch/arm/mach-oxnas/
 F:     drivers/power/reset/oxnas-restart.c
 N:     oxnas
 
+ARM/QUALCOMM CHROMEBOOK SUPPORT
+R:     cros-qcom-dts-watchers@chromium.org
+F:     arch/arm64/boot/dts/qcom/sc7180*
+F:     arch/arm64/boot/dts/qcom/sc7280*
+F:     arch/arm64/boot/dts/qcom/sdm845-cheza*
+
 ARM/QUALCOMM SUPPORT
 M:     Andy Gross <agross@kernel.org>
 M:     Bjorn Andersson <andersson@kernel.org>
@@ -2602,22 +2605,16 @@ F:      drivers/pci/controller/dwc/pcie-qcom.c
 F:     drivers/phy/qualcomm/
 F:     drivers/power/*/msm*
 F:     drivers/reset/reset-qcom-*
-F:     drivers/ufs/host/ufs-qcom*
 F:     drivers/spi/spi-geni-qcom.c
 F:     drivers/spi/spi-qcom-qspi.c
 F:     drivers/spi/spi-qup.c
 F:     drivers/tty/serial/msm_serial.c
+F:     drivers/ufs/host/ufs-qcom*
 F:     drivers/usb/dwc3/dwc3-qcom.c
 F:     include/dt-bindings/*/qcom*
 F:     include/linux/*/qcom*
 F:     include/linux/soc/qcom/
 
-ARM/QUALCOMM CHROMEBOOK SUPPORT
-R:     cros-qcom-dts-watchers@chromium.org
-F:     arch/arm64/boot/dts/qcom/sc7180*
-F:     arch/arm64/boot/dts/qcom/sc7280*
-F:     arch/arm64/boot/dts/qcom/sdm845-cheza*
-
 ARM/RDA MICRO ARCHITECTURE
 M:     Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
 L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
@@ -2709,11 +2706,11 @@ R:      Alim Akhtar <alim.akhtar@samsung.com>
 L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 L:     linux-samsung-soc@vger.kernel.org
 S:     Maintained
-C:     irc://irc.libera.chat/linux-exynos
 Q:     https://patchwork.kernel.org/project/linux-samsung-soc/list/
 B:     mailto:linux-samsung-soc@vger.kernel.org
+C:     irc://irc.libera.chat/linux-exynos
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git
-F:     Documentation/arm/samsung/
+F:     Documentation/arch/arm/samsung/
 F:     Documentation/devicetree/bindings/arm/samsung/
 F:     Documentation/devicetree/bindings/hwinfo/samsung,*
 F:     Documentation/devicetree/bindings/power/pd-samsung.yaml
@@ -2811,8 +2808,8 @@ M:        Patrice Chotard <patrice.chotard@foss.st.com>
 L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 S:     Maintained
 W:     http://www.stlinux.com
-F:     Documentation/devicetree/bindings/spi/st,ssc-spi.yaml
 F:     Documentation/devicetree/bindings/i2c/st,sti-i2c.yaml
+F:     Documentation/devicetree/bindings/spi/st,ssc-spi.yaml
 F:     arch/arm/boot/dts/sti*
 F:     arch/arm/mach-sti/
 F:     drivers/ata/ahci_st.c
@@ -2959,15 +2956,15 @@ T:      git git://git.kernel.org/pub/scm/linux/kernel/git/iwamatsu/linux-visconti.git
 F:     Documentation/devicetree/bindings/arm/toshiba.yaml
 F:     Documentation/devicetree/bindings/clock/toshiba,tmpv770x-pipllct.yaml
 F:     Documentation/devicetree/bindings/clock/toshiba,tmpv770x-pismu.yaml
-F:     Documentation/devicetree/bindings/net/toshiba,visconti-dwmac.yaml
 F:     Documentation/devicetree/bindings/gpio/toshiba,gpio-visconti.yaml
+F:     Documentation/devicetree/bindings/net/toshiba,visconti-dwmac.yaml
 F:     Documentation/devicetree/bindings/pci/toshiba,visconti-pcie.yaml
 F:     Documentation/devicetree/bindings/pinctrl/toshiba,visconti-pinctrl.yaml
 F:     Documentation/devicetree/bindings/watchdog/toshiba,visconti-wdt.yaml
 F:     arch/arm64/boot/dts/toshiba/
 F:     drivers/clk/visconti/
-F:     drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c
 F:     drivers/gpio/gpio-visconti.c
+F:     drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c
 F:     drivers/pci/controller/dwc/pcie-visconti.c
 F:     drivers/pinctrl/visconti/
 F:     drivers/watchdog/visconti_wdt.c
@@ -3112,6 +3109,13 @@ S:       Maintained
 F:     Documentation/devicetree/bindings/net/asix,ax88796c.yaml
 F:     drivers/net/ethernet/asix/ax88796c_*
 
+ASPEED CRYPTO DRIVER
+M:     Neal Liu <neal_liu@aspeedtech.com>
+L:     linux-aspeed@lists.ozlabs.org (moderated for non-subscribers)
+S:     Maintained
+F:     Documentation/devicetree/bindings/crypto/aspeed,*
+F:     drivers/crypto/aspeed/
+
 ASPEED PECI CONTROLLER
 M:     Iwona Winiarska <iwona.winiarska@intel.com>
 L:     linux-aspeed@lists.ozlabs.org (moderated for non-subscribers)
@@ -3156,6 +3160,13 @@ S:       Maintained
 F:     Documentation/devicetree/bindings/spi/aspeed,ast2600-fmc.yaml
 F:     drivers/spi/spi-aspeed-smc.c
 
+ASPEED USB UDC DRIVER
+M:     Neal Liu <neal_liu@aspeedtech.com>
+L:     linux-aspeed@lists.ozlabs.org (moderated for non-subscribers)
+S:     Maintained
+F:     Documentation/devicetree/bindings/usb/aspeed,ast2600-udc.yaml
+F:     drivers/usb/gadget/udc/aspeed_udc.c
+
 ASPEED VIDEO ENGINE DRIVER
 M:     Eddie James <eajames@linux.ibm.com>
 L:     linux-media@vger.kernel.org
@@ -3164,19 +3175,11 @@ S:      Maintained
 F:     Documentation/devicetree/bindings/media/aspeed-video.txt
 F:     drivers/media/platform/aspeed/
 
-ASPEED USB UDC DRIVER
-M:     Neal Liu <neal_liu@aspeedtech.com>
-L:     linux-aspeed@lists.ozlabs.org (moderated for non-subscribers)
-S:     Maintained
-F:     Documentation/devicetree/bindings/usb/aspeed,ast2600-udc.yaml
-F:     drivers/usb/gadget/udc/aspeed_udc.c
-
-ASPEED CRYPTO DRIVER
-M:     Neal Liu <neal_liu@aspeedtech.com>
-L:     linux-aspeed@lists.ozlabs.org (moderated for non-subscribers)
+ASUS EC HARDWARE MONITOR DRIVER
+M:     Eugene Shalygin <eugene.shalygin@gmail.com>
+L:     linux-hwmon@vger.kernel.org
 S:     Maintained
-F:     Documentation/devicetree/bindings/crypto/aspeed,*
-F:     drivers/crypto/aspeed/
+F:     drivers/hwmon/asus-ec-sensors.c
 
 ASUS NOTEBOOKS AND EEEPC ACPI/WMI EXTRAS DRIVERS
 M:     Corentin Chary <corentin.chary@gmail.com>
@@ -3194,6 +3197,12 @@ S:       Maintained
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git
 F:     drivers/platform/x86/asus-tf103c-dock.c
 
+ASUS WIRELESS RADIO CONTROL DRIVER
+M:     João Paulo Rechi Vita <jprvita@gmail.com>
+L:     platform-driver-x86@vger.kernel.org
+S:     Maintained
+F:     drivers/platform/x86/asus-wireless.c
+
 ASUS WMI HARDWARE MONITOR DRIVER
 M:     Ed Brindley <kernel@maidavale.org>
 M:     Denis Pauk <pauk.denis@gmail.com>
@@ -3201,18 +3210,6 @@ L:       linux-hwmon@vger.kernel.org
 S:     Maintained
 F:     drivers/hwmon/asus_wmi_sensors.c
 
-ASUS EC HARDWARE MONITOR DRIVER
-M:     Eugene Shalygin <eugene.shalygin@gmail.com>
-L:     linux-hwmon@vger.kernel.org
-S:     Maintained
-F:     drivers/hwmon/asus-ec-sensors.c
-
-ASUS WIRELESS RADIO CONTROL DRIVER
-M:     João Paulo Rechi Vita <jprvita@gmail.com>
-L:     platform-driver-x86@vger.kernel.org
-S:     Maintained
-F:     drivers/platform/x86/asus-wireless.c
-
 ASYMMETRIC KEYS
 M:     David Howells <dhowells@redhat.com>
 L:     keyrings@vger.kernel.org
@@ -3352,10 +3349,10 @@ R:      Boqun Feng <boqun.feng@gmail.com>
 R:     Mark Rutland <mark.rutland@arm.com>
 L:     linux-kernel@vger.kernel.org
 S:     Maintained
+F:     Documentation/atomic_*.txt
 F:     arch/*/include/asm/atomic*.h
 F:     include/*/atomic*.h
 F:     include/linux/refcount.h
-F:     Documentation/atomic_*.txt
 F:     scripts/atomic/
 
 ATTO EXPRESSSAS SAS/SATA RAID SCSI DRIVER
@@ -3548,7 +3545,7 @@ F:        Documentation/filesystems/befs.rst
 F:     fs/befs/
 
 BFQ I/O SCHEDULER
-M:     Paolo Valente <paolo.valente@linaro.org>
+M:     Paolo Valente <paolo.valente@unimore.it>
 M:     Jens Axboe <axboe@kernel.dk>
 L:     linux-block@vger.kernel.org
 S:     Maintained
@@ -3649,50 +3646,6 @@ S:       Maintained
 F:     Documentation/devicetree/bindings/iio/accel/bosch,bma400.yaml
 F:     drivers/iio/accel/bma400*
 
-BPF [GENERAL] (Safe Dynamic Programs and Tools)
-M:     Alexei Starovoitov <ast@kernel.org>
-M:     Daniel Borkmann <daniel@iogearbox.net>
-M:     Andrii Nakryiko <andrii@kernel.org>
-R:     Martin KaFai Lau <martin.lau@linux.dev>
-R:     Song Liu <song@kernel.org>
-R:     Yonghong Song <yhs@fb.com>
-R:     John Fastabend <john.fastabend@gmail.com>
-R:     KP Singh <kpsingh@kernel.org>
-R:     Stanislav Fomichev <sdf@google.com>
-R:     Hao Luo <haoluo@google.com>
-R:     Jiri Olsa <jolsa@kernel.org>
-L:     bpf@vger.kernel.org
-S:     Supported
-W:     https://bpf.io/
-Q:     https://patchwork.kernel.org/project/netdevbpf/list/?delegate=121173
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
-F:     Documentation/bpf/
-F:     Documentation/networking/filter.rst
-F:     Documentation/userspace-api/ebpf/
-F:     arch/*/net/*
-F:     include/linux/bpf*
-F:     include/linux/btf*
-F:     include/linux/filter.h
-F:     include/trace/events/xdp.h
-F:     include/uapi/linux/bpf*
-F:     include/uapi/linux/btf*
-F:     include/uapi/linux/filter.h
-F:     kernel/bpf/
-F:     kernel/trace/bpf_trace.c
-F:     lib/test_bpf.c
-F:     net/bpf/
-F:     net/core/filter.c
-F:     net/sched/act_bpf.c
-F:     net/sched/cls_bpf.c
-F:     samples/bpf/
-F:     scripts/bpf_doc.py
-F:     scripts/pahole-flags.sh
-F:     scripts/pahole-version.sh
-F:     tools/bpf/
-F:     tools/lib/bpf/
-F:     tools/testing/selftests/bpf/
-
 BPF JIT for ARM
 M:     Shubham Bansal <illusionist.neo@gmail.com>
 L:     bpf@vger.kernel.org
@@ -3771,79 +3724,79 @@ S:      Supported
 F:     arch/x86/net/
 X:     arch/x86/net/bpf_jit_comp32.c
 
+BPF [BTF]
+M:     Martin KaFai Lau <martin.lau@linux.dev>
+L:     bpf@vger.kernel.org
+S:     Maintained
+F:     include/linux/btf*
+F:     kernel/bpf/btf.c
+
 BPF [CORE]
 M:     Alexei Starovoitov <ast@kernel.org>
 M:     Daniel Borkmann <daniel@iogearbox.net>
 R:     John Fastabend <john.fastabend@gmail.com>
 L:     bpf@vger.kernel.org
 S:     Maintained
-F:     kernel/bpf/verifier.c
-F:     kernel/bpf/tnum.c
-F:     kernel/bpf/core.c
-F:     kernel/bpf/syscall.c
-F:     kernel/bpf/dispatcher.c
-F:     kernel/bpf/trampoline.c
 F:     include/linux/bpf*
 F:     include/linux/filter.h
 F:     include/linux/tnum.h
+F:     kernel/bpf/core.c
+F:     kernel/bpf/dispatcher.c
+F:     kernel/bpf/syscall.c
+F:     kernel/bpf/tnum.c
+F:     kernel/bpf/trampoline.c
+F:     kernel/bpf/verifier.c
 
-BPF [BTF]
-M:     Martin KaFai Lau <martin.lau@linux.dev>
-L:     bpf@vger.kernel.org
-S:     Maintained
-F:     kernel/bpf/btf.c
-F:     include/linux/btf*
-
-BPF [TRACING]
-M:     Song Liu <song@kernel.org>
-R:     Jiri Olsa <jolsa@kernel.org>
+BPF [DOCUMENTATION] (Related to Standardization)
+R:     David Vernet <void@manifault.com>
 L:     bpf@vger.kernel.org
+L:     bpf@ietf.org
 S:     Maintained
-F:     kernel/trace/bpf_trace.c
-F:     kernel/bpf/stackmap.c
+F:     Documentation/bpf/instruction-set.rst
 
-BPF [NETWORKING] (tc BPF, sock_addr)
-M:     Martin KaFai Lau <martin.lau@linux.dev>
+BPF [GENERAL] (Safe Dynamic Programs and Tools)
+M:     Alexei Starovoitov <ast@kernel.org>
 M:     Daniel Borkmann <daniel@iogearbox.net>
+M:     Andrii Nakryiko <andrii@kernel.org>
+R:     Martin KaFai Lau <martin.lau@linux.dev>
+R:     Song Liu <song@kernel.org>
+R:     Yonghong Song <yhs@fb.com>
 R:     John Fastabend <john.fastabend@gmail.com>
+R:     KP Singh <kpsingh@kernel.org>
+R:     Stanislav Fomichev <sdf@google.com>
+R:     Hao Luo <haoluo@google.com>
+R:     Jiri Olsa <jolsa@kernel.org>
 L:     bpf@vger.kernel.org
-L:     netdev@vger.kernel.org
-S:     Maintained
+S:     Supported
+W:     https://bpf.io/
+Q:     https://patchwork.kernel.org/project/netdevbpf/list/?delegate=121173
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
+F:     Documentation/bpf/
+F:     Documentation/networking/filter.rst
+F:     Documentation/userspace-api/ebpf/
+F:     arch/*/net/*
+F:     include/linux/bpf*
+F:     include/linux/btf*
+F:     include/linux/filter.h
+F:     include/trace/events/xdp.h
+F:     include/uapi/linux/bpf*
+F:     include/uapi/linux/btf*
+F:     include/uapi/linux/filter.h
+F:     kernel/bpf/
+F:     kernel/trace/bpf_trace.c
+F:     lib/test_bpf.c
+F:     net/bpf/
 F:     net/core/filter.c
 F:     net/sched/act_bpf.c
 F:     net/sched/cls_bpf.c
-
-BPF [NETWORKING] (struct_ops, reuseport)
-M:     Martin KaFai Lau <martin.lau@linux.dev>
-L:     bpf@vger.kernel.org
-L:     netdev@vger.kernel.org
-S:     Maintained
-F:     kernel/bpf/bpf_struct*
-
-BPF [SECURITY & LSM] (Security Audit and Enforcement using BPF)
-M:     KP Singh <kpsingh@kernel.org>
-R:     Florent Revest <revest@chromium.org>
-R:     Brendan Jackman <jackmanb@chromium.org>
-L:     bpf@vger.kernel.org
-S:     Maintained
-F:     Documentation/bpf/prog_lsm.rst
-F:     include/linux/bpf_lsm.h
-F:     kernel/bpf/bpf_lsm.c
-F:     security/bpf/
-
-BPF [STORAGE & CGROUPS]
-M:     Martin KaFai Lau <martin.lau@linux.dev>
-L:     bpf@vger.kernel.org
-S:     Maintained
-F:     kernel/bpf/cgroup.c
-F:     kernel/bpf/*storage.c
-F:     kernel/bpf/bpf_lru*
-
-BPF [RINGBUF]
-M:     Andrii Nakryiko <andrii@kernel.org>
-L:     bpf@vger.kernel.org
-S:     Maintained
-F:     kernel/bpf/ringbuf.c
+F:     samples/bpf/
+F:     scripts/bpf_doc.py
+F:     scripts/pahole-flags.sh
+F:     scripts/pahole-version.sh
+F:     tools/bpf/
+F:     tools/lib/bpf/
+F:     tools/testing/selftests/bpf/
 
 BPF [ITERATOR]
 M:     Yonghong Song <yhs@fb.com>
@@ -3870,12 +3823,45 @@ L:      bpf@vger.kernel.org
 S:     Maintained
 F:     tools/lib/bpf/
 
-BPF [TOOLING] (bpftool)
-M:     Quentin Monnet <quentin@isovalent.com>
+BPF [MISC]
+L:     bpf@vger.kernel.org
+S:     Odd Fixes
+K:     (?:\b|_)bpf(?:\b|_)
+
+BPF [NETWORKING] (struct_ops, reuseport)
+M:     Martin KaFai Lau <martin.lau@linux.dev>
 L:     bpf@vger.kernel.org
+L:     netdev@vger.kernel.org
 S:     Maintained
-F:     kernel/bpf/disasm.*
-F:     tools/bpf/bpftool/
+F:     kernel/bpf/bpf_struct*
+
+BPF [NETWORKING] (tc BPF, sock_addr)
+M:     Martin KaFai Lau <martin.lau@linux.dev>
+M:     Daniel Borkmann <daniel@iogearbox.net>
+R:     John Fastabend <john.fastabend@gmail.com>
+L:     bpf@vger.kernel.org
+L:     netdev@vger.kernel.org
+S:     Maintained
+F:     net/core/filter.c
+F:     net/sched/act_bpf.c
+F:     net/sched/cls_bpf.c
+
+BPF [RINGBUF]
+M:     Andrii Nakryiko <andrii@kernel.org>
+L:     bpf@vger.kernel.org
+S:     Maintained
+F:     kernel/bpf/ringbuf.c
+
+BPF [SECURITY & LSM] (Security Audit and Enforcement using BPF)
+M:     KP Singh <kpsingh@kernel.org>
+R:     Florent Revest <revest@chromium.org>
+R:     Brendan Jackman <jackmanb@chromium.org>
+L:     bpf@vger.kernel.org
+S:     Maintained
+F:     Documentation/bpf/prog_lsm.rst
+F:     include/linux/bpf_lsm.h
+F:     kernel/bpf/bpf_lsm.c
+F:     security/bpf/
 
 BPF [SELFTESTS] (Test Runners & Infrastructure)
 M:     Andrii Nakryiko <andrii@kernel.org>
@@ -3884,17 +3870,28 @@ L:      bpf@vger.kernel.org
 S:     Maintained
 F:     tools/testing/selftests/bpf/
 
-BPF [DOCUMENTATION] (Related to Standardization)
-R:     David Vernet <void@manifault.com>
+BPF [STORAGE & CGROUPS]
+M:     Martin KaFai Lau <martin.lau@linux.dev>
 L:     bpf@vger.kernel.org
-L:     bpf@ietf.org
 S:     Maintained
-F:     Documentation/bpf/instruction-set.rst
+F:     kernel/bpf/*storage.c
+F:     kernel/bpf/bpf_lru*
+F:     kernel/bpf/cgroup.c
 
-BPF [MISC]
+BPF [TOOLING] (bpftool)
+M:     Quentin Monnet <quentin@isovalent.com>
 L:     bpf@vger.kernel.org
-S:     Odd Fixes
-K:     (?:\b|_)bpf(?:\b|_)
+S:     Maintained
+F:     kernel/bpf/disasm.*
+F:     tools/bpf/bpftool/
+
+BPF [TRACING]
+M:     Song Liu <song@kernel.org>
+R:     Jiri Olsa <jolsa@kernel.org>
+L:     bpf@vger.kernel.org
+S:     Maintained
+F:     kernel/bpf/stackmap.c
+F:     kernel/trace/bpf_trace.c
 
 BROADCOM B44 10/100 ETHERNET DRIVER
 M:     Michael Chan <michael.chan@broadcom.com>
@@ -3913,34 +3910,6 @@ F:       drivers/net/dsa/bcm_sf2*
 F:     include/linux/dsa/brcm.h
 F:     include/linux/platform_data/b53.h
 
-BROADCOM BCMBCA ARM ARCHITECTURE
-M:     William Zhang <william.zhang@broadcom.com>
-M:     Anand Gore <anand.gore@broadcom.com>
-M:     Kursad Oney <kursad.oney@broadcom.com>
-M:     Florian Fainelli <f.fainelli@gmail.com>
-M:     Rafał Miłecki <rafal@milecki.pl>
-R:     Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
-L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
-S:     Maintained
-T:     git https://github.com/broadcom/stblinux.git
-F:     Documentation/devicetree/bindings/arm/bcm/brcm,bcmbca.yaml
-F:     arch/arm64/boot/dts/broadcom/bcmbca/*
-N:     bcmbca
-N:     bcm[9]?47622
-N:     bcm[9]?4912
-N:     bcm[9]?63138
-N:     bcm[9]?63146
-N:     bcm[9]?63148
-N:     bcm[9]?63158
-N:     bcm[9]?63178
-N:     bcm[9]?6756
-N:     bcm[9]?6813
-N:     bcm[9]?6846
-N:     bcm[9]?6855
-N:     bcm[9]?6856
-N:     bcm[9]?6858
-N:     bcm[9]?6878
-
 BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE
 M:     Florian Fainelli <f.fainelli@gmail.com>
 R:     Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
@@ -4038,11 +4007,39 @@ N:      brcmstb
 N:     bcm7038
 N:     bcm7120
 
+BROADCOM BCMBCA ARM ARCHITECTURE
+M:     William Zhang <william.zhang@broadcom.com>
+M:     Anand Gore <anand.gore@broadcom.com>
+M:     Kursad Oney <kursad.oney@broadcom.com>
+M:     Florian Fainelli <f.fainelli@gmail.com>
+M:     Rafał Miłecki <rafal@milecki.pl>
+R:     Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
+L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
+S:     Maintained
+T:     git https://github.com/broadcom/stblinux.git
+F:     Documentation/devicetree/bindings/arm/bcm/brcm,bcmbca.yaml
+F:     arch/arm64/boot/dts/broadcom/bcmbca/*
+N:     bcmbca
+N:     bcm[9]?47622
+N:     bcm[9]?4912
+N:     bcm[9]?63138
+N:     bcm[9]?63146
+N:     bcm[9]?63148
+N:     bcm[9]?63158
+N:     bcm[9]?63178
+N:     bcm[9]?6756
+N:     bcm[9]?6813
+N:     bcm[9]?6846
+N:     bcm[9]?6855
+N:     bcm[9]?6856
+N:     bcm[9]?6858
+N:     bcm[9]?6878
+
 BROADCOM BDC DRIVER
 M:     Justin Chen <justinpopo6@gmail.com>
 M:     Al Cooper <alcooperx@gmail.com>
-L:     linux-usb@vger.kernel.org
 R:     Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
+L:     linux-usb@vger.kernel.org
 S:     Maintained
 F:     Documentation/devicetree/bindings/usb/brcm,bdc.yaml
 F:     drivers/usb/gadget/udc/bdc/
@@ -4064,10 +4061,10 @@ F:      arch/mips/bmips/*
 F:     arch/mips/boot/dts/brcm/bcm*.dts*
 F:     arch/mips/include/asm/mach-bmips/*
 F:     arch/mips/kernel/*bmips*
-F:     drivers/soc/bcm/bcm63xx
 F:     drivers/irqchip/irq-bcm63*
 F:     drivers/irqchip/irq-bcm7*
 F:     drivers/irqchip/irq-brcmstb*
+F:     drivers/soc/bcm/bcm63xx
 F:     include/linux/bcm963xx_nvram.h
 F:     include/linux/bcm963xx_tag.h
 
@@ -4349,9 +4346,9 @@ M:        Florian Fainelli <f.fainelli@gmail.com>
 R:     Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
 L:     netdev@vger.kernel.org
 S:     Supported
+F:     Documentation/devicetree/bindings/net/brcm,systemport.yaml
 F:     drivers/net/ethernet/broadcom/bcmsysport.*
 F:     drivers/net/ethernet/broadcom/unimac.h
-F:     Documentation/devicetree/bindings/net/brcm,systemport.yaml
 
 BROADCOM TG3 GIGABIT ETHERNET DRIVER
 M:     Siva Reddy Kallam <siva.kallam@broadcom.com>
@@ -4483,29 +4480,6 @@ W:       https://github.com/Cascoda/ca8210-linux.git
 F:     Documentation/devicetree/bindings/net/ieee802154/ca8210.txt
 F:     drivers/net/ieee802154/ca8210.c
 
-CANAAN/KENDRYTE K210 SOC FPIOA DRIVER
-M:     Damien Le Moal <dlemoal@kernel.org>
-L:     linux-riscv@lists.infradead.org
-L:     linux-gpio@vger.kernel.org (pinctrl driver)
-F:     Documentation/devicetree/bindings/pinctrl/canaan,k210-fpioa.yaml
-F:     drivers/pinctrl/pinctrl-k210.c
-
-CANAAN/KENDRYTE K210 SOC RESET CONTROLLER DRIVER
-M:     Damien Le Moal <dlemoal@kernel.org>
-L:     linux-kernel@vger.kernel.org
-L:     linux-riscv@lists.infradead.org
-S:     Maintained
-F:     Documentation/devicetree/bindings/reset/canaan,k210-rst.yaml
-F:     drivers/reset/reset-k210.c
-
-CANAAN/KENDRYTE K210 SOC SYSTEM CONTROLLER DRIVER
-M:     Damien Le Moal <dlemoal@kernel.org>
-L:     linux-riscv@lists.infradead.org
-S:     Maintained
-F:      Documentation/devicetree/bindings/mfd/canaan,k210-sysctl.yaml
-F:     drivers/soc/canaan/
-F:     include/soc/canaan/
-
 CACHEFILES: FS-CACHE BACKEND FOR CACHING ON MOUNTED FILESYSTEMS
 M:     David Howells <dhowells@redhat.com>
 L:     linux-cachefs@redhat.com (moderated for non-subscribers)
@@ -4627,6 +4601,29 @@ F:       Documentation/networking/j1939.rst
 F:     include/uapi/linux/can/j1939.h
 F:     net/can/j1939/
 
+CANAAN/KENDRYTE K210 SOC FPIOA DRIVER
+M:     Damien Le Moal <dlemoal@kernel.org>
+L:     linux-riscv@lists.infradead.org
+L:     linux-gpio@vger.kernel.org (pinctrl driver)
+F:     Documentation/devicetree/bindings/pinctrl/canaan,k210-fpioa.yaml
+F:     drivers/pinctrl/pinctrl-k210.c
+
+CANAAN/KENDRYTE K210 SOC RESET CONTROLLER DRIVER
+M:     Damien Le Moal <dlemoal@kernel.org>
+L:     linux-kernel@vger.kernel.org
+L:     linux-riscv@lists.infradead.org
+S:     Maintained
+F:     Documentation/devicetree/bindings/reset/canaan,k210-rst.yaml
+F:     drivers/reset/reset-k210.c
+
+CANAAN/KENDRYTE K210 SOC SYSTEM CONTROLLER DRIVER
+M:     Damien Le Moal <dlemoal@kernel.org>
+L:     linux-riscv@lists.infradead.org
+S:     Maintained
+F:     Documentation/devicetree/bindings/mfd/canaan,k210-sysctl.yaml
+F:     drivers/soc/canaan/
+F:     include/soc/canaan/
+
 CAPABILITIES
 M:     Serge Hallyn <serge@hallyn.com>
 L:     linux-security-module@vger.kernel.org
@@ -4686,8 +4683,8 @@ F:        arch/arm64/boot/dts/cavium/thunder2-99xx*
 
 CBS/ETF/TAPRIO QDISCS
 M:     Vinicius Costa Gomes <vinicius.gomes@intel.com>
-S:     Maintained
 L:     netdev@vger.kernel.org
+S:     Maintained
 F:     net/sched/sch_cbs.c
 F:     net/sched/sch_etf.c
 F:     net/sched/sch_taprio.c
@@ -4710,10 +4707,10 @@ CCTRNG ARM TRUSTZONE CRYPTOCELL TRUE RANDOM NUMBER GENERATOR (TRNG) DRIVER
 M:     Hadar Gat <hadar.gat@arm.com>
 L:     linux-crypto@vger.kernel.org
 S:     Supported
+W:     https://developer.arm.com/products/system-ip/trustzone-cryptocell/cryptocell-700-family
+F:     Documentation/devicetree/bindings/rng/arm-cctrng.yaml
 F:     drivers/char/hw_random/cctrng.c
 F:     drivers/char/hw_random/cctrng.h
-F:     Documentation/devicetree/bindings/rng/arm-cctrng.yaml
-W:     https://developer.arm.com/products/system-ip/trustzone-cryptocell/cryptocell-700-family
 
 CEC FRAMEWORK
 M:     Hans Verkuil <hverkuil-cisco@xs4all.nl>
@@ -4873,13 +4870,6 @@ S:       Maintained
 F:     Documentation/devicetree/bindings/sound/google,cros-ec-codec.yaml
 F:     sound/soc/codecs/cros_ec_codec.*
 
-CHROMEOS EC UART DRIVER
-M:     Bhanu Prakash Maiya <bhanumaiya@chromium.org>
-R:     Benson Leung <bleung@chromium.org>
-R:     Tzung-Bi Shih <tzungbi@kernel.org>
-S:     Maintained
-F:     drivers/platform/chrome/cros_ec_uart.c
-
 CHROMEOS EC SUBDRIVERS
 M:     Benson Leung <bleung@chromium.org>
 R:     Guenter Roeck <groeck@chromium.org>
@@ -4889,13 +4879,12 @@ F:      drivers/power/supply/cros_usbpd-charger.c
 N:     cros_ec
 N:     cros-ec
 
-CHROMEOS EC USB TYPE-C DRIVER
-M:     Prashant Malani <pmalani@chromium.org>
-L:     chrome-platform@lists.linux.dev
+CHROMEOS EC UART DRIVER
+M:     Bhanu Prakash Maiya <bhanumaiya@chromium.org>
+R:     Benson Leung <bleung@chromium.org>
+R:     Tzung-Bi Shih <tzungbi@kernel.org>
 S:     Maintained
-F:     drivers/platform/chrome/cros_ec_typec.*
-F:     drivers/platform/chrome/cros_typec_switch.c
-F:     drivers/platform/chrome/cros_typec_vdm.*
+F:     drivers/platform/chrome/cros_ec_uart.c
 
 CHROMEOS EC USB PD NOTIFY DRIVER
 M:     Prashant Malani <pmalani@chromium.org>
@@ -4904,6 +4893,14 @@ S:       Maintained
 F:     drivers/platform/chrome/cros_usbpd_notify.c
 F:     include/linux/platform_data/cros_usbpd_notify.h
 
+CHROMEOS EC USB TYPE-C DRIVER
+M:     Prashant Malani <pmalani@chromium.org>
+L:     chrome-platform@lists.linux.dev
+S:     Maintained
+F:     drivers/platform/chrome/cros_ec_typec.*
+F:     drivers/platform/chrome/cros_typec_switch.c
+F:     drivers/platform/chrome/cros_typec_vdm.*
+
 CHROMEOS HPS DRIVER
 M:     Dan Callaghan <dcallagh@chromium.org>
 R:     Sami Kyöstilä <skyostil@chromium.org>
@@ -4921,7 +4918,6 @@ F:        drivers/media/cec/i2c/ch7322.c
 CIRRUS LOGIC AUDIO CODEC DRIVERS
 M:     James Schulman <james.schulman@cirrus.com>
 M:     David Rhodes <david.rhodes@cirrus.com>
-M:     Lucas Tanure <tanureal@opensource.cirrus.com>
 M:     Richard Fitzgerald <rf@opensource.cirrus.com>
 L:     alsa-devel@alsa-project.org (moderated for non-subscribers)
 L:     patches@opensource.cirrus.com
@@ -5021,6 +5017,18 @@ M:       Nelson Escobar <neescoba@cisco.com>
 S:     Supported
 F:     drivers/infiniband/hw/usnic/
 
+CLANG CONTROL FLOW INTEGRITY SUPPORT
+M:     Sami Tolvanen <samitolvanen@google.com>
+M:     Kees Cook <keescook@chromium.org>
+R:     Nathan Chancellor <nathan@kernel.org>
+R:     Nick Desaulniers <ndesaulniers@google.com>
+L:     llvm@lists.linux.dev
+S:     Supported
+B:     https://github.com/ClangBuiltLinux/linux/issues
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/hardening
+F:     include/linux/cfi.h
+F:     kernel/cfi.c
+
 CLANG-FORMAT FILE
 M:     Miguel Ojeda <ojeda@kernel.org>
 S:     Maintained
@@ -5041,18 +5049,6 @@ F:       scripts/Makefile.clang
 F:     scripts/clang-tools/
 K:     \b(?i:clang|llvm)\b
 
-CLANG CONTROL FLOW INTEGRITY SUPPORT
-M:     Sami Tolvanen <samitolvanen@google.com>
-M:     Kees Cook <keescook@chromium.org>
-R:     Nathan Chancellor <nathan@kernel.org>
-R:     Nick Desaulniers <ndesaulniers@google.com>
-L:     llvm@lists.linux.dev
-S:     Supported
-B:     https://github.com/ClangBuiltLinux/linux/issues
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/hardening
-F:     include/linux/cfi.h
-F:     kernel/cfi.c
-
 CLK API
 M:     Russell King <linux@armlinux.org.uk>
 L:     linux-clk@vger.kernel.org
@@ -5143,7 +5139,7 @@ X:        drivers/clk/clkdev.c
 
 COMMON INTERNET FILE SYSTEM CLIENT (CIFS and SMB3)
 M:     Steve French <sfrench@samba.org>
-R:     Paulo Alcantara <pc@cjr.nz> (DFS, global name space)
+R:     Paulo Alcantara <pc@manguebit.com> (DFS, global name space)
 R:     Ronnie Sahlberg <lsahlber@redhat.com> (directory leases, sparse files)
 R:     Shyam Prasad N <sprasad@microsoft.com> (multichannel)
 R:     Tom Talpey <tom@talpey.com> (RDMA, smbdirect)
@@ -5153,8 +5149,8 @@ S:        Supported
 W:     https://wiki.samba.org/index.php/LinuxCIFS
 T:     git git://git.samba.org/sfrench/cifs-2.6.git
 F:     Documentation/admin-guide/cifs/
-F:     fs/cifs/
-F:     fs/smbfs_common/
+F:     fs/smb/client/
+F:     fs/smb/common/
 F:     include/uapi/linux/cifs
 
 COMPACTPCI HOTPLUG CORE
@@ -5223,8 +5219,8 @@ CONTEXT TRACKING
 M:     Frederic Weisbecker <frederic@kernel.org>
 M:     "Paul E. McKenney" <paulmck@kernel.org>
 S:     Maintained
-F:     kernel/context_tracking.c
 F:     include/linux/context_tracking*
+F:     kernel/context_tracking.c
 
 CONTROL GROUP (CGROUP)
 M:     Tejun Heo <tj@kernel.org>
@@ -5348,6 +5344,18 @@ F:       include/linux/sched/cpufreq.h
 F:     kernel/sched/cpufreq*.c
 F:     tools/testing/selftests/cpufreq/
 
+CPU HOTPLUG
+M:     Thomas Gleixner <tglx@linutronix.de>
+M:     Peter Zijlstra <peterz@infradead.org>
+L:     linux-kernel@vger.kernel.org
+S:     Maintained
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git smp/core
+F:     kernel/cpu.c
+F:     kernel/smpboot.*
+F:     include/linux/cpu.h
+F:     include/linux/cpuhotplug.h
+F:     include/linux/smpboot.h
+
 CPU IDLE TIME MANAGEMENT FRAMEWORK
 M:     "Rafael J. Wysocki" <rafael@kernel.org>
 M:     Daniel Lezcano <daniel.lezcano@linaro.org>
@@ -5385,8 +5393,8 @@ F:        drivers/cpuidle/cpuidle-big_little.c
 
 CPUIDLE DRIVER - ARM EXYNOS
 M:     Daniel Lezcano <daniel.lezcano@linaro.org>
-R:     Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
 M:     Kukjin Kim <kgene@kernel.org>
+R:     Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
 L:     linux-pm@vger.kernel.org
 L:     linux-samsung-soc@vger.kernel.org
 S:     Supported
@@ -5407,8 +5415,8 @@ M:        Ulf Hansson <ulf.hansson@linaro.org>
 L:     linux-pm@vger.kernel.org
 L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 S:     Supported
-F:     drivers/cpuidle/cpuidle-psci.h
 F:     drivers/cpuidle/cpuidle-psci-domain.c
+F:     drivers/cpuidle/cpuidle-psci.h
 
 CPUIDLE DRIVER - DT IDLE PM DOMAIN
 M:     Ulf Hansson <ulf.hansson@linaro.org>
@@ -5552,19 +5560,19 @@ S:      Supported
 W:     http://www.chelsio.com
 F:     drivers/crypto/chelsio
 
-CXGB4 INLINE CRYPTO DRIVER
-M:     Ayush Sawal <ayush.sawal@chelsio.com>
+CXGB4 ETHERNET DRIVER (CXGB4)
+M:     Raju Rangoju <rajur@chelsio.com>
 L:     netdev@vger.kernel.org
 S:     Supported
 W:     http://www.chelsio.com
-F:     drivers/net/ethernet/chelsio/inline_crypto/
+F:     drivers/net/ethernet/chelsio/cxgb4/
 
-CXGB4 ETHERNET DRIVER (CXGB4)
-M:     Raju Rangoju <rajur@chelsio.com>
+CXGB4 INLINE CRYPTO DRIVER
+M:     Ayush Sawal <ayush.sawal@chelsio.com>
 L:     netdev@vger.kernel.org
 S:     Supported
 W:     http://www.chelsio.com
-F:     drivers/net/ethernet/chelsio/cxgb4/
+F:     drivers/net/ethernet/chelsio/inline_crypto/
 
 CXGB4 ISCSI DRIVER (CXGB4I)
 M:     Varun Prakash <varun@chelsio.com>
@@ -5621,16 +5629,6 @@ CYCLADES PC300 DRIVER
 S:     Orphan
 F:     drivers/net/wan/pc300*
 
-CYPRESS_FIRMWARE MEDIA DRIVER
-M:     Antti Palosaari <crope@iki.fi>
-L:     linux-media@vger.kernel.org
-S:     Maintained
-W:     https://linuxtv.org
-W:     http://palosaari.fi/linux/
-Q:     http://patchwork.linuxtv.org/project/linux-media/list/
-T:     git git://linuxtv.org/anttip/media_tree.git
-F:     drivers/media/common/cypress_firmware*
-
 CYPRESS CY8C95X0 PINCTRL DRIVER
 M:     Patrick Rudolph <patrick.rudolph@9elements.com>
 L:     linux-gpio@vger.kernel.org
@@ -5650,6 +5648,16 @@ S:       Maintained
 F:     Documentation/devicetree/bindings/input/cypress-sf.yaml
 F:     drivers/input/keyboard/cypress-sf.c
 
+CYPRESS_FIRMWARE MEDIA DRIVER
+M:     Antti Palosaari <crope@iki.fi>
+L:     linux-media@vger.kernel.org
+S:     Maintained
+W:     https://linuxtv.org
+W:     http://palosaari.fi/linux/
+Q:     http://patchwork.linuxtv.org/project/linux-media/list/
+T:     git git://linuxtv.org/anttip/media_tree.git
+F:     drivers/media/common/cypress_firmware*
+
 CYTTSP TOUCHSCREEN DRIVER
 M:     Linus Walleij <linus.walleij@linaro.org>
 L:     linux-input@vger.kernel.org
@@ -5732,6 +5740,14 @@ F:       include/linux/tfrc.h
 F:     include/uapi/linux/dccp.h
 F:     net/dccp/
 
+DEBUGOBJECTS:
+M:     Thomas Gleixner <tglx@linutronix.de>
+L:     linux-kernel@vger.kernel.org
+S:     Maintained
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core/debugobjects
+F:     lib/debugobjects.c
+F:     include/linux/debugobjects.h
+
 DECSTATION PLATFORM SUPPORT
 M:     "Maciej W. Rozycki" <macro@orcam.me.uk>
 L:     linux-mips@vger.kernel.org
@@ -5816,11 +5832,6 @@ S:       Maintained
 F:     Documentation/driver-api/dcdbas.rst
 F:     drivers/platform/x86/dell/dcdbas.*
 
-DELL WMI DESCRIPTOR DRIVER
-L:     Dell.Client.Kernel@dell.com
-S:     Maintained
-F:     drivers/platform/x86/dell/dell-wmi-descriptor.c
-
 DELL WMI DDV DRIVER
 M:     Armin Wolf <W_Armin@gmx.de>
 S:     Maintained
@@ -5828,13 +5839,17 @@ F:      Documentation/ABI/testing/debugfs-dell-wmi-ddv
 F:     Documentation/ABI/testing/sysfs-platform-dell-wmi-ddv
 F:     drivers/platform/x86/dell/dell-wmi-ddv.c
 
-DELL WMI SYSMAN DRIVER
-M:     Prasanth Ksr <prasanth.ksr@dell.com>
+DELL WMI DESCRIPTOR DRIVER
+L:     Dell.Client.Kernel@dell.com
+S:     Maintained
+F:     drivers/platform/x86/dell/dell-wmi-descriptor.c
+
+DELL WMI HARDWARE PRIVACY SUPPORT
+M:     Perry Yuan <Perry.Yuan@dell.com>
 L:     Dell.Client.Kernel@dell.com
 L:     platform-driver-x86@vger.kernel.org
 S:     Maintained
-F:     Documentation/ABI/testing/sysfs-class-firmware-attributes
-F:     drivers/platform/x86/dell/dell-wmi-sysman/
+F:     drivers/platform/x86/dell/dell-wmi-privacy.c
 
 DELL WMI NOTIFICATIONS DRIVER
 M:     Matthew Garrett <mjg59@srcf.ucam.org>
@@ -5842,20 +5857,13 @@ M:      Pali Rohár <pali@kernel.org>
 S:     Maintained
 F:     drivers/platform/x86/dell/dell-wmi-base.c
 
-DELL WMI HARDWARE PRIVACY SUPPORT
-M:     Perry Yuan <Perry.Yuan@dell.com>
+DELL WMI SYSMAN DRIVER
+M:     Prasanth Ksr <prasanth.ksr@dell.com>
 L:     Dell.Client.Kernel@dell.com
 L:     platform-driver-x86@vger.kernel.org
 S:     Maintained
-F:     drivers/platform/x86/dell/dell-wmi-privacy.c
-
-DELTA ST MEDIA DRIVER
-M:     Hugues Fruchet <hugues.fruchet@foss.st.com>
-L:     linux-media@vger.kernel.org
-S:     Supported
-W:     https://linuxtv.org
-T:     git git://linuxtv.org/media_tree.git
-F:     drivers/media/platform/st/sti/delta
+F:     Documentation/ABI/testing/sysfs-class-firmware-attributes
+F:     drivers/platform/x86/dell/dell-wmi-sysman/
 
 DELTA AHE-50DC FAN CONTROL MODULE DRIVER
 M:     Zev Weiss <zev@bewilderbeest.net>
@@ -5879,6 +5887,14 @@ F:       Documentation/devicetree/bindings/reset/delta,tn48m-reset.yaml
 F:     drivers/gpio/gpio-tn48m.c
 F:     include/dt-bindings/reset/delta,tn48m-reset.h
 
+DELTA ST MEDIA DRIVER
+M:     Hugues Fruchet <hugues.fruchet@foss.st.com>
+L:     linux-media@vger.kernel.org
+S:     Supported
+W:     https://linuxtv.org
+T:     git git://linuxtv.org/media_tree.git
+F:     drivers/media/platform/st/sti/delta
+
 DENALI NAND DRIVER
 L:     linux-mtd@lists.infradead.org
 S:     Orphan
@@ -5891,13 +5907,6 @@ S:       Maintained
 F:     drivers/dma/dw-edma/
 F:     include/linux/dma/edma.h
 
-DESIGNWARE XDATA IP DRIVER
-M:     Gustavo Pimentel <gustavo.pimentel@synopsys.com>
-L:     linux-pci@vger.kernel.org
-S:     Maintained
-F:     Documentation/misc-devices/dw-xdata-pcie.rst
-F:     drivers/misc/dw-xdata-pcie.c
-
 DESIGNWARE USB2 DRD IP DRIVER
 M:     Minas Harutyunyan <hminas@synopsys.com>
 L:     linux-usb@vger.kernel.org
@@ -5911,6 +5920,13 @@ L:       linux-usb@vger.kernel.org
 S:     Maintained
 F:     drivers/usb/dwc3/
 
+DESIGNWARE XDATA IP DRIVER
+M:     Gustavo Pimentel <gustavo.pimentel@synopsys.com>
+L:     linux-pci@vger.kernel.org
+S:     Maintained
+F:     Documentation/misc-devices/dw-xdata-pcie.rst
+F:     drivers/misc/dw-xdata-pcie.c
+
 DEVANTECH SRF ULTRASONIC RANGER IIO DRIVER
 M:     Andreas Klinger <ak@it-klinger.de>
 L:     linux-iio@vger.kernel.org
@@ -6019,9 +6035,9 @@ W:        http://www.dialog-semiconductor.com/products
 F:     Documentation/devicetree/bindings/input/da90??-onkey.txt
 F:     Documentation/devicetree/bindings/input/dlg,da72??.txt
 F:     Documentation/devicetree/bindings/mfd/da90*.txt
-F:     Documentation/devicetree/bindings/mfd/da90*.yaml
-F:     Documentation/devicetree/bindings/regulator/dlg,da9*.yaml
+F:     Documentation/devicetree/bindings/mfd/dlg,da90*.yaml
 F:     Documentation/devicetree/bindings/regulator/da92*.txt
+F:     Documentation/devicetree/bindings/regulator/dlg,da9*.yaml
 F:     Documentation/devicetree/bindings/regulator/slg51000.txt
 F:     Documentation/devicetree/bindings/sound/da[79]*.txt
 F:     Documentation/devicetree/bindings/thermal/da90??-thermal.txt
@@ -6140,6 +6156,12 @@ F:       include/linux/dma/
 F:     include/linux/dmaengine.h
 F:     include/linux/of_dma.h
 
+DMA MAPPING BENCHMARK
+M:     Xiang Chen <chenxiang66@hisilicon.com>
+L:     iommu@lists.linux.dev
+F:     kernel/dma/map_benchmark.c
+F:     tools/testing/selftests/dma/
+
 DMA MAPPING HELPERS
 M:     Christoph Hellwig <hch@lst.de>
 M:     Marek Szyprowski <m.szyprowski@samsung.com>
@@ -6150,17 +6172,11 @@ W:      http://git.infradead.org/users/hch/dma-mapping.git
 T:     git git://git.infradead.org/users/hch/dma-mapping.git
 F:     include/asm-generic/dma-mapping.h
 F:     include/linux/dma-direct.h
-F:     include/linux/dma-mapping.h
 F:     include/linux/dma-map-ops.h
+F:     include/linux/dma-mapping.h
 F:     include/linux/swiotlb.h
 F:     kernel/dma/
 
-DMA MAPPING BENCHMARK
-M:     Xiang Chen <chenxiang66@hisilicon.com>
-L:     iommu@lists.linux.dev
-F:     kernel/dma/map_benchmark.c
-F:     tools/testing/selftests/dma/
-
 DMA-BUF HEAPS FRAMEWORK
 M:     Sumit Semwal <sumit.semwal@linaro.org>
 R:     Benjamin Gaignard <benjamin.gaignard@collabora.com>
@@ -6218,10 +6234,17 @@ X:      Documentation/devicetree/
 X:     Documentation/driver-api/media/
 X:     Documentation/firmware-guide/acpi/
 X:     Documentation/i2c/
+X:     Documentation/netlink/
 X:     Documentation/power/
 X:     Documentation/spi/
 X:     Documentation/userspace-api/media/
 
+DOCUMENTATION PROCESS
+M:     Jonathan Corbet <corbet@lwn.net>
+S:     Maintained
+F:     Documentation/process/
+L:     workflows@vger.kernel.org
+
 DOCUMENTATION REPORTING ISSUES
 M:     Thorsten Leemhuis <linux@leemhuis.info>
 L:     linux-doc@vger.kernel.org
@@ -6350,6 +6373,25 @@ S:       Maintained
 F:     drivers/soc/ti/smartreflex.c
 F:     include/linux/power/smartreflex.h
 
+DRM ACCEL DRIVERS FOR INTEL VPU
+M:     Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
+M:     Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
+L:     dri-devel@lists.freedesktop.org
+S:     Supported
+T:     git git://anongit.freedesktop.org/drm/drm-misc
+F:     drivers/accel/ivpu/
+F:     include/uapi/drm/ivpu_accel.h
+
+DRM COMPUTE ACCELERATORS DRIVERS AND FRAMEWORK
+M:     Oded Gabbay <ogabbay@kernel.org>
+L:     dri-devel@lists.freedesktop.org
+S:     Maintained
+C:     irc://irc.oftc.net/dri-devel
+T:     git https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/accel.git
+F:     Documentation/accel/
+F:     drivers/accel/
+F:     include/drm/drm_accel.h
+
 DRM DRIVER FOR ALLWINNER DE2 AND DE3 ENGINE
 M:     Maxime Ripard <mripard@kernel.org>
 M:     Chen-Yu Tsai <wens@csie.org>
@@ -6432,6 +6474,21 @@ S:       Maintained
 F:     Documentation/devicetree/bindings/display/panel/feiyang,fy07024di26a30d.yaml
 F:     drivers/gpu/drm/panel/panel-feiyang-fy07024di26a30d.c
 
+DRM DRIVER FOR FIRMWARE FRAMEBUFFERS
+M:     Thomas Zimmermann <tzimmermann@suse.de>
+M:     Javier Martinez Canillas <javierm@redhat.com>
+L:     dri-devel@lists.freedesktop.org
+S:     Maintained
+T:     git git://anongit.freedesktop.org/drm/drm-misc
+F:     drivers/gpu/drm/drm_aperture.c
+F:     drivers/gpu/drm/tiny/ofdrm.c
+F:     drivers/gpu/drm/tiny/simpledrm.c
+F:     drivers/video/aperture.c
+F:     drivers/video/nomodeset.c
+F:     include/drm/drm_aperture.h
+F:     include/linux/aperture.h
+F:     include/video/nomodeset.h
+
 DRM DRIVER FOR GENERIC EDP PANELS
 R:     Douglas Anderson <dianders@chromium.org>
 F:     Documentation/devicetree/bindings/display/panel/panel-edp.yaml
@@ -6466,6 +6523,14 @@ T:       git git://anongit.freedesktop.org/drm/drm-misc
 F:     Documentation/devicetree/bindings/display/himax,hx8357d.txt
 F:     drivers/gpu/drm/tiny/hx8357d.c
 
+DRM DRIVER FOR HYPERV SYNTHETIC VIDEO DEVICE
+M:     Deepak Rawat <drawat.floss@gmail.com>
+L:     linux-hyperv@vger.kernel.org
+L:     dri-devel@lists.freedesktop.org
+S:     Maintained
+T:     git git://anongit.freedesktop.org/drm/drm-misc
+F:     drivers/gpu/drm/hyperv
+
 DRM DRIVER FOR ILITEK ILI9225 PANELS
 M:     David Lechner <david@lechnology.com>
 S:     Maintained
@@ -6495,11 +6560,11 @@ F:      drivers/gpu/drm/logicvc/
 DRM DRIVER FOR LVDS PANELS
 M:     Laurent Pinchart <laurent.pinchart@ideasonboard.com>
 L:     dri-devel@lists.freedesktop.org
-T:     git git://anongit.freedesktop.org/drm/drm-misc
 S:     Maintained
-F:     drivers/gpu/drm/panel/panel-lvds.c
+T:     git git://anongit.freedesktop.org/drm/drm-misc
 F:     Documentation/devicetree/bindings/display/lvds.yaml
 F:     Documentation/devicetree/bindings/display/panel/panel-lvds.yaml
+F:     drivers/gpu/drm/panel/panel-lvds.c
 
 DRM DRIVER FOR MANTIX MLAF057WE51 PANELS
 M:     Guido Günther <agx@sigxcpu.org>
@@ -6608,13 +6673,6 @@ T:       git git://anongit.freedesktop.org/drm/drm-misc
 F:     Documentation/devicetree/bindings/display/repaper.txt
 F:     drivers/gpu/drm/tiny/repaper.c
 
-DRM DRIVER FOR SOLOMON SSD130X OLED DISPLAYS
-M:     Javier Martinez Canillas <javierm@redhat.com>
-S:     Maintained
-T:     git git://anongit.freedesktop.org/drm/drm-misc
-F:     Documentation/devicetree/bindings/display/solomon,ssd1307fb.yaml
-F:     drivers/gpu/drm/solomon/ssd130x*
-
 DRM DRIVER FOR QEMU'S CIRRUS DEVICE
 M:     Dave Airlie <airlied@redhat.com>
 M:     Gerd Hoffmann <kraxel@redhat.com>
@@ -6663,29 +6721,6 @@ S:       Maintained
 F:     Documentation/devicetree/bindings/display/panel/samsung,s6d27a1.yaml
 F:     drivers/gpu/drm/panel/panel-samsung-s6d27a1.c
 
-DRM DRIVER FOR SITRONIX ST7703 PANELS
-M:     Guido Günther <agx@sigxcpu.org>
-R:     Purism Kernel Team <kernel@puri.sm>
-R:     Ondrej Jirman <megous@megous.com>
-S:     Maintained
-F:     Documentation/devicetree/bindings/display/panel/rocktech,jh057n00900.yaml
-F:     drivers/gpu/drm/panel/panel-sitronix-st7703.c
-
-DRM DRIVER FOR FIRMWARE FRAMEBUFFERS
-M:     Thomas Zimmermann <tzimmermann@suse.de>
-M:     Javier Martinez Canillas <javierm@redhat.com>
-L:     dri-devel@lists.freedesktop.org
-S:     Maintained
-T:     git git://anongit.freedesktop.org/drm/drm-misc
-F:     drivers/gpu/drm/drm_aperture.c
-F:     drivers/gpu/drm/tiny/ofdrm.c
-F:     drivers/gpu/drm/tiny/simpledrm.c
-F:     drivers/video/aperture.c
-F:     drivers/video/nomodeset.c
-F:     include/drm/drm_aperture.h
-F:     include/linux/aperture.h
-F:     include/video/nomodeset.h
-
 DRM DRIVER FOR SITRONIX ST7586 PANELS
 M:     David Lechner <david@lechnology.com>
 S:     Maintained
@@ -6699,6 +6734,14 @@ S:       Maintained
 F:     Documentation/devicetree/bindings/display/panel/sitronix,st7701.yaml
 F:     drivers/gpu/drm/panel/panel-sitronix-st7701.c
 
+DRM DRIVER FOR SITRONIX ST7703 PANELS
+M:     Guido Günther <agx@sigxcpu.org>
+R:     Purism Kernel Team <kernel@puri.sm>
+R:     Ondrej Jirman <megous@megous.com>
+S:     Maintained
+F:     Documentation/devicetree/bindings/display/panel/rocktech,jh057n00900.yaml
+F:     drivers/gpu/drm/panel/panel-sitronix-st7703.c
+
 DRM DRIVER FOR SITRONIX ST7735R PANELS
 M:     David Lechner <david@lechnology.com>
 S:     Maintained
@@ -6706,6 +6749,13 @@ T:       git git://anongit.freedesktop.org/drm/drm-misc
 F:     Documentation/devicetree/bindings/display/sitronix,st7735r.yaml
 F:     drivers/gpu/drm/tiny/st7735r.c
 
+DRM DRIVER FOR SOLOMON SSD130X OLED DISPLAYS
+M:     Javier Martinez Canillas <javierm@redhat.com>
+S:     Maintained
+T:     git git://anongit.freedesktop.org/drm/drm-misc
+F:     Documentation/devicetree/bindings/display/solomon,ssd1307fb.yaml
+F:     drivers/gpu/drm/solomon/ssd130x*
+
 DRM DRIVER FOR ST-ERICSSON MCDE
 M:     Linus Walleij <linus.walleij@linaro.org>
 S:     Maintained
@@ -6804,25 +6854,6 @@ F:       include/drm/drm*
 F:     include/linux/vga*
 F:     include/uapi/drm/drm*
 
-DRM COMPUTE ACCELERATORS DRIVERS AND FRAMEWORK
-M:     Oded Gabbay <ogabbay@kernel.org>
-L:     dri-devel@lists.freedesktop.org
-S:     Maintained
-C:     irc://irc.oftc.net/dri-devel
-T:     git https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/accel.git
-F:     Documentation/accel/
-F:     drivers/accel/
-F:     include/drm/drm_accel.h
-
-DRM ACCEL DRIVERS FOR INTEL VPU
-M:     Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
-M:     Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
-L:     dri-devel@lists.freedesktop.org
-S:     Supported
-T:     git git://anongit.freedesktop.org/drm/drm-misc
-F:     drivers/accel/ivpu/
-F:     include/uapi/drm/ivpu_accel.h
-
 DRM DRIVERS FOR ALLWINNER A10
 M:     Maxime Ripard <mripard@kernel.org>
 M:     Chen-Yu Tsai <wens@csie.org>
@@ -6926,14 +6957,6 @@ T:       git git://anongit.freedesktop.org/drm/drm-misc
 F:     Documentation/devicetree/bindings/display/hisilicon/
 F:     drivers/gpu/drm/hisilicon/
 
-DRM DRIVER FOR HYPERV SYNTHETIC VIDEO DEVICE
-M:     Deepak Rawat <drawat.floss@gmail.com>
-L:     linux-hyperv@vger.kernel.org
-L:     dri-devel@lists.freedesktop.org
-S:     Maintained
-T:     git git://anongit.freedesktop.org/drm/drm-misc
-F:     drivers/gpu/drm/hyperv
-
 DRM DRIVERS FOR LIMA
 M:     Qiang Yu <yuq825@gmail.com>
 L:     dri-devel@lists.freedesktop.org
@@ -7085,6 +7108,14 @@ T:       git git://anongit.freedesktop.org/drm/drm-misc
 F:     Documentation/devicetree/bindings/display/xlnx/
 F:     drivers/gpu/drm/xlnx/
 
+DRM GPU SCHEDULER
+M:     Luben Tuikov <luben.tuikov@amd.com>
+L:     dri-devel@lists.freedesktop.org
+S:     Maintained
+T:     git git://anongit.freedesktop.org/drm/drm-misc
+F:     drivers/gpu/drm/scheduler/
+F:     include/drm/gpu_scheduler.h
+
 DRM PANEL DRIVERS
 M:     Neil Armstrong <neil.armstrong@linaro.org>
 R:     Sam Ravnborg <sam@ravnborg.org>
@@ -7113,14 +7144,6 @@ T:       git git://anongit.freedesktop.org/drm/drm-misc
 F:     drivers/gpu/drm/ttm/
 F:     include/drm/ttm/
 
-DRM GPU SCHEDULER
-M:     Luben Tuikov <luben.tuikov@amd.com>
-L:     dri-devel@lists.freedesktop.org
-S:     Maintained
-T:     git git://anongit.freedesktop.org/drm/drm-misc
-F:     drivers/gpu/drm/scheduler/
-F:     include/drm/gpu_scheduler.h
-
 DSBR100 USB FM RADIO DRIVER
 M:     Alexey Klimov <klimov.linux@gmail.com>
 L:     linux-media@vger.kernel.org
@@ -7248,10 +7271,10 @@ F:      drivers/media/usb/dvb-usb-v2/usb_urb.c
 
 DYNAMIC DEBUG
 M:     Jason Baron <jbaron@akamai.com>
+M:     Jim Cromie <jim.cromie@gmail.com>
 S:     Maintained
 F:     include/linux/dynamic_debug.h
 F:     lib/dynamic_debug.c
-M:     Jim Cromie <jim.cromie@gmail.com>
 F:     lib/test_dynamic_debug.c
 
 DYNAMIC INTERRUPT MODERATION
@@ -7261,6 +7284,15 @@ F:       Documentation/networking/net_dim.rst
 F:     include/linux/dim.h
 F:     lib/dim/
 
+DYNAMIC THERMAL POWER MANAGEMENT (DTPM)
+M:     Daniel Lezcano <daniel.lezcano@kernel.org>
+L:     linux-pm@vger.kernel.org
+S:     Supported
+B:     https://bugzilla.kernel.org
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
+F:     drivers/powercap/dtpm*
+F:     include/linux/dtpm.h
+
 DZ DECSTATION DZ11 SERIAL DRIVER
 M:     "Maciej W. Rozycki" <macro@orcam.me.uk>
 S:     Maintained
@@ -7468,6 +7500,14 @@ L:       linux-edac@vger.kernel.org
 S:     Maintained
 F:     drivers/edac/mpc85xx_edac.[ch]
 
+EDAC-NPCM
+M:     Marvin Lin <kflin@nuvoton.com>
+M:     Stanley Chu <yschu@nuvoton.com>
+L:     linux-edac@vger.kernel.org
+S:     Maintained
+F:     Documentation/devicetree/bindings/memory-controllers/nuvoton,npcm-memory-controller.yaml
+F:     drivers/edac/npcm_edac.c
+
 EDAC-PASEMI
 M:     Egor Martovetsky <egor@pasemi.com>
 L:     linux-edac@vger.kernel.org
@@ -7599,22 +7639,22 @@ W:      http://www.broadcom.com
 F:     drivers/infiniband/hw/ocrdma/
 F:     include/uapi/rdma/ocrdma-abi.h
 
-EMULEX/BROADCOM LPFC FC/FCOE SCSI DRIVER
+EMULEX/BROADCOM EFCT FC/FCOE SCSI TARGET DRIVER
 M:     James Smart <james.smart@broadcom.com>
-M:     Dick Kennedy <dick.kennedy@broadcom.com>
+M:     Ram Vegesna <ram.vegesna@broadcom.com>
 L:     linux-scsi@vger.kernel.org
+L:     target-devel@vger.kernel.org
 S:     Supported
 W:     http://www.broadcom.com
-F:     drivers/scsi/lpfc/
+F:     drivers/scsi/elx/
 
-EMULEX/BROADCOM EFCT FC/FCOE SCSI TARGET DRIVER
+EMULEX/BROADCOM LPFC FC/FCOE SCSI DRIVER
 M:     James Smart <james.smart@broadcom.com>
-M:     Ram Vegesna <ram.vegesna@broadcom.com>
+M:     Dick Kennedy <dick.kennedy@broadcom.com>
 L:     linux-scsi@vger.kernel.org
-L:     target-devel@vger.kernel.org
 S:     Supported
 W:     http://www.broadcom.com
-F:     drivers/scsi/elx/
+F:     drivers/scsi/lpfc/
 
 ENE CB710 FLASH CARD READER DRIVER
 M:     Michał Mirosław <mirq-linux@rere.qmqm.pl>
@@ -7707,8 +7747,8 @@ F:        drivers/net/mdio/of_mdio.c
 F:     drivers/net/pcs/
 F:     drivers/net/phy/
 F:     include/dt-bindings/net/qca-ar803x.h
-F:     include/linux/linkmode.h
 F:     include/linux/*mdio*.h
+F:     include/linux/linkmode.h
 F:     include/linux/mdio/*.h
 F:     include/linux/mii.h
 F:     include/linux/of_net.h
@@ -7771,8 +7811,8 @@ M:        Mimi Zohar <zohar@linux.ibm.com>
 L:     linux-integrity@vger.kernel.org
 S:     Supported
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git
-F:     security/integrity/evm/
 F:     security/integrity/
+F:     security/integrity/evm/
 
 EXTENSIBLE FIRMWARE INTERFACE (EFI)
 M:     Ard Biesheuvel <ardb@kernel.org>
@@ -7803,8 +7843,8 @@ EXTRA BOOT CONFIG
 M:     Masami Hiramatsu <mhiramat@kernel.org>
 L:     linux-kernel@vger.kernel.org
 L:     linux-trace-kernel@vger.kernel.org
-Q:     https://patchwork.kernel.org/project/linux-trace-kernel/list/
 S:     Maintained
+Q:     https://patchwork.kernel.org/project/linux-trace-kernel/list/
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
 F:     Documentation/admin-guide/bootconfig.rst
 F:     fs/proc/bootconfig.c
@@ -8091,21 +8131,6 @@ F:       Documentation/fpga/
 F:     drivers/fpga/
 F:     include/linux/fpga/
 
-INTEL MAX10 BMC SECURE UPDATES
-M:     Russ Weight <russell.h.weight@intel.com>
-L:     linux-fpga@vger.kernel.org
-S:     Maintained
-F:     Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-sec-update
-F:     drivers/fpga/intel-m10-bmc-sec-update.c
-
-MICROCHIP POLARFIRE FPGA DRIVERS
-M:     Conor Dooley <conor.dooley@microchip.com>
-R:     Ivan Bornyakov <i.bornyakov@metrotek.ru>
-L:     linux-fpga@vger.kernel.org
-S:     Supported
-F:     Documentation/devicetree/bindings/fpga/microchip,mpf-spi-fpga-mgr.yaml
-F:     drivers/fpga/microchip-spi.c
-
 FPU EMULATOR
 M:     Bill Metzenthen <billm@melbpc.org.au>
 S:     Maintained
@@ -8114,9 +8139,9 @@ F:        arch/x86/math-emu/
 
 FRAMEBUFFER CORE
 M:     Daniel Vetter <daniel@ffwll.ch>
-F:     drivers/video/fbdev/core/
 S:     Odd Fixes
 T:     git git://anongit.freedesktop.org/drm/drm-misc
+F:     drivers/video/fbdev/core/
 
 FRAMEBUFFER LAYER
 M:     Helge Deller <deller@gmx.de>
@@ -8171,6 +8196,7 @@ F:        include/linux/spi/spi-fsl-dspi.h
 
 FREESCALE ENETC ETHERNET DRIVERS
 M:     Claudiu Manoil <claudiu.manoil@nxp.com>
+M:     Vladimir Oltean <vladimir.oltean@nxp.com>
 L:     netdev@vger.kernel.org
 S:     Maintained
 F:     drivers/net/ethernet/freescale/enetc/
@@ -8493,15 +8519,15 @@ M:      Masami Hiramatsu <mhiramat@kernel.org>
 R:     Mark Rutland <mark.rutland@arm.com>
 L:     linux-kernel@vger.kernel.org
 L:     linux-trace-kernel@vger.kernel.org
-Q:     https://patchwork.kernel.org/project/linux-trace-kernel/list/
 S:     Maintained
+Q:     https://patchwork.kernel.org/project/linux-trace-kernel/list/
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
 F:     Documentation/trace/ftrace*
-F:     kernel/trace/ftrace*
-F:     kernel/trace/fgraph.c
 F:     arch/*/*/*/*ftrace*
 F:     arch/*/*/*ftrace*
 F:     include/*/ftrace.h
+F:     kernel/trace/fgraph.c
+F:     kernel/trace/ftrace*
 F:     samples/ftrace
 
 FUNGIBLE ETHERNET DRIVERS
@@ -8542,10 +8568,10 @@ GATEWORKS SYSTEM CONTROLLER (GSC) DRIVER
 M:     Tim Harvey <tharvey@gateworks.com>
 S:     Maintained
 F:     Documentation/devicetree/bindings/mfd/gateworks-gsc.yaml
-F:     drivers/mfd/gateworks-gsc.c
-F:     include/linux/mfd/gsc.h
 F:     Documentation/hwmon/gsc-hwmon.rst
 F:     drivers/hwmon/gsc-hwmon.c
+F:     drivers/mfd/gateworks-gsc.c
+F:     include/linux/mfd/gsc.h
 F:     include/linux/platform_data/gsc_hwmon.h
 
 GCC PLUGINS
@@ -8673,8 +8699,8 @@ R:        Andy Shevchenko <andy@kernel.org>
 S:     Maintained
 F:     lib/string.c
 F:     lib/string_helpers.c
-F:     lib/test_string.c
 F:     lib/test-string_helpers.c
+F:     lib/test_string.c
 
 GENERIC UIO DRIVER FOR PCI DEVICES
 M:     "Michael S. Tsirkin" <mst@redhat.com>
@@ -8799,6 +8825,7 @@ F:        include/linux/gpio/regmap.h
 GPIO SUBSYSTEM
 M:     Linus Walleij <linus.walleij@linaro.org>
 M:     Bartosz Golaszewski <brgl@bgdev.pl>
+R:     Andy Shevchenko <andy@kernel.org>
 L:     linux-gpio@vger.kernel.org
 S:     Maintained
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux.git
@@ -9157,12 +9184,11 @@ L:      linux-input@vger.kernel.org
 S:     Maintained
 F:     drivers/hid/hid-logitech-*
 
-HID++ LOGITECH DRIVERS
-R:     Filipe Laíns <lains@riseup.net>
-R:     Bastien Nocera <hadess@hadess.net>
+HID PHOENIX RC FLIGHT CONTROLLER
+M:     Marcus Folkesson <marcus.folkesson@gmail.com>
 L:     linux-input@vger.kernel.org
 S:     Maintained
-F:     drivers/hid/hid-logitech-hidpp.c
+F:     drivers/hid/hid-pxrc.c
 
 HID PLAYSTATION DRIVER
 M:     Roderick Colenbrander <roderick.colenbrander@sony.com>
@@ -9170,12 +9196,6 @@ L:       linux-input@vger.kernel.org
 S:     Supported
 F:     drivers/hid/hid-playstation.c
 
-HID PHOENIX RC FLIGHT CONTROLLER
-M:     Marcus Folkesson <marcus.folkesson@gmail.com>
-L:     linux-input@vger.kernel.org
-S:     Maintained
-F:     drivers/hid/hid-pxrc.c
-
 HID SENSOR HUB DRIVERS
 M:     Jiri Kosina <jikos@kernel.org>
 M:     Jonathan Cameron <jic23@kernel.org>
@@ -9202,6 +9222,13 @@ S:       Maintained
 F:     drivers/hid/wacom.h
 F:     drivers/hid/wacom_*
 
+HID++ LOGITECH DRIVERS
+R:     Filipe Laíns <lains@riseup.net>
+R:     Bastien Nocera <hadess@hadess.net>
+L:     linux-input@vger.kernel.org
+S:     Maintained
+F:     drivers/hid/hid-logitech-hidpp.c
+
 HIGH-RESOLUTION TIMERS, CLOCKEVENTS
 M:     Thomas Gleixner <tglx@linutronix.de>
 L:     linux-kernel@vger.kernel.org
@@ -9226,6 +9253,12 @@ W:       http://www.highpoint-tech.com
 F:     Documentation/scsi/hptiop.rst
 F:     drivers/scsi/hptiop.c
 
+HIKEY960 ONBOARD USB GPIO HUB DRIVER
+M:     John Stultz <jstultz@google.com>
+L:     linux-kernel@vger.kernel.org
+S:     Maintained
+F:     drivers/misc/hisi_hikey_usb.c
+
 HIMAX HX83112B TOUCHSCREEN SUPPORT
 M:     Job Noorman <job@noorman.info>
 L:     linux-input@vger.kernel.org
@@ -9274,6 +9307,12 @@ F:       drivers/crypto/hisilicon/hpre/hpre.h
 F:     drivers/crypto/hisilicon/hpre/hpre_crypto.c
 F:     drivers/crypto/hisilicon/hpre/hpre_main.c
 
+HISILICON HNS3 PMU DRIVER
+M:     Guangbin Huang <huangguangbin2@huawei.com>
+S:     Supported
+F:     Documentation/admin-guide/perf/hns3-pmu.rst
+F:     drivers/perf/hisilicon/hns3_pmu.c
+
 HISILICON I2C CONTROLLER DRIVER
 M:     Yicong Yang <yangyicong@hisilicon.com>
 L:     linux-i2c@vger.kernel.org
@@ -9306,12 +9345,6 @@ W:       http://www.hisilicon.com
 F:     Documentation/devicetree/bindings/net/hisilicon*.txt
 F:     drivers/net/ethernet/hisilicon/
 
-HIKEY960 ONBOARD USB GPIO HUB DRIVER
-M:     John Stultz <jstultz@google.com>
-L:     linux-kernel@vger.kernel.org
-S:     Maintained
-F:     drivers/misc/hisi_hikey_usb.c
-
 HISILICON PMU DRIVER
 M:     Shaokun Zhang <zhangshaokun@hisilicon.com>
 M:     Jonathan Cameron <jonathan.cameron@huawei.com>
@@ -9321,12 +9354,6 @@ F:       Documentation/admin-guide/perf/hisi-pcie-pmu.rst
 F:     Documentation/admin-guide/perf/hisi-pmu.rst
 F:     drivers/perf/hisilicon
 
-HISILICON HNS3 PMU DRIVER
-M:     Guangbin Huang <huangguangbin2@huawei.com>
-S:     Supported
-F:     Documentation/admin-guide/perf/hns3-pmu.rst
-F:     drivers/perf/hisilicon/hns3_pmu.c
-
 HISILICON PTT DRIVER
 M:     Yicong Yang <yangyicong@hisilicon.com>
 M:     Jonathan Cameron <jonathan.cameron@huawei.com>
@@ -9350,17 +9377,9 @@ F:       drivers/crypto/hisilicon/qm.c
 F:     drivers/crypto/hisilicon/sgl.c
 F:     include/linux/hisi_acc_qm.h
 
-HISILICON ZIP Controller DRIVER
-M:     Yang Shen <shenyang39@huawei.com>
-M:     Zhou Wang <wangzhou1@hisilicon.com>
-L:     linux-crypto@vger.kernel.org
-S:     Maintained
-F:     Documentation/ABI/testing/debugfs-hisi-zip
-F:     drivers/crypto/hisilicon/zip/
-
 HISILICON ROCE DRIVER
 M:     Haoyue Xu <xuhaoyue1@hisilicon.com>
-M:     Wenpeng Liang <liangwenpeng@huawei.com>
+M:     Junxian Huang <huangjunxian6@hisilicon.com>
 L:     linux-rdma@vger.kernel.org
 S:     Maintained
 F:     Documentation/devicetree/bindings/infiniband/hisilicon-hns-roce.txt
@@ -9416,6 +9435,14 @@ S:       Maintained
 W:     http://www.hisilicon.com
 F:     drivers/spi/spi-hisi-sfc-v3xx.c
 
+HISILICON ZIP Controller DRIVER
+M:     Yang Shen <shenyang39@huawei.com>
+M:     Zhou Wang <wangzhou1@hisilicon.com>
+L:     linux-crypto@vger.kernel.org
+S:     Maintained
+F:     Documentation/ABI/testing/debugfs-hisi-zip
+F:     drivers/crypto/hisilicon/zip/
+
 HMM - Heterogeneous Memory Management
 M:     Jérôme Glisse <jglisse@redhat.com>
 L:     linux-mm@kvack.org
@@ -9492,9 +9519,9 @@ F:        drivers/input/touchscreen/htcpen.c
 HTE SUBSYSTEM
 M:     Dipen Patel <dipenp@nvidia.com>
 L:     timestamp@lists.linux.dev
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/pateldipen1984/linux.git
-Q:     https://patchwork.kernel.org/project/timestamp/list/
 S:     Maintained
+Q:     https://patchwork.kernel.org/project/timestamp/list/
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/pateldipen1984/linux.git
 F:     Documentation/devicetree/bindings/timestamp/
 F:     Documentation/driver-api/hte/
 F:     drivers/hte/
@@ -9589,8 +9616,8 @@ T:        git git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git
 F:     Documentation/ABI/stable/sysfs-bus-vmbus
 F:     Documentation/ABI/testing/debugfs-hyperv
 F:     Documentation/devicetree/bindings/bus/microsoft,vmbus.yaml
-F:     Documentation/virt/hyperv
 F:     Documentation/networking/device_drivers/ethernet/microsoft/netvsc.rst
+F:     Documentation/virt/hyperv
 F:     arch/arm64/hyperv
 F:     arch/arm64/include/asm/hyperv-tlfs.h
 F:     arch/arm64/include/asm/mshyperv.h
@@ -9695,8 +9722,9 @@ F:        include/uapi/linux/i2c-*.h
 F:     include/uapi/linux/i2c.h
 
 I2C SUBSYSTEM HOST DRIVERS
+M:     Andi Shyti <andi.shyti@kernel.org>
 L:     linux-i2c@vger.kernel.org
-S:     Odd Fixes
+S:     Maintained
 W:     https://i2c.wiki.kernel.org/
 Q:     https://patchwork.ozlabs.org/project/linux-i2c/list/
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git
@@ -9772,6 +9800,12 @@ L:       linux-i2c@vger.kernel.org
 S:     Maintained
 F:     drivers/i2c/i2c-stub.c
 
+I3C DRIVER FOR ASPEED AST2600
+M:     Jeremy Kerr <jk@codeconstruct.com.au>
+S:     Maintained
+F:     Documentation/devicetree/bindings/i3c/aspeed,ast2600-i3c.yaml
+F:     drivers/i3c/master/ast2600-i3c-master.c
+
 I3C DRIVER FOR CADENCE I3C MASTER IP
 M:     Przemysław Gaj <pgaj@cadence.com>
 S:     Maintained
@@ -9783,12 +9817,6 @@ S:       Orphan
 F:     Documentation/devicetree/bindings/i3c/snps,dw-i3c-master.yaml
 F:     drivers/i3c/master/dw*
 
-I3C DRIVER FOR ASPEED AST2600
-M:     Jeremy Kerr <jk@codeconstruct.com.au>
-S:     Maintained
-F:     Documentation/devicetree/bindings/i3c/aspeed,ast2600-i3c.yaml
-F:     drivers/i3c/master/ast2600-i3c-master.c
-
 I3C SUBSYSTEM
 M:     Alexandre Belloni <alexandre.belloni@bootlin.com>
 L:     linux-i3c@lists.infradead.org (moderated for non-subscribers)
@@ -9867,6 +9895,11 @@ L:       netdev@vger.kernel.org
 S:     Supported
 F:     drivers/net/ethernet/ibm/ibmvnic.*
 
+IBM Power VFIO Support
+M:     Timothy Pearson <tpearson@raptorengineering.com>
+S:     Supported
+F:     drivers/vfio/vfio_iommu_spapr_tce.c
+
 IBM Power Virtual Ethernet Device Driver
 M:     Nick Child <nnac123@linux.ibm.com>
 L:     netdev@vger.kernel.org
@@ -9912,11 +9945,6 @@ F:       drivers/crypto/vmx/ghash*
 F:     drivers/crypto/vmx/ppc-xlate.pl
 F:     drivers/crypto/vmx/vmx.c
 
-IBM Power VFIO Support
-M:     Timothy Pearson <tpearson@raptorengineering.com>
-S:     Supported
-F:     drivers/vfio/vfio_iommu_spapr_tce.c
-
 IBM ServeRAID RAID DRIVER
 S:     Orphan
 F:     drivers/scsi/ips.*
@@ -9970,8 +9998,9 @@ M:        Miquel Raynal <miquel.raynal@bootlin.com>
 L:     linux-wpan@vger.kernel.org
 S:     Maintained
 W:     https://linux-wpan.org/
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan.git
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next.git
+Q:     https://patchwork.kernel.org/project/linux-wpan/list/
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan.git
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan-next.git
 F:     Documentation/networking/ieee802154.rst
 F:     drivers/net/ieee802154/
 F:     include/linux/ieee802154.h
@@ -9984,6 +10013,10 @@ F:      include/net/nl802154.h
 F:     net/ieee802154/
 F:     net/mac802154/
 
+IFCVF VIRTIO DATA PATH ACCELERATOR
+R:     Zhu Lingshan <lingshan.zhu@intel.com>
+F:     drivers/vdpa/ifcvf/
+
 IFE PROTOCOL
 M:     Yotam Gigi <yotam.gi@gmail.com>
 M:     Jamal Hadi Salim <jhs@mojatatu.com>
@@ -10119,7 +10152,7 @@ S:      Maintained
 F:     Documentation/process/kernel-docs.rst
 
 INDUSTRY PACK SUBSYSTEM (IPACK)
-M:     Samuel Iglesias Gonsalvez <siglesias@igalia.com>
+M:     Vaibhav Gupta <vaibhavgupta40@gmail.com>
 M:     Jens Taprogge <jens.taprogge@taprogge.org>
 M:     Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 L:     industrypack-devel@lists.sourceforge.net
@@ -10248,8 +10281,8 @@ M:      Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
 L:     linux-integrity@vger.kernel.org
 S:     Supported
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git
-F:     security/integrity/ima/
 F:     security/integrity/
+F:     security/integrity/ima/
 
 INTEL 810/815 FRAMEBUFFER DRIVER
 M:     Antonino Daplas <adaplas@gmail.com>
@@ -10403,14 +10436,6 @@ S:     Supported
 Q:     https://patchwork.kernel.org/project/linux-dmaengine/list/
 F:     drivers/dma/ioat*
 
-INTEL IDXD DRIVER
-M:     Fenghua Yu <fenghua.yu@intel.com>
-M:     Dave Jiang <dave.jiang@intel.com>
-L:     dmaengine@vger.kernel.org
-S:     Supported
-F:     drivers/dma/idxd/*
-F:     include/uapi/linux/idxd.h
-
 INTEL IDLE DRIVER
 M:     Jacob Pan <jacob.jun.pan@linux.intel.com>
 M:     Len Brown <lenb@kernel.org>
@@ -10420,6 +10445,14 @@ B:     https://bugzilla.kernel.org
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux.git
 F:     drivers/idle/intel_idle.c
 
+INTEL IDXD DRIVER
+M:     Fenghua Yu <fenghua.yu@intel.com>
+M:     Dave Jiang <dave.jiang@intel.com>
+L:     dmaengine@vger.kernel.org
+S:     Supported
+F:     drivers/dma/idxd/*
+F:     include/uapi/linux/idxd.h
+
 INTEL IN FIELD SCAN (IFS) DEVICE
 M:     Jithu Joseph <jithu.joseph@intel.com>
 R:     Ashok Raj <ashok.raj@intel.com>
@@ -10466,18 +10499,18 @@ F:    Documentation/admin-guide/media/ipu3_rcb.svg
 F:     Documentation/userspace-api/media/v4l/pixfmt-meta-intel-ipu3.rst
 F:     drivers/staging/media/ipu3/
 
-INTEL IXP4XX CRYPTO SUPPORT
-M:     Corentin Labbe <clabbe@baylibre.com>
-L:     linux-crypto@vger.kernel.org
-S:     Maintained
-F:     drivers/crypto/intel/ixp4xx/ixp4xx_crypto.c
-
 INTEL ISHTP ECLITE DRIVER
 M:     Sumesh K Naduvalath <sumesh.k.naduvalath@intel.com>
 L:     platform-driver-x86@vger.kernel.org
 S:     Supported
 F:     drivers/platform/x86/intel/ishtp_eclite.c
 
+INTEL IXP4XX CRYPTO SUPPORT
+M:     Corentin Labbe <clabbe@baylibre.com>
+L:     linux-crypto@vger.kernel.org
+S:     Maintained
+F:     drivers/crypto/intel/ixp4xx/ixp4xx_crypto.c
+
 INTEL IXP4XX QMGR, NPE, ETHERNET and HSS SUPPORT
 M:     Krzysztof Halasa <khalasa@piap.pl>
 S:     Maintained
@@ -10556,6 +10589,13 @@ F:     drivers/hwmon/intel-m10-bmc-hwmon.c
 F:     drivers/mfd/intel-m10-bmc*
 F:     include/linux/mfd/intel-m10-bmc.h
 
+INTEL MAX10 BMC SECURE UPDATES
+M:     Russ Weight <russell.h.weight@intel.com>
+L:     linux-fpga@vger.kernel.org
+S:     Maintained
+F:     Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-sec-update
+F:     drivers/fpga/intel-m10-bmc-sec-update.c
+
 INTEL P-Unit IPC DRIVER
 M:     Zha Qipeng <qipeng.zha@intel.com>
 L:     platform-driver-x86@vger.kernel.org
@@ -10603,6 +10643,13 @@ L:     linux-pm@vger.kernel.org
 S:     Supported
 F:     drivers/cpufreq/intel_pstate.c
 
+INTEL PTP DFL ToD DRIVER
+M:     Tianfei Zhang <tianfei.zhang@intel.com>
+L:     linux-fpga@vger.kernel.org
+L:     netdev@vger.kernel.org
+S:     Maintained
+F:     drivers/ptp/ptp_dfl_tod.c
+
 INTEL QUADRATURE ENCODER PERIPHERAL DRIVER
 M:     Jarkko Nikula <jarkko.nikula@linux.intel.com>
 L:     linux-iio@vger.kernel.org
@@ -10621,6 +10668,21 @@ F:     drivers/platform/x86/intel/sdsi.c
 F:     tools/arch/x86/intel_sdsi/
 F:     tools/testing/selftests/drivers/sdsi/
 
+INTEL SGX
+M:     Jarkko Sakkinen <jarkko@kernel.org>
+R:     Dave Hansen <dave.hansen@linux.intel.com>
+L:     linux-sgx@vger.kernel.org
+S:     Supported
+Q:     https://patchwork.kernel.org/project/intel-sgx/list/
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/sgx
+F:     Documentation/arch/x86/sgx.rst
+F:     arch/x86/entry/vdso/vsgx.S
+F:     arch/x86/include/asm/sgx.h
+F:     arch/x86/include/uapi/asm/sgx.h
+F:     arch/x86/kernel/cpu/sgx/*
+F:     tools/testing/selftests/sgx/*
+K:     \bSGX_
+
 INTEL SKYLAKE INT3472 ACPI DEVICE DRIVER
 M:     Daniel Scally <djrscally@gmail.com>
 S:     Maintained
@@ -10638,13 +10700,13 @@ INTEL STRATIX10 FIRMWARE DRIVERS
 M:     Dinh Nguyen <dinguyen@kernel.org>
 L:     linux-kernel@vger.kernel.org
 S:     Maintained
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux.git
 F:     Documentation/ABI/testing/sysfs-devices-platform-stratix10-rsu
 F:     Documentation/devicetree/bindings/firmware/intel,stratix10-svc.txt
 F:     drivers/firmware/stratix10-rsu.c
 F:     drivers/firmware/stratix10-svc.c
 F:     include/linux/firmware/intel/stratix10-smc.h
 F:     include/linux/firmware/intel/stratix10-svc-client.h
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux.git
 
 INTEL TELEMETRY DRIVER
 M:     Rajneesh Bhardwaj <irenic.rajneesh@gmail.com>
@@ -10729,21 +10791,6 @@ F:     Documentation/arch/x86/intel_txt.rst
 F:     arch/x86/kernel/tboot.c
 F:     include/linux/tboot.h
 
-INTEL SGX
-M:     Jarkko Sakkinen <jarkko@kernel.org>
-R:     Dave Hansen <dave.hansen@linux.intel.com>
-L:     linux-sgx@vger.kernel.org
-S:     Supported
-Q:     https://patchwork.kernel.org/project/intel-sgx/list/
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/sgx
-F:     Documentation/arch/x86/sgx.rst
-F:     arch/x86/entry/vdso/vsgx.S
-F:     arch/x86/include/asm/sgx.h
-F:     arch/x86/include/uapi/asm/sgx.h
-F:     arch/x86/kernel/cpu/sgx/*
-F:     tools/testing/selftests/sgx/*
-K:     \bSGX_
-
 INTERCONNECT API
 M:     Georgi Djakov <djakov@kernel.org>
 L:     linux-pm@vger.kernel.org
@@ -10812,18 +10859,6 @@ F:     drivers/iommu/dma-iommu.h
 F:     drivers/iommu/iova.c
 F:     include/linux/iova.h
 
-IOMMUFD
-M:     Jason Gunthorpe <jgg@nvidia.com>
-M:     Kevin Tian <kevin.tian@intel.com>
-L:     iommu@lists.linux.dev
-S:     Maintained
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git
-F:     Documentation/userspace-api/iommufd.rst
-F:     drivers/iommu/iommufd/
-F:     include/linux/iommufd.h
-F:     include/uapi/linux/iommufd.h
-F:     tools/testing/selftests/iommu/
-
 IOMMU SUBSYSTEM
 M:     Joerg Roedel <joro@8bytes.org>
 M:     Will Deacon <will@kernel.org>
@@ -10839,6 +10874,18 @@ F:     include/linux/iova.h
 F:     include/linux/of_iommu.h
 F:     include/uapi/linux/iommu.h
 
+IOMMUFD
+M:     Jason Gunthorpe <jgg@nvidia.com>
+M:     Kevin Tian <kevin.tian@intel.com>
+L:     iommu@lists.linux.dev
+S:     Maintained
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git
+F:     Documentation/userspace-api/iommufd.rst
+F:     drivers/iommu/iommufd/
+F:     include/linux/iommufd.h
+F:     include/uapi/linux/iommufd.h
+F:     tools/testing/selftests/iommu/
+
 IOSYS-MAP HELPERS
 M:     Thomas Zimmermann <tzimmermann@suse.de>
 L:     dri-devel@lists.freedesktop.org
@@ -10853,11 +10900,11 @@ L:    io-uring@vger.kernel.org
 S:     Maintained
 T:     git git://git.kernel.dk/linux-block
 T:     git git://git.kernel.dk/liburing
-F:     io_uring/
 F:     include/linux/io_uring.h
 F:     include/linux/io_uring_types.h
 F:     include/trace/events/io_uring.h
 F:     include/uapi/linux/io_uring.h
+F:     io_uring/
 F:     tools/io_uring/
 
 IPMI SUBSYSTEM
@@ -10866,8 +10913,8 @@ L:      openipmi-developer@lists.sourceforge.net (moderated for non-subscribers)
 S:     Supported
 W:     http://openipmi.sourceforge.net/
 T:     git https://github.com/cminyard/linux-ipmi.git for-next
-F:     Documentation/driver-api/ipmi.rst
 F:     Documentation/devicetree/bindings/ipmi/
+F:     Documentation/driver-api/ipmi.rst
 F:     drivers/char/ipmi/
 F:     include/linux/ipmi*
 F:     include/uapi/linux/ipmi*
@@ -10919,8 +10966,8 @@ M:      Thomas Gleixner <tglx@linutronix.de>
 L:     linux-kernel@vger.kernel.org
 S:     Maintained
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/core
-F:     kernel/irq/
 F:     include/linux/group_cpus.h
+F:     kernel/irq/
 F:     lib/group_cpus.c
 
 IRQCHIP DRIVERS
@@ -11254,10 +11301,15 @@ W:    http://kernelnewbies.org/KernelJanitors
 KERNEL NFSD, SUNRPC, AND LOCKD SERVERS
 M:     Chuck Lever <chuck.lever@oracle.com>
 M:     Jeff Layton <jlayton@kernel.org>
+R:     Neil Brown <neilb@suse.de>
+R:     Olga Kornievskaia <kolga@netapp.com>
+R:     Dai Ngo <Dai.Ngo@oracle.com>
+R:     Tom Talpey <tom@talpey.com>
 L:     linux-nfs@vger.kernel.org
 S:     Supported
 W:     http://nfs.sourceforge.net/
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
+F:     Documentation/filesystems/nfs/
 F:     fs/exportfs/
 F:     fs/lockd/
 F:     fs/nfs_common/
@@ -11273,7 +11325,6 @@ F:      include/trace/misc/sunrpc.h
 F:     include/uapi/linux/nfsd/
 F:     include/uapi/linux/sunrpc/
 F:     net/sunrpc/
-F:     Documentation/filesystems/nfs/
 
 KERNEL REGRESSIONS
 M:     Thorsten Leemhuis <linux@leemhuis.info>
@@ -11300,9 +11351,9 @@ R:      Tom Talpey <tom@talpey.com>
 L:     linux-cifs@vger.kernel.org
 S:     Maintained
 T:     git git://git.samba.org/ksmbd.git
-F:     Documentation/filesystems/cifs/ksmbd.rst
-F:     fs/ksmbd/
-F:     fs/smbfs_common/
+F:     Documentation/filesystems/smb/ksmbd.rst
+F:     fs/smb/common/
+F:     fs/smb/server/
 
 KERNEL UNIT TESTING FRAMEWORK (KUnit)
 M:     Brendan Higgins <brendanhiggins@google.com>
@@ -11311,6 +11362,8 @@ L:      linux-kselftest@vger.kernel.org
 L:     kunit-dev@googlegroups.com
 S:     Maintained
 W:     https://google.github.io/kunit-docs/third_party/kernel/docs/
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git kunit
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git kunit-fixes
 F:     Documentation/dev-tools/kunit/
 F:     include/kunit/
 F:     lib/kunit/
@@ -11398,73 +11451,32 @@ L:    kvm@vger.kernel.org
 S:     Supported
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git
 F:     Documentation/virt/kvm/s390*
-F:     arch/s390/include/asm/gmap.h
-F:     arch/s390/include/asm/kvm*
-F:     arch/s390/include/uapi/asm/kvm*
-F:     arch/s390/include/uapi/asm/uvdevice.h
-F:     arch/s390/kernel/uv.c
-F:     arch/s390/kvm/
-F:     arch/s390/mm/gmap.c
-F:     drivers/s390/char/uvdevice.c
-F:     tools/testing/selftests/drivers/s390x/uvdevice/
-F:     tools/testing/selftests/kvm/*/s390x/
-F:     tools/testing/selftests/kvm/s390x/
-
-KERNEL VIRTUAL MACHINE FOR X86 (KVM/x86)
-M:     Sean Christopherson <seanjc@google.com>
-M:     Paolo Bonzini <pbonzini@redhat.com>
-L:     kvm@vger.kernel.org
-S:     Supported
-T:     git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
-F:     arch/x86/include/asm/kvm*
-F:     arch/x86/include/asm/svm.h
-F:     arch/x86/include/asm/vmx*.h
-F:     arch/x86/include/uapi/asm/kvm*
-F:     arch/x86/include/uapi/asm/svm.h
-F:     arch/x86/include/uapi/asm/vmx.h
-F:     arch/x86/kvm/
-F:     arch/x86/kvm/*/
-
-KVM PARAVIRT (KVM/paravirt)
-M:     Paolo Bonzini <pbonzini@redhat.com>
-R:     Wanpeng Li <wanpengli@tencent.com>
-R:     Vitaly Kuznetsov <vkuznets@redhat.com>
-L:     kvm@vger.kernel.org
-S:     Supported
-T:     git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
-F:     arch/x86/kernel/kvm.c
-F:     arch/x86/kernel/kvmclock.c
-F:     arch/x86/include/asm/pvclock-abi.h
-F:     include/linux/kvm_para.h
-F:     include/uapi/linux/kvm_para.h
-F:     include/uapi/asm-generic/kvm_para.h
-F:     include/asm-generic/kvm_para.h
-F:     arch/um/include/asm/kvm_para.h
-F:     arch/x86/include/asm/kvm_para.h
-F:     arch/x86/include/uapi/asm/kvm_para.h
-
-KVM X86 HYPER-V (KVM/hyper-v)
-M:     Vitaly Kuznetsov <vkuznets@redhat.com>
-M:     Sean Christopherson <seanjc@google.com>
-M:     Paolo Bonzini <pbonzini@redhat.com>
-L:     kvm@vger.kernel.org
-S:     Supported
-T:     git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
-F:     arch/x86/kvm/hyperv.*
-F:     arch/x86/kvm/kvm_onhyperv.*
-F:     arch/x86/kvm/svm/hyperv.*
-F:     arch/x86/kvm/svm/svm_onhyperv.*
-F:     arch/x86/kvm/vmx/hyperv.*
+F:     arch/s390/include/asm/gmap.h
+F:     arch/s390/include/asm/kvm*
+F:     arch/s390/include/uapi/asm/kvm*
+F:     arch/s390/include/uapi/asm/uvdevice.h
+F:     arch/s390/kernel/uv.c
+F:     arch/s390/kvm/
+F:     arch/s390/mm/gmap.c
+F:     drivers/s390/char/uvdevice.c
+F:     tools/testing/selftests/drivers/s390x/uvdevice/
+F:     tools/testing/selftests/kvm/*/s390x/
+F:     tools/testing/selftests/kvm/s390x/
 
-KVM X86 Xen (KVM/Xen)
-M:     David Woodhouse <dwmw2@infradead.org>
-M:     Paul Durrant <paul@xen.org>
+KERNEL VIRTUAL MACHINE FOR X86 (KVM/x86)
 M:     Sean Christopherson <seanjc@google.com>
 M:     Paolo Bonzini <pbonzini@redhat.com>
 L:     kvm@vger.kernel.org
 S:     Supported
 T:     git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
-F:     arch/x86/kvm/xen.*
+F:     arch/x86/include/asm/kvm*
+F:     arch/x86/include/asm/svm.h
+F:     arch/x86/include/asm/vmx*.h
+F:     arch/x86/include/uapi/asm/kvm*
+F:     arch/x86/include/uapi/asm/svm.h
+F:     arch/x86/include/uapi/asm/vmx.h
+F:     arch/x86/kvm/
+F:     arch/x86/kvm/*/
 
 KERNFS
 M:     Greg Kroah-Hartman <gregkh@linuxfoundation.org>
@@ -11504,14 +11516,6 @@ F:     include/keys/trusted-type.h
 F:     include/keys/trusted_tpm.h
 F:     security/keys/trusted-keys/
 
-KEYS-TRUSTED-TEE
-M:     Sumit Garg <sumit.garg@linaro.org>
-L:     linux-integrity@vger.kernel.org
-L:     keyrings@vger.kernel.org
-S:     Supported
-F:     include/keys/trusted_tee.h
-F:     security/keys/trusted-keys/trusted_tee.c
-
 KEYS-TRUSTED-CAAM
 M:     Ahmad Fatoum <a.fatoum@pengutronix.de>
 R:     Pengutronix Kernel Team <kernel@pengutronix.de>
@@ -11521,6 +11525,14 @@ S:     Maintained
 F:     include/keys/trusted_caam.h
 F:     security/keys/trusted-keys/trusted_caam.c
 
+KEYS-TRUSTED-TEE
+M:     Sumit Garg <sumit.garg@linaro.org>
+L:     linux-integrity@vger.kernel.org
+L:     keyrings@vger.kernel.org
+S:     Supported
+F:     include/keys/trusted_tee.h
+F:     security/keys/trusted-keys/trusted_tee.c
+
 KEYS/KEYRINGS
 M:     David Howells <dhowells@redhat.com>
 M:     Jarkko Sakkinen <jarkko@kernel.org>
@@ -11583,8 +11595,8 @@ L:      linux-amlogic@lists.infradead.org
 S:     Maintained
 F:     Documentation/devicetree/bindings/mfd/khadas,mcu.yaml
 F:     drivers/mfd/khadas-mcu.c
-F:     include/linux/mfd/khadas-mcu.h
 F:     drivers/thermal/khadas_mcu_fan.c
+F:     include/linux/mfd/khadas-mcu.h
 
 KIONIX/ROHM KX022A ACCELEROMETER
 M:     Matti Vaittinen <mazziesaccount@gmail.com>
@@ -11621,8 +11633,8 @@ M:      "David S. Miller" <davem@davemloft.net>
 M:     Masami Hiramatsu <mhiramat@kernel.org>
 L:     linux-kernel@vger.kernel.org
 L:     linux-trace-kernel@vger.kernel.org
-Q:     https://patchwork.kernel.org/project/linux-trace-kernel/list/
 S:     Maintained
+Q:     https://patchwork.kernel.org/project/linux-trace-kernel/list/
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
 F:     Documentation/trace/kprobes.rst
 F:     include/asm-generic/kprobes.h
@@ -11656,6 +11668,47 @@ S:     Maintained
 F:     Documentation/devicetree/bindings/leds/backlight/kinetic,ktz8866.yaml
 F:     drivers/video/backlight/ktz8866.c
 
+KVM PARAVIRT (KVM/paravirt)
+M:     Paolo Bonzini <pbonzini@redhat.com>
+R:     Wanpeng Li <wanpengli@tencent.com>
+R:     Vitaly Kuznetsov <vkuznets@redhat.com>
+L:     kvm@vger.kernel.org
+S:     Supported
+T:     git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
+F:     arch/um/include/asm/kvm_para.h
+F:     arch/x86/include/asm/kvm_para.h
+F:     arch/x86/include/asm/pvclock-abi.h
+F:     arch/x86/include/uapi/asm/kvm_para.h
+F:     arch/x86/kernel/kvm.c
+F:     arch/x86/kernel/kvmclock.c
+F:     include/asm-generic/kvm_para.h
+F:     include/linux/kvm_para.h
+F:     include/uapi/asm-generic/kvm_para.h
+F:     include/uapi/linux/kvm_para.h
+
+KVM X86 HYPER-V (KVM/hyper-v)
+M:     Vitaly Kuznetsov <vkuznets@redhat.com>
+M:     Sean Christopherson <seanjc@google.com>
+M:     Paolo Bonzini <pbonzini@redhat.com>
+L:     kvm@vger.kernel.org
+S:     Supported
+T:     git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
+F:     arch/x86/kvm/hyperv.*
+F:     arch/x86/kvm/kvm_onhyperv.*
+F:     arch/x86/kvm/svm/hyperv.*
+F:     arch/x86/kvm/svm/svm_onhyperv.*
+F:     arch/x86/kvm/vmx/hyperv.*
+
+KVM X86 Xen (KVM/Xen)
+M:     David Woodhouse <dwmw2@infradead.org>
+M:     Paul Durrant <paul@xen.org>
+M:     Sean Christopherson <seanjc@google.com>
+M:     Paolo Bonzini <pbonzini@redhat.com>
+L:     kvm@vger.kernel.org
+S:     Supported
+T:     git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
+F:     arch/x86/kvm/xen.*
+
 L3MDEV
 M:     David Ahern <dsahern@kernel.org>
 L:     netdev@vger.kernel.org
@@ -11897,9 +11950,9 @@ F:      scripts/spdxexclude
 LINEAR RANGES HELPERS
 M:     Mark Brown <broonie@kernel.org>
 R:     Matti Vaittinen <mazziesaccount@gmail.com>
+F:     include/linux/linear_range.h
 F:     lib/linear_ranges.c
 F:     lib/test_linear_ranges.c
-F:     include/linux/linear_range.h
 
 LINUX FOR POWER MACINTOSH
 M:     Benjamin Herrenschmidt <benh@kernel.crashing.org>
@@ -12026,11 +12079,11 @@ M:    Joel Stanley <joel@jms.id.au>
 S:     Maintained
 F:     Documentation/devicetree/bindings/*/litex,*.yaml
 F:     arch/openrisc/boot/dts/or1klitex.dts
-F:     include/linux/litex.h
-F:     drivers/tty/serial/liteuart.c
-F:     drivers/soc/litex/*
-F:     drivers/net/ethernet/litex/*
 F:     drivers/mmc/host/litex_mmc.c
+F:     drivers/net/ethernet/litex/*
+F:     drivers/soc/litex/*
+F:     drivers/tty/serial/liteuart.c
+F:     include/linux/litex.h
 N:     litex
 
 LIVE PATCHING
@@ -12159,10 +12212,17 @@ R:    WANG Xuerui <kernel@xen0n.name>
 L:     loongarch@lists.linux.dev
 S:     Maintained
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git
-F:     arch/loongarch/
-F:     drivers/*/*loongarch*
 F:     Documentation/loongarch/
 F:     Documentation/translations/zh_CN/loongarch/
+F:     arch/loongarch/
+F:     drivers/*/*loongarch*
+
+LOONGSON GPIO DRIVER
+M:     Yinbo Zhu <zhuyinbo@loongson.cn>
+L:     linux-gpio@vger.kernel.org
+S:     Maintained
+F:     Documentation/devicetree/bindings/gpio/loongson,ls-gpio.yaml
+F:     drivers/gpio/gpio-loongson-64bit.c
 
 LOONGSON LS2X I2C DRIVER
 M:     Binbin Zhou <zhoubinbin@loongson.cn>
@@ -12171,6 +12231,14 @@ S:     Maintained
 F:     Documentation/devicetree/bindings/i2c/loongson,ls2x-i2c.yaml
 F:     drivers/i2c/busses/i2c-ls2x.c
 
+LOONGSON-2 SOC SERIES CLOCK DRIVER
+M:     Yinbo Zhu <zhuyinbo@loongson.cn>
+L:     linux-clk@vger.kernel.org
+S:     Maintained
+F:     Documentation/devicetree/bindings/clock/loongson,ls2k-clk.yaml
+F:     drivers/clk/clk-loongson2.c
+F:     include/dt-bindings/clock/loongson,ls2k-clk.h
+
 LOONGSON-2 SOC SERIES GUTS DRIVER
 M:     Yinbo Zhu <zhuyinbo@loongson.cn>
 L:     loongarch@lists.linux.dev
@@ -12186,21 +12254,6 @@ S:     Maintained
 F:     Documentation/devicetree/bindings/pinctrl/loongson,ls2k-pinctrl.yaml
 F:     drivers/pinctrl/pinctrl-loongson2.c
 
-LOONGSON GPIO DRIVER
-M:     Yinbo Zhu <zhuyinbo@loongson.cn>
-L:     linux-gpio@vger.kernel.org
-S:     Maintained
-F:     Documentation/devicetree/bindings/gpio/loongson,ls-gpio.yaml
-F:     drivers/gpio/gpio-loongson-64bit.c
-
-LOONGSON-2 SOC SERIES CLOCK DRIVER
-M:     Yinbo Zhu <zhuyinbo@loongson.cn>
-L:     linux-clk@vger.kernel.org
-S:     Maintained
-F:     Documentation/devicetree/bindings/clock/loongson,ls2k-clk.yaml
-F:     drivers/clk/clk-loongson2.c
-F:     include/dt-bindings/clock/loongson,ls2k-clk.h
-
 LSILOGIC MPT FUSION DRIVERS (FC/SAS/SPI)
 M:     Sathya Prakash <sathya.prakash@broadcom.com>
 M:     Sreekanth Reddy <sreekanth.reddy@broadcom.com>
@@ -12361,20 +12414,26 @@ MAILBOX API
 M:     Jassi Brar <jassisinghbrar@gmail.com>
 L:     linux-kernel@vger.kernel.org
 S:     Maintained
+F:     Documentation/devicetree/bindings/mailbox/
 F:     drivers/mailbox/
+F:     include/dt-bindings/mailbox/
 F:     include/linux/mailbox_client.h
 F:     include/linux/mailbox_controller.h
-F:     include/dt-bindings/mailbox/
-F:     Documentation/devicetree/bindings/mailbox/
 
 MAILBOX ARM MHUv2
 M:     Viresh Kumar <viresh.kumar@linaro.org>
 M:     Tushar Khandelwal <Tushar.Khandelwal@arm.com>
 L:     linux-kernel@vger.kernel.org
 S:     Maintained
+F:     Documentation/devicetree/bindings/mailbox/arm,mhuv2.yaml
 F:     drivers/mailbox/arm_mhuv2.c
 F:     include/linux/mailbox/arm_mhuv2_message.h
-F:     Documentation/devicetree/bindings/mailbox/arm,mhuv2.yaml
+
+MAN-PAGES: MANUAL PAGES FOR LINUX -- Sections 2, 3, 4, 5, and 7
+M:     Michael Kerrisk <mtk.manpages@gmail.com>
+L:     linux-man@vger.kernel.org
+S:     Maintained
+W:     http://www.kernel.org/doc/man-pages
 
 MANAGEMENT COMPONENT TRANSPORT PROTOCOL (MCTP)
 M:     Jeremy Kerr <jk@codeconstruct.com.au>
@@ -12388,12 +12447,6 @@ F:     include/net/mctpdevice.h
 F:     include/net/netns/mctp.h
 F:     net/mctp/
 
-MAN-PAGES: MANUAL PAGES FOR LINUX -- Sections 2, 3, 4, 5, and 7
-M:     Michael Kerrisk <mtk.manpages@gmail.com>
-L:     linux-man@vger.kernel.org
-S:     Maintained
-W:     http://www.kernel.org/doc/man-pages
-
 MAPLE TREE
 M:     Liam R. Howlett <Liam.Howlett@oracle.com>
 L:     linux-mm@kvack.org
@@ -12425,8 +12478,8 @@ F:      include/linux/platform_data/mv88e6xxx.h
 MARVELL ARMADA 3700 PHY DRIVERS
 M:     Miquel Raynal <miquel.raynal@bootlin.com>
 S:     Maintained
-F:     Documentation/devicetree/bindings/phy/phy-mvebu-comphy.txt
 F:     Documentation/devicetree/bindings/phy/marvell,armada-3700-utmi-phy.yaml
+F:     Documentation/devicetree/bindings/phy/phy-mvebu-comphy.txt
 F:     drivers/phy/marvell/phy-mvebu-a3700-comphy.c
 F:     drivers/phy/marvell/phy-mvebu-a3700-utmi.c
 
@@ -12528,6 +12581,13 @@ S:     Maintained
 F:     Documentation/devicetree/bindings/mtd/marvell-nand.txt
 F:     drivers/mtd/nand/raw/marvell_nand.c
 
+MARVELL OCTEON ENDPOINT DRIVER
+M:     Veerasenareddy Burru <vburru@marvell.com>
+M:     Abhijit Ayarekar <aayarekar@marvell.com>
+L:     netdev@vger.kernel.org
+S:     Supported
+F:     drivers/net/ethernet/marvell/octeon_ep
+
 MARVELL OCTEONTX2 PHYSICAL FUNCTION DRIVER
 M:     Sunil Goutham <sgoutham@marvell.com>
 M:     Geetha sowjanya <gakula@marvell.com>
@@ -12575,13 +12635,6 @@ S:     Supported
 F:     Documentation/devicetree/bindings/mmc/marvell,xenon-sdhci.yaml
 F:     drivers/mmc/host/sdhci-xenon*
 
-MARVELL OCTEON ENDPOINT DRIVER
-M:     Veerasenareddy Burru <vburru@marvell.com>
-M:     Abhijit Ayarekar <aayarekar@marvell.com>
-L:     netdev@vger.kernel.org
-S:     Supported
-F:     drivers/net/ethernet/marvell/octeon_ep
-
 MATROX FRAMEBUFFER DRIVER
 L:     linux-fbdev@vger.kernel.org
 S:     Orphan
@@ -12781,12 +12834,6 @@ L:     netdev@vger.kernel.org
 S:     Supported
 F:     drivers/net/phy/mxl-gpy.c
 
-MCBA MICROCHIP CAN BUS ANALYZER TOOL DRIVER
-R:     Yasushi SHOJI <yashi@spacecubics.com>
-L:     linux-can@vger.kernel.org
-S:     Maintained
-F:     drivers/net/can/usb/mcba_usb.c
-
 MCAN MMIO DEVICE DRIVER
 M:     Chandrasekar Ramakrishnan <rcsekar@samsung.com>
 L:     linux-can@vger.kernel.org
@@ -12796,6 +12843,12 @@ F:     drivers/net/can/m_can/m_can.c
 F:     drivers/net/can/m_can/m_can.h
 F:     drivers/net/can/m_can/m_can_platform.c
 
+MCBA MICROCHIP CAN BUS ANALYZER TOOL DRIVER
+R:     Yasushi SHOJI <yashi@spacecubics.com>
+L:     linux-can@vger.kernel.org
+S:     Maintained
+F:     drivers/net/can/usb/mcba_usb.c
+
 MCP2221A MICROCHIP USB-HID TO I2C BRIDGE DRIVER
 M:     Rishi Gupta <gupt21@gmail.com>
 L:     linux-i2c@vger.kernel.org
@@ -13204,13 +13257,6 @@ S:     Maintained
 F:     Documentation/devicetree/bindings/clock/mediatek,mt7621-sysc.yaml
 F:     drivers/clk/ralink/clk-mt7621.c
 
-MEDIATEK MT7621/28/88 I2C DRIVER
-M:     Stefan Roese <sr@denx.de>
-L:     linux-i2c@vger.kernel.org
-S:     Maintained
-F:     Documentation/devicetree/bindings/i2c/mediatek,mt7621-i2c.yaml
-F:     drivers/i2c/busses/i2c-mt7621.c
-
 MEDIATEK MT7621 PCIE CONTROLLER DRIVER
 M:     Sergio Paracuellos <sergio.paracuellos@gmail.com>
 S:     Maintained
@@ -13223,6 +13269,13 @@ S:     Maintained
 F:     Documentation/devicetree/bindings/phy/mediatek,mt7621-pci-phy.yaml
 F:     drivers/phy/ralink/phy-mt7621-pci.c
 
+MEDIATEK MT7621/28/88 I2C DRIVER
+M:     Stefan Roese <sr@denx.de>
+L:     linux-i2c@vger.kernel.org
+S:     Maintained
+F:     Documentation/devicetree/bindings/i2c/mediatek,mt7621-i2c.yaml
+F:     drivers/i2c/busses/i2c-mt7621.c
+
 MEDIATEK NAND CONTROLLER DRIVER
 L:     linux-mtd@lists.infradead.org
 S:     Orphan
@@ -13249,10 +13302,11 @@ F:    drivers/memory/mtk-smi.c
 F:     include/soc/mediatek/smi.h
 
 MEDIATEK SWITCH DRIVER
-M:     Sean Wang <sean.wang@mediatek.com>
+M:     Arınç ÜNAL <arinc.unal@arinc9.com>
+M:     Daniel Golle <daniel@makrotopia.org>
 M:     Landen Chao <Landen.Chao@mediatek.com>
 M:     DENG Qingfang <dqfext@gmail.com>
-M:     Daniel Golle <daniel@makrotopia.org>
+M:     Sean Wang <sean.wang@mediatek.com>
 L:     netdev@vger.kernel.org
 S:     Maintained
 F:     drivers/net/dsa/mt7530-mdio.c
@@ -13482,10 +13536,22 @@ MEMORY FREQUENCY SCALING DRIVERS FOR NVIDIA TEGRA
 M:     Dmitry Osipenko <digetx@gmail.com>
 L:     linux-pm@vger.kernel.org
 L:     linux-tegra@vger.kernel.org
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git
 S:     Maintained
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git
 F:     drivers/devfreq/tegra30-devfreq.c
 
+MEMORY HOT(UN)PLUG
+M:     David Hildenbrand <david@redhat.com>
+M:     Oscar Salvador <osalvador@suse.de>
+L:     linux-mm@kvack.org
+S:     Maintained
+F:     Documentation/admin-guide/mm/memory-hotplug.rst
+F:     Documentation/core-api/memory-hotplug.rst
+F:     drivers/base/memory.c
+F:     include/linux/memory_hotplug.h
+F:     mm/memory_hotplug.c
+F:     tools/testing/selftests/memory-hotplug/
+
 MEMORY MANAGEMENT
 M:     Andrew Morton <akpm@linux-foundation.org>
 L:     linux-mm@kvack.org
@@ -13504,30 +13570,6 @@ F:     mm/
 F:     tools/mm/
 F:     tools/testing/selftests/mm/
 
-VMALLOC
-M:     Andrew Morton <akpm@linux-foundation.org>
-R:     Uladzislau Rezki <urezki@gmail.com>
-R:     Christoph Hellwig <hch@infradead.org>
-R:     Lorenzo Stoakes <lstoakes@gmail.com>
-L:     linux-mm@kvack.org
-S:     Maintained
-W:     http://www.linux-mm.org
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
-F:     include/linux/vmalloc.h
-F:     mm/vmalloc.c
-
-MEMORY HOT(UN)PLUG
-M:     David Hildenbrand <david@redhat.com>
-M:     Oscar Salvador <osalvador@suse.de>
-L:     linux-mm@kvack.org
-S:     Maintained
-F:     Documentation/admin-guide/mm/memory-hotplug.rst
-F:     Documentation/core-api/memory-hotplug.rst
-F:     drivers/base/memory.c
-F:     include/linux/memory_hotplug.h
-F:     mm/memory_hotplug.c
-F:     tools/testing/selftests/memory-hotplug/
-
 MEMORY TECHNOLOGY DEVICES (MTD)
 M:     Miquel Raynal <miquel.raynal@bootlin.com>
 M:     Richard Weinberger <richard@nod.at>
@@ -13638,6 +13680,12 @@ W:     http://www.monstr.eu/fdt/
 T:     git git://git.monstr.eu/linux-2.6-microblaze.git
 F:     arch/microblaze/
 
+MICROBLAZE TMR INJECT
+M:     Appana Durga Kedareswara rao <appana.durga.kedareswara.rao@amd.com>
+S:     Supported
+F:     Documentation/devicetree/bindings/misc/xlnx,tmr-inject.yaml
+F:     drivers/misc/xilinx_tmr_inject.c
+
 MICROBLAZE TMR MANAGER
 M:     Appana Durga Kedareswara rao <appana.durga.kedareswara.rao@amd.com>
 S:     Supported
@@ -13645,12 +13693,6 @@ F:     Documentation/ABI/testing/sysfs-driver-xilinx-tmr-manager
 F:     Documentation/devicetree/bindings/misc/xlnx,tmr-manager.yaml
 F:     drivers/misc/xilinx_tmr_manager.c
 
-MICROBLAZE TMR INJECT
-M:     Appana Durga Kedareswara rao <appana.durga.kedareswara.rao@amd.com>
-S:     Supported
-F:     Documentation/devicetree/bindings/misc/xlnx,tmr-inject.yaml
-F:     drivers/misc/xilinx_tmr_inject.c
-
 MICROCHIP AT91 DMA DRIVERS
 M:     Ludovic Desroches <ludovic.desroches@microchip.com>
 M:     Tudor Ambarus <tudor.ambarus@linaro.org>
@@ -13726,10 +13768,10 @@ L:    linux-media@vger.kernel.org
 S:     Supported
 F:     Documentation/devicetree/bindings/media/atmel,isc.yaml
 F:     Documentation/devicetree/bindings/media/microchip,xisc.yaml
-F:     drivers/staging/media/deprecated/atmel/atmel-isc*
-F:     drivers/staging/media/deprecated/atmel/atmel-sama*-isc*
 F:     drivers/media/platform/microchip/microchip-isc*
 F:     drivers/media/platform/microchip/microchip-sama*-isc*
+F:     drivers/staging/media/deprecated/atmel/atmel-isc*
+F:     drivers/staging/media/deprecated/atmel/atmel-sama*-isc*
 F:     include/linux/atmel-isc-media.h
 
 MICROCHIP ISI DRIVER
@@ -13749,14 +13791,7 @@ F:     Documentation/devicetree/bindings/net/dsa/microchip,lan937x.yaml
 F:     drivers/net/dsa/microchip/*
 F:     include/linux/dsa/ksz_common.h
 F:     include/linux/platform_data/microchip-ksz.h
-F:     net/dsa/tag_ksz.c
-
-MICROCHIP LAN87xx/LAN937x T1 PHY DRIVER
-M:     Arun Ramadoss <arun.ramadoss@microchip.com>
-R:     UNGLinuxDriver@microchip.com
-L:     netdev@vger.kernel.org
-S:     Maintained
-F:     drivers/net/phy/microchip_t1.c
+F:     net/dsa/tag_ksz.c
 
 MICROCHIP LAN743X ETHERNET DRIVER
 M:     Bryan Whitehead <bryan.whitehead@microchip.com>
@@ -13765,6 +13800,13 @@ L:     netdev@vger.kernel.org
 S:     Maintained
 F:     drivers/net/ethernet/microchip/lan743x_*
 
+MICROCHIP LAN87xx/LAN937x T1 PHY DRIVER
+M:     Arun Ramadoss <arun.ramadoss@microchip.com>
+R:     UNGLinuxDriver@microchip.com
+L:     netdev@vger.kernel.org
+S:     Maintained
+F:     drivers/net/phy/microchip_t1.c
+
 MICROCHIP LAN966X ETHERNET DRIVER
 M:     Horatiu Vultur <horatiu.vultur@microchip.com>
 M:     UNGLinuxDriver@microchip.com
@@ -13806,14 +13848,6 @@ S:     Supported
 F:     Documentation/devicetree/bindings/mtd/atmel-nand.txt
 F:     drivers/mtd/nand/raw/atmel/*
 
-MICROCHIP PCI1XXXX GP DRIVER
-M:     Kumaravel Thiagarajan <kumaravel.thiagarajan@microchip.com>
-L:     linux-gpio@vger.kernel.org
-S:     Supported
-F:     drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_gp.c
-F:     drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_gp.h
-F:     drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_gpio.c
-
 MICROCHIP OTPC DRIVER
 M:     Claudiu Beznea <claudiu.beznea@microchip.com>
 L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
@@ -13822,6 +13856,14 @@ F:     Documentation/devicetree/bindings/nvmem/microchip,sama7g5-otpc.yaml
 F:     drivers/nvmem/microchip-otpc.c
 F:     include/dt-bindings/nvmem/microchip,sama7g5-otpc.h
 
+MICROCHIP PCI1XXXX GP DRIVER
+M:     Kumaravel Thiagarajan <kumaravel.thiagarajan@microchip.com>
+L:     linux-gpio@vger.kernel.org
+S:     Supported
+F:     drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_gp.c
+F:     drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_gp.h
+F:     drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_gpio.c
+
 MICROCHIP PCI1XXXX I2C DRIVER
 M:     Tharun Kumar P <tharunkumar.pasumarthi@microchip.com>
 M:     Kumaravel Thiagarajan <kumaravel.thiagarajan@microchip.com>
@@ -13837,6 +13879,14 @@ L:     linux-serial@vger.kernel.org
 S:     Maintained
 F:     drivers/tty/serial/8250/8250_pci1xxxx.c
 
+MICROCHIP POLARFIRE FPGA DRIVERS
+M:     Conor Dooley <conor.dooley@microchip.com>
+R:     Vladimir Georgiev <v.georgiev@metrotek.ru>
+L:     linux-fpga@vger.kernel.org
+S:     Supported
+F:     Documentation/devicetree/bindings/fpga/microchip,mpf-spi-fpga-mgr.yaml
+F:     drivers/fpga/microchip-spi.c
+
 MICROCHIP PWM DRIVER
 M:     Claudiu Beznea <claudiu.beznea@microchip.com>
 L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
@@ -13858,6 +13908,12 @@ M:     Claudiu Beznea <claudiu.beznea@microchip.com>
 S:     Supported
 F:     drivers/power/reset/at91-sama5d2_shdwc.c
 
+MICROCHIP SOC DRIVERS
+M:     Conor Dooley <conor@kernel.org>
+S:     Supported
+T:     git https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git/
+F:     drivers/soc/microchip/
+
 MICROCHIP SPI DRIVER
 M:     Tudor Ambarus <tudor.ambarus@linaro.org>
 S:     Supported
@@ -13871,11 +13927,12 @@ F:    Documentation/devicetree/bindings/misc/atmel-ssc.txt
 F:     drivers/misc/atmel-ssc.c
 F:     include/linux/atmel-ssc.h
 
-MICROCHIP SOC DRIVERS
-M:     Conor Dooley <conor@kernel.org>
-S:     Supported
-T:     git https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git/
-F:     drivers/soc/microchip/
+Microchip Timer Counter Block (TCB) Capture Driver
+M:     Kamel Bouhara <kamel.bouhara@bootlin.com>
+L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
+L:     linux-iio@vger.kernel.org
+S:     Maintained
+F:     drivers/counter/microchip-tcb-capture.c
 
 MICROCHIP USB251XB DRIVER
 M:     Richard Leitner <richard.leitner@skidata.com>
@@ -13992,6 +14049,12 @@ L:     platform-driver-x86@vger.kernel.org
 S:     Supported
 F:     drivers/platform/surface/surfacepro3_button.c
 
+MICROSOFT SURFACE SYSTEM AGGREGATOR HUB DRIVER
+M:     Maximilian Luz <luzmaximilian@gmail.com>
+L:     platform-driver-x86@vger.kernel.org
+S:     Maintained
+F:     drivers/platform/surface/surface_aggregator_hub.c
+
 MICROSOFT SURFACE SYSTEM AGGREGATOR SUBSYSTEM
 M:     Maximilian Luz <luzmaximilian@gmail.com>
 L:     platform-driver-x86@vger.kernel.org
@@ -14007,12 +14070,6 @@ F:     include/linux/surface_acpi_notify.h
 F:     include/linux/surface_aggregator/
 F:     include/uapi/linux/surface_aggregator/
 
-MICROSOFT SURFACE SYSTEM AGGREGATOR HUB DRIVER
-M:     Maximilian Luz <luzmaximilian@gmail.com>
-L:     platform-driver-x86@vger.kernel.org
-S:     Maintained
-F:     drivers/platform/surface/surface_aggregator_hub.c
-
 MICROTEK X6 SCANNER
 M:     Oliver Neukum <oliver@neukum.org>
 S:     Maintained
@@ -14178,11 +14235,11 @@ L:    linux-modules@vger.kernel.org
 L:     linux-kernel@vger.kernel.org
 S:     Maintained
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git modules-next
-F:     include/linux/module.h
 F:     include/linux/kmod.h
+F:     include/linux/module.h
 F:     kernel/module/
-F:     scripts/module*
 F:     lib/test_kmod.c
+F:     scripts/module*
 F:     tools/testing/selftests/kmod/
 
 MONOLITHIC POWER SYSTEM PMIC DRIVER
@@ -14558,6 +14615,7 @@ T:      git git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
 F:     Documentation/devicetree/bindings/net/
 F:     drivers/connector/
 F:     drivers/net/
+X:     drivers/net/wireless/
 F:     include/dt-bindings/net/
 F:     include/linux/etherdevice.h
 F:     include/linux/fcdevice.h
@@ -14607,6 +14665,7 @@ B:      mailto:netdev@vger.kernel.org
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
 F:     Documentation/core-api/netlink.rst
+F:     Documentation/netlink/
 F:     Documentation/networking/
 F:     Documentation/process/maintainer-netdev.rst
 F:     Documentation/userspace-api/netlink/
@@ -14621,6 +14680,7 @@ F:      include/uapi/linux/netdevice.h
 F:     lib/net_utils.c
 F:     lib/random32.c
 F:     net/
+X:     net/bluetooth/
 F:     tools/net/
 F:     tools/testing/selftests/net/
 
@@ -14771,6 +14831,7 @@ L:      linux-nfs@vger.kernel.org
 S:     Maintained
 W:     http://client.linux-nfs.org
 T:     git git://git.linux-nfs.org/projects/trondmy/linux-nfs.git
+F:     Documentation/filesystems/nfs/
 F:     fs/lockd/
 F:     fs/nfs/
 F:     fs/nfs_common/
@@ -14780,7 +14841,6 @@ F:      include/linux/sunrpc/
 F:     include/uapi/linux/nfs*
 F:     include/uapi/linux/sunrpc/
 F:     net/sunrpc/
-F:     Documentation/filesystems/nfs/
 
 NILFS2 FILESYSTEM
 M:     Ryusuke Konishi <konishi.ryusuke@gmail.com>
@@ -14920,6 +14980,7 @@ F:      drivers/ntb/hw/intel/
 
 NTFS FILESYSTEM
 M:     Anton Altaparmakov <anton@tuxera.com>
+R:     Namjae Jeon <linkinjeon@kernel.org>
 L:     linux-ntfs-dev@lists.sourceforge.net
 S:     Supported
 W:     http://www.tuxera.com/
@@ -14984,12 +15045,6 @@ F:     drivers/nvme/target/auth.c
 F:     drivers/nvme/target/fabrics-cmd-auth.c
 F:     include/linux/nvme-auth.h
 
-NVM EXPRESS HARDWARE MONITORING SUPPORT
-M:     Guenter Roeck <linux@roeck-us.net>
-L:     linux-nvme@lists.infradead.org
-S:     Supported
-F:     drivers/nvme/host/hwmon.c
-
 NVM EXPRESS FC TRANSPORT DRIVERS
 M:     James Smart <james.smart@broadcom.com>
 L:     linux-nvme@lists.infradead.org
@@ -15000,6 +15055,12 @@ F:     drivers/nvme/target/fcloop.c
 F:     include/linux/nvme-fc-driver.h
 F:     include/linux/nvme-fc.h
 
+NVM EXPRESS HARDWARE MONITORING SUPPORT
+M:     Guenter Roeck <linux@roeck-us.net>
+L:     linux-nvme@lists.infradead.org
+S:     Supported
+F:     drivers/nvme/host/hwmon.c
+
 NVM EXPRESS TARGET DRIVER
 M:     Christoph Hellwig <hch@lst.de>
 M:     Sagi Grimberg <sagi@grimberg.me>
@@ -15020,6 +15081,13 @@ F:     drivers/nvmem/
 F:     include/linux/nvmem-consumer.h
 F:     include/linux/nvmem-provider.h
 
+NXP BLUETOOTH WIRELESS DRIVERS
+M:     Amitkumar Karwar <amitkumar.karwar@nxp.com>
+M:     Neeraj Kale <neeraj.sanjaykale@nxp.com>
+S:     Maintained
+F:     Documentation/devicetree/bindings/net/bluetooth/nxp,88w8987-bt.yaml
+F:     drivers/bluetooth/btnxpuart.c
+
 NXP C45 TJA11XX PHY DRIVER
 M:     Radu Pirea <radu-nicolae.pirea@oss.nxp.com>
 L:     netdev@vger.kernel.org
@@ -15045,16 +15113,17 @@ F:    drivers/iio/gyro/fxas21002c_core.c
 F:     drivers/iio/gyro/fxas21002c_i2c.c
 F:     drivers/iio/gyro/fxas21002c_spi.c
 
-NXP i.MX CLOCK DRIVERS
-M:     Abel Vesa <abelvesa@kernel.org>
-R:     Peng Fan <peng.fan@nxp.com>
-L:     linux-clk@vger.kernel.org
+NXP i.MX 7D/6SX/6UL/93 AND VF610 ADC DRIVER
+M:     Haibo Chen <haibo.chen@nxp.com>
+L:     linux-iio@vger.kernel.org
 L:     linux-imx@nxp.com
 S:     Maintained
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/abelvesa/linux.git clk/imx
-F:     Documentation/devicetree/bindings/clock/imx*
-F:     drivers/clk/imx/
-F:     include/dt-bindings/clock/imx*
+F:     Documentation/devicetree/bindings/iio/adc/fsl,imx7d-adc.yaml
+F:     Documentation/devicetree/bindings/iio/adc/fsl,vf610-adc.yaml
+F:     Documentation/devicetree/bindings/iio/adc/nxp,imx93-adc.yaml
+F:     drivers/iio/adc/imx7d_adc.c
+F:     drivers/iio/adc/imx93_adc.c
+F:     drivers/iio/adc/vf610_adc.c
 
 NXP i.MX 8M ISI DRIVER
 M:     Laurent Pinchart <laurent.pinchart@ideasonboard.com>
@@ -15063,6 +15132,15 @@ S:     Maintained
 F:     Documentation/devicetree/bindings/media/nxp,imx8-isi.yaml
 F:     drivers/media/platform/nxp/imx8-isi/
 
+NXP i.MX 8MP DW100 V4L2 DRIVER
+M:     Xavier Roumegue <xavier.roumegue@oss.nxp.com>
+L:     linux-media@vger.kernel.org
+S:     Maintained
+F:     Documentation/devicetree/bindings/media/nxp,dw100.yaml
+F:     Documentation/userspace-api/media/drivers/dw100.rst
+F:     drivers/media/platform/nxp/dw100/
+F:     include/uapi/linux/dw100.h
+
 NXP i.MX 8MQ DCSS DRIVER
 M:     Laurentiu Palcu <laurentiu.palcu@oss.nxp.com>
 R:     Lucas Stach <l.stach@pengutronix.de>
@@ -15080,17 +15158,24 @@ S:    Maintained
 F:     Documentation/devicetree/bindings/iio/adc/nxp,imx8qxp-adc.yaml
 F:     drivers/iio/adc/imx8qxp-adc.c
 
-NXP i.MX 7D/6SX/6UL/93 AND VF610 ADC DRIVER
-M:     Haibo Chen <haibo.chen@nxp.com>
-L:     linux-iio@vger.kernel.org
+NXP i.MX 8QXP/8QM JPEG V4L2 DRIVER
+M:     Mirela Rabulea <mirela.rabulea@nxp.com>
+R:     NXP Linux Team <linux-imx@nxp.com>
+L:     linux-media@vger.kernel.org
+S:     Maintained
+F:     Documentation/devicetree/bindings/media/nxp,imx8-jpeg.yaml
+F:     drivers/media/platform/nxp/imx-jpeg
+
+NXP i.MX CLOCK DRIVERS
+M:     Abel Vesa <abelvesa@kernel.org>
+R:     Peng Fan <peng.fan@nxp.com>
+L:     linux-clk@vger.kernel.org
 L:     linux-imx@nxp.com
 S:     Maintained
-F:     Documentation/devicetree/bindings/iio/adc/fsl,imx7d-adc.yaml
-F:     Documentation/devicetree/bindings/iio/adc/fsl,vf610-adc.yaml
-F:     Documentation/devicetree/bindings/iio/adc/nxp,imx93-adc.yaml
-F:     drivers/iio/adc/imx7d_adc.c
-F:     drivers/iio/adc/imx93_adc.c
-F:     drivers/iio/adc/vf610_adc.c
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/abelvesa/linux.git clk/imx
+F:     Documentation/devicetree/bindings/clock/imx*
+F:     drivers/clk/imx/
+F:     include/dt-bindings/clock/imx*
 
 NXP PF8100/PF8121A/PF8200 PMIC REGULATOR DEVICE DRIVER
 M:     Jagan Teki <jagan@amarulasolutions.com>
@@ -15136,34 +15221,17 @@ S:    Maintained
 F:     Documentation/devicetree/bindings/sound/tfa9879.txt
 F:     sound/soc/codecs/tfa9879*
 
-NXP/Goodix TFA989X (TFA1) DRIVER
-M:     Stephan Gerhold <stephan@gerhold.net>
-L:     alsa-devel@alsa-project.org (moderated for non-subscribers)
-S:     Maintained
-F:     Documentation/devicetree/bindings/sound/nxp,tfa989x.yaml
-F:     sound/soc/codecs/tfa989x.c
-
 NXP-NCI NFC DRIVER
 S:     Orphan
 F:     Documentation/devicetree/bindings/net/nfc/nxp,nci.yaml
 F:     drivers/nfc/nxp-nci
 
-NXP i.MX 8MP DW100 V4L2 DRIVER
-M:     Xavier Roumegue <xavier.roumegue@oss.nxp.com>
-L:     linux-media@vger.kernel.org
-S:     Maintained
-F:     Documentation/devicetree/bindings/media/nxp,dw100.yaml
-F:     Documentation/userspace-api/media/drivers/dw100.rst
-F:     drivers/media/platform/nxp/dw100/
-F:     include/uapi/linux/dw100.h
-
-NXP i.MX 8QXP/8QM JPEG V4L2 DRIVER
-M:     Mirela Rabulea <mirela.rabulea@nxp.com>
-R:     NXP Linux Team <linux-imx@nxp.com>
-L:     linux-media@vger.kernel.org
+NXP/Goodix TFA989X (TFA1) DRIVER
+M:     Stephan Gerhold <stephan@gerhold.net>
+L:     alsa-devel@alsa-project.org (moderated for non-subscribers)
 S:     Maintained
-F:     Documentation/devicetree/bindings/media/nxp,imx8-jpeg.yaml
-F:     drivers/media/platform/nxp/imx-jpeg
+F:     Documentation/devicetree/bindings/sound/nxp,tfa989x.yaml
+F:     sound/soc/codecs/tfa989x.c
 
 NZXT-KRAKEN2 HARDWARE MONITORING DRIVER
 M:     Jonas Malaco <jonas@protocubo.io>
@@ -15263,7 +15331,7 @@ OMAP DISPLAY SUBSYSTEM and FRAMEBUFFER SUPPORT (DSS2)
 L:     linux-omap@vger.kernel.org
 L:     linux-fbdev@vger.kernel.org
 S:     Orphan
-F:     Documentation/arm/omap/dss.rst
+F:     Documentation/arch/arm/omap/dss.rst
 F:     drivers/video/fbdev/omap2/
 
 OMAP FRAMEBUFFER SUPPORT
@@ -15689,8 +15757,8 @@ M:      Rob Herring <robh+dt@kernel.org>
 M:     Frank Rowand <frowand.list@gmail.com>
 L:     devicetree@vger.kernel.org
 S:     Maintained
-C:     irc://irc.libera.chat/devicetree
 W:     http://www.devicetree.org/
+C:     irc://irc.libera.chat/devicetree
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git
 F:     Documentation/ABI/testing/sysfs-firmware-ofw
 F:     drivers/of/
@@ -15706,8 +15774,8 @@ M:      Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>
 M:     Conor Dooley <conor+dt@kernel.org>
 L:     devicetree@vger.kernel.org
 S:     Maintained
-C:     irc://irc.libera.chat/devicetree
 Q:     http://patchwork.ozlabs.org/project/devicetree-bindings/list/
+C:     irc://irc.libera.chat/devicetree
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git
 F:     Documentation/devicetree/
 F:     arch/*/boot/dts/
@@ -15720,13 +15788,6 @@ L:     netdev@vger.kernel.org
 S:     Maintained
 F:     drivers/ptp/ptp_ocp.c
 
-INTEL PTP DFL ToD DRIVER
-M:     Tianfei Zhang <tianfei.zhang@intel.com>
-L:     linux-fpga@vger.kernel.org
-L:     netdev@vger.kernel.org
-S:     Maintained
-F:     drivers/ptp/ptp_dfl_tod.c
-
 OPENCORES I2C BUS DRIVER
 M:     Peter Korsgaard <peter@korsgaard.com>
 M:     Andrew Lunn <andrew@lunn.ch>
@@ -15745,8 +15806,8 @@ L:      linux-openrisc@vger.kernel.org
 S:     Maintained
 W:     http://openrisc.io
 T:     git https://github.com/openrisc/linux.git
-F:     Documentation/devicetree/bindings/openrisc/
 F:     Documentation/arch/openrisc/
+F:     Documentation/devicetree/bindings/openrisc/
 F:     arch/openrisc/
 F:     drivers/irqchip/irq-ompic.c
 F:     drivers/irqchip/irq-or1k-*
@@ -15921,7 +15982,7 @@ F:      include/uapi/linux/ppdev.h
 
 PARAVIRT_OPS INTERFACE
 M:     Juergen Gross <jgross@suse.com>
-M:     Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
+R:     Ajay Kaher <akaher@vmware.com>
 R:     Alexey Makhalov <amakhalov@vmware.com>
 R:     VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
 L:     virtualization@lists.linux-foundation.org
@@ -16062,6 +16123,14 @@ L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 S:     Maintained
 F:     drivers/pci/controller/dwc/*layerscape*
 
+PCI DRIVER FOR FU740
+M:     Paul Walmsley <paul.walmsley@sifive.com>
+M:     Greentime Hu <greentime.hu@sifive.com>
+L:     linux-pci@vger.kernel.org
+S:     Maintained
+F:     Documentation/devicetree/bindings/pci/sifive,fu740-pcie.yaml
+F:     drivers/pci/controller/dwc/pcie-fu740.c
+
 PCI DRIVER FOR GENERIC OF HOSTS
 M:     Will Deacon <will@kernel.org>
 L:     linux-pci@vger.kernel.org
@@ -16082,14 +16151,6 @@ F:     Documentation/devicetree/bindings/pci/fsl,imx6q-pcie-ep.yaml
 F:     Documentation/devicetree/bindings/pci/fsl,imx6q-pcie.yaml
 F:     drivers/pci/controller/dwc/*imx6*
 
-PCI DRIVER FOR FU740
-M:     Paul Walmsley <paul.walmsley@sifive.com>
-M:     Greentime Hu <greentime.hu@sifive.com>
-L:     linux-pci@vger.kernel.org
-S:     Maintained
-F:     Documentation/devicetree/bindings/pci/sifive,fu740-pcie.yaml
-F:     drivers/pci/controller/dwc/pcie-fu740.c
-
 PCI DRIVER FOR INTEL IXP4XX
 M:     Linus Walleij <linus.walleij@linaro.org>
 S:     Maintained
@@ -16169,8 +16230,8 @@ M:      Jingoo Han <jingoohan1@gmail.com>
 M:     Gustavo Pimentel <gustavo.pimentel@synopsys.com>
 L:     linux-pci@vger.kernel.org
 S:     Maintained
-F:     Documentation/devicetree/bindings/pci/snps,dw-pcie.yaml
 F:     Documentation/devicetree/bindings/pci/snps,dw-pcie-ep.yaml
+F:     Documentation/devicetree/bindings/pci/snps,dw-pcie.yaml
 F:     drivers/pci/controller/dwc/*designware*
 
 PCI DRIVER FOR TI DRA7XX/J721E
@@ -16190,6 +16251,14 @@ S:     Maintained
 F:     Documentation/devicetree/bindings/pci/v3-v360epc-pci.txt
 F:     drivers/pci/controller/pci-v3-semi.c
 
+PCI DRIVER FOR XILINX VERSAL CPM
+M:     Bharat Kumar Gogada <bharat.kumar.gogada@amd.com>
+M:     Michal Simek <michal.simek@amd.com>
+L:     linux-pci@vger.kernel.org
+S:     Maintained
+F:     Documentation/devicetree/bindings/pci/xilinx-versal-cpm.yaml
+F:     drivers/pci/controller/pcie-xilinx-cpm.c
+
 PCI ENDPOINT SUBSYSTEM
 M:     Lorenzo Pieralisi <lpieralisi@kernel.org>
 M:     Krzysztof Wilczyński <kw@linux.com>
@@ -16227,19 +16296,6 @@ L:     linux-pci@vger.kernel.org
 S:     Supported
 F:     Documentation/PCI/pci-error-recovery.rst
 
-PCI PEER-TO-PEER DMA (P2PDMA)
-M:     Bjorn Helgaas <bhelgaas@google.com>
-M:     Logan Gunthorpe <logang@deltatee.com>
-L:     linux-pci@vger.kernel.org
-S:     Supported
-Q:     https://patchwork.kernel.org/project/linux-pci/list/
-B:     https://bugzilla.kernel.org
-C:     irc://irc.oftc.net/linux-pci
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git
-F:     Documentation/driver-api/pci/p2pdma.rst
-F:     drivers/pci/p2pdma.c
-F:     include/linux/pci-p2pdma.h
-
 PCI MSI DRIVER FOR ALTERA MSI IP
 M:     Joyce Ooi <joyce.ooi@intel.com>
 L:     linux-pci@vger.kernel.org
@@ -16270,6 +16326,19 @@ F:     drivers/pci/controller/
 F:     drivers/pci/pci-bridge-emul.c
 F:     drivers/pci/pci-bridge-emul.h
 
+PCI PEER-TO-PEER DMA (P2PDMA)
+M:     Bjorn Helgaas <bhelgaas@google.com>
+M:     Logan Gunthorpe <logang@deltatee.com>
+L:     linux-pci@vger.kernel.org
+S:     Supported
+Q:     https://patchwork.kernel.org/project/linux-pci/list/
+B:     https://bugzilla.kernel.org
+C:     irc://irc.oftc.net/linux-pci
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git
+F:     Documentation/driver-api/pci/p2pdma.rst
+F:     drivers/pci/p2pdma.c
+F:     include/linux/pci-p2pdma.h
+
 PCI SUBSYSTEM
 M:     Bjorn Helgaas <bhelgaas@google.com>
 L:     linux-pci@vger.kernel.org
@@ -16349,7 +16418,7 @@ F:      Documentation/devicetree/bindings/pci/intel,keembay-pcie*
 F:     drivers/pci/controller/dwc/pcie-keembay.c
 
 PCIE DRIVER FOR INTEL LGM GW SOC
-M:     Rahul Tanwar <rtanwar@maxlinear.com>
+M:     Chuanhua Lei <lchuanhua@maxlinear.com>
 L:     linux-pci@vger.kernel.org
 S:     Maintained
 F:     Documentation/devicetree/bindings/pci/intel-gw-pcie.yaml
@@ -16378,14 +16447,6 @@ L:     linux-arm-msm@vger.kernel.org
 S:     Maintained
 F:     drivers/pci/controller/dwc/pcie-qcom.c
 
-PCIE ENDPOINT DRIVER FOR QUALCOMM
-M:     Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
-L:     linux-pci@vger.kernel.org
-L:     linux-arm-msm@vger.kernel.org
-S:     Maintained
-F:     Documentation/devicetree/bindings/pci/qcom,pcie-ep.yaml
-F:     drivers/pci/controller/dwc/pcie-qcom-ep.c
-
 PCIE DRIVER FOR ROCKCHIP
 M:     Shawn Lin <shawn.lin@rock-chips.com>
 L:     linux-pci@vger.kernel.org
@@ -16407,13 +16468,13 @@ L:    linux-pci@vger.kernel.org
 S:     Maintained
 F:     drivers/pci/controller/dwc/*spear*
 
-PCI DRIVER FOR XILINX VERSAL CPM
-M:     Bharat Kumar Gogada <bharat.kumar.gogada@amd.com>
-M:     Michal Simek <michal.simek@amd.com>
+PCIE ENDPOINT DRIVER FOR QUALCOMM
+M:     Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
 L:     linux-pci@vger.kernel.org
+L:     linux-arm-msm@vger.kernel.org
 S:     Maintained
-F:     Documentation/devicetree/bindings/pci/xilinx-versal-cpm.yaml
-F:     drivers/pci/controller/pcie-xilinx-cpm.c
+F:     Documentation/devicetree/bindings/pci/qcom,pcie-ep.yaml
+F:     drivers/pci/controller/dwc/pcie-qcom-ep.c
 
 PCMCIA SUBSYSTEM
 M:     Dominik Brodowski <linux@dominikbrodowski.net>
@@ -16683,9 +16744,9 @@ R:      Alim Akhtar <alim.akhtar@samsung.com>
 L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 L:     linux-samsung-soc@vger.kernel.org
 S:     Maintained
-C:     irc://irc.libera.chat/linux-exynos
 Q:     https://patchwork.kernel.org/project/linux-samsung-soc/list/
 B:     mailto:linux-samsung-soc@vger.kernel.org
+C:     irc://irc.libera.chat/linux-exynos
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/pinctrl/samsung.git
 F:     Documentation/devicetree/bindings/pinctrl/samsung,pinctrl*yaml
 F:     drivers/pinctrl/samsung/
@@ -16747,13 +16808,6 @@ M:     Logan Gunthorpe <logang@deltatee.com>
 S:     Maintained
 F:     drivers/dma/plx_dma.c
 
-PM6764TR DRIVER
-M:     Charles Hsu     <hsu.yungteng@gmail.com>
-L:     linux-hwmon@vger.kernel.org
-S:     Maintained
-F:     Documentation/hwmon/pm6764tr.rst
-F:     drivers/hwmon/pmbus/pm6764tr.c
-
 PM-GRAPH UTILITY
 M:     "Todd E Brandt" <todd.e.brandt@linux.intel.com>
 L:     linux-pm@vger.kernel.org
@@ -16763,6 +16817,13 @@ B:     https://bugzilla.kernel.org/buglist.cgi?component=pm-graph&product=Tools
 T:     git git://github.com/intel/pm-graph
 F:     tools/power/pm-graph
 
+PM6764TR DRIVER
+M:     Charles Hsu     <hsu.yungteng@gmail.com>
+L:     linux-hwmon@vger.kernel.org
+S:     Maintained
+F:     Documentation/hwmon/pm6764tr.rst
+F:     drivers/hwmon/pmbus/pm6764tr.c
+
 PMBUS HARDWARE MONITORING DRIVERS
 M:     Guenter Roeck <linux@roeck-us.net>
 L:     linux-hwmon@vger.kernel.org
@@ -16843,15 +16904,6 @@ F:     include/linux/pm_*
 F:     include/linux/powercap.h
 F:     kernel/configs/nopm.config
 
-DYNAMIC THERMAL POWER MANAGEMENT (DTPM)
-M:     Daniel Lezcano <daniel.lezcano@kernel.org>
-L:     linux-pm@vger.kernel.org
-S:     Supported
-B:     https://bugzilla.kernel.org
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
-F:     drivers/powercap/dtpm*
-F:     include/linux/dtpm.h
-
 POWER STATE COORDINATION INTERFACE (PSCI)
 M:     Mark Rutland <mark.rutland@arm.com>
 M:     Lorenzo Pieralisi <lpieralisi@kernel.org>
@@ -17010,8 +17062,8 @@ R:      Guilherme G. Piccoli <gpiccoli@igalia.com>
 L:     linux-hardening@vger.kernel.org
 S:     Supported
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/pstore
-F:     Documentation/admin-guide/ramoops.rst
 F:     Documentation/admin-guide/pstore-blk.rst
+F:     Documentation/admin-guide/ramoops.rst
 F:     Documentation/devicetree/bindings/reserved-memory/ramoops.yaml
 F:     drivers/acpi/apei/erst.c
 F:     drivers/firmware/efi/efi-pstore.c
@@ -17160,10 +17212,10 @@ F:    sound/soc/codecs/lpass-va-macro.c
 F:     sound/soc/codecs/lpass-wsa-macro.*
 F:     sound/soc/codecs/msm8916-wcd-analog.c
 F:     sound/soc/codecs/msm8916-wcd-digital.c
-F:     sound/soc/codecs/wcd9335.*
-F:     sound/soc/codecs/wcd934x.c
 F:     sound/soc/codecs/wcd-clsh-v2.*
 F:     sound/soc/codecs/wcd-mbhc-v2.*
+F:     sound/soc/codecs/wcd9335.*
+F:     sound/soc/codecs/wcd934x.c
 F:     sound/soc/codecs/wsa881x.c
 F:     sound/soc/codecs/wsa883x.c
 F:     sound/soc/qcom/
@@ -17320,14 +17372,21 @@ Q:    http://patchwork.linuxtv.org/project/linux-media/list/
 T:     git git://linuxtv.org/anttip/media_tree.git
 F:     drivers/media/tuners/qt1010*
 
+QUALCOMM ATH12K WIRELESS DRIVER
+M:     Kalle Valo <kvalo@kernel.org>
+L:     ath12k@lists.infradead.org
+S:     Supported
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
+F:     drivers/net/wireless/ath/ath12k/
+
 QUALCOMM ATHEROS ATH10K WIRELESS DRIVER
 M:     Kalle Valo <kvalo@kernel.org>
 L:     ath10k@lists.infradead.org
 S:     Supported
 W:     https://wireless.wiki.kernel.org/en/users/Drivers/ath10k
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
-F:     drivers/net/wireless/ath/ath10k/
 F:     Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml
+F:     drivers/net/wireless/ath/ath10k/
 
 QUALCOMM ATHEROS ATH11K WIRELESS DRIVER
 M:     Kalle Valo <kvalo@kernel.org>
@@ -17337,13 +17396,6 @@ T:     git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
 F:     Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml
 F:     drivers/net/wireless/ath/ath11k/
 
-QUALCOMM ATH12K WIRELESS DRIVER
-M:     Kalle Valo <kvalo@kernel.org>
-L:     ath12k@lists.infradead.org
-S:     Supported
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
-F:     drivers/net/wireless/ath/ath12k/
-
 QUALCOMM ATHEROS ATH9K WIRELESS DRIVER
 M:     Toke Høiland-Jørgensen <toke@toke.dk>
 L:     linux-wireless@vger.kernel.org
@@ -17440,8 +17492,8 @@ F:      include/uapi/misc/fastrpc.h
 QUALCOMM HEXAGON ARCHITECTURE
 M:     Brian Cain <bcain@quicinc.com>
 L:     linux-hexagon@vger.kernel.org
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/bcain/linux.git
 S:     Supported
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/bcain/linux.git
 F:     arch/hexagon/
 
 QUALCOMM HIDMA DRIVER
@@ -17563,9 +17615,9 @@ M:      Christian König <christian.koenig@amd.com>
 M:     Pan, Xinhui <Xinhui.Pan@amd.com>
 L:     amd-gfx@lists.freedesktop.org
 S:     Supported
-T:     git https://gitlab.freedesktop.org/agd5f/linux.git
 B:     https://gitlab.freedesktop.org/drm/amd/-/issues
 C:     irc://irc.oftc.net/radeon
+T:     git https://gitlab.freedesktop.org/agd5f/linux.git
 F:     Documentation/gpu/amdgpu/
 F:     drivers/gpu/drm/amd/
 F:     drivers/gpu/drm/radeon/
@@ -17653,8 +17705,8 @@ F:      arch/mips/generic/board-ranchu.c
 RANDOM NUMBER DRIVER
 M:     "Theodore Ts'o" <tytso@mit.edu>
 M:     Jason A. Donenfeld <Jason@zx2c4.com>
-T:     git https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git
 S:     Maintained
+T:     git https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git
 F:     drivers/char/random.c
 F:     drivers/virt/vmgenid.c
 
@@ -17688,8 +17740,8 @@ T:      git git://linuxtv.org/media_tree.git
 F:     Documentation/driver-api/media/rc-core.rst
 F:     Documentation/userspace-api/media/rc/
 F:     drivers/media/rc/
-F:     include/media/rc-map.h
 F:     include/media/rc-core.h
+F:     include/media/rc-map.h
 F:     include/uapi/linux/lirc.h
 
 RCMM REMOTE CONTROLS DECODER
@@ -17778,7 +17830,7 @@ M:      Boqun Feng <boqun.feng@gmail.com>
 R:     Steven Rostedt <rostedt@goodmis.org>
 R:     Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
 R:     Lai Jiangshan <jiangshanlai@gmail.com>
-R:     Zqiang <qiang1.zhang@intel.com>
+R:     Zqiang <qiang.zhang1211@gmail.com>
 L:     rcu@vger.kernel.org
 S:     Supported
 W:     http://www.rdrop.com/users/paulmck/RCU/
@@ -17806,6 +17858,14 @@ F:     include/linux/rtc/
 F:     include/uapi/linux/rtc.h
 F:     tools/testing/selftests/rtc/
 
+Real-time Linux Analysis (RTLA) tools
+M:     Daniel Bristot de Oliveira <bristot@kernel.org>
+M:     Steven Rostedt <rostedt@goodmis.org>
+L:     linux-trace-kernel@vger.kernel.org
+S:     Maintained
+F:     Documentation/tools/rtla/
+F:     tools/tracing/rtla/
+
 REALTEK AUDIO CODECS
 M:     Oder Chiou <oder_chiou@realtek.com>
 S:     Maintained
@@ -17929,6 +17989,14 @@ S:     Maintained
 F:     Documentation/devicetree/bindings/sound/renesas,idt821034.yaml
 F:     sound/soc/codecs/idt821034.c
 
+RENESAS R-CAR GEN3 & RZ/N1 NAND CONTROLLER DRIVER
+M:     Miquel Raynal <miquel.raynal@bootlin.com>
+L:     linux-mtd@lists.infradead.org
+L:     linux-renesas-soc@vger.kernel.org
+S:     Maintained
+F:     Documentation/devicetree/bindings/mtd/renesas-nandc.yaml
+F:     drivers/mtd/nand/raw/renesas-nand-controller.c
+
 RENESAS R-CAR GYROADC DRIVER
 M:     Marek Vasut <marek.vasut@gmail.com>
 L:     linux-iio@vger.kernel.org
@@ -17947,9 +18015,9 @@ F:      drivers/i2c/busses/i2c-sh_mobile.c
 
 RENESAS R-CAR SATA DRIVER
 R:     Sergey Shtylyov <s.shtylyov@omp.ru>
-S:     Supported
 L:     linux-ide@vger.kernel.org
 L:     linux-renesas-soc@vger.kernel.org
+S:     Supported
 F:     Documentation/devicetree/bindings/ata/renesas,rcar-sata.yaml
 F:     drivers/ata/sata_rcar.c
 
@@ -17969,12 +18037,6 @@ S:     Supported
 F:     Documentation/devicetree/bindings/i2c/renesas,riic.yaml
 F:     drivers/i2c/busses/i2c-riic.c
 
-RENESAS USB PHY DRIVER
-M:     Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
-L:     linux-renesas-soc@vger.kernel.org
-S:     Maintained
-F:     drivers/phy/renesas/phy-rcar-gen3-usb*.c
-
 RENESAS RZ/G2L A/D DRIVER
 M:     Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
 L:     linux-iio@vger.kernel.org
@@ -18020,13 +18082,19 @@ S:    Maintained
 F:     Documentation/devicetree/bindings/usb/renesas,rzn1-usbf.yaml
 F:     drivers/usb/gadget/udc/renesas_usbf.c
 
-RENESAS R-CAR GEN3 & RZ/N1 NAND CONTROLLER DRIVER
-M:     Miquel Raynal <miquel.raynal@bootlin.com>
-L:     linux-mtd@lists.infradead.org
+RENESAS RZ/V2M I2C DRIVER
+M:     Fabrizio Castro <fabrizio.castro.jz@renesas.com>
+L:     linux-i2c@vger.kernel.org
+L:     linux-renesas-soc@vger.kernel.org
+S:     Supported
+F:     Documentation/devicetree/bindings/i2c/renesas,rzv2m.yaml
+F:     drivers/i2c/busses/i2c-rzv2m.c
+
+RENESAS USB PHY DRIVER
+M:     Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
 L:     linux-renesas-soc@vger.kernel.org
 S:     Maintained
-F:     Documentation/devicetree/bindings/mtd/renesas-nandc.yaml
-F:     drivers/mtd/nand/raw/renesas-nand-controller.c
+F:     drivers/phy/renesas/phy-rcar-gen3-usb*.c
 
 RENESAS VERSACLOCK 7 CLOCK DRIVER
 M:     Alex Helms <alexander.helms.jy@renesas.com>
@@ -18094,15 +18162,6 @@ S:     Maintained
 F:     drivers/mtd/nand/raw/r852.c
 F:     drivers/mtd/nand/raw/r852.h
 
-RISC-V PMU DRIVERS
-M:     Atish Patra <atishp@atishpatra.org>
-R:     Anup Patel <anup@brainfault.org>
-L:     linux-riscv@lists.infradead.org
-S:     Supported
-F:     drivers/perf/riscv_pmu.c
-F:     drivers/perf/riscv_pmu_legacy.c
-F:     drivers/perf/riscv_pmu_sbi.c
-
 RISC-V ARCHITECTURE
 M:     Paul Walmsley <paul.walmsley@sifive.com>
 M:     Palmer Dabbelt <palmer@dabbelt.com>
@@ -18155,6 +18214,15 @@ T:     git https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git/
 F:     Documentation/devicetree/bindings/riscv/
 F:     arch/riscv/boot/dts/
 
+RISC-V PMU DRIVERS
+M:     Atish Patra <atishp@atishpatra.org>
+R:     Anup Patel <anup@brainfault.org>
+L:     linux-riscv@lists.infradead.org
+S:     Supported
+F:     drivers/perf/riscv_pmu.c
+F:     drivers/perf/riscv_pmu_legacy.c
+F:     drivers/perf/riscv_pmu_sbi.c
+
 RNBD BLOCK DRIVERS
 M:     Md. Haris Iqbal <haris.iqbal@ionos.com>
 M:     Jack Wang <jinpu.wang@ionos.com>
@@ -18363,7 +18431,7 @@ F:      drivers/infiniband/ulp/rtrs/
 RUNTIME VERIFICATION (RV)
 M:     Daniel Bristot de Oliveira <bristot@kernel.org>
 M:     Steven Rostedt <rostedt@goodmis.org>
-L:     linux-trace-devel@vger.kernel.org
+L:     linux-trace-kernel@vger.kernel.org
 S:     Maintained
 F:     Documentation/trace/rv/
 F:     include/linux/rv.h
@@ -18459,14 +18527,6 @@ F:     drivers/s390/net/*iucv*
 F:     include/net/iucv/
 F:     net/iucv/
 
-S390 NETWORK DRIVERS
-M:     Alexandra Winter <wintera@linux.ibm.com>
-M:     Wenjia Zhang <wenjia@linux.ibm.com>
-L:     linux-s390@vger.kernel.org
-L:     netdev@vger.kernel.org
-S:     Supported
-F:     drivers/s390/net/
-
 S390 MM
 M:     Alexander Gordeev <agordeev@linux.ibm.com>
 M:     Gerald Schaefer <gerald.schaefer@linux.ibm.com>
@@ -18476,14 +18536,22 @@ T:    git git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git
 F:     arch/s390/include/asm/pgtable.h
 F:     arch/s390/mm
 
+S390 NETWORK DRIVERS
+M:     Alexandra Winter <wintera@linux.ibm.com>
+M:     Wenjia Zhang <wenjia@linux.ibm.com>
+L:     linux-s390@vger.kernel.org
+L:     netdev@vger.kernel.org
+S:     Supported
+F:     drivers/s390/net/
+
 S390 PCI SUBSYSTEM
 M:     Niklas Schnelle <schnelle@linux.ibm.com>
 M:     Gerald Schaefer <gerald.schaefer@linux.ibm.com>
 L:     linux-s390@vger.kernel.org
 S:     Supported
+F:     Documentation/s390/pci.rst
 F:     arch/s390/pci/
 F:     drivers/pci/hotplug/s390_pci_hpc.c
-F:     Documentation/s390/pci.rst
 
 S390 SCM DRIVER
 M:     Vineeth Vijayan <vneethv@linux.ibm.com>
@@ -18568,10 +18636,9 @@ F:     Documentation/admin-guide/LSM/SafeSetID.rst
 F:     security/safesetid/
 
 SAMSUNG AUDIO (ASoC) DRIVERS
-M:     Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
 M:     Sylwester Nawrocki <s.nawrocki@samsung.com>
 L:     alsa-devel@alsa-project.org (moderated for non-subscribers)
-S:     Supported
+S:     Maintained
 B:     mailto:linux-samsung-soc@vger.kernel.org
 F:     Documentation/devicetree/bindings/sound/samsung*
 F:     sound/soc/samsung/
@@ -18699,7 +18766,6 @@ F:      include/dt-bindings/clock/samsung,*.h
 F:     include/linux/clk/samsung.h
 
 SAMSUNG SPI DRIVERS
-M:     Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
 M:     Andi Shyti <andi.shyti@kernel.org>
 L:     linux-spi@vger.kernel.org
 L:     linux-samsung-soc@vger.kernel.org
@@ -18835,12 +18901,11 @@ F:    drivers/target/
 F:     include/target/
 
 SCTP PROTOCOL
-M:     Neil Horman <nhorman@tuxdriver.com>
 M:     Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
 M:     Xin Long <lucien.xin@gmail.com>
 L:     linux-sctp@vger.kernel.org
 S:     Maintained
-W:     http://lksctp.sourceforge.net
+W:     https://github.com/sctp/lksctp-tools/wiki
 F:     Documentation/networking/sctp.rst
 F:     include/linux/sctp.h
 F:     include/net/sctp/
@@ -18916,6 +18981,13 @@ L:     linux-mmc@vger.kernel.org
 S:     Supported
 F:     drivers/mmc/host/sdhci-of-at91.c
 
+SECURE DIGITAL HOST CONTROLLER INTERFACE (SDHCI) NXP i.MX DRIVER
+M:     Haibo Chen <haibo.chen@nxp.com>
+L:     linux-imx@nxp.com
+L:     linux-mmc@vger.kernel.org
+S:     Maintained
+F:     drivers/mmc/host/sdhci-esdhc-imx.c
+
 SECURE DIGITAL HOST CONTROLLER INTERFACE (SDHCI) SAMSUNG DRIVER
 M:     Ben Dooks <ben-linux@fluff.org>
 M:     Jaehoon Chung <jh80.chung@samsung.com>
@@ -18935,13 +19007,6 @@ L:     linux-mmc@vger.kernel.org
 S:     Maintained
 F:     drivers/mmc/host/sdhci-omap.c
 
-SECURE DIGITAL HOST CONTROLLER INTERFACE (SDHCI) NXP i.MX DRIVER
-M:     Haibo Chen <haibo.chen@nxp.com>
-L:     linux-imx@nxp.com
-L:     linux-mmc@vger.kernel.org
-S:     Maintained
-F:     drivers/mmc/host/sdhci-esdhc-imx.c
-
 SECURE ENCRYPTING DEVICE (SED) OPAL DRIVER
 M:     Jonathan Derrick <jonathan.derrick@linux.dev>
 L:     linux-block@vger.kernel.org
@@ -18951,6 +19016,15 @@ F:     block/sed*
 F:     include/linux/sed*
 F:     include/uapi/linux/sed*
 
+SECURE MONITOR CALL(SMC) CALLING CONVENTION (SMCCC)
+M:     Mark Rutland <mark.rutland@arm.com>
+M:     Lorenzo Pieralisi <lpieralisi@kernel.org>
+M:     Sudeep Holla <sudeep.holla@arm.com>
+L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
+S:     Maintained
+F:     drivers/firmware/smccc/
+F:     include/linux/arm-smccc.h
+
 SECURITY CONTACT
 M:     Security Officers <security@kernel.org>
 S:     Supported
@@ -19100,6 +19174,9 @@ SHARED MEMORY COMMUNICATIONS (SMC) SOCKETS
 M:     Karsten Graul <kgraul@linux.ibm.com>
 M:     Wenjia Zhang <wenjia@linux.ibm.com>
 M:     Jan Karcher <jaka@linux.ibm.com>
+R:     D. Wythe <alibuda@linux.alibaba.com>
+R:     Tony Lu <tonylu@linux.alibaba.com>
+R:     Wen Gu <guwen@linux.alibaba.com>
 L:     linux-s390@vger.kernel.org
 S:     Supported
 F:     net/smc/
@@ -19400,15 +19477,6 @@ M:     Nicolas Pitre <nico@fluxnic.net>
 S:     Odd Fixes
 F:     drivers/net/ethernet/smsc/smc91x.*
 
-SECURE MONITOR CALL(SMC) CALLING CONVENTION (SMCCC)
-M:     Mark Rutland <mark.rutland@arm.com>
-M:     Lorenzo Pieralisi <lpieralisi@kernel.org>
-M:     Sudeep Holla <sudeep.holla@arm.com>
-L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
-S:     Maintained
-F:     drivers/firmware/smccc/
-F:     include/linux/arm-smccc.h
-
 SMM665 HARDWARE MONITOR DRIVER
 M:     Guenter Roeck <linux@roeck-us.net>
 L:     linux-hwmon@vger.kernel.org
@@ -19456,6 +19524,10 @@ L:     netdev@vger.kernel.org
 S:     Maintained
 F:     drivers/net/ethernet/smsc/smsc9420.*
 
+SNET DPU VIRTIO DATA PATH ACCELERATOR
+R:     Alvaro Karsz <alvaro.karsz@solid-run.com>
+F:     drivers/vdpa/solidrun/
+
 SOCIONEXT (SNI) AVE NETWORK DRIVER
 M:     Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
 L:     netdev@vger.kernel.org
@@ -19725,6 +19797,13 @@ F:     include/uapi/sound/
 F:     sound/
 F:     tools/testing/selftests/alsa
 
+SOUND - ALSA SELFTESTS
+M:     Mark Brown <broonie@kernel.org>
+L:     alsa-devel@alsa-project.org (moderated for non-subscribers)
+L:     linux-kselftest@vger.kernel.org
+S:     Supported
+F:     tools/testing/selftests/alsa
+
 SOUND - COMPRESSED AUDIO
 M:     Vinod Koul <vkoul@kernel.org>
 L:     alsa-devel@alsa-project.org (moderated for non-subscribers)
@@ -19743,13 +19822,6 @@ F:     include/sound/dmaengine_pcm.h
 F:     sound/core/pcm_dmaengine.c
 F:     sound/soc/soc-generic-dmaengine-pcm.c
 
-SOUND - ALSA SELFTESTS
-M:     Mark Brown <broonie@kernel.org>
-L:     alsa-devel@alsa-project.org (moderated for non-subscribers)
-L:     linux-kselftest@vger.kernel.org
-S:     Supported
-F:     tools/testing/selftests/alsa
-
 SOUND - SOC LAYER / DYNAMIC AUDIO POWER MANAGEMENT (ASoC)
 M:     Liam Girdwood <lgirdwood@gmail.com>
 M:     Mark Brown <broonie@kernel.org>
@@ -19769,8 +19841,8 @@ M:      Liam Girdwood <lgirdwood@gmail.com>
 M:     Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
 M:     Bard Liao <yung-chuan.liao@linux.intel.com>
 M:     Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
-R:     Kai Vehmanen <kai.vehmanen@linux.intel.com>
 M:     Daniel Baluta <daniel.baluta@nxp.com>
+R:     Kai Vehmanen <kai.vehmanen@linux.intel.com>
 L:     sound-open-firmware@alsa-project.org (moderated for non-subscribers)
 S:     Supported
 W:     https://github.com/thesofproject/linux/
@@ -19832,9 +19904,9 @@ M:      "Luc Van Oostenryck" <luc.vanoostenryck@gmail.com>
 L:     linux-sparse@vger.kernel.org
 S:     Maintained
 W:     https://sparse.docs.kernel.org/
-T:     git git://git.kernel.org/pub/scm/devel/sparse/sparse.git
 Q:     https://patchwork.kernel.org/project/linux-sparse/list/
 B:     https://bugzilla.kernel.org/enter_bug.cgi?component=Sparse&product=Tools
+T:     git git://git.kernel.org/pub/scm/devel/sparse/sparse.git
 F:     include/linux/compiler.h
 
 SPEAKUP CONSOLE SPEECH DRIVER
@@ -20203,6 +20275,11 @@ W:     http://www.stlinux.com
 F:     Documentation/networking/device_drivers/ethernet/stmicro/
 F:     drivers/net/ethernet/stmicro/stmmac/
 
+SUN HAPPY MEAL ETHERNET DRIVER
+M:     Sean Anderson <seanga2@gmail.com>
+S:     Maintained
+F:     drivers/net/ethernet/sun/sunhme.*
+
 SUN3/3X
 M:     Sam Creasey <sammy@sammy.net>
 S:     Maintained
@@ -20225,11 +20302,6 @@ L:     netdev@vger.kernel.org
 S:     Maintained
 F:     drivers/net/ethernet/dlink/sundance.c
 
-SUN HAPPY MEAL ETHERNET DRIVER
-M:     Sean Anderson <seanga2@gmail.com>
-S:     Maintained
-F:     drivers/net/ethernet/sun/sunhme.*
-
 SUNPLUS ETHERNET DRIVER
 M:     Wells Lu <wellslutw@gmail.com>
 L:     netdev@vger.kernel.org
@@ -20251,15 +20323,6 @@ S:     Maintained
 F:     Documentation/devicetree/bindings/nvmem/sunplus,sp7021-ocotp.yaml
 F:     drivers/nvmem/sunplus-ocotp.c
 
-SUNPLUS USB2 PHY DRIVER
-M:     Vincent Shih <vincent.sunplus@gmail.com>
-L:     linux-usb@vger.kernel.org
-S:     Maintained
-F:     Documentation/devicetree/bindings/phy/sunplus,sp7021-usb2-phy.yaml
-F:     drivers/phy/sunplus/Kconfig
-F:     drivers/phy/sunplus/Makefile
-F:     drivers/phy/sunplus/phy-sunplus-usb2.c
-
 SUNPLUS PWM DRIVER
 M:     Hammer Hsieh <hammerh0314@gmail.com>
 S:     Maintained
@@ -20286,6 +20349,15 @@ S:     Maintained
 F:     Documentation/devicetree/bindings/serial/sunplus,sp7021-uart.yaml
 F:     drivers/tty/serial/sunplus-uart.c
 
+SUNPLUS USB2 PHY DRIVER
+M:     Vincent Shih <vincent.sunplus@gmail.com>
+L:     linux-usb@vger.kernel.org
+S:     Maintained
+F:     Documentation/devicetree/bindings/phy/sunplus,sp7021-usb2-phy.yaml
+F:     drivers/phy/sunplus/Kconfig
+F:     drivers/phy/sunplus/Makefile
+F:     drivers/phy/sunplus/phy-sunplus-usb2.c
+
 SUNPLUS WATCHDOG DRIVER
 M:     Xiantao Hu <xt.hu@cqplus1.com>
 L:     linux-watchdog@vger.kernel.org
@@ -20697,6 +20769,14 @@ F:     include/linux/if_team.h
 F:     include/uapi/linux/if_team.h
 F:     tools/testing/selftests/drivers/net/team/
 
+TECHNICAL ADVISORY BOARD PROCESS DOCS
+M:     "Theodore Ts'o" <tytso@mit.edu>
+M:     Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+L:     tech-board-discuss@lists.linux-foundation.org
+S:     Maintained
+F:     Documentation/process/contribution-maturity-model.rst
+F:     Documentation/process/researcher-guidelines.rst
+
 TECHNOLOGIC SYSTEMS TS-5500 PLATFORM SUPPORT
 M:     "Savoir-faire Linux Inc." <kernel@savoirfairelinux.com>
 S:     Maintained
@@ -20776,6 +20856,14 @@ M:     Thierry Reding <thierry.reding@gmail.com>
 S:     Supported
 F:     drivers/pwm/pwm-tegra.c
 
+TEGRA QUAD SPI DRIVER
+M:     Thierry Reding <thierry.reding@gmail.com>
+M:     Jonathan Hunter <jonathanh@nvidia.com>
+M:     Sowjanya Komatineni <skomatineni@nvidia.com>
+L:     linux-tegra@vger.kernel.org
+S:     Maintained
+F:     drivers/spi/spi-tegra210-quad.c
+
 TEGRA SERIAL DRIVER
 M:     Laxman Dewangan <ldewangan@nvidia.com>
 S:     Supported
@@ -20786,14 +20874,6 @@ M:     Laxman Dewangan <ldewangan@nvidia.com>
 S:     Supported
 F:     drivers/spi/spi-tegra*
 
-TEGRA QUAD SPI DRIVER
-M:     Thierry Reding <thierry.reding@gmail.com>
-M:     Jonathan Hunter <jonathanh@nvidia.com>
-M:     Sowjanya Komatineni <skomatineni@nvidia.com>
-L:     linux-tegra@vger.kernel.org
-S:     Maintained
-F:     drivers/spi/spi-tegra210-quad.c
-
 TEGRA VIDEO DRIVER
 M:     Thierry Reding <thierry.reding@gmail.com>
 M:     Jonathan Hunter <jonathanh@nvidia.com>
@@ -20842,13 +20922,6 @@ S:     Maintained
 F:     Documentation/devicetree/bindings/sound/davinci-mcasp-audio.yaml
 F:     sound/soc/ti/
 
-TEXAS INSTRUMENTS' DAC7612 DAC DRIVER
-M:     Ricardo Ribalda <ribalda@kernel.org>
-L:     linux-iio@vger.kernel.org
-S:     Supported
-F:     Documentation/devicetree/bindings/iio/dac/ti,dac7612.yaml
-F:     drivers/iio/dac/ti-dac7612.c
-
 TEXAS INSTRUMENTS DMA DRIVERS
 M:     Peter Ujfalusi <peter.ujfalusi@gmail.com>
 L:     dmaengine@vger.kernel.org
@@ -20857,10 +20930,26 @@ F:    Documentation/devicetree/bindings/dma/ti-dma-crossbar.txt
 F:     Documentation/devicetree/bindings/dma/ti-edma.txt
 F:     Documentation/devicetree/bindings/dma/ti/
 F:     drivers/dma/ti/
-X:     drivers/dma/ti/cppi41.c
+F:     include/linux/dma/k3-psil.h
 F:     include/linux/dma/k3-udma-glue.h
 F:     include/linux/dma/ti-cppi5.h
-F:     include/linux/dma/k3-psil.h
+X:     drivers/dma/ti/cppi41.c
+
+TEXAS INSTRUMENTS TPS23861 PoE PSE DRIVER
+M:     Robert Marko <robert.marko@sartura.hr>
+M:     Luka Perkov <luka.perkov@sartura.hr>
+L:     linux-hwmon@vger.kernel.org
+S:     Maintained
+F:     Documentation/devicetree/bindings/hwmon/ti,tps23861.yaml
+F:     Documentation/hwmon/tps23861.rst
+F:     drivers/hwmon/tps23861.c
+
+TEXAS INSTRUMENTS' DAC7612 DAC DRIVER
+M:     Ricardo Ribalda <ribalda@kernel.org>
+L:     linux-iio@vger.kernel.org
+S:     Supported
+F:     Documentation/devicetree/bindings/iio/dac/ti,dac7612.yaml
+F:     drivers/iio/dac/ti-dac7612.c
 
 TEXAS INSTRUMENTS' SYSTEM CONTROL INTERFACE (TISCI) PROTOCOL DRIVER
 M:     Nishanth Menon <nm@ti.com>
@@ -20886,15 +20975,6 @@ F:     include/dt-bindings/soc/ti,sci_pm_domain.h
 F:     include/linux/soc/ti/ti_sci_inta_msi.h
 F:     include/linux/soc/ti/ti_sci_protocol.h
 
-TEXAS INSTRUMENTS TPS23861 PoE PSE DRIVER
-M:     Robert Marko <robert.marko@sartura.hr>
-M:     Luka Perkov <luka.perkov@sartura.hr>
-L:     linux-hwmon@vger.kernel.org
-S:     Maintained
-F:     Documentation/devicetree/bindings/hwmon/ti,tps23861.yaml
-F:     Documentation/hwmon/tps23861.rst
-F:     drivers/hwmon/tps23861.c
-
 TEXAS INSTRUMENTS' TMP117 TEMPERATURE SENSOR DRIVER
 M:     Puranjay Mohan <puranjay12@gmail.com>
 L:     linux-iio@vger.kernel.org
@@ -21371,8 +21451,8 @@ M:      Steven Rostedt <rostedt@goodmis.org>
 M:     Masami Hiramatsu <mhiramat@kernel.org>
 L:     linux-kernel@vger.kernel.org
 L:     linux-trace-kernel@vger.kernel.org
-Q:     https://patchwork.kernel.org/project/linux-trace-kernel/list/
 S:     Maintained
+Q:     https://patchwork.kernel.org/project/linux-trace-kernel/list/
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
 F:     Documentation/trace/*
 F:     fs/tracefs/
@@ -21400,31 +21480,15 @@ TRACING OS NOISE / LATENCY TRACERS
 M:     Steven Rostedt <rostedt@goodmis.org>
 M:     Daniel Bristot de Oliveira <bristot@kernel.org>
 S:     Maintained
-F:     kernel/trace/trace_osnoise.c
+F:     Documentation/trace/hwlat_detector.rst
+F:     Documentation/trace/osnoise-tracer.rst
+F:     Documentation/trace/timerlat-tracer.rst
+F:     arch/*/kernel/trace.c
 F:     include/trace/events/osnoise.h
 F:     kernel/trace/trace_hwlat.c
 F:     kernel/trace/trace_irqsoff.c
+F:     kernel/trace/trace_osnoise.c
 F:     kernel/trace/trace_sched_wakeup.c
-F:     Documentation/trace/osnoise-tracer.rst
-F:     Documentation/trace/timerlat-tracer.rst
-F:     Documentation/trace/hwlat_detector.rst
-F:     arch/*/kernel/trace.c
-
-Real-time Linux Analysis (RTLA) tools
-M:     Daniel Bristot de Oliveira <bristot@kernel.org>
-M:     Steven Rostedt <rostedt@goodmis.org>
-L:     linux-trace-devel@vger.kernel.org
-S:     Maintained
-F:     Documentation/tools/rtla/
-F:     tools/tracing/rtla/
-
-TECHNICAL ADVISORY BOARD PROCESS DOCS
-M:     "Theodore Ts'o" <tytso@mit.edu>
-M:     Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-L:     tech-board-discuss@lists.linux-foundation.org
-S:     Maintained
-F:     Documentation/process/researcher-guidelines.rst
-F:     Documentation/process/contribution-maturity-model.rst
 
 TRADITIONAL CHINESE DOCUMENTATION
 M:     Hu Haowen <src.res@email.cn>
@@ -21782,8 +21846,8 @@ USB ISP1760 DRIVER
 M:     Rui Miguel Silva <rui.silva@linaro.org>
 L:     linux-usb@vger.kernel.org
 S:     Maintained
-F:     drivers/usb/isp1760/*
 F:     Documentation/devicetree/bindings/usb/nxp,isp1760.yaml
+F:     drivers/usb/isp1760/*
 
 USB LAN78XX ETHERNET DRIVER
 M:     Woojung Huh <woojung.huh@microchip.com>
@@ -21854,6 +21918,13 @@ L:     linux-usb@vger.kernel.org
 S:     Supported
 F:     drivers/usb/class/usblp.c
 
+USB QMI WWAN NETWORK DRIVER
+M:     Bjørn Mork <bjorn@mork.no>
+L:     netdev@vger.kernel.org
+S:     Maintained
+F:     Documentation/ABI/testing/sysfs-class-net-qmi
+F:     drivers/net/usb/qmi_wwan.c
+
 USB RAW GADGET DRIVER
 R:     Andrey Konovalov <andreyknvl@gmail.com>
 L:     linux-usb@vger.kernel.org
@@ -21862,13 +21933,6 @@ F:     Documentation/usb/raw-gadget.rst
 F:     drivers/usb/gadget/legacy/raw_gadget.c
 F:     include/uapi/linux/usb/raw_gadget.h
 
-USB QMI WWAN NETWORK DRIVER
-M:     Bjørn Mork <bjorn@mork.no>
-L:     netdev@vger.kernel.org
-S:     Maintained
-F:     Documentation/ABI/testing/sysfs-class-net-qmi
-F:     drivers/net/usb/qmi_wwan.c
-
 USB RTL8150 DRIVER
 M:     Petko Manolov <petkan@nucleusys.com>
 L:     linux-usb@vger.kernel.org
@@ -22120,6 +22184,12 @@ F:     drivers/vfio/mdev/
 F:     include/linux/mdev.h
 F:     samples/vfio-mdev/
 
+VFIO MLX5 PCI DRIVER
+M:     Yishai Hadas <yishaih@nvidia.com>
+L:     kvm@vger.kernel.org
+S:     Maintained
+F:     drivers/vfio/pci/mlx5/
+
 VFIO PCI DEVICE SPECIFIC DRIVERS
 R:     Jason Gunthorpe <jgg@nvidia.com>
 R:     Yishai Hadas <yishaih@nvidia.com>
@@ -22136,12 +22206,6 @@ L:     kvm@vger.kernel.org
 S:     Maintained
 F:     drivers/vfio/platform/
 
-VFIO MLX5 PCI DRIVER
-M:     Yishai Hadas <yishaih@nvidia.com>
-L:     kvm@vger.kernel.org
-S:     Maintained
-F:     drivers/vfio/pci/mlx5/
-
 VGA_SWITCHEROO
 R:     Lukas Wunner <lukas@wunner.de>
 S:     Maintained
@@ -22151,8 +22215,8 @@ F:      drivers/gpu/vga/vga_switcheroo.c
 F:     include/linux/vga_switcheroo.h
 
 VIA RHINE NETWORK DRIVER
-S:     Maintained
 M:     Kevin Brace <kevinbrace@bracecomputerlab.com>
+S:     Maintained
 F:     drivers/net/ethernet/via/via-rhine.c
 
 VIA SD/MMC CARD CONTROLLER DRIVER
@@ -22204,6 +22268,14 @@ S:     Maintained
 F:     drivers/media/common/videobuf2/*
 F:     include/media/videobuf2-*
 
+VIDTV VIRTUAL DIGITAL TV DRIVER
+M:     Daniel W. S. Almeida <dwlsalmeida@gmail.com>
+L:     linux-media@vger.kernel.org
+S:     Maintained
+W:     https://linuxtv.org
+T:     git git://linuxtv.org/media_tree.git
+F:     drivers/media/test-drivers/vidtv/*
+
 VIMC VIRTUAL MEDIA CONTROLLER DRIVER
 M:     Shuah Khan <skhan@linuxfoundation.org>
 R:     Kieran Bingham <kieran.bingham@ideasonboard.com>
@@ -22233,6 +22305,16 @@ F:     include/uapi/linux/virtio_vsock.h
 F:     net/vmw_vsock/virtio_transport.c
 F:     net/vmw_vsock/virtio_transport_common.c
 
+VIRTIO BALLOON
+M:     "Michael S. Tsirkin" <mst@redhat.com>
+M:     David Hildenbrand <david@redhat.com>
+L:     virtualization@lists.linux-foundation.org
+S:     Maintained
+F:     drivers/virtio/virtio_balloon.c
+F:     include/linux/balloon_compaction.h
+F:     include/uapi/linux/virtio_balloon.h
+F:     mm/balloon_compaction.c
+
 VIRTIO BLOCK AND SCSI DRIVERS
 M:     "Michael S. Tsirkin" <mst@redhat.com>
 M:     Jason Wang <jasowang@redhat.com>
@@ -22275,30 +22357,6 @@ F:     include/linux/vringh.h
 F:     include/uapi/linux/virtio_*.h
 F:     tools/virtio/
 
-VISL VIRTUAL STATELESS DECODER DRIVER
-M:     Daniel Almeida <daniel.almeida@collabora.com>
-L:     linux-media@vger.kernel.org
-S:     Supported
-F:     drivers/media/test-drivers/visl
-
-IFCVF VIRTIO DATA PATH ACCELERATOR
-R:     Zhu Lingshan <lingshan.zhu@intel.com>
-F:     drivers/vdpa/ifcvf/
-
-SNET DPU VIRTIO DATA PATH ACCELERATOR
-R:     Alvaro Karsz <alvaro.karsz@solid-run.com>
-F:     drivers/vdpa/solidrun/
-
-VIRTIO BALLOON
-M:     "Michael S. Tsirkin" <mst@redhat.com>
-M:     David Hildenbrand <david@redhat.com>
-L:     virtualization@lists.linux-foundation.org
-S:     Maintained
-F:     drivers/virtio/virtio_balloon.c
-F:     include/uapi/linux/virtio_balloon.h
-F:     include/linux/balloon_compaction.h
-F:     mm/balloon_compaction.c
-
 VIRTIO CRYPTO DRIVER
 M:     Gonglei <arei.gonglei@huawei.com>
 L:     virtualization@lists.linux-foundation.org
@@ -22359,11 +22417,20 @@ L:    virtualization@lists.linux-foundation.org
 L:     netdev@vger.kernel.org
 S:     Maintained
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git
-F:     kernel/vhost_task.c
 F:     drivers/vhost/
 F:     include/linux/sched/vhost_task.h
 F:     include/linux/vhost_iotlb.h
 F:     include/uapi/linux/vhost.h
+F:     kernel/vhost_task.c
+
+VIRTIO I2C DRIVER
+M:     Conghui Chen <conghui.chen@intel.com>
+M:     Viresh Kumar <viresh.kumar@linaro.org>
+L:     linux-i2c@vger.kernel.org
+L:     virtualization@lists.linux-foundation.org
+S:     Maintained
+F:     drivers/i2c/busses/i2c-virtio.c
+F:     include/uapi/linux/virtio_i2c.h
 
 VIRTIO INPUT DRIVER
 M:     Gerd Hoffmann <kraxel@redhat.com>
@@ -22386,6 +22453,13 @@ W:     https://virtio-mem.gitlab.io/
 F:     drivers/virtio/virtio_mem.c
 F:     include/uapi/linux/virtio_mem.h
 
+VIRTIO PMEM DRIVER
+M:     Pankaj Gupta <pankaj.gupta.linux@gmail.com>
+L:     virtualization@lists.linux-foundation.org
+S:     Maintained
+F:     drivers/nvdimm/nd_virtio.c
+F:     drivers/nvdimm/virtio_pmem.c
+
 VIRTIO SOUND DRIVER
 M:     Anton Yakovlev <anton.yakovlev@opensynergy.com>
 M:     "Michael S. Tsirkin" <mst@redhat.com>
@@ -22395,22 +22469,6 @@ S:     Maintained
 F:     include/uapi/linux/virtio_snd.h
 F:     sound/virtio/*
 
-VIRTIO I2C DRIVER
-M:     Conghui Chen <conghui.chen@intel.com>
-M:     Viresh Kumar <viresh.kumar@linaro.org>
-L:     linux-i2c@vger.kernel.org
-L:     virtualization@lists.linux-foundation.org
-S:     Maintained
-F:     drivers/i2c/busses/i2c-virtio.c
-F:     include/uapi/linux/virtio_i2c.h
-
-VIRTIO PMEM DRIVER
-M:     Pankaj Gupta <pankaj.gupta.linux@gmail.com>
-L:     virtualization@lists.linux-foundation.org
-S:     Maintained
-F:     drivers/nvdimm/virtio_pmem.c
-F:     drivers/nvdimm/nd_virtio.c
-
 VIRTUAL BOX GUEST DEVICE DRIVER
 M:     Hans de Goede <hdegoede@redhat.com>
 M:     Arnd Bergmann <arnd@arndb.de>
@@ -22432,6 +22490,12 @@ S:     Maintained
 F:     drivers/input/serio/userio.c
 F:     include/uapi/linux/userio.h
 
+VISL VIRTUAL STATELESS DECODER DRIVER
+M:     Daniel Almeida <daniel.almeida@collabora.com>
+L:     linux-media@vger.kernel.org
+S:     Supported
+F:     drivers/media/test-drivers/visl
+
 VIVID VIRTUAL VIDEO DRIVER
 M:     Hans Verkuil <hverkuil@xs4all.nl>
 L:     linux-media@vger.kernel.org
@@ -22440,14 +22504,6 @@ W:     https://linuxtv.org
 T:     git git://linuxtv.org/media_tree.git
 F:     drivers/media/test-drivers/vivid/*
 
-VIDTV VIRTUAL DIGITAL TV DRIVER
-M:     Daniel W. S. Almeida <dwlsalmeida@gmail.com>
-L:     linux-media@vger.kernel.org
-S:     Maintained
-W:     https://linuxtv.org
-T:     git git://linuxtv.org/media_tree.git
-F:     drivers/media/test-drivers/vidtv/*
-
 VLYNQ BUS
 M:     Florian Fainelli <f.fainelli@gmail.com>
 L:     openwrt-devel@lists.openwrt.org (subscribers-only)
@@ -22455,16 +22511,6 @@ S:     Maintained
 F:     drivers/vlynq/vlynq.c
 F:     include/linux/vlynq.h
 
-VME SUBSYSTEM
-M:     Martyn Welch <martyn@welchs.me.uk>
-M:     Manohar Vanga <manohar.vanga@gmail.com>
-M:     Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-L:     linux-kernel@vger.kernel.org
-S:     Odd fixes
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
-F:     Documentation/driver-api/vme.rst
-F:     drivers/staging/vme_user/
-
 VM SOCKETS (AF_VSOCK)
 M:     Stefano Garzarella <sgarzare@redhat.com>
 L:     virtualization@lists.linux-foundation.org
@@ -22478,6 +22524,28 @@ F:     include/uapi/linux/vsockmon.h
 F:     net/vmw_vsock/
 F:     tools/testing/vsock/
 
+VMALLOC
+M:     Andrew Morton <akpm@linux-foundation.org>
+R:     Uladzislau Rezki <urezki@gmail.com>
+R:     Christoph Hellwig <hch@infradead.org>
+R:     Lorenzo Stoakes <lstoakes@gmail.com>
+L:     linux-mm@kvack.org
+S:     Maintained
+W:     http://www.linux-mm.org
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
+F:     include/linux/vmalloc.h
+F:     mm/vmalloc.c
+
+VME SUBSYSTEM
+M:     Martyn Welch <martyn@welchs.me.uk>
+M:     Manohar Vanga <manohar.vanga@gmail.com>
+M:     Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+L:     linux-kernel@vger.kernel.org
+S:     Odd fixes
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
+F:     Documentation/driver-api/vme.rst
+F:     drivers/staging/vme_user/
+
 VMWARE BALLOON DRIVER
 M:     Nadav Amit <namit@vmware.com>
 R:     VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
@@ -22486,7 +22554,7 @@ S:      Supported
 F:     drivers/misc/vmw_balloon.c
 
 VMWARE HYPERVISOR INTERFACE
-M:     Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
+M:     Ajay Kaher <akaher@vmware.com>
 M:     Alexey Makhalov <amakhalov@vmware.com>
 R:     VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
 L:     virtualization@lists.linux-foundation.org
@@ -22513,8 +22581,8 @@ F:      drivers/scsi/vmw_pvscsi.c
 F:     drivers/scsi/vmw_pvscsi.h
 
 VMWARE VIRTUAL PTP CLOCK DRIVER
-M:     Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
 M:     Deep Shah <sdeep@vmware.com>
+R:     Ajay Kaher <akaher@vmware.com>
 R:     Alexey Makhalov <amakhalov@vmware.com>
 R:     VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
 L:     netdev@vger.kernel.org
@@ -22659,9 +22727,9 @@ F:      drivers/input/tablet/wacom_serial4.c
 WANGXUN ETHERNET DRIVER
 M:     Jiawen Wu <jiawenwu@trustnetic.com>
 M:     Mengyuan Lou <mengyuanlou@net-swift.com>
-W:     https://www.net-swift.com
 L:     netdev@vger.kernel.org
 S:     Maintained
+W:     https://www.net-swift.com
 F:     Documentation/networking/device_drivers/ethernet/wangxun/*
 F:     drivers/net/ethernet/wangxun/
 
@@ -22676,8 +22744,8 @@ F:      Documentation/devicetree/bindings/watchdog/
 F:     Documentation/watchdog/
 F:     drivers/watchdog/
 F:     include/linux/watchdog.h
-F:     include/uapi/linux/watchdog.h
 F:     include/trace/events/watchdog.h
+F:     include/uapi/linux/watchdog.h
 
 WHISKEYCOVE PMIC GPIO DRIVER
 M:     Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
@@ -22834,8 +22902,8 @@ R:      "H. Peter Anvin" <hpa@zytor.com>
 L:     linux-kernel@vger.kernel.org
 S:     Maintained
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/core
-F:     Documentation/devicetree/bindings/x86/
 F:     Documentation/arch/x86/
+F:     Documentation/devicetree/bindings/x86/
 F:     arch/x86/
 
 X86 ENTRY CODE
@@ -22966,6 +23034,8 @@ M:      John Fastabend <john.fastabend@gmail.com>
 L:     netdev@vger.kernel.org
 L:     bpf@vger.kernel.org
 S:     Supported
+F:     drivers/net/ethernet/*/*/*/*/*xdp*
+F:     drivers/net/ethernet/*/*/*xdp*
 F:     include/net/xdp.h
 F:     include/net/xdp_priv.h
 F:     include/trace/events/xdp.h
@@ -22973,10 +23043,8 @@ F:     kernel/bpf/cpumap.c
 F:     kernel/bpf/devmap.c
 F:     net/core/xdp.c
 F:     samples/bpf/xdp*
-F:     tools/testing/selftests/bpf/*xdp*
 F:     tools/testing/selftests/bpf/*/*xdp*
-F:     drivers/net/ethernet/*/*/*/*/*xdp*
-F:     drivers/net/ethernet/*/*/*xdp*
+F:     tools/testing/selftests/bpf/*xdp*
 K:     (?:\b|_)xdp(?:\b|_)
 
 XDP SOCKETS (AF_XDP)
@@ -22988,11 +23056,11 @@ L:    netdev@vger.kernel.org
 L:     bpf@vger.kernel.org
 S:     Maintained
 F:     Documentation/networking/af_xdp.rst
+F:     include/net/netns/xdp.h
 F:     include/net/xdp_sock*
 F:     include/net/xsk_buff_pool.h
 F:     include/uapi/linux/if_xdp.h
 F:     include/uapi/linux/xdp_diag.h
-F:     include/net/netns/xdp.h
 F:     net/xdp/
 F:     tools/testing/selftests/bpf/*xsk*
 
@@ -23094,11 +23162,11 @@ F:    include/xen/arm/swiotlb-xen.h
 F:     include/xen/swiotlb-xen.h
 
 XFS FILESYSTEM
-C:     irc://irc.oftc.net/xfs
 M:     Darrick J. Wong <djwong@kernel.org>
 L:     linux-xfs@vger.kernel.org
 S:     Supported
 W:     http://xfs.org/
+C:     irc://irc.oftc.net/xfs
 T:     git git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git
 F:     Documentation/ABI/testing/sysfs-fs-xfs
 F:     Documentation/admin-guide/xfs.rst
@@ -23128,16 +23196,28 @@ S:    Maintained
 F:     Documentation/devicetree/bindings/net/can/xilinx,can.yaml
 F:     drivers/net/can/xilinx_can.c
 
+XILINX EVENT MANAGEMENT DRIVER
+M:     Abhyuday Godhasara <abhyuday.godhasara@xilinx.com>
+S:     Maintained
+F:     drivers/soc/xilinx/xlnx_event_manager.c
+F:     include/linux/firmware/xlnx-event-manager.h
+
 XILINX GPIO DRIVER
 M:     Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com>
 R:     Srinivas Neeli <srinivas.neeli@xilinx.com>
 R:     Michal Simek <michal.simek@amd.com>
 S:     Maintained
-F:     Documentation/devicetree/bindings/gpio/xlnx,gpio-xilinx.yaml
 F:     Documentation/devicetree/bindings/gpio/gpio-zynq.yaml
+F:     Documentation/devicetree/bindings/gpio/xlnx,gpio-xilinx.yaml
 F:     drivers/gpio/gpio-xilinx.c
 F:     drivers/gpio/gpio-zynq.c
 
+XILINX PWM DRIVER
+M:     Sean Anderson <sean.anderson@seco.com>
+S:     Maintained
+F:     drivers/pwm/pwm-xilinx.c
+F:     include/clocksource/timer-xilinx.h
+
 XILINX SD-FEC IP CORES
 M:     Derek Kiernan <derek.kiernan@xilinx.com>
 M:     Dragan Cvetic <dragan.cvetic@xilinx.com>
@@ -23149,12 +23229,6 @@ F:     drivers/misc/Makefile
 F:     drivers/misc/xilinx_sdfec.c
 F:     include/uapi/misc/xilinx_sdfec.h
 
-XILINX PWM DRIVER
-M:     Sean Anderson <sean.anderson@seco.com>
-S:     Maintained
-F:     drivers/pwm/pwm-xilinx.c
-F:     include/clocksource/timer-xilinx.h
-
 XILINX UARTLITE SERIAL DRIVER
 M:     Peter Korsgaard <jacmet@sunsite.dk>
 L:     linux-serial@vger.kernel.org
@@ -23220,12 +23294,6 @@ M:     Harsha <harsha.harsha@xilinx.com>
 S:     Maintained
 F:     drivers/crypto/xilinx/zynqmp-sha.c
 
-XILINX EVENT MANAGEMENT DRIVER
-M:     Abhyuday Godhasara <abhyuday.godhasara@xilinx.com>
-S:     Maintained
-F:     drivers/soc/xilinx/xlnx_event_manager.c
-F:     include/linux/firmware/xlnx-event-manager.h
-
 XILLYBUS DRIVER
 M:     Eli Billauer <eli.billauer@gmail.com>
 L:     linux-kernel@vger.kernel.org
@@ -23273,6 +23341,13 @@ S:     Maintained
 F:     Documentation/input/devices/yealink.rst
 F:     drivers/input/misc/yealink.*
 
+Z3FOLD COMPRESSED PAGE ALLOCATOR
+M:     Vitaly Wool <vitaly.wool@konsulko.com>
+R:     Miaohe Lin <linmiaohe@huawei.com>
+L:     linux-mm@kvack.org
+S:     Maintained
+F:     mm/z3fold.c
+
 Z8530 DRIVER FOR AX.25
 M:     Joerg Reuter <jreuter@yaina.de>
 L:     linux-hams@vger.kernel.org
@@ -23290,13 +23365,6 @@ L:     linux-mm@kvack.org
 S:     Maintained
 F:     mm/zbud.c
 
-Z3FOLD COMPRESSED PAGE ALLOCATOR
-M:     Vitaly Wool <vitaly.wool@konsulko.com>
-R:     Miaohe Lin <linmiaohe@huawei.com>
-L:     linux-mm@kvack.org
-S:     Maintained
-F:     mm/z3fold.c
-
 ZD1211RW WIRELESS DRIVER
 M:     Ulrich Kunitz <kune@deine-taler.de>
 L:     linux-wireless@vger.kernel.org
@@ -23383,10 +23451,10 @@ M:    Nick Terrell <terrelln@fb.com>
 S:     Maintained
 B:     https://github.com/facebook/zstd/issues
 T:     git https://github.com/terrelln/linux.git
+F:     crypto/zstd.c
 F:     include/linux/zstd*
-F:     lib/zstd/
 F:     lib/decompress_unzstd.c
-F:     crypto/zstd.c
+F:     lib/zstd/
 N:     zstd
 K:     zstd
 
@@ -23398,13 +23466,6 @@ L:     linux-mm@kvack.org
 S:     Maintained
 F:     mm/zswap.c
 
-NXP BLUETOOTH WIRELESS DRIVERS
-M:     Amitkumar Karwar <amitkumar.karwar@nxp.com>
-M:     Neeraj Kale <neeraj.sanjaykale@nxp.com>
-S:     Maintained
-F:     Documentation/devicetree/bindings/net/bluetooth/nxp,88w8987-bt.yaml
-F:     drivers/bluetooth/btnxpuart.c
-
 THE REST
 M:     Linus Torvalds <torvalds@linux-foundation.org>
 L:     linux-kernel@vger.kernel.org
index 9d765eb..e51e4d9 100644 (file)
--- a/Makefile
+++ b/Makefile
@@ -2,7 +2,7 @@
 VERSION = 6
 PATCHLEVEL = 4
 SUBLEVEL = 0
-EXTRAVERSION = -rc1
+EXTRAVERSION =
 NAME = Hurr durr I'ma ninja sloth
 
 # *DOCUMENTATION*
index 205fd23..d6a6865 100644 (file)
@@ -34,6 +34,29 @@ config ARCH_HAS_SUBPAGE_FAULTS
 config HOTPLUG_SMT
        bool
 
+# Selected by HOTPLUG_CORE_SYNC_DEAD or HOTPLUG_CORE_SYNC_FULL
+config HOTPLUG_CORE_SYNC
+       bool
+
+# Basic CPU dead synchronization selected by architecture
+config HOTPLUG_CORE_SYNC_DEAD
+       bool
+       select HOTPLUG_CORE_SYNC
+
+# Full CPU synchronization with alive state selected by architecture
+config HOTPLUG_CORE_SYNC_FULL
+       bool
+       select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
+       select HOTPLUG_CORE_SYNC
+
+config HOTPLUG_SPLIT_STARTUP
+       bool
+       select HOTPLUG_CORE_SYNC_FULL
+
+config HOTPLUG_PARALLEL
+       bool
+       select HOTPLUG_SPLIT_STARTUP
+
 config GENERIC_ENTRY
        bool
 
@@ -285,6 +308,9 @@ config ARCH_HAS_DMA_SET_UNCACHED
 config ARCH_HAS_DMA_CLEAR_UNCACHED
        bool
 
+config ARCH_HAS_CPU_FINALIZE_INIT
+       bool
+
 # Select if arch init_task must go in the __init_task_data section
 config ARCH_TASK_STRUCT_ON_STACK
        bool
@@ -1188,13 +1214,6 @@ config COMPAT_32BIT_TIME
 config ARCH_NO_PREEMPT
        bool
 
-config ARCH_EPHEMERAL_INODES
-       def_bool n
-       help
-         An arch should select this symbol if it doesn't keep track of inode
-         instances on its own, but instead relies on something else (e.g. the
-         host kernel for an UML kernel).
-
 config ARCH_SUPPORTS_RT
        bool
 
index f2861a4..cbd9244 100644 (file)
@@ -200,25 +200,6 @@ ATOMIC_OPS(xor, xor)
 #undef ATOMIC_OP_RETURN
 #undef ATOMIC_OP
 
-#define arch_atomic64_cmpxchg(v, old, new) \
-       (arch_cmpxchg(&((v)->counter), old, new))
-#define arch_atomic64_xchg(v, new) \
-       (arch_xchg(&((v)->counter), new))
-
-#define arch_atomic_cmpxchg(v, old, new) \
-       (arch_cmpxchg(&((v)->counter), old, new))
-#define arch_atomic_xchg(v, new) \
-       (arch_xchg(&((v)->counter), new))
-
-/**
- * arch_atomic_fetch_add_unless - add unless the number is a given value
- * @v: pointer of type atomic_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
- *
- * Atomically adds @a to @v, so long as it was not @u.
- * Returns the old value of @v.
- */
 static __inline__ int arch_atomic_fetch_add_unless(atomic_t *v, int a, int u)
 {
        int c, new, old;
@@ -242,15 +223,6 @@ static __inline__ int arch_atomic_fetch_add_unless(atomic_t *v, int a, int u)
 }
 #define arch_atomic_fetch_add_unless arch_atomic_fetch_add_unless
 
-/**
- * arch_atomic64_fetch_add_unless - add unless the number is a given value
- * @v: pointer of type atomic64_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
- *
- * Atomically adds @a to @v, so long as it was not @u.
- * Returns the old value of @v.
- */
 static __inline__ s64 arch_atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
 {
        s64 c, new, old;
@@ -274,13 +246,6 @@ static __inline__ s64 arch_atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u
 }
 #define arch_atomic64_fetch_add_unless arch_atomic64_fetch_add_unless
 
-/*
- * arch_atomic64_dec_if_positive - decrement by 1 if old value positive
- * @v: pointer of type atomic_t
- *
- * The function returns the old value of *v minus 1, even if
- * the atomic variable, v, was not decremented.
- */
 static inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
 {
        s64 old, tmp;
diff --git a/arch/alpha/include/asm/bugs.h b/arch/alpha/include/asm/bugs.h
deleted file mode 100644 (file)
index 78030d1..0000000
+++ /dev/null
@@ -1,20 +0,0 @@
-/*
- *  include/asm-alpha/bugs.h
- *
- *  Copyright (C) 1994  Linus Torvalds
- */
-
-/*
- * This is included by init/main.c to check for architecture-dependent bugs.
- *
- * Needs:
- *     void check_bugs(void);
- */
-
-/*
- * I don't know of any alpha bugs yet.. Nice chip
- */
-
-static void check_bugs(void)
-{
-}
index 2a9a877..d98701e 100644 (file)
@@ -1014,8 +1014,6 @@ SYSCALL_DEFINE2(osf_settimeofday, struct timeval32 __user *, tv,
        return do_sys_settimeofday64(tv ? &kts : NULL, tz ? &ktz : NULL);
 }
 
-asmlinkage long sys_ni_posix_timers(void);
-
 SYSCALL_DEFINE2(osf_utimes, const char __user *, filename,
                struct timeval32 __user *, tvs)
 {
index 33bf3a6..b650ff1 100644 (file)
@@ -658,7 +658,7 @@ setup_arch(char **cmdline_p)
 #endif
 
        /* Default root filesystem to sda2.  */
-       ROOT_DEV = Root_SDA2;
+       ROOT_DEV = MKDEV(SCSI_DISK0_MAJOR, 2);
 
 #ifdef CONFIG_EISA
        /* FIXME:  only set this when we actually have EISA in this box? */
index 2c83034..89d12a6 100644 (file)
@@ -81,6 +81,11 @@ static inline int arch_atomic_fetch_##op(int i, atomic_t *v)         \
 ATOMIC_OPS(add, +=, add)
 ATOMIC_OPS(sub, -=, sub)
 
+#define arch_atomic_fetch_add          arch_atomic_fetch_add
+#define arch_atomic_fetch_sub          arch_atomic_fetch_sub
+#define arch_atomic_add_return         arch_atomic_add_return
+#define arch_atomic_sub_return         arch_atomic_sub_return
+
 #undef ATOMIC_OPS
 #define ATOMIC_OPS(op, c_op, asm_op)                                   \
        ATOMIC_OP(op, c_op, asm_op)                                     \
@@ -92,7 +97,11 @@ ATOMIC_OPS(or, |=, or)
 ATOMIC_OPS(xor, ^=, xor)
 
 #define arch_atomic_andnot             arch_atomic_andnot
+
+#define arch_atomic_fetch_and          arch_atomic_fetch_and
 #define arch_atomic_fetch_andnot       arch_atomic_fetch_andnot
+#define arch_atomic_fetch_or           arch_atomic_fetch_or
+#define arch_atomic_fetch_xor          arch_atomic_fetch_xor
 
 #undef ATOMIC_OPS
 #undef ATOMIC_FETCH_OP
index 52ee51e..592d7ff 100644 (file)
 #include <asm/atomic-spinlock.h>
 #endif
 
-#define arch_atomic_cmpxchg(v, o, n)                                   \
-({                                                                     \
-       arch_cmpxchg(&((v)->counter), (o), (n));                        \
-})
-
-#ifdef arch_cmpxchg_relaxed
-#define arch_atomic_cmpxchg_relaxed(v, o, n)                           \
-({                                                                     \
-       arch_cmpxchg_relaxed(&((v)->counter), (o), (n));                \
-})
-#endif
-
-#define arch_atomic_xchg(v, n)                                         \
-({                                                                     \
-       arch_xchg(&((v)->counter), (n));                                \
-})
-
-#ifdef arch_xchg_relaxed
-#define arch_atomic_xchg_relaxed(v, n)                                 \
-({                                                                     \
-       arch_xchg_relaxed(&((v)->counter), (n));                        \
-})
-#endif
-
 /*
  * 64-bit atomics
  */
index c5a8010..6b6db98 100644 (file)
@@ -159,6 +159,7 @@ arch_atomic64_cmpxchg(atomic64_t *ptr, s64 expected, s64 new)
 
        return prev;
 }
+#define arch_atomic64_cmpxchg arch_atomic64_cmpxchg
 
 static inline s64 arch_atomic64_xchg(atomic64_t *ptr, s64 new)
 {
@@ -179,14 +180,7 @@ static inline s64 arch_atomic64_xchg(atomic64_t *ptr, s64 new)
 
        return prev;
 }
-
-/**
- * arch_atomic64_dec_if_positive - decrement by 1 if old value positive
- * @v: pointer of type atomic64_t
- *
- * The function returns the old value of *v minus 1, even if
- * the atomic variable, v, was not decremented.
- */
+#define arch_atomic64_xchg arch_atomic64_xchg
 
 static inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
 {
@@ -212,15 +206,6 @@ static inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
 }
 #define arch_atomic64_dec_if_positive arch_atomic64_dec_if_positive
 
-/**
- * arch_atomic64_fetch_add_unless - add unless the number is a given value
- * @v: pointer of type atomic64_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
- *
- * Atomically adds @a to @v, if it was not @u.
- * Returns the old value of @v
- */
 static inline s64 arch_atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
 {
        s64 old, temp;
index 0fb4b21..cef741b 100644 (file)
@@ -5,6 +5,7 @@ config ARM
        select ARCH_32BIT_OFF_T
        select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE if HAVE_KRETPROBES && FRAME_POINTER && !ARM_UNWIND
        select ARCH_HAS_BINFMT_FLAT
+       select ARCH_HAS_CPU_FINALIZE_INIT if MMU
        select ARCH_HAS_CURRENT_STACK_POINTER
        select ARCH_HAS_DEBUG_VIRTUAL if MMU
        select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE
@@ -124,6 +125,7 @@ config ARM
        select HAVE_SYSCALL_TRACEPOINTS
        select HAVE_UID16
        select HAVE_VIRT_CPU_ACCOUNTING_GEN
+       select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
        select IRQ_FORCED_THREADING
        select MODULES_USE_ELF_REL
        select NEED_DMA_MAP_STATE
@@ -1780,7 +1782,7 @@ config VFP
          Say Y to include VFP support code in the kernel. This is needed
          if your hardware includes a VFP unit.
 
-         Please see <file:Documentation/arm/vfp/release-notes.rst> for
+         Please see <file:Documentation/arch/arm/vfp/release-notes.rst> for
          release notes and additional status information.
 
          Say N if your target does not have VFP hardware.
index 1feb6b0..627752f 100644 (file)
@@ -2,6 +2,7 @@
 #include <linux/libfdt_env.h>
 #include <asm/setup.h>
 #include <libfdt.h>
+#include "misc.h"
 
 #if defined(CONFIG_ARM_ATAG_DTB_COMPAT_CMDLINE_EXTEND)
 #define do_extend_cmdline 1
index 9291a26..aa85656 100644 (file)
@@ -3,6 +3,7 @@
 #include <linux/kernel.h>
 #include <linux/libfdt.h>
 #include <linux/sizes.h>
+#include "misc.h"
 
 static const void *get_prop(const void *fdt, const char *node_path,
                            const char *property, int minlen)
index abfed1a..6b4baa6 100644 (file)
@@ -103,9 +103,6 @@ static void putstr(const char *ptr)
 /*
  * gzip declarations
  */
-extern char input_data[];
-extern char input_data_end[];
-
 unsigned char *output_data;
 
 unsigned long free_mem_ptr;
@@ -131,9 +128,6 @@ asmlinkage void __div0(void)
        error("Attempting division by 0!");
 }
 
-extern int do_decompress(u8 *input, int len, u8 *output, void (*error)(char *x));
-
-
 void
 decompress_kernel(unsigned long output_start, unsigned long free_mem_ptr_p,
                unsigned long free_mem_ptr_end_p,
index c958dcc..6da00a2 100644 (file)
@@ -6,5 +6,16 @@
 void error(char *x) __noreturn;
 extern unsigned long free_mem_ptr;
 extern unsigned long free_mem_end_ptr;
+void __div0(void);
+void
+decompress_kernel(unsigned long output_start, unsigned long free_mem_ptr_p,
+                 unsigned long free_mem_ptr_end_p, int arch_id);
+void fortify_panic(const char *name);
+int atags_to_fdt(void *atag_list, void *fdt, int total_space);
+uint32_t fdt_check_mem_start(uint32_t mem_start, const void *fdt);
+int do_decompress(u8 *input, int len, u8 *output, void (*error)(char *x));
+
+extern char input_data[];
+extern char input_data_end[];
 
 #endif
index 2fc9a5d..625b9b3 100644 (file)
 
                interrupt-parent = <&gpio1>;
                interrupts = <31 0>;
-               pendown-gpio = <&gpio1 31 0>;
+               pendown-gpio = <&gpio1 31 GPIO_ACTIVE_LOW>;
 
 
                ti,x-min = /bits/ 16 <0x0>;
index aa5cc0e..217e9b9 100644 (file)
 };
 
 &shdwc {
-       atmel,shdwc-debouncer = <976>;
+       debounce-delay-us = <976>;
        status = "okay";
 
        input@0 {
index 88869ca..045cb25 100644 (file)
                                        compatible = "ti,ads7843";
                                        interrupts-extended = <&pioC 2 IRQ_TYPE_EDGE_BOTH>;
                                        spi-max-frequency = <3000000>;
-                                       pendown-gpio = <&pioC 2 GPIO_ACTIVE_HIGH>;
+                                       pendown-gpio = <&pioC 2 GPIO_ACTIVE_LOW>;
 
                                        ti,x-min = /bits/ 16 <150>;
                                        ti,x-max = /bits/ 16 <3830>;
index 78555a6..7b7e6c2 100644 (file)
        pinctrl-names = "default";
        pinctrl-0 = <&pinctrl_pcie>;
        reset-gpio = <&gpio6 7 GPIO_ACTIVE_LOW>;
+       vpcie-supply = <&reg_pcie>;
        status = "okay";
 };
 
index 5882c75..32a6022 100644 (file)
@@ -8,6 +8,7 @@
 #include <dt-bindings/input/input.h>
 #include <dt-bindings/leds/common.h>
 #include <dt-bindings/pwm/pwm.h>
+#include <dt-bindings/regulator/dlg,da9063-regulator.h>
 #include "imx6ull.dtsi"
 
 / {
 
                regulators {
                        vdd_soc_in_1v4: buck1 {
+                               regulator-allowed-modes = <DA9063_BUCK_MODE_SLEEP>; /* PFM */
                                regulator-always-on;
                                regulator-boot-on;
+                               regulator-initial-mode = <DA9063_BUCK_MODE_SLEEP>;
                                regulator-max-microvolt = <1400000>;
                                regulator-min-microvolt = <1400000>;
                                regulator-name = "vdd_soc_in_1v4";
                        };
 
                        vcc_3v3: buck2 {
+                               regulator-allowed-modes = <DA9063_BUCK_MODE_SYNC>; /* PWM */
                                regulator-always-on;
                                regulator-boot-on;
+                               regulator-initial-mode = <DA9063_BUCK_MODE_SYNC>;
                                regulator-max-microvolt = <3300000>;
                                regulator-min-microvolt = <3300000>;
                                regulator-name = "vcc_3v3";
                         * the voltage is set to 1.5V.
                         */
                        vcc_ddr_1v35: buck3 {
+                               regulator-allowed-modes = <DA9063_BUCK_MODE_SYNC>; /* PWM */
                                regulator-always-on;
                                regulator-boot-on;
+                               regulator-initial-mode = <DA9063_BUCK_MODE_SYNC>;
                                regulator-max-microvolt = <1500000>;
                                regulator-min-microvolt = <1500000>;
                                regulator-name = "vcc_ddr_1v35";
index d917dc4..6ad39dc 100644 (file)
@@ -64,7 +64,7 @@
                interrupt-parent = <&gpio2>;
                interrupts = <7 0>;
                spi-max-frequency = <1000000>;
-               pendown-gpio = <&gpio2 7 0>;
+               pendown-gpio = <&gpio2 7 GPIO_ACTIVE_LOW>;
                vcc-supply = <&reg_3p3v>;
                ti,x-min = /bits/ 16 <0>;
                ti,x-max = /bits/ 16 <4095>;
index f483bc0..234e5fc 100644 (file)
                pinctrl-0 = <&pinctrl_tsc2046_pendown>;
                interrupt-parent = <&gpio2>;
                interrupts = <29 0>;
-               pendown-gpio = <&gpio2 29 GPIO_ACTIVE_HIGH>;
+               pendown-gpio = <&gpio2 29 GPIO_ACTIVE_LOW>;
                touchscreen-max-pressure = <255>;
                wakeup-source;
        };
index e61b8a2..51baedf 100644 (file)
 
                interrupt-parent = <&gpio2>;
                interrupts = <25 0>;            /* gpio_57 */
-               pendown-gpio = <&gpio2 25 GPIO_ACTIVE_HIGH>;
+               pendown-gpio = <&gpio2 25 GPIO_ACTIVE_LOW>;
 
                ti,x-min = /bits/ 16 <0x0>;
                ti,x-max = /bits/ 16 <0x0fff>;
index 3decc2d..a7f99ae 100644 (file)
@@ -54,7 +54,7 @@
 
                interrupt-parent = <&gpio1>;
                interrupts = <27 0>;            /* gpio_27 */
-               pendown-gpio = <&gpio1 27 GPIO_ACTIVE_HIGH>;
+               pendown-gpio = <&gpio1 27 GPIO_ACTIVE_LOW>;
 
                ti,x-min = /bits/ 16 <0x0>;
                ti,x-max = /bits/ 16 <0x0fff>;
index c595afe..d310b5c 100644 (file)
                interrupt-parent = <&gpio1>;
                interrupts = <8 0>;   /* boot6 / gpio_8 */
                spi-max-frequency = <1000000>;
-               pendown-gpio = <&gpio1 8 GPIO_ACTIVE_HIGH>;
+               pendown-gpio = <&gpio1 8 GPIO_ACTIVE_LOW>;
                vcc-supply = <&reg_vcc3>;
                pinctrl-names = "default";
                pinctrl-0 = <&tsc2048_pins>;
index 1d6e88f..c3570ac 100644 (file)
 
                interrupt-parent = <&gpio4>;
                interrupts = <18 0>;                    /* gpio_114 */
-               pendown-gpio = <&gpio4 18 GPIO_ACTIVE_HIGH>;
+               pendown-gpio = <&gpio4 18 GPIO_ACTIVE_LOW>;
 
                ti,x-min = /bits/ 16 <0x0>;
                ti,x-max = /bits/ 16 <0x0fff>;
index 7e30f9d..d95a0e1 100644 (file)
 
                interrupt-parent = <&gpio4>;
                interrupts = <18 0>;                    /* gpio_114 */
-               pendown-gpio = <&gpio4 18 GPIO_ACTIVE_HIGH>;
+               pendown-gpio = <&gpio4 18 GPIO_ACTIVE_LOW>;
 
                ti,x-min = /bits/ 16 <0x0>;
                ti,x-max = /bits/ 16 <0x0fff>;
index 5598537..4c3b6ba 100644 (file)
                pinctrl-0 = <&penirq_pins>;
                interrupt-parent = <&gpio3>;
                interrupts = <30 IRQ_TYPE_NONE>;        /* GPIO_94 */
-               pendown-gpio = <&gpio3 30 GPIO_ACTIVE_HIGH>;
+               pendown-gpio = <&gpio3 30 GPIO_ACTIVE_LOW>;
                vcc-supply = <&vaux4>;
 
                ti,x-min = /bits/ 16 <0>;
index 2d87b9f..af288d6 100644 (file)
 
                interrupt-parent = <&gpio1>;
                interrupts = <15 0>;                    /* gpio1_wk15 */
-               pendown-gpio = <&gpio1 15 GPIO_ACTIVE_HIGH>;
+               pendown-gpio = <&gpio1 15 GPIO_ACTIVE_LOW>;
 
 
                ti,x-min = /bits/ 16 <0x0>;
index 7a80e1c..aa0e0e8 100644 (file)
                function = "gpio";
                drive-strength = <8>;
                bias-disable;
-               input-enable;
        };
 
        wlan_hostwake_default_state: wlan-hostwake-default-state {
                function = "gpio";
                drive-strength = <2>;
                bias-disable;
-               input-enable;
        };
 
        wlan_regulator_default_state: wlan-regulator-default-state {
index d640960..5593a3a 100644 (file)
                function = "gpio";
                drive-strength = <2>;
                bias-disable;
-               input-enable;
        };
 
        wlan_regulator_default_state: wlan-regulator-default-state {
index b823812..b887e53 100644 (file)
                function = "gpio";
                drive-strength = <2>;
                bias-disable;
-               input-enable;
        };
 
        touch_pins: touch-state {
 
                        drive-strength = <8>;
                        bias-pull-down;
-                       input-enable;
                };
 
                reset-pins {
                function = "gpio";
                drive-strength = <2>;
                bias-disable;
-               input-enable;
        };
 
        wlan_regulator_default_state: wlan-regulator-default-state {
index 672b246..d228920 100644 (file)
@@ -83,6 +83,7 @@
                L2: l2-cache {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                };
 
                idle-states {
index b653ea4..83839e1 100644 (file)
@@ -74,6 +74,7 @@
                L2: l2-cache {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                        qcom,saw = <&saw_l2>;
                };
 
index dfcfb33..f0ef86f 100644 (file)
                L2: l2-cache {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                        qcom,saw = <&saw_l2>;
                };
        };
index af67647..7581845 100644 (file)
@@ -45,6 +45,7 @@
                L2: l2-cache {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                };
        };
 
index a830476..b269fdc 100644 (file)
@@ -49,7 +49,6 @@
                gpioext1-pins {
                        pins = "gpio2";
                        function = "gpio";
-                       input-enable;
                        bias-disable;
                };
        };
index f601b40..78023ed 100644 (file)
@@ -36,6 +36,7 @@
                L2: l2-cache {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                };
        };
 
index 2a668cd..616fef2 100644 (file)
@@ -42,6 +42,7 @@
                L2: l2-cache {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                };
        };
 
index ab35f2d..861695c 100644 (file)
                pins = "gpio73";
                function = "gpio";
                bias-disable;
-               input-enable;
        };
 
        touch_pin: touch-state {
 
                        drive-strength = <2>;
                        bias-disable;
-                       input-enable;
                };
 
                reset-pins {
index d3bec03..68a2f90 100644 (file)
                function = "gpio";
                drive-strength = <2>;
                bias-disable;
-               input-enable;
        };
 
        sdc1_on: sdc1-on-state {
index 8208012..7ed0d92 100644 (file)
@@ -80,6 +80,7 @@
                L2: l2-cache {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                        qcom,saw = <&saw_l2>;
                };
 
index 8d2a054..8230d0e 100644 (file)
                        function = "gpio";
                        drive-strength = <2>;
                        bias-disable;
-                       input-enable;
                };
 
                reset-pins {
index b9698ff..eb505d6 100644 (file)
                        pins = "gpio75";
                        function = "gpio";
                        drive-strength = <16>;
-                       input-enable;
                };
 
                devwake-pins {
        i2c_touchkey_pins: i2c-touchkey-state {
                pins = "gpio95", "gpio96";
                function = "gpio";
-               input-enable;
                bias-pull-up;
        };
 
        i2c_led_gpioex_pins: i2c-led-gpioex-state {
                pins = "gpio120", "gpio121";
                function = "gpio";
-               input-enable;
                bias-pull-down;
        };
 
        wifi_pin: wifi-state {
                pins = "gpio92";
                function = "gpio";
-               input-enable;
                bias-pull-down;
        };
 
index 04bc58d..0f650ed 100644 (file)
                function = "gpio";
                drive-strength = <2>;
                bias-disable;
-               input-enable;
        };
 
        bt_host_wake_pin: bt-host-wake-state {
index c9e05e3..00bf53f 100644 (file)
                        interrupt-names = "tx", "rx0", "rx1", "sce";
                        resets = <&rcc STM32F4_APB1_RESET(CAN2)>;
                        clocks = <&rcc 0 STM32F4_APB1_CLOCK(CAN2)>;
+                       st,can-secondary;
                        st,gcan = <&gcan>;
                        status = "disabled";
                };
index c8e6c52..9f65403 100644 (file)
                                        slew-rate = <2>;
                                };
                        };
+
+                       can1_pins_a: can1-0 {
+                               pins1 {
+                                       pinmux = <STM32_PINMUX('A', 12, AF9)>; /* CAN1_TX */
+                               };
+                               pins2 {
+                                       pinmux = <STM32_PINMUX('A', 11, AF9)>; /* CAN1_RX */
+                                       bias-pull-up;
+                               };
+                       };
+
+                       can1_pins_b: can1-1 {
+                               pins1 {
+                                       pinmux = <STM32_PINMUX('B', 9, AF9)>; /* CAN1_TX */
+                               };
+                               pins2 {
+                                       pinmux = <STM32_PINMUX('B', 8, AF9)>; /* CAN1_RX */
+                                       bias-pull-up;
+                               };
+                       };
+
+                       can1_pins_c: can1-2 {
+                               pins1 {
+                                       pinmux = <STM32_PINMUX('D', 1, AF9)>; /* CAN1_TX */
+                               };
+                               pins2 {
+                                       pinmux = <STM32_PINMUX('D', 0, AF9)>; /* CAN1_RX */
+                                       bias-pull-up;
+                               };
+                       };
+
+                       can1_pins_d: can1-3 {
+                               pins1 {
+                                       pinmux = <STM32_PINMUX('H', 13, AF9)>; /* CAN1_TX */
+                               };
+                               pins2 {
+                                       pinmux = <STM32_PINMUX('H', 14, AF9)>; /* CAN1_RX */
+                                       bias-pull-up;
+                               };
+                       };
+
+                       can2_pins_a: can2-0 {
+                               pins1 {
+                                       pinmux = <STM32_PINMUX('B', 6, AF9)>; /* CAN2_TX */
+                               };
+                               pins2 {
+                                       pinmux = <STM32_PINMUX('B', 5, AF9)>; /* CAN2_RX */
+                                       bias-pull-up;
+                               };
+                       };
+
+                       can2_pins_b: can2-1 {
+                               pins1 {
+                                       pinmux = <STM32_PINMUX('B', 13, AF9)>; /* CAN2_TX */
+                               };
+                               pins2 {
+                                       pinmux = <STM32_PINMUX('B', 12, AF9)>; /* CAN2_RX */
+                                       bias-pull-up;
+                               };
+                       };
+
+                       can3_pins_a: can3-0 {
+                               pins1 {
+                                       pinmux = <STM32_PINMUX('A', 15, AF11)>; /* CAN3_TX */
+                               };
+                               pins2 {
+                                       pinmux = <STM32_PINMUX('A', 8, AF11)>; /* CAN3_RX */
+                                       bias-pull-up;
+                               };
+                       };
+
+                       can3_pins_b: can3-1 {
+                               pins1 {
+                                       pinmux = <STM32_PINMUX('B', 4, AF11)>;  /* CAN3_TX */
+                               };
+                               pins2 {
+                                       pinmux = <STM32_PINMUX('B', 3, AF11)>; /* CAN3_RX */
+                                       bias-pull-up;
+                               };
+                       };
                };
        };
 };
index 3b88209..ff1f9a1 100644 (file)
                reg = <0x2c0f0000 0x1000>;
                interrupts = <0 84 4>;
                cache-level = <2>;
+               cache-unified;
        };
 
        pmu {
index 8a9aeeb..e013ff1 100644 (file)
@@ -21,7 +21,7 @@
 /*
  * The public API for this code is documented in arch/arm/include/asm/mcpm.h.
  * For a comprehensive description of the main algorithm used here, please
- * see Documentation/arm/cluster-pm-race-avoidance.rst.
+ * see Documentation/arch/arm/cluster-pm-race-avoidance.rst.
  */
 
 struct sync_struct mcpm_sync;
index 299495c..f590e80 100644 (file)
@@ -5,7 +5,7 @@
  * Created by:  Nicolas Pitre, March 2012
  * Copyright:   (C) 2012-2013  Linaro Limited
  *
- * Refer to Documentation/arm/cluster-pm-race-avoidance.rst
+ * Refer to Documentation/arch/arm/cluster-pm-race-avoidance.rst
  * for details of the synchronisation algorithms used here.
  */
 
index 1fa09c4..c5eaed5 100644 (file)
@@ -6,7 +6,7 @@
  * Copyright:  (C) 2012-2013  Linaro Limited
  *
  * This algorithm is described in more detail in
- * Documentation/arm/vlocks.rst.
+ * Documentation/arch/arm/vlocks.rst.
  */
 
 #include <linux/linkage.h>
index 78d3d4b..f3cd04f 100644 (file)
@@ -92,7 +92,7 @@
 
 #define RETURN_READ_PMEVCNTRN(n) \
        return read_sysreg(PMEVCNTR##n)
-static unsigned long read_pmevcntrn(int n)
+static inline unsigned long read_pmevcntrn(int n)
 {
        PMEVN_SWITCH(n, RETURN_READ_PMEVCNTRN);
        return 0;
@@ -100,14 +100,14 @@ static unsigned long read_pmevcntrn(int n)
 
 #define WRITE_PMEVCNTRN(n) \
        write_sysreg(val, PMEVCNTR##n)
-static void write_pmevcntrn(int n, unsigned long val)
+static inline void write_pmevcntrn(int n, unsigned long val)
 {
        PMEVN_SWITCH(n, WRITE_PMEVCNTRN);
 }
 
 #define WRITE_PMEVTYPERN(n) \
        write_sysreg(val, PMEVTYPER##n)
-static void write_pmevtypern(int n, unsigned long val)
+static inline void write_pmevtypern(int n, unsigned long val)
 {
        PMEVN_SWITCH(n, WRITE_PMEVTYPERN);
 }
@@ -222,6 +222,11 @@ static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr)
        return false;
 }
 
+static inline bool kvm_set_pmuserenr(u64 val)
+{
+       return false;
+}
+
 /* PMU Version in DFR Register */
 #define ARMV8_PMU_DFR_VER_NI        0
 #define ARMV8_PMU_DFR_VER_V3P4      0x5
index 505a306..aebe2c8 100644 (file)
@@ -394,6 +394,23 @@ ALT_UP_B(.L0_\@)
 #endif
        .endm
 
+/*
+ * Raw SMP data memory barrier
+ */
+       .macro  __smp_dmb mode
+#if __LINUX_ARM_ARCH__ >= 7
+       .ifeqs "\mode","arm"
+       dmb     ish
+       .else
+       W(dmb)  ish
+       .endif
+#elif __LINUX_ARM_ARCH__ == 6
+       mcr     p15, 0, r0, c7, c10, 5  @ dmb
+#else
+       .error "Incompatible SMP platform"
+#endif
+       .endm
+
 #if defined(CONFIG_CPU_V7M)
        /*
         * setmode is used to assert to be in svc mode during boot. For v7-M
index db8512d..f0e3b01 100644 (file)
@@ -197,6 +197,16 @@ static inline int arch_atomic_fetch_##op(int i, atomic_t *v)               \
        return val;                                                     \
 }
 
+#define arch_atomic_add_return                 arch_atomic_add_return
+#define arch_atomic_sub_return                 arch_atomic_sub_return
+#define arch_atomic_fetch_add                  arch_atomic_fetch_add
+#define arch_atomic_fetch_sub                  arch_atomic_fetch_sub
+
+#define arch_atomic_fetch_and                  arch_atomic_fetch_and
+#define arch_atomic_fetch_andnot               arch_atomic_fetch_andnot
+#define arch_atomic_fetch_or                   arch_atomic_fetch_or
+#define arch_atomic_fetch_xor                  arch_atomic_fetch_xor
+
 static inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new)
 {
        int ret;
@@ -210,8 +220,7 @@ static inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new)
 
        return ret;
 }
-
-#define arch_atomic_fetch_andnot               arch_atomic_fetch_andnot
+#define arch_atomic_cmpxchg arch_atomic_cmpxchg
 
 #endif /* __LINUX_ARM_ARCH__ */
 
@@ -240,8 +249,6 @@ ATOMIC_OPS(xor, ^=, eor)
 #undef ATOMIC_OP_RETURN
 #undef ATOMIC_OP
 
-#define arch_atomic_xchg(v, new) (arch_xchg(&((v)->counter), new))
-
 #ifndef CONFIG_GENERIC_ATOMIC64
 typedef struct {
        s64 counter;
index 97a312b..fe38555 100644 (file)
@@ -1,7 +1,5 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 /*
- *  arch/arm/include/asm/bugs.h
- *
  *  Copyright (C) 1995-2003 Russell King
  */
 #ifndef __ASM_BUGS_H
@@ -10,10 +8,8 @@
 extern void check_writebuffer_bugs(void);
 
 #ifdef CONFIG_MMU
-extern void check_bugs(void);
 extern void check_other_bugs(void);
 #else
-#define check_bugs() do { } while (0)
 #define check_other_bugs() do { } while (0)
 #endif
 
index 7e9251c..5be3ddc 100644 (file)
@@ -75,6 +75,10 @@ static inline bool arch_syscall_match_sym_name(const char *sym,
        return !strcasecmp(sym, name);
 }
 
+void prepare_ftrace_return(unsigned long *parent, unsigned long self,
+                          unsigned long frame_pointer,
+                          unsigned long stack_pointer);
+
 #endif /* ifndef __ASSEMBLY__ */
 
 #endif /* _ASM_ARM_FTRACE */
index 9349e7a..2b18a25 100644 (file)
@@ -56,7 +56,6 @@ struct machine_desc {
        void                    (*init_time)(void);
        void                    (*init_machine)(void);
        void                    (*init_late)(void);
-       void                    (*handle_irq)(struct pt_regs *);
        void                    (*restart)(enum reboot_mode, const char *);
 };
 
index 74bb594..28c63d1 100644 (file)
@@ -113,6 +113,28 @@ struct cpu_user_fns {
                        unsigned long vaddr, struct vm_area_struct *vma);
 };
 
+void fa_copy_user_highpage(struct page *to, struct page *from,
+       unsigned long vaddr, struct vm_area_struct *vma);
+void fa_clear_user_highpage(struct page *page, unsigned long vaddr);
+void feroceon_copy_user_highpage(struct page *to, struct page *from,
+       unsigned long vaddr, struct vm_area_struct *vma);
+void feroceon_clear_user_highpage(struct page *page, unsigned long vaddr);
+void v4_mc_copy_user_highpage(struct page *to, struct page *from,
+       unsigned long vaddr, struct vm_area_struct *vma);
+void v4_mc_clear_user_highpage(struct page *page, unsigned long vaddr);
+void v4wb_copy_user_highpage(struct page *to, struct page *from,
+       unsigned long vaddr, struct vm_area_struct *vma);
+void v4wb_clear_user_highpage(struct page *page, unsigned long vaddr);
+void v4wt_copy_user_highpage(struct page *to, struct page *from,
+       unsigned long vaddr, struct vm_area_struct *vma);
+void v4wt_clear_user_highpage(struct page *page, unsigned long vaddr);
+void xsc3_mc_copy_user_highpage(struct page *to, struct page *from,
+       unsigned long vaddr, struct vm_area_struct *vma);
+void xsc3_mc_clear_user_highpage(struct page *page, unsigned long vaddr);
+void xscale_mc_copy_user_highpage(struct page *to, struct page *from,
+       unsigned long vaddr, struct vm_area_struct *vma);
+void xscale_mc_clear_user_highpage(struct page *page, unsigned long vaddr);
+
 #ifdef MULTI_USER
 extern struct cpu_user_fns cpu_user;
 
index 483b8dd..7f44e88 100644 (file)
@@ -193,5 +193,8 @@ static inline unsigned long it_advance(unsigned long cpsr)
        return cpsr;
 }
 
+int syscall_trace_enter(struct pt_regs *regs);
+void syscall_trace_exit(struct pt_regs *regs);
+
 #endif /* __ASSEMBLY__ */
 #endif
index ba0872a..546af8b 100644 (file)
@@ -5,7 +5,7 @@
  *  Copyright (C) 1997-1999 Russell King
  *
  *  Structure passed to kernel to tell it about the
- *  hardware it's running on.  See Documentation/arm/setup.rst
+ *  hardware it's running on.  See Documentation/arch/arm/setup.rst
  *  for more info.
  */
 #ifndef __ASMARM_SETUP_H
@@ -28,4 +28,11 @@ extern void save_atags(const struct tag *tags);
 static inline void save_atags(const struct tag *tags) { }
 #endif
 
+struct machine_desc;
+void init_default_cache_policy(unsigned long);
+void paging_init(const struct machine_desc *desc);
+void early_mm_init(const struct machine_desc *);
+void adjust_lowmem_bounds(void);
+void setup_dma_zone(const struct machine_desc *desc);
+
 #endif
index 430be77..8b84092 100644 (file)
@@ -22,4 +22,9 @@ typedef struct {
 #define __ARCH_HAS_SA_RESTORER
 
 #include <asm/sigcontext.h>
+
+void do_rseq_syscall(struct pt_regs *regs);
+int do_work_pending(struct pt_regs *regs, unsigned int thread_flags,
+                   int syscall);
+
 #endif
index 7c1c90d..8c05a7f 100644 (file)
@@ -64,7 +64,7 @@ extern void secondary_startup_arm(void);
 
 extern int __cpu_disable(void);
 
-extern void __cpu_die(unsigned int cpu);
+static inline void __cpu_die(unsigned int cpu) { }
 
 extern void arch_send_call_function_single_ipi(int cpu);
 extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
index 85f9e53..d9c28b3 100644 (file)
@@ -35,4 +35,8 @@ static inline void spectre_v2_update_state(unsigned int state,
 
 int spectre_bhb_update_vectors(unsigned int method);
 
+void cpu_v7_ca8_ibe(void);
+void cpu_v7_ca15_ibe(void);
+void cpu_v7_bugs_init(void);
+
 #endif
index 5063142..be81b9c 100644 (file)
@@ -13,5 +13,6 @@ extern void cpu_resume(void);
 extern void cpu_resume_no_hyp(void);
 extern void cpu_resume_arm(void);
 extern int cpu_suspend(unsigned long, int (*)(unsigned long));
+extern void __cpu_suspend_save(u32 *ptr, u32 ptrsz, u32 sp, u32 *save_ptr);
 
 #endif
index 6f5d627..f46b3c5 100644 (file)
  * ops which are SMP safe even on a UP kernel.
  */
 
+/*
+ * Unordered
+ */
+
 #define sync_set_bit(nr, p)            _set_bit(nr, p)
 #define sync_clear_bit(nr, p)          _clear_bit(nr, p)
 #define sync_change_bit(nr, p)         _change_bit(nr, p)
-#define sync_test_and_set_bit(nr, p)   _test_and_set_bit(nr, p)
-#define sync_test_and_clear_bit(nr, p) _test_and_clear_bit(nr, p)
-#define sync_test_and_change_bit(nr, p)        _test_and_change_bit(nr, p)
 #define sync_test_bit(nr, addr)                test_bit(nr, addr)
-#define arch_sync_cmpxchg              arch_cmpxchg
 
+/*
+ * Fully ordered
+ */
+
+int _sync_test_and_set_bit(int nr, volatile unsigned long * p);
+#define sync_test_and_set_bit(nr, p)   _sync_test_and_set_bit(nr, p)
+
+int _sync_test_and_clear_bit(int nr, volatile unsigned long * p);
+#define sync_test_and_clear_bit(nr, p) _sync_test_and_clear_bit(nr, p)
+
+int _sync_test_and_change_bit(int nr, volatile unsigned long * p);
+#define sync_test_and_change_bit(nr, p)        _sync_test_and_change_bit(nr, p)
+
+#define arch_sync_cmpxchg(ptr, old, new)                               \
+({                                                                     \
+       __typeof__(*(ptr)) __ret;                                       \
+       __smp_mb__before_atomic();                                      \
+       __ret = arch_cmpxchg_relaxed((ptr), (old), (new));              \
+       __smp_mb__after_atomic();                                       \
+       __ret;                                                          \
+})
 
 #endif
diff --git a/arch/arm/include/asm/syscalls.h b/arch/arm/include/asm/syscalls.h
new file mode 100644 (file)
index 0000000..5912e7c
--- /dev/null
@@ -0,0 +1,51 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_SYSCALLS_H
+#define __ASM_SYSCALLS_H
+
+#include <linux/linkage.h>
+#include <linux/types.h>
+
+struct pt_regs;
+asmlinkage int sys_sigreturn(struct pt_regs *regs);
+asmlinkage int sys_rt_sigreturn(struct pt_regs *regs);
+asmlinkage long sys_arm_fadvise64_64(int fd, int advice,
+                                    loff_t offset, loff_t len);
+
+struct oldabi_stat64;
+asmlinkage long sys_oabi_stat64(const char __user * filename,
+                               struct oldabi_stat64 __user * statbuf);
+asmlinkage long sys_oabi_lstat64(const char __user * filename,
+                                struct oldabi_stat64 __user * statbuf);
+asmlinkage long sys_oabi_fstat64(unsigned long fd,
+                                struct oldabi_stat64 __user * statbuf);
+asmlinkage long sys_oabi_fstatat64(int dfd,
+                                  const char __user *filename,
+                                  struct oldabi_stat64  __user *statbuf,
+                                  int flag);
+asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd,
+                                unsigned long arg);
+struct oabi_epoll_event;
+asmlinkage long sys_oabi_epoll_ctl(int epfd, int op, int fd,
+                                  struct oabi_epoll_event __user *event);
+struct oabi_sembuf;
+struct old_timespec32;
+asmlinkage long sys_oabi_semtimedop(int semid,
+                                   struct oabi_sembuf __user *tsops,
+                                   unsigned nsops,
+                                   const struct old_timespec32 __user *timeout);
+asmlinkage long sys_oabi_semop(int semid, struct oabi_sembuf __user *tsops,
+                              unsigned nsops);
+asmlinkage int sys_oabi_ipc(uint call, int first, int second, int third,
+                           void __user *ptr, long fifth);
+struct sockaddr;
+asmlinkage long sys_oabi_bind(int fd, struct sockaddr __user *addr, int addrlen);
+asmlinkage long sys_oabi_connect(int fd, struct sockaddr __user *addr, int addrlen);
+asmlinkage long sys_oabi_sendto(int fd, void __user *buff,
+                               size_t len, unsigned flags,
+                               struct sockaddr __user *addr,
+                               int addrlen);
+struct user_msghdr;
+asmlinkage long sys_oabi_sendmsg(int fd, struct user_msghdr __user *msg, unsigned flags);
+asmlinkage long sys_oabi_socketcall(int call, unsigned long __user *args);
+
+#endif
index d8bd8a4..e1f7dca 100644 (file)
@@ -9,9 +9,7 @@
 #ifndef __ASMARM_TCM_H
 #define __ASMARM_TCM_H
 
-#ifndef CONFIG_HAVE_TCM
-#error "You should not be including tcm.h unless you have a TCM!"
-#endif
+#ifdef CONFIG_HAVE_TCM
 
 #include <linux/compiler.h>
 
@@ -29,4 +27,11 @@ void tcm_free(void *addr, size_t len);
 bool tcm_dtcm_present(void);
 bool tcm_itcm_present(void);
 
+void __init tcm_init(void);
+#else
+/* No TCM support, just blank inlines to be optimized out */
+static inline void tcm_init(void)
+{
+}
+#endif
 #endif
index 987fefb..0aaefe3 100644 (file)
@@ -35,4 +35,13 @@ extern void ptrace_break(struct pt_regs *regs);
 
 extern void *vectors_page;
 
+asmlinkage void dump_backtrace_stm(u32 *stack, u32 instruction, const char *loglvl);
+asmlinkage void do_undefinstr(struct pt_regs *regs);
+asmlinkage void handle_fiq_as_nmi(struct pt_regs *regs);
+asmlinkage void bad_mode(struct pt_regs *regs, int reason);
+asmlinkage int arm_syscall(int no, struct pt_regs *regs);
+asmlinkage void baddataabort(int code, unsigned long instr, struct pt_regs *regs);
+asmlinkage void __div0(void);
+asmlinkage void handle_bad_stack(struct pt_regs *regs);
+
 #endif
index b51f854..d60b09a 100644 (file)
@@ -40,6 +40,10 @@ extern void unwind_table_del(struct unwind_table *tab);
 extern void unwind_backtrace(struct pt_regs *regs, struct task_struct *tsk,
                             const char *loglvl);
 
+void __aeabi_unwind_cpp_pr0(void);
+void __aeabi_unwind_cpp_pr1(void);
+void __aeabi_unwind_cpp_pr2(void);
+
 #endif /* !__ASSEMBLY__ */
 
 #ifdef CONFIG_ARM_UNWIND
index 5b85889..422c3af 100644 (file)
@@ -24,6 +24,11 @@ static inline void arm_install_vdso(struct mm_struct *mm, unsigned long addr)
 
 #endif /* CONFIG_VDSO */
 
+int __vdso_clock_gettime(clockid_t clock, struct old_timespec32 *ts);
+int __vdso_clock_gettime64(clockid_t clock, struct __kernel_timespec *ts);
+int __vdso_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz);
+int __vdso_clock_getres(clockid_t clock_id, struct old_timespec32 *res);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __KERNEL__ */
index 157ea34..5b57b87 100644 (file)
 
 #ifndef __ASSEMBLY__
 void vfp_disable(void);
+void VFP_bounce(u32 trigger, u32 fpexc, struct pt_regs *regs);
 #endif
 
 #endif /* __ASM_VFP_H */
index 25ceda6..8e50e03 100644 (file)
@@ -9,7 +9,7 @@
  * published by the Free Software Foundation.
  *
  *  Structure passed to kernel to tell it about the
- *  hardware it's running on.  See Documentation/arm/setup.rst
+ *  hardware it's running on.  See Documentation/arch/arm/setup.rst
  *  for more info.
  */
 #ifndef _UAPI__ASMARM_SETUP_H
index 373b61f..33f6eb5 100644 (file)
@@ -127,7 +127,7 @@ static int __init parse_tag_cmdline(const struct tag *tag)
 #elif defined(CONFIG_CMDLINE_FORCE)
        pr_warn("Ignoring tag cmdline (using the default kernel command line)\n");
 #else
-       strlcpy(default_command_line, tag->u.cmdline.cmdline,
+       strscpy(default_command_line, tag->u.cmdline.cmdline,
                COMMAND_LINE_SIZE);
 #endif
        return 0;
@@ -224,7 +224,7 @@ setup_machine_tags(void *atags_vaddr, unsigned int machine_nr)
        }
 
        /* parse_early_param needs a boot_command_line */
-       strlcpy(boot_command_line, from, COMMAND_LINE_SIZE);
+       strscpy(boot_command_line, from, COMMAND_LINE_SIZE);
 
        return mdesc;
 }
index 14c8dbb..087bce6 100644 (file)
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/init.h>
+#include <linux/cpu.h>
 #include <asm/bugs.h>
 #include <asm/proc-fns.h>
 
@@ -11,7 +12,7 @@ void check_other_bugs(void)
 #endif
 }
 
-void __init check_bugs(void)
+void __init arch_cpu_finalize_init(void)
 {
        check_writebuffer_bugs();
        check_other_bugs();
index c39303e..291dc48 100644 (file)
@@ -875,7 +875,7 @@ ENDPROC(__bad_stack)
  * existing ones.  This mechanism should be used only for things that are
  * really small and justified, and not be abused freely.
  *
- * See Documentation/arm/kernel_user_helpers.rst for formal definitions.
+ * See Documentation/arch/arm/kernel_user_helpers.rst for formal definitions.
  */
  THUMB(        .arm    )
 
index 98ca3e3..d2c8e53 100644 (file)
@@ -45,6 +45,7 @@
 #include <asm/cacheflush.h>
 #include <asm/cp15.h>
 #include <asm/fiq.h>
+#include <asm/mach/irq.h>
 #include <asm/irq.h>
 #include <asm/traps.h>
 
index 89a5210..225c069 100644 (file)
@@ -8,16 +8,13 @@
 
 #include <linux/init.h>
 #include <linux/zutil.h>
+#include "head.h"
 
 /* for struct inflate_state */
 #include "../../../lib/zlib_inflate/inftrees.h"
 #include "../../../lib/zlib_inflate/inflate.h"
 #include "../../../lib/zlib_inflate/infutil.h"
 
-extern char __data_loc[];
-extern char _edata_loc[];
-extern char _sdata[];
-
 /*
  * This code is called very early during the boot process to decompress
  * the .data segment stored compressed in ROM. Therefore none of the global
diff --git a/arch/arm/kernel/head.h b/arch/arm/kernel/head.h
new file mode 100644 (file)
index 0000000..0eb5acc
--- /dev/null
@@ -0,0 +1,7 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+extern char __data_loc[];
+extern char _edata_loc[];
+extern char _sdata[];
+
+int __init __inflate_kernel_data(void);
index d59c36d..e74d84f 100644 (file)
@@ -169,8 +169,7 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
 
                        offset = __mem_to_opcode_arm(*(u32 *)loc);
                        offset = (offset & 0x00ffffff) << 2;
-                       if (offset & 0x02000000)
-                               offset -= 0x04000000;
+                       offset = sign_extend32(offset, 25);
 
                        offset += sym->st_value - loc;
 
@@ -236,7 +235,7 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
                case R_ARM_MOVT_PREL:
                        offset = tmp = __mem_to_opcode_arm(*(u32 *)loc);
                        offset = ((offset & 0xf0000) >> 4) | (offset & 0xfff);
-                       offset = (offset ^ 0x8000) - 0x8000;
+                       offset = sign_extend32(offset, 15);
 
                        offset += sym->st_value;
                        if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_PREL ||
@@ -344,8 +343,7 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
                                ((~(j2 ^ sign) & 1) << 22) |
                                ((upper & 0x03ff) << 12) |
                                ((lower & 0x07ff) << 1);
-                       if (offset & 0x01000000)
-                               offset -= 0x02000000;
+                       offset = sign_extend32(offset, 24);
                        offset += sym->st_value - loc;
 
                        /*
@@ -401,7 +399,7 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
                        offset = ((upper & 0x000f) << 12) |
                                ((upper & 0x0400) << 1) |
                                ((lower & 0x7000) >> 4) | (lower & 0x00ff);
-                       offset = (offset ^ 0x8000) - 0x8000;
+                       offset = sign_extend32(offset, 15);
                        offset += sym->st_value;
 
                        if (ELF32_R_TYPE(rel->r_info) == R_ARM_THM_MOVT_PREL ||
index 75cd469..c66b560 100644 (file)
@@ -76,13 +76,6 @@ static int __init fpe_setup(char *line)
 __setup("fpe=", fpe_setup);
 #endif
 
-extern void init_default_cache_policy(unsigned long);
-extern void paging_init(const struct machine_desc *desc);
-extern void early_mm_init(const struct machine_desc *);
-extern void adjust_lowmem_bounds(void);
-extern enum reboot_mode reboot_mode;
-extern void setup_dma_zone(const struct machine_desc *desc);
-
 unsigned int processor_id;
 EXPORT_SYMBOL(processor_id);
 unsigned int __machine_arch_type __read_mostly;
@@ -1142,7 +1135,7 @@ void __init setup_arch(char **cmdline_p)
        setup_initial_init_mm(_text, _etext, _edata, _end);
 
        /* populate cmd_line too for later use, preserving boot_command_line */
-       strlcpy(cmd_line, boot_command_line, COMMAND_LINE_SIZE);
+       strscpy(cmd_line, boot_command_line, COMMAND_LINE_SIZE);
        *cmdline_p = cmd_line;
 
        early_fixmap_init();
@@ -1198,10 +1191,6 @@ void __init setup_arch(char **cmdline_p)
 
        reserve_crashkernel();
 
-#ifdef CONFIG_GENERIC_IRQ_MULTI_HANDLER
-       handle_arch_irq = mdesc->handle_irq;
-#endif
-
 #ifdef CONFIG_VT
 #if defined(CONFIG_VGA_CONSOLE)
        conswitchp = &vga_con;
index e07f359..8d0afa1 100644 (file)
@@ -18,6 +18,7 @@
 #include <asm/traps.h>
 #include <asm/unistd.h>
 #include <asm/vfp.h>
+#include <asm/syscalls.h>
 
 #include "signal.h"
 
index 87f8d0e..6756203 100644 (file)
@@ -288,15 +288,11 @@ int __cpu_disable(void)
 }
 
 /*
- * called on the thread which is asking for a CPU to be shutdown -
- * waits until shutdown has completed, or it is timed out.
+ * called on the thread which is asking for a CPU to be shut down, after
+ * the shutdown has completed.
  */
-void __cpu_die(unsigned int cpu)
+void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
 {
-       if (!cpu_wait_death(cpu, 5)) {
-               pr_err("CPU%u: cpu didn't die\n", cpu);
-               return;
-       }
        pr_debug("CPU%u: shutdown\n", cpu);
 
        clear_tasks_mm_cpumask(cpu);
@@ -336,11 +332,11 @@ void __noreturn arch_cpu_idle_dead(void)
        flush_cache_louis();
 
        /*
-        * Tell __cpu_die() that this CPU is now safe to dispose of.  Once
-        * this returns, power and/or clocks can be removed at any point
-        * from this CPU and its cache by platform_cpu_kill().
+        * Tell cpuhp_bp_sync_dead() that this CPU is now safe to dispose
+        * of. Once this returns, power and/or clocks can be removed at
+        * any point from this CPU and its cache by platform_cpu_kill().
         */
-       (void)cpu_report_death();
+       cpuhp_ap_report_dead();
 
        /*
         * Ensure that the cache lines associated with that completion are
index a5f183c..0141e9b 100644 (file)
@@ -24,6 +24,7 @@
 #include <linux/ipc.h>
 #include <linux/uaccess.h>
 #include <linux/slab.h>
+#include <asm/syscalls.h>
 
 /*
  * Since loff_t is a 64 bit type we avoid a lot of ABI hassle
index 0061631..d00f404 100644 (file)
@@ -10,6 +10,8 @@
  *  Copyright: MontaVista Software, Inc.
  */
 
+#include <asm/syscalls.h>
+
 /*
  * The legacy ABI and the new ARM EABI have different rules making some
  * syscalls incompatible especially with structure arguments.
index 40c7c80..3bad79d 100644 (file)
@@ -756,6 +756,7 @@ void __readwrite_bug(const char *fn)
 }
 EXPORT_SYMBOL(__readwrite_bug);
 
+#ifdef CONFIG_MMU
 void __pte_error(const char *file, int line, pte_t pte)
 {
        pr_err("%s:%d: bad pte %08llx.\n", file, line, (long long)pte_val(pte));
@@ -770,6 +771,7 @@ void __pgd_error(const char *file, int line, pgd_t pgd)
 {
        pr_err("%s:%d: bad pgd %08llx.\n", file, line, (long long)pgd_val(pgd));
 }
+#endif
 
 asmlinkage void __div0(void)
 {
index 53be7ea..9d21921 100644 (file)
@@ -308,6 +308,29 @@ static int unwind_exec_pop_subset_r0_to_r3(struct unwind_ctrl_block *ctrl,
        return URC_OK;
 }
 
+static unsigned long unwind_decode_uleb128(struct unwind_ctrl_block *ctrl)
+{
+       unsigned long bytes = 0;
+       unsigned long insn;
+       unsigned long result = 0;
+
+       /*
+        * unwind_get_byte() will advance `ctrl` one instruction at a time, so
+        * loop until we get an instruction byte where bit 7 is not set.
+        *
+        * Note: This decodes a maximum of 4 bytes to output 28 bits of data,
+        * where the maximum is 0xfffffff; that covers a vsp increment of up
+        * to 1073742336, which is sufficient for unwinding the stack.
+        */
+       do {
+               insn = unwind_get_byte(ctrl);
+               result |= (insn & 0x7f) << (bytes * 7);
+               bytes++;
+       } while (!!(insn & 0x80) && (bytes != sizeof(result)));
+
+       return result;
+}
+
 /*
  * Execute the current unwind instruction.
  */
@@ -361,7 +384,7 @@ static int unwind_exec_insn(struct unwind_ctrl_block *ctrl)
                if (ret)
                        goto error;
        } else if (insn == 0xb2) {
-               unsigned long uleb128 = unwind_get_byte(ctrl);
+               unsigned long uleb128 = unwind_decode_uleb128(ctrl);
 
                ctrl->vrs[SP] += 0x204 + (uleb128 << 2);
        } else {
index 3408269..f297d66 100644 (file)
@@ -135,7 +135,7 @@ static Elf32_Sym * __init find_symbol(struct elfinfo *lib, const char *symname)
 
                if (lib->dynsym[i].st_name == 0)
                        continue;
-               strlcpy(name, lib->dynstr + lib->dynsym[i].st_name,
+               strscpy(name, lib->dynstr + lib->dynsym[i].st_name,
                        MAX_SYMNAME);
                c = strchr(name, '@');
                if (c)
index 95bd359..f069d1b 100644 (file)
@@ -28,7 +28,7 @@ UNWIND(       .fnend          )
 ENDPROC(\name          )
        .endm
 
-       .macro  testop, name, instr, store
+       .macro  __testop, name, instr, store, barrier
 ENTRY( \name           )
 UNWIND(        .fnstart        )
        ands    ip, r1, #3
@@ -38,7 +38,7 @@ UNWIND(       .fnstart        )
        mov     r0, r0, lsr #5
        add     r1, r1, r0, lsl #2      @ Get word offset
        mov     r3, r2, lsl r3          @ create mask
-       smp_dmb
+       \barrier
 #if __LINUX_ARM_ARCH__ >= 7 && defined(CONFIG_SMP)
        .arch_extension mp
        ALT_SMP(W(pldw) [r1])
@@ -50,13 +50,21 @@ UNWIND(     .fnstart        )
        strex   ip, r2, [r1]
        cmp     ip, #0
        bne     1b
-       smp_dmb
+       \barrier
        cmp     r0, #0
        movne   r0, #1
 2:     bx      lr
 UNWIND(        .fnend          )
 ENDPROC(\name          )
        .endm
+
+       .macro  testop, name, instr, store
+       __testop \name, \instr, \store, smp_dmb
+       .endm
+
+       .macro  sync_testop, name, instr, store
+       __testop \name, \instr, \store, __smp_dmb
+       .endm
 #else
        .macro  bitop, name, instr
 ENTRY( \name           )
index 4ebecc6..f13fe9b 100644 (file)
@@ -10,3 +10,7 @@
                 .text
 
 testop _test_and_change_bit, eor, str
+
+#if __LINUX_ARM_ARCH__ >= 6
+sync_testop    _sync_test_and_change_bit, eor, str
+#endif
index 009afa0..4d2c5ca 100644 (file)
@@ -10,3 +10,7 @@
                 .text
 
 testop _test_and_clear_bit, bicne, strne
+
+#if __LINUX_ARM_ARCH__ >= 6
+sync_testop    _sync_test_and_clear_bit, bicne, strne
+#endif
index f3192e5..649dbab 100644 (file)
@@ -10,3 +10,7 @@
                 .text
 
 testop _test_and_set_bit, orreq, streq
+
+#if __LINUX_ARM_ARCH__ >= 6
+sync_testop    _sync_test_and_set_bit, orreq, streq
+#endif
index 60dc56d..437dd03 100644 (file)
@@ -334,16 +334,14 @@ static bool at91_pm_eth_quirk_is_valid(struct at91_pm_quirk_eth *eth)
                pdev = of_find_device_by_node(eth->np);
                if (!pdev)
                        return false;
+               /* put_device(eth->dev) is called at the end of suspend. */
                eth->dev = &pdev->dev;
        }
 
        /* No quirks if device isn't a wakeup source. */
-       if (!device_may_wakeup(eth->dev)) {
-               put_device(eth->dev);
+       if (!device_may_wakeup(eth->dev))
                return false;
-       }
 
-       /* put_device(eth->dev) is called at the end of suspend. */
        return true;
 }
 
@@ -439,14 +437,14 @@ clk_unconfigure:
                                pr_err("AT91: PM: failed to enable %s clocks\n",
                                       j == AT91_PM_G_ETH ? "geth" : "eth");
                        }
-               } else {
-                       /*
-                        * Release the reference to eth->dev taken in
-                        * at91_pm_eth_quirk_is_valid().
-                        */
-                       put_device(eth->dev);
-                       eth->dev = NULL;
                }
+
+               /*
+                * Release the reference to eth->dev taken in
+                * at91_pm_eth_quirk_is_valid().
+                */
+               put_device(eth->dev);
+               eth->dev = NULL;
        }
 
        return ret;
index 29eb075..b5287ff 100644 (file)
@@ -106,7 +106,7 @@ void exynos_firmware_init(void);
 #define C2_STATE       (1 << 3)
 /*
  * Magic values for bootloader indicating chosen low power mode.
- * See also Documentation/arm/samsung/bootloader-interface.rst
+ * See also Documentation/arch/arm/samsung/bootloader-interface.rst
  */
 #define EXYNOS_SLEEP_MAGIC     0x00000bad
 #define EXYNOS_AFTR_MAGIC      0xfcba0d10
index 51e4705..3faf9a1 100644 (file)
@@ -11,7 +11,6 @@
 #include <linux/err.h>
 #include <linux/gpio.h>
 #include <linux/init.h>
-#include <linux/irqchip/mxs.h>
 #include <linux/reboot.h>
 #include <linux/micrel_phy.h>
 #include <linux/of_address.h>
@@ -472,7 +471,6 @@ static const char *const mxs_dt_compat[] __initconst = {
 };
 
 DT_MACHINE_START(MXS, "Freescale MXS (Device Tree)")
-       .handle_irq     = icoll_handle_irq,
        .init_machine   = mxs_machine_init,
        .init_late      = mxs_pm_init,
        .dt_compat      = mxs_dt_compat,
index 9108c87..8813920 100644 (file)
@@ -877,7 +877,6 @@ MACHINE_START(AMS_DELTA, "Amstrad E3 (Delta)")
        .map_io         = ams_delta_map_io,
        .init_early     = omap1_init_early,
        .init_irq       = omap1_init_irq,
-       .handle_irq     = omap1_handle_irq,
        .init_machine   = ams_delta_init,
        .init_late      = ams_delta_init_late,
        .init_time      = omap1_timer_init,
index a501a47..b56cea9 100644 (file)
@@ -291,7 +291,6 @@ MACHINE_START(NOKIA770, "Nokia 770")
        .map_io         = omap1_map_io,
        .init_early     = omap1_init_early,
        .init_irq       = omap1_init_irq,
-       .handle_irq     = omap1_handle_irq,
        .init_machine   = omap_nokia770_init,
        .init_late      = omap1_init_late,
        .init_time      = omap1_timer_init,
index df758c1..46eda4f 100644 (file)
@@ -389,7 +389,6 @@ MACHINE_START(OMAP_OSK, "TI-OSK")
        .map_io         = omap1_map_io,
        .init_early     = omap1_init_early,
        .init_irq       = omap1_init_irq,
-       .handle_irq     = omap1_handle_irq,
        .init_machine   = osk_init,
        .init_late      = omap1_init_late,
        .init_time      = omap1_timer_init,
index f79c497..91df3dc 100644 (file)
@@ -259,7 +259,6 @@ MACHINE_START(OMAP_PALMTE, "OMAP310 based Palm Tungsten E")
        .map_io         = omap1_map_io,
        .init_early     = omap1_init_early,
        .init_irq       = omap1_init_irq,
-       .handle_irq     = omap1_handle_irq,
        .init_machine   = omap_palmte_init,
        .init_late      = omap1_init_late,
        .init_time      = omap1_timer_init,
index 0c0cdd5..3ae295a 100644 (file)
@@ -338,7 +338,6 @@ MACHINE_START(SX1, "OMAP310 based Siemens SX1")
        .map_io         = omap1_map_io,
        .init_early     = omap1_init_early,
        .init_irq       = omap1_init_irq,
-       .handle_irq     = omap1_handle_irq,
        .init_machine   = omap_sx1_init,
        .init_late      = omap1_init_late,
        .init_time      = omap1_timer_init,
index bfc7ab0..3d9e72e 100644 (file)
@@ -37,6 +37,7 @@
  */
 #include <linux/gpio.h>
 #include <linux/init.h>
+#include <linux/irq.h>
 #include <linux/module.h>
 #include <linux/sched.h>
 #include <linux/interrupt.h>
@@ -254,4 +255,6 @@ void __init omap1_init_irq(void)
                ct = irq_data_get_chip_type(d);
                ct->chip.irq_unmask(d);
        }
+
+       set_handle_irq(omap1_handle_irq);
 }
index 72b08a9..6b7197a 100644 (file)
@@ -233,7 +233,6 @@ MACHINE_START(GUMSTIX, "Gumstix")
        .map_io         = pxa25x_map_io,
        .nr_irqs        = PXA_NR_IRQS,
        .init_irq       = pxa25x_init_irq,
-       .handle_irq     = pxa25x_handle_irq,
        .init_time      = pxa_timer_init,
        .init_machine   = gumstix_init,
        .restart        = pxa_restart,
index 1b83be1..032dc89 100644 (file)
@@ -143,6 +143,7 @@ set_pwer:
 void __init pxa25x_init_irq(void)
 {
        pxa_init_irq(32, pxa25x_set_wake);
+       set_handle_irq(pxa25x_handle_irq);
 }
 
 static int __init __init
index 4135ba2..c9b5642 100644 (file)
@@ -228,6 +228,7 @@ static int pxa27x_set_wake(struct irq_data *d, unsigned int on)
 void __init pxa27x_init_irq(void)
 {
        pxa_init_irq(34, pxa27x_set_wake);
+       set_handle_irq(pxa27x_handle_irq);
 }
 
 static int __init
index 4325bdc..042922a 100644 (file)
@@ -1043,7 +1043,6 @@ MACHINE_START(SPITZ, "SHARP Spitz")
        .map_io         = pxa27x_map_io,
        .nr_irqs        = PXA_NR_IRQS,
        .init_irq       = pxa27x_init_irq,
-       .handle_irq     = pxa27x_handle_irq,
        .init_machine   = spitz_init,
        .init_time      = pxa_timer_init,
        .restart        = spitz_restart,
@@ -1056,7 +1055,6 @@ MACHINE_START(BORZOI, "SHARP Borzoi")
        .map_io         = pxa27x_map_io,
        .nr_irqs        = PXA_NR_IRQS,
        .init_irq       = pxa27x_init_irq,
-       .handle_irq     = pxa27x_handle_irq,
        .init_machine   = spitz_init,
        .init_time      = pxa_timer_init,
        .restart        = spitz_restart,
@@ -1069,7 +1067,6 @@ MACHINE_START(AKITA, "SHARP Akita")
        .map_io         = pxa27x_map_io,
        .nr_irqs        = PXA_NR_IRQS,
        .init_irq       = pxa27x_init_irq,
-       .handle_irq     = pxa27x_handle_irq,
        .init_machine   = spitz_init,
        .init_time      = pxa_timer_init,
        .restart        = spitz_restart,
index 67f72ca..1956b09 100644 (file)
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-only
-/**
+/*
  *  arch/arm/mac-sa1100/jornada720_ssp.c
  *
  *  Copyright (C) 2006/2007 Kristoffer Ericson <Kristoffer.Ericson@gmail.com>
@@ -26,6 +26,7 @@ static unsigned long jornada_ssp_flags;
 
 /**
  * jornada_ssp_reverse - reverses input byte
+ * @byte: input byte to reverse
  *
  * we need to reverse all data we receive from the mcu due to its physical location
  * returns : 01110111 -> 11101110
@@ -46,6 +47,7 @@ EXPORT_SYMBOL(jornada_ssp_reverse);
 
 /**
  * jornada_ssp_byte - waits for ready ssp bus and sends byte
+ * @byte: input byte to transmit
  *
  * waits for fifo buffer to clear and then transmits, if it doesn't then we will
  * timeout after <timeout> rounds. Needs mcu running before its called.
@@ -77,6 +79,7 @@ EXPORT_SYMBOL(jornada_ssp_byte);
 
 /**
  * jornada_ssp_inout - decide if input is command or trading byte
+ * @byte: input byte to send (may be %TXDUMMY)
  *
  * returns : (jornada_ssp_byte(byte)) on success
  *         : %-ETIMEDOUT on timeout failure
index b2d45cf..b3842c9 100644 (file)
@@ -21,7 +21,7 @@ menuconfig ARCH_STI
        help
          Include support for STMicroelectronics' STiH415/416, STiH407/10 and
          STiH418 family SoCs using the Device Tree for discovery.  More
-         information can be found in Documentation/arm/sti/ and
+         information can be found in Documentation/arch/arm/sti/ and
          Documentation/devicetree.
 
 if ARCH_STI
index be183ed..c164cde 100644 (file)
@@ -712,7 +712,7 @@ config ARM_VIRT_EXT
          assistance.
 
          A compliant bootloader is required in order to make maximum
-         use of this feature.  Refer to Documentation/arm/booting.rst for
+         use of this feature.  Refer to Documentation/arch/arm/booting.rst for
          details.
 
 config SWP_EMULATE
@@ -904,7 +904,7 @@ config KUSER_HELPERS
          the CPU type fitted to the system.  This permits binaries to be
          run on ARMv4 through to ARMv7 without modification.
 
-         See Documentation/arm/kernel_user_helpers.rst for details.
+         See Documentation/arch/arm/kernel_user_helpers.rst for details.
 
          However, the fixed address nature of these helpers can be used
          by ROP (return orientated programming) authors when creating
index b4a3335..bc4ed5c 100644 (file)
@@ -258,12 +258,14 @@ static struct dma_contig_early_reserve dma_mmu_remap[MAX_CMA_AREAS] __initdata;
 
 static int dma_mmu_remap_num __initdata;
 
+#ifdef CONFIG_DMA_CMA
 void __init dma_contiguous_early_fixup(phys_addr_t base, unsigned long size)
 {
        dma_mmu_remap[dma_mmu_remap_num].base = base;
        dma_mmu_remap[dma_mmu_remap_num].size = size;
        dma_mmu_remap_num++;
 }
+#endif
 
 void __init dma_contiguous_remap(void)
 {
index 54927ba..e8f8c19 100644 (file)
@@ -37,5 +37,9 @@ static inline int fsr_fs(unsigned int fsr)
 
 void do_bad_area(unsigned long addr, unsigned int fsr, struct pt_regs *regs);
 void early_abt_enable(void);
+asmlinkage void do_DataAbort(unsigned long addr, unsigned int fsr,
+                            struct pt_regs *regs);
+asmlinkage void do_PrefetchAbort(unsigned long addr, unsigned int ifsr,
+                                struct pt_regs *regs);
 
 #endif /* __ARCH_ARM_FAULT_H */
index 7ff9fee..2508be9 100644 (file)
@@ -354,6 +354,7 @@ EXPORT_SYMBOL(flush_dcache_page);
  *  memcpy() to/from page
  *  if written to page, flush_dcache_page()
  */
+void __flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned long vmaddr);
 void __flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned long vmaddr)
 {
        unsigned long pfn;
index 463fc2a..f3a52c0 100644 (file)
@@ -21,6 +21,7 @@
 #include <asm/sections.h>
 #include <asm/setup.h>
 #include <asm/smp_plat.h>
+#include <asm/tcm.h>
 #include <asm/tlb.h>
 #include <asm/highmem.h>
 #include <asm/system_info.h>
@@ -37,7 +38,6 @@
 
 #include "fault.h"
 #include "mm.h"
-#include "tcm.h"
 
 extern unsigned long __atags_pointer;
 
index 53f2d87..43cfd06 100644 (file)
@@ -21,6 +21,7 @@
 #include <asm/cputype.h>
 #include <asm/mpu.h>
 #include <asm/procinfo.h>
+#include <asm/idmap.h>
 
 #include "mm.h"
 
diff --git a/arch/arm/mm/tcm.h b/arch/arm/mm/tcm.h
deleted file mode 100644 (file)
index 6b80a76..0000000
+++ /dev/null
@@ -1,17 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2008-2009 ST-Ericsson AB
- * TCM memory handling for ARM systems
- *
- * Author: Linus Walleij <linus.walleij@stericsson.com>
- * Author: Rickard Andersson <rickard.andersson@stericsson.com>
- */
-
-#ifdef CONFIG_HAVE_TCM
-void __init tcm_init(void);
-#else
-/* No TCM support, just blank inlines to be optimized out */
-static inline void tcm_init(void)
-{
-}
-#endif
index 4d72099..eba7ac4 100644 (file)
@@ -40,7 +40,7 @@ enum probes_insn checker_stack_use_imm_0xx(probes_opcode_t insn,
  * Different from other insn uses imm8, the real addressing offset of
  * STRD in T32 encoding should be imm8 * 4. See ARMARM description.
  */
-enum probes_insn checker_stack_use_t32strd(probes_opcode_t insn,
+static enum probes_insn checker_stack_use_t32strd(probes_opcode_t insn,
                struct arch_probes_insn *asi,
                const struct decode_header *h)
 {
index 9090c3a..d8238da 100644 (file)
@@ -233,7 +233,7 @@ singlestep(struct kprobe *p, struct pt_regs *regs, struct kprobe_ctlblk *kcb)
  * kprobe, and that level is reserved for user kprobe handlers, so we can't
  * risk encountering a new kprobe in an interrupt handler.
  */
-void __kprobes kprobe_handler(struct pt_regs *regs)
+static void __kprobes kprobe_handler(struct pt_regs *regs)
 {
        struct kprobe *p, *cur;
        struct kprobe_ctlblk *kcb;
index dbef34e..7f65048 100644 (file)
@@ -145,8 +145,6 @@ __arch_remove_optimized_kprobe(struct optimized_kprobe *op, int dirty)
        }
 }
 
-extern void kprobe_handler(struct pt_regs *regs);
-
 static void
 optimized_callback(struct optimized_kprobe *op, struct pt_regs *regs)
 {
index c562832..171c707 100644 (file)
@@ -720,7 +720,7 @@ static const char coverage_register_lookup[16] = {
        [REG_TYPE_NOSPPCX]      = COVERAGE_ANY_REG | COVERAGE_SP,
 };
 
-unsigned coverage_start_registers(const struct decode_header *h)
+static unsigned coverage_start_registers(const struct decode_header *h)
 {
        unsigned regs = 0;
        int i;
index 56ad3c0..c729703 100644 (file)
@@ -454,3 +454,7 @@ void kprobe_thumb32_test_cases(void);
 #else
 void kprobe_arm_test_cases(void);
 #endif
+
+void __kprobes_test_case_start(void);
+void __kprobes_test_case_end_16(void);
+void __kprobes_test_case_end_32(void);
index 9e74c7f..97e2bfa 100644 (file)
@@ -7,7 +7,7 @@
 #   http://www.arm.linux.org.uk/developer/machines/download.php
 #
 # Please do not send patches to this file; it is automatically generated!
-# To add an entry into this database, please see Documentation/arm/arm.rst,
+# To add an entry into this database, please see Documentation/arch/arm/arm.rst,
 # or visit:
 #
 #   http://www.arm.linux.org.uk/developer/machines/?action=new
index 1976c6f..a003bea 100644 (file)
@@ -6,6 +6,8 @@
  */
 #include <linux/time.h>
 #include <linux/types.h>
+#include <asm/vdso.h>
+#include <asm/unwind.h>
 
 int __vdso_clock_gettime(clockid_t clock,
                         struct old_timespec32 *ts)
index 7483ef8..62206ef 100644 (file)
@@ -23,6 +23,9 @@
 @
 ENTRY(do_vfp)
        mov     r1, r10
-       mov     r3, r9
-       b       vfp_entry
+       str     lr, [sp, #-8]!
+       add     r3, sp, #4
+       str     r9, [r3]
+       bl      vfp_entry
+       ldr     pc, [sp], #8
 ENDPROC(do_vfp)
index 4d84782..a4610d0 100644 (file)
@@ -172,13 +172,14 @@ vfp_hw_state_valid:
                                        @ out before setting an FPEXC that
                                        @ stops us reading stuff
        VFPFMXR FPEXC, r1               @ Restore FPEXC last
+       mov     sp, r3                  @ we think we have handled things
+       pop     {lr}
        sub     r2, r2, #4              @ Retry current instruction - if Thumb
        str     r2, [sp, #S_PC]         @ mode it's two 16-bit instructions,
                                        @ else it's one 32-bit instruction, so
                                        @ always subtract 4 from the following
                                        @ instruction address.
 
-       mov     lr, r3                  @ we think we have handled things
 local_bh_enable_and_ret:
        adr     r0, .
        mov     r1, #SOFTIRQ_DISABLE_OFFSET
@@ -209,8 +210,9 @@ skip:
 
 process_exception:
        DBGSTR  "bounce"
+       mov     sp, r3                  @ setup for a return to the user code.
+       pop     {lr}
        mov     r2, sp                  @ nothing stacked - regdump is at TOS
-       mov     lr, r3                  @ setup for a return to the user code.
 
        @ Now call the C code to package up the bounce to the support code
        @   r0 holds the trigger instruction
index 349dcb9..1ba5078 100644 (file)
@@ -25,6 +25,7 @@
 #include <asm/thread_notify.h>
 #include <asm/traps.h>
 #include <asm/vfp.h>
+#include <asm/neon.h>
 
 #include "vfpinstr.h"
 #include "vfp.h"
index b1201d2..a8d0bd4 100644 (file)
@@ -207,6 +207,7 @@ config ARM64
        select HAVE_IOREMAP_PROT
        select HAVE_IRQ_TIME_ACCOUNTING
        select HAVE_KVM
+       select HAVE_MOD_ARCH_SPECIFIC
        select HAVE_NMI
        select HAVE_PERF_EVENTS
        select HAVE_PERF_REGS
@@ -222,6 +223,7 @@ config ARM64
        select HAVE_KPROBES
        select HAVE_KRETPROBES
        select HAVE_GENERIC_VDSO
+       select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
        select IRQ_DOMAIN
        select IRQ_FORCED_THREADING
        select KASAN_VMALLOC if KASAN
@@ -577,7 +579,6 @@ config ARM64_ERRATUM_845719
 config ARM64_ERRATUM_843419
        bool "Cortex-A53: 843419: A load or store might access an incorrect address"
        default y
-       select ARM64_MODULE_PLTS if MODULES
        help
          This option links the kernel with '--fix-cortex-a53-843419' and
          enables PLT support to replace certain ADRP instructions, which can
@@ -1516,7 +1517,7 @@ config XEN
 # 16K |       27          |      14      |       13        |         11         |
 # 64K |       29          |      16      |       13        |         13         |
 config ARCH_FORCE_MAX_ORDER
-       int "Order of maximal physically contiguous allocations" if EXPERT && (ARM64_4K_PAGES || ARM64_16K_PAGES)
+       int
        default "13" if ARM64_64K_PAGES
        default "11" if ARM64_16K_PAGES
        default "10"
@@ -1619,7 +1620,7 @@ config KUSER_HELPERS
          the system. This permits binaries to be run on ARMv4 through
          to ARMv8 without modification.
 
-         See Documentation/arm/kernel_user_helpers.rst for details.
+         See Documentation/arch/arm/kernel_user_helpers.rst for details.
 
          However, the fixed address nature of these helpers can be used
          by ROP (return orientated programming) authors when creating
@@ -2107,26 +2108,6 @@ config ARM64_SME
          register state capable of holding two dimensional matrix tiles to
          enable various matrix operations.
 
-config ARM64_MODULE_PLTS
-       bool "Use PLTs to allow module memory to spill over into vmalloc area"
-       depends on MODULES
-       select HAVE_MOD_ARCH_SPECIFIC
-       help
-         Allocate PLTs when loading modules so that jumps and calls whose
-         targets are too far away for their relative offsets to be encoded
-         in the instructions themselves can be bounced via veneers in the
-         module's PLT. This allows modules to be allocated in the generic
-         vmalloc area after the dedicated module memory area has been
-         exhausted.
-
-         When running with address space randomization (KASLR), the module
-         region itself may be too far away for ordinary relative jumps and
-         calls, and so in that case, module PLTs are required and cannot be
-         disabled.
-
-         Specific errata workaround(s) might also force module PLTs to be
-         enabled (ARM64_ERRATUM_843419).
-
 config ARM64_PSEUDO_NMI
        bool "Support for NMI-like interrupts"
        select ARM_GIC_V3
@@ -2167,7 +2148,6 @@ config RELOCATABLE
 
 config RANDOMIZE_BASE
        bool "Randomize the address of the kernel image"
-       select ARM64_MODULE_PLTS if MODULES
        select RELOCATABLE
        help
          Randomizes the virtual address at which the kernel image is
@@ -2198,9 +2178,8 @@ config RANDOMIZE_MODULE_REGION_FULL
          When this option is not set, the module region will be randomized over
          a limited range that contains the [_stext, _etext] interval of the
          core kernel, so branch relocations are almost always in range unless
-         ARM64_MODULE_PLTS is enabled and the region is exhausted. In this
-         particular case of region exhaustion, modules might be able to fall
-         back to a larger 2GB area.
+         the region is exhausted. In this particular case of region
+         exhaustion, modules might be able to fall back to a larger 2GB area.
 
 config CC_HAVE_STACKPROTECTOR_SYSREG
        def_bool $(cc-option,-mstack-protector-guard=sysreg -mstack-protector-guard-reg=sp_el0 -mstack-protector-guard-offset=0)
index 0295780..7b41537 100644 (file)
@@ -59,6 +59,7 @@
                L2_0: l2-cache0 {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                };
        };
 
index ef68f5a..afdf954 100644 (file)
@@ -72,6 +72,7 @@
                L2_0: l2-cache0 {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                };
        };
 
index 796cd7d..7bdeb96 100644 (file)
@@ -58,6 +58,7 @@
                L2_0: l2-cache0 {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                };
        };
 
index 2209c1a..e62a435 100644 (file)
@@ -171,6 +171,7 @@ conn_subsys: bus@5b000000 {
                        interrupt-names = "host", "peripheral", "otg", "wakeup";
                        phys = <&usb3_phy>;
                        phy-names = "cdns3,usb3-phy";
+                       cdns,on-chip-buff-size = /bits/ 16 <18>;
                        status = "disabled";
                };
        };
index 2dce8f2..adb98a7 100644 (file)
@@ -90,6 +90,8 @@ dma_subsys: bus@5a000000 {
                clocks = <&uart0_lpcg IMX_LPCG_CLK_4>,
                         <&uart0_lpcg IMX_LPCG_CLK_0>;
                clock-names = "ipg", "baud";
+               assigned-clocks = <&clk IMX_SC_R_UART_0 IMX_SC_PM_CLK_PER>;
+               assigned-clock-rates = <80000000>;
                power-domains = <&pd IMX_SC_R_UART_0>;
                status = "disabled";
        };
@@ -100,6 +102,8 @@ dma_subsys: bus@5a000000 {
                clocks = <&uart1_lpcg IMX_LPCG_CLK_4>,
                         <&uart1_lpcg IMX_LPCG_CLK_0>;
                clock-names = "ipg", "baud";
+               assigned-clocks = <&clk IMX_SC_R_UART_1 IMX_SC_PM_CLK_PER>;
+               assigned-clock-rates = <80000000>;
                power-domains = <&pd IMX_SC_R_UART_1>;
                status = "disabled";
        };
@@ -110,6 +114,8 @@ dma_subsys: bus@5a000000 {
                clocks = <&uart2_lpcg IMX_LPCG_CLK_4>,
                         <&uart2_lpcg IMX_LPCG_CLK_0>;
                clock-names = "ipg", "baud";
+               assigned-clocks = <&clk IMX_SC_R_UART_2 IMX_SC_PM_CLK_PER>;
+               assigned-clock-rates = <80000000>;
                power-domains = <&pd IMX_SC_R_UART_2>;
                status = "disabled";
        };
@@ -120,6 +126,8 @@ dma_subsys: bus@5a000000 {
                clocks = <&uart3_lpcg IMX_LPCG_CLK_4>,
                         <&uart3_lpcg IMX_LPCG_CLK_0>;
                clock-names = "ipg", "baud";
+               assigned-clocks = <&clk IMX_SC_R_UART_3 IMX_SC_PM_CLK_PER>;
+               assigned-clock-rates = <80000000>;
                power-domains = <&pd IMX_SC_R_UART_3>;
                status = "disabled";
        };
index 9e82069..5a1f7c3 100644 (file)
@@ -81,7 +81,7 @@
 &ecspi2 {
        pinctrl-names = "default";
        pinctrl-0 = <&pinctrl_espi2>;
-       cs-gpios = <&gpio5 9 GPIO_ACTIVE_LOW>;
+       cs-gpios = <&gpio5 13 GPIO_ACTIVE_LOW>;
        status = "okay";
 
        eeprom@0 {
                        MX8MN_IOMUXC_ECSPI2_SCLK_ECSPI2_SCLK            0x82
                        MX8MN_IOMUXC_ECSPI2_MOSI_ECSPI2_MOSI            0x82
                        MX8MN_IOMUXC_ECSPI2_MISO_ECSPI2_MISO            0x82
-                       MX8MN_IOMUXC_ECSPI1_SS0_GPIO5_IO9               0x41
+                       MX8MN_IOMUXC_ECSPI2_SS0_GPIO5_IO13              0x41
                >;
        };
 
index 67072e6..cbd9d12 100644 (file)
                #address-cells = <1>;
                #size-cells = <0>;
 
-               ethphy: ethernet-phy@4 {
+               ethphy: ethernet-phy@4 { /* AR8033 or ADIN1300 */
                        compatible = "ethernet-phy-ieee802.3-c22";
                        reg = <4>;
                        reset-gpios = <&gpio1 9 GPIO_ACTIVE_LOW>;
                        reset-assert-us = <10000>;
+                       /*
+                        * Deassert delay:
+                        * ADIN1300 requires 5ms.
+                        * AR8033   requires 1ms.
+                        */
+                       reset-deassert-us = <20000>;
                };
        };
 };
index bd84db5..8be8f09 100644 (file)
                                         <&clk IMX8MN_CLK_DISP_APB_ROOT>,
                                         <&clk IMX8MN_CLK_DISP_AXI_ROOT>;
                                clock-names = "pix", "axi", "disp_axi";
-                               assigned-clocks = <&clk IMX8MN_CLK_DISP_PIXEL_ROOT>,
-                                                 <&clk IMX8MN_CLK_DISP_AXI>,
-                                                 <&clk IMX8MN_CLK_DISP_APB>;
-                               assigned-clock-parents = <&clk IMX8MN_CLK_DISP_PIXEL>,
-                                                        <&clk IMX8MN_SYS_PLL2_1000M>,
-                                                        <&clk IMX8MN_SYS_PLL1_800M>;
-                               assigned-clock-rates = <594000000>, <500000000>, <200000000>;
                                interrupts = <GIC_SPI 5 IRQ_TYPE_LEVEL_HIGH>;
                                power-domains = <&disp_blk_ctrl IMX8MN_DISPBLK_PD_LCDIF>;
                                status = "disabled";
                                clocks = <&clk IMX8MN_CLK_DSI_CORE>,
                                         <&clk IMX8MN_CLK_DSI_PHY_REF>;
                                clock-names = "bus_clk", "sclk_mipi";
-                               assigned-clocks = <&clk IMX8MN_CLK_DSI_CORE>,
-                                                 <&clk IMX8MN_CLK_DSI_PHY_REF>;
-                               assigned-clock-parents = <&clk IMX8MN_SYS_PLL1_266M>,
-                                                        <&clk IMX8MN_CLK_24M>;
-                               assigned-clock-rates = <266000000>, <24000000>;
-                               samsung,pll-clock-frequency = <24000000>;
                                interrupts = <GIC_SPI 18 IRQ_TYPE_LEVEL_HIGH>;
                                power-domains = <&disp_blk_ctrl IMX8MN_DISPBLK_PD_MIPI_DSI>;
                                status = "disabled";
                                              "lcdif-axi", "lcdif-apb", "lcdif-pix",
                                              "dsi-pclk", "dsi-ref",
                                              "csi-aclk", "csi-pclk";
+                               assigned-clocks = <&clk IMX8MN_CLK_DSI_CORE>,
+                                                 <&clk IMX8MN_CLK_DSI_PHY_REF>,
+                                                 <&clk IMX8MN_CLK_DISP_PIXEL>,
+                                                 <&clk IMX8MN_CLK_DISP_AXI>,
+                                                 <&clk IMX8MN_CLK_DISP_APB>;
+                               assigned-clock-parents = <&clk IMX8MN_SYS_PLL1_266M>,
+                                                        <&clk IMX8MN_CLK_24M>,
+                                                        <&clk IMX8MN_VIDEO_PLL1_OUT>,
+                                                        <&clk IMX8MN_SYS_PLL2_1000M>,
+                                                        <&clk IMX8MN_SYS_PLL1_800M>;
+                               assigned-clock-rates = <266000000>,
+                                                      <24000000>,
+                                                      <594000000>,
+                                                      <500000000>,
+                                                      <200000000>;
                                #power-domain-cells = <1>;
                        };
 
index f813919..428c604 100644 (file)
                                         <&clk IMX8MP_CLK_MEDIA_APB_ROOT>,
                                         <&clk IMX8MP_CLK_MEDIA_AXI_ROOT>;
                                clock-names = "pix", "axi", "disp_axi";
-                               assigned-clocks = <&clk IMX8MP_CLK_MEDIA_DISP1_PIX_ROOT>,
-                                                 <&clk IMX8MP_CLK_MEDIA_AXI>,
-                                                 <&clk IMX8MP_CLK_MEDIA_APB>;
-                               assigned-clock-parents = <&clk IMX8MP_CLK_MEDIA_DISP1_PIX>,
-                                                        <&clk IMX8MP_SYS_PLL2_1000M>,
-                                                        <&clk IMX8MP_SYS_PLL1_800M>;
-                               assigned-clock-rates = <594000000>, <500000000>, <200000000>;
                                interrupts = <GIC_SPI 5 IRQ_TYPE_LEVEL_HIGH>;
                                power-domains = <&media_blk_ctrl IMX8MP_MEDIABLK_PD_LCDIF_1>;
                                status = "disabled";
                                         <&clk IMX8MP_CLK_MEDIA_APB_ROOT>,
                                         <&clk IMX8MP_CLK_MEDIA_AXI_ROOT>;
                                clock-names = "pix", "axi", "disp_axi";
-                               assigned-clocks = <&clk IMX8MP_CLK_MEDIA_DISP2_PIX>,
-                                                 <&clk IMX8MP_VIDEO_PLL1>;
-                               assigned-clock-parents = <&clk IMX8MP_VIDEO_PLL1_OUT>,
-                                                        <&clk IMX8MP_VIDEO_PLL1_REF_SEL>;
-                               assigned-clock-rates = <0>, <1039500000>;
                                power-domains = <&media_blk_ctrl IMX8MP_MEDIABLK_PD_LCDIF_2>;
                                status = "disabled";
 
                                              "disp1", "disp2", "isp", "phy";
 
                                assigned-clocks = <&clk IMX8MP_CLK_MEDIA_AXI>,
-                                                 <&clk IMX8MP_CLK_MEDIA_APB>;
+                                                 <&clk IMX8MP_CLK_MEDIA_APB>,
+                                                 <&clk IMX8MP_CLK_MEDIA_DISP1_PIX>,
+                                                 <&clk IMX8MP_CLK_MEDIA_DISP2_PIX>,
+                                                 <&clk IMX8MP_VIDEO_PLL1>;
                                assigned-clock-parents = <&clk IMX8MP_SYS_PLL2_1000M>,
-                                                        <&clk IMX8MP_SYS_PLL1_800M>;
-                               assigned-clock-rates = <500000000>, <200000000>;
-
+                                                        <&clk IMX8MP_SYS_PLL1_800M>,
+                                                        <&clk IMX8MP_VIDEO_PLL1_OUT>,
+                                                        <&clk IMX8MP_VIDEO_PLL1_OUT>;
+                               assigned-clock-rates = <500000000>, <200000000>,
+                                                      <0>, <0>, <1039500000>;
                                #power-domain-cells = <1>;
 
                                lvds_bridge: bridge@5c {
index ce9d3f0..607cd6b 100644 (file)
@@ -82,8 +82,8 @@
        pinctrl-0 = <&pinctrl_usdhc2>;
        bus-width = <4>;
        vmmc-supply = <&reg_usdhc2_vmmc>;
-       cd-gpios = <&lsio_gpio4 22 GPIO_ACTIVE_LOW>;
-       wp-gpios = <&lsio_gpio4 21 GPIO_ACTIVE_HIGH>;
+       cd-gpios = <&lsio_gpio5 22 GPIO_ACTIVE_LOW>;
+       wp-gpios = <&lsio_gpio5 21 GPIO_ACTIVE_HIGH>;
        status = "okay";
 };
 
index 7264d78..9af769a 100644 (file)
        };
 };
 
+&iomuxc {
+       pinctrl-names = "default";
+       pinctrl-0 = <&pinctrl_ext_io0>, <&pinctrl_hog0>, <&pinctrl_hog1>,
+                   <&pinctrl_lpspi2_cs2>;
+};
+
 /* Colibri SPI */
 &lpspi2 {
        status = "okay";
index 5f30c88..f895306 100644 (file)
@@ -48,8 +48,7 @@
                           <IMX8QXP_SAI0_TXFS_LSIO_GPIO0_IO28           0x20>,          /* SODIMM 101 */
                           <IMX8QXP_SAI0_RXD_LSIO_GPIO0_IO27            0x20>,          /* SODIMM  97 */
                           <IMX8QXP_ENET0_RGMII_RXC_LSIO_GPIO5_IO03     0x06000020>,    /* SODIMM  85 */
-                          <IMX8QXP_SAI0_TXC_LSIO_GPIO0_IO26            0x20>,          /* SODIMM  79 */
-                          <IMX8QXP_QSPI0A_DATA1_LSIO_GPIO3_IO10        0x06700041>;    /* SODIMM  45 */
+                          <IMX8QXP_SAI0_TXC_LSIO_GPIO0_IO26            0x20>;          /* SODIMM  79 */
        };
 
        pinctrl_uart1_forceoff: uart1forceoffgrp {
index 7cad791..49d105e 100644 (file)
 /* TODO VPU Encoder/Decoder */
 
 &iomuxc {
-       pinctrl-names = "default";
-       pinctrl-0 = <&pinctrl_ext_io0>, <&pinctrl_hog0>, <&pinctrl_hog1>,
-                   <&pinctrl_hog2>, <&pinctrl_lpspi2_cs2>;
-
        /* On-module touch pen-down interrupt */
        pinctrl_ad7879_int: ad7879intgrp {
                fsl,pins = <IMX8QXP_MIPI_CSI0_I2C0_SCL_LSIO_GPIO3_IO05  0x21>;
        };
 
        pinctrl_hog1: hog1grp {
-               fsl,pins = <IMX8QXP_CSI_MCLK_LSIO_GPIO3_IO01                    0x20>,          /* SODIMM  75 */
-                          <IMX8QXP_QSPI0A_SCLK_LSIO_GPIO3_IO16                 0x20>;          /* SODIMM  93 */
+               fsl,pins = <IMX8QXP_QSPI0A_SCLK_LSIO_GPIO3_IO16                 0x20>;          /* SODIMM  93 */
        };
 
        pinctrl_hog2: hog2grp {
                fsl,pins = <IMX8QXP_SCU_BOOT_MODE3_SCU_DSC_RTC_CLOCK_OUTPUT_32K 0x20>;
        };
 };
+
+/* Delete peripherals which are not present on SOC, but are defined in imx8-ss-*.dtsi */
+
+/delete-node/ &adc1;
+/delete-node/ &adc1_lpcg;
+/delete-node/ &dsp;
+/delete-node/ &dsp_lpcg;
index 12e0e17..af4d971 100644 (file)
@@ -73,6 +73,7 @@
                L2_0: l2-cache {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                };
        };
 
index 9ff4e9d..f531797 100644 (file)
@@ -83,7 +83,8 @@
 
                L2_0: l2-cache {
                        compatible = "cache";
-                       cache-level = <0x2>;
+                       cache-level = <2>;
+                       cache-unified;
                };
        };
 
index 84e715a..5b2c198 100644 (file)
@@ -66,7 +66,8 @@
 
                L2_0: l2-cache {
                        compatible = "cache";
-                       cache-level = <0x2>;
+                       cache-level = <2>;
+                       cache-unified;
                };
        };
 
index 3bb7435..0ed19fb 100644 (file)
@@ -72,6 +72,7 @@
                L2_0: l2-cache {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                };
        };
 
index 7e0fa37..834e0b6 100644 (file)
                L2_0: l2-cache {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                };
 
                idle-states {
index 602cb18..d44cfa0 100644 (file)
                L2_0: l2-cache-0 {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                };
 
                L2_1: l2-cache-1 {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                };
        };
 
index 1f0bd24..f47fb8e 100644 (file)
                l2_0: l2-cache0 {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                };
 
                l2_1: l2-cache1 {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                };
        };
 
index 2831966..bdc3f2b 100644 (file)
@@ -52,6 +52,7 @@
                        L2_0: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                        };
                };
 
@@ -88,6 +89,7 @@
                        L2_1: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                        };
                };
 
index 2b35cb3..30257c0 100644 (file)
@@ -53,8 +53,9 @@
                        #cooling-cells = <2>;
                        next-level-cache = <&L2_0>;
                        L2_0: l2-cache {
-                             compatible = "cache";
-                             cache-level = <2>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                        };
                };
 
@@ -83,8 +84,9 @@
                        #cooling-cells = <2>;
                        next-level-cache = <&L2_1>;
                        L2_1: l2-cache {
-                             compatible = "cache";
-                             cache-level = <2>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                        };
                };
 
index b150437..3ec941f 100644 (file)
                        L2_0: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                        };
                };
 
                        L2_1: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                        };
                };
 
index ae5abc7..b29bc4e 100644 (file)
@@ -51,6 +51,7 @@
                        L2_0: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                        };
                };
 
index eefed58..972f753 100644 (file)
@@ -95,6 +95,7 @@
                L2_0: l2-cache {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                };
 
                idle-states {
index 7344381..fb553f0 100644 (file)
                        next-level-cache = <&L2_0>;
                        L2_0: l2-cache {
                                compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                                L3_0: l3-cache {
                                        compatible = "cache";
+                                       cache-level = <3>;
+                                       cache-unified;
                                };
                        };
                };
@@ -54,6 +58,8 @@
                        next-level-cache = <&L2_100>;
                        L2_100: l2-cache {
                                compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
@@ -70,6 +76,8 @@
                        next-level-cache = <&L2_200>;
                        L2_200: l2-cache {
                                compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
@@ -86,6 +94,8 @@
                        next-level-cache = <&L2_300>;
                        L2_300: l2-cache {
                                compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
index 339fea5..15e1ae1 100644 (file)
@@ -7,7 +7,7 @@
 
 #include <dt-bindings/regulator/qcom,rpmh-regulator.h>
 #include <dt-bindings/gpio/gpio.h>
-#include "sm8150.dtsi"
+#include "sa8155p.dtsi"
 #include "pmm8155au_1.dtsi"
 #include "pmm8155au_2.dtsi"
 
diff --git a/arch/arm64/boot/dts/qcom/sa8155p.dtsi b/arch/arm64/boot/dts/qcom/sa8155p.dtsi
new file mode 100644 (file)
index 0000000..ffb7ab6
--- /dev/null
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: BSD-3-Clause
+/*
+ * Copyright (c) 2023, Linaro Limited
+ *
+ * SA8155P is an automotive variant of SM8150, with some minor changes.
+ * Most notably, the RPMhPD setup differs: MMCX and LCX/LMX rails are gone,
+ * though the cmd-db doesn't reflect that and access attempts result in a bite.
+ */
+
+#include "sm8150.dtsi"
+
+&dispcc {
+       power-domains = <&rpmhpd SA8155P_CX>;
+};
+
+&mdss_dsi0 {
+       power-domains = <&rpmhpd SA8155P_CX>;
+};
+
+&mdss_dsi1 {
+       power-domains = <&rpmhpd SA8155P_CX>;
+};
+
+&mdss_mdp {
+       power-domains = <&rpmhpd SA8155P_CX>;
+};
+
+&remoteproc_slpi {
+       power-domains = <&rpmhpd SA8155P_CX>,
+                       <&rpmhpd SA8155P_MX>;
+};
+
+&rpmhpd {
+       /*
+        * The bindings were crafted such that SA8155P PDs match their
+        * SM8150 counterparts to make it more maintainable and only
+        * necessitate adjusting entries that actually differ
+        */
+       compatible = "qcom,sa8155p-rpmhpd";
+};
index 2343df7..c3310ca 100644 (file)
                        next-level-cache = <&L2_0>;
                        L2_0: l2-cache {
                                compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                                L3_0: l3-cache {
                                        compatible = "cache";
+                                       cache-level = <3>;
+                                       cache-unified;
                                };
                        };
                };
@@ -58,6 +62,8 @@
                        next-level-cache = <&L2_1>;
                        L2_1: l2-cache {
                                compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
@@ -71,6 +77,8 @@
                        next-level-cache = <&L2_2>;
                        L2_2: l2-cache {
                                compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
@@ -84,6 +92,8 @@
                        next-level-cache = <&L2_3>;
                        L2_3: l2-cache {
                                compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        next-level-cache = <&L2_4>;
                        L2_4: l2-cache {
                                compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_1>;
                                L3_1: l3-cache {
                                        compatible = "cache";
+                                       cache-level = <3>;
+                                       cache-unified;
                                };
 
                        };
                        next-level-cache = <&L2_5>;
                        L2_5: l2-cache {
                                compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_1>;
                        };
                };
                        next-level-cache = <&L2_6>;
                        L2_6: l2-cache {
                                compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_1>;
                        };
                };
                        next-level-cache = <&L2_7>;
                        L2_7: l2-cache {
                                compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_1>;
                        };
                };
index 9f05227..299ef5d 100644 (file)
        qcom,spare-regs = <&tcsr_regs_2 0xb3e4>;
 };
 
+&scm {
+       /* TF-A firmware maps memory cached so mark dma-coherent to match. */
+       dma-coherent;
+};
+
 &sdhc_1 {
        status = "okay";
 
index d8ed1d7..4b306a5 100644 (file)
 &cpu6_opp12 {
        opp-peak-kBps = <8532000 23347200>;
 };
+
+&cpu6_opp13 {
+       opp-peak-kBps = <8532000 23347200>;
+};
+
+&cpu6_opp14 {
+       opp-peak-kBps = <8532000 23347200>;
+};
index ca6920d..1472e7f 100644 (file)
@@ -892,6 +892,11 @@ hp_i2c: &i2c9 {
        qcom,spare-regs = <&tcsr_regs_2 0xb3e4>;
 };
 
+&scm {
+       /* TF-A firmware maps memory cached so mark dma-coherent to match. */
+       dma-coherent;
+};
+
 &sdhc_1 {
        status = "okay";
 
index ea1ffad..a65be76 100644 (file)
                        L2_0: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                                L3_0: l3-cache {
                                        compatible = "cache";
                                        cache-level = <3>;
+                                       cache-unified;
                                };
                        };
                };
                        L2_100: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_200: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_300: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_400: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_500: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_600: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_700: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
        };
 
        firmware {
-               scm {
+               scm: scm {
                        compatible = "qcom,scm-sc7180", "qcom,scm";
                };
        };
index f562e4d..2e1cd21 100644 (file)
        firmware-name = "ath11k/WCN6750/hw1.0/wpss.mdt";
 };
 
+&scm {
+       /* TF-A firmware maps memory cached so mark dma-coherent to match. */
+       dma-coherent;
+};
+
 &wifi {
        status = "okay";
 
index c6dc200..2102704 100644 (file)
        wcd_rx: codec@0,4 {
                compatible = "sdw20217010d00";
                reg = <0 4>;
-               #sound-dai-cells = <1>;
                qcom,rx-port-mapping = <1 2 3 4 5>;
        };
 };
        wcd_tx: codec@0,3 {
                compatible = "sdw20217010d00";
                reg = <0 3>;
-               #sound-dai-cells = <1>;
                qcom,tx-port-mapping = <1 2 3 4>;
        };
 };
index 88b3586..9137db0 100644 (file)
        wcd_rx: codec@0,4 {
                compatible = "sdw20217010d00";
                reg = <0 4>;
-               #sound-dai-cells = <1>;
                qcom,rx-port-mapping = <1 2 3 4 5>;
        };
 };
        wcd_tx: codec@0,3 {
                compatible = "sdw20217010d00";
                reg = <0 3>;
-               #sound-dai-cells = <1>;
                qcom,tx-port-mapping = <1 2 3 4>;
        };
 };
index 31728f4..36f0bb9 100644 (file)
                        L2_0: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                                L3_0: l3-cache {
                                        compatible = "cache";
                                        cache-level = <3>;
+                                       cache-unified;
                                };
                        };
                };
                        L2_100: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_200: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_300: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_400: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_500: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_600: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_700: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
        };
 
        firmware {
-               scm {
+               scm: scm {
                        compatible = "qcom,scm-sc7280", "qcom,scm";
                };
        };
index 8fa9fbf..cc4aef2 100644 (file)
                        L2_0: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                                L3_0: l3-cache {
-                                     compatible = "cache";
-                                     cache-level = <3>;
+                                       compatible = "cache";
+                                       cache-level = <3>;
+                                       cache-unified;
                                };
                        };
                };
@@ -83,6 +85,7 @@
                        L2_100: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_200: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_300: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_400: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_500: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_600: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_700: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                                        pins = "gpio7";
                                        function = "dmic1_data";
                                        drive-strength = <8>;
+                                       input-enable;
                                };
                        };
 
                                        function = "dmic1_data";
                                        drive-strength = <2>;
                                        bias-pull-down;
+                                       input-enable;
                                };
                        };
 
                                        pins = "gpio9";
                                        function = "dmic2_data";
                                        drive-strength = <8>;
+                                       input-enable;
                                };
                        };
 
                                        function = "dmic2_data";
                                        drive-strength = <2>;
                                        bias-pull-down;
+                                       input-enable;
                                };
                        };
 
                        qcom,tcs-config = <ACTIVE_TCS  2>, <SLEEP_TCS   3>,
                                          <WAKE_TCS    3>, <CONTROL_TCS 1>;
                        label = "apps_rsc";
+                       power-domains = <&CLUSTER_PD>;
 
                        apps_bcm_voter: bcm-voter {
                                compatible = "qcom,bcm-voter";
index 37e72b1..eaead2f 100644 (file)
@@ -63,6 +63,7 @@
                        L2_1: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                        };
                };
 
                        L2_0: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                        };
                };
 
index c5f839d..b61e13d 100644 (file)
                        L2_0: l2-cache {
                                compatible = "cache";
                                next-level-cache = <&L3_0>;
+                               cache-level = <2>;
+                               cache-unified;
                                L3_0: l3-cache {
-                                     compatible = "cache";
+                                       compatible = "cache";
+                                       cache-level = <3>;
+                                       cache-unified;
                                };
                        };
                };
@@ -57,6 +61,8 @@
                        next-level-cache = <&L2_100>;
                        L2_100: l2-cache {
                                compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
@@ -71,6 +77,8 @@
                        next-level-cache = <&L2_200>;
                        L2_200: l2-cache {
                                compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
@@ -85,6 +93,8 @@
                        next-level-cache = <&L2_300>;
                        L2_300: l2-cache {
                                compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        next-level-cache = <&L2_400>;
                        L2_400: l2-cache {
                                compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        next-level-cache = <&L2_500>;
                        L2_500: l2-cache {
                                compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        next-level-cache = <&L2_600>;
                        L2_600: l2-cache {
                                compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        next-level-cache = <&L2_700>;
                        L2_700: l2-cache {
                                compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
index 9042444..cdeb05e 100644 (file)
                        L2_0: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                                L3_0: l3-cache {
-                                     compatible = "cache";
-                                     cache-level = <3>;
+                                       compatible = "cache";
+                                       cache-level = <3>;
+                                       cache-unified;
                                };
                        };
                };
                        L2_100: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_200: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_300: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_400: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_500: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_600: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_700: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
index 631ca32..43f31c1 100644 (file)
@@ -50,6 +50,7 @@
                        L2_0: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                        };
                };
 
                        L2_1: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                        };
                };
 
index 9484752..2aa093d 100644 (file)
@@ -47,6 +47,7 @@
                        L2_0: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                        };
                };
 
@@ -87,6 +88,7 @@
                        L2_1: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                        };
                };
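
The cache-unified; lines added throughout these CPU/cache hunks follow the generic cache binding: a node with compatible = "cache" that describes a unified (combined instruction/data) cache is expected to say so explicitly, and the DT schema treats cache-unified as required for such nodes, which is presumably what these additions address. A standalone sketch of a compliant per-core L2 with a shared L3 (the CPU compatible and labels are assumptions, not copied from any one file above):

        /dts-v1/;
        / {
                cpus {
                        #address-cells = <1>;
                        #size-cells = <0>;

                        cpu@0 {
                                device_type = "cpu";
                                compatible = "arm,cortex-a55";
                                reg = <0>;
                                next-level-cache = <&l2_0>;

                                /* private unified L2 for this core */
                                l2_0: l2-cache {
                                        compatible = "cache";
                                        cache-level = <2>;
                                        cache-unified;
                                        next-level-cache = <&l3>;

                                        /* unified L3 shared by the cluster */
                                        l3: l3-cache {
                                                compatible = "cache";
                                                cache-level = <3>;
                                                cache-unified;
                                        };
                                };
                        };
                };
        };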
 
index 18c4616..ad34301 100644 (file)
                        L2_0: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                                L3_0: l3-cache {
                                        compatible = "cache";
                                        cache-level = <3>;
+                                       cache-unified;
                                };
                        };
                };
@@ -86,6 +88,7 @@
                        L2_100: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_200: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_300: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_400: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_500: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_600: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_700: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
index 8220e6f..b2f1bb1 100644 (file)
 };
 
 &remoteproc_adsp {
-       firmware-name = "qcom/Sony/murray/adsp.mbn";
+       firmware-name = "qcom/sm6375/Sony/murray/adsp.mbn";
        status = "okay";
 };
 
 &remoteproc_cdsp {
-       firmware-name = "qcom/Sony/murray/cdsp.mbn";
+       firmware-name = "qcom/sm6375/Sony/murray/cdsp.mbn";
        status = "okay";
 };
 
index ae9b6bc..f8d9c34 100644 (file)
                        power-domain-names = "psci";
                        #cooling-cells = <2>;
                        L2_0: l2-cache {
-                             compatible = "cache";
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                                L3_0: l3-cache {
-                                     compatible = "cache";
+                                       compatible = "cache";
+                                       cache-level = <3>;
+                                       cache-unified;
                                };
                        };
                };
                        power-domain-names = "psci";
                        #cooling-cells = <2>;
                        L2_100: l2-cache {
-                             compatible = "cache";
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
                        power-domain-names = "psci";
                        #cooling-cells = <2>;
                        L2_200: l2-cache {
-                             compatible = "cache";
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
                        power-domain-names = "psci";
                        #cooling-cells = <2>;
                        L2_300: l2-cache {
-                             compatible = "cache";
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
                        power-domain-names = "psci";
                        #cooling-cells = <2>;
                        L2_400: l2-cache {
-                             compatible = "cache";
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
                        power-domain-names = "psci";
                        #cooling-cells = <2>;
                        L2_500: l2-cache {
-                             compatible = "cache";
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
                        power-domain-names = "psci";
                        #cooling-cells = <2>;
                        L2_600: l2-cache {
-                             compatible = "cache";
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
                        power-domain-names = "psci";
                        #cooling-cells = <2>;
                        L2_700: l2-cache {
-                             compatible = "cache";
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
index 2273fa5..27dcda0 100644 (file)
                        L2_0: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                                L3_0: l3-cache {
-                                     compatible = "cache";
-                                     cache-level = <3>;
+                                       compatible = "cache";
+                                       cache-level = <3>;
+                                       cache-unified;
                                };
                        };
                };
@@ -90,6 +92,7 @@
                        L2_100: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_200: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_300: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_400: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_500: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_600: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_700: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
index 8b2ae39..de6101d 100644 (file)
@@ -13,6 +13,6 @@
 };
 
 &display_panel {
-       compatible = "xiaomi,elish-boe-nt36523";
+       compatible = "xiaomi,elish-boe-nt36523", "novatek,nt36523";
        status = "okay";
 };
index a4d5341..4cffe9c 100644 (file)
@@ -13,6 +13,6 @@
 };
 
 &display_panel {
-       compatible = "xiaomi,elish-csot-nt36523";
+       compatible = "xiaomi,elish-csot-nt36523", "novatek,nt36523";
        status = "okay";
 };
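
Both panel overlays above gain "novatek,nt36523" as a second compatible entry. A compatible property is an ordered list from most specific to most generic, so a driver that matches on the generic Novatek controller string can bind while the Xiaomi-specific string stays first for any per-panel handling. The resulting property, as shown in the hunks above (the &display_panel reference is whatever the base dtsi defines):

        &display_panel {
                /* most specific first, generic controller fallback last */
                compatible = "xiaomi,elish-csot-nt36523", "novatek,nt36523";
                status = "okay";
        };
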
index ebcb481..3efdc03 100644 (file)
                        power-domain-names = "psci";
                        #cooling-cells = <2>;
                        L2_0: l2-cache {
-                             compatible = "cache";
-                             cache-level = <2>;
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                                L3_0: l3-cache {
-                                     compatible = "cache";
-                                     cache-level = <3>;
+                                       compatible = "cache";
+                                       cache-level = <3>;
+                                       cache-unified;
                                };
                        };
                };
                        power-domain-names = "psci";
                        #cooling-cells = <2>;
                        L2_100: l2-cache {
-                             compatible = "cache";
-                             cache-level = <2>;
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
                        power-domain-names = "psci";
                        #cooling-cells = <2>;
                        L2_200: l2-cache {
-                             compatible = "cache";
-                             cache-level = <2>;
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
                        power-domain-names = "psci";
                        #cooling-cells = <2>;
                        L2_300: l2-cache {
-                             compatible = "cache";
-                             cache-level = <2>;
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
                        power-domain-names = "psci";
                        #cooling-cells = <2>;
                        L2_400: l2-cache {
-                             compatible = "cache";
-                             cache-level = <2>;
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
                        power-domain-names = "psci";
                        #cooling-cells = <2>;
                        L2_500: l2-cache {
-                             compatible = "cache";
-                             cache-level = <2>;
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
                        power-domain-names = "psci";
                        #cooling-cells = <2>;
                        L2_600: l2-cache {
-                             compatible = "cache";
-                             cache-level = <2>;
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
                        power-domain-names = "psci";
                        #cooling-cells = <2>;
                        L2_700: l2-cache {
-                             compatible = "cache";
-                             cache-level = <2>;
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
index 595533a..d59ea8e 100644 (file)
                        #cooling-cells = <2>;
                        clocks = <&cpufreq_hw 0>;
                        L2_0: l2-cache {
-                             compatible = "cache";
-                             cache-level = <2>;
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                                L3_0: l3-cache {
-                                     compatible = "cache";
-                                     cache-level = <3>;
+                                       compatible = "cache";
+                                       cache-level = <3>;
+                                       cache-unified;
                                };
                        };
                };
                        #cooling-cells = <2>;
                        clocks = <&cpufreq_hw 0>;
                        L2_100: l2-cache {
-                             compatible = "cache";
-                             cache-level = <2>;
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
                        #cooling-cells = <2>;
                        clocks = <&cpufreq_hw 0>;
                        L2_200: l2-cache {
-                             compatible = "cache";
-                             cache-level = <2>;
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
                        #cooling-cells = <2>;
                        clocks = <&cpufreq_hw 0>;
                        L2_300: l2-cache {
-                             compatible = "cache";
-                             cache-level = <2>;
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
                        #cooling-cells = <2>;
                        clocks = <&cpufreq_hw 1>;
                        L2_400: l2-cache {
-                             compatible = "cache";
-                             cache-level = <2>;
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
                        #cooling-cells = <2>;
                        clocks = <&cpufreq_hw 1>;
                        L2_500: l2-cache {
-                             compatible = "cache";
-                             cache-level = <2>;
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
                        #cooling-cells = <2>;
                        clocks = <&cpufreq_hw 1>;
                        L2_600: l2-cache {
-                             compatible = "cache";
-                             cache-level = <2>;
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
                        #cooling-cells = <2>;
                        clocks = <&cpufreq_hw 2>;
                        L2_700: l2-cache {
-                             compatible = "cache";
-                             cache-level = <2>;
-                             next-level-cache = <&L3_0>;
+                               compatible = "cache";
+                               cache-level = <2>;
+                               cache-unified;
+                               next-level-cache = <&L3_0>;
                        };
                };
 
index 6e9bad8..558cbc4 100644 (file)
                        L2_0: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                                L3_0: l3-cache {
                                        compatible = "cache";
                                        cache-level = <3>;
+                                       cache-unified;
                                };
                        };
                };
                        L2_100: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_200: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_300: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_400: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_500: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_600: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        L2_700: l2-cache {
                                compatible = "cache";
                                cache-level = <2>;
+                               cache-unified;
                                next-level-cache = <&L3_0>;
                        };
                };
                        qcom,din-ports = <4>;
                        qcom,dout-ports = <9>;
 
-                       qcom,ports-sinterval =          <0x07 0x1f 0x3f 0x07 0x1f 0x3f 0x18f 0xff 0xff 0x0f 0x0f 0xff 0x31f>;
+                       qcom,ports-sinterval =          /bits/ 16 <0x07 0x1f 0x3f 0x07 0x1f 0x3f 0x18f 0xff 0xff 0x0f 0x0f 0xff 0x31f>;
                        qcom,ports-offset1 =            /bits/ 8 <0x01 0x03 0x05 0x02 0x04 0x15 0x00 0xff 0xff 0x06 0x0d 0xff 0x00>;
                        qcom,ports-offset2 =            /bits/ 8 <0xff 0x07 0x1f 0xff 0x07 0x1f 0xff 0xff 0xff 0xff 0xff 0xff 0xff>;
                        qcom,ports-hstart =             /bits/ 8 <0xff 0xff 0xff 0xff 0xff 0xff 0x08 0xff 0xff 0xff 0xff 0xff 0x0f>;
                        qcom,din-ports = <0>;
                        qcom,dout-ports = <10>;
 
-                       qcom,ports-sinterval =          <0x03 0x3f 0x1f 0x07 0x00 0x18f 0xff 0xff 0xff 0xff>;
+                       qcom,ports-sinterval =          /bits/ 16 <0x03 0x3f 0x1f 0x07 0x00 0x18f 0xff 0xff 0xff 0xff>;
                        qcom,ports-offset1 =            /bits/ 8 <0x00 0x00 0x0b 0x01 0x00 0x00 0xff 0xff 0xff 0xff>;
                        qcom,ports-offset2 =            /bits/ 8 <0x00 0x00 0x0b 0x00 0x00 0x00 0xff 0xff 0xff 0xff>;
                        qcom,ports-hstart =             /bits/ 8 <0xff 0x03 0xff 0xff 0xff 0x08 0xff 0xff 0xff 0xff>;
                        qcom,din-ports = <4>;
                        qcom,dout-ports = <9>;
 
-                       qcom,ports-sinterval =          <0x07 0x1f 0x3f 0x07 0x1f 0x3f 0x18f 0xff 0xff 0x0f 0x0f 0xff 0x31f>;
+                       qcom,ports-sinterval =          /bits/ 16 <0x07 0x1f 0x3f 0x07 0x1f 0x3f 0x18f 0xff 0xff 0x0f 0x0f 0xff 0x31f>;
                        qcom,ports-offset1 =            /bits/ 8 <0x01 0x03 0x05 0x02 0x04 0x15 0x00 0xff 0xff 0x06 0x0d 0xff 0x00>;
                        qcom,ports-offset2 =            /bits/ 8 <0xff 0x07 0x1f 0xff 0x07 0x1f 0xff 0xff 0xff 0xff 0xff 0xff 0xff>;
                        qcom,ports-hstart =             /bits/ 8 <0xff 0xff 0xff 0xff 0xff 0xff 0x08 0xff 0xff 0xff 0xff 0xff 0x0f>;
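
The qcom,ports-sinterval changes above are purely about element width: without a modifier, every value inside < > is stored as a 32-bit cell, whereas the /bits/ prefix lets dtc emit narrower elements. Values such as 0x18f and 0x31f do not fit in the 8-bit form used by the neighbouring port properties, so 16-bit elements are used instead, presumably matching how the soundwire driver reads the property. A standalone sketch of the three forms (property names are illustrative):

        /dts-v1/;
        / {
                example {
                        /* default: each element becomes one 32-bit cell */
                        thirty-two-bit = <0x07 0x1f 0x3f>;
                        /* 16-bit elements; 0x18f and 0x31f need more than 8 bits */
                        sixteen-bit = /bits/ 16 <0x07 0x1f 0x18f 0x31f>;
                        /* 8-bit elements, as used by qcom,ports-offset1 above */
                        eight-bit = /bits/ 8 <0x01 0x03 0x05>;
                };
        };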
 
                system-cache-controller@25000000 {
                        compatible = "qcom,sm8550-llcc";
-                       reg = <0 0x25000000 0 0x800000>,
+                       reg = <0 0x25000000 0 0x200000>,
+                             <0 0x25200000 0 0x200000>,
+                             <0 0x25400000 0 0x200000>,
+                             <0 0x25600000 0 0x200000>,
                              <0 0x25800000 0 0x200000>;
-                       reg-names = "llcc_base", "llcc_broadcast_base";
+                       reg-names = "llcc0_base",
+                                   "llcc1_base",
+                                   "llcc2_base",
+                                   "llcc3_base",
+                                   "llcc_broadcast_base";
                        interrupts = <GIC_SPI 266 IRQ_TYPE_LEVEL_HIGH>;
                };
 
index dd228a2..2ae4bb7 100644 (file)
@@ -97,6 +97,7 @@
                l2: l2-cache {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                };
        };
 
index f69a38f..0a27fa5 100644 (file)
@@ -37,7 +37,8 @@
                vin-supply = <&vcc_io>;
        };
 
-       vcc_host_5v: vcc-host-5v-regulator {
+       /* Common enable line for all of the rails mentioned in the labels */
+       vcc_host_5v: vcc_host1_5v: vcc_otg_5v: vcc-host-5v-regulator {
                compatible = "regulator-fixed";
                gpio = <&gpio0 RK_PA2 GPIO_ACTIVE_LOW>;
                pinctrl-names = "default";
                vin-supply = <&vcc_sys>;
        };
 
-       vcc_host1_5v: vcc_otg_5v: vcc-host1-5v-regulator {
-               compatible = "regulator-fixed";
-               gpio = <&gpio0 RK_PA2 GPIO_ACTIVE_LOW>;
-               pinctrl-names = "default";
-               pinctrl-0 = <&usb20_host_drv>;
-               regulator-name = "vcc_host1_5v";
-               regulator-always-on;
-               regulator-boot-on;
-               vin-supply = <&vcc_sys>;
-       };
-
        vcc_sys: vcc-sys {
                compatible = "regulator-fixed";
                regulator-name = "vcc_sys";
index 6d7a7bf..e729e7a 100644 (file)
                l2: l2-cache0 {
                        compatible = "cache";
                        cache-level = <2>;
+                       cache-unified;
                };
        };
 
index 263ce40..cddf6cd 100644 (file)
                regulator-max-microvolt = <5000000>;
                vin-supply = <&vcc12v_dcin>;
        };
+
+       vcc_sd_pwr: vcc-sd-pwr-regulator {
+               compatible = "regulator-fixed";
+               regulator-name = "vcc_sd_pwr";
+               regulator-always-on;
+               regulator-boot-on;
+               regulator-min-microvolt = <3300000>;
+               regulator-max-microvolt = <3300000>;
+               vin-supply = <&vcc3v3_sys>;
+       };
 };
 
 /* phy for pcie */
 };
 
 &sdmmc0 {
-       vmmc-supply = <&sdmmc_pwr>;
-       status = "okay";
-};
-
-&sdmmc_pwr {
-       regulator-min-microvolt = <3300000>;
-       regulator-max-microvolt = <3300000>;
+       vmmc-supply = <&vcc_sd_pwr>;
        status = "okay";
 };
 
index 102e448..31aa2b8 100644 (file)
                regulator-max-microvolt = <3300000>;
                vin-supply = <&vcc5v0_sys>;
        };
-
-       sdmmc_pwr: sdmmc-pwr-regulator {
-               compatible = "regulator-fixed";
-               enable-active-high;
-               gpio = <&gpio0 RK_PA5 GPIO_ACTIVE_HIGH>;
-               pinctrl-names = "default";
-               pinctrl-0 = <&sdmmc_pwr_h>;
-               regulator-name = "sdmmc_pwr";
-               status = "disabled";
-       };
 };
 
 &cpu0 {
        status = "disabled";
 };
 
+&gpio0 {
+       nextrst-hog {
+               gpio-hog;
+               /*
+                * GPIO_ACTIVE_LOW + output-low here means that the pin is set
+                * to high, because output-low decides the value pre-inversion.
+                */
+               gpios = <RK_PA5 GPIO_ACTIVE_LOW>;
+               line-name = "nEXTRST";
+               output-low;
+       };
+};
+
 &gpu {
        mali-supply = <&vdd_gpu>;
        status = "okay";
                        rockchip,pins = <2 RK_PC2 RK_FUNC_GPIO &pcfg_pull_none>;
                };
        };
-
-       sdmmc-pwr {
-               sdmmc_pwr_h: sdmmc-pwr-h {
-                       rockchip,pins = <0 RK_PA5 RK_FUNC_GPIO &pcfg_pull_none>;
-               };
-       };
 };
 
 &pmu_io_domains {
index f70ca9f..c718b8d 100644 (file)
 
        rockchip-key {
                reset_button_pin: reset-button-pin {
-                       rockchip,pins = <4 RK_PA0 RK_FUNC_GPIO &pcfg_pull_up>;
+                       rockchip,pins = <0 RK_PB7 RK_FUNC_GPIO &pcfg_pull_up>;
                };
        };
 };
index ba67b58..f1be76a 100644 (file)
                power-domains = <&power RK3568_PD_PIPE>;
                reg = <0x3 0xc0400000 0x0 0x00400000>,
                      <0x0 0xfe270000 0x0 0x00010000>,
-                     <0x3 0x7f000000 0x0 0x01000000>;
-               ranges = <0x01000000 0x0 0x3ef00000 0x3 0x7ef00000 0x0 0x00100000>,
-                        <0x02000000 0x0 0x00000000 0x3 0x40000000 0x0 0x3ef00000>;
+                     <0x0 0xf2000000 0x0 0x00100000>;
+               ranges = <0x01000000 0x0 0xf2100000 0x0 0xf2100000 0x0 0x00100000>,
+                        <0x02000000 0x0 0xf2200000 0x0 0xf2200000 0x0 0x01e00000>,
+                        <0x03000000 0x0 0x40000000 0x3 0x40000000 0x0 0x40000000>;
                reg-names = "dbi", "apb", "config";
                resets = <&cru SRST_PCIE30X1_POWERUP>;
                reset-names = "pipe";
                power-domains = <&power RK3568_PD_PIPE>;
                reg = <0x3 0xc0800000 0x0 0x00400000>,
                      <0x0 0xfe280000 0x0 0x00010000>,
-                     <0x3 0xbf000000 0x0 0x01000000>;
-               ranges = <0x01000000 0x0 0x3ef00000 0x3 0xbef00000 0x0 0x00100000>,
-                        <0x02000000 0x0 0x00000000 0x3 0x80000000 0x0 0x3ef00000>;
+                     <0x0 0xf0000000 0x0 0x00100000>;
+               ranges = <0x01000000 0x0 0xf0100000 0x0 0xf0100000 0x0 0x00100000>,
+                        <0x02000000 0x0 0xf0200000 0x0 0xf0200000 0x0 0x01e00000>,
+                        <0x03000000 0x0 0x40000000 0x3 0x80000000 0x0 0x40000000>;
                reg-names = "dbi", "apb", "config";
                resets = <&cru SRST_PCIE30X2_POWERUP>;
                reset-names = "pipe";
index f62e0fd..61680c7 100644 (file)
                compatible = "rockchip,rk3568-pcie";
                reg = <0x3 0xc0000000 0x0 0x00400000>,
                      <0x0 0xfe260000 0x0 0x00010000>,
-                     <0x3 0x3f000000 0x0 0x01000000>;
+                     <0x0 0xf4000000 0x0 0x00100000>;
                reg-names = "dbi", "apb", "config";
                interrupts = <GIC_SPI 75 IRQ_TYPE_LEVEL_HIGH>,
                             <GIC_SPI 74 IRQ_TYPE_LEVEL_HIGH>,
                phys = <&combphy2 PHY_TYPE_PCIE>;
                phy-names = "pcie-phy";
                power-domains = <&power RK3568_PD_PIPE>;
-               ranges = <0x01000000 0x0 0x3ef00000 0x3 0x3ef00000 0x0 0x00100000
-                         0x02000000 0x0 0x00000000 0x3 0x00000000 0x0 0x3ef00000>;
+               ranges = <0x01000000 0x0 0xf4100000 0x0 0xf4100000 0x0 0x00100000>,
+                        <0x02000000 0x0 0xf4200000 0x0 0xf4200000 0x0 0x01e00000>,
+                        <0x03000000 0x0 0x40000000 0x3 0x00000000 0x0 0x40000000>;
                resets = <&cru SRST_PCIE20_POWERUP>;
                reset-names = "pipe";
                #address-cells = <3>;
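
For the PCIe host bridges above, each ranges entry is seven cells: a three-cell PCI child address (the first cell encodes the space type, the next two a 64-bit bus address), a two-cell parent CPU address, and a two-cell size. Annotating the new pcie2x1 windows from the hunk above, with the space-type codes taken from the standard PCI bus binding:

        ranges = /* 0x01000000: I/O space, 1 MiB window at CPU address 0xf4100000 */
                 <0x01000000 0x0 0xf4100000  0x0 0xf4100000  0x0 0x00100000>,
                 /* 0x02000000: 32-bit non-prefetchable memory, 30 MiB */
                 <0x02000000 0x0 0xf4200000  0x0 0xf4200000  0x0 0x01e00000>,
                 /* 0x03000000: 64-bit memory, 1 GiB mapped above 4 GiB on the CPU side */
                 <0x03000000 0x0 0x40000000  0x3 0x00000000  0x0 0x40000000>;
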
index 657c019..a3124bd 100644 (file)
                        cache-line-size = <64>;
                        cache-sets = <512>;
                        cache-level = <2>;
+                       cache-unified;
                        next-level-cache = <&l3_cache>;
                };
 
                        cache-line-size = <64>;
                        cache-sets = <512>;
                        cache-level = <2>;
+                       cache-unified;
                        next-level-cache = <&l3_cache>;
                };
 
                        cache-line-size = <64>;
                        cache-sets = <512>;
                        cache-level = <2>;
+                       cache-unified;
                        next-level-cache = <&l3_cache>;
                };
 
                        cache-line-size = <64>;
                        cache-sets = <512>;
                        cache-level = <2>;
+                       cache-unified;
                        next-level-cache = <&l3_cache>;
                };
 
                        cache-line-size = <64>;
                        cache-sets = <1024>;
                        cache-level = <2>;
+                       cache-unified;
                        next-level-cache = <&l3_cache>;
                };
 
                        cache-line-size = <64>;
                        cache-sets = <1024>;
                        cache-level = <2>;
+                       cache-unified;
                        next-level-cache = <&l3_cache>;
                };
 
                        cache-line-size = <64>;
                        cache-sets = <1024>;
                        cache-level = <2>;
+                       cache-unified;
                        next-level-cache = <&l3_cache>;
                };
 
                        cache-line-size = <64>;
                        cache-sets = <1024>;
                        cache-level = <2>;
+                       cache-unified;
                        next-level-cache = <&l3_cache>;
                };
 
                        cache-line-size = <64>;
                        cache-sets = <4096>;
                        cache-level = <3>;
+                       cache-unified;
                };
        };
 
index a406454..f1b8a04 100644 (file)
@@ -67,7 +67,7 @@ static int __init hyperv_init(void)
        if (ret)
                return ret;
 
-       ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "arm64/hyperv_init:online",
+       ret = cpuhp_setup_state(CPUHP_AP_HYPERV_ONLINE, "arm64/hyperv_init:online",
                                hv_common_cpu_init, hv_common_cpu_die);
        if (ret < 0) {
                hv_common_free();
index bdf1f6b..94b4861 100644 (file)
 
 #include <linux/stringify.h>
 
-#define ALTINSTR_ENTRY(feature)                                                      \
+#define ALTINSTR_ENTRY(cpucap)                                               \
        " .word 661b - .\n"                             /* label           */ \
        " .word 663f - .\n"                             /* new instruction */ \
-       " .hword " __stringify(feature) "\n"            /* feature bit     */ \
+       " .hword " __stringify(cpucap) "\n"             /* cpucap          */ \
        " .byte 662b-661b\n"                            /* source len      */ \
        " .byte 664f-663f\n"                            /* replacement len */
 
-#define ALTINSTR_ENTRY_CB(feature, cb)                                       \
+#define ALTINSTR_ENTRY_CB(cpucap, cb)                                        \
        " .word 661b - .\n"                             /* label           */ \
-       " .word " __stringify(cb) "- .\n"               /* callback */        \
-       " .hword " __stringify(feature) "\n"            /* feature bit     */ \
+       " .word " __stringify(cb) "- .\n"               /* callback        */ \
+       " .hword " __stringify(cpucap) "\n"             /* cpucap          */ \
        " .byte 662b-661b\n"                            /* source len      */ \
        " .byte 664f-663f\n"                            /* replacement len */
 
  *
  * Alternatives with callbacks do not generate replacement instructions.
  */
-#define __ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg_enabled)    \
+#define __ALTERNATIVE_CFG(oldinstr, newinstr, cpucap, cfg_enabled)     \
        ".if "__stringify(cfg_enabled)" == 1\n"                         \
        "661:\n\t"                                                      \
        oldinstr "\n"                                                   \
        "662:\n"                                                        \
        ".pushsection .altinstructions,\"a\"\n"                         \
-       ALTINSTR_ENTRY(feature)                                         \
+       ALTINSTR_ENTRY(cpucap)                                          \
        ".popsection\n"                                                 \
        ".subsection 1\n"                                               \
        "663:\n\t"                                                      \
        ".previous\n"                                                   \
        ".endif\n"
 
-#define __ALTERNATIVE_CFG_CB(oldinstr, feature, cfg_enabled, cb)       \
+#define __ALTERNATIVE_CFG_CB(oldinstr, cpucap, cfg_enabled, cb)        \
        ".if "__stringify(cfg_enabled)" == 1\n"                         \
        "661:\n\t"                                                      \
        oldinstr "\n"                                                   \
        "662:\n"                                                        \
        ".pushsection .altinstructions,\"a\"\n"                         \
-       ALTINSTR_ENTRY_CB(feature, cb)                                  \
+       ALTINSTR_ENTRY_CB(cpucap, cb)                                   \
        ".popsection\n"                                                 \
        "663:\n\t"                                                      \
        "664:\n\t"                                                      \
        ".endif\n"
 
-#define _ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg, ...)        \
-       __ALTERNATIVE_CFG(oldinstr, newinstr, feature, IS_ENABLED(cfg))
+#define _ALTERNATIVE_CFG(oldinstr, newinstr, cpucap, cfg, ...) \
+       __ALTERNATIVE_CFG(oldinstr, newinstr, cpucap, IS_ENABLED(cfg))
 
-#define ALTERNATIVE_CB(oldinstr, feature, cb) \
-       __ALTERNATIVE_CFG_CB(oldinstr, (1 << ARM64_CB_SHIFT) | (feature), 1, cb)
+#define ALTERNATIVE_CB(oldinstr, cpucap, cb) \
+       __ALTERNATIVE_CFG_CB(oldinstr, (1 << ARM64_CB_SHIFT) | (cpucap), 1, cb)
 #else
 
 #include <asm/assembler.h>
 
-.macro altinstruction_entry orig_offset alt_offset feature orig_len alt_len
+.macro altinstruction_entry orig_offset alt_offset cpucap orig_len alt_len
        .word \orig_offset - .
        .word \alt_offset - .
-       .hword (\feature)
+       .hword (\cpucap)
        .byte \orig_len
        .byte \alt_len
 .endm
@@ -210,9 +210,9 @@ alternative_endif
 #endif  /*  __ASSEMBLY__  */
 
 /*
- * Usage: asm(ALTERNATIVE(oldinstr, newinstr, feature));
+ * Usage: asm(ALTERNATIVE(oldinstr, newinstr, cpucap));
  *
- * Usage: asm(ALTERNATIVE(oldinstr, newinstr, feature, CONFIG_FOO));
+ * Usage: asm(ALTERNATIVE(oldinstr, newinstr, cpucap, CONFIG_FOO));
  * N.B. If CONFIG_FOO is specified, but not selected, the whole block
  *      will be omitted, including oldinstr.
  */
@@ -224,15 +224,15 @@ alternative_endif
 #include <linux/types.h>
 
 static __always_inline bool
-alternative_has_feature_likely(const unsigned long feature)
+alternative_has_cap_likely(const unsigned long cpucap)
 {
-       compiletime_assert(feature < ARM64_NCAPS,
-                          "feature must be < ARM64_NCAPS");
+       compiletime_assert(cpucap < ARM64_NCAPS,
+                          "cpucap must be < ARM64_NCAPS");
 
        asm_volatile_goto(
-       ALTERNATIVE_CB("b       %l[l_no]", %[feature], alt_cb_patch_nops)
+       ALTERNATIVE_CB("b       %l[l_no]", %[cpucap], alt_cb_patch_nops)
        :
-       : [feature] "i" (feature)
+       : [cpucap] "i" (cpucap)
        :
        : l_no);
 
@@ -242,15 +242,15 @@ l_no:
 }
 
 static __always_inline bool
-alternative_has_feature_unlikely(const unsigned long feature)
+alternative_has_cap_unlikely(const unsigned long cpucap)
 {
-       compiletime_assert(feature < ARM64_NCAPS,
-                          "feature must be < ARM64_NCAPS");
+       compiletime_assert(cpucap < ARM64_NCAPS,
+                          "cpucap must be < ARM64_NCAPS");
 
        asm_volatile_goto(
-       ALTERNATIVE("nop", "b   %l[l_yes]", %[feature])
+       ALTERNATIVE("nop", "b   %l[l_yes]", %[cpucap])
        :
-       : [feature] "i" (feature)
+       : [cpucap] "i" (cpucap)
        :
        : l_yes);
 
index a38b92e..00d97b8 100644 (file)
@@ -13,7 +13,7 @@
 struct alt_instr {
        s32 orig_offset;        /* offset to original instruction */
        s32 alt_offset;         /* offset to replacement instruction */
-       u16 cpufeature;         /* cpufeature bit set for replacement */
+       u16 cpucap;             /* cpucap bit set for replacement */
        u8  orig_len;           /* size of original instruction(s) */
        u8  alt_len;            /* size of new instruction(s), <= orig_len */
 };
@@ -23,7 +23,7 @@ typedef void (*alternative_cb_t)(struct alt_instr *alt,
 
 void __init apply_boot_alternatives(void);
 void __init apply_alternatives_all(void);
-bool alternative_is_applied(u16 cpufeature);
+bool alternative_is_applied(u16 cpucap);
 
 #ifdef CONFIG_MODULES
 void apply_alternatives_module(void *start, size_t length);
@@ -31,5 +31,8 @@ void apply_alternatives_module(void *start, size_t length);
 static inline void apply_alternatives_module(void *start, size_t length) { }
 #endif
 
+void alt_cb_patch_nops(struct alt_instr *alt, __le32 *origptr,
+                      __le32 *updptr, int nr_inst);
+
 #endif /* __ASSEMBLY__ */
 #endif /* __ASM_ALTERNATIVE_H */
index af1fafb..934c658 100644 (file)
@@ -88,13 +88,7 @@ static inline notrace u64 arch_timer_read_cntvct_el0(void)
 
 #define arch_timer_reg_read_stable(reg)                                        \
        ({                                                              \
-               u64 _val;                                               \
-                                                                       \
-               preempt_disable_notrace();                              \
-               _val = erratum_handler(read_ ## reg)();                 \
-               preempt_enable_notrace();                               \
-                                                                       \
-               _val;                                                   \
+               erratum_handler(read_ ## reg)();                        \
        })
 
 /*
index 2f5f3da..b0abc64 100644 (file)
@@ -129,4 +129,6 @@ static inline bool __init __early_cpu_has_rndr(void)
        return (ftr >> ID_AA64ISAR0_EL1_RNDR_SHIFT) & 0xf;
 }
 
+u64 kaslr_early_init(void *fdt);
+
 #endif /* _ASM_ARCHRANDOM_H */
index d6b51de..18dc2fb 100644 (file)
@@ -13,7 +13,7 @@
 
 #define RETURN_READ_PMEVCNTRN(n) \
        return read_sysreg(pmevcntr##n##_el0)
-static unsigned long read_pmevcntrn(int n)
+static inline unsigned long read_pmevcntrn(int n)
 {
        PMEVN_SWITCH(n, RETURN_READ_PMEVCNTRN);
        return 0;
@@ -21,14 +21,14 @@ static unsigned long read_pmevcntrn(int n)
 
 #define WRITE_PMEVCNTRN(n) \
        write_sysreg(val, pmevcntr##n##_el0)
-static void write_pmevcntrn(int n, unsigned long val)
+static inline void write_pmevcntrn(int n, unsigned long val)
 {
        PMEVN_SWITCH(n, WRITE_PMEVCNTRN);
 }
 
 #define WRITE_PMEVTYPERN(n) \
        write_sysreg(val, pmevtyper##n##_el0)
-static void write_pmevtypern(int n, unsigned long val)
+static inline void write_pmevtypern(int n, unsigned long val)
 {
        PMEVN_SWITCH(n, WRITE_PMEVTYPERN);
 }
index 75b211c..5b6efe8 100644 (file)
@@ -18,7 +18,6 @@
        bic     \tmp1, \tmp1, #TTBR_ASID_MASK
        sub     \tmp1, \tmp1, #RESERVED_SWAPPER_OFFSET  // reserved_pg_dir
        msr     ttbr0_el1, \tmp1                        // set reserved TTBR0_EL1
-       isb
        add     \tmp1, \tmp1, #RESERVED_SWAPPER_OFFSET
        msr     ttbr1_el1, \tmp1                // set reserved ASID
        isb
@@ -31,7 +30,6 @@
        extr    \tmp2, \tmp2, \tmp1, #48
        ror     \tmp2, \tmp2, #16
        msr     ttbr1_el1, \tmp2                // set the active ASID
-       isb
        msr     ttbr0_el1, \tmp1                // set the non-PAN TTBR0_EL1
        isb
        .endm
index c997927..400d279 100644 (file)
@@ -142,24 +142,6 @@ static __always_inline long arch_atomic64_dec_if_positive(atomic64_t *v)
 #define arch_atomic_fetch_xor_release          arch_atomic_fetch_xor_release
 #define arch_atomic_fetch_xor                  arch_atomic_fetch_xor
 
-#define arch_atomic_xchg_relaxed(v, new) \
-       arch_xchg_relaxed(&((v)->counter), (new))
-#define arch_atomic_xchg_acquire(v, new) \
-       arch_xchg_acquire(&((v)->counter), (new))
-#define arch_atomic_xchg_release(v, new) \
-       arch_xchg_release(&((v)->counter), (new))
-#define arch_atomic_xchg(v, new) \
-       arch_xchg(&((v)->counter), (new))
-
-#define arch_atomic_cmpxchg_relaxed(v, old, new) \
-       arch_cmpxchg_relaxed(&((v)->counter), (old), (new))
-#define arch_atomic_cmpxchg_acquire(v, old, new) \
-       arch_cmpxchg_acquire(&((v)->counter), (old), (new))
-#define arch_atomic_cmpxchg_release(v, old, new) \
-       arch_cmpxchg_release(&((v)->counter), (old), (new))
-#define arch_atomic_cmpxchg(v, old, new) \
-       arch_cmpxchg(&((v)->counter), (old), (new))
-
 #define arch_atomic_andnot                     arch_atomic_andnot
 
 /*
@@ -209,16 +191,6 @@ static __always_inline long arch_atomic64_dec_if_positive(atomic64_t *v)
 #define arch_atomic64_fetch_xor_release                arch_atomic64_fetch_xor_release
 #define arch_atomic64_fetch_xor                        arch_atomic64_fetch_xor
 
-#define arch_atomic64_xchg_relaxed             arch_atomic_xchg_relaxed
-#define arch_atomic64_xchg_acquire             arch_atomic_xchg_acquire
-#define arch_atomic64_xchg_release             arch_atomic_xchg_release
-#define arch_atomic64_xchg                     arch_atomic_xchg
-
-#define arch_atomic64_cmpxchg_relaxed          arch_atomic_cmpxchg_relaxed
-#define arch_atomic64_cmpxchg_acquire          arch_atomic_cmpxchg_acquire
-#define arch_atomic64_cmpxchg_release          arch_atomic_cmpxchg_release
-#define arch_atomic64_cmpxchg                  arch_atomic_cmpxchg
-
 #define arch_atomic64_andnot                   arch_atomic64_andnot
 
 #define arch_atomic64_dec_if_positive          arch_atomic64_dec_if_positive
index cbb3d96..89d2ba2 100644 (file)
@@ -294,38 +294,46 @@ __CMPXCHG_CASE( ,  ,  mb_, 64, dmb ish,  , l, "memory", L)
 
 #undef __CMPXCHG_CASE
 
-#define __CMPXCHG_DBL(name, mb, rel, cl)                               \
-static __always_inline long                                            \
-__ll_sc__cmpxchg_double##name(unsigned long old1,                      \
-                                     unsigned long old2,               \
-                                     unsigned long new1,               \
-                                     unsigned long new2,               \
-                                     volatile void *ptr)               \
+union __u128_halves {
+       u128 full;
+       struct {
+               u64 low, high;
+       };
+};
+
+#define __CMPXCHG128(name, mb, rel, cl...)                             \
+static __always_inline u128                                            \
+__ll_sc__cmpxchg128##name(volatile u128 *ptr, u128 old, u128 new)      \
 {                                                                      \
-       unsigned long tmp, ret;                                         \
+       union __u128_halves r, o = { .full = (old) },                   \
+                              n = { .full = (new) };                   \
+       unsigned int tmp;                                               \
                                                                        \
-       asm volatile("// __cmpxchg_double" #name "\n"                   \
-       "       prfm    pstl1strm, %2\n"                                \
-       "1:     ldxp    %0, %1, %2\n"                                   \
-       "       eor     %0, %0, %3\n"                                   \
-       "       eor     %1, %1, %4\n"                                   \
-       "       orr     %1, %0, %1\n"                                   \
-       "       cbnz    %1, 2f\n"                                       \
-       "       st" #rel "xp    %w0, %5, %6, %2\n"                      \
-       "       cbnz    %w0, 1b\n"                                      \
+       asm volatile("// __cmpxchg128" #name "\n"                       \
+       "       prfm    pstl1strm, %[v]\n"                              \
+       "1:     ldxp    %[rl], %[rh], %[v]\n"                           \
+       "       cmp     %[rl], %[ol]\n"                                 \
+       "       ccmp    %[rh], %[oh], 0, eq\n"                          \
+       "       b.ne    2f\n"                                           \
+       "       st" #rel "xp    %w[tmp], %[nl], %[nh], %[v]\n"          \
+       "       cbnz    %w[tmp], 1b\n"                                  \
        "       " #mb "\n"                                              \
        "2:"                                                            \
-       : "=&r" (tmp), "=&r" (ret), "+Q" (*(__uint128_t *)ptr)          \
-       : "r" (old1), "r" (old2), "r" (new1), "r" (new2)                \
-       : cl);                                                          \
+       : [v] "+Q" (*(u128 *)ptr),                                      \
+         [rl] "=&r" (r.low), [rh] "=&r" (r.high),                      \
+         [tmp] "=&r" (tmp)                                             \
+       : [ol] "r" (o.low), [oh] "r" (o.high),                          \
+         [nl] "r" (n.low), [nh] "r" (n.high)                           \
+       : "cc", ##cl);                                                  \
                                                                        \
-       return ret;                                                     \
+       return r.full;                                                  \
 }
 
-__CMPXCHG_DBL(   ,        ,  ,         )
-__CMPXCHG_DBL(_mb, dmb ish, l, "memory")
+__CMPXCHG128(   ,        ,  )
+__CMPXCHG128(_mb, dmb ish, l, "memory")
+
+#undef __CMPXCHG128
 
-#undef __CMPXCHG_DBL
 #undef K
 
 #endif /* __ASM_ATOMIC_LL_SC_H */
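
The rewritten ll/sc helper takes a single volatile u128 pointer plus full 128-bit old and new values and hands back the previous contents; __u128_halves is what lets it move between the 128-bit value and the 64-bit register pair used by ldxp/stxp. A small userspace sketch of that splitting, with GCC's unsigned __int128 standing in for the kernel's u128 and assuming a little-endian 64-bit target such as arm64:

#include <stdint.h>
#include <stdio.h>

typedef unsigned __int128 u128;	/* stand-in for the kernel's u128 */

union u128_halves {
	u128 full;
	struct {
		uint64_t low, high;	/* matches little-endian arm64 layout */
	};
};

int main(void)
{
	union u128_halves v = {
		.full = ((u128)0x1122334455667788ULL << 64) | 0x99aabbccddeeff00ULL
	};

	/* The ll/sc asm loads/compares low and high as a register pair. */
	printf("low  = %#llx\n", (unsigned long long)v.low);
	printf("high = %#llx\n", (unsigned long long)v.high);
	return 0;
}
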
index 319958b..87f568a 100644 (file)
@@ -281,40 +281,35 @@ __CMPXCHG_CASE(x,  ,  mb_, 64, al, "memory")
 
 #undef __CMPXCHG_CASE
 
-#define __CMPXCHG_DBL(name, mb, cl...)                                 \
-static __always_inline long                                            \
-__lse__cmpxchg_double##name(unsigned long old1,                                \
-                                        unsigned long old2,            \
-                                        unsigned long new1,            \
-                                        unsigned long new2,            \
-                                        volatile void *ptr)            \
+#define __CMPXCHG128(name, mb, cl...)                                  \
+static __always_inline u128                                            \
+__lse__cmpxchg128##name(volatile u128 *ptr, u128 old, u128 new)                \
 {                                                                      \
-       unsigned long oldval1 = old1;                                   \
-       unsigned long oldval2 = old2;                                   \
-       register unsigned long x0 asm ("x0") = old1;                    \
-       register unsigned long x1 asm ("x1") = old2;                    \
-       register unsigned long x2 asm ("x2") = new1;                    \
-       register unsigned long x3 asm ("x3") = new2;                    \
+       union __u128_halves r, o = { .full = (old) },                   \
+                              n = { .full = (new) };                   \
+       register unsigned long x0 asm ("x0") = o.low;                   \
+       register unsigned long x1 asm ("x1") = o.high;                  \
+       register unsigned long x2 asm ("x2") = n.low;                   \
+       register unsigned long x3 asm ("x3") = n.high;                  \
        register unsigned long x4 asm ("x4") = (unsigned long)ptr;      \
                                                                        \
        asm volatile(                                                   \
        __LSE_PREAMBLE                                                  \
        "       casp" #mb "\t%[old1], %[old2], %[new1], %[new2], %[v]\n"\
-       "       eor     %[old1], %[old1], %[oldval1]\n"                 \
-       "       eor     %[old2], %[old2], %[oldval2]\n"                 \
-       "       orr     %[old1], %[old1], %[old2]"                      \
        : [old1] "+&r" (x0), [old2] "+&r" (x1),                         \
-         [v] "+Q" (*(__uint128_t *)ptr)                                \
+         [v] "+Q" (*(u128 *)ptr)                                       \
        : [new1] "r" (x2), [new2] "r" (x3), [ptr] "r" (x4),             \
-         [oldval1] "r" (oldval1), [oldval2] "r" (oldval2)              \
+         [oldval1] "r" (o.low), [oldval2] "r" (o.high)                 \
        : cl);                                                          \
                                                                        \
-       return x0;                                                      \
+       r.low = x0; r.high = x1;                                        \
+                                                                       \
+       return r.full;                                                  \
 }
 
-__CMPXCHG_DBL(   ,   )
-__CMPXCHG_DBL(_mb, al, "memory")
+__CMPXCHG128(   ,   )
+__CMPXCHG128(_mb, al, "memory")
 
-#undef __CMPXCHG_DBL
+#undef __CMPXCHG128
 
 #endif /* __ASM_ATOMIC_LSE_H */
index c6bc5d8..d7a5407 100644 (file)
@@ -130,21 +130,18 @@ __CMPXCHG_CASE(mb_, 64)
 
 #undef __CMPXCHG_CASE
 
-#define __CMPXCHG_DBL(name)                                            \
-static inline long __cmpxchg_double##name(unsigned long old1,          \
-                                        unsigned long old2,            \
-                                        unsigned long new1,            \
-                                        unsigned long new2,            \
-                                        volatile void *ptr)            \
+#define __CMPXCHG128(name)                                             \
+static inline u128 __cmpxchg128##name(volatile u128 *ptr,              \
+                                     u128 old, u128 new)               \
 {                                                                      \
-       return __lse_ll_sc_body(_cmpxchg_double##name,                  \
-                               old1, old2, new1, new2, ptr);           \
+       return __lse_ll_sc_body(_cmpxchg128##name,                      \
+                               ptr, old, new);                         \
 }
 
-__CMPXCHG_DBL(   )
-__CMPXCHG_DBL(_mb)
+__CMPXCHG128(   )
+__CMPXCHG128(_mb)
 
-#undef __CMPXCHG_DBL
+#undef __CMPXCHG128
 
 #define __CMPXCHG_GEN(sfx)                                             \
 static __always_inline unsigned long __cmpxchg##sfx(volatile void *ptr,        \
@@ -198,34 +195,17 @@ __CMPXCHG_GEN(_mb)
 #define arch_cmpxchg64                 arch_cmpxchg
 #define arch_cmpxchg64_local           arch_cmpxchg_local
 
-/* cmpxchg_double */
-#define system_has_cmpxchg_double()     1
-
-#define __cmpxchg_double_check(ptr1, ptr2)                                     \
-({                                                                             \
-       if (sizeof(*(ptr1)) != 8)                                               \
-               BUILD_BUG();                                                    \
-       VM_BUG_ON((unsigned long *)(ptr2) - (unsigned long *)(ptr1) != 1);      \
-})
+/* cmpxchg128 */
+#define system_has_cmpxchg128()                1
 
-#define arch_cmpxchg_double(ptr1, ptr2, o1, o2, n1, n2)                                \
+#define arch_cmpxchg128(ptr, o, n)                                             \
 ({                                                                             \
-       int __ret;                                                              \
-       __cmpxchg_double_check(ptr1, ptr2);                                     \
-       __ret = !__cmpxchg_double_mb((unsigned long)(o1), (unsigned long)(o2),  \
-                                    (unsigned long)(n1), (unsigned long)(n2),  \
-                                    ptr1);                                     \
-       __ret;                                                                  \
+       __cmpxchg128_mb((ptr), (o), (n));                                       \
 })
 
-#define arch_cmpxchg_double_local(ptr1, ptr2, o1, o2, n1, n2)                  \
+#define arch_cmpxchg128_local(ptr, o, n)                                       \
 ({                                                                             \
-       int __ret;                                                              \
-       __cmpxchg_double_check(ptr1, ptr2);                                     \
-       __ret = !__cmpxchg_double((unsigned long)(o1), (unsigned long)(o2),     \
-                                 (unsigned long)(n1), (unsigned long)(n2),     \
-                                 ptr1);                                        \
-       __ret;                                                                  \
+       __cmpxchg128((ptr), (o), (n));                                          \
 })
 
 #define __CMPWAIT_CASE(w, sfx, sz)                                     \
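
With the interface change above, callers pass one naturally aligned 128-bit location instead of two adjacent longs and get the old value back rather than a success flag. A hedged sketch of the usual retry loop built on such a primitive, emulated here with GCC's __atomic_compare_exchange_n rather than the kernel macros (cmpxchg128_emul and add128 are illustrative names, not kernel API):

#include <stdint.h>
#include <stdio.h>

typedef unsigned __int128 u128;

/* Hypothetical stand-in for arch_cmpxchg128(): returns whatever was in
 * memory, whether or not the store happened. Assumes the toolchain can
 * provide 16-byte atomics (libatomic, LSE or CX16). */
static u128 cmpxchg128_emul(u128 *ptr, u128 old, u128 new)
{
	__atomic_compare_exchange_n(ptr, &old, new, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}

/* Typical caller pattern: retry until the value we based the update on
 * is the value we actually replaced. */
static void add128(u128 *slot, uint64_t delta)
{
	u128 old = *slot;

	for (;;) {
		u128 prev = cmpxchg128_emul(slot, old, old + delta);

		if (prev == old)
			break;		/* our update landed */
		old = prev;		/* raced; recompute and retry */
	}
}

int main(void)
{
	u128 v = 0;

	add128(&v, 5);
	printf("%llu\n", (unsigned long long)v);	/* 5 */
	return 0;
}
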
index 74575c3..ae904a1 100644 (file)
@@ -96,6 +96,8 @@ static inline int is_compat_thread(struct thread_info *thread)
        return test_ti_thread_flag(thread, TIF_32BIT);
 }
 
+long compat_arm_syscall(struct pt_regs *regs, int scno);
+
 #else /* !CONFIG_COMPAT */
 
 static inline int is_compat_thread(struct thread_info *thread)
index fd7a922..e749838 100644 (file)
@@ -56,6 +56,7 @@ struct cpuinfo_arm64 {
        u64             reg_id_aa64mmfr0;
        u64             reg_id_aa64mmfr1;
        u64             reg_id_aa64mmfr2;
+       u64             reg_id_aa64mmfr3;
        u64             reg_id_aa64pfr0;
        u64             reg_id_aa64pfr1;
        u64             reg_id_aa64zfr0;
index 6bf013f..7a95c32 100644 (file)
@@ -107,7 +107,7 @@ extern struct arm64_ftr_reg arm64_ftr_reg_ctrel0;
  * CPU capabilities:
  *
  * We use arm64_cpu_capabilities to represent system features, errata work
- * arounds (both used internally by kernel and tracked in cpu_hwcaps) and
+ * arounds (both used internally by kernel and tracked in system_cpucaps) and
  * ELF HWCAPs (which are exposed to user).
  *
  * To support systems with heterogeneous CPUs, we need to make sure that we
@@ -419,12 +419,12 @@ static __always_inline bool is_hyp_code(void)
        return is_vhe_hyp_code() || is_nvhe_hyp_code();
 }
 
-extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
+extern DECLARE_BITMAP(system_cpucaps, ARM64_NCAPS);
 
-extern DECLARE_BITMAP(boot_capabilities, ARM64_NCAPS);
+extern DECLARE_BITMAP(boot_cpucaps, ARM64_NCAPS);
 
 #define for_each_available_cap(cap)            \
-       for_each_set_bit(cap, cpu_hwcaps, ARM64_NCAPS)
+       for_each_set_bit(cap, system_cpucaps, ARM64_NCAPS)
 
 bool this_cpu_has_cap(unsigned int cap);
 void cpu_set_feature(unsigned int num);
@@ -437,7 +437,7 @@ unsigned long cpu_get_elf_hwcap2(void);
 
 static __always_inline bool system_capabilities_finalized(void)
 {
-       return alternative_has_feature_likely(ARM64_ALWAYS_SYSTEM);
+       return alternative_has_cap_likely(ARM64_ALWAYS_SYSTEM);
 }
 
 /*
@@ -449,7 +449,7 @@ static __always_inline bool cpus_have_cap(unsigned int num)
 {
        if (num >= ARM64_NCAPS)
                return false;
-       return arch_test_bit(num, cpu_hwcaps);
+       return arch_test_bit(num, system_cpucaps);
 }
 
 /*
@@ -464,7 +464,7 @@ static __always_inline bool __cpus_have_const_cap(int num)
 {
        if (num >= ARM64_NCAPS)
                return false;
-       return alternative_has_feature_unlikely(num);
+       return alternative_has_cap_unlikely(num);
 }
 
 /*
@@ -504,16 +504,6 @@ static __always_inline bool cpus_have_const_cap(int num)
                return cpus_have_cap(num);
 }
 
-static inline void cpus_set_cap(unsigned int num)
-{
-       if (num >= ARM64_NCAPS) {
-               pr_warn("Attempt to set an illegal CPU capability (%d >= %d)\n",
-                       num, ARM64_NCAPS);
-       } else {
-               __set_bit(num, cpu_hwcaps);
-       }
-}
-
 static inline int __attribute_const__
 cpuid_feature_extract_signed_field_width(u64 features, int field, int width)
 {
index 683ca3a..5f6f848 100644 (file)
 #define APPLE_CPU_PART_M1_FIRESTORM_MAX        0x029
 #define APPLE_CPU_PART_M2_BLIZZARD     0x032
 #define APPLE_CPU_PART_M2_AVALANCHE    0x033
+#define APPLE_CPU_PART_M2_BLIZZARD_PRO 0x034
+#define APPLE_CPU_PART_M2_AVALANCHE_PRO        0x035
+#define APPLE_CPU_PART_M2_BLIZZARD_MAX 0x038
+#define APPLE_CPU_PART_M2_AVALANCHE_MAX        0x039
 
 #define AMPERE_CPU_PART_AMPERE1                0xAC3
 
 #define MIDR_APPLE_M1_FIRESTORM_MAX MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M1_FIRESTORM_MAX)
 #define MIDR_APPLE_M2_BLIZZARD MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M2_BLIZZARD)
 #define MIDR_APPLE_M2_AVALANCHE MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M2_AVALANCHE)
+#define MIDR_APPLE_M2_BLIZZARD_PRO MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M2_BLIZZARD_PRO)
+#define MIDR_APPLE_M2_AVALANCHE_PRO MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M2_AVALANCHE_PRO)
+#define MIDR_APPLE_M2_BLIZZARD_MAX MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M2_BLIZZARD_MAX)
+#define MIDR_APPLE_M2_AVALANCHE_MAX MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M2_AVALANCHE_MAX)
 #define MIDR_AMPERE1 MIDR_CPU_MODEL(ARM_CPU_IMP_AMPERE, AMPERE_CPU_PART_AMPERE1)
 
 /* Fujitsu Erratum 010001 affects A64FX 1.0 and 1.1, (v0r0 and v1r0) */
index f86b157..ef46f2d 100644 (file)
@@ -166,4 +166,6 @@ static inline void efi_capsule_flush_cache_range(void *addr, int size)
        dcache_clean_inval_poc((unsigned long)addr, (unsigned long)addr + size);
 }
 
+efi_status_t efi_handle_corrupted_x18(efi_status_t s, const char *f);
+
 #endif /* _ASM_EFI_H */
index 037724b..f4c3d30 100644 (file)
        isb
 .endm
 
+.macro __init_el2_hcrx
+       mrs     x0, id_aa64mmfr1_el1
+       ubfx    x0, x0, #ID_AA64MMFR1_EL1_HCX_SHIFT, #4
+       cbz     x0, .Lskip_hcrx_\@
+       mov_q   x0, HCRX_HOST_FLAGS
+       msr_s   SYS_HCRX_EL2, x0
+.Lskip_hcrx_\@:
+.endm
+
 /*
  * Allow Non-secure EL1 and EL0 to access physical timer and counter.
  * This is not necessary for VHE, since the host kernel runs in EL2,
@@ -69,7 +78,7 @@
        cbz     x0, .Lskip_trace_\@             // Skip if TraceBuffer is not present
 
        mrs_s   x0, SYS_TRBIDR_EL1
-       and     x0, x0, TRBIDR_PROG
+       and     x0, x0, TRBIDR_EL1_P
        cbnz    x0, .Lskip_trace_\@             // If TRBE is available at EL2
 
        mov     x0, #(MDCR_EL2_E2TB_MASK << MDCR_EL2_E2TB_SHIFT)
        mov     x0, xzr
        mrs     x1, id_aa64pfr1_el1
        ubfx    x1, x1, #ID_AA64PFR1_EL1_SME_SHIFT, #4
-       cbz     x1, .Lset_fgt_\@
+       cbz     x1, .Lset_pie_fgt_\@
 
        /* Disable nVHE traps of TPIDR2 and SMPRI */
        orr     x0, x0, #HFGxTR_EL2_nSMPRI_EL1_MASK
        orr     x0, x0, #HFGxTR_EL2_nTPIDR2_EL0_MASK
 
+.Lset_pie_fgt_\@:
+       mrs_s   x1, SYS_ID_AA64MMFR3_EL1
+       ubfx    x1, x1, #ID_AA64MMFR3_EL1_S1PIE_SHIFT, #4
+       cbz     x1, .Lset_fgt_\@
+
+       /* Disable trapping of PIR_EL1 / PIRE0_EL1 */
+       orr     x0, x0, #HFGxTR_EL2_nPIR_EL1
+       orr     x0, x0, #HFGxTR_EL2_nPIRE0_EL1
+
 .Lset_fgt_\@:
        msr_s   SYS_HFGRTR_EL2, x0
        msr_s   SYS_HFGWTR_EL2, x0
  */
 .macro init_el2_state
        __init_el2_sctlr
+       __init_el2_hcrx
        __init_el2_timers
        __init_el2_debug
        __init_el2_lor
        cbz     x1, .Lskip_sme_\@
 
        msr_s   SYS_SMPRIMAP_EL2, xzr           // Make all priorities equal
-
-       mrs     x1, id_aa64mmfr1_el1            // HCRX_EL2 present?
-       ubfx    x1, x1, #ID_AA64MMFR1_EL1_HCX_SHIFT, #4
-       cbz     x1, .Lskip_sme_\@
-
-       mrs_s   x1, SYS_HCRX_EL2
-       orr     x1, x1, #HCRX_EL2_SMPME_MASK    // Enable priority mapping
-       msr_s   SYS_HCRX_EL2, x1
 .Lskip_sme_\@:
 .endm
 
index 8487aec..ae35939 100644 (file)
@@ -47,7 +47,7 @@
 #define ESR_ELx_EC_DABT_LOW    (0x24)
 #define ESR_ELx_EC_DABT_CUR    (0x25)
 #define ESR_ELx_EC_SP_ALIGN    (0x26)
-/* Unallocated EC: 0x27 */
+#define ESR_ELx_EC_MOPS                (0x27)
 #define ESR_ELx_EC_FP_EXC32    (0x28)
 /* Unallocated EC: 0x29 - 0x2B */
 #define ESR_ELx_EC_FP_EXC64    (0x2C)
 
 #define ESR_ELx_IL_SHIFT       (25)
 #define ESR_ELx_IL             (UL(1) << ESR_ELx_IL_SHIFT)
-#define ESR_ELx_ISS_MASK       (ESR_ELx_IL - 1)
+#define ESR_ELx_ISS_MASK       (GENMASK(24, 0))
 #define ESR_ELx_ISS(esr)       ((esr) & ESR_ELx_ISS_MASK)
+#define ESR_ELx_ISS2_SHIFT     (32)
+#define ESR_ELx_ISS2_MASK      (GENMASK_ULL(55, 32))
+#define ESR_ELx_ISS2(esr)      (((esr) & ESR_ELx_ISS2_MASK) >> ESR_ELx_ISS2_SHIFT)
 
 /* ISS field definitions shared by different classes */
 #define ESR_ELx_WNR_SHIFT      (6)
 #define ESR_ELx_CM_SHIFT       (8)
 #define ESR_ELx_CM             (UL(1) << ESR_ELx_CM_SHIFT)
 
+/* ISS2 field definitions for Data Aborts */
+#define ESR_ELx_TnD_SHIFT      (10)
+#define ESR_ELx_TnD            (UL(1) << ESR_ELx_TnD_SHIFT)
+#define ESR_ELx_TagAccess_SHIFT        (9)
+#define ESR_ELx_TagAccess      (UL(1) << ESR_ELx_TagAccess_SHIFT)
+#define ESR_ELx_GCS_SHIFT      (8)
+#define ESR_ELx_GCS            (UL(1) << ESR_ELx_GCS_SHIFT)
+#define ESR_ELx_Overlay_SHIFT  (6)
+#define ESR_ELx_Overlay                (UL(1) << ESR_ELx_Overlay_SHIFT)
+#define ESR_ELx_DirtyBit_SHIFT (5)
+#define ESR_ELx_DirtyBit       (UL(1) << ESR_ELx_DirtyBit_SHIFT)
+#define ESR_ELx_Xs_SHIFT       (0)
+#define ESR_ELx_Xs_MASK                (GENMASK_ULL(4, 0))
+
 /* ISS field definitions for exceptions taken in to Hyp */
 #define ESR_ELx_CV             (UL(1) << 24)
 #define ESR_ELx_COND_SHIFT     (20)
 #define ESR_ELx_SME_ISS_ZA_DISABLED    3
 #define ESR_ELx_SME_ISS_ZT_DISABLED    4
 
+/* ISS field definitions for MOPS exceptions */
+#define ESR_ELx_MOPS_ISS_MEM_INST      (UL(1) << 24)
+#define ESR_ELx_MOPS_ISS_FROM_EPILOGUE (UL(1) << 18)
+#define ESR_ELx_MOPS_ISS_WRONG_OPTION  (UL(1) << 17)
+#define ESR_ELx_MOPS_ISS_OPTION_A      (UL(1) << 16)
+#define ESR_ELx_MOPS_ISS_DESTREG(esr)  (((esr) & (UL(0x1f) << 10)) >> 10)
+#define ESR_ELx_MOPS_ISS_SRCREG(esr)   (((esr) & (UL(0x1f) << 5)) >> 5)
+#define ESR_ELx_MOPS_ISS_SIZEREG(esr)  (((esr) & (UL(0x1f) << 0)) >> 0)
+
 #ifndef __ASSEMBLY__
 #include <asm/types.h>
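
Decoding the new ISS2 and MOPS syndrome fields is plain mask-and-shift work on the 64-bit ESR. A small userspace sketch with the field positions re-derived from the definitions above (the macro names and the example syndrome value are made up for illustration):

#include <stdint.h>
#include <stdio.h>

/* Field positions re-derived from the ESR_ELx_* definitions above;
 * the macro names here are local stand-ins, not the kernel's. */
#define ISS2_OF(esr)		(((esr) >> 32) & 0xffffffULL)	/* bits [55:32] */
#define MOPS_DESTREG_OF(esr)	(((esr) >> 10) & 0x1f)
#define MOPS_SRCREG_OF(esr)	(((esr) >> 5) & 0x1f)
#define MOPS_SIZEREG_OF(esr)	((esr) & 0x1f)

int main(void)
{
	/* Made-up syndrome: EC=0x27 (MOPS), IL set, destination register 1,
	 * source register 2, size register 3. */
	uint64_t esr = 0x9e000443ULL;

	printf("iss2=%#llx dst=x%llu src=x%llu size=x%llu\n",
	       (unsigned long long)ISS2_OF(esr),
	       (unsigned long long)MOPS_DESTREG_OF(esr),
	       (unsigned long long)MOPS_SRCREG_OF(esr),
	       (unsigned long long)MOPS_SIZEREG_OF(esr));
	return 0;
}
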
 
index e73af70..ad688e1 100644 (file)
@@ -8,16 +8,11 @@
 #define __ASM_EXCEPTION_H
 
 #include <asm/esr.h>
-#include <asm/kprobes.h>
 #include <asm/ptrace.h>
 
 #include <linux/interrupt.h>
 
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
 #define __exception_irq_entry  __irq_entry
-#else
-#define __exception_irq_entry  __kprobes
-#endif
 
 static inline unsigned long disr_to_esr(u64 disr)
 {
@@ -77,6 +72,7 @@ void do_el0_svc(struct pt_regs *regs);
 void do_el0_svc_compat(struct pt_regs *regs);
 void do_el0_fpac(struct pt_regs *regs, unsigned long esr);
 void do_el1_fpac(struct pt_regs *regs, unsigned long esr);
+void do_el0_mops(struct pt_regs *regs, unsigned long esr);
 void do_serror(struct pt_regs *regs, unsigned long esr);
 void do_notify_resume(struct pt_regs *regs, unsigned long thread_flags);
 
index fa4c6ff..8405532 100644 (file)
@@ -154,4 +154,12 @@ static inline int get_num_wrps(void)
                                                ID_AA64DFR0_EL1_WRPs_SHIFT);
 }
 
+#ifdef CONFIG_CPU_PM
+extern void cpu_suspend_set_dbg_restorer(int (*hw_bp_restore)(unsigned int));
+#else
+static inline void cpu_suspend_set_dbg_restorer(int (*hw_bp_restore)(unsigned int))
+{
+}
+#endif
+
 #endif /* __ASM_BREAKPOINT_H */
index 5d45f19..692b1ec 100644 (file)
 #define KERNEL_HWCAP_SME_BI32I32       __khwcap2_feature(SME_BI32I32)
 #define KERNEL_HWCAP_SME_B16B16                __khwcap2_feature(SME_B16B16)
 #define KERNEL_HWCAP_SME_F16F16                __khwcap2_feature(SME_F16F16)
+#define KERNEL_HWCAP_MOPS              __khwcap2_feature(MOPS)
 
 /*
  * This yields a mask that user programs can use to figure out what
index 877495a..51d92ab 100644 (file)
  * Generic IO read/write.  These perform native-endian accesses.
  */
 #define __raw_writeb __raw_writeb
-static inline void __raw_writeb(u8 val, volatile void __iomem *addr)
+static __always_inline void __raw_writeb(u8 val, volatile void __iomem *addr)
 {
        asm volatile("strb %w0, [%1]" : : "rZ" (val), "r" (addr));
 }
 
 #define __raw_writew __raw_writew
-static inline void __raw_writew(u16 val, volatile void __iomem *addr)
+static __always_inline void __raw_writew(u16 val, volatile void __iomem *addr)
 {
        asm volatile("strh %w0, [%1]" : : "rZ" (val), "r" (addr));
 }
@@ -40,13 +40,13 @@ static __always_inline void __raw_writel(u32 val, volatile void __iomem *addr)
 }
 
 #define __raw_writeq __raw_writeq
-static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
+static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
 {
        asm volatile("str %x0, [%1]" : : "rZ" (val), "r" (addr));
 }
 
 #define __raw_readb __raw_readb
-static inline u8 __raw_readb(const volatile void __iomem *addr)
+static __always_inline u8 __raw_readb(const volatile void __iomem *addr)
 {
        u8 val;
        asm volatile(ALTERNATIVE("ldrb %w0, [%1]",
@@ -57,7 +57,7 @@ static inline u8 __raw_readb(const volatile void __iomem *addr)
 }
 
 #define __raw_readw __raw_readw
-static inline u16 __raw_readw(const volatile void __iomem *addr)
+static __always_inline u16 __raw_readw(const volatile void __iomem *addr)
 {
        u16 val;
 
@@ -80,7 +80,7 @@ static __always_inline u32 __raw_readl(const volatile void __iomem *addr)
 }
 
 #define __raw_readq __raw_readq
-static inline u64 __raw_readq(const volatile void __iomem *addr)
+static __always_inline u64 __raw_readq(const volatile void __iomem *addr)
 {
        u64 val;
        asm volatile(ALTERNATIVE("ldr %0, [%1]",
index e0f5f6b..1f31ec1 100644 (file)
@@ -24,7 +24,7 @@
 static __always_inline bool __irqflags_uses_pmr(void)
 {
        return IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) &&
-              alternative_has_feature_unlikely(ARM64_HAS_GIC_PRIO_MASKING);
+              alternative_has_cap_unlikely(ARM64_HAS_GIC_PRIO_MASKING);
 }
 
 static __always_inline void __daif_local_irq_enable(void)
index 186dd7f..5777738 100644 (file)
 /*
  * Initial memory map attributes.
  */
-#define SWAPPER_PTE_FLAGS      (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
-#define SWAPPER_PMD_FLAGS      (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
+#define SWAPPER_PTE_FLAGS      (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED | PTE_UXN)
+#define SWAPPER_PMD_FLAGS      (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S | PTE_UXN)
 
 #ifdef CONFIG_ARM64_4K_PAGES
-#define SWAPPER_RW_MMUFLAGS    (PMD_ATTRINDX(MT_NORMAL) | SWAPPER_PMD_FLAGS)
+#define SWAPPER_RW_MMUFLAGS    (PMD_ATTRINDX(MT_NORMAL) | SWAPPER_PMD_FLAGS | PTE_WRITE)
 #define SWAPPER_RX_MMUFLAGS    (SWAPPER_RW_MMUFLAGS | PMD_SECT_RDONLY)
 #else
-#define SWAPPER_RW_MMUFLAGS    (PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS)
+#define SWAPPER_RW_MMUFLAGS    (PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS | PTE_WRITE)
 #define SWAPPER_RX_MMUFLAGS    (SWAPPER_RW_MMUFLAGS | PTE_RDONLY)
 #endif
 
index baef29f..c6e12e8 100644 (file)
@@ -9,6 +9,7 @@
 
 #include <asm/esr.h>
 #include <asm/memory.h>
+#include <asm/sysreg.h>
 #include <asm/types.h>
 
 /* Hyp Configuration Register (HCR) bits */
@@ -92,6 +93,9 @@
 #define HCR_HOST_NVHE_PROTECTED_FLAGS (HCR_HOST_NVHE_FLAGS | HCR_TSC)
 #define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
 
+#define HCRX_GUEST_FLAGS (HCRX_EL2_SMPME | HCRX_EL2_TCR2En)
+#define HCRX_HOST_FLAGS (HCRX_EL2_MSCEn | HCRX_EL2_TCR2En)
+
 /* TCR_EL2 Registers bits */
 #define TCR_EL2_RES1           ((1U << 31) | (1 << 23))
 #define TCR_EL2_TBI            (1 << 20)
index 43c3bc0..86042af 100644 (file)
@@ -267,6 +267,24 @@ extern u64 __kvm_get_mdcr_el2(void);
        __kvm_at_err;                                                   \
 } )
 
+void __noreturn hyp_panic(void);
+asmlinkage void kvm_unexpected_el2_exception(void);
+asmlinkage void __noreturn hyp_panic(void);
+asmlinkage void __noreturn hyp_panic_bad_stack(void);
+asmlinkage void kvm_unexpected_el2_exception(void);
+struct kvm_cpu_context;
+void handle_trap(struct kvm_cpu_context *host_ctxt);
+asmlinkage void __noreturn kvm_host_psci_cpu_entry(bool is_cpu_on);
+void __noreturn __pkvm_init_finalise(void);
+void kvm_nvhe_prepare_backtrace(unsigned long fp, unsigned long pc);
+void kvm_patch_vector_branch(struct alt_instr *alt,
+       __le32 *origptr, __le32 *updptr, int nr_inst);
+void kvm_get_kimage_voffset(struct alt_instr *alt,
+       __le32 *origptr, __le32 *updptr, int nr_inst);
+void kvm_compute_final_ctr_el0(struct alt_instr *alt,
+       __le32 *origptr, __le32 *updptr, int nr_inst);
+void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr, u64 elr_virt,
+       u64 elr_phys, u64 par, uintptr_t vcpu, u64 far, u64 hpfar);
 
 #else /* __ASSEMBLY__ */
 
index 7e7e19e..d48609d 100644 (file)
@@ -279,6 +279,7 @@ enum vcpu_sysreg {
        TTBR0_EL1,      /* Translation Table Base Register 0 */
        TTBR1_EL1,      /* Translation Table Base Register 1 */
        TCR_EL1,        /* Translation Control Register */
+       TCR2_EL1,       /* Extended Translation Control Register */
        ESR_EL1,        /* Exception Syndrome Register */
        AFSR0_EL1,      /* Auxiliary Fault Status Register 0 */
        AFSR1_EL1,      /* Auxiliary Fault Status Register 1 */
@@ -339,6 +340,10 @@ enum vcpu_sysreg {
        TFSR_EL1,       /* Tag Fault Status Register (EL1) */
        TFSRE0_EL1,     /* Tag Fault Status Register (EL0) */
 
+       /* Permission Indirection Extension registers */
+       PIR_EL1,       /* Permission Indirection Register 1 (EL1) */
+       PIRE0_EL1,     /*  Permission Indirection Register 0 (EL1) */
+
        /* 32bit specific registers. */
        DACR32_EL2,     /* Domain Access Control Register */
        IFSR32_EL2,     /* Instruction Fault Status Register */
@@ -699,6 +704,8 @@ struct kvm_vcpu_arch {
 #define SYSREGS_ON_CPU         __vcpu_single_flag(sflags, BIT(4))
 /* Software step state is Active-pending */
 #define DBG_SS_ACTIVE_PENDING  __vcpu_single_flag(sflags, BIT(5))
+/* PMUSERENR for the guest EL0 is on physical CPU */
+#define PMUSERENR_ON_CPU       __vcpu_single_flag(sflags, BIT(6))
 
 
 /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
@@ -1031,7 +1038,7 @@ void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
 
 #define kvm_vcpu_os_lock_enabled(vcpu)         \
-       (!!(__vcpu_sys_reg(vcpu, OSLSR_EL1) & SYS_OSLSR_OSLK))
+       (!!(__vcpu_sys_reg(vcpu, OSLSR_EL1) & OSLSR_EL1_OSLK))
 
 int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
                               struct kvm_device_attr *attr);
@@ -1065,9 +1072,14 @@ void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu);
 #ifdef CONFIG_KVM
 void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr);
 void kvm_clr_pmu_events(u32 clr);
+bool kvm_set_pmuserenr(u64 val);
 #else
 static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {}
 static inline void kvm_clr_pmu_events(u32 clr) {}
+static inline bool kvm_set_pmuserenr(u64 val)
+{
+       return false;
+}
 #endif
 
 void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu);
index 4cd6762..93bd097 100644 (file)
@@ -209,6 +209,7 @@ struct kvm_pgtable_visit_ctx {
        kvm_pte_t                               old;
        void                                    *arg;
        struct kvm_pgtable_mm_ops               *mm_ops;
+       u64                                     start;
        u64                                     addr;
        u64                                     end;
        u32                                     level;
@@ -631,9 +632,9 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size);
  *
  * The walker will walk the page-table entries corresponding to the input
  * address range specified, visiting entries according to the walker flags.
- * Invalid entries are treated as leaf entries. Leaf entries are reloaded
- * after invoking the walker callback, allowing the walker to descend into
- * a newly installed table.
+ * Invalid entries are treated as leaf entries. The visited page table entry is
+ * reloaded after invoking the walker callback, allowing the walker to descend
+ * into a newly installed table.
  *
  * Returning a negative error code from the walker callback function will
  * terminate the walk immediately with the same error code.
index f99d748..cbbcdc3 100644 (file)
@@ -18,7 +18,7 @@
 
 static __always_inline bool system_uses_lse_atomics(void)
 {
-       return alternative_has_feature_likely(ARM64_HAS_LSE_ATOMICS);
+       return alternative_has_cap_likely(ARM64_HAS_LSE_ATOMICS);
 }
 
 #define __lse_ll_sc_body(op, ...)                                      \
index c735afd..6e0e572 100644 (file)
@@ -46,7 +46,7 @@
 #define KIMAGE_VADDR           (MODULES_END)
 #define MODULES_END            (MODULES_VADDR + MODULES_VSIZE)
 #define MODULES_VADDR          (_PAGE_END(VA_BITS_MIN))
-#define MODULES_VSIZE          (SZ_128M)
+#define MODULES_VSIZE          (SZ_2G)
 #define VMEMMAP_START          (-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
 #define VMEMMAP_END            (VMEMMAP_START + VMEMMAP_SIZE)
 #define PCI_IO_END             (VMEMMAP_START - SZ_8M)
@@ -204,15 +204,17 @@ static inline unsigned long kaslr_offset(void)
        return kimage_vaddr - KIMAGE_VADDR;
 }
 
+#ifdef CONFIG_RANDOMIZE_BASE
+void kaslr_init(void);
 static inline bool kaslr_enabled(void)
 {
-       /*
-        * The KASLR offset modulo MIN_KIMG_ALIGN is taken from the physical
-        * placement of the image rather than from the seed, so a displacement
-        * of less than MIN_KIMG_ALIGN means that no seed was provided.
-        */
-       return kaslr_offset() >= MIN_KIMG_ALIGN;
+       extern bool __kaslr_is_enabled;
+       return __kaslr_is_enabled;
 }
+#else
+static inline void kaslr_init(void) { }
+static inline bool kaslr_enabled(void) { return false; }
+#endif
 
 /*
  * Allow all memory at the discovery stage. We will clip it later.
index 5691169..a6fb325 100644 (file)
@@ -39,11 +39,16 @@ static inline void contextidr_thread_switch(struct task_struct *next)
 /*
  * Set TTBR0 to reserved_pg_dir. No translations will be possible via TTBR0.
  */
-static inline void cpu_set_reserved_ttbr0(void)
+static inline void cpu_set_reserved_ttbr0_nosync(void)
 {
        unsigned long ttbr = phys_to_ttbr(__pa_symbol(reserved_pg_dir));
 
        write_sysreg(ttbr, ttbr0_el1);
+}
+
+static inline void cpu_set_reserved_ttbr0(void)
+{
+       cpu_set_reserved_ttbr0_nosync();
        isb();
 }
 
@@ -52,7 +57,6 @@ void cpu_do_switch_mm(phys_addr_t pgd_phys, struct mm_struct *mm);
 static inline void cpu_switch_mm(pgd_t *pgd, struct mm_struct *mm)
 {
        BUG_ON(pgd == swapper_pg_dir);
-       cpu_set_reserved_ttbr0();
        cpu_do_switch_mm(virt_to_phys(pgd),mm);
 }
 
@@ -164,7 +168,7 @@ static inline void cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap)
                 * up (i.e. cpufeature framework is not up yet) and
                 * latter only when we enable CNP via cpufeature's
                 * enable() callback.
-                * Also we rely on the cpu_hwcap bit being set before
+                * Also we rely on the system_cpucaps bit being set before
                 * calling the enable() function.
                 */
                ttbr1 |= TTBR_CNP_BIT;
index 18734fe..bfa6638 100644 (file)
@@ -7,7 +7,6 @@
 
 #include <asm-generic/module.h>
 
-#ifdef CONFIG_ARM64_MODULE_PLTS
 struct mod_plt_sec {
        int                     plt_shndx;
        int                     plt_num_entries;
@@ -21,7 +20,6 @@ struct mod_arch_specific {
        /* for CONFIG_DYNAMIC_FTRACE */
        struct plt_entry        *ftrace_trampolines;
 };
-#endif
 
 u64 module_emit_plt_entry(struct module *mod, Elf64_Shdr *sechdrs,
                          void *loc, const Elf64_Rela *rela,
@@ -30,12 +28,6 @@ u64 module_emit_plt_entry(struct module *mod, Elf64_Shdr *sechdrs,
 u64 module_emit_veneer_for_adrp(struct module *mod, Elf64_Shdr *sechdrs,
                                void *loc, u64 val);
 
-#ifdef CONFIG_RANDOMIZE_BASE
-extern u64 module_alloc_base;
-#else
-#define module_alloc_base      ((u64)_etext - MODULES_VSIZE)
-#endif
-
 struct plt_entry {
        /*
         * A program that conforms to the AArch64 Procedure Call Standard
index dbba4b7..b9ae834 100644 (file)
@@ -1,9 +1,7 @@
 SECTIONS {
-#ifdef CONFIG_ARM64_MODULE_PLTS
        .plt 0 : { BYTE(0) }
        .init.plt 0 : { BYTE(0) }
        .text.ftrace_trampoline 0 : { BYTE(0) }
-#endif
 
 #ifdef CONFIG_KASAN_SW_TAGS
        /*
index b9ba19d..9abcc8e 100644 (file)
@@ -140,17 +140,11 @@ PERCPU_RET_OP(add, add, ldadd)
  * re-enabling preemption for preemptible kernels, but doing that in a way
  * which builds inside a module would mean messing directly with the preempt
  * count. If you do this, peterz and tglx will hunt you down.
+ *
+ * Not to mention it'll break the actual preemption model by missing a
+ * preemption point when TIF_NEED_RESCHED gets set while preemption is
+ * disabled.
  */
-#define this_cpu_cmpxchg_double_8(ptr1, ptr2, o1, o2, n1, n2)          \
-({                                                                     \
-       int __ret;                                                      \
-       preempt_disable_notrace();                                      \
-       __ret = cmpxchg_double_local(   raw_cpu_ptr(&(ptr1)),           \
-                                       raw_cpu_ptr(&(ptr2)),           \
-                                       o1, o2, n1, n2);                \
-       preempt_enable_notrace();                                       \
-       __ret;                                                          \
-})
 
 #define _pcp_protect(op, pcp, ...)                                     \
 ({                                                                     \
@@ -240,6 +234,22 @@ PERCPU_RET_OP(add, add, ldadd)
 #define this_cpu_cmpxchg_8(pcp, o, n)  \
        _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
 
+#define this_cpu_cmpxchg64(pcp, o, n)  this_cpu_cmpxchg_8(pcp, o, n)
+
+#define this_cpu_cmpxchg128(pcp, o, n)                                 \
+({                                                                     \
+       typedef typeof(pcp) pcp_op_T__;                                 \
+       u128 old__, new__, ret__;                                       \
+       pcp_op_T__ *ptr__;                                              \
+       old__ = o;                                                      \
+       new__ = n;                                                      \
+       preempt_disable_notrace();                                      \
+       ptr__ = raw_cpu_ptr(&(pcp));                                    \
+       ret__ = cmpxchg128_local((void *)ptr__, old__, new__);          \
+       preempt_enable_notrace();                                       \
+       ret__;                                                          \
+})
+
 #ifdef __KVM_NVHE_HYPERVISOR__
 extern unsigned long __hyp_per_cpu_offset(unsigned int cpu);
 #define __per_cpu_offset
index f658aaf..e4944d5 100644 (file)
 #define PTE_ATTRINDX_MASK      (_AT(pteval_t, 7) << 2)
 
 /*
+ * PIIndex[3:0] encoding (Permission Indirection Extension)
+ */
+#define PTE_PI_IDX_0   6       /* AP[1], USER */
+#define PTE_PI_IDX_1   51      /* DBM */
+#define PTE_PI_IDX_2   53      /* PXN */
+#define PTE_PI_IDX_3   54      /* UXN */
+
+/*
  * Memory Attribute override for Stage-2 (MemAttr[3:0])
  */
 #define PTE_S2_MEMATTR(t)      (_AT(pteval_t, (t)) << 2)
index 9b16511..eed814b 100644 (file)
  */
 #define PMD_PRESENT_INVALID    (_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
 
+#define _PROT_DEFAULT          (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
+#define _PROT_SECT_DEFAULT     (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
+
+#define PROT_DEFAULT           (_PROT_DEFAULT | PTE_MAYBE_NG)
+#define PROT_SECT_DEFAULT      (_PROT_SECT_DEFAULT | PMD_MAYBE_NG)
+
+#define PROT_DEVICE_nGnRnE     (PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_DEVICE_nGnRnE))
+#define PROT_DEVICE_nGnRE      (PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_DEVICE_nGnRE))
+#define PROT_NORMAL_NC         (PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_NORMAL_NC))
+#define PROT_NORMAL            (PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_NORMAL))
+#define PROT_NORMAL_TAGGED     (PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_NORMAL_TAGGED))
+
+#define PROT_SECT_DEVICE_nGnRE (PROT_SECT_DEFAULT | PMD_SECT_PXN | PMD_SECT_UXN | PMD_ATTRINDX(MT_DEVICE_nGnRE))
+#define PROT_SECT_NORMAL       (PROT_SECT_DEFAULT | PMD_SECT_PXN | PMD_SECT_UXN | PTE_WRITE | PMD_ATTRINDX(MT_NORMAL))
+#define PROT_SECT_NORMAL_EXEC  (PROT_SECT_DEFAULT | PMD_SECT_UXN | PMD_ATTRINDX(MT_NORMAL))
+
+#define _PAGE_DEFAULT          (_PROT_DEFAULT | PTE_ATTRINDX(MT_NORMAL))
+
+#define _PAGE_KERNEL           (PROT_NORMAL)
+#define _PAGE_KERNEL_RO                ((PROT_NORMAL & ~PTE_WRITE) | PTE_RDONLY)
+#define _PAGE_KERNEL_ROX       ((PROT_NORMAL & ~(PTE_WRITE | PTE_PXN)) | PTE_RDONLY)
+#define _PAGE_KERNEL_EXEC      (PROT_NORMAL & ~PTE_PXN)
+#define _PAGE_KERNEL_EXEC_CONT ((PROT_NORMAL & ~PTE_PXN) | PTE_CONT)
+
+#define _PAGE_SHARED           (_PAGE_DEFAULT | PTE_USER | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN | PTE_WRITE)
+#define _PAGE_SHARED_EXEC      (_PAGE_DEFAULT | PTE_USER | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_WRITE)
+#define _PAGE_READONLY         (_PAGE_DEFAULT | PTE_USER | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN)
+#define _PAGE_READONLY_EXEC    (_PAGE_DEFAULT | PTE_USER | PTE_RDONLY | PTE_NG | PTE_PXN)
+#define _PAGE_EXECONLY         (_PAGE_DEFAULT | PTE_RDONLY | PTE_NG | PTE_PXN)
+
+#ifdef __ASSEMBLY__
+#define PTE_MAYBE_NG   0
+#endif
+
 #ifndef __ASSEMBLY__
 
 #include <asm/cpufeature.h>
@@ -34,9 +68,6 @@
 
 extern bool arm64_use_ng_mappings;
 
-#define _PROT_DEFAULT          (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
-#define _PROT_SECT_DEFAULT     (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
-
 #define PTE_MAYBE_NG           (arm64_use_ng_mappings ? PTE_NG : 0)
 #define PMD_MAYBE_NG           (arm64_use_ng_mappings ? PMD_SECT_NG : 0)
 
@@ -50,26 +81,11 @@ extern bool arm64_use_ng_mappings;
 #define PTE_MAYBE_GP           0
 #endif
 
-#define PROT_DEFAULT           (_PROT_DEFAULT | PTE_MAYBE_NG)
-#define PROT_SECT_DEFAULT      (_PROT_SECT_DEFAULT | PMD_MAYBE_NG)
-
-#define PROT_DEVICE_nGnRnE     (PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_DEVICE_nGnRnE))
-#define PROT_DEVICE_nGnRE      (PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_DEVICE_nGnRE))
-#define PROT_NORMAL_NC         (PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_NORMAL_NC))
-#define PROT_NORMAL            (PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_NORMAL))
-#define PROT_NORMAL_TAGGED     (PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_NORMAL_TAGGED))
-
-#define PROT_SECT_DEVICE_nGnRE (PROT_SECT_DEFAULT | PMD_SECT_PXN | PMD_SECT_UXN | PMD_ATTRINDX(MT_DEVICE_nGnRE))
-#define PROT_SECT_NORMAL       (PROT_SECT_DEFAULT | PMD_SECT_PXN | PMD_SECT_UXN | PMD_ATTRINDX(MT_NORMAL))
-#define PROT_SECT_NORMAL_EXEC  (PROT_SECT_DEFAULT | PMD_SECT_UXN | PMD_ATTRINDX(MT_NORMAL))
-
-#define _PAGE_DEFAULT          (_PROT_DEFAULT | PTE_ATTRINDX(MT_NORMAL))
-
-#define PAGE_KERNEL            __pgprot(PROT_NORMAL)
-#define PAGE_KERNEL_RO         __pgprot((PROT_NORMAL & ~PTE_WRITE) | PTE_RDONLY)
-#define PAGE_KERNEL_ROX                __pgprot((PROT_NORMAL & ~(PTE_WRITE | PTE_PXN)) | PTE_RDONLY)
-#define PAGE_KERNEL_EXEC       __pgprot(PROT_NORMAL & ~PTE_PXN)
-#define PAGE_KERNEL_EXEC_CONT  __pgprot((PROT_NORMAL & ~PTE_PXN) | PTE_CONT)
+#define PAGE_KERNEL            __pgprot(_PAGE_KERNEL)
+#define PAGE_KERNEL_RO         __pgprot(_PAGE_KERNEL_RO)
+#define PAGE_KERNEL_ROX                __pgprot(_PAGE_KERNEL_ROX)
+#define PAGE_KERNEL_EXEC       __pgprot(_PAGE_KERNEL_EXEC)
+#define PAGE_KERNEL_EXEC_CONT  __pgprot(_PAGE_KERNEL_EXEC_CONT)
 
 #define PAGE_S2_MEMATTR(attr, has_fwb)                                 \
        ({                                                              \
@@ -83,12 +99,62 @@ extern bool arm64_use_ng_mappings;
 
 #define PAGE_NONE              __pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN)
 /* shared+writable pages are clean by default, hence PTE_RDONLY|PTE_WRITE */
-#define PAGE_SHARED            __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN | PTE_WRITE)
-#define PAGE_SHARED_EXEC       __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_WRITE)
-#define PAGE_READONLY          __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN)
-#define PAGE_READONLY_EXEC     __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_RDONLY | PTE_NG | PTE_PXN)
-#define PAGE_EXECONLY          __pgprot(_PAGE_DEFAULT | PTE_RDONLY | PTE_NG | PTE_PXN)
+#define PAGE_SHARED            __pgprot(_PAGE_SHARED)
+#define PAGE_SHARED_EXEC       __pgprot(_PAGE_SHARED_EXEC)
+#define PAGE_READONLY          __pgprot(_PAGE_READONLY)
+#define PAGE_READONLY_EXEC     __pgprot(_PAGE_READONLY_EXEC)
+#define PAGE_EXECONLY          __pgprot(_PAGE_EXECONLY)
 
 #endif /* __ASSEMBLY__ */
 
+#define pte_pi_index(pte) ( \
+       ((pte & BIT(PTE_PI_IDX_3)) >> (PTE_PI_IDX_3 - 3)) | \
+       ((pte & BIT(PTE_PI_IDX_2)) >> (PTE_PI_IDX_2 - 2)) | \
+       ((pte & BIT(PTE_PI_IDX_1)) >> (PTE_PI_IDX_1 - 1)) | \
+       ((pte & BIT(PTE_PI_IDX_0)) >> (PTE_PI_IDX_0 - 0)))
+
+/*
+ * Page types used via Permission Indirection Extension (PIE). PIE uses
+ * the USER, DBM, PXN and UXN bits to generate an index which is used
+ * to look up the actual permission in PIR_ELx and PIRE0_EL1. We define
+ * the combinations we use on non-PIE systems with the same encoding; for
+ * convenience these are listed here as comments, as are the unallocated
+ * encodings.
+ */
+
+/* 0: PAGE_DEFAULT                                                  */
+/* 1:                                                      PTE_USER */
+/* 2:                                          PTE_WRITE            */
+/* 3:                                          PTE_WRITE | PTE_USER */
+/* 4: PAGE_EXECONLY                  PTE_PXN                        */
+/* 5: PAGE_READONLY_EXEC             PTE_PXN |             PTE_USER */
+/* 6:                                PTE_PXN | PTE_WRITE            */
+/* 7: PAGE_SHARED_EXEC               PTE_PXN | PTE_WRITE | PTE_USER */
+/* 8: PAGE_KERNEL_ROX      PTE_UXN                                  */
+/* 9:                      PTE_UXN |                       PTE_USER */
+/* a: PAGE_KERNEL_EXEC     PTE_UXN |           PTE_WRITE            */
+/* b:                      PTE_UXN |           PTE_WRITE | PTE_USER */
+/* c: PAGE_KERNEL_RO       PTE_UXN | PTE_PXN                        */
+/* d: PAGE_READONLY        PTE_UXN | PTE_PXN |             PTE_USER */
+/* e: PAGE_KERNEL          PTE_UXN | PTE_PXN | PTE_WRITE            */
+/* f: PAGE_SHARED          PTE_UXN | PTE_PXN | PTE_WRITE | PTE_USER */
+
+#define PIE_E0 ( \
+       PIRx_ELx_PERM(pte_pi_index(_PAGE_EXECONLY),      PIE_X_O) | \
+       PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY_EXEC), PIE_RX)  | \
+       PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED_EXEC),   PIE_RWX) | \
+       PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY),      PIE_R)   | \
+       PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED),        PIE_RW))
+
+#define PIE_E1 ( \
+       PIRx_ELx_PERM(pte_pi_index(_PAGE_EXECONLY),      PIE_NONE_O) | \
+       PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY_EXEC), PIE_R)      | \
+       PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED_EXEC),   PIE_RW)     | \
+       PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY),      PIE_R)      | \
+       PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED),        PIE_RW)     | \
+       PIRx_ELx_PERM(pte_pi_index(_PAGE_KERNEL_ROX),    PIE_RX)     | \
+       PIRx_ELx_PERM(pte_pi_index(_PAGE_KERNEL_EXEC),   PIE_RWX)    | \
+       PIRx_ELx_PERM(pte_pi_index(_PAGE_KERNEL_RO),     PIE_R)      | \
+       PIRx_ELx_PERM(pte_pi_index(_PAGE_KERNEL),        PIE_RW))
+
 #endif /* __ASM_PGTABLE_PROT_H */
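
The index produced by pte_pi_index() is just the four indirection bits (UXN, PXN, DBM and USER, at bits 54, 53, 51 and 6) packed down into bits 3..0; PIE_E0/PIE_E1 then place a 4-bit permission at that nibble of the PIR register. A userspace sketch of the same computation, with the bit positions copied from PTE_PI_IDX_0..3 above (pi_index() is an illustrative name, not the kernel macro):

#include <stdint.h>
#include <stdio.h>

/* Bit positions copied from PTE_PI_IDX_0..3 above; illustration only. */
#define IDX_0	6	/* AP[1], USER */
#define IDX_1	51	/* DBM */
#define IDX_2	53	/* PXN */
#define IDX_3	54	/* UXN */

static unsigned int pi_index(uint64_t pte)
{
	return (unsigned int)(((pte >> IDX_3) & 1) << 3 |
			      ((pte >> IDX_2) & 1) << 2 |
			      ((pte >> IDX_1) & 1) << 1 |
			      ((pte >> IDX_0) & 1));
}

int main(void)
{
	/* All four indirection bits set, i.e. the PAGE_SHARED slot above. */
	uint64_t pte = (1ULL << IDX_0) | (1ULL << IDX_1) |
		       (1ULL << IDX_2) | (1ULL << IDX_3);

	printf("index = %#x\n", pi_index(pte));	/* 0xf */
	return 0;
}
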
index 13df982..3fdae5f 100644 (file)
@@ -73,6 +73,7 @@ static inline void dynamic_scs_init(void) {}
 #endif
 
 int scs_patch(const u8 eh_frame[], int size);
+asmlinkage void scs_patch_vmlinux(void);
 
 #endif /* __ASSEMBLY __ */
 
index f2d2623..9b31e6d 100644 (file)
@@ -99,7 +99,7 @@ static inline void arch_send_wakeup_ipi_mask(const struct cpumask *mask)
 
 extern int __cpu_disable(void);
 
-extern void __cpu_die(unsigned int cpu);
+static inline void __cpu_die(unsigned int cpu) { }
 extern void __noreturn cpu_die(void);
 extern void __noreturn cpu_die_early(void);
 
index db7b371..9cc5014 100644 (file)
@@ -100,5 +100,21 @@ bool is_spectre_bhb_affected(const struct arm64_cpu_capabilities *entry, int sco
 u8 spectre_bhb_loop_affected(int scope);
 void spectre_bhb_enable_mitigation(const struct arm64_cpu_capabilities *__unused);
 bool try_emulate_el1_ssbs(struct pt_regs *regs, u32 instr);
+
+void spectre_v4_patch_fw_mitigation_enable(struct alt_instr *alt, __le32 *origptr,
+                                          __le32 *updptr, int nr_inst);
+void smccc_patch_fw_mitigation_conduit(struct alt_instr *alt, __le32 *origptr,
+                                      __le32 *updptr, int nr_inst);
+void spectre_bhb_patch_loop_mitigation_enable(struct alt_instr *alt, __le32 *origptr,
+                                             __le32 *updptr, int nr_inst);
+void spectre_bhb_patch_fw_mitigation_enabled(struct alt_instr *alt, __le32 *origptr,
+                                            __le32 *updptr, int nr_inst);
+void spectre_bhb_patch_loop_iter(struct alt_instr *alt,
+                                __le32 *origptr, __le32 *updptr, int nr_inst);
+void spectre_bhb_patch_wa3(struct alt_instr *alt,
+                          __le32 *origptr, __le32 *updptr, int nr_inst);
+void spectre_bhb_patch_clearbhb(struct alt_instr *alt,
+                               __le32 *origptr, __le32 *updptr, int nr_inst);
+
 #endif /* __ASSEMBLY__ */
 #endif /* __ASM_SPECTRE_H */
index d30217c..17f6875 100644 (file)
@@ -38,6 +38,7 @@
        asmlinkage long __arm64_compat_sys_##sname(const struct pt_regs *__unused)
 
 #define COND_SYSCALL_COMPAT(name)                                                      \
+       asmlinkage long __arm64_compat_sys_##name(const struct pt_regs *regs);          \
        asmlinkage long __weak __arm64_compat_sys_##name(const struct pt_regs *regs)    \
        {                                                                               \
                return sys_ni_syscall();                                                \
@@ -53,6 +54,7 @@
        ALLOW_ERROR_INJECTION(__arm64_sys##name, ERRNO);                        \
        static long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__));             \
        static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));      \
+       asmlinkage long __arm64_sys##name(const struct pt_regs *regs);          \
        asmlinkage long __arm64_sys##name(const struct pt_regs *regs)           \
        {                                                                       \
                return __se_sys##name(SC_ARM64_REGS_TO_ARGS(x,__VA_ARGS__));    \
        asmlinkage long __arm64_sys_##sname(const struct pt_regs *__unused)
 
 #define COND_SYSCALL(name)                                                     \
+       asmlinkage long __arm64_sys_##name(const struct pt_regs *regs);         \
        asmlinkage long __weak __arm64_sys_##name(const struct pt_regs *regs)   \
        {                                                                       \
                return sys_ni_syscall();                                        \
        }
 
+asmlinkage long __arm64_sys_ni_syscall(const struct pt_regs *__unused);
 #define SYS_NI(name) SYSCALL_ALIAS(__arm64_sys_##name, sys_ni_posix_timers);
 
 #endif /* __ASM_SYSCALL_WRAPPER_H */
index e72d9aa..7a1e626 100644 (file)
 #define SB_BARRIER_INSN                        __SYS_BARRIER_INSN(0, 7, 31)
 
 #define SYS_DC_ISW                     sys_insn(1, 0, 7, 6, 2)
+#define SYS_DC_IGSW                    sys_insn(1, 0, 7, 6, 4)
+#define SYS_DC_IGDSW                   sys_insn(1, 0, 7, 6, 6)
 #define SYS_DC_CSW                     sys_insn(1, 0, 7, 10, 2)
+#define SYS_DC_CGSW                    sys_insn(1, 0, 7, 10, 4)
+#define SYS_DC_CGDSW                   sys_insn(1, 0, 7, 10, 6)
 #define SYS_DC_CISW                    sys_insn(1, 0, 7, 14, 2)
+#define SYS_DC_CIGSW                   sys_insn(1, 0, 7, 14, 4)
+#define SYS_DC_CIGDSW                  sys_insn(1, 0, 7, 14, 6)
 
 /*
  * Automatically generated definitions for system registers, the
 #define SYS_SVCR_SMSTART_SM_EL0                sys_reg(0, 3, 4, 3, 3)
 #define SYS_SVCR_SMSTOP_SMZA_EL0       sys_reg(0, 3, 4, 6, 3)
 
-#define SYS_OSDTRRX_EL1                        sys_reg(2, 0, 0, 0, 2)
-#define SYS_MDCCINT_EL1                        sys_reg(2, 0, 0, 2, 0)
-#define SYS_MDSCR_EL1                  sys_reg(2, 0, 0, 2, 2)
-#define SYS_OSDTRTX_EL1                        sys_reg(2, 0, 0, 3, 2)
-#define SYS_OSECCR_EL1                 sys_reg(2, 0, 0, 6, 2)
 #define SYS_DBGBVRn_EL1(n)             sys_reg(2, 0, 0, n, 4)
 #define SYS_DBGBCRn_EL1(n)             sys_reg(2, 0, 0, n, 5)
 #define SYS_DBGWVRn_EL1(n)             sys_reg(2, 0, 0, n, 6)
 #define SYS_DBGWCRn_EL1(n)             sys_reg(2, 0, 0, n, 7)
 #define SYS_MDRAR_EL1                  sys_reg(2, 0, 1, 0, 0)
 
-#define SYS_OSLAR_EL1                  sys_reg(2, 0, 1, 0, 4)
-#define SYS_OSLAR_OSLK                 BIT(0)
-
 #define SYS_OSLSR_EL1                  sys_reg(2, 0, 1, 1, 4)
-#define SYS_OSLSR_OSLM_MASK            (BIT(3) | BIT(0))
-#define SYS_OSLSR_OSLM_NI              0
-#define SYS_OSLSR_OSLM_IMPLEMENTED     BIT(3)
-#define SYS_OSLSR_OSLK                 BIT(1)
+#define OSLSR_EL1_OSLM_MASK            (BIT(3) | BIT(0))
+#define OSLSR_EL1_OSLM_NI              0
+#define OSLSR_EL1_OSLM_IMPLEMENTED     BIT(3)
+#define OSLSR_EL1_OSLK                 BIT(1)
 
 #define SYS_OSDLR_EL1                  sys_reg(2, 0, 1, 3, 4)
 #define SYS_DBGPRCR_EL1                        sys_reg(2, 0, 1, 4, 4)
 
 /*** End of Statistical Profiling Extension ***/
 
-/*
- * TRBE Registers
- */
-#define SYS_TRBLIMITR_EL1              sys_reg(3, 0, 9, 11, 0)
-#define SYS_TRBPTR_EL1                 sys_reg(3, 0, 9, 11, 1)
-#define SYS_TRBBASER_EL1               sys_reg(3, 0, 9, 11, 2)
-#define SYS_TRBSR_EL1                  sys_reg(3, 0, 9, 11, 3)
-#define SYS_TRBMAR_EL1                 sys_reg(3, 0, 9, 11, 4)
-#define SYS_TRBTRG_EL1                 sys_reg(3, 0, 9, 11, 6)
-#define SYS_TRBIDR_EL1                 sys_reg(3, 0, 9, 11, 7)
-
-#define TRBLIMITR_LIMIT_MASK           GENMASK_ULL(51, 0)
-#define TRBLIMITR_LIMIT_SHIFT          12
-#define TRBLIMITR_NVM                  BIT(5)
-#define TRBLIMITR_TRIG_MODE_MASK       GENMASK(1, 0)
-#define TRBLIMITR_TRIG_MODE_SHIFT      3
-#define TRBLIMITR_FILL_MODE_MASK       GENMASK(1, 0)
-#define TRBLIMITR_FILL_MODE_SHIFT      1
-#define TRBLIMITR_ENABLE               BIT(0)
-#define TRBPTR_PTR_MASK                        GENMASK_ULL(63, 0)
-#define TRBPTR_PTR_SHIFT               0
-#define TRBBASER_BASE_MASK             GENMASK_ULL(51, 0)
-#define TRBBASER_BASE_SHIFT            12
-#define TRBSR_EC_MASK                  GENMASK(5, 0)
-#define TRBSR_EC_SHIFT                 26
-#define TRBSR_IRQ                      BIT(22)
-#define TRBSR_TRG                      BIT(21)
-#define TRBSR_WRAP                     BIT(20)
-#define TRBSR_ABORT                    BIT(18)
-#define TRBSR_STOP                     BIT(17)
-#define TRBSR_MSS_MASK                 GENMASK(15, 0)
-#define TRBSR_MSS_SHIFT                        0
-#define TRBSR_BSC_MASK                 GENMASK(5, 0)
-#define TRBSR_BSC_SHIFT                        0
-#define TRBSR_FSC_MASK                 GENMASK(5, 0)
-#define TRBSR_FSC_SHIFT                        0
-#define TRBMAR_SHARE_MASK              GENMASK(1, 0)
-#define TRBMAR_SHARE_SHIFT             8
-#define TRBMAR_OUTER_MASK              GENMASK(3, 0)
-#define TRBMAR_OUTER_SHIFT             4
-#define TRBMAR_INNER_MASK              GENMASK(3, 0)
-#define TRBMAR_INNER_SHIFT             0
-#define TRBTRG_TRG_MASK                        GENMASK(31, 0)
-#define TRBTRG_TRG_SHIFT               0
-#define TRBIDR_FLAG                    BIT(5)
-#define TRBIDR_PROG                    BIT(4)
-#define TRBIDR_ALIGN_MASK              GENMASK(3, 0)
-#define TRBIDR_ALIGN_SHIFT             0
+#define TRBSR_EL1_BSC_MASK             GENMASK(5, 0)
+#define TRBSR_EL1_BSC_SHIFT            0
 
 #define SYS_PMINTENSET_EL1             sys_reg(3, 0, 9, 14, 1)
 #define SYS_PMINTENCLR_EL1             sys_reg(3, 0, 9, 14, 2)
 #define ICH_VTR_TDS_SHIFT      19
 #define ICH_VTR_TDS_MASK       (1 << ICH_VTR_TDS_SHIFT)
 
+/*
+ * Permission Indirection Extension (PIE) permission encodings.
+ * Encodings with the _O suffix, have overlays applied (Permission Overlay Extension).
+ */
+#define PIE_NONE_O     0x0
+#define PIE_R_O                0x1
+#define PIE_X_O                0x2
+#define PIE_RX_O       0x3
+#define PIE_RW_O       0x5
+#define PIE_RWnX_O     0x6
+#define PIE_RWX_O      0x7
+#define PIE_R          0x8
+#define PIE_GCS                0x9
+#define PIE_RX         0xa
+#define PIE_RW         0xc
+#define PIE_RWX                0xe
+
+#define PIRx_ELx_PERM(idx, perm)       ((perm) << ((idx) * 4))
+
 #define ARM64_FEATURE_FIELD_BITS       4
 
 /* Defined for compatibility only, do not add new users. */
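
PIRx_ELx_PERM() places a 4-bit permission encoding at nibble idx of the 64-bit register, so the sixteen possible indirection indexes map onto the sixteen nibbles of PIR_EL1/PIRE0_EL1. A short sketch of composing such a value, reusing the encodings listed above (PIRX_PERM is a local stand-in for the kernel macro):

#include <stdint.h>
#include <stdio.h>

/* Encodings copied from the PIE definitions above; illustration only. */
#define PIE_R	0x8
#define PIE_RW	0xc

#define PIRX_PERM(idx, perm)	((uint64_t)(perm) << ((idx) * 4))

int main(void)
{
	/* Read-only at index 0xd and read-write at index 0xf, as PIE_E1
	 * does for PAGE_READONLY and PAGE_SHARED. */
	uint64_t pir = PIRX_PERM(0xd, PIE_R) | PIRX_PERM(0xf, PIE_RW);

	printf("%#llx\n", (unsigned long long)pir);	/* 0xc080000000000000 */
	return 0;
}
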
index 1f361e2..d66dfb3 100644 (file)
@@ -29,6 +29,8 @@ void arm64_force_sig_fault(int signo, int code, unsigned long far, const char *s
 void arm64_force_sig_mceerr(int code, unsigned long far, short lsb, const char *str);
 void arm64_force_sig_ptrace_errno_trap(int errno, unsigned long far, const char *str);
 
+int early_brk64(unsigned long addr, unsigned long esr, struct pt_regs *regs);
+
 /*
  * Move regs->pc to next instruction and do necessary setup before it
  * is executed.
index 05f4fc2..14be500 100644 (file)
@@ -65,7 +65,6 @@ static inline void __uaccess_ttbr0_disable(void)
        ttbr &= ~TTBR_ASID_MASK;
        /* reserved_pg_dir placed before swapper_pg_dir */
        write_sysreg(ttbr - RESERVED_SWAPPER_OFFSET, ttbr0_el1);
-       isb();
        /* Set reserved ASID */
        write_sysreg(ttbr, ttbr1_el1);
        isb();
@@ -89,7 +88,6 @@ static inline void __uaccess_ttbr0_enable(void)
        ttbr1 &= ~TTBR_ASID_MASK;               /* safety measure */
        ttbr1 |= ttbr0 & TTBR_ASID_MASK;
        write_sysreg(ttbr1, ttbr1_el1);
-       isb();
 
        /* Restore user page table */
        write_sysreg(ttbr0, ttbr0_el1);
index 69a4fb7..a2cac43 100644 (file)
 #define HWCAP2_SME_BI32I32     (1UL << 40)
 #define HWCAP2_SME_B16B16      (1UL << 41)
 #define HWCAP2_SME_F16F16      (1UL << 42)
+#define HWCAP2_MOPS            (1UL << 43)
 
 #endif /* _UAPI__ASM_HWCAP_H */
index 7c2bb4e..3864a64 100644 (file)
@@ -42,8 +42,7 @@ obj-$(CONFIG_COMPAT)                  += sigreturn32.o
 obj-$(CONFIG_COMPAT_ALIGNMENT_FIXUPS)  += compat_alignment.o
 obj-$(CONFIG_KUSER_HELPERS)            += kuser32.o
 obj-$(CONFIG_FUNCTION_TRACER)          += ftrace.o entry-ftrace.o
-obj-$(CONFIG_MODULES)                  += module.o
-obj-$(CONFIG_ARM64_MODULE_PLTS)                += module-plts.o
+obj-$(CONFIG_MODULES)                  += module.o module-plts.o
 obj-$(CONFIG_PERF_EVENTS)              += perf_regs.o perf_callchain.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)       += hw_breakpoint.o
 obj-$(CONFIG_CPU_PM)                   += sleep.o suspend.o
index d32d4ed..8ff6610 100644 (file)
@@ -24,8 +24,8 @@
 #define ALT_ORIG_PTR(a)                __ALT_PTR(a, orig_offset)
 #define ALT_REPL_PTR(a)                __ALT_PTR(a, alt_offset)
 
-#define ALT_CAP(a)             ((a)->cpufeature & ~ARM64_CB_BIT)
-#define ALT_HAS_CB(a)          ((a)->cpufeature & ARM64_CB_BIT)
+#define ALT_CAP(a)             ((a)->cpucap & ~ARM64_CB_BIT)
+#define ALT_HAS_CB(a)          ((a)->cpucap & ARM64_CB_BIT)
 
 /* Volatile, as we may be patching the guts of READ_ONCE() */
 static volatile int all_alternatives_applied;
@@ -37,12 +37,12 @@ struct alt_region {
        struct alt_instr *end;
 };
 
-bool alternative_is_applied(u16 cpufeature)
+bool alternative_is_applied(u16 cpucap)
 {
-       if (WARN_ON(cpufeature >= ARM64_NCAPS))
+       if (WARN_ON(cpucap >= ARM64_NCAPS))
                return false;
 
-       return test_bit(cpufeature, applied_alternatives);
+       return test_bit(cpucap, applied_alternatives);
 }
 
 /*
@@ -121,11 +121,11 @@ static noinstr void patch_alternative(struct alt_instr *alt,
  * accidentally call into the cache.S code, which is patched by us at
  * runtime.
  */
-static void clean_dcache_range_nopatch(u64 start, u64 end)
+static noinstr void clean_dcache_range_nopatch(u64 start, u64 end)
 {
        u64 cur, d_size, ctr_el0;
 
-       ctr_el0 = read_sanitised_ftr_reg(SYS_CTR_EL0);
+       ctr_el0 = arm64_ftr_reg_ctrel0.sys_val;
        d_size = 4 << cpuid_feature_extract_unsigned_field(ctr_el0,
                                                           CTR_EL0_DminLine_SHIFT);
        cur = start & ~(d_size - 1);
@@ -141,7 +141,7 @@ static void clean_dcache_range_nopatch(u64 start, u64 end)
 
 static void __apply_alternatives(const struct alt_region *region,
                                 bool is_module,
-                                unsigned long *feature_mask)
+                                unsigned long *cpucap_mask)
 {
        struct alt_instr *alt;
        __le32 *origptr, *updptr;
@@ -151,7 +151,7 @@ static void __apply_alternatives(const struct alt_region *region,
                int nr_inst;
                int cap = ALT_CAP(alt);
 
-               if (!test_bit(cap, feature_mask))
+               if (!test_bit(cap, cpucap_mask))
                        continue;
 
                if (!cpus_have_cap(cap))
@@ -188,11 +188,10 @@ static void __apply_alternatives(const struct alt_region *region,
                icache_inval_all_pou();
                isb();
 
-               /* Ignore ARM64_CB bit from feature mask */
                bitmap_or(applied_alternatives, applied_alternatives,
-                         feature_mask, ARM64_NCAPS);
+                         cpucap_mask, ARM64_NCAPS);
                bitmap_and(applied_alternatives, applied_alternatives,
-                          cpu_hwcaps, ARM64_NCAPS);
+                          system_cpucaps, ARM64_NCAPS);
        }
 }
 
@@ -239,7 +238,7 @@ static int __init __apply_alternatives_multi_stop(void *unused)
        } else {
                DECLARE_BITMAP(remaining_capabilities, ARM64_NCAPS);
 
-               bitmap_complement(remaining_capabilities, boot_capabilities,
+               bitmap_complement(remaining_capabilities, boot_cpucaps,
                                  ARM64_NCAPS);
 
                BUG_ON(all_alternatives_applied);
@@ -274,7 +273,7 @@ void __init apply_boot_alternatives(void)
        pr_info("applying boot alternatives\n");
 
        __apply_alternatives(&kernel_alternatives, false,
-                            &boot_capabilities[0]);
+                            &boot_cpucaps[0]);
 }
 
 #ifdef CONFIG_MODULES
index 7d7128c..6ea7f23 100644 (file)
@@ -105,11 +105,11 @@ unsigned int compat_elf_hwcap __read_mostly = COMPAT_ELF_HWCAP_DEFAULT;
 unsigned int compat_elf_hwcap2 __read_mostly;
 #endif
 
-DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
-EXPORT_SYMBOL(cpu_hwcaps);
-static struct arm64_cpu_capabilities const __ro_after_init *cpu_hwcaps_ptrs[ARM64_NCAPS];
+DECLARE_BITMAP(system_cpucaps, ARM64_NCAPS);
+EXPORT_SYMBOL(system_cpucaps);
+static struct arm64_cpu_capabilities const __ro_after_init *cpucap_ptrs[ARM64_NCAPS];
 
-DECLARE_BITMAP(boot_capabilities, ARM64_NCAPS);
+DECLARE_BITMAP(boot_cpucaps, ARM64_NCAPS);
 
 bool arm64_use_ng_mappings = false;
 EXPORT_SYMBOL(arm64_use_ng_mappings);
@@ -137,7 +137,7 @@ static cpumask_var_t cpu_32bit_el0_mask __cpumask_var_read_mostly;
 void dump_cpu_features(void)
 {
        /* file-wide pr_fmt adds "CPU features: " prefix */
-       pr_emerg("0x%*pb\n", ARM64_NCAPS, &cpu_hwcaps);
+       pr_emerg("0x%*pb\n", ARM64_NCAPS, &system_cpucaps);
 }
 
 #define ARM64_CPUID_FIELDS(reg, field, min_value)                      \
@@ -223,6 +223,7 @@ static const struct arm64_ftr_bits ftr_id_aa64isar2[] = {
        ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64ISAR2_EL1_CSSC_SHIFT, 4, 0),
        ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64ISAR2_EL1_RPRFM_SHIFT, 4, 0),
        ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_HIGHER_SAFE, ID_AA64ISAR2_EL1_BC_SHIFT, 4, 0),
+       ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ISAR2_EL1_MOPS_SHIFT, 4, 0),
        ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_PTR_AUTH),
                       FTR_STRICT, FTR_EXACT, ID_AA64ISAR2_EL1_APA3_SHIFT, 4, 0),
        ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_PTR_AUTH),
@@ -364,6 +365,7 @@ static const struct arm64_ftr_bits ftr_id_aa64mmfr0[] = {
 static const struct arm64_ftr_bits ftr_id_aa64mmfr1[] = {
        ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_EL1_TIDCP1_SHIFT, 4, 0),
        ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_EL1_AFP_SHIFT, 4, 0),
+       ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_EL1_HCX_SHIFT, 4, 0),
        ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_EL1_ETS_SHIFT, 4, 0),
        ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_EL1_TWED_SHIFT, 4, 0),
        ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_EL1_XNX_SHIFT, 4, 0),
@@ -396,6 +398,12 @@ static const struct arm64_ftr_bits ftr_id_aa64mmfr2[] = {
        ARM64_FTR_END,
 };
 
+static const struct arm64_ftr_bits ftr_id_aa64mmfr3[] = {
+       ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR3_EL1_S1PIE_SHIFT, 4, 0),
+       ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR3_EL1_TCRX_SHIFT, 4, 0),
+       ARM64_FTR_END,
+};
+
 static const struct arm64_ftr_bits ftr_ctr[] = {
        ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, 31, 1, 1), /* RES1 */
        ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, CTR_EL0_DIC_SHIFT, 1, 1),
@@ -722,6 +730,7 @@ static const struct __ftr_reg_entry {
        ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64MMFR1_EL1, ftr_id_aa64mmfr1,
                               &id_aa64mmfr1_override),
        ARM64_FTR_REG(SYS_ID_AA64MMFR2_EL1, ftr_id_aa64mmfr2),
+       ARM64_FTR_REG(SYS_ID_AA64MMFR3_EL1, ftr_id_aa64mmfr3),
 
        /* Op1 = 0, CRn = 1, CRm = 2 */
        ARM64_FTR_REG(SYS_ZCR_EL1, ftr_zcr),
@@ -954,24 +963,24 @@ extern const struct arm64_cpu_capabilities arm64_errata[];
 static const struct arm64_cpu_capabilities arm64_features[];
 
 static void __init
-init_cpu_hwcaps_indirect_list_from_array(const struct arm64_cpu_capabilities *caps)
+init_cpucap_indirect_list_from_array(const struct arm64_cpu_capabilities *caps)
 {
        for (; caps->matches; caps++) {
                if (WARN(caps->capability >= ARM64_NCAPS,
                        "Invalid capability %d\n", caps->capability))
                        continue;
-               if (WARN(cpu_hwcaps_ptrs[caps->capability],
+               if (WARN(cpucap_ptrs[caps->capability],
                        "Duplicate entry for capability %d\n",
                        caps->capability))
                        continue;
-               cpu_hwcaps_ptrs[caps->capability] = caps;
+               cpucap_ptrs[caps->capability] = caps;
        }
 }
 
-static void __init init_cpu_hwcaps_indirect_list(void)
+static void __init init_cpucap_indirect_list(void)
 {
-       init_cpu_hwcaps_indirect_list_from_array(arm64_features);
-       init_cpu_hwcaps_indirect_list_from_array(arm64_errata);
+       init_cpucap_indirect_list_from_array(arm64_features);
+       init_cpucap_indirect_list_from_array(arm64_errata);
 }
 
 static void __init setup_boot_cpu_capabilities(void);
@@ -1017,6 +1026,7 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
        init_cpu_ftr_reg(SYS_ID_AA64MMFR0_EL1, info->reg_id_aa64mmfr0);
        init_cpu_ftr_reg(SYS_ID_AA64MMFR1_EL1, info->reg_id_aa64mmfr1);
        init_cpu_ftr_reg(SYS_ID_AA64MMFR2_EL1, info->reg_id_aa64mmfr2);
+       init_cpu_ftr_reg(SYS_ID_AA64MMFR3_EL1, info->reg_id_aa64mmfr3);
        init_cpu_ftr_reg(SYS_ID_AA64PFR0_EL1, info->reg_id_aa64pfr0);
        init_cpu_ftr_reg(SYS_ID_AA64PFR1_EL1, info->reg_id_aa64pfr1);
        init_cpu_ftr_reg(SYS_ID_AA64ZFR0_EL1, info->reg_id_aa64zfr0);
@@ -1049,10 +1059,10 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
                init_cpu_ftr_reg(SYS_GMID_EL1, info->reg_gmid);
 
        /*
-        * Initialize the indirect array of CPU hwcaps capabilities pointers
-        * before we handle the boot CPU below.
+        * Initialize the indirect array of CPU capabilities pointers before we
+        * handle the boot CPU below.
         */
-       init_cpu_hwcaps_indirect_list();
+       init_cpucap_indirect_list();
 
        /*
         * Detect and enable early CPU capabilities based on the boot CPU,
@@ -1262,6 +1272,8 @@ void update_cpu_features(int cpu,
                                      info->reg_id_aa64mmfr1, boot->reg_id_aa64mmfr1);
        taint |= check_update_ftr_reg(SYS_ID_AA64MMFR2_EL1, cpu,
                                      info->reg_id_aa64mmfr2, boot->reg_id_aa64mmfr2);
+       taint |= check_update_ftr_reg(SYS_ID_AA64MMFR3_EL1, cpu,
+                                     info->reg_id_aa64mmfr3, boot->reg_id_aa64mmfr3);
 
        taint |= check_update_ftr_reg(SYS_ID_AA64PFR0_EL1, cpu,
                                      info->reg_id_aa64pfr0, boot->reg_id_aa64pfr0);
@@ -1391,6 +1403,7 @@ u64 __read_sysreg_by_encoding(u32 sys_id)
        read_sysreg_case(SYS_ID_AA64MMFR0_EL1);
        read_sysreg_case(SYS_ID_AA64MMFR1_EL1);
        read_sysreg_case(SYS_ID_AA64MMFR2_EL1);
+       read_sysreg_case(SYS_ID_AA64MMFR3_EL1);
        read_sysreg_case(SYS_ID_AA64ISAR0_EL1);
        read_sysreg_case(SYS_ID_AA64ISAR1_EL1);
        read_sysreg_case(SYS_ID_AA64ISAR2_EL1);
@@ -2048,9 +2061,9 @@ static bool has_address_auth_cpucap(const struct arm64_cpu_capabilities *entry,
 static bool has_address_auth_metacap(const struct arm64_cpu_capabilities *entry,
                                     int scope)
 {
-       bool api = has_address_auth_cpucap(cpu_hwcaps_ptrs[ARM64_HAS_ADDRESS_AUTH_IMP_DEF], scope);
-       bool apa = has_address_auth_cpucap(cpu_hwcaps_ptrs[ARM64_HAS_ADDRESS_AUTH_ARCH_QARMA5], scope);
-       bool apa3 = has_address_auth_cpucap(cpu_hwcaps_ptrs[ARM64_HAS_ADDRESS_AUTH_ARCH_QARMA3], scope);
+       bool api = has_address_auth_cpucap(cpucap_ptrs[ARM64_HAS_ADDRESS_AUTH_IMP_DEF], scope);
+       bool apa = has_address_auth_cpucap(cpucap_ptrs[ARM64_HAS_ADDRESS_AUTH_ARCH_QARMA5], scope);
+       bool apa3 = has_address_auth_cpucap(cpucap_ptrs[ARM64_HAS_ADDRESS_AUTH_ARCH_QARMA3], scope);
 
        return apa || apa3 || api;
 }
@@ -2186,6 +2199,11 @@ static void cpu_enable_dit(const struct arm64_cpu_capabilities *__unused)
        set_pstate_dit(1);
 }
 
+static void cpu_enable_mops(const struct arm64_cpu_capabilities *__unused)
+{
+       sysreg_clear_set(sctlr_el1, 0, SCTLR_EL1_MSCEn);
+}
+
 /* Internal helper functions to match cpu capability type */
 static bool
 cpucap_late_cpu_optional(const struct arm64_cpu_capabilities *cap)
@@ -2235,11 +2253,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
                .capability = ARM64_HAS_ECV_CNTPOFF,
                .type = ARM64_CPUCAP_SYSTEM_FEATURE,
                .matches = has_cpuid_feature,
-               .sys_reg = SYS_ID_AA64MMFR0_EL1,
-               .field_pos = ID_AA64MMFR0_EL1_ECV_SHIFT,
-               .field_width = 4,
-               .sign = FTR_UNSIGNED,
-               .min_field_value = ID_AA64MMFR0_EL1_ECV_CNTPOFF,
+               ARM64_CPUID_FIELDS(ID_AA64MMFR0_EL1, ECV, CNTPOFF)
        },
 #ifdef CONFIG_ARM64_PAN
        {
@@ -2309,6 +2323,13 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
                .type = ARM64_CPUCAP_SYSTEM_FEATURE,
                .matches = is_kvm_protected_mode,
        },
+       {
+               .desc = "HCRX_EL2 register",
+               .capability = ARM64_HAS_HCX,
+               .type = ARM64_CPUCAP_STRICT_BOOT_CPU_FEATURE,
+               .matches = has_cpuid_feature,
+               ARM64_CPUID_FIELDS(ID_AA64MMFR1_EL1, HCX, IMP)
+       },
 #endif
        {
                .desc = "Kernel page table isolation (KPTI)",
@@ -2641,6 +2662,27 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
                .cpu_enable = cpu_enable_dit,
                ARM64_CPUID_FIELDS(ID_AA64PFR0_EL1, DIT, IMP)
        },
+       {
+               .desc = "Memory Copy and Memory Set instructions",
+               .capability = ARM64_HAS_MOPS,
+               .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+               .matches = has_cpuid_feature,
+               .cpu_enable = cpu_enable_mops,
+               ARM64_CPUID_FIELDS(ID_AA64ISAR2_EL1, MOPS, IMP)
+       },
+       {
+               .capability = ARM64_HAS_TCR2,
+               .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+               .matches = has_cpuid_feature,
+               ARM64_CPUID_FIELDS(ID_AA64MMFR3_EL1, TCRX, IMP)
+       },
+       {
+               .desc = "Stage-1 Permission Indirection Extension (S1PIE)",
+               .capability = ARM64_HAS_S1PIE,
+               .type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
+               .matches = has_cpuid_feature,
+               ARM64_CPUID_FIELDS(ID_AA64MMFR3_EL1, S1PIE, IMP)
+       },
        {},
 };
 
@@ -2769,6 +2811,7 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
        HWCAP_CAP(ID_AA64ISAR2_EL1, RPRFM, IMP, CAP_HWCAP, KERNEL_HWCAP_RPRFM),
        HWCAP_CAP(ID_AA64ISAR2_EL1, RPRES, IMP, CAP_HWCAP, KERNEL_HWCAP_RPRES),
        HWCAP_CAP(ID_AA64ISAR2_EL1, WFxT, IMP, CAP_HWCAP, KERNEL_HWCAP_WFXT),
+       HWCAP_CAP(ID_AA64ISAR2_EL1, MOPS, IMP, CAP_HWCAP, KERNEL_HWCAP_MOPS),
 #ifdef CONFIG_ARM64_SME
        HWCAP_CAP(ID_AA64PFR1_EL1, SME, IMP, CAP_HWCAP, KERNEL_HWCAP_SME),
        HWCAP_CAP(ID_AA64SMFR0_EL1, FA64, IMP, CAP_HWCAP, KERNEL_HWCAP_SME_FA64),
@@ -2895,7 +2938,7 @@ static void update_cpu_capabilities(u16 scope_mask)
 
        scope_mask &= ARM64_CPUCAP_SCOPE_MASK;
        for (i = 0; i < ARM64_NCAPS; i++) {
-               caps = cpu_hwcaps_ptrs[i];
+               caps = cpucap_ptrs[i];
                if (!caps || !(caps->type & scope_mask) ||
                    cpus_have_cap(caps->capability) ||
                    !caps->matches(caps, cpucap_default_scope(caps)))
@@ -2903,10 +2946,11 @@ static void update_cpu_capabilities(u16 scope_mask)
 
                if (caps->desc)
                        pr_info("detected: %s\n", caps->desc);
-               cpus_set_cap(caps->capability);
+
+               __set_bit(caps->capability, system_cpucaps);
 
                if ((scope_mask & SCOPE_BOOT_CPU) && (caps->type & SCOPE_BOOT_CPU))
-                       set_bit(caps->capability, boot_capabilities);
+                       set_bit(caps->capability, boot_cpucaps);
        }
 }
 
@@ -2920,7 +2964,7 @@ static int cpu_enable_non_boot_scope_capabilities(void *__unused)
        u16 non_boot_scope = SCOPE_ALL & ~SCOPE_BOOT_CPU;
 
        for_each_available_cap(i) {
-               const struct arm64_cpu_capabilities *cap = cpu_hwcaps_ptrs[i];
+               const struct arm64_cpu_capabilities *cap = cpucap_ptrs[i];
 
                if (WARN_ON(!cap))
                        continue;
@@ -2950,7 +2994,7 @@ static void __init enable_cpu_capabilities(u16 scope_mask)
        for (i = 0; i < ARM64_NCAPS; i++) {
                unsigned int num;
 
-               caps = cpu_hwcaps_ptrs[i];
+               caps = cpucap_ptrs[i];
                if (!caps || !(caps->type & scope_mask))
                        continue;
                num = caps->capability;
@@ -2995,7 +3039,7 @@ static void verify_local_cpu_caps(u16 scope_mask)
        scope_mask &= ARM64_CPUCAP_SCOPE_MASK;
 
        for (i = 0; i < ARM64_NCAPS; i++) {
-               caps = cpu_hwcaps_ptrs[i];
+               caps = cpucap_ptrs[i];
                if (!caps || !(caps->type & scope_mask))
                        continue;
 
@@ -3194,7 +3238,7 @@ static void __init setup_boot_cpu_capabilities(void)
 bool this_cpu_has_cap(unsigned int n)
 {
        if (!WARN_ON(preemptible()) && n < ARM64_NCAPS) {
-               const struct arm64_cpu_capabilities *cap = cpu_hwcaps_ptrs[n];
+               const struct arm64_cpu_capabilities *cap = cpucap_ptrs[n];
 
                if (cap)
                        return cap->matches(cap, SCOPE_LOCAL_CPU);
@@ -3207,13 +3251,13 @@ EXPORT_SYMBOL_GPL(this_cpu_has_cap);
 /*
  * This helper function is used in a narrow window when,
  * - The system wide safe registers are set with all the SMP CPUs and,
- * - The SYSTEM_FEATURE cpu_hwcaps may not have been set.
+ * - The SYSTEM_FEATURE system_cpucaps may not have been set.
  * In all other cases cpus_have_{const_}cap() should be used.
  */
 static bool __maybe_unused __system_matches_cap(unsigned int n)
 {
        if (n < ARM64_NCAPS) {
-               const struct arm64_cpu_capabilities *cap = cpu_hwcaps_ptrs[n];
+               const struct arm64_cpu_capabilities *cap = cpucap_ptrs[n];
 
                if (cap)
                        return cap->matches(cap, SCOPE_SYSTEM);
index 42e19ff..d1f6859 100644 (file)
@@ -13,7 +13,7 @@
 #include <linux/of_device.h>
 #include <linux/psci.h>
 
-#ifdef CONFIG_ACPI
+#ifdef CONFIG_ACPI_PROCESSOR_IDLE
 
 #include <acpi/processor.h>
 
index eb4378c..58622dc 100644 (file)
@@ -125,6 +125,7 @@ static const char *const hwcap_str[] = {
        [KERNEL_HWCAP_SME_BI32I32]      = "smebi32i32",
        [KERNEL_HWCAP_SME_B16B16]       = "smeb16b16",
        [KERNEL_HWCAP_SME_F16F16]       = "smef16f16",
+       [KERNEL_HWCAP_MOPS]             = "mops",
 };
 
 #ifdef CONFIG_COMPAT
@@ -446,6 +447,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
        info->reg_id_aa64mmfr0 = read_cpuid(ID_AA64MMFR0_EL1);
        info->reg_id_aa64mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
        info->reg_id_aa64mmfr2 = read_cpuid(ID_AA64MMFR2_EL1);
+       info->reg_id_aa64mmfr3 = read_cpuid(ID_AA64MMFR3_EL1);
        info->reg_id_aa64pfr0 = read_cpuid(ID_AA64PFR0_EL1);
        info->reg_id_aa64pfr1 = read_cpuid(ID_AA64PFR1_EL1);
        info->reg_id_aa64zfr0 = read_cpuid(ID_AA64ZFR0_EL1);
index 3af3c01..6b2e0c3 100644 (file)
@@ -126,7 +126,7 @@ static __always_inline void __exit_to_user_mode(void)
        lockdep_hardirqs_on(CALLER_ADDR0);
 }
 
-static __always_inline void prepare_exit_to_user_mode(struct pt_regs *regs)
+static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs)
 {
        unsigned long flags;
 
@@ -135,11 +135,13 @@ static __always_inline void prepare_exit_to_user_mode(struct pt_regs *regs)
        flags = read_thread_flags();
        if (unlikely(flags & _TIF_WORK_MASK))
                do_notify_resume(regs, flags);
+
+       lockdep_sys_exit();
 }
 
 static __always_inline void exit_to_user_mode(struct pt_regs *regs)
 {
-       prepare_exit_to_user_mode(regs);
+       exit_to_user_mode_prepare(regs);
        mte_check_tfsr_exit();
        __exit_to_user_mode();
 }
@@ -611,6 +613,14 @@ static void noinstr el0_bti(struct pt_regs *regs)
        exit_to_user_mode(regs);
 }
 
+static void noinstr el0_mops(struct pt_regs *regs, unsigned long esr)
+{
+       enter_from_user_mode(regs);
+       local_daif_restore(DAIF_PROCCTX);
+       do_el0_mops(regs, esr);
+       exit_to_user_mode(regs);
+}
+
 static void noinstr el0_inv(struct pt_regs *regs, unsigned long esr)
 {
        enter_from_user_mode(regs);
@@ -688,6 +698,9 @@ asmlinkage void noinstr el0t_64_sync_handler(struct pt_regs *regs)
        case ESR_ELx_EC_BTI:
                el0_bti(regs);
                break;
+       case ESR_ELx_EC_MOPS:
+               el0_mops(regs, esr);
+               break;
        case ESR_ELx_EC_BREAKPT_LOW:
        case ESR_ELx_EC_SOFTSTP_LOW:
        case ESR_ELx_EC_WATCHPT_LOW:
index ab2a6e3..a40e5e5 100644 (file)
 .org .Lventry_start\@ + 128    // Did we overflow the ventry slot?
        .endm
 
-       .macro tramp_alias, dst, sym, tmp
-       mov_q   \dst, TRAMP_VALIAS
-       adr_l   \tmp, \sym
-       add     \dst, \dst, \tmp
-       adr_l   \tmp, .entry.tramp.text
-       sub     \dst, \dst, \tmp
+       .macro  tramp_alias, dst, sym
+       .set    .Lalias\@, TRAMP_VALIAS + \sym - .entry.tramp.text
+       movz    \dst, :abs_g2_s:.Lalias\@
+       movk    \dst, :abs_g1_nc:.Lalias\@
+       movk    \dst, :abs_g0_nc:.Lalias\@
        .endm
 
        /*
@@ -435,13 +434,14 @@ alternative_if_not ARM64_UNMAP_KERNEL_AT_EL0
        eret
 alternative_else_nop_endif
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
-       bne     4f
        msr     far_el1, x29
-       tramp_alias     x30, tramp_exit_native, x29
-       br      x30
-4:
-       tramp_alias     x30, tramp_exit_compat, x29
-       br      x30
+
+       ldr_this_cpu    x30, this_cpu_vector, x29
+       tramp_alias     x29, tramp_exit
+       msr             vbar_el1, x30           // install vector table
+       ldr             lr, [sp, #S_LR]         // restore x30
+       add             sp, sp, #PT_REGS_SIZE   // restore sp
+       br              x29
 #endif
        .else
        ldr     lr, [sp, #S_LR]
@@ -732,22 +732,6 @@ alternative_else_nop_endif
 .org 1b + 128  // Did we overflow the ventry slot?
        .endm
 
-       .macro tramp_exit, regsize = 64
-       tramp_data_read_var     x30, this_cpu_vector
-       get_this_cpu_offset x29
-       ldr     x30, [x30, x29]
-
-       msr     vbar_el1, x30
-       ldr     lr, [sp, #S_LR]
-       tramp_unmap_kernel      x29
-       .if     \regsize == 64
-       mrs     x29, far_el1
-       .endif
-       add     sp, sp, #PT_REGS_SIZE           // restore sp
-       eret
-       sb
-       .endm
-
        .macro  generate_tramp_vector,  kpti, bhb
 .Lvector_start\@:
        .space  0x400
@@ -768,7 +752,7 @@ alternative_else_nop_endif
  */
        .pushsection ".entry.tramp.text", "ax"
        .align  11
-SYM_CODE_START_NOALIGN(tramp_vectors)
+SYM_CODE_START_LOCAL_NOALIGN(tramp_vectors)
 #ifdef CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY
        generate_tramp_vector   kpti=1, bhb=BHB_MITIGATION_LOOP
        generate_tramp_vector   kpti=1, bhb=BHB_MITIGATION_FW
@@ -777,13 +761,12 @@ SYM_CODE_START_NOALIGN(tramp_vectors)
        generate_tramp_vector   kpti=1, bhb=BHB_MITIGATION_NONE
 SYM_CODE_END(tramp_vectors)
 
-SYM_CODE_START(tramp_exit_native)
-       tramp_exit
-SYM_CODE_END(tramp_exit_native)
-
-SYM_CODE_START(tramp_exit_compat)
-       tramp_exit      32
-SYM_CODE_END(tramp_exit_compat)
+SYM_CODE_START_LOCAL(tramp_exit)
+       tramp_unmap_kernel      x29
+       mrs             x29, far_el1            // restore x29
+       eret
+       sb
+SYM_CODE_END(tramp_exit)
        .popsection                             // .entry.tramp.text
 #endif /* CONFIG_UNMAP_KERNEL_AT_EL0 */
 
@@ -1077,7 +1060,7 @@ alternative_if_not ARM64_UNMAP_KERNEL_AT_EL0
 alternative_else_nop_endif
 
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
-       tramp_alias     dst=x5, sym=__sdei_asm_exit_trampoline, tmp=x3
+       tramp_alias     dst=x5, sym=__sdei_asm_exit_trampoline
        br      x5
 #endif
 SYM_CODE_END(__sdei_asm_handler)
index 2fbafa5..7a1aeb9 100644 (file)
@@ -1649,6 +1649,7 @@ void fpsimd_flush_thread(void)
 
                fpsimd_flush_thread_vl(ARM64_VEC_SME);
                current->thread.svcr = 0;
+               sme_smstop();
        }
 
        current->thread.fp_type = FP_STATE_FPSIMD;
index 432626c..a650f5e 100644 (file)
@@ -197,7 +197,7 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
 
 static struct plt_entry *get_ftrace_plt(struct module *mod)
 {
-#ifdef CONFIG_ARM64_MODULE_PLTS
+#ifdef CONFIG_MODULES
        struct plt_entry *plt = mod->arch.ftrace_trampolines;
 
        return &plt[FTRACE_PLT_IDX];
@@ -249,7 +249,7 @@ static bool ftrace_find_callable_addr(struct dyn_ftrace *rec,
         * must use a PLT to reach it. We can only place PLTs for modules, and
         * only when module PLT support is built-in.
         */
-       if (!IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
+       if (!IS_ENABLED(CONFIG_MODULES))
                return false;
 
        /*
@@ -431,10 +431,8 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
         *
         * Note: 'mod' is only set at module load time.
         */
-       if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_ARGS) &&
-           IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) && mod) {
+       if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_ARGS) && mod)
                return aarch64_insn_patch_text_nosync((void *)pc, new);
-       }
 
        if (!ftrace_find_callable_addr(rec, mod, &addr))
                return -EINVAL;
index e92caeb..0f5a30f 100644 (file)
@@ -382,7 +382,7 @@ SYM_FUNC_START_LOCAL(create_idmap)
        adrp    x0, init_idmap_pg_dir
        adrp    x3, _text
        adrp    x6, _end + MAX_FDT_SIZE + SWAPPER_BLOCK_SIZE
-       mov     x7, SWAPPER_RX_MMUFLAGS
+       mov_q   x7, SWAPPER_RX_MMUFLAGS
 
        map_memory x0, x1, x3, x6, x7, x3, IDMAP_PGD_ORDER, x10, x11, x12, x13, x14, EXTRA_SHIFT
 
@@ -391,7 +391,7 @@ SYM_FUNC_START_LOCAL(create_idmap)
        adrp    x2, init_pg_dir
        adrp    x3, init_pg_end
        bic     x4, x2, #SWAPPER_BLOCK_SIZE - 1
-       mov     x5, SWAPPER_RW_MMUFLAGS
+       mov_q   x5, SWAPPER_RW_MMUFLAGS
        mov     x6, #SWAPPER_BLOCK_SHIFT
        bl      remap_region
 
@@ -402,7 +402,7 @@ SYM_FUNC_START_LOCAL(create_idmap)
        bfi     x22, x21, #0, #SWAPPER_BLOCK_SHIFT              // remapped FDT address
        add     x3, x2, #MAX_FDT_SIZE + SWAPPER_BLOCK_SIZE
        bic     x4, x21, #SWAPPER_BLOCK_SIZE - 1
-       mov     x5, SWAPPER_RW_MMUFLAGS
+       mov_q   x5, SWAPPER_RW_MMUFLAGS
        mov     x6, #SWAPPER_BLOCK_SHIFT
        bl      remap_region
 
@@ -430,7 +430,7 @@ SYM_FUNC_START_LOCAL(create_kernel_mapping)
        adrp    x3, _text                       // runtime __pa(_text)
        sub     x6, x6, x3                      // _end - _text
        add     x6, x6, x5                      // runtime __va(_end)
-       mov     x7, SWAPPER_RW_MMUFLAGS
+       mov_q   x7, SWAPPER_RW_MMUFLAGS
 
        map_memory x0, x1, x5, x6, x7, x3, (VA_BITS - PGDIR_SHIFT), x10, x11, x12, x13, x14
 
index 788597a..02870be 100644 (file)
@@ -99,7 +99,6 @@ int pfn_is_nosave(unsigned long pfn)
 
 void notrace save_processor_state(void)
 {
-       WARN_ON(num_online_cpus() != 1);
 }
 
 void notrace restore_processor_state(void)
index b29a311..db2a186 100644 (file)
@@ -973,14 +973,6 @@ static int hw_breakpoint_reset(unsigned int cpu)
        return 0;
 }
 
-#ifdef CONFIG_CPU_PM
-extern void cpu_suspend_set_dbg_restorer(int (*hw_bp_restore)(unsigned int));
-#else
-static inline void cpu_suspend_set_dbg_restorer(int (*hw_bp_restore)(unsigned int))
-{
-}
-#endif
-
 /*
  * One-time initialisation.
  */
index 9439240..d63de19 100644 (file)
@@ -119,6 +119,24 @@ SYM_CODE_START_LOCAL(__finalise_el2)
        msr     ttbr1_el1, x0
        mrs_s   x0, SYS_MAIR_EL12
        msr     mair_el1, x0
+       mrs     x1, REG_ID_AA64MMFR3_EL1
+       ubfx    x1, x1, #ID_AA64MMFR3_EL1_TCRX_SHIFT, #4
+       cbz     x1, .Lskip_tcr2
+       mrs     x0, REG_TCR2_EL12
+       msr     REG_TCR2_EL1, x0
+
+       // Transfer permission indirection state
+       mrs     x1, REG_ID_AA64MMFR3_EL1
+       ubfx    x1, x1, #ID_AA64MMFR3_EL1_S1PIE_SHIFT, #4
+       cbz     x1, .Lskip_indirection
+       mrs     x0, REG_PIRE0_EL12
+       msr     REG_PIRE0_EL1, x0
+       mrs     x0, REG_PIR_EL12
+       msr     REG_PIR_EL1, x0
+
+.Lskip_indirection:
+.Lskip_tcr2:
+
        isb
 
        // Hack the exception return to stay at EL2
index 370ab84..8439248 100644 (file)
@@ -123,6 +123,7 @@ static const struct ftr_set_desc isar2 __initconst = {
        .fields         = {
                FIELD("gpa3", ID_AA64ISAR2_EL1_GPA3_SHIFT, NULL),
                FIELD("apa3", ID_AA64ISAR2_EL1_APA3_SHIFT, NULL),
+               FIELD("mops", ID_AA64ISAR2_EL1_MOPS_SHIFT, NULL),
                {}
        },
 };
@@ -174,6 +175,7 @@ static const struct {
          "id_aa64isar1.gpi=0 id_aa64isar1.gpa=0 "
          "id_aa64isar1.api=0 id_aa64isar1.apa=0 "
          "id_aa64isar2.gpa3=0 id_aa64isar2.apa3=0"        },
+       { "arm64.nomops",               "id_aa64isar2.mops=0" },
        { "arm64.nomte",                "id_aa64pfr1.mte=0" },
        { "nokaslr",                    "kaslr.disabled=1" },
 };
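
With the alias above in place, MOPS can be suppressed from the early command line like the existing overrides; an illustrative boot line follows (the other parameters are placeholders, not from this patch):

  console=ttyAMA0 root=/dev/vda arm64.nomops

This expands to id_aa64isar2.mops=0, forcing the sanitised ID_AA64ISAR2_EL1.MOPS field to zero so neither the MOPS cpucap nor the "mops" hwcap is exposed.
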
index e7477f2..17f96a1 100644 (file)
@@ -4,90 +4,35 @@
  */
 
 #include <linux/cache.h>
-#include <linux/crc32.h>
 #include <linux/init.h>
-#include <linux/libfdt.h>
-#include <linux/mm_types.h>
-#include <linux/sched.h>
-#include <linux/types.h>
-#include <linux/pgtable.h>
-#include <linux/random.h>
+#include <linux/printk.h>
 
-#include <asm/fixmap.h>
-#include <asm/kernel-pgtable.h>
+#include <asm/cpufeature.h>
 #include <asm/memory.h>
-#include <asm/mmu.h>
-#include <asm/sections.h>
-#include <asm/setup.h>
 
-u64 __ro_after_init module_alloc_base;
 u16 __initdata memstart_offset_seed;
 
 struct arm64_ftr_override kaslr_feature_override __initdata;
 
-static int __init kaslr_init(void)
-{
-       u64 module_range;
-       u32 seed;
-
-       /*
-        * Set a reasonable default for module_alloc_base in case
-        * we end up running with module randomization disabled.
-        */
-       module_alloc_base = (u64)_etext - MODULES_VSIZE;
+bool __ro_after_init __kaslr_is_enabled = false;
 
+void __init kaslr_init(void)
+{
        if (kaslr_feature_override.val & kaslr_feature_override.mask & 0xf) {
                pr_info("KASLR disabled on command line\n");
-               return 0;
-       }
-
-       if (!kaslr_enabled()) {
-               pr_warn("KASLR disabled due to lack of seed\n");
-               return 0;
+               return;
        }
 
-       pr_info("KASLR enabled\n");
-
        /*
-        * KASAN without KASAN_VMALLOC does not expect the module region to
-        * intersect the vmalloc region, since shadow memory is allocated for
-        * each module at load time, whereas the vmalloc region will already be
-        * shadowed by KASAN zero pages.
+        * The KASLR offset modulo MIN_KIMG_ALIGN is taken from the physical
+        * placement of the image rather than from the seed, so a displacement
+        * of less than MIN_KIMG_ALIGN means that no seed was provided.
         */
-       BUILD_BUG_ON((IS_ENABLED(CONFIG_KASAN_GENERIC) ||
-                     IS_ENABLED(CONFIG_KASAN_SW_TAGS)) &&
-                    !IS_ENABLED(CONFIG_KASAN_VMALLOC));
-
-       seed = get_random_u32();
-
-       if (IS_ENABLED(CONFIG_RANDOMIZE_MODULE_REGION_FULL)) {
-               /*
-                * Randomize the module region over a 2 GB window covering the
-                * kernel. This reduces the risk of modules leaking information
-                * about the address of the kernel itself, but results in
-                * branches between modules and the core kernel that are
-                * resolved via PLTs. (Branches between modules will be
-                * resolved normally.)
-                */
-               module_range = SZ_2G - (u64)(_end - _stext);
-               module_alloc_base = max((u64)_end - SZ_2G, (u64)MODULES_VADDR);
-       } else {
-               /*
-                * Randomize the module region by setting module_alloc_base to
-                * a PAGE_SIZE multiple in the range [_etext - MODULES_VSIZE,
-                * _stext) . This guarantees that the resulting region still
-                * covers [_stext, _etext], and that all relative branches can
-                * be resolved without veneers unless this region is exhausted
-                * and we fall back to a larger 2GB window in module_alloc()
-                * when ARM64_MODULE_PLTS is enabled.
-                */
-               module_range = MODULES_VSIZE - (u64)(_etext - _stext);
+       if (kaslr_offset() < MIN_KIMG_ALIGN) {
+               pr_warn("KASLR disabled due to lack of seed\n");
+               return;
        }
 
-       /* use the lower 21 bits to randomize the base of the module region */
-       module_alloc_base += (module_range * (seed & ((1 << 21) - 1))) >> 21;
-       module_alloc_base &= PAGE_MASK;
-
-       return 0;
+       pr_info("KASLR enabled\n");
+       __kaslr_is_enabled = true;
 }
-subsys_initcall(kaslr_init)
index 692e9d2..af046ce 100644 (file)
@@ -10,7 +10,7 @@
  * aarch32_setup_additional_pages() and are provided for compatibility
  * reasons with 32 bit (aarch32) applications that need them.
  *
- * See Documentation/arm/kernel_user_helpers.rst for formal definitions.
+ * See Documentation/arch/arm/kernel_user_helpers.rst for formal definitions.
  */
 
 #include <asm/unistd.h>
index 543493b..ad02058 100644 (file)
@@ -7,6 +7,7 @@
 #include <linux/ftrace.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/moduleloader.h>
 #include <linux/sort.h>
 
 static struct plt_entry __get_adrp_add_pair(u64 dst, u64 pc,
index 5af4975..dd85129 100644 (file)
@@ -7,6 +7,8 @@
  * Author: Will Deacon <will.deacon@arm.com>
  */
 
+#define pr_fmt(fmt) "Modules: " fmt
+
 #include <linux/bitops.h>
 #include <linux/elf.h>
 #include <linux/ftrace.h>
 #include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/moduleloader.h>
+#include <linux/random.h>
 #include <linux/scs.h>
 #include <linux/vmalloc.h>
+
 #include <asm/alternative.h>
 #include <asm/insn.h>
 #include <asm/scs.h>
 #include <asm/sections.h>
 
+static u64 module_direct_base __ro_after_init = 0;
+static u64 module_plt_base __ro_after_init = 0;
+
+/*
+ * Choose a random page-aligned base address for a window of 'size' bytes which
+ * entirely contains the interval [start, end - 1].
+ */
+static u64 __init random_bounding_box(u64 size, u64 start, u64 end)
+{
+       u64 max_pgoff, pgoff;
+
+       if ((end - start) >= size)
+               return 0;
+
+       max_pgoff = (size - (end - start)) / PAGE_SIZE;
+       pgoff = get_random_u32_inclusive(0, max_pgoff);
+
+       return start - pgoff * PAGE_SIZE;
+}
+
+/*
+ * Modules may directly reference data and text anywhere within the kernel
+ * image and other modules. References using PREL32 relocations have a +/-2G
+ * range, and so we need to ensure that the entire kernel image and all modules
+ * fall within a 2G window such that these are always within range.
+ *
+ * Modules may directly branch to functions and code within the kernel text,
+ * and to functions and code within other modules. These branches will use
+ * CALL26/JUMP26 relocations with a +/-128M range. Without PLTs, we must ensure
+ * that the entire kernel text and all module text falls within a 128M window
+ * such that these are always within range. With PLTs, we can expand this to a
+ * 2G window.
+ *
+ * We chose the 128M region to surround the entire kernel image (rather than
+ * just the text) as using the same bounds for the 128M and 2G regions ensures
+ * by construction that we never select a 128M region that is not a subset of
+ * the 2G region. For very large and unusual kernel configurations this means
+ * we may fall back to PLTs where they could have been avoided, but this keeps
+ * the logic significantly simpler.
+ */
+static int __init module_init_limits(void)
+{
+       u64 kernel_end = (u64)_end;
+       u64 kernel_start = (u64)_text;
+       u64 kernel_size = kernel_end - kernel_start;
+
+       /*
+        * The default modules region is placed immediately below the kernel
+        * image, and is large enough to use the full 2G relocation range.
+        */
+       BUILD_BUG_ON(KIMAGE_VADDR != MODULES_END);
+       BUILD_BUG_ON(MODULES_VSIZE < SZ_2G);
+
+       if (!kaslr_enabled()) {
+               if (kernel_size < SZ_128M)
+                       module_direct_base = kernel_end - SZ_128M;
+               if (kernel_size < SZ_2G)
+                       module_plt_base = kernel_end - SZ_2G;
+       } else {
+               u64 min = kernel_start;
+               u64 max = kernel_end;
+
+               if (IS_ENABLED(CONFIG_RANDOMIZE_MODULE_REGION_FULL)) {
+                       pr_info("2G module region forced by RANDOMIZE_MODULE_REGION_FULL\n");
+               } else {
+                       module_direct_base = random_bounding_box(SZ_128M, min, max);
+                       if (module_direct_base) {
+                               min = module_direct_base;
+                               max = module_direct_base + SZ_128M;
+                       }
+               }
+
+               module_plt_base = random_bounding_box(SZ_2G, min, max);
+       }
+
+       pr_info("%llu pages in range for non-PLT usage",
+               module_direct_base ? (SZ_128M - kernel_size) / PAGE_SIZE : 0);
+       pr_info("%llu pages in range for PLT usage",
+               module_plt_base ? (SZ_2G - kernel_size) / PAGE_SIZE : 0);
+
+       return 0;
+}
+subsys_initcall(module_init_limits);
+
 void *module_alloc(unsigned long size)
 {
-       u64 module_alloc_end = module_alloc_base + MODULES_VSIZE;
-       gfp_t gfp_mask = GFP_KERNEL;
-       void *p;
-
-       /* Silence the initial allocation */
-       if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
-               gfp_mask |= __GFP_NOWARN;
-
-       if (IS_ENABLED(CONFIG_KASAN_GENERIC) ||
-           IS_ENABLED(CONFIG_KASAN_SW_TAGS))
-               /* don't exceed the static module region - see below */
-               module_alloc_end = MODULES_END;
-
-       p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
-                               module_alloc_end, gfp_mask, PAGE_KERNEL, VM_DEFER_KMEMLEAK,
-                               NUMA_NO_NODE, __builtin_return_address(0));
-
-       if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
-           (IS_ENABLED(CONFIG_KASAN_VMALLOC) ||
-            (!IS_ENABLED(CONFIG_KASAN_GENERIC) &&
-             !IS_ENABLED(CONFIG_KASAN_SW_TAGS))))
-               /*
-                * KASAN without KASAN_VMALLOC can only deal with module
-                * allocations being served from the reserved module region,
-                * since the remainder of the vmalloc region is already
-                * backed by zero shadow pages, and punching holes into it
-                * is non-trivial. Since the module region is not randomized
-                * when KASAN is enabled without KASAN_VMALLOC, it is even
-                * less likely that the module region gets exhausted, so we
-                * can simply omit this fallback in that case.
-                */
-               p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
-                               module_alloc_base + SZ_2G, GFP_KERNEL,
-                               PAGE_KERNEL, 0, NUMA_NO_NODE,
-                               __builtin_return_address(0));
+       void *p = NULL;
+
+       /*
+        * Where possible, prefer to allocate within direct branch range of the
+        * kernel such that no PLTs are necessary.
+        */
+       if (module_direct_base) {
+               p = __vmalloc_node_range(size, MODULE_ALIGN,
+                                        module_direct_base,
+                                        module_direct_base + SZ_128M,
+                                        GFP_KERNEL | __GFP_NOWARN,
+                                        PAGE_KERNEL, 0, NUMA_NO_NODE,
+                                        __builtin_return_address(0));
+       }
 
-       if (p && (kasan_alloc_module_shadow(p, size, gfp_mask) < 0)) {
+       if (!p && module_plt_base) {
+               p = __vmalloc_node_range(size, MODULE_ALIGN,
+                                        module_plt_base,
+                                        module_plt_base + SZ_2G,
+                                        GFP_KERNEL | __GFP_NOWARN,
+                                        PAGE_KERNEL, 0, NUMA_NO_NODE,
+                                        __builtin_return_address(0));
+       }
+
+       if (!p) {
+               pr_warn_ratelimited("%s: unable to allocate memory\n",
+                                   __func__);
+       }
+
+       if (p && (kasan_alloc_module_shadow(p, size, GFP_KERNEL) < 0)) {
                vfree(p);
                return NULL;
        }
@@ -448,9 +529,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
                case R_AARCH64_CALL26:
                        ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 2, 26,
                                             AARCH64_INSN_IMM_26);
-
-                       if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
-                           ovf == -ERANGE) {
+                       if (ovf == -ERANGE) {
                                val = module_emit_plt_entry(me, sechdrs, loc, &rel[i], sym);
                                if (!val)
                                        return -ENOEXEC;
@@ -487,7 +566,7 @@ static int module_init_ftrace_plt(const Elf_Ehdr *hdr,
                                  const Elf_Shdr *sechdrs,
                                  struct module *mod)
 {
-#if defined(CONFIG_ARM64_MODULE_PLTS) && defined(CONFIG_DYNAMIC_FTRACE)
+#if defined(CONFIG_DYNAMIC_FTRACE)
        const Elf_Shdr *s;
        struct plt_entry *plts;
 
index f5bcb0d..7e89968 100644 (file)
@@ -66,13 +66,10 @@ void mte_sync_tags(pte_t old_pte, pte_t pte)
                return;
 
        /* if PG_mte_tagged is set, tags have already been initialised */
-       for (i = 0; i < nr_pages; i++, page++) {
-               if (!page_mte_tagged(page)) {
+       for (i = 0; i < nr_pages; i++, page++)
+               if (!page_mte_tagged(page))
                        mte_sync_page_tags(page, old_pte, check_swap,
                                           pte_is_tagged);
-                       set_page_mte_tagged(page);
-               }
-       }
 
        /* ensure the tags are visible before the PTE is set */
        smp_wmb();
index b8ec7b3..417a8a8 100644 (file)
@@ -296,6 +296,8 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
 
        *cmdline_p = boot_command_line;
 
+       kaslr_init();
+
        /*
         * If we know now that we are going to need KPTI then use non-global
         * mappings from the start, avoiding the cost of rewriting
index 2cfc810..e304f7e 100644 (file)
@@ -23,6 +23,7 @@
 #include <asm/daifflags.h>
 #include <asm/debug-monitors.h>
 #include <asm/elf.h>
+#include <asm/exception.h>
 #include <asm/cacheflush.h>
 #include <asm/ucontext.h>
 #include <asm/unistd.h>
@@ -398,7 +399,7 @@ static int restore_tpidr2_context(struct user_ctxs *user)
 
        __get_user_error(tpidr2_el0, &user->tpidr2->tpidr2, err);
        if (!err)
-               current->thread.tpidr2_el0 = tpidr2_el0;
+               write_sysreg_s(tpidr2_el0, SYS_TPIDR2_EL0);
 
        return err;
 }
index d00d4cb..edd6389 100644 (file)
@@ -332,17 +332,13 @@ static int op_cpu_kill(unsigned int cpu)
 }
 
 /*
- * called on the thread which is asking for a CPU to be shutdown -
- * waits until shutdown has completed, or it is timed out.
+ * Called on the thread which is asking for a CPU to be shut down, after the
+ * shutdown has completed.
  */
-void __cpu_die(unsigned int cpu)
+void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
 {
        int err;
 
-       if (!cpu_wait_death(cpu, 5)) {
-               pr_crit("CPU%u: cpu didn't die\n", cpu);
-               return;
-       }
        pr_debug("CPU%u: shutdown\n", cpu);
 
        /*
@@ -369,8 +365,8 @@ void __noreturn cpu_die(void)
 
        local_daif_mask();
 
-       /* Tell __cpu_die() that this CPU is now safe to dispose of */
-       (void)cpu_report_death();
+       /* Tell cpuhp_bp_sync_dead() that this CPU is now safe to dispose of */
+       cpuhp_ap_report_dead();
 
        /*
         * Actually shutdown the CPU. This must never fail. The specific hotplug
index da84cf8..5a668d7 100644 (file)
@@ -147,11 +147,9 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
         * exit regardless, as the old entry assembly did.
         */
        if (!has_syscall_work(flags) && !IS_ENABLED(CONFIG_DEBUG_RSEQ)) {
-               local_daif_mask();
                flags = read_thread_flags();
                if (!has_syscall_work(flags) && !(flags & _TIF_SINGLESTEP))
                        return;
-               local_daif_restore(DAIF_PROCCTX);
        }
 
 trace_exit:
index 4bb1b8f..794a2dd 100644 (file)
@@ -514,6 +514,63 @@ void do_el1_fpac(struct pt_regs *regs, unsigned long esr)
        die("Oops - FPAC", regs, esr);
 }
 
+void do_el0_mops(struct pt_regs *regs, unsigned long esr)
+{
+       bool wrong_option = esr & ESR_ELx_MOPS_ISS_WRONG_OPTION;
+       bool option_a = esr & ESR_ELx_MOPS_ISS_OPTION_A;
+       int dstreg = ESR_ELx_MOPS_ISS_DESTREG(esr);
+       int srcreg = ESR_ELx_MOPS_ISS_SRCREG(esr);
+       int sizereg = ESR_ELx_MOPS_ISS_SIZEREG(esr);
+       unsigned long dst, src, size;
+
+       dst = pt_regs_read_reg(regs, dstreg);
+       src = pt_regs_read_reg(regs, srcreg);
+       size = pt_regs_read_reg(regs, sizereg);
+
+       /*
+        * Put the registers back in the original format suitable for a
+        * prologue instruction, using the generic return routine from the
+        * Arm ARM (DDI 0487I.a) rules CNTMJ and MWFQH.
+        */
+       if (esr & ESR_ELx_MOPS_ISS_MEM_INST) {
+               /* SET* instruction */
+               if (option_a ^ wrong_option) {
+                       /* Format is from Option A; forward set */
+                       pt_regs_write_reg(regs, dstreg, dst + size);
+                       pt_regs_write_reg(regs, sizereg, -size);
+               }
+       } else {
+               /* CPY* instruction */
+               if (!(option_a ^ wrong_option)) {
+                       /* Format is from Option B */
+                       if (regs->pstate & PSR_N_BIT) {
+                               /* Backward copy */
+                               pt_regs_write_reg(regs, dstreg, dst - size);
+                               pt_regs_write_reg(regs, srcreg, src - size);
+                       }
+               } else {
+                       /* Format is from Option A */
+                       if (size & BIT(63)) {
+                               /* Forward copy */
+                               pt_regs_write_reg(regs, dstreg, dst + size);
+                               pt_regs_write_reg(regs, srcreg, src + size);
+                               pt_regs_write_reg(regs, sizereg, -size);
+                       }
+               }
+       }
+
+       if (esr & ESR_ELx_MOPS_ISS_FROM_EPILOGUE)
+               regs->pc -= 8;
+       else
+               regs->pc -= 4;
+
+       /*
+        * If single stepping then finish the step before executing the
+        * prologue instruction.
+        */
+       user_fastforward_single_step(current);
+}
+
 #define __user_cache_maint(insn, address, res)                 \
        if (address >= TASK_SIZE_MAX) {                         \
                res = -EFAULT;                                  \
@@ -824,6 +881,7 @@ static const char *esr_class_str[] = {
        [ESR_ELx_EC_DABT_LOW]           = "DABT (lower EL)",
        [ESR_ELx_EC_DABT_CUR]           = "DABT (current EL)",
        [ESR_ELx_EC_SP_ALIGN]           = "SP Alignment",
+       [ESR_ELx_EC_MOPS]               = "MOPS",
        [ESR_ELx_EC_FP_EXC32]           = "FP (AArch32)",
        [ESR_ELx_EC_FP_EXC64]           = "FP (AArch64)",
        [ESR_ELx_EC_SERROR]             = "SError",
@@ -947,7 +1005,7 @@ void do_serror(struct pt_regs *regs, unsigned long esr)
 }
 
 /* GENERIC_BUG traps */
-
+#ifdef CONFIG_GENERIC_BUG
 int is_valid_bugaddr(unsigned long addr)
 {
        /*
@@ -959,6 +1017,7 @@ int is_valid_bugaddr(unsigned long addr)
         */
        return 1;
 }
+#endif
 
 static int bug_handler(struct pt_regs *regs, unsigned long esr)
 {
index 0119dc9..d9e1355 100644 (file)
@@ -288,7 +288,7 @@ static int aarch32_alloc_kuser_vdso_page(void)
 
        memcpy((void *)(vdso_page + 0x1000 - kuser_sz), __kuser_helper_start,
               kuser_sz);
-       aarch32_vectors_page = virt_to_page(vdso_page);
+       aarch32_vectors_page = virt_to_page((void *)vdso_page);
        return 0;
 }
 
index 55f80fb..8725291 100644 (file)
@@ -333,7 +333,7 @@ void kvm_arch_vcpu_load_debug_state_flags(struct kvm_vcpu *vcpu)
 
        /* Check if we have TRBE implemented and available at the host */
        if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceBuffer_SHIFT) &&
-           !(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_PROG))
+           !(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_EL1_P))
                vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
 }
 
index 1279949..4c9dcd8 100644 (file)
@@ -81,26 +81,34 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
 
        fpsimd_kvm_prepare();
 
+       /*
+        * We will check TIF_FOREIGN_FPSTATE just before entering the
+        * guest in kvm_arch_vcpu_ctxflush_fp() and override this to
+        * FP_STATE_FREE if the flag is set.
+        */
        vcpu->arch.fp_state = FP_STATE_HOST_OWNED;
 
        vcpu_clear_flag(vcpu, HOST_SVE_ENABLED);
        if (read_sysreg(cpacr_el1) & CPACR_EL1_ZEN_EL0EN)
                vcpu_set_flag(vcpu, HOST_SVE_ENABLED);
 
-       /*
-        * We don't currently support SME guests but if we leave
-        * things in streaming mode then when the guest starts running
-        * FPSIMD or SVE code it may generate SME traps so as a
-        * special case if we are in streaming mode we force the host
-        * state to be saved now and exit streaming mode so that we
-        * don't have to handle any SME traps for valid guest
-        * operations. Do this for ZA as well for now for simplicity.
-        */
        if (system_supports_sme()) {
                vcpu_clear_flag(vcpu, HOST_SME_ENABLED);
                if (read_sysreg(cpacr_el1) & CPACR_EL1_SMEN_EL0EN)
                        vcpu_set_flag(vcpu, HOST_SME_ENABLED);
 
+               /*
+                * If PSTATE.SM is enabled then save any pending FP
+                * state and disable PSTATE.SM. If we leave PSTATE.SM
+                * enabled and the guest does not enable SME via
+                * CPACR_EL1.SMEN then operations that should be valid
+                * may generate SME traps from EL1 to EL1 which we
+                * can't intercept and which would confuse the guest.
+                *
+                * Do the same for PSTATE.ZA in the case where there
+                * is state in the registers which has not already
+                * been saved; this is very unlikely to happen.
+                */
                if (read_sysreg_s(SYS_SVCR) & (SVCR_SM_MASK | SVCR_ZA_MASK)) {
                        vcpu->arch.fp_state = FP_STATE_FREE;
                        fpsimd_save_and_flush_cpu_state();
index c41166f..2f6e0b3 100644 (file)
@@ -82,8 +82,14 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
         * EL1 instead of being trapped to EL2.
         */
        if (kvm_arm_support_pmu_v3()) {
+               struct kvm_cpu_context *hctxt;
+
                write_sysreg(0, pmselr_el0);
+
+               hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+               ctxt_sys_reg(hctxt, PMUSERENR_EL0) = read_sysreg(pmuserenr_el0);
                write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
+               vcpu_set_flag(vcpu, PMUSERENR_ON_CPU);
        }
 
        vcpu->arch.mdcr_el2_host = read_sysreg(mdcr_el2);
@@ -106,8 +112,13 @@ static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
        write_sysreg(vcpu->arch.mdcr_el2_host, mdcr_el2);
 
        write_sysreg(0, hstr_el2);
-       if (kvm_arm_support_pmu_v3())
-               write_sysreg(0, pmuserenr_el0);
+       if (kvm_arm_support_pmu_v3()) {
+               struct kvm_cpu_context *hctxt;
+
+               hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+               write_sysreg(ctxt_sys_reg(hctxt, PMUSERENR_EL0), pmuserenr_el0);
+               vcpu_clear_flag(vcpu, PMUSERENR_ON_CPU);
+       }
 
        if (cpus_have_final_cap(ARM64_SME)) {
                sysreg_clear_set_s(SYS_HFGRTR_EL2, 0,
@@ -130,6 +141,9 @@ static inline void ___activate_traps(struct kvm_vcpu *vcpu)
 
        if (cpus_have_final_cap(ARM64_HAS_RAS_EXTN) && (hcr & HCR_VSE))
                write_sysreg_s(vcpu->arch.vsesr_el2, SYS_VSESR_EL2);
+
+       if (cpus_have_final_cap(ARM64_HAS_HCX))
+               write_sysreg_s(HCRX_GUEST_FLAGS, SYS_HCRX_EL2);
 }
 
 static inline void ___deactivate_traps(struct kvm_vcpu *vcpu)
@@ -144,6 +158,9 @@ static inline void ___deactivate_traps(struct kvm_vcpu *vcpu)
                vcpu->arch.hcr_el2 &= ~HCR_VSE;
                vcpu->arch.hcr_el2 |= read_sysreg(hcr_el2) & HCR_VSE;
        }
+
+       if (cpus_have_final_cap(ARM64_HAS_HCX))
+               write_sysreg_s(HCRX_HOST_FLAGS, SYS_HCRX_EL2);
 }
 
 static inline bool __populate_fault_info(struct kvm_vcpu *vcpu)
@@ -177,9 +194,17 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
        sve_guest = vcpu_has_sve(vcpu);
        esr_ec = kvm_vcpu_trap_get_class(vcpu);
 
-       /* Don't handle SVE traps for non-SVE vcpus here: */
-       if (!sve_guest && esr_ec != ESR_ELx_EC_FP_ASIMD)
+       /* Only handle traps the vCPU can support here: */
+       switch (esr_ec) {
+       case ESR_ELx_EC_FP_ASIMD:
+               break;
+       case ESR_ELx_EC_SVE:
+               if (!sve_guest)
+                       return false;
+               break;
+       default:
                return false;
+       }
 
        /* Valid trap.  Switch the context: */
 
@@ -404,17 +429,21 @@ static bool kvm_hyp_handle_cp15_32(struct kvm_vcpu *vcpu, u64 *exit_code)
        return false;
 }
 
-static bool kvm_hyp_handle_iabt_low(struct kvm_vcpu *vcpu, u64 *exit_code)
+static bool kvm_hyp_handle_memory_fault(struct kvm_vcpu *vcpu, u64 *exit_code)
 {
        if (!__populate_fault_info(vcpu))
                return true;
 
        return false;
 }
+static bool kvm_hyp_handle_iabt_low(struct kvm_vcpu *vcpu, u64 *exit_code)
+       __alias(kvm_hyp_handle_memory_fault);
+static bool kvm_hyp_handle_watchpt_low(struct kvm_vcpu *vcpu, u64 *exit_code)
+       __alias(kvm_hyp_handle_memory_fault);
 
 static bool kvm_hyp_handle_dabt_low(struct kvm_vcpu *vcpu, u64 *exit_code)
 {
-       if (!__populate_fault_info(vcpu))
+       if (kvm_hyp_handle_memory_fault(vcpu, exit_code))
                return true;
 
        if (static_branch_unlikely(&vgic_v2_cpuif_trap)) {
index 699ea1f..bb6b571 100644 (file)
@@ -44,6 +44,8 @@ static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
        ctxt_sys_reg(ctxt, TTBR0_EL1)   = read_sysreg_el1(SYS_TTBR0);
        ctxt_sys_reg(ctxt, TTBR1_EL1)   = read_sysreg_el1(SYS_TTBR1);
        ctxt_sys_reg(ctxt, TCR_EL1)     = read_sysreg_el1(SYS_TCR);
+       if (cpus_have_final_cap(ARM64_HAS_TCR2))
+               ctxt_sys_reg(ctxt, TCR2_EL1)    = read_sysreg_el1(SYS_TCR2);
        ctxt_sys_reg(ctxt, ESR_EL1)     = read_sysreg_el1(SYS_ESR);
        ctxt_sys_reg(ctxt, AFSR0_EL1)   = read_sysreg_el1(SYS_AFSR0);
        ctxt_sys_reg(ctxt, AFSR1_EL1)   = read_sysreg_el1(SYS_AFSR1);
@@ -53,6 +55,10 @@ static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
        ctxt_sys_reg(ctxt, CONTEXTIDR_EL1) = read_sysreg_el1(SYS_CONTEXTIDR);
        ctxt_sys_reg(ctxt, AMAIR_EL1)   = read_sysreg_el1(SYS_AMAIR);
        ctxt_sys_reg(ctxt, CNTKCTL_EL1) = read_sysreg_el1(SYS_CNTKCTL);
+       if (cpus_have_final_cap(ARM64_HAS_S1PIE)) {
+               ctxt_sys_reg(ctxt, PIR_EL1)     = read_sysreg_el1(SYS_PIR);
+               ctxt_sys_reg(ctxt, PIRE0_EL1)   = read_sysreg_el1(SYS_PIRE0);
+       }
        ctxt_sys_reg(ctxt, PAR_EL1)     = read_sysreg_par();
        ctxt_sys_reg(ctxt, TPIDR_EL1)   = read_sysreg(tpidr_el1);
 
@@ -114,6 +120,8 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
        write_sysreg_el1(ctxt_sys_reg(ctxt, CPACR_EL1), SYS_CPACR);
        write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL1), SYS_TTBR0);
        write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL1), SYS_TTBR1);
+       if (cpus_have_final_cap(ARM64_HAS_TCR2))
+               write_sysreg_el1(ctxt_sys_reg(ctxt, TCR2_EL1),  SYS_TCR2);
        write_sysreg_el1(ctxt_sys_reg(ctxt, ESR_EL1),   SYS_ESR);
        write_sysreg_el1(ctxt_sys_reg(ctxt, AFSR0_EL1), SYS_AFSR0);
        write_sysreg_el1(ctxt_sys_reg(ctxt, AFSR1_EL1), SYS_AFSR1);
@@ -123,6 +131,10 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
        write_sysreg_el1(ctxt_sys_reg(ctxt, CONTEXTIDR_EL1), SYS_CONTEXTIDR);
        write_sysreg_el1(ctxt_sys_reg(ctxt, AMAIR_EL1), SYS_AMAIR);
        write_sysreg_el1(ctxt_sys_reg(ctxt, CNTKCTL_EL1), SYS_CNTKCTL);
+       if (cpus_have_final_cap(ARM64_HAS_S1PIE)) {
+               write_sysreg_el1(ctxt_sys_reg(ctxt, PIR_EL1),   SYS_PIR);
+               write_sysreg_el1(ctxt_sys_reg(ctxt, PIRE0_EL1), SYS_PIRE0);
+       }
        write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1),       par_el1);
        write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1),     tpidr_el1);
 
index d756b93..4558c02 100644 (file)
@@ -56,7 +56,7 @@ static void __debug_save_trace(u64 *trfcr_el1)
        *trfcr_el1 = 0;
 
        /* Check if the TRBE is enabled */
-       if (!(read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_ENABLE))
+       if (!(read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E))
                return;
        /*
         * Prohibit trace generation while we are in guest.
index 2e9ec4a..a8813b2 100644 (file)
@@ -575,7 +575,7 @@ struct pkvm_mem_donation {
 
 struct check_walk_data {
        enum pkvm_page_state    desired;
-       enum pkvm_page_state    (*get_page_state)(kvm_pte_t pte);
+       enum pkvm_page_state    (*get_page_state)(kvm_pte_t pte, u64 addr);
 };
 
 static int __check_page_state_visitor(const struct kvm_pgtable_visit_ctx *ctx,
@@ -583,10 +583,7 @@ static int __check_page_state_visitor(const struct kvm_pgtable_visit_ctx *ctx,
 {
        struct check_walk_data *d = ctx->arg;
 
-       if (kvm_pte_valid(ctx->old) && !addr_is_allowed_memory(kvm_pte_to_phys(ctx->old)))
-               return -EINVAL;
-
-       return d->get_page_state(ctx->old) == d->desired ? 0 : -EPERM;
+       return d->get_page_state(ctx->old, ctx->addr) == d->desired ? 0 : -EPERM;
 }
 
 static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
@@ -601,8 +598,11 @@ static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
        return kvm_pgtable_walk(pgt, addr, size, &walker);
 }
 
-static enum pkvm_page_state host_get_page_state(kvm_pte_t pte)
+static enum pkvm_page_state host_get_page_state(kvm_pte_t pte, u64 addr)
 {
+       if (!addr_is_allowed_memory(addr))
+               return PKVM_NOPAGE;
+
        if (!kvm_pte_valid(pte) && pte)
                return PKVM_NOPAGE;
 
@@ -709,7 +709,7 @@ static int host_complete_donation(u64 addr, const struct pkvm_mem_transition *tx
        return host_stage2_set_owner_locked(addr, size, host_id);
 }
 
-static enum pkvm_page_state hyp_get_page_state(kvm_pte_t pte)
+static enum pkvm_page_state hyp_get_page_state(kvm_pte_t pte, u64 addr)
 {
        if (!kvm_pte_valid(pte))
                return PKVM_NOPAGE;
index 71fa16a..7779149 100644 (file)
@@ -186,6 +186,7 @@ static const exit_handler_fn hyp_exit_handlers[] = {
        [ESR_ELx_EC_FP_ASIMD]           = kvm_hyp_handle_fpsimd,
        [ESR_ELx_EC_IABT_LOW]           = kvm_hyp_handle_iabt_low,
        [ESR_ELx_EC_DABT_LOW]           = kvm_hyp_handle_dabt_low,
+       [ESR_ELx_EC_WATCHPT_LOW]        = kvm_hyp_handle_watchpt_low,
        [ESR_ELx_EC_PAC]                = kvm_hyp_handle_ptrauth,
 };
 
@@ -196,6 +197,7 @@ static const exit_handler_fn pvm_exit_handlers[] = {
        [ESR_ELx_EC_FP_ASIMD]           = kvm_hyp_handle_fpsimd,
        [ESR_ELx_EC_IABT_LOW]           = kvm_hyp_handle_iabt_low,
        [ESR_ELx_EC_DABT_LOW]           = kvm_hyp_handle_dabt_low,
+       [ESR_ELx_EC_WATCHPT_LOW]        = kvm_hyp_handle_watchpt_low,
        [ESR_ELx_EC_PAC]                = kvm_hyp_handle_ptrauth,
 };
 
index 3d61bd3..95dae02 100644 (file)
@@ -58,8 +58,9 @@
 struct kvm_pgtable_walk_data {
        struct kvm_pgtable_walker       *walker;
 
+       const u64                       start;
        u64                             addr;
-       u64                             end;
+       const u64                       end;
 };
 
 static bool kvm_phys_is_valid(u64 phys)
@@ -201,20 +202,33 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
                .old    = READ_ONCE(*ptep),
                .arg    = data->walker->arg,
                .mm_ops = mm_ops,
+               .start  = data->start,
                .addr   = data->addr,
                .end    = data->end,
                .level  = level,
                .flags  = flags,
        };
        int ret = 0;
+       bool reload = false;
        kvm_pteref_t childp;
        bool table = kvm_pte_table(ctx.old, level);
 
-       if (table && (ctx.flags & KVM_PGTABLE_WALK_TABLE_PRE))
+       if (table && (ctx.flags & KVM_PGTABLE_WALK_TABLE_PRE)) {
                ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_PRE);
+               reload = true;
+       }
 
        if (!table && (ctx.flags & KVM_PGTABLE_WALK_LEAF)) {
                ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_LEAF);
+               reload = true;
+       }
+
+       /*
+        * Reload the page table after invoking the walker callback for leaf
+        * entries or after pre-order traversal, to allow the walker to descend
+        * into a newly installed or replaced table.
+        */
+       if (reload) {
                ctx.old = READ_ONCE(*ptep);
                table = kvm_pte_table(ctx.old, level);
        }
@@ -293,6 +307,7 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
                     struct kvm_pgtable_walker *walker)
 {
        struct kvm_pgtable_walk_data walk_data = {
+               .start  = ALIGN_DOWN(addr, PAGE_SIZE),
                .addr   = ALIGN_DOWN(addr, PAGE_SIZE),
                .end    = PAGE_ALIGN(walk_data.addr + size),
                .walker = walker,
@@ -349,7 +364,7 @@ int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
 }
 
 struct hyp_map_data {
-       u64                             phys;
+       const u64                       phys;
        kvm_pte_t                       attr;
 };
 
@@ -407,13 +422,12 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
 static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
                                    struct hyp_map_data *data)
 {
+       u64 phys = data->phys + (ctx->addr - ctx->start);
        kvm_pte_t new;
-       u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
 
        if (!kvm_block_mapping_supported(ctx, phys))
                return false;
 
-       data->phys += granule;
        new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
        if (ctx->old == new)
                return true;
@@ -576,7 +590,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
 }
 
 struct stage2_map_data {
-       u64                             phys;
+       const u64                       phys;
        kvm_pte_t                       attr;
        u8                              owner_id;
 
@@ -794,20 +808,43 @@ static bool stage2_pte_executable(kvm_pte_t pte)
        return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
 }
 
+static u64 stage2_map_walker_phys_addr(const struct kvm_pgtable_visit_ctx *ctx,
+                                      const struct stage2_map_data *data)
+{
+       u64 phys = data->phys;
+
+       /*
+        * Stage-2 walks to update ownership data are communicated to the map
+        * walker using an invalid PA. Avoid offsetting an already invalid PA,
+        * which could overflow and make the address valid again.
+        */
+       if (!kvm_phys_is_valid(phys))
+               return phys;
+
+       /*
+        * Otherwise, work out the correct PA based on how far the walk has
+        * gotten.
+        */
+       return phys + (ctx->addr - ctx->start);
+}
+
 static bool stage2_leaf_mapping_allowed(const struct kvm_pgtable_visit_ctx *ctx,
                                        struct stage2_map_data *data)
 {
+       u64 phys = stage2_map_walker_phys_addr(ctx, data);
+
        if (data->force_pte && (ctx->level < (KVM_PGTABLE_MAX_LEVELS - 1)))
                return false;
 
-       return kvm_block_mapping_supported(ctx, data->phys);
+       return kvm_block_mapping_supported(ctx, phys);
 }
 
 static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
                                      struct stage2_map_data *data)
 {
        kvm_pte_t new;
-       u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
+       u64 phys = stage2_map_walker_phys_addr(ctx, data);
+       u64 granule = kvm_granule_size(ctx->level);
        struct kvm_pgtable *pgt = data->mmu->pgt;
        struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
 
@@ -841,8 +878,6 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 
        stage2_make_pte(ctx, new);
 
-       if (kvm_phys_is_valid(phys))
-               data->phys += granule;
        return 0;
 }
 
@@ -1297,4 +1332,7 @@ void kvm_pgtable_stage2_free_removed(struct kvm_pgtable_mm_ops *mm_ops, void *pg
        };
 
        WARN_ON(__kvm_pgtable_walk(&data, mm_ops, ptep, level + 1));
+
+       WARN_ON(mm_ops->page_count(pgtable) != 1);
+       mm_ops->put_page(pgtable);
 }
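
A recurring change in this file is that the hyp and stage-2 map walkers no longer advance a mutable data->phys after each leaf; the output address is instead recomputed from how far the walk has progressed, i.e. data->phys + (ctx->addr - ctx->start). A small illustrative sketch of that calculation, with invented names and a fixed 4KiB granule:

#include <stdint.h>
#include <stdio.h>

#define GRANULE 0x1000ULL /* illustrative 4KiB leaf size */

struct walk_ctx {
	uint64_t start; /* IPA where the walk began */
	uint64_t addr;  /* IPA currently being visited */
};

/*
 * Instead of incrementing a mutable "next physical address" after every
 * leaf, derive it from the walk offset. The result stays correct even if
 * a level is revisited or the walk restarts from the same context.
 */
static uint64_t leaf_phys(const struct walk_ctx *ctx, uint64_t base_phys)
{
	return base_phys + (ctx->addr - ctx->start);
}

int main(void)
{
	struct walk_ctx ctx = { .start = 0x80000000ULL, .addr = 0x80000000ULL };
	uint64_t base = 0x40000000ULL;

	for (int i = 0; i < 3; i++) {
		printf("IPA 0x%llx -> PA 0x%llx\n",
		       (unsigned long long)ctx.addr,
		       (unsigned long long)leaf_phys(&ctx, base));
		ctx.addr += GRANULE;
	}
	return 0;
}
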
index 3d868e8..b37e7c9 100644 (file)
@@ -92,14 +92,28 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
 }
 NOKPROBE_SYMBOL(__deactivate_traps);
 
+/*
+ * Disable IRQs in {activate,deactivate}_traps_vhe_{load,put}() to
+ * prevent a race condition between context switching of PMUSERENR_EL0
+ * in __{activate,deactivate}_traps_common() and IPIs that attempt to
+ * update PMUSERENR_EL0. See also kvm_set_pmuserenr().
+ */
 void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
 {
+       unsigned long flags;
+
+       local_irq_save(flags);
        __activate_traps_common(vcpu);
+       local_irq_restore(flags);
 }
 
 void deactivate_traps_vhe_put(struct kvm_vcpu *vcpu)
 {
+       unsigned long flags;
+
+       local_irq_save(flags);
        __deactivate_traps_common(vcpu);
+       local_irq_restore(flags);
 }
 
 static const exit_handler_fn hyp_exit_handlers[] = {
@@ -110,6 +124,7 @@ static const exit_handler_fn hyp_exit_handlers[] = {
        [ESR_ELx_EC_FP_ASIMD]           = kvm_hyp_handle_fpsimd,
        [ESR_ELx_EC_IABT_LOW]           = kvm_hyp_handle_iabt_low,
        [ESR_ELx_EC_DABT_LOW]           = kvm_hyp_handle_dabt_low,
+       [ESR_ELx_EC_WATCHPT_LOW]        = kvm_hyp_handle_watchpt_low,
        [ESR_ELx_EC_PAC]                = kvm_hyp_handle_ptrauth,
 };
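
The vcpu load/put paths above now bracket __activate_traps_common()/__deactivate_traps_common() with local_irq_save()/local_irq_restore() so that an IPI updating PMUSERENR_EL0 cannot interleave with the context switch. A rough userspace analogy of the pattern, blocking a signal around a multi-step update of state that the signal handler also touches (names and values are illustrative only):

#include <signal.h>
#include <stdio.h>
#include <string.h>

static volatile sig_atomic_t state;

/* Stand-in for the IPI handler that also writes the shared state. */
static void on_sigusr1(int sig)
{
	(void)sig;
	state = 42;
}

int main(void)
{
	struct sigaction sa;
	sigset_t block, old;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigusr1;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGUSR1, &sa, NULL);

	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);

	/*
	 * Keep the "interrupt" out while performing a two-step update,
	 * much like local_irq_save()/local_irq_restore() keeps the
	 * PMUSERENR_EL0 switch atomic with respect to IPIs.
	 */
	sigprocmask(SIG_BLOCK, &block, &old);
	state = 1;          /* step 1 */
	state = state + 1;  /* step 2: the handler cannot run in between */
	sigprocmask(SIG_SETMASK, &old, NULL);

	printf("state = %d\n", (int)state);
	return 0;
}
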
 
index 64c3aec..0bd93a5 100644 (file)
@@ -204,7 +204,7 @@ void kvm_inject_size_fault(struct kvm_vcpu *vcpu)
         * Size Fault at level 0, as if exceeding PARange.
         *
         * Non-LPAE guests will only get the external abort, as there
-        * is no way to to describe the ASF.
+        * is no way to describe the ASF.
         */
        if (vcpu_el1_is_32bit(vcpu) &&
            !(vcpu_read_sys_reg(vcpu, TCR_EL1) & TTBCR_EAE))
index 45727d5..5606509 100644 (file)
@@ -694,45 +694,41 @@ out_unlock:
 
 static struct arm_pmu *kvm_pmu_probe_armpmu(void)
 {
-       struct perf_event_attr attr = { };
-       struct perf_event *event;
-       struct arm_pmu *pmu = NULL;
+       struct arm_pmu *tmp, *pmu = NULL;
+       struct arm_pmu_entry *entry;
+       int cpu;
+
+       mutex_lock(&arm_pmus_lock);
 
        /*
-        * Create a dummy event that only counts user cycles. As we'll never
-        * leave this function with the event being live, it will never
-        * count anything. But it allows us to probe some of the PMU
-        * details. Yes, this is terrible.
+        * It is safe to use a stale cpu to iterate the list of PMUs so long as
+        * the same value is used for the entirety of the loop. Given this, and
+        * the fact that no percpu data is used for the lookup there is no need
+        * to disable preemption.
+        *
+        * It is still necessary to get a valid cpu, though, to probe for the
+        * default PMU instance as userspace is not required to specify a PMU
+        * type. In order to uphold the preexisting behavior KVM selects the
+        * PMU instance for the core where the first call to the
+        * KVM_ARM_VCPU_PMU_V3_CTRL attribute group occurs. A dependent use case
+        * would be a user with disdain of all things big.LITTLE that affines
+        * the VMM to a particular cluster of cores.
+        *
+        * In any case, userspace should just do the sane thing and use the UAPI
+        * to select a PMU type directly. But, be wary of the baggage being
+        * carried here.
         */
-       attr.type = PERF_TYPE_RAW;
-       attr.size = sizeof(attr);
-       attr.pinned = 1;
-       attr.disabled = 0;
-       attr.exclude_user = 0;
-       attr.exclude_kernel = 1;
-       attr.exclude_hv = 1;
-       attr.exclude_host = 1;
-       attr.config = ARMV8_PMUV3_PERFCTR_CPU_CYCLES;
-       attr.sample_period = GENMASK(63, 0);
-
-       event = perf_event_create_kernel_counter(&attr, -1, current,
-                                                kvm_pmu_perf_overflow, &attr);
-
-       if (IS_ERR(event)) {
-               pr_err_once("kvm: pmu event creation failed %ld\n",
-                           PTR_ERR(event));
-               return NULL;
-       }
+       cpu = raw_smp_processor_id();
+       list_for_each_entry(entry, &arm_pmus, entry) {
+               tmp = entry->arm_pmu;
 
-       if (event->pmu) {
-               pmu = to_arm_pmu(event->pmu);
-               if (pmu->pmuver == ID_AA64DFR0_EL1_PMUVer_NI ||
-                   pmu->pmuver == ID_AA64DFR0_EL1_PMUVer_IMP_DEF)
-                       pmu = NULL;
+               if (cpumask_test_cpu(cpu, &tmp->supported_cpus)) {
+                       pmu = tmp;
+                       break;
+               }
        }
 
-       perf_event_disable(event);
-       perf_event_release_kernel(event);
+       mutex_unlock(&arm_pmus_lock);
 
        return pmu;
 }
@@ -912,7 +908,17 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
                return -EBUSY;
 
        if (!kvm->arch.arm_pmu) {
-               /* No PMU set, get the default one */
+               /*
+                * No PMU set, get the default one.
+                *
+                * The observant among you will notice that the supported_cpus
+                * mask does not get updated for the default PMU even though it
+                * is quite possible the selected instance supports only a
+                * subset of cores in the system. This is intentional, and
+                * upholds the preexisting behavior on heterogeneous systems
+                * where vCPUs can be scheduled on any core but the guest
+                * counters could stop working.
+                */
                kvm->arch.arm_pmu = kvm_pmu_probe_armpmu();
                if (!kvm->arch.arm_pmu)
                        return -ENODEV;
index 7887133..121f1a1 100644 (file)
@@ -209,3 +209,30 @@ void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu)
        kvm_vcpu_pmu_enable_el0(events_host);
        kvm_vcpu_pmu_disable_el0(events_guest);
 }
+
+/*
+ * With VHE, keep track of the PMUSERENR_EL0 value for the host EL0 on the pCPU
+ * where PMUSERENR_EL0 for the guest is loaded, since PMUSERENR_EL0 is switched
+ * to the value for the guest on vcpu_load().  The value for the host EL0
+ * will be restored on vcpu_put(), before returning to userspace.
+ * This isn't necessary for nVHE, as the register is context switched for
+ * every guest enter/exit.
+ *
+ * Return true if KVM takes care of the register. Otherwise return false.
+ */
+bool kvm_set_pmuserenr(u64 val)
+{
+       struct kvm_cpu_context *hctxt;
+       struct kvm_vcpu *vcpu;
+
+       if (!kvm_arm_support_pmu_v3() || !has_vhe())
+               return false;
+
+       vcpu = kvm_get_running_vcpu();
+       if (!vcpu || !vcpu_get_flag(vcpu, PMUSERENR_ON_CPU))
+               return false;
+
+       hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+       ctxt_sys_reg(hctxt, PMUSERENR_EL0) = val;
+       return true;
+}
index 71b1209..5b5d5e5 100644 (file)
@@ -211,6 +211,19 @@ static bool access_dcsw(struct kvm_vcpu *vcpu,
        return true;
 }
 
+static bool access_dcgsw(struct kvm_vcpu *vcpu,
+                        struct sys_reg_params *p,
+                        const struct sys_reg_desc *r)
+{
+       if (!kvm_has_mte(vcpu->kvm)) {
+               kvm_inject_undefined(vcpu);
+               return false;
+       }
+
+       /* Treat MTE S/W ops as we treat the classic ones: with contempt */
+       return access_dcsw(vcpu, p, r);
+}
+
 static void get_access_mask(const struct sys_reg_desc *r, u64 *mask, u64 *shift)
 {
        switch (r->aarch32_map) {
@@ -388,9 +401,9 @@ static bool trap_oslar_el1(struct kvm_vcpu *vcpu,
                return read_from_write_only(vcpu, p, r);
 
        /* Forward the OSLK bit to OSLSR */
-       oslsr = __vcpu_sys_reg(vcpu, OSLSR_EL1) & ~SYS_OSLSR_OSLK;
-       if (p->regval & SYS_OSLAR_OSLK)
-               oslsr |= SYS_OSLSR_OSLK;
+       oslsr = __vcpu_sys_reg(vcpu, OSLSR_EL1) & ~OSLSR_EL1_OSLK;
+       if (p->regval & OSLAR_EL1_OSLK)
+               oslsr |= OSLSR_EL1_OSLK;
 
        __vcpu_sys_reg(vcpu, OSLSR_EL1) = oslsr;
        return true;
@@ -414,7 +427,7 @@ static int set_oslsr_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
         * The only modifiable bit is the OSLK bit. Refuse the write if
         * userspace attempts to change any other bit in the register.
         */
-       if ((val ^ rd->val) & ~SYS_OSLSR_OSLK)
+       if ((val ^ rd->val) & ~OSLSR_EL1_OSLK)
                return -EINVAL;
 
        __vcpu_sys_reg(vcpu, rd->reg) = val;
@@ -1252,6 +1265,7 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu, struct sys_reg_desc const *r
                                 ARM64_FEATURE_MASK(ID_AA64ISAR2_EL1_GPA3));
                if (!cpus_have_final_cap(ARM64_HAS_WFXT))
                        val &= ~ARM64_FEATURE_MASK(ID_AA64ISAR2_EL1_WFxT);
+               val &= ~ARM64_FEATURE_MASK(ID_AA64ISAR2_EL1_MOPS);
                break;
        case SYS_ID_AA64DFR0_EL1:
                /* Limit debug to ARMv8.0 */
@@ -1756,8 +1770,14 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
  */
 static const struct sys_reg_desc sys_reg_descs[] = {
        { SYS_DESC(SYS_DC_ISW), access_dcsw },
+       { SYS_DESC(SYS_DC_IGSW), access_dcgsw },
+       { SYS_DESC(SYS_DC_IGDSW), access_dcgsw },
        { SYS_DESC(SYS_DC_CSW), access_dcsw },
+       { SYS_DESC(SYS_DC_CGSW), access_dcgsw },
+       { SYS_DESC(SYS_DC_CGDSW), access_dcgsw },
        { SYS_DESC(SYS_DC_CISW), access_dcsw },
+       { SYS_DESC(SYS_DC_CIGSW), access_dcgsw },
+       { SYS_DESC(SYS_DC_CIGDSW), access_dcgsw },
 
        DBG_BCR_BVR_WCR_WVR_EL1(0),
        DBG_BCR_BVR_WCR_WVR_EL1(1),
@@ -1781,7 +1801,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
        { SYS_DESC(SYS_MDRAR_EL1), trap_raz_wi },
        { SYS_DESC(SYS_OSLAR_EL1), trap_oslar_el1 },
        { SYS_DESC(SYS_OSLSR_EL1), trap_oslsr_el1, reset_val, OSLSR_EL1,
-               SYS_OSLSR_OSLM_IMPLEMENTED, .set_user = set_oslsr_el1, },
+               OSLSR_EL1_OSLM_IMPLEMENTED, .set_user = set_oslsr_el1, },
        { SYS_DESC(SYS_OSDLR_EL1), trap_raz_wi },
        { SYS_DESC(SYS_DBGPRCR_EL1), trap_raz_wi },
        { SYS_DESC(SYS_DBGCLAIMSET_EL1), trap_raz_wi },
@@ -1872,7 +1892,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
        ID_SANITISED(ID_AA64MMFR0_EL1),
        ID_SANITISED(ID_AA64MMFR1_EL1),
        ID_SANITISED(ID_AA64MMFR2_EL1),
-       ID_UNALLOCATED(7,3),
+       ID_SANITISED(ID_AA64MMFR3_EL1),
        ID_UNALLOCATED(7,4),
        ID_UNALLOCATED(7,5),
        ID_UNALLOCATED(7,6),
@@ -1892,6 +1912,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
        { SYS_DESC(SYS_TTBR0_EL1), access_vm_reg, reset_unknown, TTBR0_EL1 },
        { SYS_DESC(SYS_TTBR1_EL1), access_vm_reg, reset_unknown, TTBR1_EL1 },
        { SYS_DESC(SYS_TCR_EL1), access_vm_reg, reset_val, TCR_EL1, 0 },
+       { SYS_DESC(SYS_TCR2_EL1), access_vm_reg, reset_val, TCR2_EL1, 0 },
 
        PTRAUTH_KEY(APIA),
        PTRAUTH_KEY(APIB),
@@ -1941,6 +1962,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
        { SYS_DESC(SYS_PMMIR_EL1), trap_raz_wi },
 
        { SYS_DESC(SYS_MAIR_EL1), access_vm_reg, reset_unknown, MAIR_EL1 },
+       { SYS_DESC(SYS_PIRE0_EL1), access_vm_reg, reset_unknown, PIRE0_EL1 },
+       { SYS_DESC(SYS_PIR_EL1), access_vm_reg, reset_unknown, PIR_EL1 },
        { SYS_DESC(SYS_AMAIR_EL1), access_vm_reg, reset_amair_el1, AMAIR_EL1 },
 
        { SYS_DESC(SYS_LORSA_EL1), trap_loregion },
index 9d42c7c..c8c3cb8 100644 (file)
@@ -235,9 +235,9 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
         * KVM io device for the redistributor that belongs to this VCPU.
         */
        if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
-               mutex_lock(&vcpu->kvm->arch.config_lock);
+               mutex_lock(&vcpu->kvm->slots_lock);
                ret = vgic_register_redist_iodev(vcpu);
-               mutex_unlock(&vcpu->kvm->arch.config_lock);
+               mutex_unlock(&vcpu->kvm->slots_lock);
        }
        return ret;
 }
@@ -406,7 +406,7 @@ void kvm_vgic_destroy(struct kvm *kvm)
 
 /**
  * vgic_lazy_init: Lazy init is only allowed if the GIC exposed to the guest
- * is a GICv2. A GICv3 must be explicitly initialized by the guest using the
+ * is a GICv2. A GICv3 must be explicitly initialized by userspace using the
  * KVM_DEV_ARM_VGIC_GRP_CTRL KVM_DEVICE group.
  * @kvm: kvm struct pointer
  */
@@ -446,11 +446,14 @@ int vgic_lazy_init(struct kvm *kvm)
 int kvm_vgic_map_resources(struct kvm *kvm)
 {
        struct vgic_dist *dist = &kvm->arch.vgic;
+       enum vgic_type type;
+       gpa_t dist_base;
        int ret = 0;
 
        if (likely(vgic_ready(kvm)))
                return 0;
 
+       mutex_lock(&kvm->slots_lock);
        mutex_lock(&kvm->arch.config_lock);
        if (vgic_ready(kvm))
                goto out;
@@ -458,18 +461,33 @@ int kvm_vgic_map_resources(struct kvm *kvm)
        if (!irqchip_in_kernel(kvm))
                goto out;
 
-       if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2)
+       if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2) {
                ret = vgic_v2_map_resources(kvm);
-       else
+               type = VGIC_V2;
+       } else {
                ret = vgic_v3_map_resources(kvm);
+               type = VGIC_V3;
+       }
 
-       if (ret)
+       if (ret) {
                __kvm_vgic_destroy(kvm);
-       else
-               dist->ready = true;
+               goto out;
+       }
+       dist->ready = true;
+       dist_base = dist->vgic_dist_base;
+       mutex_unlock(&kvm->arch.config_lock);
+
+       ret = vgic_register_dist_iodev(kvm, dist_base, type);
+       if (ret) {
+               kvm_err("Unable to register VGIC dist MMIO regions\n");
+               kvm_vgic_destroy(kvm);
+       }
+       mutex_unlock(&kvm->slots_lock);
+       return ret;
 
 out:
        mutex_unlock(&kvm->arch.config_lock);
+       mutex_unlock(&kvm->slots_lock);
        return ret;
 }
 
index 750e51e..5fe2365 100644 (file)
@@ -1936,6 +1936,7 @@ void vgic_lpi_translation_cache_destroy(struct kvm *kvm)
 
 static int vgic_its_create(struct kvm_device *dev, u32 type)
 {
+       int ret;
        struct vgic_its *its;
 
        if (type != KVM_DEV_TYPE_ARM_VGIC_ITS)
@@ -1945,9 +1946,12 @@ static int vgic_its_create(struct kvm_device *dev, u32 type)
        if (!its)
                return -ENOMEM;
 
+       mutex_lock(&dev->kvm->arch.config_lock);
+
        if (vgic_initialized(dev->kvm)) {
-               int ret = vgic_v4_init(dev->kvm);
+               ret = vgic_v4_init(dev->kvm);
                if (ret < 0) {
+                       mutex_unlock(&dev->kvm->arch.config_lock);
                        kfree(its);
                        return ret;
                }
@@ -1960,12 +1964,10 @@ static int vgic_its_create(struct kvm_device *dev, u32 type)
 
        /* Yep, even more trickery for lock ordering... */
 #ifdef CONFIG_LOCKDEP
-       mutex_lock(&dev->kvm->arch.config_lock);
        mutex_lock(&its->cmd_lock);
        mutex_lock(&its->its_lock);
        mutex_unlock(&its->its_lock);
        mutex_unlock(&its->cmd_lock);
-       mutex_unlock(&dev->kvm->arch.config_lock);
 #endif
 
        its->vgic_its_base = VGIC_ADDR_UNDEF;
@@ -1986,7 +1988,11 @@ static int vgic_its_create(struct kvm_device *dev, u32 type)
 
        dev->private = its;
 
-       return vgic_its_set_abi(its, NR_ITS_ABIS - 1);
+       ret = vgic_its_set_abi(its, NR_ITS_ABIS - 1);
+
+       mutex_unlock(&dev->kvm->arch.config_lock);
+
+       return ret;
 }
 
 static void vgic_its_destroy(struct kvm_device *kvm_dev)
index 35cfa26..212b73a 100644 (file)
@@ -102,7 +102,11 @@ static int kvm_vgic_addr(struct kvm *kvm, struct kvm_device_attr *attr, bool wri
                if (get_user(addr, uaddr))
                        return -EFAULT;
 
-       mutex_lock(&kvm->arch.config_lock);
+       /*
+        * Since we can't hold config_lock while registering the redistributor
+        * iodevs, take the slots_lock immediately.
+        */
+       mutex_lock(&kvm->slots_lock);
        switch (attr->attr) {
        case KVM_VGIC_V2_ADDR_TYPE_DIST:
                r = vgic_check_type(kvm, KVM_DEV_TYPE_ARM_VGIC_V2);
@@ -182,6 +186,7 @@ static int kvm_vgic_addr(struct kvm *kvm, struct kvm_device_attr *attr, bool wri
        if (r)
                goto out;
 
+       mutex_lock(&kvm->arch.config_lock);
        if (write) {
                r = vgic_check_iorange(kvm, *addr_ptr, addr, alignment, size);
                if (!r)
@@ -189,9 +194,10 @@ static int kvm_vgic_addr(struct kvm *kvm, struct kvm_device_attr *attr, bool wri
        } else {
                addr = *addr_ptr;
        }
+       mutex_unlock(&kvm->arch.config_lock);
 
 out:
-       mutex_unlock(&kvm->arch.config_lock);
+       mutex_unlock(&kvm->slots_lock);
 
        if (!r && !write)
                r =  put_user(addr, uaddr);
index 472b18a..188d218 100644 (file)
@@ -769,10 +769,13 @@ int vgic_register_redist_iodev(struct kvm_vcpu *vcpu)
        struct vgic_io_device *rd_dev = &vcpu->arch.vgic_cpu.rd_iodev;
        struct vgic_redist_region *rdreg;
        gpa_t rd_base;
-       int ret;
+       int ret = 0;
+
+       lockdep_assert_held(&kvm->slots_lock);
+       mutex_lock(&kvm->arch.config_lock);
 
        if (!IS_VGIC_ADDR_UNDEF(vgic_cpu->rd_iodev.base_addr))
-               return 0;
+               goto out_unlock;
 
        /*
         * We may be creating VCPUs before having set the base address for the
@@ -782,10 +785,12 @@ int vgic_register_redist_iodev(struct kvm_vcpu *vcpu)
         */
        rdreg = vgic_v3_rdist_free_slot(&vgic->rd_regions);
        if (!rdreg)
-               return 0;
+               goto out_unlock;
 
-       if (!vgic_v3_check_base(kvm))
-               return -EINVAL;
+       if (!vgic_v3_check_base(kvm)) {
+               ret = -EINVAL;
+               goto out_unlock;
+       }
 
        vgic_cpu->rdreg = rdreg;
        vgic_cpu->rdreg_index = rdreg->free_index;
@@ -799,16 +804,20 @@ int vgic_register_redist_iodev(struct kvm_vcpu *vcpu)
        rd_dev->nr_regions = ARRAY_SIZE(vgic_v3_rd_registers);
        rd_dev->redist_vcpu = vcpu;
 
-       mutex_lock(&kvm->slots_lock);
+       mutex_unlock(&kvm->arch.config_lock);
+
        ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, rd_base,
                                      2 * SZ_64K, &rd_dev->dev);
-       mutex_unlock(&kvm->slots_lock);
-
        if (ret)
                return ret;
 
+       /* Protected by slots_lock */
        rdreg->free_index++;
        return 0;
+
+out_unlock:
+       mutex_unlock(&kvm->arch.config_lock);
+       return ret;
 }
 
 static void vgic_unregister_redist_iodev(struct kvm_vcpu *vcpu)
@@ -834,12 +843,10 @@ static int vgic_register_all_redist_iodevs(struct kvm *kvm)
                /* The current c failed, so iterate over the previous ones. */
                int i;
 
-               mutex_lock(&kvm->slots_lock);
                for (i = 0; i < c; i++) {
                        vcpu = kvm_get_vcpu(kvm, i);
                        vgic_unregister_redist_iodev(vcpu);
                }
-               mutex_unlock(&kvm->slots_lock);
        }
 
        return ret;
@@ -938,7 +945,9 @@ int vgic_v3_set_redist_base(struct kvm *kvm, u32 index, u64 addr, u32 count)
 {
        int ret;
 
+       mutex_lock(&kvm->arch.config_lock);
        ret = vgic_v3_alloc_redist_region(kvm, index, addr, count);
+       mutex_unlock(&kvm->arch.config_lock);
        if (ret)
                return ret;
 
@@ -950,8 +959,10 @@ int vgic_v3_set_redist_base(struct kvm *kvm, u32 index, u64 addr, u32 count)
        if (ret) {
                struct vgic_redist_region *rdreg;
 
+               mutex_lock(&kvm->arch.config_lock);
                rdreg = vgic_v3_rdist_region_from_index(kvm, index);
                vgic_v3_free_redist_region(rdreg);
+               mutex_unlock(&kvm->arch.config_lock);
                return ret;
        }
 
index 1939c94..ff558c0 100644 (file)
@@ -1096,7 +1096,6 @@ int vgic_register_dist_iodev(struct kvm *kvm, gpa_t dist_base_address,
                             enum vgic_type type)
 {
        struct vgic_io_device *io_device = &kvm->arch.vgic.dist_iodev;
-       int ret = 0;
        unsigned int len;
 
        switch (type) {
@@ -1114,10 +1113,6 @@ int vgic_register_dist_iodev(struct kvm *kvm, gpa_t dist_base_address,
        io_device->iodev_type = IODEV_DIST;
        io_device->redist_vcpu = NULL;
 
-       mutex_lock(&kvm->slots_lock);
-       ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, dist_base_address,
-                                     len, &io_device->dev);
-       mutex_unlock(&kvm->slots_lock);
-
-       return ret;
+       return kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, dist_base_address,
+                                      len, &io_device->dev);
 }
index 6456483..7e9cdb7 100644 (file)
@@ -312,12 +312,6 @@ int vgic_v2_map_resources(struct kvm *kvm)
                return ret;
        }
 
-       ret = vgic_register_dist_iodev(kvm, dist->vgic_dist_base, VGIC_V2);
-       if (ret) {
-               kvm_err("Unable to register VGIC MMIO regions\n");
-               return ret;
-       }
-
        if (!static_branch_unlikely(&vgic_v2_cpuif_trap)) {
                ret = kvm_phys_addr_ioremap(kvm, dist->vgic_cpu_base,
                                            kvm_vgic_global_state.vcpu_base,
index 469d816..c3b8e13 100644 (file)
@@ -539,7 +539,6 @@ int vgic_v3_map_resources(struct kvm *kvm)
 {
        struct vgic_dist *dist = &kvm->arch.vgic;
        struct kvm_vcpu *vcpu;
-       int ret = 0;
        unsigned long c;
 
        kvm_for_each_vcpu(c, vcpu, kvm) {
@@ -569,12 +568,6 @@ int vgic_v3_map_resources(struct kvm *kvm)
                return -EBUSY;
        }
 
-       ret = vgic_register_dist_iodev(kvm, dist->vgic_dist_base, VGIC_V3);
-       if (ret) {
-               kvm_err("Unable to register VGICv3 dist MMIO regions\n");
-               return ret;
-       }
-
        if (kvm_vgic_global_state.has_gicv4_1)
                vgic_v4_configure_vsgis(kvm);
 
@@ -616,6 +609,10 @@ static const struct midr_range broken_seis[] = {
        MIDR_ALL_VERSIONS(MIDR_APPLE_M1_FIRESTORM_MAX),
        MIDR_ALL_VERSIONS(MIDR_APPLE_M2_BLIZZARD),
        MIDR_ALL_VERSIONS(MIDR_APPLE_M2_AVALANCHE),
+       MIDR_ALL_VERSIONS(MIDR_APPLE_M2_BLIZZARD_PRO),
+       MIDR_ALL_VERSIONS(MIDR_APPLE_M2_AVALANCHE_PRO),
+       MIDR_ALL_VERSIONS(MIDR_APPLE_M2_BLIZZARD_MAX),
+       MIDR_ALL_VERSIONS(MIDR_APPLE_M2_AVALANCHE_MAX),
        {},
 };
 
index 3bb0034..c1c28fe 100644 (file)
@@ -184,13 +184,14 @@ static void vgic_v4_disable_vsgis(struct kvm_vcpu *vcpu)
        }
 }
 
-/* Must be called with the kvm lock held */
 void vgic_v4_configure_vsgis(struct kvm *kvm)
 {
        struct vgic_dist *dist = &kvm->arch.vgic;
        struct kvm_vcpu *vcpu;
        unsigned long i;
 
+       lockdep_assert_held(&kvm->arch.config_lock);
+
        kvm_arm_halt_guest(kvm);
 
        kvm_for_each_vcpu(i, vcpu, kvm) {
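
vgic_v4_configure_vsgis() above trades a "Must be called with the kvm lock held" comment for lockdep_assert_held(&kvm->arch.config_lock), turning the calling convention into a runtime check on lockdep-enabled kernels. A toy sketch of the same idea outside the kernel, using a wrapper that remembers whether its mutex is held (all names invented):

#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct checked_mutex {
	pthread_mutex_t lock;
	bool held; /* simplistic: ignores multi-thread ownership tracking */
};

static void cm_lock(struct checked_mutex *m)
{
	pthread_mutex_lock(&m->lock);
	m->held = true;
}

static void cm_unlock(struct checked_mutex *m)
{
	m->held = false;
	pthread_mutex_unlock(&m->lock);
}

/* Equivalent of lockdep_assert_held(): enforce the documented precondition. */
static void cm_assert_held(struct checked_mutex *m)
{
	assert(m->held);
}

static struct checked_mutex cfg = { .lock = PTHREAD_MUTEX_INITIALIZER };

static void configure_vsgis(void)
{
	cm_assert_held(&cfg); /* caller must already hold the config lock */
	puts("reconfiguring");
}

int main(void)
{
	cm_lock(&cfg);
	configure_vsgis();
	cm_unlock(&cfg);
	return 0;
}
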
index 08978d0..7fe8ba1 100644 (file)
@@ -47,7 +47,7 @@ static void flush_context(void)
        int cpu;
        u64 vmid;
 
-       bitmap_clear(vmid_map, 0, NUM_USER_VMIDS);
+       bitmap_zero(vmid_map, NUM_USER_VMIDS);
 
        for_each_possible_cpu(cpu) {
                vmid = atomic64_xchg_relaxed(&per_cpu(active_vmids, cpu), 0);
@@ -182,8 +182,7 @@ int __init kvm_arm_vmid_alloc_init(void)
         */
        WARN_ON(NUM_USER_VMIDS - 1 <= num_possible_cpus());
        atomic64_set(&vmid_generation, VMID_FIRST_VERSION);
-       vmid_map = kcalloc(BITS_TO_LONGS(NUM_USER_VMIDS),
-                          sizeof(*vmid_map), GFP_KERNEL);
+       vmid_map = bitmap_zalloc(NUM_USER_VMIDS, GFP_KERNEL);
        if (!vmid_map)
                return -ENOMEM;
 
@@ -192,5 +191,5 @@ int __init kvm_arm_vmid_alloc_init(void)
 
 void __init kvm_arm_vmid_alloc_free(void)
 {
-       kfree(vmid_map);
+       bitmap_free(vmid_map);
 }
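
The VMID allocator above moves from kcalloc(BITS_TO_LONGS(...)) plus bitmap_clear() to the dedicated bitmap_zalloc()/bitmap_free() helpers. A small userspace approximation of what such a zeroed bitmap allocation amounts to (helper names here are invented, not the kernel API):

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Zeroed bitmap with room for nbits bits; calloc() stands in for GFP_KERNEL. */
static unsigned long *bitmap_alloc_zeroed(size_t nbits)
{
	return calloc(BITS_TO_LONGS(nbits), sizeof(unsigned long));
}

static void bitmap_set_bit(unsigned long *map, size_t bit)
{
	map[bit / BITS_PER_LONG] |= 1UL << (bit % BITS_PER_LONG);
}

static int bitmap_test_bit(const unsigned long *map, size_t bit)
{
	return (map[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1;
}

int main(void)
{
	size_t nbits = 1 << 16; /* e.g. a 16-bit VMID space */
	unsigned long *map = bitmap_alloc_zeroed(nbits);

	if (!map)
		return 1;

	bitmap_set_bit(map, 257);
	printf("bit 256: %d, bit 257: %d\n",
	       bitmap_test_bit(map, 256), bitmap_test_bit(map, 257));

	free(map); /* counterpart of bitmap_free() */
	return 0;
}
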
index 96b1719..f9a53b7 100644 (file)
@@ -10,7 +10,7 @@
 #include <linux/module.h>
 #include <asm/neon-intrinsics.h>
 
-void xor_arm64_neon_2(unsigned long bytes, unsigned long * __restrict p1,
+static void xor_arm64_neon_2(unsigned long bytes, unsigned long * __restrict p1,
        const unsigned long * __restrict p2)
 {
        uint64_t *dp1 = (uint64_t *)p1;
@@ -37,7 +37,7 @@ void xor_arm64_neon_2(unsigned long bytes, unsigned long * __restrict p1,
        } while (--lines > 0);
 }
 
-void xor_arm64_neon_3(unsigned long bytes, unsigned long * __restrict p1,
+static void xor_arm64_neon_3(unsigned long bytes, unsigned long * __restrict p1,
        const unsigned long * __restrict p2,
        const unsigned long * __restrict p3)
 {
@@ -73,7 +73,7 @@ void xor_arm64_neon_3(unsigned long bytes, unsigned long * __restrict p1,
        } while (--lines > 0);
 }
 
-void xor_arm64_neon_4(unsigned long bytes, unsigned long * __restrict p1,
+static void xor_arm64_neon_4(unsigned long bytes, unsigned long * __restrict p1,
        const unsigned long * __restrict p2,
        const unsigned long * __restrict p3,
        const unsigned long * __restrict p4)
@@ -118,7 +118,7 @@ void xor_arm64_neon_4(unsigned long bytes, unsigned long * __restrict p1,
        } while (--lines > 0);
 }
 
-void xor_arm64_neon_5(unsigned long bytes, unsigned long * __restrict p1,
+static void xor_arm64_neon_5(unsigned long bytes, unsigned long * __restrict p1,
        const unsigned long * __restrict p2,
        const unsigned long * __restrict p3,
        const unsigned long * __restrict p4,
index e1e0dca..1881975 100644 (file)
@@ -364,8 +364,8 @@ void cpu_do_switch_mm(phys_addr_t pgd_phys, struct mm_struct *mm)
        ttbr1 &= ~TTBR_ASID_MASK;
        ttbr1 |= FIELD_PREP(TTBR_ASID_MASK, asid);
 
+       cpu_set_reserved_ttbr0_nosync();
        write_sysreg(ttbr1, ttbr1_el1);
-       isb();
        write_sysreg(ttbr0, ttbr0_el1);
        isb();
        post_ttbr_update_workaround();
index 4aadcfb..a7bb200 100644 (file)
@@ -21,9 +21,10 @@ void copy_highpage(struct page *to, struct page *from)
 
        copy_page(kto, kfrom);
 
+       if (kasan_hw_tags_enabled())
+               page_kasan_tag_reset(to);
+
        if (system_supports_mte() && page_mte_tagged(from)) {
-               if (kasan_hw_tags_enabled())
-                       page_kasan_tag_reset(to);
                /* It's a new page, shouldn't have been tagged yet */
                WARN_ON_ONCE(!try_page_mte_tagging(to));
                mte_copy_page_tags(kto, kfrom);
index 9e0db5c..c85b6d7 100644 (file)
@@ -66,6 +66,8 @@ static inline const struct fault_info *esr_to_debug_fault_info(unsigned long esr
 
 static void data_abort_decode(unsigned long esr)
 {
+       unsigned long iss2 = ESR_ELx_ISS2(esr);
+
        pr_alert("Data abort info:\n");
 
        if (esr & ESR_ELx_ISV) {
@@ -78,12 +80,21 @@ static void data_abort_decode(unsigned long esr)
                         (esr & ESR_ELx_SF) >> ESR_ELx_SF_SHIFT,
                         (esr & ESR_ELx_AR) >> ESR_ELx_AR_SHIFT);
        } else {
-               pr_alert("  ISV = 0, ISS = 0x%08lx\n", esr & ESR_ELx_ISS_MASK);
+               pr_alert("  ISV = 0, ISS = 0x%08lx, ISS2 = 0x%08lx\n",
+                        esr & ESR_ELx_ISS_MASK, iss2);
        }
 
-       pr_alert("  CM = %lu, WnR = %lu\n",
+       pr_alert("  CM = %lu, WnR = %lu, TnD = %lu, TagAccess = %lu\n",
                 (esr & ESR_ELx_CM) >> ESR_ELx_CM_SHIFT,
-                (esr & ESR_ELx_WNR) >> ESR_ELx_WNR_SHIFT);
+                (esr & ESR_ELx_WNR) >> ESR_ELx_WNR_SHIFT,
+                (iss2 & ESR_ELx_TnD) >> ESR_ELx_TnD_SHIFT,
+                (iss2 & ESR_ELx_TagAccess) >> ESR_ELx_TagAccess_SHIFT);
+
+       pr_alert("  GCS = %ld, Overlay = %lu, DirtyBit = %lu, Xs = %llu\n",
+                (iss2 & ESR_ELx_GCS) >> ESR_ELx_GCS_SHIFT,
+                (iss2 & ESR_ELx_Overlay) >> ESR_ELx_Overlay_SHIFT,
+                (iss2 & ESR_ELx_DirtyBit) >> ESR_ELx_DirtyBit_SHIFT,
+                (iss2 & ESR_ELx_Xs_MASK) >> ESR_ELx_Xs_SHIFT);
 }
 
 static void mem_abort_decode(unsigned long esr)
@@ -480,8 +491,8 @@ static void do_bad_area(unsigned long far, unsigned long esr,
        }
 }
 
-#define VM_FAULT_BADMAP                0x010000
-#define VM_FAULT_BADACCESS     0x020000
+#define VM_FAULT_BADMAP                ((__force vm_fault_t)0x010000)
+#define VM_FAULT_BADACCESS     ((__force vm_fault_t)0x020000)
 
 static vm_fault_t __do_page_fault(struct mm_struct *mm, unsigned long addr,
                                  unsigned int mm_flags, unsigned long vm_flags,
@@ -600,8 +611,7 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
                vma_end_read(vma);
                goto lock_mmap;
        }
-       fault = handle_mm_fault(vma, addr & PAGE_MASK,
-                               mm_flags | FAULT_FLAG_VMA_LOCK, regs);
+       fault = handle_mm_fault(vma, addr, mm_flags | FAULT_FLAG_VMA_LOCK, regs);
        vma_end_read(vma);
 
        if (!(fault & VM_FAULT_RETRY)) {
@@ -886,9 +896,6 @@ void do_sp_pc_abort(unsigned long addr, unsigned long esr, struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(do_sp_pc_abort);
 
-int __init early_brk64(unsigned long addr, unsigned long esr,
-                      struct pt_regs *regs);
-
 /*
  * __refdata because early_brk64 is __init, but the reference to it is
  * clobbered at arch_initcall time.
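
The extended abort decode above simply pulls individual syndrome fields out of ESR_ELx and the new ISS2 word with mask-and-shift pairs before printing them. A compact sketch of that style of field extraction, with invented mask and shift values (the real definitions live in the arch's esr.h):

#include <stdio.h>

/* Illustrative field layout only; not the architectural ESR encoding. */
#define FLD_WNR_SHIFT	6
#define FLD_WNR_MASK	(1UL << FLD_WNR_SHIFT)
#define FLD_XS_SHIFT	0
#define FLD_XS_MASK	(0x1fUL << FLD_XS_SHIFT)

static unsigned long field(unsigned long reg, unsigned long mask,
			   unsigned int shift)
{
	return (reg & mask) >> shift;
}

int main(void)
{
	unsigned long iss2 = 0x53; /* made-up syndrome value */

	/* Same pattern as the pr_alert() lines: (reg & MASK) >> SHIFT. */
	printf("WnR = %lu, Xs = %lu\n",
	       field(iss2, FLD_WNR_MASK, FLD_WNR_SHIFT),
	       field(iss2, FLD_XS_MASK, FLD_XS_SHIFT));
	return 0;
}
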
index 5f9379b..4e64760 100644 (file)
@@ -8,6 +8,7 @@
 
 #include <linux/export.h>
 #include <linux/mm.h>
+#include <linux/libnvdimm.h>
 #include <linux/pagemap.h>
 
 #include <asm/cacheflush.h>
index 66e70ca..c28c2c8 100644 (file)
@@ -69,6 +69,7 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit;
 
 #define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
 #define CRASH_ADDR_HIGH_MAX            (PHYS_MASK + 1)
+#define CRASH_HIGH_SEARCH_BASE         SZ_4G
 
 #define DEFAULT_CRASH_KERNEL_LOW_SIZE  (128UL << 20)
 
@@ -101,12 +102,13 @@ static int __init reserve_crashkernel_low(unsigned long long low_size)
  */
 static void __init reserve_crashkernel(void)
 {
-       unsigned long long crash_base, crash_size;
-       unsigned long long crash_low_size = 0;
+       unsigned long long crash_low_size = 0, search_base = 0;
        unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
+       unsigned long long crash_base, crash_size;
        char *cmdline = boot_command_line;
-       int ret;
        bool fixed_base = false;
+       bool high = false;
+       int ret;
 
        if (!IS_ENABLED(CONFIG_KEXEC_CORE))
                return;
@@ -129,7 +131,9 @@ static void __init reserve_crashkernel(void)
                else if (ret)
                        return;
 
+               search_base = CRASH_HIGH_SEARCH_BASE;
                crash_max = CRASH_ADDR_HIGH_MAX;
+               high = true;
        } else if (ret || !crash_size) {
                /* The specified value is invalid */
                return;
@@ -140,31 +144,51 @@ static void __init reserve_crashkernel(void)
        /* User specifies base address explicitly. */
        if (crash_base) {
                fixed_base = true;
+               search_base = crash_base;
                crash_max = crash_base + crash_size;
        }
 
 retry:
        crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
-                                              crash_base, crash_max);
+                                              search_base, crash_max);
        if (!crash_base) {
                /*
-                * If the first attempt was for low memory, fall back to
-                * high memory, the minimum required low memory will be
-                * reserved later.
+                * For crashkernel=size[KMG]@offset[KMG], print out failure
+                * message if we can't reserve the specified region.
                 */
-               if (!fixed_base && (crash_max == CRASH_ADDR_LOW_MAX)) {
+               if (fixed_base) {
+                       pr_warn("crashkernel reservation failed - memory is in use.\n");
+                       return;
+               }
+
+               /*
+                * For crashkernel=size[KMG], if the first attempt was for
+                * low memory, fall back to high memory, the minimum required
+                * low memory will be reserved later.
+                */
+               if (!high && crash_max == CRASH_ADDR_LOW_MAX) {
                        crash_max = CRASH_ADDR_HIGH_MAX;
+                       search_base = CRASH_ADDR_LOW_MAX;
                        crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
                        goto retry;
                }
 
+               /*
+                * For crashkernel=size[KMG],high, if the first attempt was
+                * for high memory, fall back to low memory.
+                */
+               if (high && crash_max == CRASH_ADDR_HIGH_MAX) {
+                       crash_max = CRASH_ADDR_LOW_MAX;
+                       search_base = 0;
+                       goto retry;
+               }
                pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
                        crash_size);
                return;
        }
 
-       if ((crash_base > CRASH_ADDR_LOW_MAX - crash_low_size) &&
-            crash_low_size && reserve_crashkernel_low(crash_low_size)) {
+       if ((crash_base >= CRASH_ADDR_LOW_MAX) && crash_low_size &&
+            reserve_crashkernel_low(crash_low_size)) {
                memblock_phys_free(crash_base, crash_size);
                return;
        }
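
The reworked reservation path above distinguishes three cases when the first allocation attempt fails: a fixed base address just reports failure, plain crashkernel=size falls back from low to high memory, and crashkernel=...,high falls back from high to low. A stripped-down sketch of that retry flow, with a fake allocator standing in for memblock_phys_alloc_range() (all constants are illustrative):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define LOW_MAX	 0x100000000ULL		/* illustrative 4 GiB boundary */
#define HIGH_MAX 0x1000000000ULL

/* Stand-in for memblock_phys_alloc_range(); pretend low memory is exhausted. */
static uint64_t fake_alloc(uint64_t size, uint64_t base, uint64_t max)
{
	(void)size;
	if (max <= LOW_MAX)
		return 0;		/* low attempt fails */
	return base ? base : LOW_MAX;	/* high attempt succeeds */
}

int main(void)
{
	uint64_t size = 0x10000000ULL;	/* 256 MiB */
	uint64_t search_base = 0, crash_max = LOW_MAX, crash_base;
	bool fixed_base = false, high = false;

retry:
	crash_base = fake_alloc(size, search_base, crash_max);
	if (!crash_base) {
		if (fixed_base) {
			puts("crashkernel reservation failed - memory is in use");
			return 1;
		}
		if (!high && crash_max == LOW_MAX) {
			/* crashkernel=size: retry above the low boundary */
			crash_max = HIGH_MAX;
			search_base = LOW_MAX;
			goto retry;
		}
		if (high && crash_max == HIGH_MAX) {
			/* crashkernel=size,high: retry in low memory */
			crash_max = LOW_MAX;
			search_base = 0;
			goto retry;
		}
		puts("cannot allocate crashkernel");
		return 1;
	}
	printf("crashkernel reserved at 0x%llx\n",
	       (unsigned long long)crash_base);
	return 0;
}
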
index e969e68..f17d066 100644 (file)
@@ -214,7 +214,7 @@ static void __init clear_pgds(unsigned long start,
 static void __init kasan_init_shadow(void)
 {
        u64 kimg_shadow_start, kimg_shadow_end;
-       u64 mod_shadow_start, mod_shadow_end;
+       u64 mod_shadow_start;
        u64 vmalloc_shadow_end;
        phys_addr_t pa_start, pa_end;
        u64 i;
@@ -223,7 +223,6 @@ static void __init kasan_init_shadow(void)
        kimg_shadow_end = PAGE_ALIGN((u64)kasan_mem_to_shadow(KERNEL_END));
 
        mod_shadow_start = (u64)kasan_mem_to_shadow((void *)MODULES_VADDR);
-       mod_shadow_end = (u64)kasan_mem_to_shadow((void *)MODULES_END);
 
        vmalloc_shadow_end = (u64)kasan_mem_to_shadow((void *)VMALLOC_END);
 
@@ -246,17 +245,9 @@ static void __init kasan_init_shadow(void)
        kasan_populate_early_shadow(kasan_mem_to_shadow((void *)PAGE_END),
                                   (void *)mod_shadow_start);
 
-       if (IS_ENABLED(CONFIG_KASAN_VMALLOC)) {
-               BUILD_BUG_ON(VMALLOC_START != MODULES_END);
-               kasan_populate_early_shadow((void *)vmalloc_shadow_end,
-                                           (void *)KASAN_SHADOW_END);
-       } else {
-               kasan_populate_early_shadow((void *)kimg_shadow_end,
-                                           (void *)KASAN_SHADOW_END);
-               if (kimg_shadow_start > mod_shadow_end)
-                       kasan_populate_early_shadow((void *)mod_shadow_end,
-                                                   (void *)kimg_shadow_start);
-       }
+       BUILD_BUG_ON(VMALLOC_START != MODULES_END);
+       kasan_populate_early_shadow((void *)vmalloc_shadow_end,
+                                   (void *)KASAN_SHADOW_END);
 
        for_each_mem_range(i, &pa_start, &pa_end) {
                void *start = (void *)__phys_to_virt(pa_start);
index af6bc84..95d3608 100644 (file)
@@ -451,7 +451,7 @@ static phys_addr_t pgd_pgtable_alloc(int shift)
 void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
                                   phys_addr_t size, pgprot_t prot)
 {
-       if ((virt >= PAGE_END) && (virt < VMALLOC_START)) {
+       if (virt < PAGE_OFFSET) {
                pr_warn("BUG: not creating mapping for %pa at 0x%016lx - outside kernel range\n",
                        &phys, virt);
                return;
@@ -478,7 +478,7 @@ void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
                                phys_addr_t size, pgprot_t prot)
 {
-       if ((virt >= PAGE_END) && (virt < VMALLOC_START)) {
+       if (virt < PAGE_OFFSET) {
                pr_warn("BUG: not updating mapping for %pa at 0x%016lx - outside kernel range\n",
                        &phys, virt);
                return;
@@ -663,12 +663,17 @@ static void __init map_kernel_segment(pgd_t *pgdp, void *va_start, void *va_end,
        vm_area_add_early(vma);
 }
 
+static pgprot_t kernel_exec_prot(void)
+{
+       return rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
+}
+
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
 static int __init map_entry_trampoline(void)
 {
        int i;
 
-       pgprot_t prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
+       pgprot_t prot = kernel_exec_prot();
        phys_addr_t pa_start = __pa_symbol(__entry_tramp_text_start);
 
        /* The trampoline is always mapped and can therefore be global */
@@ -723,7 +728,7 @@ static void __init map_kernel(pgd_t *pgdp)
         * mapping to install SW breakpoints. Allow this (only) when
         * explicitly requested with rodata=off.
         */
-       pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
+       pgprot_t text_prot = kernel_exec_prot();
 
        /*
         * If we have a CPU that supports BTI and a kernel built for
index c2cb437..2baeec4 100644 (file)
@@ -199,7 +199,7 @@ SYM_FUNC_END(idmap_cpu_replace_ttbr1)
 
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
 
-#define KPTI_NG_PTE_FLAGS      (PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS)
+#define KPTI_NG_PTE_FLAGS      (PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS | PTE_WRITE)
 
        .pushsection ".idmap.text", "a"
 
@@ -290,7 +290,7 @@ SYM_TYPED_FUNC_START(idmap_kpti_install_ng_mappings)
        isb
 
        mov     temp_pte, x5
-       mov     pte_flags, #KPTI_NG_PTE_FLAGS
+       mov_q   pte_flags, KPTI_NG_PTE_FLAGS
 
        /* Everybody is enjoying the idmap, so we can rewrite swapper. */
        /* PGD */
@@ -454,6 +454,21 @@ SYM_FUNC_START(__cpu_setup)
 #endif /* CONFIG_ARM64_HW_AFDBM */
        msr     mair_el1, mair
        msr     tcr_el1, tcr
+
+       mrs_s   x1, SYS_ID_AA64MMFR3_EL1
+       ubfx    x1, x1, #ID_AA64MMFR3_EL1_S1PIE_SHIFT, #4
+       cbz     x1, .Lskip_indirection
+
+       mov_q   x0, PIE_E0
+       msr     REG_PIRE0_EL1, x0
+       mov_q   x0, PIE_E1
+       msr     REG_PIR_EL1, x0
+
+       mov     x0, TCR2_EL1x_PIE
+       msr     REG_TCR2_EL1, x0
+
+.Lskip_indirection:
+
        /*
         * Prepare SCTLR
         */
index 40ba954..19c23c4 100644 (file)
@@ -32,16 +32,20 @@ HAS_GENERIC_AUTH_IMP_DEF
 HAS_GIC_CPUIF_SYSREGS
 HAS_GIC_PRIO_MASKING
 HAS_GIC_PRIO_RELAXED_SYNC
+HAS_HCX
 HAS_LDAPR
 HAS_LSE_ATOMICS
+HAS_MOPS
 HAS_NESTED_VIRT
 HAS_NO_FPSIMD
 HAS_NO_HW_PREFETCH
 HAS_PAN
+HAS_S1PIE
 HAS_RAS_EXTN
 HAS_RNG
 HAS_SB
 HAS_STAGE2_FWB
+HAS_TCR2
 HAS_TIDCP1
 HAS_TLB_RANGE
 HAS_VIRT_HOST_EXTN
index 00c9e72..8525980 100755 (executable)
@@ -24,12 +24,12 @@ BEGIN {
 }
 
 /^[vA-Z0-9_]+$/ {
-       printf("#define ARM64_%-30s\t%d\n", $0, cap_num++)
+       printf("#define ARM64_%-40s\t%d\n", $0, cap_num++)
        next
 }
 
 END {
-       printf("#define ARM64_NCAPS\t\t\t\t%d\n", cap_num)
+       printf("#define ARM64_NCAPS\t\t\t\t\t%d\n", cap_num)
        print ""
        print "#endif /* __ASM_CPUCAPS_H */"
 }
index c9a0d1f..1ea4a3d 100644 (file)
 # feature that introduces them (eg, FEAT_LS64_ACCDATA introduces enumeration
 # item ACCDATA) though it may be more tasteful to do something else.
 
+Sysreg OSDTRRX_EL1     2       0       0       0       2
+Res0   63:32
+Field  31:0    DTRRX
+EndSysreg
+
+Sysreg MDCCINT_EL1     2       0       0       2       0
+Res0   63:31
+Field  30      RX
+Field  29      TX
+Res0   28:0
+EndSysreg
+
+Sysreg MDSCR_EL1       2       0       0       2       2
+Res0   63:36
+Field  35      EHBWE
+Field  34      EnSPM
+Field  33      TTA
+Field  32      EMBWE
+Field  31      TFO
+Field  30      RXfull
+Field  29      TXfull
+Res0   28
+Field  27      RXO
+Field  26      TXU
+Res0   25:24
+Field  23:22   INTdis
+Field  21      TDA
+Res0   20
+Field  19      SC2
+Res0   18:16
+Field  15      MDE
+Field  14      HDE
+Field  13      KDE
+Field  12      TDCC
+Res0   11:7
+Field  6       ERR
+Res0   5:1
+Field  0       SS
+EndSysreg
+
+Sysreg OSDTRTX_EL1     2       0       0       3       2
+Res0   63:32
+Field  31:0    DTRTX
+EndSysreg
+
+Sysreg OSECCR_EL1      2       0       0       6       2
+Res0   63:32
+Field  31:0    EDECCR
+EndSysreg
+
+Sysreg OSLAR_EL1       2       0       1       0       4
+Res0   63:1
+Field  0       OSLK
+EndSysreg
+
 Sysreg ID_PFR0_EL1     3       0       0       1       0
 Res0   63:32
 UnsignedEnum   31:28   RAS
@@ -1538,6 +1593,78 @@ UnsignedEnum     3:0     CnP
 EndEnum
 EndSysreg
 
+Sysreg ID_AA64MMFR3_EL1        3       0       0       7       3
+UnsignedEnum   63:60   Spec_FPACC
+       0b0000  NI
+       0b0001  IMP
+EndEnum
+UnsignedEnum   59:56   ADERR
+       0b0000  NI
+       0b0001  DEV_ASYNC
+       0b0010  FEAT_ADERR
+       0b0011  FEAT_ADERR_IND
+EndEnum
+UnsignedEnum   55:52   SDERR
+       0b0000  NI
+       0b0001  DEV_SYNC
+       0b0010  FEAT_ADERR
+       0b0011  FEAT_ADERR_IND
+EndEnum
+Res0   51:48
+UnsignedEnum   47:44   ANERR
+       0b0000  NI
+       0b0001  ASYNC
+       0b0010  FEAT_ANERR
+       0b0011  FEAT_ANERR_IND
+EndEnum
+UnsignedEnum   43:40   SNERR
+       0b0000  NI
+       0b0001  SYNC
+       0b0010  FEAT_ANERR
+       0b0011  FEAT_ANERR_IND
+EndEnum
+UnsignedEnum   39:36   D128_2
+       0b0000  NI
+       0b0001  IMP
+EndEnum
+UnsignedEnum   35:32   D128
+       0b0000  NI
+       0b0001  IMP
+EndEnum
+UnsignedEnum   31:28   MEC
+       0b0000  NI
+       0b0001  IMP
+EndEnum
+UnsignedEnum   27:24   AIE
+       0b0000  NI
+       0b0001  IMP
+EndEnum
+UnsignedEnum   23:20   S2POE
+       0b0000  NI
+       0b0001  IMP
+EndEnum
+UnsignedEnum   19:16   S1POE
+       0b0000  NI
+       0b0001  IMP
+EndEnum
+UnsignedEnum   15:12   S2PIE
+       0b0000  NI
+       0b0001  IMP
+EndEnum
+UnsignedEnum   11:8    S1PIE
+       0b0000  NI
+       0b0001  IMP
+EndEnum
+UnsignedEnum   7:4     SCTLRX
+       0b0000  NI
+       0b0001  IMP
+EndEnum
+UnsignedEnum   3:0     TCRX
+       0b0000  NI
+       0b0001  IMP
+EndEnum
+EndSysreg
+
 Sysreg SCTLR_EL1       3       0       1       0       0
 Field  63      TIDCP
 Field  62      SPINTMASK
@@ -2034,7 +2161,17 @@ Fields   ZCR_ELx
 EndSysreg
 
 Sysreg HCRX_EL2        3       4       1       2       2
-Res0   63:12
+Res0   63:23
+Field  22      GCSEn
+Field  21      EnIDCP128
+Field  20      EnSDERR
+Field  19      TMEA
+Field  18      EnSNERR
+Field  17      D128En
+Field  16      PTTWI
+Field  15      SCTLR2En
+Field  14      TCR2En
+Res0   13:12
 Field  11      MSCEn
 Field  10      MCE2
 Field  9       CMOW
@@ -2153,6 +2290,87 @@ Sysreg   TTBR1_EL1       3       0       2       0       1
 Fields TTBRx_EL1
 EndSysreg
 
+SysregFields   TCR2_EL1x
+Res0   63:16
+Field  15      DisCH1
+Field  14      DisCH0
+Res0   13:12
+Field  11      HAFT
+Field  10      PTTWI
+Res0   9:6
+Field  5       D128
+Field  4       AIE
+Field  3       POE
+Field  2       E0POE
+Field  1       PIE
+Field  0       PnCH
+EndSysregFields
+
+Sysreg TCR2_EL1        3       0       2       0       3
+Fields TCR2_EL1x
+EndSysreg
+
+Sysreg TCR2_EL12       3       5       2       0       3
+Fields TCR2_EL1x
+EndSysreg
+
+Sysreg TCR2_EL2        3       4       2       0       3
+Res0   63:16
+Field  15      DisCH1
+Field  14      DisCH0
+Field  13      AMEC1
+Field  12      AMEC0
+Field  11      HAFT
+Field  10      PTTWI
+Field  9:8     SKL1
+Field  7:6     SKL0
+Field  5       D128
+Field  4       AIE
+Field  3       POE
+Field  2       E0POE
+Field  1       PIE
+Field  0       PnCH
+EndSysreg
+
+SysregFields PIRx_ELx
+Field  63:60   Perm15
+Field  59:56   Perm14
+Field  55:52   Perm13
+Field  51:48   Perm12
+Field  47:44   Perm11
+Field  43:40   Perm10
+Field  39:36   Perm9
+Field  35:32   Perm8
+Field  31:28   Perm7
+Field  27:24   Perm6
+Field  23:20   Perm5
+Field  19:16   Perm4
+Field  15:12   Perm3
+Field  11:8    Perm2
+Field  7:4     Perm1
+Field  3:0     Perm0
+EndSysregFields
+
+Sysreg PIRE0_EL1       3       0       10      2       2
+Fields PIRx_ELx
+EndSysreg
+
+Sysreg PIRE0_EL12      3       5       10      2       2
+Fields PIRx_ELx
+EndSysreg
+
+Sysreg PIR_EL1         3       0       10      2       3
+Fields PIRx_ELx
+EndSysreg
+
+Sysreg PIR_EL12        3       5       10      2       3
+Fields PIRx_ELx
+EndSysreg
+
+Sysreg PIR_EL2         3       4       10      2       3
+Fields PIRx_ELx
+EndSysreg
+
 Sysreg LORSA_EL1       3       0       10      4       0
 Res0   63:52
 Field  51:16   SA
@@ -2200,3 +2418,80 @@ Sysreg   ICC_NMIAR1_EL1  3       0       12      9       5
 Res0   63:24
 Field  23:0    INTID
 EndSysreg
+
+Sysreg TRBLIMITR_EL1   3       0       9       11      0
+Field  63:12   LIMIT
+Res0   11:7
+Field  6       XE
+Field  5       nVM
+Enum   4:3     TM
+       0b00    STOP
+       0b01    IRQ
+       0b11    IGNR
+EndEnum
+Enum   2:1     FM
+       0b00    FILL
+       0b01    WRAP
+       0b11    CBUF
+EndEnum
+Field  0       E
+EndSysreg
+
+Sysreg TRBPTR_EL1      3       0       9       11      1
+Field  63:0    PTR
+EndSysreg
+
+Sysreg TRBBASER_EL1    3       0       9       11      2
+Field  63:12   BASE
+Res0   11:0
+EndSysreg
+
+Sysreg TRBSR_EL1       3       0       9       11      3
+Res0   63:56
+Field  55:32   MSS2
+Field  31:26   EC
+Res0   25:24
+Field  23      DAT
+Field  22      IRQ
+Field  21      TRG
+Field  20      WRAP
+Res0   19
+Field  18      EA
+Field  17      S
+Res0   16
+Field  15:0    MSS
+EndSysreg
+
+Sysreg TRBMAR_EL1      3       0       9       11      4
+Res0   63:12
+Enum   11:10   PAS
+       0b00    SECURE
+       0b01    NON_SECURE
+       0b10    ROOT
+       0b11    REALM
+EndEnum
+Enum   9:8     SH
+       0b00    NON_SHAREABLE
+       0b10    OUTER_SHAREABLE
+       0b11    INNER_SHAREABLE
+EndEnum
+Field  7:0     Attr
+EndSysreg
+
+Sysreg TRBTRG_EL1      3       0       9       11      6
+Res0   63:32
+Field  31:0    TRG
+EndSysreg
+
+Sysreg TRBIDR_EL1      3       0       9       11      7
+Res0   63:12
+Enum   11:8    EA
+       0b0000  NON_DESC
+       0b0001  IGNORE
+       0b0010  SERROR
+EndEnum
+Res0   7:6
+Field  5       F
+Field  4       P
+Field  3:0     Align
+EndSysreg
index 4df1f8c..95f1e9b 100644 (file)
@@ -96,6 +96,7 @@ config CSKY
        select HAVE_REGS_AND_STACK_ACCESS_API
        select HAVE_STACKPROTECTOR
        select HAVE_SYSCALL_TRACEPOINTS
+       select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
        select MAY_HAVE_SPARSE_IRQ
        select MODULES_USE_ELF_RELA if MODULES
        select OF
index 60406ef..4dab44f 100644 (file)
@@ -195,41 +195,6 @@ arch_atomic_dec_if_positive(atomic_t *v)
 }
 #define arch_atomic_dec_if_positive arch_atomic_dec_if_positive
 
-#define ATOMIC_OP()                                                    \
-static __always_inline                                                 \
-int arch_atomic_xchg_relaxed(atomic_t *v, int n)                       \
-{                                                                      \
-       return __xchg_relaxed(n, &(v->counter), 4);                     \
-}                                                                      \
-static __always_inline                                                 \
-int arch_atomic_cmpxchg_relaxed(atomic_t *v, int o, int n)             \
-{                                                                      \
-       return __cmpxchg_relaxed(&(v->counter), o, n, 4);               \
-}                                                                      \
-static __always_inline                                                 \
-int arch_atomic_cmpxchg_acquire(atomic_t *v, int o, int n)             \
-{                                                                      \
-       return __cmpxchg_acquire(&(v->counter), o, n, 4);               \
-}                                                                      \
-static __always_inline                                                 \
-int arch_atomic_cmpxchg(atomic_t *v, int o, int n)                     \
-{                                                                      \
-       return __cmpxchg(&(v->counter), o, n, 4);                       \
-}
-
-#define ATOMIC_OPS()                                                   \
-       ATOMIC_OP()
-
-ATOMIC_OPS()
-
-#define arch_atomic_xchg_relaxed       arch_atomic_xchg_relaxed
-#define arch_atomic_cmpxchg_relaxed    arch_atomic_cmpxchg_relaxed
-#define arch_atomic_cmpxchg_acquire    arch_atomic_cmpxchg_acquire
-#define arch_atomic_cmpxchg            arch_atomic_cmpxchg
-
-#undef ATOMIC_OPS
-#undef ATOMIC_OP
-
 #else
 #include <asm-generic/atomic.h>
 #endif
index 668b79c..d3db334 100644 (file)
@@ -23,7 +23,7 @@ void __init set_send_ipi(void (*func)(const struct cpumask *mask), int irq);
 
 int __cpu_disable(void);
 
-void __cpu_die(unsigned int cpu);
+static inline void __cpu_die(unsigned int cpu) { }
 
 #endif /* CONFIG_SMP */
 
index b12e2c3..8e42352 100644 (file)
@@ -291,12 +291,8 @@ int __cpu_disable(void)
        return 0;
 }
 
-void __cpu_die(unsigned int cpu)
+void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
 {
-       if (!cpu_wait_death(cpu, 5)) {
-               pr_crit("CPU%u: shutdown failed\n", cpu);
-               return;
-       }
        pr_notice("CPU%u: shutdown\n", cpu);
 }
 
@@ -304,7 +300,7 @@ void __noreturn arch_cpu_idle_dead(void)
 {
        idle_task_exit();
 
-       cpu_report_death();
+       cpuhp_ap_report_dead();
 
        while (!secondary_stack)
                arch_cpu_idle();
index 6e94f8d..2447d08 100644 (file)
@@ -28,58 +28,8 @@ static inline void arch_atomic_set(atomic_t *v, int new)
 
 #define arch_atomic_set_release(v, i)  arch_atomic_set((v), (i))
 
-/**
- * arch_atomic_read - reads a word, atomically
- * @v: pointer to atomic value
- *
- * Assumes all word reads on our architecture are atomic.
- */
 #define arch_atomic_read(v)            READ_ONCE((v)->counter)
 
-/**
- * arch_atomic_xchg - atomic
- * @v: pointer to memory to change
- * @new: new value (technically passed in a register -- see xchg)
- */
-#define arch_atomic_xchg(v, new)       (arch_xchg(&((v)->counter), (new)))
-
-
-/**
- * arch_atomic_cmpxchg - atomic compare-and-exchange values
- * @v: pointer to value to change
- * @old:  desired old value to match
- * @new:  new value to put in
- *
- * Parameters are then pointer, value-in-register, value-in-register,
- * and the output is the old value.
- *
- * Apparently this is complicated for archs that don't support
- * the memw_locked like we do (or it's broken or whatever).
- *
- * Kind of the lynchpin of the rest of the generically defined routines.
- * Remember V2 had that bug with dotnew predicate set by memw_locked.
- *
- * "old" is "expected" old val, __oldval is actual old value
- */
-static inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new)
-{
-       int __oldval;
-
-       asm volatile(
-               "1:     %0 = memw_locked(%1);\n"
-               "       { P0 = cmp.eq(%0,%2);\n"
-               "         if (!P0.new) jump:nt 2f; }\n"
-               "       memw_locked(%1,P0) = %3;\n"
-               "       if (!P0) jump 1b;\n"
-               "2:\n"
-               : "=&r" (__oldval)
-               : "r" (&v->counter), "r" (old), "r" (new)
-               : "memory", "p0"
-       );
-
-       return __oldval;
-}
-
 #define ATOMIC_OP(op)                                                  \
 static inline void arch_atomic_##op(int i, atomic_t *v)                        \
 {                                                                      \
@@ -135,6 +85,11 @@ static inline int arch_atomic_fetch_##op(int i, atomic_t *v)                \
 ATOMIC_OPS(add)
 ATOMIC_OPS(sub)
 
+#define arch_atomic_add_return                 arch_atomic_add_return
+#define arch_atomic_sub_return                 arch_atomic_sub_return
+#define arch_atomic_fetch_add                  arch_atomic_fetch_add
+#define arch_atomic_fetch_sub                  arch_atomic_fetch_sub
+
 #undef ATOMIC_OPS
 #define ATOMIC_OPS(op) ATOMIC_OP(op) ATOMIC_FETCH_OP(op)
 
@@ -142,21 +97,15 @@ ATOMIC_OPS(and)
 ATOMIC_OPS(or)
 ATOMIC_OPS(xor)
 
+#define arch_atomic_fetch_and                  arch_atomic_fetch_and
+#define arch_atomic_fetch_or                   arch_atomic_fetch_or
+#define arch_atomic_fetch_xor                  arch_atomic_fetch_xor
+
 #undef ATOMIC_OPS
 #undef ATOMIC_FETCH_OP
 #undef ATOMIC_OP_RETURN
 #undef ATOMIC_OP
 
-/**
- * arch_atomic_fetch_add_unless - add unless the number is a given value
- * @v: pointer to value
- * @a: amount to add
- * @u: unless value is equal to u
- *
- * Returns old value.
- *
- */
-
 static inline int arch_atomic_fetch_add_unless(atomic_t *v, int a, int u)
 {
        int __oldval;
index 21fa63c..2cd93e6 100644 (file)
@@ -9,6 +9,7 @@ menu "Processor type and features"
 config IA64
        bool
        select ARCH_BINFMT_ELF_EXTRA_PHDRS
+       select ARCH_HAS_CPU_FINALIZE_INIT
        select ARCH_HAS_DMA_MARK_CLEAN
        select ARCH_HAS_STRNCPY_FROM_USER
        select ARCH_HAS_STRNLEN_USER
index 266c429..6540a62 100644 (file)
@@ -207,13 +207,6 @@ ATOMIC64_FETCH_OP(xor, ^)
 #undef ATOMIC64_FETCH_OP
 #undef ATOMIC64_OP
 
-#define arch_atomic_cmpxchg(v, old, new) (arch_cmpxchg(&((v)->counter), old, new))
-#define arch_atomic_xchg(v, new) (arch_xchg(&((v)->counter), new))
-
-#define arch_atomic64_cmpxchg(v, old, new) \
-       (arch_cmpxchg(&((v)->counter), old, new))
-#define arch_atomic64_xchg(v, new) (arch_xchg(&((v)->counter), new))
-
 #define arch_atomic_add(i,v)           (void)arch_atomic_add_return((i), (v))
 #define arch_atomic_sub(i,v)           (void)arch_atomic_sub_return((i), (v))
 
diff --git a/arch/ia64/include/asm/bugs.h b/arch/ia64/include/asm/bugs.h
deleted file mode 100644 (file)
index 0d6b9bd..0000000
+++ /dev/null
@@ -1,20 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * This is included by init/main.c to check for architecture-dependent bugs.
- *
- * Needs:
- *     void check_bugs(void);
- *
- * Based on <asm-alpha/bugs.h>.
- *
- * Modified 1998, 1999, 2003
- *     David Mosberger-Tang <davidm@hpl.hp.com>,  Hewlett-Packard Co.
- */
-#ifndef _ASM_IA64_BUGS_H
-#define _ASM_IA64_BUGS_H
-
-#include <asm/processor.h>
-
-extern void check_bugs (void);
-
-#endif /* _ASM_IA64_BUGS_H */
index c057280..5a55ac8 100644 (file)
@@ -627,7 +627,7 @@ setup_arch (char **cmdline_p)
         * is physical disk 1 partition 1 and the Linux root disk is
         * physical disk 1 partition 2.
         */
-       ROOT_DEV = Root_SDA2;           /* default to second partition on first drive */
+       ROOT_DEV = MKDEV(SCSI_DISK0_MAJOR, 2);
 
        if (is_uv_system())
                uv_setup(cmdline_p);
@@ -1067,8 +1067,7 @@ cpu_init (void)
        }
 }
 
-void __init
-check_bugs (void)
+void __init arch_cpu_finalize_init(void)
 {
        ia64_patch_mckinley_e9((unsigned long) __start___mckinley_e9_bundles,
                               (unsigned long) __end___mckinley_e9_bundles);
index d38b066..cbab4f9 100644 (file)
@@ -10,6 +10,7 @@ config LOONGARCH
        select ARCH_ENABLE_MEMORY_HOTPLUG
        select ARCH_ENABLE_MEMORY_HOTREMOVE
        select ARCH_HAS_ACPI_TABLE_UPGRADE      if ACPI
+       select ARCH_HAS_CPU_FINALIZE_INIT
        select ARCH_HAS_FORTIFY_SOURCE
        select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
        select ARCH_HAS_PTE_SPECIAL
index 6b9aca9..e27f0c7 100644 (file)
 
 #define ATOMIC_INIT(i)   { (i) }
 
-/*
- * arch_atomic_read - read atomic variable
- * @v: pointer of type atomic_t
- *
- * Atomically reads the value of @v.
- */
 #define arch_atomic_read(v)    READ_ONCE((v)->counter)
-
-/*
- * arch_atomic_set - set atomic variable
- * @v: pointer of type atomic_t
- * @i: required value
- *
- * Atomically sets the value of @v to @i.
- */
 #define arch_atomic_set(v, i)  WRITE_ONCE((v)->counter, (i))
 
 #define ATOMIC_OP(op, I, asm_op)                                       \
@@ -139,14 +125,6 @@ static inline int arch_atomic_fetch_add_unless(atomic_t *v, int a, int u)
 }
 #define arch_atomic_fetch_add_unless arch_atomic_fetch_add_unless
 
-/*
- * arch_atomic_sub_if_positive - conditionally subtract integer from atomic variable
- * @i: integer value to subtract
- * @v: pointer of type atomic_t
- *
- * Atomically test @v and subtract @i if @v is greater or equal than @i.
- * The function returns the old value of @v minus @i.
- */
 static inline int arch_atomic_sub_if_positive(int i, atomic_t *v)
 {
        int result;
@@ -181,31 +159,13 @@ static inline int arch_atomic_sub_if_positive(int i, atomic_t *v)
        return result;
 }
 
-#define arch_atomic_cmpxchg(v, o, n) (arch_cmpxchg(&((v)->counter), (o), (n)))
-#define arch_atomic_xchg(v, new) (arch_xchg(&((v)->counter), (new)))
-
-/*
- * arch_atomic_dec_if_positive - decrement by 1 if old value positive
- * @v: pointer of type atomic_t
- */
 #define arch_atomic_dec_if_positive(v) arch_atomic_sub_if_positive(1, v)
 
 #ifdef CONFIG_64BIT
 
 #define ATOMIC64_INIT(i)    { (i) }
 
-/*
- * arch_atomic64_read - read atomic variable
- * @v: pointer of type atomic64_t
- *
- */
 #define arch_atomic64_read(v)  READ_ONCE((v)->counter)
-
-/*
- * arch_atomic64_set - set atomic variable
- * @v: pointer of type atomic64_t
- * @i: required value
- */
 #define arch_atomic64_set(v, i)        WRITE_ONCE((v)->counter, (i))
 
 #define ATOMIC64_OP(op, I, asm_op)                                     \
@@ -300,14 +260,6 @@ static inline long arch_atomic64_fetch_add_unless(atomic64_t *v, long a, long u)
 }
 #define arch_atomic64_fetch_add_unless arch_atomic64_fetch_add_unless
 
-/*
- * arch_atomic64_sub_if_positive - conditionally subtract integer from atomic variable
- * @i: integer value to subtract
- * @v: pointer of type atomic64_t
- *
- * Atomically test @v and subtract @i if @v is greater or equal than @i.
- * The function returns the old value of @v minus @i.
- */
 static inline long arch_atomic64_sub_if_positive(long i, atomic64_t *v)
 {
        long result;
@@ -342,14 +294,6 @@ static inline long arch_atomic64_sub_if_positive(long i, atomic64_t *v)
        return result;
 }
 
-#define arch_atomic64_cmpxchg(v, o, n) \
-       ((__typeof__((v)->counter))arch_cmpxchg(&((v)->counter), (o), (n)))
-#define arch_atomic64_xchg(v, new) (arch_xchg(&((v)->counter), (new)))
-
-/*
- * arch_atomic64_dec_if_positive - decrement by 1 if old value positive
- * @v: pointer of type atomic64_t
- */
 #define arch_atomic64_dec_if_positive(v)       arch_atomic64_sub_if_positive(1, v)
 
 #endif /* CONFIG_64BIT */
diff --git a/arch/loongarch/include/asm/bugs.h b/arch/loongarch/include/asm/bugs.h
deleted file mode 100644 (file)
index 9839653..0000000
+++ /dev/null
@@ -1,15 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * This is included by init/main.c to check for architecture-dependent bugs.
- *
- * Copyright (C) 2020-2022 Loongson Technology Corporation Limited
- */
-#ifndef _ASM_BUGS_H
-#define _ASM_BUGS_H
-
-#include <asm/cpu.h>
-#include <asm/cpu-info.h>
-
-extern void check_bugs(void);
-
-#endif /* _ASM_BUGS_H */
index b3323ab..1c2a0a2 100644 (file)
@@ -1167,7 +1167,7 @@ static __always_inline void iocsr_write64(u64 val, u32 reg)
 
 #ifndef __ASSEMBLY__
 
-static inline u64 drdtime(void)
+static __always_inline u64 drdtime(void)
 {
        int rID = 0;
        u64 val = 0;
@@ -1496,7 +1496,7 @@ __BUILD_CSR_OP(tlbidx)
 #define write_fcsr(dest, val) \
 do {   \
        __asm__ __volatile__(   \
-       "       movgr2fcsr      %0, "__stringify(dest)" \n"     \
+       "       movgr2fcsr      "__stringify(dest)", %0 \n"     \
        : : "r" (val)); \
 } while (0)
 
index 8b98d22..de46a6b 100644 (file)
 #define        _PAGE_PFN_SHIFT         12
 #define        _PAGE_SWP_EXCLUSIVE_SHIFT 23
 #define        _PAGE_PFN_END_SHIFT     48
+#define        _PAGE_PRESENT_INVALID_SHIFT 60
 #define        _PAGE_NO_READ_SHIFT     61
 #define        _PAGE_NO_EXEC_SHIFT     62
 #define        _PAGE_RPLV_SHIFT        63
 
 /* Used by software */
 #define _PAGE_PRESENT          (_ULCAST_(1) << _PAGE_PRESENT_SHIFT)
+#define _PAGE_PRESENT_INVALID  (_ULCAST_(1) << _PAGE_PRESENT_INVALID_SHIFT)
 #define _PAGE_WRITE            (_ULCAST_(1) << _PAGE_WRITE_SHIFT)
 #define _PAGE_ACCESSED         (_ULCAST_(1) << _PAGE_ACCESSED_SHIFT)
 #define _PAGE_MODIFIED         (_ULCAST_(1) << _PAGE_MODIFIED_SHIFT)
index d28fb9d..9a9f9ff 100644 (file)
@@ -213,7 +213,7 @@ static inline int pmd_bad(pmd_t pmd)
 static inline int pmd_present(pmd_t pmd)
 {
        if (unlikely(pmd_val(pmd) & _PAGE_HUGE))
-               return !!(pmd_val(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE));
+               return !!(pmd_val(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PRESENT_INVALID));
 
        return pmd_val(pmd) != (unsigned long)invalid_pte_table;
 }
@@ -558,6 +558,7 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 
 static inline pmd_t pmd_mkinvalid(pmd_t pmd)
 {
+       pmd_val(pmd) |= _PAGE_PRESENT_INVALID;
        pmd_val(pmd) &= ~(_PAGE_PRESENT | _PAGE_VALID | _PAGE_DIRTY | _PAGE_PROTNONE);
 
        return pmd;
index 2406c95..021b59c 100644 (file)
@@ -396,6 +396,8 @@ int hw_breakpoint_arch_parse(struct perf_event *bp,
 
        if (hw->ctrl.type != LOONGARCH_BREAKPOINT_EXECUTE)
                alignment_mask = 0x7;
+       else
+               alignment_mask = 0x3;
        offset = hw->address & alignment_mask;
 
        hw->address &= ~alignment_mask;
index ff28f99..0491bf4 100644 (file)
@@ -271,7 +271,7 @@ static void loongarch_pmu_enable_event(struct hw_perf_event *evt, int idx)
        WARN_ON(idx < 0 || idx >= loongarch_pmu.num_counters);
 
        /* Make sure interrupt enabled. */
-       cpuc->saved_ctrl[idx] = M_PERFCTL_EVENT(evt->event_base & 0xff) |
+       cpuc->saved_ctrl[idx] = M_PERFCTL_EVENT(evt->event_base) |
                (evt->config_base & M_PERFCTL_CONFIG_MASK) | CSR_PERFCTRL_IE;
 
        cpu = (event->cpu >= 0) ? event->cpu : smp_processor_id();
@@ -594,7 +594,7 @@ static struct pmu pmu = {
 
 static unsigned int loongarch_pmu_perf_event_encode(const struct loongarch_perf_event *pev)
 {
-       return (pev->event_id & 0xff);
+       return M_PERFCTL_EVENT(pev->event_id);
 }
 
 static const struct loongarch_perf_event *loongarch_pmu_map_general_event(int idx)
@@ -849,7 +849,7 @@ static void resume_local_counters(void)
 
 static const struct loongarch_perf_event *loongarch_pmu_map_raw_event(u64 config)
 {
-       raw_event.event_id = config & 0xff;
+       raw_event.event_id = M_PERFCTL_EVENT(config);
 
        return &raw_event;
 }
index 4444b13..78a0035 100644 (file)
@@ -12,6 +12,7 @@
  */
 #include <linux/init.h>
 #include <linux/acpi.h>
+#include <linux/cpu.h>
 #include <linux/dmi.h>
 #include <linux/efi.h>
 #include <linux/export.h>
@@ -37,7 +38,6 @@
 #include <asm/addrspace.h>
 #include <asm/alternative.h>
 #include <asm/bootinfo.h>
-#include <asm/bugs.h>
 #include <asm/cache.h>
 #include <asm/cpu.h>
 #include <asm/dma.h>
@@ -87,7 +87,7 @@ const char *get_system_type(void)
        return "generic-loongson-machine";
 }
 
-void __init check_bugs(void)
+void __init arch_cpu_finalize_init(void)
 {
        alternative_instructions();
 }
index f377e50..c189e03 100644 (file)
@@ -190,9 +190,9 @@ static u64 read_const_counter(struct clocksource *clk)
        return drdtime();
 }
 
-static u64 native_sched_clock(void)
+static noinstr u64 sched_clock_read(void)
 {
-       return read_const_counter(NULL);
+       return drdtime();
 }
 
 static struct clocksource clocksource_const = {
@@ -211,7 +211,7 @@ int __init constant_clocksource_init(void)
 
        res = clocksource_register_hz(&clocksource_const, freq);
 
-       sched_clock_register(native_sched_clock, 64, freq);
+       sched_clock_register(sched_clock_read, 64, freq);
 
        pr_info("Constant clock source device register\n");
 
index bdff825..85fae3d 100644 (file)
@@ -485,7 +485,7 @@ static int __init debugfs_unaligned(void)
        struct dentry *d;
 
        d = debugfs_create_dir("loongarch", NULL);
-       if (!d)
+       if (IS_ERR_OR_NULL(d))
                return -ENOMEM;
 
        debugfs_create_u32("unaligned_instructions_user",
index 40198a1..dc792b3 100644 (file)
@@ -4,6 +4,7 @@ config M68K
        default y
        select ARCH_32BIT_OFF_T
        select ARCH_HAS_BINFMT_FLAT
+       select ARCH_HAS_CPU_FINALIZE_INIT if MMU
        select ARCH_HAS_CURRENT_STACK_POINTER
        select ARCH_HAS_DMA_PREP_COHERENT if HAS_DMA && MMU && !COLDFIRE
        select ARCH_HAS_SYNC_DMA_FOR_DEVICE if HAS_DMA
index b26469a..62fdca7 100644 (file)
@@ -43,6 +43,7 @@ CONFIG_IOSCHED_BFQ=m
 CONFIG_BINFMT_MISC=m
 CONFIG_SLAB=y
 # CONFIG_COMPACTION is not set
+CONFIG_DMAPOOL_TEST=m
 CONFIG_USERFAULTFD=y
 CONFIG_NET=y
 CONFIG_PACKET=y
@@ -454,7 +455,6 @@ CONFIG_OCFS2_FS=m
 # CONFIG_OCFS2_DEBUG_MASKLOG is not set
 CONFIG_FANOTIFY=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
-# CONFIG_PRINT_QUOTA_WARNING is not set
 CONFIG_AUTOFS_FS=m
 CONFIG_FUSE_FS=m
 CONFIG_CUSE=m
index 944a49a..5bfbd04 100644 (file)
@@ -39,6 +39,7 @@ CONFIG_IOSCHED_BFQ=m
 CONFIG_BINFMT_MISC=m
 CONFIG_SLAB=y
 # CONFIG_COMPACTION is not set
+CONFIG_DMAPOOL_TEST=m
 CONFIG_USERFAULTFD=y
 CONFIG_NET=y
 CONFIG_PACKET=y
@@ -411,7 +412,6 @@ CONFIG_OCFS2_FS=m
 # CONFIG_OCFS2_DEBUG_MASKLOG is not set
 CONFIG_FANOTIFY=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
-# CONFIG_PRINT_QUOTA_WARNING is not set
 CONFIG_AUTOFS_FS=m
 CONFIG_FUSE_FS=m
 CONFIG_CUSE=m
index a32dd88..44302f1 100644 (file)
@@ -46,6 +46,7 @@ CONFIG_IOSCHED_BFQ=m
 CONFIG_BINFMT_MISC=m
 CONFIG_SLAB=y
 # CONFIG_COMPACTION is not set
+CONFIG_DMAPOOL_TEST=m
 CONFIG_USERFAULTFD=y
 CONFIG_NET=y
 CONFIG_PACKET=y
@@ -431,7 +432,6 @@ CONFIG_OCFS2_FS=m
 # CONFIG_OCFS2_DEBUG_MASKLOG is not set
 CONFIG_FANOTIFY=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
-# CONFIG_PRINT_QUOTA_WARNING is not set
 CONFIG_AUTOFS_FS=m
 CONFIG_FUSE_FS=m
 CONFIG_CUSE=m
index 23b7805..f3336f1 100644 (file)
@@ -36,6 +36,7 @@ CONFIG_IOSCHED_BFQ=m
 CONFIG_BINFMT_MISC=m
 CONFIG_SLAB=y
 # CONFIG_COMPACTION is not set
+CONFIG_DMAPOOL_TEST=m
 CONFIG_USERFAULTFD=y
 CONFIG_NET=y
 CONFIG_PACKET=y
@@ -403,7 +404,6 @@ CONFIG_OCFS2_FS=m
 # CONFIG_OCFS2_DEBUG_MASKLOG is not set
 CONFIG_FANOTIFY=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
-# CONFIG_PRINT_QUOTA_WARNING is not set
 CONFIG_AUTOFS_FS=m
 CONFIG_FUSE_FS=m
 CONFIG_CUSE=m
index 5605ab5..2d1bbac 100644 (file)
@@ -38,6 +38,7 @@ CONFIG_IOSCHED_BFQ=m
 CONFIG_BINFMT_MISC=m
 CONFIG_SLAB=y
 # CONFIG_COMPACTION is not set
+CONFIG_DMAPOOL_TEST=m
 CONFIG_USERFAULTFD=y
 CONFIG_NET=y
 CONFIG_PACKET=y
@@ -413,7 +414,6 @@ CONFIG_OCFS2_FS=m
 # CONFIG_OCFS2_DEBUG_MASKLOG is not set
 CONFIG_FANOTIFY=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
-# CONFIG_PRINT_QUOTA_WARNING is not set
 CONFIG_AUTOFS_FS=m
 CONFIG_FUSE_FS=m
 CONFIG_CUSE=m
index d0d1f9c..b4428dc 100644 (file)
@@ -37,6 +37,7 @@ CONFIG_IOSCHED_BFQ=m
 CONFIG_BINFMT_MISC=m
 CONFIG_SLAB=y
 # CONFIG_COMPACTION is not set
+CONFIG_DMAPOOL_TEST=m
 CONFIG_USERFAULTFD=y
 CONFIG_NET=y
 CONFIG_PACKET=y
@@ -433,7 +434,6 @@ CONFIG_OCFS2_FS=m
 # CONFIG_OCFS2_DEBUG_MASKLOG is not set
 CONFIG_FANOTIFY=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
-# CONFIG_PRINT_QUOTA_WARNING is not set
 CONFIG_AUTOFS_FS=m
 CONFIG_FUSE_FS=m
 CONFIG_CUSE=m
index 6d04314..4cd9fa4 100644 (file)
@@ -57,6 +57,7 @@ CONFIG_IOSCHED_BFQ=m
 CONFIG_BINFMT_MISC=m
 CONFIG_SLAB=y
 # CONFIG_COMPACTION is not set
+CONFIG_DMAPOOL_TEST=m
 CONFIG_USERFAULTFD=y
 CONFIG_NET=y
 CONFIG_PACKET=y
@@ -519,7 +520,6 @@ CONFIG_OCFS2_FS=m
 # CONFIG_OCFS2_DEBUG_MASKLOG is not set
 CONFIG_FANOTIFY=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
-# CONFIG_PRINT_QUOTA_WARNING is not set
 CONFIG_AUTOFS_FS=m
 CONFIG_FUSE_FS=m
 CONFIG_CUSE=m
index e6f5ae5..7ee9ad5 100644 (file)
@@ -35,6 +35,7 @@ CONFIG_IOSCHED_BFQ=m
 CONFIG_BINFMT_MISC=m
 CONFIG_SLAB=y
 # CONFIG_COMPACTION is not set
+CONFIG_DMAPOOL_TEST=m
 CONFIG_USERFAULTFD=y
 CONFIG_NET=y
 CONFIG_PACKET=y
@@ -402,7 +403,6 @@ CONFIG_OCFS2_FS=m
 # CONFIG_OCFS2_DEBUG_MASKLOG is not set
 CONFIG_FANOTIFY=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
-# CONFIG_PRINT_QUOTA_WARNING is not set
 CONFIG_AUTOFS_FS=m
 CONFIG_FUSE_FS=m
 CONFIG_CUSE=m
index f2d4dff..2488893 100644 (file)
@@ -36,6 +36,7 @@ CONFIG_IOSCHED_BFQ=m
 CONFIG_BINFMT_MISC=m
 CONFIG_SLAB=y
 # CONFIG_COMPACTION is not set
+CONFIG_DMAPOOL_TEST=m
 CONFIG_USERFAULTFD=y
 CONFIG_NET=y
 CONFIG_PACKET=y
@@ -403,7 +404,6 @@ CONFIG_OCFS2_FS=m
 # CONFIG_OCFS2_DEBUG_MASKLOG is not set
 CONFIG_FANOTIFY=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
-# CONFIG_PRINT_QUOTA_WARNING is not set
 CONFIG_AUTOFS_FS=m
 CONFIG_FUSE_FS=m
 CONFIG_CUSE=m
index 907eede..ffc6762 100644 (file)
@@ -37,6 +37,7 @@ CONFIG_IOSCHED_BFQ=m
 CONFIG_BINFMT_MISC=m
 CONFIG_SLAB=y
 # CONFIG_COMPACTION is not set
+CONFIG_DMAPOOL_TEST=m
 CONFIG_USERFAULTFD=y
 CONFIG_NET=y
 CONFIG_PACKET=y
@@ -420,7 +421,6 @@ CONFIG_OCFS2_FS=m
 # CONFIG_OCFS2_DEBUG_MASKLOG is not set
 CONFIG_FANOTIFY=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
-# CONFIG_PRINT_QUOTA_WARNING is not set
 CONFIG_AUTOFS_FS=m
 CONFIG_FUSE_FS=m
 CONFIG_CUSE=m
index 9e3d470..1981796 100644 (file)
@@ -402,7 +402,6 @@ CONFIG_OCFS2_FS=m
 # CONFIG_OCFS2_DEBUG_MASKLOG is not set
 CONFIG_FANOTIFY=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
-# CONFIG_PRINT_QUOTA_WARNING is not set
 CONFIG_AUTOFS_FS=m
 CONFIG_FUSE_FS=m
 CONFIG_CUSE=m
index f654007..85364f6 100644 (file)
@@ -33,6 +33,7 @@ CONFIG_IOSCHED_BFQ=m
 CONFIG_BINFMT_MISC=m
 CONFIG_SLAB=y
 # CONFIG_COMPACTION is not set
+CONFIG_DMAPOOL_TEST=m
 CONFIG_USERFAULTFD=y
 CONFIG_NET=y
 CONFIG_PACKET=y
@@ -401,7 +402,6 @@ CONFIG_OCFS2_FS=m
 # CONFIG_OCFS2_DEBUG_MASKLOG is not set
 CONFIG_FANOTIFY=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
-# CONFIG_PRINT_QUOTA_WARNING is not set
 CONFIG_AUTOFS_FS=m
 CONFIG_FUSE_FS=m
 CONFIG_CUSE=m
index 8059bd6..311b57e 100644 (file)
@@ -24,8 +24,6 @@ CONFIG_SUN_PARTITION=y
 CONFIG_SYSV68_PARTITION=y
 CONFIG_NET=y
 CONFIG_PACKET=y
-CONFIG_UNIX=y
-CONFIG_INET=y
 CONFIG_IP_PNP=y
 CONFIG_IP_PNP_DHCP=y
 CONFIG_IP_PNP_BOOTP=y
index cfba83d..4bfbc25 100644 (file)
@@ -106,6 +106,11 @@ static inline int arch_atomic_fetch_##op(int i, atomic_t * v)              \
 ATOMIC_OPS(add, +=, add)
 ATOMIC_OPS(sub, -=, sub)
 
+#define arch_atomic_add_return                 arch_atomic_add_return
+#define arch_atomic_sub_return                 arch_atomic_sub_return
+#define arch_atomic_fetch_add                  arch_atomic_fetch_add
+#define arch_atomic_fetch_sub                  arch_atomic_fetch_sub
+
 #undef ATOMIC_OPS
 #define ATOMIC_OPS(op, c_op, asm_op)                                   \
        ATOMIC_OP(op, c_op, asm_op)                                     \
@@ -115,6 +120,10 @@ ATOMIC_OPS(and, &=, and)
 ATOMIC_OPS(or, |=, or)
 ATOMIC_OPS(xor, ^=, eor)
 
+#define arch_atomic_fetch_and                  arch_atomic_fetch_and
+#define arch_atomic_fetch_or                   arch_atomic_fetch_or
+#define arch_atomic_fetch_xor                  arch_atomic_fetch_xor
+
 #undef ATOMIC_OPS
 #undef ATOMIC_FETCH_OP
 #undef ATOMIC_OP_RETURN
@@ -158,12 +167,7 @@ static inline int arch_atomic_inc_and_test(atomic_t *v)
 }
 #define arch_atomic_inc_and_test arch_atomic_inc_and_test
 
-#ifdef CONFIG_RMW_INSNS
-
-#define arch_atomic_cmpxchg(v, o, n) ((int)arch_cmpxchg(&((v)->counter), (o), (n)))
-#define arch_atomic_xchg(v, new) (arch_xchg(&((v)->counter), new))
-
-#else /* !CONFIG_RMW_INSNS */
+#ifndef CONFIG_RMW_INSNS
 
 static inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new)
 {
@@ -177,6 +181,7 @@ static inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new)
        local_irq_restore(flags);
        return prev;
 }
+#define arch_atomic_cmpxchg arch_atomic_cmpxchg
 
 static inline int arch_atomic_xchg(atomic_t *v, int new)
 {
@@ -189,6 +194,7 @@ static inline int arch_atomic_xchg(atomic_t *v, int new)
        local_irq_restore(flags);
        return prev;
 }
+#define arch_atomic_xchg arch_atomic_xchg
 
 #endif /* !CONFIG_RMW_INSNS */
 
diff --git a/arch/m68k/include/asm/bugs.h b/arch/m68k/include/asm/bugs.h
deleted file mode 100644 (file)
index 7455306..0000000
+++ /dev/null
@@ -1,21 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- *  include/asm-m68k/bugs.h
- *
- *  Copyright (C) 1994  Linus Torvalds
- */
-
-/*
- * This is included by init/main.c to check for architecture-dependent bugs.
- *
- * Needs:
- *     void check_bugs(void);
- */
-
-#ifdef CONFIG_MMU
-extern void check_bugs(void);  /* in arch/m68k/kernel/setup.c */
-#else
-static void check_bugs(void)
-{
-}
-#endif
index fbff1ce..6f1ae01 100644 (file)
@@ -10,6 +10,7 @@
  */
 
 #include <linux/kernel.h>
+#include <linux/cpu.h>
 #include <linux/mm.h>
 #include <linux/sched.h>
 #include <linux/delay.h>
@@ -504,7 +505,7 @@ static int __init proc_hardware_init(void)
 module_init(proc_hardware_init);
 #endif
 
-void check_bugs(void)
+void __init arch_cpu_finalize_init(void)
 {
 #if defined(CONFIG_FPU) && !defined(CONFIG_M68KFPU_EMU)
        if (m68k_fputype == 0) {
index b9f6908..ba468b5 100644 (file)
@@ -858,11 +858,17 @@ static inline int rt_setup_ucontext(struct ucontext __user *uc, struct pt_regs *
 }
 
 static inline void __user *
-get_sigframe(struct ksignal *ksig, size_t frame_size)
+get_sigframe(struct ksignal *ksig, struct pt_regs *tregs, size_t frame_size)
 {
        unsigned long usp = sigsp(rdusp(), ksig);
+       unsigned long gap = 0;
 
-       return (void __user *)((usp - frame_size) & -8UL);
+       if (CPU_IS_020_OR_030 && tregs->format == 0xb) {
+               /* USP is unreliable so use worst-case value */
+               gap = 256;
+       }
+
+       return (void __user *)((usp - gap - frame_size) & -8UL);
 }
 
 static int setup_frame(struct ksignal *ksig, sigset_t *set,
@@ -880,7 +886,7 @@ static int setup_frame(struct ksignal *ksig, sigset_t *set,
                return -EFAULT;
        }
 
-       frame = get_sigframe(ksig, sizeof(*frame) + fsize);
+       frame = get_sigframe(ksig, tregs, sizeof(*frame) + fsize);
 
        if (fsize)
                err |= copy_to_user (frame + 1, regs + 1, fsize);
@@ -952,7 +958,7 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
                return -EFAULT;
        }
 
-       frame = get_sigframe(ksig, sizeof(*frame));
+       frame = get_sigframe(ksig, tregs, sizeof(*frame));
 
        if (fsize)
                err |= copy_to_user (&frame->uc.uc_extra, regs + 1, fsize);
index c2f5498..ada18f3 100644 (file)
@@ -4,6 +4,7 @@ config MIPS
        default y
        select ARCH_32BIT_OFF_T if !64BIT
        select ARCH_BINFMT_ELF_STATE if MIPS_FP_SUPPORT
+       select ARCH_HAS_CPU_FINALIZE_INIT
        select ARCH_HAS_CURRENT_STACK_POINTER if !CC_IS_CLANG || CLANG_VERSION >= 140000
        select ARCH_HAS_DEBUG_VIRTUAL if !64BIT
        select ARCH_HAS_FORTIFY_SOURCE
@@ -79,6 +80,7 @@ config MIPS
        select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
        select HAVE_MOD_ARCH_SPECIFIC
        select HAVE_NMI
+       select HAVE_PATA_PLATFORM
        select HAVE_PERF_EVENTS
        select HAVE_PERF_REGS
        select HAVE_PERF_USER_STACK_DUMP
@@ -2285,6 +2287,7 @@ config MIPS_CPS
        select MIPS_CM
        select MIPS_CPS_PM if HOTPLUG_CPU
        select SMP
+       select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
        select SYNC_R4K if (CEVT_R4K || CSRC_R4K)
        select SYS_SUPPORTS_HOTPLUG_CPU
        select SYS_SUPPORTS_SCHED_SMT if CPU_MIPSR6
index 5ab0430..6a3c890 100644 (file)
@@ -30,6 +30,7 @@
  *
  */
 
+#include <linux/dma-map-ops.h> /* for dma_default_coherent */
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/slab.h>
@@ -623,17 +624,18 @@ u32 au1xxx_dbdma_put_source(u32 chanid, dma_addr_t buf, int nbytes, u32 flags)
                dp->dscr_cmd0 &= ~DSCR_CMD0_IE;
 
        /*
-        * There is an errata on the Au1200/Au1550 parts that could result
-        * in "stale" data being DMA'ed. It has to do with the snoop logic on
-        * the cache eviction buffer.  DMA_NONCOHERENT is on by default for
-        * these parts. If it is fixed in the future, these dma_cache_inv will
-        * just be nothing more than empty macros. See io.h.
+        * There is an erratum on certain Au1200/Au1550 revisions that could
+        * result in "stale" data being DMA'ed. It has to do with the snoop
+        * logic on the cache eviction buffer.  dma_default_coherent is set
+        * to false on these parts.
         */
-       dma_cache_wback_inv((unsigned long)buf, nbytes);
+       if (!dma_default_coherent)
+               dma_cache_wback_inv(KSEG0ADDR(buf), nbytes);
        dp->dscr_cmd0 |= DSCR_CMD0_V;   /* Let it rip */
        wmb(); /* drain writebuffer */
        dma_cache_wback_inv((unsigned long)dp, sizeof(*dp));
        ctp->chan_ptr->ddma_dbell = 0;
+       wmb(); /* force doorbell write out to dma engine */
 
        /* Get next descriptor pointer. */
        ctp->put_ptr = phys_to_virt(DSCR_GET_NXTPTR(dp->dscr_nxtptr));
@@ -685,17 +687,18 @@ u32 au1xxx_dbdma_put_dest(u32 chanid, dma_addr_t buf, int nbytes, u32 flags)
                          dp->dscr_source1, dp->dscr_dest0, dp->dscr_dest1);
 #endif
        /*
-        * There is an errata on the Au1200/Au1550 parts that could result in
-        * "stale" data being DMA'ed. It has to do with the snoop logic on the
-        * cache eviction buffer.  DMA_NONCOHERENT is on by default for these
-        * parts. If it is fixed in the future, these dma_cache_inv will just
-        * be nothing more than empty macros. See io.h.
+        * There is an erratum on certain Au1200/Au1550 revisions that could
+        * result in "stale" data being DMA'ed. It has to do with the snoop
+        * logic on the cache eviction buffer.  dma_default_coherent is set
+        * to false on these parts.
         */
-       dma_cache_inv((unsigned long)buf, nbytes);
+       if (!dma_default_coherent)
+               dma_cache_inv(KSEG0ADDR(buf), nbytes);
        dp->dscr_cmd0 |= DSCR_CMD0_V;   /* Let it rip */
        wmb(); /* drain writebuffer */
        dma_cache_wback_inv((unsigned long)dp, sizeof(*dp));
        ctp->chan_ptr->ddma_dbell = 0;
+       wmb(); /* force doorbell write out to dma engine */
 
        /* Get next descriptor pointer. */
        ctp->put_ptr = phys_to_virt(DSCR_GET_NXTPTR(dp->dscr_nxtptr));
index 549a639..053805c 100644 (file)
@@ -178,7 +178,10 @@ void __init plat_mem_setup(void)
        ioport_resource.start = 0;
        ioport_resource.end = ~0;
 
-       /* intended to somewhat resemble ARM; see Documentation/arm/booting.rst */
+       /*
+        * intended to somewhat resemble ARM; see
+        * Documentation/arch/arm/booting.rst
+        */
        if (fw_arg0 == 0 && fw_arg1 == 0xffffffff)
                dtb = phys_to_virt(fw_arg2);
        else
index 4212584..33c0968 100644 (file)
@@ -345,6 +345,7 @@ void play_dead(void)
        int cpu = cpu_number_map(cvmx_get_core_num());
 
        idle_task_exit();
+       cpuhp_ap_report_dead();
        octeon_processor_boot = 0xff;
        per_cpu(cpu_state, cpu) = CPU_DEAD;
 
index 712fb5a..ba188e7 100644 (file)
@@ -33,17 +33,6 @@ static __always_inline void arch_##pfx##_set(pfx##_t *v, type i)     \
 {                                                                      \
        WRITE_ONCE(v->counter, i);                                      \
 }                                                                      \
-                                                                       \
-static __always_inline type                                            \
-arch_##pfx##_cmpxchg(pfx##_t *v, type o, type n)                       \
-{                                                                      \
-       return arch_cmpxchg(&v->counter, o, n);                         \
-}                                                                      \
-                                                                       \
-static __always_inline type arch_##pfx##_xchg(pfx##_t *v, type n)      \
-{                                                                      \
-       return arch_xchg(&v->counter, n);                               \
-}
 
 ATOMIC_OPS(atomic, int)
 
index 653f78f..84be74a 100644 (file)
@@ -1,17 +1,11 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /*
- * This is included by init/main.c to check for architecture-dependent bugs.
- *
  * Copyright (C) 2007  Maciej W. Rozycki
- *
- * Needs:
- *     void check_bugs(void);
  */
 #ifndef _ASM_BUGS_H
 #define _ASM_BUGS_H
 
 #include <linux/bug.h>
-#include <linux/delay.h>
 #include <linux/smp.h>
 
 #include <asm/cpu.h>
@@ -24,17 +18,6 @@ extern void check_bugs64_early(void);
 extern void check_bugs32(void);
 extern void check_bugs64(void);
 
-static inline void __init check_bugs(void)
-{
-       unsigned int cpu = smp_processor_id();
-
-       cpu_data[cpu].udelay_val = loops_per_jiffy;
-       check_bugs32();
-
-       if (IS_ENABLED(CONFIG_CPU_R4X00_BUGS64))
-               check_bugs64();
-}
-
 static inline int r4k_daddiu_bug(void)
 {
        if (!IS_ENABLED(CONFIG_CPU_R4X00_BUGS64))
index eb3ddbe..d8f9dec 100644 (file)
@@ -47,7 +47,6 @@
 
 #include <regs-clk.h>
 #include <regs-mux.h>
-#include <regs-pwm.h>
 #include <regs-rtc.h>
 #include <regs-wdt.h>
 
diff --git a/arch/mips/include/asm/mach-loongson32/regs-pwm.h b/arch/mips/include/asm/mach-loongson32/regs-pwm.h
deleted file mode 100644 (file)
index ec870c8..0000000
+++ /dev/null
@@ -1,25 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * Copyright (c) 2014 Zhang, Keguang <keguang.zhang@gmail.com>
- *
- * Loongson 1 PWM Register Definitions.
- */
-
-#ifndef __ASM_MACH_LOONGSON32_REGS_PWM_H
-#define __ASM_MACH_LOONGSON32_REGS_PWM_H
-
-/* Loongson 1 PWM Timer Register Definitions */
-#define PWM_CNT                        0x0
-#define PWM_HRC                        0x4
-#define PWM_LRC                        0x8
-#define PWM_CTRL               0xc
-
-/* PWM Control Register Bits */
-#define CNT_RST                        BIT(7)
-#define INT_SR                 BIT(6)
-#define INT_EN                 BIT(5)
-#define PWM_SINGLE             BIT(4)
-#define PWM_OE                 BIT(3)
-#define CNT_EN                 BIT(0)
-
-#endif /* __ASM_MACH_LOONGSON32_REGS_PWM_H */
index 0145bbf..5719ff4 100644 (file)
@@ -33,6 +33,7 @@ struct plat_smp_ops {
 #ifdef CONFIG_HOTPLUG_CPU
        int (*cpu_disable)(void);
        void (*cpu_die)(unsigned int cpu);
+       void (*cleanup_dead_cpu)(unsigned cpu);
 #endif
 #ifdef CONFIG_KEXEC
        void (*kexec_nonboot_cpu)(void);
index 6d15a39..e79adcb 100644 (file)
@@ -1502,6 +1502,10 @@ static inline void cpu_probe_alchemy(struct cpuinfo_mips *c, unsigned int cpu)
                        break;
                }
                break;
+       case PRID_IMP_NETLOGIC_AU13XX:
+               c->cputype = CPU_ALCHEMY;
+               __cpu_name[cpu] = "Au1300";
+               break;
        }
 }
 
@@ -1863,6 +1867,7 @@ void cpu_probe(void)
                cpu_probe_mips(c, cpu);
                break;
        case PRID_COMP_ALCHEMY:
+       case PRID_COMP_NETLOGIC:
                cpu_probe_alchemy(c, cpu);
                break;
        case PRID_COMP_SIBYTE:
index febdc55..cb871eb 100644 (file)
@@ -11,6 +11,8 @@
  * Copyright (C) 2000, 2001, 2002, 2007         Maciej W. Rozycki
  */
 #include <linux/init.h>
+#include <linux/cpu.h>
+#include <linux/delay.h>
 #include <linux/ioport.h>
 #include <linux/export.h>
 #include <linux/screen_info.h>
@@ -158,10 +160,6 @@ static unsigned long __init init_initrd(void)
                pr_err("initrd start must be page aligned\n");
                goto disable;
        }
-       if (initrd_start < PAGE_OFFSET) {
-               pr_err("initrd start < PAGE_OFFSET\n");
-               goto disable;
-       }
 
        /*
         * Sanitize initrd addresses. For example firmware
@@ -174,6 +172,11 @@ static unsigned long __init init_initrd(void)
        initrd_end = (unsigned long)__va(end);
        initrd_start = (unsigned long)__va(__pa(initrd_start));
 
+       if (initrd_start < PAGE_OFFSET) {
+               pr_err("initrd start < PAGE_OFFSET\n");
+               goto disable;
+       }
+
        ROOT_DEV = Root_RAM0;
        return PFN_UP(end);
 disable:
@@ -840,3 +843,14 @@ static int __init setnocoherentio(char *str)
 }
 early_param("nocoherentio", setnocoherentio);
 #endif
+
+void __init arch_cpu_finalize_init(void)
+{
+       unsigned int cpu = smp_processor_id();
+
+       cpu_data[cpu].udelay_val = loops_per_jiffy;
+       check_bugs32();
+
+       if (IS_ENABLED(CONFIG_CPU_R4X00_BUGS64))
+               check_bugs64();
+}
index 15466d4..c074ecc 100644 (file)
@@ -392,6 +392,7 @@ static void bmips_cpu_die(unsigned int cpu)
 void __ref play_dead(void)
 {
        idle_task_exit();
+       cpuhp_ap_report_dead();
 
        /* flush data cache */
        _dma_cache_wback_inv(0, ~0);
index 62f677b..d7fdbec 100644 (file)
@@ -503,8 +503,7 @@ void play_dead(void)
                }
        }
 
-       /* This CPU has chosen its way out */
-       (void)cpu_report_death();
+       cpuhp_ap_report_dead();
 
        cps_shutdown_this_cpu(cpu_death);
 
@@ -527,7 +526,9 @@ static void wait_for_sibling_halt(void *ptr_cpu)
        } while (!(halted & TCHALT_H));
 }
 
-static void cps_cpu_die(unsigned int cpu)
+static void cps_cpu_die(unsigned int cpu) { }
+
+static void cps_cleanup_dead_cpu(unsigned cpu)
 {
        unsigned core = cpu_core(&cpu_data[cpu]);
        unsigned int vpe_id = cpu_vpe_id(&cpu_data[cpu]);
@@ -535,12 +536,6 @@ static void cps_cpu_die(unsigned int cpu)
        unsigned stat;
        int err;
 
-       /* Wait for the cpu to choose its way out */
-       if (!cpu_wait_death(cpu, 5)) {
-               pr_err("CPU%u: didn't offline\n", cpu);
-               return;
-       }
-
        /*
         * Now wait for the CPU to actually offline. Without doing this that
         * offlining may race with one or more of:
@@ -624,6 +619,7 @@ static const struct plat_smp_ops cps_smp_ops = {
 #ifdef CONFIG_HOTPLUG_CPU
        .cpu_disable            = cps_cpu_disable,
        .cpu_die                = cps_cpu_die,
+       .cleanup_dead_cpu       = cps_cleanup_dead_cpu,
 #endif
 #ifdef CONFIG_KEXEC
        .kexec_nonboot_cpu      = cps_kexec_nonboot_cpu,
index 1d93b85..90c71d8 100644 (file)
@@ -690,6 +690,14 @@ void flush_tlb_one(unsigned long vaddr)
 EXPORT_SYMBOL(flush_tlb_page);
 EXPORT_SYMBOL(flush_tlb_one);
 
+#ifdef CONFIG_HOTPLUG_CORE_SYNC_DEAD
+void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
+{
+       if (mp_ops->cleanup_dead_cpu)
+               mp_ops->cleanup_dead_cpu(cpu);
+}
+#endif
+
 #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 
 static void tick_broadcast_callee(void *info)
index 2ef9da0..a7c5009 100644 (file)
@@ -35,41 +35,4 @@ config LOONGSON1_LS1C
        select COMMON_CLK
 endchoice
 
-menuconfig CEVT_CSRC_LS1X
-       bool "Use PWM Timer for clockevent/clocksource"
-       select MIPS_EXTERNAL_TIMER
-       depends on CPU_LOONGSON32
-       help
-         This option changes the default clockevent/clocksource to PWM Timer,
-         and is required by Loongson1 CPUFreq support.
-
-         If unsure, say N.
-
-choice
-       prompt "Select clockevent/clocksource"
-       depends on CEVT_CSRC_LS1X
-       default TIMER_USE_PWM0
-
-config TIMER_USE_PWM0
-       bool "Use PWM Timer 0"
-       help
-         Use PWM Timer 0 as the default clockevent/clocksourcer.
-
-config TIMER_USE_PWM1
-       bool "Use PWM Timer 1"
-       help
-         Use PWM Timer 1 as the default clockevent/clocksourcer.
-
-config TIMER_USE_PWM2
-       bool "Use PWM Timer 2"
-       help
-         Use PWM Timer 2 as the default clockevent/clocksourcer.
-
-config TIMER_USE_PWM3
-       bool "Use PWM Timer 3"
-       help
-         Use PWM Timer 3 as the default clockevent/clocksourcer.
-
-endchoice
-
 endif # MACH_LOONGSON32
index 965c04a..74ad2b1 100644 (file)
@@ -5,208 +5,8 @@
 
 #include <linux/clk.h>
 #include <linux/of_clk.h>
-#include <linux/interrupt.h>
-#include <linux/sizes.h>
 #include <asm/time.h>
 
-#include <loongson1.h>
-#include <platform.h>
-
-#ifdef CONFIG_CEVT_CSRC_LS1X
-
-#if defined(CONFIG_TIMER_USE_PWM1)
-#define LS1X_TIMER_BASE        LS1X_PWM1_BASE
-#define LS1X_TIMER_IRQ LS1X_PWM1_IRQ
-
-#elif defined(CONFIG_TIMER_USE_PWM2)
-#define LS1X_TIMER_BASE        LS1X_PWM2_BASE
-#define LS1X_TIMER_IRQ LS1X_PWM2_IRQ
-
-#elif defined(CONFIG_TIMER_USE_PWM3)
-#define LS1X_TIMER_BASE        LS1X_PWM3_BASE
-#define LS1X_TIMER_IRQ LS1X_PWM3_IRQ
-
-#else
-#define LS1X_TIMER_BASE        LS1X_PWM0_BASE
-#define LS1X_TIMER_IRQ LS1X_PWM0_IRQ
-#endif
-
-DEFINE_RAW_SPINLOCK(ls1x_timer_lock);
-
-static void __iomem *timer_reg_base;
-static uint32_t ls1x_jiffies_per_tick;
-
-static inline void ls1x_pwmtimer_set_period(uint32_t period)
-{
-       __raw_writel(period, timer_reg_base + PWM_HRC);
-       __raw_writel(period, timer_reg_base + PWM_LRC);
-}
-
-static inline void ls1x_pwmtimer_restart(void)
-{
-       __raw_writel(0x0, timer_reg_base + PWM_CNT);
-       __raw_writel(INT_EN | CNT_EN, timer_reg_base + PWM_CTRL);
-}
-
-void __init ls1x_pwmtimer_init(void)
-{
-       timer_reg_base = ioremap(LS1X_TIMER_BASE, SZ_16);
-       if (!timer_reg_base)
-               panic("Failed to remap timer registers");
-
-       ls1x_jiffies_per_tick = DIV_ROUND_CLOSEST(mips_hpt_frequency, HZ);
-
-       ls1x_pwmtimer_set_period(ls1x_jiffies_per_tick);
-       ls1x_pwmtimer_restart();
-}
-
-static u64 ls1x_clocksource_read(struct clocksource *cs)
-{
-       unsigned long flags;
-       int count;
-       u32 jifs;
-       static int old_count;
-       static u32 old_jifs;
-
-       raw_spin_lock_irqsave(&ls1x_timer_lock, flags);
-       /*
-        * Although our caller may have the read side of xtime_lock,
-        * this is now a seqlock, and we are cheating in this routine
-        * by having side effects on state that we cannot undo if
-        * there is a collision on the seqlock and our caller has to
-        * retry.  (Namely, old_jifs and old_count.)  So we must treat
-        * jiffies as volatile despite the lock.  We read jiffies
-        * before latching the timer count to guarantee that although
-        * the jiffies value might be older than the count (that is,
-        * the counter may underflow between the last point where
-        * jiffies was incremented and the point where we latch the
-        * count), it cannot be newer.
-        */
-       jifs = jiffies;
-       /* read the count */
-       count = __raw_readl(timer_reg_base + PWM_CNT);
-
-       /*
-        * It's possible for count to appear to go the wrong way for this
-        * reason:
-        *
-        *  The timer counter underflows, but we haven't handled the resulting
-        *  interrupt and incremented jiffies yet.
-        *
-        * Previous attempts to handle these cases intelligently were buggy, so
-        * we just do the simple thing now.
-        */
-       if (count < old_count && jifs == old_jifs)
-               count = old_count;
-
-       old_count = count;
-       old_jifs = jifs;
-
-       raw_spin_unlock_irqrestore(&ls1x_timer_lock, flags);
-
-       return (u64) (jifs * ls1x_jiffies_per_tick) + count;
-}
-
-static struct clocksource ls1x_clocksource = {
-       .name           = "ls1x-pwmtimer",
-       .read           = ls1x_clocksource_read,
-       .mask           = CLOCKSOURCE_MASK(24),
-       .flags          = CLOCK_SOURCE_IS_CONTINUOUS,
-};
-
-static irqreturn_t ls1x_clockevent_isr(int irq, void *devid)
-{
-       struct clock_event_device *cd = devid;
-
-       ls1x_pwmtimer_restart();
-       cd->event_handler(cd);
-
-       return IRQ_HANDLED;
-}
-
-static int ls1x_clockevent_set_state_periodic(struct clock_event_device *cd)
-{
-       raw_spin_lock(&ls1x_timer_lock);
-       ls1x_pwmtimer_set_period(ls1x_jiffies_per_tick);
-       ls1x_pwmtimer_restart();
-       __raw_writel(INT_EN | CNT_EN, timer_reg_base + PWM_CTRL);
-       raw_spin_unlock(&ls1x_timer_lock);
-
-       return 0;
-}
-
-static int ls1x_clockevent_tick_resume(struct clock_event_device *cd)
-{
-       raw_spin_lock(&ls1x_timer_lock);
-       __raw_writel(INT_EN | CNT_EN, timer_reg_base + PWM_CTRL);
-       raw_spin_unlock(&ls1x_timer_lock);
-
-       return 0;
-}
-
-static int ls1x_clockevent_set_state_shutdown(struct clock_event_device *cd)
-{
-       raw_spin_lock(&ls1x_timer_lock);
-       __raw_writel(__raw_readl(timer_reg_base + PWM_CTRL) & ~CNT_EN,
-                    timer_reg_base + PWM_CTRL);
-       raw_spin_unlock(&ls1x_timer_lock);
-
-       return 0;
-}
-
-static int ls1x_clockevent_set_next(unsigned long evt,
-                                   struct clock_event_device *cd)
-{
-       raw_spin_lock(&ls1x_timer_lock);
-       ls1x_pwmtimer_set_period(evt);
-       ls1x_pwmtimer_restart();
-       raw_spin_unlock(&ls1x_timer_lock);
-
-       return 0;
-}
-
-static struct clock_event_device ls1x_clockevent = {
-       .name                   = "ls1x-pwmtimer",
-       .features               = CLOCK_EVT_FEAT_PERIODIC,
-       .rating                 = 300,
-       .irq                    = LS1X_TIMER_IRQ,
-       .set_next_event         = ls1x_clockevent_set_next,
-       .set_state_shutdown     = ls1x_clockevent_set_state_shutdown,
-       .set_state_periodic     = ls1x_clockevent_set_state_periodic,
-       .set_state_oneshot      = ls1x_clockevent_set_state_shutdown,
-       .tick_resume            = ls1x_clockevent_tick_resume,
-};
-
-static void __init ls1x_time_init(void)
-{
-       struct clock_event_device *cd = &ls1x_clockevent;
-       int ret;
-
-       if (!mips_hpt_frequency)
-               panic("Invalid timer clock rate");
-
-       ls1x_pwmtimer_init();
-
-       clockevent_set_clock(cd, mips_hpt_frequency);
-       cd->max_delta_ns = clockevent_delta2ns(0xffffff, cd);
-       cd->max_delta_ticks = 0xffffff;
-       cd->min_delta_ns = clockevent_delta2ns(0x000300, cd);
-       cd->min_delta_ticks = 0x000300;
-       cd->cpumask = cpumask_of(smp_processor_id());
-       clockevents_register_device(cd);
-
-       ls1x_clocksource.rating = 200 + mips_hpt_frequency / 10000000;
-       ret = clocksource_register_hz(&ls1x_clocksource, mips_hpt_frequency);
-       if (ret)
-               panic(KERN_ERR "Failed to register clocksource: %d\n", ret);
-
-       if (request_irq(LS1X_TIMER_IRQ, ls1x_clockevent_isr,
-                       IRQF_PERCPU | IRQF_TIMER, "ls1x-pwmtimer",
-                       &ls1x_clockevent))
-               pr_err("Failed to register ls1x-pwmtimer interrupt\n");
-}
-#endif /* CONFIG_CEVT_CSRC_LS1X */
-
 void __init plat_time_init(void)
 {
        struct clk *clk = NULL;
@@ -214,20 +14,10 @@ void __init plat_time_init(void)
        /* initialize LS1X clocks */
        of_clk_init(NULL);
 
-#ifdef CONFIG_CEVT_CSRC_LS1X
-       /* setup LS1X PWM timer */
-       clk = clk_get(NULL, "ls1x-pwmtimer");
-       if (IS_ERR(clk))
-               panic("unable to get timer clock, err=%ld", PTR_ERR(clk));
-
-       mips_hpt_frequency = clk_get_rate(clk);
-       ls1x_time_init();
-#else
        /* setup mips r4k timer */
        clk = clk_get(NULL, "cpu_clk");
        if (IS_ERR(clk))
                panic("unable to get cpu clock, err=%ld", PTR_ERR(clk));
 
        mips_hpt_frequency = clk_get_rate(clk) / 2;
-#endif /* CONFIG_CEVT_CSRC_LS1X */
 }
index b0e8bb9..cdecd7a 100644 (file)
@@ -775,6 +775,7 @@ void play_dead(void)
        void (*play_dead_at_ckseg1)(int *);
 
        idle_task_exit();
+       cpuhp_ap_report_dead();
 
        prid_imp = read_c0_prid() & PRID_IMP_MASK;
        prid_rev = read_c0_prid() & PRID_REV_MASK;
index 56339be..0e7e5b0 100644 (file)
@@ -97,7 +97,7 @@
                        rx-fifo-depth = <8192>;
                        tx-fifo-depth = <8192>;
                        address-bits = <48>;
-                       max-frame-size = <1518>;
+                       max-frame-size = <1500>;
                        local-mac-address = [00 00 00 00 00 00];
                        altr,has-supplementary-unicast;
                        altr,enable-sup-addr = <1>;
index d10fb81..3ee3169 100644 (file)
                                interrupt-names = "rx_irq", "tx_irq";
                                rx-fifo-depth = <8192>;
                                tx-fifo-depth = <8192>;
-                               max-frame-size = <1518>;
+                               max-frame-size = <1500>;
                                local-mac-address = [ 00 00 00 00 00 00 ];
                                phy-mode = "rgmii-id";
                                phy-handle = <&phy0>;
index 203870c..338849c 100644 (file)
@@ -47,7 +47,7 @@ void __init setup_cpuinfo(void)
 
        str = of_get_property(cpu, "altr,implementation", &len);
        if (str)
-               strlcpy(cpuinfo.cpu_impl, str, sizeof(cpuinfo.cpu_impl));
+               strscpy(cpuinfo.cpu_impl, str, sizeof(cpuinfo.cpu_impl));
        else
                strcpy(cpuinfo.cpu_impl, "<unknown>");
 
index 40bc8fb..8582ed9 100644 (file)
@@ -121,7 +121,7 @@ asmlinkage void __init nios2_boot_init(unsigned r4, unsigned r5, unsigned r6,
                dtb_passed = r6;
 
                if (r7)
-                       strlcpy(cmdline_passed, (char *)r7, COMMAND_LINE_SIZE);
+                       strscpy(cmdline_passed, (char *)r7, COMMAND_LINE_SIZE);
        }
 #endif
 
@@ -129,10 +129,10 @@ asmlinkage void __init nios2_boot_init(unsigned r4, unsigned r5, unsigned r6,
 
 #ifndef CONFIG_CMDLINE_FORCE
        if (cmdline_passed[0])
-               strlcpy(boot_command_line, cmdline_passed, COMMAND_LINE_SIZE);
+               strscpy(boot_command_line, cmdline_passed, COMMAND_LINE_SIZE);
 #ifdef CONFIG_NIOS2_CMDLINE_IGNORE_DTB
        else
-               strlcpy(boot_command_line, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
+               strscpy(boot_command_line, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
 #endif
 #endif
 
index 326167e..8ce67ec 100644 (file)
@@ -130,7 +130,4 @@ static inline int arch_atomic_fetch_add_unless(atomic_t *v, int a, int u)
 
 #include <asm/cmpxchg.h>
 
-#define arch_atomic_xchg(ptr, v)               (arch_xchg(&(ptr)->counter, (v)))
-#define arch_atomic_cmpxchg(v, old, new)       (arch_cmpxchg(&((v)->counter), (old), (new)))
-
 #endif /* __ASM_OPENRISC_ATOMIC_H */
index 466a255..c0b4b1c 100644 (file)
@@ -57,6 +57,7 @@ config PARISC
        select HAVE_ARCH_SECCOMP_FILTER
        select HAVE_ARCH_TRACEHOOK
        select HAVE_REGS_AND_STACK_ACCESS_API
+       select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
        select GENERIC_SCHED_CLOCK
        select GENERIC_IRQ_MIGRATION if SMP
        select HAVE_UNSTABLE_SCHED_CLOCK if SMP
@@ -130,6 +131,10 @@ config PM
 config STACKTRACE_SUPPORT
        def_bool y
 
+config LOCKDEP_SUPPORT
+       bool
+       default y
+
 config ISA_DMA_API
        bool
 
index f66554c..3a059cb 100644 (file)
@@ -1 +1,12 @@
 # SPDX-License-Identifier: GPL-2.0
+#
+config LIGHTWEIGHT_SPINLOCK_CHECK
+       bool "Enable lightweight spinlock checks"
+       depends on SMP && !DEBUG_SPINLOCK
+       default y
+       help
+         Add checks with low performance impact to the spinlock functions
+         to catch memory overwrites at runtime. For more advanced
+         spinlock debugging you should choose the DEBUG_SPINLOCK option
+         which will detect uninitialized spinlocks too.
+         If unsure, say Y here.
index 0f0d4a4..75677b5 100644 (file)
 #include <asm/asmregs.h>
 #include <asm/psw.h>
 
-       sp      =       30
-       gp      =       27
-       ipsw    =       22
-
        /*
         * We provide two versions of each macro to convert from physical
         * to virtual and vice versa. The "_r1" versions take one argument
index dd5a299..d4f0238 100644 (file)
@@ -73,10 +73,6 @@ static __inline__ int arch_atomic_read(const atomic_t *v)
        return READ_ONCE((v)->counter);
 }
 
-/* exported interface */
-#define arch_atomic_cmpxchg(v, o, n)   (arch_cmpxchg(&((v)->counter), (o), (n)))
-#define arch_atomic_xchg(v, new)       (arch_xchg(&((v)->counter), new))
-
 #define ATOMIC_OP(op, c_op)                                            \
 static __inline__ void arch_atomic_##op(int i, atomic_t *v)            \
 {                                                                      \
@@ -122,6 +118,11 @@ static __inline__ int arch_atomic_fetch_##op(int i, atomic_t *v)   \
 ATOMIC_OPS(add, +=)
 ATOMIC_OPS(sub, -=)
 
+#define arch_atomic_add_return arch_atomic_add_return
+#define arch_atomic_sub_return arch_atomic_sub_return
+#define arch_atomic_fetch_add  arch_atomic_fetch_add
+#define arch_atomic_fetch_sub  arch_atomic_fetch_sub
+
 #undef ATOMIC_OPS
 #define ATOMIC_OPS(op, c_op)                                           \
        ATOMIC_OP(op, c_op)                                             \
@@ -131,6 +132,10 @@ ATOMIC_OPS(and, &=)
 ATOMIC_OPS(or, |=)
 ATOMIC_OPS(xor, ^=)
 
+#define arch_atomic_fetch_and  arch_atomic_fetch_and
+#define arch_atomic_fetch_or   arch_atomic_fetch_or
+#define arch_atomic_fetch_xor  arch_atomic_fetch_xor
+
 #undef ATOMIC_OPS
 #undef ATOMIC_FETCH_OP
 #undef ATOMIC_OP_RETURN
@@ -185,6 +190,11 @@ static __inline__ s64 arch_atomic64_fetch_##op(s64 i, atomic64_t *v)       \
 ATOMIC64_OPS(add, +=)
 ATOMIC64_OPS(sub, -=)
 
+#define arch_atomic64_add_return       arch_atomic64_add_return
+#define arch_atomic64_sub_return       arch_atomic64_sub_return
+#define arch_atomic64_fetch_add                arch_atomic64_fetch_add
+#define arch_atomic64_fetch_sub                arch_atomic64_fetch_sub
+
 #undef ATOMIC64_OPS
 #define ATOMIC64_OPS(op, c_op)                                         \
        ATOMIC64_OP(op, c_op)                                           \
@@ -194,6 +204,10 @@ ATOMIC64_OPS(and, &=)
 ATOMIC64_OPS(or, |=)
 ATOMIC64_OPS(xor, ^=)
 
+#define arch_atomic64_fetch_and                arch_atomic64_fetch_and
+#define arch_atomic64_fetch_or         arch_atomic64_fetch_or
+#define arch_atomic64_fetch_xor                arch_atomic64_fetch_xor
+
 #undef ATOMIC64_OPS
 #undef ATOMIC64_FETCH_OP
 #undef ATOMIC64_OP_RETURN
@@ -218,11 +232,6 @@ arch_atomic64_read(const atomic64_t *v)
        return READ_ONCE((v)->counter);
 }
 
-/* exported interface */
-#define arch_atomic64_cmpxchg(v, o, n) \
-       ((__typeof__((v)->counter))arch_cmpxchg(&((v)->counter), (o), (n)))
-#define arch_atomic64_xchg(v, new) (arch_xchg(&((v)->counter), new))
-
 #endif /* !CONFIG_64BIT */
 
 
diff --git a/arch/parisc/include/asm/bugs.h b/arch/parisc/include/asm/bugs.h
deleted file mode 100644 (file)
index 0a7f9db..0000000
+++ /dev/null
@@ -1,20 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- *  include/asm-parisc/bugs.h
- *
- *  Copyright (C) 1999 Mike Shaver
- */
-
-/*
- * This is included by init/main.c to check for architecture-dependent bugs.
- *
- * Needs:
- *     void check_bugs(void);
- */
-
-#include <asm/processor.h>
-
-static inline void check_bugs(void)
-{
-//     identify_cpu(&boot_cpu_data);
-}
index 0bdee67..c8b6928 100644 (file)
@@ -48,6 +48,10 @@ void flush_dcache_page(struct page *page);
 
 #define flush_dcache_mmap_lock(mapping)                xa_lock_irq(&mapping->i_pages)
 #define flush_dcache_mmap_unlock(mapping)      xa_unlock_irq(&mapping->i_pages)
+#define flush_dcache_mmap_lock_irqsave(mapping, flags)         \
+               xa_lock_irqsave(&mapping->i_pages, flags)
+#define flush_dcache_mmap_unlock_irqrestore(mapping, flags)    \
+               xa_unlock_irqrestore(&mapping->i_pages, flags)
 
 #define flush_icache_page(vma,page)    do {            \
        flush_kernel_dcache_page_addr(page_address(page)); \
index e2950f5..5656395 100644 (file)
@@ -413,12 +413,12 @@ extern void paging_init (void);
  *   For the 64bit version, the offset is extended by 32bit.
  */
 #define __swp_type(x)                     ((x).val & 0x1f)
-#define __swp_offset(x)                   ( (((x).val >> 6) &  0x7) | \
-                                         (((x).val >> 8) & ~0x7) )
+#define __swp_offset(x)                   ( (((x).val >> 5) & 0x7) | \
+                                         (((x).val >> 10) << 3) )
 #define __swp_entry(type, offset)         ((swp_entry_t) { \
                                            ((type) & 0x1f) | \
-                                           ((offset &  0x7) << 6) | \
-                                           ((offset & ~0x7) << 8) })
+                                           ((offset & 0x7) << 5) | \
+                                           ((offset >> 3) << 10) })
 #define __pte_to_swp_entry(pte)                ((swp_entry_t) { pte_val(pte) })
 #define __swp_entry_to_pte(x)          ((pte_t) { (x).val })
 
@@ -472,9 +472,6 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
 
 #define pte_same(A,B)  (pte_val(A) == pte_val(B))
 
-struct seq_file;
-extern void arch_report_meminfo(struct seq_file *m);
-
 #endif /* !__ASSEMBLY__ */
 
 
index a6e5d66..edfcb98 100644 (file)
@@ -7,10 +7,26 @@
 #include <asm/processor.h>
 #include <asm/spinlock_types.h>
 
+#define SPINLOCK_BREAK_INSN    0x0000c006      /* break 6,6 */
+
+static inline void arch_spin_val_check(int lock_val)
+{
+       if (IS_ENABLED(CONFIG_LIGHTWEIGHT_SPINLOCK_CHECK))
+               asm volatile(   "andcm,= %0,%1,%%r0\n"
+                               ".word %2\n"
+               : : "r" (lock_val), "r" (__ARCH_SPIN_LOCK_UNLOCKED_VAL),
+                       "i" (SPINLOCK_BREAK_INSN));
+}
+
 static inline int arch_spin_is_locked(arch_spinlock_t *x)
 {
-       volatile unsigned int *a = __ldcw_align(x);
-       return READ_ONCE(*a) == 0;
+       volatile unsigned int *a;
+       int lock_val;
+
+       a = __ldcw_align(x);
+       lock_val = READ_ONCE(*a);
+       arch_spin_val_check(lock_val);
+       return (lock_val == 0);
 }
 
 static inline void arch_spin_lock(arch_spinlock_t *x)
@@ -18,9 +34,18 @@ static inline void arch_spin_lock(arch_spinlock_t *x)
        volatile unsigned int *a;
 
        a = __ldcw_align(x);
-       while (__ldcw(a) == 0)
+       do {
+               int lock_val_old;
+
+               lock_val_old = __ldcw(a);
+               arch_spin_val_check(lock_val_old);
+               if (lock_val_old)
+                       return; /* got lock */
+
+               /* wait until we should try to get lock again */
                while (*a == 0)
                        continue;
+       } while (1);
 }
 
 static inline void arch_spin_unlock(arch_spinlock_t *x)
@@ -29,15 +54,19 @@ static inline void arch_spin_unlock(arch_spinlock_t *x)
 
        a = __ldcw_align(x);
        /* Release with ordered store. */
-       __asm__ __volatile__("stw,ma %0,0(%1)" : : "r"(1), "r"(a) : "memory");
+       __asm__ __volatile__("stw,ma %0,0(%1)"
+               : : "r"(__ARCH_SPIN_LOCK_UNLOCKED_VAL), "r"(a) : "memory");
 }
 
 static inline int arch_spin_trylock(arch_spinlock_t *x)
 {
        volatile unsigned int *a;
+       int lock_val;
 
        a = __ldcw_align(x);
-       return __ldcw(a) != 0;
+       lock_val = __ldcw(a);
+       arch_spin_val_check(lock_val);
+       return lock_val != 0;
 }
 
 /*
index ca39ee3..d659340 100644 (file)
@@ -2,13 +2,17 @@
 #ifndef __ASM_SPINLOCK_TYPES_H
 #define __ASM_SPINLOCK_TYPES_H
 
+#define __ARCH_SPIN_LOCK_UNLOCKED_VAL  0x1a46
+
 typedef struct {
 #ifdef CONFIG_PA20
        volatile unsigned int slock;
-# define __ARCH_SPIN_LOCK_UNLOCKED { 1 }
+# define __ARCH_SPIN_LOCK_UNLOCKED { __ARCH_SPIN_LOCK_UNLOCKED_VAL }
 #else
        volatile unsigned int lock[4];
-# define __ARCH_SPIN_LOCK_UNLOCKED     { { 1, 1, 1, 1 } }
+# define __ARCH_SPIN_LOCK_UNLOCKED     \
+       { { __ARCH_SPIN_LOCK_UNLOCKED_VAL, __ARCH_SPIN_LOCK_UNLOCKED_VAL, \
+           __ARCH_SPIN_LOCK_UNLOCKED_VAL, __ARCH_SPIN_LOCK_UNLOCKED_VAL } }
 #endif
 } arch_spinlock_t;
 
index 66f5672..25c4d6c 100644 (file)
@@ -25,7 +25,7 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
 {
        struct alt_instr *entry;
        int index = 0, applied = 0;
-       int num_cpus = num_online_cpus();
+       int num_cpus = num_present_cpus();
        u16 cond_check;
 
        cond_check = ALT_COND_ALWAYS |
index 1d3b8bc..ca4a302 100644 (file)
@@ -399,6 +399,7 @@ void flush_dcache_page(struct page *page)
        unsigned long offset;
        unsigned long addr, old_addr = 0;
        unsigned long count = 0;
+       unsigned long flags;
        pgoff_t pgoff;
 
        if (mapping && !mapping_mapped(mapping)) {
@@ -420,7 +421,7 @@ void flush_dcache_page(struct page *page)
         * to flush one address here for them all to become coherent
         * on machines that support equivalent aliasing
         */
-       flush_dcache_mmap_lock(mapping);
+       flush_dcache_mmap_lock_irqsave(mapping, flags);
        vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
                offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
                addr = mpnt->vm_start + offset;
@@ -460,7 +461,7 @@ void flush_dcache_page(struct page *page)
                }
                WARN_ON(++count == 4096);
        }
-       flush_dcache_mmap_unlock(mapping);
+       flush_dcache_mmap_unlock_irqrestore(mapping, flags);
 }
 EXPORT_SYMBOL(flush_dcache_page);
 
index 5eb7f30..db57345 100644 (file)
@@ -4,6 +4,8 @@
 #include <linux/console.h>
 #include <linux/kexec.h>
 #include <linux/delay.h>
+#include <linux/reboot.h>
+
 #include <asm/cacheflush.h>
 #include <asm/sections.h>
 
index ba87f79..71ed539 100644 (file)
@@ -446,11 +446,27 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
                enum dma_data_direction dir)
 {
+       /*
+        * fdc: The data cache line is written back to memory, if and only if
+        * it is dirty, and then invalidated from the data cache.
+        */
        flush_kernel_dcache_range((unsigned long)phys_to_virt(paddr), size);
 }
 
 void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
                enum dma_data_direction dir)
 {
-       flush_kernel_dcache_range((unsigned long)phys_to_virt(paddr), size);
+       unsigned long addr = (unsigned long) phys_to_virt(paddr);
+
+       switch (dir) {
+       case DMA_TO_DEVICE:
+       case DMA_BIDIRECTIONAL:
+               flush_kernel_dcache_range(addr, size);
+               return;
+       case DMA_FROM_DEVICE:
+               purge_kernel_dcache_range_asm(addr, addr + size);
+               return;
+       default:
+               BUG();
+       }
 }
index 97c6f87..abdbf03 100644 (file)
@@ -122,13 +122,18 @@ void machine_power_off(void)
        /* It seems we have no way to power the system off via
         * software. The user has to press the button himself. */
 
-       printk(KERN_EMERG "System shut down completed.\n"
-              "Please power this system off now.");
+       printk("Power off or press RETURN to reboot.\n");
 
        /* prevent soft lockup/stalled CPU messages for endless loop. */
        rcu_sysrq_start();
        lockup_detector_soft_poweroff();
-       for (;;);
+       while (1) {
+               /* reboot if user presses RETURN key */
+               if (pdc_iodc_getc() == 13) {
+                       printk("Rebooting...\n");
+                       machine_restart(NULL);
+               }
+       }
 }
 
 void (*pm_power_off)(void);
@@ -166,8 +171,8 @@ void __noreturn arch_cpu_idle_dead(void)
 
        local_irq_disable();
 
-       /* Tell __cpu_die() that this CPU is now safe to dispose of. */
-       (void)cpu_report_death();
+       /* Tell the core that this CPU is now safe to dispose of. */
+       cpuhp_ap_report_dead();
 
        /* Ensure that the cache lines are written out. */
        flush_cache_all_local();
index b7fc859..6b6eaa4 100644 (file)
@@ -500,11 +500,10 @@ int __cpu_disable(void)
 void __cpu_die(unsigned int cpu)
 {
        pdc_cpu_rendezvous_lock();
+}
 
-       if (!cpu_wait_death(cpu, 5)) {
-               pr_crit("CPU%u: cpu didn't die\n", cpu);
-               return;
-       }
+void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
+{
        pr_info("CPU%u: is shutting down\n", cpu);
 
        /* set task's state to interruptible sleep */
index f9696fb..304eebd 100644 (file)
 #include <linux/kgdb.h>
 #include <linux/kprobes.h>
 
+#if defined(CONFIG_LIGHTWEIGHT_SPINLOCK_CHECK)
+#include <asm/spinlock.h>
+#endif
+
 #include "../math-emu/math-emu.h"      /* for handle_fpe() */
 
 static void parisc_show_stack(struct task_struct *task,
@@ -291,24 +295,30 @@ static void handle_break(struct pt_regs *regs)
        }
 
 #ifdef CONFIG_KPROBES
-       if (unlikely(iir == PARISC_KPROBES_BREAK_INSN)) {
+       if (unlikely(iir == PARISC_KPROBES_BREAK_INSN && !user_mode(regs))) {
                parisc_kprobe_break_handler(regs);
                return;
        }
-       if (unlikely(iir == PARISC_KPROBES_BREAK_INSN2)) {
+       if (unlikely(iir == PARISC_KPROBES_BREAK_INSN2 && !user_mode(regs))) {
                parisc_kprobe_ss_handler(regs);
                return;
        }
 #endif
 
 #ifdef CONFIG_KGDB
-       if (unlikely(iir == PARISC_KGDB_COMPILED_BREAK_INSN ||
-               iir == PARISC_KGDB_BREAK_INSN)) {
+       if (unlikely((iir == PARISC_KGDB_COMPILED_BREAK_INSN ||
+               iir == PARISC_KGDB_BREAK_INSN)) && !user_mode(regs)) {
                kgdb_handle_exception(9, SIGTRAP, 0, regs);
                return;
        }
 #endif
 
+#ifdef CONFIG_LIGHTWEIGHT_SPINLOCK_CHECK
+        if ((iir == SPINLOCK_BREAK_INSN) && !user_mode(regs)) {
+               die_if_kernel("Spinlock was trashed", regs, 1);
+       }
+#endif
+
        if (unlikely(iir != GDB_BREAK_INSN))
                parisc_printk_ratelimited(0, regs,
                        KERN_DEBUG "break %d,%d: pid=%d command='%s'\n",
index 539d1f0..bff5820 100644 (file)
@@ -906,11 +906,17 @@ config DATA_SHIFT
 
 config ARCH_FORCE_MAX_ORDER
        int "Order of maximal physically contiguous allocations"
+       range 7 8 if PPC64 && PPC_64K_PAGES
        default "8" if PPC64 && PPC_64K_PAGES
+       range 12 12 if PPC64 && !PPC_64K_PAGES
        default "12" if PPC64 && !PPC_64K_PAGES
+       range 8 10 if PPC32 && PPC_16K_PAGES
        default "8" if PPC32 && PPC_16K_PAGES
+       range 6 10 if PPC32 && PPC_64K_PAGES
        default "6" if PPC32 && PPC_64K_PAGES
+       range 4 10 if PPC32 && PPC_256K_PAGES
        default "4" if PPC32 && PPC_256K_PAGES
+       range 10 10
        default "10"
        help
          The kernel page allocator limits the size of maximal physically
index 85cde5b..771b794 100644 (file)
@@ -34,8 +34,6 @@ endif
 
 BOOTCFLAGS    := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
                 -fno-strict-aliasing -O2 -msoft-float -mno-altivec -mno-vsx \
-                $(call cc-option,-mno-prefixed) $(call cc-option,-mno-pcrel) \
-                $(call cc-option,-mno-mma) \
                 $(call cc-option,-mno-spe) $(call cc-option,-mspe=no) \
                 -pipe -fomit-frame-pointer -fno-builtin -fPIC -nostdinc \
                 $(LINUXINCLUDE)
@@ -71,6 +69,10 @@ BOOTAFLAGS   := -D__ASSEMBLY__ $(BOOTCFLAGS) -nostdinc
 
 BOOTARFLAGS    := -crD
 
+BOOTCFLAGS     += $(call cc-option,-mno-prefixed) \
+                  $(call cc-option,-mno-pcrel) \
+                  $(call cc-option,-mno-mma)
+
 ifdef CONFIG_CC_IS_CLANG
 BOOTCFLAGS += $(CLANG_FLAGS)
 BOOTAFLAGS += $(CLANG_FLAGS)
index 7113f93..ad18725 100644 (file)
@@ -96,7 +96,7 @@ config CRYPTO_AES_PPC_SPE
 
 config CRYPTO_AES_GCM_P10
        tristate "Stitched AES/GCM acceleration support on P10 or later CPU (PPC)"
-       depends on PPC64 && CPU_LITTLE_ENDIAN
+       depends on PPC64 && CPU_LITTLE_ENDIAN && VSX
        select CRYPTO_LIB_AES
        select CRYPTO_ALGAPI
        select CRYPTO_AEAD
index 05c7486..7b4f516 100644 (file)
@@ -22,15 +22,15 @@ sha1-ppc-spe-y := sha1-spe-asm.o sha1-spe-glue.o
 sha256-ppc-spe-y := sha256-spe-asm.o sha256-spe-glue.o
 crc32c-vpmsum-y := crc32c-vpmsum_asm.o crc32c-vpmsum_glue.o
 crct10dif-vpmsum-y := crct10dif-vpmsum_asm.o crct10dif-vpmsum_glue.o
-aes-gcm-p10-crypto-y := aes-gcm-p10-glue.o aes-gcm-p10.o ghashp8-ppc.o aesp8-ppc.o
+aes-gcm-p10-crypto-y := aes-gcm-p10-glue.o aes-gcm-p10.o ghashp10-ppc.o aesp10-ppc.o
 
 quiet_cmd_perl = PERL    $@
       cmd_perl = $(PERL) $< $(if $(CONFIG_CPU_LITTLE_ENDIAN), linux-ppc64le, linux-ppc64) > $@
 
-targets += aesp8-ppc.S ghashp8-ppc.S
+targets += aesp10-ppc.S ghashp10-ppc.S
 
-$(obj)/aesp8-ppc.S $(obj)/ghashp8-ppc.S: $(obj)/%.S: $(src)/%.pl FORCE
+$(obj)/aesp10-ppc.S $(obj)/ghashp10-ppc.S: $(obj)/%.S: $(src)/%.pl FORCE
        $(call if_changed,perl)
 
-OBJECT_FILES_NON_STANDARD_aesp8-ppc.o := y
-OBJECT_FILES_NON_STANDARD_ghashp8-ppc.o := y
+OBJECT_FILES_NON_STANDARD_aesp10-ppc.o := y
+OBJECT_FILES_NON_STANDARD_ghashp10-ppc.o := y
index bd3475f..4b6e899 100644 (file)
@@ -30,15 +30,15 @@ MODULE_AUTHOR("Danny Tsen <dtsen@linux.ibm.com");
 MODULE_LICENSE("GPL v2");
 MODULE_ALIAS_CRYPTO("aes");
 
-asmlinkage int aes_p8_set_encrypt_key(const u8 *userKey, const int bits,
+asmlinkage int aes_p10_set_encrypt_key(const u8 *userKey, const int bits,
                                      void *key);
-asmlinkage void aes_p8_encrypt(const u8 *in, u8 *out, const void *key);
+asmlinkage void aes_p10_encrypt(const u8 *in, u8 *out, const void *key);
 asmlinkage void aes_p10_gcm_encrypt(u8 *in, u8 *out, size_t len,
                                    void *rkey, u8 *iv, void *Xi);
 asmlinkage void aes_p10_gcm_decrypt(u8 *in, u8 *out, size_t len,
                                    void *rkey, u8 *iv, void *Xi);
 asmlinkage void gcm_init_htable(unsigned char htable[256], unsigned char Xi[16]);
-asmlinkage void gcm_ghash_p8(unsigned char *Xi, unsigned char *Htable,
+asmlinkage void gcm_ghash_p10(unsigned char *Xi, unsigned char *Htable,
                unsigned char *aad, unsigned int alen);
 
 struct aes_key {
@@ -93,7 +93,7 @@ static void set_aad(struct gcm_ctx *gctx, struct Hash_ctx *hash,
        gctx->aadLen = alen;
        i = alen & ~0xf;
        if (i) {
-               gcm_ghash_p8(nXi, hash->Htable+32, aad, i);
+               gcm_ghash_p10(nXi, hash->Htable+32, aad, i);
                aad += i;
                alen -= i;
        }
@@ -102,7 +102,7 @@ static void set_aad(struct gcm_ctx *gctx, struct Hash_ctx *hash,
                        nXi[i] ^= aad[i];
 
                memset(gctx->aad_hash, 0, 16);
-               gcm_ghash_p8(gctx->aad_hash, hash->Htable+32, nXi, 16);
+               gcm_ghash_p10(gctx->aad_hash, hash->Htable+32, nXi, 16);
        } else {
                memcpy(gctx->aad_hash, nXi, 16);
        }
@@ -115,7 +115,7 @@ static void gcmp10_init(struct gcm_ctx *gctx, u8 *iv, unsigned char *rdkey,
 {
        __be32 counter = cpu_to_be32(1);
 
-       aes_p8_encrypt(hash->H, hash->H, rdkey);
+       aes_p10_encrypt(hash->H, hash->H, rdkey);
        set_subkey(hash->H);
        gcm_init_htable(hash->Htable+32, hash->H);
 
@@ -126,7 +126,7 @@ static void gcmp10_init(struct gcm_ctx *gctx, u8 *iv, unsigned char *rdkey,
        /*
         * Encrypt counter vector as iv tag and increment counter.
         */
-       aes_p8_encrypt(iv, gctx->ivtag, rdkey);
+       aes_p10_encrypt(iv, gctx->ivtag, rdkey);
 
        counter = cpu_to_be32(2);
        *((__be32 *)(iv+12)) = counter;
@@ -160,7 +160,7 @@ static void finish_tag(struct gcm_ctx *gctx, struct Hash_ctx *hash, int len)
        /*
         * hash (AAD len and len)
         */
-       gcm_ghash_p8(hash->Htable, hash->Htable+32, aclen, 16);
+       gcm_ghash_p10(hash->Htable, hash->Htable+32, aclen, 16);
 
        for (i = 0; i < 16; i++)
                hash->Htable[i] ^= gctx->ivtag[i];
@@ -192,7 +192,7 @@ static int p10_aes_gcm_setkey(struct crypto_aead *aead, const u8 *key,
        int ret;
 
        vsx_begin();
-       ret = aes_p8_set_encrypt_key(key, keylen * 8, &ctx->enc_key);
+       ret = aes_p10_set_encrypt_key(key, keylen * 8, &ctx->enc_key);
        vsx_end();
 
        return ret ? -EINVAL : 0;
similarity index 99%
rename from arch/powerpc/crypto/aesp8-ppc.pl
rename to arch/powerpc/crypto/aesp10-ppc.pl
index 1f22aec..2c06ce2 100644 (file)
@@ -110,7 +110,7 @@ die "can't locate ppc-xlate.pl";
 open STDOUT,"| $^X $xlate $flavour ".shift || die "can't call $xlate: $!";
 
 $FRAME=8*$SIZE_T;
-$prefix="aes_p8";
+$prefix="aes_p10";
 
 $sp="r1";
 $vrsave="r12";
similarity index 97%
rename from arch/powerpc/crypto/ghashp8-ppc.pl
rename to arch/powerpc/crypto/ghashp10-ppc.pl
index b56603b..27a6b0b 100644 (file)
@@ -64,7 +64,7 @@ $code=<<___;
 
 .text
 
-.globl .gcm_init_p8
+.globl .gcm_init_p10
        lis             r0,0xfff0
        li              r8,0x10
        mfspr           $vrsave,256
@@ -110,7 +110,7 @@ $code=<<___;
        .long           0
        .byte           0,12,0x14,0,0,0,2,0
        .long           0
-.size  .gcm_init_p8,.-.gcm_init_p8
+.size  .gcm_init_p10,.-.gcm_init_p10
 
 .globl .gcm_init_htable
        lis             r0,0xfff0
@@ -237,7 +237,7 @@ $code=<<___;
        .long           0
 .size  .gcm_init_htable,.-.gcm_init_htable
 
-.globl .gcm_gmult_p8
+.globl .gcm_gmult_p10
        lis             r0,0xfff8
        li              r8,0x10
        mfspr           $vrsave,256
@@ -283,9 +283,9 @@ $code=<<___;
        .long           0
        .byte           0,12,0x14,0,0,0,2,0
        .long           0
-.size  .gcm_gmult_p8,.-.gcm_gmult_p8
+.size  .gcm_gmult_p10,.-.gcm_gmult_p10
 
-.globl .gcm_ghash_p8
+.globl .gcm_ghash_p10
        lis             r0,0xfff8
        li              r8,0x10
        mfspr           $vrsave,256
@@ -350,7 +350,7 @@ Loop:
        .long           0
        .byte           0,12,0x14,0,0,0,4,0
        .long           0
-.size  .gcm_ghash_p8,.-.gcm_ghash_p8
+.size  .gcm_ghash_p10,.-.gcm_ghash_p10
 
 .asciz  "GHASH for PowerISA 2.07, CRYPTOGAMS by <appro\@openssl.org>"
 .align  2
index 47228b1..5bf6a4d 100644 (file)
@@ -126,18 +126,6 @@ ATOMIC_OPS(xor, xor, "", K)
 #undef ATOMIC_OP_RETURN_RELAXED
 #undef ATOMIC_OP
 
-#define arch_atomic_cmpxchg(v, o, n) \
-       (arch_cmpxchg(&((v)->counter), (o), (n)))
-#define arch_atomic_cmpxchg_relaxed(v, o, n) \
-       arch_cmpxchg_relaxed(&((v)->counter), (o), (n))
-#define arch_atomic_cmpxchg_acquire(v, o, n) \
-       arch_cmpxchg_acquire(&((v)->counter), (o), (n))
-
-#define arch_atomic_xchg(v, new) \
-       (arch_xchg(&((v)->counter), new))
-#define arch_atomic_xchg_relaxed(v, new) \
-       arch_xchg_relaxed(&((v)->counter), (new))
-
 /**
  * atomic_fetch_add_unless - add unless the number is a given value
  * @v: pointer of type atomic_t
@@ -396,18 +384,6 @@ static __inline__ s64 arch_atomic64_dec_if_positive(atomic64_t *v)
 }
 #define arch_atomic64_dec_if_positive arch_atomic64_dec_if_positive
 
-#define arch_atomic64_cmpxchg(v, o, n) \
-       (arch_cmpxchg(&((v)->counter), (o), (n)))
-#define arch_atomic64_cmpxchg_relaxed(v, o, n) \
-       arch_cmpxchg_relaxed(&((v)->counter), (o), (n))
-#define arch_atomic64_cmpxchg_acquire(v, o, n) \
-       arch_cmpxchg_acquire(&((v)->counter), (o), (n))
-
-#define arch_atomic64_xchg(v, new) \
-       (arch_xchg(&((v)->counter), new))
-#define arch_atomic64_xchg_relaxed(v, new) \
-       arch_xchg_relaxed(&((v)->counter), (new))
-
 /**
  * atomic64_fetch_add_unless - add unless the number is a given value
  * @v: pointer of type atomic64_t
diff --git a/arch/powerpc/include/asm/bugs.h b/arch/powerpc/include/asm/bugs.h
deleted file mode 100644 (file)
index 01b8f6c..0000000
+++ /dev/null
@@ -1,15 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-#ifndef _ASM_POWERPC_BUGS_H
-#define _ASM_POWERPC_BUGS_H
-
-/*
- */
-
-/*
- * This file is included by 'init/main.c' to check for
- * architecture-dependent bugs.
- */
-
-static inline void check_bugs(void) { }
-
-#endif /* _ASM_POWERPC_BUGS_H */
index 678b5bd..34e14df 100644 (file)
@@ -205,7 +205,6 @@ extern void iommu_register_group(struct iommu_table_group *table_group,
                                 int pci_domain_number, unsigned long pe_num);
 extern int iommu_add_device(struct iommu_table_group *table_group,
                struct device *dev);
-extern void iommu_del_device(struct device *dev);
 extern long iommu_tce_xchg(struct mm_struct *mm, struct iommu_table *tbl,
                unsigned long entry, unsigned long *hpa,
                enum dma_data_direction *direction);
@@ -229,10 +228,6 @@ static inline int iommu_add_device(struct iommu_table_group *table_group,
 {
        return 0;
 }
-
-static inline void iommu_del_device(struct device *dev)
-{
-}
 #endif /* !CONFIG_IOMMU_API */
 
 u64 dma_iommu_get_required_mask(struct device *dev);
index 9972626..6a88bfd 100644 (file)
@@ -165,9 +165,6 @@ static inline bool is_ioremap_addr(const void *x)
 
        return addr >= IOREMAP_BASE && addr < IOREMAP_END;
 }
-
-struct seq_file;
-void arch_report_meminfo(struct seq_file *m);
 #endif /* CONFIG_PPC64 */
 
 #endif /* __ASSEMBLY__ */
index 038ce8d..8920862 100644 (file)
@@ -144,7 +144,7 @@ static bool dma_iommu_bypass_supported(struct device *dev, u64 mask)
 /* We support DMA to/from any memory page via the iommu */
 int dma_iommu_dma_supported(struct device *dev, u64 mask)
 {
-       struct iommu_table *tbl = get_iommu_table_base(dev);
+       struct iommu_table *tbl;
 
        if (dev_is_pci(dev) && dma_iommu_bypass_supported(dev, mask)) {
                /*
@@ -162,6 +162,8 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
                return 1;
        }
 
+       tbl = get_iommu_table_base(dev);
+
        if (!tbl) {
                dev_err(dev, "Warning: IOMMU dma not supported: mask 0x%08llx, table unavailable\n", mask);
                return 0;
index 0089dd4..67f0b01 100644 (file)
@@ -518,7 +518,7 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,
                /* Convert entry to a dma_addr_t */
                entry += tbl->it_offset;
                dma_addr = entry << tbl->it_page_shift;
-               dma_addr |= (s->offset & ~IOMMU_PAGE_MASK(tbl));
+               dma_addr |= (vaddr & ~IOMMU_PAGE_MASK(tbl));
 
                DBG("  - %lu pages, entry: %lx, dma_addr: %lx\n",
                            npages, entry, dma_addr);
@@ -905,6 +905,7 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
        unsigned int order;
        unsigned int nio_pages, io_order;
        struct page *page;
+       int tcesize = (1 << tbl->it_page_shift);
 
        size = PAGE_ALIGN(size);
        order = get_order(size);
@@ -931,7 +932,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
        memset(ret, 0, size);
 
        /* Set up tces to cover the allocated range */
-       nio_pages = size >> tbl->it_page_shift;
+       nio_pages = IOMMU_PAGE_ALIGN(size, tbl) >> tbl->it_page_shift;
+
        io_order = get_iommu_order(size, tbl);
        mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
                              mask >> tbl->it_page_shift, io_order, 0);
@@ -939,7 +941,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
                free_pages((unsigned long)ret, order);
                return NULL;
        }
-       *dma_handle = mapping;
+
+       *dma_handle = mapping | ((u64)ret & (tcesize - 1));
        return ret;
 }
 
@@ -950,7 +953,7 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size,
                unsigned int nio_pages;
 
                size = PAGE_ALIGN(size);
-               nio_pages = size >> tbl->it_page_shift;
+               nio_pages = IOMMU_PAGE_ALIGN(size, tbl) >> tbl->it_page_shift;
                iommu_free(tbl, dma_handle, nio_pages);
                size = PAGE_ALIGN(size);
                free_pages((unsigned long)vaddr, get_order(size));
@@ -1168,23 +1171,6 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_add_device);
 
-void iommu_del_device(struct device *dev)
-{
-       /*
-        * Some devices might not have IOMMU table and group
-        * and we needn't detach them from the associated
-        * IOMMU groups
-        */
-       if (!device_iommu_mapped(dev)) {
-               pr_debug("iommu_tce: skipping device %s with no tbl\n",
-                        dev_name(dev));
-               return;
-       }
-
-       iommu_group_remove_device(dev);
-}
-EXPORT_SYMBOL_GPL(iommu_del_device);
-
 /*
  * A simple iommu_table_group_ops which only allows reusing the existing
  * iommu_table. This handles VFIO for POWER7 or the nested KVM.
index 85bdd7d..48e0eaf 100644 (file)
@@ -93,11 +93,12 @@ static int process_ISA_OF_ranges(struct device_node *isa_node,
        }
 
 inval_range:
-       if (!phb_io_base_phys) {
+       if (phb_io_base_phys) {
                pr_err("no ISA IO ranges or unexpected isa range, mapping 64k\n");
                remap_isa_base(phb_io_base_phys, 0x10000);
+               return 0;
        }
-       return 0;
+       return -EINVAL;
 }
 
 
index 265801a..e95660e 100644 (file)
@@ -417,9 +417,9 @@ noinstr static void nmi_ipi_lock_start(unsigned long *flags)
 {
        raw_local_irq_save(*flags);
        hard_irq_disable();
-       while (arch_atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1) {
+       while (raw_atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1) {
                raw_local_irq_restore(*flags);
-               spin_until_cond(arch_atomic_read(&__nmi_ipi_lock) == 0);
+               spin_until_cond(raw_atomic_read(&__nmi_ipi_lock) == 0);
                raw_local_irq_save(*flags);
                hard_irq_disable();
        }
@@ -427,15 +427,15 @@ noinstr static void nmi_ipi_lock_start(unsigned long *flags)
 
 noinstr static void nmi_ipi_lock(void)
 {
-       while (arch_atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1)
-               spin_until_cond(arch_atomic_read(&__nmi_ipi_lock) == 0);
+       while (raw_atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1)
+               spin_until_cond(raw_atomic_read(&__nmi_ipi_lock) == 0);
 }
 
 noinstr static void nmi_ipi_unlock(void)
 {
        smp_mb();
-       WARN_ON(arch_atomic_read(&__nmi_ipi_lock) != 1);
-       arch_atomic_set(&__nmi_ipi_lock, 0);
+       WARN_ON(raw_atomic_read(&__nmi_ipi_lock) != 1);
+       raw_atomic_set(&__nmi_ipi_lock, 0);
 }
 
 noinstr static void nmi_ipi_unlock_end(unsigned long *flags)
@@ -1605,6 +1605,7 @@ static void add_cpu_to_masks(int cpu)
 }
 
 /* Activate a secondary processor. */
+__no_stack_protector
 void start_secondary(void *unused)
 {
        unsigned int cpu = raw_smp_processor_id();
index 828d0f4..cba6dd1 100644 (file)
@@ -200,7 +200,7 @@ static int __init TAU_init(void)
        tau_int_enable = IS_ENABLED(CONFIG_TAU_INT) &&
                         !strcmp(cur_cpu_spec->platform, "ppc750");
 
-       tau_workq = alloc_workqueue("tau", WQ_UNBOUND, 1);
+       tau_workq = alloc_ordered_workqueue("tau", 0);
        if (!tau_workq)
                return -ENOMEM;
 
index 26245aa..2297aa7 100644 (file)
@@ -1040,8 +1040,8 @@ void radix__ptep_set_access_flags(struct vm_area_struct *vma, pte_t *ptep,
                                  pte_t entry, unsigned long address, int psize)
 {
        struct mm_struct *mm = vma->vm_mm;
-       unsigned long set = pte_val(entry) & (_PAGE_DIRTY | _PAGE_ACCESSED |
-                                             _PAGE_RW | _PAGE_EXEC);
+       unsigned long set = pte_val(entry) & (_PAGE_DIRTY | _PAGE_SOFT_DIRTY |
+                                             _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC);
 
        unsigned long change = pte_val(entry) ^ pte_val(*ptep);
        /*
index ce804b7..0bd4866 100644 (file)
@@ -795,12 +795,20 @@ void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush)
                goto out;
 
        if (current->active_mm == mm) {
+               unsigned long flags;
+
                WARN_ON_ONCE(current->mm != NULL);
-               /* Is a kernel thread and is using mm as the lazy tlb */
+               /*
+                * It is a kernel thread and is using mm as the lazy tlb, so
+                * switch it to init_mm. This is not always called from IPI
+                * (e.g., flush_type_needed), so must disable irqs.
+                */
+               local_irq_save(flags);
                mmgrab_lazy_tlb(&init_mm);
                current->active_mm = &init_mm;
                switch_mm_irqs_off(mm, &init_mm, current);
                mmdrop_lazy_tlb(mm);
+               local_irq_restore(flags);
        }
 
        /*
index e93aefc..37043df 100644 (file)
@@ -101,6 +101,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
                bpf_hdr = jit_data->header;
                proglen = jit_data->proglen;
                extra_pass = true;
+               /* During extra pass, ensure index is reset before repopulating extable entries */
+               cgctx.exentry_idx = 0;
                goto skip_init_ctx;
        }
 
index 0d9b760..3e2e252 100644 (file)
@@ -265,6 +265,7 @@ config CPM2
 config FSL_ULI1575
        bool "ULI1575 PCIe south bridge support"
        depends on FSL_SOC_BOOKE || PPC_86xx
+       depends on PCI
        select FSL_PCI
        select GENERIC_ISA_DMA
        help
index 193cc9c..0c41f4b 100644 (file)
@@ -76,7 +76,8 @@ int pmac_newworld;
 
 static int current_root_goodness = -1;
 
-#define DEFAULT_ROOT_DEVICE Root_SDA1  /* sda1 - slightly silly choice */
+/* sda1 - slightly silly choice */
+#define DEFAULT_ROOT_DEVICE    MKDEV(SCSI_DISK0_MAJOR, 1)
 
 sys_ctrler_t sys_ctrler = SYS_CTRLER_UNKNOWN;
 EXPORT_SYMBOL(sys_ctrler);
index 233a50e..7725492 100644 (file)
@@ -865,28 +865,3 @@ void __init pnv_pci_init(void)
        /* Configure IOMMU DMA hooks */
        set_pci_dma_ops(&dma_iommu_ops);
 }
-
-static int pnv_tce_iommu_bus_notifier(struct notifier_block *nb,
-               unsigned long action, void *data)
-{
-       struct device *dev = data;
-
-       switch (action) {
-       case BUS_NOTIFY_DEL_DEVICE:
-               iommu_del_device(dev);
-               return 0;
-       default:
-               return 0;
-       }
-}
-
-static struct notifier_block pnv_tce_iommu_bus_nb = {
-       .notifier_call = pnv_tce_iommu_bus_notifier,
-};
-
-static int __init pnv_tce_iommu_bus_notifier_init(void)
-{
-       bus_register_notifier(&pci_bus_type, &pnv_tce_iommu_bus_nb);
-       return 0;
-}
-machine_subsys_initcall_sync(powernv, pnv_tce_iommu_bus_notifier_init);
index 719c97a..47f8eab 100644 (file)
@@ -564,8 +564,7 @@ int __init dlpar_workqueue_init(void)
        if (pseries_hp_wq)
                return 0;
 
-       pseries_hp_wq = alloc_workqueue("pseries hotplug workqueue",
-                       WQ_UNBOUND, 1);
+       pseries_hp_wq = alloc_ordered_workqueue("pseries hotplug workqueue", 0);
 
        return pseries_hp_wq ? 0 : -ENOMEM;
 }
index 7464fa6..d59e8a9 100644 (file)
@@ -91,19 +91,24 @@ static struct iommu_table_group *iommu_pseries_alloc_group(int node)
 static void iommu_pseries_free_group(struct iommu_table_group *table_group,
                const char *node_name)
 {
-       struct iommu_table *tbl;
-
        if (!table_group)
                return;
 
-       tbl = table_group->tables[0];
 #ifdef CONFIG_IOMMU_API
        if (table_group->group) {
                iommu_group_put(table_group->group);
                BUG_ON(table_group->group);
        }
 #endif
-       iommu_tce_table_put(tbl);
+
+       /* Default DMA window table is at index 0, while DDW at 1. SR-IOV
+        * adapters only have table on index 1.
+        */
+       if (table_group->tables[0])
+               iommu_tce_table_put(table_group->tables[0]);
+
+       if (table_group->tables[1])
+               iommu_tce_table_put(table_group->tables[1]);
 
        kfree(table_group);
 }
@@ -312,13 +317,22 @@ static void tce_free_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
 static void tce_freemulti_pSeriesLP(struct iommu_table *tbl, long tcenum, long npages)
 {
        u64 rc;
+       long rpages = npages;
+       unsigned long limit;
 
        if (!firmware_has_feature(FW_FEATURE_STUFF_TCE))
                return tce_free_pSeriesLP(tbl->it_index, tcenum,
                                          tbl->it_page_shift, npages);
 
-       rc = plpar_tce_stuff((u64)tbl->it_index,
-                            (u64)tcenum << tbl->it_page_shift, 0, npages);
+       do {
+               limit = min_t(unsigned long, rpages, 512);
+
+               rc = plpar_tce_stuff((u64)tbl->it_index,
+                                    (u64)tcenum << tbl->it_page_shift, 0, limit);
+
+               rpages -= limit;
+               tcenum += limit;
+       } while (rpages > 0 && !rc);
 
        if (rc && printk_ratelimit()) {
                printk("tce_freemulti_pSeriesLP: plpar_tce_stuff failed\n");
@@ -1695,31 +1709,6 @@ static int __init disable_multitce(char *str)
 
 __setup("multitce=", disable_multitce);
 
-static int tce_iommu_bus_notifier(struct notifier_block *nb,
-               unsigned long action, void *data)
-{
-       struct device *dev = data;
-
-       switch (action) {
-       case BUS_NOTIFY_DEL_DEVICE:
-               iommu_del_device(dev);
-               return 0;
-       default:
-               return 0;
-       }
-}
-
-static struct notifier_block tce_iommu_bus_nb = {
-       .notifier_call = tce_iommu_bus_notifier,
-};
-
-static int __init tce_iommu_bus_notifier_init(void)
-{
-       bus_register_notifier(&pci_bus_type, &tce_iommu_bus_nb);
-       return 0;
-}
-machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
-
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 struct iommu_group *pSeries_pci_device_group(struct pci_controller *hose,
                                             struct pci_dev *pdev)
index 6f5e272..78473d6 100644 (file)
@@ -5,6 +5,11 @@ KCSAN_SANITIZE := n
 
 targets += trampoline_$(BITS).o purgatory.ro
 
+# When profile-guided optimization is enabled, llvm emits two different
+# overlapping text sections, which is not supported by kexec. Remove profile
+# optimization flags.
+KBUILD_CFLAGS := $(filter-out -fprofile-sample-use=% -fprofile-use=%,$(KBUILD_CFLAGS))
+
 LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined
 
 $(obj)/purgatory.ro: $(obj)/trampoline_$(BITS).o FORCE
index 728d3c2..70c4c59 100644 (file)
@@ -88,7 +88,7 @@ static unsigned long ndump = 64;
 static unsigned long nidump = 16;
 static unsigned long ncsum = 4096;
 static int termch;
-static char tmpstr[128];
+static char tmpstr[KSYM_NAME_LEN];
 static int tracing_enabled;
 
 static long bus_error_jmp[JMP_BUF_LEN];
index 348c0fa..c69572f 100644 (file)
@@ -26,6 +26,7 @@ config RISCV
        select ARCH_HAS_GIGANTIC_PAGE
        select ARCH_HAS_KCOV
        select ARCH_HAS_MMIOWB
+       select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
        select ARCH_HAS_PMEM_API
        select ARCH_HAS_PTE_SPECIAL
        select ARCH_HAS_SET_DIRECT_MAP if MMU
@@ -122,6 +123,7 @@ config RISCV
        select HAVE_RSEQ
        select HAVE_STACKPROTECTOR
        select HAVE_SYSCALL_TRACEPOINTS
+       select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
        select IRQ_DOMAIN
        select IRQ_FORCED_THREADING
        select KASAN_VMALLOC if KASAN
@@ -799,8 +801,11 @@ menu "Power management options"
 
 source "kernel/power/Kconfig"
 
+# Hibernation is only possible on systems where the SBI implementation has
+# marked its reserved memory as not accessible from, or does not run
+# from the same memory as, Linux
 config ARCH_HIBERNATION_POSSIBLE
-       def_bool y
+       def_bool NONPORTABLE
 
 config ARCH_HIBERNATION_HEADER
        def_bool HIBERNATION
index a105596..7b2637c 100644 (file)
@@ -1,2 +1,6 @@
+ifdef CONFIG_RELOCATABLE
+KBUILD_CFLAGS += -fno-pie
+endif
+
 obj-$(CONFIG_ERRATA_SIFIVE) += sifive/
 obj-$(CONFIG_ERRATA_THEAD) += thead/
index bba4729..f5dfef6 100644 (file)
@@ -238,78 +238,6 @@ static __always_inline s64 arch_atomic64_fetch_add_unless(atomic64_t *v, s64 a,
 #define arch_atomic64_fetch_add_unless arch_atomic64_fetch_add_unless
 #endif
 
-/*
- * atomic_{cmp,}xchg is required to have exactly the same ordering semantics as
- * {cmp,}xchg and the operations that return, so they need a full barrier.
- */
-#define ATOMIC_OP(c_t, prefix, size)                                   \
-static __always_inline                                                 \
-c_t arch_atomic##prefix##_xchg_relaxed(atomic##prefix##_t *v, c_t n)   \
-{                                                                      \
-       return __xchg_relaxed(&(v->counter), n, size);                  \
-}                                                                      \
-static __always_inline                                                 \
-c_t arch_atomic##prefix##_xchg_acquire(atomic##prefix##_t *v, c_t n)   \
-{                                                                      \
-       return __xchg_acquire(&(v->counter), n, size);                  \
-}                                                                      \
-static __always_inline                                                 \
-c_t arch_atomic##prefix##_xchg_release(atomic##prefix##_t *v, c_t n)   \
-{                                                                      \
-       return __xchg_release(&(v->counter), n, size);                  \
-}                                                                      \
-static __always_inline                                                 \
-c_t arch_atomic##prefix##_xchg(atomic##prefix##_t *v, c_t n)           \
-{                                                                      \
-       return __arch_xchg(&(v->counter), n, size);                     \
-}                                                                      \
-static __always_inline                                                 \
-c_t arch_atomic##prefix##_cmpxchg_relaxed(atomic##prefix##_t *v,       \
-                                    c_t o, c_t n)                      \
-{                                                                      \
-       return __cmpxchg_relaxed(&(v->counter), o, n, size);            \
-}                                                                      \
-static __always_inline                                                 \
-c_t arch_atomic##prefix##_cmpxchg_acquire(atomic##prefix##_t *v,       \
-                                    c_t o, c_t n)                      \
-{                                                                      \
-       return __cmpxchg_acquire(&(v->counter), o, n, size);            \
-}                                                                      \
-static __always_inline                                                 \
-c_t arch_atomic##prefix##_cmpxchg_release(atomic##prefix##_t *v,       \
-                                    c_t o, c_t n)                      \
-{                                                                      \
-       return __cmpxchg_release(&(v->counter), o, n, size);            \
-}                                                                      \
-static __always_inline                                                 \
-c_t arch_atomic##prefix##_cmpxchg(atomic##prefix##_t *v, c_t o, c_t n) \
-{                                                                      \
-       return __cmpxchg(&(v->counter), o, n, size);                    \
-}
-
-#ifdef CONFIG_GENERIC_ATOMIC64
-#define ATOMIC_OPS()                                                   \
-       ATOMIC_OP(int,   , 4)
-#else
-#define ATOMIC_OPS()                                                   \
-       ATOMIC_OP(int,   , 4)                                           \
-       ATOMIC_OP(s64, 64, 8)
-#endif
-
-ATOMIC_OPS()
-
-#define arch_atomic_xchg_relaxed       arch_atomic_xchg_relaxed
-#define arch_atomic_xchg_acquire       arch_atomic_xchg_acquire
-#define arch_atomic_xchg_release       arch_atomic_xchg_release
-#define arch_atomic_xchg               arch_atomic_xchg
-#define arch_atomic_cmpxchg_relaxed    arch_atomic_cmpxchg_relaxed
-#define arch_atomic_cmpxchg_acquire    arch_atomic_cmpxchg_acquire
-#define arch_atomic_cmpxchg_release    arch_atomic_cmpxchg_release
-#define arch_atomic_cmpxchg            arch_atomic_cmpxchg
-
-#undef ATOMIC_OPS
-#undef ATOMIC_OP
-
 static __always_inline bool arch_atomic_inc_unless_negative(atomic_t *v)
 {
        int prev, rc;
index fe6f230..ce1ebda 100644 (file)
@@ -36,6 +36,9 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
                               unsigned long addr, pte_t *ptep,
                               pte_t pte, int dirty);
 
+#define __HAVE_ARCH_HUGE_PTEP_GET
+pte_t huge_ptep_get(pte_t *ptep);
+
 pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
 #define arch_make_huge_pte arch_make_huge_pte
 
index d887a54..0bbffd5 100644 (file)
@@ -8,41 +8,8 @@
 #include <asm-generic/pgalloc.h>
 #include <asm/pgtable.h>
 
-static inline int split_pmd_page(unsigned long addr)
-{
-       int i;
-       unsigned long pfn = PFN_DOWN(__pa((addr & PMD_MASK)));
-       pmd_t *pmd = pmd_off_k(addr);
-       pte_t *pte = pte_alloc_one_kernel(&init_mm);
-
-       if (!pte)
-               return -ENOMEM;
-
-       for (i = 0; i < PTRS_PER_PTE; i++)
-               set_pte(pte + i, pfn_pte(pfn + i, PAGE_KERNEL));
-       set_pmd(pmd, pfn_pmd(PFN_DOWN(__pa(pte)), PAGE_TABLE));
-
-       flush_tlb_kernel_range(addr, addr + PMD_SIZE);
-       return 0;
-}
-
 static inline bool arch_kfence_init_pool(void)
 {
-       int ret;
-       unsigned long addr;
-       pmd_t *pmd;
-
-       for (addr = (unsigned long)__kfence_pool; is_kfence_address((void *)addr);
-            addr += PAGE_SIZE) {
-               pmd = pmd_off_k(addr);
-
-               if (pmd_leaf(*pmd)) {
-                       ret = split_pmd_page(addr);
-                       if (ret)
-                               return false;
-               }
-       }
-
        return true;
 }
 
index d42c901..665bbc9 100644 (file)
 
 #include <linux/perf_event.h>
 #define perf_arch_bpf_user_pt_regs(regs) (struct user_regs_struct *)regs
+
+#define perf_arch_fetch_caller_regs(regs, __ip) { \
+       (regs)->epc = (__ip); \
+       (regs)->s0 = (unsigned long) __builtin_frame_address(0); \
+       (regs)->sp = current_stack_pointer; \
+       (regs)->status = SR_PP; \
+}
 #endif /* _ASM_RISCV_PERF_EVENT_H */
index 2258b27..75970ee 100644 (file)
@@ -165,8 +165,7 @@ extern struct pt_alloc_ops pt_ops __initdata;
                                         _PAGE_EXEC | _PAGE_WRITE)
 
 #define PAGE_COPY              PAGE_READ
-#define PAGE_COPY_EXEC         PAGE_EXEC
-#define PAGE_COPY_READ_EXEC    PAGE_READ_EXEC
+#define PAGE_COPY_EXEC         PAGE_READ_EXEC
 #define PAGE_SHARED            PAGE_WRITE
 #define PAGE_SHARED_EXEC       PAGE_WRITE_EXEC
 
index c4b7701..0d55584 100644 (file)
@@ -70,7 +70,7 @@ asmlinkage void smp_callin(void);
 
 #if defined CONFIG_HOTPLUG_CPU
 int __cpu_disable(void);
-void __cpu_die(unsigned int cpu);
+static inline void __cpu_die(unsigned int cpu) { }
 #endif /* CONFIG_HOTPLUG_CPU */
 
 #else
index fbdccc2..153864e 100644 (file)
@@ -23,6 +23,10 @@ ifdef CONFIG_FTRACE
 CFLAGS_REMOVE_alternative.o = $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_cpufeature.o = $(CC_FLAGS_FTRACE)
 endif
+ifdef CONFIG_RELOCATABLE
+CFLAGS_alternative.o += -fno-pie
+CFLAGS_cpufeature.o += -fno-pie
+endif
 ifdef CONFIG_KASAN
 KASAN_SANITIZE_alternative.o := n
 KASAN_SANITIZE_cpufeature.o := n
index a941adc..457a18e 100644 (file)
@@ -8,6 +8,7 @@
 #include <linux/sched.h>
 #include <linux/err.h>
 #include <linux/irq.h>
+#include <linux/cpuhotplug.h>
 #include <linux/cpu.h>
 #include <linux/sched/hotplug.h>
 #include <asm/irq.h>
@@ -49,17 +50,15 @@ int __cpu_disable(void)
        return ret;
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
 /*
- * Called on the thread which is asking for a CPU to be shutdown.
+ * Called on the thread which is asking for a CPU to be shutdown, if the
+ * CPU reported dead to the hotplug core.
  */
-void __cpu_die(unsigned int cpu)
+void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
 {
        int ret = 0;
 
-       if (!cpu_wait_death(cpu, 5)) {
-               pr_err("CPU %u: didn't die\n", cpu);
-               return;
-       }
        pr_notice("CPU%u: off\n", cpu);
 
        /* Verify from the firmware if the cpu is really stopped*/
@@ -76,9 +75,10 @@ void __noreturn arch_cpu_idle_dead(void)
 {
        idle_task_exit();
 
-       (void)cpu_report_death();
+       cpuhp_ap_report_dead();
 
        cpu_ops[smp_processor_id()]->cpu_stop();
        /* It should never reach here */
        BUG();
 }
+#endif
index 5d7cb99..7b593d4 100644 (file)
@@ -22,7 +22,7 @@ KCOV_INSTRUMENT       := n
 
 $(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_ \
                               --remove-section=.note.gnu.property \
-                              --prefix-alloc-sections=.init
+                              --prefix-alloc-sections=.init.pi
 $(obj)/%.pi.o: $(obj)/%.o FORCE
        $(call if_changed,objcopy)
 
index c40139e..8265ff4 100644 (file)
@@ -4,3 +4,5 @@ obj-$(CONFIG_RETHOOK)           += rethook.o rethook_trampoline.o
 obj-$(CONFIG_KPROBES_ON_FTRACE)        += ftrace.o
 obj-$(CONFIG_UPROBES)          += uprobes.o decode-insn.o simulate-insn.o
 CFLAGS_REMOVE_simulate-insn.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_rethook.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_rethook_trampoline.o = $(CC_FLAGS_FTRACE)
index f03b569..e5f9f46 100644 (file)
@@ -84,11 +84,8 @@ SECTIONS
        __init_data_begin = .;
        INIT_DATA_SECTION(16)
 
-       /* Those sections result from the compilation of kernel/pi/string.c */
-       .init.pidata : {
-               *(.init.srodata.cst8*)
-               *(.init__bug_table*)
-               *(.init.sdata*)
+       .init.pi : {
+               *(.init.pi*)
        }
 
        .init.bss : {
index a163a3e..e0ef56d 100644 (file)
@@ -3,6 +3,30 @@
 #include <linux/err.h>
 
 #ifdef CONFIG_RISCV_ISA_SVNAPOT
+pte_t huge_ptep_get(pte_t *ptep)
+{
+       unsigned long pte_num;
+       int i;
+       pte_t orig_pte = ptep_get(ptep);
+
+       if (!pte_present(orig_pte) || !pte_napot(orig_pte))
+               return orig_pte;
+
+       pte_num = napot_pte_num(napot_cont_order(orig_pte));
+
+       for (i = 0; i < pte_num; i++, ptep++) {
+               pte_t pte = ptep_get(ptep);
+
+               if (pte_dirty(pte))
+                       orig_pte = pte_mkdirty(orig_pte);
+
+               if (pte_young(pte))
+                       orig_pte = pte_mkyoung(orig_pte);
+       }
+
+       return orig_pte;
+}
+
 pte_t *huge_pte_alloc(struct mm_struct *mm,
                      struct vm_area_struct *vma,
                      unsigned long addr,
@@ -218,6 +242,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
 {
        pte_t pte = ptep_get(ptep);
        unsigned long order;
+       pte_t orig_pte;
        int i, pte_num;
 
        if (!pte_napot(pte)) {
@@ -228,9 +253,12 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
        order = napot_cont_order(pte);
        pte_num = napot_pte_num(order);
        ptep = huge_pte_offset(mm, addr, napot_cont_size(order));
+       orig_pte = get_clear_contig_flush(mm, addr, ptep, pte_num);
+
+       orig_pte = pte_wrprotect(orig_pte);
 
        for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
-               ptep_set_wrprotect(mm, addr, ptep);
+               set_pte_at(mm, addr, ptep, orig_pte);
 }
 
 pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
index 747e5b1..4fa420f 100644 (file)
@@ -23,6 +23,7 @@
 #ifdef CONFIG_RELOCATABLE
 #include <linux/elf.h>
 #endif
+#include <linux/kfence.h>
 
 #include <asm/fixmap.h>
 #include <asm/tlbflush.h>
@@ -293,7 +294,7 @@ static const pgprot_t protection_map[16] = {
        [VM_EXEC]                                       = PAGE_EXEC,
        [VM_EXEC | VM_READ]                             = PAGE_READ_EXEC,
        [VM_EXEC | VM_WRITE]                            = PAGE_COPY_EXEC,
-       [VM_EXEC | VM_WRITE | VM_READ]                  = PAGE_COPY_READ_EXEC,
+       [VM_EXEC | VM_WRITE | VM_READ]                  = PAGE_COPY_EXEC,
        [VM_SHARED]                                     = PAGE_NONE,
        [VM_SHARED | VM_READ]                           = PAGE_READ,
        [VM_SHARED | VM_WRITE]                          = PAGE_SHARED,
@@ -659,18 +660,19 @@ void __init create_pgd_mapping(pgd_t *pgdp,
        create_pgd_next_mapping(nextp, va, pa, sz, prot);
 }
 
-static uintptr_t __init best_map_size(phys_addr_t base, phys_addr_t size)
+static uintptr_t __init best_map_size(phys_addr_t pa, uintptr_t va,
+                                     phys_addr_t size)
 {
-       if (!(base & (PGDIR_SIZE - 1)) && size >= PGDIR_SIZE)
+       if (!(pa & (PGDIR_SIZE - 1)) && !(va & (PGDIR_SIZE - 1)) && size >= PGDIR_SIZE)
                return PGDIR_SIZE;
 
-       if (!(base & (P4D_SIZE - 1)) && size >= P4D_SIZE)
+       if (!(pa & (P4D_SIZE - 1)) && !(va & (P4D_SIZE - 1)) && size >= P4D_SIZE)
                return P4D_SIZE;
 
-       if (!(base & (PUD_SIZE - 1)) && size >= PUD_SIZE)
+       if (!(pa & (PUD_SIZE - 1)) && !(va & (PUD_SIZE - 1)) && size >= PUD_SIZE)
                return PUD_SIZE;
 
-       if (!(base & (PMD_SIZE - 1)) && size >= PMD_SIZE)
+       if (!(pa & (PMD_SIZE - 1)) && !(va & (PMD_SIZE - 1)) && size >= PMD_SIZE)
                return PMD_SIZE;
 
        return PAGE_SIZE;
@@ -922,9 +924,9 @@ static void __init create_kernel_page_table(pgd_t *pgdir, bool early)
 static void __init create_fdt_early_page_table(uintptr_t fix_fdt_va,
                                               uintptr_t dtb_pa)
 {
+#ifndef CONFIG_BUILTIN_DTB
        uintptr_t pa = dtb_pa & ~(PMD_SIZE - 1);
 
-#ifndef CONFIG_BUILTIN_DTB
        /* Make sure the fdt fixmap address is always aligned on PMD size */
        BUILD_BUG_ON(FIX_FDT % (PMD_SIZE / PAGE_SIZE));
 
@@ -1167,14 +1169,16 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 }
 
 static void __init create_linear_mapping_range(phys_addr_t start,
-                                              phys_addr_t end)
+                                              phys_addr_t end,
+                                              uintptr_t fixed_map_size)
 {
        phys_addr_t pa;
        uintptr_t va, map_size;
 
        for (pa = start; pa < end; pa += map_size) {
                va = (uintptr_t)__va(pa);
-               map_size = best_map_size(pa, end - pa);
+               map_size = fixed_map_size ? fixed_map_size :
+                                           best_map_size(pa, va, end - pa);
 
                create_pgd_mapping(swapper_pg_dir, va, pa, map_size,
                                   pgprot_from_va(va));
@@ -1184,6 +1188,7 @@ static void __init create_linear_mapping_range(phys_addr_t start,
 static void __init create_linear_mapping_page_table(void)
 {
        phys_addr_t start, end;
+       phys_addr_t kfence_pool __maybe_unused;
        u64 i;
 
 #ifdef CONFIG_STRICT_KERNEL_RWX
@@ -1197,6 +1202,19 @@ static void __init create_linear_mapping_page_table(void)
        memblock_mark_nomap(krodata_start, krodata_size);
 #endif
 
+#ifdef CONFIG_KFENCE
+       /*
+        *  kfence pool must be backed by PAGE_SIZE mappings, so allocate it
+        *  before we setup the linear mapping so that we avoid using hugepages
+        *  for this region.
+        */
+       kfence_pool = memblock_phys_alloc(KFENCE_POOL_SIZE, PAGE_SIZE);
+       BUG_ON(!kfence_pool);
+
+       memblock_mark_nomap(kfence_pool, KFENCE_POOL_SIZE);
+       __kfence_pool = __va(kfence_pool);
+#endif
+
        /* Map all memory banks in the linear mapping */
        for_each_mem_range(i, &start, &end) {
                if (start >= end)
@@ -1207,17 +1225,25 @@ static void __init create_linear_mapping_page_table(void)
                if (end >= __pa(PAGE_OFFSET) + memory_limit)
                        end = __pa(PAGE_OFFSET) + memory_limit;
 
-               create_linear_mapping_range(start, end);
+               create_linear_mapping_range(start, end, 0);
        }
 
 #ifdef CONFIG_STRICT_KERNEL_RWX
-       create_linear_mapping_range(ktext_start, ktext_start + ktext_size);
+       create_linear_mapping_range(ktext_start, ktext_start + ktext_size, 0);
        create_linear_mapping_range(krodata_start,
-                                   krodata_start + krodata_size);
+                                   krodata_start + krodata_size, 0);
 
        memblock_clear_nomap(ktext_start,  ktext_size);
        memblock_clear_nomap(krodata_start, krodata_size);
 #endif
+
+#ifdef CONFIG_KFENCE
+       create_linear_mapping_range(kfence_pool,
+                                   kfence_pool + KFENCE_POOL_SIZE,
+                                   PAGE_SIZE);
+
+       memblock_clear_nomap(kfence_pool, KFENCE_POOL_SIZE);
+#endif
 }
 
 static void __init setup_vm_final(void)
index 5730797..bd2e27f 100644 (file)
@@ -35,6 +35,11 @@ CFLAGS_sha256.o := -D__DISABLE_EXPORTS
 CFLAGS_string.o := -D__DISABLE_EXPORTS
 CFLAGS_ctype.o := -D__DISABLE_EXPORTS
 
+# When profile-guided optimization is enabled, llvm emits two different
+# overlapping text sections, which is not supported by kexec. Remove profile
+# optimization flags.
+KBUILD_CFLAGS := $(filter-out -fprofile-sample-use=% -fprofile-use=%,$(KBUILD_CFLAGS))
+
 # When linking purgatory.ro with -r unresolved symbols are not checked,
 # also link a purgatory.chk binary without -r to check for unresolved symbols.
 PURGATORY_LDFLAGS := -e purgatory_start -z nodefaultlib
index db20c15..5b39918 100644 (file)
@@ -117,6 +117,7 @@ config S390
        select ARCH_SUPPORTS_ATOMIC_RMW
        select ARCH_SUPPORTS_DEBUG_PAGEALLOC
        select ARCH_SUPPORTS_HUGETLBFS
+       select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 && CC_IS_CLANG
        select ARCH_SUPPORTS_NUMA_BALANCING
        select ARCH_SUPPORTS_PER_VMA_LOCK
        select ARCH_USE_BUILTIN_BSWAP
@@ -469,19 +470,11 @@ config SCHED_SMT
 config SCHED_MC
        def_bool n
 
-config SCHED_BOOK
-       def_bool n
-
-config SCHED_DRAWER
-       def_bool n
-
 config SCHED_TOPOLOGY
        def_bool y
        prompt "Topology scheduler support"
        select SCHED_SMT
        select SCHED_MC
-       select SCHED_BOOK
-       select SCHED_DRAWER
        help
          Topology scheduler support improves the CPU scheduler's decision
          making when dealing with machines that have multi-threading,
@@ -716,7 +709,6 @@ config EADM_SCH
 config VFIO_CCW
        def_tristate n
        prompt "Support for VFIO-CCW subchannels"
-       depends on S390_CCW_IOMMU
        depends on VFIO
        select VFIO_MDEV
        help
@@ -728,7 +720,7 @@ config VFIO_CCW
 config VFIO_AP
        def_tristate n
        prompt "VFIO support for AP devices"
-       depends on S390_AP_IOMMU && KVM
+       depends on KVM
        depends on VFIO
        depends on ZCRYPT
        select VFIO_MDEV
index acb1f8b..c67f59d 100644 (file)
@@ -45,6 +45,13 @@ static void pgtable_populate(unsigned long addr, unsigned long end, enum populat
 
 static pte_t pte_z;
 
+static inline void kasan_populate(unsigned long start, unsigned long end, enum populate_mode mode)
+{
+       start = PAGE_ALIGN_DOWN(__sha(start));
+       end = PAGE_ALIGN(__sha(end));
+       pgtable_populate(start, end, mode);
+}
+
 static void kasan_populate_shadow(void)
 {
        pmd_t pmd_z = __pmd(__pa(kasan_early_shadow_pte) | _SEGMENT_ENTRY);
@@ -95,17 +102,17 @@ static void kasan_populate_shadow(void)
         */
 
        for_each_physmem_usable_range(i, &start, &end)
-               pgtable_populate(__sha(start), __sha(end), POPULATE_KASAN_MAP_SHADOW);
+               kasan_populate(start, end, POPULATE_KASAN_MAP_SHADOW);
        if (IS_ENABLED(CONFIG_KASAN_VMALLOC)) {
                untracked_end = VMALLOC_START;
                /* shallowly populate kasan shadow for vmalloc and modules */
-               pgtable_populate(__sha(VMALLOC_START), __sha(MODULES_END), POPULATE_KASAN_SHALLOW);
+               kasan_populate(VMALLOC_START, MODULES_END, POPULATE_KASAN_SHALLOW);
        } else {
                untracked_end = MODULES_VADDR;
        }
        /* populate kasan shadow for untracked memory */
-       pgtable_populate(__sha(ident_map_size), __sha(untracked_end), POPULATE_KASAN_ZERO_SHADOW);
-       pgtable_populate(__sha(MODULES_END), __sha(_REGION1_SIZE), POPULATE_KASAN_ZERO_SHADOW);
+       kasan_populate(ident_map_size, untracked_end, POPULATE_KASAN_ZERO_SHADOW);
+       kasan_populate(MODULES_END, _REGION1_SIZE, POPULATE_KASAN_ZERO_SHADOW);
 }
 
 static bool kasan_pgd_populate_zero_shadow(pgd_t *pgd, unsigned long addr,
index 4ccf66d..aa95cf6 100644 (file)
@@ -116,6 +116,7 @@ CONFIG_UNIX=y
 CONFIG_UNIX_DIAG=m
 CONFIG_XFRM_USER=m
 CONFIG_NET_KEY=m
+CONFIG_NET_TC_SKB_EXT=y
 CONFIG_SMC=m
 CONFIG_SMC_DIAG=m
 CONFIG_INET=y
@@ -591,8 +592,6 @@ CONFIG_VIRTIO_BALLOON=m
 CONFIG_VIRTIO_INPUT=y
 CONFIG_VHOST_NET=m
 CONFIG_VHOST_VSOCK=m
-CONFIG_S390_CCW_IOMMU=y
-CONFIG_S390_AP_IOMMU=y
 CONFIG_EXT4_FS=y
 CONFIG_EXT4_FS_POSIX_ACL=y
 CONFIG_EXT4_FS_SECURITY=y
@@ -703,6 +702,7 @@ CONFIG_IMA_DEFAULT_HASH_SHA256=y
 CONFIG_IMA_WRITE_POLICY=y
 CONFIG_IMA_APPRAISE=y
 CONFIG_LSM="yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
+CONFIG_INIT_STACK_NONE=y
 CONFIG_CRYPTO_USER=m
 # CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
 CONFIG_CRYPTO_PCRYPT=m
index 693297a..f041945 100644 (file)
@@ -107,6 +107,7 @@ CONFIG_UNIX=y
 CONFIG_UNIX_DIAG=m
 CONFIG_XFRM_USER=m
 CONFIG_NET_KEY=m
+CONFIG_NET_TC_SKB_EXT=y
 CONFIG_SMC=m
 CONFIG_SMC_DIAG=m
 CONFIG_INET=y
@@ -580,8 +581,6 @@ CONFIG_VIRTIO_BALLOON=m
 CONFIG_VIRTIO_INPUT=y
 CONFIG_VHOST_NET=m
 CONFIG_VHOST_VSOCK=m
-CONFIG_S390_CCW_IOMMU=y
-CONFIG_S390_AP_IOMMU=y
 CONFIG_EXT4_FS=y
 CONFIG_EXT4_FS_POSIX_ACL=y
 CONFIG_EXT4_FS_SECURITY=y
@@ -686,6 +685,7 @@ CONFIG_IMA_DEFAULT_HASH_SHA256=y
 CONFIG_IMA_WRITE_POLICY=y
 CONFIG_IMA_APPRAISE=y
 CONFIG_LSM="yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
+CONFIG_INIT_STACK_NONE=y
 CONFIG_CRYPTO_FIPS=y
 CONFIG_CRYPTO_USER=m
 # CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
index 33a232b..6f68b39 100644 (file)
@@ -67,6 +67,7 @@ CONFIG_ZFCP=y
 # CONFIG_MISC_FILESYSTEMS is not set
 # CONFIG_NETWORK_FILESYSTEMS is not set
 CONFIG_LSM="yama,loadpin,safesetid,integrity"
+CONFIG_INIT_STACK_NONE=y
 # CONFIG_ZLIB_DFLTCC is not set
 CONFIG_XZ_DEC_MICROLZMA=y
 CONFIG_PRINTK_TIME=y
index 7752bd3..5fae187 100644 (file)
@@ -82,7 +82,7 @@ void chacha_crypt_arch(u32 *state, u8 *dst, const u8 *src,
         * it cannot handle a block of data or less, but otherwise
         * it can handle data of arbitrary size
         */
-       if (bytes <= CHACHA_BLOCK_SIZE || nrounds != 20)
+       if (bytes <= CHACHA_BLOCK_SIZE || nrounds != 20 || !MACHINE_HAS_VX)
                chacha_crypt_generic(state, dst, src, bytes, nrounds);
        else
                chacha20_crypt_s390(state, dst, src, bytes,
index 29dc827..d29a9d9 100644 (file)
@@ -5,7 +5,7 @@
  * s390 implementation of the AES Cipher Algorithm with protected keys.
  *
  * s390 Version:
- *   Copyright IBM Corp. 2017,2020
+ *   Copyright IBM Corp. 2017, 2023
  *   Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
  *             Harald Freudenberger <freude@de.ibm.com>
  */
@@ -132,7 +132,8 @@ static inline int __paes_keyblob2pkey(struct key_blob *kb,
                if (i > 0 && ret == -EAGAIN && in_task())
                        if (msleep_interruptible(1000))
                                return -EINTR;
-               ret = pkey_keyblob2pkey(kb->key, kb->keylen, pk);
+               ret = pkey_keyblob2pkey(kb->key, kb->keylen,
+                                       pk->protkey, &pk->len, &pk->type);
                if (ret == 0)
                        break;
        }
@@ -145,6 +146,7 @@ static inline int __paes_convert_key(struct s390_paes_ctx *ctx)
        int ret;
        struct pkey_protkey pkey;
 
+       pkey.len = sizeof(pkey.protkey);
        ret = __paes_keyblob2pkey(&ctx->kb, &pkey);
        if (ret)
                return ret;
@@ -414,6 +416,9 @@ static inline int __xts_paes_convert_key(struct s390_pxts_ctx *ctx)
 {
        struct pkey_protkey pkey0, pkey1;
 
+       pkey0.len = sizeof(pkey0.protkey);
+       pkey1.len = sizeof(pkey1.protkey);
+
        if (__paes_keyblob2pkey(&ctx->kb[0], &pkey0) ||
            __paes_keyblob2pkey(&ctx->kb[1], &pkey1))
                return -EINVAL;
index c37eb92..a873e87 100644 (file)
@@ -6,4 +6,8 @@
 #include <asm/fpu/api.h>
 #include <asm-generic/asm-prototypes.h>
 
+__int128_t __ashlti3(__int128_t a, int b);
+__int128_t __ashrti3(__int128_t a, int b);
+__int128_t __lshrti3(__int128_t a, int b);
+
 #endif /* _ASM_S390_PROTOTYPES_H */
index 06e0e42..aae0315 100644 (file)
@@ -190,38 +190,18 @@ static __always_inline unsigned long __cmpxchg(unsigned long address,
 #define arch_cmpxchg_local     arch_cmpxchg
 #define arch_cmpxchg64_local   arch_cmpxchg
 
-#define system_has_cmpxchg_double()    1
+#define system_has_cmpxchg128()                1
 
-static __always_inline int __cmpxchg_double(unsigned long p1, unsigned long p2,
-                                           unsigned long o1, unsigned long o2,
-                                           unsigned long n1, unsigned long n2)
+static __always_inline u128 arch_cmpxchg128(volatile u128 *ptr, u128 old, u128 new)
 {
-       union register_pair old = { .even = o1, .odd = o2, };
-       union register_pair new = { .even = n1, .odd = n2, };
-       int cc;
-
        asm volatile(
                "       cdsg    %[old],%[new],%[ptr]\n"
-               "       ipm     %[cc]\n"
-               "       srl     %[cc],28\n"
-               : [cc] "=&d" (cc), [old] "+&d" (old.pair)
-               : [new] "d" (new.pair),
-                 [ptr] "QS" (*(unsigned long *)p1), "Q" (*(unsigned long *)p2)
+               : [old] "+d" (old), [ptr] "+QS" (*ptr)
+               : [new] "d" (new)
                : "memory", "cc");
-       return !cc;
+       return old;
 }
 
-#define arch_cmpxchg_double(p1, p2, o1, o2, n1, n2)                    \
-({                                                                     \
-       typeof(p1) __p1 = (p1);                                         \
-       typeof(p2) __p2 = (p2);                                         \
-                                                                       \
-       BUILD_BUG_ON(sizeof(*(p1)) != sizeof(long));                    \
-       BUILD_BUG_ON(sizeof(*(p2)) != sizeof(long));                    \
-       VM_BUG_ON((unsigned long)((__p1) + 1) != (unsigned long)(__p2));\
-       __cmpxchg_double((unsigned long)__p1, (unsigned long)__p2,      \
-                        (unsigned long)(o1), (unsigned long)(o2),      \
-                        (unsigned long)(n1), (unsigned long)(n2));     \
-})
+#define arch_cmpxchg128                arch_cmpxchg128
 
 #endif /* __ASM_CMPXCHG_H */
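
The cdsg-based arch_cmpxchg128() above returns the previous 128-bit value instead of a condition code, so callers retry in a loop, as the sampling code further down does once it drops __cdsg() in favour of cmpxchg128(). A minimal sketch of such a caller, with hypothetical names and a 16-byte aligned u128 slot assumed (not part of the patch):

/* Sketch only: atomically set bits in a 16-byte field via cmpxchg128(). */
static void set_bits_128(u128 *slot, u128 mask)
{
	u128 old, prev;

	old = *slot;			/* initial guess; may be stale */
	for (;;) {
		prev = cmpxchg128(slot, old, old | mask);
		if (prev == old)	/* exchange took effect */
			break;
		old = prev;		/* retry with the value actually found */
	}
}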
index a386070..3cb9d81 100644 (file)
@@ -112,7 +112,7 @@ struct compat_statfs64 {
        u32             f_namelen;
        u32             f_frsize;
        u32             f_flags;
-       u32             f_spare[4];
+       u32             f_spare[5];
 };
 
 /*
index 646b129..b378e2b 100644 (file)
@@ -2,7 +2,7 @@
 /*
  * CP Assist for Cryptographic Functions (CPACF)
  *
- * Copyright IBM Corp. 2003, 2017
+ * Copyright IBM Corp. 2003, 2023
  * Author(s): Thomas Spatzier
  *           Jan Glauber
  *           Harald Freudenberger (freude@de.ibm.com)
 #define CPACF_PCKMO_ENC_AES_128_KEY    0x12
 #define CPACF_PCKMO_ENC_AES_192_KEY    0x13
 #define CPACF_PCKMO_ENC_AES_256_KEY    0x14
+#define CPACF_PCKMO_ENC_ECC_P256_KEY   0x20
+#define CPACF_PCKMO_ENC_ECC_P384_KEY   0x21
+#define CPACF_PCKMO_ENC_ECC_P521_KEY   0x22
+#define CPACF_PCKMO_ENC_ECC_ED25519_KEY        0x28
+#define CPACF_PCKMO_ENC_ECC_ED448_KEY  0x29
 
 /*
  * Function codes for the PRNO (PERFORM RANDOM NUMBER OPERATION)
index 7e417d7..a0de5b9 100644 (file)
@@ -140,7 +140,7 @@ union hws_trailer_header {
                unsigned int dsdes:16;  /* 48-63: size of diagnostic SDE */
                unsigned long long overflow; /* 64 - Overflow Count   */
        };
-       __uint128_t val;
+       u128 val;
 };
 
 struct hws_trailer_entry {
index 0d1c74a..a4d2e10 100644 (file)
@@ -16,6 +16,9 @@
 
 #define OS_INFO_VMCOREINFO     0
 #define OS_INFO_REIPL_BLOCK    1
+#define OS_INFO_FLAGS_ENTRY    2
+
+#define OS_INFO_FLAG_REIPL_CLEAR       (1UL << 0)
 
 struct os_info_entry {
        u64     addr;
@@ -30,8 +33,8 @@ struct os_info {
        u16     version_minor;
        u64     crashkernel_addr;
        u64     crashkernel_size;
-       struct os_info_entry entry[2];
-       u8      reserved[4024];
+       struct os_info_entry entry[3];
+       u8      reserved[4004];
 } __packed;
 
 void os_info_init(void);
index 081837b..264095d 100644 (file)
 #define this_cpu_cmpxchg_4(pcp, oval, nval) arch_this_cpu_cmpxchg(pcp, oval, nval)
 #define this_cpu_cmpxchg_8(pcp, oval, nval) arch_this_cpu_cmpxchg(pcp, oval, nval)
 
+#define this_cpu_cmpxchg64(pcp, o, n)  this_cpu_cmpxchg_8(pcp, o, n)
+
+#define this_cpu_cmpxchg128(pcp, oval, nval)                           \
+({                                                                     \
+       typedef typeof(pcp) pcp_op_T__;                                 \
+       u128 old__, new__, ret__;                                       \
+       pcp_op_T__ *ptr__;                                              \
+       old__ = oval;                                                   \
+       new__ = nval;                                                   \
+       preempt_disable_notrace();                                      \
+       ptr__ = raw_cpu_ptr(&(pcp));                                    \
+       ret__ = cmpxchg128((void *)ptr__, old__, new__);                \
+       preempt_enable_notrace();                                       \
+       ret__;                                                          \
+})
+
 #define arch_this_cpu_xchg(pcp, nval)                                  \
 ({                                                                     \
        typeof(pcp) *ptr__;                                             \
 #define this_cpu_xchg_4(pcp, nval) arch_this_cpu_xchg(pcp, nval)
 #define this_cpu_xchg_8(pcp, nval) arch_this_cpu_xchg(pcp, nval)
 
-#define arch_this_cpu_cmpxchg_double(pcp1, pcp2, o1, o2, n1, n2)           \
-({                                                                         \
-       typeof(pcp1) *p1__;                                                 \
-       typeof(pcp2) *p2__;                                                 \
-       int ret__;                                                          \
-                                                                           \
-       preempt_disable_notrace();                                          \
-       p1__ = raw_cpu_ptr(&(pcp1));                                        \
-       p2__ = raw_cpu_ptr(&(pcp2));                                        \
-       ret__ = __cmpxchg_double((unsigned long)p1__, (unsigned long)p2__,  \
-                                (unsigned long)(o1), (unsigned long)(o2),  \
-                                (unsigned long)(n1), (unsigned long)(n2)); \
-       preempt_enable_notrace();                                           \
-       ret__;                                                              \
-})
-
-#define this_cpu_cmpxchg_double_8 arch_this_cpu_cmpxchg_double
-
 #include <asm-generic/percpu.h>
 
 #endif /* __ARCH_S390_PERCPU__ */
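
this_cpu_cmpxchg128() disables preemption, resolves the per-CPU pointer and forwards to cmpxchg128(). A sketch of a caller, assuming a hypothetical per-CPU u128 variable and a context that already runs with preemption disabled so the read and the exchange target the same CPU (not part of the patch):

/* Sketch only: my_pair is a hypothetical 16-byte per-CPU counter pair. */
static DEFINE_PER_CPU(u128, my_pair);	/* u128 carries the 16-byte alignment cdsg needs */

static void add_to_pair(u128 delta)	/* caller runs with preemption off */
{
	u128 old, prev;

	old = *this_cpu_ptr(&my_pair);
	for (;;) {
		prev = this_cpu_cmpxchg128(my_pair, old, old + delta);
		if (prev == old)
			break;
		old = prev;
	}
}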
index 6822a11..c55f3c3 100644 (file)
@@ -42,9 +42,6 @@ static inline void update_page_count(int level, long count)
                atomic_long_add(count, &direct_pages_count[level]);
 }
 
-struct seq_file;
-void arch_report_meminfo(struct seq_file *m);
-
 /*
  * The S390 doesn't have any external MMU info: the kernel page
  * tables contain all the necessary information.
index 8e9c582..9e41a74 100644 (file)
@@ -3,6 +3,7 @@
 #define _ASM_S390_MEM_DETECT_H
 
 #include <linux/types.h>
+#include <asm/page.h>
 
 enum physmem_info_source {
        MEM_DETECT_NONE = 0,
@@ -133,7 +134,7 @@ static inline const char *get_rr_type_name(enum reserved_range_type t)
 
 #define for_each_physmem_reserved_type_range(t, range, p_start, p_end)                         \
        for (range = &physmem_info.reserved[t], *p_start = range->start, *p_end = range->end;   \
-            range && range->end; range = range->chain,                                         \
+            range && range->end; range = range->chain ? __va(range->chain) : NULL,             \
             *p_start = range ? range->start : 0, *p_end = range ? range->end : 0)
 
 static inline struct reserved_range *__physmem_reserved_next(enum reserved_range_type *t,
@@ -145,7 +146,7 @@ static inline struct reserved_range *__physmem_reserved_next(enum reserved_range
                        return range;
        }
        if (range->chain)
-               return range->chain;
+               return __va(range->chain);
        while (++*t < RR_MAX) {
                range = &physmem_info.reserved[*t];
                if (range->end)
index dd3d20c..47d80a7 100644 (file)
@@ -2,7 +2,7 @@
 /*
  * Kernelspace interface to the pkey device driver
  *
- * Copyright IBM Corp. 2016,2019
+ * Copyright IBM Corp. 2016, 2023
  *
  * Author: Harald Freudenberger <freude@de.ibm.com>
  *
@@ -23,6 +23,6 @@
  * @return 0 on success, negative errno value on failure
  */
 int pkey_keyblob2pkey(const u8 *key, u32 keylen,
-                     struct pkey_protkey *protkey);
+                     u8 *protkey, u32 *protkeylen, u32 *protkeytype);
 
 #endif /* _KAPI_PKEY_H */
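
The reworked in-kernel interface takes a raw buffer plus length and type pointers instead of a struct pkey_protkey, which is how the paes code above adapts: it primes pk->len with the buffer size before the call. A sketch of a hypothetical caller (not part of the patch):

/* Sketch only: convert a key blob with the new in-kernel interface. */
static int convert_blob(const u8 *blob, u32 bloblen, struct pkey_protkey *pk)
{
	pk->len = sizeof(pk->protkey);	/* tell the driver how much room it has */
	return pkey_keyblob2pkey(blob, bloblen, pk->protkey, &pk->len, &pk->type);
}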
index ce878e8..4d64665 100644 (file)
@@ -63,7 +63,7 @@ static inline int store_tod_clock_ext_cc(union tod_clock *clk)
        return cc;
 }
 
-static inline void store_tod_clock_ext(union tod_clock *tod)
+static __always_inline void store_tod_clock_ext(union tod_clock *tod)
 {
        asm volatile("stcke %0" : "=Q" (*tod) : : "cc");
 }
@@ -177,7 +177,7 @@ static inline void local_tick_enable(unsigned long comp)
 
 typedef unsigned long cycles_t;
 
-static inline unsigned long get_tod_clock(void)
+static __always_inline unsigned long get_tod_clock(void)
 {
        union tod_clock clk;
 
@@ -204,6 +204,11 @@ void init_cpu_timer(void);
 
 extern union tod_clock tod_clock_base;
 
+static __always_inline unsigned long __get_tod_clock_monotonic(void)
+{
+       return get_tod_clock() - tod_clock_base.tod;
+}
+
 /**
  * get_clock_monotonic - returns current time in clock rate units
  *
@@ -216,7 +221,7 @@ static inline unsigned long get_tod_clock_monotonic(void)
        unsigned long tod;
 
        preempt_disable_notrace();
-       tod = get_tod_clock() - tod_clock_base.tod;
+       tod = __get_tod_clock_monotonic();
        preempt_enable_notrace();
        return tod;
 }
@@ -240,7 +245,7 @@ static inline unsigned long get_tod_clock_monotonic(void)
  * -> ns = (th * 125) + ((tl * 125) >> 9);
  *
  */
-static inline unsigned long tod_to_ns(unsigned long todval)
+static __always_inline unsigned long tod_to_ns(unsigned long todval)
 {
        return ((todval >> 9) * 125) + (((todval & 0x1ff) * 125) >> 9);
 }
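
The split in tod_to_ns() avoids overflowing todval * 125 in 64 bits while still computing exactly todval * 125 / 512 rounded down, i.e. nanoseconds for a clock that ticks 4096 times per microsecond. A userspace self-check of that equivalence, illustrative only:

/* Sketch only: sanity-check the constant folding done by tod_to_ns(). */
#include <assert.h>
#include <stdint.h>

static uint64_t tod_to_ns(uint64_t tod)
{
	return ((tod >> 9) * 125) + (((tod & 0x1ff) * 125) >> 9);
}

int main(void)
{
	assert(tod_to_ns(4096) == 1000);	/* 4096 TOD units = 1 us = 1000 ns */
	assert(tod_to_ns(512) == 125);		/* 512 TOD units = 125 ns exactly */
	return 0;
}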
index 924b876..f7bae1c 100644 (file)
@@ -2,7 +2,7 @@
 /*
  * Userspace interface to the pkey device driver
  *
- * Copyright IBM Corp. 2017, 2019
+ * Copyright IBM Corp. 2017, 2023
  *
  * Author: Harald Freudenberger <freude@de.ibm.com>
  *
 #define MINKEYBLOBSIZE SECKEYBLOBSIZE
 
 /* defines for the type field within the pkey_protkey struct */
-#define PKEY_KEYTYPE_AES_128                 1
-#define PKEY_KEYTYPE_AES_192                 2
-#define PKEY_KEYTYPE_AES_256                 3
-#define PKEY_KEYTYPE_ECC                     4
+#define PKEY_KEYTYPE_AES_128           1
+#define PKEY_KEYTYPE_AES_192           2
+#define PKEY_KEYTYPE_AES_256           3
+#define PKEY_KEYTYPE_ECC               4
+#define PKEY_KEYTYPE_ECC_P256          5
+#define PKEY_KEYTYPE_ECC_P384          6
+#define PKEY_KEYTYPE_ECC_P521          7
+#define PKEY_KEYTYPE_ECC_ED25519       8
+#define PKEY_KEYTYPE_ECC_ED448         9
 
 /* the newer ioctls use a pkey_key_type enum for type information */
 enum pkey_key_type {
index 72604f7..f85b507 100644 (file)
@@ -30,7 +30,7 @@ struct statfs {
        unsigned int    f_namelen;
        unsigned int    f_frsize;
        unsigned int    f_flags;
-       unsigned int    f_spare[4];
+       unsigned int    f_spare[5];
 };
 
 struct statfs64 {
@@ -45,7 +45,7 @@ struct statfs64 {
        unsigned int    f_namelen;
        unsigned int    f_frsize;
        unsigned int    f_flags;
-       unsigned int    f_spare[4];
+       unsigned int    f_spare[5];
 };
 
 #endif
index 8983837..6b2a051 100644 (file)
@@ -10,6 +10,7 @@ CFLAGS_REMOVE_ftrace.o                = $(CC_FLAGS_FTRACE)
 
 # Do not trace early setup code
 CFLAGS_REMOVE_early.o          = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_rethook.o                = $(CC_FLAGS_FTRACE)
 
 endif
 
index 8a617be..7af6994 100644 (file)
@@ -568,9 +568,9 @@ static size_t get_elfcorehdr_size(int mem_chunk_cnt)
 int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
 {
        Elf64_Phdr *phdr_notes, *phdr_loads;
+       size_t alloc_size;
        int mem_chunk_cnt;
        void *ptr, *hdr;
-       u32 alloc_size;
        u64 hdr_off;
 
        /* If we are not in kdump or zfcp/nvme dump mode return */
index 43de939..85a00d9 100644 (file)
@@ -176,6 +176,8 @@ static bool reipl_fcp_clear;
 static bool reipl_ccw_clear;
 static bool reipl_eckd_clear;
 
+static unsigned long os_info_flags;
+
 static inline int __diag308(unsigned long subcode, unsigned long addr)
 {
        union register_pair r1;
@@ -1935,14 +1937,27 @@ static struct shutdown_action __refdata dump_action = {
 
 static void dump_reipl_run(struct shutdown_trigger *trigger)
 {
-       unsigned long ipib = (unsigned long) reipl_block_actual;
        struct lowcore *abs_lc;
        unsigned int csum;
 
+       /*
+        * Set REIPL_CLEAR flag in os_info flags entry indicating that the
+        * 'clear' sysfs attribute has been set on the panicked system
+        * for the specified reipl type.
+        * Always set for IPL_TYPE_NSS and IPL_TYPE_UNKNOWN.
+        */
+       if ((reipl_type == IPL_TYPE_CCW && reipl_ccw_clear) ||
+           (reipl_type == IPL_TYPE_ECKD && reipl_eckd_clear) ||
+           (reipl_type == IPL_TYPE_FCP && reipl_fcp_clear) ||
+           (reipl_type == IPL_TYPE_NVME && reipl_nvme_clear) ||
+           reipl_type == IPL_TYPE_NSS ||
+           reipl_type == IPL_TYPE_UNKNOWN)
+               os_info_flags |= OS_INFO_FLAG_REIPL_CLEAR;
+       os_info_entry_add(OS_INFO_FLAGS_ENTRY, &os_info_flags, sizeof(os_info_flags));
        csum = (__force unsigned int)
               csum_partial(reipl_block_actual, reipl_block_actual->hdr.len, 0);
        abs_lc = get_abs_lowcore();
-       abs_lc->ipib = ipib;
+       abs_lc->ipib = __pa(reipl_block_actual);
        abs_lc->ipib_checksum = csum;
        put_abs_lowcore(abs_lc);
        dump_run(trigger);
index f1b35dc..42215f9 100644 (file)
@@ -352,7 +352,8 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
                        rc = apply_rela_bits(loc, val, 0, 64, 0, write);
                else if (r_type == R_390_GOTENT ||
                         r_type == R_390_GOTPLTENT) {
-                       val += (Elf_Addr) me->mem[MOD_TEXT].base - loc;
+                       val += (Elf_Addr)me->mem[MOD_TEXT].base +
+                               me->arch.got_offset - loc;
                        rc = apply_rela_bits(loc, val, 1, 32, 1, write);
                }
                break;
index cf1b6e8..9067914 100644 (file)
@@ -76,6 +76,7 @@ static inline int ctr_stcctm(enum cpumf_ctr_set set, u64 range, u64 *dest)
 }
 
 struct cpu_cf_events {
+       refcount_t refcnt;              /* Reference count */
        atomic_t                ctr_set[CPUMF_CTR_SET_MAX];
        u64                     state;          /* For perf_event_open SVC */
        u64                     dev_state;      /* For /dev/hwctr */
@@ -88,9 +89,6 @@ struct cpu_cf_events {
        unsigned int sets;              /* # Counter set saved in memory */
 };
 
-/* Per-CPU event structure for the counter facility */
-static DEFINE_PER_CPU(struct cpu_cf_events, cpu_cf_events);
-
 static unsigned int cfdiag_cpu_speed;  /* CPU speed for CF_DIAG trailer */
 static debug_info_t *cf_dbg;
 
@@ -103,6 +101,221 @@ static debug_info_t *cf_dbg;
  */
 static struct cpumf_ctr_info   cpumf_ctr_info;
 
+struct cpu_cf_ptr {
+       struct cpu_cf_events *cpucf;
+};
+
+static struct cpu_cf_root {            /* Anchor to per CPU data */
+       refcount_t refcnt;              /* Overall active events */
+       struct cpu_cf_ptr __percpu *cfptr;
+} cpu_cf_root;
+
+/*
+ * Serialize event initialization and event removal. Both are called from
+ * user space in task context with perf_event_open() and close()
+ * system calls.
+ *
+ * This mutex serializes functions cpum_cf_alloc_cpu() called at event
+ * initialization via cpumf_pmu_event_init() and function cpum_cf_free_cpu()
+ * called at event removal via call back function hw_perf_event_destroy()
+ * when the event is deleted. They are serialized to enforce correct
+ * bookkeeping of pointer and reference counts anchored by
+ * struct cpu_cf_root and the access to cpu_cf_root::refcnt and the
+ * per CPU pointers stored in cpu_cf_root::cfptr.
+ */
+static DEFINE_MUTEX(pmc_reserve_mutex);
+
+/*
+ * Get pointer to per-cpu structure.
+ *
+ * Function get_cpu_cfhw() is called from
+ * - cfset_copy_all(): This function is protected by cpus_read_lock(), so
+ *   CPU hot plug remove can not happen. Event removal requires a close()
+ *   first.
+ *
+ * Function this_cpu_cfhw() is called from perf common code functions:
+ * - pmu_{en|dis}able(), pmu_{add|del}() and pmu_{start|stop}():
+ *   All functions execute with interrupts disabled on that particular CPU.
+ * - cfset_ioctl_{on|off}, cfset_cpu_read(): see comment cfset_copy_all().
+ *
+ * Therefore it is safe to access the CPU specific pointer to the event.
+ */
+static struct cpu_cf_events *get_cpu_cfhw(int cpu)
+{
+       struct cpu_cf_ptr __percpu *p = cpu_cf_root.cfptr;
+
+       if (p) {
+               struct cpu_cf_ptr *q = per_cpu_ptr(p, cpu);
+
+               return q->cpucf;
+       }
+       return NULL;
+}
+
+static struct cpu_cf_events *this_cpu_cfhw(void)
+{
+       return get_cpu_cfhw(smp_processor_id());
+}
+
+/* Disable counter sets on dedicated CPU */
+static void cpum_cf_reset_cpu(void *flags)
+{
+       lcctl(0);
+}
+
+/* Free per CPU data when the last event is removed. */
+static void cpum_cf_free_root(void)
+{
+       if (!refcount_dec_and_test(&cpu_cf_root.refcnt))
+               return;
+       free_percpu(cpu_cf_root.cfptr);
+       cpu_cf_root.cfptr = NULL;
+       irq_subclass_unregister(IRQ_SUBCLASS_MEASUREMENT_ALERT);
+       on_each_cpu(cpum_cf_reset_cpu, NULL, 1);
+       debug_sprintf_event(cf_dbg, 4, "%s2 root.refcnt %u cfptr %px\n",
+                           __func__, refcount_read(&cpu_cf_root.refcnt),
+                           cpu_cf_root.cfptr);
+}
+
+/*
+ * On initialization of first event also allocate per CPU data dynamically.
+ * Start with an array of pointers, the array size is the maximum number of
+ * CPUs possible, which might be larger than the number of CPUs currently
+ * online.
+ */
+static int cpum_cf_alloc_root(void)
+{
+       int rc = 0;
+
+       if (refcount_inc_not_zero(&cpu_cf_root.refcnt))
+               return rc;
+
+       /* The memory is already zeroed. */
+       cpu_cf_root.cfptr = alloc_percpu(struct cpu_cf_ptr);
+       if (cpu_cf_root.cfptr) {
+               refcount_set(&cpu_cf_root.refcnt, 1);
+               on_each_cpu(cpum_cf_reset_cpu, NULL, 1);
+               irq_subclass_register(IRQ_SUBCLASS_MEASUREMENT_ALERT);
+       } else {
+               rc = -ENOMEM;
+       }
+
+       return rc;
+}
+
+/* Free CPU counter data structure for a PMU */
+static void cpum_cf_free_cpu(int cpu)
+{
+       struct cpu_cf_events *cpuhw;
+       struct cpu_cf_ptr *p;
+
+       mutex_lock(&pmc_reserve_mutex);
+       /*
+        * When invoked via CPU hotplug handler, there might be no events
+        * installed or that particular CPU might not have an
+        * event installed. This anchor pointer can be NULL!
+        */
+       if (!cpu_cf_root.cfptr)
+               goto out;
+       p = per_cpu_ptr(cpu_cf_root.cfptr, cpu);
+       cpuhw = p->cpucf;
+       /*
+        * Might be NULL when called from the CPU hotplug handler and no event
+        * is installed on that CPU, only on other CPUs.
+        */
+       if (!cpuhw)
+               goto out;
+
+       if (refcount_dec_and_test(&cpuhw->refcnt)) {
+               kfree(cpuhw);
+               p->cpucf = NULL;
+       }
+       cpum_cf_free_root();
+out:
+       mutex_unlock(&pmc_reserve_mutex);
+}
+
+/* Allocate CPU counter data structure for a PMU. Called under mutex lock. */
+static int cpum_cf_alloc_cpu(int cpu)
+{
+       struct cpu_cf_events *cpuhw;
+       struct cpu_cf_ptr *p;
+       int rc;
+
+       mutex_lock(&pmc_reserve_mutex);
+       rc = cpum_cf_alloc_root();
+       if (rc)
+               goto unlock;
+       p = per_cpu_ptr(cpu_cf_root.cfptr, cpu);
+       cpuhw = p->cpucf;
+
+       if (!cpuhw) {
+               cpuhw = kzalloc(sizeof(*cpuhw), GFP_KERNEL);
+               if (cpuhw) {
+                       p->cpucf = cpuhw;
+                       refcount_set(&cpuhw->refcnt, 1);
+               } else {
+                       rc = -ENOMEM;
+               }
+       } else {
+               refcount_inc(&cpuhw->refcnt);
+       }
+       if (rc) {
+               /*
+                * Error in allocation of event, decrement anchor. Since
+                * cpu_cf_event is not created, its destroy() function is not
+                * invoked. Adjust the reference counter for the anchor.
+                */
+               cpum_cf_free_root();
+       }
+unlock:
+       mutex_unlock(&pmc_reserve_mutex);
+       return rc;
+}
+
+/*
+ * Create/delete per CPU data structures for /dev/hwctr interface and events
+ * created by perf_event_open().
+ * If cpu is -1, track task on all available CPUs. This requires
+ * allocation of hardware data structures for all CPUs. This setup handles
+ * perf_event_open() with task context and /dev/hwctr interface.
+ * If cpu is not -1, install the event on this CPU only. This setup handles
+ * perf_event_open() with CPU context.
+ */
+static int cpum_cf_alloc(int cpu)
+{
+       cpumask_var_t mask;
+       int rc;
+
+       if (cpu == -1) {
+               if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
+                       return -ENOMEM;
+               for_each_online_cpu(cpu) {
+                       rc = cpum_cf_alloc_cpu(cpu);
+                       if (rc) {
+                               for_each_cpu(cpu, mask)
+                                       cpum_cf_free_cpu(cpu);
+                               break;
+                       }
+                       cpumask_set_cpu(cpu, mask);
+               }
+               free_cpumask_var(mask);
+       } else {
+               rc = cpum_cf_alloc_cpu(cpu);
+       }
+       return rc;
+}
+
+static void cpum_cf_free(int cpu)
+{
+       if (cpu == -1) {
+               for_each_online_cpu(cpu)
+                       cpum_cf_free_cpu(cpu);
+       } else {
+               cpum_cf_free_cpu(cpu);
+       }
+}
+
 #define        CF_DIAG_CTRSET_DEF              0xfeef  /* Counter set header mark */
                                                /* interval in seconds */
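
cpum_cf_alloc_root()/cpum_cf_free_root() above, and paiext_root_alloc()/paiext_root_free() later in this series, follow the same first-user-allocates / last-user-frees idiom built on refcount_t, with a mutex serializing the 0 <-> 1 transitions. A stripped-down sketch of that idiom with hypothetical names (not part of the patch):

/* Sketch only: shared resource created on first get, freed on last put. */
static refcount_t users = REFCOUNT_INIT(0);
static void *shared;
static DEFINE_MUTEX(users_mutex);

static int get_shared(void)
{
	int rc = 0;

	mutex_lock(&users_mutex);
	if (!refcount_inc_not_zero(&users)) {	/* count was 0: first user */
		shared = kzalloc(PAGE_SIZE, GFP_KERNEL);
		if (shared)
			refcount_set(&users, 1);
		else
			rc = -ENOMEM;
	}
	mutex_unlock(&users_mutex);
	return rc;
}

static void put_shared(void)
{
	mutex_lock(&users_mutex);
	if (refcount_dec_and_test(&users)) {	/* last user gone */
		kfree(shared);
		shared = NULL;
	}
	mutex_unlock(&users_mutex);
}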
 
@@ -451,10 +664,10 @@ static int validate_ctr_version(const u64 config, enum cpumf_ctr_set set)
  */
 static void cpumf_pmu_enable(struct pmu *pmu)
 {
-       struct cpu_cf_events *cpuhw = this_cpu_ptr(&cpu_cf_events);
+       struct cpu_cf_events *cpuhw = this_cpu_cfhw();
        int err;
 
-       if (cpuhw->flags & PMU_F_ENABLED)
+       if (!cpuhw || (cpuhw->flags & PMU_F_ENABLED))
                return;
 
        err = lcctl(cpuhw->state | cpuhw->dev_state);
@@ -471,11 +684,11 @@ static void cpumf_pmu_enable(struct pmu *pmu)
  */
 static void cpumf_pmu_disable(struct pmu *pmu)
 {
-       struct cpu_cf_events *cpuhw = this_cpu_ptr(&cpu_cf_events);
-       int err;
+       struct cpu_cf_events *cpuhw = this_cpu_cfhw();
        u64 inactive;
+       int err;
 
-       if (!(cpuhw->flags & PMU_F_ENABLED))
+       if (!cpuhw || !(cpuhw->flags & PMU_F_ENABLED))
                return;
 
        inactive = cpuhw->state & ~((1 << CPUMF_LCCTL_ENABLE_SHIFT) - 1);
@@ -487,58 +700,10 @@ static void cpumf_pmu_disable(struct pmu *pmu)
                cpuhw->flags &= ~PMU_F_ENABLED;
 }
 
-#define PMC_INIT      0UL
-#define PMC_RELEASE   1UL
-
-static void cpum_cf_setup_cpu(void *flags)
-{
-       struct cpu_cf_events *cpuhw = this_cpu_ptr(&cpu_cf_events);
-
-       switch ((unsigned long)flags) {
-       case PMC_INIT:
-               cpuhw->flags |= PMU_F_RESERVED;
-               break;
-
-       case PMC_RELEASE:
-               cpuhw->flags &= ~PMU_F_RESERVED;
-               break;
-       }
-
-       /* Disable CPU counter sets */
-       lcctl(0);
-       debug_sprintf_event(cf_dbg, 5, "%s flags %#x flags %#x state %#llx\n",
-                           __func__, *(int *)flags, cpuhw->flags,
-                           cpuhw->state);
-}
-
-/* Initialize the CPU-measurement counter facility */
-static int __kernel_cpumcf_begin(void)
-{
-       on_each_cpu(cpum_cf_setup_cpu, (void *)PMC_INIT, 1);
-       irq_subclass_register(IRQ_SUBCLASS_MEASUREMENT_ALERT);
-
-       return 0;
-}
-
-/* Release the CPU-measurement counter facility */
-static void __kernel_cpumcf_end(void)
-{
-       on_each_cpu(cpum_cf_setup_cpu, (void *)PMC_RELEASE, 1);
-       irq_subclass_unregister(IRQ_SUBCLASS_MEASUREMENT_ALERT);
-}
-
-/* Number of perf events counting hardware events */
-static atomic_t num_events = ATOMIC_INIT(0);
-/* Used to avoid races in calling reserve/release_cpumf_hardware */
-static DEFINE_MUTEX(pmc_reserve_mutex);
-
 /* Release the PMU if event is the last perf event */
 static void hw_perf_event_destroy(struct perf_event *event)
 {
-       mutex_lock(&pmc_reserve_mutex);
-       if (atomic_dec_return(&num_events) == 0)
-               __kernel_cpumcf_end();
-       mutex_unlock(&pmc_reserve_mutex);
+       cpum_cf_free(event->cpu);
 }
 
 /* CPUMF <-> perf event mappings for kernel+userspace (basic set) */
@@ -562,14 +727,6 @@ static const int cpumf_generic_events_user[] = {
        [PERF_COUNT_HW_BUS_CYCLES]          = -1,
 };
 
-static void cpumf_hw_inuse(void)
-{
-       mutex_lock(&pmc_reserve_mutex);
-       if (atomic_inc_return(&num_events) == 1)
-               __kernel_cpumcf_begin();
-       mutex_unlock(&pmc_reserve_mutex);
-}
-
 static int is_userspace_event(u64 ev)
 {
        return cpumf_generic_events_user[PERF_COUNT_HW_CPU_CYCLES] == ev ||
@@ -653,7 +810,8 @@ static int __hw_perf_event_init(struct perf_event *event, unsigned int type)
        }
 
        /* Initialize for using the CPU-measurement counter facility */
-       cpumf_hw_inuse();
+       if (cpum_cf_alloc(event->cpu))
+               return -ENOMEM;
        event->destroy = hw_perf_event_destroy;
 
        /*
@@ -756,7 +914,7 @@ static void cpumf_pmu_read(struct perf_event *event)
 
 static void cpumf_pmu_start(struct perf_event *event, int flags)
 {
-       struct cpu_cf_events *cpuhw = this_cpu_ptr(&cpu_cf_events);
+       struct cpu_cf_events *cpuhw = this_cpu_cfhw();
        struct hw_perf_event *hwc = &event->hw;
        int i;
 
@@ -830,7 +988,7 @@ static int cfdiag_push_sample(struct perf_event *event,
 
 static void cpumf_pmu_stop(struct perf_event *event, int flags)
 {
-       struct cpu_cf_events *cpuhw = this_cpu_ptr(&cpu_cf_events);
+       struct cpu_cf_events *cpuhw = this_cpu_cfhw();
        struct hw_perf_event *hwc = &event->hw;
        int i;
 
@@ -857,8 +1015,7 @@ static void cpumf_pmu_stop(struct perf_event *event, int flags)
                                                      false);
                        if (cfdiag_diffctr(cpuhw, event->hw.config_base))
                                cfdiag_push_sample(event, cpuhw);
-               } else if (cpuhw->flags & PMU_F_RESERVED) {
-                       /* Only update when PMU not hotplugged off */
+               } else {
                        hw_perf_event_update(event);
                }
                hwc->state |= PERF_HES_UPTODATE;
@@ -867,7 +1024,7 @@ static void cpumf_pmu_stop(struct perf_event *event, int flags)
 
 static int cpumf_pmu_add(struct perf_event *event, int flags)
 {
-       struct cpu_cf_events *cpuhw = this_cpu_ptr(&cpu_cf_events);
+       struct cpu_cf_events *cpuhw = this_cpu_cfhw();
 
        ctr_set_enable(&cpuhw->state, event->hw.config_base);
        event->hw.state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
@@ -880,7 +1037,7 @@ static int cpumf_pmu_add(struct perf_event *event, int flags)
 
 static void cpumf_pmu_del(struct perf_event *event, int flags)
 {
-       struct cpu_cf_events *cpuhw = this_cpu_ptr(&cpu_cf_events);
+       struct cpu_cf_events *cpuhw = this_cpu_cfhw();
        int i;
 
        cpumf_pmu_stop(event, PERF_EF_UPDATE);
@@ -912,29 +1069,83 @@ static struct pmu cpumf_pmu = {
        .read         = cpumf_pmu_read,
 };
 
-static int cpum_cf_setup(unsigned int cpu, unsigned long flags)
-{
-       local_irq_disable();
-       cpum_cf_setup_cpu((void *)flags);
-       local_irq_enable();
-       return 0;
-}
+static struct cfset_session {          /* CPUs and counter set bit mask */
+       struct list_head head;          /* Head of list of active processes */
+} cfset_session = {
+       .head = LIST_HEAD_INIT(cfset_session.head)
+};
+
+static refcount_t cfset_opencnt = REFCOUNT_INIT(0);    /* Access count */
+/*
+ * Synchronize access to device /dev/hwctr. This mutex protects against
+ * concurrent access to functions cfset_open() and cfset_release().
+ * Same for CPU hotplug add and remove events triggering
+ * cpum_cf_online_cpu() and cpum_cf_offline_cpu().
+ * It also serializes concurrent device ioctl access from multiple
+ * processes accessing /dev/hwctr.
+ *
+ * The mutex protects concurrent access to the /dev/hwctr session management
+ * struct cfset_session and reference counting variable cfset_opencnt.
+ */
+static DEFINE_MUTEX(cfset_ctrset_mutex);
 
+/*
+ * CPU hotplug handling covers only the /dev/hwctr device.
+ * For perf_event_open() the CPU hotplug handling is done in kernel common
+ * code:
+ * - CPU add: Nothing is done since a file descriptor can not be created
+ *   and returned to the user.
+ * - CPU delete: Handled by common code via pmu_disable(), pmu_stop() and
+ *   pmu_delete(). The event itself is removed when the file descriptor is
+ *   closed.
+ */
 static int cfset_online_cpu(unsigned int cpu);
+
 static int cpum_cf_online_cpu(unsigned int cpu)
 {
-       debug_sprintf_event(cf_dbg, 4, "%s cpu %d in_irq %ld\n", __func__,
-                           cpu, in_interrupt());
-       cpum_cf_setup(cpu, PMC_INIT);
-       return cfset_online_cpu(cpu);
+       int rc = 0;
+
+       debug_sprintf_event(cf_dbg, 4, "%s cpu %d root.refcnt %d "
+                           "opencnt %d\n", __func__, cpu,
+                           refcount_read(&cpu_cf_root.refcnt),
+                           refcount_read(&cfset_opencnt));
+       /*
+        * Ignore notification for perf_event_open().
+        * Handle only /dev/hwctr device sessions.
+        */
+       mutex_lock(&cfset_ctrset_mutex);
+       if (refcount_read(&cfset_opencnt)) {
+               rc = cpum_cf_alloc_cpu(cpu);
+               if (!rc)
+                       cfset_online_cpu(cpu);
+       }
+       mutex_unlock(&cfset_ctrset_mutex);
+       return rc;
 }
 
 static int cfset_offline_cpu(unsigned int cpu);
+
 static int cpum_cf_offline_cpu(unsigned int cpu)
 {
-       debug_sprintf_event(cf_dbg, 4, "%s cpu %d\n", __func__, cpu);
-       cfset_offline_cpu(cpu);
-       return cpum_cf_setup(cpu, PMC_RELEASE);
+       debug_sprintf_event(cf_dbg, 4, "%s cpu %d root.refcnt %d opencnt %d\n",
+                           __func__, cpu, refcount_read(&cpu_cf_root.refcnt),
+                           refcount_read(&cfset_opencnt));
+       /*
+        * During task exit processing of grouped perf events triggered by CPU
+        * hotplug processing, pmu_disable() is called as part of perf context
+        * removal process. Therefore do not trigger event removal now for
+        * perf_event_open() created events. Perf common code triggers event
+        * destruction when the event file descriptor is closed.
+        *
+        * Handle only /dev/hwctr device sessions.
+        */
+       mutex_lock(&cfset_ctrset_mutex);
+       if (refcount_read(&cfset_opencnt)) {
+               cfset_offline_cpu(cpu);
+               cpum_cf_free_cpu(cpu);
+       }
+       mutex_unlock(&cfset_ctrset_mutex);
+       return 0;
 }
 
 /* Return true if store counter set multiple instruction is available */
@@ -953,13 +1164,13 @@ static void cpumf_measurement_alert(struct ext_code ext_code,
                return;
 
        inc_irq_stat(IRQEXT_CMC);
-       cpuhw = this_cpu_ptr(&cpu_cf_events);
 
        /*
         * Measurement alerts are shared and might happen when the PMU
         * is not reserved.  Ignore these alerts in this case.
         */
-       if (!(cpuhw->flags & PMU_F_RESERVED))
+       cpuhw = this_cpu_cfhw();
+       if (!cpuhw)
                return;
 
        /* counter authorization change alert */
@@ -1039,19 +1250,11 @@ out1:
  * counter set via normal file operations.
  */
 
-static atomic_t cfset_opencnt = ATOMIC_INIT(0);                /* Access count */
-static DEFINE_MUTEX(cfset_ctrset_mutex);/* Synchronize access to hardware */
 struct cfset_call_on_cpu_parm {                /* Parm struct for smp_call_on_cpu */
        unsigned int sets;              /* Counter set bit mask */
        atomic_t cpus_ack;              /* # CPUs successfully executed func */
 };
 
-static struct cfset_session {          /* CPUs and counter set bit mask */
-       struct list_head head;          /* Head of list of active processes */
-} cfset_session = {
-       .head = LIST_HEAD_INIT(cfset_session.head)
-};
-
 struct cfset_request {                 /* CPUs and counter set bit mask */
        unsigned long ctrset;           /* Bit mask of counter set to read */
        cpumask_t mask;                 /* CPU mask to read from */
@@ -1113,11 +1316,11 @@ static void cfset_session_add(struct cfset_request *p)
 /* Stop all counter sets via ioctl interface */
 static void cfset_ioctl_off(void *parm)
 {
-       struct cpu_cf_events *cpuhw = this_cpu_ptr(&cpu_cf_events);
+       struct cpu_cf_events *cpuhw = this_cpu_cfhw();
        struct cfset_call_on_cpu_parm *p = parm;
        int rc;
 
-       /* Check if any counter set used by /dev/hwc */
+       /* Check if any counter set used by /dev/hwctr */
        for (rc = CPUMF_CTR_SET_BASIC; rc < CPUMF_CTR_SET_MAX; ++rc)
                if ((p->sets & cpumf_ctr_ctl[rc])) {
                        if (!atomic_dec_return(&cpuhw->ctr_set[rc])) {
@@ -1141,7 +1344,7 @@ static void cfset_ioctl_off(void *parm)
 /* Start counter sets on particular CPU */
 static void cfset_ioctl_on(void *parm)
 {
-       struct cpu_cf_events *cpuhw = this_cpu_ptr(&cpu_cf_events);
+       struct cpu_cf_events *cpuhw = this_cpu_cfhw();
        struct cfset_call_on_cpu_parm *p = parm;
        int rc;
 
@@ -1163,7 +1366,7 @@ static void cfset_ioctl_on(void *parm)
 
 static void cfset_release_cpu(void *p)
 {
-       struct cpu_cf_events *cpuhw = this_cpu_ptr(&cpu_cf_events);
+       struct cpu_cf_events *cpuhw = this_cpu_cfhw();
        int rc;
 
        debug_sprintf_event(cf_dbg, 4, "%s state %#llx dev_state %#llx\n",
@@ -1203,27 +1406,41 @@ static int cfset_release(struct inode *inode, struct file *file)
                kfree(file->private_data);
                file->private_data = NULL;
        }
-       if (!atomic_dec_return(&cfset_opencnt))
+       if (refcount_dec_and_test(&cfset_opencnt)) {    /* Last close */
                on_each_cpu(cfset_release_cpu, NULL, 1);
+               cpum_cf_free(-1);
+       }
        mutex_unlock(&cfset_ctrset_mutex);
-
-       hw_perf_event_destroy(NULL);
        return 0;
 }
 
+/*
+ * Open via /dev/hwctr device. Allocate all per CPU resources on the first
+ * open of the device. The last close releases all per CPU resources.
+ * Parallel perf_event_open system calls also use per CPU resources.
+ * These invocations are handled via reference counting on the per CPU data
+ * structures.
+ */
 static int cfset_open(struct inode *inode, struct file *file)
 {
-       if (!capable(CAP_SYS_ADMIN))
+       int rc = 0;
+
+       if (!perfmon_capable())
                return -EPERM;
+       file->private_data = NULL;
+
        mutex_lock(&cfset_ctrset_mutex);
-       if (atomic_inc_return(&cfset_opencnt) == 1)
-               cfset_session_init();
+       if (!refcount_inc_not_zero(&cfset_opencnt)) {   /* First open */
+               rc = cpum_cf_alloc(-1);
+               if (!rc) {
+                       cfset_session_init();
+                       refcount_set(&cfset_opencnt, 1);
+               }
+       }
        mutex_unlock(&cfset_ctrset_mutex);
 
-       cpumf_hw_inuse();
-       file->private_data = NULL;
        /* nonseekable_open() never fails */
-       return nonseekable_open(inode, file);
+       return rc ?: nonseekable_open(inode, file);
 }
 
 static int cfset_all_start(struct cfset_request *req)
@@ -1280,7 +1497,7 @@ static int cfset_all_copy(unsigned long arg, cpumask_t *mask)
        ctrset_read = (struct s390_ctrset_read __user *)arg;
        uptr = ctrset_read->data;
        for_each_cpu(cpu, mask) {
-               struct cpu_cf_events *cpuhw = per_cpu_ptr(&cpu_cf_events, cpu);
+               struct cpu_cf_events *cpuhw = get_cpu_cfhw(cpu);
                struct s390_ctrset_cpudata __user *ctrset_cpudata;
 
                ctrset_cpudata = uptr;
@@ -1324,7 +1541,7 @@ static size_t cfset_cpuset_read(struct s390_ctrset_setdata *p, int ctrset,
 /* Read all counter sets. */
 static void cfset_cpu_read(void *parm)
 {
-       struct cpu_cf_events *cpuhw = this_cpu_ptr(&cpu_cf_events);
+       struct cpu_cf_events *cpuhw = this_cpu_cfhw();
        struct cfset_call_on_cpu_parm *p = parm;
        int set, set_size;
        size_t space;
@@ -1348,9 +1565,9 @@ static void cfset_cpu_read(void *parm)
                        cpuhw->used += space;
                        cpuhw->sets += 1;
                }
+               debug_sprintf_event(cf_dbg, 4, "%s sets %d used %zd\n", __func__,
+                                   cpuhw->sets, cpuhw->used);
        }
-       debug_sprintf_event(cf_dbg, 4, "%s sets %d used %zd\n", __func__,
-                           cpuhw->sets, cpuhw->used);
 }
 
 static int cfset_all_read(unsigned long arg, struct cfset_request *req)
@@ -1502,6 +1719,7 @@ static struct miscdevice cfset_dev = {
        .name   = S390_HWCTR_DEVICE,
        .minor  = MISC_DYNAMIC_MINOR,
        .fops   = &cfset_fops,
+       .mode   = 0666,
 };
 
 /* Hotplug add of a CPU. Scan through all active processes and add
@@ -1512,7 +1730,6 @@ static int cfset_online_cpu(unsigned int cpu)
        struct cfset_call_on_cpu_parm p;
        struct cfset_request *rp;
 
-       mutex_lock(&cfset_ctrset_mutex);
        if (!list_empty(&cfset_session.head)) {
                list_for_each_entry(rp, &cfset_session.head, node) {
                        p.sets = rp->ctrset;
@@ -1520,19 +1737,18 @@ static int cfset_online_cpu(unsigned int cpu)
                        cpumask_set_cpu(cpu, &rp->mask);
                }
        }
-       mutex_unlock(&cfset_ctrset_mutex);
        return 0;
 }
 
 /* Hotplug remove of a CPU. Scan through all active processes and clear
  * that CPU from the list of CPUs supplied with ioctl(..., START, ...).
+ * Adjust reference counts.
  */
 static int cfset_offline_cpu(unsigned int cpu)
 {
        struct cfset_call_on_cpu_parm p;
        struct cfset_request *rp;
 
-       mutex_lock(&cfset_ctrset_mutex);
        if (!list_empty(&cfset_session.head)) {
                list_for_each_entry(rp, &cfset_session.head, node) {
                        p.sets = rp->ctrset;
@@ -1540,7 +1756,6 @@ static int cfset_offline_cpu(unsigned int cpu)
                        cpumask_clear_cpu(cpu, &rp->mask);
                }
        }
-       mutex_unlock(&cfset_ctrset_mutex);
        return 0;
 }
 
@@ -1618,7 +1833,8 @@ static int cfdiag_event_init(struct perf_event *event)
        }
 
        /* Initialize for using the CPU-measurement counter facility */
-       cpumf_hw_inuse();
+       if (cpum_cf_alloc(event->cpu))
+               return -ENOMEM;
        event->destroy = hw_perf_event_destroy;
 
        err = cfdiag_event_init2(event);
index 7ef72f5..8ecfbce 100644 (file)
@@ -1271,16 +1271,6 @@ static void hw_collect_samples(struct perf_event *event, unsigned long *sdbt,
        }
 }
 
-static inline __uint128_t __cdsg(__uint128_t *ptr, __uint128_t old, __uint128_t new)
-{
-       asm volatile(
-               "       cdsg    %[old],%[new],%[ptr]\n"
-               : [old] "+d" (old), [ptr] "+QS" (*ptr)
-               : [new] "d" (new)
-               : "memory", "cc");
-       return old;
-}
-
 /* hw_perf_event_update() - Process sampling buffer
  * @event:     The perf event
  * @flush_all: Flag to also flush partially filled sample-data-blocks
@@ -1352,7 +1342,7 @@ static void hw_perf_event_update(struct perf_event *event, int flush_all)
                        new.f = 0;
                        new.a = 1;
                        new.overflow = 0;
-                       prev.val = __cdsg(&te->header.val, old.val, new.val);
+                       prev.val = cmpxchg128(&te->header.val, old.val, new.val);
                } while (prev.val != old.val);
 
                /* Advance to next sample-data-block */
@@ -1562,7 +1552,7 @@ static bool aux_set_alert(struct aux_buffer *aux, unsigned long alert_index,
                }
                new.a = 1;
                new.overflow = 0;
-               prev.val = __cdsg(&te->header.val, old.val, new.val);
+               prev.val = cmpxchg128(&te->header.val, old.val, new.val);
        } while (prev.val != old.val);
        return true;
 }
@@ -1636,7 +1626,7 @@ static bool aux_reset_buffer(struct aux_buffer *aux, unsigned long range,
                                new.a = 1;
                        else
                                new.a = 0;
-                       prev.val = __cdsg(&te->header.val, old.val, new.val);
+                       prev.val = cmpxchg128(&te->header.val, old.val, new.val);
                } while (prev.val != old.val);
                *overflow += orig_overflow;
        }
index a7b339c..fe7d177 100644 (file)
@@ -36,7 +36,7 @@ struct paicrypt_map {
        unsigned long *page;            /* Page for CPU to store counters */
        struct pai_userdata *save;      /* Page to store no-zero counters */
        unsigned int active_events;     /* # of PAI crypto users */
-       unsigned int refcnt;            /* Reference count mapped buffers */
+       refcount_t refcnt;              /* Reference count mapped buffers */
        enum paievt_mode mode;          /* Type of event */
        struct perf_event *event;       /* Perf event for sampling */
 };
@@ -57,10 +57,11 @@ static void paicrypt_event_destroy(struct perf_event *event)
        static_branch_dec(&pai_key);
        mutex_lock(&pai_reserve_mutex);
        debug_sprintf_event(cfm_dbg, 5, "%s event %#llx cpu %d users %d"
-                           " mode %d refcnt %d\n", __func__,
+                           " mode %d refcnt %u\n", __func__,
                            event->attr.config, event->cpu,
-                           cpump->active_events, cpump->mode, cpump->refcnt);
-       if (!--cpump->refcnt) {
+                           cpump->active_events, cpump->mode,
+                           refcount_read(&cpump->refcnt));
+       if (refcount_dec_and_test(&cpump->refcnt)) {
                debug_sprintf_event(cfm_dbg, 4, "%s page %#lx save %p\n",
                                    __func__, (unsigned long)cpump->page,
                                    cpump->save);
@@ -149,8 +150,10 @@ static int paicrypt_busy(struct perf_event_attr *a, struct paicrypt_map *cpump)
        /* Allocate memory for counter page and counter extraction.
         * Only the first counting event has to allocate a page.
         */
-       if (cpump->page)
+       if (cpump->page) {
+               refcount_inc(&cpump->refcnt);
                goto unlock;
+       }
 
        rc = -ENOMEM;
        cpump->page = (unsigned long *)get_zeroed_page(GFP_KERNEL);
@@ -164,18 +167,18 @@ static int paicrypt_busy(struct perf_event_attr *a, struct paicrypt_map *cpump)
                goto unlock;
        }
        rc = 0;
+       refcount_set(&cpump->refcnt, 1);
 
 unlock:
        /* If rc is non-zero, do not set mode and reference count */
        if (!rc) {
-               cpump->refcnt++;
                cpump->mode = a->sample_period ? PAI_MODE_SAMPLING
                                               : PAI_MODE_COUNTING;
        }
        debug_sprintf_event(cfm_dbg, 5, "%s sample_period %#llx users %d"
-                           " mode %d refcnt %d page %#lx save %p rc %d\n",
+                           " mode %d refcnt %u page %#lx save %p rc %d\n",
                            __func__, a->sample_period, cpump->active_events,
-                           cpump->mode, cpump->refcnt,
+                           cpump->mode, refcount_read(&cpump->refcnt),
                            (unsigned long)cpump->page, cpump->save, rc);
        mutex_unlock(&pai_reserve_mutex);
        return rc;
index fcea307..3b4f384 100644 (file)
@@ -50,7 +50,7 @@ struct paiext_map {
        struct pai_userdata *save;      /* Area to store non-zero counters */
        enum paievt_mode mode;          /* Type of event */
        unsigned int active_events;     /* # of PAI Extension users */
-       unsigned int refcnt;
+       refcount_t refcnt;
        struct perf_event *event;       /* Perf event for sampling */
        struct paiext_cb *paiext_cb;    /* PAI extension control block area */
 };
@@ -60,14 +60,14 @@ struct paiext_mapptr {
 };
 
 static struct paiext_root {            /* Anchor to per CPU data */
-       int refcnt;                     /* Overall active events */
+       refcount_t refcnt;              /* Overall active events */
        struct paiext_mapptr __percpu *mapptr;
 } paiext_root;
 
 /* Free per CPU data when the last event is removed. */
 static void paiext_root_free(void)
 {
-       if (!--paiext_root.refcnt) {
+       if (refcount_dec_and_test(&paiext_root.refcnt)) {
                free_percpu(paiext_root.mapptr);
                paiext_root.mapptr = NULL;
        }
@@ -80,7 +80,7 @@ static void paiext_root_free(void)
  */
 static int paiext_root_alloc(void)
 {
-       if (++paiext_root.refcnt == 1) {
+       if (!refcount_inc_not_zero(&paiext_root.refcnt)) {
                /* The memory is already zeroed. */
                paiext_root.mapptr = alloc_percpu(struct paiext_mapptr);
                if (!paiext_root.mapptr) {
@@ -91,6 +91,7 @@ static int paiext_root_alloc(void)
                         */
                        return -ENOMEM;
                }
+               refcount_set(&paiext_root.refcnt, 1);
        }
        return 0;
 }
@@ -122,7 +123,7 @@ static void paiext_event_destroy(struct perf_event *event)
 
        mutex_lock(&paiext_reserve_mutex);
        cpump->event = NULL;
-       if (!--cpump->refcnt)           /* Last reference gone */
+       if (refcount_dec_and_test(&cpump->refcnt))      /* Last reference gone */
                paiext_free(mp);
        paiext_root_free();
        mutex_unlock(&paiext_reserve_mutex);
@@ -163,7 +164,7 @@ static int paiext_alloc(struct perf_event_attr *a, struct perf_event *event)
                rc = -ENOMEM;
                cpump = kzalloc(sizeof(*cpump), GFP_KERNEL);
                if (!cpump)
-                       goto unlock;
+                       goto undo;
 
                /* Allocate memory for counter area and counter extraction.
                 * These are
@@ -183,8 +184,9 @@ static int paiext_alloc(struct perf_event_attr *a, struct perf_event *event)
                                             GFP_KERNEL);
                if (!cpump->save || !cpump->area || !cpump->paiext_cb) {
                        paiext_free(mp);
-                       goto unlock;
+                       goto undo;
                }
+               refcount_set(&cpump->refcnt, 1);
                cpump->mode = a->sample_period ? PAI_MODE_SAMPLING
                                               : PAI_MODE_COUNTING;
        } else {
@@ -195,15 +197,15 @@ static int paiext_alloc(struct perf_event_attr *a, struct perf_event *event)
                if (cpump->mode == PAI_MODE_SAMPLING ||
                    (cpump->mode == PAI_MODE_COUNTING && a->sample_period)) {
                        rc = -EBUSY;
-                       goto unlock;
+                       goto undo;
                }
+               refcount_inc(&cpump->refcnt);
        }
 
        rc = 0;
        cpump->event = event;
-       ++cpump->refcnt;
 
-unlock:
+undo:
        if (rc) {
                /* Error in allocation of event, decrement anchor. Since
                 * the event is not created, its destroy() function is never
@@ -211,6 +213,7 @@ unlock:
                 */
                paiext_root_free();
        }
+unlock:
        mutex_unlock(&paiext_reserve_mutex);
        /* If rc is non-zero, no increment of counter/sampler was done. */
        return rc;
index 6b7b6d5..2762781 100644 (file)
@@ -102,6 +102,11 @@ void __init time_early_init(void)
                        ((long) qui.old_leap * 4096000000L);
 }
 
+unsigned long long noinstr sched_clock_noinstr(void)
+{
+       return tod_to_ns(__get_tod_clock_monotonic());
+}
+
 /*
  * Scheduler clock - returns current time in nanosec units.
  */
index 9fd1953..68adf1d 100644 (file)
@@ -95,7 +95,7 @@ out:
 static void cpu_thread_map(cpumask_t *dst, unsigned int cpu)
 {
        static cpumask_t mask;
-       int i;
+       unsigned int max_cpu;
 
        cpumask_clear(&mask);
        if (!cpumask_test_cpu(cpu, &cpu_setup_mask))
@@ -104,9 +104,10 @@ static void cpu_thread_map(cpumask_t *dst, unsigned int cpu)
        if (topology_mode != TOPOLOGY_MODE_HW)
                goto out;
        cpu -= cpu % (smp_cpu_mtid + 1);
-       for (i = 0; i <= smp_cpu_mtid; i++) {
-               if (cpumask_test_cpu(cpu + i, &cpu_setup_mask))
-                       cpumask_set_cpu(cpu + i, &mask);
+       max_cpu = min(cpu + smp_cpu_mtid, nr_cpu_ids - 1);
+       for (; cpu <= max_cpu; cpu++) {
+               if (cpumask_test_cpu(cpu, &cpu_setup_mask))
+                       cpumask_set_cpu(cpu, &mask);
        }
 out:
        cpumask_copy(dst, &mask);
@@ -123,25 +124,26 @@ static void add_cpus_to_mask(struct topology_core *tl_core,
        unsigned int core;
 
        for_each_set_bit(core, &tl_core->mask, TOPOLOGY_CORE_BITS) {
-               unsigned int rcore;
-               int lcpu, i;
+               unsigned int max_cpu, rcore;
+               int cpu;
 
                rcore = TOPOLOGY_CORE_BITS - 1 - core + tl_core->origin;
-               lcpu = smp_find_processor_id(rcore << smp_cpu_mt_shift);
-               if (lcpu < 0)
+               cpu = smp_find_processor_id(rcore << smp_cpu_mt_shift);
+               if (cpu < 0)
                        continue;
-               for (i = 0; i <= smp_cpu_mtid; i++) {
-                       topo = &cpu_topology[lcpu + i];
+               max_cpu = min(cpu + smp_cpu_mtid, nr_cpu_ids - 1);
+               for (; cpu <= max_cpu; cpu++) {
+                       topo = &cpu_topology[cpu];
                        topo->drawer_id = drawer->id;
                        topo->book_id = book->id;
                        topo->socket_id = socket->id;
                        topo->core_id = rcore;
-                       topo->thread_id = lcpu + i;
+                       topo->thread_id = cpu;
                        topo->dedicated = tl_core->d;
-                       cpumask_set_cpu(lcpu + i, &drawer->mask);
-                       cpumask_set_cpu(lcpu + i, &book->mask);
-                       cpumask_set_cpu(lcpu + i, &socket->mask);
-                       smp_cpu_set_polarization(lcpu + i, tl_core->pp);
+                       cpumask_set_cpu(cpu, &drawer->mask);
+                       cpumask_set_cpu(cpu, &book->mask);
+                       cpumask_set_cpu(cpu, &socket->mask);
+                       smp_cpu_set_polarization(cpu, tl_core->pp);
                }
        }
 }
index 580d2e3..7c50eca 100644 (file)
@@ -3,7 +3,7 @@
 # Makefile for s390-specific library files..
 #
 
-lib-y += delay.o string.o uaccess.o find.o spinlock.o
+lib-y += delay.o string.o uaccess.o find.o spinlock.o tishift.o
 obj-y += mem.o xor.o
 lib-$(CONFIG_KPROBES) += probes.o
 lib-$(CONFIG_UPROBES) += probes.o
diff --git a/arch/s390/lib/tishift.S b/arch/s390/lib/tishift.S
new file mode 100644 (file)
index 0000000..de33cf0
--- /dev/null
+++ b/arch/s390/lib/tishift.S
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/linkage.h>
+#include <asm/nospec-insn.h>
+#include <asm/export.h>
+
+       .section .noinstr.text, "ax"
+
+       GEN_BR_THUNK %r14
+
+SYM_FUNC_START(__ashlti3)
+       lmg     %r0,%r1,0(%r3)
+       cije    %r4,0,1f
+       lhi     %r3,64
+       sr      %r3,%r4
+       jnh     0f
+       srlg    %r3,%r1,0(%r3)
+       sllg    %r0,%r0,0(%r4)
+       sllg    %r1,%r1,0(%r4)
+       ogr     %r0,%r3
+       j       1f
+0:     sllg    %r0,%r1,-64(%r4)
+       lghi    %r1,0
+1:     stmg    %r0,%r1,0(%r2)
+       BR_EX   %r14
+SYM_FUNC_END(__ashlti3)
+EXPORT_SYMBOL(__ashlti3)
+
+SYM_FUNC_START(__ashrti3)
+       lmg     %r0,%r1,0(%r3)
+       cije    %r4,0,1f
+       lhi     %r3,64
+       sr      %r3,%r4
+       jnh     0f
+       sllg    %r3,%r0,0(%r3)
+       srlg    %r1,%r1,0(%r4)
+       srag    %r0,%r0,0(%r4)
+       ogr     %r1,%r3
+       j       1f
+0:     srag    %r1,%r0,-64(%r4)
+       srag    %r0,%r0,63
+1:     stmg    %r0,%r1,0(%r2)
+       BR_EX   %r14
+SYM_FUNC_END(__ashrti3)
+EXPORT_SYMBOL(__ashrti3)
+
+SYM_FUNC_START(__lshrti3)
+       lmg     %r0,%r1,0(%r3)
+       cije    %r4,0,1f
+       lhi     %r3,64
+       sr      %r3,%r4
+       jnh     0f
+       sllg    %r3,%r0,0(%r3)
+       srlg    %r1,%r1,0(%r4)
+       srlg    %r0,%r0,0(%r4)
+       ogr     %r1,%r3
+       j       1f
+0:     srlg    %r1,%r0,-64(%r4)
+       lghi    %r0,0
+1:     stmg    %r0,%r1,0(%r2)
+       BR_EX   %r14
+SYM_FUNC_END(__lshrti3)
+EXPORT_SYMBOL(__lshrti3)
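As a reading aid: the three routines above are the compiler support functions for shifting a 128-bit (TImode) value. A minimal C sketch of the left-shift semantics, assuming a hi/lo pair of 64-bit halves (illustrative only; the real s390 routines take the operand and result by reference with the shift count in a register, and the two right-shift variants differ only in whether the vacated high bits are sign- or zero-filled):

```c
#include <stdint.h>

struct u128 { uint64_t hi, lo; };	/* hi = doubleword at offset 0 (big-endian) */

/* 128-bit logical left shift, mirroring __ashlti3 above; 0 <= n < 128 */
static struct u128 shl128(struct u128 v, unsigned int n)
{
	struct u128 r;

	if (n == 0) {
		r = v;				/* cije %r4,0,1f: nothing to do */
	} else if (n < 64) {
		r.hi = (v.hi << n) | (v.lo >> (64 - n));
		r.lo = v.lo << n;
	} else {				/* 64 <= n < 128 */
		r.hi = v.lo << (n - 64);
		r.lo = 0;
	}
	return r;
}
```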
index 5ba3bd8..ca5a418 100644 (file)
@@ -4,6 +4,7 @@
  * Author(s): Jan Glauber <jang@linux.vnet.ibm.com>
  */
 #include <linux/hugetlb.h>
+#include <linux/proc_fs.h>
 #include <linux/vmalloc.h>
 #include <linux/mm.h>
 #include <asm/cacheflush.h>
index 5b22c6e..b9dcb4a 100644 (file)
@@ -667,7 +667,15 @@ static void __init memblock_region_swap(void *a, void *b, int size)
 
 #ifdef CONFIG_KASAN
 #define __sha(x)       ((unsigned long)kasan_mem_to_shadow((void *)x))
+
+static inline int set_memory_kasan(unsigned long start, unsigned long end)
+{
+       start = PAGE_ALIGN_DOWN(__sha(start));
+       end = PAGE_ALIGN(__sha(end));
+       return set_memory_rwnx(start, (end - start) >> PAGE_SHIFT);
+}
 #endif
+
 /*
  * map whole physical memory to virtual memory (identity mapping)
  * we reserve enough space in the vmalloc area for vmemmap to hotplug
@@ -737,10 +745,8 @@ void __init vmem_map_init(void)
        }
 
 #ifdef CONFIG_KASAN
-       for_each_mem_range(i, &base, &end) {
-               set_memory_rwnx(__sha(base),
-                               (__sha(end) - __sha(base)) >> PAGE_SHIFT);
-       }
+       for_each_mem_range(i, &base, &end)
+               set_memory_kasan(base, end);
 #endif
        set_memory_rox((unsigned long)_stext,
                       (unsigned long)(_etext - _stext) >> PAGE_SHIFT);
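The set_memory_kasan() helper introduced above exists because a page-aligned physical range does not map to a page-aligned KASAN shadow range: with the usual KASAN scaling (assumed here to be 8 bytes of memory per shadow byte), shadow addresses only land on 512-byte boundaries, so the helper rounds the shadow range outward to whole pages before calling set_memory_rwnx(). A small stand-alone illustration of that arithmetic (numbers purely illustrative, and the shadow offset is omitted):

```c
#include <stdio.h>

#define PAGE_SIZE	4096UL
#define KASAN_SCALE	8UL	/* assumed: 8 bytes of memory per shadow byte */

int main(void)
{
	/* a page-aligned memory range ... */
	unsigned long start = 0x101000, end = 0x180000;

	/* ... maps to a shadow range that is not page aligned ... */
	unsigned long s = start / KASAN_SCALE;	/* 0x20200 */
	unsigned long e = end / KASAN_SCALE;	/* 0x30000 */

	/* ... so round outward to whole pages, as set_memory_kasan() does */
	unsigned long s_dn = s & ~(PAGE_SIZE - 1);			/* 0x20000 */
	unsigned long e_up = (e + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);	/* 0x30000 */

	printf("shadow %#lx-%#lx -> pages %#lx-%#lx\n", s, e, s_dn, e_up);
	return 0;
}
```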
index 32573b4..cc8cf5a 100644 (file)
@@ -26,6 +26,7 @@ KBUILD_CFLAGS += -Wno-pointer-sign -Wno-sign-compare
 KBUILD_CFLAGS += -fno-zero-initialized-in-bss -fno-builtin -ffreestanding
 KBUILD_CFLAGS += -Os -m64 -msoft-float -fno-common
 KBUILD_CFLAGS += -fno-stack-protector
+KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
 KBUILD_CFLAGS += $(CLANG_FLAGS)
 KBUILD_CFLAGS += $(call cc-option,-fno-PIE)
 KBUILD_AFLAGS := $(filter-out -DCC_USING_EXPOLINE,$(KBUILD_AFLAGS))
index 9652d36..e339745 100644 (file)
@@ -6,6 +6,7 @@ config SUPERH
        select ARCH_ENABLE_MEMORY_HOTREMOVE if SPARSEMEM && MMU
        select ARCH_HAVE_NMI_SAFE_CMPXCHG if (GUSA_RB || CPU_SH4A)
        select ARCH_HAS_BINFMT_FLAT if !MMU
+       select ARCH_HAS_CPU_FINALIZE_INIT
        select ARCH_HAS_CURRENT_STACK_POINTER
        select ARCH_HAS_GIGANTIC_PAGE
        select ARCH_HAS_GCOV_PROFILE_ALL
index 059791f..cf1c10f 100644 (file)
@@ -71,6 +71,11 @@ static inline int arch_atomic_fetch_##op(int i, atomic_t *v)         \
 ATOMIC_OPS(add)
 ATOMIC_OPS(sub)
 
+#define arch_atomic_add_return arch_atomic_add_return
+#define arch_atomic_sub_return arch_atomic_sub_return
+#define arch_atomic_fetch_add  arch_atomic_fetch_add
+#define arch_atomic_fetch_sub  arch_atomic_fetch_sub
+
 #undef ATOMIC_OPS
 #define ATOMIC_OPS(op) ATOMIC_OP(op) ATOMIC_FETCH_OP(op)
 
@@ -78,6 +83,10 @@ ATOMIC_OPS(and)
 ATOMIC_OPS(or)
 ATOMIC_OPS(xor)
 
+#define arch_atomic_fetch_and  arch_atomic_fetch_and
+#define arch_atomic_fetch_or   arch_atomic_fetch_or
+#define arch_atomic_fetch_xor  arch_atomic_fetch_xor
+
 #undef ATOMIC_OPS
 #undef ATOMIC_FETCH_OP
 #undef ATOMIC_OP_RETURN
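The added defines here (and the matching ones in the other SuperH and SPARC atomic headers below) look like no-ops, but each one advertises to the generic atomic fallback layer that the architecture supplies that operation; the generic header only generates a fallback when the corresponding macro is not defined. A stand-alone sketch of the pattern, with deliberately simplified types (plain int instead of atomic_t, no ordering semantics):

```c
#include <stdio.h>

/* The "architecture" provides its own implementation ... */
static inline int arch_atomic_fetch_add(int i, int *v)
{
	int old = *v;

	*v += i;
	return old;
}
/* ... and advertises it by defining a macro that names itself ... */
#define arch_atomic_fetch_add arch_atomic_fetch_add

/* ... so a generic layer emits its fallback only if the macro is absent. */
#ifndef arch_atomic_fetch_add
#error "a generic cmpxchg-based fallback would be generated here"
#endif

int main(void)
{
	int v = 1;
	int old = arch_atomic_fetch_add(2, &v);

	printf("old=%d new=%d\n", old, v);	/* old=1 new=3 */
	return 0;
}
```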
index 7665de9..b4090cc 100644 (file)
@@ -55,6 +55,11 @@ static inline int arch_atomic_fetch_##op(int i, atomic_t *v)         \
 ATOMIC_OPS(add, +=)
 ATOMIC_OPS(sub, -=)
 
+#define arch_atomic_add_return arch_atomic_add_return
+#define arch_atomic_sub_return arch_atomic_sub_return
+#define arch_atomic_fetch_add  arch_atomic_fetch_add
+#define arch_atomic_fetch_sub  arch_atomic_fetch_sub
+
 #undef ATOMIC_OPS
 #define ATOMIC_OPS(op, c_op)                                           \
        ATOMIC_OP(op, c_op)                                             \
@@ -64,6 +69,10 @@ ATOMIC_OPS(and, &=)
 ATOMIC_OPS(or, |=)
 ATOMIC_OPS(xor, ^=)
 
+#define arch_atomic_fetch_and  arch_atomic_fetch_and
+#define arch_atomic_fetch_or   arch_atomic_fetch_or
+#define arch_atomic_fetch_xor  arch_atomic_fetch_xor
+
 #undef ATOMIC_OPS
 #undef ATOMIC_FETCH_OP
 #undef ATOMIC_OP_RETURN
index b63dcfb..9ef1fb1 100644 (file)
@@ -73,6 +73,11 @@ static inline int arch_atomic_fetch_##op(int i, atomic_t *v)         \
 ATOMIC_OPS(add)
 ATOMIC_OPS(sub)
 
+#define arch_atomic_add_return arch_atomic_add_return
+#define arch_atomic_sub_return arch_atomic_sub_return
+#define arch_atomic_fetch_add  arch_atomic_fetch_add
+#define arch_atomic_fetch_sub  arch_atomic_fetch_sub
+
 #undef ATOMIC_OPS
 #define ATOMIC_OPS(op) ATOMIC_OP(op) ATOMIC_FETCH_OP(op)
 
@@ -80,6 +85,10 @@ ATOMIC_OPS(and)
 ATOMIC_OPS(or)
 ATOMIC_OPS(xor)
 
+#define arch_atomic_fetch_and  arch_atomic_fetch_and
+#define arch_atomic_fetch_or   arch_atomic_fetch_or
+#define arch_atomic_fetch_xor  arch_atomic_fetch_xor
+
 #undef ATOMIC_OPS
 #undef ATOMIC_FETCH_OP
 #undef ATOMIC_OP_RETURN
index 528bfed..7a18cb2 100644 (file)
@@ -30,9 +30,6 @@
 #include <asm/atomic-irq.h>
 #endif
 
-#define arch_atomic_xchg(v, new)       (arch_xchg(&((v)->counter), new))
-#define arch_atomic_cmpxchg(v, o, n)   (arch_cmpxchg(&((v)->counter), (o), (n)))
-
 #endif /* CONFIG_CPU_J2 */
 
 #endif /* __ASM_SH_ATOMIC_H */
diff --git a/arch/sh/include/asm/bugs.h b/arch/sh/include/asm/bugs.h
deleted file mode 100644 (file)
index fe52abb..0000000
+++ /dev/null
@@ -1,74 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __ASM_SH_BUGS_H
-#define __ASM_SH_BUGS_H
-
-/*
- * This is included by init/main.c to check for architecture-dependent bugs.
- *
- * Needs:
- *     void check_bugs(void);
- */
-
-/*
- * I don't know of any Super-H bugs yet.
- */
-
-#include <asm/processor.h>
-
-extern void select_idle_routine(void);
-
-static void __init check_bugs(void)
-{
-       extern unsigned long loops_per_jiffy;
-       char *p = &init_utsname()->machine[2]; /* "sh" */
-
-       select_idle_routine();
-
-       current_cpu_data.loops_per_jiffy = loops_per_jiffy;
-
-       switch (current_cpu_data.family) {
-       case CPU_FAMILY_SH2:
-               *p++ = '2';
-               break;
-       case CPU_FAMILY_SH2A:
-               *p++ = '2';
-               *p++ = 'a';
-               break;
-       case CPU_FAMILY_SH3:
-               *p++ = '3';
-               break;
-       case CPU_FAMILY_SH4:
-               *p++ = '4';
-               break;
-       case CPU_FAMILY_SH4A:
-               *p++ = '4';
-               *p++ = 'a';
-               break;
-       case CPU_FAMILY_SH4AL_DSP:
-               *p++ = '4';
-               *p++ = 'a';
-               *p++ = 'l';
-               *p++ = '-';
-               *p++ = 'd';
-               *p++ = 's';
-               *p++ = 'p';
-               break;
-       case CPU_FAMILY_UNKNOWN:
-               /*
-                * Specifically use CPU_FAMILY_UNKNOWN rather than
-                * default:, so we're able to have the compiler whine
-                * about unhandled enumerations.
-                */
-               break;
-       }
-
-       printk("CPU: %s\n", get_cpu_subtype(&current_cpu_data));
-
-#ifndef __LITTLE_ENDIAN__
-       /* 'eb' means 'Endian Big' */
-       *p++ = 'e';
-       *p++ = 'b';
-#endif
-       *p = '\0';
-}
-#endif /* __ASM_SH_BUGS_H */
index 85a6c1c..73fba7c 100644 (file)
@@ -166,6 +166,8 @@ extern unsigned int instruction_size(unsigned int insn);
 #define instruction_size(insn) (2)
 #endif
 
+void select_idle_routine(void);
+
 #endif /* __ASSEMBLY__ */
 
 #include <asm/processor_32.h>
index d662503..045d93f 100644 (file)
@@ -15,6 +15,7 @@
 #include <linux/irqflags.h>
 #include <linux/smp.h>
 #include <linux/atomic.h>
+#include <asm/processor.h>
 #include <asm/smp.h>
 #include <asm/bl_bit.h>
 
index af977ec..cf7c0f7 100644 (file)
@@ -43,6 +43,7 @@
 #include <asm/smp.h>
 #include <asm/mmu_context.h>
 #include <asm/mmzone.h>
+#include <asm/processor.h>
 #include <asm/sparsemem.h>
 #include <asm/platform_early.h>
 
@@ -354,3 +355,57 @@ int test_mode_pin(int pin)
 {
        return sh_mv.mv_mode_pins() & pin;
 }
+
+void __init arch_cpu_finalize_init(void)
+{
+       char *p = &init_utsname()->machine[2]; /* "sh" */
+
+       select_idle_routine();
+
+       current_cpu_data.loops_per_jiffy = loops_per_jiffy;
+
+       switch (current_cpu_data.family) {
+       case CPU_FAMILY_SH2:
+               *p++ = '2';
+               break;
+       case CPU_FAMILY_SH2A:
+               *p++ = '2';
+               *p++ = 'a';
+               break;
+       case CPU_FAMILY_SH3:
+               *p++ = '3';
+               break;
+       case CPU_FAMILY_SH4:
+               *p++ = '4';
+               break;
+       case CPU_FAMILY_SH4A:
+               *p++ = '4';
+               *p++ = 'a';
+               break;
+       case CPU_FAMILY_SH4AL_DSP:
+               *p++ = '4';
+               *p++ = 'a';
+               *p++ = 'l';
+               *p++ = '-';
+               *p++ = 'd';
+               *p++ = 's';
+               *p++ = 'p';
+               break;
+       case CPU_FAMILY_UNKNOWN:
+               /*
+                * Specifically use CPU_FAMILY_UNKNOWN rather than
+                * default:, so we're able to have the compiler whine
+                * about unhandled enumerations.
+                */
+               break;
+       }
+
+       pr_info("CPU: %s\n", get_cpu_subtype(&current_cpu_data));
+
+#ifndef __LITTLE_ENDIAN__
+       /* 'eb' means 'Endian Big' */
+       *p++ = 'e';
+       *p++ = 'b';
+#endif
+       *p = '\0';
+}
index 8535e19..36fd488 100644 (file)
@@ -52,6 +52,7 @@ config SPARC
 config SPARC32
        def_bool !64BIT
        select ARCH_32BIT_OFF_T
+       select ARCH_HAS_CPU_FINALIZE_INIT if !SMP
        select ARCH_HAS_SYNC_DMA_FOR_CPU
        select CLZ_TAB
        select DMA_DIRECT_REMAP
index d775daa..60ce2fe 100644 (file)
 #include <asm-generic/atomic64.h>
 
 int arch_atomic_add_return(int, atomic_t *);
+#define arch_atomic_add_return arch_atomic_add_return
+
 int arch_atomic_fetch_add(int, atomic_t *);
+#define arch_atomic_fetch_add arch_atomic_fetch_add
+
 int arch_atomic_fetch_and(int, atomic_t *);
+#define arch_atomic_fetch_and arch_atomic_fetch_and
+
 int arch_atomic_fetch_or(int, atomic_t *);
+#define arch_atomic_fetch_or arch_atomic_fetch_or
+
 int arch_atomic_fetch_xor(int, atomic_t *);
+#define arch_atomic_fetch_xor arch_atomic_fetch_xor
+
 int arch_atomic_cmpxchg(atomic_t *, int, int);
+#define arch_atomic_cmpxchg arch_atomic_cmpxchg
+
 int arch_atomic_xchg(atomic_t *, int);
-int arch_atomic_fetch_add_unless(atomic_t *, int, int);
-void arch_atomic_set(atomic_t *, int);
+#define arch_atomic_xchg arch_atomic_xchg
 
+int arch_atomic_fetch_add_unless(atomic_t *, int, int);
 #define arch_atomic_fetch_add_unless arch_atomic_fetch_add_unless
 
+void arch_atomic_set(atomic_t *, int);
+
 #define arch_atomic_set_release(v, i)  arch_atomic_set((v), (i))
 
 #define arch_atomic_read(v)            READ_ONCE((v)->counter)
index 0778916..a5e9c37 100644 (file)
@@ -37,6 +37,16 @@ s64 arch_atomic64_fetch_##op(s64, atomic64_t *);
 ATOMIC_OPS(add)
 ATOMIC_OPS(sub)
 
+#define arch_atomic_add_return                 arch_atomic_add_return
+#define arch_atomic_sub_return                 arch_atomic_sub_return
+#define arch_atomic_fetch_add                  arch_atomic_fetch_add
+#define arch_atomic_fetch_sub                  arch_atomic_fetch_sub
+
+#define arch_atomic64_add_return               arch_atomic64_add_return
+#define arch_atomic64_sub_return               arch_atomic64_sub_return
+#define arch_atomic64_fetch_add                        arch_atomic64_fetch_add
+#define arch_atomic64_fetch_sub                        arch_atomic64_fetch_sub
+
 #undef ATOMIC_OPS
 #define ATOMIC_OPS(op) ATOMIC_OP(op) ATOMIC_FETCH_OP(op)
 
@@ -44,22 +54,19 @@ ATOMIC_OPS(and)
 ATOMIC_OPS(or)
 ATOMIC_OPS(xor)
 
+#define arch_atomic_fetch_and                  arch_atomic_fetch_and
+#define arch_atomic_fetch_or                   arch_atomic_fetch_or
+#define arch_atomic_fetch_xor                  arch_atomic_fetch_xor
+
+#define arch_atomic64_fetch_and                        arch_atomic64_fetch_and
+#define arch_atomic64_fetch_or                 arch_atomic64_fetch_or
+#define arch_atomic64_fetch_xor                        arch_atomic64_fetch_xor
+
 #undef ATOMIC_OPS
 #undef ATOMIC_FETCH_OP
 #undef ATOMIC_OP_RETURN
 #undef ATOMIC_OP
 
-#define arch_atomic_cmpxchg(v, o, n) (arch_cmpxchg(&((v)->counter), (o), (n)))
-
-static inline int arch_atomic_xchg(atomic_t *v, int new)
-{
-       return arch_xchg(&v->counter, new);
-}
-
-#define arch_atomic64_cmpxchg(v, o, n) \
-       ((__typeof__((v)->counter))arch_cmpxchg(&((v)->counter), (o), (n)))
-#define arch_atomic64_xchg(v, new) (arch_xchg(&((v)->counter), new))
-
 s64 arch_atomic64_dec_if_positive(atomic64_t *v);
 #define arch_atomic64_dec_if_positive arch_atomic64_dec_if_positive
 
diff --git a/arch/sparc/include/asm/bugs.h b/arch/sparc/include/asm/bugs.h
deleted file mode 100644 (file)
index 02fa369..0000000
+++ /dev/null
@@ -1,18 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/* include/asm/bugs.h:  Sparc probes for various bugs.
- *
- * Copyright (C) 1996, 2007 David S. Miller (davem@davemloft.net)
- */
-
-#ifdef CONFIG_SPARC32
-#include <asm/cpudata.h>
-#endif
-
-extern unsigned long loops_per_jiffy;
-
-static void __init check_bugs(void)
-{
-#if defined(CONFIG_SPARC32) && !defined(CONFIG_SMP)
-       cpu_data(0).udelay_val = loops_per_jiffy;
-#endif
-}
index c8e0dd9..c9d1ba4 100644 (file)
@@ -412,3 +412,10 @@ static int __init topology_init(void)
 }
 
 subsys_initcall(topology_init);
+
+#if defined(CONFIG_SPARC32) && !defined(CONFIG_SMP)
+void __init arch_cpu_finalize_init(void)
+{
+       cpu_data(0).udelay_val = loops_per_jiffy;
+}
+#endif
index 541a9b1..b5e1793 100644 (file)
@@ -5,7 +5,7 @@ menu "UML-specific options"
 config UML
        bool
        default y
-       select ARCH_EPHEMERAL_INODES
+       select ARCH_HAS_CPU_FINALIZE_INIT
        select ARCH_HAS_FORTIFY_SOURCE
        select ARCH_HAS_GCOV_PROFILE_ALL
        select ARCH_HAS_KCOV
index dee6f66..a461a95 100644 (file)
@@ -16,7 +16,8 @@ mconsole-objs := mconsole_kern.o mconsole_user.o
 hostaudio-objs := hostaudio_kern.o
 ubd-objs := ubd_kern.o ubd_user.o
 port-objs := port_kern.o port_user.o
-harddog-objs := harddog_kern.o harddog_user.o
+harddog-objs := harddog_kern.o
+harddog-builtin-$(CONFIG_UML_WATCHDOG) := harddog_user.o harddog_user_exp.o
 rtc-objs := rtc_kern.o rtc_user.o
 
 LDFLAGS_pcap.o = $(shell $(CC) $(KBUILD_CFLAGS) -print-file-name=libpcap.a)
@@ -60,6 +61,7 @@ obj-$(CONFIG_PTY_CHAN) += pty.o
 obj-$(CONFIG_TTY_CHAN) += tty.o 
 obj-$(CONFIG_XTERM_CHAN) += xterm.o xterm_kern.o
 obj-$(CONFIG_UML_WATCHDOG) += harddog.o
+obj-y += $(harddog-builtin-y) $(harddog-builtin-m)
 obj-$(CONFIG_BLK_DEV_COW_COMMON) += cow_user.o
 obj-$(CONFIG_UML_RANDOM) += random.o
 obj-$(CONFIG_VIRTIO_UML) += virtio_uml.o
diff --git a/arch/um/drivers/harddog.h b/arch/um/drivers/harddog.h
new file mode 100644 (file)
index 0000000..6d9ea60
--- /dev/null
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef UM_WATCHDOG_H
+#define UM_WATCHDOG_H
+
+int start_watchdog(int *in_fd_ret, int *out_fd_ret, char *sock);
+void stop_watchdog(int in_fd, int out_fd);
+int ping_watchdog(int fd);
+
+#endif /* UM_WATCHDOG_H */
index e6d4f43..60d1c6c 100644 (file)
@@ -47,6 +47,7 @@
 #include <linux/spinlock.h>
 #include <linux/uaccess.h>
 #include "mconsole.h"
+#include "harddog.h"
 
 MODULE_LICENSE("GPL");
 
@@ -60,8 +61,6 @@ static int harddog_out_fd = -1;
  *     Allow only one person to hold it open
  */
 
-extern int start_watchdog(int *in_fd_ret, int *out_fd_ret, char *sock);
-
 static int harddog_open(struct inode *inode, struct file *file)
 {
        int err = -EBUSY;
@@ -92,8 +91,6 @@ err:
        return err;
 }
 
-extern void stop_watchdog(int in_fd, int out_fd);
-
 static int harddog_release(struct inode *inode, struct file *file)
 {
        /*
@@ -112,8 +109,6 @@ static int harddog_release(struct inode *inode, struct file *file)
        return 0;
 }
 
-extern int ping_watchdog(int fd);
-
 static ssize_t harddog_write(struct file *file, const char __user *data, size_t len,
                             loff_t *ppos)
 {
index 070468d..9ed8930 100644 (file)
@@ -7,6 +7,7 @@
 #include <unistd.h>
 #include <errno.h>
 #include <os.h>
+#include "harddog.h"
 
 struct dog_data {
        int stdin_fd;
diff --git a/arch/um/drivers/harddog_user_exp.c b/arch/um/drivers/harddog_user_exp.c
new file mode 100644 (file)
index 0000000..c74d4b8
--- /dev/null
@@ -0,0 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/export.h>
+#include "harddog.h"
+
+#if IS_MODULE(CONFIG_UML_WATCHDOG)
+EXPORT_SYMBOL(start_watchdog);
+EXPORT_SYMBOL(stop_watchdog);
+EXPORT_SYMBOL(ping_watchdog);
+#endif
index f4c1e6e..50206fe 100644 (file)
@@ -108,9 +108,9 @@ static inline void ubd_set_bit(__u64 bit, unsigned char *data)
 static DEFINE_MUTEX(ubd_lock);
 static DEFINE_MUTEX(ubd_mutex); /* replaces BKL, might not be needed */
 
-static int ubd_open(struct block_device *bdev, fmode_t mode);
-static void ubd_release(struct gendisk *disk, fmode_t mode);
-static int ubd_ioctl(struct block_device *bdev, fmode_t mode,
+static int ubd_open(struct gendisk *disk, blk_mode_t mode);
+static void ubd_release(struct gendisk *disk);
+static int ubd_ioctl(struct block_device *bdev, blk_mode_t mode,
                     unsigned int cmd, unsigned long arg);
 static int ubd_getgeo(struct block_device *bdev, struct hd_geometry *geo);
 
@@ -1154,9 +1154,8 @@ static int __init ubd_driver_init(void){
 
 device_initcall(ubd_driver_init);
 
-static int ubd_open(struct block_device *bdev, fmode_t mode)
+static int ubd_open(struct gendisk *disk, blk_mode_t mode)
 {
-       struct gendisk *disk = bdev->bd_disk;
        struct ubd *ubd_dev = disk->private_data;
        int err = 0;
 
@@ -1171,19 +1170,12 @@ static int ubd_open(struct block_device *bdev, fmode_t mode)
        }
        ubd_dev->count++;
        set_disk_ro(disk, !ubd_dev->openflags.w);
-
-       /* This should no more be needed. And it didn't work anyway to exclude
-        * read-write remounting of filesystems.*/
-       /*if((mode & FMODE_WRITE) && !ubd_dev->openflags.w){
-               if(--ubd_dev->count == 0) ubd_close_dev(ubd_dev);
-               err = -EROFS;
-       }*/
 out:
        mutex_unlock(&ubd_mutex);
        return err;
 }
 
-static void ubd_release(struct gendisk *disk, fmode_t mode)
+static void ubd_release(struct gendisk *disk)
 {
        struct ubd *ubd_dev = disk->private_data;
 
@@ -1397,7 +1389,7 @@ static int ubd_getgeo(struct block_device *bdev, struct hd_geometry *geo)
        return 0;
 }
 
-static int ubd_ioctl(struct block_device *bdev, fmode_t mode,
+static int ubd_ioctl(struct block_device *bdev, blk_mode_t mode,
                     unsigned int cmd, unsigned long arg)
 {
        struct ubd *ubd_dev = bdev->bd_disk->private_data;
diff --git a/arch/um/include/asm/bugs.h b/arch/um/include/asm/bugs.h
deleted file mode 100644 (file)
index 4473942..0000000
+++ /dev/null
@@ -1,7 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __UM_BUGS_H
-#define __UM_BUGS_H
-
-void check_bugs(void);
-
-#endif
index 0a23a98..918fed7 100644 (file)
@@ -3,6 +3,7 @@
  * Copyright (C) 2000 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
  */
 
+#include <linux/cpu.h>
 #include <linux/delay.h>
 #include <linux/init.h>
 #include <linux/mm.h>
@@ -430,7 +431,7 @@ void __init setup_arch(char **cmdline_p)
        }
 }
 
-void __init check_bugs(void)
+void __init arch_cpu_finalize_init(void)
 {
        arch_check_bugs();
        os_check_bugs();
index 53bab12..d5c6914 100644 (file)
@@ -71,6 +71,7 @@ config X86
        select ARCH_HAS_ACPI_TABLE_UPGRADE      if ACPI
        select ARCH_HAS_CACHE_LINE_SIZE
        select ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
+       select ARCH_HAS_CPU_FINALIZE_INIT
        select ARCH_HAS_CURRENT_STACK_POINTER
        select ARCH_HAS_DEBUG_VIRTUAL
        select ARCH_HAS_DEBUG_VM_PGTABLE        if !X86_PAE
@@ -274,7 +275,9 @@ config X86
        select HAVE_UNSTABLE_SCHED_CLOCK
        select HAVE_USER_RETURN_NOTIFIER
        select HAVE_GENERIC_VDSO
+       select HOTPLUG_PARALLEL                 if SMP && X86_64
        select HOTPLUG_SMT                      if SMP
+       select HOTPLUG_SPLIT_STARTUP            if SMP && X86_32
        select IRQ_FORCED_THREADING
        select NEED_PER_CPU_EMBED_FIRST_CHUNK
        select NEED_PER_CPU_PAGE_FIRST_CHUNK
@@ -291,7 +294,6 @@ config X86
        select TRACE_IRQFLAGS_NMI_SUPPORT
        select USER_STACKTRACE_SUPPORT
        select HAVE_ARCH_KCSAN                  if X86_64
-       select X86_FEATURE_NAMES                if PROC_FS
        select PROC_PID_ARCH_STATUS             if PROC_FS
        select HAVE_ARCH_NODE_DEV_GROUP         if X86_SGX
        select FUNCTION_ALIGNMENT_16B           if X86_64 || X86_ALIGNMENT_16
@@ -441,17 +443,6 @@ config SMP
 
          If you don't know what to do here, say N.
 
-config X86_FEATURE_NAMES
-       bool "Processor feature human-readable names" if EMBEDDED
-       default y
-       help
-         This option compiles in a table of x86 feature bits and corresponding
-         names.  This is required to support /proc/cpuinfo and a few kernel
-         messages.  You can disable this to save space, at the expense of
-         making those few kernel messages show numeric feature bits instead.
-
-         If in doubt, say Y.
-
 config X86_X2APIC
        bool "Support x2apic"
        depends on X86_LOCAL_APIC && X86_64 && (IRQ_REMAP || HYPERVISOR_GUEST)
@@ -884,9 +875,11 @@ config INTEL_TDX_GUEST
        bool "Intel TDX (Trust Domain Extensions) - Guest Support"
        depends on X86_64 && CPU_SUP_INTEL
        depends on X86_X2APIC
+       depends on EFI_STUB
        select ARCH_HAS_CC_PLATFORM
        select X86_MEM_ENCRYPT
        select X86_MCE
+       select UNACCEPTED_MEMORY
        help
          Support running as a guest under Intel TDX.  Without this support,
          the guest kernel can not boot or run under TDX.
@@ -1541,11 +1534,13 @@ config X86_MEM_ENCRYPT
 config AMD_MEM_ENCRYPT
        bool "AMD Secure Memory Encryption (SME) support"
        depends on X86_64 && CPU_SUP_AMD
+       depends on EFI_STUB
        select DMA_COHERENT_POOL
        select ARCH_USE_MEMREMAP_PROT
        select INSTRUCTION_DECODER
        select ARCH_HAS_CC_PLATFORM
        select X86_MEM_ENCRYPT
+       select UNACCEPTED_MEMORY
        help
          Say yes to enable support for the encryption of system memory.
          This requires an AMD processor that supports Secure Memory
@@ -2305,49 +2300,6 @@ config HOTPLUG_CPU
        def_bool y
        depends on SMP
 
-config BOOTPARAM_HOTPLUG_CPU0
-       bool "Set default setting of cpu0_hotpluggable"
-       depends on HOTPLUG_CPU
-       help
-         Set whether default state of cpu0_hotpluggable is on or off.
-
-         Say Y here to enable CPU0 hotplug by default. If this switch
-         is turned on, there is no need to give cpu0_hotplug kernel
-         parameter and the CPU0 hotplug feature is enabled by default.
-
-         Please note: there are two known CPU0 dependencies if you want
-         to enable the CPU0 hotplug feature either by this switch or by
-         cpu0_hotplug kernel parameter.
-
-         First, resume from hibernate or suspend always starts from CPU0.
-         So hibernate and suspend are prevented if CPU0 is offline.
-
-         Second dependency is PIC interrupts always go to CPU0. CPU0 can not
-         offline if any interrupt can not migrate out of CPU0. There may
-         be other CPU0 dependencies.
-
-         Please make sure the dependencies are under your control before
-         you enable this feature.
-
-         Say N if you don't want to enable CPU0 hotplug feature by default.
-         You still can enable the CPU0 hotplug feature at boot by kernel
-         parameter cpu0_hotplug.
-
-config DEBUG_HOTPLUG_CPU0
-       def_bool n
-       prompt "Debug CPU0 hotplug"
-       depends on HOTPLUG_CPU
-       help
-         Enabling this option offlines CPU0 (if CPU0 can be offlined) as
-         soon as possible and boots up userspace with CPU0 offlined. User
-         can online CPU0 back after boot time.
-
-         To debug CPU0 hotplug, you need to enable CPU0 offline/online
-         feature by either turning on CONFIG_BOOTPARAM_HOTPLUG_CPU0 during
-         compilation or giving cpu0_hotplug kernel parameter at boot.
-
-         If unsure, say N.
-
 config COMPAT_VDSO
        def_bool n
        prompt "Disable the 32-bit vDSO (needed for glibc 2.3.3)"
index 542377c..00468ad 100644 (file)
@@ -389,7 +389,7 @@ config IA32_FEAT_CTL
 
 config X86_VMX_FEATURE_NAMES
        def_bool y
-       depends on IA32_FEAT_CTL && X86_FEATURE_NAMES
+       depends on IA32_FEAT_CTL
 
 menuconfig PROCESSOR_SELECT
        bool "Supported processor vendors" if EXPERT
index b399759..fdc2e3a 100644 (file)
@@ -305,6 +305,18 @@ ifeq ($(RETPOLINE_CFLAGS),)
 endif
 endif
 
+ifdef CONFIG_UNWINDER_ORC
+orc_hash_h := arch/$(SRCARCH)/include/generated/asm/orc_hash.h
+orc_hash_sh := $(srctree)/scripts/orc_hash.sh
+targets += $(orc_hash_h)
+quiet_cmd_orc_hash = GEN     $@
+      cmd_orc_hash = mkdir -p $(dir $@); \
+                    $(CONFIG_SHELL) $(orc_hash_sh) < $< > $@
+$(orc_hash_h): $(srctree)/arch/x86/include/asm/orc_types.h $(orc_hash_sh) FORCE
+       $(call if_changed,orc_hash)
+archprepare: $(orc_hash_h)
+endif
+
 archclean:
        $(Q)rm -rf $(objtree)/arch/i386
        $(Q)rm -rf $(objtree)/arch/x86_64
diff --git a/arch/x86/Makefile.postlink b/arch/x86/Makefile.postlink
new file mode 100644 (file)
index 0000000..936093d
--- /dev/null
@@ -0,0 +1,47 @@
+# SPDX-License-Identifier: GPL-2.0
+# ===========================================================================
+# Post-link x86 pass
+# ===========================================================================
+#
+# 1. Separate relocations from vmlinux into vmlinux.relocs.
+# 2. Strip relocations from vmlinux.
+
+PHONY := __archpost
+__archpost:
+
+-include include/config/auto.conf
+include $(srctree)/scripts/Kbuild.include
+
+CMD_RELOCS = arch/x86/tools/relocs
+OUT_RELOCS = arch/x86/boot/compressed
+quiet_cmd_relocs = RELOCS  $(OUT_RELOCS)/$@.relocs
+      cmd_relocs = \
+       mkdir -p $(OUT_RELOCS); \
+       $(CMD_RELOCS) $@ > $(OUT_RELOCS)/$@.relocs; \
+       $(CMD_RELOCS) --abs-relocs $@
+
+quiet_cmd_strip_relocs = RSTRIP  $@
+      cmd_strip_relocs = \
+       $(OBJCOPY) --remove-section='.rel.*' --remove-section='.rel__*' \
+                  --remove-section='.rela.*' --remove-section='.rela__*' $@
+
+# `@true` prevents complaint when there is nothing to be done
+
+vmlinux: FORCE
+       @true
+ifeq ($(CONFIG_X86_NEED_RELOCS),y)
+       $(call cmd,relocs)
+       $(call cmd,strip_relocs)
+endif
+
+%.ko: FORCE
+       @true
+
+clean:
+       @rm -f $(OUT_RELOCS)/vmlinux.relocs
+
+PHONY += FORCE clean
+
+FORCE:
+
+.PHONY: $(PHONY)
index 9e38ffa..f33e45e 100644 (file)
@@ -55,14 +55,12 @@ HOST_EXTRACFLAGS += -I$(srctree)/tools/include \
                    -include include/generated/autoconf.h \
                    -D__EXPORTED_HEADERS__
 
-ifdef CONFIG_X86_FEATURE_NAMES
 $(obj)/cpu.o: $(obj)/cpustr.h
 
 quiet_cmd_cpustr = CPUSTR  $@
       cmd_cpustr = $(obj)/mkcpustr > $@
 $(obj)/cpustr.h: $(obj)/mkcpustr FORCE
        $(call if_changed,cpustr)
-endif
 targets += cpustr.h
 
 # ---------------------------------------------------------------------------
index 6b6cfe6..40d2ff5 100644 (file)
@@ -106,7 +106,8 @@ ifdef CONFIG_X86_64
 endif
 
 vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
-vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o
+vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o $(obj)/tdx-shared.o
+vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/mem.o
 
 vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
 vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_mixed.o
@@ -121,11 +122,9 @@ $(obj)/vmlinux.bin: vmlinux FORCE
 
 targets += $(patsubst $(obj)/%,%,$(vmlinux-objs-y)) vmlinux.bin.all vmlinux.relocs
 
-CMD_RELOCS = arch/x86/tools/relocs
-quiet_cmd_relocs = RELOCS  $@
-      cmd_relocs = $(CMD_RELOCS) $< > $@;$(CMD_RELOCS) --abs-relocs $<
-$(obj)/vmlinux.relocs: vmlinux FORCE
-       $(call if_changed,relocs)
+# vmlinux.relocs is created by the vmlinux postlink step.
+$(obj)/vmlinux.relocs: vmlinux
+       @true
 
 vmlinux.bin.all-y := $(obj)/vmlinux.bin
 vmlinux.bin.all-$(CONFIG_X86_NEED_RELOCS) += $(obj)/vmlinux.relocs
index 7db2f41..866c0af 100644 (file)
@@ -16,6 +16,7 @@ typedef guid_t efi_guid_t __aligned(__alignof__(u32));
 #define ACPI_TABLE_GUID                                EFI_GUID(0xeb9d2d30, 0x2d88, 0x11d3,  0x9a, 0x16, 0x00, 0x90, 0x27, 0x3f, 0xc1, 0x4d)
 #define ACPI_20_TABLE_GUID                     EFI_GUID(0x8868e871, 0xe4f1, 0x11d3,  0xbc, 0x22, 0x00, 0x80, 0xc7, 0x3c, 0x88, 0x81)
 #define EFI_CC_BLOB_GUID                       EFI_GUID(0x067b1f5f, 0xcf26, 0x44c5, 0x85, 0x54, 0x93, 0xd7, 0x77, 0x91, 0x2d, 0x42)
+#define LINUX_EFI_UNACCEPTED_MEM_TABLE_GUID    EFI_GUID(0xd5d1de3c, 0x105c, 0x44f9,  0x9e, 0xa9, 0xbc, 0xef, 0x98, 0x12, 0x00, 0x31)
 
 #define EFI32_LOADER_SIGNATURE "EL32"
 #define EFI64_LOADER_SIGNATURE "EL64"
@@ -32,6 +33,7 @@ typedef       struct {
 } efi_table_hdr_t;
 
 #define EFI_CONVENTIONAL_MEMORY                 7
+#define EFI_UNACCEPTED_MEMORY          15
 
 #define EFI_MEMORY_MORE_RELIABLE \
                                ((u64)0x0000000000010000ULL)    /* higher reliability */
@@ -104,6 +106,14 @@ struct efi_setup_data {
        u64 reserved[8];
 };
 
+struct efi_unaccepted_memory {
+       u32 version;
+       u32 unit_size;
+       u64 phys_base;
+       u64 size;
+       unsigned long bitmap[];
+};
+
 static inline int efi_guidcmp (efi_guid_t left, efi_guid_t right)
 {
        return memcmp(&left, &right, sizeof (efi_guid_t));
index c881878..5313c5c 100644 (file)
@@ -22,3 +22,22 @@ void error(char *m)
        while (1)
                asm("hlt");
 }
+
+/* EFI libstub provides vsnprintf() */
+#ifdef CONFIG_EFI_STUB
+void panic(const char *fmt, ...)
+{
+       static char buf[1024];
+       va_list args;
+       int len;
+
+       va_start(args, fmt);
+       len = vsnprintf(buf, sizeof(buf), fmt, args);
+       va_end(args);
+
+       if (len && buf[len - 1] == '\n')
+               buf[len - 1] = '\0';
+
+       error(buf);
+}
+#endif
index 1de5821..86fe33b 100644 (file)
@@ -6,5 +6,6 @@
 
 void warn(char *m);
 void error(char *m) __noreturn;
+void panic(const char *fmt, ...) __noreturn __cold;
 
 #endif /* BOOT_COMPRESSED_ERROR_H */
index 454757f..9193acf 100644 (file)
@@ -672,6 +672,33 @@ static bool process_mem_region(struct mem_vector *region,
 }
 
 #ifdef CONFIG_EFI
+
+/*
+ * Only EFI_CONVENTIONAL_MEMORY and EFI_UNACCEPTED_MEMORY (if supported) are
+ * guaranteed to be free.
+ *
+ * Pick free memory more conservatively than the EFI spec allows: according to
+ * the spec, EFI_BOOT_SERVICES_{CODE|DATA} are also free memory and thus
+ * available to place the kernel image into, but in practice there's firmware
+ * where using that memory leads to crashes. Buggy vendor EFI code registers
+ * for an event that triggers on SetVirtualAddressMap(). The handler assumes
+ * that EFI_BOOT_SERVICES_DATA memory has not been touched by the loader
+ * yet, which is probably true for Windows.
+ *
+ * Preserve EFI_BOOT_SERVICES_* regions until after SetVirtualAddressMap().
+ */
+static inline bool memory_type_is_free(efi_memory_desc_t *md)
+{
+       if (md->type == EFI_CONVENTIONAL_MEMORY)
+               return true;
+
+       if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY) &&
+           md->type == EFI_UNACCEPTED_MEMORY)
+                   return true;
+
+       return false;
+}
+
 /*
  * Returns true if we processed the EFI memmap, which we prefer over the E820
  * table if it is available.
@@ -716,18 +743,7 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
        for (i = 0; i < nr_desc; i++) {
                md = efi_early_memdesc_ptr(pmap, e->efi_memdesc_size, i);
 
-               /*
-                * Here we are more conservative in picking free memory than
-                * the EFI spec allows:
-                *
-                * According to the spec, EFI_BOOT_SERVICES_{CODE|DATA} are also
-                * free memory and thus available to place the kernel image into,
-                * but in practice there's firmware where using that memory leads
-                * to crashes.
-                *
-                * Only EFI_CONVENTIONAL_MEMORY is guaranteed to be free.
-                */
-               if (md->type != EFI_CONVENTIONAL_MEMORY)
+               if (!memory_type_is_free(md))
                        continue;
 
                if (efi_soft_reserve_enabled() &&
diff --git a/arch/x86/boot/compressed/mem.c b/arch/x86/boot/compressed/mem.c
new file mode 100644 (file)
index 0000000..3c16092
--- /dev/null
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "error.h"
+#include "misc.h"
+#include "tdx.h"
+#include "sev.h"
+#include <asm/shared/tdx.h>
+
+/*
+ * accept_memory() and process_unaccepted_memory() are called from the EFI
+ * stub, which runs before the decompressor and its early_tdx_detect().
+ *
+ * Enumerate TDX directly from the early users.
+ */
+static bool early_is_tdx_guest(void)
+{
+       static bool once;
+       static bool is_tdx;
+
+       if (!IS_ENABLED(CONFIG_INTEL_TDX_GUEST))
+               return false;
+
+       if (!once) {
+               u32 eax, sig[3];
+
+               cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax,
+                           &sig[0], &sig[2],  &sig[1]);
+               is_tdx = !memcmp(TDX_IDENT, sig, sizeof(sig));
+               once = true;
+       }
+
+       return is_tdx;
+}
+
+void arch_accept_memory(phys_addr_t start, phys_addr_t end)
+{
+       /* Platform-specific memory-acceptance call goes here */
+       if (early_is_tdx_guest()) {
+               if (!tdx_accept_memory(start, end))
+                       panic("TDX: Failed to accept memory\n");
+       } else if (sev_snp_enabled()) {
+               snp_accept_memory(start, end);
+       } else {
+               error("Cannot accept memory: unknown platform\n");
+       }
+}
+
+bool init_unaccepted_memory(void)
+{
+       guid_t guid = LINUX_EFI_UNACCEPTED_MEM_TABLE_GUID;
+       struct efi_unaccepted_memory *table;
+       unsigned long cfg_table_pa;
+       unsigned int cfg_table_len;
+       enum efi_type et;
+       int ret;
+
+       et = efi_get_type(boot_params);
+       if (et == EFI_TYPE_NONE)
+               return false;
+
+       ret = efi_get_conf_table(boot_params, &cfg_table_pa, &cfg_table_len);
+       if (ret) {
+               warn("EFI config table not found.");
+               return false;
+       }
+
+       table = (void *)efi_find_vendor_table(boot_params, cfg_table_pa,
+                                             cfg_table_len, guid);
+       if (!table)
+               return false;
+
+       if (table->version != 1)
+               error("Unknown version of unaccepted memory table\n");
+
+       /*
+        * In many cases unaccepted_table is already set by the EFI stub, but
+        * it has to be initialized again to cover cases where the table is not
+        * allocated by the EFI stub, or where the EFI stub copied the kernel
+        * image with efi_relocate_kernel() before the variable was set.
+        *
+        * It must be initialized before the first usage of accept_memory().
+        */
+       unaccepted_table = table;
+
+       return true;
+}
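init_unaccepted_memory() above only locates and publishes the firmware table; the code that consumes it is not part of this hunk. As assumed context only, a rough user-space sketch of how such a table is typically walked: every bit covers unit_size bytes starting at phys_base, a set bit means "not yet accepted", and bits are cleared once the range has been accepted through the platform hook. All names below are illustrative, not the actual in-tree implementation:

```c
#include <stdbool.h>
#include <stdint.h>

struct unaccepted_table_sketch {
	uint32_t version;
	uint32_t unit_size;		/* bytes covered by one bitmap bit */
	uint64_t phys_base;
	uint64_t size;
	unsigned long bitmap[];
};

/* stand-in for the platform call, e.g. arch_accept_memory() in mem.c above */
static void accept_one_unit(uint64_t start, uint64_t end) { (void)start; (void)end; }

static bool test_and_clear(unsigned long *bm, uint64_t bit)
{
	unsigned long mask = 1UL << (bit % (8 * sizeof(unsigned long)));
	unsigned long *word = &bm[bit / (8 * sizeof(unsigned long))];
	bool was_set = *word & mask;

	*word &= ~mask;
	return was_set;
}

/* Accept every still-unaccepted unit overlapping [start, end). */
static void accept_range_sketch(struct unaccepted_table_sketch *t,
				uint64_t start, uint64_t end)
{
	uint64_t first = (start - t->phys_base) / t->unit_size;
	uint64_t last = (end - t->phys_base + t->unit_size - 1) / t->unit_size;

	for (uint64_t bit = first; bit < last; bit++) {
		if (!test_and_clear(t->bitmap, bit))
			continue;	/* already accepted */
		accept_one_unit(t->phys_base + bit * t->unit_size,
				t->phys_base + (bit + 1) * t->unit_size);
	}
}
```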
index 014ff22..94b7abc 100644 (file)
@@ -455,6 +455,12 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 #endif
 
        debug_putstr("\nDecompressing Linux... ");
+
+       if (init_unaccepted_memory()) {
+               debug_putstr("Accepting memory... ");
+               accept_memory(__pa(output), __pa(output) + needed_size);
+       }
+
        __decompress(input_data, input_len, NULL, NULL, output, output_len,
                        NULL, error);
        entry_offset = parse_elf(output);
index 2f155a0..964fe90 100644 (file)
@@ -247,4 +247,14 @@ static inline unsigned long efi_find_vendor_table(struct boot_params *bp,
 }
 #endif /* CONFIG_EFI */
 
+#ifdef CONFIG_UNACCEPTED_MEMORY
+bool init_unaccepted_memory(void);
+#else
+static inline bool init_unaccepted_memory(void) { return false; }
+#endif
+
+/* Defined in EFI stub */
+extern struct efi_unaccepted_memory *unaccepted_table;
+void accept_memory(phys_addr_t start, phys_addr_t end);
+
 #endif /* BOOT_COMPRESSED_MISC_H */
index 014b89c..09dc8c1 100644 (file)
@@ -115,7 +115,7 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
 /* Include code for early handlers */
 #include "../../kernel/sev-shared.c"
 
-static inline bool sev_snp_enabled(void)
+bool sev_snp_enabled(void)
 {
        return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
 }
@@ -181,6 +181,58 @@ static bool early_setup_ghcb(void)
        return true;
 }
 
+static phys_addr_t __snp_accept_memory(struct snp_psc_desc *desc,
+                                      phys_addr_t pa, phys_addr_t pa_end)
+{
+       struct psc_hdr *hdr;
+       struct psc_entry *e;
+       unsigned int i;
+
+       hdr = &desc->hdr;
+       memset(hdr, 0, sizeof(*hdr));
+
+       e = desc->entries;
+
+       i = 0;
+       while (pa < pa_end && i < VMGEXIT_PSC_MAX_ENTRY) {
+               hdr->end_entry = i;
+
+               e->gfn = pa >> PAGE_SHIFT;
+               e->operation = SNP_PAGE_STATE_PRIVATE;
+               if (IS_ALIGNED(pa, PMD_SIZE) && (pa_end - pa) >= PMD_SIZE) {
+                       e->pagesize = RMP_PG_SIZE_2M;
+                       pa += PMD_SIZE;
+               } else {
+                       e->pagesize = RMP_PG_SIZE_4K;
+                       pa += PAGE_SIZE;
+               }
+
+               e++;
+               i++;
+       }
+
+       if (vmgexit_psc(boot_ghcb, desc))
+               sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
+
+       pvalidate_pages(desc);
+
+       return pa;
+}
+
+void snp_accept_memory(phys_addr_t start, phys_addr_t end)
+{
+       struct snp_psc_desc desc = {};
+       unsigned int i;
+       phys_addr_t pa;
+
+       if (!boot_ghcb && !early_setup_ghcb())
+               sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
+
+       pa = start;
+       while (pa < end)
+               pa = __snp_accept_memory(&desc, pa, end);
+}
+
 void sev_es_shutdown_ghcb(void)
 {
        if (!boot_ghcb)
diff --git a/arch/x86/boot/compressed/sev.h b/arch/x86/boot/compressed/sev.h
new file mode 100644 (file)
index 0000000..fc725a9
--- /dev/null
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * AMD SEV header for early boot related functions.
+ *
+ * Author: Tom Lendacky <thomas.lendacky@amd.com>
+ */
+
+#ifndef BOOT_COMPRESSED_SEV_H
+#define BOOT_COMPRESSED_SEV_H
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+
+bool sev_snp_enabled(void);
+void snp_accept_memory(phys_addr_t start, phys_addr_t end);
+
+#else
+
+static inline bool sev_snp_enabled(void) { return false; }
+static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
+
+#endif
+
+#endif
diff --git a/arch/x86/boot/compressed/tdx-shared.c b/arch/x86/boot/compressed/tdx-shared.c
new file mode 100644 (file)
index 0000000..5ac4376
--- /dev/null
@@ -0,0 +1,2 @@
+#include "error.h"
+#include "../../coco/tdx/tdx-shared.c"
index 2d81d3c..8841b94 100644 (file)
@@ -20,7 +20,7 @@ static inline unsigned int tdx_io_in(int size, u16 port)
 {
        struct tdx_hypercall_args args = {
                .r10 = TDX_HYPERCALL_STANDARD,
-               .r11 = EXIT_REASON_IO_INSTRUCTION,
+               .r11 = hcall_func(EXIT_REASON_IO_INSTRUCTION),
                .r12 = size,
                .r13 = 0,
                .r14 = port,
@@ -36,7 +36,7 @@ static inline void tdx_io_out(int size, u16 port, u32 value)
 {
        struct tdx_hypercall_args args = {
                .r10 = TDX_HYPERCALL_STANDARD,
-               .r11 = EXIT_REASON_IO_INSTRUCTION,
+               .r11 = hcall_func(EXIT_REASON_IO_INSTRUCTION),
                .r12 = size,
                .r13 = 1,
                .r14 = port,
index 0bbf4f3..feb6dbd 100644 (file)
@@ -14,9 +14,7 @@
  */
 
 #include "boot.h"
-#ifdef CONFIG_X86_FEATURE_NAMES
 #include "cpustr.h"
-#endif
 
 static char *cpu_name(int level)
 {
@@ -35,7 +33,6 @@ static char *cpu_name(int level)
 static void show_cap_strs(u32 *err_flags)
 {
        int i, j;
-#ifdef CONFIG_X86_FEATURE_NAMES
        const unsigned char *msg_strs = (const unsigned char *)x86_cap_strs;
        for (i = 0; i < NCAPINTS; i++) {
                u32 e = err_flags[i];
@@ -58,16 +55,6 @@ static void show_cap_strs(u32 *err_flags)
                        e >>= 1;
                }
        }
-#else
-       for (i = 0; i < NCAPINTS; i++) {
-               u32 e = err_flags[i];
-               for (j = 0; j < 32; j++) {
-                       if (e & 1)
-                               printf("%d:%d ", i, j);
-                       e >>= 1;
-               }
-       }
-#endif
 }
 
 int validate_cpu(void)
index 73f8323..eeec998 100644 (file)
 #include <asm/coco.h>
 #include <asm/processor.h>
 
-enum cc_vendor cc_vendor __ro_after_init;
+enum cc_vendor cc_vendor __ro_after_init = CC_VENDOR_NONE;
 static u64 cc_mask __ro_after_init;
 
-static bool intel_cc_platform_has(enum cc_attr attr)
+static bool noinstr intel_cc_platform_has(enum cc_attr attr)
 {
        switch (attr) {
        case CC_ATTR_GUEST_UNROLL_STRING_IO:
@@ -34,7 +34,7 @@ static bool intel_cc_platform_has(enum cc_attr attr)
  * the other levels of SME/SEV functionality, including C-bit
  * based SEV-SNP, are not enabled.
  */
-static __maybe_unused bool amd_cc_platform_vtom(enum cc_attr attr)
+static __maybe_unused __always_inline bool amd_cc_platform_vtom(enum cc_attr attr)
 {
        switch (attr) {
        case CC_ATTR_GUEST_MEM_ENCRYPT:
@@ -58,7 +58,7 @@ static __maybe_unused bool amd_cc_platform_vtom(enum cc_attr attr)
  * the trampoline area must be encrypted.
  */
 
-static bool amd_cc_platform_has(enum cc_attr attr)
+static bool noinstr amd_cc_platform_has(enum cc_attr attr)
 {
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
@@ -97,7 +97,7 @@ static bool amd_cc_platform_has(enum cc_attr attr)
 #endif
 }
 
-bool cc_platform_has(enum cc_attr attr)
+bool noinstr cc_platform_has(enum cc_attr attr)
 {
        switch (cc_vendor) {
        case CC_VENDOR_AMD:
index 46c5599..2c7dcbf 100644 (file)
@@ -1,3 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0
 
-obj-y += tdx.o tdcall.o
+obj-y += tdx.o tdx-shared.o tdcall.o
diff --git a/arch/x86/coco/tdx/tdx-shared.c b/arch/x86/coco/tdx/tdx-shared.c
new file mode 100644 (file)
index 0000000..ef20ddc
--- /dev/null
@@ -0,0 +1,71 @@
+#include <asm/tdx.h>
+#include <asm/pgtable.h>
+
+static unsigned long try_accept_one(phys_addr_t start, unsigned long len,
+                                   enum pg_level pg_level)
+{
+       unsigned long accept_size = page_level_size(pg_level);
+       u64 tdcall_rcx;
+       u8 page_size;
+
+       if (!IS_ALIGNED(start, accept_size))
+               return 0;
+
+       if (len < accept_size)
+               return 0;
+
+       /*
+        * Pass the page physical address to the TDX module to accept the
+        * pending, private page.
+        *
+        * Bits 2:0 of RCX encode page size: 0 - 4K, 1 - 2M, 2 - 1G.
+        */
+       switch (pg_level) {
+       case PG_LEVEL_4K:
+               page_size = 0;
+               break;
+       case PG_LEVEL_2M:
+               page_size = 1;
+               break;
+       case PG_LEVEL_1G:
+               page_size = 2;
+               break;
+       default:
+               return 0;
+       }
+
+       tdcall_rcx = start | page_size;
+       if (__tdx_module_call(TDX_ACCEPT_PAGE, tdcall_rcx, 0, 0, 0, NULL))
+               return 0;
+
+       return accept_size;
+}
+
+bool tdx_accept_memory(phys_addr_t start, phys_addr_t end)
+{
+       /*
+        * For shared->private conversion, accept the page using
+        * TDX_ACCEPT_PAGE TDX module call.
+        */
+       while (start < end) {
+               unsigned long len = end - start;
+               unsigned long accept_size;
+
+               /*
+                * Try larger accepts first. It gives the VMM a chance to keep
+                * 1G/2M Secure EPT entries where possible and speeds up the
+                * process by cutting the number of hypercalls (if successful).
+                */
+
+               accept_size = try_accept_one(start, len, PG_LEVEL_1G);
+               if (!accept_size)
+                       accept_size = try_accept_one(start, len, PG_LEVEL_2M);
+               if (!accept_size)
+                       accept_size = try_accept_one(start, len, PG_LEVEL_4K);
+               if (!accept_size)
+                       return false;
+               start += accept_size;
+       }
+
+       return true;
+}
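To make the "try larger accepts first" loop above concrete, here is a small user-space sketch that reproduces only the alignment/length checks of try_accept_one() and prints which chunk sizes the loop would pick for a deliberately misaligned range (the TDCALL itself is omitted; the sizes are the usual 4K/2M/1G pages):

```c
#include <stdio.h>

#define SZ_4K	0x1000UL
#define SZ_2M	0x200000UL
#define SZ_1G	0x40000000UL

static unsigned long try_size(unsigned long start, unsigned long len,
			      unsigned long accept_size)
{
	if (start & (accept_size - 1))	/* not aligned to this size */
		return 0;
	if (len < accept_size)		/* range too short for this size */
		return 0;
	return accept_size;		/* the real code issues TDX_ACCEPT_PAGE here */
}

int main(void)
{
	unsigned long start = 0x3fbfe000, end = 0x80000000;

	while (start < end) {
		unsigned long len = end - start, step;

		step = try_size(start, len, SZ_1G);
		if (!step)
			step = try_size(start, len, SZ_2M);
		if (!step)
			step = try_size(start, len, SZ_4K);
		if (!step)
			return 1;	/* the real loop reports failure */

		printf("accept %#lx + %#lx\n", start, step);
		start += step;
	}
	return 0;
}
```

For this range the output is two 4K accepts, then two 2M accepts, then a single 1G accept, which is exactly the "largest aligned chunk that fits" behaviour the comment describes.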
index e146b59..1d6b863 100644 (file)
 #include <asm/insn-eval.h>
 #include <asm/pgtable.h>
 
-/* TDX module Call Leaf IDs */
-#define TDX_GET_INFO                   1
-#define TDX_GET_VEINFO                 3
-#define TDX_GET_REPORT                 4
-#define TDX_ACCEPT_PAGE                        6
-#define TDX_WR                         8
-
-/* TDCS fields. To be used by TDG.VM.WR and TDG.VM.RD module calls */
-#define TDCS_NOTIFY_ENABLES            0x9100000000000010
-
-/* TDX hypercall Leaf IDs */
-#define TDVMCALL_MAP_GPA               0x10001
-#define TDVMCALL_REPORT_FATAL_ERROR    0x10003
-
 /* MMIO direction */
 #define EPT_READ       0
 #define EPT_WRITE      1
 
 #define TDREPORT_SUBTYPE_0     0
 
-/*
- * Wrapper for standard use of __tdx_hypercall with no output aside from
- * return code.
- */
-static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
-{
-       struct tdx_hypercall_args args = {
-               .r10 = TDX_HYPERCALL_STANDARD,
-               .r11 = fn,
-               .r12 = r12,
-               .r13 = r13,
-               .r14 = r14,
-               .r15 = r15,
-       };
-
-       return __tdx_hypercall(&args);
-}
-
 /* Called from __tdx_hypercall() for unrecoverable failure */
 noinstr void __tdx_hypercall_failed(void)
 {
@@ -76,17 +44,6 @@ noinstr void __tdx_hypercall_failed(void)
        panic("TDVMCALL failed. TDX module bug?");
 }
 
-/*
- * The TDG.VP.VMCALL-Instruction-execution sub-functions are defined
- * independently from but are currently matched 1:1 with VMX EXIT_REASONs.
- * Reusing the KVM EXIT_REASON macros makes it easier to connect the host and
- * guest sides of these calls.
- */
-static __always_inline u64 hcall_func(u64 exit_reason)
-{
-       return exit_reason;
-}
-
 #ifdef CONFIG_KVM_GUEST
 long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, unsigned long p2,
                       unsigned long p3, unsigned long p4)
@@ -745,47 +702,6 @@ static bool tdx_cache_flush_required(void)
        return true;
 }
 
-static bool try_accept_one(phys_addr_t *start, unsigned long len,
-                         enum pg_level pg_level)
-{
-       unsigned long accept_size = page_level_size(pg_level);
-       u64 tdcall_rcx;
-       u8 page_size;
-
-       if (!IS_ALIGNED(*start, accept_size))
-               return false;
-
-       if (len < accept_size)
-               return false;
-
-       /*
-        * Pass the page physical address to the TDX module to accept the
-        * pending, private page.
-        *
-        * Bits 2:0 of RCX encode page size: 0 - 4K, 1 - 2M, 2 - 1G.
-        */
-       switch (pg_level) {
-       case PG_LEVEL_4K:
-               page_size = 0;
-               break;
-       case PG_LEVEL_2M:
-               page_size = 1;
-               break;
-       case PG_LEVEL_1G:
-               page_size = 2;
-               break;
-       default:
-               return false;
-       }
-
-       tdcall_rcx = *start | page_size;
-       if (__tdx_module_call(TDX_ACCEPT_PAGE, tdcall_rcx, 0, 0, 0, NULL))
-               return false;
-
-       *start += accept_size;
-       return true;
-}
-
 /*
  * Inform the VMM of the guest's intent for this physical page: shared with
  * the VMM or private to the guest.  The VMM is expected to change its mapping
@@ -810,33 +726,34 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
        if (_tdx_hypercall(TDVMCALL_MAP_GPA, start, end - start, 0, 0))
                return false;
 
-       /* private->shared conversion  requires only MapGPA call */
-       if (!enc)
-               return true;
+       /* shared->private conversion requires memory to be accepted before use */
+       if (enc)
+               return tdx_accept_memory(start, end);
+
+       return true;
+}
 
+static bool tdx_enc_status_change_prepare(unsigned long vaddr, int numpages,
+                                         bool enc)
+{
        /*
-        * For shared->private conversion, accept the page using
-        * TDX_ACCEPT_PAGE TDX module call.
+        * Only handle shared->private conversion here.
+        * See the comment in tdx_early_init().
         */
-       while (start < end) {
-               unsigned long len = end - start;
-
-               /*
-                * Try larger accepts first. It gives chance to VMM to keep
-                * 1G/2M SEPT entries where possible and speeds up process by
-                * cutting number of hypercalls (if successful).
-                */
-
-               if (try_accept_one(&start, len, PG_LEVEL_1G))
-                       continue;
-
-               if (try_accept_one(&start, len, PG_LEVEL_2M))
-                       continue;
-
-               if (!try_accept_one(&start, len, PG_LEVEL_4K))
-                       return false;
-       }
+       if (enc)
+               return tdx_enc_status_changed(vaddr, numpages, enc);
+       return true;
+}
 
+static bool tdx_enc_status_change_finish(unsigned long vaddr, int numpages,
+                                        bool enc)
+{
+       /*
+        * Only handle private->shared conversion here.
+        * See the comment in tdx_early_init().
+        */
+       if (!enc)
+               return tdx_enc_status_changed(vaddr, numpages, enc);
        return true;
 }
 
@@ -852,7 +769,7 @@ void __init tdx_early_init(void)
 
        setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
 
-       cc_set_vendor(CC_VENDOR_INTEL);
+       cc_vendor = CC_VENDOR_INTEL;
        tdx_parse_tdinfo(&cc_mask);
        cc_set_mask(cc_mask);
 
@@ -867,9 +784,41 @@ void __init tdx_early_init(void)
         */
        physical_mask &= cc_mask - 1;
 
-       x86_platform.guest.enc_cache_flush_required = tdx_cache_flush_required;
-       x86_platform.guest.enc_tlb_flush_required   = tdx_tlb_flush_required;
-       x86_platform.guest.enc_status_change_finish = tdx_enc_status_changed;
+       /*
+        * The kernel mapping should match the TDX metadata for the page.
+        * load_unaligned_zeropad() can touch memory *adjacent* to that which is
+        * owned by the caller and can catch even _momentary_ mismatches.  Bad
+        * things happen on mismatch:
+        *
+        *   - Private mapping => Shared Page  == Guest shutdown
+        *   - Shared mapping  => Private Page == Recoverable #VE
+        *
+        * guest.enc_status_change_prepare() converts the page from
+        * shared=>private before the mapping becomes private.
+        *
+        * guest.enc_status_change_finish() converts the page from
+        * private=>shared after the mapping becomes shared.
+        *
+        * In both cases there is a temporary shared mapping to a private page,
+        * which can result in a #VE.  But, there is never a private mapping to
+        * a shared page.
+        */
+       x86_platform.guest.enc_status_change_prepare = tdx_enc_status_change_prepare;
+       x86_platform.guest.enc_status_change_finish  = tdx_enc_status_change_finish;
+
+       x86_platform.guest.enc_cache_flush_required  = tdx_cache_flush_required;
+       x86_platform.guest.enc_tlb_flush_required    = tdx_tlb_flush_required;
+
+       /*
+        * TDX intercepts the RDMSR to read the X2APIC ID in the parallel
+        * bringup low level code. That raises #VE which cannot be handled
+        * there.
+        *
+        * Intel-TDX has a secure RDMSR hypercall, but that needs to be
+        * implemented separately in the low level startup ASM code.
+        * Until that is in place, disable parallel bringup for TDX.
+        */
+       x86_cpuinit.parallel_bringup = false;
 
        pr_info("Guest detected\n");
 }
index 7c1abc5..9556dac 100644 (file)
        .octa 0x3F893781E95FE1576CDA64D2BA0CB204
 
 #ifdef CONFIG_AS_GFNI
-.section       .rodata.cst8, "aM", @progbits, 8
-.align 8
 /* AES affine: */
 #define tf_aff_const BV8(1, 1, 0, 0, 0, 1, 1, 0)
 .Ltf_aff_bitmatrix:
index 5e37f41..27b5da2 100644 (file)
@@ -26,17 +26,7 @@ SYM_FUNC_START(\name)
        pushq %r11
 
        call \func
-       jmp  __thunk_restore
-SYM_FUNC_END(\name)
-       _ASM_NOKPROBE(\name)
-       .endm
-
-       THUNK preempt_schedule_thunk, preempt_schedule
-       THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
-       EXPORT_SYMBOL(preempt_schedule_thunk)
-       EXPORT_SYMBOL(preempt_schedule_notrace_thunk)
 
-SYM_CODE_START_LOCAL(__thunk_restore)
        popq %r11
        popq %r10
        popq %r9
@@ -48,5 +38,11 @@ SYM_CODE_START_LOCAL(__thunk_restore)
        popq %rdi
        popq %rbp
        RET
-       _ASM_NOKPROBE(__thunk_restore)
-SYM_CODE_END(__thunk_restore)
+SYM_FUNC_END(\name)
+       _ASM_NOKPROBE(\name)
+       .endm
+
+THUNK preempt_schedule_thunk, preempt_schedule
+THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
+EXPORT_SYMBOL(preempt_schedule_thunk)
+EXPORT_SYMBOL(preempt_schedule_notrace_thunk)
index 0a9007c..e464030 100644 (file)
@@ -8,6 +8,7 @@
 #include <linux/kernel.h>
 #include <linux/getcpu.h>
 #include <asm/segment.h>
+#include <vdso/processor.h>
 
 notrace long
 __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
index bccea57..abadd5f 100644 (file)
@@ -374,7 +374,7 @@ static int amd_pmu_hw_config(struct perf_event *event)
 
        /* pass precise event sampling to ibs: */
        if (event->attr.precise_ip && get_ibs_caps())
-               return -ENOENT;
+               return forward_event_to_ibs(event);
 
        if (has_branch_stack(event) && !x86_pmu.lbr_nr)
                return -EOPNOTSUPP;
index 6458295..3710148 100644 (file)
@@ -190,7 +190,7 @@ static struct perf_ibs *get_ibs_pmu(int type)
 }
 
 /*
- * Use IBS for precise event sampling:
+ * core pmu config -> IBS config
  *
  *  perf record -a -e cpu-cycles:p ...    # use ibs op counting cycle count
  *  perf record -a -e r076:p ...          # same as -e cpu-cycles:p
@@ -199,25 +199,9 @@ static struct perf_ibs *get_ibs_pmu(int type)
  * IbsOpCntCtl (bit 19) of IBS Execution Control Register (IbsOpCtl,
  * MSRC001_1033) is used to select either cycle or micro-ops counting
  * mode.
- *
- * The rip of IBS samples has skid 0. Thus, IBS supports precise
- * levels 1 and 2 and the PERF_EFLAGS_EXACT is set. In rare cases the
- * rip is invalid when IBS was not able to record the rip correctly.
- * We clear PERF_EFLAGS_EXACT and take the rip from pt_regs then.
- *
  */
-static int perf_ibs_precise_event(struct perf_event *event, u64 *config)
+static int core_pmu_ibs_config(struct perf_event *event, u64 *config)
 {
-       switch (event->attr.precise_ip) {
-       case 0:
-               return -ENOENT;
-       case 1:
-       case 2:
-               break;
-       default:
-               return -EOPNOTSUPP;
-       }
-
        switch (event->attr.type) {
        case PERF_TYPE_HARDWARE:
                switch (event->attr.config) {
@@ -243,22 +227,37 @@ static int perf_ibs_precise_event(struct perf_event *event, u64 *config)
        return -EOPNOTSUPP;
 }
 
+/*
+ * The rip of IBS samples has skid 0. Thus, IBS supports precise
+ * levels 1 and 2 and the PERF_EFLAGS_EXACT is set. In rare cases the
+ * rip is invalid when IBS was not able to record the rip correctly.
+ * We clear PERF_EFLAGS_EXACT and take the rip from pt_regs then.
+ */
+int forward_event_to_ibs(struct perf_event *event)
+{
+       u64 config = 0;
+
+       if (!event->attr.precise_ip || event->attr.precise_ip > 2)
+               return -EOPNOTSUPP;
+
+       if (!core_pmu_ibs_config(event, &config)) {
+               event->attr.type = perf_ibs_op.pmu.type;
+               event->attr.config = config;
+       }
+       return -ENOENT;
+}
+
 static int perf_ibs_init(struct perf_event *event)
 {
        struct hw_perf_event *hwc = &event->hw;
        struct perf_ibs *perf_ibs;
        u64 max_cnt, config;
-       int ret;
 
        perf_ibs = get_ibs_pmu(event->attr.type);
-       if (perf_ibs) {
-               config = event->attr.config;
-       } else {
-               perf_ibs = &perf_ibs_op;
-               ret = perf_ibs_precise_event(event, &config);
-               if (ret)
-                       return ret;
-       }
+       if (!perf_ibs)
+               return -ENOENT;
+
+       config = event->attr.config;
 
        if (event->pmu != &perf_ibs->pmu)
                return -ENOENT;
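
The hunks above work because the perf core treats an -ENOENT return from a PMU's event_init() as "not mine, try another PMU": forward_event_to_ibs() rewrites event->attr.type/config to point at the IBS op PMU and then still returns -ENOENT so the event gets re-initialized there. A minimal userspace sketch of that retry pattern, with hypothetical names:

/* Hypothetical stand-ins for the perf-core dispatch; not kernel code. */
#include <errno.h>
#include <stdio.h>

struct fake_event { int type; unsigned long config; };

static int core_pmu_init(struct fake_event *e)
{
	if (e->type != 0)
		return -ENOENT;
	/* "forward" the event: rewrite type/config, then ask for a retry */
	e->type = 1;
	e->config = 0x1234;
	return -ENOENT;
}

static int ibs_pmu_init(struct fake_event *e)
{
	return e->type == 1 ? 0 : -ENOENT;
}

int main(void)
{
	int (*pmus[])(struct fake_event *) = { core_pmu_init, ibs_pmu_init };
	struct fake_event ev = { .type = 0 };
	int err = -ENOENT;

	/* keep retrying PMUs while they answer "not mine" */
	for (unsigned int i = 0; i < 2 && err == -ENOENT; i++)
		err = pmus[i](&ev);

	printf("settled on type %d (err=%d)\n", ev.type, err);
	return 0;
}
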
index d096b04..9d24870 100644 (file)
@@ -1703,10 +1703,8 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
 
                perf_sample_data_init(&data, 0, event->hw.last_period);
 
-               if (has_branch_stack(event)) {
-                       data.br_stack = &cpuc->lbr_stack;
-                       data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
-               }
+               if (has_branch_stack(event))
+                       perf_sample_save_brstack(&data, event, &cpuc->lbr_stack);
 
                if (perf_event_overflow(event, &data, regs))
                        x86_pmu_stop(event, 0);
index 070cc4e..a149faf 100644 (file)
@@ -349,6 +349,16 @@ static struct event_constraint intel_spr_event_constraints[] = {
        EVENT_CONSTRAINT_END
 };
 
+static struct extra_reg intel_gnr_extra_regs[] __read_mostly = {
+       INTEL_UEVENT_EXTRA_REG(0x012a, MSR_OFFCORE_RSP_0, 0x3fffffffffull, RSP_0),
+       INTEL_UEVENT_EXTRA_REG(0x012b, MSR_OFFCORE_RSP_1, 0x3fffffffffull, RSP_1),
+       INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x01cd),
+       INTEL_UEVENT_EXTRA_REG(0x02c6, MSR_PEBS_FRONTEND, 0x9, FE),
+       INTEL_UEVENT_EXTRA_REG(0x03c6, MSR_PEBS_FRONTEND, 0x7fff1f, FE),
+       INTEL_UEVENT_EXTRA_REG(0x40ad, MSR_PEBS_FRONTEND, 0x7, FE),
+       INTEL_UEVENT_EXTRA_REG(0x04c2, MSR_PEBS_FRONTEND, 0x8, FE),
+       EVENT_EXTRA_END
+};
 
 EVENT_ATTR_STR(mem-loads,      mem_ld_nhm,     "event=0x0b,umask=0x10,ldlat=3");
 EVENT_ATTR_STR(mem-loads,      mem_ld_snb,     "event=0xcd,umask=0x1,ldlat=3");
@@ -2451,7 +2461,7 @@ static void intel_pmu_disable_fixed(struct perf_event *event)
 
        intel_clear_masks(event, idx);
 
-       mask = 0xfULL << ((idx - INTEL_PMC_IDX_FIXED) * 4);
+       mask = intel_fixed_bits_by_idx(idx - INTEL_PMC_IDX_FIXED, INTEL_FIXED_BITS_MASK);
        cpuc->fixed_ctrl_val &= ~mask;
 }
 
@@ -2750,25 +2760,25 @@ static void intel_pmu_enable_fixed(struct perf_event *event)
         * if requested:
         */
        if (!event->attr.precise_ip)
-               bits |= 0x8;
+               bits |= INTEL_FIXED_0_ENABLE_PMI;
        if (hwc->config & ARCH_PERFMON_EVENTSEL_USR)
-               bits |= 0x2;
+               bits |= INTEL_FIXED_0_USER;
        if (hwc->config & ARCH_PERFMON_EVENTSEL_OS)
-               bits |= 0x1;
+               bits |= INTEL_FIXED_0_KERNEL;
 
        /*
         * ANY bit is supported in v3 and up
         */
        if (x86_pmu.version > 2 && hwc->config & ARCH_PERFMON_EVENTSEL_ANY)
-               bits |= 0x4;
+               bits |= INTEL_FIXED_0_ANYTHREAD;
 
        idx -= INTEL_PMC_IDX_FIXED;
-       bits <<= (idx * 4);
-       mask = 0xfULL << (idx * 4);
+       bits = intel_fixed_bits_by_idx(idx, bits);
+       mask = intel_fixed_bits_by_idx(idx, INTEL_FIXED_BITS_MASK);
 
        if (x86_pmu.intel_cap.pebs_baseline && event->attr.precise_ip) {
-               bits |= ICL_FIXED_0_ADAPTIVE << (idx * 4);
-               mask |= ICL_FIXED_0_ADAPTIVE << (idx * 4);
+               bits |= intel_fixed_bits_by_idx(idx, ICL_FIXED_0_ADAPTIVE);
+               mask |= intel_fixed_bits_by_idx(idx, ICL_FIXED_0_ADAPTIVE);
        }
 
        cpuc->fixed_ctrl_val &= ~mask;
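
MSR_IA32_FIXED_CTR_CTRL carries one 4-bit control field per fixed counter, and intel_fixed_bits_by_idx() simply shifts a field's bits into the slot for a given counter index, replacing the open-coded << (idx * 4) arithmetic. A quick standalone check using the constants added later in this series in include/asm/perf_event.h:

#include <stdio.h>
#include <stdint.h>

#define INTEL_FIXED_BITS_MASK		0xFULL
#define INTEL_FIXED_BITS_STRIDE		4
#define INTEL_FIXED_0_KERNEL		(1ULL << 0)
#define INTEL_FIXED_0_USER		(1ULL << 1)
#define INTEL_FIXED_0_ENABLE_PMI	(1ULL << 3)

#define intel_fixed_bits_by_idx(_idx, _bits) \
	((_bits) << ((_idx) * INTEL_FIXED_BITS_STRIDE))

int main(void)
{
	int idx = 1;	/* fixed counter 1 */
	uint64_t bits = INTEL_FIXED_0_KERNEL | INTEL_FIXED_0_USER |
			INTEL_FIXED_0_ENABLE_PMI;

	printf("bits=%#llx mask=%#llx\n",
	       (unsigned long long)intel_fixed_bits_by_idx(idx, bits),
	       (unsigned long long)intel_fixed_bits_by_idx(idx, INTEL_FIXED_BITS_MASK));
	return 0;	/* prints bits=0xb0 mask=0xf0 */
}
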
@@ -4074,7 +4084,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
        if (x86_pmu.intel_cap.pebs_baseline) {
                arr[(*nr)++] = (struct perf_guest_switch_msr){
                        .msr = MSR_PEBS_DATA_CFG,
-                       .host = cpuc->pebs_data_cfg,
+                       .host = cpuc->active_pebs_data_cfg,
                        .guest = kvm_pmu->pebs_data_cfg,
                };
        }
@@ -6496,6 +6506,7 @@ __init int intel_pmu_init(void)
        case INTEL_FAM6_SAPPHIRERAPIDS_X:
        case INTEL_FAM6_EMERALDRAPIDS_X:
                x86_pmu.flags |= PMU_FL_MEM_LOADS_AUX;
+               x86_pmu.extra_regs = intel_spr_extra_regs;
                fallthrough;
        case INTEL_FAM6_GRANITERAPIDS_X:
        case INTEL_FAM6_GRANITERAPIDS_D:
@@ -6506,7 +6517,8 @@ __init int intel_pmu_init(void)
 
                x86_pmu.event_constraints = intel_spr_event_constraints;
                x86_pmu.pebs_constraints = intel_spr_pebs_event_constraints;
-               x86_pmu.extra_regs = intel_spr_extra_regs;
+               if (!x86_pmu.extra_regs)
+                       x86_pmu.extra_regs = intel_gnr_extra_regs;
                x86_pmu.limit_period = spr_limit_period;
                x86_pmu.pebs_ept = 1;
                x86_pmu.pebs_aliases = NULL;
@@ -6650,6 +6662,7 @@ __init int intel_pmu_init(void)
                pmu->pebs_constraints = intel_grt_pebs_event_constraints;
                pmu->extra_regs = intel_grt_extra_regs;
                if (is_mtl(boot_cpu_data.x86_model)) {
+                       x86_pmu.hybrid_pmu[X86_HYBRID_PMU_CORE_IDX].extra_regs = intel_gnr_extra_regs;
                        x86_pmu.pebs_latency_data = mtl_latency_data_small;
                        extra_attr = boot_cpu_has(X86_FEATURE_RTM) ?
                                mtl_hybrid_extra_attr_rtm : mtl_hybrid_extra_attr;
index a2e566e..df88576 100644 (file)
@@ -1229,12 +1229,14 @@ pebs_update_state(bool needed_cb, struct cpu_hw_events *cpuc,
                  struct perf_event *event, bool add)
 {
        struct pmu *pmu = event->pmu;
+
        /*
         * Make sure we get updated with the first PEBS
         * event. It will trigger also during removal, but
         * that does not hurt:
         */
-       bool update = cpuc->n_pebs == 1;
+       if (cpuc->n_pebs == 1)
+               cpuc->pebs_data_cfg = PEBS_UPDATE_DS_SW;
 
        if (needed_cb != pebs_needs_sched_cb(cpuc)) {
                if (!needed_cb)
@@ -1242,7 +1244,7 @@ pebs_update_state(bool needed_cb, struct cpu_hw_events *cpuc,
                else
                        perf_sched_cb_dec(pmu);
 
-               update = true;
+               cpuc->pebs_data_cfg |= PEBS_UPDATE_DS_SW;
        }
 
        /*
@@ -1252,24 +1254,13 @@ pebs_update_state(bool needed_cb, struct cpu_hw_events *cpuc,
        if (x86_pmu.intel_cap.pebs_baseline && add) {
                u64 pebs_data_cfg;
 
-               /* Clear pebs_data_cfg and pebs_record_size for first PEBS. */
-               if (cpuc->n_pebs == 1) {
-                       cpuc->pebs_data_cfg = 0;
-                       cpuc->pebs_record_size = sizeof(struct pebs_basic);
-               }
-
                pebs_data_cfg = pebs_update_adaptive_cfg(event);
-
-               /* Update pebs_record_size if new event requires more data. */
-               if (pebs_data_cfg & ~cpuc->pebs_data_cfg) {
-                       cpuc->pebs_data_cfg |= pebs_data_cfg;
-                       adaptive_pebs_record_size_update();
-                       update = true;
-               }
+               /*
+                * Be sure to update the thresholds when we change the record.
+                */
+               if (pebs_data_cfg & ~cpuc->pebs_data_cfg)
+                       cpuc->pebs_data_cfg |= pebs_data_cfg | PEBS_UPDATE_DS_SW;
        }
-
-       if (update)
-               pebs_update_threshold(cpuc);
 }
 
 void intel_pmu_pebs_add(struct perf_event *event)
@@ -1326,9 +1317,17 @@ static void intel_pmu_pebs_via_pt_enable(struct perf_event *event)
        wrmsrl(base + idx, value);
 }
 
+static inline void intel_pmu_drain_large_pebs(struct cpu_hw_events *cpuc)
+{
+       if (cpuc->n_pebs == cpuc->n_large_pebs &&
+           cpuc->n_pebs != cpuc->n_pebs_via_pt)
+               intel_pmu_drain_pebs_buffer();
+}
+
 void intel_pmu_pebs_enable(struct perf_event *event)
 {
        struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+       u64 pebs_data_cfg = cpuc->pebs_data_cfg & ~PEBS_UPDATE_DS_SW;
        struct hw_perf_event *hwc = &event->hw;
        struct debug_store *ds = cpuc->ds;
        unsigned int idx = hwc->idx;
@@ -1344,11 +1343,22 @@ void intel_pmu_pebs_enable(struct perf_event *event)
 
        if (x86_pmu.intel_cap.pebs_baseline) {
                hwc->config |= ICL_EVENTSEL_ADAPTIVE;
-               if (cpuc->pebs_data_cfg != cpuc->active_pebs_data_cfg) {
-                       wrmsrl(MSR_PEBS_DATA_CFG, cpuc->pebs_data_cfg);
-                       cpuc->active_pebs_data_cfg = cpuc->pebs_data_cfg;
+               if (pebs_data_cfg != cpuc->active_pebs_data_cfg) {
+                       /*
+                        * drain_pebs() assumes uniform record size;
+                        * hence we need to drain when changing said
+                        * size.
+                        */
+                       intel_pmu_drain_large_pebs(cpuc);
+                       adaptive_pebs_record_size_update();
+                       wrmsrl(MSR_PEBS_DATA_CFG, pebs_data_cfg);
+                       cpuc->active_pebs_data_cfg = pebs_data_cfg;
                }
        }
+       if (cpuc->pebs_data_cfg & PEBS_UPDATE_DS_SW) {
+               cpuc->pebs_data_cfg = pebs_data_cfg;
+               pebs_update_threshold(cpuc);
+       }
 
        if (idx >= INTEL_PMC_IDX_FIXED) {
                if (x86_pmu.intel_cap.pebs_format < 5)
@@ -1391,9 +1401,7 @@ void intel_pmu_pebs_disable(struct perf_event *event)
        struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
        struct hw_perf_event *hwc = &event->hw;
 
-       if (cpuc->n_pebs == cpuc->n_large_pebs &&
-           cpuc->n_pebs != cpuc->n_pebs_via_pt)
-               intel_pmu_drain_pebs_buffer();
+       intel_pmu_drain_large_pebs(cpuc);
 
        cpuc->pebs_enabled &= ~(1ULL << hwc->idx);
 
index fa9b209..d49e90d 100644 (file)
@@ -6150,6 +6150,7 @@ static struct intel_uncore_type spr_uncore_mdf = {
 };
 
 #define UNCORE_SPR_NUM_UNCORE_TYPES            12
+#define UNCORE_SPR_CHA                         0
 #define UNCORE_SPR_IIO                         1
 #define UNCORE_SPR_IMC                         6
 #define UNCORE_SPR_UPI                         8
@@ -6460,12 +6461,22 @@ static int uncore_type_max_boxes(struct intel_uncore_type **types,
        return max + 1;
 }
 
+#define SPR_MSR_UNC_CBO_CONFIG         0x2FFE
+
 void spr_uncore_cpu_init(void)
 {
+       struct intel_uncore_type *type;
+       u64 num_cbo;
+
        uncore_msr_uncores = uncore_get_uncores(UNCORE_ACCESS_MSR,
                                                UNCORE_SPR_MSR_EXTRA_UNCORES,
                                                spr_msr_uncores);
 
+       type = uncore_find_type_by_id(uncore_msr_uncores, UNCORE_SPR_CHA);
+       if (type) {
+               rdmsrl(SPR_MSR_UNC_CBO_CONFIG, num_cbo);
+               type->num_boxes = num_cbo;
+       }
        spr_uncore_iio_free_running.num_boxes = uncore_type_max_boxes(uncore_msr_uncores, UNCORE_SPR_IIO);
 }
 
index a5f9474..6c04b52 100644 (file)
@@ -416,7 +416,7 @@ void __init hyperv_init(void)
                        goto free_vp_assist_page;
        }
 
-       cpuhp = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
+       cpuhp = cpuhp_setup_state(CPUHP_AP_HYPERV_ONLINE, "x86/hyperv_init:online",
                                  hv_cpu_init, hv_cpu_die);
        if (cpuhp < 0)
                goto free_ghcb_page;
index 1ba5d3b..85d38b9 100644 (file)
@@ -20,6 +20,8 @@ void __init hv_vtl_init_platform(void)
 {
        pr_info("Linux runs in Hyper-V Virtual Trust Level\n");
 
+       x86_platform.realmode_reserve = x86_init_noop;
+       x86_platform.realmode_init = x86_init_noop;
        x86_init.irqs.pre_vector_init = x86_init_noop;
        x86_init.timers.timer_init = x86_init_noop;
 
index cc92388..14f46ad 100644 (file)
@@ -17,6 +17,7 @@
 #include <asm/mem_encrypt.h>
 #include <asm/mshyperv.h>
 #include <asm/hypervisor.h>
+#include <asm/mtrr.h>
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
@@ -364,7 +365,7 @@ void __init hv_vtom_init(void)
         * Set it here to indicate a vTOM VM.
         */
        sev_status = MSR_AMD64_SNP_VTOM;
-       cc_set_vendor(CC_VENDOR_AMD);
+       cc_vendor = CC_VENDOR_AMD;
        cc_set_mask(ms_hyperv.shared_gpa_boundary);
        physical_mask &= ms_hyperv.shared_gpa_boundary - 1;
 
@@ -372,6 +373,9 @@ void __init hv_vtom_init(void)
        x86_platform.guest.enc_cache_flush_required = hv_vtom_cache_flush_required;
        x86_platform.guest.enc_tlb_flush_required = hv_vtom_tlb_flush_required;
        x86_platform.guest.enc_status_change_finish = hv_vtom_set_host_visibility;
+
+       /* Set WB as the default cache mode. */
+       mtrr_overwrite_state(NULL, 0, MTRR_TYPE_WRBACK);
 }
 
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
index 1e51650..4f1ce5f 100644 (file)
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 
 
+generated-y += orc_hash.h
 generated-y += syscalls_32.h
 generated-y += syscalls_64.h
 generated-y += syscalls_x32.h
index d7da28f..6c15a62 100644 (file)
@@ -113,7 +113,6 @@ extern void callthunks_patch_builtin_calls(void);
 extern void callthunks_patch_module_calls(struct callthunk_sites *sites,
                                          struct module *mod);
 extern void *callthunks_translate_call_dest(void *dest);
-extern bool is_callthunk(void *addr);
 extern int x86_call_depth_emit_accounting(u8 **pprog, void *func);
 #else
 static __always_inline void callthunks_patch_builtin_calls(void) {}
@@ -124,10 +123,6 @@ static __always_inline void *callthunks_translate_call_dest(void *dest)
 {
        return dest;
 }
-static __always_inline bool is_callthunk(void *addr)
-{
-       return false;
-}
 static __always_inline int x86_call_depth_emit_accounting(u8 **pprog,
                                                          void *func)
 {
index 3216da7..98c32aa 100644 (file)
@@ -55,6 +55,8 @@ extern int local_apic_timer_c2_ok;
 extern int disable_apic;
 extern unsigned int lapic_timer_period;
 
+extern int cpuid_to_apicid[];
+
 extern enum apic_intr_mode_id apic_intr_mode;
 enum apic_intr_mode_id {
        APIC_PIC,
@@ -377,7 +379,6 @@ extern struct apic *__apicdrivers[], *__apicdrivers_end[];
  * APIC functionality to boot other CPUs - only used on SMP:
  */
 #ifdef CONFIG_SMP
-extern int wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip);
 extern int lapic_can_unplug_cpu(void);
 #endif
 
@@ -507,10 +508,8 @@ extern int default_check_phys_apicid_present(int phys_apicid);
 #endif /* CONFIG_X86_LOCAL_APIC */
 
 #ifdef CONFIG_SMP
-bool apic_id_is_primary_thread(unsigned int id);
 void apic_smt_update(void);
 #else
-static inline bool apic_id_is_primary_thread(unsigned int id) { return false; }
 static inline void apic_smt_update(void) { }
 #endif
 
index 68d213e..4b125e5 100644 (file)
@@ -2,6 +2,8 @@
 #ifndef _ASM_X86_APICDEF_H
 #define _ASM_X86_APICDEF_H
 
+#include <linux/bits.h>
+
 /*
  * Constants for various Intel APICs. (local APIC, IOAPIC, etc.)
  *
 #define                APIC_EILVT_MASKED       (1 << 16)
 
 #define APIC_BASE (fix_to_virt(FIX_APIC_BASE))
-#define APIC_BASE_MSR  0x800
-#define XAPIC_ENABLE   (1UL << 11)
-#define X2APIC_ENABLE  (1UL << 10)
+#define APIC_BASE_MSR          0x800
+#define APIC_X2APIC_ID_MSR     0x802
+#define XAPIC_ENABLE           BIT(11)
+#define X2APIC_ENABLE          BIT(10)
 
 #ifdef CONFIG_X86_32
 # define MAX_IO_APICS 64
 #define APIC_CPUID(apicid)     ((apicid) & XAPIC_DEST_CPUS_MASK)
 #define NUM_APIC_CLUSTERS      ((BAD_APICID + 1) >> XAPIC_DEST_CPUS_SHIFT)
 
+#ifndef __ASSEMBLY__
 /*
  * the local APIC register structure, memory mapped. Not terribly well
  * tested, but we might eventually use this one in the future - the
@@ -435,4 +439,5 @@ enum apic_delivery_modes {
        APIC_DELIVERY_MODE_EXTINT       = 7,
 };
 
+#endif /* !__ASSEMBLY__ */
 #endif /* _ASM_X86_APICDEF_H */
index 5e754e8..55a55ec 100644 (file)
  * resource counting etc..
  */
 
-/**
- * arch_atomic_read - read atomic variable
- * @v: pointer of type atomic_t
- *
- * Atomically reads the value of @v.
- */
 static __always_inline int arch_atomic_read(const atomic_t *v)
 {
        /*
@@ -29,25 +23,11 @@ static __always_inline int arch_atomic_read(const atomic_t *v)
        return __READ_ONCE((v)->counter);
 }
 
-/**
- * arch_atomic_set - set atomic variable
- * @v: pointer of type atomic_t
- * @i: required value
- *
- * Atomically sets the value of @v to @i.
- */
 static __always_inline void arch_atomic_set(atomic_t *v, int i)
 {
        __WRITE_ONCE(v->counter, i);
 }
 
-/**
- * arch_atomic_add - add integer to atomic variable
- * @i: integer value to add
- * @v: pointer of type atomic_t
- *
- * Atomically adds @i to @v.
- */
 static __always_inline void arch_atomic_add(int i, atomic_t *v)
 {
        asm volatile(LOCK_PREFIX "addl %1,%0"
@@ -55,13 +35,6 @@ static __always_inline void arch_atomic_add(int i, atomic_t *v)
                     : "ir" (i) : "memory");
 }
 
-/**
- * arch_atomic_sub - subtract integer from atomic variable
- * @i: integer value to subtract
- * @v: pointer of type atomic_t
- *
- * Atomically subtracts @i from @v.
- */
 static __always_inline void arch_atomic_sub(int i, atomic_t *v)
 {
        asm volatile(LOCK_PREFIX "subl %1,%0"
@@ -69,27 +42,12 @@ static __always_inline void arch_atomic_sub(int i, atomic_t *v)
                     : "ir" (i) : "memory");
 }
 
-/**
- * arch_atomic_sub_and_test - subtract value from variable and test result
- * @i: integer value to subtract
- * @v: pointer of type atomic_t
- *
- * Atomically subtracts @i from @v and returns
- * true if the result is zero, or false for all
- * other cases.
- */
 static __always_inline bool arch_atomic_sub_and_test(int i, atomic_t *v)
 {
        return GEN_BINARY_RMWcc(LOCK_PREFIX "subl", v->counter, e, "er", i);
 }
 #define arch_atomic_sub_and_test arch_atomic_sub_and_test
 
-/**
- * arch_atomic_inc - increment atomic variable
- * @v: pointer of type atomic_t
- *
- * Atomically increments @v by 1.
- */
 static __always_inline void arch_atomic_inc(atomic_t *v)
 {
        asm volatile(LOCK_PREFIX "incl %0"
@@ -97,12 +55,6 @@ static __always_inline void arch_atomic_inc(atomic_t *v)
 }
 #define arch_atomic_inc arch_atomic_inc
 
-/**
- * arch_atomic_dec - decrement atomic variable
- * @v: pointer of type atomic_t
- *
- * Atomically decrements @v by 1.
- */
 static __always_inline void arch_atomic_dec(atomic_t *v)
 {
        asm volatile(LOCK_PREFIX "decl %0"
@@ -110,69 +62,30 @@ static __always_inline void arch_atomic_dec(atomic_t *v)
 }
 #define arch_atomic_dec arch_atomic_dec
 
-/**
- * arch_atomic_dec_and_test - decrement and test
- * @v: pointer of type atomic_t
- *
- * Atomically decrements @v by 1 and
- * returns true if the result is 0, or false for all other
- * cases.
- */
 static __always_inline bool arch_atomic_dec_and_test(atomic_t *v)
 {
        return GEN_UNARY_RMWcc(LOCK_PREFIX "decl", v->counter, e);
 }
 #define arch_atomic_dec_and_test arch_atomic_dec_and_test
 
-/**
- * arch_atomic_inc_and_test - increment and test
- * @v: pointer of type atomic_t
- *
- * Atomically increments @v by 1
- * and returns true if the result is zero, or false for all
- * other cases.
- */
 static __always_inline bool arch_atomic_inc_and_test(atomic_t *v)
 {
        return GEN_UNARY_RMWcc(LOCK_PREFIX "incl", v->counter, e);
 }
 #define arch_atomic_inc_and_test arch_atomic_inc_and_test
 
-/**
- * arch_atomic_add_negative - add and test if negative
- * @i: integer value to add
- * @v: pointer of type atomic_t
- *
- * Atomically adds @i to @v and returns true
- * if the result is negative, or false when
- * result is greater than or equal to zero.
- */
 static __always_inline bool arch_atomic_add_negative(int i, atomic_t *v)
 {
        return GEN_BINARY_RMWcc(LOCK_PREFIX "addl", v->counter, s, "er", i);
 }
 #define arch_atomic_add_negative arch_atomic_add_negative
 
-/**
- * arch_atomic_add_return - add integer and return
- * @i: integer value to add
- * @v: pointer of type atomic_t
- *
- * Atomically adds @i to @v and returns @i + @v
- */
 static __always_inline int arch_atomic_add_return(int i, atomic_t *v)
 {
        return i + xadd(&v->counter, i);
 }
 #define arch_atomic_add_return arch_atomic_add_return
 
-/**
- * arch_atomic_sub_return - subtract integer and return
- * @v: pointer of type atomic_t
- * @i: integer value to subtract
- *
- * Atomically subtracts @i from @v and returns @v - @i
- */
 static __always_inline int arch_atomic_sub_return(int i, atomic_t *v)
 {
        return arch_atomic_add_return(-i, v);
index 808b4ee..3486d91 100644 (file)
@@ -61,30 +61,12 @@ ATOMIC64_DECL(add_unless);
 #undef __ATOMIC64_DECL
 #undef ATOMIC64_EXPORT
 
-/**
- * arch_atomic64_cmpxchg - cmpxchg atomic64 variable
- * @v: pointer to type atomic64_t
- * @o: expected value
- * @n: new value
- *
- * Atomically sets @v to @n if it was equal to @o and returns
- * the old value.
- */
-
 static __always_inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 o, s64 n)
 {
        return arch_cmpxchg64(&v->counter, o, n);
 }
 #define arch_atomic64_cmpxchg arch_atomic64_cmpxchg
 
-/**
- * arch_atomic64_xchg - xchg atomic64 variable
- * @v: pointer to type atomic64_t
- * @n: value to assign
- *
- * Atomically xchgs the value of @v to @n and returns
- * the old value.
- */
 static __always_inline s64 arch_atomic64_xchg(atomic64_t *v, s64 n)
 {
        s64 o;
@@ -97,13 +79,6 @@ static __always_inline s64 arch_atomic64_xchg(atomic64_t *v, s64 n)
 }
 #define arch_atomic64_xchg arch_atomic64_xchg
 
-/**
- * arch_atomic64_set - set atomic64 variable
- * @v: pointer to type atomic64_t
- * @i: value to assign
- *
- * Atomically sets the value of @v to @n.
- */
 static __always_inline void arch_atomic64_set(atomic64_t *v, s64 i)
 {
        unsigned high = (unsigned)(i >> 32);
@@ -113,12 +88,6 @@ static __always_inline void arch_atomic64_set(atomic64_t *v, s64 i)
                             : "eax", "edx", "memory");
 }
 
-/**
- * arch_atomic64_read - read atomic64 variable
- * @v: pointer to type atomic64_t
- *
- * Atomically reads the value of @v and returns it.
- */
 static __always_inline s64 arch_atomic64_read(const atomic64_t *v)
 {
        s64 r;
@@ -126,13 +95,6 @@ static __always_inline s64 arch_atomic64_read(const atomic64_t *v)
        return r;
 }
 
-/**
- * arch_atomic64_add_return - add and return
- * @i: integer value to add
- * @v: pointer to type atomic64_t
- *
- * Atomically adds @i to @v and returns @i + *@v
- */
 static __always_inline s64 arch_atomic64_add_return(s64 i, atomic64_t *v)
 {
        alternative_atomic64(add_return,
@@ -142,9 +104,6 @@ static __always_inline s64 arch_atomic64_add_return(s64 i, atomic64_t *v)
 }
 #define arch_atomic64_add_return arch_atomic64_add_return
 
-/*
- * Other variants with different arithmetic operators:
- */
 static __always_inline s64 arch_atomic64_sub_return(s64 i, atomic64_t *v)
 {
        alternative_atomic64(sub_return,
@@ -172,13 +131,6 @@ static __always_inline s64 arch_atomic64_dec_return(atomic64_t *v)
 }
 #define arch_atomic64_dec_return arch_atomic64_dec_return
 
-/**
- * arch_atomic64_add - add integer to atomic64 variable
- * @i: integer value to add
- * @v: pointer to type atomic64_t
- *
- * Atomically adds @i to @v.
- */
 static __always_inline s64 arch_atomic64_add(s64 i, atomic64_t *v)
 {
        __alternative_atomic64(add, add_return,
@@ -187,13 +139,6 @@ static __always_inline s64 arch_atomic64_add(s64 i, atomic64_t *v)
        return i;
 }
 
-/**
- * arch_atomic64_sub - subtract the atomic64 variable
- * @i: integer value to subtract
- * @v: pointer to type atomic64_t
- *
- * Atomically subtracts @i from @v.
- */
 static __always_inline s64 arch_atomic64_sub(s64 i, atomic64_t *v)
 {
        __alternative_atomic64(sub, sub_return,
@@ -202,12 +147,6 @@ static __always_inline s64 arch_atomic64_sub(s64 i, atomic64_t *v)
        return i;
 }
 
-/**
- * arch_atomic64_inc - increment atomic64 variable
- * @v: pointer to type atomic64_t
- *
- * Atomically increments @v by 1.
- */
 static __always_inline void arch_atomic64_inc(atomic64_t *v)
 {
        __alternative_atomic64(inc, inc_return, /* no output */,
@@ -215,12 +154,6 @@ static __always_inline void arch_atomic64_inc(atomic64_t *v)
 }
 #define arch_atomic64_inc arch_atomic64_inc
 
-/**
- * arch_atomic64_dec - decrement atomic64 variable
- * @v: pointer to type atomic64_t
- *
- * Atomically decrements @v by 1.
- */
 static __always_inline void arch_atomic64_dec(atomic64_t *v)
 {
        __alternative_atomic64(dec, dec_return, /* no output */,
@@ -228,15 +161,6 @@ static __always_inline void arch_atomic64_dec(atomic64_t *v)
 }
 #define arch_atomic64_dec arch_atomic64_dec
 
-/**
- * arch_atomic64_add_unless - add unless the number is a given value
- * @v: pointer of type atomic64_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
- *
- * Atomically adds @a to @v, so long as it was not @u.
- * Returns non-zero if the add was done, zero otherwise.
- */
 static __always_inline int arch_atomic64_add_unless(atomic64_t *v, s64 a, s64 u)
 {
        unsigned low = (unsigned)u;
index c496595..3165c0f 100644 (file)
 
 #define ATOMIC64_INIT(i)       { (i) }
 
-/**
- * arch_atomic64_read - read atomic64 variable
- * @v: pointer of type atomic64_t
- *
- * Atomically reads the value of @v.
- * Doesn't imply a read memory barrier.
- */
 static __always_inline s64 arch_atomic64_read(const atomic64_t *v)
 {
        return __READ_ONCE((v)->counter);
 }
 
-/**
- * arch_atomic64_set - set atomic64 variable
- * @v: pointer to type atomic64_t
- * @i: required value
- *
- * Atomically sets the value of @v to @i.
- */
 static __always_inline void arch_atomic64_set(atomic64_t *v, s64 i)
 {
        __WRITE_ONCE(v->counter, i);
 }
 
-/**
- * arch_atomic64_add - add integer to atomic64 variable
- * @i: integer value to add
- * @v: pointer to type atomic64_t
- *
- * Atomically adds @i to @v.
- */
 static __always_inline void arch_atomic64_add(s64 i, atomic64_t *v)
 {
        asm volatile(LOCK_PREFIX "addq %1,%0"
@@ -48,13 +27,6 @@ static __always_inline void arch_atomic64_add(s64 i, atomic64_t *v)
                     : "er" (i), "m" (v->counter) : "memory");
 }
 
-/**
- * arch_atomic64_sub - subtract the atomic64 variable
- * @i: integer value to subtract
- * @v: pointer to type atomic64_t
- *
- * Atomically subtracts @i from @v.
- */
 static __always_inline void arch_atomic64_sub(s64 i, atomic64_t *v)
 {
        asm volatile(LOCK_PREFIX "subq %1,%0"
@@ -62,27 +34,12 @@ static __always_inline void arch_atomic64_sub(s64 i, atomic64_t *v)
                     : "er" (i), "m" (v->counter) : "memory");
 }
 
-/**
- * arch_atomic64_sub_and_test - subtract value from variable and test result
- * @i: integer value to subtract
- * @v: pointer to type atomic64_t
- *
- * Atomically subtracts @i from @v and returns
- * true if the result is zero, or false for all
- * other cases.
- */
 static __always_inline bool arch_atomic64_sub_and_test(s64 i, atomic64_t *v)
 {
        return GEN_BINARY_RMWcc(LOCK_PREFIX "subq", v->counter, e, "er", i);
 }
 #define arch_atomic64_sub_and_test arch_atomic64_sub_and_test
 
-/**
- * arch_atomic64_inc - increment atomic64 variable
- * @v: pointer to type atomic64_t
- *
- * Atomically increments @v by 1.
- */
 static __always_inline void arch_atomic64_inc(atomic64_t *v)
 {
        asm volatile(LOCK_PREFIX "incq %0"
@@ -91,12 +48,6 @@ static __always_inline void arch_atomic64_inc(atomic64_t *v)
 }
 #define arch_atomic64_inc arch_atomic64_inc
 
-/**
- * arch_atomic64_dec - decrement atomic64 variable
- * @v: pointer to type atomic64_t
- *
- * Atomically decrements @v by 1.
- */
 static __always_inline void arch_atomic64_dec(atomic64_t *v)
 {
        asm volatile(LOCK_PREFIX "decq %0"
@@ -105,56 +56,24 @@ static __always_inline void arch_atomic64_dec(atomic64_t *v)
 }
 #define arch_atomic64_dec arch_atomic64_dec
 
-/**
- * arch_atomic64_dec_and_test - decrement and test
- * @v: pointer to type atomic64_t
- *
- * Atomically decrements @v by 1 and
- * returns true if the result is 0, or false for all other
- * cases.
- */
 static __always_inline bool arch_atomic64_dec_and_test(atomic64_t *v)
 {
        return GEN_UNARY_RMWcc(LOCK_PREFIX "decq", v->counter, e);
 }
 #define arch_atomic64_dec_and_test arch_atomic64_dec_and_test
 
-/**
- * arch_atomic64_inc_and_test - increment and test
- * @v: pointer to type atomic64_t
- *
- * Atomically increments @v by 1
- * and returns true if the result is zero, or false for all
- * other cases.
- */
 static __always_inline bool arch_atomic64_inc_and_test(atomic64_t *v)
 {
        return GEN_UNARY_RMWcc(LOCK_PREFIX "incq", v->counter, e);
 }
 #define arch_atomic64_inc_and_test arch_atomic64_inc_and_test
 
-/**
- * arch_atomic64_add_negative - add and test if negative
- * @i: integer value to add
- * @v: pointer to type atomic64_t
- *
- * Atomically adds @i to @v and returns true
- * if the result is negative, or false when
- * result is greater than or equal to zero.
- */
 static __always_inline bool arch_atomic64_add_negative(s64 i, atomic64_t *v)
 {
        return GEN_BINARY_RMWcc(LOCK_PREFIX "addq", v->counter, s, "er", i);
 }
 #define arch_atomic64_add_negative arch_atomic64_add_negative
 
-/**
- * arch_atomic64_add_return - add and return
- * @i: integer value to add
- * @v: pointer to type atomic64_t
- *
- * Atomically adds @i to @v and returns @i + @v
- */
 static __always_inline s64 arch_atomic64_add_return(s64 i, atomic64_t *v)
 {
        return i + xadd(&v->counter, i);
index 92ae283..f25ca2d 100644 (file)
@@ -4,8 +4,6 @@
 
 #include <asm/processor.h>
 
-extern void check_bugs(void);
-
 #if defined(CONFIG_CPU_SUP_INTEL) && defined(CONFIG_X86_32)
 int ppro_with_ram_bug(void);
 #else
index 540573f..d536365 100644 (file)
@@ -239,29 +239,4 @@ extern void __add_wrong_size(void)
 #define __xadd(ptr, inc, lock) __xchg_op((ptr), (inc), xadd, lock)
 #define xadd(ptr, inc)         __xadd((ptr), (inc), LOCK_PREFIX)
 
-#define __cmpxchg_double(pfx, p1, p2, o1, o2, n1, n2)                  \
-({                                                                     \
-       bool __ret;                                                     \
-       __typeof__(*(p1)) __old1 = (o1), __new1 = (n1);                 \
-       __typeof__(*(p2)) __old2 = (o2), __new2 = (n2);                 \
-       BUILD_BUG_ON(sizeof(*(p1)) != sizeof(long));                    \
-       BUILD_BUG_ON(sizeof(*(p2)) != sizeof(long));                    \
-       VM_BUG_ON((unsigned long)(p1) % (2 * sizeof(long)));            \
-       VM_BUG_ON((unsigned long)((p1) + 1) != (unsigned long)(p2));    \
-       asm volatile(pfx "cmpxchg%c5b %1"                               \
-                    CC_SET(e)                                          \
-                    : CC_OUT(e) (__ret),                               \
-                      "+m" (*(p1)), "+m" (*(p2)),                      \
-                      "+a" (__old1), "+d" (__old2)                     \
-                    : "i" (2 * sizeof(long)),                          \
-                      "b" (__new1), "c" (__new2));                     \
-       __ret;                                                          \
-})
-
-#define arch_cmpxchg_double(p1, p2, o1, o2, n1, n2) \
-       __cmpxchg_double(LOCK_PREFIX, p1, p2, o1, o2, n1, n2)
-
-#define arch_cmpxchg_double_local(p1, p2, o1, o2, n1, n2) \
-       __cmpxchg_double(, p1, p2, o1, o2, n1, n2)
-
 #endif /* ASM_X86_CMPXCHG_H */
index 6ba80ce..b5731c5 100644 (file)
@@ -103,6 +103,6 @@ static inline bool __try_cmpxchg64(volatile u64 *ptr, u64 *pold, u64 new)
 
 #endif
 
-#define system_has_cmpxchg_double() boot_cpu_has(X86_FEATURE_CX8)
+#define system_has_cmpxchg64()         boot_cpu_has(X86_FEATURE_CX8)
 
 #endif /* _ASM_X86_CMPXCHG_32_H */
index 0d3beb2..44b08b5 100644 (file)
        arch_try_cmpxchg((ptr), (po), (n));                             \
 })
 
-#define system_has_cmpxchg_double() boot_cpu_has(X86_FEATURE_CX16)
+union __u128_halves {
+       u128 full;
+       struct {
+               u64 low, high;
+       };
+};
+
+#define __arch_cmpxchg128(_ptr, _old, _new, _lock)                     \
+({                                                                     \
+       union __u128_halves o = { .full = (_old), },                    \
+                           n = { .full = (_new), };                    \
+                                                                       \
+       asm volatile(_lock "cmpxchg16b %[ptr]"                          \
+                    : [ptr] "+m" (*(_ptr)),                            \
+                      "+a" (o.low), "+d" (o.high)                      \
+                    : "b" (n.low), "c" (n.high)                        \
+                    : "memory");                                       \
+                                                                       \
+       o.full;                                                         \
+})
+
+static __always_inline u128 arch_cmpxchg128(volatile u128 *ptr, u128 old, u128 new)
+{
+       return __arch_cmpxchg128(ptr, old, new, LOCK_PREFIX);
+}
+#define arch_cmpxchg128 arch_cmpxchg128
+
+static __always_inline u128 arch_cmpxchg128_local(volatile u128 *ptr, u128 old, u128 new)
+{
+       return __arch_cmpxchg128(ptr, old, new,);
+}
+#define arch_cmpxchg128_local arch_cmpxchg128_local
+
+#define __arch_try_cmpxchg128(_ptr, _oldp, _new, _lock)                        \
+({                                                                     \
+       union __u128_halves o = { .full = *(_oldp), },                  \
+                           n = { .full = (_new), };                    \
+       bool ret;                                                       \
+                                                                       \
+       asm volatile(_lock "cmpxchg16b %[ptr]"                          \
+                    CC_SET(e)                                          \
+                    : CC_OUT(e) (ret),                                 \
+                      [ptr] "+m" (*ptr),                               \
+                      "+a" (o.low), "+d" (o.high)                      \
+                    : "b" (n.low), "c" (n.high)                        \
+                    : "memory");                                       \
+                                                                       \
+       if (unlikely(!ret))                                             \
+               *(_oldp) = o.full;                                      \
+                                                                       \
+       likely(ret);                                                    \
+})
+
+static __always_inline bool arch_try_cmpxchg128(volatile u128 *ptr, u128 *oldp, u128 new)
+{
+       return __arch_try_cmpxchg128(ptr, oldp, new, LOCK_PREFIX);
+}
+#define arch_try_cmpxchg128 arch_try_cmpxchg128
+
+static __always_inline bool arch_try_cmpxchg128_local(volatile u128 *ptr, u128 *oldp, u128 new)
+{
+       return __arch_try_cmpxchg128(ptr, oldp, new,);
+}
+#define arch_try_cmpxchg128_local arch_try_cmpxchg128_local
+
+#define system_has_cmpxchg128()                boot_cpu_has(X86_FEATURE_CX16)
 
 #endif /* _ASM_X86_CMPXCHG_64_H */
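
The new arch_cmpxchg128()/arch_try_cmpxchg128() helpers wrap cmpxchg16b on a u128. The same semantics can be exercised from userspace with the compiler's 16-byte atomics; a sketch (build with gcc -O2 -mcx16 demo.c -latomic, since the compiler may route 16-byte atomics through libatomic):

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	__int128 v = 1, expected = 1, newval = 2;

	/* try_cmpxchg semantics: on failure 'expected' is refreshed with the
	 * current value, mirroring __arch_try_cmpxchg128() above. */
	bool ok = __atomic_compare_exchange_n(&v, &expected, newval, false,
					      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	printf("swapped=%d low64=%llu\n", ok, (unsigned long long)v);
	return 0;
}
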
index eb08796..6ae2d16 100644 (file)
@@ -10,30 +10,13 @@ enum cc_vendor {
        CC_VENDOR_INTEL,
 };
 
-#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
 extern enum cc_vendor cc_vendor;
 
-static inline enum cc_vendor cc_get_vendor(void)
-{
-       return cc_vendor;
-}
-
-static inline void cc_set_vendor(enum cc_vendor vendor)
-{
-       cc_vendor = vendor;
-}
-
+#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
 void cc_set_mask(u64 mask);
 u64 cc_mkenc(u64 val);
 u64 cc_mkdec(u64 val);
 #else
-static inline enum cc_vendor cc_get_vendor(void)
-{
-       return CC_VENDOR_NONE;
-}
-
-static inline void cc_set_vendor(enum cc_vendor vendor) { }
-
 static inline u64 cc_mkenc(u64 val)
 {
        return val;
index 78796b9..3a233eb 100644 (file)
@@ -30,10 +30,7 @@ struct x86_cpu {
 #ifdef CONFIG_HOTPLUG_CPU
 extern int arch_register_cpu(int num);
 extern void arch_unregister_cpu(int);
-extern void start_cpu0(void);
-#ifdef CONFIG_DEBUG_HOTPLUG_CPU0
-extern int _debug_hotplug_cpu(int cpu, int action);
-#endif
+extern void soft_restart_cpu(void);
 #endif
 
 extern void ap_init_aperfmperf(void);
@@ -98,4 +95,6 @@ extern u64 x86_read_arch_cap_msr(void);
 int intel_find_matching_signature(void *mc, unsigned int csig, int cpf);
 int intel_microcode_sanity_check(void *mc, bool print_err, int hdr_type);
 
+extern struct cpumask cpus_stop_mask;
+
 #endif /* _ASM_X86_CPU_H */
index ce0c8f7..a26bebb 100644 (file)
@@ -38,15 +38,10 @@ enum cpuid_leafs
 #define X86_CAP_FMT_NUM "%d:%d"
 #define x86_cap_flag_num(flag) ((flag) >> 5), ((flag) & 31)
 
-#ifdef CONFIG_X86_FEATURE_NAMES
 extern const char * const x86_cap_flags[NCAPINTS*32];
 extern const char * const x86_power_flags[32];
 #define X86_CAP_FMT "%s"
 #define x86_cap_flag(flag) x86_cap_flags[flag]
-#else
-#define X86_CAP_FMT X86_CAP_FMT_NUM
-#define x86_cap_flag x86_cap_flag_num
-#endif
 
 /*
  * In order to save room, we index into this array by doing
index c5aed9e..4acfd57 100644 (file)
@@ -4,11 +4,6 @@
 #ifndef __ASSEMBLY__
 #include <linux/cpumask.h>
 
-extern cpumask_var_t cpu_callin_mask;
-extern cpumask_var_t cpu_callout_mask;
-extern cpumask_var_t cpu_initialized_mask;
-extern cpumask_var_t cpu_sibling_setup_mask;
-
 extern void setup_cpu_local_masks(void);
 
 /*
index 54a6e4a..de0e88b 100644 (file)
@@ -2,6 +2,8 @@
 #ifndef _ASM_X86_DOUBLEFAULT_H
 #define _ASM_X86_DOUBLEFAULT_H
 
+#include <linux/linkage.h>
+
 #ifdef CONFIG_X86_32
 extern void doublefault_init_cpu_tss(void);
 #else
@@ -10,4 +12,6 @@ static inline void doublefault_init_cpu_tss(void)
 }
 #endif
 
+asmlinkage void __noreturn doublefault_shim(void);
+
 #endif /* _ASM_X86_DOUBLEFAULT_H */
index 419280d..8b4be7c 100644 (file)
@@ -31,6 +31,8 @@ extern unsigned long efi_mixed_mode_stack_pa;
 
 #define ARCH_EFI_IRQ_FLAGS_MASK        X86_EFLAGS_IF
 
+#define EFI_UNACCEPTED_UNIT_SIZE PMD_SIZE
+
 /*
  * The EFI services are called through variadic functions in many cases. These
  * functions are implemented in assembler and support only a fixed number of
index 503a577..b475d9a 100644 (file)
@@ -109,7 +109,7 @@ extern void fpu_reset_from_exception_fixup(void);
 
 /* Boot, hotplug and resume */
 extern void fpu__init_cpu(void);
-extern void fpu__init_system(struct cpuinfo_x86 *c);
+extern void fpu__init_system(void);
 extern void fpu__init_check_bugs(void);
 extern void fpu__resume_cpu(void);
 
index c2d6cd7..78fcde7 100644 (file)
@@ -39,7 +39,7 @@ extern void fpu_flush_thread(void);
 static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
 {
        if (cpu_feature_enabled(X86_FEATURE_FPU) &&
-           !(current->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+           !(current->flags & (PF_KTHREAD | PF_USER_WORKER))) {
                save_fpregs_to_fpstate(old_fpu);
                /*
                 * The save operation preserved register state, so the
index 5061ac9..b8d4a07 100644 (file)
@@ -106,6 +106,9 @@ struct dyn_arch_ftrace {
 
 #ifndef __ASSEMBLY__
 
+void prepare_ftrace_return(unsigned long ip, unsigned long *parent,
+                          unsigned long frame_pointer);
+
 #if defined(CONFIG_FUNCTION_TRACER) && defined(CONFIG_DYNAMIC_FTRACE)
 extern void set_ftrace_ops_ro(void);
 #else
index 9646ed6..180b1cb 100644 (file)
@@ -350,4 +350,7 @@ static inline void mce_amd_feature_init(struct cpuinfo_x86 *c)              { }
 #endif
 
 static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c)       { return mce_amd_feature_init(c); }
+
+unsigned long copy_mc_fragile_handle_tail(char *to, char *from, unsigned len);
+
 #endif /* _ASM_X86_MCE_H */
index b712670..7f97a8a 100644 (file)
 
 #include <asm/bootparam.h>
 
+#ifdef CONFIG_X86_MEM_ENCRYPT
+void __init mem_encrypt_init(void);
+#else
+static inline void mem_encrypt_init(void) { }
+#endif
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
 extern u64 sme_me_mask;
@@ -87,9 +93,6 @@ static inline void mem_encrypt_free_decrypted_mem(void) { }
 
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
-/* Architecture __weak replacement functions */
-void __init mem_encrypt_init(void);
-
 void add_encrypt_protection_map(void);
 
 /*
index 49bb4f2..88d9ef9 100644 (file)
@@ -257,6 +257,11 @@ void hv_set_register(unsigned int reg, u64 value);
 u64 hv_get_non_nested_register(unsigned int reg);
 void hv_set_non_nested_register(unsigned int reg, u64 value);
 
+static __always_inline u64 hv_raw_get_register(unsigned int reg)
+{
+       return __rdmsr(reg);
+}
+
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
index f0eeaf6..090d658 100644 (file)
 #ifndef _ASM_X86_MTRR_H
 #define _ASM_X86_MTRR_H
 
+#include <linux/bits.h>
 #include <uapi/asm/mtrr.h>
 
+/* Defines for hardware MTRR registers. */
+#define MTRR_CAP_VCNT          GENMASK(7, 0)
+#define MTRR_CAP_FIX           BIT_MASK(8)
+#define MTRR_CAP_WC            BIT_MASK(10)
+
+#define MTRR_DEF_TYPE_TYPE     GENMASK(7, 0)
+#define MTRR_DEF_TYPE_FE       BIT_MASK(10)
+#define MTRR_DEF_TYPE_E                BIT_MASK(11)
+
+#define MTRR_DEF_TYPE_ENABLE   (MTRR_DEF_TYPE_FE | MTRR_DEF_TYPE_E)
+#define MTRR_DEF_TYPE_DISABLE  ~(MTRR_DEF_TYPE_TYPE | MTRR_DEF_TYPE_ENABLE)
+
+#define MTRR_PHYSBASE_TYPE     GENMASK(7, 0)
+#define MTRR_PHYSBASE_RSVD     GENMASK(11, 8)
+
+#define MTRR_PHYSMASK_RSVD     GENMASK(10, 0)
+#define MTRR_PHYSMASK_V                BIT_MASK(11)
+
+struct mtrr_state_type {
+       struct mtrr_var_range var_ranges[MTRR_MAX_VAR_RANGES];
+       mtrr_type fixed_ranges[MTRR_NUM_FIXED_RANGES];
+       unsigned char enabled;
+       bool have_fixed;
+       mtrr_type def_type;
+};
+
 /*
  * The following functions are for use by other drivers that cannot use
  * arch_phys_wc_add and arch_phys_wc_del.
  */
 # ifdef CONFIG_MTRR
 void mtrr_bp_init(void);
+void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
+                         mtrr_type def_type);
 extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
 extern void mtrr_save_fixed_ranges(void *);
 extern void mtrr_save_state(void);
@@ -40,7 +69,6 @@ extern int mtrr_add_page(unsigned long base, unsigned long size,
                         unsigned int type, bool increment);
 extern int mtrr_del(int reg, unsigned long base, unsigned long size);
 extern int mtrr_del_page(int reg, unsigned long base, unsigned long size);
-extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
 extern void mtrr_bp_restore(void);
 extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
 extern int amd_special_default_mtrr(void);
@@ -48,12 +76,21 @@ void mtrr_disable(void);
 void mtrr_enable(void);
 void mtrr_generic_set_state(void);
 #  else
+static inline void mtrr_overwrite_state(struct mtrr_var_range *var,
+                                       unsigned int num_var,
+                                       mtrr_type def_type)
+{
+}
+
 static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
 {
        /*
-        * Return no-MTRRs:
+        * Return the default MTRR type, without any known other types in
+        * that range.
         */
-       return MTRR_TYPE_INVALID;
+       *uniform = 1;
+
+       return MTRR_TYPE_UNCACHABLE;
 }
 #define mtrr_save_fixed_ranges(arg) do {} while (0)
 #define mtrr_save_state() do {} while (0)
@@ -79,9 +116,6 @@ static inline int mtrr_trim_uncached_memory(unsigned long end_pfn)
 {
        return 0;
 }
-static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
-{
-}
 #define mtrr_bp_init() do {} while (0)
 #define mtrr_bp_restore() do {} while (0)
 #define mtrr_disable() do {} while (0)
@@ -121,7 +155,8 @@ struct mtrr_gentry32 {
 #endif /* CONFIG_COMPAT */
 
 /* Bit fields for enabled in struct mtrr_state_type */
-#define MTRR_STATE_MTRR_FIXED_ENABLED  0x01
-#define MTRR_STATE_MTRR_ENABLED                0x02
+#define MTRR_STATE_SHIFT               10
+#define MTRR_STATE_MTRR_FIXED_ENABLED  (MTRR_DEF_TYPE_FE >> MTRR_STATE_SHIFT)
+#define MTRR_STATE_MTRR_ENABLED                (MTRR_DEF_TYPE_E >> MTRR_STATE_SHIFT)
 
 #endif /* _ASM_X86_MTRR_H */
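
The MTRR state flags keep their old numeric values; they are now just derived from the MTRRdefType MSR bit definitions via MTRR_STATE_SHIFT instead of being written as magic numbers. A quick check:

#include <stdio.h>

#define BIT_MASK(n)		(1UL << (n))
#define MTRR_DEF_TYPE_FE	BIT_MASK(10)
#define MTRR_DEF_TYPE_E		BIT_MASK(11)
#define MTRR_STATE_SHIFT	10

int main(void)
{
	printf("FIXED_ENABLED=%#lx ENABLED=%#lx\n",
	       MTRR_DEF_TYPE_FE >> MTRR_STATE_SHIFT,
	       MTRR_DEF_TYPE_E >> MTRR_STATE_SHIFT);
	return 0;	/* prints 0x1 and 0x2, the previous literal values */
}
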
index c5573ea..1c1b755 100644 (file)
@@ -34,6 +34,8 @@
 #define BYTES_NOP7     0x8d,0xb4,0x26,0x00,0x00,0x00,0x00
 #define BYTES_NOP8     0x3e,BYTES_NOP7
 
+#define ASM_NOP_MAX 8
+
 #else
 
 /*
@@ -47,6 +49,9 @@
  * 6: osp nopl 0x00(%eax,%eax,1)
  * 7: nopl 0x00000000(%eax)
  * 8: nopl 0x00000000(%eax,%eax,1)
+ * 9: cs nopl 0x00000000(%eax,%eax,1)
+ * 10: osp cs nopl 0x00000000(%eax,%eax,1)
+ * 11: osp osp cs nopl 0x00000000(%eax,%eax,1)
  */
 #define BYTES_NOP1     0x90
 #define BYTES_NOP2     0x66,BYTES_NOP1
 #define BYTES_NOP6     0x66,BYTES_NOP5
 #define BYTES_NOP7     0x0f,0x1f,0x80,0x00,0x00,0x00,0x00
 #define BYTES_NOP8     0x0f,0x1f,0x84,0x00,0x00,0x00,0x00,0x00
+#define BYTES_NOP9     0x2e,BYTES_NOP8
+#define BYTES_NOP10    0x66,BYTES_NOP9
+#define BYTES_NOP11    0x66,BYTES_NOP10
+
+#define ASM_NOP9  _ASM_BYTES(BYTES_NOP9)
+#define ASM_NOP10 _ASM_BYTES(BYTES_NOP10)
+#define ASM_NOP11 _ASM_BYTES(BYTES_NOP11)
+
+#define ASM_NOP_MAX 11
 
 #endif /* CONFIG_64BIT */
 
@@ -68,8 +82,6 @@
 #define ASM_NOP7 _ASM_BYTES(BYTES_NOP7)
 #define ASM_NOP8 _ASM_BYTES(BYTES_NOP8)
 
-#define ASM_NOP_MAX 8
-
 #ifndef __ASSEMBLY__
 extern const unsigned char * const x86_nops[];
 #endif
index edb2b0c..55388c9 100644 (file)
        movq    $-1, PER_CPU_VAR(pcpu_hot + X86_call_depth);
 
 #define RESET_CALL_DEPTH                                       \
-       mov     $0x80, %rax;                                    \
-       shl     $56, %rax;                                      \
+       xor     %eax, %eax;                                     \
+       bts     $63, %rax;                                      \
        movq    %rax, PER_CPU_VAR(pcpu_hot + X86_call_depth);
 
 #define RESET_CALL_DEPTH_FROM_CALL                             \
-       mov     $0xfc, %rax;                                    \
+       movb    $0xfc, %al;                                     \
        shl     $56, %rax;                                      \
        movq    %rax, PER_CPU_VAR(pcpu_hot + X86_call_depth);   \
        CALL_THUNKS_DEBUG_INC_CALLS
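
The RESET_CALL_DEPTH change swaps in a shorter instruction sequence for loading the same constant: both the old mov/shl pair and the new xor/bts pair leave only bit 63 set in RAX. A small check of that equivalence (x86-64, GCC inline asm):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t a, b;

	asm("mov $0x80, %0\n\tshl $56, %0" : "=r"(a));	/* old sequence */
	asm("xor %k0, %k0\n\tbts $63, %0" : "=r"(b));	/* new sequence */
	printf("old=%#llx new=%#llx\n",
	       (unsigned long long)a, (unsigned long long)b);
	return 0;	/* both print 0x8000000000000000 */
}
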
diff --git a/arch/x86/include/asm/orc_header.h b/arch/x86/include/asm/orc_header.h
new file mode 100644 (file)
index 0000000..07bacf3
--- /dev/null
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#ifndef _ORC_HEADER_H
+#define _ORC_HEADER_H
+
+#include <linux/types.h>
+#include <linux/compiler.h>
+#include <asm/orc_hash.h>
+
+/*
+ * The header is currently a 20-byte hash of the ORC entry definition; see
+ * scripts/orc_hash.sh.
+ */
+#define ORC_HEADER                                     \
+       __used __section(".orc_header") __aligned(4)    \
+       static const u8 orc_header[] = { ORC_HASH }
+
+#endif /* _ORC_HEADER_H */
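
ORC_HEADER embeds a build-time hash of the ORC entry layout in its own ELF section so consumers built against a different layout can be detected. A generic, hypothetical illustration of the section-embedding idiom (this is not the kernel's scripts/orc_hash.sh):

#include <stdint.h>
#include <stdio.h>

/* Place a build-time identifier in a dedicated section of the binary. */
__attribute__((used, section(".demo_header"), aligned(4)))
static const uint8_t demo_header[] = { 0xde, 0xad, 0xbe, 0xef };

int main(void)
{
	/* objdump -s -j .demo_header ./a.out shows the embedded bytes. */
	printf("header[0]=0x%02x\n", demo_header[0]);
	return 0;
}
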
index 13c0d63..34734d7 100644 (file)
@@ -210,6 +210,67 @@ do {                                                                       \
        (typeof(_var))(unsigned long) pco_old__;                        \
 })
 
+#if defined(CONFIG_X86_32) && !defined(CONFIG_UML)
+#define percpu_cmpxchg64_op(size, qual, _var, _oval, _nval)            \
+({                                                                     \
+       union {                                                         \
+               u64 var;                                                \
+               struct {                                                \
+                       u32 low, high;                                  \
+               };                                                      \
+       } old__, new__;                                                 \
+                                                                       \
+       old__.var = _oval;                                              \
+       new__.var = _nval;                                              \
+                                                                       \
+       asm qual (ALTERNATIVE("leal %P[var], %%esi; call this_cpu_cmpxchg8b_emu", \
+                             "cmpxchg8b " __percpu_arg([var]), X86_FEATURE_CX8) \
+                 : [var] "+m" (_var),                                  \
+                   "+a" (old__.low),                                   \
+                   "+d" (old__.high)                                   \
+                 : "b" (new__.low),                                    \
+                   "c" (new__.high)                                    \
+                 : "memory", "esi");                                   \
+                                                                       \
+       old__.var;                                                      \
+})
+
+#define raw_cpu_cmpxchg64(pcp, oval, nval)     percpu_cmpxchg64_op(8,         , pcp, oval, nval)
+#define this_cpu_cmpxchg64(pcp, oval, nval)    percpu_cmpxchg64_op(8, volatile, pcp, oval, nval)
+#endif
+
+#ifdef CONFIG_X86_64
+#define raw_cpu_cmpxchg64(pcp, oval, nval)     percpu_cmpxchg_op(8,         , pcp, oval, nval);
+#define this_cpu_cmpxchg64(pcp, oval, nval)    percpu_cmpxchg_op(8, volatile, pcp, oval, nval);
+
+#define percpu_cmpxchg128_op(size, qual, _var, _oval, _nval)           \
+({                                                                     \
+       union {                                                         \
+               u128 var;                                               \
+               struct {                                                \
+                       u64 low, high;                                  \
+               };                                                      \
+       } old__, new__;                                                 \
+                                                                       \
+       old__.var = _oval;                                              \
+       new__.var = _nval;                                              \
+                                                                       \
+       asm qual (ALTERNATIVE("leaq %P[var], %%rsi; call this_cpu_cmpxchg16b_emu", \
+                             "cmpxchg16b " __percpu_arg([var]), X86_FEATURE_CX16) \
+                 : [var] "+m" (_var),                                  \
+                   "+a" (old__.low),                                   \
+                   "+d" (old__.high)                                   \
+                 : "b" (new__.low),                                    \
+                   "c" (new__.high)                                    \
+                 : "memory", "rsi");                                   \
+                                                                       \
+       old__.var;                                                      \
+})
+
+#define raw_cpu_cmpxchg128(pcp, oval, nval)    percpu_cmpxchg128_op(16,         , pcp, oval, nval)
+#define this_cpu_cmpxchg128(pcp, oval, nval)   percpu_cmpxchg128_op(16, volatile, pcp, oval, nval)
+#endif
+
 /*
  * this_cpu_read() makes gcc load the percpu variable every time it is
  * accessed while this_cpu_read_stable() allows the value to be cached.
@@ -290,23 +351,6 @@ do {                                                                       \
 #define this_cpu_cmpxchg_2(pcp, oval, nval)    percpu_cmpxchg_op(2, volatile, pcp, oval, nval)
 #define this_cpu_cmpxchg_4(pcp, oval, nval)    percpu_cmpxchg_op(4, volatile, pcp, oval, nval)
 
-#ifdef CONFIG_X86_CMPXCHG64
-#define percpu_cmpxchg8b_double(pcp1, pcp2, o1, o2, n1, n2)            \
-({                                                                     \
-       bool __ret;                                                     \
-       typeof(pcp1) __o1 = (o1), __n1 = (n1);                          \
-       typeof(pcp2) __o2 = (o2), __n2 = (n2);                          \
-       asm volatile("cmpxchg8b "__percpu_arg(1)                        \
-                    CC_SET(z)                                          \
-                    : CC_OUT(z) (__ret), "+m" (pcp1), "+m" (pcp2), "+a" (__o1), "+d" (__o2) \
-                    : "b" (__n1), "c" (__n2));                         \
-       __ret;                                                          \
-})
-
-#define raw_cpu_cmpxchg_double_4       percpu_cmpxchg8b_double
-#define this_cpu_cmpxchg_double_4      percpu_cmpxchg8b_double
-#endif /* CONFIG_X86_CMPXCHG64 */
-
 /*
  * Per cpu atomic 64 bit operations are only available under 64 bit.
  * 32 bit must fall back to generic operations.
@@ -329,30 +373,6 @@ do {                                                                       \
 #define this_cpu_add_return_8(pcp, val)                percpu_add_return_op(8, volatile, pcp, val)
 #define this_cpu_xchg_8(pcp, nval)             percpu_xchg_op(8, volatile, pcp, nval)
 #define this_cpu_cmpxchg_8(pcp, oval, nval)    percpu_cmpxchg_op(8, volatile, pcp, oval, nval)
-
-/*
- * Pretty complex macro to generate cmpxchg16 instruction.  The instruction
- * is not supported on early AMD64 processors so we must be able to emulate
- * it in software.  The address used in the cmpxchg16 instruction must be
- * aligned to a 16 byte boundary.
- */
-#define percpu_cmpxchg16b_double(pcp1, pcp2, o1, o2, n1, n2)           \
-({                                                                     \
-       bool __ret;                                                     \
-       typeof(pcp1) __o1 = (o1), __n1 = (n1);                          \
-       typeof(pcp2) __o2 = (o2), __n2 = (n2);                          \
-       alternative_io("leaq %P1,%%rsi\n\tcall this_cpu_cmpxchg16b_emu\n\t", \
-                      "cmpxchg16b " __percpu_arg(1) "\n\tsetz %0\n\t", \
-                      X86_FEATURE_CX16,                                \
-                      ASM_OUTPUT2("=a" (__ret), "+m" (pcp1),           \
-                                  "+m" (pcp2), "+d" (__o2)),           \
-                      "b" (__n1), "c" (__n2), "a" (__o1) : "rsi");     \
-       __ret;                                                          \
-})
-
-#define raw_cpu_cmpxchg_double_8       percpu_cmpxchg16b_double
-#define this_cpu_cmpxchg_double_8      percpu_cmpxchg16b_double
-
 #endif
 
 static __always_inline bool x86_this_cpu_constant_test_bit(unsigned int nr,
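
The per-CPU cmpxchg macros above use ALTERNATIVE() to fall back to an emulation call on CPUs without CMPXCHG8B/CMPXCHG16B, keyed on X86_FEATURE_CX8 and X86_FEATURE_CX16. Those feature bits come straight from CPUID leaf 1; a userspace check:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 1;
	printf("cmpxchg8b:  %s\n", (edx & (1u << 8))  ? "yes" : "no");
	printf("cmpxchg16b: %s\n", (ecx & (1u << 13)) ? "yes" : "no");
	return 0;
}
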
index 8fc15ed..85a9fd5 100644 (file)
 #define ARCH_PERFMON_EVENTSEL_INV                      (1ULL << 23)
 #define ARCH_PERFMON_EVENTSEL_CMASK                    0xFF000000ULL
 
+#define INTEL_FIXED_BITS_MASK                          0xFULL
+#define INTEL_FIXED_BITS_STRIDE                        4
+#define INTEL_FIXED_0_KERNEL                           (1ULL << 0)
+#define INTEL_FIXED_0_USER                             (1ULL << 1)
+#define INTEL_FIXED_0_ANYTHREAD                        (1ULL << 2)
+#define INTEL_FIXED_0_ENABLE_PMI                       (1ULL << 3)
+
 #define HSW_IN_TX                                      (1ULL << 32)
 #define HSW_IN_TX_CHECKPOINTED                         (1ULL << 33)
 #define ICL_EVENTSEL_ADAPTIVE                          (1ULL << 34)
 #define ICL_FIXED_0_ADAPTIVE                           (1ULL << 32)
 
+#define intel_fixed_bits_by_idx(_idx, _bits)                   \
+       ((_bits) << ((_idx) * INTEL_FIXED_BITS_STRIDE))
+
 #define AMD64_EVENTSEL_INT_CORE_ENABLE                 (1ULL << 36)
 #define AMD64_EVENTSEL_GUESTONLY                       (1ULL << 40)
 #define AMD64_EVENTSEL_HOSTONLY                                (1ULL << 41)
 #define PEBS_DATACFG_LBRS      BIT_ULL(3)
 #define PEBS_DATACFG_LBR_SHIFT 24
 
+/* Steal the highest bit of pebs_data_cfg for SW usage */
+#define PEBS_UPDATE_DS_SW      BIT_ULL(63)
+
 /*
  * Intel "Architectural Performance Monitoring" CPUID
  * detection/enumeration details:
@@ -475,8 +488,10 @@ struct pebs_xmm {
 
 #ifdef CONFIG_X86_LOCAL_APIC
 extern u32 get_ibs_caps(void);
+extern int forward_event_to_ibs(struct perf_event *event);
 #else
 static inline u32 get_ibs_caps(void) { return 0; }
+static inline int forward_event_to_ibs(struct perf_event *event) { return -ENOENT; }
 #endif
 
 #ifdef CONFIG_PERF_EVENTS
index 15ae4d6..5700bb3 100644 (file)
@@ -27,6 +27,7 @@
 extern pgd_t early_top_pgt[PTRS_PER_PGD];
 bool __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
 
+struct seq_file;
 void ptdump_walk_pgd_level(struct seq_file *m, struct mm_struct *mm);
 void ptdump_walk_pgd_level_debugfs(struct seq_file *m, struct mm_struct *mm,
                                   bool user);
index 7929327..a629b1b 100644 (file)
@@ -237,8 +237,8 @@ static inline void native_pgd_clear(pgd_t *pgd)
 
 #define __pte_to_swp_entry(pte)                ((swp_entry_t) { pte_val((pte)) })
 #define __pmd_to_swp_entry(pmd)                ((swp_entry_t) { pmd_val((pmd)) })
-#define __swp_entry_to_pte(x)          ((pte_t) { .pte = (x).val })
-#define __swp_entry_to_pmd(x)          ((pmd_t) { .pmd = (x).val })
+#define __swp_entry_to_pte(x)          (__pte((x).val))
+#define __swp_entry_to_pmd(x)          (__pmd((x).val))
 
 extern void cleanup_highmap(void);
 
index 447d4be..ba3e255 100644 (file)
@@ -513,9 +513,6 @@ extern void native_pagetable_init(void);
 #define native_pagetable_init        paging_init
 #endif
 
-struct seq_file;
-extern void arch_report_meminfo(struct seq_file *m);
-
 enum pg_level {
        PG_LEVEL_NONE,
        PG_LEVEL_4K,
index a1e4fa5..d46300e 100644 (file)
@@ -551,7 +551,6 @@ extern void switch_gdt_and_percpu_base(int);
 extern void load_direct_gdt(int);
 extern void load_fixmap_gdt(int);
 extern void cpu_init(void);
-extern void cpu_init_secondary(void);
 extern void cpu_init_exception_handling(void);
 extern void cr4_init(void);
 
index f6a1737..87e5482 100644 (file)
@@ -52,6 +52,7 @@ struct trampoline_header {
        u64 efer;
        u32 cr4;
        u32 flags;
+       u32 lock;
 #endif
 };
 
@@ -64,6 +65,8 @@ extern unsigned long initial_stack;
 extern unsigned long initial_vc_handler;
 #endif
 
+extern u32 *trampoline_lock;
+
 extern unsigned char real_mode_blob[];
 extern unsigned char real_mode_relocs[];
 
index 0759af9..b463fcb 100644 (file)
@@ -106,8 +106,13 @@ enum psc_op {
 #define GHCB_HV_FT_SNP                 BIT_ULL(0)
 #define GHCB_HV_FT_SNP_AP_CREATION     BIT_ULL(1)
 
-/* SNP Page State Change NAE event */
-#define VMGEXIT_PSC_MAX_ENTRY          253
+/*
+ * SNP Page State Change NAE event
+ *   The VMGEXIT_PSC_MAX_ENTRY determines the size of the PSC structure, which
+ *   is a local stack variable in set_pages_state(). Do not increase this value
+ *   without evaluating the impact to stack usage.
+ */
+#define VMGEXIT_PSC_MAX_ENTRY          64
 
 struct psc_hdr {
        u16 cur_entry;
index 13dc2a9..66c8067 100644 (file)
@@ -14,6 +14,7 @@
 #include <asm/insn.h>
 #include <asm/sev-common.h>
 #include <asm/bootparam.h>
+#include <asm/coco.h>
 
 #define GHCB_PROTOCOL_MIN      1ULL
 #define GHCB_PROTOCOL_MAX      2ULL
@@ -80,11 +81,15 @@ extern void vc_no_ghcb(void);
 extern void vc_boot_ghcb(void);
 extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
 
+/* PVALIDATE return codes */
+#define PVALIDATE_FAIL_SIZEMISMATCH    6
+
 /* Software defined (when rFlags.CF = 1) */
 #define PVALIDATE_FAIL_NOUPDATE                255
 
 /* RMP page size */
 #define RMP_PG_SIZE_4K                 0
+#define RMP_PG_SIZE_2M                 1
 
 #define RMPADJUST_VMSA_PAGE_BIT                BIT(16)
 
@@ -136,24 +141,26 @@ struct snp_secrets_page_layout {
 } __packed;
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
-extern struct static_key_false sev_es_enable_key;
 extern void __sev_es_ist_enter(struct pt_regs *regs);
 extern void __sev_es_ist_exit(void);
 static __always_inline void sev_es_ist_enter(struct pt_regs *regs)
 {
-       if (static_branch_unlikely(&sev_es_enable_key))
+       if (cc_vendor == CC_VENDOR_AMD &&
+           cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
                __sev_es_ist_enter(regs);
 }
 static __always_inline void sev_es_ist_exit(void)
 {
-       if (static_branch_unlikely(&sev_es_enable_key))
+       if (cc_vendor == CC_VENDOR_AMD &&
+           cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
                __sev_es_ist_exit();
 }
 extern int sev_es_setup_ap_jump_table(struct real_mode_header *rmh);
 extern void __sev_es_nmi_complete(void);
 static __always_inline void sev_es_nmi_complete(void)
 {
-       if (static_branch_unlikely(&sev_es_enable_key))
+       if (cc_vendor == CC_VENDOR_AMD &&
+           cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
                __sev_es_nmi_complete();
 }
 extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
@@ -192,16 +199,17 @@ struct snp_guest_request_ioctl;
 
 void setup_ghcb(void);
 void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
-                                        unsigned int npages);
+                                        unsigned long npages);
 void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
-                                       unsigned int npages);
+                                       unsigned long npages);
 void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op);
-void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
-void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
+void snp_set_memory_shared(unsigned long vaddr, unsigned long npages);
+void snp_set_memory_private(unsigned long vaddr, unsigned long npages);
 void snp_set_wakeup_secondary_cpu(void);
 bool snp_init(struct boot_params *bp);
 void __init __noreturn snp_abort(void);
 int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct snp_guest_request_ioctl *rio);
+void snp_accept_memory(phys_addr_t start, phys_addr_t end);
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
 static inline void sev_es_ist_exit(void) { }
@@ -212,12 +220,12 @@ static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
 static inline int rmpadjust(unsigned long vaddr, bool rmp_psize, unsigned long attrs) { return 0; }
 static inline void setup_ghcb(void) { }
 static inline void __init
-early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }
+early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned long npages) { }
 static inline void __init
-early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }
+early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned long npages) { }
 static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) { }
-static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { }
-static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
+static inline void snp_set_memory_shared(unsigned long vaddr, unsigned long npages) { }
+static inline void snp_set_memory_private(unsigned long vaddr, unsigned long npages) { }
 static inline void snp_set_wakeup_secondary_cpu(void) { }
 static inline bool snp_init(struct boot_params *bp) { return false; }
 static inline void snp_abort(void) { }
@@ -225,6 +233,8 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
 {
        return -ENOTTY;
 }
+
+static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
 #endif
 
 #endif
index 2631e01..7513b3b 100644 (file)
 #define TDX_CPUID_LEAF_ID      0x21
 #define TDX_IDENT              "IntelTDX    "
 
+/* TDX module Call Leaf IDs */
+#define TDX_GET_INFO                   1
+#define TDX_GET_VEINFO                 3
+#define TDX_GET_REPORT                 4
+#define TDX_ACCEPT_PAGE                        6
+#define TDX_WR                         8
+
+/* TDCS fields. To be used by TDG.VM.WR and TDG.VM.RD module calls */
+#define TDCS_NOTIFY_ENABLES            0x9100000000000010
+
+/* TDX hypercall Leaf IDs */
+#define TDVMCALL_MAP_GPA               0x10001
+#define TDVMCALL_REPORT_FATAL_ERROR    0x10003
+
 #ifndef __ASSEMBLY__
 
 /*
@@ -37,8 +51,58 @@ struct tdx_hypercall_args {
 u64 __tdx_hypercall(struct tdx_hypercall_args *args);
 u64 __tdx_hypercall_ret(struct tdx_hypercall_args *args);
 
+/*
+ * Wrapper for standard use of __tdx_hypercall with no output aside from
+ * return code.
+ */
+static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
+{
+       struct tdx_hypercall_args args = {
+               .r10 = TDX_HYPERCALL_STANDARD,
+               .r11 = fn,
+               .r12 = r12,
+               .r13 = r13,
+               .r14 = r14,
+               .r15 = r15,
+       };
+
+       return __tdx_hypercall(&args);
+}
+
+
 /* Called from __tdx_hypercall() for unrecoverable failure */
 void __tdx_hypercall_failed(void);
 
+/*
+ * Used in __tdx_module_call() to gather the output registers' values of the
+ * TDCALL instruction when requesting services from the TDX module. This is a
+ * software only structure and not part of the TDX module/VMM ABI
+ */
+struct tdx_module_output {
+       u64 rcx;
+       u64 rdx;
+       u64 r8;
+       u64 r9;
+       u64 r10;
+       u64 r11;
+};
+
+/* Used to communicate with the TDX module */
+u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
+                     struct tdx_module_output *out);
+
+bool tdx_accept_memory(phys_addr_t start, phys_addr_t end);
+
+/*
+ * The TDG.VP.VMCALL-Instruction-execution sub-functions are defined
+ * independently from but are currently matched 1:1 with VMX EXIT_REASONs.
+ * Reusing the KVM EXIT_REASON macros makes it easier to connect the host and
+ * guest sides of these calls.
+ */
+static __always_inline u64 hcall_func(u64 exit_reason)
+{
+        return exit_reason;
+}
+
 #endif /* !__ASSEMBLY__ */
 #endif /* _ASM_X86_SHARED_TDX_H */
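Editor's note: to make the new _tdx_hypercall() wrapper concrete, here is a hedged user-space mock (not kernel code) of how a caller might issue the TDVMCALL_MAP_GPA leaf defined in this hunk. The struct layout mirrors the header above, __tdx_hypercall() is replaced by a printing stub, and the TDX_HYPERCALL_STANDARD value and the GPA/size arguments are assumptions made purely for illustration.

#include <stdint.h>
#include <stdio.h>

#define TDX_HYPERCALL_STANDARD 0        /* assumed value for this sketch */
#define TDVMCALL_MAP_GPA       0x10001  /* from the hunk above */

struct tdx_hypercall_args {
	uint64_t r10, r11, r12, r13, r14, r15;
};

/* Stand-in for the real TDCALL trampoline */
static uint64_t __tdx_hypercall(struct tdx_hypercall_args *args)
{
	printf("TDG.VP.VMCALL: leaf=%#llx gpa=%#llx size=%#llx\n",
	       (unsigned long long)args->r11,
	       (unsigned long long)args->r12,
	       (unsigned long long)args->r13);
	return 0;
}

/* Same shape as the wrapper added above */
static uint64_t _tdx_hypercall(uint64_t fn, uint64_t r12, uint64_t r13,
			       uint64_t r14, uint64_t r15)
{
	struct tdx_hypercall_args args = {
		.r10 = TDX_HYPERCALL_STANDARD,
		.r11 = fn,
		.r12 = r12, .r13 = r13, .r14 = r14, .r15 = r15,
	};

	return __tdx_hypercall(&args);
}

int main(void)
{
	/* Illustrative only: ask the host to map one shared page at GPA 0x100000 */
	return (int)_tdx_hypercall(TDVMCALL_MAP_GPA, 0x100000, 0x1000, 0, 0);
}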
index 5b1ed65..84eab27 100644 (file)
@@ -85,6 +85,4 @@ struct rt_sigframe_x32 {
 
 #endif /* CONFIG_X86_64 */
 
-void __init init_sigframe_size(void);
-
 #endif /* _ASM_X86_SIGFRAME_H */
index 4e91054..600cf25 100644 (file)
@@ -38,7 +38,9 @@ struct smp_ops {
        void (*crash_stop_other_cpus)(void);
        void (*smp_send_reschedule)(int cpu);
 
-       int (*cpu_up)(unsigned cpu, struct task_struct *tidle);
+       void (*cleanup_dead_cpu)(unsigned cpu);
+       void (*poll_sync_state)(void);
+       int (*kick_ap_alive)(unsigned cpu, struct task_struct *tidle);
        int (*cpu_disable)(void);
        void (*cpu_die)(unsigned int cpu);
        void (*play_dead)(void);
@@ -78,11 +80,6 @@ static inline void smp_cpus_done(unsigned int max_cpus)
        smp_ops.smp_cpus_done(max_cpus);
 }
 
-static inline int __cpu_up(unsigned int cpu, struct task_struct *tidle)
-{
-       return smp_ops.cpu_up(cpu, tidle);
-}
-
 static inline int __cpu_disable(void)
 {
        return smp_ops.cpu_disable();
@@ -90,7 +87,8 @@ static inline int __cpu_disable(void)
 
 static inline void __cpu_die(unsigned int cpu)
 {
-       smp_ops.cpu_die(cpu);
+       if (smp_ops.cpu_die)
+               smp_ops.cpu_die(cpu);
 }
 
 static inline void __noreturn play_dead(void)
@@ -121,22 +119,23 @@ void native_smp_prepare_cpus(unsigned int max_cpus);
 void calculate_max_logical_packages(void);
 void native_smp_cpus_done(unsigned int max_cpus);
 int common_cpu_up(unsigned int cpunum, struct task_struct *tidle);
-int native_cpu_up(unsigned int cpunum, struct task_struct *tidle);
+int native_kick_ap(unsigned int cpu, struct task_struct *tidle);
 int native_cpu_disable(void);
-int common_cpu_die(unsigned int cpu);
-void native_cpu_die(unsigned int cpu);
 void __noreturn hlt_play_dead(void);
 void native_play_dead(void);
 void play_dead_common(void);
 void wbinvd_on_cpu(int cpu);
 int wbinvd_on_all_cpus(void);
-void cond_wakeup_cpu0(void);
+
+void smp_kick_mwait_play_dead(void);
 
 void native_smp_send_reschedule(int cpu);
 void native_send_call_func_ipi(const struct cpumask *mask);
 void native_send_call_func_single_ipi(int cpu);
 void x86_idle_thread_init(unsigned int cpu, struct task_struct *idle);
 
+bool smp_park_other_cpus_in_init(void);
+
 void smp_store_boot_cpu_info(void);
 void smp_store_cpu_info(int id);
 
@@ -201,7 +200,14 @@ extern void nmi_selftest(void);
 #endif
 
 extern unsigned int smpboot_control;
+extern unsigned long apic_mmio_base;
 
 #endif /* !__ASSEMBLY__ */
 
+/* Control bits for startup_64 */
+#define STARTUP_READ_APICID    0x80000000
+
+/* Top 8 bits are reserved for control */
+#define STARTUP_PARALLEL_MASK  0xFF000000
+
 #endif /* _ASM_X86_SMP_H */
index 5b85987..4fb36fb 100644 (file)
@@ -127,9 +127,11 @@ static inline int syscall_get_arch(struct task_struct *task)
 }
 
 void do_syscall_64(struct pt_regs *regs, int nr);
-void do_int80_syscall_32(struct pt_regs *regs);
-long do_fast_syscall_32(struct pt_regs *regs);
 
 #endif /* CONFIG_X86_32 */
 
+void do_int80_syscall_32(struct pt_regs *regs);
+long do_fast_syscall_32(struct pt_regs *regs);
+long do_SYSENTER_32(struct pt_regs *regs);
+
 #endif /* _ASM_X86_SYSCALL_H */
index 28d889c..603e6d1 100644 (file)
@@ -5,6 +5,8 @@
 
 #include <linux/init.h>
 #include <linux/bits.h>
+
+#include <asm/errno.h>
 #include <asm/ptrace.h>
 #include <asm/shared/tdx.h>
 
 #ifndef __ASSEMBLY__
 
 /*
- * Used to gather the output registers values of the TDCALL and SEAMCALL
- * instructions when requesting services from the TDX module.
- *
- * This is a software only structure and not part of the TDX module/VMM ABI.
- */
-struct tdx_module_output {
-       u64 rcx;
-       u64 rdx;
-       u64 r8;
-       u64 r9;
-       u64 r10;
-       u64 r11;
-};
-
-/*
  * Used by the #VE exception handler to gather the #VE exception
  * info from the TDX module. This is a software only structure
  * and not part of the TDX module/VMM ABI.
@@ -55,10 +42,6 @@ struct ve_info {
 
 void __init tdx_early_init(void);
 
-/* Used to communicate with the TDX module */
-u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
-                     struct tdx_module_output *out);
-
 void tdx_get_ve_info(struct ve_info *ve);
 
 bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve);
index 75bfaa4..80450e1 100644 (file)
@@ -14,6 +14,8 @@
 #include <asm/processor-flags.h>
 #include <asm/pgtable.h>
 
+DECLARE_PER_CPU(u64, tlbstate_untag_mask);
+
 void __flush_tlb_all(void);
 
 #define TLB_FLUSH_ALL  -1UL
@@ -54,15 +56,6 @@ static inline void cr4_clear_bits(unsigned long mask)
        local_irq_restore(flags);
 }
 
-#ifdef CONFIG_ADDRESS_MASKING
-DECLARE_PER_CPU(u64, tlbstate_untag_mask);
-
-static inline u64 current_untag_mask(void)
-{
-       return this_cpu_read(tlbstate_untag_mask);
-}
-#endif
-
 #ifndef MODULE
 /*
  * 6 because 6 should be plenty and struct tlb_state will fit in two cache
index 458c891..caf41c4 100644 (file)
@@ -31,9 +31,9 @@
  * CONFIG_NUMA.
  */
 #include <linux/numa.h>
+#include <linux/cpumask.h>
 
 #ifdef CONFIG_NUMA
-#include <linux/cpumask.h>
 
 #include <asm/mpspec.h>
 #include <asm/percpu.h>
@@ -139,23 +139,31 @@ static inline int topology_max_smt_threads(void)
 int topology_update_package_map(unsigned int apicid, unsigned int cpu);
 int topology_update_die_map(unsigned int dieid, unsigned int cpu);
 int topology_phys_to_logical_pkg(unsigned int pkg);
-int topology_phys_to_logical_die(unsigned int die, unsigned int cpu);
-bool topology_is_primary_thread(unsigned int cpu);
 bool topology_smt_supported(void);
-#else
+
+extern struct cpumask __cpu_primary_thread_mask;
+#define cpu_primary_thread_mask ((const struct cpumask *)&__cpu_primary_thread_mask)
+
+/**
+ * topology_is_primary_thread - Check whether CPU is the primary SMT thread
+ * @cpu:       CPU to check
+ */
+static inline bool topology_is_primary_thread(unsigned int cpu)
+{
+       return cpumask_test_cpu(cpu, cpu_primary_thread_mask);
+}
+#else /* CONFIG_SMP */
 #define topology_max_packages()                        (1)
 static inline int
 topology_update_package_map(unsigned int apicid, unsigned int cpu) { return 0; }
 static inline int
 topology_update_die_map(unsigned int dieid, unsigned int cpu) { return 0; }
 static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
-static inline int topology_phys_to_logical_die(unsigned int die,
-               unsigned int cpu) { return 0; }
 static inline int topology_max_die_per_package(void) { return 1; }
 static inline int topology_max_smt_threads(void) { return 1; }
 static inline bool topology_is_primary_thread(unsigned int cpu) { return true; }
 static inline bool topology_smt_supported(void) { return false; }
-#endif
+#endif /* !CONFIG_SMP */
 
 static inline void arch_fix_phys_package_id(int num, u32 slot)
 {
index fbdc3d9..dc1b03b 100644 (file)
@@ -55,12 +55,10 @@ extern bool tsc_async_resets;
 #ifdef CONFIG_X86_TSC
 extern bool tsc_store_and_check_tsc_adjust(bool bootcpu);
 extern void tsc_verify_tsc_adjust(bool resume);
-extern void check_tsc_sync_source(int cpu);
 extern void check_tsc_sync_target(void);
 #else
 static inline bool tsc_store_and_check_tsc_adjust(bool bootcpu) { return false; }
 static inline void tsc_verify_tsc_adjust(bool resume) { }
-static inline void check_tsc_sync_source(int cpu) { }
 static inline void check_tsc_sync_target(void) { }
 #endif
 
diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h
new file mode 100644 (file)
index 0000000..f5937e9
--- /dev/null
@@ -0,0 +1,27 @@
+#ifndef _ASM_X86_UNACCEPTED_MEMORY_H
+#define _ASM_X86_UNACCEPTED_MEMORY_H
+
+#include <linux/efi.h>
+#include <asm/tdx.h>
+#include <asm/sev.h>
+
+static inline void arch_accept_memory(phys_addr_t start, phys_addr_t end)
+{
+       /* Platform-specific memory-acceptance call goes here */
+       if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
+               if (!tdx_accept_memory(start, end))
+                       panic("TDX: Failed to accept memory\n");
+       } else if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
+               snp_accept_memory(start, end);
+       } else {
+               panic("Cannot accept memory: unknown platform\n");
+       }
+}
+
+static inline struct efi_unaccepted_memory *efi_get_unaccepted_table(void)
+{
+       if (efi.unaccepted == EFI_INVALID_TABLE_ADDR)
+               return NULL;
+       return __va(efi.unaccepted);
+}
+#endif
index 01cb969..85cc57c 100644 (file)
 
 #else
 
+#define UNWIND_HINT_UNDEFINED \
+       UNWIND_HINT(UNWIND_HINT_TYPE_UNDEFINED, 0, 0, 0)
+
 #define UNWIND_HINT_FUNC \
        UNWIND_HINT(UNWIND_HINT_TYPE_FUNC, ORC_REG_SP, 8, 0)
 
+#define UNWIND_HINT_SAVE \
+       UNWIND_HINT(UNWIND_HINT_TYPE_SAVE, 0, 0, 0)
+
+#define UNWIND_HINT_RESTORE \
+       UNWIND_HINT(UNWIND_HINT_TYPE_RESTORE, 0, 0, 0)
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_UNWIND_HINTS_H */
index d3e3197..5fa76c2 100644 (file)
@@ -177,6 +177,7 @@ struct uv_hub_info_s {
        unsigned short          nr_possible_cpus;
        unsigned short          nr_online_cpus;
        short                   memory_nid;
+       unsigned short          *node_to_socket;
 };
 
 /* CPU specific info with a pointer to the hub common info struct */
@@ -519,25 +520,30 @@ static inline int uv_socket_to_node(int socket)
        return _uv_socket_to_node(socket, uv_hub_info->socket_to_node);
 }
 
+static inline int uv_pnode_to_socket(int pnode)
+{
+       unsigned short *p2s = uv_hub_info->pnode_to_socket;
+
+       return p2s ? p2s[pnode - uv_hub_info->min_pnode] : pnode;
+}
+
 /* pnode, offset --> socket virtual */
 static inline void *uv_pnode_offset_to_vaddr(int pnode, unsigned long offset)
 {
        unsigned int m_val = uv_hub_info->m_val;
        unsigned long base;
-       unsigned short sockid, node, *p2s;
+       unsigned short sockid;
 
        if (m_val)
                return __va(((unsigned long)pnode << m_val) | offset);
 
-       p2s = uv_hub_info->pnode_to_socket;
-       sockid = p2s ? p2s[pnode - uv_hub_info->min_pnode] : pnode;
-       node = uv_socket_to_node(sockid);
+       sockid = uv_pnode_to_socket(pnode);
 
        /* limit address of previous socket is our base, except node 0 is 0 */
-       if (!node)
+       if (sockid == 0)
                return __va((unsigned long)offset);
 
-       base = (unsigned long)(uv_hub_info->gr_table[node - 1].limit);
+       base = (unsigned long)(uv_hub_info->gr_table[sockid - 1].limit);
        return __va(base << UV_GAM_RANGE_SHFT | offset);
 }
 
@@ -644,7 +650,7 @@ static inline int uv_cpu_blade_processor_id(int cpu)
 /* Blade number to Node number (UV2..UV4 is 1:1) */
 static inline int uv_blade_to_node(int blade)
 {
-       return blade;
+       return uv_socket_to_node(blade);
 }
 
 /* Blade number of current cpu. Numbered 0 .. <#blades -1> */
@@ -656,23 +662,27 @@ static inline int uv_numa_blade_id(void)
 /*
  * Convert linux node number to the UV blade number.
  * .. Currently for UV2 thru UV4 the node and the blade are identical.
- * .. If this changes then you MUST check references to this function!
+ * .. UV5 needs conversion when sub-numa clustering is enabled.
  */
 static inline int uv_node_to_blade_id(int nid)
 {
-       return nid;
+       unsigned short *n2s = uv_hub_info->node_to_socket;
+
+       return n2s ? n2s[nid] : nid;
 }
 
 /* Convert a CPU number to the UV blade number */
 static inline int uv_cpu_to_blade_id(int cpu)
 {
-       return uv_node_to_blade_id(cpu_to_node(cpu));
+       return uv_cpu_hub_info(cpu)->numa_blade_id;
 }
 
 /* Convert a blade id to the PNODE of the blade */
 static inline int uv_blade_to_pnode(int bid)
 {
-       return uv_hub_info_list(uv_blade_to_node(bid))->pnode;
+       unsigned short *s2p = uv_hub_info->socket_to_pnode;
+
+       return s2p ? s2p[bid] : bid;
 }
 
 /* Nid of memory node on blade. -1 if no blade-local memory */
index 57fa673..bb45812 100644 (file)
@@ -4199,6 +4199,13 @@ union uvh_rh_gam_mmioh_overlay_config1_u {
 #define UV3H_RH_GAM_MMIOH_REDIRECT_CONFIG0_NASID_SHFT  0
 #define UV3H_RH_GAM_MMIOH_REDIRECT_CONFIG0_NASID_MASK  0x0000000000007fffUL
 
+/* UVH common defines */
+#define UVH_RH_GAM_MMIOH_REDIRECT_CONFIG0_NASID_MASK (                 \
+       is_uv(UV4A) ? UV4AH_RH_GAM_MMIOH_REDIRECT_CONFIG0_NASID_MASK :  \
+       is_uv(UV4)  ?  UV4H_RH_GAM_MMIOH_REDIRECT_CONFIG0_NASID_MASK :  \
+       is_uv(UV3)  ?  UV3H_RH_GAM_MMIOH_REDIRECT_CONFIG0_NASID_MASK :  \
+       0)
+
 
 union uvh_rh_gam_mmioh_redirect_config0_u {
        unsigned long   v;
@@ -4247,8 +4254,8 @@ union uvh_rh_gam_mmioh_redirect_config0_u {
        0)
 
 /* UV4A unique defines */
-#define UV4AH_RH_GAM_MMIOH_REDIRECT_CONFIG0_NASID_SHFT 0
-#define UV4AH_RH_GAM_MMIOH_REDIRECT_CONFIG0_NASID_MASK 0x0000000000000fffUL
+#define UV4AH_RH_GAM_MMIOH_REDIRECT_CONFIG1_NASID_SHFT 0
+#define UV4AH_RH_GAM_MMIOH_REDIRECT_CONFIG1_NASID_MASK 0x0000000000000fffUL
 
 /* UV4 unique defines */
 #define UV4H_RH_GAM_MMIOH_REDIRECT_CONFIG1_NASID_SHFT  0
@@ -4258,6 +4265,13 @@ union uvh_rh_gam_mmioh_redirect_config0_u {
 #define UV3H_RH_GAM_MMIOH_REDIRECT_CONFIG1_NASID_SHFT  0
 #define UV3H_RH_GAM_MMIOH_REDIRECT_CONFIG1_NASID_MASK  0x0000000000007fffUL
 
+/* UVH common defines */
+#define UVH_RH_GAM_MMIOH_REDIRECT_CONFIG1_NASID_MASK (                 \
+       is_uv(UV4A) ? UV4AH_RH_GAM_MMIOH_REDIRECT_CONFIG1_NASID_MASK :  \
+       is_uv(UV4)  ?  UV4H_RH_GAM_MMIOH_REDIRECT_CONFIG1_NASID_MASK :  \
+       is_uv(UV3)  ?  UV3H_RH_GAM_MMIOH_REDIRECT_CONFIG1_NASID_MASK :  \
+       0)
+
 
 union uvh_rh_gam_mmioh_redirect_config1_u {
        unsigned long   v;
index 4cf6794..c81858d 100644 (file)
@@ -231,14 +231,19 @@ static u64 vread_pvclock(void)
                ret = __pvclock_read_cycles(pvti, rdtsc_ordered());
        } while (pvclock_read_retry(pvti, version));
 
-       return ret;
+       return ret & S64_MAX;
 }
 #endif
 
 #ifdef CONFIG_HYPERV_TIMER
 static u64 vread_hvclock(void)
 {
-       return hv_read_tsc_page(&hvclock_page);
+       u64 tsc, time;
+
+       if (hv_read_tsc_page_tsc(&hvclock_page, &tsc, &time))
+               return time & S64_MAX;
+
+       return U64_MAX;
 }
 #endif
 
@@ -246,7 +251,7 @@ static inline u64 __arch_get_hw_counter(s32 clock_mode,
                                        const struct vdso_data *vd)
 {
        if (likely(clock_mode == VDSO_CLOCKMODE_TSC))
-               return (u64)rdtsc_ordered();
+               return (u64)rdtsc_ordered() & S64_MAX;
        /*
         * For any memory-mapped vclock type, we need to make sure that gcc
         * doesn't cleverly hoist a load before the mode check.  Otherwise we
@@ -284,6 +289,9 @@ static inline bool arch_vdso_clocksource_ok(const struct vdso_data *vd)
  * which can be invalidated asynchronously and indicate invalidation by
  * returning U64_MAX, which can be effectively tested by checking for a
  * negative value after casting it to s64.
+ *
+ * This effectively forces a S64_MAX mask on the calculations, unlike the
+ * U64_MAX mask normally used by x86 clocksources.
  */
 static inline bool arch_vdso_cycles_ok(u64 cycles)
 {
@@ -303,18 +311,29 @@ static inline bool arch_vdso_cycles_ok(u64 cycles)
  * @last. If not then use @last, which is the base time of the current
  * conversion period.
  *
- * This variant also removes the masking of the subtraction because the
- * clocksource mask of all VDSO capable clocksources on x86 is U64_MAX
- * which would result in a pointless operation. The compiler cannot
- * optimize it away as the mask comes from the vdso data and is not compile
- * time constant.
+ * This variant also uses a custom mask because while the clocksource mask of
+ * all the VDSO capable clocksources on x86 is U64_MAX, the above code uses
+ * U64_MAX as an exception value; additionally arch_vdso_cycles_ok() above
+ * declares everything with the MSB/Sign-bit set as invalid. Therefore the
+ * effective mask is S64_MAX.
  */
 static __always_inline
 u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
 {
-       if (cycles > last)
-               return (cycles - last) * mult;
-       return 0;
+       /*
+        * Due to the MSB/Sign-bit being used as an invalid marker (see
+        * arch_vdso_cycles_ok() above), the effective mask is S64_MAX.
+        */
+       u64 delta = (cycles - last) & S64_MAX;
+
+       /*
+        * Due to the above mentioned TSC wobbles, filter out negative motion.
+        * Per the above masking, the effective sign bit is now bit 62.
+        */
+       if (unlikely(delta & (1ULL << 62)))
+               return 0;
+
+       return delta * mult;
 }
 #define vdso_calc_delta vdso_calc_delta
 
index 498dc60..0d02c4a 100644 (file)
@@ -13,7 +13,9 @@
 
 
 #include <linux/bitops.h>
+#include <linux/bug.h>
 #include <linux/types.h>
+
 #include <uapi/asm/vmx.h>
 #include <asm/vmxfeatures.h>
 
index 88085f3..5240d88 100644 (file)
@@ -150,7 +150,7 @@ struct x86_init_acpi {
  * @enc_cache_flush_required   Returns true if a cache flush is needed before changing page encryption status
  */
 struct x86_guest {
-       void (*enc_status_change_prepare)(unsigned long vaddr, int npages, bool enc);
+       bool (*enc_status_change_prepare)(unsigned long vaddr, int npages, bool enc);
        bool (*enc_status_change_finish)(unsigned long vaddr, int npages, bool enc);
        bool (*enc_tlb_flush_required)(bool enc);
        bool (*enc_cache_flush_required)(void);
@@ -177,11 +177,14 @@ struct x86_init_ops {
  * struct x86_cpuinit_ops - platform specific cpu hotplug setups
  * @setup_percpu_clockev:      set up the per cpu clock event device
  * @early_percpu_clock_init:   early init of the per cpu clock event device
+ * @fixup_cpu_id:              fixup function for cpuinfo_x86::phys_proc_id
+ * @parallel_bringup:          Parallel bringup control
  */
 struct x86_cpuinit_ops {
        void (*setup_percpu_clockev)(void);
        void (*early_percpu_clock_init)(void);
        void (*fixup_cpu_id)(struct cpuinfo_x86 *c, int node);
+       bool parallel_bringup;
 };
 
 struct timespec64;
index 376563f..3a8a8eb 100644 (file)
@@ -81,14 +81,6 @@ typedef __u8 mtrr_type;
 #define MTRR_NUM_FIXED_RANGES 88
 #define MTRR_MAX_VAR_RANGES 256
 
-struct mtrr_state_type {
-       struct mtrr_var_range var_ranges[MTRR_MAX_VAR_RANGES];
-       mtrr_type fixed_ranges[MTRR_NUM_FIXED_RANGES];
-       unsigned char enabled;
-       unsigned char have_fixed;
-       mtrr_type def_type;
-};
-
 #define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg))
 #define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
 
@@ -115,9 +107,9 @@ struct mtrr_state_type {
 #define MTRR_NUM_TYPES       7
 
 /*
- * Invalid MTRR memory type.  mtrr_type_lookup() returns this value when
- * MTRRs are disabled.  Note, this value is allocated from the reserved
- * values (0x7-0xff) of the MTRR memory types.
+ * Invalid MTRR memory type.  No longer used outside of MTRR code.
+ * Note, this value is allocated from the reserved values (0x7-0xff) of
+ * the MTRR memory types.
  */
 #define MTRR_TYPE_INVALID    0xff
 
index dd61752..4070a01 100644 (file)
@@ -17,6 +17,7 @@ CFLAGS_REMOVE_ftrace.o = -pg
 CFLAGS_REMOVE_early_printk.o = -pg
 CFLAGS_REMOVE_head64.o = -pg
 CFLAGS_REMOVE_sev.o = -pg
+CFLAGS_REMOVE_rethook.o = -pg
 endif
 
 KASAN_SANITIZE_head$(BITS).o                           := n
index 1328c22..6dfecb2 100644 (file)
@@ -16,6 +16,7 @@
 #include <asm/cacheflush.h>
 #include <asm/realmode.h>
 #include <asm/hypervisor.h>
+#include <asm/smp.h>
 
 #include <linux/ftrace.h>
 #include "../../realmode/rm/wakeup.h"
@@ -127,7 +128,13 @@ int x86_acpi_suspend_lowlevel(void)
         * value is in the actual %rsp register.
         */
        current->thread.sp = (unsigned long)temp_stack + sizeof(temp_stack);
-       smpboot_control = smp_processor_id();
+       /*
+        * Ensure the CPU knows which one it is when it comes back, if
+        * it isn't in parallel mode and expected to work that out for
+        * itself.
+        */
+       if (!(smpboot_control & STARTUP_PARALLEL_MASK))
+               smpboot_control = smp_processor_id();
 #endif
        initial_code = (unsigned long)wakeup_long64;
        saved_magic = 0x123456789abcdef0L;
index 171a40c..054c15a 100644 (file)
@@ -12,7 +12,6 @@ extern int wakeup_pmode_return;
 
 extern u8 wake_sleep_flags;
 
-extern unsigned long acpi_copy_wakeup_routine(unsigned long);
 extern void wakeup_long64(void);
 
 extern void do_suspend_lowlevel(void);
index f615e0c..72646d7 100644 (file)
@@ -37,11 +37,23 @@ EXPORT_SYMBOL_GPL(alternatives_patched);
 
 #define MAX_PATCH_LEN (255-1)
 
-static int __initdata_or_module debug_alternative;
+#define DA_ALL         (~0)
+#define DA_ALT         0x01
+#define DA_RET         0x02
+#define DA_RETPOLINE   0x04
+#define DA_ENDBR       0x08
+#define DA_SMP         0x10
+
+static unsigned int __initdata_or_module debug_alternative;
 
 static int __init debug_alt(char *str)
 {
-       debug_alternative = 1;
+       if (str && *str == '=')
+               str++;
+
+       if (!str || kstrtouint(str, 0, &debug_alternative))
+               debug_alternative = DA_ALL;
+
        return 1;
 }
 __setup("debug-alternative", debug_alt);
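Editor's note: with this change, debug-alternative= becomes a bitmask rather than a plain on/off switch; passing no value (or an unparsable one) falls back to DA_ALL. The user-space sketch below only restates the bit test that DPRINTK()/DUMP_BYTES() perform with the DA_* values copied from the hunk above; the command-line handling is purely illustrative.

#include <stdio.h>
#include <stdlib.h>

#define DA_ALT       0x01
#define DA_RET       0x02
#define DA_RETPOLINE 0x04
#define DA_ENDBR     0x08
#define DA_SMP       0x10

int main(int argc, char **argv)
{
	/* e.g. "./a.out 0x6" -> only RET and RETPOLINE messages enabled */
	unsigned int mask = argc > 1 ? (unsigned int)strtoul(argv[1], NULL, 0) : ~0u;

	printf("ALT:%d RET:%d RETPOLINE:%d ENDBR:%d SMP:%d\n",
	       !!(mask & DA_ALT), !!(mask & DA_RET), !!(mask & DA_RETPOLINE),
	       !!(mask & DA_ENDBR), !!(mask & DA_SMP));
	return 0;
}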
@@ -55,15 +67,15 @@ static int __init setup_noreplace_smp(char *str)
 }
 __setup("noreplace-smp", setup_noreplace_smp);
 
-#define DPRINTK(fmt, args...)                                          \
+#define DPRINTK(type, fmt, args...)                                    \
 do {                                                                   \
-       if (debug_alternative)                                          \
+       if (debug_alternative & DA_##type)                              \
                printk(KERN_DEBUG pr_fmt(fmt) "\n", ##args);            \
 } while (0)
 
-#define DUMP_BYTES(buf, len, fmt, args...)                             \
+#define DUMP_BYTES(type, buf, len, fmt, args...)                       \
 do {                                                                   \
-       if (unlikely(debug_alternative)) {                              \
+       if (unlikely(debug_alternative & DA_##type)) {                  \
                int j;                                                  \
                                                                        \
                if (!(len))                                             \
@@ -86,6 +98,11 @@ static const unsigned char x86nops[] =
        BYTES_NOP6,
        BYTES_NOP7,
        BYTES_NOP8,
+#ifdef CONFIG_64BIT
+       BYTES_NOP9,
+       BYTES_NOP10,
+       BYTES_NOP11,
+#endif
 };
 
 const unsigned char * const x86_nops[ASM_NOP_MAX+1] =
@@ -99,19 +116,44 @@ const unsigned char * const x86_nops[ASM_NOP_MAX+1] =
        x86nops + 1 + 2 + 3 + 4 + 5,
        x86nops + 1 + 2 + 3 + 4 + 5 + 6,
        x86nops + 1 + 2 + 3 + 4 + 5 + 6 + 7,
+#ifdef CONFIG_64BIT
+       x86nops + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8,
+       x86nops + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9,
+       x86nops + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10,
+#endif
 };
 
-/* Use this to add nops to a buffer, then text_poke the whole buffer. */
-static void __init_or_module add_nops(void *insns, unsigned int len)
+/*
+ * Fill the buffer with a single effective instruction of size @len.
+ *
+ * In order not to issue an ORC stack depth tracking CFI entry (Call Frame Info)
+ * for every single-byte NOP, try to generate the maximally available NOP of
+ * size <= ASM_NOP_MAX such that only a single CFI entry is generated (vs one for
+ * each single-byte NOP). If @len to fill out is > ASM_NOP_MAX, pad with INT3 and
+ * *jump* over instead of executing long and daft NOPs.
+ */
+static void __init_or_module add_nop(u8 *instr, unsigned int len)
 {
-       while (len > 0) {
-               unsigned int noplen = len;
-               if (noplen > ASM_NOP_MAX)
-                       noplen = ASM_NOP_MAX;
-               memcpy(insns, x86_nops[noplen], noplen);
-               insns += noplen;
-               len -= noplen;
+       u8 *target = instr + len;
+
+       if (!len)
+               return;
+
+       if (len <= ASM_NOP_MAX) {
+               memcpy(instr, x86_nops[len], len);
+               return;
        }
+
+       if (len < 128) {
+               __text_gen_insn(instr, JMP8_INSN_OPCODE, instr, target, JMP8_INSN_SIZE);
+               instr += JMP8_INSN_SIZE;
+       } else {
+               __text_gen_insn(instr, JMP32_INSN_OPCODE, instr, target, JMP32_INSN_SIZE);
+               instr += JMP32_INSN_SIZE;
+       }
+
+       for (;instr < target; instr++)
+               *instr = INT3_INSN_OPCODE;
 }
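Editor's note: the sketch below captures only the size-based decision described in the comment above add_nop(); no real opcodes are emitted. ASM_NOP_MAX is shown with the value 8 purely for illustration (this hunk extends the NOP table to 11 bytes on 64-bit), and the JMP sizes are the usual 2-byte EB rel8 and 5-byte E9 rel32 encodings.

#include <stdio.h>

#define ASM_NOP_MAX 8   /* illustrative only, see note above */
#define JMP8_SIZE   2   /* EB rel8  */
#define JMP32_SIZE  5   /* E9 rel32 */

/* Sketch of the padding decision only; the real add_nop() writes actual bytes */
static void describe_fill(unsigned int len)
{
	if (!len)
		printf("len=%u: nothing to do\n", len);
	else if (len <= ASM_NOP_MAX)
		printf("len=%u: one %u-byte NOP\n", len, len);
	else if (len < 128)
		printf("len=%u: JMP.d8 over %u INT3 bytes\n", len, len - JMP8_SIZE);
	else
		printf("len=%u: JMP.d32 over %u INT3 bytes\n", len, len - JMP32_SIZE);
}

int main(void)
{
	unsigned int lens[] = { 0, 3, 8, 20, 200 };

	for (unsigned int i = 0; i < sizeof(lens) / sizeof(lens[0]); i++)
		describe_fill(lens[i]);
	return 0;
}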
 
 extern s32 __retpoline_sites[], __retpoline_sites_end[];
@@ -123,133 +165,223 @@ extern s32 __smp_locks[], __smp_locks_end[];
 void text_poke_early(void *addr, const void *opcode, size_t len);
 
 /*
- * Are we looking at a near JMP with a 1 or 4-byte displacement.
+ * Matches NOP and NOPL, not any of the other possible NOPs.
  */
-static inline bool is_jmp(const u8 opcode)
+static bool insn_is_nop(struct insn *insn)
 {
-       return opcode == 0xeb || opcode == 0xe9;
+       /* Anything NOP, but no REP NOP */
+       if (insn->opcode.bytes[0] == 0x90 &&
+           (!insn->prefixes.nbytes || insn->prefixes.bytes[0] != 0xF3))
+               return true;
+
+       /* NOPL */
+       if (insn->opcode.bytes[0] == 0x0F && insn->opcode.bytes[1] == 0x1F)
+               return true;
+
+       /* TODO: more nops */
+
+       return false;
 }
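Editor's note: restated as a stand-alone sketch, the predicate above accepts 0x90 only when it is not prefixed with F3 (which would make it PAUSE, i.e. REP NOP) and accepts the 0F 1F NOPL family. The mini_insn type below is a simplified stand-in for the kernel's struct insn, used purely for illustration.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for struct insn, just enough for the predicate */
struct mini_insn {
	uint8_t opcode[2];
	uint8_t prefix;         /* 0 if none */
};

static bool insn_is_nop(const struct mini_insn *insn)
{
	/* 0x90 is NOP, but F3 90 is PAUSE (REP NOP), so reject the prefix */
	if (insn->opcode[0] == 0x90 && insn->prefix != 0xF3)
		return true;
	/* 0F 1F is the multi-byte NOPL family */
	if (insn->opcode[0] == 0x0F && insn->opcode[1] == 0x1F)
		return true;
	return false;
}

int main(void)
{
	struct mini_insn nop   = { {0x90, 0x00}, 0x00 };
	struct mini_insn pause = { {0x90, 0x00}, 0xF3 };
	struct mini_insn nopl  = { {0x0F, 0x1F}, 0x00 };

	/* Prints "1 0 1" */
	printf("%d %d %d\n", insn_is_nop(&nop), insn_is_nop(&pause),
	       insn_is_nop(&nopl));
	return 0;
}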
 
-static void __init_or_module
-recompute_jump(struct alt_instr *a, u8 *orig_insn, u8 *repl_insn, u8 *insn_buff)
+/*
+ * Find the offset of the first non-NOP instruction starting at @offset
+ * but no further than @len.
+ */
+static int skip_nops(u8 *instr, int offset, int len)
 {
-       u8 *next_rip, *tgt_rip;
-       s32 n_dspl, o_dspl;
-       int repl_len;
+       struct insn insn;
 
-       if (a->replacementlen != 5)
-               return;
+       for (; offset < len; offset += insn.length) {
+               if (insn_decode_kernel(&insn, &instr[offset]))
+                       break;
 
-       o_dspl = *(s32 *)(insn_buff + 1);
+               if (!insn_is_nop(&insn))
+                       break;
+       }
 
-       /* next_rip of the replacement JMP */
-       next_rip = repl_insn + a->replacementlen;
-       /* target rip of the replacement JMP */
-       tgt_rip  = next_rip + o_dspl;
-       n_dspl = tgt_rip - orig_insn;
+       return offset;
+}
 
-       DPRINTK("target RIP: %px, new_displ: 0x%x", tgt_rip, n_dspl);
+/*
+ * Optimize a sequence of NOPs, possibly preceded by an unconditional jump
+ * to the end of the NOP sequence into a single NOP.
+ */
+static bool __init_or_module
+__optimize_nops(u8 *instr, size_t len, struct insn *insn, int *next, int *prev, int *target)
+{
+       int i = *next - insn->length;
 
-       if (tgt_rip - orig_insn >= 0) {
-               if (n_dspl - 2 <= 127)
-                       goto two_byte_jmp;
-               else
-                       goto five_byte_jmp;
-       /* negative offset */
-       } else {
-               if (((n_dspl - 2) & 0xff) == (n_dspl - 2))
-                       goto two_byte_jmp;
-               else
-                       goto five_byte_jmp;
+       switch (insn->opcode.bytes[0]) {
+       case JMP8_INSN_OPCODE:
+       case JMP32_INSN_OPCODE:
+               *prev = i;
+               *target = *next + insn->immediate.value;
+               return false;
        }
 
-two_byte_jmp:
-       n_dspl -= 2;
+       if (insn_is_nop(insn)) {
+               int nop = i;
 
-       insn_buff[0] = 0xeb;
-       insn_buff[1] = (s8)n_dspl;
-       add_nops(insn_buff + 2, 3);
+               *next = skip_nops(instr, *next, len);
+               if (*target && *next == *target)
+                       nop = *prev;
 
-       repl_len = 2;
-       goto done;
+               add_nop(instr + nop, *next - nop);
+               DUMP_BYTES(ALT, instr, len, "%px: [%d:%d) optimized NOPs: ", instr, nop, *next);
+               return true;
+       }
+
+       *target = 0;
+       return false;
+}
 
-five_byte_jmp:
-       n_dspl -= 5;
+/*
+ * "noinline" to cause control flow change and thus invalidate I$ and
+ * cause refetch after modification.
+ */
+static void __init_or_module noinline optimize_nops(u8 *instr, size_t len)
+{
+       int prev, target = 0;
 
-       insn_buff[0] = 0xe9;
-       *(s32 *)&insn_buff[1] = n_dspl;
+       for (int next, i = 0; i < len; i = next) {
+               struct insn insn;
 
-       repl_len = 5;
+               if (insn_decode_kernel(&insn, &instr[i]))
+                       return;
 
-done:
+               next = i + insn.length;
 
-       DPRINTK("final displ: 0x%08x, JMP 0x%lx",
-               n_dspl, (unsigned long)orig_insn + n_dspl + repl_len);
+               __optimize_nops(instr, len, &insn, &next, &prev, &target);
+       }
 }
 
 /*
- * optimize_nops_range() - Optimize a sequence of single byte NOPs (0x90)
+ * In this context, "source" is where the instructions are placed in the
+ * section .altinstr_replacement, for example during kernel build by the
+ * toolchain.
+ * "Destination" is where the instructions are being patched in by this
+ * machinery.
  *
- * @instr: instruction byte stream
- * @instrlen: length of the above
- * @off: offset within @instr where the first NOP has been detected
+ * The source offset is:
  *
- * Return: number of NOPs found (and replaced).
+ *   src_imm = target - src_next_ip                  (1)
+ *
+ * and the target offset is:
+ *
+ *   dst_imm = target - dst_next_ip                  (2)
+ *
+ * so rework (1) as an expression for target like:
+ *
+ *   target = src_imm + src_next_ip                  (1a)
+ *
+ * and substitute in (2) to get:
+ *
+ *   dst_imm = (src_imm + src_next_ip) - dst_next_ip (3)
+ *
+ * Now, since the instruction stream is 'identical' at src and dst (it
+ * is being copied after all) it can be stated that:
+ *
+ *   src_next_ip = src + ip_offset
+ *   dst_next_ip = dst + ip_offset                   (4)
+ *
+ * Substitute (4) in (3) and observe ip_offset being cancelled out to
+ * obtain:
+ *
+ *   dst_imm = src_imm + (src + ip_offset) - (dst + ip_offset)
+ *           = src_imm + src - dst + ip_offset - ip_offset
+ *           = src_imm + src - dst                   (5)
+ *
+ * IOW, only the relative displacement of the code block matters.
  */
-static __always_inline int optimize_nops_range(u8 *instr, u8 instrlen, int off)
-{
-       unsigned long flags;
-       int i = off, nnops;
 
-       while (i < instrlen) {
-               if (instr[i] != 0x90)
-                       break;
+#define apply_reloc_n(n_, p_, d_)                              \
+       do {                                                    \
+               s32 v = *(s##n_ *)(p_);                         \
+               v += (d_);                                      \
+               BUG_ON((v >> 31) != (v >> (n_-1)));             \
+               *(s##n_ *)(p_) = (s##n_)v;                      \
+       } while (0)
+
 
-               i++;
+static __always_inline
+void apply_reloc(int n, void *ptr, uintptr_t diff)
+{
+       switch (n) {
+       case 1: apply_reloc_n(8, ptr, diff); break;
+       case 2: apply_reloc_n(16, ptr, diff); break;
+       case 4: apply_reloc_n(32, ptr, diff); break;
+       default: BUG();
        }
+}
 
-       nnops = i - off;
+static __always_inline
+bool need_reloc(unsigned long offset, u8 *src, size_t src_len)
+{
+       u8 *target = src + offset;
+       /*
+        * If the target is inside the patched block, it's relative to the
+        * block itself and does not need relocation.
+        */
+       return (target < src || target > src + src_len);
+}
 
-       if (nnops <= 1)
-               return nnops;
+static void __init_or_module noinline
+apply_relocation(u8 *buf, size_t len, u8 *dest, u8 *src, size_t src_len)
+{
+       int prev, target = 0;
 
-       local_irq_save(flags);
-       add_nops(instr + off, nnops);
-       local_irq_restore(flags);
+       for (int next, i = 0; i < len; i = next) {
+               struct insn insn;
 
-       DUMP_BYTES(instr, instrlen, "%px: [%d:%d) optimized NOPs: ", instr, off, i);
+               if (WARN_ON_ONCE(insn_decode_kernel(&insn, &buf[i])))
+                       return;
 
-       return nnops;
-}
+               next = i + insn.length;
 
-/*
- * "noinline" to cause control flow change and thus invalidate I$ and
- * cause refetch after modification.
- */
-static void __init_or_module noinline optimize_nops(u8 *instr, size_t len)
-{
-       struct insn insn;
-       int i = 0;
+               if (__optimize_nops(buf, len, &insn, &next, &prev, &target))
+                       continue;
 
-       /*
-        * Jump over the non-NOP insns and optimize single-byte NOPs into bigger
-        * ones.
-        */
-       for (;;) {
-               if (insn_decode_kernel(&insn, &instr[i]))
-                       return;
+               switch (insn.opcode.bytes[0]) {
+               case 0x0f:
+                       if (insn.opcode.bytes[1] < 0x80 ||
+                           insn.opcode.bytes[1] > 0x8f)
+                               break;
 
-               /*
-                * See if this and any potentially following NOPs can be
-                * optimized.
-                */
-               if (insn.length == 1 && insn.opcode.bytes[0] == 0x90)
-                       i += optimize_nops_range(instr, len, i);
-               else
-                       i += insn.length;
+                       fallthrough;    /* Jcc.d32 */
+               case 0x70 ... 0x7f:     /* Jcc.d8 */
+               case JMP8_INSN_OPCODE:
+               case JMP32_INSN_OPCODE:
+               case CALL_INSN_OPCODE:
+                       if (need_reloc(next + insn.immediate.value, src, src_len)) {
+                               apply_reloc(insn.immediate.nbytes,
+                                           buf + i + insn_offset_immediate(&insn),
+                                           src - dest);
+                       }
 
-               if (i >= len)
-                       return;
+                       /*
+                        * Where possible, convert JMP.d32 into JMP.d8.
+                        */
+                       if (insn.opcode.bytes[0] == JMP32_INSN_OPCODE) {
+                               s32 imm = insn.immediate.value;
+                               imm += src - dest;
+                               imm += JMP32_INSN_SIZE - JMP8_INSN_SIZE;
+                               if ((imm >> 31) == (imm >> 7)) {
+                                       buf[i+0] = JMP8_INSN_OPCODE;
+                                       buf[i+1] = (s8)imm;
+
+                                       memset(&buf[i+2], INT3_INSN_OPCODE, insn.length - 2);
+                               }
+                       }
+                       break;
+               }
+
+               if (insn_rip_relative(&insn)) {
+                       if (need_reloc(next + insn.displacement.value, src, src_len)) {
+                               apply_reloc(insn.displacement.nbytes,
+                                           buf + i + insn_offset_displacement(&insn),
+                                           src - dest);
+                       }
+               }
        }
 }
 
@@ -270,7 +402,7 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
        u8 *instr, *replacement;
        u8 insn_buff[MAX_PATCH_LEN];
 
-       DPRINTK("alt table %px, -> %px", start, end);
+       DPRINTK(ALT, "alt table %px, -> %px", start, end);
        /*
         * The scan order should be from start to end. A later scanned
         * alternative code can overwrite previously scanned alternative code.
@@ -294,47 +426,31 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
                 * - feature not present but ALT_FLAG_NOT is set to mean,
                 *   patch if feature is *NOT* present.
                 */
-               if (!boot_cpu_has(a->cpuid) == !(a->flags & ALT_FLAG_NOT))
-                       goto next;
+               if (!boot_cpu_has(a->cpuid) == !(a->flags & ALT_FLAG_NOT)) {
+                       optimize_nops(instr, a->instrlen);
+                       continue;
+               }
 
-               DPRINTK("feat: %s%d*32+%d, old: (%pS (%px) len: %d), repl: (%px, len: %d)",
+               DPRINTK(ALT, "feat: %s%d*32+%d, old: (%pS (%px) len: %d), repl: (%px, len: %d)",
                        (a->flags & ALT_FLAG_NOT) ? "!" : "",
                        a->cpuid >> 5,
                        a->cpuid & 0x1f,
                        instr, instr, a->instrlen,
                        replacement, a->replacementlen);
 
-               DUMP_BYTES(instr, a->instrlen, "%px:   old_insn: ", instr);
-               DUMP_BYTES(replacement, a->replacementlen, "%px:   rpl_insn: ", replacement);
-
                memcpy(insn_buff, replacement, a->replacementlen);
                insn_buff_sz = a->replacementlen;
 
-               /*
-                * 0xe8 is a relative jump; fix the offset.
-                *
-                * Instruction length is checked before the opcode to avoid
-                * accessing uninitialized bytes for zero-length replacements.
-                */
-               if (a->replacementlen == 5 && *insn_buff == 0xe8) {
-                       *(s32 *)(insn_buff + 1) += replacement - instr;
-                       DPRINTK("Fix CALL offset: 0x%x, CALL 0x%lx",
-                               *(s32 *)(insn_buff + 1),
-                               (unsigned long)instr + *(s32 *)(insn_buff + 1) + 5);
-               }
-
-               if (a->replacementlen && is_jmp(replacement[0]))
-                       recompute_jump(a, instr, replacement, insn_buff);
-
                for (; insn_buff_sz < a->instrlen; insn_buff_sz++)
                        insn_buff[insn_buff_sz] = 0x90;
 
-               DUMP_BYTES(insn_buff, insn_buff_sz, "%px: final_insn: ", instr);
+               apply_relocation(insn_buff, a->instrlen, instr, replacement, a->replacementlen);
 
-               text_poke_early(instr, insn_buff, insn_buff_sz);
+               DUMP_BYTES(ALT, instr, a->instrlen, "%px:   old_insn: ", instr);
+               DUMP_BYTES(ALT, replacement, a->replacementlen, "%px:   rpl_insn: ", replacement);
+               DUMP_BYTES(ALT, insn_buff, insn_buff_sz, "%px: final_insn: ", instr);
 
-next:
-               optimize_nops(instr, a->instrlen);
+               text_poke_early(instr, insn_buff, insn_buff_sz);
        }
 }
 
@@ -555,15 +671,15 @@ void __init_or_module noinline apply_retpolines(s32 *start, s32 *end)
                        continue;
                }
 
-               DPRINTK("retpoline at: %pS (%px) len: %d to: %pS",
+               DPRINTK(RETPOLINE, "retpoline at: %pS (%px) len: %d to: %pS",
                        addr, addr, insn.length,
                        addr + insn.length + insn.immediate.value);
 
                len = patch_retpoline(addr, &insn, bytes);
                if (len == insn.length) {
                        optimize_nops(bytes, len);
-                       DUMP_BYTES(((u8*)addr),  len, "%px: orig: ", addr);
-                       DUMP_BYTES(((u8*)bytes), len, "%px: repl: ", addr);
+                       DUMP_BYTES(RETPOLINE, ((u8*)addr),  len, "%px: orig: ", addr);
+                       DUMP_BYTES(RETPOLINE, ((u8*)bytes), len, "%px: repl: ", addr);
                        text_poke_early(addr, bytes, len);
                }
        }
@@ -590,13 +706,12 @@ static int patch_return(void *addr, struct insn *insn, u8 *bytes)
 {
        int i = 0;
 
+       /* Patch the custom return thunks... */
        if (cpu_feature_enabled(X86_FEATURE_RETHUNK)) {
-               if (x86_return_thunk == __x86_return_thunk)
-                       return -1;
-
                i = JMP32_INSN_SIZE;
                __text_gen_insn(bytes, JMP32_INSN_OPCODE, addr, x86_return_thunk, i);
        } else {
+               /* ... or patch them out if not needed. */
                bytes[i++] = RET_INSN_OPCODE;
        }
 
@@ -609,6 +724,14 @@ void __init_or_module noinline apply_returns(s32 *start, s32 *end)
 {
        s32 *s;
 
+       /*
+        * Do not patch out the default return thunks if those needed are the
+        * ones generated by the compiler.
+        */
+       if (cpu_feature_enabled(X86_FEATURE_RETHUNK) &&
+           (x86_return_thunk == __x86_return_thunk))
+               return;
+
        for (s = start; s < end; s++) {
                void *dest = NULL, *addr = (void *)s + *s;
                struct insn insn;
@@ -630,14 +753,14 @@ void __init_or_module noinline apply_returns(s32 *start, s32 *end)
                              addr, dest, 5, addr))
                        continue;
 
-               DPRINTK("return thunk at: %pS (%px) len: %d to: %pS",
+               DPRINTK(RET, "return thunk at: %pS (%px) len: %d to: %pS",
                        addr, addr, insn.length,
                        addr + insn.length + insn.immediate.value);
 
                len = patch_return(addr, &insn, bytes);
                if (len == insn.length) {
-                       DUMP_BYTES(((u8*)addr),  len, "%px: orig: ", addr);
-                       DUMP_BYTES(((u8*)bytes), len, "%px: repl: ", addr);
+                       DUMP_BYTES(RET, ((u8*)addr),  len, "%px: orig: ", addr);
+                       DUMP_BYTES(RET, ((u8*)bytes), len, "%px: repl: ", addr);
                        text_poke_early(addr, bytes, len);
                }
        }
@@ -655,7 +778,7 @@ void __init_or_module noinline apply_returns(s32 *start, s32 *end) { }
 
 #ifdef CONFIG_X86_KERNEL_IBT
 
-static void poison_endbr(void *addr, bool warn)
+static void __init_or_module poison_endbr(void *addr, bool warn)
 {
        u32 endbr, poison = gen_endbr_poison();
 
@@ -667,13 +790,13 @@ static void poison_endbr(void *addr, bool warn)
                return;
        }
 
-       DPRINTK("ENDBR at: %pS (%px)", addr, addr);
+       DPRINTK(ENDBR, "ENDBR at: %pS (%px)", addr, addr);
 
        /*
         * When we have IBT, the lack of ENDBR will trigger #CP
         */
-       DUMP_BYTES(((u8*)addr), 4, "%px: orig: ", addr);
-       DUMP_BYTES(((u8*)&poison), 4, "%px: repl: ", addr);
+       DUMP_BYTES(ENDBR, ((u8*)addr), 4, "%px: orig: ", addr);
+       DUMP_BYTES(ENDBR, ((u8*)&poison), 4, "%px: repl: ", addr);
        text_poke_early(addr, &poison, 4);
 }
 
@@ -1148,7 +1271,7 @@ void __init_or_module alternatives_smp_module_add(struct module *mod,
        smp->locks_end  = locks_end;
        smp->text       = text;
        smp->text_end   = text_end;
-       DPRINTK("locks %p -> %p, text %p -> %p, name %s\n",
+       DPRINTK(SMP, "locks %p -> %p, text %p -> %p, name %s\n",
                smp->locks, smp->locks_end,
                smp->text, smp->text_end, smp->name);
 
@@ -1225,6 +1348,20 @@ int alternatives_text_reserved(void *start, void *end)
 #endif /* CONFIG_SMP */
 
 #ifdef CONFIG_PARAVIRT
+
+/* Use this to add nops to a buffer, then text_poke the whole buffer. */
+static void __init_or_module add_nops(void *insns, unsigned int len)
+{
+       while (len > 0) {
+               unsigned int noplen = len;
+               if (noplen > ASM_NOP_MAX)
+                       noplen = ASM_NOP_MAX;
+               memcpy(insns, x86_nops[noplen], noplen);
+               insns += noplen;
+               len -= noplen;
+       }
+}
+
 void __init_or_module apply_paravirt(struct paravirt_patch_site *start,
                                     struct paravirt_patch_site *end)
 {
@@ -1332,6 +1469,35 @@ static noinline void __init int3_selftest(void)
        unregister_die_notifier(&int3_exception_nb);
 }
 
+static __initdata int __alt_reloc_selftest_addr;
+
+__visible noinline void __init __alt_reloc_selftest(void *arg)
+{
+       WARN_ON(arg != &__alt_reloc_selftest_addr);
+}
+
+static noinline void __init alt_reloc_selftest(void)
+{
+       /*
+        * Tests apply_relocation().
+        *
+        * This has a relative immediate (CALL) in a place other than the first
+        * instruction and additionally on x86_64 we get a RIP-relative LEA:
+        *
+        *   lea    0x0(%rip),%rdi  # 5d0: R_X86_64_PC32    .init.data+0x5566c
+        *   call   +0              # 5d5: R_X86_64_PLT32   __alt_reloc_selftest-0x4
+        *
+        * Getting this wrong will either crash and burn or tickle the WARN
+        * above.
+        */
+       asm_inline volatile (
+               ALTERNATIVE("", "lea %[mem], %%" _ASM_ARG1 "; call __alt_reloc_selftest;", X86_FEATURE_ALWAYS)
+               : /* output */
+               : [mem] "m" (__alt_reloc_selftest_addr)
+               : _ASM_ARG1
+       );
+}
+
 void __init alternative_instructions(void)
 {
        int3_selftest();
@@ -1419,6 +1585,8 @@ void __init alternative_instructions(void)
 
        restart_nmi();
        alternatives_patched = 1;
+
+       alt_reloc_selftest();
 }
 
 /**
@@ -1799,7 +1967,7 @@ struct bp_patching_desc *try_get_desc(void)
 {
        struct bp_patching_desc *desc = &bp_desc;
 
-       if (!arch_atomic_inc_not_zero(&desc->refs))
+       if (!raw_atomic_inc_not_zero(&desc->refs))
                return NULL;
 
        return desc;
@@ -1810,7 +1978,7 @@ static __always_inline void put_desc(void)
        struct bp_patching_desc *desc = &bp_desc;
 
        smp_mb__before_atomic();
-       arch_atomic_dec(&desc->refs);
+       raw_atomic_dec(&desc->refs);
 }
 
 static __always_inline void *text_poke_addr(struct text_poke_loc *tp)
@@ -1954,6 +2122,16 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
        atomic_set_release(&bp_desc.refs, 1);
 
        /*
+        * Function tracing can enable thousands of places that need to be
+        * updated. This can take quite some time, and with full kernel debugging
+        * enabled, this could cause the softlockup watchdog to trigger.
+        * This function gets called every 256 entries added to be patched.
+        * Call cond_resched() here to make sure that other tasks can get scheduled
+        * while processing all the functions being patched.
+        */
+       cond_resched();
+
+       /*
         * Corresponding read barrier in int3 notifier for making sure the
         * nr_entries and handler are correctly ordered wrt. patching.
         */
index 4266b64..035a3db 100644 (file)
 #include <linux/pci_ids.h>
 #include <asm/amd_nb.h>
 
-#define PCI_DEVICE_ID_AMD_17H_ROOT     0x1450
-#define PCI_DEVICE_ID_AMD_17H_M10H_ROOT        0x15d0
-#define PCI_DEVICE_ID_AMD_17H_M30H_ROOT        0x1480
-#define PCI_DEVICE_ID_AMD_17H_M60H_ROOT        0x1630
-#define PCI_DEVICE_ID_AMD_17H_MA0H_ROOT        0x14b5
-#define PCI_DEVICE_ID_AMD_19H_M10H_ROOT        0x14a4
-#define PCI_DEVICE_ID_AMD_19H_M60H_ROOT        0x14d8
-#define PCI_DEVICE_ID_AMD_19H_M70H_ROOT        0x14e8
-#define PCI_DEVICE_ID_AMD_17H_DF_F4    0x1464
-#define PCI_DEVICE_ID_AMD_17H_M10H_DF_F4 0x15ec
-#define PCI_DEVICE_ID_AMD_17H_M30H_DF_F4 0x1494
-#define PCI_DEVICE_ID_AMD_17H_M60H_DF_F4 0x144c
-#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444
-#define PCI_DEVICE_ID_AMD_17H_MA0H_DF_F4 0x1728
-#define PCI_DEVICE_ID_AMD_19H_DF_F4    0x1654
-#define PCI_DEVICE_ID_AMD_19H_M10H_DF_F4 0x14b1
-#define PCI_DEVICE_ID_AMD_19H_M40H_ROOT        0x14b5
-#define PCI_DEVICE_ID_AMD_19H_M40H_DF_F4 0x167d
-#define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
-#define PCI_DEVICE_ID_AMD_19H_M60H_DF_F4 0x14e4
-#define PCI_DEVICE_ID_AMD_19H_M70H_DF_F4 0x14f4
+#define PCI_DEVICE_ID_AMD_17H_ROOT             0x1450
+#define PCI_DEVICE_ID_AMD_17H_M10H_ROOT                0x15d0
+#define PCI_DEVICE_ID_AMD_17H_M30H_ROOT                0x1480
+#define PCI_DEVICE_ID_AMD_17H_M60H_ROOT                0x1630
+#define PCI_DEVICE_ID_AMD_17H_MA0H_ROOT                0x14b5
+#define PCI_DEVICE_ID_AMD_19H_M10H_ROOT                0x14a4
+#define PCI_DEVICE_ID_AMD_19H_M40H_ROOT                0x14b5
+#define PCI_DEVICE_ID_AMD_19H_M60H_ROOT                0x14d8
+#define PCI_DEVICE_ID_AMD_19H_M70H_ROOT                0x14e8
+#define PCI_DEVICE_ID_AMD_MI200_ROOT           0x14bb
+
+#define PCI_DEVICE_ID_AMD_17H_DF_F4            0x1464
+#define PCI_DEVICE_ID_AMD_17H_M10H_DF_F4       0x15ec
+#define PCI_DEVICE_ID_AMD_17H_M30H_DF_F4       0x1494
+#define PCI_DEVICE_ID_AMD_17H_M60H_DF_F4       0x144c
+#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4       0x1444
+#define PCI_DEVICE_ID_AMD_17H_MA0H_DF_F4       0x1728
+#define PCI_DEVICE_ID_AMD_19H_DF_F4            0x1654
+#define PCI_DEVICE_ID_AMD_19H_M10H_DF_F4       0x14b1
+#define PCI_DEVICE_ID_AMD_19H_M40H_DF_F4       0x167d
+#define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4       0x166e
+#define PCI_DEVICE_ID_AMD_19H_M60H_DF_F4       0x14e4
+#define PCI_DEVICE_ID_AMD_19H_M70H_DF_F4       0x14f4
+#define PCI_DEVICE_ID_AMD_19H_M78H_DF_F4       0x12fc
+#define PCI_DEVICE_ID_AMD_MI200_DF_F4          0x14d4
 
 /* Protect the PCI config register pairs used for SMN. */
 static DEFINE_MUTEX(smn_mutex);
@@ -52,6 +56,7 @@ static const struct pci_device_id amd_root_ids[] = {
        { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_ROOT) },
        { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M60H_ROOT) },
        { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M70H_ROOT) },
+       { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_MI200_ROOT) },
        {}
 };
 
@@ -79,6 +84,8 @@ static const struct pci_device_id amd_nb_misc_ids[] = {
        { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F3) },
        { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M60H_DF_F3) },
        { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M70H_DF_F3) },
+       { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M78H_DF_F3) },
+       { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_MI200_DF_F3) },
        {}
 };
 
@@ -99,6 +106,7 @@ static const struct pci_device_id amd_nb_link_ids[] = {
        { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_DF_F4) },
        { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F4) },
        { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F4) },
+       { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_MI200_DF_F4) },
        {}
 };
 
index 7705571..af49e24 100644 (file)
@@ -101,6 +101,9 @@ static int apic_extnmi __ro_after_init = APIC_EXTNMI_BSP;
  */
 static bool virt_ext_dest_id __ro_after_init;
 
+/* For parallel bootup. */
+unsigned long apic_mmio_base __ro_after_init;
+
 /*
  * Map cpu index to physical APIC ID
  */
@@ -2163,6 +2166,7 @@ void __init register_lapic_address(unsigned long address)
 
        if (!x2apic_mode) {
                set_fixmap_nocache(FIX_APIC_BASE, address);
+               apic_mmio_base = APIC_BASE;
                apic_printk(APIC_VERBOSE, "mapped APIC to %16lx (%16lx)\n",
                            APIC_BASE, address);
        }
@@ -2376,7 +2380,7 @@ static int nr_logical_cpuids = 1;
 /*
  * Used to store mapping between logical CPU IDs and APIC IDs.
  */
-static int cpuid_to_apicid[] = {
+int cpuid_to_apicid[] = {
        [0 ... NR_CPUS - 1] = -1,
 };
 
@@ -2386,20 +2390,31 @@ bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
 }
 
 #ifdef CONFIG_SMP
-/**
- * apic_id_is_primary_thread - Check whether APIC ID belongs to a primary thread
- * @apicid: APIC ID to check
+static void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid)
+{
+       /* Isolate the SMT bit(s) in the APICID and check for 0 */
+       u32 mask = (1U << (fls(smp_num_siblings) - 1)) - 1;
+
+       if (smp_num_siblings == 1 || !(apicid & mask))
+               cpumask_set_cpu(cpu, &__cpu_primary_thread_mask);
+}
+
+/*
+ * Due to the utter mess of CPUID evaluation, smp_num_siblings is not valid
+ * during early boot. Initialize the primary thread mask before SMP
+ * bringup.
  */
-bool apic_id_is_primary_thread(unsigned int apicid)
+static int __init smp_init_primary_thread_mask(void)
 {
-       u32 mask;
+       unsigned int cpu;
 
-       if (smp_num_siblings == 1)
-               return true;
-       /* Isolate the SMT bit(s) in the APICID and check for 0 */
-       mask = (1U << (fls(smp_num_siblings) - 1)) - 1;
-       return !(apicid & mask);
+       for (cpu = 0; cpu < nr_logical_cpuids; cpu++)
+               cpu_mark_primary_thread(cpu, cpuid_to_apicid[cpu]);
+       return 0;
 }
+early_initcall(smp_init_primary_thread_mask);
+#else
+static inline void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid) { }
 #endif
 
 /*
@@ -2544,6 +2559,9 @@ int generic_processor_info(int apicid, int version)
        set_cpu_present(cpu, true);
        num_processors++;
 
+       if (system_state != SYSTEM_BOOTING)
+               cpu_mark_primary_thread(cpu, apicid);
+
        return cpu;
 }
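
A quick worked example of the SMT-mask logic in cpu_mark_primary_thread()
above, assuming two siblings per core: fls(2) - 1 = 1, so
mask = (1 << 1) - 1 = 0x1; APIC IDs with the low SMT bit clear
(0x0, 0x2, 0x4, ...) are recorded in __cpu_primary_thread_mask, while
0x1, 0x3, ... are their secondary siblings. With smp_num_siblings == 1 the
mask check is skipped and every CPU counts as a primary thread.
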
 
index 6bde05a..896bc41 100644 (file)
@@ -97,7 +97,10 @@ static void init_x2apic_ldr(void)
 
 static int x2apic_phys_probe(void)
 {
-       if (x2apic_mode && (x2apic_phys || x2apic_fadt_phys()))
+       if (!x2apic_mode)
+               return 0;
+
+       if (x2apic_phys || x2apic_fadt_phys())
                return 1;
 
        return apic == &apic_x2apic_phys;
index 4828552..d9384d5 100644 (file)
@@ -546,7 +546,6 @@ unsigned long sn_rtc_cycles_per_second;
 EXPORT_SYMBOL(sn_rtc_cycles_per_second);
 
 /* The following values are used for the per node hub info struct */
-static __initdata unsigned short               *_node_to_pnode;
 static __initdata unsigned short               _min_socket, _max_socket;
 static __initdata unsigned short               _min_pnode, _max_pnode, _gr_table_len;
 static __initdata struct uv_gam_range_entry    *uv_gre_table;
@@ -554,6 +553,7 @@ static __initdata struct uv_gam_parameters  *uv_gp_table;
 static __initdata unsigned short               *_socket_to_node;
 static __initdata unsigned short               *_socket_to_pnode;
 static __initdata unsigned short               *_pnode_to_socket;
+static __initdata unsigned short               *_node_to_socket;
 
 static __initdata struct uv_gam_range_s                *_gr_table;
 
@@ -617,7 +617,8 @@ static __init void build_uv_gr_table(void)
 
        bytes = _gr_table_len * sizeof(struct uv_gam_range_s);
        grt = kzalloc(bytes, GFP_KERNEL);
-       BUG_ON(!grt);
+       if (WARN_ON_ONCE(!grt))
+               return;
        _gr_table = grt;
 
        for (; gre->type != UV_GAM_RANGE_TYPE_UNUSED; gre++) {
@@ -1022,7 +1023,7 @@ static void __init calc_mmioh_map(enum mmioh_arch index,
        switch (index) {
        case UVY_MMIOH0:
                mmr = UVH_RH10_GAM_MMIOH_REDIRECT_CONFIG0;
-               nasid_mask = UVH_RH10_GAM_MMIOH_OVERLAY_CONFIG0_BASE_MASK;
+               nasid_mask = UVYH_RH10_GAM_MMIOH_REDIRECT_CONFIG0_NASID_MASK;
                n = UVH_RH10_GAM_MMIOH_REDIRECT_CONFIG0_DEPTH;
                min_nasid = min_pnode;
                max_nasid = max_pnode;
@@ -1030,7 +1031,7 @@ static void __init calc_mmioh_map(enum mmioh_arch index,
                break;
        case UVY_MMIOH1:
                mmr = UVH_RH10_GAM_MMIOH_REDIRECT_CONFIG1;
-               nasid_mask = UVH_RH10_GAM_MMIOH_OVERLAY_CONFIG1_BASE_MASK;
+               nasid_mask = UVYH_RH10_GAM_MMIOH_REDIRECT_CONFIG1_NASID_MASK;
                n = UVH_RH10_GAM_MMIOH_REDIRECT_CONFIG1_DEPTH;
                min_nasid = min_pnode;
                max_nasid = max_pnode;
@@ -1038,7 +1039,7 @@ static void __init calc_mmioh_map(enum mmioh_arch index,
                break;
        case UVX_MMIOH0:
                mmr = UVH_RH_GAM_MMIOH_REDIRECT_CONFIG0;
-               nasid_mask = UVH_RH_GAM_MMIOH_OVERLAY_CONFIG0_BASE_MASK;
+               nasid_mask = UVH_RH_GAM_MMIOH_REDIRECT_CONFIG0_NASID_MASK;
                n = UVH_RH_GAM_MMIOH_REDIRECT_CONFIG0_DEPTH;
                min_nasid = min_pnode * 2;
                max_nasid = max_pnode * 2;
@@ -1046,7 +1047,7 @@ static void __init calc_mmioh_map(enum mmioh_arch index,
                break;
        case UVX_MMIOH1:
                mmr = UVH_RH_GAM_MMIOH_REDIRECT_CONFIG1;
-               nasid_mask = UVH_RH_GAM_MMIOH_OVERLAY_CONFIG1_BASE_MASK;
+               nasid_mask = UVH_RH_GAM_MMIOH_REDIRECT_CONFIG1_NASID_MASK;
                n = UVH_RH_GAM_MMIOH_REDIRECT_CONFIG1_DEPTH;
                min_nasid = min_pnode * 2;
                max_nasid = max_pnode * 2;
@@ -1072,8 +1073,9 @@ static void __init calc_mmioh_map(enum mmioh_arch index,
 
                /* Invalid NASID check */
                if (nasid < min_nasid || max_nasid < nasid) {
-                       pr_err("UV:%s:Invalid NASID:%x (range:%x..%x)\n",
-                               __func__, index, min_nasid, max_nasid);
+                       /* Not an error: unused table entries get "poison" values */
+                       pr_debug("UV:%s:Invalid NASID(%x):%x (range:%x..%x)\n",
+                              __func__, index, nasid, min_nasid, max_nasid);
                        nasid = -1;
                }
 
@@ -1292,6 +1294,7 @@ static void __init uv_init_hub_info(struct uv_hub_info_s *hi)
        hi->nasid_shift         = uv_cpuid.nasid_shift;
        hi->min_pnode           = _min_pnode;
        hi->min_socket          = _min_socket;
+       hi->node_to_socket      = _node_to_socket;
        hi->pnode_to_socket     = _pnode_to_socket;
        hi->socket_to_node      = _socket_to_node;
        hi->socket_to_pnode     = _socket_to_pnode;
@@ -1348,7 +1351,7 @@ static void __init decode_gam_rng_tbl(unsigned long ptr)
        struct uv_gam_range_entry *gre = (struct uv_gam_range_entry *)ptr;
        unsigned long lgre = 0, gend = 0;
        int index = 0;
-       int sock_min = 999999, pnode_min = 99999;
+       int sock_min = INT_MAX, pnode_min = INT_MAX;
        int sock_max = -1, pnode_max = -1;
 
        uv_gre_table = gre;
@@ -1459,11 +1462,37 @@ static int __init decode_uv_systab(void)
        return 0;
 }
 
+/*
+ * Given a bitmask 'bits' representing present blades, numbered
+ * starting at 'base', masking off unused high bits of the blade number
+ * with 'mask', update the minimum and maximum blade numbers that we
+ * have found.  (Masking with 'mask' is necessary because of the BIOS's
+ * treatment of system partitioning when creating the table we are
+ * interpreting.)
+ */
+static inline void blade_update_min_max(unsigned long bits, int base, int mask, int *min, int *max)
+{
+       int first, last;
+
+       if (!bits)
+               return;
+       first = (base + __ffs(bits)) & mask;
+       last =  (base + __fls(bits)) & mask;
+
+       if (*min > first)
+               *min = first;
+       if (*max < last)
+               *max = last;
+}
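
Worked example for blade_update_min_max(), with made-up register contents:
for bits = 0x18 (two present blades), base = 64 and mask = 0x7f,
__ffs(0x18) = 3 and __fls(0x18) = 4, so first = 67 and last = 68 and the
running sock_min/sock_max are widened to cover 67..68. A zero bitmask leaves
the min/max untouched, which is why the callers below can feed every
NODE_PRESENT word through unconditionally.
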
+
 /* Set up physical blade translations from UVH_NODE_PRESENT_TABLE */
 static __init void boot_init_possible_blades(struct uv_hub_info_s *hub_info)
 {
        unsigned long np;
        int i, uv_pb = 0;
+       int sock_min = INT_MAX, sock_max = -1, s_mask;
+
+       s_mask = (1 << uv_cpuid.n_skt) - 1;
 
        if (UVH_NODE_PRESENT_TABLE) {
                pr_info("UV: NODE_PRESENT_DEPTH = %d\n",
@@ -1471,35 +1500,82 @@ static __init void boot_init_possible_blades(struct uv_hub_info_s *hub_info)
                for (i = 0; i < UVH_NODE_PRESENT_TABLE_DEPTH; i++) {
                        np = uv_read_local_mmr(UVH_NODE_PRESENT_TABLE + i * 8);
                        pr_info("UV: NODE_PRESENT(%d) = 0x%016lx\n", i, np);
-                       uv_pb += hweight64(np);
+                       blade_update_min_max(np, i * 64, s_mask, &sock_min, &sock_max);
                }
        }
        if (UVH_NODE_PRESENT_0) {
                np = uv_read_local_mmr(UVH_NODE_PRESENT_0);
                pr_info("UV: NODE_PRESENT_0 = 0x%016lx\n", np);
-               uv_pb += hweight64(np);
+               blade_update_min_max(np, 0, s_mask, &sock_min, &sock_max);
        }
        if (UVH_NODE_PRESENT_1) {
                np = uv_read_local_mmr(UVH_NODE_PRESENT_1);
                pr_info("UV: NODE_PRESENT_1 = 0x%016lx\n", np);
-               uv_pb += hweight64(np);
+               blade_update_min_max(np, 64, s_mask, &sock_min, &sock_max);
+       }
+
+       /* Only update if we actually found some bits indicating blades present */
+       if (sock_max >= sock_min) {
+               _min_socket = sock_min;
+               _max_socket = sock_max;
+               uv_pb = sock_max - sock_min + 1;
        }
        if (uv_possible_blades != uv_pb)
                uv_possible_blades = uv_pb;
 
-       pr_info("UV: number nodes/possible blades %d\n", uv_pb);
+       pr_info("UV: number nodes/possible blades %d (%d - %d)\n",
+               uv_pb, sock_min, sock_max);
+}
+
+static int __init alloc_conv_table(int num_elem, unsigned short **table)
+{
+       int i;
+       size_t bytes;
+
+       bytes = num_elem * sizeof(*table[0]);
+       *table = kmalloc(bytes, GFP_KERNEL);
+       if (WARN_ON_ONCE(!*table))
+               return -ENOMEM;
+       for (i = 0; i < num_elem; i++)
+               ((unsigned short *)*table)[i] = SOCK_EMPTY;
+       return 0;
 }
 
+/* Remove conversion table if it's 1:1 */
+#define FREE_1_TO_1_TABLE(tbl, min, max, max2) free_1_to_1_table(&tbl, #tbl, min, max, max2)
+
+static void __init free_1_to_1_table(unsigned short **tp, char *tname, int min, int max, int max2)
+{
+       int i;
+       unsigned short *table = *tp;
+
+       if (table == NULL)
+               return;
+       if (max != max2)
+               return;
+       for (i = 0; i < max; i++) {
+               if (i != table[i])
+                       return;
+       }
+       kfree(table);
+       *tp = NULL;
+       pr_info("UV: %s is 1:1, conversion table removed\n", tname);
+}
+
+/*
+ * Build Socket Tables
+ * If there is more than one node per socket, the socket-to-node table
+ * will contain the lowest node number on that socket.
+ */
 static void __init build_socket_tables(void)
 {
        struct uv_gam_range_entry *gre = uv_gre_table;
-       int num, nump;
+       int nums, numn, nump;
        int cpu, i, lnid;
        int minsock = _min_socket;
        int maxsock = _max_socket;
        int minpnode = _min_pnode;
        int maxpnode = _max_pnode;
-       size_t bytes;
 
        if (!gre) {
                if (is_uv2_hub() || is_uv3_hub()) {
@@ -1507,39 +1583,36 @@ static void __init build_socket_tables(void)
                        return;
                }
                pr_err("UV: Error: UVsystab address translations not available!\n");
-               BUG();
+               WARN_ON_ONCE(!gre);
+               return;
        }
 
-       /* Build socket id -> node id, pnode */
-       num = maxsock - minsock + 1;
-       bytes = num * sizeof(_socket_to_node[0]);
-       _socket_to_node = kmalloc(bytes, GFP_KERNEL);
-       _socket_to_pnode = kmalloc(bytes, GFP_KERNEL);
-
+       numn = num_possible_nodes();
        nump = maxpnode - minpnode + 1;
-       bytes = nump * sizeof(_pnode_to_socket[0]);
-       _pnode_to_socket = kmalloc(bytes, GFP_KERNEL);
-       BUG_ON(!_socket_to_node || !_socket_to_pnode || !_pnode_to_socket);
-
-       for (i = 0; i < num; i++)
-               _socket_to_node[i] = _socket_to_pnode[i] = SOCK_EMPTY;
-
-       for (i = 0; i < nump; i++)
-               _pnode_to_socket[i] = SOCK_EMPTY;
+       nums = maxsock - minsock + 1;
+
+       /* Allocate and clear tables */
+       if ((alloc_conv_table(nump, &_pnode_to_socket) < 0)
+           || (alloc_conv_table(nums, &_socket_to_pnode) < 0)
+           || (alloc_conv_table(numn, &_node_to_socket) < 0)
+           || (alloc_conv_table(nums, &_socket_to_node) < 0)) {
+               kfree(_pnode_to_socket);
+               kfree(_socket_to_pnode);
+               kfree(_node_to_socket);
+               return;
+       }
 
        /* Fill in pnode/node/addr conversion list values: */
-       pr_info("UV: GAM Building socket/pnode conversion tables\n");
        for (; gre->type != UV_GAM_RANGE_TYPE_UNUSED; gre++) {
                if (gre->type == UV_GAM_RANGE_TYPE_HOLE)
                        continue;
                i = gre->sockid - minsock;
-               /* Duplicate: */
-               if (_socket_to_pnode[i] != SOCK_EMPTY)
-                       continue;
-               _socket_to_pnode[i] = gre->pnode;
+               if (_socket_to_pnode[i] == SOCK_EMPTY)
+                       _socket_to_pnode[i] = gre->pnode;
 
                i = gre->pnode - minpnode;
-               _pnode_to_socket[i] = gre->sockid;
+               if (_pnode_to_socket[i] == SOCK_EMPTY)
+                       _pnode_to_socket[i] = gre->sockid;
 
                pr_info("UV: sid:%02x type:%d nasid:%04x pn:%02x pn2s:%2x\n",
                        gre->sockid, gre->type, gre->nasid,
@@ -1549,66 +1622,39 @@ static void __init build_socket_tables(void)
 
        /* Set socket -> node values: */
        lnid = NUMA_NO_NODE;
-       for_each_present_cpu(cpu) {
+       for_each_possible_cpu(cpu) {
                int nid = cpu_to_node(cpu);
                int apicid, sockid;
 
                if (lnid == nid)
                        continue;
                lnid = nid;
+
                apicid = per_cpu(x86_cpu_to_apicid, cpu);
                sockid = apicid >> uv_cpuid.socketid_shift;
-               _socket_to_node[sockid - minsock] = nid;
-               pr_info("UV: sid:%02x: apicid:%04x node:%2d\n",
-                       sockid, apicid, nid);
-       }
 
-       /* Set up physical blade to pnode translation from GAM Range Table: */
-       bytes = num_possible_nodes() * sizeof(_node_to_pnode[0]);
-       _node_to_pnode = kmalloc(bytes, GFP_KERNEL);
-       BUG_ON(!_node_to_pnode);
+               if (_socket_to_node[sockid - minsock] == SOCK_EMPTY)
+                       _socket_to_node[sockid - minsock] = nid;
 
-       for (lnid = 0; lnid < num_possible_nodes(); lnid++) {
-               unsigned short sockid;
+               if (_node_to_socket[nid] == SOCK_EMPTY)
+                       _node_to_socket[nid] = sockid;
 
-               for (sockid = minsock; sockid <= maxsock; sockid++) {
-                       if (lnid == _socket_to_node[sockid - minsock]) {
-                               _node_to_pnode[lnid] = _socket_to_pnode[sockid - minsock];
-                               break;
-                       }
-               }
-               if (sockid > maxsock) {
-                       pr_err("UV: socket for node %d not found!\n", lnid);
-                       BUG();
-               }
+               pr_info("UV: sid:%02x: apicid:%04x socket:%02d node:%03x s2n:%03x\n",
+                       sockid,
+                       apicid,
+                       _node_to_socket[nid],
+                       nid,
+                       _socket_to_node[sockid - minsock]);
        }
 
        /*
-        * If socket id == pnode or socket id == node for all nodes,
+        * If e.g. socket id == pnode for all pnodes,
         *   system runs faster by removing corresponding conversion table.
         */
-       pr_info("UV: Checking socket->node/pnode for identity maps\n");
-       if (minsock == 0) {
-               for (i = 0; i < num; i++)
-                       if (_socket_to_node[i] == SOCK_EMPTY || i != _socket_to_node[i])
-                               break;
-               if (i >= num) {
-                       kfree(_socket_to_node);
-                       _socket_to_node = NULL;
-                       pr_info("UV: 1:1 socket_to_node table removed\n");
-               }
-       }
-       if (minsock == minpnode) {
-               for (i = 0; i < num; i++)
-                       if (_socket_to_pnode[i] != SOCK_EMPTY &&
-                               _socket_to_pnode[i] != i + minpnode)
-                               break;
-               if (i >= num) {
-                       kfree(_socket_to_pnode);
-                       _socket_to_pnode = NULL;
-                       pr_info("UV: 1:1 socket_to_pnode table removed\n");
-               }
-       }
+       FREE_1_TO_1_TABLE(_socket_to_node, _min_socket, nums, numn);
+       FREE_1_TO_1_TABLE(_node_to_socket, _min_socket, nums, numn);
+       FREE_1_TO_1_TABLE(_socket_to_pnode, _min_pnode, nums, nump);
+       FREE_1_TO_1_TABLE(_pnode_to_socket, _min_pnode, nums, nump);
 }
 
 /* Check which reboot to use */
@@ -1692,12 +1738,13 @@ static __init int uv_system_init_hubless(void)
 static void __init uv_system_init_hub(void)
 {
        struct uv_hub_info_s hub_info = {0};
-       int bytes, cpu, nodeid;
-       unsigned short min_pnode = 9999, max_pnode = 0;
+       int bytes, cpu, nodeid, bid;
+       unsigned short min_pnode = USHRT_MAX, max_pnode = 0;
        char *hub = is_uv5_hub() ? "UV500" :
                    is_uv4_hub() ? "UV400" :
                    is_uv3_hub() ? "UV300" :
                    is_uv2_hub() ? "UV2000/3000" : NULL;
+       struct uv_hub_info_s **uv_hub_info_list_blade;
 
        if (!hub) {
                pr_err("UV: Unknown/unsupported UV hub\n");
@@ -1720,9 +1767,12 @@ static void __init uv_system_init_hub(void)
        build_uv_gr_table();
        set_block_size();
        uv_init_hub_info(&hub_info);
-       uv_possible_blades = num_possible_nodes();
-       if (!_node_to_pnode)
+       /* If UV2 or UV3, we may need to get the number of blades from HW */
+       if (is_uv(UV2|UV3) && !uv_gre_table)
                boot_init_possible_blades(&hub_info);
+       else
+               /* min/max sockets set in decode_gam_rng_tbl */
+               uv_possible_blades = (_max_socket - _min_socket) + 1;
 
        /* uv_num_possible_blades() is really the hub count: */
        pr_info("UV: Found %d hubs, %d nodes, %d CPUs\n", uv_num_possible_blades(), num_possible_nodes(), num_possible_cpus());
@@ -1731,79 +1781,98 @@ static void __init uv_system_init_hub(void)
        hub_info.coherency_domain_number = sn_coherency_id;
        uv_rtc_init();
 
+       /*
+        * __uv_hub_info_list[] is indexed by node, but there is only
+        * one hub_info structure per blade.  First, allocate one
+        * structure per blade.  Further down we create a per-node
+        * table (__uv_hub_info_list[]) pointing to hub_info
+        * structures for the correct blade.
+        */
+
        bytes = sizeof(void *) * uv_num_possible_blades();
-       __uv_hub_info_list = kzalloc(bytes, GFP_KERNEL);
-       BUG_ON(!__uv_hub_info_list);
+       uv_hub_info_list_blade = kzalloc(bytes, GFP_KERNEL);
+       if (WARN_ON_ONCE(!uv_hub_info_list_blade))
+               return;
 
        bytes = sizeof(struct uv_hub_info_s);
-       for_each_node(nodeid) {
+       for_each_possible_blade(bid) {
                struct uv_hub_info_s *new_hub;
 
-               if (__uv_hub_info_list[nodeid]) {
-                       pr_err("UV: Node %d UV HUB already initialized!?\n", nodeid);
-                       BUG();
+               /* Allocate & fill new per hub info list */
+               new_hub = (bid == 0) ?  &uv_hub_info_node0
+                       : kzalloc_node(bytes, GFP_KERNEL, uv_blade_to_node(bid));
+               if (WARN_ON_ONCE(!new_hub)) {
+                       /* do not kfree() bid 0, which is statically allocated */
+                       while (--bid > 0)
+                               kfree(uv_hub_info_list_blade[bid]);
+                       kfree(uv_hub_info_list_blade);
+                       return;
                }
 
-               /* Allocate new per hub info list */
-               new_hub = (nodeid == 0) ?  &uv_hub_info_node0 : kzalloc_node(bytes, GFP_KERNEL, nodeid);
-               BUG_ON(!new_hub);
-               __uv_hub_info_list[nodeid] = new_hub;
-               new_hub = uv_hub_info_list(nodeid);
-               BUG_ON(!new_hub);
+               uv_hub_info_list_blade[bid] = new_hub;
                *new_hub = hub_info;
 
                /* Use information from GAM table if available: */
-               if (_node_to_pnode)
-                       new_hub->pnode = _node_to_pnode[nodeid];
+               if (uv_gre_table)
+                       new_hub->pnode = uv_blade_to_pnode(bid);
                else /* Or fill in during CPU loop: */
                        new_hub->pnode = 0xffff;
 
-               new_hub->numa_blade_id = uv_node_to_blade_id(nodeid);
+               new_hub->numa_blade_id = bid;
                new_hub->memory_nid = NUMA_NO_NODE;
                new_hub->nr_possible_cpus = 0;
                new_hub->nr_online_cpus = 0;
        }
 
+       /*
+        * Now populate __uv_hub_info_list[] for each node with the
+        * pointer to the struct for the blade it resides on.
+        */
+
+       bytes = sizeof(void *) * num_possible_nodes();
+       __uv_hub_info_list = kzalloc(bytes, GFP_KERNEL);
+       if (WARN_ON_ONCE(!__uv_hub_info_list)) {
+               for_each_possible_blade(bid)
+                       /* bid 0 is statically allocated */
+                       if (bid != 0)
+                               kfree(uv_hub_info_list_blade[bid]);
+               kfree(uv_hub_info_list_blade);
+               return;
+       }
+
+       for_each_node(nodeid)
+               __uv_hub_info_list[nodeid] = uv_hub_info_list_blade[uv_node_to_blade_id(nodeid)];
+
        /* Initialize per CPU info: */
        for_each_possible_cpu(cpu) {
-               int apicid = per_cpu(x86_cpu_to_apicid, cpu);
-               int numa_node_id;
+               int apicid = early_per_cpu(x86_cpu_to_apicid, cpu);
+               unsigned short bid;
                unsigned short pnode;
 
-               nodeid = cpu_to_node(cpu);
-               numa_node_id = numa_cpu_node(cpu);
                pnode = uv_apicid_to_pnode(apicid);
+               bid = uv_pnode_to_socket(pnode) - _min_socket;
 
-               uv_cpu_info_per(cpu)->p_uv_hub_info = uv_hub_info_list(nodeid);
+               uv_cpu_info_per(cpu)->p_uv_hub_info = uv_hub_info_list_blade[bid];
                uv_cpu_info_per(cpu)->blade_cpu_id = uv_cpu_hub_info(cpu)->nr_possible_cpus++;
                if (uv_cpu_hub_info(cpu)->memory_nid == NUMA_NO_NODE)
                        uv_cpu_hub_info(cpu)->memory_nid = cpu_to_node(cpu);
 
-               /* Init memoryless node: */
-               if (nodeid != numa_node_id &&
-                   uv_hub_info_list(numa_node_id)->pnode == 0xffff)
-                       uv_hub_info_list(numa_node_id)->pnode = pnode;
-               else if (uv_cpu_hub_info(cpu)->pnode == 0xffff)
+               if (uv_cpu_hub_info(cpu)->pnode == 0xffff)
                        uv_cpu_hub_info(cpu)->pnode = pnode;
        }
 
-       for_each_node(nodeid) {
-               unsigned short pnode = uv_hub_info_list(nodeid)->pnode;
+       for_each_possible_blade(bid) {
+               unsigned short pnode = uv_hub_info_list_blade[bid]->pnode;
 
-               /* Add pnode info for pre-GAM list nodes without CPUs: */
-               if (pnode == 0xffff) {
-                       unsigned long paddr;
+               if (pnode == 0xffff)
+                       continue;
 
-                       paddr = node_start_pfn(nodeid) << PAGE_SHIFT;
-                       pnode = uv_gpa_to_pnode(uv_soc_phys_ram_to_gpa(paddr));
-                       uv_hub_info_list(nodeid)->pnode = pnode;
-               }
                min_pnode = min(pnode, min_pnode);
                max_pnode = max(pnode, max_pnode);
-               pr_info("UV: UVHUB node:%2d pn:%02x nrcpus:%d\n",
-                       nodeid,
-                       uv_hub_info_list(nodeid)->pnode,
-                       uv_hub_info_list(nodeid)->nr_possible_cpus);
+               pr_info("UV: HUB:%2d pn:%02x nrcpus:%d\n",
+                       bid,
+                       uv_hub_info_list_blade[bid]->pnode,
+                       uv_hub_info_list_blade[bid]->nr_possible_cpus);
        }
 
        pr_info("UV: min_pnode:%02x max_pnode:%02x\n", min_pnode, max_pnode);
@@ -1811,6 +1880,9 @@ static void __init uv_system_init_hub(void)
        map_mmr_high(max_pnode);
        map_mmioh_high(min_pnode, max_pnode);
 
+       kfree(uv_hub_info_list_blade);
+       uv_hub_info_list_blade = NULL;
+
        uv_nmi_setup();
        uv_cpu_init();
        uv_setup_proc_files(0);
index 22ab139..c06bfc0 100644 (file)
@@ -133,8 +133,8 @@ static bool skip_addr(void *dest)
        /* Accounts directly */
        if (dest == ret_from_fork)
                return true;
-#ifdef CONFIG_HOTPLUG_CPU
-       if (dest == start_cpu0)
+#if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_AMD_MEM_ENCRYPT)
+       if (dest == soft_restart_cpu)
                return true;
 #endif
 #ifdef CONFIG_FUNCTION_TRACER
@@ -293,7 +293,8 @@ void *callthunks_translate_call_dest(void *dest)
        return target ? : dest;
 }
 
-bool is_callthunk(void *addr)
+#ifdef CONFIG_BPF_JIT
+static bool is_callthunk(void *addr)
 {
        unsigned int tmpl_size = SKL_TMPL_SIZE;
        void *tmpl = skl_call_thunk_template;
@@ -306,7 +307,6 @@ bool is_callthunk(void *addr)
        return !bcmp((void *)(dest - tmpl_size), tmpl, tmpl_size);
 }
 
-#ifdef CONFIG_BPF_JIT
 int x86_call_depth_emit_accounting(u8 **pprog, void *func)
 {
        unsigned int tmpl_size = SKL_TMPL_SIZE;
index d7e3cea..4350f6b 100644 (file)
@@ -27,7 +27,7 @@ obj-y                 += cpuid-deps.o
 obj-y                  += umwait.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
-obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
+obj-y += capflags.o powerflags.o
 
 obj-$(CONFIG_IA32_FEAT_CTL) += feat_ctl.o
 ifdef CONFIG_CPU_SUP_INTEL
@@ -54,7 +54,6 @@ obj-$(CONFIG_X86_LOCAL_APIC)          += perfctr-watchdog.o
 obj-$(CONFIG_HYPERVISOR_GUEST)         += vmware.o hypervisor.o mshyperv.o
 obj-$(CONFIG_ACRN_GUEST)               += acrn.o
 
-ifdef CONFIG_X86_FEATURE_NAMES
 quiet_cmd_mkcapflags = MKCAP   $@
       cmd_mkcapflags = $(CONFIG_SHELL) $(srctree)/$(src)/mkcapflags.sh $@ $^
 
@@ -63,5 +62,4 @@ vmxfeature = $(src)/../../include/asm/vmxfeatures.h
 
 $(obj)/capflags.c: $(cpufeature) $(vmxfeature) $(src)/mkcapflags.sh FORCE
        $(call if_changed,mkcapflags)
-endif
 targets += capflags.c
index 182af64..9e2a918 100644 (file)
@@ -9,7 +9,6 @@
  *     - Andrew D. Balsa (code cleanup).
  */
 #include <linux/init.h>
-#include <linux/utsname.h>
 #include <linux/cpu.h>
 #include <linux/module.h>
 #include <linux/nospec.h>
@@ -27,8 +26,6 @@
 #include <asm/msr.h>
 #include <asm/vmx.h>
 #include <asm/paravirt.h>
-#include <asm/alternative.h>
-#include <asm/set_memory.h>
 #include <asm/intel-family.h>
 #include <asm/e820/api.h>
 #include <asm/hypervisor.h>
@@ -125,21 +122,8 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_cond_l1d_flush);
 DEFINE_STATIC_KEY_FALSE(mmio_stale_data_clear);
 EXPORT_SYMBOL_GPL(mmio_stale_data_clear);
 
-void __init check_bugs(void)
+void __init cpu_select_mitigations(void)
 {
-       identify_boot_cpu();
-
-       /*
-        * identify_boot_cpu() initialized SMT support information, let the
-        * core code know.
-        */
-       cpu_smt_check_topology();
-
-       if (!IS_ENABLED(CONFIG_SMP)) {
-               pr_info("CPU: ");
-               print_cpu_info(&boot_cpu_data);
-       }
-
        /*
         * Read the SPEC_CTRL MSR to account for reserved bits which may
         * have unknown values. AMD64_LS_CFG MSR is cached in the early AMD
@@ -176,39 +160,6 @@ void __init check_bugs(void)
        md_clear_select_mitigation();
        srbds_select_mitigation();
        l1d_flush_select_mitigation();
-
-       arch_smt_update();
-
-#ifdef CONFIG_X86_32
-       /*
-        * Check whether we are able to run this kernel safely on SMP.
-        *
-        * - i386 is no longer supported.
-        * - In order to run on anything without a TSC, we need to be
-        *   compiled for a i486.
-        */
-       if (boot_cpu_data.x86 < 4)
-               panic("Kernel requires i486+ for 'invlpg' and other features");
-
-       init_utsname()->machine[1] =
-               '0' + (boot_cpu_data.x86 > 6 ? 6 : boot_cpu_data.x86);
-       alternative_instructions();
-
-       fpu__init_check_bugs();
-#else /* CONFIG_X86_64 */
-       alternative_instructions();
-
-       /*
-        * Make sure the first 2MB area is not mapped by huge pages
-        * There are typically fixed size MTRRs in there and overlapping
-        * MTRRs into large pages causes slow downs.
-        *
-        * Right now we don't do that with gbpages because there seems
-        * very little benefit for that case.
-        */
-       if (!direct_gbpages)
-               set_memory_4k((unsigned long)__va(0), 1);
-#endif
 }
 
 /*
index 4063e89..8f86eac 100644 (file)
@@ -39,6 +39,8 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
 /* Shared L2 cache maps */
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_l2c_shared_map);
 
+static cpumask_var_t cpu_cacheinfo_mask;
+
 /* Kernel controls MTRR and/or PAT MSRs. */
 unsigned int memory_caching_control __ro_after_init;
 
@@ -1172,8 +1174,10 @@ void cache_bp_restore(void)
                cache_cpu_init();
 }
 
-static int cache_ap_init(unsigned int cpu)
+static int cache_ap_online(unsigned int cpu)
 {
+       cpumask_set_cpu(cpu, cpu_cacheinfo_mask);
+
        if (!memory_caching_control || get_cache_aps_delayed_init())
                return 0;
 
@@ -1191,11 +1195,17 @@ static int cache_ap_init(unsigned int cpu)
         *      lock to prevent MTRR entry changes
         */
        stop_machine_from_inactive_cpu(cache_rendezvous_handler, NULL,
-                                      cpu_callout_mask);
+                                      cpu_cacheinfo_mask);
 
        return 0;
 }
 
+static int cache_ap_offline(unsigned int cpu)
+{
+       cpumask_clear_cpu(cpu, cpu_cacheinfo_mask);
+       return 0;
+}
+
 /*
  * Delayed cache initialization for all AP's
  */
@@ -1210,9 +1220,12 @@ void cache_aps_init(void)
 
 static int __init cache_ap_register(void)
 {
+       zalloc_cpumask_var(&cpu_cacheinfo_mask, GFP_KERNEL);
+       cpumask_set_cpu(smp_processor_id(), cpu_cacheinfo_mask);
+
        cpuhp_setup_state_nocalls(CPUHP_AP_CACHECTRL_STARTING,
                                  "x86/cachectrl:starting",
-                                 cache_ap_init, NULL);
+                                 cache_ap_online, cache_ap_offline);
        return 0;
 }
-core_initcall(cache_ap_register);
+early_initcall(cache_ap_register);
index 80710a6..52683fd 100644 (file)
 #include <linux/init.h>
 #include <linux/kprobes.h>
 #include <linux/kgdb.h>
+#include <linux/mem_encrypt.h>
 #include <linux/smp.h>
+#include <linux/cpu.h>
 #include <linux/io.h>
 #include <linux/syscore_ops.h>
 #include <linux/pgtable.h>
 #include <linux/stackprotector.h>
+#include <linux/utsname.h>
 
+#include <asm/alternative.h>
 #include <asm/cmdline.h>
 #include <asm/perf_event.h>
 #include <asm/mmu_context.h>
@@ -59,7 +63,7 @@
 #include <asm/intel-family.h>
 #include <asm/cpu_device_id.h>
 #include <asm/uv/uv.h>
-#include <asm/sigframe.h>
+#include <asm/set_memory.h>
 #include <asm/traps.h>
 #include <asm/sev.h>
 
 
 u32 elf_hwcap2 __read_mostly;
 
-/* all of these masks are initialized in setup_cpu_local_masks() */
-cpumask_var_t cpu_initialized_mask;
-cpumask_var_t cpu_callout_mask;
-cpumask_var_t cpu_callin_mask;
-
-/* representing cpus for which sibling maps can be computed */
-cpumask_var_t cpu_sibling_setup_mask;
-
 /* Number of siblings per CPU package */
 int smp_num_siblings = 1;
 EXPORT_SYMBOL(smp_num_siblings);
@@ -169,15 +165,6 @@ clear_ppin:
        clear_cpu_cap(c, info->feature);
 }
 
-/* correctly size the local cpu masks */
-void __init setup_cpu_local_masks(void)
-{
-       alloc_bootmem_cpumask_var(&cpu_initialized_mask);
-       alloc_bootmem_cpumask_var(&cpu_callin_mask);
-       alloc_bootmem_cpumask_var(&cpu_callout_mask);
-       alloc_bootmem_cpumask_var(&cpu_sibling_setup_mask);
-}
-
 static void default_init(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_X86_64
@@ -1502,12 +1489,10 @@ static void __init cpu_parse_early_param(void)
                if (!kstrtouint(opt, 10, &bit)) {
                        if (bit < NCAPINTS * 32) {
 
-#ifdef CONFIG_X86_FEATURE_NAMES
                                /* empty-string, i.e., ""-defined feature flags */
                                if (!x86_cap_flags[bit])
                                        pr_cont(" " X86_CAP_FMT_NUM, x86_cap_flag_num(bit));
                                else
-#endif
                                        pr_cont(" " X86_CAP_FMT, x86_cap_flag(bit));
 
                                setup_clear_cpu_cap(bit);
@@ -1520,7 +1505,6 @@ static void __init cpu_parse_early_param(void)
                        continue;
                }
 
-#ifdef CONFIG_X86_FEATURE_NAMES
                for (bit = 0; bit < 32 * NCAPINTS; bit++) {
                        if (!x86_cap_flag(bit))
                                continue;
@@ -1537,7 +1521,6 @@ static void __init cpu_parse_early_param(void)
 
                if (!found)
                        pr_cont(" (unknown: %s)", opt);
-#endif
        }
        pr_cont("\n");
 
@@ -1600,10 +1583,6 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
        sld_setup(c);
 
-       fpu__init_system(c);
-
-       init_sigframe_size();
-
 #ifdef CONFIG_X86_32
        /*
         * Regardless of whether PCID is enumerated, the SDM says
@@ -2123,19 +2102,6 @@ static void dbg_restore_debug_regs(void)
 #define dbg_restore_debug_regs()
 #endif /* ! CONFIG_KGDB */
 
-static void wait_for_master_cpu(int cpu)
-{
-#ifdef CONFIG_SMP
-       /*
-        * wait for ACK from master CPU before continuing
-        * with AP initialization
-        */
-       WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask));
-       while (!cpumask_test_cpu(cpu, cpu_callout_mask))
-               cpu_relax();
-#endif
-}
-
 static inline void setup_getcpu(int cpu)
 {
        unsigned long cpudata = vdso_encode_cpunode(cpu, early_cpu_to_node(cpu));
@@ -2158,11 +2124,7 @@ static inline void setup_getcpu(int cpu)
 }
 
 #ifdef CONFIG_X86_64
-static inline void ucode_cpu_init(int cpu)
-{
-       if (cpu)
-               load_ucode_ap();
-}
+static inline void ucode_cpu_init(int cpu) { }
 
 static inline void tss_setup_ist(struct tss_struct *tss)
 {
@@ -2239,8 +2201,6 @@ void cpu_init(void)
        struct task_struct *cur = current;
        int cpu = raw_smp_processor_id();
 
-       wait_for_master_cpu(cpu);
-
        ucode_cpu_init(cpu);
 
 #ifdef CONFIG_NUMA
@@ -2285,26 +2245,12 @@ void cpu_init(void)
 
        doublefault_init_cpu_tss();
 
-       fpu__init_cpu();
-
        if (is_uv_system())
                uv_cpu_init();
 
        load_fixmap_gdt(cpu);
 }
 
-#ifdef CONFIG_SMP
-void cpu_init_secondary(void)
-{
-       /*
-        * Relies on the BP having set-up the IDT tables, which are loaded
-        * on this CPU in cpu_init_exception_handling().
-        */
-       cpu_init_exception_handling();
-       cpu_init();
-}
-#endif
-
 #ifdef CONFIG_MICROCODE_LATE_LOADING
 /**
  * store_cpu_caps() - Store a snapshot of CPU capabilities
@@ -2362,3 +2308,69 @@ void arch_smt_update(void)
        /* Check whether IPI broadcasting can be enabled */
        apic_smt_update();
 }
+
+void __init arch_cpu_finalize_init(void)
+{
+       identify_boot_cpu();
+
+       /*
+        * identify_boot_cpu() initialized SMT support information, let the
+        * core code know.
+        */
+       cpu_smt_check_topology();
+
+       if (!IS_ENABLED(CONFIG_SMP)) {
+               pr_info("CPU: ");
+               print_cpu_info(&boot_cpu_data);
+       }
+
+       cpu_select_mitigations();
+
+       arch_smt_update();
+
+       if (IS_ENABLED(CONFIG_X86_32)) {
+               /*
+                * Check whether this is a real i386, which is no longer
+                * supported, and fix up the utsname.
+                */
+               if (boot_cpu_data.x86 < 4)
+                       panic("Kernel requires i486+ for 'invlpg' and other features");
+
+               init_utsname()->machine[1] =
+                       '0' + (boot_cpu_data.x86 > 6 ? 6 : boot_cpu_data.x86);
+       }
+
+       /*
+        * Must be before alternatives because it might set or clear
+        * feature bits.
+        */
+       fpu__init_system();
+       fpu__init_cpu();
+
+       alternative_instructions();
+
+       if (IS_ENABLED(CONFIG_X86_64)) {
+               /*
+                * Make sure the first 2MB area is not mapped by huge pages
+                * There are typically fixed size MTRRs in there and overlapping
+                * MTRRs into large pages causes slow downs.
+                *
+                * Right now we don't do that with gbpages because there seems
+                * very little benefit for that case.
+                */
+               if (!direct_gbpages)
+                       set_memory_4k((unsigned long)__va(0), 1);
+       } else {
+               fpu__init_check_bugs();
+       }
+
+       /*
+        * This needs to be called before any devices perform DMA
+        * operations that might use the SWIOTLB bounce buffers. It will
+        * mark the bounce buffers as decrypted so that their usage will
+        * not cause "plain-text" data to be decrypted when accessed. It
+        * must be called after late_time_init() so that Hyper-V x86/x64
+        * hypercalls work when the SWIOTLB bounce buffers are decrypted.
+        */
+       mem_encrypt_init();
+}
index f97b0fe..1c44630 100644 (file)
@@ -79,6 +79,7 @@ extern void detect_ht(struct cpuinfo_x86 *c);
 extern void check_null_seg_clears_base(struct cpuinfo_x86 *c);
 
 unsigned int aperfmperf_get_khz(int cpu);
+void cpu_select_mitigations(void);
 
 extern void x86_spec_ctrl_setup_ap(void);
 extern void update_srbds_msr(void);
index 0b971f9..5e74610 100644 (file)
@@ -715,11 +715,13 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 
 bool amd_mce_is_memory_error(struct mce *m)
 {
+       enum smca_bank_types bank_type;
        /* ErrCodeExt[20:16] */
        u8 xec = (m->status >> 16) & 0x1f;
 
+       bank_type = smca_get_bank_type(m->extcpu, m->bank);
        if (mce_flags.smca)
-               return smca_get_bank_type(m->extcpu, m->bank) == SMCA_UMC && xec == 0x0;
+               return (bank_type == SMCA_UMC || bank_type == SMCA_UMC_V2) && xec == 0x0;
 
        return m->bank == 4 && xec == 0x8;
 }
@@ -1050,7 +1052,7 @@ static const char *get_name(unsigned int cpu, unsigned int bank, struct threshol
        if (bank_type >= N_SMCA_BANK_TYPES)
                return NULL;
 
-       if (b && bank_type == SMCA_UMC) {
+       if (b && (bank_type == SMCA_UMC || bank_type == SMCA_UMC_V2)) {
                if (b->block < ARRAY_SIZE(smca_umc_block_names))
                        return smca_umc_block_names[b->block];
                return NULL;
index 2eec60f..89e2aab 100644 (file)
@@ -1022,12 +1022,12 @@ static noinstr int mce_start(int *no_way_out)
        if (!timeout)
                return ret;
 
-       arch_atomic_add(*no_way_out, &global_nwo);
+       raw_atomic_add(*no_way_out, &global_nwo);
        /*
         * Rely on the implied barrier below, such that global_nwo
         * is updated before mce_callin.
         */
-       order = arch_atomic_inc_return(&mce_callin);
+       order = raw_atomic_inc_return(&mce_callin);
        arch_cpumask_clear_cpu(smp_processor_id(), &mce_missing_cpus);
 
        /* Enable instrumentation around calls to external facilities */
@@ -1036,10 +1036,10 @@ static noinstr int mce_start(int *no_way_out)
        /*
         * Wait for everyone.
         */
-       while (arch_atomic_read(&mce_callin) != num_online_cpus()) {
+       while (raw_atomic_read(&mce_callin) != num_online_cpus()) {
                if (mce_timed_out(&timeout,
                                  "Timeout: Not all CPUs entered broadcast exception handler")) {
-                       arch_atomic_set(&global_nwo, 0);
+                       raw_atomic_set(&global_nwo, 0);
                        goto out;
                }
                ndelay(SPINUNIT);
@@ -1054,7 +1054,7 @@ static noinstr int mce_start(int *no_way_out)
                /*
                 * Monarch: Starts executing now, the others wait.
                 */
-               arch_atomic_set(&mce_executing, 1);
+               raw_atomic_set(&mce_executing, 1);
        } else {
                /*
                 * Subject: Now start the scanning loop one by one in
@@ -1062,10 +1062,10 @@ static noinstr int mce_start(int *no_way_out)
                 * This way when there are any shared banks it will be
                 * only seen by one CPU before cleared, avoiding duplicates.
                 */
-               while (arch_atomic_read(&mce_executing) < order) {
+               while (raw_atomic_read(&mce_executing) < order) {
                        if (mce_timed_out(&timeout,
                                          "Timeout: Subject CPUs unable to finish machine check processing")) {
-                               arch_atomic_set(&global_nwo, 0);
+                               raw_atomic_set(&global_nwo, 0);
                                goto out;
                        }
                        ndelay(SPINUNIT);
@@ -1075,7 +1075,7 @@ static noinstr int mce_start(int *no_way_out)
        /*
         * Cache the global no_way_out state.
         */
-       *no_way_out = arch_atomic_read(&global_nwo);
+       *no_way_out = raw_atomic_read(&global_nwo);
 
        ret = order;
 
@@ -1533,7 +1533,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
                /* If this triggers there is no way to recover. Die hard. */
                BUG_ON(!on_thread_stack() || !user_mode(regs));
 
-               if (kill_current_task)
+               if (!mce_usable_address(&m))
                        queue_task_work(&m, msg, kill_me_now);
                else
                        queue_task_work(&m, msg, kill_me_maybe);
index f5fdeb1..87208e4 100644 (file)
@@ -78,8 +78,6 @@ static u16 find_equiv_id(struct equiv_cpu_table *et, u32 sig)
 
                if (sig == e->installed_cpu)
                        return e->equiv_cpu;
-
-               e++;
        }
        return 0;
 }
@@ -596,11 +594,6 @@ void reload_ucode_amd(unsigned int cpu)
                }
        }
 }
-static u16 __find_equiv_id(unsigned int cpu)
-{
-       struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
-       return find_equiv_id(&equiv_table, uci->cpu_sig.sig);
-}
 
 /*
  * a small, trivial cache of per-family ucode patches
@@ -651,9 +644,11 @@ static void free_cache(void)
 
 static struct ucode_patch *find_patch(unsigned int cpu)
 {
+       struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
        u16 equiv_id;
 
-       equiv_id = __find_equiv_id(cpu);
+
+       equiv_id = find_equiv_id(&equiv_table, uci->cpu_sig.sig);
        if (!equiv_id)
                return NULL;
 
@@ -705,7 +700,7 @@ static enum ucode_state apply_microcode_amd(int cpu)
        rdmsr(MSR_AMD64_PATCH_LEVEL, rev, dummy);
 
        /* need to apply patch? */
-       if (rev >= mc_amd->hdr.patch_id) {
+       if (rev > mc_amd->hdr.patch_id) {
                ret = UCODE_OK;
                goto out;
        }
index cc4f9f1..aee4bc5 100644 (file)
@@ -1,4 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-y          := mtrr.o if.o generic.o cleanup.o
-obj-$(CONFIG_X86_32) += amd.o cyrix.o centaur.o
+obj-$(CONFIG_X86_32) += amd.o cyrix.o centaur.o legacy.o
 
index eff6ac6..ef3e8e4 100644 (file)
@@ -110,7 +110,7 @@ amd_validate_add_page(unsigned long base, unsigned long size, unsigned int type)
 }
 
 const struct mtrr_ops amd_mtrr_ops = {
-       .vendor            = X86_VENDOR_AMD,
+       .var_regs          = 2,
        .set               = amd_set_mtrr,
        .get               = amd_get_mtrr,
        .get_free_region   = generic_get_free_region,
index b8a74ed..6f6c3ae 100644 (file)
@@ -45,15 +45,6 @@ centaur_get_free_region(unsigned long base, unsigned long size, int replace_reg)
        return -ENOSPC;
 }
 
-/*
- * Report boot time MCR setups
- */
-void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
-{
-       centaur_mcr[mcr].low = lo;
-       centaur_mcr[mcr].high = hi;
-}
-
 static void
 centaur_get_mcr(unsigned int reg, unsigned long *base,
                unsigned long *size, mtrr_type * type)
@@ -112,7 +103,7 @@ centaur_validate_add_page(unsigned long base, unsigned long size, unsigned int t
 }
 
 const struct mtrr_ops centaur_mtrr_ops = {
-       .vendor            = X86_VENDOR_CENTAUR,
+       .var_regs          = 8,
        .set               = centaur_set_mcr,
        .get               = centaur_get_mcr,
        .get_free_region   = centaur_get_free_region,
index b5f4304..18cf79d 100644 (file)
@@ -55,9 +55,6 @@ static int __initdata                         nr_range;
 
 static struct var_mtrr_range_state __initdata  range_state[RANGE_NUM];
 
-static int __initdata debug_print;
-#define Dprintk(x...) do { if (debug_print) pr_debug(x); } while (0)
-
 #define BIOS_BUG_MSG \
        "WARNING: BIOS bug: VAR MTRR %d contains strange UC entry under 1M, check with your system vendor!\n"
 
@@ -79,12 +76,11 @@ x86_get_mtrr_mem_range(struct range *range, int nr_range,
                nr_range = add_range_with_merge(range, RANGE_NUM, nr_range,
                                                base, base + size);
        }
-       if (debug_print) {
-               pr_debug("After WB checking\n");
-               for (i = 0; i < nr_range; i++)
-                       pr_debug("MTRR MAP PFN: %016llx - %016llx\n",
-                                range[i].start, range[i].end);
-       }
+
+       Dprintk("After WB checking\n");
+       for (i = 0; i < nr_range; i++)
+               Dprintk("MTRR MAP PFN: %016llx - %016llx\n",
+                        range[i].start, range[i].end);
 
        /* Take out UC ranges: */
        for (i = 0; i < num_var_ranges; i++) {
@@ -112,24 +108,22 @@ x86_get_mtrr_mem_range(struct range *range, int nr_range,
                subtract_range(range, RANGE_NUM, extra_remove_base,
                                 extra_remove_base + extra_remove_size);
 
-       if  (debug_print) {
-               pr_debug("After UC checking\n");
-               for (i = 0; i < RANGE_NUM; i++) {
-                       if (!range[i].end)
-                               continue;
-                       pr_debug("MTRR MAP PFN: %016llx - %016llx\n",
-                                range[i].start, range[i].end);
-               }
+       Dprintk("After UC checking\n");
+       for (i = 0; i < RANGE_NUM; i++) {
+               if (!range[i].end)
+                       continue;
+
+               Dprintk("MTRR MAP PFN: %016llx - %016llx\n",
+                        range[i].start, range[i].end);
        }
 
        /* sort the ranges */
        nr_range = clean_sort_range(range, RANGE_NUM);
-       if  (debug_print) {
-               pr_debug("After sorting\n");
-               for (i = 0; i < nr_range; i++)
-                       pr_debug("MTRR MAP PFN: %016llx - %016llx\n",
-                                range[i].start, range[i].end);
-       }
+
+       Dprintk("After sorting\n");
+       for (i = 0; i < nr_range; i++)
+               Dprintk("MTRR MAP PFN: %016llx - %016llx\n",
+                       range[i].start, range[i].end);
 
        return nr_range;
 }
@@ -164,16 +158,9 @@ static int __init enable_mtrr_cleanup_setup(char *str)
 }
 early_param("enable_mtrr_cleanup", enable_mtrr_cleanup_setup);
 
-static int __init mtrr_cleanup_debug_setup(char *str)
-{
-       debug_print = 1;
-       return 0;
-}
-early_param("mtrr_cleanup_debug", mtrr_cleanup_debug_setup);
-
 static void __init
 set_var_mtrr(unsigned int reg, unsigned long basek, unsigned long sizek,
-            unsigned char type, unsigned int address_bits)
+            unsigned char type)
 {
        u32 base_lo, base_hi, mask_lo, mask_hi;
        u64 base, mask;
@@ -183,7 +170,7 @@ set_var_mtrr(unsigned int reg, unsigned long basek, unsigned long sizek,
                return;
        }
 
-       mask = (1ULL << address_bits) - 1;
+       mask = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
        mask &= ~((((u64)sizek) << 10) - 1);
 
        base = ((u64)basek) << 10;
@@ -209,7 +196,7 @@ save_var_mtrr(unsigned int reg, unsigned long basek, unsigned long sizek,
        range_state[reg].type = type;
 }
 
-static void __init set_var_mtrr_all(unsigned int address_bits)
+static void __init set_var_mtrr_all(void)
 {
        unsigned long basek, sizek;
        unsigned char type;
@@ -220,7 +207,7 @@ static void __init set_var_mtrr_all(unsigned int address_bits)
                sizek = range_state[reg].size_pfn << (PAGE_SHIFT - 10);
                type = range_state[reg].type;
 
-               set_var_mtrr(reg, basek, sizek, type, address_bits);
+               set_var_mtrr(reg, basek, sizek, type);
        }
 }
 
@@ -267,7 +254,7 @@ range_to_mtrr(unsigned int reg, unsigned long range_startk,
                        align = max_align;
 
                sizek = 1UL << align;
-               if (debug_print) {
+               if (mtrr_debug) {
                        char start_factor = 'K', size_factor = 'K';
                        unsigned long start_base, size_base;
 
@@ -542,7 +529,7 @@ static void __init print_out_mtrr_range_state(void)
                start_base = to_size_factor(start_base, &start_factor);
                type = range_state[i].type;
 
-               pr_debug("reg %d, base: %ld%cB, range: %ld%cB, type %s\n",
+               Dprintk("reg %d, base: %ld%cB, range: %ld%cB, type %s\n",
                        i, start_base, start_factor,
                        size_base, size_factor,
                        (type == MTRR_TYPE_UNCACHABLE) ? "UC" :
@@ -680,7 +667,7 @@ static int __init mtrr_search_optimal_index(void)
        return index_good;
 }
 
-int __init mtrr_cleanup(unsigned address_bits)
+int __init mtrr_cleanup(void)
 {
        unsigned long x_remove_base, x_remove_size;
        unsigned long base, size, def, dummy;
@@ -689,7 +676,10 @@ int __init mtrr_cleanup(unsigned address_bits)
        int index_good;
        int i;
 
-       if (!is_cpu(INTEL) || enable_mtrr_cleanup < 1)
+       if (!mtrr_enabled())
+               return 0;
+
+       if (!cpu_feature_enabled(X86_FEATURE_MTRR) || enable_mtrr_cleanup < 1)
                return 0;
 
        rdmsr(MSR_MTRRdefType, def, dummy);
@@ -711,7 +701,7 @@ int __init mtrr_cleanup(unsigned address_bits)
                return 0;
 
        /* Print original var MTRRs at first, for debugging: */
-       pr_debug("original variable MTRRs\n");
+       Dprintk("original variable MTRRs\n");
        print_out_mtrr_range_state();
 
        memset(range, 0, sizeof(range));
@@ -742,8 +732,8 @@ int __init mtrr_cleanup(unsigned address_bits)
                mtrr_print_out_one_result(i);
 
                if (!result[i].bad) {
-                       set_var_mtrr_all(address_bits);
-                       pr_debug("New variable MTRRs\n");
+                       set_var_mtrr_all();
+                       Dprintk("New variable MTRRs\n");
                        print_out_mtrr_range_state();
                        return 1;
                }
@@ -763,7 +753,7 @@ int __init mtrr_cleanup(unsigned address_bits)
 
                        mtrr_calc_range_state(chunk_size, gran_size,
                                      x_remove_base, x_remove_size, i);
-                       if (debug_print) {
+                       if (mtrr_debug) {
                                mtrr_print_out_one_result(i);
                                pr_info("\n");
                        }
@@ -786,8 +776,8 @@ int __init mtrr_cleanup(unsigned address_bits)
                gran_size = result[i].gran_sizek;
                gran_size <<= 10;
                x86_setup_var_mtrrs(range, nr_range, chunk_size, gran_size);
-               set_var_mtrr_all(address_bits);
-               pr_debug("New variable MTRRs\n");
+               set_var_mtrr_all();
+               Dprintk("New variable MTRRs\n");
                print_out_mtrr_range_state();
                return 1;
        } else {
@@ -802,7 +792,7 @@ int __init mtrr_cleanup(unsigned address_bits)
        return 0;
 }
 #else
-int __init mtrr_cleanup(unsigned address_bits)
+int __init mtrr_cleanup(void)
 {
        return 0;
 }
@@ -882,15 +872,18 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
        /* extra one for all 0 */
        int num[MTRR_NUM_TYPES + 1];
 
+       if (!mtrr_enabled())
+               return 0;
+
        /*
         * Make sure we only trim uncachable memory on machines that
         * support the Intel MTRR architecture:
         */
-       if (!is_cpu(INTEL) || disable_mtrr_trim)
+       if (!cpu_feature_enabled(X86_FEATURE_MTRR) || disable_mtrr_trim)
                return 0;
 
        rdmsr(MSR_MTRRdefType, def, dummy);
-       def &= 0xff;
+       def &= MTRR_DEF_TYPE_TYPE;
        if (def != MTRR_TYPE_UNCACHABLE)
                return 0;
 
index 173b9e0..238dad5 100644 (file)
@@ -235,7 +235,7 @@ static void cyrix_set_arr(unsigned int reg, unsigned long base,
 }
 
 const struct mtrr_ops cyrix_mtrr_ops = {
-       .vendor            = X86_VENDOR_CYRIX,
+       .var_regs          = 8,
        .set               = cyrix_set_arr,
        .get               = cyrix_get_arr,
        .get_free_region   = cyrix_get_free_region,
index ee09d35..2d6aa5d 100644 (file)
@@ -8,10 +8,12 @@
 #include <linux/init.h>
 #include <linux/io.h>
 #include <linux/mm.h>
-
+#include <linux/cc_platform.h>
 #include <asm/processor-flags.h>
 #include <asm/cacheinfo.h>
 #include <asm/cpufeature.h>
+#include <asm/hypervisor.h>
+#include <asm/mshyperv.h>
 #include <asm/tlbflush.h>
 #include <asm/mtrr.h>
 #include <asm/msr.h>
@@ -31,6 +33,55 @@ static struct fixed_range_block fixed_range_blocks[] = {
        {}
 };
 
+struct cache_map {
+       u64 start;
+       u64 end;
+       u64 flags;
+       u64 type:8;
+       u64 fixed:1;
+};
+
+bool mtrr_debug;
+
+static int __init mtrr_param_setup(char *str)
+{
+       int rc = 0;
+
+       if (!str)
+               return -EINVAL;
+       if (!strcmp(str, "debug"))
+               mtrr_debug = true;
+       else
+               rc = -EINVAL;
+
+       return rc;
+}
+early_param("mtrr", mtrr_param_setup);
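This supersedes the cleanup code's internal debug_print flag: booting with "mtrr=debug" on the kernel command line sets mtrr_debug, which gates print_mtrr_state() and every Dprintk() appearing in the hunks below (the macro itself is added to mtrr.h near the end of this diff).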
+
+/*
+ * CACHE_MAP_MAX is the maximum number of memory ranges in cache_map, where
+ * no 2 adjacent ranges have the same cache mode (those would be merged).
+ * The number is based on the worst case:
+ * - no two adjacent fixed MTRRs share the same cache mode
+ * - one variable MTRR is spanning a huge area with mode WB
+ * - 255 variable MTRRs with mode UC all overlap with the WB MTRR, creating 2
+ *   additional ranges each (result like "ababababa...aba" with a = WB, b = UC),
+ *   accounting for MTRR_MAX_VAR_RANGES * 2 - 1 range entries
+ * - a TOP_MEM2 area (even when overlapping an UC MTRR it can't add 2 range
+ *   entries to the possible maximum, as it always starts at 4GB and thus can't
+ *   be in the middle of that MTRR, unless that MTRR starts at 0, which would
+ *   remove the initial "a" from the "abababa" pattern above)
+ * The map won't contain ranges with no matching MTRR (those fall back to the
+ * default cache mode).
+ */
+#define CACHE_MAP_MAX  (MTRR_NUM_FIXED_RANGES + MTRR_MAX_VAR_RANGES * 2)
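A quick worked bound (assuming the usual values MTRR_NUM_FIXED_RANGES = 88 and MTRR_MAX_VAR_RANGES = 256 from the MTRR headers, which are not restated in this hunk):

	CACHE_MAP_MAX = 88 + 256 * 2 = 600 entries

With four u64-sized members per struct cache_map that is roughly 19 KB of __initdata for the worst case; the runtime copy allocated in mtrr_copy_map() further down is sized via get_cache_map_size() to what is actually possible on the running system.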
+
+static struct cache_map init_cache_map[CACHE_MAP_MAX] __initdata;
+static struct cache_map *cache_map __refdata = init_cache_map;
+static unsigned int cache_map_size = CACHE_MAP_MAX;
+static unsigned int cache_map_n;
+static unsigned int cache_map_fixed;
+
 static unsigned long smp_changes_mask;
 static int mtrr_state_set;
 u64 mtrr_tom2;
@@ -38,6 +89,9 @@ u64 mtrr_tom2;
 struct mtrr_state_type mtrr_state;
 EXPORT_SYMBOL_GPL(mtrr_state);
 
+/* Reserved bits in the high portion of the MTRRphysBaseN MSR. */
+u32 phys_hi_rsvd;
+
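For illustration (a hypothetical CPU with 36 physical address bits): mtrr_bp_init() later sets

	phys_hi_rsvd = GENMASK(31, 36 - 32) = 0xfffffff0

so bits 4..31 of the high MSR half are treated as reserved and only bits 32..35 carry physical address bits. This single mask replaces the old size_or_mask/size_and_mask pair removed elsewhere in this diff.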
 /*
  * BIOS is expected to clear MtrrFixDramModEn bit, see for example
  * "BIOS and Kernel Developer's Guide for the AMD Athlon 64 and AMD
@@ -69,175 +123,370 @@ static u64 get_mtrr_size(u64 mask)
 {
        u64 size;
 
-       mask >>= PAGE_SHIFT;
-       mask |= size_or_mask;
+       mask |= (u64)phys_hi_rsvd << 32;
        size = -mask;
-       size <<= PAGE_SHIFT;
+
        return size;
 }
 
+static u8 get_var_mtrr_state(unsigned int reg, u64 *start, u64 *size)
+{
+       struct mtrr_var_range *mtrr = mtrr_state.var_ranges + reg;
+
+       if (!(mtrr->mask_lo & MTRR_PHYSMASK_V))
+               return MTRR_TYPE_INVALID;
+
+       *start = (((u64)mtrr->base_hi) << 32) + (mtrr->base_lo & PAGE_MASK);
+       *size = get_mtrr_size((((u64)mtrr->mask_hi) << 32) +
+                             (mtrr->mask_lo & PAGE_MASK));
+
+       return mtrr->base_lo & MTRR_PHYSBASE_TYPE;
+}
+
+static u8 get_effective_type(u8 type1, u8 type2)
+{
+       if (type1 == MTRR_TYPE_UNCACHABLE || type2 == MTRR_TYPE_UNCACHABLE)
+               return MTRR_TYPE_UNCACHABLE;
+
+       if ((type1 == MTRR_TYPE_WRBACK && type2 == MTRR_TYPE_WRTHROUGH) ||
+           (type1 == MTRR_TYPE_WRTHROUGH && type2 == MTRR_TYPE_WRBACK))
+               return MTRR_TYPE_WRTHROUGH;
+
+       if (type1 != type2)
+               return MTRR_TYPE_UNCACHABLE;
+
+       return type1;
+}
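The combination rules follow the architectural MTRR overlap precedence; a few illustrative pairs (derived from the function above, not part of the patch):

	get_effective_type(MTRR_TYPE_WRBACK,    MTRR_TYPE_UNCACHABLE) == MTRR_TYPE_UNCACHABLE
	get_effective_type(MTRR_TYPE_WRBACK,    MTRR_TYPE_WRTHROUGH)  == MTRR_TYPE_WRTHROUGH
	get_effective_type(MTRR_TYPE_WRBACK,    MTRR_TYPE_WRCOMB)     == MTRR_TYPE_UNCACHABLE
	get_effective_type(MTRR_TYPE_WRTHROUGH, MTRR_TYPE_WRTHROUGH)  == MTRR_TYPE_WRTHROUGH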
+
+static void rm_map_entry_at(int idx)
+{
+       cache_map_n--;
+       if (cache_map_n > idx) {
+               memmove(cache_map + idx, cache_map + idx + 1,
+                       sizeof(*cache_map) * (cache_map_n - idx));
+       }
+}
+
 /*
- * Check and return the effective type for MTRR-MTRR type overlap.
- * Returns 1 if the effective type is UNCACHEABLE, else returns 0
+ * Add an entry into cache_map at a specific index.  Merges adjacent entries if
+ * appropriate.  Return the number of merges for correcting the scan index
+ * (this is needed as merging will reduce the number of entries, which will
+ * result in skipping entries in future iterations if the scan index isn't
+ * corrected).
+ * Note that the corrected index can never go below -1 (resulting in being 0 in
+ * the next scan iteration), as "2" is returned only if the current index is
+ * larger than zero.
  */
-static int check_type_overlap(u8 *prev, u8 *curr)
+static int add_map_entry_at(u64 start, u64 end, u8 type, int idx)
 {
-       if (*prev == MTRR_TYPE_UNCACHABLE || *curr == MTRR_TYPE_UNCACHABLE) {
-               *prev = MTRR_TYPE_UNCACHABLE;
-               *curr = MTRR_TYPE_UNCACHABLE;
-               return 1;
+       bool merge_prev = false, merge_next = false;
+
+       if (start >= end)
+               return 0;
+
+       if (idx > 0) {
+               struct cache_map *prev = cache_map + idx - 1;
+
+               if (!prev->fixed && start == prev->end && type == prev->type)
+                       merge_prev = true;
        }
 
-       if ((*prev == MTRR_TYPE_WRBACK && *curr == MTRR_TYPE_WRTHROUGH) ||
-           (*prev == MTRR_TYPE_WRTHROUGH && *curr == MTRR_TYPE_WRBACK)) {
-               *prev = MTRR_TYPE_WRTHROUGH;
-               *curr = MTRR_TYPE_WRTHROUGH;
+       if (idx < cache_map_n) {
+               struct cache_map *next = cache_map + idx;
+
+               if (!next->fixed && end == next->start && type == next->type)
+                       merge_next = true;
        }
 
-       if (*prev != *curr) {
-               *prev = MTRR_TYPE_UNCACHABLE;
-               *curr = MTRR_TYPE_UNCACHABLE;
+       if (merge_prev && merge_next) {
+               cache_map[idx - 1].end = cache_map[idx].end;
+               rm_map_entry_at(idx);
+               return 2;
+       }
+       if (merge_prev) {
+               cache_map[idx - 1].end = end;
                return 1;
        }
+       if (merge_next) {
+               cache_map[idx].start = start;
+               return 1;
+       }
+
+       /* Sanity check: the array should NEVER be too small! */
+       if (cache_map_n == cache_map_size) {
+               WARN(1, "MTRR cache mode memory map exhausted!\n");
+               cache_map_n = cache_map_fixed;
+               return 0;
+       }
+
+       if (cache_map_n > idx) {
+               memmove(cache_map + idx + 1, cache_map + idx,
+                       sizeof(*cache_map) * (cache_map_n - idx));
+       }
+
+       cache_map[idx].start = start;
+       cache_map[idx].end = end;
+       cache_map[idx].type = type;
+       cache_map[idx].fixed = 0;
+       cache_map_n++;
 
        return 0;
 }
 
-/**
- * mtrr_type_lookup_fixed - look up memory type in MTRR fixed entries
- *
- * Return the MTRR fixed memory type of 'start'.
- *
- * MTRR fixed entries are divided into the following ways:
- *  0x00000 - 0x7FFFF : This range is divided into eight 64KB sub-ranges
- *  0x80000 - 0xBFFFF : This range is divided into sixteen 16KB sub-ranges
- *  0xC0000 - 0xFFFFF : This range is divided into sixty-four 4KB sub-ranges
- *
- * Return Values:
- * MTRR_TYPE_(type)  - Matched memory type
- * MTRR_TYPE_INVALID - Unmatched
+/* Clear a part of an entry. Return 1 if start of entry is still valid. */
+static int clr_map_range_at(u64 start, u64 end, int idx)
+{
+       int ret = start != cache_map[idx].start;
+       u64 tmp;
+
+       if (start == cache_map[idx].start && end == cache_map[idx].end) {
+               rm_map_entry_at(idx);
+       } else if (start == cache_map[idx].start) {
+               cache_map[idx].start = end;
+       } else if (end == cache_map[idx].end) {
+               cache_map[idx].end = start;
+       } else {
+               tmp = cache_map[idx].end;
+               cache_map[idx].end = start;
+               add_map_entry_at(end, tmp, cache_map[idx].type, idx + 1);
+       }
+
+       return ret;
+}
+
+/*
+ * Add MTRR to the map.  The current map is scanned and each part of the MTRR
+ * either overlapping with an existing entry or with a hole in the map is
+ * handled separately.
  */
-static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
+static void add_map_entry(u64 start, u64 end, u8 type)
 {
-       int idx;
+       u8 new_type, old_type;
+       u64 tmp;
+       int i;
 
-       if (start >= 0x100000)
-               return MTRR_TYPE_INVALID;
+       for (i = 0; i < cache_map_n && start < end; i++) {
+               if (start >= cache_map[i].end)
+                       continue;
+
+               if (start < cache_map[i].start) {
+                       /* Region start has no overlap. */
+                       tmp = min(end, cache_map[i].start);
+                       i -= add_map_entry_at(start, tmp,  type, i);
+                       start = tmp;
+                       continue;
+               }
 
-       /* 0x0 - 0x7FFFF */
-       if (start < 0x80000) {
-               idx = 0;
-               idx += (start >> 16);
-               return mtrr_state.fixed_ranges[idx];
-       /* 0x80000 - 0xBFFFF */
-       } else if (start < 0xC0000) {
-               idx = 1 * 8;
-               idx += ((start - 0x80000) >> 14);
-               return mtrr_state.fixed_ranges[idx];
+               new_type = get_effective_type(type, cache_map[i].type);
+               old_type = cache_map[i].type;
+
+               if (cache_map[i].fixed || new_type == old_type) {
+                       /* Cut off start of new entry. */
+                       start = cache_map[i].end;
+                       continue;
+               }
+
+               /* Handle only overlapping part of region. */
+               tmp = min(end, cache_map[i].end);
+               i += clr_map_range_at(start, tmp, i);
+               i -= add_map_entry_at(start, tmp, new_type, i);
+               start = tmp;
        }
 
-       /* 0xC0000 - 0xFFFFF */
-       idx = 3 * 8;
-       idx += ((start - 0xC0000) >> 12);
-       return mtrr_state.fixed_ranges[idx];
+       /* Add rest of region after last map entry (rest might be empty). */
+       add_map_entry_at(start, end, type, i);
 }
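For illustration (hypothetical register contents, not from the patch): adding a WB MTRR covering 0-1GB and then a UC MTRR covering 512MB-768MB leaves three entries in the map:

	0x00000000 - 0x20000000  WB
	0x20000000 - 0x30000000  UC   (WB + UC combine to UC via get_effective_type())
	0x30000000 - 0x40000000  WB

Memory with no matching entry keeps the default type, so holes never consume map entries.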
 
-/**
- * mtrr_type_lookup_variable - look up memory type in MTRR variable entries
- *
- * Return Value:
- * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
- *
- * Output Arguments:
- * repeat - Set to 1 when [start:end] spanned across MTRR range and type
- *         returned corresponds only to [start:*partial_end].  Caller has
- *         to lookup again for [*partial_end:end].
- *
- * uniform - Set to 1 when an MTRR covers the region uniformly, i.e. the
- *          region is fully covered by a single MTRR entry or the default
- *          type.
+/* Add variable MTRRs to cache map. */
+static void map_add_var(void)
+{
+       u64 start, size;
+       unsigned int i;
+       u8 type;
+
+       /*
+        * Add AMD TOP_MEM2 area.  Can't be added in mtrr_build_map(), as it
+        * needs to be added again when rebuilding the map due to potentially
+        * having moved as a result of variable MTRRs for memory below 4GB.
+        */
+       if (mtrr_tom2) {
+               add_map_entry(BIT_ULL(32), mtrr_tom2, MTRR_TYPE_WRBACK);
+               cache_map[cache_map_n - 1].fixed = 1;
+       }
+
+       for (i = 0; i < num_var_ranges; i++) {
+               type = get_var_mtrr_state(i, &start, &size);
+               if (type != MTRR_TYPE_INVALID)
+                       add_map_entry(start, start + size, type);
+       }
+}
+
+/*
+ * Rebuild map by replacing variable entries.  Needs to be called when MTRR
+ * registers are being changed after boot, as such changes could include
+ * removals of registers, which are complicated to handle without rebuild of
+ * the map.
  */
-static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
-                                   int *repeat, u8 *uniform)
+void generic_rebuild_map(void)
 {
-       int i;
-       u64 base, mask;
-       u8 prev_match, curr_match;
+       if (mtrr_if != &generic_mtrr_ops)
+               return;
 
-       *repeat = 0;
-       *uniform = 1;
+       cache_map_n = cache_map_fixed;
 
-       prev_match = MTRR_TYPE_INVALID;
-       for (i = 0; i < num_var_ranges; ++i) {
-               unsigned short start_state, end_state, inclusive;
+       map_add_var();
+}
 
-               if (!(mtrr_state.var_ranges[i].mask_lo & (1 << 11)))
-                       continue;
+static unsigned int __init get_cache_map_size(void)
+{
+       return cache_map_fixed + 2 * num_var_ranges + (mtrr_tom2 != 0);
+}
 
-               base = (((u64)mtrr_state.var_ranges[i].base_hi) << 32) +
-                      (mtrr_state.var_ranges[i].base_lo & PAGE_MASK);
-               mask = (((u64)mtrr_state.var_ranges[i].mask_hi) << 32) +
-                      (mtrr_state.var_ranges[i].mask_lo & PAGE_MASK);
-
-               start_state = ((start & mask) == (base & mask));
-               end_state = ((end & mask) == (base & mask));
-               inclusive = ((start < base) && (end > base));
-
-               if ((start_state != end_state) || inclusive) {
-                       /*
-                        * We have start:end spanning across an MTRR.
-                        * We split the region into either
-                        *
-                        * - start_state:1
-                        * (start:mtrr_end)(mtrr_end:end)
-                        * - end_state:1
-                        * (start:mtrr_start)(mtrr_start:end)
-                        * - inclusive:1
-                        * (start:mtrr_start)(mtrr_start:mtrr_end)(mtrr_end:end)
-                        *
-                        * depending on kind of overlap.
-                        *
-                        * Return the type of the first region and a pointer
-                        * to the start of next region so that caller will be
-                        * advised to lookup again after having adjusted start
-                        * and end.
-                        *
-                        * Note: This way we handle overlaps with multiple
-                        * entries and the default type properly.
-                        */
-                       if (start_state)
-                               *partial_end = base + get_mtrr_size(mask);
-                       else
-                               *partial_end = base;
-
-                       if (unlikely(*partial_end <= start)) {
-                               WARN_ON(1);
-                               *partial_end = start + PAGE_SIZE;
-                       }
+/* Build the cache_map containing the cache modes per memory range. */
+void __init mtrr_build_map(void)
+{
+       u64 start, end, size;
+       unsigned int i;
+       u8 type;
 
-                       end = *partial_end - 1; /* end is inclusive */
-                       *repeat = 1;
-                       *uniform = 0;
+       /* Add fixed MTRRs, optimize for adjacent entries with same type. */
+       if (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED) {
+               /*
+                * Start with 64k size fixed entries, preset 1st one (hence the
+                * loop below is starting with index 1).
+                */
+               start = 0;
+               end = size = 0x10000;
+               type = mtrr_state.fixed_ranges[0];
+
+               for (i = 1; i < MTRR_NUM_FIXED_RANGES; i++) {
+                       /* 8 64k entries, then 16 16k ones, rest 4k. */
+                       if (i == 8 || i == 24)
+                               size >>= 2;
+
+                       if (mtrr_state.fixed_ranges[i] != type) {
+                               add_map_entry(start, end, type);
+                               start = end;
+                               type = mtrr_state.fixed_ranges[i];
+                       }
+                       end += size;
                }
+               add_map_entry(start, end, type);
+       }
 
-               if ((start & mask) != (base & mask))
-                       continue;
+       /* Mark fixed, they take precedence. */
+       for (i = 0; i < cache_map_n; i++)
+               cache_map[i].fixed = 1;
+       cache_map_fixed = cache_map_n;
 
-               curr_match = mtrr_state.var_ranges[i].base_lo & 0xff;
-               if (prev_match == MTRR_TYPE_INVALID) {
-                       prev_match = curr_match;
-                       continue;
+       map_add_var();
+
+       pr_info("MTRR map: %u entries (%u fixed + %u variable; max %u), built from %u variable MTRRs\n",
+               cache_map_n, cache_map_fixed, cache_map_n - cache_map_fixed,
+               get_cache_map_size(), num_var_ranges + (mtrr_tom2 != 0));
+
+       if (mtrr_debug) {
+               for (i = 0; i < cache_map_n; i++) {
+                       pr_info("%3u: %016llx-%016llx %s\n", i,
+                               cache_map[i].start, cache_map[i].end - 1,
+                               mtrr_attrib_to_str(cache_map[i].type));
                }
+       }
+}
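The i == 8 and i == 24 size adjustments reproduce the classic fixed-range layout that the removed mtrr_type_lookup_fixed() documented:

	fixed_ranges[ 0.. 7]:  8 x 64KB -> 0x00000 - 0x7FFFF
	fixed_ranges[ 8..23]: 16 x 16KB -> 0x80000 - 0xBFFFF
	fixed_ranges[24..87]: 64 x  4KB -> 0xC0000 - 0xFFFFF

After the loop the whole first megabyte is covered, and the resulting entries are marked fixed so they take precedence over any variable range added later.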
 
-               *uniform = 0;
-               if (check_type_overlap(&prev_match, &curr_match))
-                       return curr_match;
+/* Copy the cache_map from __initdata memory to dynamically allocated one. */
+void __init mtrr_copy_map(void)
+{
+       unsigned int new_size = get_cache_map_size();
+
+       if (!mtrr_state.enabled || !new_size) {
+               cache_map = NULL;
+               return;
+       }
+
+       mutex_lock(&mtrr_mutex);
+
+       cache_map = kcalloc(new_size, sizeof(*cache_map), GFP_KERNEL);
+       if (cache_map) {
+               memmove(cache_map, init_cache_map,
+                       cache_map_n * sizeof(*cache_map));
+               cache_map_size = new_size;
+       } else {
+               mtrr_state.enabled = 0;
+               pr_err("MTRRs disabled due to allocation failure for lookup map.\n");
+       }
+
+       mutex_unlock(&mtrr_mutex);
+}
+
+/**
+ * mtrr_overwrite_state - set static MTRR state
+ *
+ * Used to set MTRR state via different means (e.g. with data obtained from
+ * a hypervisor).
+ * This is allowed only for special cases when running virtualized, and must be
+ * called from the x86_init.hyper.init_platform() hook.  It can be called only
+ * once; the MTRR state can't be changed afterwards.  To ensure that,
+ * X86_FEATURE_MTRR is cleared.
+ */
+void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
+                         mtrr_type def_type)
+{
+       unsigned int i;
+
+       /* Only allowed to be called once before mtrr_bp_init(). */
+       if (WARN_ON_ONCE(mtrr_state_set))
+               return;
+
+       /* Only allowed when running virtualized. */
+       if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
+               return;
+
+       /*
+        * Only allowed for special virtualization cases:
+        * - when running as Hyper-V, SEV-SNP guest using vTOM
+        * - when running as Xen PV guest
+        * - when running as SEV-SNP or TDX guest to avoid unnecessary
+        *   VMM communication/Virtualization exceptions (#VC, #VE)
+        */
+       if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) &&
+           !hv_is_isolation_supported() &&
+           !cpu_feature_enabled(X86_FEATURE_XENPV) &&
+           !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
+               return;
+
+       /* Disable MTRR in order to disable MTRR modifications. */
+       setup_clear_cpu_cap(X86_FEATURE_MTRR);
+
+       if (var) {
+               if (num_var > MTRR_MAX_VAR_RANGES) {
+                       pr_warn("Trying to overwrite MTRR state with %u variable entries\n",
+                               num_var);
+                       num_var = MTRR_MAX_VAR_RANGES;
+               }
+               for (i = 0; i < num_var; i++)
+                       mtrr_state.var_ranges[i] = var[i];
+               num_var_ranges = num_var;
        }
 
-       if (prev_match != MTRR_TYPE_INVALID)
-               return prev_match;
+       mtrr_state.def_type = def_type;
+       mtrr_state.enabled |= MTRR_STATE_MTRR_ENABLED;
 
-       return mtrr_state.def_type;
+       mtrr_state_set = 1;
+}
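A minimal usage sketch (hypothetical guest platform code; the real callers are not part of this hunk): a virtualized guest that wants the whole address space treated as write-back without ever touching MTRR MSRs could do, from its x86_init.hyper.init_platform() hook:

	static void __init example_guest_init_platform(void)
	{
		/* No variable ranges, plain WB default type for all of memory. */
		mtrr_overwrite_state(NULL, 0, MTRR_TYPE_WRBACK);
	}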
+
+static u8 type_merge(u8 type, u8 new_type, u8 *uniform)
+{
+       u8 effective_type;
+
+       if (type == MTRR_TYPE_INVALID)
+               return new_type;
+
+       effective_type = get_effective_type(type, new_type);
+       if (type != effective_type)
+               *uniform = 0;
+
+       return effective_type;
 }
 
 /**
@@ -248,66 +497,49 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
  * MTRR_TYPE_INVALID - MTRR is disabled
  *
  * Output Argument:
- * uniform - Set to 1 when an MTRR covers the region uniformly, i.e. the
- *          region is fully covered by a single MTRR entry or the default
- *          type.
+ * uniform - Set to 1 when the returned MTRR type is valid for the whole
+ *          region, set to 0 otherwise.
  */
 u8 mtrr_type_lookup(u64 start, u64 end, u8 *uniform)
 {
-       u8 type, prev_type, is_uniform = 1, dummy;
-       int repeat;
-       u64 partial_end;
+       u8 type = MTRR_TYPE_INVALID;
+       unsigned int i;
 
-       /* Make end inclusive instead of exclusive */
-       end--;
+       if (!mtrr_state_set) {
+               /* Uniformity is unknown. */
+               *uniform = 0;
+               return MTRR_TYPE_UNCACHABLE;
+       }
 
-       if (!mtrr_state_set)
-               return MTRR_TYPE_INVALID;
+       *uniform = 1;
 
        if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
-               return MTRR_TYPE_INVALID;
+               return MTRR_TYPE_UNCACHABLE;
 
-       /*
-        * Look up the fixed ranges first, which take priority over
-        * the variable ranges.
-        */
-       if ((start < 0x100000) &&
-           (mtrr_state.have_fixed) &&
-           (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
-               is_uniform = 0;
-               type = mtrr_type_lookup_fixed(start, end);
-               goto out;
-       }
+       for (i = 0; i < cache_map_n && start < end; i++) {
+               /* Region after current map entry? -> continue with next one. */
+               if (start >= cache_map[i].end)
+                       continue;
 
-       /*
-        * Look up the variable ranges.  Look of multiple ranges matching
-        * this address and pick type as per MTRR precedence.
-        */
-       type = mtrr_type_lookup_variable(start, end, &partial_end,
-                                        &repeat, &is_uniform);
+               /* Start of region not covered by current map entry? */
+               if (start < cache_map[i].start) {
+                       /* At least some part of region has default type. */
+                       type = type_merge(type, mtrr_state.def_type, uniform);
+                       /* End of region not covered, too? -> lookup done. */
+                       if (end <= cache_map[i].start)
+                               return type;
+               }
 
-       /*
-        * Common path is with repeat = 0.
-        * However, we can have cases where [start:end] spans across some
-        * MTRR ranges and/or the default type.  Do repeated lookups for
-        * that case here.
-        */
-       while (repeat) {
-               prev_type = type;
-               start = partial_end;
-               is_uniform = 0;
-               type = mtrr_type_lookup_variable(start, end, &partial_end,
-                                                &repeat, &dummy);
+               /* At least part of region covered by map entry. */
+               type = type_merge(type, cache_map[i].type, uniform);
 
-               if (check_type_overlap(&prev_type, &type))
-                       goto out;
+               start = cache_map[i].end;
        }
 
-       if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
-               type = MTRR_TYPE_WRBACK;
+       /* End of region past last entry in map? -> use default type. */
+       if (start < end)
+               type = type_merge(type, mtrr_state.def_type, uniform);
 
-out:
-       *uniform = is_uniform;
        return type;
 }
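The uniform flag matters mostly to large-page mapping code; a simplified sketch of the calling pattern (not the exact pgtable/PAT code):

	u8 uniform;
	u8 type = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);

	/* Refuse a single huge mapping if MTRRs force mixed cache modes inside it. */
	if (!uniform)
		return 0;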
 
@@ -363,8 +595,8 @@ static void __init print_fixed_last(void)
        if (!last_fixed_end)
                return;
 
-       pr_debug("  %05X-%05X %s\n", last_fixed_start,
-                last_fixed_end - 1, mtrr_attrib_to_str(last_fixed_type));
+       pr_info("  %05X-%05X %s\n", last_fixed_start,
+               last_fixed_end - 1, mtrr_attrib_to_str(last_fixed_type));
 
        last_fixed_end = 0;
 }
@@ -402,10 +634,10 @@ static void __init print_mtrr_state(void)
        unsigned int i;
        int high_width;
 
-       pr_debug("MTRR default type: %s\n",
-                mtrr_attrib_to_str(mtrr_state.def_type));
+       pr_info("MTRR default type: %s\n",
+               mtrr_attrib_to_str(mtrr_state.def_type));
        if (mtrr_state.have_fixed) {
-               pr_debug("MTRR fixed ranges %sabled:\n",
+               pr_info("MTRR fixed ranges %sabled:\n",
                        ((mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED) &&
                         (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) ?
                         "en" : "dis");
@@ -420,26 +652,27 @@ static void __init print_mtrr_state(void)
                /* tail */
                print_fixed_last();
        }
-       pr_debug("MTRR variable ranges %sabled:\n",
-                mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED ? "en" : "dis");
-       high_width = (__ffs64(size_or_mask) - (32 - PAGE_SHIFT) + 3) / 4;
+       pr_info("MTRR variable ranges %sabled:\n",
+               mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED ? "en" : "dis");
+       high_width = (boot_cpu_data.x86_phys_bits - (32 - PAGE_SHIFT) + 3) / 4;
 
        for (i = 0; i < num_var_ranges; ++i) {
-               if (mtrr_state.var_ranges[i].mask_lo & (1 << 11))
-                       pr_debug("  %u base %0*X%05X000 mask %0*X%05X000 %s\n",
-                                i,
-                                high_width,
-                                mtrr_state.var_ranges[i].base_hi,
-                                mtrr_state.var_ranges[i].base_lo >> 12,
-                                high_width,
-                                mtrr_state.var_ranges[i].mask_hi,
-                                mtrr_state.var_ranges[i].mask_lo >> 12,
-                                mtrr_attrib_to_str(mtrr_state.var_ranges[i].base_lo & 0xff));
+               if (mtrr_state.var_ranges[i].mask_lo & MTRR_PHYSMASK_V)
+                       pr_info("  %u base %0*X%05X000 mask %0*X%05X000 %s\n",
+                               i,
+                               high_width,
+                               mtrr_state.var_ranges[i].base_hi,
+                               mtrr_state.var_ranges[i].base_lo >> 12,
+                               high_width,
+                               mtrr_state.var_ranges[i].mask_hi,
+                               mtrr_state.var_ranges[i].mask_lo >> 12,
+                               mtrr_attrib_to_str(mtrr_state.var_ranges[i].base_lo &
+                                                   MTRR_PHYSBASE_TYPE));
                else
-                       pr_debug("  %u disabled\n", i);
+                       pr_info("  %u disabled\n", i);
        }
        if (mtrr_tom2)
-               pr_debug("TOM2: %016llx aka %lldM\n", mtrr_tom2, mtrr_tom2>>20);
+               pr_info("TOM2: %016llx aka %lldM\n", mtrr_tom2, mtrr_tom2>>20);
 }
 
 /* Grab all of the MTRR state for this CPU into *state */
@@ -452,7 +685,7 @@ bool __init get_mtrr_state(void)
        vrs = mtrr_state.var_ranges;
 
        rdmsr(MSR_MTRRcap, lo, dummy);
-       mtrr_state.have_fixed = (lo >> 8) & 1;
+       mtrr_state.have_fixed = lo & MTRR_CAP_FIX;
 
        for (i = 0; i < num_var_ranges; i++)
                get_mtrr_var_range(i, &vrs[i]);
@@ -460,8 +693,8 @@ bool __init get_mtrr_state(void)
                get_fixed_ranges(mtrr_state.fixed_ranges);
 
        rdmsr(MSR_MTRRdefType, lo, dummy);
-       mtrr_state.def_type = (lo & 0xff);
-       mtrr_state.enabled = (lo & 0xc00) >> 10;
+       mtrr_state.def_type = lo & MTRR_DEF_TYPE_TYPE;
+       mtrr_state.enabled = (lo & MTRR_DEF_TYPE_ENABLE) >> MTRR_STATE_SHIFT;
 
        if (amd_special_default_mtrr()) {
                unsigned low, high;
@@ -474,7 +707,8 @@ bool __init get_mtrr_state(void)
                mtrr_tom2 &= 0xffffff800000ULL;
        }
 
-       print_mtrr_state();
+       if (mtrr_debug)
+               print_mtrr_state();
 
        mtrr_state_set = 1;
 
@@ -574,7 +808,7 @@ static void generic_get_mtrr(unsigned int reg, unsigned long *base,
 
        rdmsr(MTRRphysMask_MSR(reg), mask_lo, mask_hi);
 
-       if ((mask_lo & 0x800) == 0) {
+       if (!(mask_lo & MTRR_PHYSMASK_V)) {
                /*  Invalid (i.e. free) range */
                *base = 0;
                *size = 0;
@@ -585,8 +819,8 @@ static void generic_get_mtrr(unsigned int reg, unsigned long *base,
        rdmsr(MTRRphysBase_MSR(reg), base_lo, base_hi);
 
        /* Work out the shifted address mask: */
-       tmp = (u64)mask_hi << (32 - PAGE_SHIFT) | mask_lo >> PAGE_SHIFT;
-       mask = size_or_mask | tmp;
+       tmp = (u64)mask_hi << 32 | (mask_lo & PAGE_MASK);
+       mask = (u64)phys_hi_rsvd << 32 | tmp;
 
        /* Expand tmp with high bits to all 1s: */
        hi = fls64(tmp);
@@ -604,9 +838,9 @@ static void generic_get_mtrr(unsigned int reg, unsigned long *base,
         * This works correctly if size is a power of two, i.e. a
         * contiguous range:
         */
-       *size = -mask;
+       *size = -mask >> PAGE_SHIFT;
        *base = (u64)base_hi << (32 - PAGE_SHIFT) | base_lo >> PAGE_SHIFT;
-       *type = base_lo & 0xff;
+       *type = base_lo & MTRR_PHYSBASE_TYPE;
 
 out_put_cpu:
        put_cpu();
@@ -644,9 +878,8 @@ static bool set_mtrr_var_ranges(unsigned int index, struct mtrr_var_range *vr)
        bool changed = false;
 
        rdmsr(MTRRphysBase_MSR(index), lo, hi);
-       if ((vr->base_lo & 0xfffff0ffUL) != (lo & 0xfffff0ffUL)
-           || (vr->base_hi & (size_and_mask >> (32 - PAGE_SHIFT))) !=
-               (hi & (size_and_mask >> (32 - PAGE_SHIFT)))) {
+       if ((vr->base_lo & ~MTRR_PHYSBASE_RSVD) != (lo & ~MTRR_PHYSBASE_RSVD)
+           || (vr->base_hi & ~phys_hi_rsvd) != (hi & ~phys_hi_rsvd)) {
 
                mtrr_wrmsr(MTRRphysBase_MSR(index), vr->base_lo, vr->base_hi);
                changed = true;
@@ -654,9 +887,8 @@ static bool set_mtrr_var_ranges(unsigned int index, struct mtrr_var_range *vr)
 
        rdmsr(MTRRphysMask_MSR(index), lo, hi);
 
-       if ((vr->mask_lo & 0xfffff800UL) != (lo & 0xfffff800UL)
-           || (vr->mask_hi & (size_and_mask >> (32 - PAGE_SHIFT))) !=
-               (hi & (size_and_mask >> (32 - PAGE_SHIFT)))) {
+       if ((vr->mask_lo & ~MTRR_PHYSMASK_RSVD) != (lo & ~MTRR_PHYSMASK_RSVD)
+           || (vr->mask_hi & ~phys_hi_rsvd) != (hi & ~phys_hi_rsvd)) {
                mtrr_wrmsr(MTRRphysMask_MSR(index), vr->mask_lo, vr->mask_hi);
                changed = true;
        }
@@ -691,11 +923,12 @@ static unsigned long set_mtrr_state(void)
         * Set_mtrr_restore restores the old value of MTRRdefType,
         * so to set it we fiddle with the saved value:
         */
-       if ((deftype_lo & 0xff) != mtrr_state.def_type
-           || ((deftype_lo & 0xc00) >> 10) != mtrr_state.enabled) {
+       if ((deftype_lo & MTRR_DEF_TYPE_TYPE) != mtrr_state.def_type ||
+           ((deftype_lo & MTRR_DEF_TYPE_ENABLE) >> MTRR_STATE_SHIFT) != mtrr_state.enabled) {
 
-               deftype_lo = (deftype_lo & ~0xcff) | mtrr_state.def_type |
-                            (mtrr_state.enabled << 10);
+               deftype_lo = (deftype_lo & MTRR_DEF_TYPE_DISABLE) |
+                            mtrr_state.def_type |
+                            (mtrr_state.enabled << MTRR_STATE_SHIFT);
                change_mask |= MTRR_CHANGE_MASK_DEFTYPE;
        }
 
@@ -708,7 +941,7 @@ void mtrr_disable(void)
        rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
 
        /* Disable MTRRs, and set the default type to uncached */
-       mtrr_wrmsr(MSR_MTRRdefType, deftype_lo & ~0xcff, deftype_hi);
+       mtrr_wrmsr(MSR_MTRRdefType, deftype_lo & MTRR_DEF_TYPE_DISABLE, deftype_hi);
 }
 
 void mtrr_enable(void)
@@ -762,9 +995,9 @@ static void generic_set_mtrr(unsigned int reg, unsigned long base,
                memset(vr, 0, sizeof(struct mtrr_var_range));
        } else {
                vr->base_lo = base << PAGE_SHIFT | type;
-               vr->base_hi = (base & size_and_mask) >> (32 - PAGE_SHIFT);
-               vr->mask_lo = -size << PAGE_SHIFT | 0x800;
-               vr->mask_hi = (-size & size_and_mask) >> (32 - PAGE_SHIFT);
+               vr->base_hi = (base >> (32 - PAGE_SHIFT)) & ~phys_hi_rsvd;
+               vr->mask_lo = -size << PAGE_SHIFT | MTRR_PHYSMASK_V;
+               vr->mask_hi = (-size >> (32 - PAGE_SHIFT)) & ~phys_hi_rsvd;
 
                mtrr_wrmsr(MTRRphysBase_MSR(reg), vr->base_lo, vr->base_hi);
                mtrr_wrmsr(MTRRphysMask_MSR(reg), vr->mask_lo, vr->mask_hi);
@@ -783,7 +1016,7 @@ int generic_validate_add_page(unsigned long base, unsigned long size,
         * For Intel PPro stepping <= 7
         * must be 4 MiB aligned and not touch 0x70000000 -> 0x7003FFFF
         */
-       if (is_cpu(INTEL) && boot_cpu_data.x86 == 6 &&
+       if (mtrr_if == &generic_mtrr_ops && boot_cpu_data.x86 == 6 &&
            boot_cpu_data.x86_model == 1 &&
            boot_cpu_data.x86_stepping <= 7) {
                if (base & ((1 << (22 - PAGE_SHIFT)) - 1)) {
@@ -817,7 +1050,7 @@ static int generic_have_wrcomb(void)
 {
        unsigned long config, dummy;
        rdmsr(MSR_MTRRcap, config, dummy);
-       return config & (1 << 10);
+       return config & MTRR_CAP_WC;
 }
 
 int positive_have_wrcomb(void)
diff --git a/arch/x86/kernel/cpu/mtrr/legacy.c b/arch/x86/kernel/cpu/mtrr/legacy.c
new file mode 100644 (file)
index 0000000..d25882f
--- /dev/null
@@ -0,0 +1,90 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/syscore_ops.h>
+#include <asm/cpufeature.h>
+#include <asm/mtrr.h>
+#include <asm/processor.h>
+#include "mtrr.h"
+
+void mtrr_set_if(void)
+{
+       switch (boot_cpu_data.x86_vendor) {
+       case X86_VENDOR_AMD:
+               /* Pre-Athlon (K6) AMD CPU MTRRs */
+               if (cpu_feature_enabled(X86_FEATURE_K6_MTRR))
+                       mtrr_if = &amd_mtrr_ops;
+               break;
+       case X86_VENDOR_CENTAUR:
+               if (cpu_feature_enabled(X86_FEATURE_CENTAUR_MCR))
+                       mtrr_if = &centaur_mtrr_ops;
+               break;
+       case X86_VENDOR_CYRIX:
+               if (cpu_feature_enabled(X86_FEATURE_CYRIX_ARR))
+                       mtrr_if = &cyrix_mtrr_ops;
+               break;
+       default:
+               break;
+       }
+}
+
+/*
+ * The suspend/resume methods are only for CPUs without MTRR. CPUs using the
+ * generic MTRR driver don't require this.
+ */
+struct mtrr_value {
+       mtrr_type       ltype;
+       unsigned long   lbase;
+       unsigned long   lsize;
+};
+
+static struct mtrr_value *mtrr_value;
+
+static int mtrr_save(void)
+{
+       int i;
+
+       if (!mtrr_value)
+               return -ENOMEM;
+
+       for (i = 0; i < num_var_ranges; i++) {
+               mtrr_if->get(i, &mtrr_value[i].lbase,
+                               &mtrr_value[i].lsize,
+                               &mtrr_value[i].ltype);
+       }
+       return 0;
+}
+
+static void mtrr_restore(void)
+{
+       int i;
+
+       for (i = 0; i < num_var_ranges; i++) {
+               if (mtrr_value[i].lsize) {
+                       mtrr_if->set(i, mtrr_value[i].lbase,
+                                    mtrr_value[i].lsize,
+                                    mtrr_value[i].ltype);
+               }
+       }
+}
+
+static struct syscore_ops mtrr_syscore_ops = {
+       .suspend        = mtrr_save,
+       .resume         = mtrr_restore,
+};
+
+void mtrr_register_syscore(void)
+{
+       mtrr_value = kcalloc(num_var_ranges, sizeof(*mtrr_value), GFP_KERNEL);
+
+       /*
+        * These CPUs have no MTRR and seem to not support SMP. They have
+        * vendor-specific drivers, so we use a tricky method to support
+        * suspend/resume for them.
+        *
+        * TBD: is there any system with such a CPU which supports
+        * suspend/resume? If not, we should remove the code.
+        */
+       register_syscore_ops(&mtrr_syscore_ops);
+}
index 783f321..767bf1c 100644 (file)
 #define MTRR_TO_PHYS_WC_OFFSET 1000
 
 u32 num_var_ranges;
-static bool mtrr_enabled(void)
-{
-       return !!mtrr_if;
-}
 
 unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
-static DEFINE_MUTEX(mtrr_mutex);
-
-u64 size_or_mask, size_and_mask;
+DEFINE_MUTEX(mtrr_mutex);
 
 const struct mtrr_ops *mtrr_if;
 
@@ -105,21 +99,6 @@ static int have_wrcomb(void)
        return mtrr_if->have_wrcomb ? mtrr_if->have_wrcomb() : 0;
 }
 
-/*  This function returns the number of variable MTRRs  */
-static void __init set_num_var_ranges(bool use_generic)
-{
-       unsigned long config = 0, dummy;
-
-       if (use_generic)
-               rdmsr(MSR_MTRRcap, config, dummy);
-       else if (is_cpu(AMD) || is_cpu(HYGON))
-               config = 2;
-       else if (is_cpu(CYRIX) || is_cpu(CENTAUR))
-               config = 8;
-
-       num_var_ranges = config & 0xff;
-}
-
 static void __init init_table(void)
 {
        int i, max;
@@ -194,20 +173,8 @@ static inline int types_compatible(mtrr_type type1, mtrr_type type2)
  * Note that the mechanism is the same for UP systems, too; all the SMP stuff
  * becomes nops.
  */
-static void
-set_mtrr(unsigned int reg, unsigned long base, unsigned long size, mtrr_type type)
-{
-       struct set_mtrr_data data = { .smp_reg = reg,
-                                     .smp_base = base,
-                                     .smp_size = size,
-                                     .smp_type = type
-                                   };
-
-       stop_machine(mtrr_rendezvous_handler, &data, cpu_online_mask);
-}
-
-static void set_mtrr_cpuslocked(unsigned int reg, unsigned long base,
-                               unsigned long size, mtrr_type type)
+static void set_mtrr(unsigned int reg, unsigned long base, unsigned long size,
+                    mtrr_type type)
 {
        struct set_mtrr_data data = { .smp_reg = reg,
                                      .smp_base = base,
@@ -216,6 +183,8 @@ static void set_mtrr_cpuslocked(unsigned int reg, unsigned long base,
                                    };
 
        stop_machine_cpuslocked(mtrr_rendezvous_handler, &data, cpu_online_mask);
+
+       generic_rebuild_map();
 }
 
 /**
@@ -337,7 +306,7 @@ int mtrr_add_page(unsigned long base, unsigned long size,
        /* Search for an empty MTRR */
        i = mtrr_if->get_free_region(base, size, replace);
        if (i >= 0) {
-               set_mtrr_cpuslocked(i, base, size, type);
+               set_mtrr(i, base, size, type);
                if (likely(replace < 0)) {
                        mtrr_usage_table[i] = 1;
                } else {
@@ -345,7 +314,7 @@ int mtrr_add_page(unsigned long base, unsigned long size,
                        if (increment)
                                mtrr_usage_table[i]++;
                        if (unlikely(replace != i)) {
-                               set_mtrr_cpuslocked(replace, 0, 0, 0);
+                               set_mtrr(replace, 0, 0, 0);
                                mtrr_usage_table[replace] = 0;
                        }
                }
@@ -363,7 +332,7 @@ static int mtrr_check(unsigned long base, unsigned long size)
 {
        if ((base & (PAGE_SIZE - 1)) || (size & (PAGE_SIZE - 1))) {
                pr_warn("size and base must be multiples of 4 kiB\n");
-               pr_debug("size: 0x%lx  base: 0x%lx\n", size, base);
+               Dprintk("size: 0x%lx  base: 0x%lx\n", size, base);
                dump_stack();
                return -1;
        }
@@ -454,8 +423,7 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
                        }
                }
                if (reg < 0) {
-                       pr_debug("no MTRR for %lx000,%lx000 found\n",
-                                base, size);
+                       Dprintk("no MTRR for %lx000,%lx000 found\n", base, size);
                        goto out;
                }
        }
@@ -473,7 +441,7 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
                goto out;
        }
        if (--mtrr_usage_table[reg] < 1)
-               set_mtrr_cpuslocked(reg, 0, 0, 0);
+               set_mtrr(reg, 0, 0, 0);
        error = reg;
  out:
        mutex_unlock(&mtrr_mutex);
@@ -574,136 +542,54 @@ int arch_phys_wc_index(int handle)
 }
 EXPORT_SYMBOL_GPL(arch_phys_wc_index);
 
-/* The suspend/resume methods are only for CPU without MTRR. CPU using generic
- * MTRR driver doesn't require this
- */
-struct mtrr_value {
-       mtrr_type       ltype;
-       unsigned long   lbase;
-       unsigned long   lsize;
-};
-
-static struct mtrr_value mtrr_value[MTRR_MAX_VAR_RANGES];
-
-static int mtrr_save(void)
-{
-       int i;
-
-       for (i = 0; i < num_var_ranges; i++) {
-               mtrr_if->get(i, &mtrr_value[i].lbase,
-                               &mtrr_value[i].lsize,
-                               &mtrr_value[i].ltype);
-       }
-       return 0;
-}
-
-static void mtrr_restore(void)
-{
-       int i;
-
-       for (i = 0; i < num_var_ranges; i++) {
-               if (mtrr_value[i].lsize) {
-                       set_mtrr(i, mtrr_value[i].lbase,
-                                   mtrr_value[i].lsize,
-                                   mtrr_value[i].ltype);
-               }
-       }
-}
-
-
-
-static struct syscore_ops mtrr_syscore_ops = {
-       .suspend        = mtrr_save,
-       .resume         = mtrr_restore,
-};
-
 int __initdata changed_by_mtrr_cleanup;
 
-#define SIZE_OR_MASK_BITS(n)  (~((1ULL << ((n) - PAGE_SHIFT)) - 1))
 /**
- * mtrr_bp_init - initialize mtrrs on the boot CPU
+ * mtrr_bp_init - initialize MTRRs on the boot CPU
  *
  * This needs to be called early; before any of the other CPUs are
  * initialized (i.e. before smp_init()).
- *
  */
 void __init mtrr_bp_init(void)
 {
+       bool generic_mtrrs = cpu_feature_enabled(X86_FEATURE_MTRR);
        const char *why = "(not available)";
-       u32 phys_addr;
-
-       phys_addr = 32;
+       unsigned long config, dummy;
 
-       if (boot_cpu_has(X86_FEATURE_MTRR)) {
-               mtrr_if = &generic_mtrr_ops;
-               size_or_mask = SIZE_OR_MASK_BITS(36);
-               size_and_mask = 0x00f00000;
-               phys_addr = 36;
+       phys_hi_rsvd = GENMASK(31, boot_cpu_data.x86_phys_bits - 32);
 
+       if (!generic_mtrrs && mtrr_state.enabled) {
                /*
-                * This is an AMD specific MSR, but we assume(hope?) that
-                * Intel will implement it too when they extend the address
-                * bus of the Xeon.
+                * Software overwrite of MTRR state, only for generic case.
+                * Note that X86_FEATURE_MTRR has been reset in this case.
                 */
-               if (cpuid_eax(0x80000000) >= 0x80000008) {
-                       phys_addr = cpuid_eax(0x80000008) & 0xff;
-                       /* CPUID workaround for Intel 0F33/0F34 CPU */
-                       if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
-                           boot_cpu_data.x86 == 0xF &&
-                           boot_cpu_data.x86_model == 0x3 &&
-                           (boot_cpu_data.x86_stepping == 0x3 ||
-                            boot_cpu_data.x86_stepping == 0x4))
-                               phys_addr = 36;
-
-                       size_or_mask = SIZE_OR_MASK_BITS(phys_addr);
-                       size_and_mask = ~size_or_mask & 0xfffff00000ULL;
-               } else if (boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR &&
-                          boot_cpu_data.x86 == 6) {
-                       /*
-                        * VIA C* family have Intel style MTRRs,
-                        * but don't support PAE
-                        */
-                       size_or_mask = SIZE_OR_MASK_BITS(32);
-                       size_and_mask = 0;
-                       phys_addr = 32;
-               }
-       } else {
-               switch (boot_cpu_data.x86_vendor) {
-               case X86_VENDOR_AMD:
-                       if (cpu_feature_enabled(X86_FEATURE_K6_MTRR)) {
-                               /* Pre-Athlon (K6) AMD CPU MTRRs */
-                               mtrr_if = &amd_mtrr_ops;
-                               size_or_mask = SIZE_OR_MASK_BITS(32);
-                               size_and_mask = 0;
-                       }
-                       break;
-               case X86_VENDOR_CENTAUR:
-                       if (cpu_feature_enabled(X86_FEATURE_CENTAUR_MCR)) {
-                               mtrr_if = &centaur_mtrr_ops;
-                               size_or_mask = SIZE_OR_MASK_BITS(32);
-                               size_and_mask = 0;
-                       }
-                       break;
-               case X86_VENDOR_CYRIX:
-                       if (cpu_feature_enabled(X86_FEATURE_CYRIX_ARR)) {
-                               mtrr_if = &cyrix_mtrr_ops;
-                               size_or_mask = SIZE_OR_MASK_BITS(32);
-                               size_and_mask = 0;
-                       }
-                       break;
-               default:
-                       break;
-               }
+               init_table();
+               mtrr_build_map();
+               pr_info("MTRRs set to read-only\n");
+
+               return;
        }
 
+       if (generic_mtrrs)
+               mtrr_if = &generic_mtrr_ops;
+       else
+               mtrr_set_if();
+
        if (mtrr_enabled()) {
-               set_num_var_ranges(mtrr_if == &generic_mtrr_ops);
+               /* Get the number of variable MTRR ranges. */
+               if (mtrr_if == &generic_mtrr_ops)
+                       rdmsr(MSR_MTRRcap, config, dummy);
+               else
+                       config = mtrr_if->var_regs;
+               num_var_ranges = config & MTRR_CAP_VCNT;
+
                init_table();
                if (mtrr_if == &generic_mtrr_ops) {
                        /* BIOS may override */
                        if (get_mtrr_state()) {
                                memory_caching_control |= CACHE_MTRR;
-                               changed_by_mtrr_cleanup = mtrr_cleanup(phys_addr);
+                               changed_by_mtrr_cleanup = mtrr_cleanup();
+                               mtrr_build_map();
                        } else {
                                mtrr_if = NULL;
                                why = "by BIOS";
@@ -730,8 +616,14 @@ void mtrr_save_state(void)
        smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
 }
 
-static int __init mtrr_init_finialize(void)
+static int __init mtrr_init_finalize(void)
 {
+       /*
+        * Map might exist if mtrr_overwrite_state() has been called or if
+        * mtrr_enabled() returns true.
+        */
+       mtrr_copy_map();
+
        if (!mtrr_enabled())
                return 0;
 
@@ -741,16 +633,8 @@ static int __init mtrr_init_finialize(void)
                return 0;
        }
 
-       /*
-        * The CPU has no MTRR and seems to not support SMP. They have
-        * specific drivers, we use a tricky method to support
-        * suspend/resume for them.
-        *
-        * TBD: is there any system with such CPU which supports
-        * suspend/resume? If no, we should remove the code.
-        */
-       register_syscore_ops(&mtrr_syscore_ops);
+       mtrr_register_syscore();
 
        return 0;
 }
-subsys_initcall(mtrr_init_finialize);
+subsys_initcall(mtrr_init_finalize);
index 02eb587..5655f25 100644 (file)
 #define MTRR_CHANGE_MASK_VARIABLE  0x02
 #define MTRR_CHANGE_MASK_DEFTYPE   0x04
 
+extern bool mtrr_debug;
+#define Dprintk(x...) do { if (mtrr_debug) pr_info(x); } while (0)
+
 extern unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
 
 struct mtrr_ops {
-       u32     vendor;
+       u32     var_regs;
        void    (*set)(unsigned int reg, unsigned long base,
                       unsigned long size, mtrr_type type);
        void    (*get)(unsigned int reg, unsigned long *base,
@@ -51,18 +54,26 @@ void fill_mtrr_var_range(unsigned int index,
                u32 base_lo, u32 base_hi, u32 mask_lo, u32 mask_hi);
 bool get_mtrr_state(void);
 
-extern u64 size_or_mask, size_and_mask;
 extern const struct mtrr_ops *mtrr_if;
-
-#define is_cpu(vnd)    (mtrr_if && mtrr_if->vendor == X86_VENDOR_##vnd)
+extern struct mutex mtrr_mutex;
 
 extern unsigned int num_var_ranges;
 extern u64 mtrr_tom2;
 extern struct mtrr_state_type mtrr_state;
+extern u32 phys_hi_rsvd;
 
 void mtrr_state_warn(void);
 const char *mtrr_attrib_to_str(int x);
 void mtrr_wrmsr(unsigned, unsigned, unsigned);
+#ifdef CONFIG_X86_32
+void mtrr_set_if(void);
+void mtrr_register_syscore(void);
+#else
+static inline void mtrr_set_if(void) { }
+static inline void mtrr_register_syscore(void) { }
+#endif
+void mtrr_build_map(void);
+void mtrr_copy_map(void);
 
 /* CPU specific mtrr_ops vectors. */
 extern const struct mtrr_ops amd_mtrr_ops;
@@ -70,4 +81,14 @@ extern const struct mtrr_ops cyrix_mtrr_ops;
 extern const struct mtrr_ops centaur_mtrr_ops;
 
 extern int changed_by_mtrr_cleanup;
-extern int mtrr_cleanup(unsigned address_bits);
+extern int mtrr_cleanup(void);
+
+/*
+ * Must be used by code which uses mtrr_if to call platform-specific
+ * MTRR manipulation functions.
+ */
+static inline bool mtrr_enabled(void)
+{
+       return !!mtrr_if;
+}
+void generic_rebuild_map(void);
index 6ad33f3..7253440 100644 (file)
@@ -726,11 +726,15 @@ unlock:
 static void show_rdt_tasks(struct rdtgroup *r, struct seq_file *s)
 {
        struct task_struct *p, *t;
+       pid_t pid;
 
        rcu_read_lock();
        for_each_process_thread(p, t) {
-               if (is_closid_match(t, r) || is_rmid_match(t, r))
-                       seq_printf(s, "%d\n", t->pid);
+               if (is_closid_match(t, r) || is_rmid_match(t, r)) {
+                       pid = task_pid_vnr(t);
+                       if (pid)
+                               seq_printf(s, "%d\n", pid);
+               }
        }
        rcu_read_unlock();
 }
@@ -2301,6 +2305,26 @@ static struct rdtgroup *kernfs_to_rdtgroup(struct kernfs_node *kn)
        }
 }
 
+static void rdtgroup_kn_get(struct rdtgroup *rdtgrp, struct kernfs_node *kn)
+{
+       atomic_inc(&rdtgrp->waitcount);
+       kernfs_break_active_protection(kn);
+}
+
+static void rdtgroup_kn_put(struct rdtgroup *rdtgrp, struct kernfs_node *kn)
+{
+       if (atomic_dec_and_test(&rdtgrp->waitcount) &&
+           (rdtgrp->flags & RDT_DELETED)) {
+               if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
+                   rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
+                       rdtgroup_pseudo_lock_remove(rdtgrp);
+               kernfs_unbreak_active_protection(kn);
+               rdtgroup_remove(rdtgrp);
+       } else {
+               kernfs_unbreak_active_protection(kn);
+       }
+}
+
 struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn)
 {
        struct rdtgroup *rdtgrp = kernfs_to_rdtgroup(kn);
@@ -2308,8 +2332,7 @@ struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn)
        if (!rdtgrp)
                return NULL;
 
-       atomic_inc(&rdtgrp->waitcount);
-       kernfs_break_active_protection(kn);
+       rdtgroup_kn_get(rdtgrp, kn);
 
        mutex_lock(&rdtgroup_mutex);
 
@@ -2328,17 +2351,7 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn)
                return;
 
        mutex_unlock(&rdtgroup_mutex);
-
-       if (atomic_dec_and_test(&rdtgrp->waitcount) &&
-           (rdtgrp->flags & RDT_DELETED)) {
-               if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
-                   rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
-                       rdtgroup_pseudo_lock_remove(rdtgrp);
-               kernfs_unbreak_active_protection(kn);
-               rdtgroup_remove(rdtgrp);
-       } else {
-               kernfs_unbreak_active_protection(kn);
-       }
+       rdtgroup_kn_put(rdtgrp, kn);
 }
 
 static int mkdir_mondata_all(struct kernfs_node *parent_kn,
@@ -3505,6 +3518,133 @@ out:
        return ret;
 }
 
+/**
+ * mongrp_reparent() - replace parent CTRL_MON group of a MON group
+ * @rdtgrp:            the MON group whose parent should be replaced
+ * @new_prdtgrp:       replacement parent CTRL_MON group for @rdtgrp
+ * @cpus:              cpumask provided by the caller for use during this call
+ *
+ * Replaces the parent CTRL_MON group for a MON group, resulting in all member
+ * tasks' CLOSID immediately changing to that of the new parent group.
+ * Monitoring data for the group is unaffected by this operation.
+ */
+static void mongrp_reparent(struct rdtgroup *rdtgrp,
+                           struct rdtgroup *new_prdtgrp,
+                           cpumask_var_t cpus)
+{
+       struct rdtgroup *prdtgrp = rdtgrp->mon.parent;
+
+       WARN_ON(rdtgrp->type != RDTMON_GROUP);
+       WARN_ON(new_prdtgrp->type != RDTCTRL_GROUP);
+
+       /* Nothing to do when simply renaming a MON group. */
+       if (prdtgrp == new_prdtgrp)
+               return;
+
+       WARN_ON(list_empty(&prdtgrp->mon.crdtgrp_list));
+       list_move_tail(&rdtgrp->mon.crdtgrp_list,
+                      &new_prdtgrp->mon.crdtgrp_list);
+
+       rdtgrp->mon.parent = new_prdtgrp;
+       rdtgrp->closid = new_prdtgrp->closid;
+
+       /* Propagate updated closid to all tasks in this group. */
+       rdt_move_group_tasks(rdtgrp, rdtgrp, cpus);
+
+       update_closid_rmid(cpus, NULL);
+}
+
+static int rdtgroup_rename(struct kernfs_node *kn,
+                          struct kernfs_node *new_parent, const char *new_name)
+{
+       struct rdtgroup *new_prdtgrp;
+       struct rdtgroup *rdtgrp;
+       cpumask_var_t tmpmask;
+       int ret;
+
+       rdtgrp = kernfs_to_rdtgroup(kn);
+       new_prdtgrp = kernfs_to_rdtgroup(new_parent);
+       if (!rdtgrp || !new_prdtgrp)
+               return -ENOENT;
+
+       /* Release both kernfs active_refs before obtaining rdtgroup mutex. */
+       rdtgroup_kn_get(rdtgrp, kn);
+       rdtgroup_kn_get(new_prdtgrp, new_parent);
+
+       mutex_lock(&rdtgroup_mutex);
+
+       rdt_last_cmd_clear();
+
+       /*
+        * Don't allow kernfs_to_rdtgroup() to return a parent rdtgroup if
+        * either kernfs_node is a file.
+        */
+       if (kernfs_type(kn) != KERNFS_DIR ||
+           kernfs_type(new_parent) != KERNFS_DIR) {
+               rdt_last_cmd_puts("Source and destination must be directories");
+               ret = -EPERM;
+               goto out;
+       }
+
+       if ((rdtgrp->flags & RDT_DELETED) || (new_prdtgrp->flags & RDT_DELETED)) {
+               ret = -ENOENT;
+               goto out;
+       }
+
+       if (rdtgrp->type != RDTMON_GROUP || !kn->parent ||
+           !is_mon_groups(kn->parent, kn->name)) {
+               rdt_last_cmd_puts("Source must be a MON group\n");
+               ret = -EPERM;
+               goto out;
+       }
+
+       if (!is_mon_groups(new_parent, new_name)) {
+               rdt_last_cmd_puts("Destination must be a mon_groups subdirectory\n");
+               ret = -EPERM;
+               goto out;
+       }
+
+       /*
+        * If the MON group is monitoring CPUs, the CPUs must be assigned to the
+        * current parent CTRL_MON group and therefore cannot be assigned to
+        * the new parent, making the move illegal.
+        */
+       if (!cpumask_empty(&rdtgrp->cpu_mask) &&
+           rdtgrp->mon.parent != new_prdtgrp) {
+               rdt_last_cmd_puts("Cannot move a MON group that monitors CPUs\n");
+               ret = -EPERM;
+               goto out;
+       }
+
+       /*
+        * Allocate the cpumask for use in mongrp_reparent() to avoid the
+        * possibility of failing to allocate it after kernfs_rename() has
+        * succeeded.
+        */
+       if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL)) {
+               ret = -ENOMEM;
+               goto out;
+       }
+
+       /*
+        * Perform all input validation and allocations needed to ensure
+        * mongrp_reparent() will succeed before calling kernfs_rename(),
+        * otherwise it would be necessary to revert this call if
+        * mongrp_reparent() failed.
+        */
+       ret = kernfs_rename(kn, new_parent, new_name);
+       if (!ret)
+               mongrp_reparent(rdtgrp, new_prdtgrp, tmpmask);
+
+       free_cpumask_var(tmpmask);
+
+out:
+       mutex_unlock(&rdtgroup_mutex);
+       rdtgroup_kn_put(rdtgrp, kn);
+       rdtgroup_kn_put(new_prdtgrp, new_parent);
+       return ret;
+}
+
 static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf)
 {
        if (resctrl_arch_get_cdp_enabled(RDT_RESOURCE_L3))
@@ -3522,6 +3662,7 @@ static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf)
 static struct kernfs_syscall_ops rdtgroup_kf_syscall_ops = {
        .mkdir          = rdtgroup_mkdir,
        .rmdir          = rdtgroup_rmdir,
+       .rename         = rdtgroup_rename,
        .show_options   = rdtgroup_show_options,
 };
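
The new .rename handler is reached through the ordinary rename(2)/mv path on the
mounted resctrl filesystem: only a MON group may be moved, and only into another
control group's mon_groups directory. A minimal user-space sketch of that flow
follows; the mount point and group names are illustrative assumptions, not part
of the patch.

/*
 * Illustrative only: move MON group "m1" from control group "grpA" to "grpB",
 * assuming resctrl is mounted at /sys/fs/resctrl and both groups already exist.
 */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	const char *src = "/sys/fs/resctrl/grpA/mon_groups/m1";
	const char *dst = "/sys/fs/resctrl/grpB/mon_groups/m1";

	if (rename(src, dst)) {
		perror("rename");	/* e.g. EPERM when the source is not a MON group */
		return EXIT_FAILURE;
	}
	return 0;
}

On success all tasks in "m1" switch to "grpB"'s CLOSID immediately, while the
group's monitoring data is unaffected, as described in mongrp_reparent() above.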
 
index 2a0e90f..91fa70e 100644 (file)
@@ -755,6 +755,7 @@ static void sgx_mmu_notifier_release(struct mmu_notifier *mn,
 {
        struct sgx_encl_mm *encl_mm = container_of(mn, struct sgx_encl_mm, mmu_notifier);
        struct sgx_encl_mm *tmp = NULL;
+       bool found = false;
 
        /*
         * The enclave itself can remove encl_mm.  Note, objects can't be moved
@@ -764,12 +765,13 @@ static void sgx_mmu_notifier_release(struct mmu_notifier *mn,
        list_for_each_entry(tmp, &encl_mm->encl->mm_list, list) {
                if (tmp == encl_mm) {
                        list_del_rcu(&encl_mm->list);
+                       found = true;
                        break;
                }
        }
        spin_unlock(&encl_mm->encl->mm_lock);
 
-       if (tmp == encl_mm) {
+       if (found) {
                synchronize_srcu(&encl_mm->encl->srcu);
                mmu_notifier_put(mn);
        }
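
The change above replaces a post-loop test of the list_for_each_entry() cursor
with an explicit flag; once the loop runs to completion the cursor no longer
points at a real entry, so a boolean is the reliable way to record whether a
match was found. A standalone sketch of the same pattern on a plain singly
linked list (names and structure are illustrative, not taken from the SGX code):

#include <stdbool.h>

struct node {
	int id;
	struct node *next;
};

/*
 * Search @head for @target and unlink it if present. Success is reported via
 * a boolean instead of inspecting the loop cursor after the loop ends.
 */
static bool unlink_node(struct node **head, const struct node *target)
{
	bool found = false;

	for (struct node **pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == target) {
			*pp = (*pp)->next;	/* loosely mirrors list_del_rcu() */
			found = true;
			break;
		}
	}
	return found;	/* the caller synchronizes and frees only when found */
}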
index 5e868b6..0270925 100644 (file)
@@ -79,7 +79,7 @@ int detect_extended_topology_early(struct cpuinfo_x86 *c)
         * initial apic id, which also represents 32-bit extended x2apic id.
         */
        c->initial_apicid = edx;
-       smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
+       smp_num_siblings = max_t(int, smp_num_siblings, LEVEL_MAX_SIBLINGS(ebx));
 #endif
        return 0;
 }
@@ -109,7 +109,8 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
         */
        cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
        c->initial_apicid = edx;
-       core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
+       core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+       smp_num_siblings = max_t(int, smp_num_siblings, LEVEL_MAX_SIBLINGS(ebx));
        core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
        die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
        pkg_mask_width = die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
index 3b58d87..6eaf9a6 100644 (file)
@@ -9,6 +9,7 @@
 #include <asm/processor.h>
 #include <asm/desc.h>
 #include <asm/traps.h>
+#include <asm/doublefault.h>
 
 #define ptr_ok(x) ((x) > PAGE_OFFSET && (x) < PAGE_OFFSET + MAXMEM)
 
index 0bf6779..f18ca44 100644 (file)
@@ -195,7 +195,6 @@ static void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
        printk("%sCall Trace:\n", log_lvl);
 
        unwind_start(&state, task, regs, stack);
-       stack = stack ? : get_stack_pointer(task, regs);
        regs = unwind_get_entry_regs(&state, &partial);
 
        /*
@@ -214,9 +213,13 @@ static void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
         * - hardirq stack
         * - entry stack
         */
-       for ( ; stack; stack = PTR_ALIGN(stack_info.next_sp, sizeof(long))) {
+       for (stack = stack ?: get_stack_pointer(task, regs);
+            stack;
+            stack = stack_info.next_sp) {
                const char *stack_name;
 
+               stack = PTR_ALIGN(stack, sizeof(long));
+
                if (get_stack_info(stack, task, &stack_info, &visit_mask)) {
                        /*
                         * We weren't on a valid stack.  It's possible that
index 9fcfa5c..af5cbdd 100644 (file)
@@ -57,7 +57,7 @@ static inline void fpregs_restore_userregs(void)
        struct fpu *fpu = &current->thread.fpu;
        int cpu = smp_processor_id();
 
-       if (WARN_ON_ONCE(current->flags & (PF_KTHREAD | PF_IO_WORKER)))
+       if (WARN_ON_ONCE(current->flags & (PF_KTHREAD | PF_USER_WORKER)))
                return;
 
        if (!fpregs_state_valid(fpu, cpu)) {
index caf3348..1015af1 100644 (file)
@@ -426,7 +426,7 @@ void kernel_fpu_begin_mask(unsigned int kfpu_mask)
 
        this_cpu_write(in_kernel_fpu, true);
 
-       if (!(current->flags & (PF_KTHREAD | PF_IO_WORKER)) &&
+       if (!(current->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
            !test_thread_flag(TIF_NEED_FPU_LOAD)) {
                set_thread_flag(TIF_NEED_FPU_LOAD);
                save_fpregs_to_fpstate(&current->thread.fpu);
index 851eb13..998a08f 100644 (file)
@@ -53,7 +53,7 @@ void fpu__init_cpu(void)
        fpu__init_cpu_xstate();
 }
 
-static bool fpu__probe_without_cpuid(void)
+static bool __init fpu__probe_without_cpuid(void)
 {
        unsigned long cr0;
        u16 fsw, fcw;
@@ -71,7 +71,7 @@ static bool fpu__probe_without_cpuid(void)
        return fsw == 0 && (fcw & 0x103f) == 0x003f;
 }
 
-static void fpu__init_system_early_generic(struct cpuinfo_x86 *c)
+static void __init fpu__init_system_early_generic(void)
 {
        if (!boot_cpu_has(X86_FEATURE_CPUID) &&
            !test_bit(X86_FEATURE_FPU, (unsigned long *)cpu_caps_cleared)) {
@@ -211,10 +211,10 @@ static void __init fpu__init_system_xstate_size_legacy(void)
  * Called on the boot CPU once per system bootup, to set up the initial
  * FPU state that is later cloned into all processes:
  */
-void __init fpu__init_system(struct cpuinfo_x86 *c)
+void __init fpu__init_system(void)
 {
        fpstate_reset(&current->thread.fpu);
-       fpu__init_system_early_generic(c);
+       fpu__init_system_early_generic();
 
        /*
         * The FPU has to be operational for some of the
index 5e7ead5..01e8f34 100644 (file)
@@ -525,9 +525,6 @@ static void *addr_from_call(void *ptr)
        return ptr + CALL_INSN_SIZE + call.disp;
 }
 
-void prepare_ftrace_return(unsigned long ip, unsigned long *parent,
-                          unsigned long frame_pointer);
-
 /*
  * If the ops->trampoline was not allocated, then it probably
  * has a static trampoline func, or is the ftrace caller itself.
index 10c27b4..246a609 100644 (file)
@@ -69,6 +69,7 @@ asmlinkage __visible void __init __noreturn i386_start_kernel(void)
  * to the first kernel PMD. Note the upper half of each PMD or PTE are
  * always zero at this stage.
  */
+void __init mk_early_pgtbl_32(void);
 void __init mk_early_pgtbl_32(void)
 {
 #ifdef __pa
index 67c8ed9..c931899 100644 (file)
@@ -138,20 +138,6 @@ SYM_CODE_START(startup_32)
        jmp .Ldefault_entry
 SYM_CODE_END(startup_32)
 
-#ifdef CONFIG_HOTPLUG_CPU
-/*
- * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
- * up already except stack. We just set up stack here. Then call
- * start_secondary().
- */
-SYM_FUNC_START(start_cpu0)
-       movl initial_stack, %ecx
-       movl %ecx, %esp
-       call *(initial_code)
-1:     jmp 1b
-SYM_FUNC_END(start_cpu0)
-#endif
-
 /*
  * Non-boot CPU entry point; entered from trampoline.S
  * We can't lgdt here, because lgdt itself uses a data segment, but
index a5df3e9..c5b9289 100644 (file)
@@ -24,7 +24,9 @@
 #include "../entry/calling.h"
 #include <asm/export.h>
 #include <asm/nospec-branch.h>
+#include <asm/apicdef.h>
 #include <asm/fixmap.h>
+#include <asm/smp.h>
 
 /*
  * We are not able to switch in one step to the final KERNEL ADDRESS SPACE
@@ -77,6 +79,15 @@ SYM_CODE_START_NOALIGN(startup_64)
        call    startup_64_setup_env
        popq    %rsi
 
+       /* Now switch to __KERNEL_CS so IRET works reliably */
+       pushq   $__KERNEL_CS
+       leaq    .Lon_kernel_cs(%rip), %rax
+       pushq   %rax
+       lretq
+
+.Lon_kernel_cs:
+       UNWIND_HINT_END_OF_STACK
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
        /*
         * Activate SEV/SME memory encryption if supported/enabled. This needs to
@@ -90,15 +101,6 @@ SYM_CODE_START_NOALIGN(startup_64)
        popq    %rsi
 #endif
 
-       /* Now switch to __KERNEL_CS so IRET works reliably */
-       pushq   $__KERNEL_CS
-       leaq    .Lon_kernel_cs(%rip), %rax
-       pushq   %rax
-       lretq
-
-.Lon_kernel_cs:
-       UNWIND_HINT_END_OF_STACK
-
        /* Sanitize CPU configuration */
        call verify_cpu
 
@@ -234,8 +236,67 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
        ANNOTATE_NOENDBR // above
 
 #ifdef CONFIG_SMP
+       /*
+        * For parallel boot, the APIC ID is read from the APIC, and then
+        * used to look up the CPU number.  For booting a single CPU, the
+        * CPU number is encoded in smpboot_control.
+        *
+        * Bit 31       STARTUP_READ_APICID (Read APICID from APIC)
+        * Bit 0-23     CPU# if STARTUP_xx flags are not set
+        */
        movl    smpboot_control(%rip), %ecx
+       testl   $STARTUP_READ_APICID, %ecx
+       jnz     .Lread_apicid
+       /*
+        * No control bit set, single CPU bringup. CPU number is provided
+        * in bit 0-23. This is also the boot CPU case (CPU number 0).
+        */
+       andl    $(~STARTUP_PARALLEL_MASK), %ecx
+       jmp     .Lsetup_cpu
 
+.Lread_apicid:
+       /* Check whether X2APIC mode is already enabled */
+       mov     $MSR_IA32_APICBASE, %ecx
+       rdmsr
+       testl   $X2APIC_ENABLE, %eax
+       jnz     .Lread_apicid_msr
+
+       /* Read the APIC ID from the fix-mapped MMIO space. */
+       movq    apic_mmio_base(%rip), %rcx
+       addq    $APIC_ID, %rcx
+       movl    (%rcx), %eax
+       shr     $24, %eax
+       jmp     .Llookup_AP
+
+.Lread_apicid_msr:
+       mov     $APIC_X2APIC_ID_MSR, %ecx
+       rdmsr
+
+.Llookup_AP:
+       /* EAX contains the APIC ID of the current CPU */
+       xorq    %rcx, %rcx
+       leaq    cpuid_to_apicid(%rip), %rbx
+
+.Lfind_cpunr:
+       cmpl    (%rbx,%rcx,4), %eax
+       jz      .Lsetup_cpu
+       inc     %ecx
+#ifdef CONFIG_FORCE_NR_CPUS
+       cmpl    $NR_CPUS, %ecx
+#else
+       cmpl    nr_cpu_ids(%rip), %ecx
+#endif
+       jb      .Lfind_cpunr
+
+       /*  APIC ID not found in the table. Drop the trampoline lock and bail. */
+       movq    trampoline_lock(%rip), %rax
+       movl    $0, (%rax)
+
+1:     cli
+       hlt
+       jmp     1b
+
+.Lsetup_cpu:
        /* Get the per cpu offset for the given CPU# which is in ECX */
        movq    __per_cpu_offset(,%rcx,8), %rdx
 #else
@@ -252,6 +313,16 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
        movq    TASK_threadsp(%rax), %rsp
 
        /*
+        * Now that this CPU is running on its own stack, drop the realmode
+        * protection. For the boot CPU the pointer is NULL!
+        */
+       movq    trampoline_lock(%rip), %rax
+       testq   %rax, %rax
+       jz      .Lsetup_gdt
+       movl    $0, (%rax)
+
+.Lsetup_gdt:
+       /*
         * We must switch to a new descriptor in kernel space for the GDT
         * because soon the kernel won't have access anymore to the userspace
         * addresses where we're currently running on. We have to do that here
@@ -375,13 +446,13 @@ SYM_CODE_END(secondary_startup_64)
 #include "verify_cpu.S"
 #include "sev_verify_cbit.S"
 
-#ifdef CONFIG_HOTPLUG_CPU
+#if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_AMD_MEM_ENCRYPT)
 /*
- * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
- * up already except stack. We just set up stack here. Then call
- * start_secondary() via .Ljump_to_C_code.
+ * Entry point for soft restart of a CPU. Invoked from xxx_play_dead() for
+ * restarting the boot CPU or for restarting SEV guest CPUs after CPU hot
+ * unplug. Everything is set up already except the stack.
  */
-SYM_CODE_START(start_cpu0)
+SYM_CODE_START(soft_restart_cpu)
        ANNOTATE_NOENDBR
        UNWIND_HINT_END_OF_STACK
 
@@ -390,7 +461,7 @@ SYM_CODE_START(start_cpu0)
        movq    TASK_threadsp(%rcx), %rsp
 
        jmp     .Ljump_to_C_code
-SYM_CODE_END(start_cpu0)
+SYM_CODE_END(soft_restart_cpu)
 #endif
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
@@ -433,6 +504,8 @@ SYM_DATA(initial_code,      .quad x86_64_start_kernel)
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 SYM_DATA(initial_vc_handler,   .quad handle_vc_boot_ghcb)
 #endif
+
+SYM_DATA(trampoline_lock, .quad 0);
        __FINITDATA
 
        __INIT
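
In C terms, the .Lfind_cpunr loop added above is a linear scan of the
cpuid_to_apicid[] table: the APIC ID read from the APIC is matched against the
table, and the matching index becomes the CPU number used to pick the per-CPU
area. A rough, illustrative C equivalent follows (simplified; the real lookup
runs in early assembly and releases trampoline_lock before halting when no
match is found):

/* Rough sketch of the APIC-ID-to-CPU-number lookup in secondary_startup_64
 * when STARTUP_READ_APICID is set. Illustrative only. */
static int apicid_to_cpu(const unsigned int *cpuid_to_apicid,
			 unsigned int nr_cpu_ids, unsigned int apicid)
{
	unsigned int cpu;

	for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
		if (cpuid_to_apicid[cpu] == apicid)
			return cpu;	/* becomes the CPU# used for __per_cpu_offset */
	}
	return -1;		/* no match: drop the trampoline lock and halt */
}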
index 766ffe3..9f668d2 100644 (file)
@@ -211,6 +211,13 @@ u64 arch_irq_stat_cpu(unsigned int cpu)
 #ifdef CONFIG_X86_MCE_THRESHOLD
        sum += irq_stats(cpu)->irq_threshold_count;
 #endif
+#ifdef CONFIG_X86_HV_CALLBACK_VECTOR
+       sum += irq_stats(cpu)->irq_hv_callback_count;
+#endif
+#if IS_ENABLED(CONFIG_HYPERV)
+       sum += irq_stats(cpu)->irq_hv_reenlightenment_count;
+       sum += irq_stats(cpu)->hyperv_stimer0_count;
+#endif
 #ifdef CONFIG_X86_MCE
        sum += per_cpu(mce_exception_count, cpu);
        sum += per_cpu(mce_poll_count, cpu);
index 670eb08..ee4fe8c 100644 (file)
@@ -165,32 +165,19 @@ int arch_asym_cpu_priority(int cpu)
 
 /**
  * sched_set_itmt_core_prio() - Set CPU priority based on ITMT
- * @prio:      Priority of cpu core
- * @core_cpu:  The cpu number associated with the core
+ * @prio:      Priority of @cpu
+ * @cpu:       The CPU number
  *
  * The pstate driver will find out the max boost frequency
  * and call this function to set a priority proportional
- * to the max boost frequency. CPU with higher boost
+ * to the max boost frequency. CPUs with higher boost
  * frequency will receive higher priority.
  *
  * No need to rebuild sched domain after updating
  * the CPU priorities. The sched domains have no
  * dependency on CPU priorities.
  */
-void sched_set_itmt_core_prio(int prio, int core_cpu)
+void sched_set_itmt_core_prio(int prio, int cpu)
 {
-       int cpu, i = 1;
-
-       for_each_cpu(cpu, topology_sibling_cpumask(core_cpu)) {
-               int smt_prio;
-
-               /*
-                * Ensure that the siblings are moved to the end
-                * of the priority chain and only used when
-                * all other high priority cpus are out of capacity.
-                */
-               smt_prio = prio * smp_num_siblings / (i * i);
-               per_cpu(sched_core_priority, cpu) = smt_prio;
-               i++;
-       }
+       per_cpu(sched_core_priority, cpu) = prio;
 }
index 0f35d44..fb8f521 100644 (file)
@@ -71,7 +71,7 @@ static int kvm_set_wallclock(const struct timespec64 *now)
        return -ENODEV;
 }
 
-static noinstr u64 kvm_clock_read(void)
+static u64 kvm_clock_read(void)
 {
        u64 ret;
 
@@ -88,7 +88,7 @@ static u64 kvm_clock_get_cycles(struct clocksource *cs)
 
 static noinstr u64 kvm_sched_clock_read(void)
 {
-       return kvm_clock_read() - kvm_sched_clock_offset;
+       return pvclock_clocksource_read_nowd(this_cpu_pvti()) - kvm_sched_clock_offset;
 }
 
 static inline void kvm_sched_clock_init(bool stable)
index 776f4b1..a0c5518 100644 (file)
@@ -496,7 +496,7 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
         */
        sev_es_nmi_complete();
        if (IS_ENABLED(CONFIG_NMI_CHECK_CPU))
-               arch_atomic_long_inc(&nsp->idt_calls);
+               raw_atomic_long_inc(&nsp->idt_calls);
 
        if (IS_ENABLED(CONFIG_SMP) && arch_cpu_is_offline(smp_processor_id()))
                return;
index b348a67..b525fe6 100644 (file)
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/kernel.h>
 #include <linux/init.h>
+#include <linux/pnp.h>
 
 #include <asm/setup.h>
 #include <asm/bios_ebda.h>
index dac41a0..ff9b80a 100644 (file)
@@ -759,15 +759,26 @@ bool xen_set_default_idle(void)
 }
 #endif
 
+struct cpumask cpus_stop_mask;
+
 void __noreturn stop_this_cpu(void *dummy)
 {
+       struct cpuinfo_x86 *c = this_cpu_ptr(&cpu_info);
+       unsigned int cpu = smp_processor_id();
+
        local_irq_disable();
+
        /*
-        * Remove this CPU:
+        * Remove this CPU from the online mask and disable it
+        * unconditionally. This might be redundant in case that the reboot
+        * vector was handled late and stop_other_cpus() sent an NMI.
+        *
+        * According to SDM and APM NMIs can be accepted even after soft
+        * disabling the local APIC.
         */
-       set_cpu_online(smp_processor_id(), false);
+       set_cpu_online(cpu, false);
        disable_local_APIC();
-       mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
+       mcheck_cpu_clear(c);
 
        /*
         * Use wbinvd on processors that support SME. This provides support
@@ -781,8 +792,17 @@ void __noreturn stop_this_cpu(void *dummy)
         * Test the CPUID bit directly because the machine might've cleared
         * X86_FEATURE_SME due to cmdline options.
         */
-       if (cpuid_eax(0x8000001f) & BIT(0))
+       if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
                native_wbinvd();
+
+       /*
+        * This brings a cache line back and dirties it, but
+        * native_stop_other_cpus() will overwrite cpus_stop_mask after it
+        * observed that all CPUs reported stop. This write will invalidate
+        * the related cache line on this CPU.
+        */
+       cpumask_clear_cpu(cpu, &cpus_stop_mask);
+
        for (;;) {
                /*
                 * Use native_halt() so that memory contents don't change
index 56acf53..b3f8137 100644 (file)
@@ -101,11 +101,11 @@ u64 __pvclock_clocksource_read(struct pvclock_vcpu_time_info *src, bool dowd)
         * updating at the same time, and one of them could be slightly behind,
         * making the assumption that last_value always go forward fail to hold.
         */
-       last = arch_atomic64_read(&last_value);
+       last = raw_atomic64_read(&last_value);
        do {
                if (ret <= last)
                        return last;
-       } while (!arch_atomic64_try_cmpxchg(&last_value, &last, ret));
+       } while (!raw_atomic64_try_cmpxchg(&last_value, &last, ret));
 
        return ret;
 }
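
The switch from arch_atomic64_*() to raw_atomic64_*() above leaves the
algorithm untouched: a reader publishes its timestamp into last_value only if
it moves the clock forward, otherwise it returns the later value that is
already published, keeping the clock monotonic across CPUs. A standalone
sketch of that clamp using C11 atomics (illustrative names; the kernel uses
its own atomic helpers):

#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t last_value;

/* Return max(ret, last_value) and publish ret when it is the new maximum. */
static uint64_t clamp_monotonic(uint64_t ret)
{
	uint64_t last = atomic_load(&last_value);

	do {
		if (ret <= last)
			return last;	/* another CPU already published a later value */
	} while (!atomic_compare_exchange_weak(&last_value, &last, ret));

	return ret;
}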
index 16babff..fd975a4 100644 (file)
@@ -796,7 +796,6 @@ static void __init early_reserve_memory(void)
 
        memblock_x86_reserve_range_setup_data();
 
-       reserve_ibft_region();
        reserve_bios_regions();
        trim_snb_memory();
 }
@@ -1032,11 +1031,14 @@ void __init setup_arch(char **cmdline_p)
        if (efi_enabled(EFI_BOOT))
                efi_init();
 
+       reserve_ibft_region();
        dmi_setup();
 
        /*
         * VMware detection requires dmi to be available, so this
         * needs to be done after dmi_setup(), for the boot CPU.
+        * For some guest types (Xen PV, SEV-SNP, TDX) it is required to be
+        * called before cache_bp_init() for setting up MTRR state.
         */
        init_hypervisor_platform();
 
index 3a5b0c9..2eabccd 100644 (file)
@@ -12,6 +12,9 @@
 #ifndef __BOOT_COMPRESSED
 #define error(v)       pr_err(v)
 #define has_cpuflag(f) boot_cpu_has(f)
+#else
+#undef WARN
+#define WARN(condition, format...) (!!(condition))
 #endif
 
 /* I/O parameters for CPUID-related helpers */
@@ -991,3 +994,103 @@ static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
                        cpuid_ext_range_max = fn->eax;
        }
 }
+
+static void pvalidate_pages(struct snp_psc_desc *desc)
+{
+       struct psc_entry *e;
+       unsigned long vaddr;
+       unsigned int size;
+       unsigned int i;
+       bool validate;
+       int rc;
+
+       for (i = 0; i <= desc->hdr.end_entry; i++) {
+               e = &desc->entries[i];
+
+               vaddr = (unsigned long)pfn_to_kaddr(e->gfn);
+               size = e->pagesize ? RMP_PG_SIZE_2M : RMP_PG_SIZE_4K;
+               validate = e->operation == SNP_PAGE_STATE_PRIVATE;
+
+               rc = pvalidate(vaddr, size, validate);
+               if (rc == PVALIDATE_FAIL_SIZEMISMATCH && size == RMP_PG_SIZE_2M) {
+                       unsigned long vaddr_end = vaddr + PMD_SIZE;
+
+                       for (; vaddr < vaddr_end; vaddr += PAGE_SIZE) {
+                               rc = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
+                               if (rc)
+                                       break;
+                       }
+               }
+
+               if (rc) {
+                       WARN(1, "Failed to validate address 0x%lx ret %d", vaddr, rc);
+                       sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
+               }
+       }
+}
+
+static int vmgexit_psc(struct ghcb *ghcb, struct snp_psc_desc *desc)
+{
+       int cur_entry, end_entry, ret = 0;
+       struct snp_psc_desc *data;
+       struct es_em_ctxt ctxt;
+
+       vc_ghcb_invalidate(ghcb);
+
+       /* Copy the input desc into GHCB shared buffer */
+       data = (struct snp_psc_desc *)ghcb->shared_buffer;
+       memcpy(ghcb->shared_buffer, desc, min_t(int, GHCB_SHARED_BUF_SIZE, sizeof(*desc)));
+
+       /*
+        * As per the GHCB specification, the hypervisor can resume the guest
+        * before processing all the entries. Check whether all the entries
+        * are processed. If not, then keep retrying. Note, the hypervisor
+        * will update the data memory directly to indicate the status, so
+        * reference the data->hdr everywhere.
+        *
+        * The strategy here is to wait for the hypervisor to change the page
+        * state in the RMP table before guest accesses the memory pages. If the
+        * page state change was not successful, then later memory access will
+        * result in a crash.
+        */
+       cur_entry = data->hdr.cur_entry;
+       end_entry = data->hdr.end_entry;
+
+       while (data->hdr.cur_entry <= data->hdr.end_entry) {
+               ghcb_set_sw_scratch(ghcb, (u64)__pa(data));
+
+               /* This will advance the shared buffer data points to. */
+               ret = sev_es_ghcb_hv_call(ghcb, &ctxt, SVM_VMGEXIT_PSC, 0, 0);
+
+               /*
+                * Page State Change VMGEXIT can pass error code through
+                * exit_info_2.
+                */
+               if (WARN(ret || ghcb->save.sw_exit_info_2,
+                        "SNP: PSC failed ret=%d exit_info_2=%llx\n",
+                        ret, ghcb->save.sw_exit_info_2)) {
+                       ret = 1;
+                       goto out;
+               }
+
+               /* Verify that reserved bit is not set */
+               if (WARN(data->hdr.reserved, "Reserved bit is set in the PSC header\n")) {
+                       ret = 1;
+                       goto out;
+               }
+
+               /*
+                * Sanity check that entry processing is not going backwards.
+                * This will happen only if hypervisor is tricking us.
+                */
+               if (WARN(data->hdr.end_entry > end_entry || cur_entry > data->hdr.cur_entry,
+"SNP: PSC processing going backward, end_entry %d (got %d) cur_entry %d (got %d)\n",
+                        end_entry, data->hdr.end_entry, cur_entry, data->hdr.cur_entry)) {
+                       ret = 1;
+                       goto out;
+               }
+       }
+
+out:
+       return ret;
+}
index b031244..1ee7bed 100644 (file)
@@ -113,13 +113,23 @@ struct ghcb_state {
 };
 
 static DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
-DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
-
 static DEFINE_PER_CPU(struct sev_es_save_area *, sev_vmsa);
 
 struct sev_config {
        __u64 debug             : 1,
-             __reserved        : 63;
+
+             /*
+              * A flag used by __set_pages_state() that indicates when the
+              * per-CPU GHCB has been created and registered and thus can be
+              * used by the BSP instead of the early boot GHCB.
+              *
+              * For APs, the per-CPU GHCB is created before they are started
+              * and registered upon startup, so this flag can be used globally
+              * for the BSP and APs.
+              */
+             ghcbs_initialized : 1,
+
+             __reserved        : 62;
 };
 
 static struct sev_config sev_cfg __read_mostly;
@@ -645,32 +655,26 @@ static u64 __init get_jump_table_addr(void)
        return ret;
 }
 
-static void pvalidate_pages(unsigned long vaddr, unsigned int npages, bool validate)
-{
-       unsigned long vaddr_end;
-       int rc;
-
-       vaddr = vaddr & PAGE_MASK;
-       vaddr_end = vaddr + (npages << PAGE_SHIFT);
-
-       while (vaddr < vaddr_end) {
-               rc = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
-               if (WARN(rc, "Failed to validate address 0x%lx ret %d", vaddr, rc))
-                       sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
-
-               vaddr = vaddr + PAGE_SIZE;
-       }
-}
-
-static void __init early_set_pages_state(unsigned long paddr, unsigned int npages, enum psc_op op)
+static void early_set_pages_state(unsigned long vaddr, unsigned long paddr,
+                                 unsigned long npages, enum psc_op op)
 {
        unsigned long paddr_end;
        u64 val;
+       int ret;
+
+       vaddr = vaddr & PAGE_MASK;
 
        paddr = paddr & PAGE_MASK;
        paddr_end = paddr + (npages << PAGE_SHIFT);
 
        while (paddr < paddr_end) {
+               if (op == SNP_PAGE_STATE_SHARED) {
+                       /* Page validation must be rescinded before changing to shared */
+                       ret = pvalidate(vaddr, RMP_PG_SIZE_4K, false);
+                       if (WARN(ret, "Failed to validate address 0x%lx ret %d", paddr, ret))
+                               goto e_term;
+               }
+
                /*
                 * Use the MSR protocol because this function can be called before
                 * the GHCB is established.
@@ -691,7 +695,15 @@ static void __init early_set_pages_state(unsigned long paddr, unsigned int npage
                         paddr, GHCB_MSR_PSC_RESP_VAL(val)))
                        goto e_term;
 
-               paddr = paddr + PAGE_SIZE;
+               if (op == SNP_PAGE_STATE_PRIVATE) {
+                       /* Page validation must be performed after changing to private */
+                       ret = pvalidate(vaddr, RMP_PG_SIZE_4K, true);
+                       if (WARN(ret, "Failed to validate address 0x%lx ret %d", paddr, ret))
+                               goto e_term;
+               }
+
+               vaddr += PAGE_SIZE;
+               paddr += PAGE_SIZE;
        }
 
        return;
@@ -701,7 +713,7 @@ e_term:
 }
 
 void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
-                                        unsigned int npages)
+                                        unsigned long npages)
 {
        /*
         * This can be invoked in early boot while running identity mapped, so
@@ -716,14 +728,11 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
          * Ask the hypervisor to mark the memory pages as private in the RMP
          * table.
          */
-       early_set_pages_state(paddr, npages, SNP_PAGE_STATE_PRIVATE);
-
-       /* Validate the memory pages after they've been added in the RMP table. */
-       pvalidate_pages(vaddr, npages, true);
+       early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_PRIVATE);
 }
 
 void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
-                                       unsigned int npages)
+                                       unsigned long npages)
 {
        /*
         * This can be invoked in early boot while running identity mapped, so
@@ -734,11 +743,8 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
        if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
                return;
 
-       /* Invalidate the memory pages before they are marked shared in the RMP table. */
-       pvalidate_pages(vaddr, npages, false);
-
         /* Ask hypervisor to mark the memory pages shared in the RMP table. */
-       early_set_pages_state(paddr, npages, SNP_PAGE_STATE_SHARED);
+       early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_SHARED);
 }
 
 void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op)
@@ -756,96 +762,16 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op
                WARN(1, "invalid memory op %d\n", op);
 }
 
-static int vmgexit_psc(struct snp_psc_desc *desc)
+static unsigned long __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr,
+                                      unsigned long vaddr_end, int op)
 {
-       int cur_entry, end_entry, ret = 0;
-       struct snp_psc_desc *data;
        struct ghcb_state state;
-       struct es_em_ctxt ctxt;
-       unsigned long flags;
-       struct ghcb *ghcb;
-
-       /*
-        * __sev_get_ghcb() needs to run with IRQs disabled because it is using
-        * a per-CPU GHCB.
-        */
-       local_irq_save(flags);
-
-       ghcb = __sev_get_ghcb(&state);
-       if (!ghcb) {
-               ret = 1;
-               goto out_unlock;
-       }
-
-       /* Copy the input desc into GHCB shared buffer */
-       data = (struct snp_psc_desc *)ghcb->shared_buffer;
-       memcpy(ghcb->shared_buffer, desc, min_t(int, GHCB_SHARED_BUF_SIZE, sizeof(*desc)));
-
-       /*
-        * As per the GHCB specification, the hypervisor can resume the guest
-        * before processing all the entries. Check whether all the entries
-        * are processed. If not, then keep retrying. Note, the hypervisor
-        * will update the data memory directly to indicate the status, so
-        * reference the data->hdr everywhere.
-        *
-        * The strategy here is to wait for the hypervisor to change the page
-        * state in the RMP table before guest accesses the memory pages. If the
-        * page state change was not successful, then later memory access will
-        * result in a crash.
-        */
-       cur_entry = data->hdr.cur_entry;
-       end_entry = data->hdr.end_entry;
-
-       while (data->hdr.cur_entry <= data->hdr.end_entry) {
-               ghcb_set_sw_scratch(ghcb, (u64)__pa(data));
-
-               /* This will advance the shared buffer data points to. */
-               ret = sev_es_ghcb_hv_call(ghcb, &ctxt, SVM_VMGEXIT_PSC, 0, 0);
-
-               /*
-                * Page State Change VMGEXIT can pass error code through
-                * exit_info_2.
-                */
-               if (WARN(ret || ghcb->save.sw_exit_info_2,
-                        "SNP: PSC failed ret=%d exit_info_2=%llx\n",
-                        ret, ghcb->save.sw_exit_info_2)) {
-                       ret = 1;
-                       goto out;
-               }
-
-               /* Verify that reserved bit is not set */
-               if (WARN(data->hdr.reserved, "Reserved bit is set in the PSC header\n")) {
-                       ret = 1;
-                       goto out;
-               }
-
-               /*
-                * Sanity check that entry processing is not going backwards.
-                * This will happen only if hypervisor is tricking us.
-                */
-               if (WARN(data->hdr.end_entry > end_entry || cur_entry > data->hdr.cur_entry,
-"SNP: PSC processing going backward, end_entry %d (got %d) cur_entry %d (got %d)\n",
-                        end_entry, data->hdr.end_entry, cur_entry, data->hdr.cur_entry)) {
-                       ret = 1;
-                       goto out;
-               }
-       }
-
-out:
-       __sev_put_ghcb(&state);
-
-out_unlock:
-       local_irq_restore(flags);
-
-       return ret;
-}
-
-static void __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr,
-                             unsigned long vaddr_end, int op)
-{
+       bool use_large_entry;
        struct psc_hdr *hdr;
        struct psc_entry *e;
+       unsigned long flags;
        unsigned long pfn;
+       struct ghcb *ghcb;
        int i;
 
        hdr = &data->hdr;
@@ -854,74 +780,104 @@ static void __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr,
        memset(data, 0, sizeof(*data));
        i = 0;
 
-       while (vaddr < vaddr_end) {
-               if (is_vmalloc_addr((void *)vaddr))
+       while (vaddr < vaddr_end && i < ARRAY_SIZE(data->entries)) {
+               hdr->end_entry = i;
+
+               if (is_vmalloc_addr((void *)vaddr)) {
                        pfn = vmalloc_to_pfn((void *)vaddr);
-               else
+                       use_large_entry = false;
+               } else {
                        pfn = __pa(vaddr) >> PAGE_SHIFT;
+                       use_large_entry = true;
+               }
 
                e->gfn = pfn;
                e->operation = op;
-               hdr->end_entry = i;
 
-               /*
-                * Current SNP implementation doesn't keep track of the RMP page
-                * size so use 4K for simplicity.
-                */
-               e->pagesize = RMP_PG_SIZE_4K;
+               if (use_large_entry && IS_ALIGNED(vaddr, PMD_SIZE) &&
+                   (vaddr_end - vaddr) >= PMD_SIZE) {
+                       e->pagesize = RMP_PG_SIZE_2M;
+                       vaddr += PMD_SIZE;
+               } else {
+                       e->pagesize = RMP_PG_SIZE_4K;
+                       vaddr += PAGE_SIZE;
+               }
 
-               vaddr = vaddr + PAGE_SIZE;
                e++;
                i++;
        }
 
-       if (vmgexit_psc(data))
+       /* Page validation must be rescinded before changing to shared */
+       if (op == SNP_PAGE_STATE_SHARED)
+               pvalidate_pages(data);
+
+       local_irq_save(flags);
+
+       if (sev_cfg.ghcbs_initialized)
+               ghcb = __sev_get_ghcb(&state);
+       else
+               ghcb = boot_ghcb;
+
+       /* Invoke the hypervisor to perform the page state changes */
+       if (!ghcb || vmgexit_psc(ghcb, data))
                sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
+
+       if (sev_cfg.ghcbs_initialized)
+               __sev_put_ghcb(&state);
+
+       local_irq_restore(flags);
+
+       /* Page validation must be performed after changing to private */
+       if (op == SNP_PAGE_STATE_PRIVATE)
+               pvalidate_pages(data);
+
+       return vaddr;
 }
 
-static void set_pages_state(unsigned long vaddr, unsigned int npages, int op)
+static void set_pages_state(unsigned long vaddr, unsigned long npages, int op)
 {
-       unsigned long vaddr_end, next_vaddr;
-       struct snp_psc_desc *desc;
+       struct snp_psc_desc desc;
+       unsigned long vaddr_end;
 
-       desc = kmalloc(sizeof(*desc), GFP_KERNEL_ACCOUNT);
-       if (!desc)
-               panic("SNP: failed to allocate memory for PSC descriptor\n");
+       /* Use the MSR protocol when a GHCB is not available. */
+       if (!boot_ghcb)
+               return early_set_pages_state(vaddr, __pa(vaddr), npages, op);
 
        vaddr = vaddr & PAGE_MASK;
        vaddr_end = vaddr + (npages << PAGE_SHIFT);
 
-       while (vaddr < vaddr_end) {
-               /* Calculate the last vaddr that fits in one struct snp_psc_desc. */
-               next_vaddr = min_t(unsigned long, vaddr_end,
-                                  (VMGEXIT_PSC_MAX_ENTRY * PAGE_SIZE) + vaddr);
-
-               __set_pages_state(desc, vaddr, next_vaddr, op);
-
-               vaddr = next_vaddr;
-       }
-
-       kfree(desc);
+       while (vaddr < vaddr_end)
+               vaddr = __set_pages_state(&desc, vaddr, vaddr_end, op);
 }
 
-void snp_set_memory_shared(unsigned long vaddr, unsigned int npages)
+void snp_set_memory_shared(unsigned long vaddr, unsigned long npages)
 {
        if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
                return;
 
-       pvalidate_pages(vaddr, npages, false);
-
        set_pages_state(vaddr, npages, SNP_PAGE_STATE_SHARED);
 }
 
-void snp_set_memory_private(unsigned long vaddr, unsigned int npages)
+void snp_set_memory_private(unsigned long vaddr, unsigned long npages)
 {
        if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
                return;
 
        set_pages_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE);
+}
+
+void snp_accept_memory(phys_addr_t start, phys_addr_t end)
+{
+       unsigned long vaddr;
+       unsigned int npages;
+
+       if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+               return;
+
+       vaddr = (unsigned long)__va(start);
+       npages = (end - start) >> PAGE_SHIFT;
 
-       pvalidate_pages(vaddr, npages, true);
+       set_pages_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE);
 }
 
 static int snp_set_vmsa(void *va, bool vmsa)
@@ -1267,6 +1223,8 @@ void setup_ghcb(void)
                if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
                        snp_register_per_cpu_ghcb();
 
+               sev_cfg.ghcbs_initialized = true;
+
                return;
        }
 
@@ -1328,7 +1286,7 @@ static void sev_es_play_dead(void)
         * If we get here, the VCPU was woken up again. Jump to CPU
         * startup code to get it back online.
         */
-       start_cpu0();
+       soft_restart_cpu();
 }
 #else  /* CONFIG_HOTPLUG_CPU */
 #define sev_es_play_dead       native_play_dead
@@ -1395,9 +1353,6 @@ void __init sev_es_init_vc_handling(void)
                        sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
        }
 
-       /* Enable SEV-ES special handling */
-       static_branch_enable(&sev_es_enable_key);
-
        /* Initialize per-cpu GHCB pages */
        for_each_possible_cpu(cpu) {
                alloc_runtime_data(cpu);
index 004cb30..cfeec3e 100644 (file)
@@ -182,7 +182,7 @@ get_sigframe(struct ksignal *ksig, struct pt_regs *regs, size_t frame_size,
 static unsigned long __ro_after_init max_frame_size;
 static unsigned int __ro_after_init fpu_default_state_size;
 
-void __init init_sigframe_size(void)
+static int __init init_sigframe_size(void)
 {
        fpu_default_state_size = fpu__get_fpstate_size();
 
@@ -194,7 +194,9 @@ void __init init_sigframe_size(void)
        max_frame_size = round_up(max_frame_size, FRAME_ALIGNMENT);
 
        pr_info("max sigframe size: %lu\n", max_frame_size);
+       return 0;
 }
+early_initcall(init_sigframe_size);
 
 unsigned long get_sigframe_size(void)
 {
index 375b33e..7eb18ca 100644 (file)
 #include <linux/interrupt.h>
 #include <linux/cpu.h>
 #include <linux/gfp.h>
+#include <linux/kexec.h>
 
 #include <asm/mtrr.h>
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
 #include <asm/proto.h>
 #include <asm/apic.h>
+#include <asm/cpu.h>
 #include <asm/idtentry.h>
 #include <asm/nmi.h>
 #include <asm/mce.h>
@@ -129,7 +131,7 @@ static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
 }
 
 /*
- * this function calls the 'stop' function on all other CPUs in the system.
+ * Disable virtualization, APIC etc. and park the CPU in a HLT loop
  */
 DEFINE_IDTENTRY_SYSVEC(sysvec_reboot)
 {
@@ -146,61 +148,96 @@ static int register_stop_handler(void)
 
 static void native_stop_other_cpus(int wait)
 {
-       unsigned long flags;
-       unsigned long timeout;
+       unsigned int cpu = smp_processor_id();
+       unsigned long flags, timeout;
 
        if (reboot_force)
                return;
 
-       /*
-        * Use an own vector here because smp_call_function
-        * does lots of things not suitable in a panic situation.
-        */
+       /* Only proceed if this is the first CPU to reach this code */
+       if (atomic_cmpxchg(&stopping_cpu, -1, cpu) != -1)
+               return;
+
+       /* For kexec, ensure that offline CPUs are out of MWAIT and in HLT */
+       if (kexec_in_progress)
+               smp_kick_mwait_play_dead();
 
        /*
-        * We start by using the REBOOT_VECTOR irq.
-        * The irq is treated as a sync point to allow critical
-        * regions of code on other cpus to release their spin locks
-        * and re-enable irqs.  Jumping straight to an NMI might
-        * accidentally cause deadlocks with further shutdown/panic
-        * code.  By syncing, we give the cpus up to one second to
-        * finish their work before we force them off with the NMI.
+        * 1) Send an IPI on the reboot vector to all other CPUs.
+        *
+        *    The other CPUs should react on it after leaving critical
+        *    sections and re-enabling interrupts. They might still hold
+        *    locks, but there is nothing which can be done about that.
+        *
+        * 2) Wait for all other CPUs to report that they reached the
+        *    HLT loop in stop_this_cpu()
+        *
+        * 3) If the system uses INIT/STARTUP for CPU bringup, then
+        *    send all present CPUs an INIT vector, which brings them
+        *    completely out of the way.
+        *
+        * 4) If #3 is not possible and #2 timed out send an NMI to the
+        *    CPUs which did not yet report
+        *
+        * 5) Wait for all other CPUs to report that they reached the
+        *    HLT loop in stop_this_cpu()
+        *
+        * #4 can obviously race against a CPU reaching the HLT loop late.
+        * That CPU will have reported already and the "have all CPUs
+        * reached HLT" condition will be true despite the fact that the
+        * other CPU is still handling the NMI. Again, there is no
+        * protection against that as "disabled" APICs still respond to
+        * NMIs.
         */
-       if (num_online_cpus() > 1) {
-               /* did someone beat us here? */
-               if (atomic_cmpxchg(&stopping_cpu, -1, safe_smp_processor_id()) != -1)
-                       return;
-
-               /* sync above data before sending IRQ */
-               wmb();
+       cpumask_copy(&cpus_stop_mask, cpu_online_mask);
+       cpumask_clear_cpu(cpu, &cpus_stop_mask);
 
+       if (!cpumask_empty(&cpus_stop_mask)) {
                apic_send_IPI_allbutself(REBOOT_VECTOR);
 
                /*
                 * Don't wait longer than a second for IPI completion. The
                 * wait request is not checked here because that would
-                * prevent an NMI shutdown attempt in case that not all
+                * prevent an NMI/INIT shutdown in case that not all
                 * CPUs reach shutdown state.
                 */
                timeout = USEC_PER_SEC;
-               while (num_online_cpus() > 1 && timeout--)
+               while (!cpumask_empty(&cpus_stop_mask) && timeout--)
                        udelay(1);
        }
 
-       /* if the REBOOT_VECTOR didn't work, try with the NMI */
-       if (num_online_cpus() > 1) {
+       /*
+        * Park all other CPUs in INIT including "offline" CPUs, if
+        * possible. That's a safe place where they can't resume execution
+        * of HLT and then execute the HLT loop from overwritten text or
+        * page tables.
+        *
+        * The only downside is a broadcast MCE, but up to the point where
+        * the kexec() kernel brought all APs online again an MCE will just
+        * make HLT resume and handle the MCE. The machine crashes and burns
+        * due to overwritten text, page tables and data. So there is a
+        * choice between fire and frying pan. The result is pretty much
+        * the same. Chose frying pan until x86 provides a sane mechanism
+        * to park a CPU.
+        */
+       if (smp_park_other_cpus_in_init())
+               goto done;
+
+       /*
+        * If park with INIT was not possible and the REBOOT_VECTOR didn't
+        * take all secondary CPUs offline, try with the NMI.
+        */
+       if (!cpumask_empty(&cpus_stop_mask)) {
                /*
                 * If NMI IPI is enabled, try to register the stop handler
                 * and send the IPI. In any case try to wait for the other
                 * CPUs to stop.
                 */
                if (!smp_no_nmi_ipi && !register_stop_handler()) {
-                       /* Sync above data before sending IRQ */
-                       wmb();
-
                        pr_emerg("Shutting down cpus with NMI\n");
 
-                       apic_send_IPI_allbutself(NMI_VECTOR);
+                       for_each_cpu(cpu, &cpus_stop_mask)
+                               apic->send_IPI(cpu, NMI_VECTOR);
                }
                /*
                 * Don't wait longer than 10 ms if the caller didn't
@@ -208,14 +245,21 @@ static void native_stop_other_cpus(int wait)
                 * one or more CPUs do not reach shutdown state.
                 */
                timeout = USEC_PER_MSEC * 10;
-               while (num_online_cpus() > 1 && (wait || timeout--))
+               while (!cpumask_empty(&cpus_stop_mask) && (wait || timeout--))
                        udelay(1);
        }
 
+done:
        local_irq_save(flags);
        disable_local_APIC();
        mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
        local_irq_restore(flags);
+
+       /*
+        * Ensure that the cpus_stop_mask cache lines are invalidated on
+        * the other CPUs. See comment vs. SME in stop_this_cpu().
+        */
+       cpumask_clear(&cpus_stop_mask);
 }
 
 /*
@@ -268,8 +312,7 @@ struct smp_ops smp_ops = {
 #endif
        .smp_send_reschedule    = native_smp_send_reschedule,
 
-       .cpu_up                 = native_cpu_up,
-       .cpu_die                = native_cpu_die,
+       .kick_ap_alive          = native_kick_ap,
        .cpu_disable            = native_cpu_disable,
        .play_dead              = native_play_dead,
 
index 352f0ce..ed2d519 100644 (file)
 #include <linux/tboot.h>
 #include <linux/gfp.h>
 #include <linux/cpuidle.h>
+#include <linux/kexec.h>
 #include <linux/numa.h>
 #include <linux/pgtable.h>
 #include <linux/overflow.h>
 #include <linux/stackprotector.h>
+#include <linux/cpuhotplug.h>
+#include <linux/mc146818rtc.h>
 
 #include <asm/acpi.h>
 #include <asm/cacheinfo.h>
@@ -74,7 +77,7 @@
 #include <asm/fpu/api.h>
 #include <asm/setup.h>
 #include <asm/uv/uv.h>
-#include <linux/mc146818rtc.h>
+#include <asm/microcode.h>
 #include <asm/i8259.h>
 #include <asm/misc.h>
 #include <asm/qspinlock.h>
@@ -101,6 +104,26 @@ EXPORT_PER_CPU_SYMBOL(cpu_die_map);
 DEFINE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info);
 EXPORT_PER_CPU_SYMBOL(cpu_info);
 
+/* CPUs which are the primary SMT threads */
+struct cpumask __cpu_primary_thread_mask __read_mostly;
+
+/* Representing CPUs for which sibling maps can be computed */
+static cpumask_var_t cpu_sibling_setup_mask;
+
+struct mwait_cpu_dead {
+       unsigned int    control;
+       unsigned int    status;
+};
+
+#define CPUDEAD_MWAIT_WAIT     0xDEADBEEF
+#define CPUDEAD_MWAIT_KEXEC_HLT        0x4A17DEAD
+
+/*
+ * Cache line aligned data for mwait_play_dead(). Separate on purpose so
+ * that it's unlikely to be touched by other CPUs.
+ */
+static DEFINE_PER_CPU_ALIGNED(struct mwait_cpu_dead, mwait_cpu_dead);
+
 /* Logical package management. We might want to allocate that dynamically */
 unsigned int __max_logical_packages __read_mostly;
 EXPORT_SYMBOL(__max_logical_packages);
@@ -121,7 +144,6 @@ int arch_update_cpu_topology(void)
        return retval;
 }
 
-
 static unsigned int smpboot_warm_reset_vector_count;
 
 static inline void smpboot_setup_warm_reset_vector(unsigned long start_eip)
@@ -154,66 +176,63 @@ static inline void smpboot_restore_warm_reset_vector(void)
 
 }
 
-/*
- * Report back to the Boot Processor during boot time or to the caller processor
- * during CPU online.
- */
-static void smp_callin(void)
+/* Run the next set of setup steps for the upcoming CPU */
+static void ap_starting(void)
 {
-       int cpuid;
+       int cpuid = smp_processor_id();
 
-       /*
-        * If waken up by an INIT in an 82489DX configuration
-        * cpu_callout_mask guarantees we don't get here before
-        * an INIT_deassert IPI reaches our local APIC, so it is
-        * now safe to touch our local APIC.
-        */
-       cpuid = smp_processor_id();
+       /* Mop up eventual mwait_play_dead() wreckage */
+       this_cpu_write(mwait_cpu_dead.status, 0);
+       this_cpu_write(mwait_cpu_dead.control, 0);
 
        /*
-        * the boot CPU has finished the init stage and is spinning
-        * on callin_map until we finish. We are free to set up this
-        * CPU, first the APIC. (this is probably redundant on most
-        * boards)
+        * If woken up by an INIT in an 82489DX configuration the alive
+        * synchronization guarantees that the CPU does not reach this
+        * point before an INIT_deassert IPI reaches the local APIC, so it
+        * is now safe to touch the local APIC.
+        *
+        * Set up this CPU, first the APIC, which is probably redundant on
+        * most boards.
         */
        apic_ap_setup();
 
-       /*
-        * Save our processor parameters. Note: this information
-        * is needed for clock calibration.
-        */
+       /* Save the processor parameters. */
        smp_store_cpu_info(cpuid);
 
        /*
         * The topology information must be up to date before
-        * calibrate_delay() and notify_cpu_starting().
+        * notify_cpu_starting().
         */
-       set_cpu_sibling_map(raw_smp_processor_id());
+       set_cpu_sibling_map(cpuid);
 
        ap_init_aperfmperf();
 
-       /*
-        * Get our bogomips.
-        * Update loops_per_jiffy in cpu_data. Previous call to
-        * smp_store_cpu_info() stored a value that is close but not as
-        * accurate as the value just calculated.
-        */
-       calibrate_delay();
-       cpu_data(cpuid).loops_per_jiffy = loops_per_jiffy;
        pr_debug("Stack at about %p\n", &cpuid);
 
        wmb();
 
+       /*
+        * This runs the AP through all the cpuhp states to its target
+        * state CPUHP_ONLINE.
+        */
        notify_cpu_starting(cpuid);
+}
 
+static void ap_calibrate_delay(void)
+{
        /*
-        * Allow the master to continue.
+        * Calibrate the delay loop and update loops_per_jiffy in cpu_data.
+        * smp_store_cpu_info() stored a value that is close but not as
+        * accurate as the value just calculated.
+        *
+        * As this is invoked after the TSC synchronization check,
+        * calibrate_delay_is_known() will skip the calibration routine
+        * when TSC is synchronized across sockets.
         */
-       cpumask_set_cpu(cpuid, cpu_callin_mask);
+       calibrate_delay();
+       cpu_data(smp_processor_id()).loops_per_jiffy = loops_per_jiffy;
 }
 
-static int cpu0_logical_apicid;
-static int enable_start_cpu0;
 /*
  * Activate a secondary processor.
  */
@@ -226,24 +245,63 @@ static void notrace start_secondary(void *unused)
         */
        cr4_init();
 
-#ifdef CONFIG_X86_32
-       /* switch away from the initial page table */
-       load_cr3(swapper_pg_dir);
-       __flush_tlb_all();
-#endif
-       cpu_init_secondary();
+       /*
+        * 32-bit specific. 64-bit reaches this code with the correct page
+        * table established. Yet another historical divergence.
+        */
+       if (IS_ENABLED(CONFIG_X86_32)) {
+               /* switch away from the initial page table */
+               load_cr3(swapper_pg_dir);
+               __flush_tlb_all();
+       }
+
+       cpu_init_exception_handling();
+
+       /*
+        * 32-bit systems load the microcode from the ASM startup code for
+        * historical reasons.
+        *
+        * On 64-bit systems load it before reaching the AP alive
+        * synchronization point below so it is not part of the full per
+        * CPU serialized bringup part when "parallel" bringup is enabled.
+        *
+        * That's even safe when hyperthreading is enabled in the CPU as
+        * the core code starts the primary threads first and leaves the
+        * secondary threads waiting for SIPI. Loading microcode on
+        * physical cores concurrently is a safe operation.
+        *
+        * This covers both the Intel specific issue that concurrent
+        * microcode loading on SMT siblings must be prohibited and the
+        * vendor independent issue that microcode loading which changes
+        * CPUID, MSRs etc. must be strictly serialized to maintain
+        * software state correctness.
+        */
+       if (IS_ENABLED(CONFIG_X86_64))
+               load_ucode_ap();
+
+       /*
+        * Synchronization point with the hotplug core. Sets this CPUs
+        * synchronization state to ALIVE and spin-waits for the control CPU to
+        * release this CPU for further bringup.
+        */
+       cpuhp_ap_sync_alive();
+
+       cpu_init();
+       fpu__init_cpu();
        rcu_cpu_starting(raw_smp_processor_id());
        x86_cpuinit.early_percpu_clock_init();
-       smp_callin();
 
-       enable_start_cpu0 = 0;
+       ap_starting();
+
+       /* Check TSC synchronization with the control CPU. */
+       check_tsc_sync_target();
 
-       /* otherwise gcc will move up smp_processor_id before the cpu_init */
-       barrier();
        /*
-        * Check TSC synchronization with the boot CPU:
+        * Calibrate the delay loop after the TSC synchronization check.
+        * This allows to skip the calibration when TSC is synchronized
+        * across sockets.
         */
-       check_tsc_sync_target();
+       ap_calibrate_delay();
 
        speculative_store_bypass_ht_init();
 
@@ -257,7 +315,6 @@ static void notrace start_secondary(void *unused)
        set_cpu_online(smp_processor_id(), true);
        lapic_online();
        unlock_vector_lock();
-       cpu_set_state_online(smp_processor_id());
        x86_platform.nmi_init();
 
        /* enable local interrupts */
@@ -270,15 +327,6 @@ static void notrace start_secondary(void *unused)
 }
 
 /**
- * topology_is_primary_thread - Check whether CPU is the primary SMT thread
- * @cpu:       CPU to check
- */
-bool topology_is_primary_thread(unsigned int cpu)
-{
-       return apic_id_is_primary_thread(per_cpu(x86_cpu_to_apicid, cpu));
-}
-
-/**
  * topology_smt_supported - Check whether SMT is supported by the CPUs
  */
 bool topology_smt_supported(void)
@@ -288,6 +336,7 @@ bool topology_smt_supported(void)
 
 /**
  * topology_phys_to_logical_pkg - Map a physical package id to a logical
+ * @phys_pkg:  The physical package id to map
  *
  * Returns logical package id or -1 if not found
  */
@@ -304,15 +353,17 @@ int topology_phys_to_logical_pkg(unsigned int phys_pkg)
        return -1;
 }
 EXPORT_SYMBOL(topology_phys_to_logical_pkg);
+
 /**
  * topology_phys_to_logical_die - Map a physical die id to logical
+ * @die_id:    The physical die id to map
+ * @cur_cpu:   The CPU for which the mapping is done
  *
  * Returns logical die id or -1 if not found
  */
-int topology_phys_to_logical_die(unsigned int die_id, unsigned int cur_cpu)
+static int topology_phys_to_logical_die(unsigned int die_id, unsigned int cur_cpu)
 {
-       int cpu;
-       int proc_id = cpu_data(cur_cpu).phys_proc_id;
+       int cpu, proc_id = cpu_data(cur_cpu).phys_proc_id;
 
        for_each_possible_cpu(cpu) {
                struct cpuinfo_x86 *c = &cpu_data(cpu);
@@ -323,7 +374,6 @@ int topology_phys_to_logical_die(unsigned int die_id, unsigned int cur_cpu)
        }
        return -1;
 }
-EXPORT_SYMBOL(topology_phys_to_logical_die);
 
 /**
  * topology_update_package_map - Update the physical to logical package map
@@ -398,7 +448,7 @@ void smp_store_cpu_info(int id)
        c->cpu_index = id;
        /*
         * During boot time, CPU0 has this setup already. Save the info when
-        * bringing up AP or offlined CPU0.
+        * bringing up an AP.
         */
        identify_secondary_cpu(c);
        c->initialized = true;
@@ -552,7 +602,7 @@ static int x86_core_flags(void)
 #ifdef CONFIG_SCHED_SMT
 static int x86_smt_flags(void)
 {
-       return cpu_smt_flags() | x86_sched_itmt_flags();
+       return cpu_smt_flags();
 }
 #endif
 #ifdef CONFIG_SCHED_CLUSTER
@@ -563,50 +613,57 @@ static int x86_cluster_flags(void)
 #endif
 #endif
 
-static struct sched_domain_topology_level x86_numa_in_package_topology[] = {
-#ifdef CONFIG_SCHED_SMT
-       { cpu_smt_mask, x86_smt_flags, SD_INIT_NAME(SMT) },
-#endif
-#ifdef CONFIG_SCHED_CLUSTER
-       { cpu_clustergroup_mask, x86_cluster_flags, SD_INIT_NAME(CLS) },
-#endif
-#ifdef CONFIG_SCHED_MC
-       { cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) },
-#endif
-       { NULL, },
-};
+/*
+ * Set if a package/die has multiple NUMA nodes inside.
+ * AMD Magny-Cours, Intel Cluster-on-Die, and Intel
+ * Sub-NUMA Clustering have this.
+ */
+static bool x86_has_numa_in_package;
 
-static struct sched_domain_topology_level x86_hybrid_topology[] = {
-#ifdef CONFIG_SCHED_SMT
-       { cpu_smt_mask, x86_smt_flags, SD_INIT_NAME(SMT) },
-#endif
-#ifdef CONFIG_SCHED_MC
-       { cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) },
-#endif
-       { cpu_cpu_mask, SD_INIT_NAME(DIE) },
-       { NULL, },
-};
+static struct sched_domain_topology_level x86_topology[6];
+
+static void __init build_sched_topology(void)
+{
+       int i = 0;
 
-static struct sched_domain_topology_level x86_topology[] = {
 #ifdef CONFIG_SCHED_SMT
-       { cpu_smt_mask, x86_smt_flags, SD_INIT_NAME(SMT) },
+       x86_topology[i++] = (struct sched_domain_topology_level){
+               cpu_smt_mask, x86_smt_flags, SD_INIT_NAME(SMT)
+       };
 #endif
 #ifdef CONFIG_SCHED_CLUSTER
-       { cpu_clustergroup_mask, x86_cluster_flags, SD_INIT_NAME(CLS) },
+       /*
+        * For now, skip the cluster domain on Hybrid.
+        */
+       if (!cpu_feature_enabled(X86_FEATURE_HYBRID_CPU)) {
+               x86_topology[i++] = (struct sched_domain_topology_level){
+                       cpu_clustergroup_mask, x86_cluster_flags, SD_INIT_NAME(CLS)
+               };
+       }
 #endif
 #ifdef CONFIG_SCHED_MC
-       { cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) },
+       x86_topology[i++] = (struct sched_domain_topology_level){
+               cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC)
+       };
 #endif
-       { cpu_cpu_mask, SD_INIT_NAME(DIE) },
-       { NULL, },
-};
+       /*
+        * When there is NUMA topology inside the package, skip the DIE domain
+        * since the NUMA domains will auto-magically create the right spanning
+        * domains based on the SLIT.
+        */
+       if (!x86_has_numa_in_package) {
+               x86_topology[i++] = (struct sched_domain_topology_level){
+                       cpu_cpu_mask, SD_INIT_NAME(DIE)
+               };
+       }
 
-/*
- * Set if a package/die has multiple NUMA nodes inside.
- * AMD Magny-Cours, Intel Cluster-on-Die, and Intel
- * Sub-NUMA Clustering have this.
- */
-static bool x86_has_numa_in_package;
+       /*
+        * There must be one trailing NULL entry left.
+        */
+       BUG_ON(i >= ARRAY_SIZE(x86_topology)-1);
+
+       set_sched_topology(x86_topology);
+}
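
A small stand-alone sketch of the invariant build_sched_topology() relies on: the fixed-size table is filled left to right and must keep a zeroed sentinel entry at the end, which is what the BUG_ON above enforces. The struct and names below are demo stand-ins, not the kernel's sched_domain_topology_level.

#include <assert.h>
#include <stdio.h>

struct level { const char *name; };

#define MAX_LEVELS 6
static struct level table[MAX_LEVELS];	/* zero-initialized, like x86_topology[6] */

int main(void)
{
	int i = 0, smt = 1, cluster = 0, mc = 1, numa_in_package = 0;

	if (smt)
		table[i++] = (struct level){ "SMT" };
	if (cluster)
		table[i++] = (struct level){ "CLS" };
	if (mc)
		table[i++] = (struct level){ "MC" };
	if (!numa_in_package)
		table[i++] = (struct level){ "DIE" };

	assert(i < MAX_LEVELS);		/* a trailing NULL sentinel must remain */

	for (struct level *l = table; l->name; l++)
		printf("domain: %s\n", l->name);
	return 0;
}
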
 
 void set_cpu_sibling_map(int cpu)
 {
@@ -706,9 +763,9 @@ static void impress_friends(void)
         * Allow the user to impress friends.
         */
        pr_debug("Before bogomips\n");
-       for_each_possible_cpu(cpu)
-               if (cpumask_test_cpu(cpu, cpu_callout_mask))
-                       bogosum += cpu_data(cpu).loops_per_jiffy;
+       for_each_online_cpu(cpu)
+               bogosum += cpu_data(cpu).loops_per_jiffy;
+
        pr_info("Total of %d processors activated (%lu.%02lu BogoMIPS)\n",
                num_online_cpus(),
                bogosum/(500000/HZ),
@@ -795,86 +852,42 @@ static void __init smp_quirk_init_udelay(void)
 }
 
 /*
- * Poke the other CPU in the eye via NMI to wake it up. Remember that the normal
- * INIT, INIT, STARTUP sequence will reset the chip hard for us, and this
- * won't ... remember to clear down the APIC, etc later.
+ * Send the INIT assert/deassert sequence to wake up an AP.
  */
-int
-wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip)
+static void send_init_sequence(int phys_apicid)
 {
-       u32 dm = apic->dest_mode_logical ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;
-       unsigned long send_status, accept_status = 0;
-       int maxlvt;
+       int maxlvt = lapic_get_maxlvt();
 
-       /* Target chip */
-       /* Boot on the stack */
-       /* Kick the second */
-       apic_icr_write(APIC_DM_NMI | dm, apicid);
-
-       pr_debug("Waiting for send to finish...\n");
-       send_status = safe_apic_wait_icr_idle();
-
-       /*
-        * Give the other CPU some time to accept the IPI.
-        */
-       udelay(200);
+       /* Be paranoid about clearing APIC errors. */
        if (APIC_INTEGRATED(boot_cpu_apic_version)) {
-               maxlvt = lapic_get_maxlvt();
-               if (maxlvt > 3)                 /* Due to the Pentium erratum 3AP.  */
+               /* Due to the Pentium erratum 3AP.  */
+               if (maxlvt > 3)
                        apic_write(APIC_ESR, 0);
-               accept_status = (apic_read(APIC_ESR) & 0xEF);
+               apic_read(APIC_ESR);
        }
-       pr_debug("NMI sent\n");
 
-       if (send_status)
-               pr_err("APIC never delivered???\n");
-       if (accept_status)
-               pr_err("APIC delivery error (%lx)\n", accept_status);
+       /* Assert INIT on the target CPU */
+       apic_icr_write(APIC_INT_LEVELTRIG | APIC_INT_ASSERT | APIC_DM_INIT, phys_apicid);
+       safe_apic_wait_icr_idle();
 
-       return (send_status | accept_status);
+       udelay(init_udelay);
+
+       /* Deassert INIT on the target CPU */
+       apic_icr_write(APIC_INT_LEVELTRIG | APIC_DM_INIT, phys_apicid);
+       safe_apic_wait_icr_idle();
 }
 
-static int
-wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
+/*
+ * Wake up AP by INIT, INIT, STARTUP sequence.
+ */
+static int wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
 {
        unsigned long send_status = 0, accept_status = 0;
-       int maxlvt, num_starts, j;
+       int num_starts, j, maxlvt;
 
+       preempt_disable();
        maxlvt = lapic_get_maxlvt();
-
-       /*
-        * Be paranoid about clearing APIC errors.
-        */
-       if (APIC_INTEGRATED(boot_cpu_apic_version)) {
-               if (maxlvt > 3)         /* Due to the Pentium erratum 3AP.  */
-                       apic_write(APIC_ESR, 0);
-               apic_read(APIC_ESR);
-       }
-
-       pr_debug("Asserting INIT\n");
-
-       /*
-        * Turn INIT on target chip
-        */
-       /*
-        * Send IPI
-        */
-       apic_icr_write(APIC_INT_LEVELTRIG | APIC_INT_ASSERT | APIC_DM_INIT,
-                      phys_apicid);
-
-       pr_debug("Waiting for send to finish...\n");
-       send_status = safe_apic_wait_icr_idle();
-
-       udelay(init_udelay);
-
-       pr_debug("Deasserting INIT\n");
-
-       /* Target chip */
-       /* Send IPI */
-       apic_icr_write(APIC_INT_LEVELTRIG | APIC_DM_INIT, phys_apicid);
-
-       pr_debug("Waiting for send to finish...\n");
-       send_status = safe_apic_wait_icr_idle();
+       send_init_sequence(phys_apicid);
 
        mb();
 
@@ -945,15 +958,16 @@ wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
        if (accept_status)
                pr_err("APIC delivery error (%lx)\n", accept_status);
 
+       preempt_enable();
        return (send_status | accept_status);
 }
 
 /* reduce the number of lines printed when booting a large cpu count system */
 static void announce_cpu(int cpu, int apicid)
 {
+       static int width, node_width, first = 1;
        static int current_node = NUMA_NO_NODE;
        int node = early_cpu_to_node(cpu);
-       static int width, node_width;
 
        if (!width)
                width = num_digits(num_possible_cpus()) + 1; /* + '#' sign */
@@ -961,10 +975,10 @@ static void announce_cpu(int cpu, int apicid)
        if (!node_width)
                node_width = num_digits(num_possible_nodes()) + 1; /* + '#' */
 
-       if (cpu == 1)
-               printk(KERN_INFO "x86: Booting SMP configuration:\n");
-
        if (system_state < SYSTEM_RUNNING) {
+               if (first)
+                       pr_info("x86: Booting SMP configuration:\n");
+
                if (node != current_node) {
                        if (current_node > (-1))
                                pr_cont("\n");
@@ -975,77 +989,16 @@ static void announce_cpu(int cpu, int apicid)
                }
 
                /* Add padding for the BSP */
-               if (cpu == 1)
+               if (first)
                        pr_cont("%*s", width + 1, " ");
+               first = 0;
 
                pr_cont("%*s#%d", width - num_digits(cpu), " ", cpu);
-
        } else
                pr_info("Booting Node %d Processor %d APIC 0x%x\n",
                        node, cpu, apicid);
 }
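
The column alignment in announce_cpu() comes from sizing the output field to the largest possible CPU number. A user-space sketch of that width calculation; num_digits() here is a local helper, not the kernel's.

#include <stdio.h>

static int num_digits(int v)	/* local helper for the demo */
{
	int d = 1;

	while (v >= 10) {
		v /= 10;
		d++;
	}
	return d;
}

int main(void)
{
	int possible_cpus = 128;
	int width = num_digits(possible_cpus) + 1;	/* + '#' sign */

	printf("x86: Booting SMP configuration:\n");
	for (int cpu = 1; cpu <= 4; cpu++)
		printf("%*s#%d", width - num_digits(cpu), " ", cpu);
	printf("\n");
	return 0;
}
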
 
-static int wakeup_cpu0_nmi(unsigned int cmd, struct pt_regs *regs)
-{
-       int cpu;
-
-       cpu = smp_processor_id();
-       if (cpu == 0 && !cpu_online(cpu) && enable_start_cpu0)
-               return NMI_HANDLED;
-
-       return NMI_DONE;
-}
-
-/*
- * Wake up AP by INIT, INIT, STARTUP sequence.
- *
- * Instead of waiting for STARTUP after INITs, BSP will execute the BIOS
- * boot-strap code which is not a desired behavior for waking up BSP. To
- * void the boot-strap code, wake up CPU0 by NMI instead.
- *
- * This works to wake up soft offlined CPU0 only. If CPU0 is hard offlined
- * (i.e. physically hot removed and then hot added), NMI won't wake it up.
- * We'll change this code in the future to wake up hard offlined CPU0 if
- * real platform and request are available.
- */
-static int
-wakeup_cpu_via_init_nmi(int cpu, unsigned long start_ip, int apicid,
-              int *cpu0_nmi_registered)
-{
-       int id;
-       int boot_error;
-
-       preempt_disable();
-
-       /*
-        * Wake up AP by INIT, INIT, STARTUP sequence.
-        */
-       if (cpu) {
-               boot_error = wakeup_secondary_cpu_via_init(apicid, start_ip);
-               goto out;
-       }
-
-       /*
-        * Wake up BSP by nmi.
-        *
-        * Register a NMI handler to help wake up CPU0.
-        */
-       boot_error = register_nmi_handler(NMI_LOCAL,
-                                         wakeup_cpu0_nmi, 0, "wake_cpu0");
-
-       if (!boot_error) {
-               enable_start_cpu0 = 1;
-               *cpu0_nmi_registered = 1;
-               id = apic->dest_mode_logical ? cpu0_logical_apicid : apicid;
-               boot_error = wakeup_secondary_cpu_via_nmi(id, start_ip);
-       }
-
-out:
-       preempt_enable();
-
-       return boot_error;
-}
-
 int common_cpu_up(unsigned int cpu, struct task_struct *idle)
 {
        int ret;
@@ -1071,17 +1024,13 @@ int common_cpu_up(unsigned int cpu, struct task_struct *idle)
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
- * Returns zero if CPU booted OK, else error code from
+ * Returns zero if startup was successfully sent, else error code from
  * ->wakeup_secondary_cpu.
  */
-static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
-                      int *cpu0_nmi_registered)
+static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 {
-       /* start_ip had better be page-aligned! */
        unsigned long start_ip = real_mode_header->trampoline_start;
-
-       unsigned long boot_error = 0;
-       unsigned long timeout;
+       int ret;
 
 #ifdef CONFIG_X86_64
        /* If 64-bit wakeup method exists, use the 64-bit mode trampoline IP */
@@ -1094,7 +1043,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
        if (IS_ENABLED(CONFIG_X86_32)) {
                early_gdt_descr.address = (unsigned long)get_cpu_gdt_rw(cpu);
                initial_stack  = idle->thread.sp;
-       } else {
+       } else if (!(smpboot_control & STARTUP_PARALLEL_MASK)) {
                smpboot_control = cpu;
        }
 
@@ -1108,7 +1057,6 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
         * This grunge runs the startup process for
         * the targeted processor.
         */
-
        if (x86_platform.legacy.warm_reset) {
 
                pr_debug("Setting warm reset code and vector.\n");
@@ -1123,13 +1071,6 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
                }
        }
 
-       /*
-        * AP might wait on cpu_callout_mask in cpu_init() with
-        * cpu_initialized_mask set if previous attempt to online
-        * it timed-out. Clear cpu_initialized_mask so that after
-        * INIT/SIPI it could start with a clean state.
-        */
-       cpumask_clear_cpu(cpu, cpu_initialized_mask);
        smp_mb();
 
        /*
@@ -1137,66 +1078,25 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
         * - Use a method from the APIC driver if one defined, with wakeup
         *   straight to 64-bit mode preferred over wakeup to RM.
         * Otherwise,
-        * - Use an INIT boot APIC message for APs or NMI for BSP.
+        * - Use an INIT boot APIC message
         */
        if (apic->wakeup_secondary_cpu_64)
-               boot_error = apic->wakeup_secondary_cpu_64(apicid, start_ip);
+               ret = apic->wakeup_secondary_cpu_64(apicid, start_ip);
        else if (apic->wakeup_secondary_cpu)
-               boot_error = apic->wakeup_secondary_cpu(apicid, start_ip);
+               ret = apic->wakeup_secondary_cpu(apicid, start_ip);
        else
-               boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
-                                                    cpu0_nmi_registered);
-
-       if (!boot_error) {
-               /*
-                * Wait 10s total for first sign of life from AP
-                */
-               boot_error = -1;
-               timeout = jiffies + 10*HZ;
-               while (time_before(jiffies, timeout)) {
-                       if (cpumask_test_cpu(cpu, cpu_initialized_mask)) {
-                               /*
-                                * Tell AP to proceed with initialization
-                                */
-                               cpumask_set_cpu(cpu, cpu_callout_mask);
-                               boot_error = 0;
-                               break;
-                       }
-                       schedule();
-               }
-       }
-
-       if (!boot_error) {
-               /*
-                * Wait till AP completes initial initialization
-                */
-               while (!cpumask_test_cpu(cpu, cpu_callin_mask)) {
-                       /*
-                        * Allow other tasks to run while we wait for the
-                        * AP to come online. This also gives a chance
-                        * for the MTRR work(triggered by the AP coming online)
-                        * to be completed in the stop machine context.
-                        */
-                       schedule();
-               }
-       }
+               ret = wakeup_secondary_cpu_via_init(apicid, start_ip);
 
-       if (x86_platform.legacy.warm_reset) {
-               /*
-                * Cleanup possible dangling ends...
-                */
-               smpboot_restore_warm_reset_vector();
-       }
-
-       return boot_error;
+       /* If the wakeup mechanism failed, cleanup the warm reset vector */
+       if (ret)
+               arch_cpuhp_cleanup_kick_cpu(cpu);
+       return ret;
 }
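
The wakeup-method selection in do_boot_cpu() is a simple fallback chain over optional APIC callbacks. A sketch of that dispatch pattern with stand-in types; this is not the kernel's apic driver interface.

#include <stdio.h>

struct apic_ops {
	int (*wakeup_secondary_cpu_64)(int apicid, unsigned long start_ip);
	int (*wakeup_secondary_cpu)(int apicid, unsigned long start_ip);
};

static int wakeup_via_init(int apicid, unsigned long start_ip)
{
	printf("INIT/SIPI wakeup for APIC %d at 0x%lx\n", apicid, start_ip);
	return 0;
}

static int kick_ap(const struct apic_ops *apic, int apicid, unsigned long start_ip)
{
	if (apic->wakeup_secondary_cpu_64)		/* best: straight to 64-bit mode */
		return apic->wakeup_secondary_cpu_64(apicid, start_ip);
	if (apic->wakeup_secondary_cpu)			/* next: driver-specific wakeup */
		return apic->wakeup_secondary_cpu(apicid, start_ip);
	return wakeup_via_init(apicid, start_ip);	/* fallback: INIT/STARTUP IPIs */
}

int main(void)
{
	struct apic_ops plain = { 0 };	/* no special method wired up */

	return kick_ap(&plain, 1, 0x9a000UL);
}
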
 
-int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
+int native_kick_ap(unsigned int cpu, struct task_struct *tidle)
 {
        int apicid = apic->cpu_present_to_apicid(cpu);
-       int cpu0_nmi_registered = 0;
-       unsigned long flags;
-       int err, ret = 0;
+       int err;
 
        lockdep_assert_irqs_enabled();
 
@@ -1210,24 +1110,11 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
        }
 
        /*
-        * Already booted CPU?
-        */
-       if (cpumask_test_cpu(cpu, cpu_callin_mask)) {
-               pr_debug("do_boot_cpu %d Already started\n", cpu);
-               return -ENOSYS;
-       }
-
-       /*
         * Save current MTRR state in case it was changed since early boot
         * (e.g. by the ACPI SMI) to initialize new CPUs with MTRRs in sync:
         */
        mtrr_save_state();
 
-       /* x86 CPUs take themselves offline, so delayed offline is OK. */
-       err = cpu_check_up_prepare(cpu);
-       if (err && err != -EBUSY)
-               return err;
-
        /* the FPU context is blank, nobody can own it */
        per_cpu(fpu_fpregs_owner_ctx, cpu) = NULL;
 
@@ -1235,41 +1122,44 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
        if (err)
                return err;
 
-       err = do_boot_cpu(apicid, cpu, tidle, &cpu0_nmi_registered);
-       if (err) {
+       err = do_boot_cpu(apicid, cpu, tidle);
+       if (err)
                pr_err("do_boot_cpu failed(%d) to wakeup CPU#%u\n", err, cpu);
-               ret = -EIO;
-               goto unreg_nmi;
-       }
 
-       /*
-        * Check TSC synchronization with the AP (keep irqs disabled
-        * while doing so):
-        */
-       local_irq_save(flags);
-       check_tsc_sync_source(cpu);
-       local_irq_restore(flags);
+       return err;
+}
 
-       while (!cpu_online(cpu)) {
-               cpu_relax();
-               touch_nmi_watchdog();
-       }
+int arch_cpuhp_kick_ap_alive(unsigned int cpu, struct task_struct *tidle)
+{
+       return smp_ops.kick_ap_alive(cpu, tidle);
+}
 
-unreg_nmi:
-       /*
-        * Clean up the nmi handler. Do this after the callin and callout sync
-        * to avoid impact of possible long unregister time.
-        */
-       if (cpu0_nmi_registered)
-               unregister_nmi_handler(NMI_LOCAL, "wake_cpu0");
+void arch_cpuhp_cleanup_kick_cpu(unsigned int cpu)
+{
+       /* Cleanup possible dangling ends... */
+       if (smp_ops.kick_ap_alive == native_kick_ap && x86_platform.legacy.warm_reset)
+               smpboot_restore_warm_reset_vector();
+}
 
-       return ret;
+void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
+{
+       if (smp_ops.cleanup_dead_cpu)
+               smp_ops.cleanup_dead_cpu(cpu);
+
+       if (system_state == SYSTEM_RUNNING)
+               pr_info("CPU %u is now offline\n", cpu);
+}
+
+void arch_cpuhp_sync_state_poll(void)
+{
+       if (smp_ops.poll_sync_state)
+               smp_ops.poll_sync_state();
 }
 
 /**
- * arch_disable_smp_support() - disables SMP support for x86 at runtime
+ * arch_disable_smp_support() - Disables SMP support for x86 at boot time
  */
-void arch_disable_smp_support(void)
+void __init arch_disable_smp_support(void)
 {
        disable_ioapic_support();
 }
@@ -1361,14 +1251,6 @@ static void __init smp_cpu_index_default(void)
        }
 }
 
-static void __init smp_get_logical_apicid(void)
-{
-       if (x2apic_mode)
-               cpu0_logical_apicid = apic_read(APIC_LDR);
-       else
-               cpu0_logical_apicid = GET_APIC_LOGICAL_ID(apic_read(APIC_LDR));
-}
-
 void __init smp_prepare_cpus_common(void)
 {
        unsigned int i;
@@ -1379,7 +1261,6 @@ void __init smp_prepare_cpus_common(void)
         * Setup boot CPU information
         */
        smp_store_boot_cpu_info(); /* Final full version of the data */
-       cpumask_copy(cpu_callin_mask, cpumask_of(0));
        mb();
 
        for_each_possible_cpu(i) {
@@ -1390,18 +1271,24 @@ void __init smp_prepare_cpus_common(void)
                zalloc_cpumask_var(&per_cpu(cpu_l2c_shared_map, i), GFP_KERNEL);
        }
 
-       /*
-        * Set 'default' x86 topology, this matches default_topology() in that
-        * it has NUMA nodes as a topology level. See also
-        * native_smp_cpus_done().
-        *
-        * Must be done before set_cpus_sibling_map() is ran.
-        */
-       set_sched_topology(x86_topology);
-
        set_cpu_sibling_map(0);
 }
 
+#ifdef CONFIG_X86_64
+/* Establish whether parallel bringup can be supported. */
+bool __init arch_cpuhp_init_parallel_bringup(void)
+{
+       if (!x86_cpuinit.parallel_bringup) {
+               pr_info("Parallel CPU startup disabled by the platform\n");
+               return false;
+       }
+
+       smpboot_control = STARTUP_READ_APICID;
+       pr_debug("Parallel CPU startup enabled: 0x%08x\n", smpboot_control);
+       return true;
+}
+#endif
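
smpboot_control now carries either a plain CPU number (serial bringup) or a flag telling the AP to derive its own identity (parallel bringup). A toy encoding to illustrate the idea; the bit values below are made up and the real STARTUP_* masks live in the x86 headers.

#include <stdint.h>
#include <stdio.h>

#define STARTUP_READ_APICID	0x80000000u	/* illustrative bit, not the kernel's value */
#define STARTUP_PARALLEL_MASK	0x80000000u	/* illustrative bit, not the kernel's value */

static void ap_entry(uint32_t smpboot_control)
{
	if (smpboot_control & STARTUP_PARALLEL_MASK)
		printf("parallel bringup: derive CPU number from my own APIC ID\n");
	else
		printf("serial bringup: control CPU says I am CPU %u\n",
		       (unsigned)smpboot_control);
}

int main(void)
{
	ap_entry(3);			/* legacy path: one AP at a time */
	ap_entry(STARTUP_READ_APICID);	/* parallel startup path */
	return 0;
}
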
+
 /*
  * Prepare for SMP bootup.
  * @max_cpus: configured maximum number of CPUs. It is a legacy parameter
@@ -1431,8 +1318,6 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
        /* Setup local timer */
        x86_init.timers.setup_percpu_clockev();
 
-       smp_get_logical_apicid();
-
        pr_info("CPU0: ");
        print_cpu_info(&cpu_data(0));
 
@@ -1455,6 +1340,25 @@ void arch_thaw_secondary_cpus_end(void)
        cache_aps_init();
 }
 
+bool smp_park_other_cpus_in_init(void)
+{
+       unsigned int cpu, this_cpu = smp_processor_id();
+       unsigned int apicid;
+
+       if (apic->wakeup_secondary_cpu_64 || apic->wakeup_secondary_cpu)
+               return false;
+
+       for_each_present_cpu(cpu) {
+               if (cpu == this_cpu)
+                       continue;
+               apicid = apic->cpu_present_to_apicid(cpu);
+               if (apicid == BAD_APICID)
+                       continue;
+               send_init_sequence(apicid);
+       }
+       return true;
+}
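
smp_park_other_cpus_in_init() is essentially a filter loop: every present CPU except the running one, skipping entries without a valid APIC id, is sent the INIT sequence. A stand-alone model of that selection; BAD_APICID and the array contents are demo values.

#include <stdio.h>

#define BAD_APICID	0xffffu		/* demo value only */

static void send_init_sequence(unsigned int apicid)
{
	printf("parking APIC %u in INIT\n", apicid);
}

int main(void)
{
	unsigned int apicid_of[] = { 0, 2, BAD_APICID, 6 };	/* "present" CPUs */
	unsigned int this_cpu = 0;

	for (unsigned int cpu = 0; cpu < 4; cpu++) {
		if (cpu == this_cpu)
			continue;			/* never park ourselves */
		if (apicid_of[cpu] == BAD_APICID)
			continue;			/* no usable APIC id */
		send_init_sequence(apicid_of[cpu]);
	}
	return 0;
}
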
+
 /*
  * Early setup to make printk work.
  */
@@ -1466,9 +1370,6 @@ void __init native_smp_prepare_boot_cpu(void)
        if (!IS_ENABLED(CONFIG_SMP))
                switch_gdt_and_percpu_base(me);
 
-       /* already set me in cpu_online_mask in boot_cpu_init() */
-       cpumask_set_cpu(me, cpu_callout_mask);
-       cpu_set_state_online(me);
        native_pv_lock_init();
 }
 
@@ -1490,13 +1391,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
        pr_debug("Boot done\n");
 
        calculate_max_logical_packages();
-
-       /* XXX for now assume numa-in-package and hybrid don't overlap */
-       if (x86_has_numa_in_package)
-               set_sched_topology(x86_numa_in_package_topology);
-       if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU))
-               set_sched_topology(x86_hybrid_topology);
-
+       build_sched_topology();
        nmi_selftest();
        impress_friends();
        cache_aps_init();
@@ -1592,6 +1487,12 @@ __init void prefill_possible_map(void)
                set_cpu_possible(i, true);
 }
 
+/* correctly size the local cpu masks */
+void __init setup_cpu_local_masks(void)
+{
+       alloc_bootmem_cpumask_var(&cpu_sibling_setup_mask);
+}
+
 #ifdef CONFIG_HOTPLUG_CPU
 
 /* Recompute SMT state for all CPUs on offline */
@@ -1650,10 +1551,6 @@ static void remove_siblinginfo(int cpu)
 static void remove_cpu_from_maps(int cpu)
 {
        set_cpu_online(cpu, false);
-       cpumask_clear_cpu(cpu, cpu_callout_mask);
-       cpumask_clear_cpu(cpu, cpu_callin_mask);
-       /* was set by cpu_init() */
-       cpumask_clear_cpu(cpu, cpu_initialized_mask);
        numa_remove_cpu(cpu);
 }
 
@@ -1704,64 +1601,27 @@ int native_cpu_disable(void)
        return 0;
 }
 
-int common_cpu_die(unsigned int cpu)
-{
-       int ret = 0;
-
-       /* We don't do anything here: idle task is faking death itself. */
-
-       /* They ack this in play_dead() by setting CPU_DEAD */
-       if (cpu_wait_death(cpu, 5)) {
-               if (system_state == SYSTEM_RUNNING)
-                       pr_info("CPU %u is now offline\n", cpu);
-       } else {
-               pr_err("CPU %u didn't die...\n", cpu);
-               ret = -1;
-       }
-
-       return ret;
-}
-
-void native_cpu_die(unsigned int cpu)
-{
-       common_cpu_die(cpu);
-}
-
 void play_dead_common(void)
 {
        idle_task_exit();
 
-       /* Ack it */
-       (void)cpu_report_death();
-
+       cpuhp_ap_report_dead();
        /*
         * With physical CPU hotplug, we should halt the cpu
         */
        local_irq_disable();
 }
 
-/**
- * cond_wakeup_cpu0 - Wake up CPU0 if needed.
- *
- * If NMI wants to wake up CPU0, start CPU0.
- */
-void cond_wakeup_cpu0(void)
-{
-       if (smp_processor_id() == 0 && enable_start_cpu0)
-               start_cpu0();
-}
-EXPORT_SYMBOL_GPL(cond_wakeup_cpu0);
-
 /*
  * We need to flush the caches before going to sleep, lest we have
  * dirty data in our caches when we come back up.
  */
 static inline void mwait_play_dead(void)
 {
+       struct mwait_cpu_dead *md = this_cpu_ptr(&mwait_cpu_dead);
        unsigned int eax, ebx, ecx, edx;
        unsigned int highest_cstate = 0;
        unsigned int highest_subcstate = 0;
-       void *mwait_ptr;
        int i;
 
        if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
@@ -1796,12 +1656,9 @@ static inline void mwait_play_dead(void)
                        (highest_subcstate - 1);
        }
 
-       /*
-        * This should be a memory location in a cache line which is
-        * unlikely to be touched by other processors.  The actual
-        * content is immaterial as it is not actually modified in any way.
-        */
-       mwait_ptr = &current_thread_info()->flags;
+       /* Set up state for the kexec() hack below */
+       md->status = CPUDEAD_MWAIT_WAIT;
+       md->control = CPUDEAD_MWAIT_WAIT;
 
        wbinvd();
 
@@ -1814,13 +1671,58 @@ static inline void mwait_play_dead(void)
                 * case where we return around the loop.
                 */
                mb();
-               clflush(mwait_ptr);
+               clflush(md);
                mb();
-               __monitor(mwait_ptr, 0, 0);
+               __monitor(md, 0, 0);
                mb();
                __mwait(eax, 0);
 
-               cond_wakeup_cpu0();
+               if (READ_ONCE(md->control) == CPUDEAD_MWAIT_KEXEC_HLT) {
+                       /*
+                        * Kexec is about to happen. Don't go back into mwait() as
+                        * the kexec kernel might overwrite text and data including
+                        * page tables and stack. So mwait() would resume when the
+                        * monitor cache line is written to and then the CPU goes
+                        * south due to overwritten text, page tables and stack.
+                        *
+                        * Note: This does _NOT_ protect against a stray MCE, NMI,
+                        * SMI. They will resume execution at the instruction
+                        * following the HLT instruction and run into the problem
+                        * which this is trying to prevent.
+                        */
+                       WRITE_ONCE(md->status, CPUDEAD_MWAIT_KEXEC_HLT);
+                       while(1)
+                               native_halt();
+               }
+       }
+}
+
+/*
+ * Kick all "offline" CPUs out of mwait on kexec(). See comment in
+ * mwait_play_dead().
+ */
+void smp_kick_mwait_play_dead(void)
+{
+       u32 newstate = CPUDEAD_MWAIT_KEXEC_HLT;
+       struct mwait_cpu_dead *md;
+       unsigned int cpu, i;
+
+       for_each_cpu_andnot(cpu, cpu_present_mask, cpu_online_mask) {
+               md = per_cpu_ptr(&mwait_cpu_dead, cpu);
+
+               /* Does it sit in mwait_play_dead() ? */
+               if (READ_ONCE(md->status) != CPUDEAD_MWAIT_WAIT)
+                       continue;
+
+               /* Wait up to 5ms */
+               for (i = 0; READ_ONCE(md->status) != newstate && i < 1000; i++) {
+                       /* Bring it out of mwait */
+                       WRITE_ONCE(md->control, newstate);
+                       udelay(5);
+               }
+
+               if (READ_ONCE(md->status) != newstate)
+                       pr_err_once("CPU%u is stuck in mwait_play_dead()\n", cpu);
        }
 }
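
A user-space model of the control/status handshake that smp_kick_mwait_play_dead() depends on: the parked side waits on the control word and acknowledges through the status word, while the kicking side writes control and polls status with a bounded wait. Threads, C11 atomics and usleep() stand in for MONITOR/MWAIT and real CPUs here; the names are not the kernel's.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

enum { MWAIT_WAIT, MWAIT_KEXEC_HLT };

struct mwait_cpu_dead {
	_Atomic unsigned int control;
	_Atomic unsigned int status;
};

static struct mwait_cpu_dead md = { MWAIT_WAIT, MWAIT_WAIT };

static void *parked_cpu(void *arg)
{
	(void)arg;
	while (atomic_load(&md.control) != MWAIT_KEXEC_HLT)
		usleep(1);				/* "mwait" on the control word */
	atomic_store(&md.status, MWAIT_KEXEC_HLT);	/* acknowledge, then "halt" */
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, parked_cpu, NULL);

	atomic_store(&md.control, MWAIT_KEXEC_HLT);	/* kick it out of "mwait" */
	for (int i = 0; atomic_load(&md.status) != MWAIT_KEXEC_HLT && i < 1000; i++)
		usleep(5);				/* bounded wait, like the kernel */

	if (atomic_load(&md.status) != MWAIT_KEXEC_HLT)
		fprintf(stderr, "parked CPU is stuck\n");
	pthread_join(t, NULL);
	return 0;
}
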
 
@@ -1829,11 +1731,8 @@ void __noreturn hlt_play_dead(void)
        if (__this_cpu_read(cpu_info.x86) >= 4)
                wbinvd();
 
-       while (1) {
+       while (1)
                native_halt();
-
-               cond_wakeup_cpu0();
-       }
 }
 
 void native_play_dead(void)
@@ -1852,12 +1751,6 @@ int native_cpu_disable(void)
        return -ENOSYS;
 }
 
-void native_cpu_die(unsigned int cpu)
-{
-       /* We said "no" in __cpu_disable */
-       BUG();
-}
-
 void native_play_dead(void)
 {
        BUG();
index 1b83377..ca004e2 100644 (file)
 static DEFINE_PER_CPU(struct x86_cpu, cpu_devices);
 
 #ifdef CONFIG_HOTPLUG_CPU
-
-#ifdef CONFIG_BOOTPARAM_HOTPLUG_CPU0
-static int cpu0_hotpluggable = 1;
-#else
-static int cpu0_hotpluggable;
-static int __init enable_cpu0_hotplug(char *str)
-{
-       cpu0_hotpluggable = 1;
-       return 1;
-}
-
-__setup("cpu0_hotplug", enable_cpu0_hotplug);
-#endif
-
-#ifdef CONFIG_DEBUG_HOTPLUG_CPU0
-/*
- * This function offlines a CPU as early as possible and allows userspace to
- * boot up without the CPU. The CPU can be onlined back by user after boot.
- *
- * This is only called for debugging CPU offline/online feature.
- */
-int _debug_hotplug_cpu(int cpu, int action)
+int arch_register_cpu(int cpu)
 {
-       int ret;
-
-       if (!cpu_is_hotpluggable(cpu))
-               return -EINVAL;
+       struct x86_cpu *xc = per_cpu_ptr(&cpu_devices, cpu);
 
-       switch (action) {
-       case 0:
-               ret = remove_cpu(cpu);
-               if (!ret)
-                       pr_info("DEBUG_HOTPLUG_CPU0: CPU %u is now offline\n", cpu);
-               else
-                       pr_debug("Can't offline CPU%d.\n", cpu);
-               break;
-       case 1:
-               ret = add_cpu(cpu);
-               if (ret)
-                       pr_debug("Can't online CPU%d.\n", cpu);
-
-               break;
-       default:
-               ret = -EINVAL;
-       }
-
-       return ret;
-}
-
-static int __init debug_hotplug_cpu(void)
-{
-       _debug_hotplug_cpu(0, 0);
-       return 0;
-}
-
-late_initcall_sync(debug_hotplug_cpu);
-#endif /* CONFIG_DEBUG_HOTPLUG_CPU0 */
-
-int arch_register_cpu(int num)
-{
-       struct cpuinfo_x86 *c = &cpu_data(num);
-
-       /*
-        * Currently CPU0 is only hotpluggable on Intel platforms. Other
-        * vendors can add hotplug support later.
-        * Xen PV guests don't support CPU0 hotplug at all.
-        */
-       if (c->x86_vendor != X86_VENDOR_INTEL ||
-           cpu_feature_enabled(X86_FEATURE_XENPV))
-               cpu0_hotpluggable = 0;
-
-       /*
-        * Two known BSP/CPU0 dependencies: Resume from suspend/hibernate
-        * depends on BSP. PIC interrupts depend on BSP.
-        *
-        * If the BSP dependencies are under control, one can tell kernel to
-        * enable BSP hotplug. This basically adds a control file and
-        * one can attempt to offline BSP.
-        */
-       if (num == 0 && cpu0_hotpluggable) {
-               unsigned int irq;
-               /*
-                * We won't take down the boot processor on i386 if some
-                * interrupts only are able to be serviced by the BSP in PIC.
-                */
-               for_each_active_irq(irq) {
-                       if (!IO_APIC_IRQ(irq) && irq_has_action(irq)) {
-                               cpu0_hotpluggable = 0;
-                               break;
-                       }
-               }
-       }
-       if (num || cpu0_hotpluggable)
-               per_cpu(cpu_devices, num).cpu.hotpluggable = 1;
-
-       return register_cpu(&per_cpu(cpu_devices, num).cpu, num);
+       xc->cpu.hotpluggable = cpu > 0;
+       return register_cpu(&xc->cpu, cpu);
 }
 EXPORT_SYMBOL(arch_register_cpu);
 
index 3446988..3425c6a 100644 (file)
@@ -69,12 +69,10 @@ static int __init tsc_early_khz_setup(char *buf)
 }
 early_param("tsc_early_khz", tsc_early_khz_setup);
 
-__always_inline void cyc2ns_read_begin(struct cyc2ns_data *data)
+__always_inline void __cyc2ns_read(struct cyc2ns_data *data)
 {
        int seq, idx;
 
-       preempt_disable_notrace();
-
        do {
                seq = this_cpu_read(cyc2ns.seq.seqcount.sequence);
                idx = seq & 1;
@@ -86,6 +84,12 @@ __always_inline void cyc2ns_read_begin(struct cyc2ns_data *data)
        } while (unlikely(seq != this_cpu_read(cyc2ns.seq.seqcount.sequence)));
 }
 
+__always_inline void cyc2ns_read_begin(struct cyc2ns_data *data)
+{
+       preempt_disable_notrace();
+       __cyc2ns_read(data);
+}
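
The retry loop in __cyc2ns_read() is the classic latched-seqcount reader: sample the sequence counter, pick the data copy selected by its low bit, and retry if the counter changed underneath. A stand-alone sketch with C11 atomics; the kernel uses its per-CPU seqcount and real conversion data, not this toy struct.

#include <stdatomic.h>
#include <stdio.h>

struct cyc2ns_snap {
	unsigned int mul, shift;
	unsigned long long offset;
};

static _Atomic unsigned int seq;
static struct cyc2ns_snap latch[2];	/* a writer updates the inactive copy, then bumps seq */

static struct cyc2ns_snap read_snap(void)
{
	struct cyc2ns_snap data;
	unsigned int s, idx;

	do {
		s = atomic_load(&seq);
		idx = s & 1;			/* pick the copy the writer is not touching */
		data = latch[idx];
	} while (s != atomic_load(&seq));	/* a writer raced us: retry */

	return data;
}

int main(void)
{
	latch[0] = (struct cyc2ns_snap){ .mul = 1, .shift = 0, .offset = 0 };

	struct cyc2ns_snap d = read_snap();
	printf("mul=%u shift=%u offset=%llu\n", d.mul, d.shift, d.offset);
	return 0;
}
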
+
 __always_inline void cyc2ns_read_end(void)
 {
        preempt_enable_notrace();
@@ -115,18 +119,25 @@ __always_inline void cyc2ns_read_end(void)
  *                      -johnstul@us.ibm.com "math is hard, lets go shopping!"
  */
 
-static __always_inline unsigned long long cycles_2_ns(unsigned long long cyc)
+static __always_inline unsigned long long __cycles_2_ns(unsigned long long cyc)
 {
        struct cyc2ns_data data;
        unsigned long long ns;
 
-       cyc2ns_read_begin(&data);
+       __cyc2ns_read(&data);
 
        ns = data.cyc2ns_offset;
        ns += mul_u64_u32_shr(cyc, data.cyc2ns_mul, data.cyc2ns_shift);
 
-       cyc2ns_read_end();
+       return ns;
+}
 
+static __always_inline unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+       unsigned long long ns;
+       preempt_disable_notrace();
+       ns = __cycles_2_ns(cyc);
+       preempt_enable_notrace();
        return ns;
 }
 
@@ -223,7 +234,7 @@ noinstr u64 native_sched_clock(void)
                u64 tsc_now = rdtsc();
 
                /* return the value in ns */
-               return cycles_2_ns(tsc_now);
+               return __cycles_2_ns(tsc_now);
        }
 
        /*
@@ -250,7 +261,7 @@ u64 native_sched_clock_from_tsc(u64 tsc)
 /* We need to define a real function for sched_clock, to override the
    weak default version */
 #ifdef CONFIG_PARAVIRT
-noinstr u64 sched_clock(void)
+noinstr u64 sched_clock_noinstr(void)
 {
        return paravirt_sched_clock();
 }
@@ -260,11 +271,20 @@ bool using_native_sched_clock(void)
        return static_call_query(pv_sched_clock) == native_sched_clock;
 }
 #else
-u64 sched_clock(void) __attribute__((alias("native_sched_clock")));
+u64 sched_clock_noinstr(void) __attribute__((alias("native_sched_clock")));
 
 bool using_native_sched_clock(void) { return true; }
 #endif
 
+notrace u64 sched_clock(void)
+{
+       u64 now;
+       preempt_disable_notrace();
+       now = sched_clock_noinstr();
+       preempt_enable_notrace();
+       return now;
+}
+
 int check_tsc_unstable(void)
 {
        return tsc_unstable;
@@ -1598,10 +1618,7 @@ void __init tsc_init(void)
 
 #ifdef CONFIG_SMP
 /*
- * If we have a constant TSC and are using the TSC for the delay loop,
- * we can skip clock calibration if another cpu in the same socket has already
- * been calibrated. This assumes that CONSTANT_TSC applies to all
- * cpus in the socket - this should be a safe assumption.
+ * Check whether existing calibration data can be reused.
  */
 unsigned long calibrate_delay_is_known(void)
 {
@@ -1609,6 +1626,21 @@ unsigned long calibrate_delay_is_known(void)
        int constant_tsc = cpu_has(&cpu_data(cpu), X86_FEATURE_CONSTANT_TSC);
        const struct cpumask *mask = topology_core_cpumask(cpu);
 
+       /*
+        * If TSC has constant frequency and TSC is synchronized across
+        * sockets then reuse CPU0 calibration.
+        */
+       if (constant_tsc && !tsc_unstable)
+               return cpu_data(0).loops_per_jiffy;
+
+       /*
+        * If TSC has constant frequency and TSC is not synchronized across
+        * sockets and this is not the first CPU in the socket, then reuse
+        * the calibration value of an already online CPU on that socket.
+        *
+        * This assumes that CONSTANT_TSC is consistent for all CPUs in a
+        * socket.
+        */
        if (!constant_tsc || !mask)
                return 0;
 
index 9452dc9..bbc440c 100644 (file)
@@ -245,7 +245,6 @@ bool tsc_store_and_check_tsc_adjust(bool bootcpu)
  */
 static atomic_t start_count;
 static atomic_t stop_count;
-static atomic_t skip_test;
 static atomic_t test_runs;
 
 /*
@@ -344,21 +343,14 @@ static inline unsigned int loop_timeout(int cpu)
 }
 
 /*
- * Source CPU calls into this - it waits for the freshly booted
- * target CPU to arrive and then starts the measurement:
+ * The freshly booted CPU initiates this via an async SMP function call.
  */
-void check_tsc_sync_source(int cpu)
+static void check_tsc_sync_source(void *__cpu)
 {
+       unsigned int cpu = (unsigned long)__cpu;
        int cpus = 2;
 
        /*
-        * No need to check if we already know that the TSC is not
-        * synchronized or if we have no TSC.
-        */
-       if (unsynchronized_tsc())
-               return;
-
-       /*
         * Set the maximum number of test runs to
         *  1 if the CPU does not provide the TSC_ADJUST MSR
         *  3 if the MSR is available, so the target can try to adjust
@@ -368,16 +360,9 @@ void check_tsc_sync_source(int cpu)
        else
                atomic_set(&test_runs, 3);
 retry:
-       /*
-        * Wait for the target to start or to skip the test:
-        */
-       while (atomic_read(&start_count) != cpus - 1) {
-               if (atomic_read(&skip_test) > 0) {
-                       atomic_set(&skip_test, 0);
-                       return;
-               }
+       /* Wait for the target to start. */
+       while (atomic_read(&start_count) != cpus - 1)
                cpu_relax();
-       }
 
        /*
         * Trigger the target to continue into the measurement too:
@@ -397,14 +382,14 @@ retry:
        if (!nr_warps) {
                atomic_set(&test_runs, 0);
 
-               pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
+               pr_debug("TSC synchronization [CPU#%d -> CPU#%u]: passed\n",
                        smp_processor_id(), cpu);
 
        } else if (atomic_dec_and_test(&test_runs) || random_warps) {
                /* Force it to 0 if random warps brought us here */
                atomic_set(&test_runs, 0);
 
-               pr_warn("TSC synchronization [CPU#%d -> CPU#%d]:\n",
+               pr_warn("TSC synchronization [CPU#%d -> CPU#%u]:\n",
                        smp_processor_id(), cpu);
                pr_warn("Measured %Ld cycles TSC warp between CPUs, "
                        "turning off TSC clock.\n", max_warp);
@@ -457,11 +442,12 @@ void check_tsc_sync_target(void)
         * SoCs the TSC is frequency synchronized, but still the TSC ADJUST
         * register might have been wreckaged by the BIOS..
         */
-       if (tsc_store_and_check_tsc_adjust(false) || tsc_clocksource_reliable) {
-               atomic_inc(&skip_test);
+       if (tsc_store_and_check_tsc_adjust(false) || tsc_clocksource_reliable)
                return;
-       }
 
+       /* Kick the control CPU into the TSC synchronization function */
+       smp_call_function_single(cpumask_first(cpu_online_mask), check_tsc_sync_source,
+                                (unsigned long *)(unsigned long)cpu, 0);
 retry:
        /*
         * Register this CPU's participation and wait for the
index 3ac50b7..7e574cf 100644 (file)
@@ -7,14 +7,23 @@
 #include <asm/unwind.h>
 #include <asm/orc_types.h>
 #include <asm/orc_lookup.h>
+#include <asm/orc_header.h>
+
+ORC_HEADER;
 
 #define orc_warn(fmt, ...) \
        printk_deferred_once(KERN_WARNING "WARNING: " fmt, ##__VA_ARGS__)
 
 #define orc_warn_current(args...)                                      \
 ({                                                                     \
-       if (state->task == current && !state->error)                    \
+       static bool dumped_before;                                      \
+       if (state->task == current && !state->error) {                  \
                orc_warn(args);                                         \
+               if (unwind_debug && !dumped_before) {                   \
+                       dumped_before = true;                           \
+                       unwind_dump(state);                             \
+               }                                                       \
+       }                                                               \
 })
 
 extern int __start_orc_unwind_ip[];
@@ -23,8 +32,49 @@ extern struct orc_entry __start_orc_unwind[];
 extern struct orc_entry __stop_orc_unwind[];
 
 static bool orc_init __ro_after_init;
+static bool unwind_debug __ro_after_init;
 static unsigned int lookup_num_blocks __ro_after_init;
 
+static int __init unwind_debug_cmdline(char *str)
+{
+       unwind_debug = true;
+
+       return 0;
+}
+early_param("unwind_debug", unwind_debug_cmdline);
+
+static void unwind_dump(struct unwind_state *state)
+{
+       static bool dumped_before;
+       unsigned long word, *sp;
+       struct stack_info stack_info = {0};
+       unsigned long visit_mask = 0;
+
+       if (dumped_before)
+               return;
+
+       dumped_before = true;
+
+       printk_deferred("unwind stack type:%d next_sp:%p mask:0x%lx graph_idx:%d\n",
+                       state->stack_info.type, state->stack_info.next_sp,
+                       state->stack_mask, state->graph_idx);
+
+       for (sp = __builtin_frame_address(0); sp;
+            sp = PTR_ALIGN(stack_info.next_sp, sizeof(long))) {
+               if (get_stack_info(sp, state->task, &stack_info, &visit_mask))
+                       break;
+
+               for (; sp < stack_info.end; sp++) {
+
+                       word = READ_ONCE_NOCHECK(*sp);
+
+                       printk_deferred("%0*lx: %0*lx (%pB)\n", BITS_PER_LONG/4,
+                                       (unsigned long)sp, BITS_PER_LONG/4,
+                                       word, (void *)word);
+               }
+       }
+}
+
 static inline unsigned long orc_ip(const int *ip)
 {
        return (unsigned long)ip + *ip;
@@ -136,21 +186,6 @@ static struct orc_entry null_orc_entry = {
        .type = ORC_TYPE_CALL
 };
 
-#ifdef CONFIG_CALL_THUNKS
-static struct orc_entry *orc_callthunk_find(unsigned long ip)
-{
-       if (!is_callthunk((void *)ip))
-               return NULL;
-
-       return &null_orc_entry;
-}
-#else
-static struct orc_entry *orc_callthunk_find(unsigned long ip)
-{
-       return NULL;
-}
-#endif
-
 /* Fake frame pointer entry -- used as a fallback for generated code */
 static struct orc_entry orc_fp_entry = {
        .type           = ORC_TYPE_CALL,
@@ -203,11 +238,7 @@ static struct orc_entry *orc_find(unsigned long ip)
        if (orc)
                return orc;
 
-       orc =  orc_ftrace_find(ip);
-       if (orc)
-               return orc;
-
-       return orc_callthunk_find(ip);
+       return orc_ftrace_find(ip);
 }
 
 #ifdef CONFIG_MODULES
@@ -219,7 +250,6 @@ static struct orc_entry *cur_orc_table = __start_orc_unwind;
 static void orc_sort_swap(void *_a, void *_b, int size)
 {
        struct orc_entry *orc_a, *orc_b;
-       struct orc_entry orc_tmp;
        int *a = _a, *b = _b, tmp;
        int delta = _b - _a;
 
@@ -231,9 +261,7 @@ static void orc_sort_swap(void *_a, void *_b, int size)
        /* Swap the corresponding .orc_unwind entries: */
        orc_a = cur_orc_table + (a - cur_orc_ip_table);
        orc_b = cur_orc_table + (b - cur_orc_ip_table);
-       orc_tmp = *orc_a;
-       *orc_a = *orc_b;
-       *orc_b = orc_tmp;
+       swap(*orc_a, *orc_b);
 }
 
 static int orc_sort_cmp(const void *_a, const void *_b)
index 25f1552..03c885d 100644 (file)
@@ -508,4 +508,8 @@ INIT_PER_CPU(irq_stack_backing_store);
            "fixed_percpu_data is not at start of per-cpu area");
 #endif
 
+#ifdef CONFIG_RETHUNK
+. = ASSERT((__x86_return_thunk & 0x3f) == 0, "__x86_return_thunk not cacheline-aligned");
+#endif
+
 #endif /* CONFIG_X86_64 */
index d82f4fa..a37ebd3 100644 (file)
@@ -126,12 +126,13 @@ struct x86_init_ops x86_init __initdata = {
 struct x86_cpuinit_ops x86_cpuinit = {
        .early_percpu_clock_init        = x86_init_noop,
        .setup_percpu_clockev           = setup_secondary_APIC_clock,
+       .parallel_bringup               = true,
 };
 
 static void default_nmi_init(void) { };
 
-static void enc_status_change_prepare_noop(unsigned long vaddr, int npages, bool enc) { }
-static bool enc_status_change_finish_noop(unsigned long vaddr, int npages, bool enc) { return false; }
+static bool enc_status_change_prepare_noop(unsigned long vaddr, int npages, bool enc) { return true; }
+static bool enc_status_change_finish_noop(unsigned long vaddr, int npages, bool enc) { return true; }
 static bool enc_tlb_flush_required_noop(bool enc) { return false; }
 static bool enc_cache_flush_required_noop(void) { return false; }
 static bool is_private_mmio_noop(u64 addr) {return false; }
index 123bf8b..0c9660a 100644 (file)
@@ -253,7 +253,6 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
                                       int nent)
 {
        struct kvm_cpuid_entry2 *best;
-       u64 guest_supported_xcr0 = cpuid_get_supported_xcr0(entries, nent);
 
        best = cpuid_entry2_find(entries, nent, 1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
        if (best) {
@@ -292,21 +291,6 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
                                           vcpu->arch.ia32_misc_enable_msr &
                                           MSR_IA32_MISC_ENABLE_MWAIT);
        }
-
-       /*
-        * Bits 127:0 of the allowed SECS.ATTRIBUTES (CPUID.0x12.0x1) enumerate
-        * the supported XSAVE Feature Request Mask (XFRM), i.e. the enclave's
-        * requested XCR0 value.  The enclave's XFRM must be a subset of XCRO
-        * at the time of EENTER, thus adjust the allowed XFRM by the guest's
-        * supported XCR0.  Similar to XCR0 handling, FP and SSE are forced to
-        * '1' even on CPUs that don't support XSAVE.
-        */
-       best = cpuid_entry2_find(entries, nent, 0x12, 0x1);
-       if (best) {
-               best->ecx &= guest_supported_xcr0 & 0xffffffff;
-               best->edx &= guest_supported_xcr0 >> 32;
-               best->ecx |= XFEATURE_MASK_FPSSE;
-       }
 }
 
 void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
index e542cf2..3c300a1 100644 (file)
@@ -229,6 +229,23 @@ static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
        u32 physical_id;
 
        /*
+        * For simplicity, KVM always allocates enough space for all possible
+        * xAPIC IDs.  Yell, but don't kill the VM, as KVM can continue on
+        * without the optimized map.
+        */
+       if (WARN_ON_ONCE(xapic_id > new->max_apic_id))
+               return -EINVAL;
+
+       /*
+        * Bail if a vCPU was added and/or enabled its APIC between allocating
+        * the map and doing the actual calculations for the map.  Note, KVM
+        * hardcodes the x2APIC ID to vcpu_id, i.e. there's no TOCTOU bug if
+        * the compiler decides to reload x2apic_id after this check.
+        */
+       if (x2apic_id > new->max_apic_id)
+               return -E2BIG;
+
+       /*
         * Deliberately truncate the vCPU ID when detecting a mismatched APIC
         * ID to avoid false positives if the vCPU ID, i.e. x2APIC ID, is a
         * 32-bit value.  Any unwanted aliasing due to truncation results will
@@ -253,8 +270,7 @@ static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
         */
        if (vcpu->kvm->arch.x2apic_format) {
                /* See also kvm_apic_match_physical_addr(). */
-               if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
-                       x2apic_id <= new->max_apic_id)
+               if (apic_x2apic_mode(apic) || x2apic_id > 0xff)
                        new->phys_map[x2apic_id] = apic;
 
                if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
index c8961f4..6eaa3d6 100644 (file)
@@ -7091,7 +7091,10 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
                 */
                slot = NULL;
                if (atomic_read(&kvm->nr_memslots_dirty_logging)) {
-                       slot = gfn_to_memslot(kvm, sp->gfn);
+                       struct kvm_memslots *slots;
+
+                       slots = kvm_memslots_for_spte_role(kvm, sp->role);
+                       slot = __gfn_to_memslot(slots, sp->gfn);
                        WARN_ON_ONCE(!slot);
                }
 
index ca32389..54089f9 100644 (file)
@@ -3510,7 +3510,7 @@ static bool svm_is_vnmi_pending(struct kvm_vcpu *vcpu)
        if (!is_vnmi_enabled(svm))
                return false;
 
-       return !!(svm->vmcb->control.int_ctl & V_NMI_BLOCKING_MASK);
+       return !!(svm->vmcb->control.int_ctl & V_NMI_PENDING_MASK);
 }
 
 static bool svm_set_vnmi_pending(struct kvm_vcpu *vcpu)
index 0574030..2261b68 100644 (file)
@@ -170,12 +170,19 @@ static int __handle_encls_ecreate(struct kvm_vcpu *vcpu,
                return 1;
        }
 
-       /* Enforce CPUID restrictions on MISCSELECT, ATTRIBUTES and XFRM. */
+       /*
+        * Enforce CPUID restrictions on MISCSELECT, ATTRIBUTES and XFRM.  Note
+        * that the allowed XFRM (XFeature Request Mask) isn't strictly bound
+        * by the supported XCR0.  FP+SSE *must* be set in XFRM, even if XSAVE
+        * is unsupported, i.e. even if XCR0 itself is completely unsupported.
+        */
        if ((u32)miscselect & ~sgx_12_0->ebx ||
            (u32)attributes & ~sgx_12_1->eax ||
            (u32)(attributes >> 32) & ~sgx_12_1->ebx ||
            (u32)xfrm & ~sgx_12_1->ecx ||
-           (u32)(xfrm >> 32) & ~sgx_12_1->edx) {
+           (u32)(xfrm >> 32) & ~sgx_12_1->edx ||
+           xfrm & ~(vcpu->arch.guest_supported_xcr0 | XFEATURE_MASK_FPSSE) ||
+           (xfrm & XFEATURE_MASK_FPSSE) != XFEATURE_MASK_FPSSE) {
                kvm_inject_gp(vcpu, 0);
                return 1;
        }
index ceb7c5e..7f70207 100644 (file)
@@ -1446,7 +1446,7 @@ static const u32 msrs_to_save_base[] = {
 #endif
        MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
        MSR_IA32_FEAT_CTL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
-       MSR_IA32_SPEC_CTRL,
+       MSR_IA32_SPEC_CTRL, MSR_IA32_TSX_CTRL,
        MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
        MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
        MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
@@ -2799,14 +2799,13 @@ static u64 read_tsc(void)
 static inline u64 vgettsc(struct pvclock_clock *clock, u64 *tsc_timestamp,
                          int *mode)
 {
-       long v;
        u64 tsc_pg_val;
+       long v;
 
        switch (clock->vclock_mode) {
        case VDSO_CLOCKMODE_HVCLOCK:
-               tsc_pg_val = hv_read_tsc_page_tsc(hv_get_tsc_page(),
-                                                 tsc_timestamp);
-               if (tsc_pg_val != U64_MAX) {
+               if (hv_read_tsc_page_tsc(hv_get_tsc_page(),
+                                        tsc_timestamp, &tsc_pg_val)) {
                        /* TSC page valid */
                        *mode = VDSO_CLOCKMODE_HVCLOCK;
                        v = (tsc_pg_val - clock->cycle_last) &
@@ -7155,6 +7154,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
                if (!kvm_cpu_cap_has(X86_FEATURE_XFD))
                        return;
                break;
+       case MSR_IA32_TSX_CTRL:
+               if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
+                       return;
+               break;
        default:
                break;
        }
@@ -10754,6 +10757,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
                        exit_fastpath = EXIT_FASTPATH_EXIT_HANDLED;
                        break;
                }
+
+               /* Note, VM-Exits that go down the "slow" path are accounted below. */
+               ++vcpu->stat.exits;
        }
 
        /*
@@ -13155,7 +13161,7 @@ EXPORT_SYMBOL_GPL(kvm_arch_end_assignment);
 
 bool noinstr kvm_arch_has_assigned_device(struct kvm *kvm)
 {
-       return arch_atomic_read(&kvm->arch.assigned_device_count);
+       return raw_atomic_read(&kvm->arch.assigned_device_count);
 }
 EXPORT_SYMBOL_GPL(kvm_arch_has_assigned_device);
 
index 01932af..ea3a28e 100644 (file)
@@ -61,8 +61,9 @@ ifeq ($(CONFIG_X86_32),y)
         lib-y += strstr_32.o
         lib-y += string_32.o
         lib-y += memmove_32.o
+        lib-y += cmpxchg8b_emu.o
 ifneq ($(CONFIG_X86_CMPXCHG64),y)
-        lib-y += cmpxchg8b_emu.o atomic64_386_32.o
+        lib-y += atomic64_386_32.o
 endif
 else
         obj-y += iomap_copy_64.o
index 33c70c0..6962df3 100644 (file)
@@ -1,47 +1,54 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 #include <linux/linkage.h>
 #include <asm/percpu.h>
+#include <asm/processor-flags.h>
 
 .text
 
 /*
+ * Emulate 'cmpxchg16b %gs:(%rsi)'
+ *
  * Inputs:
  * %rsi : memory location to compare
  * %rax : low 64 bits of old value
  * %rdx : high 64 bits of old value
  * %rbx : low 64 bits of new value
  * %rcx : high 64 bits of new value
- * %al  : Operation successful
+ *
+ * Notably this is not LOCK prefixed and is not safe against NMIs
  */
 SYM_FUNC_START(this_cpu_cmpxchg16b_emu)
 
-#
-# Emulate 'cmpxchg16b %gs:(%rsi)' except we return the result in %al not
-# via the ZF.  Caller will access %al to get result.
-#
-# Note that this is only useful for a cpuops operation.  Meaning that we
-# do *not* have a fully atomic operation but just an operation that is
-# *atomic* on a single cpu (as provided by the this_cpu_xx class of
-# macros).
-#
        pushfq
        cli
 
-       cmpq PER_CPU_VAR((%rsi)), %rax
-       jne .Lnot_same
-       cmpq PER_CPU_VAR(8(%rsi)), %rdx
-       jne .Lnot_same
+       /* if (*ptr == old) */
+       cmpq    PER_CPU_VAR(0(%rsi)), %rax
+       jne     .Lnot_same
+       cmpq    PER_CPU_VAR(8(%rsi)), %rdx
+       jne     .Lnot_same
 
-       movq %rbx, PER_CPU_VAR((%rsi))
-       movq %rcx, PER_CPU_VAR(8(%rsi))
+       /* *ptr = new */
+       movq    %rbx, PER_CPU_VAR(0(%rsi))
+       movq    %rcx, PER_CPU_VAR(8(%rsi))
+
+       /* set ZF in EFLAGS to indicate success */
+       orl     $X86_EFLAGS_ZF, (%rsp)
 
        popfq
-       mov $1, %al
        RET
 
 .Lnot_same:
+       /* *ptr != old */
+
+       /* old = *ptr */
+       movq    PER_CPU_VAR(0(%rsi)), %rax
+       movq    PER_CPU_VAR(8(%rsi)), %rdx
+
+       /* clear ZF in EFLAGS to indicate failure */
+       andl    $(~X86_EFLAGS_ZF), (%rsp)
+
        popfq
-       xor %al,%al
        RET
 
 SYM_FUNC_END(this_cpu_cmpxchg16b_emu)
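
In C terms, the rewritten emulation behaves like a compare-and-exchange that reports success through a flag (ZF in the assembly) and, on failure, loads the current memory value back into the "old" operands. A plain, non-atomic model of those semantics; the real routine gets its per-CPU-only atomicity from disabled interrupts.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct u128 { uint64_t lo, hi; };

static bool cmpxchg16b_emu(struct u128 *ptr, struct u128 *old, struct u128 new)
{
	if (ptr->lo == old->lo && ptr->hi == old->hi) {
		*ptr = new;		/* success: store new value, "ZF" set */
		return true;
	}
	*old = *ptr;			/* failure: report current value, "ZF" clear */
	return false;
}

int main(void)
{
	struct u128 v = { 1, 2 }, expect = { 1, 2 };

	if (cmpxchg16b_emu(&v, &expect, (struct u128){ 3, 4 }))
		printf("swapped: lo=%llu hi=%llu\n",
		       (unsigned long long)v.lo, (unsigned long long)v.hi);
	return 0;
}
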
index 6a912d5..4980525 100644 (file)
@@ -2,10 +2,16 @@
 
 #include <linux/linkage.h>
 #include <asm/export.h>
+#include <asm/percpu.h>
+#include <asm/processor-flags.h>
 
 .text
 
+#ifndef CONFIG_X86_CMPXCHG64
+
 /*
+ * Emulate 'cmpxchg8b (%esi)' on UP
+ *
  * Inputs:
  * %esi : memory location to compare
  * %eax : low 32 bits of old value
  */
 SYM_FUNC_START(cmpxchg8b_emu)
 
-#
-# Emulate 'cmpxchg8b (%esi)' on UP except we don't
-# set the whole ZF thing (caller will just compare
-# eax:edx with the expected value)
-#
        pushfl
        cli
 
-       cmpl  (%esi), %eax
-       jne .Lnot_same
-       cmpl 4(%esi), %edx
-       jne .Lhalf_same
+       cmpl    0(%esi), %eax
+       jne     .Lnot_same
+       cmpl    4(%esi), %edx
+       jne     .Lnot_same
+
+       movl    %ebx, 0(%esi)
+       movl    %ecx, 4(%esi)
 
-       movl %ebx,  (%esi)
-       movl %ecx, 4(%esi)
+       orl     $X86_EFLAGS_ZF, (%esp)
 
        popfl
        RET
 
 .Lnot_same:
-       movl  (%esi), %eax
-.Lhalf_same:
-       movl 4(%esi), %edx
+       movl    0(%esi), %eax
+       movl    4(%esi), %edx
+
+       andl    $(~X86_EFLAGS_ZF), (%esp)
 
        popfl
        RET
 
 SYM_FUNC_END(cmpxchg8b_emu)
 EXPORT_SYMBOL(cmpxchg8b_emu)
+
+#endif
+
+#ifndef CONFIG_UML
+
+SYM_FUNC_START(this_cpu_cmpxchg8b_emu)
+
+       pushfl
+       cli
+
+       cmpl    PER_CPU_VAR(0(%esi)), %eax
+       jne     .Lnot_same2
+       cmpl    PER_CPU_VAR(4(%esi)), %edx
+       jne     .Lnot_same2
+
+       movl    %ebx, PER_CPU_VAR(0(%esi))
+       movl    %ecx, PER_CPU_VAR(4(%esi))
+
+       orl     $X86_EFLAGS_ZF, (%esp)
+
+       popfl
+       RET
+
+.Lnot_same2:
+       movl    PER_CPU_VAR(0(%esi)), %eax
+       movl    PER_CPU_VAR(4(%esi)), %edx
+
+       andl    $(~X86_EFLAGS_ZF), (%esp)
+
+       popfl
+       RET
+
+SYM_FUNC_END(this_cpu_cmpxchg8b_emu)
+
+#endif
index 4fc5c2d..01c5de4 100644 (file)
@@ -7,6 +7,8 @@
  */
 
 #include <linux/linkage.h>
+#include <asm/cpufeatures.h>
+#include <asm/alternative.h>
 #include <asm/asm.h>
 #include <asm/export.h>
 
@@ -29,7 +31,7 @@
  */
 SYM_FUNC_START(rep_movs_alternative)
        cmpq $64,%rcx
-       jae .Lunrolled
+       jae .Llarge
 
        cmp $8,%ecx
        jae .Lword
@@ -65,6 +67,12 @@ SYM_FUNC_START(rep_movs_alternative)
        _ASM_EXTABLE_UA( 2b, .Lcopy_user_tail)
        _ASM_EXTABLE_UA( 3b, .Lcopy_user_tail)
 
+.Llarge:
+0:     ALTERNATIVE "jmp .Lunrolled", "rep movsb", X86_FEATURE_ERMS
+1:     RET
+
+        _ASM_EXTABLE_UA( 0b, 1b)
+
        .p2align 4
 .Lunrolled:
 10:    movq (%rsi),%r8
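
The new .Llarge path lets CPUs with ERMS copy anything of 64 bytes or more with a single rep movsb, with the extable entry covering user-access faults mid-copy, while everything else still falls through to the unrolled loop. A user-space sketch of just the ERMS leg, for illustration only; in the kernel the choice is patched in at boot by the ALTERNATIVE, not tested at run time:

#include <stddef.h>

/* One 'rep movsb' copies the whole buffer: this is roughly what the
 * ERMS alternative above boils down to for len >= 64. */
static void erms_copy(void *dst, const void *src, size_t len)
{
        asm volatile("rep movsb"
                     : "+D" (dst), "+S" (src), "+c" (len)
                     :
                     : "memory");
}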
index 50734a2..cea25ca 100644 (file)
@@ -5,22 +5,34 @@
  * This file contains network checksum routines that are better done
  * in an architecture-specific manner due to speed.
  */
+
 #include <linux/compiler.h>
 #include <linux/export.h>
 #include <asm/checksum.h>
 #include <asm/word-at-a-time.h>
 
-static inline unsigned short from32to16(unsigned a) 
+static inline unsigned short from32to16(unsigned a)
 {
-       unsigned short b = a >> 16; 
+       unsigned short b = a >> 16;
        asm("addw %w2,%w0\n\t"
-           "adcw $0,%w0\n" 
+           "adcw $0,%w0\n"
            : "=r" (b)
            : "0" (b), "r" (a));
        return b;
 }
 
+static inline __wsum csum_tail(u64 temp64, int odd)
+{
+       unsigned int result;
+
+       result = add32_with_carry(temp64 >> 32, temp64 & 0xffffffff);
+       if (unlikely(odd)) {
+               result = from32to16(result);
+               result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
+       }
+       return (__force __wsum)result;
+}
+
 /*
  * Do a checksum on an arbitrary memory area.
  * Returns a 32bit checksum.
@@ -35,7 +47,7 @@ static inline unsigned short from32to16(unsigned a)
 __wsum csum_partial(const void *buff, int len, __wsum sum)
 {
        u64 temp64 = (__force u64)sum;
-       unsigned odd, result;
+       unsigned odd;
 
        odd = 1 & (unsigned long) buff;
        if (unlikely(odd)) {
@@ -47,21 +59,52 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
                buff++;
        }
 
-       while (unlikely(len >= 64)) {
+       /*
+        * len == 40 is the hot case due to IPv6 headers, but annotating it likely()
+        * has a noticeable negative effect on codegen for all other cases, with
+        * minimal performance benefit here.
+        */
+       if (len == 40) {
                asm("addq 0*8(%[src]),%[res]\n\t"
                    "adcq 1*8(%[src]),%[res]\n\t"
                    "adcq 2*8(%[src]),%[res]\n\t"
                    "adcq 3*8(%[src]),%[res]\n\t"
                    "adcq 4*8(%[src]),%[res]\n\t"
-                   "adcq 5*8(%[src]),%[res]\n\t"
-                   "adcq 6*8(%[src]),%[res]\n\t"
-                   "adcq 7*8(%[src]),%[res]\n\t"
                    "adcq $0,%[res]"
-                   : [res] "+r" (temp64)
-                   : [src] "r" (buff)
-                   : "memory");
-               buff += 64;
-               len -= 64;
+                   : [res] "+r"(temp64)
+                   : [src] "r"(buff), "m"(*(const char(*)[40])buff));
+               return csum_tail(temp64, odd);
+       }
+       if (unlikely(len >= 64)) {
+               /*
+                * Extra accumulators for better ILP in the loop.
+                */
+               u64 tmp_accum, tmp_carries;
+
+               asm("xorl %k[tmp_accum],%k[tmp_accum]\n\t"
+                   "xorl %k[tmp_carries],%k[tmp_carries]\n\t"
+                   "subl $64, %[len]\n\t"
+                   "1:\n\t"
+                   "addq 0*8(%[src]),%[res]\n\t"
+                   "adcq 1*8(%[src]),%[res]\n\t"
+                   "adcq 2*8(%[src]),%[res]\n\t"
+                   "adcq 3*8(%[src]),%[res]\n\t"
+                   "adcl $0,%k[tmp_carries]\n\t"
+                   "addq 4*8(%[src]),%[tmp_accum]\n\t"
+                   "adcq 5*8(%[src]),%[tmp_accum]\n\t"
+                   "adcq 6*8(%[src]),%[tmp_accum]\n\t"
+                   "adcq 7*8(%[src]),%[tmp_accum]\n\t"
+                   "adcl $0,%k[tmp_carries]\n\t"
+                   "addq $64, %[src]\n\t"
+                   "subl $64, %[len]\n\t"
+                   "jge 1b\n\t"
+                   "addq %[tmp_accum],%[res]\n\t"
+                   "adcq %[tmp_carries],%[res]\n\t"
+                   "adcq $0,%[res]"
+                   : [tmp_accum] "=&r"(tmp_accum),
+                     [tmp_carries] "=&r"(tmp_carries), [res] "+r"(temp64),
+                     [len] "+r"(len), [src] "+r"(buff)
+                   : "m"(*(const char *)buff));
        }
 
        if (len & 32) {
@@ -70,45 +113,37 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
                    "adcq 2*8(%[src]),%[res]\n\t"
                    "adcq 3*8(%[src]),%[res]\n\t"
                    "adcq $0,%[res]"
-                       : [res] "+r" (temp64)
-                       : [src] "r" (buff)
-                       : "memory");
+                   : [res] "+r"(temp64)
+                   : [src] "r"(buff), "m"(*(const char(*)[32])buff));
                buff += 32;
        }
        if (len & 16) {
                asm("addq 0*8(%[src]),%[res]\n\t"
                    "adcq 1*8(%[src]),%[res]\n\t"
                    "adcq $0,%[res]"
-                       : [res] "+r" (temp64)
-                       : [src] "r" (buff)
-                       : "memory");
+                   : [res] "+r"(temp64)
+                   : [src] "r"(buff), "m"(*(const char(*)[16])buff));
                buff += 16;
        }
        if (len & 8) {
                asm("addq 0*8(%[src]),%[res]\n\t"
                    "adcq $0,%[res]"
-                       : [res] "+r" (temp64)
-                       : [src] "r" (buff)
-                       : "memory");
+                   : [res] "+r"(temp64)
+                   : [src] "r"(buff), "m"(*(const char(*)[8])buff));
                buff += 8;
        }
        if (len & 7) {
-               unsigned int shift = (8 - (len & 7)) * 8;
+               unsigned int shift = (-len << 3) & 63;
                unsigned long trail;
 
                trail = (load_unaligned_zeropad(buff) << shift) >> shift;
 
                asm("addq %[trail],%[res]\n\t"
                    "adcq $0,%[res]"
-                       : [res] "+r" (temp64)
-                       : [trail] "r" (trail));
+                   : [res] "+r"(temp64)
+                   : [trail] "r"(trail));
        }
-       result = add32_with_carry(temp64 >> 32, temp64 & 0xffffffff);
-       if (unlikely(odd)) {
-               result = from32to16(result);
-               result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
-       }
-       return (__force __wsum)result;
+       return csum_tail(temp64, odd);
 }
 EXPORT_SYMBOL(csum_partial);
 
@@ -118,6 +153,6 @@ EXPORT_SYMBOL(csum_partial);
  */
 __sum16 ip_compute_csum(const void *buff, int len)
 {
-       return csum_fold(csum_partial(buff,len,0));
+       return csum_fold(csum_partial(buff, len, 0));
 }
 EXPORT_SYMBOL(ip_compute_csum);
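
For reference, the assembly above computes a ones'-complement sum: accumulate the buffer as 64-bit words with end-around carry, then fold the accumulator to 32 bits (what csum_tail() does), swapping bytes when the buffer started at an odd address. A plain-C, unoptimized sketch of the core loop and the fold, for illustration only (no odd-start or trailing-byte handling):

#include <stdint.h>
#include <string.h>

static uint32_t csum_ref(const void *buff, size_t len, uint32_t sum)
{
        uint64_t acc = sum;
        const uint8_t *p = buff;

        while (len >= 8) {
                uint64_t w;

                memcpy(&w, p, 8);
                acc += w;
                if (acc < w)    /* carry out, folded back in like 'adcq $0' */
                        acc++;
                p += 8;
                len -= 8;
        }

        /* fold 64 -> 32 bits, as csum_tail() does via add32_with_carry() */
        acc = (acc & 0xffffffff) + (acc >> 32);
        acc = (acc & 0xffffffff) + (acc >> 32);
        return (uint32_t)acc;
}

The len == 40 special case and the dual-accumulator loop in the patch are purely about instruction-level parallelism; they produce the same value as this reference loop.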
index b64a2bd..9c63713 100644 (file)
@@ -143,43 +143,43 @@ SYM_FUNC_END(__get_user_nocheck_8)
 EXPORT_SYMBOL(__get_user_nocheck_8)
 
 
-SYM_CODE_START_LOCAL(.Lbad_get_user_clac)
+SYM_CODE_START_LOCAL(__get_user_handle_exception)
        ASM_CLAC
 .Lbad_get_user:
        xor %edx,%edx
        mov $(-EFAULT),%_ASM_AX
        RET
-SYM_CODE_END(.Lbad_get_user_clac)
+SYM_CODE_END(__get_user_handle_exception)
 
 #ifdef CONFIG_X86_32
-SYM_CODE_START_LOCAL(.Lbad_get_user_8_clac)
+SYM_CODE_START_LOCAL(__get_user_8_handle_exception)
        ASM_CLAC
 bad_get_user_8:
        xor %edx,%edx
        xor %ecx,%ecx
        mov $(-EFAULT),%_ASM_AX
        RET
-SYM_CODE_END(.Lbad_get_user_8_clac)
+SYM_CODE_END(__get_user_8_handle_exception)
 #endif
 
 /* get_user */
-       _ASM_EXTABLE(1b, .Lbad_get_user_clac)
-       _ASM_EXTABLE(2b, .Lbad_get_user_clac)
-       _ASM_EXTABLE(3b, .Lbad_get_user_clac)
+       _ASM_EXTABLE(1b, __get_user_handle_exception)
+       _ASM_EXTABLE(2b, __get_user_handle_exception)
+       _ASM_EXTABLE(3b, __get_user_handle_exception)
 #ifdef CONFIG_X86_64
-       _ASM_EXTABLE(4b, .Lbad_get_user_clac)
+       _ASM_EXTABLE(4b, __get_user_handle_exception)
 #else
-       _ASM_EXTABLE(4b, .Lbad_get_user_8_clac)
-       _ASM_EXTABLE(5b, .Lbad_get_user_8_clac)
+       _ASM_EXTABLE(4b, __get_user_8_handle_exception)
+       _ASM_EXTABLE(5b, __get_user_8_handle_exception)
 #endif
 
 /* __get_user */
-       _ASM_EXTABLE(6b, .Lbad_get_user_clac)
-       _ASM_EXTABLE(7b, .Lbad_get_user_clac)
-       _ASM_EXTABLE(8b, .Lbad_get_user_clac)
+       _ASM_EXTABLE(6b, __get_user_handle_exception)
+       _ASM_EXTABLE(7b, __get_user_handle_exception)
+       _ASM_EXTABLE(8b, __get_user_handle_exception)
 #ifdef CONFIG_X86_64
-       _ASM_EXTABLE(9b, .Lbad_get_user_clac)
+       _ASM_EXTABLE(9b, __get_user_handle_exception)
 #else
-       _ASM_EXTABLE(9b, .Lbad_get_user_8_clac)
-       _ASM_EXTABLE(10b, .Lbad_get_user_8_clac)
+       _ASM_EXTABLE(9b, __get_user_8_handle_exception)
+       _ASM_EXTABLE(10b, __get_user_8_handle_exception)
 #endif
index 0266186..0559b20 100644 (file)
@@ -38,10 +38,12 @@ SYM_FUNC_START(__memmove)
        cmp %rdi, %r8
        jg 2f
 
-       /* FSRM implies ERMS => no length checks, do the copy directly */
+#define CHECK_LEN      cmp $0x20, %rdx; jb 1f
+#define MEMMOVE_BYTES  movq %rdx, %rcx; rep movsb; RET
 .Lmemmove_begin_forward:
-       ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
-       ALTERNATIVE "", "jmp .Lmemmove_erms", X86_FEATURE_ERMS
+       ALTERNATIVE_2 __stringify(CHECK_LEN), \
+                     __stringify(CHECK_LEN; MEMMOVE_BYTES), X86_FEATURE_ERMS, \
+                     __stringify(MEMMOVE_BYTES), X86_FEATURE_FSRM
 
        /*
         * movsq instructions have a high startup latency
@@ -207,11 +209,6 @@ SYM_FUNC_START(__memmove)
        movb %r11b, (%rdi)
 13:
        RET
-
-.Lmemmove_erms:
-       movq %rdx, %rcx
-       rep movsb
-       RET
 SYM_FUNC_END(__memmove)
 EXPORT_SYMBOL(__memmove)
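
The ALTERNATIVE_2 collapses the old two-step patching into one site and picks between three forward-copy behaviours. A small sketch of the selection logic only; fsrm and erms stand in for X86_FEATURE_FSRM and X86_FEATURE_ERMS, and in the kernel this decision is patched into the code at boot rather than evaluated per call:

#include <stddef.h>

enum mm_path { MM_MOVSB, MM_MOVSQ, MM_SMALL };

static enum mm_path memmove_forward_path(size_t len, int fsrm, int erms)
{
        if (fsrm)
                return MM_MOVSB;        /* FSRM: rep movsb even for tiny copies */
        if (len < 0x20)
                return MM_SMALL;        /* both other variants branch to the short-copy tail (1f) */
        return erms ? MM_MOVSB : MM_MOVSQ;      /* ERMS: rep movsb; otherwise the movsq loop */
}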
 
index b09cd2a..47fd9bd 100644 (file)
@@ -27,14 +27,14 @@ void msrs_free(struct msr *msrs)
 EXPORT_SYMBOL(msrs_free);
 
 /**
- * Read an MSR with error handling
- *
+ * msr_read - Read an MSR with error handling
  * @msr: MSR to read
  * @m: value to read into
  *
  * It returns read data only on success, otherwise it doesn't change the output
  * argument @m.
  *
+ * Return: %0 for success, otherwise an error code
  */
 static int msr_read(u32 msr, struct msr *m)
 {
@@ -49,10 +49,12 @@ static int msr_read(u32 msr, struct msr *m)
 }
 
 /**
- * Write an MSR with error handling
+ * msr_write - Write an MSR with error handling
  *
  * @msr: MSR to write
  * @m: value to write
+ *
+ * Return: %0 for success, otherwise an error code
  */
 static int msr_write(u32 msr, struct msr *m)
 {
@@ -88,12 +90,14 @@ static inline int __flip_bit(u32 msr, u8 bit, bool set)
 }
 
 /**
- * Set @bit in a MSR @msr.
+ * msr_set_bit - Set @bit in a MSR @msr.
+ * @msr: MSR to write
+ * @bit: bit number to set
  *
- * Retval:
- * < 0: An error was encountered.
- * = 0: Bit was already set.
- * > 0: Hardware accepted the MSR write.
+ * Return:
+ * < 0: An error was encountered.
+ * = 0: Bit was already set.
+ * > 0: Hardware accepted the MSR write.
  */
 int msr_set_bit(u32 msr, u8 bit)
 {
@@ -101,12 +105,14 @@ int msr_set_bit(u32 msr, u8 bit)
 }
 
 /**
- * Clear @bit in a MSR @msr.
+ * msr_clear_bit - Clear @bit in a MSR @msr.
+ * @msr: MSR to write
+ * @bit: bit number to clear
  *
- * Retval:
- * < 0: An error was encountered.
- * = 0: Bit was already cleared.
- * > 0: Hardware accepted the MSR write.
+ * Return:
+ * < 0: An error was encountered.
+ * = 0: Bit was already cleared.
+ * > 0: Hardware accepted the MSR write.
  */
 int msr_clear_bit(u32 msr, u8 bit)
 {
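
Because the documented return value is tri-state, a bare if (ret) check would lump a successful write together with an error. A hedged usage sketch of the intended pattern; the MSR constant and bit number are placeholders, not taken from this patch:

static void enable_some_feature(void)
{
        int ret = msr_set_bit(MSR_IA32_MISC_ENABLE, 0);

        if (ret < 0)
                pr_err("MSR write failed: %d\n", ret);
        else if (ret > 0)
                pr_info("feature bit written\n");
        /* ret == 0: bit was already set, nothing to do */
}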
index 3062d09..1451e0c 100644 (file)
@@ -131,22 +131,22 @@ SYM_FUNC_START(__put_user_nocheck_8)
 SYM_FUNC_END(__put_user_nocheck_8)
 EXPORT_SYMBOL(__put_user_nocheck_8)
 
-SYM_CODE_START_LOCAL(.Lbad_put_user_clac)
+SYM_CODE_START_LOCAL(__put_user_handle_exception)
        ASM_CLAC
 .Lbad_put_user:
        movl $-EFAULT,%ecx
        RET
-SYM_CODE_END(.Lbad_put_user_clac)
+SYM_CODE_END(__put_user_handle_exception)
 
-       _ASM_EXTABLE(1b, .Lbad_put_user_clac)
-       _ASM_EXTABLE(2b, .Lbad_put_user_clac)
-       _ASM_EXTABLE(3b, .Lbad_put_user_clac)
-       _ASM_EXTABLE(4b, .Lbad_put_user_clac)
-       _ASM_EXTABLE(5b, .Lbad_put_user_clac)
-       _ASM_EXTABLE(6b, .Lbad_put_user_clac)
-       _ASM_EXTABLE(7b, .Lbad_put_user_clac)
-       _ASM_EXTABLE(9b, .Lbad_put_user_clac)
+       _ASM_EXTABLE(1b, __put_user_handle_exception)
+       _ASM_EXTABLE(2b, __put_user_handle_exception)
+       _ASM_EXTABLE(3b, __put_user_handle_exception)
+       _ASM_EXTABLE(4b, __put_user_handle_exception)
+       _ASM_EXTABLE(5b, __put_user_handle_exception)
+       _ASM_EXTABLE(6b, __put_user_handle_exception)
+       _ASM_EXTABLE(7b, __put_user_handle_exception)
+       _ASM_EXTABLE(9b, __put_user_handle_exception)
 #ifdef CONFIG_X86_32
-       _ASM_EXTABLE(8b, .Lbad_put_user_clac)
-       _ASM_EXTABLE(10b, .Lbad_put_user_clac)
+       _ASM_EXTABLE(8b, __put_user_handle_exception)
+       _ASM_EXTABLE(10b, __put_user_handle_exception)
 #endif
index 27ef53f..3fd066d 100644 (file)
@@ -143,9 +143,9 @@ SYM_CODE_END(__x86_indirect_jump_thunk_array)
  *    from re-poisioning the BTB prediction.
  *    from re-poisoning the BTB prediction.
        .align 64
-       .skip 63, 0xcc
-SYM_FUNC_START_NOALIGN(zen_untrain_ret);
-
+       .skip 64 - (__x86_return_thunk - zen_untrain_ret), 0xcc
+SYM_START(zen_untrain_ret, SYM_L_GLOBAL, SYM_A_NONE)
+       ANNOTATE_NOENDBR
        /*
         * As executed from zen_untrain_ret, this is:
         *
index 003d901..e9251b8 100644 (file)
@@ -9,6 +9,7 @@
 #include <linux/export.h>
 #include <linux/uaccess.h>
 #include <linux/highmem.h>
+#include <linux/libnvdimm.h>
 
 /*
  * Zero Userspace
index 7fe56c5..91c52ea 100644 (file)
@@ -32,6 +32,7 @@
 #include <asm/traps.h>
 #include <asm/user.h>
 #include <asm/fpu/api.h>
+#include <asm/fpu/regset.h>
 
 #include "fpu_system.h"
 #include "fpu_emu.h"
index 2c54b76..d9efa35 100644 (file)
@@ -3,6 +3,7 @@
 #include <linux/export.h>
 #include <linux/swap.h> /* for totalram_pages */
 #include <linux/memblock.h>
+#include <asm/numa.h>
 
 void __init set_highmem_pages_init(void)
 {
index 3cdac0f..8192452 100644 (file)
@@ -9,6 +9,7 @@
 #include <linux/sched/task.h>
 
 #include <asm/set_memory.h>
+#include <asm/cpu_device_id.h>
 #include <asm/e820/api.h>
 #include <asm/init.h>
 #include <asm/page.h>
@@ -261,6 +262,24 @@ static void __init probe_page_size_mask(void)
        }
 }
 
+#define INTEL_MATCH(_model) { .vendor  = X86_VENDOR_INTEL,     \
+                             .family  = 6,                     \
+                             .model = _model,                  \
+                           }
+/*
+ * INVLPG may not properly flush Global entries
+ * on these CPUs when PCIDs are enabled.
+ */
+static const struct x86_cpu_id invlpg_miss_ids[] = {
+       INTEL_MATCH(INTEL_FAM6_ALDERLAKE   ),
+       INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
+       INTEL_MATCH(INTEL_FAM6_ALDERLAKE_N ),
+       INTEL_MATCH(INTEL_FAM6_RAPTORLAKE  ),
+       INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
+       INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
+       {}
+};
+
 static void setup_pcid(void)
 {
        if (!IS_ENABLED(CONFIG_X86_64))
@@ -269,6 +288,12 @@ static void setup_pcid(void)
        if (!boot_cpu_has(X86_FEATURE_PCID))
                return;
 
+       if (x86_match_cpu(invlpg_miss_ids)) {
+               pr_info("Incomplete global flushes, disabling PCID");
+               setup_clear_cpu_cap(X86_FEATURE_PCID);
+               return;
+       }
+
        if (boot_cpu_has(X86_FEATURE_PGE)) {
                /*
                 * This can't be cr4_set_bits_and_update_boot() -- the
index d4e2648..b63403d 100644 (file)
@@ -45,7 +45,6 @@
 #include <asm/olpc_ofw.h>
 #include <asm/pgalloc.h>
 #include <asm/sections.h>
-#include <asm/paravirt.h>
 #include <asm/setup.h>
 #include <asm/set_memory.h>
 #include <asm/page_types.h>
@@ -74,7 +73,6 @@ static pmd_t * __init one_md_table_init(pgd_t *pgd)
 #ifdef CONFIG_X86_PAE
        if (!(pgd_val(*pgd) & _PAGE_PRESENT)) {
                pmd_table = (pmd_t *)alloc_low_page();
-               paravirt_alloc_pmd(&init_mm, __pa(pmd_table) >> PAGE_SHIFT);
                set_pgd(pgd, __pgd(__pa(pmd_table) | _PAGE_PRESENT));
                p4d = p4d_offset(pgd, 0);
                pud = pud_offset(p4d, 0);
@@ -99,7 +97,6 @@ static pte_t * __init one_page_table_init(pmd_t *pmd)
        if (!(pmd_val(*pmd) & _PAGE_PRESENT)) {
                pte_t *page_table = (pte_t *)alloc_low_page();
 
-               paravirt_alloc_pte(&init_mm, __pa(page_table) >> PAGE_SHIFT);
                set_pmd(pmd, __pmd(__pa(page_table) | _PAGE_TABLE));
                BUG_ON(page_table != pte_offset_kernel(pmd, 0));
        }
@@ -181,12 +178,10 @@ static pte_t *__init page_table_kmap_check(pte_t *pte, pmd_t *pmd,
                        set_pte(newpte + i, pte[i]);
                *adr = (void *)(((unsigned long)(*adr)) + PAGE_SIZE);
 
-               paravirt_alloc_pte(&init_mm, __pa(newpte) >> PAGE_SHIFT);
                set_pmd(pmd, __pmd(__pa(newpte)|_PAGE_TABLE));
                BUG_ON(newpte != pte_offset_kernel(pmd, 0));
                __flush_tlb_all();
 
-               paravirt_release_pte(__pa(pte) >> PAGE_SHIFT);
                pte = newpte;
        }
        BUG_ON(vaddr < fix_to_virt(FIX_KMAP_BEGIN - 1)
@@ -482,7 +477,6 @@ void __init native_pagetable_init(void)
                                pfn, pmd, __pa(pmd), pte, __pa(pte));
                pte_clear(NULL, va, pte);
        }
-       paravirt_alloc_pmd(&init_mm, __pa(base) >> PAGE_SHIFT);
        paging_init();
 }
 
@@ -491,15 +485,8 @@ void __init native_pagetable_init(void)
  * point, we've been running on some set of pagetables constructed by
  * the boot process.
  *
- * If we're booting on native hardware, this will be a pagetable
- * constructed in arch/x86/kernel/head_32.S.  The root of the
- * pagetable will be swapper_pg_dir.
- *
- * If we're booting paravirtualized under a hypervisor, then there are
- * more options: we may already be running PAE, and the pagetable may
- * or may not be based in swapper_pg_dir.  In any case,
- * paravirt_pagetable_init() will set up swapper_pg_dir
- * appropriately for the rest of the initialization to work.
+ * This will be a pagetable constructed in arch/x86/kernel/head_32.S.
+ * The root of the pagetable will be swapper_pg_dir.
  *
  * In general, pagetable_init() assumes that the pagetable may already
  * be partially populated, and so it avoids stomping on any existing
index 557f0fe..37db264 100644 (file)
@@ -172,10 +172,10 @@ void __meminit init_trampoline_kaslr(void)
                set_p4d(p4d_tramp,
                        __p4d(_KERNPG_TABLE | __pa(pud_page_tramp)));
 
-               set_pgd(&trampoline_pgd_entry,
-                       __pgd(_KERNPG_TABLE | __pa(p4d_page_tramp)));
+               trampoline_pgd_entry =
+                       __pgd(_KERNPG_TABLE | __pa(p4d_page_tramp));
        } else {
-               set_pgd(&trampoline_pgd_entry,
-                       __pgd(_KERNPG_TABLE | __pa(pud_page_tramp)));
+               trampoline_pgd_entry =
+                       __pgd(_KERNPG_TABLE | __pa(pud_page_tramp));
        }
 }
index e0b51c0..54bbd51 100644 (file)
@@ -319,7 +319,7 @@ static void enc_dec_hypercall(unsigned long vaddr, int npages, bool enc)
 #endif
 }
 
-static void amd_enc_status_change_prepare(unsigned long vaddr, int npages, bool enc)
+static bool amd_enc_status_change_prepare(unsigned long vaddr, int npages, bool enc)
 {
        /*
         * To maintain the security guarantees of SEV-SNP guests, make sure
@@ -327,6 +327,8 @@ static void amd_enc_status_change_prepare(unsigned long vaddr, int npages, bool
         */
        if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP) && !enc)
                snp_set_memory_shared(vaddr, npages);
+
+       return true;
 }
 
 /* Return true unconditionally: return value doesn't matter for the SEV side */
@@ -501,6 +503,21 @@ void __init sme_early_init(void)
        x86_platform.guest.enc_status_change_finish  = amd_enc_status_change_finish;
        x86_platform.guest.enc_tlb_flush_required    = amd_enc_tlb_flush_required;
        x86_platform.guest.enc_cache_flush_required  = amd_enc_cache_flush_required;
+
+       /*
+        * AMD-SEV-ES intercepts the RDMSR to read the X2APIC ID in the
+        * parallel bringup low level code. That raises #VC which cannot be
+        * handled there.
+        * It does not provide an RDMSR GHCB protocol, so the early startup
+        * code cannot directly communicate with the secure firmware. The
+        * alternative of retrieving the APIC ID via CPUID(0xb), which is
+        * covered by the GHCB protocol, is not viable either because there
+        * is no guarantee that the "initial" APIC ID provided by CPUID(0xb)
+        * is the same as the real APIC ID.
+        * Disable parallel bootup.
+        */
+       if (sev_status & MSR_AMD64_SEV_ES_ENABLED)
+               x86_cpuinit.parallel_bringup = false;
 }
 
 void __init mem_encrypt_free_decrypted_mem(void)
index c6efcf5..bfe22fd 100644 (file)
@@ -612,7 +612,7 @@ void __init sme_enable(struct boot_params *bp)
 out:
        if (sme_me_mask) {
                physical_mask &= ~sme_me_mask;
-               cc_set_vendor(CC_VENDOR_AMD);
+               cc_vendor = CC_VENDOR_AMD;
                cc_set_mask(sme_me_mask);
        }
 }
index 7159cf7..df4182b 100644 (file)
@@ -9,6 +9,7 @@
 #include <linux/mm.h>
 #include <linux/interrupt.h>
 #include <linux/seq_file.h>
+#include <linux/proc_fs.h>
 #include <linux/debugfs.h>
 #include <linux/pfn.h>
 #include <linux/percpu.h>
@@ -231,7 +232,7 @@ within_inclusive(unsigned long addr, unsigned long start, unsigned long end)
  * points to #2, but almost all physical-to-virtual translations point to #1.
  *
  * This is so that we can have both a directmap of all physical memory *and*
- * take full advantage of the the limited (s32) immediate addressing range (2G)
+ * take full advantage of the limited (s32) immediate addressing range (2G)
  * of x86_64.
  *
  * See Documentation/arch/x86/x86_64/mm.rst for more detail.
@@ -2151,7 +2152,8 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
                cpa_flush(&cpa, x86_platform.guest.enc_cache_flush_required());
 
        /* Notify hypervisor that we are about to set/clr encryption attribute. */
-       x86_platform.guest.enc_status_change_prepare(addr, numpages, enc);
+       if (!x86_platform.guest.enc_status_change_prepare(addr, numpages, enc))
+               return -EIO;
 
        ret = __change_page_attr_set_clr(&cpa, 1);
 
index e4f499e..15a8009 100644 (file)
@@ -702,14 +702,8 @@ void p4d_clear_huge(p4d_t *p4d)
  * pud_set_huge - setup kernel PUD mapping
  *
  * MTRRs can override PAT memory types with 4KiB granularity. Therefore, this
- * function sets up a huge page only if any of the following conditions are met:
- *
- * - MTRRs are disabled, or
- *
- * - MTRRs are enabled and the range is completely covered by a single MTRR, or
- *
- * - MTRRs are enabled and the corresponding MTRR memory type is WB, which
- *   has no effect on the requested PAT memory type.
+ * function sets up a huge page only if the complete range has the same MTRR
+ * caching mode.
  *
  * Callers should try to decrease page size (1GB -> 2MB -> 4K) if the bigger
  * page mapping attempt fails.
@@ -718,11 +712,10 @@ void p4d_clear_huge(p4d_t *p4d)
  */
 int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 {
-       u8 mtrr, uniform;
+       u8 uniform;
 
-       mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
-       if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
-           (mtrr != MTRR_TYPE_WRBACK))
+       mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
+       if (!uniform)
                return 0;
 
        /* Bail out if we are on a populated non-leaf entry: */
@@ -745,11 +738,10 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
  */
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 {
-       u8 mtrr, uniform;
+       u8 uniform;
 
-       mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
-       if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
-           (mtrr != MTRR_TYPE_WRBACK)) {
+       mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
+       if (!uniform) {
                pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
                             __func__, addr, addr + PMD_SIZE);
                return 0;
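
The kernel-doc above asks callers to retry with progressively smaller pages when a huge mapping is refused. A hedged sketch of that fallback order; map_4k_range() and the single-entry signature are illustrative, the real callers walk the page tables in the ioremap/vmalloc paths:

static int map_range(pud_t *pud, pmd_t *pmd, phys_addr_t phys, pgprot_t prot)
{
        if (pud_set_huge(pud, phys, prot))
                return 0;                       /* 1 GiB mapping accepted */
        if (pmd_set_huge(pmd, phys, prot))
                return 0;                       /* 2 MiB mapping accepted */
        return map_4k_range(phys, prot);        /* fall back to 4 KiB pages */
}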
index 1056bbf..438adb6 100644 (file)
@@ -2570,7 +2570,7 @@ out_image:
        }
 
        if (bpf_jit_enable > 1)
-               bpf_jit_dump(prog->len, proglen, pass + 1, image);
+               bpf_jit_dump(prog->len, proglen, pass + 1, rw_image);
 
        if (image) {
                if (!prog->is_func || extra_pass) {
index 584c25b..8731370 100644 (file)
@@ -83,7 +83,7 @@ static void ehci_reg_read(struct sim_dev_reg *reg, u32 *value)
                *value |= 0x100;
 }
 
-void sata_revid_init(struct sim_dev_reg *reg)
+static void sata_revid_init(struct sim_dev_reg *reg)
 {
        reg->sim_reg.value = 0x01060100;
        reg->sim_reg.mask = 0;
@@ -172,7 +172,7 @@ static inline void extract_bytes(u32 *value, int reg, int len)
        *value &= mask;
 }
 
-int bridge_read(unsigned int devfn, int reg, int len, u32 *value)
+static int bridge_read(unsigned int devfn, int reg, int len, u32 *value)
 {
        u32 av_bridge_base, av_bridge_limit;
        int retval = 0;
index 8babce7..014c508 100644 (file)
@@ -198,7 +198,7 @@ static int xen_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
                i++;
        }
        kfree(v);
-       return 0;
+       return msi_device_populate_sysfs(&dev->dev);
 
 error:
        if (ret == -ENOSYS)
@@ -254,7 +254,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
                dev_dbg(&dev->dev,
                        "xen: msi --> pirq=%d --> irq=%d\n", pirq, irq);
        }
-       return 0;
+       return msi_device_populate_sysfs(&dev->dev);
 
 error:
        dev_err(&dev->dev, "Failed to create MSI%s! ret=%d!\n",
@@ -346,7 +346,7 @@ static int xen_initdom_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
                if (ret < 0)
                        goto out;
        }
-       ret = 0;
+       ret = msi_device_populate_sysfs(&dev->dev);
 out:
        return ret;
 }
@@ -394,6 +394,8 @@ static void xen_teardown_msi_irqs(struct pci_dev *dev)
                        xen_destroy_irq(msidesc->irq + i);
                msidesc->irq = 0;
        }
+
+       msi_device_destroy_sysfs(&dev->dev);
 }
 
 static void xen_pv_teardown_msi_irqs(struct pci_dev *dev)
index f3f2d87..e9f99c5 100644 (file)
@@ -96,6 +96,9 @@ static const unsigned long * const efi_tables[] = {
 #ifdef CONFIG_EFI_COCO_SECRET
        &efi.coco_secret,
 #endif
+#ifdef CONFIG_UNACCEPTED_MEMORY
+       &efi.unaccepted,
+#endif
 };
 
 u64 efi_setup;         /* efi setup_data physical address */
index 75e3319..74ebd68 100644 (file)
@@ -234,7 +234,7 @@ static int __init olpc_dt_compatible_match(phandle node, const char *compat)
        return 0;
 }
 
-void __init olpc_dt_fixup(void)
+static void __init olpc_dt_fixup(void)
 {
        phandle node;
        u32 board_rev;
index 7a4d5e9..63230ff 100644 (file)
@@ -351,43 +351,6 @@ static int bsp_pm_callback(struct notifier_block *nb, unsigned long action,
        case PM_HIBERNATION_PREPARE:
                ret = bsp_check();
                break;
-#ifdef CONFIG_DEBUG_HOTPLUG_CPU0
-       case PM_RESTORE_PREPARE:
-               /*
-                * When system resumes from hibernation, online CPU0 because
-                * 1. it's required for resume and
-                * 2. the CPU was online before hibernation
-                */
-               if (!cpu_online(0))
-                       _debug_hotplug_cpu(0, 1);
-               break;
-       case PM_POST_RESTORE:
-               /*
-                * When a resume really happens, this code won't be called.
-                *
-                * This code is called only when user space hibernation software
-                * prepares for snapshot device during boot time. So we just
-                * call _debug_hotplug_cpu() to restore to CPU0's state prior to
-                * preparing the snapshot device.
-                *
-                * This works for normal boot case in our CPU0 hotplug debug
-                * mode, i.e. CPU0 is offline and user mode hibernation
-                * software initializes during boot time.
-                *
-                * If CPU0 is online and user application accesses snapshot
-                * device after boot time, this will offline CPU0 and user may
-                * see different CPU0 state before and after accessing
-                * the snapshot device. But hopefully this is not a case when
-                * user debugging CPU0 hotplug. Even if users hit this case,
-                * they can easily online CPU0 back.
-                *
-                * To simplify this debug code, we only consider normal boot
-                * case. Otherwise we need to remember CPU0's state and restore
-                * to that state and resolve racy conditions etc.
-                */
-               _debug_hotplug_cpu(0, 0);
-               break;
-#endif
        default:
                break;
        }
index 82fec66..42abd6a 100644 (file)
@@ -14,6 +14,11 @@ $(obj)/sha256.o: $(srctree)/lib/crypto/sha256.c FORCE
 
 CFLAGS_sha256.o := -D__DISABLE_EXPORTS
 
+# When profile-guided optimization is enabled, llvm emits two different
+# overlapping text sections, which is not supported by kexec. Remove profile
+# optimization flags.
+KBUILD_CFLAGS := $(filter-out -fprofile-sample-use=% -fprofile-use=%,$(KBUILD_CFLAGS))
+
 # When linking purgatory.ro with -r unresolved symbols are not checked,
 # also link a purgatory.chk binary without -r to check for unresolved symbols.
 PURGATORY_LDFLAGS := -e purgatory_start -z nodefaultlib
index af56581..788e555 100644 (file)
@@ -154,6 +154,9 @@ static void __init setup_real_mode(void)
 
        trampoline_header->flags = 0;
 
+       trampoline_lock = &trampoline_header->lock;
+       *trampoline_lock = 0;
+
        trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
 
        /* Map the real mode stub as virtual == physical */
index e38d61d..c9f76fa 100644 (file)
        .text
        .code16
 
+.macro LOCK_AND_LOAD_REALMODE_ESP lock_pa=0
+       /*
+        * Make sure only one CPU fiddles with the realmode stack
+        */
+.Llock_rm\@:
+       .if \lock_pa
+        lock btsl       $0, pa_tr_lock
+       .else
+        lock btsl       $0, tr_lock
+       .endif
+        jnc             2f
+        pause
+        jmp             .Llock_rm\@
+2:
+       # Setup stack
+       movl    $rm_stack_end, %esp
+.endm
+
        .balign PAGE_SIZE
 SYM_CODE_START(trampoline_start)
        cli                     # We should be safe anyway
@@ -49,8 +67,7 @@ SYM_CODE_START(trampoline_start)
        mov     %ax, %es
        mov     %ax, %ss
 
-       # Setup stack
-       movl    $rm_stack_end, %esp
+       LOCK_AND_LOAD_REALMODE_ESP
 
        call    verify_cpu              # Verify the cpu supports long mode
        testl   %eax, %eax              # Check for return code
@@ -93,8 +110,7 @@ SYM_CODE_START(sev_es_trampoline_start)
        mov     %ax, %es
        mov     %ax, %ss
 
-       # Setup stack
-       movl    $rm_stack_end, %esp
+       LOCK_AND_LOAD_REALMODE_ESP
 
        jmp     .Lswitch_to_protected
 SYM_CODE_END(sev_es_trampoline_start)
@@ -177,7 +193,7 @@ SYM_CODE_START(pa_trampoline_compat)
         * In compatibility mode.  Prep ESP and DX for startup_32, then disable
         * paging and complete the switch to legacy 32-bit mode.
         */
-       movl    $rm_stack_end, %esp
+       LOCK_AND_LOAD_REALMODE_ESP lock_pa=1
        movw    $__KERNEL_DS, %dx
 
        movl    $(CR0_STATE & ~X86_CR0_PG), %eax
@@ -241,6 +257,7 @@ SYM_DATA_START(trampoline_header)
        SYM_DATA(tr_efer,               .space 8)
        SYM_DATA(tr_cr4,                .space 4)
        SYM_DATA(tr_flags,              .space 4)
+       SYM_DATA(tr_lock,               .space 4)
 SYM_DATA_END(trampoline_header)
 
 #include "trampoline_common.S"
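
LOCK_AND_LOAD_REALMODE_ESP is a test-and-set spinlock on bit 0 of tr_lock, so only one AP at a time loads and uses the shared real-mode stack during bringup. A user-space sketch of the acquire side, mirroring the 'lock btsl; jnc; pause; retry' pattern (the release happens later in the bringup path and is not shown here):

#include <stdatomic.h>

static atomic_uint tr_lock;     /* bit 0: real-mode stack is in use */

static void lock_realmode_stack(void)
{
        /* atomic_fetch_or returns the previous value; if bit 0 was
         * already set, another CPU owns the stack, so keep spinning. */
        while (atomic_fetch_or_explicit(&tr_lock, 1u, memory_order_acquire) & 1u)
                __builtin_ia32_pause();         /* like the 'pause' in the macro */
}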
index 9fd2484..9e91430 100644 (file)
@@ -10,6 +10,7 @@
 #include <linux/pci.h>
 #include <linux/module.h>
 #include <linux/vgaarb.h>
+#include <asm/fb.h>
 
 int fb_is_primary_device(struct fb_info *info)
 {
index 7d7ffb9..863d0d6 100644 (file)
@@ -16,6 +16,8 @@
 #include <asm/setup.h>
 #include <asm/xen/hypercall.h>
 
+#include "xen-ops.h"
+
 static efi_char16_t vendor[100] __initdata;
 
 static efi_system_table_t efi_systab_xen __initdata = {
index c1cd28e..a6820ca 100644 (file)
@@ -161,13 +161,12 @@ static int xen_cpu_up_prepare_hvm(unsigned int cpu)
        int rc = 0;
 
        /*
-        * This can happen if CPU was offlined earlier and
-        * offlining timed out in common_cpu_die().
+        * If a CPU was offlined earlier and offlining timed out then the
+        * lock mechanism is still initialized. Uninit it unconditionally
+        * as it's safe to call even if already uninited. Interrupts and
+        * timer have already been handled in xen_cpu_dead_hvm().
         */
-       if (cpu_report_state(cpu) == CPU_DEAD_FROZEN) {
-               xen_smp_intr_free(cpu);
-               xen_uninit_lock_cpu(cpu);
-       }
+       xen_uninit_lock_cpu(cpu);
 
        if (cpu_acpi_id(cpu) != U32_MAX)
                per_cpu(xen_vcpu_id, cpu) = cpu_acpi_id(cpu);
index 093b78c..93b6582 100644 (file)
@@ -68,6 +68,7 @@
 #include <asm/reboot.h>
 #include <asm/hypervisor.h>
 #include <asm/mach_traps.h>
+#include <asm/mtrr.h>
 #include <asm/mwait.h>
 #include <asm/pci_x86.h>
 #include <asm/cpu.h>
@@ -119,6 +120,54 @@ static int __init parse_xen_msr_safe(char *str)
 }
 early_param("xen_msr_safe", parse_xen_msr_safe);
 
+/* Get MTRR settings from Xen and put them into mtrr_state. */
+static void __init xen_set_mtrr_data(void)
+{
+#ifdef CONFIG_MTRR
+       struct xen_platform_op op = {
+               .cmd = XENPF_read_memtype,
+               .interface_version = XENPF_INTERFACE_VERSION,
+       };
+       unsigned int reg;
+       unsigned long mask;
+       uint32_t eax, width;
+       static struct mtrr_var_range var[MTRR_MAX_VAR_RANGES] __initdata;
+
+       /* Get physical address width (only 64-bit cpus supported). */
+       width = 36;
+       eax = cpuid_eax(0x80000000);
+       if ((eax >> 16) == 0x8000 && eax >= 0x80000008) {
+               eax = cpuid_eax(0x80000008);
+               width = eax & 0xff;
+       }
+
+       for (reg = 0; reg < MTRR_MAX_VAR_RANGES; reg++) {
+               op.u.read_memtype.reg = reg;
+               if (HYPERVISOR_platform_op(&op))
+                       break;
+
+               /*
+                * Only called in dom0, which has all RAM PFNs mapped at
+                * RAM MFNs, and all PCI space etc. is identity mapped.
+                * This means we can treat MFN == PFN regarding MTRR settings.
+                */
+               var[reg].base_lo = op.u.read_memtype.type;
+               var[reg].base_lo |= op.u.read_memtype.mfn << PAGE_SHIFT;
+               var[reg].base_hi = op.u.read_memtype.mfn >> (32 - PAGE_SHIFT);
+               mask = ~((op.u.read_memtype.nr_mfns << PAGE_SHIFT) - 1);
+               mask &= (1UL << width) - 1;
+               if (mask)
+                       mask |= MTRR_PHYSMASK_V;
+               var[reg].mask_lo = mask;
+               var[reg].mask_hi = mask >> 32;
+       }
+
+       /* Only overwrite MTRR state if any MTRR could be obtained from Xen. */
+       if (reg)
+               mtrr_overwrite_state(var, reg, MTRR_TYPE_UNCACHABLE);
+#endif
+}
+
 static void __init xen_pv_init_platform(void)
 {
        /* PV guests can't operate virtio devices without grants. */
@@ -135,6 +184,11 @@ static void __init xen_pv_init_platform(void)
 
        /* pvclock is in shared info area */
        xen_init_time_ops();
+
+       if (xen_initial_domain())
+               xen_set_mtrr_data();
+       else
+               mtrr_overwrite_state(NULL, 0, MTRR_TYPE_WRBACK);
 }
 
 static void __init xen_pv_guest_late_init(void)
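
The mask computation in xen_set_mtrr_data() turns a range length in MFNs into an MTRR PHYSMASK clipped to the reported physical address width. A small standalone example of the arithmetic, using an assumed 36-bit width and a 4096-page (16 MiB) range; the kernel additionally ORs in MTRR_PHYSMASK_V when the resulting mask is non-zero:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        unsigned int width = 36;                /* physical address bits (example) */
        uint64_t nr_mfns = 4096;                /* 16 MiB with 4 KiB pages */
        uint64_t mask;

        mask = ~((nr_mfns << 12) - 1);          /* PAGE_SHIFT == 12 */
        mask &= ((uint64_t)1 << width) - 1;     /* clip to supported width */

        printf("mask = %#llx\n", (unsigned long long)mask);    /* 0xfff000000 */
        return 0;
}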
index b3b8d28..e0a9751 100644 (file)
 #include "mmu.h"
 #include "debugfs.h"
 
+/*
+ * Prototypes for functions called via PV_CALLEE_SAVE_REGS_THUNK() in order
+ * to avoid warnings with "-Wmissing-prototypes".
+ */
+pteval_t xen_pte_val(pte_t pte);
+pgdval_t xen_pgd_val(pgd_t pgd);
+pmdval_t xen_pmd_val(pmd_t pmd);
+pudval_t xen_pud_val(pud_t pud);
+p4dval_t xen_p4d_val(p4d_t p4d);
+pte_t xen_make_pte(pteval_t pte);
+pgd_t xen_make_pgd(pgdval_t pgd);
+pmd_t xen_make_pmd(pmdval_t pmd);
+pud_t xen_make_pud(pudval_t pud);
+p4d_t xen_make_p4d(p4dval_t p4d);
+pte_t xen_make_pte_init(pteval_t pte);
+
 #ifdef CONFIG_X86_VSYSCALL_EMULATION
 /* l3 pud for userspace vsyscall mapping */
 static pud_t level3_user_vsyscall[PTRS_PER_PUD] __page_aligned_bss;
index c2be3ef..8b5cf7b 100644 (file)
@@ -6,6 +6,7 @@
  */
 
 #include <linux/init.h>
+#include <linux/iscsi_ibft.h>
 #include <linux/sched.h>
 #include <linux/kstrtox.h>
 #include <linux/mm.h>
@@ -764,17 +765,26 @@ char * __init xen_memory_setup(void)
        BUG_ON(memmap.nr_entries == 0);
        xen_e820_table.nr_entries = memmap.nr_entries;
 
-       /*
-        * Xen won't allow a 1:1 mapping to be created to UNUSABLE
-        * regions, so if we're using the machine memory map leave the
-        * region as RAM as it is in the pseudo-physical map.
-        *
-        * UNUSABLE regions in domUs are not handled and will need
-        * a patch in the future.
-        */
-       if (xen_initial_domain())
+       if (xen_initial_domain()) {
+               /*
+                * Xen won't allow a 1:1 mapping to be created to UNUSABLE
+                * regions, so if we're using the machine memory map leave the
+                * region as RAM as it is in the pseudo-physical map.
+                *
+                * UNUSABLE regions in domUs are not handled and will need
+                * a patch in the future.
+                */
                xen_ignore_unusable();
 
+#ifdef CONFIG_ISCSI_IBFT_FIND
+               /* Reserve 0.5 MiB to 1 MiB region so iBFT can be found */
+               xen_e820_table.entries[xen_e820_table.nr_entries].addr = IBFT_START;
+               xen_e820_table.entries[xen_e820_table.nr_entries].size = IBFT_END - IBFT_START;
+               xen_e820_table.entries[xen_e820_table.nr_entries].type = E820_TYPE_RESERVED;
+               xen_e820_table.nr_entries++;
+#endif
+       }
+
        /* Make sure the Xen-supplied memory map is well-ordered. */
        e820__update_table(&xen_e820_table);
 
index 22fb982..c20cbb1 100644 (file)
@@ -2,6 +2,10 @@
 #ifndef _XEN_SMP_H
 
 #ifdef CONFIG_SMP
+
+void asm_cpu_bringup_and_idle(void);
+asmlinkage void cpu_bringup_and_idle(void);
+
 extern void xen_send_IPI_mask(const struct cpumask *mask,
                              int vector);
 extern void xen_send_IPI_mask_allbutself(const struct cpumask *mask,
index b70afdf..ac95d19 100644 (file)
@@ -55,18 +55,16 @@ static void __init xen_hvm_smp_prepare_cpus(unsigned int max_cpus)
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-static void xen_hvm_cpu_die(unsigned int cpu)
+static void xen_hvm_cleanup_dead_cpu(unsigned int cpu)
 {
-       if (common_cpu_die(cpu) == 0) {
-               if (xen_have_vector_callback) {
-                       xen_smp_intr_free(cpu);
-                       xen_uninit_lock_cpu(cpu);
-                       xen_teardown_timer(cpu);
-               }
+       if (xen_have_vector_callback) {
+               xen_smp_intr_free(cpu);
+               xen_uninit_lock_cpu(cpu);
+               xen_teardown_timer(cpu);
        }
 }
 #else
-static void xen_hvm_cpu_die(unsigned int cpu)
+static void xen_hvm_cleanup_dead_cpu(unsigned int cpu)
 {
        BUG();
 }
@@ -77,7 +75,7 @@ void __init xen_hvm_smp_init(void)
        smp_ops.smp_prepare_boot_cpu = xen_hvm_smp_prepare_boot_cpu;
        smp_ops.smp_prepare_cpus = xen_hvm_smp_prepare_cpus;
        smp_ops.smp_cpus_done = xen_smp_cpus_done;
-       smp_ops.cpu_die = xen_hvm_cpu_die;
+       smp_ops.cleanup_dead_cpu = xen_hvm_cleanup_dead_cpu;
 
        if (!xen_have_vector_callback) {
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
index a9cf8c8..d5ae5de 100644 (file)
@@ -55,13 +55,13 @@ static DEFINE_PER_CPU(struct xen_common_irq, xen_irq_work) = { .irq = -1 };
 static DEFINE_PER_CPU(struct xen_common_irq, xen_pmu_irq) = { .irq = -1 };
 
 static irqreturn_t xen_irq_work_interrupt(int irq, void *dev_id);
-void asm_cpu_bringup_and_idle(void);
 
 static void cpu_bringup(void)
 {
        int cpu;
 
        cr4_init();
+       cpuhp_ap_sync_alive();
        cpu_init();
        touch_softlockup_watchdog();
 
@@ -83,7 +83,7 @@ static void cpu_bringup(void)
 
        set_cpu_online(cpu, true);
 
-       cpu_set_state_online(cpu);  /* Implies full memory barrier. */
+       smp_mb();
 
        /* We can take interrupts now: we're officially "up". */
        local_irq_enable();
@@ -254,15 +254,12 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
        struct desc_struct *gdt;
        unsigned long gdt_mfn;
 
-       /* used to tell cpu_init() that it can proceed with initialization */
-       cpumask_set_cpu(cpu, cpu_callout_mask);
        if (cpumask_test_and_set_cpu(cpu, xen_cpu_initialized_map))
                return 0;
 
        ctxt = kzalloc(sizeof(*ctxt), GFP_KERNEL);
        if (ctxt == NULL) {
                cpumask_clear_cpu(cpu, xen_cpu_initialized_map);
-               cpumask_clear_cpu(cpu, cpu_callout_mask);
                return -ENOMEM;
        }
 
@@ -316,7 +313,7 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
        return 0;
 }
 
-static int xen_pv_cpu_up(unsigned int cpu, struct task_struct *idle)
+static int xen_pv_kick_ap(unsigned int cpu, struct task_struct *idle)
 {
        int rc;
 
@@ -326,14 +323,6 @@ static int xen_pv_cpu_up(unsigned int cpu, struct task_struct *idle)
 
        xen_setup_runstate_info(cpu);
 
-       /*
-        * PV VCPUs are always successfully taken down (see 'while' loop
-        * in xen_cpu_die()), so -EBUSY is an error.
-        */
-       rc = cpu_check_up_prepare(cpu);
-       if (rc)
-               return rc;
-
        /* make sure interrupts start blocked */
        per_cpu(xen_vcpu, cpu)->evtchn_upcall_mask = 1;
 
@@ -343,15 +332,20 @@ static int xen_pv_cpu_up(unsigned int cpu, struct task_struct *idle)
 
        xen_pmu_init(cpu);
 
-       rc = HYPERVISOR_vcpu_op(VCPUOP_up, xen_vcpu_nr(cpu), NULL);
-       BUG_ON(rc);
-
-       while (cpu_report_state(cpu) != CPU_ONLINE)
-               HYPERVISOR_sched_op(SCHEDOP_yield, NULL);
+       /*
+        * Why is this a BUG? If the hypercall fails then everything can be
+        * rolled back, no?
+        */
+       BUG_ON(HYPERVISOR_vcpu_op(VCPUOP_up, xen_vcpu_nr(cpu), NULL));
 
        return 0;
 }
 
+static void xen_pv_poll_sync_state(void)
+{
+       HYPERVISOR_sched_op(SCHEDOP_yield, NULL);
+}
+
 #ifdef CONFIG_HOTPLUG_CPU
 static int xen_pv_cpu_disable(void)
 {
@@ -367,18 +361,18 @@ static int xen_pv_cpu_disable(void)
 
 static void xen_pv_cpu_die(unsigned int cpu)
 {
-       while (HYPERVISOR_vcpu_op(VCPUOP_is_up,
-                                 xen_vcpu_nr(cpu), NULL)) {
+       while (HYPERVISOR_vcpu_op(VCPUOP_is_up, xen_vcpu_nr(cpu), NULL)) {
                __set_current_state(TASK_UNINTERRUPTIBLE);
                schedule_timeout(HZ/10);
        }
+}
 
-       if (common_cpu_die(cpu) == 0) {
-               xen_smp_intr_free(cpu);
-               xen_uninit_lock_cpu(cpu);
-               xen_teardown_timer(cpu);
-               xen_pmu_finish(cpu);
-       }
+static void xen_pv_cleanup_dead_cpu(unsigned int cpu)
+{
+       xen_smp_intr_free(cpu);
+       xen_uninit_lock_cpu(cpu);
+       xen_teardown_timer(cpu);
+       xen_pmu_finish(cpu);
 }
 
 static void __noreturn xen_pv_play_dead(void) /* used only with HOTPLUG_CPU */
@@ -400,6 +394,11 @@ static void xen_pv_cpu_die(unsigned int cpu)
        BUG();
 }
 
+static void xen_pv_cleanup_dead_cpu(unsigned int cpu)
+{
+       BUG();
+}
+
 static void __noreturn xen_pv_play_dead(void)
 {
        BUG();
@@ -438,8 +437,10 @@ static const struct smp_ops xen_smp_ops __initconst = {
        .smp_prepare_cpus = xen_pv_smp_prepare_cpus,
        .smp_cpus_done = xen_smp_cpus_done,
 
-       .cpu_up = xen_pv_cpu_up,
+       .kick_ap_alive = xen_pv_kick_ap,
        .cpu_die = xen_pv_cpu_die,
+       .cleanup_dead_cpu = xen_pv_cleanup_dead_cpu,
+       .poll_sync_state = xen_pv_poll_sync_state,
        .cpu_disable = xen_pv_cpu_disable,
        .play_dead = xen_pv_play_dead,
 
index b74ac25..52fa560 100644 (file)
@@ -66,11 +66,10 @@ static noinstr u64 xen_sched_clock(void)
         struct pvclock_vcpu_time_info *src;
        u64 ret;
 
-       preempt_disable_notrace();
        src = &__this_cpu_read(xen_vcpu)->time;
        ret = pvclock_clocksource_read_nowd(src);
        ret -= xen_sched_clock_offset;
-       preempt_enable_notrace();
+
        return ret;
 }
 
index a109037..408a2aa 100644 (file)
@@ -72,8 +72,6 @@ void xen_restore_time_memory_area(void);
 void xen_init_time_ops(void);
 void xen_hvm_init_time_ops(void);
 
-irqreturn_t xen_debug_interrupt(int irq, void *dev_id);
-
 bool xen_vcpu_stolen(int vcpu);
 
 void xen_vcpu_setup(int cpu);
@@ -148,9 +146,12 @@ int xen_cpuhp_setup(int (*cpu_up_prepare_cb)(unsigned int),
 void xen_pin_vcpu(int cpu);
 
 void xen_emergency_restart(void);
+void xen_force_evtchn_callback(void);
+
 #ifdef CONFIG_XEN_PV
 void xen_pv_pre_suspend(void);
 void xen_pv_post_suspend(int suspend_cancelled);
+void xen_start_kernel(struct start_info *si);
 #else
 static inline void xen_pv_pre_suspend(void) {}
 static inline void xen_pv_post_suspend(int suspend_cancelled) {}
index 3c6e547..c1bcfc2 100644 (file)
@@ -16,7 +16,6 @@ config XTENSA
        select ARCH_USE_MEMTEST
        select ARCH_USE_QUEUED_RWLOCKS
        select ARCH_USE_QUEUED_SPINLOCKS
-       select ARCH_WANT_FRAME_POINTERS
        select ARCH_WANT_IPC_PARSE_VERSION
        select BUILDTIME_TABLE_SORT
        select CLONE_BACKWARDS
@@ -35,6 +34,7 @@ config XTENSA
        select HAVE_ARCH_KCSAN
        select HAVE_ARCH_SECCOMP_FILTER
        select HAVE_ARCH_TRACEHOOK
+       select HAVE_ASM_MODVERSIONS
        select HAVE_CONTEXT_TRACKING_USER
        select HAVE_DEBUG_KMEMLEAK
        select HAVE_DMA_CONTIGUOUS
@@ -203,6 +203,18 @@ config XTENSA_UNALIGNED_USER
 
          Say Y here to enable unaligned memory access in user space.
 
+config XTENSA_LOAD_STORE
+       bool "Load/store exception handler for memory only readable with l32"
+       help
+         The Xtensa architecture only allows reading memory attached to its
+         instruction bus with l32r and l32i instructions; all other
+         instructions raise an exception with the LoadStoreErrorCause code.
+         This makes some configurations hard to use, e.g. storing string
+         literals in FLASH memory attached to the instruction bus.
+
+         Say Y here to enable an exception handler that allows transparent
+         byte and 2-byte access to memory attached to the instruction bus.
+
 config HAVE_SMP
        bool "System Supports SMP (MX)"
        depends on XTENSA_VARIANT_CUSTOM
index 83cc8d1..e84172a 100644 (file)
@@ -38,3 +38,11 @@ config PRINT_STACK_DEPTH
        help
          This option allows you to set the stack depth that the kernel
          prints in stack traces.
+
+config PRINT_USER_CODE_ON_UNHANDLED_EXCEPTION
+       bool "Dump user code around unhandled exception address"
+       help
+         Enable this option to display user code around the PC of the unhandled
+         exception (starting at an address aligned on a 16-byte boundary).
+         This may simplify finding the faulting code in the absence of other
+         debug facilities.
index 1d1d462..c0eef3f 100644 (file)
@@ -6,16 +6,12 @@
 
 OBJCOPY_ARGS := -O $(if $(CONFIG_CPU_BIG_ENDIAN),elf32-xtensa-be,elf32-xtensa-le)
 
-LD_ARGS        = -T $(srctree)/$(obj)/boot.ld
-
 boot-y := bootstrap.o
 targets        += $(boot-y)
 
 OBJS   := $(addprefix $(obj)/,$(boot-y))
 LIBS   := arch/xtensa/boot/lib/lib.a arch/xtensa/lib/lib.a
 
-LIBGCC := $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name)
-
 $(obj)/zImage.o: $(obj)/../vmlinux.bin.gz $(OBJS)
        $(Q)$(OBJCOPY) $(OBJCOPY_ARGS) -R .comment \
                --add-section image=$< \
@@ -23,7 +19,10 @@ $(obj)/zImage.o: $(obj)/../vmlinux.bin.gz $(OBJS)
                $(OBJS) $@
 
 $(obj)/zImage.elf: $(obj)/zImage.o $(LIBS)
-       $(Q)$(LD) $(LD_ARGS) -o $@ $^ -L/xtensa-elf/lib $(LIBGCC)
+       $(Q)$(LD) $(KBUILD_LDFLAGS) \
+               -T $(srctree)/$(obj)/boot.ld \
+               --build-id=none \
+               -o $@ $^
 
 $(obj)/../zImage.redboot: $(obj)/zImage.elf
        $(Q)$(OBJCOPY) -S -O binary $< $@
diff --git a/arch/xtensa/include/asm/asm-prototypes.h b/arch/xtensa/include/asm/asm-prototypes.h
new file mode 100644 (file)
index 0000000..b0da618
--- /dev/null
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_PROTOTYPES_H
+#define __ASM_PROTOTYPES_H
+
+#include <asm/cacheflush.h>
+#include <asm/checksum.h>
+#include <asm/ftrace.h>
+#include <asm/page.h>
+#include <asm/string.h>
+#include <asm/uaccess.h>
+
+#include <asm-generic/asm-prototypes.h>
+
+/*
+ * gcc internal math functions
+ */
+long long __ashrdi3(long long, int);
+long long __ashldi3(long long, int);
+long long __bswapdi2(long long);
+int __bswapsi2(int);
+long long __lshrdi3(long long, int);
+int __divsi3(int, int);
+int __modsi3(int, int);
+int __mulsi3(int, int);
+unsigned int __udivsi3(unsigned int, unsigned int);
+unsigned int __umodsi3(unsigned int, unsigned int);
+unsigned long long __umulsidi3(unsigned int, unsigned int);
+
+#endif /* __ASM_PROTOTYPES_H */
index e3474ca..01bf7d9 100644 (file)
@@ -11,6 +11,7 @@
 #ifndef _XTENSA_ASMMACRO_H
 #define _XTENSA_ASMMACRO_H
 
+#include <asm-generic/export.h>
 #include <asm/core.h>
 
 /*
index 52da614..7308b7f 100644 (file)
@@ -245,6 +245,11 @@ static inline int arch_atomic_fetch_##op(int i, atomic_t * v)              \
 ATOMIC_OPS(add)
 ATOMIC_OPS(sub)
 
+#define arch_atomic_add_return                 arch_atomic_add_return
+#define arch_atomic_sub_return                 arch_atomic_sub_return
+#define arch_atomic_fetch_add                  arch_atomic_fetch_add
+#define arch_atomic_fetch_sub                  arch_atomic_fetch_sub
+
 #undef ATOMIC_OPS
 #define ATOMIC_OPS(op) ATOMIC_OP(op) ATOMIC_FETCH_OP(op)
 
@@ -252,12 +257,13 @@ ATOMIC_OPS(and)
 ATOMIC_OPS(or)
 ATOMIC_OPS(xor)
 
+#define arch_atomic_fetch_and                  arch_atomic_fetch_and
+#define arch_atomic_fetch_or                   arch_atomic_fetch_or
+#define arch_atomic_fetch_xor                  arch_atomic_fetch_xor
+
 #undef ATOMIC_OPS
 #undef ATOMIC_FETCH_OP
 #undef ATOMIC_OP_RETURN
 #undef ATOMIC_OP
 
-#define arch_atomic_cmpxchg(v, o, n) ((int)arch_cmpxchg(&((v)->counter), (o), (n)))
-#define arch_atomic_xchg(v, new) (arch_xchg(&((v)->counter), new))
-
 #endif /* _XTENSA_ATOMIC_H */
diff --git a/arch/xtensa/include/asm/bugs.h b/arch/xtensa/include/asm/bugs.h
deleted file mode 100644 (file)
index 69b29d1..0000000
+++ /dev/null
@@ -1,18 +0,0 @@
-/*
- * include/asm-xtensa/bugs.h
- *
- * This is included by init/main.c to check for architecture-dependent bugs.
- *
- * Xtensa processors don't have any bugs.  :)
- *
- * This file is subject to the terms and conditions of the GNU General
- * Public License.  See the file "COPYING" in the main directory of
- * this archive for more details.
- */
-
-#ifndef _XTENSA_BUGS_H
-#define _XTENSA_BUGS_H
-
-static void check_bugs(void) { }
-
-#endif /* _XTENSA_BUGS_H */
index f856d2b..0e1bb6f 100644 (file)
 #define XCHAL_SPANNING_WAY 0
 #endif
 
+#ifndef XCHAL_HAVE_TRAX
+#define XCHAL_HAVE_TRAX 0
+#endif
+
+#ifndef XCHAL_NUM_PERF_COUNTERS
+#define XCHAL_NUM_PERF_COUNTERS 0
+#endif
+
 #if XCHAL_HAVE_WINDOWED
 #if defined(CONFIG_USER_ABI_DEFAULT) || defined(CONFIG_USER_ABI_CALL0_PROBE)
 /* Whether windowed ABI is supported in userspace. */
index 6c6d9a9..0ea4f84 100644 (file)
 #include <asm/processor.h>
 
 #ifndef __ASSEMBLY__
-#define ftrace_return_address0 ({ unsigned long a0, a1; \
-               __asm__ __volatile__ ( \
-                       "mov %0, a0\n" \
-                       "mov %1, a1\n" \
-                       : "=r"(a0), "=r"(a1)); \
-               MAKE_PC_FROM_RA(a0, a1); })
-
-#ifdef CONFIG_FRAME_POINTER
 extern unsigned long return_address(unsigned level);
 #define ftrace_return_address(n) return_address(n)
-#endif
 #endif /* __ASSEMBLY__ */
 
 #ifdef CONFIG_FUNCTION_TRACER
index 354ca94..94f13fa 100644 (file)
@@ -28,31 +28,11 @@ extern void platform_init(bp_tag_t*);
 extern void platform_setup (char **);
 
 /*
- * platform_restart is called to restart the system.
- */
-extern void platform_restart (void);
-
-/*
- * platform_halt is called to stop the system and halt.
- */
-extern void platform_halt (void);
-
-/*
- * platform_power_off is called to stop the system and power it off.
- */
-extern void platform_power_off (void);
-
-/*
  * platform_idle is called from the idle function.
  */
 extern void platform_idle (void);
 
 /*
- * platform_heartbeat is called every HZ
- */
-extern void platform_heartbeat (void);
-
-/*
  * platform_calibrate_ccount calibrates cpu clock freq (CONFIG_XTENSA_CALIBRATE)
  */
 extern void platform_calibrate_ccount (void);
index 89b51a0..ffce435 100644 (file)
@@ -118,9 +118,6 @@ extern void *__memcpy(void *__to, __const__ void *__from, size_t __n);
 extern void *memmove(void *__dest, __const__ void *__src, size_t __n);
 extern void *__memmove(void *__dest, __const__ void *__src, size_t __n);
 
-/* Don't build bcopy at all ...  */
-#define __HAVE_ARCH_BCOPY
-
 #if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
 
 /*
index 6f74ccc..212c3b9 100644 (file)
@@ -47,6 +47,7 @@ __init trap_set_handler(int cause, xtensa_exception_handler *handler);
 asmlinkage void fast_illegal_instruction_user(void);
 asmlinkage void fast_syscall_user(void);
 asmlinkage void fast_alloca(void);
+asmlinkage void fast_load_store(void);
 asmlinkage void fast_unaligned(void);
 asmlinkage void fast_second_level_miss(void);
 asmlinkage void fast_store_prohibited(void);
@@ -64,8 +65,14 @@ void do_unhandled(struct pt_regs *regs);
 static inline void __init early_trap_init(void)
 {
        static struct exc_table init_exc_table __initdata = {
+#ifdef CONFIG_XTENSA_LOAD_STORE
+               .fast_kernel_handler[EXCCAUSE_LOAD_STORE_ERROR] =
+                       fast_load_store,
+#endif
+#ifdef CONFIG_MMU
                .fast_kernel_handler[EXCCAUSE_DTLB_MISS] =
                        fast_second_level_miss,
+#endif
        };
        xtensa_set_sr(&init_exc_table, excsave1);
 }
index d062c73..20d6b49 100644 (file)
 #include <asm/asmmacro.h>
 #include <asm/processor.h>
 
-#if XCHAL_UNALIGNED_LOAD_EXCEPTION || XCHAL_UNALIGNED_STORE_EXCEPTION
+#if XCHAL_UNALIGNED_LOAD_EXCEPTION || defined CONFIG_XTENSA_LOAD_STORE
+#define LOAD_EXCEPTION_HANDLER
+#endif
+
+#if XCHAL_UNALIGNED_STORE_EXCEPTION || defined LOAD_EXCEPTION_HANDLER
+#define ANY_EXCEPTION_HANDLER
+#endif
+
+#if XCHAL_HAVE_WINDOWED
+#define UNALIGNED_USER_EXCEPTION
+#endif
 
 /*  First-level exception handler for unaligned exceptions.
  *
  *  BE  shift left / mask 0 0 X X
  */
 
-#if XCHAL_HAVE_WINDOWED
-#define UNALIGNED_USER_EXCEPTION
-#endif
-
 #if XCHAL_HAVE_BE
 
 #define HWORD_START    16
  *
  *            23                           0
  *             -----------------------------
- *     res               0000           0010
+ *     L8UI    xxxx xxxx 0000 ssss tttt 0010
  *     L16UI   xxxx xxxx 0001 ssss tttt 0010
  *     L32I    xxxx xxxx 0010 ssss tttt 0010
  *     XXX               0011 ssss tttt 0010
 
 #define OP0_L32I_N     0x8             /* load immediate narrow */
 #define OP0_S32I_N     0x9             /* store immediate narrow */
+#define OP0_LSAI       0x2             /* load/store */
 #define OP1_SI_MASK    0x4             /* OP1 bit set for stores */
 #define OP1_SI_BIT     2               /* OP1 bit number for stores */
 
+#define OP1_L8UI       0x0
 #define OP1_L32I       0x2
 #define OP1_L16UI      0x1
 #define OP1_L16SI      0x9
  */
 
        .literal_position
-ENTRY(fast_unaligned)
+#ifdef CONFIG_XTENSA_LOAD_STORE
+ENTRY(fast_load_store)
 
-       /* Note: We don't expect the address to be aligned on a word
-        *       boundary. After all, the processor generated that exception
-        *       and it would be a hardware fault.
-        */
+       call0   .Lsave_and_load_instruction
 
-       /* Save some working register */
+       /* Analyze the instruction (load or store?). */
 
-       s32i    a4, a2, PT_AREG4
-       s32i    a5, a2, PT_AREG5
-       s32i    a6, a2, PT_AREG6
-       s32i    a7, a2, PT_AREG7
-       s32i    a8, a2, PT_AREG8
+       extui   a0, a4, INSN_OP0, 4     # get insn.op0 nibble
 
-       rsr     a0, depc
-       s32i    a0, a2, PT_AREG2
-       s32i    a3, a2, PT_AREG3
+#if XCHAL_HAVE_DENSITY
+       _beqi   a0, OP0_L32I_N, 1f      # L32I.N, jump
+#endif
+       bnei    a0, OP0_LSAI, .Linvalid_instruction
+       /* 'store indicator bit' set, jump */
+       bbsi.l  a4, OP1_SI_BIT + INSN_OP1, .Linvalid_instruction
 
-       rsr     a3, excsave1
-       movi    a4, fast_unaligned_fixup
-       s32i    a4, a3, EXC_TABLE_FIXUP
+1:
+       movi    a3, ~3
+       and     a3, a3, a8              # align memory address
 
-       /* Keep value of SAR in a0 */
+       __ssa8  a8
 
-       rsr     a0, sar
-       rsr     a8, excvaddr            # load unaligned memory address
+#ifdef CONFIG_MMU
+       /* l32e can't be used here even when it's available. */
+       /* TODO access_ok(a3) could be used here */
+       j       .Linvalid_instruction
+#endif
+       l32i    a5, a3, 0
+       l32i    a6, a3, 4
+       __src_b a3, a5, a6              # a3 has the data word
 
-       /* Now, identify one of the following load/store instructions.
-        *
-        * The only possible danger of a double exception on the
-        * following l32i instructions is kernel code in vmalloc
-        * memory. The processor was just executing at the EPC_1
-        * address, and indeed, already fetched the instruction.  That
-        * guarantees a TLB mapping, which hasn't been replaced by
-        * this unaligned exception handler that uses only static TLB
-        * mappings. However, high-level interrupt handlers might
-        * modify TLB entries, so for the generic case, we register a
-        * TABLE_FIXUP handler here, too.
-        */
+#if XCHAL_HAVE_DENSITY
+       addi    a7, a7, 2               # increment PC (assume 16-bit insn)
+       _beqi   a0, OP0_L32I_N, .Lload_w# l32i.n: jump
+       addi    a7, a7, 1
+#else
+       addi    a7, a7, 3
+#endif
 
-       /* a3...a6 saved on stack, a2 = SP */
+       extui   a5, a4, INSN_OP1, 4
+       _beqi   a5, OP1_L32I, .Lload_w
+       bnei    a5, OP1_L8UI, .Lload16
+       extui   a3, a3, 0, 8
+       j       .Lload_w
 
-       /* Extract the instruction that caused the unaligned access. */
+ENDPROC(fast_load_store)
+#endif
 
-       rsr     a7, epc1        # load exception address
-       movi    a3, ~3
-       and     a3, a3, a7      # mask lower bits
+/*
+ * Entry condition:
+ *
+ *   a0:       trashed, original value saved on stack (PT_AREG0)
+ *   a1:       a1
+ *   a2:       new stack pointer, original in DEPC
+ *   a3:       a3
+ *   depc:     a2, original value saved on stack (PT_DEPC)
+ *   excsave_1:        dispatch table
+ *
+ *   PT_DEPC >= VALID_DOUBLE_EXCEPTION_ADDRESS: double exception, DEPC
+ *          <  VALID_DOUBLE_EXCEPTION_ADDRESS: regular exception
+ */
 
-       l32i    a4, a3, 0       # load 2 words
-       l32i    a5, a3, 4
+#ifdef ANY_EXCEPTION_HANDLER
+ENTRY(fast_unaligned)
 
-       __ssa8  a7
-       __src_b a4, a4, a5      # a4 has the instruction
+#if XCHAL_UNALIGNED_LOAD_EXCEPTION || XCHAL_UNALIGNED_STORE_EXCEPTION
+
+       call0   .Lsave_and_load_instruction
 
        /* Analyze the instruction (load or store?). */
 
@@ -222,12 +244,17 @@ ENTRY(fast_unaligned)
        /* 'store indicator bit' not set, jump */
        _bbci.l a4, OP1_SI_BIT + INSN_OP1, .Lload
 
+#endif
+#if XCHAL_UNALIGNED_STORE_EXCEPTION
+
        /* Store: Jump to table entry to get the value in the source register.*/
 
 .Lstore:movi   a5, .Lstore_table       # table
        extui   a6, a4, INSN_T, 4       # get source register
        addx8   a5, a6, a5
        jx      a5                      # jump into table
+#endif
+#if XCHAL_UNALIGNED_LOAD_EXCEPTION
 
        /* Load: Load memory address. */
 
@@ -249,7 +276,7 @@ ENTRY(fast_unaligned)
        addi    a7, a7, 2               # increment PC (assume 16-bit insn)
 
        extui   a5, a4, INSN_OP0, 4
-       _beqi   a5, OP0_L32I_N, 1f      # l32i.n: jump
+       _beqi   a5, OP0_L32I_N, .Lload_w# l32i.n: jump
 
        addi    a7, a7, 1
 #else
@@ -257,21 +284,26 @@ ENTRY(fast_unaligned)
 #endif
 
        extui   a5, a4, INSN_OP1, 4
-       _beqi   a5, OP1_L32I, 1f        # l32i: jump
-
+       _beqi   a5, OP1_L32I, .Lload_w  # l32i: jump
+#endif
+#ifdef LOAD_EXCEPTION_HANDLER
+.Lload16:
        extui   a3, a3, 0, 16           # extract lower 16 bits
-       _beqi   a5, OP1_L16UI, 1f
+       _beqi   a5, OP1_L16UI, .Lload_w
        addi    a5, a5, -OP1_L16SI
-       _bnez   a5, .Linvalid_instruction_load
+       _bnez   a5, .Linvalid_instruction
 
        /* sign extend value */
-
+#if XCHAL_HAVE_SEXT
+       sext    a3, a3, 15
+#else
        slli    a3, a3, 16
        srai    a3, a3, 16
+#endif
 
        /* Set target register. */
 
-1:
+.Lload_w:
        extui   a4, a4, INSN_T, 4       # extract target register
        movi    a5, .Lload_table
        addx8   a4, a4, a5
@@ -295,30 +327,32 @@ ENTRY(fast_unaligned)
        mov     a13, a3         ;       _j .Lexit;      .align 8
        mov     a14, a3         ;       _j .Lexit;      .align 8
        mov     a15, a3         ;       _j .Lexit;      .align 8
-
+#endif
+#if XCHAL_UNALIGNED_STORE_EXCEPTION
 .Lstore_table:
-       l32i    a3, a2, PT_AREG0;       _j 1f;  .align 8
-       mov     a3, a1;                 _j 1f;  .align 8        # fishy??
-       l32i    a3, a2, PT_AREG2;       _j 1f;  .align 8
-       l32i    a3, a2, PT_AREG3;       _j 1f;  .align 8
-       l32i    a3, a2, PT_AREG4;       _j 1f;  .align 8
-       l32i    a3, a2, PT_AREG5;       _j 1f;  .align 8
-       l32i    a3, a2, PT_AREG6;       _j 1f;  .align 8
-       l32i    a3, a2, PT_AREG7;       _j 1f;  .align 8
-       l32i    a3, a2, PT_AREG8;       _j 1f;  .align 8
-       mov     a3, a9          ;       _j 1f;  .align 8
-       mov     a3, a10         ;       _j 1f;  .align 8
-       mov     a3, a11         ;       _j 1f;  .align 8
-       mov     a3, a12         ;       _j 1f;  .align 8
-       mov     a3, a13         ;       _j 1f;  .align 8
-       mov     a3, a14         ;       _j 1f;  .align 8
-       mov     a3, a15         ;       _j 1f;  .align 8
+       l32i    a3, a2, PT_AREG0;       _j .Lstore_w;   .align 8
+       mov     a3, a1;                 _j .Lstore_w;   .align 8        # fishy??
+       l32i    a3, a2, PT_AREG2;       _j .Lstore_w;   .align 8
+       l32i    a3, a2, PT_AREG3;       _j .Lstore_w;   .align 8
+       l32i    a3, a2, PT_AREG4;       _j .Lstore_w;   .align 8
+       l32i    a3, a2, PT_AREG5;       _j .Lstore_w;   .align 8
+       l32i    a3, a2, PT_AREG6;       _j .Lstore_w;   .align 8
+       l32i    a3, a2, PT_AREG7;       _j .Lstore_w;   .align 8
+       l32i    a3, a2, PT_AREG8;       _j .Lstore_w;   .align 8
+       mov     a3, a9          ;       _j .Lstore_w;   .align 8
+       mov     a3, a10         ;       _j .Lstore_w;   .align 8
+       mov     a3, a11         ;       _j .Lstore_w;   .align 8
+       mov     a3, a12         ;       _j .Lstore_w;   .align 8
+       mov     a3, a13         ;       _j .Lstore_w;   .align 8
+       mov     a3, a14         ;       _j .Lstore_w;   .align 8
+       mov     a3, a15         ;       _j .Lstore_w;   .align 8
+#endif
 
+#ifdef ANY_EXCEPTION_HANDLER
        /* We cannot handle this exception. */
 
        .extern _kernel_exception
-.Linvalid_instruction_load:
-.Linvalid_instruction_store:
+.Linvalid_instruction:
 
        movi    a4, 0
        rsr     a3, excsave1
@@ -326,6 +360,7 @@ ENTRY(fast_unaligned)
 
        /* Restore a4...a8 and SAR, set SP, and jump to default exception. */
 
+       l32i    a0, a2, PT_SAR
        l32i    a8, a2, PT_AREG8
        l32i    a7, a2, PT_AREG7
        l32i    a6, a2, PT_AREG6
@@ -342,9 +377,11 @@ ENTRY(fast_unaligned)
 
 2:     movi    a0, _user_exception
        jx      a0
+#endif
+#if XCHAL_UNALIGNED_STORE_EXCEPTION
 
-1:     # a7: instruction pointer, a4: instruction, a3: value
-
+       # a7: instruction pointer, a4: instruction, a3: value
+.Lstore_w:
        movi    a6, 0                   # mask: ffffffff:00000000
 
 #if XCHAL_HAVE_DENSITY
@@ -361,7 +398,7 @@ ENTRY(fast_unaligned)
 
        extui   a5, a4, INSN_OP1, 4     # extract OP1
        _beqi   a5, OP1_S32I, 1f        # jump if 32 bit store
-       _bnei   a5, OP1_S16I, .Linvalid_instruction_store
+       _bnei   a5, OP1_S16I, .Linvalid_instruction
 
        movi    a5, -1
        __extl  a3, a3                  # get 16-bit value
@@ -406,7 +443,8 @@ ENTRY(fast_unaligned)
 #else
        s32i    a6, a4, 4
 #endif
-
+#endif
+#ifdef ANY_EXCEPTION_HANDLER
 .Lexit:
 #if XCHAL_HAVE_LOOPS
        rsr     a4, lend                # check if we reached LEND
@@ -434,6 +472,7 @@ ENTRY(fast_unaligned)
 
        /* Restore working register */
 
+       l32i    a0, a2, PT_SAR
        l32i    a8, a2, PT_AREG8
        l32i    a7, a2, PT_AREG7
        l32i    a6, a2, PT_AREG6
@@ -448,6 +487,59 @@ ENTRY(fast_unaligned)
        l32i    a2, a2, PT_AREG2
        rfe
 
+       .align  4
+.Lsave_and_load_instruction:
+
+       /* Save some working register */
+
+       s32i    a3, a2, PT_AREG3
+       s32i    a4, a2, PT_AREG4
+       s32i    a5, a2, PT_AREG5
+       s32i    a6, a2, PT_AREG6
+       s32i    a7, a2, PT_AREG7
+       s32i    a8, a2, PT_AREG8
+
+       rsr     a4, depc
+       s32i    a4, a2, PT_AREG2
+
+       rsr     a5, sar
+       s32i    a5, a2, PT_SAR
+
+       rsr     a3, excsave1
+       movi    a4, fast_unaligned_fixup
+       s32i    a4, a3, EXC_TABLE_FIXUP
+
+       rsr     a8, excvaddr            # load unaligned memory address
+
+       /* Now, identify one of the following load/store instructions.
+        *
+        * The only possible danger of a double exception on the
+        * following l32i instructions is kernel code in vmalloc
+        * memory. The processor was just executing at the EPC_1
+        * address, and indeed, already fetched the instruction.  That
+        * guarantees a TLB mapping, which hasn't been replaced by
+        * this unaligned exception handler that uses only static TLB
+        * mappings. However, high-level interrupt handlers might
+        * modify TLB entries, so for the generic case, we register a
+        * TABLE_FIXUP handler here, too.
+        */
+
+       /* a3...a6 saved on stack, a2 = SP */
+
+       /* Extract the instruction that caused the unaligned access. */
+
+       rsr     a7, epc1        # load exception address
+       movi    a3, ~3
+       and     a3, a3, a7      # mask lower bits
+
+       l32i    a4, a3, 0       # load 2 words
+       l32i    a5, a3, 4
+
+       __ssa8  a7
+       __src_b a4, a4, a5      # a4 has the instruction
+
+       ret
+#endif
 ENDPROC(fast_unaligned)
 
 ENTRY(fast_unaligned_fixup)
@@ -459,10 +551,11 @@ ENTRY(fast_unaligned_fixup)
        l32i    a7, a2, PT_AREG7
        l32i    a6, a2, PT_AREG6
        l32i    a5, a2, PT_AREG5
-       l32i    a4, a2, PT_AREG4
+       l32i    a4, a2, PT_SAR
        l32i    a0, a2, PT_AREG2
-       xsr     a0, depc                        # restore depc and a0
-       wsr     a0, sar
+       wsr     a4, sar
+       wsr     a0, depc                        # restore depc and a0
+       l32i    a4, a2, PT_AREG4
 
        rsr     a0, exccause
        s32i    a0, a2, PT_DEPC                 # mark as a regular exception
@@ -483,5 +576,4 @@ ENTRY(fast_unaligned_fixup)
        jx      a0
 
 ENDPROC(fast_unaligned_fixup)
-
-#endif /* XCHAL_UNALIGNED_LOAD_EXCEPTION || XCHAL_UNALIGNED_STORE_EXCEPTION */
+#endif
index 51daaf4..309b329 100644 (file)
@@ -78,6 +78,7 @@ ENTRY(_mcount)
 #error Unsupported Xtensa ABI
 #endif
 ENDPROC(_mcount)
+EXPORT_SYMBOL(_mcount)
 
 ENTRY(ftrace_stub)
        abi_entry_default
index ac1e0e5..926b8bf 100644 (file)
 #include <asm/platform.h>
 #include <asm/timex.h>
 
-#define _F(r,f,a,b)                                                    \
-       r __platform_##f a b;                                           \
-       r platform_##f a __attribute__((weak, alias("__platform_"#f)))
-
 /*
  * Default functions that are used if no platform specific function is defined.
- * (Please, refer to include/asm-xtensa/platform.h for more information)
+ * (Please, refer to arch/xtensa/include/asm/platform.h for more information)
  */
 
-_F(void, init, (bp_tag_t *first), { });
-_F(void, setup, (char** cmd), { });
-_F(void, restart, (void), { while(1); });
-_F(void, halt, (void), { while(1); });
-_F(void, power_off, (void), { while(1); });
-_F(void, idle, (void), { __asm__ __volatile__ ("waiti 0" ::: "memory"); });
-_F(void, heartbeat, (void), { });
+void __weak __init platform_init(bp_tag_t *first)
+{
+}
+
+void __weak __init platform_setup(char **cmd)
+{
+}
+
+void __weak platform_idle(void)
+{
+       __asm__ __volatile__ ("waiti 0" ::: "memory");
+}
 
 #ifdef CONFIG_XTENSA_CALIBRATE_CCOUNT
-_F(void, calibrate_ccount, (void),
+void __weak platform_calibrate_ccount(void)
 {
        pr_err("ERROR: Cannot calibrate cpu frequency! Assuming 10MHz.\n");
        ccount_freq = 10 * 1000000UL;
-});
+}
 #endif
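
The hunk above drops the old _F() stub macro in favour of plain __weak C
defaults, which a board file overrides simply by providing a strong
definition of the same symbol. A minimal sketch of the pattern, with a
hypothetical hook name that is not part of this merge:

    #include <linux/init.h>
    #include <linux/printk.h>

    /* Default used when no platform provides its own implementation. */
    void __weak platform_example_hook(void)
    {
            pr_debug("no board-specific platform_example_hook\n");
    }

    /*
     * A board file elsewhere can supply a strong definition that replaces
     * the weak default at link time:
     *
     *      void platform_example_hook(void)
     *      {
     *              ... board-specific work ...
     *      }
     */
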
index 9191738..aba3ff4 100644 (file)
@@ -22,6 +22,7 @@
 #include <linux/screen_info.h>
 #include <linux/kernel.h>
 #include <linux/percpu.h>
+#include <linux/reboot.h>
 #include <linux/cpu.h>
 #include <linux/of.h>
 #include <linux/of_fdt.h>
@@ -46,6 +47,7 @@
 #include <asm/smp.h>
 #include <asm/sysmem.h>
 #include <asm/timex.h>
+#include <asm/traps.h>
 
 #if defined(CONFIG_VGA_CONSOLE) || defined(CONFIG_DUMMY_CONSOLE)
 struct screen_info screen_info = {
@@ -241,6 +243,12 @@ void __init early_init_devtree(void *params)
 
 void __init init_arch(bp_tag_t *bp_start)
 {
+       /* Initialize basic exception handling if configuration may need it */
+
+       if (IS_ENABLED(CONFIG_KASAN) ||
+           IS_ENABLED(CONFIG_XTENSA_LOAD_STORE))
+               early_trap_init();
+
        /* Initialize MMU. */
 
        init_mmu();
@@ -522,19 +530,30 @@ void cpu_reset(void)
 
 void machine_restart(char * cmd)
 {
-       platform_restart();
+       local_irq_disable();
+       smp_send_stop();
+       do_kernel_restart(cmd);
+       pr_err("Reboot failed -- System halted\n");
+       while (1)
+               cpu_relax();
 }
 
 void machine_halt(void)
 {
-       platform_halt();
-       while (1);
+       local_irq_disable();
+       smp_send_stop();
+       do_kernel_power_off();
+       while (1)
+               cpu_relax();
 }
 
 void machine_power_off(void)
 {
-       platform_power_off();
-       while (1);
+       local_irq_disable();
+       smp_send_stop();
+       do_kernel_power_off();
+       while (1)
+               cpu_relax();
 }
 #ifdef CONFIG_PROC_FS
 
@@ -574,6 +593,12 @@ c_show(struct seq_file *f, void *slot)
 # if XCHAL_HAVE_OCD
                     "ocd "
 # endif
+#if XCHAL_HAVE_TRAX
+                    "trax "
+#endif
+#if XCHAL_NUM_PERF_COUNTERS
+                    "perf "
+#endif
 #endif
 #if XCHAL_HAVE_DENSITY
                     "density "
@@ -623,11 +648,13 @@ c_show(struct seq_file *f, void *slot)
        seq_printf(f,"physical aregs\t: %d\n"
                     "misc regs\t: %d\n"
                     "ibreak\t\t: %d\n"
-                    "dbreak\t\t: %d\n",
+                    "dbreak\t\t: %d\n"
+                    "perf counters\t: %d\n",
                     XCHAL_NUM_AREGS,
                     XCHAL_NUM_MISC_REGS,
                     XCHAL_NUM_IBREAK,
-                    XCHAL_NUM_DBREAK);
+                    XCHAL_NUM_DBREAK,
+                    XCHAL_NUM_PERF_COUNTERS);
 
 
        /* Interrupt. */
index 876d5df..5c01d7e 100644 (file)
@@ -343,7 +343,19 @@ static int setup_frame(struct ksignal *ksig, sigset_t *set,
        struct rt_sigframe *frame;
        int err = 0, sig = ksig->sig;
        unsigned long sp, ra, tp, ps;
+       unsigned long handler = (unsigned long)ksig->ka.sa.sa_handler;
+       unsigned long handler_fdpic_GOT = 0;
        unsigned int base;
+       bool fdpic = IS_ENABLED(CONFIG_BINFMT_ELF_FDPIC) &&
+               (current->personality & FDPIC_FUNCPTRS);
+
+       if (fdpic) {
+               unsigned long __user *fdpic_func_desc =
+                       (unsigned long __user *)handler;
+               if (__get_user(handler, &fdpic_func_desc[0]) ||
+                   __get_user(handler_fdpic_GOT, &fdpic_func_desc[1]))
+                       return -EFAULT;
+       }
 
        sp = regs->areg[1];
 
@@ -373,20 +385,26 @@ static int setup_frame(struct ksignal *ksig, sigset_t *set,
        err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
 
        if (ksig->ka.sa.sa_flags & SA_RESTORER) {
-               ra = (unsigned long)ksig->ka.sa.sa_restorer;
+               if (fdpic) {
+                       unsigned long __user *fdpic_func_desc =
+                               (unsigned long __user *)ksig->ka.sa.sa_restorer;
+
+                       err |= __get_user(ra, fdpic_func_desc);
+               } else {
+                       ra = (unsigned long)ksig->ka.sa.sa_restorer;
+               }
        } else {
 
                /* Create sys_rt_sigreturn syscall in stack frame */
 
                err |= gen_return_code(frame->retcode);
-
-               if (err) {
-                       return -EFAULT;
-               }
                ra = (unsigned long) frame->retcode;
        }
 
-       /* 
+       if (err)
+               return -EFAULT;
+
+       /*
         * Create signal handler execution context.
         * Return context not modified until this point.
         */
@@ -394,8 +412,7 @@ static int setup_frame(struct ksignal *ksig, sigset_t *set,
        /* Set up registers for signal handler; preserve the threadptr */
        tp = regs->threadptr;
        ps = regs->ps;
-       start_thread(regs, (unsigned long) ksig->ka.sa.sa_handler,
-                    (unsigned long) frame);
+       start_thread(regs, handler, (unsigned long)frame);
 
        /* Set up a stack frame for a call4 if userspace uses windowed ABI */
        if (ps & PS_WOE_MASK) {
@@ -413,6 +430,8 @@ static int setup_frame(struct ksignal *ksig, sigset_t *set,
        regs->areg[base + 4] = (unsigned long) &frame->uc;
        regs->threadptr = tp;
        regs->ps = ps;
+       if (fdpic)
+               regs->areg[base + 11] = handler_fdpic_GOT;
 
        pr_debug("SIG rt deliver (%s:%d): signal=%d sp=%p pc=%08lx\n",
                 current->comm, current->pid, sig, frame, regs->pc);
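
With CONFIG_BINFMT_ELF_FDPIC, a signal handler registered by an FDPIC
process is not a code address but a pointer to a two-word function
descriptor, which is why the hunk above fetches both words with
__get_user() and places the second one in areg[base + 11]. A rough
illustration of that layout (hypothetical struct and function names, for
clarity only):

    /* FDPIC function descriptor as used in the hunk above: word 0 is the
     * entry point, word 1 is the FDPIC GOT pointer the callee expects. */
    struct fdpic_func_desc_example {
            unsigned long entry;    /* becomes regs->pc              */
            unsigned long got;      /* becomes regs->areg[base + 11] */
    };

    static void unpack_fdpic_desc(const struct fdpic_func_desc_example *d,
                                  unsigned long *pc, unsigned long *got)
    {
            *pc = d->entry;
            *got = d->got;
    }
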
index 7f7755c..f643ea5 100644 (file)
@@ -237,8 +237,6 @@ EXPORT_SYMBOL_GPL(save_stack_trace);
 
 #endif
 
-#ifdef CONFIG_FRAME_POINTER
-
 struct return_addr_data {
        unsigned long addr;
        unsigned skip;
@@ -271,5 +269,3 @@ unsigned long return_address(unsigned level)
        return r.addr;
 }
 EXPORT_SYMBOL(return_address);
-
-#endif
index 16b8a62..1c3dfea 100644 (file)
@@ -121,10 +121,6 @@ static irqreturn_t timer_interrupt(int irq, void *dev_id)
 
        set_linux_timer(get_linux_timer());
        evt->event_handler(evt);
-
-       /* Allow platform to do something useful (Wdog). */
-       platform_heartbeat();
-
        return IRQ_HANDLED;
 }
 
index f0a7d1c..17eb180 100644 (file)
@@ -54,9 +54,10 @@ static void do_interrupt(struct pt_regs *regs);
 #if XTENSA_FAKE_NMI
 static void do_nmi(struct pt_regs *regs);
 #endif
-#if XCHAL_UNALIGNED_LOAD_EXCEPTION || XCHAL_UNALIGNED_STORE_EXCEPTION
-static void do_unaligned_user(struct pt_regs *regs);
+#ifdef CONFIG_XTENSA_LOAD_STORE
+static void do_load_store(struct pt_regs *regs);
 #endif
+static void do_unaligned_user(struct pt_regs *regs);
 static void do_multihit(struct pt_regs *regs);
 #if XTENSA_HAVE_COPROCESSORS
 static void do_coprocessor(struct pt_regs *regs);
@@ -91,7 +92,10 @@ static dispatch_init_table_t __initdata dispatch_init_table[] = {
 { EXCCAUSE_SYSTEM_CALL,                USER,      fast_syscall_user },
 { EXCCAUSE_SYSTEM_CALL,                0,         system_call },
 /* EXCCAUSE_INSTRUCTION_FETCH unhandled */
-/* EXCCAUSE_LOAD_STORE_ERROR unhandled*/
+#ifdef CONFIG_XTENSA_LOAD_STORE
+{ EXCCAUSE_LOAD_STORE_ERROR,   USER|KRNL, fast_load_store },
+{ EXCCAUSE_LOAD_STORE_ERROR,   0,         do_load_store },
+#endif
 { EXCCAUSE_LEVEL1_INTERRUPT,   0,         do_interrupt },
 #ifdef SUPPORT_WINDOWED
 { EXCCAUSE_ALLOCA,             USER|KRNL, fast_alloca },
@@ -102,9 +106,9 @@ static dispatch_init_table_t __initdata dispatch_init_table[] = {
 #ifdef CONFIG_XTENSA_UNALIGNED_USER
 { EXCCAUSE_UNALIGNED,          USER,      fast_unaligned },
 #endif
-{ EXCCAUSE_UNALIGNED,          0,         do_unaligned_user },
 { EXCCAUSE_UNALIGNED,          KRNL,      fast_unaligned },
 #endif
+{ EXCCAUSE_UNALIGNED,          0,         do_unaligned_user },
 #ifdef CONFIG_MMU
 { EXCCAUSE_ITLB_MISS,                  0,         do_page_fault },
 { EXCCAUSE_ITLB_MISS,                  USER|KRNL, fast_second_level_miss},
@@ -171,6 +175,23 @@ __die_if_kernel(const char *str, struct pt_regs *regs, long err)
                die(str, regs, err);
 }
 
+#ifdef CONFIG_PRINT_USER_CODE_ON_UNHANDLED_EXCEPTION
+static inline void dump_user_code(struct pt_regs *regs)
+{
+       char buf[32];
+
+       if (copy_from_user(buf, (void __user *)(regs->pc & -16), sizeof(buf)) == 0) {
+               print_hex_dump(KERN_INFO, " ", DUMP_PREFIX_NONE,
+                              32, 1, buf, sizeof(buf), false);
+
+       }
+}
+#else
+static inline void dump_user_code(struct pt_regs *regs)
+{
+}
+#endif
+
 /*
  * Unhandled Exceptions. Kill user task or panic if in kernel space.
  */
@@ -186,6 +207,7 @@ void do_unhandled(struct pt_regs *regs)
                            "\tEXCCAUSE is %ld\n",
                            current->comm, task_pid_nr(current), regs->pc,
                            regs->exccause);
+       dump_user_code(regs);
        force_sig(SIGILL);
 }
 
@@ -349,6 +371,19 @@ static void do_div0(struct pt_regs *regs)
        force_sig_fault(SIGFPE, FPE_INTDIV, (void __user *)regs->pc);
 }
 
+#ifdef CONFIG_XTENSA_LOAD_STORE
+static void do_load_store(struct pt_regs *regs)
+{
+       __die_if_kernel("Unhandled load/store exception in kernel",
+                       regs, SIGKILL);
+
+       pr_info_ratelimited("Load/store error to %08lx in '%s' (pid = %d, pc = %#010lx)\n",
+                           regs->excvaddr, current->comm,
+                           task_pid_nr(current), regs->pc);
+       force_sig_fault(SIGBUS, BUS_ADRERR, (void *)regs->excvaddr);
+}
+#endif
+
 /*
  * Handle unaligned memory accesses from user space. Kill task.
  *
@@ -356,7 +391,6 @@ static void do_div0(struct pt_regs *regs)
  * accesses causes from user space.
  */
 
-#if XCHAL_UNALIGNED_LOAD_EXCEPTION || XCHAL_UNALIGNED_STORE_EXCEPTION
 static void do_unaligned_user(struct pt_regs *regs)
 {
        __die_if_kernel("Unhandled unaligned exception in kernel",
@@ -368,7 +402,6 @@ static void do_unaligned_user(struct pt_regs *regs)
                            task_pid_nr(current), regs->pc);
        force_sig_fault(SIGBUS, BUS_ADRALN, (void *) regs->excvaddr);
 }
-#endif
 
 #if XTENSA_HAVE_COPROCESSORS
 static void do_coprocessor(struct pt_regs *regs)
@@ -534,31 +567,58 @@ static void show_trace(struct task_struct *task, unsigned long *sp,
 }
 
 #define STACK_DUMP_ENTRY_SIZE 4
-#define STACK_DUMP_LINE_SIZE 32
+#define STACK_DUMP_LINE_SIZE 16
 static size_t kstack_depth_to_print = CONFIG_PRINT_STACK_DEPTH;
 
-void show_stack(struct task_struct *task, unsigned long *sp, const char *loglvl)
+struct stack_fragment
 {
-       size_t len, off = 0;
-
-       if (!sp)
-               sp = stack_pointer(task);
+       size_t len;
+       size_t off;
+       u8 *sp;
+       const char *loglvl;
+};
 
-       len = min((-(size_t)sp) & (THREAD_SIZE - STACK_DUMP_ENTRY_SIZE),
-                 kstack_depth_to_print * STACK_DUMP_ENTRY_SIZE);
+static int show_stack_fragment_cb(struct stackframe *frame, void *data)
+{
+       struct stack_fragment *sf = data;
 
-       printk("%sStack:\n", loglvl);
-       while (off < len) {
+       while (sf->off < sf->len) {
                u8 line[STACK_DUMP_LINE_SIZE];
-               size_t line_len = len - off > STACK_DUMP_LINE_SIZE ?
-                       STACK_DUMP_LINE_SIZE : len - off;
+               size_t line_len = sf->len - sf->off > STACK_DUMP_LINE_SIZE ?
+                       STACK_DUMP_LINE_SIZE : sf->len - sf->off;
+               bool arrow = sf->off == 0;
 
-               __memcpy(line, (u8 *)sp + off, line_len);
-               print_hex_dump(loglvl, " ", DUMP_PREFIX_NONE,
+               if (frame && frame->sp == (unsigned long)(sf->sp + sf->off))
+                       arrow = true;
+
+               __memcpy(line, sf->sp + sf->off, line_len);
+               print_hex_dump(sf->loglvl, arrow ? "> " : "  ", DUMP_PREFIX_NONE,
                               STACK_DUMP_LINE_SIZE, STACK_DUMP_ENTRY_SIZE,
                               line, line_len, false);
-               off += STACK_DUMP_LINE_SIZE;
+               sf->off += STACK_DUMP_LINE_SIZE;
+               if (arrow)
+                       return 0;
        }
+       return 1;
+}
+
+void show_stack(struct task_struct *task, unsigned long *sp, const char *loglvl)
+{
+       struct stack_fragment sf;
+
+       if (!sp)
+               sp = stack_pointer(task);
+
+       sf.len = min((-(size_t)sp) & (THREAD_SIZE - STACK_DUMP_ENTRY_SIZE),
+                    kstack_depth_to_print * STACK_DUMP_ENTRY_SIZE);
+       sf.off = 0;
+       sf.sp = (u8 *)sp;
+       sf.loglvl = loglvl;
+
+       printk("%sStack:\n", loglvl);
+       walk_stackframe(sp, show_stack_fragment_cb, &sf);
+       while (sf.off < sf.len)
+               show_stack_fragment_cb(NULL, &sf);
        show_trace(task, sp, loglvl);
 }
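
The show_stack() rework drives the hex dump through walk_stackframe(),
whose callback ends the walk by returning non-zero. A small sketch of
another callback in the same style, assuming the struct stackframe /
walk_stackframe() interface used in the hunk above (declared in
asm/stacktrace.h on xtensa); the names here are hypothetical:

    #include <asm/stacktrace.h>

    struct frame_limit {
            unsigned int seen;
            unsigned int max;
    };

    static int count_frames_cb(struct stackframe *frame, void *data)
    {
            struct frame_limit *fl = data;

            /* A non-zero return value terminates the walk early. */
            return ++fl->seen >= fl->max;
    }

    /* Usage sketch:
     *      struct frame_limit fl = { .max = 16 };
     *      walk_stackframe(sp, count_frames_cb, &fl);
     */
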
 
index 2a31b1a..62d81e7 100644 (file)
  */
 
 #include <linux/module.h>
-#include <linux/string.h>
-#include <linux/mm.h>
-#include <linux/interrupt.h>
-#include <asm/irq.h>
-#include <linux/in6.h>
-
-#include <linux/uaccess.h>
-#include <asm/cacheflush.h>
-#include <asm/checksum.h>
-#include <asm/dma.h>
-#include <asm/io.h>
-#include <asm/page.h>
-#include <asm/ftrace.h>
-#ifdef CONFIG_BLK_DEV_FD
-#include <asm/floppy.h>
-#endif
-#ifdef CONFIG_NET
-#include <net/checksum.h>
-#endif /* CONFIG_NET */
-
-
-/*
- * String functions
- */
-EXPORT_SYMBOL(memset);
-EXPORT_SYMBOL(memcpy);
-EXPORT_SYMBOL(memmove);
-EXPORT_SYMBOL(__memset);
-EXPORT_SYMBOL(__memcpy);
-EXPORT_SYMBOL(__memmove);
-#ifdef CONFIG_ARCH_HAS_STRNCPY_FROM_USER
-EXPORT_SYMBOL(__strncpy_user);
-#endif
-EXPORT_SYMBOL(clear_page);
-EXPORT_SYMBOL(copy_page);
+#include <asm/pgtable.h>
 
 EXPORT_SYMBOL(empty_zero_page);
 
-/*
- * gcc internal math functions
- */
-extern long long __ashrdi3(long long, int);
-extern long long __ashldi3(long long, int);
-extern long long __lshrdi3(long long, int);
-extern int __divsi3(int, int);
-extern int __modsi3(int, int);
-extern int __mulsi3(int, int);
-extern unsigned int __udivsi3(unsigned int, unsigned int);
-extern unsigned int __umodsi3(unsigned int, unsigned int);
-extern unsigned long long __umulsidi3(unsigned int, unsigned int);
-
-EXPORT_SYMBOL(__ashldi3);
-EXPORT_SYMBOL(__ashrdi3);
-EXPORT_SYMBOL(__lshrdi3);
-EXPORT_SYMBOL(__divsi3);
-EXPORT_SYMBOL(__modsi3);
-EXPORT_SYMBOL(__mulsi3);
-EXPORT_SYMBOL(__udivsi3);
-EXPORT_SYMBOL(__umodsi3);
-EXPORT_SYMBOL(__umulsidi3);
-
 unsigned int __sync_fetch_and_and_4(volatile void *p, unsigned int v)
 {
        BUG();
@@ -85,35 +28,3 @@ unsigned int __sync_fetch_and_or_4(volatile void *p, unsigned int v)
        BUG();
 }
 EXPORT_SYMBOL(__sync_fetch_and_or_4);
-
-/*
- * Networking support
- */
-EXPORT_SYMBOL(csum_partial);
-EXPORT_SYMBOL(csum_partial_copy_generic);
-
-/*
- * Architecture-specific symbols
- */
-EXPORT_SYMBOL(__xtensa_copy_user);
-EXPORT_SYMBOL(__invalidate_icache_range);
-
-/*
- * Kernel hacking ...
- */
-
-#if defined(CONFIG_VGA_CONSOLE) || defined(CONFIG_DUMMY_CONSOLE)
-// FIXME EXPORT_SYMBOL(screen_info);
-#endif
-
-extern long common_exception_return;
-EXPORT_SYMBOL(common_exception_return);
-
-#ifdef CONFIG_FUNCTION_TRACER
-EXPORT_SYMBOL(_mcount);
-#endif
-
-EXPORT_SYMBOL(__invalidate_dcache_range);
-#if XCHAL_DCACHE_IS_WRITEBACK
-EXPORT_SYMBOL(__flush_dcache_range);
-#endif
index 7ecef05..6e5b223 100644 (file)
@@ -4,9 +4,10 @@
 #
 
 lib-y  += memcopy.o memset.o checksum.o \
-          ashldi3.o ashrdi3.o lshrdi3.o \
+          ashldi3.o ashrdi3.o bswapdi2.o bswapsi2.o lshrdi3.o \
           divsi3.o udivsi3.o modsi3.o umodsi3.o mulsi3.o umulsidi3.o \
-          usercopy.o strncpy_user.o strnlen_user.o
+          usercopy.o strnlen_user.o
+lib-$(CONFIG_ARCH_HAS_STRNCPY_FROM_USER) += strncpy_user.o
 lib-$(CONFIG_PCI) += pci-auto.o
 lib-$(CONFIG_KCSAN) += kcsan-stubs.o
 KCSAN_SANITIZE_kcsan-stubs.o := n
index 67fb0da..cd6b731 100644 (file)
@@ -26,3 +26,4 @@ ENTRY(__ashldi3)
        abi_ret_default
 
 ENDPROC(__ashldi3)
+EXPORT_SYMBOL(__ashldi3)
index cbf052c..07bc6e7 100644 (file)
@@ -26,3 +26,4 @@ ENTRY(__ashrdi3)
        abi_ret_default
 
 ENDPROC(__ashrdi3)
+EXPORT_SYMBOL(__ashrdi3)
diff --git a/arch/xtensa/lib/bswapdi2.S b/arch/xtensa/lib/bswapdi2.S
new file mode 100644 (file)
index 0000000..5d94a93
--- /dev/null
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later WITH GCC-exception-2.0 */
+#include <linux/linkage.h>
+#include <asm/asmmacro.h>
+#include <asm/core.h>
+
+ENTRY(__bswapdi2)
+
+       abi_entry_default
+       ssai    8
+       srli    a4, a2, 16
+       src     a4, a4, a2
+       src     a4, a4, a4
+       src     a4, a2, a4
+       srli    a2, a3, 16
+       src     a2, a2, a3
+       src     a2, a2, a2
+       src     a2, a3, a2
+       mov     a3, a4
+       abi_ret_default
+
+ENDPROC(__bswapdi2)
+EXPORT_SYMBOL(__bswapdi2)
diff --git a/arch/xtensa/lib/bswapsi2.S b/arch/xtensa/lib/bswapsi2.S
new file mode 100644 (file)
index 0000000..fbfb861
--- /dev/null
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later WITH GCC-exception-2.0 */
+#include <linux/linkage.h>
+#include <asm/asmmacro.h>
+#include <asm/core.h>
+
+ENTRY(__bswapsi2)
+
+       abi_entry_default
+       ssai    8
+       srli    a3, a2, 16
+       src     a3, a3, a2
+       src     a3, a3, a3
+       src     a2, a2, a3
+       abi_ret_default
+
+ENDPROC(__bswapsi2)
+EXPORT_SYMBOL(__bswapsi2)
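
Both new routines use the xtensa funnel-shift idiom (ssai/src) to provide
the libgcc byte-swap helpers the compiler may emit calls to. For
reference, a plain C equivalent of what __bswapsi2/__bswapdi2 compute
(illustration only, not the kernel code):

    #include <stdint.h>

    static uint32_t bswapsi2(uint32_t x)
    {
            return ((x & 0x000000ffu) << 24) |
                   ((x & 0x0000ff00u) <<  8) |
                   ((x & 0x00ff0000u) >>  8) |
                   ((x & 0xff000000u) >> 24);
    }

    static uint64_t bswapdi2(uint64_t x)
    {
            /* Swap the two 32-bit halves, then byte-swap each half. */
            return ((uint64_t)bswapsi2((uint32_t)x) << 32) |
                    bswapsi2((uint32_t)(x >> 32));
    }
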
index cf1bed1..ffee6f9 100644 (file)
@@ -169,6 +169,7 @@ ENTRY(csum_partial)
        j       5b              /* branch to handle the remaining byte */
 
 ENDPROC(csum_partial)
+EXPORT_SYMBOL(csum_partial)
 
 /*
  * Copy from ds while checksumming, otherwise like csum_partial
@@ -346,6 +347,7 @@ EX(10f)     s8i     a8, a3, 1
        j       4b              /* process the possible trailing odd byte */
 
 ENDPROC(csum_partial_copy_generic)
+EXPORT_SYMBOL(csum_partial_copy_generic)
 
 
 # Exception handler:
index b044b47..edb3c4a 100644 (file)
@@ -72,3 +72,4 @@ ENTRY(__divsi3)
        abi_ret_default
 
 ENDPROC(__divsi3)
+EXPORT_SYMBOL(__divsi3)
index 129ef8d..e432e1a 100644 (file)
@@ -26,3 +26,4 @@ ENTRY(__lshrdi3)
        abi_ret_default
 
 ENDPROC(__lshrdi3)
+EXPORT_SYMBOL(__lshrdi3)
index b20d206..f607603 100644 (file)
@@ -273,21 +273,8 @@ WEAK(memcpy)
        abi_ret_default
 
 ENDPROC(__memcpy)
-
-/*
- * void bcopy(const void *src, void *dest, size_t n);
- */
-
-ENTRY(bcopy)
-
-       abi_entry_default
-       # a2=src, a3=dst, a4=len
-       mov     a5, a3
-       mov     a3, a2
-       mov     a2, a5
-       j       .Lmovecommon    # go to common code for memmove+bcopy
-
-ENDPROC(bcopy)
+EXPORT_SYMBOL(__memcpy)
+EXPORT_SYMBOL(memcpy)
 
 /*
  * void *memmove(void *dst, const void *src, size_t len);
@@ -551,3 +538,5 @@ WEAK(memmove)
        abi_ret_default
 
 ENDPROC(__memmove)
+EXPORT_SYMBOL(__memmove)
+EXPORT_SYMBOL(memmove)
index 59b1524..262c3f3 100644 (file)
@@ -142,6 +142,8 @@ EX(10f) s8i a3, a5, 0
        abi_ret_default
 
 ENDPROC(__memset)
+EXPORT_SYMBOL(__memset)
+EXPORT_SYMBOL(memset)
 
        .section .fixup, "ax"
        .align  4
index d00e771..c5f4295 100644 (file)
@@ -60,6 +60,7 @@ ENTRY(__modsi3)
        abi_ret_default
 
 ENDPROC(__modsi3)
+EXPORT_SYMBOL(__modsi3)
 
 #if !XCHAL_HAVE_NSA
        .section .rodata
index 91a9d7c..c6b4fd4 100644 (file)
@@ -131,3 +131,4 @@ ENTRY(__mulsi3)
        abi_ret_default
 
 ENDPROC(__mulsi3)
+EXPORT_SYMBOL(__mulsi3)
index 0731912..9841d16 100644 (file)
@@ -201,6 +201,7 @@ EX(10f)     s8i     a9, a11, 0
        abi_ret_default
 
 ENDPROC(__strncpy_user)
+EXPORT_SYMBOL(__strncpy_user)
 
        .section .fixup, "ax"
        .align  4
index 3d391dc..cdcf574 100644 (file)
@@ -133,6 +133,7 @@ EX(10f)     l32i    a9, a4, 0       # get word with first two bytes of string
        abi_ret_default
 
 ENDPROC(__strnlen_user)
+EXPORT_SYMBOL(__strnlen_user)
 
        .section .fixup, "ax"
        .align  4
index d2477e0..59ea2df 100644 (file)
@@ -66,3 +66,4 @@ ENTRY(__udivsi3)
        abi_ret_default
 
 ENDPROC(__udivsi3)
+EXPORT_SYMBOL(__udivsi3)
index 5f031bf..d39a7e5 100644 (file)
@@ -55,3 +55,4 @@ ENTRY(__umodsi3)
        abi_ret_default
 
 ENDPROC(__umodsi3)
+EXPORT_SYMBOL(__umodsi3)
index 1360816..8c7a94a 100644 (file)
@@ -228,3 +228,4 @@ ENTRY(__umulsidi3)
 #endif /* XCHAL_NO_MUL */
 
 ENDPROC(__umulsidi3)
+EXPORT_SYMBOL(__umulsidi3)
index 16128c0..2c665c0 100644 (file)
@@ -283,6 +283,7 @@ EX(10f)     s8i     a6, a5,  0
        abi_ret(STACK_SIZE)
 
 ENDPROC(__xtensa_copy_user)
+EXPORT_SYMBOL(__xtensa_copy_user)
 
        .section .fixup, "ax"
        .align  4
index 1fef24d..f00d122 100644 (file)
@@ -14,7 +14,6 @@
 #include <linux/kernel.h>
 #include <asm/initialize_mmu.h>
 #include <asm/tlbflush.h>
-#include <asm/traps.h>
 
 void __init kasan_early_init(void)
 {
@@ -31,7 +30,6 @@ void __init kasan_early_init(void)
                BUG_ON(!pmd_none(*pmd));
                set_pmd(pmd, __pmd((unsigned long)kasan_early_shadow_pte));
        }
-       early_trap_init();
 }
 
 static void __init populate(void *start, void *end)
index 0527bf6..ec36f73 100644 (file)
@@ -47,6 +47,7 @@ ENTRY(clear_page)
        abi_ret_default
 
 ENDPROC(clear_page)
+EXPORT_SYMBOL(clear_page)
 
 /*
  * copy_page and copy_user_page are the same for non-cache-aliased configs.
@@ -89,6 +90,7 @@ ENTRY(copy_page)
        abi_ret_default
 
 ENDPROC(copy_page)
+EXPORT_SYMBOL(copy_page)
 
 #ifdef CONFIG_MMU
 /*
@@ -367,6 +369,7 @@ ENTRY(__invalidate_icache_range)
        abi_ret_default
 
 ENDPROC(__invalidate_icache_range)
+EXPORT_SYMBOL(__invalidate_icache_range)
 
 /*
  * void __flush_invalidate_dcache_range(ulong start, ulong size)
@@ -397,6 +400,7 @@ ENTRY(__flush_dcache_range)
        abi_ret_default
 
 ENDPROC(__flush_dcache_range)
+EXPORT_SYMBOL(__flush_dcache_range)
 
 /*
  * void _invalidate_dcache_range(ulong start, ulong size)
@@ -411,6 +415,7 @@ ENTRY(__invalidate_dcache_range)
        abi_ret_default
 
 ENDPROC(__invalidate_dcache_range)
+EXPORT_SYMBOL(__invalidate_dcache_range)
 
 /*
  * void _invalidate_icache_all(void)
index d3433e1..0f1fe13 100644 (file)
@@ -16,6 +16,7 @@
 #include <linux/notifier.h>
 #include <linux/panic_notifier.h>
 #include <linux/printk.h>
+#include <linux/reboot.h>
 #include <linux/string.h>
 
 #include <asm/platform.h>
 #include <platform/simcall.h>
 
 
-void platform_halt(void)
-{
-       pr_info(" ** Called platform_halt() **\n");
-       simc_exit(0);
-}
-
-void platform_power_off(void)
+static int iss_power_off(struct sys_off_data *unused)
 {
        pr_info(" ** Called platform_power_off() **\n");
        simc_exit(0);
+       return NOTIFY_DONE;
 }
 
-void platform_restart(void)
+static int iss_restart(struct notifier_block *this,
+                      unsigned long event, void *ptr)
 {
        /* Flush and reset the mmu, simulate a processor reset, and
         * jump to the reset vector. */
        cpu_reset();
-       /* control never gets here */
+
+       return NOTIFY_DONE;
 }
 
+static struct notifier_block iss_restart_block = {
+       .notifier_call = iss_restart,
+};
+
 static int
 iss_panic_event(struct notifier_block *this, unsigned long event, void *ptr)
 {
@@ -82,4 +84,8 @@ void __init platform_setup(char **p_cmdline)
        }
 
        atomic_notifier_chain_register(&panic_notifier_list, &iss_panic_block);
+       register_restart_handler(&iss_restart_block);
+       register_sys_off_handler(SYS_OFF_MODE_POWER_OFF,
+                                SYS_OFF_PRIO_PLATFORM,
+                                iss_power_off, NULL);
 }
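
ISS now registers with the generic restart/sys-off infrastructure instead
of providing the removed platform_halt()/platform_power_off()/
platform_restart() hooks. A minimal sketch of that registration pattern
using the same kernel APIs (hypothetical driver names):

    #include <linux/init.h>
    #include <linux/notifier.h>
    #include <linux/reboot.h>

    static int example_power_off(struct sys_off_data *data)
    {
            /* put the board into its lowest power state here */
            return NOTIFY_DONE;
    }

    static int example_restart(struct notifier_block *nb,
                               unsigned long action, void *cmd)
    {
            /* poke the board reset register here */
            return NOTIFY_DONE;
    }

    static struct notifier_block example_restart_block = {
            .notifier_call = example_restart,
    };

    static int __init example_reboot_init(void)
    {
            register_restart_handler(&example_restart_block);
            register_sys_off_handler(SYS_OFF_MODE_POWER_OFF,
                                     SYS_OFF_PRIO_DEFAULT,
                                     example_power_off, NULL);
            return 0;
    }
    arch_initcall(example_reboot_init);
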
index f50caaa..178cf96 100644 (file)
@@ -120,9 +120,9 @@ static void simdisk_submit_bio(struct bio *bio)
        bio_endio(bio);
 }
 
-static int simdisk_open(struct block_device *bdev, fmode_t mode)
+static int simdisk_open(struct gendisk *disk, blk_mode_t mode)
 {
-       struct simdisk *dev = bdev->bd_disk->private_data;
+       struct simdisk *dev = disk->private_data;
 
        spin_lock(&dev->lock);
        ++dev->users;
@@ -130,7 +130,7 @@ static int simdisk_open(struct block_device *bdev, fmode_t mode)
        return 0;
 }
 
-static void simdisk_release(struct gendisk *disk, fmode_t mode)
+static void simdisk_release(struct gendisk *disk)
 {
        struct simdisk *dev = disk->private_data;
        spin_lock(&dev->lock);
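
simdisk is converted to the block layer's new open/release prototypes:
open() now receives the gendisk and a blk_mode_t, and release() loses the
mode argument, matching the fops->open(disk, mode) and fops->release(disk)
calls in the block/bdev.c hunks below. A hedged sketch of a driver ops
table under the new signatures (hypothetical driver):

    #include <linux/blkdev.h>
    #include <linux/module.h>

    static int example_open(struct gendisk *disk, blk_mode_t mode)
    {
            /* disk->private_data carries the driver's per-device state */
            return 0;
    }

    static void example_release(struct gendisk *disk)
    {
            /* drop whatever reference example_open() took */
    }

    static const struct block_device_operations example_fops = {
            .owner   = THIS_MODULE,
            .open    = example_open,
            .release = example_release,
    };
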
index 0dc22c3..258e01a 100644 (file)
@@ -23,6 +23,7 @@
 #include <linux/platform_device.h>
 #include <linux/serial.h>
 #include <linux/serial_8250.h>
+#include <linux/timer.h>
 
 #include <asm/processor.h>
 #include <asm/platform.h>
@@ -41,51 +42,46 @@ static void led_print (int f, char *s)
                    break;
 }
 
-void platform_halt(void)
-{
-       led_print (0, "  HALT  ");
-       local_irq_disable();
-       while (1);
-}
-
-void platform_power_off(void)
+static int xt2000_power_off(struct sys_off_data *unused)
 {
        led_print (0, "POWEROFF");
        local_irq_disable();
        while (1);
+       return NOTIFY_DONE;
 }
 
-void platform_restart(void)
+static int xt2000_restart(struct notifier_block *this,
+                         unsigned long event, void *ptr)
 {
        /* Flush and reset the mmu, simulate a processor reset, and
         * jump to the reset vector. */
        cpu_reset();
-       /* control never gets here */
+
+       return NOTIFY_DONE;
 }
 
+static struct notifier_block xt2000_restart_block = {
+       .notifier_call = xt2000_restart,
+};
+
 void __init platform_setup(char** cmdline)
 {
        led_print (0, "LINUX   ");
 }
 
-/* early initialization */
+/* Heartbeat. Let the LED blink. */
 
-void __init platform_init(bp_tag_t *first)
-{
-}
+static void xt2000_heartbeat(struct timer_list *unused);
 
-/* Heartbeat. Let the LED blink. */
+static DEFINE_TIMER(heartbeat_timer, xt2000_heartbeat);
 
-void platform_heartbeat(void)
+static void xt2000_heartbeat(struct timer_list *unused)
 {
-       static int i, t;
+       static int i;
 
-       if (--t < 0)
-       {
-               t = 59;
-               led_print(7, i ? ".": " ");
-               i ^= 1;
-       }
+       led_print(7, i ? "." : " ");
+       i ^= 1;
+       mod_timer(&heartbeat_timer, jiffies + HZ / 2);
 }
 
 //#define RS_TABLE_SIZE 2
@@ -143,7 +139,11 @@ static int __init xt2000_setup_devinit(void)
 {
        platform_device_register(&xt2000_serial8250_device);
        platform_device_register(&xt2000_sonic_device);
-
+       mod_timer(&heartbeat_timer, jiffies + HZ / 2);
+       register_restart_handler(&xt2000_restart_block);
+       register_sys_off_handler(SYS_OFF_MODE_POWER_OFF,
+                                SYS_OFF_PRIO_DEFAULT,
+                                xt2000_power_off, NULL);
        return 0;
 }
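
The XT2000 heartbeat no longer relies on the removed platform_heartbeat()
call from the timer interrupt; it becomes a self-rearming kernel timer.
A small sketch of that DEFINE_TIMER()/mod_timer() pattern with
hypothetical names:

    #include <linux/jiffies.h>
    #include <linux/timer.h>

    static void example_blink(struct timer_list *unused);
    static DEFINE_TIMER(example_timer, example_blink);

    static void example_blink(struct timer_list *unused)
    {
            static int on;

            on ^= 1;        /* toggle an LED, update a display, ... */

            /* re-arm so the callback keeps firing twice a second */
            mod_timer(&example_timer, jiffies + HZ / 2);
    }

    /* Arm the first tick from an initcall or probe routine:
     *      mod_timer(&example_timer, jiffies + HZ / 2);
     */
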
 
index c79c1d0..a2432f0 100644 (file)
 #include <platform/lcd.h>
 #include <platform/hardware.h>
 
-void platform_halt(void)
-{
-       lcd_disp_at_pos(" HALT ", 0);
-       local_irq_disable();
-       while (1)
-               cpu_relax();
-}
-
-void platform_power_off(void)
+static int xtfpga_power_off(struct sys_off_data *unused)
 {
        lcd_disp_at_pos("POWEROFF", 0);
        local_irq_disable();
        while (1)
                cpu_relax();
+       return NOTIFY_DONE;
 }
 
-void platform_restart(void)
+static int xtfpga_restart(struct notifier_block *this,
+                         unsigned long event, void *ptr)
 {
        /* Try software reset first. */
        WRITE_ONCE(*(u32 *)XTFPGA_SWRST_VADDR, 0xdead);
@@ -58,9 +52,14 @@ void platform_restart(void)
         * simulate a processor reset, and jump to the reset vector.
         */
        cpu_reset();
-       /* control never gets here */
+
+       return NOTIFY_DONE;
 }
 
+static struct notifier_block xtfpga_restart_block = {
+       .notifier_call = xtfpga_restart,
+};
+
 #ifdef CONFIG_XTENSA_CALIBRATE_CCOUNT
 
 void __init platform_calibrate_ccount(void)
@@ -70,6 +69,14 @@ void __init platform_calibrate_ccount(void)
 
 #endif
 
+static void __init xtfpga_register_handlers(void)
+{
+       register_restart_handler(&xtfpga_restart_block);
+       register_sys_off_handler(SYS_OFF_MODE_POWER_OFF,
+                                SYS_OFF_PRIO_DEFAULT,
+                                xtfpga_power_off, NULL);
+}
+
 #ifdef CONFIG_USE_OF
 
 static void __init xtfpga_clk_setup(struct device_node *np)
@@ -134,6 +141,9 @@ static int __init machine_setup(void)
        if ((eth = of_find_compatible_node(eth, NULL, "opencores,ethoc")))
                update_local_mac(eth);
        of_node_put(eth);
+
+       xtfpga_register_handlers();
+
        return 0;
 }
 arch_initcall(machine_setup);
@@ -281,6 +291,8 @@ static int __init xtavnet_init(void)
        pr_info("XTFPGA: Ethernet MAC %pM\n", ethoc_pdata.hwaddr);
        ethoc_pdata.eth_clkfreq = *(long *)XTFPGA_CLKFRQ_VADDR;
 
+       xtfpga_register_handlers();
+
        return 0;
 }
 
index b31b053..46ada9d 100644 (file)
@@ -9,7 +9,7 @@ obj-y           := bdev.o fops.o bio.o elevator.o blk-core.o blk-sysfs.o \
                        blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
                        blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
                        genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o \
-                       disk-events.o blk-ia-ranges.o
+                       disk-events.o blk-ia-ranges.o early-lookup.o
 
 obj-$(CONFIG_BOUNCE)           += bounce.o
 obj-$(CONFIG_BLK_DEV_BSG_COMMON) += bsg.o
index 21c63bf..979e28a 100644 (file)
@@ -93,7 +93,7 @@ EXPORT_SYMBOL(invalidate_bdev);
  * Drop all buffers & page cache for given bdev range. This function bails
  * with error if bdev has other exclusive owner (such as filesystem).
  */
-int truncate_bdev_range(struct block_device *bdev, fmode_t mode,
+int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
                        loff_t lstart, loff_t lend)
 {
        /*
@@ -101,14 +101,14 @@ int truncate_bdev_range(struct block_device *bdev, fmode_t mode,
         * while we discard the buffer cache to avoid discarding buffers
         * under live filesystem.
         */
-       if (!(mode & FMODE_EXCL)) {
-               int err = bd_prepare_to_claim(bdev, truncate_bdev_range);
+       if (!(mode & BLK_OPEN_EXCL)) {
+               int err = bd_prepare_to_claim(bdev, truncate_bdev_range, NULL);
                if (err)
                        goto invalidate;
        }
 
        truncate_inode_pages_range(bdev->bd_inode->i_mapping, lstart, lend);
-       if (!(mode & FMODE_EXCL))
+       if (!(mode & BLK_OPEN_EXCL))
                bd_abort_claiming(bdev, truncate_bdev_range);
        return 0;
 
@@ -308,7 +308,7 @@ EXPORT_SYMBOL(thaw_bdev);
  * pseudo-fs
  */
 
-static  __cacheline_aligned_in_smp DEFINE_SPINLOCK(bdev_lock);
+static  __cacheline_aligned_in_smp DEFINE_MUTEX(bdev_lock);
 static struct kmem_cache * bdev_cachep __read_mostly;
 
 static struct inode *bdev_alloc_inode(struct super_block *sb)
@@ -415,6 +415,7 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
        bdev = I_BDEV(inode);
        mutex_init(&bdev->bd_fsfreeze_mutex);
        spin_lock_init(&bdev->bd_size_lock);
+       mutex_init(&bdev->bd_holder_lock);
        bdev->bd_partno = partno;
        bdev->bd_inode = inode;
        bdev->bd_queue = disk->queue;
@@ -463,39 +464,48 @@ long nr_blockdev_pages(void)
 /**
  * bd_may_claim - test whether a block device can be claimed
  * @bdev: block device of interest
- * @whole: whole block device containing @bdev, may equal @bdev
  * @holder: holder trying to claim @bdev
+ * @hops: holder ops
  *
  * Test whether @bdev can be claimed by @holder.
  *
- * CONTEXT:
- * spin_lock(&bdev_lock).
- *
  * RETURNS:
  * %true if @bdev can be claimed, %false otherwise.
  */
-static bool bd_may_claim(struct block_device *bdev, struct block_device *whole,
-                        void *holder)
+static bool bd_may_claim(struct block_device *bdev, void *holder,
+               const struct blk_holder_ops *hops)
 {
-       if (bdev->bd_holder == holder)
-               return true;     /* already a holder */
-       else if (bdev->bd_holder != NULL)
-               return false;    /* held by someone else */
-       else if (whole == bdev)
-               return true;     /* is a whole device which isn't held */
-
-       else if (whole->bd_holder == bd_may_claim)
-               return true;     /* is a partition of a device that is being partitioned */
-       else if (whole->bd_holder != NULL)
-               return false;    /* is a partition of a held device */
-       else
-               return true;     /* is a partition of an un-held device */
+       struct block_device *whole = bdev_whole(bdev);
+
+       lockdep_assert_held(&bdev_lock);
+
+       if (bdev->bd_holder) {
+               /*
+                * The same holder can always re-claim.
+                */
+               if (bdev->bd_holder == holder) {
+                       if (WARN_ON_ONCE(bdev->bd_holder_ops != hops))
+                               return false;
+                       return true;
+               }
+               return false;
+       }
+
+       /*
+        * If the whole device's holder is set to bd_may_claim, a partition on
+        * the device is claimed, but not the whole device.
+        */
+       if (whole != bdev &&
+           whole->bd_holder && whole->bd_holder != bd_may_claim)
+               return false;
+       return true;
 }
 
 /**
  * bd_prepare_to_claim - claim a block device
  * @bdev: block device of interest
  * @holder: holder trying to claim @bdev
+ * @hops: holder ops.
  *
  * Claim @bdev.  This function fails if @bdev is already claimed by another
  * holder and waits if another claiming is in progress. When this function returns, the caller
@@ -504,17 +514,18 @@ static bool bd_may_claim(struct block_device *bdev, struct block_device *whole,
  * RETURNS:
  * 0 if @bdev can be claimed, -EBUSY otherwise.
  */
-int bd_prepare_to_claim(struct block_device *bdev, void *holder)
+int bd_prepare_to_claim(struct block_device *bdev, void *holder,
+               const struct blk_holder_ops *hops)
 {
        struct block_device *whole = bdev_whole(bdev);
 
        if (WARN_ON_ONCE(!holder))
                return -EINVAL;
 retry:
-       spin_lock(&bdev_lock);
+       mutex_lock(&bdev_lock);
        /* if someone else claimed, fail */
-       if (!bd_may_claim(bdev, whole, holder)) {
-               spin_unlock(&bdev_lock);
+       if (!bd_may_claim(bdev, holder, hops)) {
+               mutex_unlock(&bdev_lock);
                return -EBUSY;
        }
 
@@ -524,7 +535,7 @@ retry:
                DEFINE_WAIT(wait);
 
                prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
-               spin_unlock(&bdev_lock);
+               mutex_unlock(&bdev_lock);
                schedule();
                finish_wait(wq, &wait);
                goto retry;
@@ -532,7 +543,7 @@ retry:
 
        /* yay, all mine */
        whole->bd_claiming = holder;
-       spin_unlock(&bdev_lock);
+       mutex_unlock(&bdev_lock);
        return 0;
 }
 EXPORT_SYMBOL_GPL(bd_prepare_to_claim); /* only for the loop driver */
@@ -550,16 +561,18 @@ static void bd_clear_claiming(struct block_device *whole, void *holder)
  * bd_finish_claiming - finish claiming of a block device
  * @bdev: block device of interest
  * @holder: holder that has claimed @bdev
+ * @hops: block device holder operations
  *
  * Finish exclusive open of a block device. Mark the device as exclusively
  * open by the holder and wake up all waiters for exclusive open to finish.
  */
-static void bd_finish_claiming(struct block_device *bdev, void *holder)
+static void bd_finish_claiming(struct block_device *bdev, void *holder,
+               const struct blk_holder_ops *hops)
 {
        struct block_device *whole = bdev_whole(bdev);
 
-       spin_lock(&bdev_lock);
-       BUG_ON(!bd_may_claim(bdev, whole, holder));
+       mutex_lock(&bdev_lock);
+       BUG_ON(!bd_may_claim(bdev, holder, hops));
        /*
         * Note that for a whole device bd_holders will be incremented twice,
         * and bd_holder will be set to bd_may_claim before being set to holder
@@ -567,9 +580,12 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder)
        whole->bd_holders++;
        whole->bd_holder = bd_may_claim;
        bdev->bd_holders++;
+       mutex_lock(&bdev->bd_holder_lock);
        bdev->bd_holder = holder;
+       bdev->bd_holder_ops = hops;
+       mutex_unlock(&bdev->bd_holder_lock);
        bd_clear_claiming(whole, holder);
-       spin_unlock(&bdev_lock);
+       mutex_unlock(&bdev_lock);
 }
 
 /**
@@ -583,12 +599,47 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder)
  */
 void bd_abort_claiming(struct block_device *bdev, void *holder)
 {
-       spin_lock(&bdev_lock);
+       mutex_lock(&bdev_lock);
        bd_clear_claiming(bdev_whole(bdev), holder);
-       spin_unlock(&bdev_lock);
+       mutex_unlock(&bdev_lock);
 }
 EXPORT_SYMBOL(bd_abort_claiming);
 
+static void bd_end_claim(struct block_device *bdev, void *holder)
+{
+       struct block_device *whole = bdev_whole(bdev);
+       bool unblock = false;
+
+       /*
+        * Release a claim on the device.  The holder fields are protected with
+        * bdev_lock.  open_mutex is used to synchronize disk_holder unlinking.
+        */
+       mutex_lock(&bdev_lock);
+       WARN_ON_ONCE(bdev->bd_holder != holder);
+       WARN_ON_ONCE(--bdev->bd_holders < 0);
+       WARN_ON_ONCE(--whole->bd_holders < 0);
+       if (!bdev->bd_holders) {
+               mutex_lock(&bdev->bd_holder_lock);
+               bdev->bd_holder = NULL;
+               bdev->bd_holder_ops = NULL;
+               mutex_unlock(&bdev->bd_holder_lock);
+               if (bdev->bd_write_holder)
+                       unblock = true;
+       }
+       if (!whole->bd_holders)
+               whole->bd_holder = NULL;
+       mutex_unlock(&bdev_lock);
+
+       /*
+        * If this was the last claim, remove holder link and unblock evpoll if
+        * it was a write holder.
+        */
+       if (unblock) {
+               disk_unblock_events(bdev->bd_disk);
+               bdev->bd_write_holder = false;
+       }
+}
+
 static void blkdev_flush_mapping(struct block_device *bdev)
 {
        WARN_ON_ONCE(bdev->bd_holders);
@@ -597,13 +648,13 @@ static void blkdev_flush_mapping(struct block_device *bdev)
        bdev_write_inode(bdev);
 }
 
-static int blkdev_get_whole(struct block_device *bdev, fmode_t mode)
+static int blkdev_get_whole(struct block_device *bdev, blk_mode_t mode)
 {
        struct gendisk *disk = bdev->bd_disk;
        int ret;
 
        if (disk->fops->open) {
-               ret = disk->fops->open(bdev, mode);
+               ret = disk->fops->open(disk, mode);
                if (ret) {
                        /* avoid ghost partitions on a removed medium */
                        if (ret == -ENOMEDIUM &&
@@ -621,22 +672,19 @@ static int blkdev_get_whole(struct block_device *bdev, fmode_t mode)
        return 0;
 }
 
-static void blkdev_put_whole(struct block_device *bdev, fmode_t mode)
+static void blkdev_put_whole(struct block_device *bdev)
 {
        if (atomic_dec_and_test(&bdev->bd_openers))
                blkdev_flush_mapping(bdev);
        if (bdev->bd_disk->fops->release)
-               bdev->bd_disk->fops->release(bdev->bd_disk, mode);
+               bdev->bd_disk->fops->release(bdev->bd_disk);
 }
 
-static int blkdev_get_part(struct block_device *part, fmode_t mode)
+static int blkdev_get_part(struct block_device *part, blk_mode_t mode)
 {
        struct gendisk *disk = part->bd_disk;
        int ret;
 
-       if (atomic_read(&part->bd_openers))
-               goto done;
-
        ret = blkdev_get_whole(bdev_whole(part), mode);
        if (ret)
                return ret;
@@ -645,26 +693,27 @@ static int blkdev_get_part(struct block_device *part, fmode_t mode)
        if (!bdev_nr_sectors(part))
                goto out_blkdev_put;
 
-       disk->open_partitions++;
-       set_init_blocksize(part);
-done:
+       if (!atomic_read(&part->bd_openers)) {
+               disk->open_partitions++;
+               set_init_blocksize(part);
+       }
        atomic_inc(&part->bd_openers);
        return 0;
 
 out_blkdev_put:
-       blkdev_put_whole(bdev_whole(part), mode);
+       blkdev_put_whole(bdev_whole(part));
        return ret;
 }
 
-static void blkdev_put_part(struct block_device *part, fmode_t mode)
+static void blkdev_put_part(struct block_device *part)
 {
        struct block_device *whole = bdev_whole(part);
 
-       if (!atomic_dec_and_test(&part->bd_openers))
-               return;
-       blkdev_flush_mapping(part);
-       whole->bd_disk->open_partitions--;
-       blkdev_put_whole(whole, mode);
+       if (atomic_dec_and_test(&part->bd_openers)) {
+               blkdev_flush_mapping(part);
+               whole->bd_disk->open_partitions--;
+       }
+       blkdev_put_whole(whole);
 }
 
 struct block_device *blkdev_get_no_open(dev_t dev)
@@ -695,17 +744,17 @@ void blkdev_put_no_open(struct block_device *bdev)
 {
        put_device(&bdev->bd_device);
 }
-
+       
 /**
  * blkdev_get_by_dev - open a block device by device number
  * @dev: device number of block device to open
- * @mode: FMODE_* mask
+ * @mode: open mode (BLK_OPEN_*)
  * @holder: exclusive holder identifier
+ * @hops: holder operations
  *
- * Open the block device described by device number @dev. If @mode includes
- * %FMODE_EXCL, the block device is opened with exclusive access.  Specifying
- * %FMODE_EXCL with a %NULL @holder is invalid.  Exclusive opens may nest for
- * the same @holder.
+ * Open the block device described by device number @dev. If @holder is not
+ * %NULL, the block device is opened with exclusive access.  Exclusive opens may
+ * nest for the same @holder.
  *
  * Use this interface ONLY if you really do not have anything better - i.e. when
  * you are behind a truly sucky interface and all you are given is a device
@@ -717,7 +766,8 @@ void blkdev_put_no_open(struct block_device *bdev)
  * RETURNS:
  * Reference to the block_device on success, ERR_PTR(-errno) on failure.
  */
-struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
+struct block_device *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
+               const struct blk_holder_ops *hops)
 {
        bool unblock_events = true;
        struct block_device *bdev;
@@ -726,8 +776,8 @@ struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
 
        ret = devcgroup_check_permission(DEVCG_DEV_BLOCK,
                        MAJOR(dev), MINOR(dev),
-                       ((mode & FMODE_READ) ? DEVCG_ACC_READ : 0) |
-                       ((mode & FMODE_WRITE) ? DEVCG_ACC_WRITE : 0));
+                       ((mode & BLK_OPEN_READ) ? DEVCG_ACC_READ : 0) |
+                       ((mode & BLK_OPEN_WRITE) ? DEVCG_ACC_WRITE : 0));
        if (ret)
                return ERR_PTR(ret);
 
@@ -736,10 +786,16 @@ struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
                return ERR_PTR(-ENXIO);
        disk = bdev->bd_disk;
 
-       if (mode & FMODE_EXCL) {
-               ret = bd_prepare_to_claim(bdev, holder);
+       if (holder) {
+               mode |= BLK_OPEN_EXCL;
+               ret = bd_prepare_to_claim(bdev, holder, hops);
                if (ret)
                        goto put_blkdev;
+       } else {
+               if (WARN_ON_ONCE(mode & BLK_OPEN_EXCL)) {
+                       ret = -EIO;
+                       goto put_blkdev;
+               }
        }
 
        disk_block_events(disk);
@@ -756,8 +812,8 @@ struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
                ret = blkdev_get_whole(bdev, mode);
        if (ret)
                goto put_module;
-       if (mode & FMODE_EXCL) {
-               bd_finish_claiming(bdev, holder);
+       if (holder) {
+               bd_finish_claiming(bdev, holder, hops);
 
                /*
                 * Block event polling for write claims if requested.  Any write
@@ -766,7 +822,7 @@ struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
                 * writeable reference is too fragile given the way @mode is
                 * used in blkdev_get/put().
                 */
-               if ((mode & FMODE_WRITE) && !bdev->bd_write_holder &&
+               if ((mode & BLK_OPEN_WRITE) && !bdev->bd_write_holder &&
                    (disk->event_flags & DISK_EVENT_FLAG_BLOCK_ON_EXCL_WRITE)) {
                        bdev->bd_write_holder = true;
                        unblock_events = false;
@@ -780,7 +836,7 @@ struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
 put_module:
        module_put(disk->fops->owner);
 abort_claiming:
-       if (mode & FMODE_EXCL)
+       if (holder)
                bd_abort_claiming(bdev, holder);
        mutex_unlock(&disk->open_mutex);
        disk_unblock_events(disk);
@@ -793,13 +849,13 @@ EXPORT_SYMBOL(blkdev_get_by_dev);
 /**
  * blkdev_get_by_path - open a block device by name
  * @path: path to the block device to open
- * @mode: FMODE_* mask
+ * @mode: open mode (BLK_OPEN_*)
  * @holder: exclusive holder identifier
+ * @hops: holder operations
  *
- * Open the block device described by the device file at @path.  If @mode
- * includes %FMODE_EXCL, the block device is opened with exclusive access.
- * Specifying %FMODE_EXCL with a %NULL @holder is invalid.  Exclusive opens may
- * nest for the same @holder.
+ * Open the block device described by the device file at @path.  If @holder is
+ * not %NULL, the block device is opened with exclusive access.  Exclusive opens
+ * may nest for the same @holder.
  *
  * CONTEXT:
  * Might sleep.
@@ -807,8 +863,8 @@ EXPORT_SYMBOL(blkdev_get_by_dev);
  * RETURNS:
  * Reference to the block_device on success, ERR_PTR(-errno) on failure.
  */
-struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
-                                       void *holder)
+struct block_device *blkdev_get_by_path(const char *path, blk_mode_t mode,
+               void *holder, const struct blk_holder_ops *hops)
 {
        struct block_device *bdev;
        dev_t dev;
@@ -818,9 +874,9 @@ struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
        if (error)
                return ERR_PTR(error);
 
-       bdev = blkdev_get_by_dev(dev, mode, holder);
-       if (!IS_ERR(bdev) && (mode & FMODE_WRITE) && bdev_read_only(bdev)) {
-               blkdev_put(bdev, mode);
+       bdev = blkdev_get_by_dev(dev, mode, holder, hops);
+       if (!IS_ERR(bdev) && (mode & BLK_OPEN_WRITE) && bdev_read_only(bdev)) {
+               blkdev_put(bdev, holder);
                return ERR_PTR(-EACCES);
        }
 
@@ -828,7 +884,7 @@ struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
 }
 EXPORT_SYMBOL(blkdev_get_by_path);
 
-void blkdev_put(struct block_device *bdev, fmode_t mode)
+void blkdev_put(struct block_device *bdev, void *holder)
 {
        struct gendisk *disk = bdev->bd_disk;
 
@@ -843,36 +899,8 @@ void blkdev_put(struct block_device *bdev, fmode_t mode)
                sync_blockdev(bdev);
 
        mutex_lock(&disk->open_mutex);
-       if (mode & FMODE_EXCL) {
-               struct block_device *whole = bdev_whole(bdev);
-               bool bdev_free;
-
-               /*
-                * Release a claim on the device.  The holder fields
-                * are protected with bdev_lock.  open_mutex is to
-                * synchronize disk_holder unlinking.
-                */
-               spin_lock(&bdev_lock);
-
-               WARN_ON_ONCE(--bdev->bd_holders < 0);
-               WARN_ON_ONCE(--whole->bd_holders < 0);
-
-               if ((bdev_free = !bdev->bd_holders))
-                       bdev->bd_holder = NULL;
-               if (!whole->bd_holders)
-                       whole->bd_holder = NULL;
-
-               spin_unlock(&bdev_lock);
-
-               /*
-                * If this was the last claim, remove holder link and
-                * unblock evpoll if it was a write holder.
-                */
-               if (bdev_free && bdev->bd_write_holder) {
-                       disk_unblock_events(disk);
-                       bdev->bd_write_holder = false;
-               }
-       }
+       if (holder)
+               bd_end_claim(bdev, holder);
 
        /*
         * Trigger event checking and tell drivers to flush MEDIA_CHANGE
@@ -882,9 +910,9 @@ void blkdev_put(struct block_device *bdev, fmode_t mode)
        disk_flush_events(disk, DISK_EVENT_MEDIA_CHANGE);
 
        if (bdev_is_partition(bdev))
-               blkdev_put_part(bdev, mode);
+               blkdev_put_part(bdev);
        else
-               blkdev_put_whole(bdev, mode);
+               blkdev_put_whole(bdev);
        mutex_unlock(&disk->open_mutex);
 
        module_put(disk->fops->owner);
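
The bdev.c hunks above rework the open/close contract: exclusivity is now requested by passing a non-NULL holder rather than an FMODE_EXCL bit, the mode is built from BLK_OPEN_* flags, and blkdev_put() takes the holder instead of the mode. A minimal caller sketch follows; example_open(), example_close() and my_holder are hypothetical names, not part of the patch:

#include <linux/blkdev.h>

static struct block_device *example_open(const char *path, void *my_holder)
{
	struct block_device *bdev;

	/* non-NULL holder => exclusive open; NULL hops selects default holder behaviour */
	bdev = blkdev_get_by_path(path, BLK_OPEN_READ | BLK_OPEN_WRITE,
				  my_holder, NULL);
	if (IS_ERR(bdev))
		return bdev;
	/* ... use the device ... */
	return bdev;
}

static void example_close(struct block_device *bdev, void *my_holder)
{
	/* pass the same holder used at open time (NULL for a non-exclusive open) */
	blkdev_put(bdev, my_holder);
}
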
index 3164e31..09bbbcf 100644 (file)
@@ -5403,6 +5403,10 @@ void bfq_put_queue(struct bfq_queue *bfqq)
        if (bfqq->bfqd->last_completed_rq_bfqq == bfqq)
                bfqq->bfqd->last_completed_rq_bfqq = NULL;
 
+       WARN_ON_ONCE(!list_empty(&bfqq->fifo));
+       WARN_ON_ONCE(!RB_EMPTY_ROOT(&bfqq->sort_list));
+       WARN_ON_ONCE(bfqq->dispatched);
+
        kmem_cache_free(bfq_pool, bfqq);
        bfqg_and_blkg_put(bfqg);
 }
@@ -7135,6 +7139,7 @@ static void bfq_exit_queue(struct elevator_queue *e)
 {
        struct bfq_data *bfqd = e->elevator_data;
        struct bfq_queue *bfqq, *n;
+       unsigned int actuator;
 
        hrtimer_cancel(&bfqd->idle_slice_timer);
 
@@ -7143,6 +7148,10 @@ static void bfq_exit_queue(struct elevator_queue *e)
                bfq_deactivate_bfqq(bfqd, bfqq, false, false);
        spin_unlock_irq(&bfqd->lock);
 
+       for (actuator = 0; actuator < bfqd->num_actuators; actuator++)
+               WARN_ON_ONCE(bfqd->rq_in_driver[actuator]);
+       WARN_ON_ONCE(bfqd->tot_rq_in_driver);
+
        hrtimer_cancel(&bfqd->idle_slice_timer);
 
        /* release oom-queue reference to root group */
index 043944f..8672179 100644 (file)
@@ -1138,6 +1138,14 @@ int bio_add_page(struct bio *bio, struct page *page,
 }
 EXPORT_SYMBOL(bio_add_page);
 
+void bio_add_folio_nofail(struct bio *bio, struct folio *folio, size_t len,
+                         size_t off)
+{
+       WARN_ON_ONCE(len > UINT_MAX);
+       WARN_ON_ONCE(off > UINT_MAX);
+       __bio_add_page(bio, &folio->page, len, off);
+}
+
 /**
  * bio_add_folio - Attempt to add part of a folio to a bio.
  * @bio: BIO to add to.
@@ -1169,7 +1177,7 @@ void __bio_release_pages(struct bio *bio, bool mark_dirty)
        bio_for_each_segment_all(bvec, bio, iter_all) {
                if (mark_dirty && !PageCompound(bvec->bv_page))
                        set_page_dirty_lock(bvec->bv_page);
-               put_page(bvec->bv_page);
+               bio_release_page(bio, bvec->bv_page);
        }
 }
 EXPORT_SYMBOL_GPL(__bio_release_pages);
@@ -1191,7 +1199,6 @@ void bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
        bio->bi_io_vec = (struct bio_vec *)iter->bvec;
        bio->bi_iter.bi_bvec_done = iter->iov_offset;
        bio->bi_iter.bi_size = size;
-       bio_set_flag(bio, BIO_NO_PAGE_REF);
        bio_set_flag(bio, BIO_CLONED);
 }
 
@@ -1206,7 +1213,7 @@ static int bio_iov_add_page(struct bio *bio, struct page *page,
        }
 
        if (same_page)
-               put_page(page);
+               bio_release_page(bio, page);
        return 0;
 }
 
@@ -1220,7 +1227,7 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
                        queue_max_zone_append_sectors(q), &same_page) != len)
                return -EINVAL;
        if (same_page)
-               put_page(page);
+               bio_release_page(bio, page);
        return 0;
 }
 
@@ -1231,10 +1238,10 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
  * @bio: bio to add pages to
  * @iter: iov iterator describing the region to be mapped
  *
- * Pins pages from *iter and appends them to @bio's bvec array. The
- * pages will have to be released using put_page() when done.
- * For multi-segment *iter, this function only adds pages from the
- * next non-empty segment of the iov iterator.
+ * Extracts pages from *iter and appends them to @bio's bvec array.  The pages
+ * will have to be cleaned up in the way indicated by the BIO_PAGE_PINNED flag.
+ * For a multi-segment *iter, this function only adds pages from the next
+ * non-empty segment of the iov iterator.
  */
 static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
@@ -1266,9 +1273,9 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
         * result to ensure the bio's total size is correct. The remainder of
         * the iov data will be picked up in the next bio iteration.
         */
-       size = iov_iter_get_pages(iter, pages,
-                                 UINT_MAX - bio->bi_iter.bi_size,
-                                 nr_pages, &offset, extraction_flags);
+       size = iov_iter_extract_pages(iter, &pages,
+                                     UINT_MAX - bio->bi_iter.bi_size,
+                                     nr_pages, extraction_flags, &offset);
        if (unlikely(size <= 0))
                return size ? size : -EFAULT;
 
@@ -1301,7 +1308,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
        iov_iter_revert(iter, left);
 out:
        while (i < nr_pages)
-               put_page(pages[i++]);
+               bio_release_page(bio, pages[i++]);
 
        return ret;
 }
@@ -1336,6 +1343,8 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
                return 0;
        }
 
+       if (iov_iter_extract_will_pin(iter))
+               bio_set_flag(bio, BIO_PAGE_PINNED);
        do {
                ret = __bio_iov_iter_get_pages(bio, iter);
        } while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
@@ -1489,8 +1498,8 @@ void bio_set_pages_dirty(struct bio *bio)
  * the BIO and re-dirty the pages in process context.
  *
  * It is expected that bio_check_pages_dirty() will wholly own the BIO from
- * here on.  It will run one put_page() against each page and will run one
- * bio_put() against the BIO.
+ * here on.  It will unpin each page and will run one bio_put() against the
+ * BIO.
  */
 
 static void bio_dirty_fn(struct work_struct *work);
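
The bio.c changes above switch direct-I/O page references over to the pin-based iov_iter_extract_pages() model: BIO_PAGE_PINNED records whether the iterator pinned its pages, and every former put_page() becomes bio_release_page(). The helper itself is defined outside these hunks; a sketch of its assumed shape, on the premise that pinned pages are dropped with unpin_user_page():

/* Assumed shape of the helper used above; not shown in these hunks. */
static inline void bio_release_page(struct bio *bio, struct page *page)
{
	if (bio_flagged(bio, BIO_PAGE_PINNED))
		unpin_user_page(page);
	/* iterators that do not pin take no page reference, so nothing to drop */
}
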
index 842e5e1..3ec2133 100644 (file)
@@ -34,7 +34,7 @@ int blkcg_set_fc_appid(char *app_id, u64 cgrp_id, size_t app_id_len)
         * the vmid from the fabric.
         * Adding the overhead of a lock is not necessary.
         */
-       strlcpy(blkcg->fc_app_id, app_id, app_id_len);
+       strscpy(blkcg->fc_app_id, app_id, app_id_len);
        css_put(css);
 out_cgrp_put:
        cgroup_put(cgrp);
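
strscpy() differs from strlcpy() in that it never reads beyond a NUL in the source buffer and reports truncation through its return value rather than returning the would-be length. A small illustration with hypothetical buffers:

	char dst[16];
	ssize_t n = strscpy(dst, src, sizeof(dst));
	if (n == -E2BIG)
		pr_warn("app_id truncated\n");	/* destination was too small */
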
index 0ce64dd..aaf9903 100644 (file)
@@ -34,6 +34,8 @@
 #include "blk-ioprio.h"
 #include "blk-throttle.h"
 
+static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu);
+
 /*
  * blkcg_pol_mutex protects blkcg_policy[] and policy [de]activation.
  * blkcg_pol_register_mutex nests outside of it and synchronizes entire
@@ -56,6 +58,8 @@ static LIST_HEAD(all_blkcgs);         /* protected by blkcg_pol_mutex */
 
 bool blkcg_debug_stats = false;
 
+static DEFINE_RAW_SPINLOCK(blkg_stat_lock);
+
 #define BLKG_DESTROY_BATCH_SIZE  64
 
 /*
@@ -163,10 +167,20 @@ static void blkg_free(struct blkcg_gq *blkg)
 static void __blkg_release(struct rcu_head *rcu)
 {
        struct blkcg_gq *blkg = container_of(rcu, struct blkcg_gq, rcu_head);
+       struct blkcg *blkcg = blkg->blkcg;
+       int cpu;
 
 #ifdef CONFIG_BLK_CGROUP_PUNT_BIO
        WARN_ON(!bio_list_empty(&blkg->async_bios));
 #endif
+       /*
+        * Flush all the non-empty percpu lockless lists before releasing
+        * us, given these stats belong to us.
+        *
+        * blkg_stat_lock is for serializing blkg stat update
+        */
+       for_each_possible_cpu(cpu)
+               __blkcg_rstat_flush(blkcg, cpu);
 
        /* release the blkcg and parent blkg refs this blkg has been holding */
        css_put(&blkg->blkcg->css);
@@ -610,8 +624,13 @@ static int blkcg_reset_stats(struct cgroup_subsys_state *css,
                        struct blkg_iostat_set *bis =
                                per_cpu_ptr(blkg->iostat_cpu, cpu);
                        memset(bis, 0, sizeof(*bis));
+
+                       /* Re-initialize the cleared blkg_iostat_set */
+                       u64_stats_init(&bis->sync);
+                       bis->blkg = blkg;
                }
                memset(&blkg->iostat, 0, sizeof(blkg->iostat));
+               u64_stats_init(&blkg->iostat.sync);
 
                for (i = 0; i < BLKCG_MAX_POLS; i++) {
                        struct blkcg_policy *pol = blkcg_policy[i];
@@ -748,6 +767,13 @@ int blkg_conf_open_bdev(struct blkg_conf_ctx *ctx)
                return -ENODEV;
        }
 
+       mutex_lock(&bdev->bd_queue->rq_qos_mutex);
+       if (!disk_live(bdev->bd_disk)) {
+               blkdev_put_no_open(bdev);
+               mutex_unlock(&bdev->bd_queue->rq_qos_mutex);
+               return -ENODEV;
+       }
+
        ctx->body = input;
        ctx->bdev = bdev;
        return 0;
@@ -892,6 +918,7 @@ EXPORT_SYMBOL_GPL(blkg_conf_prep);
  */
 void blkg_conf_exit(struct blkg_conf_ctx *ctx)
        __releases(&ctx->bdev->bd_queue->queue_lock)
+       __releases(&ctx->bdev->bd_queue->rq_qos_mutex)
 {
        if (ctx->blkg) {
                spin_unlock_irq(&bdev_get_queue(ctx->bdev)->queue_lock);
@@ -899,6 +926,7 @@ void blkg_conf_exit(struct blkg_conf_ctx *ctx)
        }
 
        if (ctx->bdev) {
+               mutex_unlock(&ctx->bdev->bd_queue->rq_qos_mutex);
                blkdev_put_no_open(ctx->bdev);
                ctx->body = NULL;
                ctx->bdev = NULL;
@@ -951,16 +979,12 @@ static void blkcg_iostat_update(struct blkcg_gq *blkg, struct blkg_iostat *cur,
        u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags);
 }
 
-static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
+static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
 {
-       struct blkcg *blkcg = css_to_blkcg(css);
        struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
        struct llist_node *lnode;
        struct blkg_iostat_set *bisc, *next_bisc;
-
-       /* Root-level stats are sourced from system-wide IO stats */
-       if (!cgroup_parent(css->cgroup))
-               return;
+       unsigned long flags;
 
        rcu_read_lock();
 
@@ -969,6 +993,14 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
                goto out;
 
        /*
+        * For covering concurrent parent blkg update from blkg_release().
+        *
+        * When flushing from cgroup, cgroup_rstat_lock is always held, so
+        * this lock won't cause contention most of the time.
+        */
+       raw_spin_lock_irqsave(&blkg_stat_lock, flags);
+
+       /*
         * Iterate only the iostat_cpu's queued in the lockless list.
         */
        llist_for_each_entry_safe(bisc, next_bisc, lnode, lnode) {
@@ -991,13 +1023,19 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
                if (parent && parent->parent)
                        blkcg_iostat_update(parent, &blkg->iostat.cur,
                                            &blkg->iostat.last);
-               percpu_ref_put(&blkg->refcnt);
        }
-
+       raw_spin_unlock_irqrestore(&blkg_stat_lock, flags);
 out:
        rcu_read_unlock();
 }
 
+static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
+{
+       /* Root-level stats are sourced from system-wide IO stats */
+       if (cgroup_parent(css->cgroup))
+               __blkcg_rstat_flush(css_to_blkcg(css), cpu);
+}
+
 /*
  * We source root cgroup stats from the system-wide stats to avoid
  * tracking the same information twice and incurring overhead when no
@@ -2075,7 +2113,6 @@ void blk_cgroup_bio_start(struct bio *bio)
 
                llist_add(&bis->lnode, lhead);
                WRITE_ONCE(bis->lqueued, true);
-               percpu_ref_get(&bis->blkg->refcnt);
        }
 
        u64_stats_update_end_irqrestore(&bis->sync, flags);
index 00c7433..3fc68b9 100644 (file)
@@ -420,6 +420,7 @@ struct request_queue *blk_alloc_queue(int node_id)
        mutex_init(&q->debugfs_mutex);
        mutex_init(&q->sysfs_lock);
        mutex_init(&q->sysfs_dir_lock);
+       mutex_init(&q->rq_qos_mutex);
        spin_lock_init(&q->queue_lock);
 
        init_waitqueue_head(&q->mq_freeze_wq);
@@ -520,7 +521,7 @@ static inline int bio_check_eod(struct bio *bio)
        sector_t maxsector = bdev_nr_sectors(bio->bi_bdev);
        unsigned int nr_sectors = bio_sectors(bio);
 
-       if (nr_sectors && maxsector &&
+       if (nr_sectors &&
            (nr_sectors > maxsector ||
             bio->bi_iter.bi_sector > maxsector - nr_sectors)) {
                pr_info_ratelimited("%s: attempt to access beyond end of device\n"
index 04698ed..dba392c 100644 (file)
@@ -188,7 +188,9 @@ static void blk_flush_complete_seq(struct request *rq,
 
        case REQ_FSEQ_DATA:
                list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
-               blk_mq_add_to_requeue_list(rq, BLK_MQ_INSERT_AT_HEAD);
+               spin_lock(&q->requeue_lock);
+               list_add_tail(&rq->queuelist, &q->flush_list);
+               spin_unlock(&q->requeue_lock);
                blk_mq_kick_requeue_list(q);
                break;
 
@@ -346,7 +348,10 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
        smp_wmb();
        req_ref_set(flush_rq, 1);
 
-       blk_mq_add_to_requeue_list(flush_rq, 0);
+       spin_lock(&q->requeue_lock);
+       list_add_tail(&flush_rq->queuelist, &q->flush_list);
+       spin_unlock(&q->requeue_lock);
+
        blk_mq_kick_requeue_list(q);
 }
 
@@ -376,22 +381,29 @@ static enum rq_end_io_ret mq_flush_data_end_io(struct request *rq,
        return RQ_END_IO_NONE;
 }
 
-/**
- * blk_insert_flush - insert a new PREFLUSH/FUA request
- * @rq: request to insert
- *
- * To be called from __elv_add_request() for %ELEVATOR_INSERT_FLUSH insertions.
- * or __blk_mq_run_hw_queue() to dispatch request.
- * @rq is being submitted.  Analyze what needs to be done and put it on the
- * right queue.
+static void blk_rq_init_flush(struct request *rq)
+{
+       rq->flush.seq = 0;
+       INIT_LIST_HEAD(&rq->flush.list);
+       rq->rq_flags |= RQF_FLUSH_SEQ;
+       rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
+       rq->end_io = mq_flush_data_end_io;
+}
+
+/*
+ * Insert a PREFLUSH/FUA request into the flush state machine.
+ * Returns true if the request has been consumed by the flush state machine,
+ * or false if the caller should continue to process it.
  */
-void blk_insert_flush(struct request *rq)
+bool blk_insert_flush(struct request *rq)
 {
        struct request_queue *q = rq->q;
        unsigned long fflags = q->queue_flags;  /* may change, cache */
        unsigned int policy = blk_flush_policy(fflags, rq);
        struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
-       struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
+
+       /* FLUSH/FUA request must never be merged */
+       WARN_ON_ONCE(rq->bio != rq->biotail);
 
        /*
         * @policy now records what operations need to be done.  Adjust
@@ -408,45 +420,45 @@ void blk_insert_flush(struct request *rq)
         */
        rq->cmd_flags |= REQ_SYNC;
 
-       /*
-        * An empty flush handed down from a stacking driver may
-        * translate into nothing if the underlying device does not
-        * advertise a write-back cache.  In this case, simply
-        * complete the request.
-        */
-       if (!policy) {
+       switch (policy) {
+       case 0:
+               /*
+                * An empty flush handed down from a stacking driver may
+                * translate into nothing if the underlying device does not
+                * advertise a write-back cache.  In this case, simply
+                * complete the request.
+                */
                blk_mq_end_request(rq, 0);
-               return;
-       }
-
-       BUG_ON(rq->bio != rq->biotail); /*assumes zero or single bio rq */
-
-       /*
-        * If there's data but flush is not necessary, the request can be
-        * processed directly without going through flush machinery.  Queue
-        * for normal execution.
-        */
-       if ((policy & REQ_FSEQ_DATA) &&
-           !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
-               blk_mq_request_bypass_insert(rq, 0);
-               blk_mq_run_hw_queue(hctx, false);
-               return;
+               return true;
+       case REQ_FSEQ_DATA:
+               /*
+                * If there's data, but no flush is necessary, the request can
+                * be processed directly without going through flush machinery.
+                * Queue for normal execution.
+                */
+               return false;
+       case REQ_FSEQ_DATA | REQ_FSEQ_POSTFLUSH:
+               /*
+                * Initialize the flush fields and completion handler to trigger
+                * the post flush, and then just pass the command on.
+                */
+               blk_rq_init_flush(rq);
+               rq->flush.seq |= REQ_FSEQ_POSTFLUSH;
+               spin_lock_irq(&fq->mq_flush_lock);
+               list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
+               spin_unlock_irq(&fq->mq_flush_lock);
+               return false;
+       default:
+               /*
+                * Mark the request as part of a flush sequence and submit it
+                * for further processing to the flush state machine.
+                */
+               blk_rq_init_flush(rq);
+               spin_lock_irq(&fq->mq_flush_lock);
+               blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
+               spin_unlock_irq(&fq->mq_flush_lock);
+               return true;
        }
-
-       /*
-        * @rq should go through flush machinery.  Mark it part of flush
-        * sequence and submit for further processing.
-        */
-       memset(&rq->flush, 0, sizeof(rq->flush));
-       INIT_LIST_HEAD(&rq->flush.list);
-       rq->rq_flags |= RQF_FLUSH_SEQ;
-       rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
-
-       rq->end_io = mq_flush_data_end_io;
-
-       spin_lock_irq(&fq->mq_flush_lock);
-       blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
-       spin_unlock_irq(&fq->mq_flush_lock);
 }
 
 /**
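
blk_insert_flush() now reports whether the flush state machine consumed the request, so the caller decides whether to keep issuing it. The matching call site in blk_mq_submit_bio() appears later in this diff; its pattern, for reference:

	if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq))
		return;		/* consumed by the flush state machine */
	/* otherwise fall through and issue rq through the normal path */
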
index 63fc020..25dd4db 100644 (file)
@@ -77,6 +77,10 @@ static void ioc_destroy_icq(struct io_cq *icq)
        struct elevator_type *et = q->elevator->type;
 
        lockdep_assert_held(&ioc->lock);
+       lockdep_assert_held(&q->queue_lock);
+
+       if (icq->flags & ICQ_DESTROYED)
+               return;
 
        radix_tree_delete(&ioc->icq_tree, icq->q->id);
        hlist_del_init(&icq->ioc_node);
@@ -128,12 +132,7 @@ static void ioc_release_fn(struct work_struct *work)
                        spin_lock(&q->queue_lock);
                        spin_lock(&ioc->lock);
 
-                       /*
-                        * The icq may have been destroyed when the ioc lock
-                        * was released.
-                        */
-                       if (!(icq->flags & ICQ_DESTROYED))
-                               ioc_destroy_icq(icq);
+                       ioc_destroy_icq(icq);
 
                        spin_unlock(&q->queue_lock);
                        rcu_read_unlock();
@@ -171,23 +170,20 @@ static bool ioc_delay_free(struct io_context *ioc)
  */
 void ioc_clear_queue(struct request_queue *q)
 {
-       LIST_HEAD(icq_list);
-
        spin_lock_irq(&q->queue_lock);
-       list_splice_init(&q->icq_list, &icq_list);
-       spin_unlock_irq(&q->queue_lock);
-
-       rcu_read_lock();
-       while (!list_empty(&icq_list)) {
+       while (!list_empty(&q->icq_list)) {
                struct io_cq *icq =
-                       list_entry(icq_list.next, struct io_cq, q_node);
-
-               spin_lock_irq(&icq->ioc->lock);
-               if (!(icq->flags & ICQ_DESTROYED))
-                       ioc_destroy_icq(icq);
-               spin_unlock_irq(&icq->ioc->lock);
+                       list_first_entry(&q->icq_list, struct io_cq, q_node);
+
+               /*
+                * Other context won't hold ioc lock to wait for queue_lock, see
+                * details in ioc_release_fn().
+                */
+               spin_lock(&icq->ioc->lock);
+               ioc_destroy_icq(icq);
+               spin_unlock(&icq->ioc->lock);
        }
-       rcu_read_unlock();
+       spin_unlock_irq(&q->queue_lock);
 }
 #else /* CONFIG_BLK_ICQ */
 static inline void ioc_exit_icqs(struct io_context *ioc)
index 285ced3..6084a95 100644 (file)
@@ -2455,6 +2455,7 @@ static u64 adjust_inuse_and_calc_cost(struct ioc_gq *iocg, u64 vtime,
        u32 hwi, adj_step;
        s64 margin;
        u64 cost, new_inuse;
+       unsigned long flags;
 
        current_hweight(iocg, NULL, &hwi);
        old_hwi = hwi;
@@ -2473,11 +2474,11 @@ static u64 adjust_inuse_and_calc_cost(struct ioc_gq *iocg, u64 vtime,
            iocg->inuse == iocg->active)
                return cost;
 
-       spin_lock_irq(&ioc->lock);
+       spin_lock_irqsave(&ioc->lock, flags);
 
        /* we own inuse only when @iocg is in the normal active state */
        if (iocg->abs_vdebt || list_empty(&iocg->active_list)) {
-               spin_unlock_irq(&ioc->lock);
+               spin_unlock_irqrestore(&ioc->lock, flags);
                return cost;
        }
 
@@ -2498,7 +2499,7 @@ static u64 adjust_inuse_and_calc_cost(struct ioc_gq *iocg, u64 vtime,
        } while (time_after64(vtime + cost, now->vnow) &&
                 iocg->inuse != iocg->active);
 
-       spin_unlock_irq(&ioc->lock);
+       spin_unlock_irqrestore(&ioc->lock, flags);
 
        TRACE_IOCG_PATH(inuse_adjust, iocg, now,
                        old_inuse, iocg->inuse, old_hwi, hwi);
index 055529b..4051fad 100644 (file)
 /**
  * enum prio_policy - I/O priority class policy.
  * @POLICY_NO_CHANGE: (default) do not modify the I/O priority class.
- * @POLICY_NONE_TO_RT: modify IOPRIO_CLASS_NONE into IOPRIO_CLASS_RT.
+ * @POLICY_PROMOTE_TO_RT: modify any I/O priority class other than
+ *             IOPRIO_CLASS_RT into IOPRIO_CLASS_RT.
  * @POLICY_RESTRICT_TO_BE: modify IOPRIO_CLASS_NONE and IOPRIO_CLASS_RT into
  *             IOPRIO_CLASS_BE.
  * @POLICY_ALL_TO_IDLE: change the I/O priority class into IOPRIO_CLASS_IDLE.
+ * @POLICY_NONE_TO_RT: an alias for POLICY_PROMOTE_TO_RT.
  *
  * See also <linux/ioprio.h>.
  */
 enum prio_policy {
        POLICY_NO_CHANGE        = 0,
-       POLICY_NONE_TO_RT       = 1,
+       POLICY_PROMOTE_TO_RT    = 1,
        POLICY_RESTRICT_TO_BE   = 2,
        POLICY_ALL_TO_IDLE      = 3,
+       POLICY_NONE_TO_RT       = 4,
 };
 
 static const char *policy_name[] = {
        [POLICY_NO_CHANGE]      = "no-change",
-       [POLICY_NONE_TO_RT]     = "none-to-rt",
+       [POLICY_PROMOTE_TO_RT]  = "promote-to-rt",
        [POLICY_RESTRICT_TO_BE] = "restrict-to-be",
        [POLICY_ALL_TO_IDLE]    = "idle",
+       [POLICY_NONE_TO_RT]     = "none-to-rt",
 };
 
 static struct blkcg_policy ioprio_policy;
@@ -189,6 +192,20 @@ void blkcg_set_ioprio(struct bio *bio)
        if (!blkcg || blkcg->prio_policy == POLICY_NO_CHANGE)
                return;
 
+       if (blkcg->prio_policy == POLICY_PROMOTE_TO_RT ||
+           blkcg->prio_policy == POLICY_NONE_TO_RT) {
+               /*
+                * For RT threads, the default priority level is 4 because
+                * task_nice is 0. By promoting non-RT io-priority to RT-class
+                * and default level 4, those requests that are already
+                * RT-class but need a higher io-priority can use ioprio_set()
+                * to achieve this.
+                */
+               if (IOPRIO_PRIO_CLASS(bio->bi_ioprio) != IOPRIO_CLASS_RT)
+                       bio->bi_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 4);
+               return;
+       }
+
        /*
         * Except for IOPRIO_CLASS_NONE, higher I/O priority numbers
         * correspond to a lower priority. Hence, the max_t() below selects
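
The promotion above uses level 4 because that is what a task with the default nice value maps to; the policy is selected per cgroup through the io.prio.class interface file, with none-to-rt kept as an alias for promote-to-rt. A quick check of the level arithmetic, assuming the usual nice-to-ioprio conversion:

/* Assumed mapping (cf. task_nice_ioprio()): nice 0 -> (0 + 20) / 5 = 4,
 * hence IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 4) in the hunk above. */
static inline int nice_to_ioprio_level(int nice)
{
	return (nice + 20) / 5;
}
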
index 04c55f1..44d74a3 100644 (file)
@@ -248,7 +248,7 @@ static struct bio *blk_rq_map_bio_alloc(struct request *rq,
 {
        struct bio *bio;
 
-       if (rq->cmd_flags & REQ_ALLOC_CACHE) {
+       if (rq->cmd_flags & REQ_ALLOC_CACHE && (nr_vecs <= BIO_INLINE_VECS)) {
                bio = bio_alloc_bioset(NULL, nr_vecs, rq->cmd_flags, gfp_mask,
                                        &fs_bio_set);
                if (!bio)
@@ -281,21 +281,21 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 
        if (blk_queue_pci_p2pdma(rq->q))
                extraction_flags |= ITER_ALLOW_P2PDMA;
+       if (iov_iter_extract_will_pin(iter))
+               bio_set_flag(bio, BIO_PAGE_PINNED);
 
        while (iov_iter_count(iter)) {
-               struct page **pages, *stack_pages[UIO_FASTIOV];
+               struct page *stack_pages[UIO_FASTIOV];
+               struct page **pages = stack_pages;
                ssize_t bytes;
                size_t offs;
                int npages;
 
-               if (nr_vecs <= ARRAY_SIZE(stack_pages)) {
-                       pages = stack_pages;
-                       bytes = iov_iter_get_pages(iter, pages, LONG_MAX,
-                                                  nr_vecs, &offs, extraction_flags);
-               } else {
-                       bytes = iov_iter_get_pages_alloc(iter, &pages,
-                                               LONG_MAX, &offs, extraction_flags);
-               }
+               if (nr_vecs > ARRAY_SIZE(stack_pages))
+                       pages = NULL;
+
+               bytes = iov_iter_extract_pages(iter, &pages, LONG_MAX,
+                                              nr_vecs, extraction_flags, &offs);
                if (unlikely(bytes <= 0)) {
                        ret = bytes ? bytes : -EFAULT;
                        goto out_unmap;
@@ -317,7 +317,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
                                if (!bio_add_hw_page(rq->q, bio, page, n, offs,
                                                     max_sectors, &same_page)) {
                                        if (same_page)
-                                               put_page(page);
+                                               bio_release_page(bio, page);
                                        break;
                                }
 
@@ -329,7 +329,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
                 * release the pages we didn't map into the bio, if any
                 */
                while (j < npages)
-                       put_page(pages[j++]);
+                       bio_release_page(bio, pages[j++]);
                if (pages != stack_pages)
                        kvfree(pages);
                /* couldn't stuff something into bio? */
index d23a855..c3b5930 100644 (file)
@@ -88,6 +88,7 @@ static const char *const blk_queue_flag_name[] = {
        QUEUE_FLAG_NAME(IO_STAT),
        QUEUE_FLAG_NAME(NOXMERGES),
        QUEUE_FLAG_NAME(ADD_RANDOM),
+       QUEUE_FLAG_NAME(SYNCHRONOUS),
        QUEUE_FLAG_NAME(SAME_FORCE),
        QUEUE_FLAG_NAME(INIT_DONE),
        QUEUE_FLAG_NAME(STABLE_WRITES),
@@ -103,6 +104,8 @@ static const char *const blk_queue_flag_name[] = {
        QUEUE_FLAG_NAME(RQ_ALLOC_TIME),
        QUEUE_FLAG_NAME(HCTX_ACTIVE),
        QUEUE_FLAG_NAME(NOWAIT),
+       QUEUE_FLAG_NAME(SQ_SCHED),
+       QUEUE_FLAG_NAME(SKIP_TAGSET_QUIESCE),
 };
 #undef QUEUE_FLAG_NAME
 
@@ -241,14 +244,14 @@ static const char *const cmd_flag_name[] = {
 #define RQF_NAME(name) [ilog2((__force u32)RQF_##name)] = #name
 static const char *const rqf_name[] = {
        RQF_NAME(STARTED),
-       RQF_NAME(SOFTBARRIER),
        RQF_NAME(FLUSH_SEQ),
        RQF_NAME(MIXED_MERGE),
        RQF_NAME(MQ_INFLIGHT),
        RQF_NAME(DONTPREP),
+       RQF_NAME(SCHED_TAGS),
+       RQF_NAME(USE_SCHED),
        RQF_NAME(FAILED),
        RQF_NAME(QUIET),
-       RQF_NAME(ELVPRIV),
        RQF_NAME(IO_STAT),
        RQF_NAME(PM),
        RQF_NAME(HASHED),
@@ -256,7 +259,6 @@ static const char *const rqf_name[] = {
        RQF_NAME(SPECIAL_PAYLOAD),
        RQF_NAME(ZONE_WRITE_LOCKED),
        RQF_NAME(TIMED_OUT),
-       RQF_NAME(ELV),
        RQF_NAME(RESV),
 };
 #undef RQF_NAME
@@ -399,7 +401,7 @@ static void blk_mq_debugfs_tags_show(struct seq_file *m,
        seq_printf(m, "nr_tags=%u\n", tags->nr_tags);
        seq_printf(m, "nr_reserved_tags=%u\n", tags->nr_reserved_tags);
        seq_printf(m, "active_queues=%d\n",
-                  atomic_read(&tags->active_queues));
+                  READ_ONCE(tags->active_queues));
 
        seq_puts(m, "\nbitmap_tags:\n");
        sbitmap_queue_show(&tags->bitmap_tags, m);
index 7c3cbad..1326526 100644 (file)
@@ -37,7 +37,7 @@ static inline bool
 blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
                         struct bio *bio)
 {
-       if (rq->rq_flags & RQF_ELV) {
+       if (rq->rq_flags & RQF_USE_SCHED) {
                struct elevator_queue *e = q->elevator;
 
                if (e->type->ops.allow_merge)
@@ -48,7 +48,7 @@ blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
 
 static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
 {
-       if (rq->rq_flags & RQF_ELV) {
+       if (rq->rq_flags & RQF_USE_SCHED) {
                struct elevator_queue *e = rq->q->elevator;
 
                if (e->type->ops.completed_request)
@@ -58,11 +58,11 @@ static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
 
 static inline void blk_mq_sched_requeue_request(struct request *rq)
 {
-       if (rq->rq_flags & RQF_ELV) {
+       if (rq->rq_flags & RQF_USE_SCHED) {
                struct request_queue *q = rq->q;
                struct elevator_queue *e = q->elevator;
 
-               if ((rq->rq_flags & RQF_ELVPRIV) && e->type->ops.requeue_request)
+               if (e->type->ops.requeue_request)
                        e->type->ops.requeue_request(rq);
        }
 }
index d6af9d4..cc57e2d 100644 (file)
@@ -38,22 +38,29 @@ static void blk_mq_update_wake_batch(struct blk_mq_tags *tags,
 void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 {
        unsigned int users;
+       struct blk_mq_tags *tags = hctx->tags;
 
+       /*
+        * calling test_bit() prior to test_and_set_bit() is intentional,
+        * it avoids dirtying the cacheline if the queue is already active.
+        */
        if (blk_mq_is_shared_tags(hctx->flags)) {
                struct request_queue *q = hctx->queue;
 
-               if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
+               if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
+                   test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
                        return;
-               set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags);
        } else {
-               if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
+               if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) ||
+                   test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
                        return;
-               set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state);
        }
 
-       users = atomic_inc_return(&hctx->tags->active_queues);
-
-       blk_mq_update_wake_batch(hctx->tags, users);
+       spin_lock_irq(&tags->lock);
+       users = tags->active_queues + 1;
+       WRITE_ONCE(tags->active_queues, users);
+       blk_mq_update_wake_batch(tags, users);
+       spin_unlock_irq(&tags->lock);
 }
 
 /*
@@ -86,9 +93,11 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
                        return;
        }
 
-       users = atomic_dec_return(&tags->active_queues);
-
+       spin_lock_irq(&tags->lock);
+       users = tags->active_queues - 1;
+       WRITE_ONCE(tags->active_queues, users);
        blk_mq_update_wake_batch(tags, users);
+       spin_unlock_irq(&tags->lock);
 
        blk_mq_tag_wakeup_all(tags, false);
 }
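
Besides moving active_queues under tags->lock, the hunks above adopt a check-before-set pattern on the state bits. As a generic idiom (illustrative only, with a hypothetical MY_ACTIVE_BIT):

	/* skip the atomic RMW, and the cacheline dirtying it causes, when the
	 * bit is already set; only the first setter falls through */
	if (test_bit(MY_ACTIVE_BIT, &state) ||
	    test_and_set_bit(MY_ACTIVE_BIT, &state))
		return;
	/* first activation: take tags->lock and update the shared counter */
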
index f6dad08..decb6ab 100644 (file)
@@ -45,6 +45,8 @@
 static DEFINE_PER_CPU(struct llist_head, blk_cpu_done);
 
 static void blk_mq_insert_request(struct request *rq, blk_insert_t flags);
+static void blk_mq_request_bypass_insert(struct request *rq,
+               blk_insert_t flags);
 static void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
                struct list_head *list);
 
@@ -354,12 +356,12 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
                data->rq_flags |= RQF_IO_STAT;
        rq->rq_flags = data->rq_flags;
 
-       if (!(data->rq_flags & RQF_ELV)) {
-               rq->tag = tag;
-               rq->internal_tag = BLK_MQ_NO_TAG;
-       } else {
+       if (data->rq_flags & RQF_SCHED_TAGS) {
                rq->tag = BLK_MQ_NO_TAG;
                rq->internal_tag = tag;
+       } else {
+               rq->tag = tag;
+               rq->internal_tag = BLK_MQ_NO_TAG;
        }
        rq->timeout = 0;
 
@@ -386,17 +388,14 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
        WRITE_ONCE(rq->deadline, 0);
        req_ref_set(rq, 1);
 
-       if (rq->rq_flags & RQF_ELV) {
+       if (rq->rq_flags & RQF_USE_SCHED) {
                struct elevator_queue *e = data->q->elevator;
 
                INIT_HLIST_NODE(&rq->hash);
                RB_CLEAR_NODE(&rq->rb_node);
 
-               if (!op_is_flush(data->cmd_flags) &&
-                   e->type->ops.prepare_request) {
+               if (e->type->ops.prepare_request)
                        e->type->ops.prepare_request(rq);
-                       rq->rq_flags |= RQF_ELVPRIV;
-               }
        }
 
        return rq;
@@ -449,26 +448,32 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
                data->flags |= BLK_MQ_REQ_NOWAIT;
 
        if (q->elevator) {
-               struct elevator_queue *e = q->elevator;
-
-               data->rq_flags |= RQF_ELV;
+               /*
+                * All requests use scheduler tags when an I/O scheduler is
+                * enabled for the queue.
+                */
+               data->rq_flags |= RQF_SCHED_TAGS;
 
                /*
                 * Flush/passthrough requests are special and go directly to the
-                * dispatch list. Don't include reserved tags in the
-                * limiting, as it isn't useful.
+                * dispatch list.
                 */
-               if (!op_is_flush(data->cmd_flags) &&
-                   !blk_op_is_passthrough(data->cmd_flags) &&
-                   e->type->ops.limit_depth &&
-                   !(data->flags & BLK_MQ_REQ_RESERVED))
-                       e->type->ops.limit_depth(data->cmd_flags, data);
+               if ((data->cmd_flags & REQ_OP_MASK) != REQ_OP_FLUSH &&
+                   !blk_op_is_passthrough(data->cmd_flags)) {
+                       struct elevator_mq_ops *ops = &q->elevator->type->ops;
+
+                       WARN_ON_ONCE(data->flags & BLK_MQ_REQ_RESERVED);
+
+                       data->rq_flags |= RQF_USE_SCHED;
+                       if (ops->limit_depth)
+                               ops->limit_depth(data->cmd_flags, data);
+               }
        }
 
 retry:
        data->ctx = blk_mq_get_ctx(q);
        data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->ctx);
-       if (!(data->rq_flags & RQF_ELV))
+       if (!(data->rq_flags & RQF_SCHED_TAGS))
                blk_mq_tag_busy(data->hctx);
 
        if (data->flags & BLK_MQ_REQ_RESERVED)
@@ -648,10 +653,10 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
                goto out_queue_exit;
        data.ctx = __blk_mq_get_ctx(q, cpu);
 
-       if (!q->elevator)
-               blk_mq_tag_busy(data.hctx);
+       if (q->elevator)
+               data.rq_flags |= RQF_SCHED_TAGS;
        else
-               data.rq_flags |= RQF_ELV;
+               blk_mq_tag_busy(data.hctx);
 
        if (flags & BLK_MQ_REQ_RESERVED)
                data.rq_flags |= RQF_RESV;
@@ -683,6 +688,10 @@ static void __blk_mq_free_request(struct request *rq)
        blk_crypto_free_request(rq);
        blk_pm_mark_last_busy(rq);
        rq->mq_hctx = NULL;
+
+       if (rq->rq_flags & RQF_MQ_INFLIGHT)
+               __blk_mq_dec_active_requests(hctx);
+
        if (rq->tag != BLK_MQ_NO_TAG)
                blk_mq_put_tag(hctx->tags, ctx, rq->tag);
        if (sched_tag != BLK_MQ_NO_TAG)
@@ -694,15 +703,11 @@ static void __blk_mq_free_request(struct request *rq)
 void blk_mq_free_request(struct request *rq)
 {
        struct request_queue *q = rq->q;
-       struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
-       if ((rq->rq_flags & RQF_ELVPRIV) &&
+       if ((rq->rq_flags & RQF_USE_SCHED) &&
            q->elevator->type->ops.finish_request)
                q->elevator->type->ops.finish_request(rq);
 
-       if (rq->rq_flags & RQF_MQ_INFLIGHT)
-               __blk_mq_dec_active_requests(hctx);
-
        if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
                laptop_io_completion(q->disk->bdi);
 
@@ -957,6 +962,8 @@ EXPORT_SYMBOL_GPL(blk_update_request);
 
 static inline void blk_account_io_done(struct request *req, u64 now)
 {
+       trace_block_io_done(req);
+
        /*
         * Account IO completion.  flush_rq isn't accounted as a
         * normal IO on queueing nor completion.  Accounting the
@@ -976,6 +983,8 @@ static inline void blk_account_io_done(struct request *req, u64 now)
 
 static inline void blk_account_io_start(struct request *req)
 {
+       trace_block_io_start(req);
+
        if (blk_do_io_stat(req)) {
                /*
                 * All non-passthrough requests are created from a bio with one
@@ -1176,8 +1185,9 @@ bool blk_mq_complete_request_remote(struct request *rq)
         * or a polled request, always complete locally,
         * it's pointless to redirect the completion.
         */
-       if (rq->mq_hctx->nr_ctx == 1 ||
-               rq->cmd_flags & REQ_POLLED)
+       if ((rq->mq_hctx->nr_ctx == 1 &&
+            rq->mq_ctx->cpu == raw_smp_processor_id()) ||
+            rq->cmd_flags & REQ_POLLED)
                return false;
 
        if (blk_mq_complete_need_ipi(rq)) {
@@ -1270,7 +1280,7 @@ static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
 
        if (!plug->multiple_queues && last && last->q != rq->q)
                plug->multiple_queues = true;
-       if (!plug->has_elevator && (rq->rq_flags & RQF_ELV))
+       if (!plug->has_elevator && (rq->rq_flags & RQF_USE_SCHED))
                plug->has_elevator = true;
        rq->rq_next = NULL;
        rq_list_add(&plug->mq_list, rq);
@@ -1411,13 +1421,16 @@ static void __blk_mq_requeue_request(struct request *rq)
 void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
 {
        struct request_queue *q = rq->q;
+       unsigned long flags;
 
        __blk_mq_requeue_request(rq);
 
        /* this request will be re-inserted to io scheduler queue */
        blk_mq_sched_requeue_request(rq);
 
-       blk_mq_add_to_requeue_list(rq, BLK_MQ_INSERT_AT_HEAD);
+       spin_lock_irqsave(&q->requeue_lock, flags);
+       list_add_tail(&rq->queuelist, &q->requeue_list);
+       spin_unlock_irqrestore(&q->requeue_lock, flags);
 
        if (kick_requeue_list)
                blk_mq_kick_requeue_list(q);
@@ -1429,13 +1442,16 @@ static void blk_mq_requeue_work(struct work_struct *work)
        struct request_queue *q =
                container_of(work, struct request_queue, requeue_work.work);
        LIST_HEAD(rq_list);
-       struct request *rq, *next;
+       LIST_HEAD(flush_list);
+       struct request *rq;
 
        spin_lock_irq(&q->requeue_lock);
        list_splice_init(&q->requeue_list, &rq_list);
+       list_splice_init(&q->flush_list, &flush_list);
        spin_unlock_irq(&q->requeue_lock);
 
-       list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
+       while (!list_empty(&rq_list)) {
+               rq = list_entry(rq_list.next, struct request, queuelist);
                /*
                 * If RQF_DONTPREP is set, the request has been started by the
                 * driver already and might have driver-specific data allocated
@@ -1443,18 +1459,16 @@ static void blk_mq_requeue_work(struct work_struct *work)
                 * block layer merges for the request.
                 */
                if (rq->rq_flags & RQF_DONTPREP) {
-                       rq->rq_flags &= ~RQF_SOFTBARRIER;
                        list_del_init(&rq->queuelist);
                        blk_mq_request_bypass_insert(rq, 0);
-               } else if (rq->rq_flags & RQF_SOFTBARRIER) {
-                       rq->rq_flags &= ~RQF_SOFTBARRIER;
+               } else {
                        list_del_init(&rq->queuelist);
                        blk_mq_insert_request(rq, BLK_MQ_INSERT_AT_HEAD);
                }
        }
 
-       while (!list_empty(&rq_list)) {
-               rq = list_entry(rq_list.next, struct request, queuelist);
+       while (!list_empty(&flush_list)) {
+               rq = list_entry(flush_list.next, struct request, queuelist);
                list_del_init(&rq->queuelist);
                blk_mq_insert_request(rq, 0);
        }
@@ -1462,27 +1476,6 @@ static void blk_mq_requeue_work(struct work_struct *work)
        blk_mq_run_hw_queues(q, false);
 }
 
-void blk_mq_add_to_requeue_list(struct request *rq, blk_insert_t insert_flags)
-{
-       struct request_queue *q = rq->q;
-       unsigned long flags;
-
-       /*
-        * We abuse this flag that is otherwise used by the I/O scheduler to
-        * request head insertion from the workqueue.
-        */
-       BUG_ON(rq->rq_flags & RQF_SOFTBARRIER);
-
-       spin_lock_irqsave(&q->requeue_lock, flags);
-       if (insert_flags & BLK_MQ_INSERT_AT_HEAD) {
-               rq->rq_flags |= RQF_SOFTBARRIER;
-               list_add(&rq->queuelist, &q->requeue_list);
-       } else {
-               list_add_tail(&rq->queuelist, &q->requeue_list);
-       }
-       spin_unlock_irqrestore(&q->requeue_lock, flags);
-}
-
 void blk_mq_kick_requeue_list(struct request_queue *q)
 {
        kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND, &q->requeue_work, 0);
@@ -2427,7 +2420,7 @@ static void blk_mq_run_work_fn(struct work_struct *work)
  * Should only be used carefully, when the caller knows we want to
  * bypass a potential IO scheduler on the target device.
  */
-void blk_mq_request_bypass_insert(struct request *rq, blk_insert_t flags)
+static void blk_mq_request_bypass_insert(struct request *rq, blk_insert_t flags)
 {
        struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
@@ -2492,7 +2485,7 @@ static void blk_mq_insert_request(struct request *rq, blk_insert_t flags)
                 * dispatch it given we prioritize requests in hctx->dispatch.
                 */
                blk_mq_request_bypass_insert(rq, flags);
-       } else if (rq->rq_flags & RQF_FLUSH_SEQ) {
+       } else if (req_op(rq) == REQ_OP_FLUSH) {
                /*
                 * Firstly normal IO request is inserted to scheduler queue or
                 * sw queue, meantime we add flush request to dispatch queue(
@@ -2622,7 +2615,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
                return;
        }
 
-       if ((rq->rq_flags & RQF_ELV) || !blk_mq_get_budget_and_tag(rq)) {
+       if ((rq->rq_flags & RQF_USE_SCHED) || !blk_mq_get_budget_and_tag(rq)) {
                blk_mq_insert_request(rq, 0);
                blk_mq_run_hw_queue(hctx, false);
                return;
@@ -2711,6 +2704,7 @@ static void blk_mq_dispatch_plug_list(struct blk_plug *plug, bool from_sched)
        struct request *requeue_list = NULL;
        struct request **requeue_lastp = &requeue_list;
        unsigned int depth = 0;
+       bool is_passthrough = false;
        LIST_HEAD(list);
 
        do {
@@ -2719,7 +2713,9 @@ static void blk_mq_dispatch_plug_list(struct blk_plug *plug, bool from_sched)
                if (!this_hctx) {
                        this_hctx = rq->mq_hctx;
                        this_ctx = rq->mq_ctx;
-               } else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx) {
+                       is_passthrough = blk_rq_is_passthrough(rq);
+               } else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx ||
+                          is_passthrough != blk_rq_is_passthrough(rq)) {
                        rq_list_add_tail(&requeue_lastp, rq);
                        continue;
                }
@@ -2731,7 +2727,13 @@ static void blk_mq_dispatch_plug_list(struct blk_plug *plug, bool from_sched)
        trace_block_unplug(this_hctx->queue, depth, !from_sched);
 
        percpu_ref_get(&this_hctx->queue->q_usage_counter);
-       if (this_hctx->queue->elevator) {
+       /* passthrough requests should never be issued to the I/O scheduler */
+       if (is_passthrough) {
+               spin_lock(&this_hctx->lock);
+               list_splice_tail_init(&list, &this_hctx->dispatch);
+               spin_unlock(&this_hctx->lock);
+               blk_mq_run_hw_queue(this_hctx, from_sched);
+       } else if (this_hctx->queue->elevator) {
                this_hctx->queue->elevator->type->ops.insert_requests(this_hctx,
                                &list, 0);
                blk_mq_run_hw_queue(this_hctx, from_sched);
@@ -2970,10 +2972,8 @@ void blk_mq_submit_bio(struct bio *bio)
                return;
        }
 
-       if (op_is_flush(bio->bi_opf)) {
-               blk_insert_flush(rq);
+       if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq))
                return;
-       }
 
        if (plug) {
                blk_add_rq_to_plug(plug, rq);
@@ -2981,7 +2981,7 @@ void blk_mq_submit_bio(struct bio *bio)
        }
 
        hctx = rq->mq_hctx;
-       if ((rq->rq_flags & RQF_ELV) ||
+       if ((rq->rq_flags & RQF_USE_SCHED) ||
            (hctx->dispatch_busy && (q->nr_hw_queues == 1 || !is_sync))) {
                blk_mq_insert_request(rq, 0);
                blk_mq_run_hw_queue(hctx, true);
@@ -4232,6 +4232,7 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
        blk_mq_update_poll_flag(q);
 
        INIT_DELAYED_WORK(&q->requeue_work, blk_mq_requeue_work);
+       INIT_LIST_HEAD(&q->flush_list);
        INIT_LIST_HEAD(&q->requeue_list);
        spin_lock_init(&q->requeue_lock);
 
@@ -4608,9 +4609,6 @@ static bool blk_mq_elv_switch_none(struct list_head *head,
 {
        struct blk_mq_qe_pair *qe;
 
-       if (!q->elevator)
-               return true;
-
        qe = kmalloc(sizeof(*qe), GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY);
        if (!qe)
                return false;
@@ -4618,6 +4616,12 @@ static bool blk_mq_elv_switch_none(struct list_head *head,
        /* q->elevator needs protection from ->sysfs_lock */
        mutex_lock(&q->sysfs_lock);
 
+       /* the check has to be done with holding sysfs_lock */
+       if (!q->elevator) {
+               kfree(qe);
+               goto unlock;
+       }
+
        INIT_LIST_HEAD(&qe->node);
        qe->q = q;
        qe->type = q->elevator->type;
@@ -4625,6 +4629,7 @@ static bool blk_mq_elv_switch_none(struct list_head *head,
        __elevator_get(qe->type);
        list_add(&qe->node, head);
        elevator_disable(q);
+unlock:
        mutex_unlock(&q->sysfs_lock);
 
        return true;
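
Throughout the blk-mq.c hunks, RQF_ELV and RQF_ELVPRIV give way to two flags with narrower meanings. Summarizing the split as it reads from the hunks above (a paraphrase, not a literal excerpt from the patch):

/*
 * RQF_SCHED_TAGS - the tag came from hctx->sched_tags; set for every request
 *                  whenever an I/O scheduler is attached to the queue.
 * RQF_USE_SCHED  - the request is routed through the elevator's prepare/
 *                  insert/requeue/finish hooks; not set for flush and
 *                  passthrough requests, which bypass the scheduler.
 */
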
index e876584..1743857 100644 (file)
@@ -47,7 +47,6 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
 void blk_mq_wake_waiters(struct request_queue *q);
 bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *,
                             unsigned int);
-void blk_mq_add_to_requeue_list(struct request *rq, blk_insert_t insert_flags);
 void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
 struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
                                        struct blk_mq_ctx *start);
@@ -64,10 +63,6 @@ struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
 void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
                             struct blk_mq_tags *tags,
                             unsigned int hctx_idx);
-/*
- * Internal helpers for request insertion into sw queues
- */
-void blk_mq_request_bypass_insert(struct request *rq, blk_insert_t flags);
 
 /*
  * CPU -> queue mappings
@@ -226,9 +221,9 @@ static inline bool blk_mq_is_shared_tags(unsigned int flags)
 
 static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data)
 {
-       if (!(data->rq_flags & RQF_ELV))
-               return data->hctx->tags;
-       return data->hctx->sched_tags;
+       if (data->rq_flags & RQF_SCHED_TAGS)
+               return data->hctx->sched_tags;
+       return data->hctx->tags;
 }
 
 static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
@@ -417,8 +412,7 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
                        return true;
        }
 
-       users = atomic_read(&hctx->tags->active_queues);
-
+       users = READ_ONCE(hctx->tags->active_queues);
        if (!users)
                return true;
 
index d8cc820..167be74 100644 (file)
@@ -288,11 +288,13 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data,
 
 void rq_qos_exit(struct request_queue *q)
 {
+       mutex_lock(&q->rq_qos_mutex);
        while (q->rq_qos) {
                struct rq_qos *rqos = q->rq_qos;
                q->rq_qos = rqos->next;
                rqos->ops->exit(rqos);
        }
+       mutex_unlock(&q->rq_qos_mutex);
 }
 
 int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
@@ -300,6 +302,8 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
 {
        struct request_queue *q = disk->queue;
 
+       lockdep_assert_held(&q->rq_qos_mutex);
+
        rqos->disk = disk;
        rqos->id = id;
        rqos->ops = ops;
@@ -307,18 +311,13 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
        /*
         * No IO can be in-flight when adding rqos, so freeze queue, which
         * is fine since we only support rq_qos for blk-mq queue.
-        *
-        * Reuse ->queue_lock for protecting against other concurrent
-        * rq_qos adding/deleting
         */
        blk_mq_freeze_queue(q);
 
-       spin_lock_irq(&q->queue_lock);
        if (rq_qos_id(q, rqos->id))
                goto ebusy;
        rqos->next = q->rq_qos;
        q->rq_qos = rqos;
-       spin_unlock_irq(&q->queue_lock);
 
        blk_mq_unfreeze_queue(q);
 
@@ -330,7 +329,6 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
 
        return 0;
 ebusy:
-       spin_unlock_irq(&q->queue_lock);
        blk_mq_unfreeze_queue(q);
        return -EBUSY;
 }
@@ -340,21 +338,15 @@ void rq_qos_del(struct rq_qos *rqos)
        struct request_queue *q = rqos->disk->queue;
        struct rq_qos **cur;
 
-       /*
-        * See comment in rq_qos_add() about freezing queue & using
-        * ->queue_lock.
-        */
-       blk_mq_freeze_queue(q);
+       lockdep_assert_held(&q->rq_qos_mutex);
 
-       spin_lock_irq(&q->queue_lock);
+       blk_mq_freeze_queue(q);
        for (cur = &q->rq_qos; *cur; cur = &(*cur)->next) {
                if (*cur == rqos) {
                        *cur = rqos->next;
                        break;
                }
        }
-       spin_unlock_irq(&q->queue_lock);
-
        blk_mq_unfreeze_queue(q);
 
        mutex_lock(&q->debugfs_mutex);
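
rq_qos_add() and rq_qos_del() now expect the caller to hold the new per-queue rq_qos_mutex instead of relying on queue_lock; the blk-wbt.c hunk later in this diff already follows the pattern. A sketch with a hypothetical policy object:

	mutex_lock(&q->rq_qos_mutex);
	ret = rq_qos_add(&my_qos->rqos, disk, RQ_QOS_WBT, &my_qos_ops);
	mutex_unlock(&q->rq_qos_mutex);
	if (ret)
		goto err_free;
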
index 896b465..4dd5905 100644 (file)
@@ -915,6 +915,7 @@ static bool disk_has_partitions(struct gendisk *disk)
 void disk_set_zoned(struct gendisk *disk, enum blk_zoned_model model)
 {
        struct request_queue *q = disk->queue;
+       unsigned int old_model = q->limits.zoned;
 
        switch (model) {
        case BLK_ZONED_HM:
@@ -952,7 +953,7 @@ void disk_set_zoned(struct gendisk *disk, enum blk_zoned_model model)
                 */
                blk_queue_zone_write_granularity(q,
                                                queue_logical_block_size(q));
-       } else {
+       } else if (old_model != BLK_ZONED_NONE) {
                disk_clear_zone_settings(disk);
        }
 }
index e49a486..7a87506 100644 (file)
@@ -730,14 +730,16 @@ void wbt_enable_default(struct gendisk *disk)
 {
        struct request_queue *q = disk->queue;
        struct rq_qos *rqos;
-       bool disable_flag = q->elevator &&
-                   test_bit(ELEVATOR_FLAG_DISABLE_WBT, &q->elevator->flags);
+       bool enable = IS_ENABLED(CONFIG_BLK_WBT_MQ);
+
+       if (q->elevator &&
+           test_bit(ELEVATOR_FLAG_DISABLE_WBT, &q->elevator->flags))
+               enable = false;
 
        /* Throttling already enabled? */
        rqos = wbt_rq_qos(q);
        if (rqos) {
-               if (!disable_flag &&
-                   RQWB(rqos)->enable_state == WBT_STATE_OFF_DEFAULT)
+               if (enable && RQWB(rqos)->enable_state == WBT_STATE_OFF_DEFAULT)
                        RQWB(rqos)->enable_state = WBT_STATE_ON_DEFAULT;
                return;
        }
@@ -746,7 +748,7 @@ void wbt_enable_default(struct gendisk *disk)
        if (!blk_queue_registered(q))
                return;
 
-       if (queue_is_mq(q) && !disable_flag)
+       if (queue_is_mq(q) && enable)
                wbt_init(disk);
 }
 EXPORT_SYMBOL_GPL(wbt_enable_default);
@@ -942,7 +944,9 @@ int wbt_init(struct gendisk *disk)
        /*
         * Assign rwb and add the stats callback.
         */
+       mutex_lock(&q->rq_qos_mutex);
        ret = rq_qos_add(&rwb->rqos, disk, RQ_QOS_WBT, &wbt_rqos_ops);
+       mutex_unlock(&q->rq_qos_mutex);
        if (ret)
                goto err_free;
 
index fce9082..0f9f97c 100644 (file)
@@ -57,16 +57,10 @@ EXPORT_SYMBOL_GPL(blk_zone_cond_str);
  */
 bool blk_req_needs_zone_write_lock(struct request *rq)
 {
-       if (blk_rq_is_passthrough(rq))
-               return false;
-
        if (!rq->q->disk->seq_zones_wlock)
                return false;
 
-       if (bdev_op_is_zoned_write(rq->q->disk->part0, req_op(rq)))
-               return blk_rq_zone_is_seq(rq);
-
-       return false;
+       return blk_rq_is_seq_zoned_write(rq);
 }
 EXPORT_SYMBOL_GPL(blk_req_needs_zone_write_lock);
 
@@ -329,8 +323,8 @@ static int blkdev_copy_zone_to_user(struct blk_zone *zone, unsigned int idx,
  * BLKREPORTZONE ioctl processing.
  * Called from blkdev_ioctl.
  */
-int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
-                             unsigned int cmd, unsigned long arg)
+int blkdev_report_zones_ioctl(struct block_device *bdev, unsigned int cmd,
+               unsigned long arg)
 {
        void __user *argp = (void __user *)arg;
        struct zone_report_args args;
@@ -362,8 +356,8 @@ int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
        return 0;
 }
 
-static int blkdev_truncate_zone_range(struct block_device *bdev, fmode_t mode,
-                                     const struct blk_zone_range *zrange)
+static int blkdev_truncate_zone_range(struct block_device *bdev,
+               blk_mode_t mode, const struct blk_zone_range *zrange)
 {
        loff_t start, end;
 
@@ -382,7 +376,7 @@ static int blkdev_truncate_zone_range(struct block_device *bdev, fmode_t mode,
  * BLKRESETZONE, BLKOPENZONE, BLKCLOSEZONE and BLKFINISHZONE ioctl processing.
  * Called from blkdev_ioctl.
  */
-int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode,
+int blkdev_zone_mgmt_ioctl(struct block_device *bdev, blk_mode_t mode,
                           unsigned int cmd, unsigned long arg)
 {
        void __user *argp = (void __user *)arg;
@@ -396,7 +390,7 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode,
        if (!bdev_is_zoned(bdev))
                return -ENOTTY;
 
-       if (!(mode & FMODE_WRITE))
+       if (!(mode & BLK_OPEN_WRITE))
                return -EBADF;
 
        if (copy_from_user(&zrange, argp, sizeof(struct blk_zone_range)))
index 45547bc..608c5dc 100644 (file)
@@ -269,7 +269,7 @@ bool blk_bio_list_merge(struct request_queue *q, struct list_head *list,
  */
 #define ELV_ON_HASH(rq) ((rq)->rq_flags & RQF_HASHED)
 
-void blk_insert_flush(struct request *rq);
+bool blk_insert_flush(struct request *rq);
 
 int elevator_switch(struct request_queue *q, struct elevator_type *new_e);
 void elevator_disable(struct request_queue *q);
@@ -394,10 +394,27 @@ static inline struct bio *blk_queue_bounce(struct bio *bio,
 #ifdef CONFIG_BLK_DEV_ZONED
 void disk_free_zone_bitmaps(struct gendisk *disk);
 void disk_clear_zone_settings(struct gendisk *disk);
-#else
+int blkdev_report_zones_ioctl(struct block_device *bdev, unsigned int cmd,
+               unsigned long arg);
+int blkdev_zone_mgmt_ioctl(struct block_device *bdev, blk_mode_t mode,
+               unsigned int cmd, unsigned long arg);
+#else /* CONFIG_BLK_DEV_ZONED */
 static inline void disk_free_zone_bitmaps(struct gendisk *disk) {}
 static inline void disk_clear_zone_settings(struct gendisk *disk) {}
-#endif
+static inline int blkdev_report_zones_ioctl(struct block_device *bdev,
+               unsigned int cmd, unsigned long arg)
+{
+       return -ENOTTY;
+}
+static inline int blkdev_zone_mgmt_ioctl(struct block_device *bdev,
+               blk_mode_t mode, unsigned int cmd, unsigned long arg)
+{
+       return -ENOTTY;
+}
+#endif /* CONFIG_BLK_DEV_ZONED */
+
+struct block_device *bdev_alloc(struct gendisk *disk, u8 partno);
+void bdev_add(struct block_device *bdev, dev_t dev);
 
 int blk_alloc_ext_minor(void);
 void blk_free_ext_minor(unsigned int minor);
@@ -409,7 +426,7 @@ int bdev_add_partition(struct gendisk *disk, int partno, sector_t start,
 int bdev_del_partition(struct gendisk *disk, int partno);
 int bdev_resize_partition(struct gendisk *disk, int partno, sector_t start,
                sector_t length);
-void blk_drop_partitions(struct gendisk *disk);
+void drop_partition(struct block_device *part);
 
 void bdev_set_nr_sectors(struct block_device *bdev, sector_t sectors);
 
@@ -420,9 +437,19 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
                struct page *page, unsigned int len, unsigned int offset,
                unsigned int max_sectors, bool *same_page);
 
+/*
+ * Clean up a page appropriately, where the page may be pinned, may have a
+ * ref taken on it or neither.
+ */
+static inline void bio_release_page(struct bio *bio, struct page *page)
+{
+       if (bio_flagged(bio, BIO_PAGE_PINNED))
+               unpin_user_page(page);
+}
+
 struct request_queue *blk_alloc_queue(int node_id);
 
-int disk_scan_partitions(struct gendisk *disk, fmode_t mode);
+int disk_scan_partitions(struct gendisk *disk, blk_mode_t mode);
 
 int disk_alloc_events(struct gendisk *disk);
 void disk_add_events(struct gendisk *disk);
@@ -437,6 +464,9 @@ extern struct device_attribute dev_attr_events_poll_msecs;
 
 extern struct attribute_group blk_trace_attr_group;
 
+blk_mode_t file_to_blk_mode(struct file *file);
+int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
+               loff_t lstart, loff_t lend);
 long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
 long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
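
The bio_release_page() helper added above gives bio code one place to drop a user page that may or may not have been pinned. A hedged sketch of a completion-side loop using it; this only applies to block-internal code since the helper lives in block/blk.h, and my_bio_done() is an illustrative name.

#include <linux/bio.h>
#include "blk.h"                /* block-internal: provides bio_release_page() */

static void my_bio_done(struct bio *bio)
{
        struct bio_vec *bvec;
        struct bvec_iter_all iter_all;

        /* Unpin only the pages that were pinned when the bio was built. */
        bio_for_each_segment_all(bvec, bio, iter_all)
                bio_release_page(bio, bvec->bv_page);
        bio_put(bio);
}
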
 
index 435c323..b3acdbd 100644 (file)
@@ -26,7 +26,7 @@ struct bsg_set {
 };
 
 static int bsg_transport_sg_io_fn(struct request_queue *q, struct sg_io_v4 *hdr,
-               fmode_t mode, unsigned int timeout)
+               bool open_for_write, unsigned int timeout)
 {
        struct bsg_job *job;
        struct request *rq;
index 7eca43f..1a9396a 100644 (file)
@@ -39,7 +39,7 @@ static inline struct bsg_device *to_bsg_device(struct inode *inode)
 #define BSG_MAX_DEVS           32768
 
 static DEFINE_IDA(bsg_minor_ida);
-static struct class *bsg_class;
+static const struct class bsg_class;
 static int bsg_major;
 
 static unsigned int bsg_timeout(struct bsg_device *bd, struct sg_io_v4 *hdr)
@@ -54,7 +54,8 @@ static unsigned int bsg_timeout(struct bsg_device *bd, struct sg_io_v4 *hdr)
        return max_t(unsigned int, timeout, BLK_MIN_SG_TIMEOUT);
 }
 
-static int bsg_sg_io(struct bsg_device *bd, fmode_t mode, void __user *uarg)
+static int bsg_sg_io(struct bsg_device *bd, bool open_for_write,
+                    void __user *uarg)
 {
        struct sg_io_v4 hdr;
        int ret;
@@ -63,7 +64,8 @@ static int bsg_sg_io(struct bsg_device *bd, fmode_t mode, void __user *uarg)
                return -EFAULT;
        if (hdr.guard != 'Q')
                return -EINVAL;
-       ret = bd->sg_io_fn(bd->queue, &hdr, mode, bsg_timeout(bd, &hdr));
+       ret = bd->sg_io_fn(bd->queue, &hdr, open_for_write,
+                          bsg_timeout(bd, &hdr));
        if (!ret && copy_to_user(uarg, &hdr, sizeof(hdr)))
                return -EFAULT;
        return ret;
@@ -146,7 +148,7 @@ static long bsg_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
        case SG_EMULATED_HOST:
                return put_user(1, intp);
        case SG_IO:
-               return bsg_sg_io(bd, file->f_mode, uarg);
+               return bsg_sg_io(bd, file->f_mode & FMODE_WRITE, uarg);
        case SCSI_IOCTL_SEND_COMMAND:
                pr_warn_ratelimited("%s: calling unsupported SCSI_IOCTL_SEND_COMMAND\n",
                                current->comm);
@@ -206,7 +208,7 @@ struct bsg_device *bsg_register_queue(struct request_queue *q,
                return ERR_PTR(ret);
        }
        bd->device.devt = MKDEV(bsg_major, ret);
-       bd->device.class = bsg_class;
+       bd->device.class = &bsg_class;
        bd->device.parent = parent;
        bd->device.release = bsg_device_release;
        dev_set_name(&bd->device, "%s", name);
@@ -240,15 +242,19 @@ static char *bsg_devnode(const struct device *dev, umode_t *mode)
        return kasprintf(GFP_KERNEL, "bsg/%s", dev_name(dev));
 }
 
+static const struct class bsg_class = {
+       .name           = "bsg",
+       .devnode        = bsg_devnode,
+};
+
 static int __init bsg_init(void)
 {
        dev_t devid;
        int ret;
 
-       bsg_class = class_create("bsg");
-       if (IS_ERR(bsg_class))
-               return PTR_ERR(bsg_class);
-       bsg_class->devnode = bsg_devnode;
+       ret = class_register(&bsg_class);
+       if (ret)
+               return ret;
 
        ret = alloc_chrdev_region(&devid, 0, BSG_MAX_DEVS, "bsg");
        if (ret)
@@ -260,7 +266,7 @@ static int __init bsg_init(void)
        return 0;
 
 destroy_bsg_class:
-       class_destroy(bsg_class);
+       class_unregister(&bsg_class);
        return ret;
 }
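
The bsg conversion above replaces a runtime class_create() with a statically defined const struct class registered through class_register(). A minimal sketch of the same pattern for a hypothetical driver; the mydrv names are illustrative.

#include <linux/device.h>
#include <linux/module.h>
#include <linux/slab.h>

static char *mydrv_devnode(const struct device *dev, umode_t *mode)
{
        return kasprintf(GFP_KERNEL, "mydrv/%s", dev_name(dev));
}

static const struct class mydrv_class = {
        .name           = "mydrv",
        .devnode        = mydrv_devnode,
};

static int __init mydrv_init(void)
{
        return class_register(&mydrv_class);    /* was: class_create("mydrv") */
}

static void __exit mydrv_exit(void)
{
        class_unregister(&mydrv_class);         /* was: class_destroy() */
}

module_init(mydrv_init);
module_exit(mydrv_exit);
MODULE_LICENSE("GPL");
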
 
index aee25a7..0cfac46 100644 (file)
@@ -263,31 +263,31 @@ static unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask)
 }
 
 /**
- * bdev_check_media_change - check if a removable media has been changed
- * @bdev: block device to check
+ * disk_check_media_change - check if a removable media has been changed
+ * @disk: gendisk to check
  *
  * Check whether a removable media has been changed, and attempt to free all
  * dentries and inodes and invalidates all block device page cache entries in
  * that case.
  *
- * Returns %true if the block device changed, or %false if not.
+ * Returns %true if the media has changed, or %false if not.
  */
-bool bdev_check_media_change(struct block_device *bdev)
+bool disk_check_media_change(struct gendisk *disk)
 {
        unsigned int events;
 
-       events = disk_clear_events(bdev->bd_disk, DISK_EVENT_MEDIA_CHANGE |
+       events = disk_clear_events(disk, DISK_EVENT_MEDIA_CHANGE |
                                   DISK_EVENT_EJECT_REQUEST);
        if (!(events & DISK_EVENT_MEDIA_CHANGE))
                return false;
 
-       if (__invalidate_device(bdev, true))
+       if (__invalidate_device(disk->part0, true))
                pr_warn("VFS: busy inodes on changed media %s\n",
-                       bdev->bd_disk->disk_name);
-       set_bit(GD_NEED_PART_SCAN, &bdev->bd_disk->state);
+                       disk->disk_name);
+       set_bit(GD_NEED_PART_SCAN, &disk->state);
        return true;
 }
-EXPORT_SYMBOL(bdev_check_media_change);
+EXPORT_SYMBOL(disk_check_media_change);
 
 /**
  * disk_force_media_change - force a media change event
@@ -307,6 +307,7 @@ bool disk_force_media_change(struct gendisk *disk, unsigned int events)
        if (!(events & DISK_EVENT_MEDIA_CHANGE))
                return false;
 
+       inc_diskseq(disk);
        if (__invalidate_device(disk->part0, true))
                pr_warn("VFS: busy inodes on changed media %s\n",
                        disk->disk_name);
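
disk_check_media_change() above replaces bdev_check_media_change() and takes the gendisk directly. A brief sketch of a removable-media driver reacting to it; the mydisk_* helpers are illustrative names, not part of the patch.

#include <linux/blkdev.h>

static void mydisk_revalidate(struct gendisk *disk)
{
        /* driver-specific: re-read capacity, media type, ... */
}

static void mydisk_handle_open(struct gendisk *disk)
{
        /* was: bdev_check_media_change(bdev) on the whole-device bdev */
        if (disk_check_media_change(disk))
                mydisk_revalidate(disk);
}
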
diff --git a/block/early-lookup.c b/block/early-lookup.c
new file mode 100644 (file)
index 0000000..3effbd0
--- /dev/null
@@ -0,0 +1,316 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Code for looking up block devices in the early boot code before mounting the
+ * root file system.
+ */
+#include <linux/blkdev.h>
+#include <linux/ctype.h>
+
+struct uuidcmp {
+       const char *uuid;
+       int len;
+};
+
+/**
+ * match_dev_by_uuid - callback for finding a partition using its uuid
+ * @dev:       device passed in by the caller
+ * @data:      opaque pointer to the desired struct uuidcmp to match
+ *
+ * Returns 1 if the device matches, and 0 otherwise.
+ */
+static int __init match_dev_by_uuid(struct device *dev, const void *data)
+{
+       struct block_device *bdev = dev_to_bdev(dev);
+       const struct uuidcmp *cmp = data;
+
+       if (!bdev->bd_meta_info ||
+           strncasecmp(cmp->uuid, bdev->bd_meta_info->uuid, cmp->len))
+               return 0;
+       return 1;
+}
+
+/**
+ * devt_from_partuuid - looks up the dev_t of a partition by its UUID
+ * @uuid_str:  char array containing ascii UUID
+ * @devt:      dev_t result
+ *
+ * The function will return the first partition which contains a matching
+ * UUID value in its partition_meta_info struct.  This does not search
+ * by filesystem UUIDs.
+ *
+ * If @uuid_str is followed by a "/PARTNROFF=%d", then the number will be
+ * extracted and used as an offset from the partition identified by the UUID.
+ *
+ * Returns 0 on success or a negative error code on failure.
+ */
+static int __init devt_from_partuuid(const char *uuid_str, dev_t *devt)
+{
+       struct uuidcmp cmp;
+       struct device *dev = NULL;
+       int offset = 0;
+       char *slash;
+
+       cmp.uuid = uuid_str;
+
+       slash = strchr(uuid_str, '/');
+       /* Check for optional partition number offset attributes. */
+       if (slash) {
+               char c = 0;
+
+               /* Explicitly fail on poor PARTUUID syntax. */
+               if (sscanf(slash + 1, "PARTNROFF=%d%c", &offset, &c) != 1)
+                       goto out_invalid;
+               cmp.len = slash - uuid_str;
+       } else {
+               cmp.len = strlen(uuid_str);
+       }
+
+       if (!cmp.len)
+               goto out_invalid;
+
+       dev = class_find_device(&block_class, NULL, &cmp, &match_dev_by_uuid);
+       if (!dev)
+               return -ENODEV;
+
+       if (offset) {
+               /*
+                * Attempt to find the requested partition by adding an offset
+                * to the partition number found by UUID.
+                */
+               *devt = part_devt(dev_to_disk(dev),
+                                 dev_to_bdev(dev)->bd_partno + offset);
+       } else {
+               *devt = dev->devt;
+       }
+
+       put_device(dev);
+       return 0;
+
+out_invalid:
+       pr_err("VFS: PARTUUID= is invalid.\n"
+              "Expected PARTUUID=<valid-uuid-id>[/PARTNROFF=%%d]\n");
+       return -EINVAL;
+}
+
+/**
+ * match_dev_by_label - callback for finding a partition using its label
+ * @dev:       device passed in by the caller
+ * @data:      opaque pointer to the label to match
+ *
+ * Returns 1 if the device matches, and 0 otherwise.
+ */
+static int __init match_dev_by_label(struct device *dev, const void *data)
+{
+       struct block_device *bdev = dev_to_bdev(dev);
+       const char *label = data;
+
+       if (!bdev->bd_meta_info || strcmp(label, bdev->bd_meta_info->volname))
+               return 0;
+       return 1;
+}
+
+static int __init devt_from_partlabel(const char *label, dev_t *devt)
+{
+       struct device *dev;
+
+       dev = class_find_device(&block_class, NULL, label, &match_dev_by_label);
+       if (!dev)
+               return -ENODEV;
+       *devt = dev->devt;
+       put_device(dev);
+       return 0;
+}
+
+static dev_t __init blk_lookup_devt(const char *name, int partno)
+{
+       dev_t devt = MKDEV(0, 0);
+       struct class_dev_iter iter;
+       struct device *dev;
+
+       class_dev_iter_init(&iter, &block_class, NULL, &disk_type);
+       while ((dev = class_dev_iter_next(&iter))) {
+               struct gendisk *disk = dev_to_disk(dev);
+
+               if (strcmp(dev_name(dev), name))
+                       continue;
+
+               if (partno < disk->minors) {
+                       /* We need to return the right devno, even
+                        * if the partition doesn't exist yet.
+                        */
+                       devt = MKDEV(MAJOR(dev->devt),
+                                    MINOR(dev->devt) + partno);
+               } else {
+                       devt = part_devt(disk, partno);
+                       if (devt)
+                               break;
+               }
+       }
+       class_dev_iter_exit(&iter);
+       return devt;
+}
+
+static int __init devt_from_devname(const char *name, dev_t *devt)
+{
+       int part;
+       char s[32];
+       char *p;
+
+       if (strlen(name) > 31)
+               return -EINVAL;
+       strcpy(s, name);
+       for (p = s; *p; p++) {
+               if (*p == '/')
+                       *p = '!';
+       }
+
+       *devt = blk_lookup_devt(s, 0);
+       if (*devt)
+               return 0;
+
+       /*
+        * Try a non-existent but valid partition, which may only exist after
+        * opening the device, like partitioned md devices.
+        */
+       while (p > s && isdigit(p[-1]))
+               p--;
+       if (p == s || !*p || *p == '0')
+               return -ENODEV;
+
+       /* try disk name without <part number> */
+       part = simple_strtoul(p, NULL, 10);
+       *p = '\0';
+       *devt = blk_lookup_devt(s, part);
+       if (*devt)
+               return 0;
+
+       /* try disk name without p<part number> */
+       if (p < s + 2 || !isdigit(p[-2]) || p[-1] != 'p')
+               return -ENODEV;
+       p[-1] = '\0';
+       *devt = blk_lookup_devt(s, part);
+       if (*devt)
+               return 0;
+       return -ENODEV;
+}
+
+static int __init devt_from_devnum(const char *name, dev_t *devt)
+{
+       unsigned maj, min, offset;
+       char *p, dummy;
+
+       if (sscanf(name, "%u:%u%c", &maj, &min, &dummy) == 2 ||
+           sscanf(name, "%u:%u:%u:%c", &maj, &min, &offset, &dummy) == 3) {
+               *devt = MKDEV(maj, min);
+               if (maj != MAJOR(*devt) || min != MINOR(*devt))
+                       return -EINVAL;
+       } else {
+               *devt = new_decode_dev(simple_strtoul(name, &p, 16));
+               if (*p)
+                       return -EINVAL;
+       }
+
+       return 0;
+}
+
+/*
+ *     Convert a name into device number.  We accept the following variants:
+ *
+ *     1) <hex_major><hex_minor> device number in hexadecimal represents itself
+ *         no leading 0x, for example b302.
+ *     3) /dev/<disk_name> represents the device number of disk
+ *     4) /dev/<disk_name><decimal> represents the device number
+ *         of partition - device number of disk plus the partition number
+ *     5) /dev/<disk_name>p<decimal> - same as the above, that form is
+ *        used when disk name of partitioned disk ends on a digit.
+ *     6) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
+ *        unique id of a partition if the partition table provides it.
+ *        The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
+ *        partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
+ *        filled hex representation of the 32-bit "NT disk signature", and PP
+ *        is a zero-filled hex representation of the 1-based partition number.
+ *     7) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to
+ *        a partition with a known unique id.
+ *     8) <major>:<minor> major and minor number of the device separated by
+ *        a colon.
+ *     9) PARTLABEL=<name> with name being the GPT partition label.
+ *        MSDOS partitions do not support labels!
+ *
+ *     If the name doesn't fall into one of the categories above, we return (0,0).
+ *     block_class is used to check if something is a disk name. If the disk
+ *     name contains slashes, the device name has them replaced with
+ *     bangs.
+ */
+int __init early_lookup_bdev(const char *name, dev_t *devt)
+{
+       if (strncmp(name, "PARTUUID=", 9) == 0)
+               return devt_from_partuuid(name + 9, devt);
+       if (strncmp(name, "PARTLABEL=", 10) == 0)
+               return devt_from_partlabel(name + 10, devt);
+       if (strncmp(name, "/dev/", 5) == 0)
+               return devt_from_devname(name + 5, devt);
+       return devt_from_devnum(name, devt);
+}
+
+static char __init *bdevt_str(dev_t devt, char *buf)
+{
+       if (MAJOR(devt) <= 0xff && MINOR(devt) <= 0xff) {
+               char tbuf[BDEVT_SIZE];
+               snprintf(tbuf, BDEVT_SIZE, "%02x%02x", MAJOR(devt), MINOR(devt));
+               snprintf(buf, BDEVT_SIZE, "%-9s", tbuf);
+       } else
+               snprintf(buf, BDEVT_SIZE, "%03x:%05x", MAJOR(devt), MINOR(devt));
+
+       return buf;
+}
+
+/*
+ * print a full list of all partitions - intended for places where the root
+ * filesystem can't be mounted and thus to give the victim some idea of what
+ * went wrong
+ */
+void __init printk_all_partitions(void)
+{
+       struct class_dev_iter iter;
+       struct device *dev;
+
+       class_dev_iter_init(&iter, &block_class, NULL, &disk_type);
+       while ((dev = class_dev_iter_next(&iter))) {
+               struct gendisk *disk = dev_to_disk(dev);
+               struct block_device *part;
+               char devt_buf[BDEVT_SIZE];
+               unsigned long idx;
+
+               /*
+                * Don't show empty devices or things that have been
+                * suppressed
+                */
+               if (get_capacity(disk) == 0 || (disk->flags & GENHD_FL_HIDDEN))
+                       continue;
+
+               /*
+                * Note, unlike /proc/partitions, I am showing the numbers in
+                * hex - the same format as the root= option takes.
+                */
+               rcu_read_lock();
+               xa_for_each(&disk->part_tbl, idx, part) {
+                       if (!bdev_nr_sectors(part))
+                               continue;
+                       printk("%s%s %10llu %pg %s",
+                              bdev_is_partition(part) ? "  " : "",
+                              bdevt_str(part->bd_dev, devt_buf),
+                              bdev_nr_sectors(part) >> 1, part,
+                              part->bd_meta_info ?
+                                       part->bd_meta_info->uuid : "");
+                       if (bdev_is_partition(part))
+                               printk("\n");
+                       else if (dev->parent && dev->parent->driver)
+                               printk(" driver: %s\n",
+                                       dev->parent->driver->name);
+                       else
+                               printk(" (driver?)\n");
+               }
+               rcu_read_unlock();
+       }
+       class_dev_iter_exit(&iter);
+}
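
early_lookup_bdev() collects the root=-style name parsing into one __init helper. A small sketch of resolving a name into a dev_t during early boot; the name value is a placeholder and the declaration is assumed to be visible via linux/blkdev.h.

#include <linux/blkdev.h>
#include <linux/printk.h>

static int __init resolve_root_example(void)
{
        const char *name = "PARTUUID=00112233-4455-6677-8899-aabbccddeeff";
        dev_t devt;
        int ret;

        ret = early_lookup_bdev(name, &devt);
        if (ret)
                pr_err("cannot resolve %s: %d\n", name, ret);
        else
                pr_info("%s -> %u:%u\n", name, MAJOR(devt), MINOR(devt));
        return ret;
}
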
index 2490906..8400e30 100644 (file)
@@ -751,7 +751,7 @@ ssize_t elv_iosched_store(struct request_queue *q, const char *buf,
        if (!elv_support_iosched(q))
                return count;
 
-       strlcpy(elevator_name, buf, sizeof(elevator_name));
+       strscpy(elevator_name, buf, sizeof(elevator_name));
        ret = elevator_change(q, strstrip(elevator_name));
        if (!ret)
                return count;
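
The strlcpy() to strscpy() switches above trade the old "returns source length" semantics for a bounded copy that reports truncation. A short sketch of the checking pattern; copy_name_example() is an illustrative helper.

#include <linux/errno.h>
#include <linux/string.h>
#include <linux/types.h>

static int copy_name_example(char *dst, size_t dst_size, const char *src)
{
        ssize_t n = strscpy(dst, src, dst_size);

        if (n < 0)      /* -E2BIG: src did not fit; dst is still NUL-terminated */
                return -E2BIG;
        return 0;       /* n is the number of bytes copied, excluding the NUL */
}
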
index d2e6be4..555b1b9 100644 (file)
@@ -54,7 +54,7 @@ static bool blkdev_dio_unaligned(struct block_device *bdev, loff_t pos,
 static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
                struct iov_iter *iter, unsigned int nr_pages)
 {
-       struct block_device *bdev = iocb->ki_filp->private_data;
+       struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host);
        struct bio_vec inline_vecs[DIO_INLINE_BIO_VECS], *vecs;
        loff_t pos = iocb->ki_pos;
        bool should_dirty = false;
@@ -170,7 +170,7 @@ static void blkdev_bio_end_io(struct bio *bio)
 static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
                unsigned int nr_pages)
 {
-       struct block_device *bdev = iocb->ki_filp->private_data;
+       struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host);
        struct blk_plug plug;
        struct blkdev_dio *dio;
        struct bio *bio;
@@ -310,7 +310,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
                                        struct iov_iter *iter,
                                        unsigned int nr_pages)
 {
-       struct block_device *bdev = iocb->ki_filp->private_data;
+       struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host);
        bool is_read = iov_iter_rw(iter) == READ;
        blk_opf_t opf = is_read ? REQ_OP_READ : dio_bio_write_op(iocb);
        struct blkdev_dio *dio;
@@ -451,7 +451,7 @@ static loff_t blkdev_llseek(struct file *file, loff_t offset, int whence)
 static int blkdev_fsync(struct file *filp, loff_t start, loff_t end,
                int datasync)
 {
-       struct block_device *bdev = filp->private_data;
+       struct block_device *bdev = I_BDEV(filp->f_mapping->host);
        int error;
 
        error = file_write_and_wait_range(filp, start, end);
@@ -470,6 +470,30 @@ static int blkdev_fsync(struct file *filp, loff_t start, loff_t end,
        return error;
 }
 
+blk_mode_t file_to_blk_mode(struct file *file)
+{
+       blk_mode_t mode = 0;
+
+       if (file->f_mode & FMODE_READ)
+               mode |= BLK_OPEN_READ;
+       if (file->f_mode & FMODE_WRITE)
+               mode |= BLK_OPEN_WRITE;
+       if (file->private_data)
+               mode |= BLK_OPEN_EXCL;
+       if (file->f_flags & O_NDELAY)
+               mode |= BLK_OPEN_NDELAY;
+
+       /*
+        * If all bits in O_ACCMODE are set (aka O_RDWR | O_WRONLY), the floppy
+        * driver has historically allowed ioctls as if the file was opened for
+        * writing, but does not allow any actual reads or writes.
+        */
+       if ((file->f_flags & O_ACCMODE) == (O_RDWR | O_WRONLY))
+               mode |= BLK_OPEN_WRITE_IOCTL;
+
+       return mode;
+}
+
 static int blkdev_open(struct inode *inode, struct file *filp)
 {
        struct block_device *bdev;
@@ -481,30 +505,31 @@ static int blkdev_open(struct inode *inode, struct file *filp)
         * during an unstable branch.
         */
        filp->f_flags |= O_LARGEFILE;
-       filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC;
+       filp->f_mode |= FMODE_BUF_RASYNC;
 
-       if (filp->f_flags & O_NDELAY)
-               filp->f_mode |= FMODE_NDELAY;
+       /*
+        * Use the file private data to store the holder for exclusive opens.
+        * file_to_blk_mode relies on it being present to set BLK_OPEN_EXCL.
+        */
        if (filp->f_flags & O_EXCL)
-               filp->f_mode |= FMODE_EXCL;
-       if ((filp->f_flags & O_ACCMODE) == 3)
-               filp->f_mode |= FMODE_WRITE_IOCTL;
+               filp->private_data = filp;
 
-       bdev = blkdev_get_by_dev(inode->i_rdev, filp->f_mode, filp);
+       bdev = blkdev_get_by_dev(inode->i_rdev, file_to_blk_mode(filp),
+                                filp->private_data, NULL);
        if (IS_ERR(bdev))
                return PTR_ERR(bdev);
 
-       filp->private_data = bdev;
+       if (bdev_nowait(bdev))
+               filp->f_mode |= FMODE_NOWAIT;
+
        filp->f_mapping = bdev->bd_inode->i_mapping;
        filp->f_wb_err = filemap_sample_wb_err(filp->f_mapping);
        return 0;
 }
 
-static int blkdev_close(struct inode *inode, struct file *filp)
+static int blkdev_release(struct inode *inode, struct file *filp)
 {
-       struct block_device *bdev = filp->private_data;
-
-       blkdev_put(bdev, filp->f_mode);
+       blkdev_put(I_BDEV(filp->f_mapping->host), filp->private_data);
        return 0;
 }
 
@@ -517,10 +542,9 @@ static int blkdev_close(struct inode *inode, struct file *filp)
  */
 static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
-       struct block_device *bdev = iocb->ki_filp->private_data;
+       struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host);
        struct inode *bd_inode = bdev->bd_inode;
        loff_t size = bdev_nr_bytes(bdev);
-       struct blk_plug plug;
        size_t shorted = 0;
        ssize_t ret;
 
@@ -545,18 +569,16 @@ static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
                iov_iter_truncate(from, size);
        }
 
-       blk_start_plug(&plug);
        ret = __generic_file_write_iter(iocb, from);
        if (ret > 0)
                ret = generic_write_sync(iocb, ret);
        iov_iter_reexpand(from, iov_iter_count(from) + shorted);
-       blk_finish_plug(&plug);
        return ret;
 }
 
 static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
-       struct block_device *bdev = iocb->ki_filp->private_data;
+       struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host);
        loff_t size = bdev_nr_bytes(bdev);
        loff_t pos = iocb->ki_pos;
        size_t shorted = 0;
@@ -649,7 +671,7 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
        filemap_invalidate_lock(inode->i_mapping);
 
        /* Invalidate the page cache, including dirty pages. */
-       error = truncate_bdev_range(bdev, file->f_mode, start, end);
+       error = truncate_bdev_range(bdev, file_to_blk_mode(file), start, end);
        if (error)
                goto fail;
 
@@ -678,20 +700,30 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
        return error;
 }
 
+static int blkdev_mmap(struct file *file, struct vm_area_struct *vma)
+{
+       struct inode *bd_inode = bdev_file_inode(file);
+
+       if (bdev_read_only(I_BDEV(bd_inode)))
+               return generic_file_readonly_mmap(file, vma);
+
+       return generic_file_mmap(file, vma);
+}
+
 const struct file_operations def_blk_fops = {
        .open           = blkdev_open,
-       .release        = blkdev_close,
+       .release        = blkdev_release,
        .llseek         = blkdev_llseek,
        .read_iter      = blkdev_read_iter,
        .write_iter     = blkdev_write_iter,
        .iopoll         = iocb_bio_iopoll,
-       .mmap           = generic_file_mmap,
+       .mmap           = blkdev_mmap,
        .fsync          = blkdev_fsync,
        .unlocked_ioctl = blkdev_ioctl,
 #ifdef CONFIG_COMPAT
        .compat_ioctl   = compat_blkdev_ioctl,
 #endif
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
        .splice_write   = iter_file_splice_write,
        .fallocate      = blkdev_fallocate,
 };
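
The fops.c rework above derives blk_mode_t from the struct file on demand and identifies exclusive opens by the holder stored in file->private_data; blkdev_put() is then passed the holder rather than a mode. A hedged sketch of the open/put pairing as used in these hunks; the my_* names are illustrative.

#include <linux/blkdev.h>

static int my_bdev_user(dev_t devt, void *my_holder)
{
        struct block_device *bdev;

        /* Non-NULL holder => exclusive open; NULL ops => default behaviour. */
        bdev = blkdev_get_by_dev(devt, BLK_OPEN_READ | BLK_OPEN_WRITE,
                                 my_holder, NULL);
        if (IS_ERR(bdev))
                return PTR_ERR(bdev);

        /* ... I/O against bdev ... */

        blkdev_put(bdev, my_holder);    /* same holder, not a mode */
        return 0;
}
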
index 1cb489b..3d287b3 100644 (file)
@@ -25,8 +25,9 @@
 #include <linux/pm_runtime.h>
 #include <linux/badblocks.h>
 #include <linux/part_stat.h>
-#include "blk-throttle.h"
+#include <linux/blktrace_api.h>
 
+#include "blk-throttle.h"
 #include "blk.h"
 #include "blk-mq-sched.h"
 #include "blk-rq-qos.h"
@@ -253,7 +254,7 @@ int __register_blkdev(unsigned int major, const char *name,
 #ifdef CONFIG_BLOCK_LEGACY_AUTOLOAD
        p->probe = probe;
 #endif
-       strlcpy(p->name, name, sizeof(p->name));
+       strscpy(p->name, name, sizeof(p->name));
        p->next = NULL;
        index = major_to_index(major);
 
@@ -318,18 +319,6 @@ void blk_free_ext_minor(unsigned int minor)
        ida_free(&ext_devt_ida, minor);
 }
 
-static char *bdevt_str(dev_t devt, char *buf)
-{
-       if (MAJOR(devt) <= 0xff && MINOR(devt) <= 0xff) {
-               char tbuf[BDEVT_SIZE];
-               snprintf(tbuf, BDEVT_SIZE, "%02x%02x", MAJOR(devt), MINOR(devt));
-               snprintf(buf, BDEVT_SIZE, "%-9s", tbuf);
-       } else
-               snprintf(buf, BDEVT_SIZE, "%03x:%05x", MAJOR(devt), MINOR(devt));
-
-       return buf;
-}
-
 void disk_uevent(struct gendisk *disk, enum kobject_action action)
 {
        struct block_device *part;
@@ -351,7 +340,7 @@ void disk_uevent(struct gendisk *disk, enum kobject_action action)
 }
 EXPORT_SYMBOL_GPL(disk_uevent);
 
-int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
+int disk_scan_partitions(struct gendisk *disk, blk_mode_t mode)
 {
        struct block_device *bdev;
        int ret = 0;
@@ -369,18 +358,20 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
         * synchronize with other exclusive openers and other partition
         * scanners.
         */
-       if (!(mode & FMODE_EXCL)) {
-               ret = bd_prepare_to_claim(disk->part0, disk_scan_partitions);
+       if (!(mode & BLK_OPEN_EXCL)) {
+               ret = bd_prepare_to_claim(disk->part0, disk_scan_partitions,
+                                         NULL);
                if (ret)
                        return ret;
        }
 
        set_bit(GD_NEED_PART_SCAN, &disk->state);
-       bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL);
+       bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~BLK_OPEN_EXCL, NULL,
+                                NULL);
        if (IS_ERR(bdev))
                ret =  PTR_ERR(bdev);
        else
-               blkdev_put(bdev, mode & ~FMODE_EXCL);
+               blkdev_put(bdev, NULL);
 
        /*
         * If blkdev_get_by_dev() failed early, GD_NEED_PART_SCAN is still set,
@@ -388,7 +379,7 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
         * creat partition for underlying disk.
         */
        clear_bit(GD_NEED_PART_SCAN, &disk->state);
-       if (!(mode & FMODE_EXCL))
+       if (!(mode & BLK_OPEN_EXCL))
                bd_abort_claiming(disk->part0, disk_scan_partitions);
        return ret;
 }
@@ -516,7 +507,7 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
 
                bdev_add(disk->part0, ddev->devt);
                if (get_capacity(disk))
-                       disk_scan_partitions(disk, FMODE_READ);
+                       disk_scan_partitions(disk, BLK_OPEN_READ);
 
                /*
                 * Announce the disk and partitions after all partitions are
@@ -563,6 +554,28 @@ out_exit_elevator:
 }
 EXPORT_SYMBOL(device_add_disk);
 
+static void blk_report_disk_dead(struct gendisk *disk)
+{
+       struct block_device *bdev;
+       unsigned long idx;
+
+       rcu_read_lock();
+       xa_for_each(&disk->part_tbl, idx, bdev) {
+               if (!kobject_get_unless_zero(&bdev->bd_device.kobj))
+                       continue;
+               rcu_read_unlock();
+
+               mutex_lock(&bdev->bd_holder_lock);
+               if (bdev->bd_holder_ops && bdev->bd_holder_ops->mark_dead)
+                       bdev->bd_holder_ops->mark_dead(bdev);
+               mutex_unlock(&bdev->bd_holder_lock);
+
+               put_device(&bdev->bd_device);
+               rcu_read_lock();
+       }
+       rcu_read_unlock();
+}
+
 /**
  * blk_mark_disk_dead - mark a disk as dead
  * @disk: disk to mark as dead
@@ -572,13 +585,26 @@ EXPORT_SYMBOL(device_add_disk);
  */
 void blk_mark_disk_dead(struct gendisk *disk)
 {
-       set_bit(GD_DEAD, &disk->state);
-       blk_queue_start_drain(disk->queue);
+       /*
+        * Fail any new I/O.
+        */
+       if (test_and_set_bit(GD_DEAD, &disk->state))
+               return;
+
+       if (test_bit(GD_OWNS_QUEUE, &disk->state))
+               blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
 
        /*
         * Stop buffered writers from dirtying pages that can't be written out.
         */
-       set_capacity_and_notify(disk, 0);
+       set_capacity(disk, 0);
+
+       /*
+        * Prevent new I/O from crossing bio_queue_enter().
+        */
+       blk_queue_start_drain(disk->queue);
+
+       blk_report_disk_dead(disk);
 }
 EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
 
@@ -604,6 +630,8 @@ EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
 void del_gendisk(struct gendisk *disk)
 {
        struct request_queue *q = disk->queue;
+       struct block_device *part;
+       unsigned long idx;
 
        might_sleep();
 
@@ -612,26 +640,27 @@ void del_gendisk(struct gendisk *disk)
 
        disk_del_events(disk);
 
+       /*
+        * Prevent new openers by unlinking the bdev inode, and write out
+        * dirty data before marking the disk dead and stopping all I/O.
+        */
        mutex_lock(&disk->open_mutex);
-       remove_inode_hash(disk->part0->bd_inode);
-       blk_drop_partitions(disk);
+       xa_for_each(&disk->part_tbl, idx, part) {
+               remove_inode_hash(part->bd_inode);
+               fsync_bdev(part);
+               __invalidate_device(part, true);
+       }
        mutex_unlock(&disk->open_mutex);
 
-       fsync_bdev(disk->part0);
-       __invalidate_device(disk->part0, true);
+       blk_mark_disk_dead(disk);
 
        /*
-        * Fail any new I/O.
+        * Drop all partitions now that the disk is marked dead.
         */
-       set_bit(GD_DEAD, &disk->state);
-       if (test_bit(GD_OWNS_QUEUE, &disk->state))
-               blk_queue_flag_set(QUEUE_FLAG_DYING, q);
-       set_capacity(disk, 0);
-
-       /*
-        * Prevent new I/O from crossing bio_queue_enter().
-        */
-       blk_queue_start_drain(q);
+       mutex_lock(&disk->open_mutex);
+       xa_for_each_start(&disk->part_tbl, idx, part, 1)
+               drop_partition(part);
+       mutex_unlock(&disk->open_mutex);
 
        if (!(disk->flags & GENHD_FL_HIDDEN)) {
                sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
@@ -755,57 +784,6 @@ void blk_request_module(dev_t devt)
 }
 #endif /* CONFIG_BLOCK_LEGACY_AUTOLOAD */
 
-/*
- * print a full list of all partitions - intended for places where the root
- * filesystem can't be mounted and thus to give the victim some idea of what
- * went wrong
- */
-void __init printk_all_partitions(void)
-{
-       struct class_dev_iter iter;
-       struct device *dev;
-
-       class_dev_iter_init(&iter, &block_class, NULL, &disk_type);
-       while ((dev = class_dev_iter_next(&iter))) {
-               struct gendisk *disk = dev_to_disk(dev);
-               struct block_device *part;
-               char devt_buf[BDEVT_SIZE];
-               unsigned long idx;
-
-               /*
-                * Don't show empty devices or things that have been
-                * suppressed
-                */
-               if (get_capacity(disk) == 0 || (disk->flags & GENHD_FL_HIDDEN))
-                       continue;
-
-               /*
-                * Note, unlike /proc/partitions, I am showing the numbers in
-                * hex - the same format as the root= option takes.
-                */
-               rcu_read_lock();
-               xa_for_each(&disk->part_tbl, idx, part) {
-                       if (!bdev_nr_sectors(part))
-                               continue;
-                       printk("%s%s %10llu %pg %s",
-                              bdev_is_partition(part) ? "  " : "",
-                              bdevt_str(part->bd_dev, devt_buf),
-                              bdev_nr_sectors(part) >> 1, part,
-                              part->bd_meta_info ?
-                                       part->bd_meta_info->uuid : "");
-                       if (bdev_is_partition(part))
-                               printk("\n");
-                       else if (dev->parent && dev->parent->driver)
-                               printk(" driver: %s\n",
-                                       dev->parent->driver->name);
-                       else
-                               printk(" (driver?)\n");
-               }
-               rcu_read_unlock();
-       }
-       class_dev_iter_exit(&iter);
-}
-
 #ifdef CONFIG_PROC_FS
 /* iterator */
 static void *disk_seqf_start(struct seq_file *seqf, loff_t *pos)
@@ -1171,6 +1149,8 @@ static void disk_release(struct device *dev)
        might_sleep();
        WARN_ON_ONCE(disk_live(disk));
 
+       blk_trace_remove(disk->queue);
+
        /*
         * To undo the all initialization from blk_mq_init_allocated_queue in
         * case of a probe failure where add_disk is never called we have to
@@ -1339,35 +1319,6 @@ dev_t part_devt(struct gendisk *disk, u8 partno)
        return devt;
 }
 
-dev_t blk_lookup_devt(const char *name, int partno)
-{
-       dev_t devt = MKDEV(0, 0);
-       struct class_dev_iter iter;
-       struct device *dev;
-
-       class_dev_iter_init(&iter, &block_class, NULL, &disk_type);
-       while ((dev = class_dev_iter_next(&iter))) {
-               struct gendisk *disk = dev_to_disk(dev);
-
-               if (strcmp(dev_name(dev), name))
-                       continue;
-
-               if (partno < disk->minors) {
-                       /* We need to return the right devno, even
-                        * if the partition doesn't exist yet.
-                        */
-                       devt = MKDEV(MAJOR(dev->devt),
-                                    MINOR(dev->devt) + partno);
-               } else {
-                       devt = part_devt(disk, partno);
-                       if (devt)
-                               break;
-               }
-       }
-       class_dev_iter_exit(&iter);
-       return devt;
-}
-
 struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id,
                struct lock_class_key *lkclass)
 {
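
blk_report_disk_dead() above notifies exclusive holders through bd_holder_ops->mark_dead() when a disk is marked dead. A sketch of a holder supplying those ops at open time, assuming the struct blk_holder_ops type taken by blkdev_get_by_dev() in this series; the my_fs_* names are illustrative.

#include <linux/blkdev.h>

static void my_fs_mark_dead(struct block_device *bdev)
{
        /* the device is gone: start shutting down the holder (e.g. an fs) */
        pr_warn("%pg: device marked dead\n", bdev);
}

static const struct blk_holder_ops my_fs_holder_ops = {
        .mark_dead      = my_fs_mark_dead,
};

static struct block_device *my_fs_open_bdev(dev_t devt, void *my_sb)
{
        return blkdev_get_by_dev(devt, BLK_OPEN_READ | BLK_OPEN_WRITE,
                                 my_sb, &my_fs_holder_ops);
}
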
index 9c5f637..3be1194 100644 (file)
@@ -82,7 +82,7 @@ static int compat_blkpg_ioctl(struct block_device *bdev,
 }
 #endif
 
-static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
+static int blk_ioctl_discard(struct block_device *bdev, blk_mode_t mode,
                unsigned long arg)
 {
        uint64_t range[2];
@@ -90,7 +90,7 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
        struct inode *inode = bdev->bd_inode;
        int err;
 
-       if (!(mode & FMODE_WRITE))
+       if (!(mode & BLK_OPEN_WRITE))
                return -EBADF;
 
        if (!bdev_max_discard_sectors(bdev))
@@ -120,14 +120,14 @@ fail:
        return err;
 }
 
-static int blk_ioctl_secure_erase(struct block_device *bdev, fmode_t mode,
+static int blk_ioctl_secure_erase(struct block_device *bdev, blk_mode_t mode,
                void __user *argp)
 {
        uint64_t start, len;
        uint64_t range[2];
        int err;
 
-       if (!(mode & FMODE_WRITE))
+       if (!(mode & BLK_OPEN_WRITE))
                return -EBADF;
        if (!bdev_max_secure_erase_sectors(bdev))
                return -EOPNOTSUPP;
@@ -151,7 +151,7 @@ static int blk_ioctl_secure_erase(struct block_device *bdev, fmode_t mode,
 }
 
 
-static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
+static int blk_ioctl_zeroout(struct block_device *bdev, blk_mode_t mode,
                unsigned long arg)
 {
        uint64_t range[2];
@@ -159,7 +159,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
        struct inode *inode = bdev->bd_inode;
        int err;
 
-       if (!(mode & FMODE_WRITE))
+       if (!(mode & BLK_OPEN_WRITE))
                return -EBADF;
 
        if (copy_from_user(range, (void __user *)arg, sizeof(range)))
@@ -240,7 +240,7 @@ static int compat_put_ulong(compat_ulong_t __user *argp, compat_ulong_t val)
  * drivers that implement only commands that are completely compatible
  * between 32-bit and 64-bit user space
  */
-int blkdev_compat_ptr_ioctl(struct block_device *bdev, fmode_t mode,
+int blkdev_compat_ptr_ioctl(struct block_device *bdev, blk_mode_t mode,
                        unsigned cmd, unsigned long arg)
 {
        struct gendisk *disk = bdev->bd_disk;
@@ -254,13 +254,28 @@ int blkdev_compat_ptr_ioctl(struct block_device *bdev, fmode_t mode,
 EXPORT_SYMBOL(blkdev_compat_ptr_ioctl);
 #endif
 
-static int blkdev_pr_register(struct block_device *bdev,
+static bool blkdev_pr_allowed(struct block_device *bdev, blk_mode_t mode)
+{
+       /* no sense to make reservations for partitions */
+       if (bdev_is_partition(bdev))
+               return false;
+
+       if (capable(CAP_SYS_ADMIN))
+               return true;
+       /*
+        * Only allow unprivileged reservations if the file descriptor is open
+        * for writing.
+        */
+       return mode & BLK_OPEN_WRITE;
+}
+
+static int blkdev_pr_register(struct block_device *bdev, blk_mode_t mode,
                struct pr_registration __user *arg)
 {
        const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
        struct pr_registration reg;
 
-       if (!capable(CAP_SYS_ADMIN))
+       if (!blkdev_pr_allowed(bdev, mode))
                return -EPERM;
        if (!ops || !ops->pr_register)
                return -EOPNOTSUPP;
@@ -272,13 +287,13 @@ static int blkdev_pr_register(struct block_device *bdev,
        return ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags);
 }
 
-static int blkdev_pr_reserve(struct block_device *bdev,
+static int blkdev_pr_reserve(struct block_device *bdev, blk_mode_t mode,
                struct pr_reservation __user *arg)
 {
        const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
        struct pr_reservation rsv;
 
-       if (!capable(CAP_SYS_ADMIN))
+       if (!blkdev_pr_allowed(bdev, mode))
                return -EPERM;
        if (!ops || !ops->pr_reserve)
                return -EOPNOTSUPP;
@@ -290,13 +305,13 @@ static int blkdev_pr_reserve(struct block_device *bdev,
        return ops->pr_reserve(bdev, rsv.key, rsv.type, rsv.flags);
 }
 
-static int blkdev_pr_release(struct block_device *bdev,
+static int blkdev_pr_release(struct block_device *bdev, blk_mode_t mode,
                struct pr_reservation __user *arg)
 {
        const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
        struct pr_reservation rsv;
 
-       if (!capable(CAP_SYS_ADMIN))
+       if (!blkdev_pr_allowed(bdev, mode))
                return -EPERM;
        if (!ops || !ops->pr_release)
                return -EOPNOTSUPP;
@@ -308,13 +323,13 @@ static int blkdev_pr_release(struct block_device *bdev,
        return ops->pr_release(bdev, rsv.key, rsv.type);
 }
 
-static int blkdev_pr_preempt(struct block_device *bdev,
+static int blkdev_pr_preempt(struct block_device *bdev, blk_mode_t mode,
                struct pr_preempt __user *arg, bool abort)
 {
        const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
        struct pr_preempt p;
 
-       if (!capable(CAP_SYS_ADMIN))
+       if (!blkdev_pr_allowed(bdev, mode))
                return -EPERM;
        if (!ops || !ops->pr_preempt)
                return -EOPNOTSUPP;
@@ -326,13 +341,13 @@ static int blkdev_pr_preempt(struct block_device *bdev,
        return ops->pr_preempt(bdev, p.old_key, p.new_key, p.type, abort);
 }
 
-static int blkdev_pr_clear(struct block_device *bdev,
+static int blkdev_pr_clear(struct block_device *bdev, blk_mode_t mode,
                struct pr_clear __user *arg)
 {
        const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
        struct pr_clear c;
 
-       if (!capable(CAP_SYS_ADMIN))
+       if (!blkdev_pr_allowed(bdev, mode))
                return -EPERM;
        if (!ops || !ops->pr_clear)
                return -EOPNOTSUPP;
@@ -344,8 +359,8 @@ static int blkdev_pr_clear(struct block_device *bdev,
        return ops->pr_clear(bdev, c.key);
 }
 
-static int blkdev_flushbuf(struct block_device *bdev, fmode_t mode,
-               unsigned cmd, unsigned long arg)
+static int blkdev_flushbuf(struct block_device *bdev, unsigned cmd,
+               unsigned long arg)
 {
        if (!capable(CAP_SYS_ADMIN))
                return -EACCES;
@@ -354,8 +369,8 @@ static int blkdev_flushbuf(struct block_device *bdev, fmode_t mode,
        return 0;
 }
 
-static int blkdev_roset(struct block_device *bdev, fmode_t mode,
-               unsigned cmd, unsigned long arg)
+static int blkdev_roset(struct block_device *bdev, unsigned cmd,
+               unsigned long arg)
 {
        int ret, n;
 
@@ -439,7 +454,7 @@ static int compat_hdio_getgeo(struct block_device *bdev,
 #endif
 
 /* set the logical block size */
-static int blkdev_bszset(struct block_device *bdev, fmode_t mode,
+static int blkdev_bszset(struct block_device *bdev, blk_mode_t mode,
                int __user *argp)
 {
        int ret, n;
@@ -451,13 +466,13 @@ static int blkdev_bszset(struct block_device *bdev, fmode_t mode,
        if (get_user(n, argp))
                return -EFAULT;
 
-       if (mode & FMODE_EXCL)
+       if (mode & BLK_OPEN_EXCL)
                return set_blocksize(bdev, n);
 
-       if (IS_ERR(blkdev_get_by_dev(bdev->bd_dev, mode | FMODE_EXCL, &bdev)))
+       if (IS_ERR(blkdev_get_by_dev(bdev->bd_dev, mode, &bdev, NULL)))
                return -EBUSY;
        ret = set_blocksize(bdev, n);
-       blkdev_put(bdev, mode | FMODE_EXCL);
+       blkdev_put(bdev, &bdev);
 
        return ret;
 }
@@ -467,7 +482,7 @@ static int blkdev_bszset(struct block_device *bdev, fmode_t mode,
  * user space. Note the separate arg/argp parameters that are needed
  * to deal with the compat_ptr() conversion.
  */
-static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
+static int blkdev_common_ioctl(struct block_device *bdev, blk_mode_t mode,
                               unsigned int cmd, unsigned long arg,
                               void __user *argp)
 {
@@ -475,9 +490,9 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
 
        switch (cmd) {
        case BLKFLSBUF:
-               return blkdev_flushbuf(bdev, mode, cmd, arg);
+               return blkdev_flushbuf(bdev, cmd, arg);
        case BLKROSET:
-               return blkdev_roset(bdev, mode, cmd, arg);
+               return blkdev_roset(bdev, cmd, arg);
        case BLKDISCARD:
                return blk_ioctl_discard(bdev, mode, arg);
        case BLKSECDISCARD:
@@ -487,7 +502,7 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
        case BLKGETDISKSEQ:
                return put_u64(argp, bdev->bd_disk->diskseq);
        case BLKREPORTZONE:
-               return blkdev_report_zones_ioctl(bdev, mode, cmd, arg);
+               return blkdev_report_zones_ioctl(bdev, cmd, arg);
        case BLKRESETZONE:
        case BLKOPENZONE:
        case BLKCLOSEZONE:
@@ -534,17 +549,17 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
        case BLKTRACETEARDOWN:
                return blk_trace_ioctl(bdev, cmd, argp);
        case IOC_PR_REGISTER:
-               return blkdev_pr_register(bdev, argp);
+               return blkdev_pr_register(bdev, mode, argp);
        case IOC_PR_RESERVE:
-               return blkdev_pr_reserve(bdev, argp);
+               return blkdev_pr_reserve(bdev, mode, argp);
        case IOC_PR_RELEASE:
-               return blkdev_pr_release(bdev, argp);
+               return blkdev_pr_release(bdev, mode, argp);
        case IOC_PR_PREEMPT:
-               return blkdev_pr_preempt(bdev, argp, false);
+               return blkdev_pr_preempt(bdev, mode, argp, false);
        case IOC_PR_PREEMPT_ABORT:
-               return blkdev_pr_preempt(bdev, argp, true);
+               return blkdev_pr_preempt(bdev, mode, argp, true);
        case IOC_PR_CLEAR:
-               return blkdev_pr_clear(bdev, argp);
+               return blkdev_pr_clear(bdev, mode, argp);
        default:
                return -ENOIOCTLCMD;
        }
@@ -560,18 +575,9 @@ long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 {
        struct block_device *bdev = I_BDEV(file->f_mapping->host);
        void __user *argp = (void __user *)arg;
-       fmode_t mode = file->f_mode;
+       blk_mode_t mode = file_to_blk_mode(file);
        int ret;
 
-       /*
-        * O_NDELAY can be altered using fcntl(.., F_SETFL, ..), so we have
-        * to updated it before every ioctl.
-        */
-       if (file->f_flags & O_NDELAY)
-               mode |= FMODE_NDELAY;
-       else
-               mode &= ~FMODE_NDELAY;
-
        switch (cmd) {
        /* These need separate implementations for the data structure */
        case HDIO_GETGEO:
@@ -630,16 +636,7 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
        void __user *argp = compat_ptr(arg);
        struct block_device *bdev = I_BDEV(file->f_mapping->host);
        struct gendisk *disk = bdev->bd_disk;
-       fmode_t mode = file->f_mode;
-
-       /*
-        * O_NDELAY can be altered using fcntl(.., F_SETFL, ..), so we have
-        * to updated it before every ioctl.
-        */
-       if (file->f_flags & O_NDELAY)
-               mode |= FMODE_NDELAY;
-       else
-               mode &= ~FMODE_NDELAY;
+       blk_mode_t mode = file_to_blk_mode(file);
 
        switch (cmd) {
        /* These need separate implementations for the data structure */
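
blkdev_pr_allowed() above relaxes the persistent-reservation ioctls from CAP_SYS_ADMIN to "whole device opened for writing". A small userspace sketch of an unprivileged registration under the new rule; the device path and key are placeholders.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/pr.h>

int main(void)
{
        struct pr_registration reg;
        int fd = open("/dev/sdX", O_RDWR);      /* whole device, not a partition */

        if (fd < 0) {
                perror("open");
                return 1;
        }
        memset(&reg, 0, sizeof(reg));
        reg.new_key = 0x1234abcdULL;            /* placeholder reservation key */
        if (ioctl(fd, IOC_PR_REGISTER, &reg))
                perror("IOC_PR_REGISTER");
        close(fd);
        return 0;
}
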
index 5839a02..6aa5daf 100644 (file)
@@ -74,8 +74,8 @@ struct dd_per_prio {
        struct list_head dispatch;
        struct rb_root sort_list[DD_DIR_COUNT];
        struct list_head fifo_list[DD_DIR_COUNT];
-       /* Next request in FIFO order. Read, write or both are NULL. */
-       struct request *next_rq[DD_DIR_COUNT];
+       /* Position of the most recently dispatched request. */
+       sector_t latest_pos[DD_DIR_COUNT];
        struct io_stats_per_prio stats;
 };
 
@@ -156,6 +156,40 @@ deadline_latter_request(struct request *rq)
        return NULL;
 }
 
+/*
+ * Return the first request for which blk_rq_pos() >= @pos. For zoned devices,
+ * return the first request after the start of the zone containing @pos.
+ */
+static inline struct request *deadline_from_pos(struct dd_per_prio *per_prio,
+                               enum dd_data_dir data_dir, sector_t pos)
+{
+       struct rb_node *node = per_prio->sort_list[data_dir].rb_node;
+       struct request *rq, *res = NULL;
+
+       if (!node)
+               return NULL;
+
+       rq = rb_entry_rq(node);
+       /*
+        * A zoned write may have been requeued with a starting position that
+        * is below that of the most recently dispatched request. Hence, for
+        * zoned writes, start searching from the start of a zone.
+        */
+       if (blk_rq_is_seq_zoned_write(rq))
+               pos -= round_down(pos, rq->q->limits.chunk_sectors);
+
+       while (node) {
+               rq = rb_entry_rq(node);
+               if (blk_rq_pos(rq) >= pos) {
+                       res = rq;
+                       node = node->rb_left;
+               } else {
+                       node = node->rb_right;
+               }
+       }
+       return res;
+}
+
 static void
 deadline_add_rq_rb(struct dd_per_prio *per_prio, struct request *rq)
 {
@@ -167,11 +201,6 @@ deadline_add_rq_rb(struct dd_per_prio *per_prio, struct request *rq)
 static inline void
 deadline_del_rq_rb(struct dd_per_prio *per_prio, struct request *rq)
 {
-       const enum dd_data_dir data_dir = rq_data_dir(rq);
-
-       if (per_prio->next_rq[data_dir] == rq)
-               per_prio->next_rq[data_dir] = deadline_latter_request(rq);
-
        elv_rb_del(deadline_rb_root(per_prio, rq), rq);
 }
 
@@ -251,10 +280,6 @@ static void
 deadline_move_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
                      struct request *rq)
 {
-       const enum dd_data_dir data_dir = rq_data_dir(rq);
-
-       per_prio->next_rq[data_dir] = deadline_latter_request(rq);
-
        /*
         * take it off the sort and fifo list
         */
@@ -272,21 +297,15 @@ static u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
 }
 
 /*
- * deadline_check_fifo returns 0 if there are no expired requests on the fifo,
- * 1 otherwise. Requires !list_empty(&dd->fifo_list[data_dir])
+ * deadline_check_fifo returns true if and only if there are expired requests
+ * in the FIFO list. Requires !list_empty(&dd->fifo_list[data_dir]).
  */
-static inline int deadline_check_fifo(struct dd_per_prio *per_prio,
-                                     enum dd_data_dir data_dir)
+static inline bool deadline_check_fifo(struct dd_per_prio *per_prio,
+                                      enum dd_data_dir data_dir)
 {
        struct request *rq = rq_entry_fifo(per_prio->fifo_list[data_dir].next);
 
-       /*
-        * rq is expired!
-        */
-       if (time_after_eq(jiffies, (unsigned long)rq->fifo_time))
-               return 1;
-
-       return 0;
+       return time_is_before_eq_jiffies((unsigned long)rq->fifo_time);
 }
 
 /*
@@ -310,14 +329,11 @@ static struct request *deadline_skip_seq_writes(struct deadline_data *dd,
                                                struct request *rq)
 {
        sector_t pos = blk_rq_pos(rq);
-       sector_t skipped_sectors = 0;
 
-       while (rq) {
-               if (blk_rq_pos(rq) != pos + skipped_sectors)
-                       break;
-               skipped_sectors += blk_rq_sectors(rq);
+       do {
+               pos += blk_rq_sectors(rq);
                rq = deadline_latter_request(rq);
-       }
+       } while (rq && blk_rq_pos(rq) == pos);
 
        return rq;
 }
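
The rewritten loop advances a running position by each request's size and stops at the first request that does not pick up exactly where the previous one ended. The same "skip the contiguous run" walk, sketched over an array of (start, sectors) extents sorted by start sector:

    #include <stddef.h>

    struct extent {
            unsigned long long start;
            unsigned long long sectors;
    };

    /* Return the index of the first extent after the contiguous run that
     * begins at index i; returns n if the run extends to the end.
     */
    static size_t skip_contiguous_run(const struct extent *e, size_t n, size_t i)
    {
            unsigned long long pos = e[i].start;

            do {
                    pos += e[i].sectors;
                    i++;
            } while (i < n && e[i].start == pos);

            return i;
    }
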
@@ -330,7 +346,7 @@ static struct request *
 deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
                      enum dd_data_dir data_dir)
 {
-       struct request *rq;
+       struct request *rq, *rb_rq, *next;
        unsigned long flags;
 
        if (list_empty(&per_prio->fifo_list[data_dir]))
@@ -348,7 +364,12 @@ deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
         * zones and these zones are unlocked.
         */
        spin_lock_irqsave(&dd->zone_lock, flags);
-       list_for_each_entry(rq, &per_prio->fifo_list[DD_WRITE], queuelist) {
+       list_for_each_entry_safe(rq, next, &per_prio->fifo_list[DD_WRITE],
+                                queuelist) {
+               /* Check whether a prior request exists for the same zone. */
+               rb_rq = deadline_from_pos(per_prio, data_dir, blk_rq_pos(rq));
+               if (rb_rq && blk_rq_pos(rb_rq) < blk_rq_pos(rq))
+                       rq = rb_rq;
                if (blk_req_can_dispatch_to_zone(rq) &&
                    (blk_queue_nonrot(rq->q) ||
                     !deadline_is_seq_write(dd, rq)))
@@ -372,7 +393,8 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
        struct request *rq;
        unsigned long flags;
 
-       rq = per_prio->next_rq[data_dir];
+       rq = deadline_from_pos(per_prio, data_dir,
+                              per_prio->latest_pos[data_dir]);
        if (!rq)
                return NULL;
 
@@ -435,6 +457,7 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
                if (started_after(dd, rq, latest_start))
                        return NULL;
                list_del_init(&rq->queuelist);
+               data_dir = rq_data_dir(rq);
                goto done;
        }
 
@@ -442,9 +465,11 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
         * batches are currently reads XOR writes
         */
        rq = deadline_next_request(dd, per_prio, dd->last_dir);
-       if (rq && dd->batching < dd->fifo_batch)
-               /* we have a next request are still entitled to batch */
+       if (rq && dd->batching < dd->fifo_batch) {
+               /* we have a next request and are still entitled to batch */
+               data_dir = rq_data_dir(rq);
                goto dispatch_request;
+       }
 
        /*
         * at this point we are not running a batch. select the appropriate
@@ -522,6 +547,7 @@ dispatch_request:
 done:
        ioprio_class = dd_rq_ioclass(rq);
        prio = ioprio_class_to_prio[ioprio_class];
+       dd->per_prio[prio].latest_pos[data_dir] = blk_rq_pos(rq);
        dd->per_prio[prio].stats.dispatched++;
        /*
         * If the request needs its target zone locked, do it.
@@ -766,7 +792,7 @@ static bool dd_bio_merge(struct request_queue *q, struct bio *bio,
  * add rq to rbtree and fifo
  */
 static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
-                             blk_insert_t flags)
+                             blk_insert_t flags, struct list_head *free)
 {
        struct request_queue *q = hctx->queue;
        struct deadline_data *dd = q->elevator->elevator_data;
@@ -775,7 +801,6 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
        u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
        struct dd_per_prio *per_prio;
        enum dd_prio prio;
-       LIST_HEAD(free);
 
        lockdep_assert_held(&dd->lock);
 
@@ -792,10 +817,8 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
                rq->elv.priv[0] = (void *)(uintptr_t)1;
        }
 
-       if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
-               blk_mq_free_requests(&free);
+       if (blk_mq_sched_try_insert_merge(q, rq, free))
                return;
-       }
 
        trace_block_rq_insert(rq);
 
@@ -803,6 +826,8 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
                list_add(&rq->queuelist, &per_prio->dispatch);
                rq->fifo_time = jiffies;
        } else {
+               struct list_head *insert_before;
+
                deadline_add_rq_rb(per_prio, rq);
 
                if (rq_mergeable(rq)) {
@@ -815,7 +840,20 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
                 * set expire time and add to fifo list
                 */
                rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
-               list_add_tail(&rq->queuelist, &per_prio->fifo_list[data_dir]);
+               insert_before = &per_prio->fifo_list[data_dir];
+#ifdef CONFIG_BLK_DEV_ZONED
+               /*
+                * Insert zoned writes such that requests are sorted by
+                * position per zone.
+                */
+               if (blk_rq_is_seq_zoned_write(rq)) {
+                       struct request *rq2 = deadline_latter_request(rq);
+
+                       if (rq2 && blk_rq_zone_no(rq2) == blk_rq_zone_no(rq))
+                               insert_before = &rq2->queuelist;
+               }
+#endif
+               list_add_tail(&rq->queuelist, insert_before);
        }
 }
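
With this change the write FIFO is no longer purely arrival-ordered on zoned devices: a new sequential zoned write is linked in ahead of the next queued request that targets the same zone, which keeps each zone's writes sorted by position while leaving everything else in FIFO order. A compact sketch of that insertion policy, assuming a circular doubly linked list with a sentinel head and a zone number obtained by dividing the start sector by the zone size:

    struct fnode {
            struct fnode *prev, *next;
            unsigned long long pos;
    };

    /* Insert "nn" before the first queued node of the same zone that starts at
     * or after it; otherwise append at the tail (plain FIFO behaviour).
     */
    static void fifo_insert_zoned(struct fnode *head, struct fnode *nn,
                                  unsigned long long zone_sectors)
    {
            struct fnode *before = head;    /* head acts as the tail sentinel */

            for (struct fnode *it = head->next; it != head; it = it->next) {
                    if (it->pos / zone_sectors == nn->pos / zone_sectors &&
                        it->pos >= nn->pos) {
                            before = it;
                            break;
                    }
            }

            nn->next = before;
            nn->prev = before->prev;
            before->prev->next = nn;
            before->prev = nn;
    }
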
 
@@ -828,6 +866,7 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
 {
        struct request_queue *q = hctx->queue;
        struct deadline_data *dd = q->elevator->elevator_data;
+       LIST_HEAD(free);
 
        spin_lock(&dd->lock);
        while (!list_empty(list)) {
@@ -835,9 +874,11 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
 
                rq = list_first_entry(list, struct request, queuelist);
                list_del_init(&rq->queuelist);
-               dd_insert_request(hctx, rq, flags);
+               dd_insert_request(hctx, rq, flags, &free);
        }
        spin_unlock(&dd->lock);
+
+       blk_mq_free_requests(&free);
 }
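
dd_insert_request() no longer frees merged-away requests itself; callers pass in a local list, and dd_insert_requests() hands that list to blk_mq_free_requests() only after dd->lock has been released, so the freeing work never runs inside the scheduler's critical section. The same defer-the-cleanup pattern, sketched with a pthread mutex and a singly linked scrap list (try_merge() is a stand-in for the real merge test):

    #include <pthread.h>
    #include <stdlib.h>

    struct item {
            struct item *next;
            /* payload ... */
    };

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* Stand-in for "this item was merged into an existing one and is now dead". */
    static int try_merge(struct item *it)
    {
            (void)it;
            return 0;
    }

    static void insert_batch(struct item **incoming)
    {
            struct item *scrap = NULL;      /* items to free once unlocked */
            struct item *it;

            pthread_mutex_lock(&lock);
            while ((it = *incoming) != NULL) {
                    *incoming = it->next;
                    if (try_merge(it)) {
                            it->next = scrap;       /* defer the free */
                            scrap = it;
                    } else {
                            /* ... link "it" into the scheduler's own lists ... */
                    }
            }
            pthread_mutex_unlock(&lock);

            while ((it = scrap) != NULL) {  /* cleanup outside the lock */
                    scrap = it->next;
                    free(it);
            }
    }
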
 
 /* Callback from inside blk_mq_rq_ctx_init(). */
@@ -1035,8 +1076,10 @@ static int deadline_##name##_next_rq_show(void *data,                    \
        struct request_queue *q = data;                                 \
        struct deadline_data *dd = q->elevator->elevator_data;          \
        struct dd_per_prio *per_prio = &dd->per_prio[prio];             \
-       struct request *rq = per_prio->next_rq[data_dir];               \
+       struct request *rq;                                             \
                                                                        \
+       rq = deadline_from_pos(per_prio, data_dir,                      \
+                              per_prio->latest_pos[data_dir]);         \
        if (rq)                                                         \
                __blk_mq_debugfs_rq_show(m, rq);                        \
        return 0;                                                       \
index 5c8624e..ed222b9 100644 (file)
 #define pr_fmt(fmt) fmt
 
 #include <linux/types.h>
+#include <linux/mm_types.h>
+#include <linux/overflow.h>
 #include <linux/affs_hardblocks.h>
 
 #include "check.h"
 
+/* magic offsets in partition DosEnvVec */
+#define NR_HD  3
+#define NR_SECT        5
+#define LO_CYL 9
+#define HI_CYL 10
+
 static __inline__ u32
 checksum_block(__be32 *m, int size)
 {
@@ -31,8 +39,12 @@ int amiga_partition(struct parsed_partitions *state)
        unsigned char *data;
        struct RigidDiskBlock *rdb;
        struct PartitionBlock *pb;
-       int start_sect, nr_sects, blk, part, res = 0;
-       int blksize = 1;        /* Multiplier for disk block size */
+       u64 start_sect, nr_sects;
+       sector_t blk, end_sect;
+       u32 cylblk;             /* rdb_CylBlocks = nr_heads*sect_per_track */
+       u32 nr_hd, nr_sect, lo_cyl, hi_cyl;
+       int part, res = 0;
+       unsigned int blksize = 1;       /* Multiplier for disk block size */
        int slot = 1;
 
        for (blk = 0; ; blk++, put_dev_sector(sect)) {
@@ -40,7 +52,7 @@ int amiga_partition(struct parsed_partitions *state)
                        goto rdb_done;
                data = read_part_sector(state, blk, &sect);
                if (!data) {
-                       pr_err("Dev %s: unable to read RDB block %d\n",
+                       pr_err("Dev %s: unable to read RDB block %llu\n",
                               state->disk->disk_name, blk);
                        res = -1;
                        goto rdb_done;
@@ -57,12 +69,12 @@ int amiga_partition(struct parsed_partitions *state)
                *(__be32 *)(data+0xdc) = 0;
                if (checksum_block((__be32 *)data,
                                be32_to_cpu(rdb->rdb_SummedLongs) & 0x7F)==0) {
-                       pr_err("Trashed word at 0xd0 in block %d ignored in checksum calculation\n",
+                       pr_err("Trashed word at 0xd0 in block %llu ignored in checksum calculation\n",
                               blk);
                        break;
                }
 
-               pr_err("Dev %s: RDB in block %d has bad checksum\n",
+               pr_err("Dev %s: RDB in block %llu has bad checksum\n",
                       state->disk->disk_name, blk);
        }
 
@@ -79,10 +91,15 @@ int amiga_partition(struct parsed_partitions *state)
        blk = be32_to_cpu(rdb->rdb_PartitionList);
        put_dev_sector(sect);
        for (part = 1; blk>0 && part<=16; part++, put_dev_sector(sect)) {
-               blk *= blksize; /* Read in terms partition table understands */
+               /* Read in terms partition table understands */
+               if (check_mul_overflow(blk, (sector_t) blksize, &blk)) {
+                       pr_err("Dev %s: overflow calculating partition block %llu! Skipping partitions %u and beyond\n",
+                               state->disk->disk_name, blk, part);
+                       break;
+               }
                data = read_part_sector(state, blk, &sect);
                if (!data) {
-                       pr_err("Dev %s: unable to read partition block %d\n",
+                       pr_err("Dev %s: unable to read partition block %llu\n",
                               state->disk->disk_name, blk);
                        res = -1;
                        goto rdb_done;
@@ -94,19 +111,70 @@ int amiga_partition(struct parsed_partitions *state)
                if (checksum_block((__be32 *)pb, be32_to_cpu(pb->pb_SummedLongs) & 0x7F) != 0 )
                        continue;
 
-               /* Tell Kernel about it */
+               /* RDB gives us more than enough rope to hang ourselves with,
+                * many times over (2^128 bytes if all fields max out).
+                * Some careful checks are in order, so check for potential
+                * overflows.
+                * We are multiplying four 32-bit numbers into one sector_t!
+                */
+
+               nr_hd   = be32_to_cpu(pb->pb_Environment[NR_HD]);
+               nr_sect = be32_to_cpu(pb->pb_Environment[NR_SECT]);
+
+               /* CylBlocks is total number of blocks per cylinder */
+               if (check_mul_overflow(nr_hd, nr_sect, &cylblk)) {
+                       pr_err("Dev %s: heads*sects %u overflows u32, skipping partition!\n",
+                               state->disk->disk_name, cylblk);
+                       continue;
+               }
+
+               /* check for consistency with RDB defined CylBlocks */
+               if (cylblk > be32_to_cpu(rdb->rdb_CylBlocks)) {
+                       pr_warn("Dev %s: cylblk %u > rdb_CylBlocks %u!\n",
+                               state->disk->disk_name, cylblk,
+                               be32_to_cpu(rdb->rdb_CylBlocks));
+               }
+
+               /* RDB allows for variable logical block size -
+                * normalize to 512 byte blocks and check result.
+                */
+
+               if (check_mul_overflow(cylblk, blksize, &cylblk)) {
+                       pr_err("Dev %s: partition %u bytes per cyl. overflows u32, skipping partition!\n",
+                               state->disk->disk_name, part);
+                       continue;
+               }
+
+               /* Calculate partition start and end. Limit of 32 bit on cylblk
+                * guarantees no overflow occurs if LBD support is enabled.
+                */
+
+               lo_cyl = be32_to_cpu(pb->pb_Environment[LO_CYL]);
+               start_sect = ((u64) lo_cyl * cylblk);
+
+               hi_cyl = be32_to_cpu(pb->pb_Environment[HI_CYL]);
+               nr_sects = (((u64) hi_cyl - lo_cyl + 1) * cylblk);
 
-               nr_sects = (be32_to_cpu(pb->pb_Environment[10]) + 1 -
-                           be32_to_cpu(pb->pb_Environment[9])) *
-                          be32_to_cpu(pb->pb_Environment[3]) *
-                          be32_to_cpu(pb->pb_Environment[5]) *
-                          blksize;
                if (!nr_sects)
                        continue;
-               start_sect = be32_to_cpu(pb->pb_Environment[9]) *
-                            be32_to_cpu(pb->pb_Environment[3]) *
-                            be32_to_cpu(pb->pb_Environment[5]) *
-                            blksize;
+
+               /* Warn user if partition end overflows u32 (AmigaDOS limit) */
+
+               if ((start_sect + nr_sects) > UINT_MAX) {
+                       pr_warn("Dev %s: partition %u (%llu-%llu) needs 64 bit device support!\n",
+                               state->disk->disk_name, part,
+                               start_sect, start_sect + nr_sects);
+               }
+
+               if (check_add_overflow(start_sect, nr_sects, &end_sect)) {
+                       pr_err("Dev %s: partition %u (%llu-%llu) needs LBD device support, skipping partition!\n",
+                               state->disk->disk_name, part,
+                               start_sect, end_sect);
+                       continue;
+               }
+
+               /* Tell Kernel about it */
+
                put_partition(state,slot++,start_sect,nr_sects);
                {
                        /* Be even more informative to aid mounting */
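
check_mul_overflow() and check_add_overflow() are thin wrappers around the compiler's overflow builtins; the rewritten parser multiplies heads * sectors-per-track * block-size multiplier * cylinder count one guarded step at a time and only trusts the final extent. A standalone sketch of the same guarded chain, assuming GCC/Clang's __builtin_*_overflow and the 32-bit RDB geometry fields:

    #include <stdint.h>

    /* Compute a partition's start and length in 512-byte sectors from 32-bit
     * RDB-style geometry, refusing any intermediate step that overflows.
     */
    static int partition_extent(uint32_t nr_hd, uint32_t nr_sect, uint32_t blksize,
                                uint32_t lo_cyl, uint32_t hi_cyl,
                                uint64_t *start, uint64_t *len)
    {
            uint32_t cylblk;
            uint64_t end;

            if (__builtin_mul_overflow(nr_hd, nr_sect, &cylblk))
                    return -1;      /* heads * sectors does not fit in u32 */
            if (__builtin_mul_overflow(cylblk, blksize, &cylblk))
                    return -1;      /* 512-byte blocks per cylinder overflow u32 */

            *start = (uint64_t)lo_cyl * cylblk;
            *len = ((uint64_t)hi_cyl - lo_cyl + 1) * cylblk;

            if (__builtin_add_overflow(*start, *len, &end))
                    return -1;      /* partition end does not fit in 64 bits */
            return 0;
    }
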
index 49e0496..13a7341 100644 (file)
@@ -12,7 +12,7 @@
 #include <linux/raid/detect.h>
 #include "check.h"
 
-static int (*check_part[])(struct parsed_partitions *) = {
+static int (*const check_part[])(struct parsed_partitions *) = {
        /*
         * Probe partition formats with tables at disk address 0
         * that also have an ADFS boot block at 0xdc0.
@@ -228,7 +228,7 @@ static struct attribute *part_attrs[] = {
        NULL
 };
 
-static struct attribute_group part_attr_group = {
+static const struct attribute_group part_attr_group = {
        .attrs = part_attrs,
 };
 
@@ -256,31 +256,36 @@ static int part_uevent(const struct device *dev, struct kobj_uevent_env *env)
        return 0;
 }
 
-struct device_type part_type = {
+const struct device_type part_type = {
        .name           = "partition",
        .groups         = part_attr_groups,
        .release        = part_release,
        .uevent         = part_uevent,
 };
 
-static void delete_partition(struct block_device *part)
+void drop_partition(struct block_device *part)
 {
        lockdep_assert_held(&part->bd_disk->open_mutex);
 
-       fsync_bdev(part);
-       __invalidate_device(part, true);
-
        xa_erase(&part->bd_disk->part_tbl, part->bd_partno);
        kobject_put(part->bd_holder_dir);
+
        device_del(&part->bd_device);
+       put_device(&part->bd_device);
+}
 
+static void delete_partition(struct block_device *part)
+{
        /*
         * Remove the block device from the inode hash, so that it cannot be
         * looked up any more even when openers still hold references.
         */
        remove_inode_hash(part->bd_inode);
 
-       put_device(&part->bd_device);
+       fsync_bdev(part);
+       __invalidate_device(part, true);
+
+       drop_partition(part);
 }
 
 static ssize_t whole_disk_show(struct device *dev,
@@ -288,7 +293,7 @@ static ssize_t whole_disk_show(struct device *dev,
 {
        return 0;
 }
-static DEVICE_ATTR(whole_disk, 0444, whole_disk_show, NULL);
+static const DEVICE_ATTR(whole_disk, 0444, whole_disk_show, NULL);
 
 /*
  * Must be called either with open_mutex held, before a disk can be opened or
@@ -436,10 +441,21 @@ static bool partition_overlaps(struct gendisk *disk, sector_t start,
 int bdev_add_partition(struct gendisk *disk, int partno, sector_t start,
                sector_t length)
 {
+       sector_t capacity = get_capacity(disk), end;
        struct block_device *part;
        int ret;
 
        mutex_lock(&disk->open_mutex);
+       if (check_add_overflow(start, length, &end)) {
+               ret = -EINVAL;
+               goto out;
+       }
+
+       if (start >= capacity || end > capacity) {
+               ret = -EINVAL;
+               goto out;
+       }
+
        if (!disk_live(disk)) {
                ret = -ENXIO;
                goto out;
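
bdev_add_partition() now rejects a range whose end cannot even be computed and then checks that both ends lie on the disk, before any of the existing state checks run. A small sketch of that validation, assuming 64-bit sector counts and the compiler's add-overflow builtin:

    #include <stdbool.h>
    #include <stdint.h>

    /* A partition [start, start + length) is acceptable only if the end can be
     * computed without wrapping and both ends fit within the disk capacity.
     */
    static bool partition_range_ok(uint64_t start, uint64_t length,
                                   uint64_t capacity)
    {
            uint64_t end;

            if (__builtin_add_overflow(start, length, &end))
                    return false;
            return start < capacity && end <= capacity;
    }
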
@@ -519,17 +535,6 @@ static bool disk_unlock_native_capacity(struct gendisk *disk)
        return true;
 }
 
-void blk_drop_partitions(struct gendisk *disk)
-{
-       struct block_device *part;
-       unsigned long idx;
-
-       lockdep_assert_held(&disk->open_mutex);
-
-       xa_for_each_start(&disk->part_tbl, idx, part, 1)
-               delete_partition(part);
-}
-
 static bool blk_add_partition(struct gendisk *disk,
                struct parsed_partitions *state, int p)
 {
@@ -646,6 +651,8 @@ out_free_state:
 
 int bdev_disk_changed(struct gendisk *disk, bool invalidate)
 {
+       struct block_device *part;
+       unsigned long idx;
        int ret = 0;
 
        lockdep_assert_held(&disk->open_mutex);
@@ -658,8 +665,9 @@ rescan:
                return -EBUSY;
        sync_blockdev(disk->part0);
        invalidate_bdev(disk->part0);
-       blk_drop_partitions(disk);
 
+       xa_for_each_start(&disk->part_tbl, idx, part, 1)
+               delete_partition(part);
        clear_bit(GD_NEED_PART_SCAN, &disk->state);
 
        /*
index eca5671..50c933f 100644 (file)
@@ -380,9 +380,10 @@ int public_key_verify_signature(const struct public_key *pkey,
        struct crypto_wait cwait;
        struct crypto_akcipher *tfm;
        struct akcipher_request *req;
-       struct scatterlist src_sg[2];
+       struct scatterlist src_sg;
        char alg_name[CRYPTO_MAX_ALG_NAME];
-       char *key, *ptr;
+       char *buf, *ptr;
+       size_t buf_len;
        int ret;
 
        pr_devel("==>%s()\n", __func__);
@@ -420,34 +421,37 @@ int public_key_verify_signature(const struct public_key *pkey,
        if (!req)
                goto error_free_tfm;
 
-       key = kmalloc(pkey->keylen + sizeof(u32) * 2 + pkey->paramlen,
-                     GFP_KERNEL);
-       if (!key)
+       buf_len = max_t(size_t, pkey->keylen + sizeof(u32) * 2 + pkey->paramlen,
+                       sig->s_size + sig->digest_size);
+
+       buf = kmalloc(buf_len, GFP_KERNEL);
+       if (!buf)
                goto error_free_req;
 
-       memcpy(key, pkey->key, pkey->keylen);
-       ptr = key + pkey->keylen;
+       memcpy(buf, pkey->key, pkey->keylen);
+       ptr = buf + pkey->keylen;
        ptr = pkey_pack_u32(ptr, pkey->algo);
        ptr = pkey_pack_u32(ptr, pkey->paramlen);
        memcpy(ptr, pkey->params, pkey->paramlen);
 
        if (pkey->key_is_private)
-               ret = crypto_akcipher_set_priv_key(tfm, key, pkey->keylen);
+               ret = crypto_akcipher_set_priv_key(tfm, buf, pkey->keylen);
        else
-               ret = crypto_akcipher_set_pub_key(tfm, key, pkey->keylen);
+               ret = crypto_akcipher_set_pub_key(tfm, buf, pkey->keylen);
        if (ret)
-               goto error_free_key;
+               goto error_free_buf;
 
        if (strcmp(pkey->pkey_algo, "sm2") == 0 && sig->data_size) {
                ret = cert_sig_digest_update(sig, tfm);
                if (ret)
-                       goto error_free_key;
+                       goto error_free_buf;
        }
 
-       sg_init_table(src_sg, 2);
-       sg_set_buf(&src_sg[0], sig->s, sig->s_size);
-       sg_set_buf(&src_sg[1], sig->digest, sig->digest_size);
-       akcipher_request_set_crypt(req, src_sg, NULL, sig->s_size,
+       memcpy(buf, sig->s, sig->s_size);
+       memcpy(buf + sig->s_size, sig->digest, sig->digest_size);
+
+       sg_init_one(&src_sg, buf, sig->s_size + sig->digest_size);
+       akcipher_request_set_crypt(req, &src_sg, NULL, sig->s_size,
                                   sig->digest_size);
        crypto_init_wait(&cwait);
        akcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
@@ -455,8 +459,8 @@ int public_key_verify_signature(const struct public_key *pkey,
                                      crypto_req_done, &cwait);
        ret = crypto_wait_req(crypto_akcipher_verify(req), &cwait);
 
-error_free_key:
-       kfree(key);
+error_free_buf:
+       kfree(buf);
 error_free_req:
        akcipher_request_free(req);
 error_free_tfm:
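
Rather than building a two-entry scatterlist over the caller's separate signature and digest buffers, the verify path now sizes a single allocation with max_t() so it can serve both the key-packing step and, afterwards, the concatenated signature-plus-digest input covered by one sg_init_one() entry. A rough userspace sketch of just the buffer sizing and packing (the crypto API and scatterlist calls are deliberately left out):

    #include <stdlib.h>
    #include <string.h>

    /* Allocate one buffer large enough for both uses, then fill it with
     * sig || digest for the verification step.
     */
    static unsigned char *pack_verify_input(const unsigned char *sig, size_t sig_len,
                                            const unsigned char *digest,
                                            size_t digest_len,
                                            size_t packed_key_len)
    {
            size_t need_sig = sig_len + digest_len;
            size_t buf_len = packed_key_len > need_sig ? packed_key_len : need_sig;
            unsigned char *buf = malloc(buf_len);

            if (!buf)
                    return NULL;
            /* ... earlier, the same buffer would have held the packed key ... */
            memcpy(buf, sig, sig_len);
            memcpy(buf + sig_len, digest, digest_len);
            return buf;
    }
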
index 9bdf168..1a4c4ed 100644 (file)
@@ -7,6 +7,7 @@ config DRM_ACCEL_IVPU
        depends on PCI && PCI_MSI
        select FW_LOADER
        select SHMEM
+       select GENERIC_ALLOCATOR
        help
          Choose this option if you have a system that has an 14th generation Intel CPU
          or newer. VPU stands for Versatile Processing Unit and it's a CPU-integrated
index 382ec12..fef3542 100644 (file)
@@ -197,6 +197,11 @@ static void ivpu_pll_init_frequency_ratios(struct ivpu_device *vdev)
        hw->pll.pn_ratio = clamp_t(u8, fuse_pn_ratio, hw->pll.min_ratio, hw->pll.max_ratio);
 }
 
+static int ivpu_hw_mtl_wait_for_vpuip_bar(struct ivpu_device *vdev)
+{
+       return REGV_POLL_FLD(MTL_VPU_HOST_SS_CPR_RST_CLR, AON, 0, 100);
+}
+
 static int ivpu_pll_drive(struct ivpu_device *vdev, bool enable)
 {
        struct ivpu_hw_info *hw = vdev->hw;
@@ -239,6 +244,12 @@ static int ivpu_pll_drive(struct ivpu_device *vdev, bool enable)
                        ivpu_err(vdev, "Timed out waiting for PLL ready status\n");
                        return ret;
                }
+
+               ret = ivpu_hw_mtl_wait_for_vpuip_bar(vdev);
+               if (ret) {
+                       ivpu_err(vdev, "Timed out waiting for VPUIP bar\n");
+                       return ret;
+               }
        }
 
        return 0;
@@ -256,7 +267,7 @@ static int ivpu_pll_disable(struct ivpu_device *vdev)
 
 static void ivpu_boot_host_ss_rst_clr_assert(struct ivpu_device *vdev)
 {
-       u32 val = REGV_RD32(MTL_VPU_HOST_SS_CPR_RST_CLR);
+       u32 val = 0;
 
        val = REG_SET_FLD(MTL_VPU_HOST_SS_CPR_RST_CLR, TOP_NOC, val);
        val = REG_SET_FLD(MTL_VPU_HOST_SS_CPR_RST_CLR, DSS_MAS, val);
@@ -754,9 +765,8 @@ static int ivpu_hw_mtl_power_down(struct ivpu_device *vdev)
 {
        int ret = 0;
 
-       if (ivpu_hw_mtl_reset(vdev)) {
+       if (!ivpu_hw_mtl_is_idle(vdev) && ivpu_hw_mtl_reset(vdev)) {
                ivpu_err(vdev, "Failed to reset the VPU\n");
-               ret = -EIO;
        }
 
        if (ivpu_pll_disable(vdev)) {
@@ -764,8 +774,10 @@ static int ivpu_hw_mtl_power_down(struct ivpu_device *vdev)
                ret = -EIO;
        }
 
-       if (ivpu_hw_mtl_d0i3_enable(vdev))
-               ivpu_warn(vdev, "Failed to enable D0I3\n");
+       if (ivpu_hw_mtl_d0i3_enable(vdev)) {
+               ivpu_err(vdev, "Failed to enter D0I3\n");
+               ret = -EIO;
+       }
 
        return ret;
 }
index d83ccfd..593b8ff 100644 (file)
@@ -91,6 +91,7 @@
 #define MTL_VPU_HOST_SS_CPR_RST_SET_MSS_MAS_MASK                       BIT_MASK(11)
 
 #define MTL_VPU_HOST_SS_CPR_RST_CLR                                    0x00000098u
+#define MTL_VPU_HOST_SS_CPR_RST_CLR_AON_MASK                           BIT_MASK(0)
 #define MTL_VPU_HOST_SS_CPR_RST_CLR_TOP_NOC_MASK                       BIT_MASK(1)
 #define MTL_VPU_HOST_SS_CPR_RST_CLR_DSS_MAS_MASK                       BIT_MASK(10)
 #define MTL_VPU_HOST_SS_CPR_RST_CLR_MSS_MAS_MASK                       BIT_MASK(11)
index 3adcfa8..fa0af59 100644 (file)
@@ -183,9 +183,7 @@ ivpu_ipc_send(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons, struct v
        struct ivpu_ipc_info *ipc = vdev->ipc;
        int ret;
 
-       ret = mutex_lock_interruptible(&ipc->lock);
-       if (ret)
-               return ret;
+       mutex_lock(&ipc->lock);
 
        if (!ipc->on) {
                ret = -EAGAIN;
index 3c6f1e1..d45be06 100644 (file)
@@ -431,6 +431,7 @@ ivpu_job_prepare_bos_for_submit(struct drm_file *file, struct ivpu_job *job, u32
        struct ivpu_file_priv *file_priv = file->driver_priv;
        struct ivpu_device *vdev = file_priv->vdev;
        struct ww_acquire_ctx acquire_ctx;
+       enum dma_resv_usage usage;
        struct ivpu_bo *bo;
        int ret;
        u32 i;
@@ -461,22 +462,28 @@ ivpu_job_prepare_bos_for_submit(struct drm_file *file, struct ivpu_job *job, u32
 
        job->cmd_buf_vpu_addr = bo->vpu_addr + commands_offset;
 
-       ret = drm_gem_lock_reservations((struct drm_gem_object **)job->bos, 1, &acquire_ctx);
+       ret = drm_gem_lock_reservations((struct drm_gem_object **)job->bos, buf_count,
+                                       &acquire_ctx);
        if (ret) {
                ivpu_warn(vdev, "Failed to lock reservations: %d\n", ret);
                return ret;
        }
 
-       ret = dma_resv_reserve_fences(bo->base.resv, 1);
-       if (ret) {
-               ivpu_warn(vdev, "Failed to reserve fences: %d\n", ret);
-               goto unlock_reservations;
+       for (i = 0; i < buf_count; i++) {
+               ret = dma_resv_reserve_fences(job->bos[i]->base.resv, 1);
+               if (ret) {
+                       ivpu_warn(vdev, "Failed to reserve fences: %d\n", ret);
+                       goto unlock_reservations;
+               }
        }
 
-       dma_resv_add_fence(bo->base.resv, job->done_fence, DMA_RESV_USAGE_WRITE);
+       for (i = 0; i < buf_count; i++) {
+               usage = (i == CMD_BUF_IDX) ? DMA_RESV_USAGE_WRITE : DMA_RESV_USAGE_BOOKKEEP;
+               dma_resv_add_fence(job->bos[i]->base.resv, job->done_fence, usage);
+       }
 
 unlock_reservations:
-       drm_gem_unlock_reservations((struct drm_gem_object **)job->bos, 1, &acquire_ctx);
+       drm_gem_unlock_reservations((struct drm_gem_object **)job->bos, buf_count, &acquire_ctx);
 
        wmb(); /* Flush write combining buffers */
 
index 694e978..b8b259b 100644 (file)
@@ -587,16 +587,11 @@ static int ivpu_mmu_strtab_init(struct ivpu_device *vdev)
 int ivpu_mmu_invalidate_tlb(struct ivpu_device *vdev, u16 ssid)
 {
        struct ivpu_mmu_info *mmu = vdev->mmu;
-       int ret;
-
-       ret = mutex_lock_interruptible(&mmu->lock);
-       if (ret)
-               return ret;
+       int ret = 0;
 
-       if (!mmu->on) {
-               ret = 0;
+       mutex_lock(&mmu->lock);
+       if (!mmu->on)
                goto unlock;
-       }
 
        ret = ivpu_mmu_cmdq_write_tlbi_nh_asid(vdev, ssid);
        if (ret)
@@ -614,7 +609,7 @@ static int ivpu_mmu_cd_add(struct ivpu_device *vdev, u32 ssid, u64 cd_dma)
        struct ivpu_mmu_cdtab *cdtab = &mmu->cdtab;
        u64 *entry;
        u64 cd[4];
-       int ret;
+       int ret = 0;
 
        if (ssid > IVPU_MMU_CDTAB_ENT_COUNT)
                return -EINVAL;
@@ -655,14 +650,9 @@ static int ivpu_mmu_cd_add(struct ivpu_device *vdev, u32 ssid, u64 cd_dma)
        ivpu_dbg(vdev, MMU, "CDTAB %s entry (SSID=%u, dma=%pad): 0x%llx, 0x%llx, 0x%llx, 0x%llx\n",
                 cd_dma ? "write" : "clear", ssid, &cd_dma, cd[0], cd[1], cd[2], cd[3]);
 
-       ret = mutex_lock_interruptible(&mmu->lock);
-       if (ret)
-               return ret;
-
-       if (!mmu->on) {
-               ret = 0;
+       mutex_lock(&mmu->lock);
+       if (!mmu->on)
                goto unlock;
-       }
 
        ret = ivpu_mmu_cmdq_write_cfgi_all(vdev);
        if (ret)
index 9f216eb..5c57f7b 100644 (file)
@@ -997,14 +997,34 @@ static void *msg_xfer(struct qaic_device *qdev, struct wrapper_list *wrappers, u
        struct xfer_queue_elem elem;
        struct wire_msg *out_buf;
        struct wrapper_msg *w;
+       long ret = -EAGAIN;
+       int xfer_count = 0;
        int retry_count;
-       long ret;
 
        if (qdev->in_reset) {
                mutex_unlock(&qdev->cntl_mutex);
                return ERR_PTR(-ENODEV);
        }
 
+       /* Attempt to avoid a partial commit of a message */
+       list_for_each_entry(w, &wrappers->list, list)
+               xfer_count++;
+
+       for (retry_count = 0; retry_count < QAIC_MHI_RETRY_MAX; retry_count++) {
+               if (xfer_count <= mhi_get_free_desc_count(qdev->cntl_ch, DMA_TO_DEVICE)) {
+                       ret = 0;
+                       break;
+               }
+               msleep_interruptible(QAIC_MHI_RETRY_WAIT_MS);
+               if (signal_pending(current))
+                       break;
+       }
+
+       if (ret) {
+               mutex_unlock(&qdev->cntl_mutex);
+               return ERR_PTR(ret);
+       }
+
        elem.seq_num = seq_num;
        elem.buf = NULL;
        init_completion(&elem.xfer_done);
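
msg_xfer() now counts the wrappers that make up a message and waits, with a bounded number of sleeps, until the control channel reports at least that many free transfer descriptors, so the message is either queued in full or not at all. The wait-for-capacity loop, sketched with a hypothetical get_free_slots() helper, illustrative retry constants, and POSIX usleep():

    #include <stdbool.h>
    #include <unistd.h>

    #define RETRY_MAX      20       /* illustrative values, not the driver's */
    #define RETRY_WAIT_MS  100

    /* Hypothetical query for the number of free slots in the transmit queue. */
    extern int get_free_slots(void);

    /* Poll until "needed" slots are free; give up after RETRY_MAX attempts. */
    static bool wait_for_capacity(int needed)
    {
            for (int retry = 0; retry < RETRY_MAX; retry++) {
                    if (needed <= get_free_slots())
                            return true;
                    usleep(RETRY_WAIT_MS * 1000);
            }
            return false;
    }
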
@@ -1038,16 +1058,9 @@ static void *msg_xfer(struct qaic_device *qdev, struct wrapper_list *wrappers, u
        list_for_each_entry(w, &wrappers->list, list) {
                kref_get(&w->ref_count);
                retry_count = 0;
-retry:
                ret = mhi_queue_buf(qdev->cntl_ch, DMA_TO_DEVICE, &w->msg, w->len,
                                    list_is_last(&w->list, &wrappers->list) ? MHI_EOT : MHI_CHAIN);
                if (ret) {
-                       if (ret == -EAGAIN && retry_count++ < QAIC_MHI_RETRY_MAX) {
-                               msleep_interruptible(QAIC_MHI_RETRY_WAIT_MS);
-                               if (!signal_pending(current))
-                                       goto retry;
-                       }
-
                        qdev->cntl_lost_buf = true;
                        kref_put(&w->ref_count, free_wrapper);
                        mutex_unlock(&qdev->cntl_mutex);
@@ -1249,7 +1262,7 @@ dma_cont_failed:
 
 int qaic_manage_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv)
 {
-       struct qaic_manage_msg *user_msg;
+       struct qaic_manage_msg *user_msg = data;
        struct qaic_device *qdev;
        struct manage_msg *msg;
        struct qaic_user *usr;
@@ -1258,6 +1271,9 @@ int qaic_manage_ioctl(struct drm_device *dev, void *data, struct drm_file *file_
        int usr_rcu_id;
        int ret;
 
+       if (user_msg->len > QAIC_MANAGE_MAX_MSG_LENGTH)
+               return -EINVAL;
+
        usr = file_priv->driver_priv;
 
        usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
@@ -1275,13 +1291,6 @@ int qaic_manage_ioctl(struct drm_device *dev, void *data, struct drm_file *file_
                return -ENODEV;
        }
 
-       user_msg = data;
-
-       if (user_msg->len > QAIC_MANAGE_MAX_MSG_LENGTH) {
-               ret = -EINVAL;
-               goto out;
-       }
-
        msg = kzalloc(QAIC_MANAGE_MAX_MSG_LENGTH + sizeof(*msg), GFP_KERNEL);
        if (!msg) {
                ret = -ENOMEM;
index c0a574c..e9a1cb7 100644 (file)
@@ -23,6 +23,7 @@
 #include <linux/wait.h>
 #include <drm/drm_file.h>
 #include <drm/drm_gem.h>
+#include <drm/drm_prime.h>
 #include <drm/drm_print.h>
 #include <uapi/drm/qaic_accel.h>
 
@@ -591,7 +592,7 @@ static int qaic_gem_object_mmap(struct drm_gem_object *obj, struct vm_area_struc
        struct qaic_bo *bo = to_qaic_bo(obj);
        unsigned long offset = 0;
        struct scatterlist *sg;
-       int ret;
+       int ret = 0;
 
        if (obj->import_attach)
                return -EINVAL;
@@ -616,8 +617,7 @@ static void qaic_free_object(struct drm_gem_object *obj)
 
        if (obj->import_attach) {
                /* DMABUF/PRIME Path */
-               dma_buf_detach(obj->import_attach->dmabuf, obj->import_attach);
-               dma_buf_put(obj->import_attach->dmabuf);
+               drm_prime_gem_destroy(obj, NULL);
        } else {
                /* Private buffer allocation path */
                qaic_free_sgt(bo->sgt);
@@ -663,6 +663,10 @@ int qaic_create_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *fi
        if (args->pad)
                return -EINVAL;
 
+       size = PAGE_ALIGN(args->size);
+       if (size == 0)
+               return -EINVAL;
+
        usr = file_priv->driver_priv;
        usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
        if (!usr->qddev) {
@@ -677,12 +681,6 @@ int qaic_create_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *fi
                goto unlock_dev_srcu;
        }
 
-       size = PAGE_ALIGN(args->size);
-       if (size == 0) {
-               ret = -EINVAL;
-               goto unlock_dev_srcu;
-       }
-
        bo = qaic_alloc_init_bo();
        if (IS_ERR(bo)) {
                ret = PTR_ERR(bo);
@@ -926,8 +924,8 @@ int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data, struct drm_fi
 {
        struct qaic_attach_slice_entry *slice_ent;
        struct qaic_attach_slice *args = data;
+       int rcu_id, usr_rcu_id, qdev_rcu_id;
        struct dma_bridge_chan  *dbc;
-       int usr_rcu_id, qdev_rcu_id;
        struct drm_gem_object *obj;
        struct qaic_device *qdev;
        unsigned long arg_size;
@@ -936,6 +934,22 @@ int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data, struct drm_fi
        struct qaic_bo *bo;
        int ret;
 
+       if (args->hdr.count == 0)
+               return -EINVAL;
+
+       arg_size = args->hdr.count * sizeof(*slice_ent);
+       if (arg_size / args->hdr.count != sizeof(*slice_ent))
+               return -EINVAL;
+
+       if (args->hdr.size == 0)
+               return -EINVAL;
+
+       if (!(args->hdr.dir == DMA_TO_DEVICE || args->hdr.dir == DMA_FROM_DEVICE))
+               return -EINVAL;
+
+       if (args->data == 0)
+               return -EINVAL;
+
        usr = file_priv->driver_priv;
        usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
        if (!usr->qddev) {
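
The slice-count check above uses the classic divide-back test for multiplication overflow: if count * sizeof(entry) divided by count no longer equals sizeof(entry), the product wrapped and the request is rejected before anything is allocated. The same check in isolation, assuming size_t arithmetic and a non-zero count:

    #include <stdbool.h>
    #include <stddef.h>

    /* Compute count * elem_size, reporting wrap-around without compiler builtins. */
    static bool array_bytes_ok(size_t count, size_t elem_size, size_t *total)
    {
            if (count == 0)
                    return false;   /* callers reject empty arrays anyway */
            *total = count * elem_size;
            return *total / count == elem_size;
    }
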
@@ -950,43 +964,11 @@ int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data, struct drm_fi
                goto unlock_dev_srcu;
        }
 
-       if (args->hdr.count == 0) {
-               ret = -EINVAL;
-               goto unlock_dev_srcu;
-       }
-
-       arg_size = args->hdr.count * sizeof(*slice_ent);
-       if (arg_size / args->hdr.count != sizeof(*slice_ent)) {
-               ret = -EINVAL;
-               goto unlock_dev_srcu;
-       }
-
        if (args->hdr.dbc_id >= qdev->num_dbc) {
                ret = -EINVAL;
                goto unlock_dev_srcu;
        }
 
-       if (args->hdr.size == 0) {
-               ret = -EINVAL;
-               goto unlock_dev_srcu;
-       }
-
-       if (!(args->hdr.dir == DMA_TO_DEVICE  || args->hdr.dir == DMA_FROM_DEVICE)) {
-               ret = -EINVAL;
-               goto unlock_dev_srcu;
-       }
-
-       dbc = &qdev->dbc[args->hdr.dbc_id];
-       if (dbc->usr != usr) {
-               ret = -EINVAL;
-               goto unlock_dev_srcu;
-       }
-
-       if (args->data == 0) {
-               ret = -EINVAL;
-               goto unlock_dev_srcu;
-       }
-
        user_data = u64_to_user_ptr(args->data);
 
        slice_ent = kzalloc(arg_size, GFP_KERNEL);
@@ -1013,9 +995,21 @@ int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data, struct drm_fi
 
        bo = to_qaic_bo(obj);
 
+       if (bo->sliced) {
+               ret = -EINVAL;
+               goto put_bo;
+       }
+
+       dbc = &qdev->dbc[args->hdr.dbc_id];
+       rcu_id = srcu_read_lock(&dbc->ch_lock);
+       if (dbc->usr != usr) {
+               ret = -EINVAL;
+               goto unlock_ch_srcu;
+       }
+
        ret = qaic_prepare_bo(qdev, bo, &args->hdr);
        if (ret)
-               goto put_bo;
+               goto unlock_ch_srcu;
 
        ret = qaic_attach_slicing_bo(qdev, bo, &args->hdr, slice_ent);
        if (ret)
@@ -1025,6 +1019,7 @@ int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data, struct drm_fi
                dma_sync_sgtable_for_cpu(&qdev->pdev->dev, bo->sgt, args->hdr.dir);
 
        bo->dbc = dbc;
+       srcu_read_unlock(&dbc->ch_lock, rcu_id);
        drm_gem_object_put(obj);
        srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
        srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
@@ -1033,6 +1028,8 @@ int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data, struct drm_fi
 
 unprepare_bo:
        qaic_unprepare_bo(qdev, bo);
+unlock_ch_srcu:
+       srcu_read_unlock(&dbc->ch_lock, rcu_id);
 put_bo:
        drm_gem_object_put(obj);
 free_slice_ent:
@@ -1316,7 +1313,6 @@ static int __qaic_execute_bo_ioctl(struct drm_device *dev, void *data, struct dr
        received_ts = ktime_get_ns();
 
        size = is_partial ? sizeof(*pexec) : sizeof(*exec);
-
        n = (unsigned long)size * args->hdr.count;
        if (args->hdr.count == 0 || n / args->hdr.count != size)
                return -EINVAL;
@@ -1665,6 +1661,9 @@ int qaic_wait_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *file
        int rcu_id;
        int ret;
 
+       if (args->pad != 0)
+               return -EINVAL;
+
        usr = file_priv->driver_priv;
        usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
        if (!usr->qddev) {
@@ -1679,11 +1678,6 @@ int qaic_wait_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *file
                goto unlock_dev_srcu;
        }
 
-       if (args->pad != 0) {
-               ret = -EINVAL;
-               goto unlock_dev_srcu;
-       }
-
        if (args->dbc_id >= qdev->num_dbc) {
                ret = -EINVAL;
                goto unlock_dev_srcu;
@@ -1855,6 +1849,11 @@ void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id)
        dbc->usr = NULL;
        empty_xfer_list(qdev, dbc);
        synchronize_srcu(&dbc->ch_lock);
+       /*
+        * Threads holding the channel lock may add more elements to the xfer_list.
+        * Flush those elements out as well.
+        */
+       empty_xfer_list(qdev, dbc);
 }
 
 void release_dbc(struct qaic_device *qdev, u32 dbc_id)
index ff80eb5..b5ba550 100644 (file)
@@ -97,6 +97,7 @@ static int qaic_open(struct drm_device *dev, struct drm_file *file)
 
 cleanup_usr:
        cleanup_srcu_struct(&usr->qddev_lock);
+       ida_free(&qaic_usrs, usr->handle);
 free_usr:
        kfree(usr);
 dev_unlock:
@@ -224,6 +225,9 @@ static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id)
        struct qaic_user *usr;
 
        qddev = qdev->qddev;
+       qdev->qddev = NULL;
+       if (!qddev)
+               return;
 
        /*
         * Existing users get unresolvable errors till they close FDs.
@@ -262,8 +266,8 @@ static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id)
 
 static int qaic_mhi_probe(struct mhi_device *mhi_dev, const struct mhi_device_id *id)
 {
+       u16 major = -1, minor = -1;
        struct qaic_device *qdev;
-       u16 major, minor;
        int ret;
 
        /*
index 19aff80..8d51269 100644 (file)
@@ -9,8 +9,6 @@
 #include <linux/idr.h>
 #include <linux/io.h>
 
-#include <linux/arm-smccc.h>
-
 static struct acpi_ffh_info ffh_ctx;
 
 int __weak acpi_ffh_address_space_arch_setup(void *handler_ctxt,
index 77186f0..539e700 100644 (file)
@@ -201,11 +201,19 @@ static void byt_i2c_setup(struct lpss_private_data *pdata)
        writel(0, pdata->mmio_base + LPSS_I2C_ENABLE);
 }
 
-/* BSW PWM used for backlight control by the i915 driver */
+/*
+ * BSW PWM1 is used for backlight control by the i915 driver
+ * BSW PWM2 is used for backlight control for fixed (etched into the glass)
+ * touch controls on some models. These touch-controls have specialized
+ * drivers which know they need the "pwm_soc_lpss_2" con-id.
+ */
 static struct pwm_lookup bsw_pwm_lookup[] = {
        PWM_LOOKUP_WITH_MODULE("80862288:00", 0, "0000:00:02.0",
                               "pwm_soc_backlight", 0, PWM_POLARITY_NORMAL,
                               "pwm-lpss-platform"),
+       PWM_LOOKUP_WITH_MODULE("80862289:00", 0, NULL,
+                              "pwm_soc_lpss_2", 0, PWM_POLARITY_NORMAL,
+                              "pwm-lpss-platform"),
 };
 
 static void bsw_pwm_setup(struct lpss_private_data *pdata)
index 02f1a1b..7a453c5 100644 (file)
@@ -66,6 +66,7 @@ static void power_saving_mwait_init(void)
        case X86_VENDOR_AMD:
        case X86_VENDOR_INTEL:
        case X86_VENDOR_ZHAOXIN:
+       case X86_VENDOR_CENTAUR:
                /*
                 * AMD Fam10h TSC will tick in all
                 * C/P/S0/S1 states when this bit is set.
index ebf8fd3..79bbfe0 100644 (file)
@@ -101,8 +101,6 @@ acpi_status
 acpi_hw_get_gpe_status(struct acpi_gpe_event_info *gpe_event_info,
                       acpi_event_status *event_status);
 
-acpi_status acpi_hw_disable_all_gpes(void);
-
 acpi_status acpi_hw_enable_all_runtime_gpes(void);
 
 acpi_status acpi_hw_enable_all_wakeup_gpes(void);
index 1d6ef96..67c2c3b 100644 (file)
@@ -7,7 +7,6 @@
 #ifndef APEI_INTERNAL_H
 #define APEI_INTERNAL_H
 
-#include <linux/cper.h>
 #include <linux/acpi.h>
 
 struct apei_exec_context;
@@ -130,10 +129,5 @@ static inline u32 cper_estatus_len(struct acpi_hest_generic_status *estatus)
                return sizeof(*estatus) + estatus->data_length;
 }
 
-void cper_estatus_print(const char *pfx,
-                       const struct acpi_hest_generic_status *estatus);
-int cper_estatus_check_header(const struct acpi_hest_generic_status *estatus);
-int cper_estatus_check(const struct acpi_hest_generic_status *estatus);
-
 int apei_osc_setup(void);
 #endif
index c23eb75..5427e49 100644 (file)
@@ -23,6 +23,7 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/acpi.h>
+#include <linux/cper.h>
 #include <linux/io.h>
 
 #include "apei-internal.h"
@@ -33,7 +34,7 @@
 #define ACPI_BERT_PRINT_MAX_RECORDS 5
 #define ACPI_BERT_PRINT_MAX_LEN 1024
 
-static int bert_disable;
+static int bert_disable __initdata;
 
 /*
  * Print "all" the error records in the BERT table, but avoid huge spam to
index 34ad071..ef59d6e 100644 (file)
@@ -152,7 +152,6 @@ struct ghes_vendor_record_entry {
 };
 
 static struct gen_pool *ghes_estatus_pool;
-static unsigned long ghes_estatus_pool_size_request;
 
 static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
 static atomic_t ghes_estatus_cache_alloced;
@@ -191,7 +190,6 @@ int ghes_estatus_pool_init(unsigned int num_ghes)
        len = GHES_ESTATUS_CACHE_AVG_SIZE * GHES_ESTATUS_CACHE_ALLOCED_MAX;
        len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);
 
-       ghes_estatus_pool_size_request = PAGE_ALIGN(len);
        addr = (unsigned long)vmalloc(PAGE_ALIGN(len));
        if (!addr)
                goto err_pool_alloc;
@@ -1544,6 +1542,8 @@ struct list_head *ghes_get_devices(void)
 
                        pr_warn_once("Force-loading ghes_edac on an unsupported platform. You're on your own!\n");
                }
+       } else if (list_empty(&ghes_devs)) {
+               return NULL;
        }
 
        return &ghes_devs;
index e21a9e8..f81fe24 100644 (file)
@@ -3,4 +3,4 @@ obj-$(CONFIG_ACPI_AGDI)         += agdi.o
 obj-$(CONFIG_ACPI_IORT)        += iort.o
 obj-$(CONFIG_ACPI_GTDT)        += gtdt.o
 obj-$(CONFIG_ACPI_APMT)        += apmt.o
-obj-y                          += dma.o
+obj-y                          += dma.o init.o
index f605302..8b3c7d4 100644 (file)
@@ -9,11 +9,11 @@
 #define pr_fmt(fmt) "ACPI: AGDI: " fmt
 
 #include <linux/acpi.h>
-#include <linux/acpi_agdi.h>
 #include <linux/arm_sdei.h>
 #include <linux/io.h>
 #include <linux/kernel.h>
 #include <linux/platform_device.h>
+#include "init.h"
 
 struct agdi_data {
        int sdei_event;
index 8cab69f..bb010f6 100644 (file)
 #define pr_fmt(fmt)    "ACPI: APMT: " fmt
 
 #include <linux/acpi.h>
-#include <linux/acpi_apmt.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/platform_device.h>
+#include "init.h"
 
 #define DEV_NAME "arm-cs-arch-pmu"
 
@@ -35,11 +35,13 @@ static int __init apmt_init_resources(struct resource *res,
 
        num_res++;
 
-       res[num_res].start = node->base_address1;
-       res[num_res].end = node->base_address1 + SZ_4K - 1;
-       res[num_res].flags = IORESOURCE_MEM;
+       if (node->flags & ACPI_APMT_FLAGS_DUAL_PAGE) {
+               res[num_res].start = node->base_address1;
+               res[num_res].end = node->base_address1 + SZ_4K - 1;
+               res[num_res].flags = IORESOURCE_MEM;
 
-       num_res++;
+               num_res++;
+       }
 
        if (node->ovflw_irq != 0) {
                trigger = (node->ovflw_irq_flags & ACPI_APMT_OVFLW_IRQ_FLAGS_MODE);
diff --git a/drivers/acpi/arm64/init.c b/drivers/acpi/arm64/init.c
new file mode 100644 (file)
index 0000000..d3ce53d
--- /dev/null
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/acpi.h>
+#include "init.h"
+
+void __init acpi_arm_init(void)
+{
+       if (IS_ENABLED(CONFIG_ACPI_AGDI))
+               acpi_agdi_init();
+       if (IS_ENABLED(CONFIG_ACPI_APMT))
+               acpi_apmt_init();
+       if (IS_ENABLED(CONFIG_ACPI_IORT))
+               acpi_iort_init();
+}
diff --git a/drivers/acpi/arm64/init.h b/drivers/acpi/arm64/init.h
new file mode 100644 (file)
index 0000000..a1715a2
--- /dev/null
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#include <linux/init.h>
+
+void __init acpi_agdi_init(void);
+void __init acpi_apmt_init(void);
+void __init acpi_iort_init(void);
index 38fb849..3631230 100644 (file)
@@ -19,6 +19,7 @@
 #include <linux/platform_device.h>
 #include <linux/slab.h>
 #include <linux/dma-map-ops.h>
+#include "init.h"
 
 #define IORT_TYPE_MASK(type)   (1 << (type))
 #define IORT_MSI_TYPE          (1 << ACPI_IORT_NODE_ITS_GROUP)
index d161ff7..e3e0bd0 100644 (file)
@@ -26,9 +26,6 @@
 #include <asm/mpspec.h>
 #include <linux/dmi.h>
 #endif
-#include <linux/acpi_agdi.h>
-#include <linux/acpi_apmt.h>
-#include <linux/acpi_iort.h>
 #include <linux/acpi_viot.h>
 #include <linux/pci.h>
 #include <acpi/apei.h>
@@ -530,65 +527,30 @@ static void acpi_notify_device(acpi_handle handle, u32 event, void *data)
        acpi_drv->ops.notify(device, event);
 }
 
-static void acpi_notify_device_fixed(void *data)
-{
-       struct acpi_device *device = data;
-
-       /* Fixed hardware devices have no handles */
-       acpi_notify_device(NULL, ACPI_FIXED_HARDWARE_EVENT, device);
-}
-
-static u32 acpi_device_fixed_event(void *data)
-{
-       acpi_os_execute(OSL_NOTIFY_HANDLER, acpi_notify_device_fixed, data);
-       return ACPI_INTERRUPT_HANDLED;
-}
-
 static int acpi_device_install_notify_handler(struct acpi_device *device,
                                              struct acpi_driver *acpi_drv)
 {
-       acpi_status status;
-
-       if (device->device_type == ACPI_BUS_TYPE_POWER_BUTTON) {
-               status =
-                   acpi_install_fixed_event_handler(ACPI_EVENT_POWER_BUTTON,
-                                                    acpi_device_fixed_event,
-                                                    device);
-       } else if (device->device_type == ACPI_BUS_TYPE_SLEEP_BUTTON) {
-               status =
-                   acpi_install_fixed_event_handler(ACPI_EVENT_SLEEP_BUTTON,
-                                                    acpi_device_fixed_event,
-                                                    device);
-       } else {
-               u32 type = acpi_drv->flags & ACPI_DRIVER_ALL_NOTIFY_EVENTS ?
+       u32 type = acpi_drv->flags & ACPI_DRIVER_ALL_NOTIFY_EVENTS ?
                                ACPI_ALL_NOTIFY : ACPI_DEVICE_NOTIFY;
+       acpi_status status;
 
-               status = acpi_install_notify_handler(device->handle, type,
-                                                    acpi_notify_device,
-                                                    device);
-       }
-
+       status = acpi_install_notify_handler(device->handle, type,
+                                            acpi_notify_device, device);
        if (ACPI_FAILURE(status))
                return -EINVAL;
+
        return 0;
 }
 
 static void acpi_device_remove_notify_handler(struct acpi_device *device,
                                              struct acpi_driver *acpi_drv)
 {
-       if (device->device_type == ACPI_BUS_TYPE_POWER_BUTTON) {
-               acpi_remove_fixed_event_handler(ACPI_EVENT_POWER_BUTTON,
-                                               acpi_device_fixed_event);
-       } else if (device->device_type == ACPI_BUS_TYPE_SLEEP_BUTTON) {
-               acpi_remove_fixed_event_handler(ACPI_EVENT_SLEEP_BUTTON,
-                                               acpi_device_fixed_event);
-       } else {
-               u32 type = acpi_drv->flags & ACPI_DRIVER_ALL_NOTIFY_EVENTS ?
+       u32 type = acpi_drv->flags & ACPI_DRIVER_ALL_NOTIFY_EVENTS ?
                                ACPI_ALL_NOTIFY : ACPI_DEVICE_NOTIFY;
 
-               acpi_remove_notify_handler(device->handle, type,
-                                          acpi_notify_device);
-       }
+       acpi_remove_notify_handler(device->handle, type,
+                                  acpi_notify_device);
+
        acpi_os_wait_events_complete();
 }
 
@@ -1408,7 +1370,7 @@ static int __init acpi_init(void)
        acpi_init_ffh();
 
        pci_mmcfg_late_init();
-       acpi_iort_init();
+       acpi_arm_init();
        acpi_viot_early_init();
        acpi_hest_init();
        acpi_ghes_init();
@@ -1420,8 +1382,6 @@ static int __init acpi_init(void)
        acpi_debugger_init();
        acpi_setup_sb_notify_handler();
        acpi_viot_init();
-       acpi_agdi_init();
-       acpi_apmt_init();
        return 0;
 }
 
index 475e1ed..1e76a64 100644 (file)
@@ -78,6 +78,15 @@ static const struct dmi_system_id dmi_lid_quirks[] = {
                .driver_data = (void *)(long)ACPI_BUTTON_LID_INIT_DISABLED,
        },
        {
+               /* Nextbook Ares 8A tablet, _LID device always reports lid closed */
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "Insyde"),
+                       DMI_MATCH(DMI_PRODUCT_NAME, "CherryTrail"),
+                       DMI_MATCH(DMI_BIOS_VERSION, "M882"),
+               },
+               .driver_data = (void *)(long)ACPI_BUTTON_LID_INIT_DISABLED,
+       },
+       {
                /*
                 * Lenovo Yoga 9 14ITL5, initial notification of the LID device
                 * never happens.
@@ -126,7 +135,6 @@ static const struct dmi_system_id dmi_lid_quirks[] = {
 
 static int acpi_button_add(struct acpi_device *device);
 static void acpi_button_remove(struct acpi_device *device);
-static void acpi_button_notify(struct acpi_device *device, u32 event);
 
 #ifdef CONFIG_PM_SLEEP
 static int acpi_button_suspend(struct device *dev);
@@ -144,7 +152,6 @@ static struct acpi_driver acpi_button_driver = {
        .ops = {
                .add = acpi_button_add,
                .remove = acpi_button_remove,
-               .notify = acpi_button_notify,
        },
        .drv.pm = &acpi_button_pm,
 };
@@ -400,45 +407,65 @@ static void acpi_lid_initialize_state(struct acpi_device *device)
        button->lid_state_initialized = true;
 }
 
-static void acpi_button_notify(struct acpi_device *device, u32 event)
+static void acpi_lid_notify(acpi_handle handle, u32 event, void *data)
 {
-       struct acpi_button *button = acpi_driver_data(device);
+       struct acpi_device *device = data;
+       struct acpi_button *button;
+
+       if (event != ACPI_BUTTON_NOTIFY_STATUS) {
+               acpi_handle_debug(device->handle, "Unsupported event [0x%x]\n",
+                                 event);
+               return;
+       }
+
+       button = acpi_driver_data(device);
+       if (!button->lid_state_initialized)
+               return;
+
+       acpi_lid_update_state(device, true);
+}
+
+static void acpi_button_notify(acpi_handle handle, u32 event, void *data)
+{
+       struct acpi_device *device = data;
+       struct acpi_button *button;
        struct input_dev *input;
+       int keycode;
 
-       switch (event) {
-       case ACPI_FIXED_HARDWARE_EVENT:
-               event = ACPI_BUTTON_NOTIFY_STATUS;
-               fallthrough;
-       case ACPI_BUTTON_NOTIFY_STATUS:
-               input = button->input;
-               if (button->type == ACPI_BUTTON_TYPE_LID) {
-                       if (button->lid_state_initialized)
-                               acpi_lid_update_state(device, true);
-               } else {
-                       int keycode;
-
-                       acpi_pm_wakeup_event(&device->dev);
-                       if (button->suspended)
-                               break;
-
-                       keycode = test_bit(KEY_SLEEP, input->keybit) ?
-                                               KEY_SLEEP : KEY_POWER;
-                       input_report_key(input, keycode, 1);
-                       input_sync(input);
-                       input_report_key(input, keycode, 0);
-                       input_sync(input);
-
-                       acpi_bus_generate_netlink_event(
-                                       device->pnp.device_class,
-                                       dev_name(&device->dev),
-                                       event, ++button->pushed);
-               }
-               break;
-       default:
+       if (event != ACPI_BUTTON_NOTIFY_STATUS) {
                acpi_handle_debug(device->handle, "Unsupported event [0x%x]\n",
                                  event);
-               break;
+               return;
        }
+
+       acpi_pm_wakeup_event(&device->dev);
+
+       button = acpi_driver_data(device);
+       if (button->suspended)
+               return;
+
+       input = button->input;
+       keycode = test_bit(KEY_SLEEP, input->keybit) ? KEY_SLEEP : KEY_POWER;
+
+       input_report_key(input, keycode, 1);
+       input_sync(input);
+       input_report_key(input, keycode, 0);
+       input_sync(input);
+
+       acpi_bus_generate_netlink_event(device->pnp.device_class,
+                                       dev_name(&device->dev),
+                                       event, ++button->pushed);
+}
+
+static void acpi_button_notify_run(void *data)
+{
+       acpi_button_notify(NULL, ACPI_BUTTON_NOTIFY_STATUS, data);
+}
+
+static u32 acpi_button_event(void *data)
+{
+       acpi_os_execute(OSL_NOTIFY_HANDLER, acpi_button_notify_run, data);
+       return ACPI_INTERRUPT_HANDLED;
 }
 
 #ifdef CONFIG_PM_SLEEP
@@ -480,11 +507,13 @@ static int acpi_lid_input_open(struct input_dev *input)
 
 static int acpi_button_add(struct acpi_device *device)
 {
+       acpi_notify_handler handler;
        struct acpi_button *button;
        struct input_dev *input;
        const char *hid = acpi_device_hid(device);
+       acpi_status status;
        char *name, *class;
-       int error;
+       int error = 0;
 
        if (!strcmp(hid, ACPI_BUTTON_HID_LID) &&
             lid_init_state == ACPI_BUTTON_LID_INIT_DISABLED)
@@ -508,17 +537,20 @@ static int acpi_button_add(struct acpi_device *device)
        if (!strcmp(hid, ACPI_BUTTON_HID_POWER) ||
            !strcmp(hid, ACPI_BUTTON_HID_POWERF)) {
                button->type = ACPI_BUTTON_TYPE_POWER;
+               handler = acpi_button_notify;
                strcpy(name, ACPI_BUTTON_DEVICE_NAME_POWER);
                sprintf(class, "%s/%s",
                        ACPI_BUTTON_CLASS, ACPI_BUTTON_SUBCLASS_POWER);
        } else if (!strcmp(hid, ACPI_BUTTON_HID_SLEEP) ||
                   !strcmp(hid, ACPI_BUTTON_HID_SLEEPF)) {
                button->type = ACPI_BUTTON_TYPE_SLEEP;
+               handler = acpi_button_notify;
                strcpy(name, ACPI_BUTTON_DEVICE_NAME_SLEEP);
                sprintf(class, "%s/%s",
                        ACPI_BUTTON_CLASS, ACPI_BUTTON_SUBCLASS_SLEEP);
        } else if (!strcmp(hid, ACPI_BUTTON_HID_LID)) {
                button->type = ACPI_BUTTON_TYPE_LID;
+               handler = acpi_lid_notify;
                strcpy(name, ACPI_BUTTON_DEVICE_NAME_LID);
                sprintf(class, "%s/%s",
                        ACPI_BUTTON_CLASS, ACPI_BUTTON_SUBCLASS_LID);
@@ -526,12 +558,15 @@ static int acpi_button_add(struct acpi_device *device)
        } else {
                pr_info("Unsupported hid [%s]\n", hid);
                error = -ENODEV;
-               goto err_free_input;
        }
 
-       error = acpi_button_add_fs(device);
-       if (error)
-               goto err_free_input;
+       if (!error)
+               error = acpi_button_add_fs(device);
+
+       if (error) {
+               input_free_device(input);
+               goto err_free_button;
+       }
 
        snprintf(button->phys, sizeof(button->phys), "%s/button/input0", hid);
 
@@ -559,6 +594,29 @@ static int acpi_button_add(struct acpi_device *device)
        error = input_register_device(input);
        if (error)
                goto err_remove_fs;
+
+       switch (device->device_type) {
+       case ACPI_BUS_TYPE_POWER_BUTTON:
+               status = acpi_install_fixed_event_handler(ACPI_EVENT_POWER_BUTTON,
+                                                         acpi_button_event,
+                                                         device);
+               break;
+       case ACPI_BUS_TYPE_SLEEP_BUTTON:
+               status = acpi_install_fixed_event_handler(ACPI_EVENT_SLEEP_BUTTON,
+                                                         acpi_button_event,
+                                                         device);
+               break;
+       default:
+               status = acpi_install_notify_handler(device->handle,
+                                                    ACPI_DEVICE_NOTIFY, handler,
+                                                    device);
+               break;
+       }
+       if (ACPI_FAILURE(status)) {
+               error = -ENODEV;
+               goto err_input_unregister;
+       }
+
        if (button->type == ACPI_BUTTON_TYPE_LID) {
                /*
                 * This assumes there's only one lid device, or if there are
@@ -571,11 +629,11 @@ static int acpi_button_add(struct acpi_device *device)
        pr_info("%s [%s]\n", name, acpi_device_bid(device));
        return 0;
 
- err_remove_fs:
+err_input_unregister:
+       input_unregister_device(input);
+err_remove_fs:
        acpi_button_remove_fs(device);
- err_free_input:
-       input_free_device(input);
- err_free_button:
+err_free_button:
        kfree(button);
        return error;
 }
@@ -584,6 +642,24 @@ static void acpi_button_remove(struct acpi_device *device)
 {
        struct acpi_button *button = acpi_driver_data(device);
 
+       switch (device->device_type) {
+       case ACPI_BUS_TYPE_POWER_BUTTON:
+               acpi_remove_fixed_event_handler(ACPI_EVENT_POWER_BUTTON,
+                                               acpi_button_event);
+               break;
+       case ACPI_BUS_TYPE_SLEEP_BUTTON:
+               acpi_remove_fixed_event_handler(ACPI_EVENT_SLEEP_BUTTON,
+                                               acpi_button_event);
+               break;
+       default:
+               acpi_remove_notify_handler(device->handle, ACPI_DEVICE_NOTIFY,
+                                          button->type == ACPI_BUTTON_TYPE_LID ?
+                                               acpi_lid_notify :
+                                               acpi_button_notify);
+               break;
+       }
+       acpi_os_wait_events_complete();
+
        acpi_button_remove_fs(device);
        input_unregister_device(button->input);
        kfree(button);
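
The fixed-event handler installed above (acpi_button_event) runs in interrupt context, which is why the patch only queues the real notify work through acpi_os_execute(). Below is a minimal sketch of the same defer-to-process-context idea using a generic workqueue; the names are hypothetical and this is not the driver's code, just an illustration of the pattern under those assumptions.

#include <linux/workqueue.h>
#include <linux/interrupt.h>

static struct work_struct demo_button_work;            /* hypothetical */

static void demo_button_work_fn(struct work_struct *work)
{
        /* process context: may sleep, take mutexes, report input events */
}

static irqreturn_t demo_button_irq(int irq, void *dev_id)
{
        /* interrupt context: do the minimum and hand off the rest */
        schedule_work(&demo_button_work);
        return IRQ_HANDLED;
}

/* in probe(): INIT_WORK(&demo_button_work, demo_button_work_fn); */

The ACPI path in the patch achieves the same thing natively with acpi_os_execute(OSL_NOTIFY_HANDLER, ...), so the handler can return ACPI_INTERRUPT_HANDLED immediately.
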
index 928899a..8569f55 100644 (file)
@@ -662,21 +662,6 @@ static void advance_transaction(struct acpi_ec *ec, bool interrupt)
 
        ec_dbg_stm("%s (%d)", interrupt ? "IRQ" : "TASK", smp_processor_id());
 
-       /*
-        * Clear GPE_STS upfront to allow subsequent hardware GPE_STS 0->1
-        * changes to always trigger a GPE interrupt.
-        *
-        * GPE STS is a W1C register, which means:
-        *
-        * 1. Software can clear it without worrying about clearing the other
-        *    GPEs' STS bits when the hardware sets them in parallel.
-        *
-        * 2. As long as software can ensure only clearing it when it is set,
-        *    hardware won't set it in parallel.
-        */
-       if (ec->gpe >= 0 && acpi_ec_gpe_status_set(ec))
-               acpi_clear_gpe(NULL, ec->gpe);
-
        status = acpi_ec_read_status(ec);
 
        /*
@@ -1287,6 +1272,22 @@ static void acpi_ec_handle_interrupt(struct acpi_ec *ec)
        unsigned long flags;
 
        spin_lock_irqsave(&ec->lock, flags);
+
+       /*
+        * Clear GPE_STS upfront to allow subsequent hardware GPE_STS 0->1
+        * changes to always trigger a GPE interrupt.
+        *
+        * GPE STS is a W1C register, which means:
+        *
+        * 1. Software can clear it without worrying about clearing the other
+        *    GPEs' STS bits when the hardware sets them in parallel.
+        *
+        * 2. As long as software can ensure only clearing it when it is set,
+        *    hardware won't set it in parallel.
+        */
+       if (ec->gpe >= 0 && acpi_ec_gpe_status_set(ec))
+               acpi_clear_gpe(NULL, ec->gpe);
+
        advance_transaction(ec, true);
        spin_unlock_irqrestore(&ec->lock, flags);
 }
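
The comment moved into acpi_ec_handle_interrupt() describes the write-1-to-clear (W1C) behaviour of the GPE status register. A small stand-alone sketch of the W1C acknowledge pattern, with a hypothetical register mapping and bit, for readers unfamiliar with it:

#include <stdint.h>

#define DEMO_EVENT_STS  (1u << 3)          /* hypothetical status bit */

static volatile uint32_t *demo_sts_reg;    /* mapped elsewhere (assumed) */

static void demo_ack_event(void)
{
        uint32_t sts = *demo_sts_reg;

        /* W1C: writing 1 clears only this bit, writing 0 leaves a bit
         * untouched, so other bits the hardware sets in parallel are
         * not lost.  Only clear the bit when it is actually set. */
        if (sts & DEMO_EVENT_STS)
                *demo_sts_reg = DEMO_EVENT_STS;
}
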
index 6023ad6..573bc0d 100644 (file)
@@ -347,4 +347,6 @@ int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm,
 void acpi_nfit_desc_init(struct acpi_nfit_desc *acpi_desc, struct device *dev);
 bool intel_fwa_supported(struct nvdimm_bus *nvdimm_bus);
 extern struct device_attribute dev_attr_firmware_activate_noidle;
+void nfit_intel_shutdown_status(struct nfit_mem *nfit_mem);
+
 #endif /* __NFIT_H__ */
index 9718d07..dc615ef 100644 (file)
@@ -597,10 +597,6 @@ static int acpi_idle_play_dead(struct cpuidle_device *dev, int index)
                        io_idle(cx->address);
                } else
                        return -ENODEV;
-
-#if defined(CONFIG_X86) && defined(CONFIG_HOTPLUG_CPU)
-               cond_wakeup_cpu0();
-#endif
        }
 
        /* Never reached */
index e8492b3..1dd8d5a 100644 (file)
@@ -470,47 +470,12 @@ static const struct dmi_system_id asus_laptop[] = {
        { }
 };
 
-static const struct dmi_system_id lenovo_laptop[] = {
+static const struct dmi_system_id lg_laptop[] = {
        {
-               .ident = "LENOVO IdeaPad Flex 5 14ALC7",
+               .ident = "LG Electronics 17U70P",
                .matches = {
-                       DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
-                       DMI_MATCH(DMI_PRODUCT_NAME, "82R9"),
-               },
-       },
-       {
-               .ident = "LENOVO IdeaPad Flex 5 16ALC7",
-               .matches = {
-                       DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
-                       DMI_MATCH(DMI_PRODUCT_NAME, "82RA"),
-               },
-       },
-       { }
-};
-
-static const struct dmi_system_id tongfang_gm_rg[] = {
-       {
-               .ident = "TongFang GMxRGxx/XMG CORE 15 (M22)/TUXEDO Stellaris 15 Gen4 AMD",
-               .matches = {
-                       DMI_MATCH(DMI_BOARD_NAME, "GMxRGxx"),
-               },
-       },
-       { }
-};
-
-static const struct dmi_system_id maingear_laptop[] = {
-       {
-               .ident = "MAINGEAR Vector Pro 2 15",
-               .matches = {
-                       DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"),
-                       DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-15A3070T"),
-               }
-       },
-       {
-               .ident = "MAINGEAR Vector Pro 2 17",
-               .matches = {
-                       DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"),
-                       DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-17A3070T"),
+                       DMI_MATCH(DMI_SYS_VENDOR, "LG Electronics"),
+                       DMI_MATCH(DMI_BOARD_NAME, "17U70P"),
                },
        },
        { }
@@ -528,10 +493,7 @@ struct irq_override_cmp {
 static const struct irq_override_cmp override_table[] = {
        { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
        { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
-       { lenovo_laptop, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true },
-       { lenovo_laptop, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true },
-       { tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
-       { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
+       { lg_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
 };
 
 static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity,
@@ -550,16 +512,6 @@ static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity,
                        return entry->override;
        }
 
-#ifdef CONFIG_X86
-       /*
-        * IRQ override isn't needed on modern AMD Zen systems and
-        * this override breaks active low IRQs on AMD Ryzen 6000 and
-        * newer systems. Skip it.
-        */
-       if (boot_cpu_has(X86_FEATURE_ZEN))
-               return false;
-#endif
-
        return true;
 }
 
index 0c6f06a..1c3e1e2 100644 (file)
@@ -2029,8 +2029,6 @@ static u32 acpi_scan_check_dep(acpi_handle handle, bool check_dep)
        return count;
 }
 
-static bool acpi_bus_scan_second_pass;
-
 static acpi_status acpi_bus_check_add(acpi_handle handle, bool check_dep,
                                      struct acpi_device **adev_p)
 {
@@ -2050,10 +2048,8 @@ static acpi_status acpi_bus_check_add(acpi_handle handle, bool check_dep,
                        return AE_OK;
 
                /* Bail out if there are dependencies. */
-               if (acpi_scan_check_dep(handle, check_dep) > 0) {
-                       acpi_bus_scan_second_pass = true;
+               if (acpi_scan_check_dep(handle, check_dep) > 0)
                        return AE_CTRL_DEPTH;
-               }
 
                fallthrough;
        case ACPI_TYPE_ANY:     /* for ACPI_ROOT_OBJECT */
@@ -2301,6 +2297,12 @@ static bool acpi_scan_clear_dep_queue(struct acpi_device *adev)
        return true;
 }
 
+static void acpi_scan_delete_dep_data(struct acpi_dep_data *dep)
+{
+       list_del(&dep->node);
+       kfree(dep);
+}
+
 static int acpi_scan_clear_dep(struct acpi_dep_data *dep, void *data)
 {
        struct acpi_device *adev = acpi_get_acpi_dev(dep->consumer);
@@ -2311,8 +2313,10 @@ static int acpi_scan_clear_dep(struct acpi_dep_data *dep, void *data)
                        acpi_dev_put(adev);
        }
 
-       list_del(&dep->node);
-       kfree(dep);
+       if (dep->free_when_met)
+               acpi_scan_delete_dep_data(dep);
+       else
+               dep->met = true;
 
        return 0;
 }
@@ -2406,6 +2410,55 @@ struct acpi_device *acpi_dev_get_next_consumer_dev(struct acpi_device *supplier,
 }
 EXPORT_SYMBOL_GPL(acpi_dev_get_next_consumer_dev);
 
+static void acpi_scan_postponed_branch(acpi_handle handle)
+{
+       struct acpi_device *adev = NULL;
+
+       if (ACPI_FAILURE(acpi_bus_check_add(handle, false, &adev)))
+               return;
+
+       acpi_walk_namespace(ACPI_TYPE_ANY, handle, ACPI_UINT32_MAX,
+                           acpi_bus_check_add_2, NULL, NULL, (void **)&adev);
+       acpi_bus_attach(adev, NULL);
+}
+
+static void acpi_scan_postponed(void)
+{
+       struct acpi_dep_data *dep, *tmp;
+
+       mutex_lock(&acpi_dep_list_lock);
+
+       list_for_each_entry_safe(dep, tmp, &acpi_dep_list, node) {
+               acpi_handle handle = dep->consumer;
+
+               /*
+                * In case there are multiple acpi_dep_list entries with the
+                * same consumer, skip the current entry if the consumer device
+                * object corresponding to it is present already.
+                */
+               if (!acpi_fetch_acpi_dev(handle)) {
+                       /*
+                        * Even though the lock is released here, tmp is
+                        * guaranteed to be valid, because none of the list
+                        * entries following dep is marked as "free when met"
+                        * and so they cannot be deleted.
+                        */
+                       mutex_unlock(&acpi_dep_list_lock);
+
+                       acpi_scan_postponed_branch(handle);
+
+                       mutex_lock(&acpi_dep_list_lock);
+               }
+
+               if (dep->met)
+                       acpi_scan_delete_dep_data(dep);
+               else
+                       dep->free_when_met = true;
+       }
+
+       mutex_unlock(&acpi_dep_list_lock);
+}
+
 /**
  * acpi_bus_scan - Add ACPI device node objects in a given namespace scope.
  * @handle: Root of the namespace scope to scan.
@@ -2424,8 +2477,6 @@ int acpi_bus_scan(acpi_handle handle)
 {
        struct acpi_device *device = NULL;
 
-       acpi_bus_scan_second_pass = false;
-
        /* Pass 1: Avoid enumerating devices with missing dependencies. */
 
        if (ACPI_SUCCESS(acpi_bus_check_add(handle, true, &device)))
@@ -2438,19 +2489,9 @@ int acpi_bus_scan(acpi_handle handle)
 
        acpi_bus_attach(device, (void *)true);
 
-       if (!acpi_bus_scan_second_pass)
-               return 0;
-
        /* Pass 2: Enumerate all of the remaining devices. */
 
-       device = NULL;
-
-       if (ACPI_SUCCESS(acpi_bus_check_add(handle, false, &device)))
-               acpi_walk_namespace(ACPI_TYPE_ANY, handle, ACPI_UINT32_MAX,
-                                   acpi_bus_check_add_2, NULL, NULL,
-                                   (void **)&device);
-
-       acpi_bus_attach(device, NULL);
+       acpi_scan_postponed();
 
        return 0;
 }
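
acpi_scan_postponed() drops acpi_dep_list_lock around the slow branch scan and relies on the "free when met" marking to keep the saved next pointer valid. The following is a pthread-based sketch of that drop-the-lock-while-iterating pattern, not kernel code: all names and helpers are made up, the list is doubly linked so a node can be unlinked without walking from the head, and the assumption is that every unlink happens under the lock.

#include <pthread.h>
#include <stdlib.h>

struct dep {
        struct dep *prev, *next;
        int met;               /* dependency satisfied elsewhere */
        int free_when_met;     /* whoever satisfies it also frees it */
};

static pthread_mutex_t dep_lock = PTHREAD_MUTEX_INITIALIZER;
static struct dep *dep_list;   /* NULL-terminated, head pointer */

static void unlink_dep(struct dep *d)      /* call with dep_lock held */
{
        if (d->prev)
                d->prev->next = d->next;
        else
                dep_list = d->next;
        if (d->next)
                d->next->prev = d->prev;
        free(d);
}

static void slow_work(struct dep *d) { (void)d; }   /* e.g. scan a branch */

static void process_postponed(void)
{
        struct dep *d, *next;

        pthread_mutex_lock(&dep_lock);
        for (d = dep_list; d; d = next) {
                next = d->next;            /* saved before dropping the lock */

                pthread_mutex_unlock(&dep_lock);
                slow_work(d);              /* slow part runs unlocked */
                pthread_mutex_lock(&dep_lock);

                /* 'next' is still valid: only entries already marked
                 * free_when_met may be freed concurrently, and nothing
                 * after 'd' has been marked yet by this loop. */
                if (d->met)
                        unlink_dep(d);            /* met in the meantime */
                else
                        d->free_when_met = 1;     /* setter will free it */
        }
        pthread_mutex_unlock(&dep_lock);
}
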
index 72470b9..808484d 100644 (file)
@@ -636,11 +636,19 @@ static int acpi_suspend_enter(suspend_state_t pm_state)
        }
 
        /*
-        * Disable and clear GPE status before interrupt is enabled. Some GPEs
-        * (like wakeup GPE) haven't handler, this can avoid such GPE misfire.
-        * acpi_leave_sleep_state will reenable specific GPEs later
+        * Disable all GPEs and clear their status bits before interrupts are
+        * enabled. Some GPEs (like wakeup GPEs) have no handlers, and this
+        * prevents them from producing spurious interrupts.
+        *
+        * acpi_leave_sleep_state() will reenable specific GPEs later.
+        *
+        * Because this code runs on one CPU with disabled interrupts (all of
+        * the other CPUs are offline at this time), it need not acquire any
+        * sleeping locks which may trigger an implicit preemption point even
+        * if there is no contention, so avoid doing that by using a low-level
+        * library routine here.
         */
-       acpi_disable_all_gpes();
+       acpi_hw_disable_all_gpes();
        /* Allow EC transactions to happen. */
        acpi_ec_unblock_transactions();
 
@@ -840,7 +848,7 @@ void __weak acpi_s2idle_setup(void)
        s2idle_set_ops(&acpi_s2idle_ops);
 }
 
-static void acpi_sleep_suspend_setup(void)
+static void __init acpi_sleep_suspend_setup(void)
 {
        bool suspend_ops_needed = false;
        int i;
index 4720a36..f9f6ebb 100644 (file)
 #define ACPI_THERMAL_NOTIFY_HOT                0xF1
 #define ACPI_THERMAL_MODE_ACTIVE       0x00
 
-#define ACPI_THERMAL_MAX_ACTIVE        10
-#define ACPI_THERMAL_MAX_LIMIT_STR_LEN 65
+#define ACPI_THERMAL_MAX_ACTIVE                10
+#define ACPI_THERMAL_MAX_LIMIT_STR_LEN 65
 
-MODULE_AUTHOR("Paul Diefenbaugh");
-MODULE_DESCRIPTION("ACPI Thermal Zone Driver");
-MODULE_LICENSE("GPL");
+#define ACPI_TRIPS_CRITICAL    BIT(0)
+#define ACPI_TRIPS_HOT         BIT(1)
+#define ACPI_TRIPS_PASSIVE     BIT(2)
+#define ACPI_TRIPS_ACTIVE      BIT(3)
+#define ACPI_TRIPS_DEVICES     BIT(4)
+
+#define ACPI_TRIPS_THRESHOLDS  (ACPI_TRIPS_PASSIVE | ACPI_TRIPS_ACTIVE)
+
+#define ACPI_TRIPS_INIT                (ACPI_TRIPS_CRITICAL | ACPI_TRIPS_HOT | \
+                                ACPI_TRIPS_PASSIVE | ACPI_TRIPS_ACTIVE | \
+                                ACPI_TRIPS_DEVICES)
+
+/*
+ * This exception is reported in two cases:
+ * 1. A valid trip point becomes invalid or an invalid trip point becomes
+ *    valid when re-evaluating the AML code.
+ * 2. TODO: Devices listed in _PSL, _ALx, _TZD may change.
+ *    We need to re-bind the cooling devices of a thermal zone when this occurs.
+ */
+#define ACPI_THERMAL_TRIPS_EXCEPTION(flags, tz, str) \
+do { \
+       if (flags != ACPI_TRIPS_INIT) \
+               acpi_handle_info(tz->device->handle, \
+                       "ACPI thermal trip point %s changed\n" \
+                       "Please report to linux-acpi@vger.kernel.org\n", str); \
+} while (0)
 
 static int act;
 module_param(act, int, 0644);
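
The ACPI_THERMAL_TRIPS_EXCEPTION macro moved above keeps its do { ... } while (0) wrapper. A tiny, hypothetical example (not from the driver) of why multi-statement macros need that wrapper:

#include <stdio.h>

static int exception_count;

/* hypothetical macro using the same do { } while (0) idiom */
#define REPORT_TRIP_CHANGE(str) \
do { \
        printf("trip point %s changed\n", (str)); \
        exception_count++; \
} while (0)

static void check(int changed)
{
        if (changed)
                REPORT_TRIP_CHANGE("state");  /* expands to one statement, */
        else                                  /* so this else still binds  */
                exception_count = 0;          /* to the if above           */
}
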
@@ -73,75 +96,30 @@ MODULE_PARM_DESC(psv, "Disable or override all passive trip points.");
 
 static struct workqueue_struct *acpi_thermal_pm_queue;
 
-static int acpi_thermal_add(struct acpi_device *device);
-static void acpi_thermal_remove(struct acpi_device *device);
-static void acpi_thermal_notify(struct acpi_device *device, u32 event);
-
-static const struct acpi_device_id  thermal_device_ids[] = {
-       {ACPI_THERMAL_HID, 0},
-       {"", 0},
-};
-MODULE_DEVICE_TABLE(acpi, thermal_device_ids);
-
-#ifdef CONFIG_PM_SLEEP
-static int acpi_thermal_suspend(struct device *dev);
-static int acpi_thermal_resume(struct device *dev);
-#else
-#define acpi_thermal_suspend NULL
-#define acpi_thermal_resume NULL
-#endif
-static SIMPLE_DEV_PM_OPS(acpi_thermal_pm, acpi_thermal_suspend, acpi_thermal_resume);
-
-static struct acpi_driver acpi_thermal_driver = {
-       .name = "thermal",
-       .class = ACPI_THERMAL_CLASS,
-       .ids = thermal_device_ids,
-       .ops = {
-               .add = acpi_thermal_add,
-               .remove = acpi_thermal_remove,
-               .notify = acpi_thermal_notify,
-               },
-       .drv.pm = &acpi_thermal_pm,
-};
-
-struct acpi_thermal_state {
-       u8 critical:1;
-       u8 hot:1;
-       u8 passive:1;
-       u8 active:1;
-       u8 reserved:4;
-       int active_index;
-};
-
-struct acpi_thermal_state_flags {
-       u8 valid:1;
-       u8 enabled:1;
-       u8 reserved:6;
-};
-
 struct acpi_thermal_critical {
-       struct acpi_thermal_state_flags flags;
        unsigned long temperature;
+       bool valid;
 };
 
 struct acpi_thermal_hot {
-       struct acpi_thermal_state_flags flags;
        unsigned long temperature;
+       bool valid;
 };
 
 struct acpi_thermal_passive {
-       struct acpi_thermal_state_flags flags;
+       struct acpi_handle_list devices;
        unsigned long temperature;
        unsigned long tc1;
        unsigned long tc2;
        unsigned long tsp;
-       struct acpi_handle_list devices;
+       bool valid;
 };
 
 struct acpi_thermal_active {
-       struct acpi_thermal_state_flags flags;
-       unsigned long temperature;
        struct acpi_handle_list devices;
+       unsigned long temperature;
+       bool valid;
+       bool enabled;
 };
 
 struct acpi_thermal_trips {
@@ -151,12 +129,6 @@ struct acpi_thermal_trips {
        struct acpi_thermal_active active[ACPI_THERMAL_MAX_ACTIVE];
 };
 
-struct acpi_thermal_flags {
-       u8 cooling_mode:1;      /* _SCP */
-       u8 devices:1;           /* _TZD */
-       u8 reserved:6;
-};
-
 struct acpi_thermal {
        struct acpi_device *device;
        acpi_bus_id name;
@@ -164,8 +136,6 @@ struct acpi_thermal {
        unsigned long last_temperature;
        unsigned long polling_frequency;
        volatile u8 zombie;
-       struct acpi_thermal_flags flags;
-       struct acpi_thermal_state state;
        struct acpi_thermal_trips trips;
        struct acpi_handle_list devices;
        struct thermal_zone_device *thermal_zone;
@@ -220,52 +190,12 @@ static int acpi_thermal_get_polling_frequency(struct acpi_thermal *tz)
        return 0;
 }
 
-static int acpi_thermal_set_cooling_mode(struct acpi_thermal *tz, int mode)
-{
-       if (!tz)
-               return -EINVAL;
-
-       if (ACPI_FAILURE(acpi_execute_simple_method(tz->device->handle,
-                                                   "_SCP", mode)))
-               return -ENODEV;
-
-       return 0;
-}
-
-#define ACPI_TRIPS_CRITICAL    0x01
-#define ACPI_TRIPS_HOT         0x02
-#define ACPI_TRIPS_PASSIVE     0x04
-#define ACPI_TRIPS_ACTIVE      0x08
-#define ACPI_TRIPS_DEVICES     0x10
-
-#define ACPI_TRIPS_REFRESH_THRESHOLDS  (ACPI_TRIPS_PASSIVE | ACPI_TRIPS_ACTIVE)
-#define ACPI_TRIPS_REFRESH_DEVICES     ACPI_TRIPS_DEVICES
-
-#define ACPI_TRIPS_INIT      (ACPI_TRIPS_CRITICAL | ACPI_TRIPS_HOT |   \
-                             ACPI_TRIPS_PASSIVE | ACPI_TRIPS_ACTIVE |  \
-                             ACPI_TRIPS_DEVICES)
-
-/*
- * This exception is thrown out in two cases:
- * 1.An invalid trip point becomes invalid or a valid trip point becomes invalid
- *   when re-evaluating the AML code.
- * 2.TODO: Devices listed in _PSL, _ALx, _TZD may change.
- *   We need to re-bind the cooling devices of a thermal zone when this occurs.
- */
-#define ACPI_THERMAL_TRIPS_EXCEPTION(flags, tz, str)   \
-do {   \
-       if (flags != ACPI_TRIPS_INIT)   \
-               acpi_handle_info(tz->device->handle,    \
-               "ACPI thermal trip point %s changed\n"  \
-               "Please report to linux-acpi@vger.kernel.org\n", str); \
-} while (0)
-
 static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
 {
        acpi_status status;
        unsigned long long tmp;
        struct acpi_handle_list devices;
-       int valid = 0;
+       bool valid = false;
        int i;
 
        /* Critical Shutdown */
@@ -279,21 +209,21 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
                 * ... so lets discard those as invalid.
                 */
                if (ACPI_FAILURE(status)) {
-                       tz->trips.critical.flags.valid = 0;
+                       tz->trips.critical.valid = false;
                        acpi_handle_debug(tz->device->handle,
                                          "No critical threshold\n");
                } else if (tmp <= 2732) {
                        pr_info(FW_BUG "Invalid critical threshold (%llu)\n", tmp);
-                       tz->trips.critical.flags.valid = 0;
+                       tz->trips.critical.valid = false;
                } else {
-                       tz->trips.critical.flags.valid = 1;
+                       tz->trips.critical.valid = true;
                        acpi_handle_debug(tz->device->handle,
                                          "Found critical threshold [%lu]\n",
                                          tz->trips.critical.temperature);
                }
-               if (tz->trips.critical.flags.valid) {
+               if (tz->trips.critical.valid) {
                        if (crt == -1) {
-                               tz->trips.critical.flags.valid = 0;
+                               tz->trips.critical.valid = false;
                        } else if (crt > 0) {
                                unsigned long crt_k = celsius_to_deci_kelvin(crt);
 
@@ -312,12 +242,12 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
        if (flag & ACPI_TRIPS_HOT) {
                status = acpi_evaluate_integer(tz->device->handle, "_HOT", NULL, &tmp);
                if (ACPI_FAILURE(status)) {
-                       tz->trips.hot.flags.valid = 0;
+                       tz->trips.hot.valid = false;
                        acpi_handle_debug(tz->device->handle,
                                          "No hot threshold\n");
                } else {
                        tz->trips.hot.temperature = tmp;
-                       tz->trips.hot.flags.valid = 1;
+                       tz->trips.hot.valid = true;
                        acpi_handle_debug(tz->device->handle,
                                          "Found hot threshold [%lu]\n",
                                          tz->trips.hot.temperature);
@@ -325,9 +255,9 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
        }
 
        /* Passive (optional) */
-       if (((flag & ACPI_TRIPS_PASSIVE) && tz->trips.passive.flags.valid) ||
+       if (((flag & ACPI_TRIPS_PASSIVE) && tz->trips.passive.valid) ||
            flag == ACPI_TRIPS_INIT) {
-               valid = tz->trips.passive.flags.valid;
+               valid = tz->trips.passive.valid;
                if (psv == -1) {
                        status = AE_SUPPORT;
                } else if (psv > 0) {
@@ -339,44 +269,44 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
                }
 
                if (ACPI_FAILURE(status)) {
-                       tz->trips.passive.flags.valid = 0;
+                       tz->trips.passive.valid = false;
                } else {
                        tz->trips.passive.temperature = tmp;
-                       tz->trips.passive.flags.valid = 1;
+                       tz->trips.passive.valid = true;
                        if (flag == ACPI_TRIPS_INIT) {
                                status = acpi_evaluate_integer(tz->device->handle,
                                                               "_TC1", NULL, &tmp);
                                if (ACPI_FAILURE(status))
-                                       tz->trips.passive.flags.valid = 0;
+                                       tz->trips.passive.valid = false;
                                else
                                        tz->trips.passive.tc1 = tmp;
 
                                status = acpi_evaluate_integer(tz->device->handle,
                                                               "_TC2", NULL, &tmp);
                                if (ACPI_FAILURE(status))
-                                       tz->trips.passive.flags.valid = 0;
+                                       tz->trips.passive.valid = false;
                                else
                                        tz->trips.passive.tc2 = tmp;
 
                                status = acpi_evaluate_integer(tz->device->handle,
                                                               "_TSP", NULL, &tmp);
                                if (ACPI_FAILURE(status))
-                                       tz->trips.passive.flags.valid = 0;
+                                       tz->trips.passive.valid = false;
                                else
                                        tz->trips.passive.tsp = tmp;
                        }
                }
        }
-       if ((flag & ACPI_TRIPS_DEVICES) && tz->trips.passive.flags.valid) {
+       if ((flag & ACPI_TRIPS_DEVICES) && tz->trips.passive.valid) {
                memset(&devices, 0, sizeof(struct acpi_handle_list));
                status = acpi_evaluate_reference(tz->device->handle, "_PSL",
                                                 NULL, &devices);
                if (ACPI_FAILURE(status)) {
                        acpi_handle_info(tz->device->handle,
                                         "Invalid passive threshold\n");
-                       tz->trips.passive.flags.valid = 0;
+                       tz->trips.passive.valid = false;
                } else {
-                       tz->trips.passive.flags.valid = 1;
+                       tz->trips.passive.valid = true;
                }
 
                if (memcmp(&tz->trips.passive.devices, &devices,
@@ -387,24 +317,24 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
                }
        }
        if ((flag & ACPI_TRIPS_PASSIVE) || (flag & ACPI_TRIPS_DEVICES)) {
-               if (valid != tz->trips.passive.flags.valid)
+               if (valid != tz->trips.passive.valid)
                        ACPI_THERMAL_TRIPS_EXCEPTION(flag, tz, "state");
        }
 
        /* Active (optional) */
        for (i = 0; i < ACPI_THERMAL_MAX_ACTIVE; i++) {
                char name[5] = { '_', 'A', 'C', ('0' + i), '\0' };
-               valid = tz->trips.active[i].flags.valid;
+               valid = tz->trips.active[i].valid;
 
                if (act == -1)
                        break; /* disable all active trip points */
 
                if (flag == ACPI_TRIPS_INIT || ((flag & ACPI_TRIPS_ACTIVE) &&
-                   tz->trips.active[i].flags.valid)) {
+                   tz->trips.active[i].valid)) {
                        status = acpi_evaluate_integer(tz->device->handle,
                                                       name, NULL, &tmp);
                        if (ACPI_FAILURE(status)) {
-                               tz->trips.active[i].flags.valid = 0;
+                               tz->trips.active[i].valid = false;
                                if (i == 0)
                                        break;
 
@@ -426,21 +356,21 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
                                break;
                        } else {
                                tz->trips.active[i].temperature = tmp;
-                               tz->trips.active[i].flags.valid = 1;
+                               tz->trips.active[i].valid = true;
                        }
                }
 
                name[2] = 'L';
-               if ((flag & ACPI_TRIPS_DEVICES) && tz->trips.active[i].flags.valid) {
+               if ((flag & ACPI_TRIPS_DEVICES) && tz->trips.active[i].valid) {
                        memset(&devices, 0, sizeof(struct acpi_handle_list));
                        status = acpi_evaluate_reference(tz->device->handle,
                                                         name, NULL, &devices);
                        if (ACPI_FAILURE(status)) {
                                acpi_handle_info(tz->device->handle,
                                                 "Invalid active%d threshold\n", i);
-                               tz->trips.active[i].flags.valid = 0;
+                               tz->trips.active[i].valid = false;
                        } else {
-                               tz->trips.active[i].flags.valid = 1;
+                               tz->trips.active[i].valid = true;
                        }
 
                        if (memcmp(&tz->trips.active[i].devices, &devices,
@@ -451,10 +381,10 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
                        }
                }
                if ((flag & ACPI_TRIPS_ACTIVE) || (flag & ACPI_TRIPS_DEVICES))
-                       if (valid != tz->trips.active[i].flags.valid)
+                       if (valid != tz->trips.active[i].valid)
                                ACPI_THERMAL_TRIPS_EXCEPTION(flag, tz, "state");
 
-               if (!tz->trips.active[i].flags.valid)
+               if (!tz->trips.active[i].valid)
                        break;
        }
 
@@ -474,17 +404,18 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
 
 static int acpi_thermal_get_trip_points(struct acpi_thermal *tz)
 {
-       int i, valid, ret = acpi_thermal_trips_update(tz, ACPI_TRIPS_INIT);
+       int i, ret = acpi_thermal_trips_update(tz, ACPI_TRIPS_INIT);
+       bool valid;
 
        if (ret)
                return ret;
 
-       valid = tz->trips.critical.flags.valid |
-               tz->trips.hot.flags.valid |
-               tz->trips.passive.flags.valid;
+       valid = tz->trips.critical.valid |
+               tz->trips.hot.valid |
+               tz->trips.passive.valid;
 
        for (i = 0; i < ACPI_THERMAL_MAX_ACTIVE; i++)
-               valid |= tz->trips.active[i].flags.valid;
+               valid = valid || tz->trips.active[i].valid;
 
        if (!valid) {
                pr_warn(FW_BUG "No valid trip found\n");
@@ -521,7 +452,7 @@ static int thermal_get_trip_type(struct thermal_zone_device *thermal,
        if (!tz || trip < 0)
                return -EINVAL;
 
-       if (tz->trips.critical.flags.valid) {
+       if (tz->trips.critical.valid) {
                if (!trip) {
                        *type = THERMAL_TRIP_CRITICAL;
                        return 0;
@@ -529,7 +460,7 @@ static int thermal_get_trip_type(struct thermal_zone_device *thermal,
                trip--;
        }
 
-       if (tz->trips.hot.flags.valid) {
+       if (tz->trips.hot.valid) {
                if (!trip) {
                        *type = THERMAL_TRIP_HOT;
                        return 0;
@@ -537,7 +468,7 @@ static int thermal_get_trip_type(struct thermal_zone_device *thermal,
                trip--;
        }
 
-       if (tz->trips.passive.flags.valid) {
+       if (tz->trips.passive.valid) {
                if (!trip) {
                        *type = THERMAL_TRIP_PASSIVE;
                        return 0;
@@ -545,7 +476,7 @@ static int thermal_get_trip_type(struct thermal_zone_device *thermal,
                trip--;
        }
 
-       for (i = 0; i < ACPI_THERMAL_MAX_ACTIVE && tz->trips.active[i].flags.valid; i++) {
+       for (i = 0; i < ACPI_THERMAL_MAX_ACTIVE && tz->trips.active[i].valid; i++) {
                if (!trip) {
                        *type = THERMAL_TRIP_ACTIVE;
                        return 0;
@@ -565,7 +496,7 @@ static int thermal_get_trip_temp(struct thermal_zone_device *thermal,
        if (!tz || trip < 0)
                return -EINVAL;
 
-       if (tz->trips.critical.flags.valid) {
+       if (tz->trips.critical.valid) {
                if (!trip) {
                        *temp = deci_kelvin_to_millicelsius_with_offset(
                                        tz->trips.critical.temperature,
@@ -575,7 +506,7 @@ static int thermal_get_trip_temp(struct thermal_zone_device *thermal,
                trip--;
        }
 
-       if (tz->trips.hot.flags.valid) {
+       if (tz->trips.hot.valid) {
                if (!trip) {
                        *temp = deci_kelvin_to_millicelsius_with_offset(
                                        tz->trips.hot.temperature,
@@ -585,7 +516,7 @@ static int thermal_get_trip_temp(struct thermal_zone_device *thermal,
                trip--;
        }
 
-       if (tz->trips.passive.flags.valid) {
+       if (tz->trips.passive.valid) {
                if (!trip) {
                        *temp = deci_kelvin_to_millicelsius_with_offset(
                                        tz->trips.passive.temperature,
@@ -596,7 +527,7 @@ static int thermal_get_trip_temp(struct thermal_zone_device *thermal,
        }
 
        for (i = 0; i < ACPI_THERMAL_MAX_ACTIVE &&
-               tz->trips.active[i].flags.valid; i++) {
+               tz->trips.active[i].valid; i++) {
                if (!trip) {
                        *temp = deci_kelvin_to_millicelsius_with_offset(
                                        tz->trips.active[i].temperature,
@@ -614,7 +545,7 @@ static int thermal_get_crit_temp(struct thermal_zone_device *thermal,
 {
        struct acpi_thermal *tz = thermal_zone_device_priv(thermal);
 
-       if (tz->trips.critical.flags.valid) {
+       if (tz->trips.critical.valid) {
                *temperature = deci_kelvin_to_millicelsius_with_offset(
                                        tz->trips.critical.temperature,
                                        tz->kelvin_offset);
@@ -700,13 +631,13 @@ static int acpi_thermal_cooling_device_cb(struct thermal_zone_device *thermal,
        int trip = -1;
        int result = 0;
 
-       if (tz->trips.critical.flags.valid)
+       if (tz->trips.critical.valid)
                trip++;
 
-       if (tz->trips.hot.flags.valid)
+       if (tz->trips.hot.valid)
                trip++;
 
-       if (tz->trips.passive.flags.valid) {
+       if (tz->trips.passive.valid) {
                trip++;
                for (i = 0; i < tz->trips.passive.devices.count; i++) {
                        handle = tz->trips.passive.devices.handles[i];
@@ -731,7 +662,7 @@ static int acpi_thermal_cooling_device_cb(struct thermal_zone_device *thermal,
        }
 
        for (i = 0; i < ACPI_THERMAL_MAX_ACTIVE; i++) {
-               if (!tz->trips.active[i].flags.valid)
+               if (!tz->trips.active[i].valid)
                        break;
 
                trip++;
@@ -819,19 +750,19 @@ static int acpi_thermal_register_thermal_zone(struct acpi_thermal *tz)
        acpi_status status;
        int i;
 
-       if (tz->trips.critical.flags.valid)
+       if (tz->trips.critical.valid)
                trips++;
 
-       if (tz->trips.hot.flags.valid)
+       if (tz->trips.hot.valid)
                trips++;
 
-       if (tz->trips.passive.flags.valid)
+       if (tz->trips.passive.valid)
                trips++;
 
-       for (i = 0; i < ACPI_THERMAL_MAX_ACTIVE && tz->trips.active[i].flags.valid;
+       for (i = 0; i < ACPI_THERMAL_MAX_ACTIVE && tz->trips.active[i].valid;
             i++, trips++);
 
-       if (tz->trips.passive.flags.valid)
+       if (tz->trips.passive.valid)
                tz->thermal_zone = thermal_zone_device_register("acpitz", trips, 0, tz,
                                                                &acpi_thermal_zone_ops, NULL,
                                                                tz->trips.passive.tsp * 100,
@@ -906,13 +837,13 @@ static void acpi_thermal_notify(struct acpi_device *device, u32 event)
                acpi_queue_thermal_check(tz);
                break;
        case ACPI_THERMAL_NOTIFY_THRESHOLDS:
-               acpi_thermal_trips_update(tz, ACPI_TRIPS_REFRESH_THRESHOLDS);
+               acpi_thermal_trips_update(tz, ACPI_TRIPS_THRESHOLDS);
                acpi_queue_thermal_check(tz);
                acpi_bus_generate_netlink_event(device->pnp.device_class,
                                                dev_name(&device->dev), event, 0);
                break;
        case ACPI_THERMAL_NOTIFY_DEVICES:
-               acpi_thermal_trips_update(tz, ACPI_TRIPS_REFRESH_DEVICES);
+               acpi_thermal_trips_update(tz, ACPI_TRIPS_DEVICES);
                acpi_queue_thermal_check(tz);
                acpi_bus_generate_netlink_event(device->pnp.device_class,
                                                dev_name(&device->dev), event, 0);
@@ -976,9 +907,8 @@ static int acpi_thermal_get_info(struct acpi_thermal *tz)
                return result;
 
        /* Set the cooling mode [_SCP] to active cooling (default) */
-       result = acpi_thermal_set_cooling_mode(tz, ACPI_THERMAL_MODE_ACTIVE);
-       if (!result)
-               tz->flags.cooling_mode = 1;
+       acpi_execute_simple_method(tz->device->handle, "_SCP",
+                                  ACPI_THERMAL_MODE_ACTIVE);
 
        /* Get default polling frequency [_TZP] (optional) */
        if (tzp)
@@ -1001,7 +931,7 @@ static int acpi_thermal_get_info(struct acpi_thermal *tz)
  */
 static void acpi_thermal_guess_offset(struct acpi_thermal *tz)
 {
-       if (tz->trips.critical.flags.valid &&
+       if (tz->trips.critical.valid &&
            (tz->trips.critical.temperature % 5) == 1)
                tz->kelvin_offset = 273100;
        else
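
The "% 5 == 1" test above distinguishes firmware that rounds 273.15 K down to 273.1 (2731 deci-kelvin per 0 °C) from firmware that rounds up to 273.2. A worked example with a hypothetical reading, just to show the arithmetic:

#include <stdio.h>

int main(void)
{
        /* hypothetical _CRT reading: 3731 deci-kelvin, and 3731 % 5 == 1 */
        long temp_dk = 3731;
        long offset  = (temp_dk % 5 == 1) ? 273100 : 273200;
        long temp_mc = temp_dk * 100 - offset;

        /* prints 100000 millicelsius, i.e. 100.0 °C; the default 273200
         * offset would have given 99900, i.e. 99.9 °C */
        printf("%ld\n", temp_mc);
        return 0;
}
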
@@ -1110,27 +1040,48 @@ static int acpi_thermal_resume(struct device *dev)
                return -EINVAL;
 
        for (i = 0; i < ACPI_THERMAL_MAX_ACTIVE; i++) {
-               if (!tz->trips.active[i].flags.valid)
+               if (!tz->trips.active[i].valid)
                        break;
 
-               tz->trips.active[i].flags.enabled = 1;
+               tz->trips.active[i].enabled = true;
                for (j = 0; j < tz->trips.active[i].devices.count; j++) {
                        result = acpi_bus_update_power(
                                        tz->trips.active[i].devices.handles[j],
                                        &power_state);
                        if (result || (power_state != ACPI_STATE_D0)) {
-                               tz->trips.active[i].flags.enabled = 0;
+                               tz->trips.active[i].enabled = false;
                                break;
                        }
                }
-               tz->state.active |= tz->trips.active[i].flags.enabled;
        }
 
        acpi_queue_thermal_check(tz);
 
        return AE_OK;
 }
+#else
+#define acpi_thermal_suspend   NULL
+#define acpi_thermal_resume    NULL
 #endif
+static SIMPLE_DEV_PM_OPS(acpi_thermal_pm, acpi_thermal_suspend, acpi_thermal_resume);
+
+static const struct acpi_device_id  thermal_device_ids[] = {
+       {ACPI_THERMAL_HID, 0},
+       {"", 0},
+};
+MODULE_DEVICE_TABLE(acpi, thermal_device_ids);
+
+static struct acpi_driver acpi_thermal_driver = {
+       .name = "thermal",
+       .class = ACPI_THERMAL_CLASS,
+       .ids = thermal_device_ids,
+       .ops = {
+               .add = acpi_thermal_add,
+               .remove = acpi_thermal_remove,
+               .notify = acpi_thermal_notify,
+               },
+       .drv.pm = &acpi_thermal_pm,
+};
 
 static int thermal_act(const struct dmi_system_id *d) {
        if (act == 0) {
@@ -1236,3 +1187,7 @@ static void __exit acpi_thermal_exit(void)
 
 module_init(acpi_thermal_init);
 module_exit(acpi_thermal_exit);
+
+MODULE_AUTHOR("Paul Diefenbaugh");
+MODULE_DESCRIPTION("ACPI Thermal Zone Driver");
+MODULE_LICENSE("GPL");
index 598f548..6353be6 100644 (file)
@@ -19,18 +19,52 @@ static const struct acpi_device_id tiny_power_button_device_ids[] = {
 };
 MODULE_DEVICE_TABLE(acpi, tiny_power_button_device_ids);
 
-static int acpi_noop_add(struct acpi_device *device)
+static void acpi_tiny_power_button_notify(acpi_handle handle, u32 event, void *data)
 {
-       return 0;
+       kill_cad_pid(power_signal, 1);
 }
 
-static void acpi_noop_remove(struct acpi_device *device)
+static void acpi_tiny_power_button_notify_run(void *not_used)
 {
+       acpi_tiny_power_button_notify(NULL, ACPI_FIXED_HARDWARE_EVENT, NULL);
 }
 
-static void acpi_tiny_power_button_notify(struct acpi_device *device, u32 event)
+static u32 acpi_tiny_power_button_event(void *not_used)
 {
-       kill_cad_pid(power_signal, 1);
+       acpi_os_execute(OSL_NOTIFY_HANDLER, acpi_tiny_power_button_notify_run, NULL);
+       return ACPI_INTERRUPT_HANDLED;
+}
+
+static int acpi_tiny_power_button_add(struct acpi_device *device)
+{
+       acpi_status status;
+
+       if (device->device_type == ACPI_BUS_TYPE_POWER_BUTTON) {
+               status = acpi_install_fixed_event_handler(ACPI_EVENT_POWER_BUTTON,
+                                                         acpi_tiny_power_button_event,
+                                                         NULL);
+       } else {
+               status = acpi_install_notify_handler(device->handle,
+                                                    ACPI_DEVICE_NOTIFY,
+                                                    acpi_tiny_power_button_notify,
+                                                    NULL);
+       }
+       if (ACPI_FAILURE(status))
+               return -ENODEV;
+
+       return 0;
+}
+
+static void acpi_tiny_power_button_remove(struct acpi_device *device)
+{
+       if (device->device_type == ACPI_BUS_TYPE_POWER_BUTTON) {
+               acpi_remove_fixed_event_handler(ACPI_EVENT_POWER_BUTTON,
+                                               acpi_tiny_power_button_event);
+       } else {
+               acpi_remove_notify_handler(device->handle, ACPI_DEVICE_NOTIFY,
+                                          acpi_tiny_power_button_notify);
+       }
+       acpi_os_wait_events_complete();
 }
 
 static struct acpi_driver acpi_tiny_power_button_driver = {
@@ -38,9 +72,8 @@ static struct acpi_driver acpi_tiny_power_button_driver = {
        .class = "tiny-power-button",
        .ids = tiny_power_button_device_ids,
        .ops = {
-               .add = acpi_noop_add,
-               .remove = acpi_noop_remove,
-               .notify = acpi_tiny_power_button_notify,
+               .add = acpi_tiny_power_button_add,
+               .remove = acpi_tiny_power_button_remove,
        },
 };
 
index bcc25d4..18cc08c 100644 (file)
@@ -471,6 +471,22 @@ static const struct dmi_system_id video_detect_dmi_table[] = {
                },
        },
        {
+        .callback = video_detect_force_native,
+        /* Lenovo ThinkPad X131e (3371 AMD version) */
+        .matches = {
+               DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+               DMI_MATCH(DMI_PRODUCT_NAME, "3371"),
+               },
+       },
+       {
+        .callback = video_detect_force_native,
+        /* Apple iMac11,3 */
+        .matches = {
+               DMI_MATCH(DMI_SYS_VENDOR, "Apple Inc."),
+               DMI_MATCH(DMI_PRODUCT_NAME, "iMac11,3"),
+               },
+       },
+       {
         /* https://bugzilla.redhat.com/show_bug.cgi?id=1217249 */
         .callback = video_detect_force_native,
         /* Apple MacBook Pro 12,1 */
@@ -514,6 +530,14 @@ static const struct dmi_system_id video_detect_dmi_table[] = {
        },
        {
         .callback = video_detect_force_native,
+        /* Dell Studio 1569 */
+        .matches = {
+               DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+               DMI_MATCH(DMI_PRODUCT_NAME, "Studio 1569"),
+               },
+       },
+       {
+        .callback = video_detect_force_native,
         /* Acer Aspire 3830TG */
         .matches = {
                DMI_MATCH(DMI_SYS_VENDOR, "Acer"),
@@ -828,6 +852,27 @@ enum acpi_backlight_type __acpi_video_get_backlight_type(bool native, bool *auto
        if (native_available)
                return acpi_backlight_native;
 
+       /*
+        * The vendor specific BIOS interfaces are only necessary for
+        * laptops from before ~2008.
+        *
+        * For laptops from ~2008 till ~2023 this point is never reached
+        * because on those (video_caps & ACPI_VIDEO_BACKLIGHT) above is true.
+        *
+        * Laptops from after ~2023 no longer support ACPI_VIDEO_BACKLIGHT;
+        * if this point is reached on those, it likely means that the GPU
+        * kms driver which sets native_available has not loaded yet.
+        *
+        * Returning acpi_backlight_vendor in this case is known to sometimes
+        * cause a non-working vendor-specific /sys/class/backlight device to
+        * get registered.
+        *
+        * Return acpi_backlight_none on laptops with ACPI tables written
+        * for Windows 8 (laptops from after ~2012) to avoid this problem.
+        */
+       if (acpi_osi_is_win8())
+               return acpi_backlight_none;
+
        /* No ACPI video/native (old hw), use vendor specific fw methods. */
        return acpi_backlight_vendor;
 }
index e499c60..ce62e61 100644 (file)
@@ -59,6 +59,7 @@ static int lps0_dsm_func_mask;
 
 static guid_t lps0_dsm_guid_microsoft;
 static int lps0_dsm_func_mask_microsoft;
+static int lps0_dsm_state;
 
 /* Device constraint entry structure */
 struct lpi_device_info {
@@ -320,6 +321,44 @@ static void lpi_check_constraints(void)
        }
 }
 
+static bool acpi_s2idle_vendor_amd(void)
+{
+       return boot_cpu_data.x86_vendor == X86_VENDOR_AMD;
+}
+
+static const char *acpi_sleep_dsm_state_to_str(unsigned int state)
+{
+       if (lps0_dsm_func_mask_microsoft || !acpi_s2idle_vendor_amd()) {
+               switch (state) {
+               case ACPI_LPS0_SCREEN_OFF:
+                       return "screen off";
+               case ACPI_LPS0_SCREEN_ON:
+                       return "screen on";
+               case ACPI_LPS0_ENTRY:
+                       return "lps0 entry";
+               case ACPI_LPS0_EXIT:
+                       return "lps0 exit";
+               case ACPI_LPS0_MS_ENTRY:
+                       return "lps0 ms entry";
+               case ACPI_LPS0_MS_EXIT:
+                       return "lps0 ms exit";
+               }
+       } else {
+               switch (state) {
+               case ACPI_LPS0_SCREEN_ON_AMD:
+                       return "screen on";
+               case ACPI_LPS0_SCREEN_OFF_AMD:
+                       return "screen off";
+               case ACPI_LPS0_ENTRY_AMD:
+                       return "lps0 entry";
+               case ACPI_LPS0_EXIT_AMD:
+                       return "lps0 exit";
+               }
+       }
+
+       return "unknown";
+}
+
 static void acpi_sleep_run_lps0_dsm(unsigned int func, unsigned int func_mask, guid_t dsm_guid)
 {
        union acpi_object *out_obj;
@@ -331,14 +370,15 @@ static void acpi_sleep_run_lps0_dsm(unsigned int func, unsigned int func_mask, g
                                        rev_id, func, NULL);
        ACPI_FREE(out_obj);
 
-       acpi_handle_debug(lps0_device_handle, "_DSM function %u evaluation %s\n",
-                         func, out_obj ? "successful" : "failed");
+       lps0_dsm_state = func;
+       if (pm_debug_messages_on) {
+               acpi_handle_info(lps0_device_handle,
+                               "%s transitioned to state %s\n",
+                               "%s transition to state %s\n",
+                                out_obj ? "Successful" : "Failed",
+       }
 }
 
-static bool acpi_s2idle_vendor_amd(void)
-{
-       return boot_cpu_data.x86_vendor == X86_VENDOR_AMD;
-}
 
 static int validate_dsm(acpi_handle handle, const char *uuid, int rev, guid_t *dsm_guid)
 {
@@ -485,11 +525,11 @@ int acpi_s2idle_prepare_late(void)
                                        ACPI_LPS0_ENTRY,
                                        lps0_dsm_func_mask, lps0_dsm_guid);
        if (lps0_dsm_func_mask_microsoft > 0) {
-               acpi_sleep_run_lps0_dsm(ACPI_LPS0_ENTRY,
-                               lps0_dsm_func_mask_microsoft, lps0_dsm_guid_microsoft);
                /* modern standby entry */
                acpi_sleep_run_lps0_dsm(ACPI_LPS0_MS_ENTRY,
                                lps0_dsm_func_mask_microsoft, lps0_dsm_guid_microsoft);
+               acpi_sleep_run_lps0_dsm(ACPI_LPS0_ENTRY,
+                               lps0_dsm_func_mask_microsoft, lps0_dsm_guid_microsoft);
        }
 
        list_for_each_entry(handler, &lps0_s2idle_devops_head, list_node) {
@@ -524,11 +564,6 @@ void acpi_s2idle_restore_early(void)
                if (handler->restore)
                        handler->restore();
 
-       /* Modern standby exit */
-       if (lps0_dsm_func_mask_microsoft > 0)
-               acpi_sleep_run_lps0_dsm(ACPI_LPS0_MS_EXIT,
-                               lps0_dsm_func_mask_microsoft, lps0_dsm_guid_microsoft);
-
        /* LPS0 exit */
        if (lps0_dsm_func_mask > 0)
                acpi_sleep_run_lps0_dsm(acpi_s2idle_vendor_amd() ?
@@ -539,6 +574,11 @@ void acpi_s2idle_restore_early(void)
                acpi_sleep_run_lps0_dsm(ACPI_LPS0_EXIT,
                                lps0_dsm_func_mask_microsoft, lps0_dsm_guid_microsoft);
 
+       /* Modern standby exit */
+       if (lps0_dsm_func_mask_microsoft > 0)
+               acpi_sleep_run_lps0_dsm(ACPI_LPS0_MS_EXIT,
+                               lps0_dsm_func_mask_microsoft, lps0_dsm_guid_microsoft);
+
        /* Screen on */
        if (lps0_dsm_func_mask_microsoft > 0)
                acpi_sleep_run_lps0_dsm(ACPI_LPS0_SCREEN_ON,
index 9c2d6f3..c2b925f 100644 (file)
@@ -259,10 +259,11 @@ bool force_storage_d3(void)
  * drivers/platform/x86/x86-android-tablets.c kernel module.
  */
 #define ACPI_QUIRK_SKIP_I2C_CLIENTS                            BIT(0)
-#define ACPI_QUIRK_UART1_TTY_UART2_SKIP                                BIT(1)
-#define ACPI_QUIRK_SKIP_ACPI_AC_AND_BATTERY                    BIT(2)
-#define ACPI_QUIRK_USE_ACPI_AC_AND_BATTERY                     BIT(3)
-#define ACPI_QUIRK_SKIP_GPIO_EVENT_HANDLERS                    BIT(4)
+#define ACPI_QUIRK_UART1_SKIP                                  BIT(1)
+#define ACPI_QUIRK_UART1_TTY_UART2_SKIP                                BIT(2)
+#define ACPI_QUIRK_SKIP_ACPI_AC_AND_BATTERY                    BIT(3)
+#define ACPI_QUIRK_USE_ACPI_AC_AND_BATTERY                     BIT(4)
+#define ACPI_QUIRK_SKIP_GPIO_EVENT_HANDLERS                    BIT(5)
 
 static const struct dmi_system_id acpi_quirk_skip_dmi_ids[] = {
        /*
@@ -319,6 +320,7 @@ static const struct dmi_system_id acpi_quirk_skip_dmi_ids[] = {
                        DMI_EXACT_MATCH(DMI_PRODUCT_VERSION, "YETI-11"),
                },
                .driver_data = (void *)(ACPI_QUIRK_SKIP_I2C_CLIENTS |
+                                       ACPI_QUIRK_UART1_SKIP |
                                        ACPI_QUIRK_SKIP_ACPI_AC_AND_BATTERY |
                                        ACPI_QUIRK_SKIP_GPIO_EVENT_HANDLERS),
        },
@@ -365,7 +367,7 @@ static const struct dmi_system_id acpi_quirk_skip_dmi_ids[] = {
                                        ACPI_QUIRK_SKIP_ACPI_AC_AND_BATTERY),
        },
        {
-               /* Nextbook Ares 8 */
+               /* Nextbook Ares 8 (BYT version) */
                .matches = {
                        DMI_MATCH(DMI_SYS_VENDOR, "Insyde"),
                        DMI_MATCH(DMI_PRODUCT_NAME, "M890BAP"),
@@ -375,6 +377,16 @@ static const struct dmi_system_id acpi_quirk_skip_dmi_ids[] = {
                                        ACPI_QUIRK_SKIP_GPIO_EVENT_HANDLERS),
        },
        {
+               /* Nextbook Ares 8A (CHT version) */
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "Insyde"),
+                       DMI_MATCH(DMI_PRODUCT_NAME, "CherryTrail"),
+                       DMI_MATCH(DMI_BIOS_VERSION, "M882"),
+               },
+               .driver_data = (void *)(ACPI_QUIRK_SKIP_I2C_CLIENTS |
+                                       ACPI_QUIRK_SKIP_ACPI_AC_AND_BATTERY),
+       },
+       {
                /* Whitelabel (sold as various brands) TM800A550L */
                .matches = {
                        DMI_MATCH(DMI_BOARD_VENDOR, "AMI Corporation"),
@@ -392,6 +404,7 @@ static const struct dmi_system_id acpi_quirk_skip_dmi_ids[] = {
 #if IS_ENABLED(CONFIG_X86_ANDROID_TABLETS)
 static const struct acpi_device_id i2c_acpi_known_good_ids[] = {
        { "10EC5640", 0 }, /* RealTek ALC5640 audio codec */
+       { "10EC5651", 0 }, /* RealTek ALC5651 audio codec */
        { "INT33F4", 0 },  /* X-Powers AXP288 PMIC */
        { "INT33FD", 0 },  /* Intel Crystal Cove PMIC */
        { "INT34D3", 0 },  /* Intel Whiskey Cove PMIC */
@@ -438,6 +451,9 @@ int acpi_quirk_skip_serdev_enumeration(struct device *controller_parent, bool *s
        if (dmi_id)
                quirks = (unsigned long)dmi_id->driver_data;
 
+       if ((quirks & ACPI_QUIRK_UART1_SKIP) && uid == 1)
+               *skip = true;
+
        if (quirks & ACPI_QUIRK_UART1_TTY_UART2_SKIP) {
                if (uid == 1)
                        return -ENODEV; /* Create tty cdev instead of serdev */
index fb56bfc..8fb7672 100644 (file)
@@ -1934,24 +1934,23 @@ static void binder_deferred_fd_close(int fd)
 static void binder_transaction_buffer_release(struct binder_proc *proc,
                                              struct binder_thread *thread,
                                              struct binder_buffer *buffer,
-                                             binder_size_t failed_at,
+                                             binder_size_t off_end_offset,
                                              bool is_failure)
 {
        int debug_id = buffer->debug_id;
-       binder_size_t off_start_offset, buffer_offset, off_end_offset;
+       binder_size_t off_start_offset, buffer_offset;
 
        binder_debug(BINDER_DEBUG_TRANSACTION,
                     "%d buffer release %d, size %zd-%zd, failed at %llx\n",
                     proc->pid, buffer->debug_id,
                     buffer->data_size, buffer->offsets_size,
-                    (unsigned long long)failed_at);
+                    (unsigned long long)off_end_offset);
 
        if (buffer->target_node)
                binder_dec_node(buffer->target_node, 1, 0);
 
        off_start_offset = ALIGN(buffer->data_size, sizeof(void *));
-       off_end_offset = is_failure && failed_at ? failed_at :
-                               off_start_offset + buffer->offsets_size;
+
        for (buffer_offset = off_start_offset; buffer_offset < off_end_offset;
             buffer_offset += sizeof(binder_size_t)) {
                struct binder_object_header *hdr;
@@ -2111,6 +2110,21 @@ static void binder_transaction_buffer_release(struct binder_proc *proc,
        }
 }
 
+/* Clean up all the objects in the buffer */
+static inline void binder_release_entire_buffer(struct binder_proc *proc,
+                                               struct binder_thread *thread,
+                                               struct binder_buffer *buffer,
+                                               bool is_failure)
+{
+       binder_size_t off_end_offset;
+
+       off_end_offset = ALIGN(buffer->data_size, sizeof(void *));
+       off_end_offset += buffer->offsets_size;
+
+       binder_transaction_buffer_release(proc, thread, buffer,
+                                         off_end_offset, is_failure);
+}
+
 static int binder_translate_binder(struct flat_binder_object *fp,
                                   struct binder_transaction *t,
                                   struct binder_thread *thread)
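
binder_release_entire_buffer() above recomputes the end of the offsets array instead of passing a failed_at position. A small stand-alone sketch of that layout calculation, with a hypothetical ALIGN_UP helper standing in for the kernel's own ALIGN macro:

#include <stddef.h>
#include <stdio.h>

/* round x up to a power-of-two boundary a (hypothetical helper) */
#define ALIGN_UP(x, a)  (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

static size_t off_end(size_t data_size, size_t offsets_size)
{
        /* the offsets array starts right after the data area, aligned
         * to a pointer, and ends offsets_size bytes later */
        return ALIGN_UP(data_size, sizeof(void *)) + offsets_size;
}

int main(void)
{
        /* e.g. 13 bytes of data and two 8-byte offsets on a 64-bit box:
         * ALIGN_UP(13, 8) = 16, so the offsets region ends at byte 32 */
        printf("%zu\n", off_end(13, 16));
        return 0;
}
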
@@ -2806,7 +2820,7 @@ static int binder_proc_transaction(struct binder_transaction *t,
                t_outdated->buffer = NULL;
                buffer->transaction = NULL;
                trace_binder_transaction_update_buffer_release(buffer);
-               binder_transaction_buffer_release(proc, NULL, buffer, 0, 0);
+               binder_release_entire_buffer(proc, NULL, buffer, false);
                binder_alloc_free_buf(&proc->alloc, buffer);
                kfree(t_outdated);
                binder_stats_deleted(BINDER_STAT_TRANSACTION);
@@ -3775,7 +3789,7 @@ binder_free_buf(struct binder_proc *proc,
                binder_node_inner_unlock(buf_node);
        }
        trace_binder_transaction_buffer_release(buffer);
-       binder_transaction_buffer_release(proc, thread, buffer, 0, is_failure);
+       binder_release_entire_buffer(proc, thread, buffer, is_failure);
        binder_alloc_free_buf(&proc->alloc, buffer);
 }
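
binder_release_entire_buffer() recomputes the end of the offsets array, the pointer-aligned data size plus offsets_size, instead of relying on the old convention of passing failed_at == 0. A userspace sketch of that arithmetic, assuming illustrative sizes and an ALIGN_UP macro standing in for the kernel's ALIGN():

    #include <stdio.h>
    #include <stddef.h>

    /* Round x up to the next multiple of the power-of-two a. */
    #define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

    int main(void)
    {
            /* Hypothetical stand-ins for buffer->data_size / offsets_size. */
            size_t data_size = 61;      /* not a multiple of sizeof(void *) */
            size_t offsets_size = 24;   /* three 8-byte offset entries      */

            size_t off_start = ALIGN_UP(data_size, sizeof(void *));
            size_t off_end = off_start + offsets_size;

            /* Releasing the whole buffer walks [off_start, off_end). */
            printf("offsets span [%zu, %zu)\n", off_start, off_end);
            return 0;
    }
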
 
index 55a3c3c..662a2a2 100644 (file)
@@ -212,8 +212,8 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
                mm = alloc->mm;
 
        if (mm) {
-               mmap_read_lock(mm);
-               vma = vma_lookup(mm, alloc->vma_addr);
+               mmap_write_lock(mm);
+               vma = alloc->vma;
        }
 
        if (!vma && need_mm) {
@@ -270,7 +270,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
                trace_binder_alloc_page_end(alloc, index);
        }
        if (mm) {
-               mmap_read_unlock(mm);
+               mmap_write_unlock(mm);
                mmput(mm);
        }
        return 0;
@@ -303,21 +303,24 @@ err_page_ptr_cleared:
        }
 err_no_vma:
        if (mm) {
-               mmap_read_unlock(mm);
+               mmap_write_unlock(mm);
                mmput(mm);
        }
        return vma ? -ENOMEM : -ESRCH;
 }
 
+static inline void binder_alloc_set_vma(struct binder_alloc *alloc,
+               struct vm_area_struct *vma)
+{
+       /* pairs with smp_load_acquire in binder_alloc_get_vma() */
+       smp_store_release(&alloc->vma, vma);
+}
+
 static inline struct vm_area_struct *binder_alloc_get_vma(
                struct binder_alloc *alloc)
 {
-       struct vm_area_struct *vma = NULL;
-
-       if (alloc->vma_addr)
-               vma = vma_lookup(alloc->mm, alloc->vma_addr);
-
-       return vma;
+       /* pairs with smp_store_release in binder_alloc_set_vma() */
+       return smp_load_acquire(&alloc->vma);
 }
 
 static bool debug_low_async_space_locked(struct binder_alloc *alloc, int pid)
@@ -380,15 +383,13 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
        size_t size, data_offsets_size;
        int ret;
 
-       mmap_read_lock(alloc->mm);
+       /* Check binder_alloc is fully initialized */
        if (!binder_alloc_get_vma(alloc)) {
-               mmap_read_unlock(alloc->mm);
                binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
                                   "%d: binder_alloc_buf, no vma\n",
                                   alloc->pid);
                return ERR_PTR(-ESRCH);
        }
-       mmap_read_unlock(alloc->mm);
 
        data_offsets_size = ALIGN(data_size, sizeof(void *)) +
                ALIGN(offsets_size, sizeof(void *));
@@ -778,7 +779,9 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
        buffer->free = 1;
        binder_insert_free_buffer(alloc, buffer);
        alloc->free_async_space = alloc->buffer_size / 2;
-       alloc->vma_addr = vma->vm_start;
+
+       /* Signal binder_alloc is fully initialized */
+       binder_alloc_set_vma(alloc, vma);
 
        return 0;
 
@@ -808,8 +811,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
 
        buffers = 0;
        mutex_lock(&alloc->mutex);
-       BUG_ON(alloc->vma_addr &&
-              vma_lookup(alloc->mm, alloc->vma_addr));
+       BUG_ON(alloc->vma);
 
        while ((n = rb_first(&alloc->allocated_buffers))) {
                buffer = rb_entry(n, struct binder_buffer, rb_node);
@@ -916,25 +918,17 @@ void binder_alloc_print_pages(struct seq_file *m,
         * Make sure the binder_alloc is fully initialized, otherwise we might
         * read inconsistent state.
         */
-
-       mmap_read_lock(alloc->mm);
-       if (binder_alloc_get_vma(alloc) == NULL) {
-               mmap_read_unlock(alloc->mm);
-               goto uninitialized;
-       }
-
-       mmap_read_unlock(alloc->mm);
-       for (i = 0; i < alloc->buffer_size / PAGE_SIZE; i++) {
-               page = &alloc->pages[i];
-               if (!page->page_ptr)
-                       free++;
-               else if (list_empty(&page->lru))
-                       active++;
-               else
-                       lru++;
+       if (binder_alloc_get_vma(alloc) != NULL) {
+               for (i = 0; i < alloc->buffer_size / PAGE_SIZE; i++) {
+                       page = &alloc->pages[i];
+                       if (!page->page_ptr)
+                               free++;
+                       else if (list_empty(&page->lru))
+                               active++;
+                       else
+                               lru++;
+               }
        }
-
-uninitialized:
        mutex_unlock(&alloc->mutex);
        seq_printf(m, "  pages: %d:%d:%d\n", active, lru, free);
        seq_printf(m, "  pages high watermark: %zu\n", alloc->pages_high);
@@ -969,7 +963,7 @@ int binder_alloc_get_allocated_count(struct binder_alloc *alloc)
  */
 void binder_alloc_vma_close(struct binder_alloc *alloc)
 {
-       alloc->vma_addr = 0;
+       binder_alloc_set_vma(alloc, NULL);
 }
 
 /**
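
binder_alloc_set_vma() and binder_alloc_get_vma() pair smp_store_release() with smp_load_acquire(), so a reader that observes a non-NULL vma is guaranteed to also observe the initialization done before the pointer was published. The same publication pattern can be sketched in userspace with C11 atomics; alloc_state, publish() and lookup() below are invented names for the analogy, not driver code:

    #include <stdatomic.h>
    #include <stdio.h>

    /* Stand-in for the state that must be visible once the pointer is seen. */
    struct alloc_state { int buffer_size; };

    static _Atomic(struct alloc_state *) published;

    /* Publisher: fully initialize, then release-store the pointer
     * (analogous to binder_alloc_set_vma()). */
    static void publish(struct alloc_state *s)
    {
            s->buffer_size = 4096;
            atomic_store_explicit(&published, s, memory_order_release);
    }

    /* Consumer: acquire-load the pointer (analogous to binder_alloc_get_vma()).
     * A non-NULL result makes the earlier writes visible to this thread. */
    static struct alloc_state *lookup(void)
    {
            return atomic_load_explicit(&published, memory_order_acquire);
    }

    int main(void)
    {
            static struct alloc_state s;

            publish(&s);
            struct alloc_state *p = lookup();
            if (p)
                    printf("buffer_size = %d\n", p->buffer_size);
            return 0;
    }
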
index 0f811ac..138d1d5 100644 (file)
@@ -75,7 +75,7 @@ struct binder_lru_page {
 /**
  * struct binder_alloc - per-binder proc state for binder allocator
  * @mutex:              protects binder_alloc fields
- * @vma_addr:           vm_area_struct->vm_start passed to mmap_handler
+ * @vma:                vm_area_struct passed to mmap_handler
  *                      (invariant after mmap)
  * @mm:                 copy of task->mm (invariant after open)
  * @buffer:             base of per-proc address space mapped via mmap
@@ -99,7 +99,7 @@ struct binder_lru_page {
  */
 struct binder_alloc {
        struct mutex mutex;
-       unsigned long vma_addr;
+       struct vm_area_struct *vma;
        struct mm_struct *mm;
        void __user *buffer;
        struct list_head buffers;
index 43a8810..c2b323b 100644 (file)
@@ -287,7 +287,7 @@ void binder_selftest_alloc(struct binder_alloc *alloc)
        if (!binder_selftest_run)
                return;
        mutex_lock(&binder_selftest_lock);
-       if (!binder_selftest_run || !alloc->vma_addr)
+       if (!binder_selftest_run || !alloc->vma)
                goto done;
        pr_info("STARTED\n");
        binder_selftest_alloc_offset(alloc, end_offset, 0);
index 8bf612b..b4f246f 100644 (file)
@@ -5348,7 +5348,7 @@ struct ata_port *ata_port_alloc(struct ata_host *host)
 
        mutex_init(&ap->scsi_scan_mutex);
        INIT_DELAYED_WORK(&ap->hotplug_task, ata_scsi_hotplug);
-       INIT_WORK(&ap->scsi_rescan_task, ata_scsi_dev_rescan);
+       INIT_DELAYED_WORK(&ap->scsi_rescan_task, ata_scsi_dev_rescan);
        INIT_LIST_HEAD(&ap->eh_done_q);
        init_waitqueue_head(&ap->eh_wait_q);
        init_completion(&ap->park_req_pending);
@@ -5954,6 +5954,7 @@ static void ata_port_detach(struct ata_port *ap)
        WARN_ON(!(ap->pflags & ATA_PFLAG_UNLOADED));
 
        cancel_delayed_work_sync(&ap->hotplug_task);
+       cancel_delayed_work_sync(&ap->scsi_rescan_task);
 
  skip_eh:
        /* clean up zpodd on port removal */
index a6c9018..6f8d141 100644 (file)
@@ -2984,7 +2984,7 @@ static int ata_eh_revalidate_and_attach(struct ata_link *link,
                        ehc->i.flags |= ATA_EHI_SETMODE;
 
                        /* schedule the scsi_rescan_device() here */
-                       schedule_work(&(ap->scsi_rescan_task));
+                       schedule_delayed_work(&ap->scsi_rescan_task, 0);
                } else if (dev->class == ATA_DEV_UNKNOWN &&
                           ehc->tries[dev->devno] &&
                           ata_class_enabled(ehc->classes[dev->devno])) {
index 7bb12de..551077c 100644 (file)
@@ -2694,18 +2694,36 @@ static unsigned int atapi_xlat(struct ata_queued_cmd *qc)
        return 0;
 }
 
-static struct ata_device *ata_find_dev(struct ata_port *ap, int devno)
+static struct ata_device *ata_find_dev(struct ata_port *ap, unsigned int devno)
 {
-       if (!sata_pmp_attached(ap)) {
-               if (likely(devno >= 0 &&
-                          devno < ata_link_max_devices(&ap->link)))
+       /*
+        * For the non-PMP case, ata_link_max_devices() returns 1 (SATA case),
+        * or 2 (IDE master + slave case). However, the former case includes
+        * libsas hosted devices which are numbered per scsi host, leading
+        * to devno potentially being larger than 0 but with each struct
+        * ata_device having its own struct ata_port and struct ata_link.
+        * To accommodate these, ignore devno and always use device number 0.
+        */
+       if (likely(!sata_pmp_attached(ap))) {
+               int link_max_devices = ata_link_max_devices(&ap->link);
+
+               if (link_max_devices == 1)
+                       return &ap->link.device[0];
+
+               if (devno < link_max_devices)
                        return &ap->link.device[devno];
-       } else {
-               if (likely(devno >= 0 &&
-                          devno < ap->nr_pmp_links))
-                       return &ap->pmp_link[devno].device[0];
+
+               return NULL;
        }
 
+       /*
+        * For PMP-attached devices, the device number corresponds to C
+        * (channel) of SCSI [H:C:I:L], indicating the port pmp link
+        * for the device.
+        */
+       if (devno < ap->nr_pmp_links)
+               return &ap->pmp_link[devno].device[0];
+
        return NULL;
 }
 
@@ -4579,10 +4597,11 @@ int ata_scsi_user_scan(struct Scsi_Host *shost, unsigned int channel,
 void ata_scsi_dev_rescan(struct work_struct *work)
 {
        struct ata_port *ap =
-               container_of(work, struct ata_port, scsi_rescan_task);
+               container_of(work, struct ata_port, scsi_rescan_task.work);
        struct ata_link *link;
        struct ata_device *dev;
        unsigned long flags;
+       bool delay_rescan = false;
 
        mutex_lock(&ap->scsi_scan_mutex);
        spin_lock_irqsave(ap->lock, flags);
@@ -4596,6 +4615,21 @@ void ata_scsi_dev_rescan(struct work_struct *work)
                        if (scsi_device_get(sdev))
                                continue;
 
+                       /*
+                        * If the rescan work was scheduled because of a resume
+                        * event, the port is already fully resumed, but the
+                        * SCSI device may not yet be fully resumed. In such a
+                        * case, executing scsi_rescan_device() may cause a
+                        * deadlock with the PM code on device_lock(). Prevent
+                        * this by giving up and retrying rescan after a short
+                        * delay.
+                        */
+                       delay_rescan = sdev->sdev_gendev.power.is_suspended;
+                       if (delay_rescan) {
+                               scsi_device_put(sdev);
+                               break;
+                       }
+
                        spin_unlock_irqrestore(ap->lock, flags);
                        scsi_rescan_device(&(sdev->sdev_gendev));
                        scsi_device_put(sdev);
@@ -4605,4 +4639,8 @@ void ata_scsi_dev_rescan(struct work_struct *work)
 
        spin_unlock_irqrestore(ap->lock, flags);
        mutex_unlock(&ap->scsi_scan_mutex);
+
+       if (delay_rescan)
+               schedule_delayed_work(&ap->scsi_rescan_task,
+                                     msecs_to_jiffies(5));
 }
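
The reworked ata_find_dev() distinguishes three cases: single-device links (SATA, and libsas hosts where the SCSI channel number is ignored), IDE master/slave links, and PMP-attached ports where the channel selects the link. A userspace model of those lookup rules; the function and parameter names are invented for illustration:

    #include <stdio.h>

    /* max_devices stands in for ata_link_max_devices(), pmp_links for
     * ap->nr_pmp_links; returns 0 and sets *out on success, -1 otherwise. */
    static int resolve_devno(int pmp_attached, unsigned int max_devices,
                             unsigned int pmp_links, unsigned int devno,
                             unsigned int *out)
    {
            if (!pmp_attached) {
                    if (max_devices == 1) {
                            /* SATA/libsas: one device per link, channel ignored. */
                            *out = 0;
                            return 0;
                    }
                    if (devno < max_devices) {
                            *out = devno;          /* IDE master/slave */
                            return 0;
                    }
                    return -1;
            }
            if (devno < pmp_links) {
                    *out = devno;                  /* PMP: channel picks the link */
                    return 0;
            }
            return -1;
    }

    int main(void)
    {
            unsigned int dev;

            /* libsas-style host: devno 3 still resolves to device 0. */
            if (!resolve_devno(0, 1, 0, 3, &dev))
                    printf("non-PMP single-device link, devno 3 -> device %u\n", dev);
            return 0;
    }
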
index 0242599..d44814b 100644 (file)
@@ -820,7 +820,7 @@ static const struct of_device_id ht16k33_of_match[] = {
 MODULE_DEVICE_TABLE(of, ht16k33_of_match);
 
 static struct i2c_driver ht16k33_driver = {
-       .probe_new      = ht16k33_probe,
+       .probe          = ht16k33_probe,
        .remove         = ht16k33_remove,
        .driver         = {
                .name           = DRIVER_NAME,
index 135831a..6422be0 100644 (file)
@@ -365,7 +365,7 @@ static struct i2c_driver lcd2s_i2c_driver = {
                .name = "lcd2s",
                .of_match_table = lcd2s_of_table,
        },
-       .probe_new = lcd2s_i2c_probe,
+       .probe = lcd2s_i2c_probe,
        .remove = lcd2s_i2c_remove,
        .id_table = lcd2s_i2c_id,
 };
index bba3482..cbae8be 100644 (file)
@@ -388,6 +388,16 @@ static int cache_shared_cpu_map_setup(unsigned int cpu)
                                continue;/* skip if itself or no cacheinfo */
                        for (sib_index = 0; sib_index < cache_leaves(i); sib_index++) {
                                sib_leaf = per_cpu_cacheinfo_idx(i, sib_index);
+
+                               /*
+                                * Comparing cache IDs only makes sense if the leaves
+                                * belong to the same cache level and are of the same type. Skip
+                                * the check if level and type do not match.
+                                */
+                               if (sib_leaf->level != this_leaf->level ||
+                                   sib_leaf->type != this_leaf->type)
+                                       continue;
+
                                if (cache_leaves_are_shared(this_leaf, sib_leaf)) {
                                        cpumask_set_cpu(cpu, &sib_leaf->shared_cpu_map);
                                        cpumask_set_cpu(i, &this_leaf->shared_cpu_map);
@@ -400,11 +410,14 @@ static int cache_shared_cpu_map_setup(unsigned int cpu)
                        coherency_max_size = this_leaf->coherency_line_size;
        }
 
+       /* shared_cpu_map is now populated for the cpu */
+       this_cpu_ci->cpu_map_populated = true;
        return 0;
 }
 
 static void cache_shared_cpu_map_remove(unsigned int cpu)
 {
+       struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
        struct cacheinfo *this_leaf, *sib_leaf;
        unsigned int sibling, index, sib_index;
 
@@ -419,6 +432,16 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)
 
                        for (sib_index = 0; sib_index < cache_leaves(sibling); sib_index++) {
                                sib_leaf = per_cpu_cacheinfo_idx(sibling, sib_index);
+
+                               /*
+                                * Comparing cache IDs only makes sense if the leaves
+                                * belong to the same cache level and are of the same type. Skip
+                                * the check if level and type do not match.
+                                */
+                               if (sib_leaf->level != this_leaf->level ||
+                                   sib_leaf->type != this_leaf->type)
+                                       continue;
+
                                if (cache_leaves_are_shared(this_leaf, sib_leaf)) {
                                        cpumask_clear_cpu(cpu, &sib_leaf->shared_cpu_map);
                                        cpumask_clear_cpu(sibling, &this_leaf->shared_cpu_map);
@@ -427,6 +450,9 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)
                        }
                }
        }
+
+       /* cpu is no longer populated in the shared map */
+       this_cpu_ci->cpu_map_populated = false;
 }
 
 static void free_cache_attributes(unsigned int cpu)
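
Both the setup and removal paths now skip the shared-CPU comparison unless the two leaves describe the same cache level and type, since comparing IDs across different levels or types is meaningless. A minimal stand-alone sketch of that guard, using a cut-down structure rather than struct cacheinfo:

    #include <stdbool.h>
    #include <stdio.h>

    /* Only the fields the guard needs; illustrative, not struct cacheinfo. */
    struct leaf { int level; int type; int id; };

    static bool leaves_may_share(const struct leaf *a, const struct leaf *b)
    {
            /* Mirror of the added check: bail out on level/type mismatch. */
            if (a->level != b->level || a->type != b->type)
                    return false;
            return a->id == b->id;
    }

    int main(void)
    {
            struct leaf l1d = { .level = 1, .type = 1, .id = 7 };
            struct leaf l2u = { .level = 2, .type = 3, .id = 7 };

            /* Same id but different level/type: must not be treated as shared. */
            printf("shared? %d\n", leaves_may_share(&l1d, &l2u));
            return 0;
    }
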
index ac1808d..05d9df9 100644 (file)
@@ -320,6 +320,7 @@ void class_dev_iter_init(struct class_dev_iter *iter, const struct class *class,
                start_knode = &start->p->knode_class;
        klist_iter_init_node(&sp->klist_devices, &iter->ki, start_knode);
        iter->type = type;
+       iter->sp = sp;
 }
 EXPORT_SYMBOL_GPL(class_dev_iter_init);
 
@@ -361,6 +362,7 @@ EXPORT_SYMBOL_GPL(class_dev_iter_next);
 void class_dev_iter_exit(struct class_dev_iter *iter)
 {
        klist_iter_exit(&iter->ki);
+       subsys_put(iter->sp);
 }
 EXPORT_SYMBOL_GPL(class_dev_iter_exit);
 
index 9c09ca5..878aa76 100644 (file)
@@ -751,14 +751,12 @@ static int really_probe_debug(struct device *dev, struct device_driver *drv)
  *
  * Should somehow figure out how to use a semaphore, not an atomic variable...
  */
-int driver_probe_done(void)
+bool __init driver_probe_done(void)
 {
        int local_probe_count = atomic_read(&probe_count);
 
        pr_debug("%s: probe_count = %d\n", __func__, local_probe_count);
-       if (local_probe_count)
-               return -EBUSY;
-       return 0;
+       return !local_probe_count;
 }
 
 /**
index 9d79d5a..b58c42f 100644 (file)
@@ -812,7 +812,7 @@ static void fw_log_firmware_info(const struct firmware *fw, const char *name, st
        char *outbuf;
 
        alg = crypto_alloc_shash("sha256", 0, 0);
-       if (!alg)
+       if (IS_ERR(alg))
                return;
 
        sha256buf = kmalloc(SHA256_DIGEST_SIZE, GFP_KERNEL);
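
crypto_alloc_shash() reports failure through an ERR_PTR()-encoded pointer rather than NULL, which is why the check changes from !alg to IS_ERR(alg). The encoding can be reproduced in userspace; the helpers below are local re-creations for illustration, not the kernel's definitions:

    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Errors are encoded as pointers into the topmost 4095 bytes of the
     * address space, so they never collide with valid allocations. */
    #define MAX_ERRNO 4095
    static inline void *ERR_PTR(long err) { return (void *)err; }
    static inline long PTR_ERR(const void *p) { return (long)p; }
    static inline int IS_ERR(const void *p)
    {
            return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
    }

    /* Hypothetical allocator that fails the way crypto_alloc_shash() does:
     * it returns an ERR_PTR(), never NULL. */
    static void *alloc_shash_like(void) { return ERR_PTR(-ENOENT); }

    int main(void)
    {
            void *alg = alloc_shash_like();

            if (!alg)               /* never true: ERR_PTR() values are not NULL */
                    printf("NULL check would catch it\n");
            if (IS_ERR(alg))        /* the check the fix switches to */
                    printf("IS_ERR() catches it: error %ld\n", PTR_ERR(alg));
            return 0;
    }
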
index b46db17..6559759 100644 (file)
@@ -449,6 +449,9 @@ static ssize_t node_read_meminfo(struct device *dev,
                             "Node %d FileHugePages: %8lu kB\n"
                             "Node %d FilePmdMapped: %8lu kB\n"
 #endif
+#ifdef CONFIG_UNACCEPTED_MEMORY
+                            "Node %d Unaccepted:     %8lu kB\n"
+#endif
                             ,
                             nid, K(node_page_state(pgdat, NR_FILE_DIRTY)),
                             nid, K(node_page_state(pgdat, NR_WRITEBACK)),
@@ -478,6 +481,10 @@ static ssize_t node_read_meminfo(struct device *dev,
                             nid, K(node_page_state(pgdat, NR_FILE_THPS)),
                             nid, K(node_page_state(pgdat, NR_FILE_PMDMAPPED))
 #endif
+#ifdef CONFIG_UNACCEPTED_MEMORY
+                            ,
+                            nid, K(sum_zone_node_page_state(nid, NR_UNACCEPTED))
+#endif
                            );
        len += hugetlb_report_node_meminfo(buf, len, nid);
        return len;
index 32084e3..5cb2023 100644 (file)
@@ -1632,9 +1632,6 @@ static int genpd_add_device(struct generic_pm_domain *genpd, struct device *dev,
 
        dev_dbg(dev, "%s()\n", __func__);
 
-       if (IS_ERR_OR_NULL(genpd) || IS_ERR_OR_NULL(dev))
-               return -EINVAL;
-
        gpd_data = genpd_alloc_dev_data(dev, gd);
        if (IS_ERR(gpd_data))
                return PTR_ERR(gpd_data);
@@ -1676,6 +1673,9 @@ int pm_genpd_add_device(struct generic_pm_domain *genpd, struct device *dev)
 {
        int ret;
 
+       if (!genpd || !dev)
+               return -EINVAL;
+
        mutex_lock(&gpd_list_lock);
        ret = genpd_add_device(genpd, dev, dev);
        mutex_unlock(&gpd_list_lock);
@@ -2523,6 +2523,9 @@ int of_genpd_add_device(struct of_phandle_args *genpdspec, struct device *dev)
        struct generic_pm_domain *genpd;
        int ret;
 
+       if (!dev)
+               return -EINVAL;
+
        mutex_lock(&gpd_list_lock);
 
        genpd = genpd_get_from_provider(genpdspec);
@@ -2939,10 +2942,10 @@ static int genpd_parse_state(struct genpd_power_state *genpd_state,
 
        err = of_property_read_u32(state_node, "min-residency-us", &residency);
        if (!err)
-               genpd_state->residency_ns = 1000 * residency;
+               genpd_state->residency_ns = 1000LL * residency;
 
-       genpd_state->power_on_latency_ns = 1000 * exit_latency;
-       genpd_state->power_off_latency_ns = 1000 * entry_latency;
+       genpd_state->power_on_latency_ns = 1000LL * exit_latency;
+       genpd_state->power_off_latency_ns = 1000LL * entry_latency;
        genpd_state->fwnode = &state_node->fwnode;
 
        return 0;
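
Writing the multipliers as 1000LL makes the microsecond-to-nanosecond conversion happen in 64-bit arithmetic before the result is stored in the 64-bit residency/latency fields; with plain int arithmetic the product can wrap in 32 bits first. A small demonstration with a deliberately large, hypothetical latency value:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            /* Hypothetical u32 latency from the device tree, in microseconds. */
            uint32_t exit_latency_us = 5000000;

            /* The 32-bit multiply wraps before the widening assignment... */
            int64_t wrong = 1000 * exit_latency_us;
            /* ...while 1000LL forces the multiply to happen in 64 bits. */
            int64_t right = 1000LL * exit_latency_us;

            printf("wrong = %lld ns, right = %lld ns\n",
                   (long long)wrong, (long long)right);
            return 0;
    }
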
index 7cc0c0c..a917219 100644 (file)
 
 #include "power.h"
 
-#ifndef CONFIG_SUSPEND
-suspend_state_t pm_suspend_target_state;
-#define pm_suspend_target_state        (PM_SUSPEND_ON)
-#endif
-
 #define list_for_each_entry_rcu_locked(pos, head, member) \
        list_for_each_entry_rcu(pos, head, member, \
                srcu_read_lock_held(&wakeup_srcu))
index 33a8366..0db2021 100644 (file)
@@ -4,16 +4,23 @@
 # subsystems should select the appropriate symbols.
 
 config REGMAP
+       bool "Register Map support" if KUNIT_ALL_TESTS
        default y if (REGMAP_I2C || REGMAP_SPI || REGMAP_SPMI || REGMAP_W1 || REGMAP_AC97 || REGMAP_MMIO || REGMAP_IRQ || REGMAP_SOUNDWIRE || REGMAP_SOUNDWIRE_MBQ || REGMAP_SCCB || REGMAP_I3C || REGMAP_SPI_AVMM || REGMAP_MDIO || REGMAP_FSI)
        select IRQ_DOMAIN if REGMAP_IRQ
        select MDIO_BUS if REGMAP_MDIO
-       bool
+       help
+         Enable support for the Register Map (regmap) access API.
+
+         Usually, this option is automatically selected when needed.
+         However, you may want to enable it manually for running the regmap
+         KUnit tests.
+
+         If unsure, say N.
 
 config REGMAP_KUNIT
        tristate "KUnit tests for regmap"
-       depends on KUNIT
+       depends on KUNIT && REGMAP
        default KUNIT_ALL_TESTS
-       select REGMAP
        select REGMAP_RAM
 
 config REGMAP_AC97
index 9b1b559..c2e3a0f 100644 (file)
@@ -203,15 +203,18 @@ static int regcache_maple_sync(struct regmap *map, unsigned int min,
 
        mas_for_each(&mas, entry, max) {
                for (r = max(mas.index, lmin); r <= min(mas.last, lmax); r++) {
+                       mas_pause(&mas);
+                       rcu_read_unlock();
                        ret = regcache_sync_val(map, r, entry[r - mas.index]);
                        if (ret != 0)
                                goto out;
+                       rcu_read_lock();
                }
        }
 
-out:
        rcu_read_unlock();
 
+out:
        map->cache_bypass = false;
 
        return ret;
index 0295646..97c681f 100644 (file)
@@ -284,6 +284,9 @@ static bool regcache_reg_needs_sync(struct regmap *map, unsigned int reg,
 {
        int ret;
 
+       if (!regmap_writeable(map, reg))
+               return false;
+
        /* If we don't know the chip just got reset, then sync everything. */
        if (!map->no_sync_defaults)
                return true;
index 09899ae..159c0b7 100644 (file)
@@ -59,6 +59,10 @@ static int regmap_sdw_config_check(const struct regmap_config *config)
        if (config->pad_bits != 0)
                return -ENOTSUPP;
 
+       /* Only bulk writes are supported, not multi-register writes */
+       if (config->can_multi_write)
+               return -ENOTSUPP;
+
        return 0;
 }
 
index 4c2b94b..6af6928 100644 (file)
@@ -660,7 +660,7 @@ static const struct regmap_bus regmap_spi_avmm_bus = {
        .reg_format_endian_default = REGMAP_ENDIAN_NATIVE,
        .val_format_endian_default = REGMAP_ENDIAN_NATIVE,
        .max_raw_read = SPI_AVMM_VAL_SIZE * MAX_READ_CNT,
-       .max_raw_write = SPI_AVMM_VAL_SIZE * MAX_WRITE_CNT,
+       .max_raw_write = SPI_AVMM_REG_SIZE + SPI_AVMM_VAL_SIZE * MAX_WRITE_CNT,
        .free_context = spi_avmm_bridge_ctx_free,
 };
 
index db7851f..fa2d3fb 100644 (file)
@@ -2082,6 +2082,8 @@ int _regmap_raw_write(struct regmap *map, unsigned int reg,
        size_t val_count = val_len / val_bytes;
        size_t chunk_count, chunk_bytes;
        size_t chunk_regs = val_count;
+       size_t max_data = map->max_raw_write - map->format.reg_bytes -
+                       map->format.pad_bytes;
        int ret, i;
 
        if (!val_count)
@@ -2089,8 +2091,8 @@ int _regmap_raw_write(struct regmap *map, unsigned int reg,
 
        if (map->use_single_write)
                chunk_regs = 1;
-       else if (map->max_raw_write && val_len > map->max_raw_write)
-               chunk_regs = map->max_raw_write / val_bytes;
+       else if (map->max_raw_write && val_len > max_data)
+               chunk_regs = max_data / val_bytes;
 
        chunk_count = val_count / chunk_regs;
        chunk_bytes = chunk_regs * val_bytes;
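
The fix subtracts the per-message register-address and padding bytes from max_raw_write before dividing by the value size, so a chunk plus its prepended register address can no longer exceed the bus limit. A worked example with hypothetical bus parameters:

    #include <stdio.h>
    #include <stddef.h>

    int main(void)
    {
            /* Illustrative limits, standing in for the regmap/bus fields. */
            size_t max_raw_write = 256;  /* bytes the bus accepts per message */
            size_t reg_bytes = 2;        /* register address sent each time   */
            size_t pad_bytes = 0;
            size_t val_bytes = 4;        /* size of one register value        */

            size_t max_data = max_raw_write - reg_bytes - pad_bytes;

            /* Old divisor: 64 registers -> 2 + 256 = 258 bytes, over the limit.
             * New divisor: 63 registers -> 2 + 252 = 254 bytes, fits.          */
            size_t chunk_regs_old = max_raw_write / val_bytes;
            size_t chunk_regs_new = max_data / val_bytes;

            printf("old chunk %zu regs (%zu bytes), new chunk %zu regs (%zu bytes)\n",
                   chunk_regs_old, reg_bytes + chunk_regs_old * val_bytes,
                   chunk_regs_new, reg_bytes + chunk_regs_new * val_bytes);
            return 0;
    }
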
index 4c8b2ba..e460c97 100644 (file)
@@ -1532,7 +1532,7 @@ static int fd_getgeo(struct block_device *bdev, struct hd_geometry *geo)
        return 0;
 }
 
-static int fd_locked_ioctl(struct block_device *bdev, fmode_t mode,
+static int fd_locked_ioctl(struct block_device *bdev, blk_mode_t mode,
                    unsigned int cmd, unsigned long param)
 {
        struct amiga_floppy_struct *p = bdev->bd_disk->private_data;
@@ -1607,7 +1607,7 @@ static int fd_locked_ioctl(struct block_device *bdev, fmode_t mode,
        return 0;
 }
 
-static int fd_ioctl(struct block_device *bdev, fmode_t mode,
+static int fd_ioctl(struct block_device *bdev, blk_mode_t mode,
                             unsigned int cmd, unsigned long param)
 {
        int ret;
@@ -1654,10 +1654,10 @@ static void fd_probe(int dev)
  * /dev/PS0 etc), and disallows simultaneous access to the same
  * drive with different device numbers.
  */
-static int floppy_open(struct block_device *bdev, fmode_t mode)
+static int floppy_open(struct gendisk *disk, blk_mode_t mode)
 {
-       int drive = MINOR(bdev->bd_dev) & 3;
-       int system =  (MINOR(bdev->bd_dev) & 4) >> 2;
+       int drive = disk->first_minor & 3;
+       int system = (disk->first_minor & 4) >> 2;
        int old_dev;
        unsigned long flags;
 
@@ -1673,10 +1673,9 @@ static int floppy_open(struct block_device *bdev, fmode_t mode)
                mutex_unlock(&amiflop_mutex);
                return -ENXIO;
        }
-
-       if (mode & (FMODE_READ|FMODE_WRITE)) {
-               bdev_check_media_change(bdev);
-               if (mode & FMODE_WRITE) {
+       if (mode & (BLK_OPEN_READ | BLK_OPEN_WRITE)) {
+               disk_check_media_change(disk);
+               if (mode & BLK_OPEN_WRITE) {
                        int wrprot;
 
                        get_fdc(drive);
@@ -1691,7 +1690,6 @@ static int floppy_open(struct block_device *bdev, fmode_t mode)
                        }
                }
        }
-
        local_irq_save(flags);
        fd_ref[drive]++;
        fd_device[drive] = system;
@@ -1709,7 +1707,7 @@ static int floppy_open(struct block_device *bdev, fmode_t mode)
        return 0;
 }
 
-static void floppy_release(struct gendisk *disk, fmode_t mode)
+static void floppy_release(struct gendisk *disk)
 {
        struct amiga_floppy_struct *p = disk->private_data;
        int drive = p - unit;
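
With the switch to a gendisk-based open path, floppy_open() reads the drive index from bits 0-1 and the system selector from bit 2 of disk->first_minor, matching what it previously extracted from MINOR(bdev->bd_dev). A trivial sketch of that decoding:

    #include <stdio.h>

    int main(void)
    {
            /* Decode every minor the driver could see, as the new code does:
             * drive = minor & 3, system = (minor & 4) >> 2. */
            for (int minor = 0; minor < 8; minor++) {
                    int drive = minor & 3;
                    int system = (minor & 4) >> 2;

                    printf("minor %d -> drive %d, system %d\n",
                           minor, drive, system);
            }
            return 0;
    }
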
index 128722c..cf68837 100644 (file)
@@ -204,9 +204,9 @@ aoedisk_rm_debugfs(struct aoedev *d)
 }
 
 static int
-aoeblk_open(struct block_device *bdev, fmode_t mode)
+aoeblk_open(struct gendisk *disk, blk_mode_t mode)
 {
-       struct aoedev *d = bdev->bd_disk->private_data;
+       struct aoedev *d = disk->private_data;
        ulong flags;
 
        if (!virt_addr_valid(d)) {
@@ -232,7 +232,7 @@ aoeblk_open(struct block_device *bdev, fmode_t mode)
 }
 
 static void
-aoeblk_release(struct gendisk *disk, fmode_t mode)
+aoeblk_release(struct gendisk *disk)
 {
        struct aoedev *d = disk->private_data;
        ulong flags;
@@ -285,7 +285,7 @@ aoeblk_getgeo(struct block_device *bdev, struct hd_geometry *geo)
 }
 
 static int
-aoeblk_ioctl(struct block_device *bdev, fmode_t mode, uint cmd, ulong arg)
+aoeblk_ioctl(struct block_device *bdev, blk_mode_t mode, uint cmd, ulong arg)
 {
        struct aoedev *d;
 
index 4c666f7..a42c4bc 100644 (file)
@@ -49,7 +49,7 @@ static int emsgs_head_idx, emsgs_tail_idx;
 static struct completion emsgs_comp;
 static spinlock_t emsgs_lock;
 static int nblocked_emsgs_readers;
-static struct class *aoe_class;
+
 static struct aoe_chardev chardevs[] = {
        { MINOR_ERR, "err" },
        { MINOR_DISCOVER, "discover" },
@@ -58,6 +58,16 @@ static struct aoe_chardev chardevs[] = {
        { MINOR_FLUSH, "flush" },
 };
 
+static char *aoe_devnode(const struct device *dev, umode_t *mode)
+{
+       return kasprintf(GFP_KERNEL, "etherd/%s", dev_name(dev));
+}
+
+static const struct class aoe_class = {
+       .name = "aoe",
+       .devnode = aoe_devnode,
+};
+
 static int
 discover(void)
 {
@@ -273,11 +283,6 @@ static const struct file_operations aoe_fops = {
        .llseek = noop_llseek,
 };
 
-static char *aoe_devnode(const struct device *dev, umode_t *mode)
-{
-       return kasprintf(GFP_KERNEL, "etherd/%s", dev_name(dev));
-}
-
 int __init
 aoechr_init(void)
 {
@@ -290,15 +295,14 @@ aoechr_init(void)
        }
        init_completion(&emsgs_comp);
        spin_lock_init(&emsgs_lock);
-       aoe_class = class_create("aoe");
-       if (IS_ERR(aoe_class)) {
+       n = class_register(&aoe_class);
+       if (n) {
                unregister_chrdev(AOE_MAJOR, "aoechr");
-               return PTR_ERR(aoe_class);
+               return n;
        }
-       aoe_class->devnode = aoe_devnode;
 
        for (i = 0; i < ARRAY_SIZE(chardevs); ++i)
-               device_create(aoe_class, NULL,
+               device_create(&aoe_class, NULL,
                              MKDEV(AOE_MAJOR, chardevs[i].minor), NULL,
                              chardevs[i].name);
 
@@ -311,8 +315,8 @@ aoechr_exit(void)
        int i;
 
        for (i = 0; i < ARRAY_SIZE(chardevs); ++i)
-               device_destroy(aoe_class, MKDEV(AOE_MAJOR, chardevs[i].minor));
-       class_destroy(aoe_class);
+               device_destroy(&aoe_class, MKDEV(AOE_MAJOR, chardevs[i].minor));
+       class_unregister(&aoe_class);
        unregister_chrdev(AOE_MAJOR, "aoechr");
 }
 
index 9deb4df..cd738ca 100644 (file)
@@ -442,13 +442,13 @@ static void fd_times_out(struct timer_list *unused);
 static void finish_fdc( void );
 static void finish_fdc_done( int dummy );
 static void setup_req_params( int drive );
-static int fd_locked_ioctl(struct block_device *bdev, fmode_t mode, unsigned int
-                     cmd, unsigned long param);
+static int fd_locked_ioctl(struct block_device *bdev, blk_mode_t mode,
+               unsigned int cmd, unsigned long param);
 static void fd_probe( int drive );
 static int fd_test_drive_present( int drive );
 static void config_types( void );
-static int floppy_open(struct block_device *bdev, fmode_t mode);
-static void floppy_release(struct gendisk *disk, fmode_t mode);
+static int floppy_open(struct gendisk *disk, blk_mode_t mode);
+static void floppy_release(struct gendisk *disk);
 
 /************************* End of Prototypes **************************/
 
@@ -1581,7 +1581,7 @@ out:
        return BLK_STS_OK;
 }
 
-static int fd_locked_ioctl(struct block_device *bdev, fmode_t mode,
+static int fd_locked_ioctl(struct block_device *bdev, blk_mode_t mode,
                    unsigned int cmd, unsigned long param)
 {
        struct gendisk *disk = bdev->bd_disk;
@@ -1760,15 +1760,15 @@ static int fd_locked_ioctl(struct block_device *bdev, fmode_t mode,
                /* invalidate the buffer track to force a reread */
                BufferDrive = -1;
                set_bit(drive, &fake_change);
-               if (bdev_check_media_change(bdev))
-                       floppy_revalidate(bdev->bd_disk);
+               if (disk_check_media_change(disk))
+                       floppy_revalidate(disk);
                return 0;
        default:
                return -EINVAL;
        }
 }
 
-static int fd_ioctl(struct block_device *bdev, fmode_t mode,
+static int fd_ioctl(struct block_device *bdev, blk_mode_t mode,
                             unsigned int cmd, unsigned long arg)
 {
        int ret;
@@ -1915,32 +1915,31 @@ static void __init config_types( void )
  * drive with different device numbers.
  */
 
-static int floppy_open(struct block_device *bdev, fmode_t mode)
+static int floppy_open(struct gendisk *disk, blk_mode_t mode)
 {
-       struct atari_floppy_struct *p = bdev->bd_disk->private_data;
-       int type  = MINOR(bdev->bd_dev) >> 2;
+       struct atari_floppy_struct *p = disk->private_data;
+       int type = disk->first_minor >> 2;
 
        DPRINT(("fd_open: type=%d\n",type));
        if (p->ref && p->type != type)
                return -EBUSY;
 
-       if (p->ref == -1 || (p->ref && mode & FMODE_EXCL))
+       if (p->ref == -1 || (p->ref && mode & BLK_OPEN_EXCL))
                return -EBUSY;
-
-       if (mode & FMODE_EXCL)
+       if (mode & BLK_OPEN_EXCL)
                p->ref = -1;
        else
                p->ref++;
 
        p->type = type;
 
-       if (mode & FMODE_NDELAY)
+       if (mode & BLK_OPEN_NDELAY)
                return 0;
 
-       if (mode & (FMODE_READ|FMODE_WRITE)) {
-               if (bdev_check_media_change(bdev))
-                       floppy_revalidate(bdev->bd_disk);
-               if (mode & FMODE_WRITE) {
+       if (mode & (BLK_OPEN_READ | BLK_OPEN_WRITE)) {
+               if (disk_check_media_change(disk))
+                       floppy_revalidate(disk);
+               if (mode & BLK_OPEN_WRITE) {
                        if (p->wpstat) {
                                if (p->ref < 0)
                                        p->ref = 0;
@@ -1953,18 +1952,18 @@ static int floppy_open(struct block_device *bdev, fmode_t mode)
        return 0;
 }
 
-static int floppy_unlocked_open(struct block_device *bdev, fmode_t mode)
+static int floppy_unlocked_open(struct gendisk *disk, blk_mode_t mode)
 {
        int ret;
 
        mutex_lock(&ataflop_mutex);
-       ret = floppy_open(bdev, mode);
+       ret = floppy_open(disk, mode);
        mutex_unlock(&ataflop_mutex);
 
        return ret;
 }
 
-static void floppy_release(struct gendisk *disk, fmode_t mode)
+static void floppy_release(struct gendisk *disk)
 {
        struct atari_floppy_struct *p = disk->private_data;
        mutex_lock(&ataflop_mutex);
index bcad9b9..970bd6f 100644 (file)
@@ -19,7 +19,7 @@
 #include <linux/highmem.h>
 #include <linux/mutex.h>
 #include <linux/pagemap.h>
-#include <linux/radix-tree.h>
+#include <linux/xarray.h>
 #include <linux/fs.h>
 #include <linux/slab.h>
 #include <linux/backing-dev.h>
@@ -28,7 +28,7 @@
 #include <linux/uaccess.h>
 
 /*
- * Each block ramdisk device has a radix_tree brd_pages of pages that stores
+ * Each block ramdisk device has an xarray brd_pages of pages that stores
  * the pages containing the block device's contents. A brd page's ->index is
  * its offset in PAGE_SIZE units. This is similar to, but in no way connected
  * with, the kernel's pagecache or buffer cache (which sit above our block
@@ -40,11 +40,9 @@ struct brd_device {
        struct list_head        brd_list;
 
        /*
-        * Backing store of pages and lock to protect it. This is the contents
-        * of the block device.
+        * Backing store of pages. This is the contents of the block device.
         */
-       spinlock_t              brd_lock;
-       struct radix_tree_root  brd_pages;
+       struct xarray           brd_pages;
        u64                     brd_nr_pages;
 };
 
@@ -56,21 +54,8 @@ static struct page *brd_lookup_page(struct brd_device *brd, sector_t sector)
        pgoff_t idx;
        struct page *page;
 
-       /*
-        * The page lifetime is protected by the fact that we have opened the
-        * device node -- brd pages will never be deleted under us, so we
-        * don't need any further locking or refcounting.
-        *
-        * This is strictly true for the radix-tree nodes as well (ie. we
-        * don't actually need the rcu_read_lock()), however that is not a
-        * documented feature of the radix-tree API so it is better to be
-        * safe here (we don't have total exclusion from radix tree updates
-        * here, only deletes).
-        */
-       rcu_read_lock();
        idx = sector >> PAGE_SECTORS_SHIFT; /* sector to page index */
-       page = radix_tree_lookup(&brd->brd_pages, idx);
-       rcu_read_unlock();
+       page = xa_load(&brd->brd_pages, idx);
 
        BUG_ON(page && page->index != idx);
 
@@ -83,7 +68,7 @@ static struct page *brd_lookup_page(struct brd_device *brd, sector_t sector)
 static int brd_insert_page(struct brd_device *brd, sector_t sector, gfp_t gfp)
 {
        pgoff_t idx;
-       struct page *page;
+       struct page *page, *cur;
        int ret = 0;
 
        page = brd_lookup_page(brd, sector);
@@ -94,71 +79,42 @@ static int brd_insert_page(struct brd_device *brd, sector_t sector, gfp_t gfp)
        if (!page)
                return -ENOMEM;
 
-       if (radix_tree_maybe_preload(gfp)) {
-               __free_page(page);
-               return -ENOMEM;
-       }
+       xa_lock(&brd->brd_pages);
 
-       spin_lock(&brd->brd_lock);
        idx = sector >> PAGE_SECTORS_SHIFT;
        page->index = idx;
-       if (radix_tree_insert(&brd->brd_pages, idx, page)) {
+
+       cur = __xa_cmpxchg(&brd->brd_pages, idx, NULL, page, gfp);
+
+       if (unlikely(cur)) {
                __free_page(page);
-               page = radix_tree_lookup(&brd->brd_pages, idx);
-               if (!page)
-                       ret = -ENOMEM;
-               else if (page->index != idx)
+               ret = xa_err(cur);
+               if (!ret && (cur->index != idx))
                        ret = -EIO;
        } else {
                brd->brd_nr_pages++;
        }
-       spin_unlock(&brd->brd_lock);
 
-       radix_tree_preload_end();
+       xa_unlock(&brd->brd_pages);
+
        return ret;
 }
 
 /*
- * Free all backing store pages and radix tree. This must only be called when
+ * Free all backing store pages and xarray. This must only be called when
  * there are no other users of the device.
  */
-#define FREE_BATCH 16
 static void brd_free_pages(struct brd_device *brd)
 {
-       unsigned long pos = 0;
-       struct page *pages[FREE_BATCH];
-       int nr_pages;
-
-       do {
-               int i;
-
-               nr_pages = radix_tree_gang_lookup(&brd->brd_pages,
-                               (void **)pages, pos, FREE_BATCH);
-
-               for (i = 0; i < nr_pages; i++) {
-                       void *ret;
-
-                       BUG_ON(pages[i]->index < pos);
-                       pos = pages[i]->index;
-                       ret = radix_tree_delete(&brd->brd_pages, pos);
-                       BUG_ON(!ret || ret != pages[i]);
-                       __free_page(pages[i]);
-               }
-
-               pos++;
+       struct page *page;
+       pgoff_t idx;
 
-               /*
-                * It takes 3.4 seconds to remove 80GiB ramdisk.
-                * So, we need cond_resched to avoid stalling the CPU.
-                */
+       xa_for_each(&brd->brd_pages, idx, page) {
+               __free_page(page);
                cond_resched();
+       }
 
-               /*
-                * This assumes radix_tree_gang_lookup always returns as
-                * many pages as possible. If the radix-tree code changes,
-                * so will this have to.
-                */
-       } while (nr_pages == FREE_BATCH);
+       xa_destroy(&brd->brd_pages);
 }
 
 /*
@@ -372,8 +328,7 @@ static int brd_alloc(int i)
        brd->brd_number         = i;
        list_add_tail(&brd->brd_list, &brd_devices);
 
-       spin_lock_init(&brd->brd_lock);
-       INIT_RADIX_TREE(&brd->brd_pages, GFP_ATOMIC);
+       xa_init(&brd->brd_pages);
 
        snprintf(buf, DISK_NAME_LEN, "ram%d", i);
        if (!IS_ERR_OR_NULL(brd_debugfs_dir))
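
brd_insert_page() now relies on __xa_cmpxchg() to install a page only if the slot is still empty, freeing the freshly allocated page when another writer got there first. That store-if-empty idea can be modelled in userspace with a single C11 compare-and-exchange on one slot; this is only an analogy for the insertion-race handling, not the XArray API:

    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* One slot standing in for a single index of the page store. */
    static _Atomic(void *) slot;

    /* Install 'page' only if the slot is still empty; on a lost race the
     * caller keeps whatever is already installed and frees its own copy. */
    static void *install_if_empty(void *page)
    {
            void *expected = NULL;

            if (atomic_compare_exchange_strong(&slot, &expected, page))
                    return page;        /* we won: our page is published */
            return expected;            /* lost: the existing page wins  */
    }

    int main(void)
    {
            void *a = malloc(64), *b = malloc(64);
            void *first = install_if_empty(a);
            void *second = install_if_empty(b);

            if (second != b)
                    free(b);            /* mirror of __free_page() on a lost race */
            printf("winner stays installed: %s\n", first == second ? "yes" : "no");
            free(a);
            return 0;
    }
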
index 6ac8c54..85ca000 100644 (file)
@@ -1043,9 +1043,7 @@ static void bm_page_io_async(struct drbd_bm_aio_ctx *ctx, int page_nr) __must_ho
        bio = bio_alloc_bioset(device->ldev->md_bdev, 1, op, GFP_NOIO,
                        &drbd_md_io_bio_set);
        bio->bi_iter.bi_sector = on_disk_sector;
-       /* bio_add_page of a single page to an empty bio will always succeed,
-        * according to api.  Do we want to assert that? */
-       bio_add_page(bio, page, len, 0);
+       __bio_add_page(bio, page, len, 0);
        bio->bi_private = ctx;
        bio->bi_end_io = drbd_bm_endio;
 
index 83987e7..965f672 100644 (file)
@@ -37,7 +37,6 @@
 #include <linux/notifier.h>
 #include <linux/kthread.h>
 #include <linux/workqueue.h>
-#define __KERNEL_SYSCALLS__
 #include <linux/unistd.h>
 #include <linux/vmalloc.h>
 #include <linux/sched/signal.h>
@@ -50,8 +49,8 @@
 #include "drbd_debugfs.h"
 
 static DEFINE_MUTEX(drbd_main_mutex);
-static int drbd_open(struct block_device *bdev, fmode_t mode);
-static void drbd_release(struct gendisk *gd, fmode_t mode);
+static int drbd_open(struct gendisk *disk, blk_mode_t mode);
+static void drbd_release(struct gendisk *gd);
 static void md_sync_timer_fn(struct timer_list *t);
 static int w_bitmap_io(struct drbd_work *w, int unused);
 
@@ -1883,9 +1882,9 @@ int drbd_send_all(struct drbd_connection *connection, struct socket *sock, void
        return 0;
 }
 
-static int drbd_open(struct block_device *bdev, fmode_t mode)
+static int drbd_open(struct gendisk *disk, blk_mode_t mode)
 {
-       struct drbd_device *device = bdev->bd_disk->private_data;
+       struct drbd_device *device = disk->private_data;
        unsigned long flags;
        int rv = 0;
 
@@ -1895,7 +1894,7 @@ static int drbd_open(struct block_device *bdev, fmode_t mode)
         * and no race with updating open_cnt */
 
        if (device->state.role != R_PRIMARY) {
-               if (mode & FMODE_WRITE)
+               if (mode & BLK_OPEN_WRITE)
                        rv = -EROFS;
                else if (!drbd_allow_oos)
                        rv = -EMEDIUMTYPE;
@@ -1909,9 +1908,10 @@ static int drbd_open(struct block_device *bdev, fmode_t mode)
        return rv;
 }
 
-static void drbd_release(struct gendisk *gd, fmode_t mode)
+static void drbd_release(struct gendisk *gd)
 {
        struct drbd_device *device = gd->private_data;
+
        mutex_lock(&drbd_main_mutex);
        device->open_cnt--;
        mutex_unlock(&drbd_main_mutex);
index 1a5d3d7..cddae6f 100644 (file)
@@ -1640,8 +1640,8 @@ static struct block_device *open_backing_dev(struct drbd_device *device,
        struct block_device *bdev;
        int err = 0;
 
-       bdev = blkdev_get_by_path(bdev_path,
-                                 FMODE_READ | FMODE_WRITE | FMODE_EXCL, claim_ptr);
+       bdev = blkdev_get_by_path(bdev_path, BLK_OPEN_READ | BLK_OPEN_WRITE,
+                                 claim_ptr, NULL);
        if (IS_ERR(bdev)) {
                drbd_err(device, "open(\"%s\") failed with %ld\n",
                                bdev_path, PTR_ERR(bdev));
@@ -1653,7 +1653,7 @@ static struct block_device *open_backing_dev(struct drbd_device *device,
 
        err = bd_link_disk_holder(bdev, device->vdisk);
        if (err) {
-               blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
+               blkdev_put(bdev, claim_ptr);
                drbd_err(device, "bd_link_disk_holder(\"%s\", ...) failed with %d\n",
                                bdev_path, err);
                bdev = ERR_PTR(err);
@@ -1695,13 +1695,13 @@ static int open_backing_devices(struct drbd_device *device,
 }
 
 static void close_backing_dev(struct drbd_device *device, struct block_device *bdev,
-       bool do_bd_unlink)
+               void *claim_ptr, bool do_bd_unlink)
 {
        if (!bdev)
                return;
        if (do_bd_unlink)
                bd_unlink_disk_holder(bdev, device->vdisk);
-       blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
+       blkdev_put(bdev, claim_ptr);
 }
 
 void drbd_backing_dev_free(struct drbd_device *device, struct drbd_backing_dev *ldev)
@@ -1709,8 +1709,11 @@ void drbd_backing_dev_free(struct drbd_device *device, struct drbd_backing_dev *
        if (ldev == NULL)
                return;
 
-       close_backing_dev(device, ldev->md_bdev, ldev->md_bdev != ldev->backing_bdev);
-       close_backing_dev(device, ldev->backing_bdev, true);
+       close_backing_dev(device, ldev->md_bdev,
+                         ldev->md.meta_dev_idx < 0 ?
+                               (void *)device : (void *)drbd_m_holder,
+                         ldev->md_bdev != ldev->backing_bdev);
+       close_backing_dev(device, ldev->backing_bdev, device, true);
 
        kfree(ldev->disk_conf);
        kfree(ldev);
@@ -2126,8 +2129,11 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
  fail:
        conn_reconfig_done(connection);
        if (nbc) {
-               close_backing_dev(device, nbc->md_bdev, nbc->md_bdev != nbc->backing_bdev);
-               close_backing_dev(device, nbc->backing_bdev, true);
+               close_backing_dev(device, nbc->md_bdev,
+                         nbc->disk_conf->meta_dev_idx < 0 ?
+                               (void *)device : (void *)drbd_m_holder,
+                         nbc->md_bdev != nbc->backing_bdev);
+               close_backing_dev(device, nbc->backing_bdev, device, true);
                kfree(nbc);
        }
        kfree(new_disk_conf);
index 8c2bc47..0c9f541 100644 (file)
@@ -27,7 +27,6 @@
 #include <uapi/linux/sched/types.h>
 #include <linux/sched/signal.h>
 #include <linux/pkt_sched.h>
-#define __KERNEL_SYSCALLS__
 #include <linux/unistd.h>
 #include <linux/vmalloc.h>
 #include <linux/random.h>
index cec2c20..2db9b18 100644 (file)
@@ -402,7 +402,7 @@ static struct floppy_drive_struct drive_state[N_DRIVE];
 static struct floppy_write_errors write_errors[N_DRIVE];
 static struct timer_list motor_off_timer[N_DRIVE];
 static struct blk_mq_tag_set tag_sets[N_DRIVE];
-static struct block_device *opened_bdev[N_DRIVE];
+static struct gendisk *opened_disk[N_DRIVE];
 static DEFINE_MUTEX(open_lock);
 static struct floppy_raw_cmd *raw_cmd, default_raw_cmd;
 
@@ -3210,13 +3210,13 @@ static int floppy_raw_cmd_ioctl(int type, int drive, int cmd,
 
 #endif
 
-static int invalidate_drive(struct block_device *bdev)
+static int invalidate_drive(struct gendisk *disk)
 {
        /* invalidate the buffer track to force a reread */
-       set_bit((long)bdev->bd_disk->private_data, &fake_change);
+       set_bit((long)disk->private_data, &fake_change);
        process_fd_request();
-       if (bdev_check_media_change(bdev))
-               floppy_revalidate(bdev->bd_disk);
+       if (disk_check_media_change(disk))
+               floppy_revalidate(disk);
        return 0;
 }
 
@@ -3251,10 +3251,11 @@ static int set_geometry(unsigned int cmd, struct floppy_struct *g,
                            floppy_type[type].size + 1;
                process_fd_request();
                for (cnt = 0; cnt < N_DRIVE; cnt++) {
-                       struct block_device *bdev = opened_bdev[cnt];
-                       if (!bdev || ITYPE(drive_state[cnt].fd_device) != type)
+                       struct gendisk *disk = opened_disk[cnt];
+
+                       if (!disk || ITYPE(drive_state[cnt].fd_device) != type)
                                continue;
-                       __invalidate_device(bdev, true);
+                       __invalidate_device(disk->part0, true);
                }
                mutex_unlock(&open_lock);
        } else {
@@ -3287,7 +3288,7 @@ static int set_geometry(unsigned int cmd, struct floppy_struct *g,
                    drive_state[current_drive].maxtrack ||
                    ((user_params[drive].sect ^ oldStretch) &
                     (FD_SWAPSIDES | FD_SECTBASEMASK)))
-                       invalidate_drive(bdev);
+                       invalidate_drive(bdev->bd_disk);
                else
                        process_fd_request();
        }
@@ -3393,8 +3394,8 @@ static bool valid_floppy_drive_params(const short autodetect[FD_AUTODETECT_SIZE]
        return true;
 }
 
-static int fd_locked_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd,
-                   unsigned long param)
+static int fd_locked_ioctl(struct block_device *bdev, blk_mode_t mode,
+               unsigned int cmd, unsigned long param)
 {
        int drive = (long)bdev->bd_disk->private_data;
        int type = ITYPE(drive_state[drive].fd_device);
@@ -3427,7 +3428,8 @@ static int fd_locked_ioctl(struct block_device *bdev, fmode_t mode, unsigned int
                return ret;
 
        /* permission checks */
-       if (((cmd & 0x40) && !(mode & (FMODE_WRITE | FMODE_WRITE_IOCTL))) ||
+       if (((cmd & 0x40) &&
+            !(mode & (BLK_OPEN_WRITE | BLK_OPEN_WRITE_IOCTL))) ||
            ((cmd & 0x80) && !capable(CAP_SYS_ADMIN)))
                return -EPERM;
 
@@ -3464,7 +3466,7 @@ static int fd_locked_ioctl(struct block_device *bdev, fmode_t mode, unsigned int
                current_type[drive] = NULL;
                floppy_sizes[drive] = MAX_DISK_SIZE << 1;
                drive_state[drive].keep_data = 0;
-               return invalidate_drive(bdev);
+               return invalidate_drive(bdev->bd_disk);
        case FDSETPRM:
        case FDDEFPRM:
                return set_geometry(cmd, &inparam.g, drive, type, bdev);
@@ -3503,7 +3505,7 @@ static int fd_locked_ioctl(struct block_device *bdev, fmode_t mode, unsigned int
        case FDFLUSH:
                if (lock_fdc(drive))
                        return -EINTR;
-               return invalidate_drive(bdev);
+               return invalidate_drive(bdev->bd_disk);
        case FDSETEMSGTRESH:
                drive_params[drive].max_errors.reporting = (unsigned short)(param & 0x0f);
                return 0;
@@ -3565,7 +3567,7 @@ static int fd_locked_ioctl(struct block_device *bdev, fmode_t mode, unsigned int
        return 0;
 }
 
-static int fd_ioctl(struct block_device *bdev, fmode_t mode,
+static int fd_ioctl(struct block_device *bdev, blk_mode_t mode,
                             unsigned int cmd, unsigned long param)
 {
        int ret;
@@ -3653,8 +3655,8 @@ struct compat_floppy_write_errors {
 #define FDGETFDCSTAT32 _IOR(2, 0x15, struct compat_floppy_fdc_state)
 #define FDWERRORGET32  _IOR(2, 0x17, struct compat_floppy_write_errors)
 
-static int compat_set_geometry(struct block_device *bdev, fmode_t mode, unsigned int cmd,
-                   struct compat_floppy_struct __user *arg)
+static int compat_set_geometry(struct block_device *bdev, blk_mode_t mode,
+               unsigned int cmd, struct compat_floppy_struct __user *arg)
 {
        struct floppy_struct v;
        int drive, type;
@@ -3663,7 +3665,7 @@ static int compat_set_geometry(struct block_device *bdev, fmode_t mode, unsigned
        BUILD_BUG_ON(offsetof(struct floppy_struct, name) !=
                     offsetof(struct compat_floppy_struct, name));
 
-       if (!(mode & (FMODE_WRITE | FMODE_WRITE_IOCTL)))
+       if (!(mode & (BLK_OPEN_WRITE | BLK_OPEN_WRITE_IOCTL)))
                return -EPERM;
 
        memset(&v, 0, sizeof(struct floppy_struct));
@@ -3860,8 +3862,8 @@ static int compat_werrorget(int drive,
        return 0;
 }
 
-static int fd_compat_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd,
-                   unsigned long param)
+static int fd_compat_ioctl(struct block_device *bdev, blk_mode_t mode,
+               unsigned int cmd, unsigned long param)
 {
        int drive = (long)bdev->bd_disk->private_data;
        switch (cmd) {
@@ -3962,7 +3964,7 @@ static void __init config_types(void)
                pr_cont("\n");
 }
 
-static void floppy_release(struct gendisk *disk, fmode_t mode)
+static void floppy_release(struct gendisk *disk)
 {
        int drive = (long)disk->private_data;
 
@@ -3973,7 +3975,7 @@ static void floppy_release(struct gendisk *disk, fmode_t mode)
                drive_state[drive].fd_ref = 0;
        }
        if (!drive_state[drive].fd_ref)
-               opened_bdev[drive] = NULL;
+               opened_disk[drive] = NULL;
        mutex_unlock(&open_lock);
        mutex_unlock(&floppy_mutex);
 }
@@ -3983,9 +3985,9 @@ static void floppy_release(struct gendisk *disk, fmode_t mode)
  * /dev/PS0 etc), and disallows simultaneous access to the same
  * drive with different device numbers.
  */
-static int floppy_open(struct block_device *bdev, fmode_t mode)
+static int floppy_open(struct gendisk *disk, blk_mode_t mode)
 {
-       int drive = (long)bdev->bd_disk->private_data;
+       int drive = (long)disk->private_data;
        int old_dev, new_dev;
        int try;
        int res = -EBUSY;
@@ -3994,7 +3996,7 @@ static int floppy_open(struct block_device *bdev, fmode_t mode)
        mutex_lock(&floppy_mutex);
        mutex_lock(&open_lock);
        old_dev = drive_state[drive].fd_device;
-       if (opened_bdev[drive] && opened_bdev[drive] != bdev)
+       if (opened_disk[drive] && opened_disk[drive] != disk)
                goto out2;
 
        if (!drive_state[drive].fd_ref && (drive_params[drive].flags & FD_BROKEN_DCL)) {
@@ -4004,7 +4006,7 @@ static int floppy_open(struct block_device *bdev, fmode_t mode)
 
        drive_state[drive].fd_ref++;
 
-       opened_bdev[drive] = bdev;
+       opened_disk[drive] = disk;
 
        res = -ENXIO;
 
@@ -4038,7 +4040,7 @@ static int floppy_open(struct block_device *bdev, fmode_t mode)
                }
        }
 
-       new_dev = MINOR(bdev->bd_dev);
+       new_dev = disk->first_minor;
        drive_state[drive].fd_device = new_dev;
        set_capacity(disks[drive][ITYPE(new_dev)], floppy_sizes[new_dev]);
        if (old_dev != -1 && old_dev != new_dev) {
@@ -4048,21 +4050,20 @@ static int floppy_open(struct block_device *bdev, fmode_t mode)
 
        if (fdc_state[FDC(drive)].rawcmd == 1)
                fdc_state[FDC(drive)].rawcmd = 2;
-
-       if (!(mode & FMODE_NDELAY)) {
-               if (mode & (FMODE_READ|FMODE_WRITE)) {
+       if (!(mode & BLK_OPEN_NDELAY)) {
+               if (mode & (BLK_OPEN_READ | BLK_OPEN_WRITE)) {
                        drive_state[drive].last_checked = 0;
                        clear_bit(FD_OPEN_SHOULD_FAIL_BIT,
                                  &drive_state[drive].flags);
-                       if (bdev_check_media_change(bdev))
-                               floppy_revalidate(bdev->bd_disk);
+                       if (disk_check_media_change(disk))
+                               floppy_revalidate(disk);
                        if (test_bit(FD_DISK_CHANGED_BIT, &drive_state[drive].flags))
                                goto out;
                        if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, &drive_state[drive].flags))
                                goto out;
                }
                res = -EROFS;
-               if ((mode & FMODE_WRITE) &&
+               if ((mode & BLK_OPEN_WRITE) &&
                    !test_bit(FD_DISK_WRITABLE_BIT, &drive_state[drive].flags))
                        goto out;
        }
@@ -4073,7 +4074,7 @@ out:
        drive_state[drive].fd_ref--;
 
        if (!drive_state[drive].fd_ref)
-               opened_bdev[drive] = NULL;
+               opened_disk[drive] = NULL;
 out2:
        mutex_unlock(&open_lock);
        mutex_unlock(&floppy_mutex);
@@ -4147,7 +4148,7 @@ static int __floppy_read_block_0(struct block_device *bdev, int drive)
        cbdata.drive = drive;
 
        bio_init(&bio, bdev, &bio_vec, 1, REQ_OP_READ);
-       bio_add_page(&bio, page, block_size(bdev), 0);
+       __bio_add_page(&bio, page, block_size(bdev), 0);
 
        bio.bi_iter.bi_sector = 0;
        bio.bi_flags |= (1 << BIO_QUIET);
@@ -4203,7 +4204,8 @@ static int floppy_revalidate(struct gendisk *disk)
                        drive_state[drive].generation++;
                if (drive_no_geom(drive)) {
                        /* auto-sensing */
-                       res = __floppy_read_block_0(opened_bdev[drive], drive);
+                       res = __floppy_read_block_0(opened_disk[drive]->part0,
+                                                   drive);
                } else {
                        if (cf)
                                poll_drive(false, FD_RAW_NEED_DISK);
index bc31bb7..37511d2 100644 (file)
@@ -990,7 +990,7 @@ loop_set_status_from_info(struct loop_device *lo,
        return 0;
 }
 
-static int loop_configure(struct loop_device *lo, fmode_t mode,
+static int loop_configure(struct loop_device *lo, blk_mode_t mode,
                          struct block_device *bdev,
                          const struct loop_config *config)
 {
@@ -1014,8 +1014,8 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
         * If we don't hold exclusive handle for the device, upgrade to it
         * here to avoid changing device under exclusive owner.
         */
-       if (!(mode & FMODE_EXCL)) {
-               error = bd_prepare_to_claim(bdev, loop_configure);
+       if (!(mode & BLK_OPEN_EXCL)) {
+               error = bd_prepare_to_claim(bdev, loop_configure, NULL);
                if (error)
                        goto out_putf;
        }
@@ -1050,7 +1050,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
        if (error)
                goto out_unlock;
 
-       if (!(file->f_mode & FMODE_WRITE) || !(mode & FMODE_WRITE) ||
+       if (!(file->f_mode & FMODE_WRITE) || !(mode & BLK_OPEN_WRITE) ||
            !file->f_op->write_iter)
                lo->lo_flags |= LO_FLAGS_READ_ONLY;
 
@@ -1116,7 +1116,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
        if (partscan)
                loop_reread_partitions(lo);
 
-       if (!(mode & FMODE_EXCL))
+       if (!(mode & BLK_OPEN_EXCL))
                bd_abort_claiming(bdev, loop_configure);
 
        return 0;
@@ -1124,7 +1124,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 out_unlock:
        loop_global_unlock(lo, is_loop);
 out_bdev:
-       if (!(mode & FMODE_EXCL))
+       if (!(mode & BLK_OPEN_EXCL))
                bd_abort_claiming(bdev, loop_configure);
 out_putf:
        fput(file);
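
loop_configure() keeps the same claim/abort dance, but the exclusivity hint from the caller now arrives as BLK_OPEN_EXCL and bd_prepare_to_claim() grows a third argument for optional blk_holder_ops (NULL here, since loop installs no holder callbacks). The shape of the pairing, error paths trimmed:

        if (!(mode & BLK_OPEN_EXCL)) {
                /* loop_configure acts as the holder cookie; no holder ops */
                error = bd_prepare_to_claim(bdev, loop_configure, NULL);
                if (error)
                        return error;
        }

        /* ... set up the loop device ... */

        if (!(mode & BLK_OPEN_EXCL))
                bd_abort_claiming(bdev, loop_configure);
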
@@ -1528,7 +1528,7 @@ static int lo_simple_ioctl(struct loop_device *lo, unsigned int cmd,
        return err;
 }
 
-static int lo_ioctl(struct block_device *bdev, fmode_t mode,
+static int lo_ioctl(struct block_device *bdev, blk_mode_t mode,
        unsigned int cmd, unsigned long arg)
 {
        struct loop_device *lo = bdev->bd_disk->private_data;
@@ -1563,24 +1563,22 @@ static int lo_ioctl(struct block_device *bdev, fmode_t mode,
                return loop_clr_fd(lo);
        case LOOP_SET_STATUS:
                err = -EPERM;
-               if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN)) {
+               if ((mode & BLK_OPEN_WRITE) || capable(CAP_SYS_ADMIN))
                        err = loop_set_status_old(lo, argp);
-               }
                break;
        case LOOP_GET_STATUS:
                return loop_get_status_old(lo, argp);
        case LOOP_SET_STATUS64:
                err = -EPERM;
-               if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN)) {
+               if ((mode & BLK_OPEN_WRITE) || capable(CAP_SYS_ADMIN))
                        err = loop_set_status64(lo, argp);
-               }
                break;
        case LOOP_GET_STATUS64:
                return loop_get_status64(lo, argp);
        case LOOP_SET_CAPACITY:
        case LOOP_SET_DIRECT_IO:
        case LOOP_SET_BLOCK_SIZE:
-               if (!(mode & FMODE_WRITE) && !capable(CAP_SYS_ADMIN))
+               if (!(mode & BLK_OPEN_WRITE) && !capable(CAP_SYS_ADMIN))
                        return -EPERM;
                fallthrough;
        default:
@@ -1691,7 +1689,7 @@ loop_get_status_compat(struct loop_device *lo,
        return err;
 }
 
-static int lo_compat_ioctl(struct block_device *bdev, fmode_t mode,
+static int lo_compat_ioctl(struct block_device *bdev, blk_mode_t mode,
                           unsigned int cmd, unsigned long arg)
 {
        struct loop_device *lo = bdev->bd_disk->private_data;
@@ -1727,7 +1725,7 @@ static int lo_compat_ioctl(struct block_device *bdev, fmode_t mode,
 }
 #endif
 
-static void lo_release(struct gendisk *disk, fmode_t mode)
+static void lo_release(struct gendisk *disk)
 {
        struct loop_device *lo = disk->private_data;
 
index 815d77b..b200950 100644
@@ -3041,7 +3041,7 @@ static int rssd_disk_name_format(char *prefix,
  *                 structure pointer.
  */
 static int mtip_block_ioctl(struct block_device *dev,
-                           fmode_t mode,
+                           blk_mode_t mode,
                            unsigned cmd,
                            unsigned long arg)
 {
@@ -3079,7 +3079,7 @@ static int mtip_block_ioctl(struct block_device *dev,
  *                 structure pointer.
  */
 static int mtip_block_compat_ioctl(struct block_device *dev,
-                           fmode_t mode,
+                           blk_mode_t mode,
                            unsigned cmd,
                            unsigned long arg)
 {
index 9c35c95..8576d69 100644
@@ -1502,7 +1502,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
        return -ENOTTY;
 }
 
-static int nbd_ioctl(struct block_device *bdev, fmode_t mode,
+static int nbd_ioctl(struct block_device *bdev, blk_mode_t mode,
                     unsigned int cmd, unsigned long arg)
 {
        struct nbd_device *nbd = bdev->bd_disk->private_data;
@@ -1553,13 +1553,13 @@ static struct nbd_config *nbd_alloc_config(void)
        return config;
 }
 
-static int nbd_open(struct block_device *bdev, fmode_t mode)
+static int nbd_open(struct gendisk *disk, blk_mode_t mode)
 {
        struct nbd_device *nbd;
        int ret = 0;
 
        mutex_lock(&nbd_index_mutex);
-       nbd = bdev->bd_disk->private_data;
+       nbd = disk->private_data;
        if (!nbd) {
                ret = -ENXIO;
                goto out;
@@ -1587,17 +1587,17 @@ static int nbd_open(struct block_device *bdev, fmode_t mode)
                refcount_inc(&nbd->refs);
                mutex_unlock(&nbd->config_lock);
                if (max_part)
-                       set_bit(GD_NEED_PART_SCAN, &bdev->bd_disk->state);
+                       set_bit(GD_NEED_PART_SCAN, &disk->state);
        } else if (nbd_disconnected(nbd->config)) {
                if (max_part)
-                       set_bit(GD_NEED_PART_SCAN, &bdev->bd_disk->state);
+                       set_bit(GD_NEED_PART_SCAN, &disk->state);
        }
 out:
        mutex_unlock(&nbd_index_mutex);
        return ret;
 }
 
-static void nbd_release(struct gendisk *disk, fmode_t mode)
+static void nbd_release(struct gendisk *disk)
 {
        struct nbd_device *nbd = disk->private_data;
 
@@ -1666,7 +1666,7 @@ static int nbd_dev_dbg_init(struct nbd_device *nbd)
                return -EIO;
 
        dir = debugfs_create_dir(nbd_name(nbd), nbd_dbg_dir);
-       if (!dir) {
+       if (IS_ERR(dir)) {
                dev_err(nbd_to_dev(nbd), "Failed to create debugfs dir for '%s'\n",
                        nbd_name(nbd));
                return -EIO;
@@ -1692,7 +1692,7 @@ static int nbd_dbg_init(void)
        struct dentry *dbg_dir;
 
        dbg_dir = debugfs_create_dir("nbd", NULL);
-       if (!dbg_dir)
+       if (IS_ERR(dbg_dir))
                return -EIO;
 
        nbd_dbg_dir = dbg_dir;
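
Both nbd debugfs hunks fix an inverted error check: debugfs_create_dir() never returns NULL; on failure, or when debugfs is compiled out, it returns an ERR_PTR, so the old !dir test could never fire. The corrected idiom:

        struct dentry *dir;

        dir = debugfs_create_dir("nbd", NULL);
        if (IS_ERR(dir))        /* ERR_PTR on failure, never NULL */
                return -EIO;    /* nbd bails out; many callers simply ignore debugfs errors */
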
@@ -1776,7 +1776,8 @@ static struct nbd_device *nbd_dev_add(int index, unsigned int refs)
                if (err == -ENOSPC)
                        err = -EEXIST;
        } else {
-               err = idr_alloc(&nbd_index_idr, nbd, 0, 0, GFP_KERNEL);
+               err = idr_alloc(&nbd_index_idr, nbd, 0,
+                               (MINORMASK >> part_shift) + 1, GFP_KERNEL);
                if (err >= 0)
                        index = err;
        }
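
The idr_alloc() change puts an upper bound on automatic index allocation. Its end argument is exclusive (0 meaning "no limit"), so passing (MINORMASK >> part_shift) + 1 caps a freshly allocated nbd index at the largest value that still fits in the available minor-number space, where the old call had no limit at all:

        /* Range is [start, end); returns the allocated index or a negative errno. */
        err = idr_alloc(&nbd_index_idr, nbd, 0,
                        (MINORMASK >> part_shift) + 1, GFP_KERNEL);
        if (err >= 0)
                index = err;
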
index b3fedaf..8640130 100644
@@ -2244,6 +2244,7 @@ static void null_destroy_dev(struct nullb *nullb)
        struct nullb_device *dev = nullb->dev;
 
        null_del_dev(nullb);
+       null_free_device_storage(dev, false);
        null_free_dev(dev);
 }
 
index d5d7884..a142853 100644
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
-#include <linux/pktcdvd.h>
-#include <linux/module.h>
-#include <linux/types.h>
-#include <linux/kernel.h>
+#include <linux/backing-dev.h>
 #include <linux/compat.h>
-#include <linux/kthread.h>
+#include <linux/debugfs.h>
+#include <linux/device.h>
 #include <linux/errno.h>
-#include <linux/spinlock.h>
 #include <linux/file.h>
-#include <linux/proc_fs.h>
-#include <linux/seq_file.h>
-#include <linux/miscdevice.h>
 #include <linux/freezer.h>
+#include <linux/kernel.h>
+#include <linux/kthread.h>
+#include <linux/miscdevice.h>
+#include <linux/module.h>
 #include <linux/mutex.h>
+#include <linux/nospec.h>
+#include <linux/pktcdvd.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
 #include <linux/slab.h>
-#include <linux/backing-dev.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+
+#include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
 #include <scsi/scsi_ioctl.h>
-#include <scsi/scsi.h>
-#include <linux/debugfs.h>
-#include <linux/device.h>
-#include <linux/nospec.h>
-#include <linux/uaccess.h>
 
-#define DRIVER_NAME    "pktcdvd"
+#include <asm/unaligned.h>
 
-#define pkt_err(pd, fmt, ...)                                          \
-       pr_err("%s: " fmt, pd->name, ##__VA_ARGS__)
-#define pkt_notice(pd, fmt, ...)                                       \
-       pr_notice("%s: " fmt, pd->name, ##__VA_ARGS__)
-#define pkt_info(pd, fmt, ...)                                         \
-       pr_info("%s: " fmt, pd->name, ##__VA_ARGS__)
-
-#define pkt_dbg(level, pd, fmt, ...)                                   \
-do {                                                                   \
-       if (level == 2 && PACKET_DEBUG >= 2)                            \
-               pr_notice("%s: %s():" fmt,                              \
-                         pd->name, __func__, ##__VA_ARGS__);           \
-       else if (level == 1 && PACKET_DEBUG >= 1)                       \
-               pr_notice("%s: " fmt, pd->name, ##__VA_ARGS__);         \
-} while (0)
+#define DRIVER_NAME    "pktcdvd"
 
 #define MAX_SPEED 0xffff
 
@@ -107,7 +94,6 @@ static struct dentry *pkt_debugfs_root = NULL; /* /sys/kernel/debug/pktcdvd */
 /* forward declaration */
 static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev);
 static int pkt_remove_dev(dev_t pkt_dev);
-static int pkt_seq_show(struct seq_file *m, void *p);
 
 static sector_t get_zone(sector_t sector, struct pktcdvd_device *pd)
 {
@@ -253,15 +239,16 @@ static ssize_t congestion_off_store(struct device *dev,
                                    const char *buf, size_t len)
 {
        struct pktcdvd_device *pd = dev_get_drvdata(dev);
-       int val;
+       int val, ret;
 
-       if (sscanf(buf, "%d", &val) == 1) {
-               spin_lock(&pd->lock);
-               pd->write_congestion_off = val;
-               init_write_congestion_marks(&pd->write_congestion_off,
-                                       &pd->write_congestion_on);
-               spin_unlock(&pd->lock);
-       }
+       ret = kstrtoint(buf, 10, &val);
+       if (ret)
+               return ret;
+
+       spin_lock(&pd->lock);
+       pd->write_congestion_off = val;
+       init_write_congestion_marks(&pd->write_congestion_off, &pd->write_congestion_on);
+       spin_unlock(&pd->lock);
        return len;
 }
 static DEVICE_ATTR_RW(congestion_off);
@@ -283,15 +270,16 @@ static ssize_t congestion_on_store(struct device *dev,
                                   const char *buf, size_t len)
 {
        struct pktcdvd_device *pd = dev_get_drvdata(dev);
-       int val;
+       int val, ret;
 
-       if (sscanf(buf, "%d", &val) == 1) {
-               spin_lock(&pd->lock);
-               pd->write_congestion_on = val;
-               init_write_congestion_marks(&pd->write_congestion_off,
-                                       &pd->write_congestion_on);
-               spin_unlock(&pd->lock);
-       }
+       ret = kstrtoint(buf, 10, &val);
+       if (ret)
+               return ret;
+
+       spin_lock(&pd->lock);
+       pd->write_congestion_on = val;
+       init_write_congestion_marks(&pd->write_congestion_off, &pd->write_congestion_on);
+       spin_unlock(&pd->lock);
        return len;
 }
 static DEVICE_ATTR_RW(congestion_on);
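
Both congestion_*_store() handlers move from sscanf() to kstrtoint(). The practical difference is that malformed input (anything beyond an optional sign, digits and a trailing newline) or an out-of-range value now returns -EINVAL or -ERANGE to userspace instead of being silently ignored. The resulting store-handler pattern, with illustrative names:

        static ssize_t example_store(struct device *dev, struct device_attribute *attr,
                                     const char *buf, size_t len)
        {
                int val, ret;

                ret = kstrtoint(buf, 10, &val); /* 0 on success, -EINVAL/-ERANGE otherwise */
                if (ret)
                        return ret;

                /* ... apply val under the appropriate lock ... */
                return len;                     /* consume the whole write */
        }
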
@@ -319,7 +307,7 @@ static void pkt_sysfs_dev_new(struct pktcdvd_device *pd)
        if (class_is_registered(&class_pktcdvd)) {
                pd->dev = device_create_with_groups(&class_pktcdvd, NULL,
                                                    MKDEV(0, 0), pd, pkt_groups,
-                                                   "%s", pd->name);
+                                                   "%s", pd->disk->disk_name);
                if (IS_ERR(pd->dev))
                        pd->dev = NULL;
        }
@@ -349,8 +337,8 @@ static ssize_t device_map_show(const struct class *c, const struct class_attribu
                struct pktcdvd_device *pd = pkt_devs[idx];
                if (!pd)
                        continue;
-               n += sprintf(data+n, "%s %u:%u %u:%u\n",
-                       pd->name,
+               n += sysfs_emit_at(data, n, "%s %u:%u %u:%u\n",
+                       pd->disk->disk_name,
                        MAJOR(pd->pkt_dev), MINOR(pd->pkt_dev),
                        MAJOR(pd->bdev->bd_dev),
                        MINOR(pd->bdev->bd_dev));
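
device_map_show() switches from sprintf() to sysfs_emit_at(), which writes at the given offset into the one-page sysfs buffer, refuses to run past PAGE_SIZE, and returns the number of bytes emitted, so the accumulated length stays bounded. A generic sketch with illustrative names:

        static ssize_t example_show(struct device *dev, struct device_attribute *attr,
                                    char *buf)
        {
                int n = 0, i;

                for (i = 0; i < 4; i++)
                        n += sysfs_emit_at(buf, n, "entry %d\n", i); /* bytes written at offset n */
                return n;
        }
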
@@ -428,34 +416,92 @@ static void pkt_sysfs_cleanup(void)
 
  *******************************************************************/
 
-static int pkt_debugfs_seq_show(struct seq_file *m, void *p)
+static void pkt_count_states(struct pktcdvd_device *pd, int *states)
 {
-       return pkt_seq_show(m, p);
+       struct packet_data *pkt;
+       int i;
+
+       for (i = 0; i < PACKET_NUM_STATES; i++)
+               states[i] = 0;
+
+       spin_lock(&pd->cdrw.active_list_lock);
+       list_for_each_entry(pkt, &pd->cdrw.pkt_active_list, list) {
+               states[pkt->state]++;
+       }
+       spin_unlock(&pd->cdrw.active_list_lock);
 }
 
-static int pkt_debugfs_fops_open(struct inode *inode, struct file *file)
+static int pkt_seq_show(struct seq_file *m, void *p)
 {
-       return single_open(file, pkt_debugfs_seq_show, inode->i_private);
-}
+       struct pktcdvd_device *pd = m->private;
+       char *msg;
+       int states[PACKET_NUM_STATES];
 
-static const struct file_operations debug_fops = {
-       .open           = pkt_debugfs_fops_open,
-       .read           = seq_read,
-       .llseek         = seq_lseek,
-       .release        = single_release,
-       .owner          = THIS_MODULE,
-};
+       seq_printf(m, "Writer %s mapped to %pg:\n", pd->disk->disk_name, pd->bdev);
+
+       seq_printf(m, "\nSettings:\n");
+       seq_printf(m, "\tpacket size:\t\t%dkB\n", pd->settings.size / 2);
+
+       if (pd->settings.write_type == 0)
+               msg = "Packet";
+       else
+               msg = "Unknown";
+       seq_printf(m, "\twrite type:\t\t%s\n", msg);
+
+       seq_printf(m, "\tpacket type:\t\t%s\n", pd->settings.fp ? "Fixed" : "Variable");
+       seq_printf(m, "\tlink loss:\t\t%d\n", pd->settings.link_loss);
+
+       seq_printf(m, "\ttrack mode:\t\t%d\n", pd->settings.track_mode);
+
+       if (pd->settings.block_mode == PACKET_BLOCK_MODE1)
+               msg = "Mode 1";
+       else if (pd->settings.block_mode == PACKET_BLOCK_MODE2)
+               msg = "Mode 2";
+       else
+               msg = "Unknown";
+       seq_printf(m, "\tblock mode:\t\t%s\n", msg);
+
+       seq_printf(m, "\nStatistics:\n");
+       seq_printf(m, "\tpackets started:\t%lu\n", pd->stats.pkt_started);
+       seq_printf(m, "\tpackets ended:\t\t%lu\n", pd->stats.pkt_ended);
+       seq_printf(m, "\twritten:\t\t%lukB\n", pd->stats.secs_w >> 1);
+       seq_printf(m, "\tread gather:\t\t%lukB\n", pd->stats.secs_rg >> 1);
+       seq_printf(m, "\tread:\t\t\t%lukB\n", pd->stats.secs_r >> 1);
+
+       seq_printf(m, "\nMisc:\n");
+       seq_printf(m, "\treference count:\t%d\n", pd->refcnt);
+       seq_printf(m, "\tflags:\t\t\t0x%lx\n", pd->flags);
+       seq_printf(m, "\tread speed:\t\t%ukB/s\n", pd->read_speed);
+       seq_printf(m, "\twrite speed:\t\t%ukB/s\n", pd->write_speed);
+       seq_printf(m, "\tstart offset:\t\t%lu\n", pd->offset);
+       seq_printf(m, "\tmode page offset:\t%u\n", pd->mode_offset);
+
+       seq_printf(m, "\nQueue state:\n");
+       seq_printf(m, "\tbios queued:\t\t%d\n", pd->bio_queue_size);
+       seq_printf(m, "\tbios pending:\t\t%d\n", atomic_read(&pd->cdrw.pending_bios));
+       seq_printf(m, "\tcurrent sector:\t\t0x%llx\n", pd->current_sector);
+
+       pkt_count_states(pd, states);
+       seq_printf(m, "\tstate:\t\t\ti:%d ow:%d rw:%d ww:%d rec:%d fin:%d\n",
+                  states[0], states[1], states[2], states[3], states[4], states[5]);
+
+       seq_printf(m, "\twrite congestion marks:\toff=%d on=%d\n",
+                       pd->write_congestion_off,
+                       pd->write_congestion_on);
+       return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(pkt_seq);
 
 static void pkt_debugfs_dev_new(struct pktcdvd_device *pd)
 {
        if (!pkt_debugfs_root)
                return;
-       pd->dfs_d_root = debugfs_create_dir(pd->name, pkt_debugfs_root);
+       pd->dfs_d_root = debugfs_create_dir(pd->disk->disk_name, pkt_debugfs_root);
        if (!pd->dfs_d_root)
                return;
 
-       pd->dfs_f_info = debugfs_create_file("info", 0444,
-                                            pd->dfs_d_root, pd, &debug_fops);
+       pd->dfs_f_info = debugfs_create_file("info", 0444, pd->dfs_d_root,
+                                            pd, &pkt_seq_fops);
 }
 
 static void pkt_debugfs_dev_remove(struct pktcdvd_device *pd)
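
The hand-rolled open/read/llseek/release boilerplate is replaced by DEFINE_SHOW_ATTRIBUTE(pkt_seq), which generates a single_open() wrapper and the pkt_seq_fops table from the pkt_seq_show() function above. Its expansion is roughly:

        static int pkt_seq_open(struct inode *inode, struct file *file)
        {
                return single_open(file, pkt_seq_show, inode->i_private);
        }

        static const struct file_operations pkt_seq_fops = {
                .owner          = THIS_MODULE,
                .open           = pkt_seq_open,
                .read           = seq_read,
                .llseek         = seq_lseek,
                .release        = single_release,
        };
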
@@ -484,9 +530,11 @@ static void pkt_debugfs_cleanup(void)
 
 static void pkt_bio_finished(struct pktcdvd_device *pd)
 {
+       struct device *ddev = disk_to_dev(pd->disk);
+
        BUG_ON(atomic_read(&pd->cdrw.pending_bios) <= 0);
        if (atomic_dec_and_test(&pd->cdrw.pending_bios)) {
-               pkt_dbg(2, pd, "queue empty\n");
+               dev_dbg(ddev, "queue empty\n");
                atomic_set(&pd->iosched.attention, 1);
                wake_up(&pd->wqueue);
        }
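
From this point on the driver's private pkt_dbg()/pkt_err()/pkt_notice()/pkt_info() macros (deleted near the top of the file above) are replaced by the standard dev_dbg()/dev_err()/dev_notice()/dev_info() helpers keyed off disk_to_dev(pd->disk). Debug output is therefore no longer gated by the compile-time PACKET_DEBUG level; with CONFIG_DYNAMIC_DEBUG it can be toggled at run time through /sys/kernel/debug/dynamic_debug/control, and every message is prefixed with the disk's device name:

        struct device *ddev = disk_to_dev(pd->disk);

        dev_dbg(ddev, "queue empty\n");         /* emitted only when dynamic debug enables it */
        dev_err(ddev, "failed get_disc\n");     /* always logged, prefixed with the device name */
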
@@ -717,15 +765,16 @@ static const char *sense_key_string(__u8 index)
 static void pkt_dump_sense(struct pktcdvd_device *pd,
                           struct packet_command *cgc)
 {
+       struct device *ddev = disk_to_dev(pd->disk);
        struct scsi_sense_hdr *sshdr = cgc->sshdr;
 
        if (sshdr)
-               pkt_err(pd, "%*ph - sense %02x.%02x.%02x (%s)\n",
+               dev_err(ddev, "%*ph - sense %02x.%02x.%02x (%s)\n",
                        CDROM_PACKET_SIZE, cgc->cmd,
                        sshdr->sense_key, sshdr->asc, sshdr->ascq,
                        sense_key_string(sshdr->sense_key));
        else
-               pkt_err(pd, "%*ph - no sense\n", CDROM_PACKET_SIZE, cgc->cmd);
+               dev_err(ddev, "%*ph - no sense\n", CDROM_PACKET_SIZE, cgc->cmd);
 }
 
 /*
@@ -762,10 +811,8 @@ static noinline_for_stack int pkt_set_speed(struct pktcdvd_device *pd,
        init_cdrom_command(&cgc, NULL, 0, CGC_DATA_NONE);
        cgc.sshdr = &sshdr;
        cgc.cmd[0] = GPCMD_SET_SPEED;
-       cgc.cmd[2] = (read_speed >> 8) & 0xff;
-       cgc.cmd[3] = read_speed & 0xff;
-       cgc.cmd[4] = (write_speed >> 8) & 0xff;
-       cgc.cmd[5] = write_speed & 0xff;
+       put_unaligned_be16(read_speed, &cgc.cmd[2]);
+       put_unaligned_be16(write_speed, &cgc.cmd[4]);
 
        ret = pkt_generic_packet(pd, &cgc);
        if (ret)
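
Several hunks in this file replace open-coded big-endian byte packing with the <asm/unaligned.h> helpers that the reordered include block above now pulls in: put_unaligned_be16() stores a 16-bit value most-significant-byte first at a possibly unaligned address, and get_unaligned_be16() is the matching reader used later for mode-page and capability parsing. Reusing names from the surrounding hunks, the two forms are equivalent:

        /* Old style: explicit shifts and masks. */
        cgc.cmd[2] = (read_speed >> 8) & 0xff;
        cgc.cmd[3] = read_speed & 0xff;

        /* New style: one helper per 16-bit field, same byte order. */
        put_unaligned_be16(read_speed, &cgc.cmd[2]);

        /* Reader side, e.g. a returned mode-page length field. */
        size = 2 + get_unaligned_be16(&buffer[0]);
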
@@ -809,6 +856,7 @@ static void pkt_queue_bio(struct pktcdvd_device *pd, struct bio *bio)
  */
 static void pkt_iosched_process_queue(struct pktcdvd_device *pd)
 {
+       struct device *ddev = disk_to_dev(pd->disk);
 
        if (atomic_read(&pd->iosched.attention) == 0)
                return;
@@ -836,7 +884,7 @@ static void pkt_iosched_process_queue(struct pktcdvd_device *pd)
                                need_write_seek = 0;
                        if (need_write_seek && reads_queued) {
                                if (atomic_read(&pd->cdrw.pending_bios) > 0) {
-                                       pkt_dbg(2, pd, "write, waiting\n");
+                                       dev_dbg(ddev, "write, waiting\n");
                                        break;
                                }
                                pkt_flush_cache(pd);
@@ -845,7 +893,7 @@ static void pkt_iosched_process_queue(struct pktcdvd_device *pd)
                } else {
                        if (!reads_queued && writes_queued) {
                                if (atomic_read(&pd->cdrw.pending_bios) > 0) {
-                                       pkt_dbg(2, pd, "read, waiting\n");
+                                       dev_dbg(ddev, "read, waiting\n");
                                        break;
                                }
                                pd->iosched.writing = 1;
@@ -892,25 +940,27 @@ static void pkt_iosched_process_queue(struct pktcdvd_device *pd)
  */
 static int pkt_set_segment_merging(struct pktcdvd_device *pd, struct request_queue *q)
 {
-       if ((pd->settings.size << 9) / CD_FRAMESIZE
-           <= queue_max_segments(q)) {
+       struct device *ddev = disk_to_dev(pd->disk);
+
+       if ((pd->settings.size << 9) / CD_FRAMESIZE <= queue_max_segments(q)) {
                /*
                 * The cdrom device can handle one segment/frame
                 */
                clear_bit(PACKET_MERGE_SEGS, &pd->flags);
                return 0;
-       } else if ((pd->settings.size << 9) / PAGE_SIZE
-                  <= queue_max_segments(q)) {
+       }
+
+       if ((pd->settings.size << 9) / PAGE_SIZE <= queue_max_segments(q)) {
                /*
                 * We can handle this case at the expense of some extra memory
                 * copies during write operations
                 */
                set_bit(PACKET_MERGE_SEGS, &pd->flags);
                return 0;
-       } else {
-               pkt_err(pd, "cdrom max_phys_segments too small\n");
-               return -EIO;
        }
+
+       dev_err(ddev, "cdrom max_phys_segments too small\n");
+       return -EIO;
 }
 
 static void pkt_end_io_read(struct bio *bio)
@@ -919,9 +969,8 @@ static void pkt_end_io_read(struct bio *bio)
        struct pktcdvd_device *pd = pkt->pd;
        BUG_ON(!pd);
 
-       pkt_dbg(2, pd, "bio=%p sec0=%llx sec=%llx err=%d\n",
-               bio, (unsigned long long)pkt->sector,
-               (unsigned long long)bio->bi_iter.bi_sector, bio->bi_status);
+       dev_dbg(disk_to_dev(pd->disk), "bio=%p sec0=%llx sec=%llx err=%d\n",
+               bio, pkt->sector, bio->bi_iter.bi_sector, bio->bi_status);
 
        if (bio->bi_status)
                atomic_inc(&pkt->io_errors);
@@ -939,7 +988,7 @@ static void pkt_end_io_packet_write(struct bio *bio)
        struct pktcdvd_device *pd = pkt->pd;
        BUG_ON(!pd);
 
-       pkt_dbg(2, pd, "id=%d, err=%d\n", pkt->id, bio->bi_status);
+       dev_dbg(disk_to_dev(pd->disk), "id=%d, err=%d\n", pkt->id, bio->bi_status);
 
        pd->stats.pkt_ended++;
 
@@ -955,6 +1004,7 @@ static void pkt_end_io_packet_write(struct bio *bio)
  */
 static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)
 {
+       struct device *ddev = disk_to_dev(pd->disk);
        int frames_read = 0;
        struct bio *bio;
        int f;
@@ -983,8 +1033,7 @@ static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)
        spin_unlock(&pkt->lock);
 
        if (pkt->cache_valid) {
-               pkt_dbg(2, pd, "zone %llx cached\n",
-                       (unsigned long long)pkt->sector);
+               dev_dbg(ddev, "zone %llx cached\n", pkt->sector);
                goto out_account;
        }
 
@@ -1005,8 +1054,8 @@ static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)
 
                p = (f * CD_FRAMESIZE) / PAGE_SIZE;
                offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
-               pkt_dbg(2, pd, "Adding frame %d, page:%p offs:%d\n",
-                       f, pkt->pages[p], offset);
+               dev_dbg(ddev, "Adding frame %d, page:%p offs:%d\n", f,
+                       pkt->pages[p], offset);
                if (!bio_add_page(bio, pkt->pages[p], CD_FRAMESIZE, offset))
                        BUG();
 
@@ -1016,8 +1065,7 @@ static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)
        }
 
 out_account:
-       pkt_dbg(2, pd, "need %d frames for zone %llx\n",
-               frames_read, (unsigned long long)pkt->sector);
+       dev_dbg(ddev, "need %d frames for zone %llx\n", frames_read, pkt->sector);
        pd->stats.pkt_started++;
        pd->stats.secs_rg += frames_read * (CD_FRAMESIZE >> 9);
 }
@@ -1051,17 +1099,17 @@ static void pkt_put_packet_data(struct pktcdvd_device *pd, struct packet_data *p
        }
 }
 
-static inline void pkt_set_state(struct packet_data *pkt, enum packet_data_state state)
+static inline void pkt_set_state(struct device *ddev, struct packet_data *pkt,
+                                enum packet_data_state state)
 {
-#if PACKET_DEBUG > 1
        static const char *state_name[] = {
                "IDLE", "WAITING", "READ_WAIT", "WRITE_WAIT", "RECOVERY", "FINISHED"
        };
        enum packet_data_state old_state = pkt->state;
-       pkt_dbg(2, pd, "pkt %2d : s=%6llx %s -> %s\n",
-               pkt->id, (unsigned long long)pkt->sector,
-               state_name[old_state], state_name[state]);
-#endif
+
+       dev_dbg(ddev, "pkt %2d : s=%6llx %s -> %s\n",
+               pkt->id, pkt->sector, state_name[old_state], state_name[state]);
+
        pkt->state = state;
 }
 
@@ -1071,6 +1119,7 @@ static inline void pkt_set_state(struct packet_data *pkt, enum packet_data_state
  */
 static int pkt_handle_queue(struct pktcdvd_device *pd)
 {
+       struct device *ddev = disk_to_dev(pd->disk);
        struct packet_data *pkt, *p;
        struct bio *bio = NULL;
        sector_t zone = 0; /* Suppress gcc warning */
@@ -1080,7 +1129,7 @@ static int pkt_handle_queue(struct pktcdvd_device *pd)
        atomic_set(&pd->scan_queue, 0);
 
        if (list_empty(&pd->cdrw.pkt_free_list)) {
-               pkt_dbg(2, pd, "no pkt\n");
+               dev_dbg(ddev, "no pkt\n");
                return 0;
        }
 
@@ -1117,7 +1166,7 @@ try_next_bio:
        }
        spin_unlock(&pd->lock);
        if (!bio) {
-               pkt_dbg(2, pd, "no bio\n");
+               dev_dbg(ddev, "no bio\n");
                return 0;
        }
 
@@ -1133,12 +1182,13 @@ try_next_bio:
         * to this packet.
         */
        spin_lock(&pd->lock);
-       pkt_dbg(2, pd, "looking for zone %llx\n", (unsigned long long)zone);
+       dev_dbg(ddev, "looking for zone %llx\n", zone);
        while ((node = pkt_rbtree_find(pd, zone)) != NULL) {
+               sector_t tmp = get_zone(node->bio->bi_iter.bi_sector, pd);
+
                bio = node->bio;
-               pkt_dbg(2, pd, "found zone=%llx\n", (unsigned long long)
-                       get_zone(bio->bi_iter.bi_sector, pd));
-               if (get_zone(bio->bi_iter.bi_sector, pd) != zone)
+               dev_dbg(ddev, "found zone=%llx\n", tmp);
+               if (tmp != zone)
                        break;
                pkt_rbtree_erase(pd, node);
                spin_lock(&pkt->lock);
@@ -1157,7 +1207,7 @@ try_next_bio:
        spin_unlock(&pd->lock);
 
        pkt->sleep_time = max(PACKET_WAIT_TIME, 1);
-       pkt_set_state(pkt, PACKET_WAITING_STATE);
+       pkt_set_state(ddev, pkt, PACKET_WAITING_STATE);
        atomic_set(&pkt->run_sm, 1);
 
        spin_lock(&pd->cdrw.active_list_lock);
@@ -1209,6 +1259,7 @@ static void bio_list_copy_data(struct bio *dst, struct bio *src)
  */
 static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
 {
+       struct device *ddev = disk_to_dev(pd->disk);
        int f;
 
        bio_init(pkt->w_bio, pd->bdev, pkt->w_bio->bi_inline_vecs, pkt->frames,
@@ -1225,7 +1276,7 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
                if (!bio_add_page(pkt->w_bio, page, CD_FRAMESIZE, offset))
                        BUG();
        }
-       pkt_dbg(2, pd, "vcnt=%d\n", pkt->w_bio->bi_vcnt);
+       dev_dbg(ddev, "vcnt=%d\n", pkt->w_bio->bi_vcnt);
 
        /*
         * Fill-in bvec with data from orig_bios.
@@ -1233,11 +1284,10 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
        spin_lock(&pkt->lock);
        bio_list_copy_data(pkt->w_bio, pkt->orig_bios.head);
 
-       pkt_set_state(pkt, PACKET_WRITE_WAIT_STATE);
+       pkt_set_state(ddev, pkt, PACKET_WRITE_WAIT_STATE);
        spin_unlock(&pkt->lock);
 
-       pkt_dbg(2, pd, "Writing %d frames for zone %llx\n",
-               pkt->write_size, (unsigned long long)pkt->sector);
+       dev_dbg(ddev, "Writing %d frames for zone %llx\n", pkt->write_size, pkt->sector);
 
        if (test_bit(PACKET_MERGE_SEGS, &pd->flags) || (pkt->write_size < pkt->frames))
                pkt->cache_valid = 1;
@@ -1265,7 +1315,9 @@ static void pkt_finish_packet(struct packet_data *pkt, blk_status_t status)
 
 static void pkt_run_state_machine(struct pktcdvd_device *pd, struct packet_data *pkt)
 {
-       pkt_dbg(2, pd, "pkt %d\n", pkt->id);
+       struct device *ddev = disk_to_dev(pd->disk);
+
+       dev_dbg(ddev, "pkt %d\n", pkt->id);
 
        for (;;) {
                switch (pkt->state) {
@@ -1275,7 +1327,7 @@ static void pkt_run_state_machine(struct pktcdvd_device *pd, struct packet_data
 
                        pkt->sleep_time = 0;
                        pkt_gather_data(pd, pkt);
-                       pkt_set_state(pkt, PACKET_READ_WAIT_STATE);
+                       pkt_set_state(ddev, pkt, PACKET_READ_WAIT_STATE);
                        break;
 
                case PACKET_READ_WAIT_STATE:
@@ -1283,7 +1335,7 @@ static void pkt_run_state_machine(struct pktcdvd_device *pd, struct packet_data
                                return;
 
                        if (atomic_read(&pkt->io_errors) > 0) {
-                               pkt_set_state(pkt, PACKET_RECOVERY_STATE);
+                               pkt_set_state(ddev, pkt, PACKET_RECOVERY_STATE);
                        } else {
                                pkt_start_write(pd, pkt);
                        }
@@ -1294,15 +1346,15 @@ static void pkt_run_state_machine(struct pktcdvd_device *pd, struct packet_data
                                return;
 
                        if (!pkt->w_bio->bi_status) {
-                               pkt_set_state(pkt, PACKET_FINISHED_STATE);
+                               pkt_set_state(ddev, pkt, PACKET_FINISHED_STATE);
                        } else {
-                               pkt_set_state(pkt, PACKET_RECOVERY_STATE);
+                               pkt_set_state(ddev, pkt, PACKET_RECOVERY_STATE);
                        }
                        break;
 
                case PACKET_RECOVERY_STATE:
-                       pkt_dbg(2, pd, "No recovery possible\n");
-                       pkt_set_state(pkt, PACKET_FINISHED_STATE);
+                       dev_dbg(ddev, "No recovery possible\n");
+                       pkt_set_state(ddev, pkt, PACKET_FINISHED_STATE);
                        break;
 
                case PACKET_FINISHED_STATE:
@@ -1318,6 +1370,7 @@ static void pkt_run_state_machine(struct pktcdvd_device *pd, struct packet_data
 
 static void pkt_handle_packets(struct pktcdvd_device *pd)
 {
+       struct device *ddev = disk_to_dev(pd->disk);
        struct packet_data *pkt, *next;
 
        /*
@@ -1338,28 +1391,13 @@ static void pkt_handle_packets(struct pktcdvd_device *pd)
                if (pkt->state == PACKET_FINISHED_STATE) {
                        list_del(&pkt->list);
                        pkt_put_packet_data(pd, pkt);
-                       pkt_set_state(pkt, PACKET_IDLE_STATE);
+                       pkt_set_state(ddev, pkt, PACKET_IDLE_STATE);
                        atomic_set(&pd->scan_queue, 1);
                }
        }
        spin_unlock(&pd->cdrw.active_list_lock);
 }
 
-static void pkt_count_states(struct pktcdvd_device *pd, int *states)
-{
-       struct packet_data *pkt;
-       int i;
-
-       for (i = 0; i < PACKET_NUM_STATES; i++)
-               states[i] = 0;
-
-       spin_lock(&pd->cdrw.active_list_lock);
-       list_for_each_entry(pkt, &pd->cdrw.pkt_active_list, list) {
-               states[pkt->state]++;
-       }
-       spin_unlock(&pd->cdrw.active_list_lock);
-}
-
 /*
  * kcdrwd is woken up when writes have been queued for one of our
  * registered devices
@@ -1367,7 +1405,9 @@ static void pkt_count_states(struct pktcdvd_device *pd, int *states)
 static int kcdrwd(void *foobar)
 {
        struct pktcdvd_device *pd = foobar;
+       struct device *ddev = disk_to_dev(pd->disk);
        struct packet_data *pkt;
+       int states[PACKET_NUM_STATES];
        long min_sleep_time, residue;
 
        set_user_nice(current, MIN_NICE);
@@ -1398,13 +1438,9 @@ static int kcdrwd(void *foobar)
                                goto work_to_do;
 
                        /* Otherwise, go to sleep */
-                       if (PACKET_DEBUG > 1) {
-                               int states[PACKET_NUM_STATES];
-                               pkt_count_states(pd, states);
-                               pkt_dbg(2, pd, "i:%d ow:%d rw:%d ww:%d rec:%d fin:%d\n",
-                                       states[0], states[1], states[2],
-                                       states[3], states[4], states[5]);
-                       }
+                       pkt_count_states(pd, states);
+                       dev_dbg(ddev, "i:%d ow:%d rw:%d ww:%d rec:%d fin:%d\n",
+                               states[0], states[1], states[2], states[3], states[4], states[5]);
 
                        min_sleep_time = MAX_SCHEDULE_TIMEOUT;
                        list_for_each_entry(pkt, &pd->cdrw.pkt_active_list, list) {
@@ -1412,9 +1448,9 @@ static int kcdrwd(void *foobar)
                                        min_sleep_time = pkt->sleep_time;
                        }
 
-                       pkt_dbg(2, pd, "sleeping\n");
+                       dev_dbg(ddev, "sleeping\n");
                        residue = schedule_timeout(min_sleep_time);
-                       pkt_dbg(2, pd, "wake up\n");
+                       dev_dbg(ddev, "wake up\n");
 
                        /* make swsusp happy with our thread */
                        try_to_freeze();
@@ -1462,7 +1498,7 @@ work_to_do:
 
 static void pkt_print_settings(struct pktcdvd_device *pd)
 {
-       pkt_info(pd, "%s packets, %u blocks, Mode-%c disc\n",
+       dev_info(disk_to_dev(pd->disk), "%s packets, %u blocks, Mode-%c disc\n",
                 pd->settings.fp ? "Fixed" : "Variable",
                 pd->settings.size >> 2,
                 pd->settings.block_mode == 8 ? '1' : '2');
@@ -1474,8 +1510,7 @@ static int pkt_mode_sense(struct pktcdvd_device *pd, struct packet_command *cgc,
 
        cgc->cmd[0] = GPCMD_MODE_SENSE_10;
        cgc->cmd[2] = page_code | (page_control << 6);
-       cgc->cmd[7] = cgc->buflen >> 8;
-       cgc->cmd[8] = cgc->buflen & 0xff;
+       put_unaligned_be16(cgc->buflen, &cgc->cmd[7]);
        cgc->data_direction = CGC_DATA_READ;
        return pkt_generic_packet(pd, cgc);
 }
@@ -1486,8 +1521,7 @@ static int pkt_mode_select(struct pktcdvd_device *pd, struct packet_command *cgc
        memset(cgc->buffer, 0, 2);
        cgc->cmd[0] = GPCMD_MODE_SELECT_10;
        cgc->cmd[1] = 0x10;             /* PF */
-       cgc->cmd[7] = cgc->buflen >> 8;
-       cgc->cmd[8] = cgc->buflen & 0xff;
+       put_unaligned_be16(cgc->buflen, &cgc->cmd[7]);
        cgc->data_direction = CGC_DATA_WRITE;
        return pkt_generic_packet(pd, cgc);
 }
@@ -1528,8 +1562,7 @@ static int pkt_get_track_info(struct pktcdvd_device *pd, __u16 track, __u8 type,
        init_cdrom_command(&cgc, ti, 8, CGC_DATA_READ);
        cgc.cmd[0] = GPCMD_READ_TRACK_RZONE_INFO;
        cgc.cmd[1] = type & 3;
-       cgc.cmd[4] = (track & 0xff00) >> 8;
-       cgc.cmd[5] = track & 0xff;
+       put_unaligned_be16(track, &cgc.cmd[4]);
        cgc.cmd[8] = 8;
        cgc.quiet = 1;
 
@@ -1590,6 +1623,7 @@ static noinline_for_stack int pkt_get_last_written(struct pktcdvd_device *pd,
  */
 static noinline_for_stack int pkt_set_write_settings(struct pktcdvd_device *pd)
 {
+       struct device *ddev = disk_to_dev(pd->disk);
        struct packet_command cgc;
        struct scsi_sense_hdr sshdr;
        write_param_page *wp;
@@ -1609,8 +1643,8 @@ static noinline_for_stack int pkt_set_write_settings(struct pktcdvd_device *pd)
                return ret;
        }
 
-       size = 2 + ((buffer[0] << 8) | (buffer[1] & 0xff));
-       pd->mode_offset = (buffer[6] << 8) | (buffer[7] & 0xff);
+       size = 2 + get_unaligned_be16(&buffer[0]);
+       pd->mode_offset = get_unaligned_be16(&buffer[6]);
        if (size > sizeof(buffer))
                size = sizeof(buffer);
 
@@ -1656,7 +1690,7 @@ static noinline_for_stack int pkt_set_write_settings(struct pktcdvd_device *pd)
                /*
                 * paranoia
                 */
-               pkt_err(pd, "write mode wrong %d\n", wp->data_block_type);
+               dev_err(ddev, "write mode wrong %d\n", wp->data_block_type);
                return 1;
        }
        wp->packet_size = cpu_to_be32(pd->settings.size >> 2);
@@ -1677,6 +1711,8 @@ static noinline_for_stack int pkt_set_write_settings(struct pktcdvd_device *pd)
  */
 static int pkt_writable_track(struct pktcdvd_device *pd, track_information *ti)
 {
+       struct device *ddev = disk_to_dev(pd->disk);
+
        switch (pd->mmc3_profile) {
                case 0x1a: /* DVD+RW */
                case 0x12: /* DVD-RAM */
@@ -1701,7 +1737,7 @@ static int pkt_writable_track(struct pktcdvd_device *pd, track_information *ti)
        if (ti->rt == 1 && ti->blank == 0)
                return 1;
 
-       pkt_err(pd, "bad state %d-%d-%d\n", ti->rt, ti->blank, ti->packet);
+       dev_err(ddev, "bad state %d-%d-%d\n", ti->rt, ti->blank, ti->packet);
        return 0;
 }
 
@@ -1710,6 +1746,8 @@ static int pkt_writable_track(struct pktcdvd_device *pd, track_information *ti)
  */
 static int pkt_writable_disc(struct pktcdvd_device *pd, disc_information *di)
 {
+       struct device *ddev = disk_to_dev(pd->disk);
+
        switch (pd->mmc3_profile) {
                case 0x0a: /* CD-RW */
                case 0xffff: /* MMC3 not supported */
@@ -1719,8 +1757,7 @@ static int pkt_writable_disc(struct pktcdvd_device *pd, disc_information *di)
                case 0x12: /* DVD-RAM */
                        return 1;
                default:
-                       pkt_dbg(2, pd, "Wrong disc profile (%x)\n",
-                               pd->mmc3_profile);
+                       dev_dbg(ddev, "Wrong disc profile (%x)\n", pd->mmc3_profile);
                        return 0;
        }
 
@@ -1729,22 +1766,22 @@ static int pkt_writable_disc(struct pktcdvd_device *pd, disc_information *di)
         * but i'm not sure, should we leave this to user apps? probably.
         */
        if (di->disc_type == 0xff) {
-               pkt_notice(pd, "unknown disc - no track?\n");
+               dev_notice(ddev, "unknown disc - no track?\n");
                return 0;
        }
 
        if (di->disc_type != 0x20 && di->disc_type != 0) {
-               pkt_err(pd, "wrong disc type (%x)\n", di->disc_type);
+               dev_err(ddev, "wrong disc type (%x)\n", di->disc_type);
                return 0;
        }
 
        if (di->erasable == 0) {
-               pkt_notice(pd, "disc not erasable\n");
+               dev_err(ddev, "disc not erasable\n");
                return 0;
        }
 
        if (di->border_status == PACKET_SESSION_RESERVED) {
-               pkt_err(pd, "can't write to last track (reserved)\n");
+               dev_err(ddev, "can't write to last track (reserved)\n");
                return 0;
        }
 
@@ -1753,6 +1790,7 @@ static int pkt_writable_disc(struct pktcdvd_device *pd, disc_information *di)
 
 static noinline_for_stack int pkt_probe_settings(struct pktcdvd_device *pd)
 {
+       struct device *ddev = disk_to_dev(pd->disk);
        struct packet_command cgc;
        unsigned char buf[12];
        disc_information di;
@@ -1763,14 +1801,14 @@ static noinline_for_stack int pkt_probe_settings(struct pktcdvd_device *pd)
        cgc.cmd[0] = GPCMD_GET_CONFIGURATION;
        cgc.cmd[8] = 8;
        ret = pkt_generic_packet(pd, &cgc);
-       pd->mmc3_profile = ret ? 0xffff : buf[6] << 8 | buf[7];
+       pd->mmc3_profile = ret ? 0xffff : get_unaligned_be16(&buf[6]);
 
        memset(&di, 0, sizeof(disc_information));
        memset(&ti, 0, sizeof(track_information));
 
        ret = pkt_get_disc_info(pd, &di);
        if (ret) {
-               pkt_err(pd, "failed get_disc\n");
+               dev_err(ddev, "failed get_disc\n");
                return ret;
        }
 
@@ -1782,12 +1820,12 @@ static noinline_for_stack int pkt_probe_settings(struct pktcdvd_device *pd)
        track = 1; /* (di.last_track_msb << 8) | di.last_track_lsb; */
        ret = pkt_get_track_info(pd, track, 1, &ti);
        if (ret) {
-               pkt_err(pd, "failed get_track\n");
+               dev_err(ddev, "failed get_track\n");
                return ret;
        }
 
        if (!pkt_writable_track(pd, &ti)) {
-               pkt_err(pd, "can't write to this track\n");
+               dev_err(ddev, "can't write to this track\n");
                return -EROFS;
        }
 
@@ -1797,11 +1835,11 @@ static noinline_for_stack int pkt_probe_settings(struct pktcdvd_device *pd)
         */
        pd->settings.size = be32_to_cpu(ti.fixed_packet_size) << 2;
        if (pd->settings.size == 0) {
-               pkt_notice(pd, "detected zero packet size!\n");
+               dev_notice(ddev, "detected zero packet size!\n");
                return -ENXIO;
        }
        if (pd->settings.size > PACKET_MAX_SECTORS) {
-               pkt_err(pd, "packet size is too big\n");
+               dev_err(ddev, "packet size is too big\n");
                return -EROFS;
        }
        pd->settings.fp = ti.fp;
@@ -1843,7 +1881,7 @@ static noinline_for_stack int pkt_probe_settings(struct pktcdvd_device *pd)
                        pd->settings.block_mode = PACKET_BLOCK_MODE2;
                        break;
                default:
-                       pkt_err(pd, "unknown data mode\n");
+                       dev_err(ddev, "unknown data mode\n");
                        return -EROFS;
        }
        return 0;
@@ -1854,6 +1892,7 @@ static noinline_for_stack int pkt_probe_settings(struct pktcdvd_device *pd)
  */
 static noinline_for_stack int pkt_write_caching(struct pktcdvd_device *pd)
 {
+       struct device *ddev = disk_to_dev(pd->disk);
        struct packet_command cgc;
        struct scsi_sense_hdr sshdr;
        unsigned char buf[64];
@@ -1880,13 +1919,13 @@ static noinline_for_stack int pkt_write_caching(struct pktcdvd_device *pd)
         */
        buf[pd->mode_offset + 10] |= (set << 2);
 
-       cgc.buflen = cgc.cmd[8] = 2 + ((buf[0] << 8) | (buf[1] & 0xff));
+       cgc.buflen = cgc.cmd[8] = 2 + get_unaligned_be16(&buf[0]);
        ret = pkt_mode_select(pd, &cgc);
        if (ret) {
-               pkt_err(pd, "write caching control failed\n");
+               dev_err(ddev, "write caching control failed\n");
                pkt_dump_sense(pd, &cgc);
        } else if (!ret && set)
-               pkt_notice(pd, "enabled write caching\n");
+               dev_notice(ddev, "enabled write caching\n");
        return ret;
 }
 
@@ -1935,12 +1974,12 @@ static noinline_for_stack int pkt_get_max_speed(struct pktcdvd_device *pd,
                 * Speed Performance Descriptor Block", use the information
                 * in the first block. (contains the highest speed)
                 */
-               int num_spdb = (cap_buf[30] << 8) + cap_buf[31];
+               int num_spdb = get_unaligned_be16(&cap_buf[30]);
                if (num_spdb > 0)
                        offset = 34;
        }
 
-       *write_speed = (cap_buf[offset] << 8) | cap_buf[offset + 1];
+       *write_speed = get_unaligned_be16(&cap_buf[offset]);
        return 0;
 }
 
@@ -1967,6 +2006,7 @@ static char us_clv_to_speed[16] = {
 static noinline_for_stack int pkt_media_speed(struct pktcdvd_device *pd,
                                                unsigned *speed)
 {
+       struct device *ddev = disk_to_dev(pd->disk);
        struct packet_command cgc;
        struct scsi_sense_hdr sshdr;
        unsigned char buf[64];
@@ -1984,7 +2024,7 @@ static noinline_for_stack int pkt_media_speed(struct pktcdvd_device *pd,
                pkt_dump_sense(pd, &cgc);
                return ret;
        }
-       size = ((unsigned int) buf[0]<<8) + buf[1] + 2;
+       size = 2 + get_unaligned_be16(&buf[0]);
        if (size > sizeof(buf))
                size = sizeof(buf);
 
@@ -2001,11 +2041,11 @@ static noinline_for_stack int pkt_media_speed(struct pktcdvd_device *pd,
        }
 
        if (!(buf[6] & 0x40)) {
-               pkt_notice(pd, "disc type is not CD-RW\n");
+               dev_notice(ddev, "disc type is not CD-RW\n");
                return 1;
        }
        if (!(buf[6] & 0x4)) {
-               pkt_notice(pd, "A1 values on media are not valid, maybe not CDRW?\n");
+               dev_notice(ddev, "A1 values on media are not valid, maybe not CDRW?\n");
                return 1;
        }
 
@@ -2025,25 +2065,26 @@ static noinline_for_stack int pkt_media_speed(struct pktcdvd_device *pd,
                        *speed = us_clv_to_speed[sp];
                        break;
                default:
-                       pkt_notice(pd, "unknown disc sub-type %d\n", st);
+                       dev_notice(ddev, "unknown disc sub-type %d\n", st);
                        return 1;
        }
        if (*speed) {
-               pkt_info(pd, "maximum media speed: %d\n", *speed);
+               dev_info(ddev, "maximum media speed: %d\n", *speed);
                return 0;
        } else {
-               pkt_notice(pd, "unknown speed %d for sub-type %d\n", sp, st);
+               dev_notice(ddev, "unknown speed %d for sub-type %d\n", sp, st);
                return 1;
        }
 }
 
 static noinline_for_stack int pkt_perform_opc(struct pktcdvd_device *pd)
 {
+       struct device *ddev = disk_to_dev(pd->disk);
        struct packet_command cgc;
        struct scsi_sense_hdr sshdr;
        int ret;
 
-       pkt_dbg(2, pd, "Performing OPC\n");
+       dev_dbg(ddev, "Performing OPC\n");
 
        init_cdrom_command(&cgc, NULL, 0, CGC_DATA_NONE);
        cgc.sshdr = &sshdr;
@@ -2058,18 +2099,19 @@ static noinline_for_stack int pkt_perform_opc(struct pktcdvd_device *pd)
 
 static int pkt_open_write(struct pktcdvd_device *pd)
 {
+       struct device *ddev = disk_to_dev(pd->disk);
        int ret;
        unsigned int write_speed, media_write_speed, read_speed;
 
        ret = pkt_probe_settings(pd);
        if (ret) {
-               pkt_dbg(2, pd, "failed probe\n");
+               dev_dbg(ddev, "failed probe\n");
                return ret;
        }
 
        ret = pkt_set_write_settings(pd);
        if (ret) {
-               pkt_dbg(1, pd, "failed saving write settings\n");
+               dev_notice(ddev, "failed saving write settings\n");
                return -EIO;
        }
 
@@ -2082,30 +2124,29 @@ static int pkt_open_write(struct pktcdvd_device *pd)
                case 0x13: /* DVD-RW */
                case 0x1a: /* DVD+RW */
                case 0x12: /* DVD-RAM */
-                       pkt_dbg(1, pd, "write speed %ukB/s\n", write_speed);
+                       dev_notice(ddev, "write speed %ukB/s\n", write_speed);
                        break;
                default:
                        ret = pkt_media_speed(pd, &media_write_speed);
                        if (ret)
                                media_write_speed = 16;
                        write_speed = min(write_speed, media_write_speed * 177);
-                       pkt_dbg(1, pd, "write speed %ux\n", write_speed / 176);
+                       dev_notice(ddev, "write speed %ux\n", write_speed / 176);
                        break;
        }
        read_speed = write_speed;
 
        ret = pkt_set_speed(pd, write_speed, read_speed);
        if (ret) {
-               pkt_dbg(1, pd, "couldn't set write speed\n");
+               dev_notice(ddev, "couldn't set write speed\n");
                return -EIO;
        }
        pd->write_speed = write_speed;
        pd->read_speed = read_speed;
 
        ret = pkt_perform_opc(pd);
-       if (ret) {
-               pkt_dbg(1, pd, "Optimum Power Calibration failed\n");
-       }
+       if (ret)
+               dev_notice(ddev, "Optimum Power Calibration failed\n");
 
        return 0;
 }
@@ -2113,8 +2154,9 @@ static int pkt_open_write(struct pktcdvd_device *pd)
 /*
  * called at open time.
  */
-static int pkt_open_dev(struct pktcdvd_device *pd, fmode_t write)
+static int pkt_open_dev(struct pktcdvd_device *pd, bool write)
 {
+       struct device *ddev = disk_to_dev(pd->disk);
        int ret;
        long lba;
        struct request_queue *q;
@@ -2125,7 +2167,7 @@ static int pkt_open_dev(struct pktcdvd_device *pd, fmode_t write)
         * to read/write from/to it. It is already opened in O_NONBLOCK mode
         * so open should not fail.
         */
-       bdev = blkdev_get_by_dev(pd->bdev->bd_dev, FMODE_READ | FMODE_EXCL, pd);
+       bdev = blkdev_get_by_dev(pd->bdev->bd_dev, BLK_OPEN_READ, pd, NULL);
        if (IS_ERR(bdev)) {
                ret = PTR_ERR(bdev);
                goto out;
@@ -2133,7 +2175,7 @@ static int pkt_open_dev(struct pktcdvd_device *pd, fmode_t write)
 
        ret = pkt_get_last_written(pd, &lba);
        if (ret) {
-               pkt_err(pd, "pkt_get_last_written failed\n");
+               dev_err(ddev, "pkt_get_last_written failed\n");
                goto out_putdev;
        }
 
@@ -2162,17 +2204,17 @@ static int pkt_open_dev(struct pktcdvd_device *pd, fmode_t write)
 
        if (write) {
                if (!pkt_grow_pktlist(pd, CONFIG_CDROM_PKTCDVD_BUFFERS)) {
-                       pkt_err(pd, "not enough memory for buffers\n");
+                       dev_err(ddev, "not enough memory for buffers\n");
                        ret = -ENOMEM;
                        goto out_putdev;
                }
-               pkt_info(pd, "%lukB available on disc\n", lba << 1);
+               dev_info(ddev, "%lukB available on disc\n", lba << 1);
        }
 
        return 0;
 
 out_putdev:
-       blkdev_put(bdev, FMODE_READ | FMODE_EXCL);
+       blkdev_put(bdev, pd);
 out:
        return ret;
 }
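
pkt_open_dev() also adopts the new blkdev_get_by_dev()/blkdev_put() convention: the mode is a blk_mode_t, an exclusive open is requested by passing a non-NULL holder rather than an FMODE_EXCL bit, the fourth argument is an optional blk_holder_ops pointer, and blkdev_put() is handed the same holder instead of a mode. Reusing the names from the hunk, a sketch of the pairing rather than the full function:

        struct block_device *bdev;

        /* pd is the holder cookie, so this is an exclusive open; no holder ops needed */
        bdev = blkdev_get_by_dev(pd->bdev->bd_dev, BLK_OPEN_READ, pd, NULL);
        if (IS_ERR(bdev))
                return PTR_ERR(bdev);

        /* ... use the device ... */

        blkdev_put(bdev, pd);   /* must pass the same holder used at open time */
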
@@ -2183,13 +2225,15 @@ out:
  */
 static void pkt_release_dev(struct pktcdvd_device *pd, int flush)
 {
+       struct device *ddev = disk_to_dev(pd->disk);
+
        if (flush && pkt_flush_cache(pd))
-               pkt_dbg(1, pd, "not flushing cache\n");
+               dev_notice(ddev, "not flushing cache\n");
 
        pkt_lock_door(pd, 0);
 
        pkt_set_speed(pd, MAX_SPEED, MAX_SPEED);
-       blkdev_put(pd->bdev, FMODE_READ | FMODE_EXCL);
+       blkdev_put(pd->bdev, pd);
 
        pkt_shrink_pktlist(pd);
 }
@@ -2203,14 +2247,14 @@ static struct pktcdvd_device *pkt_find_dev_from_minor(unsigned int dev_minor)
        return pkt_devs[dev_minor];
 }
 
-static int pkt_open(struct block_device *bdev, fmode_t mode)
+static int pkt_open(struct gendisk *disk, blk_mode_t mode)
 {
        struct pktcdvd_device *pd = NULL;
        int ret;
 
        mutex_lock(&pktcdvd_mutex);
        mutex_lock(&ctl_mutex);
-       pd = pkt_find_dev_from_minor(MINOR(bdev->bd_dev));
+       pd = pkt_find_dev_from_minor(disk->first_minor);
        if (!pd) {
                ret = -ENODEV;
                goto out;
@@ -2219,22 +2263,21 @@ static int pkt_open(struct block_device *bdev, fmode_t mode)
 
        pd->refcnt++;
        if (pd->refcnt > 1) {
-               if ((mode & FMODE_WRITE) &&
+               if ((mode & BLK_OPEN_WRITE) &&
                    !test_bit(PACKET_WRITABLE, &pd->flags)) {
                        ret = -EBUSY;
                        goto out_dec;
                }
        } else {
-               ret = pkt_open_dev(pd, mode & FMODE_WRITE);
+               ret = pkt_open_dev(pd, mode & BLK_OPEN_WRITE);
                if (ret)
                        goto out_dec;
                /*
                 * needed here as well, since ext2 (among others) may change
                 * the blocksize at mount time
                 */
-               set_blocksize(bdev, CD_FRAMESIZE);
+               set_blocksize(disk->part0, CD_FRAMESIZE);
        }
-
        mutex_unlock(&ctl_mutex);
        mutex_unlock(&pktcdvd_mutex);
        return 0;
@@ -2247,7 +2290,7 @@ out:
        return ret;
 }
 
-static void pkt_close(struct gendisk *disk, fmode_t mode)
+static void pkt_release(struct gendisk *disk)
 {
        struct pktcdvd_device *pd = disk->private_data;
 
@@ -2385,15 +2428,15 @@ static void pkt_make_request_write(struct request_queue *q, struct bio *bio)
 static void pkt_submit_bio(struct bio *bio)
 {
        struct pktcdvd_device *pd = bio->bi_bdev->bd_disk->queue->queuedata;
+       struct device *ddev = disk_to_dev(pd->disk);
        struct bio *split;
 
        bio = bio_split_to_limits(bio);
        if (!bio)
                return;
 
-       pkt_dbg(2, pd, "start = %6llx stop = %6llx\n",
-               (unsigned long long)bio->bi_iter.bi_sector,
-               (unsigned long long)bio_end_sector(bio));
+       dev_dbg(ddev, "start = %6llx stop = %6llx\n",
+               bio->bi_iter.bi_sector, bio_end_sector(bio));
 
        /*
         * Clone READ bios so we can have our own bi_end_io callback.
@@ -2404,13 +2447,12 @@ static void pkt_submit_bio(struct bio *bio)
        }
 
        if (!test_bit(PACKET_WRITABLE, &pd->flags)) {
-               pkt_notice(pd, "WRITE for ro device (%llu)\n",
-                          (unsigned long long)bio->bi_iter.bi_sector);
+               dev_notice(ddev, "WRITE for ro device (%llu)\n", bio->bi_iter.bi_sector);
                goto end_io;
        }
 
        if (!bio->bi_iter.bi_size || (bio->bi_iter.bi_size % CD_FRAMESIZE)) {
-               pkt_err(pd, "wrong bio size\n");
+               dev_err(ddev, "wrong bio size\n");
                goto end_io;
        }
 
@@ -2446,74 +2488,15 @@ static void pkt_init_queue(struct pktcdvd_device *pd)
        q->queuedata = pd;
 }
 
-static int pkt_seq_show(struct seq_file *m, void *p)
-{
-       struct pktcdvd_device *pd = m->private;
-       char *msg;
-       int states[PACKET_NUM_STATES];
-
-       seq_printf(m, "Writer %s mapped to %pg:\n", pd->name, pd->bdev);
-
-       seq_printf(m, "\nSettings:\n");
-       seq_printf(m, "\tpacket size:\t\t%dkB\n", pd->settings.size / 2);
-
-       if (pd->settings.write_type == 0)
-               msg = "Packet";
-       else
-               msg = "Unknown";
-       seq_printf(m, "\twrite type:\t\t%s\n", msg);
-
-       seq_printf(m, "\tpacket type:\t\t%s\n", pd->settings.fp ? "Fixed" : "Variable");
-       seq_printf(m, "\tlink loss:\t\t%d\n", pd->settings.link_loss);
-
-       seq_printf(m, "\ttrack mode:\t\t%d\n", pd->settings.track_mode);
-
-       if (pd->settings.block_mode == PACKET_BLOCK_MODE1)
-               msg = "Mode 1";
-       else if (pd->settings.block_mode == PACKET_BLOCK_MODE2)
-               msg = "Mode 2";
-       else
-               msg = "Unknown";
-       seq_printf(m, "\tblock mode:\t\t%s\n", msg);
-
-       seq_printf(m, "\nStatistics:\n");
-       seq_printf(m, "\tpackets started:\t%lu\n", pd->stats.pkt_started);
-       seq_printf(m, "\tpackets ended:\t\t%lu\n", pd->stats.pkt_ended);
-       seq_printf(m, "\twritten:\t\t%lukB\n", pd->stats.secs_w >> 1);
-       seq_printf(m, "\tread gather:\t\t%lukB\n", pd->stats.secs_rg >> 1);
-       seq_printf(m, "\tread:\t\t\t%lukB\n", pd->stats.secs_r >> 1);
-
-       seq_printf(m, "\nMisc:\n");
-       seq_printf(m, "\treference count:\t%d\n", pd->refcnt);
-       seq_printf(m, "\tflags:\t\t\t0x%lx\n", pd->flags);
-       seq_printf(m, "\tread speed:\t\t%ukB/s\n", pd->read_speed);
-       seq_printf(m, "\twrite speed:\t\t%ukB/s\n", pd->write_speed);
-       seq_printf(m, "\tstart offset:\t\t%lu\n", pd->offset);
-       seq_printf(m, "\tmode page offset:\t%u\n", pd->mode_offset);
-
-       seq_printf(m, "\nQueue state:\n");
-       seq_printf(m, "\tbios queued:\t\t%d\n", pd->bio_queue_size);
-       seq_printf(m, "\tbios pending:\t\t%d\n", atomic_read(&pd->cdrw.pending_bios));
-       seq_printf(m, "\tcurrent sector:\t\t0x%llx\n", (unsigned long long)pd->current_sector);
-
-       pkt_count_states(pd, states);
-       seq_printf(m, "\tstate:\t\t\ti:%d ow:%d rw:%d ww:%d rec:%d fin:%d\n",
-                  states[0], states[1], states[2], states[3], states[4], states[5]);
-
-       seq_printf(m, "\twrite congestion marks:\toff=%d on=%d\n",
-                       pd->write_congestion_off,
-                       pd->write_congestion_on);
-       return 0;
-}
-
 static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
 {
+       struct device *ddev = disk_to_dev(pd->disk);
        int i;
        struct block_device *bdev;
        struct scsi_device *sdev;
 
        if (pd->pkt_dev == dev) {
-               pkt_err(pd, "recursive setup not allowed\n");
+               dev_err(ddev, "recursive setup not allowed\n");
                return -EBUSY;
        }
        for (i = 0; i < MAX_WRITERS; i++) {
@@ -2521,21 +2504,22 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
                if (!pd2)
                        continue;
                if (pd2->bdev->bd_dev == dev) {
-                       pkt_err(pd, "%pg already setup\n", pd2->bdev);
+                       dev_err(ddev, "%pg already setup\n", pd2->bdev);
                        return -EBUSY;
                }
                if (pd2->pkt_dev == dev) {
-                       pkt_err(pd, "can't chain pktcdvd devices\n");
+                       dev_err(ddev, "can't chain pktcdvd devices\n");
                        return -EBUSY;
                }
        }
 
-       bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_NDELAY, NULL);
+       bdev = blkdev_get_by_dev(dev, BLK_OPEN_READ | BLK_OPEN_NDELAY, NULL,
+                                NULL);
        if (IS_ERR(bdev))
                return PTR_ERR(bdev);
        sdev = scsi_device_from_queue(bdev->bd_disk->queue);
        if (!sdev) {
-               blkdev_put(bdev, FMODE_READ | FMODE_NDELAY);
+               blkdev_put(bdev, NULL);
                return -EINVAL;
        }
        put_device(&sdev->sdev_gendev);
@@ -2549,30 +2533,31 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
        pkt_init_queue(pd);
 
        atomic_set(&pd->cdrw.pending_bios, 0);
-       pd->cdrw.thread = kthread_run(kcdrwd, pd, "%s", pd->name);
+       pd->cdrw.thread = kthread_run(kcdrwd, pd, "%s", pd->disk->disk_name);
        if (IS_ERR(pd->cdrw.thread)) {
-               pkt_err(pd, "can't start kernel thread\n");
+               dev_err(ddev, "can't start kernel thread\n");
                goto out_mem;
        }
 
-       proc_create_single_data(pd->name, 0, pkt_proc, pkt_seq_show, pd);
-       pkt_dbg(1, pd, "writer mapped to %pg\n", bdev);
+       proc_create_single_data(pd->disk->disk_name, 0, pkt_proc, pkt_seq_show, pd);
+       dev_notice(ddev, "writer mapped to %pg\n", bdev);
        return 0;
 
 out_mem:
-       blkdev_put(bdev, FMODE_READ | FMODE_NDELAY);
+       blkdev_put(bdev, NULL);
        /* This is safe: open() is still holding a reference. */
        module_put(THIS_MODULE);
        return -ENOMEM;
 }
 
-static int pkt_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd, unsigned long arg)
+static int pkt_ioctl(struct block_device *bdev, blk_mode_t mode,
+               unsigned int cmd, unsigned long arg)
 {
        struct pktcdvd_device *pd = bdev->bd_disk->private_data;
+       struct device *ddev = disk_to_dev(pd->disk);
        int ret;
 
-       pkt_dbg(2, pd, "cmd %x, dev %d:%d\n",
-               cmd, MAJOR(bdev->bd_dev), MINOR(bdev->bd_dev));
+       dev_dbg(ddev, "cmd %x, dev %d:%d\n", cmd, MAJOR(bdev->bd_dev), MINOR(bdev->bd_dev));
 
        mutex_lock(&pktcdvd_mutex);
        switch (cmd) {
@@ -2598,7 +2583,7 @@ static int pkt_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd,
                        ret = bdev->bd_disk->fops->ioctl(bdev, mode, cmd, arg);
                break;
        default:
-               pkt_dbg(2, pd, "Unknown ioctl (%x)\n", cmd);
+               dev_dbg(ddev, "Unknown ioctl (%x)\n", cmd);
                ret = -ENOTTY;
        }
        mutex_unlock(&pktcdvd_mutex);
@@ -2631,7 +2616,7 @@ static const struct block_device_operations pktcdvd_ops = {
        .owner =                THIS_MODULE,
        .submit_bio =           pkt_submit_bio,
        .open =                 pkt_open,
-       .release =              pkt_close,
+       .release =              pkt_release,
        .ioctl =                pkt_ioctl,
        .compat_ioctl =         blkdev_compat_ptr_ioctl,
        .check_events =         pkt_check_events,
@@ -2676,7 +2661,6 @@ static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev)
        spin_lock_init(&pd->iosched.lock);
        bio_list_init(&pd->iosched.read_queue);
        bio_list_init(&pd->iosched.write_queue);
-       sprintf(pd->name, DRIVER_NAME"%d", idx);
        init_waitqueue_head(&pd->wqueue);
        pd->bio_queue = RB_ROOT;
 
@@ -2693,7 +2677,7 @@ static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev)
        disk->minors = 1;
        disk->fops = &pktcdvd_ops;
        disk->flags = GENHD_FL_REMOVABLE | GENHD_FL_NO_PART;
-       strcpy(disk->disk_name, pd->name);
+       snprintf(disk->disk_name, sizeof(disk->disk_name), DRIVER_NAME"%d", idx);
        disk->private_data = pd;
 
        pd->pkt_dev = MKDEV(pktdev_major, idx);
@@ -2735,6 +2719,7 @@ out_mutex:
 static int pkt_remove_dev(dev_t pkt_dev)
 {
        struct pktcdvd_device *pd;
+       struct device *ddev;
        int idx;
        int ret = 0;
 
@@ -2755,6 +2740,9 @@ static int pkt_remove_dev(dev_t pkt_dev)
                ret = -EBUSY;
                goto out;
        }
+
+       ddev = disk_to_dev(pd->disk);
+
        if (!IS_ERR(pd->cdrw.thread))
                kthread_stop(pd->cdrw.thread);
 
@@ -2763,10 +2751,10 @@ static int pkt_remove_dev(dev_t pkt_dev)
        pkt_debugfs_dev_remove(pd);
        pkt_sysfs_dev_remove(pd);
 
-       blkdev_put(pd->bdev, FMODE_READ | FMODE_NDELAY);
+       blkdev_put(pd->bdev, NULL);
 
-       remove_proc_entry(pd->name, pkt_proc);
-       pkt_dbg(1, pd, "writer unmapped\n");
+       remove_proc_entry(pd->disk->disk_name, pkt_proc);
+       dev_notice(ddev, "writer unmapped\n");
 
        del_gendisk(pd->disk);
        put_disk(pd->disk);
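The pktcdvd hunks above replace the driver-private pkt_err()/pkt_dbg() macros and the pd->name buffer with the generic dev_*() logging helpers keyed off disk_to_dev(pd->disk), and move to the holder-aware blkdev_get_by_dev()/blkdev_put() interface with BLK_OPEN_* flags. A minimal sketch of that pattern, with illustrative names outside the block-layer calls:

	/* illustrative: attach a backing device read-only, with no exclusive holder */
	static int example_attach(struct gendisk *front, dev_t backing)
	{
		struct device *ddev = disk_to_dev(front);
		struct block_device *bdev;

		bdev = blkdev_get_by_dev(backing, BLK_OPEN_READ | BLK_OPEN_NDELAY,
					 NULL, NULL);	/* holder, holder_ops */
		if (IS_ERR(bdev)) {
			dev_err(ddev, "cannot open backing device\n");
			return PTR_ERR(bdev);
		}
		dev_notice(ddev, "writer mapped to %pg\n", bdev);
		/* ... */
		blkdev_put(bdev, NULL);		/* holder must match the get */
		return 0;
	}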
index 84ad3b1..bd0e075 100644 (file)
@@ -660,9 +660,9 @@ static bool pending_result_dec(struct pending_result *pending, int *result)
        return true;
 }
 
-static int rbd_open(struct block_device *bdev, fmode_t mode)
+static int rbd_open(struct gendisk *disk, blk_mode_t mode)
 {
-       struct rbd_device *rbd_dev = bdev->bd_disk->private_data;
+       struct rbd_device *rbd_dev = disk->private_data;
        bool removing = false;
 
        spin_lock_irq(&rbd_dev->lock);
@@ -679,7 +679,7 @@ static int rbd_open(struct block_device *bdev, fmode_t mode)
        return 0;
 }
 
-static void rbd_release(struct gendisk *disk, fmode_t mode)
+static void rbd_release(struct gendisk *disk)
 {
        struct rbd_device *rbd_dev = disk->private_data;
        unsigned long open_count_before;
@@ -1334,14 +1334,30 @@ static bool rbd_obj_is_tail(struct rbd_obj_request *obj_req)
 /*
  * Must be called after rbd_obj_calc_img_extents().
  */
-static bool rbd_obj_copyup_enabled(struct rbd_obj_request *obj_req)
+static void rbd_obj_set_copyup_enabled(struct rbd_obj_request *obj_req)
 {
-       if (!obj_req->num_img_extents ||
-           (rbd_obj_is_entire(obj_req) &&
-            !obj_req->img_request->snapc->num_snaps))
-               return false;
+       rbd_assert(obj_req->img_request->snapc);
 
-       return true;
+       if (obj_req->img_request->op_type == OBJ_OP_DISCARD) {
+               dout("%s %p objno %llu discard\n", __func__, obj_req,
+                    obj_req->ex.oe_objno);
+               return;
+       }
+
+       if (!obj_req->num_img_extents) {
+               dout("%s %p objno %llu not overlapping\n", __func__, obj_req,
+                    obj_req->ex.oe_objno);
+               return;
+       }
+
+       if (rbd_obj_is_entire(obj_req) &&
+           !obj_req->img_request->snapc->num_snaps) {
+               dout("%s %p objno %llu entire\n", __func__, obj_req,
+                    obj_req->ex.oe_objno);
+               return;
+       }
+
+       obj_req->flags |= RBD_OBJ_FLAG_COPYUP_ENABLED;
 }
 
 static u64 rbd_obj_img_extents_bytes(struct rbd_obj_request *obj_req)
@@ -1442,6 +1458,7 @@ __rbd_obj_add_osd_request(struct rbd_obj_request *obj_req,
 static struct ceph_osd_request *
 rbd_obj_add_osd_request(struct rbd_obj_request *obj_req, int num_ops)
 {
+       rbd_assert(obj_req->img_request->snapc);
        return __rbd_obj_add_osd_request(obj_req, obj_req->img_request->snapc,
                                         num_ops);
 }
@@ -1578,15 +1595,18 @@ static void rbd_img_request_init(struct rbd_img_request *img_request,
        mutex_init(&img_request->state_mutex);
 }
 
+/*
+ * Only snap_id is captured here, for reads.  For writes, the snapshot
+ * context is captured in rbd_img_object_requests(), once the exclusive
+ * lock is guaranteed to be held.
+ */
 static void rbd_img_capture_header(struct rbd_img_request *img_req)
 {
        struct rbd_device *rbd_dev = img_req->rbd_dev;
 
        lockdep_assert_held(&rbd_dev->header_rwsem);
 
-       if (rbd_img_is_write(img_req))
-               img_req->snapc = ceph_get_snap_context(rbd_dev->header.snapc);
-       else
+       if (!rbd_img_is_write(img_req))
                img_req->snap_id = rbd_dev->spec->snap_id;
 
        if (rbd_dev_parent_get(rbd_dev))
@@ -2233,9 +2253,6 @@ static int rbd_obj_init_write(struct rbd_obj_request *obj_req)
        if (ret)
                return ret;
 
-       if (rbd_obj_copyup_enabled(obj_req))
-               obj_req->flags |= RBD_OBJ_FLAG_COPYUP_ENABLED;
-
        obj_req->write_state = RBD_OBJ_WRITE_START;
        return 0;
 }
@@ -2341,8 +2358,6 @@ static int rbd_obj_init_zeroout(struct rbd_obj_request *obj_req)
        if (ret)
                return ret;
 
-       if (rbd_obj_copyup_enabled(obj_req))
-               obj_req->flags |= RBD_OBJ_FLAG_COPYUP_ENABLED;
        if (!obj_req->num_img_extents) {
                obj_req->flags |= RBD_OBJ_FLAG_NOOP_FOR_NONEXISTENT;
                if (rbd_obj_is_entire(obj_req))
@@ -3286,6 +3301,7 @@ again:
        case RBD_OBJ_WRITE_START:
                rbd_assert(!*result);
 
+               rbd_obj_set_copyup_enabled(obj_req);
                if (rbd_obj_write_is_noop(obj_req))
                        return true;
 
@@ -3472,9 +3488,19 @@ static int rbd_img_exclusive_lock(struct rbd_img_request *img_req)
 
 static void rbd_img_object_requests(struct rbd_img_request *img_req)
 {
+       struct rbd_device *rbd_dev = img_req->rbd_dev;
        struct rbd_obj_request *obj_req;
 
        rbd_assert(!img_req->pending.result && !img_req->pending.num_pending);
+       rbd_assert(!need_exclusive_lock(img_req) ||
+                  __rbd_is_lock_owner(rbd_dev));
+
+       if (rbd_img_is_write(img_req)) {
+               rbd_assert(!img_req->snapc);
+               down_read(&rbd_dev->header_rwsem);
+               img_req->snapc = ceph_get_snap_context(rbd_dev->header.snapc);
+               up_read(&rbd_dev->header_rwsem);
+       }
 
        for_each_obj_request(img_req, obj_req) {
                int result = 0;
@@ -3492,7 +3518,6 @@ static void rbd_img_object_requests(struct rbd_img_request *img_req)
 
 static bool rbd_img_advance(struct rbd_img_request *img_req, int *result)
 {
-       struct rbd_device *rbd_dev = img_req->rbd_dev;
        int ret;
 
 again:
@@ -3513,9 +3538,6 @@ again:
                if (*result)
                        return true;
 
-               rbd_assert(!need_exclusive_lock(img_req) ||
-                          __rbd_is_lock_owner(rbd_dev));
-
                rbd_img_object_requests(img_req);
                if (!img_req->pending.num_pending) {
                        *result = img_req->pending.result;
@@ -3977,6 +3999,10 @@ static int rbd_post_acquire_action(struct rbd_device *rbd_dev)
 {
        int ret;
 
+       ret = rbd_dev_refresh(rbd_dev);
+       if (ret)
+               return ret;
+
        if (rbd_dev->header.features & RBD_FEATURE_OBJECT_MAP) {
                ret = rbd_object_map_open(rbd_dev);
                if (ret)
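The rbd hunks above reorder snapshot-context capture for writes: rbd_img_capture_header() now records only snap_id for reads, the post-acquire path refreshes the header right after the exclusive lock is obtained, and rbd_img_object_requests() takes header_rwsem to grab the write snapc before any object request is issued, with copyup eligibility decided per object at RBD_OBJ_WRITE_START. A condensed sketch of the resulting write-side ordering (not a literal excerpt):

	/* condensed ordering sketch; locking and error handling elided */
	static void example_rbd_write_order(struct rbd_img_request *img_req)
	{
		struct rbd_device *rbd_dev = img_req->rbd_dev;

		/* 1. exclusive lock acquired, header refreshed in post-acquire */
		rbd_assert(!need_exclusive_lock(img_req) ||
			   __rbd_is_lock_owner(rbd_dev));

		/* 2. snapshot context captured only now, under header_rwsem */
		down_read(&rbd_dev->header_rwsem);
		img_req->snapc = ceph_get_snap_context(rbd_dev->header.snapc);
		up_read(&rbd_dev->header_rwsem);

		/* 3. per-object write state machine then decides copyup */
	}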
index 40b3163..208e5f8 100644 (file)
@@ -3,13 +3,11 @@
 ccflags-y := -I$(srctree)/drivers/infiniband/ulp/rtrs
 
 rnbd-client-y := rnbd-clt.o \
-                 rnbd-clt-sysfs.o \
-                 rnbd-common.o
+                 rnbd-clt-sysfs.o
 
 CFLAGS_rnbd-srv-trace.o = -I$(src)
 
-rnbd-server-y := rnbd-common.o \
-                 rnbd-srv.o \
+rnbd-server-y := rnbd-srv.o \
                  rnbd-srv-sysfs.o \
                  rnbd-srv-trace.o
 
index 8c60879..c36d8b1 100644 (file)
@@ -24,7 +24,9 @@
 #include "rnbd-clt.h"
 
 static struct device *rnbd_dev;
-static struct class *rnbd_dev_class;
+static const struct class rnbd_dev_class = {
+       .name = "rnbd_client",
+};
 static struct kobject *rnbd_devs_kobj;
 
 enum {
@@ -278,7 +280,7 @@ static ssize_t access_mode_show(struct kobject *kobj,
 
        dev = container_of(kobj, struct rnbd_clt_dev, kobj);
 
-       return sysfs_emit(page, "%s\n", rnbd_access_mode_str(dev->access_mode));
+       return sysfs_emit(page, "%s\n", rnbd_access_modes[dev->access_mode].str);
 }
 
 static struct kobj_attribute rnbd_clt_access_mode =
@@ -596,7 +598,7 @@ static ssize_t rnbd_clt_map_device_store(struct kobject *kobj,
 
        pr_info("Mapping device %s on session %s, (access_mode: %s, nr_poll_queues: %d)\n",
                pathname, sessname,
-               rnbd_access_mode_str(access_mode),
+               rnbd_access_modes[access_mode].str,
                nr_poll_queues);
 
        dev = rnbd_clt_map_device(sessname, paths, path_cnt, port_nr, pathname,
@@ -646,11 +648,11 @@ int rnbd_clt_create_sysfs_files(void)
 {
        int err;
 
-       rnbd_dev_class = class_create("rnbd-client");
-       if (IS_ERR(rnbd_dev_class))
-               return PTR_ERR(rnbd_dev_class);
+       err = class_register(&rnbd_dev_class);
+       if (err)
+               return err;
 
-       rnbd_dev = device_create_with_groups(rnbd_dev_class, NULL,
+       rnbd_dev = device_create_with_groups(&rnbd_dev_class, NULL,
                                              MKDEV(0, 0), NULL,
                                              default_attr_groups, "ctl");
        if (IS_ERR(rnbd_dev)) {
@@ -666,9 +668,9 @@ int rnbd_clt_create_sysfs_files(void)
        return 0;
 
 dev_destroy:
-       device_destroy(rnbd_dev_class, MKDEV(0, 0));
+       device_destroy(&rnbd_dev_class, MKDEV(0, 0));
 cls_destroy:
-       class_destroy(rnbd_dev_class);
+       class_unregister(&rnbd_dev_class);
 
        return err;
 }
@@ -678,6 +680,6 @@ void rnbd_clt_destroy_sysfs_files(void)
        sysfs_remove_group(&rnbd_dev->kobj, &default_attr_group);
        kobject_del(rnbd_devs_kobj);
        kobject_put(rnbd_devs_kobj);
-       device_destroy(rnbd_dev_class, MKDEV(0, 0));
-       class_destroy(rnbd_dev_class);
+       device_destroy(&rnbd_dev_class, MKDEV(0, 0));
+       class_unregister(&rnbd_dev_class);
 }
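The sysfs hunks above switch from a class allocated at runtime by class_create() to a statically defined struct class registered with class_register(); device_create()/device_destroy() take a pointer to that static object and teardown uses class_unregister(). A minimal sketch of the pattern, with illustrative names:

	static const struct class example_class = {
		.name = "example",
	};

	static int example_sysfs_init(void)
	{
		struct device *dev;
		int err;

		err = class_register(&example_class);
		if (err)
			return err;

		dev = device_create(&example_class, NULL, MKDEV(0, 0), NULL, "ctl");
		if (IS_ERR(dev)) {
			class_unregister(&example_class);
			return PTR_ERR(dev);
		}
		return 0;
	}

	static void example_sysfs_exit(void)
	{
		device_destroy(&example_class, MKDEV(0, 0));
		class_unregister(&example_class);
	}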
index 5eb8c78..b0550b6 100644 (file)
@@ -921,11 +921,11 @@ rnbd_clt_session *find_or_create_sess(const char *sessname, bool *first)
        return sess;
 }
 
-static int rnbd_client_open(struct block_device *block_device, fmode_t mode)
+static int rnbd_client_open(struct gendisk *disk, blk_mode_t mode)
 {
-       struct rnbd_clt_dev *dev = block_device->bd_disk->private_data;
+       struct rnbd_clt_dev *dev = disk->private_data;
 
-       if (get_disk_ro(dev->gd) && (mode & FMODE_WRITE))
+       if (get_disk_ro(dev->gd) && (mode & BLK_OPEN_WRITE))
                return -EPERM;
 
        if (dev->dev_state == DEV_STATE_UNMAPPED ||
@@ -935,7 +935,7 @@ static int rnbd_client_open(struct block_device *block_device, fmode_t mode)
        return 0;
 }
 
-static void rnbd_client_release(struct gendisk *gen, fmode_t mode)
+static void rnbd_client_release(struct gendisk *gen)
 {
        struct rnbd_clt_dev *dev = gen->private_data;
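This hunk tracks the block-layer interface change in which ->open() receives the gendisk and a blk_mode_t rather than a block_device and fmode_t, and ->release() loses its mode argument; the FMODE_* checks become BLK_OPEN_*. A minimal sketch of block_device_operations using the new prototypes (names are illustrative):

	static int example_open(struct gendisk *disk, blk_mode_t mode)
	{
		/* write opens are refused on a read-only disk */
		if (get_disk_ro(disk) && (mode & BLK_OPEN_WRITE))
			return -EPERM;
		return 0;
	}

	static void example_release(struct gendisk *disk)
	{
		/* no mode argument any more; per-open state lives elsewhere */
	}

	static const struct block_device_operations example_fops = {
		.owner   = THIS_MODULE,
		.open    = example_open,
		.release = example_release,
	};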
 
diff --git a/drivers/block/rnbd/rnbd-common.c b/drivers/block/rnbd/rnbd-common.c
deleted file mode 100644 (file)
index 596c3f7..0000000
+++ /dev/null
@@ -1,23 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * RDMA Network Block Driver
- *
- * Copyright (c) 2014 - 2018 ProfitBricks GmbH. All rights reserved.
- * Copyright (c) 2018 - 2019 1&1 IONOS Cloud GmbH. All rights reserved.
- * Copyright (c) 2019 - 2020 1&1 IONOS SE. All rights reserved.
- */
-#include "rnbd-proto.h"
-
-const char *rnbd_access_mode_str(enum rnbd_access_mode mode)
-{
-       switch (mode) {
-       case RNBD_ACCESS_RO:
-               return "ro";
-       case RNBD_ACCESS_RW:
-               return "rw";
-       case RNBD_ACCESS_MIGRATION:
-               return "migration";
-       default:
-               return "unknown";
-       }
-}
index ea7ac8b..e32f8f2 100644 (file)
@@ -61,6 +61,15 @@ enum rnbd_access_mode {
        RNBD_ACCESS_MIGRATION,
 };
 
+static const __maybe_unused struct {
+       enum rnbd_access_mode mode;
+       const char *str;
+} rnbd_access_modes[] = {
+       [RNBD_ACCESS_RO] = {RNBD_ACCESS_RO, "ro"},
+       [RNBD_ACCESS_RW] = {RNBD_ACCESS_RW, "rw"},
+       [RNBD_ACCESS_MIGRATION] = {RNBD_ACCESS_MIGRATION, "migration"},
+};
+
 /**
  * struct rnbd_msg_sess_info - initial session info from client to server
  * @hdr:               message header
@@ -185,7 +194,6 @@ struct rnbd_msg_io {
 enum rnbd_io_flags {
 
        /* Operations */
-
        RNBD_OP_READ            = 0,
        RNBD_OP_WRITE           = 1,
        RNBD_OP_FLUSH           = 2,
@@ -193,15 +201,9 @@ enum rnbd_io_flags {
        RNBD_OP_SECURE_ERASE    = 4,
        RNBD_OP_WRITE_SAME      = 5,
 
-       RNBD_OP_LAST,
-
        /* Flags */
-
        RNBD_F_SYNC  = 1<<(RNBD_OP_BITS + 0),
        RNBD_F_FUA   = 1<<(RNBD_OP_BITS + 1),
-
-       RNBD_F_ALL   = (RNBD_F_SYNC | RNBD_F_FUA)
-
 };
 
 static inline u32 rnbd_op(u32 flags)
@@ -214,21 +216,6 @@ static inline u32 rnbd_flags(u32 flags)
        return flags & ~RNBD_OP_MASK;
 }
 
-static inline bool rnbd_flags_supported(u32 flags)
-{
-       u32 op;
-
-       op = rnbd_op(flags);
-       flags = rnbd_flags(flags);
-
-       if (op >= RNBD_OP_LAST)
-               return false;
-       if (flags & ~RNBD_F_ALL)
-               return false;
-
-       return true;
-}
-
 static inline blk_opf_t rnbd_to_bio_flags(u32 rnbd_opf)
 {
        blk_opf_t bio_opf;
@@ -241,7 +228,7 @@ static inline blk_opf_t rnbd_to_bio_flags(u32 rnbd_opf)
                bio_opf = REQ_OP_WRITE;
                break;
        case RNBD_OP_FLUSH:
-               bio_opf = REQ_OP_FLUSH | REQ_PREFLUSH;
+               bio_opf = REQ_OP_WRITE | REQ_PREFLUSH;
                break;
        case RNBD_OP_DISCARD:
                bio_opf = REQ_OP_DISCARD;
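With rnbd-common.c removed, the access-mode names live in the rnbd_access_modes[] table above and callers index it directly; the header also drops the unused RNBD_OP_LAST/RNBD_F_ALL helpers and maps RNBD_OP_FLUSH to REQ_OP_WRITE | REQ_PREFLUSH, since a flush is submitted as an empty write carrying the preflush flag rather than as a standalone REQ_OP_FLUSH bio. A short, hypothetical caller of the table and helpers:

	/* hypothetical caller; rnbd_op()/rnbd_flags() split the wire value */
	static void example_log_io(u32 rnbd_opf, enum rnbd_access_mode mode)
	{
		u32 op = rnbd_op(rnbd_opf);		/* low bits: operation */
		u32 flags = rnbd_flags(rnbd_opf);	/* RNBD_F_SYNC / RNBD_F_FUA */

		pr_info("op=%u flags=0x%x access=%s\n",
			op, flags, rnbd_access_modes[mode].str);
	}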
index d5d9267..cba6ba4 100644 (file)
@@ -9,7 +9,6 @@
 #undef pr_fmt
 #define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
 
-#include <uapi/linux/limits.h>
 #include <linux/kobject.h>
 #include <linux/sysfs.h>
 #include <linux/stat.h>
@@ -20,7 +19,9 @@
 #include "rnbd-srv.h"
 
 static struct device *rnbd_dev;
-static struct class *rnbd_dev_class;
+static const struct class rnbd_dev_class = {
+       .name = "rnbd-server",
+};
 static struct kobject *rnbd_devs_kobj;
 
 static void rnbd_srv_dev_release(struct kobject *kobj)
@@ -88,8 +89,7 @@ static ssize_t read_only_show(struct kobject *kobj, struct kobj_attribute *attr,
 
        sess_dev = container_of(kobj, struct rnbd_srv_sess_dev, kobj);
 
-       return sysfs_emit(page, "%d\n",
-                         !(sess_dev->open_flags & FMODE_WRITE));
+       return sysfs_emit(page, "%d\n", sess_dev->readonly);
 }
 
 static struct kobj_attribute rnbd_srv_dev_session_ro_attr =
@@ -104,7 +104,7 @@ static ssize_t access_mode_show(struct kobject *kobj,
        sess_dev = container_of(kobj, struct rnbd_srv_sess_dev, kobj);
 
        return sysfs_emit(page, "%s\n",
-                         rnbd_access_mode_str(sess_dev->access_mode));
+                         rnbd_access_modes[sess_dev->access_mode].str);
 }
 
 static struct kobj_attribute rnbd_srv_dev_session_access_mode_attr =
@@ -215,12 +215,12 @@ int rnbd_srv_create_sysfs_files(void)
 {
        int err;
 
-       rnbd_dev_class = class_create("rnbd-server");
-       if (IS_ERR(rnbd_dev_class))
-               return PTR_ERR(rnbd_dev_class);
+       err = class_register(&rnbd_dev_class);
+       if (err)
+               return err;
 
-       rnbd_dev = device_create(rnbd_dev_class, NULL,
-                                 MKDEV(0, 0), NULL, "ctl");
+       rnbd_dev = device_create(&rnbd_dev_class, NULL,
+                                MKDEV(0, 0), NULL, "ctl");
        if (IS_ERR(rnbd_dev)) {
                err = PTR_ERR(rnbd_dev);
                goto cls_destroy;
@@ -234,9 +234,9 @@ int rnbd_srv_create_sysfs_files(void)
        return 0;
 
 dev_destroy:
-       device_destroy(rnbd_dev_class, MKDEV(0, 0));
+       device_destroy(&rnbd_dev_class, MKDEV(0, 0));
 cls_destroy:
-       class_destroy(rnbd_dev_class);
+       class_unregister(&rnbd_dev_class);
 
        return err;
 }
@@ -245,6 +245,6 @@ void rnbd_srv_destroy_sysfs_files(void)
 {
        kobject_del(rnbd_devs_kobj);
        kobject_put(rnbd_devs_kobj);
-       device_destroy(rnbd_dev_class, MKDEV(0, 0));
-       class_destroy(rnbd_dev_class);
+       device_destroy(&rnbd_dev_class, MKDEV(0, 0));
+       class_unregister(&rnbd_dev_class);
 }
index 2cfed2e..c186df0 100644 (file)
@@ -96,7 +96,7 @@ rnbd_get_sess_dev(int dev_id, struct rnbd_srv_session *srv_sess)
                ret = kref_get_unless_zero(&sess_dev->kref);
        rcu_read_unlock();
 
-       if (!sess_dev || !ret)
+       if (!ret)
                return ERR_PTR(-ENXIO);
 
        return sess_dev;
@@ -180,7 +180,7 @@ static void destroy_device(struct kref *kref)
 
        WARN_ONCE(!list_empty(&dev->sess_dev_list),
                  "Device %s is being destroyed but still in use!\n",
-                 dev->id);
+                 dev->name);
 
        spin_lock(&dev_lock);
        list_del(&dev->list);
@@ -219,10 +219,10 @@ void rnbd_destroy_sess_dev(struct rnbd_srv_sess_dev *sess_dev, bool keep_id)
        rnbd_put_sess_dev(sess_dev);
        wait_for_completion(&dc); /* wait for inflights to drop to zero */
 
-       blkdev_put(sess_dev->bdev, sess_dev->open_flags);
+       blkdev_put(sess_dev->bdev, NULL);
        mutex_lock(&sess_dev->dev->lock);
        list_del(&sess_dev->dev_list);
-       if (sess_dev->open_flags & FMODE_WRITE)
+       if (!sess_dev->readonly)
                sess_dev->dev->open_write_cnt--;
        mutex_unlock(&sess_dev->dev->lock);
 
@@ -356,7 +356,7 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
                            const void *msg, size_t len,
                            void *data, size_t datalen);
 
-static int process_msg_sess_info(struct rnbd_srv_session *srv_sess,
+static void process_msg_sess_info(struct rnbd_srv_session *srv_sess,
                                 const void *msg, size_t len,
                                 void *data, size_t datalen);
 
@@ -384,8 +384,7 @@ static int rnbd_srv_rdma_ev(void *priv, struct rtrs_srv_op *id,
                ret = process_msg_open(srv_sess, usr, usrlen, data, datalen);
                break;
        case RNBD_MSG_SESS_INFO:
-               ret = process_msg_sess_info(srv_sess, usr, usrlen, data,
-                                           datalen);
+               process_msg_sess_info(srv_sess, usr, usrlen, data, datalen);
                break;
        default:
                pr_warn("Received unexpected message type %d from session %s\n",
@@ -431,7 +430,7 @@ static struct rnbd_srv_dev *rnbd_srv_init_srv_dev(struct block_device *bdev)
        if (!dev)
                return ERR_PTR(-ENOMEM);
 
-       snprintf(dev->id, sizeof(dev->id), "%pg", bdev);
+       snprintf(dev->name, sizeof(dev->name), "%pg", bdev);
        kref_init(&dev->kref);
        INIT_LIST_HEAD(&dev->sess_dev_list);
        mutex_init(&dev->lock);
@@ -446,7 +445,7 @@ rnbd_srv_find_or_add_srv_dev(struct rnbd_srv_dev *new_dev)
 
        spin_lock(&dev_lock);
        list_for_each_entry(dev, &dev_list, list) {
-               if (!strncmp(dev->id, new_dev->id, sizeof(dev->id))) {
+               if (!strncmp(dev->name, new_dev->name, sizeof(dev->name))) {
                        if (!kref_get_unless_zero(&dev->kref))
                                /*
                                 * We lost the race, device is almost dead.
@@ -467,39 +466,38 @@ static int rnbd_srv_check_update_open_perm(struct rnbd_srv_dev *srv_dev,
                                            struct rnbd_srv_session *srv_sess,
                                            enum rnbd_access_mode access_mode)
 {
-       int ret = -EPERM;
+       int ret = 0;
 
        mutex_lock(&srv_dev->lock);
 
        switch (access_mode) {
        case RNBD_ACCESS_RO:
-               ret = 0;
                break;
        case RNBD_ACCESS_RW:
                if (srv_dev->open_write_cnt == 0)  {
                        srv_dev->open_write_cnt++;
-                       ret = 0;
                } else {
                        pr_err("Mapping device '%s' for session %s with RW permissions failed. Device already opened as 'RW' by %d client(s), access mode %s.\n",
-                              srv_dev->id, srv_sess->sessname,
+                              srv_dev->name, srv_sess->sessname,
                               srv_dev->open_write_cnt,
-                              rnbd_access_mode_str(access_mode));
+                              rnbd_access_modes[access_mode].str);
+                       ret = -EPERM;
                }
                break;
        case RNBD_ACCESS_MIGRATION:
                if (srv_dev->open_write_cnt < 2) {
                        srv_dev->open_write_cnt++;
-                       ret = 0;
                } else {
                        pr_err("Mapping device '%s' for session %s with migration permissions failed. Device already opened as 'RW' by %d client(s), access mode %s.\n",
-                              srv_dev->id, srv_sess->sessname,
+                              srv_dev->name, srv_sess->sessname,
                               srv_dev->open_write_cnt,
-                              rnbd_access_mode_str(access_mode));
+                              rnbd_access_modes[access_mode].str);
+                       ret = -EPERM;
                }
                break;
        default:
                pr_err("Received mapping request for device '%s' on session %s with invalid access mode: %d\n",
-                      srv_dev->id, srv_sess->sessname, access_mode);
+                      srv_dev->name, srv_sess->sessname, access_mode);
                ret = -EINVAL;
        }
 
@@ -561,7 +559,7 @@ static void rnbd_srv_fill_msg_open_rsp(struct rnbd_msg_open_rsp *rsp,
 static struct rnbd_srv_sess_dev *
 rnbd_srv_create_set_sess_dev(struct rnbd_srv_session *srv_sess,
                              const struct rnbd_msg_open *open_msg,
-                             struct block_device *bdev, fmode_t open_flags,
+                             struct block_device *bdev, bool readonly,
                              struct rnbd_srv_dev *srv_dev)
 {
        struct rnbd_srv_sess_dev *sdev = rnbd_sess_dev_alloc(srv_sess);
@@ -576,7 +574,7 @@ rnbd_srv_create_set_sess_dev(struct rnbd_srv_session *srv_sess,
        sdev->bdev              = bdev;
        sdev->sess              = srv_sess;
        sdev->dev               = srv_dev;
-       sdev->open_flags        = open_flags;
+       sdev->readonly          = readonly;
        sdev->access_mode       = open_msg->access_mode;
 
        return sdev;
@@ -631,7 +629,7 @@ static char *rnbd_srv_get_full_path(struct rnbd_srv_session *srv_sess,
        return full_path;
 }
 
-static int process_msg_sess_info(struct rnbd_srv_session *srv_sess,
+static void process_msg_sess_info(struct rnbd_srv_session *srv_sess,
                                 const void *msg, size_t len,
                                 void *data, size_t datalen)
 {
@@ -644,8 +642,6 @@ static int process_msg_sess_info(struct rnbd_srv_session *srv_sess,
 
        rsp->hdr.type = cpu_to_le16(RNBD_MSG_SESS_INFO_RSP);
        rsp->ver = srv_sess->ver;
-
-       return 0;
 }
 
 /**
@@ -681,15 +677,14 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
        struct rnbd_srv_sess_dev *srv_sess_dev;
        const struct rnbd_msg_open *open_msg = msg;
        struct block_device *bdev;
-       fmode_t open_flags;
+       blk_mode_t open_flags = BLK_OPEN_READ;
        char *full_path;
        struct rnbd_msg_open_rsp *rsp = data;
 
        trace_process_msg_open(srv_sess, open_msg);
 
-       open_flags = FMODE_READ;
        if (open_msg->access_mode != RNBD_ACCESS_RO)
-               open_flags |= FMODE_WRITE;
+               open_flags |= BLK_OPEN_WRITE;
 
        mutex_lock(&srv_sess->lock);
 
@@ -719,7 +714,7 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
                goto reject;
        }
 
-       bdev = blkdev_get_by_path(full_path, open_flags, THIS_MODULE);
+       bdev = blkdev_get_by_path(full_path, open_flags, NULL, NULL);
        if (IS_ERR(bdev)) {
                ret = PTR_ERR(bdev);
                pr_err("Opening device '%s' on session %s failed, failed to open the block device, err: %d\n",
@@ -736,9 +731,9 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
                goto blkdev_put;
        }
 
-       srv_sess_dev = rnbd_srv_create_set_sess_dev(srv_sess, open_msg,
-                                                    bdev, open_flags,
-                                                    srv_dev);
+       srv_sess_dev = rnbd_srv_create_set_sess_dev(srv_sess, open_msg, bdev,
+                               open_msg->access_mode == RNBD_ACCESS_RO,
+                               srv_dev);
        if (IS_ERR(srv_sess_dev)) {
                pr_err("Opening device '%s' on session %s failed, creating sess_dev failed, err: %ld\n",
                       full_path, srv_sess->sessname, PTR_ERR(srv_sess_dev));
@@ -774,7 +769,7 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
        list_add(&srv_sess_dev->dev_list, &srv_dev->sess_dev_list);
        mutex_unlock(&srv_dev->lock);
 
-       rnbd_srv_info(srv_sess_dev, "Opened device '%s'\n", srv_dev->id);
+       rnbd_srv_info(srv_sess_dev, "Opened device '%s'\n", srv_dev->name);
 
        kfree(full_path);
 
@@ -795,7 +790,7 @@ srv_dev_put:
        }
        rnbd_put_srv_dev(srv_dev);
 blkdev_put:
-       blkdev_put(bdev, open_flags);
+       blkdev_put(bdev, NULL);
 free_path:
        kfree(full_path);
 reject:
@@ -808,7 +803,7 @@ static struct rtrs_srv_ctx *rtrs_ctx;
 static struct rtrs_srv_ops rtrs_ops;
 static int __init rnbd_srv_init_module(void)
 {
-       int err;
+       int err = 0;
 
        BUILD_BUG_ON(sizeof(struct rnbd_msg_hdr) != 4);
        BUILD_BUG_ON(sizeof(struct rnbd_msg_sess_info) != 36);
@@ -822,19 +817,17 @@ static int __init rnbd_srv_init_module(void)
        };
        rtrs_ctx = rtrs_srv_open(&rtrs_ops, port_nr);
        if (IS_ERR(rtrs_ctx)) {
-               err = PTR_ERR(rtrs_ctx);
                pr_err("rtrs_srv_open(), err: %d\n", err);
-               return err;
+               return PTR_ERR(rtrs_ctx);
        }
 
        err = rnbd_srv_create_sysfs_files();
        if (err) {
                pr_err("rnbd_srv_create_sysfs_files(), err: %d\n", err);
                rtrs_srv_close(rtrs_ctx);
-               return err;
        }
 
-       return 0;
+       return err;
 }
 
 static void __exit rnbd_srv_cleanup_module(void)
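The open-permission rework above keeps the same policy but starts from ret = 0 and only assigns -EPERM/-EINVAL in the failing branches: read-only mappings are always allowed, RW requires that no other writer holds the device, and MIGRATION tolerates at most one existing writer. A condensed sketch of that policy, assuming the same open_write_cnt counter (the real code also logs and holds srv_dev->lock):

	static int example_check_open(struct rnbd_srv_dev *srv_dev,
				      enum rnbd_access_mode mode)
	{
		switch (mode) {
		case RNBD_ACCESS_RO:
			return 0;			/* readers always allowed */
		case RNBD_ACCESS_RW:
			if (srv_dev->open_write_cnt > 0)
				return -EPERM;		/* already has a writer */
			srv_dev->open_write_cnt++;
			return 0;
		case RNBD_ACCESS_MIGRATION:
			if (srv_dev->open_write_cnt >= 2)
				return -EPERM;
			srv_dev->open_write_cnt++;
			return 0;
		default:
			return -EINVAL;
		}
	}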
index f5962fd..1027656 100644 (file)
@@ -35,7 +35,7 @@ struct rnbd_srv_dev {
        struct kobject                  dev_kobj;
        struct kobject                  *dev_sessions_kobj;
        struct kref                     kref;
-       char                            id[NAME_MAX];
+       char                            name[NAME_MAX];
        /* List of rnbd_srv_sess_dev structs */
        struct list_head                sess_dev_list;
        struct mutex                    lock;
@@ -52,7 +52,7 @@ struct rnbd_srv_sess_dev {
        struct kobject                  kobj;
        u32                             device_id;
        bool                            keep_id;
-       fmode_t                         open_flags;
+       bool                            readonly;
        struct kref                     kref;
        struct completion               *destroy_comp;
        char                            pathname[NAME_MAX];
index 9fa821f..7bf4b48 100644 (file)
@@ -139,7 +139,7 @@ static int vdc_getgeo(struct block_device *bdev, struct hd_geometry *geo)
  * when vdisk_mtype is VD_MEDIA_TYPE_CD or VD_MEDIA_TYPE_DVD.
  * Needed to be able to install inside an ldom from an iso image.
  */
-static int vdc_ioctl(struct block_device *bdev, fmode_t mode,
+static int vdc_ioctl(struct block_device *bdev, blk_mode_t mode,
                     unsigned command, unsigned long argument)
 {
        struct vdc_port *port = bdev->bd_disk->private_data;
index 42b4b68..f85b6af 100644 (file)
@@ -608,20 +608,18 @@ static void setup_medium(struct floppy_state *fs)
        }
 }
 
-static int floppy_open(struct block_device *bdev, fmode_t mode)
+static int floppy_open(struct gendisk *disk, blk_mode_t mode)
 {
-       struct floppy_state *fs = bdev->bd_disk->private_data;
+       struct floppy_state *fs = disk->private_data;
        struct swim __iomem *base = fs->swd->base;
        int err;
 
-       if (fs->ref_count == -1 || (fs->ref_count && mode & FMODE_EXCL))
+       if (fs->ref_count == -1 || (fs->ref_count && mode & BLK_OPEN_EXCL))
                return -EBUSY;
-
-       if (mode & FMODE_EXCL)
+       if (mode & BLK_OPEN_EXCL)
                fs->ref_count = -1;
        else
                fs->ref_count++;
-
        swim_write(base, setup, S_IBM_DRIVE  | S_FCLK_DIV2);
        udelay(10);
        swim_drive(base, fs->location);
@@ -636,13 +634,13 @@ static int floppy_open(struct block_device *bdev, fmode_t mode)
 
        set_capacity(fs->disk, fs->total_secs);
 
-       if (mode & FMODE_NDELAY)
+       if (mode & BLK_OPEN_NDELAY)
                return 0;
 
-       if (mode & (FMODE_READ|FMODE_WRITE)) {
-               if (bdev_check_media_change(bdev) && fs->disk_in)
+       if (mode & (BLK_OPEN_READ | BLK_OPEN_WRITE)) {
+               if (disk_check_media_change(disk) && fs->disk_in)
                        fs->ejected = 0;
-               if ((mode & FMODE_WRITE) && fs->write_protected) {
+               if ((mode & BLK_OPEN_WRITE) && fs->write_protected) {
                        err = -EROFS;
                        goto out;
                }
@@ -659,18 +657,18 @@ out:
        return err;
 }
 
-static int floppy_unlocked_open(struct block_device *bdev, fmode_t mode)
+static int floppy_unlocked_open(struct gendisk *disk, blk_mode_t mode)
 {
        int ret;
 
        mutex_lock(&swim_mutex);
-       ret = floppy_open(bdev, mode);
+       ret = floppy_open(disk, mode);
        mutex_unlock(&swim_mutex);
 
        return ret;
 }
 
-static void floppy_release(struct gendisk *disk, fmode_t mode)
+static void floppy_release(struct gendisk *disk)
 {
        struct floppy_state *fs = disk->private_data;
        struct swim __iomem *base = fs->swd->base;
@@ -686,7 +684,7 @@ static void floppy_release(struct gendisk *disk, fmode_t mode)
        mutex_unlock(&swim_mutex);
 }
 
-static int floppy_ioctl(struct block_device *bdev, fmode_t mode,
+static int floppy_ioctl(struct block_device *bdev, blk_mode_t mode,
                        unsigned int cmd, unsigned long param)
 {
        struct floppy_state *fs = bdev->bd_disk->private_data;
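The swim hunks (and the swim3 hunks that follow) pick up the same gendisk-based open/release prototypes and switch from bdev_check_media_change() to disk_check_media_change(), which only needs the gendisk now that ->open() no longer sees a block_device. A minimal sketch of the media-change handling in the new form, with an illustrative driver state structure:

	static int example_floppy_open(struct gendisk *disk, blk_mode_t mode)
	{
		struct example_floppy_state *fs = disk->private_data;

		if (mode & BLK_OPEN_NDELAY)
			return 0;		/* no media access for O_NDELAY opens */

		if (mode & (BLK_OPEN_READ | BLK_OPEN_WRITE)) {
			if (disk_check_media_change(disk) && fs->disk_in)
				fs->ejected = 0;	/* fresh medium detected */
			if ((mode & BLK_OPEN_WRITE) && fs->write_protected)
				return -EROFS;
		}
		return 0;
	}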
index da811a7..dc43a63 100644 (file)
@@ -246,10 +246,9 @@ static int grab_drive(struct floppy_state *fs, enum swim_state state,
                      int interruptible);
 static void release_drive(struct floppy_state *fs);
 static int fd_eject(struct floppy_state *fs);
-static int floppy_ioctl(struct block_device *bdev, fmode_t mode,
+static int floppy_ioctl(struct block_device *bdev, blk_mode_t mode,
                        unsigned int cmd, unsigned long param);
-static int floppy_open(struct block_device *bdev, fmode_t mode);
-static void floppy_release(struct gendisk *disk, fmode_t mode);
+static int floppy_open(struct gendisk *disk, blk_mode_t mode);
 static unsigned int floppy_check_events(struct gendisk *disk,
                                        unsigned int clearing);
 static int floppy_revalidate(struct gendisk *disk);
@@ -883,7 +882,7 @@ static int fd_eject(struct floppy_state *fs)
 static struct floppy_struct floppy_type =
        { 2880,18,2,80,0,0x1B,0x00,0xCF,0x6C,NULL };    /*  7 1.44MB 3.5"   */
 
-static int floppy_locked_ioctl(struct block_device *bdev, fmode_t mode,
+static int floppy_locked_ioctl(struct block_device *bdev, blk_mode_t mode,
                        unsigned int cmd, unsigned long param)
 {
        struct floppy_state *fs = bdev->bd_disk->private_data;
@@ -911,7 +910,7 @@ static int floppy_locked_ioctl(struct block_device *bdev, fmode_t mode,
        return -ENOTTY;
 }
 
-static int floppy_ioctl(struct block_device *bdev, fmode_t mode,
+static int floppy_ioctl(struct block_device *bdev, blk_mode_t mode,
                                 unsigned int cmd, unsigned long param)
 {
        int ret;
@@ -923,9 +922,9 @@ static int floppy_ioctl(struct block_device *bdev, fmode_t mode,
        return ret;
 }
 
-static int floppy_open(struct block_device *bdev, fmode_t mode)
+static int floppy_open(struct gendisk *disk, blk_mode_t mode)
 {
-       struct floppy_state *fs = bdev->bd_disk->private_data;
+       struct floppy_state *fs = disk->private_data;
        struct swim3 __iomem *sw = fs->swim3;
        int n, err = 0;
 
@@ -958,18 +957,18 @@ static int floppy_open(struct block_device *bdev, fmode_t mode)
                swim3_action(fs, SETMFM);
                swim3_select(fs, RELAX);
 
-       } else if (fs->ref_count == -1 || mode & FMODE_EXCL)
+       } else if (fs->ref_count == -1 || mode & BLK_OPEN_EXCL)
                return -EBUSY;
 
-       if (err == 0 && (mode & FMODE_NDELAY) == 0
-           && (mode & (FMODE_READ|FMODE_WRITE))) {
-               if (bdev_check_media_change(bdev))
-                       floppy_revalidate(bdev->bd_disk);
+       if (err == 0 && !(mode & BLK_OPEN_NDELAY) &&
+           (mode & (BLK_OPEN_READ | BLK_OPEN_WRITE))) {
+               if (disk_check_media_change(disk))
+                       floppy_revalidate(disk);
                if (fs->ejected)
                        err = -ENXIO;
        }
 
-       if (err == 0 && (mode & FMODE_WRITE)) {
+       if (err == 0 && (mode & BLK_OPEN_WRITE)) {
                if (fs->write_prot < 0)
                        fs->write_prot = swim3_readbit(fs, WRITE_PROT);
                if (fs->write_prot)
@@ -985,7 +984,7 @@ static int floppy_open(struct block_device *bdev, fmode_t mode)
                return err;
        }
 
-       if (mode & FMODE_EXCL)
+       if (mode & BLK_OPEN_EXCL)
                fs->ref_count = -1;
        else
                ++fs->ref_count;
@@ -993,18 +992,18 @@ static int floppy_open(struct block_device *bdev, fmode_t mode)
        return 0;
 }
 
-static int floppy_unlocked_open(struct block_device *bdev, fmode_t mode)
+static int floppy_unlocked_open(struct gendisk *disk, blk_mode_t mode)
 {
        int ret;
 
        mutex_lock(&swim3_mutex);
-       ret = floppy_open(bdev, mode);
+       ret = floppy_open(disk, mode);
        mutex_unlock(&swim3_mutex);
 
        return ret;
 }
 
-static void floppy_release(struct gendisk *disk, fmode_t mode)
+static void floppy_release(struct gendisk *disk)
 {
        struct floppy_state *fs = disk->private_data;
        struct swim3 __iomem *sw = fs->swim3;
index c7331f5..1c82375 100644 (file)
@@ -43,6 +43,7 @@
 #include <asm/page.h>
 #include <linux/task_work.h>
 #include <linux/namei.h>
+#include <linux/kref.h>
 #include <uapi/linux/ublk_cmd.h>
 
 #define UBLK_MINORS            (1U << MINORBITS)
@@ -54,7 +55,8 @@
                | UBLK_F_USER_RECOVERY \
                | UBLK_F_USER_RECOVERY_REISSUE \
                | UBLK_F_UNPRIVILEGED_DEV \
-               | UBLK_F_CMD_IOCTL_ENCODE)
+               | UBLK_F_CMD_IOCTL_ENCODE \
+               | UBLK_F_USER_COPY)
 
 /* All UBLK_PARAM_TYPE_* should be included here */
 #define UBLK_PARAM_TYPE_ALL (UBLK_PARAM_TYPE_BASIC | \
@@ -62,7 +64,8 @@
 
 struct ublk_rq_data {
        struct llist_node node;
-       struct callback_head work;
+
+       struct kref ref;
 };
 
 struct ublk_uring_cmd_pdu {
@@ -182,8 +185,13 @@ struct ublk_params_header {
        __u32   types;
 };
 
+static inline void __ublk_complete_rq(struct request *req);
+static void ublk_complete_rq(struct kref *ref);
+
 static dev_t ublk_chr_devt;
-static struct class *ublk_chr_class;
+static const struct class ublk_chr_class = {
+       .name = "ublk-char",
+};
 
 static DEFINE_IDR(ublk_index_idr);
 static DEFINE_SPINLOCK(ublk_idr_lock);
@@ -202,6 +210,23 @@ static unsigned int ublks_added;   /* protected by ublk_ctl_mutex */
 
 static struct miscdevice ublk_misc;
 
+static inline unsigned ublk_pos_to_hwq(loff_t pos)
+{
+       return ((pos - UBLKSRV_IO_BUF_OFFSET) >> UBLK_QID_OFF) &
+               UBLK_QID_BITS_MASK;
+}
+
+static inline unsigned ublk_pos_to_buf_off(loff_t pos)
+{
+       return (pos - UBLKSRV_IO_BUF_OFFSET) & UBLK_IO_BUF_BITS_MASK;
+}
+
+static inline unsigned ublk_pos_to_tag(loff_t pos)
+{
+       return ((pos - UBLKSRV_IO_BUF_OFFSET) >> UBLK_TAG_OFF) &
+               UBLK_TAG_BITS_MASK;
+}
+
 static void ublk_dev_param_basic_apply(struct ublk_device *ub)
 {
        struct request_queue *q = ub->ub_disk->queue;
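UBLK_F_USER_COPY lets the userspace server move request data with plain pread()/pwrite() on the per-device char device; the helpers above decode the file offset back into a hardware queue id, a tag, and a byte offset inside that request's buffer window. A userspace-side sketch of the inverse encoding, inferred from those decoders (the UBLK_* shift/mask constants come from the ublk UAPI header):

	/* build the pread/pwrite offset for queue 'qid', request 'tag' */
	static inline __u64 example_user_copy_pos(__u16 qid, __u16 tag,
						  __u32 bytes_done)
	{
		return UBLKSRV_IO_BUF_OFFSET +
		       ((__u64)qid << UBLK_QID_OFF) +
		       ((__u64)tag << UBLK_TAG_OFF) +
		       bytes_done;
	}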
@@ -290,12 +315,52 @@ static int ublk_apply_params(struct ublk_device *ub)
        return 0;
 }
 
-static inline bool ublk_can_use_task_work(const struct ublk_queue *ubq)
+static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
 {
-       if (IS_BUILTIN(CONFIG_BLK_DEV_UBLK) &&
-                       !(ubq->flags & UBLK_F_URING_CMD_COMP_IN_TASK))
-               return true;
-       return false;
+       return ubq->flags & UBLK_F_USER_COPY;
+}
+
+static inline bool ublk_need_req_ref(const struct ublk_queue *ubq)
+{
+       /*
+	 * read()/write() are involved in user copy, so a request reference
+	 * has to be grabbed
+        */
+       return ublk_support_user_copy(ubq);
+}
+
+static inline void ublk_init_req_ref(const struct ublk_queue *ubq,
+               struct request *req)
+{
+       if (ublk_need_req_ref(ubq)) {
+               struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
+
+               kref_init(&data->ref);
+       }
+}
+
+static inline bool ublk_get_req_ref(const struct ublk_queue *ubq,
+               struct request *req)
+{
+       if (ublk_need_req_ref(ubq)) {
+               struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
+
+               return kref_get_unless_zero(&data->ref);
+       }
+
+       return true;
+}
+
+static inline void ublk_put_req_ref(const struct ublk_queue *ubq,
+               struct request *req)
+{
+       if (ublk_need_req_ref(ubq)) {
+               struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
+
+               kref_put(&data->ref, ublk_complete_rq);
+       } else {
+               __ublk_complete_rq(req);
+       }
 }
 
 static inline bool ublk_need_get_data(const struct ublk_queue *ubq)
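Because user copy lets read()/write() on the char device touch a request concurrently with the commit path, each request pdu now carries a kref: it is initialised when the request is handed to the server, taken around every user-copy access, and the final put is what completes the request (it degenerates to a direct completion when user copy is off). A condensed lifecycle sketch built on the helpers above:

	/* condensed lifecycle; the helpers no-op the refcount without user copy */
	static void example_req_lifecycle(struct ublk_queue *ubq, struct request *req)
	{
		ublk_init_req_ref(ubq, req);		/* dispatch: refcount = 1 */

		if (ublk_get_req_ref(ubq, req)) {	/* read()/write() on the char dev */
			/* ... copy data for this request ... */
			ublk_put_req_ref(ubq, req);	/* may be the completing put */
		}

		ublk_put_req_ref(ubq, req);		/* commit path drops its reference */
	}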
@@ -384,9 +449,9 @@ static void ublk_store_owner_uid_gid(unsigned int *owner_uid,
        *owner_gid = from_kgid(&init_user_ns, gid);
 }
 
-static int ublk_open(struct block_device *bdev, fmode_t mode)
+static int ublk_open(struct gendisk *disk, blk_mode_t mode)
 {
-       struct ublk_device *ub = bdev->bd_disk->private_data;
+       struct ublk_device *ub = disk->private_data;
 
        if (capable(CAP_SYS_ADMIN))
                return 0;
@@ -421,49 +486,39 @@ static const struct block_device_operations ub_fops = {
 
 #define UBLK_MAX_PIN_PAGES     32
 
-struct ublk_map_data {
-       const struct request *rq;
-       unsigned long   ubuf;
-       unsigned int    len;
-};
-
 struct ublk_io_iter {
        struct page *pages[UBLK_MAX_PIN_PAGES];
-       unsigned pg_off;        /* offset in the 1st page in pages */
-       int nr_pages;           /* how many page pointers in pages */
        struct bio *bio;
        struct bvec_iter iter;
 };
 
-static inline unsigned ublk_copy_io_pages(struct ublk_io_iter *data,
-               unsigned max_bytes, bool to_vm)
+/* copy 'total' bytes between the pinned pages and the request's bio pages */
+static void ublk_copy_io_pages(struct ublk_io_iter *data,
+               size_t total, size_t pg_off, int dir)
 {
-       const unsigned total = min_t(unsigned, max_bytes,
-                       PAGE_SIZE - data->pg_off +
-                       ((data->nr_pages - 1) << PAGE_SHIFT));
        unsigned done = 0;
        unsigned pg_idx = 0;
 
        while (done < total) {
                struct bio_vec bv = bio_iter_iovec(data->bio, data->iter);
-               const unsigned int bytes = min3(bv.bv_len, total - done,
-                               (unsigned)(PAGE_SIZE - data->pg_off));
+               unsigned int bytes = min3(bv.bv_len, (unsigned)total - done,
+                               (unsigned)(PAGE_SIZE - pg_off));
                void *bv_buf = bvec_kmap_local(&bv);
                void *pg_buf = kmap_local_page(data->pages[pg_idx]);
 
-               if (to_vm)
-                       memcpy(pg_buf + data->pg_off, bv_buf, bytes);
+               if (dir == ITER_DEST)
+                       memcpy(pg_buf + pg_off, bv_buf, bytes);
                else
-                       memcpy(bv_buf, pg_buf + data->pg_off, bytes);
+                       memcpy(bv_buf, pg_buf + pg_off, bytes);
 
                kunmap_local(pg_buf);
                kunmap_local(bv_buf);
 
                /* advance page array */
-               data->pg_off += bytes;
-               if (data->pg_off == PAGE_SIZE) {
+               pg_off += bytes;
+               if (pg_off == PAGE_SIZE) {
                        pg_idx += 1;
-                       data->pg_off = 0;
+                       pg_off = 0;
                }
 
                done += bytes;
@@ -477,41 +532,58 @@ static inline unsigned ublk_copy_io_pages(struct ublk_io_iter *data,
                        data->iter = data->bio->bi_iter;
                }
        }
+}
 
-       return done;
+static bool ublk_advance_io_iter(const struct request *req,
+               struct ublk_io_iter *iter, unsigned int offset)
+{
+       struct bio *bio = req->bio;
+
+       for_each_bio(bio) {
+               if (bio->bi_iter.bi_size > offset) {
+                       iter->bio = bio;
+                       iter->iter = bio->bi_iter;
+                       bio_advance_iter(iter->bio, &iter->iter, offset);
+                       return true;
+               }
+               offset -= bio->bi_iter.bi_size;
+       }
+       return false;
 }
 
-static int ublk_copy_user_pages(struct ublk_map_data *data, bool to_vm)
+/*
+ * Copy data between the request pages and io_iter; 'offset' is the
+ * starting linear offset within the request.
+ */
+static size_t ublk_copy_user_pages(const struct request *req,
+               unsigned offset, struct iov_iter *uiter, int dir)
 {
-       const unsigned int gup_flags = to_vm ? FOLL_WRITE : 0;
-       const unsigned long start_vm = data->ubuf;
-       unsigned int done = 0;
-       struct ublk_io_iter iter = {
-               .pg_off = start_vm & (PAGE_SIZE - 1),
-               .bio    = data->rq->bio,
-               .iter   = data->rq->bio->bi_iter,
-       };
-       const unsigned int nr_pages = round_up(data->len +
-                       (start_vm & (PAGE_SIZE - 1)), PAGE_SIZE) >> PAGE_SHIFT;
-
-       while (done < nr_pages) {
-               const unsigned to_pin = min_t(unsigned, UBLK_MAX_PIN_PAGES,
-                               nr_pages - done);
-               unsigned i, len;
-
-               iter.nr_pages = get_user_pages_fast(start_vm +
-                               (done << PAGE_SHIFT), to_pin, gup_flags,
-                               iter.pages);
-               if (iter.nr_pages <= 0)
-                       return done == 0 ? iter.nr_pages : done;
-               len = ublk_copy_io_pages(&iter, data->len, to_vm);
-               for (i = 0; i < iter.nr_pages; i++) {
-                       if (to_vm)
+       struct ublk_io_iter iter;
+       size_t done = 0;
+
+       if (!ublk_advance_io_iter(req, &iter, offset))
+               return 0;
+
+       while (iov_iter_count(uiter) && iter.bio) {
+               unsigned nr_pages;
+               ssize_t len;
+               size_t off;
+               int i;
+
+               len = iov_iter_get_pages2(uiter, iter.pages,
+                               iov_iter_count(uiter),
+                               UBLK_MAX_PIN_PAGES, &off);
+               if (len <= 0)
+                       return done;
+
+               ublk_copy_io_pages(&iter, len, off, dir);
+               nr_pages = DIV_ROUND_UP(len + off, PAGE_SIZE);
+               for (i = 0; i < nr_pages; i++) {
+                       if (dir == ITER_DEST)
                                set_page_dirty(iter.pages[i]);
                        put_page(iter.pages[i]);
                }
-               data->len -= len;
-               done += iter.nr_pages;
+               done += len;
        }
 
        return done;
@@ -532,21 +604,23 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req,
 {
        const unsigned int rq_bytes = blk_rq_bytes(req);
 
+       if (ublk_support_user_copy(ubq))
+               return rq_bytes;
+
        /*
         * no zero copy, we delay copy WRITE request data into ublksrv
         * context and the big benefit is that pinning pages in current
         * context is pretty fast, see ublk_pin_user_pages
         */
        if (ublk_need_map_req(req)) {
-               struct ublk_map_data data = {
-                       .rq     =       req,
-                       .ubuf   =       io->addr,
-                       .len    =       rq_bytes,
-               };
+               struct iov_iter iter;
+               struct iovec iov;
+               const int dir = ITER_DEST;
 
-               ublk_copy_user_pages(&data, true);
+               import_single_range(dir, u64_to_user_ptr(io->addr), rq_bytes,
+                               &iov, &iter);
 
-               return rq_bytes - data.len;
+               return ublk_copy_user_pages(req, 0, &iter, dir);
        }
        return rq_bytes;
 }
@@ -557,18 +631,19 @@ static int ublk_unmap_io(const struct ublk_queue *ubq,
 {
        const unsigned int rq_bytes = blk_rq_bytes(req);
 
+       if (ublk_support_user_copy(ubq))
+               return rq_bytes;
+
        if (ublk_need_unmap_req(req)) {
-               struct ublk_map_data data = {
-                       .rq     =       req,
-                       .ubuf   =       io->addr,
-                       .len    =       io->res,
-               };
+               struct iov_iter iter;
+               struct iovec iov;
+               const int dir = ITER_SOURCE;
 
                WARN_ON_ONCE(io->res > rq_bytes);
 
-               ublk_copy_user_pages(&data, false);
-
-               return io->res - data.len;
+               import_single_range(dir, u64_to_user_ptr(io->addr), io->res,
+                               &iov, &iter);
+               return ublk_copy_user_pages(req, 0, &iter, dir);
        }
        return rq_bytes;
 }
@@ -648,13 +723,19 @@ static inline bool ubq_daemon_is_dying(struct ublk_queue *ubq)
 }
 
 /* todo: handle partial completion */
-static void ublk_complete_rq(struct request *req)
+static inline void __ublk_complete_rq(struct request *req)
 {
        struct ublk_queue *ubq = req->mq_hctx->driver_data;
        struct ublk_io *io = &ubq->ios[req->tag];
        unsigned int unmapped_bytes;
        blk_status_t res = BLK_STS_OK;
 
+       /* called from ublk_abort_queue() code path */
+       if (io->flags & UBLK_IO_FLAG_ABORTED) {
+               res = BLK_STS_IOERR;
+               goto exit;
+       }
+
        /* failed read IO if nothing is read */
        if (!io->res && req_op(req) == REQ_OP_READ)
                io->res = -EIO;
@@ -694,6 +775,15 @@ exit:
        blk_mq_end_request(req, res);
 }
 
+static void ublk_complete_rq(struct kref *ref)
+{
+       struct ublk_rq_data *data = container_of(ref, struct ublk_rq_data,
+                       ref);
+       struct request *req = blk_mq_rq_from_pdu(data);
+
+       __ublk_complete_rq(req);
+}
+
 /*
  * Since __ublk_rq_task_work always fails requests immediately during
  * exiting, __ublk_fail_req() is only called from abort context during
@@ -712,7 +802,7 @@ static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io,
                if (ublk_queue_can_use_recovery_reissue(ubq))
                        blk_mq_requeue_request(req, false);
                else
-                       blk_mq_end_request(req, BLK_STS_IOERR);
+                       ublk_put_req_ref(ubq, req);
        }
 }
 
@@ -821,6 +911,7 @@ static inline void __ublk_rq_task_work(struct request *req,
                        mapped_bytes >> 9;
        }
 
+       ublk_init_req_ref(ubq, req);
        ubq_complete_io_cmd(io, UBLK_IO_RES_OK, issue_flags);
 }
 
@@ -852,17 +943,6 @@ static void ublk_rq_task_work_cb(struct io_uring_cmd *cmd, unsigned issue_flags)
        ublk_forward_io_cmds(ubq, issue_flags);
 }
 
-static void ublk_rq_task_work_fn(struct callback_head *work)
-{
-       struct ublk_rq_data *data = container_of(work,
-                       struct ublk_rq_data, work);
-       struct request *req = blk_mq_rq_from_pdu(data);
-       struct ublk_queue *ubq = req->mq_hctx->driver_data;
-       unsigned issue_flags = IO_URING_F_UNLOCKED;
-
-       ublk_forward_io_cmds(ubq, issue_flags);
-}
-
 static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
 {
        struct ublk_rq_data *data = blk_mq_rq_to_pdu(rq);
@@ -886,10 +966,6 @@ static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
         */
        if (unlikely(io->flags & UBLK_IO_FLAG_ABORTED)) {
                ublk_abort_io_cmds(ubq);
-       } else if (ublk_can_use_task_work(ubq)) {
-               if (task_work_add(ubq->ubq_daemon, &data->work,
-                                       TWA_SIGNAL_NO_IPI))
-                       ublk_abort_io_cmds(ubq);
        } else {
                struct io_uring_cmd *cmd = io->cmd;
                struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
@@ -961,19 +1037,9 @@ static int ublk_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data,
        return 0;
 }
 
-static int ublk_init_rq(struct blk_mq_tag_set *set, struct request *req,
-               unsigned int hctx_idx, unsigned int numa_node)
-{
-       struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
-
-       init_task_work(&data->work, ublk_rq_task_work_fn);
-       return 0;
-}
-
 static const struct blk_mq_ops ublk_mq_ops = {
        .queue_rq       = ublk_queue_rq,
        .init_hctx      = ublk_init_hctx,
-       .init_request   = ublk_init_rq,
        .timeout        = ublk_timeout,
 };
 
@@ -1050,7 +1116,7 @@ static void ublk_commit_completion(struct ublk_device *ub,
        req = blk_mq_tag_to_rq(ub->tag_set.tags[qid], tag);
 
        if (req && likely(!blk_should_fake_timeout(req->q)))
-               ublk_complete_rq(req);
+               ublk_put_req_ref(ubq, req);
 }
 
 /*
@@ -1120,6 +1186,11 @@ static inline bool ublk_queue_ready(struct ublk_queue *ubq)
        return ubq->nr_io_ready == ubq->q_depth;
 }
 
+static void ublk_cmd_cancel_cb(struct io_uring_cmd *cmd, unsigned issue_flags)
+{
+       io_uring_cmd_done(cmd, UBLK_IO_RES_ABORT, 0, issue_flags);
+}
+
 static void ublk_cancel_queue(struct ublk_queue *ubq)
 {
        int i;
@@ -1131,8 +1202,8 @@ static void ublk_cancel_queue(struct ublk_queue *ubq)
                struct ublk_io *io = &ubq->ios[i];
 
                if (io->flags & UBLK_IO_FLAG_ACTIVE)
-                       io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0,
-                                               IO_URING_F_UNLOCKED);
+                       io_uring_cmd_complete_in_task(io->cmd,
+                                                     ublk_cmd_cancel_cb);
        }
 
        /* all io commands are canceled */
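Cancelling an in-flight command now defers the io_uring completion into task context with io_uring_cmd_complete_in_task() instead of calling io_uring_cmd_done() directly with IO_URING_F_UNLOCKED, so the uring cmd is completed from a context that is allowed to touch the ring. A minimal sketch of that deferral pattern:

	static void example_cancel_cb(struct io_uring_cmd *cmd, unsigned int issue_flags)
	{
		io_uring_cmd_done(cmd, -ECANCELED, 0, issue_flags);
	}

	static void example_cancel(struct io_uring_cmd *cmd)
	{
		/* run the completion in the task that owns the ring */
		io_uring_cmd_complete_in_task(cmd, example_cancel_cb);
	}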
@@ -1281,7 +1352,7 @@ static inline int ublk_check_cmd_op(u32 cmd_op)
 {
        u32 ioc_type = _IOC_TYPE(cmd_op);
 
-       if (IS_ENABLED(CONFIG_BLKDEV_UBLK_LEGACY_OPCODES) && ioc_type != 'u')
+       if (!IS_ENABLED(CONFIG_BLKDEV_UBLK_LEGACY_OPCODES) && ioc_type != 'u')
                return -EOPNOTSUPP;
 
        if (ioc_type != 'u' && ioc_type != 0)
@@ -1290,6 +1361,14 @@ static inline int ublk_check_cmd_op(u32 cmd_op)
        return 0;
 }
 
+static inline void ublk_fill_io_cmd(struct ublk_io *io,
+               struct io_uring_cmd *cmd, unsigned long buf_addr)
+{
+       io->cmd = cmd;
+       io->flags |= UBLK_IO_FLAG_ACTIVE;
+       io->addr = buf_addr;
+}
+
 static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
                               unsigned int issue_flags,
                               const struct ublksrv_io_cmd *ub_cmd)
@@ -1335,6 +1414,11 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
                        ^ (_IOC_NR(cmd_op) == UBLK_IO_NEED_GET_DATA))
                goto out;
 
+       if (ublk_support_user_copy(ubq) && ub_cmd->addr) {
+               ret = -EINVAL;
+               goto out;
+       }
+
        ret = ublk_check_cmd_op(cmd_op);
        if (ret)
                goto out;
@@ -1353,36 +1437,41 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
                 */
                if (io->flags & UBLK_IO_FLAG_OWNED_BY_SRV)
                        goto out;
-               /* FETCH_RQ has to provide IO buffer if NEED GET DATA is not enabled */
-               if (!ub_cmd->addr && !ublk_need_get_data(ubq))
-                       goto out;
-               io->cmd = cmd;
-               io->flags |= UBLK_IO_FLAG_ACTIVE;
-               io->addr = ub_cmd->addr;
 
+               if (!ublk_support_user_copy(ubq)) {
+                       /*
+                        * FETCH_RQ has to provide IO buffer if NEED GET
+                        * DATA is not enabled
+                        */
+                       if (!ub_cmd->addr && !ublk_need_get_data(ubq))
+                               goto out;
+               }
+
+               ublk_fill_io_cmd(io, cmd, ub_cmd->addr);
                ublk_mark_io_ready(ub, ubq);
                break;
        case UBLK_IO_COMMIT_AND_FETCH_REQ:
                req = blk_mq_tag_to_rq(ub->tag_set.tags[ub_cmd->q_id], tag);
-               /*
-                * COMMIT_AND_FETCH_REQ has to provide IO buffer if NEED GET DATA is
-                * not enabled or it is Read IO.
-                */
-               if (!ub_cmd->addr && (!ublk_need_get_data(ubq) || req_op(req) == REQ_OP_READ))
-                       goto out;
+
                if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
                        goto out;
-               io->addr = ub_cmd->addr;
-               io->flags |= UBLK_IO_FLAG_ACTIVE;
-               io->cmd = cmd;
+
+               if (!ublk_support_user_copy(ubq)) {
+                       /*
+                        * COMMIT_AND_FETCH_REQ has to provide IO buffer if
+                        * NEED GET DATA is not enabled or it is Read IO.
+                        */
+                       if (!ub_cmd->addr && (!ublk_need_get_data(ubq) ||
+                                               req_op(req) == REQ_OP_READ))
+                               goto out;
+               }
+               ublk_fill_io_cmd(io, cmd, ub_cmd->addr);
                ublk_commit_completion(ub, ub_cmd);
                break;
        case UBLK_IO_NEED_GET_DATA:
                if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
                        goto out;
-               io->addr = ub_cmd->addr;
-               io->cmd = cmd;
-               io->flags |= UBLK_IO_FLAG_ACTIVE;
+               ublk_fill_io_cmd(io, cmd, ub_cmd->addr);
                ublk_handle_need_get_data(ub, ub_cmd->q_id, ub_cmd->tag);
                break;
        default:
@@ -1397,6 +1486,36 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
        return -EIOCBQUEUED;
 }
 
+static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub,
+               struct ublk_queue *ubq, int tag, size_t offset)
+{
+       struct request *req;
+
+       if (!ublk_need_req_ref(ubq))
+               return NULL;
+
+       req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag);
+       if (!req)
+               return NULL;
+
+       if (!ublk_get_req_ref(ubq, req))
+               return NULL;
+
+       if (unlikely(!blk_mq_request_started(req) || req->tag != tag))
+               goto fail_put;
+
+       if (!ublk_rq_has_data(req))
+               goto fail_put;
+
+       if (offset > blk_rq_bytes(req))
+               goto fail_put;
+
+       return req;
+fail_put:
+       ublk_put_req_ref(ubq, req);
+       return NULL;
+}
+
 static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
 {
        /*
@@ -1414,11 +1533,112 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
        return __ublk_ch_uring_cmd(cmd, issue_flags, &ub_cmd);
 }
 
+static inline bool ublk_check_ubuf_dir(const struct request *req,
+               int ubuf_dir)
+{
+       /* copy ubuf to request pages */
+       if (req_op(req) == REQ_OP_READ && ubuf_dir == ITER_SOURCE)
+               return true;
+
+       /* copy request pages to ubuf */
+       if (req_op(req) == REQ_OP_WRITE && ubuf_dir == ITER_DEST)
+               return true;
+
+       return false;
+}
+
+static struct request *ublk_check_and_get_req(struct kiocb *iocb,
+               struct iov_iter *iter, size_t *off, int dir)
+{
+       struct ublk_device *ub = iocb->ki_filp->private_data;
+       struct ublk_queue *ubq;
+       struct request *req;
+       size_t buf_off;
+       u16 tag, q_id;
+
+       if (!ub)
+               return ERR_PTR(-EACCES);
+
+       if (!user_backed_iter(iter))
+               return ERR_PTR(-EACCES);
+
+       if (ub->dev_info.state == UBLK_S_DEV_DEAD)
+               return ERR_PTR(-EACCES);
+
+       tag = ublk_pos_to_tag(iocb->ki_pos);
+       q_id = ublk_pos_to_hwq(iocb->ki_pos);
+       buf_off = ublk_pos_to_buf_off(iocb->ki_pos);
+
+       if (q_id >= ub->dev_info.nr_hw_queues)
+               return ERR_PTR(-EINVAL);
+
+       ubq = ublk_get_queue(ub, q_id);
+       if (!ubq)
+               return ERR_PTR(-EINVAL);
+
+       if (tag >= ubq->q_depth)
+               return ERR_PTR(-EINVAL);
+
+       req = __ublk_check_and_get_req(ub, ubq, tag, buf_off);
+       if (!req)
+               return ERR_PTR(-EINVAL);
+
+       if (!req->mq_hctx || !req->mq_hctx->driver_data)
+               goto fail;
+
+       if (!ublk_check_ubuf_dir(req, dir))
+               goto fail;
+
+       *off = buf_off;
+       return req;
+fail:
+       ublk_put_req_ref(ubq, req);
+       return ERR_PTR(-EACCES);
+}
+
+static ssize_t ublk_ch_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+       struct ublk_queue *ubq;
+       struct request *req;
+       size_t buf_off;
+       size_t ret;
+
+       req = ublk_check_and_get_req(iocb, to, &buf_off, ITER_DEST);
+       if (IS_ERR(req))
+               return PTR_ERR(req);
+
+       ret = ublk_copy_user_pages(req, buf_off, to, ITER_DEST);
+       ubq = req->mq_hctx->driver_data;
+       ublk_put_req_ref(ubq, req);
+
+       return ret;
+}
+
+static ssize_t ublk_ch_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+       struct ublk_queue *ubq;
+       struct request *req;
+       size_t buf_off;
+       size_t ret;
+
+       req = ublk_check_and_get_req(iocb, from, &buf_off, ITER_SOURCE);
+       if (IS_ERR(req))
+               return PTR_ERR(req);
+
+       ret = ublk_copy_user_pages(req, buf_off, from, ITER_SOURCE);
+       ubq = req->mq_hctx->driver_data;
+       ublk_put_req_ref(ubq, req);
+
+       return ret;
+}
+
 static const struct file_operations ublk_ch_fops = {
        .owner = THIS_MODULE,
        .open = ublk_ch_open,
        .release = ublk_ch_release,
        .llseek = no_llseek,
+       .read_iter = ublk_ch_read_iter,
+       .write_iter = ublk_ch_write_iter,
        .uring_cmd = ublk_ch_uring_cmd,
        .mmap = ublk_ch_mmap,
 };
@@ -1542,7 +1762,7 @@ static int ublk_add_chdev(struct ublk_device *ub)
 
        dev->parent = ublk_misc.this_device;
        dev->devt = MKDEV(MAJOR(ublk_chr_devt), minor);
-       dev->class = ublk_chr_class;
+       dev->class = &ublk_chr_class;
        dev->release = ublk_cdev_rel;
        device_initialize(dev);
 
@@ -1813,10 +2033,12 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
         */
        ub->dev_info.flags &= UBLK_F_ALL;
 
-       if (!IS_BUILTIN(CONFIG_BLK_DEV_UBLK))
-               ub->dev_info.flags |= UBLK_F_URING_CMD_COMP_IN_TASK;
+       ub->dev_info.flags |= UBLK_F_CMD_IOCTL_ENCODE |
+               UBLK_F_URING_CMD_COMP_IN_TASK;
 
-       ub->dev_info.flags |= UBLK_F_CMD_IOCTL_ENCODE;
+       /* GET_DATA isn't needed any more with USER_COPY */
+       if (ub->dev_info.flags & UBLK_F_USER_COPY)
+               ub->dev_info.flags &= ~UBLK_F_NEED_GET_DATA;
 
        /* We are not ready to support zero copy */
        ub->dev_info.flags &= ~UBLK_F_SUPPORT_ZERO_COPY;
@@ -2128,6 +2350,21 @@ static int ublk_ctrl_end_recovery(struct ublk_device *ub,
        return ret;
 }
 
+static int ublk_ctrl_get_features(struct io_uring_cmd *cmd)
+{
+       const struct ublksrv_ctrl_cmd *header = io_uring_sqe_cmd(cmd->sqe);
+       void __user *argp = (void __user *)(unsigned long)header->addr;
+       u64 features = UBLK_F_ALL & ~UBLK_F_SUPPORT_ZERO_COPY;
+
+       if (header->len != UBLK_FEATURES_LEN || !header->addr)
+               return -EINVAL;
+
+       if (copy_to_user(argp, &features, UBLK_FEATURES_LEN))
+               return -EFAULT;
+
+       return 0;
+}
+
 /*
  * All control commands are sent via /dev/ublk-control, so we have to check
  * the destination device's permission
@@ -2208,6 +2445,7 @@ static int ublk_ctrl_uring_cmd_permission(struct ublk_device *ub,
        case UBLK_CMD_GET_DEV_INFO2:
        case UBLK_CMD_GET_QUEUE_AFFINITY:
        case UBLK_CMD_GET_PARAMS:
+       case (_IOC_NR(UBLK_U_CMD_GET_FEATURES)):
                mask = MAY_READ;
                break;
        case UBLK_CMD_START_DEV:
@@ -2257,6 +2495,11 @@ static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
        if (ret)
                goto out;
 
+       if (cmd_op == UBLK_U_CMD_GET_FEATURES) {
+               ret = ublk_ctrl_get_features(cmd);
+               goto out;
+       }
+
        if (_IOC_NR(cmd_op) != UBLK_CMD_ADD_DEV) {
                ret = -ENODEV;
                ub = ublk_get_device_from_id(header->dev_id);
@@ -2332,6 +2575,9 @@ static int __init ublk_init(void)
 {
        int ret;
 
+       BUILD_BUG_ON((u64)UBLKSRV_IO_BUF_OFFSET +
+                       UBLKSRV_IO_BUF_TOTAL_SIZE < UBLKSRV_IO_BUF_OFFSET);
+
        init_waitqueue_head(&ublk_idr_wq);
 
        ret = misc_register(&ublk_misc);
@@ -2342,11 +2588,10 @@ static int __init ublk_init(void)
        if (ret)
                goto unregister_mis;
 
-       ublk_chr_class = class_create("ublk-char");
-       if (IS_ERR(ublk_chr_class)) {
-               ret = PTR_ERR(ublk_chr_class);
+       ret = class_register(&ublk_chr_class);
+       if (ret)
                goto free_chrdev_region;
-       }
+
        return 0;
 
 free_chrdev_region:
@@ -2364,7 +2609,7 @@ static void __exit ublk_exit(void)
        idr_for_each_entry(&ublk_index_idr, ub, id)
                ublk_remove(ub);
 
-       class_destroy(ublk_chr_class);
+       class_unregister(&ublk_chr_class);
        misc_deregister(&ublk_misc);
 
        idr_destroy(&ublk_index_idr);
index 2b918e2..b47358d 100644 (file)
@@ -348,63 +348,33 @@ static inline void virtblk_request_done(struct request *req)
        blk_mq_end_request(req, status);
 }
 
-static void virtblk_complete_batch(struct io_comp_batch *iob)
-{
-       struct request *req;
-
-       rq_list_for_each(&iob->req_list, req) {
-               virtblk_unmap_data(req, blk_mq_rq_to_pdu(req));
-               virtblk_cleanup_cmd(req);
-       }
-       blk_mq_end_request_batch(iob);
-}
-
-static int virtblk_handle_req(struct virtio_blk_vq *vq,
-                             struct io_comp_batch *iob)
-{
-       struct virtblk_req *vbr;
-       int req_done = 0;
-       unsigned int len;
-
-       while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) {
-               struct request *req = blk_mq_rq_from_pdu(vbr);
-
-               if (likely(!blk_should_fake_timeout(req->q)) &&
-                   !blk_mq_complete_request_remote(req) &&
-                   !blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr),
-                                        virtblk_complete_batch))
-                       virtblk_request_done(req);
-               req_done++;
-       }
-
-       return req_done;
-}
-
 static void virtblk_done(struct virtqueue *vq)
 {
        struct virtio_blk *vblk = vq->vdev->priv;
-       struct virtio_blk_vq *vblk_vq = &vblk->vqs[vq->index];
-       int req_done = 0;
+       bool req_done = false;
+       int qid = vq->index;
+       struct virtblk_req *vbr;
        unsigned long flags;
-       DEFINE_IO_COMP_BATCH(iob);
+       unsigned int len;
 
-       spin_lock_irqsave(&vblk_vq->lock, flags);
+       spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
        do {
                virtqueue_disable_cb(vq);
-               req_done += virtblk_handle_req(vblk_vq, &iob);
+               while ((vbr = virtqueue_get_buf(vblk->vqs[qid].vq, &len)) != NULL) {
+                       struct request *req = blk_mq_rq_from_pdu(vbr);
 
+                       if (likely(!blk_should_fake_timeout(req->q)))
+                               blk_mq_complete_request(req);
+                       req_done = true;
+               }
                if (unlikely(virtqueue_is_broken(vq)))
                        break;
        } while (!virtqueue_enable_cb(vq));
 
-       if (req_done) {
-               if (!rq_list_empty(iob.req_list))
-                       iob.complete(&iob);
-
-               /* In case queue is stopped waiting for more buffers. */
+       /* In case queue is stopped waiting for more buffers. */
+       if (req_done)
                blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
-       }
-       spin_unlock_irqrestore(&vblk_vq->lock, flags);
+       spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
 }
 
 static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
@@ -1283,15 +1253,37 @@ static void virtblk_map_queues(struct blk_mq_tag_set *set)
        }
 }
 
+static void virtblk_complete_batch(struct io_comp_batch *iob)
+{
+       struct request *req;
+
+       rq_list_for_each(&iob->req_list, req) {
+               virtblk_unmap_data(req, blk_mq_rq_to_pdu(req));
+               virtblk_cleanup_cmd(req);
+       }
+       blk_mq_end_request_batch(iob);
+}
+
 static int virtblk_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
 {
        struct virtio_blk *vblk = hctx->queue->queuedata;
        struct virtio_blk_vq *vq = get_virtio_blk_vq(hctx);
+       struct virtblk_req *vbr;
        unsigned long flags;
+       unsigned int len;
        int found = 0;
 
        spin_lock_irqsave(&vq->lock, flags);
-       found = virtblk_handle_req(vq, iob);
+
+       while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) {
+               struct request *req = blk_mq_rq_from_pdu(vbr);
+
+               found++;
+               if (!blk_mq_complete_request_remote(req) &&
+                   !blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr),
+                                               virtblk_complete_batch))
+                       virtblk_request_done(req);
+       }
 
        if (found)
                blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
index 4807af1..bb66178 100644 (file)
@@ -473,7 +473,7 @@ static void xenvbd_sysfs_delif(struct xenbus_device *dev)
 static void xen_vbd_free(struct xen_vbd *vbd)
 {
        if (vbd->bdev)
-               blkdev_put(vbd->bdev, vbd->readonly ? FMODE_READ : FMODE_WRITE);
+               blkdev_put(vbd->bdev, NULL);
        vbd->bdev = NULL;
 }
 
@@ -492,7 +492,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
        vbd->pdevice  = MKDEV(major, minor);
 
        bdev = blkdev_get_by_dev(vbd->pdevice, vbd->readonly ?
-                                FMODE_READ : FMODE_WRITE, NULL);
+                                BLK_OPEN_READ : BLK_OPEN_WRITE, NULL, NULL);
 
        if (IS_ERR(bdev)) {
                pr_warn("xen_vbd_create: device %08x could not be opened\n",
index 23ed258..434fab3 100644 (file)
@@ -509,7 +509,7 @@ static int blkif_getgeo(struct block_device *bd, struct hd_geometry *hg)
        return 0;
 }
 
-static int blkif_ioctl(struct block_device *bdev, fmode_t mode,
+static int blkif_ioctl(struct block_device *bdev, blk_mode_t mode,
                       unsigned command, unsigned long argument)
 {
        struct blkfront_info *info = bdev->bd_disk->private_data;
@@ -780,7 +780,8 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
                ring_req->u.rw.handle = info->handle;
                ring_req->operation = rq_data_dir(req) ?
                        BLKIF_OP_WRITE : BLKIF_OP_READ;
-               if (req_op(req) == REQ_OP_FLUSH || req->cmd_flags & REQ_FUA) {
+               if (req_op(req) == REQ_OP_FLUSH ||
+                   (req_op(req) == REQ_OP_WRITE && (req->cmd_flags & REQ_FUA))) {
                        /*
                         * Ideally we can do an unordered flush-to-disk.
                         * In case the backend onlysupports barriers, use that.
index c1e85f3..1149316 100644 (file)
@@ -140,16 +140,14 @@ static void get_chipram(void)
        return;
 }
 
-static int z2_open(struct block_device *bdev, fmode_t mode)
+static int z2_open(struct gendisk *disk, blk_mode_t mode)
 {
-       int device;
+       int device = disk->first_minor;
        int max_z2_map = (Z2RAM_SIZE / Z2RAM_CHUNKSIZE) * sizeof(z2ram_map[0]);
        int max_chip_map = (amiga_chip_size / Z2RAM_CHUNKSIZE) *
            sizeof(z2ram_map[0]);
        int rc = -ENOMEM;
 
-       device = MINOR(bdev->bd_dev);
-
        mutex_lock(&z2ram_mutex);
        if (current_device != -1 && current_device != device) {
                rc = -EBUSY;
@@ -290,7 +288,7 @@ err_out:
        return rc;
 }
 
-static void z2_release(struct gendisk *disk, fmode_t mode)
+static void z2_release(struct gendisk *disk)
 {
        mutex_lock(&z2ram_mutex);
        if (current_device == -1) {
index f6d90f1..1867f37 100644 (file)
@@ -420,7 +420,7 @@ static void reset_bdev(struct zram *zram)
                return;
 
        bdev = zram->bdev;
-       blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
+       blkdev_put(bdev, zram);
        /* hope filp_close flush all of IO */
        filp_close(zram->backing_dev, NULL);
        zram->backing_dev = NULL;
@@ -507,8 +507,8 @@ static ssize_t backing_dev_store(struct device *dev,
                goto out;
        }
 
-       bdev = blkdev_get_by_dev(inode->i_rdev,
-                       FMODE_READ | FMODE_WRITE | FMODE_EXCL, zram);
+       bdev = blkdev_get_by_dev(inode->i_rdev, BLK_OPEN_READ | BLK_OPEN_WRITE,
+                                zram, NULL);
        if (IS_ERR(bdev)) {
                err = PTR_ERR(bdev);
                bdev = NULL;
@@ -539,7 +539,7 @@ out:
        kvfree(bitmap);
 
        if (bdev)
-               blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
+               blkdev_put(bdev, zram);
 
        if (backing_dev)
                filp_close(backing_dev, NULL);
@@ -700,7 +700,7 @@ static ssize_t writeback_store(struct device *dev,
                bio_init(&bio, zram->bdev, &bio_vec, 1,
                         REQ_OP_WRITE | REQ_SYNC);
                bio.bi_iter.bi_sector = blk_idx * (PAGE_SIZE >> 9);
-               bio_add_page(&bio, page, PAGE_SIZE, 0);
+               __bio_add_page(&bio, page, PAGE_SIZE, 0);
 
                /*
                 * XXX: A single page IO would be inefficient for write
@@ -2097,19 +2097,16 @@ static ssize_t reset_store(struct device *dev,
        return len;
 }
 
-static int zram_open(struct block_device *bdev, fmode_t mode)
+static int zram_open(struct gendisk *disk, blk_mode_t mode)
 {
-       int ret = 0;
-       struct zram *zram;
+       struct zram *zram = disk->private_data;
 
-       WARN_ON(!mutex_is_locked(&bdev->bd_disk->open_mutex));
+       WARN_ON(!mutex_is_locked(&disk->open_mutex));
 
-       zram = bdev->bd_disk->private_data;
        /* zram was claimed to reset so open request fails */
        if (zram->claim)
-               ret = -EBUSY;
-
-       return ret;
+               return -EBUSY;
+       return 0;
 }
 
 static const struct block_device_operations zram_devops = {
index 3a34d7c..52ef446 100644 (file)
@@ -1319,17 +1319,17 @@ static void nxp_serdev_remove(struct serdev_device *serdev)
        hci_free_dev(hdev);
 }
 
-static struct btnxpuart_data w8987_data = {
+static struct btnxpuart_data w8987_data __maybe_unused = {
        .helper_fw_name = NULL,
        .fw_name = FIRMWARE_W8987,
 };
 
-static struct btnxpuart_data w8997_data = {
+static struct btnxpuart_data w8997_data __maybe_unused = {
        .helper_fw_name = FIRMWARE_HELPER,
        .fw_name = FIRMWARE_W8997,
 };
 
-static const struct of_device_id nxpuart_of_match_table[] = {
+static const struct of_device_id nxpuart_of_match_table[] __maybe_unused = {
        { .compatible = "nxp,88w8987-bt", .data = &w8987_data },
        { .compatible = "nxp,88w8997-bt", .data = &w8997_data },
        { }
index 1b06450..e30c979 100644 (file)
@@ -78,7 +78,8 @@ enum qca_flags {
        QCA_HW_ERROR_EVENT,
        QCA_SSR_TRIGGERED,
        QCA_BT_OFF,
-       QCA_ROM_FW
+       QCA_ROM_FW,
+       QCA_DEBUGFS_CREATED,
 };
 
 enum qca_capabilities {
@@ -635,6 +636,9 @@ static void qca_debugfs_init(struct hci_dev *hdev)
        if (!hdev->debugfs)
                return;
 
+       if (test_and_set_bit(QCA_DEBUGFS_CREATED, &qca->flags))
+               return;
+
        ibs_dir = debugfs_create_dir("ibs", hdev->debugfs);
 
        /* read only */
index 416f723..cc28398 100644 (file)
 #include <linux/errno.h>
 #include <linux/kernel.h>
 #include <linux/mm.h>
+#include <linux/nospec.h>
 #include <linux/slab.h> 
 #include <linux/cdrom.h>
 #include <linux/sysctl.h>
@@ -978,15 +979,6 @@ static void cdrom_dvd_rw_close_write(struct cdrom_device_info *cdi)
        cdi->media_written = 0;
 }
 
-static int cdrom_close_write(struct cdrom_device_info *cdi)
-{
-#if 0
-       return cdrom_flush_cache(cdi);
-#else
-       return 0;
-#endif
-}
-
 /* badly broken, I know. Is due for a fixup anytime. */
 static void cdrom_count_tracks(struct cdrom_device_info *cdi, tracktype *tracks)
 {
@@ -1155,8 +1147,7 @@ clean_up_and_return:
  * is in their own interest: device control becomes a lot easier
  * this way.
  */
-int cdrom_open(struct cdrom_device_info *cdi, struct block_device *bdev,
-              fmode_t mode)
+int cdrom_open(struct cdrom_device_info *cdi, blk_mode_t mode)
 {
        int ret;
 
@@ -1165,7 +1156,7 @@ int cdrom_open(struct cdrom_device_info *cdi, struct block_device *bdev,
        /* if this was a O_NONBLOCK open and we should honor the flags,
         * do a quick open without drive/disc integrity checks. */
        cdi->use_count++;
-       if ((mode & FMODE_NDELAY) && (cdi->options & CDO_USE_FFLAGS)) {
+       if ((mode & BLK_OPEN_NDELAY) && (cdi->options & CDO_USE_FFLAGS)) {
                ret = cdi->ops->open(cdi, 1);
        } else {
                ret = open_for_data(cdi);
@@ -1173,7 +1164,7 @@ int cdrom_open(struct cdrom_device_info *cdi, struct block_device *bdev,
                        goto err;
                if (CDROM_CAN(CDC_GENERIC_PACKET))
                        cdrom_mmc3_profile(cdi);
-               if (mode & FMODE_WRITE) {
+               if (mode & BLK_OPEN_WRITE) {
                        ret = -EROFS;
                        if (cdrom_open_write(cdi))
                                goto err_release;
@@ -1182,6 +1173,7 @@ int cdrom_open(struct cdrom_device_info *cdi, struct block_device *bdev,
                        ret = 0;
                        cdi->media_written = 0;
                }
+               cdi->opened_for_data = true;
        }
 
        if (ret)
@@ -1259,10 +1251,9 @@ static int check_for_audio_disc(struct cdrom_device_info *cdi,
        return 0;
 }
 
-void cdrom_release(struct cdrom_device_info *cdi, fmode_t mode)
+void cdrom_release(struct cdrom_device_info *cdi)
 {
        const struct cdrom_device_ops *cdo = cdi->ops;
-       int opened_for_data;
 
        cd_dbg(CD_CLOSE, "entering cdrom_release\n");
 
@@ -1280,20 +1271,12 @@ void cdrom_release(struct cdrom_device_info *cdi, fmode_t mode)
                }
        }
 
-       opened_for_data = !(cdi->options & CDO_USE_FFLAGS) ||
-               !(mode & FMODE_NDELAY);
-
-       /*
-        * flush cache on last write release
-        */
-       if (CDROM_CAN(CDC_RAM) && !cdi->use_count && cdi->for_data)
-               cdrom_close_write(cdi);
-
        cdo->release(cdi);
-       if (cdi->use_count == 0) {      /* last process that closes dev*/
-               if (opened_for_data &&
-                   cdi->options & CDO_AUTO_EJECT && CDROM_CAN(CDC_OPEN_TRAY))
+
+       if (cdi->use_count == 0 && cdi->opened_for_data) {
+               if (cdi->options & CDO_AUTO_EJECT && CDROM_CAN(CDC_OPEN_TRAY))
                        cdo->tray_move(cdi, 1);
+               cdi->opened_for_data = false;
        }
 }
 EXPORT_SYMBOL(cdrom_release);
@@ -2329,6 +2312,9 @@ static int cdrom_ioctl_media_changed(struct cdrom_device_info *cdi,
        if (arg >= cdi->capacity)
                return -EINVAL;
 
+       /* Prevent arg from speculatively bypassing the length check */
+       barrier_nospec();
+
        info = kmalloc(sizeof(*info), GFP_KERNEL);
        if (!info)
                return -ENOMEM;
@@ -3337,7 +3323,7 @@ static int mmc_ioctl(struct cdrom_device_info *cdi, unsigned int cmd,
  * ATAPI / SCSI specific code now mainly resides in mmc_ioctl().
  */
 int cdrom_ioctl(struct cdrom_device_info *cdi, struct block_device *bdev,
-               fmode_t mode, unsigned int cmd, unsigned long arg)
+               unsigned int cmd, unsigned long arg)
 {
        void __user *argp = (void __user *)arg;
        int ret;
index ceded57..3a46e27 100644 (file)
@@ -474,19 +474,19 @@ static const struct cdrom_device_ops gdrom_ops = {
                                  CDC_RESET | CDC_DRIVE_STATUS | CDC_CD_R,
 };
 
-static int gdrom_bdops_open(struct block_device *bdev, fmode_t mode)
+static int gdrom_bdops_open(struct gendisk *disk, blk_mode_t mode)
 {
        int ret;
 
-       bdev_check_media_change(bdev);
+       disk_check_media_change(disk);
 
        mutex_lock(&gdrom_mutex);
-       ret = cdrom_open(gd.cd_info, bdev, mode);
+       ret = cdrom_open(gd.cd_info, mode);
        mutex_unlock(&gdrom_mutex);
        return ret;
 }
 
-static void gdrom_bdops_release(struct gendisk *disk, fmode_t mode)
+static void gdrom_bdops_release(struct gendisk *disk)
 {
        mutex_lock(&gdrom_mutex);
-       cdrom_release(gd.cd_info, mode);
+       cdrom_release(gd.cd_info);
@@ -499,13 +499,13 @@ static unsigned int gdrom_bdops_check_events(struct gendisk *disk,
        return cdrom_check_events(gd.cd_info, clearing);
 }
 
-static int gdrom_bdops_ioctl(struct block_device *bdev, fmode_t mode,
+static int gdrom_bdops_ioctl(struct block_device *bdev, blk_mode_t mode,
        unsigned cmd, unsigned long arg)
 {
        int ret;
 
        mutex_lock(&gdrom_mutex);
-       ret = cdrom_ioctl(gd.cd_info, bdev, mode, cmd, arg);
+       ret = cdrom_ioctl(gd.cd_info, bdev, cmd, arg);
        mutex_unlock(&gdrom_mutex);
 
        return ret;
index d68d05d..514f9f2 100644 (file)
@@ -90,6 +90,9 @@ parisc_agp_tlbflush(struct agp_memory *mem)
 {
        struct _parisc_agp_info *info = &parisc_agp_info;
 
+       /* force fdc ops to be visible to IOMMU */
+       asm_io_sync();
+
        writeq(info->gart_base | ilog2(info->gart_size), info->ioc_regs+IOC_PCOM);
        readq(info->ioc_regs+IOC_PCOM); /* flush */
 }
@@ -158,6 +161,7 @@ parisc_agp_insert_memory(struct agp_memory *mem, off_t pg_start, int type)
                        info->gatt[j] =
                                parisc_agp_mask_memory(agp_bridge,
                                        paddr, type);
+                       asm_io_fdc(&info->gatt[j]);
                }
        }
 
@@ -191,7 +195,16 @@ static unsigned long
 parisc_agp_mask_memory(struct agp_bridge_data *bridge, dma_addr_t addr,
                       int type)
 {
-       return SBA_PDIR_VALID_BIT | addr;
+       unsigned ci;                    /* coherent index */
+       dma_addr_t pa;
+
+       pa = addr & IOVP_MASK;
+       asm("lci 0(%1), %0" : "=r" (ci) : "r" (phys_to_virt(pa)));
+
+       pa |= (ci >> PAGE_SHIFT) & 0xff;/* move CI (8 bits) into lowest byte */
+       pa |= SBA_PDIR_VALID_BIT;       /* set "valid" bit */
+
+       return cpu_to_le64(pa);
 }
 
 static void
index 253f2dd..3cb3776 100644 (file)
@@ -1546,7 +1546,7 @@ const struct file_operations random_fops = {
        .compat_ioctl = compat_ptr_ioctl,
        .fasync = random_fasync,
        .llseek = noop_llseek,
-       .splice_read = generic_file_splice_read,
+       .splice_read = copy_splice_read,
        .splice_write = iter_file_splice_write,
 };
 
@@ -1557,7 +1557,7 @@ const struct file_operations urandom_fops = {
        .compat_ioctl = compat_ptr_ioctl,
        .fasync = random_fasync,
        .llseek = noop_llseek,
-       .splice_read = generic_file_splice_read,
+       .splice_read = copy_splice_read,
        .splice_write = iter_file_splice_write,
 };
 
index c10a4aa..cd48033 100644 (file)
@@ -571,6 +571,10 @@ static int tpm_hwrng_read(struct hwrng *rng, void *data, size_t max, bool wait)
 {
        struct tpm_chip *chip = container_of(rng, struct tpm_chip, hwrng);
 
+       /* Give back zero bytes, as TPM chip has not yet fully resumed: */
+       if (chip->flags & TPM_CHIP_FLAG_SUSPENDED)
+               return 0;
+
        return tpm_get_random(chip, data, max);
 }
 
index 4463d00..586ca10 100644 (file)
@@ -412,6 +412,8 @@ int tpm_pm_suspend(struct device *dev)
        }
 
 suspended:
+       chip->flags |= TPM_CHIP_FLAG_SUSPENDED;
+
        if (rc)
                dev_err(dev, "Ignoring error %d while suspending\n", rc);
        return 0;
@@ -429,6 +431,14 @@ int tpm_pm_resume(struct device *dev)
        if (chip == NULL)
                return -ENODEV;
 
+       chip->flags &= ~TPM_CHIP_FLAG_SUSPENDED;
+
+       /*
+        * Guarantee that SUSPENDED is written last, so that hwrng does not
+        * activate before the chip has been fully resumed.
+        */
+       wmb();
+
        return 0;
 }
 EXPORT_SYMBOL_GPL(tpm_pm_resume);
index 7af3898..7db3593 100644 (file)
@@ -122,6 +122,29 @@ static const struct dmi_system_id tpm_tis_dmi_table[] = {
                        DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T490s"),
                },
        },
+       {
+               .callback = tpm_tis_disable_irq,
+               .ident = "ThinkStation P360 Tiny",
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+                       DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkStation P360 Tiny"),
+               },
+       },
+       {
+               .callback = tpm_tis_disable_irq,
+               .ident = "ThinkPad L490",
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+                       DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad L490"),
+               },
+       },
+       {
+               .callback = tpm_tis_disable_irq,
+               .ident = "UPX-TGL",
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "AAEON"),
+               },
+       },
        {}
 };
 
index 02945d5..558144f 100644 (file)
@@ -1209,25 +1209,20 @@ static void tpm_tis_reenable_interrupts(struct tpm_chip *chip)
        u32 intmask;
        int rc;
 
-       if (chip->ops->clk_enable != NULL)
-               chip->ops->clk_enable(chip, true);
-
-       /* reenable interrupts that device may have lost or
-        * BIOS/firmware may have disabled
+       /*
+        * Re-enable interrupts that device may have lost or BIOS/firmware may
+        * have disabled.
         */
        rc = tpm_tis_write8(priv, TPM_INT_VECTOR(priv->locality), priv->irq);
-       if (rc < 0)
-               goto out;
+       if (rc < 0) {
+               dev_err(&chip->dev, "Setting IRQ failed.\n");
+               return;
+       }
 
        intmask = priv->int_mask | TPM_GLOBAL_INT_ENABLE;
-
-       tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
-
-out:
-       if (chip->ops->clk_enable != NULL)
-               chip->ops->clk_enable(chip, false);
-
-       return;
+       rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
+       if (rc < 0)
+               dev_err(&chip->dev, "Enabling interrupts failed.\n");
 }
 
 int tpm_tis_resume(struct device *dev)
@@ -1235,27 +1230,27 @@ int tpm_tis_resume(struct device *dev)
        struct tpm_chip *chip = dev_get_drvdata(dev);
        int ret;
 
-       ret = tpm_tis_request_locality(chip, 0);
-       if (ret < 0)
+       ret = tpm_chip_start(chip);
+       if (ret)
                return ret;
 
        if (chip->flags & TPM_CHIP_FLAG_IRQ)
                tpm_tis_reenable_interrupts(chip);
 
-       ret = tpm_pm_resume(dev);
-       if (ret)
-               goto out;
-
        /*
         * TPM 1.2 requires self-test on resume. This function actually returns
         * an error code but for unknown reason it isn't handled.
         */
        if (!(chip->flags & TPM_CHIP_FLAG_TPM2))
                tpm1_do_selftest(chip);
-out:
-       tpm_tis_relinquish_locality(chip, 0);
 
-       return ret;
+       tpm_chip_stop(chip);
+
+       ret = tpm_pm_resume(dev);
+       if (ret)
+               return ret;
+
+       return 0;
 }
 EXPORT_SYMBOL_GPL(tpm_tis_resume);
 #endif
index e978f45..610bfad 100644 (file)
@@ -84,10 +84,10 @@ enum tis_defaults {
 #define ILB_REMAP_SIZE                 0x100
 
 enum tpm_tis_flags {
-       TPM_TIS_ITPM_WORKAROUND         = BIT(0),
-       TPM_TIS_INVALID_STATUS          = BIT(1),
-       TPM_TIS_DEFAULT_CANCELLATION    = BIT(2),
-       TPM_TIS_IRQ_TESTED              = BIT(3),
+       TPM_TIS_ITPM_WORKAROUND         = 0,
+       TPM_TIS_INVALID_STATUS          = 1,
+       TPM_TIS_DEFAULT_CANCELLATION    = 2,
+       TPM_TIS_IRQ_TESTED              = 3,
 };
 
 struct tpm_tis_data {
index edfa946..66759fe 100644 (file)
@@ -119,7 +119,10 @@ static int clk_composite_determine_rate(struct clk_hw *hw,
                        if (ret)
                                continue;
 
-                       rate_diff = abs(req->rate - tmp_req.rate);
+                       if (req->rate >= tmp_req.rate)
+                               rate_diff = req->rate - tmp_req.rate;
+                       else
+                               rate_diff = tmp_req.rate - req->rate;
 
                        if (!rate_diff || !req->best_parent_hw
                                       || best_rate_diff > rate_diff) {
index 70ae1dd..bacdcbb 100644 (file)
@@ -40,7 +40,7 @@ static struct clk_hw *loongson2_clk_register(struct device *dev,
 {
        int ret;
        struct clk_hw *hw;
-       struct clk_init_data init;
+       struct clk_init_data init = { };
 
        hw = devm_kzalloc(dev, sizeof(*hw), GFP_KERNEL);
        if (!hw)
index 22fc749..f6ea7e5 100644 (file)
@@ -10,7 +10,6 @@
 #include <linux/of.h>
 #include <linux/of_address.h>
 #include <dt-bindings/clock/imx1-clock.h>
-#include <soc/imx/timer.h>
 #include <asm/irq.h>
 
 #include "clk.h"
index 5d17712..99618de 100644 (file)
@@ -8,7 +8,6 @@
 #include <linux/of_address.h>
 #include <dt-bindings/clock/imx27-clock.h>
 #include <soc/imx/revision.h>
-#include <soc/imx/timer.h>
 #include <asm/irq.h>
 
 #include "clk.h"
index c44e18c..4c8d9ff 100644 (file)
@@ -11,7 +11,6 @@
 #include <linux/of.h>
 #include <linux/of_address.h>
 #include <soc/imx/revision.h>
-#include <soc/imx/timer.h>
 #include <asm/irq.h>
 
 #include "clk.h"
index 7dcbaea..3b6fdb4 100644 (file)
@@ -10,7 +10,6 @@
 #include <linux/of.h>
 #include <linux/err.h>
 #include <soc/imx/revision.h>
-#include <soc/imx/timer.h>
 #include <asm/irq.h>
 
 #include "clk.h"
index 6b4e193..c87a6c4 100644 (file)
@@ -23,6 +23,7 @@
 static DEFINE_SPINLOCK(mt8365_clk_lock);
 
 static const struct mtk_fixed_clk top_fixed_clks[] = {
+       FIXED_CLK(CLK_TOP_CLK_NULL, "clk_null", NULL, 0),
        FIXED_CLK(CLK_TOP_I2S0_BCK, "i2s0_bck", NULL, 26000000),
        FIXED_CLK(CLK_TOP_DSI0_LNTC_DSICK, "dsi0_lntc_dsick", "clk26m",
                  75000000),
@@ -559,6 +560,14 @@ static const struct mtk_clk_divider top_adj_divs[] = {
                  0x324, 16, 8, CLK_DIVIDER_ROUND_CLOSEST),
        DIV_ADJ_F(CLK_TOP_APLL12_CK_DIV3, "apll12_ck_div3", "apll_i2s3_sel",
                  0x324, 24, 8, CLK_DIVIDER_ROUND_CLOSEST),
+       DIV_ADJ_F(CLK_TOP_APLL12_CK_DIV4, "apll12_ck_div4", "apll_tdmout_sel",
+                 0x328, 0, 8, CLK_DIVIDER_ROUND_CLOSEST),
+       DIV_ADJ_F(CLK_TOP_APLL12_CK_DIV4B, "apll12_ck_div4b", "apll_tdmout_sel",
+                 0x328, 8, 8, CLK_DIVIDER_ROUND_CLOSEST),
+       DIV_ADJ_F(CLK_TOP_APLL12_CK_DIV5, "apll12_ck_div5", "apll_tdmin_sel",
+                 0x328, 16, 8, CLK_DIVIDER_ROUND_CLOSEST),
+       DIV_ADJ_F(CLK_TOP_APLL12_CK_DIV5B, "apll12_ck_div5b", "apll_tdmin_sel",
+                 0x328, 24, 8, CLK_DIVIDER_ROUND_CLOSEST),
        DIV_ADJ_F(CLK_TOP_APLL12_CK_DIV6, "apll12_ck_div6", "apll_spdif_sel",
                  0x32c, 0, 8, CLK_DIVIDER_ROUND_CLOSEST),
 };
@@ -583,15 +592,15 @@ static const struct mtk_gate_regs top2_cg_regs = {
 
 #define GATE_TOP0(_id, _name, _parent, _shift)                 \
        GATE_MTK(_id, _name, _parent, &top0_cg_regs,            \
-                _shift, &mtk_clk_gate_ops_no_setclr_inv)
+                _shift, &mtk_clk_gate_ops_no_setclr)
 
 #define GATE_TOP1(_id, _name, _parent, _shift)                 \
        GATE_MTK(_id, _name, _parent, &top1_cg_regs,            \
-                _shift, &mtk_clk_gate_ops_no_setclr)
+                _shift, &mtk_clk_gate_ops_no_setclr_inv)
 
 #define GATE_TOP2(_id, _name, _parent, _shift)                 \
        GATE_MTK(_id, _name, _parent, &top2_cg_regs,            \
-                _shift, &mtk_clk_gate_ops_no_setclr)
+                _shift, &mtk_clk_gate_ops_no_setclr_inv)
 
 static const struct mtk_gate top_clk_gates[] = {
        GATE_TOP0(CLK_TOP_CONN_32K, "conn_32k", "clk32k", 10),
@@ -696,6 +705,7 @@ static const struct mtk_gate ifr_clks[] = {
        GATE_IFR3(CLK_IFR_GCPU, "ifr_gcpu", "axi_sel", 8),
        GATE_IFR3(CLK_IFR_TRNG, "ifr_trng", "axi_sel", 9),
        GATE_IFR3(CLK_IFR_AUXADC, "ifr_auxadc", "clk26m", 10),
+       GATE_IFR3(CLK_IFR_CPUM, "ifr_cpum", "clk26m", 11),
        GATE_IFR3(CLK_IFR_AUXADC_MD, "ifr_auxadc_md", "clk26m", 14),
        GATE_IFR3(CLK_IFR_AP_DMA, "ifr_ap_dma", "axi_sel", 18),
        GATE_IFR3(CLK_IFR_DEBUGSYS, "ifr_debugsys", "axi_sel", 24),
@@ -717,6 +727,8 @@ static const struct mtk_gate ifr_clks[] = {
        GATE_IFR5(CLK_IFR_PWRAP_TMR, "ifr_pwrap_tmr", "clk26m", 12),
        GATE_IFR5(CLK_IFR_PWRAP_SPI, "ifr_pwrap_spi", "clk26m", 13),
        GATE_IFR5(CLK_IFR_PWRAP_SYS, "ifr_pwrap_sys", "clk26m", 14),
+       GATE_MTK_FLAGS(CLK_IFR_MCU_PM_BK, "ifr_mcu_pm_bk", NULL, &ifr5_cg_regs,
+                       17, &mtk_clk_gate_ops_setclr, CLK_IGNORE_UNUSED),
        GATE_IFR5(CLK_IFR_IRRX_26M, "ifr_irrx_26m", "clk26m", 22),
        GATE_IFR5(CLK_IFR_IRRX_32K, "ifr_irrx_32k", "clk32k", 23),
        GATE_IFR5(CLK_IFR_I2C0_AXI, "ifr_i2c0_axi", "i2c_sel", 24),
index 42958a5..621e298 100644 (file)
@@ -164,7 +164,7 @@ void pxa3xx_clk_update_accr(u32 disable, u32 enable, u32 xclkcfg, u32 mask)
        accr &= ~disable;
        accr |= enable;
 
-       writel(accr, ACCR);
+       writel(accr, clk_regs + ACCR);
        if (xclkcfg)
                __asm__("mcr p14, 0, %0, c6, c0, 0\n" : : "r"(xclkcfg));
 
index 526382d..c4d671a 100644 (file)
@@ -612,6 +612,15 @@ config TIMER_IMX_SYS_CTR
          Enable this option to use i.MX system counter timer as a
          clockevent.
 
+config CLKSRC_LOONGSON1_PWM
+       bool "Clocksource using Loongson1 PWM"
+       depends on MACH_LOONGSON32 || COMPILE_TEST
+       select MIPS_EXTERNAL_TIMER
+       select TIMER_OF
+       help
+         Enable this option to use Loongson1 PWM timer as clocksource
+         instead of the performance counter.
+
 config CLKSRC_ST_LPC
        bool "Low power clocksource found in the LPC" if COMPILE_TEST
        select TIMER_OF if OF
index f12d398..5d93c9e 100644 (file)
@@ -89,3 +89,4 @@ obj-$(CONFIG_MICROCHIP_PIT64B)                += timer-microchip-pit64b.o
 obj-$(CONFIG_MSC313E_TIMER)            += timer-msc313e.o
 obj-$(CONFIG_GOLDFISH_TIMER)           += timer-goldfish.o
 obj-$(CONFIG_GXP_TIMER)                        += timer-gxp.o
+obj-$(CONFIG_CLKSRC_LOONGSON1_PWM)     += timer-loongson1-pwm.o
index e09d442..e733a2a 100644 (file)
@@ -191,22 +191,40 @@ u32 arch_timer_reg_read(int access, enum arch_timer_reg reg,
        return val;
 }
 
-static notrace u64 arch_counter_get_cntpct_stable(void)
+static noinstr u64 raw_counter_get_cntpct_stable(void)
 {
        return __arch_counter_get_cntpct_stable();
 }
 
-static notrace u64 arch_counter_get_cntpct(void)
+static notrace u64 arch_counter_get_cntpct_stable(void)
+{
+       u64 val;
+       preempt_disable_notrace();
+       val = __arch_counter_get_cntpct_stable();
+       preempt_enable_notrace();
+       return val;
+}
+
+static noinstr u64 arch_counter_get_cntpct(void)
 {
        return __arch_counter_get_cntpct();
 }
 
-static notrace u64 arch_counter_get_cntvct_stable(void)
+static noinstr u64 raw_counter_get_cntvct_stable(void)
 {
        return __arch_counter_get_cntvct_stable();
 }
 
-static notrace u64 arch_counter_get_cntvct(void)
+static notrace u64 arch_counter_get_cntvct_stable(void)
+{
+       u64 val;
+       preempt_disable_notrace();
+       val = __arch_counter_get_cntvct_stable();
+       preempt_enable_notrace();
+       return val;
+}
+
+static noinstr u64 arch_counter_get_cntvct(void)
 {
        return __arch_counter_get_cntvct();
 }
@@ -753,14 +771,14 @@ static int arch_timer_set_next_event_phys(unsigned long evt,
        return 0;
 }
 
-static u64 arch_counter_get_cnt_mem(struct arch_timer *t, int offset_lo)
+static noinstr u64 arch_counter_get_cnt_mem(struct arch_timer *t, int offset_lo)
 {
        u32 cnt_lo, cnt_hi, tmp_hi;
 
        do {
-               cnt_hi = readl_relaxed(t->base + offset_lo + 4);
-               cnt_lo = readl_relaxed(t->base + offset_lo);
-               tmp_hi = readl_relaxed(t->base + offset_lo + 4);
+               cnt_hi = __le32_to_cpu((__le32 __force)__raw_readl(t->base + offset_lo + 4));
+               cnt_lo = __le32_to_cpu((__le32 __force)__raw_readl(t->base + offset_lo));
+               tmp_hi = __le32_to_cpu((__le32 __force)__raw_readl(t->base + offset_lo + 4));
        } while (cnt_hi != tmp_hi);
 
        return ((u64) cnt_hi << 32) | cnt_lo;
@@ -1060,7 +1078,7 @@ bool arch_timer_evtstrm_available(void)
        return cpumask_test_cpu(raw_smp_processor_id(), &evtstrm_available);
 }
 
-static u64 arch_counter_get_cntvct_mem(void)
+static noinstr u64 arch_counter_get_cntvct_mem(void)
 {
        return arch_counter_get_cnt_mem(arch_timer_mem, CNTVCT_LO);
 }
@@ -1074,6 +1092,7 @@ struct arch_timer_kvm_info *arch_timer_get_kvm_info(void)
 
 static void __init arch_counter_register(unsigned type)
 {
+       u64 (*scr)(void);
        u64 start_count;
        int width;
 
@@ -1083,21 +1102,28 @@ static void __init arch_counter_register(unsigned type)
 
                if ((IS_ENABLED(CONFIG_ARM64) && !is_hyp_mode_available()) ||
                    arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI) {
-                       if (arch_timer_counter_has_wa())
+                       if (arch_timer_counter_has_wa()) {
                                rd = arch_counter_get_cntvct_stable;
-                       else
+                               scr = raw_counter_get_cntvct_stable;
+                       } else {
                                rd = arch_counter_get_cntvct;
+                               scr = arch_counter_get_cntvct;
+                       }
                } else {
-                       if (arch_timer_counter_has_wa())
+                       if (arch_timer_counter_has_wa()) {
                                rd = arch_counter_get_cntpct_stable;
-                       else
+                               scr = raw_counter_get_cntpct_stable;
+                       } else {
                                rd = arch_counter_get_cntpct;
+                               scr = arch_counter_get_cntpct;
+                       }
                }
 
                arch_timer_read_counter = rd;
                clocksource_counter.vdso_clock_mode = vdso_default;
        } else {
                arch_timer_read_counter = arch_counter_get_cntvct_mem;
+               scr = arch_counter_get_cntvct_mem;
        }
 
        width = arch_counter_get_width();
@@ -1113,7 +1139,7 @@ static void __init arch_counter_register(unsigned type)
        timecounter_init(&arch_timer_kvm_info.timecounter,
                         &cyclecounter, start_count);
 
-       sched_clock_register(arch_timer_read_counter, width, arch_timer_rate);
+       sched_clock_register(scr, width, arch_timer_rate);
 }
 
 static void arch_timer_stop(struct clock_event_device *clk)
index bcd9042..e56307a 100644 (file)
@@ -365,6 +365,20 @@ void hv_stimer_global_cleanup(void)
 }
 EXPORT_SYMBOL_GPL(hv_stimer_global_cleanup);
 
+static __always_inline u64 read_hv_clock_msr(void)
+{
+       /*
+        * Read the partition counter to get the current tick count. This count
+        * is set to 0 when the partition is created and is incremented in 100
+        * nanosecond units.
+        *
+        * Use hv_raw_get_register() because this function is used from
+        * noinstr. Notably, while HV_REGISTER_TIME_REF_COUNT is a synthetic
+        * register, it doesn't need the GHCB path.
+        */
+       return hv_raw_get_register(HV_REGISTER_TIME_REF_COUNT);
+}
+
 /*
  * Code and definitions for the Hyper-V clocksources.  Two
  * clocksources are defined: one that reads the Hyper-V defined MSR, and
@@ -393,14 +407,20 @@ struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 }
 EXPORT_SYMBOL_GPL(hv_get_tsc_page);
 
-static u64 notrace read_hv_clock_tsc(void)
+static __always_inline u64 read_hv_clock_tsc(void)
 {
-       u64 current_tick = hv_read_tsc_page(hv_get_tsc_page());
+       u64 cur_tsc, time;
 
-       if (current_tick == U64_MAX)
-               current_tick = hv_get_register(HV_REGISTER_TIME_REF_COUNT);
+       /*
+        * The Hyper-V Top-Level Function Spec (TLFS), section Timers,
+        * subsection Reference Counter, guarantees that the TSC and MSR
+        * times are in sync and monotonic. Therefore we can fall back
+        * to the MSR in case the TSC page indicates unavailability.
+        */
+       if (!hv_read_tsc_page_tsc(tsc_page, &cur_tsc, &time))
+               time = read_hv_clock_msr();
 
-       return current_tick;
+       return time;
 }
 
 static u64 notrace read_hv_clock_tsc_cs(struct clocksource *arg)
@@ -408,7 +428,7 @@ static u64 notrace read_hv_clock_tsc_cs(struct clocksource *arg)
        return read_hv_clock_tsc();
 }
 
-static u64 notrace read_hv_sched_clock_tsc(void)
+static u64 noinstr read_hv_sched_clock_tsc(void)
 {
        return (read_hv_clock_tsc() - hv_sched_clock_offset) *
                (NSEC_PER_SEC / HV_CLOCK_HZ);
@@ -460,30 +480,14 @@ static struct clocksource hyperv_cs_tsc = {
 #endif
 };
 
-static u64 notrace read_hv_clock_msr(void)
-{
-       /*
-        * Read the partition counter to get the current tick count. This count
-        * is set to 0 when the partition is created and is incremented in
-        * 100 nanosecond units.
-        */
-       return hv_get_register(HV_REGISTER_TIME_REF_COUNT);
-}
-
 static u64 notrace read_hv_clock_msr_cs(struct clocksource *arg)
 {
        return read_hv_clock_msr();
 }
 
-static u64 notrace read_hv_sched_clock_msr(void)
-{
-       return (read_hv_clock_msr() - hv_sched_clock_offset) *
-               (NSEC_PER_SEC / HV_CLOCK_HZ);
-}
-
 static struct clocksource hyperv_cs_msr = {
        .name   = "hyperv_clocksource_msr",
-       .rating = 500,
+       .rating = 495,
        .read   = read_hv_clock_msr_cs,
        .mask   = CLOCKSOURCE_MASK(64),
        .flags  = CLOCK_SOURCE_IS_CONTINUOUS,
@@ -513,7 +517,7 @@ static __always_inline void hv_setup_sched_clock(void *sched_clock)
 static __always_inline void hv_setup_sched_clock(void *sched_clock) {}
 #endif /* CONFIG_GENERIC_SCHED_CLOCK */
 
-static bool __init hv_init_tsc_clocksource(void)
+static void __init hv_init_tsc_clocksource(void)
 {
        union hv_reference_tsc_msr tsc_msr;
 
@@ -524,17 +528,14 @@ static bool __init hv_init_tsc_clocksource(void)
         * Hyper-V Reference TSC rating, causing the generic TSC to be used.
         * TSC_INVARIANT is not offered on ARM64, so the Hyper-V Reference
         * TSC will be preferred over the virtualized ARM64 arch counter.
-        * While the Hyper-V MSR clocksource won't be used since the
-        * Reference TSC clocksource is present, change its rating as
-        * well for consistency.
         */
        if (ms_hyperv.features & HV_ACCESS_TSC_INVARIANT) {
                hyperv_cs_tsc.rating = 250;
-               hyperv_cs_msr.rating = 250;
+               hyperv_cs_msr.rating = 245;
        }
 
        if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
-               return false;
+               return;
 
        hv_read_reference_counter = read_hv_clock_tsc;
 
@@ -565,33 +566,34 @@ static bool __init hv_init_tsc_clocksource(void)
 
        clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100);
 
-       hv_sched_clock_offset = hv_read_reference_counter();
-       hv_setup_sched_clock(read_hv_sched_clock_tsc);
-
-       return true;
+       /*
+        * If TSC is invariant, then let it stay as the sched clock since it
+        * will be faster than reading the TSC page. But if not invariant, use
+        * the TSC page so that live migrations across hosts with different
+        * frequencies are handled correctly.
+        */
+       if (!(ms_hyperv.features & HV_ACCESS_TSC_INVARIANT)) {
+               hv_sched_clock_offset = hv_read_reference_counter();
+               hv_setup_sched_clock(read_hv_sched_clock_tsc);
+       }
 }
 
 void __init hv_init_clocksource(void)
 {
        /*
-        * Try to set up the TSC page clocksource. If it succeeds, we're
-        * done. Otherwise, set up the MSR clocksource.  At least one of
-        * these will always be available except on very old versions of
-        * Hyper-V on x86.  In that case we won't have a Hyper-V
+        * Try to set up the TSC page clocksource, then the MSR clocksource.
+        * At least one of these will always be available except on very old
+        * versions of Hyper-V on x86.  In that case we won't have a Hyper-V
         * clocksource, but Linux will still run with a clocksource based
         * on the emulated PIT or LAPIC timer.
+        *
+        * Never use the MSR clocksource as sched clock.  It's too slow.
+        * Better to use the native sched clock as the fallback.
         */
-       if (hv_init_tsc_clocksource())
-               return;
-
-       if (!(ms_hyperv.features & HV_MSR_TIME_REF_COUNT_AVAILABLE))
-               return;
-
-       hv_read_reference_counter = read_hv_clock_msr;
-       clocksource_register_hz(&hyperv_cs_msr, NSEC_PER_SEC/100);
+       hv_init_tsc_clocksource();
 
-       hv_sched_clock_offset = hv_read_reference_counter();
-       hv_setup_sched_clock(read_hv_sched_clock_msr);
+       if (ms_hyperv.features & HV_MSR_TIME_REF_COUNT_AVAILABLE)
+               clocksource_register_hz(&hyperv_cs_msr, NSEC_PER_SEC/100);
 }
 
 void __init hv_remap_tsc_clocksource(void)
index 089ce64..154ee5f 100644 (file)
@@ -369,7 +369,7 @@ static int __init ingenic_tcu_probe(struct platform_device *pdev)
        return 0;
 }
 
-static int __maybe_unused ingenic_tcu_suspend(struct device *dev)
+static int ingenic_tcu_suspend(struct device *dev)
 {
        struct ingenic_tcu *tcu = dev_get_drvdata(dev);
        unsigned int cpu;
@@ -382,7 +382,7 @@ static int __maybe_unused ingenic_tcu_suspend(struct device *dev)
        return 0;
 }
 
-static int __maybe_unused ingenic_tcu_resume(struct device *dev)
+static int ingenic_tcu_resume(struct device *dev)
 {
        struct ingenic_tcu *tcu = dev_get_drvdata(dev);
        unsigned int cpu;
@@ -406,7 +406,7 @@ err_timer_clk_disable:
        return ret;
 }
 
-static const struct dev_pm_ops __maybe_unused ingenic_tcu_pm_ops = {
+static const struct dev_pm_ops ingenic_tcu_pm_ops = {
        /* _noirq: We want the TCU clocks to be gated last / ungated first */
        .suspend_noirq = ingenic_tcu_suspend,
        .resume_noirq  = ingenic_tcu_resume,
@@ -415,9 +415,7 @@ static const struct dev_pm_ops __maybe_unused ingenic_tcu_pm_ops = {
 static struct platform_driver ingenic_tcu_driver = {
        .driver = {
                .name   = "ingenic-tcu-timer",
-#ifdef CONFIG_PM_SLEEP
-               .pm     = &ingenic_tcu_pm_ops,
-#endif
+               .pm     = pm_sleep_ptr(&ingenic_tcu_pm_ops),
                .of_match_table = ingenic_tcu_of_match,
        },
 };
index 4efd0cf..0d52e28 100644 (file)
@@ -486,10 +486,10 @@ static int __init ttc_timer_probe(struct platform_device *pdev)
         * and use it. Note that the event timer uses the interrupt and it's the
         * 2nd TTC hence the irq_of_parse_and_map(,1)
         */
-       timer_baseaddr = of_iomap(timer, 0);
-       if (!timer_baseaddr) {
+       timer_baseaddr = devm_of_iomap(&pdev->dev, timer, 0, NULL);
+       if (IS_ERR(timer_baseaddr)) {
                pr_err("ERROR: invalid timer base address\n");
-               return -ENXIO;
+               return PTR_ERR(timer_baseaddr);
        }
 
        irq = irq_of_parse_and_map(timer, 1);
@@ -513,20 +513,27 @@ static int __init ttc_timer_probe(struct platform_device *pdev)
        clk_ce = of_clk_get(timer, clksel);
        if (IS_ERR(clk_ce)) {
                pr_err("ERROR: timer input clock not found\n");
-               return PTR_ERR(clk_ce);
+               ret = PTR_ERR(clk_ce);
+               goto put_clk_cs;
        }
 
        ret = ttc_setup_clocksource(clk_cs, timer_baseaddr, timer_width);
        if (ret)
-               return ret;
+               goto put_clk_ce;
 
        ret = ttc_setup_clockevent(clk_ce, timer_baseaddr + 4, irq);
        if (ret)
-               return ret;
+               goto put_clk_ce;
 
        pr_info("%pOFn #0 at %p, irq=%d\n", timer, timer_baseaddr, irq);
 
        return 0;
+
+put_clk_ce:
+       clk_put(clk_ce);
+put_clk_cs:
+       clk_put(clk_cs);
+       return ret;
 }
 
 static const struct of_device_id ttc_timer_of_match[] = {
index ca3e4cb..28ab4f1 100644 (file)
@@ -16,7 +16,6 @@
 #include <linux/of.h>
 #include <linux/of_address.h>
 #include <linux/of_irq.h>
-#include <soc/imx/timer.h>
 
 /*
  * There are 4 versions of the timer hardware on Freescale MXC hardware.
  *  - MX25, MX31, MX35, MX37, MX51, MX6Q(rev1.0)
  *  - MX6DL, MX6SX, MX6Q(rev1.1+)
  */
+enum imx_gpt_type {
+       GPT_TYPE_IMX1,          /* i.MX1 */
+       GPT_TYPE_IMX21,         /* i.MX21/27 */
+       GPT_TYPE_IMX31,         /* i.MX31/35/25/37/51/6Q */
+       GPT_TYPE_IMX6DL,        /* i.MX6DL/SX/SL */
+};
 
 /* defines common for all i.MX */
 #define MXC_TCTL               0x00
@@ -93,13 +98,11 @@ static void imx1_gpt_irq_disable(struct imx_timer *imxtm)
        tmp = readl_relaxed(imxtm->base + MXC_TCTL);
        writel_relaxed(tmp & ~MX1_2_TCTL_IRQEN, imxtm->base + MXC_TCTL);
 }
-#define imx21_gpt_irq_disable imx1_gpt_irq_disable
 
 static void imx31_gpt_irq_disable(struct imx_timer *imxtm)
 {
        writel_relaxed(0, imxtm->base + V2_IR);
 }
-#define imx6dl_gpt_irq_disable imx31_gpt_irq_disable
 
 static void imx1_gpt_irq_enable(struct imx_timer *imxtm)
 {
@@ -108,13 +111,11 @@ static void imx1_gpt_irq_enable(struct imx_timer *imxtm)
        tmp = readl_relaxed(imxtm->base + MXC_TCTL);
        writel_relaxed(tmp | MX1_2_TCTL_IRQEN, imxtm->base + MXC_TCTL);
 }
-#define imx21_gpt_irq_enable imx1_gpt_irq_enable
 
 static void imx31_gpt_irq_enable(struct imx_timer *imxtm)
 {
        writel_relaxed(1<<0, imxtm->base + V2_IR);
 }
-#define imx6dl_gpt_irq_enable imx31_gpt_irq_enable
 
 static void imx1_gpt_irq_acknowledge(struct imx_timer *imxtm)
 {
@@ -131,7 +132,6 @@ static void imx31_gpt_irq_acknowledge(struct imx_timer *imxtm)
 {
        writel_relaxed(V2_TSTAT_OF1, imxtm->base + V2_TSTAT);
 }
-#define imx6dl_gpt_irq_acknowledge imx31_gpt_irq_acknowledge
 
 static void __iomem *sched_clock_reg;
 
@@ -296,7 +296,6 @@ static void imx1_gpt_setup_tctl(struct imx_timer *imxtm)
        tctl_val = MX1_2_TCTL_FRR | MX1_2_TCTL_CLK_PCLK1 | MXC_TCTL_TEN;
        writel_relaxed(tctl_val, imxtm->base + MXC_TCTL);
 }
-#define imx21_gpt_setup_tctl imx1_gpt_setup_tctl
 
 static void imx31_gpt_setup_tctl(struct imx_timer *imxtm)
 {
@@ -343,10 +342,10 @@ static const struct imx_gpt_data imx21_gpt_data = {
        .reg_tstat = MX1_2_TSTAT,
        .reg_tcn = MX1_2_TCN,
        .reg_tcmp = MX1_2_TCMP,
-       .gpt_irq_enable = imx21_gpt_irq_enable,
-       .gpt_irq_disable = imx21_gpt_irq_disable,
+       .gpt_irq_enable = imx1_gpt_irq_enable,
+       .gpt_irq_disable = imx1_gpt_irq_disable,
        .gpt_irq_acknowledge = imx21_gpt_irq_acknowledge,
-       .gpt_setup_tctl = imx21_gpt_setup_tctl,
+       .gpt_setup_tctl = imx1_gpt_setup_tctl,
        .set_next_event = mx1_2_set_next_event,
 };
 
@@ -365,9 +364,9 @@ static const struct imx_gpt_data imx6dl_gpt_data = {
        .reg_tstat = V2_TSTAT,
        .reg_tcn = V2_TCN,
        .reg_tcmp = V2_TCMP,
-       .gpt_irq_enable = imx6dl_gpt_irq_enable,
-       .gpt_irq_disable = imx6dl_gpt_irq_disable,
-       .gpt_irq_acknowledge = imx6dl_gpt_irq_acknowledge,
+       .gpt_irq_enable = imx31_gpt_irq_enable,
+       .gpt_irq_disable = imx31_gpt_irq_disable,
+       .gpt_irq_acknowledge = imx31_gpt_irq_acknowledge,
        .gpt_setup_tctl = imx6dl_gpt_setup_tctl,
        .set_next_event = v2_set_next_event,
 };
diff --git a/drivers/clocksource/timer-loongson1-pwm.c b/drivers/clocksource/timer-loongson1-pwm.c
new file mode 100644 (file)
index 0000000..6335fee
--- /dev/null
@@ -0,0 +1,236 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Clocksource driver for Loongson-1 SoC
+ *
+ * Copyright (c) 2023 Keguang Zhang <keguang.zhang@gmail.com>
+ */
+
+#include <linux/clockchips.h>
+#include <linux/interrupt.h>
+#include <linux/sizes.h>
+#include "timer-of.h"
+
+/* Loongson-1 PWM Timer Register Definitions */
+#define PWM_CNTR               0x0
+#define PWM_HRC                        0x4
+#define PWM_LRC                        0x8
+#define PWM_CTRL               0xc
+
+/* PWM Control Register Bits */
+#define INT_LRC_EN             BIT(11)
+#define INT_HRC_EN             BIT(10)
+#define CNTR_RST               BIT(7)
+#define INT_SR                 BIT(6)
+#define INT_EN                 BIT(5)
+#define PWM_SINGLE             BIT(4)
+#define PWM_OE                 BIT(3)
+#define CNT_EN                 BIT(0)
+
+#define CNTR_WIDTH             24
+
+DEFINE_RAW_SPINLOCK(ls1x_timer_lock);
+
+struct ls1x_clocksource {
+       void __iomem *reg_base;
+       unsigned long ticks_per_jiffy;
+       struct clocksource clksrc;
+};
+
+static inline struct ls1x_clocksource *to_ls1x_clksrc(struct clocksource *c)
+{
+       return container_of(c, struct ls1x_clocksource, clksrc);
+}
+
+static inline void ls1x_pwmtimer_set_period(unsigned int period,
+                                           struct timer_of *to)
+{
+       writel(period, timer_of_base(to) + PWM_LRC);
+       writel(period, timer_of_base(to) + PWM_HRC);
+}
+
+static inline void ls1x_pwmtimer_clear(struct timer_of *to)
+{
+       writel(0, timer_of_base(to) + PWM_CNTR);
+}
+
+static inline void ls1x_pwmtimer_start(struct timer_of *to)
+{
+       writel((INT_EN | PWM_OE | CNT_EN), timer_of_base(to) + PWM_CTRL);
+}
+
+static inline void ls1x_pwmtimer_stop(struct timer_of *to)
+{
+       writel(0, timer_of_base(to) + PWM_CTRL);
+}
+
+static inline void ls1x_pwmtimer_irq_ack(struct timer_of *to)
+{
+       int val;
+
+       val = readl(timer_of_base(to) + PWM_CTRL);
+       val |= INT_SR;
+       writel(val, timer_of_base(to) + PWM_CTRL);
+}
+
+static irqreturn_t ls1x_clockevent_isr(int irq, void *dev_id)
+{
+       struct clock_event_device *clkevt = dev_id;
+       struct timer_of *to = to_timer_of(clkevt);
+
+       ls1x_pwmtimer_irq_ack(to);
+       ls1x_pwmtimer_clear(to);
+       ls1x_pwmtimer_start(to);
+
+       clkevt->event_handler(clkevt);
+
+       return IRQ_HANDLED;
+}
+
+static int ls1x_clockevent_set_state_periodic(struct clock_event_device *clkevt)
+{
+       struct timer_of *to = to_timer_of(clkevt);
+
+       raw_spin_lock(&ls1x_timer_lock);
+       ls1x_pwmtimer_set_period(timer_of_period(to), to);
+       ls1x_pwmtimer_clear(to);
+       ls1x_pwmtimer_start(to);
+       raw_spin_unlock(&ls1x_timer_lock);
+
+       return 0;
+}
+
+static int ls1x_clockevent_tick_resume(struct clock_event_device *clkevt)
+{
+       raw_spin_lock(&ls1x_timer_lock);
+       ls1x_pwmtimer_start(to_timer_of(clkevt));
+       raw_spin_unlock(&ls1x_timer_lock);
+
+       return 0;
+}
+
+static int ls1x_clockevent_set_state_shutdown(struct clock_event_device *clkevt)
+{
+       raw_spin_lock(&ls1x_timer_lock);
+       ls1x_pwmtimer_stop(to_timer_of(clkevt));
+       raw_spin_unlock(&ls1x_timer_lock);
+
+       return 0;
+}
+
+static int ls1x_clockevent_set_next(unsigned long evt,
+                                   struct clock_event_device *clkevt)
+{
+       struct timer_of *to = to_timer_of(clkevt);
+
+       raw_spin_lock(&ls1x_timer_lock);
+       ls1x_pwmtimer_set_period(evt, to);
+       ls1x_pwmtimer_clear(to);
+       ls1x_pwmtimer_start(to);
+       raw_spin_unlock(&ls1x_timer_lock);
+
+       return 0;
+}
+
+static struct timer_of ls1x_to = {
+       .flags = TIMER_OF_IRQ | TIMER_OF_BASE | TIMER_OF_CLOCK,
+       .clkevt = {
+               .name                   = "ls1x-pwmtimer",
+               .features               = CLOCK_EVT_FEAT_PERIODIC |
+                                         CLOCK_EVT_FEAT_ONESHOT,
+               .rating                 = 300,
+               .set_next_event         = ls1x_clockevent_set_next,
+               .set_state_periodic     = ls1x_clockevent_set_state_periodic,
+               .set_state_oneshot      = ls1x_clockevent_set_state_shutdown,
+               .set_state_shutdown     = ls1x_clockevent_set_state_shutdown,
+               .tick_resume            = ls1x_clockevent_tick_resume,
+       },
+       .of_irq = {
+               .handler                = ls1x_clockevent_isr,
+               .flags                  = IRQF_TIMER,
+       },
+};
+
+/*
+ * Since the PWM timer overflows every two ticks, it's not very useful
+ * to just read by itself. So use jiffies to emulate a free
+ * running counter:
+ */
+static u64 ls1x_clocksource_read(struct clocksource *cs)
+{
+       struct ls1x_clocksource *ls1x_cs = to_ls1x_clksrc(cs);
+       unsigned long flags;
+       int count;
+       u32 jifs;
+       static int old_count;
+       static u32 old_jifs;
+
+       raw_spin_lock_irqsave(&ls1x_timer_lock, flags);
+       /*
+        * Although our caller may have the read side of xtime_lock,
+        * this is now a seqlock, and we are cheating in this routine
+        * by having side effects on state that we cannot undo if
+        * there is a collision on the seqlock and our caller has to
+        * retry.  (Namely, old_jifs and old_count.)  So we must treat
+        * jiffies as volatile despite the lock.  We read jiffies
+        * before latching the timer count to guarantee that although
+        * the jiffies value might be older than the count (that is,
+        * the counter may underflow between the last point where
+        * jiffies was incremented and the point where we latch the
+        * count), it cannot be newer.
+        */
+       jifs = jiffies;
+       /* read the count */
+       count = readl(ls1x_cs->reg_base + PWM_CNTR);
+
+       /*
+        * It's possible for count to appear to go the wrong way for this
+        * reason:
+        *
+        *  The timer counter underflows, but we haven't handled the resulting
+        *  interrupt and incremented jiffies yet.
+        *
+        * Previous attempts to handle these cases intelligently were buggy, so
+        * we just do the simple thing now.
+        */
+       if (count < old_count && jifs == old_jifs)
+               count = old_count;
+
+       old_count = count;
+       old_jifs = jifs;
+
+       raw_spin_unlock_irqrestore(&ls1x_timer_lock, flags);
+
+       return (u64)(jifs * ls1x_cs->ticks_per_jiffy) + count;
+}
+
+static struct ls1x_clocksource ls1x_clocksource = {
+       .clksrc = {
+               .name           = "ls1x-pwmtimer",
+               .rating         = 300,
+               .read           = ls1x_clocksource_read,
+               .mask           = CLOCKSOURCE_MASK(CNTR_WIDTH),
+               .flags          = CLOCK_SOURCE_IS_CONTINUOUS,
+       },
+};
+
+static int __init ls1x_pwm_clocksource_init(struct device_node *np)
+{
+       struct timer_of *to = &ls1x_to;
+       int ret;
+
+       ret = timer_of_init(np, to);
+       if (ret)
+               return ret;
+
+       clockevents_config_and_register(&to->clkevt, timer_of_rate(to),
+                                       0x1, GENMASK(CNTR_WIDTH - 1, 0));
+
+       ls1x_clocksource.reg_base = timer_of_base(to);
+       ls1x_clocksource.ticks_per_jiffy = timer_of_period(to);
+
+       return clocksource_register_hz(&ls1x_clocksource.clksrc,
+                                      timer_of_rate(to));
+}
+
+TIMER_OF_DECLARE(ls1x_pwm_clocksource, "loongson,ls1b-pwmtimer",
+                ls1x_pwm_clocksource_init);
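
For reference, ls1x_clocksource_read() above widens the 24-bit PWM count by composing it with jiffies: the returned value is jiffies * ticks_per_jiffy + count, clamped so it never steps backwards when the counter has wrapped before the timer interrupt bumped jiffies. A minimal user-space sketch of that composition (the tick rate and sample values are invented for illustration):

#include <stdint.h>
#include <stdio.h>

#define TICKS_PER_JIFFY 33000u	/* invented: PWM input-clock ticks per jiffy */

static uint64_t compose(uint32_t jifs, uint32_t count,
			uint32_t *old_jifs, uint32_t *old_count)
{
	/* Counter went backwards (wrapped) but jiffies has not moved yet: hold the old count. */
	if (count < *old_count && jifs == *old_jifs)
		count = *old_count;
	*old_count = count;
	*old_jifs = jifs;
	return (uint64_t)jifs * TICKS_PER_JIFFY + count;
}

int main(void)
{
	uint32_t old_jifs = 0, old_count = 0;

	/* Second read sees a smaller count in the same jiffy; the result stays monotonic. */
	printf("%llu\n", (unsigned long long)compose(100, 12345, &old_jifs, &old_count));
	printf("%llu\n", (unsigned long long)compose(100, 100, &old_jifs, &old_count));
	return 0;
}
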
index 2c839bd..a1c51ab 100644 (file)
@@ -38,7 +38,7 @@ choice
        prompt "Default CPUFreq governor"
        default CPU_FREQ_DEFAULT_GOV_USERSPACE if ARM_SA1110_CPUFREQ
        default CPU_FREQ_DEFAULT_GOV_SCHEDUTIL if ARM64 || ARM
-       default CPU_FREQ_DEFAULT_GOV_SCHEDUTIL if X86_INTEL_PSTATE && SMP
+       default CPU_FREQ_DEFAULT_GOV_SCHEDUTIL if (X86_INTEL_PSTATE || X86_AMD_PSTATE) && SMP
        default CPU_FREQ_DEFAULT_GOV_PERFORMANCE
        help
          This option sets which CPUFreq governor shall be loaded at
index 00476e9..438c9e7 100644 (file)
@@ -51,6 +51,23 @@ config X86_AMD_PSTATE
 
          If in doubt, say N.
 
+config X86_AMD_PSTATE_DEFAULT_MODE
+       int "AMD Processor P-State default mode"
+       depends on X86_AMD_PSTATE
+       default 3 if X86_AMD_PSTATE
+       range 1 4
+       help
+         Select the default mode the amd-pstate driver will use on
+         supported hardware.
+         The value set has the following meanings:
+               1 -> Disabled
+               2 -> Passive
+               3 -> Active (EPP)
+               4 -> Guided
+
+         For details, take a look at:
+         <file:Documentation/admin-guide/pm/amd-pstate.rst>.
+
 config X86_AMD_PSTATE_UT
        tristate "selftest for AMD Processor P-State driver"
        depends on X86 && ACPI_PROCESSOR
index 2990439..b2f05d2 100644 (file)
@@ -975,7 +975,7 @@ static int __init acpi_cpufreq_probe(struct platform_device *pdev)
 
        /* don't keep reloading if cpufreq_driver exists */
        if (cpufreq_get_current_driver())
-               return -EEXIST;
+               return -ENODEV;
 
        pr_debug("%s\n", __func__);
 
index 5a3d4aa..81fba0d 100644 (file)
@@ -62,7 +62,8 @@
 static struct cpufreq_driver *current_pstate_driver;
 static struct cpufreq_driver amd_pstate_driver;
 static struct cpufreq_driver amd_pstate_epp_driver;
-static int cppc_state = AMD_PSTATE_DISABLE;
+static int cppc_state = AMD_PSTATE_UNDEFINED;
+static bool cppc_enabled;
 
 /*
  * AMD Energy Preference Performance (EPP)
@@ -228,7 +229,28 @@ static int amd_pstate_set_energy_pref_index(struct amd_cpudata *cpudata,
 
 static inline int pstate_enable(bool enable)
 {
-       return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable);
+       int ret, cpu;
+       unsigned long logical_proc_id_mask = 0;
+
+       if (enable == cppc_enabled)
+               return 0;
+
+       for_each_present_cpu(cpu) {
+               unsigned long logical_id = topology_logical_die_id(cpu);
+
+               if (test_bit(logical_id, &logical_proc_id_mask))
+                       continue;
+
+               set_bit(logical_id, &logical_proc_id_mask);
+
+               ret = wrmsrl_safe_on_cpu(cpu, MSR_AMD_CPPC_ENABLE,
+                               enable);
+               if (ret)
+                       return ret;
+       }
+
+       cppc_enabled = enable;
+       return 0;
 }
 
 static int cppc_enable(bool enable)
@@ -236,6 +258,9 @@ static int cppc_enable(bool enable)
        int cpu, ret = 0;
        struct cppc_perf_ctrls perf_ctrls;
 
+       if (enable == cppc_enabled)
+               return 0;
+
        for_each_present_cpu(cpu) {
                ret = cppc_set_enable(cpu, enable);
                if (ret)
@@ -251,6 +276,7 @@ static int cppc_enable(bool enable)
                }
        }
 
+       cppc_enabled = enable;
        return ret;
 }
 
@@ -444,9 +470,8 @@ static int amd_pstate_verify(struct cpufreq_policy_data *policy)
        return 0;
 }
 
-static int amd_pstate_target(struct cpufreq_policy *policy,
-                            unsigned int target_freq,
-                            unsigned int relation)
+static int amd_pstate_update_freq(struct cpufreq_policy *policy,
+                                 unsigned int target_freq, bool fast_switch)
 {
        struct cpufreq_freqs freqs;
        struct amd_cpudata *cpudata = policy->driver_data;
@@ -465,26 +490,51 @@ static int amd_pstate_target(struct cpufreq_policy *policy,
        des_perf = DIV_ROUND_CLOSEST(target_freq * cap_perf,
                                     cpudata->max_freq);
 
-       cpufreq_freq_transition_begin(policy, &freqs);
+       WARN_ON(fast_switch && !policy->fast_switch_enabled);
+       /*
+        * If fast_switch is desired, then there aren't any registered
+        * transition notifiers. See comment for
+        * cpufreq_enable_fast_switch().
+        */
+       if (!fast_switch)
+               cpufreq_freq_transition_begin(policy, &freqs);
+
        amd_pstate_update(cpudata, min_perf, des_perf,
-                         max_perf, false, policy->governor->flags);
-       cpufreq_freq_transition_end(policy, &freqs, false);
+                       max_perf, fast_switch, policy->governor->flags);
+
+       if (!fast_switch)
+               cpufreq_freq_transition_end(policy, &freqs, false);
 
        return 0;
 }
 
+static int amd_pstate_target(struct cpufreq_policy *policy,
+                            unsigned int target_freq,
+                            unsigned int relation)
+{
+       return amd_pstate_update_freq(policy, target_freq, false);
+}
+
+static unsigned int amd_pstate_fast_switch(struct cpufreq_policy *policy,
+                                 unsigned int target_freq)
+{
+       return amd_pstate_update_freq(policy, target_freq, true);
+}
+
 static void amd_pstate_adjust_perf(unsigned int cpu,
                                   unsigned long _min_perf,
                                   unsigned long target_perf,
                                   unsigned long capacity)
 {
        unsigned long max_perf, min_perf, des_perf,
-                     cap_perf, lowest_nonlinear_perf;
+                     cap_perf, lowest_nonlinear_perf, max_freq;
        struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
        struct amd_cpudata *cpudata = policy->driver_data;
+       unsigned int target_freq;
 
        cap_perf = READ_ONCE(cpudata->highest_perf);
        lowest_nonlinear_perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
+       max_freq = READ_ONCE(cpudata->max_freq);
 
        des_perf = cap_perf;
        if (target_perf < capacity)
@@ -501,6 +551,10 @@ static void amd_pstate_adjust_perf(unsigned int cpu,
        if (max_perf < min_perf)
                max_perf = min_perf;
 
+       des_perf = clamp_t(unsigned long, des_perf, min_perf, max_perf);
+       target_freq = div_u64(des_perf * max_freq, max_perf);
+       policy->cur = target_freq;
+
        amd_pstate_update(cpudata, min_perf, des_perf, max_perf, true,
                        policy->governor->flags);
        cpufreq_cpu_put(policy);
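
The adjust_perf path above now also back-computes a frequency for policy->cur from the clamped desired perf, target_freq = des_perf * max_freq / max_perf. A quick arithmetic check with invented numbers (not taken from any real part):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t max_freq = 3600000;	/* kHz, invented maximum frequency */
	uint64_t max_perf = 228;	/* invented highest perf value */
	uint64_t des_perf = 114;	/* desired perf after clamping */

	/* Mirrors div_u64(des_perf * max_freq, max_perf) in the hunk above. */
	printf("policy->cur = %llu kHz\n",
	       (unsigned long long)(des_perf * max_freq / max_perf));	/* 1800000 kHz */
	return 0;
}
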
@@ -715,6 +769,7 @@ static int amd_pstate_cpu_exit(struct cpufreq_policy *policy)
 
        freq_qos_remove_request(&cpudata->req[1]);
        freq_qos_remove_request(&cpudata->req[0]);
+       policy->fast_switch_possible = false;
        kfree(cpudata);
 
        return 0;
@@ -1016,6 +1071,26 @@ static const struct attribute_group amd_pstate_global_attr_group = {
        .attrs = pstate_global_attributes,
 };
 
+static bool amd_pstate_acpi_pm_profile_server(void)
+{
+       switch (acpi_gbl_FADT.preferred_profile) {
+       case PM_ENTERPRISE_SERVER:
+       case PM_SOHO_SERVER:
+       case PM_PERFORMANCE_SERVER:
+               return true;
+       }
+       return false;
+}
+
+static bool amd_pstate_acpi_pm_profile_undefined(void)
+{
+       if (acpi_gbl_FADT.preferred_profile == PM_UNSPECIFIED)
+               return true;
+       if (acpi_gbl_FADT.preferred_profile >= NR_PM_PROFILES)
+               return true;
+       return false;
+}
+
 static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy)
 {
        int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
@@ -1073,13 +1148,16 @@ static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy)
        policy->max = policy->cpuinfo.max_freq;
 
        /*
-        * Set the policy to powersave to provide a valid fallback value in case
+        * Set the policy to provide a valid fallback value in case
         * the default cpufreq governor is neither powersave nor performance.
         */
-       policy->policy = CPUFREQ_POLICY_POWERSAVE;
+       if (amd_pstate_acpi_pm_profile_server() ||
+           amd_pstate_acpi_pm_profile_undefined())
+               policy->policy = CPUFREQ_POLICY_PERFORMANCE;
+       else
+               policy->policy = CPUFREQ_POLICY_POWERSAVE;
 
        if (boot_cpu_has(X86_FEATURE_CPPC)) {
-               policy->fast_switch_possible = true;
                ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, &value);
                if (ret)
                        return ret;
@@ -1102,7 +1180,6 @@ free_cpudata1:
 static int amd_pstate_epp_cpu_exit(struct cpufreq_policy *policy)
 {
        pr_debug("CPU %d exiting\n", policy->cpu);
-       policy->fast_switch_possible = false;
        return 0;
 }
 
@@ -1309,6 +1386,7 @@ static struct cpufreq_driver amd_pstate_driver = {
        .flags          = CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
        .verify         = amd_pstate_verify,
        .target         = amd_pstate_target,
+       .fast_switch    = amd_pstate_fast_switch,
        .init           = amd_pstate_cpu_init,
        .exit           = amd_pstate_cpu_exit,
        .suspend        = amd_pstate_cpu_suspend,
@@ -1328,10 +1406,29 @@ static struct cpufreq_driver amd_pstate_epp_driver = {
        .online         = amd_pstate_epp_cpu_online,
        .suspend        = amd_pstate_epp_suspend,
        .resume         = amd_pstate_epp_resume,
-       .name           = "amd_pstate_epp",
+       .name           = "amd-pstate-epp",
        .attr           = amd_pstate_epp_attr,
 };
 
+static int __init amd_pstate_set_driver(int mode_idx)
+{
+       if (mode_idx >= AMD_PSTATE_DISABLE && mode_idx < AMD_PSTATE_MAX) {
+               cppc_state = mode_idx;
+               if (cppc_state == AMD_PSTATE_DISABLE)
+                       pr_info("driver is explicitly disabled\n");
+
+               if (cppc_state == AMD_PSTATE_ACTIVE)
+                       current_pstate_driver = &amd_pstate_epp_driver;
+
+               if (cppc_state == AMD_PSTATE_PASSIVE || cppc_state == AMD_PSTATE_GUIDED)
+                       current_pstate_driver = &amd_pstate_driver;
+
+               return 0;
+       }
+
+       return -EINVAL;
+}
+
 static int __init amd_pstate_init(void)
 {
        struct device *dev_root;
@@ -1339,15 +1436,6 @@ static int __init amd_pstate_init(void)
 
        if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
                return -ENODEV;
-       /*
-        * by default the pstate driver is disabled to load
-        * enable the amd_pstate passive mode driver explicitly
-        * with amd_pstate=passive or other modes in kernel command line
-        */
-       if (cppc_state == AMD_PSTATE_DISABLE) {
-               pr_info("driver load is disabled, boot with specific mode to enable this\n");
-               return -ENODEV;
-       }
 
        if (!acpi_cpc_valid()) {
                pr_warn_once("the _CPC object is not present in SBIOS or ACPI disabled\n");
@@ -1358,6 +1446,33 @@ static int __init amd_pstate_init(void)
        if (cpufreq_get_current_driver())
                return -EEXIST;
 
+       switch (cppc_state) {
+       case AMD_PSTATE_UNDEFINED:
+               /* Disable on the following configs by default:
+                * 1. Undefined platforms
+                * 2. Server platforms
+                * 3. Shared memory designs
+                */
+               if (amd_pstate_acpi_pm_profile_undefined() ||
+                   amd_pstate_acpi_pm_profile_server() ||
+                   !boot_cpu_has(X86_FEATURE_CPPC)) {
+                       pr_info("driver load is disabled, boot with specific mode to enable this\n");
+                       return -ENODEV;
+               }
+               ret = amd_pstate_set_driver(CONFIG_X86_AMD_PSTATE_DEFAULT_MODE);
+               if (ret)
+                       return ret;
+               break;
+       case AMD_PSTATE_DISABLE:
+               return -ENODEV;
+       case AMD_PSTATE_PASSIVE:
+       case AMD_PSTATE_ACTIVE:
+       case AMD_PSTATE_GUIDED:
+               break;
+       default:
+               return -EINVAL;
+       }
+
        /* capability check */
        if (boot_cpu_has(X86_FEATURE_CPPC)) {
                pr_debug("AMD CPPC MSR based functionality is supported\n");
@@ -1410,21 +1525,7 @@ static int __init amd_pstate_param(char *str)
        size = strlen(str);
        mode_idx = get_mode_idx_from_str(str, size);
 
-       if (mode_idx >= AMD_PSTATE_DISABLE && mode_idx < AMD_PSTATE_MAX) {
-               cppc_state = mode_idx;
-               if (cppc_state == AMD_PSTATE_DISABLE)
-                       pr_info("driver is explicitly disabled\n");
-
-               if (cppc_state == AMD_PSTATE_ACTIVE)
-                       current_pstate_driver = &amd_pstate_epp_driver;
-
-               if (cppc_state == AMD_PSTATE_PASSIVE || cppc_state == AMD_PSTATE_GUIDED)
-                       current_pstate_driver = &amd_pstate_driver;
-
-               return 0;
-       }
-
-       return -EINVAL;
+       return amd_pstate_set_driver(mode_idx);
 }
 early_param("amd_pstate", amd_pstate_param);
 
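With cppc_state now starting out as AMD_PSTATE_UNDEFINED, the effective default mode is chosen at init time instead of the driver staying off unless a boot parameter was given. A rough, hypothetical model of the AMD_PSTATE_UNDEFINED branch (mode numbers follow the Kconfig help text above; this is a reading aid, not driver code):

#include <stdio.h>

enum { MODE_DISABLED = 1, MODE_PASSIVE, MODE_ACTIVE, MODE_GUIDED };

/* Hypothetical model: server platforms, undefined ACPI preferred profiles and
 * shared-memory designs (no CPPC MSR) stay disabled; everything else gets the
 * Kconfig default, here assumed to be 3 (Active/EPP). */
static int pick_default_mode(int server_profile, int undefined_profile, int has_cppc_msr)
{
	if (undefined_profile || server_profile || !has_cppc_msr)
		return 0;		/* driver does not load, init returns -ENODEV */
	return MODE_ACTIVE;		/* CONFIG_X86_AMD_PSTATE_DEFAULT_MODE = 3 */
}

int main(void)
{
	printf("client CPU with CPPC MSR -> mode %d\n", pick_default_mode(0, 0, 1));
	printf("server platform          -> mode %d\n", pick_default_mode(1, 0, 1));
	return 0;
}
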
index 6b52ebe..50bbc96 100644 (file)
@@ -2828,7 +2828,8 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data)
             (driver_data->setpolicy && (driver_data->target_index ||
                    driver_data->target)) ||
             (!driver_data->get_intermediate != !driver_data->target_intermediate) ||
-            (!driver_data->online != !driver_data->offline))
+            (!driver_data->online != !driver_data->offline) ||
+                (driver_data->adjust_perf && !driver_data->fast_switch))
                return -EINVAL;
 
        pr_debug("trying to register driver %s\n", driver_data->name);
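
The added condition rejects drivers that supply ->adjust_perf without ->fast_switch, which amd-pstate above now satisfies by wiring up amd_pstate_fast_switch(). A hypothetical driver that would now fail registration with -EINVAL (all names invented):

/* Hypothetical: ->adjust_perf is set but ->fast_switch is left NULL, so
 * cpufreq_register_driver() returns -EINVAL under the new sanity check. */
static struct cpufreq_driver bad_example_driver = {
	.name		= "bad-example",
	.verify		= example_verify,
	.target		= example_target,
	.adjust_perf	= example_adjust_perf,
	/* .fast_switch intentionally missing */
	.init		= example_init,
	.exit		= example_exit,
};
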
index 2548ec9..f291825 100644 (file)
@@ -824,6 +824,8 @@ static ssize_t store_energy_performance_preference(
                        err = cpufreq_start_governor(policy);
                        if (!ret)
                                ret = err;
+               } else {
+                       ret = 0;
                }
        }
 
index 1d2cfea..73efbcf 100644 (file)
@@ -583,7 +583,7 @@ static int __init pcc_cpufreq_probe(struct platform_device *pdev)
 
        /* Skip initialization if another cpufreq driver is there. */
        if (cpufreq_get_current_driver())
-               return -EEXIST;
+               return -ENODEV;
 
        if (acpi_disabled)
                return -ENODEV;
index 8e929f6..737a026 100644 (file)
@@ -145,7 +145,7 @@ static noinstr void enter_s2idle_proper(struct cpuidle_driver *drv,
 
        instrumentation_begin();
 
-       time_start = ns_to_ktime(local_clock());
+       time_start = ns_to_ktime(local_clock_noinstr());
 
        tick_freeze();
        /*
@@ -169,7 +169,7 @@ static noinstr void enter_s2idle_proper(struct cpuidle_driver *drv,
        tick_unfreeze();
        start_critical_timings();
 
-       time_end = ns_to_ktime(local_clock());
+       time_end = ns_to_ktime(local_clock_noinstr());
 
        dev->states_usage[index].s2idle_time += ktime_us_delta(time_end, time_start);
        dev->states_usage[index].s2idle_usage++;
@@ -243,7 +243,7 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
        sched_idle_set_state(target_state);
 
        trace_cpu_idle(index, dev->cpu);
-       time_start = ns_to_ktime(local_clock());
+       time_start = ns_to_ktime(local_clock_noinstr());
 
        stop_critical_timings();
        if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE)) {
@@ -276,7 +276,7 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
        start_critical_timings();
 
        sched_clock_idle_wakeup_event();
-       time_end = ns_to_ktime(local_clock());
+       time_end = ns_to_ktime(local_clock_noinstr());
        trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
 
        /* The cpu is no longer idle or about to enter idle. */
index bdcfeae..9b6d90a 100644 (file)
@@ -15,7 +15,7 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
 {
        u64 time_start;
 
-       time_start = local_clock();
+       time_start = local_clock_noinstr();
 
        dev->poll_time_limit = false;
 
@@ -32,7 +32,7 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
                                continue;
 
                        loop_count = 0;
-                       if (local_clock() - time_start > limit) {
+                       if (local_clock_noinstr() - time_start > limit) {
                                dev->poll_time_limit = true;
                                break;
                        }
index 10fe9f7..f2dd667 100644 (file)
@@ -8,7 +8,7 @@
  * keysize in CBC and ECB mode.
  * Add support also for DES and 3DES in CBC and ECB mode.
  *
- * You could find the datasheet in Documentation/arm/sunxi.rst
+ * You could find the datasheet in Documentation/arch/arm/sunxi.rst
  */
 #include "sun4i-ss.h"
 
index 006e401..51a3a7b 100644 (file)
@@ -6,7 +6,7 @@
  *
  * Core file which registers crypto algorithms supported by the SS.
  *
- * You could find a link for the datasheet in Documentation/arm/sunxi.rst
+ * You could find a link for the datasheet in Documentation/arch/arm/sunxi.rst
  */
 #include <linux/clk.h>
 #include <linux/crypto.h>
index d282927..f7893e4 100644 (file)
@@ -6,7 +6,7 @@
  *
  * This file add support for MD5 and SHA1.
  *
- * You could find the datasheet in Documentation/arm/sunxi.rst
+ * You could find the datasheet in Documentation/arch/arm/sunxi.rst
  */
 #include "sun4i-ss.h"
 #include <asm/unaligned.h>
index ba59c7a..6c5d4aa 100644 (file)
@@ -8,7 +8,7 @@
  * Support MD5 and SHA1 hash algorithms.
  * Support DES and 3DES
  *
- * You could find the datasheet in Documentation/arm/sunxi.rst
+ * You could find the datasheet in Documentation/arch/arm/sunxi.rst
  */
 
 #include <linux/clk.h>
index 74b4e91..c135500 100644 (file)
@@ -8,7 +8,7 @@
  * This file add support for AES cipher with 128,192,256 bits keysize in
  * CBC and ECB mode.
  *
- * You could find a link for the datasheet in Documentation/arm/sunxi.rst
+ * You could find a link for the datasheet in Documentation/arch/arm/sunxi.rst
  */
 
 #include <linux/bottom_half.h>
index a6865ff..07ea0cc 100644 (file)
@@ -7,7 +7,7 @@
  *
  * Core file which registers crypto algorithms supported by the CryptoEngine.
  *
- * You could find a link for the datasheet in Documentation/arm/sunxi.rst
+ * You could find a link for the datasheet in Documentation/arch/arm/sunxi.rst
  */
 #include <linux/clk.h>
 #include <linux/crypto.h>
index 8b5b9b9..930ad15 100644 (file)
@@ -7,7 +7,7 @@
  *
  * This file add support for MD5 and SHA1/SHA224/SHA256/SHA384/SHA512.
  *
- * You could find the datasheet in Documentation/arm/sunxi.rst
+ * You could find the datasheet in Documentation/arch/arm/sunxi.rst
  */
 #include <linux/bottom_half.h>
 #include <linux/dma-mapping.h>
index b3cc43e..8081537 100644 (file)
@@ -7,7 +7,7 @@
  *
  * This file handle the PRNG
  *
- * You could find a link for the datasheet in Documentation/arm/sunxi.rst
+ * You could find a link for the datasheet in Documentation/arch/arm/sunxi.rst
  */
 #include "sun8i-ce.h"
 #include <linux/dma-mapping.h>
index e2b9b91..9c35f2a 100644 (file)
@@ -7,7 +7,7 @@
  *
  * This file handle the TRNG
  *
- * You could find a link for the datasheet in Documentation/arm/sunxi.rst
+ * You could find a link for the datasheet in Documentation/arch/arm/sunxi.rst
  */
 #include "sun8i-ce.h"
 #include <linux/dma-mapping.h>
index 16966cc..381a90f 100644 (file)
@@ -8,7 +8,7 @@
  * This file add support for AES cipher with 128,192,256 bits keysize in
  * CBC and ECB mode.
  *
- * You could find a link for the datasheet in Documentation/arm/sunxi.rst
+ * You could find a link for the datasheet in Documentation/arch/arm/sunxi.rst
  */
 
 #include <linux/bottom_half.h>
index c9dc06f..3dd844b 100644 (file)
@@ -7,7 +7,7 @@
  *
  * Core file which registers crypto algorithms supported by the SecuritySystem
  *
- * You could find a link for the datasheet in Documentation/arm/sunxi.rst
+ * You could find a link for the datasheet in Documentation/arch/arm/sunxi.rst
  */
 #include <linux/clk.h>
 #include <linux/crypto.h>
index 577bf63..a4b67d1 100644 (file)
@@ -7,7 +7,7 @@
  *
  * This file add support for MD5 and SHA1/SHA224/SHA256.
  *
- * You could find the datasheet in Documentation/arm/sunxi.rst
+ * You could find the datasheet in Documentation/arch/arm/sunxi.rst
  */
 #include <linux/bottom_half.h>
 #include <linux/dma-mapping.h>
index 70c7b5d..a923cfc 100644 (file)
@@ -7,7 +7,7 @@
  *
  * This file handle the PRNG found in the SS
  *
- * You could find a link for the datasheet in Documentation/arm/sunxi.rst
+ * You could find a link for the datasheet in Documentation/arch/arm/sunxi.rst
  */
 #include "sun8i-ss.h"
 #include <linux/dma-mapping.h>
index ddf6e91..30e6acf 100644 (file)
@@ -357,9 +357,9 @@ static int cptpf_vfpf_mbox_init(struct otx2_cptpf_dev *cptpf, int num_vfs)
        u64 vfpf_mbox_base;
        int err, i;
 
-       cptpf->vfpf_mbox_wq = alloc_workqueue("cpt_vfpf_mailbox",
-                                             WQ_UNBOUND | WQ_HIGHPRI |
-                                             WQ_MEM_RECLAIM, 1);
+       cptpf->vfpf_mbox_wq =
+               alloc_ordered_workqueue("cpt_vfpf_mailbox",
+                                       WQ_HIGHPRI | WQ_MEM_RECLAIM);
        if (!cptpf->vfpf_mbox_wq)
                return -ENOMEM;
 
@@ -453,9 +453,9 @@ static int cptpf_afpf_mbox_init(struct otx2_cptpf_dev *cptpf)
        resource_size_t offset;
        int err;
 
-       cptpf->afpf_mbox_wq = alloc_workqueue("cpt_afpf_mailbox",
-                                             WQ_UNBOUND | WQ_HIGHPRI |
-                                             WQ_MEM_RECLAIM, 1);
+       cptpf->afpf_mbox_wq =
+               alloc_ordered_workqueue("cpt_afpf_mailbox",
+                                       WQ_HIGHPRI | WQ_MEM_RECLAIM);
        if (!cptpf->afpf_mbox_wq)
                return -ENOMEM;
 
index 392e9fe..6023a7a 100644 (file)
@@ -75,9 +75,9 @@ static int cptvf_pfvf_mbox_init(struct otx2_cptvf_dev *cptvf)
        resource_size_t offset, size;
        int ret;
 
-       cptvf->pfvf_mbox_wq = alloc_workqueue("cpt_pfvf_mailbox",
-                                             WQ_UNBOUND | WQ_HIGHPRI |
-                                             WQ_MEM_RECLAIM, 1);
+       cptvf->pfvf_mbox_wq =
+               alloc_ordered_workqueue("cpt_pfvf_mailbox",
+                                       WQ_HIGHPRI | WQ_MEM_RECLAIM);
        if (!cptvf->pfvf_mbox_wq)
                return -ENOMEM;
 
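The three conversions above swap an unbound, max_active=1 workqueue for the ordered helper. In recent kernels alloc_ordered_workqueue() expands to roughly the call it replaces plus the ordered-workqueue flags, so execution ordering is guaranteed rather than incidental; a sketch of the intended equivalence (check include/linux/workqueue.h in your tree for the exact definition):

/* Before: single-threaded only by virtue of max_active = 1. */
wq = alloc_workqueue("cpt_pfvf_mailbox",
		     WQ_UNBOUND | WQ_HIGHPRI | WQ_MEM_RECLAIM, 1);

/* After: explicitly ordered; expands to an alloc_workqueue() call with
 * WQ_UNBOUND plus the ordering flags and max_active forced to 1. */
wq = alloc_ordered_workqueue("cpt_pfvf_mailbox",
			     WQ_HIGHPRI | WQ_MEM_RECLAIM);
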
index 23b9ff9..bea9cf3 100644 (file)
@@ -1028,7 +1028,7 @@ static int cxl_mem_get_partition_info(struct cxl_dev_state *cxlds)
  * cxl_dev_state_identify() - Send the IDENTIFY command to the device.
  * @cxlds: The device data for the operation
  *
- * Return: 0 if identify was executed successfully.
+ * Return: 0 if identify was executed successfully or media not ready.
  *
  * This will dispatch the identify command to the device and on success populate
  * structures to be exported to sysfs.
@@ -1041,6 +1041,9 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
        u32 val;
        int rc;
 
+       if (!cxlds->media_ready)
+               return 0;
+
        mbox_cmd = (struct cxl_mbox_cmd) {
                .opcode = CXL_MBOX_OP_IDENTIFY,
                .size_out = sizeof(id),
@@ -1102,6 +1105,13 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds)
        struct device *dev = cxlds->dev;
        int rc;
 
+       if (!cxlds->media_ready) {
+               cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
+               cxlds->ram_res = DEFINE_RES_MEM(0, 0);
+               cxlds->pmem_res = DEFINE_RES_MEM(0, 0);
+               return 0;
+       }
+
        cxlds->dpa_res =
                (struct resource)DEFINE_RES_MEM(0, cxlds->total_bytes);
 
index bdbd907..67f4ab6 100644 (file)
@@ -101,23 +101,57 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port)
 }
 EXPORT_SYMBOL_NS_GPL(devm_cxl_port_enumerate_dports, CXL);
 
-/*
- * Wait up to @media_ready_timeout for the device to report memory
- * active.
- */
-int cxl_await_media_ready(struct cxl_dev_state *cxlds)
+static int cxl_dvsec_mem_range_valid(struct cxl_dev_state *cxlds, int id)
+{
+       struct pci_dev *pdev = to_pci_dev(cxlds->dev);
+       int d = cxlds->cxl_dvsec;
+       bool valid = false;
+       int rc, i;
+       u32 temp;
+
+       if (id > CXL_DVSEC_RANGE_MAX)
+               return -EINVAL;
+
+       /* Check MEM INFO VALID bit first, give up after 1s */
+       i = 1;
+       do {
+               rc = pci_read_config_dword(pdev,
+                                          d + CXL_DVSEC_RANGE_SIZE_LOW(id),
+                                          &temp);
+               if (rc)
+                       return rc;
+
+               valid = FIELD_GET(CXL_DVSEC_MEM_INFO_VALID, temp);
+               if (valid)
+                       break;
+               msleep(1000);
+       } while (i--);
+
+       if (!valid) {
+               dev_err(&pdev->dev,
+                       "Timeout awaiting memory range %d valid after 1s.\n",
+                       id);
+               return -ETIMEDOUT;
+       }
+
+       return 0;
+}
+
+static int cxl_dvsec_mem_range_active(struct cxl_dev_state *cxlds, int id)
 {
        struct pci_dev *pdev = to_pci_dev(cxlds->dev);
        int d = cxlds->cxl_dvsec;
        bool active = false;
-       u64 md_status;
        int rc, i;
+       u32 temp;
 
-       for (i = media_ready_timeout; i; i--) {
-               u32 temp;
+       if (id > CXL_DVSEC_RANGE_MAX)
+               return -EINVAL;
 
+       /* Check MEM ACTIVE bit, up to 60s timeout by default */
+       for (i = media_ready_timeout; i; i--) {
                rc = pci_read_config_dword(
-                       pdev, d + CXL_DVSEC_RANGE_SIZE_LOW(0), &temp);
+                       pdev, d + CXL_DVSEC_RANGE_SIZE_LOW(id), &temp);
                if (rc)
                        return rc;
 
@@ -134,6 +168,39 @@ int cxl_await_media_ready(struct cxl_dev_state *cxlds)
                return -ETIMEDOUT;
        }
 
+       return 0;
+}
+
+/*
+ * Wait up to @media_ready_timeout for the device to report memory
+ * active.
+ */
+int cxl_await_media_ready(struct cxl_dev_state *cxlds)
+{
+       struct pci_dev *pdev = to_pci_dev(cxlds->dev);
+       int d = cxlds->cxl_dvsec;
+       int rc, i, hdm_count;
+       u64 md_status;
+       u16 cap;
+
+       rc = pci_read_config_word(pdev,
+                                 d + CXL_DVSEC_CAP_OFFSET, &cap);
+       if (rc)
+               return rc;
+
+       hdm_count = FIELD_GET(CXL_DVSEC_HDM_COUNT_MASK, cap);
+       for (i = 0; i < hdm_count; i++) {
+               rc = cxl_dvsec_mem_range_valid(cxlds, i);
+               if (rc)
+                       return rc;
+       }
+
+       for (i = 0; i < hdm_count; i++) {
+               rc = cxl_dvsec_mem_range_active(cxlds, i);
+               if (rc)
+                       return rc;
+       }
+
        md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
        if (!CXLMDEV_READY(md_status))
                return -EIO;
@@ -241,17 +308,36 @@ static void disable_hdm(void *_cxlhdm)
               hdm + CXL_HDM_DECODER_CTRL_OFFSET);
 }
 
-static int devm_cxl_enable_hdm(struct device *host, struct cxl_hdm *cxlhdm)
+int devm_cxl_enable_hdm(struct cxl_port *port, struct cxl_hdm *cxlhdm)
 {
-       void __iomem *hdm = cxlhdm->regs.hdm_decoder;
+       void __iomem *hdm;
        u32 global_ctrl;
 
+       /*
+        * If the HDM capability was not mapped, there is nothing to enable and
+        * the caller is responsible for what happens next.  For example,
+        * emulate a passthrough decoder.
+        */
+       if (IS_ERR(cxlhdm))
+               return 0;
+
+       hdm = cxlhdm->regs.hdm_decoder;
        global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+
+       /*
+        * If the HDM decoder capability was enabled on entry, skip
+        * registering disable_hdm() since this decode capability may be
+        * owned by platform firmware.
+        */
+       if (global_ctrl & CXL_HDM_DECODER_ENABLE)
+               return 0;
+
        writel(global_ctrl | CXL_HDM_DECODER_ENABLE,
               hdm + CXL_HDM_DECODER_CTRL_OFFSET);
 
-       return devm_add_action_or_reset(host, disable_hdm, cxlhdm);
+       return devm_add_action_or_reset(&port->dev, disable_hdm, cxlhdm);
 }
+EXPORT_SYMBOL_NS_GPL(devm_cxl_enable_hdm, CXL);
 
 int cxl_dvsec_rr_decode(struct device *dev, int d,
                        struct cxl_endpoint_dvsec_info *info)
@@ -425,7 +511,7 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
        if (info->mem_enabled)
                return 0;
 
-       rc = devm_cxl_enable_hdm(&port->dev, cxlhdm);
+       rc = devm_cxl_enable_hdm(port, cxlhdm);
        if (rc)
                return rc;
 
@@ -571,6 +657,7 @@ void read_cdat_data(struct cxl_port *port)
                /* Don't leave table data allocated on error */
                devm_kfree(dev, cdat_table);
                dev_err(dev, "CDAT data read error\n");
+               return;
        }
 
        port->cdat.table = cdat_table + sizeof(__le32);
index da20684..e7c284c 100644 (file)
@@ -750,11 +750,10 @@ struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
 
        parent_port = parent_dport ? parent_dport->port : NULL;
        if (IS_ERR(port)) {
-               dev_dbg(uport, "Failed to add %s%s%s%s: %ld\n",
-                       dev_name(&port->dev),
-                       parent_port ? " to " : "",
+               dev_dbg(uport, "Failed to add%s%s%s: %ld\n",
+                       parent_port ? " port to " : "",
                        parent_port ? dev_name(&parent_port->dev) : "",
-                       parent_port ? "" : " (root port)",
+                       parent_port ? "" : " root port",
                        PTR_ERR(port));
        } else {
                dev_dbg(uport, "%s added%s%s%s\n",
index 044a92d..f93a285 100644 (file)
@@ -710,6 +710,7 @@ struct cxl_endpoint_dvsec_info {
 struct cxl_hdm;
 struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port,
                                   struct cxl_endpoint_dvsec_info *info);
+int devm_cxl_enable_hdm(struct cxl_port *port, struct cxl_hdm *cxlhdm);
 int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm,
                                struct cxl_endpoint_dvsec_info *info);
 int devm_cxl_add_passthrough_decoder(struct cxl_port *port);
index db12b63..a2845a7 100644 (file)
@@ -266,6 +266,7 @@ struct cxl_poison_state {
  * @regs: Parsed register blocks
  * @cxl_dvsec: Offset to the PCIe device DVSEC
  * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
+ * @media_ready: Indicate whether the device media is usable
  * @payload_size: Size of space for payload
  *                (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
  * @lsa_size: Size of Label Storage Area
@@ -303,6 +304,7 @@ struct cxl_dev_state {
        int cxl_dvsec;
 
        bool rcd;
+       bool media_ready;
        size_t payload_size;
        size_t lsa_size;
        struct mutex mbox_mutex; /* Protects device mailbox and firmware */
index 0465ef9..7c02e55 100644 (file)
@@ -31,6 +31,8 @@
 #define   CXL_DVSEC_RANGE_BASE_LOW(i)  (0x24 + (i * 0x10))
 #define     CXL_DVSEC_MEM_BASE_LOW_MASK        GENMASK(31, 28)
 
+#define CXL_DVSEC_RANGE_MAX            2
+
 /* CXL 2.0 8.1.4: Non-CXL Function Map DVSEC */
 #define CXL_DVSEC_FUNCTION_MAP                                 2
 
index 10caf18..519edd0 100644 (file)
@@ -124,6 +124,9 @@ static int cxl_mem_probe(struct device *dev)
        struct dentry *dentry;
        int rc;
 
+       if (!cxlds->media_ready)
+               return -EBUSY;
+
        /*
         * Someone is trying to reattach this device after it lost its port
         * connection (an endpoint port previously registered by this memdev was
index f7a5b8e..0872f22 100644 (file)
@@ -708,6 +708,12 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
        if (rc)
                dev_dbg(&pdev->dev, "Failed to map RAS capability.\n");
 
+       rc = cxl_await_media_ready(cxlds);
+       if (rc == 0)
+               cxlds->media_ready = true;
+       else
+               dev_warn(&pdev->dev, "Media not active (%d)\n", rc);
+
        rc = cxl_pci_setup_mailbox(cxlds);
        if (rc)
                return rc;
index eb57324..c23b616 100644 (file)
@@ -60,13 +60,17 @@ static int discover_region(struct device *dev, void *root)
 static int cxl_switch_port_probe(struct cxl_port *port)
 {
        struct cxl_hdm *cxlhdm;
-       int rc;
+       int rc, nr_dports;
 
-       rc = devm_cxl_port_enumerate_dports(port);
-       if (rc < 0)
-               return rc;
+       nr_dports = devm_cxl_port_enumerate_dports(port);
+       if (nr_dports < 0)
+               return nr_dports;
 
        cxlhdm = devm_cxl_setup_hdm(port, NULL);
+       rc = devm_cxl_enable_hdm(port, cxlhdm);
+       if (rc)
+               return rc;
+
        if (!IS_ERR(cxlhdm))
                return devm_cxl_enumerate_decoders(cxlhdm, NULL);
 
@@ -75,7 +79,7 @@ static int cxl_switch_port_probe(struct cxl_port *port)
                return PTR_ERR(cxlhdm);
        }
 
-       if (rc == 1) {
+       if (nr_dports == 1) {
                dev_dbg(&port->dev, "Fallback to passthrough decoder\n");
                return devm_cxl_add_passthrough_decoder(port);
        }
@@ -113,12 +117,6 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
        if (rc)
                return rc;
 
-       rc = cxl_await_media_ready(cxlds);
-       if (rc) {
-               dev_err(&port->dev, "Media not active (%d)\n", rc);
-               return rc;
-       }
-
        rc = devm_cxl_enumerate_decoders(cxlhdm, &info);
        if (rc)
                return rc;
index 8841444..245898f 100644 (file)
@@ -518,6 +518,7 @@ static struct platform_driver exynos_bus_platdrv = {
 };
 module_platform_driver(exynos_bus_platdrv);
 
+MODULE_SOFTDEP("pre: exynos_ppmu");
 MODULE_DESCRIPTION("Generic Exynos Bus frequency driver");
 MODULE_AUTHOR("Chanwoo Choi <cw00.choi@samsung.com>");
 MODULE_LICENSE("GPL v2");
index e5458ad..6354622 100644 (file)
@@ -127,7 +127,7 @@ static int mtk_ccifreq_target(struct device *dev, unsigned long *freq,
                              u32 flags)
 {
        struct mtk_ccifreq_drv *drv = dev_get_drvdata(dev);
-       struct clk *cci_pll = clk_get_parent(drv->cci_clk);
+       struct clk *cci_pll;
        struct dev_pm_opp *opp;
        unsigned long opp_rate;
        int voltage, pre_voltage, inter_voltage, target_voltage, ret;
@@ -139,6 +139,7 @@ static int mtk_ccifreq_target(struct device *dev, unsigned long *freq,
                return 0;
 
        inter_voltage = drv->inter_voltage;
+       cci_pll = clk_get_parent(drv->cci_clk);
 
        opp_rate = *freq;
        opp = devfreq_recommended_opp(dev, &opp_rate, 1);
index 01f2e86..12cf6bb 100644 (file)
@@ -12,7 +12,6 @@
 #include <linux/shmem_fs.h>
 #include <linux/slab.h>
 #include <linux/udmabuf.h>
-#include <linux/hugetlb.h>
 #include <linux/vmalloc.h>
 #include <linux/iosys-map.h>
 
@@ -207,9 +206,7 @@ static long udmabuf_create(struct miscdevice *device,
        struct udmabuf *ubuf;
        struct dma_buf *buf;
        pgoff_t pgoff, pgcnt, pgidx, pgbuf = 0, pglimit;
-       struct page *page, *hpage = NULL;
-       pgoff_t subpgoff, maxsubpgs;
-       struct hstate *hpstate;
+       struct page *page;
        int seals, ret = -EINVAL;
        u32 i, flags;
 
@@ -245,7 +242,7 @@ static long udmabuf_create(struct miscdevice *device,
                if (!memfd)
                        goto err;
                mapping = memfd->f_mapping;
-               if (!shmem_mapping(mapping) && !is_file_hugepages(memfd))
+               if (!shmem_mapping(mapping))
                        goto err;
                seals = memfd_fcntl(memfd, F_GET_SEALS, 0);
                if (seals == -EINVAL)
@@ -256,48 +253,16 @@ static long udmabuf_create(struct miscdevice *device,
                        goto err;
                pgoff = list[i].offset >> PAGE_SHIFT;
                pgcnt = list[i].size   >> PAGE_SHIFT;
-               if (is_file_hugepages(memfd)) {
-                       hpstate = hstate_file(memfd);
-                       pgoff = list[i].offset >> huge_page_shift(hpstate);
-                       subpgoff = (list[i].offset &
-                                   ~huge_page_mask(hpstate)) >> PAGE_SHIFT;
-                       maxsubpgs = huge_page_size(hpstate) >> PAGE_SHIFT;
-               }
                for (pgidx = 0; pgidx < pgcnt; pgidx++) {
-                       if (is_file_hugepages(memfd)) {
-                               if (!hpage) {
-                                       hpage = find_get_page_flags(mapping, pgoff,
-                                                                   FGP_ACCESSED);
-                                       if (!hpage) {
-                                               ret = -EINVAL;
-                                               goto err;
-                                       }
-                               }
-                               page = hpage + subpgoff;
-                               get_page(page);
-                               subpgoff++;
-                               if (subpgoff == maxsubpgs) {
-                                       put_page(hpage);
-                                       hpage = NULL;
-                                       subpgoff = 0;
-                                       pgoff++;
-                               }
-                       } else {
-                               page = shmem_read_mapping_page(mapping,
-                                                              pgoff + pgidx);
-                               if (IS_ERR(page)) {
-                                       ret = PTR_ERR(page);
-                                       goto err;
-                               }
+                       page = shmem_read_mapping_page(mapping, pgoff + pgidx);
+                       if (IS_ERR(page)) {
+                               ret = PTR_ERR(page);
+                               goto err;
                        }
                        ubuf->pages[pgbuf++] = page;
                }
                fput(memfd);
                memfd = NULL;
-               if (hpage) {
-                       put_page(hpage);
-                       hpage = NULL;
-               }
        }
 
        exp_info.ops  = &udmabuf_ops;
index 8858470..ee3a219 100644 (file)
 #define ATC_DST_PIP            BIT(12)         /* Destination Picture-in-Picture enabled */
 #define ATC_SRC_DSCR_DIS       BIT(16)         /* Src Descriptor fetch disable */
 #define ATC_DST_DSCR_DIS       BIT(20)         /* Dst Descriptor fetch disable */
-#define ATC_FC                 GENMASK(22, 21) /* Choose Flow Controller */
+#define ATC_FC                 GENMASK(23, 21) /* Choose Flow Controller */
 #define ATC_FC_MEM2MEM         0x0             /* Mem-to-Mem (DMA) */
 #define ATC_FC_MEM2PER         0x1             /* Mem-to-Periph (DMA) */
 #define ATC_FC_PER2MEM         0x2             /* Periph-to-Mem (DMA) */
 #define ATC_AUTO               BIT(31)         /* Auto multiple buffer tx enable */
 
 /* Bitfields in CFG */
-#define ATC_PER_MSB(h) ((0x30U & (h)) >> 4)    /* Extract most significant bits of a handshaking identifier */
-
 #define ATC_SRC_PER            GENMASK(3, 0)   /* Channel src rq associated with periph handshaking ifc h */
 #define ATC_DST_PER            GENMASK(7, 4)   /* Channel dst rq associated with periph handshaking ifc h */
 #define ATC_SRC_REP            BIT(8)          /* Source Replay Mod */
 #define ATC_DPIP_HOLE          GENMASK(15, 0)
 #define ATC_DPIP_BOUNDARY      GENMASK(25, 16)
 
-#define ATC_SRC_PER_ID(id)     (FIELD_PREP(ATC_SRC_PER_MSB, (id)) |    \
-                                FIELD_PREP(ATC_SRC_PER, (id)))
-#define ATC_DST_PER_ID(id)     (FIELD_PREP(ATC_DST_PER_MSB, (id)) |    \
-                                FIELD_PREP(ATC_DST_PER, (id)))
+#define ATC_PER_MSB            GENMASK(5, 4)   /* Extract MSBs of a handshaking identifier */
+#define ATC_SRC_PER_ID(id)                                            \
+       ({ typeof(id) _id = (id);                                      \
+          FIELD_PREP(ATC_SRC_PER_MSB, FIELD_GET(ATC_PER_MSB, _id)) |  \
+          FIELD_PREP(ATC_SRC_PER, _id); })
+#define ATC_DST_PER_ID(id)                                            \
+       ({ typeof(id) _id = (id);                                      \
+          FIELD_PREP(ATC_DST_PER_MSB, FIELD_GET(ATC_PER_MSB, _id)) |  \
+          FIELD_PREP(ATC_DST_PER, _id); })
 
 
 
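The reworked ATC_SRC_PER_ID()/ATC_DST_PER_ID() macros above split a handshaking interface ID into its low nibble (the classic PER field, GENMASK(3, 0)) and bits [5:4] (placed into the separate *_PER_MSB register fields, whose positions are defined elsewhere in the header). A worked example with an invented ID of 0x2a:

#include <stdio.h>

int main(void)
{
	unsigned int id  = 0x2a;		/* invented handshaking interface id */
	unsigned int per = id & 0xf;		/* value placed in ATC_SRC_PER / ATC_DST_PER */
	unsigned int msb = (id >> 4) & 0x3;	/* FIELD_GET(ATC_PER_MSB, id) */

	printf("PER = 0x%x, PER_MSB = 0x%x\n", per, msb);	/* PER = 0xa, PER_MSB = 0x2 */
	return 0;
}
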
index 7da6d9b..c3b3716 100644 (file)
@@ -1102,6 +1102,8 @@ at_xdmac_prep_interleaved(struct dma_chan *chan,
                                                        NULL,
                                                        src_addr, dst_addr,
                                                        xt, xt->sgl);
+               if (!first)
+                       return NULL;
 
                /* Length of the block is (BLEN+1) microblocks. */
                for (i = 0; i < xt->numf - 1; i++)
@@ -1132,8 +1134,9 @@ at_xdmac_prep_interleaved(struct dma_chan *chan,
                                                               src_addr, dst_addr,
                                                               xt, chunk);
                        if (!desc) {
-                               list_splice_tail_init(&first->descs_list,
-                                                     &atchan->free_descs_list);
+                               if (first)
+                                       list_splice_tail_init(&first->descs_list,
+                                                             &atchan->free_descs_list);
                                return NULL;
                        }
 
index ecbf67c..d32deb9 100644 (file)
@@ -277,7 +277,6 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
                if (wq_dedicated(wq)) {
                        rc = idxd_wq_set_pasid(wq, pasid);
                        if (rc < 0) {
-                               iommu_sva_unbind_device(sva);
                                dev_err(dev, "wq set pasid failed: %d\n", rc);
                                goto failed_set_pasid;
                        }
index 0d9257f..b4731fe 100644 (file)
@@ -1050,7 +1050,7 @@ static bool _trigger(struct pl330_thread *thrd)
        return true;
 }
 
-static bool _start(struct pl330_thread *thrd)
+static bool pl330_start_thread(struct pl330_thread *thrd)
 {
        switch (_state(thrd)) {
        case PL330_STATE_FAULT_COMPLETING:
@@ -1702,7 +1702,7 @@ static int pl330_update(struct pl330_dmac *pl330)
                        thrd->req_running = -1;
 
                        /* Get going again ASAP */
-                       _start(thrd);
+                       pl330_start_thread(thrd);
 
                        /* For now, just make a list of callbacks to be done */
                        list_add_tail(&descdone->rqd, &pl330->req_done);
@@ -2089,7 +2089,7 @@ static void pl330_tasklet(struct tasklet_struct *t)
        } else {
                /* Make sure the PL330 Channel thread is active */
                spin_lock(&pch->thread->dmac->lock);
-               _start(pch->thread);
+               pl330_start_thread(pch->thread);
                spin_unlock(&pch->thread->dmac->lock);
        }
 
@@ -2107,7 +2107,7 @@ static void pl330_tasklet(struct tasklet_struct *t)
                        if (power_down) {
                                pch->active = true;
                                spin_lock(&pch->thread->dmac->lock);
-                               _start(pch->thread);
+                               pl330_start_thread(pch->thread);
                                spin_unlock(&pch->thread->dmac->lock);
                                power_down = false;
                        }
index fc3a2a0..b8329a2 100644 (file)
@@ -5527,7 +5527,7 @@ static int udma_probe(struct platform_device *pdev)
        return ret;
 }
 
-static int udma_pm_suspend(struct device *dev)
+static int __maybe_unused udma_pm_suspend(struct device *dev)
 {
        struct udma_dev *ud = dev_get_drvdata(dev);
        struct dma_device *dma_dev = &ud->ddev;
@@ -5549,7 +5549,7 @@ static int udma_pm_suspend(struct device *dev)
        return 0;
 }
 
-static int udma_pm_resume(struct device *dev)
+static int __maybe_unused udma_pm_resume(struct device *dev)
 {
        struct udma_dev *ud = dev_get_drvdata(dev);
        struct dma_device *dma_dev = &ud->ddev;
index 68f5767..110e99b 100644 (file)
@@ -550,4 +550,15 @@ config EDAC_ZYNQMP
          Xilinx ZynqMP OCM (On Chip Memory) controller. It can also be
          built as a module. In that case it will be called zynqmp_edac.
 
+config EDAC_NPCM
+       tristate "Nuvoton NPCM DDR Memory Controller"
+       depends on (ARCH_NPCM || COMPILE_TEST)
+       help
+         Support for error detection and correction on the Nuvoton NPCM DDR
+         memory controller.
+
+         The memory controller supports single bit error correction and double
+         bit error detection (in-line ECC, in which one-eighth of the memory
+         device is used for ECC storage instead of data).
+
 endif # EDAC
index 9b025c5..61945d3 100644 (file)
@@ -84,4 +84,5 @@ obj-$(CONFIG_EDAC_QCOM)                       += qcom_edac.o
 obj-$(CONFIG_EDAC_ASPEED)              += aspeed_edac.o
 obj-$(CONFIG_EDAC_BLUEFIELD)           += bluefield_edac.o
 obj-$(CONFIG_EDAC_DMC520)              += dmc520_edac.o
+obj-$(CONFIG_EDAC_NPCM)                        += npcm_edac.o
 obj-$(CONFIG_EDAC_ZYNQMP)              += zynqmp_edac.o
index 5c4292e..597dae7 100644 (file)
@@ -975,6 +975,74 @@ static int sys_addr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr)
        return csrow;
 }
 
+/*
+ * See AMD PPR DF::LclNodeTypeMap
+ *
+ * This register gives information for nodes of the same type within a system.
+ *
+ * Reading this register from a GPU node reports how many GPU nodes are in the
+ * system and what the lowest AMD Node ID value is for the GPU nodes. Use this
+ * info to fix up the Linux logical "Node ID" value set in the AMD NB code and EDAC.
+ */
+static struct local_node_map {
+       u16 node_count;
+       u16 base_node_id;
+} gpu_node_map;
+
+#define PCI_DEVICE_ID_AMD_MI200_DF_F1          0x14d1
+#define REG_LOCAL_NODE_TYPE_MAP                        0x144
+
+/* Local Node Type Map (LNTM) fields */
+#define LNTM_NODE_COUNT                                GENMASK(27, 16)
+#define LNTM_BASE_NODE_ID                      GENMASK(11, 0)
+
+static int gpu_get_node_map(void)
+{
+       struct pci_dev *pdev;
+       int ret;
+       u32 tmp;
+
+       /*
+        * Node ID 0 is reserved for CPUs.
+        * Therefore, a non-zero Node ID means we've already cached the values.
+        */
+       if (gpu_node_map.base_node_id)
+               return 0;
+
+       pdev = pci_get_device(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_MI200_DF_F1, NULL);
+       if (!pdev) {
+               ret = -ENODEV;
+               goto out;
+       }
+
+       ret = pci_read_config_dword(pdev, REG_LOCAL_NODE_TYPE_MAP, &tmp);
+       if (ret)
+               goto out;
+
+       gpu_node_map.node_count = FIELD_GET(LNTM_NODE_COUNT, tmp);
+       gpu_node_map.base_node_id = FIELD_GET(LNTM_BASE_NODE_ID, tmp);
+
+out:
+       pci_dev_put(pdev);
+       return ret;
+}
+
+static int fixup_node_id(int node_id, struct mce *m)
+{
+       /* MCA_IPID[InstanceIdHi] gives the AMD Node ID for the bank. */
+       u8 nid = (m->ipid >> 44) & 0xF;
+
+       if (smca_get_bank_type(m->extcpu, m->bank) != SMCA_UMC_V2)
+               return node_id;
+
+       /* Nodes below the GPU base node are CPU nodes and don't need a fixup. */
+       if (nid < gpu_node_map.base_node_id)
+               return node_id;
+
+       /* Convert the hardware-provided AMD Node ID to a Linux logical one. */
+       return nid - gpu_node_map.base_node_id + 1;
+}
+
 /* Protect the PCI config register pairs used for DF indirect access. */
 static DEFINE_MUTEX(df_indirect_mutex);
 
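gpu_get_node_map() above caches the two LNTM bitfields, and fixup_node_id() then rebases the hardware node ID onto the Linux logical numbering. A worked example with an invented register value and bank ID:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t lntm = 0x00040008;		/* invented DF::LclNodeTypeMap value */
	unsigned int node_count   = (lntm >> 16) & 0xfff;	/* LNTM_NODE_COUNT   -> 4 */
	unsigned int base_node_id = lntm & 0xfff;		/* LNTM_BASE_NODE_ID -> 8 */
	unsigned int nid = 9;			/* invented MCA_IPID[47:44] from a GPU bank */

	printf("GPU nodes: %u, base AMD node id: %u\n", node_count, base_node_id);
	printf("Linux logical node id: %u\n", nid - base_node_id + 1);	/* -> 2 */
	return 0;
}
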
@@ -1426,12 +1494,47 @@ static int umc_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
        return cs_mode;
 }
 
+static int __addr_mask_to_cs_size(u32 addr_mask_orig, unsigned int cs_mode,
+                                 int csrow_nr, int dimm)
+{
+       u32 msb, weight, num_zero_bits;
+       u32 addr_mask_deinterleaved;
+       int size = 0;
+
+       /*
+        * The number of zero bits in the mask is equal to the number of bits
+        * in a full mask minus the number of bits in the current mask.
+        *
+        * The MSB is the number of bits in the full mask because BIT[0] is
+        * always 0.
+        *
+        * In the special 3 Rank interleaving case, a single bit is flipped
+        * without swapping with the most significant bit. This can be handled
+        * by keeping the MSB where it is and ignoring the single zero bit.
+        */
+       msb = fls(addr_mask_orig) - 1;
+       weight = hweight_long(addr_mask_orig);
+       num_zero_bits = msb - weight - !!(cs_mode & CS_3R_INTERLEAVE);
+
+       /* Take the number of zero bits off from the top of the mask. */
+       addr_mask_deinterleaved = GENMASK_ULL(msb - num_zero_bits, 1);
+
+       edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm);
+       edac_dbg(1, "  Original AddrMask: 0x%x\n", addr_mask_orig);
+       edac_dbg(1, "  Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved);
+
+       /* Register [31:1] = Address [39:9]. Size is in kBs here. */
+       size = (addr_mask_deinterleaved >> 2) + 1;
+
+       /* Return size in MBs. */
+       return size >> 10;
+}
+
 static int umc_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
                                    unsigned int cs_mode, int csrow_nr)
 {
-       u32 addr_mask_orig, addr_mask_deinterleaved;
-       u32 msb, weight, num_zero_bits;
        int cs_mask_nr = csrow_nr;
+       u32 addr_mask_orig;
        int dimm, size = 0;
 
        /* No Chip Selects are enabled. */
@@ -1475,33 +1578,7 @@ static int umc_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
        else
                addr_mask_orig = pvt->csels[umc].csmasks[cs_mask_nr];
 
-       /*
-        * The number of zero bits in the mask is equal to the number of bits
-        * in a full mask minus the number of bits in the current mask.
-        *
-        * The MSB is the number of bits in the full mask because BIT[0] is
-        * always 0.
-        *
-        * In the special 3 Rank interleaving case, a single bit is flipped
-        * without swapping with the most significant bit. This can be handled
-        * by keeping the MSB where it is and ignoring the single zero bit.
-        */
-       msb = fls(addr_mask_orig) - 1;
-       weight = hweight_long(addr_mask_orig);
-       num_zero_bits = msb - weight - !!(cs_mode & CS_3R_INTERLEAVE);
-
-       /* Take the number of zero bits off from the top of the mask. */
-       addr_mask_deinterleaved = GENMASK_ULL(msb - num_zero_bits, 1);
-
-       edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm);
-       edac_dbg(1, "  Original AddrMask: 0x%x\n", addr_mask_orig);
-       edac_dbg(1, "  Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved);
-
-       /* Register [31:1] = Address [39:9]. Size is in kBs here. */
-       size = (addr_mask_deinterleaved >> 2) + 1;
-
-       /* Return size in MBs. */
-       return size >> 10;
+       return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, dimm);
 }
 
 static void umc_debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
@@ -2992,6 +3069,8 @@ static void decode_umc_error(int node_id, struct mce *m)
        struct err_info err;
        u64 sys_addr;
 
+       node_id = fixup_node_id(node_id, m);
+
        mci = edac_mc_find(node_id);
        if (!mci)
                return;
@@ -3675,6 +3754,227 @@ static int umc_hw_info_get(struct amd64_pvt *pvt)
        return 0;
 }
 
+/*
+ * The CPUs have one channel per UMC, so UMC number is equivalent to a
+ * channel number. The GPUs have 8 channels per UMC, so the UMC number no
+ * longer works as a channel number.
+ *
+ * The channel number within a GPU UMC is given in MCA_IPID[15:12].
+ * However, the IDs are split such that two UMC values go to one UMC, and
+ * the channel numbers are split in two groups of four.
+ *
+ * Refer to comment on gpu_get_umc_base().
+ *
+ * For example,
+ * UMC0 CH[3:0] = 0x0005[3:0]000
+ * UMC0 CH[7:4] = 0x0015[3:0]000
+ * UMC1 CH[3:0] = 0x0025[3:0]000
+ * UMC1 CH[7:4] = 0x0035[3:0]000
+ */
+static void gpu_get_err_info(struct mce *m, struct err_info *err)
+{
+       u8 ch = (m->ipid & GENMASK(31, 0)) >> 20;
+       u8 phy = ((m->ipid >> 12) & 0xf);
+
+       err->channel = ch % 2 ? phy + 4 : phy;
+       err->csrow = phy;
+}
+
+static int gpu_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
+                                   unsigned int cs_mode, int csrow_nr)
+{
+       u32 addr_mask_orig = pvt->csels[umc].csmasks[csrow_nr];
+
+       return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, csrow_nr >> 1);
+}
+
+static void gpu_debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
+{
+       int size, cs_mode, cs = 0;
+
+       edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
+
+       cs_mode = CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
+
+       for_each_chip_select(cs, ctrl, pvt) {
+               size = gpu_addr_mask_to_cs_size(pvt, ctrl, cs_mode, cs);
+               amd64_info(EDAC_MC ": %d: %5dMB\n", cs, size);
+       }
+}
+
+static void gpu_dump_misc_regs(struct amd64_pvt *pvt)
+{
+       struct amd64_umc *umc;
+       u32 i;
+
+       for_each_umc(i) {
+               umc = &pvt->umc[i];
+
+               edac_dbg(1, "UMC%d UMC cfg: 0x%x\n", i, umc->umc_cfg);
+               edac_dbg(1, "UMC%d SDP ctrl: 0x%x\n", i, umc->sdp_ctrl);
+               edac_dbg(1, "UMC%d ECC ctrl: 0x%x\n", i, umc->ecc_ctrl);
+               edac_dbg(1, "UMC%d All HBMs support ECC: yes\n", i);
+
+               gpu_debug_display_dimm_sizes(pvt, i);
+       }
+}
+
+static u32 gpu_get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr)
+{
+       u32 nr_pages;
+       int cs_mode = CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
+
+       nr_pages   = gpu_addr_mask_to_cs_size(pvt, dct, cs_mode, csrow_nr);
+       nr_pages <<= 20 - PAGE_SHIFT;
+
+       edac_dbg(0, "csrow: %d, channel: %d\n", csrow_nr, dct);
+       edac_dbg(0, "nr_pages/channel: %u\n", nr_pages);
+
+       return nr_pages;
+}
+
+static void gpu_init_csrows(struct mem_ctl_info *mci)
+{
+       struct amd64_pvt *pvt = mci->pvt_info;
+       struct dimm_info *dimm;
+       u8 umc, cs;
+
+       for_each_umc(umc) {
+               for_each_chip_select(cs, umc, pvt) {
+                       if (!csrow_enabled(cs, umc, pvt))
+                               continue;
+
+                       dimm = mci->csrows[umc]->channels[cs]->dimm;
+
+                       edac_dbg(1, "MC node: %d, csrow: %d\n",
+                                pvt->mc_node_id, cs);
+
+                       dimm->nr_pages = gpu_get_csrow_nr_pages(pvt, umc, cs);
+                       dimm->edac_mode = EDAC_SECDED;
+                       dimm->mtype = MEM_HBM2;
+                       dimm->dtype = DEV_X16;
+                       dimm->grain = 64;
+               }
+       }
+}
+
+static void gpu_setup_mci_misc_attrs(struct mem_ctl_info *mci)
+{
+       struct amd64_pvt *pvt = mci->pvt_info;
+
+       mci->mtype_cap          = MEM_FLAG_HBM2;
+       mci->edac_ctl_cap       = EDAC_FLAG_SECDED;
+
+       mci->edac_cap           = EDAC_FLAG_EC;
+       mci->mod_name           = EDAC_MOD_STR;
+       mci->ctl_name           = pvt->ctl_name;
+       mci->dev_name           = pci_name(pvt->F3);
+       mci->ctl_page_to_phys   = NULL;
+
+       gpu_init_csrows(mci);
+}
+
+/* ECC is enabled by default on GPU nodes */
+static bool gpu_ecc_enabled(struct amd64_pvt *pvt)
+{
+       return true;
+}
+
+static inline u32 gpu_get_umc_base(u8 umc, u8 channel)
+{
+       /*
+        * On CPUs, there is one channel per UMC, so UMC numbering equals
+        * channel numbering. On GPUs, there are eight channels per UMC,
+        * so the channel numbering is different from UMC numbering.
+        *
+        * On CPU nodes channels are selected in 6th nibble
+        * UMC chY[3:0]= [(chY*2 + 1) : (chY*2)]50000;
+        *
+        * On GPU nodes channels are selected in 3rd nibble
+        * HBM chX[3:0]= [Y  ]5X[3:0]000;
+        * HBM chX[7:4]= [Y+1]5X[3:0]000
+        */
+       umc *= 2;
+
+       if (channel >= 4)
+               umc++;
+
+       return 0x50000 + (umc << 20) + ((channel % 4) << 12);
+}
+
+static void gpu_read_mc_regs(struct amd64_pvt *pvt)
+{
+       u8 nid = pvt->mc_node_id;
+       struct amd64_umc *umc;
+       u32 i, umc_base;
+
+       /* Read registers from each UMC */
+       for_each_umc(i) {
+               umc_base = gpu_get_umc_base(i, 0);
+               umc = &pvt->umc[i];
+
+               amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
+               amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
+               amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
+       }
+}
+
+static void gpu_read_base_mask(struct amd64_pvt *pvt)
+{
+       u32 base_reg, mask_reg;
+       u32 *base, *mask;
+       int umc, cs;
+
+       for_each_umc(umc) {
+               for_each_chip_select(cs, umc, pvt) {
+                       base_reg = gpu_get_umc_base(umc, cs) + UMCCH_BASE_ADDR;
+                       base = &pvt->csels[umc].csbases[cs];
+
+                       if (!amd_smn_read(pvt->mc_node_id, base_reg, base)) {
+                               edac_dbg(0, "  DCSB%d[%d]=0x%08x reg: 0x%x\n",
+                                        umc, cs, *base, base_reg);
+                       }
+
+                       mask_reg = gpu_get_umc_base(umc, cs) + UMCCH_ADDR_MASK;
+                       mask = &pvt->csels[umc].csmasks[cs];
+
+                       if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask)) {
+                               edac_dbg(0, "  DCSM%d[%d]=0x%08x reg: 0x%x\n",
+                                        umc, cs, *mask, mask_reg);
+                       }
+               }
+       }
+}
+
+static void gpu_prep_chip_selects(struct amd64_pvt *pvt)
+{
+       int umc;
+
+       for_each_umc(umc) {
+               pvt->csels[umc].b_cnt = 8;
+               pvt->csels[umc].m_cnt = 8;
+       }
+}
+
+static int gpu_hw_info_get(struct amd64_pvt *pvt)
+{
+       int ret;
+
+       ret = gpu_get_node_map();
+       if (ret)
+               return ret;
+
+       pvt->umc = kcalloc(pvt->max_mcs, sizeof(struct amd64_umc), GFP_KERNEL);
+       if (!pvt->umc)
+               return -ENOMEM;
+
+       gpu_prep_chip_selects(pvt);
+       gpu_read_base_mask(pvt);
+       gpu_read_mc_regs(pvt);
+
+       return 0;
+}
+
 static void hw_info_put(struct amd64_pvt *pvt)
 {
        pci_dev_put(pvt->F1);
@@ -3690,6 +3990,14 @@ static struct low_ops umc_ops = {
        .get_err_info                   = umc_get_err_info,
 };
 
+static struct low_ops gpu_ops = {
+       .hw_info_get                    = gpu_hw_info_get,
+       .ecc_enabled                    = gpu_ecc_enabled,
+       .setup_mci_misc_attrs           = gpu_setup_mci_misc_attrs,
+       .dump_misc_regs                 = gpu_dump_misc_regs,
+       .get_err_info                   = gpu_get_err_info,
+};
+
 /* Use Family 16h versions for defaults and adjust as needed below. */
 static struct low_ops dct_ops = {
        .map_sysaddr_to_csrow           = f1x_map_sysaddr_to_csrow,
@@ -3813,9 +4121,27 @@ static int per_family_init(struct amd64_pvt *pvt)
                case 0x20 ... 0x2f:
                        pvt->ctl_name                   = "F19h_M20h";
                        break;
+               case 0x30 ... 0x3f:
+                       if (pvt->F3->device == PCI_DEVICE_ID_AMD_MI200_DF_F3) {
+                               pvt->ctl_name           = "MI200";
+                               pvt->max_mcs            = 4;
+                               pvt->ops                = &gpu_ops;
+                       } else {
+                               pvt->ctl_name           = "F19h_M30h";
+                               pvt->max_mcs            = 8;
+                       }
+                       break;
                case 0x50 ... 0x5f:
                        pvt->ctl_name                   = "F19h_M50h";
                        break;
+               case 0x60 ... 0x6f:
+                       pvt->ctl_name                   = "F19h_M60h";
+                       pvt->flags.zn_regs_v2           = 1;
+                       break;
+               case 0x70 ... 0x7f:
+                       pvt->ctl_name                   = "F19h_M70h";
+                       pvt->flags.zn_regs_v2           = 1;
+                       break;
                case 0xa0 ... 0xaf:
                        pvt->ctl_name                   = "F19h_MA0h";
                        pvt->max_mcs                    = 12;
@@ -3846,11 +4172,17 @@ static int init_one_instance(struct amd64_pvt *pvt)
        struct edac_mc_layer layers[2];
        int ret = -ENOMEM;
 
+       /*
+        * For heterogeneous family nodes, the EDAC CHIP_SELECT and CHANNEL
+        * layer sizes are swapped so the GPU topology fits the MC layers.
+        */
        layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
-       layers[0].size = pvt->csels[0].b_cnt;
+       layers[0].size = (pvt->F3->device == PCI_DEVICE_ID_AMD_MI200_DF_F3) ?
+                        pvt->max_mcs : pvt->csels[0].b_cnt;
        layers[0].is_virt_csrow = true;
        layers[1].type = EDAC_MC_LAYER_CHANNEL;
-       layers[1].size = pvt->max_mcs;
+       layers[1].size = (pvt->F3->device == PCI_DEVICE_ID_AMD_MI200_DF_F3) ?
+                        pvt->csels[0].b_cnt : pvt->max_mcs;
        layers[1].is_virt_csrow = false;
 
        mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
@@ -4074,8 +4406,6 @@ static int __init amd64_edac_init(void)
        amd64_err("%s on 32-bit is unsupported. USE AT YOUR OWN RISK!\n", EDAC_MOD_STR);
 #endif
 
-       printk(KERN_INFO "AMD64 EDAC driver v%s\n", EDAC_AMD64_VERSION);
-
        return 0;
 
 err_pci:
@@ -4121,7 +4451,7 @@ module_exit(amd64_edac_exit);
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("SoftwareBitMaker: Doug Thompson, Dave Peterson, Thayne Harbaugh; AMD");
-MODULE_DESCRIPTION("MC support for AMD64 memory controllers - " EDAC_AMD64_VERSION);
+MODULE_DESCRIPTION("MC support for AMD64 memory controllers");
 
 module_param(edac_op_state, int, 0444);
 MODULE_PARM_DESC(edac_op_state, "EDAC Error Reporting state: 0=Poll,1=NMI");
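Editorial aside (not part of the patch): the chip-select size math that the hunk above factors out into __addr_mask_to_cs_size() can be sanity-checked with a small stand-alone sketch. The mask value below is hypothetical and the 3-rank-interleave correction is omitted; the sketch only mirrors the fls()/hweight()/GENMASK() steps in userspace C.

/* Illustrative only: redo the deinterleave arithmetic for a toy, non-interleaved mask. */
#include <stdio.h>

int main(void)
{
        unsigned int mask = 0x0000fffe;                 /* bits [15:1] set; BIT(0) is always 0 */
        unsigned int msb = 31 - __builtin_clz(mask);    /* fls(mask) - 1 = 15 */
        unsigned int weight = __builtin_popcount(mask); /* 15 set bits */
        unsigned int num_zero_bits = msb - weight;      /* 0: this mask is already contiguous */
        unsigned int deint = (1u << (msb - num_zero_bits + 1)) - 2; /* GENMASK(msb, 1) = 0xfffe */
        unsigned int size_kb = (deint >> 2) + 1;        /* register [31:1] = address [39:9] */

        printf("%u MB\n", size_kb >> 10);               /* prints "16 MB" for this toy mask */
        return 0;
}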
index e84fe0d..5a4e4a5 100644 (file)
@@ -16,6 +16,7 @@
 #include <linux/slab.h>
 #include <linux/mmzone.h>
 #include <linux/edac.h>
+#include <linux/bitfield.h>
 #include <asm/cpu_device_id.h>
 #include <asm/msr.h>
 #include "edac_module.h"
@@ -85,7 +86,6 @@
  *         sections 3.5.4 and 3.5.5 for more information.
  */
 
-#define EDAC_AMD64_VERSION             "3.5.0"
 #define EDAC_MOD_STR                   "amd64_edac"
 
 /* Extended Model from CPUID, for CPU Revision numbers */
index cc5c63f..9215c06 100644 (file)
@@ -1186,7 +1186,8 @@ static void decode_smca_error(struct mce *m)
        if (xec < smca_mce_descs[bank_type].num_descs)
                pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);
 
-       if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
+       if ((bank_type == SMCA_UMC || bank_type == SMCA_UMC_V2) &&
+           xec == 0 && decode_dram_ecc)
                decode_dram_ecc(topology_die_id(m->extcpu), m);
 }
 
diff --git a/drivers/edac/npcm_edac.c b/drivers/edac/npcm_edac.c
new file mode 100644 (file)
index 0000000..12b95be
--- /dev/null
@@ -0,0 +1,543 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2022 Nuvoton Technology Corporation
+
+#include <linux/debugfs.h>
+#include <linux/iopoll.h>
+#include <linux/of_device.h>
+#include <linux/regmap.h>
+#include "edac_module.h"
+
+#define EDAC_MOD_NAME                  "npcm-edac"
+#define EDAC_MSG_SIZE                  256
+
+/* chip serials */
+#define NPCM7XX_CHIP                   BIT(0)
+#define NPCM8XX_CHIP                   BIT(1)
+
+/* syndrome values */
+#define UE_SYNDROME                    0x03
+
+/* error injection */
+#define ERROR_TYPE_CORRECTABLE         0
+#define ERROR_TYPE_UNCORRECTABLE       1
+#define ERROR_LOCATION_DATA            0
+#define ERROR_LOCATION_CHECKCODE       1
+#define ERROR_BIT_DATA_MAX             63
+#define ERROR_BIT_CHECKCODE_MAX                7
+
+static char data_synd[] = {
+       0xf4, 0xf1, 0xec, 0xea, 0xe9, 0xe6, 0xe5, 0xe3,
+       0xdc, 0xda, 0xd9, 0xd6, 0xd5, 0xd3, 0xce, 0xcb,
+       0xb5, 0xb0, 0xad, 0xab, 0xa8, 0xa7, 0xa4, 0xa2,
+       0x9d, 0x9b, 0x98, 0x97, 0x94, 0x92, 0x8f, 0x8a,
+       0x75, 0x70, 0x6d, 0x6b, 0x68, 0x67, 0x64, 0x62,
+       0x5e, 0x5b, 0x58, 0x57, 0x54, 0x52, 0x4f, 0x4a,
+       0x34, 0x31, 0x2c, 0x2a, 0x29, 0x26, 0x25, 0x23,
+       0x1c, 0x1a, 0x19, 0x16, 0x15, 0x13, 0x0e, 0x0b
+};
+
+static struct regmap *npcm_regmap;
+
+struct npcm_platform_data {
+       /* chip serials */
+       int chip;
+
+       /* memory controller registers */
+       u32 ctl_ecc_en;
+       u32 ctl_int_status;
+       u32 ctl_int_ack;
+       u32 ctl_int_mask_master;
+       u32 ctl_int_mask_ecc;
+       u32 ctl_ce_addr_l;
+       u32 ctl_ce_addr_h;
+       u32 ctl_ce_data_l;
+       u32 ctl_ce_data_h;
+       u32 ctl_ce_synd;
+       u32 ctl_ue_addr_l;
+       u32 ctl_ue_addr_h;
+       u32 ctl_ue_data_l;
+       u32 ctl_ue_data_h;
+       u32 ctl_ue_synd;
+       u32 ctl_source_id;
+       u32 ctl_controller_busy;
+       u32 ctl_xor_check_bits;
+
+       /* masks and shifts */
+       u32 ecc_en_mask;
+       u32 int_status_ce_mask;
+       u32 int_status_ue_mask;
+       u32 int_ack_ce_mask;
+       u32 int_ack_ue_mask;
+       u32 int_mask_master_non_ecc_mask;
+       u32 int_mask_master_global_mask;
+       u32 int_mask_ecc_non_event_mask;
+       u32 ce_addr_h_mask;
+       u32 ce_synd_mask;
+       u32 ce_synd_shift;
+       u32 ue_addr_h_mask;
+       u32 ue_synd_mask;
+       u32 ue_synd_shift;
+       u32 source_id_ce_mask;
+       u32 source_id_ce_shift;
+       u32 source_id_ue_mask;
+       u32 source_id_ue_shift;
+       u32 controller_busy_mask;
+       u32 xor_check_bits_mask;
+       u32 xor_check_bits_shift;
+       u32 writeback_en_mask;
+       u32 fwc_mask;
+};
+
+struct priv_data {
+       void __iomem *reg;
+       char message[EDAC_MSG_SIZE];
+       const struct npcm_platform_data *pdata;
+
+       /* error injection */
+       struct dentry *debugfs;
+       u8 error_type;
+       u8 location;
+       u8 bit;
+};
+
+static void handle_ce(struct mem_ctl_info *mci)
+{
+       struct priv_data *priv = mci->pvt_info;
+       const struct npcm_platform_data *pdata;
+       u32 val_h = 0, val_l, id, synd;
+       u64 addr = 0, data = 0;
+
+       pdata = priv->pdata;
+       regmap_read(npcm_regmap, pdata->ctl_ce_addr_l, &val_l);
+       if (pdata->chip == NPCM8XX_CHIP) {
+               regmap_read(npcm_regmap, pdata->ctl_ce_addr_h, &val_h);
+               val_h &= pdata->ce_addr_h_mask;
+       }
+       addr = ((addr | val_h) << 32) | val_l;
+
+       regmap_read(npcm_regmap, pdata->ctl_ce_data_l, &val_l);
+       if (pdata->chip == NPCM8XX_CHIP)
+               regmap_read(npcm_regmap, pdata->ctl_ce_data_h, &val_h);
+       data = ((data | val_h) << 32) | val_l;
+
+       regmap_read(npcm_regmap, pdata->ctl_source_id, &id);
+       id = (id & pdata->source_id_ce_mask) >> pdata->source_id_ce_shift;
+
+       regmap_read(npcm_regmap, pdata->ctl_ce_synd, &synd);
+       synd = (synd & pdata->ce_synd_mask) >> pdata->ce_synd_shift;
+
+       snprintf(priv->message, EDAC_MSG_SIZE,
+                "addr = 0x%llx, data = 0x%llx, id = 0x%x", addr, data, id);
+
+       edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci, 1, addr >> PAGE_SHIFT,
+                            addr & ~PAGE_MASK, synd, 0, 0, -1, priv->message, "");
+}
+
+static void handle_ue(struct mem_ctl_info *mci)
+{
+       struct priv_data *priv = mci->pvt_info;
+       const struct npcm_platform_data *pdata;
+       u32 val_h = 0, val_l, id, synd;
+       u64 addr = 0, data = 0;
+
+       pdata = priv->pdata;
+       regmap_read(npcm_regmap, pdata->ctl_ue_addr_l, &val_l);
+       if (pdata->chip == NPCM8XX_CHIP) {
+               regmap_read(npcm_regmap, pdata->ctl_ue_addr_h, &val_h);
+               val_h &= pdata->ue_addr_h_mask;
+       }
+       addr = ((addr | val_h) << 32) | val_l;
+
+       regmap_read(npcm_regmap, pdata->ctl_ue_data_l, &val_l);
+       if (pdata->chip == NPCM8XX_CHIP)
+               regmap_read(npcm_regmap, pdata->ctl_ue_data_h, &val_h);
+       data = ((data | val_h) << 32) | val_l;
+
+       regmap_read(npcm_regmap, pdata->ctl_source_id, &id);
+       id = (id & pdata->source_id_ue_mask) >> pdata->source_id_ue_shift;
+
+       regmap_read(npcm_regmap, pdata->ctl_ue_synd, &synd);
+       synd = (synd & pdata->ue_synd_mask) >> pdata->ue_synd_shift;
+
+       snprintf(priv->message, EDAC_MSG_SIZE,
+                "addr = 0x%llx, data = 0x%llx, id = 0x%x", addr, data, id);
+
+       edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, 1, addr >> PAGE_SHIFT,
+                            addr & ~PAGE_MASK, synd, 0, 0, -1, priv->message, "");
+}
+
+static irqreturn_t edac_ecc_isr(int irq, void *dev_id)
+{
+       const struct npcm_platform_data *pdata;
+       struct mem_ctl_info *mci = dev_id;
+       u32 status;
+
+       pdata = ((struct priv_data *)mci->pvt_info)->pdata;
+       regmap_read(npcm_regmap, pdata->ctl_int_status, &status);
+       if (status & pdata->int_status_ce_mask) {
+               handle_ce(mci);
+
+               /* acknowledge the CE interrupt */
+               regmap_write(npcm_regmap, pdata->ctl_int_ack,
+                            pdata->int_ack_ce_mask);
+               return IRQ_HANDLED;
+       } else if (status & pdata->int_status_ue_mask) {
+               handle_ue(mci);
+
+               /* acknowledge the UE interrupt */
+               regmap_write(npcm_regmap, pdata->ctl_int_ack,
+                            pdata->int_ack_ue_mask);
+               return IRQ_HANDLED;
+       }
+
+       WARN_ON_ONCE(1);
+       return IRQ_NONE;
+}
+
+static ssize_t force_ecc_error(struct file *file, const char __user *data,
+                              size_t count, loff_t *ppos)
+{
+       struct device *dev = file->private_data;
+       struct mem_ctl_info *mci = to_mci(dev);
+       struct priv_data *priv = mci->pvt_info;
+       const struct npcm_platform_data *pdata;
+       u32 val, syndrome;
+       int ret;
+
+       pdata = priv->pdata;
+       edac_printk(KERN_INFO, EDAC_MOD_NAME,
+                   "force an ECC error, type = %d, location = %d, bit = %d\n",
+                   priv->error_type, priv->location, priv->bit);
+
+       /* ensure no pending writes */
+       ret = regmap_read_poll_timeout(npcm_regmap, pdata->ctl_controller_busy,
+                                      val, !(val & pdata->controller_busy_mask),
+                                      1000, 10000);
+       if (ret) {
+               edac_printk(KERN_INFO, EDAC_MOD_NAME,
+                           "wait pending writes timeout\n");
+               return count;
+       }
+
+       regmap_read(npcm_regmap, pdata->ctl_xor_check_bits, &val);
+       val &= ~pdata->xor_check_bits_mask;
+
+       /* write syndrome to XOR_CHECK_BITS */
+       if (priv->error_type == ERROR_TYPE_CORRECTABLE) {
+               if (priv->location == ERROR_LOCATION_DATA &&
+                   priv->bit > ERROR_BIT_DATA_MAX) {
+                       edac_printk(KERN_INFO, EDAC_MOD_NAME,
+                                   "data bit should not exceed %d (%d)\n",
+                                   ERROR_BIT_DATA_MAX, priv->bit);
+                       return count;
+               }
+
+               if (priv->location == ERROR_LOCATION_CHECKCODE &&
+                   priv->bit > ERROR_BIT_CHECKCODE_MAX) {
+                       edac_printk(KERN_INFO, EDAC_MOD_NAME,
+                                   "checkcode bit should not exceed %d (%d)\n",
+                                   ERROR_BIT_CHECKCODE_MAX, priv->bit);
+                       return count;
+               }
+
+               syndrome = priv->location ? 1 << priv->bit
+                                         : data_synd[priv->bit];
+
+               regmap_write(npcm_regmap, pdata->ctl_xor_check_bits,
+                            val | (syndrome << pdata->xor_check_bits_shift) |
+                            pdata->writeback_en_mask);
+       } else if (priv->error_type == ERROR_TYPE_UNCORRECTABLE) {
+               regmap_write(npcm_regmap, pdata->ctl_xor_check_bits,
+                            val | (UE_SYNDROME << pdata->xor_check_bits_shift));
+       }
+
+       /* force write check */
+       regmap_update_bits(npcm_regmap, pdata->ctl_xor_check_bits,
+                          pdata->fwc_mask, pdata->fwc_mask);
+
+       return count;
+}
+
+static const struct file_operations force_ecc_error_fops = {
+       .open = simple_open,
+       .write = force_ecc_error,
+       .llseek = generic_file_llseek,
+};
+
+/*
+ * Setup debugfs for error injection.
+ *
+ * Nodes:
+ *   error_type                - 0: CE, 1: UE
+ *   location          - 0: data, 1: checkcode
+ *   bit               - 0 ~ 63 for data and 0 ~ 7 for checkcode
+ *   force_ecc_error   - trigger
+ *
+ * Examples:
+ *   1. Inject a correctable error (CE) at checkcode bit 7.
+ *      ~# echo 0 > /sys/kernel/debug/edac/npcm-edac/error_type
+ *      ~# echo 1 > /sys/kernel/debug/edac/npcm-edac/location
+ *      ~# echo 7 > /sys/kernel/debug/edac/npcm-edac/bit
+ *      ~# echo 1 > /sys/kernel/debug/edac/npcm-edac/force_ecc_error
+ *
+ *   2. Inject an uncorrectable error (UE).
+ *      ~# echo 1 > /sys/kernel/debug/edac/npcm-edac/error_type
+ *      ~# echo 1 > /sys/kernel/debug/edac/npcm-edac/force_ecc_error
+ */
+static void setup_debugfs(struct mem_ctl_info *mci)
+{
+       struct priv_data *priv = mci->pvt_info;
+
+       priv->debugfs = edac_debugfs_create_dir(mci->mod_name);
+       if (!priv->debugfs)
+               return;
+
+       edac_debugfs_create_x8("error_type", 0644, priv->debugfs, &priv->error_type);
+       edac_debugfs_create_x8("location", 0644, priv->debugfs, &priv->location);
+       edac_debugfs_create_x8("bit", 0644, priv->debugfs, &priv->bit);
+       edac_debugfs_create_file("force_ecc_error", 0200, priv->debugfs,
+                                &mci->dev, &force_ecc_error_fops);
+}
+
+static int setup_irq(struct mem_ctl_info *mci, struct platform_device *pdev)
+{
+       const struct npcm_platform_data *pdata;
+       int ret, irq;
+
+       pdata = ((struct priv_data *)mci->pvt_info)->pdata;
+       irq = platform_get_irq(pdev, 0);
+       if (irq < 0) {
+               edac_printk(KERN_ERR, EDAC_MOD_NAME, "IRQ not defined in DTS\n");
+               return irq;
+       }
+
+       ret = devm_request_irq(&pdev->dev, irq, edac_ecc_isr, 0,
+                              dev_name(&pdev->dev), mci);
+       if (ret < 0) {
+               edac_printk(KERN_ERR, EDAC_MOD_NAME, "failed to request IRQ\n");
+               return ret;
+       }
+
+       /* enable the functional group of ECC and mask the others */
+       regmap_write(npcm_regmap, pdata->ctl_int_mask_master,
+                    pdata->int_mask_master_non_ecc_mask);
+
+       if (pdata->chip == NPCM8XX_CHIP)
+               regmap_write(npcm_regmap, pdata->ctl_int_mask_ecc,
+                            pdata->int_mask_ecc_non_event_mask);
+
+       return 0;
+}
+
+static const struct regmap_config npcm_regmap_cfg = {
+       .reg_bits       = 32,
+       .reg_stride     = 4,
+       .val_bits       = 32,
+};
+
+static int edac_probe(struct platform_device *pdev)
+{
+       const struct npcm_platform_data *pdata;
+       struct device *dev = &pdev->dev;
+       struct edac_mc_layer layers[1];
+       struct mem_ctl_info *mci;
+       struct priv_data *priv;
+       void __iomem *reg;
+       u32 val;
+       int rc;
+
+       reg = devm_platform_ioremap_resource(pdev, 0);
+       if (IS_ERR(reg))
+               return PTR_ERR(reg);
+
+       npcm_regmap = devm_regmap_init_mmio(dev, reg, &npcm_regmap_cfg);
+       if (IS_ERR(npcm_regmap))
+               return PTR_ERR(npcm_regmap);
+
+       pdata = of_device_get_match_data(dev);
+       if (!pdata)
+               return -EINVAL;
+
+       /* bail out if ECC is not enabled */
+       regmap_read(npcm_regmap, pdata->ctl_ecc_en, &val);
+       if (!(val & pdata->ecc_en_mask)) {
+               edac_printk(KERN_ERR, EDAC_MOD_NAME, "ECC is not enabled\n");
+               return -EPERM;
+       }
+
+       edac_op_state = EDAC_OPSTATE_INT;
+
+       layers[0].type = EDAC_MC_LAYER_ALL_MEM;
+       layers[0].size = 1;
+
+       mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers,
+                           sizeof(struct priv_data));
+       if (!mci)
+               return -ENOMEM;
+
+       mci->pdev = &pdev->dev;
+       priv = mci->pvt_info;
+       priv->reg = reg;
+       priv->pdata = pdata;
+       platform_set_drvdata(pdev, mci);
+
+       mci->mtype_cap = MEM_FLAG_DDR4;
+       mci->edac_ctl_cap = EDAC_FLAG_SECDED;
+       mci->scrub_cap = SCRUB_FLAG_HW_SRC;
+       mci->scrub_mode = SCRUB_HW_SRC;
+       mci->edac_cap = EDAC_FLAG_SECDED;
+       mci->ctl_name = "npcm_ddr_controller";
+       mci->dev_name = dev_name(&pdev->dev);
+       mci->mod_name = EDAC_MOD_NAME;
+       mci->ctl_page_to_phys = NULL;
+
+       rc = setup_irq(mci, pdev);
+       if (rc)
+               goto free_edac_mc;
+
+       rc = edac_mc_add_mc(mci);
+       if (rc)
+               goto free_edac_mc;
+
+       if (IS_ENABLED(CONFIG_EDAC_DEBUG) && pdata->chip == NPCM8XX_CHIP)
+               setup_debugfs(mci);
+
+       return rc;
+
+free_edac_mc:
+       edac_mc_free(mci);
+       return rc;
+}
+
+static int edac_remove(struct platform_device *pdev)
+{
+       struct mem_ctl_info *mci = platform_get_drvdata(pdev);
+       struct priv_data *priv = mci->pvt_info;
+       const struct npcm_platform_data *pdata;
+
+       pdata = priv->pdata;
+       if (IS_ENABLED(CONFIG_EDAC_DEBUG) && pdata->chip == NPCM8XX_CHIP)
+               edac_debugfs_remove_recursive(priv->debugfs);
+
+       edac_mc_del_mc(&pdev->dev);
+       edac_mc_free(mci);
+
+       regmap_write(npcm_regmap, pdata->ctl_int_mask_master,
+                    pdata->int_mask_master_global_mask);
+       regmap_update_bits(npcm_regmap, pdata->ctl_ecc_en, pdata->ecc_en_mask, 0);
+
+       return 0;
+}
+
+static const struct npcm_platform_data npcm750_edac = {
+       .chip                           = NPCM7XX_CHIP,
+
+       /* memory controller registers */
+       .ctl_ecc_en                     = 0x174,
+       .ctl_int_status                 = 0x1d0,
+       .ctl_int_ack                    = 0x1d4,
+       .ctl_int_mask_master            = 0x1d8,
+       .ctl_ce_addr_l                  = 0x188,
+       .ctl_ce_data_l                  = 0x190,
+       .ctl_ce_synd                    = 0x18c,
+       .ctl_ue_addr_l                  = 0x17c,
+       .ctl_ue_data_l                  = 0x184,
+       .ctl_ue_synd                    = 0x180,
+       .ctl_source_id                  = 0x194,
+
+       /* masks and shifts */
+       .ecc_en_mask                    = BIT(24),
+       .int_status_ce_mask             = GENMASK(4, 3),
+       .int_status_ue_mask             = GENMASK(6, 5),
+       .int_ack_ce_mask                = GENMASK(4, 3),
+       .int_ack_ue_mask                = GENMASK(6, 5),
+       .int_mask_master_non_ecc_mask   = GENMASK(30, 7) | GENMASK(2, 0),
+       .int_mask_master_global_mask    = BIT(31),
+       .ce_synd_mask                   = GENMASK(6, 0),
+       .ce_synd_shift                  = 0,
+       .ue_synd_mask                   = GENMASK(6, 0),
+       .ue_synd_shift                  = 0,
+       .source_id_ce_mask              = GENMASK(29, 16),
+       .source_id_ce_shift             = 16,
+       .source_id_ue_mask              = GENMASK(13, 0),
+       .source_id_ue_shift             = 0,
+};
+
+static const struct npcm_platform_data npcm845_edac = {
+       .chip =                         NPCM8XX_CHIP,
+
+       /* memory controller registers */
+       .ctl_ecc_en                     = 0x16c,
+       .ctl_int_status                 = 0x228,
+       .ctl_int_ack                    = 0x244,
+       .ctl_int_mask_master            = 0x220,
+       .ctl_int_mask_ecc               = 0x260,
+       .ctl_ce_addr_l                  = 0x18c,
+       .ctl_ce_addr_h                  = 0x190,
+       .ctl_ce_data_l                  = 0x194,
+       .ctl_ce_data_h                  = 0x198,
+       .ctl_ce_synd                    = 0x190,
+       .ctl_ue_addr_l                  = 0x17c,
+       .ctl_ue_addr_h                  = 0x180,
+       .ctl_ue_data_l                  = 0x184,
+       .ctl_ue_data_h                  = 0x188,
+       .ctl_ue_synd                    = 0x180,
+       .ctl_source_id                  = 0x19c,
+       .ctl_controller_busy            = 0x20c,
+       .ctl_xor_check_bits             = 0x174,
+
+       /* masks and shifts */
+       .ecc_en_mask                    = GENMASK(17, 16),
+       .int_status_ce_mask             = GENMASK(1, 0),
+       .int_status_ue_mask             = GENMASK(3, 2),
+       .int_ack_ce_mask                = GENMASK(1, 0),
+       .int_ack_ue_mask                = GENMASK(3, 2),
+       .int_mask_master_non_ecc_mask   = GENMASK(30, 3) | GENMASK(1, 0),
+       .int_mask_master_global_mask    = BIT(31),
+       .int_mask_ecc_non_event_mask    = GENMASK(8, 4),
+       .ce_addr_h_mask                 = GENMASK(1, 0),
+       .ce_synd_mask                   = GENMASK(15, 8),
+       .ce_synd_shift                  = 8,
+       .ue_addr_h_mask                 = GENMASK(1, 0),
+       .ue_synd_mask                   = GENMASK(15, 8),
+       .ue_synd_shift                  = 8,
+       .source_id_ce_mask              = GENMASK(29, 16),
+       .source_id_ce_shift             = 16,
+       .source_id_ue_mask              = GENMASK(13, 0),
+       .source_id_ue_shift             = 0,
+       .controller_busy_mask           = BIT(0),
+       .xor_check_bits_mask            = GENMASK(23, 16),
+       .xor_check_bits_shift           = 16,
+       .writeback_en_mask              = BIT(24),
+       .fwc_mask                       = BIT(8),
+};
+
+static const struct of_device_id npcm_edac_of_match[] = {
+       {
+               .compatible = "nuvoton,npcm750-memory-controller",
+               .data = &npcm750_edac
+       },
+       {
+               .compatible = "nuvoton,npcm845-memory-controller",
+               .data = &npcm845_edac
+       },
+       {},
+};
+
+MODULE_DEVICE_TABLE(of, npcm_edac_of_match);
+
+static struct platform_driver npcm_edac_driver = {
+       .driver = {
+               .name = "npcm-edac",
+               .of_match_table = npcm_edac_of_match,
+       },
+       .probe = edac_probe,
+       .remove = edac_remove,
+};
+
+module_platform_driver(npcm_edac_driver);
+
+MODULE_AUTHOR("Medad CChien <medadyoung@gmail.com>");
+MODULE_AUTHOR("Marvin Lin <kflin@nuvoton.com>");
+MODULE_DESCRIPTION("Nuvoton NPCM EDAC Driver");
+MODULE_LICENSE("GPL");
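Editorial aside (not part of the patch): force_ecc_error() above selects the syndrome to program into XOR_CHECK_BITS either from the data_synd[] table (data errors) or as a one-hot bit (checkcode errors). The stand-alone sketch below mirrors that selection; the location/bit values are arbitrary and the table is truncated to the first eight entries copied from the driver.

/* Illustrative only: mirror the syndrome selection done by force_ecc_error(). */
#include <stdio.h>

static const unsigned char data_synd[] = {
        0xf4, 0xf1, 0xec, 0xea, 0xe9, 0xe6, 0xe5, 0xe3,  /* remaining 56 entries elided */
};

int main(void)
{
        unsigned int location = 0;      /* 0: data, 1: checkcode */
        unsigned int bit = 3;           /* which data/checkcode bit to corrupt */
        unsigned int syndrome = location ? 1u << bit : data_synd[bit];

        /* The driver shifts this by xor_check_bits_shift into XOR_CHECK_BITS and sets the force-write bit. */
        printf("syndrome = 0x%02x\n", syndrome);        /* 0xea for data bit 3 */
        return 0;
}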
index 265e0fb..b2db545 100644 (file)
 #define TRP_SYN_REG_CNT                 6
 #define DRP_SYN_REG_CNT                 8
 
-#define LLCC_COMMON_STATUS0             0x0003000c
 #define LLCC_LB_CNT_MASK                GENMASK(31, 28)
 #define LLCC_LB_CNT_SHIFT               28
 
-/* Single & double bit syndrome register offsets */
-#define TRP_ECC_SB_ERR_SYN0             0x0002304c
-#define TRP_ECC_DB_ERR_SYN0             0x00020370
-#define DRP_ECC_SB_ERR_SYN0             0x0004204c
-#define DRP_ECC_DB_ERR_SYN0             0x00042070
-
-/* Error register offsets */
-#define TRP_ECC_ERROR_STATUS1           0x00020348
-#define TRP_ECC_ERROR_STATUS0           0x00020344
-#define DRP_ECC_ERROR_STATUS1           0x00042048
-#define DRP_ECC_ERROR_STATUS0           0x00042044
-
-/* TRP, DRP interrupt register offsets */
-#define DRP_INTERRUPT_STATUS            0x00041000
-#define TRP_INTERRUPT_0_STATUS          0x00020480
-#define DRP_INTERRUPT_CLEAR             0x00041008
-#define DRP_ECC_ERROR_CNTR_CLEAR        0x00040004
-#define TRP_INTERRUPT_0_CLEAR           0x00020484
-#define TRP_ECC_ERROR_CNTR_CLEAR        0x00020440
-
 /* Mask and shift macros */
 #define ECC_DB_ERR_COUNT_MASK           GENMASK(4, 0)
 #define ECC_DB_ERR_WAYS_MASK            GENMASK(31, 16)
 #define DRP_TRP_INT_CLEAR               GENMASK(1, 0)
 #define DRP_TRP_CNT_CLEAR               GENMASK(1, 0)
 
-/* Config registers offsets*/
-#define DRP_ECC_ERROR_CFG               0x00040000
-
-/* Tag RAM, Data RAM interrupt register offsets */
-#define CMN_INTERRUPT_0_ENABLE          0x0003001c
-#define CMN_INTERRUPT_2_ENABLE          0x0003003c
-#define TRP_INTERRUPT_0_ENABLE          0x00020488
-#define DRP_INTERRUPT_ENABLE            0x0004100c
-
 #define SB_ERROR_THRESHOLD              0x1
 #define SB_ERROR_THRESHOLD_SHIFT        24
 #define SB_DB_TRP_INTERRUPT_ENABLE      0x3
@@ -88,9 +58,6 @@ enum {
 static const struct llcc_edac_reg_data edac_reg_data[] = {
        [LLCC_DRAM_CE] = {
                .name = "DRAM Single-bit",
-               .synd_reg = DRP_ECC_SB_ERR_SYN0,
-               .count_status_reg = DRP_ECC_ERROR_STATUS1,
-               .ways_status_reg = DRP_ECC_ERROR_STATUS0,
                .reg_cnt = DRP_SYN_REG_CNT,
                .count_mask = ECC_SB_ERR_COUNT_MASK,
                .ways_mask = ECC_SB_ERR_WAYS_MASK,
@@ -98,9 +65,6 @@ static const struct llcc_edac_reg_data edac_reg_data[] = {
        },
        [LLCC_DRAM_UE] = {
                .name = "DRAM Double-bit",
-               .synd_reg = DRP_ECC_DB_ERR_SYN0,
-               .count_status_reg = DRP_ECC_ERROR_STATUS1,
-               .ways_status_reg = DRP_ECC_ERROR_STATUS0,
                .reg_cnt = DRP_SYN_REG_CNT,
                .count_mask = ECC_DB_ERR_COUNT_MASK,
                .ways_mask = ECC_DB_ERR_WAYS_MASK,
@@ -108,9 +72,6 @@ static const struct llcc_edac_reg_data edac_reg_data[] = {
        },
        [LLCC_TRAM_CE] = {
                .name = "TRAM Single-bit",
-               .synd_reg = TRP_ECC_SB_ERR_SYN0,
-               .count_status_reg = TRP_ECC_ERROR_STATUS1,
-               .ways_status_reg = TRP_ECC_ERROR_STATUS0,
                .reg_cnt = TRP_SYN_REG_CNT,
                .count_mask = ECC_SB_ERR_COUNT_MASK,
                .ways_mask = ECC_SB_ERR_WAYS_MASK,
@@ -118,9 +79,6 @@ static const struct llcc_edac_reg_data edac_reg_data[] = {
        },
        [LLCC_TRAM_UE] = {
                .name = "TRAM Double-bit",
-               .synd_reg = TRP_ECC_DB_ERR_SYN0,
-               .count_status_reg = TRP_ECC_ERROR_STATUS1,
-               .ways_status_reg = TRP_ECC_ERROR_STATUS0,
                .reg_cnt = TRP_SYN_REG_CNT,
                .count_mask = ECC_DB_ERR_COUNT_MASK,
                .ways_mask = ECC_DB_ERR_WAYS_MASK,
@@ -128,7 +86,7 @@ static const struct llcc_edac_reg_data edac_reg_data[] = {
        },
 };
 
-static int qcom_llcc_core_setup(struct regmap *llcc_bcast_regmap)
+static int qcom_llcc_core_setup(struct llcc_drv_data *drv, struct regmap *llcc_bcast_regmap)
 {
        u32 sb_err_threshold;
        int ret;
@@ -137,31 +95,31 @@ static int qcom_llcc_core_setup(struct regmap *llcc_bcast_regmap)
         * Configure interrupt enable registers such that Tag, Data RAM related
         * interrupts are propagated to interrupt controller for servicing
         */
-       ret = regmap_update_bits(llcc_bcast_regmap, CMN_INTERRUPT_2_ENABLE,
+       ret = regmap_update_bits(llcc_bcast_regmap, drv->edac_reg_offset->cmn_interrupt_2_enable,
                                 TRP0_INTERRUPT_ENABLE,
                                 TRP0_INTERRUPT_ENABLE);
        if (ret)
                return ret;
 
-       ret = regmap_update_bits(llcc_bcast_regmap, TRP_INTERRUPT_0_ENABLE,
+       ret = regmap_update_bits(llcc_bcast_regmap, drv->edac_reg_offset->trp_interrupt_0_enable,
                                 SB_DB_TRP_INTERRUPT_ENABLE,
                                 SB_DB_TRP_INTERRUPT_ENABLE);
        if (ret)
                return ret;
 
        sb_err_threshold = (SB_ERROR_THRESHOLD << SB_ERROR_THRESHOLD_SHIFT);
-       ret = regmap_write(llcc_bcast_regmap, DRP_ECC_ERROR_CFG,
+       ret = regmap_write(llcc_bcast_regmap, drv->edac_reg_offset->drp_ecc_error_cfg,
                           sb_err_threshold);
        if (ret)
                return ret;
 
-       ret = regmap_update_bits(llcc_bcast_regmap, CMN_INTERRUPT_2_ENABLE,
+       ret = regmap_update_bits(llcc_bcast_regmap, drv->edac_reg_offset->cmn_interrupt_2_enable,
                                 DRP0_INTERRUPT_ENABLE,
                                 DRP0_INTERRUPT_ENABLE);
        if (ret)
                return ret;
 
-       ret = regmap_write(llcc_bcast_regmap, DRP_INTERRUPT_ENABLE,
+       ret = regmap_write(llcc_bcast_regmap, drv->edac_reg_offset->drp_interrupt_enable,
                           SB_DB_DRP_INTERRUPT_ENABLE);
        return ret;
 }
@@ -170,29 +128,33 @@ static int qcom_llcc_core_setup(struct regmap *llcc_bcast_regmap)
 static int
 qcom_llcc_clear_error_status(int err_type, struct llcc_drv_data *drv)
 {
-       int ret = 0;
+       int ret;
 
        switch (err_type) {
        case LLCC_DRAM_CE:
        case LLCC_DRAM_UE:
-               ret = regmap_write(drv->bcast_regmap, DRP_INTERRUPT_CLEAR,
+               ret = regmap_write(drv->bcast_regmap,
+                                  drv->edac_reg_offset->drp_interrupt_clear,
                                   DRP_TRP_INT_CLEAR);
                if (ret)
                        return ret;
 
-               ret = regmap_write(drv->bcast_regmap, DRP_ECC_ERROR_CNTR_CLEAR,
+               ret = regmap_write(drv->bcast_regmap,
+                                  drv->edac_reg_offset->drp_ecc_error_cntr_clear,
                                   DRP_TRP_CNT_CLEAR);
                if (ret)
                        return ret;
                break;
        case LLCC_TRAM_CE:
        case LLCC_TRAM_UE:
-               ret = regmap_write(drv->bcast_regmap, TRP_INTERRUPT_0_CLEAR,
+               ret = regmap_write(drv->bcast_regmap,
+                                  drv->edac_reg_offset->trp_interrupt_0_clear,
                                   DRP_TRP_INT_CLEAR);
                if (ret)
                        return ret;
 
-               ret = regmap_write(drv->bcast_regmap, TRP_ECC_ERROR_CNTR_CLEAR,
+               ret = regmap_write(drv->bcast_regmap,
+                                  drv->edac_reg_offset->trp_ecc_error_cntr_clear,
                                   DRP_TRP_CNT_CLEAR);
                if (ret)
                        return ret;
@@ -205,16 +167,54 @@ qcom_llcc_clear_error_status(int err_type, struct llcc_drv_data *drv)
        return ret;
 }
 
+struct qcom_llcc_syn_regs {
+       u32 synd_reg;
+       u32 count_status_reg;
+       u32 ways_status_reg;
+};
+
+static void get_reg_offsets(struct llcc_drv_data *drv, int err_type,
+                           struct qcom_llcc_syn_regs *syn_regs)
+{
+       const struct llcc_edac_reg_offset *edac_reg_offset = drv->edac_reg_offset;
+
+       switch (err_type) {
+       case LLCC_DRAM_CE:
+               syn_regs->synd_reg = edac_reg_offset->drp_ecc_sb_err_syn0;
+               syn_regs->count_status_reg = edac_reg_offset->drp_ecc_error_status1;
+               syn_regs->ways_status_reg = edac_reg_offset->drp_ecc_error_status0;
+               break;
+       case LLCC_DRAM_UE:
+               syn_regs->synd_reg = edac_reg_offset->drp_ecc_db_err_syn0;
+               syn_regs->count_status_reg = edac_reg_offset->drp_ecc_error_status1;
+               syn_regs->ways_status_reg = edac_reg_offset->drp_ecc_error_status0;
+               break;
+       case LLCC_TRAM_CE:
+               syn_regs->synd_reg = edac_reg_offset->trp_ecc_sb_err_syn0;
+               syn_regs->count_status_reg = edac_reg_offset->trp_ecc_error_status1;
+               syn_regs->ways_status_reg = edac_reg_offset->trp_ecc_error_status0;
+               break;
+       case LLCC_TRAM_UE:
+               syn_regs->synd_reg = edac_reg_offset->trp_ecc_db_err_syn0;
+               syn_regs->count_status_reg = edac_reg_offset->trp_ecc_error_status1;
+               syn_regs->ways_status_reg = edac_reg_offset->trp_ecc_error_status0;
+               break;
+       }
+}
+
 /* Dump Syndrome registers data for Tag RAM, Data RAM bit errors*/
 static int
 dump_syn_reg_values(struct llcc_drv_data *drv, u32 bank, int err_type)
 {
        struct llcc_edac_reg_data reg_data = edac_reg_data[err_type];
+       struct qcom_llcc_syn_regs regs = { };
        int err_cnt, err_ways, ret, i;
        u32 synd_reg, synd_val;
 
+       get_reg_offsets(drv, err_type, &regs);
+
        for (i = 0; i < reg_data.reg_cnt; i++) {
-               synd_reg = reg_data.synd_reg + (i * 4);
+               synd_reg = regs.synd_reg + (i * 4);
                ret = regmap_read(drv->regmaps[bank], synd_reg,
                                  &synd_val);
                if (ret)
@@ -224,7 +224,7 @@ dump_syn_reg_values(struct llcc_drv_data *drv, u32 bank, int err_type)
                            reg_data.name, i, synd_val);
        }
 
-       ret = regmap_read(drv->regmaps[bank], reg_data.count_status_reg,
+       ret = regmap_read(drv->regmaps[bank], regs.count_status_reg,
                          &err_cnt);
        if (ret)
                goto clear;
@@ -234,7 +234,7 @@ dump_syn_reg_values(struct llcc_drv_data *drv, u32 bank, int err_type)
        edac_printk(KERN_CRIT, EDAC_LLCC, "%s: Error count: 0x%4x\n",
                    reg_data.name, err_cnt);
 
-       ret = regmap_read(drv->regmaps[bank], reg_data.ways_status_reg,
+       ret = regmap_read(drv->regmaps[bank], regs.ways_status_reg,
                          &err_ways);
        if (ret)
                goto clear;
@@ -295,7 +295,7 @@ static irqreturn_t llcc_ecc_irq_handler(int irq, void *edev_ctl)
 
        /* Iterate over the banks and look for Tag RAM or Data RAM errors */
        for (i = 0; i < drv->num_banks; i++) {
-               ret = regmap_read(drv->regmaps[i], DRP_INTERRUPT_STATUS,
+               ret = regmap_read(drv->regmaps[i], drv->edac_reg_offset->drp_interrupt_status,
                                  &drp_error);
 
                if (!ret && (drp_error & SB_ECC_ERROR)) {
@@ -310,7 +310,7 @@ static irqreturn_t llcc_ecc_irq_handler(int irq, void *edev_ctl)
                if (!ret)
                        irq_rc = IRQ_HANDLED;
 
-               ret = regmap_read(drv->regmaps[i], TRP_INTERRUPT_0_STATUS,
+               ret = regmap_read(drv->regmaps[i], drv->edac_reg_offset->trp_interrupt_0_status,
                                  &trp_error);
 
                if (!ret && (trp_error & SB_ECC_ERROR)) {
@@ -342,7 +342,7 @@ static int qcom_llcc_edac_probe(struct platform_device *pdev)
        int ecc_irq;
        int rc;
 
-       rc = qcom_llcc_core_setup(llcc_driv_data->bcast_regmap);
+       rc = qcom_llcc_core_setup(llcc_driv_data, llcc_driv_data->bcast_regmap);
        if (rc)
                return rc;
 
index 0bcd9f0..b9c5772 100644 (file)
@@ -481,7 +481,7 @@ static int thunderx_create_debugfs_nodes(struct dentry *parent,
                ent = edac_debugfs_create_file(attrs[i]->name, attrs[i]->mode,
                                               parent, data, &attrs[i]->fops);
 
-               if (!ent)
+               if (IS_ERR(ent))
                        break;
        }
 
index af22be8..538bd67 100644 (file)
@@ -706,21 +706,22 @@ static void fwnet_receive_packet(struct fw_card *card, struct fw_request *r,
        int rcode;
 
        if (destination == IEEE1394_ALL_NODES) {
-               kfree(r);
-
-               return;
-       }
-
-       if (offset != dev->handler.offset)
+               // Although the response to the broadcast packet is not necessarily required, the
+               // fw_send_response() function should still be called to maintain the reference
+               // counting of the object. In this case, the call simply releases the
+               // object by dropping its reference count.
+               rcode = RCODE_COMPLETE;
+       } else if (offset != dev->handler.offset) {
                rcode = RCODE_ADDRESS_ERROR;
-       else if (tcode != TCODE_WRITE_BLOCK_REQUEST)
+       } else if (tcode != TCODE_WRITE_BLOCK_REQUEST) {
                rcode = RCODE_TYPE_ERROR;
-       else if (fwnet_incoming_packet(dev, payload, length,
-                                      source, generation, false) != 0) {
+       } else if (fwnet_incoming_packet(dev, payload, length,
+                                        source, generation, false) != 0) {
                dev_err(&dev->netdev->dev, "incoming packet failure\n");
                rcode = RCODE_CONFLICT_ERROR;
-       } else
+       } else {
                rcode = RCODE_COMPLETE;
+       }
 
        fw_send_response(card, r, rcode);
 }
index f29d77e..2b8bfcd 100644 (file)
@@ -15,6 +15,8 @@
 
 #include "common.h"
 
+static DEFINE_IDA(ffa_bus_id);
+
 static int ffa_device_match(struct device *dev, struct device_driver *drv)
 {
        const struct ffa_device_id *id_table;
@@ -53,7 +55,8 @@ static void ffa_device_remove(struct device *dev)
 {
        struct ffa_driver *ffa_drv = to_ffa_driver(dev->driver);
 
-       ffa_drv->remove(to_ffa_dev(dev));
+       if (ffa_drv->remove)
+               ffa_drv->remove(to_ffa_dev(dev));
 }
 
 static int ffa_device_uevent(const struct device *dev, struct kobj_uevent_env *env)
@@ -130,6 +133,7 @@ static void ffa_release_device(struct device *dev)
 {
        struct ffa_device *ffa_dev = to_ffa_dev(dev);
 
+       ida_free(&ffa_bus_id, ffa_dev->id);
        kfree(ffa_dev);
 }
 
@@ -170,18 +174,24 @@ bool ffa_device_is_valid(struct ffa_device *ffa_dev)
 struct ffa_device *ffa_device_register(const uuid_t *uuid, int vm_id,
                                       const struct ffa_ops *ops)
 {
-       int ret;
+       int id, ret;
        struct device *dev;
        struct ffa_device *ffa_dev;
 
+       id = ida_alloc_min(&ffa_bus_id, 1, GFP_KERNEL);
+       if (id < 0)
+               return NULL;
+
        ffa_dev = kzalloc(sizeof(*ffa_dev), GFP_KERNEL);
-       if (!ffa_dev)
+       if (!ffa_dev) {
+               ida_free(&ffa_bus_id, id);
                return NULL;
+       }
 
        dev = &ffa_dev->dev;
        dev->bus = &ffa_bus_type;
        dev->release = ffa_release_device;
-       dev_set_name(&ffa_dev->dev, "arm-ffa-%04x", vm_id);
+       dev_set_name(&ffa_dev->dev, "arm-ffa-%d", id);
 
        ffa_dev->vm_id = vm_id;
        ffa_dev->ops = ops;
@@ -217,4 +227,5 @@ void arm_ffa_bus_exit(void)
 {
        ffa_devices_unregister();
        bus_unregister(&ffa_bus_type);
+       ida_destroy(&ffa_bus_id);
 }
index fa85c64..2109cd1 100644 (file)
@@ -193,7 +193,8 @@ __ffa_partition_info_get(u32 uuid0, u32 uuid1, u32 uuid2, u32 uuid3,
        int idx, count, flags = 0, sz, buf_sz;
        ffa_value_t partition_info;
 
-       if (!buffer || !num_partitions) /* Just get the count for now */
+       if (drv_info->version > FFA_VERSION_1_0 &&
+           (!buffer || !num_partitions)) /* Just get the count for now */
                flags = PARTITION_INFO_GET_RETURN_COUNT_ONLY;
 
        mutex_lock(&drv_info->rx_lock);
@@ -420,12 +421,18 @@ ffa_setup_and_transmit(u32 func_id, void *buffer, u32 max_fragsize,
                ep_mem_access->receiver = args->attrs[idx].receiver;
                ep_mem_access->attrs = args->attrs[idx].attrs;
                ep_mem_access->composite_off = COMPOSITE_OFFSET(args->nattrs);
+               ep_mem_access->flag = 0;
+               ep_mem_access->reserved = 0;
        }
+       mem_region->handle = 0;
+       mem_region->reserved_0 = 0;
+       mem_region->reserved_1 = 0;
        mem_region->ep_count = args->nattrs;
 
        composite = buffer + COMPOSITE_OFFSET(args->nattrs);
        composite->total_pg_cnt = ffa_get_num_pages_sg(args->sg);
        composite->addr_range_cnt = num_entries;
+       composite->reserved = 0;
 
        length = COMPOSITE_CONSTITUENTS_OFFSET(args->nattrs, num_entries);
        frag_len = COMPOSITE_CONSTITUENTS_OFFSET(args->nattrs, 0);
@@ -460,6 +467,7 @@ ffa_setup_and_transmit(u32 func_id, void *buffer, u32 max_fragsize,
 
                constituents->address = sg_phys(args->sg);
                constituents->pg_cnt = args->sg->length / FFA_PAGE_SIZE;
+               constituents->reserved = 0;
                constituents++;
                frag_len += sizeof(struct ffa_mem_region_addr_range);
        } while ((args->sg = sg_next(args->sg)));
index d40df09..6971dcf 100644 (file)
@@ -1066,7 +1066,7 @@ static int scmi_xfer_raw_worker_init(struct scmi_raw_mode_info *raw)
 
        raw->wait_wq = alloc_workqueue("scmi-raw-wait-wq-%d",
                                       WQ_UNBOUND | WQ_FREEZABLE |
-                                      WQ_HIGHPRI, WQ_SYSFS, raw->id);
+                                      WQ_HIGHPRI | WQ_SYSFS, 0, raw->id);
        if (!raw->wait_wq)
                return -ENOMEM;
 
index e4ccfb6..ec056f6 100644 (file)
@@ -2124,6 +2124,7 @@ static int cs_dsp_load_coeff(struct cs_dsp *dsp, const struct firmware *firmware
                                   file, blocks, le32_to_cpu(blk->len),
                                   type, le32_to_cpu(blk->id));
 
+                       region_name = cs_dsp_mem_region_name(type);
                        mem = cs_dsp_find_region(dsp, type);
                        if (!mem) {
                                cs_dsp_err(dsp, "No base for region %x\n", type);
@@ -2147,8 +2148,8 @@ static int cs_dsp_load_coeff(struct cs_dsp *dsp, const struct firmware *firmware
                                reg = dsp->ops->region_to_reg(mem, reg);
                                reg += offset;
                        } else {
-                               cs_dsp_err(dsp, "No %x for algorithm %x\n",
-                                          type, le32_to_cpu(blk->id));
+                               cs_dsp_err(dsp, "No %s for algorithm %x\n",
+                                          region_name, le32_to_cpu(blk->id));
                        }
                        break;
 
index 043ca31..231f1c7 100644 (file)
@@ -269,6 +269,20 @@ config EFI_COCO_SECRET
          virt/coco/efi_secret module to access the secrets, which in turn
          allows userspace programs to access the injected secrets.
 
+config UNACCEPTED_MEMORY
+       bool
+       depends on EFI_STUB
+       help
+          Some Virtual Machine platforms, such as Intel TDX, require
+          some memory to be "accepted" by the guest before it can be used.
+          This mechanism helps prevent malicious hosts from making changes
+          to guest memory.
+
+          UEFI specification v2.9 introduced the EFI_UNACCEPTED_MEMORY memory type.
+
+          This option adds support for unaccepted memory and makes such memory
+          usable by the kernel.
+
 config EFI_EMBEDDED_FIRMWARE
        bool
        select CRYPTO_LIB_SHA256
index b51f2a4..e489fef 100644 (file)
@@ -41,3 +41,4 @@ obj-$(CONFIG_EFI_CAPSULE_LOADER)      += capsule-loader.o
 obj-$(CONFIG_EFI_EARLYCON)             += earlycon.o
 obj-$(CONFIG_UEFI_CPER_ARM)            += cper-arm.o
 obj-$(CONFIG_UEFI_CPER_X86)            += cper-x86.o
+obj-$(CONFIG_UNACCEPTED_MEMORY)                += unaccepted_memory.o
index abeff7d..3a6ee7b 100644 (file)
@@ -50,6 +50,9 @@ struct efi __read_mostly efi = {
 #ifdef CONFIG_EFI_COCO_SECRET
        .coco_secret            = EFI_INVALID_TABLE_ADDR,
 #endif
+#ifdef CONFIG_UNACCEPTED_MEMORY
+       .unaccepted             = EFI_INVALID_TABLE_ADDR,
+#endif
 };
 EXPORT_SYMBOL(efi);
 
@@ -361,24 +364,6 @@ static void __init efi_debugfs_init(void)
 static inline void efi_debugfs_init(void) {}
 #endif
 
-static void refresh_nv_rng_seed(struct work_struct *work)
-{
-       u8 seed[EFI_RANDOM_SEED_SIZE];
-
-       get_random_bytes(seed, sizeof(seed));
-       efi.set_variable(L"RandomSeed", &LINUX_EFI_RANDOM_SEED_TABLE_GUID,
-                        EFI_VARIABLE_NON_VOLATILE | EFI_VARIABLE_BOOTSERVICE_ACCESS |
-                        EFI_VARIABLE_RUNTIME_ACCESS, sizeof(seed), seed);
-       memzero_explicit(seed, sizeof(seed));
-}
-static int refresh_nv_rng_seed_notification(struct notifier_block *nb, unsigned long action, void *data)
-{
-       static DECLARE_WORK(work, refresh_nv_rng_seed);
-       schedule_work(&work);
-       return NOTIFY_DONE;
-}
-static struct notifier_block refresh_nv_rng_seed_nb = { .notifier_call = refresh_nv_rng_seed_notification };
-
 /*
  * We register the efi subsystem with the firmware subsystem and the
  * efivars subsystem with the efi subsystem, if the system was booted with
@@ -451,9 +436,6 @@ static int __init efisubsys_init(void)
                platform_device_register_simple("efi_secret", 0, NULL, 0);
 #endif
 
-       if (efi_rt_services_supported(EFI_RT_SUPPORTED_SET_VARIABLE))
-               execute_with_initialized_rng(&refresh_nv_rng_seed_nb);
-
        return 0;
 
 err_remove_group:
@@ -605,6 +587,9 @@ static const efi_config_table_type_t common_tables[] __initconst = {
 #ifdef CONFIG_EFI_COCO_SECRET
        {LINUX_EFI_COCO_SECRET_AREA_GUID,       &efi.coco_secret,       "CocoSecret"    },
 #endif
+#ifdef CONFIG_UNACCEPTED_MEMORY
+       {LINUX_EFI_UNACCEPTED_MEM_TABLE_GUID,   &efi.unaccepted,        "Unaccepted"    },
+#endif
 #ifdef CONFIG_EFI_GENERIC_STUB
        {LINUX_EFI_SCREEN_INFO_TABLE_GUID,      &screen_info_table                      },
 #endif
@@ -759,6 +744,25 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
                }
        }
 
+       if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY) &&
+           efi.unaccepted != EFI_INVALID_TABLE_ADDR) {
+               struct efi_unaccepted_memory *unaccepted;
+
+               unaccepted = early_memremap(efi.unaccepted, sizeof(*unaccepted));
+               if (unaccepted) {
+                       unsigned long size;
+
+                       if (unaccepted->version == 1) {
+                               size = sizeof(*unaccepted) + unaccepted->size;
+                               memblock_reserve(efi.unaccepted, size);
+                       } else {
+                               efi.unaccepted = EFI_INVALID_TABLE_ADDR;
+                       }
+
+                       early_memunmap(unaccepted, sizeof(*unaccepted));
+               }
+       }
+
        return 0;
 }
 
@@ -843,6 +847,7 @@ static __initdata char memory_type_name[][13] = {
        "MMIO Port",
        "PAL Code",
        "Persistent",
+       "Unaccepted",
 };
 
 char * __init efi_md_typeattr_format(char *buf, size_t size,
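
The table parsing above consults the version, size and bitmap fields; a sketch of the assumed table layout follows (field widths are an assumption, not taken from this diff):

    /* Assumed layout of the EFI unaccepted memory table. */
    struct efi_unaccepted_memory {
            u32 version;            /* only version 1 is handled above */
            u32 unit_size;          /* bytes covered by one bitmap bit */
            u64 phys_base;          /* first address covered by the bitmap */
            u64 size;               /* bitmap size in bytes */
            unsigned long bitmap[]; /* one bit per unit_size chunk */
    };
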
index 3abb2b3..16d64a3 100644 (file)
@@ -96,6 +96,8 @@ CFLAGS_arm32-stub.o           := -DTEXT_OFFSET=$(TEXT_OFFSET)
 zboot-obj-$(CONFIG_RISCV)      := lib-clz_ctz.o lib-ashldi3.o
 lib-$(CONFIG_EFI_ZBOOT)                += zboot.o $(zboot-obj-y)
 
+lib-$(CONFIG_UNACCEPTED_MEMORY) += unaccepted_memory.o bitmap.o find.o
+
 extra-y                                := $(lib-y)
 lib-y                          := $(patsubst %.o,%.stub.o,$(lib-y))
 
index 89ef820..2c48962 100644 (file)
@@ -32,7 +32,8 @@ zboot-size-len-$(CONFIG_KERNEL_GZIP)   := 0
 $(obj)/vmlinuz: $(obj)/vmlinux.bin FORCE
        $(call if_changed,$(zboot-method-y))
 
-OBJCOPYFLAGS_vmlinuz.o := -I binary -O $(EFI_ZBOOT_BFD_TARGET) $(EFI_ZBOOT_OBJCOPY_FLAGS) \
+# avoid eager evaluation to prevent references to non-existent build artifacts
+OBJCOPYFLAGS_vmlinuz.o = -I binary -O $(EFI_ZBOOT_BFD_TARGET) $(EFI_ZBOOT_OBJCOPY_FLAGS) \
                          --rename-section .data=.gzdata,load,alloc,readonly,contents
 $(obj)/vmlinuz.o: $(obj)/vmlinuz FORCE
        $(call if_changed,objcopy)
diff --git a/drivers/firmware/efi/libstub/bitmap.c b/drivers/firmware/efi/libstub/bitmap.c
new file mode 100644 (file)
index 0000000..5c9bba0
--- /dev/null
@@ -0,0 +1,41 @@
+#include <linux/bitmap.h>
+
+void __bitmap_set(unsigned long *map, unsigned int start, int len)
+{
+       unsigned long *p = map + BIT_WORD(start);
+       const unsigned int size = start + len;
+       int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
+       unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
+
+       while (len - bits_to_set >= 0) {
+               *p |= mask_to_set;
+               len -= bits_to_set;
+               bits_to_set = BITS_PER_LONG;
+               mask_to_set = ~0UL;
+               p++;
+       }
+       if (len) {
+               mask_to_set &= BITMAP_LAST_WORD_MASK(size);
+               *p |= mask_to_set;
+       }
+}
+
+void __bitmap_clear(unsigned long *map, unsigned int start, int len)
+{
+       unsigned long *p = map + BIT_WORD(start);
+       const unsigned int size = start + len;
+       int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
+       unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
+
+       while (len - bits_to_clear >= 0) {
+               *p &= ~mask_to_clear;
+               len -= bits_to_clear;
+               bits_to_clear = BITS_PER_LONG;
+               mask_to_clear = ~0UL;
+               p++;
+       }
+       if (len) {
+               mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
+               *p &= ~mask_to_clear;
+       }
+}
index 67d5a20..6aa38a1 100644 (file)
@@ -1133,4 +1133,13 @@ const u8 *__efi_get_smbios_string(const struct efi_smbios_record *record,
 void efi_remap_image(unsigned long image_base, unsigned alloc_size,
                     unsigned long code_size);
 
+asmlinkage efi_status_t __efiapi
+efi_zboot_entry(efi_handle_t handle, efi_system_table_t *systab);
+
+efi_status_t allocate_unaccepted_bitmap(__u32 nr_desc,
+                                       struct efi_boot_memmap *map);
+void process_unaccepted_memory(u64 start, u64 end);
+void accept_memory(phys_addr_t start, phys_addr_t end);
+void arch_accept_memory(phys_addr_t start, phys_addr_t end);
+
 #endif
diff --git a/drivers/firmware/efi/libstub/find.c b/drivers/firmware/efi/libstub/find.c
new file mode 100644 (file)
index 0000000..4e7740d
--- /dev/null
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/bitmap.h>
+#include <linux/math.h>
+#include <linux/minmax.h>
+
+/*
+ * Common helper for find_next_bit() function family
+ * @FETCH: The expression that fetches and pre-processes each word of bitmap(s)
+ * @MUNGE: The expression that post-processes a word containing found bit (may be empty)
+ * @size: The bitmap size in bits
+ * @start: The bitnumber to start searching at
+ */
+#define FIND_NEXT_BIT(FETCH, MUNGE, size, start)                               \
+({                                                                             \
+       unsigned long mask, idx, tmp, sz = (size), __start = (start);           \
+                                                                               \
+       if (unlikely(__start >= sz))                                            \
+               goto out;                                                       \
+                                                                               \
+       mask = MUNGE(BITMAP_FIRST_WORD_MASK(__start));                          \
+       idx = __start / BITS_PER_LONG;                                          \
+                                                                               \
+       for (tmp = (FETCH) & mask; !tmp; tmp = (FETCH)) {                       \
+               if ((idx + 1) * BITS_PER_LONG >= sz)                            \
+                       goto out;                                               \
+               idx++;                                                          \
+       }                                                                       \
+                                                                               \
+       sz = min(idx * BITS_PER_LONG + __ffs(MUNGE(tmp)), sz);                  \
+out:                                                                           \
+       sz;                                                                     \
+})
+
+unsigned long _find_next_bit(const unsigned long *addr, unsigned long nbits, unsigned long start)
+{
+       return FIND_NEXT_BIT(addr[idx], /* nop */, nbits, start);
+}
+
+unsigned long _find_next_zero_bit(const unsigned long *addr, unsigned long nbits,
+                                        unsigned long start)
+{
+       return FIND_NEXT_BIT(~addr[idx], /* nop */, nbits, start);
+}
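
A small illustrative use of the helper above; the return values follow the implementation in this file:

    /* Illustrative only: _find_next_bit() returns the index of the next
     * set bit at or after @start, or @nbits when none is found.
     */
    static void find_next_bit_example(void)
    {
            unsigned long map[1] = { 0x28 };        /* bits 3 and 5 set */

            _find_next_bit(map, BITS_PER_LONG, 0);  /* returns 3 */
            _find_next_bit(map, BITS_PER_LONG, 4);  /* returns 5 */
            _find_next_bit(map, BITS_PER_LONG, 6);  /* returns BITS_PER_LONG */
    }
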
diff --git a/drivers/firmware/efi/libstub/unaccepted_memory.c b/drivers/firmware/efi/libstub/unaccepted_memory.c
new file mode 100644 (file)
index 0000000..ca61f47
--- /dev/null
@@ -0,0 +1,222 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/efi.h>
+#include <asm/efi.h>
+#include "efistub.h"
+
+struct efi_unaccepted_memory *unaccepted_table;
+
+efi_status_t allocate_unaccepted_bitmap(__u32 nr_desc,
+                                       struct efi_boot_memmap *map)
+{
+       efi_guid_t unaccepted_table_guid = LINUX_EFI_UNACCEPTED_MEM_TABLE_GUID;
+       u64 unaccepted_start = ULLONG_MAX, unaccepted_end = 0, bitmap_size;
+       efi_status_t status;
+       int i;
+
+       /* Check if the table is already installed */
+       unaccepted_table = get_efi_config_table(unaccepted_table_guid);
+       if (unaccepted_table) {
+               if (unaccepted_table->version != 1) {
+                       efi_err("Unknown version of unaccepted memory table\n");
+                       return EFI_UNSUPPORTED;
+               }
+               return EFI_SUCCESS;
+       }
+
+       /* Check if there's any unaccepted memory and find the max address */
+       for (i = 0; i < nr_desc; i++) {
+               efi_memory_desc_t *d;
+               unsigned long m = (unsigned long)map->map;
+
+               d = efi_early_memdesc_ptr(m, map->desc_size, i);
+               if (d->type != EFI_UNACCEPTED_MEMORY)
+                       continue;
+
+               unaccepted_start = min(unaccepted_start, d->phys_addr);
+               unaccepted_end = max(unaccepted_end,
+                                    d->phys_addr + d->num_pages * PAGE_SIZE);
+       }
+
+       if (unaccepted_start == ULLONG_MAX)
+               return EFI_SUCCESS;
+
+       unaccepted_start = round_down(unaccepted_start,
+                                     EFI_UNACCEPTED_UNIT_SIZE);
+       unaccepted_end = round_up(unaccepted_end, EFI_UNACCEPTED_UNIT_SIZE);
+
+       /*
+        * If unaccepted memory is present, allocate a bitmap to track what
+        * memory has to be accepted before access.
+        *
+        * One bit in the bitmap represents 2MiB in the address space:
+        * A 4k bitmap can track 64GiB of physical address space.
+        *
+        * In the worst case scenario -- a huge hole in the middle of the
+        * address space -- it needs 256MiB to handle 4PiB of the address
+        * space.
+        *
+        * The bitmap will be populated in setup_e820() according to the memory
+        * map after efi_exit_boot_services().
+        */
+       bitmap_size = DIV_ROUND_UP(unaccepted_end - unaccepted_start,
+                                  EFI_UNACCEPTED_UNIT_SIZE * BITS_PER_BYTE);
+
+       status = efi_bs_call(allocate_pool, EFI_LOADER_DATA,
+                            sizeof(*unaccepted_table) + bitmap_size,
+                            (void **)&unaccepted_table);
+       if (status != EFI_SUCCESS) {
+               efi_err("Failed to allocate unaccepted memory config table\n");
+               return status;
+       }
+
+       unaccepted_table->version = 1;
+       unaccepted_table->unit_size = EFI_UNACCEPTED_UNIT_SIZE;
+       unaccepted_table->phys_base = unaccepted_start;
+       unaccepted_table->size = bitmap_size;
+       memset(unaccepted_table->bitmap, 0, bitmap_size);
+
+       status = efi_bs_call(install_configuration_table,
+                            &unaccepted_table_guid, unaccepted_table);
+       if (status != EFI_SUCCESS) {
+               efi_bs_call(free_pool, unaccepted_table);
+               efi_err("Failed to install unaccepted memory config table!\n");
+       }
+
+       return status;
+}
+
+/*
+ * The accepted memory bitmap only works at unit_size granularity.  Take
+ * unaligned start/end addresses and either:
+ *  1. Accept the memory immediately and in its entirety, or
+ *  2. Accept the unaligned parts and mark *some* aligned part unaccepted.
+ *
+ * The function will never reach the bitmap_set() with zero bits to set.
+ */
+void process_unaccepted_memory(u64 start, u64 end)
+{
+       u64 unit_size = unaccepted_table->unit_size;
+       u64 unit_mask = unaccepted_table->unit_size - 1;
+       u64 bitmap_size = unaccepted_table->size;
+
+       /*
+        * Ensure that at least one bit will be set in the bitmap by
+        * immediately accepting all regions under 2*unit_size.  This is
+        * imprecise and may immediately accept some areas that could
+        * have been represented in the bitmap, but it results in simpler
+        * code below.
+        *
+        * Consider case like this (assuming unit_size == 2MB):
+        *
+        * | 4k | 2044k |    2048k   |
+        * ^ 0x0        ^ 2MB        ^ 4MB
+        *
+        * Only the first 4k has been accepted. The 0MB->2MB region cannot be
+        * represented in the bitmap. The 2MB->4MB region can be represented in
+        * the bitmap. But, the 0MB->4MB region is <2*unit_size and will be
+        * immediately accepted in its entirety.
+        */
+       if (end - start < 2 * unit_size) {
+               arch_accept_memory(start, end);
+               return;
+       }
+
+       /*
+        * No matter how the start and end are aligned, at least one unaccepted
+        * unit_size area will remain to be marked in the bitmap.
+        */
+
+       /* Immediately accept a <unit_size piece at the start: */
+       if (start & unit_mask) {
+               arch_accept_memory(start, round_up(start, unit_size));
+               start = round_up(start, unit_size);
+       }
+
+       /* Immediately accept a <unit_size piece at the end: */
+       if (end & unit_mask) {
+               arch_accept_memory(round_down(end, unit_size), end);
+               end = round_down(end, unit_size);
+       }
+
+       /*
+        * Accept the part of the range that is before phys_base and cannot be recorded
+        * into the bitmap.
+        */
+       if (start < unaccepted_table->phys_base) {
+               arch_accept_memory(start,
+                                  min(unaccepted_table->phys_base, end));
+               start = unaccepted_table->phys_base;
+       }
+
+       /* Nothing to record */
+       if (end < unaccepted_table->phys_base)
+               return;
+
+       /* Translate to offsets from the beginning of the bitmap */
+       start -= unaccepted_table->phys_base;
+       end -= unaccepted_table->phys_base;
+
+       /* Accept memory that doesn't fit into bitmap */
+       if (end > bitmap_size * unit_size * BITS_PER_BYTE) {
+               unsigned long phys_start, phys_end;
+
+               phys_start = bitmap_size * unit_size * BITS_PER_BYTE +
+                            unaccepted_table->phys_base;
+               phys_end = end + unaccepted_table->phys_base;
+
+               arch_accept_memory(phys_start, phys_end);
+               end = bitmap_size * unit_size * BITS_PER_BYTE;
+       }
+
+       /*
+        * 'start' and 'end' are now both unit_size-aligned.
+        * Record the range as being unaccepted:
+        */
+       bitmap_set(unaccepted_table->bitmap,
+                  start / unit_size, (end - start) / unit_size);
+}
+
+void accept_memory(phys_addr_t start, phys_addr_t end)
+{
+       unsigned long range_start, range_end;
+       unsigned long bitmap_size;
+       u64 unit_size;
+
+       if (!unaccepted_table)
+               return;
+
+       unit_size = unaccepted_table->unit_size;
+
+       /*
+        * Only care for the part of the range that is represented
+        * in the bitmap.
+        */
+       if (start < unaccepted_table->phys_base)
+               start = unaccepted_table->phys_base;
+       if (end < unaccepted_table->phys_base)
+               return;
+
+       /* Translate to offsets from the beginning of the bitmap */
+       start -= unaccepted_table->phys_base;
+       end -= unaccepted_table->phys_base;
+
+       /* Make sure not to overrun the bitmap */
+       if (end > unaccepted_table->size * unit_size * BITS_PER_BYTE)
+               end = unaccepted_table->size * unit_size * BITS_PER_BYTE;
+
+       range_start = start / unit_size;
+       bitmap_size = DIV_ROUND_UP(end, unit_size);
+
+       for_each_set_bitrange_from(range_start, range_end,
+                                  unaccepted_table->bitmap, bitmap_size) {
+               unsigned long phys_start, phys_end;
+
+               phys_start = range_start * unit_size + unaccepted_table->phys_base;
+               phys_end = range_end * unit_size + unaccepted_table->phys_base;
+
+               arch_accept_memory(phys_start, phys_end);
+               bitmap_clear(unaccepted_table->bitmap,
+                            range_start, range_end - range_start);
+       }
+}
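
To make the sizing comment in allocate_unaccepted_bitmap() concrete, a worked sketch of the arithmetic, assuming the 2MiB unit size used in that comment:

    /* Illustrative only: one bitmap byte covers unit_size * 8 of address space. */
    #define EXAMPLE_UNIT_SIZE       (2ULL << 20)    /* 2MiB per bit */

    static u64 example_bitmap_bytes(u64 range)
    {
            return DIV_ROUND_UP(range, EXAMPLE_UNIT_SIZE * BITS_PER_BYTE);
    }
    /* example_bitmap_bytes(64ULL << 30) == 4096: a 4k bitmap tracks 64GiB     */
    /* example_bitmap_bytes(4ULL << 50)  == SZ_256M: 256MiB of bitmap for 4PiB */
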
index a0bfd31..220be75 100644 (file)
@@ -26,6 +26,17 @@ const efi_dxe_services_table_t *efi_dxe_table;
 u32 image_offset __section(".data");
 static efi_loaded_image_t *image = NULL;
 
+typedef union sev_memory_acceptance_protocol sev_memory_acceptance_protocol_t;
+union sev_memory_acceptance_protocol {
+       struct {
+               efi_status_t (__efiapi * allow_unaccepted_memory)(
+                       sev_memory_acceptance_protocol_t *);
+       };
+       struct {
+               u32 allow_unaccepted_memory;
+       } mixed_mode;
+};
+
 static efi_status_t
 preserve_pci_rom_image(efi_pci_io_protocol_t *pci, struct pci_setup_rom **__rom)
 {
@@ -310,6 +321,29 @@ setup_memory_protection(unsigned long image_base, unsigned long image_size)
 #endif
 }
 
+static void setup_unaccepted_memory(void)
+{
+       efi_guid_t mem_acceptance_proto = OVMF_SEV_MEMORY_ACCEPTANCE_PROTOCOL_GUID;
+       sev_memory_acceptance_protocol_t *proto;
+       efi_status_t status;
+
+       if (!IS_ENABLED(CONFIG_UNACCEPTED_MEMORY))
+               return;
+
+       /*
+        * Enable unaccepted memory before calling exit boot services so that
+        * the firmware does not accept all memory on EBS.
+        */
+       status = efi_bs_call(locate_protocol, &mem_acceptance_proto, NULL,
+                            (void **)&proto);
+       if (status != EFI_SUCCESS)
+               return;
+
+       status = efi_call_proto(proto, allow_unaccepted_memory);
+       if (status != EFI_SUCCESS)
+               efi_err("Memory acceptance protocol failed\n");
+}
+
 static const efi_char16_t apple[] = L"Apple";
 
 static void setup_quirks(struct boot_params *boot_params,
@@ -613,6 +647,16 @@ setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_s
                        e820_type = E820_TYPE_PMEM;
                        break;
 
+               case EFI_UNACCEPTED_MEMORY:
+                       if (!IS_ENABLED(CONFIG_UNACCEPTED_MEMORY)) {
+                               efi_warn_once(
+"The system has unaccepted memory,  but kernel does not support it\nConsider enabling CONFIG_UNACCEPTED_MEMORY\n");
+                               continue;
+                       }
+                       e820_type = E820_TYPE_RAM;
+                       process_unaccepted_memory(d->phys_addr,
+                                                 d->phys_addr + PAGE_SIZE * d->num_pages);
+                       break;
                default:
                        continue;
                }
@@ -681,28 +725,27 @@ static efi_status_t allocate_e820(struct boot_params *params,
                                  struct setup_data **e820ext,
                                  u32 *e820ext_size)
 {
-       unsigned long map_size, desc_size, map_key;
+       struct efi_boot_memmap *map;
        efi_status_t status;
-       __u32 nr_desc, desc_version;
-
-       /* Only need the size of the mem map and size of each mem descriptor */
-       map_size = 0;
-       status = efi_bs_call(get_memory_map, &map_size, NULL, &map_key,
-                            &desc_size, &desc_version);
-       if (status != EFI_BUFFER_TOO_SMALL)
-               return (status != EFI_SUCCESS) ? status : EFI_UNSUPPORTED;
+       __u32 nr_desc;
 
-       nr_desc = map_size / desc_size + EFI_MMAP_NR_SLACK_SLOTS;
+       status = efi_get_memory_map(&map, false);
+       if (status != EFI_SUCCESS)
+               return status;
 
-       if (nr_desc > ARRAY_SIZE(params->e820_table)) {
-               u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table);
+       nr_desc = map->map_size / map->desc_size;
+       if (nr_desc > ARRAY_SIZE(params->e820_table) - EFI_MMAP_NR_SLACK_SLOTS) {
+               u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table) +
+                                EFI_MMAP_NR_SLACK_SLOTS;
 
                status = alloc_e820ext(nr_e820ext, e820ext, e820ext_size);
-               if (status != EFI_SUCCESS)
-                       return status;
        }
 
-       return EFI_SUCCESS;
+       if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY) && status == EFI_SUCCESS)
+               status = allocate_unaccepted_bitmap(nr_desc, map);
+
+       efi_bs_call(free_pool, map);
+       return status;
 }
 
 struct exit_boot_struct {
@@ -899,6 +942,8 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
 
        setup_quirks(boot_params, bzimage_addr, buffer_end - buffer_start);
 
+       setup_unaccepted_memory();
+
        status = exit_boot(boot_params, handle);
        if (status != EFI_SUCCESS) {
                efi_err("exit_boot() failed!\n");
diff --git a/drivers/firmware/efi/unaccepted_memory.c b/drivers/firmware/efi/unaccepted_memory.c
new file mode 100644 (file)
index 0000000..853f7dc
--- /dev/null
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/efi.h>
+#include <linux/memblock.h>
+#include <linux/spinlock.h>
+#include <asm/unaccepted_memory.h>
+
+/* Protects unaccepted memory bitmap */
+static DEFINE_SPINLOCK(unaccepted_memory_lock);
+
+/*
+ * accept_memory() -- Consult bitmap and accept the memory if needed.
+ *
+ * Only memory that is explicitly marked as unaccepted in the bitmap requires
+ * an action. All the remaining memory is implicitly accepted and doesn't need
+ * acceptance.
+ *
+ * No need to accept:
+ *  - anything if the system has no unaccepted table;
+ *  - memory that is below phys_base;
+ *  - memory that is above the range addressable by the bitmap.
+ */
+void accept_memory(phys_addr_t start, phys_addr_t end)
+{
+       struct efi_unaccepted_memory *unaccepted;
+       unsigned long range_start, range_end;
+       unsigned long flags;
+       u64 unit_size;
+
+       unaccepted = efi_get_unaccepted_table();
+       if (!unaccepted)
+               return;
+
+       unit_size = unaccepted->unit_size;
+
+       /*
+        * Only care for the part of the range that is represented
+        * in the bitmap.
+        */
+       if (start < unaccepted->phys_base)
+               start = unaccepted->phys_base;
+       if (end < unaccepted->phys_base)
+               return;
+
+       /* Translate to offsets from the beginning of the bitmap */
+       start -= unaccepted->phys_base;
+       end -= unaccepted->phys_base;
+
+       /*
+        * load_unaligned_zeropad() can lead to unwanted loads across page
+        * boundaries. The unwanted loads are typically harmless. But, they
+        * might be made to totally unrelated or even unmapped memory.
+        * load_unaligned_zeropad() relies on exception fixup (#PF, #GP and now
+        * #VE) to recover from these unwanted loads.
+        *
+        * But, this approach does not work for unaccepted memory. For TDX, a
+        * load from unaccepted memory will not lead to a recoverable exception
+        * within the guest. The guest will exit to the VMM where the only
+        * recourse is to terminate the guest.
+        *
+        * There are two parts to fix this issue and comprehensively avoid
+        * access to unaccepted memory. Together these ensure that an extra
+        * "guard" page is accepted in addition to the memory that needs to be
+        * used:
+        *
+        * 1. Implicitly extend the range_contains_unaccepted_memory(start, end)
+        *    checks up to end+unit_size if 'end' is aligned on a unit_size
+        *    boundary.
+        *
+        * 2. Implicitly extend accept_memory(start, end) to end+unit_size if
+        *    'end' is aligned on a unit_size boundary. (immediately following
+        *    this comment)
+        */
+       if (!(end % unit_size))
+               end += unit_size;
+
+       /* Make sure not to overrun the bitmap */
+       if (end > unaccepted->size * unit_size * BITS_PER_BYTE)
+               end = unaccepted->size * unit_size * BITS_PER_BYTE;
+
+       range_start = start / unit_size;
+
+       spin_lock_irqsave(&unaccepted_memory_lock, flags);
+       for_each_set_bitrange_from(range_start, range_end, unaccepted->bitmap,
+                                  DIV_ROUND_UP(end, unit_size)) {
+               unsigned long phys_start, phys_end;
+               unsigned long len = range_end - range_start;
+
+               phys_start = range_start * unit_size + unaccepted->phys_base;
+               phys_end = range_end * unit_size + unaccepted->phys_base;
+
+               arch_accept_memory(phys_start, phys_end);
+               bitmap_clear(unaccepted->bitmap, range_start, len);
+       }
+       spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
+}
+
+bool range_contains_unaccepted_memory(phys_addr_t start, phys_addr_t end)
+{
+       struct efi_unaccepted_memory *unaccepted;
+       unsigned long flags;
+       bool ret = false;
+       u64 unit_size;
+
+       unaccepted = efi_get_unaccepted_table();
+       if (!unaccepted)
+               return false;
+
+       unit_size = unaccepted->unit_size;
+
+       /*
+        * Only care for the part of the range that is represented
+        * in the bitmap.
+        */
+       if (start < unaccepted->phys_base)
+               start = unaccepted->phys_base;
+       if (end < unaccepted->phys_base)
+               return false;
+
+       /* Translate to offsets from the beginning of the bitmap */
+       start -= unaccepted->phys_base;
+       end -= unaccepted->phys_base;
+
+       /*
+        * Also consider the unaccepted state of the *next* page. See fix #1 in
+        * the comment on load_unaligned_zeropad() in accept_memory().
+        */
+       if (!(end % unit_size))
+               end += unit_size;
+
+       /* Make sure not to overrun the bitmap */
+       if (end > unaccepted->size * unit_size * BITS_PER_BYTE)
+               end = unaccepted->size * unit_size * BITS_PER_BYTE;
+
+       spin_lock_irqsave(&unaccepted_memory_lock, flags);
+       while (start < end) {
+               if (test_bit(start / unit_size, unaccepted->bitmap)) {
+                       ret = true;
+                       break;
+               }
+
+               start += unit_size;
+       }
+       spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
+
+       return ret;
+}
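
A short illustration of the guard-unit extension above, assuming unit_size == 2MiB, phys_base == 0 and the whole range still marked unaccepted in the bitmap:

    static void guard_unit_example(void)
    {
            /* An exactly aligned request also accepts the following unit, so
             * load_unaligned_zeropad() cannot stray into unaccepted memory.
             */
            accept_memory(0, 4 * SZ_1M);
            /* units [0, 2M), [2M, 4M) and the guard unit [4M, 6M) are accepted */
    }
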
index 94b49cc..71f5130 100644 (file)
@@ -42,8 +42,6 @@ static const struct {
 };
 
 #define IBFT_SIGN_LEN 4
-#define IBFT_START 0x80000 /* 512kB */
-#define IBFT_END 0x100000 /* 1MB */
 #define VGA_MEM 0xA0000 /* VGA buffer */
 #define VGA_SIZE 0x20000 /* 128kB */
 
@@ -52,9 +50,9 @@ static const struct {
  */
 void __init reserve_ibft_region(void)
 {
-       unsigned long pos;
+       unsigned long pos, virt_pos = 0;
        unsigned int len = 0;
-       void *virt;
+       void *virt = NULL;
        int i;
 
        ibft_phys_addr = 0;
@@ -70,13 +68,20 @@ void __init reserve_ibft_region(void)
                 * so skip that area */
                if (pos == VGA_MEM)
                        pos += VGA_SIZE;
-               virt = isa_bus_to_virt(pos);
+
+               /* Map page by page */
+               if (offset_in_page(pos) == 0) {
+                       if (virt)
+                               early_memunmap(virt, PAGE_SIZE);
+                       virt = early_memremap_ro(pos, PAGE_SIZE);
+                       virt_pos = pos;
+               }
 
                for (i = 0; i < ARRAY_SIZE(ibft_signs); i++) {
-                       if (memcmp(virt, ibft_signs[i].sign, IBFT_SIGN_LEN) ==
-                           0) {
+                       if (memcmp(virt + (pos - virt_pos), ibft_signs[i].sign,
+                                  IBFT_SIGN_LEN) == 0) {
                                unsigned long *addr =
-                                   (unsigned long *)isa_bus_to_virt(pos + 4);
+                                   (unsigned long *)(virt + pos - virt_pos + 4);
                                len = *addr;
                                /* if the length of the table extends past 1M,
                                 * the table cannot be valid. */
@@ -84,9 +89,12 @@ void __init reserve_ibft_region(void)
                                        ibft_phys_addr = pos;
                                        memblock_reserve(ibft_phys_addr, PAGE_ALIGN(len));
                                        pr_info("iBFT found at %pa.\n", &ibft_phys_addr);
-                                       return;
+                                       goto out;
                                }
                        }
                }
        }
+
+out:
+       early_memunmap(virt, PAGE_SIZE);
 }
index 82c64cb..74363ed 100644 (file)
@@ -51,7 +51,8 @@ __init bool sysfb_parse_mode(const struct screen_info *si,
         *
         * It's not easily possible to fix this in struct screen_info,
         * as this could break UAPI. The best solution is to compute
-        * bits_per_pixel here and ignore lfb_depth. In the loop below,
+        * bits_per_pixel from the color bits, reserved bits and
+        * reported lfb_depth, whichever is highest.  In the loop below,
         * ignore simplefb formats with alpha bits, as EFI and VESA
         * don't specify alpha channels.
         */
@@ -60,6 +61,7 @@ __init bool sysfb_parse_mode(const struct screen_info *si,
                                          si->green_size + si->green_pos,
                                          si->blue_size + si->blue_pos),
                                     si->rsvd_size + si->rsvd_pos);
+               bits_per_pixel = max_t(u32, bits_per_pixel, si->lfb_depth);
        } else {
                bits_per_pixel = si->lfb_depth;
        }
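
As an illustration of the computation above, with hypothetical XRGB8888 values as a firmware might report them:

    /* Illustrative only: firmware reports lfb_depth == 24 for an XRGB8888 mode. */
    struct screen_info example = {
            .blue_pos  = 0,  .blue_size  = 8,
            .green_pos = 8,  .green_size = 8,
            .red_pos   = 16, .red_size   = 8,
            .rsvd_pos  = 24, .rsvd_size  = 8,
            .lfb_depth = 24,
    };
    /* colour channels end at bit 24, reserved bits end at bit 32, so the
     * computed bits_per_pixel is max(32, 24) == 32 rather than the reported 24
     */
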
index 5521f06..f45c6a3 100644 (file)
@@ -897,7 +897,7 @@ config GPIO_F7188X
        help
          This option enables support for GPIOs found on Fintek Super-I/O
          chips F71869, F71869A, F71882FG, F71889F and F81866.
-         As well as Nuvoton Super-I/O chip NCT6116D.
+         As well as Nuvoton Super-I/O chip NCT6126D.
 
          To compile this driver as a module, choose M here: the module will
          be called f7188x-gpio.
index 9effa77..f54ca5a 100644 (file)
@@ -48,7 +48,7 @@
 /*
  * Nuvoton devices.
  */
-#define SIO_NCT6116D_ID                0xD283  /* NCT6116D chipset ID */
+#define SIO_NCT6126D_ID                0xD283  /* NCT6126D chipset ID */
 
 #define SIO_LD_GPIO_NUVOTON    0x07    /* GPIO logical device */
 
@@ -62,7 +62,7 @@ enum chips {
        f81866,
        f81804,
        f81865,
-       nct6116d,
+       nct6126d,
 };
 
 static const char * const f7188x_names[] = {
@@ -74,7 +74,7 @@ static const char * const f7188x_names[] = {
        "f81866",
        "f81804",
        "f81865",
-       "nct6116d",
+       "nct6126d",
 };
 
 struct f7188x_sio {
@@ -187,8 +187,8 @@ static int f7188x_gpio_set_config(struct gpio_chip *chip, unsigned offset,
 /* Output mode register (0:open drain 1:push-pull). */
 #define f7188x_gpio_out_mode(base) ((base) + 3)
 
-#define f7188x_gpio_dir_invert(type)   ((type) == nct6116d)
-#define f7188x_gpio_data_single(type)  ((type) == nct6116d)
+#define f7188x_gpio_dir_invert(type)   ((type) == nct6126d)
+#define f7188x_gpio_data_single(type)  ((type) == nct6126d)
 
 static struct f7188x_gpio_bank f71869_gpio_bank[] = {
        F7188X_GPIO_BANK(0, 6, 0xF0, DRVNAME "-0"),
@@ -274,7 +274,7 @@ static struct f7188x_gpio_bank f81865_gpio_bank[] = {
        F7188X_GPIO_BANK(60, 5, 0x90, DRVNAME "-6"),
 };
 
-static struct f7188x_gpio_bank nct6116d_gpio_bank[] = {
+static struct f7188x_gpio_bank nct6126d_gpio_bank[] = {
        F7188X_GPIO_BANK(0, 8, 0xE0, DRVNAME "-0"),
        F7188X_GPIO_BANK(10, 8, 0xE4, DRVNAME "-1"),
        F7188X_GPIO_BANK(20, 8, 0xE8, DRVNAME "-2"),
@@ -282,7 +282,7 @@ static struct f7188x_gpio_bank nct6116d_gpio_bank[] = {
        F7188X_GPIO_BANK(40, 8, 0xF0, DRVNAME "-4"),
        F7188X_GPIO_BANK(50, 8, 0xF4, DRVNAME "-5"),
        F7188X_GPIO_BANK(60, 8, 0xF8, DRVNAME "-6"),
-       F7188X_GPIO_BANK(70, 1, 0xFC, DRVNAME "-7"),
+       F7188X_GPIO_BANK(70, 8, 0xFC, DRVNAME "-7"),
 };
 
 static int f7188x_gpio_get_direction(struct gpio_chip *chip, unsigned offset)
@@ -490,9 +490,9 @@ static int f7188x_gpio_probe(struct platform_device *pdev)
                data->nr_bank = ARRAY_SIZE(f81865_gpio_bank);
                data->bank = f81865_gpio_bank;
                break;
-       case nct6116d:
-               data->nr_bank = ARRAY_SIZE(nct6116d_gpio_bank);
-               data->bank = nct6116d_gpio_bank;
+       case nct6126d:
+               data->nr_bank = ARRAY_SIZE(nct6126d_gpio_bank);
+               data->bank = nct6126d_gpio_bank;
                break;
        default:
                return -ENODEV;
@@ -559,9 +559,9 @@ static int __init f7188x_find(int addr, struct f7188x_sio *sio)
        case SIO_F81865_ID:
                sio->type = f81865;
                break;
-       case SIO_NCT6116D_ID:
+       case SIO_NCT6126D_ID:
                sio->device = SIO_LD_GPIO_NUVOTON;
-               sio->type = nct6116d;
+               sio->type = nct6126d;
                break;
        default:
                pr_info("Unsupported Fintek device 0x%04x\n", devid);
@@ -569,7 +569,7 @@ static int __init f7188x_find(int addr, struct f7188x_sio *sio)
        }
 
        /* double check manufacturer where possible */
-       if (sio->type != nct6116d) {
+       if (sio->type != nct6126d) {
                manid = superio_inw(addr, SIO_FINTEK_MANID);
                if (manid != SIO_FINTEK_ID) {
                        pr_debug("Not a Fintek device at 0x%08x\n", addr);
@@ -581,7 +581,7 @@ static int __init f7188x_find(int addr, struct f7188x_sio *sio)
        err = 0;
 
        pr_info("Found %s at %#x\n", f7188x_names[sio->type], (unsigned int)addr);
-       if (sio->type != nct6116d)
+       if (sio->type != nct6126d)
                pr_info("   revision %d\n", superio_inb(addr, SIO_FINTEK_DEVREV));
 
 err:
index e6a7049..b32063a 100644 (file)
@@ -369,7 +369,7 @@ static void gpio_mockup_debugfs_setup(struct device *dev,
                priv->offset = i;
                priv->desc = gpiochip_get_desc(gc, i);
 
-               debugfs_create_file(name, 0200, chip->dbg_dir, priv,
+               debugfs_create_file(name, 0600, chip->dbg_dir, priv,
                                    &gpio_mockup_debugfs_ops);
        }
 }
index 98939cd..745e5f6 100644 (file)
@@ -221,8 +221,12 @@ static int sifive_gpio_probe(struct platform_device *pdev)
                return -ENODEV;
        }
 
-       for (i = 0; i < ngpio; i++)
-               chip->irq_number[i] = platform_get_irq(pdev, i);
+       for (i = 0; i < ngpio; i++) {
+               ret = platform_get_irq(pdev, i);
+               if (ret < 0)
+                       return ret;
+               chip->irq_number[i] = ret;
+       }
 
        ret = bgpio_init(&chip->gc, dev, 4,
                         chip->base + SIFIVE_GPIO_INPUT_VAL,
index a1c8702..8b49b0a 100644 (file)
@@ -696,6 +696,9 @@ static char **gpio_sim_make_line_names(struct gpio_sim_bank *bank,
        char **line_names;
 
        list_for_each_entry(line, &bank->line_list, siblings) {
+               if (line->offset >= bank->num_lines)
+                       continue;
+
                if (line->name) {
                        if (line->offset > max_offset)
                                max_offset = line->offset;
@@ -721,8 +724,13 @@ static char **gpio_sim_make_line_names(struct gpio_sim_bank *bank,
        if (!line_names)
                return ERR_PTR(-ENOMEM);
 
-       list_for_each_entry(line, &bank->line_list, siblings)
-               line_names[line->offset] = line->name;
+       list_for_each_entry(line, &bank->line_list, siblings) {
+               if (line->offset >= bank->num_lines)
+                       continue;
+
+               if (line->name && (line->offset <= max_offset))
+                       line_names[line->offset] = line->name;
+       }
 
        return line_names;
 }
@@ -754,6 +762,9 @@ static int gpio_sim_add_hogs(struct gpio_sim_device *dev)
 
        list_for_each_entry(bank, &dev->bank_list, siblings) {
                list_for_each_entry(line, &bank->line_list, siblings) {
+                       if (line->offset >= bank->num_lines)
+                               continue;
+
                        if (line->hog)
                                num_hogs++;
                }
@@ -769,6 +780,9 @@ static int gpio_sim_add_hogs(struct gpio_sim_device *dev)
 
        list_for_each_entry(bank, &dev->bank_list, siblings) {
                list_for_each_entry(line, &bank->line_list, siblings) {
+                       if (line->offset >= bank->num_lines)
+                               continue;
+
                        if (!line->hog)
                                continue;
 
index 04fb05d..5be8ad6 100644 (file)
@@ -209,6 +209,8 @@ static int gpiochip_find_base(int ngpio)
                        break;
                /* nope, check the space right after the chip */
                base = gdev->base + gdev->ngpio;
+               if (base < GPIO_DYNAMIC_BASE)
+                       base = GPIO_DYNAMIC_BASE;
        }
 
        if (gpio_is_valid(base)) {
@@ -1743,7 +1745,7 @@ static void gpiochip_irqchip_remove(struct gpio_chip *gc)
        }
 
        /* Remove all IRQ mappings and delete the domain */
-       if (gc->irq.domain) {
+       if (!gc->irq.domain_is_allocated_externally && gc->irq.domain) {
                unsigned int irq;
 
                for (offset = 0; offset < gc->ngpio; offset++) {
@@ -1789,6 +1791,15 @@ int gpiochip_irqchip_add_domain(struct gpio_chip *gc,
 
        gc->to_irq = gpiochip_to_irq;
        gc->irq.domain = domain;
+       gc->irq.domain_is_allocated_externally = true;
+
+       /*
+        * Using barrier() here to prevent compiler from reordering
+        * gc->irq.initialized before adding irqdomain.
+        */
+       barrier();
+
+       gc->irq.initialized = true;
 
        return 0;
 }
index aeeec21..fd6e837 100644 (file)
@@ -1092,16 +1092,20 @@ bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev)
         * S0ix even though the system is suspending to idle, so return false
         * in that case.
         */
-       if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0))
-               dev_warn_once(adev->dev,
+       if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)) {
+               dev_err_once(adev->dev,
                              "Power consumption will be higher as BIOS has not been configured for suspend-to-idle.\n"
                              "To use suspend-to-idle change the sleep mode in BIOS setup.\n");
+               return false;
+       }
 
 #if !IS_ENABLED(CONFIG_AMD_PMC)
-       dev_warn_once(adev->dev,
+       dev_err_once(adev->dev,
                      "Power consumption will be higher as the kernel has not been compiled with CONFIG_AMD_PMC.\n");
-#endif /* CONFIG_AMD_PMC */
+       return false;
+#else
        return true;
+#endif /* CONFIG_AMD_PMC */
 }
 
 #endif /* CONFIG_SUSPEND */
index 981a9cf..5c7d408 100644 (file)
@@ -3757,6 +3757,12 @@ int amdgpu_device_init(struct amdgpu_device *adev,
                adev->have_atomics_support = ((struct amd_sriov_msg_pf2vf_info *)
                        adev->virt.fw_reserve.p_pf2vf)->pcie_atomic_ops_support_flags ==
                        (PCI_EXP_DEVCAP2_ATOMIC_COMP32 | PCI_EXP_DEVCAP2_ATOMIC_COMP64);
+       /* APUs with gfx9 onwards don't rely on PCIe atomics; an internal
+        * path natively supports atomics, so set have_atomics_support to true.
+        */
+       else if ((adev->flags & AMD_IS_APU) &&
+               (adev->ip_versions[GC_HWIP][0] > IP_VERSION(9, 0, 0)))
+               adev->have_atomics_support = true;
        else
                adev->have_atomics_support =
                        !pci_enable_atomic_ops_to_root(adev->pdev,
@@ -4506,7 +4512,11 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
        dev_info(adev->dev, "recover vram bo from shadow start\n");
        mutex_lock(&adev->shadow_list_lock);
        list_for_each_entry(vmbo, &adev->shadow_list, shadow_list) {
-               shadow = &vmbo->bo;
+               /* If the VM is a compute context or the adev is an APU, the shadow will be NULL */
+               if (!vmbo->shadow)
+                       continue;
+               shadow = vmbo->shadow;
+
                /* No need to recover an evicted BO */
                if (shadow->tbo.resource->mem_type != TTM_PL_TT ||
                    shadow->tbo.resource->start == AMDGPU_BO_INVALID_OFFSET ||
index b1ca1ab..393b6fb 100644 (file)
@@ -1615,6 +1615,7 @@ static const u16 amdgpu_unsupported_pciidlist[] = {
        0x5874,
        0x5940,
        0x5941,
+       0x5b70,
        0x5b72,
        0x5b73,
        0x5b74,
index f52d0ba..a7d2508 100644 (file)
@@ -582,7 +582,8 @@ void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev)
                if (r)
                        amdgpu_fence_driver_force_completion(ring);
 
-               if (ring->fence_drv.irq_src)
+               if (!drm_dev_is_unplugged(adev_to_drm(adev)) &&
+                   ring->fence_drv.irq_src)
                        amdgpu_irq_put(adev, ring->fence_drv.irq_src,
                                       ring->fence_drv.irq_type);
 
index 9d3a054..f3f541b 100644 (file)
@@ -687,9 +687,11 @@ int amdgpu_gfx_ras_late_init(struct amdgpu_device *adev, struct ras_common_if *r
                if (r)
                        return r;
 
-               r = amdgpu_irq_get(adev, &adev->gfx.cp_ecc_error_irq, 0);
-               if (r)
-                       goto late_fini;
+               if (adev->gfx.cp_ecc_error_irq.funcs) {
+                       r = amdgpu_irq_get(adev, &adev->gfx.cp_ecc_error_irq, 0);
+                       if (r)
+                               goto late_fini;
+               }
        } else {
                amdgpu_ras_feature_enable_on_boot(adev, ras_block, 0);
        }
index 4e25317..95b0f98 100644 (file)
@@ -593,6 +593,8 @@ void amdgpu_gmc_tmz_set(struct amdgpu_device *adev)
        case IP_VERSION(9, 3, 0):
        /* GC 10.3.7 */
        case IP_VERSION(10, 3, 7):
+       /* GC 11.0.1 */
+       case IP_VERSION(11, 0, 1):
                if (amdgpu_tmz == 0) {
                        adev->gmc.tmz_enabled = false;
                        dev_info(adev->dev,
@@ -616,7 +618,6 @@ void amdgpu_gmc_tmz_set(struct amdgpu_device *adev)
        case IP_VERSION(10, 3, 1):
        /* YELLOW_CARP*/
        case IP_VERSION(10, 3, 3):
-       case IP_VERSION(11, 0, 1):
        case IP_VERSION(11, 0, 4):
                /* Don't enable it by default yet.
                 */
index b07c000..4fa019c 100644 (file)
@@ -241,6 +241,31 @@ int amdgpu_jpeg_process_poison_irq(struct amdgpu_device *adev,
        return 0;
 }
 
+int amdgpu_jpeg_ras_late_init(struct amdgpu_device *adev, struct ras_common_if *ras_block)
+{
+       int r, i;
+
+       r = amdgpu_ras_block_late_init(adev, ras_block);
+       if (r)
+               return r;
+
+       if (amdgpu_ras_is_supported(adev, ras_block->block)) {
+               for (i = 0; i < adev->jpeg.num_jpeg_inst; ++i) {
+                       if (adev->jpeg.harvest_config & (1 << i))
+                               continue;
+
+                       r = amdgpu_irq_get(adev, &adev->jpeg.inst[i].ras_poison_irq, 0);
+                       if (r)
+                               goto late_fini;
+               }
+       }
+       return 0;
+
+late_fini:
+       amdgpu_ras_block_late_fini(adev, ras_block);
+       return r;
+}
+
 int amdgpu_jpeg_ras_sw_init(struct amdgpu_device *adev)
 {
        int err;
@@ -262,7 +287,7 @@ int amdgpu_jpeg_ras_sw_init(struct amdgpu_device *adev)
        adev->jpeg.ras_if = &ras->ras_block.ras_comm;
 
        if (!ras->ras_block.ras_late_init)
-               ras->ras_block.ras_late_init = amdgpu_ras_block_late_init;
+               ras->ras_block.ras_late_init = amdgpu_jpeg_ras_late_init;
 
        return 0;
 }
index 0ca76f0..1471a1e 100644 (file)
@@ -38,6 +38,7 @@ struct amdgpu_jpeg_reg{
 struct amdgpu_jpeg_inst {
        struct amdgpu_ring ring_dec;
        struct amdgpu_irq_src irq;
+       struct amdgpu_irq_src ras_poison_irq;
        struct amdgpu_jpeg_reg external;
 };
 
@@ -72,6 +73,8 @@ int amdgpu_jpeg_dec_ring_test_ib(struct amdgpu_ring *ring, long timeout);
 int amdgpu_jpeg_process_poison_irq(struct amdgpu_device *adev,
                                struct amdgpu_irq_src *source,
                                struct amdgpu_iv_entry *entry);
+int amdgpu_jpeg_ras_late_init(struct amdgpu_device *adev,
+                               struct ras_common_if *ras_block);
 int amdgpu_jpeg_ras_sw_init(struct amdgpu_device *adev);
 
 #endif /*__AMDGPU_JPEG_H__*/
index 2bd1a54..a70103a 100644 (file)
@@ -79,9 +79,10 @@ static void amdgpu_bo_user_destroy(struct ttm_buffer_object *tbo)
 static void amdgpu_bo_vm_destroy(struct ttm_buffer_object *tbo)
 {
        struct amdgpu_device *adev = amdgpu_ttm_adev(tbo->bdev);
-       struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo);
+       struct amdgpu_bo *shadow_bo = ttm_to_amdgpu_bo(tbo), *bo;
        struct amdgpu_bo_vm *vmbo;
 
+       bo = shadow_bo->parent;
        vmbo = to_amdgpu_bo_vm(bo);
        /* in case amdgpu_device_recover_vram got NULL of bo->parent */
        if (!list_empty(&vmbo->shadow_list)) {
@@ -139,7 +140,7 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
 
                if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)
                        places[c].lpfn = visible_pfn;
-               else if (adev->gmc.real_vram_size != adev->gmc.visible_vram_size)
+               else
                        places[c].flags |= TTM_PL_FLAG_TOPDOWN;
 
                if (flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
@@ -694,11 +695,6 @@ int amdgpu_bo_create_vm(struct amdgpu_device *adev,
                return r;
 
        *vmbo_ptr = to_amdgpu_bo_vm(bo_ptr);
-       INIT_LIST_HEAD(&(*vmbo_ptr)->shadow_list);
-       /* Set destroy callback to amdgpu_bo_vm_destroy after vmbo->shadow_list
-        * is initialized.
-        */
-       bo_ptr->tbo.destroy = &amdgpu_bo_vm_destroy;
        return r;
 }
 
@@ -715,6 +711,8 @@ void amdgpu_bo_add_to_shadow_list(struct amdgpu_bo_vm *vmbo)
 
        mutex_lock(&adev->shadow_list_lock);
        list_add_tail(&vmbo->shadow_list, &adev->shadow_list);
+       vmbo->shadow->parent = amdgpu_bo_ref(&vmbo->bo);
+       vmbo->shadow->tbo.destroy = &amdgpu_bo_vm_destroy;
        mutex_unlock(&adev->shadow_list_lock);
 }
 
index 9d7e6e0..a150b7a 100644 (file)
@@ -3548,6 +3548,9 @@ static ssize_t amdgpu_psp_vbflash_read(struct file *filp, struct kobject *kobj,
        void *fw_pri_cpu_addr;
        int ret;
 
+       if (adev->psp.vbflash_image_size == 0)
+               return -EINVAL;
+
        dev_info(adev->dev, "VBIOS flash to PSP started");
 
        ret = amdgpu_bo_create_kernel(adev, adev->psp.vbflash_image_size,
@@ -3599,13 +3602,13 @@ static ssize_t amdgpu_psp_vbflash_status(struct device *dev,
 }
 
 static const struct bin_attribute psp_vbflash_bin_attr = {
-       .attr = {.name = "psp_vbflash", .mode = 0664},
+       .attr = {.name = "psp_vbflash", .mode = 0660},
        .size = 0,
        .write = amdgpu_psp_vbflash_write,
        .read = amdgpu_psp_vbflash_read,
 };
 
-static DEVICE_ATTR(psp_vbflash_status, 0444, amdgpu_psp_vbflash_status, NULL);
+static DEVICE_ATTR(psp_vbflash_status, 0440, amdgpu_psp_vbflash_status, NULL);
 
 int amdgpu_psp_sysfs_init(struct amdgpu_device *adev)
 {
index dc474b8..49de3a3 100644 (file)
@@ -581,3 +581,21 @@ void amdgpu_ring_ib_end(struct amdgpu_ring *ring)
        if (ring->is_sw_ring)
                amdgpu_sw_ring_ib_end(ring);
 }
+
+void amdgpu_ring_ib_on_emit_cntl(struct amdgpu_ring *ring)
+{
+       if (ring->is_sw_ring)
+               amdgpu_sw_ring_ib_mark_offset(ring, AMDGPU_MUX_OFFSET_TYPE_CONTROL);
+}
+
+void amdgpu_ring_ib_on_emit_ce(struct amdgpu_ring *ring)
+{
+       if (ring->is_sw_ring)
+               amdgpu_sw_ring_ib_mark_offset(ring, AMDGPU_MUX_OFFSET_TYPE_CE);
+}
+
+void amdgpu_ring_ib_on_emit_de(struct amdgpu_ring *ring)
+{
+       if (ring->is_sw_ring)
+               amdgpu_sw_ring_ib_mark_offset(ring, AMDGPU_MUX_OFFSET_TYPE_DE);
+}
index d874944..2474cb7 100644 (file)
@@ -227,6 +227,9 @@ struct amdgpu_ring_funcs {
        int (*preempt_ib)(struct amdgpu_ring *ring);
        void (*emit_mem_sync)(struct amdgpu_ring *ring);
        void (*emit_wave_limit)(struct amdgpu_ring *ring, bool enable);
+       void (*patch_cntl)(struct amdgpu_ring *ring, unsigned offset);
+       void (*patch_ce)(struct amdgpu_ring *ring, unsigned offset);
+       void (*patch_de)(struct amdgpu_ring *ring, unsigned offset);
 };
 
 struct amdgpu_ring {
@@ -318,10 +321,16 @@ struct amdgpu_ring {
 #define amdgpu_ring_init_cond_exec(r) (r)->funcs->init_cond_exec((r))
 #define amdgpu_ring_patch_cond_exec(r,o) (r)->funcs->patch_cond_exec((r),(o))
 #define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r)
+#define amdgpu_ring_patch_cntl(r, o) ((r)->funcs->patch_cntl((r), (o)))
+#define amdgpu_ring_patch_ce(r, o) ((r)->funcs->patch_ce((r), (o)))
+#define amdgpu_ring_patch_de(r, o) ((r)->funcs->patch_de((r), (o)))
 
 int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw);
 void amdgpu_ring_ib_begin(struct amdgpu_ring *ring);
 void amdgpu_ring_ib_end(struct amdgpu_ring *ring);
+void amdgpu_ring_ib_on_emit_cntl(struct amdgpu_ring *ring);
+void amdgpu_ring_ib_on_emit_ce(struct amdgpu_ring *ring);
+void amdgpu_ring_ib_on_emit_de(struct amdgpu_ring *ring);
 
 void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count);
 void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib);
index 62079f0..73516ab 100644 (file)
@@ -105,6 +105,16 @@ static void amdgpu_mux_resubmit_chunks(struct amdgpu_ring_mux *mux)
                                amdgpu_fence_update_start_timestamp(e->ring,
                                                                    chunk->sync_seq,
                                                                    ktime_get());
+                               if (chunk->sync_seq ==
+                                       le32_to_cpu(*(e->ring->fence_drv.cpu_addr + 2))) {
+                                       if (chunk->cntl_offset <= e->ring->buf_mask)
+                                               amdgpu_ring_patch_cntl(e->ring,
+                                                                      chunk->cntl_offset);
+                                       if (chunk->ce_offset <= e->ring->buf_mask)
+                                               amdgpu_ring_patch_ce(e->ring, chunk->ce_offset);
+                                       if (chunk->de_offset <= e->ring->buf_mask)
+                                               amdgpu_ring_patch_de(e->ring, chunk->de_offset);
+                               }
                                amdgpu_ring_mux_copy_pkt_from_sw_ring(mux, e->ring,
                                                                      chunk->start,
                                                                      chunk->end);
@@ -407,6 +417,17 @@ void amdgpu_sw_ring_ib_end(struct amdgpu_ring *ring)
        amdgpu_ring_mux_end_ib(mux, ring);
 }
 
+void amdgpu_sw_ring_ib_mark_offset(struct amdgpu_ring *ring, enum amdgpu_ring_mux_offset_type type)
+{
+       struct amdgpu_device *adev = ring->adev;
+       struct amdgpu_ring_mux *mux = &adev->gfx.muxer;
+       unsigned offset;
+
+       offset = ring->wptr & ring->buf_mask;
+
+       amdgpu_ring_mux_ib_mark_offset(mux, ring, offset, type);
+}
+
 void amdgpu_ring_mux_start_ib(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring)
 {
        struct amdgpu_mux_entry *e;
@@ -429,6 +450,10 @@ void amdgpu_ring_mux_start_ib(struct amdgpu_ring_mux *mux, struct amdgpu_ring *r
        }
 
        chunk->start = ring->wptr;
+       /* initialize to an out-of-range value so we can tell whether the IB submission set them */
+       chunk->cntl_offset = ring->buf_mask + 1;
+       chunk->de_offset = ring->buf_mask + 1;
+       chunk->ce_offset = ring->buf_mask + 1;
        list_add_tail(&chunk->entry, &e->list);
 }
 
@@ -454,6 +479,41 @@ static void scan_and_remove_signaled_chunk(struct amdgpu_ring_mux *mux, struct a
        }
 }
 
+void amdgpu_ring_mux_ib_mark_offset(struct amdgpu_ring_mux *mux,
+                                   struct amdgpu_ring *ring, u64 offset,
+                                   enum amdgpu_ring_mux_offset_type type)
+{
+       struct amdgpu_mux_entry *e;
+       struct amdgpu_mux_chunk *chunk;
+
+       e = amdgpu_ring_mux_sw_entry(mux, ring);
+       if (!e) {
+               DRM_ERROR("cannot find entry!\n");
+               return;
+       }
+
+       chunk = list_last_entry(&e->list, struct amdgpu_mux_chunk, entry);
+       if (!chunk) {
+               DRM_ERROR("cannot find chunk!\n");
+               return;
+       }
+
+       switch (type) {
+       case AMDGPU_MUX_OFFSET_TYPE_CONTROL:
+               chunk->cntl_offset = offset;
+               break;
+       case AMDGPU_MUX_OFFSET_TYPE_DE:
+               chunk->de_offset = offset;
+               break;
+       case AMDGPU_MUX_OFFSET_TYPE_CE:
+               chunk->ce_offset = offset;
+               break;
+       default:
+               DRM_ERROR("invalid type (%d)\n", type);
+               break;
+       }
+}
+
 void amdgpu_ring_mux_end_ib(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring)
 {
        struct amdgpu_mux_entry *e;
index 4be45fc..b22d4fb 100644 (file)
@@ -50,6 +50,12 @@ struct amdgpu_mux_entry {
        struct list_head        list;
 };
 
+enum amdgpu_ring_mux_offset_type {
+       AMDGPU_MUX_OFFSET_TYPE_CONTROL,
+       AMDGPU_MUX_OFFSET_TYPE_DE,
+       AMDGPU_MUX_OFFSET_TYPE_CE,
+};
+
 struct amdgpu_ring_mux {
        struct amdgpu_ring      *real_ring;
 
@@ -72,12 +78,18 @@ struct amdgpu_ring_mux {
  * @sync_seq: the fence seqno related with the saved IB.
  * @start:- start location on the software ring.
  * @end:- end location on the software ring.
+ * @cntl_offset:- the PRE_RESUME bit position used for resubmission.
+ * @de_offset:- the anchor in write_data for de meta of resubmission.
+ * @ce_offset:- the anchor in write_data for ce meta of resubmission.
  */
 struct amdgpu_mux_chunk {
        struct list_head        entry;
        uint32_t                sync_seq;
        u64                     start;
        u64                     end;
+       u64                     cntl_offset;
+       u64                     de_offset;
+       u64                     ce_offset;
 };
 
 int amdgpu_ring_mux_init(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring,
@@ -89,6 +101,8 @@ u64 amdgpu_ring_mux_get_wptr(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ri
 u64 amdgpu_ring_mux_get_rptr(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring);
 void amdgpu_ring_mux_start_ib(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring);
 void amdgpu_ring_mux_end_ib(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring);
+void amdgpu_ring_mux_ib_mark_offset(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring,
+                                   u64 offset, enum amdgpu_ring_mux_offset_type type);
 bool amdgpu_mcbp_handle_trailing_fence_irq(struct amdgpu_ring_mux *mux);
 
 u64 amdgpu_sw_ring_get_rptr_gfx(struct amdgpu_ring *ring);
@@ -97,6 +111,7 @@ void amdgpu_sw_ring_set_wptr_gfx(struct amdgpu_ring *ring);
 void amdgpu_sw_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count);
 void amdgpu_sw_ring_ib_begin(struct amdgpu_ring *ring);
 void amdgpu_sw_ring_ib_end(struct amdgpu_ring *ring);
+void amdgpu_sw_ring_ib_mark_offset(struct amdgpu_ring *ring, enum amdgpu_ring_mux_offset_type type);
 const char *amdgpu_sw_ring_name(int idx);
 unsigned int amdgpu_sw_ring_priority(int idx);
 
index e63fcc5..2d94f1b 100644 (file)
@@ -1181,6 +1181,31 @@ int amdgpu_vcn_process_poison_irq(struct amdgpu_device *adev,
        return 0;
 }
 
+int amdgpu_vcn_ras_late_init(struct amdgpu_device *adev, struct ras_common_if *ras_block)
+{
+       int r, i;
+
+       r = amdgpu_ras_block_late_init(adev, ras_block);
+       if (r)
+               return r;
+
+       if (amdgpu_ras_is_supported(adev, ras_block->block)) {
+               for (i = 0; i < adev->vcn.num_vcn_inst; i++) {
+                       if (adev->vcn.harvest_config & (1 << i))
+                               continue;
+
+                       r = amdgpu_irq_get(adev, &adev->vcn.inst[i].ras_poison_irq, 0);
+                       if (r)
+                               goto late_fini;
+               }
+       }
+       return 0;
+
+late_fini:
+       amdgpu_ras_block_late_fini(adev, ras_block);
+       return r;
+}
+
 int amdgpu_vcn_ras_sw_init(struct amdgpu_device *adev)
 {
        int err;
@@ -1202,7 +1227,7 @@ int amdgpu_vcn_ras_sw_init(struct amdgpu_device *adev)
        adev->vcn.ras_if = &ras->ras_block.ras_comm;
 
        if (!ras->ras_block.ras_late_init)
-               ras->ras_block.ras_late_init = amdgpu_ras_block_late_init;
+               ras->ras_block.ras_late_init = amdgpu_vcn_ras_late_init;
 
        return 0;
 }
index c730949..f1397ef 100644 (file)
@@ -234,6 +234,7 @@ struct amdgpu_vcn_inst {
        struct amdgpu_ring      ring_enc[AMDGPU_VCN_MAX_ENC_RINGS];
        atomic_t                sched_score;
        struct amdgpu_irq_src   irq;
+       struct amdgpu_irq_src   ras_poison_irq;
        struct amdgpu_vcn_reg   external;
        struct amdgpu_bo        *dpg_sram_bo;
        struct dpg_pause_state  pause_state;
@@ -400,6 +401,8 @@ void amdgpu_debugfs_vcn_fwlog_init(struct amdgpu_device *adev,
 int amdgpu_vcn_process_poison_irq(struct amdgpu_device *adev,
                        struct amdgpu_irq_src *source,
                        struct amdgpu_iv_entry *entry);
+int amdgpu_vcn_ras_late_init(struct amdgpu_device *adev,
+                       struct ras_common_if *ras_block);
 int amdgpu_vcn_ras_sw_init(struct amdgpu_device *adev);
 
 #endif
index df63dc3..051c719 100644 (file)
@@ -564,7 +564,6 @@ int amdgpu_vm_pt_create(struct amdgpu_device *adev, struct amdgpu_vm *vm,
                return r;
        }
 
-       (*vmbo)->shadow->parent = amdgpu_bo_ref(bo);
        amdgpu_bo_add_to_shadow_list(*vmbo);
 
        return 0;
index 43d6a9d..afacfb9 100644 (file)
@@ -800,7 +800,7 @@ static void amdgpu_vram_mgr_debug(struct ttm_resource_manager *man,
 {
        struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
        struct drm_buddy *mm = &mgr->mm;
-       struct drm_buddy_block *block;
+       struct amdgpu_vram_reservation *rsv;
 
        drm_printf(printer, "  vis usage:%llu\n",
                   amdgpu_vram_mgr_vis_usage(mgr));
@@ -812,8 +812,9 @@ static void amdgpu_vram_mgr_debug(struct ttm_resource_manager *man,
        drm_buddy_print(mm, printer);
 
        drm_printf(printer, "reserved:\n");
-       list_for_each_entry(block, &mgr->reserved_pages, link)
-               drm_buddy_block_print(mm, block, printer);
+       list_for_each_entry(rsv, &mgr->reserved_pages, blocks)
+               drm_printf(printer, "%#018llx-%#018llx: %llu\n",
+                       rsv->start, rsv->start + rsv->size, rsv->size);
        mutex_unlock(&mgr->lock);
 }
 
index f5b5ce1..ab44c13 100644 (file)
@@ -6892,8 +6892,10 @@ static int gfx_v10_0_kiq_resume(struct amdgpu_device *adev)
                return r;
 
        r = amdgpu_bo_kmap(ring->mqd_obj, (void **)&ring->mqd_ptr);
-       if (unlikely(r != 0))
+       if (unlikely(r != 0)) {
+               amdgpu_bo_unreserve(ring->mqd_obj);
                return r;
+       }
 
        gfx_v10_0_kiq_init_queue(ring);
        amdgpu_bo_kunmap(ring->mqd_obj);
@@ -8152,8 +8154,14 @@ static int gfx_v10_0_set_powergating_state(void *handle,
        case IP_VERSION(10, 3, 3):
        case IP_VERSION(10, 3, 6):
        case IP_VERSION(10, 3, 7):
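+               /* disallow GFXOFF before disabling PG; re-allow it only after PG is enabled again */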
+               if (!enable)
+                       amdgpu_gfx_off_ctrl(adev, false);
+
                gfx_v10_cntl_pg(adev, enable);
-               amdgpu_gfx_off_ctrl(adev, enable);
+
+               if (enable)
+                       amdgpu_gfx_off_ctrl(adev, true);
+
                break;
        default:
                break;
index a9da048..c4940b6 100644 (file)
@@ -1315,13 +1315,6 @@ static int gfx_v11_0_sw_init(void *handle)
        if (r)
                return r;
 
-       /* ECC error */
-       r = amdgpu_irq_add_id(adev, SOC21_IH_CLIENTID_GRBM_CP,
-                                 GFX_11_0_0__SRCID__CP_ECC_ERROR,
-                                 &adev->gfx.cp_ecc_error_irq);
-       if (r)
-               return r;
-
        /* FED error */
        r = amdgpu_irq_add_id(adev, SOC21_IH_CLIENTID_GFX,
                                  GFX_11_0_0__SRCID__RLC_GC_FED_INTERRUPT,
@@ -4444,7 +4437,6 @@ static int gfx_v11_0_hw_fini(void *handle)
        struct amdgpu_device *adev = (struct amdgpu_device *)handle;
        int r;
 
-       amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
        amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
        amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
 
@@ -4675,24 +4667,27 @@ static uint64_t gfx_v11_0_get_gpu_clock_counter(struct amdgpu_device *adev)
        uint64_t clock;
        uint64_t clock_counter_lo, clock_counter_hi_pre, clock_counter_hi_after;
 
-       amdgpu_gfx_off_ctrl(adev, false);
-       mutex_lock(&adev->gfx.gpu_clock_mutex);
        if (amdgpu_sriov_vf(adev)) {
+               amdgpu_gfx_off_ctrl(adev, false);
+               mutex_lock(&adev->gfx.gpu_clock_mutex);
                clock_counter_hi_pre = (uint64_t)RREG32_SOC15(GC, 0, regCP_MES_MTIME_HI);
                clock_counter_lo = (uint64_t)RREG32_SOC15(GC, 0, regCP_MES_MTIME_LO);
                clock_counter_hi_after = (uint64_t)RREG32_SOC15(GC, 0, regCP_MES_MTIME_HI);
                if (clock_counter_hi_pre != clock_counter_hi_after)
                        clock_counter_lo = (uint64_t)RREG32_SOC15(GC, 0, regCP_MES_MTIME_LO);
+               mutex_unlock(&adev->gfx.gpu_clock_mutex);
+               amdgpu_gfx_off_ctrl(adev, true);
        } else {
+               preempt_disable();
                clock_counter_hi_pre = (uint64_t)RREG32_SOC15(SMUIO, 0, regGOLDEN_TSC_COUNT_UPPER);
                clock_counter_lo = (uint64_t)RREG32_SOC15(SMUIO, 0, regGOLDEN_TSC_COUNT_LOWER);
                clock_counter_hi_after = (uint64_t)RREG32_SOC15(SMUIO, 0, regGOLDEN_TSC_COUNT_UPPER);
                if (clock_counter_hi_pre != clock_counter_hi_after)
                        clock_counter_lo = (uint64_t)RREG32_SOC15(SMUIO, 0, regGOLDEN_TSC_COUNT_LOWER);
+               preempt_enable();
        }
        clock = clock_counter_lo | (clock_counter_hi_after << 32ULL);
-       mutex_unlock(&adev->gfx.gpu_clock_mutex);
-       amdgpu_gfx_off_ctrl(adev, true);
+
        return clock;
 }
 
@@ -5158,8 +5153,14 @@ static int gfx_v11_0_set_powergating_state(void *handle,
                break;
        case IP_VERSION(11, 0, 1):
        case IP_VERSION(11, 0, 4):
+               if (!enable)
+                       amdgpu_gfx_off_ctrl(adev, false);
+
                gfx_v11_cntl_pg(adev, enable);
-               amdgpu_gfx_off_ctrl(adev, enable);
+
+               if (enable)
+                       amdgpu_gfx_off_ctrl(adev, true);
+
                break;
        default:
                break;
@@ -5897,36 +5898,6 @@ static void gfx_v11_0_set_compute_eop_interrupt_state(struct amdgpu_device *adev
        }
 }
 
-#define CP_ME1_PIPE_INST_ADDR_INTERVAL  0x1
-#define SET_ECC_ME_PIPE_STATE(reg_addr, state) \
-       do { \
-               uint32_t tmp = RREG32_SOC15_IP(GC, reg_addr); \
-               tmp = REG_SET_FIELD(tmp, CP_ME1_PIPE0_INT_CNTL, CP_ECC_ERROR_INT_ENABLE, state); \
-               WREG32_SOC15_IP(GC, reg_addr, tmp); \
-       } while (0)
-
-static int gfx_v11_0_set_cp_ecc_error_state(struct amdgpu_device *adev,
-                                                       struct amdgpu_irq_src *source,
-                                                       unsigned type,
-                                                       enum amdgpu_interrupt_state state)
-{
-       uint32_t ecc_irq_state = 0;
-       uint32_t pipe0_int_cntl_addr = 0;
-       int i = 0;
-
-       ecc_irq_state = (state == AMDGPU_IRQ_STATE_ENABLE) ? 1 : 0;
-
-       pipe0_int_cntl_addr = SOC15_REG_OFFSET(GC, 0, regCP_ME1_PIPE0_INT_CNTL);
-
-       WREG32_FIELD15_PREREG(GC, 0, CP_INT_CNTL_RING0, CP_ECC_ERROR_INT_ENABLE, ecc_irq_state);
-
-       for (i = 0; i < adev->gfx.mec.num_pipe_per_mec; i++)
-               SET_ECC_ME_PIPE_STATE(pipe0_int_cntl_addr + i * CP_ME1_PIPE_INST_ADDR_INTERVAL,
-                                       ecc_irq_state);
-
-       return 0;
-}
-
 static int gfx_v11_0_set_eop_interrupt_state(struct amdgpu_device *adev,
                                            struct amdgpu_irq_src *src,
                                            unsigned type,
@@ -6341,11 +6312,6 @@ static const struct amdgpu_irq_src_funcs gfx_v11_0_priv_inst_irq_funcs = {
        .process = gfx_v11_0_priv_inst_irq,
 };
 
-static const struct amdgpu_irq_src_funcs gfx_v11_0_cp_ecc_error_irq_funcs = {
-       .set = gfx_v11_0_set_cp_ecc_error_state,
-       .process = amdgpu_gfx_cp_ecc_error_irq,
-};
-
 static const struct amdgpu_irq_src_funcs gfx_v11_0_rlc_gc_fed_irq_funcs = {
        .process = gfx_v11_0_rlc_gc_fed_irq,
 };
@@ -6361,9 +6327,6 @@ static void gfx_v11_0_set_irq_funcs(struct amdgpu_device *adev)
        adev->gfx.priv_inst_irq.num_types = 1;
        adev->gfx.priv_inst_irq.funcs = &gfx_v11_0_priv_inst_irq_funcs;
 
-       adev->gfx.cp_ecc_error_irq.num_types = 1; /* CP ECC error */
-       adev->gfx.cp_ecc_error_irq.funcs = &gfx_v11_0_cp_ecc_error_irq_funcs;
-
        adev->gfx.rlc_gc_fed_irq.num_types = 1; /* 0x80 FED error */
        adev->gfx.rlc_gc_fed_irq.funcs = &gfx_v11_0_rlc_gc_fed_irq_funcs;
 
index adbcd81..a674c8a 100644 (file)
@@ -149,16 +149,6 @@ MODULE_FIRMWARE("amdgpu/aldebaran_sjt_mec2.bin");
 #define mmGOLDEN_TSC_COUNT_LOWER_Renoir                0x0026
 #define mmGOLDEN_TSC_COUNT_LOWER_Renoir_BASE_IDX       1
 
-#define mmGOLDEN_TSC_COUNT_UPPER_Raven   0x007a
-#define mmGOLDEN_TSC_COUNT_UPPER_Raven_BASE_IDX 0
-#define mmGOLDEN_TSC_COUNT_LOWER_Raven   0x007b
-#define mmGOLDEN_TSC_COUNT_LOWER_Raven_BASE_IDX 0
-
-#define mmGOLDEN_TSC_COUNT_UPPER_Raven2   0x0068
-#define mmGOLDEN_TSC_COUNT_UPPER_Raven2_BASE_IDX 0
-#define mmGOLDEN_TSC_COUNT_LOWER_Raven2   0x0069
-#define mmGOLDEN_TSC_COUNT_LOWER_Raven2_BASE_IDX 0
-
 enum ta_ras_gfx_subblock {
        /*CPC*/
        TA_RAS_BLOCK__GFX_CPC_INDEX_START = 0,
@@ -765,7 +755,7 @@ static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev);
 static int gfx_v9_0_get_cu_info(struct amdgpu_device *adev,
                                struct amdgpu_cu_info *cu_info);
 static uint64_t gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device *adev);
-static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool resume);
+static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool resume, bool usegds);
 static u64 gfx_v9_0_ring_get_rptr_compute(struct amdgpu_ring *ring);
 static void gfx_v9_0_query_ras_error_count(struct amdgpu_device *adev,
                                          void *ras_error_status);
@@ -3617,8 +3607,10 @@ static int gfx_v9_0_kiq_resume(struct amdgpu_device *adev)
                return r;
 
        r = amdgpu_bo_kmap(ring->mqd_obj, (void **)&ring->mqd_ptr);
-       if (unlikely(r != 0))
+       if (unlikely(r != 0)) {
+               amdgpu_bo_unreserve(ring->mqd_obj);
                return r;
+       }
 
        gfx_v9_0_kiq_init_queue(ring);
        amdgpu_bo_kunmap(ring->mqd_obj);
@@ -3764,7 +3756,8 @@ static int gfx_v9_0_hw_fini(void *handle)
 {
        struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
-       amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
+       if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
+               amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
        amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
        amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
 
@@ -4001,36 +3994,6 @@ static uint64_t gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device *adev)
                preempt_enable();
                clock = clock_lo | (clock_hi << 32ULL);
                break;
-       case IP_VERSION(9, 1, 0):
-               preempt_disable();
-               clock_hi = RREG32_SOC15_NO_KIQ(PWR, 0, mmGOLDEN_TSC_COUNT_UPPER_Raven);
-               clock_lo = RREG32_SOC15_NO_KIQ(PWR, 0, mmGOLDEN_TSC_COUNT_LOWER_Raven);
-               hi_check = RREG32_SOC15_NO_KIQ(PWR, 0, mmGOLDEN_TSC_COUNT_UPPER_Raven);
-               /* The PWR TSC clock frequency is 100MHz, which sets 32-bit carry over
-                * roughly every 42 seconds.
-                */
-               if (hi_check != clock_hi) {
-                       clock_lo = RREG32_SOC15_NO_KIQ(PWR, 0, mmGOLDEN_TSC_COUNT_LOWER_Raven);
-                       clock_hi = hi_check;
-               }
-               preempt_enable();
-               clock = clock_lo | (clock_hi << 32ULL);
-               break;
-       case IP_VERSION(9, 2, 2):
-               preempt_disable();
-               clock_hi = RREG32_SOC15_NO_KIQ(PWR, 0, mmGOLDEN_TSC_COUNT_UPPER_Raven2);
-               clock_lo = RREG32_SOC15_NO_KIQ(PWR, 0, mmGOLDEN_TSC_COUNT_LOWER_Raven2);
-               hi_check = RREG32_SOC15_NO_KIQ(PWR, 0, mmGOLDEN_TSC_COUNT_UPPER_Raven2);
-               /* The PWR TSC clock frequency is 100MHz, which sets 32-bit carry over
-                * roughly every 42 seconds.
-                */
-               if (hi_check != clock_hi) {
-                       clock_lo = RREG32_SOC15_NO_KIQ(PWR, 0, mmGOLDEN_TSC_COUNT_LOWER_Raven2);
-                       clock_hi = hi_check;
-               }
-               preempt_enable();
-               clock = clock_lo | (clock_hi << 32ULL);
-               break;
        default:
                amdgpu_gfx_off_ctrl(adev, false);
                mutex_lock(&adev->gfx.gpu_clock_mutex);
@@ -5164,7 +5127,8 @@ static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
                        gfx_v9_0_ring_emit_de_meta(ring,
                                                   (!amdgpu_sriov_vf(ring->adev) &&
                                                   flags & AMDGPU_IB_PREEMPTED) ?
-                                                  true : false);
+                                                  true : false,
+                                                  job->gds_size > 0 && job->gds_base != 0);
        }
 
        amdgpu_ring_write(ring, header);
@@ -5175,9 +5139,83 @@ static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
 #endif
                lower_32_bits(ib->gpu_addr));
        amdgpu_ring_write(ring, upper_32_bits(ib->gpu_addr));
+       amdgpu_ring_ib_on_emit_cntl(ring);
        amdgpu_ring_write(ring, control);
 }
 
+static void gfx_v9_0_ring_patch_cntl(struct amdgpu_ring *ring,
+                                    unsigned offset)
+{
+       u32 control = ring->ring[offset];
+
+       control |= INDIRECT_BUFFER_PRE_RESUME(1);
+       ring->ring[offset] = control;
+}
+
+static void gfx_v9_0_ring_patch_ce_meta(struct amdgpu_ring *ring,
+                                       unsigned offset)
+{
+       struct amdgpu_device *adev = ring->adev;
+       void *ce_payload_cpu_addr;
+       uint64_t payload_offset, payload_size;
+
+       payload_size = sizeof(struct v9_ce_ib_state);
+
+       if (ring->is_mes_queue) {
+               payload_offset = offsetof(struct amdgpu_mes_ctx_meta_data,
+                                         gfx[0].gfx_meta_data) +
+                       offsetof(struct v9_gfx_meta_data, ce_payload);
+               ce_payload_cpu_addr =
+                       amdgpu_mes_ctx_get_offs_cpu_addr(ring, payload_offset);
+       } else {
+               payload_offset = offsetof(struct v9_gfx_meta_data, ce_payload);
+               ce_payload_cpu_addr = adev->virt.csa_cpu_addr + payload_offset;
+       }
+
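+       /* the CE payload may wrap past the end of the ring buffer; copy it in two chunks */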
+       if (offset + (payload_size >> 2) <= ring->buf_mask + 1) {
+               memcpy((void *)&ring->ring[offset], ce_payload_cpu_addr, payload_size);
+       } else {
+               memcpy((void *)&ring->ring[offset], ce_payload_cpu_addr,
+                      (ring->buf_mask + 1 - offset) << 2);
+               payload_size -= (ring->buf_mask + 1 - offset) << 2;
+               memcpy((void *)&ring->ring[0],
+                      ce_payload_cpu_addr + ((ring->buf_mask + 1 - offset) << 2),
+                      payload_size);
+       }
+}
+
+static void gfx_v9_0_ring_patch_de_meta(struct amdgpu_ring *ring,
+                                       unsigned offset)
+{
+       struct amdgpu_device *adev = ring->adev;
+       void *de_payload_cpu_addr;
+       uint64_t payload_offset, payload_size;
+
+       payload_size = sizeof(struct v9_de_ib_state);
+
+       if (ring->is_mes_queue) {
+               payload_offset = offsetof(struct amdgpu_mes_ctx_meta_data,
+                                         gfx[0].gfx_meta_data) +
+                       offsetof(struct v9_gfx_meta_data, de_payload);
+               de_payload_cpu_addr =
+                       amdgpu_mes_ctx_get_offs_cpu_addr(ring, payload_offset);
+       } else {
+               payload_offset = offsetof(struct v9_gfx_meta_data, de_payload);
+               de_payload_cpu_addr = adev->virt.csa_cpu_addr + payload_offset;
+       }
+
+       if (offset + (payload_size >> 2) <= ring->buf_mask + 1) {
+               memcpy((void *)&ring->ring[offset], de_payload_cpu_addr, payload_size);
+       } else {
+               memcpy((void *)&ring->ring[offset], de_payload_cpu_addr,
+                      (ring->buf_mask + 1 - offset) << 2);
+               payload_size -= (ring->buf_mask + 1 - offset) << 2;
+               memcpy((void *)&ring->ring[0],
+                      de_payload_cpu_addr + ((ring->buf_mask + 1 - offset) << 2),
+                      payload_size);
+       }
+}
+
 static void gfx_v9_0_ring_emit_ib_compute(struct amdgpu_ring *ring,
                                          struct amdgpu_job *job,
                                          struct amdgpu_ib *ib,
@@ -5373,6 +5411,8 @@ static void gfx_v9_0_ring_emit_ce_meta(struct amdgpu_ring *ring, bool resume)
        amdgpu_ring_write(ring, lower_32_bits(ce_payload_gpu_addr));
        amdgpu_ring_write(ring, upper_32_bits(ce_payload_gpu_addr));
 
+       amdgpu_ring_ib_on_emit_ce(ring);
+
        if (resume)
                amdgpu_ring_write_multiple(ring, ce_payload_cpu_addr,
                                           sizeof(ce_payload) >> 2);
@@ -5406,10 +5446,6 @@ static int gfx_v9_0_ring_preempt_ib(struct amdgpu_ring *ring)
        amdgpu_ring_alloc(ring, 13);
        gfx_v9_0_ring_emit_fence(ring, ring->trail_fence_gpu_addr,
                                 ring->trail_seq, AMDGPU_FENCE_FLAG_EXEC | AMDGPU_FENCE_FLAG_INT);
-       /*reset the CP_VMID_PREEMPT after trailing fence*/
-       amdgpu_ring_emit_wreg(ring,
-                             SOC15_REG_OFFSET(GC, 0, mmCP_VMID_PREEMPT),
-                             0x0);
 
        /* assert IB preemption, emit the trailing fence */
        kiq->pmf->kiq_unmap_queues(kiq_ring, ring, PREEMPT_QUEUES_NO_UNMAP,
@@ -5432,6 +5468,10 @@ static int gfx_v9_0_ring_preempt_ib(struct amdgpu_ring *ring)
                DRM_WARN("ring %d timeout to preempt ib\n", ring->idx);
        }
 
+       /*reset the CP_VMID_PREEMPT after trailing fence*/
+       amdgpu_ring_emit_wreg(ring,
+                             SOC15_REG_OFFSET(GC, 0, mmCP_VMID_PREEMPT),
+                             0x0);
        amdgpu_ring_commit(ring);
 
        /* deassert preemption condition */
@@ -5439,7 +5479,7 @@ static int gfx_v9_0_ring_preempt_ib(struct amdgpu_ring *ring)
        return r;
 }
 
-static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool resume)
+static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool resume, bool usegds)
 {
        struct amdgpu_device *adev = ring->adev;
        struct v9_de_ib_state de_payload = {0};
@@ -5470,8 +5510,10 @@ static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool resume)
                                 PAGE_SIZE);
        }
 
-       de_payload.gds_backup_addrlo = lower_32_bits(gds_addr);
-       de_payload.gds_backup_addrhi = upper_32_bits(gds_addr);
+       if (usegds) {
+               de_payload.gds_backup_addrlo = lower_32_bits(gds_addr);
+               de_payload.gds_backup_addrhi = upper_32_bits(gds_addr);
+       }
 
        cnt = (sizeof(de_payload) >> 2) + 4 - 2;
        amdgpu_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, cnt));
@@ -5482,6 +5524,7 @@ static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool resume)
        amdgpu_ring_write(ring, lower_32_bits(de_payload_gpu_addr));
        amdgpu_ring_write(ring, upper_32_bits(de_payload_gpu_addr));
 
+       amdgpu_ring_ib_on_emit_de(ring);
        if (resume)
                amdgpu_ring_write_multiple(ring, de_payload_cpu_addr,
                                           sizeof(de_payload) >> 2);
@@ -6892,6 +6935,9 @@ static const struct amdgpu_ring_funcs gfx_v9_0_sw_ring_funcs_gfx = {
        .emit_reg_write_reg_wait = gfx_v9_0_ring_emit_reg_write_reg_wait,
        .soft_recovery = gfx_v9_0_ring_soft_recovery,
        .emit_mem_sync = gfx_v9_0_emit_mem_sync,
+       .patch_cntl = gfx_v9_0_ring_patch_cntl,
+       .patch_de = gfx_v9_0_ring_patch_de_meta,
+       .patch_ce = gfx_v9_0_ring_patch_ce_meta,
 };
 
 static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = {
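The patch_ce_meta/patch_de_meta helpers above copy the saved CE/DE payload back into the ring and split the memcpy when the destination runs past the end of the ring buffer. A minimal standalone sketch of that split copy, assuming a power-of-two ring size in dwords (names and sizes here are hypothetical, not the driver code):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RING_DWORDS 256u   /* ring size in dwords, power of two */

/* Copy size_bytes from src into ring[] starting at dword offset,
 * wrapping to the start of the ring when the copy runs past the end. */
static void ring_copy_wrapped(uint32_t *ring, uint32_t offset,
			      const void *src, size_t size_bytes)
{
	size_t dwords = size_bytes >> 2;

	if (offset + dwords <= RING_DWORDS) {
		memcpy(&ring[offset], src, size_bytes);
	} else {
		size_t head = ((size_t)RING_DWORDS - offset) << 2; /* bytes up to the end */

		memcpy(&ring[offset], src, head);
		memcpy(&ring[0], (const uint8_t *)src + head, size_bytes - head);
	}
}

int main(void)
{
	uint32_t ring[RING_DWORDS] = {0};
	uint32_t payload[12];
	size_t i;

	for (i = 0; i < 12; i++)
		payload[i] = 0xC0DE0000u + (uint32_t)i;

	ring_copy_wrapped(ring, 250, payload, sizeof(payload)); /* wraps after 6 dwords */
	printf("ring[255]=0x%08x ring[0]=0x%08x\n",
	       (unsigned int)ring[255], (unsigned int)ring[0]);
	return 0;
}
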
index d95f9fe..4116c11 100644 (file)
@@ -31,6 +31,8 @@
 #include "umc_v8_10.h"
 #include "athub/athub_3_0_0_sh_mask.h"
 #include "athub/athub_3_0_0_offset.h"
+#include "dcn/dcn_3_2_0_offset.h"
+#include "dcn/dcn_3_2_0_sh_mask.h"
 #include "oss/osssys_6_0_0_offset.h"
 #include "ivsrcid/vmc/irqsrcs_vmc_1_0.h"
 #include "navi10_enum.h"
@@ -546,7 +548,24 @@ static void gmc_v11_0_get_vm_pte(struct amdgpu_device *adev,
 
 static unsigned gmc_v11_0_get_vbios_fb_size(struct amdgpu_device *adev)
 {
-       return 0;
+       u32 d1vga_control = RREG32_SOC15(DCE, 0, regD1VGA_CONTROL);
+       unsigned size;
+
+       if (REG_GET_FIELD(d1vga_control, D1VGA_CONTROL, D1VGA_MODE_ENABLE)) {
+               size = AMDGPU_VBIOS_VGA_ALLOCATION;
+       } else {
+               u32 viewport;
+               u32 pitch;
+
+               viewport = RREG32_SOC15(DCE, 0, regHUBP0_DCSURF_PRI_VIEWPORT_DIMENSION);
+               pitch = RREG32_SOC15(DCE, 0, regHUBPREQ0_DCSURF_SURFACE_PITCH);
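+               /* pre-OS framebuffer size: viewport height * pitch * 4 bytes per pixel (32bpp assumed) */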
+               size = (REG_GET_FIELD(viewport,
+                                       HUBP0_DCSURF_PRI_VIEWPORT_DIMENSION, PRI_VIEWPORT_HEIGHT) *
+                               REG_GET_FIELD(pitch, HUBPREQ0_DCSURF_SURFACE_PITCH, PITCH) *
+                               4);
+       }
+
+       return size;
 }
 
 static const struct amdgpu_gmc_funcs gmc_v11_0_gmc_funcs = {
index b040f51..73e0dc5 100644 (file)
@@ -102,13 +102,13 @@ static int jpeg_v2_5_sw_init(void *handle)
 
                /* JPEG DJPEG POISON EVENT */
                r = amdgpu_irq_add_id(adev, amdgpu_ih_clientid_jpeg[i],
-                       VCN_2_6__SRCID_DJPEG0_POISON, &adev->jpeg.inst[i].irq);
+                       VCN_2_6__SRCID_DJPEG0_POISON, &adev->jpeg.inst[i].ras_poison_irq);
                if (r)
                        return r;
 
                /* JPEG EJPEG POISON EVENT */
                r = amdgpu_irq_add_id(adev, amdgpu_ih_clientid_jpeg[i],
-                       VCN_2_6__SRCID_EJPEG0_POISON, &adev->jpeg.inst[i].irq);
+                       VCN_2_6__SRCID_EJPEG0_POISON, &adev->jpeg.inst[i].ras_poison_irq);
                if (r)
                        return r;
        }
@@ -221,6 +221,9 @@ static int jpeg_v2_5_hw_fini(void *handle)
                if (adev->jpeg.cur_state != AMD_PG_STATE_GATE &&
                      RREG32_SOC15(JPEG, i, mmUVD_JRBC_STATUS))
                        jpeg_v2_5_set_powergating_state(adev, AMD_PG_STATE_GATE);
+
+               if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__JPEG))
+                       amdgpu_irq_put(adev, &adev->jpeg.inst[i].ras_poison_irq, 0);
        }
 
        return 0;
@@ -569,6 +572,14 @@ static int jpeg_v2_5_set_interrupt_state(struct amdgpu_device *adev,
        return 0;
 }
 
+static int jpeg_v2_6_set_ras_interrupt_state(struct amdgpu_device *adev,
+                                       struct amdgpu_irq_src *source,
+                                       unsigned int type,
+                                       enum amdgpu_interrupt_state state)
+{
+       return 0;
+}
+
 static int jpeg_v2_5_process_interrupt(struct amdgpu_device *adev,
                                      struct amdgpu_irq_src *source,
                                      struct amdgpu_iv_entry *entry)
@@ -593,10 +604,6 @@ static int jpeg_v2_5_process_interrupt(struct amdgpu_device *adev,
        case VCN_2_0__SRCID__JPEG_DECODE:
                amdgpu_fence_process(&adev->jpeg.inst[ip_instance].ring_dec);
                break;
-       case VCN_2_6__SRCID_DJPEG0_POISON:
-       case VCN_2_6__SRCID_EJPEG0_POISON:
-               amdgpu_jpeg_process_poison_irq(adev, source, entry);
-               break;
        default:
                DRM_ERROR("Unhandled interrupt: %d %d\n",
                          entry->src_id, entry->src_data[0]);
@@ -725,6 +732,11 @@ static const struct amdgpu_irq_src_funcs jpeg_v2_5_irq_funcs = {
        .process = jpeg_v2_5_process_interrupt,
 };
 
+static const struct amdgpu_irq_src_funcs jpeg_v2_6_ras_irq_funcs = {
+       .set = jpeg_v2_6_set_ras_interrupt_state,
+       .process = amdgpu_jpeg_process_poison_irq,
+};
+
 static void jpeg_v2_5_set_irq_funcs(struct amdgpu_device *adev)
 {
        int i;
@@ -735,6 +747,9 @@ static void jpeg_v2_5_set_irq_funcs(struct amdgpu_device *adev)
 
                adev->jpeg.inst[i].irq.num_types = 1;
                adev->jpeg.inst[i].irq.funcs = &jpeg_v2_5_irq_funcs;
+
+               adev->jpeg.inst[i].ras_poison_irq.num_types = 1;
+               adev->jpeg.inst[i].ras_poison_irq.funcs = &jpeg_v2_6_ras_irq_funcs;
        }
 }
 
@@ -800,6 +815,7 @@ const struct amdgpu_ras_block_hw_ops jpeg_v2_6_ras_hw_ops = {
 static struct amdgpu_jpeg_ras jpeg_v2_6_ras = {
        .ras_block = {
                .hw_ops = &jpeg_v2_6_ras_hw_ops,
+               .ras_late_init = amdgpu_jpeg_ras_late_init,
        },
 };
 
index c55e094..1c2292c 100644 (file)
@@ -54,6 +54,7 @@ static int jpeg_v3_0_early_init(void *handle)
 
        switch (adev->ip_versions[UVD_HWIP][0]) {
        case IP_VERSION(3, 1, 1):
+       case IP_VERSION(3, 1, 2):
                break;
        default:
                harvest = RREG32_SOC15(JPEG, 0, mmCC_UVD_HARVESTING);
index 77e1e64..a3d83c9 100644 (file)
@@ -87,13 +87,13 @@ static int jpeg_v4_0_sw_init(void *handle)
 
        /* JPEG DJPEG POISON EVENT */
        r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_VCN,
-                       VCN_4_0__SRCID_DJPEG0_POISON, &adev->jpeg.inst->irq);
+                       VCN_4_0__SRCID_DJPEG0_POISON, &adev->jpeg.inst->ras_poison_irq);
        if (r)
                return r;
 
        /* JPEG EJPEG POISON EVENT */
        r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_VCN,
-                       VCN_4_0__SRCID_EJPEG0_POISON, &adev->jpeg.inst->irq);
+                       VCN_4_0__SRCID_EJPEG0_POISON, &adev->jpeg.inst->ras_poison_irq);
        if (r)
                return r;
 
@@ -202,7 +202,8 @@ static int jpeg_v4_0_hw_fini(void *handle)
                        RREG32_SOC15(JPEG, 0, regUVD_JRBC_STATUS))
                        jpeg_v4_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
        }
-       amdgpu_irq_put(adev, &adev->jpeg.inst->irq, 0);
+       if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__JPEG))
+               amdgpu_irq_put(adev, &adev->jpeg.inst->ras_poison_irq, 0);
 
        return 0;
 }
@@ -670,6 +671,14 @@ static int jpeg_v4_0_set_interrupt_state(struct amdgpu_device *adev,
        return 0;
 }
 
+static int jpeg_v4_0_set_ras_interrupt_state(struct amdgpu_device *adev,
+                                       struct amdgpu_irq_src *source,
+                                       unsigned int type,
+                                       enum amdgpu_interrupt_state state)
+{
+       return 0;
+}
+
 static int jpeg_v4_0_process_interrupt(struct amdgpu_device *adev,
                                      struct amdgpu_irq_src *source,
                                      struct amdgpu_iv_entry *entry)
@@ -680,10 +689,6 @@ static int jpeg_v4_0_process_interrupt(struct amdgpu_device *adev,
        case VCN_4_0__SRCID__JPEG_DECODE:
                amdgpu_fence_process(&adev->jpeg.inst->ring_dec);
                break;
-       case VCN_4_0__SRCID_DJPEG0_POISON:
-       case VCN_4_0__SRCID_EJPEG0_POISON:
-               amdgpu_jpeg_process_poison_irq(adev, source, entry);
-               break;
        default:
                DRM_DEV_ERROR(adev->dev, "Unhandled interrupt: %d %d\n",
                          entry->src_id, entry->src_data[0]);
@@ -753,10 +758,18 @@ static const struct amdgpu_irq_src_funcs jpeg_v4_0_irq_funcs = {
        .process = jpeg_v4_0_process_interrupt,
 };
 
+static const struct amdgpu_irq_src_funcs jpeg_v4_0_ras_irq_funcs = {
+       .set = jpeg_v4_0_set_ras_interrupt_state,
+       .process = amdgpu_jpeg_process_poison_irq,
+};
+
 static void jpeg_v4_0_set_irq_funcs(struct amdgpu_device *adev)
 {
        adev->jpeg.inst->irq.num_types = 1;
        adev->jpeg.inst->irq.funcs = &jpeg_v4_0_irq_funcs;
+
+       adev->jpeg.inst->ras_poison_irq.num_types = 1;
+       adev->jpeg.inst->ras_poison_irq.funcs = &jpeg_v4_0_ras_irq_funcs;
 }
 
 const struct amdgpu_ip_block_version jpeg_v4_0_ip_block = {
@@ -811,6 +824,7 @@ const struct amdgpu_ras_block_hw_ops jpeg_v4_0_ras_hw_ops = {
 static struct amdgpu_jpeg_ras jpeg_v4_0_ras = {
        .ras_block = {
                .hw_ops = &jpeg_v4_0_ras_hw_ops,
+               .ras_late_init = amdgpu_jpeg_ras_late_init,
        },
 };
 
index 98c826f..0fb6013 100644 (file)
@@ -98,6 +98,16 @@ static const struct amdgpu_video_codecs nv_video_codecs_decode =
 };
 
 /* Sienna Cichlid */
+static const struct amdgpu_video_codec_info sc_video_codecs_encode_array[] = {
+       {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 2160, 0)},
+       {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 7680, 4352, 0)},
+};
+
+static const struct amdgpu_video_codecs sc_video_codecs_encode = {
+       .codec_count = ARRAY_SIZE(sc_video_codecs_encode_array),
+       .codec_array = sc_video_codecs_encode_array,
+};
+
 static const struct amdgpu_video_codec_info sc_video_codecs_decode_array_vcn0[] =
 {
        {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG2, 4096, 4096, 3)},
@@ -136,8 +146,8 @@ static const struct amdgpu_video_codecs sc_video_codecs_decode_vcn1 =
 /* SRIOV Sienna Cichlid, not const since data is controlled by host */
 static struct amdgpu_video_codec_info sriov_sc_video_codecs_encode_array[] =
 {
-       {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 2304, 0)},
-       {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 4096, 2304, 0)},
+       {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 2160, 0)},
+       {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 7680, 4352, 0)},
 };
 
 static struct amdgpu_video_codec_info sriov_sc_video_codecs_decode_array_vcn0[] =
@@ -237,12 +247,12 @@ static int nv_query_video_codecs(struct amdgpu_device *adev, bool encode,
                } else {
                        if (adev->vcn.harvest_config & AMDGPU_VCN_HARVEST_VCN0) {
                                if (encode)
-                                       *codecs = &nv_video_codecs_encode;
+                                       *codecs = &sc_video_codecs_encode;
                                else
                                        *codecs = &sc_video_codecs_decode_vcn1;
                        } else {
                                if (encode)
-                                       *codecs = &nv_video_codecs_encode;
+                                       *codecs = &sc_video_codecs_encode;
                                else
                                        *codecs = &sc_video_codecs_decode_vcn0;
                        }
@@ -251,14 +261,14 @@ static int nv_query_video_codecs(struct amdgpu_device *adev, bool encode,
        case IP_VERSION(3, 0, 16):
        case IP_VERSION(3, 0, 2):
                if (encode)
-                       *codecs = &nv_video_codecs_encode;
+                       *codecs = &sc_video_codecs_encode;
                else
                        *codecs = &sc_video_codecs_decode_vcn0;
                return 0;
        case IP_VERSION(3, 1, 1):
        case IP_VERSION(3, 1, 2):
                if (encode)
-                       *codecs = &nv_video_codecs_encode;
+                       *codecs = &sc_video_codecs_encode;
                else
                        *codecs = &yc_video_codecs_decode;
                return 0;
index e1b7fca..5f10883 100644 (file)
@@ -57,7 +57,13 @@ static int psp_v10_0_init_microcode(struct psp_context *psp)
        if (err)
                return err;
 
-       return psp_init_ta_microcode(psp, ucode_prefix);
+       err = psp_init_ta_microcode(psp, ucode_prefix);
+       if ((adev->ip_versions[GC_HWIP][0] == IP_VERSION(9, 1, 0)) &&
+               (adev->pdev->revision == 0xa1) &&
+               (psp->securedisplay_context.context.bin_desc.fw_version >= 0x27000008)) {
+               adev->psp.securedisplay_context.context.bin_desc.size_bytes = 0;
+       }
+       return err;
 }
 
 static int psp_v10_0_ring_create(struct psp_context *psp,
index b3cc04d..9295ac7 100644 (file)
@@ -1917,9 +1917,11 @@ static int sdma_v4_0_hw_fini(void *handle)
                return 0;
        }
 
-       for (i = 0; i < adev->sdma.num_instances; i++) {
-               amdgpu_irq_put(adev, &adev->sdma.ecc_irq,
-                              AMDGPU_SDMA_IRQ_INSTANCE0 + i);
+       if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__SDMA)) {
+               for (i = 0; i < adev->sdma.num_instances; i++) {
+                       amdgpu_irq_put(adev, &adev->sdma.ecc_irq,
+                                      AMDGPU_SDMA_IRQ_INSTANCE0 + i);
+               }
        }
 
        sdma_v4_0_ctx_switch_enable(adev, false);
index 6d15d5c..a2fd1ff 100644 (file)
@@ -301,10 +301,11 @@ static u32 soc15_get_xclk(struct amdgpu_device *adev)
        u32 reference_clock = adev->clock.spll.reference_freq;
 
        if (adev->ip_versions[MP1_HWIP][0] == IP_VERSION(12, 0, 0) ||
-           adev->ip_versions[MP1_HWIP][0] == IP_VERSION(12, 0, 1) ||
-           adev->ip_versions[MP1_HWIP][0] == IP_VERSION(10, 0, 0) ||
-           adev->ip_versions[MP1_HWIP][0] == IP_VERSION(10, 0, 1))
+           adev->ip_versions[MP1_HWIP][0] == IP_VERSION(12, 0, 1))
                return 10000;
+       if (adev->ip_versions[MP1_HWIP][0] == IP_VERSION(10, 0, 0) ||
+           adev->ip_versions[MP1_HWIP][0] == IP_VERSION(10, 0, 1))
+               return reference_clock / 4;
 
        return reference_clock;
 }
index 744be2a..d771625 100644 (file)
@@ -711,7 +711,7 @@ static int soc21_common_early_init(void *handle)
                        AMD_PG_SUPPORT_VCN_DPG |
                        AMD_PG_SUPPORT_GFX_PG |
                        AMD_PG_SUPPORT_JPEG;
-               adev->external_rev_id = adev->rev_id + 0x1;
+               adev->external_rev_id = adev->rev_id + 0x80;
                break;
 
        default:
index ab0b45d..515681c 100644 (file)
@@ -143,7 +143,7 @@ static int vcn_v2_5_sw_init(void *handle)
 
                /* VCN POISON TRAP */
                r = amdgpu_irq_add_id(adev, amdgpu_ih_clientid_vcns[j],
-                       VCN_2_6__SRCID_UVD_POISON, &adev->vcn.inst[j].irq);
+                       VCN_2_6__SRCID_UVD_POISON, &adev->vcn.inst[j].ras_poison_irq);
                if (r)
                        return r;
        }
@@ -354,6 +354,9 @@ static int vcn_v2_5_hw_fini(void *handle)
                    (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
                     RREG32_SOC15(VCN, i, mmUVD_STATUS)))
                        vcn_v2_5_set_powergating_state(adev, AMD_PG_STATE_GATE);
+
+               if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__VCN))
+                       amdgpu_irq_put(adev, &adev->vcn.inst[i].ras_poison_irq, 0);
        }
 
        return 0;
@@ -1807,6 +1810,14 @@ static int vcn_v2_5_set_interrupt_state(struct amdgpu_device *adev,
        return 0;
 }
 
+static int vcn_v2_6_set_ras_interrupt_state(struct amdgpu_device *adev,
+                                       struct amdgpu_irq_src *source,
+                                       unsigned int type,
+                                       enum amdgpu_interrupt_state state)
+{
+       return 0;
+}
+
 static int vcn_v2_5_process_interrupt(struct amdgpu_device *adev,
                                      struct amdgpu_irq_src *source,
                                      struct amdgpu_iv_entry *entry)
@@ -1837,9 +1848,6 @@ static int vcn_v2_5_process_interrupt(struct amdgpu_device *adev,
        case VCN_2_0__SRCID__UVD_ENC_LOW_LATENCY:
                amdgpu_fence_process(&adev->vcn.inst[ip_instance].ring_enc[1]);
                break;
-       case VCN_2_6__SRCID_UVD_POISON:
-               amdgpu_vcn_process_poison_irq(adev, source, entry);
-               break;
        default:
                DRM_ERROR("Unhandled interrupt: %d %d\n",
                          entry->src_id, entry->src_data[0]);
@@ -1854,6 +1862,11 @@ static const struct amdgpu_irq_src_funcs vcn_v2_5_irq_funcs = {
        .process = vcn_v2_5_process_interrupt,
 };
 
+static const struct amdgpu_irq_src_funcs vcn_v2_6_ras_irq_funcs = {
+       .set = vcn_v2_6_set_ras_interrupt_state,
+       .process = amdgpu_vcn_process_poison_irq,
+};
+
 static void vcn_v2_5_set_irq_funcs(struct amdgpu_device *adev)
 {
        int i;
@@ -1863,6 +1876,9 @@ static void vcn_v2_5_set_irq_funcs(struct amdgpu_device *adev)
                        continue;
                adev->vcn.inst[i].irq.num_types = adev->vcn.num_enc_rings + 1;
                adev->vcn.inst[i].irq.funcs = &vcn_v2_5_irq_funcs;
+
+               adev->vcn.inst[i].ras_poison_irq.num_types = adev->vcn.num_enc_rings + 1;
+               adev->vcn.inst[i].ras_poison_irq.funcs = &vcn_v2_6_ras_irq_funcs;
        }
 }
 
@@ -1965,6 +1981,7 @@ const struct amdgpu_ras_block_hw_ops vcn_v2_6_ras_hw_ops = {
 static struct amdgpu_vcn_ras vcn_v2_6_ras = {
        .ras_block = {
                .hw_ops = &vcn_v2_6_ras_hw_ops,
+               .ras_late_init = amdgpu_vcn_ras_late_init,
        },
 };
 
index bf06740..da126ff 100644 (file)
@@ -129,7 +129,11 @@ static int vcn_v4_0_sw_init(void *handle)
                if (adev->vcn.harvest_config & (1 << i))
                        continue;
 
-               atomic_set(&adev->vcn.inst[i].sched_score, 0);
+               /* Init instance 0 sched_score to 1, so it's scheduled after other instances */
+               if (i == 0)
+                       atomic_set(&adev->vcn.inst[i].sched_score, 1);
+               else
+                       atomic_set(&adev->vcn.inst[i].sched_score, 0);
 
                /* VCN UNIFIED TRAP */
                r = amdgpu_irq_add_id(adev, amdgpu_ih_clientid_vcns[i],
@@ -139,7 +143,7 @@ static int vcn_v4_0_sw_init(void *handle)
 
                /* VCN POISON TRAP */
                r = amdgpu_irq_add_id(adev, amdgpu_ih_clientid_vcns[i],
-                               VCN_4_0__SRCID_UVD_POISON, &adev->vcn.inst[i].irq);
+                               VCN_4_0__SRCID_UVD_POISON, &adev->vcn.inst[i].ras_poison_irq);
                if (r)
                        return r;
 
@@ -305,8 +309,8 @@ static int vcn_v4_0_hw_fini(void *handle)
                         vcn_v4_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
                        }
                }
-
-               amdgpu_irq_put(adev, &adev->vcn.inst[i].irq, 0);
+               if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__VCN))
+                       amdgpu_irq_put(adev, &adev->vcn.inst[i].ras_poison_irq, 0);
        }
 
        return 0;
@@ -1976,6 +1980,24 @@ static int vcn_v4_0_set_interrupt_state(struct amdgpu_device *adev, struct amdgp
 }
 
 /**
+ * vcn_v4_0_set_ras_interrupt_state - set VCN block RAS interrupt state
+ *
+ * @adev: amdgpu_device pointer
+ * @source: interrupt sources
+ * @type: interrupt types
+ * @state: interrupt states
+ *
+ * Set VCN block RAS interrupt state
+ */
+static int vcn_v4_0_set_ras_interrupt_state(struct amdgpu_device *adev,
+       struct amdgpu_irq_src *source,
+       unsigned int type,
+       enum amdgpu_interrupt_state state)
+{
+       return 0;
+}
+
+/**
  * vcn_v4_0_process_interrupt - process VCN block interrupt
  *
  * @adev: amdgpu_device pointer
@@ -2007,9 +2029,6 @@ static int vcn_v4_0_process_interrupt(struct amdgpu_device *adev, struct amdgpu_
        case VCN_4_0__SRCID__UVD_ENC_GENERAL_PURPOSE:
                amdgpu_fence_process(&adev->vcn.inst[ip_instance].ring_enc[0]);
                break;
-       case VCN_4_0__SRCID_UVD_POISON:
-               amdgpu_vcn_process_poison_irq(adev, source, entry);
-               break;
        default:
                DRM_ERROR("Unhandled interrupt: %d %d\n",
                          entry->src_id, entry->src_data[0]);
@@ -2024,6 +2043,11 @@ static const struct amdgpu_irq_src_funcs vcn_v4_0_irq_funcs = {
        .process = vcn_v4_0_process_interrupt,
 };
 
+static const struct amdgpu_irq_src_funcs vcn_v4_0_ras_irq_funcs = {
+       .set = vcn_v4_0_set_ras_interrupt_state,
+       .process = amdgpu_vcn_process_poison_irq,
+};
+
 /**
  * vcn_v4_0_set_irq_funcs - set VCN block interrupt irq functions
  *
@@ -2041,6 +2065,9 @@ static void vcn_v4_0_set_irq_funcs(struct amdgpu_device *adev)
 
                adev->vcn.inst[i].irq.num_types = adev->vcn.num_enc_rings + 1;
                adev->vcn.inst[i].irq.funcs = &vcn_v4_0_irq_funcs;
+
+               adev->vcn.inst[i].ras_poison_irq.num_types = adev->vcn.num_enc_rings + 1;
+               adev->vcn.inst[i].ras_poison_irq.funcs = &vcn_v4_0_ras_irq_funcs;
        }
 }
 
@@ -2114,6 +2141,7 @@ const struct amdgpu_ras_block_hw_ops vcn_v4_0_ras_hw_ops = {
 static struct amdgpu_vcn_ras vcn_v4_0_ras = {
        .ras_block = {
                .hw_ops = &vcn_v4_0_ras_hw_ops,
+               .ras_late_init = amdgpu_vcn_ras_late_init,
        },
 };
 
index 531f173..c0360db 100644 (file)
@@ -542,8 +542,15 @@ static u32 vi_get_xclk(struct amdgpu_device *adev)
        u32 reference_clock = adev->clock.spll.reference_freq;
        u32 tmp;
 
-       if (adev->flags & AMD_IS_APU)
-               return reference_clock;
+       if (adev->flags & AMD_IS_APU) {
+               switch (adev->asic_type) {
+               case CHIP_STONEY:
+                       /* vbios says 48 MHz, but the actual frequency is 100 MHz */
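+                       /* xclk is reported in units of 10 kHz */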
+                       return 10000;
+               default:
+                       return reference_clock;
+               }
+       }
 
        tmp = RREG32_SMC(ixCG_CLKPIN_CNTL_2);
        if (REG_GET_FIELD(tmp, CG_CLKPIN_CNTL_2, MUX_TCLK_TO_XCLK))
index 8b4b186..7acd73e 100644 (file)
@@ -2479,20 +2479,25 @@ static void dm_gpureset_toggle_interrupts(struct amdgpu_device *adev,
                if (acrtc && state->stream_status[i].plane_count != 0) {
                        irq_source = IRQ_TYPE_PFLIP + acrtc->otg_inst;
                        rc = dc_interrupt_set(adev->dm.dc, irq_source, enable) ? 0 : -EBUSY;
-                       DRM_DEBUG_VBL("crtc %d - vupdate irq %sabling: r=%d\n",
-                                     acrtc->crtc_id, enable ? "en" : "dis", rc);
                        if (rc)
                                DRM_WARN("Failed to %s pflip interrupts\n",
                                         enable ? "enable" : "disable");
 
                        if (enable) {
-                               rc = amdgpu_dm_crtc_enable_vblank(&acrtc->base);
-                               if (rc)
-                                       DRM_WARN("Failed to enable vblank interrupts\n");
-                       } else {
-                               amdgpu_dm_crtc_disable_vblank(&acrtc->base);
-                       }
+                               if (amdgpu_dm_crtc_vrr_active(to_dm_crtc_state(acrtc->base.state)))
+                                       rc = amdgpu_dm_crtc_set_vupdate_irq(&acrtc->base, true);
+                       } else
+                               rc = amdgpu_dm_crtc_set_vupdate_irq(&acrtc->base, false);
+
+                       if (rc)
+                               DRM_WARN("Failed to %sable vupdate interrupt\n", enable ? "en" : "dis");
 
+                       irq_source = IRQ_TYPE_VBLANK + acrtc->otg_inst;
+                       /* During gpu-reset we disable and then enable vblank irq, so
+                        * don't use amdgpu_irq_get/put() to avoid refcount change.
+                        */
+                       if (!dc_interrupt_set(adev->dm.dc, irq_source, enable))
+                               DRM_WARN("Failed to %sable vblank interrupt\n", enable ? "en" : "dis");
                }
        }
 
@@ -2852,7 +2857,7 @@ static int dm_resume(void *handle)
                 * this is the case when traversing through already created
                 * MST connectors, should be skipped
                 */
-               if (aconnector->dc_link->type == dc_connection_mst_branch)
+               if (aconnector && aconnector->mst_root)
                        continue;
 
                mutex_lock(&aconnector->hpd_lock);
@@ -6737,7 +6742,7 @@ static int dm_encoder_helper_atomic_check(struct drm_encoder *encoder,
        int clock, bpp = 0;
        bool is_y420 = false;
 
-       if (!aconnector->mst_output_port || !aconnector->dc_sink)
+       if (!aconnector->mst_output_port)
                return 0;
 
        mst_port = aconnector->mst_output_port;
@@ -7191,7 +7196,13 @@ static int amdgpu_dm_connector_get_modes(struct drm_connector *connector)
                                drm_add_modes_noedid(connector, 1920, 1080);
        } else {
                amdgpu_dm_connector_ddc_get_modes(connector, edid);
-               amdgpu_dm_connector_add_common_modes(encoder, connector);
+               /* Most eDP panels support only the timings from their EDID,
+                * and usually only detailed timings are available from the
+                * eDP EDID. Timings that are not from the EDID may damage
+                * the eDP panel.
+                */
+               if (connector->connector_type != DRM_MODE_CONNECTOR_eDP)
+                       amdgpu_dm_connector_add_common_modes(encoder, connector);
                amdgpu_dm_connector_add_freesync_modes(connector, edid);
        }
        amdgpu_dm_fbc_init(connector);
@@ -8193,6 +8204,12 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
                if (acrtc_state->abm_level != dm_old_crtc_state->abm_level)
                        bundle->stream_update.abm_level = &acrtc_state->abm_level;
 
+               mutex_lock(&dm->dc_lock);
+               if ((acrtc_state->update_type > UPDATE_TYPE_FAST) &&
+                               acrtc_state->stream->link->psr_settings.psr_allow_active)
+                       amdgpu_dm_psr_disable(acrtc_state->stream);
+               mutex_unlock(&dm->dc_lock);
+
                /*
                 * If FreeSync state on the stream has changed then we need to
                 * re-adjust the min/max bounds now that DC doesn't handle this
@@ -8206,10 +8223,6 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
                        spin_unlock_irqrestore(&pcrtc->dev->event_lock, flags);
                }
                mutex_lock(&dm->dc_lock);
-               if ((acrtc_state->update_type > UPDATE_TYPE_FAST) &&
-                               acrtc_state->stream->link->psr_settings.psr_allow_active)
-                       amdgpu_dm_psr_disable(acrtc_state->stream);
-
                update_planes_and_stream_adapter(dm->dc,
                                         acrtc_state->update_type,
                                         planes_count,
index e3762e8..440fc08 100644 (file)
@@ -146,7 +146,6 @@ static void vblank_control_worker(struct work_struct *work)
 
 static inline int dm_set_vblank(struct drm_crtc *crtc, bool enable)
 {
-       enum dc_irq_source irq_source;
        struct amdgpu_crtc *acrtc = to_amdgpu_crtc(crtc);
        struct amdgpu_device *adev = drm_to_adev(crtc->dev);
        struct dm_crtc_state *acrtc_state = to_dm_crtc_state(crtc->state);
@@ -169,18 +168,9 @@ static inline int dm_set_vblank(struct drm_crtc *crtc, bool enable)
        if (rc)
                return rc;
 
-       if (amdgpu_in_reset(adev)) {
-               irq_source = IRQ_TYPE_VBLANK + acrtc->otg_inst;
-               /* During gpu-reset we disable and then enable vblank irq, so
-                * don't use amdgpu_irq_get/put() to avoid refcount change.
-                */
-               if (!dc_interrupt_set(adev->dm.dc, irq_source, enable))
-                       rc = -EBUSY;
-       } else {
-               rc = (enable)
-                       ? amdgpu_irq_get(adev, &adev->crtc_irq, acrtc->crtc_id)
-                       : amdgpu_irq_put(adev, &adev->crtc_irq, acrtc->crtc_id);
-       }
+       rc = (enable)
+               ? amdgpu_irq_get(adev, &adev->crtc_irq, acrtc->crtc_id)
+               : amdgpu_irq_put(adev, &adev->crtc_irq, acrtc->crtc_id);
 
        if (rc)
                return rc;
index 52564b9..7cde67b 100644 (file)
@@ -1981,6 +1981,9 @@ static enum dc_status dc_commit_state_no_check(struct dc *dc, struct dc_state *c
        return result;
 }
 
+static bool commit_minimal_transition_state(struct dc *dc,
+               struct dc_state *transition_base_context);
+
 /**
  * dc_commit_streams - Commit current stream state
  *
@@ -2002,6 +2005,8 @@ enum dc_status dc_commit_streams(struct dc *dc,
        struct dc_state *context;
        enum dc_status res = DC_OK;
        struct dc_validation_set set[MAX_STREAMS] = {0};
+       struct pipe_ctx *pipe;
+       bool handle_exit_odm2to1 = false;
 
        if (dc->ctx->dce_environment == DCE_ENV_VIRTUAL_HW)
                return res;
@@ -2026,6 +2031,22 @@ enum dc_status dc_commit_streams(struct dc *dc,
                }
        }
 
+       /* Check for the case where we are going from ODM 2:1 to the max
+        * pipe scenario.  For these cases, call
+        * commit_minimal_transition_state() to exit ODM 2:1 first,
+        * before processing the new streams.
+        */
+       if (stream_count == dc->res_pool->pipe_count) {
+               for (i = 0; i < dc->res_pool->pipe_count; i++) {
+                       pipe = &dc->current_state->res_ctx.pipe_ctx[i];
+                       if (pipe->next_odm_pipe)
+                               handle_exit_odm2to1 = true;
+               }
+       }
+
+       if (handle_exit_odm2to1)
+               res = commit_minimal_transition_state(dc, dc->current_state);
+
        context = dc_create_state(dc);
        if (!context)
                goto context_alloc_fail;
@@ -3872,6 +3893,7 @@ static bool commit_minimal_transition_state(struct dc *dc,
        unsigned int i, j;
        unsigned int pipe_in_use = 0;
        bool subvp_in_use = false;
+       bool odm_in_use = false;
 
        if (!transition_context)
                return false;
@@ -3900,6 +3922,18 @@ static bool commit_minimal_transition_state(struct dc *dc,
                }
        }
 
+       /* If ODM is enabled and we are adding or removing planes from any ODM
+        * pipe, we must use the minimal transition.
+        */
+       for (i = 0; i < dc->res_pool->pipe_count; i++) {
+               struct pipe_ctx *pipe = &dc->current_state->res_ctx.pipe_ctx[i];
+
+               if (pipe->stream && pipe->next_odm_pipe) {
+                       odm_in_use = true;
+                       break;
+               }
+       }
+
        /* When the OS add a new surface if we have been used all of pipes with odm combine
         * and mpc split feature, it need use commit_minimal_transition_state to transition safely.
         * After OS exit MPO, it will back to use odm and mpc split with all of pipes, we need
@@ -3908,7 +3942,7 @@ static bool commit_minimal_transition_state(struct dc *dc,
         * Reduce the scenarios to use dc_commit_state_no_check in the stage of flip. Especially
         * enter/exit MPO when DCN still have enough resources.
         */
-       if (pipe_in_use != dc->res_pool->pipe_count && !subvp_in_use) {
+       if (pipe_in_use != dc->res_pool->pipe_count && !subvp_in_use && !odm_in_use) {
                dc_release_state(transition_context);
                return true;
        }
index 117d80c..fe15513 100644 (file)
@@ -1446,6 +1446,26 @@ static int acquire_first_split_pipe(
 
                        split_pipe->stream = stream;
                        return i;
+               } else if (split_pipe->prev_odm_pipe &&
+                               split_pipe->prev_odm_pipe->plane_state == split_pipe->plane_state) {
+                       split_pipe->prev_odm_pipe->next_odm_pipe = split_pipe->next_odm_pipe;
+                       if (split_pipe->next_odm_pipe)
+                               split_pipe->next_odm_pipe->prev_odm_pipe = split_pipe->prev_odm_pipe;
+
+                       if (split_pipe->prev_odm_pipe->plane_state)
+                               resource_build_scaling_params(split_pipe->prev_odm_pipe);
+
+                       memset(split_pipe, 0, sizeof(*split_pipe));
+                       split_pipe->stream_res.tg = pool->timing_generators[i];
+                       split_pipe->plane_res.hubp = pool->hubps[i];
+                       split_pipe->plane_res.ipp = pool->ipps[i];
+                       split_pipe->plane_res.dpp = pool->dpps[i];
+                       split_pipe->stream_res.opp = pool->opps[i];
+                       split_pipe->plane_res.mpcc_inst = pool->dpps[i]->inst;
+                       split_pipe->pipe_idx = i;
+
+                       split_pipe->stream = stream;
+                       return i;
                }
        }
        return -1;
index 422fbf7..5403e93 100644 (file)
@@ -2113,15 +2113,6 @@ void dcn20_optimize_bandwidth(
        if (hubbub->funcs->program_compbuf_size)
                hubbub->funcs->program_compbuf_size(hubbub, context->bw_ctx.bw.dcn.compbuf_size_kb, true);
 
-       if (context->bw_ctx.bw.dcn.clk.fw_based_mclk_switching) {
-               dc_dmub_srv_p_state_delegate(dc,
-                       true, context);
-               context->bw_ctx.bw.dcn.clk.p_state_change_support = true;
-               dc->clk_mgr->clks.fw_based_mclk_switching = true;
-       } else {
-               dc->clk_mgr->clks.fw_based_mclk_switching = false;
-       }
-
        dc->clk_mgr->funcs->update_clocks(
                        dc->clk_mgr,
                        context,
index 8263a07..32121db 100644 (file)
@@ -983,36 +983,13 @@ void dcn30_set_disp_pattern_generator(const struct dc *dc,
 }
 
 void dcn30_prepare_bandwidth(struct dc *dc,
-       struct dc_state *context)
+                            struct dc_state *context)
 {
-       bool p_state_change_support = context->bw_ctx.bw.dcn.clk.p_state_change_support;
-       /* Any transition into an FPO config should disable MCLK switching first to avoid
-        * driver and FW P-State synchronization issues.
-        */
-       if (context->bw_ctx.bw.dcn.clk.fw_based_mclk_switching || dc->clk_mgr->clks.fw_based_mclk_switching) {
-               dc->optimized_required = true;
-               context->bw_ctx.bw.dcn.clk.p_state_change_support = false;
-       }
-
        if (dc->clk_mgr->dc_mode_softmax_enabled)
                if (dc->clk_mgr->clks.dramclk_khz <= dc->clk_mgr->bw_params->dc_mode_softmax_memclk * 1000 &&
                                context->bw_ctx.bw.dcn.clk.dramclk_khz > dc->clk_mgr->bw_params->dc_mode_softmax_memclk * 1000)
                        dc->clk_mgr->funcs->set_max_memclk(dc->clk_mgr, dc->clk_mgr->bw_params->clk_table.entries[dc->clk_mgr->bw_params->clk_table.num_entries - 1].memclk_mhz);
 
        dcn20_prepare_bandwidth(dc, context);
-       /*
-        * enabled -> enabled: do not disable
-        * enabled -> disabled: disable
-        * disabled -> enabled: don't care
-        * disabled -> disabled: don't care
-        */
-       if (!context->bw_ctx.bw.dcn.clk.fw_based_mclk_switching)
-               dc_dmub_srv_p_state_delegate(dc, false, context);
-
-       if (context->bw_ctx.bw.dcn.clk.fw_based_mclk_switching || dc->clk_mgr->clks.fw_based_mclk_switching) {
-               /* After disabling P-State, restore the original value to ensure we get the correct P-State
-                * on the next optimize. */
-               context->bw_ctx.bw.dcn.clk.p_state_change_support = p_state_change_support;
-       }
 }
 
index 40c488b..cc3fe9c 100644 (file)
@@ -423,3 +423,68 @@ void dcn314_hubp_pg_control(struct dce_hwseq *hws, unsigned int hubp_inst, bool
 
        PERF_TRACE();
 }
+static void apply_symclk_on_tx_off_wa(struct dc_link *link)
+{
+       /* There are use cases where SYMCLK is referenced by OTG. For instance
+        * for a TMDS signal, OTG relies on SYMCLK even if the TX video output is off.
+        * However, the current link interface powers off the PHY when disabling link
+        * output, which turns off the SYMCLK generated by the PHY. The workaround is
+        * to identify the case where SYMCLK is still in use by OTG when we
+        * power off the PHY. When this is detected, we will temporarily power the PHY
+        * back on and move PHY's SYMCLK state to SYMCLK_ON_TX_OFF by calling
+        * program_pix_clk interface. When OTG is disabled, we will then power
+        * off PHY by calling disable link output again.
+        *
+        * In future dcn generations, we plan to rework transmitter control
+        * interface so that we could have an option to set SYMCLK ON TX OFF
+        * state in one step without this workaround
+        */
+
+       struct dc *dc = link->ctx->dc;
+       struct pipe_ctx *pipe_ctx = NULL;
+       uint8_t i;
+
+       if (link->phy_state.symclk_ref_cnts.otg > 0) {
+               for (i = 0; i < MAX_PIPES; i++) {
+                       pipe_ctx = &dc->current_state->res_ctx.pipe_ctx[i];
+                       if (pipe_ctx->stream && pipe_ctx->stream->link == link && pipe_ctx->top_pipe == NULL) {
+                               pipe_ctx->clock_source->funcs->program_pix_clk(
+                                               pipe_ctx->clock_source,
+                                               &pipe_ctx->stream_res.pix_clk_params,
+                                               dc->link_srv->dp_get_encoding_format(
+                                                               &pipe_ctx->link_config.dp_link_settings),
+                                               &pipe_ctx->pll_settings);
+                               link->phy_state.symclk_state = SYMCLK_ON_TX_OFF;
+                               break;
+                       }
+               }
+       }
+}
+
+void dcn314_disable_link_output(struct dc_link *link,
+               const struct link_resource *link_res,
+               enum signal_type signal)
+{
+       struct dc *dc = link->ctx->dc;
+       const struct link_hwss *link_hwss = get_link_hwss(link, link_res);
+       struct dmcu *dmcu = dc->res_pool->dmcu;
+
+       if (signal == SIGNAL_TYPE_EDP &&
+                       link->dc->hwss.edp_backlight_control)
+               link->dc->hwss.edp_backlight_control(link, false);
+       else if (dmcu != NULL && dmcu->funcs->lock_phy)
+               dmcu->funcs->lock_phy(dmcu);
+
+       link_hwss->disable_link_output(link, link_res, signal);
+       link->phy_state.symclk_state = SYMCLK_OFF_TX_OFF;
+       /*
+        * Add the logic to extract BOTH the power-up and power-down sequences
+        * from enable/disable link output, and only call eDP panel control
+        * once, in enable_link_dp and disable_link_dp.
+        */
+       if (dmcu != NULL && dmcu->funcs->lock_phy)
+               dmcu->funcs->unlock_phy(dmcu);
+       dc->link_srv->dp_trace_source_sequence(link, DPCD_SOURCE_SEQ_AFTER_DISABLE_LINK_PHY);
+
+       apply_symclk_on_tx_off_wa(link);
+}
index c786d5e..6d0b625 100644 (file)
@@ -45,4 +45,6 @@ void dcn314_hubp_pg_control(struct dce_hwseq *hws, unsigned int hubp_inst, bool
 
 void dcn314_dpp_root_clock_control(struct dce_hwseq *hws, unsigned int dpp_inst, bool clock_on);
 
+void dcn314_disable_link_output(struct dc_link *link, const struct link_resource *link_res, enum signal_type signal);
+
 #endif /* __DC_HWSS_DCN314_H__ */
index 5267e90..a588f46 100644 (file)
@@ -105,7 +105,7 @@ static const struct hw_sequencer_funcs dcn314_funcs = {
        .enable_lvds_link_output = dce110_enable_lvds_link_output,
        .enable_tmds_link_output = dce110_enable_tmds_link_output,
        .enable_dp_link_output = dce110_enable_dp_link_output,
-       .disable_link_output = dce110_disable_link_output,
+       .disable_link_output = dcn314_disable_link_output,
        .z10_restore = dcn31_z10_restore,
        .z10_save_init = dcn31_z10_save_init,
        .set_disp_pattern_generator = dcn30_set_disp_pattern_generator,
index 47beb4e..0c4c320 100644 (file)
@@ -138,7 +138,7 @@ struct _vcs_dpi_soc_bounding_box_st dcn3_2_soc = {
        .urgent_out_of_order_return_per_channel_pixel_only_bytes = 4096,
        .urgent_out_of_order_return_per_channel_pixel_and_vm_bytes = 4096,
        .urgent_out_of_order_return_per_channel_vm_only_bytes = 4096,
-       .pct_ideal_sdp_bw_after_urgent = 100.0,
+       .pct_ideal_sdp_bw_after_urgent = 90.0,
        .pct_ideal_fabric_bw_after_urgent = 67.0,
        .pct_ideal_dram_sdp_bw_after_urgent_pixel_only = 20.0,
        .pct_ideal_dram_sdp_bw_after_urgent_pixel_and_vm = 60.0, // N/A, for now keep as is until DML implemented
index 13c7e73..d75248b 100644 (file)
@@ -810,7 +810,8 @@ static void DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerforman
                                        v->SwathHeightY[k],
                                        v->SwathHeightC[k],
                                        TWait,
-                                       v->DRAMSpeedPerState[mode_lib->vba.VoltageLevel] <= MEM_STROBE_FREQ_MHZ ?
+                                       (v->DRAMSpeedPerState[mode_lib->vba.VoltageLevel] <= MEM_STROBE_FREQ_MHZ ||
+                                               v->DCFCLKPerState[mode_lib->vba.VoltageLevel] <= MIN_DCFCLK_FREQ_MHZ) ?
                                                        mode_lib->vba.ip.min_prefetch_in_strobe_us : 0,
                                        /* Output */
                                        &v->DSTXAfterScaler[k],
@@ -3310,7 +3311,7 @@ void dml32_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_l
                                                        v->swath_width_chroma_ub_this_state[k],
                                                        v->SwathHeightYThisState[k],
                                                        v->SwathHeightCThisState[k], v->TWait,
-                                                       v->DRAMSpeedPerState[i] <= MEM_STROBE_FREQ_MHZ ?
+                                                       (v->DRAMSpeedPerState[i] <= MEM_STROBE_FREQ_MHZ || v->DCFCLKState[i][j] <= MIN_DCFCLK_FREQ_MHZ) ?
                                                                        mode_lib->vba.ip.min_prefetch_in_strobe_us : 0,
 
                                                        /* Output */
index 500b3dd..d98e36a 100644 (file)
@@ -53,6 +53,7 @@
 #define BPP_BLENDED_PIPE 0xffffffff
 
 #define MEM_STROBE_FREQ_MHZ 1600
+#define MIN_DCFCLK_FREQ_MHZ 200
 #define MEM_STROBE_MAX_DELIVERY_TIME_US 60.0
 
 struct display_mode_lib;
index a131e30..d471d58 100644 (file)
@@ -980,6 +980,11 @@ static bool detect_link_and_local_sink(struct dc_link *link,
                                        (link->dpcd_caps.dongle_type !=
                                                        DISPLAY_DONGLE_DP_HDMI_CONVERTER))
                                converter_disable_audio = true;
+
+                       /* Limit the link rate to HBR3 for DPIA until we implement USB4 V2 */
+                       if (link->ep_type == DISPLAY_ENDPOINT_USB4_DPIA &&
+                                       link->reported_link_cap.link_rate > LINK_RATE_HIGH3)
+                               link->reported_link_cap.link_rate = LINK_RATE_HIGH3;
                        break;
                }
 
index d4b7da5..e8b2fc4 100644 (file)
@@ -359,5 +359,8 @@ bool link_validate_dpia_bandwidth(const struct dc_stream_state *stream, const un
                link[i] = stream[i].link;
                bw_needed[i] = dc_bandwidth_in_kbps_from_timing(&stream[i].timing);
        }
+
+       ret = dpia_validate_usb4_bw(link, bw_needed, num_streams);
+
        return ret;
 }
index 300e156..078aaaa 100644 (file)
@@ -36,6 +36,8 @@
 #define amdgpu_dpm_enable_bapm(adev, e) \
                ((adev)->powerplay.pp_funcs->enable_bapm((adev)->powerplay.pp_handle, (e)))
 
+#define amdgpu_dpm_is_legacy_dpm(adev) ((adev)->powerplay.pp_handle == (adev))
+
 int amdgpu_dpm_get_sclk(struct amdgpu_device *adev, bool low)
 {
        const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
@@ -1460,15 +1462,24 @@ int amdgpu_dpm_get_smu_prv_buf_details(struct amdgpu_device *adev,
 
 int amdgpu_dpm_is_overdrive_supported(struct amdgpu_device *adev)
 {
-       struct pp_hwmgr *hwmgr = adev->powerplay.pp_handle;
-       struct smu_context *smu = adev->powerplay.pp_handle;
+       if (is_support_sw_smu(adev)) {
+               struct smu_context *smu = adev->powerplay.pp_handle;
 
-       if ((is_support_sw_smu(adev) && smu->od_enabled) ||
-           (is_support_sw_smu(adev) && smu->is_apu) ||
-               (!is_support_sw_smu(adev) && hwmgr->od_enabled))
-               return true;
+               return (smu->od_enabled || smu->is_apu);
+       } else {
+               struct pp_hwmgr *hwmgr;
 
-       return false;
+               /*
+                * DPM on some legacy ASICs doesn't carry the od_enabled member,
+                * as its pp_handle is cast directly from adev.
+                */
+               if (amdgpu_dpm_is_legacy_dpm(adev))
+                       return false;
+
+               hwmgr = (struct pp_hwmgr *)adev->powerplay.pp_handle;
+
+               return hwmgr->od_enabled;
+       }
 }
 
 int amdgpu_dpm_set_pp_table(struct amdgpu_device *adev,
index 58c2246..f4f4045 100644 (file)
@@ -871,13 +871,11 @@ static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev,
        }
        if (ret == -ENOENT) {
                size = amdgpu_dpm_print_clock_levels(adev, OD_SCLK, buf);
-               if (size > 0) {
-                       size += amdgpu_dpm_print_clock_levels(adev, OD_MCLK, buf + size);
-                       size += amdgpu_dpm_print_clock_levels(adev, OD_VDDC_CURVE, buf + size);
-                       size += amdgpu_dpm_print_clock_levels(adev, OD_VDDGFX_OFFSET, buf + size);
-                       size += amdgpu_dpm_print_clock_levels(adev, OD_RANGE, buf + size);
-                       size += amdgpu_dpm_print_clock_levels(adev, OD_CCLK, buf + size);
-               }
+               size += amdgpu_dpm_print_clock_levels(adev, OD_MCLK, buf + size);
+               size += amdgpu_dpm_print_clock_levels(adev, OD_VDDC_CURVE, buf + size);
+               size += amdgpu_dpm_print_clock_levels(adev, OD_VDDGFX_OFFSET, buf + size);
+               size += amdgpu_dpm_print_clock_levels(adev, OD_RANGE, buf + size);
+               size += amdgpu_dpm_print_clock_levels(adev, OD_CCLK, buf + size);
        }
 
        if (size == 0)
index d6d9e3b..02e69cc 100644 (file)
@@ -6925,23 +6925,6 @@ static int si_dpm_enable(struct amdgpu_device *adev)
        return 0;
 }
 
-static int si_set_temperature_range(struct amdgpu_device *adev)
-{
-       int ret;
-
-       ret = si_thermal_enable_alert(adev, false);
-       if (ret)
-               return ret;
-       ret = si_thermal_set_temperature_range(adev, R600_TEMP_RANGE_MIN, R600_TEMP_RANGE_MAX);
-       if (ret)
-               return ret;
-       ret = si_thermal_enable_alert(adev, true);
-       if (ret)
-               return ret;
-
-       return ret;
-}
-
 static void si_dpm_disable(struct amdgpu_device *adev)
 {
        struct rv7xx_power_info *pi = rv770_get_pi(adev);
@@ -7626,18 +7609,6 @@ static int si_dpm_process_interrupt(struct amdgpu_device *adev,
 
 static int si_dpm_late_init(void *handle)
 {
-       int ret;
-       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
-
-       if (!adev->pm.dpm_enabled)
-               return 0;
-
-       ret = si_set_temperature_range(adev);
-       if (ret)
-               return ret;
-#if 0 //TODO ?
-       si_dpm_powergate_uvd(adev, true);
-#endif
        return 0;
 }
 
index 5633c57..2ddf519 100644 (file)
@@ -733,6 +733,24 @@ static int smu_late_init(void *handle)
                return ret;
        }
 
+       /*
+        * Explicitly notify PMFW of the power mode the system is in,
+        * since PMFW may boot the ASIC with a different mode.
+        * For ASICs supporting AC/DC switch via GPIO, PMFW will
+        * handle the switch automatically and driver involvement
+        * is unnecessary.
+        */
+       if (!smu->dc_controlled_by_gpio) {
+               ret = smu_set_power_source(smu,
+                                          adev->pm.ac_power ? SMU_POWER_SOURCE_AC :
+                                          SMU_POWER_SOURCE_DC);
+               if (ret) {
+                       dev_err(adev->dev, "Failed to switch to %s mode!\n",
+                               adev->pm.ac_power ? "AC" : "DC");
+                       return ret;
+               }
+       }
+
        if ((adev->ip_versions[MP1_HWIP][0] == IP_VERSION(13, 0, 1)) ||
            (adev->ip_versions[MP1_HWIP][0] == IP_VERSION(13, 0, 3)))
                return 0;
index c400051..275f708 100644 (file)
@@ -3413,26 +3413,8 @@ static int navi10_post_smu_init(struct smu_context *smu)
                return 0;
 
        ret = navi10_run_umc_cdr_workaround(smu);
-       if (ret) {
+       if (ret)
                dev_err(adev->dev, "Failed to apply umc cdr workaround!\n");
-               return ret;
-       }
-
-       if (!smu->dc_controlled_by_gpio) {
-               /*
-                * For Navi1X, manually switch it to AC mode as PMFW
-                * may boot it with DC mode.
-                */
-               ret = smu_v11_0_set_power_source(smu,
-                                                adev->pm.ac_power ?
-                                                SMU_POWER_SOURCE_AC :
-                                                SMU_POWER_SOURCE_DC);
-               if (ret) {
-                       dev_err(adev->dev, "Failed to switch to %s mode!\n",
-                                       adev->pm.ac_power ? "AC" : "DC");
-                       return ret;
-               }
-       }
 
        return ret;
 }
index 75f1868..85d5359 100644 (file)
@@ -2067,33 +2067,94 @@ static int sienna_cichlid_display_disable_memory_clock_switch(struct smu_context
        return ret;
 }
 
+static void sienna_cichlid_get_override_pcie_settings(struct smu_context *smu,
+                                                     uint32_t *gen_speed_override,
+                                                     uint32_t *lane_width_override)
+{
+       struct amdgpu_device *adev = smu->adev;
+
+       *gen_speed_override = 0xff;
+       *lane_width_override = 0xff;
+
+       switch (adev->pdev->device) {
+       case 0x73A0:
+       case 0x73A1:
+       case 0x73A2:
+       case 0x73A3:
+       case 0x73AB:
+       case 0x73AE:
+               /* Bits 7:0: PCIE lane width, where 1 to 7 corresponds to x1 to x32 */
+               *lane_width_override = 6;
+               break;
+       case 0x73E0:
+       case 0x73E1:
+       case 0x73E3:
+               *lane_width_override = 4;
+               break;
+       case 0x7420:
+       case 0x7421:
+       case 0x7422:
+       case 0x7423:
+       case 0x7424:
+               *lane_width_override = 3;
+               break;
+       default:
+               break;
+       }
+}
+
+#define MAX(a, b)      ((a) > (b) ? (a) : (b))
+
 static int sienna_cichlid_update_pcie_parameters(struct smu_context *smu,
                                         uint32_t pcie_gen_cap,
                                         uint32_t pcie_width_cap)
 {
        struct smu_11_0_dpm_context *dpm_context = smu->smu_dpm.dpm_context;
-
-       uint32_t smu_pcie_arg;
+       struct smu_11_0_pcie_table *pcie_table = &dpm_context->dpm_tables.pcie_table;
+       uint32_t gen_speed_override, lane_width_override;
        uint8_t *table_member1, *table_member2;
+       uint32_t min_gen_speed, max_gen_speed;
+       uint32_t min_lane_width, max_lane_width;
+       uint32_t smu_pcie_arg;
        int ret, i;
 
        GET_PPTABLE_MEMBER(PcieGenSpeed, &table_member1);
        GET_PPTABLE_MEMBER(PcieLaneCount, &table_member2);
 
-       /* lclk dpm table setup */
-       for (i = 0; i < MAX_PCIE_CONF; i++) {
-               dpm_context->dpm_tables.pcie_table.pcie_gen[i] = table_member1[i];
-               dpm_context->dpm_tables.pcie_table.pcie_lane[i] = table_member2[i];
+       sienna_cichlid_get_override_pcie_settings(smu,
+                                                 &gen_speed_override,
+                                                 &lane_width_override);
+
+       /* PCIE gen speed override */
+       if (gen_speed_override != 0xff) {
+               min_gen_speed = MIN(pcie_gen_cap, gen_speed_override);
+               max_gen_speed = MIN(pcie_gen_cap, gen_speed_override);
+       } else {
+               min_gen_speed = MAX(0, table_member1[0]);
+               max_gen_speed = MIN(pcie_gen_cap, table_member1[1]);
+               min_gen_speed = min_gen_speed > max_gen_speed ?
+                               max_gen_speed : min_gen_speed;
        }
+       pcie_table->pcie_gen[0] = min_gen_speed;
+       pcie_table->pcie_gen[1] = max_gen_speed;
+
+       /* PCIE lane width override */
+       if (lane_width_override != 0xff) {
+               min_lane_width = MIN(pcie_width_cap, lane_width_override);
+               max_lane_width = MIN(pcie_width_cap, lane_width_override);
+       } else {
+               min_lane_width = MAX(1, table_member2[0]);
+               max_lane_width = MIN(pcie_width_cap, table_member2[1]);
+               min_lane_width = min_lane_width > max_lane_width ?
+                                max_lane_width : min_lane_width;
+       }
+       pcie_table->pcie_lane[0] = min_lane_width;
+       pcie_table->pcie_lane[1] = max_lane_width;
 
        for (i = 0; i < NUM_LINK_LEVELS; i++) {
-               smu_pcie_arg = (i << 16) |
-                       ((table_member1[i] <= pcie_gen_cap) ?
-                        (table_member1[i] << 8) :
-                        (pcie_gen_cap << 8)) |
-                       ((table_member2[i] <= pcie_width_cap) ?
-                        table_member2[i] :
-                        pcie_width_cap);
+               smu_pcie_arg = (i << 16 |
+                               pcie_table->pcie_gen[i] << 8 |
+                               pcie_table->pcie_lane[i]);
 
                ret = smu_cmn_send_smc_msg_with_param(smu,
                                SMU_MSG_OverridePcieParameters,
@@ -2101,11 +2162,6 @@ static int sienna_cichlid_update_pcie_parameters(struct smu_context *smu,
                                NULL);
                if (ret)
                        return ret;
-
-               if (table_member1[i] > pcie_gen_cap)
-                       dpm_context->dpm_tables.pcie_table.pcie_gen[i] = pcie_gen_cap;
-               if (table_member2[i] > pcie_width_cap)
-                       dpm_context->dpm_tables.pcie_table.pcie_lane[i] = pcie_width_cap;
        }
 
        return 0;
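
The rework above stops copying the full per-level PCIe table and instead derives a single min/max pair for gen speed and lane width, optionally pinned by a per-device override, then packs each link level into the SMU_MSG_OverridePcieParameters argument as (level << 16 | gen << 8 | lane). The following is a minimal standalone sketch of that clamp-and-pack arithmetic; clamp_range() and pack_pcie_arg() are illustrative helpers, not driver functions, and the sample values are made up.

#include <stdint.h>
#include <stdio.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Clamp a (table_min, table_max) pair against a capability, or pin both ends
 * to the capped override value when an override (!= 0xff) is present. */
static void clamp_range(uint32_t cap, uint32_t override,
			uint32_t tbl_min, uint32_t tbl_max,
			uint32_t *out_min, uint32_t *out_max)
{
	if (override != 0xff) {
		*out_min = MIN(cap, override);
		*out_max = MIN(cap, override);
	} else {
		*out_min = MAX(0, tbl_min);
		*out_max = MIN(cap, tbl_max);
		if (*out_min > *out_max)
			*out_min = *out_max;
	}
}

/* Pack one link level: bits 16+ = level, 15:8 = gen speed, 7:0 = lane width. */
static uint32_t pack_pcie_arg(uint32_t level, uint32_t gen, uint32_t lane)
{
	return level << 16 | gen << 8 | lane;
}

int main(void)
{
	uint32_t gen_min, gen_max;

	/* No override: table range 1..4 clamped by a gen-3 capability. */
	clamp_range(3, 0xff, 1, 4, &gen_min, &gen_max);
	printf("gen %u..%u, arg for level 1: 0x%06x\n",
	       gen_min, gen_max, pack_pcie_arg(1, gen_max, 6));
	return 0;
}
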
index 7433dca..067b4e0 100644 (file)
@@ -582,7 +582,7 @@ static int vangogh_print_legacy_clk_levels(struct smu_context *smu,
        DpmClocks_t *clk_table = smu->smu_table.clocks_table;
        SmuMetrics_legacy_t metrics;
        struct smu_dpm_context *smu_dpm_ctx = &(smu->smu_dpm);
-       int i, size = 0, ret = 0;
+       int i, idx, size = 0, ret = 0;
        uint32_t cur_value = 0, value = 0, count = 0;
        bool cur_value_match_level = false;
 
@@ -656,7 +656,8 @@ static int vangogh_print_legacy_clk_levels(struct smu_context *smu,
        case SMU_MCLK:
        case SMU_FCLK:
                for (i = 0; i < count; i++) {
-                       ret = vangogh_get_dpm_clk_limited(smu, clk_type, i, &value);
+                       idx = (clk_type == SMU_FCLK || clk_type == SMU_MCLK) ? (count - i - 1) : i;
+                       ret = vangogh_get_dpm_clk_limited(smu, clk_type, idx, &value);
                        if (ret)
                                return ret;
                        if (!value)
@@ -683,7 +684,7 @@ static int vangogh_print_clk_levels(struct smu_context *smu,
        DpmClocks_t *clk_table = smu->smu_table.clocks_table;
        SmuMetrics_t metrics;
        struct smu_dpm_context *smu_dpm_ctx = &(smu->smu_dpm);
-       int i, size = 0, ret = 0;
+       int i, idx, size = 0, ret = 0;
        uint32_t cur_value = 0, value = 0, count = 0;
        bool cur_value_match_level = false;
        uint32_t min, max;
@@ -765,7 +766,8 @@ static int vangogh_print_clk_levels(struct smu_context *smu,
        case SMU_MCLK:
        case SMU_FCLK:
                for (i = 0; i < count; i++) {
-                       ret = vangogh_get_dpm_clk_limited(smu, clk_type, i, &value);
+                       idx = (clk_type == SMU_FCLK || clk_type == SMU_MCLK) ? (count - i - 1) : i;
+                       ret = vangogh_get_dpm_clk_limited(smu, clk_type, idx, &value);
                        if (ret)
                                return ret;
                        if (!value)
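
The idx computation added here (and repeated for renoir, smu_v13_0_4, smu_v13_0_5 and yellow_carp below) walks the MCLK/FCLK DPM table back to front, presumably so the printed levels come out in ascending frequency order while other clock types keep the table order. A self-contained sketch of just that index math, with illustrative names only:

#include <stdio.h>

enum clk_type { CLK_SCLK, CLK_MCLK, CLK_FCLK };

/* Walk a DPM level table, reversing the order for MCLK/FCLK. */
static void print_levels(enum clk_type type, const unsigned int *table, int count)
{
	int i, idx;

	for (i = 0; i < count; i++) {
		idx = (type == CLK_FCLK || type == CLK_MCLK) ? (count - i - 1) : i;
		printf("%d: %uMhz\n", i, table[idx]);
	}
}

int main(void)
{
	/* Hypothetical firmware table stored highest-frequency first. */
	const unsigned int fclk[] = { 1600, 1200, 800, 400 };

	print_levels(CLK_FCLK, fclk, 4); /* prints 400, 800, 1200, 1600 */
	return 0;
}
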
index 5cdc071..8a8ba25 100644 (file)
@@ -494,7 +494,7 @@ static int renoir_set_fine_grain_gfx_freq_parameters(struct smu_context *smu)
 static int renoir_print_clk_levels(struct smu_context *smu,
                        enum smu_clk_type clk_type, char *buf)
 {
-       int i, size = 0, ret = 0;
+       int i, idx, size = 0, ret = 0;
        uint32_t cur_value = 0, value = 0, count = 0, min = 0, max = 0;
        SmuMetrics_t metrics;
        struct smu_dpm_context *smu_dpm_ctx = &(smu->smu_dpm);
@@ -594,7 +594,8 @@ static int renoir_print_clk_levels(struct smu_context *smu,
        case SMU_VCLK:
        case SMU_DCLK:
                for (i = 0; i < count; i++) {
-                       ret = renoir_get_dpm_clk_limited(smu, clk_type, i, &value);
+                       idx = (clk_type == SMU_FCLK || clk_type == SMU_MCLK) ? (count - i - 1) : i;
+                       ret = renoir_get_dpm_clk_limited(smu, clk_type, idx, &value);
                        if (ret)
                                return ret;
                        if (!value)
index 393c6a7..ca37918 100644 (file)
@@ -573,11 +573,11 @@ int smu_v13_0_init_power(struct smu_context *smu)
        if (smu_power->power_context || smu_power->power_context_size != 0)
                return -EINVAL;
 
-       smu_power->power_context = kzalloc(sizeof(struct smu_13_0_dpm_context),
+       smu_power->power_context = kzalloc(sizeof(struct smu_13_0_power_context),
                                           GFP_KERNEL);
        if (!smu_power->power_context)
                return -ENOMEM;
-       smu_power->power_context_size = sizeof(struct smu_13_0_dpm_context);
+       smu_power->power_context_size = sizeof(struct smu_13_0_power_context);
 
        return 0;
 }
index 09405ef..08577d1 100644 (file)
@@ -1696,10 +1696,39 @@ static int smu_v13_0_0_set_power_profile_mode(struct smu_context *smu,
                }
        }
 
-       /* conv PP_SMC_POWER_PROFILE* to WORKLOAD_PPLIB_*_BIT */
-       workload_type = smu_cmn_to_asic_specific_index(smu,
+       if (smu->power_profile_mode == PP_SMC_POWER_PROFILE_COMPUTE &&
+               (((smu->adev->pdev->device == 0x744C) && (smu->adev->pdev->revision == 0xC8)) ||
+               ((smu->adev->pdev->device == 0x744C) && (smu->adev->pdev->revision == 0xCC)))) {
+               ret = smu_cmn_update_table(smu,
+                                          SMU_TABLE_ACTIVITY_MONITOR_COEFF,
+                                          WORKLOAD_PPLIB_COMPUTE_BIT,
+                                          (void *)(&activity_monitor_external),
+                                          false);
+               if (ret) {
+                       dev_err(smu->adev->dev, "[%s] Failed to get activity monitor!", __func__);
+                       return ret;
+               }
+
+               ret = smu_cmn_update_table(smu,
+                                          SMU_TABLE_ACTIVITY_MONITOR_COEFF,
+                                          WORKLOAD_PPLIB_CUSTOM_BIT,
+                                          (void *)(&activity_monitor_external),
+                                          true);
+               if (ret) {
+                       dev_err(smu->adev->dev, "[%s] Failed to set activity monitor!", __func__);
+                       return ret;
+               }
+
+               workload_type = smu_cmn_to_asic_specific_index(smu,
+                                                      CMN2ASIC_MAPPING_WORKLOAD,
+                                                      PP_SMC_POWER_PROFILE_CUSTOM);
+       } else {
+               /* conv PP_SMC_POWER_PROFILE* to WORKLOAD_PPLIB_*_BIT */
+               workload_type = smu_cmn_to_asic_specific_index(smu,
                                                       CMN2ASIC_MAPPING_WORKLOAD,
                                                       smu->power_profile_mode);
+       }
+
        if (workload_type < 0)
                return -EINVAL;
 
index 8fa9a36..6d9760e 100644 (file)
@@ -478,7 +478,7 @@ static int smu_v13_0_4_get_dpm_level_count(struct smu_context *smu,
 static int smu_v13_0_4_print_clk_levels(struct smu_context *smu,
                                        enum smu_clk_type clk_type, char *buf)
 {
-       int i, size = 0, ret = 0;
+       int i, idx, size = 0, ret = 0;
        uint32_t cur_value = 0, value = 0, count = 0;
        uint32_t min, max;
 
@@ -512,7 +512,8 @@ static int smu_v13_0_4_print_clk_levels(struct smu_context *smu,
                        break;
 
                for (i = 0; i < count; i++) {
-                       ret = smu_v13_0_4_get_dpm_freq_by_index(smu, clk_type, i, &value);
+                       idx = (clk_type == SMU_FCLK || clk_type == SMU_MCLK) ? (count - i - 1) : i;
+                       ret = smu_v13_0_4_get_dpm_freq_by_index(smu, clk_type, idx, &value);
                        if (ret)
                                break;
 
index 6644596..0081fa6 100644 (file)
@@ -866,7 +866,7 @@ out:
 static int smu_v13_0_5_print_clk_levels(struct smu_context *smu,
                                enum smu_clk_type clk_type, char *buf)
 {
-       int i, size = 0, ret = 0;
+       int i, idx, size = 0, ret = 0;
        uint32_t cur_value = 0, value = 0, count = 0;
        uint32_t min = 0, max = 0;
 
@@ -898,7 +898,8 @@ static int smu_v13_0_5_print_clk_levels(struct smu_context *smu,
                        goto print_clk_out;
 
                for (i = 0; i < count; i++) {
-                       ret = smu_v13_0_5_get_dpm_freq_by_index(smu, clk_type, i, &value);
+                       idx = (clk_type == SMU_MCLK) ? (count - i - 1) : i;
+                       ret = smu_v13_0_5_get_dpm_freq_by_index(smu, clk_type, idx, &value);
                        if (ret)
                                goto print_clk_out;
 
index 3d9ff46..bba6216 100644 (file)
@@ -125,6 +125,7 @@ static struct cmn2asic_msg_mapping smu_v13_0_7_message_map[SMU_MSG_MAX_COUNT] =
        MSG_MAP(ArmD3,                          PPSMC_MSG_ArmD3,                       0),
        MSG_MAP(AllowGpo,                       PPSMC_MSG_SetGpoAllow,           0),
        MSG_MAP(GetPptLimit,                    PPSMC_MSG_GetPptLimit,                 0),
+       MSG_MAP(NotifyPowerSource,              PPSMC_MSG_NotifyPowerSource,           0),
 };
 
 static struct cmn2asic_mapping smu_v13_0_7_clk_map[SMU_CLK_COUNT] = {
@@ -1770,6 +1771,7 @@ static const struct pptable_funcs smu_v13_0_7_ppt_funcs = {
        .enable_mgpu_fan_boost = smu_v13_0_7_enable_mgpu_fan_boost,
        .get_power_limit = smu_v13_0_7_get_power_limit,
        .set_power_limit = smu_v13_0_set_power_limit,
+       .set_power_source = smu_v13_0_set_power_source,
        .get_power_profile_mode = smu_v13_0_7_get_power_profile_mode,
        .set_power_profile_mode = smu_v13_0_7_set_power_profile_mode,
        .set_tool_table_location = smu_v13_0_set_tool_table_location,
index 04e56b0..798f36c 100644 (file)
@@ -1000,7 +1000,7 @@ out:
 static int yellow_carp_print_clk_levels(struct smu_context *smu,
                                enum smu_clk_type clk_type, char *buf)
 {
-       int i, size = 0, ret = 0;
+       int i, idx, size = 0, ret = 0;
        uint32_t cur_value = 0, value = 0, count = 0;
        uint32_t min, max;
 
@@ -1033,7 +1033,8 @@ static int yellow_carp_print_clk_levels(struct smu_context *smu,
                        goto print_clk_out;
 
                for (i = 0; i < count; i++) {
-                       ret = yellow_carp_get_dpm_freq_by_index(smu, clk_type, i, &value);
+                       idx = (clk_type == SMU_FCLK || clk_type == SMU_MCLK) ? (count - i - 1) : i;
+                       ret = yellow_carp_get_dpm_freq_by_index(smu, clk_type, idx, &value);
                        if (ret)
                                goto print_clk_out;
 
index fbb070f..6dc1a09 100644 (file)
@@ -119,53 +119,32 @@ err_astdp_edid_not_ready:
 /*
  * Launch Aspeed DP
  */
-void ast_dp_launch(struct drm_device *dev, u8 bPower)
+void ast_dp_launch(struct drm_device *dev)
 {
-       u32 i = 0, j = 0, WaitCount = 1;
-       u8 bDPTX = 0;
+       u32 i = 0;
        u8 bDPExecute = 1;
-
        struct ast_device *ast = to_ast_device(dev);
-       // S3 come back, need more time to wait BMC ready.
-       if (bPower)
-               WaitCount = 300;
-
-
-       // Wait total count by different condition.
-       for (j = 0; j < WaitCount; j++) {
-               bDPTX = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xD1, TX_TYPE_MASK);
-
-               if (bDPTX)
-                       break;
 
+       // Wait up to one second, then time out.
+       while (ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xD1, ASTDP_MCU_FW_EXECUTING) !=
+               ASTDP_MCU_FW_EXECUTING) {
+               i++;
+               // wait 100 ms
                msleep(100);
-       }
 
-       // 0xE : ASTDP with DPMCU FW handling
-       if (bDPTX == ASTDP_DPMCU_TX) {
-               // Wait one second then timeout.
-               i = 0;
-
-               while (ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xD1, COPROCESSOR_LAUNCH) !=
-                       COPROCESSOR_LAUNCH) {
-                       i++;
-                       // wait 100 ms
-                       msleep(100);
-
-                       if (i >= 10) {
-                               // DP would not be ready.
-                               bDPExecute = 0;
-                               break;
-                       }
+               if (i >= 10) {
+                       // DP would not be ready.
+                       bDPExecute = 0;
+                       break;
                }
+       }
 
-               if (bDPExecute)
-                       ast->tx_chip_types |= BIT(AST_TX_ASTDP);
+       if (!bDPExecute)
+               drm_err(dev, "Timeout waiting for DPMCU to start executing\n");
 
-               ast_set_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xE5,
-                                                       (u8) ~ASTDP_HOST_EDID_READ_DONE_MASK,
-                                                       ASTDP_HOST_EDID_READ_DONE);
-       }
+       ast_set_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xE5,
+                              (u8) ~ASTDP_HOST_EDID_READ_DONE_MASK,
+                              ASTDP_HOST_EDID_READ_DONE);
 }
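
The rewritten ast_dp_launch() above collapses the old nested wait into a single bounded poll: read the MCU status roughly every 100 ms and give up after ten tries (about one second). Below is a standalone sketch of that poll-with-timeout shape; read_status() merely stands in for the register read and is not part of the driver.

#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static bool read_status(int attempt)
{
	/* Pretend the firmware reports "executing" on the fourth poll. */
	return attempt >= 3;
}

static bool wait_for_mcu(void)
{
	int i = 0;

	while (!read_status(i)) {
		i++;
		usleep(100 * 1000); /* wait 100 ms between polls */

		if (i >= 10)
			return false; /* MCU would not be ready. */
	}

	return true;
}

int main(void)
{
	printf(wait_for_mcu() ? "MCU executing\n" : "timeout\n");
	return 0;
}
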
 
 
index a501169..5498a66 100644 (file)
@@ -350,9 +350,6 @@ int ast_mode_config_init(struct ast_device *ast);
 #define AST_DP501_LINKRATE     0xf014
 #define AST_DP501_EDID_DATA    0xf020
 
-/* Define for Soc scratched reg */
-#define COPROCESSOR_LAUNCH                     BIT(5)
-
 /*
  * Display Transmitter Type:
  */
@@ -480,7 +477,7 @@ struct ast_i2c_chan *ast_i2c_create(struct drm_device *dev);
 
 /* aspeed DP */
 int ast_astdp_read_edid(struct drm_device *dev, u8 *ediddata);
-void ast_dp_launch(struct drm_device *dev, u8 bPower);
+void ast_dp_launch(struct drm_device *dev);
 void ast_dp_power_on_off(struct drm_device *dev, bool no);
 void ast_dp_set_on_off(struct drm_device *dev, bool no);
 void ast_dp_set_mode(struct drm_crtc *crtc, struct ast_vbios_mode_info *vbios_mode);
index 794ffd4..1f35438 100644 (file)
@@ -254,8 +254,13 @@ static int ast_detect_chip(struct drm_device *dev, bool *need_post)
                case 0x0c:
                        ast->tx_chip_types = AST_TX_DP501_BIT;
                }
-       } else if (ast->chip == AST2600)
-               ast_dp_launch(&ast->base, 0);
+       } else if (ast->chip == AST2600) {
+               if (ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xD1, TX_TYPE_MASK) ==
+                   ASTDP_DPMCU_TX) {
+                       ast->tx_chip_types = AST_TX_ASTDP_BIT;
+                       ast_dp_launch(&ast->base);
+               }
+       }
 
        /* Print stuff for diagnostic purposes */
        if (ast->tx_chip_types & AST_TX_NONE_BIT)
@@ -264,6 +269,8 @@ static int ast_detect_chip(struct drm_device *dev, bool *need_post)
                drm_info(dev, "Using Sil164 TMDS transmitter\n");
        if (ast->tx_chip_types & AST_TX_DP501_BIT)
                drm_info(dev, "Using DP501 DisplayPort transmitter\n");
+       if (ast->tx_chip_types & AST_TX_ASTDP_BIT)
+               drm_info(dev, "Using ASPEED DisplayPort transmitter\n");
 
        return 0;
 }
@@ -425,11 +432,12 @@ struct ast_device *ast_device_create(const struct drm_driver *drv,
                return ERR_PTR(-EIO);
 
        /*
-        * If we don't have IO space at all, use MMIO now and
-        * assume the chip has MMIO enabled by default (rev 0x20
-        * and higher).
+        * After AST2500, MMIO is enabled by default and should be used
+        * for compatibility with Arm platforms.
         */
-       if (!(pci_resource_flags(pdev, 2) & IORESOURCE_IO)) {
+       if (pdev->revision >= 0x40) {
+               ast->ioregs = ast->regs + AST_IO_MM_OFFSET;
+       } else if (!(pci_resource_flags(pdev, 2) & IORESOURCE_IO)) {
                drm_info(dev, "platform has no IO space, trying MMIO\n");
                ast->ioregs = ast->regs + AST_IO_MM_OFFSET;
        }
index 3637482..b3c670a 100644 (file)
@@ -1647,6 +1647,8 @@ static int ast_dp501_output_init(struct ast_device *ast)
 static int ast_astdp_connector_helper_get_modes(struct drm_connector *connector)
 {
        void *edid;
+       struct drm_device *dev = connector->dev;
+       struct ast_device *ast = to_ast_device(dev);
 
        int succ;
        int count;
@@ -1655,9 +1657,17 @@ static int ast_astdp_connector_helper_get_modes(struct drm_connector *connector)
        if (!edid)
                goto err_drm_connector_update_edid_property;
 
+       /*
+        * Protect access to I/O registers from concurrent modesetting
+        * by acquiring the I/O-register lock.
+        */
+       mutex_lock(&ast->ioregs_lock);
+
        succ = ast_astdp_read_edid(connector->dev, edid);
        if (succ < 0)
-               goto err_kfree;
+               goto err_mutex_unlock;
+
+       mutex_unlock(&ast->ioregs_lock);
 
        drm_connector_update_edid_property(connector, edid);
        count = drm_add_edid_modes(connector, edid);
@@ -1665,7 +1675,8 @@ static int ast_astdp_connector_helper_get_modes(struct drm_connector *connector)
 
        return count;
 
-err_kfree:
+err_mutex_unlock:
+       mutex_unlock(&ast->ioregs_lock);
        kfree(edid);
 err_drm_connector_update_edid_property:
        drm_connector_update_edid_property(connector, NULL);
index 71bb36b..a005aec 100644 (file)
@@ -380,7 +380,8 @@ void ast_post_gpu(struct drm_device *dev)
        ast_set_def_ext_reg(dev);
 
        if (ast->chip == AST2600) {
-               ast_dp_launch(dev, 1);
+               if (ast->tx_chip_types & AST_TX_ASTDP_BIT)
+                       ast_dp_launch(dev);
        } else if (ast->config_mode == ast_use_p2a) {
                if (ast->chip == AST2500)
                        ast_post_chip_2500(dev);
index 7a74878..4676cf2 100644 (file)
@@ -298,6 +298,10 @@ static void ti_sn_bridge_set_refclk_freq(struct ti_sn65dsi86 *pdata)
                if (refclk_lut[i] == refclk_rate)
                        break;
 
+       /* Avoid a buffer overflow; "1" is the default rate in the datasheet. */
+       if (i >= refclk_lut_size)
+               i = 1;
+
        regmap_update_bits(pdata->regmap, SN_DPPLL_SRC_REG, REFCLK_FREQ_MASK,
                           REFCLK_FREQ(i));
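
The added bounds check keeps a failed lookup from indexing past the end of refclk_lut. A self-contained sketch of the same search-with-fallback pattern; the rates listed are example values, not necessarily the driver's actual table.

#include <stddef.h>
#include <stdio.h>

/* Example lookup table of supported reference-clock rates (illustrative values). */
static const unsigned long refclk_lut[] = {
	12000000, 19200000, 26000000, 27000000, 38400000,
};

static size_t refclk_index(unsigned long rate)
{
	size_t i;

	for (i = 0; i < sizeof(refclk_lut) / sizeof(refclk_lut[0]); i++)
		if (refclk_lut[i] == rate)
			return i;

	/* Not found: fall back to a safe default entry instead of using
	 * the out-of-range index, mirroring the "i = 1" fallback above. */
	return 1;
}

int main(void)
{
	printf("%zu\n", refclk_index(26000000)); /* 2 */
	printf("%zu\n", refclk_index(25000000)); /* 1 (not in the table) */
	return 0;
}
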
 
index 38dab76..e2e21ce 100644 (file)
@@ -3404,7 +3404,7 @@ int drm_dp_add_payload_part2(struct drm_dp_mst_topology_mgr *mgr,
 
        /* Skip failed payloads */
        if (payload->vc_start_slot == -1) {
-               drm_dbg_kms(state->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
+               drm_dbg_kms(mgr->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
                            payload->port->connector->name);
                return -EIO;
        }
index 6445898..fd27f19 100644 (file)
@@ -641,19 +641,27 @@ static void drm_fb_helper_damage(struct drm_fb_helper *helper, u32 x, u32 y,
 static void drm_fb_helper_memory_range_to_clip(struct fb_info *info, off_t off, size_t len,
                                               struct drm_rect *clip)
 {
+       u32 line_length = info->fix.line_length;
+       u32 fb_height = info->var.yres;
        off_t end = off + len;
        u32 x1 = 0;
-       u32 y1 = off / info->fix.line_length;
+       u32 y1 = off / line_length;
        u32 x2 = info->var.xres;
-       u32 y2 = DIV_ROUND_UP(end, info->fix.line_length);
+       u32 y2 = DIV_ROUND_UP(end, line_length);
+
+       /* Don't allow either bound to go beyond the bottom of the display area */
+       if (y1 > fb_height)
+               y1 = fb_height;
+       if (y2 > fb_height)
+               y2 = fb_height;
 
        if ((y2 - y1) == 1) {
                /*
                 * We've only written to a single scanline. Try to reduce
                 * the number of horizontal pixels that need an update.
                 */
-               off_t bit_off = (off % info->fix.line_length) * 8;
-               off_t bit_end = (end % info->fix.line_length) * 8;
+               off_t bit_off = (off % line_length) * 8;
+               off_t bit_end = (end % line_length) * 8;
 
                x1 = bit_off / info->var.bits_per_pixel;
                x2 = DIV_ROUND_UP(bit_end, info->var.bits_per_pixel);
@@ -1537,17 +1545,19 @@ static void drm_fb_helper_fill_pixel_fmt(struct fb_var_screeninfo *var,
        }
 }
 
-static void __fill_var(struct fb_var_screeninfo *var,
+static void __fill_var(struct fb_var_screeninfo *var, struct fb_info *info,
                       struct drm_framebuffer *fb)
 {
        int i;
 
        var->xres_virtual = fb->width;
        var->yres_virtual = fb->height;
-       var->accel_flags = FB_ACCELF_TEXT;
+       var->accel_flags = 0;
        var->bits_per_pixel = drm_format_info_bpp(fb->format, 0);
 
-       var->height = var->width = 0;
+       var->height = info->var.height;
+       var->width = info->var.width;
+
        var->left_margin = var->right_margin = 0;
        var->upper_margin = var->lower_margin = 0;
        var->hsync_len = var->vsync_len = 0;
@@ -1610,7 +1620,7 @@ int drm_fb_helper_check_var(struct fb_var_screeninfo *var,
                return -EINVAL;
        }
 
-       __fill_var(var, fb);
+       __fill_var(var, info, fb);
 
        /*
         * fb_pan_display() validates this, but fb_set_par() doesn't and just
@@ -2066,7 +2076,7 @@ static void drm_fb_helper_fill_var(struct fb_info *info,
        info->pseudo_palette = fb_helper->pseudo_palette;
        info->var.xoffset = 0;
        info->var.yoffset = 0;
-       __fill_var(&info->var, fb);
+       __fill_var(&info->var, info, fb);
        info->var.activate = FB_ACTIVATE_NOW;
 
        drm_fb_helper_fill_pixel_fmt(&info->var, format);
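
The first hunk in this file clamps the computed scanline range to the visible height, so a write past the end of the visible framebuffer cannot yield an out-of-range damage clip. A minimal standalone sketch of that byte-range-to-clip computation; range_to_clip() is an illustrative stand-in, not the helper's real signature.

#include <stdio.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

struct clip { unsigned int y1, y2; };

/* Convert a byte range [off, off + len) in a linear framebuffer into a
 * scanline range and clamp it to the visible height. */
static struct clip range_to_clip(unsigned long off, unsigned long len,
				 unsigned int line_length, unsigned int height)
{
	unsigned long end = off + len;
	struct clip c = {
		.y1 = off / line_length,
		.y2 = DIV_ROUND_UP(end, line_length),
	};

	if (c.y1 > height)
		c.y1 = height;
	if (c.y2 > height)
		c.y2 = height;

	return c;
}

int main(void)
{
	/* 1024x768 at 4 bytes per pixel: a write past the visible area is clamped. */
	struct clip c = range_to_clip(1024UL * 4 * 800, 4096, 1024 * 4, 768);

	printf("y1=%u y2=%u\n", c.y1, c.y2); /* 768 768 */
	return 0;
}
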
index 4cf214d..c21c3f6 100644 (file)
@@ -264,28 +264,10 @@ void drmm_kfree(struct drm_device *dev, void *data)
 }
 EXPORT_SYMBOL(drmm_kfree);
 
-static void drmm_mutex_release(struct drm_device *dev, void *res)
+void __drmm_mutex_release(struct drm_device *dev, void *res)
 {
        struct mutex *lock = res;
 
        mutex_destroy(lock);
 }
-
-/**
- * drmm_mutex_init - &drm_device-managed mutex_init()
- * @dev: DRM device
- * @lock: lock to be initialized
- *
- * Returns:
- * 0 on success, or a negative errno code otherwise.
- *
- * This is a &drm_device-managed version of mutex_init(). The initialized
- * lock is automatically destroyed on the final drm_dev_put().
- */
-int drmm_mutex_init(struct drm_device *dev, struct mutex *lock)
-{
-       mutex_init(lock);
-
-       return drmm_add_action_or_reset(dev, drmm_mutex_release, lock);
-}
-EXPORT_SYMBOL(drmm_mutex_init);
+EXPORT_SYMBOL(__drmm_mutex_release);
index 295382c..3fd6c73 100644 (file)
@@ -221,7 +221,7 @@ mipi_dsi_device_register_full(struct mipi_dsi_host *host,
                return dsi;
        }
 
-       dsi->dev.of_node = info->node;
+       device_set_node(&dsi->dev, of_fwnode_handle(info->node));
        dsi->channel = info->channel;
        strlcpy(dsi->name, info->type, sizeof(dsi->name));
 
index b1a38e6..0cb646c 100644 (file)
@@ -179,7 +179,7 @@ static const struct dmi_system_id orientation_data[] = {
        }, {    /* AYA NEO AIR */
                .matches = {
                  DMI_EXACT_MATCH(DMI_SYS_VENDOR, "AYANEO"),
-                 DMI_MATCH(DMI_BOARD_NAME, "AIR"),
+                 DMI_MATCH(DMI_PRODUCT_NAME, "AIR"),
                },
                .driver_data = (void *)&lcd1080x1920_leftside_up,
        }, {    /* AYA NEO NEXT */
index ec784e5..414e585 100644 (file)
@@ -1335,7 +1335,7 @@ int exynos_g2d_exec_ioctl(struct drm_device *drm_dev, void *data,
        /* Let the runqueue know that there is work to do. */
        queue_work(g2d->g2d_workq, &g2d->runqueue_work);
 
-       if (runqueue_node->async)
+       if (req->async)
                goto out;
 
        wait_for_completion(&runqueue_node->complete);
index 74ea3c2..1a5ae78 100644 (file)
@@ -34,11 +34,11 @@ static inline int exynos_g2d_exec_ioctl(struct drm_device *dev, void *data,
        return -ENODEV;
 }
 
-int g2d_open(struct drm_device *drm_dev, struct drm_file *file)
+static inline int g2d_open(struct drm_device *drm_dev, struct drm_file *file)
 {
        return 0;
 }
 
-void g2d_close(struct drm_device *drm_dev, struct drm_file *file)
+static inline void g2d_close(struct drm_device *drm_dev, struct drm_file *file)
 { }
 #endif
index 4d56c8c..f5e1adf 100644 (file)
@@ -469,8 +469,6 @@ static int vidi_remove(struct platform_device *pdev)
        if (ctx->raw_edid != (struct edid *)fake_edid_info) {
                kfree(ctx->raw_edid);
                ctx->raw_edid = NULL;
-
-               return -EINVAL;
        }
 
        component_del(&pdev->dev, &vidi_component_ops);
index 06a0ca1..e4f4d2e 100644 (file)
@@ -62,10 +62,11 @@ config DRM_I915_FORCE_PROBE
          This is the default value for the i915.force_probe module
          parameter. Using the module parameter overrides this option.
 
-         Force probe the i915 for Intel graphics devices that are
-         recognized but not properly supported by this kernel version. It is
-         recommended to upgrade to a kernel version with proper support as soon
-         as it is available.
+         Force probe the i915 driver for Intel graphics devices that are
+         recognized but not properly supported by this kernel version. Force
+         probing an unsupported device taints the kernel. It is recommended to
+         upgrade to a kernel version with proper support as soon as it is
+         available.
 
          It can also be used to block the probe of recognized and fully
          supported devices.
@@ -75,7 +76,8 @@ config DRM_I915_FORCE_PROBE
          Use "<pci-id>[,<pci-id>,...]" to force probe the i915 for listed
          devices. For example, "4500" or "4500,4571".
 
-         Use "*" to force probe the driver for all known devices.
+         Use "*" to force probe the driver for all known devices. Not
+         recommended.
 
          Use "!" right before the ID to block the probe of the device. For
          example, "4500,!4571" forces the probe of 4500 and blocks the probe of
index 40de9f0..f33164b 100644 (file)
@@ -1028,7 +1028,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
        int ret;
 
        if (old_obj) {
-               const struct intel_crtc_state *crtc_state =
+               const struct intel_crtc_state *new_crtc_state =
                        intel_atomic_get_new_crtc_state(state,
                                                        to_intel_crtc(old_plane_state->hw.crtc));
 
@@ -1043,7 +1043,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
                 * This should only fail upon a hung GPU, in which case we
                 * can safely continue.
                 */
-               if (intel_crtc_needs_modeset(crtc_state)) {
+               if (new_crtc_state && intel_crtc_needs_modeset(new_crtc_state)) {
                        ret = i915_sw_fence_await_reservation(&state->commit_ready,
                                                              old_obj->base.resv,
                                                              false, 0,
index 084a483..2aaaba0 100644 (file)
@@ -1453,6 +1453,18 @@ static u8 tgl_calc_voltage_level(int cdclk)
                return 0;
 }
 
+static u8 rplu_calc_voltage_level(int cdclk)
+{
+       if (cdclk > 556800)
+               return 3;
+       else if (cdclk > 480000)
+               return 2;
+       else if (cdclk > 312000)
+               return 1;
+       else
+               return 0;
+}
+
 static void icl_readout_refclk(struct drm_i915_private *dev_priv,
                               struct intel_cdclk_config *cdclk_config)
 {
@@ -3242,6 +3254,13 @@ static const struct intel_cdclk_funcs mtl_cdclk_funcs = {
        .calc_voltage_level = tgl_calc_voltage_level,
 };
 
+static const struct intel_cdclk_funcs rplu_cdclk_funcs = {
+       .get_cdclk = bxt_get_cdclk,
+       .set_cdclk = bxt_set_cdclk,
+       .modeset_calc_cdclk = bxt_modeset_calc_cdclk,
+       .calc_voltage_level = rplu_calc_voltage_level,
+};
+
 static const struct intel_cdclk_funcs tgl_cdclk_funcs = {
        .get_cdclk = bxt_get_cdclk,
        .set_cdclk = bxt_set_cdclk,
@@ -3384,14 +3403,17 @@ void intel_init_cdclk_hooks(struct drm_i915_private *dev_priv)
                dev_priv->display.funcs.cdclk = &tgl_cdclk_funcs;
                dev_priv->display.cdclk.table = dg2_cdclk_table;
        } else if (IS_ALDERLAKE_P(dev_priv)) {
-               dev_priv->display.funcs.cdclk = &tgl_cdclk_funcs;
                /* Wa_22011320316:adl-p[a0] */
-               if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
+               if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0)) {
                        dev_priv->display.cdclk.table = adlp_a_step_cdclk_table;
-               else if (IS_ADLP_RPLU(dev_priv))
+                       dev_priv->display.funcs.cdclk = &tgl_cdclk_funcs;
+               } else if (IS_ADLP_RPLU(dev_priv)) {
                        dev_priv->display.cdclk.table = rplu_cdclk_table;
-               else
+                       dev_priv->display.funcs.cdclk = &rplu_cdclk_funcs;
+               } else {
                        dev_priv->display.cdclk.table = adlp_cdclk_table;
+                       dev_priv->display.funcs.cdclk = &tgl_cdclk_funcs;
+               }
        } else if (IS_ROCKETLAKE(dev_priv)) {
                dev_priv->display.funcs.cdclk = &tgl_cdclk_funcs;
                dev_priv->display.cdclk.table = rkl_cdclk_table;
index 3c29792..0aae9a1 100644 (file)
@@ -1851,9 +1851,17 @@ static void hsw_crtc_disable(struct intel_atomic_state *state,
 
        intel_disable_shared_dpll(old_crtc_state);
 
-       intel_encoders_post_pll_disable(state, crtc);
+       if (!intel_crtc_is_bigjoiner_slave(old_crtc_state)) {
+               struct intel_crtc *slave_crtc;
+
+               intel_encoders_post_pll_disable(state, crtc);
 
-       intel_dmc_disable_pipe(i915, crtc->pipe);
+               intel_dmc_disable_pipe(i915, crtc->pipe);
+
+               for_each_intel_crtc_in_pipe_mask(&i915->drm, slave_crtc,
+                                                intel_crtc_bigjoiner_slave_pipes(old_crtc_state))
+                       intel_dmc_disable_pipe(i915, slave_crtc->pipe);
+       }
 }
 
 static void i9xx_pfit_enable(const struct intel_crtc_state *crtc_state)
index f0bace9..529ee22 100644 (file)
@@ -1601,6 +1601,11 @@ int intel_dp_dsc_compute_config(struct intel_dp *intel_dp,
                pipe_config->dsc.slice_count =
                        drm_dp_dsc_sink_max_slice_count(intel_dp->dsc_dpcd,
                                                        true);
+               if (!pipe_config->dsc.slice_count) {
+                       drm_dbg_kms(&dev_priv->drm, "Unsupported Slice Count %d\n",
+                                   pipe_config->dsc.slice_count);
+                       return -EINVAL;
+               }
        } else {
                u16 dsc_max_output_bpp = 0;
                u8 dsc_dp_slice_count;
index 705915d..524bd6d 100644 (file)
@@ -129,7 +129,7 @@ static int intel_dp_aux_sync_len(void)
 
 static int intel_dp_aux_fw_sync_len(void)
 {
-       int precharge = 16; /* 10-16 */
+       int precharge = 10; /* 10-16 */
        int preamble = 8;
 
        return precharge + preamble;
index 650232c..b183efa 100644 (file)
@@ -204,8 +204,6 @@ bool intel_hdcp2_capable(struct intel_connector *connector)
        struct intel_digital_port *dig_port = intel_attached_dig_port(connector);
        struct drm_i915_private *dev_priv = to_i915(connector->base.dev);
        struct intel_hdcp *hdcp = &connector->hdcp;
-       struct intel_gt *gt = dev_priv->media_gt;
-       struct intel_gsc_uc *gsc = &gt->uc.gsc;
        bool capable = false;
 
        /* I915 support for HDCP2.2 */
@@ -213,9 +211,13 @@ bool intel_hdcp2_capable(struct intel_connector *connector)
                return false;
 
        /* If MTL+ make sure gsc is loaded and proxy is setup */
-       if (intel_hdcp_gsc_cs_required(dev_priv))
-               if (!intel_uc_fw_is_running(&gsc->fw))
+       if (intel_hdcp_gsc_cs_required(dev_priv)) {
+               struct intel_gt *gt = dev_priv->media_gt;
+               struct intel_gsc_uc *gsc = gt ? &gt->uc.gsc : NULL;
+
+               if (!gsc || !intel_uc_fw_is_running(&gsc->fw))
                        return false;
+       }
 
        /* MEI/GSC interface is solid depending on which is used */
        mutex_lock(&dev_priv->display.hdcp.comp_mutex);
index a81fa6a..7b516b1 100644 (file)
@@ -346,8 +346,10 @@ static int live_parallel_switch(void *arg)
                                continue;
 
                        ce = intel_context_create(data[m].ce[0]->engine);
-                       if (IS_ERR(ce))
+                       if (IS_ERR(ce)) {
+                               err = PTR_ERR(ce);
                                goto out;
+                       }
 
                        err = intel_context_pin(ce);
                        if (err) {
@@ -367,8 +369,10 @@ static int live_parallel_switch(void *arg)
 
                worker = kthread_create_worker(0, "igt/parallel:%s",
                                               data[n].ce[0]->engine->name);
-               if (IS_ERR(worker))
+               if (IS_ERR(worker)) {
+                       err = PTR_ERR(worker);
                        goto out;
+               }
 
                data[n].worker = worker;
        }
@@ -397,8 +401,10 @@ static int live_parallel_switch(void *arg)
                        }
                }
 
-               if (igt_live_test_end(&t))
-                       err = -EIO;
+               if (igt_live_test_end(&t)) {
+                       err = err ?: -EIO;
+                       break;
+               }
        }
 
 out:
index 736b89a..4202df5 100644 (file)
@@ -1530,8 +1530,8 @@ static int live_busywait_preempt(void *arg)
        struct drm_i915_gem_object *obj;
        struct i915_vma *vma;
        enum intel_engine_id id;
-       int err = -ENOMEM;
        u32 *map;
+       int err;
 
        /*
         * Verify that even without HAS_LOGICAL_RING_PREEMPTION, we can
@@ -1539,13 +1539,17 @@ static int live_busywait_preempt(void *arg)
         */
 
        ctx_hi = kernel_context(gt->i915, NULL);
-       if (!ctx_hi)
-               return -ENOMEM;
+       if (IS_ERR(ctx_hi))
+               return PTR_ERR(ctx_hi);
+
        ctx_hi->sched.priority = I915_CONTEXT_MAX_USER_PRIORITY;
 
        ctx_lo = kernel_context(gt->i915, NULL);
-       if (!ctx_lo)
+       if (IS_ERR(ctx_lo)) {
+               err = PTR_ERR(ctx_lo);
                goto err_ctx_hi;
+       }
+
        ctx_lo->sched.priority = I915_CONTEXT_MIN_USER_PRIORITY;
 
        obj = i915_gem_object_create_internal(gt->i915, PAGE_SIZE);
index cf49188..e0e7931 100644 (file)
        { FORCEWAKE_MT,             0,      0, "FORCEWAKE" }
 
 #define COMMON_GEN9BASE_GLOBAL \
-       { GEN8_FAULT_TLB_DATA0,     0,      0, "GEN8_FAULT_TLB_DATA0" }, \
-       { GEN8_FAULT_TLB_DATA1,     0,      0, "GEN8_FAULT_TLB_DATA1" }, \
        { ERROR_GEN6,               0,      0, "ERROR_GEN6" }, \
        { DONE_REG,                 0,      0, "DONE_REG" }, \
        { HSW_GTT_CACHE_EN,         0,      0, "HSW_GTT_CACHE_EN" }
 
+#define GEN9_GLOBAL \
+       { GEN8_FAULT_TLB_DATA0,     0,      0, "GEN8_FAULT_TLB_DATA0" }, \
+       { GEN8_FAULT_TLB_DATA1,     0,      0, "GEN8_FAULT_TLB_DATA1" }
+
 #define COMMON_GEN12BASE_GLOBAL \
        { GEN12_FAULT_TLB_DATA0,    0,      0, "GEN12_FAULT_TLB_DATA0" }, \
        { GEN12_FAULT_TLB_DATA1,    0,      0, "GEN12_FAULT_TLB_DATA1" }, \
@@ -142,6 +144,7 @@ static const struct __guc_mmio_reg_descr xe_lpd_gsc_inst_regs[] = {
 static const struct __guc_mmio_reg_descr default_global_regs[] = {
        COMMON_BASE_GLOBAL,
        COMMON_GEN9BASE_GLOBAL,
+       GEN9_GLOBAL,
 };
 
 static const struct __guc_mmio_reg_descr default_rc_class_regs[] = {
index 2a012da..edcfb5f 100644 (file)
@@ -1344,6 +1344,12 @@ static int i915_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
                return -ENODEV;
        }
 
+       if (intel_info->require_force_probe) {
+               dev_info(&pdev->dev, "Force probing unsupported Device ID %04x, tainting kernel\n",
+                        pdev->device);
+               add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+       }
+
        /* Only bind to function 0 of the device. Early generations
         * used function 1 as a placeholder for multi-head. This causes
         * us confusion instead, especially on the systems where both
index 050b8ae..3035cba 100644 (file)
@@ -877,12 +877,17 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
                        stream->oa_buffer.last_ctx_id = ctx_id;
                }
 
-               /*
-                * Clear out the report id and timestamp as a means to detect unlanded
-                * reports.
-                */
-               oa_report_id_clear(stream, report32);
-               oa_timestamp_clear(stream, report32);
+               if (is_power_of_2(report_size)) {
+                       /*
+                        * Clear out the report id and timestamp as a means
+                        * to detect unlanded reports.
+                        */
+                       oa_report_id_clear(stream, report32);
+                       oa_timestamp_clear(stream, report32);
+               } else {
+                       /* Zero out the entire report */
+                       memset(report32, 0, report_size);
+               }
        }
 
        if (start_offset != *offset) {
index ff00340..ffd91a5 100644 (file)
@@ -165,7 +165,7 @@ int lima_sched_context_init(struct lima_sched_pipe *pipe,
 void lima_sched_context_fini(struct lima_sched_pipe *pipe,
                             struct lima_sched_context *context)
 {
-       drm_sched_entity_fini(&context->base);
+       drm_sched_entity_destroy(&context->base);
 }
 
 struct dma_fence *lima_sched_context_queue_task(struct lima_sched_task *task)
index 0f2dd26..af3ce5a 100644 (file)
@@ -642,6 +642,11 @@ void mgag200_crtc_helper_atomic_enable(struct drm_crtc *crtc, struct drm_atomic_
        if (funcs->pixpllc_atomic_update)
                funcs->pixpllc_atomic_update(crtc, old_state);
 
+       if (crtc_state->gamma_lut)
+               mgag200_crtc_set_gamma(mdev, format, crtc_state->gamma_lut->data);
+       else
+               mgag200_crtc_set_gamma_linear(mdev, format);
+
        mgag200_enable_display(mdev);
 
        if (funcs->enable_vidrst)
index e16b4b3..8914992 100644 (file)
@@ -1526,8 +1526,6 @@ int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
        if (!pdev)
                return -ENODEV;
 
-       mutex_init(&gmu->lock);
-
        gmu->dev = &pdev->dev;
 
        of_dma_configure(gmu->dev, node, true);
index 9fb214f..52da379 100644 (file)
@@ -1981,6 +1981,8 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
        adreno_gpu = &a6xx_gpu->base;
        gpu = &adreno_gpu->base;
 
+       mutex_init(&a6xx_gpu->gmu.lock);
+
        adreno_gpu->registers = NULL;
 
        /*
index 2b3ae84..bdcd554 100644 (file)
@@ -98,17 +98,17 @@ static const struct dpu_sspp_cfg msm8998_sspp[] = {
 
 static const struct dpu_lm_cfg msm8998_lm[] = {
        LM_BLK("lm_0", LM_0, 0x44000, MIXER_MSM8998_MASK,
-               &msm8998_lm_sblk, PINGPONG_0, LM_2, DSPP_0),
+               &msm8998_lm_sblk, PINGPONG_0, LM_1, DSPP_0),
        LM_BLK("lm_1", LM_1, 0x45000, MIXER_MSM8998_MASK,
-               &msm8998_lm_sblk, PINGPONG_1, LM_5, DSPP_1),
+               &msm8998_lm_sblk, PINGPONG_1, LM_0, DSPP_1),
        LM_BLK("lm_2", LM_2, 0x46000, MIXER_MSM8998_MASK,
-               &msm8998_lm_sblk, PINGPONG_2, LM_0, 0),
+               &msm8998_lm_sblk, PINGPONG_2, LM_5, 0),
        LM_BLK("lm_3", LM_3, 0x47000, MIXER_MSM8998_MASK,
                &msm8998_lm_sblk, PINGPONG_MAX, 0, 0),
        LM_BLK("lm_4", LM_4, 0x48000, MIXER_MSM8998_MASK,
                &msm8998_lm_sblk, PINGPONG_MAX, 0, 0),
        LM_BLK("lm_5", LM_5, 0x49000, MIXER_MSM8998_MASK,
-               &msm8998_lm_sblk, PINGPONG_3, LM_1, 0),
+               &msm8998_lm_sblk, PINGPONG_3, LM_2, 0),
 };
 
 static const struct dpu_pingpong_cfg msm8998_pp[] = {
@@ -134,10 +134,10 @@ static const struct dpu_dspp_cfg msm8998_dspp[] = {
 };
 
 static const struct dpu_intf_cfg msm8998_intf[] = {
-       INTF_BLK("intf_0", INTF_0, 0x6a000, 0x280, INTF_DP, 0, 25, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 24, 25),
-       INTF_BLK("intf_1", INTF_1, 0x6a800, 0x280, INTF_DSI, 0, 25, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 26, 27),
-       INTF_BLK("intf_2", INTF_2, 0x6b000, 0x280, INTF_DSI, 1, 25, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 28, 29),
-       INTF_BLK("intf_3", INTF_3, 0x6b800, 0x280, INTF_HDMI, 0, 25, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 30, 31),
+       INTF_BLK("intf_0", INTF_0, 0x6a000, 0x280, INTF_DP, 0, 21, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 24, 25),
+       INTF_BLK("intf_1", INTF_1, 0x6a800, 0x280, INTF_DSI, 0, 21, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 26, 27),
+       INTF_BLK("intf_2", INTF_2, 0x6b000, 0x280, INTF_DSI, 1, 21, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 28, 29),
+       INTF_BLK("intf_3", INTF_3, 0x6b800, 0x280, INTF_HDMI, 0, 21, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 30, 31),
 };
 
 static const struct dpu_perf_cfg msm8998_perf_data = {
index 282d410..42b0e58 100644 (file)
@@ -128,10 +128,10 @@ static const struct dpu_dspp_cfg sm8150_dspp[] = {
 };
 
 static const struct dpu_pingpong_cfg sm8150_pp[] = {
-       PP_BLK_TE("pingpong_0", PINGPONG_0, 0x70000, MERGE_3D_0, sdm845_pp_sblk_te,
+       PP_BLK("pingpong_0", PINGPONG_0, 0x70000, MERGE_3D_0, sdm845_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 8),
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 12)),
-       PP_BLK_TE("pingpong_1", PINGPONG_1, 0x70800, MERGE_3D_0, sdm845_pp_sblk_te,
+       PP_BLK("pingpong_1", PINGPONG_1, 0x70800, MERGE_3D_0, sdm845_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 9),
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 13)),
        PP_BLK("pingpong_2", PINGPONG_2, 0x71000, MERGE_3D_1, sdm845_pp_sblk,
index c574002..e3bdfe7 100644 (file)
@@ -116,10 +116,10 @@ static const struct dpu_lm_cfg sc8180x_lm[] = {
 };
 
 static const struct dpu_pingpong_cfg sc8180x_pp[] = {
-       PP_BLK_TE("pingpong_0", PINGPONG_0, 0x70000, MERGE_3D_0, sdm845_pp_sblk_te,
+       PP_BLK("pingpong_0", PINGPONG_0, 0x70000, MERGE_3D_0, sdm845_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 8),
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 12)),
-       PP_BLK_TE("pingpong_1", PINGPONG_1, 0x70800, MERGE_3D_0, sdm845_pp_sblk_te,
+       PP_BLK("pingpong_1", PINGPONG_1, 0x70800, MERGE_3D_0, sdm845_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 9),
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 13)),
        PP_BLK("pingpong_2", PINGPONG_2, 0x71000, MERGE_3D_1, sdm845_pp_sblk,
index 2c40229..ed13058 100644 (file)
@@ -129,10 +129,10 @@ static const struct dpu_dspp_cfg sm8250_dspp[] = {
 };
 
 static const struct dpu_pingpong_cfg sm8250_pp[] = {
-       PP_BLK_TE("pingpong_0", PINGPONG_0, 0x70000, MERGE_3D_0, sdm845_pp_sblk_te,
+       PP_BLK("pingpong_0", PINGPONG_0, 0x70000, MERGE_3D_0, sdm845_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 8),
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 12)),
-       PP_BLK_TE("pingpong_1", PINGPONG_1, 0x70800, MERGE_3D_0, sdm845_pp_sblk_te,
+       PP_BLK("pingpong_1", PINGPONG_1, 0x70800, MERGE_3D_0, sdm845_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 9),
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 13)),
        PP_BLK("pingpong_2", PINGPONG_2, 0x71000, MERGE_3D_1, sdm845_pp_sblk,
index 8799ed7..a46b117 100644 (file)
@@ -80,8 +80,8 @@ static const struct dpu_dspp_cfg sc7180_dspp[] = {
 };
 
 static const struct dpu_pingpong_cfg sc7180_pp[] = {
-       PP_BLK_TE("pingpong_0", PINGPONG_0, 0x70000, 0, sdm845_pp_sblk_te, -1, -1),
-       PP_BLK_TE("pingpong_1", PINGPONG_1, 0x70800, 0, sdm845_pp_sblk_te, -1, -1),
+       PP_BLK("pingpong_0", PINGPONG_0, 0x70000, 0, sdm845_pp_sblk, -1, -1),
+       PP_BLK("pingpong_1", PINGPONG_1, 0x70800, 0, sdm845_pp_sblk, -1, -1),
 };
 
 static const struct dpu_intf_cfg sc7180_intf[] = {
index 6f04d8f..988d820 100644 (file)
@@ -122,7 +122,6 @@ const struct dpu_mdss_cfg dpu_sm6115_cfg = {
        .mdss_irqs = BIT(MDP_SSPP_TOP0_INTR) | \
                     BIT(MDP_SSPP_TOP0_INTR2) | \
                     BIT(MDP_SSPP_TOP0_HIST_INTR) | \
-                    BIT(MDP_INTF0_INTR) | \
                     BIT(MDP_INTF1_INTR),
 };
 
index 303492d..c9003dc 100644 (file)
@@ -112,7 +112,6 @@ const struct dpu_mdss_cfg dpu_qcm2290_cfg = {
        .mdss_irqs = BIT(MDP_SSPP_TOP0_INTR) | \
                     BIT(MDP_SSPP_TOP0_INTR2) | \
                     BIT(MDP_SSPP_TOP0_HIST_INTR) | \
-                    BIT(MDP_INTF0_INTR) | \
                     BIT(MDP_INTF1_INTR),
 };
 
index ca107ca..4f6a965 100644 (file)
@@ -127,22 +127,22 @@ static const struct dpu_dspp_cfg sm8350_dspp[] = {
 };
 
 static const struct dpu_pingpong_cfg sm8350_pp[] = {
-       PP_BLK_TE("pingpong_0", PINGPONG_0, 0x69000, MERGE_3D_0, sdm845_pp_sblk_te,
+       PP_BLK_DITHER("pingpong_0", PINGPONG_0, 0x69000, MERGE_3D_0, sc7280_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 8),
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 12)),
-       PP_BLK_TE("pingpong_1", PINGPONG_1, 0x6a000, MERGE_3D_0, sdm845_pp_sblk_te,
+       PP_BLK_DITHER("pingpong_1", PINGPONG_1, 0x6a000, MERGE_3D_0, sc7280_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 9),
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 13)),
-       PP_BLK("pingpong_2", PINGPONG_2, 0x6b000, MERGE_3D_1, sdm845_pp_sblk,
+       PP_BLK_DITHER("pingpong_2", PINGPONG_2, 0x6b000, MERGE_3D_1, sc7280_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 10),
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 14)),
-       PP_BLK("pingpong_3", PINGPONG_3, 0x6c000, MERGE_3D_1, sdm845_pp_sblk,
+       PP_BLK_DITHER("pingpong_3", PINGPONG_3, 0x6c000, MERGE_3D_1, sc7280_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 11),
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 15)),
-       PP_BLK("pingpong_4", PINGPONG_4, 0x6d000, MERGE_3D_2, sdm845_pp_sblk,
+       PP_BLK_DITHER("pingpong_4", PINGPONG_4, 0x6d000, MERGE_3D_2, sc7280_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 30),
                        -1),
-       PP_BLK("pingpong_5", PINGPONG_5, 0x6e000, MERGE_3D_2, sdm845_pp_sblk,
+       PP_BLK_DITHER("pingpong_5", PINGPONG_5, 0x6e000, MERGE_3D_2, sc7280_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 31),
                        -1),
 };
index 5957de1..6b2c7ea 100644 (file)
@@ -87,10 +87,10 @@ static const struct dpu_dspp_cfg sc7280_dspp[] = {
 };
 
 static const struct dpu_pingpong_cfg sc7280_pp[] = {
-       PP_BLK("pingpong_0", PINGPONG_0, 0x69000, 0, sc7280_pp_sblk, -1, -1),
-       PP_BLK("pingpong_1", PINGPONG_1, 0x6a000, 0, sc7280_pp_sblk, -1, -1),
-       PP_BLK("pingpong_2", PINGPONG_2, 0x6b000, 0, sc7280_pp_sblk, -1, -1),
-       PP_BLK("pingpong_3", PINGPONG_3, 0x6c000, 0, sc7280_pp_sblk, -1, -1),
+       PP_BLK_DITHER("pingpong_0", PINGPONG_0, 0x69000, 0, sc7280_pp_sblk, -1, -1),
+       PP_BLK_DITHER("pingpong_1", PINGPONG_1, 0x6a000, 0, sc7280_pp_sblk, -1, -1),
+       PP_BLK_DITHER("pingpong_2", PINGPONG_2, 0x6b000, 0, sc7280_pp_sblk, -1, -1),
+       PP_BLK_DITHER("pingpong_3", PINGPONG_3, 0x6c000, 0, sc7280_pp_sblk, -1, -1),
 };
 
 static const struct dpu_intf_cfg sc7280_intf[] = {
index 9aab110..706d0f1 100644 (file)
@@ -121,18 +121,18 @@ static const struct dpu_dspp_cfg sc8280xp_dspp[] = {
 };
 
 static const struct dpu_pingpong_cfg sc8280xp_pp[] = {
-       PP_BLK_TE("pingpong_0", PINGPONG_0, 0x69000, MERGE_3D_0, sdm845_pp_sblk_te,
-                 DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 8), -1),
-       PP_BLK_TE("pingpong_1", PINGPONG_1, 0x6a000, MERGE_3D_0, sdm845_pp_sblk_te,
-                 DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 9), -1),
-       PP_BLK_TE("pingpong_2", PINGPONG_2, 0x6b000, MERGE_3D_1, sdm845_pp_sblk_te,
-                 DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 10), -1),
-       PP_BLK_TE("pingpong_3", PINGPONG_3, 0x6c000, MERGE_3D_1, sdm845_pp_sblk_te,
-                 DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 11), -1),
-       PP_BLK_TE("pingpong_4", PINGPONG_4, 0x6d000, MERGE_3D_2, sdm845_pp_sblk_te,
-                 DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 30), -1),
-       PP_BLK_TE("pingpong_5", PINGPONG_5, 0x6e000, MERGE_3D_2, sdm845_pp_sblk_te,
-                 DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 31), -1),
+       PP_BLK_DITHER("pingpong_0", PINGPONG_0, 0x69000, MERGE_3D_0, sc7280_pp_sblk,
+                       DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 8), -1),
+       PP_BLK_DITHER("pingpong_1", PINGPONG_1, 0x6a000, MERGE_3D_0, sc7280_pp_sblk,
+                       DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 9), -1),
+       PP_BLK_DITHER("pingpong_2", PINGPONG_2, 0x6b000, MERGE_3D_1, sc7280_pp_sblk,
+                       DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 10), -1),
+       PP_BLK_DITHER("pingpong_3", PINGPONG_3, 0x6c000, MERGE_3D_1, sc7280_pp_sblk,
+                       DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 11), -1),
+       PP_BLK_DITHER("pingpong_4", PINGPONG_4, 0x6d000, MERGE_3D_2, sc7280_pp_sblk,
+                       DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 30), -1),
+       PP_BLK_DITHER("pingpong_5", PINGPONG_5, 0x6e000, MERGE_3D_2, sc7280_pp_sblk,
+                       DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 31), -1),
 };
 
 static const struct dpu_merge_3d_cfg sc8280xp_merge_3d[] = {
index 02a259b..4ecb3df 100644 (file)
@@ -128,28 +128,28 @@ static const struct dpu_dspp_cfg sm8450_dspp[] = {
 };
 /* FIXME: interrupts */
 static const struct dpu_pingpong_cfg sm8450_pp[] = {
-       PP_BLK_TE("pingpong_0", PINGPONG_0, 0x69000, MERGE_3D_0, sdm845_pp_sblk_te,
+       PP_BLK_DITHER("pingpong_0", PINGPONG_0, 0x69000, MERGE_3D_0, sc7280_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 8),
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 12)),
-       PP_BLK_TE("pingpong_1", PINGPONG_1, 0x6a000, MERGE_3D_0, sdm845_pp_sblk_te,
+       PP_BLK_DITHER("pingpong_1", PINGPONG_1, 0x6a000, MERGE_3D_0, sc7280_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 9),
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 13)),
-       PP_BLK("pingpong_2", PINGPONG_2, 0x6b000, MERGE_3D_1, sdm845_pp_sblk,
+       PP_BLK_DITHER("pingpong_2", PINGPONG_2, 0x6b000, MERGE_3D_1, sc7280_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 10),
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 14)),
-       PP_BLK("pingpong_3", PINGPONG_3, 0x6c000, MERGE_3D_1, sdm845_pp_sblk,
+       PP_BLK_DITHER("pingpong_3", PINGPONG_3, 0x6c000, MERGE_3D_1, sc7280_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 11),
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 15)),
-       PP_BLK("pingpong_4", PINGPONG_4, 0x6d000, MERGE_3D_2, sdm845_pp_sblk,
+       PP_BLK_DITHER("pingpong_4", PINGPONG_4, 0x6d000, MERGE_3D_2, sc7280_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 30),
                        -1),
-       PP_BLK("pingpong_5", PINGPONG_5, 0x6e000, MERGE_3D_2, sdm845_pp_sblk,
+       PP_BLK_DITHER("pingpong_5", PINGPONG_5, 0x6e000, MERGE_3D_2, sc7280_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 31),
                        -1),
-       PP_BLK("pingpong_6", PINGPONG_6, 0x65800, MERGE_3D_3, sdm845_pp_sblk,
+       PP_BLK_DITHER("pingpong_6", PINGPONG_6, 0x65800, MERGE_3D_3, sc7280_pp_sblk,
                        -1,
                        -1),
-       PP_BLK("pingpong_7", PINGPONG_7, 0x65c00, MERGE_3D_3, sdm845_pp_sblk,
+       PP_BLK_DITHER("pingpong_7", PINGPONG_7, 0x65c00, MERGE_3D_3, sc7280_pp_sblk,
                        -1,
                        -1),
 };
index 9e40303..d0ab351 100644 (file)
@@ -132,28 +132,28 @@ static const struct dpu_dspp_cfg sm8550_dspp[] = {
                 &sm8150_dspp_sblk),
 };
 static const struct dpu_pingpong_cfg sm8550_pp[] = {
-       PP_BLK_DIPHER("pingpong_0", PINGPONG_0, 0x69000, MERGE_3D_0, sc7280_pp_sblk,
+       PP_BLK_DITHER("pingpong_0", PINGPONG_0, 0x69000, MERGE_3D_0, sc7280_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 8),
                        -1),
-       PP_BLK_DIPHER("pingpong_1", PINGPONG_1, 0x6a000, MERGE_3D_0, sc7280_pp_sblk,
+       PP_BLK_DITHER("pingpong_1", PINGPONG_1, 0x6a000, MERGE_3D_0, sc7280_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 9),
                        -1),
-       PP_BLK_DIPHER("pingpong_2", PINGPONG_2, 0x6b000, MERGE_3D_1, sc7280_pp_sblk,
+       PP_BLK_DITHER("pingpong_2", PINGPONG_2, 0x6b000, MERGE_3D_1, sc7280_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 10),
                        -1),
-       PP_BLK_DIPHER("pingpong_3", PINGPONG_3, 0x6c000, MERGE_3D_1, sc7280_pp_sblk,
+       PP_BLK_DITHER("pingpong_3", PINGPONG_3, 0x6c000, MERGE_3D_1, sc7280_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR, 11),
                        -1),
-       PP_BLK_DIPHER("pingpong_4", PINGPONG_4, 0x6d000, MERGE_3D_2, sc7280_pp_sblk,
+       PP_BLK_DITHER("pingpong_4", PINGPONG_4, 0x6d000, MERGE_3D_2, sc7280_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 30),
                        -1),
-       PP_BLK_DIPHER("pingpong_5", PINGPONG_5, 0x6e000, MERGE_3D_2, sc7280_pp_sblk,
+       PP_BLK_DITHER("pingpong_5", PINGPONG_5, 0x6e000, MERGE_3D_2, sc7280_pp_sblk,
                        DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 31),
                        -1),
-       PP_BLK_DIPHER("pingpong_6", PINGPONG_6, 0x66000, MERGE_3D_3, sc7280_pp_sblk,
+       PP_BLK_DITHER("pingpong_6", PINGPONG_6, 0x66000, MERGE_3D_3, sc7280_pp_sblk,
                        -1,
                        -1),
-       PP_BLK_DIPHER("pingpong_7", PINGPONG_7, 0x66400, MERGE_3D_3, sc7280_pp_sblk,
+       PP_BLK_DITHER("pingpong_7", PINGPONG_7, 0x66400, MERGE_3D_3, sc7280_pp_sblk,
                        -1,
                        -1),
 };
index 03f162a..5d994bc 100644 (file)
@@ -491,7 +491,7 @@ static const struct dpu_pingpong_sub_blks sc7280_pp_sblk = {
        .len = 0x20, .version = 0x20000},
 };
 
-#define PP_BLK_DIPHER(_name, _id, _base, _merge_3d, _sblk, _done, _rdptr) \
+#define PP_BLK_DITHER(_name, _id, _base, _merge_3d, _sblk, _done, _rdptr) \
        {\
        .name = _name, .id = _id, \
        .base = _base, .len = 0, \
@@ -587,12 +587,12 @@ static const u32 sdm845_nrt_pri_lvl[] = {3, 3, 3, 3, 3, 3, 3, 3};
 
 static const struct dpu_vbif_dynamic_ot_cfg msm8998_ot_rdwr_cfg[] = {
        {
-               .pps = 1088 * 1920 * 30,
+               .pps = 1920 * 1080 * 30,
                .ot_limit = 2,
        },
        {
-               .pps = 1088 * 1920 * 60,
-               .ot_limit = 6,
+               .pps = 1920 * 1080 * 60,
+               .ot_limit = 4,
        },
        {
                .pps = 3840 * 2160 * 30,
@@ -705,10 +705,7 @@ static const struct dpu_qos_lut_entry msm8998_qos_linear[] = {
        {.fl = 10, .lut = 0x1555b},
        {.fl = 11, .lut = 0x5555b},
        {.fl = 12, .lut = 0x15555b},
-       {.fl = 13, .lut = 0x55555b},
-       {.fl = 14, .lut = 0},
-       {.fl = 1,  .lut = 0x1b},
-       {.fl = 0,  .lut = 0}
+       {.fl = 0,  .lut = 0x55555b}
 };
 
 static const struct dpu_qos_lut_entry sdm845_qos_linear[] = {
@@ -730,9 +727,7 @@ static const struct dpu_qos_lut_entry msm8998_qos_macrotile[] = {
        {.fl = 10, .lut = 0x1aaff},
        {.fl = 11, .lut = 0x5aaff},
        {.fl = 12, .lut = 0x15aaff},
-       {.fl = 13, .lut = 0x55aaff},
-       {.fl = 1,  .lut = 0x1aaff},
-       {.fl = 0,  .lut = 0},
+       {.fl = 0,  .lut = 0x55aaff},
 };
 
 static const struct dpu_qos_lut_entry sc7180_qos_linear[] = {
index 53326f2..17f3e7e 100644 (file)
@@ -15,7 +15,7 @@
 
 /*
  * Register offsets in MDSS register file for the interrupt registers
- * w.r.t. to the MDP base
+ * w.r.t. the MDP base
  */
 #define MDP_SSPP_TOP0_OFF              0x0
 #define MDP_INTF_0_OFF                 0x6A000
 #define MDP_INTF_3_OFF                 0x6B800
 #define MDP_INTF_4_OFF                 0x6C000
 #define MDP_INTF_5_OFF                 0x6C800
+#define INTF_INTR_EN                   0x1c0
+#define INTF_INTR_STATUS               0x1c4
+#define INTF_INTR_CLEAR                        0x1c8
 #define MDP_AD4_0_OFF                  0x7C000
 #define MDP_AD4_1_OFF                  0x7D000
 #define MDP_AD4_INTR_EN_OFF            0x41c
 #define MDP_AD4_INTR_CLEAR_OFF         0x424
 #define MDP_AD4_INTR_STATUS_OFF                0x420
-#define MDP_INTF_0_OFF_REV_7xxx             0x34000
-#define MDP_INTF_1_OFF_REV_7xxx             0x35000
-#define MDP_INTF_2_OFF_REV_7xxx             0x36000
-#define MDP_INTF_3_OFF_REV_7xxx             0x37000
-#define MDP_INTF_4_OFF_REV_7xxx             0x38000
-#define MDP_INTF_5_OFF_REV_7xxx             0x39000
-#define MDP_INTF_6_OFF_REV_7xxx             0x3a000
-#define MDP_INTF_7_OFF_REV_7xxx             0x3b000
-#define MDP_INTF_8_OFF_REV_7xxx             0x3c000
+#define MDP_INTF_0_OFF_REV_7xxx                0x34000
+#define MDP_INTF_1_OFF_REV_7xxx                0x35000
+#define MDP_INTF_2_OFF_REV_7xxx                0x36000
+#define MDP_INTF_3_OFF_REV_7xxx                0x37000
+#define MDP_INTF_4_OFF_REV_7xxx                0x38000
+#define MDP_INTF_5_OFF_REV_7xxx                0x39000
+#define MDP_INTF_6_OFF_REV_7xxx                0x3a000
+#define MDP_INTF_7_OFF_REV_7xxx                0x3b000
+#define MDP_INTF_8_OFF_REV_7xxx                0x3c000
 
 /**
  * struct dpu_intr_reg - array of DPU register sets
index 84ee2ef..b9dddf5 100644 (file)
 #define   INTF_TPG_RGB_MAPPING          0x11C
 #define   INTF_PROG_FETCH_START         0x170
 #define   INTF_PROG_ROT_START           0x174
-
-#define   INTF_FRAME_LINE_COUNT_EN      0x0A8
-#define   INTF_FRAME_COUNT              0x0AC
-#define   INTF_LINE_COUNT               0x0B0
-
 #define   INTF_MUX                      0x25C
 #define   INTF_STATUS                   0x26C
 
index 2d28afd..a3e413d 100644 (file)
@@ -61,6 +61,7 @@ static const struct dpu_wb_cfg *_wb_offset(enum dpu_wb wb,
        for (i = 0; i < m->wb_count; i++) {
                if (wb == m->wb[i].id) {
                        b->blk_addr = addr + m->wb[i].base;
+                       b->log_mask = DPU_DBG_MASK_WB;
                        return &m->wb[i];
                }
        }
index feb9a72..5acd568 100644 (file)
@@ -21,9 +21,6 @@
 #define HIST_INTR_EN                    0x01c
 #define HIST_INTR_STATUS                0x020
 #define HIST_INTR_CLEAR                 0x024
-#define INTF_INTR_EN                    0x1C0
-#define INTF_INTR_STATUS                0x1C4
-#define INTF_INTR_CLEAR                 0x1C8
 #define SPLIT_DISPLAY_EN                0x2F4
 #define SPLIT_DISPLAY_UPPER_PIPE_CTRL   0x2F8
 #define DSPP_IGC_COLOR0_RAM_LUTN        0x300
index 6666783..1245c7a 100644 (file)
@@ -593,6 +593,18 @@ static struct hdmi_codec_pdata codec_data = {
        .i2s = 1,
 };
 
+void dp_unregister_audio_driver(struct device *dev, struct dp_audio *dp_audio)
+{
+       struct dp_audio_private *audio_priv;
+
+       audio_priv = container_of(dp_audio, struct dp_audio_private, dp_audio);
+
+       if (audio_priv->audio_pdev) {
+               platform_device_unregister(audio_priv->audio_pdev);
+               audio_priv->audio_pdev = NULL;
+       }
+}
+
 int dp_register_audio_driver(struct device *dev,
                struct dp_audio *dp_audio)
 {
index 84e5f4a..4ab7888 100644 (file)
@@ -53,6 +53,8 @@ struct dp_audio *dp_audio_get(struct platform_device *pdev,
 int dp_register_audio_driver(struct device *dev,
                struct dp_audio *dp_audio);
 
+void dp_unregister_audio_driver(struct device *dev, struct dp_audio *dp_audio);
+
 /**
  * dp_audio_put()
  *
index 7a8cf1c..5142aeb 100644 (file)
@@ -620,7 +620,7 @@ void dp_catalog_hpd_config_intr(struct dp_catalog *dp_catalog,
                                config & DP_DP_HPD_INT_MASK);
 }
 
-void dp_catalog_ctrl_hpd_config(struct dp_catalog *dp_catalog)
+void dp_catalog_ctrl_hpd_enable(struct dp_catalog *dp_catalog)
 {
        struct dp_catalog_private *catalog = container_of(dp_catalog,
                                struct dp_catalog_private, dp_catalog);
@@ -635,6 +635,19 @@ void dp_catalog_ctrl_hpd_config(struct dp_catalog *dp_catalog)
        dp_write_aux(catalog, REG_DP_DP_HPD_CTRL, DP_DP_HPD_CTRL_HPD_EN);
 }
 
+void dp_catalog_ctrl_hpd_disable(struct dp_catalog *dp_catalog)
+{
+       struct dp_catalog_private *catalog = container_of(dp_catalog,
+                               struct dp_catalog_private, dp_catalog);
+
+       u32 reftimer = dp_read_aux(catalog, REG_DP_DP_HPD_REFTIMER);
+
+       reftimer &= ~DP_DP_HPD_REFTIMER_ENABLE;
+       dp_write_aux(catalog, REG_DP_DP_HPD_REFTIMER, reftimer);
+
+       dp_write_aux(catalog, REG_DP_DP_HPD_CTRL, 0);
+}
+
 static void dp_catalog_enable_sdp(struct dp_catalog_private *catalog)
 {
        /* trigger sdp */
index 82376a2..38786e8 100644 (file)
@@ -104,7 +104,8 @@ bool dp_catalog_ctrl_mainlink_ready(struct dp_catalog *dp_catalog);
 void dp_catalog_ctrl_enable_irq(struct dp_catalog *dp_catalog, bool enable);
 void dp_catalog_hpd_config_intr(struct dp_catalog *dp_catalog,
                        u32 intr_mask, bool en);
-void dp_catalog_ctrl_hpd_config(struct dp_catalog *dp_catalog);
+void dp_catalog_ctrl_hpd_enable(struct dp_catalog *dp_catalog);
+void dp_catalog_ctrl_hpd_disable(struct dp_catalog *dp_catalog);
 void dp_catalog_ctrl_config_psr(struct dp_catalog *dp_catalog);
 void dp_catalog_ctrl_set_psr(struct dp_catalog *dp_catalog, bool enter);
 u32 dp_catalog_link_is_connected(struct dp_catalog *dp_catalog);
index 3e13acd..03b0eda 100644 (file)
 #include "dp_audio.h"
 #include "dp_debug.h"
 
+static bool psr_enabled = false;
+module_param(psr_enabled, bool, 0);
+MODULE_PARM_DESC(psr_enabled, "enable PSR for eDP and DP displays");
+
 #define HPD_STRING_SIZE 30
 
 enum {
@@ -326,6 +330,7 @@ static void dp_display_unbind(struct device *dev, struct device *master,
        kthread_stop(dp->ev_tsk);
 
        dp_power_client_deinit(dp->power);
+       dp_unregister_audio_driver(dev, dp->audio);
        dp_aux_unregister(dp->aux);
        dp->drm_dev = NULL;
        dp->aux->drm_dev = NULL;
@@ -406,7 +411,7 @@ static int dp_display_process_hpd_high(struct dp_display_private *dp)
 
        edid = dp->panel->edid;
 
-       dp->dp_display.psr_supported = dp->panel->psr_cap.version;
+       dp->dp_display.psr_supported = dp->panel->psr_cap.version && psr_enabled;
 
        dp->audio_supported = drm_detect_monitor_audio(edid);
        dp_panel_handle_sink_request(dp->panel);
@@ -615,12 +620,6 @@ static int dp_hpd_plug_handle(struct dp_display_private *dp, u32 data)
                dp->hpd_state = ST_MAINLINK_READY;
        }
 
-       /* enable HDP irq_hpd/replug interrupt */
-       if (dp->dp_display.internal_hpd)
-               dp_catalog_hpd_config_intr(dp->catalog,
-                                          DP_DP_IRQ_HPD_INT_MASK | DP_DP_HPD_REPLUG_INT_MASK,
-                                          true);
-
        drm_dbg_dp(dp->drm_dev, "After, type=%d hpd_state=%d\n",
                        dp->dp_display.connector_type, state);
        mutex_unlock(&dp->event_mutex);
@@ -658,12 +657,6 @@ static int dp_hpd_unplug_handle(struct dp_display_private *dp, u32 data)
        drm_dbg_dp(dp->drm_dev, "Before, type=%d hpd_state=%d\n",
                        dp->dp_display.connector_type, state);
 
-       /* disable irq_hpd/replug interrupts */
-       if (dp->dp_display.internal_hpd)
-               dp_catalog_hpd_config_intr(dp->catalog,
-                                          DP_DP_IRQ_HPD_INT_MASK | DP_DP_HPD_REPLUG_INT_MASK,
-                                          false);
-
        /* unplugged, no more irq_hpd handle */
        dp_del_event(dp, EV_IRQ_HPD_INT);
 
@@ -687,10 +680,6 @@ static int dp_hpd_unplug_handle(struct dp_display_private *dp, u32 data)
                return 0;
        }
 
-       /* disable HPD plug interrupts */
-       if (dp->dp_display.internal_hpd)
-               dp_catalog_hpd_config_intr(dp->catalog, DP_DP_HPD_PLUG_INT_MASK, false);
-
        /*
         * We don't need separate work for disconnect as
         * connect/attention interrupts are disabled
@@ -706,10 +695,6 @@ static int dp_hpd_unplug_handle(struct dp_display_private *dp, u32 data)
        /* signal the disconnect event early to ensure proper teardown */
        dp_display_handle_plugged_change(&dp->dp_display, false);
 
-       /* enable HDP plug interrupt to prepare for next plugin */
-       if (dp->dp_display.internal_hpd)
-               dp_catalog_hpd_config_intr(dp->catalog, DP_DP_HPD_PLUG_INT_MASK, true);
-
        drm_dbg_dp(dp->drm_dev, "After, type=%d hpd_state=%d\n",
                        dp->dp_display.connector_type, state);
 
@@ -1082,26 +1067,6 @@ void msm_dp_snapshot(struct msm_disp_state *disp_state, struct msm_dp *dp)
        mutex_unlock(&dp_display->event_mutex);
 }
 
-static void dp_display_config_hpd(struct dp_display_private *dp)
-{
-
-       dp_display_host_init(dp);
-       dp_catalog_ctrl_hpd_config(dp->catalog);
-
-       /* Enable plug and unplug interrupts only if requested */
-       if (dp->dp_display.internal_hpd)
-               dp_catalog_hpd_config_intr(dp->catalog,
-                               DP_DP_HPD_PLUG_INT_MASK |
-                               DP_DP_HPD_UNPLUG_INT_MASK,
-                               true);
-
-       /* Enable interrupt first time
-        * we are leaving dp clocks on during disconnect
-        * and never disable interrupt
-        */
-       enable_irq(dp->irq);
-}
-
 void dp_display_set_psr(struct msm_dp *dp_display, bool enter)
 {
        struct dp_display_private *dp;
@@ -1176,7 +1141,7 @@ static int hpd_event_thread(void *data)
 
                switch (todo->event_id) {
                case EV_HPD_INIT_SETUP:
-                       dp_display_config_hpd(dp_priv);
+                       dp_display_host_init(dp_priv);
                        break;
                case EV_HPD_PLUG_INT:
                        dp_hpd_plug_handle(dp_priv, todo->data);
@@ -1282,7 +1247,6 @@ int dp_display_request_irq(struct msm_dp *dp_display)
                                dp->irq, rc);
                return rc;
        }
-       disable_irq(dp->irq);
 
        return 0;
 }
@@ -1394,13 +1358,8 @@ static int dp_pm_resume(struct device *dev)
        /* turn on dp ctrl/phy */
        dp_display_host_init(dp);
 
-       dp_catalog_ctrl_hpd_config(dp->catalog);
-
-       if (dp->dp_display.internal_hpd)
-               dp_catalog_hpd_config_intr(dp->catalog,
-                               DP_DP_HPD_PLUG_INT_MASK |
-                               DP_DP_HPD_UNPLUG_INT_MASK,
-                               true);
+       if (dp_display->is_edp)
+               dp_catalog_ctrl_hpd_enable(dp->catalog);
 
        if (dp_catalog_link_is_connected(dp->catalog)) {
                /*
@@ -1568,9 +1527,8 @@ static int dp_display_get_next_bridge(struct msm_dp *dp)
 
        if (aux_bus && dp->is_edp) {
                dp_display_host_init(dp_priv);
-               dp_catalog_ctrl_hpd_config(dp_priv->catalog);
+               dp_catalog_ctrl_hpd_enable(dp_priv->catalog);
                dp_display_host_phy_init(dp_priv);
-               enable_irq(dp_priv->irq);
 
                /*
                 * The code below assumes that the panel will finish probing
@@ -1612,7 +1570,6 @@ static int dp_display_get_next_bridge(struct msm_dp *dp)
 
 error:
        if (dp->is_edp) {
-               disable_irq(dp_priv->irq);
                dp_display_host_phy_exit(dp_priv);
                dp_display_host_deinit(dp_priv);
        }
@@ -1801,16 +1758,31 @@ void dp_bridge_hpd_enable(struct drm_bridge *bridge)
 {
        struct msm_dp_bridge *dp_bridge = to_dp_bridge(bridge);
        struct msm_dp *dp_display = dp_bridge->dp_display;
+       struct dp_display_private *dp = container_of(dp_display, struct dp_display_private, dp_display);
+
+       mutex_lock(&dp->event_mutex);
+       dp_catalog_ctrl_hpd_enable(dp->catalog);
+
+       /* enable HDP interrupts */
+       dp_catalog_hpd_config_intr(dp->catalog, DP_DP_HPD_INT_MASK, true);
 
        dp_display->internal_hpd = true;
+       mutex_unlock(&dp->event_mutex);
 }
 
 void dp_bridge_hpd_disable(struct drm_bridge *bridge)
 {
        struct msm_dp_bridge *dp_bridge = to_dp_bridge(bridge);
        struct msm_dp *dp_display = dp_bridge->dp_display;
+       struct dp_display_private *dp = container_of(dp_display, struct dp_display_private, dp_display);
+
+       mutex_lock(&dp->event_mutex);
+       /* disable HDP interrupts */
+       dp_catalog_hpd_config_intr(dp->catalog, DP_DP_HPD_INT_MASK, false);
+       dp_catalog_ctrl_hpd_disable(dp->catalog);
 
        dp_display->internal_hpd = false;
+       mutex_unlock(&dp->event_mutex);
 }
 
 void dp_bridge_hpd_notify(struct drm_bridge *bridge,
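dp_display.c now gates PSR behind a boolean module parameter, keeping it off unless explicitly requested. The general shape of that pattern, as a small sketch with made-up module and parameter names:

#include <linux/module.h>
#include <linux/moduleparam.h>

/*
 * Off by default; enable with mydrv.enable_feature=1 on the kernel command
 * line or "modprobe mydrv enable_feature=1".
 */
static bool enable_feature;
module_param(enable_feature, bool, 0);
MODULE_PARM_DESC(enable_feature, "enable the optional feature");

static bool feature_supported(bool hw_capable)
{
	/* Require both hardware capability and the explicit opt-in. */
	return hw_capable && enable_feature;
}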
index d77fa97..9c45d64 100644 (file)
@@ -155,6 +155,8 @@ static bool can_do_async(struct drm_atomic_state *state,
        for_each_new_crtc_in_state(state, crtc, crtc_state, i) {
                if (drm_atomic_crtc_needs_modeset(crtc_state))
                        return false;
+               if (!crtc_state->active)
+                       return false;
                if (++num_crtcs > 1)
                        return false;
                *async_crtc = crtc;
index b4cfa44..463ca41 100644 (file)
@@ -449,6 +449,8 @@ static int msm_drm_init(struct device *dev, const struct drm_driver *drv)
        if (ret)
                goto err_cleanup_mode_config;
 
+       dma_set_max_seg_size(dev, UINT_MAX);
+
        /* Bind all our sub-components: */
        ret = component_bind_all(dev, ddev);
        if (ret)
@@ -459,8 +461,6 @@ static int msm_drm_init(struct device *dev, const struct drm_driver *drv)
        if (ret)
                goto err_msm_uninit;
 
-       dma_set_max_seg_size(dev, UINT_MAX);
-
        msm_gem_shrinker_init(ddev);
 
        if (priv->kms_init) {
index db6c4e2..cd39b9d 100644 (file)
@@ -219,7 +219,8 @@ static void put_pages(struct drm_gem_object *obj)
        }
 }
 
-static struct page **msm_gem_pin_pages_locked(struct drm_gem_object *obj)
+static struct page **msm_gem_pin_pages_locked(struct drm_gem_object *obj,
+                                             unsigned madv)
 {
        struct msm_drm_private *priv = obj->dev->dev_private;
        struct msm_gem_object *msm_obj = to_msm_bo(obj);
@@ -227,7 +228,9 @@ static struct page **msm_gem_pin_pages_locked(struct drm_gem_object *obj)
 
        msm_gem_assert_locked(obj);
 
-       if (GEM_WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED)) {
+       if (GEM_WARN_ON(msm_obj->madv > madv)) {
+               DRM_DEV_ERROR(obj->dev->dev, "Invalid madv state: %u vs %u\n",
+                       msm_obj->madv, madv);
                return ERR_PTR(-EBUSY);
        }
 
@@ -248,7 +251,7 @@ struct page **msm_gem_pin_pages(struct drm_gem_object *obj)
        struct page **p;
 
        msm_gem_lock(obj);
-       p = msm_gem_pin_pages_locked(obj);
+       p = msm_gem_pin_pages_locked(obj, MSM_MADV_WILLNEED);
        msm_gem_unlock(obj);
 
        return p;
@@ -473,10 +476,7 @@ int msm_gem_pin_vma_locked(struct drm_gem_object *obj, struct msm_gem_vma *vma)
 
        msm_gem_assert_locked(obj);
 
-       if (GEM_WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED))
-               return -EBUSY;
-
-       pages = msm_gem_pin_pages_locked(obj);
+       pages = msm_gem_pin_pages_locked(obj, MSM_MADV_WILLNEED);
        if (IS_ERR(pages))
                return PTR_ERR(pages);
 
@@ -699,13 +699,7 @@ static void *get_vaddr(struct drm_gem_object *obj, unsigned madv)
        if (obj->import_attach)
                return ERR_PTR(-ENODEV);
 
-       if (GEM_WARN_ON(msm_obj->madv > madv)) {
-               DRM_DEV_ERROR(obj->dev->dev, "Invalid madv state: %u vs %u\n",
-                       msm_obj->madv, madv);
-               return ERR_PTR(-EBUSY);
-       }
-
-       pages = msm_gem_pin_pages_locked(obj);
+       pages = msm_gem_pin_pages_locked(obj, madv);
        if (IS_ERR(pages))
                return ERR_CAST(pages);
 
index aff18c2..9f5933c 100644 (file)
@@ -722,7 +722,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
        struct msm_drm_private *priv = dev->dev_private;
        struct drm_msm_gem_submit *args = data;
        struct msm_file_private *ctx = file->driver_priv;
-       struct msm_gem_submit *submit;
+       struct msm_gem_submit *submit = NULL;
        struct msm_gpu *gpu = priv->gpu;
        struct msm_gpu_submitqueue *queue;
        struct msm_ringbuffer *ring;
@@ -769,13 +769,15 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
                out_fence_fd = get_unused_fd_flags(O_CLOEXEC);
                if (out_fence_fd < 0) {
                        ret = out_fence_fd;
-                       return ret;
+                       goto out_post_unlock;
                }
        }
 
        submit = submit_create(dev, gpu, queue, args->nr_bos, args->nr_cmds);
-       if (IS_ERR(submit))
-               return PTR_ERR(submit);
+       if (IS_ERR(submit)) {
+               ret = PTR_ERR(submit);
+               goto out_post_unlock;
+       }
 
        trace_msm_gpu_submit(pid_nr(submit->pid), ring->id, submit->ident,
                args->nr_bos, args->nr_cmds);
@@ -962,11 +964,20 @@ out:
        if (has_ww_ticket)
                ww_acquire_fini(&submit->ticket);
 out_unlock:
-       if (ret && (out_fence_fd >= 0))
-               put_unused_fd(out_fence_fd);
        mutex_unlock(&queue->lock);
 out_post_unlock:
-       msm_gem_submit_put(submit);
+       if (ret && (out_fence_fd >= 0))
+               put_unused_fd(out_fence_fd);
+
+       if (!IS_ERR_OR_NULL(submit)) {
+               msm_gem_submit_put(submit);
+       } else {
+               /*
+                * If the submit hasn't yet taken ownership of the queue
+                * then we need to drop the reference ourself:
+                */
+               msm_submitqueue_put(queue);
+       }
        if (!IS_ERR_OR_NULL(post_deps)) {
                for (i = 0; i < args->nr_out_syncobjs; ++i) {
                        kfree(post_deps[i].chain);
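The submit-ioctl fix above routes the early error paths through the common unwind labels so the reserved fence fd and the submitqueue reference are released exactly once. A generic sketch of that unwinding style, using illustrative resource names rather than the msm ones:

#include <linux/slab.h>
#include <linux/errno.h>

struct resource_a { int dummy; };
struct resource_b { int dummy; };

static int do_work(struct resource_a *a, struct resource_b *b)
{
	return 0;	/* stub */
}

static int submit_example(void)
{
	struct resource_a *a;
	struct resource_b *b = NULL;
	int ret;

	a = kzalloc(sizeof(*a), GFP_KERNEL);
	if (!a)
		return -ENOMEM;	/* nothing else owned yet: plain return is safe */

	b = kzalloc(sizeof(*b), GFP_KERNEL);
	if (!b) {
		ret = -ENOMEM;
		goto out;	/* 'a' is already owned: unwind via the label */
	}

	ret = do_work(a, b);
out:
	kfree(b);	/* kfree(NULL) is a no-op */
	kfree(a);
	return ret;
}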
index 418e1e0..5cc8d35 100644 (file)
@@ -234,7 +234,12 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
        /* Get the pagetable configuration from the domain */
        if (adreno_smmu->cookie)
                ttbr1_cfg = adreno_smmu->get_ttbr1_cfg(adreno_smmu->cookie);
-       if (!ttbr1_cfg)
+
+       /*
+        * If you hit this WARN_ONCE() you are probably missing an entry in
+        * qcom_smmu_impl_of_match[] in arm-smmu-qcom.c
+        */
+       if (WARN_ONCE(!ttbr1_cfg, "No per-process page tables"))
                return ERR_PTR(-ENODEV);
 
        pagetable = kzalloc(sizeof(*pagetable), GFP_KERNEL);
@@ -410,7 +415,7 @@ struct msm_mmu *msm_iommu_gpu_new(struct device *dev, struct msm_gpu *gpu, unsig
        struct msm_mmu *mmu;
 
        mmu = msm_iommu_new(dev, quirks);
-       if (IS_ERR(mmu))
+       if (IS_ERR_OR_NULL(mmu))
                return mmu;
 
        iommu = to_msm_iommu(mmu);
index eb99d84..16d4ad5 100644 (file)
@@ -2,6 +2,8 @@
 #ifndef __NVIF_IF0012_H__
 #define __NVIF_IF0012_H__
 
+#include <drm/display/drm_dp.h>
+
 union nvif_outp_args {
        struct nvif_outp_v0 {
                __u8 version;
@@ -63,7 +65,7 @@ union nvif_outp_acquire_args {
                                __u8 hda;
                                __u8 mst;
                                __u8 pad04[4];
-                               __u8 dpcd[16];
+                               __u8 dpcd[DP_RECEIVER_CAP_SIZE];
                        } dp;
                };
        } v0;
index 8cf096f..a2ae8c2 100644 (file)
@@ -220,6 +220,9 @@ static void nouveau_dsm_pci_probe(struct pci_dev *pdev, acpi_handle *dhandle_out
        int optimus_funcs;
        struct pci_dev *parent_pdev;
 
+       if (pdev->vendor != PCI_VENDOR_ID_NVIDIA)
+               return;
+
        *has_pr3 = false;
        parent_pdev = pci_upstream_bridge(pdev);
        if (parent_pdev) {
index 086b66b..f75c6f0 100644 (file)
@@ -730,7 +730,8 @@ out:
 #endif
 
        nouveau_connector_set_edid(nv_connector, edid);
-       nouveau_connector_set_encoder(connector, nv_encoder);
+       if (nv_encoder)
+               nouveau_connector_set_encoder(connector, nv_encoder);
        return status;
 }
 
@@ -966,7 +967,7 @@ nouveau_connector_get_modes(struct drm_connector *connector)
        /* Determine display colour depth for everything except LVDS now,
         * DP requires this before mode_valid() is called.
         */
-       if (connector->connector_type != DRM_MODE_CONNECTOR_LVDS)
+       if (connector->connector_type != DRM_MODE_CONNECTOR_LVDS && nv_connector->native_mode)
                nouveau_connector_detect_depth(connector);
 
        /* Find the native mode if this is a digital panel, if we didn't
@@ -987,7 +988,7 @@ nouveau_connector_get_modes(struct drm_connector *connector)
         * "native" mode as some VBIOS tables require us to use the
         * pixel clock as part of the lookup...
         */
-       if (connector->connector_type == DRM_MODE_CONNECTOR_LVDS)
+       if (connector->connector_type == DRM_MODE_CONNECTOR_LVDS && nv_connector->native_mode)
                nouveau_connector_detect_depth(connector);
 
        if (nv_encoder->dcb->type == DCB_OUTPUT_TV)
index cc7c5b4..7aac938 100644 (file)
@@ -137,10 +137,16 @@ nouveau_name(struct drm_device *dev)
 static inline bool
 nouveau_cli_work_ready(struct dma_fence *fence)
 {
-       if (!dma_fence_is_signaled(fence))
-               return false;
-       dma_fence_put(fence);
-       return true;
+       bool ret = true;
+
+       spin_lock_irq(fence->lock);
+       if (!dma_fence_is_signaled_locked(fence))
+               ret = false;
+       spin_unlock_irq(fence->lock);
+
+       if (ret == true)
+               dma_fence_put(fence);
+       return ret;
 }
 
 static void
index b7631c1..4e7f873 100644 (file)
@@ -3,6 +3,7 @@
 #define __NVKM_DISP_OUTP_H__
 #include "priv.h"
 
+#include <drm/display/drm_dp.h>
 #include <subdev/bios.h>
 #include <subdev/bios/dcb.h>
 #include <subdev/bios/dp.h>
@@ -42,7 +43,7 @@ struct nvkm_outp {
                        bool aux_pwr_pu;
                        u8 lttpr[6];
                        u8 lttprs;
-                       u8 dpcd[16];
+                       u8 dpcd[DP_RECEIVER_CAP_SIZE];
 
                        struct {
                                int dpcd; /* -1, or index into SUPPORTED_LINK_RATES table */
index 4f0ca70..fc283a4 100644 (file)
@@ -146,7 +146,7 @@ nvkm_uoutp_mthd_release(struct nvkm_outp *outp, void *argv, u32 argc)
 }
 
 static int
-nvkm_uoutp_mthd_acquire_dp(struct nvkm_outp *outp, u8 dpcd[16],
+nvkm_uoutp_mthd_acquire_dp(struct nvkm_outp *outp, u8 dpcd[DP_RECEIVER_CAP_SIZE],
                           u8 link_nr, u8 link_bw, bool hda, bool mst)
 {
        int ret;
index 6afdf26..b9fe926 100644 (file)
@@ -53,7 +53,7 @@ pl111_mode_valid(struct drm_simple_display_pipe *pipe,
 {
        struct drm_device *drm = pipe->crtc.dev;
        struct pl111_drm_dev_private *priv = drm->dev_private;
-       u32 cpp = priv->variant->fb_bpp / 8;
+       u32 cpp = DIV_ROUND_UP(priv->variant->fb_depth, 8);
        u64 bw;
 
        /*
index 2a46b5b..d1fe756 100644 (file)
@@ -114,7 +114,7 @@ struct drm_minor;
  *     extensions to the control register
  * @formats: array of supported pixel formats on this variant
  * @nformats: the length of the array of supported pixel formats
- * @fb_bpp: desired bits per pixel on the default framebuffer
+ * @fb_depth: desired depth per pixel on the default framebuffer
  */
 struct pl111_variant_data {
        const char *name;
@@ -126,7 +126,7 @@ struct pl111_variant_data {
        bool st_bitmux_control;
        const u32 *formats;
        unsigned int nformats;
-       unsigned int fb_bpp;
+       unsigned int fb_depth;
 };
 
 struct pl111_drm_dev_private {
index 4b2a9e9..43049c8 100644 (file)
@@ -308,7 +308,7 @@ static int pl111_amba_probe(struct amba_device *amba_dev,
        if (ret < 0)
                goto dev_put;
 
-       drm_fbdev_dma_setup(drm, priv->variant->fb_bpp);
+       drm_fbdev_dma_setup(drm, priv->variant->fb_depth);
 
        return 0;
 
@@ -351,7 +351,7 @@ static const struct pl111_variant_data pl110_variant = {
        .is_pl110 = true,
        .formats = pl110_pixel_formats,
        .nformats = ARRAY_SIZE(pl110_pixel_formats),
-       .fb_bpp = 16,
+       .fb_depth = 16,
 };
 
 /* RealView, Versatile Express etc use this modern variant */
@@ -376,7 +376,7 @@ static const struct pl111_variant_data pl111_variant = {
        .name = "PL111",
        .formats = pl111_pixel_formats,
        .nformats = ARRAY_SIZE(pl111_pixel_formats),
-       .fb_bpp = 32,
+       .fb_depth = 32,
 };
 
 static const u32 pl110_nomadik_pixel_formats[] = {
@@ -405,7 +405,7 @@ static const struct pl111_variant_data pl110_nomadik_variant = {
        .is_lcdc = true,
        .st_bitmux_control = true,
        .broken_vblank = true,
-       .fb_bpp = 16,
+       .fb_depth = 16,
 };
 
 static const struct amba_id pl111_id_table[] = {
index 1b436b7..00c3ebd 100644 (file)
@@ -316,7 +316,7 @@ static const struct pl111_variant_data pl110_integrator = {
        .broken_vblank = true,
        .formats = pl110_integrator_pixel_formats,
        .nformats = ARRAY_SIZE(pl110_integrator_pixel_formats),
-       .fb_bpp = 16,
+       .fb_depth = 16,
 };
 
 /*
@@ -330,7 +330,7 @@ static const struct pl111_variant_data pl110_impd1 = {
        .broken_vblank = true,
        .formats = pl110_integrator_pixel_formats,
        .nformats = ARRAY_SIZE(pl110_integrator_pixel_formats),
-       .fb_bpp = 16,
+       .fb_depth = 15,
 };
 
 /*
@@ -343,7 +343,7 @@ static const struct pl111_variant_data pl110_versatile = {
        .external_bgr = true,
        .formats = pl110_versatile_pixel_formats,
        .nformats = ARRAY_SIZE(pl110_versatile_pixel_formats),
-       .fb_bpp = 16,
+       .fb_depth = 16,
 };
 
 /*
@@ -355,7 +355,7 @@ static const struct pl111_variant_data pl111_realview = {
        .name = "PL111 RealView",
        .formats = pl111_realview_pixel_formats,
        .nformats = ARRAY_SIZE(pl111_realview_pixel_formats),
-       .fb_bpp = 16,
+       .fb_depth = 16,
 };
 
 /*
@@ -367,7 +367,7 @@ static const struct pl111_variant_data pl111_vexpress = {
        .name = "PL111 Versatile Express",
        .formats = pl111_realview_pixel_formats,
        .nformats = ARRAY_SIZE(pl111_realview_pixel_formats),
-       .fb_bpp = 16,
+       .fb_depth = 16,
        .broken_clockdivider = true,
 };
 
index fe76e29..8f6c3ae 100644 (file)
@@ -307,6 +307,7 @@ static void radeon_fbdev_client_unregister(struct drm_client_dev *client)
 
        if (fb_helper->info) {
                vga_switcheroo_client_fb_set(rdev->pdev, NULL);
+               drm_helper_force_disable_all(dev);
                drm_fb_helper_unregister_info(fb_helper);
        } else {
                drm_client_release(&fb_helper->client);
index bdc5af2..d3f5ddb 100644 (file)
@@ -459,7 +459,6 @@ int radeon_gem_set_domain_ioctl(struct drm_device *dev, void *data,
        struct radeon_device *rdev = dev->dev_private;
        struct drm_radeon_gem_set_domain *args = data;
        struct drm_gem_object *gobj;
-       struct radeon_bo *robj;
        int r;
 
        /* for now if someone requests domain CPU -
@@ -472,13 +471,12 @@ int radeon_gem_set_domain_ioctl(struct drm_device *dev, void *data,
                up_read(&rdev->exclusive_lock);
                return -ENOENT;
        }
-       robj = gem_to_radeon_bo(gobj);
 
        r = radeon_gem_set_domain(gobj, args->read_domains, args->write_domain);
 
        drm_gem_object_put(gobj);
        up_read(&rdev->exclusive_lock);
-       r = radeon_gem_handle_lockup(robj->rdev, r);
+       r = radeon_gem_handle_lockup(rdev, r);
        return r;
 }
 
index 3377fbc..c4dda90 100644 (file)
@@ -99,6 +99,16 @@ static void radeon_hotplug_work_func(struct work_struct *work)
 
 static void radeon_dp_work_func(struct work_struct *work)
 {
+       struct radeon_device *rdev = container_of(work, struct radeon_device,
+                                                 dp_work);
+       struct drm_device *dev = rdev->ddev;
+       struct drm_mode_config *mode_config = &dev->mode_config;
+       struct drm_connector *connector;
+
+       mutex_lock(&mode_config->mutex);
+       list_for_each_entry(connector, &mode_config->connector_list, head)
+               radeon_connector_hotplug(connector);
+       mutex_unlock(&mode_config->mutex);
 }
 
 /**
index fcd5bd7..aea5a90 100644 (file)
@@ -309,7 +309,7 @@ static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
  */
 void drm_sched_fault(struct drm_gpu_scheduler *sched)
 {
-       if (sched->ready)
+       if (sched->timeout_wq)
                mod_delayed_work(sched->timeout_wq, &sched->work_tdr, 0);
 }
 EXPORT_SYMBOL(drm_sched_fault);
@@ -1141,9 +1141,6 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
        for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
                struct drm_sched_rq *rq = &sched->sched_rq[i];
 
-               if (!rq)
-                       continue;
-
                spin_lock(&rq->lock);
                list_for_each_entry(s_entity, &rq->entities, list)
                        /*
index 0b74ca2..23899d7 100644 (file)
                         flags, magic, bp,              \
                         eax, ebx, ecx, edx, si, di)    \
 ({                                                     \
-        asm volatile ("push %%rbp;"                    \
+        asm volatile (                                 \
+               UNWIND_HINT_SAVE                        \
+               "push %%rbp;"                           \
+               UNWIND_HINT_UNDEFINED                   \
                 "mov %12, %%rbp;"                      \
                 VMWARE_HYPERCALL_HB_OUT                        \
-                "pop %%rbp;" :                         \
+                "pop %%rbp;"                           \
+               UNWIND_HINT_RESTORE :                   \
                 "=a"(eax),                             \
                 "=b"(ebx),                             \
                 "=c"(ecx),                             \
                        flags, magic, bp,               \
                        eax, ebx, ecx, edx, si, di)     \
 ({                                                     \
-        asm volatile ("push %%rbp;"                    \
+        asm volatile (                                 \
+               UNWIND_HINT_SAVE                        \
+               "push %%rbp;"                           \
+               UNWIND_HINT_UNDEFINED                   \
                 "mov %12, %%rbp;"                      \
                 VMWARE_HYPERCALL_HB_IN                 \
-                "pop %%rbp" :                          \
+                "pop %%rbp;"                           \
+               UNWIND_HINT_RESTORE :                   \
                 "=a"(eax),                             \
                 "=b"(ebx),                             \
                 "=c"(ecx),                             \
index e3799a5..9c88861 100644 (file)
@@ -187,8 +187,8 @@ _gb_connection_create(struct gb_host_device *hd, int hd_cport_id,
        spin_lock_init(&connection->lock);
        INIT_LIST_HEAD(&connection->operations);
 
-       connection->wq = alloc_workqueue("%s:%d", WQ_UNBOUND, 1,
-                                        dev_name(&hd->dev), hd_cport_id);
+       connection->wq = alloc_ordered_workqueue("%s:%d", 0, dev_name(&hd->dev),
+                                                hd_cport_id);
        if (!connection->wq) {
                ret = -ENOMEM;
                goto err_free_connection;
index 16cced8..0d7e749 100644 (file)
@@ -1318,7 +1318,7 @@ struct gb_svc *gb_svc_create(struct gb_host_device *hd)
        if (!svc)
                return NULL;
 
-       svc->wq = alloc_workqueue("%s:svc", WQ_UNBOUND, 1, dev_name(&hd->dev));
+       svc->wq = alloc_ordered_workqueue("%s:svc", 0, dev_name(&hd->dev));
        if (!svc->wq) {
                kfree(svc);
                return NULL;
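Both greybus call sites move from alloc_workqueue(..., WQ_UNBOUND, 1, ...) to alloc_ordered_workqueue(), the documented way to get a strictly ordered, single-item-at-a-time queue. A minimal sketch of creating and tearing one down, with illustrative names:

#include <linux/workqueue.h>
#include <linux/errno.h>

struct my_ctx {
	struct workqueue_struct *wq;
	struct work_struct work;
};

static void my_work_fn(struct work_struct *work)
{
	/* Items on an ordered workqueue run one at a time, in queue order. */
}

static int my_ctx_init(struct my_ctx *ctx, const char *name, int id)
{
	/* Same intent as alloc_workqueue(fmt, WQ_UNBOUND, 1, ...). */
	ctx->wq = alloc_ordered_workqueue("%s:%d", 0, name, id);
	if (!ctx->wq)
		return -ENOMEM;

	INIT_WORK(&ctx->work, my_work_fn);
	return 0;
}

static void my_ctx_fini(struct my_ctx *ctx)
{
	/* Waits for queued work to finish before freeing the queue. */
	destroy_workqueue(ctx->wq);
}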
index 7ae5f27..c6bdb9c 100644 (file)
@@ -587,6 +587,8 @@ static const struct hid_device_id hammer_devices[] = {
        { HID_DEVICE(BUS_USB, HID_GROUP_GENERIC,
                     USB_VENDOR_ID_GOOGLE, USB_DEVICE_ID_GOOGLE_HAMMER) },
        { HID_DEVICE(BUS_USB, HID_GROUP_GENERIC,
+                    USB_VENDOR_ID_GOOGLE, USB_DEVICE_ID_GOOGLE_JEWEL) },
+       { HID_DEVICE(BUS_USB, HID_GROUP_GENERIC,
                     USB_VENDOR_ID_GOOGLE, USB_DEVICE_ID_GOOGLE_MAGNEMITE) },
        { HID_DEVICE(BUS_USB, HID_GROUP_GENERIC,
                     USB_VENDOR_ID_GOOGLE, USB_DEVICE_ID_GOOGLE_MASTERBALL) },
index d79e946..5d29aba 100644 (file)
 #define USB_DEVICE_ID_GOOGLE_MOONBALL  0x5044
 #define USB_DEVICE_ID_GOOGLE_DON       0x5050
 #define USB_DEVICE_ID_GOOGLE_EEL       0x5057
+#define USB_DEVICE_ID_GOOGLE_JEWEL     0x5061
 
 #define USB_VENDOR_ID_GOTOP            0x08f2
 #define USB_DEVICE_ID_SUPER_Q2         0x007f
index 0fcfd85..5e1a412 100644 (file)
@@ -286,7 +286,7 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp,
        struct hidpp_report *message,
        struct hidpp_report *response)
 {
-       int ret;
+       int ret = -1;
        int max_retries = 3;
 
        mutex_lock(&hidpp->send_mutex);
@@ -300,13 +300,13 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp,
         */
        *response = *message;
 
-       for (; max_retries != 0; max_retries--) {
+       for (; max_retries != 0 && ret; max_retries--) {
                ret = __hidpp_send_report(hidpp->hid_dev, message);
 
                if (ret) {
                        dbg_hid("__hidpp_send_report returned err: %d\n", ret);
                        memset(response, 0, sizeof(struct hidpp_report));
-                       goto exit;
+                       break;
                }
 
                if (!wait_event_timeout(hidpp->wait, hidpp->answer_available,
@@ -314,13 +314,14 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp,
                        dbg_hid("%s:timeout waiting for response\n", __func__);
                        memset(response, 0, sizeof(struct hidpp_report));
                        ret = -ETIMEDOUT;
+                       break;
                }
 
                if (response->report_id == REPORT_ID_HIDPP_SHORT &&
                    response->rap.sub_id == HIDPP_ERROR) {
                        ret = response->rap.params[1];
                        dbg_hid("%s:got hidpp error %02X\n", __func__, ret);
-                       goto exit;
+                       break;
                }
 
                if ((response->report_id == REPORT_ID_HIDPP_LONG ||
@@ -329,13 +330,12 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp,
                        ret = response->fap.params[1];
                        if (ret != HIDPP20_ERROR_BUSY) {
                                dbg_hid("%s:got hidpp 2.0 error %02X\n", __func__, ret);
-                               goto exit;
+                               break;
                        }
                        dbg_hid("%s:got busy hidpp 2.0 error %02X, retrying\n", __func__, ret);
                }
        }
 
-exit:
        mutex_unlock(&hidpp->send_mutex);
        return ret;
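The hid-logitech-hidpp change replaces the goto-exit jumps with break, so the loop itself owns the retry policy: stop on success, on a send failure, on a timeout, or on any error other than "busy". A sketch of that loop shape, with invented stub helpers:

#include <linux/errno.h>

#define MAX_RETRIES 3
#define ERR_BUSY    1	/* hypothetical "try again" status */

static int send_request(void)   { return 0; }	/* stub: pretend the send worked */
static int wait_for_reply(void) { return 0; }	/* stub: pretend the reply arrived */

static int send_message_sync(void)
{
	int ret = -1;	/* "not attempted yet", as in the hidpp patch */
	int tries;

	for (tries = MAX_RETRIES; tries != 0 && ret; tries--) {
		ret = send_request();
		if (ret)
			break;			/* transport failure: give up */

		ret = wait_for_reply();
		if (ret != ERR_BUSY)
			break;			/* success or a hard error */
		/* ERR_BUSY: loop again until the retries run out */
	}

	return ret;
}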
 
index 8214896..76e5353 100644 (file)
@@ -2224,7 +2224,9 @@ static void wacom_update_name(struct wacom *wacom, const char *suffix)
                } else if (strstr(product_name, "Wacom") ||
                           strstr(product_name, "wacom") ||
                           strstr(product_name, "WACOM")) {
-                       strscpy(name, product_name, sizeof(name));
+                       if (strscpy(name, product_name, sizeof(name)) < 0) {
+                               hid_warn(wacom->hdev, "String overflow while assembling device name");
+                       }
                } else {
                        snprintf(name, sizeof(name), "Wacom %s", product_name);
                }
@@ -2242,7 +2244,9 @@ static void wacom_update_name(struct wacom *wacom, const char *suffix)
                if (name[strlen(name)-1] == ' ')
                        name[strlen(name)-1] = '\0';
        } else {
-               strscpy(name, features->name, sizeof(name));
+               if (strscpy(name, features->name, sizeof(name)) < 0) {
+                       hid_warn(wacom->hdev, "String overflow while assembling device name");
+               }
        }
 
        snprintf(wacom_wac->name, sizeof(wacom_wac->name), "%s%s",
@@ -2410,8 +2414,13 @@ static int wacom_parse_and_register(struct wacom *wacom, bool wireless)
                goto fail_quirks;
        }
 
-       if (features->device_type & WACOM_DEVICETYPE_WL_MONITOR)
+       if (features->device_type & WACOM_DEVICETYPE_WL_MONITOR) {
                error = hid_hw_open(hdev);
+               if (error) {
+                       hid_err(hdev, "hw open failed\n");
+                       goto fail_quirks;
+               }
+       }
 
        wacom_set_shared_values(wacom_wac);
        devres_close_group(&hdev->dev, wacom);
@@ -2500,8 +2509,10 @@ static void wacom_wireless_work(struct work_struct *work)
                                goto fail;
                }
 
-               strscpy(wacom_wac->name, wacom_wac1->name,
-                       sizeof(wacom_wac->name));
+               if (strscpy(wacom_wac->name, wacom_wac1->name,
+                       sizeof(wacom_wac->name)) < 0) {
+                       hid_warn(wacom->hdev, "String overflow while assembling device name");
+               }
        }
 
        return;
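The wacom_sys.c hunks check the strscpy() return value: it returns the number of characters copied, or a negative value (-E2BIG) when the source had to be truncated to fit. A small sketch of that check; the helper and warning text are illustrative:

#include <linux/string.h>
#include <linux/printk.h>

static void copy_product_name(char *dst, size_t dst_size, const char *src)
{
	/* strscpy() always NUL-terminates; a negative return means truncation. */
	if (strscpy(dst, src, dst_size) < 0)
		pr_warn("product name truncated while assembling device name\n");
}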
index dc0f7d9..2ccf838 100644 (file)
@@ -831,7 +831,7 @@ static int wacom_intuos_inout(struct wacom_wac *wacom)
        /* Enter report */
        if ((data[1] & 0xfc) == 0xc0) {
                /* serial number of the tool */
-               wacom->serial[idx] = ((data[3] & 0x0f) << 28) +
+               wacom->serial[idx] = ((__u64)(data[3] & 0x0f) << 28) +
                        (data[4] << 20) + (data[5] << 12) +
                        (data[6] << 4) + (data[7] >> 4);
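The wacom_wac.c fix casts before shifting: (data[3] & 0x0f) is promoted to a 32-bit signed int, so << 28 can overflow into the sign bit and the result is then sign-extended into the 64-bit serial field. Casting to __u64 first keeps the arithmetic in 64 bits. A self-contained userspace illustration of the difference (variable names are made up):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	unsigned char data3 = 0x0f;	/* top nibble of a wide serial number */
	uint64_t serial_bad, serial_good;

	/*
	 * Without the cast the shift happens on a 32-bit signed int: it can
	 * overflow (undefined behaviour) and, in practice, the negative
	 * result is sign-extended when stored into the 64-bit field.
	 */
	serial_bad = (data3 & 0x0f) << 28;

	/* Casting first keeps the whole computation in 64-bit unsigned. */
	serial_good = (uint64_t)(data3 & 0x0f) << 28;

	printf("bad:  %#llx\n", (unsigned long long)serial_bad);
	printf("good: %#llx\n", (unsigned long long)serial_good);
	return 0;
}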
 
index 007f26d..2f4d09c 100644 (file)
@@ -829,11 +829,22 @@ static void vmbus_wait_for_unload(void)
                if (completion_done(&vmbus_connection.unload_event))
                        goto completed;
 
-               for_each_online_cpu(cpu) {
+               for_each_present_cpu(cpu) {
                        struct hv_per_cpu_context *hv_cpu
                                = per_cpu_ptr(hv_context.cpu_context, cpu);
 
+                       /*
+                        * In a CoCo VM the synic_message_page is not allocated
+                        * in hv_synic_alloc(). Instead it is set/cleared in
+                        * hv_synic_enable_regs() and hv_synic_disable_regs()
+                        * such that it is set only when the CPU is online. If
+                        * not all present CPUs are online, the message page
+                        * might be NULL, so skip such CPUs.
+                        */
                        page_addr = hv_cpu->synic_message_page;
+                       if (!page_addr)
+                               continue;
+
                        msg = (struct hv_message *)page_addr
                                + VMBUS_MESSAGE_SINT;
 
@@ -867,11 +878,14 @@ completed:
         * maybe-pending messages on all CPUs to be able to receive new
         * messages after we reconnect.
         */
-       for_each_online_cpu(cpu) {
+       for_each_present_cpu(cpu) {
                struct hv_per_cpu_context *hv_cpu
                        = per_cpu_ptr(hv_context.cpu_context, cpu);
 
                page_addr = hv_cpu->synic_message_page;
+               if (!page_addr)
+                       continue;
+
                msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
                msg->header.message_type = HVMSG_NONE;
        }
index 64f9cec..542a1d5 100644 (file)
@@ -364,13 +364,20 @@ int hv_common_cpu_init(unsigned int cpu)
        flags = irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL;
 
        inputarg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
-       *inputarg = kmalloc(pgcount * HV_HYP_PAGE_SIZE, flags);
-       if (!(*inputarg))
-               return -ENOMEM;
 
-       if (hv_root_partition) {
-               outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
-               *outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
+       /*
+        * hyperv_pcpu_input_arg and hyperv_pcpu_output_arg memory is already
+        * allocated if this CPU was previously online and then taken offline
+        */
+       if (!*inputarg) {
+               *inputarg = kmalloc(pgcount * HV_HYP_PAGE_SIZE, flags);
+               if (!(*inputarg))
+                       return -ENOMEM;
+
+               if (hv_root_partition) {
+                       outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
+                       *outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
+               }
        }
 
        msr_vp_index = hv_get_register(HV_REGISTER_VP_INDEX);
@@ -385,24 +392,17 @@ int hv_common_cpu_init(unsigned int cpu)
 
 int hv_common_cpu_die(unsigned int cpu)
 {
-       unsigned long flags;
-       void **inputarg, **outputarg;
-       void *mem;
-
-       local_irq_save(flags);
-
-       inputarg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
-       mem = *inputarg;
-       *inputarg = NULL;
-
-       if (hv_root_partition) {
-               outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
-               *outputarg = NULL;
-       }
-
-       local_irq_restore(flags);
-
-       kfree(mem);
+       /*
+        * The hyperv_pcpu_input_arg and hyperv_pcpu_output_arg memory
+        * is not freed when the CPU goes offline as the hyperv_pcpu_input_arg
+        * may be used by the Hyper-V vPCI driver in reassigning interrupts
+        * as part of the offlining process.  The interrupt reassignment
+        * happens *after* the CPUHP_AP_HYPERV_ONLINE state has run and
+        * called this function.
+        *
+        * If a previously offlined CPU is brought back online again, the
+        * originally allocated memory is reused in hv_common_cpu_init().
+        */
 
        return 0;
 }
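
The rewritten hv_common_cpu_init()/hv_common_cpu_die() pair moves to an allocate-on-first-online, never-free scheme. A hypothetical userspace sketch of that pattern; the array, callback names and sizes are illustrative only, not kernel APIs:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS  4
#define ARG_SIZE 4096

static void *percpu_arg[NR_CPUS];	/* stand-in for hyperv_pcpu_input_arg */

static int cpu_online_cb(unsigned int cpu)
{
	/* memory from a previous online/offline cycle is reused */
	if (!percpu_arg[cpu]) {
		percpu_arg[cpu] = malloc(ARG_SIZE);
		if (!percpu_arg[cpu])
			return -1;	/* -ENOMEM in the kernel */
	}
	memset(percpu_arg[cpu], 0, ARG_SIZE);
	return 0;
}

static int cpu_offline_cb(unsigned int cpu)
{
	/* deliberately keep the buffer: it may still be needed while the
	 * CPU is going down (interrupt reassignment in the real driver) */
	(void)cpu;
	return 0;
}

int main(void)
{
	void *first;

	cpu_online_cb(1);
	first = percpu_arg[1];

	cpu_offline_cb(1);
	cpu_online_cb(1);	/* second online reuses the same buffer */

	printf("same buffer reused: %s\n", first == percpu_arg[1] ? "yes" : "no");
	return 0;
}
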
index 1c65a6d..67f95a2 100644 (file)
@@ -1372,7 +1372,7 @@ static int vmbus_bus_init(void)
        ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "hyperv/vmbus:online",
                                hv_synic_init, hv_synic_cleanup);
        if (ret < 0)
-               goto err_cpuhp;
+               goto err_alloc;
        hyperv_cpuhp_online = ret;
 
        ret = vmbus_connect();
@@ -1392,9 +1392,8 @@ static int vmbus_bus_init(void)
 
 err_connect:
        cpuhp_remove_state(hyperv_cpuhp_online);
-err_cpuhp:
-       hv_synic_free();
 err_alloc:
+       hv_synic_free();
        if (vmbus_irq == -1) {
                hv_remove_vmbus_handler();
        } else {
index ba2f6a4..7b177b9 100644 (file)
@@ -507,6 +507,7 @@ static const struct pci_device_id k10temp_id_table[] = {
        { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F3) },
        { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_19H_M60H_DF_F3) },
        { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_19H_M70H_DF_F3) },
+       { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_19H_M78H_DF_F3) },
        { PCI_VDEVICE(HYGON, PCI_DEVICE_ID_AMD_17H_DF_F3) },
        {}
 };
index 711f451..89e8ed2 100644 (file)
@@ -402,6 +402,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
                trace_id = coresight_trace_id_get_cpu_id(cpu);
                if (!IS_VALID_CS_TRACE_ID(trace_id)) {
                        cpumask_clear_cpu(cpu, mask);
+                       coresight_release_path(path);
                        continue;
                }
 
index 918d461..eaa296c 100644 (file)
@@ -942,7 +942,7 @@ tmc_etr_buf_insert_barrier_packet(struct etr_buf *etr_buf, u64 offset)
 
        len = tmc_etr_buf_get_data(etr_buf, offset,
                                   CORESIGHT_BARRIER_PKT_SIZE, &bufp);
-       if (WARN_ON(len < CORESIGHT_BARRIER_PKT_SIZE))
+       if (WARN_ON(len < 0 || len < CORESIGHT_BARRIER_PKT_SIZE))
                return -EINVAL;
        coresight_insert_barrier_packet(bufp);
        return offset + CORESIGHT_BARRIER_PKT_SIZE;
index 1fc4fd7..1bab91c 100644 (file)
@@ -218,7 +218,7 @@ static inline void set_trbe_enabled(struct trbe_cpudata *cpudata, u64 trblimitr)
         * Enable the TRBE without clearing LIMITPTR which
         * might be required for fetching the buffer limits.
         */
-       trblimitr |= TRBLIMITR_ENABLE;
+       trblimitr |= TRBLIMITR_EL1_E;
        write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
 
        /* Synchronize the TRBE enable event */
@@ -236,7 +236,7 @@ static inline void set_trbe_disabled(struct trbe_cpudata *cpudata)
         * Disable the TRBE without clearing LIMITPTR which
         * might be required for fetching the buffer limits.
         */
-       trblimitr &= ~TRBLIMITR_ENABLE;
+       trblimitr &= ~TRBLIMITR_EL1_E;
        write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
 
        if (trbe_needs_drain_after_disable(cpudata))
@@ -582,12 +582,12 @@ static void clr_trbe_status(void)
        u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
 
        WARN_ON(is_trbe_enabled());
-       trbsr &= ~TRBSR_IRQ;
-       trbsr &= ~TRBSR_TRG;
-       trbsr &= ~TRBSR_WRAP;
-       trbsr &= ~(TRBSR_EC_MASK << TRBSR_EC_SHIFT);
-       trbsr &= ~(TRBSR_BSC_MASK << TRBSR_BSC_SHIFT);
-       trbsr &= ~TRBSR_STOP;
+       trbsr &= ~TRBSR_EL1_IRQ;
+       trbsr &= ~TRBSR_EL1_TRG;
+       trbsr &= ~TRBSR_EL1_WRAP;
+       trbsr &= ~TRBSR_EL1_EC_MASK;
+       trbsr &= ~TRBSR_EL1_BSC_MASK;
+       trbsr &= ~TRBSR_EL1_S;
        write_sysreg_s(trbsr, SYS_TRBSR_EL1);
 }
 
@@ -596,13 +596,13 @@ static void set_trbe_limit_pointer_enabled(struct trbe_buf *buf)
        u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
        unsigned long addr = buf->trbe_limit;
 
-       WARN_ON(!IS_ALIGNED(addr, (1UL << TRBLIMITR_LIMIT_SHIFT)));
+       WARN_ON(!IS_ALIGNED(addr, (1UL << TRBLIMITR_EL1_LIMIT_SHIFT)));
        WARN_ON(!IS_ALIGNED(addr, PAGE_SIZE));
 
-       trblimitr &= ~TRBLIMITR_NVM;
-       trblimitr &= ~(TRBLIMITR_FILL_MODE_MASK << TRBLIMITR_FILL_MODE_SHIFT);
-       trblimitr &= ~(TRBLIMITR_TRIG_MODE_MASK << TRBLIMITR_TRIG_MODE_SHIFT);
-       trblimitr &= ~(TRBLIMITR_LIMIT_MASK << TRBLIMITR_LIMIT_SHIFT);
+       trblimitr &= ~TRBLIMITR_EL1_nVM;
+       trblimitr &= ~TRBLIMITR_EL1_FM_MASK;
+       trblimitr &= ~TRBLIMITR_EL1_TM_MASK;
+       trblimitr &= ~TRBLIMITR_EL1_LIMIT_MASK;
 
        /*
         * Fill trace buffer mode is used here while configuring the
@@ -613,14 +613,15 @@ static void set_trbe_limit_pointer_enabled(struct trbe_buf *buf)
         * trace data in the interrupt handler, before reconfiguring
         * the TRBE.
         */
-       trblimitr |= (TRBE_FILL_MODE_FILL & TRBLIMITR_FILL_MODE_MASK) << TRBLIMITR_FILL_MODE_SHIFT;
+       trblimitr |= (TRBLIMITR_EL1_FM_FILL << TRBLIMITR_EL1_FM_SHIFT) &
+                    TRBLIMITR_EL1_FM_MASK;
 
        /*
         * Trigger mode is not used here while configuring the TRBE for
         * the trace capture. Hence just keep this in the ignore mode.
         */
-       trblimitr |= (TRBE_TRIG_MODE_IGNORE & TRBLIMITR_TRIG_MODE_MASK) <<
-                     TRBLIMITR_TRIG_MODE_SHIFT;
+       trblimitr |= (TRBLIMITR_EL1_TM_IGNR << TRBLIMITR_EL1_TM_SHIFT) &
+                    TRBLIMITR_EL1_TM_MASK;
        trblimitr |= (addr & PAGE_MASK);
        set_trbe_enabled(buf->cpudata, trblimitr);
 }
index 98ff1b1..77cbb5c 100644 (file)
@@ -30,7 +30,7 @@ static inline bool is_trbe_enabled(void)
 {
        u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
 
-       return trblimitr & TRBLIMITR_ENABLE;
+       return trblimitr & TRBLIMITR_EL1_E;
 }
 
 #define TRBE_EC_OTHERS         0
@@ -39,7 +39,7 @@ static inline bool is_trbe_enabled(void)
 
 static inline int get_trbe_ec(u64 trbsr)
 {
-       return (trbsr >> TRBSR_EC_SHIFT) & TRBSR_EC_MASK;
+       return (trbsr & TRBSR_EL1_EC_MASK) >> TRBSR_EL1_EC_SHIFT;
 }
 
 #define TRBE_BSC_NOT_STOPPED 0
@@ -48,63 +48,55 @@ static inline int get_trbe_ec(u64 trbsr)
 
 static inline int get_trbe_bsc(u64 trbsr)
 {
-       return (trbsr >> TRBSR_BSC_SHIFT) & TRBSR_BSC_MASK;
+       return (trbsr & TRBSR_EL1_BSC_MASK) >> TRBSR_EL1_BSC_SHIFT;
 }
 
 static inline void clr_trbe_irq(void)
 {
        u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
 
-       trbsr &= ~TRBSR_IRQ;
+       trbsr &= ~TRBSR_EL1_IRQ;
        write_sysreg_s(trbsr, SYS_TRBSR_EL1);
 }
 
 static inline bool is_trbe_irq(u64 trbsr)
 {
-       return trbsr & TRBSR_IRQ;
+       return trbsr & TRBSR_EL1_IRQ;
 }
 
 static inline bool is_trbe_trg(u64 trbsr)
 {
-       return trbsr & TRBSR_TRG;
+       return trbsr & TRBSR_EL1_TRG;
 }
 
 static inline bool is_trbe_wrap(u64 trbsr)
 {
-       return trbsr & TRBSR_WRAP;
+       return trbsr & TRBSR_EL1_WRAP;
 }
 
 static inline bool is_trbe_abort(u64 trbsr)
 {
-       return trbsr & TRBSR_ABORT;
+       return trbsr & TRBSR_EL1_EA;
 }
 
 static inline bool is_trbe_running(u64 trbsr)
 {
-       return !(trbsr & TRBSR_STOP);
+       return !(trbsr & TRBSR_EL1_S);
 }
 
-#define TRBE_TRIG_MODE_STOP            0
-#define TRBE_TRIG_MODE_IRQ             1
-#define TRBE_TRIG_MODE_IGNORE          3
-
-#define TRBE_FILL_MODE_FILL            0
-#define TRBE_FILL_MODE_WRAP            1
-#define TRBE_FILL_MODE_CIRCULAR_BUFFER 3
-
 static inline bool get_trbe_flag_update(u64 trbidr)
 {
-       return trbidr & TRBIDR_FLAG;
+       return trbidr & TRBIDR_EL1_F;
 }
 
 static inline bool is_trbe_programmable(u64 trbidr)
 {
-       return !(trbidr & TRBIDR_PROG);
+       return !(trbidr & TRBIDR_EL1_P);
 }
 
 static inline int get_trbe_address_align(u64 trbidr)
 {
-       return (trbidr >> TRBIDR_ALIGN_SHIFT) & TRBIDR_ALIGN_MASK;
+       return (trbidr & TRBIDR_EL1_Align_MASK) >> TRBIDR_EL1_Align_SHIFT;
 }
 
 static inline unsigned long get_trbe_write_pointer(void)
@@ -121,7 +113,7 @@ static inline void set_trbe_write_pointer(unsigned long addr)
 static inline unsigned long get_trbe_limit_pointer(void)
 {
        u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
-       unsigned long addr = trblimitr & (TRBLIMITR_LIMIT_MASK << TRBLIMITR_LIMIT_SHIFT);
+       unsigned long addr = trblimitr & TRBLIMITR_EL1_LIMIT_MASK;
 
        WARN_ON(!IS_ALIGNED(addr, PAGE_SIZE));
        return addr;
@@ -130,7 +122,7 @@ static inline unsigned long get_trbe_limit_pointer(void)
 static inline unsigned long get_trbe_base_pointer(void)
 {
        u64 trbbaser = read_sysreg_s(SYS_TRBBASER_EL1);
-       unsigned long addr = trbbaser & (TRBBASER_BASE_MASK << TRBBASER_BASE_SHIFT);
+       unsigned long addr = trbbaser & TRBBASER_EL1_BASE_MASK;
 
        WARN_ON(!IS_ALIGNED(addr, PAGE_SIZE));
        return addr;
@@ -139,7 +131,7 @@ static inline unsigned long get_trbe_base_pointer(void)
 static inline void set_trbe_base_pointer(unsigned long addr)
 {
        WARN_ON(is_trbe_enabled());
-       WARN_ON(!IS_ALIGNED(addr, (1UL << TRBBASER_BASE_SHIFT)));
+       WARN_ON(!IS_ALIGNED(addr, (1UL << TRBBASER_EL1_BASE_SHIFT)));
        WARN_ON(!IS_ALIGNED(addr, PAGE_SIZE));
        write_sysreg_s(addr, SYS_TRBBASER_EL1);
 }
index c5d87aa..bf23bfb 100644 (file)
@@ -40,6 +40,7 @@
 #define DW_IC_CON_BUS_CLEAR_CTRL               BIT(11)
 
 #define DW_IC_DATA_CMD_DAT                     GENMASK(7, 0)
+#define DW_IC_DATA_CMD_FIRST_DATA_BYTE         BIT(11)
 
 /*
  * Registers offset
index cec2505..2e079cf 100644 (file)
@@ -176,6 +176,10 @@ static irqreturn_t i2c_dw_isr_slave(int this_irq, void *dev_id)
 
                do {
                        regmap_read(dev->map, DW_IC_DATA_CMD, &tmp);
+                       if (tmp & DW_IC_DATA_CMD_FIRST_DATA_BYTE)
+                               i2c_slave_event(dev->slave,
+                                               I2C_SLAVE_WRITE_REQUESTED,
+                                               &val);
                        val = tmp;
                        i2c_slave_event(dev->slave, I2C_SLAVE_WRITE_RECEIVED,
                                        &val);
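
The hunk above uses the new FIRST_DATA_BYTE status bit to report I2C_SLAVE_WRITE_REQUESTED before the first I2C_SLAVE_WRITE_RECEIVED of each write. A simulated userspace sketch of that event ordering; the FIFO contents and helper names are made up for illustration:

#include <stdio.h>

enum slave_event { WRITE_REQUESTED, WRITE_RECEIVED };

static void slave_event(enum slave_event ev, unsigned char val)
{
	if (ev == WRITE_REQUESTED)
		printf("WRITE_REQUESTED\n");
	else
		printf("WRITE_RECEIVED 0x%02x\n", val);
}

int main(void)
{
	/* simulated DATA_CMD reads: bit 11 flags the first byte of a write */
	unsigned int fifo[] = { 0x811, 0x22, 0x33, 0x844, 0x55 };
	unsigned int i;

	for (i = 0; i < sizeof(fifo) / sizeof(fifo[0]); i++) {
		unsigned int tmp = fifo[i];

		if (tmp & (1u << 11))	/* DW_IC_DATA_CMD_FIRST_DATA_BYTE */
			slave_event(WRITE_REQUESTED, 0);
		slave_event(WRITE_RECEIVED, tmp & 0xff);
	}
	return 0;
}
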
index 8e98794..39c479f 100644 (file)
 #define IMG_I2C_TIMEOUT                        (msecs_to_jiffies(1000))
 
 /*
- * Worst incs are 1 (innacurate) and 16*256 (irregular).
+ * Worst incs are 1 (inaccurate) and 16*256 (irregular).
  * So a sensible inc is the logarithmic mean: 64 (2^6), which is
  * in the middle of the valid range (0-127).
  */
index 1af0a63..4d24ceb 100644 (file)
@@ -201,8 +201,8 @@ static void lpi2c_imx_stop(struct lpi2c_imx_struct *lpi2c_imx)
 /* CLKLO = I2C_CLK_RATIO * CLKHI, SETHOLD = CLKHI, DATAVD = CLKHI/2 */
 static int lpi2c_imx_config(struct lpi2c_imx_struct *lpi2c_imx)
 {
-       u8 prescale, filt, sethold, clkhi, clklo, datavd;
-       unsigned int clk_rate, clk_cycle;
+       u8 prescale, filt, sethold, datavd;
+       unsigned int clk_rate, clk_cycle, clkhi, clklo;
        enum lpi2c_imx_pincfg pincfg;
        unsigned int temp;
 
index b21ffd6..5ef136c 100644 (file)
@@ -1118,8 +1118,10 @@ static int pci1xxxx_i2c_resume(struct device *dev)
 static DEFINE_SIMPLE_DEV_PM_OPS(pci1xxxx_i2c_pm_ops, pci1xxxx_i2c_suspend,
                         pci1xxxx_i2c_resume);
 
-static void pci1xxxx_i2c_shutdown(struct pci1xxxx_i2c *i2c)
+static void pci1xxxx_i2c_shutdown(void *data)
 {
+       struct pci1xxxx_i2c *i2c = data;
+
        pci1xxxx_i2c_config_padctrl(i2c, false);
        pci1xxxx_i2c_configure_core_reg(i2c, false);
 }
@@ -1156,7 +1158,7 @@ static int pci1xxxx_i2c_probe_pci(struct pci_dev *pdev,
        init_completion(&i2c->i2c_xfer_done);
        pci1xxxx_i2c_init(i2c);
 
-       ret = devm_add_action(dev, (void (*)(void *))pci1xxxx_i2c_shutdown, i2c);
+       ret = devm_add_action(dev, pci1xxxx_i2c_shutdown, i2c);
        if (ret)
                return ret;
 
index 047dfef..878c076 100644 (file)
@@ -520,6 +520,17 @@ mv64xxx_i2c_intr(int irq, void *dev_id)
 
        while (readl(drv_data->reg_base + drv_data->reg_offsets.control) &
                                                MV64XXX_I2C_REG_CONTROL_IFLG) {
+               /*
+                * It seems that sometime the controller updates the status
+                * It seems that sometimes the controller updates the status
+                * register only after it asserts IFLG in the control register.
+                * of 100 ns before reading the status register solves this
+                * issue. This bug does not seem to appear when using
+                * interrupts.
+                */
+               if (drv_data->atomic)
+                       ndelay(100);
+
                status = readl(drv_data->reg_base + drv_data->reg_offsets.status);
                mv64xxx_i2c_fsm(drv_data, status);
                mv64xxx_i2c_do_action(drv_data);
index 2e153f2..7868238 100644 (file)
@@ -1752,16 +1752,21 @@ nodma:
        if (!clk_freq || clk_freq > I2C_MAX_FAST_MODE_PLUS_FREQ) {
                dev_err(qup->dev, "clock frequency not supported %d\n",
                        clk_freq);
-               return -EINVAL;
+               ret = -EINVAL;
+               goto fail_dma;
        }
 
        qup->base = devm_platform_ioremap_resource(pdev, 0);
-       if (IS_ERR(qup->base))
-               return PTR_ERR(qup->base);
+       if (IS_ERR(qup->base)) {
+               ret = PTR_ERR(qup->base);
+               goto fail_dma;
+       }
 
        qup->irq = platform_get_irq(pdev, 0);
-       if (qup->irq < 0)
-               return qup->irq;
+       if (qup->irq < 0) {
+               ret = qup->irq;
+               goto fail_dma;
+       }
 
        if (has_acpi_companion(qup->dev)) {
                ret = device_property_read_u32(qup->dev,
@@ -1775,13 +1780,15 @@ nodma:
                qup->clk = devm_clk_get(qup->dev, "core");
                if (IS_ERR(qup->clk)) {
                        dev_err(qup->dev, "Could not get core clock\n");
-                       return PTR_ERR(qup->clk);
+                       ret = PTR_ERR(qup->clk);
+                       goto fail_dma;
                }
 
                qup->pclk = devm_clk_get(qup->dev, "iface");
                if (IS_ERR(qup->pclk)) {
                        dev_err(qup->dev, "Could not get iface clock\n");
-                       return PTR_ERR(qup->pclk);
+                       ret = PTR_ERR(qup->pclk);
+                       goto fail_dma;
                }
                qup_i2c_enable_clocks(qup);
                src_clk_freq = clk_get_rate(qup->clk);
index 4fe15cd..ffc54fb 100644 (file)
@@ -576,12 +576,14 @@ static int sprd_i2c_remove(struct platform_device *pdev)
        struct sprd_i2c *i2c_dev = platform_get_drvdata(pdev);
        int ret;
 
-       ret = pm_runtime_resume_and_get(i2c_dev->dev);
+       ret = pm_runtime_get_sync(i2c_dev->dev);
        if (ret < 0)
-               return ret;
+               dev_err(&pdev->dev, "Failed to resume device (%pe)\n", ERR_PTR(ret));
 
        i2c_del_adapter(&i2c_dev->adap);
-       clk_disable_unprepare(i2c_dev->clk);
+
+       if (ret >= 0)
+               clk_disable_unprepare(i2c_dev->clk);
 
        pm_runtime_put_noidle(i2c_dev->dev);
        pm_runtime_disable(i2c_dev->dev);
index aa2d19d..34201d7 100644 (file)
@@ -199,6 +199,43 @@ static __cpuidle int intel_idle_xstate(struct cpuidle_device *dev,
        return __intel_idle(dev, drv, index);
 }
 
+static __always_inline int __intel_idle_hlt(struct cpuidle_device *dev,
+                                       struct cpuidle_driver *drv, int index)
+{
+       raw_safe_halt();
+       raw_local_irq_disable();
+       return index;
+}
+
+/**
+ * intel_idle_hlt - Ask the processor to enter the given idle state using hlt.
+ * @dev: cpuidle device of the target CPU.
+ * @drv: cpuidle driver (assumed to point to intel_idle_driver).
+ * @index: Target idle state index.
+ *
+ * Use the HLT instruction to notify the processor that the CPU represented by
+ * @dev is idle and it can try to enter the idle state corresponding to @index.
+ *
+ * Must be called under local_irq_disable().
+ */
+static __cpuidle int intel_idle_hlt(struct cpuidle_device *dev,
+                               struct cpuidle_driver *drv, int index)
+{
+       return __intel_idle_hlt(dev, drv, index);
+}
+
+static __cpuidle int intel_idle_hlt_irq_on(struct cpuidle_device *dev,
+                                   struct cpuidle_driver *drv, int index)
+{
+       int ret;
+
+       raw_local_irq_enable();
+       ret = __intel_idle_hlt(dev, drv, index);
+       raw_local_irq_disable();
+
+       return ret;
+}
+
 /**
  * intel_idle_s2idle - Ask the processor to enter the given idle state.
  * @dev: cpuidle device of the target CPU.
@@ -1242,6 +1279,25 @@ static struct cpuidle_state snr_cstates[] __initdata = {
                .enter = NULL }
 };
 
+static struct cpuidle_state vmguest_cstates[] __initdata = {
+       {
+               .name = "C1",
+               .desc = "HLT",
+               .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_IRQ_ENABLE,
+               .exit_latency = 5,
+               .target_residency = 10,
+               .enter = &intel_idle_hlt, },
+       {
+               .name = "C1L",
+               .desc = "Long HLT",
+               .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TLB_FLUSHED,
+               .exit_latency = 5,
+               .target_residency = 200,
+               .enter = &intel_idle_hlt, },
+       {
+               .enter = NULL }
+};
+
 static const struct idle_cpu idle_cpu_nehalem __initconst = {
        .state_table = nehalem_cstates,
        .auto_demotion_disable_flags = NHM_C1_AUTO_DEMOTE | NHM_C3_AUTO_DEMOTE,
@@ -1839,6 +1895,66 @@ static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
        return true;
 }
 
+static void state_update_enter_method(struct cpuidle_state *state, int cstate)
+{
+       if (state->enter == intel_idle_hlt) {
+               if (force_irq_on) {
+                       pr_info("forced intel_idle_irq for state %d\n", cstate);
+                       state->enter = intel_idle_hlt_irq_on;
+               }
+               return;
+       }
+       if (state->enter == intel_idle_hlt_irq_on)
+               return; /* no update scenarios */
+
+       if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
+               /*
+                * Combining XSTATE with the IBRS or IRQ_ENABLE flags
+                * is not currently supported by this driver.
+                */
+               WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IBRS);
+               WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
+               state->enter = intel_idle_xstate;
+               return;
+       }
+
+       if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
+                          state->flags & CPUIDLE_FLAG_IBRS) {
+               /*
+                * IBRS mitigation requires that C-states are entered
+                * with interrupts disabled.
+                */
+               WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
+               state->enter = intel_idle_ibrs;
+               return;
+       }
+
+       if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE) {
+               state->enter = intel_idle_irq;
+               return;
+       }
+
+       if (force_irq_on) {
+               pr_info("forced intel_idle_irq for state %d\n", cstate);
+               state->enter = intel_idle_irq;
+       }
+}
+
+/*
+ * For mwait based states, we want to verify the cpuid data to see if the state
+ * is actually supported by this specific CPU.
+ * For non-mwait based states, this check should be skipped.
+ */
+static bool should_verify_mwait(struct cpuidle_state *state)
+{
+       if (state->enter == intel_idle_hlt)
+               return false;
+       if (state->enter == intel_idle_hlt_irq_on)
+               return false;
+
+       return true;
+}
+
 static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
 {
        int cstate;
@@ -1887,35 +2003,15 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
                }
 
                mwait_hint = flg2MWAIT(cpuidle_state_table[cstate].flags);
-               if (!intel_idle_verify_cstate(mwait_hint))
+               if (should_verify_mwait(&cpuidle_state_table[cstate]) && !intel_idle_verify_cstate(mwait_hint))
                        continue;
 
                /* Structure copy. */
                drv->states[drv->state_count] = cpuidle_state_table[cstate];
                state = &drv->states[drv->state_count];
 
-               if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
-                       /*
-                        * Combining with XSTATE with IBRS or IRQ_ENABLE flags
-                        * is not currently supported but this driver.
-                        */
-                       WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IBRS);
-                       WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
-                       state->enter = intel_idle_xstate;
-               } else if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
-                          state->flags & CPUIDLE_FLAG_IBRS) {
-                       /*
-                        * IBRS mitigation requires that C-states are entered
-                        * with interrupts disabled.
-                        */
-                       WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
-                       state->enter = intel_idle_ibrs;
-               } else if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE) {
-                       state->enter = intel_idle_irq;
-               } else if (force_irq_on) {
-                       pr_info("forced intel_idle_irq for state %d\n", cstate);
-                       state->enter = intel_idle_irq;
-               }
+               state_update_enter_method(state, cstate);
+
 
                if ((disabled_states_mask & BIT(drv->state_count)) ||
                    ((icpu->use_acpi || force_use_acpi) &&
@@ -2041,6 +2137,93 @@ static void __init intel_idle_cpuidle_devices_uninit(void)
                cpuidle_unregister_device(per_cpu_ptr(intel_idle_cpuidle_devices, i));
 }
 
+/*
+ * Match up the latency and break even point of the bare metal (cpu based)
+ * states with the deepest VM available state.
+ *
+ * We only want to do this for the deepest states, the ones that have
+ * the TLB_FLUSHED flag set.
+ *
+ * All our short idle states are dominated by vmexit/vmenter latencies,
+ * not the underlying hardware latencies, so we keep our values for these.
+ */
+static void matchup_vm_state_with_baremetal(void)
+{
+       int cstate;
+
+       for (cstate = 0; cstate < CPUIDLE_STATE_MAX; ++cstate) {
+               int matching_cstate;
+
+               if (intel_idle_max_cstate_reached(cstate))
+                       break;
+
+               if (!cpuidle_state_table[cstate].enter)
+                       break;
+
+               if (!(cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_TLB_FLUSHED))
+                       continue;
+
+               for (matching_cstate = 0; matching_cstate < CPUIDLE_STATE_MAX; ++matching_cstate) {
+                       if (!icpu->state_table[matching_cstate].enter)
+                               break;
+                       if (icpu->state_table[matching_cstate].exit_latency > cpuidle_state_table[cstate].exit_latency) {
+                               cpuidle_state_table[cstate].exit_latency = icpu->state_table[matching_cstate].exit_latency;
+                               cpuidle_state_table[cstate].target_residency = icpu->state_table[matching_cstate].target_residency;
+                       }
+               }
+
+       }
+}
+
+
+static int __init intel_idle_vminit(const struct x86_cpu_id *id)
+{
+       int retval;
+
+       cpuidle_state_table = vmguest_cstates;
+
+       icpu = (const struct idle_cpu *)id->driver_data;
+
+       pr_debug("v" INTEL_IDLE_VERSION " model 0x%X\n",
+                boot_cpu_data.x86_model);
+
+       intel_idle_cpuidle_devices = alloc_percpu(struct cpuidle_device);
+       if (!intel_idle_cpuidle_devices)
+               return -ENOMEM;
+
+       /*
+        * We don't know exactly what the host will do when we go idle, but as a worst estimate
+        * we can assume that the exit latency of the deepest host state will be hit for our
+        * deep (long duration) guest idle state.
+        * The same logic applies to the break even point for the long duration guest idle state.
+        * So let's copy these two properties from the table we found for the host CPU type.
+        */
+       matchup_vm_state_with_baremetal();
+
+       intel_idle_cpuidle_driver_init(&intel_idle_driver);
+
+       retval = cpuidle_register_driver(&intel_idle_driver);
+       if (retval) {
+               struct cpuidle_driver *drv = cpuidle_get_driver();
+               printk(KERN_DEBUG pr_fmt("intel_idle yielding to %s\n"),
+                      drv ? drv->name : "none");
+               goto init_driver_fail;
+       }
+
+       retval = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "idle/intel:online",
+                                  intel_idle_cpu_online, NULL);
+       if (retval < 0)
+               goto hp_setup_fail;
+
+       return 0;
+hp_setup_fail:
+       intel_idle_cpuidle_devices_uninit();
+       cpuidle_unregister_driver(&intel_idle_driver);
+init_driver_fail:
+       free_percpu(intel_idle_cpuidle_devices);
+       return retval;
+}
+
 static int __init intel_idle_init(void)
 {
        const struct x86_cpu_id *id;
@@ -2059,6 +2242,8 @@ static int __init intel_idle_init(void)
        id = x86_match_cpu(intel_idle_ids);
        if (id) {
                if (!boot_cpu_has(X86_FEATURE_MWAIT)) {
+                       if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
+                               return intel_idle_vminit(id);
                        pr_debug("Please enable MWAIT in BIOS SETUP\n");
                        return -ENODEV;
                }
index f98393d..b8636fa 100644 (file)
@@ -1048,7 +1048,7 @@ int kx022a_probe_internal(struct device *dev)
                data->ien_reg = KX022A_REG_INC4;
        } else {
                irq = fwnode_irq_get_byname(fwnode, "INT2");
-               if (irq <= 0)
+               if (irq < 0)
                        return dev_err_probe(dev, irq, "No suitable IRQ\n");
 
                data->inc_reg = KX022A_REG_INC5;
index 5f7d81b..282e539 100644 (file)
@@ -1291,12 +1291,12 @@ static int apply_acpi_orientation(struct iio_dev *indio_dev)
 
        adev = ACPI_COMPANION(indio_dev->dev.parent);
        if (!adev)
-               return 0;
+               return -ENXIO;
 
        /* Read _ONT data, which should be a package of 6 integers. */
        status = acpi_evaluate_object(adev->handle, "_ONT", NULL, &buffer);
        if (status == AE_NOT_FOUND) {
-               return 0;
+               return -ENXIO;
        } else if (ACPI_FAILURE(status)) {
                dev_warn(&indio_dev->dev, "failed to execute _ONT: %d\n",
                         status);
index 3839434..5a5dd5e 100644 (file)
@@ -1817,6 +1817,11 @@ static const struct clk_ops ad4130_int_clk_ops = {
        .unprepare = ad4130_int_clk_unprepare,
 };
 
+static void ad4130_clk_del_provider(void *of_node)
+{
+       of_clk_del_provider(of_node);
+}
+
 static int ad4130_setup_int_clk(struct ad4130_state *st)
 {
        struct device *dev = &st->spi->dev;
@@ -1824,6 +1829,7 @@ static int ad4130_setup_int_clk(struct ad4130_state *st)
        struct clk_init_data init;
        const char *clk_name;
        struct clk *clk;
+       int ret;
 
        if (st->int_pin_sel == AD4130_INT_PIN_CLK ||
            st->mclk_sel != AD4130_MCLK_76_8KHZ)
@@ -1843,7 +1849,11 @@ static int ad4130_setup_int_clk(struct ad4130_state *st)
        if (IS_ERR(clk))
                return PTR_ERR(clk);
 
-       return of_clk_add_provider(of_node, of_clk_src_simple_get, clk);
+       ret = of_clk_add_provider(of_node, of_clk_src_simple_get, clk);
+       if (ret)
+               return ret;
+
+       return devm_add_action_or_reset(dev, ad4130_clk_del_provider, of_node);
 }
 
 static int ad4130_setup(struct iio_dev *indio_dev)
index 55a6ab5..99bb604 100644 (file)
@@ -897,10 +897,6 @@ static const struct iio_info ad7195_info = {
        __AD719x_CHANNEL(_si, _channel1, -1, _address, NULL, IIO_VOLTAGE, \
                BIT(IIO_CHAN_INFO_SCALE), ad7192_calibsys_ext_info)
 
-#define AD719x_SHORTED_CHANNEL(_si, _channel1, _address) \
-       __AD719x_CHANNEL(_si, _channel1, -1, _address, "shorted", IIO_VOLTAGE, \
-               BIT(IIO_CHAN_INFO_SCALE), ad7192_calibsys_ext_info)
-
 #define AD719x_TEMP_CHANNEL(_si, _address) \
        __AD719x_CHANNEL(_si, 0, -1, _address, NULL, IIO_TEMP, 0, NULL)
 
@@ -908,7 +904,7 @@ static const struct iio_chan_spec ad7192_channels[] = {
        AD719x_DIFF_CHANNEL(0, 1, 2, AD7192_CH_AIN1P_AIN2M),
        AD719x_DIFF_CHANNEL(1, 3, 4, AD7192_CH_AIN3P_AIN4M),
        AD719x_TEMP_CHANNEL(2, AD7192_CH_TEMP),
-       AD719x_SHORTED_CHANNEL(3, 2, AD7192_CH_AIN2P_AIN2M),
+       AD719x_DIFF_CHANNEL(3, 2, 2, AD7192_CH_AIN2P_AIN2M),
        AD719x_CHANNEL(4, 1, AD7192_CH_AIN1),
        AD719x_CHANNEL(5, 2, AD7192_CH_AIN2),
        AD719x_CHANNEL(6, 3, AD7192_CH_AIN3),
@@ -922,7 +918,7 @@ static const struct iio_chan_spec ad7193_channels[] = {
        AD719x_DIFF_CHANNEL(2, 5, 6, AD7193_CH_AIN5P_AIN6M),
        AD719x_DIFF_CHANNEL(3, 7, 8, AD7193_CH_AIN7P_AIN8M),
        AD719x_TEMP_CHANNEL(4, AD7193_CH_TEMP),
-       AD719x_SHORTED_CHANNEL(5, 2, AD7193_CH_AIN2P_AIN2M),
+       AD719x_DIFF_CHANNEL(5, 2, 2, AD7193_CH_AIN2P_AIN2M),
        AD719x_CHANNEL(6, 1, AD7193_CH_AIN1),
        AD719x_CHANNEL(7, 2, AD7193_CH_AIN2),
        AD719x_CHANNEL(8, 3, AD7193_CH_AIN3),
index d8570f6..7e21928 100644 (file)
@@ -584,6 +584,10 @@ static int devm_ad_sd_probe_trigger(struct device *dev, struct iio_dev *indio_de
        init_completion(&sigma_delta->completion);
 
        sigma_delta->irq_dis = true;
+
+       /* the IRQ core clears the IRQ_DISABLE_UNLAZY flag when freeing an IRQ */
+       irq_set_status_flags(sigma_delta->spi->irq, IRQ_DISABLE_UNLAZY);
+
        ret = devm_request_irq(dev, sigma_delta->spi->irq,
                               ad_sd_data_rdy_trig_poll,
                               sigma_delta->info->irq_flags | IRQF_NO_AUTOEN,
index a775d2e..dce9ec9 100644 (file)
@@ -236,8 +236,7 @@ static int imx93_adc_read_raw(struct iio_dev *indio_dev,
 {
        struct imx93_adc *adc = iio_priv(indio_dev);
        struct device *dev = adc->dev;
-       long ret;
-       u32 vref_uv;
+       int ret;
 
        switch (mask) {
        case IIO_CHAN_INFO_RAW:
@@ -253,10 +252,10 @@ static int imx93_adc_read_raw(struct iio_dev *indio_dev,
                return IIO_VAL_INT;
 
        case IIO_CHAN_INFO_SCALE:
-               ret = vref_uv = regulator_get_voltage(adc->vref);
+               ret = regulator_get_voltage(adc->vref);
                if (ret < 0)
                        return ret;
-               *val = vref_uv / 1000;
+               *val = ret / 1000;
                *val2 = 12;
                return IIO_VAL_FRACTIONAL_LOG2;
 
index bc62e5a..0bc1121 100644 (file)
@@ -19,6 +19,7 @@
 
 #include <dt-bindings/iio/adc/mediatek,mt6370_adc.h>
 
+#define MT6370_REG_DEV_INFO            0x100
 #define MT6370_REG_CHG_CTRL3           0x113
 #define MT6370_REG_CHG_CTRL7           0x117
 #define MT6370_REG_CHG_ADC             0x121
@@ -27,6 +28,7 @@
 #define MT6370_ADC_START_MASK          BIT(0)
 #define MT6370_ADC_IN_SEL_MASK         GENMASK(7, 4)
 #define MT6370_AICR_ICHG_MASK          GENMASK(7, 2)
+#define MT6370_VENID_MASK              GENMASK(7, 4)
 
 #define MT6370_AICR_100_mA             0x0
 #define MT6370_AICR_150_mA             0x1
 #define ADC_CONV_TIME_MS               35
 #define ADC_CONV_POLLING_TIME_US       1000
 
+#define MT6370_VID_RT5081              0x8
+#define MT6370_VID_RT5081A             0xA
+#define MT6370_VID_MT6370              0xE
+
 struct mt6370_adc_data {
        struct device *dev;
        struct regmap *regmap;
@@ -55,6 +61,7 @@ struct mt6370_adc_data {
         * from being read at the same time.
         */
        struct mutex adc_lock;
+       unsigned int vid;
 };
 
 static int mt6370_adc_read_channel(struct mt6370_adc_data *priv, int chan,
@@ -98,6 +105,30 @@ adc_unlock:
        return ret;
 }
 
+static int mt6370_adc_get_ibus_scale(struct mt6370_adc_data *priv)
+{
+       switch (priv->vid) {
+       case MT6370_VID_RT5081:
+       case MT6370_VID_RT5081A:
+       case MT6370_VID_MT6370:
+               return 3350;
+       default:
+               return 3875;
+       }
+}
+
+static int mt6370_adc_get_ibat_scale(struct mt6370_adc_data *priv)
+{
+       switch (priv->vid) {
+       case MT6370_VID_RT5081:
+       case MT6370_VID_RT5081A:
+       case MT6370_VID_MT6370:
+               return 2680;
+       default:
+               return 3870;
+       }
+}
+
 static int mt6370_adc_read_scale(struct mt6370_adc_data *priv,
                                 int chan, int *val1, int *val2)
 {
@@ -123,7 +154,7 @@ static int mt6370_adc_read_scale(struct mt6370_adc_data *priv,
                case MT6370_AICR_250_mA:
                case MT6370_AICR_300_mA:
                case MT6370_AICR_350_mA:
-                       *val1 = 3350;
+                       *val1 = mt6370_adc_get_ibus_scale(priv);
                        break;
                default:
                        *val1 = 5000;
@@ -150,7 +181,7 @@ static int mt6370_adc_read_scale(struct mt6370_adc_data *priv,
                case MT6370_ICHG_600_mA:
                case MT6370_ICHG_700_mA:
                case MT6370_ICHG_800_mA:
-                       *val1 = 2680;
+                       *val1 = mt6370_adc_get_ibat_scale(priv);
                        break;
                default:
                        *val1 = 5000;
@@ -251,6 +282,20 @@ static const struct iio_chan_spec mt6370_adc_channels[] = {
        MT6370_ADC_CHAN(TEMP_JC, IIO_TEMP, 12, BIT(IIO_CHAN_INFO_OFFSET)),
 };
 
+static int mt6370_get_vendor_info(struct mt6370_adc_data *priv)
+{
+       unsigned int dev_info;
+       int ret;
+
+       ret = regmap_read(priv->regmap, MT6370_REG_DEV_INFO, &dev_info);
+       if (ret)
+               return ret;
+
+       priv->vid = FIELD_GET(MT6370_VENID_MASK, dev_info);
+
+       return 0;
+}
+
 static int mt6370_adc_probe(struct platform_device *pdev)
 {
        struct device *dev = &pdev->dev;
@@ -272,6 +317,10 @@ static int mt6370_adc_probe(struct platform_device *pdev)
        priv->regmap = regmap;
        mutex_init(&priv->adc_lock);
 
+       ret = mt6370_get_vendor_info(priv);
+       if (ret)
+               return dev_err_probe(dev, ret, "Failed to get vid\n");
+
        ret = regmap_write(priv->regmap, MT6370_REG_CHG_ADC, 0);
        if (ret)
                return dev_err_probe(dev, ret, "Failed to reset ADC\n");
index bca79a9..a50f391 100644 (file)
@@ -757,13 +757,13 @@ static int mxs_lradc_adc_probe(struct platform_device *pdev)
 
        ret = mxs_lradc_adc_trigger_init(iio);
        if (ret)
-               goto err_trig;
+               return ret;
 
        ret = iio_triggered_buffer_setup(iio, &iio_pollfunc_store_time,
                                         &mxs_lradc_adc_trigger_handler,
                                         &mxs_lradc_adc_buffer_ops);
        if (ret)
-               return ret;
+               goto err_trig;
 
        adc->vref_mv = mxs_lradc_adc_vref_mv[lradc->soc];
 
@@ -801,9 +801,9 @@ static int mxs_lradc_adc_probe(struct platform_device *pdev)
 
 err_dev:
        mxs_lradc_adc_hw_stop(adc);
-       mxs_lradc_adc_trigger_remove(iio);
-err_trig:
        iio_triggered_buffer_cleanup(iio);
+err_trig:
+       mxs_lradc_adc_trigger_remove(iio);
        return ret;
 }
 
@@ -814,8 +814,8 @@ static int mxs_lradc_adc_remove(struct platform_device *pdev)
 
        iio_device_unregister(iio);
        mxs_lradc_adc_hw_stop(adc);
-       mxs_lradc_adc_trigger_remove(iio);
        iio_triggered_buffer_cleanup(iio);
+       mxs_lradc_adc_trigger_remove(iio);
 
        return 0;
 }
index c1c4392..7dfc9c9 100644 (file)
@@ -547,7 +547,7 @@ static int palmas_gpadc_read_raw(struct iio_dev *indio_dev,
        int adc_chan = chan->channel;
        int ret = 0;
 
-       if (adc_chan > PALMAS_ADC_CH_MAX)
+       if (adc_chan >= PALMAS_ADC_CH_MAX)
                return -EINVAL;
 
        mutex_lock(&adc->lock);
@@ -595,7 +595,7 @@ static int palmas_gpadc_read_event_config(struct iio_dev *indio_dev,
        int adc_chan = chan->channel;
        int ret = 0;
 
-       if (adc_chan > PALMAS_ADC_CH_MAX || type != IIO_EV_TYPE_THRESH)
+       if (adc_chan >= PALMAS_ADC_CH_MAX || type != IIO_EV_TYPE_THRESH)
                return -EINVAL;
 
        mutex_lock(&adc->lock);
@@ -684,7 +684,7 @@ static int palmas_gpadc_write_event_config(struct iio_dev *indio_dev,
        int adc_chan = chan->channel;
        int ret;
 
-       if (adc_chan > PALMAS_ADC_CH_MAX || type != IIO_EV_TYPE_THRESH)
+       if (adc_chan >= PALMAS_ADC_CH_MAX || type != IIO_EV_TYPE_THRESH)
                return -EINVAL;
 
        mutex_lock(&adc->lock);
@@ -710,7 +710,7 @@ static int palmas_gpadc_read_event_value(struct iio_dev *indio_dev,
        int adc_chan = chan->channel;
        int ret;
 
-       if (adc_chan > PALMAS_ADC_CH_MAX || type != IIO_EV_TYPE_THRESH)
+       if (adc_chan >= PALMAS_ADC_CH_MAX || type != IIO_EV_TYPE_THRESH)
                return -EINVAL;
 
        mutex_lock(&adc->lock);
@@ -744,7 +744,7 @@ static int palmas_gpadc_write_event_value(struct iio_dev *indio_dev,
        int old;
        int ret;
 
-       if (adc_chan > PALMAS_ADC_CH_MAX || type != IIO_EV_TYPE_THRESH)
+       if (adc_chan >= PALMAS_ADC_CH_MAX || type != IIO_EV_TYPE_THRESH)
                return -EINVAL;
 
        mutex_lock(&adc->lock);
index 1aadb2a..bd7e240 100644 (file)
@@ -2006,16 +2006,15 @@ static int stm32_adc_get_legacy_chan_count(struct iio_dev *indio_dev, struct stm
         * to get the *real* number of channels.
         */
        ret = device_property_count_u32(dev, "st,adc-diff-channels");
-       if (ret < 0)
-               return ret;
-
-       ret /= (int)(sizeof(struct stm32_adc_diff_channel) / sizeof(u32));
-       if (ret > adc_info->max_channels) {
-               dev_err(&indio_dev->dev, "Bad st,adc-diff-channels?\n");
-               return -EINVAL;
-       } else if (ret > 0) {
-               adc->num_diff = ret;
-               num_channels += ret;
+       if (ret > 0) {
+               ret /= (int)(sizeof(struct stm32_adc_diff_channel) / sizeof(u32));
+               if (ret > adc_info->max_channels) {
+                       dev_err(&indio_dev->dev, "Bad st,adc-diff-channels?\n");
+                       return -EINVAL;
+               } else if (ret > 0) {
+                       adc->num_diff = ret;
+                       num_channels += ret;
+               }
        }
 
        /* Optional sample time is provided either for each, or all channels */
@@ -2037,6 +2036,7 @@ static int stm32_adc_legacy_chan_init(struct iio_dev *indio_dev,
        struct stm32_adc_diff_channel diff[STM32_ADC_CH_MAX];
        struct device *dev = &indio_dev->dev;
        u32 num_diff = adc->num_diff;
+       int num_se = nchans - num_diff;
        int size = num_diff * sizeof(*diff) / sizeof(u32);
        int scan_index = 0, ret, i, c;
        u32 smp = 0, smps[STM32_ADC_CH_MAX], chans[STM32_ADC_CH_MAX];
@@ -2063,29 +2063,32 @@ static int stm32_adc_legacy_chan_init(struct iio_dev *indio_dev,
                        scan_index++;
                }
        }
-
-       ret = device_property_read_u32_array(dev, "st,adc-channels", chans,
-                                            nchans);
-       if (ret)
-               return ret;
-
-       for (c = 0; c < nchans; c++) {
-               if (chans[c] >= adc_info->max_channels) {
-                       dev_err(&indio_dev->dev, "Invalid channel %d\n",
-                               chans[c]);
-                       return -EINVAL;
+       if (num_se > 0) {
+               ret = device_property_read_u32_array(dev, "st,adc-channels", chans, num_se);
+               if (ret) {
+                       dev_err(&indio_dev->dev, "Failed to get st,adc-channels %d\n", ret);
+                       return ret;
                }
 
-               /* Channel can't be configured both as single-ended & diff */
-               for (i = 0; i < num_diff; i++) {
-                       if (chans[c] == diff[i].vinp) {
-                               dev_err(&indio_dev->dev, "channel %d misconfigured\n",  chans[c]);
+               for (c = 0; c < num_se; c++) {
+                       if (chans[c] >= adc_info->max_channels) {
+                               dev_err(&indio_dev->dev, "Invalid channel %d\n",
+                                       chans[c]);
                                return -EINVAL;
                        }
+
+                       /* Channel can't be configured both as single-ended & diff */
+                       for (i = 0; i < num_diff; i++) {
+                               if (chans[c] == diff[i].vinp) {
+                                       dev_err(&indio_dev->dev, "channel %d misconfigured\n",
+                                               chans[c]);
+                                       return -EINVAL;
+                               }
+                       }
+                       stm32_adc_chan_init_one(indio_dev, &channels[scan_index],
+                                               chans[c], 0, scan_index, false);
+                       scan_index++;
                }
-               stm32_adc_chan_init_one(indio_dev, &channels[scan_index],
-                                       chans[c], 0, scan_index, false);
-               scan_index++;
        }
 
        if (adc->nsmps > 0) {
@@ -2306,7 +2309,7 @@ static int stm32_adc_chan_fw_init(struct iio_dev *indio_dev, bool timestamping)
 
        if (legacy)
                ret = stm32_adc_legacy_chan_init(indio_dev, adc, channels,
-                                                num_channels);
+                                                timestamping ? num_channels - 1 : num_channels);
        else
                ret = stm32_adc_generic_chan_init(indio_dev, adc, channels);
        if (ret < 0)
index 07e9f6a..e3366cf 100644 (file)
@@ -1007,7 +1007,7 @@ static int ad74413r_read_raw(struct iio_dev *indio_dev,
 
                ret = ad74413r_get_single_adc_result(indio_dev, chan->channel,
                                                     val);
-               if (ret)
+               if (ret < 0)
                        return ret;
 
                ad74413r_adc_to_resistance_result(*val, val);
index 6c74fea..addd97a 100644 (file)
@@ -17,7 +17,7 @@ obj-$(CONFIG_AD5592R_BASE) += ad5592r-base.o
 obj-$(CONFIG_AD5592R) += ad5592r.o
 obj-$(CONFIG_AD5593R) += ad5593r.o
 obj-$(CONFIG_AD5755) += ad5755.o
-obj-$(CONFIG_AD5755) += ad5758.o
+obj-$(CONFIG_AD5758) += ad5758.o
 obj-$(CONFIG_AD5761) += ad5761.o
 obj-$(CONFIG_AD5764) += ad5764.o
 obj-$(CONFIG_AD5766) += ad5766.o
index 46bf758..3f5661a 100644 (file)
@@ -47,12 +47,18 @@ static int mcp4725_suspend(struct device *dev)
        struct mcp4725_data *data = iio_priv(i2c_get_clientdata(
                to_i2c_client(dev)));
        u8 outbuf[2];
+       int ret;
 
        outbuf[0] = (data->powerdown_mode + 1) << 4;
        outbuf[1] = 0;
        data->powerdown = true;
 
-       return i2c_master_send(data->client, outbuf, 2);
+       ret = i2c_master_send(data->client, outbuf, 2);
+       if (ret < 0)
+               return ret;
+       else if (ret != 2)
+               return -EIO;
+       return 0;
 }
 
 static int mcp4725_resume(struct device *dev)
@@ -60,13 +66,19 @@ static int mcp4725_resume(struct device *dev)
        struct mcp4725_data *data = iio_priv(i2c_get_clientdata(
                to_i2c_client(dev)));
        u8 outbuf[2];
+       int ret;
 
        /* restore previous DAC value */
        outbuf[0] = (data->dac_value >> 8) & 0xf;
        outbuf[1] = data->dac_value & 0xff;
        data->powerdown = false;
 
-       return i2c_master_send(data->client, outbuf, 2);
+       ret = i2c_master_send(data->client, outbuf, 2);
+       if (ret < 0)
+               return ret;
+       else if (ret != 2)
+               return -EIO;
+       return 0;
 }
 static DEFINE_SIMPLE_DEV_PM_OPS(mcp4725_pm_ops, mcp4725_suspend,
                                mcp4725_resume);
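
The mcp4725 change maps i2c_master_send()'s byte-count return onto the 0-or-negative-errno convention expected from PM callbacks. A hypothetical helper expressing the same mapping; send_bytes() is a simulated stand-in, not an I2C API:

#include <errno.h>
#include <stdio.h>

/* stand-in for i2c_master_send(): returns bytes written or a negative errno */
static int send_bytes(const unsigned char *buf, int len)
{
	(void)buf;
	return len - 1;		/* simulate a short write */
}

/* map "bytes written" onto the 0 / -errno convention used by PM callbacks */
static int send_exact(const unsigned char *buf, int len)
{
	int ret = send_bytes(buf, len);

	if (ret < 0)
		return ret;	/* propagate the bus error */
	if (ret != len)
		return -EIO;	/* partial transfer is still a failure */
	return 0;
}

int main(void)
{
	unsigned char outbuf[2] = { 0x10, 0x00 };

	printf("send_exact() = %d\n", send_exact(outbuf, 2));	/* -EIO here */
	return 0;
}
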
index 99576b2..32d7f83 100644 (file)
@@ -275,9 +275,14 @@ static int inv_icm42600_buffer_preenable(struct iio_dev *indio_dev)
 {
        struct inv_icm42600_state *st = iio_device_get_drvdata(indio_dev);
        struct device *dev = regmap_get_device(st->map);
+       struct inv_icm42600_timestamp *ts = iio_priv(indio_dev);
 
        pm_runtime_get_sync(dev);
 
+       mutex_lock(&st->lock);
+       inv_icm42600_timestamp_reset(ts);
+       mutex_unlock(&st->lock);
+
        return 0;
 }
 
@@ -375,7 +380,6 @@ static int inv_icm42600_buffer_postdisable(struct iio_dev *indio_dev)
        struct device *dev = regmap_get_device(st->map);
        unsigned int sensor;
        unsigned int *watermark;
-       struct inv_icm42600_timestamp *ts;
        struct inv_icm42600_sensor_conf conf = INV_ICM42600_SENSOR_CONF_INIT;
        unsigned int sleep_temp = 0;
        unsigned int sleep_sensor = 0;
@@ -385,11 +389,9 @@ static int inv_icm42600_buffer_postdisable(struct iio_dev *indio_dev)
        if (indio_dev == st->indio_gyro) {
                sensor = INV_ICM42600_SENSOR_GYRO;
                watermark = &st->fifo.watermark.gyro;
-               ts = iio_priv(st->indio_gyro);
        } else if (indio_dev == st->indio_accel) {
                sensor = INV_ICM42600_SENSOR_ACCEL;
                watermark = &st->fifo.watermark.accel;
-               ts = iio_priv(st->indio_accel);
        } else {
                return -EINVAL;
        }
@@ -417,8 +419,6 @@ static int inv_icm42600_buffer_postdisable(struct iio_dev *indio_dev)
        if (!st->fifo.on)
                ret = inv_icm42600_set_temp_conf(st, false, &sleep_temp);
 
-       inv_icm42600_timestamp_reset(ts);
-
 out_unlock:
        mutex_unlock(&st->lock);
 
index 8bb6897..7653261 100644 (file)
@@ -337,6 +337,17 @@ free_gains:
        return ret;
 }
 
+static void iio_gts_us_to_int_micro(int *time_us, int *int_micro_times,
+                                   int num_times)
+{
+       int i;
+
+       for (i = 0; i < num_times; i++) {
+               int_micro_times[i * 2] = time_us[i] / 1000000;
+               int_micro_times[i * 2 + 1] = time_us[i] % 1000000;
+       }
+}
+
 /**
  * iio_gts_build_avail_time_table - build table of available integration times
  * @gts:       Gain time scale descriptor
@@ -351,7 +362,7 @@ free_gains:
  */
 static int iio_gts_build_avail_time_table(struct iio_gts *gts)
 {
-       int *times, i, j, idx = 0;
+       int *times, i, j, idx = 0, *int_micro_times;
 
        if (!gts->num_itime)
                return 0;
@@ -378,13 +389,24 @@ static int iio_gts_build_avail_time_table(struct iio_gts *gts)
                        }
                }
        }
-       gts->avail_time_tables = times;
-       /*
-        * This is just to survive a unlikely corner-case where times in the
-        * given time table were not unique. Else we could just trust the
-        * gts->num_itime.
-        */
-       gts->num_avail_time_tables = idx;
+
+       /* create a list of times formatted as a list of IIO_VAL_INT_PLUS_MICRO */
+       int_micro_times = kcalloc(idx, sizeof(int) * 2, GFP_KERNEL);
+       if (int_micro_times) {
+               /*
+                * This is just to survive an unlikely corner-case where times in
+                * the given time table were not unique. Else we could just
+                * trust the gts->num_itime.
+                */
+               gts->num_avail_time_tables = idx;
+               iio_gts_us_to_int_micro(times, int_micro_times, idx);
+       }
+
+       gts->avail_time_tables = int_micro_times;
+       kfree(times);
+
+       if (!int_micro_times)
+               return -ENOMEM;
 
        return 0;
 }
@@ -683,8 +705,8 @@ int iio_gts_avail_times(struct iio_gts *gts,  const int **vals, int *type,
                return -EINVAL;
 
        *vals = gts->avail_time_tables;
-       *type = IIO_VAL_INT;
-       *length = gts->num_avail_time_tables;
+       *type = IIO_VAL_INT_PLUS_MICRO;
+       *length = gts->num_avail_time_tables * 2;
 
        return IIO_AVAIL_LIST;
 }
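
iio_gts_us_to_int_micro() above splits each integration time in microseconds into the (seconds, microseconds) pairs that an IIO_VAL_INT_PLUS_MICRO list expects. A minimal userspace version of that conversion, with illustrative times:

#include <stdio.h>

static void us_to_int_micro(const int *time_us, int *int_micro, int num)
{
	int i;

	for (i = 0; i < num; i++) {
		int_micro[i * 2] = time_us[i] / 1000000;	/* whole seconds */
		int_micro[i * 2 + 1] = time_us[i] % 1000000;	/* leftover us */
	}
}

int main(void)
{
	int times_us[] = { 50000, 100000, 200000, 400000 };	/* illustrative */
	int pairs[8];
	int i;

	us_to_int_micro(times_us, pairs, 4);
	for (i = 0; i < 4; i++)
		printf("%d us -> %d s + %d us\n",
		       times_us[i], pairs[i * 2], pairs[i * 2 + 1]);
	return 0;
}
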
index e486dcf..f85194f 100644 (file)
@@ -231,6 +231,9 @@ struct bu27034_result {
 
 static const struct regmap_range bu27034_volatile_ranges[] = {
        {
+               .range_min = BU27034_REG_SYSTEM_CONTROL,
+               .range_max = BU27034_REG_SYSTEM_CONTROL,
+       }, {
                .range_min = BU27034_REG_MODE_CONTROL4,
                .range_max = BU27034_REG_MODE_CONTROL4,
        }, {
@@ -1167,11 +1170,12 @@ static int bu27034_read_raw(struct iio_dev *idev,
 
        switch (mask) {
        case IIO_CHAN_INFO_INT_TIME:
-               *val = bu27034_get_int_time(data);
-               if (*val < 0)
-                       return *val;
+               *val = 0;
+               *val2 = bu27034_get_int_time(data);
+               if (*val2 < 0)
+                       return *val2;
 
-               return IIO_VAL_INT;
+               return IIO_VAL_INT_PLUS_MICRO;
 
        case IIO_CHAN_INFO_SCALE:
                return bu27034_get_scale(data, chan->channel, val, val2);
@@ -1229,7 +1233,10 @@ static int bu27034_write_raw(struct iio_dev *idev,
                ret = bu27034_set_scale(data, chan->channel, val, val2);
                break;
        case IIO_CHAN_INFO_INT_TIME:
-               ret = bu27034_try_set_int_time(data, val);
+               if (!val)
+                       ret = bu27034_try_set_int_time(data, val2);
+               else
+                       ret = -EINVAL;
                break;
        default:
                ret = -EINVAL;
@@ -1268,12 +1275,19 @@ static int bu27034_chip_init(struct bu27034_data *data)
        int ret, sel;
 
        /* Reset */
-       ret = regmap_update_bits(data->regmap, BU27034_REG_SYSTEM_CONTROL,
+       ret = regmap_write_bits(data->regmap, BU27034_REG_SYSTEM_CONTROL,
                           BU27034_MASK_SW_RESET, BU27034_MASK_SW_RESET);
        if (ret)
                return dev_err_probe(data->dev, ret, "Sensor reset failed\n");
 
        msleep(1);
+
+       ret = regmap_reinit_cache(data->regmap, &bu27034_regmap);
+       if (ret) {
+               dev_err(data->dev, "Failed to reinit reg cache\n");
+               return ret;
+       }
+
        /*
         * Read integration time here to ensure it is in regmap cache. We do
         * this to speed-up the int-time acquisition in the start of the buffer
index 14e2933..94f5d61 100644 (file)
@@ -8,6 +8,7 @@
  * TODO: Proximity
  */
 #include <linux/bitops.h>
+#include <linux/bitfield.h>
 #include <linux/i2c.h>
 #include <linux/module.h>
 #include <linux/pm_runtime.h>
@@ -42,6 +43,7 @@
 #define VCNL4035_ALS_PERS_MASK         GENMASK(3, 2)
 #define VCNL4035_INT_ALS_IF_H_MASK     BIT(12)
 #define VCNL4035_INT_ALS_IF_L_MASK     BIT(13)
+#define VCNL4035_DEV_ID_MASK           GENMASK(7, 0)
 
 /* Default values */
 #define VCNL4035_MODE_ALS_ENABLE       BIT(0)
@@ -413,6 +415,7 @@ static int vcnl4035_init(struct vcnl4035_data *data)
                return ret;
        }
 
+       id = FIELD_GET(VCNL4035_DEV_ID_MASK, id);
        if (id != VCNL4035_DEV_ID_VAL) {
                dev_err(&data->client->dev, "Wrong id, got %x, expected %x\n",
                        id, VCNL4035_DEV_ID_VAL);
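
The vcnl4035 fix masks the 16-bit ID register down to its low byte before comparing it with VCNL4035_DEV_ID_VAL. A small userspace sketch of the same FIELD_GET-style extraction; the register value and expected ID below are illustrative:

#include <stdio.h>

#define DEV_ID_MASK 0x00ffu		/* GENMASK(7, 0) */
#define DEV_ID_VAL  0x80u		/* illustrative expected ID */

/* minimal FIELD_GET(): extract the field selected by a contiguous mask */
static unsigned int field_get(unsigned int mask, unsigned int reg)
{
	return (reg & mask) / (mask & -mask);
}

int main(void)
{
	unsigned int reg = 0x1080;	/* upper byte carries unrelated bits */

	printf("raw compare matches:    %d\n", reg == DEV_ID_VAL);
	printf("masked compare matches: %d\n",
	       field_get(DEV_ID_MASK, reg) == DEV_ID_VAL);
	return 0;
}
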
index 28bb7ef..e155a75 100644 (file)
@@ -296,12 +296,13 @@ static int tmag5273_read_raw(struct iio_dev *indio_dev,
                        return ret;
 
                ret = tmag5273_get_measure(data, &t, &x, &y, &z, &angle, &magnitude);
-               if (ret)
-                       return ret;
 
                pm_runtime_mark_last_busy(data->dev);
                pm_runtime_put_autosuspend(data->dev);
 
+               if (ret)
+                       return ret;
+
                switch (chan->address) {
                case TEMPERATURE:
                        *val = t;
index 93a1c48..6b3f438 100644 (file)
@@ -3295,7 +3295,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
        route->path_rec->traffic_class = tos;
        route->path_rec->mtu = iboe_get_mtu(ndev->mtu);
        route->path_rec->rate_selector = IB_SA_EQ;
-       route->path_rec->rate = iboe_get_rate(ndev);
+       route->path_rec->rate = IB_RATE_PORT_CURRENT;
        dev_put(ndev);
        route->path_rec->packet_life_time_selector = IB_SA_EQ;
        /* In case ACK timeout is set, use this value to calculate
@@ -4964,7 +4964,7 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
        if (!ndev)
                return -ENODEV;
 
-       ib.rec.rate = iboe_get_rate(ndev);
+       ib.rec.rate = IB_RATE_PORT_CURRENT;
        ib.rec.hop_limit = 1;
        ib.rec.mtu = iboe_get_mtu(ndev->mtu);
 
index 4796f6a..e836c9c 100644 (file)
@@ -1850,8 +1850,13 @@ static int modify_qp(struct uverbs_attr_bundle *attrs,
                attr->path_mtu = cmd->base.path_mtu;
        if (cmd->base.attr_mask & IB_QP_PATH_MIG_STATE)
                attr->path_mig_state = cmd->base.path_mig_state;
-       if (cmd->base.attr_mask & IB_QP_QKEY)
+       if (cmd->base.attr_mask & IB_QP_QKEY) {
+               if (cmd->base.qkey & IB_QP_SET_QKEY && !capable(CAP_NET_RAW)) {
+                       ret = -EPERM;
+                       goto release_qp;
+               }
                attr->qkey = cmd->base.qkey;
+       }
        if (cmd->base.attr_mask & IB_QP_RQ_PSN)
                attr->rq_psn = cmd->base.rq_psn;
        if (cmd->base.attr_mask & IB_QP_SQ_PSN)
index fbace69..7c9c79c 100644 (file)
@@ -222,8 +222,12 @@ static ssize_t ib_uverbs_event_read(struct ib_uverbs_event_queue *ev_queue,
        spin_lock_irq(&ev_queue->lock);
 
        while (list_empty(&ev_queue->event_list)) {
-               spin_unlock_irq(&ev_queue->lock);
+               if (ev_queue->is_closed) {
+                       spin_unlock_irq(&ev_queue->lock);
+                       return -EIO;
+               }
 
+               spin_unlock_irq(&ev_queue->lock);
                if (filp->f_flags & O_NONBLOCK)
                        return -EAGAIN;
 
@@ -233,12 +237,6 @@ static ssize_t ib_uverbs_event_read(struct ib_uverbs_event_queue *ev_queue,
                        return -ERESTARTSYS;
 
                spin_lock_irq(&ev_queue->lock);
-
-               /* If device was disassociated and no event exists set an error */
-               if (list_empty(&ev_queue->event_list) && ev_queue->is_closed) {
-                       spin_unlock_irq(&ev_queue->lock);
-                       return -EIO;
-               }
        }
 
        event = list_entry(ev_queue->event_list.next, struct ib_uverbs_event, list);
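A generic sketch of the wait-loop shape the two hunks above converge on: the termination flag is tested while the lock is still held, so a close that races with an empty queue cannot be missed (the names below are illustrative, not the uverbs structures).

    /* Sketch only: my_queue/my_wait_for_event() are illustrative; the point is
     * that is_closed is tested under the lock, before the caller drops it to
     * sleep.
     */
    #include <linux/errno.h>
    #include <linux/list.h>
    #include <linux/spinlock.h>
    #include <linux/types.h>

    struct my_queue {
    	spinlock_t lock;
    	struct list_head list;
    	bool is_closed;
    };

    static int my_wait_for_event(struct my_queue *q)
    {
    	spin_lock_irq(&q->lock);
    	while (list_empty(&q->list)) {
    		if (q->is_closed) {	/* checked under the lock */
    			spin_unlock_irq(&q->lock);
    			return -EIO;
    		}
    		spin_unlock_irq(&q->lock);
    		/* ... sleep until an event or a close is signalled ... */
    		spin_lock_irq(&q->lock);
    	}
    	spin_unlock_irq(&q->lock);
    	return 0;
    }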
index 5a2baf4..2c95e6f 100644 (file)
@@ -135,8 +135,6 @@ struct bnxt_re_dev {
 
        struct delayed_work             worker;
        u8                              cur_prio_map;
-       u16                             active_speed;
-       u8                              active_width;
 
        /* FP Notification Queue (CQ & SRQ) */
        struct tasklet_struct           nq_task;
index e86afec..952811c 100644 (file)
@@ -199,6 +199,7 @@ int bnxt_re_query_port(struct ib_device *ibdev, u32 port_num,
 {
        struct bnxt_re_dev *rdev = to_bnxt_re_dev(ibdev, ibdev);
        struct bnxt_qplib_dev_attr *dev_attr = &rdev->dev_attr;
+       int rc;
 
        memset(port_attr, 0, sizeof(*port_attr));
 
@@ -228,10 +229,10 @@ int bnxt_re_query_port(struct ib_device *ibdev, u32 port_num,
        port_attr->sm_sl = 0;
        port_attr->subnet_timeout = 0;
        port_attr->init_type_reply = 0;
-       port_attr->active_speed = rdev->active_speed;
-       port_attr->active_width = rdev->active_width;
+       rc = ib_get_eth_speed(&rdev->ibdev, port_num, &port_attr->active_speed,
+                             &port_attr->active_width);
 
-       return 0;
+       return rc;
 }
 
 int bnxt_re_get_port_immutable(struct ib_device *ibdev, u32 port_num,
@@ -3341,9 +3342,7 @@ static int bnxt_re_process_raw_qp_pkt_rx(struct bnxt_re_qp *gsi_qp,
        udwr.remote_qkey = gsi_sqp->qplib_qp.qkey;
 
 	/* post data received in the send queue */
-       rc = bnxt_re_post_send_shadow_qp(rdev, gsi_sqp, swr);
-
-       return 0;
+       return bnxt_re_post_send_shadow_qp(rdev, gsi_sqp, swr);
 }
 
 static void bnxt_re_process_res_rawqp1_wc(struct ib_wc *wc,
index b9e2f89..3073398 100644 (file)
@@ -1077,8 +1077,6 @@ static int bnxt_re_ib_init(struct bnxt_re_dev *rdev)
                return rc;
        }
        dev_info(rdev_to_dev(rdev), "Device registered with IB successfully");
-       ib_get_eth_speed(&rdev->ibdev, 1, &rdev->active_speed,
-                        &rdev->active_width);
        set_bit(BNXT_RE_FLAG_ISSUE_ROCE_STATS, &rdev->flags);
 
        event = netif_running(rdev->netdev) && netif_carrier_ok(rdev->netdev) ?
@@ -1336,6 +1334,10 @@ static void bnxt_re_setup_cc(struct bnxt_re_dev *rdev, bool enable)
 {
        struct bnxt_qplib_cc_param cc_param = {};
 
+       /* Do not enable congestion control on VFs */
+       if (rdev->is_virtfn)
+               return;
+
        /* Currently enabling only for GenP5 adapters */
        if (!bnxt_qplib_is_chip_gen_p5(rdev->chip_ctx))
                return;
index f139d4c..8974f62 100644 (file)
@@ -2056,6 +2056,12 @@ int bnxt_qplib_create_cq(struct bnxt_qplib_res *res, struct bnxt_qplib_cq *cq)
        u32 pg_sz_lvl;
        int rc;
 
+       if (!cq->dpi) {
+               dev_err(&rcfw->pdev->dev,
+                       "FP: CREATE_CQ failed due to NULL DPI\n");
+               return -EINVAL;
+       }
+
        hwq_attr.res = res;
        hwq_attr.depth = cq->max_wqe;
        hwq_attr.stride = sizeof(struct cq_base);
@@ -2069,11 +2075,6 @@ int bnxt_qplib_create_cq(struct bnxt_qplib_res *res, struct bnxt_qplib_cq *cq)
                                 CMDQ_BASE_OPCODE_CREATE_CQ,
                                 sizeof(req));
 
-       if (!cq->dpi) {
-               dev_err(&rcfw->pdev->dev,
-                       "FP: CREATE_CQ failed due to NULL DPI\n");
-               return -EINVAL;
-       }
        req.dpi = cpu_to_le32(cq->dpi->dpi);
        req.cq_handle = cpu_to_le64(cq->cq_handle);
        req.cq_size = cpu_to_le32(cq->hwq.max_elements);
index 126d4f2..81b0c5e 100644 (file)
@@ -215,17 +215,9 @@ int bnxt_qplib_alloc_init_hwq(struct bnxt_qplib_hwq *hwq,
                        return -EINVAL;
                hwq_attr->sginfo->npages = npages;
        } else {
-               unsigned long sginfo_num_pages = ib_umem_num_dma_blocks(
-                       hwq_attr->sginfo->umem, hwq_attr->sginfo->pgsize);
-
+               npages = ib_umem_num_dma_blocks(hwq_attr->sginfo->umem,
+                                               hwq_attr->sginfo->pgsize);
                hwq->is_user = true;
-               npages = sginfo_num_pages;
-               npages = (npages * PAGE_SIZE) /
-                         BIT_ULL(hwq_attr->sginfo->pgshft);
-               if ((sginfo_num_pages * PAGE_SIZE) %
-                    BIT_ULL(hwq_attr->sginfo->pgshft))
-                       if (!npages)
-                               npages++;
        }
 
        if (npages == MAX_PBL_LVL_0_PGS && !hwq_attr->sginfo->nopte) {
index 1714a1e..b967a17 100644 (file)
@@ -617,16 +617,15 @@ int bnxt_qplib_reg_mr(struct bnxt_qplib_res *res, struct bnxt_qplib_mrw *mr,
 		/* Free the hwq if it already exists, must be a rereg */
                if (mr->hwq.max_elements)
                        bnxt_qplib_free_hwq(res, &mr->hwq);
-               /* Use system PAGE_SIZE */
                hwq_attr.res = res;
                hwq_attr.depth = pages;
-               hwq_attr.stride = buf_pg_size;
+               hwq_attr.stride = sizeof(dma_addr_t);
                hwq_attr.type = HWQ_TYPE_MR;
                hwq_attr.sginfo = &sginfo;
                hwq_attr.sginfo->umem = umem;
                hwq_attr.sginfo->npages = pages;
-               hwq_attr.sginfo->pgsize = PAGE_SIZE;
-               hwq_attr.sginfo->pgshft = PAGE_SHIFT;
+               hwq_attr.sginfo->pgsize = buf_pg_size;
+               hwq_attr.sginfo->pgshft = ilog2(buf_pg_size);
                rc = bnxt_qplib_alloc_init_hwq(&mr->hwq, &hwq_attr);
                if (rc) {
                        dev_err(&res->pdev->dev,
index 8eca6c1..2a195c4 100644 (file)
@@ -1403,7 +1403,7 @@ static int pbl_continuous_initialize(struct efa_dev *dev,
  */
 static int pbl_indirect_initialize(struct efa_dev *dev, struct pbl_context *pbl)
 {
-       u32 size_in_pages = DIV_ROUND_UP(pbl->pbl_buf_size_in_bytes, PAGE_SIZE);
+       u32 size_in_pages = DIV_ROUND_UP(pbl->pbl_buf_size_in_bytes, EFA_CHUNK_PAYLOAD_SIZE);
        struct scatterlist *sgl;
        int sg_dma_cnt, err;
 
index 84f1167..d4c6b9b 100644 (file)
@@ -4583,11 +4583,9 @@ static int modify_qp_init_to_rtr(struct ib_qp *ibqp,
        mtu = ib_mtu_enum_to_int(ib_mtu);
        if (WARN_ON(mtu <= 0))
                return -EINVAL;
-#define MAX_LP_MSG_LEN 16384
-       /* MTU * (2 ^ LP_PKTN_INI) shouldn't be bigger than 16KB */
-       lp_pktn_ini = ilog2(MAX_LP_MSG_LEN / mtu);
-       if (WARN_ON(lp_pktn_ini >= 0xF))
-               return -EINVAL;
+#define MIN_LP_MSG_LEN 1024
+       /* mtu * (2 ^ lp_pktn_ini) should be in the range of 1024 to mtu */
+       lp_pktn_ini = ilog2(max(mtu, MIN_LP_MSG_LEN) / mtu);
 
        if (attr_mask & IB_QP_PATH_MTU) {
                hr_reg_write(context, QPC_MTU, ib_mtu);
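For concreteness, worked numbers for the new formula (IB MTUs are powers of two, so the division is exact; the values are illustrative, not taken from the patch description):

    mtu =  256: lp_pktn_ini = ilog2(max(256, 1024) / 256)   = ilog2(4) = 2  ->  256 * 2^2  = 1024
    mtu = 4096: lp_pktn_ini = ilog2(max(4096, 1024) / 4096) = ilog2(1) = 0  ->  4096 * 2^0 = 4096

So mtu * (2 ^ lp_pktn_ini) works out to max(mtu, MIN_LP_MSG_LEN), which is why the hunk can drop the old 16 KB cap and its WARN_ON.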
@@ -5012,7 +5010,6 @@ static int hns_roce_v2_set_abs_fields(struct ib_qp *ibqp,
 static bool check_qp_timeout_cfg_range(struct hns_roce_dev *hr_dev, u8 *timeout)
 {
 #define QP_ACK_TIMEOUT_MAX_HIP08 20
-#define QP_ACK_TIMEOUT_OFFSET 10
 #define QP_ACK_TIMEOUT_MAX 31
 
        if (hr_dev->pci_dev->revision == PCI_REVISION_ID_HIP08) {
@@ -5021,7 +5018,7 @@ static bool check_qp_timeout_cfg_range(struct hns_roce_dev *hr_dev, u8 *timeout)
                                   "local ACK timeout shall be 0 to 20.\n");
                        return false;
                }
-               *timeout += QP_ACK_TIMEOUT_OFFSET;
+               *timeout += HNS_ROCE_V2_QP_ACK_TIMEOUT_OFS_HIP08;
        } else if (hr_dev->pci_dev->revision > PCI_REVISION_ID_HIP08) {
                if (*timeout > QP_ACK_TIMEOUT_MAX) {
                        ibdev_warn(&hr_dev->ib_dev,
@@ -5307,6 +5304,18 @@ out:
        return ret;
 }
 
+static u8 get_qp_timeout_attr(struct hns_roce_dev *hr_dev,
+                             struct hns_roce_v2_qp_context *context)
+{
+       u8 timeout;
+
+       timeout = (u8)hr_reg_read(context, QPC_AT);
+       if (hr_dev->pci_dev->revision == PCI_REVISION_ID_HIP08)
+               timeout -= HNS_ROCE_V2_QP_ACK_TIMEOUT_OFS_HIP08;
+
+       return timeout;
+}
+
 static int hns_roce_v2_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
                                int qp_attr_mask,
                                struct ib_qp_init_attr *qp_init_attr)
@@ -5384,7 +5393,7 @@ static int hns_roce_v2_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
        qp_attr->max_dest_rd_atomic = 1 << hr_reg_read(&context, QPC_RR_MAX);
 
        qp_attr->min_rnr_timer = (u8)hr_reg_read(&context, QPC_MIN_RNR_TIME);
-       qp_attr->timeout = (u8)hr_reg_read(&context, QPC_AT);
+       qp_attr->timeout = get_qp_timeout_attr(hr_dev, &context);
        qp_attr->retry_cnt = hr_reg_read(&context, QPC_RETRY_NUM_INIT);
        qp_attr->rnr_retry = hr_reg_read(&context, QPC_RNR_NUM_INIT);
 
index 1b44d24..7033eae 100644 (file)
@@ -44,6 +44,8 @@
 #define HNS_ROCE_V2_MAX_XRCD_NUM               0x1000000
 #define HNS_ROCE_V2_RSV_XRCD_NUM               0
 
+#define HNS_ROCE_V2_QP_ACK_TIMEOUT_OFS_HIP08    10
+
 #define HNS_ROCE_V3_SCCC_SZ                    64
 #define HNS_ROCE_V3_GMV_ENTRY_SZ               32
 
index 37a5cf6..1437649 100644 (file)
@@ -33,6 +33,7 @@
 
 #include <linux/vmalloc.h>
 #include <rdma/ib_umem.h>
+#include <linux/math.h>
 #include "hns_roce_device.h"
 #include "hns_roce_cmd.h"
 #include "hns_roce_hem.h"
@@ -909,6 +910,44 @@ static int mtr_init_buf_cfg(struct hns_roce_dev *hr_dev,
        return page_cnt;
 }
 
+static u64 cal_pages_per_l1ba(unsigned int ba_per_bt, unsigned int hopnum)
+{
+       return int_pow(ba_per_bt, hopnum - 1);
+}
+
+static unsigned int cal_best_bt_pg_sz(struct hns_roce_dev *hr_dev,
+                                     struct hns_roce_mtr *mtr,
+                                     unsigned int pg_shift)
+{
+       unsigned long cap = hr_dev->caps.page_size_cap;
+       struct hns_roce_buf_region *re;
+       unsigned int pgs_per_l1ba;
+       unsigned int ba_per_bt;
+       unsigned int ba_num;
+       int i;
+
+       for_each_set_bit_from(pg_shift, &cap, sizeof(cap) * BITS_PER_BYTE) {
+               if (!(BIT(pg_shift) & cap))
+                       continue;
+
+               ba_per_bt = BIT(pg_shift) / BA_BYTE_LEN;
+               ba_num = 0;
+               for (i = 0; i < mtr->hem_cfg.region_count; i++) {
+                       re = &mtr->hem_cfg.region[i];
+                       if (re->hopnum == 0)
+                               continue;
+
+                       pgs_per_l1ba = cal_pages_per_l1ba(ba_per_bt, re->hopnum);
+                       ba_num += DIV_ROUND_UP(re->count, pgs_per_l1ba);
+               }
+
+               if (ba_num <= ba_per_bt)
+                       return pg_shift;
+       }
+
+       return 0;
+}
+
 static int mtr_alloc_mtt(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr,
                         unsigned int ba_page_shift)
 {
@@ -917,6 +956,10 @@ static int mtr_alloc_mtt(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr,
 
        hns_roce_hem_list_init(&mtr->hem_list);
        if (!cfg->is_direct) {
+               ba_page_shift = cal_best_bt_pg_sz(hr_dev, mtr, ba_page_shift);
+               if (!ba_page_shift)
+                       return -ERANGE;
+
                ret = hns_roce_hem_list_request(hr_dev, &mtr->hem_list,
                                                cfg->region, cfg->region_count,
                                                ba_page_shift);
index ab5cdf7..eaa12c1 100644 (file)
@@ -522,11 +522,6 @@ static int irdma_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
        if (!iwqp->user_mode)
                cancel_delayed_work_sync(&iwqp->dwork_flush);
 
-       irdma_qp_rem_ref(&iwqp->ibqp);
-       wait_for_completion(&iwqp->free_qp);
-       irdma_free_lsmm_rsrc(iwqp);
-       irdma_cqp_qp_destroy_cmd(&iwdev->rf->sc_dev, &iwqp->sc_qp);
-
        if (!iwqp->user_mode) {
                if (iwqp->iwscq) {
                        irdma_clean_cqes(iwqp, iwqp->iwscq);
@@ -534,6 +529,12 @@ static int irdma_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
                                irdma_clean_cqes(iwqp, iwqp->iwrcq);
                }
        }
+
+       irdma_qp_rem_ref(&iwqp->ibqp);
+       wait_for_completion(&iwqp->free_qp);
+       irdma_free_lsmm_rsrc(iwqp);
+       irdma_cqp_qp_destroy_cmd(&iwdev->rf->sc_dev, &iwqp->sc_qp);
+
        irdma_remove_push_mmap_entries(iwqp);
        irdma_free_qp_rsrc(iwqp);
 
@@ -3291,6 +3292,7 @@ static int irdma_post_send(struct ib_qp *ibqp,
                        break;
                case IB_WR_LOCAL_INV:
                        info.op_type = IRDMA_OP_TYPE_INV_STAG;
+                       info.local_fence = info.read_fence;
                        info.op.inv_local_stag.target_stag = ib_wr->ex.invalidate_rkey;
                        err = irdma_uk_stag_local_invalidate(ukqp, &info, true);
                        break;
index 1c06920..93257fa 100644 (file)
@@ -209,7 +209,8 @@ static const struct mlx5_ib_counters *get_counters(struct mlx5_ib_dev *dev,
             !vport_qcounters_supported(dev)) || !port_num)
                return &dev->port[0].cnts;
 
-       return &dev->port[port_num - 1].cnts;
+       return is_mdev_switchdev_mode(dev->mdev) ?
+              &dev->port[1].cnts : &dev->port[port_num - 1].cnts;
 }
 
 /**
@@ -262,7 +263,7 @@ static struct rdma_hw_stats *
 mlx5_ib_alloc_hw_port_stats(struct ib_device *ibdev, u32 port_num)
 {
        struct mlx5_ib_dev *dev = to_mdev(ibdev);
-       const struct mlx5_ib_counters *cnts = &dev->port[port_num - 1].cnts;
+       const struct mlx5_ib_counters *cnts = get_counters(dev, port_num);
 
        return do_alloc_stats(cnts);
 }
@@ -329,6 +330,7 @@ static int mlx5_ib_query_q_counters_vport(struct mlx5_ib_dev *dev,
 {
        u32 out[MLX5_ST_SZ_DW(query_q_counter_out)] = {};
        u32 in[MLX5_ST_SZ_DW(query_q_counter_in)] = {};
+       struct mlx5_core_dev *mdev;
        __be32 val;
        int ret, i;
 
@@ -336,12 +338,16 @@ static int mlx5_ib_query_q_counters_vport(struct mlx5_ib_dev *dev,
            dev->port[port_num].rep->vport == MLX5_VPORT_UPLINK)
                return 0;
 
+       mdev = mlx5_eswitch_get_core_dev(dev->port[port_num].rep->esw);
+       if (!mdev)
+               return -EOPNOTSUPP;
+
        MLX5_SET(query_q_counter_in, in, opcode, MLX5_CMD_OP_QUERY_Q_COUNTER);
        MLX5_SET(query_q_counter_in, in, other_vport, 1);
        MLX5_SET(query_q_counter_in, in, vport_number,
                 dev->port[port_num].rep->vport);
        MLX5_SET(query_q_counter_in, in, aggregate, 1);
-       ret = mlx5_cmd_exec_inout(dev->mdev, query_q_counter, in, out);
+       ret = mlx5_cmd_exec_inout(mdev, query_q_counter, in, out);
        if (ret)
                return ret;
 
@@ -575,43 +581,53 @@ static void mlx5_ib_fill_counters(struct mlx5_ib_dev *dev,
        bool is_vport = is_mdev_switchdev_mode(dev->mdev) &&
                        port_num != MLX5_VPORT_PF;
        const struct mlx5_ib_counter *names;
-       int j = 0, i;
+       int j = 0, i, size;
 
        names = is_vport ? vport_basic_q_cnts : basic_q_cnts;
-       for (i = 0; i < ARRAY_SIZE(basic_q_cnts); i++, j++) {
+       size = is_vport ? ARRAY_SIZE(vport_basic_q_cnts) :
+                         ARRAY_SIZE(basic_q_cnts);
+       for (i = 0; i < size; i++, j++) {
                descs[j].name = names[i].name;
-               offsets[j] = basic_q_cnts[i].offset;
+               offsets[j] = names[i].offset;
        }
 
        names = is_vport ? vport_out_of_seq_q_cnts : out_of_seq_q_cnts;
+       size = is_vport ? ARRAY_SIZE(vport_out_of_seq_q_cnts) :
+                         ARRAY_SIZE(out_of_seq_q_cnts);
        if (MLX5_CAP_GEN(dev->mdev, out_of_seq_cnt)) {
-               for (i = 0; i < ARRAY_SIZE(out_of_seq_q_cnts); i++, j++) {
+               for (i = 0; i < size; i++, j++) {
                        descs[j].name = names[i].name;
-                       offsets[j] = out_of_seq_q_cnts[i].offset;
+                       offsets[j] = names[i].offset;
                }
        }
 
        names = is_vport ? vport_retrans_q_cnts : retrans_q_cnts;
+       size = is_vport ? ARRAY_SIZE(vport_retrans_q_cnts) :
+                         ARRAY_SIZE(retrans_q_cnts);
        if (MLX5_CAP_GEN(dev->mdev, retransmission_q_counters)) {
-               for (i = 0; i < ARRAY_SIZE(retrans_q_cnts); i++, j++) {
+               for (i = 0; i < size; i++, j++) {
                        descs[j].name = names[i].name;
-                       offsets[j] = retrans_q_cnts[i].offset;
+                       offsets[j] = names[i].offset;
                }
        }
 
        names = is_vport ? vport_extended_err_cnts : extended_err_cnts;
+       size = is_vport ? ARRAY_SIZE(vport_extended_err_cnts) :
+                         ARRAY_SIZE(extended_err_cnts);
        if (MLX5_CAP_GEN(dev->mdev, enhanced_error_q_counters)) {
-               for (i = 0; i < ARRAY_SIZE(extended_err_cnts); i++, j++) {
+               for (i = 0; i < size; i++, j++) {
                        descs[j].name = names[i].name;
-                       offsets[j] = extended_err_cnts[i].offset;
+                       offsets[j] = names[i].offset;
                }
        }
 
        names = is_vport ? vport_roce_accl_cnts : roce_accl_cnts;
+       size = is_vport ? ARRAY_SIZE(vport_roce_accl_cnts) :
+                         ARRAY_SIZE(roce_accl_cnts);
        if (MLX5_CAP_GEN(dev->mdev, roce_accl)) {
-               for (i = 0; i < ARRAY_SIZE(roce_accl_cnts); i++, j++) {
+               for (i = 0; i < size; i++, j++) {
                        descs[j].name = names[i].name;
-                       offsets[j] = roce_accl_cnts[i].offset;
+                       offsets[j] = names[i].offset;
                }
        }
 
@@ -661,25 +677,37 @@ static void mlx5_ib_fill_counters(struct mlx5_ib_dev *dev,
 static int __mlx5_ib_alloc_counters(struct mlx5_ib_dev *dev,
                                    struct mlx5_ib_counters *cnts, u32 port_num)
 {
-       u32 num_counters, num_op_counters = 0;
+       bool is_vport = is_mdev_switchdev_mode(dev->mdev) &&
+                       port_num != MLX5_VPORT_PF;
+       u32 num_counters, num_op_counters = 0, size;
 
-       num_counters = ARRAY_SIZE(basic_q_cnts);
+       size = is_vport ? ARRAY_SIZE(vport_basic_q_cnts) :
+                         ARRAY_SIZE(basic_q_cnts);
+       num_counters = size;
 
+       size = is_vport ? ARRAY_SIZE(vport_out_of_seq_q_cnts) :
+                         ARRAY_SIZE(out_of_seq_q_cnts);
        if (MLX5_CAP_GEN(dev->mdev, out_of_seq_cnt))
-               num_counters += ARRAY_SIZE(out_of_seq_q_cnts);
+               num_counters += size;
 
+       size = is_vport ? ARRAY_SIZE(vport_retrans_q_cnts) :
+                         ARRAY_SIZE(retrans_q_cnts);
        if (MLX5_CAP_GEN(dev->mdev, retransmission_q_counters))
-               num_counters += ARRAY_SIZE(retrans_q_cnts);
+               num_counters += size;
 
+       size = is_vport ? ARRAY_SIZE(vport_extended_err_cnts) :
+                         ARRAY_SIZE(extended_err_cnts);
        if (MLX5_CAP_GEN(dev->mdev, enhanced_error_q_counters))
-               num_counters += ARRAY_SIZE(extended_err_cnts);
+               num_counters += size;
 
+       size = is_vport ? ARRAY_SIZE(vport_roce_accl_cnts) :
+                         ARRAY_SIZE(roce_accl_cnts);
        if (MLX5_CAP_GEN(dev->mdev, roce_accl))
-               num_counters += ARRAY_SIZE(roce_accl_cnts);
+               num_counters += size;
 
        cnts->num_q_counters = num_counters;
 
-       if (is_mdev_switchdev_mode(dev->mdev) && port_num != MLX5_VPORT_PF)
+       if (is_vport)
                goto skip_non_qcounters;
 
        if (MLX5_CAP_GEN(dev->mdev, cc_query_allowed)) {
@@ -725,11 +753,11 @@ err:
 static void mlx5_ib_dealloc_counters(struct mlx5_ib_dev *dev)
 {
        u32 in[MLX5_ST_SZ_DW(dealloc_q_counter_in)] = {};
-       int num_cnt_ports;
+       int num_cnt_ports = dev->num_ports;
        int i, j;
 
-       num_cnt_ports = (!is_mdev_switchdev_mode(dev->mdev) ||
-                        vport_qcounters_supported(dev)) ? dev->num_ports : 1;
+       if (is_mdev_switchdev_mode(dev->mdev))
+               num_cnt_ports = min(2, num_cnt_ports);
 
        MLX5_SET(dealloc_q_counter_in, in, opcode,
                 MLX5_CMD_OP_DEALLOC_Q_COUNTER);
@@ -761,15 +789,22 @@ static int mlx5_ib_alloc_counters(struct mlx5_ib_dev *dev)
 {
        u32 out[MLX5_ST_SZ_DW(alloc_q_counter_out)] = {};
        u32 in[MLX5_ST_SZ_DW(alloc_q_counter_in)] = {};
-       int num_cnt_ports;
+       int num_cnt_ports = dev->num_ports;
        int err = 0;
        int i;
        bool is_shared;
 
        MLX5_SET(alloc_q_counter_in, in, opcode, MLX5_CMD_OP_ALLOC_Q_COUNTER);
        is_shared = MLX5_CAP_GEN(dev->mdev, log_max_uctx) != 0;
-       num_cnt_ports = (!is_mdev_switchdev_mode(dev->mdev) ||
-                        vport_qcounters_supported(dev)) ? dev->num_ports : 1;
+
+       /*
+        * In switchdev we need to allocate two ports, one that is used for
+        * In switchdev mode we allocate Q_counters for two ports: the first
+        * holds this device's own (real) Q_counters, while the second serves
+        * as a helper that lets the PF query the Q_counters of all other
+        * vports.
+       if (is_mdev_switchdev_mode(dev->mdev))
+               num_cnt_ports = min(2, num_cnt_ports);
 
        for (i = 0; i < num_cnt_ports; i++) {
                err = __mlx5_ib_alloc_counters(dev, &dev->port[i].cnts, i);
index 3008632..1e419e0 100644 (file)
@@ -695,8 +695,6 @@ static struct mlx5_ib_flow_prio *_get_prio(struct mlx5_ib_dev *dev,
        struct mlx5_flow_table_attr ft_attr = {};
        struct mlx5_flow_table *ft;
 
-       if (mlx5_ib_shared_ft_allowed(&dev->ib_dev))
-               ft_attr.uid = MLX5_SHARED_RESOURCE_UID;
        ft_attr.prio = priority;
        ft_attr.max_fte = num_entries;
        ft_attr.flags = flags;
@@ -2025,6 +2023,237 @@ static int flow_matcher_cleanup(struct ib_uobject *uobject,
        return 0;
 }
 
+static int steering_anchor_create_ft(struct mlx5_ib_dev *dev,
+                                    struct mlx5_ib_flow_prio *ft_prio,
+                                    enum mlx5_flow_namespace_type ns_type)
+{
+       struct mlx5_flow_table_attr ft_attr = {};
+       struct mlx5_flow_namespace *ns;
+       struct mlx5_flow_table *ft;
+
+       if (ft_prio->anchor.ft)
+               return 0;
+
+       ns = mlx5_get_flow_namespace(dev->mdev, ns_type);
+       if (!ns)
+               return -EOPNOTSUPP;
+
+       ft_attr.flags = MLX5_FLOW_TABLE_UNMANAGED;
+       ft_attr.uid = MLX5_SHARED_RESOURCE_UID;
+       ft_attr.prio = 0;
+       ft_attr.max_fte = 2;
+       ft_attr.level = 1;
+
+       ft = mlx5_create_flow_table(ns, &ft_attr);
+       if (IS_ERR(ft))
+               return PTR_ERR(ft);
+
+       ft_prio->anchor.ft = ft;
+
+       return 0;
+}
+
+static void steering_anchor_destroy_ft(struct mlx5_ib_flow_prio *ft_prio)
+{
+       if (ft_prio->anchor.ft) {
+               mlx5_destroy_flow_table(ft_prio->anchor.ft);
+               ft_prio->anchor.ft = NULL;
+       }
+}
+
+static int
+steering_anchor_create_fg_drop(struct mlx5_ib_flow_prio *ft_prio)
+{
+       int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
+       struct mlx5_flow_group *fg;
+       void *flow_group_in;
+       int err = 0;
+
+       if (ft_prio->anchor.fg_drop)
+               return 0;
+
+       flow_group_in = kvzalloc(inlen, GFP_KERNEL);
+       if (!flow_group_in)
+               return -ENOMEM;
+
+       MLX5_SET(create_flow_group_in, flow_group_in, start_flow_index, 1);
+       MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index, 1);
+
+       fg = mlx5_create_flow_group(ft_prio->anchor.ft, flow_group_in);
+       if (IS_ERR(fg)) {
+               err = PTR_ERR(fg);
+               goto out;
+       }
+
+       ft_prio->anchor.fg_drop = fg;
+
+out:
+       kvfree(flow_group_in);
+
+       return err;
+}
+
+static void
+steering_anchor_destroy_fg_drop(struct mlx5_ib_flow_prio *ft_prio)
+{
+       if (ft_prio->anchor.fg_drop) {
+               mlx5_destroy_flow_group(ft_prio->anchor.fg_drop);
+               ft_prio->anchor.fg_drop = NULL;
+       }
+}
+
+static int
+steering_anchor_create_fg_goto_table(struct mlx5_ib_flow_prio *ft_prio)
+{
+       int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
+       struct mlx5_flow_group *fg;
+       void *flow_group_in;
+       int err = 0;
+
+       if (ft_prio->anchor.fg_goto_table)
+               return 0;
+
+       flow_group_in = kvzalloc(inlen, GFP_KERNEL);
+       if (!flow_group_in)
+               return -ENOMEM;
+
+       fg = mlx5_create_flow_group(ft_prio->anchor.ft, flow_group_in);
+       if (IS_ERR(fg)) {
+               err = PTR_ERR(fg);
+               goto out;
+       }
+       ft_prio->anchor.fg_goto_table = fg;
+
+out:
+       kvfree(flow_group_in);
+
+       return err;
+}
+
+static void
+steering_anchor_destroy_fg_goto_table(struct mlx5_ib_flow_prio *ft_prio)
+{
+       if (ft_prio->anchor.fg_goto_table) {
+               mlx5_destroy_flow_group(ft_prio->anchor.fg_goto_table);
+               ft_prio->anchor.fg_goto_table = NULL;
+       }
+}
+
+static int
+steering_anchor_create_rule_drop(struct mlx5_ib_flow_prio *ft_prio)
+{
+       struct mlx5_flow_act flow_act = {};
+       struct mlx5_flow_handle *handle;
+
+       if (ft_prio->anchor.rule_drop)
+               return 0;
+
+       flow_act.fg = ft_prio->anchor.fg_drop;
+       flow_act.action = MLX5_FLOW_CONTEXT_ACTION_DROP;
+
+       handle = mlx5_add_flow_rules(ft_prio->anchor.ft, NULL, &flow_act,
+                                    NULL, 0);
+       if (IS_ERR(handle))
+               return PTR_ERR(handle);
+
+       ft_prio->anchor.rule_drop = handle;
+
+       return 0;
+}
+
+static void steering_anchor_destroy_rule_drop(struct mlx5_ib_flow_prio *ft_prio)
+{
+       if (ft_prio->anchor.rule_drop) {
+               mlx5_del_flow_rules(ft_prio->anchor.rule_drop);
+               ft_prio->anchor.rule_drop = NULL;
+       }
+}
+
+static int
+steering_anchor_create_rule_goto_table(struct mlx5_ib_flow_prio *ft_prio)
+{
+       struct mlx5_flow_destination dest = {};
+       struct mlx5_flow_act flow_act = {};
+       struct mlx5_flow_handle *handle;
+
+       if (ft_prio->anchor.rule_goto_table)
+               return 0;
+
+       flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+       flow_act.flags |= FLOW_ACT_IGNORE_FLOW_LEVEL;
+       flow_act.fg = ft_prio->anchor.fg_goto_table;
+
+       dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
+       dest.ft = ft_prio->flow_table;
+
+       handle = mlx5_add_flow_rules(ft_prio->anchor.ft, NULL, &flow_act,
+                                    &dest, 1);
+       if (IS_ERR(handle))
+               return PTR_ERR(handle);
+
+       ft_prio->anchor.rule_goto_table = handle;
+
+       return 0;
+}
+
+static void
+steering_anchor_destroy_rule_goto_table(struct mlx5_ib_flow_prio *ft_prio)
+{
+       if (ft_prio->anchor.rule_goto_table) {
+               mlx5_del_flow_rules(ft_prio->anchor.rule_goto_table);
+               ft_prio->anchor.rule_goto_table = NULL;
+       }
+}
+
+static int steering_anchor_create_res(struct mlx5_ib_dev *dev,
+                                     struct mlx5_ib_flow_prio *ft_prio,
+                                     enum mlx5_flow_namespace_type ns_type)
+{
+       int err;
+
+       err = steering_anchor_create_ft(dev, ft_prio, ns_type);
+       if (err)
+               return err;
+
+       err = steering_anchor_create_fg_drop(ft_prio);
+       if (err)
+               goto destroy_ft;
+
+       err = steering_anchor_create_fg_goto_table(ft_prio);
+       if (err)
+               goto destroy_fg_drop;
+
+       err = steering_anchor_create_rule_drop(ft_prio);
+       if (err)
+               goto destroy_fg_goto_table;
+
+       err = steering_anchor_create_rule_goto_table(ft_prio);
+       if (err)
+               goto destroy_rule_drop;
+
+       return 0;
+
+destroy_rule_drop:
+       steering_anchor_destroy_rule_drop(ft_prio);
+destroy_fg_goto_table:
+       steering_anchor_destroy_fg_goto_table(ft_prio);
+destroy_fg_drop:
+       steering_anchor_destroy_fg_drop(ft_prio);
+destroy_ft:
+       steering_anchor_destroy_ft(ft_prio);
+
+       return err;
+}
+
+static void mlx5_steering_anchor_destroy_res(struct mlx5_ib_flow_prio *ft_prio)
+{
+       steering_anchor_destroy_rule_goto_table(ft_prio);
+       steering_anchor_destroy_rule_drop(ft_prio);
+       steering_anchor_destroy_fg_goto_table(ft_prio);
+       steering_anchor_destroy_fg_drop(ft_prio);
+       steering_anchor_destroy_ft(ft_prio);
+}
+
 static int steering_anchor_cleanup(struct ib_uobject *uobject,
                                   enum rdma_remove_reason why,
                                   struct uverbs_attr_bundle *attrs)
@@ -2035,6 +2264,9 @@ static int steering_anchor_cleanup(struct ib_uobject *uobject,
                return -EBUSY;
 
        mutex_lock(&obj->dev->flow_db->lock);
+       if (!--obj->ft_prio->anchor.rule_goto_table_ref)
+               steering_anchor_destroy_rule_goto_table(obj->ft_prio);
+
        put_flow_table(obj->dev, obj->ft_prio, true);
        mutex_unlock(&obj->dev->flow_db->lock);
 
@@ -2042,6 +2274,24 @@ static int steering_anchor_cleanup(struct ib_uobject *uobject,
        return 0;
 }
 
+static void fs_cleanup_anchor(struct mlx5_ib_flow_prio *prio,
+                             int count)
+{
+       while (count--)
+               mlx5_steering_anchor_destroy_res(&prio[count]);
+}
+
+void mlx5_ib_fs_cleanup_anchor(struct mlx5_ib_dev *dev)
+{
+       fs_cleanup_anchor(dev->flow_db->prios, MLX5_IB_NUM_FLOW_FT);
+       fs_cleanup_anchor(dev->flow_db->egress_prios, MLX5_IB_NUM_FLOW_FT);
+       fs_cleanup_anchor(dev->flow_db->sniffer, MLX5_IB_NUM_SNIFFER_FTS);
+       fs_cleanup_anchor(dev->flow_db->egress, MLX5_IB_NUM_EGRESS_FTS);
+       fs_cleanup_anchor(dev->flow_db->fdb, MLX5_IB_NUM_FDB_FTS);
+       fs_cleanup_anchor(dev->flow_db->rdma_rx, MLX5_IB_NUM_FLOW_FT);
+       fs_cleanup_anchor(dev->flow_db->rdma_tx, MLX5_IB_NUM_FLOW_FT);
+}
+
 static int mlx5_ib_matcher_ns(struct uverbs_attr_bundle *attrs,
                              struct mlx5_ib_flow_matcher *obj)
 {
@@ -2182,21 +2432,31 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_STEERING_ANCHOR_CREATE)(
                return -ENOMEM;
 
        mutex_lock(&dev->flow_db->lock);
+
        ft_prio = _get_flow_table(dev, priority, ns_type, 0);
        if (IS_ERR(ft_prio)) {
-               mutex_unlock(&dev->flow_db->lock);
                err = PTR_ERR(ft_prio);
                goto free_obj;
        }
 
        ft_prio->refcount++;
-       ft_id = mlx5_flow_table_id(ft_prio->flow_table);
-       mutex_unlock(&dev->flow_db->lock);
+
+       if (!ft_prio->anchor.rule_goto_table_ref) {
+               err = steering_anchor_create_res(dev, ft_prio, ns_type);
+               if (err)
+                       goto put_flow_table;
+       }
+
+       ft_prio->anchor.rule_goto_table_ref++;
+
+       ft_id = mlx5_flow_table_id(ft_prio->anchor.ft);
 
        err = uverbs_copy_to(attrs, MLX5_IB_ATTR_STEERING_ANCHOR_FT_ID,
                             &ft_id, sizeof(ft_id));
        if (err)
-               goto put_flow_table;
+               goto destroy_res;
+
+       mutex_unlock(&dev->flow_db->lock);
 
        uobj->object = obj;
        obj->dev = dev;
@@ -2205,8 +2465,10 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_STEERING_ANCHOR_CREATE)(
 
        return 0;
 
+destroy_res:
+       --ft_prio->anchor.rule_goto_table_ref;
+       mlx5_steering_anchor_destroy_res(ft_prio);
 put_flow_table:
-       mutex_lock(&dev->flow_db->lock);
        put_flow_table(dev, ft_prio, true);
        mutex_unlock(&dev->flow_db->lock);
 free_obj:
index ad320ad..b973490 100644 (file)
@@ -10,6 +10,7 @@
 
 #if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)
 int mlx5_ib_fs_init(struct mlx5_ib_dev *dev);
+void mlx5_ib_fs_cleanup_anchor(struct mlx5_ib_dev *dev);
 #else
 static inline int mlx5_ib_fs_init(struct mlx5_ib_dev *dev)
 {
@@ -21,9 +22,24 @@ static inline int mlx5_ib_fs_init(struct mlx5_ib_dev *dev)
        mutex_init(&dev->flow_db->lock);
        return 0;
 }
+
+static inline void mlx5_ib_fs_cleanup_anchor(struct mlx5_ib_dev *dev) {}
 #endif
+
 static inline void mlx5_ib_fs_cleanup(struct mlx5_ib_dev *dev)
 {
+       /* When a steering anchor is created, a special flow table is also
+        * created for the user to reference. Since the user can reference it,
+        * the kernel cannot trust that when the user destroys the steering
+        * anchor, they no longer reference the flow table.
+        *
+        * To address this issue, when a user destroys a steering anchor, only
+        * the flow steering rule in the table is destroyed, but the table
+        * itself is kept to deal with the above scenario. The remaining
+        * resources are only removed when the RDMA device is destroyed, at
+        * which point it is safe to assume that all references are gone.
+        */
+       mlx5_ib_fs_cleanup_anchor(dev);
        kfree(dev->flow_db);
 }
 #endif /* _MLX5_IB_FS_H */
index 5d45de2..f0b394e 100644 (file)
@@ -4275,6 +4275,9 @@ const struct mlx5_ib_profile raw_eth_profile = {
        STAGE_CREATE(MLX5_IB_STAGE_POST_IB_REG_UMR,
                     mlx5_ib_stage_post_ib_reg_umr_init,
                     NULL),
+       STAGE_CREATE(MLX5_IB_STAGE_DELAY_DROP,
+                    mlx5_ib_stage_delay_drop_init,
+                    mlx5_ib_stage_delay_drop_cleanup),
        STAGE_CREATE(MLX5_IB_STAGE_RESTRACK,
                     mlx5_ib_restrack_init,
                     NULL),
index efa4dc6..2dfa6f4 100644 (file)
@@ -237,8 +237,19 @@ enum {
 #define MLX5_IB_NUM_SNIFFER_FTS                2
 #define MLX5_IB_NUM_EGRESS_FTS         1
 #define MLX5_IB_NUM_FDB_FTS            MLX5_BY_PASS_NUM_REGULAR_PRIOS
+
+struct mlx5_ib_anchor {
+       struct mlx5_flow_table *ft;
+       struct mlx5_flow_group *fg_goto_table;
+       struct mlx5_flow_group *fg_drop;
+       struct mlx5_flow_handle *rule_goto_table;
+       struct mlx5_flow_handle *rule_drop;
+       unsigned int rule_goto_table_ref;
+};
+
 struct mlx5_ib_flow_prio {
        struct mlx5_flow_table          *flow_table;
+       struct mlx5_ib_anchor           anchor;
        unsigned int                    refcount;
 };
 
@@ -1587,6 +1598,9 @@ static inline bool mlx5_ib_lag_should_assign_affinity(struct mlx5_ib_dev *dev)
            MLX5_CAP_PORT_SELECTION(dev->mdev, port_select_flow_table_bypass))
                return 0;
 
+       if (mlx5_lag_is_lacp_owner(dev->mdev) && !dev->lag_active)
+               return 0;
+
        return dev->lag_active ||
                (MLX5_CAP_GEN(dev->mdev, num_lag_ports) > 1 &&
                 MLX5_CAP_GEN(dev->mdev, lag_tx_port_affinity));
index 70ca8ff..78b96bf 100644 (file)
@@ -1237,6 +1237,9 @@ static int create_raw_packet_qp_tis(struct mlx5_ib_dev *dev,
 
        MLX5_SET(create_tis_in, in, uid, to_mpd(pd)->uid);
        MLX5_SET(tisc, tisc, transport_domain, tdn);
+       if (!mlx5_ib_lag_should_assign_affinity(dev) &&
+           mlx5_lag_is_lacp_owner(dev->mdev))
+               MLX5_SET(tisc, tisc, strict_lag_tx_port_affinity, 1);
        if (qp->flags & IB_QP_CREATE_SOURCE_QPN)
                MLX5_SET(tisc, tisc, underlay_qpn, qp->underlay_qpn);
 
index db18ace..f46c5a5 100644 (file)
@@ -115,15 +115,16 @@ static enum ib_wc_opcode wr_to_wc_opcode(enum ib_wr_opcode opcode)
 void retransmit_timer(struct timer_list *t)
 {
        struct rxe_qp *qp = from_timer(qp, t, retrans_timer);
+       unsigned long flags;
 
        rxe_dbg_qp(qp, "retransmit timer fired\n");
 
-       spin_lock_bh(&qp->state_lock);
+       spin_lock_irqsave(&qp->state_lock, flags);
        if (qp->valid) {
                qp->comp.timeout = 1;
                rxe_sched_task(&qp->comp.task);
        }
-       spin_unlock_bh(&qp->state_lock);
+       spin_unlock_irqrestore(&qp->state_lock, flags);
 }
 
 void rxe_comp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
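The same bh-to-irqsave conversion repeats throughout the rxe hunks that follow; a generic sketch of the locking variant (illustrative names, not rxe symbols): unlike spin_lock_bh(), which only disables softirqs, the irqsave form saves and restores the caller's interrupt state, so the lock is safe to take from any context, including hard-irq.

    /* Sketch only: my_state_lock/my_update_state() are illustrative names. */
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(my_state_lock);

    static void my_update_state(int *state, int new_state)
    {
    	unsigned long flags;

    	/* The caller's interrupt state is recorded in 'flags' and restored
    	 * on unlock, so this is correct from process, softirq, or hard-irq
    	 * context.
    	 */
    	spin_lock_irqsave(&my_state_lock, flags);
    	*state = new_state;
    	spin_unlock_irqrestore(&my_state_lock, flags);
    }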
@@ -481,11 +482,13 @@ static void do_complete(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
 
 static void comp_check_sq_drain_done(struct rxe_qp *qp)
 {
-       spin_lock_bh(&qp->state_lock);
+       unsigned long flags;
+
+       spin_lock_irqsave(&qp->state_lock, flags);
        if (unlikely(qp_state(qp) == IB_QPS_SQD)) {
                if (qp->attr.sq_draining && qp->comp.psn == qp->req.psn) {
                        qp->attr.sq_draining = 0;
-                       spin_unlock_bh(&qp->state_lock);
+                       spin_unlock_irqrestore(&qp->state_lock, flags);
 
                        if (qp->ibqp.event_handler) {
                                struct ib_event ev;
@@ -499,7 +502,7 @@ static void comp_check_sq_drain_done(struct rxe_qp *qp)
                        return;
                }
        }
-       spin_unlock_bh(&qp->state_lock);
+       spin_unlock_irqrestore(&qp->state_lock, flags);
 }
 
 static inline enum comp_state complete_ack(struct rxe_qp *qp,
@@ -625,13 +628,15 @@ static void free_pkt(struct rxe_pkt_info *pkt)
  */
 static void reset_retry_timer(struct rxe_qp *qp)
 {
+       unsigned long flags;
+
        if (qp_type(qp) == IB_QPT_RC && qp->qp_timeout_jiffies) {
-               spin_lock_bh(&qp->state_lock);
+               spin_lock_irqsave(&qp->state_lock, flags);
                if (qp_state(qp) >= IB_QPS_RTS &&
                    psn_compare(qp->req.psn, qp->comp.psn) > 0)
                        mod_timer(&qp->retrans_timer,
                                  jiffies + qp->qp_timeout_jiffies);
-               spin_unlock_bh(&qp->state_lock);
+               spin_unlock_irqrestore(&qp->state_lock, flags);
        }
 }
 
@@ -643,18 +648,19 @@ int rxe_completer(struct rxe_qp *qp)
        struct rxe_pkt_info *pkt = NULL;
        enum comp_state state;
        int ret;
+       unsigned long flags;
 
-       spin_lock_bh(&qp->state_lock);
+       spin_lock_irqsave(&qp->state_lock, flags);
        if (!qp->valid || qp_state(qp) == IB_QPS_ERR ||
                          qp_state(qp) == IB_QPS_RESET) {
                bool notify = qp->valid && (qp_state(qp) == IB_QPS_ERR);
 
                drain_resp_pkts(qp);
                flush_send_queue(qp, notify);
-               spin_unlock_bh(&qp->state_lock);
+               spin_unlock_irqrestore(&qp->state_lock, flags);
                goto exit;
        }
-       spin_unlock_bh(&qp->state_lock);
+       spin_unlock_irqrestore(&qp->state_lock, flags);
 
        if (qp->comp.timeout) {
                qp->comp.timeout_retry = 1;
index 20ff0c0..6ca2a05 100644 (file)
@@ -113,8 +113,6 @@ int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
 
        queue_advance_producer(cq->queue, QUEUE_TYPE_TO_CLIENT);
 
-       spin_unlock_irqrestore(&cq->cq_lock, flags);
-
        if ((cq->notify == IB_CQ_NEXT_COMP) ||
            (cq->notify == IB_CQ_SOLICITED && solicited)) {
                cq->notify = 0;
@@ -122,6 +120,8 @@ int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
                cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context);
        }
 
+       spin_unlock_irqrestore(&cq->cq_lock, flags);
+
        return 0;
 }
 
index 2bc7361..cd59666 100644 (file)
@@ -159,6 +159,9 @@ static int rxe_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
        pkt->mask = RXE_GRH_MASK;
        pkt->paylen = be16_to_cpu(udph->len) - sizeof(*udph);
 
+       /* remove udp header */
+       skb_pull(skb, sizeof(struct udphdr));
+
        rxe_rcv(skb);
 
        return 0;
@@ -401,6 +404,9 @@ static int rxe_loopback(struct sk_buff *skb, struct rxe_pkt_info *pkt)
                return -EIO;
        }
 
+       /* remove udp header */
+       skb_pull(skb, sizeof(struct udphdr));
+
        rxe_rcv(skb);
 
        return 0;
@@ -412,15 +418,16 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
        int err;
        int is_request = pkt->mask & RXE_REQ_MASK;
        struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+       unsigned long flags;
 
-       spin_lock_bh(&qp->state_lock);
+       spin_lock_irqsave(&qp->state_lock, flags);
        if ((is_request && (qp_state(qp) < IB_QPS_RTS)) ||
            (!is_request && (qp_state(qp) < IB_QPS_RTR))) {
-               spin_unlock_bh(&qp->state_lock);
+               spin_unlock_irqrestore(&qp->state_lock, flags);
                rxe_dbg_qp(qp, "Packet dropped. QP is not in ready state\n");
                goto drop;
        }
-       spin_unlock_bh(&qp->state_lock);
+       spin_unlock_irqrestore(&qp->state_lock, flags);
 
        rxe_icrc_generate(skb, pkt);
 
index c5451a4..a0f2064 100644 (file)
@@ -176,6 +176,9 @@ static void rxe_qp_init_misc(struct rxe_dev *rxe, struct rxe_qp *qp,
        spin_lock_init(&qp->rq.producer_lock);
        spin_lock_init(&qp->rq.consumer_lock);
 
+       skb_queue_head_init(&qp->req_pkts);
+       skb_queue_head_init(&qp->resp_pkts);
+
        atomic_set(&qp->ssn, 0);
        atomic_set(&qp->skb_out, 0);
 }
@@ -234,8 +237,6 @@ static int rxe_qp_init_req(struct rxe_dev *rxe, struct rxe_qp *qp,
        qp->req.opcode          = -1;
        qp->comp.opcode         = -1;
 
-       skb_queue_head_init(&qp->req_pkts);
-
        rxe_init_task(&qp->req.task, qp, rxe_requester);
        rxe_init_task(&qp->comp.task, qp, rxe_completer);
 
@@ -279,8 +280,6 @@ static int rxe_qp_init_resp(struct rxe_dev *rxe, struct rxe_qp *qp,
                }
        }
 
-       skb_queue_head_init(&qp->resp_pkts);
-
        rxe_init_task(&qp->resp.task, qp, rxe_responder);
 
        qp->resp.opcode         = OPCODE_NONE;
@@ -300,6 +299,7 @@ int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp, struct rxe_pd *pd,
        struct rxe_cq *rcq = to_rcq(init->recv_cq);
        struct rxe_cq *scq = to_rcq(init->send_cq);
        struct rxe_srq *srq = init->srq ? to_rsrq(init->srq) : NULL;
+       unsigned long flags;
 
        rxe_get(pd);
        rxe_get(rcq);
@@ -325,10 +325,10 @@ int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp, struct rxe_pd *pd,
        if (err)
                goto err2;
 
-       spin_lock_bh(&qp->state_lock);
+       spin_lock_irqsave(&qp->state_lock, flags);
        qp->attr.qp_state = IB_QPS_RESET;
        qp->valid = 1;
-       spin_unlock_bh(&qp->state_lock);
+       spin_unlock_irqrestore(&qp->state_lock, flags);
 
        return 0;
 
@@ -492,24 +492,28 @@ static void rxe_qp_reset(struct rxe_qp *qp)
 /* move the qp to the error state */
 void rxe_qp_error(struct rxe_qp *qp)
 {
-       spin_lock_bh(&qp->state_lock);
+       unsigned long flags;
+
+       spin_lock_irqsave(&qp->state_lock, flags);
        qp->attr.qp_state = IB_QPS_ERR;
 
        /* drain work and packet queues */
        rxe_sched_task(&qp->resp.task);
        rxe_sched_task(&qp->comp.task);
        rxe_sched_task(&qp->req.task);
-       spin_unlock_bh(&qp->state_lock);
+       spin_unlock_irqrestore(&qp->state_lock, flags);
 }
 
 static void rxe_qp_sqd(struct rxe_qp *qp, struct ib_qp_attr *attr,
                       int mask)
 {
-       spin_lock_bh(&qp->state_lock);
+       unsigned long flags;
+
+       spin_lock_irqsave(&qp->state_lock, flags);
        qp->attr.sq_draining = 1;
        rxe_sched_task(&qp->comp.task);
        rxe_sched_task(&qp->req.task);
-       spin_unlock_bh(&qp->state_lock);
+       spin_unlock_irqrestore(&qp->state_lock, flags);
 }
 
 /* caller should hold qp->state_lock */
@@ -555,14 +559,16 @@ int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask,
                qp->attr.cur_qp_state = attr->qp_state;
 
        if (mask & IB_QP_STATE) {
-               spin_lock_bh(&qp->state_lock);
+               unsigned long flags;
+
+               spin_lock_irqsave(&qp->state_lock, flags);
                err = __qp_chk_state(qp, attr, mask);
                if (!err) {
                        qp->attr.qp_state = attr->qp_state;
                        rxe_dbg_qp(qp, "state -> %s\n",
                                        qps2str[attr->qp_state]);
                }
-               spin_unlock_bh(&qp->state_lock);
+               spin_unlock_irqrestore(&qp->state_lock, flags);
 
                if (err)
                        return err;
@@ -688,6 +694,8 @@ int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask,
 /* called by the query qp verb */
 int rxe_qp_to_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask)
 {
+       unsigned long flags;
+
        *attr = qp->attr;
 
        attr->rq_psn                            = qp->resp.psn;
@@ -708,12 +716,13 @@ int rxe_qp_to_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask)
        /* Applications that get this state typically spin on it.
         * Yield the processor
         */
-       spin_lock_bh(&qp->state_lock);
+       spin_lock_irqsave(&qp->state_lock, flags);
        if (qp->attr.sq_draining) {
-               spin_unlock_bh(&qp->state_lock);
+               spin_unlock_irqrestore(&qp->state_lock, flags);
                cond_resched();
+       } else {
+               spin_unlock_irqrestore(&qp->state_lock, flags);
        }
-       spin_unlock_bh(&qp->state_lock);
 
        return 0;
 }
@@ -736,10 +745,11 @@ int rxe_qp_chk_destroy(struct rxe_qp *qp)
 static void rxe_qp_do_cleanup(struct work_struct *work)
 {
        struct rxe_qp *qp = container_of(work, typeof(*qp), cleanup_work.work);
+       unsigned long flags;
 
-       spin_lock_bh(&qp->state_lock);
+       spin_lock_irqsave(&qp->state_lock, flags);
        qp->valid = 0;
-       spin_unlock_bh(&qp->state_lock);
+       spin_unlock_irqrestore(&qp->state_lock, flags);
        qp->qp_timeout_jiffies = 0;
 
        if (qp_type(qp) == IB_QPT_RC) {
index 2f953cc..5861e42 100644 (file)
@@ -14,6 +14,7 @@ static int check_type_state(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
                            struct rxe_qp *qp)
 {
        unsigned int pkt_type;
+       unsigned long flags;
 
        if (unlikely(!qp->valid))
                return -EINVAL;
@@ -38,19 +39,19 @@ static int check_type_state(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
                return -EINVAL;
        }
 
-       spin_lock_bh(&qp->state_lock);
+       spin_lock_irqsave(&qp->state_lock, flags);
        if (pkt->mask & RXE_REQ_MASK) {
                if (unlikely(qp_state(qp) < IB_QPS_RTR)) {
-                       spin_unlock_bh(&qp->state_lock);
+                       spin_unlock_irqrestore(&qp->state_lock, flags);
                        return -EINVAL;
                }
        } else {
                if (unlikely(qp_state(qp) < IB_QPS_RTS)) {
-                       spin_unlock_bh(&qp->state_lock);
+                       spin_unlock_irqrestore(&qp->state_lock, flags);
                        return -EINVAL;
                }
        }
-       spin_unlock_bh(&qp->state_lock);
+       spin_unlock_irqrestore(&qp->state_lock, flags);
 
        return 0;
 }
index 65134a9..5fe7cba 100644 (file)
@@ -99,17 +99,18 @@ static void req_retry(struct rxe_qp *qp)
 void rnr_nak_timer(struct timer_list *t)
 {
        struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
+       unsigned long flags;
 
        rxe_dbg_qp(qp, "nak timer fired\n");
 
-       spin_lock_bh(&qp->state_lock);
+       spin_lock_irqsave(&qp->state_lock, flags);
        if (qp->valid) {
                /* request a send queue retry */
                qp->req.need_retry = 1;
                qp->req.wait_for_rnr_timer = 0;
                rxe_sched_task(&qp->req.task);
        }
-       spin_unlock_bh(&qp->state_lock);
+       spin_unlock_irqrestore(&qp->state_lock, flags);
 }
 
 static void req_check_sq_drain_done(struct rxe_qp *qp)
@@ -118,8 +119,9 @@ static void req_check_sq_drain_done(struct rxe_qp *qp)
        unsigned int index;
        unsigned int cons;
        struct rxe_send_wqe *wqe;
+       unsigned long flags;
 
-       spin_lock_bh(&qp->state_lock);
+       spin_lock_irqsave(&qp->state_lock, flags);
        if (qp_state(qp) == IB_QPS_SQD) {
                q = qp->sq.queue;
                index = qp->req.wqe_index;
@@ -140,7 +142,7 @@ static void req_check_sq_drain_done(struct rxe_qp *qp)
                                break;
 
                        qp->attr.sq_draining = 0;
-                       spin_unlock_bh(&qp->state_lock);
+                       spin_unlock_irqrestore(&qp->state_lock, flags);
 
                        if (qp->ibqp.event_handler) {
                                struct ib_event ev;
@@ -154,7 +156,7 @@ static void req_check_sq_drain_done(struct rxe_qp *qp)
                        return;
                } while (0);
        }
-       spin_unlock_bh(&qp->state_lock);
+       spin_unlock_irqrestore(&qp->state_lock, flags);
 }
 
 static struct rxe_send_wqe *__req_next_wqe(struct rxe_qp *qp)
@@ -173,6 +175,7 @@ static struct rxe_send_wqe *__req_next_wqe(struct rxe_qp *qp)
 static struct rxe_send_wqe *req_next_wqe(struct rxe_qp *qp)
 {
        struct rxe_send_wqe *wqe;
+       unsigned long flags;
 
        req_check_sq_drain_done(qp);
 
@@ -180,13 +183,13 @@ static struct rxe_send_wqe *req_next_wqe(struct rxe_qp *qp)
        if (wqe == NULL)
                return NULL;
 
-       spin_lock_bh(&qp->state_lock);
+       spin_lock_irqsave(&qp->state_lock, flags);
        if (unlikely((qp_state(qp) == IB_QPS_SQD) &&
                     (wqe->state != wqe_state_processing))) {
-               spin_unlock_bh(&qp->state_lock);
+               spin_unlock_irqrestore(&qp->state_lock, flags);
                return NULL;
        }
-       spin_unlock_bh(&qp->state_lock);
+       spin_unlock_irqrestore(&qp->state_lock, flags);
 
        wqe->mask = wr_opcode_mask(wqe->wr.opcode, qp);
        return wqe;
@@ -676,16 +679,17 @@ int rxe_requester(struct rxe_qp *qp)
        struct rxe_queue *q = qp->sq.queue;
        struct rxe_ah *ah;
        struct rxe_av *av;
+       unsigned long flags;
 
-       spin_lock_bh(&qp->state_lock);
+       spin_lock_irqsave(&qp->state_lock, flags);
        if (unlikely(!qp->valid)) {
-               spin_unlock_bh(&qp->state_lock);
+               spin_unlock_irqrestore(&qp->state_lock, flags);
                goto exit;
        }
 
        if (unlikely(qp_state(qp) == IB_QPS_ERR)) {
                wqe = __req_next_wqe(qp);
-               spin_unlock_bh(&qp->state_lock);
+               spin_unlock_irqrestore(&qp->state_lock, flags);
                if (wqe)
                        goto err;
                else
@@ -700,10 +704,10 @@ int rxe_requester(struct rxe_qp *qp)
                qp->req.wait_psn = 0;
                qp->req.need_retry = 0;
                qp->req.wait_for_rnr_timer = 0;
-               spin_unlock_bh(&qp->state_lock);
+               spin_unlock_irqrestore(&qp->state_lock, flags);
                goto exit;
        }
-       spin_unlock_bh(&qp->state_lock);
+       spin_unlock_irqrestore(&qp->state_lock, flags);
 
        /* we come here if the retransmit timer has fired
         * or if the rnr timer has fired. If the retransmit
index 68f6cd1..ee68306 100644 (file)
@@ -489,8 +489,9 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
                if (mw->access & IB_ZERO_BASED)
                        qp->resp.offset = mw->addr;
 
-               rxe_put(mw);
                rxe_get(mr);
+               rxe_put(mw);
+               mw = NULL;
        } else {
                mr = lookup_mr(qp->pd, access, rkey, RXE_LOOKUP_REMOTE);
                if (!mr) {
@@ -1047,6 +1048,7 @@ static enum resp_states do_complete(struct rxe_qp *qp,
        struct ib_uverbs_wc *uwc = &cqe.uibwc;
        struct rxe_recv_wqe *wqe = qp->resp.wqe;
        struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+       unsigned long flags;
 
        if (!wqe)
                goto finish;
@@ -1137,12 +1139,12 @@ static enum resp_states do_complete(struct rxe_qp *qp,
                return RESPST_ERR_CQ_OVERFLOW;
 
 finish:
-       spin_lock_bh(&qp->state_lock);
+       spin_lock_irqsave(&qp->state_lock, flags);
        if (unlikely(qp_state(qp) == IB_QPS_ERR)) {
-               spin_unlock_bh(&qp->state_lock);
+               spin_unlock_irqrestore(&qp->state_lock, flags);
                return RESPST_CHK_RESOURCE;
        }
-       spin_unlock_bh(&qp->state_lock);
+       spin_unlock_irqrestore(&qp->state_lock, flags);
 
        if (unlikely(!pkt))
                return RESPST_DONE;
@@ -1468,18 +1470,19 @@ int rxe_responder(struct rxe_qp *qp)
        enum resp_states state;
        struct rxe_pkt_info *pkt = NULL;
        int ret;
+       unsigned long flags;
 
-       spin_lock_bh(&qp->state_lock);
+       spin_lock_irqsave(&qp->state_lock, flags);
        if (!qp->valid || qp_state(qp) == IB_QPS_ERR ||
                          qp_state(qp) == IB_QPS_RESET) {
                bool notify = qp->valid && (qp_state(qp) == IB_QPS_ERR);
 
                drain_req_pkts(qp);
                flush_recv_queue(qp, notify);
-               spin_unlock_bh(&qp->state_lock);
+               spin_unlock_irqrestore(&qp->state_lock, flags);
                goto exit;
        }
-       spin_unlock_bh(&qp->state_lock);
+       spin_unlock_irqrestore(&qp->state_lock, flags);
 
        qp->resp.aeth_syndrome = AETH_ACK_UNLIMITED;
 
index dea605b..83093e1 100644 (file)
@@ -904,10 +904,10 @@ static int rxe_post_send_kernel(struct rxe_qp *qp,
        if (!err)
                rxe_sched_task(&qp->req.task);
 
-       spin_lock_bh(&qp->state_lock);
+       spin_lock_irqsave(&qp->state_lock, flags);
        if (qp_state(qp) == IB_QPS_ERR)
                rxe_sched_task(&qp->comp.task);
-       spin_unlock_bh(&qp->state_lock);
+       spin_unlock_irqrestore(&qp->state_lock, flags);
 
        return err;
 }
@@ -917,22 +917,23 @@ static int rxe_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr,
 {
        struct rxe_qp *qp = to_rqp(ibqp);
        int err;
+       unsigned long flags;
 
-       spin_lock_bh(&qp->state_lock);
+       spin_lock_irqsave(&qp->state_lock, flags);
        /* caller has already called destroy_qp */
        if (WARN_ON_ONCE(!qp->valid)) {
-               spin_unlock_bh(&qp->state_lock);
+               spin_unlock_irqrestore(&qp->state_lock, flags);
                rxe_err_qp(qp, "qp has been destroyed");
                return -EINVAL;
        }
 
        if (unlikely(qp_state(qp) < IB_QPS_RTS)) {
-               spin_unlock_bh(&qp->state_lock);
+               spin_unlock_irqrestore(&qp->state_lock, flags);
                *bad_wr = wr;
                rxe_err_qp(qp, "qp not ready to send");
                return -EINVAL;
        }
-       spin_unlock_bh(&qp->state_lock);
+       spin_unlock_irqrestore(&qp->state_lock, flags);
 
        if (qp->is_user) {
                /* Utilize process context to do protocol processing */
@@ -1008,22 +1009,22 @@ static int rxe_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
        struct rxe_rq *rq = &qp->rq;
        unsigned long flags;
 
-       spin_lock_bh(&qp->state_lock);
+       spin_lock_irqsave(&qp->state_lock, flags);
        /* caller has already called destroy_qp */
        if (WARN_ON_ONCE(!qp->valid)) {
-               spin_unlock_bh(&qp->state_lock);
+               spin_unlock_irqrestore(&qp->state_lock, flags);
                rxe_err_qp(qp, "qp has been destroyed");
                return -EINVAL;
        }
 
        /* see C10-97.2.1 */
        if (unlikely((qp_state(qp) < IB_QPS_INIT))) {
-               spin_unlock_bh(&qp->state_lock);
+               spin_unlock_irqrestore(&qp->state_lock, flags);
                *bad_wr = wr;
                rxe_dbg_qp(qp, "qp not ready to post recv");
                return -EINVAL;
        }
-       spin_unlock_bh(&qp->state_lock);
+       spin_unlock_irqrestore(&qp->state_lock, flags);
 
        if (unlikely(qp->srq)) {
                *bad_wr = wr;
@@ -1044,10 +1045,10 @@ static int rxe_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
 
        spin_unlock_irqrestore(&rq->producer_lock, flags);
 
-       spin_lock_bh(&qp->state_lock);
+       spin_lock_irqsave(&qp->state_lock, flags);
        if (qp_state(qp) == IB_QPS_ERR)
                rxe_sched_task(&qp->resp.task);
-       spin_unlock_bh(&qp->state_lock);
+       spin_unlock_irqrestore(&qp->state_lock, flags);
 
        return err;
 }
@@ -1356,7 +1357,7 @@ static int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
        if (cleanup_err)
                rxe_err_mr(mr, "cleanup failed, err = %d", cleanup_err);
 
-       kfree_rcu(mr);
+       kfree_rcu_mightsleep(mr);
        return 0;
 
 err_out:
index f290cd4..92e1e75 100644 (file)
@@ -657,9 +657,13 @@ static int
 isert_connect_error(struct rdma_cm_id *cma_id)
 {
        struct isert_conn *isert_conn = cma_id->qp->qp_context;
+       struct isert_np *isert_np = cma_id->context;
 
        ib_drain_qp(isert_conn->qp);
+
+       mutex_lock(&isert_np->mutex);
        list_del_init(&isert_conn->node);
+       mutex_unlock(&isert_np->mutex);
        isert_conn->cm_id = NULL;
        isert_put_conn(isert_conn);
 
@@ -2431,6 +2435,7 @@ isert_free_np(struct iscsi_np *np)
 {
        struct isert_np *isert_np = np->np_context;
        struct isert_conn *isert_conn, *n;
+       LIST_HEAD(drop_conn_list);
 
        if (isert_np->cm_id)
                rdma_destroy_id(isert_np->cm_id);
@@ -2450,7 +2455,7 @@ isert_free_np(struct iscsi_np *np)
                                         node) {
                        isert_info("cleaning isert_conn %p state (%d)\n",
                                   isert_conn, isert_conn->state);
-                       isert_connect_release(isert_conn);
+                       list_move_tail(&isert_conn->node, &drop_conn_list);
                }
        }
 
@@ -2461,11 +2466,16 @@ isert_free_np(struct iscsi_np *np)
                                         node) {
                        isert_info("cleaning isert_conn %p state (%d)\n",
                                   isert_conn, isert_conn->state);
-                       isert_connect_release(isert_conn);
+                       list_move_tail(&isert_conn->node, &drop_conn_list);
                }
        }
        mutex_unlock(&isert_np->mutex);
 
+       list_for_each_entry_safe(isert_conn, n, &drop_conn_list, node) {
+               list_del_init(&isert_conn->node);
+               isert_connect_release(isert_conn);
+       }
+
        np->np_context = NULL;
        kfree(isert_np);
 }
@@ -2560,8 +2570,6 @@ static void isert_wait_conn(struct iscsit_conn *conn)
        isert_put_unsol_pending_cmds(conn);
        isert_wait4cmds(conn);
        isert_wait4logout(isert_conn);
-
-       queue_work(isert_release_wq, &isert_conn->release_work);
 }
 
 static void isert_free_conn(struct iscsit_conn *conn)
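
The isert hunks above replace the in-place isert_connect_release() calls with a drain-then-release pattern: connections are moved onto a private list while isert_np->mutex is held and only torn down after the mutex is dropped. A minimal sketch of that pattern, with illustrative names (not the driver's code):

#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/slab.h>

struct example_conn {
	struct list_head node;
};

static void example_drain_and_release(struct mutex *np_mutex,
				      struct list_head *live_list)
{
	struct example_conn *conn, *n;
	LIST_HEAD(drop_list);

	/* Detach everything while holding the lock... */
	mutex_lock(np_mutex);
	list_for_each_entry_safe(conn, n, live_list, node)
		list_move_tail(&conn->node, &drop_list);
	mutex_unlock(np_mutex);

	/* ...and do the potentially heavyweight teardown unlocked. */
	list_for_each_entry_safe(conn, n, &drop_list, node) {
		list_del_init(&conn->node);
		kfree(conn);	/* stands in for the real release path */
	}
}
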
index edb2e3a..cfb50bf 100644 (file)
@@ -2040,6 +2040,7 @@ static int rtrs_clt_rdma_cm_handler(struct rdma_cm_id *cm_id,
        return 0;
 }
 
+/* The caller should do the cleanup in case of error */
 static int create_cm(struct rtrs_clt_con *con)
 {
        struct rtrs_path *s = con->c.path;
@@ -2062,14 +2063,14 @@ static int create_cm(struct rtrs_clt_con *con)
        err = rdma_set_reuseaddr(cm_id, 1);
        if (err != 0) {
                rtrs_err(s, "Set address reuse failed, err: %d\n", err);
-               goto destroy_cm;
+               return err;
        }
        err = rdma_resolve_addr(cm_id, (struct sockaddr *)&clt_path->s.src_addr,
                                (struct sockaddr *)&clt_path->s.dst_addr,
                                RTRS_CONNECT_TIMEOUT_MS);
        if (err) {
                rtrs_err(s, "Failed to resolve address, err: %d\n", err);
-               goto destroy_cm;
+               return err;
        }
        /*
         * Combine connection status and session events. This is needed
@@ -2084,29 +2085,15 @@ static int create_cm(struct rtrs_clt_con *con)
                if (err == 0)
                        err = -ETIMEDOUT;
                /* Timedout or interrupted */
-               goto errr;
-       }
-       if (con->cm_err < 0) {
-               err = con->cm_err;
-               goto errr;
+               return err;
        }
-       if (READ_ONCE(clt_path->state) != RTRS_CLT_CONNECTING) {
+       if (con->cm_err < 0)
+               return con->cm_err;
+       if (READ_ONCE(clt_path->state) != RTRS_CLT_CONNECTING)
                /* Device removal */
-               err = -ECONNABORTED;
-               goto errr;
-       }
+               return -ECONNABORTED;
 
        return 0;
-
-errr:
-       stop_cm(con);
-       mutex_lock(&con->con_mutex);
-       destroy_con_cq_qp(con);
-       mutex_unlock(&con->con_mutex);
-destroy_cm:
-       destroy_cm(con);
-
-       return err;
 }
 
 static void rtrs_clt_path_up(struct rtrs_clt_path *clt_path)
@@ -2334,7 +2321,7 @@ static void rtrs_clt_close_work(struct work_struct *work)
 static int init_conns(struct rtrs_clt_path *clt_path)
 {
        unsigned int cid;
-       int err;
+       int err, i;
 
        /*
         * On every new session connections increase reconnect counter
@@ -2350,10 +2337,8 @@ static int init_conns(struct rtrs_clt_path *clt_path)
                        goto destroy;
 
                err = create_cm(to_clt_con(clt_path->s.con[cid]));
-               if (err) {
-                       destroy_con(to_clt_con(clt_path->s.con[cid]));
+               if (err)
                        goto destroy;
-               }
        }
        err = alloc_path_reqs(clt_path);
        if (err)
@@ -2364,15 +2349,21 @@ static int init_conns(struct rtrs_clt_path *clt_path)
        return 0;
 
 destroy:
-       while (cid--) {
-               struct rtrs_clt_con *con = to_clt_con(clt_path->s.con[cid]);
+       /* Make sure we do the cleanup in the order the connections were created */

+       for (i = 0; i <= cid; i++) {
+               struct rtrs_clt_con *con;
 
-               stop_cm(con);
+               if (!clt_path->s.con[i])
+                       break;
 
-               mutex_lock(&con->con_mutex);
-               destroy_con_cq_qp(con);
-               mutex_unlock(&con->con_mutex);
-               destroy_cm(con);
+               con = to_clt_con(clt_path->s.con[i]);
+               if (con->c.cm_id) {
+                       stop_cm(con);
+                       mutex_lock(&con->con_mutex);
+                       destroy_con_cq_qp(con);
+                       mutex_unlock(&con->con_mutex);
+                       destroy_cm(con);
+               }
                destroy_con(con);
        }
        /*
index 4bf9d86..3696f36 100644 (file)
@@ -37,8 +37,10 @@ struct rtrs_iu *rtrs_iu_alloc(u32 iu_num, size_t size, gfp_t gfp_mask,
                        goto err;
 
                iu->dma_addr = ib_dma_map_single(dma_dev, iu->buf, size, dir);
-               if (ib_dma_mapping_error(dma_dev, iu->dma_addr))
+               if (ib_dma_mapping_error(dma_dev, iu->dma_addr)) {
+                       kfree(iu->buf);
                        goto err;
+               }
 
                iu->cqe.done  = done;
                iu->size      = size;
index 37e876d..641eb86 100644 (file)
@@ -703,7 +703,7 @@ void input_close_device(struct input_handle *handle)
 
        __input_release_device(handle);
 
-       if (!dev->inhibited && !--dev->users) {
+       if (!--dev->users && !dev->inhibited) {
                if (dev->poller)
                        input_dev_poller_stop(dev->poller);
                if (dev->close)
index 28be88e..f33622f 100644 (file)
@@ -281,7 +281,6 @@ static const struct xpad_device {
        { 0x1430, 0xf801, "RedOctane Controller", 0, XTYPE_XBOX360 },
        { 0x146b, 0x0601, "BigBen Interactive XBOX 360 Controller", 0, XTYPE_XBOX360 },
        { 0x146b, 0x0604, "Bigben Interactive DAIJA Arcade Stick", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 },
-       { 0x1532, 0x0037, "Razer Sabertooth", 0, XTYPE_XBOX360 },
        { 0x1532, 0x0a00, "Razer Atrox Arcade Stick", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOXONE },
        { 0x1532, 0x0a03, "Razer Wildcat", 0, XTYPE_XBOXONE },
        { 0x15e4, 0x3f00, "Power A Mini Pro Elite", 0, XTYPE_XBOX360 },
index 0948938..e79f549 100644 (file)
@@ -109,6 +109,27 @@ static const struct dmi_system_id dmi_use_low_level_irq[] = {
 };
 
 /*
+ * Some devices have an entry that wrongly points to a GPIO which is
+ * required by another driver, so this driver must not claim it.
+ */
+static const struct dmi_system_id dmi_invalid_acpi_index[] = {
+       {
+               /*
+                * Lenovo Yoga Book X90F / X90L, the PNP0C40 home button entry
+                * points to a GPIO which is not a home button and which is
+                * required by the lenovo-yogabook driver.
+                */
+               .matches = {
+                       DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Intel Corporation"),
+                       DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "CHERRYVIEW D1 PLATFORM"),
+                       DMI_EXACT_MATCH(DMI_PRODUCT_VERSION, "YETI-11"),
+               },
+               .driver_data = (void *)1l,
+       },
+       {} /* Terminating entry */
+};
+
+/*
  * Get the Nth GPIO number from the ACPI object.
  */
 static int soc_button_lookup_gpio(struct device *dev, int acpi_index,
@@ -137,6 +158,8 @@ soc_button_device_create(struct platform_device *pdev,
        struct platform_device *pd;
        struct gpio_keys_button *gpio_keys;
        struct gpio_keys_platform_data *gpio_keys_pdata;
+       const struct dmi_system_id *dmi_id;
+       int invalid_acpi_index = -1;
        int error, gpio, irq;
        int n_buttons = 0;
 
@@ -154,10 +177,17 @@ soc_button_device_create(struct platform_device *pdev,
        gpio_keys = (void *)(gpio_keys_pdata + 1);
        n_buttons = 0;
 
+       dmi_id = dmi_first_match(dmi_invalid_acpi_index);
+       if (dmi_id)
+               invalid_acpi_index = (long)dmi_id->driver_data;
+
        for (info = button_info; info->name; info++) {
                if (info->autorepeat != autorepeat)
                        continue;
 
+               if (info->acpi_index == invalid_acpi_index)
+                       continue;
+
                error = soc_button_lookup_gpio(&pdev->dev, info->acpi_index, &gpio, &irq);
                if (error || irq < 0) {
                        /*
index ece97f8..2118b20 100644 (file)
@@ -674,10 +674,11 @@ static void process_packet_head_v4(struct psmouse *psmouse)
        struct input_dev *dev = psmouse->dev;
        struct elantech_data *etd = psmouse->private;
        unsigned char *packet = psmouse->packet;
-       int id = ((packet[3] & 0xe0) >> 5) - 1;
+       int id;
        int pres, traces;
 
-       if (id < 0)
+       id = ((packet[3] & 0xe0) >> 5) - 1;
+       if (id < 0 || id >= ETP_MAX_FINGERS)
                return;
 
        etd->mt[id].x = ((packet[1] & 0x0f) << 8) | packet[2];
@@ -707,7 +708,7 @@ static void process_packet_motion_v4(struct psmouse *psmouse)
        int id, sid;
 
        id = ((packet[0] & 0xe0) >> 5) - 1;
-       if (id < 0)
+       if (id < 0 || id >= ETP_MAX_FINGERS)
                return;
 
        sid = ((packet[3] & 0xe0) >> 5) - 1;
@@ -728,7 +729,7 @@ static void process_packet_motion_v4(struct psmouse *psmouse)
        input_report_abs(dev, ABS_MT_POSITION_X, etd->mt[id].x);
        input_report_abs(dev, ABS_MT_POSITION_Y, etd->mt[id].y);
 
-       if (sid >= 0) {
+       if (sid >= 0 && sid < ETP_MAX_FINGERS) {
                etd->mt[sid].x += delta_x2 * weight;
                etd->mt[sid].y -= delta_y2 * weight;
                input_mt_slot(dev, sid);
index 30102cb..3c9d072 100644 (file)
@@ -560,7 +560,7 @@ static int cyttsp5_hid_output_get_sysinfo(struct cyttsp5 *ts)
 static int cyttsp5_hid_output_bl_launch_app(struct cyttsp5 *ts)
 {
        int rc;
-       u8 cmd[HID_OUTPUT_BL_LAUNCH_APP];
+       u8 cmd[HID_OUTPUT_BL_LAUNCH_APP_SIZE];
        u16 crc;
 
        put_unaligned_le16(HID_OUTPUT_BL_LAUNCH_APP_SIZE, cmd);
index 577c75c..bb3c607 100644 (file)
@@ -22,7 +22,7 @@
  * in the kernel). So this driver offers straight forward, reliable single
  * touch functionality only.
  *
- * s.a. A20 User Manual "1.15 TP" (Documentation/arm/sunxi.rst)
+ * s.a. A20 User Manual "1.15 TP" (Documentation/arch/arm/sunxi.rst)
  * (looks like the description in the A20 User Manual v1.3 is better
  * than the one in the A10 User Manual v.1.5)
  */
index db98c3f..4d80060 100644 (file)
@@ -282,6 +282,7 @@ config EXYNOS_IOMMU_DEBUG
 config IPMMU_VMSA
        bool "Renesas VMSA-compatible IPMMU"
        depends on ARCH_RENESAS || COMPILE_TEST
+       depends on ARM || ARM64 || COMPILE_TEST
        depends on !GENERIC_ATOMIC64    # for IOMMU_IO_PGTABLE_LPAE
        select IOMMU_API
        select IOMMU_IO_PGTABLE_LPAE
@@ -417,22 +418,6 @@ config S390_IOMMU
        help
          Support for the IOMMU API for s390 PCI devices.
 
-config S390_CCW_IOMMU
-       bool "S390 CCW IOMMU Support"
-       depends on S390 && CCW || COMPILE_TEST
-       select IOMMU_API
-       help
-         Enables bits of IOMMU API required by VFIO. The iommu_ops
-         is not implemented as it is not necessary for VFIO.
-
-config S390_AP_IOMMU
-       bool "S390 AP IOMMU Support"
-       depends on S390 && ZCRYPT || COMPILE_TEST
-       select IOMMU_API
-       help
-         Enables bits of IOMMU API required by VFIO. The iommu_ops
-         is not implemented as it is not necessary for VFIO.
-
 config MTK_IOMMU
        tristate "MediaTek IOMMU Support"
        depends on ARCH_MEDIATEK || COMPILE_TEST
index e98f20a..9beeceb 100644 (file)
@@ -15,9 +15,7 @@ extern irqreturn_t amd_iommu_int_thread(int irq, void *data);
 extern irqreturn_t amd_iommu_int_handler(int irq, void *data);
 extern void amd_iommu_apply_erratum_63(struct amd_iommu *iommu, u16 devid);
 extern void amd_iommu_restart_event_logging(struct amd_iommu *iommu);
-extern int amd_iommu_init_devices(void);
-extern void amd_iommu_uninit_devices(void);
-extern void amd_iommu_init_notifier(void);
+extern void amd_iommu_restart_ga_log(struct amd_iommu *iommu);
 extern void amd_iommu_set_rlookup_table(struct amd_iommu *iommu, u16 devid);
 
 #ifdef CONFIG_AMD_IOMMU_DEBUGFS
index 2ddbda3..ab8aa8f 100644 (file)
@@ -986,8 +986,13 @@ union irte_ga_hi {
 };
 
 struct irte_ga {
-       union irte_ga_lo lo;
-       union irte_ga_hi hi;
+       union {
+               struct {
+                       union irte_ga_lo lo;
+                       union irte_ga_hi hi;
+               };
+               u128 irte;
+       };
 };
 
 struct irq_2_irte {
index 329a406..c2d80a4 100644 (file)
@@ -759,6 +759,30 @@ void amd_iommu_restart_event_logging(struct amd_iommu *iommu)
 }
 
 /*
+ * This function restarts GA logging in case the IOMMU experienced
+ * a GA log overflow.
+ */
+void amd_iommu_restart_ga_log(struct amd_iommu *iommu)
+{
+       u32 status;
+
+       status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
+       if (status & MMIO_STATUS_GALOG_RUN_MASK)
+               return;
+
+       pr_info_ratelimited("IOMMU GA Log restarting\n");
+
+       iommu_feature_disable(iommu, CONTROL_GALOG_EN);
+       iommu_feature_disable(iommu, CONTROL_GAINT_EN);
+
+       writel(MMIO_STATUS_GALOG_OVERFLOW_MASK,
+              iommu->mmio_base + MMIO_STATUS_OFFSET);
+
+       iommu_feature_enable(iommu, CONTROL_GAINT_EN);
+       iommu_feature_enable(iommu, CONTROL_GALOG_EN);
+}
+
+/*
  * This function resets the command buffer if the IOMMU stopped fetching
  * commands from it.
  */
index 4a31464..9ea4096 100644 (file)
@@ -845,6 +845,7 @@ amd_iommu_set_pci_msi_domain(struct device *dev, struct amd_iommu *iommu) { }
        (MMIO_STATUS_EVT_OVERFLOW_INT_MASK | \
         MMIO_STATUS_EVT_INT_MASK | \
         MMIO_STATUS_PPR_INT_MASK | \
+        MMIO_STATUS_GALOG_OVERFLOW_MASK | \
         MMIO_STATUS_GALOG_INT_MASK)
 
 irqreturn_t amd_iommu_int_thread(int irq, void *data)
@@ -868,10 +869,16 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
                }
 
 #ifdef CONFIG_IRQ_REMAP
-               if (status & MMIO_STATUS_GALOG_INT_MASK) {
+               if (status & (MMIO_STATUS_GALOG_INT_MASK |
+                             MMIO_STATUS_GALOG_OVERFLOW_MASK)) {
                        pr_devel("Processing IOMMU GA Log\n");
                        iommu_poll_ga_log(iommu);
                }
+
+               if (status & MMIO_STATUS_GALOG_OVERFLOW_MASK) {
+                       pr_info_ratelimited("IOMMU GA Log overflow\n");
+                       amd_iommu_restart_ga_log(iommu);
+               }
 #endif
 
                if (status & MMIO_STATUS_EVT_OVERFLOW_INT_MASK) {
@@ -2067,14 +2074,10 @@ static struct protection_domain *protection_domain_alloc(unsigned int type)
 {
        struct io_pgtable_ops *pgtbl_ops;
        struct protection_domain *domain;
-       int pgtable = amd_iommu_pgtable;
+       int pgtable;
        int mode = DEFAULT_PGTABLE_LEVEL;
        int ret;
 
-       domain = kzalloc(sizeof(*domain), GFP_KERNEL);
-       if (!domain)
-               return NULL;
-
        /*
         * Force IOMMU v1 page table when iommu=pt and
         * when allocating domain for pass-through devices.
@@ -2084,8 +2087,16 @@ static struct protection_domain *protection_domain_alloc(unsigned int type)
                mode = PAGE_MODE_NONE;
        } else if (type == IOMMU_DOMAIN_UNMANAGED) {
                pgtable = AMD_IOMMU_V1;
+       } else if (type == IOMMU_DOMAIN_DMA || type == IOMMU_DOMAIN_DMA_FQ) {
+               pgtable = amd_iommu_pgtable;
+       } else {
+               return NULL;
        }
 
+       domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+       if (!domain)
+               return NULL;
+
        switch (pgtable) {
        case AMD_IOMMU_V1:
                ret = protection_domain_init_v1(domain, mode);
@@ -2118,6 +2129,15 @@ out_err:
        return NULL;
 }
 
+static inline u64 dma_max_address(void)
+{
+       if (amd_iommu_pgtable == AMD_IOMMU_V1)
+               return ~0ULL;
+
+       /* V2 with 4/5 level page table */
+       return ((1ULL << PM_LEVEL_SHIFT(amd_iommu_gpt_level)) - 1);
+}
+
 static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
 {
        struct protection_domain *domain;
@@ -2134,7 +2154,7 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
                return NULL;
 
        domain->domain.geometry.aperture_start = 0;
-       domain->domain.geometry.aperture_end   = ~0ULL;
+       domain->domain.geometry.aperture_end   = dma_max_address();
        domain->domain.geometry.force_aperture = true;
 
        return &domain->domain;
@@ -2387,7 +2407,7 @@ static void amd_iommu_iotlb_sync(struct iommu_domain *domain,
        unsigned long flags;
 
        spin_lock_irqsave(&dom->lock, flags);
-       domain_flush_pages(dom, gather->start, gather->end - gather->start, 1);
+       domain_flush_pages(dom, gather->start, gather->end - gather->start + 1, 1);
        amd_iommu_domain_flush_complete(dom);
        spin_unlock_irqrestore(&dom->lock, flags);
 }
@@ -3003,10 +3023,10 @@ out:
 static int modify_irte_ga(struct amd_iommu *iommu, u16 devid, int index,
                          struct irte_ga *irte, struct amd_ir_data *data)
 {
-       bool ret;
        struct irq_remap_table *table;
-       unsigned long flags;
        struct irte_ga *entry;
+       unsigned long flags;
+       u128 old;
 
        table = get_irq_table(iommu, devid);
        if (!table)
@@ -3017,16 +3037,14 @@ static int modify_irte_ga(struct amd_iommu *iommu, u16 devid, int index,
        entry = (struct irte_ga *)table->table;
        entry = &entry[index];
 
-       ret = cmpxchg_double(&entry->lo.val, &entry->hi.val,
-                            entry->lo.val, entry->hi.val,
-                            irte->lo.val, irte->hi.val);
        /*
         * We use cmpxchg16 to atomically update the 128-bit IRTE,
         * and it cannot be updated by the hardware or other processors
         * behind us, so the return value of cmpxchg16 should be the
         * same as the old value.
         */
-       WARN_ON(!ret);
+       old = entry->irte;
+       WARN_ON(!try_cmpxchg128(&entry->irte, &old, irte->irte));
 
        if (data)
                data->ref = entry;
@@ -3493,8 +3511,7 @@ int amd_iommu_activate_guest_mode(void *data)
        struct irte_ga *entry = (struct irte_ga *) ir_data->entry;
        u64 valid;
 
-       if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) ||
-           !entry || entry->lo.fields_vapic.guest_mode)
+       if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) || !entry)
                return 0;
 
        valid = entry->lo.fields_vapic.valid;
index ae09c62..c71afda 100644 (file)
@@ -517,6 +517,7 @@ static const struct of_device_id __maybe_unused qcom_smmu_impl_of_match[] = {
        { .compatible = "qcom,qcm2290-smmu-500", .data = &qcom_smmu_500_impl0_data },
        { .compatible = "qcom,qdu1000-smmu-500", .data = &qcom_smmu_500_impl0_data  },
        { .compatible = "qcom,sc7180-smmu-500", .data = &qcom_smmu_500_impl0_data },
+       { .compatible = "qcom,sc7180-smmu-v2", .data = &qcom_smmu_v2_data },
        { .compatible = "qcom,sc7280-smmu-500", .data = &qcom_smmu_500_impl0_data },
        { .compatible = "qcom,sc8180x-smmu-500", .data = &qcom_smmu_500_impl0_data },
        { .compatible = "qcom,sc8280xp-smmu-500", .data = &qcom_smmu_500_impl0_data },
@@ -561,5 +562,14 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
        if (match)
                return qcom_smmu_create(smmu, match->data);
 
+       /*
+        * If you hit this WARN_ON() you are missing an entry in the
+        * qcom_smmu_impl_of_match[] table, and GPU per-process page-
+        * tables will be broken.
+        */
+       WARN(of_device_is_compatible(np, "qcom,adreno-smmu"),
+            "Missing qcom_smmu_impl_of_match entry for: %s",
+            dev_name(smmu->dev));
+
        return smmu;
 }
index a1b9873..08f5632 100644 (file)
@@ -175,18 +175,14 @@ static int modify_irte(struct irq_2_iommu *irq_iommu,
        irte = &iommu->ir_table->base[index];
 
        if ((irte->pst == 1) || (irte_modified->pst == 1)) {
-               bool ret;
-
-               ret = cmpxchg_double(&irte->low, &irte->high,
-                                    irte->low, irte->high,
-                                    irte_modified->low, irte_modified->high);
                /*
                 * We use cmpxchg16 to atomically update the 128-bit IRTE,
                 * and it cannot be updated by the hardware or other processors
                 * behind us, so the return value of cmpxchg16 should be the
                 * same as the old value.
                 */
-               WARN_ON(!ret);
+               u128 old = irte->irte;
+               WARN_ON(!try_cmpxchg128(&irte->irte, &old, irte_modified->irte));
        } else {
                WRITE_ONCE(irte->low, irte_modified->low);
                WRITE_ONCE(irte->high, irte_modified->high);
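
Both the AMD and Intel hunks above drop the removed cmpxchg_double() helper in favour of try_cmpxchg128() on a u128 member that aliases the two 64-bit halves of the IRTE (this relies on the architecture providing a 128-bit cmpxchg). A minimal sketch of that update pattern, with illustrative types rather than the drivers' structures:

#include <linux/types.h>
#include <linux/atomic.h>
#include <linux/bug.h>

struct example_irte {
	union {
		struct {
			u64 lo;
			u64 hi;
		};
		u128 val;	/* lets the pair be swapped in one go */
	};
};

static void example_set_irte(struct example_irte *entry,
			     const struct example_irte *new)
{
	u128 old = entry->val;

	/*
	 * Nothing else is allowed to modify the entry behind us, so the
	 * 128-bit compare-and-exchange is expected to succeed; warn if not.
	 */
	WARN_ON(!try_cmpxchg128(&entry->val, &old, new->val));
}
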
index aecc7d1..e93906d 100644 (file)
@@ -781,7 +781,8 @@ static void mtk_iommu_flush_iotlb_all(struct iommu_domain *domain)
 {
        struct mtk_iommu_domain *dom = to_mtk_domain(domain);
 
-       mtk_iommu_tlb_flush_all(dom->bank->parent_data);
+       if (dom->bank)
+               mtk_iommu_tlb_flush_all(dom->bank->parent_data);
 }
 
 static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
index ea5a308..4054030 100644 (file)
@@ -1335,20 +1335,22 @@ static int rk_iommu_probe(struct platform_device *pdev)
        for (i = 0; i < iommu->num_irq; i++) {
                int irq = platform_get_irq(pdev, i);
 
-               if (irq < 0)
-                       return irq;
+               if (irq < 0) {
+                       err = irq;
+                       goto err_pm_disable;
+               }
 
                err = devm_request_irq(iommu->dev, irq, rk_iommu_irq,
                                       IRQF_SHARED, dev_name(dev), iommu);
-               if (err) {
-                       pm_runtime_disable(dev);
-                       goto err_remove_sysfs;
-               }
+               if (err)
+                       goto err_pm_disable;
        }
 
        dma_set_mask_and_coherent(dev, rk_ops->dma_bit_mask);
 
        return 0;
+err_pm_disable:
+       pm_runtime_disable(dev);
 err_remove_sysfs:
        iommu_device_sysfs_remove(&iommu->iommu);
 err_put_group:
index 77ebe7e..e731e07 100644 (file)
@@ -212,12 +212,6 @@ out_kfree:
        return err;
 }
 
-void __init clps711x_intc_init(phys_addr_t base, resource_size_t size)
-{
-       BUG_ON(_clps711x_intc_init(NULL, base, size));
-}
-
-#ifdef CONFIG_IRQCHIP
 static int __init clps711x_intc_init_dt(struct device_node *np,
                                        struct device_node *parent)
 {
@@ -231,4 +225,3 @@ static int __init clps711x_intc_init_dt(struct device_node *np,
        return _clps711x_intc_init(np, res.start, resource_size(&res));
 }
 IRQCHIP_DECLARE(clps711x, "cirrus,ep7209-intc", clps711x_intc_init_dt);
-#endif
index 46a3aa6..359efc1 100644 (file)
@@ -125,7 +125,7 @@ static struct irq_chip ft010_irq_chip = {
 /* Local static for the IRQ entry call */
 static struct ft010_irq_data firq;
 
-asmlinkage void __exception_irq_entry ft010_irqchip_handle_irq(struct pt_regs *regs)
+static asmlinkage void __exception_irq_entry ft010_irqchip_handle_irq(struct pt_regs *regs)
 {
        struct ft010_irq_data *f = &firq;
        int irq;
@@ -162,7 +162,7 @@ static const struct irq_domain_ops ft010_irqdomain_ops = {
        .xlate = irq_domain_xlate_onetwocell,
 };
 
-int __init ft010_of_init_irq(struct device_node *node,
+static int __init ft010_of_init_irq(struct device_node *node,
                              struct device_node *parent)
 {
        struct ft010_irq_data *f = &firq;
index a610821..afd6a18 100644 (file)
@@ -16,7 +16,13 @@ void gic_enable_of_quirks(const struct device_node *np,
                          const struct gic_quirk *quirks, void *data)
 {
        for (; quirks->desc; quirks++) {
-               if (!of_device_is_compatible(np, quirks->compatible))
+               if (!quirks->compatible && !quirks->property)
+                       continue;
+               if (quirks->compatible &&
+                   !of_device_is_compatible(np, quirks->compatible))
+                       continue;
+               if (quirks->property &&
+                   !of_property_read_bool(np, quirks->property))
                        continue;
                if (quirks->init(data))
                        pr_info("GIC: enabling workaround for %s\n",
@@ -28,7 +34,7 @@ void gic_enable_quirks(u32 iidr, const struct gic_quirk *quirks,
                void *data)
 {
        for (; quirks->desc; quirks++) {
-               if (quirks->compatible)
+               if (quirks->compatible || quirks->property)
                        continue;
                if (quirks->iidr != (quirks->mask & iidr))
                        continue;
index 27e3d4e..3db4592 100644 (file)
@@ -13,6 +13,7 @@
 struct gic_quirk {
        const char *desc;
        const char *compatible;
+       const char *property;
        bool (*init)(void *data);
        u32 iidr;
        u32 mask;
index 0ec2b1e..1994541 100644 (file)
@@ -3585,6 +3585,7 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
                irqd = irq_get_irq_data(virq + i);
                irqd_set_single_target(irqd);
                irqd_set_affinity_on_activate(irqd);
+               irqd_set_resend_when_in_progress(irqd);
                pr_debug("ID:%d pID:%d vID:%d\n",
                         (int)(hwirq + i - its_dev->event_map.lpi_base),
                         (int)(hwirq + i), virq + i);
@@ -4523,6 +4524,7 @@ static int its_vpe_irq_domain_alloc(struct irq_domain *domain, unsigned int virq
                irq_domain_set_hwirq_and_chip(domain, virq + i, i,
                                              irqchip, vm->vpes[i]);
                set_bit(i, bitmap);
+               irqd_set_resend_when_in_progress(irq_get_irq_data(virq + i));
        }
 
        if (err) {
index 6fcee22..0c6c1af 100644 (file)
@@ -39,6 +39,8 @@
 
 #define FLAGS_WORKAROUND_GICR_WAKER_MSM8996    (1ULL << 0)
 #define FLAGS_WORKAROUND_CAVIUM_ERRATUM_38539  (1ULL << 1)
+#define FLAGS_WORKAROUND_MTK_GICR_SAVE         (1ULL << 2)
+#define FLAGS_WORKAROUND_ASR_ERRATUM_8601001   (1ULL << 3)
 
 #define GIC_IRQ_TYPE_PARTITION (GIC_IRQ_TYPE_LPI + 1)
 
@@ -655,10 +657,16 @@ static int gic_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu)
        return 0;
 }
 
-static u64 gic_mpidr_to_affinity(unsigned long mpidr)
+static u64 gic_cpu_to_affinity(int cpu)
 {
+       u64 mpidr = cpu_logical_map(cpu);
        u64 aff;
 
+       /* ASR8601 needs to have its affinities shifted down... */
+       if (unlikely(gic_data.flags & FLAGS_WORKAROUND_ASR_ERRATUM_8601001))
+               mpidr = (MPIDR_AFFINITY_LEVEL(mpidr, 1) |
+                        (MPIDR_AFFINITY_LEVEL(mpidr, 2) << 8));
+
        aff = ((u64)MPIDR_AFFINITY_LEVEL(mpidr, 3) << 32 |
               MPIDR_AFFINITY_LEVEL(mpidr, 2) << 16 |
               MPIDR_AFFINITY_LEVEL(mpidr, 1) << 8  |
@@ -913,7 +921,7 @@ static void __init gic_dist_init(void)
         * Set all global interrupts to the boot CPU only. ARE must be
         * enabled.
         */
-       affinity = gic_mpidr_to_affinity(cpu_logical_map(smp_processor_id()));
+       affinity = gic_cpu_to_affinity(smp_processor_id());
        for (i = 32; i < GIC_LINE_NR; i++)
                gic_write_irouter(affinity, base + GICD_IROUTER + i * 8);
 
@@ -962,7 +970,7 @@ static int gic_iterate_rdists(int (*fn)(struct redist_region *, void __iomem *))
 
 static int __gic_populate_rdist(struct redist_region *region, void __iomem *ptr)
 {
-       unsigned long mpidr = cpu_logical_map(smp_processor_id());
+       unsigned long mpidr;
        u64 typer;
        u32 aff;
 
@@ -970,6 +978,8 @@ static int __gic_populate_rdist(struct redist_region *region, void __iomem *ptr)
         * Convert affinity to a 32bit value that can be matched to
         * GICR_TYPER bits [63:32].
         */
+       mpidr = gic_cpu_to_affinity(smp_processor_id());
+
        aff = (MPIDR_AFFINITY_LEVEL(mpidr, 3) << 24 |
               MPIDR_AFFINITY_LEVEL(mpidr, 2) << 16 |
               MPIDR_AFFINITY_LEVEL(mpidr, 1) << 8 |
@@ -1083,7 +1093,7 @@ static inline bool gic_dist_security_disabled(void)
 static void gic_cpu_sys_reg_init(void)
 {
        int i, cpu = smp_processor_id();
-       u64 mpidr = cpu_logical_map(cpu);
+       u64 mpidr = gic_cpu_to_affinity(cpu);
        u64 need_rss = MPIDR_RS(mpidr);
        bool group0;
        u32 pribits;
@@ -1182,11 +1192,11 @@ static void gic_cpu_sys_reg_init(void)
        for_each_online_cpu(i) {
                bool have_rss = per_cpu(has_rss, i) && per_cpu(has_rss, cpu);
 
-               need_rss |= MPIDR_RS(cpu_logical_map(i));
+               need_rss |= MPIDR_RS(gic_cpu_to_affinity(i));
                if (need_rss && (!have_rss))
                        pr_crit("CPU%d (%lx) can't SGI CPU%d (%lx), no RSS\n",
                                cpu, (unsigned long)mpidr,
-                               i, (unsigned long)cpu_logical_map(i));
+                               i, (unsigned long)gic_cpu_to_affinity(i));
        }
 
        /**
@@ -1262,9 +1272,11 @@ static u16 gic_compute_target_list(int *base_cpu, const struct cpumask *mask,
                                   unsigned long cluster_id)
 {
        int next_cpu, cpu = *base_cpu;
-       unsigned long mpidr = cpu_logical_map(cpu);
+       unsigned long mpidr;
        u16 tlist = 0;
 
+       mpidr = gic_cpu_to_affinity(cpu);
+
        while (cpu < nr_cpu_ids) {
                tlist |= 1 << (mpidr & 0xf);
 
@@ -1273,7 +1285,7 @@ static u16 gic_compute_target_list(int *base_cpu, const struct cpumask *mask,
                        goto out;
                cpu = next_cpu;
 
-               mpidr = cpu_logical_map(cpu);
+               mpidr = gic_cpu_to_affinity(cpu);
 
                if (cluster_id != MPIDR_TO_SGI_CLUSTER_ID(mpidr)) {
                        cpu--;
@@ -1318,7 +1330,7 @@ static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
        dsb(ishst);
 
        for_each_cpu(cpu, mask) {
-               u64 cluster_id = MPIDR_TO_SGI_CLUSTER_ID(cpu_logical_map(cpu));
+               u64 cluster_id = MPIDR_TO_SGI_CLUSTER_ID(gic_cpu_to_affinity(cpu));
                u16 tlist;
 
                tlist = gic_compute_target_list(&cpu, mask, cluster_id);
@@ -1376,7 +1388,7 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 
        offset = convert_offset_index(d, GICD_IROUTER, &index);
        reg = gic_dist_base(d) + offset + (index * 8);
-       val = gic_mpidr_to_affinity(cpu_logical_map(cpu));
+       val = gic_cpu_to_affinity(cpu);
 
        gic_write_irouter(val, reg);
 
@@ -1720,6 +1732,15 @@ static bool gic_enable_quirk_msm8996(void *data)
        return true;
 }
 
+static bool gic_enable_quirk_mtk_gicr(void *data)
+{
+       struct gic_chip_data *d = data;
+
+       d->flags |= FLAGS_WORKAROUND_MTK_GICR_SAVE;
+
+       return true;
+}
+
 static bool gic_enable_quirk_cavium_38539(void *data)
 {
        struct gic_chip_data *d = data;
@@ -1786,6 +1807,15 @@ static bool gic_enable_quirk_nvidia_t241(void *data)
        return true;
 }
 
+static bool gic_enable_quirk_asr8601(void *data)
+{
+       struct gic_chip_data *d = data;
+
+       d->flags |= FLAGS_WORKAROUND_ASR_ERRATUM_8601001;
+
+       return true;
+}
+
 static const struct gic_quirk gic_quirks[] = {
        {
                .desc   = "GICv3: Qualcomm MSM8996 broken firmware",
@@ -1793,6 +1823,16 @@ static const struct gic_quirk gic_quirks[] = {
                .init   = gic_enable_quirk_msm8996,
        },
        {
+               .desc   = "GICv3: ASR erratum 8601001",
+               .compatible = "asr,asr8601-gic-v3",
+               .init   = gic_enable_quirk_asr8601,
+       },
+       {
+               .desc   = "GICv3: Mediatek Chromebook GICR save problem",
+               .property = "mediatek,broken-save-restore-fw",
+               .init   = gic_enable_quirk_mtk_gicr,
+       },
+       {
                .desc   = "GICv3: HIP06 erratum 161010803",
                .iidr   = 0x0204043b,
                .mask   = 0xffffffff,
@@ -1834,6 +1874,11 @@ static void gic_enable_nmi_support(void)
        if (!gic_prio_masking_enabled())
                return;
 
+       if (gic_data.flags & FLAGS_WORKAROUND_MTK_GICR_SAVE) {
+               pr_warn("Skipping NMI enable due to firmware issues\n");
+               return;
+       }
+
        ppi_nmi_refs = kcalloc(gic_data.ppi_nr, sizeof(*ppi_nmi_refs), GFP_KERNEL);
        if (!ppi_nmi_refs)
                return;
index 5f47d8e..b9dcc8e 100644 (file)
@@ -68,6 +68,7 @@ static int __init aic_irq_of_init(struct device_node *node,
        unsigned min_irq = JCORE_AIC2_MIN_HWIRQ;
        unsigned dom_sz = JCORE_AIC_MAX_HWIRQ+1;
        struct irq_domain *domain;
+       int ret;
 
        pr_info("Initializing J-Core AIC\n");
 
@@ -100,6 +101,12 @@ static int __init aic_irq_of_init(struct device_node *node,
        jcore_aic.irq_unmask = noop;
        jcore_aic.name = "AIC";
 
+       ret = irq_alloc_descs(-1, min_irq, dom_sz - min_irq,
+                             of_node_to_nid(node));
+
+       if (ret < 0)
+               return ret;
+
        domain = irq_domain_add_legacy(node, dom_sz - min_irq, min_irq, min_irq,
                                       &jcore_aic_irqdomain_ops,
                                       &jcore_aic);
index 71ef19f..92d8aa2 100644 (file)
@@ -36,6 +36,7 @@ static int nr_pics;
 
 struct eiointc_priv {
        u32                     node;
+       u32                     vec_count;
        nodemask_t              node_map;
        cpumask_t               cpuspan_map;
        struct fwnode_handle    *domain_handle;
@@ -153,18 +154,18 @@ static int eiointc_router_init(unsigned int cpu)
        if ((cpu_logical_map(cpu) % CORES_PER_EIO_NODE) == 0) {
                eiointc_enable();
 
-               for (i = 0; i < VEC_COUNT / 32; i++) {
+               for (i = 0; i < eiointc_priv[0]->vec_count / 32; i++) {
                        data = (((1 << (i * 2 + 1)) << 16) | (1 << (i * 2)));
                        iocsr_write32(data, EIOINTC_REG_NODEMAP + i * 4);
                }
 
-               for (i = 0; i < VEC_COUNT / 32 / 4; i++) {
+               for (i = 0; i < eiointc_priv[0]->vec_count / 32 / 4; i++) {
                        bit = BIT(1 + index); /* Route to IP[1 + index] */
                        data = bit | (bit << 8) | (bit << 16) | (bit << 24);
                        iocsr_write32(data, EIOINTC_REG_IPMAP + i * 4);
                }
 
-               for (i = 0; i < VEC_COUNT / 4; i++) {
+               for (i = 0; i < eiointc_priv[0]->vec_count / 4; i++) {
                        /* Route to Node-0 Core-0 */
                        if (index == 0)
                                bit = BIT(cpu_logical_map(0));
@@ -175,7 +176,7 @@ static int eiointc_router_init(unsigned int cpu)
                        iocsr_write32(data, EIOINTC_REG_ROUTE + i * 4);
                }
 
-               for (i = 0; i < VEC_COUNT / 32; i++) {
+               for (i = 0; i < eiointc_priv[0]->vec_count / 32; i++) {
                        data = 0xffffffff;
                        iocsr_write32(data, EIOINTC_REG_ENABLE + i * 4);
                        iocsr_write32(data, EIOINTC_REG_BOUNCE + i * 4);
@@ -195,7 +196,7 @@ static void eiointc_irq_dispatch(struct irq_desc *desc)
 
        chained_irq_enter(chip, desc);
 
-       for (i = 0; i < VEC_REG_COUNT; i++) {
+       for (i = 0; i < eiointc_priv[0]->vec_count / VEC_COUNT_PER_REG; i++) {
                pending = iocsr_read64(EIOINTC_REG_ISR + (i << 3));
                iocsr_write64(pending, EIOINTC_REG_ISR + (i << 3));
                while (pending) {
@@ -310,11 +311,11 @@ static void eiointc_resume(void)
        eiointc_router_init(0);
 
        for (i = 0; i < nr_pics; i++) {
-               for (j = 0; j < VEC_COUNT; j++) {
+               for (j = 0; j < eiointc_priv[0]->vec_count; j++) {
                        desc = irq_resolve_mapping(eiointc_priv[i]->eiointc_domain, j);
                        if (desc && desc->handle_irq && desc->handle_irq != handle_bad_irq) {
                                raw_spin_lock(&desc->lock);
-                               irq_data = &desc->irq_data;
+                               irq_data = irq_domain_get_irq_data(eiointc_priv[i]->eiointc_domain, irq_desc_get_irq(desc));
                                eiointc_set_irq_affinity(irq_data, irq_data->common->affinity, 0);
                                raw_spin_unlock(&desc->lock);
                        }
@@ -375,11 +376,47 @@ static int __init acpi_cascade_irqdomain_init(void)
        return 0;
 }
 
+static int __init eiointc_init(struct eiointc_priv *priv, int parent_irq,
+                              u64 node_map)
+{
+       int i;
+
+       node_map = node_map ? node_map : -1ULL;
+       for_each_possible_cpu(i) {
+               if (node_map & (1ULL << (cpu_to_eio_node(i)))) {
+                       node_set(cpu_to_eio_node(i), priv->node_map);
+                       cpumask_or(&priv->cpuspan_map, &priv->cpuspan_map,
+                                  cpumask_of(i));
+               }
+       }
+
+       priv->eiointc_domain = irq_domain_create_linear(priv->domain_handle,
+                                                       priv->vec_count,
+                                                       &eiointc_domain_ops,
+                                                       priv);
+       if (!priv->eiointc_domain) {
+               pr_err("loongson-extioi: cannot add IRQ domain\n");
+               return -ENOMEM;
+       }
+
+       eiointc_priv[nr_pics++] = priv;
+       eiointc_router_init(0);
+       irq_set_chained_handler_and_data(parent_irq, eiointc_irq_dispatch, priv);
+
+       if (nr_pics == 1) {
+               register_syscore_ops(&eiointc_syscore_ops);
+               cpuhp_setup_state_nocalls(CPUHP_AP_IRQ_LOONGARCH_STARTING,
+                                         "irqchip/loongarch/intc:starting",
+                                         eiointc_router_init, NULL);
+       }
+
+       return 0;
+}
+
 int __init eiointc_acpi_init(struct irq_domain *parent,
                                     struct acpi_madt_eio_pic *acpi_eiointc)
 {
-       int i, ret, parent_irq;
-       unsigned long node_map;
+       int parent_irq, ret;
        struct eiointc_priv *priv;
        int node;
 
@@ -394,37 +431,14 @@ int __init eiointc_acpi_init(struct irq_domain *parent,
                goto out_free_priv;
        }
 
+       priv->vec_count = VEC_COUNT;
        priv->node = acpi_eiointc->node;
-       node_map = acpi_eiointc->node_map ? : -1ULL;
-
-       for_each_possible_cpu(i) {
-               if (node_map & (1ULL << cpu_to_eio_node(i))) {
-                       node_set(cpu_to_eio_node(i), priv->node_map);
-                       cpumask_or(&priv->cpuspan_map, &priv->cpuspan_map, cpumask_of(i));
-               }
-       }
-
-       /* Setup IRQ domain */
-       priv->eiointc_domain = irq_domain_create_linear(priv->domain_handle, VEC_COUNT,
-                                       &eiointc_domain_ops, priv);
-       if (!priv->eiointc_domain) {
-               pr_err("loongson-eiointc: cannot add IRQ domain\n");
-               goto out_free_handle;
-       }
-
-       eiointc_priv[nr_pics++] = priv;
-
-       eiointc_router_init(0);
 
        parent_irq = irq_create_mapping(parent, acpi_eiointc->cascade);
-       irq_set_chained_handler_and_data(parent_irq, eiointc_irq_dispatch, priv);
 
-       if (nr_pics == 1) {
-               register_syscore_ops(&eiointc_syscore_ops);
-               cpuhp_setup_state_nocalls(CPUHP_AP_IRQ_LOONGARCH_STARTING,
-                                 "irqchip/loongarch/intc:starting",
-                                 eiointc_router_init, NULL);
-       }
+       ret = eiointc_init(priv, parent_irq, acpi_eiointc->node_map);
+       if (ret < 0)
+               goto out_free_handle;
 
        if (cpu_has_flatmode)
                node = cpu_to_node(acpi_eiointc->node * CORES_PER_EIO_NODE);
@@ -432,7 +446,10 @@ int __init eiointc_acpi_init(struct irq_domain *parent,
                node = acpi_eiointc->node;
        acpi_set_vec_parent(node, priv->eiointc_domain, pch_group);
        acpi_set_vec_parent(node, priv->eiointc_domain, msi_group);
+
        ret = acpi_cascade_irqdomain_init();
+       if (ret < 0)
+               goto out_free_handle;
 
        return ret;
 
@@ -444,3 +461,49 @@ out_free_priv:
 
        return -ENOMEM;
 }
+
+static int __init eiointc_of_init(struct device_node *of_node,
+                                 struct device_node *parent)
+{
+       int parent_irq, ret;
+       struct eiointc_priv *priv;
+
+       priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+       if (!priv)
+               return -ENOMEM;
+
+       parent_irq = irq_of_parse_and_map(of_node, 0);
+       if (parent_irq <= 0) {
+               ret = -ENODEV;
+               goto out_free_priv;
+       }
+
+       ret = irq_set_handler_data(parent_irq, priv);
+       if (ret < 0)
+               goto out_free_priv;
+
+       /*
+        * The LS2K0500 extended I/O interrupt controller supports only
+        * 128 interrupt vectors.
+        */
+       if (of_device_is_compatible(of_node, "loongson,ls2k0500-eiointc"))
+               priv->vec_count = 128;
+       else
+               priv->vec_count = VEC_COUNT;
+
+       priv->node = 0;
+       priv->domain_handle = of_node_to_fwnode(of_node);
+
+       ret = eiointc_init(priv, parent_irq, 0);
+       if (ret < 0)
+               goto out_free_priv;
+
+       return 0;
+
+out_free_priv:
+       kfree(priv);
+       return ret;
+}
+
+IRQCHIP_DECLARE(loongson_ls2k0500_eiointc, "loongson,ls2k0500-eiointc", eiointc_of_init);
+IRQCHIP_DECLARE(loongson_ls2k2000_eiointc, "loongson,ls2k2000-eiointc", eiointc_of_init);
index 8d00a9a..e4b33ae 100644 (file)
 #define LIOINTC_REG_INTC_EN_STATUS     (LIOINTC_INTC_CHIP_START + 0x04)
 #define LIOINTC_REG_INTC_ENABLE        (LIOINTC_INTC_CHIP_START + 0x08)
 #define LIOINTC_REG_INTC_DISABLE       (LIOINTC_INTC_CHIP_START + 0x0c)
+/*
+ * The LIOINTC_REG_INTC_POL register is only valid on the Loongson-2K
+ * series; on the Loongson-3 series writes to it are no-ops.
+ */
 #define LIOINTC_REG_INTC_POL   (LIOINTC_INTC_CHIP_START + 0x10)
 #define LIOINTC_REG_INTC_EDGE  (LIOINTC_INTC_CHIP_START + 0x14)
 
@@ -116,19 +120,19 @@ static int liointc_set_type(struct irq_data *data, unsigned int type)
        switch (type) {
        case IRQ_TYPE_LEVEL_HIGH:
                liointc_set_bit(gc, LIOINTC_REG_INTC_EDGE, mask, false);
-               liointc_set_bit(gc, LIOINTC_REG_INTC_POL, mask, true);
+               liointc_set_bit(gc, LIOINTC_REG_INTC_POL, mask, false);
                break;
        case IRQ_TYPE_LEVEL_LOW:
                liointc_set_bit(gc, LIOINTC_REG_INTC_EDGE, mask, false);
-               liointc_set_bit(gc, LIOINTC_REG_INTC_POL, mask, false);
+               liointc_set_bit(gc, LIOINTC_REG_INTC_POL, mask, true);
                break;
        case IRQ_TYPE_EDGE_RISING:
                liointc_set_bit(gc, LIOINTC_REG_INTC_EDGE, mask, true);
-               liointc_set_bit(gc, LIOINTC_REG_INTC_POL, mask, true);
+               liointc_set_bit(gc, LIOINTC_REG_INTC_POL, mask, false);
                break;
        case IRQ_TYPE_EDGE_FALLING:
                liointc_set_bit(gc, LIOINTC_REG_INTC_EDGE, mask, true);
-               liointc_set_bit(gc, LIOINTC_REG_INTC_POL, mask, false);
+               liointc_set_bit(gc, LIOINTC_REG_INTC_POL, mask, true);
                break;
        default:
                irq_gc_unlock_irqrestore(gc, flags);
@@ -291,6 +295,7 @@ static int liointc_init(phys_addr_t addr, unsigned long size, int revision,
        ct->chip.irq_mask = irq_gc_mask_disable_reg;
        ct->chip.irq_mask_ack = irq_gc_mask_disable_reg;
        ct->chip.irq_set_type = liointc_set_type;
+       ct->chip.flags = IRQCHIP_SKIP_SET_WAKE;
 
        gc->mask_cache = 0;
        priv->gc = gc;
index e5fe4d5..93a71f6 100644 (file)
@@ -164,7 +164,7 @@ static int pch_pic_domain_translate(struct irq_domain *d,
                if (fwspec->param_count < 2)
                        return -EINVAL;
 
-               *hwirq = fwspec->param[0] + priv->ht_vec_base;
+               *hwirq = fwspec->param[0];
                *type = fwspec->param[1] & IRQ_TYPE_SENSE_MASK;
        } else {
                if (fwspec->param_count < 1)
@@ -196,7 +196,7 @@ static int pch_pic_alloc(struct irq_domain *domain, unsigned int virq,
 
        parent_fwspec.fwnode = domain->parent->fwnode;
        parent_fwspec.param_count = 1;
-       parent_fwspec.param[0] = hwirq;
+       parent_fwspec.param[0] = hwirq + priv->ht_vec_base;
 
        err = irq_domain_alloc_irqs_parent(domain, virq, 1, &parent_fwspec);
        if (err)
@@ -401,14 +401,12 @@ static int __init acpi_cascade_irqdomain_init(void)
 int __init pch_pic_acpi_init(struct irq_domain *parent,
                                        struct acpi_madt_bio_pic *acpi_pchpic)
 {
-       int ret, vec_base;
+       int ret;
        struct fwnode_handle *domain_handle;
 
        if (find_pch_pic(acpi_pchpic->gsi_base) >= 0)
                return 0;
 
-       vec_base = acpi_pchpic->gsi_base - GSI_MIN_PCH_IRQ;
-
        domain_handle = irq_domain_alloc_fwnode(&acpi_pchpic->address);
        if (!domain_handle) {
                pr_err("Unable to allocate domain handle\n");
@@ -416,7 +414,7 @@ int __init pch_pic_acpi_init(struct irq_domain *parent,
        }
 
        ret = pch_pic_init(acpi_pchpic->address, acpi_pchpic->size,
-                               vec_base, parent, domain_handle, acpi_pchpic->gsi_base);
+                               0, parent, domain_handle, acpi_pchpic->gsi_base);
 
        if (ret < 0) {
                irq_domain_free_fwnode(domain_handle);
index eada5e0..5101a3f 100644 (file)
@@ -240,26 +240,27 @@ static int mbigen_of_create_domain(struct platform_device *pdev,
        struct irq_domain *domain;
        struct device_node *np;
        u32 num_pins;
+       int ret = 0;
+
+       parent = bus_get_dev_root(&platform_bus_type);
+       if (!parent)
+               return -ENODEV;
 
        for_each_child_of_node(pdev->dev.of_node, np) {
                if (!of_property_read_bool(np, "interrupt-controller"))
                        continue;
 
-               parent = bus_get_dev_root(&platform_bus_type);
-               if (parent) {
-                       child = of_platform_device_create(np, NULL, parent);
-                       put_device(parent);
-                       if (!child) {
-                               of_node_put(np);
-                               return -ENOMEM;
-                       }
+               child = of_platform_device_create(np, NULL, parent);
+               if (!child) {
+                       ret = -ENOMEM;
+                       break;
                }
 
                if (of_property_read_u32(child->dev.of_node, "num-pins",
                                         &num_pins) < 0) {
                        dev_err(&pdev->dev, "No num-pins property\n");
-                       of_node_put(np);
-                       return -EINVAL;
+                       ret = -EINVAL;
+                       break;
                }
 
                domain = platform_msi_create_device_domain(&child->dev, num_pins,
@@ -267,12 +268,16 @@ static int mbigen_of_create_domain(struct platform_device *pdev,
                                                           &mbigen_domain_ops,
                                                           mgn_chip);
                if (!domain) {
-                       of_node_put(np);
-                       return -ENOMEM;
+                       ret = -ENOMEM;
+                       break;
                }
        }
 
-       return 0;
+       put_device(parent);
+       if (ret)
+               of_node_put(np);
+
+       return ret;
 }
 
 #ifdef CONFIG_ACPI
index 2aaa9aa..7da18ef 100644 (file)
@@ -150,7 +150,7 @@ static const struct meson_gpio_irq_params s4_params = {
        INIT_MESON_S4_COMMON_DATA(82)
 };
 
-static const struct of_device_id meson_irq_gpio_matches[] = {
+static const struct of_device_id meson_irq_gpio_matches[] __maybe_unused = {
        { .compatible = "amlogic,meson8-gpio-intc", .data = &meson8_params },
        { .compatible = "amlogic,meson8b-gpio-intc", .data = &meson8b_params },
        { .compatible = "amlogic,meson-gxbb-gpio-intc", .data = &gxbb_params },
index 046c355..6d5ecc1 100644 (file)
@@ -50,7 +50,7 @@ void __iomem *mips_gic_base;
 
 static DEFINE_PER_CPU_READ_MOSTLY(unsigned long[GIC_MAX_LONGS], pcpu_masks);
 
-static DEFINE_SPINLOCK(gic_lock);
+static DEFINE_RAW_SPINLOCK(gic_lock);
 static struct irq_domain *gic_irq_domain;
 static int gic_shared_intrs;
 static unsigned int gic_cpu_pin;
@@ -210,7 +210,7 @@ static int gic_set_type(struct irq_data *d, unsigned int type)
 
        irq = GIC_HWIRQ_TO_SHARED(d->hwirq);
 
-       spin_lock_irqsave(&gic_lock, flags);
+       raw_spin_lock_irqsave(&gic_lock, flags);
        switch (type & IRQ_TYPE_SENSE_MASK) {
        case IRQ_TYPE_EDGE_FALLING:
                pol = GIC_POL_FALLING_EDGE;
@@ -250,7 +250,7 @@ static int gic_set_type(struct irq_data *d, unsigned int type)
        else
                irq_set_chip_handler_name_locked(d, &gic_level_irq_controller,
                                                 handle_level_irq, NULL);
-       spin_unlock_irqrestore(&gic_lock, flags);
+       raw_spin_unlock_irqrestore(&gic_lock, flags);
 
        return 0;
 }
@@ -268,7 +268,7 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *cpumask,
                return -EINVAL;
 
        /* Assumption : cpumask refers to a single CPU */
-       spin_lock_irqsave(&gic_lock, flags);
+       raw_spin_lock_irqsave(&gic_lock, flags);
 
        /* Re-route this IRQ */
        write_gic_map_vp(irq, BIT(mips_cm_vp_id(cpu)));
@@ -279,7 +279,7 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *cpumask,
                set_bit(irq, per_cpu_ptr(pcpu_masks, cpu));
 
        irq_data_update_effective_affinity(d, cpumask_of(cpu));
-       spin_unlock_irqrestore(&gic_lock, flags);
+       raw_spin_unlock_irqrestore(&gic_lock, flags);
 
        return IRQ_SET_MASK_OK;
 }
@@ -357,12 +357,12 @@ static void gic_mask_local_irq_all_vpes(struct irq_data *d)
        cd = irq_data_get_irq_chip_data(d);
        cd->mask = false;
 
-       spin_lock_irqsave(&gic_lock, flags);
+       raw_spin_lock_irqsave(&gic_lock, flags);
        for_each_online_cpu(cpu) {
                write_gic_vl_other(mips_cm_vp_id(cpu));
                write_gic_vo_rmask(BIT(intr));
        }
-       spin_unlock_irqrestore(&gic_lock, flags);
+       raw_spin_unlock_irqrestore(&gic_lock, flags);
 }
 
 static void gic_unmask_local_irq_all_vpes(struct irq_data *d)
@@ -375,12 +375,12 @@ static void gic_unmask_local_irq_all_vpes(struct irq_data *d)
        cd = irq_data_get_irq_chip_data(d);
        cd->mask = true;
 
-       spin_lock_irqsave(&gic_lock, flags);
+       raw_spin_lock_irqsave(&gic_lock, flags);
        for_each_online_cpu(cpu) {
                write_gic_vl_other(mips_cm_vp_id(cpu));
                write_gic_vo_smask(BIT(intr));
        }
-       spin_unlock_irqrestore(&gic_lock, flags);
+       raw_spin_unlock_irqrestore(&gic_lock, flags);
 }
 
 static void gic_all_vpes_irq_cpu_online(void)
@@ -393,19 +393,21 @@ static void gic_all_vpes_irq_cpu_online(void)
        unsigned long flags;
        int i;
 
-       spin_lock_irqsave(&gic_lock, flags);
+       raw_spin_lock_irqsave(&gic_lock, flags);
 
        for (i = 0; i < ARRAY_SIZE(local_intrs); i++) {
                unsigned int intr = local_intrs[i];
                struct gic_all_vpes_chip_data *cd;
 
+               if (!gic_local_irq_is_routable(intr))
+                       continue;
                cd = &gic_all_vpes_chip_data[intr];
                write_gic_vl_map(mips_gic_vx_map_reg(intr), cd->map);
                if (cd->mask)
                        write_gic_vl_smask(BIT(intr));
        }
 
-       spin_unlock_irqrestore(&gic_lock, flags);
+       raw_spin_unlock_irqrestore(&gic_lock, flags);
 }
 
 static struct irq_chip gic_all_vpes_local_irq_controller = {
@@ -435,11 +437,11 @@ static int gic_shared_irq_domain_map(struct irq_domain *d, unsigned int virq,
 
        data = irq_get_irq_data(virq);
 
-       spin_lock_irqsave(&gic_lock, flags);
+       raw_spin_lock_irqsave(&gic_lock, flags);
        write_gic_map_pin(intr, GIC_MAP_PIN_MAP_TO_PIN | gic_cpu_pin);
        write_gic_map_vp(intr, BIT(mips_cm_vp_id(cpu)));
        irq_data_update_effective_affinity(data, cpumask_of(cpu));
-       spin_unlock_irqrestore(&gic_lock, flags);
+       raw_spin_unlock_irqrestore(&gic_lock, flags);
 
        return 0;
 }
@@ -531,12 +533,12 @@ static int gic_irq_domain_map(struct irq_domain *d, unsigned int virq,
        if (!gic_local_irq_is_routable(intr))
                return -EPERM;
 
-       spin_lock_irqsave(&gic_lock, flags);
+       raw_spin_lock_irqsave(&gic_lock, flags);
        for_each_online_cpu(cpu) {
                write_gic_vl_other(mips_cm_vp_id(cpu));
                write_gic_vo_map(mips_gic_vx_map_reg(intr), map);
        }
-       spin_unlock_irqrestore(&gic_lock, flags);
+       raw_spin_unlock_irqrestore(&gic_lock, flags);
 
        return 0;
 }
index 83455ca..25cf4f8 100644 (file)
@@ -244,132 +244,6 @@ static void __exception_irq_entry mmp2_handle_irq(struct pt_regs *regs)
        generic_handle_domain_irq(icu_data[0].domain, hwirq);
 }
 
-/* MMP (ARMv5) */
-void __init icu_init_irq(void)
-{
-       int irq;
-
-       max_icu_nr = 1;
-       mmp_icu_base = ioremap(0xd4282000, 0x1000);
-       icu_data[0].conf_enable = mmp_conf.conf_enable;
-       icu_data[0].conf_disable = mmp_conf.conf_disable;
-       icu_data[0].conf_mask = mmp_conf.conf_mask;
-       icu_data[0].nr_irqs = 64;
-       icu_data[0].virq_base = 0;
-       icu_data[0].domain = irq_domain_add_legacy(NULL, 64, 0, 0,
-                                                  &irq_domain_simple_ops,
-                                                  &icu_data[0]);
-       for (irq = 0; irq < 64; irq++) {
-               icu_mask_irq(irq_get_irq_data(irq));
-               irq_set_chip_and_handler(irq, &icu_irq_chip, handle_level_irq);
-       }
-       irq_set_default_host(icu_data[0].domain);
-       set_handle_irq(mmp_handle_irq);
-}
-
-/* MMP2 (ARMv7) */
-void __init mmp2_init_icu(void)
-{
-       int irq, end;
-
-       max_icu_nr = 8;
-       mmp_icu_base = ioremap(0xd4282000, 0x1000);
-       icu_data[0].conf_enable = mmp2_conf.conf_enable;
-       icu_data[0].conf_disable = mmp2_conf.conf_disable;
-       icu_data[0].conf_mask = mmp2_conf.conf_mask;
-       icu_data[0].nr_irqs = 64;
-       icu_data[0].virq_base = 0;
-       icu_data[0].domain = irq_domain_add_legacy(NULL, 64, 0, 0,
-                                                  &irq_domain_simple_ops,
-                                                  &icu_data[0]);
-       icu_data[1].reg_status = mmp_icu_base + 0x150;
-       icu_data[1].reg_mask = mmp_icu_base + 0x168;
-       icu_data[1].clr_mfp_irq_base = icu_data[0].virq_base +
-                               icu_data[0].nr_irqs;
-       icu_data[1].clr_mfp_hwirq = 1;          /* offset to IRQ_MMP2_PMIC_BASE */
-       icu_data[1].nr_irqs = 2;
-       icu_data[1].cascade_irq = 4;
-       icu_data[1].virq_base = icu_data[0].virq_base + icu_data[0].nr_irqs;
-       icu_data[1].domain = irq_domain_add_legacy(NULL, icu_data[1].nr_irqs,
-                                                  icu_data[1].virq_base, 0,
-                                                  &irq_domain_simple_ops,
-                                                  &icu_data[1]);
-       icu_data[2].reg_status = mmp_icu_base + 0x154;
-       icu_data[2].reg_mask = mmp_icu_base + 0x16c;
-       icu_data[2].nr_irqs = 2;
-       icu_data[2].cascade_irq = 5;
-       icu_data[2].virq_base = icu_data[1].virq_base + icu_data[1].nr_irqs;
-       icu_data[2].domain = irq_domain_add_legacy(NULL, icu_data[2].nr_irqs,
-                                                  icu_data[2].virq_base, 0,
-                                                  &irq_domain_simple_ops,
-                                                  &icu_data[2]);
-       icu_data[3].reg_status = mmp_icu_base + 0x180;
-       icu_data[3].reg_mask = mmp_icu_base + 0x17c;
-       icu_data[3].nr_irqs = 3;
-       icu_data[3].cascade_irq = 9;
-       icu_data[3].virq_base = icu_data[2].virq_base + icu_data[2].nr_irqs;
-       icu_data[3].domain = irq_domain_add_legacy(NULL, icu_data[3].nr_irqs,
-                                                  icu_data[3].virq_base, 0,
-                                                  &irq_domain_simple_ops,
-                                                  &icu_data[3]);
-       icu_data[4].reg_status = mmp_icu_base + 0x158;
-       icu_data[4].reg_mask = mmp_icu_base + 0x170;
-       icu_data[4].nr_irqs = 5;
-       icu_data[4].cascade_irq = 17;
-       icu_data[4].virq_base = icu_data[3].virq_base + icu_data[3].nr_irqs;
-       icu_data[4].domain = irq_domain_add_legacy(NULL, icu_data[4].nr_irqs,
-                                                  icu_data[4].virq_base, 0,
-                                                  &irq_domain_simple_ops,
-                                                  &icu_data[4]);
-       icu_data[5].reg_status = mmp_icu_base + 0x15c;
-       icu_data[5].reg_mask = mmp_icu_base + 0x174;
-       icu_data[5].nr_irqs = 15;
-       icu_data[5].cascade_irq = 35;
-       icu_data[5].virq_base = icu_data[4].virq_base + icu_data[4].nr_irqs;
-       icu_data[5].domain = irq_domain_add_legacy(NULL, icu_data[5].nr_irqs,
-                                                  icu_data[5].virq_base, 0,
-                                                  &irq_domain_simple_ops,
-                                                  &icu_data[5]);
-       icu_data[6].reg_status = mmp_icu_base + 0x160;
-       icu_data[6].reg_mask = mmp_icu_base + 0x178;
-       icu_data[6].nr_irqs = 2;
-       icu_data[6].cascade_irq = 51;
-       icu_data[6].virq_base = icu_data[5].virq_base + icu_data[5].nr_irqs;
-       icu_data[6].domain = irq_domain_add_legacy(NULL, icu_data[6].nr_irqs,
-                                                  icu_data[6].virq_base, 0,
-                                                  &irq_domain_simple_ops,
-                                                  &icu_data[6]);
-       icu_data[7].reg_status = mmp_icu_base + 0x188;
-       icu_data[7].reg_mask = mmp_icu_base + 0x184;
-       icu_data[7].nr_irqs = 2;
-       icu_data[7].cascade_irq = 55;
-       icu_data[7].virq_base = icu_data[6].virq_base + icu_data[6].nr_irqs;
-       icu_data[7].domain = irq_domain_add_legacy(NULL, icu_data[7].nr_irqs,
-                                                  icu_data[7].virq_base, 0,
-                                                  &irq_domain_simple_ops,
-                                                  &icu_data[7]);
-       end = icu_data[7].virq_base + icu_data[7].nr_irqs;
-       for (irq = 0; irq < end; irq++) {
-               icu_mask_irq(irq_get_irq_data(irq));
-               if (irq == icu_data[1].cascade_irq ||
-                   irq == icu_data[2].cascade_irq ||
-                   irq == icu_data[3].cascade_irq ||
-                   irq == icu_data[4].cascade_irq ||
-                   irq == icu_data[5].cascade_irq ||
-                   irq == icu_data[6].cascade_irq ||
-                   irq == icu_data[7].cascade_irq) {
-                       irq_set_chip(irq, &icu_irq_chip);
-                       irq_set_chained_handler(irq, icu_mux_irq_demux);
-               } else {
-                       irq_set_chip_and_handler(irq, &icu_irq_chip,
-                                                handle_level_irq);
-               }
-       }
-       irq_set_default_host(icu_data[0].domain);
-       set_handle_irq(mmp2_handle_irq);
-}
-
-#ifdef CONFIG_OF
 static int __init mmp_init_bases(struct device_node *node)
 {
        int ret, nr_irqs, irq, i = 0;
@@ -548,4 +422,3 @@ err:
        return -EINVAL;
 }
 IRQCHIP_DECLARE(mmp2_mux_intc, "mrvl,mmp2-mux-intc", mmp2_mux_of_init);
-#endif
index 55cb6b5..be96806 100644 (file)
@@ -201,6 +201,7 @@ static int __init icoll_of_init(struct device_node *np,
        stmp_reset_block(icoll_priv.ctrl);
 
        icoll_add_domain(np, ICOLL_NUM_IRQS);
+       set_handle_irq(icoll_handle_irq);
 
        return 0;
 }
index 6a3f749..b5fa76c 100644 (file)
@@ -173,6 +173,16 @@ static struct irq_chip stm32_exti_h_chip_direct;
 #define EXTI_INVALID_IRQ       U8_MAX
 #define STM32MP1_DESC_IRQ_SIZE (ARRAY_SIZE(stm32mp1_exti_banks) * IRQS_PER_BANK)
 
+/*
+ * Use some intentionally tricky logic here to initialize the whole array to
+ * EXTI_INVALID_IRQ, but then override certain fields, requiring us to indicate
+ * that we "know" that there are overrides in this structure, and we'll need to
+ * disable that warning from W=1 builds.
+ */
+__diag_push();
+__diag_ignore_all("-Woverride-init",
+                 "logic to initialize all and then override some is OK");
+
 static const u8 stm32mp1_desc_irq[] = {
        /* default value */
        [0 ... (STM32MP1_DESC_IRQ_SIZE - 1)] = EXTI_INVALID_IRQ,
@@ -208,6 +218,7 @@ static const u8 stm32mp1_desc_irq[] = {
        [31] = 53,
        [32] = 82,
        [33] = 83,
+       [46] = 151,
        [47] = 93,
        [48] = 138,
        [50] = 139,
@@ -266,6 +277,8 @@ static const u8 stm32mp13_desc_irq[] = {
        [70] = 98,
 };
 
+__diag_pop();
+
 static const struct stm32_exti_drv_data stm32mp1_drv_data = {
        .exti_banks = stm32mp1_exti_banks,
        .bank_nr = ARRAY_SIZE(stm32mp1_exti_banks),
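
A minimal sketch of the same fill-everything-then-override idiom bracketed by __diag_push()/__diag_ignore_all()/__diag_pop(); the table name and values are invented for illustration.

#include <linux/compiler.h>
#include <linux/types.h>

#define EXAMPLE_INVALID	0xff
#define EXAMPLE_SIZE	96

__diag_push();
__diag_ignore_all("-Woverride-init",
		  "default every entry first, then override the valid ones");

static const u8 example_map[EXAMPLE_SIZE] = {
	[0 ... (EXAMPLE_SIZE - 1)] = EXAMPLE_INVALID,	/* default value */
	[0] = 6,					/* deliberate overrides */
	[1] = 7,
};

__diag_pop();
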
index 55a0372..1c84981 100644 (file)
@@ -312,14 +312,14 @@ static int lpg_calc_freq(struct lpg_channel *chan, uint64_t period)
                max_res = LPG_RESOLUTION_9BIT;
        }
 
-       min_period = (u64)NSEC_PER_SEC *
-                       div64_u64((1 << pwm_resolution_arr[0]), clk_rate_arr[clk_len - 1]);
+       min_period = div64_u64((u64)NSEC_PER_SEC * (1 << pwm_resolution_arr[0]),
+                              clk_rate_arr[clk_len - 1]);
        if (period <= min_period)
                return -EINVAL;
 
        /* Limit period to largest possible value, to avoid overflows */
-       max_period = (u64)NSEC_PER_SEC * max_res * LPG_MAX_PREDIV *
-                       div64_u64((1 << LPG_MAX_M), 1024);
+       max_period = div64_u64((u64)NSEC_PER_SEC * max_res * LPG_MAX_PREDIV * (1 << LPG_MAX_M),
+                              1024);
        if (period > max_period)
                period = max_period;
 
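
The reordering above keeps the multiplication ahead of the single division, so integer truncation can no longer zero out an intermediate quotient. A small illustrative helper with hypothetical parameter names, showing the safe ordering with div64_u64():

#include <linux/math64.h>
#include <linux/time64.h>

static u64 example_period_ns(u64 resolution, u64 prediv, u64 m, u64 clk_rate)
{
	/*
	 * Dividing first, e.g. NSEC_PER_SEC * div64_u64(m, 1024), truncates
	 * before scaling; forming the full product and dividing once at the
	 * end preserves the precision.
	 */
	return div64_u64((u64)NSEC_PER_SEC * resolution * prediv * m, clk_rate);
}
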
index c4a705c..fc6a12a 100644 (file)
@@ -98,6 +98,7 @@ static ssize_t mbox_test_message_write(struct file *filp,
                                       size_t count, loff_t *ppos)
 {
        struct mbox_test_device *tdev = filp->private_data;
+       char *message;
        void *data;
        int ret;
 
@@ -113,12 +114,13 @@ static ssize_t mbox_test_message_write(struct file *filp,
                return -EINVAL;
        }
 
-       mutex_lock(&tdev->mutex);
-
-       tdev->message = kzalloc(MBOX_MAX_MSG_LEN, GFP_KERNEL);
-       if (!tdev->message)
+       message = kzalloc(MBOX_MAX_MSG_LEN, GFP_KERNEL);
+       if (!message)
                return -ENOMEM;
 
+       mutex_lock(&tdev->mutex);
+
+       tdev->message = message;
        ret = copy_from_user(tdev->message, userbuf, count);
        if (ret) {
                ret = -EFAULT;
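
The change above allocates the buffer before taking the mutex, so the -ENOMEM path can no longer return with the lock held. A stripped-down sketch of that pattern with invented names:

#include <linux/mutex.h>
#include <linux/slab.h>

static DEFINE_MUTEX(example_lock);
static void *example_buf;

static int example_store(size_t len)
{
	void *buf = kzalloc(len, GFP_KERNEL);

	if (!buf)
		return -ENOMEM;		/* nothing is locked yet */

	mutex_lock(&example_lock);
	kfree(example_buf);		/* replace any previous buffer */
	example_buf = buf;
	mutex_unlock(&example_lock);
	return 0;
}
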
index aebb7ef..5a79bb3 100644 (file)
@@ -275,7 +275,7 @@ struct bcache_device {
 
        int (*cache_miss)(struct btree *b, struct search *s,
                          struct bio *bio, unsigned int sectors);
-       int (*ioctl)(struct bcache_device *d, fmode_t mode,
+       int (*ioctl)(struct bcache_device *d, blk_mode_t mode,
                     unsigned int cmd, unsigned long arg);
 };
 
@@ -1004,11 +1004,11 @@ extern struct workqueue_struct *bch_flush_wq;
 extern struct mutex bch_register_lock;
 extern struct list_head bch_cache_sets;
 
-extern struct kobj_type bch_cached_dev_ktype;
-extern struct kobj_type bch_flash_dev_ktype;
-extern struct kobj_type bch_cache_set_ktype;
-extern struct kobj_type bch_cache_set_internal_ktype;
-extern struct kobj_type bch_cache_ktype;
+extern const struct kobj_type bch_cached_dev_ktype;
+extern const struct kobj_type bch_flash_dev_ktype;
+extern const struct kobj_type bch_cache_set_ktype;
+extern const struct kobj_type bch_cache_set_internal_ktype;
+extern const struct kobj_type bch_cache_ktype;
 
 void bch_cached_dev_release(struct kobject *kobj);
 void bch_flash_dev_release(struct kobject *kobj);
index 147c493..fd121a6 100644 (file)
@@ -559,6 +559,27 @@ static void mca_data_alloc(struct btree *b, struct bkey *k, gfp_t gfp)
        }
 }
 
+#define cmp_int(l, r)          ((l > r) - (l < r))
+
+#ifdef CONFIG_PROVE_LOCKING
+static int btree_lock_cmp_fn(const struct lockdep_map *_a,
+                            const struct lockdep_map *_b)
+{
+       const struct btree *a = container_of(_a, struct btree, lock.dep_map);
+       const struct btree *b = container_of(_b, struct btree, lock.dep_map);
+
+       return -cmp_int(a->level, b->level) ?: bkey_cmp(&a->key, &b->key);
+}
+
+static void btree_lock_print_fn(const struct lockdep_map *map)
+{
+       const struct btree *b = container_of(map, struct btree, lock.dep_map);
+
+       printk(KERN_CONT " l=%u %llu:%llu", b->level,
+              KEY_INODE(&b->key), KEY_OFFSET(&b->key));
+}
+#endif
+
 static struct btree *mca_bucket_alloc(struct cache_set *c,
                                      struct bkey *k, gfp_t gfp)
 {
@@ -572,7 +593,7 @@ static struct btree *mca_bucket_alloc(struct cache_set *c,
                return NULL;
 
        init_rwsem(&b->lock);
-       lockdep_set_novalidate_class(&b->lock);
+       lock_set_cmp_fn(&b->lock, btree_lock_cmp_fn, btree_lock_print_fn);
        mutex_init(&b->write_lock);
        lockdep_set_novalidate_class(&b->write_lock);
        INIT_LIST_HEAD(&b->list);
@@ -885,7 +906,7 @@ static struct btree *mca_cannibalize(struct cache_set *c, struct btree_op *op,
  * cannibalize_bucket() will take. This means every time we unlock the root of
  * the btree, we need to release this lock if we have it held.
  */
-static void bch_cannibalize_unlock(struct cache_set *c)
+void bch_cannibalize_unlock(struct cache_set *c)
 {
        spin_lock(&c->btree_cannibalize_lock);
        if (c->btree_cache_alloc_lock == current) {
@@ -1090,10 +1111,12 @@ struct btree *__bch_btree_node_alloc(struct cache_set *c, struct btree_op *op,
                                     struct btree *parent)
 {
        BKEY_PADDED(key) k;
-       struct btree *b = ERR_PTR(-EAGAIN);
+       struct btree *b;
 
        mutex_lock(&c->bucket_lock);
 retry:
+       /* return ERR_PTR(-EAGAIN) when it fails */
+       b = ERR_PTR(-EAGAIN);
        if (__bch_bucket_alloc_set(c, RESERVE_BTREE, &k.key, wait))
                goto err;
 
@@ -1138,7 +1161,7 @@ static struct btree *btree_node_alloc_replacement(struct btree *b,
 {
        struct btree *n = bch_btree_node_alloc(b->c, op, b->level, b->parent);
 
-       if (!IS_ERR_OR_NULL(n)) {
+       if (!IS_ERR(n)) {
                mutex_lock(&n->write_lock);
                bch_btree_sort_into(&b->keys, &n->keys, &b->c->sort);
                bkey_copy_key(&n->key, &b->key);
@@ -1340,7 +1363,7 @@ static int btree_gc_coalesce(struct btree *b, struct btree_op *op,
        memset(new_nodes, 0, sizeof(new_nodes));
        closure_init_stack(&cl);
 
-       while (nodes < GC_MERGE_NODES && !IS_ERR_OR_NULL(r[nodes].b))
+       while (nodes < GC_MERGE_NODES && !IS_ERR(r[nodes].b))
                keys += r[nodes++].keys;
 
        blocks = btree_default_blocks(b->c) * 2 / 3;
@@ -1352,7 +1375,7 @@ static int btree_gc_coalesce(struct btree *b, struct btree_op *op,
 
        for (i = 0; i < nodes; i++) {
                new_nodes[i] = btree_node_alloc_replacement(r[i].b, NULL);
-               if (IS_ERR_OR_NULL(new_nodes[i]))
+               if (IS_ERR(new_nodes[i]))
                        goto out_nocoalesce;
        }
 
@@ -1487,7 +1510,7 @@ out_nocoalesce:
        bch_keylist_free(&keylist);
 
        for (i = 0; i < nodes; i++)
-               if (!IS_ERR_OR_NULL(new_nodes[i])) {
+               if (!IS_ERR(new_nodes[i])) {
                        btree_node_free(new_nodes[i]);
                        rw_unlock(true, new_nodes[i]);
                }
@@ -1669,7 +1692,7 @@ static int bch_btree_gc_root(struct btree *b, struct btree_op *op,
        if (should_rewrite) {
                n = btree_node_alloc_replacement(b, NULL);
 
-               if (!IS_ERR_OR_NULL(n)) {
+               if (!IS_ERR(n)) {
                        bch_btree_node_write_sync(n);
 
                        bch_btree_set_root(n);
@@ -1968,6 +1991,15 @@ static int bch_btree_check_thread(void *arg)
                        c->gc_stats.nodes++;
                        bch_btree_op_init(&op, 0);
                        ret = bcache_btree(check_recurse, p, c->root, &op);
+                       /*
+                        * The op may have been added to cache_set's
+                        * btree_cache_wait in mca_cannibalize(), so it must be
+                        * removed from that list and btree_cache_alloc_lock
+                        * released before the op memory is freed.
+                        * Otherwise btree_cache_wait would be corrupted.
+                        */
+                       bch_cannibalize_unlock(c);
+                       finish_wait(&c->btree_cache_wait, &(&op)->wait);
                        if (ret)
                                goto out;
                }
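
Several hunks above replace IS_ERR_OR_NULL() with IS_ERR() because the btree node allocators now return either a valid pointer or an ERR_PTR() value, never NULL. A small self-contained sketch of that convention (the structure and helpers are hypothetical):

#include <linux/err.h>
#include <linux/slab.h>

struct example_node {
	int level;
};

/* Returns a valid node or ERR_PTR(-ENOMEM); never NULL. */
static struct example_node *example_node_alloc(gfp_t gfp)
{
	struct example_node *n = kzalloc(sizeof(*n), gfp);

	return n ? n : ERR_PTR(-ENOMEM);
}

static int example_use(void)
{
	struct example_node *n = example_node_alloc(GFP_KERNEL);

	if (IS_ERR(n))		/* no separate NULL case to handle */
		return PTR_ERR(n);

	kfree(n);
	return 0;
}
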
index 1b5fdbc..45d64b5 100644 (file)
@@ -247,8 +247,8 @@ static inline void bch_btree_op_init(struct btree_op *op, int write_lock_level)
 
 static inline void rw_lock(bool w, struct btree *b, int level)
 {
-       w ? down_write_nested(&b->lock, level + 1)
-         : down_read_nested(&b->lock, level + 1);
+       w ? down_write(&b->lock)
+         : down_read(&b->lock);
        if (w)
                b->seq++;
 }
@@ -282,6 +282,7 @@ void bch_initial_gc_finish(struct cache_set *c);
 void bch_moving_gc(struct cache_set *c);
 int bch_btree_check(struct cache_set *c);
 void bch_initial_mark_key(struct cache_set *c, int level, struct bkey *k);
+void bch_cannibalize_unlock(struct cache_set *c);
 
 static inline void wake_up_gc(struct cache_set *c)
 {
index 67a2e29..a9b1f38 100644 (file)
@@ -1228,7 +1228,7 @@ void cached_dev_submit_bio(struct bio *bio)
                detached_dev_do_request(d, bio, orig_bdev, start_time);
 }
 
-static int cached_dev_ioctl(struct bcache_device *d, fmode_t mode,
+static int cached_dev_ioctl(struct bcache_device *d, blk_mode_t mode,
                            unsigned int cmd, unsigned long arg)
 {
        struct cached_dev *dc = container_of(d, struct cached_dev, disk);
@@ -1318,7 +1318,7 @@ void flash_dev_submit_bio(struct bio *bio)
        continue_at(cl, search_free, NULL);
 }
 
-static int flash_dev_ioctl(struct bcache_device *d, fmode_t mode,
+static int flash_dev_ioctl(struct bcache_device *d, blk_mode_t mode,
                           unsigned int cmd, unsigned long arg)
 {
        return -ENOTTY;
index bd3afc8..21b445f 100644 (file)
@@ -18,7 +18,6 @@ struct cache_stats {
        unsigned long cache_misses;
        unsigned long cache_bypass_hits;
        unsigned long cache_bypass_misses;
-       unsigned long cache_readaheads;
        unsigned long cache_miss_collisions;
        unsigned long sectors_bypassed;
 
index 7e9d19f..e2a8036 100644 (file)
@@ -732,9 +732,9 @@ out:
 
 /* Bcache device */
 
-static int open_dev(struct block_device *b, fmode_t mode)
+static int open_dev(struct gendisk *disk, blk_mode_t mode)
 {
-       struct bcache_device *d = b->bd_disk->private_data;
+       struct bcache_device *d = disk->private_data;
 
        if (test_bit(BCACHE_DEV_CLOSING, &d->flags))
                return -ENXIO;
@@ -743,14 +743,14 @@ static int open_dev(struct block_device *b, fmode_t mode)
        return 0;
 }
 
-static void release_dev(struct gendisk *b, fmode_t mode)
+static void release_dev(struct gendisk *b)
 {
        struct bcache_device *d = b->private_data;
 
        closure_put(&d->cl);
 }
 
-static int ioctl_dev(struct block_device *b, fmode_t mode,
+static int ioctl_dev(struct block_device *b, blk_mode_t mode,
                     unsigned int cmd, unsigned long arg)
 {
        struct bcache_device *d = b->bd_disk->private_data;
@@ -1369,7 +1369,7 @@ static void cached_dev_free(struct closure *cl)
                put_page(virt_to_page(dc->sb_disk));
 
        if (!IS_ERR_OR_NULL(dc->bdev))
-               blkdev_put(dc->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
+               blkdev_put(dc->bdev, bcache_kobj);
 
        wake_up(&unregister_wait);
 
@@ -1723,7 +1723,7 @@ static void cache_set_flush(struct closure *cl)
        if (!IS_ERR_OR_NULL(c->gc_thread))
                kthread_stop(c->gc_thread);
 
-       if (!IS_ERR_OR_NULL(c->root))
+       if (!IS_ERR(c->root))
                list_add(&c->root->list, &c->btree_cache);
 
        /*
@@ -2087,7 +2087,7 @@ static int run_cache_set(struct cache_set *c)
 
                err = "cannot allocate new btree root";
                c->root = __bch_btree_node_alloc(c, NULL, 0, true, NULL);
-               if (IS_ERR_OR_NULL(c->root))
+               if (IS_ERR(c->root))
                        goto err;
 
                mutex_lock(&c->root->write_lock);
@@ -2218,7 +2218,7 @@ void bch_cache_release(struct kobject *kobj)
                put_page(virt_to_page(ca->sb_disk));
 
        if (!IS_ERR_OR_NULL(ca->bdev))
-               blkdev_put(ca->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
+               blkdev_put(ca->bdev, bcache_kobj);
 
        kfree(ca);
        module_put(THIS_MODULE);
@@ -2359,7 +2359,7 @@ static int register_cache(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
                 * call blkdev_put() to bdev in bch_cache_release(). So we
                 * explicitly call blkdev_put() here.
                 */
-               blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
+               blkdev_put(bdev, bcache_kobj);
                if (ret == -ENOMEM)
                        err = "cache_alloc(): -ENOMEM";
                else if (ret == -EPERM)
@@ -2461,7 +2461,7 @@ static void register_bdev_worker(struct work_struct *work)
        if (!dc) {
                fail = true;
                put_page(virt_to_page(args->sb_disk));
-               blkdev_put(args->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
+               blkdev_put(args->bdev, bcache_kobj);
                goto out;
        }
 
@@ -2491,7 +2491,7 @@ static void register_cache_worker(struct work_struct *work)
        if (!ca) {
                fail = true;
                put_page(virt_to_page(args->sb_disk));
-               blkdev_put(args->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
+               blkdev_put(args->bdev, bcache_kobj);
                goto out;
        }
 
@@ -2558,9 +2558,8 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 
        ret = -EINVAL;
        err = "failed to open device";
-       bdev = blkdev_get_by_path(strim(path),
-                                 FMODE_READ|FMODE_WRITE|FMODE_EXCL,
-                                 sb);
+       bdev = blkdev_get_by_path(strim(path), BLK_OPEN_READ | BLK_OPEN_WRITE,
+                                 bcache_kobj, NULL);
        if (IS_ERR(bdev)) {
                if (bdev == ERR_PTR(-EBUSY)) {
                        dev_t dev;
@@ -2648,7 +2647,7 @@ async_done:
 out_put_sb_page:
        put_page(virt_to_page(sb_disk));
 out_blkdev_put:
-       blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
+       blkdev_put(bdev, register_bcache);
 out_free_sb:
        kfree(sb);
 out_free_path:
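
The blkdev_get_by_path()/blkdev_put() hunks above move from FMODE_* flags with an implicit FMODE_EXCL to BLK_OPEN_* flags plus an explicit holder cookie that must match between get and put. A rough sketch of that pairing, mirroring the calls in the diff with an invented holder and no real error handling:

#include <linux/blkdev.h>
#include <linux/err.h>

static char example_holder;	/* any stable address works as the holder */

static int example_probe_bdev(const char *path)
{
	struct block_device *bdev;

	bdev = blkdev_get_by_path(path, BLK_OPEN_READ | BLK_OPEN_WRITE,
				  &example_holder, NULL);
	if (IS_ERR(bdev))
		return PTR_ERR(bdev);

	/* ... use the device ... */

	blkdev_put(bdev, &example_holder);	/* holder must match the get */
	return 0;
}
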
index c6f6770..0e2c188 100644 (file)
@@ -1111,26 +1111,25 @@ SHOW(__bch_cache)
 
                vfree(p);
 
-               ret = scnprintf(buf, PAGE_SIZE,
-                               "Unused:                %zu%%\n"
-                               "Clean:         %zu%%\n"
-                               "Dirty:         %zu%%\n"
-                               "Metadata:      %zu%%\n"
-                               "Average:       %llu\n"
-                               "Sectors per Q: %zu\n"
-                               "Quantiles:     [",
-                               unused * 100 / (size_t) ca->sb.nbuckets,
-                               available * 100 / (size_t) ca->sb.nbuckets,
-                               dirty * 100 / (size_t) ca->sb.nbuckets,
-                               meta * 100 / (size_t) ca->sb.nbuckets, sum,
-                               n * ca->sb.bucket_size / (ARRAY_SIZE(q) + 1));
+               ret = sysfs_emit(buf,
+                                "Unused:               %zu%%\n"
+                                "Clean:                %zu%%\n"
+                                "Dirty:                %zu%%\n"
+                                "Metadata:     %zu%%\n"
+                                "Average:      %llu\n"
+                                "Sectors per Q:        %zu\n"
+                                "Quantiles:    [",
+                                unused * 100 / (size_t) ca->sb.nbuckets,
+                                available * 100 / (size_t) ca->sb.nbuckets,
+                                dirty * 100 / (size_t) ca->sb.nbuckets,
+                                meta * 100 / (size_t) ca->sb.nbuckets, sum,
+                                n * ca->sb.bucket_size / (ARRAY_SIZE(q) + 1));
 
                for (i = 0; i < ARRAY_SIZE(q); i++)
-                       ret += scnprintf(buf + ret, PAGE_SIZE - ret,
-                                        "%u ", q[i]);
+                       ret += sysfs_emit_at(buf, ret, "%u ", q[i]);
                ret--;
 
-               ret += scnprintf(buf + ret, PAGE_SIZE - ret, "]\n");
+               ret += sysfs_emit_at(buf, ret, "]\n");
 
                return ret;
        }
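
The sysfs_emit()/sysfs_emit_at() conversion above drops the manual PAGE_SIZE bookkeeping, since those helpers bound the output to one page themselves. A minimal hypothetical show() callback using the same calls:

#include <linux/kobject.h>
#include <linux/sysfs.h>

static ssize_t example_show(struct kobject *kobj, struct kobj_attribute *attr,
			    char *buf)
{
	int i;
	ssize_t ret = sysfs_emit(buf, "quantiles:");

	for (i = 0; i < 4; i++)
		ret += sysfs_emit_at(buf, ret, " %d", i);

	ret += sysfs_emit_at(buf, ret, "\n");
	return ret;
}
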
index a2ff644..65b8bd9 100644 (file)
@@ -3,7 +3,7 @@
 #define _BCACHE_SYSFS_H_
 
 #define KTYPE(type)                                                    \
-struct kobj_type type ## _ktype = {                                    \
+const struct kobj_type type ## _ktype = {                                      \
        .release        = type ## _release,                             \
        .sysfs_ops      = &((const struct sysfs_ops) {                  \
                .show   = type ## _show,                                \
index d4a5fc0..24c0490 100644 (file)
@@ -890,6 +890,16 @@ static int bch_root_node_dirty_init(struct cache_set *c,
        if (ret < 0)
                pr_warn("sectors dirty init failed, ret=%d!\n", ret);
 
+       /*
+        * The op may have been added to cache_set's btree_cache_wait
+        * in mca_cannibalize(), so it must be removed from that list
+        * and btree_cache_alloc_lock released before the op memory is
+        * freed. Otherwise btree_cache_wait would be corrupted.
+        */
+       bch_cannibalize_unlock(c);
+       finish_wait(&c->btree_cache_wait, &(&op.op)->wait);
+
        return ret;
 }
 
index 9e0c699..acffed7 100644 (file)
@@ -1828,7 +1828,7 @@ int dm_cache_metadata_abort(struct dm_cache_metadata *cmd)
         * Replacement block manager (new_bm) is created and old_bm destroyed outside of
         * cmd root_lock to avoid ABBA deadlock that would result (due to life-cycle of
         * shrinker associated with the block manager's bufio client vs cmd root_lock).
-        * - must take shrinker_mutex without holding cmd->root_lock
+        * - must take shrinker_rwsem without holding cmd->root_lock
         */
        new_bm = dm_block_manager_create(cmd->bdev, DM_CACHE_METADATA_BLOCK_SIZE << SECTOR_SHIFT,
                                         CACHE_MAX_CONCURRENT_LOCKS);
index 8728962..911f73f 100644 (file)
@@ -2051,8 +2051,8 @@ static int parse_metadata_dev(struct cache_args *ca, struct dm_arg_set *as,
        if (!at_least_one_arg(as, error))
                return -EINVAL;
 
-       r = dm_get_device(ca->ti, dm_shift_arg(as), FMODE_READ | FMODE_WRITE,
-                         &ca->metadata_dev);
+       r = dm_get_device(ca->ti, dm_shift_arg(as),
+                         BLK_OPEN_READ | BLK_OPEN_WRITE, &ca->metadata_dev);
        if (r) {
                *error = "Error opening metadata device";
                return r;
@@ -2074,8 +2074,8 @@ static int parse_cache_dev(struct cache_args *ca, struct dm_arg_set *as,
        if (!at_least_one_arg(as, error))
                return -EINVAL;
 
-       r = dm_get_device(ca->ti, dm_shift_arg(as), FMODE_READ | FMODE_WRITE,
-                         &ca->cache_dev);
+       r = dm_get_device(ca->ti, dm_shift_arg(as),
+                         BLK_OPEN_READ | BLK_OPEN_WRITE, &ca->cache_dev);
        if (r) {
                *error = "Error opening cache device";
                return r;
@@ -2093,8 +2093,8 @@ static int parse_origin_dev(struct cache_args *ca, struct dm_arg_set *as,
        if (!at_least_one_arg(as, error))
                return -EINVAL;
 
-       r = dm_get_device(ca->ti, dm_shift_arg(as), FMODE_READ | FMODE_WRITE,
-                         &ca->origin_dev);
+       r = dm_get_device(ca->ti, dm_shift_arg(as),
+                         BLK_OPEN_READ | BLK_OPEN_WRITE, &ca->origin_dev);
        if (r) {
                *error = "Error opening origin device";
                return r;
index f467cdb..94b2fc3 100644 (file)
@@ -1683,8 +1683,8 @@ static int parse_metadata_dev(struct clone *clone, struct dm_arg_set *as, char *
        int r;
        sector_t metadata_dev_size;
 
-       r = dm_get_device(clone->ti, dm_shift_arg(as), FMODE_READ | FMODE_WRITE,
-                         &clone->metadata_dev);
+       r = dm_get_device(clone->ti, dm_shift_arg(as),
+                         BLK_OPEN_READ | BLK_OPEN_WRITE, &clone->metadata_dev);
        if (r) {
                *error = "Error opening metadata device";
                return r;
@@ -1703,8 +1703,8 @@ static int parse_dest_dev(struct clone *clone, struct dm_arg_set *as, char **err
        int r;
        sector_t dest_dev_size;
 
-       r = dm_get_device(clone->ti, dm_shift_arg(as), FMODE_READ | FMODE_WRITE,
-                         &clone->dest_dev);
+       r = dm_get_device(clone->ti, dm_shift_arg(as),
+                         BLK_OPEN_READ | BLK_OPEN_WRITE, &clone->dest_dev);
        if (r) {
                *error = "Error opening destination device";
                return r;
@@ -1725,7 +1725,7 @@ static int parse_source_dev(struct clone *clone, struct dm_arg_set *as, char **e
        int r;
        sector_t source_dev_size;
 
-       r = dm_get_device(clone->ti, dm_shift_arg(as), FMODE_READ,
+       r = dm_get_device(clone->ti, dm_shift_arg(as), BLK_OPEN_READ,
                          &clone->source_dev);
        if (r) {
                *error = "Error opening source device";
index aecab0c..ce913ad 100644 (file)
@@ -207,11 +207,10 @@ struct dm_table {
        unsigned integrity_added:1;
 
        /*
-        * Indicates the rw permissions for the new logical
-        * device.  This should be a combination of FMODE_READ
-        * and FMODE_WRITE.
+        * Indicates the rw permissions for the new logical device.  This
+        * should be a combination of BLK_OPEN_READ and BLK_OPEN_WRITE.
         */
-       fmode_t mode;
+       blk_mode_t mode;
 
        /* a list of devices used by this table */
        struct list_head devices;
index 8b47b91..09e37eb 100644 (file)
@@ -1693,8 +1693,7 @@ retry:
 
                len = (remaining_size > PAGE_SIZE) ? PAGE_SIZE : remaining_size;
 
-               bio_add_page(clone, page, len, 0);
-
+               __bio_add_page(clone, page, len, 0);
                remaining_size -= len;
        }
 
index 0d70914..6acfa5b 100644 (file)
@@ -1482,14 +1482,16 @@ static int era_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 
        era->ti = ti;
 
-       r = dm_get_device(ti, argv[0], FMODE_READ | FMODE_WRITE, &era->metadata_dev);
+       r = dm_get_device(ti, argv[0], BLK_OPEN_READ | BLK_OPEN_WRITE,
+                         &era->metadata_dev);
        if (r) {
                ti->error = "Error opening metadata device";
                era_destroy(era);
                return -EINVAL;
        }
 
-       r = dm_get_device(ti, argv[1], FMODE_READ | FMODE_WRITE, &era->origin_dev);
+       r = dm_get_device(ti, argv[1], BLK_OPEN_READ | BLK_OPEN_WRITE,
+                         &era->origin_dev);
        if (r) {
                ti->error = "Error opening data device";
                era_destroy(era);
index d369457..2a71bcd 100644 (file)
@@ -293,8 +293,10 @@ static int __init dm_init_init(void)
 
        for (i = 0; i < ARRAY_SIZE(waitfor); i++) {
                if (waitfor[i]) {
+                       dev_t dev;
+
                        DMINFO("waiting for device %s ...", waitfor[i]);
-                       while (!dm_get_dev_t(waitfor[i]))
+                       while (early_lookup_bdev(waitfor[i], &dev))
                                fsleep(5000);
                }
        }
index 31838b1..63ec502 100644 (file)
@@ -4268,10 +4268,10 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned int argc, char **argv
        }
 
        /*
-        * If this workqueue were percpu, it would cause bio reordering
+        * If this workqueue weren't ordered, it would cause bio reordering
         * and reduced performance.
         */
-       ic->wait_wq = alloc_workqueue("dm-integrity-wait", WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
+       ic->wait_wq = alloc_ordered_workqueue("dm-integrity-wait", WQ_MEM_RECLAIM);
        if (!ic->wait_wq) {
                ti->error = "Cannot allocate workqueue";
                r = -ENOMEM;
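
alloc_workqueue() with max_active = 1 does not by itself guarantee the strict one-at-a-time, in-order execution the driver relies on, which is why the hunk above switches to alloc_ordered_workqueue(). A tiny sketch with an invented queue name:

#include <linux/workqueue.h>

static struct workqueue_struct *example_wq;

static int example_init_wq(void)
{
	example_wq = alloc_ordered_workqueue("example-ordered", WQ_MEM_RECLAIM);
	if (!example_wq)
		return -ENOMEM;
	return 0;
}
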
index cc77cf3..6d30101 100644 (file)
@@ -861,7 +861,7 @@ static void __dev_status(struct mapped_device *md, struct dm_ioctl *param)
 
                table = dm_get_inactive_table(md, &srcu_idx);
                if (table) {
-                       if (!(dm_table_get_mode(table) & FMODE_WRITE))
+                       if (!(dm_table_get_mode(table) & BLK_OPEN_WRITE))
                                param->flags |= DM_READONLY_FLAG;
                        param->target_count = table->num_targets;
                }
@@ -1168,13 +1168,10 @@ static int do_resume(struct dm_ioctl *param)
        /* Do we need to load a new map ? */
        if (new_map) {
                sector_t old_size, new_size;
-               int srcu_idx;
 
                /* Suspend if it isn't already suspended */
-               old_map = dm_get_live_table(md, &srcu_idx);
-               if ((param->flags & DM_SKIP_LOCKFS_FLAG) || !old_map)
+               if (param->flags & DM_SKIP_LOCKFS_FLAG)
                        suspend_flags &= ~DM_SUSPEND_LOCKFS_FLAG;
-               dm_put_live_table(md, srcu_idx);
                if (param->flags & DM_NOFLUSH_FLAG)
                        suspend_flags |= DM_SUSPEND_NOFLUSH_FLAG;
                if (!dm_suspended_md(md))
@@ -1192,7 +1189,7 @@ static int do_resume(struct dm_ioctl *param)
                if (old_size && new_size && old_size != new_size)
                        need_resize_uevent = true;
 
-               if (dm_table_get_mode(new_map) & FMODE_WRITE)
+               if (dm_table_get_mode(new_map) & BLK_OPEN_WRITE)
                        set_disk_ro(dm_disk(md), 0);
                else
                        set_disk_ro(dm_disk(md), 1);
@@ -1381,12 +1378,12 @@ static int dev_arm_poll(struct file *filp, struct dm_ioctl *param, size_t param_
        return 0;
 }
 
-static inline fmode_t get_mode(struct dm_ioctl *param)
+static inline blk_mode_t get_mode(struct dm_ioctl *param)
 {
-       fmode_t mode = FMODE_READ | FMODE_WRITE;
+       blk_mode_t mode = BLK_OPEN_READ | BLK_OPEN_WRITE;
 
        if (param->flags & DM_READONLY_FLAG)
-               mode = FMODE_READ;
+               mode = BLK_OPEN_READ;
 
        return mode;
 }
index c8821fc..8846bf5 100644 (file)
@@ -3750,11 +3750,11 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
                 * canceling read-auto mode
                 */
                mddev->ro = 0;
-               if (!mddev->suspended && mddev->sync_thread)
+               if (!mddev->suspended)
                        md_wakeup_thread(mddev->sync_thread);
        }
        set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-       if (!mddev->suspended && mddev->thread)
+       if (!mddev->suspended)
                md_wakeup_thread(mddev->thread);
 
        return 0;
index 9c49f53..bf7a574 100644 (file)
@@ -1241,9 +1241,8 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
        int i;
        int r = -EINVAL;
        char *origin_path, *cow_path;
-       dev_t origin_dev, cow_dev;
        unsigned int args_used, num_flush_bios = 1;
-       fmode_t origin_mode = FMODE_READ;
+       blk_mode_t origin_mode = BLK_OPEN_READ;
 
        if (argc < 4) {
                ti->error = "requires 4 or more arguments";
@@ -1253,7 +1252,7 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 
        if (dm_target_is_snapshot_merge(ti)) {
                num_flush_bios = 2;
-               origin_mode = FMODE_WRITE;
+               origin_mode = BLK_OPEN_WRITE;
        }
 
        s = kzalloc(sizeof(*s), GFP_KERNEL);
@@ -1279,24 +1278,21 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
                ti->error = "Cannot get origin device";
                goto bad_origin;
        }
-       origin_dev = s->origin->bdev->bd_dev;
 
        cow_path = argv[0];
        argv++;
        argc--;
 
-       cow_dev = dm_get_dev_t(cow_path);
-       if (cow_dev && cow_dev == origin_dev) {
-               ti->error = "COW device cannot be the same as origin device";
-               r = -EINVAL;
-               goto bad_cow;
-       }
-
        r = dm_get_device(ti, cow_path, dm_table_get_mode(ti->table), &s->cow);
        if (r) {
                ti->error = "Cannot get COW device";
                goto bad_cow;
        }
+       if (s->cow->bdev && s->cow->bdev == s->origin->bdev) {
+               ti->error = "COW device cannot be the same as origin device";
+               r = -EINVAL;
+               goto bad_store;
+       }
 
        r = dm_exception_store_create(ti, argc, argv, s, &args_used, &s->store);
        if (r) {
index 1398f1d..7d208b2 100644 (file)
@@ -126,7 +126,7 @@ static int alloc_targets(struct dm_table *t, unsigned int num)
        return 0;
 }
 
-int dm_table_create(struct dm_table **result, fmode_t mode,
+int dm_table_create(struct dm_table **result, blk_mode_t mode,
                    unsigned int num_targets, struct mapped_device *md)
 {
        struct dm_table *t = kzalloc(sizeof(*t), GFP_KERNEL);
@@ -304,7 +304,7 @@ static int device_area_is_invalid(struct dm_target *ti, struct dm_dev *dev,
  * device and not to touch the existing bdev field in case
  * it is accessed concurrently.
  */
-static int upgrade_mode(struct dm_dev_internal *dd, fmode_t new_mode,
+static int upgrade_mode(struct dm_dev_internal *dd, blk_mode_t new_mode,
                        struct mapped_device *md)
 {
        int r;
@@ -324,23 +324,13 @@ static int upgrade_mode(struct dm_dev_internal *dd, fmode_t new_mode,
 }
 
 /*
- * Convert the path to a device
- */
-dev_t dm_get_dev_t(const char *path)
-{
-       dev_t dev;
-
-       if (lookup_bdev(path, &dev))
-               dev = name_to_dev_t(path);
-       return dev;
-}
-EXPORT_SYMBOL_GPL(dm_get_dev_t);
-
-/*
  * Add a device to the list, or just increment the usage count if
  * it's already present.
+ *
+ * Note: the __ref annotation is needed because this function can call the
+ * __init-marked early_lookup_bdev() from early boot code via dm-init.c.
  */
-int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
+int __ref dm_get_device(struct dm_target *ti, const char *path, blk_mode_t mode,
                  struct dm_dev **result)
 {
        int r;
@@ -358,9 +348,13 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
                if (MAJOR(dev) != major || MINOR(dev) != minor)
                        return -EOVERFLOW;
        } else {
-               dev = dm_get_dev_t(path);
-               if (!dev)
-                       return -ENODEV;
+               r = lookup_bdev(path, &dev);
+#ifndef MODULE
+               if (r && system_state < SYSTEM_RUNNING)
+                       r = early_lookup_bdev(path, &dev);
+#endif
+               if (r)
+                       return r;
        }
        if (dev == disk_devt(t->md->disk))
                return -EINVAL;
@@ -668,7 +662,8 @@ int dm_table_add_target(struct dm_table *t, const char *type,
                t->singleton = true;
        }
 
-       if (dm_target_always_writeable(ti->type) && !(t->mode & FMODE_WRITE)) {
+       if (dm_target_always_writeable(ti->type) &&
+           !(t->mode & BLK_OPEN_WRITE)) {
                ti->error = "target type may not be included in a read-only table";
                goto bad;
        }
@@ -2039,7 +2034,7 @@ struct list_head *dm_table_get_devices(struct dm_table *t)
        return &t->devices;
 }
 
-fmode_t dm_table_get_mode(struct dm_table *t)
+blk_mode_t dm_table_get_mode(struct dm_table *t)
 {
        return t->mode;
 }
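
dm_get_device() above now resolves a path with lookup_bdev() and only falls back to the __init-only early_lookup_bdev() while the system is still booting. A sketch of that lookup order, assuming the same declarations dm-table.c sees via <linux/blkdev.h>:

#include <linux/blkdev.h>
#include <linux/kernel.h>

static int example_resolve_path(const char *path, dev_t *dev)
{
	int r = lookup_bdev(path, dev);

#ifndef MODULE
	/* early_lookup_bdev() is __init; only valid during early boot */
	if (r && system_state < SYSTEM_RUNNING)
		r = early_lookup_bdev(path, dev);
#endif
	return r;
}
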
index 9f5cb52..9dd0409 100644 (file)
@@ -1756,13 +1756,15 @@ int dm_thin_remove_range(struct dm_thin_device *td,
 
 int dm_pool_block_is_shared(struct dm_pool_metadata *pmd, dm_block_t b, bool *result)
 {
-       int r;
+       int r = -EINVAL;
        uint32_t ref_count;
 
        down_read(&pmd->root_lock);
-       r = dm_sm_get_count(pmd->data_sm, b, &ref_count);
-       if (!r)
-               *result = (ref_count > 1);
+       if (!pmd->fail_io) {
+               r = dm_sm_get_count(pmd->data_sm, b, &ref_count);
+               if (!r)
+                       *result = (ref_count > 1);
+       }
        up_read(&pmd->root_lock);
 
        return r;
@@ -1770,10 +1772,11 @@ int dm_pool_block_is_shared(struct dm_pool_metadata *pmd, dm_block_t b, bool *re
 
 int dm_pool_inc_data_range(struct dm_pool_metadata *pmd, dm_block_t b, dm_block_t e)
 {
-       int r = 0;
+       int r = -EINVAL;
 
        pmd_write_lock(pmd);
-       r = dm_sm_inc_blocks(pmd->data_sm, b, e);
+       if (!pmd->fail_io)
+               r = dm_sm_inc_blocks(pmd->data_sm, b, e);
        pmd_write_unlock(pmd);
 
        return r;
@@ -1781,10 +1784,11 @@ int dm_pool_inc_data_range(struct dm_pool_metadata *pmd, dm_block_t b, dm_block_
 
 int dm_pool_dec_data_range(struct dm_pool_metadata *pmd, dm_block_t b, dm_block_t e)
 {
-       int r = 0;
+       int r = -EINVAL;
 
        pmd_write_lock(pmd);
-       r = dm_sm_dec_blocks(pmd->data_sm, b, e);
+       if (!pmd->fail_io)
+               r = dm_sm_dec_blocks(pmd->data_sm, b, e);
        pmd_write_unlock(pmd);
 
        return r;
@@ -1887,7 +1891,7 @@ int dm_pool_abort_metadata(struct dm_pool_metadata *pmd)
         * Replacement block manager (new_bm) is created and old_bm destroyed outside of
         * pmd root_lock to avoid ABBA deadlock that would result (due to life-cycle of
         * shrinker associated with the block manager's bufio client vs pmd root_lock).
-        * - must take shrinker_mutex without holding pmd->root_lock
+        * - must take shrinker_rwsem without holding pmd->root_lock
         */
        new_bm = dm_block_manager_create(pmd->bdev, THIN_METADATA_BLOCK_SIZE << SECTOR_SHIFT,
                                         THIN_MAX_CONCURRENT_LOCKS);
index 2b13c94..f1d0dcb 100644 (file)
@@ -401,8 +401,7 @@ static int issue_discard(struct discard_op *op, dm_block_t data_b, dm_block_t da
        sector_t s = block_to_sectors(tc->pool, data_b);
        sector_t len = block_to_sectors(tc->pool, data_e - data_b);
 
-       return __blkdev_issue_discard(tc->pool_dev->bdev, s, len, GFP_NOWAIT,
-                                     &op->bio);
+       return __blkdev_issue_discard(tc->pool_dev->bdev, s, len, GFP_NOIO, &op->bio);
 }
 
 static void end_discard(struct discard_op *op, int r)
@@ -3301,7 +3300,7 @@ static int pool_ctr(struct dm_target *ti, unsigned int argc, char **argv)
        unsigned long block_size;
        dm_block_t low_water_blocks;
        struct dm_dev *metadata_dev;
-       fmode_t metadata_mode;
+       blk_mode_t metadata_mode;
 
        /*
         * FIXME Remove validation from scope of lock.
@@ -3334,7 +3333,8 @@ static int pool_ctr(struct dm_target *ti, unsigned int argc, char **argv)
        if (r)
                goto out_unlock;
 
-       metadata_mode = FMODE_READ | ((pf.mode == PM_READ_ONLY) ? 0 : FMODE_WRITE);
+       metadata_mode = BLK_OPEN_READ |
+               ((pf.mode == PM_READ_ONLY) ? 0 : BLK_OPEN_WRITE);
        r = dm_get_device(ti, argv[0], metadata_mode, &metadata_dev);
        if (r) {
                ti->error = "Error opening metadata block device";
@@ -3342,7 +3342,7 @@ static int pool_ctr(struct dm_target *ti, unsigned int argc, char **argv)
        }
        warn_if_metadata_device_too_big(metadata_dev->bdev);
 
-       r = dm_get_device(ti, argv[1], FMODE_READ | FMODE_WRITE, &data_dev);
+       r = dm_get_device(ti, argv[1], BLK_OPEN_READ | BLK_OPEN_WRITE, &data_dev);
        if (r) {
                ti->error = "Error getting data device";
                goto out_metadata;
@@ -4223,7 +4223,7 @@ static int thin_ctr(struct dm_target *ti, unsigned int argc, char **argv)
                        goto bad_origin_dev;
                }
 
-               r = dm_get_device(ti, argv[2], FMODE_READ, &origin_dev);
+               r = dm_get_device(ti, argv[2], BLK_OPEN_READ, &origin_dev);
                if (r) {
                        ti->error = "Error opening origin device";
                        goto bad_origin_dev;
index a9ee2fa..3ef9f01 100644 (file)
@@ -607,7 +607,7 @@ int verity_fec_parse_opt_args(struct dm_arg_set *as, struct dm_verity *v,
        (*argc)--;
 
        if (!strcasecmp(arg_name, DM_VERITY_OPT_FEC_DEV)) {
-               r = dm_get_device(ti, arg_value, FMODE_READ, &v->fec->dev);
+               r = dm_get_device(ti, arg_value, BLK_OPEN_READ, &v->fec->dev);
                if (r) {
                        ti->error = "FEC device lookup failed";
                        return r;
index e35c16e..26adcfe 100644 (file)
@@ -1196,7 +1196,7 @@ static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
        if (r)
                goto bad;
 
-       if ((dm_table_get_mode(ti->table) & ~FMODE_READ)) {
+       if ((dm_table_get_mode(ti->table) & ~BLK_OPEN_READ)) {
                ti->error = "Device must be readonly";
                r = -EINVAL;
                goto bad;
@@ -1225,13 +1225,13 @@ static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
        }
        v->version = num;
 
-       r = dm_get_device(ti, argv[1], FMODE_READ, &v->data_dev);
+       r = dm_get_device(ti, argv[1], BLK_OPEN_READ, &v->data_dev);
        if (r) {
                ti->error = "Data device lookup failed";
                goto bad;
        }
 
-       r = dm_get_device(ti, argv[2], FMODE_READ, &v->hash_dev);
+       r = dm_get_device(ti, argv[2], BLK_OPEN_READ, &v->hash_dev);
        if (r) {
                ti->error = "Hash device lookup failed";
                goto bad;
index 8f0896a..9d3cca8 100644 (file)
@@ -577,7 +577,7 @@ static struct dmz_mblock *dmz_get_mblock_slow(struct dmz_metadata *zmd,
        bio->bi_iter.bi_sector = dmz_blk2sect(block);
        bio->bi_private = mblk;
        bio->bi_end_io = dmz_mblock_bio_end_io;
-       bio_add_page(bio, mblk->page, DMZ_BLOCK_SIZE, 0);
+       __bio_add_page(bio, mblk->page, DMZ_BLOCK_SIZE, 0);
        submit_bio(bio);
 
        return mblk;
@@ -728,7 +728,7 @@ static int dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk,
        bio->bi_iter.bi_sector = dmz_blk2sect(block);
        bio->bi_private = mblk;
        bio->bi_end_io = dmz_mblock_bio_end_io;
-       bio_add_page(bio, mblk->page, DMZ_BLOCK_SIZE, 0);
+       __bio_add_page(bio, mblk->page, DMZ_BLOCK_SIZE, 0);
        submit_bio(bio);
 
        return 0;
@@ -752,7 +752,7 @@ static int dmz_rdwr_block(struct dmz_dev *dev, enum req_op op,
        bio = bio_alloc(dev->bdev, 1, op | REQ_SYNC | REQ_META | REQ_PRIO,
                        GFP_NOIO);
        bio->bi_iter.bi_sector = dmz_blk2sect(block);
-       bio_add_page(bio, page, DMZ_BLOCK_SIZE, 0);
+       __bio_add_page(bio, page, DMZ_BLOCK_SIZE, 0);
        ret = submit_bio_wait(bio);
        bio_put(bio);
 
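
The bio_add_page() to __bio_add_page() changes above apply where the bio was just allocated with enough vectors for the page, so the add cannot fail and no return value needs checking. A hypothetical single-page submission using the same calls:

#include <linux/bio.h>

static void example_submit_page(struct block_device *bdev, struct page *page,
				sector_t sector)
{
	struct bio *bio = bio_alloc(bdev, 1, REQ_OP_WRITE | REQ_SYNC, GFP_NOIO);

	bio->bi_iter.bi_sector = sector;
	__bio_add_page(bio, page, PAGE_SIZE, 0);	/* known to fit */
	submit_bio(bio);
}
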
index 3b694ba..fe2d475 100644 (file)
@@ -207,7 +207,7 @@ static int __init local_init(void)
        if (r)
                return r;
 
-       deferred_remove_workqueue = alloc_workqueue("kdmremove", WQ_UNBOUND, 1);
+       deferred_remove_workqueue = alloc_ordered_workqueue("kdmremove", 0);
        if (!deferred_remove_workqueue) {
                r = -ENOMEM;
                goto out_uevent_exit;
@@ -310,13 +310,13 @@ int dm_deleting_md(struct mapped_device *md)
        return test_bit(DMF_DELETING, &md->flags);
 }
 
-static int dm_blk_open(struct block_device *bdev, fmode_t mode)
+static int dm_blk_open(struct gendisk *disk, blk_mode_t mode)
 {
        struct mapped_device *md;
 
        spin_lock(&_minor_lock);
 
-       md = bdev->bd_disk->private_data;
+       md = disk->private_data;
        if (!md)
                goto out;
 
@@ -334,7 +334,7 @@ out:
        return md ? 0 : -ENXIO;
 }
 
-static void dm_blk_close(struct gendisk *disk, fmode_t mode)
+static void dm_blk_close(struct gendisk *disk)
 {
        struct mapped_device *md;
 
@@ -448,7 +448,7 @@ static void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx)
        dm_put_live_table(md, srcu_idx);
 }
 
-static int dm_blk_ioctl(struct block_device *bdev, fmode_t mode,
+static int dm_blk_ioctl(struct block_device *bdev, blk_mode_t mode,
                        unsigned int cmd, unsigned long arg)
 {
        struct mapped_device *md = bdev->bd_disk->private_data;
@@ -734,7 +734,7 @@ static char *_dm_claim_ptr = "I belong to device-mapper";
  * Open a table device so we can use it as a map destination.
  */
 static struct table_device *open_table_device(struct mapped_device *md,
-               dev_t dev, fmode_t mode)
+               dev_t dev, blk_mode_t mode)
 {
        struct table_device *td;
        struct block_device *bdev;
@@ -746,7 +746,7 @@ static struct table_device *open_table_device(struct mapped_device *md,
                return ERR_PTR(-ENOMEM);
        refcount_set(&td->count, 1);
 
-       bdev = blkdev_get_by_dev(dev, mode | FMODE_EXCL, _dm_claim_ptr);
+       bdev = blkdev_get_by_dev(dev, mode, _dm_claim_ptr, NULL);
        if (IS_ERR(bdev)) {
                r = PTR_ERR(bdev);
                goto out_free_td;
@@ -771,7 +771,7 @@ static struct table_device *open_table_device(struct mapped_device *md,
        return td;
 
 out_blkdev_put:
-       blkdev_put(bdev, mode | FMODE_EXCL);
+       blkdev_put(bdev, _dm_claim_ptr);
 out_free_td:
        kfree(td);
        return ERR_PTR(r);
@@ -784,14 +784,14 @@ static void close_table_device(struct table_device *td, struct mapped_device *md
 {
        if (md->disk->slave_dir)
                bd_unlink_disk_holder(td->dm_dev.bdev, md->disk);
-       blkdev_put(td->dm_dev.bdev, td->dm_dev.mode | FMODE_EXCL);
+       blkdev_put(td->dm_dev.bdev, _dm_claim_ptr);
        put_dax(td->dm_dev.dax_dev);
        list_del(&td->list);
        kfree(td);
 }
 
 static struct table_device *find_table_device(struct list_head *l, dev_t dev,
-                                             fmode_t mode)
+                                             blk_mode_t mode)
 {
        struct table_device *td;
 
@@ -802,7 +802,7 @@ static struct table_device *find_table_device(struct list_head *l, dev_t dev,
        return NULL;
 }
 
-int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode,
+int dm_get_table_device(struct mapped_device *md, dev_t dev, blk_mode_t mode,
                        struct dm_dev **result)
 {
        struct table_device *td;
@@ -1172,7 +1172,8 @@ static inline sector_t max_io_len_target_boundary(struct dm_target *ti,
 }
 
 static sector_t __max_io_len(struct dm_target *ti, sector_t sector,
-                            unsigned int max_granularity)
+                            unsigned int max_granularity,
+                            unsigned int max_sectors)
 {
        sector_t target_offset = dm_target_offset(ti, sector);
        sector_t len = max_io_len_target_boundary(ti, target_offset);
@@ -1186,13 +1187,13 @@ static sector_t __max_io_len(struct dm_target *ti, sector_t sector,
        if (!max_granularity)
                return len;
        return min_t(sector_t, len,
-               min(queue_max_sectors(ti->table->md->queue),
+               min(max_sectors ? : queue_max_sectors(ti->table->md->queue),
                    blk_chunk_sectors_left(target_offset, max_granularity)));
 }
 
 static inline sector_t max_io_len(struct dm_target *ti, sector_t sector)
 {
-       return __max_io_len(ti, sector, ti->max_io_len);
+       return __max_io_len(ti, sector, ti->max_io_len, 0);
 }
 
 int dm_set_target_max_io_len(struct dm_target *ti, sector_t len)
@@ -1581,12 +1582,13 @@ static void __send_empty_flush(struct clone_info *ci)
 
 static void __send_changing_extent_only(struct clone_info *ci, struct dm_target *ti,
                                        unsigned int num_bios,
-                                       unsigned int max_granularity)
+                                       unsigned int max_granularity,
+                                       unsigned int max_sectors)
 {
        unsigned int len, bios;
 
        len = min_t(sector_t, ci->sector_count,
-                   __max_io_len(ti, ci->sector, max_granularity));
+                   __max_io_len(ti, ci->sector, max_granularity, max_sectors));
 
        atomic_add(num_bios, &ci->io->io_count);
        bios = __send_duplicate_bios(ci, ti, num_bios, &len);
@@ -1623,23 +1625,27 @@ static blk_status_t __process_abnormal_io(struct clone_info *ci,
 {
        unsigned int num_bios = 0;
        unsigned int max_granularity = 0;
+       unsigned int max_sectors = 0;
        struct queue_limits *limits = dm_get_queue_limits(ti->table->md);
 
        switch (bio_op(ci->bio)) {
        case REQ_OP_DISCARD:
                num_bios = ti->num_discard_bios;
+               max_sectors = limits->max_discard_sectors;
                if (ti->max_discard_granularity)
-                       max_granularity = limits->max_discard_sectors;
+                       max_granularity = max_sectors;
                break;
        case REQ_OP_SECURE_ERASE:
                num_bios = ti->num_secure_erase_bios;
+               max_sectors = limits->max_secure_erase_sectors;
                if (ti->max_secure_erase_granularity)
-                       max_granularity = limits->max_secure_erase_sectors;
+                       max_granularity = max_sectors;
                break;
        case REQ_OP_WRITE_ZEROES:
                num_bios = ti->num_write_zeroes_bios;
+               max_sectors = limits->max_write_zeroes_sectors;
                if (ti->max_write_zeroes_granularity)
-                       max_granularity = limits->max_write_zeroes_sectors;
+                       max_granularity = max_sectors;
                break;
        default:
                break;
@@ -1654,7 +1660,8 @@ static blk_status_t __process_abnormal_io(struct clone_info *ci,
        if (unlikely(!num_bios))
                return BLK_STS_NOTSUPP;
 
-       __send_changing_extent_only(ci, ti, num_bios, max_granularity);
+       __send_changing_extent_only(ci, ti, num_bios,
+                                   max_granularity, max_sectors);
        return BLK_STS_OK;
 }
 
@@ -2808,6 +2815,10 @@ retry:
        }
 
        map = rcu_dereference_protected(md->map, lockdep_is_held(&md->suspend_lock));
+       if (!map) {
+               /* avoid deadlock with fs/namespace.c:do_mount() */
+               suspend_flags &= ~DM_SUSPEND_LOCKFS_FLAG;
+       }
 
        r = __dm_suspend(md, map, suspend_flags, TASK_INTERRUPTIBLE, DMF_SUSPENDED);
        if (r)
index a856e0a..63d9010 100644 (file)
@@ -203,7 +203,7 @@ int dm_open_count(struct mapped_device *md);
 int dm_lock_for_deletion(struct mapped_device *md, bool mark_deferred, bool only_deferred);
 int dm_cancel_deferred_remove(struct mapped_device *md);
 int dm_request_based(struct mapped_device *md);
-int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode,
+int dm_get_table_device(struct mapped_device *md, dev_t dev, blk_mode_t mode,
                        struct dm_dev **result);
 void dm_put_table_device(struct mapped_device *md, struct dm_dev *d);
 
index 91836e6..6eaa0ea 100644 (file)
@@ -147,7 +147,8 @@ static void __init md_setup_drive(struct md_setup_args *args)
                if (p)
                        *p++ = 0;
 
-               dev = name_to_dev_t(devname);
+               if (early_lookup_bdev(devname, &dev))
+                       dev = 0;
                if (strncmp(devname, "/dev/", 5) == 0)
                        devname += 5;
                snprintf(comp_name, 63, "/dev/%s", devname);
index bc8d756..1ff7128 100644 (file)
@@ -54,14 +54,7 @@ __acquires(bitmap->lock)
 {
        unsigned char *mappage;
 
-       if (page >= bitmap->pages) {
-               /* This can happen if bitmap_start_sync goes beyond
-                * End-of-device while looking for a whole page.
-                * It is harmless.
-                */
-               return -EINVAL;
-       }
-
+       WARN_ON_ONCE(page >= bitmap->pages);
        if (bitmap->bp[page].hijacked) /* it's hijacked, don't try to alloc */
                return 0;
 
@@ -1023,7 +1016,6 @@ static int md_bitmap_file_test_bit(struct bitmap *bitmap, sector_t block)
        return set;
 }
 
-
 /* this gets called when the md device is ready to unplug its underlying
  * (slave) device queues -- before we let any writes go down, we need to
  * sync the dirty pages of the bitmap file to disk */
@@ -1033,8 +1025,7 @@ void md_bitmap_unplug(struct bitmap *bitmap)
        int dirty, need_write;
        int writing = 0;
 
-       if (!bitmap || !bitmap->storage.filemap ||
-           test_bit(BITMAP_STALE, &bitmap->flags))
+       if (!md_bitmap_enabled(bitmap))
                return;
 
        /* look at each page to see if there are any set bits that need to be
@@ -1063,6 +1054,35 @@ void md_bitmap_unplug(struct bitmap *bitmap)
 }
 EXPORT_SYMBOL(md_bitmap_unplug);
 
+struct bitmap_unplug_work {
+       struct work_struct work;
+       struct bitmap *bitmap;
+       struct completion *done;
+};
+
+static void md_bitmap_unplug_fn(struct work_struct *work)
+{
+       struct bitmap_unplug_work *unplug_work =
+               container_of(work, struct bitmap_unplug_work, work);
+
+       md_bitmap_unplug(unplug_work->bitmap);
+       complete(unplug_work->done);
+}
+
+void md_bitmap_unplug_async(struct bitmap *bitmap)
+{
+       DECLARE_COMPLETION_ONSTACK(done);
+       struct bitmap_unplug_work unplug_work;
+
+       INIT_WORK_ONSTACK(&unplug_work.work, md_bitmap_unplug_fn);
+       unplug_work.bitmap = bitmap;
+       unplug_work.done = &done;
+
+       queue_work(md_bitmap_wq, &unplug_work.work);
+       wait_for_completion(&done);
+}
+EXPORT_SYMBOL(md_bitmap_unplug_async);
+
 static void md_bitmap_set_memory_bits(struct bitmap *bitmap, sector_t offset, int needed);
 /* * bitmap_init_from_disk -- called at bitmap_create time to initialize
  * the in-memory bitmap from the on-disk bitmap -- also, sets up the
@@ -1241,11 +1261,28 @@ static bitmap_counter_t *md_bitmap_get_counter(struct bitmap_counts *bitmap,
                                               sector_t offset, sector_t *blocks,
                                               int create);
 
+static void mddev_set_timeout(struct mddev *mddev, unsigned long timeout,
+                             bool force)
+{
+       struct md_thread *thread;
+
+       rcu_read_lock();
+       thread = rcu_dereference(mddev->thread);
+
+       if (!thread)
+               goto out;
+
+       if (force || thread->timeout < MAX_SCHEDULE_TIMEOUT)
+               thread->timeout = timeout;
+
+out:
+       rcu_read_unlock();
+}
+
 /*
  * bitmap daemon -- periodically wakes up to clean bits and flush pages
  *                     out to disk
  */
-
 void md_bitmap_daemon_work(struct mddev *mddev)
 {
        struct bitmap *bitmap;
@@ -1269,7 +1306,7 @@ void md_bitmap_daemon_work(struct mddev *mddev)
 
        bitmap->daemon_lastrun = jiffies;
        if (bitmap->allclean) {
-               mddev->thread->timeout = MAX_SCHEDULE_TIMEOUT;
+               mddev_set_timeout(mddev, MAX_SCHEDULE_TIMEOUT, true);
                goto done;
        }
        bitmap->allclean = 1;
@@ -1366,8 +1403,7 @@ void md_bitmap_daemon_work(struct mddev *mddev)
 
  done:
        if (bitmap->allclean == 0)
-               mddev->thread->timeout =
-                       mddev->bitmap_info.daemon_sleep;
+               mddev_set_timeout(mddev, mddev->bitmap_info.daemon_sleep, true);
        mutex_unlock(&mddev->bitmap_info.mutex);
 }
 
@@ -1387,6 +1423,14 @@ __acquires(bitmap->lock)
        sector_t csize;
        int err;
 
+       if (page >= bitmap->pages) {
+               /*
+                * This can happen if bitmap_start_sync goes beyond
+                * End-of-device while looking for a whole page, or if
+                * the user set a huge number via the bitmap_set_bits sysfs entry.
+                */
+               return NULL;
+       }
        err = md_bitmap_checkpage(bitmap, page, create, 0);
 
        if (bitmap->bp[page].hijacked ||
@@ -1820,8 +1864,7 @@ void md_bitmap_destroy(struct mddev *mddev)
        mddev->bitmap = NULL; /* disconnect from the md device */
        spin_unlock(&mddev->lock);
        mutex_unlock(&mddev->bitmap_info.mutex);
-       if (mddev->thread)
-               mddev->thread->timeout = MAX_SCHEDULE_TIMEOUT;
+       mddev_set_timeout(mddev, MAX_SCHEDULE_TIMEOUT, true);
 
        md_bitmap_free(bitmap);
 }
@@ -1964,7 +2007,7 @@ int md_bitmap_load(struct mddev *mddev)
        /* Kick recovery in case any bits were set */
        set_bit(MD_RECOVERY_NEEDED, &bitmap->mddev->recovery);
 
-       mddev->thread->timeout = mddev->bitmap_info.daemon_sleep;
+       mddev_set_timeout(mddev, mddev->bitmap_info.daemon_sleep, true);
        md_wakeup_thread(mddev->thread);
 
        md_bitmap_update_sb(bitmap);
@@ -2469,17 +2512,11 @@ timeout_store(struct mddev *mddev, const char *buf, size_t len)
                timeout = MAX_SCHEDULE_TIMEOUT-1;
        if (timeout < 1)
                timeout = 1;
+
        mddev->bitmap_info.daemon_sleep = timeout;
-       if (mddev->thread) {
-               /* if thread->timeout is MAX_SCHEDULE_TIMEOUT, then
-                * the bitmap is all clean and we don't need to
-                * adjust the timeout right now
-                */
-               if (mddev->thread->timeout < MAX_SCHEDULE_TIMEOUT) {
-                       mddev->thread->timeout = timeout;
-                       md_wakeup_thread(mddev->thread);
-               }
-       }
+       mddev_set_timeout(mddev, timeout, false);
+       md_wakeup_thread(mddev->thread);
+
        return len;
 }
 
index cfd7395..8a3788c 100644 (file)
@@ -264,6 +264,7 @@ void md_bitmap_sync_with_cluster(struct mddev *mddev,
                                 sector_t new_lo, sector_t new_hi);
 
 void md_bitmap_unplug(struct bitmap *bitmap);
+void md_bitmap_unplug_async(struct bitmap *bitmap);
 void md_bitmap_daemon_work(struct mddev *mddev);
 
 int md_bitmap_resize(struct bitmap *bitmap, sector_t blocks,
@@ -273,6 +274,13 @@ int md_bitmap_copy_from_slot(struct mddev *mddev, int slot,
                             sector_t *lo, sector_t *hi, bool clear_bits);
 void md_bitmap_free(struct bitmap *bitmap);
 void md_bitmap_wait_behind_writes(struct mddev *mddev);
+
+static inline bool md_bitmap_enabled(struct bitmap *bitmap)
+{
+       return bitmap && bitmap->storage.filemap &&
+              !test_bit(BITMAP_STALE, &bitmap->flags);
+}
+
 #endif
 
 #endif
index 10e0c53..3d9fd74 100644 (file)
@@ -75,14 +75,14 @@ struct md_cluster_info {
        sector_t suspend_hi;
        int suspend_from; /* the slot which broadcast suspend_lo/hi */
 
-       struct md_thread *recovery_thread;
+       struct md_thread __rcu *recovery_thread;
        unsigned long recovery_map;
        /* communication loc resources */
        struct dlm_lock_resource *ack_lockres;
        struct dlm_lock_resource *message_lockres;
        struct dlm_lock_resource *token_lockres;
        struct dlm_lock_resource *no_new_dev_lockres;
-       struct md_thread *recv_thread;
+       struct md_thread __rcu *recv_thread;
        struct completion newdisk_completion;
        wait_queue_head_t wait;
        unsigned long state;
@@ -362,8 +362,8 @@ static void __recover_slot(struct mddev *mddev, int slot)
 
        set_bit(slot, &cinfo->recovery_map);
        if (!cinfo->recovery_thread) {
-               cinfo->recovery_thread = md_register_thread(recover_bitmaps,
-                               mddev, "recover");
+               rcu_assign_pointer(cinfo->recovery_thread,
+                       md_register_thread(recover_bitmaps, mddev, "recover"));
                if (!cinfo->recovery_thread) {
                        pr_warn("md-cluster: Could not create recovery thread\n");
                        return;
@@ -526,11 +526,15 @@ static void process_add_new_disk(struct mddev *mddev, struct cluster_msg *cmsg)
 static void process_metadata_update(struct mddev *mddev, struct cluster_msg *msg)
 {
        int got_lock = 0;
+       struct md_thread *thread;
        struct md_cluster_info *cinfo = mddev->cluster_info;
        mddev->good_device_nr = le32_to_cpu(msg->raid_slot);
 
        dlm_lock_sync(cinfo->no_new_dev_lockres, DLM_LOCK_CR);
-       wait_event(mddev->thread->wqueue,
+
+       /* daemon thread must exist */
+       thread = rcu_dereference_protected(mddev->thread, true);
+       wait_event(thread->wqueue,
                   (got_lock = mddev_trylock(mddev)) ||
                    test_bit(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, &cinfo->state));
        md_reload_sb(mddev, mddev->good_device_nr);
@@ -889,7 +893,8 @@ static int join(struct mddev *mddev, int nodes)
        }
        /* Initiate the communication resources */
        ret = -ENOMEM;
-       cinfo->recv_thread = md_register_thread(recv_daemon, mddev, "cluster_recv");
+       rcu_assign_pointer(cinfo->recv_thread,
+                       md_register_thread(recv_daemon, mddev, "cluster_recv"));
        if (!cinfo->recv_thread) {
                pr_err("md-cluster: cannot allocate memory for recv_thread!\n");
                goto err;
index 66edf5e..92c45be 100644 (file)
@@ -400,8 +400,8 @@ static int multipath_run (struct mddev *mddev)
        if (ret)
                goto out_free_conf;
 
-       mddev->thread = md_register_thread(multipathd, mddev,
-                                          "multipath");
+       rcu_assign_pointer(mddev->thread,
+                          md_register_thread(multipathd, mddev, "multipath"));
        if (!mddev->thread)
                goto out_free_conf;
 
index 8e344b4..cf3733c 100644 (file)
 #include "md-bitmap.h"
 #include "md-cluster.h"
 
-/* pers_list is a list of registered personalities protected
- * by pers_lock.
- * pers_lock does extra service to protect accesses to
- * mddev->thread when the mutex cannot be held.
- */
+/* pers_list is a list of registered personalities protected by pers_lock. */
 static LIST_HEAD(pers_list);
 static DEFINE_SPINLOCK(pers_lock);
 
@@ -87,23 +83,13 @@ static struct module *md_cluster_mod;
 static DECLARE_WAIT_QUEUE_HEAD(resync_wait);
 static struct workqueue_struct *md_wq;
 static struct workqueue_struct *md_misc_wq;
-static struct workqueue_struct *md_rdev_misc_wq;
+struct workqueue_struct *md_bitmap_wq;
 
 static int remove_and_add_spares(struct mddev *mddev,
                                 struct md_rdev *this);
 static void mddev_detach(struct mddev *mddev);
-
-enum md_ro_state {
-       MD_RDWR,
-       MD_RDONLY,
-       MD_AUTO_READ,
-       MD_MAX_STATE
-};
-
-static bool md_is_rdwr(struct mddev *mddev)
-{
-       return (mddev->ro == MD_RDWR);
-}
+static void export_rdev(struct md_rdev *rdev, struct mddev *mddev);
+static void md_wakeup_thread_directly(struct md_thread __rcu *thread);
 
 /*
  * Default number of read corrections we'll attempt on an rdev
@@ -360,10 +346,6 @@ EXPORT_SYMBOL_GPL(md_new_event);
 static LIST_HEAD(all_mddevs);
 static DEFINE_SPINLOCK(all_mddevs_lock);
 
-static bool is_md_suspended(struct mddev *mddev)
-{
-       return percpu_ref_is_dying(&mddev->active_io);
-}
 /* Rather than calling directly into the personality make_request function,
  * IO requests come here first so that we can check if the device is
  * being suspended pending a reconfiguration.
@@ -457,13 +439,19 @@ static void md_submit_bio(struct bio *bio)
  */
 void mddev_suspend(struct mddev *mddev)
 {
-       WARN_ON_ONCE(mddev->thread && current == mddev->thread->tsk);
-       lockdep_assert_held(&mddev->reconfig_mutex);
+       struct md_thread *thread = rcu_dereference_protected(mddev->thread,
+                       lockdep_is_held(&mddev->reconfig_mutex));
+
+       WARN_ON_ONCE(thread && current == thread->tsk);
        if (mddev->suspended++)
                return;
        wake_up(&mddev->sb_wait);
        set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
        percpu_ref_kill(&mddev->active_io);
+
+       if (mddev->pers->prepare_suspend)
+               mddev->pers->prepare_suspend(mddev);
+
        wait_event(mddev->sb_wait, percpu_ref_is_zero(&mddev->active_io));
        mddev->pers->quiesce(mddev, 1);
        clear_bit_unlock(MD_ALLOW_SB_UPDATE, &mddev->flags);
@@ -655,9 +643,11 @@ void mddev_init(struct mddev *mddev)
 {
        mutex_init(&mddev->open_mutex);
        mutex_init(&mddev->reconfig_mutex);
+       mutex_init(&mddev->delete_mutex);
        mutex_init(&mddev->bitmap_info.mutex);
        INIT_LIST_HEAD(&mddev->disks);
        INIT_LIST_HEAD(&mddev->all_mddevs);
+       INIT_LIST_HEAD(&mddev->deleting);
        timer_setup(&mddev->safemode_timer, md_safemode_timeout, 0);
        atomic_set(&mddev->active, 1);
        atomic_set(&mddev->openers, 0);
@@ -759,6 +749,24 @@ static void mddev_free(struct mddev *mddev)
 
 static const struct attribute_group md_redundancy_group;
 
+static void md_free_rdev(struct mddev *mddev)
+{
+       struct md_rdev *rdev;
+       struct md_rdev *tmp;
+
+       mutex_lock(&mddev->delete_mutex);
+       if (list_empty(&mddev->deleting))
+               goto out;
+
+       list_for_each_entry_safe(rdev, tmp, &mddev->deleting, same_set) {
+               list_del_init(&rdev->same_set);
+               kobject_del(&rdev->kobj);
+               export_rdev(rdev, mddev);
+       }
+out:
+       mutex_unlock(&mddev->delete_mutex);
+}
+
 void mddev_unlock(struct mddev *mddev)
 {
        if (mddev->to_remove) {
@@ -800,13 +808,10 @@ void mddev_unlock(struct mddev *mddev)
        } else
                mutex_unlock(&mddev->reconfig_mutex);
 
-       /* As we've dropped the mutex we need a spinlock to
-        * make sure the thread doesn't disappear
-        */
-       spin_lock(&pers_lock);
+       md_free_rdev(mddev);
+
        md_wakeup_thread(mddev->thread);
        wake_up(&mddev->sb_wait);
-       spin_unlock(&pers_lock);
 }
 EXPORT_SYMBOL_GPL(mddev_unlock);
 
@@ -938,7 +943,7 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
        atomic_inc(&rdev->nr_pending);
 
        bio->bi_iter.bi_sector = sector;
-       bio_add_page(bio, page, size, 0);
+       __bio_add_page(bio, page, size, 0);
        bio->bi_private = rdev;
        bio->bi_end_io = super_written;
 
@@ -979,7 +984,7 @@ int sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
                bio.bi_iter.bi_sector = sector + rdev->new_data_offset;
        else
                bio.bi_iter.bi_sector = sector + rdev->data_offset;
-       bio_add_page(&bio, page, size, 0);
+       __bio_add_page(&bio, page, size, 0);
 
        submit_bio_wait(&bio);
 
@@ -2440,16 +2445,12 @@ static int bind_rdev_to_array(struct md_rdev *rdev, struct mddev *mddev)
        return err;
 }
 
-static void rdev_delayed_delete(struct work_struct *ws)
-{
-       struct md_rdev *rdev = container_of(ws, struct md_rdev, del_work);
-       kobject_del(&rdev->kobj);
-       kobject_put(&rdev->kobj);
-}
-
 void md_autodetect_dev(dev_t dev);
 
-static void export_rdev(struct md_rdev *rdev)
+/* just for claiming the bdev */
+static struct md_rdev claim_rdev;
+
+static void export_rdev(struct md_rdev *rdev, struct mddev *mddev)
 {
        pr_debug("md: export_rdev(%pg)\n", rdev->bdev);
        md_rdev_clear(rdev);
@@ -2457,13 +2458,15 @@ static void export_rdev(struct md_rdev *rdev)
        if (test_bit(AutoDetected, &rdev->flags))
                md_autodetect_dev(rdev->bdev->bd_dev);
 #endif
-       blkdev_put(rdev->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
+       blkdev_put(rdev->bdev, mddev->major_version == -2 ? &claim_rdev : rdev);
        rdev->bdev = NULL;
        kobject_put(&rdev->kobj);
 }
 
 static void md_kick_rdev_from_array(struct md_rdev *rdev)
 {
+       struct mddev *mddev = rdev->mddev;
+
        bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk);
        list_del_rcu(&rdev->same_set);
        pr_debug("md: unbind<%pg>\n", rdev->bdev);
@@ -2477,15 +2480,17 @@ static void md_kick_rdev_from_array(struct md_rdev *rdev)
        rdev->sysfs_unack_badblocks = NULL;
        rdev->sysfs_badblocks = NULL;
        rdev->badblocks.count = 0;
-       /* We need to delay this, otherwise we can deadlock when
-        * writing to 'remove' to "dev/state".  We also need
-        * to delay it due to rcu usage.
-        */
+
        synchronize_rcu();
-       INIT_WORK(&rdev->del_work, rdev_delayed_delete);
-       kobject_get(&rdev->kobj);
-       queue_work(md_rdev_misc_wq, &rdev->del_work);
-       export_rdev(rdev);
+
+       /*
+        * kobject_del() will wait for all in-progress writers to be done, and
+        * those writers hold reconfig_mutex; hence it can't be called under
+        * reconfig_mutex and is instead deferred to mddev_unlock().
+        */
+       mutex_lock(&mddev->delete_mutex);
+       list_add(&rdev->same_set, &mddev->deleting);
+       mutex_unlock(&mddev->delete_mutex);
 }
 
 static void export_array(struct mddev *mddev)
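
For orientation (not part of the patch): md_kick_rdev_from_array() no longer queues a per-rdev work item; it parks the rdev on mddev->deleting under delete_mutex, and mddev_unlock() drains that list via md_free_rdev() once reconfig_mutex has been dropped. A minimal userspace sketch of that deferred-teardown shape follows; the names are illustrative only.

/*
 * Illustrative only -- objects that cannot be torn down while the big lock is
 * held are parked on a list under their own mutex (mddev->deleting /
 * delete_mutex in the patch) and destroyed after the big lock is dropped
 * (md_free_rdev() called from mddev_unlock() in the patch).
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct obj {
	struct obj *next;
	int id;
};

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;	/* reconfig_mutex analogue */
static pthread_mutex_t delete_lock = PTHREAD_MUTEX_INITIALIZER;	/* delete_mutex analogue */
static struct obj *deleting;					/* mddev->deleting analogue */

static void kick_obj(struct obj *o)	/* runs with big_lock held */
{
	pthread_mutex_lock(&delete_lock);
	o->next = deleting;
	deleting = o;
	pthread_mutex_unlock(&delete_lock);
}

static void free_deferred(void)		/* runs after big_lock is dropped */
{
	struct obj *o, *next;

	pthread_mutex_lock(&delete_lock);
	o = deleting;
	deleting = NULL;
	pthread_mutex_unlock(&delete_lock);

	for (; o; o = next) {
		next = o->next;
		printf("destroying obj %d outside the big lock\n", o->id);
		free(o);
	}
}

int main(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return 1;
	o->id = 1;
	pthread_mutex_lock(&big_lock);
	kick_obj(o);			/* md_kick_rdev_from_array() analogue */
	pthread_mutex_unlock(&big_lock);
	free_deferred();		/* mddev_unlock() -> md_free_rdev() analogue */
	return 0;
}
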
@@ -3553,6 +3558,7 @@ rdev_attr_store(struct kobject *kobj, struct attribute *attr,
 {
        struct rdev_sysfs_entry *entry = container_of(attr, struct rdev_sysfs_entry, attr);
        struct md_rdev *rdev = container_of(kobj, struct md_rdev, kobj);
+       struct kernfs_node *kn = NULL;
        ssize_t rv;
        struct mddev *mddev = rdev->mddev;
 
@@ -3560,6 +3566,10 @@ rdev_attr_store(struct kobject *kobj, struct attribute *attr,
                return -EIO;
        if (!capable(CAP_SYS_ADMIN))
                return -EACCES;
+
+       if (entry->store == state_store && cmd_match(page, "remove"))
+               kn = sysfs_break_active_protection(kobj, attr);
+
        rv = mddev ? mddev_lock(mddev) : -ENODEV;
        if (!rv) {
                if (rdev->mddev == NULL)
@@ -3568,6 +3578,10 @@ rdev_attr_store(struct kobject *kobj, struct attribute *attr,
                        rv = entry->store(rdev, page, length);
                mddev_unlock(mddev);
        }
+
+       if (kn)
+               sysfs_unbreak_active_protection(kn);
+
        return rv;
 }
 
@@ -3612,6 +3626,7 @@ int md_rdev_init(struct md_rdev *rdev)
        return badblocks_init(&rdev->badblocks, 0);
 }
 EXPORT_SYMBOL_GPL(md_rdev_init);
+
 /*
  * Import a device. If 'super_format' >= 0, then sanity check the superblock
  *
@@ -3624,7 +3639,6 @@ EXPORT_SYMBOL_GPL(md_rdev_init);
  */
 static struct md_rdev *md_import_device(dev_t newdev, int super_format, int super_minor)
 {
-       static struct md_rdev claim_rdev; /* just for claiming the bdev */
        struct md_rdev *rdev;
        sector_t size;
        int err;
@@ -3640,9 +3654,8 @@ static struct md_rdev *md_import_device(dev_t newdev, int super_format, int supe
        if (err)
                goto out_clear_rdev;
 
-       rdev->bdev = blkdev_get_by_dev(newdev,
-                       FMODE_READ | FMODE_WRITE | FMODE_EXCL,
-                       super_format == -2 ? &claim_rdev : rdev);
+       rdev->bdev = blkdev_get_by_dev(newdev, BLK_OPEN_READ | BLK_OPEN_WRITE,
+                       super_format == -2 ? &claim_rdev : rdev, NULL);
        if (IS_ERR(rdev->bdev)) {
                pr_warn("md: could not open device unknown-block(%u,%u).\n",
                        MAJOR(newdev), MINOR(newdev));
@@ -3679,7 +3692,7 @@ static struct md_rdev *md_import_device(dev_t newdev, int super_format, int supe
        return rdev;
 
 out_blkdev_put:
-       blkdev_put(rdev->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
+       blkdev_put(rdev->bdev, super_format == -2 ? &claim_rdev : rdev);
 out_clear_rdev:
        md_rdev_clear(rdev);
 out_free_rdev:
@@ -3794,8 +3807,9 @@ int strict_strtoul_scaled(const char *cp, unsigned long *res, int scale)
 static ssize_t
 safe_delay_show(struct mddev *mddev, char *page)
 {
-       int msec = (mddev->safemode_delay*1000)/HZ;
-       return sprintf(page, "%d.%03d\n", msec/1000, msec%1000);
+       unsigned int msec = ((unsigned long)mddev->safemode_delay*1000)/HZ;
+
+       return sprintf(page, "%u.%03u\n", msec/1000, msec%1000);
 }
 static ssize_t
 safe_delay_store(struct mddev *mddev, const char *cbuf, size_t len)
@@ -3807,7 +3821,7 @@ safe_delay_store(struct mddev *mddev, const char *cbuf, size_t len)
                return -EINVAL;
        }
 
-       if (strict_strtoul_scaled(cbuf, &msec, 3) < 0)
+       if (strict_strtoul_scaled(cbuf, &msec, 3) < 0 || msec > UINT_MAX / HZ)
                return -EINVAL;
        if (msec == 0)
                mddev->safemode_delay = 0;
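
For reference (not part of the patch): the new msec > UINT_MAX / HZ check rejects inputs whose later multiplication by HZ would exceed UINT_MAX. A standalone check of the arithmetic, assuming HZ = 250 purely for illustration:

#include <limits.h>
#include <stdio.h>

#define HZ 250	/* example value; the real HZ is a kernel config choice */

int main(void)
{
	unsigned long bound = UINT_MAX / HZ;

	printf("largest safe msec: %lu\n", bound);
	printf("%lu * HZ = %llu (fits in 32 bits)\n", bound,
	       (unsigned long long)bound * HZ);
	printf("%lu * HZ = %llu (> UINT_MAX, would wrap in 32 bits)\n",
	       bound + 1, (unsigned long long)(bound + 1) * HZ);
	return 0;
}
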
@@ -4477,6 +4491,8 @@ max_corrected_read_errors_store(struct mddev *mddev, const char *buf, size_t len
        rv = kstrtouint(buf, 10, &n);
        if (rv < 0)
                return rv;
+       if (n > INT_MAX)
+               return -EINVAL;
        atomic_set(&mddev->max_corr_read_errors, n);
        return len;
 }
@@ -4491,20 +4507,6 @@ null_show(struct mddev *mddev, char *page)
        return -EINVAL;
 }
 
-/* need to ensure rdev_delayed_delete() has completed */
-static void flush_rdev_wq(struct mddev *mddev)
-{
-       struct md_rdev *rdev;
-
-       rcu_read_lock();
-       rdev_for_each_rcu(rdev, mddev)
-               if (work_pending(&rdev->del_work)) {
-                       flush_workqueue(md_rdev_misc_wq);
-                       break;
-               }
-       rcu_read_unlock();
-}
-
 static ssize_t
 new_dev_store(struct mddev *mddev, const char *buf, size_t len)
 {
@@ -4532,7 +4534,6 @@ new_dev_store(struct mddev *mddev, const char *buf, size_t len)
            minor != MINOR(dev))
                return -EOVERFLOW;
 
-       flush_rdev_wq(mddev);
        err = mddev_lock(mddev);
        if (err)
                return err;
@@ -4560,7 +4561,7 @@ new_dev_store(struct mddev *mddev, const char *buf, size_t len)
        err = bind_rdev_to_array(rdev, mddev);
  out:
        if (err)
-               export_rdev(rdev);
+               export_rdev(rdev, mddev);
        mddev_unlock(mddev);
        if (!err)
                md_new_event();
@@ -4804,11 +4805,21 @@ action_store(struct mddev *mddev, const char *page, size_t len)
                        return -EINVAL;
                err = mddev_lock(mddev);
                if (!err) {
-                       if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
+                       if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
                                err =  -EBUSY;
-                       else {
+                       } else if (mddev->reshape_position == MaxSector ||
+                                  mddev->pers->check_reshape == NULL ||
+                                  mddev->pers->check_reshape(mddev)) {
                                clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
                                err = mddev->pers->start_reshape(mddev);
+                       } else {
+                               /*
+                                * If reshape is still in progress, and
+                                * md_check_recovery() can continue to reshape,
+                                * don't restart reshape because data can be
+                                * corrupted for raid456.
+                                */
+                               clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
                        }
                        mddev_unlock(mddev);
                }
@@ -5592,7 +5603,6 @@ struct mddev *md_alloc(dev_t dev, char *name)
         * removed (mddev_delayed_delete).
         */
        flush_workqueue(md_misc_wq);
-       flush_workqueue(md_rdev_misc_wq);
 
        mutex_lock(&disks_mutex);
        mddev = mddev_alloc(dev);
@@ -6269,10 +6279,12 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
        }
        if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
                set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-       if (mddev->sync_thread)
-               /* Thread might be blocked waiting for metadata update
-                * which will now never happen */
-               wake_up_process(mddev->sync_thread->tsk);
+
+       /*
+        * Thread might be blocked waiting for metadata update which will now
+        * never happen
+        */
+       md_wakeup_thread_directly(mddev->sync_thread);
 
        if (mddev->external && test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
                return -EBUSY;
@@ -6333,10 +6345,12 @@ static int do_md_stop(struct mddev *mddev, int mode,
        }
        if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
                set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-       if (mddev->sync_thread)
-               /* Thread might be blocked waiting for metadata update
-                * which will now never happen */
-               wake_up_process(mddev->sync_thread->tsk);
+
+       /*
+        * Thread might be blocked waiting for metadata update which will now
+        * never happen
+        */
+       md_wakeup_thread_directly(mddev->sync_thread);
 
        mddev_unlock(mddev);
        wait_event(resync_wait, (mddev->sync_thread == NULL &&
@@ -6498,7 +6512,7 @@ static void autorun_devices(int part)
                        rdev_for_each_list(rdev, tmp, &candidates) {
                                list_del_init(&rdev->same_set);
                                if (bind_rdev_to_array(rdev, mddev))
-                                       export_rdev(rdev);
+                                       export_rdev(rdev, mddev);
                        }
                        autorun_array(mddev);
                        mddev_unlock(mddev);
@@ -6508,7 +6522,7 @@ static void autorun_devices(int part)
                 */
                rdev_for_each_list(rdev, tmp, &candidates) {
                        list_del_init(&rdev->same_set);
-                       export_rdev(rdev);
+                       export_rdev(rdev, mddev);
                }
                mddev_put(mddev);
        }
@@ -6696,13 +6710,13 @@ int md_add_new_disk(struct mddev *mddev, struct mdu_disk_info_s *info)
                                pr_warn("md: %pg has different UUID to %pg\n",
                                        rdev->bdev,
                                        rdev0->bdev);
-                               export_rdev(rdev);
+                               export_rdev(rdev, mddev);
                                return -EINVAL;
                        }
                }
                err = bind_rdev_to_array(rdev, mddev);
                if (err)
-                       export_rdev(rdev);
+                       export_rdev(rdev, mddev);
                return err;
        }
 
@@ -6733,7 +6747,6 @@ int md_add_new_disk(struct mddev *mddev, struct mdu_disk_info_s *info)
                        if (info->state & (1<<MD_DISK_SYNC)  &&
                            info->raid_disk < mddev->raid_disks) {
                                rdev->raid_disk = info->raid_disk;
-                               set_bit(In_sync, &rdev->flags);
                                clear_bit(Bitmap_sync, &rdev->flags);
                        } else
                                rdev->raid_disk = -1;
@@ -6746,7 +6759,7 @@ int md_add_new_disk(struct mddev *mddev, struct mdu_disk_info_s *info)
                        /* This was a hot-add request, but events doesn't
                         * match, so reject it.
                         */
-                       export_rdev(rdev);
+                       export_rdev(rdev, mddev);
                        return -EINVAL;
                }
 
@@ -6772,7 +6785,7 @@ int md_add_new_disk(struct mddev *mddev, struct mdu_disk_info_s *info)
                                }
                        }
                        if (has_journal || mddev->bitmap) {
-                               export_rdev(rdev);
+                               export_rdev(rdev, mddev);
                                return -EBUSY;
                        }
                        set_bit(Journal, &rdev->flags);
@@ -6787,7 +6800,7 @@ int md_add_new_disk(struct mddev *mddev, struct mdu_disk_info_s *info)
                                /* --add initiated by this node */
                                err = md_cluster_ops->add_new_disk(mddev, rdev);
                                if (err) {
-                                       export_rdev(rdev);
+                                       export_rdev(rdev, mddev);
                                        return err;
                                }
                        }
@@ -6797,7 +6810,7 @@ int md_add_new_disk(struct mddev *mddev, struct mdu_disk_info_s *info)
                err = bind_rdev_to_array(rdev, mddev);
 
                if (err)
-                       export_rdev(rdev);
+                       export_rdev(rdev, mddev);
 
                if (mddev_is_clustered(mddev)) {
                        if (info->state & (1 << MD_DISK_CANDIDATE)) {
@@ -6860,7 +6873,7 @@ int md_add_new_disk(struct mddev *mddev, struct mdu_disk_info_s *info)
 
                err = bind_rdev_to_array(rdev, mddev);
                if (err) {
-                       export_rdev(rdev);
+                       export_rdev(rdev, mddev);
                        return err;
                }
        }
@@ -6985,7 +6998,7 @@ static int hot_add_disk(struct mddev *mddev, dev_t dev)
        return 0;
 
 abort_export:
-       export_rdev(rdev);
+       export_rdev(rdev, mddev);
        return err;
 }
 
@@ -7486,7 +7499,7 @@ static int __md_set_array_info(struct mddev *mddev, void __user *argp)
        return err;
 }
 
-static int md_ioctl(struct block_device *bdev, fmode_t mode,
+static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
                        unsigned int cmd, unsigned long arg)
 {
        int err = 0;
@@ -7555,9 +7568,6 @@ static int md_ioctl(struct block_device *bdev, fmode_t mode,
 
        }
 
-       if (cmd == ADD_NEW_DISK || cmd == HOT_ADD_DISK)
-               flush_rdev_wq(mddev);
-
        if (cmd == HOT_REMOVE_DISK)
                /* need to ensure recovery thread has run */
                wait_event_interruptible_timeout(mddev->sb_wait,
@@ -7718,7 +7728,7 @@ out:
        return err;
 }
 #ifdef CONFIG_COMPAT
-static int md_compat_ioctl(struct block_device *bdev, fmode_t mode,
+static int md_compat_ioctl(struct block_device *bdev, blk_mode_t mode,
                    unsigned int cmd, unsigned long arg)
 {
        switch (cmd) {
@@ -7767,13 +7777,13 @@ out_unlock:
        return err;
 }
 
-static int md_open(struct block_device *bdev, fmode_t mode)
+static int md_open(struct gendisk *disk, blk_mode_t mode)
 {
        struct mddev *mddev;
        int err;
 
        spin_lock(&all_mddevs_lock);
-       mddev = mddev_get(bdev->bd_disk->private_data);
+       mddev = mddev_get(disk->private_data);
        spin_unlock(&all_mddevs_lock);
        if (!mddev)
                return -ENODEV;
@@ -7789,7 +7799,7 @@ static int md_open(struct block_device *bdev, fmode_t mode)
        atomic_inc(&mddev->openers);
        mutex_unlock(&mddev->open_mutex);
 
-       bdev_check_media_change(bdev);
+       disk_check_media_change(disk);
        return 0;
 
 out_unlock:
@@ -7799,7 +7809,7 @@ out:
        return err;
 }
 
-static void md_release(struct gendisk *disk, fmode_t mode)
+static void md_release(struct gendisk *disk)
 {
        struct mddev *mddev = disk->private_data;
 
@@ -7886,13 +7896,29 @@ static int md_thread(void *arg)
        return 0;
 }
 
-void md_wakeup_thread(struct md_thread *thread)
+static void md_wakeup_thread_directly(struct md_thread __rcu *thread)
 {
-       if (thread) {
-               pr_debug("md: waking up MD thread %s.\n", thread->tsk->comm);
-               set_bit(THREAD_WAKEUP, &thread->flags);
-               wake_up(&thread->wqueue);
+       struct md_thread *t;
+
+       rcu_read_lock();
+       t = rcu_dereference(thread);
+       if (t)
+               wake_up_process(t->tsk);
+       rcu_read_unlock();
+}
+
+void md_wakeup_thread(struct md_thread __rcu *thread)
+{
+       struct md_thread *t;
+
+       rcu_read_lock();
+       t = rcu_dereference(thread);
+       if (t) {
+               pr_debug("md: waking up MD thread %s.\n", t->tsk->comm);
+               set_bit(THREAD_WAKEUP, &t->flags);
+               wake_up(&t->wqueue);
        }
+       rcu_read_unlock();
 }
 EXPORT_SYMBOL(md_wakeup_thread);
 
@@ -7922,22 +7948,15 @@ struct md_thread *md_register_thread(void (*run) (struct md_thread *),
 }
 EXPORT_SYMBOL(md_register_thread);
 
-void md_unregister_thread(struct md_thread **threadp)
+void md_unregister_thread(struct md_thread __rcu **threadp)
 {
-       struct md_thread *thread;
+       struct md_thread *thread = rcu_dereference_protected(*threadp, true);
 
-       /*
-        * Locking ensures that mddev_unlock does not wake_up a
-        * non-existent thread
-        */
-       spin_lock(&pers_lock);
-       thread = *threadp;
-       if (!thread) {
-               spin_unlock(&pers_lock);
+       if (!thread)
                return;
-       }
-       *threadp = NULL;
-       spin_unlock(&pers_lock);
+
+       rcu_assign_pointer(*threadp, NULL);
+       synchronize_rcu();
 
        pr_debug("interrupting MD-thread pid %d\n", task_pid_nr(thread->tsk));
        kthread_stop(thread->tsk);
@@ -9100,6 +9119,7 @@ void md_do_sync(struct md_thread *thread)
        spin_unlock(&mddev->lock);
 
        wake_up(&resync_wait);
+       wake_up(&mddev->sb_wait);
        md_wakeup_thread(mddev->thread);
        return;
 }
@@ -9202,9 +9222,8 @@ static void md_start_sync(struct work_struct *ws)
 {
        struct mddev *mddev = container_of(ws, struct mddev, del_work);
 
-       mddev->sync_thread = md_register_thread(md_do_sync,
-                                               mddev,
-                                               "resync");
+       rcu_assign_pointer(mddev->sync_thread,
+                          md_register_thread(md_do_sync, mddev, "resync"));
        if (!mddev->sync_thread) {
                pr_warn("%s: could not start resync thread...\n",
                        mdname(mddev));
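
For orientation (not part of the patch): the RCU conversion spread across the md.c and md.h hunks follows one lifecycle. The snippet below is not new code, only a condensed restatement of calls that all appear in this series' hunks.

/* 1. Publish: the writer installs the thread under reconfig_mutex. */
rcu_assign_pointer(mddev->sync_thread,
		   md_register_thread(md_do_sync, mddev, "resync"));

/* 2. Read: wakers dereference under rcu_read_lock(), tolerating NULL. */
rcu_read_lock();
t = rcu_dereference(mddev->thread);
if (t)
	wake_up(&t->wqueue);
rcu_read_unlock();

/* 3. Retire: clear the pointer, wait for readers, then stop the task. */
rcu_assign_pointer(*threadp, NULL);
synchronize_rcu();
kthread_stop(thread->tsk);
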
@@ -9619,9 +9638,10 @@ static int __init md_init(void)
        if (!md_misc_wq)
                goto err_misc_wq;
 
-       md_rdev_misc_wq = alloc_workqueue("md_rdev_misc", 0, 0);
-       if (!md_rdev_misc_wq)
-               goto err_rdev_misc_wq;
+       md_bitmap_wq = alloc_workqueue("md_bitmap", WQ_MEM_RECLAIM | WQ_UNBOUND,
+                                      0);
+       if (!md_bitmap_wq)
+               goto err_bitmap_wq;
 
        ret = __register_blkdev(MD_MAJOR, "md", md_probe);
        if (ret < 0)
@@ -9641,8 +9661,8 @@ static int __init md_init(void)
 err_mdp:
        unregister_blkdev(MD_MAJOR, "md");
 err_md:
-       destroy_workqueue(md_rdev_misc_wq);
-err_rdev_misc_wq:
+       destroy_workqueue(md_bitmap_wq);
+err_bitmap_wq:
        destroy_workqueue(md_misc_wq);
 err_misc_wq:
        destroy_workqueue(md_wq);
@@ -9938,8 +9958,8 @@ static __exit void md_exit(void)
        }
        spin_unlock(&all_mddevs_lock);
 
-       destroy_workqueue(md_rdev_misc_wq);
        destroy_workqueue(md_misc_wq);
+       destroy_workqueue(md_bitmap_wq);
        destroy_workqueue(md_wq);
 }
 
index fd8f260..bfd2306 100644 (file)
@@ -122,8 +122,6 @@ struct md_rdev {
 
        struct serial_in_rdev *serial;  /* used for raid1 io serialization */
 
-       struct work_struct del_work;    /* used for delayed sysfs removal */
-
        struct kernfs_node *sysfs_state; /* handle for 'state'
                                           * sysfs entry */
        /* handle for 'unacknowledged_bad_blocks' sysfs dentry */
@@ -367,8 +365,8 @@ struct mddev {
        int                             new_chunk_sectors;
        int                             reshape_backwards;
 
-       struct md_thread                *thread;        /* management thread */
-       struct md_thread                *sync_thread;   /* doing resync or reconstruct */
+       struct md_thread __rcu          *thread;        /* management thread */
+       struct md_thread __rcu          *sync_thread;   /* doing resync or reconstruct */
 
        /* 'last_sync_action' is initialized to "none".  It is set when a
         * sync operation (i.e "data-check", "requested-resync", "resync",
@@ -531,6 +529,14 @@ struct mddev {
        unsigned int                    good_device_nr; /* good device num within cluster raid */
        unsigned int                    noio_flag; /* for memalloc scope API */
 
+       /*
+        * Temporarily store rdevs that will finally be removed once
+        * reconfig_mutex is unlocked.
+        */
+       struct list_head                deleting;
+       /* Protect the deleting list */
+       struct mutex                    delete_mutex;
+
        bool    has_superblocks:1;
        bool    fail_last_dev:1;
        bool    serialize_policy:1;
@@ -555,6 +561,23 @@ enum recovery_flags {
        MD_RESYNCING_REMOTE,    /* remote node is running resync thread */
 };
 
+enum md_ro_state {
+       MD_RDWR,
+       MD_RDONLY,
+       MD_AUTO_READ,
+       MD_MAX_STATE
+};
+
+static inline bool md_is_rdwr(struct mddev *mddev)
+{
+       return (mddev->ro == MD_RDWR);
+}
+
+static inline bool is_md_suspended(struct mddev *mddev)
+{
+       return percpu_ref_is_dying(&mddev->active_io);
+}
+
 static inline int __must_check mddev_lock(struct mddev *mddev)
 {
        return mutex_lock_interruptible(&mddev->reconfig_mutex);
@@ -614,6 +637,7 @@ struct md_personality
        int (*start_reshape) (struct mddev *mddev);
        void (*finish_reshape) (struct mddev *mddev);
        void (*update_reshape_pos) (struct mddev *mddev);
+       void (*prepare_suspend) (struct mddev *mddev);
        /* quiesce suspends or resumes internal processing.
         * 1 - stop new actions and wait for action io to complete
         * 0 - return to normal behaviour
@@ -734,8 +758,8 @@ extern struct md_thread *md_register_thread(
        void (*run)(struct md_thread *thread),
        struct mddev *mddev,
        const char *name);
-extern void md_unregister_thread(struct md_thread **threadp);
-extern void md_wakeup_thread(struct md_thread *thread);
+extern void md_unregister_thread(struct md_thread __rcu **threadp);
+extern void md_wakeup_thread(struct md_thread __rcu *thread);
 extern void md_check_recovery(struct mddev *mddev);
 extern void md_reap_sync_thread(struct mddev *mddev);
 extern int mddev_init_writes_pending(struct mddev *mddev);
@@ -828,6 +852,7 @@ struct mdu_array_info_s;
 struct mdu_disk_info_s;
 
 extern int mdp_major;
+extern struct workqueue_struct *md_bitmap_wq;
 void md_autostart_arrays(int part);
 int md_set_array_info(struct mddev *mddev, struct mdu_array_info_s *info);
 int md_add_new_disk(struct mddev *mddev, struct mdu_disk_info_s *info);
index e61f6ca..169ebe2 100644 (file)
@@ -21,6 +21,7 @@
 #define IO_MADE_GOOD ((struct bio *)2)
 
 #define BIO_SPECIAL(bio) ((unsigned long)bio <= 2)
+#define MAX_PLUG_BIO 32
 
 /* for managing resync I/O pages */
 struct resync_pages {
@@ -31,6 +32,7 @@ struct resync_pages {
 struct raid1_plug_cb {
        struct blk_plug_cb      cb;
        struct bio_list         pending;
+       unsigned int            count;
 };
 
 static void rbio_pool_free(void *rbio, void *data)
@@ -101,11 +103,73 @@ static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp,
                struct page *page = resync_fetch_page(rp, idx);
                int len = min_t(int, size, PAGE_SIZE);
 
-               /*
-                * won't fail because the vec table is big
-                * enough to hold all these pages
-                */
-               bio_add_page(bio, page, len, 0);
+               if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
+                       bio->bi_status = BLK_STS_RESOURCE;
+                       bio_endio(bio);
+                       return;
+               }
+
                size -= len;
        } while (idx++ < RESYNC_PAGES && size > 0);
 }
+
+static inline void raid1_submit_write(struct bio *bio)
+{
+       struct md_rdev *rdev = (struct md_rdev *)bio->bi_bdev;
+
+       bio->bi_next = NULL;
+       bio_set_dev(bio, rdev->bdev);
+       if (test_bit(Faulty, &rdev->flags))
+               bio_io_error(bio);
+       else if (unlikely(bio_op(bio) ==  REQ_OP_DISCARD &&
+                         !bdev_max_discard_sectors(bio->bi_bdev)))
+               /* Just ignore it */
+               bio_endio(bio);
+       else
+               submit_bio_noacct(bio);
+}
+
+static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
+                                     blk_plug_cb_fn unplug, int copies)
+{
+       struct raid1_plug_cb *plug = NULL;
+       struct blk_plug_cb *cb;
+
+       /*
+        * If the bitmap is not enabled, it's safe to submit the io directly,
+        * and doing so gives optimal performance.
+        */
+       if (!md_bitmap_enabled(mddev->bitmap)) {
+               raid1_submit_write(bio);
+               return true;
+       }
+
+       cb = blk_check_plugged(unplug, mddev, sizeof(*plug));
+       if (!cb)
+               return false;
+
+       plug = container_of(cb, struct raid1_plug_cb, cb);
+       bio_list_add(&plug->pending, bio);
+       if (++plug->count / MAX_PLUG_BIO >= copies) {
+               list_del(&cb->list);
+               cb->callback(cb, false);
+       }
+
+       return true;
+}
+
+/*
+ * current->bio_list is set when this is called from submit_bio() context; in
+ * that case bitmap io would be added to that list and would have to wait for
+ * the current io submission to finish, while the current io submission must
+ * wait for the bitmap io to be done. To avoid such a deadlock, submit the
+ * bitmap io asynchronously.
+ */
+static inline void raid1_prepare_flush_writes(struct bitmap *bitmap)
+{
+       if (current->bio_list)
+               md_bitmap_unplug_async(bitmap);
+       else
+               md_bitmap_unplug(bitmap);
+}
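
For reference (not part of the patch): with MAX_PLUG_BIO defined as 32 above, raid1_add_bio_to_plug() flushes the plug early once count / MAX_PLUG_BIO reaches copies, i.e. after copies * MAX_PLUG_BIO queued bios. A trivial standalone check of that integer arithmetic, assuming copies = 2 purely for illustration:

#include <stdio.h>

#define MAX_PLUG_BIO 32

int main(void)
{
	unsigned int count = 0;
	int copies = 2;	/* example: two-copy raid1 */

	while (!(++count / MAX_PLUG_BIO >= copies))
		;
	printf("plug flushed after %u queued bios\n", count);	/* prints 64 */
	return 0;
}
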
index 68a9e2d..dd25832 100644 (file)
@@ -794,22 +794,13 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 static void flush_bio_list(struct r1conf *conf, struct bio *bio)
 {
        /* flush any pending bitmap writes to disk before proceeding w/ I/O */
-       md_bitmap_unplug(conf->mddev->bitmap);
+       raid1_prepare_flush_writes(conf->mddev->bitmap);
        wake_up(&conf->wait_barrier);
 
        while (bio) { /* submit pending writes */
                struct bio *next = bio->bi_next;
-               struct md_rdev *rdev = (void *)bio->bi_bdev;
-               bio->bi_next = NULL;
-               bio_set_dev(bio, rdev->bdev);
-               if (test_bit(Faulty, &rdev->flags)) {
-                       bio_io_error(bio);
-               } else if (unlikely((bio_op(bio) == REQ_OP_DISCARD) &&
-                                   !bdev_max_discard_sectors(bio->bi_bdev)))
-                       /* Just ignore it */
-                       bio_endio(bio);
-               else
-                       submit_bio_noacct(bio);
+
+               raid1_submit_write(bio);
                bio = next;
                cond_resched();
        }
@@ -1147,7 +1138,10 @@ static void alloc_behind_master_bio(struct r1bio *r1_bio,
                if (unlikely(!page))
                        goto free_pages;
 
-               bio_add_page(behind_bio, page, len, 0);
+               if (!bio_add_page(behind_bio, page, len, 0)) {
+                       put_page(page);
+                       goto free_pages;
+               }
 
                size -= len;
                i++;
@@ -1175,7 +1169,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
        struct r1conf *conf = mddev->private;
        struct bio *bio;
 
-       if (from_schedule || current->bio_list) {
+       if (from_schedule) {
                spin_lock_irq(&conf->device_lock);
                bio_list_merge(&conf->pending_bio_list, &plug->pending);
                spin_unlock_irq(&conf->device_lock);
@@ -1343,8 +1337,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
        struct bitmap *bitmap = mddev->bitmap;
        unsigned long flags;
        struct md_rdev *blocked_rdev;
-       struct blk_plug_cb *cb;
-       struct raid1_plug_cb *plug = NULL;
        int first_clone;
        int max_sectors;
        bool write_behind = false;
@@ -1573,15 +1565,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
                                              r1_bio->sector);
                /* flush_pending_writes() needs access to the rdev so...*/
                mbio->bi_bdev = (void *)rdev;
-
-               cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug));
-               if (cb)
-                       plug = container_of(cb, struct raid1_plug_cb, cb);
-               else
-                       plug = NULL;
-               if (plug) {
-                       bio_list_add(&plug->pending, mbio);
-               } else {
+               if (!raid1_add_bio_to_plug(mddev, mbio, raid1_unplug, disks)) {
                        spin_lock_irqsave(&conf->device_lock, flags);
                        bio_list_add(&conf->pending_bio_list, mbio);
                        spin_unlock_irqrestore(&conf->device_lock, flags);
@@ -2914,7 +2898,7 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
                                 * won't fail because the vec table is big
                                 * enough to hold all these pages
                                 */
-                               bio_add_page(bio, page, len, 0);
+                               __bio_add_page(bio, page, len, 0);
                        }
                }
                nr_sectors += len>>9;
@@ -3084,7 +3068,8 @@ static struct r1conf *setup_conf(struct mddev *mddev)
        }
 
        err = -ENOMEM;
-       conf->thread = md_register_thread(raid1d, mddev, "raid1");
+       rcu_assign_pointer(conf->thread,
+                          md_register_thread(raid1d, mddev, "raid1"));
        if (!conf->thread)
                goto abort;
 
@@ -3177,8 +3162,8 @@ static int raid1_run(struct mddev *mddev)
        /*
         * Ok, everything is just fine now
         */
-       mddev->thread = conf->thread;
-       conf->thread = NULL;
+       rcu_assign_pointer(mddev->thread, conf->thread);
+       rcu_assign_pointer(conf->thread, NULL);
        mddev->private = conf;
        set_bit(MD_FAILFAST_SUPPORTED, &mddev->flags);
 
index ebb6788..468f189 100644 (file)
@@ -130,7 +130,7 @@ struct r1conf {
        /* When taking over an array from a different personality, we store
         * the new thread here until we fully activate the array.
         */
-       struct md_thread        *thread;
+       struct md_thread __rcu  *thread;
 
        /* Keep track of cluster resync window to send to other
         * nodes.
index 4fcfcb3..d0de8c9 100644 (file)
@@ -779,8 +779,16 @@ static struct md_rdev *read_balance(struct r10conf *conf,
                disk = r10_bio->devs[slot].devnum;
                rdev = rcu_dereference(conf->mirrors[disk].replacement);
                if (rdev == NULL || test_bit(Faulty, &rdev->flags) ||
-                   r10_bio->devs[slot].addr + sectors > rdev->recovery_offset)
+                   r10_bio->devs[slot].addr + sectors >
+                   rdev->recovery_offset) {
+                       /*
+                        * Read replacement first to prevent reading both rdev
+                        * and replacement as NULL during replacement replace
+                        * rdev.
+                        */
+                       smp_mb();
                        rdev = rcu_dereference(conf->mirrors[disk].rdev);
+               }
                if (rdev == NULL ||
                    test_bit(Faulty, &rdev->flags))
                        continue;
@@ -902,25 +910,15 @@ static void flush_pending_writes(struct r10conf *conf)
                __set_current_state(TASK_RUNNING);
 
                blk_start_plug(&plug);
-               /* flush any pending bitmap writes to disk
-                * before proceeding w/ I/O */
-               md_bitmap_unplug(conf->mddev->bitmap);
+               raid1_prepare_flush_writes(conf->mddev->bitmap);
                wake_up(&conf->wait_barrier);
 
                while (bio) { /* submit pending writes */
                        struct bio *next = bio->bi_next;
-                       struct md_rdev *rdev = (void*)bio->bi_bdev;
-                       bio->bi_next = NULL;
-                       bio_set_dev(bio, rdev->bdev);
-                       if (test_bit(Faulty, &rdev->flags)) {
-                               bio_io_error(bio);
-                       } else if (unlikely((bio_op(bio) ==  REQ_OP_DISCARD) &&
-                                           !bdev_max_discard_sectors(bio->bi_bdev)))
-                               /* Just ignore it */
-                               bio_endio(bio);
-                       else
-                               submit_bio_noacct(bio);
+
+                       raid1_submit_write(bio);
                        bio = next;
+                       cond_resched();
                }
                blk_finish_plug(&plug);
        } else
@@ -982,6 +980,7 @@ static void lower_barrier(struct r10conf *conf)
 static bool stop_waiting_barrier(struct r10conf *conf)
 {
        struct bio_list *bio_list = current->bio_list;
+       struct md_thread *thread;
 
        /* barrier is dropped */
        if (!conf->barrier)
@@ -997,12 +996,14 @@ static bool stop_waiting_barrier(struct r10conf *conf)
            (!bio_list_empty(&bio_list[0]) || !bio_list_empty(&bio_list[1])))
                return true;
 
+       /* daemon thread must exist while handling io */
+       thread = rcu_dereference_protected(conf->mddev->thread, true);
        /*
         * move on if io is issued from raid10d(), nr_pending is not released
         * from original io(see handle_read_error()). All raise barrier is
         * blocked until this io is done.
         */
-       if (conf->mddev->thread->tsk == current) {
+       if (thread->tsk == current) {
                WARN_ON_ONCE(atomic_read(&conf->nr_pending) == 0);
                return true;
        }
@@ -1113,7 +1114,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
        struct r10conf *conf = mddev->private;
        struct bio *bio;
 
-       if (from_schedule || current->bio_list) {
+       if (from_schedule) {
                spin_lock_irq(&conf->device_lock);
                bio_list_merge(&conf->pending_bio_list, &plug->pending);
                spin_unlock_irq(&conf->device_lock);
@@ -1125,23 +1126,15 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
 
        /* we aren't scheduling, so we can do the write-out directly. */
        bio = bio_list_get(&plug->pending);
-       md_bitmap_unplug(mddev->bitmap);
+       raid1_prepare_flush_writes(mddev->bitmap);
        wake_up(&conf->wait_barrier);
 
        while (bio) { /* submit pending writes */
                struct bio *next = bio->bi_next;
-               struct md_rdev *rdev = (void*)bio->bi_bdev;
-               bio->bi_next = NULL;
-               bio_set_dev(bio, rdev->bdev);
-               if (test_bit(Faulty, &rdev->flags)) {
-                       bio_io_error(bio);
-               } else if (unlikely((bio_op(bio) ==  REQ_OP_DISCARD) &&
-                                   !bdev_max_discard_sectors(bio->bi_bdev)))
-                       /* Just ignore it */
-                       bio_endio(bio);
-               else
-                       submit_bio_noacct(bio);
+
+               raid1_submit_write(bio);
                bio = next;
+               cond_resched();
        }
        kfree(plug);
 }
@@ -1282,8 +1275,6 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio,
        const blk_opf_t do_sync = bio->bi_opf & REQ_SYNC;
        const blk_opf_t do_fua = bio->bi_opf & REQ_FUA;
        unsigned long flags;
-       struct blk_plug_cb *cb;
-       struct raid1_plug_cb *plug = NULL;
        struct r10conf *conf = mddev->private;
        struct md_rdev *rdev;
        int devnum = r10_bio->devs[n_copy].devnum;
@@ -1323,14 +1314,7 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio,
 
        atomic_inc(&r10_bio->remaining);
 
-       cb = blk_check_plugged(raid10_unplug, mddev, sizeof(*plug));
-       if (cb)
-               plug = container_of(cb, struct raid1_plug_cb, cb);
-       else
-               plug = NULL;
-       if (plug) {
-               bio_list_add(&plug->pending, mbio);
-       } else {
+       if (!raid1_add_bio_to_plug(mddev, mbio, raid10_unplug, conf->copies)) {
                spin_lock_irqsave(&conf->device_lock, flags);
                bio_list_add(&conf->pending_bio_list, mbio);
                spin_unlock_irqrestore(&conf->device_lock, flags);
@@ -1479,9 +1463,15 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 
        for (i = 0;  i < conf->copies; i++) {
                int d = r10_bio->devs[i].devnum;
-               struct md_rdev *rdev = rcu_dereference(conf->mirrors[d].rdev);
-               struct md_rdev *rrdev = rcu_dereference(
-                       conf->mirrors[d].replacement);
+               struct md_rdev *rdev, *rrdev;
+
+               rrdev = rcu_dereference(conf->mirrors[d].replacement);
+               /*
+                * Read replacement first to prevent reading both rdev and
+                * replacement as NULL while the replacement is replacing rdev.
+                */
+               smp_mb();
+               rdev = rcu_dereference(conf->mirrors[d].rdev);
                if (rdev == rrdev)
                        rrdev = NULL;
                if (rdev && (test_bit(Faulty, &rdev->flags)))
@@ -2148,9 +2138,10 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 {
        struct r10conf *conf = mddev->private;
        int err = -EEXIST;
-       int mirror;
+       int mirror, repl_slot = -1;
        int first = 0;
        int last = conf->geo.raid_disks - 1;
+       struct raid10_info *p;
 
        if (mddev->recovery_cp < MaxSector)
                /* only hot-add to in-sync arrays, as recovery is
@@ -2173,23 +2164,14 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
        else
                mirror = first;
        for ( ; mirror <= last ; mirror++) {
-               struct raid10_info *p = &conf->mirrors[mirror];
+               p = &conf->mirrors[mirror];
                if (p->recovery_disabled == mddev->recovery_disabled)
                        continue;
                if (p->rdev) {
-                       if (!test_bit(WantReplacement, &p->rdev->flags) ||
-                           p->replacement != NULL)
-                               continue;
-                       clear_bit(In_sync, &rdev->flags);
-                       set_bit(Replacement, &rdev->flags);
-                       rdev->raid_disk = mirror;
-                       err = 0;
-                       if (mddev->gendisk)
-                               disk_stack_limits(mddev->gendisk, rdev->bdev,
-                                                 rdev->data_offset << 9);
-                       conf->fullsync = 1;
-                       rcu_assign_pointer(p->replacement, rdev);
-                       break;
+                       if (test_bit(WantReplacement, &p->rdev->flags) &&
+                           p->replacement == NULL && repl_slot < 0)
+                               repl_slot = mirror;
+                       continue;
                }
 
                if (mddev->gendisk)
@@ -2206,6 +2188,19 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
                break;
        }
 
+       if (err && repl_slot >= 0) {
+               p = &conf->mirrors[repl_slot];
+               clear_bit(In_sync, &rdev->flags);
+               set_bit(Replacement, &rdev->flags);
+               rdev->raid_disk = repl_slot;
+               err = 0;
+               if (mddev->gendisk)
+                       disk_stack_limits(mddev->gendisk, rdev->bdev,
+                                         rdev->data_offset << 9);
+               conf->fullsync = 1;
+               rcu_assign_pointer(p->replacement, rdev);
+       }
+
        print_conf(conf);
        return err;
 }
@@ -3303,6 +3298,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
        int chunks_skipped = 0;
        sector_t chunk_mask = conf->geo.chunk_mask;
        int page_idx = 0;
+       int error_disk = -1;
 
        /*
         * Allow skipping a full rebuild for incremental assembly
@@ -3386,8 +3382,21 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
                return reshape_request(mddev, sector_nr, skipped);
 
        if (chunks_skipped >= conf->geo.raid_disks) {
-               /* if there has been nothing to do on any drive,
-                * then there is nothing to do at all..
+               pr_err("md/raid10:%s: %s fails\n", mdname(mddev),
+                       test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?  "resync" : "recovery");
+               if (error_disk >= 0 &&
+                   !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
+                       /*
+                        * Recovery failed: set mirrors.recovery_disabled so
+                        * that the device isn't added back there.
+                        */
+                       conf->mirrors[error_disk].recovery_disabled =
+                                               mddev->recovery_disabled;
+                       return 0;
+               }
+               /*
+                * if there has been nothing to do on any drive,
+                * then there is nothing to do at all.
                 */
                *skipped = 1;
                return (max_sector - sector_nr) + sectors_skipped;
@@ -3437,8 +3446,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
                        sector_t sect;
                        int must_sync;
                        int any_working;
-                       int need_recover = 0;
-                       int need_replace = 0;
                        struct raid10_info *mirror = &conf->mirrors[i];
                        struct md_rdev *mrdev, *mreplace;
 
@@ -3446,15 +3453,13 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
                        mrdev = rcu_dereference(mirror->rdev);
                        mreplace = rcu_dereference(mirror->replacement);
 
-                       if (mrdev != NULL &&
-                           !test_bit(Faulty, &mrdev->flags) &&
-                           !test_bit(In_sync, &mrdev->flags))
-                               need_recover = 1;
-                       if (mreplace != NULL &&
-                           !test_bit(Faulty, &mreplace->flags))
-                               need_replace = 1;
+                       if (mrdev && (test_bit(Faulty, &mrdev->flags) ||
+                           test_bit(In_sync, &mrdev->flags)))
+                               mrdev = NULL;
+                       if (mreplace && test_bit(Faulty, &mreplace->flags))
+                               mreplace = NULL;
 
-                       if (!need_recover && !need_replace) {
+                       if (!mrdev && !mreplace) {
                                rcu_read_unlock();
                                continue;
                        }
@@ -3470,8 +3475,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
                                rcu_read_unlock();
                                continue;
                        }
-                       if (mreplace && test_bit(Faulty, &mreplace->flags))
-                               mreplace = NULL;
                        /* Unless we are doing a full sync, or a replacement
                         * we only need to recover the block if it is set in
                         * the bitmap
@@ -3490,7 +3493,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
                                rcu_read_unlock();
                                continue;
                        }
-                       atomic_inc(&mrdev->nr_pending);
+                       if (mrdev)
+                               atomic_inc(&mrdev->nr_pending);
                        if (mreplace)
                                atomic_inc(&mreplace->nr_pending);
                        rcu_read_unlock();
@@ -3577,7 +3581,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
                                r10_bio->devs[1].devnum = i;
                                r10_bio->devs[1].addr = to_addr;
 
-                               if (need_recover) {
+                               if (mrdev) {
                                        bio = r10_bio->devs[1].bio;
                                        bio->bi_next = biolist;
                                        biolist = bio;
@@ -3594,11 +3598,11 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
                                bio = r10_bio->devs[1].repl_bio;
                                if (bio)
                                        bio->bi_end_io = NULL;
-                               /* Note: if need_replace, then bio
+                               /* Note: if mreplace is not NULL, then bio
                                 * cannot be NULL as r10buf_pool_alloc will
                                 * have allocated it.
                                 */
-                               if (!need_replace)
+                               if (!mreplace)
                                        break;
                                bio->bi_next = biolist;
                                biolist = bio;
@@ -3622,7 +3626,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
                                        for (k = 0; k < conf->copies; k++)
                                                if (r10_bio->devs[k].devnum == i)
                                                        break;
-                                       if (!test_bit(In_sync,
+                                       if (mrdev && !test_bit(In_sync,
                                                      &mrdev->flags)
                                            && !rdev_set_badblocks(
                                                    mrdev,
@@ -3643,17 +3647,21 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
                                                       mdname(mddev));
                                        mirror->recovery_disabled
                                                = mddev->recovery_disabled;
+                               } else {
+                                       error_disk = i;
                                }
                                put_buf(r10_bio);
                                if (rb2)
                                        atomic_dec(&rb2->remaining);
                                r10_bio = rb2;
-                               rdev_dec_pending(mrdev, mddev);
+                               if (mrdev)
+                                       rdev_dec_pending(mrdev, mddev);
                                if (mreplace)
                                        rdev_dec_pending(mreplace, mddev);
                                break;
                        }
-                       rdev_dec_pending(mrdev, mddev);
+                       if (mrdev)
+                               rdev_dec_pending(mrdev, mddev);
                        if (mreplace)
                                rdev_dec_pending(mreplace, mddev);
                        if (r10_bio->devs[0].bio->bi_opf & MD_FAILFAST) {
@@ -3819,11 +3827,11 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
                for (bio= biolist ; bio ; bio=bio->bi_next) {
                        struct resync_pages *rp = get_resync_pages(bio);
                        page = resync_fetch_page(rp, page_idx);
-                       /*
-                        * won't fail because the vec table is big enough
-                        * to hold all these pages
-                        */
-                       bio_add_page(bio, page, len, 0);
+                       if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
+                               bio->bi_status = BLK_STS_RESOURCE;
+                               bio_endio(bio);
+                               goto giveup;
+                       }
                }
                nr_sectors += len>>9;
                sector_nr += len>>9;
@@ -4107,7 +4115,8 @@ static struct r10conf *setup_conf(struct mddev *mddev)
        atomic_set(&conf->nr_pending, 0);
 
        err = -ENOMEM;
-       conf->thread = md_register_thread(raid10d, mddev, "raid10");
+       rcu_assign_pointer(conf->thread,
+                          md_register_thread(raid10d, mddev, "raid10"));
        if (!conf->thread)
                goto out;
 
@@ -4152,8 +4161,8 @@ static int raid10_run(struct mddev *mddev)
        if (!conf)
                goto out;
 
-       mddev->thread = conf->thread;
-       conf->thread = NULL;
+       rcu_assign_pointer(mddev->thread, conf->thread);
+       rcu_assign_pointer(conf->thread, NULL);
 
        if (mddev_is_clustered(conf->mddev)) {
                int fc, fo;
@@ -4296,8 +4305,8 @@ static int raid10_run(struct mddev *mddev)
                clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
                set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
                set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
-               mddev->sync_thread = md_register_thread(md_do_sync, mddev,
-                                                       "reshape");
+               rcu_assign_pointer(mddev->sync_thread,
+                       md_register_thread(md_do_sync, mddev, "reshape"));
                if (!mddev->sync_thread)
                        goto out_free_conf;
        }
@@ -4698,8 +4707,8 @@ out:
        set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
        set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
 
-       mddev->sync_thread = md_register_thread(md_do_sync, mddev,
-                                               "reshape");
+       rcu_assign_pointer(mddev->sync_thread,
+                          md_register_thread(md_do_sync, mddev, "reshape"));
        if (!mddev->sync_thread) {
                ret = -EAGAIN;
                goto abort;
@@ -4997,11 +5006,11 @@ read_more:
                if (len > PAGE_SIZE)
                        len = PAGE_SIZE;
                for (bio = blist; bio ; bio = bio->bi_next) {
-                       /*
-                        * won't fail because the vec table is big enough
-                        * to hold all these pages
-                        */
-                       bio_add_page(bio, page, len, 0);
+                       if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
+                               bio->bi_status = BLK_STS_RESOURCE;
+                               bio_endio(bio);
+                               return sectors_done;
+                       }
                }
                sector_nr += len >> 9;
                nr_sectors += len >> 9;
index 8c072ce..63e48b1 100644 (file)
@@ -100,7 +100,7 @@ struct r10conf {
        /* When taking over an array from a different personality, we store
         * the new thread here until we fully activate the array.
         */
-       struct md_thread        *thread;
+       struct md_thread __rcu  *thread;
 
        /*
         * Keep track of cluster resync window to send to other nodes.
index 46182b9..47ba7d9 100644 (file)
@@ -120,7 +120,7 @@ struct r5l_log {
        struct bio_set bs;
        mempool_t meta_pool;
 
-       struct md_thread *reclaim_thread;
+       struct md_thread __rcu *reclaim_thread;
        unsigned long reclaim_target;   /* number of space that need to be
                                         * reclaimed.  if it's 0, reclaim spaces
                                         * used by io_units which are in
@@ -792,7 +792,7 @@ static struct r5l_io_unit *r5l_new_meta(struct r5l_log *log)
        io->current_bio = r5l_bio_alloc(log);
        io->current_bio->bi_end_io = r5l_log_endio;
        io->current_bio->bi_private = io;
-       bio_add_page(io->current_bio, io->meta_page, PAGE_SIZE, 0);
+       __bio_add_page(io->current_bio, io->meta_page, PAGE_SIZE, 0);
 
        r5_reserve_log_entry(log, io);
 
@@ -1576,17 +1576,18 @@ void r5l_wake_reclaim(struct r5l_log *log, sector_t space)
 
 void r5l_quiesce(struct r5l_log *log, int quiesce)
 {
-       struct mddev *mddev;
+       struct mddev *mddev = log->rdev->mddev;
+       struct md_thread *thread = rcu_dereference_protected(
+               log->reclaim_thread, lockdep_is_held(&mddev->reconfig_mutex));
 
        if (quiesce) {
                /* make sure r5l_write_super_and_discard_space exits */
-               mddev = log->rdev->mddev;
                wake_up(&mddev->sb_wait);
-               kthread_park(log->reclaim_thread->tsk);
+               kthread_park(thread->tsk);
                r5l_wake_reclaim(log, MaxSector);
                r5l_do_reclaim(log);
        } else
-               kthread_unpark(log->reclaim_thread->tsk);
+               kthread_unpark(thread->tsk);
 }
 
 bool r5l_log_disk_error(struct r5conf *conf)
@@ -3063,6 +3064,7 @@ void r5c_update_on_rdev_error(struct mddev *mddev, struct md_rdev *rdev)
 int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
 {
        struct r5l_log *log;
+       struct md_thread *thread;
        int ret;
 
        pr_debug("md/raid:%s: using device %pg as journal\n",
@@ -3121,11 +3123,13 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
        spin_lock_init(&log->tree_lock);
        INIT_RADIX_TREE(&log->big_stripe_tree, GFP_NOWAIT | __GFP_NOWARN);
 
-       log->reclaim_thread = md_register_thread(r5l_reclaim_thread,
-                                                log->rdev->mddev, "reclaim");
-       if (!log->reclaim_thread)
+       thread = md_register_thread(r5l_reclaim_thread, log->rdev->mddev,
+                                   "reclaim");
+       if (!thread)
                goto reclaim_thread;
-       log->reclaim_thread->timeout = R5C_RECLAIM_WAKEUP_INTERVAL;
+
+       thread->timeout = R5C_RECLAIM_WAKEUP_INTERVAL;
+       rcu_assign_pointer(log->reclaim_thread, thread);
 
        init_waitqueue_head(&log->iounit_wait);
 
index e495939..eaea57a 100644 (file)
@@ -465,7 +465,7 @@ static void ppl_submit_iounit(struct ppl_io_unit *io)
 
        bio->bi_end_io = ppl_log_endio;
        bio->bi_iter.bi_sector = log->next_io_sector;
-       bio_add_page(bio, io->header_page, PAGE_SIZE, 0);
+       __bio_add_page(bio, io->header_page, PAGE_SIZE, 0);
 
        pr_debug("%s: log->current_io_sector: %llu\n", __func__,
            (unsigned long long)log->next_io_sector);
@@ -496,7 +496,7 @@ static void ppl_submit_iounit(struct ppl_io_unit *io)
                                               prev->bi_opf, GFP_NOIO,
                                               &ppl_conf->bs);
                        bio->bi_iter.bi_sector = bio_end_sector(prev);
-                       bio_add_page(bio, sh->ppl_page, PAGE_SIZE, 0);
+                       __bio_add_page(bio, sh->ppl_page, PAGE_SIZE, 0);
 
                        bio_chain(bio, prev);
                        ppl_submit_iounit_bio(io, prev);
index 4739ed8..6615abf 100644 (file)
@@ -5516,7 +5516,7 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
 
        sector = raid5_compute_sector(conf, raid_bio->bi_iter.bi_sector, 0,
                                      &dd_idx, NULL);
-       end_sector = bio_end_sector(raid_bio);
+       end_sector = sector + bio_sectors(raid_bio);
 
        rcu_read_lock();
        if (r5c_big_stripe_cached(conf, sector))
@@ -5966,6 +5966,19 @@ out:
        return ret;
 }
 
+static bool reshape_inprogress(struct mddev *mddev)
+{
+       return test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
+              test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
+              !test_bit(MD_RECOVERY_DONE, &mddev->recovery) &&
+              !test_bit(MD_RECOVERY_INTR, &mddev->recovery);
+}
+
+static bool reshape_disabled(struct mddev *mddev)
+{
+       return is_md_suspended(mddev) || !md_is_rdwr(mddev);
+}
+
 static enum stripe_result make_stripe_request(struct mddev *mddev,
                struct r5conf *conf, struct stripe_request_ctx *ctx,
                sector_t logical_sector, struct bio *bi)
@@ -5997,7 +6010,8 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
                        if (ahead_of_reshape(mddev, logical_sector,
                                             conf->reshape_safe)) {
                                spin_unlock_irq(&conf->device_lock);
-                               return STRIPE_SCHEDULE_AND_RETRY;
+                               ret = STRIPE_SCHEDULE_AND_RETRY;
+                               goto out;
                        }
                }
                spin_unlock_irq(&conf->device_lock);
@@ -6076,6 +6090,15 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 
 out_release:
        raid5_release_stripe(sh);
+out:
+       if (ret == STRIPE_SCHEDULE_AND_RETRY && !reshape_inprogress(mddev) &&
+           reshape_disabled(mddev)) {
+               bi->bi_status = BLK_STS_IOERR;
+               ret = STRIPE_FAIL;
+               pr_err("md/raid456:%s: io failed across reshape position while reshape can't make progress.\n",
+                      mdname(mddev));
+       }
+
        return ret;
 }
 
@@ -7708,7 +7731,8 @@ static struct r5conf *setup_conf(struct mddev *mddev)
        }
 
        sprintf(pers_name, "raid%d", mddev->new_level);
-       conf->thread = md_register_thread(raid5d, mddev, pers_name);
+       rcu_assign_pointer(conf->thread,
+                          md_register_thread(raid5d, mddev, pers_name));
        if (!conf->thread) {
                pr_warn("md/raid:%s: couldn't allocate thread.\n",
                        mdname(mddev));
@@ -7931,8 +7955,8 @@ static int raid5_run(struct mddev *mddev)
        }
 
        conf->min_offset_diff = min_offset_diff;
-       mddev->thread = conf->thread;
-       conf->thread = NULL;
+       rcu_assign_pointer(mddev->thread, conf->thread);
+       rcu_assign_pointer(conf->thread, NULL);
        mddev->private = conf;
 
        for (i = 0; i < conf->raid_disks && conf->previous_raid_disks;
@@ -8029,8 +8053,8 @@ static int raid5_run(struct mddev *mddev)
                clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
                set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
                set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
-               mddev->sync_thread = md_register_thread(md_do_sync, mddev,
-                                                       "reshape");
+               rcu_assign_pointer(mddev->sync_thread,
+                       md_register_thread(md_do_sync, mddev, "reshape"));
                if (!mddev->sync_thread)
                        goto abort;
        }
@@ -8377,6 +8401,7 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
                p = conf->disks + disk;
                tmp = rdev_mdlock_deref(mddev, p->rdev);
                if (test_bit(WantReplacement, &tmp->flags) &&
+                   mddev->reshape_position == MaxSector &&
                    p->replacement == NULL) {
                        clear_bit(In_sync, &rdev->flags);
                        set_bit(Replacement, &rdev->flags);
@@ -8500,6 +8525,7 @@ static int raid5_start_reshape(struct mddev *mddev)
        struct r5conf *conf = mddev->private;
        struct md_rdev *rdev;
        int spares = 0;
+       int i;
        unsigned long flags;
 
        if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
@@ -8511,6 +8537,13 @@ static int raid5_start_reshape(struct mddev *mddev)
        if (has_failed(conf))
                return -EINVAL;
 
+       /* raid5 can't handle concurrent reshape and recovery */
+       if (mddev->recovery_cp < MaxSector)
+               return -EBUSY;
+       for (i = 0; i < conf->raid_disks; i++)
+               if (rdev_mdlock_deref(mddev, conf->disks[i].replacement))
+                       return -EBUSY;
+
        rdev_for_each(rdev, mddev) {
                if (!test_bit(In_sync, &rdev->flags)
                    && !test_bit(Faulty, &rdev->flags))
@@ -8607,8 +8640,8 @@ static int raid5_start_reshape(struct mddev *mddev)
        clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
        set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
        set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
-       mddev->sync_thread = md_register_thread(md_do_sync, mddev,
-                                               "reshape");
+       rcu_assign_pointer(mddev->sync_thread,
+                          md_register_thread(md_do_sync, mddev, "reshape"));
        if (!mddev->sync_thread) {
                mddev->recovery = 0;
                spin_lock_irq(&conf->device_lock);
@@ -9043,6 +9076,22 @@ static int raid5_start(struct mddev *mddev)
        return r5l_start(conf->log);
 }
 
+static void raid5_prepare_suspend(struct mddev *mddev)
+{
+       struct r5conf *conf = mddev->private;
+
+       wait_event(mddev->sb_wait, !reshape_inprogress(mddev) ||
+                                   percpu_ref_is_zero(&mddev->active_io));
+       if (percpu_ref_is_zero(&mddev->active_io))
+               return;
+
+       /*
+        * Reshape is not in progress and the array is suspended; IO that is
+        * waiting for reshape can never complete.
+        */
+       wake_up(&conf->wait_for_overlap);
+}
+
 static struct md_personality raid6_personality =
 {
        .name           = "raid6",
@@ -9063,6 +9112,7 @@ static struct md_personality raid6_personality =
        .check_reshape  = raid6_check_reshape,
        .start_reshape  = raid5_start_reshape,
        .finish_reshape = raid5_finish_reshape,
+       .prepare_suspend = raid5_prepare_suspend,
        .quiesce        = raid5_quiesce,
        .takeover       = raid6_takeover,
        .change_consistency_policy = raid5_change_consistency_policy,
@@ -9087,6 +9137,7 @@ static struct md_personality raid5_personality =
        .check_reshape  = raid5_check_reshape,
        .start_reshape  = raid5_start_reshape,
        .finish_reshape = raid5_finish_reshape,
+       .prepare_suspend = raid5_prepare_suspend,
        .quiesce        = raid5_quiesce,
        .takeover       = raid5_takeover,
        .change_consistency_policy = raid5_change_consistency_policy,
@@ -9112,6 +9163,7 @@ static struct md_personality raid4_personality =
        .check_reshape  = raid5_check_reshape,
        .start_reshape  = raid5_start_reshape,
        .finish_reshape = raid5_finish_reshape,
+       .prepare_suspend = raid5_prepare_suspend,
        .quiesce        = raid5_quiesce,
        .takeover       = raid4_takeover,
        .change_consistency_policy = raid5_change_consistency_policy,
index e873938..f197071 100644 (file)
@@ -679,7 +679,7 @@ struct r5conf {
        /* When taking over an array from a different personality, we store
         * the new thread here until we fully activate the array.
         */
-       struct md_thread        *thread;
+       struct md_thread __rcu  *thread;
        struct list_head        temp_inactive_list[NR_STRIPE_HASH_LOCKS];
        struct r5worker_group   *worker_groups;
        int                     group_cnt;
index 769ea6b..241b162 100644 (file)
@@ -1091,7 +1091,8 @@ void cec_received_msg_ts(struct cec_adapter *adap,
        mutex_lock(&adap->lock);
        dprintk(2, "%s: %*ph\n", __func__, msg->len, msg->msg);
 
-       adap->last_initiator = 0xff;
+       if (!adap->transmit_in_progress)
+               adap->last_initiator = 0xff;
 
        /* Check if this message was for us (directed or broadcast). */
        if (!cec_msg_is_broadcast(msg)) {
@@ -1585,7 +1586,7 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block)
  *
  * This function is called with adap->lock held.
  */
-static int cec_adap_enable(struct cec_adapter *adap)
+int cec_adap_enable(struct cec_adapter *adap)
 {
        bool enable;
        int ret = 0;
@@ -1595,6 +1596,9 @@ static int cec_adap_enable(struct cec_adapter *adap)
        if (adap->needs_hpd)
                enable = enable && adap->phys_addr != CEC_PHYS_ADDR_INVALID;
 
+       if (adap->devnode.unregistered)
+               enable = false;
+
        if (enable == adap->is_enabled)
                return 0;
 
index af358e9..7e153c5 100644 (file)
@@ -191,6 +191,8 @@ static void cec_devnode_unregister(struct cec_adapter *adap)
        mutex_lock(&adap->lock);
        __cec_s_phys_addr(adap, CEC_PHYS_ADDR_INVALID, false);
        __cec_s_log_addrs(adap, NULL, false);
+       // Disable the adapter (since adap->devnode.unregistered is true)
+       cec_adap_enable(adap);
        mutex_unlock(&adap->lock);
 
        cdev_device_del(&devnode->cdev, &devnode->dev);
index b78df93..ed1f8c6 100644 (file)
@@ -47,6 +47,7 @@ int cec_monitor_pin_cnt_inc(struct cec_adapter *adap);
 void cec_monitor_pin_cnt_dec(struct cec_adapter *adap);
 int cec_adap_status(struct seq_file *file, void *priv);
 int cec_thread_func(void *_adap);
+int cec_adap_enable(struct cec_adapter *adap);
 void __cec_s_phys_addr(struct cec_adapter *adap, u16 phys_addr, bool block);
 int __cec_s_log_addrs(struct cec_adapter *adap,
                      struct cec_log_addrs *log_addrs, bool block);
index c2d2792..baf6454 100644 (file)
@@ -151,6 +151,12 @@ struct dvb_ca_private {
 
        /* mutex serializing ioctls */
        struct mutex ioctl_mutex;
+
+       /* A mutex used when a device is disconnected */
+       struct mutex remove_mutex;
+
+       /* Whether the device is disconnected */
+       int exit;
 };
 
 static void dvb_ca_private_free(struct dvb_ca_private *ca)
@@ -187,7 +193,7 @@ static void dvb_ca_en50221_thread_wakeup(struct dvb_ca_private *ca);
 static int dvb_ca_en50221_read_data(struct dvb_ca_private *ca, int slot,
                                    u8 *ebuf, int ecount);
 static int dvb_ca_en50221_write_data(struct dvb_ca_private *ca, int slot,
-                                    u8 *ebuf, int ecount);
+                                    u8 *ebuf, int ecount, int size_write_flag);
 
 /**
  * findstr - Safely find needle in haystack.
@@ -370,7 +376,7 @@ static int dvb_ca_en50221_link_init(struct dvb_ca_private *ca, int slot)
        ret = dvb_ca_en50221_wait_if_status(ca, slot, STATUSREG_FR, HZ / 10);
        if (ret)
                return ret;
-       ret = dvb_ca_en50221_write_data(ca, slot, buf, 2);
+       ret = dvb_ca_en50221_write_data(ca, slot, buf, 2, CMDREG_SW);
        if (ret != 2)
                return -EIO;
        ret = ca->pub->write_cam_control(ca->pub, slot, CTRLIF_COMMAND, IRQEN);
@@ -778,11 +784,13 @@ exit:
  * @buf: The data in this buffer is treated as a complete link-level packet to
  *      be written.
  * @bytes_write: Size of ebuf.
+ * @size_write_flag: A flag on the Command Register which says whether the link
+ * size information will be written or not.
  *
  * return: Number of bytes written, or < 0 on error.
  */
 static int dvb_ca_en50221_write_data(struct dvb_ca_private *ca, int slot,
-                                    u8 *buf, int bytes_write)
+                                    u8 *buf, int bytes_write, int size_write_flag)
 {
        struct dvb_ca_slot *sl = &ca->slot_info[slot];
        int status;
@@ -817,7 +825,7 @@ static int dvb_ca_en50221_write_data(struct dvb_ca_private *ca, int slot,
 
        /* OK, set HC bit */
        status = ca->pub->write_cam_control(ca->pub, slot, CTRLIF_COMMAND,
-                                           IRQEN | CMDREG_HC);
+                                           IRQEN | CMDREG_HC | size_write_flag);
        if (status)
                goto exit;
 
@@ -1508,7 +1516,7 @@ static ssize_t dvb_ca_en50221_io_write(struct file *file,
 
                        mutex_lock(&sl->slot_lock);
                        status = dvb_ca_en50221_write_data(ca, slot, fragbuf,
-                                                          fraglen + 2);
+                                                          fraglen + 2, 0);
                        mutex_unlock(&sl->slot_lock);
                        if (status == (fraglen + 2)) {
                                written = 1;
@@ -1709,12 +1717,22 @@ static int dvb_ca_en50221_io_open(struct inode *inode, struct file *file)
 
        dprintk("%s\n", __func__);
 
-       if (!try_module_get(ca->pub->owner))
+       mutex_lock(&ca->remove_mutex);
+
+       if (ca->exit) {
+               mutex_unlock(&ca->remove_mutex);
+               return -ENODEV;
+       }
+
+       if (!try_module_get(ca->pub->owner)) {
+               mutex_unlock(&ca->remove_mutex);
                return -EIO;
+       }
 
        err = dvb_generic_open(inode, file);
        if (err < 0) {
                module_put(ca->pub->owner);
+               mutex_unlock(&ca->remove_mutex);
                return err;
        }
 
@@ -1739,6 +1757,7 @@ static int dvb_ca_en50221_io_open(struct inode *inode, struct file *file)
 
        dvb_ca_private_get(ca);
 
+       mutex_unlock(&ca->remove_mutex);
        return 0;
 }
 
@@ -1758,6 +1777,8 @@ static int dvb_ca_en50221_io_release(struct inode *inode, struct file *file)
 
        dprintk("%s\n", __func__);
 
+       mutex_lock(&ca->remove_mutex);
+
        /* mark the CA device as closed */
        ca->open = 0;
        dvb_ca_en50221_thread_update_delay(ca);
@@ -1768,6 +1789,13 @@ static int dvb_ca_en50221_io_release(struct inode *inode, struct file *file)
 
        dvb_ca_private_put(ca);
 
+       if (dvbdev->users == 1 && ca->exit == 1) {
+               mutex_unlock(&ca->remove_mutex);
+               wake_up(&dvbdev->wait_queue);
+       } else {
+               mutex_unlock(&ca->remove_mutex);
+       }
+
        return err;
 }
 
@@ -1891,6 +1919,7 @@ int dvb_ca_en50221_init(struct dvb_adapter *dvb_adapter,
        }
 
        mutex_init(&ca->ioctl_mutex);
+       mutex_init(&ca->remove_mutex);
 
        if (signal_pending(current)) {
                ret = -EINTR;
@@ -1933,6 +1962,14 @@ void dvb_ca_en50221_release(struct dvb_ca_en50221 *pubca)
 
        dprintk("%s\n", __func__);
 
+       mutex_lock(&ca->remove_mutex);
+       ca->exit = 1;
+       mutex_unlock(&ca->remove_mutex);
+
+       if (ca->dvbdev->users < 1)
+               wait_event(ca->dvbdev->wait_queue,
+                               ca->dvbdev->users == 1);
+
        /* shutdown the thread if there was one */
        kthread_stop(ca->thread);
 
index 398c862..7c4d86b 100644 (file)
@@ -115,12 +115,12 @@ static inline int dvb_dmx_swfilter_payload(struct dvb_demux_feed *feed,
 
        cc = buf[3] & 0x0f;
        ccok = ((feed->cc + 1) & 0x0f) == cc;
-       feed->cc = cc;
        if (!ccok) {
                set_buf_flags(feed, DMX_BUFFER_FLAG_DISCONTINUITY_DETECTED);
                dprintk_sect_loss("missed packet: %d instead of %d!\n",
                                  cc, (feed->cc + 1) & 0x0f);
        }
+       feed->cc = cc;
 
        if (buf[1] & 0x40)      // PUSI ?
                feed->peslen = 0xfffa;
@@ -300,7 +300,6 @@ static int dvb_dmx_swfilter_section_packet(struct dvb_demux_feed *feed,
 
        cc = buf[3] & 0x0f;
        ccok = ((feed->cc + 1) & 0x0f) == cc;
-       feed->cc = cc;
 
        if (buf[3] & 0x20) {
                /* adaption field present, check for discontinuity_indicator */
@@ -336,6 +335,7 @@ static int dvb_dmx_swfilter_section_packet(struct dvb_demux_feed *feed,
                feed->pusi_seen = false;
                dvb_dmx_swfilter_section_new(feed);
        }
+       feed->cc = cc;
 
        if (buf[1] & 0x40) {
                /* PUSI=1 (is set), section boundary is here */
index cc0a789..9293b05 100644 (file)
@@ -293,14 +293,22 @@ static int dvb_frontend_get_event(struct dvb_frontend *fe,
        }
 
        if (events->eventw == events->eventr) {
-               int ret;
+               struct wait_queue_entry wait;
+               int ret = 0;
 
                if (flags & O_NONBLOCK)
                        return -EWOULDBLOCK;
 
-               ret = wait_event_interruptible(events->wait_queue,
-                                              dvb_frontend_test_event(fepriv, events));
-
+               init_waitqueue_entry(&wait, current);
+               add_wait_queue(&events->wait_queue, &wait);
+               while (!dvb_frontend_test_event(fepriv, events)) {
+                       wait_woken(&wait, TASK_INTERRUPTIBLE, 0);
+                       if (signal_pending(current)) {
+                               ret = -ERESTARTSYS;
+                               break;
+                       }
+               }
+               remove_wait_queue(&events->wait_queue, &wait);
                if (ret < 0)
                        return ret;
        }
index 8a2febf..8bb8dd3 100644 (file)
@@ -1564,15 +1564,43 @@ static long dvb_net_ioctl(struct file *file,
        return dvb_usercopy(file, cmd, arg, dvb_net_do_ioctl);
 }
 
+static int locked_dvb_net_open(struct inode *inode, struct file *file)
+{
+       struct dvb_device *dvbdev = file->private_data;
+       struct dvb_net *dvbnet = dvbdev->priv;
+       int ret;
+
+       if (mutex_lock_interruptible(&dvbnet->remove_mutex))
+               return -ERESTARTSYS;
+
+       if (dvbnet->exit) {
+               mutex_unlock(&dvbnet->remove_mutex);
+               return -ENODEV;
+       }
+
+       ret = dvb_generic_open(inode, file);
+
+       mutex_unlock(&dvbnet->remove_mutex);
+
+       return ret;
+}
+
 static int dvb_net_close(struct inode *inode, struct file *file)
 {
        struct dvb_device *dvbdev = file->private_data;
        struct dvb_net *dvbnet = dvbdev->priv;
 
+       mutex_lock(&dvbnet->remove_mutex);
+
        dvb_generic_release(inode, file);
 
-       if(dvbdev->users == 1 && dvbnet->exit == 1)
+       if (dvbdev->users == 1 && dvbnet->exit == 1) {
+               mutex_unlock(&dvbnet->remove_mutex);
                wake_up(&dvbdev->wait_queue);
+       } else {
+               mutex_unlock(&dvbnet->remove_mutex);
+       }
+
        return 0;
 }
 
@@ -1580,7 +1608,7 @@ static int dvb_net_close(struct inode *inode, struct file *file)
 static const struct file_operations dvb_net_fops = {
        .owner = THIS_MODULE,
        .unlocked_ioctl = dvb_net_ioctl,
-       .open = dvb_generic_open,
+       .open = locked_dvb_net_open,
        .release = dvb_net_close,
        .llseek = noop_llseek,
 };
@@ -1599,10 +1627,13 @@ void dvb_net_release (struct dvb_net *dvbnet)
 {
        int i;
 
+       mutex_lock(&dvbnet->remove_mutex);
        dvbnet->exit = 1;
+       mutex_unlock(&dvbnet->remove_mutex);
+
        if (dvbnet->dvbdev->users < 1)
                wait_event(dvbnet->dvbdev->wait_queue,
-                               dvbnet->dvbdev->users==1);
+                               dvbnet->dvbdev->users == 1);
 
        dvb_unregister_device(dvbnet->dvbdev);
 
@@ -1621,6 +1652,7 @@ int dvb_net_init (struct dvb_adapter *adap, struct dvb_net *dvbnet,
        int i;
 
        mutex_init(&dvbnet->ioctl_mutex);
+       mutex_init(&dvbnet->remove_mutex);
        dvbnet->demux = dmx;
 
        for (i=0; i<DVB_NET_DEVICES_MAX; i++)
index e9b3ce0..a4b05e3 100644 (file)
@@ -27,6 +27,7 @@
 #include <media/tuner.h>
 
 static DEFINE_MUTEX(dvbdev_mutex);
+static LIST_HEAD(dvbdevfops_list);
 static int dvbdev_debug;
 
 module_param(dvbdev_debug, int, 0644);
@@ -453,14 +454,15 @@ int dvb_register_device(struct dvb_adapter *adap, struct dvb_device **pdvbdev,
                        enum dvb_device_type type, int demux_sink_pads)
 {
        struct dvb_device *dvbdev;
-       struct file_operations *dvbdevfops;
+       struct file_operations *dvbdevfops = NULL;
+       struct dvbdevfops_node *node = NULL, *new_node = NULL;
        struct device *clsdev;
        int minor;
        int id, ret;
 
        mutex_lock(&dvbdev_register_lock);
 
-       if ((id = dvbdev_get_free_id (adap, type)) < 0){
+       if ((id = dvbdev_get_free_id (adap, type)) < 0) {
                mutex_unlock(&dvbdev_register_lock);
                *pdvbdev = NULL;
                pr_err("%s: couldn't find free device id\n", __func__);
@@ -468,18 +470,45 @@ int dvb_register_device(struct dvb_adapter *adap, struct dvb_device **pdvbdev,
        }
 
        *pdvbdev = dvbdev = kzalloc(sizeof(*dvbdev), GFP_KERNEL);
-
        if (!dvbdev){
                mutex_unlock(&dvbdev_register_lock);
                return -ENOMEM;
        }
 
-       dvbdevfops = kmemdup(template->fops, sizeof(*dvbdevfops), GFP_KERNEL);
+       /*
+        * When a device of the same type is probe()d more than once,
+        * the first allocated fops are used. This prevents memory leaks
+        * that can occur when the same device is probe()d repeatedly.
+        */
+       list_for_each_entry(node, &dvbdevfops_list, list_head) {
+               if (node->fops->owner == adap->module &&
+                               node->type == type &&
+                               node->template == template) {
+                       dvbdevfops = node->fops;
+                       break;
+               }
+       }
 
-       if (!dvbdevfops){
-               kfree (dvbdev);
-               mutex_unlock(&dvbdev_register_lock);
-               return -ENOMEM;
+       if (dvbdevfops == NULL) {
+               dvbdevfops = kmemdup(template->fops, sizeof(*dvbdevfops), GFP_KERNEL);
+               if (!dvbdevfops) {
+                       kfree(dvbdev);
+                       mutex_unlock(&dvbdev_register_lock);
+                       return -ENOMEM;
+               }
+
+               new_node = kzalloc(sizeof(struct dvbdevfops_node), GFP_KERNEL);
+               if (!new_node) {
+                       kfree(dvbdevfops);
+                       kfree(dvbdev);
+                       mutex_unlock(&dvbdev_register_lock);
+                       return -ENOMEM;
+               }
+
+               new_node->fops = dvbdevfops;
+               new_node->type = type;
+               new_node->template = template;
+               list_add_tail (&new_node->list_head, &dvbdevfops_list);
        }
 
        memcpy(dvbdev, template, sizeof(struct dvb_device));
@@ -490,20 +519,20 @@ int dvb_register_device(struct dvb_adapter *adap, struct dvb_device **pdvbdev,
        dvbdev->priv = priv;
        dvbdev->fops = dvbdevfops;
        init_waitqueue_head (&dvbdev->wait_queue);
-
        dvbdevfops->owner = adap->module;
-
        list_add_tail (&dvbdev->list_head, &adap->device_list);
-
        down_write(&minor_rwsem);
 #ifdef CONFIG_DVB_DYNAMIC_MINORS
        for (minor = 0; minor < MAX_DVB_MINORS; minor++)
                if (dvb_minors[minor] == NULL)
                        break;
-
        if (minor == MAX_DVB_MINORS) {
+               if (new_node) {
+                       list_del (&new_node->list_head);
+                       kfree(dvbdevfops);
+                       kfree(new_node);
+               }
                list_del (&dvbdev->list_head);
-               kfree(dvbdevfops);
                kfree(dvbdev);
                up_write(&minor_rwsem);
                mutex_unlock(&dvbdev_register_lock);
@@ -512,41 +541,47 @@ int dvb_register_device(struct dvb_adapter *adap, struct dvb_device **pdvbdev,
 #else
        minor = nums2minor(adap->num, type, id);
 #endif
-
        dvbdev->minor = minor;
        dvb_minors[minor] = dvb_device_get(dvbdev);
        up_write(&minor_rwsem);
-
        ret = dvb_register_media_device(dvbdev, type, minor, demux_sink_pads);
        if (ret) {
                pr_err("%s: dvb_register_media_device failed to create the mediagraph\n",
                      __func__);
-
+               if (new_node) {
+                       list_del (&new_node->list_head);
+                       kfree(dvbdevfops);
+                       kfree(new_node);
+               }
                dvb_media_device_free(dvbdev);
                list_del (&dvbdev->list_head);
-               kfree(dvbdevfops);
                kfree(dvbdev);
                mutex_unlock(&dvbdev_register_lock);
                return ret;
        }
 
-       mutex_unlock(&dvbdev_register_lock);
-
        clsdev = device_create(dvb_class, adap->device,
                               MKDEV(DVB_MAJOR, minor),
                               dvbdev, "dvb%d.%s%d", adap->num, dnames[type], id);
        if (IS_ERR(clsdev)) {
                pr_err("%s: failed to create device dvb%d.%s%d (%ld)\n",
                       __func__, adap->num, dnames[type], id, PTR_ERR(clsdev));
+               if (new_node) {
+                       list_del (&new_node->list_head);
+                       kfree(dvbdevfops);
+                       kfree(new_node);
+               }
                dvb_media_device_free(dvbdev);
                list_del (&dvbdev->list_head);
-               kfree(dvbdevfops);
                kfree(dvbdev);
+               mutex_unlock(&dvbdev_register_lock);
                return PTR_ERR(clsdev);
        }
+
        dprintk("DVB: register adapter%d/%s%d @ minor: %i (0x%02x)\n",
                adap->num, dnames[type], id, minor, minor);
 
+       mutex_unlock(&dvbdev_register_lock);
        return 0;
 }
 EXPORT_SYMBOL(dvb_register_device);
@@ -575,7 +610,6 @@ static void dvb_free_device(struct kref *ref)
 {
        struct dvb_device *dvbdev = container_of(ref, struct dvb_device, ref);
 
-       kfree (dvbdev->fops);
        kfree (dvbdev);
 }
 
@@ -1081,9 +1115,17 @@ error:
 
 static void __exit exit_dvbdev(void)
 {
+       struct dvbdevfops_node *node, *next;
+
        class_destroy(dvb_class);
        cdev_del(&dvb_device_cdev);
        unregister_chrdev_region(MKDEV(DVB_MAJOR, 0), MAX_DVB_MINORS);
+
+       list_for_each_entry_safe(node, next, &dvbdevfops_list, list_head) {
+               list_del (&node->list_head);
+               kfree(node->fops);
+               kfree(node);
+       }
 }
 
 subsys_initcall(init_dvbdev);
index 1f1753f..0782f83 100644 (file)
@@ -798,7 +798,7 @@ MODULE_DEVICE_TABLE(i2c, mn88443x_i2c_id);
 static struct i2c_driver mn88443x_driver = {
        .driver = {
                .name = "mn88443x",
-               .of_match_table = of_match_ptr(mn88443x_of_match),
+               .of_match_table = mn88443x_of_match,
        },
        .probe_new = mn88443x_probe,
        .remove   = mn88443x_remove,
index 8287851..d85bfbb 100644 (file)
@@ -697,7 +697,7 @@ static void netup_unidvb_dma_fini(struct netup_unidvb_dev *ndev, int num)
        netup_unidvb_dma_enable(dma, 0);
        msleep(50);
        cancel_work_sync(&dma->work);
-       del_timer(&dma->timeout);
+       del_timer_sync(&dma->timeout);
 }
 
 static int netup_unidvb_dma_setup(struct netup_unidvb_dev *ndev)
@@ -887,12 +887,7 @@ static int netup_unidvb_initdev(struct pci_dev *pci_dev,
                ndev->lmmio0, (u32)pci_resource_len(pci_dev, 0),
                ndev->lmmio1, (u32)pci_resource_len(pci_dev, 1),
                pci_dev->irq);
-       if (request_irq(pci_dev->irq, netup_unidvb_isr, IRQF_SHARED,
-                       "netup_unidvb", pci_dev) < 0) {
-               dev_err(&pci_dev->dev,
-                       "%s(): can't get IRQ %d\n", __func__, pci_dev->irq);
-               goto irq_request_err;
-       }
+
        ndev->dma_size = 2 * 188 *
                NETUP_DMA_BLOCKS_COUNT * NETUP_DMA_PACKETS_COUNT;
        ndev->dma_virt = dma_alloc_coherent(&pci_dev->dev,
@@ -933,6 +928,14 @@ static int netup_unidvb_initdev(struct pci_dev *pci_dev,
                dev_err(&pci_dev->dev, "netup_unidvb: DMA setup failed\n");
                goto dma_setup_err;
        }
+
+       if (request_irq(pci_dev->irq, netup_unidvb_isr, IRQF_SHARED,
+                       "netup_unidvb", pci_dev) < 0) {
+               dev_err(&pci_dev->dev,
+                       "%s(): can't get IRQ %d\n", __func__, pci_dev->irq);
+               goto dma_setup_err;
+       }
+
        dev_info(&pci_dev->dev,
                "netup_unidvb: device has been initialized\n");
        return 0;
@@ -951,8 +954,6 @@ spi_setup_err:
        dma_free_coherent(&pci_dev->dev, ndev->dma_size,
                        ndev->dma_virt, ndev->dma_phys);
 dma_alloc_err:
-       free_irq(pci_dev->irq, pci_dev);
-irq_request_err:
        iounmap(ndev->lmmio1);
 pci_bar1_error:
        iounmap(ndev->lmmio0);
index de23627..43d85a5 100644 (file)
@@ -254,7 +254,7 @@ static int vpu_core_register(struct device *dev, struct vpu_core *core)
        if (vpu_core_is_exist(vpu, core))
                return 0;
 
-       core->workqueue = alloc_workqueue("vpu", WQ_UNBOUND | WQ_MEM_RECLAIM, 1);
+       core->workqueue = alloc_ordered_workqueue("vpu", WQ_MEM_RECLAIM);
        if (!core->workqueue) {
                dev_err(core->dev, "fail to alloc workqueue\n");
                return -ENOMEM;
index 6773b88..a48edb4 100644 (file)
@@ -740,7 +740,7 @@ int vpu_v4l2_open(struct file *file, struct vpu_inst *inst)
        inst->fh.ctrl_handler = &inst->ctrl_handler;
        file->private_data = &inst->fh;
        inst->state = VPU_CODEC_STATE_DEINIT;
-       inst->workqueue = alloc_workqueue("vpu_inst", WQ_UNBOUND | WQ_MEM_RECLAIM, 1);
+       inst->workqueue = alloc_ordered_workqueue("vpu_inst", WQ_MEM_RECLAIM);
        if (inst->workqueue) {
                INIT_WORK(&inst->msg_work, vpu_inst_run_work);
                ret = kfifo_init(&inst->msg_fifo,
index d013ea5..ac9a642 100644 (file)
@@ -3268,7 +3268,7 @@ static int coda_probe(struct platform_device *pdev)
                                                       &dev->iram.blob);
        }
 
-       dev->workqueue = alloc_workqueue("coda", WQ_UNBOUND | WQ_MEM_RECLAIM, 1);
+       dev->workqueue = alloc_ordered_workqueue("coda", WQ_MEM_RECLAIM);
        if (!dev->workqueue) {
                dev_err(&pdev->dev, "unable to alloc workqueue\n");
                ret = -ENOMEM;
index 75c92e2..19a4a08 100644 (file)
@@ -1035,7 +1035,6 @@ static int mdp_comp_sub_create(struct mdp_dev *mdp)
 {
        struct device *dev = &mdp->pdev->dev;
        struct device_node *node, *parent;
-       const struct mtk_mdp_driver_data *data = mdp->mdp_data;
 
        parent = dev->of_node->parent;
 
@@ -1045,7 +1044,7 @@ static int mdp_comp_sub_create(struct mdp_dev *mdp)
                int id, alias_id;
                struct mdp_comp *comp;
 
-               of_id = of_match_node(data->mdp_sub_comp_dt_ids, node);
+               of_id = of_match_node(mdp->mdp_data->mdp_sub_comp_dt_ids, node);
                if (!of_id)
                        continue;
                if (!of_device_is_available(node)) {
index 2999155..0fbd030 100644 (file)
@@ -584,6 +584,9 @@ static void mtk_init_vdec_params(struct mtk_vcodec_ctx *ctx)
 
        if (!(ctx->dev->dec_capability & VCODEC_CAPABILITY_4K_DISABLED)) {
                for (i = 0; i < num_supported_formats; i++) {
+                       if (mtk_video_formats[i].type != MTK_FMT_DEC)
+                               continue;
+
                        mtk_video_formats[i].frmsize.max_width =
                                VCODEC_DEC_4K_CODED_WIDTH;
                        mtk_video_formats[i].frmsize.max_height =
index 2385216..253e771 100644 (file)
@@ -378,8 +378,8 @@ static int mxc_isi_runtime_resume(struct device *dev)
 }
 
 static const struct dev_pm_ops mxc_isi_pm_ops = {
-       SET_SYSTEM_SLEEP_PM_OPS(mxc_isi_pm_suspend, mxc_isi_pm_resume)
-       SET_RUNTIME_PM_OPS(mxc_isi_runtime_suspend, mxc_isi_runtime_resume, NULL)
+       SYSTEM_SLEEP_PM_OPS(mxc_isi_pm_suspend, mxc_isi_pm_resume)
+       RUNTIME_PM_OPS(mxc_isi_runtime_suspend, mxc_isi_runtime_resume, NULL)
 };
 
 /* -----------------------------------------------------------------------------
@@ -528,7 +528,7 @@ static struct platform_driver mxc_isi_driver = {
        .driver = {
                .of_match_table = mxc_isi_of_match,
                .name           = MXC_ISI_DRIVER_NAME,
-               .pm             = &mxc_isi_pm_ops,
+               .pm             = pm_ptr(&mxc_isi_pm_ops),
        }
 };
 module_platform_driver(mxc_isi_driver);
index db538f3..19e80b9 100644 (file)
@@ -29,11 +29,10 @@ static inline void mxc_isi_write(struct mxc_isi_pipe *pipe, u32 reg, u32 val)
 
 void mxc_isi_channel_set_inbuf(struct mxc_isi_pipe *pipe, dma_addr_t dma_addr)
 {
-       mxc_isi_write(pipe, CHNL_IN_BUF_ADDR, dma_addr);
-#if CONFIG_ARCH_DMA_ADDR_T_64BIT
+       mxc_isi_write(pipe, CHNL_IN_BUF_ADDR, lower_32_bits(dma_addr));
        if (pipe->isi->pdata->has_36bit_dma)
-               mxc_isi_write(pipe, CHNL_IN_BUF_XTND_ADDR, dma_addr >> 32);
-#endif
+               mxc_isi_write(pipe, CHNL_IN_BUF_XTND_ADDR,
+                             upper_32_bits(dma_addr));
 }
 
 void mxc_isi_channel_set_outbuf(struct mxc_isi_pipe *pipe,
@@ -45,34 +44,36 @@ void mxc_isi_channel_set_outbuf(struct mxc_isi_pipe *pipe,
        val = mxc_isi_read(pipe, CHNL_OUT_BUF_CTRL);
 
        if (buf_id == MXC_ISI_BUF1) {
-               mxc_isi_write(pipe, CHNL_OUT_BUF1_ADDR_Y, dma_addrs[0]);
-               mxc_isi_write(pipe, CHNL_OUT_BUF1_ADDR_U, dma_addrs[1]);
-               mxc_isi_write(pipe, CHNL_OUT_BUF1_ADDR_V, dma_addrs[2]);
-#if CONFIG_ARCH_DMA_ADDR_T_64BIT
+               mxc_isi_write(pipe, CHNL_OUT_BUF1_ADDR_Y,
+                             lower_32_bits(dma_addrs[0]));
+               mxc_isi_write(pipe, CHNL_OUT_BUF1_ADDR_U,
+                             lower_32_bits(dma_addrs[1]));
+               mxc_isi_write(pipe, CHNL_OUT_BUF1_ADDR_V,
+                             lower_32_bits(dma_addrs[2]));
                if (pipe->isi->pdata->has_36bit_dma) {
                        mxc_isi_write(pipe, CHNL_Y_BUF1_XTND_ADDR,
-                                     dma_addrs[0] >> 32);
+                                     upper_32_bits(dma_addrs[0]));
                        mxc_isi_write(pipe, CHNL_U_BUF1_XTND_ADDR,
-                                     dma_addrs[1] >> 32);
+                                     upper_32_bits(dma_addrs[1]));
                        mxc_isi_write(pipe, CHNL_V_BUF1_XTND_ADDR,
-                                     dma_addrs[2] >> 32);
+                                     upper_32_bits(dma_addrs[2]));
                }
-#endif
                val ^= CHNL_OUT_BUF_CTRL_LOAD_BUF1_ADDR;
        } else  {
-               mxc_isi_write(pipe, CHNL_OUT_BUF2_ADDR_Y, dma_addrs[0]);
-               mxc_isi_write(pipe, CHNL_OUT_BUF2_ADDR_U, dma_addrs[1]);
-               mxc_isi_write(pipe, CHNL_OUT_BUF2_ADDR_V, dma_addrs[2]);
-#if CONFIG_ARCH_DMA_ADDR_T_64BIT
+               mxc_isi_write(pipe, CHNL_OUT_BUF2_ADDR_Y,
+                             lower_32_bits(dma_addrs[0]));
+               mxc_isi_write(pipe, CHNL_OUT_BUF2_ADDR_U,
+                             lower_32_bits(dma_addrs[1]));
+               mxc_isi_write(pipe, CHNL_OUT_BUF2_ADDR_V,
+                             lower_32_bits(dma_addrs[2]));
                if (pipe->isi->pdata->has_36bit_dma) {
                        mxc_isi_write(pipe, CHNL_Y_BUF2_XTND_ADDR,
-                                     dma_addrs[0] >> 32);
+                                     upper_32_bits(dma_addrs[0]));
                        mxc_isi_write(pipe, CHNL_U_BUF2_XTND_ADDR,
-                                     dma_addrs[1] >> 32);
+                                     upper_32_bits(dma_addrs[1]));
                        mxc_isi_write(pipe, CHNL_V_BUF2_XTND_ADDR,
-                                     dma_addrs[2] >> 32);
+                                     upper_32_bits(dma_addrs[2]));
                }
-#endif
                val ^= CHNL_OUT_BUF_CTRL_LOAD_BUF2_ADDR;
        }
 
index 898f321..8640db3 100644 (file)
@@ -353,7 +353,6 @@ static int video_get_subdev_format(struct camss_video *video,
        if (subdev == NULL)
                return -EPIPE;
 
-       memset(&fmt, 0, sizeof(fmt));
        fmt.pad = pad;
 
        ret = v4l2_subdev_call(subdev, pad, get_fmt, NULL, &fmt);
index 98bfd44..2a77353 100644 (file)
@@ -728,11 +728,9 @@ static int rvin_setup(struct rvin_dev *vin)
        case V4L2_FIELD_SEQ_TB:
        case V4L2_FIELD_SEQ_BT:
        case V4L2_FIELD_NONE:
-               vnmc = VNMC_IM_ODD_EVEN;
-               progressive = true;
-               break;
        case V4L2_FIELD_ALTERNATE:
                vnmc = VNMC_IM_ODD_EVEN;
+               progressive = true;
                break;
        default:
                vnmc = VNMC_IM_ODD;
@@ -1312,12 +1310,23 @@ static int rvin_mc_validate_format(struct rvin_dev *vin, struct v4l2_subdev *sd,
        }
 
        if (rvin_scaler_needed(vin)) {
+               /* Gen3 can't scale NV12 */
+               if (vin->info->model == RCAR_GEN3 &&
+                   vin->format.pixelformat == V4L2_PIX_FMT_NV12)
+                       return -EPIPE;
+
                if (!vin->scaler)
                        return -EPIPE;
        } else {
-               if (fmt.format.width != vin->format.width ||
-                   fmt.format.height != vin->format.height)
-                       return -EPIPE;
+               if (vin->format.pixelformat == V4L2_PIX_FMT_NV12) {
+                       if (ALIGN(fmt.format.width, 32) != vin->format.width ||
+                           ALIGN(fmt.format.height, 32) != vin->format.height)
+                               return -EPIPE;
+               } else {
+                       if (fmt.format.width != vin->format.width ||
+                           fmt.format.height != vin->format.height)
+                               return -EPIPE;
+               }
        }
 
        if (fmt.format.code != vin->mbus_code)
index 8355185..61cfaaf 100644 (file)
@@ -397,10 +397,12 @@ hantro_reset_raw_fmt(struct hantro_ctx *ctx, int bit_depth)
        if (!raw_vpu_fmt)
                return -EINVAL;
 
-       if (ctx->is_encoder)
+       if (ctx->is_encoder) {
                encoded_fmt = &ctx->dst_fmt;
-       else
+               ctx->vpu_src_fmt = raw_vpu_fmt;
+       } else {
                encoded_fmt = &ctx->src_fmt;
+       }
 
        hantro_reset_fmt(&raw_fmt, raw_vpu_fmt);
        raw_fmt.width = encoded_fmt->width;
index 44540de..d3b5cb4 100644 (file)
@@ -101,6 +101,10 @@ static int ce6230_i2c_master_xfer(struct i2c_adapter *adap,
                if (num > i + 1 && (msg[i+1].flags & I2C_M_RD)) {
                        if (msg[i].addr ==
                                ce6230_zl10353_config.demod_address) {
+                               if (msg[i].len < 1) {
+                                       i = -EOPNOTSUPP;
+                                       break;
+                               }
                                req.cmd = DEMOD_READ;
                                req.value = msg[i].addr >> 1;
                                req.index = msg[i].buf[0];
@@ -117,6 +121,10 @@ static int ce6230_i2c_master_xfer(struct i2c_adapter *adap,
                } else {
                        if (msg[i].addr ==
                                ce6230_zl10353_config.demod_address) {
+                               if (msg[i].len < 1) {
+                                       i = -EOPNOTSUPP;
+                                       break;
+                               }
                                req.cmd = DEMOD_WRITE;
                                req.value = msg[i].addr >> 1;
                                req.index = msg[i].buf[0];
index 7ed0ab9..0e4773f 100644 (file)
@@ -115,6 +115,10 @@ static int ec168_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg msg[],
        while (i < num) {
                if (num > i + 1 && (msg[i+1].flags & I2C_M_RD)) {
                        if (msg[i].addr == ec168_ec100_config.demod_address) {
+                               if (msg[i].len < 1) {
+                                       i = -EOPNOTSUPP;
+                                       break;
+                               }
                                req.cmd = READ_DEMOD;
                                req.value = 0;
                                req.index = 0xff00 + msg[i].buf[0]; /* reg */
@@ -131,6 +135,10 @@ static int ec168_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg msg[],
                        }
                } else {
                        if (msg[i].addr == ec168_ec100_config.demod_address) {
+                               if (msg[i].len < 1) {
+                                       i = -EOPNOTSUPP;
+                                       break;
+                               }
                                req.cmd = WRITE_DEMOD;
                                req.value = msg[i].buf[1]; /* val */
                                req.index = 0xff00 + msg[i].buf[0]; /* reg */
@@ -139,6 +147,10 @@ static int ec168_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg msg[],
                                ret = ec168_ctrl_msg(d, &req);
                                i += 1;
                        } else {
+                               if (msg[i].len < 1) {
+                                       i = -EOPNOTSUPP;
+                                       break;
+                               }
                                req.cmd = WRITE_I2C;
                                req.value = msg[i].buf[0]; /* val */
                                req.index = 0x0100 + msg[i].addr; /* I2C addr */
index 795a012..f7884bb 100644 (file)
@@ -176,6 +176,10 @@ static int rtl28xxu_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg msg[],
                        ret = -EOPNOTSUPP;
                        goto err_mutex_unlock;
                } else if (msg[0].addr == 0x10) {
+                       if (msg[0].len < 1 || msg[1].len < 1) {
+                               ret = -EOPNOTSUPP;
+                               goto err_mutex_unlock;
+                       }
                        /* method 1 - integrated demod */
                        if (msg[0].buf[0] == 0x00) {
                                /* return demod page from driver cache */
@@ -189,6 +193,10 @@ static int rtl28xxu_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg msg[],
                                ret = rtl28xxu_ctrl_msg(d, &req);
                        }
                } else if (msg[0].len < 2) {
+                       if (msg[0].len < 1) {
+                               ret = -EOPNOTSUPP;
+                               goto err_mutex_unlock;
+                       }
                        /* method 2 - old I2C */
                        req.value = (msg[0].buf[0] << 8) | (msg[0].addr << 1);
                        req.index = CMD_I2C_RD;
@@ -217,8 +225,16 @@ static int rtl28xxu_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg msg[],
                        ret = -EOPNOTSUPP;
                        goto err_mutex_unlock;
                } else if (msg[0].addr == 0x10) {
+                       if (msg[0].len < 1) {
+                               ret = -EOPNOTSUPP;
+                               goto err_mutex_unlock;
+                       }
                        /* method 1 - integrated demod */
                        if (msg[0].buf[0] == 0x00) {
+                               if (msg[0].len < 2) {
+                                       ret = -EOPNOTSUPP;
+                                       goto err_mutex_unlock;
+                               }
                                /* save demod page for later demod access */
                                dev->page = msg[0].buf[1];
                                ret = 0;
@@ -231,6 +247,10 @@ static int rtl28xxu_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg msg[],
                                ret = rtl28xxu_ctrl_msg(d, &req);
                        }
                } else if ((msg[0].len < 23) && (!dev->new_i2c_write)) {
+                       if (msg[0].len < 1) {
+                               ret = -EOPNOTSUPP;
+                               goto err_mutex_unlock;
+                       }
                        /* method 2 - old I2C */
                        req.value = (msg[0].buf[0] << 8) | (msg[0].addr << 1);
                        req.index = CMD_I2C_WR;
index 7d78ee0..a31c6f8 100644 (file)
@@ -988,6 +988,10 @@ static int az6027_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg msg[], int n
                        /* write/read request */
                        if (i + 1 < num && (msg[i + 1].flags & I2C_M_RD)) {
                                req = 0xB9;
+                               if (msg[i].len < 1) {
+                                       i = -EOPNOTSUPP;
+                                       break;
+                               }
                                index = (((msg[i].buf[0] << 8) & 0xff00) | (msg[i].buf[1] & 0x00ff));
                                value = msg[i].addr + (msg[i].len << 8);
                                length = msg[i + 1].len + 6;
@@ -1001,6 +1005,10 @@ static int az6027_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg msg[], int n
 
                                /* demod 16bit addr */
                                req = 0xBD;
+                               if (msg[i].len < 1) {
+                                       i = -EOPNOTSUPP;
+                                       break;
+                               }
                                index = (((msg[i].buf[0] << 8) & 0xff00) | (msg[i].buf[1] & 0x00ff));
                                value = msg[i].addr + (2 << 8);
                                length = msg[i].len - 2;
@@ -1026,6 +1034,10 @@ static int az6027_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg msg[], int n
                        } else {
 
                                req = 0xBD;
+                               if (msg[i].len < 1) {
+                                       i = -EOPNOTSUPP;
+                                       break;
+                               }
                                index = msg[i].buf[0] & 0x00FF;
                                value = msg[i].addr + (1 << 8);
                                length = msg[i].len - 1;
index 2756815..32134be 100644 (file)
@@ -63,6 +63,10 @@ static int digitv_i2c_xfer(struct i2c_adapter *adap,struct i2c_msg msg[],int num
                warn("more than 2 i2c messages at a time is not handled yet. TODO.");
 
        for (i = 0; i < num; i++) {
+               if (msg[i].len < 1) {
+                       i = -EOPNOTSUPP;
+                       break;
+               }
                /* write/read request */
                if (i+1 < num && (msg[i+1].flags & I2C_M_RD)) {
                        if (digitv_ctrl_msg(d, USB_READ_COFDM, msg[i].buf[0], NULL, 0,
index 0ca7642..8747960 100644 (file)
@@ -946,7 +946,7 @@ static int su3000_read_mac_address(struct dvb_usb_device *d, u8 mac[6])
        for (i = 0; i < 6; i++) {
                obuf[1] = 0xf0 + i;
                if (i2c_transfer(&d->i2c_adap, msg, 2) != 2)
-                       break;
+                       return -1;
                else
                        mac[i] = ibuf[0];
        }
index 9501b10..0df1027 100644 (file)
@@ -37,6 +37,7 @@ config VIDEO_PVRUSB2_DVB
        bool "pvrusb2 ATSC/DVB support"
        default y
        depends on VIDEO_PVRUSB2 && DVB_CORE
+       depends on VIDEO_PVRUSB2=m || DVB_CORE=y
        select DVB_LGDT330X if MEDIA_SUBDRV_AUTOSELECT
        select DVB_S5H1409 if MEDIA_SUBDRV_AUTOSELECT
        select DVB_S5H1411 if MEDIA_SUBDRV_AUTOSELECT
index 38822ce..c4474d4 100644 (file)
@@ -1544,8 +1544,7 @@ static void ttusb_dec_exit_dvb(struct ttusb_dec *dec)
        dvb_dmx_release(&dec->demux);
        if (dec->fe) {
                dvb_unregister_frontend(dec->fe);
-               if (dec->fe->ops.release)
-                       dec->fe->ops.release(dec->fe);
+               dvb_frontend_detach(dec->fe);
        }
        dvb_unregister_adapter(&dec->adapter);
 }
index 7aefa76..d631ce4 100644 (file)
@@ -251,14 +251,17 @@ static int uvc_parse_format(struct uvc_device *dev,
                /* Find the format descriptor from its GUID. */
                fmtdesc = uvc_format_by_guid(&buffer[5]);
 
-               if (fmtdesc != NULL) {
-                       format->fcc = fmtdesc->fcc;
-               } else {
+               if (!fmtdesc) {
+                       /*
+                        * Unknown video formats are not fatal errors, the
+                        * caller will skip this descriptor.
+                        */
                        dev_info(&streaming->intf->dev,
                                 "Unknown video format %pUl\n", &buffer[5]);
-                       format->fcc = 0;
+                       return 0;
                }
 
+               format->fcc = fmtdesc->fcc;
                format->bpp = buffer[21];
 
                /*
@@ -675,7 +678,7 @@ static int uvc_parse_streaming(struct uvc_device *dev,
        interval = (u32 *)&frame[nframes];
 
        streaming->format = format;
-       streaming->nformats = nformats;
+       streaming->nformats = 0;
 
        /* Parse the format descriptors. */
        while (buflen > 2 && buffer[1] == USB_DT_CS_INTERFACE) {
@@ -689,7 +692,10 @@ static int uvc_parse_streaming(struct uvc_device *dev,
                                &interval, buffer, buflen);
                        if (ret < 0)
                                goto error;
+                       if (!ret)
+                               break;
 
+                       streaming->nformats++;
                        frame += format->nframes;
                        format++;
 
index bf0c181..22fe08f 100644 (file)
@@ -314,8 +314,7 @@ int v4l2_create_fwnode_links_to_pad(struct v4l2_subdev *src_sd,
 {
        struct fwnode_handle *endpoint;
 
-       if (!(sink->flags & MEDIA_PAD_FL_SINK) ||
-           !is_media_entity_v4l2_subdev(sink->entity))
+       if (!(sink->flags & MEDIA_PAD_FL_SINK))
                return -EINVAL;
 
        fwnode_graph_for_each_endpoint(dev_fwnode(src_sd->dev), endpoint) {
index f0a7531..2d240bf 100644 (file)
@@ -6,6 +6,7 @@ config EEPROM_AT24
        depends on I2C && SYSFS
        select NVMEM
        select NVMEM_SYSFS
+       select REGMAP
        select REGMAP_I2C
        help
          Enable this driver to get read/write support to most I2C EEPROMs
index f484669..30d4d04 100644 (file)
@@ -316,12 +316,14 @@ static void fastrpc_free_map(struct kref *ref)
        if (map->table) {
                if (map->attr & FASTRPC_ATTR_SECUREMAP) {
                        struct qcom_scm_vmperm perm;
+                       int vmid = map->fl->cctx->vmperms[0].vmid;
+                       u64 src_perms = BIT(QCOM_SCM_VMID_HLOS) | BIT(vmid);
                        int err = 0;
 
                        perm.vmid = QCOM_SCM_VMID_HLOS;
                        perm.perm = QCOM_SCM_PERM_RWX;
                        err = qcom_scm_assign_mem(map->phys, map->size,
-                               &map->fl->cctx->perms, &perm, 1);
+                               &src_perms, &perm, 1);
                        if (err) {
                                dev_err(map->fl->sctx->dev, "Failed to assign memory phys 0x%llx size 0x%llx err %d",
                                                map->phys, map->size, err);
@@ -787,8 +789,12 @@ static int fastrpc_map_create(struct fastrpc_user *fl, int fd,
                goto map_err;
        }
 
-       map->phys = sg_dma_address(map->table->sgl);
-       map->phys += ((u64)fl->sctx->sid << 32);
+       if (attr & FASTRPC_ATTR_SECUREMAP) {
+               map->phys = sg_phys(map->table->sgl);
+       } else {
+               map->phys = sg_dma_address(map->table->sgl);
+               map->phys += ((u64)fl->sctx->sid << 32);
+       }
        map->size = len;
        map->va = sg_virt(map->table->sgl);
        map->len = len;
@@ -798,9 +804,15 @@ static int fastrpc_map_create(struct fastrpc_user *fl, int fd,
                 * If subsystem VMIDs are defined in DTSI, then do
                 * hyp_assign from HLOS to those VM(s)
                 */
+               u64 src_perms = BIT(QCOM_SCM_VMID_HLOS);
+               struct qcom_scm_vmperm dst_perms[2] = {0};
+
+               dst_perms[0].vmid = QCOM_SCM_VMID_HLOS;
+               dst_perms[0].perm = QCOM_SCM_PERM_RW;
+               dst_perms[1].vmid = fl->cctx->vmperms[0].vmid;
+               dst_perms[1].perm = QCOM_SCM_PERM_RWX;
                map->attr = attr;
-               err = qcom_scm_assign_mem(map->phys, (u64)map->size, &fl->cctx->perms,
-                               fl->cctx->vmperms, fl->cctx->vmcount);
+               err = qcom_scm_assign_mem(map->phys, (u64)map->size, &src_perms, dst_perms, 2);
                if (err) {
                        dev_err(sess->dev, "Failed to assign memory with phys 0x%llx size 0x%llx err %d",
                                        map->phys, map->size, err);
@@ -1892,7 +1904,7 @@ static int fastrpc_req_mmap(struct fastrpc_user *fl, char __user *argp)
        req.vaddrout = rsp_msg.vaddr;
 
        /* Add memory to static PD pool, protection thru hypervisor */
-       if (req.flags != ADSP_MMAP_REMOTE_HEAP_ADDR && fl->cctx->vmcount) {
+       if (req.flags == ADSP_MMAP_REMOTE_HEAP_ADDR && fl->cctx->vmcount) {
                struct qcom_scm_vmperm perm;
 
                perm.vmid = QCOM_SCM_VMID_HLOS;
@@ -2337,8 +2349,10 @@ static void fastrpc_notify_users(struct fastrpc_user *user)
        struct fastrpc_invoke_ctx *ctx;
 
        spin_lock(&user->lock);
-       list_for_each_entry(ctx, &user->pending, node)
+       list_for_each_entry(ctx, &user->pending, node) {
+               ctx->retval = -EPIPE;
                complete(&ctx->work);
+       }
        spin_unlock(&user->lock);
 }
 
@@ -2349,7 +2363,9 @@ static void fastrpc_rpmsg_remove(struct rpmsg_device *rpdev)
        struct fastrpc_user *user;
        unsigned long flags;
 
+       /* No invocations past this point */
        spin_lock_irqsave(&cctx->lock, flags);
+       cctx->rpdev = NULL;
        list_for_each_entry(user, &cctx->users, user)
                fastrpc_notify_users(user);
        spin_unlock_irqrestore(&cctx->lock, flags);
@@ -2368,7 +2384,6 @@ static void fastrpc_rpmsg_remove(struct rpmsg_device *rpdev)
 
        of_platform_depopulate(&rpdev->dev);
 
-       cctx->rpdev = NULL;
        fastrpc_channel_ctx_put(cctx);
 }
 
index 48821f4..92110cb 100644 (file)
@@ -487,6 +487,7 @@ static void lkdtm_UNSET_SMEP(void)
         * the cr4 writing instruction.
         */
        insn = (unsigned char *)native_write_cr4;
+       OPTIMIZER_HIDE_VAR(insn);
        for (i = 0; i < MOV_CR4_DEPTH; i++) {
                /* mov %rdi, %cr4 */
                if (insn[i] == 0x0f && insn[i+1] == 0x22 && insn[i+2] == 0xe7)
index 00c33ed..6d01025 100644 (file)
@@ -264,6 +264,7 @@ static ssize_t power_ro_lock_store(struct device *dev,
                goto out_put;
        }
        req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_BOOT_WP;
+       req_to_mmc_queue_req(req)->drv_op_result = -EIO;
        blk_execute_rq(req, false);
        ret = req_to_mmc_queue_req(req)->drv_op_result;
        blk_mq_free_request(req);
@@ -357,15 +358,15 @@ static const struct attribute_group *mmc_disk_attr_groups[] = {
        NULL,
 };
 
-static int mmc_blk_open(struct block_device *bdev, fmode_t mode)
+static int mmc_blk_open(struct gendisk *disk, blk_mode_t mode)
 {
-       struct mmc_blk_data *md = mmc_blk_get(bdev->bd_disk);
+       struct mmc_blk_data *md = mmc_blk_get(disk);
        int ret = -ENXIO;
 
        mutex_lock(&block_mutex);
        if (md) {
                ret = 0;
-               if ((mode & FMODE_WRITE) && md->read_only) {
+               if ((mode & BLK_OPEN_WRITE) && md->read_only) {
                        mmc_blk_put(md);
                        ret = -EROFS;
                }
@@ -375,7 +376,7 @@ static int mmc_blk_open(struct block_device *bdev, fmode_t mode)
        return ret;
 }
 
-static void mmc_blk_release(struct gendisk *disk, fmode_t mode)
+static void mmc_blk_release(struct gendisk *disk)
 {
        struct mmc_blk_data *md = disk->private_data;
 
@@ -651,6 +652,7 @@ static int mmc_blk_ioctl_cmd(struct mmc_blk_data *md,
        idatas[0] = idata;
        req_to_mmc_queue_req(req)->drv_op =
                rpmb ? MMC_DRV_OP_IOCTL_RPMB : MMC_DRV_OP_IOCTL;
+       req_to_mmc_queue_req(req)->drv_op_result = -EIO;
        req_to_mmc_queue_req(req)->drv_op_data = idatas;
        req_to_mmc_queue_req(req)->ioc_count = 1;
        blk_execute_rq(req, false);
@@ -722,6 +724,7 @@ static int mmc_blk_ioctl_multi_cmd(struct mmc_blk_data *md,
        }
        req_to_mmc_queue_req(req)->drv_op =
                rpmb ? MMC_DRV_OP_IOCTL_RPMB : MMC_DRV_OP_IOCTL;
+       req_to_mmc_queue_req(req)->drv_op_result = -EIO;
        req_to_mmc_queue_req(req)->drv_op_data = idata;
        req_to_mmc_queue_req(req)->ioc_count = n;
        blk_execute_rq(req, false);
@@ -754,7 +757,7 @@ static int mmc_blk_check_blkdev(struct block_device *bdev)
        return 0;
 }
 
-static int mmc_blk_ioctl(struct block_device *bdev, fmode_t mode,
+static int mmc_blk_ioctl(struct block_device *bdev, blk_mode_t mode,
        unsigned int cmd, unsigned long arg)
 {
        struct mmc_blk_data *md;
@@ -791,7 +794,7 @@ static int mmc_blk_ioctl(struct block_device *bdev, fmode_t mode,
 }
 
 #ifdef CONFIG_COMPAT
-static int mmc_blk_compat_ioctl(struct block_device *bdev, fmode_t mode,
+static int mmc_blk_compat_ioctl(struct block_device *bdev, blk_mode_t mode,
        unsigned int cmd, unsigned long arg)
 {
        return mmc_blk_ioctl(bdev, mode, cmd, (unsigned long) compat_ptr(arg));
@@ -2806,6 +2809,7 @@ static int mmc_dbg_card_status_get(void *data, u64 *val)
        if (IS_ERR(req))
                return PTR_ERR(req);
        req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_GET_CARD_STATUS;
+       req_to_mmc_queue_req(req)->drv_op_result = -EIO;
        blk_execute_rq(req, false);
        ret = req_to_mmc_queue_req(req)->drv_op_result;
        if (ret >= 0) {
@@ -2844,6 +2848,7 @@ static int mmc_ext_csd_open(struct inode *inode, struct file *filp)
                goto out_free;
        }
        req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_GET_EXT_CSD;
+       req_to_mmc_queue_req(req)->drv_op_result = -EIO;
        req_to_mmc_queue_req(req)->drv_op_data = &ext_csd;
        blk_execute_rq(req, false);
        err = req_to_mmc_queue_req(req)->drv_op_result;
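Side note on the mmc_block hunks above: they track the block layer's move from fmode_t to blk_mode_t, where ->open() now receives the gendisk plus a blk_mode_t, ->release() loses its mode argument, and write access is tested with BLK_OPEN_WRITE. The same conversion appears later in the mtd_blkdevs and ubiblock hunks. As a rough, non-authoritative sketch of a minimal read-only driver under that convention (all foo_* names are hypothetical, not from these patches):

#include <linux/blkdev.h>
#include <linux/module.h>

/* Hypothetical example, not part of the patches above. */
static int foo_open(struct gendisk *disk, blk_mode_t mode)
{
	/* Read-only device: refuse writers at open time. */
	if (mode & BLK_OPEN_WRITE)
		return -EROFS;

	return 0;
}

static void foo_release(struct gendisk *disk)
{
	/* Nothing to tear down in this sketch. */
}

static const struct block_device_operations foo_fops = {
	.owner   = THIS_MODULE,
	.open    = foo_open,
	.release = foo_release,
};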
index 2e120ad..0c5f5e3 100644 (file)
@@ -28,7 +28,6 @@ struct mmc_pwrseq_sd8787 {
        struct mmc_pwrseq pwrseq;
        struct gpio_desc *reset_gpio;
        struct gpio_desc *pwrdn_gpio;
-       u32 reset_pwrdwn_delay_ms;
 };
 
 #define to_pwrseq_sd8787(p) container_of(p, struct mmc_pwrseq_sd8787, pwrseq)
@@ -39,7 +38,7 @@ static void mmc_pwrseq_sd8787_pre_power_on(struct mmc_host *host)
 
        gpiod_set_value_cansleep(pwrseq->reset_gpio, 1);
 
-       msleep(pwrseq->reset_pwrdwn_delay_ms);
+       msleep(300);
        gpiod_set_value_cansleep(pwrseq->pwrdn_gpio, 1);
 }
 
@@ -51,17 +50,37 @@ static void mmc_pwrseq_sd8787_power_off(struct mmc_host *host)
        gpiod_set_value_cansleep(pwrseq->reset_gpio, 0);
 }
 
+static void mmc_pwrseq_wilc1000_pre_power_on(struct mmc_host *host)
+{
+       struct mmc_pwrseq_sd8787 *pwrseq = to_pwrseq_sd8787(host->pwrseq);
+
+       /* The pwrdn_gpio is really CHIP_EN, reset_gpio is RESETN */
+       gpiod_set_value_cansleep(pwrseq->pwrdn_gpio, 1);
+       msleep(5);
+       gpiod_set_value_cansleep(pwrseq->reset_gpio, 1);
+}
+
+static void mmc_pwrseq_wilc1000_power_off(struct mmc_host *host)
+{
+       struct mmc_pwrseq_sd8787 *pwrseq = to_pwrseq_sd8787(host->pwrseq);
+
+       gpiod_set_value_cansleep(pwrseq->reset_gpio, 0);
+       gpiod_set_value_cansleep(pwrseq->pwrdn_gpio, 0);
+}
+
 static const struct mmc_pwrseq_ops mmc_pwrseq_sd8787_ops = {
        .pre_power_on = mmc_pwrseq_sd8787_pre_power_on,
        .power_off = mmc_pwrseq_sd8787_power_off,
 };
 
-static const u32 sd8787_delay_ms = 300;
-static const u32 wilc1000_delay_ms = 5;
+static const struct mmc_pwrseq_ops mmc_pwrseq_wilc1000_ops = {
+       .pre_power_on = mmc_pwrseq_wilc1000_pre_power_on,
+       .power_off = mmc_pwrseq_wilc1000_power_off,
+};
 
 static const struct of_device_id mmc_pwrseq_sd8787_of_match[] = {
-       { .compatible = "mmc-pwrseq-sd8787", .data = &sd8787_delay_ms },
-       { .compatible = "mmc-pwrseq-wilc1000", .data = &wilc1000_delay_ms },
+       { .compatible = "mmc-pwrseq-sd8787", .data = &mmc_pwrseq_sd8787_ops },
+       { .compatible = "mmc-pwrseq-wilc1000", .data = &mmc_pwrseq_wilc1000_ops },
        {/* sentinel */},
 };
 MODULE_DEVICE_TABLE(of, mmc_pwrseq_sd8787_of_match);
@@ -77,7 +96,6 @@ static int mmc_pwrseq_sd8787_probe(struct platform_device *pdev)
                return -ENOMEM;
 
        match = of_match_node(mmc_pwrseq_sd8787_of_match, pdev->dev.of_node);
-       pwrseq->reset_pwrdwn_delay_ms = *(u32 *)match->data;
 
        pwrseq->pwrdn_gpio = devm_gpiod_get(dev, "powerdown", GPIOD_OUT_LOW);
        if (IS_ERR(pwrseq->pwrdn_gpio))
@@ -88,7 +106,7 @@ static int mmc_pwrseq_sd8787_probe(struct platform_device *pdev)
                return PTR_ERR(pwrseq->reset_gpio);
 
        pwrseq->pwrseq.dev = dev;
-       pwrseq->pwrseq.ops = &mmc_pwrseq_sd8787_ops;
+       pwrseq->pwrseq.ops = match->data;
        pwrseq->pwrseq.owner = THIS_MODULE;
        platform_set_drvdata(pdev, pwrseq);
 
index 8648f7e..eea2088 100644 (file)
@@ -1403,8 +1403,8 @@ static int bcm2835_probe(struct platform_device *pdev)
        host->max_clk = clk_get_rate(clk);
 
        host->irq = platform_get_irq(pdev, 0);
-       if (host->irq <= 0) {
-               ret = -EINVAL;
+       if (host->irq < 0) {
+               ret = host->irq;
                goto err;
        }
 
index 39c6707..9af6b09 100644 (file)
@@ -649,6 +649,7 @@ static struct platform_driver litex_mmc_driver = {
        .driver = {
                .name = "litex-mmc",
                .of_match_table = litex_match,
+               .probe_type = PROBE_PREFER_ASYNCHRONOUS,
        },
 };
 module_platform_driver(litex_mmc_driver);
index b8514d9..ee9a25b 100644 (file)
@@ -991,11 +991,8 @@ static irqreturn_t meson_mmc_irq(int irq, void *dev_id)
 
                if (data && !cmd->error)
                        data->bytes_xfered = data->blksz * data->blocks;
-               if (meson_mmc_bounce_buf_read(data) ||
-                   meson_mmc_get_next_command(cmd))
-                       ret = IRQ_WAKE_THREAD;
-               else
-                       ret = IRQ_HANDLED;
+
+               return IRQ_WAKE_THREAD;
        }
 
 out:
@@ -1007,9 +1004,6 @@ out:
                writel(start, host->regs + SD_EMMC_START);
        }
 
-       if (ret == IRQ_HANDLED)
-               meson_mmc_request_done(host->mmc, cmd->mrq);
-
        return ret;
 }
 
@@ -1192,8 +1186,8 @@ static int meson_mmc_probe(struct platform_device *pdev)
                return PTR_ERR(host->regs);
 
        host->irq = platform_get_irq(pdev, 0);
-       if (host->irq <= 0)
-               return -EINVAL;
+       if (host->irq < 0)
+               return host->irq;
 
        cd_irq = platform_get_irq_optional(pdev, 1);
        mmc_gpio_set_cd_irq(mmc, cd_irq);
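Like the bcm2835 and meson changes above, the MMC host hunks that follow (mtk-sd, mvsdio, omap, omap_hsmmc, owl, sdhci-acpi, sdhci-spear, sh_mmcif, sunxi, usdhi6rol0) stop overwriting platform_get_irq() failures with -EINVAL/-ENXIO and propagate the returned errno instead, which notably preserves -EPROBE_DEFER. A minimal sketch of the idiom with a made-up driver name:

#include <linux/platform_device.h>

/* Illustrative only; foo_probe is not from any of these patches. */
static int foo_probe(struct platform_device *pdev)
{
	int irq;

	/*
	 * platform_get_irq() returns the IRQ number or a negative errno
	 * (e.g. -EPROBE_DEFER); pass that errno up unchanged.
	 */
	irq = platform_get_irq(pdev, 0);
	if (irq < 0)
		return irq;

	/* ... request_irq(irq, ...) and the rest of probe would follow ... */
	return 0;
}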
index f2b2e8b..696cbef 100644 (file)
@@ -1735,7 +1735,8 @@ static void mmci_set_max_busy_timeout(struct mmc_host *mmc)
                return;
 
        if (host->variant->busy_timeout && mmc->actual_clock)
-               max_busy_timeout = ~0UL / (mmc->actual_clock / MSEC_PER_SEC);
+               max_busy_timeout = U32_MAX / DIV_ROUND_UP(mmc->actual_clock,
+                                                         MSEC_PER_SEC);
 
        mmc->max_busy_timeout = max_busy_timeout;
 }
index edade0e..9785ec9 100644 (file)
@@ -2680,7 +2680,7 @@ static int msdc_drv_probe(struct platform_device *pdev)
 
        host->irq = platform_get_irq(pdev, 0);
        if (host->irq < 0) {
-               ret = -EINVAL;
+               ret = host->irq;
                goto host_free;
        }
 
index 629efbe..b4f6a0a 100644 (file)
@@ -704,7 +704,7 @@ static int mvsd_probe(struct platform_device *pdev)
        }
        irq = platform_get_irq(pdev, 0);
        if (irq < 0)
-               return -ENXIO;
+               return irq;
 
        mmc = mmc_alloc_host(sizeof(struct mvsd_host), &pdev->dev);
        if (!mmc) {
index ce78edf..86454f1 100644 (file)
@@ -1343,7 +1343,7 @@ static int mmc_omap_probe(struct platform_device *pdev)
 
        irq = platform_get_irq(pdev, 0);
        if (irq < 0)
-               return -ENXIO;
+               return irq;
 
        host->virt_base = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
        if (IS_ERR(host->virt_base))
index 517dde7..1e0f2d7 100644 (file)
@@ -1791,9 +1791,11 @@ static int omap_hsmmc_probe(struct platform_device *pdev)
        }
 
        res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-       irq = platform_get_irq(pdev, 0);
-       if (res == NULL || irq < 0)
+       if (!res)
                return -ENXIO;
+       irq = platform_get_irq(pdev, 0);
+       if (irq < 0)
+               return irq;
 
        base = devm_ioremap_resource(&pdev->dev, res);
        if (IS_ERR(base))
index 6f9d31a..1bf22b0 100644 (file)
@@ -637,7 +637,7 @@ static int owl_mmc_probe(struct platform_device *pdev)
 
        owl_host->irq = platform_get_irq(pdev, 0);
        if (owl_host->irq < 0) {
-               ret = -EINVAL;
+               ret = owl_host->irq;
                goto err_release_channel;
        }
 
index 8f0e639..edf2e6c 100644 (file)
@@ -829,7 +829,7 @@ static int sdhci_acpi_probe(struct platform_device *pdev)
        host->ops       = &sdhci_acpi_ops_dflt;
        host->irq       = platform_get_irq(pdev, 0);
        if (host->irq < 0) {
-               err = -EINVAL;
+               err = host->irq;
                goto err_free;
        }
 
index b24aa27..d2f6250 100644 (file)
@@ -540,9 +540,11 @@ static int sdhci_cdns_probe(struct platform_device *pdev)
 
        if (host->mmc->caps & MMC_CAP_HW_RESET) {
                priv->rst_hw = devm_reset_control_get_optional_exclusive(dev, NULL);
-               if (IS_ERR(priv->rst_hw))
-                       return dev_err_probe(mmc_dev(host->mmc), PTR_ERR(priv->rst_hw),
-                                            "reset controller error\n");
+               if (IS_ERR(priv->rst_hw)) {
+                       ret = dev_err_probe(mmc_dev(host->mmc), PTR_ERR(priv->rst_hw),
+                                           "reset controller error\n");
+                       goto free;
+               }
                if (priv->rst_hw)
                        host->mmc_host_ops.card_hw_reset = sdhci_cdns_mmc_hw_reset;
        }
index d7c0c0b..eebf946 100644 (file)
@@ -1634,6 +1634,10 @@ sdhci_esdhc_imx_probe_dt(struct platform_device *pdev,
        if (ret)
                return ret;
 
+       /* HS400/HS400ES require 8 bit bus */
+       if (!(host->mmc->caps & MMC_CAP_8_BIT_DATA))
+               host->mmc->caps2 &= ~(MMC_CAP2_HS400 | MMC_CAP2_HS400_ES);
+
        if (mmc_gpio_get_cd(host->mmc) >= 0)
                host->quirks &= ~SDHCI_QUIRK_BROKEN_CARD_DETECTION;
 
@@ -1724,10 +1728,6 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev)
                host->mmc_host_ops.init_card = usdhc_init_card;
        }
 
-       err = sdhci_esdhc_imx_probe_dt(pdev, host, imx_data);
-       if (err)
-               goto disable_ahb_clk;
-
        if (imx_data->socdata->flags & ESDHC_FLAG_MAN_TUNING)
                sdhci_esdhc_ops.platform_execute_tuning =
                                        esdhc_executing_tuning;
@@ -1735,15 +1735,13 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev)
        if (imx_data->socdata->flags & ESDHC_FLAG_ERR004536)
                host->quirks |= SDHCI_QUIRK_BROKEN_ADMA;
 
-       if (host->mmc->caps & MMC_CAP_8_BIT_DATA &&
-           imx_data->socdata->flags & ESDHC_FLAG_HS400)
+       if (imx_data->socdata->flags & ESDHC_FLAG_HS400)
                host->mmc->caps2 |= MMC_CAP2_HS400;
 
        if (imx_data->socdata->flags & ESDHC_FLAG_BROKEN_AUTO_CMD23)
                host->quirks2 |= SDHCI_QUIRK2_ACMD23_BROKEN;
 
-       if (host->mmc->caps & MMC_CAP_8_BIT_DATA &&
-           imx_data->socdata->flags & ESDHC_FLAG_HS400_ES) {
+       if (imx_data->socdata->flags & ESDHC_FLAG_HS400_ES) {
                host->mmc->caps2 |= MMC_CAP2_HS400_ES;
                host->mmc_host_ops.hs400_enhanced_strobe =
                                        esdhc_hs400_enhanced_strobe;
@@ -1765,6 +1763,10 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev)
                        goto disable_ahb_clk;
        }
 
+       err = sdhci_esdhc_imx_probe_dt(pdev, host, imx_data);
+       if (err)
+               goto disable_ahb_clk;
+
        sdhci_esdhc_imx_hwinit(host);
 
        err = sdhci_add_host(host);
index 8ac81d5..1877d58 100644 (file)
@@ -2479,6 +2479,9 @@ static inline void sdhci_msm_get_of_property(struct platform_device *pdev,
                msm_host->ddr_config = DDR_CONFIG_POR_VAL;
 
        of_property_read_u32(node, "qcom,dll-config", &msm_host->dll_config);
+
+       if (of_device_is_compatible(node, "qcom,msm8916-sdhci"))
+               host->quirks2 |= SDHCI_QUIRK2_BROKEN_64_BIT_DMA;
 }
 
 static int sdhci_msm_gcc_reset(struct device *dev, struct sdhci_host *host)
index d463e2f..c790357 100644 (file)
@@ -65,8 +65,8 @@ static int sdhci_probe(struct platform_device *pdev)
        host->hw_name = "sdhci";
        host->ops = &sdhci_pltfm_ops;
        host->irq = platform_get_irq(pdev, 0);
-       if (host->irq <= 0) {
-               ret = -EINVAL;
+       if (host->irq < 0) {
+               ret = host->irq;
                goto err_host;
        }
        host->quirks = SDHCI_QUIRK_BROKEN_ADMA;
index 0fd4c9d..5cf5334 100644 (file)
@@ -1400,7 +1400,7 @@ static int sh_mmcif_probe(struct platform_device *pdev)
        irq[0] = platform_get_irq(pdev, 0);
        irq[1] = platform_get_irq_optional(pdev, 1);
        if (irq[0] < 0)
-               return -ENXIO;
+               return irq[0];
 
        reg = devm_platform_ioremap_resource(pdev, 0);
        if (IS_ERR(reg))
index 3db9f32..69dcb88 100644 (file)
@@ -1350,8 +1350,8 @@ static int sunxi_mmc_resource_request(struct sunxi_mmc_host *host,
                return ret;
 
        host->irq = platform_get_irq(pdev, 0);
-       if (host->irq <= 0) {
-               ret = -EINVAL;
+       if (host->irq < 0) {
+               ret = host->irq;
                goto error_disable_mmc;
        }
 
index 2f59917..2e17903 100644 (file)
@@ -1757,8 +1757,10 @@ static int usdhi6_probe(struct platform_device *pdev)
        irq_cd = platform_get_irq_byname(pdev, "card detect");
        irq_sd = platform_get_irq_byname(pdev, "data");
        irq_sdio = platform_get_irq_byname(pdev, "SDIO");
-       if (irq_sd < 0 || irq_sdio < 0)
-               return -ENODEV;
+       if (irq_sd < 0)
+               return irq_sd;
+       if (irq_sdio < 0)
+               return irq_sdio;
 
        mmc = mmc_alloc_host(sizeof(struct usdhi6_host), dev);
        if (!mmc)
index e4c4bfa..9ec593d 100644 (file)
@@ -1713,6 +1713,9 @@ static void construct_request_response(struct vub300_mmc_host *vub300,
        int bytes = 3 & less_cmd;
        int words = less_cmd >> 2;
        u8 *r = vub300->resp.response.command_response;
+
+       if (!resp_len)
+               return;
        if (bytes == 3) {
                cmd->resp[words] = (r[1 + (words << 2)] << 24)
                        | (r[2 + (words << 2)] << 16)
index 4cd37ec..be106dc 100644 (file)
@@ -209,40 +209,34 @@ static void block2mtd_free_device(struct block2mtd_dev *dev)
        if (dev->blkdev) {
                invalidate_mapping_pages(dev->blkdev->bd_inode->i_mapping,
                                        0, -1);
-               blkdev_put(dev->blkdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
+               blkdev_put(dev->blkdev, NULL);
        }
 
        kfree(dev);
 }
 
-
-static struct block2mtd_dev *add_device(char *devname, int erase_size,
-               char *label, int timeout)
+/*
+ * This function is marked __ref because it calls the __init marked
+ * early_lookup_bdev when called from the early boot code.
+ */
+static struct block_device __ref *mdtblock_early_get_bdev(const char *devname,
+               blk_mode_t mode, int timeout, struct block2mtd_dev *dev)
 {
+       struct block_device *bdev = ERR_PTR(-ENODEV);
 #ifndef MODULE
        int i;
-#endif
-       const fmode_t mode = FMODE_READ | FMODE_WRITE | FMODE_EXCL;
-       struct block_device *bdev;
-       struct block2mtd_dev *dev;
-       char *name;
 
-       if (!devname)
-               return NULL;
-
-       dev = kzalloc(sizeof(struct block2mtd_dev), GFP_KERNEL);
-       if (!dev)
-               return NULL;
-
-       /* Get a handle on the device */
-       bdev = blkdev_get_by_path(devname, mode, dev);
+       /*
+        * We can't use early_lookup_bdev from a running system.
+        */
+       if (system_state >= SYSTEM_RUNNING)
+               return bdev;
 
-#ifndef MODULE
        /*
         * We might not have the root device mounted at this point.
         * Try to resolve the device name by other means.
         */
-       for (i = 0; IS_ERR(bdev) && i <= timeout; i++) {
+       for (i = 0; i <= timeout; i++) {
                dev_t devt;
 
                if (i)
@@ -254,13 +248,35 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
                        msleep(1000);
                wait_for_device_probe();
 
-               devt = name_to_dev_t(devname);
-               if (!devt)
-                       continue;
-               bdev = blkdev_get_by_dev(devt, mode, dev);
+               if (!early_lookup_bdev(devname, &devt)) {
+                       bdev = blkdev_get_by_dev(devt, mode, dev, NULL);
+                       if (!IS_ERR(bdev))
+                               break;
+               }
        }
 #endif
+       return bdev;
+}
 
+static struct block2mtd_dev *add_device(char *devname, int erase_size,
+               char *label, int timeout)
+{
+       const blk_mode_t mode = BLK_OPEN_READ | BLK_OPEN_WRITE;
+       struct block_device *bdev;
+       struct block2mtd_dev *dev;
+       char *name;
+
+       if (!devname)
+               return NULL;
+
+       dev = kzalloc(sizeof(struct block2mtd_dev), GFP_KERNEL);
+       if (!dev)
+               return NULL;
+
+       /* Get a handle on the device */
+       bdev = blkdev_get_by_path(devname, mode, dev, NULL);
+       if (IS_ERR(bdev))
+               bdev = mdtblock_early_get_bdev(devname, mode, timeout, dev);
        if (IS_ERR(bdev)) {
                pr_err("error: cannot open device %s\n", devname);
                goto err_free_block2mtd;
index 60b2227..ff18636 100644 (file)
@@ -182,9 +182,9 @@ static blk_status_t mtd_queue_rq(struct blk_mq_hw_ctx *hctx,
        return BLK_STS_OK;
 }
 
-static int blktrans_open(struct block_device *bdev, fmode_t mode)
+static int blktrans_open(struct gendisk *disk, blk_mode_t mode)
 {
-       struct mtd_blktrans_dev *dev = bdev->bd_disk->private_data;
+       struct mtd_blktrans_dev *dev = disk->private_data;
        int ret = 0;
 
        kref_get(&dev->ref);
@@ -208,7 +208,7 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode)
        ret = __get_mtd_device(dev->mtd);
        if (ret)
                goto error_release;
-       dev->file_mode = mode;
+       dev->writable = mode & BLK_OPEN_WRITE;
 
 unlock:
        dev->open++;
@@ -225,7 +225,7 @@ error_put:
        return ret;
 }
 
-static void blktrans_release(struct gendisk *disk, fmode_t mode)
+static void blktrans_release(struct gendisk *disk)
 {
        struct mtd_blktrans_dev *dev = disk->private_data;
 
index a0a1194..fa476fb 100644 (file)
@@ -294,7 +294,7 @@ static void mtdblock_release(struct mtd_blktrans_dev *mbd)
                 * It was the last usage. Free the cache, but only sync if
                 * opened for writing.
                 */
-               if (mbd->file_mode & FMODE_WRITE)
+               if (mbd->writable)
                        mtd_sync(mbd->mtd);
                vfree(mtdblk->cache_data);
        }
index 01f1c67..8dc4f5c 100644 (file)
@@ -590,8 +590,8 @@ static void adjust_oob_length(struct mtd_info *mtd, uint64_t start,
                            (end_page - start_page + 1) * oob_per_page);
 }
 
-static int mtdchar_write_ioctl(struct mtd_info *mtd,
-               struct mtd_write_req __user *argp)
+static noinline_for_stack int
+mtdchar_write_ioctl(struct mtd_info *mtd, struct mtd_write_req __user *argp)
 {
        struct mtd_info *master = mtd_get_master(mtd);
        struct mtd_write_req req;
@@ -688,8 +688,8 @@ static int mtdchar_write_ioctl(struct mtd_info *mtd,
        return ret;
 }
 
-static int mtdchar_read_ioctl(struct mtd_info *mtd,
-               struct mtd_read_req __user *argp)
+static noinline_for_stack int
+mtdchar_read_ioctl(struct mtd_info *mtd, struct mtd_read_req __user *argp)
 {
        struct mtd_info *master = mtd_get_master(mtd);
        struct mtd_read_req req;
index 2cda439..017868f 100644 (file)
@@ -36,25 +36,25 @@ int ingenic_ecc_correct(struct ingenic_ecc *ecc,
 void ingenic_ecc_release(struct ingenic_ecc *ecc);
 struct ingenic_ecc *of_ingenic_ecc_get(struct device_node *np);
 #else /* CONFIG_MTD_NAND_INGENIC_ECC */
-int ingenic_ecc_calculate(struct ingenic_ecc *ecc,
+static inline int ingenic_ecc_calculate(struct ingenic_ecc *ecc,
                          struct ingenic_ecc_params *params,
                          const u8 *buf, u8 *ecc_code)
 {
        return -ENODEV;
 }
 
-int ingenic_ecc_correct(struct ingenic_ecc *ecc,
+static inline int ingenic_ecc_correct(struct ingenic_ecc *ecc,
                        struct ingenic_ecc_params *params, u8 *buf,
                        u8 *ecc_code)
 {
        return -ENODEV;
 }
 
-void ingenic_ecc_release(struct ingenic_ecc *ecc)
+static inline void ingenic_ecc_release(struct ingenic_ecc *ecc)
 {
 }
 
-struct ingenic_ecc *of_ingenic_ecc_get(struct device_node *np)
+static inline struct ingenic_ecc *of_ingenic_ecc_get(struct device_node *np)
 {
        return ERR_PTR(-ENODEV);
 }
index afb4245..30c15e4 100644 (file)
@@ -2457,6 +2457,12 @@ static int marvell_nfc_setup_interface(struct nand_chip *chip, int chipnr,
                        NDTR1_WAIT_MODE;
        }
 
+       /*
+        * Reset nfc->selected_chip so the next command will cause the timing
+        * registers to be updated in marvell_nfc_select_target().
+        */
+       nfc->selected_chip = NULL;
+
        return 0;
 }
 
@@ -2894,10 +2900,6 @@ static int marvell_nfc_init(struct marvell_nfc *nfc)
                regmap_update_bits(sysctrl_base, GENCONF_CLK_GATING_CTRL,
                                   GENCONF_CLK_GATING_CTRL_ND_GATE,
                                   GENCONF_CLK_GATING_CTRL_ND_GATE);
-
-               regmap_update_bits(sysctrl_base, GENCONF_ND_CLK_CTRL,
-                                  GENCONF_ND_CLK_CTRL_EN,
-                                  GENCONF_ND_CLK_CTRL_EN);
        }
 
        /* Configure the DMA if appropriate */
index 0bb0ad1..5f29fac 100644 (file)
@@ -2018,6 +2018,7 @@ static const struct spi_nor_manufacturer *manufacturers[] = {
 
 static const struct flash_info spi_nor_generic_flash = {
        .name = "spi-nor-generic",
+       .n_banks = 1,
        /*
         * JESD216 rev A doesn't specify the page size, therefore we need a
         * sane default.
@@ -2921,7 +2922,8 @@ static void spi_nor_late_init_params(struct spi_nor *nor)
        if (nor->flags & SNOR_F_HAS_LOCK && !nor->params->locking_ops)
                spi_nor_init_default_locking_ops(nor);
 
-       nor->params->bank_size = div64_u64(nor->params->size, nor->info->n_banks);
+       if (nor->info->n_banks > 1)
+               params->bank_size = div64_u64(params->size, nor->info->n_banks);
 }
 
 /**
@@ -2987,6 +2989,7 @@ static void spi_nor_init_default_params(struct spi_nor *nor)
        /* Set SPI NOR sizes. */
        params->writesize = 1;
        params->size = (u64)info->sector_size * info->n_sectors;
+       params->bank_size = params->size;
        params->page_size = info->page_size;
 
        if (!(info->flags & SPI_NOR_NO_FR)) {
index 15f9a80..36876aa 100644 (file)
@@ -361,7 +361,7 @@ static int cypress_nor_determine_addr_mode_by_sr1(struct spi_nor *nor,
  */
 static int cypress_nor_set_addr_mode_nbytes(struct spi_nor *nor)
 {
-       struct spi_mem_op op;
+       struct spi_mem_op op = {};
        u8 addr_mode;
        int ret;
 
@@ -492,7 +492,7 @@ s25fs256t_post_bfpt_fixup(struct spi_nor *nor,
                          const struct sfdp_parameter_header *bfpt_header,
                          const struct sfdp_bfpt *bfpt)
 {
-       struct spi_mem_op op;
+       struct spi_mem_op op = {};
        int ret;
 
        ret = cypress_nor_set_addr_mode_nbytes(nor);
index 3711d7f..437c5b8 100644 (file)
@@ -227,9 +227,9 @@ static blk_status_t ubiblock_read(struct request *req)
        return BLK_STS_OK;
 }
 
-static int ubiblock_open(struct block_device *bdev, fmode_t mode)
+static int ubiblock_open(struct gendisk *disk, blk_mode_t mode)
 {
-       struct ubiblock *dev = bdev->bd_disk->private_data;
+       struct ubiblock *dev = disk->private_data;
        int ret;
 
        mutex_lock(&dev->dev_mutex);
@@ -246,11 +246,10 @@ static int ubiblock_open(struct block_device *bdev, fmode_t mode)
         * It's just a paranoid check, as write requests will get rejected
         * in any case.
         */
-       if (mode & FMODE_WRITE) {
+       if (mode & BLK_OPEN_WRITE) {
                ret = -EROFS;
                goto out_unlock;
        }
-
        dev->desc = ubi_open_volume(dev->ubi_num, dev->vol_id, UBI_READONLY);
        if (IS_ERR(dev->desc)) {
                dev_err(disk_to_dev(dev->gd), "failed to open ubi volume %d_%d",
@@ -270,7 +269,7 @@ out_unlock:
        return ret;
 }
 
-static void ubiblock_release(struct gendisk *gd, fmode_t mode)
+static void ubiblock_release(struct gendisk *gd)
 {
        struct ubiblock *dev = gd->private_data;
 
index 3fed888..edbaa14 100644 (file)
@@ -3947,7 +3947,11 @@ static int bond_slave_netdev_event(unsigned long event,
                unblock_netpoll_tx();
                break;
        case NETDEV_FEAT_CHANGE:
-               bond_compute_features(bond);
+               if (!bond->notifier_ctx) {
+                       bond->notifier_ctx = true;
+                       bond_compute_features(bond);
+                       bond->notifier_ctx = false;
+               }
                break;
        case NETDEV_RESEND_IGMP:
                /* Propagate to master device */
@@ -6342,6 +6346,8 @@ static int bond_init(struct net_device *bond_dev)
        if (!bond->wq)
                return -ENOMEM;
 
+       bond->notifier_ctx = false;
+
        spin_lock_init(&bond->stats_lock);
        netdev_lockdep_set_classes(bond_dev);
 
index c2d080f..27cbe14 100644 (file)
@@ -84,6 +84,11 @@ nla_put_failure:
        return -EMSGSIZE;
 }
 
+/* Limit the max delay range to 300s */
+static struct netlink_range_validation delay_range = {
+       .max = 300000,
+};
+
 static const struct nla_policy bond_policy[IFLA_BOND_MAX + 1] = {
        [IFLA_BOND_MODE]                = { .type = NLA_U8 },
        [IFLA_BOND_ACTIVE_SLAVE]        = { .type = NLA_U32 },
@@ -114,7 +119,7 @@ static const struct nla_policy bond_policy[IFLA_BOND_MAX + 1] = {
        [IFLA_BOND_AD_ACTOR_SYSTEM]     = { .type = NLA_BINARY,
                                            .len  = ETH_ALEN },
        [IFLA_BOND_TLB_DYNAMIC_LB]      = { .type = NLA_U8 },
-       [IFLA_BOND_PEER_NOTIF_DELAY]    = { .type = NLA_U32 },
+       [IFLA_BOND_PEER_NOTIF_DELAY]    = NLA_POLICY_FULL_RANGE(NLA_U32, &delay_range),
        [IFLA_BOND_MISSED_MAX]          = { .type = NLA_U8 },
        [IFLA_BOND_NS_IP6_TARGET]       = { .type = NLA_NESTED },
 };
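The policy hunk above caps IFLA_BOND_PEER_NOTIF_DELAY via NLA_POLICY_FULL_RANGE(), so out-of-range values are rejected during attribute parsing rather than in the option handler. A generic sketch of the same pattern with hypothetical FOO_* attribute names (illustration only, not from the patch):

#include <net/netlink.h>

/* Hypothetical attribute set, for illustration only. */
enum {
	FOO_ATTR_UNSPEC,
	FOO_ATTR_DELAY,		/* u32, milliseconds */
	__FOO_ATTR_MAX,
};
#define FOO_ATTR_MAX (__FOO_ATTR_MAX - 1)

/* Accept 0..300000 ms; anything larger fails nla_parse() validation. */
static struct netlink_range_validation foo_delay_range = {
	.max = 300000,
};

static const struct nla_policy foo_policy[FOO_ATTR_MAX + 1] = {
	[FOO_ATTR_DELAY] = NLA_POLICY_FULL_RANGE(NLA_U32, &foo_delay_range),
};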
index 0498fc6..f3f27f0 100644 (file)
@@ -169,6 +169,12 @@ static const struct bond_opt_value bond_num_peer_notif_tbl[] = {
        { NULL,      -1,  0}
 };
 
+static const struct bond_opt_value bond_peer_notif_delay_tbl[] = {
+       { "off",     0,   0},
+       { "maxval",  300000, BOND_VALFLAG_MAX},
+       { NULL,      -1,  0}
+};
+
 static const struct bond_opt_value bond_primary_reselect_tbl[] = {
        { "always",  BOND_PRI_RESELECT_ALWAYS,  BOND_VALFLAG_DEFAULT},
        { "better",  BOND_PRI_RESELECT_BETTER,  0},
@@ -488,7 +494,7 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
                .id = BOND_OPT_PEER_NOTIF_DELAY,
                .name = "peer_notif_delay",
                .desc = "Delay between each peer notification on failover event, in milliseconds",
-               .values = bond_intmax_tbl,
+               .values = bond_peer_notif_delay_tbl,
                .set = bond_option_peer_notif_delay_set
        }
 };
index 3ceccaf..b190007 100644 (file)
@@ -95,7 +95,7 @@ config CAN_AT91
 
 config CAN_BXCAN
        tristate "STM32 Basic Extended CAN (bxCAN) devices"
-       depends on OF || ARCH_STM32 || COMPILE_TEST
+       depends on ARCH_STM32 || COMPILE_TEST
        depends on HAS_IOMEM
        select CAN_RX_OFFLOAD
        help
index e26ccd4..027a8a1 100644 (file)
 #define BXCAN_FiR1_REG(b) (0x40 + (b) * 8)
 #define BXCAN_FiR2_REG(b) (0x44 + (b) * 8)
 
-#define BXCAN_FILTER_ID(primary) (primary ? 0 : 14)
+#define BXCAN_FILTER_ID(cfg) ((cfg) == BXCAN_CFG_DUAL_SECONDARY ? 14 : 0)
 
 /* Filter primary register (FMR) bits */
 #define BXCAN_FMR_CANSB_MASK GENMASK(13, 8)
@@ -135,6 +135,12 @@ enum bxcan_lec_code {
        BXCAN_LEC_UNUSED
 };
 
+enum bxcan_cfg {
+       BXCAN_CFG_SINGLE = 0,
+       BXCAN_CFG_DUAL_PRIMARY,
+       BXCAN_CFG_DUAL_SECONDARY
+};
+
 /* Structure of the message buffer */
 struct bxcan_mb {
        u32 id;                 /* can identifier */
@@ -167,7 +173,7 @@ struct bxcan_priv {
        struct regmap *gcan;
        int tx_irq;
        int sce_irq;
-       bool primary;
+       enum bxcan_cfg cfg;
        struct clk *clk;
        spinlock_t rmw_lock;    /* lock for read-modify-write operations */
        unsigned int tx_head;
@@ -202,17 +208,17 @@ static inline void bxcan_rmw(struct bxcan_priv *priv, void __iomem *addr,
        spin_unlock_irqrestore(&priv->rmw_lock, flags);
 }
 
-static void bxcan_disable_filters(struct bxcan_priv *priv, bool primary)
+static void bxcan_disable_filters(struct bxcan_priv *priv, enum bxcan_cfg cfg)
 {
-       unsigned int fid = BXCAN_FILTER_ID(primary);
+       unsigned int fid = BXCAN_FILTER_ID(cfg);
        u32 fmask = BIT(fid);
 
        regmap_update_bits(priv->gcan, BXCAN_FA1R_REG, fmask, 0);
 }
 
-static void bxcan_enable_filters(struct bxcan_priv *priv, bool primary)
+static void bxcan_enable_filters(struct bxcan_priv *priv, enum bxcan_cfg cfg)
 {
-       unsigned int fid = BXCAN_FILTER_ID(primary);
+       unsigned int fid = BXCAN_FILTER_ID(cfg);
        u32 fmask = BIT(fid);
 
        /* Filter settings:
@@ -680,7 +686,7 @@ static int bxcan_chip_start(struct net_device *ndev)
                  BXCAN_BTR_BRP_MASK | BXCAN_BTR_TS1_MASK | BXCAN_BTR_TS2_MASK |
                  BXCAN_BTR_SJW_MASK, set);
 
-       bxcan_enable_filters(priv, priv->primary);
+       bxcan_enable_filters(priv, priv->cfg);
 
        /* Clear all internal status */
        priv->tx_head = 0;
@@ -806,7 +812,7 @@ static void bxcan_chip_stop(struct net_device *ndev)
                  BXCAN_IER_EPVIE | BXCAN_IER_EWGIE | BXCAN_IER_FOVIE1 |
                  BXCAN_IER_FFIE1 | BXCAN_IER_FMPIE1 | BXCAN_IER_FOVIE0 |
                  BXCAN_IER_FFIE0 | BXCAN_IER_FMPIE0 | BXCAN_IER_TMEIE, 0);
-       bxcan_disable_filters(priv, priv->primary);
+       bxcan_disable_filters(priv, priv->cfg);
        bxcan_enter_sleep_mode(priv);
        priv->can.state = CAN_STATE_STOPPED;
 }
@@ -931,7 +937,7 @@ static int bxcan_probe(struct platform_device *pdev)
        struct clk *clk = NULL;
        void __iomem *regs;
        struct regmap *gcan;
-       bool primary;
+       enum bxcan_cfg cfg;
        int err, rx_irq, tx_irq, sce_irq;
 
        regs = devm_platform_ioremap_resource(pdev, 0);
@@ -946,7 +952,13 @@ static int bxcan_probe(struct platform_device *pdev)
                return PTR_ERR(gcan);
        }
 
-       primary = of_property_read_bool(np, "st,can-primary");
+       if (of_property_read_bool(np, "st,can-primary"))
+               cfg = BXCAN_CFG_DUAL_PRIMARY;
+       else if (of_property_read_bool(np, "st,can-secondary"))
+               cfg = BXCAN_CFG_DUAL_SECONDARY;
+       else
+               cfg = BXCAN_CFG_SINGLE;
+
        clk = devm_clk_get(dev, NULL);
        if (IS_ERR(clk)) {
                dev_err(dev, "failed to get clock\n");
@@ -992,7 +1004,7 @@ static int bxcan_probe(struct platform_device *pdev)
        priv->clk = clk;
        priv->tx_irq = tx_irq;
        priv->sce_irq = sce_irq;
-       priv->primary = primary;
+       priv->cfg = cfg;
        priv->can.clock.freq = clk_get_rate(clk);
        spin_lock_init(&priv->rmw_lock);
        priv->tx_head = 0;
index 241ec63..f6d05b3 100644 (file)
@@ -54,7 +54,8 @@ int can_put_echo_skb(struct sk_buff *skb, struct net_device *dev,
        /* check flag whether this packet has to be looped back */
        if (!(dev->flags & IFF_ECHO) ||
            (skb->protocol != htons(ETH_P_CAN) &&
-            skb->protocol != htons(ETH_P_CANFD))) {
+            skb->protocol != htons(ETH_P_CANFD) &&
+            skb->protocol != htons(ETH_P_CANXL))) {
                kfree_skb(skb);
                return 0;
        }
index 53e8a91..be189ed 100644 (file)
@@ -71,10 +71,12 @@ MODULE_DESCRIPTION("CAN driver for Kvaser CAN/PCIe devices");
 #define KVASER_PCIEFD_SYSID_BUILD_REG (KVASER_PCIEFD_SYSID_BASE + 0x14)
 /* Shared receive buffer registers */
 #define KVASER_PCIEFD_SRB_BASE 0x1f200
+#define KVASER_PCIEFD_SRB_FIFO_LAST_REG (KVASER_PCIEFD_SRB_BASE + 0x1f4)
 #define KVASER_PCIEFD_SRB_CMD_REG (KVASER_PCIEFD_SRB_BASE + 0x200)
 #define KVASER_PCIEFD_SRB_IEN_REG (KVASER_PCIEFD_SRB_BASE + 0x204)
 #define KVASER_PCIEFD_SRB_IRQ_REG (KVASER_PCIEFD_SRB_BASE + 0x20c)
 #define KVASER_PCIEFD_SRB_STAT_REG (KVASER_PCIEFD_SRB_BASE + 0x210)
+#define KVASER_PCIEFD_SRB_RX_NR_PACKETS_REG (KVASER_PCIEFD_SRB_BASE + 0x214)
 #define KVASER_PCIEFD_SRB_CTRL_REG (KVASER_PCIEFD_SRB_BASE + 0x218)
 /* EPCS flash controller registers */
 #define KVASER_PCIEFD_SPI_BASE 0x1fc00
@@ -111,6 +113,9 @@ MODULE_DESCRIPTION("CAN driver for Kvaser CAN/PCIe devices");
 /* DMA support */
 #define KVASER_PCIEFD_SRB_STAT_DMA BIT(24)
 
+/* SRB current packet level */
+#define KVASER_PCIEFD_SRB_RX_NR_PACKETS_MASK 0xff
+
 /* DMA Enable */
 #define KVASER_PCIEFD_SRB_CTRL_DMA_ENABLE BIT(0)
 
@@ -526,7 +531,7 @@ static int kvaser_pciefd_set_tx_irq(struct kvaser_pciefd_can *can)
              KVASER_PCIEFD_KCAN_IRQ_TOF | KVASER_PCIEFD_KCAN_IRQ_ABD |
              KVASER_PCIEFD_KCAN_IRQ_TAE | KVASER_PCIEFD_KCAN_IRQ_TAL |
              KVASER_PCIEFD_KCAN_IRQ_FDIC | KVASER_PCIEFD_KCAN_IRQ_BPP |
-             KVASER_PCIEFD_KCAN_IRQ_TAR | KVASER_PCIEFD_KCAN_IRQ_TFD;
+             KVASER_PCIEFD_KCAN_IRQ_TAR;
 
        iowrite32(msk, can->reg_base + KVASER_PCIEFD_KCAN_IEN_REG);
 
@@ -554,6 +559,8 @@ static void kvaser_pciefd_setup_controller(struct kvaser_pciefd_can *can)
 
        if (can->can.ctrlmode & CAN_CTRLMODE_LISTENONLY)
                mode |= KVASER_PCIEFD_KCAN_MODE_LOM;
+       else
+               mode &= ~KVASER_PCIEFD_KCAN_MODE_LOM;
 
        mode |= KVASER_PCIEFD_KCAN_MODE_EEN;
        mode |= KVASER_PCIEFD_KCAN_MODE_EPEN;
@@ -572,7 +579,7 @@ static void kvaser_pciefd_start_controller_flush(struct kvaser_pciefd_can *can)
 
        spin_lock_irqsave(&can->lock, irq);
        iowrite32(-1, can->reg_base + KVASER_PCIEFD_KCAN_IRQ_REG);
-       iowrite32(KVASER_PCIEFD_KCAN_IRQ_ABD | KVASER_PCIEFD_KCAN_IRQ_TFD,
+       iowrite32(KVASER_PCIEFD_KCAN_IRQ_ABD,
                  can->reg_base + KVASER_PCIEFD_KCAN_IEN_REG);
 
        status = ioread32(can->reg_base + KVASER_PCIEFD_KCAN_STAT_REG);
@@ -615,7 +622,7 @@ static int kvaser_pciefd_bus_on(struct kvaser_pciefd_can *can)
        iowrite32(0, can->reg_base + KVASER_PCIEFD_KCAN_IEN_REG);
        iowrite32(-1, can->reg_base + KVASER_PCIEFD_KCAN_IRQ_REG);
 
-       iowrite32(KVASER_PCIEFD_KCAN_IRQ_ABD | KVASER_PCIEFD_KCAN_IRQ_TFD,
+       iowrite32(KVASER_PCIEFD_KCAN_IRQ_ABD,
                  can->reg_base + KVASER_PCIEFD_KCAN_IEN_REG);
 
        mode = ioread32(can->reg_base + KVASER_PCIEFD_KCAN_MODE_REG);
@@ -719,6 +726,7 @@ static int kvaser_pciefd_stop(struct net_device *netdev)
                iowrite32(0, can->reg_base + KVASER_PCIEFD_KCAN_IEN_REG);
                del_timer(&can->bec_poll_timer);
        }
+       can->can.state = CAN_STATE_STOPPED;
        close_candev(netdev);
 
        return ret;
@@ -1007,8 +1015,7 @@ static int kvaser_pciefd_setup_can_ctrls(struct kvaser_pciefd *pcie)
                SET_NETDEV_DEV(netdev, &pcie->pci->dev);
 
                iowrite32(-1, can->reg_base + KVASER_PCIEFD_KCAN_IRQ_REG);
-               iowrite32(KVASER_PCIEFD_KCAN_IRQ_ABD |
-                         KVASER_PCIEFD_KCAN_IRQ_TFD,
+               iowrite32(KVASER_PCIEFD_KCAN_IRQ_ABD,
                          can->reg_base + KVASER_PCIEFD_KCAN_IEN_REG);
 
                pcie->can[i] = can;
@@ -1058,6 +1065,7 @@ static int kvaser_pciefd_setup_dma(struct kvaser_pciefd *pcie)
 {
        int i;
        u32 srb_status;
+       u32 srb_packet_count;
        dma_addr_t dma_addr[KVASER_PCIEFD_DMA_COUNT];
 
        /* Disable the DMA */
@@ -1085,6 +1093,15 @@ static int kvaser_pciefd_setup_dma(struct kvaser_pciefd *pcie)
                  KVASER_PCIEFD_SRB_CMD_RDB1,
                  pcie->reg_base + KVASER_PCIEFD_SRB_CMD_REG);
 
+       /* Empty Rx FIFO */
+       srb_packet_count = ioread32(pcie->reg_base + KVASER_PCIEFD_SRB_RX_NR_PACKETS_REG) &
+                          KVASER_PCIEFD_SRB_RX_NR_PACKETS_MASK;
+       while (srb_packet_count) {
+               /* Drop current packet in FIFO */
+               ioread32(pcie->reg_base + KVASER_PCIEFD_SRB_FIFO_LAST_REG);
+               srb_packet_count--;
+       }
+
        srb_status = ioread32(pcie->reg_base + KVASER_PCIEFD_SRB_STAT_REG);
        if (!(srb_status & KVASER_PCIEFD_SRB_STAT_DI)) {
                dev_err(&pcie->pci->dev, "DMA not idle before enabling\n");
@@ -1425,9 +1442,6 @@ static int kvaser_pciefd_handle_status_packet(struct kvaser_pciefd *pcie,
                cmd = KVASER_PCIEFD_KCAN_CMD_AT;
                cmd |= ++can->cmd_seq << KVASER_PCIEFD_KCAN_CMD_SEQ_SHIFT;
                iowrite32(cmd, can->reg_base + KVASER_PCIEFD_KCAN_CMD_REG);
-
-               iowrite32(KVASER_PCIEFD_KCAN_IRQ_TFD,
-                         can->reg_base + KVASER_PCIEFD_KCAN_IEN_REG);
        } else if (p->header[0] & KVASER_PCIEFD_SPACK_IDET &&
                   p->header[0] & KVASER_PCIEFD_SPACK_IRM &&
                   cmdseq == (p->header[1] & KVASER_PCIEFD_PACKET_SEQ_MSK) &&
@@ -1714,15 +1728,6 @@ static int kvaser_pciefd_transmit_irq(struct kvaser_pciefd_can *can)
        if (irq & KVASER_PCIEFD_KCAN_IRQ_TOF)
                netdev_err(can->can.dev, "Tx FIFO overflow\n");
 
-       if (irq & KVASER_PCIEFD_KCAN_IRQ_TFD) {
-               u8 count = ioread32(can->reg_base +
-                                   KVASER_PCIEFD_KCAN_TX_NPACKETS_REG) & 0xff;
-
-               if (count == 0)
-                       iowrite32(KVASER_PCIEFD_KCAN_CTRL_EFLUSH,
-                                 can->reg_base + KVASER_PCIEFD_KCAN_CTRL_REG);
-       }
-
        if (irq & KVASER_PCIEFD_KCAN_IRQ_BPP)
                netdev_err(can->can.dev,
                           "Fail to change bittiming, when not in reset mode\n");
@@ -1824,6 +1829,11 @@ static int kvaser_pciefd_probe(struct pci_dev *pdev,
        if (err)
                goto err_teardown_can_ctrls;
 
+       err = request_irq(pcie->pci->irq, kvaser_pciefd_irq_handler,
+                         IRQF_SHARED, KVASER_PCIEFD_DRV_NAME, pcie);
+       if (err)
+               goto err_teardown_can_ctrls;
+
        iowrite32(KVASER_PCIEFD_SRB_IRQ_DPD0 | KVASER_PCIEFD_SRB_IRQ_DPD1,
                  pcie->reg_base + KVASER_PCIEFD_SRB_IRQ_REG);
 
@@ -1844,11 +1854,6 @@ static int kvaser_pciefd_probe(struct pci_dev *pdev,
        iowrite32(KVASER_PCIEFD_SRB_CMD_RDB1,
                  pcie->reg_base + KVASER_PCIEFD_SRB_CMD_REG);
 
-       err = request_irq(pcie->pci->irq, kvaser_pciefd_irq_handler,
-                         IRQF_SHARED, KVASER_PCIEFD_DRV_NAME, pcie);
-       if (err)
-               goto err_teardown_can_ctrls;
-
        err = kvaser_pciefd_reg_candev(pcie);
        if (err)
                goto err_free_irq;
@@ -1856,6 +1861,8 @@ static int kvaser_pciefd_probe(struct pci_dev *pdev,
        return 0;
 
 err_free_irq:
+       /* Disable PCI interrupts */
+       iowrite32(0, pcie->reg_base + KVASER_PCIEFD_IEN_REG);
        free_irq(pcie->pci->irq, pcie);
 
 err_teardown_can_ctrls:
index cbe8318..c0215a8 100644 (file)
@@ -1188,8 +1188,6 @@ static int lan9303_port_fdb_add(struct dsa_switch *ds, int port,
        struct lan9303 *chip = ds->priv;
 
        dev_dbg(chip->dev, "%s(%d, %pM, %d)\n", __func__, port, addr, vid);
-       if (vid)
-               return -EOPNOTSUPP;
 
        return lan9303_alr_add_port(chip, addr, port, false);
 }
@@ -1201,8 +1199,6 @@ static int lan9303_port_fdb_del(struct dsa_switch *ds, int port,
        struct lan9303 *chip = ds->priv;
 
        dev_dbg(chip->dev, "%s(%d, %pM, %d)\n", __func__, port, addr, vid);
-       if (vid)
-               return -EOPNOTSUPP;
        lan9303_alr_del_port(chip, addr, port);
 
        return 0;
index 9bc54e1..7e773c4 100644 (file)
@@ -399,6 +399,20 @@ static void mt7530_pll_setup(struct mt7530_priv *priv)
        core_set(priv, CORE_TRGMII_GSW_CLK_CG, REG_GSWCK_EN);
 }
 
+/* If port 6 is available as a CPU port, always prefer that as the default,
+ * otherwise don't care.
+ */
+static struct dsa_port *
+mt753x_preferred_default_local_cpu_port(struct dsa_switch *ds)
+{
+       struct dsa_port *cpu_dp = dsa_to_port(ds, 6);
+
+       if (dsa_port_is_cpu(cpu_dp))
+               return cpu_dp;
+
+       return NULL;
+}
+
 /* Setup port 6 interface mode and TRGMII TX circuit */
 static int
 mt7530_pad_clk_setup(struct dsa_switch *ds, phy_interface_t interface)
@@ -985,6 +999,18 @@ unlock_exit:
        mutex_unlock(&priv->reg_mutex);
 }
 
+static void
+mt753x_trap_frames(struct mt7530_priv *priv)
+{
+       /* Trap BPDUs to the CPU port(s) */
+       mt7530_rmw(priv, MT753X_BPC, MT753X_BPDU_PORT_FW_MASK,
+                  MT753X_BPDU_CPU_ONLY);
+
+       /* Trap LLDP frames with :0E MAC DA to the CPU port(s) */
+       mt7530_rmw(priv, MT753X_RGAC2, MT753X_R0E_PORT_FW_MASK,
+                  MT753X_R0E_PORT_FW(MT753X_BPDU_CPU_ONLY));
+}
+
 static int
 mt753x_cpu_port_enable(struct dsa_switch *ds, int port)
 {
@@ -1007,9 +1033,16 @@ mt753x_cpu_port_enable(struct dsa_switch *ds, int port)
                   UNU_FFP(BIT(port)));
 
        /* Set CPU port number */
-       if (priv->id == ID_MT7621)
+       if (priv->id == ID_MT7530 || priv->id == ID_MT7621)
                mt7530_rmw(priv, MT7530_MFC, CPU_MASK, CPU_EN | CPU_PORT(port));
 
+       /* Add the CPU port to the CPU port bitmap for MT7531 and the switch on
+        * the MT7988 SoC. Trapped frames will be forwarded to the CPU port that
+        * is affine to the inbound user port.
+        */
+       if (priv->id == ID_MT7531 || priv->id == ID_MT7988)
+               mt7530_set(priv, MT7531_CFC, MT7531_CPU_PMAP(BIT(port)));
+
        /* CPU port gets connected to all user ports of
         * the switch.
         */
@@ -2255,6 +2288,8 @@ mt7530_setup(struct dsa_switch *ds)
 
        priv->p6_interface = PHY_INTERFACE_MODE_NA;
 
+       mt753x_trap_frames(priv);
+
        /* Enable and reset MIB counters */
        mt7530_mib_reset(ds);
 
@@ -2352,17 +2387,9 @@ static int
 mt7531_setup_common(struct dsa_switch *ds)
 {
        struct mt7530_priv *priv = ds->priv;
-       struct dsa_port *cpu_dp;
        int ret, i;
 
-       /* BPDU to CPU port */
-       dsa_switch_for_each_cpu_port(cpu_dp, ds) {
-               mt7530_rmw(priv, MT7531_CFC, MT7531_CPU_PMAP_MASK,
-                          BIT(cpu_dp->index));
-               break;
-       }
-       mt7530_rmw(priv, MT753X_BPC, MT753X_BPDU_PORT_FW_MASK,
-                  MT753X_BPDU_CPU_ONLY);
+       mt753x_trap_frames(priv);
 
        /* Enable and reset MIB counters */
        mt7530_mib_reset(ds);
@@ -3085,6 +3112,7 @@ static int mt7988_setup(struct dsa_switch *ds)
 const struct dsa_switch_ops mt7530_switch_ops = {
        .get_tag_protocol       = mtk_get_tag_protocol,
        .setup                  = mt753x_setup,
+       .preferred_default_local_cpu_port = mt753x_preferred_default_local_cpu_port,
        .get_strings            = mt7530_get_strings,
        .get_ethtool_stats      = mt7530_get_ethtool_stats,
        .get_sset_count         = mt7530_get_sset_count,
index 5084f48..08045b0 100644 (file)
@@ -54,6 +54,7 @@ enum mt753x_id {
 #define  MT7531_MIRROR_PORT_GET(x)     (((x) >> 16) & MIRROR_MASK)
 #define  MT7531_MIRROR_PORT_SET(x)     (((x) & MIRROR_MASK) << 16)
 #define  MT7531_CPU_PMAP_MASK          GENMASK(7, 0)
+#define  MT7531_CPU_PMAP(x)            FIELD_PREP(MT7531_CPU_PMAP_MASK, x)
 
 #define MT753X_MIRROR_REG(id)          ((((id) == ID_MT7531) || ((id) == ID_MT7988)) ? \
                                         MT7531_CFC : MT7530_MFC)
@@ -66,6 +67,11 @@ enum mt753x_id {
 #define MT753X_BPC                     0x24
 #define  MT753X_BPDU_PORT_FW_MASK      GENMASK(2, 0)
 
+/* Register for :03 and :0E MAC DA frame control */
+#define MT753X_RGAC2                   0x2c
+#define  MT753X_R0E_PORT_FW_MASK       GENMASK(18, 16)
+#define  MT753X_R0E_PORT_FW(x)         FIELD_PREP(MT753X_R0E_PORT_FW_MASK, x)
+
 enum mt753x_bpdu_port_fw {
        MT753X_BPDU_FOLLOW_MFC,
        MT753X_BPDU_CPU_EXCLUDE = 4,
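For reference, with the usual <linux/bits.h> and <linux/bitfield.h> definitions these new helpers reduce to plain mask-and-shift arithmetic; a small sketch (not part of the patch, and the selector is left symbolic because the numeric value of MT753X_BPDU_CPU_ONLY is not shown in this hunk):

	/* MT753X_R0E_PORT_FW_MASK == GENMASK(18, 16) == 0x00070000          */
	/* MT753X_R0E_PORT_FW(x)   == FIELD_PREP(mask, x) == x << 16 for a   */
	/* 3-bit selector, so only bits 18:16 of RGAC2 are rewritten below.  */
	mt7530_rmw(priv, MT753X_RGAC2, MT753X_R0E_PORT_FW_MASK,
		   MT753X_R0E_PORT_FW(MT753X_BPDU_CPU_ONLY));

The read-modify-write therefore only touches the forwarding selector for frames with the 01-80-C2-00-00-0E LLDP destination address referenced as ":0E" in the comment above, steering them to the CPU port(s).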
index 64a2f2f..08a46ff 100644 (file)
@@ -7170,7 +7170,7 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
                goto out;
        }
        if (chip->reset)
-               usleep_range(1000, 2000);
+               usleep_range(10000, 20000);
 
        /* Detect if the device is configured in single chip addressing mode,
         * otherwise continue with address specific smi init/detection.
index aec9d4f..d19b630 100644 (file)
 /* Offset 0x10: Extended Port Control Command */
 #define MV88E6393X_PORT_EPC_CMD                0x10
 #define MV88E6393X_PORT_EPC_CMD_BUSY   0x8000
-#define MV88E6393X_PORT_EPC_CMD_WRITE  0x0300
+#define MV88E6393X_PORT_EPC_CMD_WRITE  0x3000
 #define MV88E6393X_PORT_EPC_INDEX_PORT_ETYPE   0x02
 
 /* Offset 0x11: Extended Port Control Data */
index cfb3fae..d172a3e 100644 (file)
@@ -1263,7 +1263,7 @@ static void vsc9959_tas_guard_bands_update(struct ocelot *ocelot, int port)
        /* Consider the standard Ethernet overhead of 8 octets preamble+SFD,
         * 4 octets FCS, 12 octets IFG.
         */
-       needed_bit_time_ps = (maxlen + 24) * picos_per_byte;
+       needed_bit_time_ps = (u64)(maxlen + 24) * picos_per_byte;
 
        dev_dbg(ocelot->dev,
                "port %d: max frame size %d needs %llu ps at speed %d\n",
index 4347b42..de9da46 100644 (file)
@@ -20,6 +20,7 @@ config NET_DSA_QCA8K_LEDS_SUPPORT
        bool "Qualcomm Atheros QCA8K Ethernet switch family LEDs support"
        depends on NET_DSA_QCA8K
        depends on LEDS_CLASS=y || LEDS_CLASS=NET_DSA_QCA8K
+       depends on LEDS_TRIGGERS
        help
          This enables support for LEDs present on the Qualcomm Atheros
          QCA8K Ethernet switch chips.
index 919027c..c37d2e5 100644 (file)
@@ -120,6 +120,22 @@ static void a5psw_port_mgmtfwd_set(struct a5psw *a5psw, int port, bool enable)
        a5psw_port_pattern_set(a5psw, port, A5PSW_PATTERN_MGMTFWD, enable);
 }
 
+static void a5psw_port_tx_enable(struct a5psw *a5psw, int port, bool enable)
+{
+       u32 mask = A5PSW_PORT_ENA_TX(port);
+       u32 reg = enable ? mask : 0;
+
+       /* Even though the port TX is disabled through the TXENA bit in the
+        * PORT_ENA register, the port can still send BPDUs. This depends on
+        * the tag configuration added when sending packets from the CPU port
+        * to the switch port. Indeed, when using forced forwarding without
+        * filtering, even disabled ports are able to send tagged packets.
+        * This makes it possible to implement STP support when ports are in
+        * a state where forwarding should be stopped but BPDUs must still be
+        * sent.
+        */
+       a5psw_reg_rmw(a5psw, A5PSW_PORT_ENA, mask, reg);
+}
+
 static void a5psw_port_enable_set(struct a5psw *a5psw, int port, bool enable)
 {
        u32 port_ena = 0;
@@ -292,6 +308,22 @@ static int a5psw_set_ageing_time(struct dsa_switch *ds, unsigned int msecs)
        return 0;
 }
 
+static void a5psw_port_learning_set(struct a5psw *a5psw, int port, bool learn)
+{
+       u32 mask = A5PSW_INPUT_LEARN_DIS(port);
+       u32 reg = !learn ? mask : 0;
+
+       a5psw_reg_rmw(a5psw, A5PSW_INPUT_LEARN, mask, reg);
+}
+
+static void a5psw_port_rx_block_set(struct a5psw *a5psw, int port, bool block)
+{
+       u32 mask = A5PSW_INPUT_LEARN_BLOCK(port);
+       u32 reg = block ? mask : 0;
+
+       a5psw_reg_rmw(a5psw, A5PSW_INPUT_LEARN, mask, reg);
+}
+
 static void a5psw_flooding_set_resolution(struct a5psw *a5psw, int port,
                                          bool set)
 {
@@ -308,6 +340,14 @@ static void a5psw_flooding_set_resolution(struct a5psw *a5psw, int port,
                a5psw_reg_writel(a5psw, offsets[i], a5psw->bridged_ports);
 }
 
+static void a5psw_port_set_standalone(struct a5psw *a5psw, int port,
+                                     bool standalone)
+{
+       a5psw_port_learning_set(a5psw, port, !standalone);
+       a5psw_flooding_set_resolution(a5psw, port, !standalone);
+       a5psw_port_mgmtfwd_set(a5psw, port, standalone);
+}
+
 static int a5psw_port_bridge_join(struct dsa_switch *ds, int port,
                                  struct dsa_bridge bridge,
                                  bool *tx_fwd_offload,
@@ -323,8 +363,7 @@ static int a5psw_port_bridge_join(struct dsa_switch *ds, int port,
        }
 
        a5psw->br_dev = bridge.dev;
-       a5psw_flooding_set_resolution(a5psw, port, true);
-       a5psw_port_mgmtfwd_set(a5psw, port, false);
+       a5psw_port_set_standalone(a5psw, port, false);
 
        return 0;
 }
@@ -334,8 +373,7 @@ static void a5psw_port_bridge_leave(struct dsa_switch *ds, int port,
 {
        struct a5psw *a5psw = ds->priv;
 
-       a5psw_flooding_set_resolution(a5psw, port, false);
-       a5psw_port_mgmtfwd_set(a5psw, port, true);
+       a5psw_port_set_standalone(a5psw, port, true);
 
        /* No more ports bridged */
        if (a5psw->bridged_ports == BIT(A5PSW_CPU_PORT))
@@ -344,28 +382,35 @@ static void a5psw_port_bridge_leave(struct dsa_switch *ds, int port,
 
 static void a5psw_port_stp_state_set(struct dsa_switch *ds, int port, u8 state)
 {
-       u32 mask = A5PSW_INPUT_LEARN_DIS(port) | A5PSW_INPUT_LEARN_BLOCK(port);
+       bool learning_enabled, rx_enabled, tx_enabled;
        struct a5psw *a5psw = ds->priv;
-       u32 reg = 0;
 
        switch (state) {
        case BR_STATE_DISABLED:
        case BR_STATE_BLOCKING:
-               reg |= A5PSW_INPUT_LEARN_DIS(port);
-               reg |= A5PSW_INPUT_LEARN_BLOCK(port);
-               break;
        case BR_STATE_LISTENING:
-               reg |= A5PSW_INPUT_LEARN_DIS(port);
+               rx_enabled = false;
+               tx_enabled = false;
+               learning_enabled = false;
                break;
        case BR_STATE_LEARNING:
-               reg |= A5PSW_INPUT_LEARN_BLOCK(port);
+               rx_enabled = false;
+               tx_enabled = false;
+               learning_enabled = true;
                break;
        case BR_STATE_FORWARDING:
-       default:
+               rx_enabled = true;
+               tx_enabled = true;
+               learning_enabled = true;
                break;
+       default:
+               dev_err(ds->dev, "invalid STP state: %d\n", state);
+               return;
        }
 
-       a5psw_reg_rmw(a5psw, A5PSW_INPUT_LEARN, mask, reg);
+       a5psw_port_learning_set(a5psw, port, learning_enabled);
+       a5psw_port_rx_block_set(a5psw, port, !rx_enabled);
+       a5psw_port_tx_enable(a5psw, port, tx_enabled);
 }
 
 static void a5psw_port_fast_age(struct dsa_switch *ds, int port)
@@ -673,7 +718,7 @@ static int a5psw_setup(struct dsa_switch *ds)
        }
 
        /* Configure management port */
-       reg = A5PSW_CPU_PORT | A5PSW_MGMT_CFG_DISCARD;
+       reg = A5PSW_CPU_PORT | A5PSW_MGMT_CFG_ENABLE;
        a5psw_reg_writel(a5psw, A5PSW_MGMT_CFG, reg);
 
        /* Set pattern 0 to forward all frame to mgmt port */
@@ -722,13 +767,15 @@ static int a5psw_setup(struct dsa_switch *ds)
                if (dsa_port_is_unused(dp))
                        continue;
 
-               /* Enable egress flooding for CPU port */
-               if (dsa_port_is_cpu(dp))
+               /* Enable egress flooding and learning for CPU port */
+               if (dsa_port_is_cpu(dp)) {
                        a5psw_flooding_set_resolution(a5psw, port, true);
+                       a5psw_port_learning_set(a5psw, port, true);
+               }
 
-               /* Enable management forward only for user ports */
+               /* Enable standalone mode for user ports */
                if (dsa_port_is_user(dp))
-                       a5psw_port_mgmtfwd_set(a5psw, port, true);
+                       a5psw_port_set_standalone(a5psw, port, true);
        }
 
        return 0;
index c67abd4..b869192 100644 (file)
@@ -19,6 +19,7 @@
 #define A5PSW_PORT_OFFSET(port)                (0x400 * (port))
 
 #define A5PSW_PORT_ENA                 0x8
+#define A5PSW_PORT_ENA_TX(port)                BIT(port)
 #define A5PSW_PORT_ENA_RX_SHIFT                16
 #define A5PSW_PORT_ENA_TX_RX(port)     (BIT((port) + A5PSW_PORT_ENA_RX_SHIFT) | \
                                         BIT(port))
@@ -36,7 +37,7 @@
 #define A5PSW_INPUT_LEARN_BLOCK(p)     BIT(p)
 
 #define A5PSW_MGMT_CFG                 0x20
-#define A5PSW_MGMT_CFG_DISCARD         BIT(7)
+#define A5PSW_MGMT_CFG_ENABLE          BIT(6)
 
 #define A5PSW_MODE_CFG                 0x24
 #define A5PSW_MODE_STATS_RESET         BIT(31)
index d2f4358..ba3e7aa 100644 (file)
@@ -66,8 +66,10 @@ static int max_interrupt_work = 20;
 #include <linux/timer.h>
 #include <linux/ethtool.h>
 #include <linux/bitops.h>
-
 #include <linux/uaccess.h>
+
+#include <net/Space.h>
+
 #include <asm/io.h>
 #include <asm/dma.h>
 
index 82f94b1..5267e9d 100644 (file)
@@ -195,6 +195,7 @@ static int tc589_probe(struct pcmcia_device *link)
 {
        struct el3_private *lp;
        struct net_device *dev;
+       int ret;
 
        dev_dbg(&link->dev, "3c589_attach()\n");
 
@@ -218,7 +219,15 @@ static int tc589_probe(struct pcmcia_device *link)
 
        dev->ethtool_ops = &netdev_ethtool_ops;
 
-       return tc589_config(link);
+       ret = tc589_config(link);
+       if (ret)
+               goto err_free_netdev;
+
+       return 0;
+
+err_free_netdev:
+       free_netdev(dev);
+       return ret;
 }
 
 static void tc589_detach(struct pcmcia_device *link)
index 0a9118b..bc9c81d 100644 (file)
@@ -52,6 +52,7 @@ static const char version2[] =
 #include <linux/etherdevice.h>
 #include <linux/jiffies.h>
 #include <linux/platform_device.h>
+#include <net/Space.h>
 
 #include <asm/io.h>
 
index 6e62c37..7465650 100644 (file)
@@ -66,6 +66,7 @@ static const char version[] =
 #include <linux/isapnp.h>
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
+#include <net/Space.h>
 
 #include <asm/io.h>
 #include <asm/irq.h>
index 5b00c45..119021d 100644 (file)
@@ -37,6 +37,7 @@ static const char version[] =
 #include <linux/delay.h>
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
+#include <net/Space.h>
 
 #include <asm/io.h>
 
index 8971665..6cf3818 100644 (file)
@@ -59,6 +59,7 @@ static const char version[] = "lance.c:v1.16 2006/11/09 dplatt@3do.com, becker@c
 #include <linux/skbuff.h>
 #include <linux/mm.h>
 #include <linux/bitops.h>
+#include <net/Space.h>
 
 #include <asm/io.h>
 #include <asm/dma.h>
index f7c597e..debe521 100644 (file)
@@ -68,9 +68,15 @@ bool pdsc_is_fw_running(struct pdsc *pdsc)
 
 bool pdsc_is_fw_good(struct pdsc *pdsc)
 {
-       u8 gen = pdsc->fw_status & PDS_CORE_FW_STS_F_GENERATION;
+       bool fw_running = pdsc_is_fw_running(pdsc);
+       u8 gen;
 
-       return pdsc_is_fw_running(pdsc) && gen == pdsc->fw_generation;
+       /* Make sure to update the cached fw_status by calling
+        * pdsc_is_fw_running() before getting the generation
+        */
+       gen = pdsc->fw_status & PDS_CORE_FW_STS_F_GENERATION;
+
+       return fw_running && gen == pdsc->fw_generation;
 }
 
 static u8 pdsc_devcmd_status(struct pdsc *pdsc)
index 33a9574..32d2c6f 100644 (file)
@@ -1329,7 +1329,7 @@ static enum xgbe_mode xgbe_phy_status_aneg(struct xgbe_prv_data *pdata)
        return pdata->phy_if.phy_impl.an_outcome(pdata);
 }
 
-static void xgbe_phy_status_result(struct xgbe_prv_data *pdata)
+static bool xgbe_phy_status_result(struct xgbe_prv_data *pdata)
 {
        struct ethtool_link_ksettings *lks = &pdata->phy.lks;
        enum xgbe_mode mode;
@@ -1367,8 +1367,13 @@ static void xgbe_phy_status_result(struct xgbe_prv_data *pdata)
 
        pdata->phy.duplex = DUPLEX_FULL;
 
-       if (xgbe_set_mode(pdata, mode) && pdata->an_again)
+       if (!xgbe_set_mode(pdata, mode))
+               return false;
+
+       if (pdata->an_again)
                xgbe_phy_reconfig_aneg(pdata);
+
+       return true;
 }
 
 static void xgbe_phy_status(struct xgbe_prv_data *pdata)
@@ -1398,7 +1403,8 @@ static void xgbe_phy_status(struct xgbe_prv_data *pdata)
                        return;
                }
 
-               xgbe_phy_status_result(pdata);
+               if (xgbe_phy_status_result(pdata))
+                       return;
 
                if (test_bit(XGBE_LINK_INIT, &pdata->dev_state))
                        clear_bit(XGBE_LINK_INIT, &pdata->dev_state);
index 38d0cda..bf1611c 100644 (file)
@@ -2531,9 +2531,9 @@ static int bcm_sysport_probe(struct platform_device *pdev)
        priv->irq0 = platform_get_irq(pdev, 0);
        if (!priv->is_lite) {
                priv->irq1 = platform_get_irq(pdev, 1);
-               priv->wol_irq = platform_get_irq(pdev, 2);
+               priv->wol_irq = platform_get_irq_optional(pdev, 2);
        } else {
-               priv->wol_irq = platform_get_irq(pdev, 1);
+               priv->wol_irq = platform_get_irq_optional(pdev, 1);
        }
        if (priv->irq0 <= 0 || (priv->irq1 <= 0 && !priv->is_lite)) {
                ret = -EINVAL;
index 637d162..1e7a6f1 100644 (file)
@@ -14294,11 +14294,16 @@ static void bnx2x_io_resume(struct pci_dev *pdev)
        bp->fw_seq = SHMEM_RD(bp, func_mb[BP_FW_MB_IDX(bp)].drv_mb_header) &
                                                        DRV_MSG_SEQ_NUMBER_MASK;
 
-       if (netif_running(dev))
-               bnx2x_nic_load(bp, LOAD_NORMAL);
+       if (netif_running(dev)) {
+               if (bnx2x_nic_load(bp, LOAD_NORMAL)) {
+                       netdev_err(bp->dev, "Error during driver initialization, try unloading/reloading the driver\n");
+                       goto done;
+               }
+       }
 
        netif_device_attach(dev);
 
+done:
        rtnl_unlock();
 }
 
index dcd9367..b499bc9 100644 (file)
@@ -692,7 +692,7 @@ next_tx_int:
 
        __netif_txq_completed_wake(txq, nr_pkts, tx_bytes,
                                   bnxt_tx_avail(bp, txr), bp->tx_wake_thresh,
-                                  READ_ONCE(txr->dev_state) != BNXT_DEV_STATE_CLOSING);
+                                  READ_ONCE(txr->dev_state) == BNXT_DEV_STATE_CLOSING);
 }
 
 static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
@@ -2365,6 +2365,9 @@ static int bnxt_async_event_process(struct bnxt *bp,
                                struct bnxt_ptp_cfg *ptp = bp->ptp_cfg;
                                u64 ns;
 
+                               if (!ptp)
+                                       goto async_event_process_exit;
+
                                spin_lock_bh(&ptp->ptp_lock);
                                bnxt_ptp_update_current_time(bp);
                                ns = (((u64)BNXT_EVENT_PHC_RTC_UPDATE(data1) <<
@@ -4763,6 +4766,9 @@ int bnxt_hwrm_func_drv_rgtr(struct bnxt *bp, unsigned long *bmap, int bmap_size,
                if (event_id == ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY &&
                    !(bp->fw_cap & BNXT_FW_CAP_ERROR_RECOVERY))
                        continue;
+               if (event_id == ASYNC_EVENT_CMPL_EVENT_ID_PHC_UPDATE &&
+                   !bp->ptp_cfg)
+                       continue;
                __set_bit(bnxt_async_events_arr[i], async_events_bmap);
        }
        if (bmap && bmap_size) {
@@ -5350,6 +5356,7 @@ static void bnxt_hwrm_update_rss_hash_cfg(struct bnxt *bp)
        if (hwrm_req_init(bp, req, HWRM_VNIC_RSS_QCFG))
                return;
 
+       req->vnic_id = cpu_to_le16(vnic->fw_vnic_id);
        /* all contexts configured to same hash_type, zero always exists */
        req->rss_ctx_idx = cpu_to_le16(vnic->fw_rss_cos_lb_ctx[0]);
        resp = hwrm_req_hold(bp, req);
@@ -8812,6 +8819,9 @@ static int bnxt_init_chip(struct bnxt *bp, bool irq_re_init)
                goto err_out;
        }
 
+       if (BNXT_VF(bp))
+               bnxt_hwrm_func_qcfg(bp);
+
        rc = bnxt_setup_vnic(bp, 0);
        if (rc)
                goto err_out;
@@ -11598,6 +11608,7 @@ static void bnxt_tx_timeout(struct net_device *dev, unsigned int txqueue)
 static void bnxt_fw_health_check(struct bnxt *bp)
 {
        struct bnxt_fw_health *fw_health = bp->fw_health;
+       struct pci_dev *pdev = bp->pdev;
        u32 val;
 
        if (!fw_health->enabled || test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
@@ -11611,7 +11622,7 @@ static void bnxt_fw_health_check(struct bnxt *bp)
        }
 
        val = bnxt_fw_health_readl(bp, BNXT_FW_HEARTBEAT_REG);
-       if (val == fw_health->last_fw_heartbeat) {
+       if (val == fw_health->last_fw_heartbeat && pci_device_is_present(pdev)) {
                fw_health->arrests++;
                goto fw_reset;
        }
@@ -11619,7 +11630,7 @@ static void bnxt_fw_health_check(struct bnxt *bp)
        fw_health->last_fw_heartbeat = val;
 
        val = bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
-       if (val != fw_health->last_fw_reset_cnt) {
+       if (val != fw_health->last_fw_reset_cnt && pci_device_is_present(pdev)) {
                fw_health->discoveries++;
                goto fw_reset;
        }
@@ -13025,26 +13036,37 @@ static void bnxt_cfg_ntp_filters(struct bnxt *bp)
 
 #endif /* CONFIG_RFS_ACCEL */
 
-static int bnxt_udp_tunnel_sync(struct net_device *netdev, unsigned int table)
+static int bnxt_udp_tunnel_set_port(struct net_device *netdev, unsigned int table,
+                                   unsigned int entry, struct udp_tunnel_info *ti)
 {
        struct bnxt *bp = netdev_priv(netdev);
-       struct udp_tunnel_info ti;
        unsigned int cmd;
 
-       udp_tunnel_nic_get_port(netdev, table, 0, &ti);
-       if (ti.type == UDP_TUNNEL_TYPE_VXLAN)
+       if (ti->type == UDP_TUNNEL_TYPE_VXLAN)
                cmd = TUNNEL_DST_PORT_FREE_REQ_TUNNEL_TYPE_VXLAN;
        else
                cmd = TUNNEL_DST_PORT_FREE_REQ_TUNNEL_TYPE_GENEVE;
 
-       if (ti.port)
-               return bnxt_hwrm_tunnel_dst_port_alloc(bp, ti.port, cmd);
+       return bnxt_hwrm_tunnel_dst_port_alloc(bp, ti->port, cmd);
+}
+
+static int bnxt_udp_tunnel_unset_port(struct net_device *netdev, unsigned int table,
+                                     unsigned int entry, struct udp_tunnel_info *ti)
+{
+       struct bnxt *bp = netdev_priv(netdev);
+       unsigned int cmd;
+
+       if (ti->type == UDP_TUNNEL_TYPE_VXLAN)
+               cmd = TUNNEL_DST_PORT_FREE_REQ_TUNNEL_TYPE_VXLAN;
+       else
+               cmd = TUNNEL_DST_PORT_FREE_REQ_TUNNEL_TYPE_GENEVE;
 
        return bnxt_hwrm_tunnel_dst_port_free(bp, cmd);
 }
 
 static const struct udp_tunnel_nic_info bnxt_udp_tunnels = {
-       .sync_table     = bnxt_udp_tunnel_sync,
+       .set_port       = bnxt_udp_tunnel_set_port,
+       .unset_port     = bnxt_udp_tunnel_unset_port,
        .flags          = UDP_TUNNEL_NIC_INFO_MAY_SLEEP |
                          UDP_TUNNEL_NIC_INFO_OPEN_ONLY,
        .tables         = {
index 2dd8ee4..8fd5071 100644 (file)
@@ -3831,7 +3831,7 @@ static int bnxt_reset(struct net_device *dev, u32 *flags)
                }
        }
 
-       if (req & BNXT_FW_RESET_AP) {
+       if (!BNXT_CHIP_P4_PLUS(bp) && (req & BNXT_FW_RESET_AP)) {
                /* This feature is not supported in older firmware versions */
                if (bp->hwrm_spec_code >= 0x10803) {
                        if (!bnxt_firmware_reset_ap(dev)) {
index e466891..f388671 100644 (file)
@@ -952,6 +952,7 @@ int bnxt_ptp_init(struct bnxt *bp, bool phc_cfg)
                bnxt_ptp_timecounter_init(bp, true);
                bnxt_ptp_adjfine_rtc(bp, 0);
        }
+       bnxt_hwrm_func_drv_rgtr(bp, NULL, 0, true);
 
        ptp->ptp_info = bnxt_ptp_caps;
        if ((bp->fw_cap & BNXT_FW_CAP_PTP_PPS)) {
index f28ffc3..2b5761a 100644 (file)
@@ -1272,7 +1272,8 @@ static void bcmgenet_get_ethtool_stats(struct net_device *dev,
        }
 }
 
-static void bcmgenet_eee_enable_set(struct net_device *dev, bool enable)
+void bcmgenet_eee_enable_set(struct net_device *dev, bool enable,
+                            bool tx_lpi_enabled)
 {
        struct bcmgenet_priv *priv = netdev_priv(dev);
        u32 off = priv->hw_params->tbuf_offset + TBUF_ENERGY_CTRL;
@@ -1292,7 +1293,7 @@ static void bcmgenet_eee_enable_set(struct net_device *dev, bool enable)
 
        /* Enable EEE and switch to a 27Mhz clock automatically */
        reg = bcmgenet_readl(priv->base + off);
-       if (enable)
+       if (tx_lpi_enabled)
                reg |= TBUF_EEE_EN | TBUF_PM_EN;
        else
                reg &= ~(TBUF_EEE_EN | TBUF_PM_EN);
@@ -1313,6 +1314,7 @@ static void bcmgenet_eee_enable_set(struct net_device *dev, bool enable)
 
        priv->eee.eee_enabled = enable;
        priv->eee.eee_active = enable;
+       priv->eee.tx_lpi_enabled = tx_lpi_enabled;
 }
 
 static int bcmgenet_get_eee(struct net_device *dev, struct ethtool_eee *e)
@@ -1328,6 +1330,7 @@ static int bcmgenet_get_eee(struct net_device *dev, struct ethtool_eee *e)
 
        e->eee_enabled = p->eee_enabled;
        e->eee_active = p->eee_active;
+       e->tx_lpi_enabled = p->tx_lpi_enabled;
        e->tx_lpi_timer = bcmgenet_umac_readl(priv, UMAC_EEE_LPI_TIMER);
 
        return phy_ethtool_get_eee(dev->phydev, e);
@@ -1337,7 +1340,6 @@ static int bcmgenet_set_eee(struct net_device *dev, struct ethtool_eee *e)
 {
        struct bcmgenet_priv *priv = netdev_priv(dev);
        struct ethtool_eee *p = &priv->eee;
-       int ret = 0;
 
        if (GENET_IS_V1(priv))
                return -EOPNOTSUPP;
@@ -1348,16 +1350,11 @@ static int bcmgenet_set_eee(struct net_device *dev, struct ethtool_eee *e)
        p->eee_enabled = e->eee_enabled;
 
        if (!p->eee_enabled) {
-               bcmgenet_eee_enable_set(dev, false);
+               bcmgenet_eee_enable_set(dev, false, false);
        } else {
-               ret = phy_init_eee(dev->phydev, false);
-               if (ret) {
-                       netif_err(priv, hw, dev, "EEE initialization failed\n");
-                       return ret;
-               }
-
+               p->eee_active = phy_init_eee(dev->phydev, false) >= 0;
                bcmgenet_umac_writel(priv, e->tx_lpi_timer, UMAC_EEE_LPI_TIMER);
-               bcmgenet_eee_enable_set(dev, true);
+               bcmgenet_eee_enable_set(dev, p->eee_active, e->tx_lpi_enabled);
        }
 
        return phy_ethtool_set_eee(dev->phydev, e);
@@ -3450,7 +3447,7 @@ err_clk_disable:
        return ret;
 }
 
-static void bcmgenet_netif_stop(struct net_device *dev)
+static void bcmgenet_netif_stop(struct net_device *dev, bool stop_phy)
 {
        struct bcmgenet_priv *priv = netdev_priv(dev);
 
@@ -3465,6 +3462,8 @@ static void bcmgenet_netif_stop(struct net_device *dev)
        /* Disable MAC transmit. TX DMA disabled must be done before this */
        umac_enable_set(priv, CMD_TX_EN, false);
 
+       if (stop_phy)
+               phy_stop(dev->phydev);
        bcmgenet_disable_rx_napi(priv);
        bcmgenet_intr_disable(priv);
 
@@ -3485,7 +3484,7 @@ static int bcmgenet_close(struct net_device *dev)
 
        netif_dbg(priv, ifdown, dev, "bcmgenet_close\n");
 
-       bcmgenet_netif_stop(dev);
+       bcmgenet_netif_stop(dev, false);
 
        /* Really kill the PHY state machine and disconnect from it */
        phy_disconnect(dev->phydev);
@@ -4277,9 +4276,6 @@ static int bcmgenet_resume(struct device *d)
        if (!device_may_wakeup(d))
                phy_resume(dev->phydev);
 
-       if (priv->eee.eee_enabled)
-               bcmgenet_eee_enable_set(dev, true);
-
        bcmgenet_netif_start(dev);
 
        netif_device_attach(dev);
@@ -4303,7 +4299,7 @@ static int bcmgenet_suspend(struct device *d)
 
        netif_device_detach(dev);
 
-       bcmgenet_netif_stop(dev);
+       bcmgenet_netif_stop(dev, true);
 
        if (!device_may_wakeup(d))
                phy_suspend(dev->phydev);
index 946f6e2..1985c0e 100644 (file)
@@ -703,4 +703,7 @@ int bcmgenet_wol_power_down_cfg(struct bcmgenet_priv *priv,
 void bcmgenet_wol_power_up_cfg(struct bcmgenet_priv *priv,
                               enum bcmgenet_power_mode mode);
 
+void bcmgenet_eee_enable_set(struct net_device *dev, bool enable,
+                            bool tx_lpi_enabled);
+
 #endif /* __BCMGENET_H__ */
index be04290..c15ed0a 100644 (file)
@@ -87,6 +87,11 @@ static void bcmgenet_mac_config(struct net_device *dev)
                reg |= CMD_TX_EN | CMD_RX_EN;
        }
        bcmgenet_umac_writel(priv, reg, UMAC_CMD);
+
+       priv->eee.eee_active = phy_init_eee(phydev, 0) >= 0;
+       bcmgenet_eee_enable_set(dev,
+                               priv->eee.eee_enabled && priv->eee.eee_active,
+                               priv->eee.tx_lpi_enabled);
 }
 
 /* setup netdev link state when PHY link status change and
index 7eb2ddb..a317feb 100644 (file)
@@ -1126,8 +1126,7 @@ static int bgx_lmac_enable(struct bgx *bgx, u8 lmacid)
        }
 
 poll:
-       lmac->check_link = alloc_workqueue("check_link", WQ_UNBOUND |
-                                          WQ_MEM_RECLAIM, 1);
+       lmac->check_link = alloc_ordered_workqueue("check_link", WQ_MEM_RECLAIM);
        if (!lmac->check_link)
                return -ENOMEM;
        INIT_DELAYED_WORK(&lmac->dwork, bgx_poll_for_link);
index 06a0c00..276c32c 100644 (file)
@@ -72,6 +72,8 @@
 #include <linux/gfp.h>
 #include <linux/io.h>
 
+#include <net/Space.h>
+
 #include <asm/irq.h>
 #include <linux/atomic.h>
 #if ALLOW_DMA
index 7e408bc..0defd51 100644 (file)
@@ -1135,8 +1135,8 @@ static struct sk_buff *be_lancer_xmit_workarounds(struct be_adapter *adapter,
        eth_hdr_len = ntohs(skb->protocol) == ETH_P_8021Q ?
                                                VLAN_ETH_HLEN : ETH_HLEN;
        if (skb->len <= 60 &&
-           (lancer_chip(adapter) || skb_vlan_tag_present(skb)) &&
-           is_ipv4_pkt(skb)) {
+           (lancer_chip(adapter) || BE3_chip(adapter) ||
+            skb_vlan_tag_present(skb)) && is_ipv4_pkt(skb)) {
                ip = (struct iphdr *)ip_hdr(skb);
                pskb_trim(skb, eth_hdr_len + ntohs(ip->tot_len));
        }
index b1871e6..00e50bd 100644 (file)
@@ -54,6 +54,9 @@ static int phy_mode(enum dpmac_eth_if eth_if, phy_interface_t *if_mode)
        case DPMAC_ETH_IF_XFI:
                *if_mode = PHY_INTERFACE_MODE_10GBASER;
                break;
+       case DPMAC_ETH_IF_CAUI:
+               *if_mode = PHY_INTERFACE_MODE_25GBASER;
+               break;
        default:
                return -EINVAL;
        }
@@ -79,6 +82,8 @@ static enum dpmac_eth_if dpmac_eth_if_mode(phy_interface_t if_mode)
                return DPMAC_ETH_IF_XFI;
        case PHY_INTERFACE_MODE_1000BASEX:
                return DPMAC_ETH_IF_1000BASEX;
+       case PHY_INTERFACE_MODE_25GBASER:
+               return DPMAC_ETH_IF_CAUI;
        default:
                return DPMAC_ETH_IF_MII;
        }
@@ -418,7 +423,7 @@ int dpaa2_mac_connect(struct dpaa2_mac *mac)
 
        mac->phylink_config.mac_capabilities = MAC_SYM_PAUSE | MAC_ASYM_PAUSE |
                MAC_10FD | MAC_100FD | MAC_1000FD | MAC_2500FD | MAC_5000FD |
-               MAC_10000FD;
+               MAC_10000FD | MAC_25000FD;
 
        dpaa2_mac_set_supported_interfaces(mac);
 
index 3c4fa26..9e1b253 100644 (file)
@@ -1229,7 +1229,13 @@ static int enetc_clean_rx_ring(struct enetc_bdr *rx_ring,
                if (!skb)
                        break;
 
-               rx_byte_cnt += skb->len;
+               /* When the VLAN flag is set, the outer VLAN header is extracted
+                * and reported in the receive buffer descriptor, so rx_byte_cnt
+                * should also add the length of the extracted VLAN header.
+                */
+               if (bd_status & ENETC_RXBD_FLAG_VLAN)
+                       rx_byte_cnt += VLAN_HLEN;
+               rx_byte_cnt += skb->len + ETH_HLEN;
                rx_frm_cnt++;
 
                napi_gro_receive(napi, skb);
@@ -1565,6 +1571,14 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring,
                enetc_build_xdp_buff(rx_ring, bd_status, &rxbd, &i,
                                     &cleaned_cnt, &xdp_buff);
 
+               /* When the VLAN flag is set, the outer VLAN header is extracted
+                * and reported in the receive buffer descriptor, so rx_byte_cnt
+                * should also add the length of the extracted VLAN header.
+                */
+               if (bd_status & ENETC_RXBD_FLAG_VLAN)
+                       rx_byte_cnt += VLAN_HLEN;
+               rx_byte_cnt += xdp_get_buff_len(&xdp_buff);
+
                xdp_act = bpf_prog_run_xdp(prog, &xdp_buff);
 
                switch (xdp_act) {
index 83c27bb..126007a 100644 (file)
@@ -181,8 +181,8 @@ int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data)
        int bw_sum = 0;
        u8 bw;
 
-       prio_top = netdev_get_prio_tc_map(ndev, tc_nums - 1);
-       prio_next = netdev_get_prio_tc_map(ndev, tc_nums - 2);
+       prio_top = tc_nums - 1;
+       prio_next = tc_nums - 2;
 
        /* Support highest prio and second prio tc in cbs mode */
        if (tc != prio_top && tc != prio_next)
index 42ec6ca..38e5b5a 100644 (file)
@@ -3798,7 +3798,6 @@ static int fec_enet_txq_xmit_frame(struct fec_enet_private *fep,
        entries_free = fec_enet_get_free_txdesc_num(txq);
        if (entries_free < MAX_SKB_FRAGS + 1) {
                netdev_err(fep->netdev, "NOT enough BD for SG!\n");
-               xdp_return_frame(frame);
                return NETDEV_TX_BUSY;
        }
 
@@ -3835,6 +3834,11 @@ static int fec_enet_txq_xmit_frame(struct fec_enet_private *fep,
        index = fec_enet_get_bd_index(last_bdp, &txq->bd);
        txq->tx_skbuff[index] = NULL;
 
+       /* Make sure the updates to the rest of the descriptor are performed
+        * before transferring ownership.
+        */
+       dma_wmb();
+
        /* Send it on its way.  Tell FEC it's ready, interrupt when done,
         * it's the last BD of the frame, and to put the CRC on the end.
         */
@@ -3844,8 +3848,14 @@ static int fec_enet_txq_xmit_frame(struct fec_enet_private *fep,
        /* If this was the last BD in the ring, start at the beginning again. */
        bdp = fec_enet_get_nextdesc(last_bdp, &txq->bd);
 
+       /* Make sure the update to bdp is performed before txq->bd.cur. */
+       dma_wmb();
+
        txq->bd.cur = bdp;
 
+       /* Trigger transmission start */
+       writel(0, txq->bd.reg_desc_active);
+
        return 0;
 }
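The two dma_wmb() calls added above follow the usual DMA descriptor hand-off pattern: make every descriptor field visible to the device before the store that transfers ownership, and make the ring-state update visible before the doorbell write. A minimal sketch of that ordering, using illustrative names rather than the real FEC descriptor layout (assumes <linux/types.h> and the kernel barrier/byteorder helpers):

	/* Illustrative layout only, not the actual FEC buffer descriptor. */
	struct tx_desc {
		__le32 addr;
		__le16 len;
		__le16 status;		/* holds the "ready"/ownership bit */
	};

	static void publish_desc(struct tx_desc *d, u32 map, u16 len, u16 ready)
	{
		d->addr = cpu_to_le32(map);
		d->len = cpu_to_le16(len);
		dma_wmb();			/* payload fields visible first ...   */
		d->status = cpu_to_le16(ready);	/* ... then hand the descriptor over */
	}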
 
@@ -3874,12 +3884,6 @@ static int fec_enet_xdp_xmit(struct net_device *dev,
                sent_frames++;
        }
 
-       /* Make sure the update to bdp and tx_skbuff are performed. */
-       wmb();
-
-       /* Trigger transmission start */
-       writel(0, txq->bd.reg_desc_active);
-
        __netif_tx_unlock(nq);
 
        return sent_frames;
@@ -4478,9 +4482,11 @@ fec_drv_remove(struct platform_device *pdev)
        struct device_node *np = pdev->dev.of_node;
        int ret;
 
-       ret = pm_runtime_resume_and_get(&pdev->dev);
+       ret = pm_runtime_get_sync(&pdev->dev);
        if (ret < 0)
-               return ret;
+               dev_err(&pdev->dev,
+                       "Failed to resume device in remove callback (%pe)\n",
+                       ERR_PTR(ret));
 
        cancel_work_sync(&fep->tx_timeout_work);
        fec_ptp_stop(pdev);
@@ -4493,8 +4499,13 @@ fec_drv_remove(struct platform_device *pdev)
                of_phy_deregister_fixed_link(np);
        of_node_put(fep->phy_node);
 
-       clk_disable_unprepare(fep->clk_ahb);
-       clk_disable_unprepare(fep->clk_ipg);
+       /* If pm_runtime_get_sync() failed, the clks are still off, so skip
+        * disabling them again.
+        */
+       if (ret >= 0) {
+               clk_disable_unprepare(fep->clk_ahb);
+               clk_disable_unprepare(fep->clk_ipg);
+       }
        pm_runtime_put_noidle(&pdev->dev);
        pm_runtime_disable(&pdev->dev);
 
index 57ce743..caa00c7 100644 (file)
@@ -294,19 +294,6 @@ static int gve_napi_poll_dqo(struct napi_struct *napi, int budget)
        bool reschedule = false;
        int work_done = 0;
 
-       /* Clear PCI MSI-X Pending Bit Array (PBA)
-        *
-        * This bit is set if an interrupt event occurs while the vector is
-        * masked. If this bit is set and we reenable the interrupt, it will
-        * fire again. Since we're just about to poll the queue state, we don't
-        * need it to fire again.
-        *
-        * Under high softirq load, it's possible that the interrupt condition
-        * is triggered twice before we got the chance to process it.
-        */
-       gve_write_irq_doorbell_dqo(priv, block,
-                                  GVE_ITR_NO_UPDATE_DQO | GVE_ITR_CLEAR_PBA_BIT_DQO);
-
        if (block->tx)
                reschedule |= gve_tx_poll_dqo(block, /*do_clean=*/true);
 
index cbbab5b..b85c412 100644 (file)
@@ -331,9 +331,25 @@ static int hclge_comm_cmd_csq_done(struct hclge_comm_hw *hw)
        return head == hw->cmq.csq.next_to_use;
 }
 
-static void hclge_comm_wait_for_resp(struct hclge_comm_hw *hw,
+static u32 hclge_get_cmdq_tx_timeout(u16 opcode, u32 tx_timeout)
+{
+       static const struct hclge_cmdq_tx_timeout_map cmdq_tx_timeout_map[] = {
+               {HCLGE_OPC_CFG_RST_TRIGGER, HCLGE_COMM_CMDQ_TX_TIMEOUT_500MS},
+       };
+       u32 i;
+
+       for (i = 0; i < ARRAY_SIZE(cmdq_tx_timeout_map); i++)
+               if (cmdq_tx_timeout_map[i].opcode == opcode)
+                       return cmdq_tx_timeout_map[i].tx_timeout;
+
+       return tx_timeout;
+}
+
+static void hclge_comm_wait_for_resp(struct hclge_comm_hw *hw, u16 opcode,
                                     bool *is_completed)
 {
+       u32 cmdq_tx_timeout = hclge_get_cmdq_tx_timeout(opcode,
+                                                       hw->cmq.tx_timeout);
        u32 timeout = 0;
 
        do {
@@ -343,7 +359,7 @@ static void hclge_comm_wait_for_resp(struct hclge_comm_hw *hw,
                }
                udelay(1);
                timeout++;
-       } while (timeout < hw->cmq.tx_timeout);
+       } while (timeout < cmdq_tx_timeout);
 }
 
 static int hclge_comm_cmd_convert_err_code(u16 desc_ret)
@@ -407,7 +423,8 @@ static int hclge_comm_cmd_check_result(struct hclge_comm_hw *hw,
         * if multiple descriptors are to be sent, use the first one to check
         */
        if (HCLGE_COMM_SEND_SYNC(le16_to_cpu(desc->flag)))
-               hclge_comm_wait_for_resp(hw, &is_completed);
+               hclge_comm_wait_for_resp(hw, le16_to_cpu(desc->opcode),
+                                        &is_completed);
 
        if (!is_completed)
                ret = -EBADE;
@@ -529,7 +546,7 @@ int hclge_comm_cmd_queue_init(struct pci_dev *pdev, struct hclge_comm_hw *hw)
        cmdq->crq.desc_num = HCLGE_COMM_NIC_CMQ_DESC_NUM;
 
        /* Setup Tx write back timeout */
-       cmdq->tx_timeout = HCLGE_COMM_CMDQ_TX_TIMEOUT;
+       cmdq->tx_timeout = HCLGE_COMM_CMDQ_TX_TIMEOUT_DEFAULT;
 
        /* Setup queue rings */
        ret = hclge_comm_alloc_cmd_queue(hw, HCLGE_COMM_TYPE_CSQ);
index de72ecb..18f1b4b 100644 (file)
@@ -54,7 +54,8 @@
 #define HCLGE_COMM_NIC_SW_RST_RDY              BIT(HCLGE_COMM_NIC_SW_RST_RDY_B)
 #define HCLGE_COMM_NIC_CMQ_DESC_NUM_S          3
 #define HCLGE_COMM_NIC_CMQ_DESC_NUM            1024
-#define HCLGE_COMM_CMDQ_TX_TIMEOUT             30000
+#define HCLGE_COMM_CMDQ_TX_TIMEOUT_DEFAULT     30000
+#define HCLGE_COMM_CMDQ_TX_TIMEOUT_500MS       500000
 
 enum hclge_opcode_type {
        /* Generic commands */
@@ -360,6 +361,11 @@ struct hclge_comm_caps_bit_map {
        u16 local_bit;
 };
 
+struct hclge_cmdq_tx_timeout_map {
+       u32 opcode;
+       u32 tx_timeout;
+};
+
 struct hclge_comm_firmware_compat_cmd {
        __le32 compat;
        u8 rsv[20];
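Since hclge_comm_wait_for_resp() busy-waits with udelay(1) and counts one tick per iteration (see the hunk above), these timeout values are effectively microseconds: the default of 30000 is roughly 30 ms, while the 500000 value mapped to HCLGE_OPC_CFG_RST_TRIGGER gives the reset-trigger command roughly 500 ms, matching the _500MS suffix.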
index 4c3e90a..d385ffc 100644 (file)
@@ -130,7 +130,7 @@ static struct hns3_dbg_cmd_info hns3_dbg_cmd[] = {
                .name = "tx_bd_queue",
                .cmd = HNAE3_DBG_CMD_TX_BD,
                .dentry = HNS3_DBG_DENTRY_TX_BD,
-               .buf_len = HNS3_DBG_READ_LEN_4MB,
+               .buf_len = HNS3_DBG_READ_LEN_5MB,
                .init = hns3_dbg_bd_file_init,
        },
        {
index 97578ea..4a5ef8a 100644 (file)
@@ -10,6 +10,7 @@
 #define HNS3_DBG_READ_LEN_128KB        0x20000
 #define HNS3_DBG_READ_LEN_1MB  0x100000
 #define HNS3_DBG_READ_LEN_4MB  0x400000
+#define HNS3_DBG_READ_LEN_5MB  0x500000
 #define HNS3_DBG_WRITE_LEN     1024
 
 #define HNS3_DBG_DATA_STR_LEN  32
index 4fb5406..2689b10 100644 (file)
@@ -8053,12 +8053,15 @@ static void hclge_ae_stop(struct hnae3_handle *handle)
        /* If it is not PF reset or FLR, the firmware will disable the MAC,
         * so it only needs to stop the PHY here.
         */
-       if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state) &&
-           hdev->reset_type != HNAE3_FUNC_RESET &&
-           hdev->reset_type != HNAE3_FLR_RESET) {
-               hclge_mac_stop_phy(hdev);
-               hclge_update_link_status(hdev);
-               return;
+       if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state)) {
+               hclge_pfc_pause_en_cfg(hdev, HCLGE_PFC_TX_RX_DISABLE,
+                                      HCLGE_PFC_DISABLE);
+               if (hdev->reset_type != HNAE3_FUNC_RESET &&
+                   hdev->reset_type != HNAE3_FLR_RESET) {
+                       hclge_mac_stop_phy(hdev);
+                       hclge_update_link_status(hdev);
+                       return;
+               }
        }
 
        hclge_reset_tqp(handle);
index 4a33f65..922c0da 100644 (file)
@@ -171,8 +171,8 @@ int hclge_mac_pause_en_cfg(struct hclge_dev *hdev, bool tx, bool rx)
        return hclge_cmd_send(&hdev->hw, &desc, 1);
 }
 
-static int hclge_pfc_pause_en_cfg(struct hclge_dev *hdev, u8 tx_rx_bitmap,
-                                 u8 pfc_bitmap)
+int hclge_pfc_pause_en_cfg(struct hclge_dev *hdev, u8 tx_rx_bitmap,
+                          u8 pfc_bitmap)
 {
        struct hclge_desc desc;
        struct hclge_pfc_en_cmd *pfc = (struct hclge_pfc_en_cmd *)desc.data;
index 68f28a9..dd6f1fd 100644 (file)
@@ -164,6 +164,9 @@ struct hclge_bp_to_qs_map_cmd {
        u32 rsvd1;
 };
 
+#define HCLGE_PFC_DISABLE      0
+#define HCLGE_PFC_TX_RX_DISABLE        0
+
 struct hclge_pfc_en_cmd {
        u8 tx_rx_en_bitmap;
        u8 pri_en_bitmap;
@@ -235,6 +238,8 @@ void hclge_tm_schd_info_update(struct hclge_dev *hdev, u8 num_tc);
 void hclge_tm_pfc_info_update(struct hclge_dev *hdev);
 int hclge_tm_dwrr_cfg(struct hclge_dev *hdev);
 int hclge_tm_init_hw(struct hclge_dev *hdev, bool init);
+int hclge_pfc_pause_en_cfg(struct hclge_dev *hdev, u8 tx_rx_bitmap,
+                          u8 pfc_bitmap);
 int hclge_mac_pause_en_cfg(struct hclge_dev *hdev, bool tx, bool rx);
 int hclge_pause_addr_cfg(struct hclge_dev *hdev, const u8 *mac_addr);
 void hclge_pfc_rx_stats_get(struct hclge_dev *hdev, u64 *stats);
index f240462..dd08989 100644 (file)
@@ -1436,7 +1436,10 @@ static int hclgevf_reset_wait(struct hclgevf_dev *hdev)
         * might happen in case reset assertion was made by PF. Yes, this also
         * means we might end up waiting a bit more even for a VF reset.
         */
-       msleep(5000);
+       if (hdev->reset_type == HNAE3_VF_FULL_RESET)
+               msleep(5000);
+       else
+               msleep(500);
 
        return 0;
 }
index 9abaff1..39d0fe7 100644 (file)
@@ -525,7 +525,7 @@ void iavf_set_ethtool_ops(struct net_device *netdev);
 void iavf_update_stats(struct iavf_adapter *adapter);
 void iavf_reset_interrupt_capability(struct iavf_adapter *adapter);
 int iavf_init_interrupt_scheme(struct iavf_adapter *adapter);
-void iavf_irq_enable_queues(struct iavf_adapter *adapter, u32 mask);
+void iavf_irq_enable_queues(struct iavf_adapter *adapter);
 void iavf_free_all_tx_resources(struct iavf_adapter *adapter);
 void iavf_free_all_rx_resources(struct iavf_adapter *adapter);
 
index 2de4baf..4a66873 100644 (file)
@@ -359,21 +359,18 @@ static void iavf_irq_disable(struct iavf_adapter *adapter)
 }
 
 /**
- * iavf_irq_enable_queues - Enable interrupt for specified queues
+ * iavf_irq_enable_queues - Enable interrupt for all queues
  * @adapter: board private structure
- * @mask: bitmap of queues to enable
  **/
-void iavf_irq_enable_queues(struct iavf_adapter *adapter, u32 mask)
+void iavf_irq_enable_queues(struct iavf_adapter *adapter)
 {
        struct iavf_hw *hw = &adapter->hw;
        int i;
 
        for (i = 1; i < adapter->num_msix_vectors; i++) {
-               if (mask & BIT(i - 1)) {
-                       wr32(hw, IAVF_VFINT_DYN_CTLN1(i - 1),
-                            IAVF_VFINT_DYN_CTLN1_INTENA_MASK |
-                            IAVF_VFINT_DYN_CTLN1_ITR_INDX_MASK);
-               }
+               wr32(hw, IAVF_VFINT_DYN_CTLN1(i - 1),
+                    IAVF_VFINT_DYN_CTLN1_INTENA_MASK |
+                    IAVF_VFINT_DYN_CTLN1_ITR_INDX_MASK);
        }
 }
 
@@ -387,7 +384,7 @@ void iavf_irq_enable(struct iavf_adapter *adapter, bool flush)
        struct iavf_hw *hw = &adapter->hw;
 
        iavf_misc_irq_enable(adapter);
-       iavf_irq_enable_queues(adapter, ~0);
+       iavf_irq_enable_queues(adapter);
 
        if (flush)
                iavf_flush(hw);
index bf79333..a19e888 100644 (file)
@@ -40,7 +40,7 @@
 #define IAVF_VFINT_DYN_CTL01_INTENA_MASK IAVF_MASK(0x1, IAVF_VFINT_DYN_CTL01_INTENA_SHIFT)
 #define IAVF_VFINT_DYN_CTL01_ITR_INDX_SHIFT 3
 #define IAVF_VFINT_DYN_CTL01_ITR_INDX_MASK IAVF_MASK(0x3, IAVF_VFINT_DYN_CTL01_ITR_INDX_SHIFT)
-#define IAVF_VFINT_DYN_CTLN1(_INTVF) (0x00003800 + ((_INTVF) * 4)) /* _i=0...15 */ /* Reset: VFR */
+#define IAVF_VFINT_DYN_CTLN1(_INTVF) (0x00003800 + ((_INTVF) * 4)) /* _i=0...63 */ /* Reset: VFR */
 #define IAVF_VFINT_DYN_CTLN1_INTENA_SHIFT 0
 #define IAVF_VFINT_DYN_CTLN1_INTENA_MASK IAVF_MASK(0x1, IAVF_VFINT_DYN_CTLN1_INTENA_SHIFT)
 #define IAVF_VFINT_DYN_CTLN1_SWINT_TRIG_SHIFT 2
index 9afbbda..7c0578b 100644 (file)
@@ -2238,11 +2238,6 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
                iavf_process_config(adapter);
                adapter->flags |= IAVF_FLAG_SETUP_NETDEV_FEATURES;
 
-               /* Request VLAN offload settings */
-               if (VLAN_V2_ALLOWED(adapter))
-                       iavf_set_vlan_offload_features(adapter, 0,
-                                                      netdev->features);
-
                iavf_set_queue_vlan_tag_loc(adapter);
 
                was_mac_changed = !ether_addr_equal(netdev->dev_addr,
index 0157f6e..eb2dc09 100644 (file)
@@ -5160,7 +5160,7 @@ ice_aq_read_i2c(struct ice_hw *hw, struct ice_aqc_link_topo_addr topo_addr,
  */
 int
 ice_aq_write_i2c(struct ice_hw *hw, struct ice_aqc_link_topo_addr topo_addr,
-                u16 bus_addr, __le16 addr, u8 params, u8 *data,
+                u16 bus_addr, __le16 addr, u8 params, const u8 *data,
                 struct ice_sq_cd *cd)
 {
        struct ice_aq_desc desc = { 0 };
index 8ba5f93..81961a7 100644 (file)
@@ -229,7 +229,7 @@ ice_aq_read_i2c(struct ice_hw *hw, struct ice_aqc_link_topo_addr topo_addr,
                struct ice_sq_cd *cd);
 int
 ice_aq_write_i2c(struct ice_hw *hw, struct ice_aqc_link_topo_addr topo_addr,
-                u16 bus_addr, __le16 addr, u8 params, u8 *data,
+                u16 bus_addr, __le16 addr, u8 params, const u8 *data,
                 struct ice_sq_cd *cd);
 bool ice_fw_supports_report_dflt_cfg(struct ice_hw *hw);
 #endif /* _ICE_COMMON_H_ */
index c6d4926..850db8e 100644 (file)
@@ -932,10 +932,9 @@ ice_tx_prepare_vlan_flags_dcb(struct ice_tx_ring *tx_ring,
        if ((first->tx_flags & ICE_TX_FLAGS_HW_VLAN ||
             first->tx_flags & ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN) ||
            skb->priority != TC_PRIO_CONTROL) {
-               first->tx_flags &= ~ICE_TX_FLAGS_VLAN_PR_M;
+               first->vid &= ~VLAN_PRIO_MASK;
                /* Mask the lower 3 bits to set the 802.1p priority */
-               first->tx_flags |= (skb->priority & 0x7) <<
-                                  ICE_TX_FLAGS_VLAN_PR_S;
+               first->vid |= (skb->priority << VLAN_PRIO_SHIFT) & VLAN_PRIO_MASK;
                /* if this is not already set it means a VLAN 0 + priority needs
                 * to be offloaded
                 */
index 2ea8a2b..75c9de6 100644 (file)
@@ -16,8 +16,8 @@
  * * number of bytes written - success
  * * negative - error code
  */
-static unsigned int
-ice_gnss_do_write(struct ice_pf *pf, unsigned char *buf, unsigned int size)
+static int
+ice_gnss_do_write(struct ice_pf *pf, const unsigned char *buf, unsigned int size)
 {
        struct ice_aqc_link_topo_addr link_topo;
        struct ice_hw *hw = &pf->hw;
@@ -72,39 +72,7 @@ err_out:
        dev_err(ice_pf_to_dev(pf), "GNSS failed to write, offset=%u, size=%u, err=%d\n",
                offset, size, err);
 
-       return offset;
-}
-
-/**
- * ice_gnss_write_pending - Write all pending data to internal GNSS
- * @work: GNSS write work structure
- */
-static void ice_gnss_write_pending(struct kthread_work *work)
-{
-       struct gnss_serial *gnss = container_of(work, struct gnss_serial,
-                                               write_work);
-       struct ice_pf *pf = gnss->back;
-
-       if (!pf)
-               return;
-
-       if (!test_bit(ICE_FLAG_GNSS, pf->flags))
-               return;
-
-       if (!list_empty(&gnss->queue)) {
-               struct gnss_write_buf *write_buf = NULL;
-               unsigned int bytes;
-
-               write_buf = list_first_entry(&gnss->queue,
-                                            struct gnss_write_buf, queue);
-
-               bytes = ice_gnss_do_write(pf, write_buf->buf, write_buf->size);
-               dev_dbg(ice_pf_to_dev(pf), "%u bytes written to GNSS\n", bytes);
-
-               list_del(&write_buf->queue);
-               kfree(write_buf->buf);
-               kfree(write_buf);
-       }
+       return err;
 }
 
 /**
@@ -128,12 +96,7 @@ static void ice_gnss_read(struct kthread_work *work)
        int err = 0;
 
        pf = gnss->back;
-       if (!pf) {
-               err = -EFAULT;
-               goto exit;
-       }
-
-       if (!test_bit(ICE_FLAG_GNSS, pf->flags))
+       if (!pf || !test_bit(ICE_FLAG_GNSS, pf->flags))
                return;
 
        hw = &pf->hw;
@@ -191,7 +154,6 @@ free_buf:
        free_page((unsigned long)buf);
 requeue:
        kthread_queue_delayed_work(gnss->kworker, &gnss->read_work, delay);
-exit:
        if (err)
                dev_dbg(ice_pf_to_dev(pf), "GNSS failed to read err=%d\n", err);
 }
@@ -220,8 +182,6 @@ static struct gnss_serial *ice_gnss_struct_init(struct ice_pf *pf)
        pf->gnss_serial = gnss;
 
        kthread_init_delayed_work(&gnss->read_work, ice_gnss_read);
-       INIT_LIST_HEAD(&gnss->queue);
-       kthread_init_work(&gnss->write_work, ice_gnss_write_pending);
        kworker = kthread_create_worker(0, "ice-gnss-%s", dev_name(dev));
        if (IS_ERR(kworker)) {
                kfree(gnss);
@@ -281,7 +241,6 @@ static void ice_gnss_close(struct gnss_device *gdev)
        if (!gnss)
                return;
 
-       kthread_cancel_work_sync(&gnss->write_work);
        kthread_cancel_delayed_work_sync(&gnss->read_work);
 }
 
@@ -300,10 +259,7 @@ ice_gnss_write(struct gnss_device *gdev, const unsigned char *buf,
               size_t count)
 {
        struct ice_pf *pf = gnss_get_drvdata(gdev);
-       struct gnss_write_buf *write_buf;
        struct gnss_serial *gnss;
-       unsigned char *cmd_buf;
-       int err = count;
 
        /* We cannot write a single byte using our I2C implementation. */
        if (count <= 1 || count > ICE_GNSS_TTY_WRITE_BUF)
@@ -319,24 +275,7 @@ ice_gnss_write(struct gnss_device *gdev, const unsigned char *buf,
        if (!gnss)
                return -ENODEV;
 
-       cmd_buf = kcalloc(count, sizeof(*buf), GFP_KERNEL);
-       if (!cmd_buf)
-               return -ENOMEM;
-
-       memcpy(cmd_buf, buf, count);
-       write_buf = kzalloc(sizeof(*write_buf), GFP_KERNEL);
-       if (!write_buf) {
-               kfree(cmd_buf);
-               return -ENOMEM;
-       }
-
-       write_buf->buf = cmd_buf;
-       write_buf->size = count;
-       INIT_LIST_HEAD(&write_buf->queue);
-       list_add_tail(&write_buf->queue, &gnss->queue);
-       kthread_queue_work(gnss->kworker, &gnss->write_work);
-
-       return err;
+       return ice_gnss_do_write(pf, buf, count);
 }
 
 static const struct gnss_operations ice_gnss_ops = {
@@ -432,7 +371,6 @@ void ice_gnss_exit(struct ice_pf *pf)
        if (pf->gnss_serial) {
                struct gnss_serial *gnss = pf->gnss_serial;
 
-               kthread_cancel_work_sync(&gnss->write_work);
                kthread_cancel_delayed_work_sync(&gnss->read_work);
                kthread_destroy_worker(gnss->kworker);
                gnss->kworker = NULL;
index b8bb8b6..75e567a 100644 (file)
  */
 #define ICE_GNSS_UBX_WRITE_BYTES       (ICE_MAX_I2C_WRITE_BYTES + 1)
 
-struct gnss_write_buf {
-       struct list_head queue;
-       unsigned int size;
-       unsigned char *buf;
-};
-
 /**
  * struct gnss_serial - data used to initialize GNSS TTY port
  * @back: back pointer to PF
  * @kworker: kwork thread for handling periodic work
  * @read_work: read_work function for handling GNSS reads
- * @write_work: write_work function for handling GNSS writes
- * @queue: write buffers queue
  */
 struct gnss_serial {
        struct ice_pf *back;
        struct kthread_worker *kworker;
        struct kthread_delayed_work read_work;
-       struct kthread_work write_work;
-       struct list_head queue;
 };
 
 #if IS_ENABLED(CONFIG_GNSS)
index 450317d..11ae0e4 100644 (file)
@@ -2745,6 +2745,8 @@ ice_vsi_cfg_def(struct ice_vsi *vsi, struct ice_vsi_cfg_params *params)
                        goto unroll_vector_base;
 
                ice_vsi_map_rings_to_vectors(vsi);
+               vsi->stat_offsets_loaded = false;
+
                if (ice_is_xdp_ena_vsi(vsi)) {
                        ret = ice_vsi_determine_xdp_res(vsi);
                        if (ret)
@@ -2793,6 +2795,9 @@ ice_vsi_cfg_def(struct ice_vsi *vsi, struct ice_vsi_cfg_params *params)
                ret = ice_vsi_alloc_ring_stats(vsi);
                if (ret)
                        goto unroll_vector_base;
+
+               vsi->stat_offsets_loaded = false;
+
                /* Do not exit if configuring RSS had an issue, at least
                 * receive traffic on first queue. Hence no need to capture
                 * return value
index a1f7c8e..42c318c 100644 (file)
@@ -4802,9 +4802,13 @@ err_init_pf:
 static void ice_deinit_dev(struct ice_pf *pf)
 {
        ice_free_irq_msix_misc(pf);
-       ice_clear_interrupt_scheme(pf);
        ice_deinit_pf(pf);
        ice_deinit_hw(&pf->hw);
+
+       /* Service task is already stopped, so call reset directly. */
+       ice_reset(&pf->hw, ICE_RESET_PFR);
+       pci_wait_for_pending_transaction(pf->pdev);
+       ice_clear_interrupt_scheme(pf);
 }
 
 static void ice_init_features(struct ice_pf *pf)
@@ -5094,10 +5098,6 @@ int ice_load(struct ice_pf *pf)
        struct ice_vsi *vsi;
        int err;
 
-       err = ice_reset(&pf->hw, ICE_RESET_PFR);
-       if (err)
-               return err;
-
        err = ice_init_dev(pf);
        if (err)
                return err;
@@ -5354,12 +5354,6 @@ static void ice_remove(struct pci_dev *pdev)
        ice_setup_mc_magic_wake(pf);
        ice_set_wake(pf);
 
-       /* Issue a PFR as part of the prescribed driver unload flow.  Do not
-        * do it via ice_schedule_reset() since there is no need to rebuild
-        * and the service task is already stopped.
-        */
-       ice_reset(&pf->hw, ICE_RESET_PFR);
-       pci_wait_for_pending_transaction(pdev);
        pci_disable_device(pdev);
 }
 
@@ -7056,6 +7050,10 @@ int ice_down(struct ice_vsi *vsi)
        ice_for_each_txq(vsi, i)
                ice_clean_tx_ring(vsi->tx_rings[i]);
 
+       if (ice_is_xdp_ena_vsi(vsi))
+               ice_for_each_xdp_txq(vsi, i)
+                       ice_clean_tx_ring(vsi->xdp_rings[i]);
+
        ice_for_each_rxq(vsi, i)
                ice_clean_rx_ring(vsi->rx_rings[i]);
 
index f1dca59..588ad86 100644 (file)
@@ -1171,7 +1171,7 @@ int ice_set_vf_spoofchk(struct net_device *netdev, int vf_id, bool ena)
        if (!vf)
                return -EINVAL;
 
-       ret = ice_check_vf_ready_for_cfg(vf);
+       ret = ice_check_vf_ready_for_reset(vf);
        if (ret)
                goto out_put_vf;
 
@@ -1286,7 +1286,7 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
                goto out_put_vf;
        }
 
-       ret = ice_check_vf_ready_for_cfg(vf);
+       ret = ice_check_vf_ready_for_reset(vf);
        if (ret)
                goto out_put_vf;
 
@@ -1340,7 +1340,7 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
                return -EOPNOTSUPP;
        }
 
-       ret = ice_check_vf_ready_for_cfg(vf);
+       ret = ice_check_vf_ready_for_reset(vf);
        if (ret)
                goto out_put_vf;
 
@@ -1653,7 +1653,7 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
        if (!vf)
                return -EINVAL;
 
-       ret = ice_check_vf_ready_for_cfg(vf);
+       ret = ice_check_vf_ready_for_reset(vf);
        if (ret)
                goto out_put_vf;
 
index 4fcf2d0..52d0a12 100644 (file)
@@ -1152,11 +1152,11 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
        unsigned int total_rx_bytes = 0, total_rx_pkts = 0;
        unsigned int offset = rx_ring->rx_offset;
        struct xdp_buff *xdp = &rx_ring->xdp;
+       u32 cached_ntc = rx_ring->first_desc;
        struct ice_tx_ring *xdp_ring = NULL;
        struct bpf_prog *xdp_prog = NULL;
        u32 ntc = rx_ring->next_to_clean;
        u32 cnt = rx_ring->count;
-       u32 cached_ntc = ntc;
        u32 xdp_xmit = 0;
        u32 cached_ntu;
        bool failure;
@@ -1664,8 +1664,7 @@ ice_tx_map(struct ice_tx_ring *tx_ring, struct ice_tx_buf *first,
 
        if (first->tx_flags & ICE_TX_FLAGS_HW_VLAN) {
                td_cmd |= (u64)ICE_TX_DESC_CMD_IL2TAG1;
-               td_tag = (first->tx_flags & ICE_TX_FLAGS_VLAN_M) >>
-                         ICE_TX_FLAGS_VLAN_S;
+               td_tag = first->vid;
        }
 
        dma = dma_map_single(tx_ring->dev, skb->data, size, DMA_TO_DEVICE);
@@ -1998,7 +1997,7 @@ ice_tx_prepare_vlan_flags(struct ice_tx_ring *tx_ring, struct ice_tx_buf *first)
         * VLAN offloads exclusively so we only care about the VLAN ID here
         */
        if (skb_vlan_tag_present(skb)) {
-               first->tx_flags |= skb_vlan_tag_get(skb) << ICE_TX_FLAGS_VLAN_S;
+               first->vid = skb_vlan_tag_get(skb);
                if (tx_ring->flags & ICE_TX_FLAGS_RING_VLAN_L2TAG2)
                        first->tx_flags |= ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN;
                else
@@ -2388,8 +2387,7 @@ ice_xmit_frame_ring(struct sk_buff *skb, struct ice_tx_ring *tx_ring)
                offload.cd_qw1 |= (u64)(ICE_TX_DESC_DTYPE_CTX |
                                        (ICE_TX_CTX_DESC_IL2TAG2 <<
                                        ICE_TXD_CTX_QW1_CMD_S));
-               offload.cd_l2tag2 = (first->tx_flags & ICE_TX_FLAGS_VLAN_M) >>
-                       ICE_TX_FLAGS_VLAN_S;
+               offload.cd_l2tag2 = first->vid;
        }
 
        /* set up TSO offload */
index fff0efe..166413f 100644 (file)
@@ -127,10 +127,6 @@ static inline int ice_skb_pad(void)
 #define ICE_TX_FLAGS_IPV6      BIT(6)
 #define ICE_TX_FLAGS_TUNNEL    BIT(7)
 #define ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN      BIT(8)
-#define ICE_TX_FLAGS_VLAN_M    0xffff0000
-#define ICE_TX_FLAGS_VLAN_PR_M 0xe0000000
-#define ICE_TX_FLAGS_VLAN_PR_S 29
-#define ICE_TX_FLAGS_VLAN_S    16
 
 #define ICE_XDP_PASS           0
 #define ICE_XDP_CONSUMED       BIT(0)
@@ -182,8 +178,9 @@ struct ice_tx_buf {
                unsigned int gso_segs;
                unsigned int nr_frags;  /* used for mbuf XDP */
        };
-       u32 type:16;                    /* &ice_tx_buf_type */
-       u32 tx_flags:16;
+       u32 tx_flags:12;
+       u32 type:4;                     /* &ice_tx_buf_type */
+       u32 vid:16;
        DEFINE_DMA_UNMAP_LEN(len);
        DEFINE_DMA_UNMAP_ADDR(dma);
 };
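
Taken together, the ice_txrx.c and ice_txrx.h hunks above stop multiplexing the
802.1Q tag into the upper 16 bits of tx_flags and carry it in the new 16-bit
vid field instead, which is why the VLAN shift/mask macros can be dropped.
Condensed before/after, using only names that appear in the hunks:

    /* before: tag packed into the upper half of tx_flags */
    first->tx_flags |= skb_vlan_tag_get(skb) << ICE_TX_FLAGS_VLAN_S;
    td_tag = (first->tx_flags & ICE_TX_FLAGS_VLAN_M) >> ICE_TX_FLAGS_VLAN_S;

    /* after: tag carried in its own field, tx_flags keeps only flag bits */
    first->vid = skb_vlan_tag_get(skb);
    td_tag = first->vid;
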
index 89fd698..bf74a2f 100644 (file)
@@ -186,6 +186,25 @@ int ice_check_vf_ready_for_cfg(struct ice_vf *vf)
 }
 
 /**
+ * ice_check_vf_ready_for_reset - check if VF is ready to be reset
+ * @vf: VF to check if it's ready to be reset
+ *
+ * The purpose of this function is to ensure that the VF is not in reset,
+ * is not disabled, and is both initialized and active, so that another
+ * reset can safely be initiated.
+ */
+int ice_check_vf_ready_for_reset(struct ice_vf *vf)
+{
+       int ret;
+
+       ret = ice_check_vf_ready_for_cfg(vf);
+       if (!ret && !test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states))
+               ret = -EAGAIN;
+
+       return ret;
+}
+
+/**
  * ice_trigger_vf_reset - Reset a VF on HW
  * @vf: pointer to the VF structure
  * @is_vflr: true if VFLR was issued, false if not
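
For reference, the ice_sriov.c callers converted at the top of this series all
wrap the new helper in the same pattern; a condensed sketch of those hunks,
where ice_get_vf_by_id()/ice_put_vf() are the driver's existing VF
lookup/refcount helpers (not additional driver code):

    vf = ice_get_vf_by_id(pf, vf_id);
    if (!vf)
            return -EINVAL;

    ret = ice_check_vf_ready_for_reset(vf);  /* -EAGAIN until the VF is ACTIVE */
    if (ret)
            goto out_put_vf;

    /* ... apply the requested MAC/VLAN/trust change, then reset the VF ... */

    out_put_vf:
            ice_put_vf(vf);
            return ret;
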
index e3cda6f..a38ef00 100644 (file)
@@ -215,6 +215,7 @@ u16 ice_get_num_vfs(struct ice_pf *pf);
 struct ice_vsi *ice_get_vf_vsi(struct ice_vf *vf);
 bool ice_is_vf_disabled(struct ice_vf *vf);
 int ice_check_vf_ready_for_cfg(struct ice_vf *vf);
+int ice_check_vf_ready_for_reset(struct ice_vf *vf);
 void ice_set_vf_state_dis(struct ice_vf *vf);
 bool ice_is_any_vf_in_unicast_promisc(struct ice_pf *pf);
 void
index 97243c6..f4a524f 100644 (file)
@@ -3955,6 +3955,7 @@ error_handler:
                ice_vc_notify_vf_link_state(vf);
                break;
        case VIRTCHNL_OP_RESET_VF:
+               clear_bit(ICE_VF_STATE_ACTIVE, vf->vf_states);
                ops->reset_vf(vf);
                break;
        case VIRTCHNL_OP_ADD_ETH_ADDR:
index 205d577..caf91c6 100644 (file)
@@ -426,7 +426,7 @@ void igb_mta_set(struct e1000_hw *hw, u32 hash_value)
 static u32 igb_hash_mc_addr(struct e1000_hw *hw, u8 *mc_addr)
 {
        u32 hash_value, hash_mask;
-       u8 bit_shift = 0;
+       u8 bit_shift = 1;
 
        /* Register count multiplied by bits per register */
        hash_mask = (hw->mac.mta_reg_count * 32) - 1;
@@ -434,7 +434,7 @@ static u32 igb_hash_mc_addr(struct e1000_hw *hw, u8 *mc_addr)
        /* For a mc_filter_type of 0, bit_shift is the number of left-shifts
         * where 0xFF would still fall within the hash mask.
         */
-       while (hash_mask >> bit_shift != 0xFF)
+       while (hash_mask >> bit_shift != 0xFF && bit_shift < 4)
                bit_shift++;
 
        /* The portion of the address that is used for the hash table
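
The igb change bounds the shift search: bit_shift now starts at 1 and is capped
at 4, so the loop terminates even for MTA register counts where the mask never
collapses to 0xFF. A self-contained worked example (illustrative only, not part
of the patch):

    static u8 example_bit_shift(u32 mta_reg_count)
    {
            u32 hash_mask = mta_reg_count * 32 - 1;  /* 128 regs -> 0xFFF */
            u8 bit_shift = 1;

            while (hash_mask >> bit_shift != 0xFF && bit_shift < 4)
                    bit_shift++;

            return bit_shift;  /* 128 regs: 0xFFF >> 4 == 0xFF, returns 4 */
    }
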
index 7d60da1..319ed60 100644 (file)
@@ -822,6 +822,8 @@ static int igb_set_eeprom(struct net_device *netdev,
                 */
                ret_val = hw->nvm.ops.read(hw, last_word, 1,
                                   &eeprom_buff[last_word - first_word]);
+               if (ret_val)
+                       goto out;
        }
 
        /* Device's eeprom is always little-endian, word addressable */
@@ -841,6 +843,7 @@ static int igb_set_eeprom(struct net_device *netdev,
                hw->nvm.ops.update(hw);
 
        igb_set_fw_version(adapter);
+out:
        kfree(eeprom_buff);
        return ret_val;
 }
index 58872a4..bb3db38 100644 (file)
@@ -6947,6 +6947,7 @@ static void igb_extts(struct igb_adapter *adapter, int tsintr_tt)
        struct e1000_hw *hw = &adapter->hw;
        struct ptp_clock_event event;
        struct timespec64 ts;
+       unsigned long flags;
 
        if (pin < 0 || pin >= IGB_N_SDP)
                return;
@@ -6954,9 +6955,12 @@ static void igb_extts(struct igb_adapter *adapter, int tsintr_tt)
        if (hw->mac.type == e1000_82580 ||
            hw->mac.type == e1000_i354 ||
            hw->mac.type == e1000_i350) {
-               s64 ns = rd32(auxstmpl);
+               u64 ns = rd32(auxstmpl);
 
-               ns += ((s64)(rd32(auxstmph) & 0xFF)) << 32;
+               ns += ((u64)(rd32(auxstmph) & 0xFF)) << 32;
+               spin_lock_irqsave(&adapter->tmreg_lock, flags);
+               ns = timecounter_cyc2time(&adapter->tc, ns);
+               spin_unlock_irqrestore(&adapter->tmreg_lock, flags);
                ts = ns_to_timespec64(ns);
        } else {
                ts.tv_nsec = rd32(auxstmpl);
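
On 82580/i350/i354 the auxiliary timestamp is a raw counter value that still
has to be converted through the adapter's timecounter, so the fix above
switches the accumulator to u64 and performs that conversion under tmreg_lock.
The conversion step in isolation (register reads abbreviated to lo/hi):

    u64 cycles = lo | ((u64)(hi & 0xFF) << 32);        /* 40-bit raw stamp */

    spin_lock_irqsave(&adapter->tmreg_lock, flags);
    ns = timecounter_cyc2time(&adapter->tc, cycles);   /* cycles -> nanoseconds */
    spin_unlock_irqrestore(&adapter->tmreg_lock, flags);

    ts = ns_to_timespec64(ns);
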
index 1c46768..fa76419 100644 (file)
@@ -254,6 +254,13 @@ static void igc_clean_tx_ring(struct igc_ring *tx_ring)
        /* reset BQL for queue */
        netdev_tx_reset_queue(txring_txq(tx_ring));
 
+       /* Zero out the buffer ring */
+       memset(tx_ring->tx_buffer_info, 0,
+              sizeof(*tx_ring->tx_buffer_info) * tx_ring->count);
+
+       /* Zero out the descriptor ring */
+       memset(tx_ring->desc, 0, tx_ring->size);
+
        /* reset next_to_use and next_to_clean */
        tx_ring->next_to_use = 0;
        tx_ring->next_to_clean = 0;
@@ -267,7 +274,7 @@ static void igc_clean_tx_ring(struct igc_ring *tx_ring)
  */
 void igc_free_tx_resources(struct igc_ring *tx_ring)
 {
-       igc_clean_tx_ring(tx_ring);
+       igc_disable_tx_ring(tx_ring);
 
        vfree(tx_ring->tx_buffer_info);
        tx_ring->tx_buffer_info = NULL;
@@ -6723,6 +6730,9 @@ static void igc_remove(struct pci_dev *pdev)
 
        igc_ptp_stop(adapter);
 
+       pci_disable_ptm(pdev);
+       pci_clear_master(pdev);
+
        set_bit(__IGC_DOWN, &adapter->state);
 
        del_timer_sync(&adapter->watchdog_timer);
index 5d83c88..1726297 100644 (file)
@@ -1256,7 +1256,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
        if (!__netif_txq_completed_wake(txq, total_packets, total_bytes,
                                        ixgbe_desc_unused(tx_ring),
                                        TX_WAKE_THRESHOLD,
-                                       netif_carrier_ok(tx_ring->netdev) &&
+                                       !netif_carrier_ok(tx_ring->netdev) ||
                                        test_bit(__IXGBE_DOWN, &adapter->state)))
                ++tx_ring->tx_stats.restart_queue;
 
index e1853da..43eb6e8 100644 (file)
@@ -981,6 +981,9 @@ int octep_device_setup(struct octep_device *oct)
                oct->mmio[i].hw_addr =
                        ioremap(pci_resource_start(oct->pdev, i * 2),
                                pci_resource_len(oct->pdev, i * 2));
+               if (!oct->mmio[i].hw_addr)
+                       goto unmap_prev;
+
                oct->mmio[i].mapped = 1;
        }
 
@@ -1015,7 +1018,9 @@ int octep_device_setup(struct octep_device *oct)
        return 0;
 
 unsupported_dev:
-       for (i = 0; i < OCTEP_MMIO_REGIONS; i++)
+       i = OCTEP_MMIO_REGIONS;
+unmap_prev:
+       while (i--)
                iounmap(oct->mmio[i].hw_addr);
 
        kfree(oct->conf);
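
The octep fix adds the missing ioremap() error check and reuses the existing
teardown label so that only the regions mapped so far are unwound. The
"while (i--)" idiom it relies on, as a minimal sketch with illustrative names:

    for (i = 0; i < NUM_REGIONS; i++) {
            region[i] = ioremap(start[i], len[i]);
            if (!region[i])
                    goto unmap_prev;    /* regions 0..i-1 are mapped */
    }
    return 0;

    unmap_prev:
            while (i--)                 /* walks i-1 down to 0 */
                    iounmap(region[i]);
            return -ENOMEM;
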
index 9f673bd..0069e60 100644 (file)
@@ -3044,9 +3044,8 @@ static int rvu_flr_init(struct rvu *rvu)
                            cfg | BIT_ULL(22));
        }
 
-       rvu->flr_wq = alloc_workqueue("rvu_afpf_flr",
-                                     WQ_UNBOUND | WQ_HIGHPRI | WQ_MEM_RECLAIM,
-                                      1);
+       rvu->flr_wq = alloc_ordered_workqueue("rvu_afpf_flr",
+                                             WQ_HIGHPRI | WQ_MEM_RECLAIM);
        if (!rvu->flr_wq)
                return -ENOMEM;
 
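
This hunk and the otx2 hunks further down replace alloc_workqueue(name,
WQ_UNBOUND | ..., 1) with alloc_ordered_workqueue(). Callers that rely on an
unbound queue with max_active == 1 for strict ordering should request that
ordering explicitly, which is exactly what alloc_ordered_workqueue() does, so
the conversion is mechanical:

    /* before: ordering only implied by WQ_UNBOUND + max_active == 1 */
    wq = alloc_workqueue("rvu_afpf_flr",
                         WQ_UNBOUND | WQ_HIGHPRI | WQ_MEM_RECLAIM, 1);

    /* after: ordered, single-in-flight semantics requested explicitly */
    wq = alloc_ordered_workqueue("rvu_afpf_flr", WQ_HIGHPRI | WQ_MEM_RECLAIM);
    if (!wq)
            return -ENOMEM;
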
index 4ad707e..f01d057 100644 (file)
@@ -1878,7 +1878,8 @@ static int nix_check_txschq_alloc_req(struct rvu *rvu, int lvl, u16 pcifunc,
                free_cnt = rvu_rsrc_free_count(&txsch->schq);
        }
 
-       if (free_cnt < req_schq || req_schq > MAX_TXSCHQ_PER_FUNC)
+       if (free_cnt < req_schq || req->schq[lvl] > MAX_TXSCHQ_PER_FUNC ||
+           req->schq_contig[lvl] > MAX_TXSCHQ_PER_FUNC)
                return NIX_AF_ERR_TLX_ALLOC_FAIL;
 
        /* If contiguous queues are needed, check for availability */
@@ -4080,10 +4081,6 @@ int rvu_mbox_handler_nix_set_rx_cfg(struct rvu *rvu, struct nix_rx_cfg *req,
 
 static u64 rvu_get_lbk_link_credits(struct rvu *rvu, u16 lbk_max_frs)
 {
-       /* CN10k supports 72KB FIFO size and max packet size of 64k */
-       if (rvu->hw->lbk_bufsize == 0x12000)
-               return (rvu->hw->lbk_bufsize - lbk_max_frs) / 16;
-
        return 1600; /* 16 * max LBK datarate = 16 * 100Gbps */
 }
 
index 5120911..9f11c1e 100644 (file)
@@ -1164,10 +1164,8 @@ static u16 __rvu_npc_exact_cmd_rules_cnt_update(struct rvu *rvu, int drop_mcam_i
 {
        struct npc_exact_table *table;
        u16 *cnt, old_cnt;
-       bool promisc;
 
        table = rvu->hw->table;
-       promisc = table->promisc_mode[drop_mcam_idx];
 
        cnt = &table->cnt_cmd_rules[drop_mcam_idx];
        old_cnt = *cnt;
@@ -1179,16 +1177,13 @@ static u16 __rvu_npc_exact_cmd_rules_cnt_update(struct rvu *rvu, int drop_mcam_i
 
        *enable_or_disable_cam = false;
 
-       if (promisc)
-               goto done;
-
-       /* If all rules are deleted and not already in promisc mode; disable cam */
+       /* If all rules are deleted, disable cam */
        if (!*cnt && val < 0) {
                *enable_or_disable_cam = true;
                goto done;
        }
 
-       /* If rule got added and not already in promisc mode; enable cam */
+       /* If rule got added, enable cam */
        if (!old_cnt && val > 0) {
                *enable_or_disable_cam = true;
                goto done;
@@ -1443,7 +1438,6 @@ int rvu_npc_exact_promisc_disable(struct rvu *rvu, u16 pcifunc)
        u32 drop_mcam_idx;
        bool *promisc;
        bool rc;
-       u32 cnt;
 
        table = rvu->hw->table;
 
@@ -1466,17 +1460,8 @@ int rvu_npc_exact_promisc_disable(struct rvu *rvu, u16 pcifunc)
                return LMAC_AF_ERR_INVALID_PARAM;
        }
        *promisc = false;
-       cnt = __rvu_npc_exact_cmd_rules_cnt_update(rvu, drop_mcam_idx, 0, NULL);
        mutex_unlock(&table->lock);
 
-       /* If no dmac filter entries configured, disable drop rule */
-       if (!cnt)
-               rvu_npc_enable_mcam_by_entry_index(rvu, drop_mcam_idx, NIX_INTF_RX, false);
-       else
-               rvu_npc_enable_mcam_by_entry_index(rvu, drop_mcam_idx, NIX_INTF_RX, !*promisc);
-
-       dev_dbg(rvu->dev, "%s: disabled  promisc mode (cgx=%d lmac=%d, cnt=%d)\n",
-               __func__, cgx_id, lmac_id, cnt);
        return 0;
 }
 
@@ -1494,7 +1479,6 @@ int rvu_npc_exact_promisc_enable(struct rvu *rvu, u16 pcifunc)
        u32 drop_mcam_idx;
        bool *promisc;
        bool rc;
-       u32 cnt;
 
        table = rvu->hw->table;
 
@@ -1517,17 +1501,8 @@ int rvu_npc_exact_promisc_enable(struct rvu *rvu, u16 pcifunc)
                return LMAC_AF_ERR_INVALID_PARAM;
        }
        *promisc = true;
-       cnt = __rvu_npc_exact_cmd_rules_cnt_update(rvu, drop_mcam_idx, 0, NULL);
        mutex_unlock(&table->lock);
 
-       /* If no dmac filter entries configured, disable drop rule */
-       if (!cnt)
-               rvu_npc_enable_mcam_by_entry_index(rvu, drop_mcam_idx, NIX_INTF_RX, false);
-       else
-               rvu_npc_enable_mcam_by_entry_index(rvu, drop_mcam_idx, NIX_INTF_RX, !*promisc);
-
-       dev_dbg(rvu->dev, "%s: Enabled promisc mode (cgx=%d lmac=%d cnt=%d)\n",
-               __func__, cgx_id, lmac_id, cnt);
        return 0;
 }
 
index 18284ad..74c4979 100644 (file)
@@ -271,8 +271,7 @@ static int otx2_pf_flr_init(struct otx2_nic *pf, int num_vfs)
 {
        int vf;
 
-       pf->flr_wq = alloc_workqueue("otx2_pf_flr_wq",
-                                    WQ_UNBOUND | WQ_HIGHPRI, 1);
+       pf->flr_wq = alloc_ordered_workqueue("otx2_pf_flr_wq", WQ_HIGHPRI);
        if (!pf->flr_wq)
                return -ENOMEM;
 
@@ -593,9 +592,8 @@ static int otx2_pfvf_mbox_init(struct otx2_nic *pf, int numvfs)
        if (!pf->mbox_pfvf)
                return -ENOMEM;
 
-       pf->mbox_pfvf_wq = alloc_workqueue("otx2_pfvf_mailbox",
-                                          WQ_UNBOUND | WQ_HIGHPRI |
-                                          WQ_MEM_RECLAIM, 1);
+       pf->mbox_pfvf_wq = alloc_ordered_workqueue("otx2_pfvf_mailbox",
+                                                  WQ_HIGHPRI | WQ_MEM_RECLAIM);
        if (!pf->mbox_pfvf_wq)
                return -ENOMEM;
 
@@ -1063,9 +1061,8 @@ static int otx2_pfaf_mbox_init(struct otx2_nic *pf)
        int err;
 
        mbox->pfvf = pf;
-       pf->mbox_wq = alloc_workqueue("otx2_pfaf_mailbox",
-                                     WQ_UNBOUND | WQ_HIGHPRI |
-                                     WQ_MEM_RECLAIM, 1);
+       pf->mbox_wq = alloc_ordered_workqueue("otx2_pfaf_mailbox",
+                                             WQ_HIGHPRI | WQ_MEM_RECLAIM);
        if (!pf->mbox_wq)
                return -ENOMEM;
 
index 7045fed..7af223b 100644 (file)
@@ -652,9 +652,7 @@ static void otx2_sqe_add_ext(struct otx2_nic *pfvf, struct otx2_snd_queue *sq,
                                htons(ext->lso_sb - skb_network_offset(skb));
                } else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) {
                        ext->lso_format = pfvf->hw.lso_tsov6_idx;
-
-                       ipv6_hdr(skb)->payload_len =
-                               htons(ext->lso_sb - skb_network_offset(skb));
+                       ipv6_hdr(skb)->payload_len = htons(tcp_hdrlen(skb));
                } else if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) {
                        __be16 l3_proto = vlan_get_protocol(skb);
                        struct udphdr *udph = udp_hdr(skb);
index 53366db..7baed6b 100644 (file)
@@ -297,9 +297,8 @@ static int otx2vf_vfaf_mbox_init(struct otx2_nic *vf)
        int err;
 
        mbox->pfvf = vf;
-       vf->mbox_wq = alloc_workqueue("otx2_vfaf_mailbox",
-                                     WQ_UNBOUND | WQ_HIGHPRI |
-                                     WQ_MEM_RECLAIM, 1);
+       vf->mbox_wq = alloc_ordered_workqueue("otx2_vfaf_mailbox",
+                                             WQ_HIGHPRI | WQ_MEM_RECLAIM);
        if (!vf->mbox_wq)
                return -ENOMEM;
 
index a75fd07..834c644 100644 (file)
@@ -3269,18 +3269,14 @@ static int mtk_open(struct net_device *dev)
                        eth->dsa_meta[i] = md_dst;
                }
        } else {
-               /* Hardware special tag parsing needs to be disabled if at least
-                * one MAC does not use DSA.
+               /* Hardware DSA untagging and VLAN RX offloading need to be
+                * disabled if at least one MAC does not use DSA.
                 */
                u32 val = mtk_r32(eth, MTK_CDMP_IG_CTRL);
 
                val &= ~MTK_CDMP_STAG_EN;
                mtk_w32(eth, val, MTK_CDMP_IG_CTRL);
 
-               val = mtk_r32(eth, MTK_CDMQ_IG_CTRL);
-               val &= ~MTK_CDMQ_STAG_EN;
-               mtk_w32(eth, val, MTK_CDMQ_IG_CTRL);
-
                mtk_w32(eth, 0, MTK_CDMP_EG_CTRL);
        }
 
index 4c205af..985cff9 100644 (file)
@@ -654,7 +654,7 @@ __mtk_wed_detach(struct mtk_wed_device *dev)
                                           BIT(hw->index), BIT(hw->index));
        }
 
-       if (!hw_list[!hw->index]->wed_dev &&
+       if ((!hw_list[!hw->index] || !hw_list[!hw->index]->wed_dev) &&
            hw->eth->dma_dev != hw->eth->dev)
                mtk_eth_set_dma_device(hw->eth, hw->eth->dev);
 
index d53de39..d532883 100644 (file)
@@ -1920,9 +1920,10 @@ static void mlx5_cmd_err_trace(struct mlx5_core_dev *dev, u16 opcode, u16 op_mod
 static void cmd_status_log(struct mlx5_core_dev *dev, u16 opcode, u8 status,
                           u32 syndrome, int err)
 {
+       const char *namep = mlx5_command_str(opcode);
        struct mlx5_cmd_stats *stats;
 
-       if (!err)
+       if (!err || !(strcmp(namep, "unknown command opcode")))
                return;
 
        stats = &dev->cmd.stats[opcode];
index f404978..7c0f2ad 100644 (file)
@@ -490,7 +490,7 @@ static void poll_trace(struct mlx5_fw_tracer *tracer,
                                (u64)timestamp_low;
                break;
        default:
-               if (tracer_event->event_id >= tracer->str_db.first_string_trace ||
+               if (tracer_event->event_id >= tracer->str_db.first_string_trace &&
                    tracer_event->event_id <= tracer->str_db.first_string_trace +
                                              tracer->str_db.num_string_trace) {
                        tracer_event->type = TRACER_EVENT_TYPE_STRING;
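
The one-character fw_tracer change turns a test that matched essentially every
event ID into a proper closed-range check. Reduced to its logic (the handler
name is a placeholder):

    /* buggy: any id >= first matches the left clause, and any id below it
     * matches the right clause, so the branch is effectively always taken */
    if (id >= first || id <= first + num)
            handle_string_event();

    /* fixed: both bounds must hold */
    if (id >= first && id <= first + num)
            handle_string_event();
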
index b8987a4..8e999f2 100644 (file)
@@ -327,6 +327,7 @@ struct mlx5e_params {
        unsigned int sw_mtu;
        int hard_mtu;
        bool ptp_rx;
+       __be32 terminate_lkey_be;
 };
 
 static inline u8 mlx5e_get_dcb_num_tc(struct mlx5e_params *params)
index 9c94807..5ce28ff 100644 (file)
@@ -732,7 +732,8 @@ static void mlx5e_rx_compute_wqe_bulk_params(struct mlx5e_params *params,
 static int mlx5e_build_rq_frags_info(struct mlx5_core_dev *mdev,
                                     struct mlx5e_params *params,
                                     struct mlx5e_xsk_param *xsk,
-                                    struct mlx5e_rq_frags_info *info)
+                                    struct mlx5e_rq_frags_info *info,
+                                    u32 *xdp_frag_size)
 {
        u32 byte_count = MLX5E_SW2HW_MTU(params, params->sw_mtu);
        int frag_size_max = DEFAULT_FRAG_SIZE;
@@ -845,6 +846,8 @@ out:
 
        info->log_num_frags = order_base_2(info->num_frags);
 
+       *xdp_frag_size = info->num_frags > 1 && params->xdp_prog ? PAGE_SIZE : 0;
+
        return 0;
 }
 
@@ -989,7 +992,8 @@ int mlx5e_build_rq_param(struct mlx5_core_dev *mdev,
        }
        default: /* MLX5_WQ_TYPE_CYCLIC */
                MLX5_SET(wq, wq, log_wq_sz, params->log_rq_mtu_frames);
-               err = mlx5e_build_rq_frags_info(mdev, params, xsk, &param->frags_info);
+               err = mlx5e_build_rq_frags_info(mdev, params, xsk, &param->frags_info,
+                                               &param->xdp_frag_size);
                if (err)
                        return err;
                ndsegs = param->frags_info.num_frags;
index a5d20f6..6800949 100644 (file)
@@ -24,6 +24,7 @@ struct mlx5e_rq_param {
        u32                        rqc[MLX5_ST_SZ_DW(rqc)];
        struct mlx5_wq_param       wq;
        struct mlx5e_rq_frags_info frags_info;
+       u32                        xdp_frag_size;
 };
 
 struct mlx5e_sq_param {
index 7ac1ad9..7e8e96c 100644 (file)
@@ -51,7 +51,7 @@ int mlx5e_port_query_buffer(struct mlx5e_priv *priv,
        if (err)
                goto out;
 
-       for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+       for (i = 0; i < MLX5E_MAX_NETWORK_BUFFER; i++) {
                buffer = MLX5_ADDR_OF(pbmc_reg, out, buffer[i]);
                port_buffer->buffer[i].lossy =
                        MLX5_GET(bufferx_reg, buffer, lossy);
@@ -73,14 +73,24 @@ int mlx5e_port_query_buffer(struct mlx5e_priv *priv,
                          port_buffer->buffer[i].lossy);
        }
 
-       port_buffer->headroom_size = total_used;
+       port_buffer->internal_buffers_size = 0;
+       for (i = MLX5E_MAX_NETWORK_BUFFER; i < MLX5E_TOTAL_BUFFERS; i++) {
+               buffer = MLX5_ADDR_OF(pbmc_reg, out, buffer[i]);
+               port_buffer->internal_buffers_size +=
+                       MLX5_GET(bufferx_reg, buffer, size) * port_buff_cell_sz;
+       }
+
        port_buffer->port_buffer_size =
                MLX5_GET(pbmc_reg, out, port_buffer_size) * port_buff_cell_sz;
-       port_buffer->spare_buffer_size =
-               port_buffer->port_buffer_size - total_used;
-
-       mlx5e_dbg(HW, priv, "total buffer size=%d, spare buffer size=%d\n",
-                 port_buffer->port_buffer_size,
+       port_buffer->headroom_size = total_used;
+       port_buffer->spare_buffer_size = port_buffer->port_buffer_size -
+                                        port_buffer->internal_buffers_size -
+                                        port_buffer->headroom_size;
+
+       mlx5e_dbg(HW, priv,
+                 "total buffer size=%u, headroom buffer size=%u, internal buffers size=%u, spare buffer size=%u\n",
+                 port_buffer->port_buffer_size, port_buffer->headroom_size,
+                 port_buffer->internal_buffers_size,
                  port_buffer->spare_buffer_size);
 out:
        kfree(out);
@@ -206,11 +216,11 @@ static int port_update_pool_cfg(struct mlx5_core_dev *mdev,
        if (!MLX5_CAP_GEN(mdev, sbcam_reg))
                return 0;
 
-       for (i = 0; i < MLX5E_MAX_BUFFER; i++)
+       for (i = 0; i < MLX5E_MAX_NETWORK_BUFFER; i++)
                lossless_buff_count += ((port_buffer->buffer[i].size) &&
                                       (!(port_buffer->buffer[i].lossy)));
 
-       for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+       for (i = 0; i < MLX5E_MAX_NETWORK_BUFFER; i++) {
                p = select_sbcm_params(&port_buffer->buffer[i], lossless_buff_count);
                err = mlx5e_port_set_sbcm(mdev, 0, i,
                                          MLX5_INGRESS_DIR,
@@ -293,7 +303,7 @@ static int port_set_buffer(struct mlx5e_priv *priv,
        if (err)
                goto out;
 
-       for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+       for (i = 0; i < MLX5E_MAX_NETWORK_BUFFER; i++) {
                void *buffer = MLX5_ADDR_OF(pbmc_reg, in, buffer[i]);
                u64 size = port_buffer->buffer[i].size;
                u64 xoff = port_buffer->buffer[i].xoff;
@@ -351,7 +361,7 @@ static int update_xoff_threshold(struct mlx5e_port_buffer *port_buffer,
 {
        int i;
 
-       for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+       for (i = 0; i < MLX5E_MAX_NETWORK_BUFFER; i++) {
                if (port_buffer->buffer[i].lossy) {
                        port_buffer->buffer[i].xoff = 0;
                        port_buffer->buffer[i].xon  = 0;
@@ -408,7 +418,7 @@ static int update_buffer_lossy(struct mlx5_core_dev *mdev,
        int err;
        int i;
 
-       for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+       for (i = 0; i < MLX5E_MAX_NETWORK_BUFFER; i++) {
                prio_count = 0;
                lossy_count = 0;
 
@@ -432,11 +442,11 @@ static int update_buffer_lossy(struct mlx5_core_dev *mdev,
        }
 
        if (changed) {
-               err = port_update_pool_cfg(mdev, port_buffer);
+               err = update_xoff_threshold(port_buffer, xoff, max_mtu, port_buff_cell_sz);
                if (err)
                        return err;
 
-               err = update_xoff_threshold(port_buffer, xoff, max_mtu, port_buff_cell_sz);
+               err = port_update_pool_cfg(mdev, port_buffer);
                if (err)
                        return err;
 
@@ -515,7 +525,7 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv,
 
        if (change & MLX5E_PORT_BUFFER_PRIO2BUFFER) {
                update_prio2buffer = true;
-               for (i = 0; i < MLX5E_MAX_BUFFER; i++)
+               for (i = 0; i < MLX5E_MAX_NETWORK_BUFFER; i++)
                        mlx5e_dbg(HW, priv, "%s: requested to map prio[%d] to buffer %d\n",
                                  __func__, i, prio2buffer[i]);
 
@@ -530,7 +540,7 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv,
        }
 
        if (change & MLX5E_PORT_BUFFER_SIZE) {
-               for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+               for (i = 0; i < MLX5E_MAX_NETWORK_BUFFER; i++) {
                        mlx5e_dbg(HW, priv, "%s: buffer[%d]=%d\n", __func__, i, buffer_size[i]);
                        if (!port_buffer.buffer[i].lossy && !buffer_size[i]) {
                                mlx5e_dbg(HW, priv, "%s: lossless buffer[%d] size cannot be zero\n",
@@ -544,7 +554,9 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv,
 
                mlx5e_dbg(HW, priv, "%s: total buffer requested=%d\n", __func__, total_used);
 
-               if (total_used > port_buffer.port_buffer_size)
+               if (total_used > port_buffer.headroom_size &&
+                   (total_used - port_buffer.headroom_size) >
+                           port_buffer.spare_buffer_size)
                        return -EINVAL;
 
                update_buffer = true;
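
With the new accounting, the PBMC pool is split into headroom (buffers 0-7),
internal buffers (8-9) and spare, and a manual resize is rejected only when the
requested growth no longer fits in the spare pool. With illustrative numbers:

    u32 port_buffer_size  = 1000;  /* KB, whole pool reported by PBMC      */
    u32 internal_buffers  = 100;   /* KB, buffers 8-9                      */
    u32 headroom_size     = 600;   /* KB, current sum of buffers 0-7       */
    u32 spare_buffer_size = port_buffer_size - internal_buffers -
                            headroom_size;                       /* 300 KB */

    u32 total_used = 850;          /* KB requested by the new config       */

    /* accepted: headroom grows by 250 KB, which still fits in the spare;
     * a request of 950 KB (growth of 350 KB) would return -EINVAL */
    bool reject = total_used > headroom_size &&
                  (total_used - headroom_size) > spare_buffer_size;
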
index a6ef118..f4a19ff 100644 (file)
@@ -35,7 +35,8 @@
 #include "en.h"
 #include "port.h"
 
-#define MLX5E_MAX_BUFFER 8
+#define MLX5E_MAX_NETWORK_BUFFER 8
+#define MLX5E_TOTAL_BUFFERS 10
 #define MLX5E_DEFAULT_CABLE_LEN 7 /* 7 meters */
 
 #define MLX5_BUFFER_SUPPORTED(mdev) (MLX5_CAP_GEN(mdev, pcam_reg) && \
@@ -60,8 +61,9 @@ struct mlx5e_bufferx_reg {
 struct mlx5e_port_buffer {
        u32                       port_buffer_size;
        u32                       spare_buffer_size;
-       u32                       headroom_size;
-       struct mlx5e_bufferx_reg  buffer[MLX5E_MAX_BUFFER];
+       u32                       headroom_size;          /* Buffers 0-7 */
+       u32                       internal_buffers_size;  /* Buffers 8-9 */
+       struct mlx5e_bufferx_reg  buffer[MLX5E_MAX_NETWORK_BUFFER];
 };
 
 int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv,
index eb5abd0..3cbebfb 100644 (file)
@@ -175,6 +175,8 @@ static bool mlx5e_ptp_poll_ts_cq(struct mlx5e_cq *cq, int budget)
        /* ensure cq space is freed before enabling more cqes */
        wmb();
 
+       mlx5e_txqsq_wake(&ptpsq->txqsq);
+
        return work_done == budget;
 }
 
index fc923a9..0380a04 100644 (file)
@@ -84,7 +84,7 @@ mlx5e_tc_act_init_parse_state(struct mlx5e_tc_act_parse_state *parse_state,
 
 int
 mlx5e_tc_act_post_parse(struct mlx5e_tc_act_parse_state *parse_state,
-                       struct flow_action *flow_action,
+                       struct flow_action *flow_action, int from, int to,
                        struct mlx5_flow_attr *attr,
                        enum mlx5_flow_namespace_type ns_type)
 {
@@ -96,6 +96,11 @@ mlx5e_tc_act_post_parse(struct mlx5e_tc_act_parse_state *parse_state,
        priv = parse_state->flow->priv;
 
        flow_action_for_each(i, act, flow_action) {
+               if (i < from)
+                       continue;
+               else if (i > to)
+                       break;
+
                tc_act = mlx5e_tc_act_get(act->id, ns_type);
                if (!tc_act || !tc_act->post_parse)
                        continue;
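
With the new from/to bounds, post-parse handling runs once per split of the
action list instead of re-walking every action. A rough sketch of the calling
pattern, mirroring the parse_tc_actions() hunks further down where i_split
tracks the first action of the current split:

    int i_split = 0, i;

    flow_action_for_each(i, act, flow_action) {
            /* ... parse act; when this action ends a split: ... */
            err = mlx5e_tc_act_post_parse(parse_state, flow_action,
                                          i_split, i, attr, ns_type);
            if (err)
                    return err;
            i_split = i + 1;    /* next split starts after this action */
    }

    /* post-parse the trailing split */
    err = mlx5e_tc_act_post_parse(parse_state, flow_action, i_split, i,
                                  attr, ns_type);
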
index 0e6e187..d6c12d0 100644 (file)
@@ -112,7 +112,7 @@ mlx5e_tc_act_init_parse_state(struct mlx5e_tc_act_parse_state *parse_state,
 
 int
 mlx5e_tc_act_post_parse(struct mlx5e_tc_act_parse_state *parse_state,
-                       struct flow_action *flow_action,
+                       struct flow_action *flow_action, int from, int to,
                        struct mlx5_flow_attr *attr,
                        enum mlx5_flow_namespace_type ns_type);
 
index ead38ef..a254e72 100644 (file)
@@ -2021,6 +2021,8 @@ void
 mlx5_tc_ct_delete_flow(struct mlx5_tc_ct_priv *priv,
                       struct mlx5_flow_attr *attr)
 {
+       if (!attr->ct_attr.ft) /* no ct action, return */
+               return;
        if (!attr->ct_attr.nf_ft) /* means only ct clear action, and not ct_clear,ct() */
                return;
 
index 20c2d2e..f0c3464 100644 (file)
@@ -492,6 +492,19 @@ void mlx5e_encap_put(struct mlx5e_priv *priv, struct mlx5e_encap_entry *e)
        mlx5e_encap_dealloc(priv, e);
 }
 
+static void mlx5e_encap_put_locked(struct mlx5e_priv *priv, struct mlx5e_encap_entry *e)
+{
+       struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+
+       lockdep_assert_held(&esw->offloads.encap_tbl_lock);
+
+       if (!refcount_dec_and_test(&e->refcnt))
+               return;
+       list_del(&e->route_list);
+       hash_del_rcu(&e->encap_hlist);
+       mlx5e_encap_dealloc(priv, e);
+}
+
 static void mlx5e_decap_put(struct mlx5e_priv *priv, struct mlx5e_decap_entry *d)
 {
        struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
@@ -816,6 +829,8 @@ int mlx5e_attach_encap(struct mlx5e_priv *priv,
        uintptr_t hash_key;
        int err = 0;
 
+       lockdep_assert_held(&esw->offloads.encap_tbl_lock);
+
        parse_attr = attr->parse_attr;
        tun_info = parse_attr->tun_info[out_index];
        mpls_info = &parse_attr->mpls_info[out_index];
@@ -829,7 +844,6 @@ int mlx5e_attach_encap(struct mlx5e_priv *priv,
 
        hash_key = hash_encap_info(&key);
 
-       mutex_lock(&esw->offloads.encap_tbl_lock);
        e = mlx5e_encap_get(priv, &key, hash_key);
 
        /* must verify if encap is valid or not */
@@ -840,15 +854,6 @@ int mlx5e_attach_encap(struct mlx5e_priv *priv,
                        goto out_err;
                }
 
-               mutex_unlock(&esw->offloads.encap_tbl_lock);
-               wait_for_completion(&e->res_ready);
-
-               /* Protect against concurrent neigh update. */
-               mutex_lock(&esw->offloads.encap_tbl_lock);
-               if (e->compl_result < 0) {
-                       err = -EREMOTEIO;
-                       goto out_err;
-               }
                goto attach_flow;
        }
 
@@ -877,15 +882,12 @@ int mlx5e_attach_encap(struct mlx5e_priv *priv,
        INIT_LIST_HEAD(&e->flows);
        hash_add_rcu(esw->offloads.encap_tbl, &e->encap_hlist, hash_key);
        tbl_time_before = mlx5e_route_tbl_get_last_update(priv);
-       mutex_unlock(&esw->offloads.encap_tbl_lock);
 
        if (family == AF_INET)
                err = mlx5e_tc_tun_create_header_ipv4(priv, mirred_dev, e);
        else if (family == AF_INET6)
                err = mlx5e_tc_tun_create_header_ipv6(priv, mirred_dev, e);
 
-       /* Protect against concurrent neigh update. */
-       mutex_lock(&esw->offloads.encap_tbl_lock);
        complete_all(&e->res_ready);
        if (err) {
                e->compl_result = err;
@@ -920,18 +922,15 @@ attach_flow:
        } else {
                flow_flag_set(flow, SLOW);
        }
-       mutex_unlock(&esw->offloads.encap_tbl_lock);
 
        return err;
 
 out_err:
-       mutex_unlock(&esw->offloads.encap_tbl_lock);
        if (e)
-               mlx5e_encap_put(priv, e);
+               mlx5e_encap_put_locked(priv, e);
        return err;
 
 out_err_init:
-       mutex_unlock(&esw->offloads.encap_tbl_lock);
        kfree(tun_info);
        kfree(e);
        return err;
@@ -1016,6 +1015,93 @@ out_err:
        return err;
 }
 
+int mlx5e_tc_tun_encap_dests_set(struct mlx5e_priv *priv,
+                                struct mlx5e_tc_flow *flow,
+                                struct mlx5_flow_attr *attr,
+                                struct netlink_ext_ack *extack,
+                                bool *vf_tun)
+{
+       struct mlx5e_tc_flow_parse_attr *parse_attr;
+       struct mlx5_esw_flow_attr *esw_attr;
+       struct net_device *encap_dev = NULL;
+       struct mlx5e_rep_priv *rpriv;
+       struct mlx5e_priv *out_priv;
+       struct mlx5_eswitch *esw;
+       int out_index;
+       int err = 0;
+
+       if (!mlx5e_is_eswitch_flow(flow))
+               return 0;
+
+       parse_attr = attr->parse_attr;
+       esw_attr = attr->esw_attr;
+       *vf_tun = false;
+
+       esw = priv->mdev->priv.eswitch;
+       mutex_lock(&esw->offloads.encap_tbl_lock);
+       for (out_index = 0; out_index < MLX5_MAX_FLOW_FWD_VPORTS; out_index++) {
+               struct net_device *out_dev;
+               int mirred_ifindex;
+
+               if (!(esw_attr->dests[out_index].flags & MLX5_ESW_DEST_ENCAP))
+                       continue;
+
+               mirred_ifindex = parse_attr->mirred_ifindex[out_index];
+               out_dev = dev_get_by_index(dev_net(priv->netdev), mirred_ifindex);
+               if (!out_dev) {
+                       NL_SET_ERR_MSG_MOD(extack, "Requested mirred device not found");
+                       err = -ENODEV;
+                       goto out;
+               }
+               err = mlx5e_attach_encap(priv, flow, attr, out_dev, out_index,
+                                        extack, &encap_dev);
+               dev_put(out_dev);
+               if (err)
+                       goto out;
+
+               if (esw_attr->dests[out_index].flags &
+                   MLX5_ESW_DEST_CHAIN_WITH_SRC_PORT_CHANGE &&
+                   !esw_attr->dest_int_port)
+                       *vf_tun = true;
+
+               out_priv = netdev_priv(encap_dev);
+               rpriv = out_priv->ppriv;
+               esw_attr->dests[out_index].rep = rpriv->rep;
+               esw_attr->dests[out_index].mdev = out_priv->mdev;
+       }
+
+       if (*vf_tun && esw_attr->out_count > 1) {
+               NL_SET_ERR_MSG_MOD(extack, "VF tunnel encap with mirroring is not supported");
+               err = -EOPNOTSUPP;
+               goto out;
+       }
+
+out:
+       mutex_unlock(&esw->offloads.encap_tbl_lock);
+       return err;
+}
+
+void mlx5e_tc_tun_encap_dests_unset(struct mlx5e_priv *priv,
+                                   struct mlx5e_tc_flow *flow,
+                                   struct mlx5_flow_attr *attr)
+{
+       struct mlx5_esw_flow_attr *esw_attr;
+       int out_index;
+
+       if (!mlx5e_is_eswitch_flow(flow))
+               return;
+
+       esw_attr = attr->esw_attr;
+
+       for (out_index = 0; out_index < MLX5_MAX_FLOW_FWD_VPORTS; out_index++) {
+               if (!(esw_attr->dests[out_index].flags & MLX5_ESW_DEST_ENCAP))
+                       continue;
+
+               mlx5e_detach_encap(flow->priv, flow, attr, out_index);
+               kfree(attr->parse_attr->tun_info[out_index]);
+       }
+}
+
 static int cmp_route_info(struct mlx5e_route_key *a,
                          struct mlx5e_route_key *b)
 {
@@ -1369,11 +1455,13 @@ static void mlx5e_invalidate_encap(struct mlx5e_priv *priv,
        struct mlx5e_tc_flow *flow;
 
        list_for_each_entry(flow, encap_flows, tmp_list) {
-               struct mlx5_flow_attr *attr = flow->attr;
                struct mlx5_esw_flow_attr *esw_attr;
+               struct mlx5_flow_attr *attr;
 
                if (!mlx5e_is_offloaded_flow(flow))
                        continue;
+
+               attr = mlx5e_tc_get_encap_attr(flow);
                esw_attr = attr->esw_attr;
 
                if (flow_flag_test(flow, SLOW))
index 8ad273d..5d7d676 100644 (file)
@@ -30,6 +30,15 @@ int mlx5e_attach_decap_route(struct mlx5e_priv *priv,
 void mlx5e_detach_decap_route(struct mlx5e_priv *priv,
                              struct mlx5e_tc_flow *flow);
 
+int mlx5e_tc_tun_encap_dests_set(struct mlx5e_priv *priv,
+                                struct mlx5e_tc_flow *flow,
+                                struct mlx5_flow_attr *attr,
+                                struct netlink_ext_ack *extack,
+                                bool *vf_tun);
+void mlx5e_tc_tun_encap_dests_unset(struct mlx5e_priv *priv,
+                                   struct mlx5e_tc_flow *flow,
+                                   struct mlx5_flow_attr *attr);
+
 struct ip_tunnel_info *mlx5e_dup_tun_info(const struct ip_tunnel_info *tun_info);
 
 int mlx5e_tc_set_attr_rx_tun(struct mlx5e_tc_flow *flow,
index 47381e9..879d698 100644 (file)
@@ -193,6 +193,8 @@ static inline u16 mlx5e_txqsq_get_next_pi(struct mlx5e_txqsq *sq, u16 size)
        return pi;
 }
 
+void mlx5e_txqsq_wake(struct mlx5e_txqsq *sq);
+
 static inline u16 mlx5e_shampo_get_cqe_header_index(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 {
        return be16_to_cpu(cqe->shampo.header_entry_index) & (rq->mpwqe.shampo->hd_per_wq - 1);
index ed279f4..36826b5 100644 (file)
@@ -86,7 +86,7 @@ static int mlx5e_init_xsk_rq(struct mlx5e_channel *c,
        if (err)
                return err;
 
-       return  xdp_rxq_info_reg(&rq->xdp_rxq, rq->netdev, rq_xdp_ix, 0);
+       return xdp_rxq_info_reg(&rq->xdp_rxq, rq->netdev, rq_xdp_ix, c->napi.napi_id);
 }
 
 static int mlx5e_open_xsk_rq(struct mlx5e_channel *c, struct mlx5e_params *params,
index 55b3854..891d39b 100644 (file)
@@ -61,16 +61,19 @@ static void mlx5e_ipsec_handle_tx_limit(struct work_struct *_work)
        struct mlx5e_ipsec_sa_entry *sa_entry = dwork->sa_entry;
        struct xfrm_state *x = sa_entry->x;
 
-       spin_lock(&x->lock);
+       if (sa_entry->attrs.drop)
+               return;
+
+       spin_lock_bh(&x->lock);
        xfrm_state_check_expire(x);
        if (x->km.state == XFRM_STATE_EXPIRED) {
                sa_entry->attrs.drop = true;
-               mlx5e_accel_ipsec_fs_modify(sa_entry);
-       }
-       spin_unlock(&x->lock);
+               spin_unlock_bh(&x->lock);
 
-       if (sa_entry->attrs.drop)
+               mlx5e_accel_ipsec_fs_modify(sa_entry);
                return;
+       }
+       spin_unlock_bh(&x->lock);
 
        queue_delayed_work(sa_entry->ipsec->wq, &dwork->dwork,
                           MLX5_IPSEC_RESCHED);
@@ -1040,11 +1043,17 @@ err_fs:
        return err;
 }
 
-static void mlx5e_xfrm_free_policy(struct xfrm_policy *x)
+static void mlx5e_xfrm_del_policy(struct xfrm_policy *x)
 {
        struct mlx5e_ipsec_pol_entry *pol_entry = to_ipsec_pol_entry(x);
 
        mlx5e_accel_ipsec_fs_del_pol(pol_entry);
+}
+
+static void mlx5e_xfrm_free_policy(struct xfrm_policy *x)
+{
+       struct mlx5e_ipsec_pol_entry *pol_entry = to_ipsec_pol_entry(x);
+
        kfree(pol_entry);
 }
 
@@ -1065,6 +1074,7 @@ static const struct xfrmdev_ops mlx5e_ipsec_packet_xfrmdev_ops = {
 
        .xdo_dev_state_update_curlft = mlx5e_xfrm_update_curlft,
        .xdo_dev_policy_add = mlx5e_xfrm_add_policy,
+       .xdo_dev_policy_delete = mlx5e_xfrm_del_policy,
        .xdo_dev_policy_free = mlx5e_xfrm_free_policy,
 };
 
index df90e19..a3554bd 100644 (file)
@@ -305,7 +305,17 @@ static void mlx5e_ipsec_update_esn_state(struct mlx5e_ipsec_sa_entry *sa_entry,
        }
 
        mlx5e_ipsec_build_accel_xfrm_attrs(sa_entry, &attrs);
+
+       /* It is safe to execute the modify below unlocked since the only flows
+        * that could affect this HW object are create, destroy and this work.
+        *
+        * Creation flow can't co-exist with this modify work, the destruction
+        * flow would cancel this work, and this work is a single entity that
+        * can't conflict with itself.
+        */
+       spin_unlock_bh(&sa_entry->x->lock);
        mlx5_accel_esp_modify_xfrm(sa_entry, &attrs);
+       spin_lock_bh(&sa_entry->x->lock);
 
        data.data_offset_condition_operand =
                MLX5_IPSEC_ASO_REMOVE_FLOW_PKT_CNT_OFFSET;
@@ -431,7 +441,7 @@ static void mlx5e_ipsec_handle_event(struct work_struct *_work)
        aso = sa_entry->ipsec->aso;
        attrs = &sa_entry->attrs;
 
-       spin_lock(&sa_entry->x->lock);
+       spin_lock_bh(&sa_entry->x->lock);
        ret = mlx5e_ipsec_aso_query(sa_entry, NULL);
        if (ret)
                goto unlock;
@@ -447,7 +457,7 @@ static void mlx5e_ipsec_handle_event(struct work_struct *_work)
                mlx5e_ipsec_handle_limits(sa_entry);
 
 unlock:
-       spin_unlock(&sa_entry->x->lock);
+       spin_unlock_bh(&sa_entry->x->lock);
        kfree(work);
 }
 
@@ -596,7 +606,8 @@ int mlx5e_ipsec_aso_query(struct mlx5e_ipsec_sa_entry *sa_entry,
        do {
                ret = mlx5_aso_poll_cq(aso->aso, false);
                if (ret)
-                       usleep_range(2, 10);
+                       /* We are in atomic context */
+                       udelay(10);
        } while (ret && time_is_after_jiffies(expires));
        spin_unlock_bh(&aso->lock);
        return ret;
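
mlx5e_ipsec_aso_query() polls the CQ with aso->lock held (spin_lock_bh), i.e.
in atomic context where usleep_range() must not be used because it can sleep;
the fix busy-waits with udelay() instead. The general rule, sketched with a
placeholder poll function:

    spin_lock_bh(&lock);
    do {
            ret = poll_hw();            /* placeholder for the CQ poll */
            if (ret)
                    udelay(10);         /* busy-wait: sleeping is illegal here */
    } while (ret && time_is_after_jiffies(expires));
    spin_unlock_bh(&lock);

    /* only in process context, with no spinlock held, prefer sleeping: */
    usleep_range(2, 10);
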
index 1f90594..41c396e 100644 (file)
@@ -150,10 +150,8 @@ int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb,
 
        inlen = MLX5_ST_SZ_BYTES(modify_tir_in);
        in = kvzalloc(inlen, GFP_KERNEL);
-       if (!in) {
-               err = -ENOMEM;
-               goto out;
-       }
+       if (!in)
+               return -ENOMEM;
 
        if (enable_uc_lb)
                lb_flags = MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST;
@@ -171,14 +169,13 @@ int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb,
                tirn = tir->tirn;
                err = mlx5_core_modify_tir(mdev, tirn, in);
                if (err)
-                       goto out;
+                       break;
        }
+       mutex_unlock(&mdev->mlx5e_res.hw_objs.td.list_lock);
 
-out:
        kvfree(in);
        if (err)
                netdev_err(priv->netdev, "refresh tir(0x%x) failed, %d\n", tirn, err);
-       mutex_unlock(&mdev->mlx5e_res.hw_objs.td.list_lock);
 
        return err;
 }
index 89de92d..ebee52a 100644 (file)
@@ -926,9 +926,10 @@ static int mlx5e_dcbnl_getbuffer(struct net_device *dev,
        if (err)
                return err;
 
-       for (i = 0; i < MLX5E_MAX_BUFFER; i++)
+       for (i = 0; i < MLX5E_MAX_NETWORK_BUFFER; i++)
                dcb_buffer->buffer_size[i] = port_buffer.buffer[i].size;
-       dcb_buffer->total_size = port_buffer.port_buffer_size;
+       dcb_buffer->total_size = port_buffer.port_buffer_size -
+                                port_buffer.internal_buffers_size;
 
        return 0;
 }
@@ -970,7 +971,7 @@ static int mlx5e_dcbnl_setbuffer(struct net_device *dev,
        if (err)
                return err;
 
-       for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+       for (i = 0; i < MLX5E_MAX_NETWORK_BUFFER; i++) {
                if (port_buffer.buffer[i].size != dcb_buffer->buffer_size[i]) {
                        changed |= MLX5E_PORT_BUFFER_SIZE;
                        buffer_size = dcb_buffer->buffer_size;
index 2944691..a5bdf78 100644 (file)
@@ -641,7 +641,7 @@ static void mlx5e_free_mpwqe_rq_drop_page(struct mlx5e_rq *rq)
 }
 
 static int mlx5e_init_rxq_rq(struct mlx5e_channel *c, struct mlx5e_params *params,
-                            struct mlx5e_rq *rq)
+                            u32 xdp_frag_size, struct mlx5e_rq *rq)
 {
        struct mlx5_core_dev *mdev = c->mdev;
        int err;
@@ -665,7 +665,8 @@ static int mlx5e_init_rxq_rq(struct mlx5e_channel *c, struct mlx5e_params *param
        if (err)
                return err;
 
-       return xdp_rxq_info_reg(&rq->xdp_rxq, rq->netdev, rq->ix, c->napi.napi_id);
+       return __xdp_rxq_info_reg(&rq->xdp_rxq, rq->netdev, rq->ix, c->napi.napi_id,
+                                 xdp_frag_size);
 }
 
 static int mlx5_rq_shampo_alloc(struct mlx5_core_dev *mdev,
@@ -727,26 +728,6 @@ static void mlx5e_rq_free_shampo(struct mlx5e_rq *rq)
        mlx5e_rq_shampo_hd_free(rq);
 }
 
-static __be32 mlx5e_get_terminate_scatter_list_mkey(struct mlx5_core_dev *dev)
-{
-       u32 out[MLX5_ST_SZ_DW(query_special_contexts_out)] = {};
-       u32 in[MLX5_ST_SZ_DW(query_special_contexts_in)] = {};
-       int res;
-
-       if (!MLX5_CAP_GEN(dev, terminate_scatter_list_mkey))
-               return MLX5_TERMINATE_SCATTER_LIST_LKEY;
-
-       MLX5_SET(query_special_contexts_in, in, opcode,
-                MLX5_CMD_OP_QUERY_SPECIAL_CONTEXTS);
-       res = mlx5_cmd_exec_inout(dev, query_special_contexts, in, out);
-       if (res)
-               return MLX5_TERMINATE_SCATTER_LIST_LKEY;
-
-       res = MLX5_GET(query_special_contexts_out, out,
-                      terminate_scatter_list_mkey);
-       return cpu_to_be32(res);
-}
-
 static int mlx5e_alloc_rq(struct mlx5e_params *params,
                          struct mlx5e_xsk_param *xsk,
                          struct mlx5e_rq_param *rqp,
@@ -908,7 +889,7 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
                        /* check if num_frags is not a pow of two */
                        if (rq->wqe.info.num_frags < (1 << rq->wqe.info.log_num_frags)) {
                                wqe->data[f].byte_count = 0;
-                               wqe->data[f].lkey = mlx5e_get_terminate_scatter_list_mkey(mdev);
+                               wqe->data[f].lkey = params->terminate_lkey_be;
                                wqe->data[f].addr = 0;
                        }
                }
@@ -2260,7 +2241,7 @@ static int mlx5e_open_rxq_rq(struct mlx5e_channel *c, struct mlx5e_params *param
 {
        int err;
 
-       err = mlx5e_init_rxq_rq(c, params, &c->rq);
+       err = mlx5e_init_rxq_rq(c, params, rq_params->xdp_frag_size, &c->rq);
        if (err)
                return err;
 
@@ -5007,6 +4988,8 @@ void mlx5e_build_nic_params(struct mlx5e_priv *priv, struct mlx5e_xsk *xsk, u16
        /* RQ */
        mlx5e_build_rq_params(mdev, params);
 
+       params->terminate_lkey_be = mlx5_core_get_terminate_scatter_list_mkey(mdev);
+
        params->packet_merge.timeout = mlx5e_choose_lro_timeout(mdev, MLX5E_DEFAULT_LRO_TIMEOUT);
 
        /* CQ moderation params */
@@ -5279,12 +5262,16 @@ static int mlx5e_nic_init(struct mlx5_core_dev *mdev,
 
        mlx5e_timestamp_init(priv);
 
+       priv->dfs_root = debugfs_create_dir("nic",
+                                           mlx5_debugfs_get_dev_root(mdev));
+
        fs = mlx5e_fs_init(priv->profile, mdev,
                           !test_bit(MLX5E_STATE_DESTROYING, &priv->state),
                           priv->dfs_root);
        if (!fs) {
                err = -ENOMEM;
                mlx5_core_err(mdev, "FS initialization failed, %d\n", err);
+               debugfs_remove_recursive(priv->dfs_root);
                return err;
        }
        priv->fs = fs;
@@ -5305,6 +5292,7 @@ static void mlx5e_nic_cleanup(struct mlx5e_priv *priv)
        mlx5e_health_destroy_reporters(priv);
        mlx5e_ktls_cleanup(priv);
        mlx5e_fs_cleanup(priv->fs);
+       debugfs_remove_recursive(priv->dfs_root);
        priv->fs = NULL;
 }
 
@@ -5851,8 +5839,8 @@ void mlx5e_detach_netdev(struct mlx5e_priv *priv)
 }
 
 static int
-mlx5e_netdev_attach_profile(struct net_device *netdev, struct mlx5_core_dev *mdev,
-                           const struct mlx5e_profile *new_profile, void *new_ppriv)
+mlx5e_netdev_init_profile(struct net_device *netdev, struct mlx5_core_dev *mdev,
+                         const struct mlx5e_profile *new_profile, void *new_ppriv)
 {
        struct mlx5e_priv *priv = netdev_priv(netdev);
        int err;
@@ -5868,6 +5856,25 @@ mlx5e_netdev_attach_profile(struct net_device *netdev, struct mlx5_core_dev *mde
        err = new_profile->init(priv->mdev, priv->netdev);
        if (err)
                goto priv_cleanup;
+
+       return 0;
+
+priv_cleanup:
+       mlx5e_priv_cleanup(priv);
+       return err;
+}
+
+static int
+mlx5e_netdev_attach_profile(struct net_device *netdev, struct mlx5_core_dev *mdev,
+                           const struct mlx5e_profile *new_profile, void *new_ppriv)
+{
+       struct mlx5e_priv *priv = netdev_priv(netdev);
+       int err;
+
+       err = mlx5e_netdev_init_profile(netdev, mdev, new_profile, new_ppriv);
+       if (err)
+               return err;
+
        err = mlx5e_attach_netdev(priv);
        if (err)
                goto profile_cleanup;
@@ -5875,7 +5882,6 @@ mlx5e_netdev_attach_profile(struct net_device *netdev, struct mlx5_core_dev *mde
 
 profile_cleanup:
        new_profile->cleanup(priv);
-priv_cleanup:
        mlx5e_priv_cleanup(priv);
        return err;
 }
@@ -5894,6 +5900,12 @@ int mlx5e_netdev_change_profile(struct mlx5e_priv *priv,
        priv->profile->cleanup(priv);
        mlx5e_priv_cleanup(priv);
 
+       if (mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) {
+               mlx5e_netdev_init_profile(netdev, mdev, new_profile, new_ppriv);
+               set_bit(MLX5E_STATE_DESTROYING, &priv->state);
+               return -EIO;
+       }
+
        err = mlx5e_netdev_attach_profile(netdev, mdev, new_profile, new_ppriv);
        if (err) { /* roll back to original profile */
                netdev_warn(netdev, "%s: new profile init failed, %d\n", __func__, err);
@@ -5955,8 +5967,11 @@ static int mlx5e_suspend(struct auxiliary_device *adev, pm_message_t state)
        struct net_device *netdev = priv->netdev;
        struct mlx5_core_dev *mdev = priv->mdev;
 
-       if (!netif_device_present(netdev))
+       if (!netif_device_present(netdev)) {
+               if (test_bit(MLX5E_STATE_DESTROYING, &priv->state))
+                       mlx5e_destroy_mdev_resources(mdev);
                return -ENODEV;
+       }
 
        mlx5e_detach_netdev(priv);
        mlx5e_destroy_mdev_resources(mdev);
@@ -6002,9 +6017,6 @@ static int mlx5e_probe(struct auxiliary_device *adev,
        priv->profile = profile;
        priv->ppriv = NULL;
 
-       priv->dfs_root = debugfs_create_dir("nic",
-                                           mlx5_debugfs_get_dev_root(priv->mdev));
-
        err = profile->init(mdev, netdev);
        if (err) {
                mlx5_core_err(mdev, "mlx5e_nic_profile init failed, %d\n", err);
@@ -6033,7 +6045,6 @@ err_resume:
 err_profile_cleanup:
        profile->cleanup(priv);
 err_destroy_netdev:
-       debugfs_remove_recursive(priv->dfs_root);
        mlx5e_destroy_netdev(priv);
 err_devlink_port_unregister:
        mlx5e_devlink_port_unregister(mlx5e_dev);
@@ -6053,7 +6064,6 @@ static void mlx5e_remove(struct auxiliary_device *adev)
        unregister_netdev(priv->netdev);
        mlx5e_suspend(adev, state);
        priv->profile->cleanup(priv);
-       debugfs_remove_recursive(priv->dfs_root);
        mlx5e_destroy_netdev(priv);
        mlx5e_devlink_port_unregister(mlx5e_dev);
        mlx5e_destroy_devlink(mlx5e_dev);
index 1fc386e..3e7041b 100644 (file)
@@ -30,6 +30,7 @@
  * SOFTWARE.
  */
 
+#include <linux/debugfs.h>
 #include <linux/mlx5/fs.h>
 #include <net/switchdev.h>
 #include <net/pkt_cls.h>
@@ -812,11 +813,15 @@ static int mlx5e_init_ul_rep(struct mlx5_core_dev *mdev,
 {
        struct mlx5e_priv *priv = netdev_priv(netdev);
 
+       priv->dfs_root = debugfs_create_dir("nic",
+                                           mlx5_debugfs_get_dev_root(mdev));
+
        priv->fs = mlx5e_fs_init(priv->profile, mdev,
                                 !test_bit(MLX5E_STATE_DESTROYING, &priv->state),
                                 priv->dfs_root);
        if (!priv->fs) {
                netdev_err(priv->netdev, "FS allocation failed\n");
+               debugfs_remove_recursive(priv->dfs_root);
                return -ENOMEM;
        }
 
@@ -829,6 +834,7 @@ static int mlx5e_init_ul_rep(struct mlx5_core_dev *mdev,
 static void mlx5e_cleanup_rep(struct mlx5e_priv *priv)
 {
        mlx5e_fs_cleanup(priv->fs);
+       debugfs_remove_recursive(priv->dfs_root);
        priv->fs = NULL;
 }
 
index 728b82c..b9b1da7 100644 (file)
@@ -1439,6 +1439,7 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
                mlx5e_hairpin_flow_del(priv, flow);
 
        free_flow_post_acts(flow);
+       mlx5_tc_ct_delete_flow(get_ct_priv(flow->priv), attr);
 
        kvfree(attr->parse_attr);
        kfree(flow->attr);
@@ -1665,11 +1666,9 @@ bool mlx5e_tc_is_vf_tunnel(struct net_device *out_dev, struct net_device *route_
 int mlx5e_tc_query_route_vport(struct net_device *out_dev, struct net_device *route_dev, u16 *vport)
 {
        struct mlx5e_priv *out_priv, *route_priv;
-       struct mlx5_devcom *devcom = NULL;
        struct mlx5_core_dev *route_mdev;
        struct mlx5_eswitch *esw;
        u16 vhca_id;
-       int err;
 
        out_priv = netdev_priv(out_dev);
        esw = out_priv->mdev->priv.eswitch;
@@ -1678,6 +1677,9 @@ int mlx5e_tc_query_route_vport(struct net_device *out_dev, struct net_device *ro
 
        vhca_id = MLX5_CAP_GEN(route_mdev, vhca_id);
        if (mlx5_lag_is_active(out_priv->mdev)) {
+               struct mlx5_devcom *devcom;
+               int err;
+
                /* In lag case we may get devices from different eswitch instances.
                 * If we failed to get vport num, it means, mostly, that we on the wrong
                 * eswitch.
@@ -1686,101 +1688,16 @@ int mlx5e_tc_query_route_vport(struct net_device *out_dev, struct net_device *ro
                if (err != -ENOENT)
                        return err;
 
+               rcu_read_lock();
                devcom = out_priv->mdev->priv.devcom;
-               esw = mlx5_devcom_get_peer_data(devcom, MLX5_DEVCOM_ESW_OFFLOADS);
-               if (!esw)
-                       return -ENODEV;
-       }
-
-       err = mlx5_eswitch_vhca_id_to_vport(esw, vhca_id, vport);
-       if (devcom)
-               mlx5_devcom_release_peer_data(devcom, MLX5_DEVCOM_ESW_OFFLOADS);
-       return err;
-}
-
-static int
-set_encap_dests(struct mlx5e_priv *priv,
-               struct mlx5e_tc_flow *flow,
-               struct mlx5_flow_attr *attr,
-               struct netlink_ext_ack *extack,
-               bool *vf_tun)
-{
-       struct mlx5e_tc_flow_parse_attr *parse_attr;
-       struct mlx5_esw_flow_attr *esw_attr;
-       struct net_device *encap_dev = NULL;
-       struct mlx5e_rep_priv *rpriv;
-       struct mlx5e_priv *out_priv;
-       int out_index;
-       int err = 0;
-
-       if (!mlx5e_is_eswitch_flow(flow))
-               return 0;
-
-       parse_attr = attr->parse_attr;
-       esw_attr = attr->esw_attr;
-       *vf_tun = false;
-
-       for (out_index = 0; out_index < MLX5_MAX_FLOW_FWD_VPORTS; out_index++) {
-               struct net_device *out_dev;
-               int mirred_ifindex;
-
-               if (!(esw_attr->dests[out_index].flags & MLX5_ESW_DEST_ENCAP))
-                       continue;
-
-               mirred_ifindex = parse_attr->mirred_ifindex[out_index];
-               out_dev = dev_get_by_index(dev_net(priv->netdev), mirred_ifindex);
-               if (!out_dev) {
-                       NL_SET_ERR_MSG_MOD(extack, "Requested mirred device not found");
-                       err = -ENODEV;
-                       goto out;
-               }
-               err = mlx5e_attach_encap(priv, flow, attr, out_dev, out_index,
-                                        extack, &encap_dev);
-               dev_put(out_dev);
-               if (err)
-                       goto out;
-
-               if (esw_attr->dests[out_index].flags &
-                   MLX5_ESW_DEST_CHAIN_WITH_SRC_PORT_CHANGE &&
-                   !esw_attr->dest_int_port)
-                       *vf_tun = true;
+               esw = mlx5_devcom_get_peer_data_rcu(devcom, MLX5_DEVCOM_ESW_OFFLOADS);
+               err = esw ? mlx5_eswitch_vhca_id_to_vport(esw, vhca_id, vport) : -ENODEV;
+               rcu_read_unlock();
 
-               out_priv = netdev_priv(encap_dev);
-               rpriv = out_priv->ppriv;
-               esw_attr->dests[out_index].rep = rpriv->rep;
-               esw_attr->dests[out_index].mdev = out_priv->mdev;
-       }
-
-       if (*vf_tun && esw_attr->out_count > 1) {
-               NL_SET_ERR_MSG_MOD(extack, "VF tunnel encap with mirroring is not supported");
-               err = -EOPNOTSUPP;
-               goto out;
+               return err;
        }
 
-out:
-       return err;
-}
-
-static void
-clean_encap_dests(struct mlx5e_priv *priv,
-                 struct mlx5e_tc_flow *flow,
-                 struct mlx5_flow_attr *attr)
-{
-       struct mlx5_esw_flow_attr *esw_attr;
-       int out_index;
-
-       if (!mlx5e_is_eswitch_flow(flow))
-               return;
-
-       esw_attr = attr->esw_attr;
-
-       for (out_index = 0; out_index < MLX5_MAX_FLOW_FWD_VPORTS; out_index++) {
-               if (!(esw_attr->dests[out_index].flags & MLX5_ESW_DEST_ENCAP))
-                       continue;
-
-               mlx5e_detach_encap(priv, flow, attr, out_index);
-               kfree(attr->parse_attr->tun_info[out_index]);
-       }
+       return mlx5_eswitch_vhca_id_to_vport(esw, vhca_id, vport);
 }
 
 static int
@@ -1819,7 +1736,7 @@ post_process_attr(struct mlx5e_tc_flow *flow,
        if (err)
                goto err_out;
 
-       err = set_encap_dests(flow->priv, flow, attr, extack, &vf_tun);
+       err = mlx5e_tc_tun_encap_dests_set(flow->priv, flow, attr, extack, &vf_tun);
        if (err)
                goto err_out;
 
@@ -3943,8 +3860,8 @@ parse_tc_actions(struct mlx5e_tc_act_parse_state *parse_state,
        struct mlx5_flow_attr *prev_attr;
        struct flow_action_entry *act;
        struct mlx5e_tc_act *tc_act;
+       int err, i, i_split = 0;
        bool is_missable;
-       int err, i;
 
        ns_type = mlx5e_get_flow_namespace(flow);
        list_add(&attr->list, &flow->attrs);
@@ -3985,7 +3902,8 @@ parse_tc_actions(struct mlx5e_tc_act_parse_state *parse_state,
                    i < flow_action->num_entries - 1)) {
                        is_missable = tc_act->is_missable ? tc_act->is_missable(act) : false;
 
-                       err = mlx5e_tc_act_post_parse(parse_state, flow_action, attr, ns_type);
+                       err = mlx5e_tc_act_post_parse(parse_state, flow_action, i_split, i, attr,
+                                                     ns_type);
                        if (err)
                                goto out_free_post_acts;
 
@@ -3995,6 +3913,7 @@ parse_tc_actions(struct mlx5e_tc_act_parse_state *parse_state,
                                goto out_free_post_acts;
                        }
 
+                       i_split = i + 1;
                        list_add(&attr->list, &flow->attrs);
                }
 
@@ -4009,7 +3928,7 @@ parse_tc_actions(struct mlx5e_tc_act_parse_state *parse_state,
                }
        }
 
-       err = mlx5e_tc_act_post_parse(parse_state, flow_action, attr, ns_type);
+       err = mlx5e_tc_act_post_parse(parse_state, flow_action, i_split, i, attr, ns_type);
        if (err)
                goto out_free_post_acts;
 
@@ -4323,7 +4242,7 @@ mlx5_free_flow_attr_actions(struct mlx5e_tc_flow *flow, struct mlx5_flow_attr *a
        if (attr->post_act_handle)
                mlx5e_tc_post_act_del(get_post_action(flow->priv), attr->post_act_handle);
 
-       clean_encap_dests(flow->priv, flow, attr);
+       mlx5e_tc_tun_encap_dests_unset(flow->priv, flow, attr);
 
        if (attr->action & MLX5_FLOW_CONTEXT_ACTION_COUNT)
                mlx5_fc_destroy(counter_dev, attr->counter);
@@ -5301,6 +5220,8 @@ int mlx5e_tc_esw_init(struct mlx5_rep_uplink_priv *uplink_priv)
                goto err_action_counter;
        }
 
+       mlx5_esw_offloads_devcom_init(esw);
+
        return 0;
 
 err_action_counter:
@@ -5329,7 +5250,7 @@ void mlx5e_tc_esw_cleanup(struct mlx5_rep_uplink_priv *uplink_priv)
        priv = netdev_priv(rpriv->netdev);
        esw = priv->mdev->priv.eswitch;
 
-       mlx5e_tc_clean_fdb_peer_flows(esw);
+       mlx5_esw_offloads_devcom_cleanup(esw);
 
        mlx5e_tc_tun_cleanup(uplink_priv->encap);
 
@@ -5643,22 +5564,43 @@ bool mlx5e_tc_update_skb_nic(struct mlx5_cqe64 *cqe, struct sk_buff *skb)
                                   0, NULL);
 }
 
+static struct mapping_ctx *
+mlx5e_get_priv_obj_mapping(struct mlx5e_priv *priv)
+{
+       struct mlx5e_tc_table *tc;
+       struct mlx5_eswitch *esw;
+       struct mapping_ctx *ctx;
+
+       if (is_mdev_switchdev_mode(priv->mdev)) {
+               esw = priv->mdev->priv.eswitch;
+               ctx = esw->offloads.reg_c0_obj_pool;
+       } else {
+               tc = mlx5e_fs_get_tc(priv->fs);
+               ctx = tc->mapping;
+       }
+
+       return ctx;
+}
+
 int mlx5e_tc_action_miss_mapping_get(struct mlx5e_priv *priv, struct mlx5_flow_attr *attr,
                                     u64 act_miss_cookie, u32 *act_miss_mapping)
 {
-       struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
        struct mlx5_mapped_obj mapped_obj = {};
+       struct mlx5_eswitch *esw;
        struct mapping_ctx *ctx;
        int err;
 
-       ctx = esw->offloads.reg_c0_obj_pool;
-
+       ctx = mlx5e_get_priv_obj_mapping(priv);
        mapped_obj.type = MLX5_MAPPED_OBJ_ACT_MISS;
        mapped_obj.act_miss_cookie = act_miss_cookie;
        err = mapping_add(ctx, &mapped_obj, act_miss_mapping);
        if (err)
                return err;
 
+       if (!is_mdev_switchdev_mode(priv->mdev))
+               return 0;
+
+       esw = priv->mdev->priv.eswitch;
        attr->act_id_restore_rule = esw_add_restore_rule(esw, *act_miss_mapping);
        if (IS_ERR(attr->act_id_restore_rule))
                goto err_rule;
@@ -5673,10 +5615,9 @@ err_rule:
 void mlx5e_tc_action_miss_mapping_put(struct mlx5e_priv *priv, struct mlx5_flow_attr *attr,
                                      u32 act_miss_mapping)
 {
-       struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
-       struct mapping_ctx *ctx;
+       struct mapping_ctx *ctx = mlx5e_get_priv_obj_mapping(priv);
 
-       ctx = esw->offloads.reg_c0_obj_pool;
-       mlx5_del_flow_rules(attr->act_id_restore_rule);
+       if (is_mdev_switchdev_mode(priv->mdev))
+               mlx5_del_flow_rules(attr->act_id_restore_rule);
        mapping_remove(ctx, act_miss_mapping);
 }
index df5e780..c7eb6b2 100644 (file)
@@ -762,6 +762,17 @@ static void mlx5e_tx_wi_consume_fifo_skbs(struct mlx5e_txqsq *sq, struct mlx5e_t
        }
 }
 
+void mlx5e_txqsq_wake(struct mlx5e_txqsq *sq)
+{
+       if (netif_tx_queue_stopped(sq->txq) &&
+           mlx5e_wqc_has_room_for(&sq->wq, sq->cc, sq->pc, sq->stop_room) &&
+           mlx5e_ptpsq_fifo_has_room(sq) &&
+           !test_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state)) {
+               netif_tx_wake_queue(sq->txq);
+               sq->stats->wake++;
+       }
+}
+
 bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
 {
        struct mlx5e_sq_stats *stats;
@@ -861,13 +872,7 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
 
        netdev_tx_completed_queue(sq->txq, npkts, nbytes);
 
-       if (netif_tx_queue_stopped(sq->txq) &&
-           mlx5e_wqc_has_room_for(&sq->wq, sq->cc, sq->pc, sq->stop_room) &&
-           mlx5e_ptpsq_fifo_has_room(sq) &&
-           !test_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state)) {
-               netif_tx_wake_queue(sq->txq);
-               stats->wake++;
-       }
+       mlx5e_txqsq_wake(sq);
 
        return (i == MLX5E_TX_CQ_POLL_BUDGET);
 }
index a50bfda..fbb2d96 100644 (file)
@@ -161,20 +161,22 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
                }
        }
 
+       /* budget=0 means we may be in IRQ context, do as little as possible */
+       if (unlikely(!budget))
+               goto out;
+
        busy |= mlx5e_poll_xdpsq_cq(&c->xdpsq.cq);
 
        if (c->xdp)
                busy |= mlx5e_poll_xdpsq_cq(&c->rq_xdpsq.cq);
 
-       if (likely(budget)) { /* budget=0 means: don't poll rx rings */
-               if (xsk_open)
-                       work_done = mlx5e_poll_rx_cq(&xskrq->cq, budget);
+       if (xsk_open)
+               work_done = mlx5e_poll_rx_cq(&xskrq->cq, budget);
 
-               if (likely(budget - work_done))
-                       work_done += mlx5e_poll_rx_cq(&rq->cq, budget - work_done);
+       if (likely(budget - work_done))
+               work_done += mlx5e_poll_rx_cq(&rq->cq, budget - work_done);
 
-               busy |= work_done == budget;
-       }
+       busy |= work_done == budget;
 
        mlx5e_poll_ico_cq(&c->icosq.cq);
        if (mlx5e_poll_ico_cq(&c->async_icosq.cq))
index 1c35d72..3db4866 100644 (file)
@@ -824,7 +824,7 @@ static int comp_irqs_request_pci(struct mlx5_core_dev *dev)
        ncomp_eqs = table->num_comp_eqs;
        cpus = kcalloc(ncomp_eqs, sizeof(*cpus), GFP_KERNEL);
        if (!cpus)
-               ret = -ENOMEM;
+               return -ENOMEM;
 
        i = 0;
        rcu_read_lock();
@@ -1104,7 +1104,7 @@ void mlx5_core_eq_free_irqs(struct mlx5_core_dev *dev)
        struct mlx5_eq_table *table = dev->priv.eq_table;
 
        mutex_lock(&table->lock); /* sync with create/destroy_async_eq */
-       mlx5_irq_table_destroy(dev);
+       mlx5_irq_table_free_irqs(dev);
        mutex_unlock(&table->lock);
 }
 
index 1a042c9..add6cfa 100644 (file)
@@ -342,6 +342,7 @@ struct mlx5_eswitch {
                u32             large_group_num;
        }  params;
        struct blocking_notifier_head n_head;
+       bool paired[MLX5_MAX_PORTS];
 };
 
 void esw_offloads_disable(struct mlx5_eswitch *esw);
@@ -369,6 +370,8 @@ int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int num_vfs);
 void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw, bool clear_vf);
 void mlx5_eswitch_disable_locked(struct mlx5_eswitch *esw);
 void mlx5_eswitch_disable(struct mlx5_eswitch *esw);
+void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw);
+void mlx5_esw_offloads_devcom_cleanup(struct mlx5_eswitch *esw);
 int mlx5_eswitch_set_vport_mac(struct mlx5_eswitch *esw,
                               u16 vport, const u8 *mac);
 int mlx5_eswitch_set_vport_state(struct mlx5_eswitch *esw,
@@ -767,6 +770,8 @@ static inline void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw) {}
 static inline int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int num_vfs) { return 0; }
 static inline void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw, bool clear_vf) {}
 static inline void mlx5_eswitch_disable(struct mlx5_eswitch *esw) {}
+static inline void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw) {}
+static inline void mlx5_esw_offloads_devcom_cleanup(struct mlx5_eswitch *esw) {}
 static inline bool mlx5_eswitch_is_funcs_handler(struct mlx5_core_dev *dev) { return false; }
 static inline
 int mlx5_eswitch_set_vport_state(struct mlx5_eswitch *esw, u16 vport, int link_state) { return 0; }
index 69215ff..8d19c20 100644 (file)
@@ -2742,6 +2742,9 @@ static int mlx5_esw_offloads_devcom_event(int event,
                    mlx5_eswitch_vport_match_metadata_enabled(peer_esw))
                        break;
 
+               if (esw->paired[mlx5_get_dev_index(peer_esw->dev)])
+                       break;
+
                err = mlx5_esw_offloads_set_ns_peer(esw, peer_esw, true);
                if (err)
                        goto err_out;
@@ -2753,14 +2756,18 @@ static int mlx5_esw_offloads_devcom_event(int event,
                if (err)
                        goto err_pair;
 
+               esw->paired[mlx5_get_dev_index(peer_esw->dev)] = true;
+               peer_esw->paired[mlx5_get_dev_index(esw->dev)] = true;
                mlx5_devcom_set_paired(devcom, MLX5_DEVCOM_ESW_OFFLOADS, true);
                break;
 
        case ESW_OFFLOADS_DEVCOM_UNPAIR:
-               if (!mlx5_devcom_is_paired(devcom, MLX5_DEVCOM_ESW_OFFLOADS))
+               if (!esw->paired[mlx5_get_dev_index(peer_esw->dev)])
                        break;
 
                mlx5_devcom_set_paired(devcom, MLX5_DEVCOM_ESW_OFFLOADS, false);
+               esw->paired[mlx5_get_dev_index(peer_esw->dev)] = false;
+               peer_esw->paired[mlx5_get_dev_index(esw->dev)] = false;
                mlx5_esw_offloads_unpair(peer_esw);
                mlx5_esw_offloads_unpair(esw);
                mlx5_esw_offloads_set_ns_peer(esw, peer_esw, false);
@@ -2779,7 +2786,7 @@ err_out:
        return err;
 }
 
-static void esw_offloads_devcom_init(struct mlx5_eswitch *esw)
+void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw)
 {
        struct mlx5_devcom *devcom = esw->dev->priv.devcom;
 
@@ -2802,7 +2809,7 @@ static void esw_offloads_devcom_init(struct mlx5_eswitch *esw)
                               ESW_OFFLOADS_DEVCOM_PAIR, esw);
 }
 
-static void esw_offloads_devcom_cleanup(struct mlx5_eswitch *esw)
+void mlx5_esw_offloads_devcom_cleanup(struct mlx5_eswitch *esw)
 {
        struct mlx5_devcom *devcom = esw->dev->priv.devcom;
 
@@ -3250,8 +3257,6 @@ int esw_offloads_enable(struct mlx5_eswitch *esw)
        if (err)
                goto err_vports;
 
-       esw_offloads_devcom_init(esw);
-
        return 0;
 
 err_vports:
@@ -3292,7 +3297,6 @@ static int esw_offloads_stop(struct mlx5_eswitch *esw,
 
 void esw_offloads_disable(struct mlx5_eswitch *esw)
 {
-       esw_offloads_devcom_cleanup(esw);
        mlx5_eswitch_disable_pf_vf_vports(esw);
        esw_offloads_unload_rep(esw, MLX5_VPORT_UPLINK);
        esw_set_passing_vport_metadata(esw, false);
index 144e594..ec83e64 100644 (file)
@@ -511,10 +511,11 @@ static int mlx5_cmd_set_fte(struct mlx5_core_dev *dev,
        struct mlx5_flow_rule *dst;
        void *in_flow_context, *vlan;
        void *in_match_value;
+       int reformat_id = 0;
        unsigned int inlen;
        int dst_cnt_size;
+       u32 *in, action;
        void *in_dests;
-       u32 *in;
        int err;
 
        if (mlx5_set_extended_dest(dev, fte, &extended_dest))
@@ -553,22 +554,42 @@ static int mlx5_cmd_set_fte(struct mlx5_core_dev *dev,
 
        MLX5_SET(flow_context, in_flow_context, extended_destination,
                 extended_dest);
-       if (extended_dest) {
-               u32 action;
 
-               action = fte->action.action &
-                       ~MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT;
-               MLX5_SET(flow_context, in_flow_context, action, action);
-       } else {
-               MLX5_SET(flow_context, in_flow_context, action,
-                        fte->action.action);
-               if (fte->action.pkt_reformat)
-                       MLX5_SET(flow_context, in_flow_context, packet_reformat_id,
-                                fte->action.pkt_reformat->id);
+       action = fte->action.action;
+       if (extended_dest)
+               action &= ~MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT;
+
+       MLX5_SET(flow_context, in_flow_context, action, action);
+
+       if (!extended_dest && fte->action.pkt_reformat) {
+               struct mlx5_pkt_reformat *pkt_reformat = fte->action.pkt_reformat;
+
+               if (pkt_reformat->owner == MLX5_FLOW_RESOURCE_OWNER_SW) {
+                       reformat_id = mlx5_fs_dr_action_get_pkt_reformat_id(pkt_reformat);
+                       if (reformat_id < 0) {
+                               mlx5_core_err(dev,
+                                             "Unsupported SW-owned pkt_reformat type (%d) in FW-owned table\n",
+                                             pkt_reformat->reformat_type);
+                               err = reformat_id;
+                               goto err_out;
+                       }
+               } else {
+                       reformat_id = fte->action.pkt_reformat->id;
+               }
        }
-       if (fte->action.modify_hdr)
+
+       MLX5_SET(flow_context, in_flow_context, packet_reformat_id, (u32)reformat_id);
+
+       if (fte->action.modify_hdr) {
+               if (fte->action.modify_hdr->owner == MLX5_FLOW_RESOURCE_OWNER_SW) {
+                       mlx5_core_err(dev, "Can't use SW-owned modify_hdr in FW-owned table\n");
+                       err = -EOPNOTSUPP;
+                       goto err_out;
+               }
+
                MLX5_SET(flow_context, in_flow_context, modify_header_id,
                         fte->action.modify_hdr->id);
+       }
 
        MLX5_SET(flow_context, in_flow_context, encrypt_decrypt_type,
                 fte->action.crypto.type);
@@ -885,6 +906,8 @@ static int mlx5_cmd_packet_reformat_alloc(struct mlx5_flow_root_namespace *ns,
 
        pkt_reformat->id = MLX5_GET(alloc_packet_reformat_context_out,
                                    out, packet_reformat_id);
+       pkt_reformat->owner = MLX5_FLOW_RESOURCE_OWNER_FW;
+
        kfree(in);
        return err;
 }
@@ -969,6 +992,7 @@ static int mlx5_cmd_modify_header_alloc(struct mlx5_flow_root_namespace *ns,
        err = mlx5_cmd_exec(dev, in, inlen, out, sizeof(out));
 
        modify_hdr->id = MLX5_GET(alloc_modify_header_context_out, out, modify_header_id);
+       modify_hdr->owner = MLX5_FLOW_RESOURCE_OWNER_FW;
        kfree(in);
        return err;
 }
index f137a06..b043190 100644 (file)
@@ -54,8 +54,14 @@ struct mlx5_flow_definer {
        u32 id;
 };
 
+enum mlx5_flow_resource_owner {
+       MLX5_FLOW_RESOURCE_OWNER_FW,
+       MLX5_FLOW_RESOURCE_OWNER_SW,
+};
+
 struct mlx5_modify_hdr {
        enum mlx5_flow_namespace_type ns_type;
+       enum mlx5_flow_resource_owner owner;
        union {
                struct mlx5_fs_dr_action action;
                u32 id;
@@ -65,6 +71,7 @@ struct mlx5_modify_hdr {
 struct mlx5_pkt_reformat {
        enum mlx5_flow_namespace_type ns_type;
        int reformat_type; /* from mlx5_ifc */
+       enum mlx5_flow_resource_owner owner;
        union {
                struct mlx5_fs_dr_action action;
                u32 id;
index adefde3..b7d779d 100644 (file)
@@ -3,6 +3,7 @@
 
 #include <linux/mlx5/vport.h>
 #include "lib/devcom.h"
+#include "mlx5_core.h"
 
 static LIST_HEAD(devcom_list);
 
@@ -13,7 +14,7 @@ static LIST_HEAD(devcom_list);
 
 struct mlx5_devcom_component {
        struct {
-               void *data;
+               void __rcu *data;
        } device[MLX5_DEVCOM_PORTS_SUPPORTED];
 
        mlx5_devcom_event_handler_t handler;
@@ -77,6 +78,7 @@ struct mlx5_devcom *mlx5_devcom_register_device(struct mlx5_core_dev *dev)
        if (MLX5_CAP_GEN(dev, num_lag_ports) != MLX5_DEVCOM_PORTS_SUPPORTED)
                return NULL;
 
+       mlx5_dev_list_lock();
        sguid0 = mlx5_query_nic_system_image_guid(dev);
        list_for_each_entry(iter, &devcom_list, list) {
                struct mlx5_core_dev *tmp_dev = NULL;
@@ -102,8 +104,10 @@ struct mlx5_devcom *mlx5_devcom_register_device(struct mlx5_core_dev *dev)
 
        if (!priv) {
                priv = mlx5_devcom_list_alloc();
-               if (!priv)
-                       return ERR_PTR(-ENOMEM);
+               if (!priv) {
+                       devcom = ERR_PTR(-ENOMEM);
+                       goto out;
+               }
 
                idx = 0;
                new_priv = true;
@@ -112,13 +116,16 @@ struct mlx5_devcom *mlx5_devcom_register_device(struct mlx5_core_dev *dev)
        priv->devs[idx] = dev;
        devcom = mlx5_devcom_alloc(priv, idx);
        if (!devcom) {
-               kfree(priv);
-               return ERR_PTR(-ENOMEM);
+               if (new_priv)
+                       kfree(priv);
+               devcom = ERR_PTR(-ENOMEM);
+               goto out;
        }
 
        if (new_priv)
                list_add(&priv->list, &devcom_list);
-
+out:
+       mlx5_dev_list_unlock();
        return devcom;
 }
 
@@ -131,6 +138,7 @@ void mlx5_devcom_unregister_device(struct mlx5_devcom *devcom)
        if (IS_ERR_OR_NULL(devcom))
                return;
 
+       mlx5_dev_list_lock();
        priv = devcom->priv;
        priv->devs[devcom->idx] = NULL;
 
@@ -141,10 +149,12 @@ void mlx5_devcom_unregister_device(struct mlx5_devcom *devcom)
                        break;
 
        if (i != MLX5_DEVCOM_PORTS_SUPPORTED)
-               return;
+               goto out;
 
        list_del(&priv->list);
        kfree(priv);
+out:
+       mlx5_dev_list_unlock();
 }
 
 void mlx5_devcom_register_component(struct mlx5_devcom *devcom,
@@ -162,7 +172,7 @@ void mlx5_devcom_register_component(struct mlx5_devcom *devcom,
        comp = &devcom->priv->components[id];
        down_write(&comp->sem);
        comp->handler = handler;
-       comp->device[devcom->idx].data = data;
+       rcu_assign_pointer(comp->device[devcom->idx].data, data);
        up_write(&comp->sem);
 }
 
@@ -176,8 +186,9 @@ void mlx5_devcom_unregister_component(struct mlx5_devcom *devcom,
 
        comp = &devcom->priv->components[id];
        down_write(&comp->sem);
-       comp->device[devcom->idx].data = NULL;
+       RCU_INIT_POINTER(comp->device[devcom->idx].data, NULL);
        up_write(&comp->sem);
+       synchronize_rcu();
 }
 
 int mlx5_devcom_send_event(struct mlx5_devcom *devcom,
@@ -193,12 +204,15 @@ int mlx5_devcom_send_event(struct mlx5_devcom *devcom,
 
        comp = &devcom->priv->components[id];
        down_write(&comp->sem);
-       for (i = 0; i < MLX5_DEVCOM_PORTS_SUPPORTED; i++)
-               if (i != devcom->idx && comp->device[i].data) {
-                       err = comp->handler(event, comp->device[i].data,
-                                           event_data);
+       for (i = 0; i < MLX5_DEVCOM_PORTS_SUPPORTED; i++) {
+               void *data = rcu_dereference_protected(comp->device[i].data,
+                                                      lockdep_is_held(&comp->sem));
+
+               if (i != devcom->idx && data) {
+                       err = comp->handler(event, data, event_data);
                        break;
                }
+       }
 
        up_write(&comp->sem);
        return err;
@@ -213,7 +227,7 @@ void mlx5_devcom_set_paired(struct mlx5_devcom *devcom,
        comp = &devcom->priv->components[id];
        WARN_ON(!rwsem_is_locked(&comp->sem));
 
-       comp->paired = paired;
+       WRITE_ONCE(comp->paired, paired);
 }
 
 bool mlx5_devcom_is_paired(struct mlx5_devcom *devcom,
@@ -222,7 +236,7 @@ bool mlx5_devcom_is_paired(struct mlx5_devcom *devcom,
        if (IS_ERR_OR_NULL(devcom))
                return false;
 
-       return devcom->priv->components[id].paired;
+       return READ_ONCE(devcom->priv->components[id].paired);
 }
 
 void *mlx5_devcom_get_peer_data(struct mlx5_devcom *devcom,
@@ -236,7 +250,7 @@ void *mlx5_devcom_get_peer_data(struct mlx5_devcom *devcom,
 
        comp = &devcom->priv->components[id];
        down_read(&comp->sem);
-       if (!comp->paired) {
+       if (!READ_ONCE(comp->paired)) {
                up_read(&comp->sem);
                return NULL;
        }
@@ -245,7 +259,29 @@ void *mlx5_devcom_get_peer_data(struct mlx5_devcom *devcom,
                if (i != devcom->idx)
                        break;
 
-       return comp->device[i].data;
+       return rcu_dereference_protected(comp->device[i].data, lockdep_is_held(&comp->sem));
+}
+
+void *mlx5_devcom_get_peer_data_rcu(struct mlx5_devcom *devcom, enum mlx5_devcom_components id)
+{
+       struct mlx5_devcom_component *comp;
+       int i;
+
+       if (IS_ERR_OR_NULL(devcom))
+               return NULL;
+
+       for (i = 0; i < MLX5_DEVCOM_PORTS_SUPPORTED; i++)
+               if (i != devcom->idx)
+                       break;
+
+       comp = &devcom->priv->components[id];
+       /* This can change concurrently; however, the 'data' pointer will
+        * remain valid for the duration of the RCU read section.
+        */
+       if (!READ_ONCE(comp->paired))
+               return NULL;
+
+       return rcu_dereference(comp->device[i].data);
 }
 
 void mlx5_devcom_release_peer_data(struct mlx5_devcom *devcom,
index 94313c1..9a496f4 100644 (file)
@@ -41,6 +41,7 @@ bool mlx5_devcom_is_paired(struct mlx5_devcom *devcom,
 
 void *mlx5_devcom_get_peer_data(struct mlx5_devcom *devcom,
                                enum mlx5_devcom_components id);
+void *mlx5_devcom_get_peer_data_rcu(struct mlx5_devcom *devcom, enum mlx5_devcom_components id);
 void mlx5_devcom_release_peer_data(struct mlx5_devcom *devcom,
                                   enum mlx5_devcom_components id);
 
index 995eb2d..d6ee016 100644 (file)
@@ -923,7 +923,6 @@ static int mlx5_pci_init(struct mlx5_core_dev *dev, struct pci_dev *pdev,
        }
 
        mlx5_pci_vsc_init(dev);
-       dev->caps.embedded_cpu = mlx5_read_embedded_cpu(dev);
        return 0;
 
 err_clr_master:
@@ -1049,7 +1048,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
 
        dev->dm = mlx5_dm_create(dev);
        if (IS_ERR(dev->dm))
-               mlx5_core_warn(dev, "Failed to init device memory%d\n", err);
+               mlx5_core_warn(dev, "Failed to init device memory %ld\n", PTR_ERR(dev->dm));
 
        dev->tracer = mlx5_fw_tracer_create(dev);
        dev->hv_vhca = mlx5_hv_vhca_create(dev);
@@ -1155,6 +1154,7 @@ static int mlx5_function_setup(struct mlx5_core_dev *dev, bool boot, u64 timeout
                goto err_cmd_cleanup;
        }
 
+       dev->caps.embedded_cpu = mlx5_read_embedded_cpu(dev);
        mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_UP);
 
        mlx5_start_health_poll(dev);
@@ -1802,15 +1802,16 @@ static void remove_one(struct pci_dev *pdev)
        struct devlink *devlink = priv_to_devlink(dev);
 
        set_bit(MLX5_BREAK_FW_WAIT, &dev->intf_state);
-       /* mlx5_drain_fw_reset() is using devlink APIs. Hence, we must drain
-        * fw_reset before unregistering the devlink.
+       /* mlx5_drain_fw_reset() and mlx5_drain_health_wq() are using
+        * devlink notify APIs.
+        * Hence, we must drain them before unregistering the devlink.
         */
        mlx5_drain_fw_reset(dev);
+       mlx5_drain_health_wq(dev);
        devlink_unregister(devlink);
        mlx5_sriov_disable(pdev);
        mlx5_thermal_uninit(dev);
        mlx5_crdump_disable(dev);
-       mlx5_drain_health_wq(dev);
        mlx5_uninit_one(dev);
        mlx5_pci_close(dev);
        mlx5_mdev_uninit(dev);
index 1d87937..2295204 100644 (file)
@@ -276,18 +276,6 @@ static inline bool mlx5_sriov_is_enabled(struct mlx5_core_dev *dev)
        return pci_num_vf(dev->pdev) ? true : false;
 }
 
-static inline int mlx5_lag_is_lacp_owner(struct mlx5_core_dev *dev)
-{
-       /* LACP owner conditions:
-        * 1) Function is physical.
-        * 2) LAG is supported by FW.
-        * 3) LAG is managed by driver (currently the only option).
-        */
-       return  MLX5_CAP_GEN(dev, vport_group_manager) &&
-                  (MLX5_CAP_GEN(dev, num_lag_ports) > 1) &&
-                   MLX5_CAP_GEN(dev, lag_master);
-}
-
 int mlx5_rescan_drivers_locked(struct mlx5_core_dev *dev);
 static inline int mlx5_rescan_drivers(struct mlx5_core_dev *dev)
 {
index efd0c29..aa403a5 100644 (file)
@@ -15,6 +15,7 @@ int mlx5_irq_table_init(struct mlx5_core_dev *dev);
 void mlx5_irq_table_cleanup(struct mlx5_core_dev *dev);
 int mlx5_irq_table_create(struct mlx5_core_dev *dev);
 void mlx5_irq_table_destroy(struct mlx5_core_dev *dev);
+void mlx5_irq_table_free_irqs(struct mlx5_core_dev *dev);
 int mlx5_irq_table_get_num_comp(struct mlx5_irq_table *table);
 int mlx5_irq_table_get_sfs_vec(struct mlx5_irq_table *table);
 struct mlx5_irq_table *mlx5_irq_table_get(struct mlx5_core_dev *dev);
index 9d735c3..678f0be 100644 (file)
@@ -32,6 +32,7 @@
 
 #include <linux/kernel.h>
 #include <linux/mlx5/driver.h>
+#include <linux/mlx5/qp.h>
 #include "mlx5_core.h"
 
 int mlx5_core_create_mkey(struct mlx5_core_dev *dev, u32 *mkey, u32 *in,
@@ -122,3 +123,23 @@ int mlx5_core_destroy_psv(struct mlx5_core_dev *dev, int psv_num)
        return mlx5_cmd_exec_in(dev, destroy_psv, in);
 }
 EXPORT_SYMBOL(mlx5_core_destroy_psv);
+
+__be32 mlx5_core_get_terminate_scatter_list_mkey(struct mlx5_core_dev *dev)
+{
+       u32 out[MLX5_ST_SZ_DW(query_special_contexts_out)] = {};
+       u32 in[MLX5_ST_SZ_DW(query_special_contexts_in)] = {};
+       u32 mkey;
+
+       if (!MLX5_CAP_GEN(dev, terminate_scatter_list_mkey))
+               return MLX5_TERMINATE_SCATTER_LIST_LKEY;
+
+       MLX5_SET(query_special_contexts_in, in, opcode,
+                MLX5_CMD_OP_QUERY_SPECIAL_CONTEXTS);
+       if (mlx5_cmd_exec_inout(dev, query_special_contexts, in, out))
+               return MLX5_TERMINATE_SCATTER_LIST_LKEY;
+
+       mkey = MLX5_GET(query_special_contexts_out, out,
+                       terminate_scatter_list_mkey);
+       return cpu_to_be32(mkey);
+}
+EXPORT_SYMBOL(mlx5_core_get_terminate_scatter_list_mkey);
index 2245d3b..98412bd 100644 (file)
@@ -32,6 +32,7 @@ struct mlx5_irq {
        struct mlx5_irq_pool *pool;
        int refcount;
        struct msi_map map;
+       u32 pool_index;
 };
 
 struct mlx5_irq_table {
@@ -125,14 +126,22 @@ out:
        return ret;
 }
 
-static void irq_release(struct mlx5_irq *irq)
+/* mlx5_system_free_irq - Free an IRQ
+ * @irq: IRQ to free
+ *
+ * Free the IRQ and other resources such as rmap from the system.
+ * BUT it doesn't free or remove the reference from mlx5.
+ * This function is very important for the shutdown flow, where we need to
+ * clean up system resources but keep mlx5 objects alive,
+ * see mlx5_irq_table_free_irqs().
+ */
+static void mlx5_system_free_irq(struct mlx5_irq *irq)
 {
        struct mlx5_irq_pool *pool = irq->pool;
 #ifdef CONFIG_RFS_ACCEL
        struct cpu_rmap *rmap;
 #endif
 
-       xa_erase(&pool->irqs, irq->map.index);
        /* free_irq requires that affinity_hint and rmap will be cleared before
         * calling it. To satisfy this requirement, we call
         * irq_cpu_rmap_remove() to remove the notifier
@@ -140,14 +149,22 @@ static void irq_release(struct mlx5_irq *irq)
        irq_update_affinity_hint(irq->map.virq, NULL);
 #ifdef CONFIG_RFS_ACCEL
        rmap = mlx5_eq_table_get_rmap(pool->dev);
-       if (rmap && irq->map.index)
+       if (rmap)
                irq_cpu_rmap_remove(rmap, irq->map.virq);
 #endif
 
-       free_cpumask_var(irq->mask);
        free_irq(irq->map.virq, &irq->nh);
        if (irq->map.index && pci_msix_can_alloc_dyn(pool->dev->pdev))
                pci_msix_free_irq(pool->dev->pdev, irq->map);
+}
+
+static void irq_release(struct mlx5_irq *irq)
+{
+       struct mlx5_irq_pool *pool = irq->pool;
+
+       xa_erase(&pool->irqs, irq->pool_index);
+       mlx5_system_free_irq(irq);
+       free_cpumask_var(irq->mask);
        kfree(irq);
 }
 
@@ -231,12 +248,13 @@ struct mlx5_irq *mlx5_irq_alloc(struct mlx5_irq_pool *pool, int i,
        if (!irq)
                return ERR_PTR(-ENOMEM);
        if (!i || !pci_msix_can_alloc_dyn(dev->pdev)) {
-               /* The vector at index 0 was already allocated.
-                * Just get the irq number. If dynamic irq is not supported
-                * vectors have also been allocated.
+               /* The vector at index 0 is always statically allocated. If
+                * dynamic irq is not supported, all vectors are statically
+                * allocated. In both cases just get the irq number and set
+                * the index.
                 */
                irq->map.virq = pci_irq_vector(dev->pdev, i);
-               irq->map.index = 0;
+               irq->map.index = i;
        } else {
                irq->map = pci_msix_alloc_irq_at(dev->pdev, MSI_ANY_INDEX, af_desc);
                if (!irq->map.virq) {
@@ -276,11 +294,11 @@ struct mlx5_irq *mlx5_irq_alloc(struct mlx5_irq_pool *pool, int i,
        }
        irq->pool = pool;
        irq->refcount = 1;
-       irq->map.index = i;
-       err = xa_err(xa_store(&pool->irqs, irq->map.index, irq, GFP_KERNEL));
+       irq->pool_index = i;
+       err = xa_err(xa_store(&pool->irqs, irq->pool_index, irq, GFP_KERNEL));
        if (err) {
                mlx5_core_err(dev, "Failed to alloc xa entry for irq(%u). err = %d\n",
-                             irq->map.index, err);
+                             irq->pool_index, err);
                goto err_xa;
        }
        return irq;
@@ -563,17 +581,23 @@ void mlx5_irqs_release_vectors(struct mlx5_irq **irqs, int nirqs)
 int mlx5_irqs_request_vectors(struct mlx5_core_dev *dev, u16 *cpus, int nirqs,
                              struct mlx5_irq **irqs, struct cpu_rmap **rmap)
 {
+       struct mlx5_irq_table *table = mlx5_irq_table_get(dev);
+       struct mlx5_irq_pool *pool = table->pcif_pool;
        struct irq_affinity_desc af_desc;
        struct mlx5_irq *irq;
+       int offset = 1;
        int i;
 
-       af_desc.is_managed = 1;
+       if (!pool->xa_num_irqs.max)
+               offset = 0;
+
+       af_desc.is_managed = false;
        for (i = 0; i < nirqs; i++) {
+               cpumask_clear(&af_desc.mask);
                cpumask_set_cpu(cpus[i], &af_desc.mask);
-               irq = mlx5_irq_request(dev, i + 1, &af_desc, rmap);
+               irq = mlx5_irq_request(dev, i + offset, &af_desc, rmap);
                if (IS_ERR(irq))
                        break;
-               cpumask_clear(&af_desc.mask);
                irqs[i] = irq;
        }
 
@@ -691,6 +715,25 @@ static void irq_pools_destroy(struct mlx5_irq_table *table)
        irq_pool_free(table->pcif_pool);
 }
 
+static void mlx5_irq_pool_free_irqs(struct mlx5_irq_pool *pool)
+{
+       struct mlx5_irq *irq;
+       unsigned long index;
+
+       xa_for_each(&pool->irqs, index, irq)
+               mlx5_system_free_irq(irq);
+}
+
+static void mlx5_irq_pools_free_irqs(struct mlx5_irq_table *table)
+{
+       if (table->sf_ctrl_pool) {
+               mlx5_irq_pool_free_irqs(table->sf_comp_pool);
+               mlx5_irq_pool_free_irqs(table->sf_ctrl_pool);
+       }
+       mlx5_irq_pool_free_irqs(table->pcif_pool);
+}
+
 /* irq_table API */
 
 int mlx5_irq_table_init(struct mlx5_core_dev *dev)
@@ -774,6 +817,17 @@ void mlx5_irq_table_destroy(struct mlx5_core_dev *dev)
        pci_free_irq_vectors(dev->pdev);
 }
 
+void mlx5_irq_table_free_irqs(struct mlx5_core_dev *dev)
+{
+       struct mlx5_irq_table *table = dev->priv.irq_table;
+
+       if (mlx5_core_is_sf(dev))
+               return;
+
+       mlx5_irq_pools_free_irqs(table);
+       pci_free_irq_vectors(dev->pdev);
+}
+
 int mlx5_irq_table_get_sfs_vec(struct mlx5_irq_table *table)
 {
        if (table->sf_comp_pool)
index e2f26d0..0692363 100644 (file)
@@ -63,6 +63,7 @@ static void mlx5_sf_dev_remove(struct auxiliary_device *adev)
        struct mlx5_sf_dev *sf_dev = container_of(adev, struct mlx5_sf_dev, adev);
        struct devlink *devlink = priv_to_devlink(sf_dev->mdev);
 
+       mlx5_drain_health_wq(sf_dev->mdev);
        devlink_unregister(devlink);
        mlx5_uninit_one(sf_dev->mdev);
        iounmap(sf_dev->mdev->iseg);
index 0eb9a8d..0f783e7 100644 (file)
@@ -1421,9 +1421,13 @@ dr_action_create_reformat_action(struct mlx5dr_domain *dmn,
        }
        case DR_ACTION_TYP_TNL_L3_TO_L2:
        {
-               u8 hw_actions[DR_ACTION_CACHE_LINE_SIZE] = {};
+               u8 *hw_actions;
                int ret;
 
+               hw_actions = kzalloc(DR_ACTION_CACHE_LINE_SIZE, GFP_KERNEL);
+               if (!hw_actions)
+                       return -ENOMEM;
+
                ret = mlx5dr_ste_set_action_decap_l3_list(dmn->ste_ctx,
                                                          data, data_sz,
                                                          hw_actions,
@@ -1431,6 +1435,7 @@ dr_action_create_reformat_action(struct mlx5dr_domain *dmn,
                                                          &action->rewrite->num_of_actions);
                if (ret) {
                        mlx5dr_dbg(dmn, "Failed creating decap l3 action list\n");
+                       kfree(hw_actions);
                        return ret;
                }
 
@@ -1440,6 +1445,7 @@ dr_action_create_reformat_action(struct mlx5dr_domain *dmn,
                ret = mlx5dr_ste_alloc_modify_hdr(action);
                if (ret) {
                        mlx5dr_dbg(dmn, "Failed preparing reformat data\n");
+                       kfree(hw_actions);
                        return ret;
                }
                return 0;
@@ -2129,6 +2135,11 @@ mlx5dr_action_create_aso(struct mlx5dr_domain *dmn, u32 obj_id,
        return action;
 }
 
+u32 mlx5dr_action_get_pkt_reformat_id(struct mlx5dr_action *action)
+{
+       return action->reformat->id;
+}
+
 int mlx5dr_action_destroy(struct mlx5dr_action *action)
 {
        if (WARN_ON_ONCE(refcount_read(&action->refcount) > 1))
index 3835ba3..1aa525e 100644 (file)
@@ -117,6 +117,8 @@ int mlx5dr_cmd_query_device(struct mlx5_core_dev *mdev,
        caps->gvmi              = MLX5_CAP_GEN(mdev, vhca_id);
        caps->flex_protocols    = MLX5_CAP_GEN(mdev, flex_parser_protocols);
        caps->sw_format_ver     = MLX5_CAP_GEN(mdev, steering_format_version);
+       caps->roce_caps.fl_rc_qp_when_roce_disabled =
+               MLX5_CAP_GEN(mdev, fl_rc_qp_when_roce_disabled);
 
        if (MLX5_CAP_GEN(mdev, roce)) {
                err = dr_cmd_query_nic_vport_roce_en(mdev, 0, &roce_en);
@@ -124,7 +126,7 @@ int mlx5dr_cmd_query_device(struct mlx5_core_dev *mdev,
                        return err;
 
                caps->roce_caps.roce_en = roce_en;
-               caps->roce_caps.fl_rc_qp_when_roce_disabled =
+               caps->roce_caps.fl_rc_qp_when_roce_disabled |=
                        MLX5_CAP_ROCE(mdev, fl_rc_qp_when_roce_disabled);
                caps->roce_caps.fl_rc_qp_when_roce_enabled =
                        MLX5_CAP_ROCE(mdev, fl_rc_qp_when_roce_enabled);
index 13e06a6..d6947fe 100644 (file)
@@ -213,6 +213,8 @@ struct mlx5dr_ptrn_mgr *mlx5dr_ptrn_mgr_create(struct mlx5dr_domain *dmn)
        }
 
        INIT_LIST_HEAD(&mgr->ptrn_list);
+       mutex_init(&mgr->modify_hdr_mutex);
+
        return mgr;
 
 free_mgr:
@@ -237,5 +239,6 @@ void mlx5dr_ptrn_mgr_destroy(struct mlx5dr_ptrn_mgr *mgr)
        }
 
        mlx5dr_icm_pool_destroy(mgr->ptrn_icm_pool);
+       mutex_destroy(&mgr->modify_hdr_mutex);
        kfree(mgr);
 }
index 9413aaf..e94fbb0 100644 (file)
@@ -15,7 +15,8 @@ static u32 dr_ste_crc32_calc(const void *input_data, size_t length)
 {
        u32 crc = crc32(0, input_data, length);
 
-       return (__force u32)htonl(crc);
+       return (__force u32)((crc >> 24) & 0xff) | ((crc << 8) & 0xff0000) |
+                           ((crc >> 8) & 0xff00) | ((crc << 24) & 0xff000000);
 }
 
 bool mlx5dr_ste_supp_ttl_cs_recalc(struct mlx5dr_cmd_caps *caps)
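The dr_ste_crc32_calc() change above replaces htonl(), which is an identity on big-endian hosts, with an unconditional byte swap so the function returns the same value on every architecture. For reference, the added expression is simply a 32-bit byte reversal; a standalone restatement of it (illustrative only, not part of the patch):

/* Reverse the byte order of a 32-bit CRC; equivalent to the open-coded
 * expression above and independent of host endianness, unlike htonl().
 */
static inline u32 dr_crc32_byteswap(u32 crc)
{
	return ((crc >> 24) & 0xff) | ((crc << 8) & 0xff0000) |
	       ((crc >> 8) & 0xff00) | ((crc << 24) & 0xff000000);
}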
index 9846537..cc215be 100644 (file)
@@ -331,8 +331,16 @@ static int mlx5_cmd_dr_create_fte(struct mlx5_flow_root_namespace *ns,
        }
 
        if (fte->action.action & MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT) {
-               bool is_decap = fte->action.pkt_reformat->reformat_type ==
-                       MLX5_REFORMAT_TYPE_L3_TUNNEL_TO_L2;
+               bool is_decap;
+
+               if (fte->action.pkt_reformat->owner == MLX5_FLOW_RESOURCE_OWNER_FW) {
+                       err = -EINVAL;
+                       mlx5dr_err(domain, "FW-owned reformat can't be used in SW rule\n");
+                       goto free_actions;
+               }
+
+               is_decap = fte->action.pkt_reformat->reformat_type ==
+                          MLX5_REFORMAT_TYPE_L3_TUNNEL_TO_L2;
 
                if (is_decap)
                        actions[num_actions++] =
@@ -661,6 +669,7 @@ static int mlx5_cmd_dr_packet_reformat_alloc(struct mlx5_flow_root_namespace *ns
                return -EINVAL;
        }
 
+       pkt_reformat->owner = MLX5_FLOW_RESOURCE_OWNER_SW;
        pkt_reformat->action.dr_action = action;
 
        return 0;
@@ -691,6 +700,7 @@ static int mlx5_cmd_dr_modify_header_alloc(struct mlx5_flow_root_namespace *ns,
                return -EINVAL;
        }
 
+       modify_hdr->owner = MLX5_FLOW_RESOURCE_OWNER_SW;
        modify_hdr->action.dr_action = action;
 
        return 0;
@@ -816,6 +826,19 @@ static u32 mlx5_cmd_dr_get_capabilities(struct mlx5_flow_root_namespace *ns,
        return steering_caps;
 }
 
+int mlx5_fs_dr_action_get_pkt_reformat_id(struct mlx5_pkt_reformat *pkt_reformat)
+{
+       switch (pkt_reformat->reformat_type) {
+       case MLX5_REFORMAT_TYPE_L2_TO_VXLAN:
+       case MLX5_REFORMAT_TYPE_L2_TO_NVGRE:
+       case MLX5_REFORMAT_TYPE_L2_TO_L2_TUNNEL:
+       case MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL:
+       case MLX5_REFORMAT_TYPE_INSERT_HDR:
+               return mlx5dr_action_get_pkt_reformat_id(pkt_reformat->action.dr_action);
+       }
+       return -EOPNOTSUPP;
+}
+
 bool mlx5_fs_dr_is_supported(struct mlx5_core_dev *dev)
 {
        return mlx5dr_is_supported(dev);
index d168622..99a3b2e 100644 (file)
@@ -38,6 +38,8 @@ struct mlx5_fs_dr_table {
 
 bool mlx5_fs_dr_is_supported(struct mlx5_core_dev *dev);
 
+int mlx5_fs_dr_action_get_pkt_reformat_id(struct mlx5_pkt_reformat *pkt_reformat);
+
 const struct mlx5_flow_cmds *mlx5_fs_cmd_get_dr_cmds(void);
 
 #else
@@ -47,6 +49,11 @@ static inline const struct mlx5_flow_cmds *mlx5_fs_cmd_get_dr_cmds(void)
        return NULL;
 }
 
+static inline u32 mlx5_fs_dr_action_get_pkt_reformat_id(struct mlx5_pkt_reformat *pkt_reformat)
+{
+       return 0;
+}
+
 static inline bool mlx5_fs_dr_is_supported(struct mlx5_core_dev *dev)
 {
        return false;
index 9afd268..d1c04f4 100644 (file)
@@ -150,6 +150,8 @@ mlx5dr_action_create_dest_match_range(struct mlx5dr_domain *dmn,
 
 int mlx5dr_action_destroy(struct mlx5dr_action *action);
 
+u32 mlx5dr_action_get_pkt_reformat_id(struct mlx5dr_action *action);
+
 int mlx5dr_definer_get(struct mlx5dr_domain *dmn, u16 format_id,
                       u8 *dw_selectors, u8 *byte_selectors,
                       u8 *match_mask, u32 *definer_id);
index e47fa6f..20bb5eb 100644 (file)
@@ -45,7 +45,7 @@ static int mlx5_thermal_get_mtmp_temp(struct mlx5_core_dev *mdev, u32 id, int *p
 static int mlx5_thermal_get_temp(struct thermal_zone_device *tzdev,
                                 int *p_temp)
 {
-       struct mlx5_thermal *thermal = tzdev->devdata;
+       struct mlx5_thermal *thermal = thermal_zone_device_priv(tzdev);
        struct mlx5_core_dev *mdev = thermal->mdev;
        int err;
 
@@ -81,12 +81,13 @@ int mlx5_thermal_init(struct mlx5_core_dev *mdev)
                return -ENOMEM;
 
        thermal->mdev = mdev;
-       thermal->tzdev = thermal_zone_device_register(data,
-                                                     MLX5_THERMAL_NUM_TRIPS,
-                                                     MLX5_THERMAL_TRIP_MASK,
-                                                     thermal,
-                                                     &mlx5_thermal_ops,
-                                                     NULL, 0, MLX5_THERMAL_POLL_INT_MSEC);
+       thermal->tzdev = thermal_zone_device_register_with_trips(data,
+                                                                NULL,
+                                                                MLX5_THERMAL_NUM_TRIPS,
+                                                                MLX5_THERMAL_TRIP_MASK,
+                                                                thermal,
+                                                                &mlx5_thermal_ops,
+                                                                NULL, 0, MLX5_THERMAL_POLL_INT_MSEC);
        if (IS_ERR(thermal->tzdev)) {
                dev_err(mdev->device, "Failed to register thermal zone device (%s) %ld\n",
                        data, PTR_ERR(thermal->tzdev));
index afa3b92..0d5a41a 100644 (file)
@@ -245,12 +245,6 @@ static bool mlxbf_gige_rx_packet(struct mlxbf_gige *priv, int *rx_pkts)
 
                skb = priv->rx_skb[rx_pi_rem];
 
-               skb_put(skb, datalen);
-
-               skb->ip_summed = CHECKSUM_NONE; /* device did not checksum packet */
-
-               skb->protocol = eth_type_trans(skb, netdev);
-
                /* Alloc another RX SKB for this same index */
                rx_skb = mlxbf_gige_alloc_skb(priv, MLXBF_GIGE_DEFAULT_BUF_SZ,
                                              &rx_buf_dma, DMA_FROM_DEVICE);
@@ -259,6 +253,13 @@ static bool mlxbf_gige_rx_packet(struct mlxbf_gige *priv, int *rx_pkts)
                priv->rx_skb[rx_pi_rem] = rx_skb;
                dma_unmap_single(priv->dev, *rx_wqe_addr,
                                 MLXBF_GIGE_DEFAULT_BUF_SZ, DMA_FROM_DEVICE);
+
+               skb_put(skb, datalen);
+
+               skb->ip_summed = CHECKSUM_NONE; /* device did not checksum packet */
+
+               skb->protocol = eth_type_trans(skb, netdev);
+
                *rx_wqe_addr = rx_buf_dma;
        } else if (rx_cqe & MLXBF_GIGE_RX_CQE_PKT_STATUS_MAC_ERR) {
                priv->stats.rx_mac_errors++;
index 2b6e046..ee26986 100644 (file)
@@ -1039,6 +1039,16 @@ static int lan966x_reset_switch(struct lan966x *lan966x)
 
        reset_control_reset(switch_reset);
 
+       /* Don't reinitialize the switch core if it is already initialized. In
+        * case it is initialized twice, some pointers inside the queue system
+        * in HW will get corrupted and then after a while the queue system gets
+        * full and no traffic passes through the switch. The issue is seen
+        * when loading and unloading the driver and sending traffic through the
+        * switch.
+        */
+       if (lan_rd(lan966x, SYS_RESET_CFG) & SYS_RESET_CFG_CORE_ENA)
+               return 0;
+
        lan_wr(SYS_RESET_CFG_CORE_ENA_SET(0), lan966x, SYS_RESET_CFG);
        lan_wr(SYS_RAM_INIT_RAM_INIT_SET(1), lan966x, SYS_RAM_INIT);
        ret = readx_poll_timeout(lan966x_ram_init, lan966x,
index 06d6292..d907727 100644 (file)
@@ -1279,8 +1279,6 @@ static void mana_poll_tx_cq(struct mana_cq *cq)
        if (comp_read < 1)
                return;
 
-       apc->eth_stats.tx_cqes = comp_read;
-
        for (i = 0; i < comp_read; i++) {
                struct mana_tx_comp_oob *cqe_oob;
 
@@ -1363,8 +1361,6 @@ static void mana_poll_tx_cq(struct mana_cq *cq)
                WARN_ON_ONCE(1);
 
        cq->work_done = pkt_transmitted;
-
-       apc->eth_stats.tx_cqes -= pkt_transmitted;
 }
 
 static void mana_post_pkt_rxq(struct mana_rxq *rxq)
@@ -1626,15 +1622,11 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
 {
        struct gdma_comp *comp = cq->gdma_comp_buf;
        struct mana_rxq *rxq = cq->rxq;
-       struct mana_port_context *apc;
        int comp_read, i;
 
-       apc = netdev_priv(rxq->ndev);
-
        comp_read = mana_gd_poll_cq(cq->gdma_cq, comp, CQE_POLLING_BUFFER);
        WARN_ON_ONCE(comp_read > CQE_POLLING_BUFFER);
 
-       apc->eth_stats.rx_cqes = comp_read;
        rxq->xdp_flush = false;
 
        for (i = 0; i < comp_read; i++) {
@@ -1646,8 +1638,6 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
                        return;
 
                mana_process_rx_cqe(rxq, cq, &comp[i]);
-
-               apc->eth_stats.rx_cqes--;
        }
 
        if (rxq->xdp_flush)
index a64c814..0dc7867 100644 (file)
@@ -13,11 +13,9 @@ static const struct {
 } mana_eth_stats[] = {
        {"stop_queue", offsetof(struct mana_ethtool_stats, stop_queue)},
        {"wake_queue", offsetof(struct mana_ethtool_stats, wake_queue)},
-       {"tx_cqes", offsetof(struct mana_ethtool_stats, tx_cqes)},
        {"tx_cq_err", offsetof(struct mana_ethtool_stats, tx_cqe_err)},
        {"tx_cqe_unknown_type", offsetof(struct mana_ethtool_stats,
                                        tx_cqe_unknown_type)},
-       {"rx_cqes", offsetof(struct mana_ethtool_stats, rx_cqes)},
        {"rx_coalesced_err", offsetof(struct mana_ethtool_stats,
                                        rx_coalesced_err)},
        {"rx_cqe_unknown_type", offsetof(struct mana_ethtool_stats,
index ef6fd3f..5595bfe 100644 (file)
@@ -307,15 +307,15 @@ static const u32 vsc7514_sys_regmap[] = {
        REG(SYS_COUNT_DROP_YELLOW_PRIO_4,               0x000218),
        REG(SYS_COUNT_DROP_YELLOW_PRIO_5,               0x00021c),
        REG(SYS_COUNT_DROP_YELLOW_PRIO_6,               0x000220),
-       REG(SYS_COUNT_DROP_YELLOW_PRIO_7,               0x000214),
-       REG(SYS_COUNT_DROP_GREEN_PRIO_0,                0x000218),
-       REG(SYS_COUNT_DROP_GREEN_PRIO_1,                0x00021c),
-       REG(SYS_COUNT_DROP_GREEN_PRIO_2,                0x000220),
-       REG(SYS_COUNT_DROP_GREEN_PRIO_3,                0x000224),
-       REG(SYS_COUNT_DROP_GREEN_PRIO_4,                0x000228),
-       REG(SYS_COUNT_DROP_GREEN_PRIO_5,                0x00022c),
-       REG(SYS_COUNT_DROP_GREEN_PRIO_6,                0x000230),
-       REG(SYS_COUNT_DROP_GREEN_PRIO_7,                0x000234),
+       REG(SYS_COUNT_DROP_YELLOW_PRIO_7,               0x000224),
+       REG(SYS_COUNT_DROP_GREEN_PRIO_0,                0x000228),
+       REG(SYS_COUNT_DROP_GREEN_PRIO_1,                0x00022c),
+       REG(SYS_COUNT_DROP_GREEN_PRIO_2,                0x000230),
+       REG(SYS_COUNT_DROP_GREEN_PRIO_3,                0x000234),
+       REG(SYS_COUNT_DROP_GREEN_PRIO_4,                0x000238),
+       REG(SYS_COUNT_DROP_GREEN_PRIO_5,                0x00023c),
+       REG(SYS_COUNT_DROP_GREEN_PRIO_6,                0x000240),
+       REG(SYS_COUNT_DROP_GREEN_PRIO_7,                0x000244),
        REG(SYS_RESET_CFG,                              0x000508),
        REG(SYS_CMID,                                   0x00050c),
        REG(SYS_VLAN_ETYPE_CFG,                         0x000510),
index 094374d..38b8b10 100644 (file)
@@ -8,7 +8,7 @@
 
 #ifdef CONFIG_DCB
 /* DCB feature definitions */
-#define NFP_NET_MAX_DSCP       4
+#define NFP_NET_MAX_DSCP       64
 #define NFP_NET_MAX_TC         IEEE_8021QAZ_MAX_TCS
 #define NFP_NET_MAX_PRIO       8
 #define NFP_DCB_CFG_STRIDE     256
index 0605d1e..7a549b8 100644 (file)
@@ -6138,6 +6138,7 @@ static int nv_probe(struct pci_dev *pci_dev, const struct pci_device_id *id)
        return 0;
 
 out_error:
+       nv_mgmt_release_sema(dev);
        if (phystate_orig)
                writel(phystate|NVREG_ADAPTCTL_RUNNING, base + NvRegAdapterControl);
 out_freering:
index 2edd6bf..7776d3b 100644 (file)
@@ -1903,7 +1903,7 @@ void qed_get_vport_stats(struct qed_dev *cdev, struct qed_eth_stats *stats)
 {
        u32 i;
 
-       if (!cdev) {
+       if (!cdev || cdev->recov_in_prog) {
                memset(stats, 0, sizeof(*stats));
                return;
        }
index f9931ec..4d83cee 100644 (file)
@@ -269,6 +269,10 @@ struct qede_dev {
 #define QEDE_ERR_WARN                  3
 
        struct qede_dump_info           dump_info;
+       struct delayed_work             periodic_task;
+       unsigned long                   stats_coal_ticks;
+       u32                             stats_coal_usecs;
+       spinlock_t                      stats_lock; /* lock for vport stats access */
 };
 
 enum QEDE_STATE {
index 374a86b..95820cf 100644 (file)
@@ -429,6 +429,8 @@ static void qede_get_ethtool_stats(struct net_device *dev,
                }
        }
 
+       spin_lock(&edev->stats_lock);
+
        for (i = 0; i < QEDE_NUM_STATS; i++) {
                if (qede_is_irrelevant_stat(edev, i))
                        continue;
@@ -438,6 +440,8 @@ static void qede_get_ethtool_stats(struct net_device *dev,
                buf++;
        }
 
+       spin_unlock(&edev->stats_lock);
+
        __qede_unlock(edev);
 }
 
@@ -829,6 +833,7 @@ out:
 
        coal->rx_coalesce_usecs = rx_coal;
        coal->tx_coalesce_usecs = tx_coal;
+       coal->stats_block_coalesce_usecs = edev->stats_coal_usecs;
 
        return rc;
 }
@@ -842,6 +847,19 @@ int qede_set_coalesce(struct net_device *dev, struct ethtool_coalesce *coal,
        int i, rc = 0;
        u16 rxc, txc;
 
+       if (edev->stats_coal_usecs != coal->stats_block_coalesce_usecs) {
+               edev->stats_coal_usecs = coal->stats_block_coalesce_usecs;
+               if (edev->stats_coal_usecs) {
+                       edev->stats_coal_ticks = usecs_to_jiffies(edev->stats_coal_usecs);
+                       schedule_delayed_work(&edev->periodic_task, 0);
+
+                       DP_INFO(edev, "Configured stats coal ticks=%lu jiffies\n",
+                               edev->stats_coal_ticks);
+               } else {
+                       cancel_delayed_work_sync(&edev->periodic_task);
+               }
+       }
+
        if (!netif_running(dev)) {
                DP_INFO(edev, "Interface is down\n");
                return -EINVAL;
@@ -2252,7 +2270,8 @@ out:
 }
 
 static const struct ethtool_ops qede_ethtool_ops = {
-       .supported_coalesce_params      = ETHTOOL_COALESCE_USECS,
+       .supported_coalesce_params      = ETHTOOL_COALESCE_USECS |
+                                         ETHTOOL_COALESCE_STATS_BLOCK_USECS,
        .get_link_ksettings             = qede_get_link_ksettings,
        .set_link_ksettings             = qede_set_link_ksettings,
        .get_drvinfo                    = qede_get_drvinfo,
@@ -2303,7 +2322,8 @@ static const struct ethtool_ops qede_ethtool_ops = {
 };
 
 static const struct ethtool_ops qede_vf_ethtool_ops = {
-       .supported_coalesce_params      = ETHTOOL_COALESCE_USECS,
+       .supported_coalesce_params      = ETHTOOL_COALESCE_USECS |
+                                         ETHTOOL_COALESCE_STATS_BLOCK_USECS,
        .get_link_ksettings             = qede_get_link_ksettings,
        .get_drvinfo                    = qede_get_drvinfo,
        .get_msglevel                   = qede_get_msglevel,
index 4c6c685..4b004a7 100644 (file)
@@ -307,6 +307,8 @@ void qede_fill_by_demand_stats(struct qede_dev *edev)
 
        edev->ops->get_vport_stats(edev->cdev, &stats);
 
+       spin_lock(&edev->stats_lock);
+
        p_common->no_buff_discards = stats.common.no_buff_discards;
        p_common->packet_too_big_discard = stats.common.packet_too_big_discard;
        p_common->ttl0_discard = stats.common.ttl0_discard;
@@ -404,6 +406,8 @@ void qede_fill_by_demand_stats(struct qede_dev *edev)
                p_ah->tx_1519_to_max_byte_packets =
                    stats.ah.tx_1519_to_max_byte_packets;
        }
+
+       spin_unlock(&edev->stats_lock);
 }
 
 static void qede_get_stats64(struct net_device *dev,
@@ -412,9 +416,10 @@ static void qede_get_stats64(struct net_device *dev,
        struct qede_dev *edev = netdev_priv(dev);
        struct qede_stats_common *p_common;
 
-       qede_fill_by_demand_stats(edev);
        p_common = &edev->stats.common;
 
+       spin_lock(&edev->stats_lock);
+
        stats->rx_packets = p_common->rx_ucast_pkts + p_common->rx_mcast_pkts +
                            p_common->rx_bcast_pkts;
        stats->tx_packets = p_common->tx_ucast_pkts + p_common->tx_mcast_pkts +
@@ -434,6 +439,8 @@ static void qede_get_stats64(struct net_device *dev,
                stats->collisions = edev->stats.bb.tx_total_collisions;
        stats->rx_crc_errors = p_common->rx_crc_errors;
        stats->rx_frame_errors = p_common->rx_align_errors;
+
+       spin_unlock(&edev->stats_lock);
 }
 
 #ifdef CONFIG_QED_SRIOV
@@ -1063,6 +1070,23 @@ static void qede_unlock(struct qede_dev *edev)
        rtnl_unlock();
 }
 
+static void qede_periodic_task(struct work_struct *work)
+{
+       struct qede_dev *edev = container_of(work, struct qede_dev,
+                                            periodic_task.work);
+
+       qede_fill_by_demand_stats(edev);
+       schedule_delayed_work(&edev->periodic_task, edev->stats_coal_ticks);
+}
+
+static void qede_init_periodic_task(struct qede_dev *edev)
+{
+       INIT_DELAYED_WORK(&edev->periodic_task, qede_periodic_task);
+       spin_lock_init(&edev->stats_lock);
+       edev->stats_coal_usecs = USEC_PER_SEC;
+       edev->stats_coal_ticks = usecs_to_jiffies(USEC_PER_SEC);
+}
+
 static void qede_sp_task(struct work_struct *work)
 {
        struct qede_dev *edev = container_of(work, struct qede_dev,
@@ -1082,6 +1106,7 @@ static void qede_sp_task(struct work_struct *work)
         */
 
        if (test_and_clear_bit(QEDE_SP_RECOVERY, &edev->sp_flags)) {
+               cancel_delayed_work_sync(&edev->periodic_task);
 #ifdef CONFIG_QED_SRIOV
                /* SRIOV must be disabled outside the lock to avoid a deadlock.
                 * The recovery of the active VFs is currently not supported.
@@ -1272,6 +1297,7 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level,
                 */
                INIT_DELAYED_WORK(&edev->sp_task, qede_sp_task);
                mutex_init(&edev->qede_lock);
+               qede_init_periodic_task(edev);
 
                rc = register_netdev(edev->ndev);
                if (rc) {
@@ -1296,6 +1322,11 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level,
        edev->rx_copybreak = QEDE_RX_HDR_SIZE;
 
        qede_log_probe(edev);
+
+       /* retain user config (for example, after recovery) */
+       if (edev->stats_coal_usecs)
+               schedule_delayed_work(&edev->periodic_task, 0);
+
        return 0;
 
 err4:
@@ -1364,6 +1395,7 @@ static void __qede_remove(struct pci_dev *pdev, enum qede_remove_mode mode)
                unregister_netdev(ndev);
 
                cancel_delayed_work_sync(&edev->sp_task);
+               cancel_delayed_work_sync(&edev->periodic_task);
 
                edev->ops->common->set_power_state(cdev, PCI_D0);
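The qede changes above move the slow firmware stats query into a periodic delayed work item and cache the result, so qede_get_stats64() and the ethtool path only take a short spinlock on the cached copy. A minimal, self-contained sketch of that caching pattern; all names here (stats_cache, stats_refresh, query_hw_counters, stats_read) are hypothetical stand-ins, not qede APIs:

#include <linux/types.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

struct stats_cache {
	spinlock_t lock;		/* protects the cached counters */
	struct delayed_work work;
	unsigned long interval;		/* refresh period, in jiffies */
	u64 rx_packets;
	u64 tx_packets;
};

/* query_hw_counters() stands in for the device-specific (slow) firmware
 * query; it is not a real kernel API.
 */
void query_hw_counters(u64 *rx, u64 *tx);

static void stats_refresh(struct work_struct *work)
{
	struct stats_cache *c = container_of(work, struct stats_cache, work.work);
	u64 rx, tx;

	query_hw_counters(&rx, &tx);	/* slow path, kept outside the lock */

	spin_lock(&c->lock);
	c->rx_packets = rx;
	c->tx_packets = tx;
	spin_unlock(&c->lock);

	schedule_delayed_work(&c->work, c->interval);
}

static void stats_read(struct stats_cache *c, u64 *rx, u64 *tx)
{
	spin_lock(&c->lock);
	*rx = c->rx_packets;
	*tx = c->tx_packets;
	spin_unlock(&c->lock);
}

Cancelling the delayed work with cancel_delayed_work_sync() before teardown, as the patch does in __qede_remove() and the recovery path, keeps the refresh from racing with device removal.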
 
index c865a4b..4a1b94e 100644 (file)
@@ -582,8 +582,7 @@ qcaspi_spi_thread(void *data)
        while (!kthread_should_stop()) {
                set_current_state(TASK_INTERRUPTIBLE);
                if ((qca->intr_req == qca->intr_svc) &&
-                   (qca->txr.skb[qca->txr.head] == NULL) &&
-                   (qca->sync == QCASPI_SYNC_READY))
+                   !qca->txr.skb[qca->txr.head])
                        schedule();
 
                set_current_state(TASK_RUNNING);
index a7e376e..4b19803 100644 (file)
@@ -616,10 +616,10 @@ struct rtl8169_private {
                struct work_struct work;
        } wk;
 
-       spinlock_t config25_lock;
-       spinlock_t mac_ocp_lock;
+       raw_spinlock_t config25_lock;
+       raw_spinlock_t mac_ocp_lock;
 
-       spinlock_t cfg9346_usage_lock;
+       raw_spinlock_t cfg9346_usage_lock;
        int cfg9346_usage_count;
 
        unsigned supports_gmii:1;
@@ -671,20 +671,20 @@ static void rtl_lock_config_regs(struct rtl8169_private *tp)
 {
        unsigned long flags;
 
-       spin_lock_irqsave(&tp->cfg9346_usage_lock, flags);
+       raw_spin_lock_irqsave(&tp->cfg9346_usage_lock, flags);
        if (!--tp->cfg9346_usage_count)
                RTL_W8(tp, Cfg9346, Cfg9346_Lock);
-       spin_unlock_irqrestore(&tp->cfg9346_usage_lock, flags);
+       raw_spin_unlock_irqrestore(&tp->cfg9346_usage_lock, flags);
 }
 
 static void rtl_unlock_config_regs(struct rtl8169_private *tp)
 {
        unsigned long flags;
 
-       spin_lock_irqsave(&tp->cfg9346_usage_lock, flags);
+       raw_spin_lock_irqsave(&tp->cfg9346_usage_lock, flags);
        if (!tp->cfg9346_usage_count++)
                RTL_W8(tp, Cfg9346, Cfg9346_Unlock);
-       spin_unlock_irqrestore(&tp->cfg9346_usage_lock, flags);
+       raw_spin_unlock_irqrestore(&tp->cfg9346_usage_lock, flags);
 }
 
 static void rtl_pci_commit(struct rtl8169_private *tp)
@@ -698,10 +698,10 @@ static void rtl_mod_config2(struct rtl8169_private *tp, u8 clear, u8 set)
        unsigned long flags;
        u8 val;
 
-       spin_lock_irqsave(&tp->config25_lock, flags);
+       raw_spin_lock_irqsave(&tp->config25_lock, flags);
        val = RTL_R8(tp, Config2);
        RTL_W8(tp, Config2, (val & ~clear) | set);
-       spin_unlock_irqrestore(&tp->config25_lock, flags);
+       raw_spin_unlock_irqrestore(&tp->config25_lock, flags);
 }
 
 static void rtl_mod_config5(struct rtl8169_private *tp, u8 clear, u8 set)
@@ -709,10 +709,10 @@ static void rtl_mod_config5(struct rtl8169_private *tp, u8 clear, u8 set)
        unsigned long flags;
        u8 val;
 
-       spin_lock_irqsave(&tp->config25_lock, flags);
+       raw_spin_lock_irqsave(&tp->config25_lock, flags);
        val = RTL_R8(tp, Config5);
        RTL_W8(tp, Config5, (val & ~clear) | set);
-       spin_unlock_irqrestore(&tp->config25_lock, flags);
+       raw_spin_unlock_irqrestore(&tp->config25_lock, flags);
 }
 
 static bool rtl_is_8125(struct rtl8169_private *tp)
@@ -899,9 +899,9 @@ static void r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg, u32 data)
 {
        unsigned long flags;
 
-       spin_lock_irqsave(&tp->mac_ocp_lock, flags);
+       raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags);
        __r8168_mac_ocp_write(tp, reg, data);
-       spin_unlock_irqrestore(&tp->mac_ocp_lock, flags);
+       raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags);
 }
 
 static u16 __r8168_mac_ocp_read(struct rtl8169_private *tp, u32 reg)
@@ -919,9 +919,9 @@ static u16 r8168_mac_ocp_read(struct rtl8169_private *tp, u32 reg)
        unsigned long flags;
        u16 val;
 
-       spin_lock_irqsave(&tp->mac_ocp_lock, flags);
+       raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags);
        val = __r8168_mac_ocp_read(tp, reg);
-       spin_unlock_irqrestore(&tp->mac_ocp_lock, flags);
+       raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags);
 
        return val;
 }
@@ -932,10 +932,10 @@ static void r8168_mac_ocp_modify(struct rtl8169_private *tp, u32 reg, u16 mask,
        unsigned long flags;
        u16 data;
 
-       spin_lock_irqsave(&tp->mac_ocp_lock, flags);
+       raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags);
        data = __r8168_mac_ocp_read(tp, reg);
        __r8168_mac_ocp_write(tp, reg, (data & ~mask) | set);
-       spin_unlock_irqrestore(&tp->mac_ocp_lock, flags);
+       raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags);
 }
 
 /* Work around a hw issue with RTL8168g PHY, the quirk disables
@@ -1420,14 +1420,14 @@ static void __rtl8169_set_wol(struct rtl8169_private *tp, u32 wolopts)
                        r8168_mac_ocp_modify(tp, 0xc0b6, BIT(0), 0);
        }
 
-       spin_lock_irqsave(&tp->config25_lock, flags);
+       raw_spin_lock_irqsave(&tp->config25_lock, flags);
        for (i = 0; i < tmp; i++) {
                options = RTL_R8(tp, cfg[i].reg) & ~cfg[i].mask;
                if (wolopts & cfg[i].opt)
                        options |= cfg[i].mask;
                RTL_W8(tp, cfg[i].reg, options);
        }
-       spin_unlock_irqrestore(&tp->config25_lock, flags);
+       raw_spin_unlock_irqrestore(&tp->config25_lock, flags);
 
        switch (tp->mac_version) {
        case RTL_GIGA_MAC_VER_02 ... RTL_GIGA_MAC_VER_06:
@@ -5179,9 +5179,9 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
        tp->eee_adv = -1;
        tp->ocp_base = OCP_STD_PHY_BASE;
 
-       spin_lock_init(&tp->cfg9346_usage_lock);
-       spin_lock_init(&tp->config25_lock);
-       spin_lock_init(&tp->mac_ocp_lock);
+       raw_spin_lock_init(&tp->cfg9346_usage_lock);
+       raw_spin_lock_init(&tp->config25_lock);
+       raw_spin_lock_init(&tp->mac_ocp_lock);
 
        dev->tstats = devm_netdev_alloc_pcpu_stats(&pdev->dev,
                                                   struct pcpu_sw_netstats);
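The r8169 hunks switch the short register-access locks from spinlock_t to raw_spinlock_t, presumably because these helpers can be called in contexts where a sleeping lock is not allowed once PREEMPT_RT turns spinlock_t into one; raw_spinlock_t keeps the classic busy-waiting behaviour. A generic sketch of the pattern (struct foo and the MMIO layout are assumptions, not the driver's code):

#include <linux/spinlock.h>
#include <linux/io.h>

struct foo {
        raw_spinlock_t reg_lock;        /* guards the read-modify-write below */
        void __iomem *base;
};

static void foo_lock_init(struct foo *f)
{
        raw_spin_lock_init(&f->reg_lock);
}

static void foo_reg_modify(struct foo *f, u32 reg, u32 clear, u32 set)
{
        unsigned long flags;
        u32 val;

        /* usable from any context, including hard IRQ, also on PREEMPT_RT */
        raw_spin_lock_irqsave(&f->reg_lock, flags);
        val = readl(f->base + reg);
        writel((val & ~clear) | set, f->base + reg);
        raw_spin_unlock_irqrestore(&f->reg_lock, flags);
}
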
index 29afadd..fa6d620 100644 (file)
@@ -347,17 +347,6 @@ out:
        return -ENOMEM;
 }
 
-static int rswitch_gwca_ts_queue_alloc(struct rswitch_private *priv)
-{
-       struct rswitch_gwca_queue *gq = &priv->gwca.ts_queue;
-
-       gq->ring_size = TS_RING_SIZE;
-       gq->ts_ring = dma_alloc_coherent(&priv->pdev->dev,
-                                        sizeof(struct rswitch_ts_desc) *
-                                        (gq->ring_size + 1), &gq->ring_dma, GFP_KERNEL);
-       return !gq->ts_ring ? -ENOMEM : 0;
-}
-
 static void rswitch_desc_set_dptr(struct rswitch_desc *desc, dma_addr_t addr)
 {
        desc->dptrl = cpu_to_le32(lower_32_bits(addr));
@@ -533,6 +522,28 @@ static void rswitch_gwca_linkfix_free(struct rswitch_private *priv)
        gwca->linkfix_table = NULL;
 }
 
+static int rswitch_gwca_ts_queue_alloc(struct rswitch_private *priv)
+{
+       struct rswitch_gwca_queue *gq = &priv->gwca.ts_queue;
+       struct rswitch_ts_desc *desc;
+
+       gq->ring_size = TS_RING_SIZE;
+       gq->ts_ring = dma_alloc_coherent(&priv->pdev->dev,
+                                        sizeof(struct rswitch_ts_desc) *
+                                        (gq->ring_size + 1), &gq->ring_dma, GFP_KERNEL);
+
+       if (!gq->ts_ring)
+               return -ENOMEM;
+
+       rswitch_gwca_ts_queue_fill(priv, 0, TS_RING_SIZE);
+       desc = &gq->ts_ring[gq->ring_size];
+       desc->desc.die_dt = DT_LINKFIX;
+       rswitch_desc_set_dptr(&desc->desc, gq->ring_dma);
+       INIT_LIST_HEAD(&priv->gwca.ts_info_list);
+
+       return 0;
+}
+
 static struct rswitch_gwca_queue *rswitch_gwca_get(struct rswitch_private *priv)
 {
        struct rswitch_gwca_queue *gq;
@@ -1485,7 +1496,7 @@ static netdev_tx_t rswitch_start_xmit(struct sk_buff *skb, struct net_device *nd
 
        if (rswitch_get_num_cur_queues(gq) >= gq->ring_size - 1) {
                netif_stop_subqueue(ndev, 0);
-               return ret;
+               return NETDEV_TX_BUSY;
        }
 
        if (skb_put_padto(skb, ETH_ZLEN))
@@ -1780,9 +1791,6 @@ static int rswitch_init(struct rswitch_private *priv)
        if (err < 0)
                goto err_ts_queue_alloc;
 
-       rswitch_gwca_ts_queue_fill(priv, 0, TS_RING_SIZE);
-       INIT_LIST_HEAD(&priv->gwca.ts_info_list);
-
        for (i = 0; i < RSWITCH_NUM_PORTS; i++) {
                err = rswitch_device_alloc(priv, i);
                if (err < 0) {
index d30459d..b63e47a 100644 (file)
@@ -2950,7 +2950,7 @@ static u32 efx_ef10_extract_event_ts(efx_qword_t *event)
        return tstamp;
 }
 
-static void
+static int
 efx_ef10_handle_tx_event(struct efx_channel *channel, efx_qword_t *event)
 {
        struct efx_nic *efx = channel->efx;
@@ -2958,13 +2958,14 @@ efx_ef10_handle_tx_event(struct efx_channel *channel, efx_qword_t *event)
        unsigned int tx_ev_desc_ptr;
        unsigned int tx_ev_q_label;
        unsigned int tx_ev_type;
+       int work_done;
        u64 ts_part;
 
        if (unlikely(READ_ONCE(efx->reset_pending)))
-               return;
+               return 0;
 
        if (unlikely(EFX_QWORD_FIELD(*event, ESF_DZ_TX_DROP_EVENT)))
-               return;
+               return 0;
 
        /* Get the transmit queue */
        tx_ev_q_label = EFX_QWORD_FIELD(*event, ESF_DZ_TX_QLABEL);
@@ -2973,8 +2974,7 @@ efx_ef10_handle_tx_event(struct efx_channel *channel, efx_qword_t *event)
        if (!tx_queue->timestamping) {
                /* Transmit completion */
                tx_ev_desc_ptr = EFX_QWORD_FIELD(*event, ESF_DZ_TX_DESCR_INDX);
-               efx_xmit_done(tx_queue, tx_ev_desc_ptr & tx_queue->ptr_mask);
-               return;
+               return efx_xmit_done(tx_queue, tx_ev_desc_ptr & tx_queue->ptr_mask);
        }
 
        /* Transmit timestamps are only available for 8XXX series. They result
@@ -3000,6 +3000,7 @@ efx_ef10_handle_tx_event(struct efx_channel *channel, efx_qword_t *event)
         * fields in the event.
         */
        tx_ev_type = EFX_QWORD_FIELD(*event, ESF_EZ_TX_SOFT1);
+       work_done = 0;
 
        switch (tx_ev_type) {
        case TX_TIMESTAMP_EVENT_TX_EV_COMPLETION:
@@ -3016,6 +3017,7 @@ efx_ef10_handle_tx_event(struct efx_channel *channel, efx_qword_t *event)
                tx_queue->completed_timestamp_major = ts_part;
 
                efx_xmit_done_single(tx_queue);
+               work_done = 1;
                break;
 
        default:
@@ -3026,6 +3028,8 @@ efx_ef10_handle_tx_event(struct efx_channel *channel, efx_qword_t *event)
                          EFX_QWORD_VAL(*event));
                break;
        }
+
+       return work_done;
 }
 
 static void
@@ -3081,13 +3085,16 @@ static void efx_ef10_handle_driver_generated_event(struct efx_channel *channel,
        }
 }
 
+#define EFX_NAPI_MAX_TX 512
+
 static int efx_ef10_ev_process(struct efx_channel *channel, int quota)
 {
        struct efx_nic *efx = channel->efx;
        efx_qword_t event, *p_event;
        unsigned int read_ptr;
-       int ev_code;
+       int spent_tx = 0;
        int spent = 0;
+       int ev_code;
 
        if (quota <= 0)
                return spent;
@@ -3126,7 +3133,11 @@ static int efx_ef10_ev_process(struct efx_channel *channel, int quota)
                        }
                        break;
                case ESE_DZ_EV_CODE_TX_EV:
-                       efx_ef10_handle_tx_event(channel, &event);
+                       spent_tx += efx_ef10_handle_tx_event(channel, &event);
+                       if (spent_tx >= EFX_NAPI_MAX_TX) {
+                               spent = quota;
+                               goto out;
+                       }
                        break;
                case ESE_DZ_EV_CODE_DRIVER_EV:
                        efx_ef10_handle_driver_event(channel, &event);
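The sfc changes make the TX completion handlers report how many packets they completed, and both the ef10 and ef100 event loops stop once EFX_NAPI_MAX_TX completions have been processed in one poll, returning spent = quota so that NAPI schedules another pass rather than spending unbounded time in a single one. A reduced sketch of that loop shape (every name here is an assumption, not the sfc API):

#define DEMO_NAPI_MAX_TX 512

enum demo_ev { DEMO_EV_NONE, DEMO_EV_RX, DEMO_EV_TX };

struct demo_channel { int dummy; };

/* stand-ins for the real event parsing and TX completion helpers */
static enum demo_ev demo_next_event(struct demo_channel *ch) { return DEMO_EV_NONE; }
static int demo_handle_tx(struct demo_channel *ch) { return 1; }

static int demo_ev_process(struct demo_channel *ch, int quota)
{
        int spent = 0, spent_tx = 0;
        enum demo_ev ev;

        while (spent < quota && (ev = demo_next_event(ch)) != DEMO_EV_NONE) {
                if (ev == DEMO_EV_RX) {
                        spent++;                /* RX counts against the budget */
                } else if (ev == DEMO_EV_TX) {
                        spent_tx += demo_handle_tx(ch);
                        if (spent_tx >= DEMO_NAPI_MAX_TX) {
                                spent = quota;  /* claim it all; NAPI will repoll */
                                break;
                        }
                }
        }

        return spent;
}
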
index d916877..be395cd 100644 (file)
@@ -378,7 +378,9 @@ int ef100_probe_netdev(struct efx_probe_data *probe_data)
        efx->net_dev = net_dev;
        SET_NETDEV_DEV(net_dev, &efx->pci_dev->dev);
 
-       net_dev->features |= efx->type->offload_features;
+       /* enable all supported features except rx-fcs and rx-all */
+       net_dev->features |= efx->type->offload_features &
+                            ~(NETIF_F_RXFCS | NETIF_F_RXALL);
        net_dev->hw_features |= efx->type->offload_features;
        net_dev->hw_enc_features |= efx->type->offload_features;
        net_dev->vlan_features |= NETIF_F_HW_CSUM | NETIF_F_SG |
index 4dc643b..7adde96 100644 (file)
@@ -253,6 +253,8 @@ static void ef100_ev_read_ack(struct efx_channel *channel)
                   efx_reg(channel->efx, ER_GZ_EVQ_INT_PRIME));
 }
 
+#define EFX_NAPI_MAX_TX 512
+
 static int ef100_ev_process(struct efx_channel *channel, int quota)
 {
        struct efx_nic *efx = channel->efx;
@@ -260,6 +262,7 @@ static int ef100_ev_process(struct efx_channel *channel, int quota)
        bool evq_phase, old_evq_phase;
        unsigned int read_ptr;
        efx_qword_t *p_event;
+       int spent_tx = 0;
        int spent = 0;
        bool ev_phase;
        int ev_type;
@@ -295,7 +298,9 @@ static int ef100_ev_process(struct efx_channel *channel, int quota)
                        efx_mcdi_process_event(channel, p_event);
                        break;
                case ESE_GZ_EF100_EV_TX_COMPLETION:
-                       ef100_ev_tx(channel, p_event);
+                       spent_tx += ef100_ev_tx(channel, p_event);
+                       if (spent_tx >= EFX_NAPI_MAX_TX)
+                               spent = quota;
                        break;
                case ESE_GZ_EF100_EV_DRIVER:
                        netif_info(efx, drv, efx->net_dev,
index 29ffaf3..849e555 100644 (file)
@@ -346,7 +346,7 @@ void ef100_tx_write(struct efx_tx_queue *tx_queue)
        ef100_tx_push_buffers(tx_queue);
 }
 
-void ef100_ev_tx(struct efx_channel *channel, const efx_qword_t *p_event)
+int ef100_ev_tx(struct efx_channel *channel, const efx_qword_t *p_event)
 {
        unsigned int tx_done =
                EFX_QWORD_FIELD(*p_event, ESF_GZ_EV_TXCMPL_NUM_DESC);
@@ -357,7 +357,7 @@ void ef100_ev_tx(struct efx_channel *channel, const efx_qword_t *p_event)
        unsigned int tx_index = (tx_queue->read_count + tx_done - 1) &
                                tx_queue->ptr_mask;
 
-       efx_xmit_done(tx_queue, tx_index);
+       return efx_xmit_done(tx_queue, tx_index);
 }
 
 /* Add a socket buffer to a TX queue
index e9e1154..d9a0819 100644 (file)
@@ -20,7 +20,7 @@ void ef100_tx_init(struct efx_tx_queue *tx_queue);
 void ef100_tx_write(struct efx_tx_queue *tx_queue);
 unsigned int ef100_tx_max_skb_descs(struct efx_nic *efx);
 
-void ef100_ev_tx(struct efx_channel *channel, const efx_qword_t *p_event);
+int ef100_ev_tx(struct efx_channel *channel, const efx_qword_t *p_event);
 
 netdev_tx_t ef100_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb);
 int __ef100_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb,
index fcea3ea..41b33a7 100644 (file)
@@ -301,6 +301,7 @@ int efx_probe_interrupts(struct efx_nic *efx)
                efx->tx_channel_offset = 0;
                efx->n_xdp_channels = 0;
                efx->xdp_channel_offset = efx->n_channels;
+               efx->xdp_txq_queues_mode = EFX_XDP_TX_QUEUES_BORROWED;
                rc = pci_enable_msi(efx->pci_dev);
                if (rc == 0) {
                        efx_get_channel(efx, 0)->irq = efx->pci_dev->irq;
@@ -322,6 +323,7 @@ int efx_probe_interrupts(struct efx_nic *efx)
                efx->tx_channel_offset = efx_separate_tx_channels ? 1 : 0;
                efx->n_xdp_channels = 0;
                efx->xdp_channel_offset = efx->n_channels;
+               efx->xdp_txq_queues_mode = EFX_XDP_TX_QUEUES_BORROWED;
                efx->legacy_irq = efx->pci_dev->irq;
        }
 
index 381b805..ef9971c 100644 (file)
@@ -171,9 +171,14 @@ static int efx_devlink_info_nvram_partition(struct efx_nic *efx,
 
        rc = efx_mcdi_nvram_metadata(efx, partition_type, NULL, version, NULL,
                                     0);
+
+       /* If the partition does not exist, that is not an error. */
+       if (rc == -ENOENT)
+               return 0;
+
        if (rc) {
-               netif_err(efx, drv, efx->net_dev, "mcdi nvram %s: failed\n",
-                         version_name);
+               netif_err(efx, drv, efx->net_dev, "mcdi nvram %s: failed (rc=%d)\n",
+                         version_name, rc);
                return rc;
        }
 
@@ -187,36 +192,33 @@ static int efx_devlink_info_nvram_partition(struct efx_nic *efx,
 static int efx_devlink_info_stored_versions(struct efx_nic *efx,
                                            struct devlink_info_req *req)
 {
-       int rc;
-
-       rc = efx_devlink_info_nvram_partition(efx, req,
-                                             NVRAM_PARTITION_TYPE_BUNDLE,
-                                             DEVLINK_INFO_VERSION_GENERIC_FW_BUNDLE_ID);
-       if (rc)
-               return rc;
-
-       rc = efx_devlink_info_nvram_partition(efx, req,
-                                             NVRAM_PARTITION_TYPE_MC_FIRMWARE,
-                                             DEVLINK_INFO_VERSION_GENERIC_FW_MGMT);
-       if (rc)
-               return rc;
-
-       rc = efx_devlink_info_nvram_partition(efx, req,
-                                             NVRAM_PARTITION_TYPE_SUC_FIRMWARE,
-                                             EFX_DEVLINK_INFO_VERSION_FW_MGMT_SUC);
-       if (rc)
-               return rc;
-
-       rc = efx_devlink_info_nvram_partition(efx, req,
-                                             NVRAM_PARTITION_TYPE_EXPANSION_ROM,
-                                             EFX_DEVLINK_INFO_VERSION_FW_EXPROM);
-       if (rc)
-               return rc;
+       int err;
 
-       rc = efx_devlink_info_nvram_partition(efx, req,
-                                             NVRAM_PARTITION_TYPE_EXPANSION_UEFI,
-                                             EFX_DEVLINK_INFO_VERSION_FW_UEFI);
-       return rc;
+       /* We do not care here about the specific error but just if an error
+        * happened. The specific error will be reported inside the call
+        * through system messages, and if any error happened in any call
+        * below, we report it through extack.
+        */
+       err = efx_devlink_info_nvram_partition(efx, req,
+                                              NVRAM_PARTITION_TYPE_BUNDLE,
+                                              DEVLINK_INFO_VERSION_GENERIC_FW_BUNDLE_ID);
+
+       err |= efx_devlink_info_nvram_partition(efx, req,
+                                               NVRAM_PARTITION_TYPE_MC_FIRMWARE,
+                                               DEVLINK_INFO_VERSION_GENERIC_FW_MGMT);
+
+       err |= efx_devlink_info_nvram_partition(efx, req,
+                                               NVRAM_PARTITION_TYPE_SUC_FIRMWARE,
+                                               EFX_DEVLINK_INFO_VERSION_FW_MGMT_SUC);
+
+       err |= efx_devlink_info_nvram_partition(efx, req,
+                                               NVRAM_PARTITION_TYPE_EXPANSION_ROM,
+                                               EFX_DEVLINK_INFO_VERSION_FW_EXPROM);
+
+       err |= efx_devlink_info_nvram_partition(efx, req,
+                                               NVRAM_PARTITION_TYPE_EXPANSION_UEFI,
+                                               EFX_DEVLINK_INFO_VERSION_FW_UEFI);
+       return err;
 }
 
 #define EFX_VER_FLAG(_f)       \
@@ -587,27 +589,20 @@ static int efx_devlink_info_get(struct devlink *devlink,
 {
        struct efx_devlink *devlink_private = devlink_priv(devlink);
        struct efx_nic *efx = devlink_private->efx;
-       int rc;
+       int err;
 
-       /* Several different MCDI commands are used. We report first error
-        * through extack returning at that point. Specific error
-        * information via system messages.
+       /* Several different MCDI commands are used. We report if errors
+        * happened through extack. Specific error information via system
+        * messages inside the calls.
         */
-       rc = efx_devlink_info_board_cfg(efx, req);
-       if (rc) {
-               NL_SET_ERR_MSG_MOD(extack, "Getting board info failed");
-               return rc;
-       }
-       rc = efx_devlink_info_stored_versions(efx, req);
-       if (rc) {
-               NL_SET_ERR_MSG_MOD(extack, "Getting stored versions failed");
-               return rc;
-       }
-       rc = efx_devlink_info_running_versions(efx, req);
-       if (rc) {
-               NL_SET_ERR_MSG_MOD(extack, "Getting running versions failed");
-               return rc;
-       }
+       err = efx_devlink_info_board_cfg(efx, req);
+
+       err |= efx_devlink_info_stored_versions(efx, req);
+
+       err |= efx_devlink_info_running_versions(efx, req);
+
+       if (err)
+               NL_SET_ERR_MSG_MOD(extack, "Errors when getting device info. Check system messages");
 
        return 0;
 }
index 06ed749..1776f7f 100644 (file)
@@ -302,6 +302,7 @@ int efx_siena_probe_interrupts(struct efx_nic *efx)
                efx->tx_channel_offset = 0;
                efx->n_xdp_channels = 0;
                efx->xdp_channel_offset = efx->n_channels;
+               efx->xdp_txq_queues_mode = EFX_XDP_TX_QUEUES_BORROWED;
                rc = pci_enable_msi(efx->pci_dev);
                if (rc == 0) {
                        efx_get_channel(efx, 0)->irq = efx->pci_dev->irq;
@@ -323,6 +324,7 @@ int efx_siena_probe_interrupts(struct efx_nic *efx)
                efx->tx_channel_offset = efx_siena_separate_tx_channels ? 1 : 0;
                efx->n_xdp_channels = 0;
                efx->xdp_channel_offset = efx->n_channels;
+               efx->xdp_txq_queues_mode = EFX_XDP_TX_QUEUES_BORROWED;
                efx->legacy_irq = efx->pci_dev->irq;
        }
 
index 0327639..c004443 100644 (file)
@@ -624,13 +624,12 @@ static int efx_tc_flower_replace_foreign(struct efx_nic *efx,
        if (!found) { /* We don't care. */
                netif_dbg(efx, drv, efx->net_dev,
                          "Ignoring foreign filter that doesn't egdev us\n");
-               rc = -EOPNOTSUPP;
-               goto release;
+               return -EOPNOTSUPP;
        }
 
        rc = efx_mae_match_check_caps(efx, &match.mask, NULL);
        if (rc)
-               goto release;
+               return rc;
 
        if (efx_tc_match_is_encap(&match.mask)) {
                enum efx_encap_type type;
@@ -639,8 +638,7 @@ static int efx_tc_flower_replace_foreign(struct efx_nic *efx,
                if (type == EFX_ENCAP_TYPE_NONE) {
                        NL_SET_ERR_MSG_MOD(extack,
                                           "Egress encap match on unsupported tunnel device");
-                       rc = -EOPNOTSUPP;
-                       goto release;
+                       return -EOPNOTSUPP;
                }
 
                rc = efx_mae_check_encap_type_supported(efx, type);
@@ -648,25 +646,24 @@ static int efx_tc_flower_replace_foreign(struct efx_nic *efx,
                        NL_SET_ERR_MSG_FMT_MOD(extack,
                                               "Firmware reports no support for %s encap match",
                                               efx_tc_encap_type_name(type));
-                       goto release;
+                       return rc;
                }
 
                rc = efx_tc_flower_record_encap_match(efx, &match, type,
                                                      extack);
                if (rc)
-                       goto release;
+                       return rc;
        } else {
                /* This is not a tunnel decap rule, ignore it */
                netif_dbg(efx, drv, efx->net_dev,
                          "Ignoring foreign filter without encap match\n");
-               rc = -EOPNOTSUPP;
-               goto release;
+               return -EOPNOTSUPP;
        }
 
        rule = kzalloc(sizeof(*rule), GFP_USER);
        if (!rule) {
                rc = -ENOMEM;
-               goto release;
+               goto out_free;
        }
        INIT_LIST_HEAD(&rule->acts.list);
        rule->cookie = tc->cookie;
@@ -678,7 +675,7 @@ static int efx_tc_flower_replace_foreign(struct efx_nic *efx,
                          "Ignoring already-offloaded rule (cookie %lx)\n",
                          tc->cookie);
                rc = -EEXIST;
-               goto release;
+               goto out_free;
        }
 
        act = kzalloc(sizeof(*act), GFP_USER);
@@ -843,6 +840,7 @@ release:
                                       efx_tc_match_action_ht_params);
                efx_tc_free_action_set_list(efx, &rule->acts, false);
        }
+out_free:
        kfree(rule);
        if (match.encap)
                efx_tc_flower_release_encap_match(efx, match.encap);
@@ -899,8 +897,7 @@ static int efx_tc_flower_replace(struct efx_nic *efx,
                return rc;
        if (efx_tc_match_is_encap(&match.mask)) {
                NL_SET_ERR_MSG_MOD(extack, "Ingress enc_key matches not supported");
-               rc = -EOPNOTSUPP;
-               goto release;
+               return -EOPNOTSUPP;
        }
 
        if (tc->common.chain_index) {
@@ -924,9 +921,9 @@ static int efx_tc_flower_replace(struct efx_nic *efx,
        if (old) {
                netif_dbg(efx, drv, efx->net_dev,
                          "Already offloaded rule (cookie %lx)\n", tc->cookie);
-               rc = -EEXIST;
                NL_SET_ERR_MSG_MOD(extack, "Rule already offloaded");
-               goto release;
+               kfree(rule);
+               return -EEXIST;
        }
 
        /* Parse actions */
index 67e789b..755aa92 100644 (file)
@@ -249,7 +249,7 @@ void efx_xmit_done_check_empty(struct efx_tx_queue *tx_queue)
        }
 }
 
-void efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index)
+int efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index)
 {
        unsigned int fill_level, pkts_compl = 0, bytes_compl = 0;
        unsigned int efv_pkts_compl = 0;
@@ -279,6 +279,8 @@ void efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index)
        }
 
        efx_xmit_done_check_empty(tx_queue);
+
+       return pkts_compl + efv_pkts_compl;
 }
 
 /* Remove buffers put into a tx_queue for the current packet.
index d87aecb..1e9f429 100644 (file)
@@ -28,7 +28,7 @@ static inline bool efx_tx_buffer_in_use(struct efx_tx_buffer *buffer)
 }
 
 void efx_xmit_done_check_empty(struct efx_tx_queue *tx_queue);
-void efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index);
+int efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index);
 
 void efx_enqueue_unwind(struct efx_tx_queue *tx_queue,
                        unsigned int insert_count);
index 16a8c36..f07905f 100644 (file)
@@ -644,7 +644,8 @@ static int qcom_ethqos_probe(struct platform_device *pdev)
        plat_dat->fix_mac_speed = ethqos_fix_mac_speed;
        plat_dat->dump_debug_regs = rgmii_dump;
        plat_dat->has_gmac4 = 1;
-       plat_dat->dwmac4_addrs = &data->dwmac4_addrs;
+       if (ethqos->has_emac3)
+               plat_dat->dwmac4_addrs = &data->dwmac4_addrs;
        plat_dat->pmt = 1;
        plat_dat->tso_en = of_property_read_bool(np, "snps,tso");
        if (of_device_is_compatible(np, "qcom,qcs404-ethqos"))
index 4538f33..d3c5306 100644 (file)
@@ -181,6 +181,7 @@ enum power_event {
 #define GMAC4_LPI_CTRL_STATUS  0xd0
 #define GMAC4_LPI_TIMER_CTRL   0xd4
 #define GMAC4_LPI_ENTRY_TIMER  0xd8
+#define GMAC4_MAC_ONEUS_TIC_COUNTER    0xdc
 
 /* LPI control and status defines */
 #define GMAC4_LPI_CTRL_STATUS_LPITCSE  BIT(21) /* LPI Tx Clock Stop Enable */
index afaec3f..03b1c5a 100644 (file)
@@ -25,6 +25,7 @@ static void dwmac4_core_init(struct mac_device_info *hw,
        struct stmmac_priv *priv = netdev_priv(dev);
        void __iomem *ioaddr = hw->pcsr;
        u32 value = readl(ioaddr + GMAC_CONFIG);
+       u32 clk_rate;
 
        value |= GMAC_CORE_INIT;
 
@@ -47,6 +48,10 @@ static void dwmac4_core_init(struct mac_device_info *hw,
 
        writel(value, ioaddr + GMAC_CONFIG);
 
+       /* Configure LPI 1us counter to number of CSR clock ticks in 1us - 1 */
+       clk_rate = clk_get_rate(priv->plat->stmmac_clk);
+       writel((clk_rate / 1000000) - 1, ioaddr + GMAC4_MAC_ONEUS_TIC_COUNTER);
+
        /* Enable GMAC interrupts */
        value = GMAC_INT_DEFAULT_ENABLE;
 
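The new GMAC4_MAC_ONEUS_TIC_COUNTER write programs the number of CSR clock cycles per microsecond minus one, as the comment says: with an assumed 250 MHz CSR clock, 250000000 / 1000000 - 1 = 249 is written; with 62.5 MHz the integer division gives 62, so the register gets 61.
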
index 0fca815..8751095 100644 (file)
@@ -3873,7 +3873,6 @@ irq_error:
 
        stmmac_hw_teardown(dev);
 init_error:
-       free_dma_desc_resources(priv, &priv->dma_conf);
        phylink_disconnect_phy(priv->phylink);
 init_phy_error:
        pm_runtime_put(priv->device);
@@ -3891,6 +3890,9 @@ static int stmmac_open(struct net_device *dev)
                return PTR_ERR(dma_conf);
 
        ret = __stmmac_open(dev, dma_conf);
+       if (ret)
+               free_dma_desc_resources(priv, dma_conf);
+
        kfree(dma_conf);
        return ret;
 }
@@ -5633,12 +5635,15 @@ static int stmmac_change_mtu(struct net_device *dev, int new_mtu)
                stmmac_release(dev);
 
                ret = __stmmac_open(dev, dma_conf);
-               kfree(dma_conf);
                if (ret) {
+                       free_dma_desc_resources(priv, dma_conf);
+                       kfree(dma_conf);
                        netdev_err(priv->dev, "failed reopening the interface after MTU change\n");
                        return ret;
                }
 
+               kfree(dma_conf);
+
                stmmac_set_rx_mode(dev);
        }
 
@@ -7233,8 +7238,7 @@ int stmmac_dvr_probe(struct device *device,
        ndev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
                            NETIF_F_RXCSUM;
        ndev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
-                            NETDEV_XDP_ACT_XSK_ZEROCOPY |
-                            NETDEV_XDP_ACT_NDO_XMIT;
+                            NETDEV_XDP_ACT_XSK_ZEROCOPY;
 
        ret = stmmac_tc_init(priv, priv);
        if (!ret) {
index 9d4d8c3..aa6f16d 100644 (file)
@@ -117,6 +117,9 @@ int stmmac_xdp_set_prog(struct stmmac_priv *priv, struct bpf_prog *prog,
                return -EOPNOTSUPP;
        }
 
+       if (!prog)
+               xdp_features_clear_redirect_target(dev);
+
        need_update = !!priv->xdp_prog != !!prog;
        if (if_running && need_update)
                stmmac_xdp_release(dev);
@@ -131,5 +134,8 @@ int stmmac_xdp_set_prog(struct stmmac_priv *priv, struct bpf_prog *prog,
        if (if_running && need_update)
                stmmac_xdp_open(dev);
 
+       if (prog)
+               xdp_features_set_redirect_target(dev, false);
+
        return 0;
 }
index 4ef05ba..d61dfa2 100644 (file)
@@ -5077,6 +5077,8 @@ err_out_iounmap:
                cas_shutdown(cp);
        mutex_unlock(&cp->pm_mutex);
 
+       vfree(cp->fw_data);
+
        pci_iounmap(pdev, cp->regs);
 
 
index 11cbcd9..bebcfd5 100644 (file)
@@ -2068,7 +2068,7 @@ static int am65_cpsw_nuss_init_slave_ports(struct am65_cpsw_common *common)
                /* Initialize the Serdes PHY for the port */
                ret = am65_cpsw_init_serdes_phy(dev, port_np, port);
                if (ret)
-                       return ret;
+                       goto of_node_put;
 
                port->slave.mac_only =
                                of_property_read_bool(port_np, "ti,mac-only");
index f9972b8..a03490b 100644 (file)
@@ -1348,3 +1348,5 @@ module_spi_driver(adf7242_driver);
 MODULE_AUTHOR("Michael Hennerich <michael.hennerich@analog.com>");
 MODULE_DESCRIPTION("ADF7242 IEEE802.15.4 Transceiver Driver");
 MODULE_LICENSE("GPL");
+
+MODULE_FIRMWARE(FIRMWARE);
index 8445c21..31cba9a 100644 (file)
@@ -685,7 +685,7 @@ static int hwsim_del_edge_nl(struct sk_buff *msg, struct genl_info *info)
 static int hwsim_set_edge_lqi(struct sk_buff *msg, struct genl_info *info)
 {
        struct nlattr *edge_attrs[MAC802154_HWSIM_EDGE_ATTR_MAX + 1];
-       struct hwsim_edge_info *einfo;
+       struct hwsim_edge_info *einfo, *einfo_old;
        struct hwsim_phy *phy_v0;
        struct hwsim_edge *e;
        u32 v0, v1;
@@ -723,8 +723,10 @@ static int hwsim_set_edge_lqi(struct sk_buff *msg, struct genl_info *info)
        list_for_each_entry_rcu(e, &phy_v0->edges, list) {
                if (e->endpoint->idx == v1) {
                        einfo->lqi = lqi;
-                       rcu_assign_pointer(e->info, einfo);
+                       einfo_old = rcu_replace_pointer(e->info, einfo,
+                                                       lockdep_is_held(&hwsim_phys_lock));
                        rcu_read_unlock();
+                       kfree_rcu(einfo_old, rcu);
                        mutex_unlock(&hwsim_phys_lock);
                        return 0;
                }
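The hwsim fix above swaps the edge info with rcu_replace_pointer() and releases the old object through kfree_rcu(), so concurrent readers holding rcu_read_lock() can keep using the pointer they already loaded while the replaced allocation is no longer leaked. A generic sketch of that publish-and-defer-free pattern (struct item and update_lock are assumptions):

#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/mutex.h>
#include <linux/errno.h>

struct item {
        int value;
        struct rcu_head rcu;
};

static struct item __rcu *current_item;
static DEFINE_MUTEX(update_lock);

static int replace_item(int new_value)
{
        struct item *new_it, *old_it;

        new_it = kzalloc(sizeof(*new_it), GFP_KERNEL);
        if (!new_it)
                return -ENOMEM;
        new_it->value = new_value;

        mutex_lock(&update_lock);
        /* publish the new object and get back the one it replaced */
        old_it = rcu_replace_pointer(current_item, new_it,
                                     lockdep_is_held(&update_lock));
        mutex_unlock(&update_lock);

        /* free the old object only after existing RCU readers are done */
        kfree_rcu(old_it, rcu);
        return 0;
}
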
index 2ee80ed..afa1d56 100644 (file)
@@ -119,7 +119,7 @@ enum ipa_status_field_id {
 };
 
 /* Size in bytes of an IPA packet status structure */
-#define IPA_STATUS_SIZE                        sizeof(__le32[4])
+#define IPA_STATUS_SIZE                        sizeof(__le32[8])
 
 /* IPA status structure decoder; looks up field values for a structure */
 static u32 ipa_status_extract(struct ipa *ipa, const void *data,
index 460b3d4..ab5133e 100644 (file)
@@ -436,6 +436,9 @@ static int ipvlan_process_v4_outbound(struct sk_buff *skb)
                goto err;
        }
        skb_dst_set(skb, &rt->dst);
+
+       memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
+
        err = ip_local_out(net, skb->sk, skb);
        if (unlikely(net_xmit_eval(err)))
                dev->stats.tx_errors++;
@@ -474,6 +477,9 @@ static int ipvlan_process_v6_outbound(struct sk_buff *skb)
                goto err;
        }
        skb_dst_set(skb, dst);
+
+       memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
+
        err = ip6_local_out(net, skb->sk, skb);
        if (unlikely(net_xmit_eval(err)))
                dev->stats.tx_errors++;
index 71712ea..d5b05e8 100644 (file)
@@ -102,6 +102,10 @@ static unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb,
 
        skb->dev = addr->master->dev;
        skb->skb_iif = skb->dev->ifindex;
+#if IS_ENABLED(CONFIG_IPV6)
+       if (addr->atype == IPVL_IPV6)
+               IP6CB(skb)->iif = skb->dev->ifindex;
+#endif
        len = skb->len + ETH_HLEN;
        ipvlan_count_rx(addr->master, len, true, false);
 out:
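Both ipvlan outbound paths now zero the skb control block before calling ip_local_out()/ip6_local_out(): IPCB()/IP6CB() overlay skb->cb, so whatever the earlier layer stored there would otherwise be interpreted as IP output state. The same idea in isolation (the wrapper name is an assumption):

#include <linux/skbuff.h>
#include <linux/string.h>
#include <net/ip.h>

/* Clear the per-skb IPv4 control block before re-injecting the skb into
 * the IPv4 output path, so stale cb contents are not misread as
 * inet_skb_parm state.
 */
static void demo_reset_ipv4_cb(struct sk_buff *skb)
{
        memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
}
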
index 3427993..984dfa5 100644 (file)
@@ -3997,17 +3997,15 @@ static int macsec_add_dev(struct net_device *dev, sci_t sci, u8 icv_len)
                return -ENOMEM;
 
        secy->tx_sc.stats = netdev_alloc_pcpu_stats(struct pcpu_tx_sc_stats);
-       if (!secy->tx_sc.stats) {
-               free_percpu(macsec->stats);
+       if (!secy->tx_sc.stats)
                return -ENOMEM;
-       }
 
        secy->tx_sc.md_dst = metadata_dst_alloc(0, METADATA_MACSEC, GFP_KERNEL);
-       if (!secy->tx_sc.md_dst) {
-               free_percpu(secy->tx_sc.stats);
-               free_percpu(macsec->stats);
+       if (!secy->tx_sc.md_dst)
+               /* macsec and secy percpu stats will be freed when unregistering
+                * net_device in macsec_free_netdev()
+                */
                return -ENOMEM;
-       }
 
        if (sci == MACSEC_UNDEF_SCI)
                sci = dev_to_sci(dev, MACSEC_PORT_ES);
index 1e0c206..da2001e 100644 (file)
@@ -291,7 +291,8 @@ static int i2c_rollball_mii_cmd(struct mii_bus *bus, int bus_addr, u8 cmd,
        return i2c_transfer_rollball(i2c, msgs, ARRAY_SIZE(msgs));
 }
 
-static int i2c_mii_read_rollball(struct mii_bus *bus, int phy_id, int reg)
+static int i2c_mii_read_rollball(struct mii_bus *bus, int phy_id, int devad,
+                                int reg)
 {
        u8 buf[4], res[6];
        int bus_addr, ret;
@@ -302,7 +303,7 @@ static int i2c_mii_read_rollball(struct mii_bus *bus, int phy_id, int reg)
                return 0xffff;
 
        buf[0] = ROLLBALL_DATA_ADDR;
-       buf[1] = (reg >> 16) & 0x1f;
+       buf[1] = devad;
        buf[2] = (reg >> 8) & 0xff;
        buf[3] = reg & 0xff;
 
@@ -322,8 +323,8 @@ static int i2c_mii_read_rollball(struct mii_bus *bus, int phy_id, int reg)
        return val;
 }
 
-static int i2c_mii_write_rollball(struct mii_bus *bus, int phy_id, int reg,
-                                 u16 val)
+static int i2c_mii_write_rollball(struct mii_bus *bus, int phy_id, int devad,
+                                 int reg, u16 val)
 {
        int bus_addr, ret;
        u8 buf[6];
@@ -333,7 +334,7 @@ static int i2c_mii_write_rollball(struct mii_bus *bus, int phy_id, int reg,
                return 0;
 
        buf[0] = ROLLBALL_DATA_ADDR;
-       buf[1] = (reg >> 16) & 0x1f;
+       buf[1] = devad;
        buf[2] = (reg >> 8) & 0xff;
        buf[3] = reg & 0xff;
        buf[4] = val >> 8;
@@ -405,8 +406,8 @@ struct mii_bus *mdio_i2c_alloc(struct device *parent, struct i2c_adapter *i2c,
                        return ERR_PTR(ret);
                }
 
-               mii->read = i2c_mii_read_rollball;
-               mii->write = i2c_mii_write_rollball;
+               mii->read_c45 = i2c_mii_read_rollball;
+               mii->write_c45 = i2c_mii_write_rollball;
                break;
        default:
                mii->read = i2c_mii_read_default_c22;
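By registering the RollBall accessors as read_c45/write_c45, the MMD (devad) now arrives as its own argument instead of being carried in the upper bits of the register number as the old C22-style hook required. A sketch of how such Clause 45 callbacks attach to an mii_bus (the demo_* names are assumptions; the member signatures follow the current struct mii_bus):

#include <linux/phy.h>

static int demo_read_c45(struct mii_bus *bus, int addr, int devad, int regnum)
{
        /* issue a Clause 45 read for (addr, devad, regnum) ... */
        return 0;               /* 16-bit value, or negative errno */
}

static int demo_write_c45(struct mii_bus *bus, int addr, int devad, int regnum,
                          u16 val)
{
        /* issue the matching Clause 45 write ... */
        return 0;
}

static void demo_bus_setup(struct mii_bus *mii)
{
        mii->read_c45 = demo_read_c45;
        mii->write_c45 = demo_write_c45;
}
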
index 68fc559..554837c 100644 (file)
@@ -67,6 +67,7 @@ static int mvusb_mdio_probe(struct usb_interface *interface,
        struct device *dev = &interface->dev;
        struct mvusb_mdio *mvusb;
        struct mii_bus *mdio;
+       int ret;
 
        mdio = devm_mdiobus_alloc_size(dev, sizeof(*mvusb));
        if (!mdio)
@@ -87,7 +88,15 @@ static int mvusb_mdio_probe(struct usb_interface *interface,
        mdio->write = mvusb_mdio_write;
 
        usb_set_intfdata(interface, mvusb);
-       return of_mdiobus_register(mdio, dev->of_node);
+       ret = of_mdiobus_register(mdio, dev->of_node);
+       if (ret)
+               goto put_dev;
+
+       return 0;
+
+put_dev:
+       usb_put_dev(mvusb->udev);
+       return ret;
 }
 
 static void mvusb_mdio_disconnect(struct usb_interface *interface)
index 539cd43..72f25e7 100644 (file)
@@ -873,7 +873,7 @@ int xpcs_do_config(struct dw_xpcs *xpcs, phy_interface_t interface,
 
        switch (compat->an_mode) {
        case DW_AN_C73:
-               if (phylink_autoneg_inband(mode)) {
+               if (test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, advertising)) {
                        ret = xpcs_config_aneg_c73(xpcs, compat);
                        if (ret)
                                return ret;
@@ -1203,7 +1203,7 @@ static const struct xpcs_compat synopsys_xpcs_compat[DW_XPCS_INTERFACE_MAX] = {
        [DW_XPCS_2500BASEX] = {
                .supported = xpcs_2500basex_features,
                .interface = xpcs_2500basex_interfaces,
-               .num_interfaces = ARRAY_SIZE(xpcs_2500basex_features),
+               .num_interfaces = ARRAY_SIZE(xpcs_2500basex_interfaces),
                .an_mode = DW_2500BASEX,
        },
 };
index 9902fb1..729db44 100644 (file)
@@ -40,6 +40,11 @@ static inline int bcm_phy_write_exp_sel(struct phy_device *phydev,
        return bcm_phy_write_exp(phydev, reg | MII_BCM54XX_EXP_SEL_ER, val);
 }
 
+static inline int bcm_phy_read_exp_sel(struct phy_device *phydev, u16 reg)
+{
+       return bcm_phy_read_exp(phydev, reg | MII_BCM54XX_EXP_SEL_ER);
+}
+
 int bcm54xx_auxctl_write(struct phy_device *phydev, u16 regnum, u16 val);
 int bcm54xx_auxctl_read(struct phy_device *phydev, u16 regnum);
 
index 06be71e..f8c17a2 100644 (file)
@@ -486,7 +486,7 @@ static int bcm7xxx_16nm_ephy_afe_config(struct phy_device *phydev)
        bcm_phy_write_misc(phydev, 0x0038, 0x0002, 0xede0);
 
        /* Read CORE_EXPA9 */
-       tmp = bcm_phy_read_exp(phydev, 0x00a9);
+       tmp = bcm_phy_read_exp_sel(phydev, 0x00a9);
        /* CORE_EXPA9[6:1] is rcalcode[5:0] */
        rcalcode = (tmp & 0x7e) / 2;
        /* Correct RCAL code + 1 is -1% rprogr, LP: +16 */
index d75f526..e397e7d 100644 (file)
@@ -44,6 +44,7 @@
 #define DP83867_STRAP_STS1     0x006E
 #define DP83867_STRAP_STS2     0x006f
 #define DP83867_RGMIIDCTL      0x0086
+#define DP83867_DSP_FFE_CFG    0x012c
 #define DP83867_RXFCFG         0x0134
 #define DP83867_RXFPMD1        0x0136
 #define DP83867_RXFPMD2        0x0137
@@ -935,14 +936,33 @@ static int dp83867_phy_reset(struct phy_device *phydev)
 {
        int err;
 
-       err = phy_write(phydev, DP83867_CTRL, DP83867_SW_RESTART);
+       err = phy_write(phydev, DP83867_CTRL, DP83867_SW_RESET);
        if (err < 0)
                return err;
 
        usleep_range(10, 20);
 
-       return phy_modify(phydev, MII_DP83867_PHYCTRL,
+       err = phy_modify(phydev, MII_DP83867_PHYCTRL,
                         DP83867_PHYCR_FORCE_LINK_GOOD, 0);
+       if (err < 0)
+               return err;
+
+       /* Configure the DSP Feedforward Equalizer Configuration register to
+        * improve short cable (< 1 meter) performance. This will not affect
+        * long cable performance.
+        */
+       err = phy_write_mmd(phydev, DP83867_DEVADDR, DP83867_DSP_FFE_CFG,
+                           0x0e81);
+       if (err < 0)
+               return err;
+
+       err = phy_write(phydev, DP83867_CTRL, DP83867_SW_RESTART);
+       if (err < 0)
+               return err;
+
+       usleep_range(10, 20);
+
+       return 0;
 }
 
 static void dp83867_link_change_notify(struct phy_device *phydev)
index 389f33a..8b3618d 100644 (file)
@@ -1287,7 +1287,7 @@ EXPORT_SYMBOL_GPL(mdiobus_modify_changed);
  * @mask: bit mask of bits to clear
  * @set: bit mask of bits to set
  */
-int mdiobus_c45_modify_changed(struct mii_bus *bus, int devad, int addr,
+int mdiobus_c45_modify_changed(struct mii_bus *bus, int addr, int devad,
                               u32 regnum, u16 mask, u16 set)
 {
        int err;
index a50235f..defe5cc 100644 (file)
@@ -179,6 +179,7 @@ enum rgmii_clock_delay {
 #define VSC8502_RGMII_CNTL               20
 #define VSC8502_RGMII_RX_DELAY_MASK      0x0070
 #define VSC8502_RGMII_TX_DELAY_MASK      0x0007
+#define VSC8502_RGMII_RX_CLK_DISABLE     0x0800
 
 #define MSCC_PHY_WOL_LOWER_MAC_ADDR      21
 #define MSCC_PHY_WOL_MID_MAC_ADDR        22
@@ -276,6 +277,7 @@ enum rgmii_clock_delay {
 /* Microsemi PHY ID's
  *   Code assumes lowest nibble is 0
  */
+#define PHY_ID_VSC8501                   0x00070530
 #define PHY_ID_VSC8502                   0x00070630
 #define PHY_ID_VSC8504                   0x000704c0
 #define PHY_ID_VSC8514                   0x00070670
index 62bf99e..28df8a2 100644 (file)
@@ -519,16 +519,27 @@ out_unlock:
  *  * 2.0 ns (which causes the data to be sampled at exactly half way between
  *    clock transitions at 1000 Mbps) if delays should be enabled
  */
-static int vsc85xx_rgmii_set_skews(struct phy_device *phydev, u32 rgmii_cntl,
-                                  u16 rgmii_rx_delay_mask,
-                                  u16 rgmii_tx_delay_mask)
+static int vsc85xx_update_rgmii_cntl(struct phy_device *phydev, u32 rgmii_cntl,
+                                    u16 rgmii_rx_delay_mask,
+                                    u16 rgmii_tx_delay_mask)
 {
        u16 rgmii_rx_delay_pos = ffs(rgmii_rx_delay_mask) - 1;
        u16 rgmii_tx_delay_pos = ffs(rgmii_tx_delay_mask) - 1;
        u16 reg_val = 0;
-       int rc;
+       u16 mask = 0;
+       int rc = 0;
 
-       mutex_lock(&phydev->lock);
+       /* For traffic to pass, the VSC8502 family needs the RX_CLK disable bit
+        * to be unset for all PHY modes, so do that as part of the paged
+        * register modification.
+        * For some family members (like VSC8530/31/40/41) this bit is reserved
+        * and read-only, and the RX clock is enabled by default.
+        */
+       if (rgmii_cntl == VSC8502_RGMII_CNTL)
+               mask |= VSC8502_RGMII_RX_CLK_DISABLE;
+
+       if (phy_interface_is_rgmii(phydev))
+               mask |= rgmii_rx_delay_mask | rgmii_tx_delay_mask;
 
        if (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID ||
            phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
@@ -537,31 +548,20 @@ static int vsc85xx_rgmii_set_skews(struct phy_device *phydev, u32 rgmii_cntl,
            phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
                reg_val |= RGMII_CLK_DELAY_2_0_NS << rgmii_tx_delay_pos;
 
-       rc = phy_modify_paged(phydev, MSCC_PHY_PAGE_EXTENDED_2,
-                             rgmii_cntl,
-                             rgmii_rx_delay_mask | rgmii_tx_delay_mask,
-                             reg_val);
-
-       mutex_unlock(&phydev->lock);
+       if (mask)
+               rc = phy_modify_paged(phydev, MSCC_PHY_PAGE_EXTENDED_2,
+                                     rgmii_cntl, mask, reg_val);
 
        return rc;
 }
 
 static int vsc85xx_default_config(struct phy_device *phydev)
 {
-       int rc;
-
        phydev->mdix_ctrl = ETH_TP_MDI_AUTO;
 
-       if (phy_interface_mode_is_rgmii(phydev->interface)) {
-               rc = vsc85xx_rgmii_set_skews(phydev, VSC8502_RGMII_CNTL,
-                                            VSC8502_RGMII_RX_DELAY_MASK,
-                                            VSC8502_RGMII_TX_DELAY_MASK);
-               if (rc)
-                       return rc;
-       }
-
-       return 0;
+       return vsc85xx_update_rgmii_cntl(phydev, VSC8502_RGMII_CNTL,
+                                        VSC8502_RGMII_RX_DELAY_MASK,
+                                        VSC8502_RGMII_TX_DELAY_MASK);
 }
 
 static int vsc85xx_get_tunable(struct phy_device *phydev,
@@ -1758,13 +1758,11 @@ static int vsc8584_config_init(struct phy_device *phydev)
        if (ret)
                return ret;
 
-       if (phy_interface_is_rgmii(phydev)) {
-               ret = vsc85xx_rgmii_set_skews(phydev, VSC8572_RGMII_CNTL,
-                                             VSC8572_RGMII_RX_DELAY_MASK,
-                                             VSC8572_RGMII_TX_DELAY_MASK);
-               if (ret)
-                       return ret;
-       }
+       ret = vsc85xx_update_rgmii_cntl(phydev, VSC8572_RGMII_CNTL,
+                                       VSC8572_RGMII_RX_DELAY_MASK,
+                                       VSC8572_RGMII_TX_DELAY_MASK);
+       if (ret)
+               return ret;
 
        ret = genphy_soft_reset(phydev);
        if (ret)
@@ -2317,6 +2315,30 @@ static int vsc85xx_probe(struct phy_device *phydev)
 /* Microsemi VSC85xx PHYs */
 static struct phy_driver vsc85xx_driver[] = {
 {
+       .phy_id         = PHY_ID_VSC8501,
+       .name           = "Microsemi GE VSC8501 SyncE",
+       .phy_id_mask    = 0xfffffff0,
+       /* PHY_BASIC_FEATURES */
+       .soft_reset     = &genphy_soft_reset,
+       .config_init    = &vsc85xx_config_init,
+       .config_aneg    = &vsc85xx_config_aneg,
+       .read_status    = &vsc85xx_read_status,
+       .handle_interrupt = vsc85xx_handle_interrupt,
+       .config_intr    = &vsc85xx_config_intr,
+       .suspend        = &genphy_suspend,
+       .resume         = &genphy_resume,
+       .probe          = &vsc85xx_probe,
+       .set_wol        = &vsc85xx_wol_set,
+       .get_wol        = &vsc85xx_wol_get,
+       .get_tunable    = &vsc85xx_get_tunable,
+       .set_tunable    = &vsc85xx_set_tunable,
+       .read_page      = &vsc85xx_phy_read_page,
+       .write_page     = &vsc85xx_phy_write_page,
+       .get_sset_count = &vsc85xx_get_sset_count,
+       .get_strings    = &vsc85xx_get_strings,
+       .get_stats      = &vsc85xx_get_stats,
+},
+{
        .phy_id         = PHY_ID_VSC8502,
        .name           = "Microsemi GE VSC8502 SyncE",
        .phy_id_mask    = 0xfffffff0,
@@ -2656,6 +2678,8 @@ static struct phy_driver vsc85xx_driver[] = {
 module_phy_driver(vsc85xx_driver);
 
 static struct mdio_device_id __maybe_unused vsc85xx_tbl[] = {
+       { PHY_ID_VSC8501, 0xfffffff0, },
+       { PHY_ID_VSC8502, 0xfffffff0, },
        { PHY_ID_VSC8504, 0xfffffff0, },
        { PHY_ID_VSC8514, 0xfffffff0, },
        { PHY_ID_VSC8530, 0xfffffff0, },
index 6301a9a..ea1073a 100644 (file)
@@ -274,13 +274,6 @@ static int gpy_config_init(struct phy_device *phydev)
        return ret < 0 ? ret : 0;
 }
 
-static bool gpy_has_broken_mdint(struct phy_device *phydev)
-{
-       /* At least these PHYs are known to have broken interrupt handling */
-       return phydev->drv->phy_id == PHY_ID_GPY215B ||
-              phydev->drv->phy_id == PHY_ID_GPY215C;
-}
-
 static int gpy_probe(struct phy_device *phydev)
 {
        struct device *dev = &phydev->mdio.dev;
@@ -300,8 +293,7 @@ static int gpy_probe(struct phy_device *phydev)
        phydev->priv = priv;
        mutex_init(&priv->mbox_lock);
 
-       if (gpy_has_broken_mdint(phydev) &&
-           !device_property_present(dev, "maxlinear,use-broken-interrupts"))
+       if (!device_property_present(dev, "maxlinear,use-broken-interrupts"))
                phydev->dev_flags |= PHY_F_NO_IRQ;
 
        fw_version = phy_read(phydev, PHY_FWV);
@@ -659,11 +651,9 @@ static irqreturn_t gpy_handle_interrupt(struct phy_device *phydev)
         * frame. Therefore, polling is the best we can do and won't do any more
         * harm.
         * It was observed that this bug happens on link state and link speed
-        * changes on a GPY215B and GYP215C independent of the firmware version
-        * (which doesn't mean that this list is exhaustive).
+        * changes independent of the firmware version.
         */
-       if (gpy_has_broken_mdint(phydev) &&
-           (reg & (PHY_IMASK_LSTC | PHY_IMASK_LSPC))) {
+       if (reg & (PHY_IMASK_LSTC | PHY_IMASK_LSPC)) {
                reg = gpy_mbox_read(phydev, REG_GPIO0_OUT);
                if (reg < 0) {
                        phy_error(phydev);
index 17d0d05..5359821 100644 (file)
@@ -3021,6 +3021,15 @@ static int phy_led_blink_set(struct led_classdev *led_cdev,
        return err;
 }
 
+static void phy_leds_unregister(struct phy_device *phydev)
+{
+       struct phy_led *phyled;
+
+       list_for_each_entry(phyled, &phydev->leds, list) {
+               led_classdev_unregister(&phyled->led_cdev);
+       }
+}
+
 static int of_phy_led(struct phy_device *phydev,
                      struct device_node *led)
 {
@@ -3054,7 +3063,7 @@ static int of_phy_led(struct phy_device *phydev,
        init_data.fwnode = of_fwnode_handle(led);
        init_data.devname_mandatory = true;
 
-       err = devm_led_classdev_register_ext(dev, cdev, &init_data);
+       err = led_classdev_register_ext(dev, cdev, &init_data);
        if (err)
                return err;
 
@@ -3083,6 +3092,7 @@ static int of_phy_leds(struct phy_device *phydev)
                err = of_phy_led(phydev, led);
                if (err) {
                        of_node_put(led);
+                       phy_leds_unregister(phydev);
                        return err;
                }
        }
@@ -3305,6 +3315,9 @@ static int phy_remove(struct device *dev)
 
        cancel_delayed_work_sync(&phydev->state_queue);
 
+       if (IS_ENABLED(CONFIG_PHYLIB_LEDS))
+               phy_leds_unregister(phydev);
+
        phydev->state = PHY_DOWN;
 
        sfp_bus_del_upstream(phydev->sfp_bus);
index a4111f1..5efdeb5 100644 (file)
@@ -188,6 +188,7 @@ static int phylink_interface_max_speed(phy_interface_t interface)
        case PHY_INTERFACE_MODE_RGMII_ID:
        case PHY_INTERFACE_MODE_RGMII:
        case PHY_INTERFACE_MODE_QSGMII:
+       case PHY_INTERFACE_MODE_QUSGMII:
        case PHY_INTERFACE_MODE_SGMII:
        case PHY_INTERFACE_MODE_GMII:
                return SPEED_1000;
@@ -204,7 +205,6 @@ static int phylink_interface_max_speed(phy_interface_t interface)
        case PHY_INTERFACE_MODE_10GBASER:
        case PHY_INTERFACE_MODE_10GKR:
        case PHY_INTERFACE_MODE_USXGMII:
-       case PHY_INTERFACE_MODE_QUSGMII:
                return SPEED_10000;
 
        case PHY_INTERFACE_MODE_25GBASER:
@@ -2226,6 +2226,12 @@ int phylink_ethtool_ksettings_set(struct phylink *pl,
        ASSERT_RTNL();
 
        if (pl->phydev) {
+               struct ethtool_link_ksettings phy_kset = *kset;
+
+               linkmode_and(phy_kset.link_modes.advertising,
+                            phy_kset.link_modes.advertising,
+                            pl->supported);
+
                /* We can rely on phylib for this update; we also do not need
                 * to update the pl->link_config settings:
                 * - the configuration returned via ksettings_get() will come
@@ -2244,11 +2250,10 @@ int phylink_ethtool_ksettings_set(struct phylink *pl,
                 *   the presence of a PHY, this should not be changed as that
                 *   should be determined from the media side advertisement.
                 */
-               return phy_ethtool_ksettings_set(pl->phydev, kset);
+               return phy_ethtool_ksettings_set(pl->phydev, &phy_kset);
        }
 
        config = pl->link_config;
-
        /* Mask out unsupported advertisements */
        linkmode_and(config.advertising, kset->link_modes.advertising,
                     pl->supported);
@@ -3294,6 +3299,41 @@ void phylink_decode_usxgmii_word(struct phylink_link_state *state,
 EXPORT_SYMBOL_GPL(phylink_decode_usxgmii_word);
 
 /**
+ * phylink_decode_usgmii_word() - decode the USGMII word from a MAC PCS
+ * @state: a pointer to a struct phylink_link_state.
+ * @lpa: a 16 bit value which stores the USGMII auto-negotiation word
+ *
+ * Helper for MAC PCS supporting the USGMII protocol and the auto-negotiation
+ * code word.  Decode the USGMII code word and populate the corresponding fields
+ * (speed, duplex) into the phylink_link_state structure. The structure for this
+ * word is the same as the USXGMII word, except it only supports speeds up to
+ * 1Gbps.
+ */
+static void phylink_decode_usgmii_word(struct phylink_link_state *state,
+                                      uint16_t lpa)
+{
+       switch (lpa & MDIO_USXGMII_SPD_MASK) {
+       case MDIO_USXGMII_10:
+               state->speed = SPEED_10;
+               break;
+       case MDIO_USXGMII_100:
+               state->speed = SPEED_100;
+               break;
+       case MDIO_USXGMII_1000:
+               state->speed = SPEED_1000;
+               break;
+       default:
+               state->link = false;
+               return;
+       }
+
+       if (lpa & MDIO_USXGMII_FULL_DUPLEX)
+               state->duplex = DUPLEX_FULL;
+       else
+               state->duplex = DUPLEX_HALF;
+}
+
+/**
  * phylink_mii_c22_pcs_decode_state() - Decode MAC PCS state from MII registers
  * @state: a pointer to a &struct phylink_link_state.
  * @bmsr: The value of the %MII_BMSR register
@@ -3330,9 +3370,11 @@ void phylink_mii_c22_pcs_decode_state(struct phylink_link_state *state,
 
        case PHY_INTERFACE_MODE_SGMII:
        case PHY_INTERFACE_MODE_QSGMII:
-       case PHY_INTERFACE_MODE_QUSGMII:
                phylink_decode_sgmii_word(state, lpa);
                break;
+       case PHY_INTERFACE_MODE_QUSGMII:
+               phylink_decode_usgmii_word(state, lpa);
+               break;
 
        default:
                state->link = false;
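As the new phylink_decode_usgmii_word() kernel-doc notes, the QUSGMII word reuses the USXGMII layout but only up to 1G: for example, an lpa whose MDIO_USXGMII_SPD_MASK field is MDIO_USXGMII_1000 with MDIO_USXGMII_FULL_DUPLEX set decodes to SPEED_1000 and DUPLEX_FULL, while an unrecognised speed field clears state->link.
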
index ce993cc..d30d730 100644 (file)
@@ -742,7 +742,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 
        /* Move network header to the right position for VLAN tagged packets */
        if (eth_type_vlan(skb->protocol) &&
-           __vlan_get_protocol(skb, skb->protocol, &depth) != 0)
+           vlan_get_protocol_and_depth(skb, skb->protocol, &depth) != 0)
                skb_set_network_header(skb, depth);
 
        /* copy skb_ubuf_info for callback when skb has no error */
@@ -1197,7 +1197,7 @@ static int tap_get_user_xdp(struct tap_queue *q, struct xdp_buff *xdp)
 
        /* Move network header to the right position for VLAN tagged packets */
        if (eth_type_vlan(skb->protocol) &&
-           __vlan_get_protocol(skb, skb->protocol, &depth) != 0)
+           vlan_get_protocol_and_depth(skb, skb->protocol, &depth) != 0)
                skb_set_network_header(skb, depth);
 
        rcu_read_lock();
index d10606f..555b0b1 100644 (file)
@@ -1629,6 +1629,7 @@ static int team_init(struct net_device *dev)
 
        team->dev = dev;
        team_set_no_mode(team);
+       team->notifier_ctx = false;
 
        team->pcpu_stats = netdev_alloc_pcpu_stats(struct team_pcpu_stats);
        if (!team->pcpu_stats)
@@ -3022,7 +3023,11 @@ static int team_device_event(struct notifier_block *unused,
                team_del_slave(port->team->dev, dev);
                break;
        case NETDEV_FEAT_CHANGE:
-               team_compute_features(port->team);
+               if (!port->team->notifier_ctx) {
+                       port->team->notifier_ctx = true;
+                       team_compute_features(port->team);
+                       port->team->notifier_ctx = false;
+               }
                break;
        case NETDEV_PRECHANGEMTU:
                /* Forbid to change mtu of underlaying device */
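The new notifier_ctx flag looks like a plain re-entrancy guard: recomputing the team's features can itself raise another NETDEV_FEAT_CHANGE event, and the flag makes the handler ignore the notification it triggered instead of recursing. Reduced to its essentials (names are placeholders):

#include <linux/types.h>

struct demo_team {
        bool in_feat_change;    /* true while we are handling our own event */
};

static void demo_recompute_features(struct demo_team *t)
{
        /* recompute and propagate features; in the real driver this
         * can raise a further NETDEV_FEAT_CHANGE notification
         */
}

static void demo_on_feat_change(struct demo_team *t)
{
        if (t->in_feat_change)
                return;                 /* our own recompute: ignore */

        t->in_feat_change = true;
        demo_recompute_features(t);
        t->in_feat_change = false;
}
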
index d4d0a41..d75456a 100644 (file)
@@ -1977,6 +1977,14 @@ napi_busy:
                int queue_len;
 
                spin_lock_bh(&queue->lock);
+
+               if (unlikely(tfile->detached)) {
+                       spin_unlock_bh(&queue->lock);
+                       rcu_read_unlock();
+                       err = -EBUSY;
+                       goto free_skb;
+               }
+
                __skb_queue_tail(queue, skb);
                queue_len = skb_queue_len(queue);
                spin_unlock(&queue->lock);
@@ -2512,6 +2520,13 @@ build:
        if (tfile->napi_enabled) {
                queue = &tfile->sk.sk_write_queue;
                spin_lock(&queue->lock);
+
+               if (unlikely(tfile->detached)) {
+                       spin_unlock(&queue->lock);
+                       kfree_skb(skb);
+                       return -EBUSY;
+               }
+
                __skb_queue_tail(queue, skb);
                spin_unlock(&queue->lock);
                ret = 1;
index 6ce8f4f..db05622 100644 (file)
@@ -181,9 +181,12 @@ static u32 cdc_ncm_check_tx_max(struct usbnet *dev, u32 new_tx)
        else
                min = ctx->max_datagram_size + ctx->max_ndp_size + sizeof(struct usb_cdc_ncm_nth32);
 
-       max = min_t(u32, CDC_NCM_NTB_MAX_SIZE_TX, le32_to_cpu(ctx->ncm_parm.dwNtbOutMaxSize));
-       if (max == 0)
+       if (le32_to_cpu(ctx->ncm_parm.dwNtbOutMaxSize) == 0)
                max = CDC_NCM_NTB_MAX_SIZE_TX; /* dwNtbOutMaxSize not set */
+       else
+               max = clamp_t(u32, le32_to_cpu(ctx->ncm_parm.dwNtbOutMaxSize),
+                             USB_CDC_NCM_NTB_MIN_OUT_SIZE,
+                             CDC_NCM_NTB_MAX_SIZE_TX);
 
        /* some devices set dwNtbOutMaxSize too low for the above default */
        min = min(min, max);
@@ -1244,6 +1247,9 @@ cdc_ncm_fill_tx_frame(struct usbnet *dev, struct sk_buff *skb, __le32 sign)
                         * further.
                         */
                        if (skb_out == NULL) {
+                               /* If even the smallest allocation fails, abort. */
+                               if (ctx->tx_curr_size == USB_CDC_NCM_NTB_MIN_OUT_SIZE)
+                                       goto alloc_failed;
                                ctx->tx_low_mem_max_cnt = min(ctx->tx_low_mem_max_cnt + 1,
                                                              (unsigned)CDC_NCM_LOW_MEM_MAX_CNT);
                                ctx->tx_low_mem_val = ctx->tx_low_mem_max_cnt;
@@ -1262,13 +1268,8 @@ cdc_ncm_fill_tx_frame(struct usbnet *dev, struct sk_buff *skb, __le32 sign)
                        skb_out = alloc_skb(ctx->tx_curr_size, GFP_ATOMIC);
 
                        /* No allocation possible so we will abort */
-                       if (skb_out == NULL) {
-                               if (skb != NULL) {
-                                       dev_kfree_skb_any(skb);
-                                       dev->net->stats.tx_dropped++;
-                               }
-                               goto exit_no_skb;
-                       }
+                       if (!skb_out)
+                               goto alloc_failed;
                        ctx->tx_low_mem_val--;
                }
                if (ctx->is_ndp16) {
@@ -1461,6 +1462,11 @@ cdc_ncm_fill_tx_frame(struct usbnet *dev, struct sk_buff *skb, __le32 sign)
 
        return skb_out;
 
+alloc_failed:
+       if (skb) {
+               dev_kfree_skb_any(skb);
+               dev->net->stats.tx_dropped++;
+       }
 exit_no_skb:
        /* Start timer, if there is a remaining non-empty skb */
        if (ctx->tx_curr_skb != NULL && n > 0)
index 571e37e..2e7c7b0 100644 (file)
@@ -1220,7 +1220,9 @@ static const struct usb_device_id products[] = {
        {QMI_FIXED_INTF(0x05c6, 0x9080, 8)},
        {QMI_FIXED_INTF(0x05c6, 0x9083, 3)},
        {QMI_FIXED_INTF(0x05c6, 0x9084, 4)},
+       {QMI_QUIRK_SET_DTR(0x05c6, 0x9091, 2)}, /* Compal RXM-G1 */
        {QMI_FIXED_INTF(0x05c6, 0x90b2, 3)},    /* ublox R410M */
+       {QMI_QUIRK_SET_DTR(0x05c6, 0x90db, 2)}, /* Compal RXM-G1 */
        {QMI_FIXED_INTF(0x05c6, 0x920d, 0)},
        {QMI_FIXED_INTF(0x05c6, 0x920d, 5)},
        {QMI_QUIRK_SET_DTR(0x05c6, 0x9625, 4)}, /* YUGA CLM920-NC5 */
@@ -1325,7 +1327,7 @@ static const struct usb_device_id products[] = {
        {QMI_FIXED_INTF(0x2001, 0x7e3d, 4)},    /* D-Link DWM-222 A2 */
        {QMI_FIXED_INTF(0x2020, 0x2031, 4)},    /* Olicard 600 */
        {QMI_FIXED_INTF(0x2020, 0x2033, 4)},    /* BroadMobi BM806U */
-       {QMI_FIXED_INTF(0x2020, 0x2060, 4)},    /* BroadMobi BM818 */
+       {QMI_QUIRK_SET_DTR(0x2020, 0x2060, 4)}, /* BroadMobi BM818 */
        {QMI_FIXED_INTF(0x0f3d, 0x68a2, 8)},    /* Sierra Wireless MC7700 */
        {QMI_FIXED_INTF(0x114f, 0x68a2, 8)},    /* Sierra Wireless MC7750 */
        {QMI_FIXED_INTF(0x1199, 0x68a2, 8)},    /* Sierra Wireless MC7710 in QMI mode */
index a12ae26..486b584 100644 (file)
@@ -205,6 +205,8 @@ struct control_buf {
        __virtio16 vid;
        __virtio64 offloads;
        struct virtio_net_ctrl_rss rss;
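+       /* buffers for VIRTIO_NET_CTRL_NOTF_COAL control commands */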
+       struct virtio_net_ctrl_coal_tx coal_tx;
+       struct virtio_net_ctrl_coal_rx coal_rx;
 };
 
 struct virtnet_info {
@@ -1868,6 +1870,38 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
        return received;
 }
 
+static void virtnet_disable_queue_pair(struct virtnet_info *vi, int qp_index)
+{
+       virtnet_napi_tx_disable(&vi->sq[qp_index].napi);
+       napi_disable(&vi->rq[qp_index].napi);
+       xdp_rxq_info_unreg(&vi->rq[qp_index].xdp_rxq);
+}
+
+static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index)
+{
+       struct net_device *dev = vi->dev;
+       int err;
+
+       err = xdp_rxq_info_reg(&vi->rq[qp_index].xdp_rxq, dev, qp_index,
+                              vi->rq[qp_index].napi.napi_id);
+       if (err < 0)
+               return err;
+
+       err = xdp_rxq_info_reg_mem_model(&vi->rq[qp_index].xdp_rxq,
+                                        MEM_TYPE_PAGE_SHARED, NULL);
+       if (err < 0)
+               goto err_xdp_reg_mem_model;
+
+       virtnet_napi_enable(vi->rq[qp_index].vq, &vi->rq[qp_index].napi);
+       virtnet_napi_tx_enable(vi, vi->sq[qp_index].vq, &vi->sq[qp_index].napi);
+
+       return 0;
+
+err_xdp_reg_mem_model:
+       xdp_rxq_info_unreg(&vi->rq[qp_index].xdp_rxq);
+       return err;
+}
+
 static int virtnet_open(struct net_device *dev)
 {
        struct virtnet_info *vi = netdev_priv(dev);
@@ -1881,22 +1915,20 @@ static int virtnet_open(struct net_device *dev)
                        if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
                                schedule_delayed_work(&vi->refill, 0);
 
-               err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i, vi->rq[i].napi.napi_id);
+               err = virtnet_enable_queue_pair(vi, i);
                if (err < 0)
-                       return err;
-
-               err = xdp_rxq_info_reg_mem_model(&vi->rq[i].xdp_rxq,
-                                                MEM_TYPE_PAGE_SHARED, NULL);
-               if (err < 0) {
-                       xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
-                       return err;
-               }
-
-               virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
-               virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
+                       goto err_enable_qp;
        }
 
        return 0;
+
+err_enable_qp:
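+       /* undo what was set up so far: stop refill and disable the enabled queue pairs */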
+       disable_delayed_refill(vi);
+       cancel_delayed_work_sync(&vi->refill);
+
+       for (i--; i >= 0; i--)
+               virtnet_disable_queue_pair(vi, i);
+       return err;
 }
 
 static int virtnet_poll_tx(struct napi_struct *napi, int budget)
@@ -2305,11 +2337,8 @@ static int virtnet_close(struct net_device *dev)
        /* Make sure refill_work doesn't re-enable napi! */
        cancel_delayed_work_sync(&vi->refill);
 
-       for (i = 0; i < vi->max_queue_pairs; i++) {
-               virtnet_napi_tx_disable(&vi->sq[i].napi);
-               napi_disable(&vi->rq[i].napi);
-               xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
-       }
+       for (i = 0; i < vi->max_queue_pairs; i++)
+               virtnet_disable_queue_pair(vi, i);
 
        return 0;
 }
@@ -2907,12 +2936,10 @@ static int virtnet_send_notf_coal_cmds(struct virtnet_info *vi,
                                       struct ethtool_coalesce *ec)
 {
        struct scatterlist sgs_tx, sgs_rx;
-       struct virtio_net_ctrl_coal_tx coal_tx;
-       struct virtio_net_ctrl_coal_rx coal_rx;
 
-       coal_tx.tx_usecs = cpu_to_le32(ec->tx_coalesce_usecs);
-       coal_tx.tx_max_packets = cpu_to_le32(ec->tx_max_coalesced_frames);
-       sg_init_one(&sgs_tx, &coal_tx, sizeof(coal_tx));
+       vi->ctrl->coal_tx.tx_usecs = cpu_to_le32(ec->tx_coalesce_usecs);
+       vi->ctrl->coal_tx.tx_max_packets = cpu_to_le32(ec->tx_max_coalesced_frames);
+       sg_init_one(&sgs_tx, &vi->ctrl->coal_tx, sizeof(vi->ctrl->coal_tx));
 
        if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL,
                                  VIRTIO_NET_CTRL_NOTF_COAL_TX_SET,
@@ -2923,9 +2950,9 @@ static int virtnet_send_notf_coal_cmds(struct virtnet_info *vi,
        vi->tx_usecs = ec->tx_coalesce_usecs;
        vi->tx_max_packets = ec->tx_max_coalesced_frames;
 
-       coal_rx.rx_usecs = cpu_to_le32(ec->rx_coalesce_usecs);
-       coal_rx.rx_max_packets = cpu_to_le32(ec->rx_max_coalesced_frames);
-       sg_init_one(&sgs_rx, &coal_rx, sizeof(coal_rx));
+       vi->ctrl->coal_rx.rx_usecs = cpu_to_le32(ec->rx_coalesce_usecs);
+       vi->ctrl->coal_rx.rx_max_packets = cpu_to_le32(ec->rx_max_coalesced_frames);
+       sg_init_one(&sgs_rx, &vi->ctrl->coal_rx, sizeof(vi->ctrl->coal_rx));
 
        if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL,
                                  VIRTIO_NET_CTRL_NOTF_COAL_RX_SET,
index d62a904..56326f3 100644 (file)
@@ -384,6 +384,9 @@ static int lapbeth_new_device(struct net_device *dev)
 
        ASSERT_RTNL();
 
+       if (dev->type != ARPHRD_ETHER)
+               return -EINVAL;
+
        ndev = alloc_netdev(sizeof(*lapbeth), "lapb%d", NET_NAME_UNKNOWN,
                            lapbeth_setup);
        if (!ndev)
index 038c590..52c1a3d 100644 (file)
@@ -1082,8 +1082,7 @@ int ath10k_qmi_init(struct ath10k *ar, u32 msa_size)
        if (ret)
                goto err;
 
-       qmi->event_wq = alloc_workqueue("ath10k_qmi_driver_event",
-                                       WQ_UNBOUND, 1);
+       qmi->event_wq = alloc_ordered_workqueue("ath10k_qmi_driver_event", 0);
        if (!qmi->event_wq) {
                ath10k_err(ar, "failed to allocate workqueue\n");
                ret = -EFAULT;
index ab923e2..26b252e 100644 (file)
@@ -3256,8 +3256,7 @@ int ath11k_qmi_init_service(struct ath11k_base *ab)
                return ret;
        }
 
-       ab->qmi.event_wq = alloc_workqueue("ath11k_qmi_driver_event",
-                                          WQ_UNBOUND, 1);
+       ab->qmi.event_wq = alloc_ordered_workqueue("ath11k_qmi_driver_event", 0);
        if (!ab->qmi.event_wq) {
                ath11k_err(ab, "failed to allocate workqueue\n");
                return -EFAULT;
index 03ba245..0a7892b 100644 (file)
@@ -3056,8 +3056,7 @@ int ath12k_qmi_init_service(struct ath12k_base *ab)
                return ret;
        }
 
-       ab->qmi.event_wq = alloc_workqueue("ath12k_qmi_driver_event",
-                                          WQ_UNBOUND, 1);
+       ab->qmi.event_wq = alloc_ordered_workqueue("ath12k_qmi_driver_event", 0);
        if (!ab->qmi.event_wq) {
                ath12k_err(ab, "failed to allocate workqueue\n");
                return -EFAULT;
index 9fc7c08..67b4bac 100644 (file)
@@ -651,7 +651,7 @@ struct b43_iv {
        union {
                __be16 d16;
                __be32 d32;
-       } data __packed;
+       } __packed data;
 } __packed;
 
 
index 6b0cec4..f49365d 100644 (file)
@@ -379,7 +379,7 @@ struct b43legacy_iv {
        union {
                __be16 d16;
                __be32 d32;
-       } data __packed;
+       } __packed data;
 } __packed;
 
 #define B43legacy_PHYMODE(phytype)     (1 << (phytype))
index ff710b0..00679a9 100644 (file)
@@ -1039,6 +1039,11 @@ static int brcmf_ops_sdio_probe(struct sdio_func *func,
        struct brcmf_sdio_dev *sdiodev;
        struct brcmf_bus *bus_if;
 
+       if (!id) {
+               dev_err(&func->dev, "Error no sdio_device_id passed for %x:%x\n", func->vendor, func->device);
+               return -ENODEV;
+       }
+
        brcmf_dbg(SDIO, "Enter\n");
        brcmf_dbg(SDIO, "Class=%x\n", func->class);
        brcmf_dbg(SDIO, "sdio vendor ID: 0x%04x\n", func->vendor);
index 59f3e9c..8022068 100644 (file)
@@ -2394,6 +2394,9 @@ static void brcmf_pcie_debugfs_create(struct device *dev)
 }
 #endif
 
+/* Forward declaration for pci_match_id() call */
+static const struct pci_device_id brcmf_pcie_devid_table[];
+
 static int
 brcmf_pcie_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
@@ -2404,6 +2407,14 @@ brcmf_pcie_probe(struct pci_dev *pdev, const struct pci_device_id *id)
        struct brcmf_core *core;
        struct brcmf_bus *bus;
 
+       if (!id) {
+               id = pci_match_id(brcmf_pcie_devid_table, pdev);
+               if (!id) {
+                       pci_err(pdev, "Error could not find pci_device_id for %x:%x\n", pdev->vendor, pdev->device);
+                       return -ENODEV;
+               }
+       }
+
        brcmf_dbg(PCIE, "Enter %x:%x\n", pdev->vendor, pdev->device);
 
        ret = -ENOMEM;
index 246843a..2178675 100644 (file)
@@ -1331,6 +1331,9 @@ brcmf_usb_disconnect_cb(struct brcmf_usbdev_info *devinfo)
        brcmf_usb_detach(devinfo);
 }
 
+/* Forward declaration for usb_match_id() call */
+static const struct usb_device_id brcmf_usb_devid_table[];
+
 static int
 brcmf_usb_probe(struct usb_interface *intf, const struct usb_device_id *id)
 {
@@ -1342,6 +1345,14 @@ brcmf_usb_probe(struct usb_interface *intf, const struct usb_device_id *id)
        u32 num_of_eps;
        u8 endpoint_num, ep;
 
+       if (!id) {
+               id = usb_match_id(intf, brcmf_usb_devid_table);
+               if (!id) {
+                       dev_err(&intf->dev, "Error could not find matching usb_device_id\n");
+                       return -ENODEV;
+               }
+       }
+
        brcmf_dbg(USB, "Enter 0x%04x:0x%04x\n", id->idVendor, id->idProduct);
 
        devinfo = kzalloc(sizeof(*devinfo), GFP_ATOMIC);
index 5f4a513..cb9181f 100644 (file)
@@ -38,7 +38,7 @@ static const struct dmi_system_id dmi_ppag_approved_list[] = {
        },
        { .ident = "ASUS",
          .matches = {
-                       DMI_MATCH(DMI_SYS_VENDOR, "ASUSTek COMPUTER INC."),
+                       DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
                },
        },
        {}
index d9faaae..5521997 100644 (file)
@@ -1664,14 +1664,10 @@ static __le32 iwl_get_mon_reg(struct iwl_fw_runtime *fwrt, u32 alloc_id,
 }
 
 static void *
-iwl_dump_ini_mon_fill_header(struct iwl_fw_runtime *fwrt,
-                            struct iwl_dump_ini_region_data *reg_data,
+iwl_dump_ini_mon_fill_header(struct iwl_fw_runtime *fwrt, u32 alloc_id,
                             struct iwl_fw_ini_monitor_dump *data,
                             const struct iwl_fw_mon_regs *addrs)
 {
-       struct iwl_fw_ini_region_tlv *reg = (void *)reg_data->reg_tlv->data;
-       u32 alloc_id = le32_to_cpu(reg->dram_alloc_id);
-
        if (!iwl_trans_grab_nic_access(fwrt->trans)) {
                IWL_ERR(fwrt, "Failed to get monitor header\n");
                return NULL;
@@ -1702,8 +1698,10 @@ iwl_dump_ini_mon_dram_fill_header(struct iwl_fw_runtime *fwrt,
                                  void *data, u32 data_len)
 {
        struct iwl_fw_ini_monitor_dump *mon_dump = (void *)data;
+       struct iwl_fw_ini_region_tlv *reg = (void *)reg_data->reg_tlv->data;
+       u32 alloc_id = le32_to_cpu(reg->dram_alloc_id);
 
-       return iwl_dump_ini_mon_fill_header(fwrt, reg_data, mon_dump,
+       return iwl_dump_ini_mon_fill_header(fwrt, alloc_id, mon_dump,
                                            &fwrt->trans->cfg->mon_dram_regs);
 }
 
@@ -1713,8 +1711,10 @@ iwl_dump_ini_mon_smem_fill_header(struct iwl_fw_runtime *fwrt,
                                  void *data, u32 data_len)
 {
        struct iwl_fw_ini_monitor_dump *mon_dump = (void *)data;
+       struct iwl_fw_ini_region_tlv *reg = (void *)reg_data->reg_tlv->data;
+       u32 alloc_id = le32_to_cpu(reg->internal_buffer.alloc_id);
 
-       return iwl_dump_ini_mon_fill_header(fwrt, reg_data, mon_dump,
+       return iwl_dump_ini_mon_fill_header(fwrt, alloc_id, mon_dump,
                                            &fwrt->trans->cfg->mon_smem_regs);
 }
 
@@ -1725,7 +1725,10 @@ iwl_dump_ini_mon_dbgi_fill_header(struct iwl_fw_runtime *fwrt,
 {
        struct iwl_fw_ini_monitor_dump *mon_dump = (void *)data;
 
-       return iwl_dump_ini_mon_fill_header(fwrt, reg_data, mon_dump,
+       return iwl_dump_ini_mon_fill_header(fwrt,
+                                           /* no offset calculation later */
+                                           IWL_FW_INI_ALLOCATION_ID_DBGC1,
+                                           mon_dump,
                                            &fwrt->trans->cfg->mon_dbgi_regs);
 }
 
index 37aa467..6d1007f 100644 (file)
@@ -2732,17 +2732,13 @@ static bool iwl_mvm_wait_d3_notif(struct iwl_notif_wait_data *notif_wait,
                if (wowlan_info_ver < 2) {
                        struct iwl_wowlan_info_notif_v1 *notif_v1 = (void *)pkt->data;
 
-                       notif = kmemdup(notif_v1,
-                                       offsetofend(struct iwl_wowlan_info_notif,
-                                                   received_beacons),
-                                       GFP_ATOMIC);
-
+                       notif = kmemdup(notif_v1, sizeof(*notif), GFP_ATOMIC);
                        if (!notif)
                                return false;
 
                        notif->tid_tear_down = notif_v1->tid_tear_down;
                        notif->station_id = notif_v1->station_id;
-
+                       memset_after(notif, 0, station_id);
                } else {
                        notif = (void *)pkt->data;
                }
index 3963a0d..652a603 100644 (file)
@@ -526,6 +526,11 @@ iwl_mvm_ftm_put_target(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
                rcu_read_lock();
 
                sta = rcu_dereference(mvm->fw_id_to_mac_id[mvmvif->deflink.ap_sta_id]);
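+               /* bail out if the station is gone or is an error pointer */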
+               if (WARN_ON_ONCE(IS_ERR_OR_NULL(sta))) {
+                       rcu_read_unlock();
+                       return PTR_ERR_OR_ZERO(sta);
+               }
+
                if (sta->mfp && (peer->ftm.trigger_based || peer->ftm.non_trigger_based))
                        FTM_PUT_FLAG(PMF);
 
index b35c96c..205c09b 100644 (file)
@@ -1091,7 +1091,7 @@ static const struct dmi_system_id dmi_tas_approved_list[] = {
        },
                { .ident = "LENOVO",
          .matches = {
-                       DMI_MATCH(DMI_SYS_VENDOR, "Lenovo"),
+                       DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
                },
        },
        { .ident = "DELL",
@@ -1727,8 +1727,7 @@ int iwl_mvm_up(struct iwl_mvm *mvm)
        iwl_mvm_tas_init(mvm);
        iwl_mvm_leds_sync(mvm);
 
-       if (fw_has_capa(&mvm->fw->ucode_capa,
-                       IWL_UCODE_TLV_CAPA_RFIM_SUPPORT)) {
+       if (iwl_rfi_supported(mvm)) {
                if (iwl_mvm_eval_dsm_rfi(mvm) == DSM_VALUE_RFI_ENABLE)
                        iwl_rfi_send_config_cmd(mvm, NULL);
        }
index eb828de..3814915 100644 (file)
@@ -123,11 +123,13 @@ int iwl_mvm_link_changed(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
                                if (mvmvif->link[i]->phy_ctxt)
                                        count++;
 
-                       /* FIXME: IWL_MVM_FW_MAX_ACTIVE_LINKS_NUM should be
-                        * defined per HW
-                        */
-                       if (count >= IWL_MVM_FW_MAX_ACTIVE_LINKS_NUM)
-                               return -EINVAL;
+                       if (vif->type == NL80211_IFTYPE_AP) {
+                               if (count > mvm->fw->ucode_capa.num_beacons)
+                                       return -EOPNOTSUPP;
+                       /* this limit should really be defined per HW */
+                       } else if (count >= IWL_MVM_FW_MAX_ACTIVE_LINKS_NUM) {
+                               return -EOPNOTSUPP;
+                       }
                }
 
                /* Catch early if driver tries to activate or deactivate a link
index 0f01b62..17f788a 100644 (file)
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
 /*
- * Copyright (C) 2012-2014, 2018-2022 Intel Corporation
+ * Copyright (C) 2012-2014, 2018-2023 Intel Corporation
  * Copyright (C) 2013-2015 Intel Mobile Communications GmbH
  * Copyright (C) 2016-2017 Intel Deutschland GmbH
  */
@@ -3607,7 +3607,8 @@ static bool iwl_mvm_vif_conf_from_sta(struct iwl_mvm *mvm,
                                      struct ieee80211_vif *vif,
                                      struct ieee80211_sta *sta)
 {
-       unsigned int i;
+       struct ieee80211_link_sta *link_sta;
+       unsigned int link_id;
 
        /* Beacon interval check - firmware will crash if the beacon
         * interval is less than 16. We can't avoid connecting at all,
@@ -3616,14 +3617,11 @@ static bool iwl_mvm_vif_conf_from_sta(struct iwl_mvm *mvm,
         * wpa_s will blocklist the AP...
         */
 
-       for_each_set_bit(i, (unsigned long *)&sta->valid_links,
-                        IEEE80211_MLD_MAX_NUM_LINKS) {
-               struct ieee80211_link_sta *link_sta =
-                       link_sta_dereference_protected(sta, i);
+       for_each_sta_active_link(vif, sta, link_sta, link_id) {
                struct ieee80211_bss_conf *link_conf =
-                       link_conf_dereference_protected(vif, i);
+                       link_conf_dereference_protected(vif, link_id);
 
-               if (!link_conf || !link_sta)
+               if (!link_conf)
                        continue;
 
                if (link_conf->beacon_int < IWL_MVM_MIN_BEACON_INTERVAL_TU) {
@@ -3645,24 +3643,23 @@ static void iwl_mvm_vif_set_he_support(struct ieee80211_hw *hw,
                                       bool is_sta)
 {
        struct iwl_mvm_vif *mvmvif = iwl_mvm_vif_from_mac80211(vif);
-       unsigned int i;
+       struct ieee80211_link_sta *link_sta;
+       unsigned int link_id;
 
-       for_each_set_bit(i, (unsigned long *)&sta->valid_links,
-                        IEEE80211_MLD_MAX_NUM_LINKS) {
-               struct ieee80211_link_sta *link_sta =
-                       link_sta_dereference_protected(sta, i);
+       for_each_sta_active_link(vif, sta, link_sta, link_id) {
                struct ieee80211_bss_conf *link_conf =
-                       link_conf_dereference_protected(vif, i);
+                       link_conf_dereference_protected(vif, link_id);
 
-               if (!link_conf || !link_sta || !mvmvif->link[i])
+               if (!link_conf || !mvmvif->link[link_id])
                        continue;
 
                link_conf->he_support = link_sta->he_cap.has_he;
 
                if (is_sta) {
-                       mvmvif->link[i]->he_ru_2mhz_block = false;
+                       mvmvif->link[link_id]->he_ru_2mhz_block = false;
                        if (link_sta->he_cap.has_he)
-                               iwl_mvm_check_he_obss_narrow_bw_ru(hw, vif, i,
+                               iwl_mvm_check_he_obss_narrow_bw_ru(hw, vif,
+                                                                  link_id,
                                                                   link_conf);
                }
        }
@@ -3675,6 +3672,7 @@ iwl_mvm_sta_state_notexist_to_none(struct iwl_mvm *mvm,
                                   struct iwl_mvm_sta_state_ops *callbacks)
 {
        struct iwl_mvm_vif *mvmvif = iwl_mvm_vif_from_mac80211(vif);
+       struct ieee80211_link_sta *link_sta;
        unsigned int i;
        int ret;
 
@@ -3699,15 +3697,9 @@ iwl_mvm_sta_state_notexist_to_none(struct iwl_mvm *mvm,
                                           NL80211_TDLS_SETUP);
        }
 
-       for (i = 0; i < ARRAY_SIZE(sta->link); i++) {
-               struct ieee80211_link_sta *link_sta;
-
-               link_sta = link_sta_dereference_protected(sta, i);
-               if (!link_sta)
-                       continue;
-
+       for_each_sta_active_link(vif, sta, link_sta, i)
                link_sta->agg.max_rc_amsdu_len = 1;
-       }
+
        ieee80211_sta_recalc_aggregates(sta);
 
        if (vif->type == NL80211_IFTYPE_STATION && !sta->tdls)
@@ -3725,7 +3717,8 @@ iwl_mvm_sta_state_auth_to_assoc(struct ieee80211_hw *hw,
 {
        struct iwl_mvm_vif *mvmvif = iwl_mvm_vif_from_mac80211(vif);
        struct iwl_mvm_sta *mvm_sta = iwl_mvm_sta_from_mac80211(sta);
-       unsigned int i;
+       struct ieee80211_link_sta *link_sta;
+       unsigned int link_id;
 
        lockdep_assert_held(&mvm->mutex);
 
@@ -3751,14 +3744,13 @@ iwl_mvm_sta_state_auth_to_assoc(struct ieee80211_hw *hw,
                if (!mvm->mld_api_is_used)
                        goto out;
 
-               for_each_set_bit(i, (unsigned long *)&sta->valid_links,
-                                IEEE80211_MLD_MAX_NUM_LINKS) {
+               for_each_sta_active_link(vif, sta, link_sta, link_id) {
                        struct ieee80211_bss_conf *link_conf =
-                               link_conf_dereference_protected(vif, i);
+                               link_conf_dereference_protected(vif, link_id);
 
                        if (WARN_ON(!link_conf))
                                return -EINVAL;
-                       if (!mvmvif->link[i])
+                       if (!mvmvif->link[link_id])
                                continue;
 
                        iwl_mvm_link_changed(mvm, vif, link_conf,
@@ -3889,6 +3881,9 @@ int iwl_mvm_mac_sta_state_common(struct ieee80211_hw *hw,
                 * from the AP now.
                 */
                iwl_mvm_reset_cca_40mhz_workaround(mvm, vif);
+
+               /* Also free dup data just in case any assertions below fail */
+               kfree(mvm_sta->dup_data);
        }
 
        mutex_lock(&mvm->mutex);
index fbc2d5e..7fb66c5 100644 (file)
@@ -906,11 +906,12 @@ iwl_mvm_mld_change_vif_links(struct ieee80211_hw *hw,
                                n_active++;
                }
 
-               if (vif->type == NL80211_IFTYPE_AP &&
-                   n_active > mvm->fw->ucode_capa.num_beacons)
-                       return -EOPNOTSUPP;
-               else if (n_active > 1)
+               if (vif->type == NL80211_IFTYPE_AP) {
+                       if (n_active > mvm->fw->ucode_capa.num_beacons)
+                               return -EOPNOTSUPP;
+               } else if (n_active > 1) {
                        return -EOPNOTSUPP;
+               }
        }
 
        for (i = 0; i < IEEE80211_MLD_MAX_NUM_LINKS; i++) {
index 0bfdf44..85a4ce8 100644 (file)
@@ -667,15 +667,15 @@ int iwl_mvm_mld_add_sta(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
                ret = iwl_mvm_mld_alloc_sta_links(mvm, vif, sta);
                if (ret)
                        return ret;
-       }
 
-       spin_lock_init(&mvm_sta->lock);
+               spin_lock_init(&mvm_sta->lock);
 
-       if (test_bit(IWL_MVM_STATUS_IN_HW_RESTART, &mvm->status))
-               ret = iwl_mvm_alloc_sta_after_restart(mvm, vif, sta);
-       else
                ret = iwl_mvm_sta_init(mvm, vif, sta, IWL_MVM_INVALID_STA,
                                       STATION_TYPE_PEER);
+       } else {
+               ret = iwl_mvm_alloc_sta_after_restart(mvm, vif, sta);
+       }
+
        if (ret)
                goto err;
 
@@ -728,7 +728,7 @@ int iwl_mvm_mld_update_sta(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
        struct iwl_mvm_sta *mvm_sta = iwl_mvm_sta_from_mac80211(sta);
        struct ieee80211_link_sta *link_sta;
        unsigned int link_id;
-       int ret = 0;
+       int ret = -EINVAL;
 
        lockdep_assert_held(&mvm->mutex);
 
@@ -791,8 +791,6 @@ int iwl_mvm_mld_rm_sta(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
 
        lockdep_assert_held(&mvm->mutex);
 
-       kfree(mvm_sta->dup_data);
-
        /* flush its queues here since we are freeing mvm_sta */
        for_each_sta_active_link(vif, sta, link_sta, link_id) {
                struct iwl_mvm_link_sta *mvm_link_sta =
index 6e7470d..9e5008e 100644 (file)
@@ -2347,6 +2347,7 @@ int iwl_mvm_mld_update_sta_keys(struct iwl_mvm *mvm,
                                u32 old_sta_mask,
                                u32 new_sta_mask);
 
+bool iwl_rfi_supported(struct iwl_mvm *mvm);
 int iwl_rfi_send_config_cmd(struct iwl_mvm *mvm,
                            struct iwl_rfi_lut_entry *rfi_table);
 struct iwl_rfi_freq_table_resp_cmd *iwl_rfi_get_freq_table(struct iwl_mvm *mvm);
index 6d18a1f..fdf60af 100644 (file)
@@ -445,6 +445,11 @@ iwl_mvm_update_mcc(struct iwl_mvm *mvm, const char *alpha2,
                struct iwl_mcc_update_resp *mcc_resp = (void *)pkt->data;
 
                n_channels =  __le32_to_cpu(mcc_resp->n_channels);
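+               /* reject responses whose payload length does not match n_channels */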
+               if (iwl_rx_packet_payload_len(pkt) !=
+                   struct_size(mcc_resp, channels, n_channels)) {
+                       resp_cp = ERR_PTR(-EINVAL);
+                       goto exit;
+               }
                resp_len = sizeof(struct iwl_mcc_update_resp) +
                           n_channels * sizeof(__le32);
                resp_cp = kmemdup(mcc_resp, resp_len, GFP_KERNEL);
@@ -456,6 +461,11 @@ iwl_mvm_update_mcc(struct iwl_mvm *mvm, const char *alpha2,
                struct iwl_mcc_update_resp_v3 *mcc_resp_v3 = (void *)pkt->data;
 
                n_channels =  __le32_to_cpu(mcc_resp_v3->n_channels);
+               if (iwl_rx_packet_payload_len(pkt) !=
+                   struct_size(mcc_resp_v3, channels, n_channels)) {
+                       resp_cp = ERR_PTR(-EINVAL);
+                       goto exit;
+               }
                resp_len = sizeof(struct iwl_mcc_update_resp) +
                           n_channels * sizeof(__le32);
                resp_cp = kzalloc(resp_len, GFP_KERNEL);
index bb77bc9..2ecd32b 100644 (file)
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
 /*
- * Copyright (C) 2020 - 2021 Intel Corporation
+ * Copyright (C) 2020 - 2022 Intel Corporation
  */
 
 #include "mvm.h"
@@ -70,6 +70,16 @@ static const struct iwl_rfi_lut_entry iwl_rfi_table[IWL_RFI_LUT_SIZE] = {
                PHY_BAND_6, PHY_BAND_6,}},
 };
 
+bool iwl_rfi_supported(struct iwl_mvm *mvm)
+{
+       /* The feature depends on a platform bugfix, so for now
+        * it's always disabled.
+        * Once platform support detection is implemented, we should
+        * check the FW TLV and platform support instead.
+        */
+       return false;
+}
+
 int iwl_rfi_send_config_cmd(struct iwl_mvm *mvm, struct iwl_rfi_lut_entry *rfi_table)
 {
        int ret;
@@ -81,7 +91,7 @@ int iwl_rfi_send_config_cmd(struct iwl_mvm *mvm, struct iwl_rfi_lut_entry *rfi_t
                .len[0] = sizeof(cmd),
        };
 
-       if (!fw_has_capa(&mvm->fw->ucode_capa, IWL_UCODE_TLV_CAPA_RFIM_SUPPORT))
+       if (!iwl_rfi_supported(mvm))
                return -EOPNOTSUPP;
 
        lockdep_assert_held(&mvm->mutex);
@@ -113,7 +123,7 @@ struct iwl_rfi_freq_table_resp_cmd *iwl_rfi_get_freq_table(struct iwl_mvm *mvm)
                .flags = CMD_WANT_SKB,
        };
 
-       if (!fw_has_capa(&mvm->fw->ucode_capa, IWL_UCODE_TLV_CAPA_RFIM_SUPPORT))
+       if (!iwl_rfi_supported(mvm))
                return ERR_PTR(-EOPNOTSUPP);
 
        mutex_lock(&mvm->mutex);
index a4c1e3b..9a20468 100644 (file)
@@ -2691,6 +2691,8 @@ static void rs_drv_get_rate(void *mvm_r, struct ieee80211_sta *sta,
                return;
 
        lq_sta = mvm_sta;
+
+       spin_lock_bh(&lq_sta->pers.lock);
        iwl_mvm_hwrate_to_tx_rate_v1(lq_sta->last_rate_n_flags,
                                     info->band, &info->control.rates[0]);
        info->control.rates[0].count = 1;
@@ -2705,6 +2707,7 @@ static void rs_drv_get_rate(void *mvm_r, struct ieee80211_sta *sta,
                iwl_mvm_hwrate_to_tx_rate_v1(last_ucode_rate, info->band,
                                             &txrc->reported_rate);
        }
+       spin_unlock_bh(&lq_sta->pers.lock);
 }
 
 static void *rs_drv_alloc_sta(void *mvm_rate, struct ieee80211_sta *sta,
@@ -3261,11 +3264,11 @@ void iwl_mvm_rs_tx_status(struct iwl_mvm *mvm, struct ieee80211_sta *sta,
        /* If it's locked we are in the middle of the init flow;
         * just wait for the next tx status to update the lq_sta data
         */
-       if (!spin_trylock(&mvmsta->deflink.lq_sta.rs_drv.pers.lock))
+       if (!spin_trylock_bh(&mvmsta->deflink.lq_sta.rs_drv.pers.lock))
                return;
 
        __iwl_mvm_rs_tx_status(mvm, sta, tid, info, ndp);
-       spin_unlock(&mvmsta->deflink.lq_sta.rs_drv.pers.lock);
+       spin_unlock_bh(&mvmsta->deflink.lq_sta.rs_drv.pers.lock);
 }
 
 #ifdef CONFIG_MAC80211_DEBUGFS
@@ -4114,9 +4117,9 @@ void iwl_mvm_rs_rate_init(struct iwl_mvm *mvm,
        } else {
                struct iwl_mvm_sta *mvmsta = iwl_mvm_sta_from_mac80211(sta);
 
-               spin_lock(&mvmsta->deflink.lq_sta.rs_drv.pers.lock);
+               spin_lock_bh(&mvmsta->deflink.lq_sta.rs_drv.pers.lock);
                rs_drv_rate_init(mvm, sta, band);
-               spin_unlock(&mvmsta->deflink.lq_sta.rs_drv.pers.lock);
+               spin_unlock_bh(&mvmsta->deflink.lq_sta.rs_drv.pers.lock);
        }
 }
 
index e1d02c2..6226e4e 100644 (file)
@@ -691,6 +691,11 @@ void iwl_mvm_reorder_timer_expired(struct timer_list *t)
 
                rcu_read_lock();
                sta = rcu_dereference(buf->mvm->fw_id_to_mac_id[sta_id]);
+               if (WARN_ON_ONCE(IS_ERR_OR_NULL(sta))) {
+                       rcu_read_unlock();
+                       goto out;
+               }
+
                mvmsta = iwl_mvm_sta_from_mac80211(sta);
 
                /* SN is set to the last expired frame + 1 */
@@ -712,6 +717,8 @@ void iwl_mvm_reorder_timer_expired(struct timer_list *t)
                          entries[index].e.reorder_time +
                          1 + RX_REORDER_BUF_TIMEOUT_MQ);
        }
+
+out:
        spin_unlock(&buf->lock);
 }
 
@@ -2512,7 +2519,7 @@ void iwl_mvm_rx_mpdu_mq(struct iwl_mvm *mvm, struct napi_struct *napi,
                                RCU_INIT_POINTER(mvm->csa_tx_blocked_vif, NULL);
                                /* Unblock BCAST / MCAST station */
                                iwl_mvm_modify_all_sta_disable_tx(mvm, mvmvif, false);
-                               cancel_delayed_work_sync(&mvm->cs_tx_unblock_dwork);
+                               cancel_delayed_work(&mvm->cs_tx_unblock_dwork);
                        }
                }
 
index 5469d63..05a54a6 100644 (file)
@@ -281,7 +281,7 @@ static void iwl_mvm_rx_agg_session_expired(struct timer_list *t)
         * A-MPDU and hence the timer continues to run. Then, the
         * timer expires and sta is NULL.
         */
-       if (!sta)
+       if (IS_ERR_OR_NULL(sta))
                goto unlock;
 
        mvm_sta = iwl_mvm_sta_from_mac80211(sta);
@@ -2089,9 +2089,6 @@ int iwl_mvm_rm_sta(struct iwl_mvm *mvm,
 
        lockdep_assert_held(&mvm->mutex);
 
-       if (iwl_mvm_has_new_rx_api(mvm))
-               kfree(mvm_sta->dup_data);
-
        ret = iwl_mvm_drain_sta(mvm, mvm_sta, true);
        if (ret)
                return ret;
@@ -3785,6 +3782,9 @@ static inline u8 *iwl_mvm_get_mac_addr(struct iwl_mvm *mvm,
                u8 sta_id = mvmvif->deflink.ap_sta_id;
                sta = rcu_dereference_protected(mvm->fw_id_to_mac_id[sta_id],
                                                lockdep_is_held(&mvm->mutex));
+               if (WARN_ON_ONCE(IS_ERR_OR_NULL(sta)))
+                       return NULL;
+
                return sta->addr;
        }
 
@@ -3822,6 +3822,11 @@ static int __iwl_mvm_set_sta_key(struct iwl_mvm *mvm,
 
        if (keyconf->cipher == WLAN_CIPHER_SUITE_TKIP) {
                addr = iwl_mvm_get_mac_addr(mvm, vif, sta);
+               if (!addr) {
+                       IWL_ERR(mvm, "Failed to find mac address\n");
+                       return -EINVAL;
+               }
+
                /* get phase 1 key from mac80211 */
                ieee80211_get_key_rx_seq(keyconf, 0, &seq);
                ieee80211_get_tkip_rx_p1k(keyconf, addr, seq.tkip.iv32, p1k);
index 10d7178..00719e1 100644 (file)
@@ -1875,7 +1875,7 @@ static void iwl_mvm_rx_tx_cmd_agg(struct iwl_mvm *mvm,
        mvmsta = iwl_mvm_sta_from_staid_rcu(mvm, sta_id);
 
        sta = rcu_dereference(mvm->fw_id_to_mac_id[sta_id]);
-       if (WARN_ON_ONCE(!sta || !sta->wme)) {
+       if (WARN_ON_ONCE(IS_ERR_OR_NULL(sta) || !sta->wme)) {
                rcu_read_unlock();
                return;
        }
index dba1123..79115eb 100644 (file)
@@ -548,6 +548,8 @@ static const struct iwl_dev_info iwl_dev_info_table[] = {
        IWL_DEV_INFO(0x54F0, 0x1692, iwlax411_2ax_cfg_so_gf4_a0, iwl_ax411_killer_1690i_name),
        IWL_DEV_INFO(0x7A70, 0x1691, iwlax411_2ax_cfg_so_gf4_a0, iwl_ax411_killer_1690s_name),
        IWL_DEV_INFO(0x7A70, 0x1692, iwlax411_2ax_cfg_so_gf4_a0, iwl_ax411_killer_1690i_name),
+       IWL_DEV_INFO(0x7AF0, 0x1691, iwlax411_2ax_cfg_so_gf4_a0, iwl_ax411_killer_1690s_name),
+       IWL_DEV_INFO(0x7AF0, 0x1692, iwlax411_2ax_cfg_so_gf4_a0, iwl_ax411_killer_1690i_name),
 
        IWL_DEV_INFO(0x271C, 0x0214, iwl9260_2ac_cfg, iwl9260_1_name),
        IWL_DEV_INFO(0x7E40, 0x1691, iwl_cfg_ma_a0_gf4_a0, iwl_ax411_killer_1690s_name),
index b281850..59e14b3 100644 (file)
@@ -3576,7 +3576,7 @@ struct iwl_trans *iwl_trans_pcie_alloc(struct pci_dev *pdev,
        init_waitqueue_head(&trans_pcie->imr_waitq);
 
        trans_pcie->rba.alloc_wq = alloc_workqueue("rb_allocator",
-                                                  WQ_HIGHPRI | WQ_UNBOUND, 1);
+                                                  WQ_HIGHPRI | WQ_UNBOUND, 0);
        if (!trans_pcie->rba.alloc_wq) {
                ret = -ENOMEM;
                goto out_free_trans;
index bcd564d..5337ee4 100644 (file)
@@ -3127,7 +3127,7 @@ struct wireless_dev *mwifiex_add_virtual_intf(struct wiphy *wiphy,
        priv->dfs_cac_workqueue = alloc_workqueue("MWIFIEX_DFS_CAC%s",
                                                  WQ_HIGHPRI |
                                                  WQ_MEM_RECLAIM |
-                                                 WQ_UNBOUND, 1, name);
+                                                 WQ_UNBOUND, 0, name);
        if (!priv->dfs_cac_workqueue) {
                mwifiex_dbg(adapter, ERROR, "cannot alloc DFS CAC queue\n");
                ret = -ENOMEM;
@@ -3138,7 +3138,7 @@ struct wireless_dev *mwifiex_add_virtual_intf(struct wiphy *wiphy,
 
        priv->dfs_chan_sw_workqueue = alloc_workqueue("MWIFIEX_DFS_CHSW%s",
                                                      WQ_HIGHPRI | WQ_UNBOUND |
-                                                     WQ_MEM_RECLAIM, 1, name);
+                                                     WQ_MEM_RECLAIM, 0, name);
        if (!priv->dfs_chan_sw_workqueue) {
                mwifiex_dbg(adapter, ERROR, "cannot alloc DFS channel sw queue\n");
                ret = -ENOMEM;
index ea22a08..1cd9d20 100644 (file)
@@ -1547,7 +1547,7 @@ mwifiex_reinit_sw(struct mwifiex_adapter *adapter)
 
        adapter->workqueue =
                alloc_workqueue("MWIFIEX_WORK_QUEUE",
-                               WQ_HIGHPRI | WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
+                               WQ_HIGHPRI | WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
        if (!adapter->workqueue)
                goto err_kmalloc;
 
@@ -1557,7 +1557,7 @@ mwifiex_reinit_sw(struct mwifiex_adapter *adapter)
                adapter->rx_workqueue = alloc_workqueue("MWIFIEX_RX_WORK_QUEUE",
                                                        WQ_HIGHPRI |
                                                        WQ_MEM_RECLAIM |
-                                                       WQ_UNBOUND, 1);
+                                                       WQ_UNBOUND, 0);
                if (!adapter->rx_workqueue)
                        goto err_kmalloc;
                INIT_WORK(&adapter->rx_work, mwifiex_rx_work_queue);
@@ -1702,7 +1702,7 @@ mwifiex_add_card(void *card, struct completion *fw_done,
 
        adapter->workqueue =
                alloc_workqueue("MWIFIEX_WORK_QUEUE",
-                               WQ_HIGHPRI | WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
+                               WQ_HIGHPRI | WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
        if (!adapter->workqueue)
                goto err_kmalloc;
 
@@ -1712,7 +1712,7 @@ mwifiex_add_card(void *card, struct completion *fw_done,
                adapter->rx_workqueue = alloc_workqueue("MWIFIEX_RX_WORK_QUEUE",
                                                        WQ_HIGHPRI |
                                                        WQ_MEM_RECLAIM |
-                                                       WQ_UNBOUND, 1);
+                                                       WQ_UNBOUND, 0);
                if (!adapter->rx_workqueue)
                        goto err_kmalloc;
 
index da1d17b..6400248 100644 (file)
@@ -914,7 +914,10 @@ void mt7615_mac_sta_poll(struct mt7615_dev *dev)
 
                msta = list_first_entry(&sta_poll_list, struct mt7615_sta,
                                        poll_list);
+
+               spin_lock_bh(&dev->sta_poll_lock);
                list_del_init(&msta->poll_list);
+               spin_unlock_bh(&dev->sta_poll_lock);
 
                addr = mt7615_mac_wtbl_addr(dev, msta->wcid.idx) + 19 * 4;
 
index a5ec0f6..fabf637 100644 (file)
@@ -173,7 +173,7 @@ enum {
 #define MT_TXS5_MPDU_TX_CNT            GENMASK(31, 23)
 
 #define MT_TXS6_MPDU_FAIL_CNT          GENMASK(31, 23)
-
+#define MT_TXS7_MPDU_RETRY_BYTE                GENMASK(22, 0)
 #define MT_TXS7_MPDU_RETRY_CNT         GENMASK(31, 23)
 
 /* RXD DW0 */
index ee0fbfc..d39a3cc 100644 (file)
@@ -608,7 +608,8 @@ bool mt76_connac2_mac_fill_txs(struct mt76_dev *dev, struct mt76_wcid *wcid,
        /* PPDU based reporting */
        if (FIELD_GET(MT_TXS0_TXS_FORMAT, txs) > 1) {
                stats->tx_bytes +=
-                       le32_get_bits(txs_data[5], MT_TXS5_MPDU_TX_BYTE);
+                       le32_get_bits(txs_data[5], MT_TXS5_MPDU_TX_BYTE) -
+                       le32_get_bits(txs_data[7], MT_TXS7_MPDU_RETRY_BYTE);
                stats->tx_packets +=
                        le32_get_bits(txs_data[5], MT_TXS5_MPDU_TX_CNT);
                stats->tx_failed +=
index 130eb7b..9b0f605 100644 (file)
@@ -1004,10 +1004,10 @@ void mt7996_mac_write_txwi(struct mt7996_dev *dev, __le32 *txwi,
 {
        struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
        struct ieee80211_vif *vif = info->control.vif;
-       struct mt7996_vif *mvif = (struct mt7996_vif *)vif->drv_priv;
        u8 band_idx = (info->hw_queue & MT_TX_HW_QUEUE_PHY) >> 2;
        u8 p_fmt, q_idx, omac_idx = 0, wmm_idx = 0;
        bool is_8023 = info->flags & IEEE80211_TX_CTL_HW_80211_ENCAP;
+       struct mt7996_vif *mvif;
        u16 tx_count = 15;
        u32 val;
        bool beacon = !!(changed & (BSS_CHANGED_BEACON |
@@ -1015,7 +1015,8 @@ void mt7996_mac_write_txwi(struct mt7996_dev *dev, __le32 *txwi,
        bool inband_disc = !!(changed & (BSS_CHANGED_UNSOL_BCAST_PROBE_RESP |
                                         BSS_CHANGED_FILS_DISCOVERY));
 
-       if (vif) {
+       mvif = vif ? (struct mt7996_vif *)vif->drv_priv : NULL;
+       if (mvif) {
                omac_idx = mvif->mt76.omac_idx;
                wmm_idx = mvif->mt76.wmm_idx;
                band_idx = mvif->mt76.band_idx;
@@ -1081,14 +1082,18 @@ void mt7996_mac_write_txwi(struct mt7996_dev *dev, __le32 *txwi,
                struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)skb->data;
                bool mcast = ieee80211_is_data(hdr->frame_control) &&
                             is_multicast_ether_addr(hdr->addr1);
-               u8 idx = mvif->basic_rates_idx;
+               u8 idx = MT7996_BASIC_RATES_TBL;
 
-               if (mcast && mvif->mcast_rates_idx)
-                       idx = mvif->mcast_rates_idx;
-               else if (beacon && mvif->beacon_rates_idx)
-                       idx = mvif->beacon_rates_idx;
+               if (mvif) {
+                       if (mcast && mvif->mcast_rates_idx)
+                               idx = mvif->mcast_rates_idx;
+                       else if (beacon && mvif->beacon_rates_idx)
+                               idx = mvif->beacon_rates_idx;
+                       else
+                               idx = mvif->basic_rates_idx;
+               }
 
-               txwi[6] |= FIELD_PREP(MT_TXD6_TX_RATE, idx);
+               txwi[6] |= cpu_to_le32(FIELD_PREP(MT_TXD6_TX_RATE, idx));
                txwi[3] |= cpu_to_le32(MT_TXD3_BA_DISABLE);
        }
 }
index 8eafbf1..808c1c8 100644 (file)
@@ -1803,6 +1803,7 @@ struct rtl8xxxu_priv {
        u32 rege9c;
        u32 regeb4;
        u32 regebc;
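+       /* shadow of the last value written to REG_RCR */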
+       u32 regrcr;
        int next_mbox;
        int nr_out_eps;
 
index fd8c8c6..831639d 100644 (file)
@@ -4171,6 +4171,7 @@ static int rtl8xxxu_init_device(struct ieee80211_hw *hw)
                RCR_ACCEPT_MGMT_FRAME | RCR_HTC_LOC_CTRL |
                RCR_APPEND_PHYSTAT | RCR_APPEND_ICV | RCR_APPEND_MIC;
        rtl8xxxu_write32(priv, REG_RCR, val32);
+       priv->regrcr = val32;
 
        if (fops->init_reg_rxfltmap) {
                /* Accept all data frames */
@@ -6501,7 +6502,7 @@ static void rtl8xxxu_configure_filter(struct ieee80211_hw *hw,
                                      unsigned int *total_flags, u64 multicast)
 {
        struct rtl8xxxu_priv *priv = hw->priv;
-       u32 rcr = rtl8xxxu_read32(priv, REG_RCR);
+       u32 rcr = priv->regrcr;
 
        dev_dbg(&priv->udev->dev, "%s: changed_flags %08x, total_flags %08x\n",
                __func__, changed_flags, *total_flags);
@@ -6547,6 +6548,7 @@ static void rtl8xxxu_configure_filter(struct ieee80211_hw *hw,
         */
 
        rtl8xxxu_write32(priv, REG_RCR, rcr);
+       priv->regrcr = rcr;
 
        *total_flags &= (FIF_ALLMULTI | FIF_FCSFAIL | FIF_BCN_PRBRESP_PROMISC |
                         FIF_CONTROL | FIF_OTHER_BSS | FIF_PSPOLL |
index 7aa6eda..144618b 100644 (file)
@@ -88,15 +88,6 @@ static int rtw_ops_config(struct ieee80211_hw *hw, u32 changed)
                }
        }
 
-       if (changed & IEEE80211_CONF_CHANGE_PS) {
-               if (hw->conf.flags & IEEE80211_CONF_PS) {
-                       rtwdev->ps_enabled = true;
-               } else {
-                       rtwdev->ps_enabled = false;
-                       rtw_leave_lps(rtwdev);
-               }
-       }
-
        if (changed & IEEE80211_CONF_CHANGE_CHANNEL)
                rtw_set_channel(rtwdev);
 
@@ -213,6 +204,7 @@ static int rtw_ops_add_interface(struct ieee80211_hw *hw,
        config |= PORT_SET_BCN_CTRL;
        rtw_vif_port_config(rtwdev, rtwvif, config);
        rtw_core_port_switch(rtwdev, vif);
+       rtw_recalc_lps(rtwdev, vif);
 
        mutex_unlock(&rtwdev->mutex);
 
@@ -244,6 +236,7 @@ static void rtw_ops_remove_interface(struct ieee80211_hw *hw,
        config |= PORT_SET_BCN_CTRL;
        rtw_vif_port_config(rtwdev, rtwvif, config);
        clear_bit(rtwvif->port, rtwdev->hw_port);
+       rtw_recalc_lps(rtwdev, NULL);
 
        mutex_unlock(&rtwdev->mutex);
 }
@@ -438,6 +431,9 @@ static void rtw_ops_bss_info_changed(struct ieee80211_hw *hw,
        if (changed & BSS_CHANGED_ERP_SLOT)
                rtw_conf_tx(rtwdev, rtwvif);
 
+       if (changed & BSS_CHANGED_PS)
+               rtw_recalc_lps(rtwdev, NULL);
+
        rtw_vif_port_config(rtwdev, rtwvif, config);
 
        mutex_unlock(&rtwdev->mutex);
@@ -918,7 +914,7 @@ static void rtw_ops_sta_rc_update(struct ieee80211_hw *hw,
        struct rtw_sta_info *si = (struct rtw_sta_info *)sta->drv_priv;
 
        if (changed & IEEE80211_RC_BW_CHANGED)
-               rtw_update_sta_info(rtwdev, si, true);
+               ieee80211_queue_work(rtwdev->hw, &si->rc_work);
 }
 
 const struct ieee80211_ops rtw_ops = {
index 5bf6b45..9447a3a 100644 (file)
@@ -271,8 +271,8 @@ static void rtw_watch_dog_work(struct work_struct *work)
         * more than two stations associated to the AP, then we can not enter
         * lps, because fw does not handle the overlapped beacon interval
         *
-        * mac80211 should iterate vifs and determine if driver can enter
-        * ps by passing IEEE80211_CONF_PS to us, all we need to do is to
+        * rtw_recalc_lps() iterates vifs and determines if the driver can enter
+        * ps based on vif->type and vif->cfg.ps; all we need to do here is to
         * get that vif and check if the device has more traffic than the
         * threshold.
         */
@@ -319,6 +319,17 @@ static u8 rtw_acquire_macid(struct rtw_dev *rtwdev)
        return mac_id;
 }
 
+static void rtw_sta_rc_work(struct work_struct *work)
+{
+       struct rtw_sta_info *si = container_of(work, struct rtw_sta_info,
+                                              rc_work);
+       struct rtw_dev *rtwdev = si->rtwdev;
+
+       mutex_lock(&rtwdev->mutex);
+       rtw_update_sta_info(rtwdev, si, true);
+       mutex_unlock(&rtwdev->mutex);
+}
+
 int rtw_sta_add(struct rtw_dev *rtwdev, struct ieee80211_sta *sta,
                struct ieee80211_vif *vif)
 {
@@ -329,12 +340,14 @@ int rtw_sta_add(struct rtw_dev *rtwdev, struct ieee80211_sta *sta,
        if (si->mac_id >= RTW_MAX_MAC_ID_NUM)
                return -ENOSPC;
 
+       si->rtwdev = rtwdev;
        si->sta = sta;
        si->vif = vif;
        si->init_ra_lv = 1;
        ewma_rssi_init(&si->avg_rssi);
        for (i = 0; i < ARRAY_SIZE(sta->txq); i++)
                rtw_txq_init(rtwdev, sta->txq[i]);
+       INIT_WORK(&si->rc_work, rtw_sta_rc_work);
 
        rtw_update_sta_info(rtwdev, si, true);
        rtw_fw_media_status_report(rtwdev, si->mac_id, true);
@@ -353,6 +366,8 @@ void rtw_sta_remove(struct rtw_dev *rtwdev, struct ieee80211_sta *sta,
        struct rtw_sta_info *si = (struct rtw_sta_info *)sta->drv_priv;
        int i;
 
+       cancel_work_sync(&si->rc_work);
+
        rtw_release_macid(rtwdev, si->mac_id);
        if (fw_exist)
                rtw_fw_media_status_report(rtwdev, si->mac_id, false);
index a563285..9e841f6 100644 (file)
@@ -743,6 +743,7 @@ struct rtw_txq {
 DECLARE_EWMA(rssi, 10, 16);
 
 struct rtw_sta_info {
+       struct rtw_dev *rtwdev;
        struct ieee80211_sta *sta;
        struct ieee80211_vif *vif;
 
@@ -767,6 +768,8 @@ struct rtw_sta_info {
 
        bool use_cfg_mask;
        struct cfg80211_bitrate_mask *mask;
+
+       struct work_struct rc_work;
 };
 
 enum rtw_bfee_role {
index 9963655..53933fb 100644 (file)
@@ -299,3 +299,46 @@ void rtw_leave_lps_deep(struct rtw_dev *rtwdev)
 
        __rtw_leave_lps_deep(rtwdev);
 }
+
+struct rtw_vif_recalc_lps_iter_data {
+       struct rtw_dev *rtwdev;
+       struct ieee80211_vif *found_vif;
+       int count;
+};
+
+static void __rtw_vif_recalc_lps(struct rtw_vif_recalc_lps_iter_data *data,
+                                struct ieee80211_vif *vif)
+{
+       if (data->count < 0)
+               return;
+
+       if (vif->type != NL80211_IFTYPE_STATION) {
+               data->count = -1;
+               return;
+       }
+
+       data->count++;
+       data->found_vif = vif;
+}
+
+static void rtw_vif_recalc_lps_iter(void *data, u8 *mac,
+                                   struct ieee80211_vif *vif)
+{
+       __rtw_vif_recalc_lps(data, vif);
+}
+
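+/* enable LPS only when exactly one station vif exists and it has PS enabled */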
+void rtw_recalc_lps(struct rtw_dev *rtwdev, struct ieee80211_vif *new_vif)
+{
+       struct rtw_vif_recalc_lps_iter_data data = { .rtwdev = rtwdev };
+
+       if (new_vif)
+               __rtw_vif_recalc_lps(&data, new_vif);
+       rtw_iterate_vifs(rtwdev, rtw_vif_recalc_lps_iter, &data);
+
+       if (data.count == 1 && data.found_vif->cfg.ps) {
+               rtwdev->ps_enabled = true;
+       } else {
+               rtwdev->ps_enabled = false;
+               rtw_leave_lps(rtwdev);
+       }
+}
index c194386..5ae83d2 100644 (file)
@@ -23,4 +23,6 @@ void rtw_enter_lps(struct rtw_dev *rtwdev, u8 port_id);
 void rtw_leave_lps(struct rtw_dev *rtwdev);
 void rtw_leave_lps_deep(struct rtw_dev *rtwdev);
 enum rtw_lps_deep_mode rtw_get_lps_deep_mode(struct rtw_dev *rtwdev);
+void rtw_recalc_lps(struct rtw_dev *rtwdev, struct ieee80211_vif *new_vif);
+
 #endif
index af0459a..06fce7c 100644 (file)
@@ -87,11 +87,6 @@ static void rtw_sdio_writew(struct rtw_dev *rtwdev, u16 val, u32 addr,
        u8 buf[2];
        int i;
 
-       if (rtw_sdio_use_memcpy_io(rtwdev, addr, 2)) {
-               sdio_writew(rtwsdio->sdio_func, val, addr, err_ret);
-               return;
-       }
-
        *(__le16 *)buf = cpu_to_le16(val);
 
        for (i = 0; i < 2; i++) {
@@ -125,9 +120,6 @@ static u16 rtw_sdio_readw(struct rtw_dev *rtwdev, u32 addr, int *err_ret)
        u8 buf[2];
        int i;
 
-       if (rtw_sdio_use_memcpy_io(rtwdev, addr, 2))
-               return sdio_readw(rtwsdio->sdio_func, addr, err_ret);
-
        for (i = 0; i < 2; i++) {
                buf[i] = sdio_readb(rtwsdio->sdio_func, addr + i, err_ret);
                if (*err_ret)
index 30647f0..ad1d795 100644 (file)
@@ -78,7 +78,7 @@ struct rtw_usb {
        u8 pipe_interrupt;
        u8 pipe_in;
        u8 out_ep[RTW_USB_EP_MAX];
-       u8 qsel_to_ep[TX_DESC_QSEL_MAX];
+       int qsel_to_ep[TX_DESC_QSEL_MAX];
        u8 usb_txagg_num;
 
        struct workqueue_struct *txwq, *rxwq;
index 7fc0a26..bad864d 100644 (file)
@@ -2531,9 +2531,6 @@ static void rtw89_vif_enter_lps(struct rtw89_dev *rtwdev, struct rtw89_vif *rtwv
            rtwvif->tdls_peer)
                return;
 
-       if (rtwdev->total_sta_assoc > 1)
-               return;
-
        if (rtwvif->offchan)
                return;
 
index b8019cf..512de49 100644 (file)
@@ -1425,6 +1425,8 @@ const struct rtw89_mac_size_set rtw89_mac_size = {
        .wde_size4 = {RTW89_WDE_PG_64, 0, 4096,},
        /* PCIE 64 */
        .wde_size6 = {RTW89_WDE_PG_64, 512, 0,},
+       /* 8852B PCIE SCC */
+       .wde_size7 = {RTW89_WDE_PG_64, 510, 2,},
        /* DLFW */
        .wde_size9 = {RTW89_WDE_PG_64, 0, 1024,},
        /* 8852C DLFW */
@@ -1449,6 +1451,8 @@ const struct rtw89_mac_size_set rtw89_mac_size = {
        .wde_qt4 = {0, 0, 0, 0,},
        /* PCIE 64 */
        .wde_qt6 = {448, 48, 0, 16,},
+       /* 8852B PCIE SCC */
+       .wde_qt7 = {446, 48, 0, 16,},
        /* 8852C DLFW */
        .wde_qt17 = {0, 0, 0,  0,},
        /* 8852C PCIE SCC */
index a8d9847..6ba633c 100644 (file)
@@ -792,6 +792,7 @@ struct rtw89_mac_size_set {
        const struct rtw89_dle_size wde_size0;
        const struct rtw89_dle_size wde_size4;
        const struct rtw89_dle_size wde_size6;
+       const struct rtw89_dle_size wde_size7;
        const struct rtw89_dle_size wde_size9;
        const struct rtw89_dle_size wde_size18;
        const struct rtw89_dle_size wde_size19;
@@ -804,6 +805,7 @@ struct rtw89_mac_size_set {
        const struct rtw89_wde_quota wde_qt0;
        const struct rtw89_wde_quota wde_qt4;
        const struct rtw89_wde_quota wde_qt6;
+       const struct rtw89_wde_quota wde_qt7;
        const struct rtw89_wde_quota wde_qt17;
        const struct rtw89_wde_quota wde_qt18;
        const struct rtw89_ple_quota ple_qt4;
index ee4588b..c42e310 100644 (file)
@@ -89,15 +89,6 @@ static int rtw89_ops_config(struct ieee80211_hw *hw, u32 changed)
            !(hw->conf.flags & IEEE80211_CONF_IDLE))
                rtw89_leave_ips(rtwdev);
 
-       if (changed & IEEE80211_CONF_CHANGE_PS) {
-               if (hw->conf.flags & IEEE80211_CONF_PS) {
-                       rtwdev->lps_enabled = true;
-               } else {
-                       rtw89_leave_lps(rtwdev);
-                       rtwdev->lps_enabled = false;
-               }
-       }
-
        if (changed & IEEE80211_CONF_CHANGE_CHANNEL) {
                rtw89_config_entity_chandef(rtwdev, RTW89_SUB_ENTITY_0,
                                            &hw->conf.chandef);
@@ -168,6 +159,8 @@ static int rtw89_ops_add_interface(struct ieee80211_hw *hw,
        rtw89_core_txq_init(rtwdev, vif->txq);
 
        rtw89_btc_ntfy_role_info(rtwdev, rtwvif, NULL, BTC_ROLE_START);
+
+       rtw89_recalc_lps(rtwdev);
 out:
        mutex_unlock(&rtwdev->mutex);
 
@@ -192,6 +185,7 @@ static void rtw89_ops_remove_interface(struct ieee80211_hw *hw,
        rtw89_mac_remove_vif(rtwdev, rtwvif);
        rtw89_core_release_bit_map(rtwdev->hw_port, rtwvif->port);
        list_del_init(&rtwvif->list);
+       rtw89_recalc_lps(rtwdev);
        rtw89_enter_ips_by_hwflags(rtwdev);
 
        mutex_unlock(&rtwdev->mutex);
@@ -451,6 +445,9 @@ static void rtw89_ops_bss_info_changed(struct ieee80211_hw *hw,
        if (changed & BSS_CHANGED_CQM)
                rtw89_fw_h2c_set_bcn_fltr_cfg(rtwdev, vif, true);
 
+       if (changed & BSS_CHANGED_PS)
+               rtw89_recalc_lps(rtwdev);
+
        mutex_unlock(&rtwdev->mutex);
 }
 
index fa94335..84201ef 100644 (file)
@@ -252,3 +252,29 @@ void rtw89_process_p2p_ps(struct rtw89_dev *rtwdev, struct ieee80211_vif *vif)
        rtw89_p2p_disable_all_noa(rtwdev, vif);
        rtw89_p2p_update_noa(rtwdev, vif);
 }
+
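+/* LPS is enabled only when a single station vif with PS enabled exists */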
+void rtw89_recalc_lps(struct rtw89_dev *rtwdev)
+{
+       struct ieee80211_vif *vif, *found_vif = NULL;
+       struct rtw89_vif *rtwvif;
+       int count = 0;
+
+       rtw89_for_each_rtwvif(rtwdev, rtwvif) {
+               vif = rtwvif_to_vif(rtwvif);
+
+               if (vif->type != NL80211_IFTYPE_STATION) {
+                       count = 0;
+                       break;
+               }
+
+               count++;
+               found_vif = vif;
+       }
+
+       if (count == 1 && found_vif->cfg.ps) {
+               rtwdev->lps_enabled = true;
+       } else {
+               rtw89_leave_lps(rtwdev);
+               rtwdev->lps_enabled = false;
+       }
+}
index 73c008d..4c18f49 100644 (file)
@@ -15,6 +15,7 @@ void rtw89_enter_ips(struct rtw89_dev *rtwdev);
 void rtw89_leave_ips(struct rtw89_dev *rtwdev);
 void rtw89_set_coex_ctrl_lps(struct rtw89_dev *rtwdev, bool btc_ctrl);
 void rtw89_process_p2p_ps(struct rtw89_dev *rtwdev, struct ieee80211_vif *vif);
+void rtw89_recalc_lps(struct rtw89_dev *rtwdev);
 
 static inline void rtw89_leave_ips_by_hwflags(struct rtw89_dev *rtwdev)
 {
index eaa2ea0..6da1b60 100644 (file)
        RTW8852B_FW_BASENAME "-" __stringify(RTW8852B_FW_FORMAT_MAX) ".bin"
 
 static const struct rtw89_hfc_ch_cfg rtw8852b_hfc_chcfg_pcie[] = {
-       {5, 343, grp_0}, /* ACH 0 */
-       {5, 343, grp_0}, /* ACH 1 */
-       {5, 343, grp_0}, /* ACH 2 */
-       {5, 343, grp_0}, /* ACH 3 */
+       {5, 341, grp_0}, /* ACH 0 */
+       {5, 341, grp_0}, /* ACH 1 */
+       {4, 342, grp_0}, /* ACH 2 */
+       {4, 342, grp_0}, /* ACH 3 */
        {0, 0, grp_0}, /* ACH 4 */
        {0, 0, grp_0}, /* ACH 5 */
        {0, 0, grp_0}, /* ACH 6 */
        {0, 0, grp_0}, /* ACH 7 */
-       {4, 344, grp_0}, /* B0MGQ */
-       {4, 344, grp_0}, /* B0HIQ */
+       {4, 342, grp_0}, /* B0MGQ */
+       {4, 342, grp_0}, /* B0HIQ */
        {0, 0, grp_0}, /* B1MGQ */
        {0, 0, grp_0}, /* B1HIQ */
        {40, 0, 0} /* FWCMDQ */
 };
 
 static const struct rtw89_hfc_pub_cfg rtw8852b_hfc_pubcfg_pcie = {
-       448, /* Group 0 */
+       446, /* Group 0 */
        0, /* Group 1 */
-       448, /* Public Max */
+       446, /* Public Max */
        0 /* WP threshold */
 };
 
@@ -49,13 +49,13 @@ static const struct rtw89_hfc_param_ini rtw8852b_hfc_param_ini_pcie[] = {
 };
 
 static const struct rtw89_dle_mem rtw8852b_dle_mem_pcie[] = {
-       [RTW89_QTA_SCC] = {RTW89_QTA_SCC, &rtw89_mac_size.wde_size6,
-                          &rtw89_mac_size.ple_size6, &rtw89_mac_size.wde_qt6,
-                          &rtw89_mac_size.wde_qt6, &rtw89_mac_size.ple_qt18,
+       [RTW89_QTA_SCC] = {RTW89_QTA_SCC, &rtw89_mac_size.wde_size7,
+                          &rtw89_mac_size.ple_size6, &rtw89_mac_size.wde_qt7,
+                          &rtw89_mac_size.wde_qt7, &rtw89_mac_size.ple_qt18,
                           &rtw89_mac_size.ple_qt58},
-       [RTW89_QTA_WOW] = {RTW89_QTA_WOW, &rtw89_mac_size.wde_size6,
-                          &rtw89_mac_size.ple_size6, &rtw89_mac_size.wde_qt6,
-                          &rtw89_mac_size.wde_qt6, &rtw89_mac_size.ple_qt18,
+       [RTW89_QTA_WOW] = {RTW89_QTA_WOW, &rtw89_mac_size.wde_size7,
+                          &rtw89_mac_size.ple_size6, &rtw89_mac_size.wde_qt7,
+                          &rtw89_mac_size.wde_qt7, &rtw89_mac_size.ple_qt18,
                           &rtw89_mac_size.ple_qt_52b_wow},
        [RTW89_QTA_DLFW] = {RTW89_QTA_DLFW, &rtw89_mac_size.wde_size9,
                            &rtw89_mac_size.ple_size8, &rtw89_mac_size.wde_qt4,
index 9a8faaf..89c7a14 100644 (file)
@@ -5964,10 +5964,11 @@ static int hwsim_new_radio_nl(struct sk_buff *msg, struct genl_info *info)
                        ret = -ENOMEM;
                        goto out_free;
                }
+               param.pmsr_capa = pmsr_capa;
+
                ret = parse_pmsr_capa(info->attrs[HWSIM_ATTR_PMSR_SUPPORT], pmsr_capa, info);
                if (ret)
                        goto out_free;
-               param.pmsr_capa = pmsr_capa;
        }
 
        ret = mac80211_hwsim_new_radio(info, &param);
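
The hwsim change is purely an ordering fix: param.pmsr_capa now takes ownership of the freshly allocated capability struct before the fallible parse_pmsr_capa() call, so the shared out_free error path can release it on failure instead of leaking it. A generic sketch of the pattern (hypothetical names, not the hwsim code):

#include <stdlib.h>

struct params { void *capa; };

static int parse_capa(void *capa) { return capa ? 0 : -1; }	/* may fail */
static void free_params(struct params *p) { free(p->capa); }

static int setup(struct params *p)
{
	int ret;

	p->capa = malloc(64);
	if (!p->capa)
		return -1;

	/*
	 * Ownership is transferred before the fallible call, so the shared
	 * error path can always free it (this mirrors the hwsim reordering).
	 */
	ret = parse_capa(p->capa);
	if (ret)
		goto out_free;

	return 0;

out_free:
	free_params(p);
	return ret;
}
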
index c066b00..829515a 100644 (file)
@@ -565,24 +565,32 @@ static void ipc_imem_run_state_worker(struct work_struct *instance)
        struct ipc_mux_config mux_cfg;
        struct iosm_imem *ipc_imem;
        u8 ctrl_chl_idx = 0;
+       int ret;
 
        ipc_imem = container_of(instance, struct iosm_imem, run_state_worker);
 
        if (ipc_imem->phase != IPC_P_RUN) {
                dev_err(ipc_imem->dev,
                        "Modem link down. Exit run state worker.");
-               return;
+               goto err_out;
        }
 
        if (test_and_clear_bit(IOSM_DEVLINK_INIT, &ipc_imem->flag))
                ipc_devlink_deinit(ipc_imem->ipc_devlink);
 
-       if (!ipc_imem_setup_cp_mux_cap_init(ipc_imem, &mux_cfg))
-               ipc_imem->mux = ipc_mux_init(&mux_cfg, ipc_imem);
+       ret = ipc_imem_setup_cp_mux_cap_init(ipc_imem, &mux_cfg);
+       if (ret < 0)
+               goto err_out;
+
+       ipc_imem->mux = ipc_mux_init(&mux_cfg, ipc_imem);
+       if (!ipc_imem->mux)
+               goto err_out;
+
+       ret = ipc_imem_wwan_channel_init(ipc_imem, mux_cfg.protocol);
+       if (ret < 0)
+               goto err_ipc_mux_deinit;
 
-       ipc_imem_wwan_channel_init(ipc_imem, mux_cfg.protocol);
-       if (ipc_imem->mux)
-               ipc_imem->mux->wwan = ipc_imem->wwan;
+       ipc_imem->mux->wwan = ipc_imem->wwan;
 
        while (ctrl_chl_idx < IPC_MEM_MAX_CHANNELS) {
                if (!ipc_chnl_cfg_get(&chnl_cfg_port, ctrl_chl_idx)) {
@@ -622,6 +630,13 @@ static void ipc_imem_run_state_worker(struct work_struct *instance)
 
        /* Complete all memory stores after setting bit */
        smp_mb__after_atomic();
+
+       return;
+
+err_ipc_mux_deinit:
+       ipc_mux_deinit(ipc_imem->mux);
+err_out:
+       ipc_uevent_send(ipc_imem->dev, UEVENT_CD_READY_LINK_DOWN);
 }
 
 static void ipc_imem_handle_irq(struct iosm_imem *ipc_imem, int irq)
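
The worker now follows the usual goto-unwind convention: each step that can fail jumps to a label that undoes only what was set up before it, and every failure path ends by signalling link-down. A stripped-down sketch of that shape (hypothetical helpers, not the IOSM API):

#include <errno.h>
#include <stddef.h>

struct ctx { void *mux; };

/* Hypothetical stand-ins for the IOSM setup steps. */
static int  step_mux_cap(struct ctx *c)      { (void)c; return 0; }
static void *step_mux_init(struct ctx *c)    { return c; }
static void step_mux_deinit(void *mux)       { (void)mux; }
static int  step_wwan_init(struct ctx *c)    { (void)c; return 0; }
static void report_link_down(struct ctx *c)  { (void)c; }

static int bring_up(struct ctx *c)
{
	int ret;

	ret = step_mux_cap(c);		/* nothing allocated yet */
	if (ret < 0)
		goto err_out;

	c->mux = step_mux_init(c);
	if (!c->mux) {
		ret = -ENOMEM;
		goto err_out;
	}

	ret = step_wwan_init(c);	/* mux must be undone on failure */
	if (ret < 0)
		goto err_mux_deinit;

	return 0;

err_mux_deinit:
	step_mux_deinit(c->mux);
err_out:
	report_link_down(c);		/* the driver sends a link-down uevent here */
	return ret;
}
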
index 66b90cc..109cf89 100644 (file)
@@ -77,8 +77,8 @@ out:
 }
 
 /* Initialize wwan channel */
-void ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
-                               enum ipc_mux_protocol mux_type)
+int ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
+                              enum ipc_mux_protocol mux_type)
 {
        struct ipc_chnl_cfg chnl_cfg = { 0 };
 
@@ -87,7 +87,7 @@ void ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
        /* If modem version is invalid (0xffffffff), do not initialize WWAN. */
        if (ipc_imem->cp_version == -1) {
                dev_err(ipc_imem->dev, "invalid CP version");
-               return;
+               return -EIO;
        }
 
        ipc_chnl_cfg_get(&chnl_cfg, ipc_imem->nr_of_channels);
@@ -104,9 +104,13 @@ void ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
 
        /* WWAN registration. */
        ipc_imem->wwan = ipc_wwan_init(ipc_imem, ipc_imem->dev);
-       if (!ipc_imem->wwan)
+       if (!ipc_imem->wwan) {
                dev_err(ipc_imem->dev,
                        "failed to register the ipc_wwan interfaces");
+               return -ENOMEM;
+       }
+
+       return 0;
 }
 
 /* Map SKB to DMA for transfer */
index f8afb21..026c5bd 100644 (file)
@@ -91,9 +91,11 @@ int ipc_imem_sys_wwan_transmit(struct iosm_imem *ipc_imem, int if_id,
  *                             MUX.
  * @ipc_imem:          Pointer to iosm_imem struct.
  * @mux_type:          Type of mux protocol.
+ *
+ * Return: 0 on success, negative error code on failure
  */
-void ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
-                               enum ipc_mux_protocol mux_type);
+int ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
+                              enum ipc_mux_protocol mux_type);
 
 /**
  * ipc_imem_sys_devlink_open - Open a Flash/CD Channel link to CP
index d6b166f..bff46f7 100644 (file)
@@ -626,14 +626,12 @@ static void mux_dl_adb_decode(struct iosm_mux *ipc_mux,
                if (adth->signature != cpu_to_le32(IOSM_AGGR_MUX_SIG_ADTH))
                        goto adb_decode_err;
 
-               if (le16_to_cpu(adth->table_length) < (sizeof(struct mux_adth) -
-                               sizeof(struct mux_adth_dg)))
+               if (le16_to_cpu(adth->table_length) < sizeof(struct mux_adth))
                        goto adb_decode_err;
 
                /* Calculate the number of datagrams. */
                nr_of_dg = (le16_to_cpu(adth->table_length) -
-                                       sizeof(struct mux_adth) +
-                                       sizeof(struct mux_adth_dg)) /
+                                       sizeof(struct mux_adth)) /
                                        sizeof(struct mux_adth_dg);
 
                /* Is the datagram table empty ? */
@@ -649,7 +647,7 @@ static void mux_dl_adb_decode(struct iosm_mux *ipc_mux,
                }
 
                /* New aggregated datagram table. */
-               dg = &adth->dg;
+               dg = adth->dg;
                if (mux_dl_process_dg(ipc_mux, adbh, dg, skb, if_id,
                                      nr_of_dg) < 0)
                        goto adb_decode_err;
@@ -849,7 +847,7 @@ static void ipc_mux_ul_encode_adth(struct iosm_mux *ipc_mux,
                        adth->if_id = i;
                        adth->table_length = cpu_to_le16(adth_dg_size);
                        adth_dg_size -= offsetof(struct mux_adth, dg);
-                       memcpy(&adth->dg, ul_adb->dg[i], adth_dg_size);
+                       memcpy(adth->dg, ul_adb->dg[i], adth_dg_size);
                        ul_adb->if_cnt++;
                }
 
@@ -1426,14 +1424,13 @@ static int ipc_mux_get_payload_from_adb(struct iosm_mux *ipc_mux,
 
                if (adth->signature == cpu_to_le32(IOSM_AGGR_MUX_SIG_ADTH)) {
                        nr_of_dg = (le16_to_cpu(adth->table_length) -
-                                       sizeof(struct mux_adth) +
-                                       sizeof(struct mux_adth_dg)) /
+                                       sizeof(struct mux_adth)) /
                                        sizeof(struct mux_adth_dg);
 
                        if (nr_of_dg <= 0)
                                return payload_size;
 
-                       dg = &adth->dg;
+                       dg = adth->dg;
 
                        for (i = 0; i < nr_of_dg; i++, dg++) {
                                if (le32_to_cpu(dg->datagram_index) <
index 5d4e3b8..f8df88f 100644 (file)
@@ -161,7 +161,7 @@ struct mux_adth {
        u8 opt_ipv4v6;
        __le32 next_table_index;
        __le32 reserved2;
-       struct mux_adth_dg dg;
+       struct mux_adth_dg dg[];
 };
 
 /**
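
Turning dg from a single embedded struct mux_adth_dg into a C99 flexible array member changes the sizeof() arithmetic: sizeof(struct mux_adth) no longer includes one trailing element, which is why the datagram-count formula loses its "+ sizeof(struct mux_adth_dg)" correction and "&adth->dg" becomes "adth->dg". A small self-contained sketch of the same arithmetic (hypothetical field layout, not the real header):

#include <stdint.h>
#include <stdio.h>

struct dg {			/* one datagram descriptor */
	uint32_t index;
	uint16_t length;
	uint8_t  service_class;
	uint8_t  reserved;
};

struct adth {			/* table header followed by a flexible array */
	uint32_t signature;
	uint16_t table_length;	/* header + all descriptors, in bytes */
	uint16_t reserved;
	struct dg dg[];		/* flexible array member: not counted by sizeof */
};

int main(void)
{
	uint16_t table_length = sizeof(struct adth) + 3 * sizeof(struct dg);

	/* With a flexible array member the count falls out directly. */
	unsigned nr_of_dg = (table_length - sizeof(struct adth)) / sizeof(struct dg);

	printf("%u datagrams\n", nr_of_dg);	/* prints: 3 datagrams */
	return 0;
}
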
index aec3a18..7162bf3 100644 (file)
@@ -1293,9 +1293,9 @@ int t7xx_cldma_init(struct cldma_ctrl *md_ctrl)
        for (i = 0; i < CLDMA_TXQ_NUM; i++) {
                md_cd_queue_struct_init(&md_ctrl->txq[i], md_ctrl, MTK_TX, i);
                md_ctrl->txq[i].worker =
-                       alloc_workqueue("md_hif%d_tx%d_worker",
-                                       WQ_UNBOUND | WQ_MEM_RECLAIM | (i ? 0 : WQ_HIGHPRI),
-                                       1, md_ctrl->hif_id, i);
+                       alloc_ordered_workqueue("md_hif%d_tx%d_worker",
+                                       WQ_MEM_RECLAIM | (i ? 0 : WQ_HIGHPRI),
+                                       md_ctrl->hif_id, i);
                if (!md_ctrl->txq[i].worker)
                        goto err_workqueue;
 
@@ -1306,9 +1306,10 @@ int t7xx_cldma_init(struct cldma_ctrl *md_ctrl)
                md_cd_queue_struct_init(&md_ctrl->rxq[i], md_ctrl, MTK_RX, i);
                INIT_WORK(&md_ctrl->rxq[i].cldma_work, t7xx_cldma_rx_done);
 
-               md_ctrl->rxq[i].worker = alloc_workqueue("md_hif%d_rx%d_worker",
-                                                        WQ_UNBOUND | WQ_MEM_RECLAIM,
-                                                        1, md_ctrl->hif_id, i);
+               md_ctrl->rxq[i].worker =
+                       alloc_ordered_workqueue("md_hif%d_rx%d_worker",
+                                               WQ_MEM_RECLAIM,
+                                               md_ctrl->hif_id, i);
                if (!md_ctrl->rxq[i].worker)
                        goto err_workqueue;
        }
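
These conversions replace alloc_workqueue(name, WQ_UNBOUND | ..., 1, ...) with alloc_ordered_workqueue(), the canonical way to request a single-threaded, strictly FIFO workqueue; the explicit max_active = 1 argument disappears because ordered workqueues imply it. An illustrative call, assuming a hypothetical driver-private struct with a worker pointer:

#include <linux/workqueue.h>
#include <linux/errno.h>

struct my_dev {
	struct workqueue_struct *worker;
	int index;
};

static int my_dev_init_wq(struct my_dev *d)
{
	/*
	 * Ordered workqueues are unbound and process one item at a time in
	 * queueing order, so the old "WQ_UNBOUND | ..., 1" arguments are
	 * implied by alloc_ordered_workqueue().
	 */
	d->worker = alloc_ordered_workqueue("my_tx%d_worker",
					    WQ_MEM_RECLAIM |
					    (d->index ? 0 : WQ_HIGHPRI),
					    d->index);
	if (!d->worker)
		return -ENOMEM;
	return 0;
}
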
index 4651420..8dab025 100644 (file)
@@ -618,8 +618,9 @@ int t7xx_dpmaif_txq_init(struct dpmaif_tx_queue *txq)
                return ret;
        }
 
-       txq->worker = alloc_workqueue("md_dpmaif_tx%d_worker", WQ_UNBOUND | WQ_MEM_RECLAIM |
-                                     (txq->index ? 0 : WQ_HIGHPRI), 1, txq->index);
+       txq->worker = alloc_ordered_workqueue("md_dpmaif_tx%d_worker",
+                               WQ_MEM_RECLAIM | (txq->index ? 0 : WQ_HIGHPRI),
+                               txq->index);
        if (!txq->worker)
                return -ENOMEM;
 
index 226fc17..91256e0 100644 (file)
@@ -45,6 +45,7 @@
 #define T7XX_PCI_IREG_BASE             0
 #define T7XX_PCI_EREG_BASE             2
 
+#define T7XX_INIT_TIMEOUT              20
 #define PM_SLEEP_DIS_TIMEOUT_MS                20
 #define PM_ACK_TIMEOUT_MS              1500
 #define PM_AUTOSUSPEND_MS              20000
@@ -96,6 +97,7 @@ static int t7xx_pci_pm_init(struct t7xx_pci_dev *t7xx_dev)
        spin_lock_init(&t7xx_dev->md_pm_lock);
        init_completion(&t7xx_dev->sleep_lock_acquire);
        init_completion(&t7xx_dev->pm_sr_ack);
+       init_completion(&t7xx_dev->init_done);
        atomic_set(&t7xx_dev->md_pm_state, MTK_PM_INIT);
 
        device_init_wakeup(&pdev->dev, true);
@@ -124,6 +126,7 @@ void t7xx_pci_pm_init_late(struct t7xx_pci_dev *t7xx_dev)
        pm_runtime_mark_last_busy(&t7xx_dev->pdev->dev);
        pm_runtime_allow(&t7xx_dev->pdev->dev);
        pm_runtime_put_noidle(&t7xx_dev->pdev->dev);
+       complete_all(&t7xx_dev->init_done);
 }
 
 static int t7xx_pci_pm_reinit(struct t7xx_pci_dev *t7xx_dev)
@@ -529,6 +532,20 @@ static void t7xx_pci_shutdown(struct pci_dev *pdev)
        __t7xx_pci_pm_suspend(pdev);
 }
 
+static int t7xx_pci_pm_prepare(struct device *dev)
+{
+       struct pci_dev *pdev = to_pci_dev(dev);
+       struct t7xx_pci_dev *t7xx_dev;
+
+       t7xx_dev = pci_get_drvdata(pdev);
+       if (!wait_for_completion_timeout(&t7xx_dev->init_done, T7XX_INIT_TIMEOUT * HZ)) {
+               dev_warn(dev, "Not ready for system sleep.\n");
+               return -ETIMEDOUT;
+       }
+
+       return 0;
+}
+
 static int t7xx_pci_pm_suspend(struct device *dev)
 {
        return __t7xx_pci_pm_suspend(to_pci_dev(dev));
@@ -555,6 +572,7 @@ static int t7xx_pci_pm_runtime_resume(struct device *dev)
 }
 
 static const struct dev_pm_ops t7xx_pci_pm_ops = {
+       .prepare = t7xx_pci_pm_prepare,
        .suspend = t7xx_pci_pm_suspend,
        .resume = t7xx_pci_pm_resume,
        .resume_noirq = t7xx_pci_pm_resume_noirq,
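
The new init_done completion is a simple gate: late init signals it with complete_all(), and the PM .prepare callback refuses to start a system sleep transition until it has been signalled (or the 20-second timeout expires). A minimal sketch of that pattern, with hypothetical names:

#include <linux/completion.h>
#include <linux/jiffies.h>
#include <linux/errno.h>

struct my_dev {
	struct completion init_done;
};

static void my_dev_setup(struct my_dev *d)
{
	init_completion(&d->init_done);
	/* ... long-running bring-up ... */
	complete_all(&d->init_done);	/* wakes all current and future waiters */
}

/* PM .prepare-style gate: fail the transition if init never finished. */
static int my_dev_pm_prepare(struct my_dev *d)
{
	if (!wait_for_completion_timeout(&d->init_done, 20 * HZ))
		return -ETIMEDOUT;
	return 0;
}
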
index 112efa5..f08f1ab 100644 (file)
@@ -69,6 +69,7 @@ struct t7xx_pci_dev {
        struct t7xx_modem       *md;
        struct t7xx_ccmni_ctrl  *ccmni_ctlb;
        bool                    rgu_pci_irq_en;
+       struct completion       init_done;
 
        /* Low Power Items */
        struct list_head        md_pm_entities;
index f12f903..da3e2dc 100644 (file)
@@ -762,3 +762,6 @@ EXPORT_SYMBOL(fdp_nci_remove);
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("NFC NCI driver for Intel Fields Peak NFC controller");
 MODULE_AUTHOR("Robert Dolca <robert.dolca@intel.com>");
+
+MODULE_FIRMWARE(FDP_OTP_PATCH_NAME);
+MODULE_FIRMWARE(FDP_RAM_PATCH_NAME);
index 44eeb17..a55381f 100644 (file)
@@ -336,10 +336,6 @@ static struct dentry *nfcsim_debugfs_root;
 static void nfcsim_debugfs_init(void)
 {
        nfcsim_debugfs_root = debugfs_create_dir("nfcsim", NULL);
-
-       if (!nfcsim_debugfs_root)
-               pr_err("Could not create debugfs entry\n");
-
 }
 
 static void nfcsim_debugfs_remove(void)
index f70ba58..ab0f32b 100644 (file)
 
 /* Globals */
 
+/* The "nubus.populate_procfs" parameter makes slot resources available in
+ * procfs. It's deprecated and disabled by default because procfs is no longer
+ * thought to be suitable for that and some board ROMs make it too expensive.
+ */
+bool nubus_populate_procfs;
+module_param_named(populate_procfs, nubus_populate_procfs, bool, 0);
+
 LIST_HEAD(nubus_func_rsrcs);
 
 /* Meaning of "bytelanes":
@@ -572,9 +579,9 @@ nubus_get_functional_resource(struct nubus_board *board, int slot,
                        nubus_proc_add_rsrc(dir.procdir, &ent);
                        break;
                default:
-                       /* Local/Private resources have their own
-                          function */
-                       nubus_get_private_resource(fres, dir.procdir, &ent);
+                       if (nubus_populate_procfs)
+                               nubus_get_private_resource(fres, dir.procdir,
+                                                          &ent);
                }
        }
 
index 1fd6678..e7a347d 100644 (file)
@@ -55,7 +55,7 @@ struct proc_dir_entry *nubus_proc_add_board(struct nubus_board *board)
 {
        char name[2];
 
-       if (!proc_bus_nubus_dir)
+       if (!proc_bus_nubus_dir || !nubus_populate_procfs)
                return NULL;
        snprintf(name, sizeof(name), "%x", board->slot);
        return proc_mkdir(name, proc_bus_nubus_dir);
@@ -72,9 +72,10 @@ struct proc_dir_entry *nubus_proc_add_rsrc_dir(struct proc_dir_entry *procdir,
        char name[9];
        int lanes = board->lanes;
 
-       if (!procdir)
+       if (!procdir || !nubus_populate_procfs)
                return NULL;
        snprintf(name, sizeof(name), "%x", ent->type);
+       remove_proc_subtree(name, procdir);
        return proc_mkdir_data(name, 0555, procdir, (void *)lanes);
 }
 
@@ -137,6 +138,18 @@ static int nubus_proc_rsrc_show(struct seq_file *m, void *v)
        return 0;
 }
 
+static int nubus_rsrc_proc_open(struct inode *inode, struct file *file)
+{
+       return single_open(file, nubus_proc_rsrc_show, inode);
+}
+
+static const struct proc_ops nubus_rsrc_proc_ops = {
+       .proc_open      = nubus_rsrc_proc_open,
+       .proc_read      = seq_read,
+       .proc_lseek     = seq_lseek,
+       .proc_release   = single_release,
+};
+
 void nubus_proc_add_rsrc_mem(struct proc_dir_entry *procdir,
                             const struct nubus_dirent *ent,
                             unsigned int size)
@@ -144,7 +157,7 @@ void nubus_proc_add_rsrc_mem(struct proc_dir_entry *procdir,
        char name[9];
        struct nubus_proc_pde_data *pded;
 
-       if (!procdir)
+       if (!procdir || !nubus_populate_procfs)
                return;
 
        snprintf(name, sizeof(name), "%x", ent->type);
@@ -152,8 +165,9 @@ void nubus_proc_add_rsrc_mem(struct proc_dir_entry *procdir,
                pded = nubus_proc_alloc_pde_data(nubus_dirptr(ent), size);
        else
                pded = NULL;
-       proc_create_single_data(name, S_IFREG | 0444, procdir,
-                       nubus_proc_rsrc_show, pded);
+       remove_proc_subtree(name, procdir);
+       proc_create_data(name, S_IFREG | 0444, procdir,
+                        &nubus_rsrc_proc_ops, pded);
 }
 
 void nubus_proc_add_rsrc(struct proc_dir_entry *procdir,
@@ -162,13 +176,14 @@ void nubus_proc_add_rsrc(struct proc_dir_entry *procdir,
        char name[9];
        unsigned char *data = (unsigned char *)ent->data;
 
-       if (!procdir)
+       if (!procdir || !nubus_populate_procfs)
                return;
 
        snprintf(name, sizeof(name), "%x", ent->type);
-       proc_create_single_data(name, S_IFREG | 0444, procdir,
-                       nubus_proc_rsrc_show,
-                       nubus_proc_alloc_pde_data(data, 0));
+       remove_proc_subtree(name, procdir);
+       proc_create_data(name, S_IFREG | 0444, procdir,
+                        &nubus_rsrc_proc_ops,
+                        nubus_proc_alloc_pde_data(data, 0));
 }
 
 /*
index e27202d..d3fc506 100644 (file)
@@ -10,7 +10,7 @@ obj-$(CONFIG_NVME_FC)                 += nvme-fc.o
 obj-$(CONFIG_NVME_TCP)                 += nvme-tcp.o
 obj-$(CONFIG_NVME_APPLE)               += nvme-apple.o
 
-nvme-core-y                            += core.o ioctl.o
+nvme-core-y                            += core.o ioctl.o sysfs.o
 nvme-core-$(CONFIG_NVME_VERBOSE_ERRORS)        += constants.o
 nvme-core-$(CONFIG_TRACING)            += trace.o
 nvme-core-$(CONFIG_NVME_MULTIPATH)     += multipath.o
index ea16a0a..daf5d14 100644 (file)
@@ -30,18 +30,18 @@ struct nvme_dhchap_queue_context {
        u32 s2;
        u16 transaction;
        u8 status;
+       u8 dhgroup_id;
        u8 hash_id;
        size_t hash_len;
-       u8 dhgroup_id;
        u8 c1[64];
        u8 c2[64];
        u8 response[64];
        u8 *host_response;
        u8 *ctrl_key;
-       int ctrl_key_len;
        u8 *host_key;
-       int host_key_len;
        u8 *sess_key;
+       int ctrl_key_len;
+       int host_key_len;
        int sess_key_len;
 };
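
The member reordering above groups the u8 identifiers together and the int length fields together while keeping the key pointers adjacent; on typical LP64 builds such grouping also avoids interior padding between pointers and ints. A self-contained illustration of why grouping helps (hypothetical struct; sizes are what a typical LP64 target produces):

#include <stdio.h>

/* Interleaved pointer/int members force padding after each int on LP64. */
struct interleaved {
	char *a;
	int   a_len;
	char *b;
	int   b_len;
};

/* Grouping the pointers and the ints together removes the interior holes. */
struct grouped {
	char *a;
	char *b;
	int   a_len;
	int   b_len;
};

int main(void)
{
	printf("interleaved: %zu bytes\n", sizeof(struct interleaved));	/* typically 32 */
	printf("grouped:     %zu bytes\n", sizeof(struct grouped));	/* typically 24 */
	return 0;
}
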
 
index bc523ca..5e4f884 100644 (file)
@@ -21,7 +21,7 @@ static const char * const nvme_ops[] = {
        [nvme_cmd_resv_release] = "Reservation Release",
        [nvme_cmd_zone_mgmt_send] = "Zone Management Send",
        [nvme_cmd_zone_mgmt_recv] = "Zone Management Receive",
-       [nvme_cmd_zone_append] = "Zone Management Append",
+       [nvme_cmd_zone_append] = "Zone Append",
 };
 
 static const char * const nvme_admin_ops[] = {
index ccb6eb1..fdfcf27 100644 (file)
@@ -237,7 +237,7 @@ int nvme_delete_ctrl(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_delete_ctrl);
 
-static void nvme_delete_ctrl_sync(struct nvme_ctrl *ctrl)
+void nvme_delete_ctrl_sync(struct nvme_ctrl *ctrl)
 {
        /*
         * Keep a reference until nvme_do_delete_ctrl() complete,
@@ -397,7 +397,16 @@ void nvme_complete_rq(struct request *req)
        trace_nvme_complete_rq(req);
        nvme_cleanup_cmd(req);
 
-       if (ctrl->kas)
+       /*
+        * Completions of long-running commands should not be able to
+        * defer sending of periodic keep alives, since the controller
+        * may have completed processing such commands a long time ago
+        * (arbitrarily close to command submission time).
+        * req->deadline - req->timeout is the command submission time
+        * in jiffies.
+        */
+       if (ctrl->kas &&
+           req->deadline - req->timeout >= ctrl->ka_last_check_time)
                ctrl->comp_seen = true;
 
        switch (nvme_decide_disposition(req)) {
@@ -1115,7 +1124,7 @@ u32 nvme_passthru_start(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode)
 }
 EXPORT_SYMBOL_NS_GPL(nvme_passthru_start, NVME_TARGET_PASSTHRU);
 
-void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects,
+void nvme_passthru_end(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u32 effects,
                       struct nvme_command *cmd, int status)
 {
        if (effects & NVME_CMD_EFFECTS_CSE_MASK) {
@@ -1132,6 +1141,8 @@ void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects,
                nvme_queue_scan(ctrl);
                flush_work(&ctrl->scan_work);
        }
+       if (ns)
+               return;
 
        switch (cmd->common.opcode) {
        case nvme_admin_set_features:
@@ -1161,9 +1172,25 @@ EXPORT_SYMBOL_NS_GPL(nvme_passthru_end, NVME_TARGET_PASSTHRU);
  *   The host should send Keep Alive commands at half of the Keep Alive Timeout
  *   accounting for transport roundtrip times [..].
  */
+static unsigned long nvme_keep_alive_work_period(struct nvme_ctrl *ctrl)
+{
+       unsigned long delay = ctrl->kato * HZ / 2;
+
+       /*
+        * When using Traffic Based Keep Alive, we need to run
+        * nvme_keep_alive_work at twice the normal frequency, as one
+        * command completion can postpone sending a keep alive command
+        * by up to twice the delay between runs.
+        */
+       if (ctrl->ctratt & NVME_CTRL_ATTR_TBKAS)
+               delay /= 2;
+       return delay;
+}
+
 static void nvme_queue_keep_alive_work(struct nvme_ctrl *ctrl)
 {
-       queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ / 2);
+       queue_delayed_work(nvme_wq, &ctrl->ka_work,
+                          nvme_keep_alive_work_period(ctrl));
 }
 
 static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
@@ -1172,6 +1199,20 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
        struct nvme_ctrl *ctrl = rq->end_io_data;
        unsigned long flags;
        bool startka = false;
+       unsigned long rtt = jiffies - (rq->deadline - rq->timeout);
+       unsigned long delay = nvme_keep_alive_work_period(ctrl);
+
+       /*
+        * Subtract off the keepalive RTT so nvme_keep_alive_work runs
+        * at the desired frequency.
+        */
+       if (rtt <= delay) {
+               delay -= rtt;
+       } else {
+               dev_warn(ctrl->device, "long keepalive RTT (%u ms)\n",
+                        jiffies_to_msecs(rtt));
+               delay = 0;
+       }
 
        blk_mq_free_request(rq);
 
@@ -1182,6 +1223,7 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
                return RQ_END_IO_NONE;
        }
 
+       ctrl->ka_last_check_time = jiffies;
        ctrl->comp_seen = false;
        spin_lock_irqsave(&ctrl->lock, flags);
        if (ctrl->state == NVME_CTRL_LIVE ||
@@ -1189,7 +1231,7 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
                startka = true;
        spin_unlock_irqrestore(&ctrl->lock, flags);
        if (startka)
-               nvme_queue_keep_alive_work(ctrl);
+               queue_delayed_work(nvme_wq, &ctrl->ka_work, delay);
        return RQ_END_IO_NONE;
 }
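
Putting the two new pieces together: with Traffic Based Keep Alive (TBKAS) the keep-alive work runs at kato/4 instead of kato/2, and each keep-alive completion re-arms the timer minus the measured round-trip time so the effective period stays on target. A small worked sketch of that arithmetic (plain C with jiffies replaced by milliseconds; not the driver code):

#include <stdbool.h>
#include <stdio.h>

/* Mirror of nvme_keep_alive_work_period(), in milliseconds. */
static unsigned long ka_period_ms(unsigned long kato_s, bool tbkas)
{
	unsigned long delay = kato_s * 1000 / 2;

	if (tbkas)		/* run twice as often under TBKAS */
		delay /= 2;
	return delay;
}

/* Next re-arm delay after a keep-alive completes with the given RTT. */
static unsigned long ka_rearm_ms(unsigned long period_ms, unsigned long rtt_ms)
{
	return rtt_ms <= period_ms ? period_ms - rtt_ms : 0;
}

int main(void)
{
	unsigned long p = ka_period_ms(10, true);	/* kato = 10 s, TBKAS on */

	printf("period = %lu ms\n", p);			/* 2500 ms */
	printf("re-arm = %lu ms\n", ka_rearm_ms(p, 150));	/* 2350 ms */
	return 0;
}
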
 
@@ -1200,6 +1242,8 @@ static void nvme_keep_alive_work(struct work_struct *work)
        bool comp_seen = ctrl->comp_seen;
        struct request *rq;
 
+       ctrl->ka_last_check_time = jiffies;
+
        if ((ctrl->ctratt & NVME_CTRL_ATTR_TBKAS) && comp_seen) {
                dev_dbg(ctrl->device,
                        "reschedule traffic based keep-alive timer\n");
@@ -1591,12 +1635,12 @@ static void nvme_ns_release(struct nvme_ns *ns)
        nvme_put_ns(ns);
 }
 
-static int nvme_open(struct block_device *bdev, fmode_t mode)
+static int nvme_open(struct gendisk *disk, blk_mode_t mode)
 {
-       return nvme_ns_open(bdev->bd_disk->private_data);
+       return nvme_ns_open(disk->private_data);
 }
 
-static void nvme_release(struct gendisk *disk, fmode_t mode)
+static void nvme_release(struct gendisk *disk)
 {
        nvme_ns_release(disk->private_data);
 }
@@ -1835,7 +1879,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
                struct nvme_ns *ns, struct nvme_id_ns *id)
 {
        sector_t capacity = nvme_lba_to_sect(ns, le64_to_cpu(id->nsze));
-       unsigned short bs = 1 << ns->lba_shift;
+       u32 bs = 1U << ns->lba_shift;
        u32 atomic_bs, phys_bs, io_opt = 0;
 
        /*
@@ -2256,7 +2300,7 @@ static int nvme_report_zones(struct gendisk *disk, sector_t sector,
 #define nvme_report_zones      NULL
 #endif /* CONFIG_BLK_DEV_ZONED */
 
-static const struct block_device_operations nvme_bdev_ops = {
+const struct block_device_operations nvme_bdev_ops = {
        .owner          = THIS_MODULE,
        .ioctl          = nvme_ioctl,
        .compat_ioctl   = blkdev_compat_ptr_ioctl,
@@ -2791,75 +2835,6 @@ static struct nvme_subsystem *__nvme_find_get_subsystem(const char *subsysnqn)
        return NULL;
 }
 
-#define SUBSYS_ATTR_RO(_name, _mode, _show)                    \
-       struct device_attribute subsys_attr_##_name = \
-               __ATTR(_name, _mode, _show, NULL)
-
-static ssize_t nvme_subsys_show_nqn(struct device *dev,
-                                   struct device_attribute *attr,
-                                   char *buf)
-{
-       struct nvme_subsystem *subsys =
-               container_of(dev, struct nvme_subsystem, dev);
-
-       return sysfs_emit(buf, "%s\n", subsys->subnqn);
-}
-static SUBSYS_ATTR_RO(subsysnqn, S_IRUGO, nvme_subsys_show_nqn);
-
-static ssize_t nvme_subsys_show_type(struct device *dev,
-                                   struct device_attribute *attr,
-                                   char *buf)
-{
-       struct nvme_subsystem *subsys =
-               container_of(dev, struct nvme_subsystem, dev);
-
-       switch (subsys->subtype) {
-       case NVME_NQN_DISC:
-               return sysfs_emit(buf, "discovery\n");
-       case NVME_NQN_NVME:
-               return sysfs_emit(buf, "nvm\n");
-       default:
-               return sysfs_emit(buf, "reserved\n");
-       }
-}
-static SUBSYS_ATTR_RO(subsystype, S_IRUGO, nvme_subsys_show_type);
-
-#define nvme_subsys_show_str_function(field)                           \
-static ssize_t subsys_##field##_show(struct device *dev,               \
-                           struct device_attribute *attr, char *buf)   \
-{                                                                      \
-       struct nvme_subsystem *subsys =                                 \
-               container_of(dev, struct nvme_subsystem, dev);          \
-       return sysfs_emit(buf, "%.*s\n",                                \
-                          (int)sizeof(subsys->field), subsys->field);  \
-}                                                                      \
-static SUBSYS_ATTR_RO(field, S_IRUGO, subsys_##field##_show);
-
-nvme_subsys_show_str_function(model);
-nvme_subsys_show_str_function(serial);
-nvme_subsys_show_str_function(firmware_rev);
-
-static struct attribute *nvme_subsys_attrs[] = {
-       &subsys_attr_model.attr,
-       &subsys_attr_serial.attr,
-       &subsys_attr_firmware_rev.attr,
-       &subsys_attr_subsysnqn.attr,
-       &subsys_attr_subsystype.attr,
-#ifdef CONFIG_NVME_MULTIPATH
-       &subsys_attr_iopolicy.attr,
-#endif
-       NULL,
-};
-
-static const struct attribute_group nvme_subsys_attrs_group = {
-       .attrs = nvme_subsys_attrs,
-};
-
-static const struct attribute_group *nvme_subsys_attrs_groups[] = {
-       &nvme_subsys_attrs_group,
-       NULL,
-};
-
 static inline bool nvme_discovery_ctrl(struct nvme_ctrl *ctrl)
 {
        return ctrl->opts && ctrl->opts->discovery_nqn;
@@ -3064,7 +3039,8 @@ static int nvme_init_non_mdts_limits(struct nvme_ctrl *ctrl)
                ctrl->max_zeroes_sectors = 0;
 
        if (ctrl->subsys->subtype != NVME_NQN_NVME ||
-           nvme_ctrl_limited_cns(ctrl))
+           nvme_ctrl_limited_cns(ctrl) ||
+           test_bit(NVME_CTRL_SKIP_ID_CNS_CS, &ctrl->flags))
                return 0;
 
        id = kzalloc(sizeof(*id), GFP_KERNEL);
@@ -3086,6 +3062,8 @@ static int nvme_init_non_mdts_limits(struct nvme_ctrl *ctrl)
                ctrl->max_zeroes_sectors = nvme_mps_to_sectors(ctrl, id->wzsl);
 
 free_data:
+       if (ret > 0)
+               set_bit(NVME_CTRL_SKIP_ID_CNS_CS, &ctrl->flags);
        kfree(id);
        return ret;
 }
@@ -3393,583 +3371,6 @@ static const struct file_operations nvme_dev_fops = {
        .uring_cmd      = nvme_dev_uring_cmd,
 };
 
-static ssize_t nvme_sysfs_reset(struct device *dev,
-                               struct device_attribute *attr, const char *buf,
-                               size_t count)
-{
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-       int ret;
-
-       ret = nvme_reset_ctrl_sync(ctrl);
-       if (ret < 0)
-               return ret;
-       return count;
-}
-static DEVICE_ATTR(reset_controller, S_IWUSR, NULL, nvme_sysfs_reset);
-
-static ssize_t nvme_sysfs_rescan(struct device *dev,
-                               struct device_attribute *attr, const char *buf,
-                               size_t count)
-{
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-
-       nvme_queue_scan(ctrl);
-       return count;
-}
-static DEVICE_ATTR(rescan_controller, S_IWUSR, NULL, nvme_sysfs_rescan);
-
-static inline struct nvme_ns_head *dev_to_ns_head(struct device *dev)
-{
-       struct gendisk *disk = dev_to_disk(dev);
-
-       if (disk->fops == &nvme_bdev_ops)
-               return nvme_get_ns_from_dev(dev)->head;
-       else
-               return disk->private_data;
-}
-
-static ssize_t wwid_show(struct device *dev, struct device_attribute *attr,
-               char *buf)
-{
-       struct nvme_ns_head *head = dev_to_ns_head(dev);
-       struct nvme_ns_ids *ids = &head->ids;
-       struct nvme_subsystem *subsys = head->subsys;
-       int serial_len = sizeof(subsys->serial);
-       int model_len = sizeof(subsys->model);
-
-       if (!uuid_is_null(&ids->uuid))
-               return sysfs_emit(buf, "uuid.%pU\n", &ids->uuid);
-
-       if (memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
-               return sysfs_emit(buf, "eui.%16phN\n", ids->nguid);
-
-       if (memchr_inv(ids->eui64, 0, sizeof(ids->eui64)))
-               return sysfs_emit(buf, "eui.%8phN\n", ids->eui64);
-
-       while (serial_len > 0 && (subsys->serial[serial_len - 1] == ' ' ||
-                                 subsys->serial[serial_len - 1] == '\0'))
-               serial_len--;
-       while (model_len > 0 && (subsys->model[model_len - 1] == ' ' ||
-                                subsys->model[model_len - 1] == '\0'))
-               model_len--;
-
-       return sysfs_emit(buf, "nvme.%04x-%*phN-%*phN-%08x\n", subsys->vendor_id,
-               serial_len, subsys->serial, model_len, subsys->model,
-               head->ns_id);
-}
-static DEVICE_ATTR_RO(wwid);
-
-static ssize_t nguid_show(struct device *dev, struct device_attribute *attr,
-               char *buf)
-{
-       return sysfs_emit(buf, "%pU\n", dev_to_ns_head(dev)->ids.nguid);
-}
-static DEVICE_ATTR_RO(nguid);
-
-static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
-               char *buf)
-{
-       struct nvme_ns_ids *ids = &dev_to_ns_head(dev)->ids;
-
-       /* For backward compatibility expose the NGUID to userspace if
-        * we have no UUID set
-        */
-       if (uuid_is_null(&ids->uuid)) {
-               dev_warn_ratelimited(dev,
-                       "No UUID available providing old NGUID\n");
-               return sysfs_emit(buf, "%pU\n", ids->nguid);
-       }
-       return sysfs_emit(buf, "%pU\n", &ids->uuid);
-}
-static DEVICE_ATTR_RO(uuid);
-
-static ssize_t eui_show(struct device *dev, struct device_attribute *attr,
-               char *buf)
-{
-       return sysfs_emit(buf, "%8ph\n", dev_to_ns_head(dev)->ids.eui64);
-}
-static DEVICE_ATTR_RO(eui);
-
-static ssize_t nsid_show(struct device *dev, struct device_attribute *attr,
-               char *buf)
-{
-       return sysfs_emit(buf, "%d\n", dev_to_ns_head(dev)->ns_id);
-}
-static DEVICE_ATTR_RO(nsid);
-
-static struct attribute *nvme_ns_id_attrs[] = {
-       &dev_attr_wwid.attr,
-       &dev_attr_uuid.attr,
-       &dev_attr_nguid.attr,
-       &dev_attr_eui.attr,
-       &dev_attr_nsid.attr,
-#ifdef CONFIG_NVME_MULTIPATH
-       &dev_attr_ana_grpid.attr,
-       &dev_attr_ana_state.attr,
-#endif
-       NULL,
-};
-
-static umode_t nvme_ns_id_attrs_are_visible(struct kobject *kobj,
-               struct attribute *a, int n)
-{
-       struct device *dev = container_of(kobj, struct device, kobj);
-       struct nvme_ns_ids *ids = &dev_to_ns_head(dev)->ids;
-
-       if (a == &dev_attr_uuid.attr) {
-               if (uuid_is_null(&ids->uuid) &&
-                   !memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
-                       return 0;
-       }
-       if (a == &dev_attr_nguid.attr) {
-               if (!memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
-                       return 0;
-       }
-       if (a == &dev_attr_eui.attr) {
-               if (!memchr_inv(ids->eui64, 0, sizeof(ids->eui64)))
-                       return 0;
-       }
-#ifdef CONFIG_NVME_MULTIPATH
-       if (a == &dev_attr_ana_grpid.attr || a == &dev_attr_ana_state.attr) {
-               if (dev_to_disk(dev)->fops != &nvme_bdev_ops) /* per-path attr */
-                       return 0;
-               if (!nvme_ctrl_use_ana(nvme_get_ns_from_dev(dev)->ctrl))
-                       return 0;
-       }
-#endif
-       return a->mode;
-}
-
-static const struct attribute_group nvme_ns_id_attr_group = {
-       .attrs          = nvme_ns_id_attrs,
-       .is_visible     = nvme_ns_id_attrs_are_visible,
-};
-
-const struct attribute_group *nvme_ns_id_attr_groups[] = {
-       &nvme_ns_id_attr_group,
-       NULL,
-};
-
-#define nvme_show_str_function(field)                                          \
-static ssize_t  field##_show(struct device *dev,                               \
-                           struct device_attribute *attr, char *buf)           \
-{                                                                              \
-        struct nvme_ctrl *ctrl = dev_get_drvdata(dev);                         \
-        return sysfs_emit(buf, "%.*s\n",                                       \
-               (int)sizeof(ctrl->subsys->field), ctrl->subsys->field);         \
-}                                                                              \
-static DEVICE_ATTR(field, S_IRUGO, field##_show, NULL);
-
-nvme_show_str_function(model);
-nvme_show_str_function(serial);
-nvme_show_str_function(firmware_rev);
-
-#define nvme_show_int_function(field)                                          \
-static ssize_t  field##_show(struct device *dev,                               \
-                           struct device_attribute *attr, char *buf)           \
-{                                                                              \
-        struct nvme_ctrl *ctrl = dev_get_drvdata(dev);                         \
-        return sysfs_emit(buf, "%d\n", ctrl->field);                           \
-}                                                                              \
-static DEVICE_ATTR(field, S_IRUGO, field##_show, NULL);
-
-nvme_show_int_function(cntlid);
-nvme_show_int_function(numa_node);
-nvme_show_int_function(queue_count);
-nvme_show_int_function(sqsize);
-nvme_show_int_function(kato);
-
-static ssize_t nvme_sysfs_delete(struct device *dev,
-                               struct device_attribute *attr, const char *buf,
-                               size_t count)
-{
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-
-       if (device_remove_file_self(dev, attr))
-               nvme_delete_ctrl_sync(ctrl);
-       return count;
-}
-static DEVICE_ATTR(delete_controller, S_IWUSR, NULL, nvme_sysfs_delete);
-
-static ssize_t nvme_sysfs_show_transport(struct device *dev,
-                                        struct device_attribute *attr,
-                                        char *buf)
-{
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-
-       return sysfs_emit(buf, "%s\n", ctrl->ops->name);
-}
-static DEVICE_ATTR(transport, S_IRUGO, nvme_sysfs_show_transport, NULL);
-
-static ssize_t nvme_sysfs_show_state(struct device *dev,
-                                    struct device_attribute *attr,
-                                    char *buf)
-{
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-       static const char *const state_name[] = {
-               [NVME_CTRL_NEW]         = "new",
-               [NVME_CTRL_LIVE]        = "live",
-               [NVME_CTRL_RESETTING]   = "resetting",
-               [NVME_CTRL_CONNECTING]  = "connecting",
-               [NVME_CTRL_DELETING]    = "deleting",
-               [NVME_CTRL_DELETING_NOIO]= "deleting (no IO)",
-               [NVME_CTRL_DEAD]        = "dead",
-       };
-
-       if ((unsigned)ctrl->state < ARRAY_SIZE(state_name) &&
-           state_name[ctrl->state])
-               return sysfs_emit(buf, "%s\n", state_name[ctrl->state]);
-
-       return sysfs_emit(buf, "unknown state\n");
-}
-
-static DEVICE_ATTR(state, S_IRUGO, nvme_sysfs_show_state, NULL);
-
-static ssize_t nvme_sysfs_show_subsysnqn(struct device *dev,
-                                        struct device_attribute *attr,
-                                        char *buf)
-{
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-
-       return sysfs_emit(buf, "%s\n", ctrl->subsys->subnqn);
-}
-static DEVICE_ATTR(subsysnqn, S_IRUGO, nvme_sysfs_show_subsysnqn, NULL);
-
-static ssize_t nvme_sysfs_show_hostnqn(struct device *dev,
-                                       struct device_attribute *attr,
-                                       char *buf)
-{
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-
-       return sysfs_emit(buf, "%s\n", ctrl->opts->host->nqn);
-}
-static DEVICE_ATTR(hostnqn, S_IRUGO, nvme_sysfs_show_hostnqn, NULL);
-
-static ssize_t nvme_sysfs_show_hostid(struct device *dev,
-                                       struct device_attribute *attr,
-                                       char *buf)
-{
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-
-       return sysfs_emit(buf, "%pU\n", &ctrl->opts->host->id);
-}
-static DEVICE_ATTR(hostid, S_IRUGO, nvme_sysfs_show_hostid, NULL);
-
-static ssize_t nvme_sysfs_show_address(struct device *dev,
-                                        struct device_attribute *attr,
-                                        char *buf)
-{
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-
-       return ctrl->ops->get_address(ctrl, buf, PAGE_SIZE);
-}
-static DEVICE_ATTR(address, S_IRUGO, nvme_sysfs_show_address, NULL);
-
-static ssize_t nvme_ctrl_loss_tmo_show(struct device *dev,
-               struct device_attribute *attr, char *buf)
-{
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-       struct nvmf_ctrl_options *opts = ctrl->opts;
-
-       if (ctrl->opts->max_reconnects == -1)
-               return sysfs_emit(buf, "off\n");
-       return sysfs_emit(buf, "%d\n",
-                         opts->max_reconnects * opts->reconnect_delay);
-}
-
-static ssize_t nvme_ctrl_loss_tmo_store(struct device *dev,
-               struct device_attribute *attr, const char *buf, size_t count)
-{
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-       struct nvmf_ctrl_options *opts = ctrl->opts;
-       int ctrl_loss_tmo, err;
-
-       err = kstrtoint(buf, 10, &ctrl_loss_tmo);
-       if (err)
-               return -EINVAL;
-
-       if (ctrl_loss_tmo < 0)
-               opts->max_reconnects = -1;
-       else
-               opts->max_reconnects = DIV_ROUND_UP(ctrl_loss_tmo,
-                                               opts->reconnect_delay);
-       return count;
-}
-static DEVICE_ATTR(ctrl_loss_tmo, S_IRUGO | S_IWUSR,
-       nvme_ctrl_loss_tmo_show, nvme_ctrl_loss_tmo_store);
-
-static ssize_t nvme_ctrl_reconnect_delay_show(struct device *dev,
-               struct device_attribute *attr, char *buf)
-{
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-
-       if (ctrl->opts->reconnect_delay == -1)
-               return sysfs_emit(buf, "off\n");
-       return sysfs_emit(buf, "%d\n", ctrl->opts->reconnect_delay);
-}
-
-static ssize_t nvme_ctrl_reconnect_delay_store(struct device *dev,
-               struct device_attribute *attr, const char *buf, size_t count)
-{
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-       unsigned int v;
-       int err;
-
-       err = kstrtou32(buf, 10, &v);
-       if (err)
-               return err;
-
-       ctrl->opts->reconnect_delay = v;
-       return count;
-}
-static DEVICE_ATTR(reconnect_delay, S_IRUGO | S_IWUSR,
-       nvme_ctrl_reconnect_delay_show, nvme_ctrl_reconnect_delay_store);
-
-static ssize_t nvme_ctrl_fast_io_fail_tmo_show(struct device *dev,
-               struct device_attribute *attr, char *buf)
-{
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-
-       if (ctrl->opts->fast_io_fail_tmo == -1)
-               return sysfs_emit(buf, "off\n");
-       return sysfs_emit(buf, "%d\n", ctrl->opts->fast_io_fail_tmo);
-}
-
-static ssize_t nvme_ctrl_fast_io_fail_tmo_store(struct device *dev,
-               struct device_attribute *attr, const char *buf, size_t count)
-{
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-       struct nvmf_ctrl_options *opts = ctrl->opts;
-       int fast_io_fail_tmo, err;
-
-       err = kstrtoint(buf, 10, &fast_io_fail_tmo);
-       if (err)
-               return -EINVAL;
-
-       if (fast_io_fail_tmo < 0)
-               opts->fast_io_fail_tmo = -1;
-       else
-               opts->fast_io_fail_tmo = fast_io_fail_tmo;
-       return count;
-}
-static DEVICE_ATTR(fast_io_fail_tmo, S_IRUGO | S_IWUSR,
-       nvme_ctrl_fast_io_fail_tmo_show, nvme_ctrl_fast_io_fail_tmo_store);
-
-static ssize_t cntrltype_show(struct device *dev,
-                             struct device_attribute *attr, char *buf)
-{
-       static const char * const type[] = {
-               [NVME_CTRL_IO] = "io\n",
-               [NVME_CTRL_DISC] = "discovery\n",
-               [NVME_CTRL_ADMIN] = "admin\n",
-       };
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-
-       if (ctrl->cntrltype > NVME_CTRL_ADMIN || !type[ctrl->cntrltype])
-               return sysfs_emit(buf, "reserved\n");
-
-       return sysfs_emit(buf, type[ctrl->cntrltype]);
-}
-static DEVICE_ATTR_RO(cntrltype);
-
-static ssize_t dctype_show(struct device *dev,
-                          struct device_attribute *attr, char *buf)
-{
-       static const char * const type[] = {
-               [NVME_DCTYPE_NOT_REPORTED] = "none\n",
-               [NVME_DCTYPE_DDC] = "ddc\n",
-               [NVME_DCTYPE_CDC] = "cdc\n",
-       };
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-
-       if (ctrl->dctype > NVME_DCTYPE_CDC || !type[ctrl->dctype])
-               return sysfs_emit(buf, "reserved\n");
-
-       return sysfs_emit(buf, type[ctrl->dctype]);
-}
-static DEVICE_ATTR_RO(dctype);
-
-#ifdef CONFIG_NVME_AUTH
-static ssize_t nvme_ctrl_dhchap_secret_show(struct device *dev,
-               struct device_attribute *attr, char *buf)
-{
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-       struct nvmf_ctrl_options *opts = ctrl->opts;
-
-       if (!opts->dhchap_secret)
-               return sysfs_emit(buf, "none\n");
-       return sysfs_emit(buf, "%s\n", opts->dhchap_secret);
-}
-
-static ssize_t nvme_ctrl_dhchap_secret_store(struct device *dev,
-               struct device_attribute *attr, const char *buf, size_t count)
-{
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-       struct nvmf_ctrl_options *opts = ctrl->opts;
-       char *dhchap_secret;
-
-       if (!ctrl->opts->dhchap_secret)
-               return -EINVAL;
-       if (count < 7)
-               return -EINVAL;
-       if (memcmp(buf, "DHHC-1:", 7))
-               return -EINVAL;
-
-       dhchap_secret = kzalloc(count + 1, GFP_KERNEL);
-       if (!dhchap_secret)
-               return -ENOMEM;
-       memcpy(dhchap_secret, buf, count);
-       nvme_auth_stop(ctrl);
-       if (strcmp(dhchap_secret, opts->dhchap_secret)) {
-               struct nvme_dhchap_key *key, *host_key;
-               int ret;
-
-               ret = nvme_auth_generate_key(dhchap_secret, &key);
-               if (ret)
-                       return ret;
-               kfree(opts->dhchap_secret);
-               opts->dhchap_secret = dhchap_secret;
-               host_key = ctrl->host_key;
-               mutex_lock(&ctrl->dhchap_auth_mutex);
-               ctrl->host_key = key;
-               mutex_unlock(&ctrl->dhchap_auth_mutex);
-               nvme_auth_free_key(host_key);
-       }
-       /* Start re-authentication */
-       dev_info(ctrl->device, "re-authenticating controller\n");
-       queue_work(nvme_wq, &ctrl->dhchap_auth_work);
-
-       return count;
-}
-static DEVICE_ATTR(dhchap_secret, S_IRUGO | S_IWUSR,
-       nvme_ctrl_dhchap_secret_show, nvme_ctrl_dhchap_secret_store);
-
-static ssize_t nvme_ctrl_dhchap_ctrl_secret_show(struct device *dev,
-               struct device_attribute *attr, char *buf)
-{
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-       struct nvmf_ctrl_options *opts = ctrl->opts;
-
-       if (!opts->dhchap_ctrl_secret)
-               return sysfs_emit(buf, "none\n");
-       return sysfs_emit(buf, "%s\n", opts->dhchap_ctrl_secret);
-}
-
-static ssize_t nvme_ctrl_dhchap_ctrl_secret_store(struct device *dev,
-               struct device_attribute *attr, const char *buf, size_t count)
-{
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-       struct nvmf_ctrl_options *opts = ctrl->opts;
-       char *dhchap_secret;
-
-       if (!ctrl->opts->dhchap_ctrl_secret)
-               return -EINVAL;
-       if (count < 7)
-               return -EINVAL;
-       if (memcmp(buf, "DHHC-1:", 7))
-               return -EINVAL;
-
-       dhchap_secret = kzalloc(count + 1, GFP_KERNEL);
-       if (!dhchap_secret)
-               return -ENOMEM;
-       memcpy(dhchap_secret, buf, count);
-       nvme_auth_stop(ctrl);
-       if (strcmp(dhchap_secret, opts->dhchap_ctrl_secret)) {
-               struct nvme_dhchap_key *key, *ctrl_key;
-               int ret;
-
-               ret = nvme_auth_generate_key(dhchap_secret, &key);
-               if (ret)
-                       return ret;
-               kfree(opts->dhchap_ctrl_secret);
-               opts->dhchap_ctrl_secret = dhchap_secret;
-               ctrl_key = ctrl->ctrl_key;
-               mutex_lock(&ctrl->dhchap_auth_mutex);
-               ctrl->ctrl_key = key;
-               mutex_unlock(&ctrl->dhchap_auth_mutex);
-               nvme_auth_free_key(ctrl_key);
-       }
-       /* Start re-authentication */
-       dev_info(ctrl->device, "re-authenticating controller\n");
-       queue_work(nvme_wq, &ctrl->dhchap_auth_work);
-
-       return count;
-}
-static DEVICE_ATTR(dhchap_ctrl_secret, S_IRUGO | S_IWUSR,
-       nvme_ctrl_dhchap_ctrl_secret_show, nvme_ctrl_dhchap_ctrl_secret_store);
-#endif
-
-static struct attribute *nvme_dev_attrs[] = {
-       &dev_attr_reset_controller.attr,
-       &dev_attr_rescan_controller.attr,
-       &dev_attr_model.attr,
-       &dev_attr_serial.attr,
-       &dev_attr_firmware_rev.attr,
-       &dev_attr_cntlid.attr,
-       &dev_attr_delete_controller.attr,
-       &dev_attr_transport.attr,
-       &dev_attr_subsysnqn.attr,
-       &dev_attr_address.attr,
-       &dev_attr_state.attr,
-       &dev_attr_numa_node.attr,
-       &dev_attr_queue_count.attr,
-       &dev_attr_sqsize.attr,
-       &dev_attr_hostnqn.attr,
-       &dev_attr_hostid.attr,
-       &dev_attr_ctrl_loss_tmo.attr,
-       &dev_attr_reconnect_delay.attr,
-       &dev_attr_fast_io_fail_tmo.attr,
-       &dev_attr_kato.attr,
-       &dev_attr_cntrltype.attr,
-       &dev_attr_dctype.attr,
-#ifdef CONFIG_NVME_AUTH
-       &dev_attr_dhchap_secret.attr,
-       &dev_attr_dhchap_ctrl_secret.attr,
-#endif
-       NULL
-};
-
-static umode_t nvme_dev_attrs_are_visible(struct kobject *kobj,
-               struct attribute *a, int n)
-{
-       struct device *dev = container_of(kobj, struct device, kobj);
-       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-
-       if (a == &dev_attr_delete_controller.attr && !ctrl->ops->delete_ctrl)
-               return 0;
-       if (a == &dev_attr_address.attr && !ctrl->ops->get_address)
-               return 0;
-       if (a == &dev_attr_hostnqn.attr && !ctrl->opts)
-               return 0;
-       if (a == &dev_attr_hostid.attr && !ctrl->opts)
-               return 0;
-       if (a == &dev_attr_ctrl_loss_tmo.attr && !ctrl->opts)
-               return 0;
-       if (a == &dev_attr_reconnect_delay.attr && !ctrl->opts)
-               return 0;
-       if (a == &dev_attr_fast_io_fail_tmo.attr && !ctrl->opts)
-               return 0;
-#ifdef CONFIG_NVME_AUTH
-       if (a == &dev_attr_dhchap_secret.attr && !ctrl->opts)
-               return 0;
-       if (a == &dev_attr_dhchap_ctrl_secret.attr && !ctrl->opts)
-               return 0;
-#endif
-
-       return a->mode;
-}
-
-const struct attribute_group nvme_dev_attrs_group = {
-       .attrs          = nvme_dev_attrs,
-       .is_visible     = nvme_dev_attrs_are_visible,
-};
-EXPORT_SYMBOL_GPL(nvme_dev_attrs_group);
-
-static const struct attribute_group *nvme_dev_attr_groups[] = {
-       &nvme_dev_attrs_group,
-       NULL,
-};
-
 static struct nvme_ns_head *nvme_find_ns_head(struct nvme_ctrl *ctrl,
                unsigned nsid)
 {
@@ -4209,7 +3610,7 @@ static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info)
                        goto out_put_ns_head;
                }
 
-               if (!multipath && !list_empty(&head->list)) {
+               if (!multipath) {
                        dev_warn(ctrl->device,
                                "Found shared namespace %d, but multipathing not supported.\n",
                                info->nsid);
@@ -4310,7 +3711,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
         * instance as shared namespaces will show up as multiple block
         * devices.
         */
-       if (ns->head->disk) {
+       if (nvme_ns_head_multipath(ns->head)) {
                sprintf(disk->disk_name, "nvme%dc%dn%d", ctrl->subsys->instance,
                        ctrl->instance, ns->head->instance);
                disk->flags |= GENHD_FL_HIDDEN;
@@ -5045,7 +4446,7 @@ void nvme_start_ctrl(struct nvme_ctrl *ctrl)
         * that were missed. We identify persistent discovery controllers by
         * checking that they started once before, hence are reconnecting back.
         */
-       if (test_and_set_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags) &&
+       if (test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags) &&
            nvme_discovery_ctrl(ctrl))
                nvme_change_uevent(ctrl, "NVME_EVENT=rediscover");
 
@@ -5056,6 +4457,7 @@ void nvme_start_ctrl(struct nvme_ctrl *ctrl)
        }
 
        nvme_change_uevent(ctrl, "NVME_EVENT=connected");
+       set_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags);
 }
 EXPORT_SYMBOL_GPL(nvme_start_ctrl);
 
@@ -5195,6 +4597,8 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 
        return 0;
 out_free_cdev:
+       nvme_fault_inject_fini(&ctrl->fault_inject);
+       dev_pm_qos_hide_latency_tolerance(ctrl->device);
        cdev_device_del(&ctrl->cdev, ctrl->device);
 out_free_name:
        nvme_put_ctrl(ctrl);
index 0069ebf..8175d49 100644 (file)
@@ -21,35 +21,60 @@ static DEFINE_MUTEX(nvmf_hosts_mutex);
 
 static struct nvmf_host *nvmf_default_host;
 
-static struct nvmf_host *__nvmf_host_find(const char *hostnqn)
+static struct nvmf_host *nvmf_host_alloc(const char *hostnqn, uuid_t *id)
 {
        struct nvmf_host *host;
 
-       list_for_each_entry(host, &nvmf_hosts, list) {
-               if (!strcmp(host->nqn, hostnqn))
-                       return host;
-       }
+       host = kmalloc(sizeof(*host), GFP_KERNEL);
+       if (!host)
+               return NULL;
 
-       return NULL;
+       kref_init(&host->ref);
+       uuid_copy(&host->id, id);
+       strscpy(host->nqn, hostnqn, NVMF_NQN_SIZE);
+
+       return host;
 }
 
-static struct nvmf_host *nvmf_host_add(const char *hostnqn)
+static struct nvmf_host *nvmf_host_add(const char *hostnqn, uuid_t *id)
 {
        struct nvmf_host *host;
 
        mutex_lock(&nvmf_hosts_mutex);
-       host = __nvmf_host_find(hostnqn);
-       if (host) {
-               kref_get(&host->ref);
-               goto out_unlock;
+
+       /*
+        * We have defined a host as how it is perceived by the target.
+        * Therefore, we don't allow different Host NQNs with the same Host ID.
+        * Similarly, we do not allow the usage of the same Host NQN with
+        * different Host IDs. This'll maintain unambiguous host identification.
+        */
+       list_for_each_entry(host, &nvmf_hosts, list) {
+               bool same_hostnqn = !strcmp(host->nqn, hostnqn);
+               bool same_hostid = uuid_equal(&host->id, id);
+
+               if (same_hostnqn && same_hostid) {
+                       kref_get(&host->ref);
+                       goto out_unlock;
+               }
+               if (same_hostnqn) {
+                       pr_err("found same hostnqn %s but different hostid %pUb\n",
+                              hostnqn, id);
+                       host = ERR_PTR(-EINVAL);
+                       goto out_unlock;
+               }
+               if (same_hostid) {
+                       pr_err("found same hostid %pUb but different hostnqn %s\n",
+                              id, hostnqn);
+                       host = ERR_PTR(-EINVAL);
+                       goto out_unlock;
+               }
        }
 
-       host = kmalloc(sizeof(*host), GFP_KERNEL);
-       if (!host)
+       host = nvmf_host_alloc(hostnqn, id);
+       if (!host) {
+               host = ERR_PTR(-ENOMEM);
                goto out_unlock;
-
-       kref_init(&host->ref);
-       strscpy(host->nqn, hostnqn, NVMF_NQN_SIZE);
+       }
 
        list_add_tail(&host->list, &nvmf_hosts);
 out_unlock:
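
With this change a fabrics host is identified by the (host NQN, host ID) pair: an exact match reuses the existing entry, a match on only one of the two is rejected, and no match at all leads to a new allocation. A tiny sketch of that matching rule (hypothetical types, not the fabrics code):

#include <stdbool.h>
#include <string.h>

struct host_id { char nqn[256]; unsigned char uuid[16]; };

enum host_match { HOST_MATCH, HOST_CONFLICT, HOST_NEW };

/* Classify one existing entry against the requested identity. */
static enum host_match classify(const struct host_id *have,
				const struct host_id *want)
{
	bool same_nqn = !strcmp(have->nqn, want->nqn);
	bool same_id  = !memcmp(have->uuid, want->uuid, sizeof(have->uuid));

	if (same_nqn && same_id)
		return HOST_MATCH;	/* reuse, take a reference */
	if (same_nqn || same_id)
		return HOST_CONFLICT;	/* ambiguous identity: reject */
	return HOST_NEW;		/* keep scanning / allocate new host */
}
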
@@ -60,16 +85,17 @@ out_unlock:
 static struct nvmf_host *nvmf_host_default(void)
 {
        struct nvmf_host *host;
+       char nqn[NVMF_NQN_SIZE];
+       uuid_t id;
 
-       host = kmalloc(sizeof(*host), GFP_KERNEL);
+       uuid_gen(&id);
+       snprintf(nqn, NVMF_NQN_SIZE,
+               "nqn.2014-08.org.nvmexpress:uuid:%pUb", &id);
+
+       host = nvmf_host_alloc(nqn, &id);
        if (!host)
                return NULL;
 
-       kref_init(&host->ref);
-       uuid_gen(&host->id);
-       snprintf(host->nqn, NVMF_NQN_SIZE,
-               "nqn.2014-08.org.nvmexpress:uuid:%pUb", &host->id);
-
        mutex_lock(&nvmf_hosts_mutex);
        list_add_tail(&host->list, &nvmf_hosts);
        mutex_unlock(&nvmf_hosts_mutex);
@@ -349,6 +375,45 @@ static void nvmf_log_connect_error(struct nvme_ctrl *ctrl,
        }
 }
 
+static struct nvmf_connect_data *nvmf_connect_data_prep(struct nvme_ctrl *ctrl,
+               u16 cntlid)
+{
+       struct nvmf_connect_data *data;
+
+       data = kzalloc(sizeof(*data), GFP_KERNEL);
+       if (!data)
+               return NULL;
+
+       uuid_copy(&data->hostid, &ctrl->opts->host->id);
+       data->cntlid = cpu_to_le16(cntlid);
+       strncpy(data->subsysnqn, ctrl->opts->subsysnqn, NVMF_NQN_SIZE);
+       strncpy(data->hostnqn, ctrl->opts->host->nqn, NVMF_NQN_SIZE);
+
+       return data;
+}
+
+static void nvmf_connect_cmd_prep(struct nvme_ctrl *ctrl, u16 qid,
+               struct nvme_command *cmd)
+{
+       cmd->connect.opcode = nvme_fabrics_command;
+       cmd->connect.fctype = nvme_fabrics_type_connect;
+       cmd->connect.qid = cpu_to_le16(qid);
+
+       if (qid) {
+               cmd->connect.sqsize = cpu_to_le16(ctrl->sqsize);
+       } else {
+               cmd->connect.sqsize = cpu_to_le16(NVME_AQ_DEPTH - 1);
+
+               /*
+                * ctrl->kato is in seconds; the connect command carries the
+                * keep-alive timeout in milliseconds, hence the * 1000.
+                */
+               cmd->connect.kato = cpu_to_le32(ctrl->kato * 1000);
+       }
+
+       if (ctrl->opts->disable_sqflow)
+               cmd->connect.cattr |= NVME_CONNECT_DISABLE_SQFLOW;
+}
+
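
A minimal illustration of the keep-alive conversion in the qid == 0 branch
above (values are made up): ctrl->kato is carried in seconds, while the Connect
command carries the timeout in milliseconds.

    #include <stdio.h>

    int main(void)
    {
            unsigned int kato_sec = 5;                    /* e.g. ctrl->kato          */
            unsigned int connect_kato = kato_sec * 1000;  /* milliseconds on the wire */

            printf("kato=%us -> connect.kato=%ums\n", kato_sec, connect_kato);
            return 0;
    }
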
 /**
  * nvmf_connect_admin_queue() - NVMe Fabrics Admin Queue "Connect"
  *                             API function.
@@ -377,28 +442,12 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
        int ret;
        u32 result;
 
-       cmd.connect.opcode = nvme_fabrics_command;
-       cmd.connect.fctype = nvme_fabrics_type_connect;
-       cmd.connect.qid = 0;
-       cmd.connect.sqsize = cpu_to_le16(NVME_AQ_DEPTH - 1);
-
-       /*
-        * Set keep-alive timeout in seconds granularity (ms * 1000)
-        */
-       cmd.connect.kato = cpu_to_le32(ctrl->kato * 1000);
-
-       if (ctrl->opts->disable_sqflow)
-               cmd.connect.cattr |= NVME_CONNECT_DISABLE_SQFLOW;
+       nvmf_connect_cmd_prep(ctrl, 0, &cmd);
 
-       data = kzalloc(sizeof(*data), GFP_KERNEL);
+       data = nvmf_connect_data_prep(ctrl, 0xffff);
        if (!data)
                return -ENOMEM;
 
-       uuid_copy(&data->hostid, &ctrl->opts->host->id);
-       data->cntlid = cpu_to_le16(0xffff);
-       strncpy(data->subsysnqn, ctrl->opts->subsysnqn, NVMF_NQN_SIZE);
-       strncpy(data->hostnqn, ctrl->opts->host->nqn, NVMF_NQN_SIZE);
-
        ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, &res,
                        data, sizeof(*data), NVME_QID_ANY, 1,
                        BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
@@ -468,23 +517,12 @@ int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid)
        int ret;
        u32 result;
 
-       cmd.connect.opcode = nvme_fabrics_command;
-       cmd.connect.fctype = nvme_fabrics_type_connect;
-       cmd.connect.qid = cpu_to_le16(qid);
-       cmd.connect.sqsize = cpu_to_le16(ctrl->sqsize);
+       nvmf_connect_cmd_prep(ctrl, qid, &cmd);
 
-       if (ctrl->opts->disable_sqflow)
-               cmd.connect.cattr |= NVME_CONNECT_DISABLE_SQFLOW;
-
-       data = kzalloc(sizeof(*data), GFP_KERNEL);
+       data = nvmf_connect_data_prep(ctrl, ctrl->cntlid);
        if (!data)
                return -ENOMEM;
 
-       uuid_copy(&data->hostid, &ctrl->opts->host->id);
-       data->cntlid = cpu_to_le16(ctrl->cntlid);
-       strncpy(data->subsysnqn, ctrl->opts->subsysnqn, NVMF_NQN_SIZE);
-       strncpy(data->hostnqn, ctrl->opts->host->nqn, NVMF_NQN_SIZE);
-
        ret = __nvme_submit_sync_cmd(ctrl->connect_q, &cmd, &res,
                        data, sizeof(*data), qid, 1,
                        BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
@@ -621,6 +659,7 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
        size_t nqnlen  = 0;
        int ctrl_loss_tmo = NVMF_DEF_CTRL_LOSS_TMO;
        uuid_t hostid;
+       char hostnqn[NVMF_NQN_SIZE];
 
        /* Set defaults */
        opts->queue_size = NVMF_DEF_QUEUE_SIZE;
@@ -637,7 +676,9 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
        if (!options)
                return -ENOMEM;
 
-       uuid_gen(&hostid);
+       /* use default host if not given by user space */
+       uuid_copy(&hostid, &nvmf_default_host->id);
+       strscpy(hostnqn, nvmf_default_host->nqn, NVMF_NQN_SIZE);
 
        while ((p = strsep(&o, ",\n")) != NULL) {
                if (!*p)
@@ -783,12 +824,8 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
                                ret = -EINVAL;
                                goto out;
                        }
-                       opts->host = nvmf_host_add(p);
+                       strscpy(hostnqn, p, NVMF_NQN_SIZE);
                        kfree(p);
-                       if (!opts->host) {
-                               ret = -ENOMEM;
-                               goto out;
-                       }
                        break;
                case NVMF_OPT_RECONNECT_DELAY:
                        if (match_int(args, &token)) {
@@ -945,18 +982,94 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
                                opts->fast_io_fail_tmo, ctrl_loss_tmo);
        }
 
-       if (!opts->host) {
-               kref_get(&nvmf_default_host->ref);
-               opts->host = nvmf_default_host;
+       opts->host = nvmf_host_add(hostnqn, &hostid);
+       if (IS_ERR(opts->host)) {
+               ret = PTR_ERR(opts->host);
+               opts->host = NULL;
+               goto out;
        }
 
-       uuid_copy(&opts->host->id, &hostid);
-
 out:
        kfree(options);
        return ret;
 }
 
+void nvmf_set_io_queues(struct nvmf_ctrl_options *opts, u32 nr_io_queues,
+                       u32 io_queues[HCTX_MAX_TYPES])
+{
+       if (opts->nr_write_queues && opts->nr_io_queues < nr_io_queues) {
+               /*
+                * separate read/write queues
+                * hand out dedicated default queues only after we have
+                * sufficient read queues.
+                */
+               io_queues[HCTX_TYPE_READ] = opts->nr_io_queues;
+               nr_io_queues -= io_queues[HCTX_TYPE_READ];
+               io_queues[HCTX_TYPE_DEFAULT] =
+                       min(opts->nr_write_queues, nr_io_queues);
+               nr_io_queues -= io_queues[HCTX_TYPE_DEFAULT];
+       } else {
+               /*
+                * shared read/write queues
+                * either no write queues were requested, or we don't have
+                * sufficient queue count to have dedicated default queues.
+                */
+               io_queues[HCTX_TYPE_DEFAULT] =
+                       min(opts->nr_io_queues, nr_io_queues);
+               nr_io_queues -= io_queues[HCTX_TYPE_DEFAULT];
+       }
+
+       if (opts->nr_poll_queues && nr_io_queues) {
+               /* map dedicated poll queues only if we have queues left */
+               io_queues[HCTX_TYPE_POLL] =
+                       min(opts->nr_poll_queues, nr_io_queues);
+       }
+}
+EXPORT_SYMBOL_GPL(nvmf_set_io_queues);
+
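
Worked example for nvmf_set_io_queues() (stand-alone sketch with made-up
numbers; MIN and the Q_* indices are local stand-ins, not kernel definitions):
with 4 read, 2 write and 1 poll queue requested and 8 queues granted, the
split comes out as 2 default, 4 read and 1 poll queue.

    #include <stdio.h>

    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    enum { Q_DEFAULT, Q_READ, Q_POLL, Q_TYPES };

    /*
     * Same policy as above: dedicated default (write) queues are handed out
     * only once the read queues are covered, otherwise reads and writes share
     * the default map; poll queues take whatever is left over.
     */
    static void set_io_queues(unsigned int read_q, unsigned int write_q,
                              unsigned int poll_q, unsigned int avail,
                              unsigned int out[Q_TYPES])
    {
            out[Q_DEFAULT] = out[Q_READ] = out[Q_POLL] = 0;

            if (write_q && read_q < avail) {
                    out[Q_READ] = read_q;
                    avail -= out[Q_READ];
                    out[Q_DEFAULT] = MIN(write_q, avail);
                    avail -= out[Q_DEFAULT];
            } else {
                    out[Q_DEFAULT] = MIN(read_q, avail);
                    avail -= out[Q_DEFAULT];
            }
            if (poll_q && avail)
                    out[Q_POLL] = MIN(poll_q, avail);
    }

    int main(void)
    {
            unsigned int q[Q_TYPES];

            set_io_queues(4, 2, 1, 8, q);
            printf("default=%u read=%u poll=%u\n",
                   q[Q_DEFAULT], q[Q_READ], q[Q_POLL]);  /* default=2 read=4 poll=1 */
            return 0;
    }
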
+void nvmf_map_queues(struct blk_mq_tag_set *set, struct nvme_ctrl *ctrl,
+                    u32 io_queues[HCTX_MAX_TYPES])
+{
+       struct nvmf_ctrl_options *opts = ctrl->opts;
+
+       if (opts->nr_write_queues && io_queues[HCTX_TYPE_READ]) {
+               /* separate read/write queues */
+               set->map[HCTX_TYPE_DEFAULT].nr_queues =
+                       io_queues[HCTX_TYPE_DEFAULT];
+               set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
+               set->map[HCTX_TYPE_READ].nr_queues =
+                       io_queues[HCTX_TYPE_READ];
+               set->map[HCTX_TYPE_READ].queue_offset =
+                       io_queues[HCTX_TYPE_DEFAULT];
+       } else {
+               /* shared read/write queues */
+               set->map[HCTX_TYPE_DEFAULT].nr_queues =
+                       io_queues[HCTX_TYPE_DEFAULT];
+               set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
+               set->map[HCTX_TYPE_READ].nr_queues =
+                       io_queues[HCTX_TYPE_DEFAULT];
+               set->map[HCTX_TYPE_READ].queue_offset = 0;
+       }
+
+       blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
+       blk_mq_map_queues(&set->map[HCTX_TYPE_READ]);
+       if (opts->nr_poll_queues && io_queues[HCTX_TYPE_POLL]) {
+               /* map dedicated poll queues only if we have queues left */
+               set->map[HCTX_TYPE_POLL].nr_queues = io_queues[HCTX_TYPE_POLL];
+               set->map[HCTX_TYPE_POLL].queue_offset =
+                       io_queues[HCTX_TYPE_DEFAULT] +
+                       io_queues[HCTX_TYPE_READ];
+               blk_mq_map_queues(&set->map[HCTX_TYPE_POLL]);
+       }
+
+       dev_info(ctrl->device,
+               "mapped %d/%d/%d default/read/poll queues.\n",
+               io_queues[HCTX_TYPE_DEFAULT],
+               io_queues[HCTX_TYPE_READ],
+               io_queues[HCTX_TYPE_POLL]);
+}
+EXPORT_SYMBOL_GPL(nvmf_map_queues);
+
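
Continuing the same made-up numbers, nvmf_map_queues() lays the hardware
contexts out back to back: default queues first, then read, then poll. Hand
computation only, not driver code:

    #include <stdio.h>

    int main(void)
    {
            unsigned int def = 2, read = 4, poll = 1;

            /* separate read/write layout as above: [default][read][poll] */
            printf("default: offset=%u nr=%u\n", 0u, def);
            printf("read:    offset=%u nr=%u\n", def, read);
            printf("poll:    offset=%u nr=%u\n", def + read, poll);
            return 0;
    }
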
 static int nvmf_check_required_opts(struct nvmf_ctrl_options *opts,
                unsigned int required_opts)
 {
index dcac3df..82e7a27 100644 (file)
@@ -77,6 +77,9 @@ enum {
  *                           with the parsing opts enum.
  * @mask:      Used by the fabrics library to parse through sysfs options
  *             on adding a NVMe controller.
+ * @max_reconnects: maximum number of allowed reconnect attempts before removing
+ *             the controller, (-1) means reconnect forever, zero means remove
+ *             immediately;
  * @transport: Holds the fabric transport "technology name" (for a lack of
  *             better description) that will be used by an NVMe controller
  *             being added.
@@ -96,9 +99,6 @@ enum {
  * @discovery_nqn: indicates if the subsysnqn is the well-known discovery NQN.
  * @kato:      Keep-alive timeout.
  * @host:      Virtual NVMe host, contains the NQN and Host ID.
- * @max_reconnects: maximum number of allowed reconnect attempts before removing
- *              the controller, (-1) means reconnect forever, zero means remove
- *              immediately;
  * @dhchap_secret: DH-HMAC-CHAP secret
  * @dhchap_ctrl_secret: DH-HMAC-CHAP controller secret for bi-directional
  *              authentication
@@ -112,6 +112,7 @@ enum {
  */
 struct nvmf_ctrl_options {
        unsigned                mask;
+       int                     max_reconnects;
        char                    *transport;
        char                    *subsysnqn;
        char                    *traddr;
@@ -125,7 +126,6 @@ struct nvmf_ctrl_options {
        bool                    duplicate_connect;
        unsigned int            kato;
        struct nvmf_host        *host;
-       int                     max_reconnects;
        char                    *dhchap_secret;
        char                    *dhchap_ctrl_secret;
        bool                    disable_sqflow;
@@ -181,7 +181,7 @@ nvmf_ctlr_matches_baseopts(struct nvme_ctrl *ctrl,
            ctrl->state == NVME_CTRL_DEAD ||
            strcmp(opts->subsysnqn, ctrl->opts->subsysnqn) ||
            strcmp(opts->host->nqn, ctrl->opts->host->nqn) ||
-           memcmp(&opts->host->id, &ctrl->opts->host->id, sizeof(uuid_t)))
+           !uuid_equal(&opts->host->id, &ctrl->opts->host->id))
                return false;
 
        return true;
@@ -203,6 +203,13 @@ static inline void nvmf_complete_timed_out_request(struct request *rq)
        }
 }
 
+static inline unsigned int nvmf_nr_io_queues(struct nvmf_ctrl_options *opts)
+{
+       return min(opts->nr_io_queues, num_online_cpus()) +
+               min(opts->nr_write_queues, num_online_cpus()) +
+               min(opts->nr_poll_queues, num_online_cpus());
+}
+
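
For scale (made-up numbers): with 8 online CPUs and 4 read, 2 write and 16 poll
queues requested, nvmf_nr_io_queues() asks the controller for
min(4,8) + min(2,8) + min(16,8) = 14 queues in total.
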
 int nvmf_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val);
 int nvmf_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val);
 int nvmf_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val);
@@ -215,5 +222,9 @@ int nvmf_get_address(struct nvme_ctrl *ctrl, char *buf, int size);
 bool nvmf_should_reconnect(struct nvme_ctrl *ctrl);
 bool nvmf_ip_options_match(struct nvme_ctrl *ctrl,
                struct nvmf_ctrl_options *opts);
+void nvmf_set_io_queues(struct nvmf_ctrl_options *opts, u32 nr_io_queues,
+                       u32 io_queues[HCTX_MAX_TYPES]);
+void nvmf_map_queues(struct blk_mq_tag_set *set, struct nvme_ctrl *ctrl,
+                    u32 io_queues[HCTX_MAX_TYPES]);
 
 #endif /* _NVME_FABRICS_H */
index 9e6e56c..316f3e4 100644 (file)
@@ -163,7 +163,9 @@ static umode_t nvme_hwmon_is_visible(const void *_data,
        case hwmon_temp_max:
        case hwmon_temp_min:
                if ((!channel && data->ctrl->wctemp) ||
-                   (channel && data->log->temp_sensor[channel - 1])) {
+                   (channel && data->log->temp_sensor[channel - 1] &&
+                    !(data->ctrl->quirks &
+                      NVME_QUIRK_NO_SECONDARY_TEMP_THRESH))) {
                        if (data->ctrl->quirks &
                            NVME_QUIRK_NO_TEMP_THRESH_CHANGE)
                                return 0444;
index 81c5c9e..2130ad6 100644 (file)
@@ -14,7 +14,7 @@ enum {
 };
 
 static bool nvme_cmd_allowed(struct nvme_ns *ns, struct nvme_command *c,
-               unsigned int flags, fmode_t mode)
+               unsigned int flags, bool open_for_write)
 {
        u32 effects;
 
@@ -80,7 +80,7 @@ static bool nvme_cmd_allowed(struct nvme_ns *ns, struct nvme_command *c,
         * writing.
         */
        if (nvme_is_write(c) || (effects & NVME_CMD_EFFECTS_LBCC))
-               return mode & FMODE_WRITE;
+               return open_for_write;
        return true;
 }
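
Illustrative only (plain ints instead of the kernel's blk_mode_t/fmode_t, and
made-up bit values): the callers below reduce their open mode to the single
open_for_write flag that nvme_cmd_allowed() now takes.

    #include <stdio.h>
    #include <stdbool.h>

    #define BLK_OPEN_WRITE  (1u << 1)   /* stand-in value, not the kernel's */
    #define FMODE_WRITE     (1u << 1)   /* stand-in value, not the kernel's */

    /* write-like commands are only allowed on a descriptor opened for writing */
    static bool cmd_allowed(bool is_write_cmd, bool open_for_write)
    {
            return is_write_cmd ? open_for_write : true;
    }

    int main(void)
    {
            unsigned int blk_mode = BLK_OPEN_WRITE;  /* e.g. block-device ioctl path      */
            unsigned int f_mode = 0;                 /* e.g. char device opened read-only */

            printf("%d\n", cmd_allowed(true, blk_mode & BLK_OPEN_WRITE)); /* 1 */
            printf("%d\n", cmd_allowed(true, f_mode & FMODE_WRITE));      /* 0 */
            return 0;
    }
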
 
@@ -254,7 +254,7 @@ static int nvme_submit_user_cmd(struct request_queue *q,
        blk_mq_free_request(req);
 
        if (effects)
-               nvme_passthru_end(ctrl, effects, cmd, ret);
+               nvme_passthru_end(ctrl, ns, effects, cmd, ret);
 
        return ret;
 }
@@ -337,7 +337,7 @@ static bool nvme_validate_passthru_nsid(struct nvme_ctrl *ctrl,
 
 static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
                struct nvme_passthru_cmd __user *ucmd, unsigned int flags,
-               fmode_t mode)
+               bool open_for_write)
 {
        struct nvme_passthru_cmd cmd;
        struct nvme_command c;
@@ -365,7 +365,7 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
        c.common.cdw14 = cpu_to_le32(cmd.cdw14);
        c.common.cdw15 = cpu_to_le32(cmd.cdw15);
 
-       if (!nvme_cmd_allowed(ns, &c, 0, mode))
+       if (!nvme_cmd_allowed(ns, &c, 0, open_for_write))
                return -EACCES;
 
        if (cmd.timeout_ms)
@@ -385,7 +385,7 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 
 static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
                struct nvme_passthru_cmd64 __user *ucmd, unsigned int flags,
-               fmode_t mode)
+               bool open_for_write)
 {
        struct nvme_passthru_cmd64 cmd;
        struct nvme_command c;
@@ -412,7 +412,7 @@ static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
        c.common.cdw14 = cpu_to_le32(cmd.cdw14);
        c.common.cdw15 = cpu_to_le32(cmd.cdw15);
 
-       if (!nvme_cmd_allowed(ns, &c, flags, mode))
+       if (!nvme_cmd_allowed(ns, &c, flags, open_for_write))
                return -EACCES;
 
        if (cmd.timeout_ms)
@@ -521,7 +521,7 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
        if (cookie != NULL && blk_rq_is_poll(req))
                nvme_uring_task_cb(ioucmd, IO_URING_F_UNLOCKED);
        else
-               io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_cb);
+               io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
 
        return RQ_END_IO_FREE;
 }
@@ -543,7 +543,7 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io_meta(struct request *req,
        if (cookie != NULL && blk_rq_is_poll(req))
                nvme_uring_task_meta_cb(ioucmd, IO_URING_F_UNLOCKED);
        else
-               io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_meta_cb);
+               io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_meta_cb);
 
        return RQ_END_IO_NONE;
 }
@@ -583,7 +583,7 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
        c.common.cdw14 = cpu_to_le32(READ_ONCE(cmd->cdw14));
        c.common.cdw15 = cpu_to_le32(READ_ONCE(cmd->cdw15));
 
-       if (!nvme_cmd_allowed(ns, &c, 0, ioucmd->file->f_mode))
+       if (!nvme_cmd_allowed(ns, &c, 0, ioucmd->file->f_mode & FMODE_WRITE))
                return -EACCES;
 
        d.metadata = READ_ONCE(cmd->metadata);
@@ -649,13 +649,13 @@ static bool is_ctrl_ioctl(unsigned int cmd)
 }
 
 static int nvme_ctrl_ioctl(struct nvme_ctrl *ctrl, unsigned int cmd,
-               void __user *argp, fmode_t mode)
+               void __user *argp, bool open_for_write)
 {
        switch (cmd) {
        case NVME_IOCTL_ADMIN_CMD:
-               return nvme_user_cmd(ctrl, NULL, argp, 0, mode);
+               return nvme_user_cmd(ctrl, NULL, argp, 0, open_for_write);
        case NVME_IOCTL_ADMIN64_CMD:
-               return nvme_user_cmd64(ctrl, NULL, argp, 0, mode);
+               return nvme_user_cmd64(ctrl, NULL, argp, 0, open_for_write);
        default:
                return sed_ioctl(ctrl->opal_dev, cmd, argp);
        }
@@ -680,14 +680,14 @@ struct nvme_user_io32 {
 #endif /* COMPAT_FOR_U64_ALIGNMENT */
 
 static int nvme_ns_ioctl(struct nvme_ns *ns, unsigned int cmd,
-               void __user *argp, unsigned int flags, fmode_t mode)
+               void __user *argp, unsigned int flags, bool open_for_write)
 {
        switch (cmd) {
        case NVME_IOCTL_ID:
                force_successful_syscall_return();
                return ns->head->ns_id;
        case NVME_IOCTL_IO_CMD:
-               return nvme_user_cmd(ns->ctrl, ns, argp, flags, mode);
+               return nvme_user_cmd(ns->ctrl, ns, argp, flags, open_for_write);
        /*
         * struct nvme_user_io can have different padding on some 32-bit ABIs.
         * Just accept the compat version as all fields that are used are the
@@ -702,16 +702,18 @@ static int nvme_ns_ioctl(struct nvme_ns *ns, unsigned int cmd,
                flags |= NVME_IOCTL_VEC;
                fallthrough;
        case NVME_IOCTL_IO64_CMD:
-               return nvme_user_cmd64(ns->ctrl, ns, argp, flags, mode);
+               return nvme_user_cmd64(ns->ctrl, ns, argp, flags,
+                                      open_for_write);
        default:
                return -ENOTTY;
        }
 }
 
-int nvme_ioctl(struct block_device *bdev, fmode_t mode,
+int nvme_ioctl(struct block_device *bdev, blk_mode_t mode,
                unsigned int cmd, unsigned long arg)
 {
        struct nvme_ns *ns = bdev->bd_disk->private_data;
+       bool open_for_write = mode & BLK_OPEN_WRITE;
        void __user *argp = (void __user *)arg;
        unsigned int flags = 0;
 
@@ -719,19 +721,20 @@ int nvme_ioctl(struct block_device *bdev, fmode_t mode,
                flags |= NVME_IOCTL_PARTITION;
 
        if (is_ctrl_ioctl(cmd))
-               return nvme_ctrl_ioctl(ns->ctrl, cmd, argp, mode);
-       return nvme_ns_ioctl(ns, cmd, argp, flags, mode);
+               return nvme_ctrl_ioctl(ns->ctrl, cmd, argp, open_for_write);
+       return nvme_ns_ioctl(ns, cmd, argp, flags, open_for_write);
 }
 
 long nvme_ns_chr_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
        struct nvme_ns *ns =
                container_of(file_inode(file)->i_cdev, struct nvme_ns, cdev);
+       bool open_for_write = file->f_mode & FMODE_WRITE;
        void __user *argp = (void __user *)arg;
 
        if (is_ctrl_ioctl(cmd))
-               return nvme_ctrl_ioctl(ns->ctrl, cmd, argp, file->f_mode);
-       return nvme_ns_ioctl(ns, cmd, argp, 0, file->f_mode);
+               return nvme_ctrl_ioctl(ns->ctrl, cmd, argp, open_for_write);
+       return nvme_ns_ioctl(ns, cmd, argp, 0, open_for_write);
 }
 
 static int nvme_uring_cmd_checks(unsigned int issue_flags)
@@ -800,7 +803,7 @@ int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
 #ifdef CONFIG_NVME_MULTIPATH
 static int nvme_ns_head_ctrl_ioctl(struct nvme_ns *ns, unsigned int cmd,
                void __user *argp, struct nvme_ns_head *head, int srcu_idx,
-               fmode_t mode)
+               bool open_for_write)
        __releases(&head->srcu)
 {
        struct nvme_ctrl *ctrl = ns->ctrl;
@@ -808,16 +811,17 @@ static int nvme_ns_head_ctrl_ioctl(struct nvme_ns *ns, unsigned int cmd,
 
        nvme_get_ctrl(ns->ctrl);
        srcu_read_unlock(&head->srcu, srcu_idx);
-       ret = nvme_ctrl_ioctl(ns->ctrl, cmd, argp, mode);
+       ret = nvme_ctrl_ioctl(ns->ctrl, cmd, argp, open_for_write);
 
        nvme_put_ctrl(ctrl);
        return ret;
 }
 
-int nvme_ns_head_ioctl(struct block_device *bdev, fmode_t mode,
+int nvme_ns_head_ioctl(struct block_device *bdev, blk_mode_t mode,
                unsigned int cmd, unsigned long arg)
 {
        struct nvme_ns_head *head = bdev->bd_disk->private_data;
+       bool open_for_write = mode & BLK_OPEN_WRITE;
        void __user *argp = (void __user *)arg;
        struct nvme_ns *ns;
        int srcu_idx, ret = -EWOULDBLOCK;
@@ -838,9 +842,9 @@ int nvme_ns_head_ioctl(struct block_device *bdev, fmode_t mode,
         */
        if (is_ctrl_ioctl(cmd))
                return nvme_ns_head_ctrl_ioctl(ns, cmd, argp, head, srcu_idx,
-                                       mode);
+                                              open_for_write);
 
-       ret = nvme_ns_ioctl(ns, cmd, argp, flags, mode);
+       ret = nvme_ns_ioctl(ns, cmd, argp, flags, open_for_write);
 out_unlock:
        srcu_read_unlock(&head->srcu, srcu_idx);
        return ret;
@@ -849,6 +853,7 @@ out_unlock:
 long nvme_ns_head_chr_ioctl(struct file *file, unsigned int cmd,
                unsigned long arg)
 {
+       bool open_for_write = file->f_mode & FMODE_WRITE;
        struct cdev *cdev = file_inode(file)->i_cdev;
        struct nvme_ns_head *head =
                container_of(cdev, struct nvme_ns_head, cdev);
@@ -863,9 +868,9 @@ long nvme_ns_head_chr_ioctl(struct file *file, unsigned int cmd,
 
        if (is_ctrl_ioctl(cmd))
                return nvme_ns_head_ctrl_ioctl(ns, cmd, argp, head, srcu_idx,
-                               file->f_mode);
+                               open_for_write);
 
-       ret = nvme_ns_ioctl(ns, cmd, argp, 0, file->f_mode);
+       ret = nvme_ns_ioctl(ns, cmd, argp, 0, open_for_write);
 out_unlock:
        srcu_read_unlock(&head->srcu, srcu_idx);
        return ret;
@@ -940,7 +945,7 @@ int nvme_dev_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
 }
 
 static int nvme_dev_user_cmd(struct nvme_ctrl *ctrl, void __user *argp,
-               fmode_t mode)
+               bool open_for_write)
 {
        struct nvme_ns *ns;
        int ret;
@@ -964,7 +969,7 @@ static int nvme_dev_user_cmd(struct nvme_ctrl *ctrl, void __user *argp,
        kref_get(&ns->kref);
        up_read(&ctrl->namespaces_rwsem);
 
-       ret = nvme_user_cmd(ctrl, ns, argp, 0, mode);
+       ret = nvme_user_cmd(ctrl, ns, argp, 0, open_for_write);
        nvme_put_ns(ns);
        return ret;
 
@@ -976,16 +981,17 @@ out_unlock:
 long nvme_dev_ioctl(struct file *file, unsigned int cmd,
                unsigned long arg)
 {
+       bool open_for_write = file->f_mode & FMODE_WRITE;
        struct nvme_ctrl *ctrl = file->private_data;
        void __user *argp = (void __user *)arg;
 
        switch (cmd) {
        case NVME_IOCTL_ADMIN_CMD:
-               return nvme_user_cmd(ctrl, NULL, argp, 0, file->f_mode);
+               return nvme_user_cmd(ctrl, NULL, argp, 0, open_for_write);
        case NVME_IOCTL_ADMIN64_CMD:
-               return nvme_user_cmd64(ctrl, NULL, argp, 0, file->f_mode);
+               return nvme_user_cmd64(ctrl, NULL, argp, 0, open_for_write);
        case NVME_IOCTL_IO_CMD:
-               return nvme_dev_user_cmd(ctrl, argp, file->f_mode);
+               return nvme_dev_user_cmd(ctrl, argp, open_for_write);
        case NVME_IOCTL_RESET:
                if (!capable(CAP_SYS_ADMIN))
                        return -EACCES;
index 9171452..98001ee 100644 (file)
@@ -402,14 +402,14 @@ static void nvme_ns_head_submit_bio(struct bio *bio)
        srcu_read_unlock(&head->srcu, srcu_idx);
 }
 
-static int nvme_ns_head_open(struct block_device *bdev, fmode_t mode)
+static int nvme_ns_head_open(struct gendisk *disk, blk_mode_t mode)
 {
-       if (!nvme_tryget_ns_head(bdev->bd_disk->private_data))
+       if (!nvme_tryget_ns_head(disk->private_data))
                return -ENXIO;
        return 0;
 }
 
-static void nvme_ns_head_release(struct gendisk *disk, fmode_t mode)
+static void nvme_ns_head_release(struct gendisk *disk)
 {
        nvme_put_ns_head(disk->private_data);
 }
@@ -884,7 +884,6 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 {
        if (!head->disk)
                return;
-       blk_mark_disk_dead(head->disk);
        /* make sure all pending bios are cleaned up */
        kblockd_schedule_work(&head->requeue_work);
        flush_work(&head->requeue_work);
index bf46f12..9a98c14 100644 (file)
@@ -149,6 +149,11 @@ enum nvme_quirks {
         * Reports garbage in the namespace identifiers (eui64, nguid, uuid).
         */
        NVME_QUIRK_BOGUS_NID                    = (1 << 18),
+
+       /*
+        * No temperature thresholds for channels other than 0 (Composite).
+        */
+       NVME_QUIRK_NO_SECONDARY_TEMP_THRESH     = (1 << 19),
 };
 
 /*
@@ -242,12 +247,13 @@ enum nvme_ctrl_flags {
        NVME_CTRL_ADMIN_Q_STOPPED       = 1,
        NVME_CTRL_STARTED_ONCE          = 2,
        NVME_CTRL_STOPPED               = 3,
+       NVME_CTRL_SKIP_ID_CNS_CS        = 4,
 };
 
 struct nvme_ctrl {
        bool comp_seen;
-       enum nvme_ctrl_state state;
        bool identified;
+       enum nvme_ctrl_state state;
        spinlock_t lock;
        struct mutex scan_lock;
        const struct nvme_ctrl_ops *ops;
@@ -279,8 +285,8 @@ struct nvme_ctrl {
        char name[12];
        u16 cntlid;
 
-       u32 ctrl_config;
        u16 mtfa;
+       u32 ctrl_config;
        u32 queue_count;
 
        u64 cap;
@@ -323,6 +329,7 @@ struct nvme_ctrl {
        struct delayed_work ka_work;
        struct delayed_work failfast_work;
        struct nvme_command ka_cmd;
+       unsigned long ka_last_check_time;
        struct work_struct fw_act_work;
        unsigned long events;
 
@@ -353,10 +360,10 @@ struct nvme_ctrl {
        bool apst_enabled;
 
        /* PCIe only: */
+       u16 hmmaxd;
        u32 hmpre;
        u32 hmmin;
        u32 hmminds;
-       u16 hmmaxd;
 
        /* Fabrics only */
        u32 ioccsz;
@@ -836,10 +843,10 @@ void nvme_put_ns_head(struct nvme_ns_head *head);
 int nvme_cdev_add(struct cdev *cdev, struct device *cdev_device,
                const struct file_operations *fops, struct module *owner);
 void nvme_cdev_del(struct cdev *cdev, struct device *cdev_device);
-int nvme_ioctl(struct block_device *bdev, fmode_t mode,
+int nvme_ioctl(struct block_device *bdev, blk_mode_t mode,
                unsigned int cmd, unsigned long arg);
 long nvme_ns_chr_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
-int nvme_ns_head_ioctl(struct block_device *bdev, fmode_t mode,
+int nvme_ns_head_ioctl(struct block_device *bdev, blk_mode_t mode,
                unsigned int cmd, unsigned long arg);
 long nvme_ns_head_chr_ioctl(struct file *file, unsigned int cmd,
                unsigned long arg);
@@ -860,7 +867,11 @@ extern const struct attribute_group *nvme_ns_id_attr_groups[];
 extern const struct pr_ops nvme_pr_ops;
 extern const struct block_device_operations nvme_ns_head_ops;
 extern const struct attribute_group nvme_dev_attrs_group;
+extern const struct attribute_group *nvme_subsys_attrs_groups[];
+extern const struct attribute_group *nvme_dev_attr_groups[];
+extern const struct block_device_operations nvme_bdev_ops;
 
+void nvme_delete_ctrl_sync(struct nvme_ctrl *ctrl);
 struct nvme_ns *nvme_find_path(struct nvme_ns_head *head);
 #ifdef CONFIG_NVME_MULTIPATH
 static inline bool nvme_ctrl_use_ana(struct nvme_ctrl *ctrl)
@@ -1072,7 +1083,7 @@ u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
                         u8 opcode);
 u32 nvme_passthru_start(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode);
 int nvme_execute_rq(struct request *rq, bool at_head);
-void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects,
+void nvme_passthru_end(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u32 effects,
                       struct nvme_command *cmd, int status);
 struct nvme_ctrl *nvme_ctrl_from_file(struct file *file);
 struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid);
index 7f25c0f..48c60f7 100644 (file)
@@ -420,10 +420,9 @@ static int nvme_pci_init_request(struct blk_mq_tag_set *set,
                struct request *req, unsigned int hctx_idx,
                unsigned int numa_node)
 {
-       struct nvme_dev *dev = to_nvme_dev(set->driver_data);
        struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 
-       nvme_req(req)->ctrl = &dev->ctrl;
+       nvme_req(req)->ctrl = set->driver_data;
        nvme_req(req)->cmd = &iod->cmd;
        return 0;
 }
@@ -2956,7 +2955,7 @@ static struct nvme_dev *nvme_pci_alloc_dev(struct pci_dev *pdev,
         * over a single page.
         */
        dev->ctrl.max_hw_sectors = min_t(u32,
-               NVME_MAX_KB_SZ << 1, dma_max_mapping_size(&pdev->dev) >> 9);
+               NVME_MAX_KB_SZ << 1, dma_opt_mapping_size(&pdev->dev) >> 9);
        dev->ctrl.max_segments = NVME_MAX_SEGS;
 
        /*
@@ -3402,6 +3401,8 @@ static const struct pci_device_id nvme_id_table[] = {
                .driver_data = NVME_QUIRK_NO_DEEPEST_PS, },
        { PCI_DEVICE(0x2646, 0x2263),   /* KINGSTON A2000 NVMe SSD  */
                .driver_data = NVME_QUIRK_NO_DEEPEST_PS, },
+       { PCI_DEVICE(0x2646, 0x5013),   /* Kingston KC3000, Kingston FURY Renegade */
+               .driver_data = NVME_QUIRK_NO_SECONDARY_TEMP_THRESH, },
        { PCI_DEVICE(0x2646, 0x5018),   /* KINGSTON OM8SFP4xxxxP OS21012 NVMe SSD */
                .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
        { PCI_DEVICE(0x2646, 0x5016),   /* KINGSTON OM3PGP4xxxxP OS21011 NVMe SSD */
@@ -3422,6 +3423,8 @@ static const struct pci_device_id nvme_id_table[] = {
                .driver_data = NVME_QUIRK_BOGUS_NID, },
        { PCI_DEVICE(0x1e4B, 0x1202),   /* MAXIO MAP1202 */
                .driver_data = NVME_QUIRK_BOGUS_NID, },
+       { PCI_DEVICE(0x1e4B, 0x1602),   /* MAXIO MAP1602 */
+               .driver_data = NVME_QUIRK_BOGUS_NID, },
        { PCI_DEVICE(0x1cc1, 0x5350),   /* ADATA XPG GAMMIX S50 */
                .driver_data = NVME_QUIRK_BOGUS_NID, },
        { PCI_DEVICE(0x1dbe, 0x5236),   /* ADATA XPG GAMMIX S70 */
@@ -3441,6 +3444,10 @@ static const struct pci_device_id nvme_id_table[] = {
                                NVME_QUIRK_IGNORE_DEV_SUBNQN, },
        { PCI_DEVICE(0x10ec, 0x5763), /* TEAMGROUP T-FORCE CARDEA ZERO Z330 SSD */
                .driver_data = NVME_QUIRK_BOGUS_NID, },
+       { PCI_DEVICE(0x1e4b, 0x1602), /* HS-SSD-FUTURE 2048G  */
+               .driver_data = NVME_QUIRK_BOGUS_NID, },
+       { PCI_DEVICE(0x10ec, 0x5765), /* TEAMGROUP MP33 2TB SSD */
+               .driver_data = NVME_QUIRK_BOGUS_NID, },
        { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, 0x0061),
                .driver_data = NVME_QUIRK_DMA_ADDRESS_BITS_48, },
        { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, 0x0065),
index 0eb7969..d433b2e 100644 (file)
@@ -501,7 +501,7 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
        }
        ibdev = queue->device->dev;
 
-       /* +1 for ib_stop_cq */
+       /* +1 for ib_drain_qp */
        queue->cq_size = cq_factor * queue->queue_size + 1;
 
        ret = nvme_rdma_create_cq(ibdev, queue);
@@ -713,18 +713,10 @@ out_stop_queues:
 static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
 {
        struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
-       struct ib_device *ibdev = ctrl->device->dev;
-       unsigned int nr_io_queues, nr_default_queues;
-       unsigned int nr_read_queues, nr_poll_queues;
+       unsigned int nr_io_queues;
        int i, ret;
 
-       nr_read_queues = min_t(unsigned int, ibdev->num_comp_vectors,
-                               min(opts->nr_io_queues, num_online_cpus()));
-       nr_default_queues =  min_t(unsigned int, ibdev->num_comp_vectors,
-                               min(opts->nr_write_queues, num_online_cpus()));
-       nr_poll_queues = min(opts->nr_poll_queues, num_online_cpus());
-       nr_io_queues = nr_read_queues + nr_default_queues + nr_poll_queues;
-
+       nr_io_queues = nvmf_nr_io_queues(opts);
        ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
        if (ret)
                return ret;
@@ -739,34 +731,7 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
        dev_info(ctrl->ctrl.device,
                "creating %d I/O queues.\n", nr_io_queues);
 
-       if (opts->nr_write_queues && nr_read_queues < nr_io_queues) {
-               /*
-                * separate read/write queues
-                * hand out dedicated default queues only after we have
-                * sufficient read queues.
-                */
-               ctrl->io_queues[HCTX_TYPE_READ] = nr_read_queues;
-               nr_io_queues -= ctrl->io_queues[HCTX_TYPE_READ];
-               ctrl->io_queues[HCTX_TYPE_DEFAULT] =
-                       min(nr_default_queues, nr_io_queues);
-               nr_io_queues -= ctrl->io_queues[HCTX_TYPE_DEFAULT];
-       } else {
-               /*
-                * shared read/write queues
-                * either no write queues were requested, or we don't have
-                * sufficient queue count to have dedicated default queues.
-                */
-               ctrl->io_queues[HCTX_TYPE_DEFAULT] =
-                       min(nr_read_queues, nr_io_queues);
-               nr_io_queues -= ctrl->io_queues[HCTX_TYPE_DEFAULT];
-       }
-
-       if (opts->nr_poll_queues && nr_io_queues) {
-               /* map dedicated poll queues only if we have queues left */
-               ctrl->io_queues[HCTX_TYPE_POLL] =
-                       min(nr_poll_queues, nr_io_queues);
-       }
-
+       nvmf_set_io_queues(opts, nr_io_queues, ctrl->io_queues);
        for (i = 1; i < ctrl->ctrl.queue_count; i++) {
                ret = nvme_rdma_alloc_queue(ctrl, i,
                                ctrl->ctrl.sqsize + 1);
@@ -2138,44 +2103,8 @@ static void nvme_rdma_complete_rq(struct request *rq)
 static void nvme_rdma_map_queues(struct blk_mq_tag_set *set)
 {
        struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(set->driver_data);
-       struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
 
-       if (opts->nr_write_queues && ctrl->io_queues[HCTX_TYPE_READ]) {
-               /* separate read/write queues */
-               set->map[HCTX_TYPE_DEFAULT].nr_queues =
-                       ctrl->io_queues[HCTX_TYPE_DEFAULT];
-               set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
-               set->map[HCTX_TYPE_READ].nr_queues =
-                       ctrl->io_queues[HCTX_TYPE_READ];
-               set->map[HCTX_TYPE_READ].queue_offset =
-                       ctrl->io_queues[HCTX_TYPE_DEFAULT];
-       } else {
-               /* shared read/write queues */
-               set->map[HCTX_TYPE_DEFAULT].nr_queues =
-                       ctrl->io_queues[HCTX_TYPE_DEFAULT];
-               set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
-               set->map[HCTX_TYPE_READ].nr_queues =
-                       ctrl->io_queues[HCTX_TYPE_DEFAULT];
-               set->map[HCTX_TYPE_READ].queue_offset = 0;
-       }
-       blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
-       blk_mq_map_queues(&set->map[HCTX_TYPE_READ]);
-
-       if (opts->nr_poll_queues && ctrl->io_queues[HCTX_TYPE_POLL]) {
-               /* map dedicated poll queues only if we have queues left */
-               set->map[HCTX_TYPE_POLL].nr_queues =
-                               ctrl->io_queues[HCTX_TYPE_POLL];
-               set->map[HCTX_TYPE_POLL].queue_offset =
-                       ctrl->io_queues[HCTX_TYPE_DEFAULT] +
-                       ctrl->io_queues[HCTX_TYPE_READ];
-               blk_mq_map_queues(&set->map[HCTX_TYPE_POLL]);
-       }
-
-       dev_info(ctrl->ctrl.device,
-               "mapped %d/%d/%d default/read/poll queues.\n",
-               ctrl->io_queues[HCTX_TYPE_DEFAULT],
-               ctrl->io_queues[HCTX_TYPE_READ],
-               ctrl->io_queues[HCTX_TYPE_POLL]);
+       nvmf_map_queues(set, &ctrl->ctrl, ctrl->io_queues);
 }
 
 static const struct blk_mq_ops nvme_rdma_mq_ops = {
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
new file mode 100644 (file)
index 0000000..45e9181
--- /dev/null
@@ -0,0 +1,668 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Sysfs interface for the NVMe core driver.
+ *
+ * Copyright (c) 2011-2014, Intel Corporation.
+ */
+
+#include <linux/nvme-auth.h>
+
+#include "nvme.h"
+#include "fabrics.h"
+
+static ssize_t nvme_sysfs_reset(struct device *dev,
+                               struct device_attribute *attr, const char *buf,
+                               size_t count)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+       int ret;
+
+       ret = nvme_reset_ctrl_sync(ctrl);
+       if (ret < 0)
+               return ret;
+       return count;
+}
+static DEVICE_ATTR(reset_controller, S_IWUSR, NULL, nvme_sysfs_reset);
+
+static ssize_t nvme_sysfs_rescan(struct device *dev,
+                               struct device_attribute *attr, const char *buf,
+                               size_t count)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+       nvme_queue_scan(ctrl);
+       return count;
+}
+static DEVICE_ATTR(rescan_controller, S_IWUSR, NULL, nvme_sysfs_rescan);
+
+static inline struct nvme_ns_head *dev_to_ns_head(struct device *dev)
+{
+       struct gendisk *disk = dev_to_disk(dev);
+
+       if (disk->fops == &nvme_bdev_ops)
+               return nvme_get_ns_from_dev(dev)->head;
+       else
+               return disk->private_data;
+}
+
+static ssize_t wwid_show(struct device *dev, struct device_attribute *attr,
+               char *buf)
+{
+       struct nvme_ns_head *head = dev_to_ns_head(dev);
+       struct nvme_ns_ids *ids = &head->ids;
+       struct nvme_subsystem *subsys = head->subsys;
+       int serial_len = sizeof(subsys->serial);
+       int model_len = sizeof(subsys->model);
+
+       if (!uuid_is_null(&ids->uuid))
+               return sysfs_emit(buf, "uuid.%pU\n", &ids->uuid);
+
+       if (memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
+               return sysfs_emit(buf, "eui.%16phN\n", ids->nguid);
+
+       if (memchr_inv(ids->eui64, 0, sizeof(ids->eui64)))
+               return sysfs_emit(buf, "eui.%8phN\n", ids->eui64);
+
+       while (serial_len > 0 && (subsys->serial[serial_len - 1] == ' ' ||
+                                 subsys->serial[serial_len - 1] == '\0'))
+               serial_len--;
+       while (model_len > 0 && (subsys->model[model_len - 1] == ' ' ||
+                                subsys->model[model_len - 1] == '\0'))
+               model_len--;
+
+       return sysfs_emit(buf, "nvme.%04x-%*phN-%*phN-%08x\n", subsys->vendor_id,
+               serial_len, subsys->serial, model_len, subsys->model,
+               head->ns_id);
+}
+static DEVICE_ATTR_RO(wwid);
+
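
Worked example of the fallback WWID format above (made-up identifiers; the
hex_dump() helper is a user-space stand-in for the kernel's %*phN specifier,
which prints the buffer as hex bytes without separators):

    #include <stdio.h>
    #include <string.h>

    static void hex_dump(const char *s, size_t len)
    {
            for (size_t i = 0; i < len; i++)
                    printf("%02x", (unsigned char)s[i]);
    }

    int main(void)
    {
            unsigned int vendor_id = 0x8086, nsid = 1;
            const char serial[] = "S1", model[] = "M";

            printf("nvme.%04x-", vendor_id);
            hex_dump(serial, strlen(serial));
            printf("-");
            hex_dump(model, strlen(model));
            printf("-%08x\n", nsid);    /* nvme.8086-5331-4d-00000001 */
            return 0;
    }
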
+static ssize_t nguid_show(struct device *dev, struct device_attribute *attr,
+               char *buf)
+{
+       return sysfs_emit(buf, "%pU\n", dev_to_ns_head(dev)->ids.nguid);
+}
+static DEVICE_ATTR_RO(nguid);
+
+static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
+               char *buf)
+{
+       struct nvme_ns_ids *ids = &dev_to_ns_head(dev)->ids;
+
+       /* For backward compatibility expose the NGUID to userspace if
+        * we have no UUID set
+        */
+       if (uuid_is_null(&ids->uuid)) {
+               dev_warn_ratelimited(dev,
+                       "No UUID available providing old NGUID\n");
+               return sysfs_emit(buf, "%pU\n", ids->nguid);
+       }
+       return sysfs_emit(buf, "%pU\n", &ids->uuid);
+}
+static DEVICE_ATTR_RO(uuid);
+
+static ssize_t eui_show(struct device *dev, struct device_attribute *attr,
+               char *buf)
+{
+       return sysfs_emit(buf, "%8ph\n", dev_to_ns_head(dev)->ids.eui64);
+}
+static DEVICE_ATTR_RO(eui);
+
+static ssize_t nsid_show(struct device *dev, struct device_attribute *attr,
+               char *buf)
+{
+       return sysfs_emit(buf, "%d\n", dev_to_ns_head(dev)->ns_id);
+}
+static DEVICE_ATTR_RO(nsid);
+
+static struct attribute *nvme_ns_id_attrs[] = {
+       &dev_attr_wwid.attr,
+       &dev_attr_uuid.attr,
+       &dev_attr_nguid.attr,
+       &dev_attr_eui.attr,
+       &dev_attr_nsid.attr,
+#ifdef CONFIG_NVME_MULTIPATH
+       &dev_attr_ana_grpid.attr,
+       &dev_attr_ana_state.attr,
+#endif
+       NULL,
+};
+
+static umode_t nvme_ns_id_attrs_are_visible(struct kobject *kobj,
+               struct attribute *a, int n)
+{
+       struct device *dev = container_of(kobj, struct device, kobj);
+       struct nvme_ns_ids *ids = &dev_to_ns_head(dev)->ids;
+
+       if (a == &dev_attr_uuid.attr) {
+               if (uuid_is_null(&ids->uuid) &&
+                   !memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
+                       return 0;
+       }
+       if (a == &dev_attr_nguid.attr) {
+               if (!memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
+                       return 0;
+       }
+       if (a == &dev_attr_eui.attr) {
+               if (!memchr_inv(ids->eui64, 0, sizeof(ids->eui64)))
+                       return 0;
+       }
+#ifdef CONFIG_NVME_MULTIPATH
+       if (a == &dev_attr_ana_grpid.attr || a == &dev_attr_ana_state.attr) {
+               if (dev_to_disk(dev)->fops != &nvme_bdev_ops) /* per-path attr */
+                       return 0;
+               if (!nvme_ctrl_use_ana(nvme_get_ns_from_dev(dev)->ctrl))
+                       return 0;
+       }
+#endif
+       return a->mode;
+}
+
+static const struct attribute_group nvme_ns_id_attr_group = {
+       .attrs          = nvme_ns_id_attrs,
+       .is_visible     = nvme_ns_id_attrs_are_visible,
+};
+
+const struct attribute_group *nvme_ns_id_attr_groups[] = {
+       &nvme_ns_id_attr_group,
+       NULL,
+};
+
+#define nvme_show_str_function(field)                                          \
+static ssize_t  field##_show(struct device *dev,                               \
+                           struct device_attribute *attr, char *buf)           \
+{                                                                              \
+        struct nvme_ctrl *ctrl = dev_get_drvdata(dev);                         \
+        return sysfs_emit(buf, "%.*s\n",                                       \
+               (int)sizeof(ctrl->subsys->field), ctrl->subsys->field);         \
+}                                                                              \
+static DEVICE_ATTR(field, S_IRUGO, field##_show, NULL);
+
+nvme_show_str_function(model);
+nvme_show_str_function(serial);
+nvme_show_str_function(firmware_rev);
+
+#define nvme_show_int_function(field)                                          \
+static ssize_t  field##_show(struct device *dev,                               \
+                           struct device_attribute *attr, char *buf)           \
+{                                                                              \
+        struct nvme_ctrl *ctrl = dev_get_drvdata(dev);                         \
+        return sysfs_emit(buf, "%d\n", ctrl->field);                           \
+}                                                                              \
+static DEVICE_ATTR(field, S_IRUGO, field##_show, NULL);
+
+nvme_show_int_function(cntlid);
+nvme_show_int_function(numa_node);
+nvme_show_int_function(queue_count);
+nvme_show_int_function(sqsize);
+nvme_show_int_function(kato);
+
+static ssize_t nvme_sysfs_delete(struct device *dev,
+                               struct device_attribute *attr, const char *buf,
+                               size_t count)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+       if (!test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags))
+               return -EBUSY;
+
+       if (device_remove_file_self(dev, attr))
+               nvme_delete_ctrl_sync(ctrl);
+       return count;
+}
+static DEVICE_ATTR(delete_controller, S_IWUSR, NULL, nvme_sysfs_delete);
+
+static ssize_t nvme_sysfs_show_transport(struct device *dev,
+                                        struct device_attribute *attr,
+                                        char *buf)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+       return sysfs_emit(buf, "%s\n", ctrl->ops->name);
+}
+static DEVICE_ATTR(transport, S_IRUGO, nvme_sysfs_show_transport, NULL);
+
+static ssize_t nvme_sysfs_show_state(struct device *dev,
+                                    struct device_attribute *attr,
+                                    char *buf)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+       static const char *const state_name[] = {
+               [NVME_CTRL_NEW]         = "new",
+               [NVME_CTRL_LIVE]        = "live",
+               [NVME_CTRL_RESETTING]   = "resetting",
+               [NVME_CTRL_CONNECTING]  = "connecting",
+               [NVME_CTRL_DELETING]    = "deleting",
+               [NVME_CTRL_DELETING_NOIO]= "deleting (no IO)",
+               [NVME_CTRL_DEAD]        = "dead",
+       };
+
+       if ((unsigned)ctrl->state < ARRAY_SIZE(state_name) &&
+           state_name[ctrl->state])
+               return sysfs_emit(buf, "%s\n", state_name[ctrl->state]);
+
+       return sysfs_emit(buf, "unknown state\n");
+}
+
+static DEVICE_ATTR(state, S_IRUGO, nvme_sysfs_show_state, NULL);
+
+static ssize_t nvme_sysfs_show_subsysnqn(struct device *dev,
+                                        struct device_attribute *attr,
+                                        char *buf)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+       return sysfs_emit(buf, "%s\n", ctrl->subsys->subnqn);
+}
+static DEVICE_ATTR(subsysnqn, S_IRUGO, nvme_sysfs_show_subsysnqn, NULL);
+
+static ssize_t nvme_sysfs_show_hostnqn(struct device *dev,
+                                       struct device_attribute *attr,
+                                       char *buf)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+       return sysfs_emit(buf, "%s\n", ctrl->opts->host->nqn);
+}
+static DEVICE_ATTR(hostnqn, S_IRUGO, nvme_sysfs_show_hostnqn, NULL);
+
+static ssize_t nvme_sysfs_show_hostid(struct device *dev,
+                                       struct device_attribute *attr,
+                                       char *buf)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+       return sysfs_emit(buf, "%pU\n", &ctrl->opts->host->id);
+}
+static DEVICE_ATTR(hostid, S_IRUGO, nvme_sysfs_show_hostid, NULL);
+
+static ssize_t nvme_sysfs_show_address(struct device *dev,
+                                        struct device_attribute *attr,
+                                        char *buf)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+       return ctrl->ops->get_address(ctrl, buf, PAGE_SIZE);
+}
+static DEVICE_ATTR(address, S_IRUGO, nvme_sysfs_show_address, NULL);
+
+static ssize_t nvme_ctrl_loss_tmo_show(struct device *dev,
+               struct device_attribute *attr, char *buf)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+       struct nvmf_ctrl_options *opts = ctrl->opts;
+
+       if (ctrl->opts->max_reconnects == -1)
+               return sysfs_emit(buf, "off\n");
+       return sysfs_emit(buf, "%d\n",
+                         opts->max_reconnects * opts->reconnect_delay);
+}
+
+static ssize_t nvme_ctrl_loss_tmo_store(struct device *dev,
+               struct device_attribute *attr, const char *buf, size_t count)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+       struct nvmf_ctrl_options *opts = ctrl->opts;
+       int ctrl_loss_tmo, err;
+
+       err = kstrtoint(buf, 10, &ctrl_loss_tmo);
+       if (err)
+               return -EINVAL;
+
+       if (ctrl_loss_tmo < 0)
+               opts->max_reconnects = -1;
+       else
+               opts->max_reconnects = DIV_ROUND_UP(ctrl_loss_tmo,
+                                               opts->reconnect_delay);
+       return count;
+}
+static DEVICE_ATTR(ctrl_loss_tmo, S_IRUGO | S_IWUSR,
+       nvme_ctrl_loss_tmo_show, nvme_ctrl_loss_tmo_store);
+
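
The round trip between ctrl_loss_tmo and max_reconnects above is just a
division/multiplication by reconnect_delay; note that the value read back is
rounded up to a multiple of the delay. Quick stand-alone check (made-up values):

    #include <stdio.h>

    #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

    int main(void)
    {
            int reconnect_delay = 10;   /* seconds between reconnect attempts */
            int ctrl_loss_tmo = 25;     /* seconds written by the user        */
            int max_reconnects = DIV_ROUND_UP(ctrl_loss_tmo, reconnect_delay);

            printf("max_reconnects=%d\n", max_reconnects);                   /* 3  */
            printf("ctrl_loss_tmo=%d\n", max_reconnects * reconnect_delay);  /* 30 */
            return 0;
    }
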
+static ssize_t nvme_ctrl_reconnect_delay_show(struct device *dev,
+               struct device_attribute *attr, char *buf)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+       if (ctrl->opts->reconnect_delay == -1)
+               return sysfs_emit(buf, "off\n");
+       return sysfs_emit(buf, "%d\n", ctrl->opts->reconnect_delay);
+}
+
+static ssize_t nvme_ctrl_reconnect_delay_store(struct device *dev,
+               struct device_attribute *attr, const char *buf, size_t count)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+       unsigned int v;
+       int err;
+
+       err = kstrtou32(buf, 10, &v);
+       if (err)
+               return err;
+
+       ctrl->opts->reconnect_delay = v;
+       return count;
+}
+static DEVICE_ATTR(reconnect_delay, S_IRUGO | S_IWUSR,
+       nvme_ctrl_reconnect_delay_show, nvme_ctrl_reconnect_delay_store);
+
+static ssize_t nvme_ctrl_fast_io_fail_tmo_show(struct device *dev,
+               struct device_attribute *attr, char *buf)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+       if (ctrl->opts->fast_io_fail_tmo == -1)
+               return sysfs_emit(buf, "off\n");
+       return sysfs_emit(buf, "%d\n", ctrl->opts->fast_io_fail_tmo);
+}
+
+static ssize_t nvme_ctrl_fast_io_fail_tmo_store(struct device *dev,
+               struct device_attribute *attr, const char *buf, size_t count)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+       struct nvmf_ctrl_options *opts = ctrl->opts;
+       int fast_io_fail_tmo, err;
+
+       err = kstrtoint(buf, 10, &fast_io_fail_tmo);
+       if (err)
+               return -EINVAL;
+
+       if (fast_io_fail_tmo < 0)
+               opts->fast_io_fail_tmo = -1;
+       else
+               opts->fast_io_fail_tmo = fast_io_fail_tmo;
+       return count;
+}
+static DEVICE_ATTR(fast_io_fail_tmo, S_IRUGO | S_IWUSR,
+       nvme_ctrl_fast_io_fail_tmo_show, nvme_ctrl_fast_io_fail_tmo_store);
+
+static ssize_t cntrltype_show(struct device *dev,
+                             struct device_attribute *attr, char *buf)
+{
+       static const char * const type[] = {
+               [NVME_CTRL_IO] = "io\n",
+               [NVME_CTRL_DISC] = "discovery\n",
+               [NVME_CTRL_ADMIN] = "admin\n",
+       };
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+       if (ctrl->cntrltype > NVME_CTRL_ADMIN || !type[ctrl->cntrltype])
+               return sysfs_emit(buf, "reserved\n");
+
+       return sysfs_emit(buf, type[ctrl->cntrltype]);
+}
+static DEVICE_ATTR_RO(cntrltype);
+
+static ssize_t dctype_show(struct device *dev,
+                          struct device_attribute *attr, char *buf)
+{
+       static const char * const type[] = {
+               [NVME_DCTYPE_NOT_REPORTED] = "none\n",
+               [NVME_DCTYPE_DDC] = "ddc\n",
+               [NVME_DCTYPE_CDC] = "cdc\n",
+       };
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+       if (ctrl->dctype > NVME_DCTYPE_CDC || !type[ctrl->dctype])
+               return sysfs_emit(buf, "reserved\n");
+
+       return sysfs_emit(buf, type[ctrl->dctype]);
+}
+static DEVICE_ATTR_RO(dctype);
+
+#ifdef CONFIG_NVME_AUTH
+static ssize_t nvme_ctrl_dhchap_secret_show(struct device *dev,
+               struct device_attribute *attr, char *buf)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+       struct nvmf_ctrl_options *opts = ctrl->opts;
+
+       if (!opts->dhchap_secret)
+               return sysfs_emit(buf, "none\n");
+       return sysfs_emit(buf, "%s\n", opts->dhchap_secret);
+}
+
+static ssize_t nvme_ctrl_dhchap_secret_store(struct device *dev,
+               struct device_attribute *attr, const char *buf, size_t count)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+       struct nvmf_ctrl_options *opts = ctrl->opts;
+       char *dhchap_secret;
+
+       if (!ctrl->opts->dhchap_secret)
+               return -EINVAL;
+       if (count < 7)
+               return -EINVAL;
+       if (memcmp(buf, "DHHC-1:", 7))
+               return -EINVAL;
+
+       dhchap_secret = kzalloc(count + 1, GFP_KERNEL);
+       if (!dhchap_secret)
+               return -ENOMEM;
+       memcpy(dhchap_secret, buf, count);
+       nvme_auth_stop(ctrl);
+       if (strcmp(dhchap_secret, opts->dhchap_secret)) {
+               struct nvme_dhchap_key *key, *host_key;
+               int ret;
+
+               ret = nvme_auth_generate_key(dhchap_secret, &key);
+               if (ret) {
+                       kfree(dhchap_secret);
+                       return ret;
+               }
+               kfree(opts->dhchap_secret);
+               opts->dhchap_secret = dhchap_secret;
+               host_key = ctrl->host_key;
+               mutex_lock(&ctrl->dhchap_auth_mutex);
+               ctrl->host_key = key;
+               mutex_unlock(&ctrl->dhchap_auth_mutex);
+               nvme_auth_free_key(host_key);
+       } else
+               kfree(dhchap_secret);
+       /* Start re-authentication */
+       dev_info(ctrl->device, "re-authenticating controller\n");
+       queue_work(nvme_wq, &ctrl->dhchap_auth_work);
+
+       return count;
+}
+
+static DEVICE_ATTR(dhchap_secret, S_IRUGO | S_IWUSR,
+       nvme_ctrl_dhchap_secret_show, nvme_ctrl_dhchap_secret_store);
+
+static ssize_t nvme_ctrl_dhchap_ctrl_secret_show(struct device *dev,
+               struct device_attribute *attr, char *buf)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+       struct nvmf_ctrl_options *opts = ctrl->opts;
+
+       if (!opts->dhchap_ctrl_secret)
+               return sysfs_emit(buf, "none\n");
+       return sysfs_emit(buf, "%s\n", opts->dhchap_ctrl_secret);
+}
+
+static ssize_t nvme_ctrl_dhchap_ctrl_secret_store(struct device *dev,
+               struct device_attribute *attr, const char *buf, size_t count)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+       struct nvmf_ctrl_options *opts = ctrl->opts;
+       char *dhchap_secret;
+
+       if (!ctrl->opts->dhchap_ctrl_secret)
+               return -EINVAL;
+       if (count < 7)
+               return -EINVAL;
+       if (memcmp(buf, "DHHC-1:", 7))
+               return -EINVAL;
+
+       dhchap_secret = kzalloc(count + 1, GFP_KERNEL);
+       if (!dhchap_secret)
+               return -ENOMEM;
+       memcpy(dhchap_secret, buf, count);
+       nvme_auth_stop(ctrl);
+       if (strcmp(dhchap_secret, opts->dhchap_ctrl_secret)) {
+               struct nvme_dhchap_key *key, *ctrl_key;
+               int ret;
+
+               ret = nvme_auth_generate_key(dhchap_secret, &key);
+               if (ret) {
+                       kfree(dhchap_secret);
+                       return ret;
+               }
+               kfree(opts->dhchap_ctrl_secret);
+               opts->dhchap_ctrl_secret = dhchap_secret;
+               ctrl_key = ctrl->ctrl_key;
+               mutex_lock(&ctrl->dhchap_auth_mutex);
+               ctrl->ctrl_key = key;
+               mutex_unlock(&ctrl->dhchap_auth_mutex);
+               nvme_auth_free_key(ctrl_key);
+       } else
+               kfree(dhchap_secret);
+       /* Start re-authentication */
+       dev_info(ctrl->device, "re-authenticating controller\n");
+       queue_work(nvme_wq, &ctrl->dhchap_auth_work);
+
+       return count;
+}
+
+static DEVICE_ATTR(dhchap_ctrl_secret, S_IRUGO | S_IWUSR,
+       nvme_ctrl_dhchap_ctrl_secret_show, nvme_ctrl_dhchap_ctrl_secret_store);
+#endif
+
+static struct attribute *nvme_dev_attrs[] = {
+       &dev_attr_reset_controller.attr,
+       &dev_attr_rescan_controller.attr,
+       &dev_attr_model.attr,
+       &dev_attr_serial.attr,
+       &dev_attr_firmware_rev.attr,
+       &dev_attr_cntlid.attr,
+       &dev_attr_delete_controller.attr,
+       &dev_attr_transport.attr,
+       &dev_attr_subsysnqn.attr,
+       &dev_attr_address.attr,
+       &dev_attr_state.attr,
+       &dev_attr_numa_node.attr,
+       &dev_attr_queue_count.attr,
+       &dev_attr_sqsize.attr,
+       &dev_attr_hostnqn.attr,
+       &dev_attr_hostid.attr,
+       &dev_attr_ctrl_loss_tmo.attr,
+       &dev_attr_reconnect_delay.attr,
+       &dev_attr_fast_io_fail_tmo.attr,
+       &dev_attr_kato.attr,
+       &dev_attr_cntrltype.attr,
+       &dev_attr_dctype.attr,
+#ifdef CONFIG_NVME_AUTH
+       &dev_attr_dhchap_secret.attr,
+       &dev_attr_dhchap_ctrl_secret.attr,
+#endif
+       NULL
+};
+
+static umode_t nvme_dev_attrs_are_visible(struct kobject *kobj,
+               struct attribute *a, int n)
+{
+       struct device *dev = container_of(kobj, struct device, kobj);
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+       if (a == &dev_attr_delete_controller.attr && !ctrl->ops->delete_ctrl)
+               return 0;
+       if (a == &dev_attr_address.attr && !ctrl->ops->get_address)
+               return 0;
+       if (a == &dev_attr_hostnqn.attr && !ctrl->opts)
+               return 0;
+       if (a == &dev_attr_hostid.attr && !ctrl->opts)
+               return 0;
+       if (a == &dev_attr_ctrl_loss_tmo.attr && !ctrl->opts)
+               return 0;
+       if (a == &dev_attr_reconnect_delay.attr && !ctrl->opts)
+               return 0;
+       if (a == &dev_attr_fast_io_fail_tmo.attr && !ctrl->opts)
+               return 0;
+#ifdef CONFIG_NVME_AUTH
+       if (a == &dev_attr_dhchap_secret.attr && !ctrl->opts)
+               return 0;
+       if (a == &dev_attr_dhchap_ctrl_secret.attr && !ctrl->opts)
+               return 0;
+#endif
+
+       return a->mode;
+}
+
+const struct attribute_group nvme_dev_attrs_group = {
+       .attrs          = nvme_dev_attrs,
+       .is_visible     = nvme_dev_attrs_are_visible,
+};
+EXPORT_SYMBOL_GPL(nvme_dev_attrs_group);
+
+const struct attribute_group *nvme_dev_attr_groups[] = {
+       &nvme_dev_attrs_group,
+       NULL,
+};
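
The attrs/is_visible pairing above is the usual sysfs pattern: the array lists every attribute the driver could expose, and the callback returns 0 to hide the ones this particular controller cannot support. A rough userspace analogue of that filtering, reusing a couple of the attribute names from the table above as plain strings:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct attr {
        const char *name;
        unsigned int mode;
};

/* Feature flags standing in for the ctrl->ops / ctrl->opts checks. */
struct ctrl {
        bool has_delete;
        bool has_opts;
};

/* Analogue of nvme_dev_attrs_are_visible(): 0 hides the entry, otherwise keep its mode. */
static unsigned int attr_visible(const struct ctrl *c, const struct attr *a)
{
        if (!strcmp(a->name, "delete_controller") && !c->has_delete)
                return 0;
        if (!strcmp(a->name, "hostnqn") && !c->has_opts)
                return 0;
        return a->mode;
}

int main(void)
{
        const struct attr attrs[] = {
                { "reset_controller", 0200 },
                { "delete_controller", 0200 },
                { "hostnqn", 0444 },
        };
        struct ctrl c = { .has_delete = false, .has_opts = true };

        for (size_t i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++)
                if (attr_visible(&c, &attrs[i]))
                        printf("%s mode %o\n", attrs[i].name, attrs[i].mode);
        return 0;
}
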
+
+#define SUBSYS_ATTR_RO(_name, _mode, _show)                    \
+       struct device_attribute subsys_attr_##_name = \
+               __ATTR(_name, _mode, _show, NULL)
+
+static ssize_t nvme_subsys_show_nqn(struct device *dev,
+                                   struct device_attribute *attr,
+                                   char *buf)
+{
+       struct nvme_subsystem *subsys =
+               container_of(dev, struct nvme_subsystem, dev);
+
+       return sysfs_emit(buf, "%s\n", subsys->subnqn);
+}
+static SUBSYS_ATTR_RO(subsysnqn, S_IRUGO, nvme_subsys_show_nqn);
+
+static ssize_t nvme_subsys_show_type(struct device *dev,
+                                   struct device_attribute *attr,
+                                   char *buf)
+{
+       struct nvme_subsystem *subsys =
+               container_of(dev, struct nvme_subsystem, dev);
+
+       switch (subsys->subtype) {
+       case NVME_NQN_DISC:
+               return sysfs_emit(buf, "discovery\n");
+       case NVME_NQN_NVME:
+               return sysfs_emit(buf, "nvm\n");
+       default:
+               return sysfs_emit(buf, "reserved\n");
+       }
+}
+static SUBSYS_ATTR_RO(subsystype, S_IRUGO, nvme_subsys_show_type);
+
+#define nvme_subsys_show_str_function(field)                           \
+static ssize_t subsys_##field##_show(struct device *dev,               \
+                           struct device_attribute *attr, char *buf)   \
+{                                                                      \
+       struct nvme_subsystem *subsys =                                 \
+               container_of(dev, struct nvme_subsystem, dev);          \
+       return sysfs_emit(buf, "%.*s\n",                                \
+                          (int)sizeof(subsys->field), subsys->field);  \
+}                                                                      \
+static SUBSYS_ATTR_RO(field, S_IRUGO, subsys_##field##_show);
+
+nvme_subsys_show_str_function(model);
+nvme_subsys_show_str_function(serial);
+nvme_subsys_show_str_function(firmware_rev);
+
+static struct attribute *nvme_subsys_attrs[] = {
+       &subsys_attr_model.attr,
+       &subsys_attr_serial.attr,
+       &subsys_attr_firmware_rev.attr,
+       &subsys_attr_subsysnqn.attr,
+       &subsys_attr_subsystype.attr,
+#ifdef CONFIG_NVME_MULTIPATH
+       &subsys_attr_iopolicy.attr,
+#endif
+       NULL,
+};
+
+static const struct attribute_group nvme_subsys_attrs_group = {
+       .attrs = nvme_subsys_attrs,
+};
+
+const struct attribute_group *nvme_subsys_attrs_groups[] = {
+       &nvme_subsys_attrs_group,
+       NULL,
+};
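
The subsystem attributes above lean on token pasting: nvme_subsys_show_str_function(model) expands into a subsys_model_show() that prints the fixed-width, not necessarily NUL-terminated model[] field with a "%.*s" precision. A small standalone sketch of the same macro trick (struct layout and values are made up for illustration):

#include <stdio.h>

struct subsys {
        char model[8];           /* fixed-width field, not NUL-terminated in general */
        char serial[8];
};

/* Generate one printer per field, as the kernel macro does with sysfs_emit(). */
#define show_str_function(field)                                        \
static void show_##field(const struct subsys *s)                        \
{                                                                       \
        printf("%.*s\n", (int)sizeof(s->field), s->field);             \
}

show_str_function(model)
show_str_function(serial)

int main(void)
{
        struct subsys s = { .model = "demo", .serial = "0001" };

        show_model(&s);
        show_serial(&s);
        return 0;
}
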
index bf02304..260b355 100644 (file)
@@ -1802,58 +1802,12 @@ out_free_queues:
        return ret;
 }
 
-static unsigned int nvme_tcp_nr_io_queues(struct nvme_ctrl *ctrl)
-{
-       unsigned int nr_io_queues;
-
-       nr_io_queues = min(ctrl->opts->nr_io_queues, num_online_cpus());
-       nr_io_queues += min(ctrl->opts->nr_write_queues, num_online_cpus());
-       nr_io_queues += min(ctrl->opts->nr_poll_queues, num_online_cpus());
-
-       return nr_io_queues;
-}
-
-static void nvme_tcp_set_io_queues(struct nvme_ctrl *nctrl,
-               unsigned int nr_io_queues)
-{
-       struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
-       struct nvmf_ctrl_options *opts = nctrl->opts;
-
-       if (opts->nr_write_queues && opts->nr_io_queues < nr_io_queues) {
-               /*
-                * separate read/write queues
-                * hand out dedicated default queues only after we have
-                * sufficient read queues.
-                */
-               ctrl->io_queues[HCTX_TYPE_READ] = opts->nr_io_queues;
-               nr_io_queues -= ctrl->io_queues[HCTX_TYPE_READ];
-               ctrl->io_queues[HCTX_TYPE_DEFAULT] =
-                       min(opts->nr_write_queues, nr_io_queues);
-               nr_io_queues -= ctrl->io_queues[HCTX_TYPE_DEFAULT];
-       } else {
-               /*
-                * shared read/write queues
-                * either no write queues were requested, or we don't have
-                * sufficient queue count to have dedicated default queues.
-                */
-               ctrl->io_queues[HCTX_TYPE_DEFAULT] =
-                       min(opts->nr_io_queues, nr_io_queues);
-               nr_io_queues -= ctrl->io_queues[HCTX_TYPE_DEFAULT];
-       }
-
-       if (opts->nr_poll_queues && nr_io_queues) {
-               /* map dedicated poll queues only if we have queues left */
-               ctrl->io_queues[HCTX_TYPE_POLL] =
-                       min(opts->nr_poll_queues, nr_io_queues);
-       }
-}
-
 static int nvme_tcp_alloc_io_queues(struct nvme_ctrl *ctrl)
 {
        unsigned int nr_io_queues;
        int ret;
 
-       nr_io_queues = nvme_tcp_nr_io_queues(ctrl);
+       nr_io_queues = nvmf_nr_io_queues(ctrl->opts);
        ret = nvme_set_queue_count(ctrl, &nr_io_queues);
        if (ret)
                return ret;
@@ -1868,8 +1822,8 @@ static int nvme_tcp_alloc_io_queues(struct nvme_ctrl *ctrl)
        dev_info(ctrl->device,
                "creating %d I/O queues.\n", nr_io_queues);
 
-       nvme_tcp_set_io_queues(ctrl, nr_io_queues);
-
+       nvmf_set_io_queues(ctrl->opts, nr_io_queues,
+                          to_tcp_ctrl(ctrl)->io_queues);
        return __nvme_tcp_alloc_io_queues(ctrl);
 }
 
@@ -2449,44 +2403,8 @@ static blk_status_t nvme_tcp_queue_rq(struct blk_mq_hw_ctx *hctx,
 static void nvme_tcp_map_queues(struct blk_mq_tag_set *set)
 {
        struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(set->driver_data);
-       struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
-
-       if (opts->nr_write_queues && ctrl->io_queues[HCTX_TYPE_READ]) {
-               /* separate read/write queues */
-               set->map[HCTX_TYPE_DEFAULT].nr_queues =
-                       ctrl->io_queues[HCTX_TYPE_DEFAULT];
-               set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
-               set->map[HCTX_TYPE_READ].nr_queues =
-                       ctrl->io_queues[HCTX_TYPE_READ];
-               set->map[HCTX_TYPE_READ].queue_offset =
-                       ctrl->io_queues[HCTX_TYPE_DEFAULT];
-       } else {
-               /* shared read/write queues */
-               set->map[HCTX_TYPE_DEFAULT].nr_queues =
-                       ctrl->io_queues[HCTX_TYPE_DEFAULT];
-               set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
-               set->map[HCTX_TYPE_READ].nr_queues =
-                       ctrl->io_queues[HCTX_TYPE_DEFAULT];
-               set->map[HCTX_TYPE_READ].queue_offset = 0;
-       }
-       blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
-       blk_mq_map_queues(&set->map[HCTX_TYPE_READ]);
-
-       if (opts->nr_poll_queues && ctrl->io_queues[HCTX_TYPE_POLL]) {
-               /* map dedicated poll queues only if we have queues left */
-               set->map[HCTX_TYPE_POLL].nr_queues =
-                               ctrl->io_queues[HCTX_TYPE_POLL];
-               set->map[HCTX_TYPE_POLL].queue_offset =
-                       ctrl->io_queues[HCTX_TYPE_DEFAULT] +
-                       ctrl->io_queues[HCTX_TYPE_READ];
-               blk_mq_map_queues(&set->map[HCTX_TYPE_POLL]);
-       }
-
-       dev_info(ctrl->ctrl.device,
-               "mapped %d/%d/%d default/read/poll queues.\n",
-               ctrl->io_queues[HCTX_TYPE_DEFAULT],
-               ctrl->io_queues[HCTX_TYPE_READ],
-               ctrl->io_queues[HCTX_TYPE_POLL]);
+
+       nvmf_map_queues(set, &ctrl->ctrl, ctrl->io_queues);
 }
 
 static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
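
Both hunks above replace the TCP transport's private queue bookkeeping with the shared nvmf_nr_io_queues()/nvmf_set_io_queues()/nvmf_map_queues() helpers in the fabrics core. The partitioning logic being consolidated is plain integer arithmetic: read queues are satisfied first, dedicated default queues only if enough remain, and poll queues take whatever is left. A rough standalone sketch of that split (the kernel version fills the HCTX_TYPE_* array instead of a struct):

#include <stdio.h>

struct split {
        unsigned int def, read, poll;
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
        return a < b ? a : b;
}

/* Same shape as the removed nvme_tcp_set_io_queues(): reads, then defaults, then poll. */
static struct split set_io_queues(unsigned int want_io, unsigned int want_write,
                                  unsigned int want_poll, unsigned int avail)
{
        struct split s = { 0 };

        if (want_write && want_io < avail) {
                /* separate read/write queues: dedicated defaults only after reads fit */
                s.read = want_io;
                avail -= s.read;
                s.def = min_u(want_write, avail);
                avail -= s.def;
        } else {
                /* shared read/write queues */
                s.def = min_u(want_io, avail);
                avail -= s.def;
        }
        if (want_poll && avail)
                s.poll = min_u(want_poll, avail);
        return s;
}

int main(void)
{
        struct split s = set_io_queues(2, 2, 2, 8);

        printf("default %u read %u poll %u\n", s.def, s.read, s.poll);
        return 0;
}
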
index 7970a76..586458f 100644 (file)
@@ -295,13 +295,11 @@ void nvmet_execute_auth_send(struct nvmet_req *req)
                        status = 0;
                }
                goto done_kfree;
-               break;
        case NVME_AUTH_DHCHAP_MESSAGE_SUCCESS2:
                req->sq->authenticated = true;
                pr_debug("%s: ctrl %d qid %d ctrl authenticated\n",
                         __func__, ctrl->cntlid, req->sq->qid);
                goto done_kfree;
-               break;
        case NVME_AUTH_DHCHAP_MESSAGE_FAILURE2:
                status = nvmet_auth_failure2(d);
                if (status) {
@@ -312,7 +310,6 @@ void nvmet_execute_auth_send(struct nvmet_req *req)
                        status = 0;
                }
                goto done_kfree;
-               break;
        default:
                req->sq->dhchap_status =
                        NVME_AUTH_DHCHAP_FAILURE_INCORRECT_MESSAGE;
@@ -320,7 +317,6 @@ void nvmet_execute_auth_send(struct nvmet_req *req)
                        NVME_AUTH_DHCHAP_MESSAGE_FAILURE2;
                req->sq->authenticated = false;
                goto done_kfree;
-               break;
        }
 done_failure1:
        req->sq->dhchap_status = NVME_AUTH_DHCHAP_FAILURE_INCORRECT_MESSAGE;
@@ -483,15 +479,6 @@ void nvmet_execute_auth_receive(struct nvmet_req *req)
                        status = NVME_SC_INTERNAL;
                        break;
                }
-               if (status) {
-                       req->sq->dhchap_status = status;
-                       nvmet_auth_failure1(req, d, al);
-                       pr_warn("ctrl %d qid %d: challenge status (%x)\n",
-                               ctrl->cntlid, req->sq->qid,
-                               req->sq->dhchap_status);
-                       status = 0;
-                       break;
-               }
                req->sq->dhchap_step = NVME_AUTH_DHCHAP_MESSAGE_REPLY;
                break;
        case NVME_AUTH_DHCHAP_MESSAGE_SUCCESS1:
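
The break removals in the first hunk are dead-code cleanup: each case already ends with an unconditional goto, so the break that followed could never execute. For illustration, the pattern being removed is simply:

#include <stdio.h>

enum { MSG_SUCCESS2 = 1 };

static int handle(int msg)
{
        switch (msg) {
        case MSG_SUCCESS2:
                printf("authenticated\n");
                goto done;
                break;          /* unreachable: the goto above always leaves the case */
        default:
                return -1;
        }
done:
        return 0;
}

int main(void)
{
        return handle(MSG_SUCCESS2);
}
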
index e940a7d..c65a734 100644 (file)
@@ -645,8 +645,6 @@ fcloop_fcp_recv_work(struct work_struct *work)
        }
        if (ret)
                fcloop_call_host_done(fcpreq, tfcp_req, ret);
-
-       return;
 }
 
 static void
@@ -1168,7 +1166,8 @@ __wait_localport_unreg(struct fcloop_lport *lport)
 
        ret = nvme_fc_unregister_localport(lport->localport);
 
-       wait_for_completion(&lport->unreg_done);
+       if (!ret)
+               wait_for_completion(&lport->unreg_done);
 
        kfree(lport);
 
index c2d6cea..2733e01 100644 (file)
@@ -51,7 +51,7 @@ void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id)
 void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
 {
        if (ns->bdev) {
-               blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ);
+               blkdev_put(ns->bdev, NULL);
                ns->bdev = NULL;
        }
 }
@@ -85,7 +85,7 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
                return -ENOTBLK;
 
        ns->bdev = blkdev_get_by_path(ns->device_path,
-                       FMODE_READ | FMODE_WRITE, NULL);
+                       BLK_OPEN_READ | BLK_OPEN_WRITE, NULL, NULL);
        if (IS_ERR(ns->bdev)) {
                ret = PTR_ERR(ns->bdev);
                if (ret != -ENOTBLK) {
index dc60a22..6cf723b 100644 (file)
@@ -109,8 +109,8 @@ struct nvmet_sq {
        u32                     sqhd;
        bool                    sqhd_disabled;
 #ifdef CONFIG_NVME_TARGET_AUTH
-       struct delayed_work     auth_expired_work;
        bool                    authenticated;
+       struct delayed_work     auth_expired_work;
        u16                     dhchap_tid;
        u16                     dhchap_status;
        int                     dhchap_step;
index 511c980..71a9c1c 100644 (file)
@@ -243,7 +243,7 @@ static void nvmet_passthru_execute_cmd_work(struct work_struct *w)
        blk_mq_free_request(rq);
 
        if (effects)
-               nvme_passthru_end(ctrl, effects, req->cmd, status);
+               nvme_passthru_end(ctrl, ns, effects, req->cmd, status);
 }
 
 static enum rq_end_io_ret nvmet_passthru_req_done(struct request *rq,
index 2e01960..7feb643 100644 (file)
@@ -811,6 +811,7 @@ static int init_overlay_changeset(struct overlay_changeset *ovcs)
                if (!fragment->target) {
                        pr_err("symbols in overlay, but not in live tree\n");
                        ret = -EINVAL;
+                       of_node_put(node);
                        goto err_out;
                }
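
The of/overlay hunk plugs a reference leak on an error path: the newly added of_node_put(node) drops the reference before jumping to err_out, where it was previously never released. A generic standalone sketch of the rule that every early exit must undo the references it took (the refcount helpers here are hypothetical):

#include <stdio.h>

struct node {
        int refcount;
};

static struct node *node_get(struct node *n) { n->refcount++; return n; }
static void node_put(struct node *n)         { n->refcount--; }

/* Error paths must drop every reference they took before bailing out. */
static int process(struct node *n, int fragment_ok)
{
        int ret = 0;

        node_get(n);
        if (!fragment_ok) {
                ret = -22;       /* -EINVAL */
                node_put(n);     /* the fix: release the reference on this path too */
                goto err_out;
        }
        node_put(n);
        return 0;
err_out:
        return ret;
}

int main(void)
{
        struct node n = { 0 };

        process(&n, 0);
        printf("leaked references: %d\n", n.refcount);
        return 0;
}
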
 
index bc32662..2d93d0c 100644 (file)
@@ -489,7 +489,10 @@ struct hv_pcibus_device {
        struct fwnode_handle *fwnode;
        /* Protocol version negotiated with the host */
        enum pci_protocol_version_t protocol_version;
+
+       struct mutex state_lock;
        enum hv_pcibus_state state;
+
        struct hv_device *hdev;
        resource_size_t low_mmio_space;
        resource_size_t high_mmio_space;
@@ -545,19 +548,10 @@ struct hv_dr_state {
        struct hv_pcidev_description func[];
 };
 
-enum hv_pcichild_state {
-       hv_pcichild_init = 0,
-       hv_pcichild_requirements,
-       hv_pcichild_resourced,
-       hv_pcichild_ejecting,
-       hv_pcichild_maximum
-};
-
 struct hv_pci_dev {
        /* List protected by pci_rescan_remove_lock */
        struct list_head list_entry;
        refcount_t refs;
-       enum hv_pcichild_state state;
        struct pci_slot *pci_slot;
        struct hv_pcidev_description desc;
        bool reported_missing;
@@ -635,6 +629,11 @@ static void hv_arch_irq_unmask(struct irq_data *data)
        pbus = pdev->bus;
        hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata);
        int_desc = data->chip_data;
+       if (!int_desc) {
+               dev_warn(&hbus->hdev->device, "%s() can not unmask irq %u\n",
+                        __func__, data->irq);
+               return;
+       }
 
        local_irq_save(flags);
 
@@ -2004,12 +2003,6 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
                hv_pci_onchannelcallback(hbus);
                spin_unlock_irqrestore(&channel->sched_lock, flags);
 
-               if (hpdev->state == hv_pcichild_ejecting) {
-                       dev_err_once(&hbus->hdev->device,
-                                    "the device is being ejected\n");
-                       goto enable_tasklet;
-               }
-
                udelay(100);
        }
 
@@ -2615,6 +2608,8 @@ static void pci_devices_present_work(struct work_struct *work)
        if (!dr)
                return;
 
+       mutex_lock(&hbus->state_lock);
+
        /* First, mark all existing children as reported missing. */
        spin_lock_irqsave(&hbus->device_list_lock, flags);
        list_for_each_entry(hpdev, &hbus->children, list_entry) {
@@ -2696,6 +2691,8 @@ static void pci_devices_present_work(struct work_struct *work)
                break;
        }
 
+       mutex_unlock(&hbus->state_lock);
+
        kfree(dr);
 }
 
@@ -2844,7 +2841,7 @@ static void hv_eject_device_work(struct work_struct *work)
        hpdev = container_of(work, struct hv_pci_dev, wrk);
        hbus = hpdev->hbus;
 
-       WARN_ON(hpdev->state != hv_pcichild_ejecting);
+       mutex_lock(&hbus->state_lock);
 
        /*
         * Ejection can come before or after the PCI bus has been set up, so
@@ -2882,6 +2879,8 @@ static void hv_eject_device_work(struct work_struct *work)
        put_pcichild(hpdev);
        put_pcichild(hpdev);
        /* hpdev has been freed. Do not use it any more. */
+
+       mutex_unlock(&hbus->state_lock);
 }
 
 /**
@@ -2902,7 +2901,6 @@ static void hv_pci_eject_device(struct hv_pci_dev *hpdev)
                return;
        }
 
-       hpdev->state = hv_pcichild_ejecting;
        get_pcichild(hpdev);
        INIT_WORK(&hpdev->wrk, hv_eject_device_work);
        queue_work(hbus->wq, &hpdev->wrk);
@@ -3331,8 +3329,10 @@ static int hv_pci_enter_d0(struct hv_device *hdev)
        struct pci_bus_d0_entry *d0_entry;
        struct hv_pci_compl comp_pkt;
        struct pci_packet *pkt;
+       bool retry = true;
        int ret;
 
+enter_d0_retry:
        /*
         * Tell the host that the bus is ready to use, and moved into the
         * powered-on state.  This includes telling the host which region
@@ -3359,6 +3359,38 @@ static int hv_pci_enter_d0(struct hv_device *hdev)
        if (ret)
                goto exit;
 
+       /*
+        * In certain case (Kdump) the pci device of interest was
+        * not cleanly shut down and resource is still held on host
+        * side, the host could return invalid device status.
+        * We need to explicitly request host to release the resource
+        * and try to enter D0 again.
+        */
+       if (comp_pkt.completion_status < 0 && retry) {
+               retry = false;
+
+               dev_err(&hdev->device, "Retrying D0 Entry\n");
+
+               /*
+                * Hv_pci_bus_exit() calls hv_send_resources_released()
+                * to free up resources of its child devices.
+                * In the kdump kernel we need to set the
+                * wslot_res_allocated to 255 so it scans all child
+                * devices to release resources allocated in the
+                * normal kernel before panic happened.
+                */
+               hbus->wslot_res_allocated = 255;
+
+               ret = hv_pci_bus_exit(hdev, true);
+
+               if (ret == 0) {
+                       kfree(pkt);
+                       goto enter_d0_retry;
+               }
+               dev_err(&hdev->device,
+                       "Retrying D0 failed with ret %d\n", ret);
+       }
+
        if (comp_pkt.completion_status < 0) {
                dev_err(&hdev->device,
                        "PCI Pass-through VSP failed D0 Entry with status %x\n",
@@ -3401,6 +3433,24 @@ static int hv_pci_query_relations(struct hv_device *hdev)
        if (!ret)
                ret = wait_for_response(hdev, &comp);
 
+       /*
+        * In the case of fast device addition/removal, it's possible that
+        * vmbus_sendpacket() or wait_for_response() returns -ENODEV but we
+        * already got a PCI_BUS_RELATIONS* message from the host and the
+        * channel callback already scheduled a work to hbus->wq, which can be
+        * running pci_devices_present_work() -> survey_child_resources() ->
+        * complete(&hbus->survey_event), even after hv_pci_query_relations()
+        * exits and the stack variable 'comp' is no longer valid; as a result,
+        * a hang or a page fault may happen when the complete() calls
+        * raw_spin_lock_irqsave(). Flush hbus->wq before we exit from
+        * hv_pci_query_relations() to avoid the issues. Note: if 'ret' is
+        * -ENODEV, there can't be any more work item scheduled to hbus->wq
+        * after the flush_workqueue(): see vmbus_onoffer_rescind() ->
+        * vmbus_reset_channel_cb(), vmbus_rescind_cleanup() ->
+        * channel->rescind = true.
+        */
+       flush_workqueue(hbus->wq);
+
        return ret;
 }
 
@@ -3586,7 +3636,6 @@ static int hv_pci_probe(struct hv_device *hdev,
        struct hv_pcibus_device *hbus;
        u16 dom_req, dom;
        char *name;
-       bool enter_d0_retry = true;
        int ret;
 
        bridge = devm_pci_alloc_host_bridge(&hdev->device, 0);
@@ -3598,6 +3647,7 @@ static int hv_pci_probe(struct hv_device *hdev,
                return -ENOMEM;
 
        hbus->bridge = bridge;
+       mutex_init(&hbus->state_lock);
        hbus->state = hv_pcibus_init;
        hbus->wslot_res_allocated = -1;
 
@@ -3703,49 +3753,15 @@ static int hv_pci_probe(struct hv_device *hdev,
        if (ret)
                goto free_fwnode;
 
-retry:
        ret = hv_pci_query_relations(hdev);
        if (ret)
                goto free_irq_domain;
 
-       ret = hv_pci_enter_d0(hdev);
-       /*
-        * In certain case (Kdump) the pci device of interest was
-        * not cleanly shut down and resource is still held on host
-        * side, the host could return invalid device status.
-        * We need to explicitly request host to release the resource
-        * and try to enter D0 again.
-        * Since the hv_pci_bus_exit() call releases structures
-        * of all its child devices, we need to start the retry from
-        * hv_pci_query_relations() call, requesting host to send
-        * the synchronous child device relations message before this
-        * information is needed in hv_send_resources_allocated()
-        * call later.
-        */
-       if (ret == -EPROTO && enter_d0_retry) {
-               enter_d0_retry = false;
-
-               dev_err(&hdev->device, "Retrying D0 Entry\n");
-
-               /*
-                * Hv_pci_bus_exit() calls hv_send_resources_released()
-                * to free up resources of its child devices.
-                * In the kdump kernel we need to set the
-                * wslot_res_allocated to 255 so it scans all child
-                * devices to release resources allocated in the
-                * normal kernel before panic happened.
-                */
-               hbus->wslot_res_allocated = 255;
-               ret = hv_pci_bus_exit(hdev, true);
-
-               if (ret == 0)
-                       goto retry;
+       mutex_lock(&hbus->state_lock);
 
-               dev_err(&hdev->device,
-                       "Retrying D0 failed with ret %d\n", ret);
-       }
+       ret = hv_pci_enter_d0(hdev);
        if (ret)
-               goto free_irq_domain;
+               goto release_state_lock;
 
        ret = hv_pci_allocate_bridge_windows(hbus);
        if (ret)
@@ -3763,12 +3779,15 @@ retry:
        if (ret)
                goto free_windows;
 
+       mutex_unlock(&hbus->state_lock);
        return 0;
 
 free_windows:
        hv_pci_free_bridge_windows(hbus);
 exit_d0:
        (void) hv_pci_bus_exit(hdev, true);
+release_state_lock:
+       mutex_unlock(&hbus->state_lock);
 free_irq_domain:
        irq_domain_remove(hbus->irq_domain);
 free_fwnode:
@@ -4018,20 +4037,26 @@ static int hv_pci_resume(struct hv_device *hdev)
        if (ret)
                goto out;
 
+       mutex_lock(&hbus->state_lock);
+
        ret = hv_pci_enter_d0(hdev);
        if (ret)
-               goto out;
+               goto release_state_lock;
 
        ret = hv_send_resources_allocated(hdev);
        if (ret)
-               goto out;
+               goto release_state_lock;
 
        prepopulate_bars(hbus);
 
        hv_pci_restore_msi_state(hbus);
 
        hbus->state = hv_pcibus_installed;
+       mutex_unlock(&hbus->state_lock);
        return 0;
+
+release_state_lock:
+       mutex_unlock(&hbus->state_lock);
 out:
        vmbus_close(hdev->channel);
        return ret;
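
The hv_pci changes above move the kdump recovery out of hv_pci_probe() and into hv_pci_enter_d0() itself, and serialize the bus state transitions behind the new hbus->state_lock. The recovery is a one-shot retry: on a failed completion, ask the host to release its stale resources and try exactly once more. A generic standalone sketch of that shape (all names are illustrative stand-ins):

#include <stdio.h>

/* Pretend the first attempt fails with a negative completion status. */
static int attempts;

static int send_d0_entry(void)
{
        return attempts++ == 0 ? -1 : 0;
}

static int release_resources(void)
{
        return 0;                /* ask the other side to drop stale state */
}

static int enter_d0(void)
{
        int retry = 1;
        int status;

again:
        status = send_d0_entry();
        if (status < 0 && retry) {
                retry = 0;       /* only one retry, as in hv_pci_enter_d0() */
                if (release_resources() == 0)
                        goto again;
        }
        return status;
}

int main(void)
{
        printf("enter_d0() = %d\n", enter_d0());
        return 0;
}
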
index f4e2a88..c525867 100644 (file)
@@ -6003,8 +6003,9 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x56c1, aspm_l1_acceptable_latency
 
 #ifdef CONFIG_PCIE_DPC
 /*
- * Intel Tiger Lake and Alder Lake BIOS has a bug that clears the DPC
- * RP PIO Log Size of the integrated Thunderbolt PCIe Root Ports.
+ * Intel Ice Lake, Tiger Lake and Alder Lake BIOS has a bug that clears
+ * the DPC RP PIO Log Size of the integrated Thunderbolt PCIe Root
+ * Ports.
  */
 static void dpc_log_size(struct pci_dev *dev)
 {
@@ -6027,6 +6028,10 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x461f, dpc_log_size);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x462f, dpc_log_size);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x463f, dpc_log_size);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x466e, dpc_log_size);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x8a1d, dpc_log_size);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x8a1f, dpc_log_size);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x8a21, dpc_log_size);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x8a23, dpc_log_size);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9a23, dpc_log_size);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9a25, dpc_log_size);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9a27, dpc_log_size);
index 711f824..4c07d71 100644 (file)
@@ -127,6 +127,14 @@ config FSL_IMX8_DDR_PMU
          can give information about memory throughput and other related
          events.
 
+config FSL_IMX9_DDR_PMU
+       tristate "Freescale i.MX9 DDR perf monitor"
+       depends on ARCH_MXC
+       help
+         Provides support for the DDR performance monitor in i.MX9, which
+         can give information about memory throughput and other related
+         events.
+
 config QCOM_L2_PMU
        bool "Qualcomm Technologies L2-cache PMU"
        depends on ARCH_QCOM && ARM64 && ACPI
index dabc859..5cfe895 100644 (file)
@@ -8,6 +8,7 @@ obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
 obj-$(CONFIG_ARM_PMUV3) += arm_pmuv3.o
 obj-$(CONFIG_ARM_SMMU_V3_PMU) += arm_smmuv3_pmu.o
 obj-$(CONFIG_FSL_IMX8_DDR_PMU) += fsl_imx8_ddr_perf.o
+obj-$(CONFIG_FSL_IMX9_DDR_PMU) += fsl_imx9_ddr_perf.o
 obj-$(CONFIG_HISI_PMU) += hisilicon/
 obj-$(CONFIG_QCOM_L2_PMU)      += qcom_l2_pmu.o
 obj-$(CONFIG_QCOM_L3_PMU) += qcom_l3_pmu.o
index 8574c6e..cd2de44 100644 (file)
@@ -493,6 +493,17 @@ static int m1_pmu_map_event(struct perf_event *event)
        return armpmu_map_event(event, &m1_pmu_perf_map, NULL, M1_PMU_CFG_EVENT);
 }
 
+static int m2_pmu_map_event(struct perf_event *event)
+{
+       /*
+        * Same deal as the above, except that M2 has 64bit counters.
+        * Which, as far as we're concerned, actually means 63 bits.
+        * Yes, this is getting awkward.
+        */
+       event->hw.flags |= ARMPMU_EVT_63BIT;
+       return armpmu_map_event(event, &m1_pmu_perf_map, NULL, M1_PMU_CFG_EVENT);
+}
+
 static void m1_pmu_reset(void *info)
 {
        int i;
@@ -525,7 +536,7 @@ static int m1_pmu_set_event_filter(struct hw_perf_event *event,
        return 0;
 }
 
-static int m1_pmu_init(struct arm_pmu *cpu_pmu)
+static int m1_pmu_init(struct arm_pmu *cpu_pmu, u32 flags)
 {
        cpu_pmu->handle_irq       = m1_pmu_handle_irq;
        cpu_pmu->enable           = m1_pmu_enable_event;
@@ -536,7 +547,14 @@ static int m1_pmu_init(struct arm_pmu *cpu_pmu)
        cpu_pmu->clear_event_idx  = m1_pmu_clear_event_idx;
        cpu_pmu->start            = m1_pmu_start;
        cpu_pmu->stop             = m1_pmu_stop;
-       cpu_pmu->map_event        = m1_pmu_map_event;
+
+       if (flags & ARMPMU_EVT_47BIT)
+               cpu_pmu->map_event = m1_pmu_map_event;
+       else if (flags & ARMPMU_EVT_63BIT)
+               cpu_pmu->map_event = m2_pmu_map_event;
+       else
+               return WARN_ON(-EINVAL);
+
        cpu_pmu->reset            = m1_pmu_reset;
        cpu_pmu->set_event_filter = m1_pmu_set_event_filter;
 
@@ -550,25 +568,25 @@ static int m1_pmu_init(struct arm_pmu *cpu_pmu)
 static int m1_pmu_ice_init(struct arm_pmu *cpu_pmu)
 {
        cpu_pmu->name = "apple_icestorm_pmu";
-       return m1_pmu_init(cpu_pmu);
+       return m1_pmu_init(cpu_pmu, ARMPMU_EVT_47BIT);
 }
 
 static int m1_pmu_fire_init(struct arm_pmu *cpu_pmu)
 {
        cpu_pmu->name = "apple_firestorm_pmu";
-       return m1_pmu_init(cpu_pmu);
+       return m1_pmu_init(cpu_pmu, ARMPMU_EVT_47BIT);
 }
 
 static int m2_pmu_avalanche_init(struct arm_pmu *cpu_pmu)
 {
        cpu_pmu->name = "apple_avalanche_pmu";
-       return m1_pmu_init(cpu_pmu);
+       return m1_pmu_init(cpu_pmu, ARMPMU_EVT_63BIT);
 }
 
 static int m2_pmu_blizzard_init(struct arm_pmu *cpu_pmu)
 {
        cpu_pmu->name = "apple_blizzard_pmu";
-       return m1_pmu_init(cpu_pmu);
+       return m1_pmu_init(cpu_pmu, ARMPMU_EVT_63BIT);
 }
 
 static const struct of_device_id m1_pmu_of_device_ids[] = {
index 03b1309..998259f 100644 (file)
@@ -645,7 +645,7 @@ static void cci_pmu_sync_counters(struct cci_pmu *cci_pmu)
        struct cci_pmu_hw_events *cci_hw = &cci_pmu->hw_events;
        DECLARE_BITMAP(mask, HW_CNTRS_MAX);
 
-       bitmap_zero(mask, cci_pmu->num_cntrs);
+       bitmap_zero(mask, HW_CNTRS_MAX);
        for_each_set_bit(i, cci_pmu->hw_events.used_mask, cci_pmu->num_cntrs) {
                struct perf_event *event = cci_hw->events[i];
 
@@ -656,7 +656,7 @@ static void cci_pmu_sync_counters(struct cci_pmu *cci_pmu)
                if (event->hw.state & PERF_HES_STOPPED)
                        continue;
                if (event->hw.state & PERF_HES_ARCH) {
-                       set_bit(i, mask);
+                       __set_bit(i, mask);
                        event->hw.state &= ~PERF_HES_ARCH;
                }
        }
index 47d359f..b8c1587 100644 (file)
 #define CMN_MAX_DTMS                   (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4)
 
 /* The CFG node has various info besides the discovery tree */
-#define CMN_CFGM_PERIPH_ID_2           0x0010
-#define CMN_CFGM_PID2_REVISION         GENMASK(7, 4)
+#define CMN_CFGM_PERIPH_ID_01          0x0008
+#define CMN_CFGM_PID0_PART_0           GENMASK_ULL(7, 0)
+#define CMN_CFGM_PID1_PART_1           GENMASK_ULL(35, 32)
+#define CMN_CFGM_PERIPH_ID_23          0x0010
+#define CMN_CFGM_PID2_REVISION         GENMASK_ULL(7, 4)
 
 #define CMN_CFGM_INFO_GLOBAL           0x900
 #define CMN_INFO_MULTIPLE_DTM_EN       BIT_ULL(63)
 #define CMN_WP_DOWN                    2
 
 
+/* Internal values for encoding event support */
 enum cmn_model {
        CMN600 = 1,
        CMN650 = 2,
@@ -197,26 +201,34 @@ enum cmn_model {
        CMN_650ON = CMN650 | CMN700,
 };
 
+/* Actual part numbers and revision IDs defined by the hardware */
+enum cmn_part {
+       PART_CMN600 = 0x434,
+       PART_CMN650 = 0x436,
+       PART_CMN700 = 0x43c,
+       PART_CI700 = 0x43a,
+};
+
 /* CMN-600 r0px shouldn't exist in silicon, thankfully */
 enum cmn_revision {
-       CMN600_R1P0,
-       CMN600_R1P1,
-       CMN600_R1P2,
-       CMN600_R1P3,
-       CMN600_R2P0,
-       CMN600_R3P0,
-       CMN600_R3P1,
-       CMN650_R0P0 = 0,
-       CMN650_R1P0,
-       CMN650_R1P1,
-       CMN650_R2P0,
-       CMN650_R1P2,
-       CMN700_R0P0 = 0,
-       CMN700_R1P0,
-       CMN700_R2P0,
-       CI700_R0P0 = 0,
-       CI700_R1P0,
-       CI700_R2P0,
+       REV_CMN600_R1P0,
+       REV_CMN600_R1P1,
+       REV_CMN600_R1P2,
+       REV_CMN600_R1P3,
+       REV_CMN600_R2P0,
+       REV_CMN600_R3P0,
+       REV_CMN600_R3P1,
+       REV_CMN650_R0P0 = 0,
+       REV_CMN650_R1P0,
+       REV_CMN650_R1P1,
+       REV_CMN650_R2P0,
+       REV_CMN650_R1P2,
+       REV_CMN700_R0P0 = 0,
+       REV_CMN700_R1P0,
+       REV_CMN700_R2P0,
+       REV_CI700_R0P0 = 0,
+       REV_CI700_R1P0,
+       REV_CI700_R2P0,
 };
 
 enum cmn_node_type {
@@ -306,7 +318,7 @@ struct arm_cmn {
        unsigned int state;
 
        enum cmn_revision rev;
-       enum cmn_model model;
+       enum cmn_part part;
        u8 mesh_x;
        u8 mesh_y;
        u16 num_xps;
@@ -394,19 +406,35 @@ static struct arm_cmn_node *arm_cmn_node(const struct arm_cmn *cmn,
        return NULL;
 }
 
+static enum cmn_model arm_cmn_model(const struct arm_cmn *cmn)
+{
+       switch (cmn->part) {
+       case PART_CMN600:
+               return CMN600;
+       case PART_CMN650:
+               return CMN650;
+       case PART_CMN700:
+               return CMN700;
+       case PART_CI700:
+               return CI700;
+       default:
+               return 0;
+       };
+}
+
 static u32 arm_cmn_device_connect_info(const struct arm_cmn *cmn,
                                       const struct arm_cmn_node *xp, int port)
 {
        int offset = CMN_MXP__CONNECT_INFO(port);
 
        if (port >= 2) {
-               if (cmn->model & (CMN600 | CMN650))
+               if (cmn->part == PART_CMN600 || cmn->part == PART_CMN650)
                        return 0;
                /*
                 * CI-700 may have extra ports, but still has the
                 * mesh_port_connect_info registers in the way.
                 */
-               if (cmn->model == CI700)
+               if (cmn->part == PART_CI700)
                        offset += CI700_CONNECT_INFO_P2_5_OFFSET;
        }
 
@@ -640,7 +668,7 @@ static umode_t arm_cmn_event_attr_is_visible(struct kobject *kobj,
 
        eattr = container_of(attr, typeof(*eattr), attr.attr);
 
-       if (!(eattr->model & cmn->model))
+       if (!(eattr->model & arm_cmn_model(cmn)))
                return 0;
 
        type = eattr->type;
@@ -658,7 +686,7 @@ static umode_t arm_cmn_event_attr_is_visible(struct kobject *kobj,
                if ((intf & 4) && !(cmn->ports_used & BIT(intf & 3)))
                        return 0;
 
-               if (chan == 4 && cmn->model == CMN600)
+               if (chan == 4 && cmn->part == PART_CMN600)
                        return 0;
 
                if ((chan == 5 && cmn->rsp_vc_num < 2) ||
@@ -669,19 +697,19 @@ static umode_t arm_cmn_event_attr_is_visible(struct kobject *kobj,
        }
 
        /* Revision-specific differences */
-       if (cmn->model == CMN600) {
-               if (cmn->rev < CMN600_R1P3) {
+       if (cmn->part == PART_CMN600) {
+               if (cmn->rev < REV_CMN600_R1P3) {
                        if (type == CMN_TYPE_CXRA && eventid > 0x10)
                                return 0;
                }
-               if (cmn->rev < CMN600_R1P2) {
+               if (cmn->rev < REV_CMN600_R1P2) {
                        if (type == CMN_TYPE_HNF && eventid == 0x1b)
                                return 0;
                        if (type == CMN_TYPE_CXRA || type == CMN_TYPE_CXHA)
                                return 0;
                }
-       } else if (cmn->model == CMN650) {
-               if (cmn->rev < CMN650_R2P0 || cmn->rev == CMN650_R1P2) {
+       } else if (cmn->part == PART_CMN650) {
+               if (cmn->rev < REV_CMN650_R2P0 || cmn->rev == REV_CMN650_R1P2) {
                        if (type == CMN_TYPE_HNF && eventid > 0x22)
                                return 0;
                        if (type == CMN_TYPE_SBSX && eventid == 0x17)
@@ -689,8 +717,8 @@ static umode_t arm_cmn_event_attr_is_visible(struct kobject *kobj,
                        if (type == CMN_TYPE_RNI && eventid > 0x10)
                                return 0;
                }
-       } else if (cmn->model == CMN700) {
-               if (cmn->rev < CMN700_R2P0) {
+       } else if (cmn->part == PART_CMN700) {
+               if (cmn->rev < REV_CMN700_R2P0) {
                        if (type == CMN_TYPE_HNF && eventid > 0x2c)
                                return 0;
                        if (type == CMN_TYPE_CCHA && eventid > 0x74)
@@ -698,7 +726,7 @@ static umode_t arm_cmn_event_attr_is_visible(struct kobject *kobj,
                        if (type == CMN_TYPE_CCLA && eventid > 0x27)
                                return 0;
                }
-               if (cmn->rev < CMN700_R1P0) {
+               if (cmn->rev < REV_CMN700_R1P0) {
                        if (type == CMN_TYPE_HNF && eventid > 0x2b)
                                return 0;
                }
@@ -1171,19 +1199,31 @@ static ssize_t arm_cmn_cpumask_show(struct device *dev,
 static struct device_attribute arm_cmn_cpumask_attr =
                __ATTR(cpumask, 0444, arm_cmn_cpumask_show, NULL);
 
-static struct attribute *arm_cmn_cpumask_attrs[] = {
+static ssize_t arm_cmn_identifier_show(struct device *dev,
+                                      struct device_attribute *attr, char *buf)
+{
+       struct arm_cmn *cmn = to_cmn(dev_get_drvdata(dev));
+
+       return sysfs_emit(buf, "%03x%02x\n", cmn->part, cmn->rev);
+}
+
+static struct device_attribute arm_cmn_identifier_attr =
+               __ATTR(identifier, 0444, arm_cmn_identifier_show, NULL);
+
+static struct attribute *arm_cmn_other_attrs[] = {
        &arm_cmn_cpumask_attr.attr,
+       &arm_cmn_identifier_attr.attr,
        NULL,
 };
 
-static const struct attribute_group arm_cmn_cpumask_attr_group = {
-       .attrs = arm_cmn_cpumask_attrs,
+static const struct attribute_group arm_cmn_other_attrs_group = {
+       .attrs = arm_cmn_other_attrs,
 };
 
 static const struct attribute_group *arm_cmn_attr_groups[] = {
        &arm_cmn_event_attrs_group,
        &arm_cmn_format_attrs_group,
-       &arm_cmn_cpumask_attr_group,
+       &arm_cmn_other_attrs_group,
        NULL
 };
 
@@ -1200,7 +1240,7 @@ static u32 arm_cmn_wp_config(struct perf_event *event)
        u32 grp = CMN_EVENT_WP_GRP(event);
        u32 exc = CMN_EVENT_WP_EXCLUSIVE(event);
        u32 combine = CMN_EVENT_WP_COMBINE(event);
-       bool is_cmn600 = to_cmn(event->pmu)->model == CMN600;
+       bool is_cmn600 = to_cmn(event->pmu)->part == PART_CMN600;
 
        config = FIELD_PREP(CMN_DTM_WPn_CONFIG_WP_DEV_SEL, dev) |
                 FIELD_PREP(CMN_DTM_WPn_CONFIG_WP_CHN_SEL, chn) |
@@ -1520,14 +1560,14 @@ done:
        return ret;
 }
 
-static enum cmn_filter_select arm_cmn_filter_sel(enum cmn_model model,
+static enum cmn_filter_select arm_cmn_filter_sel(const struct arm_cmn *cmn,
                                                 enum cmn_node_type type,
                                                 unsigned int eventid)
 {
        struct arm_cmn_event_attr *e;
-       int i;
+       enum cmn_model model = arm_cmn_model(cmn);
 
-       for (i = 0; i < ARRAY_SIZE(arm_cmn_event_attrs) - 1; i++) {
+       for (int i = 0; i < ARRAY_SIZE(arm_cmn_event_attrs) - 1; i++) {
                e = container_of(arm_cmn_event_attrs[i], typeof(*e), attr.attr);
                if (e->model & model && e->type == type && e->eventid == eventid)
                        return e->fsel;
@@ -1570,12 +1610,12 @@ static int arm_cmn_event_init(struct perf_event *event)
                /* ...but the DTM may depend on which port we're watching */
                if (cmn->multi_dtm)
                        hw->dtm_offset = CMN_EVENT_WP_DEV_SEL(event) / 2;
-       } else if (type == CMN_TYPE_XP && cmn->model == CMN700) {
+       } else if (type == CMN_TYPE_XP && cmn->part == PART_CMN700) {
                hw->wide_sel = true;
        }
 
        /* This is sufficiently annoying to recalculate, so cache it */
-       hw->filter_sel = arm_cmn_filter_sel(cmn->model, type, eventid);
+       hw->filter_sel = arm_cmn_filter_sel(cmn, type, eventid);
 
        bynodeid = CMN_EVENT_BYNODEID(event);
        nodeid = CMN_EVENT_NODEID(event);
@@ -1899,9 +1939,10 @@ static int arm_cmn_init_dtc(struct arm_cmn *cmn, struct arm_cmn_node *dn, int id
        if (dtc->irq < 0)
                return dtc->irq;
 
-       writel_relaxed(0, dtc->base + CMN_DT_PMCR);
+       writel_relaxed(CMN_DT_DTC_CTL_DT_EN, dtc->base + CMN_DT_DTC_CTL);
+       writel_relaxed(CMN_DT_PMCR_PMU_EN | CMN_DT_PMCR_OVFL_INTR_EN, dtc->base + CMN_DT_PMCR);
+       writeq_relaxed(0, dtc->base + CMN_DT_PMCCNTR);
        writel_relaxed(0x1ff, dtc->base + CMN_DT_PMOVSR_CLR);
-       writel_relaxed(CMN_DT_PMCR_OVFL_INTR_EN, dtc->base + CMN_DT_PMCR);
 
        return 0;
 }
@@ -1961,7 +2002,7 @@ static int arm_cmn_init_dtcs(struct arm_cmn *cmn)
                        dn->type = CMN_TYPE_CCLA;
        }
 
-       writel_relaxed(CMN_DT_DTC_CTL_DT_EN, cmn->dtc[0].base + CMN_DT_DTC_CTL);
+       arm_cmn_set_state(cmn, CMN_STATE_DISABLED);
 
        return 0;
 }
@@ -2006,6 +2047,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
        void __iomem *cfg_region;
        struct arm_cmn_node cfg, *dn;
        struct arm_cmn_dtm *dtm;
+       enum cmn_part part;
        u16 child_count, child_poff;
        u32 xp_offset[CMN_MAX_XPS];
        u64 reg;
@@ -2017,7 +2059,19 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
                return -ENODEV;
 
        cfg_region = cmn->base + rgn_offset;
-       reg = readl_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_2);
+
+       reg = readq_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_01);
+       part = FIELD_GET(CMN_CFGM_PID0_PART_0, reg);
+       part |= FIELD_GET(CMN_CFGM_PID1_PART_1, reg) << 8;
+       if (cmn->part && cmn->part != part)
+               dev_warn(cmn->dev,
+                        "Firmware binding mismatch: expected part number 0x%x, found 0x%x\n",
+                        cmn->part, part);
+       cmn->part = part;
+       if (!arm_cmn_model(cmn))
+               dev_warn(cmn->dev, "Unknown part number: 0x%x\n", part);
+
+       reg = readl_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_23);
        cmn->rev = FIELD_GET(CMN_CFGM_PID2_REVISION, reg);
 
        reg = readq_relaxed(cfg_region + CMN_CFGM_INFO_GLOBAL);
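
Rather than trusting the firmware match data for the model, the driver now decodes the part number from the peripheral ID registers: per the CMN_CFGM_PID0_PART_0/CMN_CFGM_PID1_PART_1 fields above, PID0 supplies bits [7:0] and PID1 (bits [35:32] of the combined 64-bit read) supplies bits [11:8]. A standalone sketch of that assembly (the example register value is constructed for illustration):

#include <stdint.h>
#include <stdio.h>

/* Same layout as the CMN_CFGM_PERIPH_ID_01 read: PID0 in bits [7:0], PID1 in bits [35:32]. */
static unsigned int decode_part(uint64_t periph_id_01)
{
        unsigned int pid0 = periph_id_01 & 0xff;               /* part number bits [7:0]  */
        unsigned int pid1 = (periph_id_01 >> 32) & 0xf;        /* part number bits [11:8] */

        return pid0 | (pid1 << 8);
}

int main(void)
{
        /* Example value whose fields decode to 0x434, i.e. PART_CMN600. */
        uint64_t reg = 0x0000000400000034ULL;

        printf("part 0x%03x\n", decode_part(reg));
        return 0;
}
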
@@ -2081,7 +2135,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
                if (xp->id == (1 << 3))
                        cmn->mesh_x = xp->logid;
 
-               if (cmn->model == CMN600)
+               if (cmn->part == PART_CMN600)
                        xp->dtc = 0xf;
                else
                        xp->dtc = 1 << readl_relaxed(xp_region + CMN_DTM_UNIT_INFO);
@@ -2201,7 +2255,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
        if (cmn->num_xps == 1)
                dev_warn(cmn->dev, "1x1 config not fully supported, translate XP events manually\n");
 
-       dev_dbg(cmn->dev, "model %d, periph_id_2 revision %d\n", cmn->model, cmn->rev);
+       dev_dbg(cmn->dev, "periph_id part 0x%03x revision %d\n", cmn->part, cmn->rev);
        reg = cmn->ports_used;
        dev_dbg(cmn->dev, "mesh %dx%d, ID width %d, ports %6pbl%s\n",
                cmn->mesh_x, cmn->mesh_y, arm_cmn_xyidbits(cmn), &reg,
@@ -2256,17 +2310,17 @@ static int arm_cmn_probe(struct platform_device *pdev)
                return -ENOMEM;
 
        cmn->dev = &pdev->dev;
-       cmn->model = (unsigned long)device_get_match_data(cmn->dev);
+       cmn->part = (unsigned long)device_get_match_data(cmn->dev);
        platform_set_drvdata(pdev, cmn);
 
-       if (cmn->model == CMN600 && has_acpi_companion(cmn->dev)) {
+       if (cmn->part == PART_CMN600 && has_acpi_companion(cmn->dev)) {
                rootnode = arm_cmn600_acpi_probe(pdev, cmn);
        } else {
                rootnode = 0;
                cmn->base = devm_platform_ioremap_resource(pdev, 0);
                if (IS_ERR(cmn->base))
                        return PTR_ERR(cmn->base);
-               if (cmn->model == CMN600)
+               if (cmn->part == PART_CMN600)
                        rootnode = arm_cmn600_of_probe(pdev->dev.of_node);
        }
        if (rootnode < 0)
@@ -2335,10 +2389,10 @@ static int arm_cmn_remove(struct platform_device *pdev)
 
 #ifdef CONFIG_OF
 static const struct of_device_id arm_cmn_of_match[] = {
-       { .compatible = "arm,cmn-600", .data = (void *)CMN600 },
-       { .compatible = "arm,cmn-650", .data = (void *)CMN650 },
-       { .compatible = "arm,cmn-700", .data = (void *)CMN700 },
-       { .compatible = "arm,ci-700", .data = (void *)CI700 },
+       { .compatible = "arm,cmn-600", .data = (void *)PART_CMN600 },
+       { .compatible = "arm,cmn-650" },
+       { .compatible = "arm,cmn-700" },
+       { .compatible = "arm,ci-700" },
        {}
 };
 MODULE_DEVICE_TABLE(of, arm_cmn_of_match);
@@ -2346,9 +2400,9 @@ MODULE_DEVICE_TABLE(of, arm_cmn_of_match);
 
 #ifdef CONFIG_ACPI
 static const struct acpi_device_id arm_cmn_acpi_match[] = {
-       { "ARMHC600", CMN600 },
-       { "ARMHC650", CMN650 },
-       { "ARMHC700", CMN700 },
+       { "ARMHC600", PART_CMN600 },
+       { "ARMHC650" },
+       { "ARMHC700" },
        {}
 };
 MODULE_DEVICE_TABLE(acpi, arm_cmn_acpi_match);
index 0b316fe..25d25de 100644 (file)
@@ -4,8 +4,7 @@
 
 config ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU
        tristate "ARM Coresight Architecture PMU"
-       depends on ARM64 && ACPI
-       depends on ACPI_APMT || COMPILE_TEST
+       depends on ARM64 || COMPILE_TEST
        help
          Provides support for performance monitoring unit (PMU) devices
          based on ARM CoreSight PMU architecture. Note that this PMU
index a3f1c41..e2b7827 100644 (file)
@@ -28,7 +28,6 @@
 #include <linux/module.h>
 #include <linux/perf_event.h>
 #include <linux/platform_device.h>
-#include <acpi/processor.h>
 
 #include "arm_cspmu.h"
 #include "nvidia_cspmu.h"
 #define ARM_CSPMU_ACTIVE_CPU_MASK              0x0
 #define ARM_CSPMU_ASSOCIATED_CPU_MASK          0x1
 
-/* Check if field f in flags is set with value v */
-#define CHECK_APMT_FLAG(flags, f, v) \
-       ((flags & (ACPI_APMT_FLAGS_ ## f)) == (ACPI_APMT_FLAGS_ ## f ## _ ## v))
-
 /* Check and use default if implementer doesn't provide attribute callback */
 #define CHECK_DEFAULT_IMPL_OPS(ops, callback)                  \
        do {                                                    \
 
 static unsigned long arm_cspmu_cpuhp_state;
 
+static struct acpi_apmt_node *arm_cspmu_apmt_node(struct device *dev)
+{
+       return *(struct acpi_apmt_node **)dev_get_platdata(dev);
+}
+
 /*
  * In CoreSight PMU architecture, all of the MMIO registers are 32-bit except
  * counter register. The counter register can be implemented as 32-bit or 64-bit
@@ -156,12 +156,6 @@ static u64 read_reg64_hilohi(const void __iomem *addr, u32 max_poll_count)
        return val;
 }
 
-/* Check if PMU supports 64-bit single copy atomic. */
-static inline bool supports_64bit_atomics(const struct arm_cspmu *cspmu)
-{
-       return CHECK_APMT_FLAG(cspmu->apmt_node->flags, ATOMIC, SUPP);
-}
-
 /* Check if cycle counter is supported. */
 static inline bool supports_cycle_counter(const struct arm_cspmu *cspmu)
 {
@@ -189,10 +183,10 @@ static inline bool use_64b_counter_reg(const struct arm_cspmu *cspmu)
 ssize_t arm_cspmu_sysfs_event_show(struct device *dev,
                                struct device_attribute *attr, char *buf)
 {
-       struct dev_ext_attribute *eattr =
-               container_of(attr, struct dev_ext_attribute, attr);
-       return sysfs_emit(buf, "event=0x%llx\n",
-                         (unsigned long long)eattr->var);
+       struct perf_pmu_events_attr *pmu_attr;
+
+       pmu_attr = container_of(attr, typeof(*pmu_attr), attr);
+       return sysfs_emit(buf, "event=0x%llx\n", pmu_attr->id);
 }
 EXPORT_SYMBOL_GPL(arm_cspmu_sysfs_event_show);
 
@@ -320,7 +314,7 @@ static const char *arm_cspmu_get_name(const struct arm_cspmu *cspmu)
        static atomic_t pmu_idx[ACPI_APMT_NODE_TYPE_COUNT] = { 0 };
 
        dev = cspmu->dev;
-       apmt_node = cspmu->apmt_node;
+       apmt_node = arm_cspmu_apmt_node(dev);
        pmu_type = apmt_node->type;
 
        if (pmu_type >= ACPI_APMT_NODE_TYPE_COUNT) {
@@ -397,8 +391,8 @@ static const struct impl_match impl_match[] = {
 static int arm_cspmu_init_impl_ops(struct arm_cspmu *cspmu)
 {
        int ret;
-       struct acpi_apmt_node *apmt_node = cspmu->apmt_node;
        struct arm_cspmu_impl_ops *impl_ops = &cspmu->impl.ops;
+       struct acpi_apmt_node *apmt_node = arm_cspmu_apmt_node(cspmu->dev);
        const struct impl_match *match = impl_match;
 
        /*
@@ -720,7 +714,7 @@ static u64 arm_cspmu_read_counter(struct perf_event *event)
                offset = counter_offset(sizeof(u64), event->hw.idx);
                counter_addr = cspmu->base1 + offset;
 
-               return supports_64bit_atomics(cspmu) ?
+               return cspmu->has_atomic_dword ?
                               readq(counter_addr) :
                               read_reg64_hilohi(counter_addr, HILOHI_MAX_POLL);
        }
@@ -911,24 +905,18 @@ static struct arm_cspmu *arm_cspmu_alloc(struct platform_device *pdev)
 {
        struct acpi_apmt_node *apmt_node;
        struct arm_cspmu *cspmu;
-       struct device *dev;
-
-       dev = &pdev->dev;
-       apmt_node = *(struct acpi_apmt_node **)dev_get_platdata(dev);
-       if (!apmt_node) {
-               dev_err(dev, "failed to get APMT node\n");
-               return NULL;
-       }
+       struct device *dev = &pdev->dev;
 
        cspmu = devm_kzalloc(dev, sizeof(*cspmu), GFP_KERNEL);
        if (!cspmu)
                return NULL;
 
        cspmu->dev = dev;
-       cspmu->apmt_node = apmt_node;
-
        platform_set_drvdata(pdev, cspmu);
 
+       apmt_node = arm_cspmu_apmt_node(dev);
+       cspmu->has_atomic_dword = apmt_node->flags & ACPI_APMT_FLAGS_ATOMIC;
+
        return cspmu;
 }
 
@@ -936,11 +924,9 @@ static int arm_cspmu_init_mmio(struct arm_cspmu *cspmu)
 {
        struct device *dev;
        struct platform_device *pdev;
-       struct acpi_apmt_node *apmt_node;
 
        dev = cspmu->dev;
        pdev = to_platform_device(dev);
-       apmt_node = cspmu->apmt_node;
 
        /* Base address for page 0. */
        cspmu->base0 = devm_platform_ioremap_resource(pdev, 0);
@@ -951,7 +937,7 @@ static int arm_cspmu_init_mmio(struct arm_cspmu *cspmu)
 
        /* Base address for page 1 if supported. Otherwise point to page 0. */
        cspmu->base1 = cspmu->base0;
-       if (CHECK_APMT_FLAG(apmt_node->flags, DUAL_PAGE, SUPP)) {
+       if (platform_get_resource(pdev, IORESOURCE_MEM, 1)) {
                cspmu->base1 = devm_platform_ioremap_resource(pdev, 1);
                if (IS_ERR(cspmu->base1)) {
                        dev_err(dev, "ioremap failed for page-1 resource\n");
@@ -1048,19 +1034,14 @@ static int arm_cspmu_request_irq(struct arm_cspmu *cspmu)
        int irq, ret;
        struct device *dev;
        struct platform_device *pdev;
-       struct acpi_apmt_node *apmt_node;
 
        dev = cspmu->dev;
        pdev = to_platform_device(dev);
-       apmt_node = cspmu->apmt_node;
 
        /* Skip IRQ request if the PMU does not support overflow interrupt. */
-       if (apmt_node->ovflw_irq == 0)
-               return 0;
-
-       irq = platform_get_irq(pdev, 0);
+       irq = platform_get_irq_optional(pdev, 0);
        if (irq < 0)
-               return irq;
+               return irq == -ENXIO ? 0 : irq;
 
        ret = devm_request_irq(dev, irq, arm_cspmu_handle_irq,
                               IRQF_NOBALANCING | IRQF_NO_THREAD, dev_name(dev),
@@ -1075,6 +1056,9 @@ static int arm_cspmu_request_irq(struct arm_cspmu *cspmu)
        return 0;
 }
 
+#if defined(CONFIG_ACPI) && defined(CONFIG_ARM64)
+#include <acpi/processor.h>
+
 static inline int arm_cspmu_find_cpu_container(int cpu, u32 container_uid)
 {
        u32 acpi_uid;
@@ -1099,15 +1083,13 @@ static inline int arm_cspmu_find_cpu_container(int cpu, u32 container_uid)
        return -ENODEV;
 }
 
-static int arm_cspmu_get_cpus(struct arm_cspmu *cspmu)
+static int arm_cspmu_acpi_get_cpus(struct arm_cspmu *cspmu)
 {
-       struct device *dev;
        struct acpi_apmt_node *apmt_node;
        int affinity_flag;
        int cpu;
 
-       dev = cspmu->pmu.dev;
-       apmt_node = cspmu->apmt_node;
+       apmt_node = arm_cspmu_apmt_node(cspmu->dev);
        affinity_flag = apmt_node->flags & ACPI_APMT_FLAGS_AFFINITY;
 
        if (affinity_flag == ACPI_APMT_FLAGS_AFFINITY_PROC) {
@@ -1129,12 +1111,23 @@ static int arm_cspmu_get_cpus(struct arm_cspmu *cspmu)
        }
 
        if (cpumask_empty(&cspmu->associated_cpus)) {
-               dev_dbg(dev, "No cpu associated with the PMU\n");
+               dev_dbg(cspmu->dev, "No cpu associated with the PMU\n");
                return -ENODEV;
        }
 
        return 0;
 }
+#else
+static int arm_cspmu_acpi_get_cpus(struct arm_cspmu *cspmu)
+{
+       return -ENODEV;
+}
+#endif
+
+static int arm_cspmu_get_cpus(struct arm_cspmu *cspmu)
+{
+       return arm_cspmu_acpi_get_cpus(cspmu);
+}
 
 static int arm_cspmu_register_pmu(struct arm_cspmu *cspmu)
 {
@@ -1220,6 +1213,12 @@ static int arm_cspmu_device_remove(struct platform_device *pdev)
        return 0;
 }
 
+static const struct platform_device_id arm_cspmu_id[] = {
+       {DRVNAME, 0},
+       { },
+};
+MODULE_DEVICE_TABLE(platform, arm_cspmu_id);
+
 static struct platform_driver arm_cspmu_driver = {
        .driver = {
                        .name = DRVNAME,
@@ -1227,12 +1226,14 @@ static struct platform_driver arm_cspmu_driver = {
                },
        .probe = arm_cspmu_device_probe,
        .remove = arm_cspmu_device_remove,
+       .id_table = arm_cspmu_id,
 };
 
 static void arm_cspmu_set_active_cpu(int cpu, struct arm_cspmu *cspmu)
 {
        cpumask_set_cpu(cpu, &cspmu->active_cpu);
-       WARN_ON(irq_set_affinity(cspmu->irq, &cspmu->active_cpu));
+       if (cspmu->irq)
+               WARN_ON(irq_set_affinity(cspmu->irq, &cspmu->active_cpu));
 }
 
 static int arm_cspmu_cpu_online(unsigned int cpu, struct hlist_node *node)
index 51323b1..83df53d 100644 (file)
@@ -8,7 +8,6 @@
 #ifndef __ARM_CSPMU_H__
 #define __ARM_CSPMU_H__
 
-#include <linux/acpi.h>
 #include <linux/bitfield.h>
 #include <linux/cpumask.h>
 #include <linux/device.h>
@@ -118,16 +117,16 @@ struct arm_cspmu_impl {
 struct arm_cspmu {
        struct pmu pmu;
        struct device *dev;
-       struct acpi_apmt_node *apmt_node;
        const char *name;
        const char *identifier;
        void __iomem *base0;
        void __iomem *base1;
-       int irq;
        cpumask_t associated_cpus;
        cpumask_t active_cpu;
        struct hlist_node cpuhp_node;
+       int irq;
 
+       bool has_atomic_dword;
        u32 pmcfgr;
        u32 num_logical_ctrs;
        u32 num_set_clr_reg;
index 5de06f9..9d0f01c 100644 (file)
@@ -227,9 +227,31 @@ static const struct attribute_group dmc620_pmu_format_attr_group = {
        .attrs  = dmc620_pmu_formats_attrs,
 };
 
+static ssize_t dmc620_pmu_cpumask_show(struct device *dev,
+                                      struct device_attribute *attr, char *buf)
+{
+       struct dmc620_pmu *dmc620_pmu = to_dmc620_pmu(dev_get_drvdata(dev));
+
+       return cpumap_print_to_pagebuf(true, buf,
+                                      cpumask_of(dmc620_pmu->irq->cpu));
+}
+
+static struct device_attribute dmc620_pmu_cpumask_attr =
+       __ATTR(cpumask, 0444, dmc620_pmu_cpumask_show, NULL);
+
+static struct attribute *dmc620_pmu_cpumask_attrs[] = {
+       &dmc620_pmu_cpumask_attr.attr,
+       NULL,
+};
+
+static const struct attribute_group dmc620_pmu_cpumask_attr_group = {
+       .attrs = dmc620_pmu_cpumask_attrs,
+};
+
 static const struct attribute_group *dmc620_pmu_attr_groups[] = {
        &dmc620_pmu_events_attr_group,
        &dmc620_pmu_format_attr_group,
+       &dmc620_pmu_cpumask_attr_group,
        NULL,
 };
 
index 15bd1e3..277e29f 100644 (file)
@@ -109,6 +109,8 @@ static inline u64 arm_pmu_event_max_period(struct perf_event *event)
 {
        if (event->hw.flags & ARMPMU_EVT_64BIT)
                return GENMASK_ULL(63, 0);
+       else if (event->hw.flags & ARMPMU_EVT_63BIT)
+               return GENMASK_ULL(62, 0);
        else if (event->hw.flags & ARMPMU_EVT_47BIT)
                return GENMASK_ULL(46, 0);
        else
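
The new ARMPMU_EVT_63BIT flag slots into the max-period helper shown above: the maximum period is simply an all-ones mask over the usable counter width. A standalone sketch of the same selection (flag values here are arbitrary bit positions, and the fallback assumes the usual 32-bit counters):

#include <stdint.h>
#include <stdio.h>

#define EVT_64BIT (1u << 0)
#define EVT_63BIT (1u << 1)
#define EVT_47BIT (1u << 2)

/* Mirrors arm_pmu_event_max_period(): all-ones over the usable counter width. */
static uint64_t max_period(unsigned int flags)
{
        if (flags & EVT_64BIT)
                return ~0ULL;                    /* GENMASK_ULL(63, 0) */
        if (flags & EVT_63BIT)
                return (1ULL << 63) - 1;         /* GENMASK_ULL(62, 0) */
        if (flags & EVT_47BIT)
                return (1ULL << 47) - 1;         /* GENMASK_ULL(46, 0) */
        return (1ULL << 32) - 1;                 /* 32-bit counters otherwise */
}

int main(void)
{
        printf("%016llx\n", (unsigned long long)max_period(EVT_63BIT));
        return 0;
}
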
index c98e403..93b7edb 100644 (file)
@@ -677,9 +677,25 @@ static inline u32 armv8pmu_getreset_flags(void)
        return value;
 }
 
+static void update_pmuserenr(u64 val)
+{
+       lockdep_assert_irqs_disabled();
+
+       /*
+        * The current PMUSERENR_EL0 value might be the value for the guest.
+        * If that's the case, have KVM keep tracking of the register value
+        * for the host EL0 so that KVM can restore it before returning to
+        * the host EL0. Otherwise, update the register now.
+        */
+       if (kvm_set_pmuserenr(val))
+               return;
+
+       write_pmuserenr(val);
+}
+
 static void armv8pmu_disable_user_access(void)
 {
-       write_pmuserenr(0);
+       update_pmuserenr(0);
 }
 
 static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
@@ -695,8 +711,7 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
                        armv8pmu_write_evcntr(i, 0);
        }
 
-       write_pmuserenr(0);
-       write_pmuserenr(ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_CR);
+       update_pmuserenr(ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_CR);
 }
 
 static void armv8pmu_enable_event(struct perf_event *event)
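
The armv8pmu hunk routes PMUSERENR_EL0 updates through a small helper: if KVM claims the write (because the register currently holds the guest's value), the host value is only recorded for later restore; otherwise it is written immediately. A generic standalone sketch of that hand-off, using stand-in functions rather than the real KVM/sysreg interfaces:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool guest_active;
static uint64_t deferred_host_val;
static uint64_t hw_reg;

/* Stand-in for kvm_set_pmuserenr(): returns true if it took ownership of the update. */
static bool hyp_set_reg(uint64_t val)
{
        if (!guest_active)
                return false;
        deferred_host_val = val;         /* applied when returning to the host */
        return true;
}

static void update_reg(uint64_t val)
{
        if (hyp_set_reg(val))
                return;                  /* hypervisor will restore it later */
        hw_reg = val;                    /* no guest context: write the hardware now */
}

int main(void)
{
        update_reg(0);
        guest_active = true;
        update_reg(3);
        printf("hw=%llu deferred=%llu\n",
               (unsigned long long)hw_reg, (unsigned long long)deferred_host_val);
        return 0;
}
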
diff --git a/drivers/perf/fsl_imx9_ddr_perf.c b/drivers/perf/fsl_imx9_ddr_perf.c
new file mode 100644 (file)
index 0000000..71d5b07
--- /dev/null
@@ -0,0 +1,711 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright 2023 NXP
+
+#include <linux/bitfield.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/of_device.h>
+#include <linux/of_irq.h>
+#include <linux/perf_event.h>
+
+/* Performance monitor configuration */
+#define PMCFG1                         0x00
+#define PMCFG1_RD_TRANS_FILT_EN        BIT(31)
+#define PMCFG1_WR_TRANS_FILT_EN        BIT(30)
+#define PMCFG1_RD_BT_FILT_EN           BIT(29)
+#define PMCFG1_ID_MASK                 GENMASK(17, 0)
+
+#define PMCFG2                         0x04
+#define PMCFG2_ID                      GENMASK(17, 0)
+
+/* Global control register affects all counters and takes priority over local control registers */
+#define PMGC0          0x40
+/* Global control register bits */
+#define PMGC0_FAC      BIT(31)
+#define PMGC0_PMIE     BIT(30)
+#define PMGC0_FCECE    BIT(29)
+
+/*
+ * 64bit counter0 exclusively dedicated to counting cycles
+ * 32bit counters monitor counter-specific events in addition to counting reference events
+ */
+#define PMLCA(n)       (0x40 + 0x10 + (0x10 * n))
+#define PMLCB(n)       (0x40 + 0x14 + (0x10 * n))
+#define PMC(n)         (0x40 + 0x18 + (0x10 * n))
+/* Local control register bits */
+#define PMLCA_FC       BIT(31)
+#define PMLCA_CE       BIT(26)
+#define PMLCA_EVENT    GENMASK(22, 16)
+
+#define NUM_COUNTERS           11
+#define CYCLES_COUNTER         0
+
+#define to_ddr_pmu(p)          container_of(p, struct ddr_pmu, pmu)
+
+#define DDR_PERF_DEV_NAME      "imx9_ddr"
+#define DDR_CPUHP_CB_NAME      DDR_PERF_DEV_NAME "_perf_pmu"
+
+static DEFINE_IDA(ddr_ida);
+
+struct imx_ddr_devtype_data {
+       const char *identifier;         /* system PMU identifier for userspace */
+};
+
+struct ddr_pmu {
+       struct pmu pmu;
+       void __iomem *base;
+       unsigned int cpu;
+       struct hlist_node node;
+       struct device *dev;
+       struct perf_event *events[NUM_COUNTERS];
+       int active_events;
+       enum cpuhp_state cpuhp_state;
+       const struct imx_ddr_devtype_data *devtype_data;
+       int irq;
+       int id;
+};
+
+static const struct imx_ddr_devtype_data imx93_devtype_data = {
+       .identifier = "imx93",
+};
+
+static const struct of_device_id imx_ddr_pmu_dt_ids[] = {
+       {.compatible = "fsl,imx93-ddr-pmu", .data = &imx93_devtype_data},
+       { /* sentinel */ }
+};
+MODULE_DEVICE_TABLE(of, imx_ddr_pmu_dt_ids);
+
+static ssize_t ddr_perf_identifier_show(struct device *dev,
+                                       struct device_attribute *attr,
+                                       char *page)
+{
+       struct ddr_pmu *pmu = dev_get_drvdata(dev);
+
+       return sysfs_emit(page, "%s\n", pmu->devtype_data->identifier);
+}
+
+static struct device_attribute ddr_perf_identifier_attr =
+       __ATTR(identifier, 0444, ddr_perf_identifier_show, NULL);
+
+static struct attribute *ddr_perf_identifier_attrs[] = {
+       &ddr_perf_identifier_attr.attr,
+       NULL,
+};
+
+static struct attribute_group ddr_perf_identifier_attr_group = {
+       .attrs = ddr_perf_identifier_attrs,
+};
+
+static ssize_t ddr_perf_cpumask_show(struct device *dev,
+                                    struct device_attribute *attr, char *buf)
+{
+       struct ddr_pmu *pmu = dev_get_drvdata(dev);
+
+       return cpumap_print_to_pagebuf(true, buf, cpumask_of(pmu->cpu));
+}
+
+static struct device_attribute ddr_perf_cpumask_attr =
+       __ATTR(cpumask, 0444, ddr_perf_cpumask_show, NULL);
+
+static struct attribute *ddr_perf_cpumask_attrs[] = {
+       &ddr_perf_cpumask_attr.attr,
+       NULL,
+};
+
+static const struct attribute_group ddr_perf_cpumask_attr_group = {
+       .attrs = ddr_perf_cpumask_attrs,
+};
+
+static ssize_t ddr_pmu_event_show(struct device *dev,
+                                 struct device_attribute *attr, char *page)
+{
+       struct perf_pmu_events_attr *pmu_attr;
+
+       pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);
+       return sysfs_emit(page, "event=0x%02llx\n", pmu_attr->id);
+}
+
+#define IMX9_DDR_PMU_EVENT_ATTR(_name, _id)                            \
+       (&((struct perf_pmu_events_attr[]) {                            \
+               { .attr = __ATTR(_name, 0444, ddr_pmu_event_show, NULL),\
+                 .id = _id, }                                          \
+       })[0].attr.attr)
+
+static struct attribute *ddr_perf_events_attrs[] = {
+       /* counter0 cycles event */
+       IMX9_DDR_PMU_EVENT_ATTR(cycles, 0),
+
+       /* reference events for all normal counters; need to assert the DEBUG19[21] bit */
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_ddrc1_rmw_for_ecc, 12),
+       IMX9_DDR_PMU_EVENT_ATTR(eddrtq_pmon_rreorder, 13),
+       IMX9_DDR_PMU_EVENT_ATTR(eddrtq_pmon_wreorder, 14),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_pm_0, 15),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_pm_1, 16),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_pm_2, 17),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_pm_3, 18),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_pm_4, 19),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_pm_5, 22),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_pm_6, 23),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_pm_7, 24),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_pm_8, 25),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_pm_9, 26),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_pm_10, 27),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_pm_11, 28),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_pm_12, 31),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_pm_13, 59),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_pm_15, 61),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_pm_29, 63),
+
+       /* counter1 specific events */
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_ld_riq_0, 64),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_ld_riq_1, 65),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_ld_riq_2, 66),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_ld_riq_3, 67),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_ld_riq_4, 68),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_ld_riq_5, 69),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_ld_riq_6, 70),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_ld_riq_7, 71),
+
+       /* counter2 specific events */
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_ld_wiq_0, 64),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_ld_wiq_1, 65),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_ld_wiq_2, 66),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_ld_wiq_3, 67),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_ld_wiq_4, 68),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_ld_wiq_5, 69),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_ld_wiq_6, 70),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_ld_wiq_7, 71),
+       IMX9_DDR_PMU_EVENT_ATTR(eddrtq_pmon_empty, 72),
+       IMX9_DDR_PMU_EVENT_ATTR(eddrtq_pm_rd_trans_filt, 73),
+
+       /* counter3 specific events */
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_row_collision_0, 64),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_row_collision_1, 65),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_row_collision_2, 66),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_row_collision_3, 67),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_row_collision_4, 68),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_row_collision_5, 69),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_row_collision_6, 70),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_row_collision_7, 71),
+       IMX9_DDR_PMU_EVENT_ATTR(eddrtq_pmon_full, 72),
+       IMX9_DDR_PMU_EVENT_ATTR(eddrtq_pm_wr_trans_filt, 73),
+
+       /* counter4 specific events */
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_row_open_0, 64),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_row_open_1, 65),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_row_open_2, 66),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_row_open_3, 67),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_row_open_4, 68),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_row_open_5, 69),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_row_open_6, 70),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_row_open_7, 71),
+       IMX9_DDR_PMU_EVENT_ATTR(eddrtq_pmon_ld_rdq2_rmw, 72),
+       IMX9_DDR_PMU_EVENT_ATTR(eddrtq_pm_rd_beat_filt, 73),
+
+       /* counter5 specific events */
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_valid_start_0, 64),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_valid_start_1, 65),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_valid_start_2, 66),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_valid_start_3, 67),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_valid_start_4, 68),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_valid_start_5, 69),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_valid_start_6, 70),
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_valid_start_7, 71),
+       IMX9_DDR_PMU_EVENT_ATTR(eddrtq_pmon_ld_rdq1, 72),
+
+       /* counter6 specific events */
+       IMX9_DDR_PMU_EVENT_ATTR(ddrc_qx_valid_end_0, 64),
+       IMX9_DDR_PMU_EVENT_ATTR(eddrtq_pmon_ld_rdq2, 72),
+
+       /* counter7 specific events */
+       IMX9_DDR_PMU_EVENT_ATTR(eddrtq_pmon_1_2_full, 64),
+       IMX9_DDR_PMU_EVENT_ATTR(eddrtq_pmon_ld_wrq0, 65),
+
+       /* counter8 specific events */
+       IMX9_DDR_PMU_EVENT_ATTR(eddrtq_pmon_bias_switched, 64),
+       IMX9_DDR_PMU_EVENT_ATTR(eddrtq_pmon_1_4_full, 65),
+
+       /* counter9 specific events */
+       IMX9_DDR_PMU_EVENT_ATTR(eddrtq_pmon_ld_wrq1, 65),
+       IMX9_DDR_PMU_EVENT_ATTR(eddrtq_pmon_3_4_full, 66),
+
+       /* counter10 specific events */
+       IMX9_DDR_PMU_EVENT_ATTR(eddrtq_pmon_misc_mrk, 65),
+       IMX9_DDR_PMU_EVENT_ATTR(eddrtq_pmon_ld_rdq0, 66),
+       NULL,
+};
+
+static const struct attribute_group ddr_perf_events_attr_group = {
+       .name = "events",
+       .attrs = ddr_perf_events_attrs,
+};
+
+PMU_FORMAT_ATTR(event, "config:0-7");
+PMU_FORMAT_ATTR(counter, "config:8-15");
+PMU_FORMAT_ATTR(axi_id, "config1:0-17");
+PMU_FORMAT_ATTR(axi_mask, "config2:0-17");
+
+static struct attribute *ddr_perf_format_attrs[] = {
+       &format_attr_event.attr,
+       &format_attr_counter.attr,
+       &format_attr_axi_id.attr,
+       &format_attr_axi_mask.attr,
+       NULL,
+};
+
+static const struct attribute_group ddr_perf_format_attr_group = {
+       .name = "format",
+       .attrs = ddr_perf_format_attrs,
+};
+
+static const struct attribute_group *attr_groups[] = {
+       &ddr_perf_identifier_attr_group,
+       &ddr_perf_cpumask_attr_group,
+       &ddr_perf_events_attr_group,
+       &ddr_perf_format_attr_group,
+       NULL,
+};
+
+static void ddr_perf_clear_counter(struct ddr_pmu *pmu, int counter)
+{
+       if (counter == CYCLES_COUNTER) {
+               writel(0, pmu->base + PMC(counter) + 0x4);
+               writel(0, pmu->base + PMC(counter));
+       } else {
+               writel(0, pmu->base + PMC(counter));
+       }
+}
+
+static u64 ddr_perf_read_counter(struct ddr_pmu *pmu, int counter)
+{
+       u32 val_lower, val_upper;
+       u64 val;
+
+       if (counter != CYCLES_COUNTER) {
+               val = readl_relaxed(pmu->base + PMC(counter));
+               goto out;
+       }
+
+       /* special handling for reading 64bit cycle counter */
+       do {
+               val_upper = readl_relaxed(pmu->base + PMC(counter) + 0x4);
+               val_lower = readl_relaxed(pmu->base + PMC(counter));
+       } while (val_upper != readl_relaxed(pmu->base + PMC(counter) + 0x4));
+
+       val = val_upper;
+       val = (val << 32);
+       val |= val_lower;
+out:
+       return val;
+}
+
+static void ddr_perf_counter_global_config(struct ddr_pmu *pmu, bool enable)
+{
+       u32 ctrl;
+
+       ctrl = readl_relaxed(pmu->base + PMGC0);
+
+       if (enable) {
+               /*
+                * The performance monitor must be reset before event counting
+                * sequences. The performance monitor can be reset by first freezing
+                * one or more counters and then clearing the freeze condition to
+                * allow the counters to count according to the settings in the
+                * performance monitor registers. Counters can be frozen individually
+                * by setting PMLCAn[FC] bits, or simultaneously by setting PMGC0[FAC].
+                * Simply clearing these freeze bits will then allow the performance
+                * monitor to begin counting based on the register settings.
+                */
+               ctrl |= PMGC0_FAC;
+               writel(ctrl, pmu->base + PMGC0);
+
+               /*
+                * Clear the freeze-all bit, enable the interrupt, and enable
+                * freezing the counters on an overflow condition.
+                */
+               ctrl &= ~PMGC0_FAC;
+               ctrl |= PMGC0_PMIE | PMGC0_FCECE;
+               writel(ctrl, pmu->base + PMGC0);
+       } else {
+               ctrl |= PMGC0_FAC;
+               ctrl &= ~(PMGC0_PMIE | PMGC0_FCECE);
+               writel(ctrl, pmu->base + PMGC0);
+       }
+}
+
+static void ddr_perf_counter_local_config(struct ddr_pmu *pmu, int config,
+                                   int counter, bool enable)
+{
+       u32 ctrl_a;
+
+       ctrl_a = readl_relaxed(pmu->base + PMLCA(counter));
+
+       if (enable) {
+               ctrl_a |= PMLCA_FC;
+               writel(ctrl_a, pmu->base + PMLCA(counter));
+
+               ddr_perf_clear_counter(pmu, counter);
+
+               /* Unfreeze the counter, enable freeze-on-condition, and program the event. */
+               ctrl_a &= ~PMLCA_FC;
+               ctrl_a |= PMLCA_CE;
+               ctrl_a &= ~FIELD_PREP(PMLCA_EVENT, 0x7F);
+               ctrl_a |= FIELD_PREP(PMLCA_EVENT, (config & 0x000000FF));
+               writel(ctrl_a, pmu->base + PMLCA(counter));
+       } else {
+               /* Freeze counter. */
+               ctrl_a |= PMLCA_FC;
+               writel(ctrl_a, pmu->base + PMLCA(counter));
+       }
+}
+
+static void ddr_perf_monitor_config(struct ddr_pmu *pmu, int cfg, int cfg1, int cfg2)
+{
+       u32 pmcfg1, pmcfg2;
+       int event, counter;
+
+       event = cfg & 0x000000FF;
+       counter = (cfg & 0x0000FF00) >> 8;
+
+       pmcfg1 = readl_relaxed(pmu->base + PMCFG1);
+
+       if (counter == 2 && event == 73)
+               pmcfg1 |= PMCFG1_RD_TRANS_FILT_EN;
+       else if (counter == 2 && event != 73)
+               pmcfg1 &= ~PMCFG1_RD_TRANS_FILT_EN;
+
+       if (counter == 3 && event == 73)
+               pmcfg1 |= PMCFG1_WR_TRANS_FILT_EN;
+       else if (counter == 3 && event != 73)
+               pmcfg1 &= ~PMCFG1_WR_TRANS_FILT_EN;
+
+       if (counter == 4 && event == 73)
+               pmcfg1 |= PMCFG1_RD_BT_FILT_EN;
+       else if (counter == 4 && event != 73)
+               pmcfg1 &= ~PMCFG1_RD_BT_FILT_EN;
+
+       pmcfg1 &= ~FIELD_PREP(PMCFG1_ID_MASK, 0x3FFFF);
+       pmcfg1 |= FIELD_PREP(PMCFG1_ID_MASK, cfg2);
+       writel(pmcfg1, pmu->base + PMCFG1);
+
+       pmcfg2 = readl_relaxed(pmu->base + PMCFG2);
+       pmcfg2 &= ~FIELD_PREP(PMCFG2_ID, 0x3FFFF);
+       pmcfg2 |= FIELD_PREP(PMCFG2_ID, cfg1);
+       writel(pmcfg2, pmu->base + PMCFG2);
+}
+
+static void ddr_perf_event_update(struct perf_event *event)
+{
+       struct ddr_pmu *pmu = to_ddr_pmu(event->pmu);
+       struct hw_perf_event *hwc = &event->hw;
+       int counter = hwc->idx;
+       u64 new_raw_count;
+
+       new_raw_count = ddr_perf_read_counter(pmu, counter);
+       local64_add(new_raw_count, &event->count);
+
+       /* clear counter's value every time */
+       ddr_perf_clear_counter(pmu, counter);
+}
+
+static int ddr_perf_event_init(struct perf_event *event)
+{
+       struct ddr_pmu *pmu = to_ddr_pmu(event->pmu);
+       struct hw_perf_event *hwc = &event->hw;
+       struct perf_event *sibling;
+
+       if (event->attr.type != event->pmu->type)
+               return -ENOENT;
+
+       if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
+               return -EOPNOTSUPP;
+
+       if (event->cpu < 0) {
+               dev_warn(pmu->dev, "Can't provide per-task data!\n");
+               return -EOPNOTSUPP;
+       }
+
+       /*
+        * We must NOT create groups containing mixed PMUs, although software
+        * events are acceptable (for example to create a group that is
+        * read periodically when an hrtimer, i.e. a cpu-clock leader, triggers).
+        */
+       if (event->group_leader->pmu != event->pmu &&
+                       !is_software_event(event->group_leader))
+               return -EINVAL;
+
+       for_each_sibling_event(sibling, event->group_leader) {
+               if (sibling->pmu != event->pmu &&
+                               !is_software_event(sibling))
+                       return -EINVAL;
+       }
+
+       event->cpu = pmu->cpu;
+       hwc->idx = -1;
+
+       return 0;
+}
+
+static void ddr_perf_event_start(struct perf_event *event, int flags)
+{
+       struct ddr_pmu *pmu = to_ddr_pmu(event->pmu);
+       struct hw_perf_event *hwc = &event->hw;
+       int counter = hwc->idx;
+
+       local64_set(&hwc->prev_count, 0);
+
+       ddr_perf_counter_local_config(pmu, event->attr.config, counter, true);
+       hwc->state = 0;
+}
+
+static int ddr_perf_event_add(struct perf_event *event, int flags)
+{
+       struct ddr_pmu *pmu = to_ddr_pmu(event->pmu);
+       struct hw_perf_event *hwc = &event->hw;
+       int cfg = event->attr.config;
+       int cfg1 = event->attr.config1;
+       int cfg2 = event->attr.config2;
+       int counter;
+
+       counter = (cfg & 0x0000FF00) >> 8;
+
+       pmu->events[counter] = event;
+       pmu->active_events++;
+       hwc->idx = counter;
+       hwc->state |= PERF_HES_STOPPED;
+
+       if (flags & PERF_EF_START)
+               ddr_perf_event_start(event, flags);
+
+       /* Configure the read-transaction, write-transaction and read-beat filters */
+       ddr_perf_monitor_config(pmu, cfg, cfg1, cfg2);
+
+       return 0;
+}
+
+static void ddr_perf_event_stop(struct perf_event *event, int flags)
+{
+       struct ddr_pmu *pmu = to_ddr_pmu(event->pmu);
+       struct hw_perf_event *hwc = &event->hw;
+       int counter = hwc->idx;
+
+       ddr_perf_counter_local_config(pmu, event->attr.config, counter, false);
+       ddr_perf_event_update(event);
+
+       hwc->state |= PERF_HES_STOPPED;
+}
+
+static void ddr_perf_event_del(struct perf_event *event, int flags)
+{
+       struct ddr_pmu *pmu = to_ddr_pmu(event->pmu);
+       struct hw_perf_event *hwc = &event->hw;
+
+       ddr_perf_event_stop(event, PERF_EF_UPDATE);
+
+       pmu->active_events--;
+       hwc->idx = -1;
+}
+
+static void ddr_perf_pmu_enable(struct pmu *pmu)
+{
+       struct ddr_pmu *ddr_pmu = to_ddr_pmu(pmu);
+
+       ddr_perf_counter_global_config(ddr_pmu, true);
+}
+
+static void ddr_perf_pmu_disable(struct pmu *pmu)
+{
+       struct ddr_pmu *ddr_pmu = to_ddr_pmu(pmu);
+
+       ddr_perf_counter_global_config(ddr_pmu, false);
+}
+
+static void ddr_perf_init(struct ddr_pmu *pmu, void __iomem *base,
+                        struct device *dev)
+{
+       *pmu = (struct ddr_pmu) {
+               .pmu = (struct pmu) {
+                       .module       = THIS_MODULE,
+                       .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
+                       .task_ctx_nr  = perf_invalid_context,
+                       .attr_groups  = attr_groups,
+                       .event_init   = ddr_perf_event_init,
+                       .add          = ddr_perf_event_add,
+                       .del          = ddr_perf_event_del,
+                       .start        = ddr_perf_event_start,
+                       .stop         = ddr_perf_event_stop,
+                       .read         = ddr_perf_event_update,
+                       .pmu_enable   = ddr_perf_pmu_enable,
+                       .pmu_disable  = ddr_perf_pmu_disable,
+               },
+               .base = base,
+               .dev = dev,
+       };
+}
+
+static irqreturn_t ddr_perf_irq_handler(int irq, void *p)
+{
+       struct ddr_pmu *pmu = (struct ddr_pmu *)p;
+       struct perf_event *event;
+       int i;
+
+       /*
+        * Counters can generate an interrupt on an overflow when the MSB of
+        * a counter changes from 0 to 1. For the interrupt to be signalled,
+        * the following conditions must be satisfied:
+        * PMGC0[PMIE] = 1, PMGC0[FCECE] = 1, PMLCAn[CE] = 1
+        * When an interrupt is signalled, PMGC0[FAC] is set by hardware and
+        * all of the registers are frozen.
+        * Software can clear the interrupt condition by resetting the
+        * performance monitor and clearing the most significant bit of the
+        * counter that generated the overflow.
+        */
+       for (i = 0; i < NUM_COUNTERS; i++) {
+               if (!pmu->events[i])
+                       continue;
+
+               event = pmu->events[i];
+
+               ddr_perf_event_update(event);
+       }
+
+       ddr_perf_counter_global_config(pmu, true);
+
+       return IRQ_HANDLED;
+}
+
+static int ddr_perf_offline_cpu(unsigned int cpu, struct hlist_node *node)
+{
+       struct ddr_pmu *pmu = hlist_entry_safe(node, struct ddr_pmu, node);
+       int target;
+
+       if (cpu != pmu->cpu)
+               return 0;
+
+       target = cpumask_any_but(cpu_online_mask, cpu);
+       if (target >= nr_cpu_ids)
+               return 0;
+
+       perf_pmu_migrate_context(&pmu->pmu, cpu, target);
+       pmu->cpu = target;
+
+       WARN_ON(irq_set_affinity(pmu->irq, cpumask_of(pmu->cpu)));
+
+       return 0;
+}
+
+static int ddr_perf_probe(struct platform_device *pdev)
+{
+       struct ddr_pmu *pmu;
+       void __iomem *base;
+       int ret, irq;
+       char *name;
+
+       base = devm_platform_ioremap_resource(pdev, 0);
+       if (IS_ERR(base))
+               return PTR_ERR(base);
+
+       pmu = devm_kzalloc(&pdev->dev, sizeof(*pmu), GFP_KERNEL);
+       if (!pmu)
+               return -ENOMEM;
+
+       ddr_perf_init(pmu, base, &pdev->dev);
+
+       pmu->devtype_data = of_device_get_match_data(&pdev->dev);
+
+       platform_set_drvdata(pdev, pmu);
+
+       pmu->id = ida_simple_get(&ddr_ida, 0, 0, GFP_KERNEL);
+       name = devm_kasprintf(&pdev->dev, GFP_KERNEL, DDR_PERF_DEV_NAME "%d", pmu->id);
+       if (!name) {
+               ret = -ENOMEM;
+               goto format_string_err;
+       }
+
+       pmu->cpu = raw_smp_processor_id();
+       ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DDR_CPUHP_CB_NAME,
+                                     NULL, ddr_perf_offline_cpu);
+       if (ret < 0) {
+               dev_err(&pdev->dev, "Failed to add callbacks for multi state\n");
+               goto cpuhp_state_err;
+       }
+       pmu->cpuhp_state = ret;
+
+       /* Register the pmu instance for cpu hotplug */
+       ret = cpuhp_state_add_instance_nocalls(pmu->cpuhp_state, &pmu->node);
+       if (ret) {
+               dev_err(&pdev->dev, "Error %d registering hotplug\n", ret);
+               goto cpuhp_instance_err;
+       }
+
+       /* Request irq */
+       irq = platform_get_irq(pdev, 0);
+       if (irq < 0) {
+               ret = irq;
+               goto ddr_perf_err;
+       }
+
+       ret = devm_request_irq(&pdev->dev, irq, ddr_perf_irq_handler,
+                              IRQF_NOBALANCING | IRQF_NO_THREAD,
+                              DDR_CPUHP_CB_NAME, pmu);
+       if (ret < 0) {
+               dev_err(&pdev->dev, "Request irq failed: %d", ret);
+               goto ddr_perf_err;
+       }
+
+       pmu->irq = irq;
+       ret = irq_set_affinity(pmu->irq, cpumask_of(pmu->cpu));
+       if (ret) {
+               dev_err(pmu->dev, "Failed to set interrupt affinity\n");
+               goto ddr_perf_err;
+       }
+
+       ret = perf_pmu_register(&pmu->pmu, name, -1);
+       if (ret)
+               goto ddr_perf_err;
+
+       return 0;
+
+ddr_perf_err:
+       cpuhp_state_remove_instance_nocalls(pmu->cpuhp_state, &pmu->node);
+cpuhp_instance_err:
+       cpuhp_remove_multi_state(pmu->cpuhp_state);
+cpuhp_state_err:
+format_string_err:
+       ida_simple_remove(&ddr_ida, pmu->id);
+       dev_warn(&pdev->dev, "i.MX9 DDR Perf PMU failed (%d), disabled\n", ret);
+       return ret;
+}
+
+static int ddr_perf_remove(struct platform_device *pdev)
+{
+       struct ddr_pmu *pmu = platform_get_drvdata(pdev);
+
+       cpuhp_state_remove_instance_nocalls(pmu->cpuhp_state, &pmu->node);
+       cpuhp_remove_multi_state(pmu->cpuhp_state);
+
+       perf_pmu_unregister(&pmu->pmu);
+
+       ida_simple_remove(&ddr_ida, pmu->id);
+
+       return 0;
+}
+
+static struct platform_driver imx_ddr_pmu_driver = {
+       .driver         = {
+               .name                = "imx9-ddr-pmu",
+               .of_match_table      = imx_ddr_pmu_dt_ids,
+               .suppress_bind_attrs = true,
+       },
+       .probe          = ddr_perf_probe,
+       .remove         = ddr_perf_remove,
+};
+module_platform_driver(imx_ddr_pmu_driver);
+
+MODULE_AUTHOR("Xu Yang <xu.yang_2@nxp.com>");
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("DDRC PerfMon for i.MX9 SoCs");
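The sysfs format strings above define how userspace encodes an i.MX9 DDR PMU event: bits 0-7 of perf_event_attr::config carry the event code, bits 8-15 select the target counter (ddr_perf_event_add() decodes them the same way), and config1/config2 carry the AXI ID and mask consumed by ddr_perf_monitor_config(). Below is a minimal userspace sketch assuming only that field layout; the helper name and the chosen event, counter and filter values are illustrative, not part of the driver.

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Illustrative only: open eddrtq_pm_rd_trans_filt (event 73 / 0x49) on
 * counter 2 with an AXI ID filter, mirroring the "config:0-7",
 * "config:8-15", "config1:0-17" and "config2:0-17" format strings.
 * pmu_type is the value read from
 * /sys/bus/event_source/devices/imx9_ddr0/type.
 */
static int open_imx9_ddr_event(int pmu_type)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = pmu_type;
	attr.config = 0x49 | (2 << 8);	/* event in bits 0-7, counter in bits 8-15 */
	attr.config1 = 0x10;		/* axi_id to match */
	attr.config2 = 0x3ff;		/* axi_mask */

	/* pid = -1, cpu = 0; a real tool would use the PMU's cpumask attribute. */
	return syscall(SYS_perf_event_open, &attr, -1, 0, -1, 0);
}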
index 4d2c9ab..48dcc83 100644 (file)
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-$(CONFIG_HISI_PMU) += hisi_uncore_pmu.o hisi_uncore_l3c_pmu.o \
                          hisi_uncore_hha_pmu.o hisi_uncore_ddrc_pmu.o hisi_uncore_sllc_pmu.o \
-                         hisi_uncore_pa_pmu.o hisi_uncore_cpa_pmu.o
+                         hisi_uncore_pa_pmu.o hisi_uncore_cpa_pmu.o hisi_uncore_uc_pmu.o
 
 obj-$(CONFIG_HISI_PCIE_PMU) += hisi_pcie_pmu.o
 obj-$(CONFIG_HNS3_PMU) += hns3_pmu.o
index 6fee0b6..e10fc7c 100644 (file)
@@ -683,7 +683,7 @@ static int hisi_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 
        pcie_pmu->on_cpu = -1;
        /* Choose a new CPU from all online cpus. */
-       target = cpumask_first(cpu_online_mask);
+       target = cpumask_any_but(cpu_online_mask, cpu);
        if (target >= nr_cpu_ids) {
                pci_err(pcie_pmu->pdev, "There is no CPU to set\n");
                return 0;
index 71b6687..d941e74 100644 (file)
 #define PA_TT_CTRL                     0x1c08
 #define PA_TGTID_CTRL                  0x1c14
 #define PA_SRCID_CTRL                  0x1c18
+
+/* H32 PA interrupt registers */
 #define PA_INT_MASK                    0x1c70
 #define PA_INT_STATUS                  0x1c78
 #define PA_INT_CLEAR                   0x1c7c
+
+#define H60PA_INT_STATUS               0x1c70
+#define H60PA_INT_MASK                 0x1c74
+
 #define PA_EVENT_TYPE0                 0x1c80
 #define PA_PMU_VERSION                 0x1cf0
 #define PA_EVENT_CNT0_L                        0x1d00
@@ -46,6 +52,12 @@ HISI_PMU_EVENT_ATTR_EXTRACTOR(srcid_cmd, config1, 32, 22);
 HISI_PMU_EVENT_ATTR_EXTRACTOR(srcid_msk, config1, 43, 33);
 HISI_PMU_EVENT_ATTR_EXTRACTOR(tracetag_en, config1, 44, 44);
 
+struct hisi_pa_pmu_int_regs {
+       u32 mask_offset;
+       u32 clear_offset;
+       u32 status_offset;
+};
+
 static void hisi_pa_pmu_enable_tracetag(struct perf_event *event)
 {
        struct hisi_pmu *pa_pmu = to_hisi_pmu(event->pmu);
@@ -219,40 +231,40 @@ static void hisi_pa_pmu_disable_counter(struct hisi_pmu *pa_pmu,
 static void hisi_pa_pmu_enable_counter_int(struct hisi_pmu *pa_pmu,
                                           struct hw_perf_event *hwc)
 {
+       struct hisi_pa_pmu_int_regs *regs = pa_pmu->dev_info->private;
        u32 val;
 
        /* Write 0 to enable interrupt */
-       val = readl(pa_pmu->base + PA_INT_MASK);
+       val = readl(pa_pmu->base + regs->mask_offset);
        val &= ~(1 << hwc->idx);
-       writel(val, pa_pmu->base + PA_INT_MASK);
+       writel(val, pa_pmu->base + regs->mask_offset);
 }
 
 static void hisi_pa_pmu_disable_counter_int(struct hisi_pmu *pa_pmu,
                                            struct hw_perf_event *hwc)
 {
+       struct hisi_pa_pmu_int_regs *regs = pa_pmu->dev_info->private;
        u32 val;
 
        /* Write 1 to mask interrupt */
-       val = readl(pa_pmu->base + PA_INT_MASK);
+       val = readl(pa_pmu->base + regs->mask_offset);
        val |= 1 << hwc->idx;
-       writel(val, pa_pmu->base + PA_INT_MASK);
+       writel(val, pa_pmu->base + regs->mask_offset);
 }
 
 static u32 hisi_pa_pmu_get_int_status(struct hisi_pmu *pa_pmu)
 {
-       return readl(pa_pmu->base + PA_INT_STATUS);
+       struct hisi_pa_pmu_int_regs *regs = pa_pmu->dev_info->private;
+
+       return readl(pa_pmu->base + regs->status_offset);
 }
 
 static void hisi_pa_pmu_clear_int_status(struct hisi_pmu *pa_pmu, int idx)
 {
-       writel(1 << idx, pa_pmu->base + PA_INT_CLEAR);
-}
+       struct hisi_pa_pmu_int_regs *regs = pa_pmu->dev_info->private;
 
-static const struct acpi_device_id hisi_pa_pmu_acpi_match[] = {
-       { "HISI0273", },
-       {}
-};
-MODULE_DEVICE_TABLE(acpi, hisi_pa_pmu_acpi_match);
+       writel(1 << idx, pa_pmu->base + regs->clear_offset);
+}
 
 static int hisi_pa_pmu_init_data(struct platform_device *pdev,
                                   struct hisi_pmu *pa_pmu)
@@ -276,6 +288,10 @@ static int hisi_pa_pmu_init_data(struct platform_device *pdev,
        pa_pmu->ccl_id = -1;
        pa_pmu->sccl_id = -1;
 
+       pa_pmu->dev_info = device_get_match_data(&pdev->dev);
+       if (!pa_pmu->dev_info)
+               return -ENODEV;
+
        pa_pmu->base = devm_platform_ioremap_resource(pdev, 0);
        if (IS_ERR(pa_pmu->base)) {
                dev_err(&pdev->dev, "ioremap failed for pa_pmu resource.\n");
@@ -314,6 +330,32 @@ static const struct attribute_group hisi_pa_pmu_v2_events_group = {
        .attrs = hisi_pa_pmu_v2_events_attr,
 };
 
+static struct attribute *hisi_pa_pmu_v3_events_attr[] = {
+       HISI_PMU_EVENT_ATTR(tx_req,     0x0),
+       HISI_PMU_EVENT_ATTR(tx_dat,     0x1),
+       HISI_PMU_EVENT_ATTR(tx_snp,     0x2),
+       HISI_PMU_EVENT_ATTR(rx_req,     0x7),
+       HISI_PMU_EVENT_ATTR(rx_dat,     0x8),
+       HISI_PMU_EVENT_ATTR(rx_snp,     0x9),
+       NULL
+};
+
+static const struct attribute_group hisi_pa_pmu_v3_events_group = {
+       .name = "events",
+       .attrs = hisi_pa_pmu_v3_events_attr,
+};
+
+static struct attribute *hisi_h60pa_pmu_events_attr[] = {
+       HISI_PMU_EVENT_ATTR(rx_flit,    0x50),
+       HISI_PMU_EVENT_ATTR(tx_flit,    0x65),
+       NULL
+};
+
+static const struct attribute_group hisi_h60pa_pmu_events_group = {
+       .name = "events",
+       .attrs = hisi_h60pa_pmu_events_attr,
+};
+
 static DEVICE_ATTR(cpumask, 0444, hisi_cpumask_sysfs_show, NULL);
 
 static struct attribute *hisi_pa_pmu_cpumask_attrs[] = {
@@ -337,6 +379,12 @@ static const struct attribute_group hisi_pa_pmu_identifier_group = {
        .attrs = hisi_pa_pmu_identifier_attrs,
 };
 
+static struct hisi_pa_pmu_int_regs hisi_pa_pmu_regs = {
+       .mask_offset = PA_INT_MASK,
+       .clear_offset = PA_INT_CLEAR,
+       .status_offset = PA_INT_STATUS,
+};
+
 static const struct attribute_group *hisi_pa_pmu_v2_attr_groups[] = {
        &hisi_pa_pmu_v2_format_group,
        &hisi_pa_pmu_v2_events_group,
@@ -345,6 +393,46 @@ static const struct attribute_group *hisi_pa_pmu_v2_attr_groups[] = {
        NULL
 };
 
+static const struct hisi_pmu_dev_info hisi_h32pa_v2 = {
+       .name = "pa",
+       .attr_groups = hisi_pa_pmu_v2_attr_groups,
+       .private = &hisi_pa_pmu_regs,
+};
+
+static const struct attribute_group *hisi_pa_pmu_v3_attr_groups[] = {
+       &hisi_pa_pmu_v2_format_group,
+       &hisi_pa_pmu_v3_events_group,
+       &hisi_pa_pmu_cpumask_attr_group,
+       &hisi_pa_pmu_identifier_group,
+       NULL
+};
+
+static const struct hisi_pmu_dev_info hisi_h32pa_v3 = {
+       .name = "pa",
+       .attr_groups = hisi_pa_pmu_v3_attr_groups,
+       .private = &hisi_pa_pmu_regs,
+};
+
+static struct hisi_pa_pmu_int_regs hisi_h60pa_pmu_regs = {
+       .mask_offset = H60PA_INT_MASK,
+       .clear_offset = H60PA_INT_STATUS, /* Clear on write */
+       .status_offset = H60PA_INT_STATUS,
+};
+
+static const struct attribute_group *hisi_h60pa_pmu_attr_groups[] = {
+       &hisi_pa_pmu_v2_format_group,
+       &hisi_h60pa_pmu_events_group,
+       &hisi_pa_pmu_cpumask_attr_group,
+       &hisi_pa_pmu_identifier_group,
+       NULL
+};
+
+static const struct hisi_pmu_dev_info hisi_h60pa = {
+       .name = "h60pa",
+       .attr_groups = hisi_h60pa_pmu_attr_groups,
+       .private = &hisi_h60pa_pmu_regs,
+};
+
 static const struct hisi_uncore_ops hisi_uncore_pa_ops = {
        .write_evtype           = hisi_pa_pmu_write_evtype,
        .get_event_idx          = hisi_uncore_pmu_get_event_idx,
@@ -375,7 +463,7 @@ static int hisi_pa_pmu_dev_probe(struct platform_device *pdev,
        if (ret)
                return ret;
 
-       pa_pmu->pmu_events.attr_groups = hisi_pa_pmu_v2_attr_groups;
+       pa_pmu->pmu_events.attr_groups = pa_pmu->dev_info->attr_groups;
        pa_pmu->num_counters = PA_NR_COUNTERS;
        pa_pmu->ops = &hisi_uncore_pa_ops;
        pa_pmu->check_event = 0xB0;
@@ -400,8 +488,9 @@ static int hisi_pa_pmu_probe(struct platform_device *pdev)
        if (ret)
                return ret;
 
-       name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "hisi_sicl%u_pa%u",
-                             pa_pmu->sicl_id, pa_pmu->index_id);
+       name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "hisi_sicl%d_%s%u",
+                             pa_pmu->sicl_id, pa_pmu->dev_info->name,
+                             pa_pmu->index_id);
        if (!name)
                return -ENOMEM;
 
@@ -435,6 +524,14 @@ static int hisi_pa_pmu_remove(struct platform_device *pdev)
        return 0;
 }
 
+static const struct acpi_device_id hisi_pa_pmu_acpi_match[] = {
+       { "HISI0273", (kernel_ulong_t)&hisi_h32pa_v2 },
+       { "HISI0275", (kernel_ulong_t)&hisi_h32pa_v3 },
+       { "HISI0274", (kernel_ulong_t)&hisi_h60pa },
+       {}
+};
+MODULE_DEVICE_TABLE(acpi, hisi_pa_pmu_acpi_match);
+
 static struct platform_driver hisi_pa_pmu_driver = {
        .driver = {
                .name = "hisi_pa_pmu",
index 2823f38..0403145 100644 (file)
@@ -20,7 +20,6 @@
 
 #include "hisi_uncore_pmu.h"
 
-#define HISI_GET_EVENTID(ev) (ev->hw.config_base & 0xff)
 #define HISI_MAX_PERIOD(nr) (GENMASK_ULL((nr) - 1, 0))
 
 /*
@@ -226,6 +225,9 @@ int hisi_uncore_pmu_event_init(struct perf_event *event)
        hwc->idx                = -1;
        hwc->config_base        = event->attr.config;
 
+       if (hisi_pmu->ops->check_filter && hisi_pmu->ops->check_filter(event))
+               return -EINVAL;
+
        /* Enforce to use the same CPU for all events in this PMU */
        event->cpu = hisi_pmu->on_cpu;
 
index 07890a8..92402aa 100644 (file)
                return FIELD_GET(GENMASK_ULL(hi, lo), event->attr.config);  \
        }
 
+#define HISI_GET_EVENTID(ev) (ev->hw.config_base & 0xff)
+
+#define HISI_PMU_EVTYPE_BITS           8
+#define HISI_PMU_EVTYPE_SHIFT(idx)     ((idx) % 4 * HISI_PMU_EVTYPE_BITS)
+
 struct hisi_pmu;
 
 struct hisi_uncore_ops {
+       int (*check_filter)(struct perf_event *event);
        void (*write_evtype)(struct hisi_pmu *, int, u32);
        int (*get_event_idx)(struct perf_event *);
        u64 (*read_counter)(struct hisi_pmu *, struct hw_perf_event *);
@@ -62,6 +68,13 @@ struct hisi_uncore_ops {
        void (*disable_filter)(struct perf_event *event);
 };
 
+/* Describes the per-chip HiSilicon PMU feature information */
+struct hisi_pmu_dev_info {
+       const char *name;
+       const struct attribute_group **attr_groups;
+       void *private;
+};
+
 struct hisi_pmu_hwevents {
        struct perf_event *hw_events[HISI_MAX_COUNTERS];
        DECLARE_BITMAP(used_mask, HISI_MAX_COUNTERS);
@@ -72,6 +85,7 @@ struct hisi_pmu_hwevents {
 struct hisi_pmu {
        struct pmu pmu;
        const struct hisi_uncore_ops *ops;
+       const struct hisi_pmu_dev_info *dev_info;
        struct hisi_pmu_hwevents pmu_events;
        /* associated_cpus: All CPUs associated with the PMU */
        cpumask_t associated_cpus;
diff --git a/drivers/perf/hisilicon/hisi_uncore_uc_pmu.c b/drivers/perf/hisilicon/hisi_uncore_uc_pmu.c
new file mode 100644 (file)
index 0000000..63da05e
--- /dev/null
@@ -0,0 +1,578 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * HiSilicon SoC UC (unified cache) uncore Hardware event counters support
+ *
+ * Copyright (C) 2023 HiSilicon Limited
+ *
+ * This code is based on the uncore PMUs like hisi_uncore_l3c_pmu.
+ */
+#include <linux/cpuhotplug.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/list.h>
+#include <linux/mod_devicetable.h>
+#include <linux/property.h>
+
+#include "hisi_uncore_pmu.h"
+
+/* Dynamic CPU hotplug state used by UC PMU */
+static enum cpuhp_state hisi_uc_pmu_online;
+
+/* UC register definition */
+#define HISI_UC_INT_MASK_REG           0x0800
+#define HISI_UC_INT_STS_REG            0x0808
+#define HISI_UC_INT_CLEAR_REG          0x080c
+#define HISI_UC_TRACETAG_CTRL_REG      0x1b2c
+#define HISI_UC_TRACETAG_REQ_MSK       GENMASK(9, 7)
+#define HISI_UC_TRACETAG_MARK_EN       BIT(0)
+#define HISI_UC_TRACETAG_REQ_EN                (HISI_UC_TRACETAG_MARK_EN | BIT(2))
+#define HISI_UC_TRACETAG_SRCID_EN      BIT(3)
+#define HISI_UC_SRCID_CTRL_REG         0x1b40
+#define HISI_UC_SRCID_MSK              GENMASK(14, 1)
+#define HISI_UC_EVENT_CTRL_REG         0x1c00
+#define HISI_UC_EVENT_TRACETAG_EN      BIT(29)
+#define HISI_UC_EVENT_URING_MSK                GENMASK(28, 27)
+#define HISI_UC_EVENT_GLB_EN           BIT(26)
+#define HISI_UC_VERSION_REG            0x1cf0
+#define HISI_UC_EVTYPE_REGn(n)         (0x1d00 + (n) * 4)
+#define HISI_UC_EVTYPE_MASK            GENMASK(7, 0)
+#define HISI_UC_CNTR_REGn(n)           (0x1e00 + (n) * 8)
+
+#define HISI_UC_NR_COUNTERS            0x8
+#define HISI_UC_V2_NR_EVENTS           0xFF
+#define HISI_UC_CNTR_REG_BITS          64
+
+#define HISI_UC_RD_REQ_TRACETAG                0x4
+#define HISI_UC_URING_EVENT_MIN                0x47
+#define HISI_UC_URING_EVENT_MAX                0x59
+
+HISI_PMU_EVENT_ATTR_EXTRACTOR(rd_req_en, config1, 0, 0);
+HISI_PMU_EVENT_ATTR_EXTRACTOR(uring_channel, config1, 5, 4);
+HISI_PMU_EVENT_ATTR_EXTRACTOR(srcid, config1, 19, 6);
+HISI_PMU_EVENT_ATTR_EXTRACTOR(srcid_en, config1, 20, 20);
+
+static int hisi_uc_pmu_check_filter(struct perf_event *event)
+{
+       struct hisi_pmu *uc_pmu = to_hisi_pmu(event->pmu);
+
+       if (hisi_get_srcid_en(event) && !hisi_get_rd_req_en(event)) {
+               dev_err(uc_pmu->dev,
+                       "srcid_en depends on rd_req_en being enabled!\n");
+               return -EINVAL;
+       }
+
+       if (!hisi_get_uring_channel(event))
+               return 0;
+
+       if ((HISI_GET_EVENTID(event) < HISI_UC_URING_EVENT_MIN) ||
+           (HISI_GET_EVENTID(event) > HISI_UC_URING_EVENT_MAX))
+               dev_warn(uc_pmu->dev,
+                        "Only events: [%#x ~ %#x] support channel filtering!",
+                        HISI_UC_URING_EVENT_MIN, HISI_UC_URING_EVENT_MAX);
+
+       return 0;
+}
+
+static void hisi_uc_pmu_config_req_tracetag(struct perf_event *event)
+{
+       struct hisi_pmu *uc_pmu = to_hisi_pmu(event->pmu);
+       u32 val;
+
+       if (!hisi_get_rd_req_en(event))
+               return;
+
+       val = readl(uc_pmu->base + HISI_UC_TRACETAG_CTRL_REG);
+
+       /* The request-type has been configured */
+       if (FIELD_GET(HISI_UC_TRACETAG_REQ_MSK, val) == HISI_UC_RD_REQ_TRACETAG)
+               return;
+
+       /* Set the request-type for the tracetag; only read requests are supported */
+       val &= ~HISI_UC_TRACETAG_REQ_MSK;
+       val |= FIELD_PREP(HISI_UC_TRACETAG_REQ_MSK, HISI_UC_RD_REQ_TRACETAG);
+       val |= HISI_UC_TRACETAG_REQ_EN;
+       writel(val, uc_pmu->base + HISI_UC_TRACETAG_CTRL_REG);
+}
+
+static void hisi_uc_pmu_clear_req_tracetag(struct perf_event *event)
+{
+       struct hisi_pmu *uc_pmu = to_hisi_pmu(event->pmu);
+       u32 val;
+
+       if (!hisi_get_rd_req_en(event))
+               return;
+
+       val = readl(uc_pmu->base + HISI_UC_TRACETAG_CTRL_REG);
+
+       /* Do nothing, the request-type tracetag has been cleaned up */
+       if (FIELD_GET(HISI_UC_TRACETAG_REQ_MSK, val) == 0)
+               return;
+
+       /* Clear request-type */
+       val &= ~HISI_UC_TRACETAG_REQ_MSK;
+       val &= ~HISI_UC_TRACETAG_REQ_EN;
+       writel(val, uc_pmu->base + HISI_UC_TRACETAG_CTRL_REG);
+}
+
+static void hisi_uc_pmu_config_srcid_tracetag(struct perf_event *event)
+{
+       struct hisi_pmu *uc_pmu = to_hisi_pmu(event->pmu);
+       u32 val;
+
+       if (!hisi_get_srcid_en(event))
+               return;
+
+       val = readl(uc_pmu->base + HISI_UC_TRACETAG_CTRL_REG);
+
+       /* Do nothing, the source id has been configured */
+       if (FIELD_GET(HISI_UC_TRACETAG_SRCID_EN, val))
+               return;
+
+       /* Enable source id tracetag */
+       val |= HISI_UC_TRACETAG_SRCID_EN;
+       writel(val, uc_pmu->base + HISI_UC_TRACETAG_CTRL_REG);
+
+       val = readl(uc_pmu->base + HISI_UC_SRCID_CTRL_REG);
+       val &= ~HISI_UC_SRCID_MSK;
+       val |= FIELD_PREP(HISI_UC_SRCID_MSK, hisi_get_srcid(event));
+       writel(val, uc_pmu->base + HISI_UC_SRCID_CTRL_REG);
+
+       /* Source ID filtering depends on the request-type tracetag being enabled */
+       hisi_uc_pmu_config_req_tracetag(event);
+}
+
+static void hisi_uc_pmu_clear_srcid_tracetag(struct perf_event *event)
+{
+       struct hisi_pmu *uc_pmu = to_hisi_pmu(event->pmu);
+       u32 val;
+
+       if (!hisi_get_srcid_en(event))
+               return;
+
+       val = readl(uc_pmu->base + HISI_UC_TRACETAG_CTRL_REG);
+
+       /* Do nothing, the source id has been cleaned up */
+       if (FIELD_GET(HISI_UC_TRACETAG_SRCID_EN, val) == 0)
+               return;
+
+       hisi_uc_pmu_clear_req_tracetag(event);
+
+       /* Disable source id tracetag */
+       val &= ~HISI_UC_TRACETAG_SRCID_EN;
+       writel(val, uc_pmu->base + HISI_UC_TRACETAG_CTRL_REG);
+
+       val = readl(uc_pmu->base + HISI_UC_SRCID_CTRL_REG);
+       val &= ~HISI_UC_SRCID_MSK;
+       writel(val, uc_pmu->base + HISI_UC_SRCID_CTRL_REG);
+}
+
+static void hisi_uc_pmu_config_uring_channel(struct perf_event *event)
+{
+       struct hisi_pmu *uc_pmu = to_hisi_pmu(event->pmu);
+       u32 uring_channel = hisi_get_uring_channel(event);
+       u32 val;
+
+       /* Do nothing if it is not set or is explicitly set to zero (the default) */
+       if (uring_channel == 0)
+               return;
+
+       val = readl(uc_pmu->base + HISI_UC_EVENT_CTRL_REG);
+
+       /* Do nothing, the uring_channel has been configured */
+       if (uring_channel == FIELD_GET(HISI_UC_EVENT_URING_MSK, val))
+               return;
+
+       val &= ~HISI_UC_EVENT_URING_MSK;
+       val |= FIELD_PREP(HISI_UC_EVENT_URING_MSK, uring_channel);
+       writel(val, uc_pmu->base + HISI_UC_EVENT_CTRL_REG);
+}
+
+static void hisi_uc_pmu_clear_uring_channel(struct perf_event *event)
+{
+       struct hisi_pmu *uc_pmu = to_hisi_pmu(event->pmu);
+       u32 val;
+
+       /* Do nothing if it is not set or is explicitly set to zero (the default) */
+       if (hisi_get_uring_channel(event) == 0)
+               return;
+
+       val = readl(uc_pmu->base + HISI_UC_EVENT_CTRL_REG);
+
+       /* Do nothing, the uring_channel has been cleaned up */
+       if (FIELD_GET(HISI_UC_EVENT_URING_MSK, val) == 0)
+               return;
+
+       val &= ~HISI_UC_EVENT_URING_MSK;
+       writel(val, uc_pmu->base + HISI_UC_EVENT_CTRL_REG);
+}
+
+static void hisi_uc_pmu_enable_filter(struct perf_event *event)
+{
+       if (event->attr.config1 == 0)
+               return;
+
+       hisi_uc_pmu_config_uring_channel(event);
+       hisi_uc_pmu_config_req_tracetag(event);
+       hisi_uc_pmu_config_srcid_tracetag(event);
+}
+
+static void hisi_uc_pmu_disable_filter(struct perf_event *event)
+{
+       if (event->attr.config1 == 0)
+               return;
+
+       hisi_uc_pmu_clear_srcid_tracetag(event);
+       hisi_uc_pmu_clear_req_tracetag(event);
+       hisi_uc_pmu_clear_uring_channel(event);
+}
+
+static void hisi_uc_pmu_write_evtype(struct hisi_pmu *uc_pmu, int idx, u32 type)
+{
+       u32 val;
+
+       /*
+        * Select the appropriate event select register.
+        * There are two 32-bit event select registers for the
+        * eight hardware counters; each event code is 8 bits wide.
+        */
+       val = readl(uc_pmu->base + HISI_UC_EVTYPE_REGn(idx / 4));
+       val &= ~(HISI_UC_EVTYPE_MASK << HISI_PMU_EVTYPE_SHIFT(idx));
+       val |= (type << HISI_PMU_EVTYPE_SHIFT(idx));
+       writel(val, uc_pmu->base + HISI_UC_EVTYPE_REGn(idx / 4));
+}
+
+static void hisi_uc_pmu_start_counters(struct hisi_pmu *uc_pmu)
+{
+       u32 val;
+
+       val = readl(uc_pmu->base + HISI_UC_EVENT_CTRL_REG);
+       val |= HISI_UC_EVENT_GLB_EN;
+       writel(val, uc_pmu->base + HISI_UC_EVENT_CTRL_REG);
+}
+
+static void hisi_uc_pmu_stop_counters(struct hisi_pmu *uc_pmu)
+{
+       u32 val;
+
+       val = readl(uc_pmu->base + HISI_UC_EVENT_CTRL_REG);
+       val &= ~HISI_UC_EVENT_GLB_EN;
+       writel(val, uc_pmu->base + HISI_UC_EVENT_CTRL_REG);
+}
+
+static void hisi_uc_pmu_enable_counter(struct hisi_pmu *uc_pmu,
+                                       struct hw_perf_event *hwc)
+{
+       u32 val;
+
+       /* Enable counter index */
+       val = readl(uc_pmu->base + HISI_UC_EVENT_CTRL_REG);
+       val |= (1 << hwc->idx);
+       writel(val, uc_pmu->base + HISI_UC_EVENT_CTRL_REG);
+}
+
+static void hisi_uc_pmu_disable_counter(struct hisi_pmu *uc_pmu,
+                                       struct hw_perf_event *hwc)
+{
+       u32 val;
+
+       /* Clear counter index */
+       val = readl(uc_pmu->base + HISI_UC_EVENT_CTRL_REG);
+       val &= ~(1 << hwc->idx);
+       writel(val, uc_pmu->base + HISI_UC_EVENT_CTRL_REG);
+}
+
+static u64 hisi_uc_pmu_read_counter(struct hisi_pmu *uc_pmu,
+                                   struct hw_perf_event *hwc)
+{
+       return readq(uc_pmu->base + HISI_UC_CNTR_REGn(hwc->idx));
+}
+
+static void hisi_uc_pmu_write_counter(struct hisi_pmu *uc_pmu,
+                                     struct hw_perf_event *hwc, u64 val)
+{
+       writeq(val, uc_pmu->base + HISI_UC_CNTR_REGn(hwc->idx));
+}
+
+static void hisi_uc_pmu_enable_counter_int(struct hisi_pmu *uc_pmu,
+                                          struct hw_perf_event *hwc)
+{
+       u32 val;
+
+       val = readl(uc_pmu->base + HISI_UC_INT_MASK_REG);
+       val &= ~(1 << hwc->idx);
+       writel(val, uc_pmu->base + HISI_UC_INT_MASK_REG);
+}
+
+static void hisi_uc_pmu_disable_counter_int(struct hisi_pmu *uc_pmu,
+                                           struct hw_perf_event *hwc)
+{
+       u32 val;
+
+       val = readl(uc_pmu->base + HISI_UC_INT_MASK_REG);
+       val |= (1 << hwc->idx);
+       writel(val, uc_pmu->base + HISI_UC_INT_MASK_REG);
+}
+
+static u32 hisi_uc_pmu_get_int_status(struct hisi_pmu *uc_pmu)
+{
+       return readl(uc_pmu->base + HISI_UC_INT_STS_REG);
+}
+
+static void hisi_uc_pmu_clear_int_status(struct hisi_pmu *uc_pmu, int idx)
+{
+       writel(1 << idx, uc_pmu->base + HISI_UC_INT_CLEAR_REG);
+}
+
+static int hisi_uc_pmu_init_data(struct platform_device *pdev,
+                                struct hisi_pmu *uc_pmu)
+{
+       /*
+        * Use SCCL (Super CPU Cluster) ID and CCL (CPU Cluster) ID to
+        * identify the topology information of UC PMU devices in the chip.
+        * Each SCCL contains several CCLs, and each CCL has four UC PMUs.
+        */
+       if (device_property_read_u32(&pdev->dev, "hisilicon,scl-id",
+                                    &uc_pmu->sccl_id)) {
+               dev_err(&pdev->dev, "Can not read uc sccl-id!\n");
+               return -EINVAL;
+       }
+
+       if (device_property_read_u32(&pdev->dev, "hisilicon,ccl-id",
+                                    &uc_pmu->ccl_id)) {
+               dev_err(&pdev->dev, "Can not read uc ccl-id!\n");
+               return -EINVAL;
+       }
+
+       if (device_property_read_u32(&pdev->dev, "hisilicon,sub-id",
+                                    &uc_pmu->sub_id)) {
+               dev_err(&pdev->dev, "Can not read sub-id!\n");
+               return -EINVAL;
+       }
+
+       uc_pmu->base = devm_platform_ioremap_resource(pdev, 0);
+       if (IS_ERR(uc_pmu->base)) {
+               dev_err(&pdev->dev, "ioremap failed for uc_pmu resource\n");
+               return PTR_ERR(uc_pmu->base);
+       }
+
+       uc_pmu->identifier = readl(uc_pmu->base + HISI_UC_VERSION_REG);
+
+       return 0;
+}
+
+static struct attribute *hisi_uc_pmu_format_attr[] = {
+       HISI_PMU_FORMAT_ATTR(event, "config:0-7"),
+       HISI_PMU_FORMAT_ATTR(rd_req_en, "config1:0-0"),
+       HISI_PMU_FORMAT_ATTR(uring_channel, "config1:4-5"),
+       HISI_PMU_FORMAT_ATTR(srcid, "config1:6-19"),
+       HISI_PMU_FORMAT_ATTR(srcid_en, "config1:20-20"),
+       NULL
+};
+
+static const struct attribute_group hisi_uc_pmu_format_group = {
+       .name = "format",
+       .attrs = hisi_uc_pmu_format_attr,
+};
+
+static struct attribute *hisi_uc_pmu_events_attr[] = {
+       HISI_PMU_EVENT_ATTR(sq_time,            0x00),
+       HISI_PMU_EVENT_ATTR(pq_time,            0x01),
+       HISI_PMU_EVENT_ATTR(hbm_time,           0x02),
+       HISI_PMU_EVENT_ATTR(iq_comp_time_cring, 0x03),
+       HISI_PMU_EVENT_ATTR(iq_comp_time_uring, 0x05),
+       HISI_PMU_EVENT_ATTR(cpu_rd,             0x10),
+       HISI_PMU_EVENT_ATTR(cpu_rd64,           0x17),
+       HISI_PMU_EVENT_ATTR(cpu_rs64,           0x19),
+       HISI_PMU_EVENT_ATTR(cpu_mru,            0x1a),
+       HISI_PMU_EVENT_ATTR(cycles,             0x9c),
+       HISI_PMU_EVENT_ATTR(spipe_hit,          0xb3),
+       HISI_PMU_EVENT_ATTR(hpipe_hit,          0xdb),
+       HISI_PMU_EVENT_ATTR(cring_rxdat_cnt,    0xfa),
+       HISI_PMU_EVENT_ATTR(cring_txdat_cnt,    0xfb),
+       HISI_PMU_EVENT_ATTR(uring_rxdat_cnt,    0xfc),
+       HISI_PMU_EVENT_ATTR(uring_txdat_cnt,    0xfd),
+       NULL
+};
+
+static const struct attribute_group hisi_uc_pmu_events_group = {
+       .name = "events",
+       .attrs = hisi_uc_pmu_events_attr,
+};
+
+static DEVICE_ATTR(cpumask, 0444, hisi_cpumask_sysfs_show, NULL);
+
+static struct attribute *hisi_uc_pmu_cpumask_attrs[] = {
+       &dev_attr_cpumask.attr,
+       NULL,
+};
+
+static const struct attribute_group hisi_uc_pmu_cpumask_attr_group = {
+       .attrs = hisi_uc_pmu_cpumask_attrs,
+};
+
+static struct device_attribute hisi_uc_pmu_identifier_attr =
+       __ATTR(identifier, 0444, hisi_uncore_pmu_identifier_attr_show, NULL);
+
+static struct attribute *hisi_uc_pmu_identifier_attrs[] = {
+       &hisi_uc_pmu_identifier_attr.attr,
+       NULL
+};
+
+static const struct attribute_group hisi_uc_pmu_identifier_group = {
+       .attrs = hisi_uc_pmu_identifier_attrs,
+};
+
+static const struct attribute_group *hisi_uc_pmu_attr_groups[] = {
+       &hisi_uc_pmu_format_group,
+       &hisi_uc_pmu_events_group,
+       &hisi_uc_pmu_cpumask_attr_group,
+       &hisi_uc_pmu_identifier_group,
+       NULL
+};
+
+static const struct hisi_uncore_ops hisi_uncore_uc_pmu_ops = {
+       .check_filter           = hisi_uc_pmu_check_filter,
+       .write_evtype           = hisi_uc_pmu_write_evtype,
+       .get_event_idx          = hisi_uncore_pmu_get_event_idx,
+       .start_counters         = hisi_uc_pmu_start_counters,
+       .stop_counters          = hisi_uc_pmu_stop_counters,
+       .enable_counter         = hisi_uc_pmu_enable_counter,
+       .disable_counter        = hisi_uc_pmu_disable_counter,
+       .enable_counter_int     = hisi_uc_pmu_enable_counter_int,
+       .disable_counter_int    = hisi_uc_pmu_disable_counter_int,
+       .write_counter          = hisi_uc_pmu_write_counter,
+       .read_counter           = hisi_uc_pmu_read_counter,
+       .get_int_status         = hisi_uc_pmu_get_int_status,
+       .clear_int_status       = hisi_uc_pmu_clear_int_status,
+       .enable_filter          = hisi_uc_pmu_enable_filter,
+       .disable_filter         = hisi_uc_pmu_disable_filter,
+};
+
+static int hisi_uc_pmu_dev_probe(struct platform_device *pdev,
+                                struct hisi_pmu *uc_pmu)
+{
+       int ret;
+
+       ret = hisi_uc_pmu_init_data(pdev, uc_pmu);
+       if (ret)
+               return ret;
+
+       ret = hisi_uncore_pmu_init_irq(uc_pmu, pdev);
+       if (ret)
+               return ret;
+
+       uc_pmu->pmu_events.attr_groups = hisi_uc_pmu_attr_groups;
+       uc_pmu->check_event = HISI_UC_EVTYPE_MASK;
+       uc_pmu->ops = &hisi_uncore_uc_pmu_ops;
+       uc_pmu->counter_bits = HISI_UC_CNTR_REG_BITS;
+       uc_pmu->num_counters = HISI_UC_NR_COUNTERS;
+       uc_pmu->dev = &pdev->dev;
+       uc_pmu->on_cpu = -1;
+
+       return 0;
+}
+
+static void hisi_uc_pmu_remove_cpuhp_instance(void *hotplug_node)
+{
+       cpuhp_state_remove_instance_nocalls(hisi_uc_pmu_online, hotplug_node);
+}
+
+static void hisi_uc_pmu_unregister_pmu(void *pmu)
+{
+       perf_pmu_unregister(pmu);
+}
+
+static int hisi_uc_pmu_probe(struct platform_device *pdev)
+{
+       struct hisi_pmu *uc_pmu;
+       char *name;
+       int ret;
+
+       uc_pmu = devm_kzalloc(&pdev->dev, sizeof(*uc_pmu), GFP_KERNEL);
+       if (!uc_pmu)
+               return -ENOMEM;
+
+       platform_set_drvdata(pdev, uc_pmu);
+
+       ret = hisi_uc_pmu_dev_probe(pdev, uc_pmu);
+       if (ret)
+               return ret;
+
+       name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "hisi_sccl%d_uc%d_%u",
+                             uc_pmu->sccl_id, uc_pmu->ccl_id, uc_pmu->sub_id);
+       if (!name)
+               return -ENOMEM;
+
+       ret = cpuhp_state_add_instance(hisi_uc_pmu_online, &uc_pmu->node);
+       if (ret)
+               return dev_err_probe(&pdev->dev, ret, "Error registering hotplug\n");
+
+       ret = devm_add_action_or_reset(&pdev->dev,
+                                      hisi_uc_pmu_remove_cpuhp_instance,
+                                      &uc_pmu->node);
+       if (ret)
+               return ret;
+
+       hisi_pmu_init(uc_pmu, THIS_MODULE);
+
+       ret = perf_pmu_register(&uc_pmu->pmu, name, -1);
+       if (ret)
+               return ret;
+
+       return devm_add_action_or_reset(&pdev->dev,
+                                       hisi_uc_pmu_unregister_pmu,
+                                       &uc_pmu->pmu);
+}
+
+static const struct acpi_device_id hisi_uc_pmu_acpi_match[] = {
+       { "HISI0291", },
+       {}
+};
+MODULE_DEVICE_TABLE(acpi, hisi_uc_pmu_acpi_match);
+
+static struct platform_driver hisi_uc_pmu_driver = {
+       .driver = {
+               .name = "hisi_uc_pmu",
+               .acpi_match_table = hisi_uc_pmu_acpi_match,
+               /*
+                * We have not worked out a safe bind/unbind process;
+                * forcefully unbinding during sampling will lead to a
+                * kernel panic, so this is not supported yet.
+                */
+               .suppress_bind_attrs = true,
+       },
+       .probe = hisi_uc_pmu_probe,
+};
+
+static int __init hisi_uc_pmu_module_init(void)
+{
+       int ret;
+
+       ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
+                                     "perf/hisi/uc:online",
+                                     hisi_uncore_pmu_online_cpu,
+                                     hisi_uncore_pmu_offline_cpu);
+       if (ret < 0) {
+               pr_err("UC PMU: Error setting up hotplug, ret = %d\n", ret);
+               return ret;
+       }
+       hisi_uc_pmu_online = ret;
+
+       ret = platform_driver_register(&hisi_uc_pmu_driver);
+       if (ret)
+               cpuhp_remove_multi_state(hisi_uc_pmu_online);
+
+       return ret;
+}
+module_init(hisi_uc_pmu_module_init);
+
+static void __exit hisi_uc_pmu_module_exit(void)
+{
+       platform_driver_unregister(&hisi_uc_pmu_driver);
+       cpuhp_remove_multi_state(hisi_uc_pmu_online);
+}
+module_exit(hisi_uc_pmu_module_exit);
+
+MODULE_DESCRIPTION("HiSilicon SoC UC uncore PMU driver");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Junhao He <hejunhao3@huawei.com>");
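The config1 format fields above (rd_req_en at bit 0, uring_channel at bits 4-5, srcid at bits 6-19, srcid_en at bit 20) are exactly what hisi_uc_pmu_check_filter() and the filter helpers decode. A rough userspace sketch assuming only those bit positions follows; the helper name and the chosen event and source ID are hypothetical.

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Illustrative only: count cpu_rd (event 0x10) restricted to source ID
 * 0x123. srcid_en requires rd_req_en, as enforced by
 * hisi_uc_pmu_check_filter(), so both bits are set. pmu_type is read
 * from the PMU's "type" file in sysfs.
 */
static int open_uc_srcid_event(int pmu_type)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = pmu_type;
	attr.config = 0x10;			/* event: cpu_rd */
	attr.config1 = (1ULL << 0) |		/* rd_req_en */
		       (0x123ULL << 6) |	/* srcid */
		       (1ULL << 20);		/* srcid_en */

	return syscall(SYS_perf_event_open, &attr, -1, 0, -1, 0);
}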
index aaca6db..3f9a98c 100644 (file)
@@ -857,7 +857,6 @@ static int l2_cache_pmu_probe_cluster(struct device *dev, void *data)
                return -ENOMEM;
 
        INIT_LIST_HEAD(&cluster->next);
-       list_add(&cluster->next, &l2cache_pmu->clusters);
        cluster->cluster_id = fw_cluster_id;
 
        irq = platform_get_irq(sdev, 0);
@@ -883,6 +882,7 @@ static int l2_cache_pmu_probe_cluster(struct device *dev, void *data)
 
        spin_lock_init(&cluster->pmu_lock);
 
+       list_add(&cluster->next, &l2cache_pmu->clusters);
        l2cache_pmu->num_pmus++;
 
        return 0;
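The two hunks above defer the list_add() until the cluster is fully initialized, so a concurrent reader of l2cache_pmu->clusters can never observe an entry whose IRQ, affinity or lock is still being set up. A minimal sketch of this initialize-then-publish pattern, with an illustrative structure and helper that are not the driver's own:

#include <linux/errno.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct cluster {
	struct list_head next;
	spinlock_t lock;
	int irq;
};

/* Publish the new entry only after every field has been initialized. */
static int cluster_add(struct list_head *clusters, int irq)
{
	struct cluster *c = kzalloc(sizeof(*c), GFP_KERNEL);

	if (!c)
		return -ENOMEM;

	INIT_LIST_HEAD(&c->next);
	spin_lock_init(&c->lock);
	c->irq = irq;

	list_add(&c->next, clusters);	/* last step: make it visible */
	return 0;
}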
index c14089f..cabdddb 100644 (file)
@@ -70,7 +70,7 @@ static int phy_g12a_mipi_dphy_analog_power_on(struct phy *phy)
                     HHI_MIPI_CNTL1_BANDGAP);
 
        regmap_write(priv->regmap, HHI_MIPI_CNTL2,
-                    FIELD_PREP(HHI_MIPI_CNTL2_DIF_TX_CTL0, 0x459) |
+                    FIELD_PREP(HHI_MIPI_CNTL2_DIF_TX_CTL0, 0x45a) |
                     FIELD_PREP(HHI_MIPI_CNTL2_DIF_TX_CTL1, 0x2680));
 
        reg = DSI_LANE_CLK;
index caa9537..8aa7251 100644 (file)
@@ -237,11 +237,11 @@ static int mtk_hdmi_pll_calc(struct mtk_hdmi_phy *hdmi_phy, struct clk_hw *hw,
         */
        if (tmds_clk < 54 * MEGA)
                txposdiv = 8;
-       else if (tmds_clk >= 54 * MEGA && tmds_clk < 148.35 * MEGA)
+       else if (tmds_clk >= 54 * MEGA && (tmds_clk * 100) < 14835 * MEGA)
                txposdiv = 4;
-       else if (tmds_clk >= 148.35 * MEGA && tmds_clk < 296.7 * MEGA)
+       else if ((tmds_clk * 100) >= 14835 * MEGA && (tmds_clk * 10) < 2967 * MEGA)
                txposdiv = 2;
-       else if (tmds_clk >= 296.7 * MEGA && tmds_clk <= 594 * MEGA)
+       else if ((tmds_clk * 10) >= 2967 * MEGA && tmds_clk <= 594 * MEGA)
                txposdiv = 1;
        else
                return -EINVAL;
@@ -324,12 +324,12 @@ static int mtk_hdmi_pll_drv_setting(struct clk_hw *hw)
                clk_channel_bias = 0x34; /* 20mA */
                impedance_en = 0xf;
                impedance = 0x36; /* 100ohm */
-       } else if (pixel_clk >= 74.175 * MEGA && pixel_clk <= 300 * MEGA) {
+       } else if (((u64)pixel_clk * 1000) >= 74175 * MEGA && pixel_clk <= 300 * MEGA) {
                data_channel_bias = 0x34; /* 20mA */
                clk_channel_bias = 0x2c; /* 16mA */
                impedance_en = 0xf;
                impedance = 0x36; /* 100ohm */
-       } else if (pixel_clk >= 27 * MEGA && pixel_clk < 74.175 * MEGA) {
+       } else if (pixel_clk >= 27 * MEGA && ((u64)pixel_clk * 1000) < 74175 * MEGA) {
                data_channel_bias = 0x14; /* 10mA */
                clk_channel_bias = 0x14; /* 10mA */
                impedance_en = 0x0;
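The two hunks above drop fractional megahertz thresholds (74.175, 148.35 and 296.7 MHz) in favour of scaled integer comparisons, since floating-point arithmetic is not available in kernel code. A minimal sketch of the scaling trick, with a hypothetical helper name:

        #include <linux/types.h>
        #include <linux/units.h>        /* MEGA */

        /*
         * "clk < 148.35 MHz" rewritten without floating point: scale both
         * sides by 100 so the threshold becomes the integer 14835 * MEGA.
         * Doing the multiply in 64 bits keeps the scaled value from
         * overflowing.
         */
        static bool below_148_35_mhz(u64 clk_hz)
        {
                return clk_hz * 100 < 14835ULL * MEGA;
        }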
index 6850e04..87b17e5 100644 (file)
@@ -2472,7 +2472,7 @@ static int qmp_combo_com_init(struct qmp_combo *qmp)
        ret = regulator_bulk_enable(cfg->num_vregs, qmp->vregs);
        if (ret) {
                dev_err(qmp->dev, "failed to enable regulators, err=%d\n", ret);
-               goto err_unlock;
+               goto err_decrement_count;
        }
 
        ret = reset_control_bulk_assert(cfg->num_resets, qmp->resets);
@@ -2522,7 +2522,8 @@ err_assert_reset:
        reset_control_bulk_assert(cfg->num_resets, qmp->resets);
 err_disable_regulators:
        regulator_bulk_disable(cfg->num_vregs, qmp->vregs);
-err_unlock:
+err_decrement_count:
+       qmp->init_count--;
        mutex_unlock(&qmp->phy_mutex);
 
        return ret;
index 09824be..0c603bc 100644 (file)
@@ -379,7 +379,7 @@ static int qmp_pcie_msm8996_com_init(struct qmp_phy *qphy)
        ret = regulator_bulk_enable(cfg->num_vregs, qmp->vregs);
        if (ret) {
                dev_err(qmp->dev, "failed to enable regulators, err=%d\n", ret);
-               goto err_unlock;
+               goto err_decrement_count;
        }
 
        ret = reset_control_bulk_assert(cfg->num_resets, qmp->resets);
@@ -409,7 +409,8 @@ err_assert_reset:
        reset_control_bulk_assert(cfg->num_resets, qmp->resets);
 err_disable_regulators:
        regulator_bulk_disable(cfg->num_vregs, qmp->vregs);
-err_unlock:
+err_decrement_count:
+       qmp->init_count--;
        mutex_unlock(&qmp->phy_mutex);
 
        return ret;
index a590635..6c237f3 100644 (file)
@@ -115,11 +115,11 @@ struct phy_override_seq {
  *
  * @cfg_ahb_clk: AHB2PHY interface clock
  * @ref_clk: phy reference clock
- * @iface_clk: phy interface clock
  * @phy_reset: phy reset control
  * @vregs: regulator supplies bulk data
  * @phy_initialized: if PHY has been initialized correctly
  * @mode: contains the current mode the PHY is in
+ * @update_seq_cfg: tuning parameters for phy init
  */
 struct qcom_snps_hsphy {
        struct phy *phy;
index 7bfecdf..d249a03 100644 (file)
@@ -400,6 +400,7 @@ static struct meson_pmx_group meson_axg_periphs_groups[] = {
        GPIO_GROUP(GPIOA_15),
        GPIO_GROUP(GPIOA_16),
        GPIO_GROUP(GPIOA_17),
+       GPIO_GROUP(GPIOA_18),
        GPIO_GROUP(GPIOA_19),
        GPIO_GROUP(GPIOA_20),
 
index f279b36..43d3530 100644 (file)
@@ -30,6 +30,7 @@
 #include <linux/pinctrl/pinconf.h>
 #include <linux/pinctrl/pinconf-generic.h>
 #include <linux/pinctrl/pinmux.h>
+#include <linux/suspend.h>
 
 #include "core.h"
 #include "pinctrl-utils.h"
@@ -636,9 +637,8 @@ static bool do_amd_gpio_irq_handler(int irq, void *dev_id)
                        regval = readl(regs + i);
 
                        if (regval & PIN_IRQ_PENDING)
-                               dev_dbg(&gpio_dev->pdev->dev,
-                                       "GPIO %d is active: 0x%x",
-                                       irqnr + i, regval);
+                               pm_pr_dbg("GPIO %d is active: 0x%x",
+                                         irqnr + i, regval);
 
                        /* caused wake on resume context for shared IRQ */
                        if (irq < 0 && (regval & BIT(WAKE_STS_OFF)))
index dbe698f..e29c51c 100644 (file)
@@ -372,7 +372,7 @@ static struct i2c_driver cros_ec_driver = {
                .of_match_table = of_match_ptr(cros_ec_i2c_of_match),
                .pm     = &cros_ec_i2c_pm_ops,
        },
-       .probe_new      = cros_ec_i2c_probe,
+       .probe          = cros_ec_i2c_probe,
        .remove         = cros_ec_i2c_remove,
        .id_table       = cros_ec_i2c_id,
 };
index 68bba0f..500a61b 100644 (file)
@@ -16,6 +16,7 @@
 #include <linux/delay.h>
 #include <linux/io.h>
 #include <linux/interrupt.h>
+#include <linux/kobject.h>
 #include <linux/module.h>
 #include <linux/platform_data/cros_ec_commands.h>
 #include <linux/platform_data/cros_ec_proto.h>
@@ -315,6 +316,7 @@ static int cros_ec_lpc_readmem(struct cros_ec_device *ec, unsigned int offset,
 
 static void cros_ec_lpc_acpi_notify(acpi_handle device, u32 value, void *data)
 {
+       static const char *env[] = { "ERROR=PANIC", NULL };
        struct cros_ec_device *ec_dev = data;
        bool ec_has_more_events;
        int ret;
@@ -324,6 +326,7 @@ static void cros_ec_lpc_acpi_notify(acpi_handle device, u32 value, void *data)
        if (value == ACPI_NOTIFY_CROS_EC_PANIC) {
                dev_emerg(ec_dev->dev, "CrOS EC Panic Reported. Shutdown is imminent!");
                blocking_notifier_call_chain(&ec_dev->panic_notifier, 0, ec_dev);
+               kobject_uevent_env(&ec_dev->dev->kobj, KOBJ_CHANGE, (char **)env);
                /* Begin orderly shutdown. Force shutdown after 1 second. */
                hw_protection_shutdown("CrOS EC Panic", 1000);
                /* Do not query for other events after a panic is reported */
@@ -543,23 +546,25 @@ static const struct dmi_system_id cros_ec_lpc_dmi_table[] __initconst = {
 MODULE_DEVICE_TABLE(dmi, cros_ec_lpc_dmi_table);
 
 #ifdef CONFIG_PM_SLEEP
-static int cros_ec_lpc_suspend(struct device *dev)
+static int cros_ec_lpc_prepare(struct device *dev)
 {
        struct cros_ec_device *ec_dev = dev_get_drvdata(dev);
 
        return cros_ec_suspend(ec_dev);
 }
 
-static int cros_ec_lpc_resume(struct device *dev)
+static void cros_ec_lpc_complete(struct device *dev)
 {
        struct cros_ec_device *ec_dev = dev_get_drvdata(dev);
-
-       return cros_ec_resume(ec_dev);
+       cros_ec_resume(ec_dev);
 }
 #endif
 
 static const struct dev_pm_ops cros_ec_lpc_pm_ops = {
-       SET_LATE_SYSTEM_SLEEP_PM_OPS(cros_ec_lpc_suspend, cros_ec_lpc_resume)
+#ifdef CONFIG_PM_SLEEP
+       .prepare = cros_ec_lpc_prepare,
+       .complete = cros_ec_lpc_complete
+#endif
 };
 
 static struct platform_driver cros_ec_lpc_driver = {
index 21143db..3e88cc9 100644 (file)
@@ -104,13 +104,7 @@ static void debug_packet(struct device *dev, const char *name, u8 *ptr,
                         int len)
 {
 #ifdef DEBUG
-       int i;
-
-       dev_dbg(dev, "%s: ", name);
-       for (i = 0; i < len; i++)
-               pr_cont(" %02x", ptr[i]);
-
-       pr_cont("\n");
+       dev_dbg(dev, "%s: %*ph\n", name, len, ptr);
 #endif
 }
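The hunk above replaces the manual byte loop with the %*ph printk extension, which prints a small buffer (the kernel caps it at 64 bytes) as space-separated hex. A short usage sketch with made-up data and a hypothetical function name:

        #include <linux/printk.h>
        #include <linux/types.h>

        static void foo_dump_example(void)
        {
                u8 hdr[4] = { 0xde, 0xad, 0xbe, 0xef };

                /* Emits "hdr: de ad be ef"; the field width supplies the byte count. */
                pr_debug("hdr: %*ph\n", (int)sizeof(hdr), hdr);
        }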
 
index 62ccb1a..b313130 100644 (file)
@@ -143,7 +143,7 @@ MODULE_DEVICE_TABLE(acpi, hps_acpi_id);
 #endif /* CONFIG_ACPI */
 
 static struct i2c_driver hps_i2c_driver = {
-       .probe_new = hps_i2c_probe,
+       .probe = hps_i2c_probe,
        .remove = hps_i2c_remove,
        .id_table = hps_i2c_id,
        .driver = {
index 7527204..0eefdcf 100644 (file)
@@ -51,13 +51,18 @@ static int cros_typec_cmd_mux_set(struct cros_typec_switch_data *sdata, int port
 static int cros_typec_get_mux_state(unsigned long mode, struct typec_altmode *alt)
 {
        int ret = -EOPNOTSUPP;
+       u8 pin_assign;
 
-       if (mode == TYPEC_STATE_SAFE)
+       if (mode == TYPEC_STATE_SAFE) {
                ret = USB_PD_MUX_SAFE_MODE;
-       else if (mode == TYPEC_STATE_USB)
+       } else if (mode == TYPEC_STATE_USB) {
                ret = USB_PD_MUX_USB_ENABLED;
-       else if (alt && alt->svid == USB_TYPEC_DP_SID)
+       } else if (alt && alt->svid == USB_TYPEC_DP_SID) {
                ret = USB_PD_MUX_DP_ENABLED;
+               pin_assign = mode - TYPEC_STATE_MODAL;
+               if (pin_assign & DP_PIN_ASSIGN_D)
+                       ret |= USB_PD_MUX_USB_ENABLED;
+       }
 
        return ret;
 }
index c2c9b0d..be967d7 100644 (file)
@@ -1348,9 +1348,8 @@ static int mlxbf_pmc_map_counters(struct device *dev)
 
        for (i = 0; i < pmc->total_blocks; ++i) {
                if (strstr(pmc->block_name[i], "tile")) {
-                       ret = sscanf(pmc->block_name[i], "tile%d", &tile_num);
-                       if (ret < 0)
-                               return ret;
+                       if (sscanf(pmc->block_name[i], "tile%d", &tile_num) != 1)
+                               return -EINVAL;
 
                        if (tile_num >= pmc->tile_count)
                                continue;
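The changed check reflects how the kernel's sscanf() reports failure: it returns the number of fields successfully converted (0 when "tile%d" does not match), never a negative errno, so the old "< 0" test could not catch a parse error. A small illustration with a made-up string and hypothetical function name:

        #include <linux/kernel.h>
        #include <linux/printk.h>

        static void foo_parse_example(void)
        {
                int tile_num;

                /* sscanf() returns 0: "%d" finds no digits after the literal "tile". */
                if (sscanf("tileX", "tile%d", &tile_num) != 1)
                        pr_warn("parse failed\n");
        }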
index 91a077c..a79318e 100644 (file)
@@ -784,7 +784,7 @@ static void mlxbf_tmfifo_rxtx(struct mlxbf_tmfifo_vring *vring, bool is_rx)
        fifo = vring->fifo;
 
        /* Return if vdev is not ready. */
-       if (!fifo->vdev[devid])
+       if (!fifo || !fifo->vdev[devid])
                return;
 
        /* Return if another vring is running. */
@@ -980,9 +980,13 @@ static int mlxbf_tmfifo_virtio_find_vqs(struct virtio_device *vdev,
 
                vq->num_max = vring->num;
 
+               vq->priv = vring;
+
+               /* Make vq update visible before using it. */
+               virtio_mb(false);
+
                vqs[i] = vq;
                vring->vq = vq;
-               vq->priv = vring;
        }
 
        return 0;
@@ -1302,6 +1306,9 @@ static int mlxbf_tmfifo_probe(struct platform_device *pdev)
 
        mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
 
+       /* Make all updates visible before setting the 'is_ready' flag. */
+       virtio_mb(false);
+
        fifo->is_ready = true;
        return 0;
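Both barriers above serve the same purpose: make every update to the shared vring/fifo state visible to other CPUs before the vq pointer or the is_ready flag is acted upon. A generic sketch of the publish/consume ordering idea, using smp_wmb()/smp_rmb() and hypothetical names rather than the driver's virtio_mb() calls:

        #include <linux/compiler.h>     /* READ_ONCE / WRITE_ONCE */
        #include <linux/types.h>
        #include <asm/barrier.h>        /* smp_wmb / smp_rmb */

        struct shared {
                int data;
                bool is_ready;
        };

        static void publish(struct shared *s, int value)
        {
                s->data = value;
                smp_wmb();                      /* order the data write before the flag */
                WRITE_ONCE(s->is_ready, true);
        }

        static bool consume(struct shared *s, int *out)
        {
                if (!READ_ONCE(s->is_ready))
                        return false;
                smp_rmb();                      /* pairs with smp_wmb() in publish() */
                *out = s->data;
                return true;
        }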
 
index 535581c..7fc602e 100644 (file)
@@ -825,7 +825,7 @@ static int ssam_cplt_init(struct ssam_cplt *cplt, struct device *dev)
 
        cplt->dev = dev;
 
-       cplt->wq = create_workqueue(SSAM_CPLT_WQ_NAME);
+       cplt->wq = alloc_workqueue(SSAM_CPLT_WQ_NAME, WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
        if (!cplt->wq)
                return -ENOMEM;
 
index 8f52b62..c0a1a58 100644 (file)
@@ -210,6 +210,7 @@ enum ssam_kip_cover_state {
        SSAM_KIP_COVER_STATE_LAPTOP        = 0x03,
        SSAM_KIP_COVER_STATE_FOLDED_CANVAS = 0x04,
        SSAM_KIP_COVER_STATE_FOLDED_BACK   = 0x05,
+       SSAM_KIP_COVER_STATE_BOOK          = 0x06,
 };
 
 static const char *ssam_kip_cover_state_name(struct ssam_tablet_sw *sw,
@@ -231,6 +232,9 @@ static const char *ssam_kip_cover_state_name(struct ssam_tablet_sw *sw,
        case SSAM_KIP_COVER_STATE_FOLDED_BACK:
                return "folded-back";
 
+       case SSAM_KIP_COVER_STATE_BOOK:
+               return "book";
+
        default:
                dev_warn(&sw->sdev->dev, "unknown KIP cover state: %u\n", state->state);
                return "<unknown>";
@@ -244,6 +248,7 @@ static bool ssam_kip_cover_state_is_tablet_mode(struct ssam_tablet_sw *sw,
        case SSAM_KIP_COVER_STATE_DISCONNECTED:
        case SSAM_KIP_COVER_STATE_FOLDED_CANVAS:
        case SSAM_KIP_COVER_STATE_FOLDED_BACK:
+       case SSAM_KIP_COVER_STATE_BOOK:
                return true;
 
        case SSAM_KIP_COVER_STATE_CLOSED:
@@ -335,6 +340,7 @@ enum ssam_pos_state_cover {
        SSAM_POS_COVER_LAPTOP        = 0x03,
        SSAM_POS_COVER_FOLDED_CANVAS = 0x04,
        SSAM_POS_COVER_FOLDED_BACK   = 0x05,
+       SSAM_POS_COVER_BOOK          = 0x06,
 };
 
 enum ssam_pos_state_sls {
@@ -367,6 +373,9 @@ static const char *ssam_pos_state_name_cover(struct ssam_tablet_sw *sw, u32 stat
        case SSAM_POS_COVER_FOLDED_BACK:
                return "folded-back";
 
+       case SSAM_POS_COVER_BOOK:
+               return "book";
+
        default:
                dev_warn(&sw->sdev->dev, "unknown device posture for type-cover: %u\n", state);
                return "<unknown>";
@@ -416,6 +425,7 @@ static bool ssam_pos_state_is_tablet_mode_cover(struct ssam_tablet_sw *sw, u32 s
        case SSAM_POS_COVER_DISCONNECTED:
        case SSAM_POS_COVER_FOLDED_CANVAS:
        case SSAM_POS_COVER_FOLDED_BACK:
+       case SSAM_POS_COVER_BOOK:
                return true;
 
        case SSAM_POS_COVER_CLOSED:
index 4279057..1304cd6 100644 (file)
@@ -543,7 +543,7 @@ static int amd_pmc_idlemask_read(struct amd_pmc_dev *pdev, struct device *dev,
        }
 
        if (dev)
-               dev_dbg(pdev->dev, "SMU idlemask s0i3: 0x%x\n", val);
+               pm_pr_dbg("SMU idlemask s0i3: 0x%x\n", val);
 
        if (s)
                seq_printf(s, "SMU idlemask : 0x%x\n", val);
@@ -769,7 +769,7 @@ static int amd_pmc_verify_czn_rtc(struct amd_pmc_dev *pdev, u32 *arg)
 
        *arg |= (duration << 16);
        rc = rtc_alarm_irq_enable(rtc_device, 0);
-       dev_dbg(pdev->dev, "wakeup timer programmed for %lld seconds\n", duration);
+       pm_pr_dbg("wakeup timer programmed for %lld seconds\n", duration);
 
        return rc;
 }
index d5bb775..7780705 100644 (file)
@@ -245,24 +245,29 @@ static const struct pci_device_id pmf_pci_ids[] = {
        { }
 };
 
-int amd_pmf_init_metrics_table(struct amd_pmf_dev *dev)
+static void amd_pmf_set_dram_addr(struct amd_pmf_dev *dev)
 {
        u64 phys_addr;
        u32 hi, low;
 
-       INIT_DELAYED_WORK(&dev->work_buffer, amd_pmf_get_metrics);
+       phys_addr = virt_to_phys(dev->buf);
+       hi = phys_addr >> 32;
+       low = phys_addr & GENMASK(31, 0);
+
+       amd_pmf_send_cmd(dev, SET_DRAM_ADDR_HIGH, 0, hi, NULL);
+       amd_pmf_send_cmd(dev, SET_DRAM_ADDR_LOW, 0, low, NULL);
+}
 
+int amd_pmf_init_metrics_table(struct amd_pmf_dev *dev)
+{
        /* Get Metrics Table Address */
        dev->buf = kzalloc(sizeof(dev->m_table), GFP_KERNEL);
        if (!dev->buf)
                return -ENOMEM;
 
-       phys_addr = virt_to_phys(dev->buf);
-       hi = phys_addr >> 32;
-       low = phys_addr & GENMASK(31, 0);
+       INIT_DELAYED_WORK(&dev->work_buffer, amd_pmf_get_metrics);
 
-       amd_pmf_send_cmd(dev, SET_DRAM_ADDR_HIGH, 0, hi, NULL);
-       amd_pmf_send_cmd(dev, SET_DRAM_ADDR_LOW, 0, low, NULL);
+       amd_pmf_set_dram_addr(dev);
 
        /*
         * Start collecting the metrics data after a small delay
@@ -273,6 +278,18 @@ int amd_pmf_init_metrics_table(struct amd_pmf_dev *dev)
        return 0;
 }
 
+static int amd_pmf_resume_handler(struct device *dev)
+{
+       struct amd_pmf_dev *pdev = dev_get_drvdata(dev);
+
+       if (pdev->buf)
+               amd_pmf_set_dram_addr(pdev);
+
+       return 0;
+}
+
+static DEFINE_SIMPLE_DEV_PM_OPS(amd_pmf_pm, NULL, amd_pmf_resume_handler);
+
 static void amd_pmf_init_features(struct amd_pmf_dev *dev)
 {
        int ret;
@@ -280,6 +297,8 @@ static void amd_pmf_init_features(struct amd_pmf_dev *dev)
        /* Enable Static Slider */
        if (is_apmf_func_supported(dev, APMF_FUNC_STATIC_SLIDER_GRANULAR)) {
                amd_pmf_init_sps(dev);
+               dev->pwr_src_notifier.notifier_call = amd_pmf_pwr_src_notify_call;
+               power_supply_reg_notifier(&dev->pwr_src_notifier);
                dev_dbg(dev->dev, "SPS enabled and Platform Profiles registered\n");
        }
 
@@ -298,8 +317,10 @@ static void amd_pmf_init_features(struct amd_pmf_dev *dev)
 
 static void amd_pmf_deinit_features(struct amd_pmf_dev *dev)
 {
-       if (is_apmf_func_supported(dev, APMF_FUNC_STATIC_SLIDER_GRANULAR))
+       if (is_apmf_func_supported(dev, APMF_FUNC_STATIC_SLIDER_GRANULAR)) {
+               power_supply_unreg_notifier(&dev->pwr_src_notifier);
                amd_pmf_deinit_sps(dev);
+       }
 
        if (is_apmf_func_supported(dev, APMF_FUNC_AUTO_MODE)) {
                amd_pmf_deinit_auto_mode(dev);
@@ -382,9 +403,6 @@ static int amd_pmf_probe(struct platform_device *pdev)
        apmf_install_handler(dev);
        amd_pmf_dbgfs_register(dev);
 
-       dev->pwr_src_notifier.notifier_call = amd_pmf_pwr_src_notify_call;
-       power_supply_reg_notifier(&dev->pwr_src_notifier);
-
        dev_info(dev->dev, "registered PMF device successfully\n");
 
        return 0;
@@ -394,7 +412,6 @@ static void amd_pmf_remove(struct platform_device *pdev)
 {
        struct amd_pmf_dev *dev = platform_get_drvdata(pdev);
 
-       power_supply_unreg_notifier(&dev->pwr_src_notifier);
        amd_pmf_deinit_features(dev);
        apmf_acpi_deinit(dev);
        amd_pmf_dbgfs_unregister(dev);
@@ -413,6 +430,7 @@ static struct platform_driver amd_pmf_driver = {
                .name = "amd-pmf",
                .acpi_match_table = amd_pmf_acpi_ids,
                .dev_groups = amd_pmf_driver_groups,
+               .pm = pm_sleep_ptr(&amd_pmf_pm),
        },
        .probe = amd_pmf_probe,
        .remove_new = amd_pmf_remove,
index e2c9a68..fdf7da0 100644 (file)
@@ -555,6 +555,7 @@ static const struct key_entry asus_nb_wmi_keymap[] = {
        { KE_KEY, 0x71, { KEY_F13 } }, /* General-purpose button */
        { KE_IGNORE, 0x79, },  /* Charger type dectection notification */
        { KE_KEY, 0x7a, { KEY_ALS_TOGGLE } }, /* Ambient Light Sensor Toggle */
+       { KE_IGNORE, 0x7B, }, /* Charger connect/disconnect notification */
        { KE_KEY, 0x7c, { KEY_MICMUTE } },
        { KE_KEY, 0x7D, { KEY_BLUETOOTH } }, /* Bluetooth Enable */
        { KE_KEY, 0x7E, { KEY_BLUETOOTH } }, /* Bluetooth Disable */
@@ -584,6 +585,7 @@ static const struct key_entry asus_nb_wmi_keymap[] = {
        { KE_KEY, 0xAE, { KEY_FN_F5 } }, /* Fn+F5 fan mode on 2020+ */
        { KE_KEY, 0xB3, { KEY_PROG4 } }, /* AURA */
        { KE_KEY, 0xB5, { KEY_CALC } },
+       { KE_IGNORE, 0xC0, }, /* External display connect/disconnect notification */
        { KE_KEY, 0xC4, { KEY_KBDILLUMUP } },
        { KE_KEY, 0xC5, { KEY_KBDILLUMDOWN } },
        { KE_IGNORE, 0xC6, },  /* Ambient Light Sensor notification */
index 873f59c..6364ae2 100644 (file)
@@ -211,6 +211,7 @@ struct bios_rfkill2_state {
 static const struct key_entry hp_wmi_keymap[] = {
        { KE_KEY, 0x02,    { KEY_BRIGHTNESSUP } },
        { KE_KEY, 0x03,    { KEY_BRIGHTNESSDOWN } },
+       { KE_KEY, 0x270,   { KEY_MICMUTE } },
        { KE_KEY, 0x20e6,  { KEY_PROG1 } },
        { KE_KEY, 0x20e8,  { KEY_MEDIA } },
        { KE_KEY, 0x2142,  { KEY_MEDIA } },
index 61dffb4..e6ae826 100644 (file)
@@ -208,7 +208,7 @@ static int scan_chunks_sanity_check(struct device *dev)
                        continue;
                reinit_completion(&ifs_done);
                local_work.dev = dev;
-               INIT_WORK(&local_work.w, copy_hashes_authenticate_chunks);
+               INIT_WORK_ONSTACK(&local_work.w, copy_hashes_authenticate_chunks);
                schedule_work_on(cpu, &local_work.w);
                wait_for_completion(&ifs_done);
                if (ifsd->loading_error) {
index 1086c3d..399f062 100644 (file)
@@ -101,9 +101,11 @@ int skl_int3472_register_clock(struct int3472_discrete_device *int3472,
 
        int3472->clock.ena_gpio = acpi_get_and_request_gpiod(path, agpio->pin_table[0],
                                                             "int3472,clk-enable");
-       if (IS_ERR(int3472->clock.ena_gpio))
-               return dev_err_probe(int3472->dev, PTR_ERR(int3472->clock.ena_gpio),
-                                    "getting clk-enable GPIO\n");
+       if (IS_ERR(int3472->clock.ena_gpio)) {
+               ret = PTR_ERR(int3472->clock.ena_gpio);
+               int3472->clock.ena_gpio = NULL;
+               return dev_err_probe(int3472->dev, ret, "getting clk-enable GPIO\n");
+       }
 
        if (polarity == GPIO_ACTIVE_LOW)
                gpiod_toggle_active_low(int3472->clock.ena_gpio);
@@ -199,8 +201,9 @@ int skl_int3472_register_regulator(struct int3472_discrete_device *int3472,
        int3472->regulator.gpio = acpi_get_and_request_gpiod(path, agpio->pin_table[0],
                                                             "int3472,regulator");
        if (IS_ERR(int3472->regulator.gpio)) {
-               dev_err(int3472->dev, "Failed to get regulator GPIO line\n");
-               return PTR_ERR(int3472->regulator.gpio);
+               ret = PTR_ERR(int3472->regulator.gpio);
+               int3472->regulator.gpio = NULL;
+               return dev_err_probe(int3472->dev, ret, "getting regulator GPIO\n");
        }
 
        /* Ensure the pin is in output mode and non-active state */
index e0572a2..02fe360 100644 (file)
@@ -304,14 +304,13 @@ struct isst_if_pkg_info {
 static struct isst_if_cpu_info *isst_cpu_info;
 static struct isst_if_pkg_info *isst_pkg_info;
 
-#define ISST_MAX_PCI_DOMAINS   8
-
 static struct pci_dev *_isst_if_get_pci_dev(int cpu, int bus_no, int dev, int fn)
 {
        struct pci_dev *matched_pci_dev = NULL;
        struct pci_dev *pci_dev = NULL;
+       struct pci_dev *_pci_dev = NULL;
        int no_matches = 0, pkg_id;
-       int i, bus_number;
+       int bus_number;
 
        if (bus_no < 0 || bus_no >= ISST_MAX_BUS_NUMBER || cpu < 0 ||
            cpu >= nr_cpu_ids || cpu >= num_possible_cpus())
@@ -323,12 +322,11 @@ static struct pci_dev *_isst_if_get_pci_dev(int cpu, int bus_no, int dev, int fn
        if (bus_number < 0)
                return NULL;
 
-       for (i = 0; i < ISST_MAX_PCI_DOMAINS; ++i) {
-               struct pci_dev *_pci_dev;
+       for_each_pci_dev(_pci_dev) {
                int node;
 
-               _pci_dev = pci_get_domain_bus_and_slot(i, bus_number, PCI_DEVFN(dev, fn));
-               if (!_pci_dev)
+               if (_pci_dev->bus->number != bus_number ||
+                   _pci_dev->devfn != PCI_DEVFN(dev, fn))
                        continue;
 
                ++no_matches;
index 1a300e1..064f186 100644 (file)
@@ -44,14 +44,18 @@ static ssize_t store_min_max_freq_khz(struct uncore_data *data,
                                      int min_max)
 {
        unsigned int input;
+       int ret;
 
        if (kstrtouint(buf, 10, &input))
                return -EINVAL;
 
        mutex_lock(&uncore_lock);
-       uncore_write(data, input, min_max);
+       ret = uncore_write(data, input, min_max);
        mutex_unlock(&uncore_lock);
 
+       if (ret)
+               return ret;
+
        return count;
 }
 
index 80abc70..d904fad 100644 (file)
@@ -34,6 +34,7 @@ static int intel_scu_pci_probe(struct pci_dev *pdev,
 
 static const struct pci_device_id pci_ids[] = {
        { PCI_VDEVICE(INTEL, 0x080e) },
+       { PCI_VDEVICE(INTEL, 0x082a) },
        { PCI_VDEVICE(INTEL, 0x08ea) },
        { PCI_VDEVICE(INTEL, 0x0a94) },
        { PCI_VDEVICE(INTEL, 0x11a0) },
index 6fe82f8..b3808ad 100644 (file)
@@ -10318,6 +10318,7 @@ static atomic_t dytc_ignore_event = ATOMIC_INIT(0);
 static DEFINE_MUTEX(dytc_mutex);
 static int dytc_capabilities;
 static bool dytc_mmc_get_available;
+static int profile_force;
 
 static int convert_dytc_to_profile(int funcmode, int dytcmode,
                enum platform_profile_option *profile)
@@ -10580,6 +10581,21 @@ static int tpacpi_dytc_profile_init(struct ibm_init_struct *iibm)
        if (err)
                return err;
 
+       /* Check if user wants to override the profile selection */
+       if (profile_force) {
+               switch (profile_force) {
+               case -1:
+                       dytc_capabilities = 0;
+                       break;
+               case 1:
+                       dytc_capabilities = BIT(DYTC_FC_MMC);
+                       break;
+               case 2:
+                       dytc_capabilities = BIT(DYTC_FC_PSC);
+                       break;
+               }
+               pr_debug("Profile selection forced: 0x%x\n", dytc_capabilities);
+       }
        if (dytc_capabilities & BIT(DYTC_FC_MMC)) { /* MMC MODE */
                pr_debug("MMC is supported\n");
                /*
@@ -10593,11 +10609,6 @@ static int tpacpi_dytc_profile_init(struct ibm_init_struct *iibm)
                                dytc_mmc_get_available = true;
                }
        } else if (dytc_capabilities & BIT(DYTC_FC_PSC)) { /* PSC MODE */
-               /* Support for this only works on AMD platforms */
-               if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) {
-                       dbg_printk(TPACPI_DBG_INIT, "PSC not support on Intel platforms\n");
-                       return -ENODEV;
-               }
                pr_debug("PSC is supported\n");
        } else {
                dbg_printk(TPACPI_DBG_INIT, "No DYTC support available\n");
@@ -11646,6 +11657,9 @@ MODULE_PARM_DESC(uwb_state,
                 "Initial state of the emulated UWB switch");
 #endif
 
+module_param(profile_force, int, 0444);
+MODULE_PARM_DESC(profile_force, "Force profile mode. -1=off, 1=MMC, 2=PSC");
+
 static void thinkpad_acpi_module_exit(void)
 {
        struct ibm_struct *ibm, *itmp;
index 13802a3..68e66b6 100644 (file)
@@ -336,6 +336,22 @@ static const struct ts_dmi_data dexp_ursus_7w_data = {
        .properties     = dexp_ursus_7w_props,
 };
 
+static const struct property_entry dexp_ursus_kx210i_props[] = {
+       PROPERTY_ENTRY_U32("touchscreen-min-x", 5),
+       PROPERTY_ENTRY_U32("touchscreen-min-y",  2),
+       PROPERTY_ENTRY_U32("touchscreen-size-x", 1720),
+       PROPERTY_ENTRY_U32("touchscreen-size-y", 1137),
+       PROPERTY_ENTRY_STRING("firmware-name", "gsl1680-dexp-ursus-kx210i.fw"),
+       PROPERTY_ENTRY_U32("silead,max-fingers", 10),
+       PROPERTY_ENTRY_BOOL("silead,home-button"),
+       { }
+};
+
+static const struct ts_dmi_data dexp_ursus_kx210i_data = {
+       .acpi_name      = "MSSL1680:00",
+       .properties     = dexp_ursus_kx210i_props,
+};
+
 static const struct property_entry digma_citi_e200_props[] = {
        PROPERTY_ENTRY_U32("touchscreen-size-x", 1980),
        PROPERTY_ENTRY_U32("touchscreen-size-y", 1500),
@@ -378,6 +394,11 @@ static const struct ts_dmi_data gdix1001_01_upside_down_data = {
        .properties     = gdix1001_upside_down_props,
 };
 
+static const struct ts_dmi_data gdix1002_00_upside_down_data = {
+       .acpi_name      = "GDIX1002:00",
+       .properties     = gdix1001_upside_down_props,
+};
+
 static const struct property_entry gp_electronic_t701_props[] = {
        PROPERTY_ENTRY_U32("touchscreen-size-x", 960),
        PROPERTY_ENTRY_U32("touchscreen-size-y", 640),
@@ -1186,6 +1207,14 @@ const struct dmi_system_id touchscreen_dmi_table[] = {
                },
        },
        {
+               /* DEXP Ursus KX210i */
+               .driver_data = (void *)&dexp_ursus_kx210i_data,
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "INSYDE Corp."),
+                       DMI_MATCH(DMI_PRODUCT_NAME, "S107I"),
+               },
+       },
+       {
                /* Digma Citi E200 */
                .driver_data = (void *)&digma_citi_e200_data,
                .matches = {
@@ -1296,6 +1325,18 @@ const struct dmi_system_id touchscreen_dmi_table[] = {
                },
        },
        {
+               /* Juno Tablet */
+               .driver_data = (void *)&gdix1002_00_upside_down_data,
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "Default string"),
+                       /* Both product- and board-name being "Default string" is somewhat rare */
+                       DMI_MATCH(DMI_PRODUCT_NAME, "Default string"),
+                       DMI_MATCH(DMI_BOARD_NAME, "Default string"),
+                       /* Above matches are too generic, add partial bios-version match */
+                       DMI_MATCH(DMI_BIOS_VERSION, "JP2V1."),
+               },
+       },
+       {
                /* Mediacom WinPad 7.0 W700 (same hw as Wintron surftab 7") */
                .driver_data = (void *)&trekstor_surftab_wintron70_data,
                .matches = {
index 307ee6f..6f83e99 100644 (file)
@@ -624,10 +624,8 @@ static int ab8500_btemp_get_ext_psy_data(struct device *dev, void *data)
  */
 static void ab8500_btemp_external_power_changed(struct power_supply *psy)
 {
-       struct ab8500_btemp *di = power_supply_get_drvdata(psy);
-
-       class_for_each_device(power_supply_class, NULL,
-               di->btemp_psy, ab8500_btemp_get_ext_psy_data);
+       class_for_each_device(power_supply_class, NULL, psy,
+                             ab8500_btemp_get_ext_psy_data);
 }
 
 /* ab8500 btemp driver interrupts and their respective isr */
index 41a7bff..53560fb 100644 (file)
@@ -2407,10 +2407,8 @@ out:
  */
 static void ab8500_fg_external_power_changed(struct power_supply *psy)
 {
-       struct ab8500_fg *di = power_supply_get_drvdata(psy);
-
-       class_for_each_device(power_supply_class, NULL,
-               di->fg_psy, ab8500_fg_get_ext_psy_data);
+       class_for_each_device(power_supply_class, NULL, psy,
+                             ab8500_fg_get_ext_psy_data);
 }
 
 /**
index 05f4131..3be6f3b 100644 (file)
@@ -507,7 +507,7 @@ static void fuel_gauge_external_power_changed(struct power_supply *psy)
        mutex_lock(&info->lock);
        info->valid = 0; /* Force updating of the cached registers */
        mutex_unlock(&info->lock);
-       power_supply_changed(info->bat);
+       power_supply_changed(psy);
 }
 
 static struct power_supply_desc fuel_gauge_desc = {
index de67b98..dc33f00 100644 (file)
@@ -1262,6 +1262,7 @@ static void bq24190_input_current_limit_work(struct work_struct *work)
        bq24190_charger_set_property(bdi->charger,
                                     POWER_SUPPLY_PROP_INPUT_CURRENT_LIMIT,
                                     &val);
+       power_supply_changed(bdi->charger);
 }
 
 /* Sync the input-current-limit with our parent supply (if we have one) */
index 22cde35..f8636cf 100644 (file)
@@ -750,7 +750,7 @@ static void bq25890_charger_external_power_changed(struct power_supply *psy)
        if (bq->chip_version != BQ25892)
                return;
 
-       ret = power_supply_get_property_from_supplier(bq->charger,
+       ret = power_supply_get_property_from_supplier(psy,
                                                      POWER_SUPPLY_PROP_USB_TYPE,
                                                      &val);
        if (ret)
@@ -775,6 +775,7 @@ static void bq25890_charger_external_power_changed(struct power_supply *psy)
        }
 
        bq25890_field_write(bq, F_IINLIM, input_current_limit);
+       power_supply_changed(psy);
 }
 
 static int bq25890_get_chip_state(struct bq25890_device *bq,
@@ -1106,6 +1107,8 @@ static void bq25890_pump_express_work(struct work_struct *data)
        dev_info(bq->dev, "Hi-voltage charging requested, input voltage is %d mV\n",
                 voltage);
 
+       power_supply_changed(bq->charger);
+
        return;
 error_print:
        bq25890_field_write(bq, F_PUMPX_EN, 0);
index 5ff6f44..4296600 100644 (file)
@@ -1083,10 +1083,8 @@ static int poll_interval_param_set(const char *val, const struct kernel_param *k
                return ret;
 
        mutex_lock(&bq27xxx_list_lock);
-       list_for_each_entry(di, &bq27xxx_battery_devices, list) {
-               cancel_delayed_work_sync(&di->work);
-               schedule_delayed_work(&di->work, 0);
-       }
+       list_for_each_entry(di, &bq27xxx_battery_devices, list)
+               mod_delayed_work(system_wq, &di->work, 0);
        mutex_unlock(&bq27xxx_list_lock);
 
        return ret;
@@ -1761,60 +1759,6 @@ static int bq27xxx_battery_read_health(struct bq27xxx_device_info *di)
        return POWER_SUPPLY_HEALTH_GOOD;
 }
 
-void bq27xxx_battery_update(struct bq27xxx_device_info *di)
-{
-       struct bq27xxx_reg_cache cache = {0, };
-       bool has_singe_flag = di->opts & BQ27XXX_O_ZERO;
-
-       cache.flags = bq27xxx_read(di, BQ27XXX_REG_FLAGS, has_singe_flag);
-       if ((cache.flags & 0xff) == 0xff)
-               cache.flags = -1; /* read error */
-       if (cache.flags >= 0) {
-               cache.temperature = bq27xxx_battery_read_temperature(di);
-               if (di->regs[BQ27XXX_REG_TTE] != INVALID_REG_ADDR)
-                       cache.time_to_empty = bq27xxx_battery_read_time(di, BQ27XXX_REG_TTE);
-               if (di->regs[BQ27XXX_REG_TTECP] != INVALID_REG_ADDR)
-                       cache.time_to_empty_avg = bq27xxx_battery_read_time(di, BQ27XXX_REG_TTECP);
-               if (di->regs[BQ27XXX_REG_TTF] != INVALID_REG_ADDR)
-                       cache.time_to_full = bq27xxx_battery_read_time(di, BQ27XXX_REG_TTF);
-
-               cache.charge_full = bq27xxx_battery_read_fcc(di);
-               cache.capacity = bq27xxx_battery_read_soc(di);
-               if (di->regs[BQ27XXX_REG_AE] != INVALID_REG_ADDR)
-                       cache.energy = bq27xxx_battery_read_energy(di);
-               di->cache.flags = cache.flags;
-               cache.health = bq27xxx_battery_read_health(di);
-               if (di->regs[BQ27XXX_REG_CYCT] != INVALID_REG_ADDR)
-                       cache.cycle_count = bq27xxx_battery_read_cyct(di);
-
-               /* We only have to read charge design full once */
-               if (di->charge_design_full <= 0)
-                       di->charge_design_full = bq27xxx_battery_read_dcap(di);
-       }
-
-       if ((di->cache.capacity != cache.capacity) ||
-           (di->cache.flags != cache.flags))
-               power_supply_changed(di->bat);
-
-       if (memcmp(&di->cache, &cache, sizeof(cache)) != 0)
-               di->cache = cache;
-
-       di->last_update = jiffies;
-}
-EXPORT_SYMBOL_GPL(bq27xxx_battery_update);
-
-static void bq27xxx_battery_poll(struct work_struct *work)
-{
-       struct bq27xxx_device_info *di =
-                       container_of(work, struct bq27xxx_device_info,
-                                    work.work);
-
-       bq27xxx_battery_update(di);
-
-       if (poll_interval > 0)
-               schedule_delayed_work(&di->work, poll_interval * HZ);
-}
-
 static bool bq27xxx_battery_is_full(struct bq27xxx_device_info *di, int flags)
 {
        if (di->opts & BQ27XXX_O_ZERO)
@@ -1833,7 +1777,8 @@ static bool bq27xxx_battery_is_full(struct bq27xxx_device_info *di, int flags)
 static int bq27xxx_battery_current_and_status(
        struct bq27xxx_device_info *di,
        union power_supply_propval *val_curr,
-       union power_supply_propval *val_status)
+       union power_supply_propval *val_status,
+       struct bq27xxx_reg_cache *cache)
 {
        bool single_flags = (di->opts & BQ27XXX_O_ZERO);
        int curr;
@@ -1845,10 +1790,14 @@ static int bq27xxx_battery_current_and_status(
                return curr;
        }
 
-       flags = bq27xxx_read(di, BQ27XXX_REG_FLAGS, single_flags);
-       if (flags < 0) {
-               dev_err(di->dev, "error reading flags\n");
-               return flags;
+       if (cache) {
+               flags = cache->flags;
+       } else {
+               flags = bq27xxx_read(di, BQ27XXX_REG_FLAGS, single_flags);
+               if (flags < 0) {
+                       dev_err(di->dev, "error reading flags\n");
+                       return flags;
+               }
        }
 
        if (di->opts & BQ27XXX_O_ZERO) {
@@ -1883,6 +1832,78 @@ static int bq27xxx_battery_current_and_status(
        return 0;
 }
 
+static void bq27xxx_battery_update_unlocked(struct bq27xxx_device_info *di)
+{
+       union power_supply_propval status = di->last_status;
+       struct bq27xxx_reg_cache cache = {0, };
+       bool has_singe_flag = di->opts & BQ27XXX_O_ZERO;
+
+       cache.flags = bq27xxx_read(di, BQ27XXX_REG_FLAGS, has_singe_flag);
+       if ((cache.flags & 0xff) == 0xff)
+               cache.flags = -1; /* read error */
+       if (cache.flags >= 0) {
+               cache.temperature = bq27xxx_battery_read_temperature(di);
+               if (di->regs[BQ27XXX_REG_TTE] != INVALID_REG_ADDR)
+                       cache.time_to_empty = bq27xxx_battery_read_time(di, BQ27XXX_REG_TTE);
+               if (di->regs[BQ27XXX_REG_TTECP] != INVALID_REG_ADDR)
+                       cache.time_to_empty_avg = bq27xxx_battery_read_time(di, BQ27XXX_REG_TTECP);
+               if (di->regs[BQ27XXX_REG_TTF] != INVALID_REG_ADDR)
+                       cache.time_to_full = bq27xxx_battery_read_time(di, BQ27XXX_REG_TTF);
+
+               cache.charge_full = bq27xxx_battery_read_fcc(di);
+               cache.capacity = bq27xxx_battery_read_soc(di);
+               if (di->regs[BQ27XXX_REG_AE] != INVALID_REG_ADDR)
+                       cache.energy = bq27xxx_battery_read_energy(di);
+               di->cache.flags = cache.flags;
+               cache.health = bq27xxx_battery_read_health(di);
+               if (di->regs[BQ27XXX_REG_CYCT] != INVALID_REG_ADDR)
+                       cache.cycle_count = bq27xxx_battery_read_cyct(di);
+
+               /*
+                * On gauges with signed current reporting the current must be
+                * checked to detect charging <-> discharging status changes.
+                */
+               if (!(di->opts & BQ27XXX_O_ZERO))
+                       bq27xxx_battery_current_and_status(di, NULL, &status, &cache);
+
+               /* We only have to read charge design full once */
+               if (di->charge_design_full <= 0)
+                       di->charge_design_full = bq27xxx_battery_read_dcap(di);
+       }
+
+       if ((di->cache.capacity != cache.capacity) ||
+           (di->cache.flags != cache.flags) ||
+           (di->last_status.intval != status.intval)) {
+               di->last_status.intval = status.intval;
+               power_supply_changed(di->bat);
+       }
+
+       if (memcmp(&di->cache, &cache, sizeof(cache)) != 0)
+               di->cache = cache;
+
+       di->last_update = jiffies;
+
+       if (!di->removed && poll_interval > 0)
+               mod_delayed_work(system_wq, &di->work, poll_interval * HZ);
+}
+
+void bq27xxx_battery_update(struct bq27xxx_device_info *di)
+{
+       mutex_lock(&di->lock);
+       bq27xxx_battery_update_unlocked(di);
+       mutex_unlock(&di->lock);
+}
+EXPORT_SYMBOL_GPL(bq27xxx_battery_update);
+
+static void bq27xxx_battery_poll(struct work_struct *work)
+{
+       struct bq27xxx_device_info *di =
+                       container_of(work, struct bq27xxx_device_info,
+                                    work.work);
+
+       bq27xxx_battery_update(di);
+}
+
 /*
  * Get the average power in µW
  * Return < 0 if something fails.
@@ -1985,10 +2006,8 @@ static int bq27xxx_battery_get_property(struct power_supply *psy,
        struct bq27xxx_device_info *di = power_supply_get_drvdata(psy);
 
        mutex_lock(&di->lock);
-       if (time_is_before_jiffies(di->last_update + 5 * HZ)) {
-               cancel_delayed_work_sync(&di->work);
-               bq27xxx_battery_poll(&di->work.work);
-       }
+       if (time_is_before_jiffies(di->last_update + 5 * HZ))
+               bq27xxx_battery_update_unlocked(di);
        mutex_unlock(&di->lock);
 
        if (psp != POWER_SUPPLY_PROP_PRESENT && di->cache.flags < 0)
@@ -1996,7 +2015,7 @@ static int bq27xxx_battery_get_property(struct power_supply *psy,
 
        switch (psp) {
        case POWER_SUPPLY_PROP_STATUS:
-               ret = bq27xxx_battery_current_and_status(di, NULL, val);
+               ret = bq27xxx_battery_current_and_status(di, NULL, val, NULL);
                break;
        case POWER_SUPPLY_PROP_VOLTAGE_NOW:
                ret = bq27xxx_battery_voltage(di, val);
@@ -2005,7 +2024,7 @@ static int bq27xxx_battery_get_property(struct power_supply *psy,
                val->intval = di->cache.flags < 0 ? 0 : 1;
                break;
        case POWER_SUPPLY_PROP_CURRENT_NOW:
-               ret = bq27xxx_battery_current_and_status(di, val, NULL);
+               ret = bq27xxx_battery_current_and_status(di, val, NULL, NULL);
                break;
        case POWER_SUPPLY_PROP_CAPACITY:
                ret = bq27xxx_simple_value(di->cache.capacity, val);
@@ -2078,8 +2097,8 @@ static void bq27xxx_external_power_changed(struct power_supply *psy)
 {
        struct bq27xxx_device_info *di = power_supply_get_drvdata(psy);
 
-       cancel_delayed_work_sync(&di->work);
-       schedule_delayed_work(&di->work, 0);
+       /* After charger plug in/out wait 0.5s for things to stabilize */
+       mod_delayed_work(system_wq, &di->work, HZ / 2);
 }
 
 int bq27xxx_battery_setup(struct bq27xxx_device_info *di)
@@ -2127,22 +2146,18 @@ EXPORT_SYMBOL_GPL(bq27xxx_battery_setup);
 
 void bq27xxx_battery_teardown(struct bq27xxx_device_info *di)
 {
-       /*
-        * power_supply_unregister call bq27xxx_battery_get_property which
-        * call bq27xxx_battery_poll.
-        * Make sure that bq27xxx_battery_poll will not call
-        * schedule_delayed_work again after unregister (which cause OOPS).
-        */
-       poll_interval = 0;
-
-       cancel_delayed_work_sync(&di->work);
-
-       power_supply_unregister(di->bat);
-
        mutex_lock(&bq27xxx_list_lock);
        list_del(&di->list);
        mutex_unlock(&bq27xxx_list_lock);
 
+       /* Set removed to avoid bq27xxx_battery_update() re-queuing the work */
+       mutex_lock(&di->lock);
+       di->removed = true;
+       mutex_unlock(&di->lock);
+
+       cancel_delayed_work_sync(&di->work);
+
+       power_supply_unregister(di->bat);
        mutex_destroy(&di->lock);
 }
 EXPORT_SYMBOL_GPL(bq27xxx_battery_teardown);
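The reordered teardown above closes a race with the polling work: the removed flag is set under the lock before cancel_delayed_work_sync(), so an update that is already running cannot re-arm the delayed work after the cancel returns; the mod_delayed_work() calls likewise replace the earlier cancel-then-schedule pairs. A compact sketch of the guard, with hypothetical names:

        #include <linux/jiffies.h>
        #include <linux/mutex.h>
        #include <linux/workqueue.h>

        /* Hypothetical driver state, mirroring the pattern above. */
        struct foo {
                struct mutex lock;
                struct delayed_work work;
                bool removed;
        };

        static unsigned int poll_interval = 360;        /* seconds; stand-in for the module param */

        static void foo_update_unlocked(struct foo *f)
        {
                /* ... read hardware, refresh cached state ... */

                /* Only re-arm while the device is still registered. */
                if (!f->removed && poll_interval > 0)
                        mod_delayed_work(system_wq, &f->work, poll_interval * HZ);
        }

        static void foo_poll(struct work_struct *work)
        {
                struct foo *f = container_of(work, struct foo, work.work);

                mutex_lock(&f->lock);
                foo_update_unlocked(f);
                mutex_unlock(&f->lock);
        }

        static void foo_teardown(struct foo *f)
        {
                /* Stop re-queuing first, then wait for a running poll to finish. */
                mutex_lock(&f->lock);
                f->removed = true;
                mutex_unlock(&f->lock);

                cancel_delayed_work_sync(&f->work);
        }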
index f876899..6d3c748 100644 (file)
@@ -179,7 +179,7 @@ static int bq27xxx_battery_i2c_probe(struct i2c_client *client)
        i2c_set_clientdata(client, di);
 
        if (client->irq) {
-               ret = devm_request_threaded_irq(&client->dev, client->irq,
+               ret = request_threaded_irq(client->irq,
                                NULL, bq27xxx_battery_irq_handler_thread,
                                IRQF_ONESHOT,
                                di->name, di);
@@ -209,6 +209,7 @@ static void bq27xxx_battery_i2c_remove(struct i2c_client *client)
 {
        struct bq27xxx_device_info *di = i2c_get_clientdata(client);
 
+       free_irq(client->irq, di);
        bq27xxx_battery_teardown(di);
 
        mutex_lock(&battery_mutex);
index 92e48e3..1305cba 100644 (file)
@@ -796,7 +796,9 @@ static int mt6360_charger_probe(struct platform_device *pdev)
        mci->vinovp = 6500000;
        mutex_init(&mci->chgdet_lock);
        platform_set_drvdata(pdev, mci);
-       devm_work_autocancel(&pdev->dev, &mci->chrdet_work, mt6360_chrdet_work);
+       ret = devm_work_autocancel(&pdev->dev, &mci->chrdet_work, mt6360_chrdet_work);
+       if (ret)
+               return dev_err_probe(&pdev->dev, ret, "Failed to set delayed work\n");
 
        ret = device_property_read_u32(&pdev->dev, "richtek,vinovp-microvolt", &mci->vinovp);
        if (ret)
index ab986db..3791aec 100644 (file)
@@ -348,6 +348,10 @@ static int __power_supply_is_system_supplied(struct device *dev, void *data)
        struct power_supply *psy = dev_get_drvdata(dev);
        unsigned int *count = data;
 
+       if (!psy->desc->get_property(psy, POWER_SUPPLY_PROP_SCOPE, &ret))
+               if (ret.intval == POWER_SUPPLY_SCOPE_DEVICE)
+                       return 0;
+
        (*count)++;
        if (psy->desc->type != POWER_SUPPLY_TYPE_BATTERY)
                if (!psy->desc->get_property(psy, POWER_SUPPLY_PROP_ONLINE,
@@ -366,8 +370,8 @@ int power_supply_is_system_supplied(void)
                                      __power_supply_is_system_supplied);
 
        /*
-        * If no power class device was found at all, most probably we are
-        * running on a desktop system, so assume we are on mains power.
+        * If no system scope power class device was found at all, most probably we
+        * are running on a desktop system, so assume we are on mains power.
         */
        if (count == 0)
                return 1;
@@ -573,7 +577,7 @@ int power_supply_get_battery_info(struct power_supply *psy,
        struct power_supply_battery_info *info;
        struct device_node *battery_np = NULL;
        struct fwnode_reference_args args;
-       struct fwnode_handle *fwnode;
+       struct fwnode_handle *fwnode = NULL;
        const char *value;
        int err, len, index;
        const __be32 *list;
@@ -585,7 +589,7 @@ int power_supply_get_battery_info(struct power_supply *psy,
                        return -ENODEV;
 
                fwnode = fwnode_handle_get(of_fwnode_handle(battery_np));
-       } else {
+       } else if (psy->dev.parent) {
                err = fwnode_property_get_reference_args(
                                        dev_fwnode(psy->dev.parent),
                                        "monitored-battery", NULL, 0, 0, &args);
@@ -595,6 +599,9 @@ int power_supply_get_battery_info(struct power_supply *psy,
                fwnode = args.fwnode;
        }
 
+       if (!fwnode)
+               return -ENOENT;
+
        err = fwnode_property_read_string(fwnode, "compatible", &value);
        if (err)
                goto out_put_node;
index 702bf83..0674483 100644 (file)
@@ -35,8 +35,9 @@ static void power_supply_update_bat_leds(struct power_supply *psy)
                led_trigger_event(psy->charging_full_trig, LED_FULL);
                led_trigger_event(psy->charging_trig, LED_OFF);
                led_trigger_event(psy->full_trig, LED_FULL);
-               led_trigger_event(psy->charging_blink_full_solid_trig,
-                       LED_FULL);
+               /* Going from blink to LED on requires a LED_OFF event to stop blink */
+               led_trigger_event(psy->charging_blink_full_solid_trig, LED_OFF);
+               led_trigger_event(psy->charging_blink_full_solid_trig, LED_FULL);
                break;
        case POWER_SUPPLY_STATUS_CHARGING:
                led_trigger_event(psy->charging_full_trig, LED_FULL);
index ba3b125..06e5b6b 100644 (file)
@@ -286,7 +286,8 @@ static ssize_t power_supply_show_property(struct device *dev,
 
                if (ret < 0) {
                        if (ret == -ENODATA)
-                               dev_dbg(dev, "driver has no data for `%s' property\n",
+                               dev_dbg_ratelimited(dev,
+                                       "driver has no data for `%s' property\n",
                                        attr->attr.name);
                        else if (ret != -ENODEV && ret != -EAGAIN)
                                dev_err_ratelimited(dev,
index 73f744a..ea33693 100644 (file)
@@ -1023,7 +1023,7 @@ static int rt9467_request_interrupt(struct rt9467_chg_data *data)
        for (i = 0; i < num_chg_irqs; i++) {
                virq = regmap_irq_get_virq(data->irq_chip_data, chg_irqs[i].hwirq);
                if (virq <= 0)
-                       return dev_err_probe(dev, virq, "Failed to get (%s) irq\n",
+                       return dev_err_probe(dev, -EINVAL, "Failed to get (%s) irq\n",
                                             chg_irqs[i].name);
 
                ret = devm_request_threaded_irq(dev, virq, NULL, chg_irqs[i].handler,
index 75ebcbf..a14e89a 100644 (file)
@@ -24,7 +24,7 @@
 #define SBS_CHARGER_REG_STATUS                 0x13
 #define SBS_CHARGER_REG_ALARM_WARNING          0x16
 
-#define SBS_CHARGER_STATUS_CHARGE_INHIBITED    BIT(1)
+#define SBS_CHARGER_STATUS_CHARGE_INHIBITED    BIT(0)
 #define SBS_CHARGER_STATUS_RES_COLD            BIT(9)
 #define SBS_CHARGER_STATUS_RES_HOT             BIT(10)
 #define SBS_CHARGER_STATUS_BATTERY_PRESENT     BIT(14)
index 632977f..bd23c4d 100644 (file)
@@ -733,13 +733,6 @@ static int sc27xx_fgu_set_property(struct power_supply *psy,
        return ret;
 }
 
-static void sc27xx_fgu_external_power_changed(struct power_supply *psy)
-{
-       struct sc27xx_fgu_data *data = power_supply_get_drvdata(psy);
-
-       power_supply_changed(data->battery);
-}
-
 static int sc27xx_fgu_property_is_writeable(struct power_supply *psy,
                                            enum power_supply_property psp)
 {
@@ -774,7 +767,7 @@ static const struct power_supply_desc sc27xx_fgu_desc = {
        .num_properties         = ARRAY_SIZE(sc27xx_fgu_props),
        .get_property           = sc27xx_fgu_get_property,
        .set_property           = sc27xx_fgu_set_property,
-       .external_power_changed = sc27xx_fgu_external_power_changed,
+       .external_power_changed = power_supply_changed,
        .property_is_writeable  = sc27xx_fgu_property_is_writeable,
        .no_thermal             = true,
 };
index 90d33cd..69ef8d0 100644 (file)
@@ -18,10 +18,12 @@ if POWERCAP
 # Client driver configurations go here.
 config INTEL_RAPL_CORE
        tristate
+       depends on PCI
+       select IOSF_MBI
 
 config INTEL_RAPL
        tristate "Intel RAPL Support via MSR Interface"
-       depends on X86 && IOSF_MBI
+       depends on X86 && PCI
        select INTEL_RAPL_CORE
        help
          This enables support for the Intel Running Average Power Limit (RAPL)
@@ -33,6 +35,20 @@ config INTEL_RAPL
          controller, CPU core (Power Plane 0), graphics uncore (Power Plane
          1), etc.
 
+config INTEL_RAPL_TPMI
+       tristate "Intel RAPL Support via TPMI Interface"
+       depends on X86
+       depends on INTEL_TPMI
+       select INTEL_RAPL_CORE
+       help
+         This enables support for the Intel Running Average Power Limit (RAPL)
+         technology via TPMI interface, which allows power limits to be enforced
+         and monitored.
+
+         In RAPL, the platform level settings are divided into domains for
+         fine grained control. These domains include processor package, DRAM
+         controller, platform, etc.
+
 config IDLE_INJECT
        bool "Idle injection framework"
        depends on CPU_IDLE
index 4474201..5ab0dce 100644 (file)
@@ -5,5 +5,6 @@ obj-$(CONFIG_DTPM_DEVFREQ) += dtpm_devfreq.o
 obj-$(CONFIG_POWERCAP) += powercap_sys.o
 obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o
 obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o
+obj-$(CONFIG_INTEL_RAPL_TPMI) += intel_rapl_tpmi.o
 obj-$(CONFIG_IDLE_INJECT) += idle_inject.o
 obj-$(CONFIG_ARM_SCMI_POWERCAP) += arm_scmi_powercap.o
index 8970c7b..4e646e5 100644 (file)
 #define PSYS_TIME_WINDOW1_MASK       (0x7FULL<<19)
 #define PSYS_TIME_WINDOW2_MASK       (0x7FULL<<51)
 
+/* bitmasks for RAPL TPMI, used by primitive access functions */
+#define TPMI_POWER_LIMIT_MASK  0x3FFFF
+#define TPMI_POWER_LIMIT_ENABLE        BIT_ULL(62)
+#define TPMI_TIME_WINDOW_MASK  (0x7FULL<<18)
+#define TPMI_INFO_SPEC_MASK    0x3FFFF
+#define TPMI_INFO_MIN_MASK     (0x3FFFFULL << 18)
+#define TPMI_INFO_MAX_MASK     (0x3FFFFULL << 36)
+#define TPMI_INFO_MAX_TIME_WIN_MASK    (0x7FULL << 54)
+
 /* Non HW constants */
 #define RAPL_PRIMITIVE_DERIVED       BIT(1)    /* not from raw data */
 #define RAPL_PRIMITIVE_DUMMY         BIT(2)
@@ -94,26 +103,120 @@ enum unit_type {
 
 #define        DOMAIN_STATE_INACTIVE           BIT(0)
 #define        DOMAIN_STATE_POWER_LIMIT_SET    BIT(1)
-#define DOMAIN_STATE_BIOS_LOCKED        BIT(2)
 
-static const char pl1_name[] = "long_term";
-static const char pl2_name[] = "short_term";
-static const char pl4_name[] = "peak_power";
+static const char *pl_names[NR_POWER_LIMITS] = {
+       [POWER_LIMIT1] = "long_term",
+       [POWER_LIMIT2] = "short_term",
+       [POWER_LIMIT4] = "peak_power",
+};
+
+enum pl_prims {
+       PL_ENABLE,
+       PL_CLAMP,
+       PL_LIMIT,
+       PL_TIME_WINDOW,
+       PL_MAX_POWER,
+       PL_LOCK,
+};
+
+static bool is_pl_valid(struct rapl_domain *rd, int pl)
+{
+       if (pl < POWER_LIMIT1 || pl > POWER_LIMIT4)
+               return false;
+       return rd->rpl[pl].name ? true : false;
+}
+
+static int get_pl_lock_prim(struct rapl_domain *rd, int pl)
+{
+       if (rd->rp->priv->type == RAPL_IF_TPMI) {
+               if (pl == POWER_LIMIT1)
+                       return PL1_LOCK;
+               if (pl == POWER_LIMIT2)
+                       return PL2_LOCK;
+               if (pl == POWER_LIMIT4)
+                       return PL4_LOCK;
+       }
+
+       /* MSR/MMIO Interface doesn't have Lock bit for PL4 */
+       if (pl == POWER_LIMIT4)
+               return -EINVAL;
+
+       /*
+        * Power Limit register that supports two power limits has a different
+        * bit position for the Lock bit.
+        */
+       if (rd->rp->priv->limits[rd->id] & BIT(POWER_LIMIT2))
+               return FW_HIGH_LOCK;
+       return FW_LOCK;
+}
+
+static int get_pl_prim(struct rapl_domain *rd, int pl, enum pl_prims prim)
+{
+       switch (pl) {
+       case POWER_LIMIT1:
+               if (prim == PL_ENABLE)
+                       return PL1_ENABLE;
+               if (prim == PL_CLAMP && rd->rp->priv->type != RAPL_IF_TPMI)
+                       return PL1_CLAMP;
+               if (prim == PL_LIMIT)
+                       return POWER_LIMIT1;
+               if (prim == PL_TIME_WINDOW)
+                       return TIME_WINDOW1;
+               if (prim == PL_MAX_POWER)
+                       return THERMAL_SPEC_POWER;
+               if (prim == PL_LOCK)
+                       return get_pl_lock_prim(rd, pl);
+               return -EINVAL;
+       case POWER_LIMIT2:
+               if (prim == PL_ENABLE)
+                       return PL2_ENABLE;
+               if (prim == PL_CLAMP && rd->rp->priv->type != RAPL_IF_TPMI)
+                       return PL2_CLAMP;
+               if (prim == PL_LIMIT)
+                       return POWER_LIMIT2;
+               if (prim == PL_TIME_WINDOW)
+                       return TIME_WINDOW2;
+               if (prim == PL_MAX_POWER)
+                       return MAX_POWER;
+               if (prim == PL_LOCK)
+                       return get_pl_lock_prim(rd, pl);
+               return -EINVAL;
+       case POWER_LIMIT4:
+               if (prim == PL_LIMIT)
+                       return POWER_LIMIT4;
+               if (prim == PL_ENABLE)
+                       return PL4_ENABLE;
+               /* PL4 would be around two times PL2, use same prim as PL2. */
+               if (prim == PL_MAX_POWER)
+                       return MAX_POWER;
+               if (prim == PL_LOCK)
+                       return get_pl_lock_prim(rd, pl);
+               return -EINVAL;
+       default:
+               return -EINVAL;
+       }
+}
 
 #define power_zone_to_rapl_domain(_zone) \
        container_of(_zone, struct rapl_domain, power_zone)
 
 struct rapl_defaults {
        u8 floor_freq_reg_addr;
-       int (*check_unit)(struct rapl_package *rp, int cpu);
+       int (*check_unit)(struct rapl_domain *rd);
        void (*set_floor_freq)(struct rapl_domain *rd, bool mode);
-       u64 (*compute_time_window)(struct rapl_package *rp, u64 val,
+       u64 (*compute_time_window)(struct rapl_domain *rd, u64 val,
                                    bool to_raw);
        unsigned int dram_domain_energy_unit;
        unsigned int psys_domain_energy_unit;
        bool spr_psys_bits;
 };
-static struct rapl_defaults *rapl_defaults;
+static struct rapl_defaults *defaults_msr;
+static const struct rapl_defaults defaults_tpmi;
+
+static struct rapl_defaults *get_defaults(struct rapl_package *rp)
+{
+       return rp->priv->defaults;
+}
 
 /* Sideband MBI registers */
 #define IOSF_CPU_POWER_BUDGET_CTL_BYT (0x2)
@@ -150,6 +253,12 @@ static int rapl_read_data_raw(struct rapl_domain *rd,
 static int rapl_write_data_raw(struct rapl_domain *rd,
                               enum rapl_primitives prim,
                               unsigned long long value);
+static int rapl_read_pl_data(struct rapl_domain *rd, int pl,
+                             enum pl_prims pl_prim,
+                             bool xlate, u64 *data);
+static int rapl_write_pl_data(struct rapl_domain *rd, int pl,
+                              enum pl_prims pl_prim,
+                              unsigned long long value);
 static u64 rapl_unit_xlate(struct rapl_domain *rd,
                           enum unit_type type, u64 value, int to_raw);
 static void package_power_limit_irq_save(struct rapl_package *rp);
@@ -217,7 +326,7 @@ static int find_nr_power_limit(struct rapl_domain *rd)
        int i, nr_pl = 0;
 
        for (i = 0; i < NR_POWER_LIMITS; i++) {
-               if (rd->rpl[i].name)
+               if (is_pl_valid(rd, i))
                        nr_pl++;
        }
 
@@ -227,37 +336,35 @@ static int find_nr_power_limit(struct rapl_domain *rd)
 static int set_domain_enable(struct powercap_zone *power_zone, bool mode)
 {
        struct rapl_domain *rd = power_zone_to_rapl_domain(power_zone);
-
-       if (rd->state & DOMAIN_STATE_BIOS_LOCKED)
-               return -EACCES;
+       struct rapl_defaults *defaults = get_defaults(rd->rp);
+       int ret;
 
        cpus_read_lock();
-       rapl_write_data_raw(rd, PL1_ENABLE, mode);
-       if (rapl_defaults->set_floor_freq)
-               rapl_defaults->set_floor_freq(rd, mode);
+       ret = rapl_write_pl_data(rd, POWER_LIMIT1, PL_ENABLE, mode);
+       if (!ret && defaults->set_floor_freq)
+               defaults->set_floor_freq(rd, mode);
        cpus_read_unlock();
 
-       return 0;
+       return ret;
 }
 
 static int get_domain_enable(struct powercap_zone *power_zone, bool *mode)
 {
        struct rapl_domain *rd = power_zone_to_rapl_domain(power_zone);
        u64 val;
+       int ret;
 
-       if (rd->state & DOMAIN_STATE_BIOS_LOCKED) {
+       if (rd->rpl[POWER_LIMIT1].locked) {
                *mode = false;
                return 0;
        }
        cpus_read_lock();
-       if (rapl_read_data_raw(rd, PL1_ENABLE, true, &val)) {
-               cpus_read_unlock();
-               return -EIO;
-       }
-       *mode = val;
+       ret = rapl_read_pl_data(rd, POWER_LIMIT1, PL_ENABLE, true, &val);
+       if (!ret)
+               *mode = val;
        cpus_read_unlock();
 
-       return 0;
+       return ret;
 }
 
 /* per RAPL domain ops, in the order of rapl_domain_type */
@@ -313,8 +420,8 @@ static int contraint_to_pl(struct rapl_domain *rd, int cid)
 {
        int i, j;
 
-       for (i = 0, j = 0; i < NR_POWER_LIMITS; i++) {
-               if ((rd->rpl[i].name) && j++ == cid) {
+       for (i = POWER_LIMIT1, j = 0; i < NR_POWER_LIMITS; i++) {
+               if (is_pl_valid(rd, i) && j++ == cid) {
                        pr_debug("%s: index %d\n", __func__, i);
                        return i;
                }
@@ -335,36 +442,11 @@ static int set_power_limit(struct powercap_zone *power_zone, int cid,
        cpus_read_lock();
        rd = power_zone_to_rapl_domain(power_zone);
        id = contraint_to_pl(rd, cid);
-       if (id < 0) {
-               ret = id;
-               goto set_exit;
-       }
-
        rp = rd->rp;
 
-       if (rd->state & DOMAIN_STATE_BIOS_LOCKED) {
-               dev_warn(&power_zone->dev,
-                        "%s locked by BIOS, monitoring only\n", rd->name);
-               ret = -EACCES;
-               goto set_exit;
-       }
-
-       switch (rd->rpl[id].prim_id) {
-       case PL1_ENABLE:
-               rapl_write_data_raw(rd, POWER_LIMIT1, power_limit);
-               break;
-       case PL2_ENABLE:
-               rapl_write_data_raw(rd, POWER_LIMIT2, power_limit);
-               break;
-       case PL4_ENABLE:
-               rapl_write_data_raw(rd, POWER_LIMIT4, power_limit);
-               break;
-       default:
-               ret = -EINVAL;
-       }
+       ret = rapl_write_pl_data(rd, id, PL_LIMIT, power_limit);
        if (!ret)
                package_power_limit_irq_save(rp);
-set_exit:
        cpus_read_unlock();
        return ret;
 }
@@ -374,38 +456,17 @@ static int get_current_power_limit(struct powercap_zone *power_zone, int cid,
 {
        struct rapl_domain *rd;
        u64 val;
-       int prim;
        int ret = 0;
        int id;
 
        cpus_read_lock();
        rd = power_zone_to_rapl_domain(power_zone);
        id = contraint_to_pl(rd, cid);
-       if (id < 0) {
-               ret = id;
-               goto get_exit;
-       }
 
-       switch (rd->rpl[id].prim_id) {
-       case PL1_ENABLE:
-               prim = POWER_LIMIT1;
-               break;
-       case PL2_ENABLE:
-               prim = POWER_LIMIT2;
-               break;
-       case PL4_ENABLE:
-               prim = POWER_LIMIT4;
-               break;
-       default:
-               cpus_read_unlock();
-               return -EINVAL;
-       }
-       if (rapl_read_data_raw(rd, prim, true, &val))
-               ret = -EIO;
-       else
+       ret = rapl_read_pl_data(rd, id, PL_LIMIT, true, &val);
+       if (!ret)
                *data = val;
 
-get_exit:
        cpus_read_unlock();
 
        return ret;
@@ -421,23 +482,9 @@ static int set_time_window(struct powercap_zone *power_zone, int cid,
        cpus_read_lock();
        rd = power_zone_to_rapl_domain(power_zone);
        id = contraint_to_pl(rd, cid);
-       if (id < 0) {
-               ret = id;
-               goto set_time_exit;
-       }
 
-       switch (rd->rpl[id].prim_id) {
-       case PL1_ENABLE:
-               rapl_write_data_raw(rd, TIME_WINDOW1, window);
-               break;
-       case PL2_ENABLE:
-               rapl_write_data_raw(rd, TIME_WINDOW2, window);
-               break;
-       default:
-               ret = -EINVAL;
-       }
+       ret = rapl_write_pl_data(rd, id, PL_TIME_WINDOW, window);
 
-set_time_exit:
        cpus_read_unlock();
        return ret;
 }
@@ -453,33 +500,11 @@ static int get_time_window(struct powercap_zone *power_zone, int cid,
        cpus_read_lock();
        rd = power_zone_to_rapl_domain(power_zone);
        id = contraint_to_pl(rd, cid);
-       if (id < 0) {
-               ret = id;
-               goto get_time_exit;
-       }
 
-       switch (rd->rpl[id].prim_id) {
-       case PL1_ENABLE:
-               ret = rapl_read_data_raw(rd, TIME_WINDOW1, true, &val);
-               break;
-       case PL2_ENABLE:
-               ret = rapl_read_data_raw(rd, TIME_WINDOW2, true, &val);
-               break;
-       case PL4_ENABLE:
-               /*
-                * Time window parameter is not applicable for PL4 entry
-                * so assigining '0' as default value.
-                */
-               val = 0;
-               break;
-       default:
-               cpus_read_unlock();
-               return -EINVAL;
-       }
+       ret = rapl_read_pl_data(rd, id, PL_TIME_WINDOW, true, &val);
        if (!ret)
                *data = val;
 
-get_time_exit:
        cpus_read_unlock();
 
        return ret;
@@ -499,36 +524,23 @@ static const char *get_constraint_name(struct powercap_zone *power_zone,
        return NULL;
 }
 
-static int get_max_power(struct powercap_zone *power_zone, int id, u64 *data)
+static int get_max_power(struct powercap_zone *power_zone, int cid, u64 *data)
 {
        struct rapl_domain *rd;
        u64 val;
-       int prim;
        int ret = 0;
+       int id;
 
        cpus_read_lock();
        rd = power_zone_to_rapl_domain(power_zone);
-       switch (rd->rpl[id].prim_id) {
-       case PL1_ENABLE:
-               prim = THERMAL_SPEC_POWER;
-               break;
-       case PL2_ENABLE:
-               prim = MAX_POWER;
-               break;
-       case PL4_ENABLE:
-               prim = MAX_POWER;
-               break;
-       default:
-               cpus_read_unlock();
-               return -EINVAL;
-       }
-       if (rapl_read_data_raw(rd, prim, true, &val))
-               ret = -EIO;
-       else
+       id = contraint_to_pl(rd, cid);
+
+       ret = rapl_read_pl_data(rd, id, PL_MAX_POWER, true, &val);
+       if (!ret)
                *data = val;
 
        /* As a generalization rule, PL4 would be around two times PL2. */
-       if (rd->rpl[id].prim_id == PL4_ENABLE)
+       if (id == POWER_LIMIT4)
                *data = *data * 2;
 
        cpus_read_unlock();
@@ -545,6 +557,12 @@ static const struct powercap_zone_constraint_ops constraint_ops = {
        .get_name = get_constraint_name,
 };
 
+/* Return the id used for read_raw/write_raw callback */
+static int get_rid(struct rapl_package *rp)
+{
+       return rp->lead_cpu >= 0 ? rp->lead_cpu : rp->id;
+}
+
 /* called after domain detection and package level data are set */
 static void rapl_init_domains(struct rapl_package *rp)
 {
@@ -554,6 +572,7 @@ static void rapl_init_domains(struct rapl_package *rp)
 
        for (i = 0; i < RAPL_DOMAIN_MAX; i++) {
                unsigned int mask = rp->domain_map & (1 << i);
+               int t;
 
                if (!mask)
                        continue;
@@ -562,51 +581,26 @@ static void rapl_init_domains(struct rapl_package *rp)
 
                if (i == RAPL_DOMAIN_PLATFORM && rp->id > 0) {
                        snprintf(rd->name, RAPL_DOMAIN_NAME_LENGTH, "psys-%d",
-                               topology_physical_package_id(rp->lead_cpu));
-               } else
+                               rp->lead_cpu >= 0 ? topology_physical_package_id(rp->lead_cpu) :
+                               rp->id);
+               } else {
                        snprintf(rd->name, RAPL_DOMAIN_NAME_LENGTH, "%s",
                                rapl_domain_names[i]);
+               }
 
                rd->id = i;
-               rd->rpl[0].prim_id = PL1_ENABLE;
-               rd->rpl[0].name = pl1_name;
 
-               /*
-                * The PL2 power domain is applicable for limits two
-                * and limits three
-                */
-               if (rp->priv->limits[i] >= 2) {
-                       rd->rpl[1].prim_id = PL2_ENABLE;
-                       rd->rpl[1].name = pl2_name;
-               }
+               /* PL1 is supported by default */
+               rp->priv->limits[i] |= BIT(POWER_LIMIT1);
 
-               /* Enable PL4 domain if the total power limits are three */
-               if (rp->priv->limits[i] == 3) {
-                       rd->rpl[2].prim_id = PL4_ENABLE;
-                       rd->rpl[2].name = pl4_name;
+               for (t = POWER_LIMIT1; t < NR_POWER_LIMITS; t++) {
+                       if (rp->priv->limits[i] & BIT(t))
+                               rd->rpl[t].name = pl_names[t];
                }
 
                for (j = 0; j < RAPL_DOMAIN_REG_MAX; j++)
                        rd->regs[j] = rp->priv->regs[i][j];
 
-               switch (i) {
-               case RAPL_DOMAIN_DRAM:
-                       rd->domain_energy_unit =
-                           rapl_defaults->dram_domain_energy_unit;
-                       if (rd->domain_energy_unit)
-                               pr_info("DRAM domain energy unit %dpj\n",
-                                       rd->domain_energy_unit);
-                       break;
-               case RAPL_DOMAIN_PLATFORM:
-                       rd->domain_energy_unit =
-                           rapl_defaults->psys_domain_energy_unit;
-                       if (rd->domain_energy_unit)
-                               pr_info("Platform domain energy unit %dpj\n",
-                                       rd->domain_energy_unit);
-                       break;
-               default:
-                       break;
-               }
                rd++;
        }
 }
@@ -615,23 +609,19 @@ static u64 rapl_unit_xlate(struct rapl_domain *rd, enum unit_type type,
                           u64 value, int to_raw)
 {
        u64 units = 1;
-       struct rapl_package *rp = rd->rp;
+       struct rapl_defaults *defaults = get_defaults(rd->rp);
        u64 scale = 1;
 
        switch (type) {
        case POWER_UNIT:
-               units = rp->power_unit;
+               units = rd->power_unit;
                break;
        case ENERGY_UNIT:
                scale = ENERGY_UNIT_SCALE;
-               /* per domain unit takes precedence */
-               if (rd->domain_energy_unit)
-                       units = rd->domain_energy_unit;
-               else
-                       units = rp->energy_unit;
+               units = rd->energy_unit;
                break;
        case TIME_UNIT:
-               return rapl_defaults->compute_time_window(rp, value, to_raw);
+               return defaults->compute_time_window(rd, value, to_raw);
        case ARBITRARY_UNIT:
        default:
                return value;
@@ -645,67 +635,141 @@ static u64 rapl_unit_xlate(struct rapl_domain *rd, enum unit_type type,
        return div64_u64(value, scale);
 }
 
-/* in the order of enum rapl_primitives */
-static struct rapl_primitive_info rpi[] = {
+/* RAPL primitives for MSR and MMIO I/F */
+static struct rapl_primitive_info rpi_msr[NR_RAPL_PRIMITIVES] = {
        /* name, mask, shift, msr index, unit divisor */
-       PRIMITIVE_INFO_INIT(ENERGY_COUNTER, ENERGY_STATUS_MASK, 0,
-                           RAPL_DOMAIN_REG_STATUS, ENERGY_UNIT, 0),
-       PRIMITIVE_INFO_INIT(POWER_LIMIT1, POWER_LIMIT1_MASK, 0,
+       [POWER_LIMIT1] = PRIMITIVE_INFO_INIT(POWER_LIMIT1, POWER_LIMIT1_MASK, 0,
                            RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0),
-       PRIMITIVE_INFO_INIT(POWER_LIMIT2, POWER_LIMIT2_MASK, 32,
+       [POWER_LIMIT2] = PRIMITIVE_INFO_INIT(POWER_LIMIT2, POWER_LIMIT2_MASK, 32,
                            RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0),
-       PRIMITIVE_INFO_INIT(POWER_LIMIT4, POWER_LIMIT4_MASK, 0,
+       [POWER_LIMIT4] = PRIMITIVE_INFO_INIT(POWER_LIMIT4, POWER_LIMIT4_MASK, 0,
                                RAPL_DOMAIN_REG_PL4, POWER_UNIT, 0),
-       PRIMITIVE_INFO_INIT(FW_LOCK, POWER_LOW_LOCK, 31,
+       [ENERGY_COUNTER] = PRIMITIVE_INFO_INIT(ENERGY_COUNTER, ENERGY_STATUS_MASK, 0,
+                           RAPL_DOMAIN_REG_STATUS, ENERGY_UNIT, 0),
+       [FW_LOCK] = PRIMITIVE_INFO_INIT(FW_LOCK, POWER_LOW_LOCK, 31,
                            RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0),
-       PRIMITIVE_INFO_INIT(PL1_ENABLE, POWER_LIMIT1_ENABLE, 15,
+       [FW_HIGH_LOCK] = PRIMITIVE_INFO_INIT(FW_LOCK, POWER_HIGH_LOCK, 63,
                            RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0),
-       PRIMITIVE_INFO_INIT(PL1_CLAMP, POWER_LIMIT1_CLAMP, 16,
+       [PL1_ENABLE] = PRIMITIVE_INFO_INIT(PL1_ENABLE, POWER_LIMIT1_ENABLE, 15,
                            RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0),
-       PRIMITIVE_INFO_INIT(PL2_ENABLE, POWER_LIMIT2_ENABLE, 47,
+       [PL1_CLAMP] = PRIMITIVE_INFO_INIT(PL1_CLAMP, POWER_LIMIT1_CLAMP, 16,
                            RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0),
-       PRIMITIVE_INFO_INIT(PL2_CLAMP, POWER_LIMIT2_CLAMP, 48,
+       [PL2_ENABLE] = PRIMITIVE_INFO_INIT(PL2_ENABLE, POWER_LIMIT2_ENABLE, 47,
                            RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0),
-       PRIMITIVE_INFO_INIT(PL4_ENABLE, POWER_LIMIT4_MASK, 0,
+       [PL2_CLAMP] = PRIMITIVE_INFO_INIT(PL2_CLAMP, POWER_LIMIT2_CLAMP, 48,
+                           RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0),
+       [PL4_ENABLE] = PRIMITIVE_INFO_INIT(PL4_ENABLE, POWER_LIMIT4_MASK, 0,
                                RAPL_DOMAIN_REG_PL4, ARBITRARY_UNIT, 0),
-       PRIMITIVE_INFO_INIT(TIME_WINDOW1, TIME_WINDOW1_MASK, 17,
+       [TIME_WINDOW1] = PRIMITIVE_INFO_INIT(TIME_WINDOW1, TIME_WINDOW1_MASK, 17,
                            RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0),
-       PRIMITIVE_INFO_INIT(TIME_WINDOW2, TIME_WINDOW2_MASK, 49,
+       [TIME_WINDOW2] = PRIMITIVE_INFO_INIT(TIME_WINDOW2, TIME_WINDOW2_MASK, 49,
                            RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0),
-       PRIMITIVE_INFO_INIT(THERMAL_SPEC_POWER, POWER_INFO_THERMAL_SPEC_MASK,
+       [THERMAL_SPEC_POWER] = PRIMITIVE_INFO_INIT(THERMAL_SPEC_POWER, POWER_INFO_THERMAL_SPEC_MASK,
                            0, RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0),
-       PRIMITIVE_INFO_INIT(MAX_POWER, POWER_INFO_MAX_MASK, 32,
+       [MAX_POWER] = PRIMITIVE_INFO_INIT(MAX_POWER, POWER_INFO_MAX_MASK, 32,
                            RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0),
-       PRIMITIVE_INFO_INIT(MIN_POWER, POWER_INFO_MIN_MASK, 16,
+       [MIN_POWER] = PRIMITIVE_INFO_INIT(MIN_POWER, POWER_INFO_MIN_MASK, 16,
                            RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0),
-       PRIMITIVE_INFO_INIT(MAX_TIME_WINDOW, POWER_INFO_MAX_TIME_WIN_MASK, 48,
+       [MAX_TIME_WINDOW] = PRIMITIVE_INFO_INIT(MAX_TIME_WINDOW, POWER_INFO_MAX_TIME_WIN_MASK, 48,
                            RAPL_DOMAIN_REG_INFO, TIME_UNIT, 0),
-       PRIMITIVE_INFO_INIT(THROTTLED_TIME, PERF_STATUS_THROTTLE_TIME_MASK, 0,
+       [THROTTLED_TIME] = PRIMITIVE_INFO_INIT(THROTTLED_TIME, PERF_STATUS_THROTTLE_TIME_MASK, 0,
                            RAPL_DOMAIN_REG_PERF, TIME_UNIT, 0),
-       PRIMITIVE_INFO_INIT(PRIORITY_LEVEL, PP_POLICY_MASK, 0,
+       [PRIORITY_LEVEL] = PRIMITIVE_INFO_INIT(PRIORITY_LEVEL, PP_POLICY_MASK, 0,
                            RAPL_DOMAIN_REG_POLICY, ARBITRARY_UNIT, 0),
-       PRIMITIVE_INFO_INIT(PSYS_POWER_LIMIT1, PSYS_POWER_LIMIT1_MASK, 0,
+       [PSYS_POWER_LIMIT1] = PRIMITIVE_INFO_INIT(PSYS_POWER_LIMIT1, PSYS_POWER_LIMIT1_MASK, 0,
                            RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0),
-       PRIMITIVE_INFO_INIT(PSYS_POWER_LIMIT2, PSYS_POWER_LIMIT2_MASK, 32,
+       [PSYS_POWER_LIMIT2] = PRIMITIVE_INFO_INIT(PSYS_POWER_LIMIT2, PSYS_POWER_LIMIT2_MASK, 32,
                            RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0),
-       PRIMITIVE_INFO_INIT(PSYS_PL1_ENABLE, PSYS_POWER_LIMIT1_ENABLE, 17,
+       [PSYS_PL1_ENABLE] = PRIMITIVE_INFO_INIT(PSYS_PL1_ENABLE, PSYS_POWER_LIMIT1_ENABLE, 17,
                            RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0),
-       PRIMITIVE_INFO_INIT(PSYS_PL2_ENABLE, PSYS_POWER_LIMIT2_ENABLE, 49,
+       [PSYS_PL2_ENABLE] = PRIMITIVE_INFO_INIT(PSYS_PL2_ENABLE, PSYS_POWER_LIMIT2_ENABLE, 49,
                            RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0),
-       PRIMITIVE_INFO_INIT(PSYS_TIME_WINDOW1, PSYS_TIME_WINDOW1_MASK, 19,
+       [PSYS_TIME_WINDOW1] = PRIMITIVE_INFO_INIT(PSYS_TIME_WINDOW1, PSYS_TIME_WINDOW1_MASK, 19,
                            RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0),
-       PRIMITIVE_INFO_INIT(PSYS_TIME_WINDOW2, PSYS_TIME_WINDOW2_MASK, 51,
+       [PSYS_TIME_WINDOW2] = PRIMITIVE_INFO_INIT(PSYS_TIME_WINDOW2, PSYS_TIME_WINDOW2_MASK, 51,
                            RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0),
        /* non-hardware */
-       PRIMITIVE_INFO_INIT(AVERAGE_POWER, 0, 0, 0, POWER_UNIT,
+       [AVERAGE_POWER] = PRIMITIVE_INFO_INIT(AVERAGE_POWER, 0, 0, 0, POWER_UNIT,
                            RAPL_PRIMITIVE_DERIVED),
-       {NULL, 0, 0, 0},
 };
 
+/* RAPL primitives for TPMI I/F */
+static struct rapl_primitive_info rpi_tpmi[NR_RAPL_PRIMITIVES] = {
+       /* name, mask, shift, msr index, unit divisor */
+       [POWER_LIMIT1] = PRIMITIVE_INFO_INIT(POWER_LIMIT1, TPMI_POWER_LIMIT_MASK, 0,
+               RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0),
+       [POWER_LIMIT2] = PRIMITIVE_INFO_INIT(POWER_LIMIT2, TPMI_POWER_LIMIT_MASK, 0,
+               RAPL_DOMAIN_REG_PL2, POWER_UNIT, 0),
+       [POWER_LIMIT4] = PRIMITIVE_INFO_INIT(POWER_LIMIT4, TPMI_POWER_LIMIT_MASK, 0,
+               RAPL_DOMAIN_REG_PL4, POWER_UNIT, 0),
+       [ENERGY_COUNTER] = PRIMITIVE_INFO_INIT(ENERGY_COUNTER, ENERGY_STATUS_MASK, 0,
+               RAPL_DOMAIN_REG_STATUS, ENERGY_UNIT, 0),
+       [PL1_LOCK] = PRIMITIVE_INFO_INIT(PL1_LOCK, POWER_HIGH_LOCK, 63,
+               RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0),
+       [PL2_LOCK] = PRIMITIVE_INFO_INIT(PL2_LOCK, POWER_HIGH_LOCK, 63,
+               RAPL_DOMAIN_REG_PL2, ARBITRARY_UNIT, 0),
+       [PL4_LOCK] = PRIMITIVE_INFO_INIT(PL4_LOCK, POWER_HIGH_LOCK, 63,
+               RAPL_DOMAIN_REG_PL4, ARBITRARY_UNIT, 0),
+       [PL1_ENABLE] = PRIMITIVE_INFO_INIT(PL1_ENABLE, TPMI_POWER_LIMIT_ENABLE, 62,
+               RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0),
+       [PL2_ENABLE] = PRIMITIVE_INFO_INIT(PL2_ENABLE, TPMI_POWER_LIMIT_ENABLE, 62,
+               RAPL_DOMAIN_REG_PL2, ARBITRARY_UNIT, 0),
+       [PL4_ENABLE] = PRIMITIVE_INFO_INIT(PL4_ENABLE, TPMI_POWER_LIMIT_ENABLE, 62,
+               RAPL_DOMAIN_REG_PL4, ARBITRARY_UNIT, 0),
+       [TIME_WINDOW1] = PRIMITIVE_INFO_INIT(TIME_WINDOW1, TPMI_TIME_WINDOW_MASK, 18,
+               RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0),
+       [TIME_WINDOW2] = PRIMITIVE_INFO_INIT(TIME_WINDOW2, TPMI_TIME_WINDOW_MASK, 18,
+               RAPL_DOMAIN_REG_PL2, TIME_UNIT, 0),
+       [THERMAL_SPEC_POWER] = PRIMITIVE_INFO_INIT(THERMAL_SPEC_POWER, TPMI_INFO_SPEC_MASK, 0,
+               RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0),
+       [MAX_POWER] = PRIMITIVE_INFO_INIT(MAX_POWER, TPMI_INFO_MAX_MASK, 36,
+               RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0),
+       [MIN_POWER] = PRIMITIVE_INFO_INIT(MIN_POWER, TPMI_INFO_MIN_MASK, 18,
+               RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0),
+       [MAX_TIME_WINDOW] = PRIMITIVE_INFO_INIT(MAX_TIME_WINDOW, TPMI_INFO_MAX_TIME_WIN_MASK, 54,
+               RAPL_DOMAIN_REG_INFO, TIME_UNIT, 0),
+       [THROTTLED_TIME] = PRIMITIVE_INFO_INIT(THROTTLED_TIME, PERF_STATUS_THROTTLE_TIME_MASK, 0,
+               RAPL_DOMAIN_REG_PERF, TIME_UNIT, 0),
+       /* non-hardware */
+       [AVERAGE_POWER] = PRIMITIVE_INFO_INIT(AVERAGE_POWER, 0, 0, 0,
+               POWER_UNIT, RAPL_PRIMITIVE_DERIVED),
+};
+
+static struct rapl_primitive_info *get_rpi(struct rapl_package *rp, int prim)
+{
+       struct rapl_primitive_info *rpi = rp->priv->rpi;
+
+       if (prim < 0 || prim > NR_RAPL_PRIMITIVES || !rpi)
+               return NULL;
+
+       return &rpi[prim];
+}
+
+static int rapl_config(struct rapl_package *rp)
+{
+       switch (rp->priv->type) {
+       /* MMIO I/F shares the same register layout as MSR registers */
+       case RAPL_IF_MMIO:
+       case RAPL_IF_MSR:
+               rp->priv->defaults = (void *)defaults_msr;
+               rp->priv->rpi = (void *)rpi_msr;
+               break;
+       case RAPL_IF_TPMI:
+               rp->priv->defaults = (void *)&defaults_tpmi;
+               rp->priv->rpi = (void *)rpi_tpmi;
+               break;
+       default:
+               return -EINVAL;
+       }
+       return 0;
+}
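
rapl_config() binds the interface-specific tables at package-registration time: MSR and MMIO share the same register layout and therefore the same defaults and primitive tables, while TPMI gets its own pair. A small self-contained sketch of that dispatch, with simplified stand-ins for the rapl_if_priv fields:

	#include <stdio.h>

	enum rapl_if_type { RAPL_IF_MSR, RAPL_IF_MMIO, RAPL_IF_TPMI };

	struct defaults_tbl { const char *name; };
	struct prim_tbl     { const char *name; };

	static const struct defaults_tbl defaults_msr_m  = { "msr defaults" };
	static const struct defaults_tbl defaults_tpmi_m = { "tpmi defaults" };
	static const struct prim_tbl     rpi_msr_m       = { "msr primitives" };
	static const struct prim_tbl     rpi_tpmi_m      = { "tpmi primitives" };

	struct priv {
	        enum rapl_if_type type;
	        const struct defaults_tbl *defaults;
	        const struct prim_tbl *rpi;
	};

	/* Mirror of rapl_config(): pick the tables that match the interface type. */
	static int config_model(struct priv *p)
	{
	        switch (p->type) {
	        case RAPL_IF_MMIO:      /* MMIO shares the MSR register layout */
	        case RAPL_IF_MSR:
	                p->defaults = &defaults_msr_m;
	                p->rpi = &rpi_msr_m;
	                return 0;
	        case RAPL_IF_TPMI:
	                p->defaults = &defaults_tpmi_m;
	                p->rpi = &rpi_tpmi_m;
	                return 0;
	        default:
	                return -22;     /* -EINVAL */
	        }
	}

	int main(void)
	{
	        struct priv p = { .type = RAPL_IF_TPMI };

	        if (!config_model(&p))
	                printf("%s / %s\n", p.defaults->name, p.rpi->name);
	        return 0;
	}
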
+
 static enum rapl_primitives
 prim_fixups(struct rapl_domain *rd, enum rapl_primitives prim)
 {
-       if (!rapl_defaults->spr_psys_bits)
+       struct rapl_defaults *defaults = get_defaults(rd->rp);
+
+       if (!defaults->spr_psys_bits)
                return prim;
 
        if (rd->id != RAPL_DOMAIN_PLATFORM)
@@ -747,41 +811,33 @@ static int rapl_read_data_raw(struct rapl_domain *rd,
 {
        u64 value;
        enum rapl_primitives prim_fixed = prim_fixups(rd, prim);
-       struct rapl_primitive_info *rp = &rpi[prim_fixed];
+       struct rapl_primitive_info *rpi = get_rpi(rd->rp, prim_fixed);
        struct reg_action ra;
-       int cpu;
 
-       if (!rp->name || rp->flag & RAPL_PRIMITIVE_DUMMY)
+       if (!rpi || !rpi->name || rpi->flag & RAPL_PRIMITIVE_DUMMY)
                return -EINVAL;
 
-       ra.reg = rd->regs[rp->id];
+       ra.reg = rd->regs[rpi->id];
        if (!ra.reg)
                return -EINVAL;
 
-       cpu = rd->rp->lead_cpu;
-
-       /* domain with 2 limits has different bit */
-       if (prim == FW_LOCK && rd->rp->priv->limits[rd->id] == 2) {
-               rp->mask = POWER_HIGH_LOCK;
-               rp->shift = 63;
-       }
        /* non-hardware data are collected by the polling thread */
-       if (rp->flag & RAPL_PRIMITIVE_DERIVED) {
+       if (rpi->flag & RAPL_PRIMITIVE_DERIVED) {
                *data = rd->rdd.primitives[prim];
                return 0;
        }
 
-       ra.mask = rp->mask;
+       ra.mask = rpi->mask;
 
-       if (rd->rp->priv->read_raw(cpu, &ra)) {
-               pr_debug("failed to read reg 0x%llx on cpu %d\n", ra.reg, cpu);
+       if (rd->rp->priv->read_raw(get_rid(rd->rp), &ra)) {
+               pr_debug("failed to read reg 0x%llx for %s:%s\n", ra.reg, rd->rp->name, rd->name);
                return -EIO;
        }
 
-       value = ra.value >> rp->shift;
+       value = ra.value >> rpi->shift;
 
        if (xlate)
-               *data = rapl_unit_xlate(rd, rp->unit, value, 0);
+               *data = rapl_unit_xlate(rd, rpi->unit, value, 0);
        else
                *data = value;
 
@@ -794,28 +850,56 @@ static int rapl_write_data_raw(struct rapl_domain *rd,
                               unsigned long long value)
 {
        enum rapl_primitives prim_fixed = prim_fixups(rd, prim);
-       struct rapl_primitive_info *rp = &rpi[prim_fixed];
-       int cpu;
+       struct rapl_primitive_info *rpi = get_rpi(rd->rp, prim_fixed);
        u64 bits;
        struct reg_action ra;
        int ret;
 
-       cpu = rd->rp->lead_cpu;
-       bits = rapl_unit_xlate(rd, rp->unit, value, 1);
-       bits <<= rp->shift;
-       bits &= rp->mask;
+       if (!rpi || !rpi->name || rpi->flag & RAPL_PRIMITIVE_DUMMY)
+               return -EINVAL;
+
+       bits = rapl_unit_xlate(rd, rpi->unit, value, 1);
+       bits <<= rpi->shift;
+       bits &= rpi->mask;
 
        memset(&ra, 0, sizeof(ra));
 
-       ra.reg = rd->regs[rp->id];
-       ra.mask = rp->mask;
+       ra.reg = rd->regs[rpi->id];
+       ra.mask = rpi->mask;
        ra.value = bits;
 
-       ret = rd->rp->priv->write_raw(cpu, &ra);
+       ret = rd->rp->priv->write_raw(get_rid(rd->rp), &ra);
 
        return ret;
 }
 
+static int rapl_read_pl_data(struct rapl_domain *rd, int pl,
+                             enum pl_prims pl_prim, bool xlate, u64 *data)
+{
+       enum rapl_primitives prim = get_pl_prim(rd, pl, pl_prim);
+
+       if (!is_pl_valid(rd, pl))
+               return -EINVAL;
+
+       return rapl_read_data_raw(rd, prim, xlate, data);
+}
+
+static int rapl_write_pl_data(struct rapl_domain *rd, int pl,
+                              enum pl_prims pl_prim,
+                              unsigned long long value)
+{
+       enum rapl_primitives prim = get_pl_prim(rd, pl, pl_prim);
+
+       if (!is_pl_valid(rd, pl))
+               return -EINVAL;
+
+       if (rd->rpl[pl].locked) {
+               pr_warn("%s:%s:%s locked by BIOS\n", rd->rp->name, rd->name, pl_names[pl]);
+               return -EACCES;
+       }
+
+       return rapl_write_data_raw(rd, prim, value);
+}
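
rapl_read_pl_data()/rapl_write_pl_data() are the new per-power-limit front ends: they reject limits the domain does not expose, translate the generic primitive through get_pl_prim(), and, for writes, honour the per-limit BIOS lock that replaces the old per-domain DOMAIN_STATE_BIOS_LOCKED flag. A self-contained model of the write path, with a toy backing store standing in for rapl_write_data_raw() and hypothetical limit names:

	#include <stdio.h>
	#include <stdbool.h>
	#include <errno.h>

	#define NR_POWER_LIMITS 3

	struct limit {
	        const char *name;            /* NULL means the limit is not supported */
	        bool locked;                 /* set when the BIOS lock bit was found set */
	        unsigned long long value;
	};

	struct domain {
	        const char *name;
	        struct limit rpl[NR_POWER_LIMITS];
	};

	/* Model of rapl_write_pl_data(): validity check, lock check, then raw write. */
	static int write_pl(struct domain *rd, int pl, unsigned long long value)
	{
	        if (pl < 0 || pl >= NR_POWER_LIMITS || !rd->rpl[pl].name)
	                return -EINVAL;
	        if (rd->rpl[pl].locked) {
	                printf("%s:%s locked by BIOS\n", rd->name, rd->rpl[pl].name);
	                return -EACCES;
	        }
	        rd->rpl[pl].value = value;   /* stands in for rapl_write_data_raw() */
	        return 0;
	}

	int main(void)
	{
	        struct domain pkg = {
	                .name = "package-0",
	                .rpl = { { "long_term", false, 0 }, { "short_term", true, 0 }, { NULL } },
	        };

	        printf("PL1 write -> %d\n", write_pl(&pkg, 0, 28000000));   /* ok */
	        printf("PL2 write -> %d\n", write_pl(&pkg, 1, 35000000));   /* -EACCES */
	        printf("PL4 write -> %d\n", write_pl(&pkg, 2, 90000000));   /* -EINVAL */
	        return 0;
	}
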
 /*
  * Raw RAPL data stored in MSRs are in certain scales. We need to
  * convert them into standard units based on the units reported in
@@ -827,58 +911,58 @@ static int rapl_write_data_raw(struct rapl_domain *rd,
  * power unit : microWatts  : Represented in milliWatts by default
  * time unit  : microseconds: Represented in seconds by default
  */
-static int rapl_check_unit_core(struct rapl_package *rp, int cpu)
+static int rapl_check_unit_core(struct rapl_domain *rd)
 {
        struct reg_action ra;
        u32 value;
 
-       ra.reg = rp->priv->reg_unit;
+       ra.reg = rd->regs[RAPL_DOMAIN_REG_UNIT];
        ra.mask = ~0;
-       if (rp->priv->read_raw(cpu, &ra)) {
-               pr_err("Failed to read power unit REG 0x%llx on CPU %d, exit.\n",
-                      rp->priv->reg_unit, cpu);
+       if (rd->rp->priv->read_raw(get_rid(rd->rp), &ra)) {
+               pr_err("Failed to read power unit REG 0x%llx on %s:%s, exit.\n",
+                       ra.reg, rd->rp->name, rd->name);
                return -ENODEV;
        }
 
        value = (ra.value & ENERGY_UNIT_MASK) >> ENERGY_UNIT_OFFSET;
-       rp->energy_unit = ENERGY_UNIT_SCALE * 1000000 / (1 << value);
+       rd->energy_unit = ENERGY_UNIT_SCALE * 1000000 / (1 << value);
 
        value = (ra.value & POWER_UNIT_MASK) >> POWER_UNIT_OFFSET;
-       rp->power_unit = 1000000 / (1 << value);
+       rd->power_unit = 1000000 / (1 << value);
 
        value = (ra.value & TIME_UNIT_MASK) >> TIME_UNIT_OFFSET;
-       rp->time_unit = 1000000 / (1 << value);
+       rd->time_unit = 1000000 / (1 << value);
 
-       pr_debug("Core CPU %s energy=%dpJ, time=%dus, power=%duW\n",
-                rp->name, rp->energy_unit, rp->time_unit, rp->power_unit);
+       pr_debug("Core CPU %s:%s energy=%dpJ, time=%dus, power=%duW\n",
+                rd->rp->name, rd->name, rd->energy_unit, rd->time_unit, rd->power_unit);
 
        return 0;
 }
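
The unit decode keeps the existing formulas but now stores the results per domain instead of per package: each field of the unit register is a power-of-two exponent that the driver turns into micro-units (the Atom variant scales energy and power the other way). The standalone decoder below applies the core-variant formulas from above to a representative raw value; the mask, offset and ENERGY_UNIT_SCALE constants are not part of this hunk and are assumed to match the driver headers:

	#include <stdio.h>
	#include <stdint.h>

	/* Assumed to match the driver's header definitions. */
	#define POWER_UNIT_MASK    0x0000000f
	#define POWER_UNIT_OFFSET  0
	#define ENERGY_UNIT_MASK   0x00001f00
	#define ENERGY_UNIT_OFFSET 8
	#define TIME_UNIT_MASK     0x000f0000
	#define TIME_UNIT_OFFSET   16
	#define ENERGY_UNIT_SCALE  100

	int main(void)
	{
	        uint64_t raw = 0x000a0e03;   /* a typical MSR_RAPL_POWER_UNIT value */
	        uint32_t v;

	        v = (raw & POWER_UNIT_MASK) >> POWER_UNIT_OFFSET;
	        printf("power unit : %u uW\n", 1000000u / (1u << v));

	        v = (raw & ENERGY_UNIT_MASK) >> ENERGY_UNIT_OFFSET;
	        printf("energy unit: %u (scaled by ENERGY_UNIT_SCALE)\n",
	               ENERGY_UNIT_SCALE * 1000000u / (1u << v));

	        v = (raw & TIME_UNIT_MASK) >> TIME_UNIT_OFFSET;
	        printf("time unit  : %u us\n", 1000000u / (1u << v));
	        return 0;
	}
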
 
-static int rapl_check_unit_atom(struct rapl_package *rp, int cpu)
+static int rapl_check_unit_atom(struct rapl_domain *rd)
 {
        struct reg_action ra;
        u32 value;
 
-       ra.reg = rp->priv->reg_unit;
+       ra.reg = rd->regs[RAPL_DOMAIN_REG_UNIT];
        ra.mask = ~0;
-       if (rp->priv->read_raw(cpu, &ra)) {
-               pr_err("Failed to read power unit REG 0x%llx on CPU %d, exit.\n",
-                      rp->priv->reg_unit, cpu);
+       if (rd->rp->priv->read_raw(get_rid(rd->rp), &ra)) {
+               pr_err("Failed to read power unit REG 0x%llx on %s:%s, exit.\n",
+                       ra.reg, rd->rp->name, rd->name);
                return -ENODEV;
        }
 
        value = (ra.value & ENERGY_UNIT_MASK) >> ENERGY_UNIT_OFFSET;
-       rp->energy_unit = ENERGY_UNIT_SCALE * 1 << value;
+       rd->energy_unit = ENERGY_UNIT_SCALE * 1 << value;
 
        value = (ra.value & POWER_UNIT_MASK) >> POWER_UNIT_OFFSET;
-       rp->power_unit = (1 << value) * 1000;
+       rd->power_unit = (1 << value) * 1000;
 
        value = (ra.value & TIME_UNIT_MASK) >> TIME_UNIT_OFFSET;
-       rp->time_unit = 1000000 / (1 << value);
+       rd->time_unit = 1000000 / (1 << value);
 
-       pr_debug("Atom %s energy=%dpJ, time=%dus, power=%duW\n",
-                rp->name, rp->energy_unit, rp->time_unit, rp->power_unit);
+       pr_debug("Atom %s:%s energy=%dpJ, time=%dus, power=%duW\n",
+                rd->rp->name, rd->name, rd->energy_unit, rd->time_unit, rd->power_unit);
 
        return 0;
 }
@@ -910,6 +994,9 @@ static void power_limit_irq_save_cpu(void *info)
 
 static void package_power_limit_irq_save(struct rapl_package *rp)
 {
+       if (rp->lead_cpu < 0)
+               return;
+
        if (!boot_cpu_has(X86_FEATURE_PTS) || !boot_cpu_has(X86_FEATURE_PLN))
                return;
 
@@ -924,6 +1011,9 @@ static void package_power_limit_irq_restore(struct rapl_package *rp)
 {
        u32 l, h;
 
+       if (rp->lead_cpu < 0)
+               return;
+
        if (!boot_cpu_has(X86_FEATURE_PTS) || !boot_cpu_has(X86_FEATURE_PLN))
                return;
 
@@ -943,33 +1033,33 @@ static void package_power_limit_irq_restore(struct rapl_package *rp)
 
 static void set_floor_freq_default(struct rapl_domain *rd, bool mode)
 {
-       int nr_powerlimit = find_nr_power_limit(rd);
+       int i;
 
        /* always enable clamp such that p-state can go below OS requested
	 * range. power capping priority over guaranteed frequency.
         */
-       rapl_write_data_raw(rd, PL1_CLAMP, mode);
+       rapl_write_pl_data(rd, POWER_LIMIT1, PL_CLAMP, mode);
 
-       /* some domains have pl2 */
-       if (nr_powerlimit > 1) {
-               rapl_write_data_raw(rd, PL2_ENABLE, mode);
-               rapl_write_data_raw(rd, PL2_CLAMP, mode);
+       for (i = POWER_LIMIT2; i < NR_POWER_LIMITS; i++) {
+               rapl_write_pl_data(rd, i, PL_ENABLE, mode);
+               rapl_write_pl_data(rd, i, PL_CLAMP, mode);
        }
 }
 
 static void set_floor_freq_atom(struct rapl_domain *rd, bool enable)
 {
        static u32 power_ctrl_orig_val;
+       struct rapl_defaults *defaults = get_defaults(rd->rp);
        u32 mdata;
 
-       if (!rapl_defaults->floor_freq_reg_addr) {
+       if (!defaults->floor_freq_reg_addr) {
                pr_err("Invalid floor frequency config register\n");
                return;
        }
 
        if (!power_ctrl_orig_val)
                iosf_mbi_read(BT_MBI_UNIT_PMC, MBI_CR_READ,
-                             rapl_defaults->floor_freq_reg_addr,
+                             defaults->floor_freq_reg_addr,
                              &power_ctrl_orig_val);
        mdata = power_ctrl_orig_val;
        if (enable) {
@@ -977,10 +1067,10 @@ static void set_floor_freq_atom(struct rapl_domain *rd, bool enable)
                mdata |= 1 << 8;
        }
        iosf_mbi_write(BT_MBI_UNIT_PMC, MBI_CR_WRITE,
-                      rapl_defaults->floor_freq_reg_addr, mdata);
+                      defaults->floor_freq_reg_addr, mdata);
 }
 
-static u64 rapl_compute_time_window_core(struct rapl_package *rp, u64 value,
+static u64 rapl_compute_time_window_core(struct rapl_domain *rd, u64 value,
                                         bool to_raw)
 {
        u64 f, y;               /* fraction and exp. used for time unit */
@@ -992,12 +1082,12 @@ static u64 rapl_compute_time_window_core(struct rapl_package *rp, u64 value,
        if (!to_raw) {
                f = (value & 0x60) >> 5;
                y = value & 0x1f;
-               value = (1 << y) * (4 + f) * rp->time_unit / 4;
+               value = (1 << y) * (4 + f) * rd->time_unit / 4;
        } else {
-               if (value < rp->time_unit)
+               if (value < rd->time_unit)
                        return 0;
 
-               do_div(value, rp->time_unit);
+               do_div(value, rd->time_unit);
                y = ilog2(value);
 
                /*
@@ -1013,7 +1103,7 @@ static u64 rapl_compute_time_window_core(struct rapl_package *rp, u64 value,
        return value;
 }
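
The time-window conversion is unchanged apart from using the per-domain time_unit. In the decode direction the 7-bit hardware field splits into a 5-bit exponent Y and a 2-bit fraction F, and the window is (1 << Y) * (4 + F) / 4 time units. A small standalone decoder, assuming a time unit of 976 us (roughly 2^-10 s):

	#include <stdio.h>
	#include <stdint.h>

	/* Decode a raw time-window field: bits 0-4 = Y (exponent), bits 5-6 = F (fraction). */
	static uint64_t window_us(uint64_t raw, uint64_t time_unit_us)
	{
	        uint64_t f = (raw & 0x60) >> 5;
	        uint64_t y = raw & 0x1f;

	        return (1ULL << y) * (4 + f) * time_unit_us / 4;
	}

	int main(void)
	{
	        uint64_t tu = 976;   /* assumed time unit in microseconds */
	        uint64_t raw;

	        for (raw = 0; raw < 0x7f; raw += 0x21)
	                printf("raw 0x%02llx -> %llu us\n",
	                       (unsigned long long)raw,
	                       (unsigned long long)window_us(raw, tu));
	        return 0;
	}
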
 
-static u64 rapl_compute_time_window_atom(struct rapl_package *rp, u64 value,
+static u64 rapl_compute_time_window_atom(struct rapl_domain *rd, u64 value,
                                         bool to_raw)
 {
        /*
@@ -1021,13 +1111,56 @@ static u64 rapl_compute_time_window_atom(struct rapl_package *rp, u64 value,
         * where time_unit is default to 1 sec. Never 0.
         */
        if (!to_raw)
-               return (value) ? value * rp->time_unit : rp->time_unit;
+               return (value) ? value * rd->time_unit : rd->time_unit;
 
-       value = div64_u64(value, rp->time_unit);
+       value = div64_u64(value, rd->time_unit);
 
        return value;
 }
 
+/* TPMI Unit register has different layout */
+#define TPMI_POWER_UNIT_OFFSET POWER_UNIT_OFFSET
+#define TPMI_POWER_UNIT_MASK   POWER_UNIT_MASK
+#define TPMI_ENERGY_UNIT_OFFSET        0x06
+#define TPMI_ENERGY_UNIT_MASK  0x7C0
+#define TPMI_TIME_UNIT_OFFSET  0x0C
+#define TPMI_TIME_UNIT_MASK    0xF000
+
+static int rapl_check_unit_tpmi(struct rapl_domain *rd)
+{
+       struct reg_action ra;
+       u32 value;
+
+       ra.reg = rd->regs[RAPL_DOMAIN_REG_UNIT];
+       ra.mask = ~0;
+       if (rd->rp->priv->read_raw(get_rid(rd->rp), &ra)) {
+               pr_err("Failed to read power unit REG 0x%llx on %s:%s, exit.\n",
+                       ra.reg, rd->rp->name, rd->name);
+               return -ENODEV;
+       }
+
+       value = (ra.value & TPMI_ENERGY_UNIT_MASK) >> TPMI_ENERGY_UNIT_OFFSET;
+       rd->energy_unit = ENERGY_UNIT_SCALE * 1000000 / (1 << value);
+
+       value = (ra.value & TPMI_POWER_UNIT_MASK) >> TPMI_POWER_UNIT_OFFSET;
+       rd->power_unit = 1000000 / (1 << value);
+
+       value = (ra.value & TPMI_TIME_UNIT_MASK) >> TPMI_TIME_UNIT_OFFSET;
+       rd->time_unit = 1000000 / (1 << value);
+
+       pr_debug("Core CPU %s:%s energy=%dpJ, time=%dus, power=%duW\n",
+                rd->rp->name, rd->name, rd->energy_unit, rd->time_unit, rd->power_unit);
+
+       return 0;
+}
+
+static const struct rapl_defaults defaults_tpmi = {
+       .check_unit = rapl_check_unit_tpmi,
+       /* Reuse existing logic, ignore the PL_CLAMP failures and enable all Power Limits */
+       .set_floor_freq = set_floor_freq_default,
+       .compute_time_window = rapl_compute_time_window_core,
+};
+
 static const struct rapl_defaults rapl_defaults_core = {
        .floor_freq_reg_addr = 0,
        .check_unit = rapl_check_unit_core,
@@ -1159,8 +1292,10 @@ static void rapl_update_domain_data(struct rapl_package *rp)
                         rp->domains[dmn].name);
                /* exclude non-raw primitives */
                for (prim = 0; prim < NR_RAW_PRIMITIVES; prim++) {
+                       struct rapl_primitive_info *rpi = get_rpi(rp, prim);
+
                        if (!rapl_read_data_raw(&rp->domains[dmn], prim,
-                                               rpi[prim].unit, &val))
+                                               rpi->unit, &val))
                                rp->domains[dmn].rdd.primitives[prim] = val;
                }
        }
@@ -1239,7 +1374,7 @@ err_cleanup:
        return ret;
 }
 
-static int rapl_check_domain(int cpu, int domain, struct rapl_package *rp)
+static int rapl_check_domain(int domain, struct rapl_package *rp)
 {
        struct reg_action ra;
 
@@ -1260,9 +1395,43 @@ static int rapl_check_domain(int cpu, int domain, struct rapl_package *rp)
         */
 
        ra.mask = ENERGY_STATUS_MASK;
-       if (rp->priv->read_raw(cpu, &ra) || !ra.value)
+       if (rp->priv->read_raw(get_rid(rp), &ra) || !ra.value)
+               return -ENODEV;
+
+       return 0;
+}
+
+/*
+ * Get per domain energy/power/time unit.
+ * RAPL Interfaces without per domain unit register will use the package
+ * scope unit register to set per domain units.
+ */
+static int rapl_get_domain_unit(struct rapl_domain *rd)
+{
+       struct rapl_defaults *defaults = get_defaults(rd->rp);
+       int ret;
+
+       if (!rd->regs[RAPL_DOMAIN_REG_UNIT]) {
+               if (!rd->rp->priv->reg_unit) {
+                       pr_err("No valid Unit register found\n");
+                       return -ENODEV;
+               }
+               rd->regs[RAPL_DOMAIN_REG_UNIT] = rd->rp->priv->reg_unit;
+       }
+
+       if (!defaults->check_unit) {
+               pr_err("missing .check_unit() callback\n");
                return -ENODEV;
+       }
+
+       ret = defaults->check_unit(rd);
+       if (ret)
+               return ret;
 
+       if (rd->id == RAPL_DOMAIN_DRAM && defaults->dram_domain_energy_unit)
+               rd->energy_unit = defaults->dram_domain_energy_unit;
+       if (rd->id == RAPL_DOMAIN_PLATFORM && defaults->psys_domain_energy_unit)
+               rd->energy_unit = defaults->psys_domain_energy_unit;
        return 0;
 }
 
@@ -1280,19 +1449,16 @@ static void rapl_detect_powerlimit(struct rapl_domain *rd)
        u64 val64;
        int i;
 
-       /* check if the domain is locked by BIOS, ignore if MSR doesn't exist */
-       if (!rapl_read_data_raw(rd, FW_LOCK, false, &val64)) {
-               if (val64) {
-                       pr_info("RAPL %s domain %s locked by BIOS\n",
-                               rd->rp->name, rd->name);
-                       rd->state |= DOMAIN_STATE_BIOS_LOCKED;
+       for (i = POWER_LIMIT1; i < NR_POWER_LIMITS; i++) {
+               if (!rapl_read_pl_data(rd, i, PL_LOCK, false, &val64)) {
+                       if (val64) {
+                               rd->rpl[i].locked = true;
+                               pr_info("%s:%s:%s locked by BIOS\n",
+                                       rd->rp->name, rd->name, pl_names[i]);
+                       }
                }
-       }
-       /* check if power limit MSR exists, otherwise domain is monitoring only */
-       for (i = 0; i < NR_POWER_LIMITS; i++) {
-               int prim = rd->rpl[i].prim_id;
 
-               if (rapl_read_data_raw(rd, prim, false, &val64))
+               if (rapl_read_pl_data(rd, i, PL_ENABLE, false, &val64))
                        rd->rpl[i].name = NULL;
        }
 }
@@ -1300,14 +1466,14 @@ static void rapl_detect_powerlimit(struct rapl_domain *rd)
 /* Detect active and valid domains for the given CPU, caller must
  * ensure the CPU belongs to the targeted package and CPU hotplug is disabled.
  */
-static int rapl_detect_domains(struct rapl_package *rp, int cpu)
+static int rapl_detect_domains(struct rapl_package *rp)
 {
        struct rapl_domain *rd;
        int i;
 
        for (i = 0; i < RAPL_DOMAIN_MAX; i++) {
                /* use physical package id to read counters */
-               if (!rapl_check_domain(cpu, i, rp)) {
+               if (!rapl_check_domain(i, rp)) {
                        rp->domain_map |= 1 << i;
                        pr_info("Found RAPL domain %s\n", rapl_domain_names[i]);
                }
@@ -1326,8 +1492,10 @@ static int rapl_detect_domains(struct rapl_package *rp, int cpu)
 
        rapl_init_domains(rp);
 
-       for (rd = rp->domains; rd < rp->domains + rp->nr_domains; rd++)
+       for (rd = rp->domains; rd < rp->domains + rp->nr_domains; rd++) {
+               rapl_get_domain_unit(rd);
                rapl_detect_powerlimit(rd);
+       }
 
        return 0;
 }
@@ -1340,13 +1508,13 @@ void rapl_remove_package(struct rapl_package *rp)
        package_power_limit_irq_restore(rp);
 
        for (rd = rp->domains; rd < rp->domains + rp->nr_domains; rd++) {
-               rapl_write_data_raw(rd, PL1_ENABLE, 0);
-               rapl_write_data_raw(rd, PL1_CLAMP, 0);
-               if (find_nr_power_limit(rd) > 1) {
-                       rapl_write_data_raw(rd, PL2_ENABLE, 0);
-                       rapl_write_data_raw(rd, PL2_CLAMP, 0);
-                       rapl_write_data_raw(rd, PL4_ENABLE, 0);
+               int i;
+
+               for (i = POWER_LIMIT1; i < NR_POWER_LIMITS; i++) {
+                       rapl_write_pl_data(rd, i, PL_ENABLE, 0);
+                       rapl_write_pl_data(rd, i, PL_CLAMP, 0);
                }
+
                if (rd->id == RAPL_DOMAIN_PACKAGE) {
                        rd_package = rd;
                        continue;
@@ -1365,13 +1533,18 @@ void rapl_remove_package(struct rapl_package *rp)
 EXPORT_SYMBOL_GPL(rapl_remove_package);
 
 /* caller to ensure CPU hotplug lock is held */
-struct rapl_package *rapl_find_package_domain(int cpu, struct rapl_if_priv *priv)
+struct rapl_package *rapl_find_package_domain(int id, struct rapl_if_priv *priv, bool id_is_cpu)
 {
-       int id = topology_logical_die_id(cpu);
        struct rapl_package *rp;
+       int uid;
+
+       if (id_is_cpu)
+               uid = topology_logical_die_id(id);
+       else
+               uid = id;
 
        list_for_each_entry(rp, &rapl_packages, plist) {
-               if (rp->id == id
+               if (rp->id == uid
                    && rp->priv->control_type == priv->control_type)
                        return rp;
        }
@@ -1381,34 +1554,37 @@ struct rapl_package *rapl_find_package_domain(int cpu, struct rapl_if_priv *priv
 EXPORT_SYMBOL_GPL(rapl_find_package_domain);
 
 /* called from CPU hotplug notifier, hotplug lock held */
-struct rapl_package *rapl_add_package(int cpu, struct rapl_if_priv *priv)
+struct rapl_package *rapl_add_package(int id, struct rapl_if_priv *priv, bool id_is_cpu)
 {
-       int id = topology_logical_die_id(cpu);
        struct rapl_package *rp;
        int ret;
 
-       if (!rapl_defaults)
-               return ERR_PTR(-ENODEV);
-
        rp = kzalloc(sizeof(struct rapl_package), GFP_KERNEL);
        if (!rp)
                return ERR_PTR(-ENOMEM);
 
-       /* add the new package to the list */
-       rp->id = id;
-       rp->lead_cpu = cpu;
-       rp->priv = priv;
+       if (id_is_cpu) {
+               rp->id = topology_logical_die_id(id);
+               rp->lead_cpu = id;
+               if (topology_max_die_per_package() > 1)
+                       snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH, "package-%d-die-%d",
+                                topology_physical_package_id(id), topology_die_id(id));
+               else
+                       snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH, "package-%d",
+                                topology_physical_package_id(id));
+       } else {
+               rp->id = id;
+               rp->lead_cpu = -1;
+               snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH, "package-%d", id);
+       }
 
-       if (topology_max_die_per_package() > 1)
-               snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH,
-                        "package-%d-die-%d",
-                        topology_physical_package_id(cpu), topology_die_id(cpu));
-       else
-               snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH, "package-%d",
-                        topology_physical_package_id(cpu));
+       rp->priv = priv;
+       ret = rapl_config(rp);
+       if (ret)
+               goto err_free_package;
 
        /* check if the package contains valid domains */
-       if (rapl_detect_domains(rp, cpu) || rapl_defaults->check_unit(rp, cpu)) {
+       if (rapl_detect_domains(rp)) {
                ret = -ENODEV;
                goto err_free_package;
        }
@@ -1430,38 +1606,18 @@ static void power_limit_state_save(void)
 {
        struct rapl_package *rp;
        struct rapl_domain *rd;
-       int nr_pl, ret, i;
+       int ret, i;
 
        cpus_read_lock();
        list_for_each_entry(rp, &rapl_packages, plist) {
                if (!rp->power_zone)
                        continue;
                rd = power_zone_to_rapl_domain(rp->power_zone);
-               nr_pl = find_nr_power_limit(rd);
-               for (i = 0; i < nr_pl; i++) {
-                       switch (rd->rpl[i].prim_id) {
-                       case PL1_ENABLE:
-                               ret = rapl_read_data_raw(rd,
-                                                POWER_LIMIT1, true,
-                                                &rd->rpl[i].last_power_limit);
-                               if (ret)
-                                       rd->rpl[i].last_power_limit = 0;
-                               break;
-                       case PL2_ENABLE:
-                               ret = rapl_read_data_raw(rd,
-                                                POWER_LIMIT2, true,
+               for (i = POWER_LIMIT1; i < NR_POWER_LIMITS; i++) {
+                       ret = rapl_read_pl_data(rd, i, PL_LIMIT, true,
                                                 &rd->rpl[i].last_power_limit);
-                               if (ret)
-                                       rd->rpl[i].last_power_limit = 0;
-                               break;
-                       case PL4_ENABLE:
-                               ret = rapl_read_data_raw(rd,
-                                                POWER_LIMIT4, true,
-                                                &rd->rpl[i].last_power_limit);
-                               if (ret)
-                                       rd->rpl[i].last_power_limit = 0;
-                               break;
-                       }
+                       if (ret)
+                               rd->rpl[i].last_power_limit = 0;
                }
        }
        cpus_read_unlock();
@@ -1471,33 +1627,17 @@ static void power_limit_state_restore(void)
 {
        struct rapl_package *rp;
        struct rapl_domain *rd;
-       int nr_pl, i;
+       int i;
 
        cpus_read_lock();
        list_for_each_entry(rp, &rapl_packages, plist) {
                if (!rp->power_zone)
                        continue;
                rd = power_zone_to_rapl_domain(rp->power_zone);
-               nr_pl = find_nr_power_limit(rd);
-               for (i = 0; i < nr_pl; i++) {
-                       switch (rd->rpl[i].prim_id) {
-                       case PL1_ENABLE:
-                               if (rd->rpl[i].last_power_limit)
-                                       rapl_write_data_raw(rd, POWER_LIMIT1,
-                                           rd->rpl[i].last_power_limit);
-                               break;
-                       case PL2_ENABLE:
-                               if (rd->rpl[i].last_power_limit)
-                                       rapl_write_data_raw(rd, POWER_LIMIT2,
-                                           rd->rpl[i].last_power_limit);
-                               break;
-                       case PL4_ENABLE:
-                               if (rd->rpl[i].last_power_limit)
-                                       rapl_write_data_raw(rd, POWER_LIMIT4,
-                                           rd->rpl[i].last_power_limit);
-                               break;
-                       }
-               }
+               for (i = POWER_LIMIT1; i < NR_POWER_LIMITS; i++)
+                       if (rd->rpl[i].last_power_limit)
+                               rapl_write_pl_data(rd, i, PL_LIMIT,
+                                              rd->rpl[i].last_power_limit);
        }
        cpus_read_unlock();
 }
@@ -1528,32 +1668,25 @@ static int __init rapl_init(void)
        int ret;
 
        id = x86_match_cpu(rapl_ids);
-       if (!id) {
-               pr_err("driver does not support CPU family %d model %d\n",
-                      boot_cpu_data.x86, boot_cpu_data.x86_model);
-
-               return -ENODEV;
-       }
+       if (id) {
+               defaults_msr = (struct rapl_defaults *)id->driver_data;
 
-       rapl_defaults = (struct rapl_defaults *)id->driver_data;
-
-       ret = register_pm_notifier(&rapl_pm_notifier);
-       if (ret)
-               return ret;
+               rapl_msr_platdev = platform_device_alloc("intel_rapl_msr", 0);
+               if (!rapl_msr_platdev)
+                       return -ENOMEM;
 
-       rapl_msr_platdev = platform_device_alloc("intel_rapl_msr", 0);
-       if (!rapl_msr_platdev) {
-               ret = -ENOMEM;
-               goto end;
+               ret = platform_device_add(rapl_msr_platdev);
+               if (ret) {
+                       platform_device_put(rapl_msr_platdev);
+                       return ret;
+               }
        }
 
-       ret = platform_device_add(rapl_msr_platdev);
-       if (ret)
+       ret = register_pm_notifier(&rapl_pm_notifier);
+       if (ret && rapl_msr_platdev) {
+               platform_device_del(rapl_msr_platdev);
                platform_device_put(rapl_msr_platdev);
-
-end:
-       if (ret)
-               unregister_pm_notifier(&rapl_pm_notifier);
+       }
 
        return ret;
 }
index a276737..569e25e 100644
@@ -22,7 +22,6 @@
 #include <linux/processor.h>
 #include <linux/platform_device.h>
 
-#include <asm/iosf_mbi.h>
 #include <asm/cpu_device_id.h>
 #include <asm/intel-family.h>
 
@@ -34,6 +33,7 @@
 static struct rapl_if_priv *rapl_msr_priv;
 
 static struct rapl_if_priv rapl_msr_priv_intel = {
+       .type = RAPL_IF_MSR,
        .reg_unit = MSR_RAPL_POWER_UNIT,
        .regs[RAPL_DOMAIN_PACKAGE] = {
                MSR_PKG_POWER_LIMIT, MSR_PKG_ENERGY_STATUS, MSR_PKG_PERF_STATUS, 0, MSR_PKG_POWER_INFO },
@@ -45,11 +45,12 @@ static struct rapl_if_priv rapl_msr_priv_intel = {
                MSR_DRAM_POWER_LIMIT, MSR_DRAM_ENERGY_STATUS, MSR_DRAM_PERF_STATUS, 0, MSR_DRAM_POWER_INFO },
        .regs[RAPL_DOMAIN_PLATFORM] = {
                MSR_PLATFORM_POWER_LIMIT, MSR_PLATFORM_ENERGY_STATUS, 0, 0, 0},
-       .limits[RAPL_DOMAIN_PACKAGE] = 2,
-       .limits[RAPL_DOMAIN_PLATFORM] = 2,
+       .limits[RAPL_DOMAIN_PACKAGE] = BIT(POWER_LIMIT2),
+       .limits[RAPL_DOMAIN_PLATFORM] = BIT(POWER_LIMIT2),
 };
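
The limits[] field changes meaning here: it used to be a count of power limits (2 or 3) and is now a bitmap of supported limits keyed by POWER_LIMITn, which is why the probe code below ORs in BIT(POWER_LIMIT4) instead of bumping a counter. A tiny model of the bitmap handling; the enum values are placeholders, only the bitmap idea is taken from the patch:

	#include <stdio.h>

	#define BIT(n) (1u << (n))

	/* Placeholder values; the kernel enum defines the real ones. */
	enum { POWER_LIMIT1 = 0, POWER_LIMIT2 = 1, POWER_LIMIT4 = 2, NR_POWER_LIMITS = 3 };

	int main(void)
	{
	        unsigned int limits = BIT(POWER_LIMIT2);   /* PL2 advertised by the interface */
	        int i, nr = 0;

	        limits |= BIT(POWER_LIMIT1);               /* PL1 is always supported */
	        limits |= BIT(POWER_LIMIT4);               /* e.g. when a PL4 register is present */

	        for (i = POWER_LIMIT1; i < NR_POWER_LIMITS; i++)
	                if (limits & BIT(i))
	                        nr++;

	        printf("supported power limits: %d (mask 0x%x)\n", nr, limits);
	        return 0;
	}
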
 
 static struct rapl_if_priv rapl_msr_priv_amd = {
+       .type = RAPL_IF_MSR,
        .reg_unit = MSR_AMD_RAPL_POWER_UNIT,
        .regs[RAPL_DOMAIN_PACKAGE] = {
                0, MSR_AMD_PKG_ENERGY_STATUS, 0, 0, 0 },
@@ -68,9 +69,9 @@ static int rapl_cpu_online(unsigned int cpu)
 {
        struct rapl_package *rp;
 
-       rp = rapl_find_package_domain(cpu, rapl_msr_priv);
+       rp = rapl_find_package_domain(cpu, rapl_msr_priv, true);
        if (!rp) {
-               rp = rapl_add_package(cpu, rapl_msr_priv);
+               rp = rapl_add_package(cpu, rapl_msr_priv, true);
                if (IS_ERR(rp))
                        return PTR_ERR(rp);
        }
@@ -83,7 +84,7 @@ static int rapl_cpu_down_prep(unsigned int cpu)
        struct rapl_package *rp;
        int lead_cpu;
 
-       rp = rapl_find_package_domain(cpu, rapl_msr_priv);
+       rp = rapl_find_package_domain(cpu, rapl_msr_priv, true);
        if (!rp)
                return 0;
 
@@ -137,14 +138,14 @@ static int rapl_msr_write_raw(int cpu, struct reg_action *ra)
 
 /* List of verified CPUs. */
 static const struct x86_cpu_id pl4_support_ids[] = {
-       { X86_VENDOR_INTEL, 6, INTEL_FAM6_TIGERLAKE_L, X86_FEATURE_ANY },
-       { X86_VENDOR_INTEL, 6, INTEL_FAM6_ALDERLAKE, X86_FEATURE_ANY },
-       { X86_VENDOR_INTEL, 6, INTEL_FAM6_ALDERLAKE_L, X86_FEATURE_ANY },
-       { X86_VENDOR_INTEL, 6, INTEL_FAM6_ALDERLAKE_N, X86_FEATURE_ANY },
-       { X86_VENDOR_INTEL, 6, INTEL_FAM6_RAPTORLAKE, X86_FEATURE_ANY },
-       { X86_VENDOR_INTEL, 6, INTEL_FAM6_RAPTORLAKE_P, X86_FEATURE_ANY },
-       { X86_VENDOR_INTEL, 6, INTEL_FAM6_METEORLAKE, X86_FEATURE_ANY },
-       { X86_VENDOR_INTEL, 6, INTEL_FAM6_METEORLAKE_L, X86_FEATURE_ANY },
+       X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE_L, NULL),
+       X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE, NULL),
+       X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, NULL),
+       X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_N, NULL),
+       X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE, NULL),
+       X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE_P, NULL),
+       X86_MATCH_INTEL_FAM6_MODEL(METEORLAKE, NULL),
+       X86_MATCH_INTEL_FAM6_MODEL(METEORLAKE_L, NULL),
        {}
 };
 
@@ -169,7 +170,7 @@ static int rapl_msr_probe(struct platform_device *pdev)
        rapl_msr_priv->write_raw = rapl_msr_write_raw;
 
        if (id) {
-               rapl_msr_priv->limits[RAPL_DOMAIN_PACKAGE] = 3;
+               rapl_msr_priv->limits[RAPL_DOMAIN_PACKAGE] |= BIT(POWER_LIMIT4);
                rapl_msr_priv->regs[RAPL_DOMAIN_PACKAGE][RAPL_DOMAIN_REG_PL4] =
                        MSR_VR_CURRENT_CONFIG;
                pr_info("PL4 support detected.\n");
diff --git a/drivers/powercap/intel_rapl_tpmi.c b/drivers/powercap/intel_rapl_tpmi.c
new file mode 100644
index 0000000..4f4f13d
--- /dev/null
@@ -0,0 +1,325 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * intel_rapl_tpmi: Intel RAPL driver via TPMI interface
+ *
+ * Copyright (c) 2023, Intel Corporation.
+ * All Rights Reserved.
+ *
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/auxiliary_bus.h>
+#include <linux/io.h>
+#include <linux/intel_tpmi.h>
+#include <linux/intel_rapl.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+
+#define TPMI_RAPL_VERSION 1
+
+/* 1 header + 10 registers + 5 reserved. 8 bytes for each. */
+#define TPMI_RAPL_DOMAIN_SIZE 128
+
+enum tpmi_rapl_domain_type {
+       TPMI_RAPL_DOMAIN_INVALID,
+       TPMI_RAPL_DOMAIN_SYSTEM,
+       TPMI_RAPL_DOMAIN_PACKAGE,
+       TPMI_RAPL_DOMAIN_RESERVED,
+       TPMI_RAPL_DOMAIN_MEMORY,
+       TPMI_RAPL_DOMAIN_MAX,
+};
+
+enum tpmi_rapl_register {
+       TPMI_RAPL_REG_HEADER,
+       TPMI_RAPL_REG_UNIT,
+       TPMI_RAPL_REG_PL1,
+       TPMI_RAPL_REG_PL2,
+       TPMI_RAPL_REG_PL3,
+       TPMI_RAPL_REG_PL4,
+       TPMI_RAPL_REG_RESERVED,
+       TPMI_RAPL_REG_ENERGY_STATUS,
+       TPMI_RAPL_REG_PERF_STATUS,
+       TPMI_RAPL_REG_POWER_INFO,
+       TPMI_RAPL_REG_INTERRUPT,
+       TPMI_RAPL_REG_MAX = 15,
+};
+
+struct tpmi_rapl_package {
+       struct rapl_if_priv priv;
+       struct intel_tpmi_plat_info *tpmi_info;
+       struct rapl_package *rp;
+       void __iomem *base;
+       struct list_head node;
+};
+
+static LIST_HEAD(tpmi_rapl_packages);
+static DEFINE_MUTEX(tpmi_rapl_lock);
+
+static struct powercap_control_type *tpmi_control_type;
+
+static int tpmi_rapl_read_raw(int id, struct reg_action *ra)
+{
+       if (!ra->reg)
+               return -EINVAL;
+
+       ra->value = readq((void __iomem *)ra->reg);
+
+       ra->value &= ra->mask;
+       return 0;
+}
+
+static int tpmi_rapl_write_raw(int id, struct reg_action *ra)
+{
+       u64 val;
+
+       if (!ra->reg)
+               return -EINVAL;
+
+       val = readq((void __iomem *)ra->reg);
+
+       val &= ~ra->mask;
+       val |= ra->value;
+
+       writeq(val, (void __iomem *)ra->reg);
+       return 0;
+}
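
For TPMI the reg field carries a kernel virtual address into the mapped TPMI region, so read_raw/write_raw become plain readq()/writeq() accesses, with writes performing a masked read-modify-write. The field-update logic, modelled on an ordinary variable instead of MMIO and using a hypothetical 18-bit field at bit 0:

	#include <stdio.h>
	#include <stdint.h>

	/* Masked read-modify-write, as tpmi_rapl_write_raw() does with readq()/writeq(). */
	static void update_field(uint64_t *reg, uint64_t mask, uint64_t value)
	{
	        uint64_t val = *reg;      /* readq() */

	        val &= ~mask;             /* clear the field */
	        val |= value & mask;      /* insert the new, already-shifted value */
	        *reg = val;               /* writeq() */
	}

	int main(void)
	{
	        uint64_t reg = 0x00000000dead0028ULL;

	        /* Hypothetical 18-bit limit field occupying bits 0-17. */
	        update_field(&reg, 0x3ffffULL, 0x50);
	        printf("reg = 0x%016llx\n", (unsigned long long)reg);
	        return 0;
	}
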
+
+static struct tpmi_rapl_package *trp_alloc(int pkg_id)
+{
+       struct tpmi_rapl_package *trp;
+       int ret;
+
+       mutex_lock(&tpmi_rapl_lock);
+
+       if (list_empty(&tpmi_rapl_packages)) {
+               tpmi_control_type = powercap_register_control_type(NULL, "intel-rapl", NULL);
+               if (IS_ERR(tpmi_control_type)) {
+                       ret = PTR_ERR(tpmi_control_type);
+                       goto err_unlock;
+               }
+       }
+
+       trp = kzalloc(sizeof(*trp), GFP_KERNEL);
+       if (!trp) {
+               ret = -ENOMEM;
+               goto err_del_powercap;
+       }
+
+       list_add(&trp->node, &tpmi_rapl_packages);
+
+       mutex_unlock(&tpmi_rapl_lock);
+       return trp;
+
+err_del_powercap:
+       if (list_empty(&tpmi_rapl_packages))
+               powercap_unregister_control_type(tpmi_control_type);
+err_unlock:
+       mutex_unlock(&tpmi_rapl_lock);
+       return ERR_PTR(ret);
+}
+
+static void trp_release(struct tpmi_rapl_package *trp)
+{
+       mutex_lock(&tpmi_rapl_lock);
+       list_del(&trp->node);
+
+       if (list_empty(&tpmi_rapl_packages))
+               powercap_unregister_control_type(tpmi_control_type);
+
+       kfree(trp);
+       mutex_unlock(&tpmi_rapl_lock);
+}
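
trp_alloc()/trp_release() tie the lifetime of the shared "intel-rapl" powercap control type to the package list: the first TPMI package registers it, the last one to go unregisters it, all under tpmi_rapl_lock. The same first-in/last-out pattern in a minimal single-threaded model (a counter stands in for the list, so no locking is shown):

	#include <stdio.h>
	#include <stdbool.h>

	static int nr_packages;
	static bool control_type_registered;

	/* First package registers the shared control type, the last one unregisters it. */
	static void add_package(void)
	{
	        if (nr_packages == 0)
	                control_type_registered = true;   /* powercap_register_control_type() */
	        nr_packages++;
	}

	static void remove_package(void)
	{
	        nr_packages--;
	        if (nr_packages == 0)
	                control_type_registered = false;  /* powercap_unregister_control_type() */
	}

	int main(void)
	{
	        add_package();
	        add_package();
	        remove_package();
	        printf("registered after one removal: %d\n", control_type_registered);
	        remove_package();
	        printf("registered after both removed: %d\n", control_type_registered);
	        return 0;
	}
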
+
+static int parse_one_domain(struct tpmi_rapl_package *trp, u32 offset)
+{
+       u8 tpmi_domain_version;
+       enum rapl_domain_type domain_type;
+       enum tpmi_rapl_domain_type tpmi_domain_type;
+       enum tpmi_rapl_register reg_index;
+       enum rapl_domain_reg_id reg_id;
+       int tpmi_domain_size, tpmi_domain_flags;
+       u64 *tpmi_rapl_regs = trp->base + offset;
+       u64 tpmi_domain_header = readq((void __iomem *)tpmi_rapl_regs);
+
+       /* Domain Parent bits are ignored for now */
+       tpmi_domain_version = tpmi_domain_header & 0xff;
+       tpmi_domain_type = tpmi_domain_header >> 8 & 0xff;
+       tpmi_domain_size = tpmi_domain_header >> 16 & 0xff;
+       tpmi_domain_flags = tpmi_domain_header >> 32 & 0xffff;
+
+       if (tpmi_domain_version != TPMI_RAPL_VERSION) {
+               pr_warn(FW_BUG "Unsupported version:%d\n", tpmi_domain_version);
+               return -ENODEV;
+       }
+
+       /* Domain size: in unit of 128 Bytes */
+       if (tpmi_domain_size != 1) {
+               pr_warn(FW_BUG "Invalid Domain size %d\n", tpmi_domain_size);
+               return -EINVAL;
+       }
+
+       /* Unit register and Energy Status register are mandatory for each domain */
+       if (!(tpmi_domain_flags & BIT(TPMI_RAPL_REG_UNIT)) ||
+           !(tpmi_domain_flags & BIT(TPMI_RAPL_REG_ENERGY_STATUS))) {
+               pr_warn(FW_BUG "Invalid Domain flag 0x%x\n", tpmi_domain_flags);
+               return -EINVAL;
+       }
+
+       switch (tpmi_domain_type) {
+       case TPMI_RAPL_DOMAIN_PACKAGE:
+               domain_type = RAPL_DOMAIN_PACKAGE;
+               break;
+       case TPMI_RAPL_DOMAIN_SYSTEM:
+               domain_type = RAPL_DOMAIN_PLATFORM;
+               break;
+       case TPMI_RAPL_DOMAIN_MEMORY:
+               domain_type = RAPL_DOMAIN_DRAM;
+               break;
+       default:
+               pr_warn(FW_BUG "Unsupported Domain type %d\n", tpmi_domain_type);
+               return -EINVAL;
+       }
+
+       if (trp->priv.regs[domain_type][RAPL_DOMAIN_REG_UNIT]) {
+               pr_warn(FW_BUG "Duplicate Domain type %d\n", tpmi_domain_type);
+               return -EINVAL;
+       }
+
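+       /* Record the MMIO address of every register advertised in the domain flags */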
+       reg_index = TPMI_RAPL_REG_HEADER;
+       while (++reg_index != TPMI_RAPL_REG_MAX) {
+               if (!(tpmi_domain_flags & BIT(reg_index)))
+                       continue;
+
+               switch (reg_index) {
+               case TPMI_RAPL_REG_UNIT:
+                       reg_id = RAPL_DOMAIN_REG_UNIT;
+                       break;
+               case TPMI_RAPL_REG_PL1:
+                       reg_id = RAPL_DOMAIN_REG_LIMIT;
+                       trp->priv.limits[domain_type] |= BIT(POWER_LIMIT1);
+                       break;
+               case TPMI_RAPL_REG_PL2:
+                       reg_id = RAPL_DOMAIN_REG_PL2;
+                       trp->priv.limits[domain_type] |= BIT(POWER_LIMIT2);
+                       break;
+               case TPMI_RAPL_REG_PL4:
+                       reg_id = RAPL_DOMAIN_REG_PL4;
+                       trp->priv.limits[domain_type] |= BIT(POWER_LIMIT4);
+                       break;
+               case TPMI_RAPL_REG_ENERGY_STATUS:
+                       reg_id = RAPL_DOMAIN_REG_STATUS;
+                       break;
+               case TPMI_RAPL_REG_PERF_STATUS:
+                       reg_id = RAPL_DOMAIN_REG_PERF;
+                       break;
+               case TPMI_RAPL_REG_POWER_INFO:
+                       reg_id = RAPL_DOMAIN_REG_INFO;
+                       break;
+               default:
+                       continue;
+               }
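+               /* Store the ioremapped register address; read_raw/write_raw cast it back to __iomem */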
+               trp->priv.regs[domain_type][reg_id] = (u64)&tpmi_rapl_regs[reg_index];
+       }
+
+       return 0;
+}
+
+static int intel_rapl_tpmi_probe(struct auxiliary_device *auxdev,
+                                const struct auxiliary_device_id *id)
+{
+       struct tpmi_rapl_package *trp;
+       struct intel_tpmi_plat_info *info;
+       struct resource *res;
+       u32 offset;
+       int ret;
+
+       info = tpmi_get_platform_data(auxdev);
+       if (!info)
+               return -ENODEV;
+
+       trp = trp_alloc(info->package_id);
+       if (IS_ERR(trp))
+               return PTR_ERR(trp);
+
+       if (tpmi_get_resource_count(auxdev) > 1) {
+               dev_err(&auxdev->dev, "does not support multiple resources\n");
+               ret = -EINVAL;
+               goto err;
+       }
+
+       res = tpmi_get_resource_at_index(auxdev, 0);
+       if (!res) {
+               dev_err(&auxdev->dev, "can't fetch device resource info\n");
+               ret = -EIO;
+               goto err;
+       }
+
+       trp->base = devm_ioremap_resource(&auxdev->dev, res);
+       if (IS_ERR(trp->base)) {
+               ret = PTR_ERR(trp->base);
+               goto err;
+       }
+
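+       /* Each 128-byte chunk of the TPMI resource describes one RAPL domain */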
+       for (offset = 0; offset < resource_size(res); offset += TPMI_RAPL_DOMAIN_SIZE) {
+               ret = parse_one_domain(trp, offset);
+               if (ret)
+                       goto err;
+       }
+
+       trp->tpmi_info = info;
+       trp->priv.type = RAPL_IF_TPMI;
+       trp->priv.read_raw = tpmi_rapl_read_raw;
+       trp->priv.write_raw = tpmi_rapl_write_raw;
+       trp->priv.control_type = tpmi_control_type;
+
+       /* RAPL TPMI I/F is per physical package */
+       trp->rp = rapl_find_package_domain(info->package_id, &trp->priv, false);
+       if (trp->rp) {
+               dev_err(&auxdev->dev, "Domain for Package%d already exists\n", info->package_id);
+               ret = -EEXIST;
+               goto err;
+       }
+
+       trp->rp = rapl_add_package(info->package_id, &trp->priv, false);
+       if (IS_ERR(trp->rp)) {
+               dev_err(&auxdev->dev, "Failed to add RAPL Domain for Package%d, %ld\n",
+                       info->package_id, PTR_ERR(trp->rp));
+               ret = PTR_ERR(trp->rp);
+               goto err;
+       }
+
+       auxiliary_set_drvdata(auxdev, trp);
+
+       return 0;
+err:
+       trp_release(trp);
+       return ret;
+}
+
+static void intel_rapl_tpmi_remove(struct auxiliary_device *auxdev)
+{
+       struct tpmi_rapl_package *trp = auxiliary_get_drvdata(auxdev);
+
+       rapl_remove_package(trp->rp);
+       trp_release(trp);
+}
+
+static const struct auxiliary_device_id intel_rapl_tpmi_ids[] = {
+       {.name = "intel_vsec.tpmi-rapl" },
+       { }
+};
+
+MODULE_DEVICE_TABLE(auxiliary, intel_rapl_tpmi_ids);
+
+static struct auxiliary_driver intel_rapl_tpmi_driver = {
+       .probe = intel_rapl_tpmi_probe,
+       .remove = intel_rapl_tpmi_remove,
+       .id_table = intel_rapl_tpmi_ids,
+};
+
+module_auxiliary_driver(intel_rapl_tpmi_driver)
+
+MODULE_IMPORT_NS(INTEL_TPMI);
+
+MODULE_DESCRIPTION("Intel RAPL TPMI Driver");
+MODULE_LICENSE("GPL");
index 0c567d9..5f7d286 100644 (file)
@@ -6,7 +6,7 @@
  *              Bo Shen <voice.shen@atmel.com>
  *
  * Links to reference manuals for the supported PWM chips can be found in
- * Documentation/arm/microchip.rst.
+ * Documentation/arch/arm/microchip.rst.
  *
  * Limitations:
  * - Periods start with the inactive level.
index 46ed668..762429d 100644 (file)
@@ -8,7 +8,7 @@
  *             eric miao <eric.miao@marvell.com>
  *
  * Links to reference manuals for some of the supported PWM chips can be found
- * in Documentation/arm/marvell.rst.
+ * in Documentation/arch/arm/marvell.rst.
  *
  * Limitations:
  * - When PWM is stopped, the current PWM period stops abruptly at the next
index f0a6391..ffb973c 100644 (file)
@@ -46,7 +46,7 @@ int __init ras_add_daemon_trace(void)
 
        fentry = debugfs_create_file("daemon_active", S_IRUSR, ras_debugfs_dir,
                                     NULL, &trace_fops);
-       if (!fentry)
+       if (IS_ERR(fentry))
                return -ENODEV;
 
        return 0;
index dc741ac..698ab7f 100644 (file)
@@ -5256,7 +5256,7 @@ static void rdev_init_debugfs(struct regulator_dev *rdev)
        }
 
        rdev->debugfs = debugfs_create_dir(rname, debugfs_root);
-       if (!rdev->debugfs) {
+       if (IS_ERR(rdev->debugfs)) {
                rdev_warn(rdev, "Failed to create debugfs directory\n");
                return;
        }
@@ -6178,7 +6178,7 @@ static int __init regulator_init(void)
        ret = class_register(&regulator_class);
 
        debugfs_root = debugfs_create_dir("regulator", NULL);
-       if (!debugfs_root)
+       if (IS_ERR(debugfs_root))
                pr_warn("regulator: Failed to create debugfs directory\n");
 
 #ifdef CONFIG_DEBUG_FS
index 1849566..3eb86ec 100644 (file)
@@ -951,9 +951,12 @@ static int mt6359_regulator_probe(struct platform_device *pdev)
        struct regulator_config config = {};
        struct regulator_dev *rdev;
        struct mt6359_regulator_info *mt6359_info;
-       int i, hw_ver;
+       int i, hw_ver, ret;
+
+       ret = regmap_read(mt6397->regmap, MT6359P_HWCID, &hw_ver);
+       if (ret)
+               return ret;
 
-       regmap_read(mt6397->regmap, MT6359P_HWCID, &hw_ver);
        if (hw_ver >= MT6359P_CHIP_VER)
                mt6359_info = mt6359p_regulators;
        else
index 87a746d..e75dd92 100644 (file)
@@ -264,7 +264,7 @@ static const struct pca9450_regulator_desc pca9450a_regulators[] = {
                        .vsel_reg = PCA9450_REG_BUCK2OUT_DVS0,
                        .vsel_mask = BUCK2OUT_DVS0_MASK,
                        .enable_reg = PCA9450_REG_BUCK2CTRL,
-                       .enable_mask = BUCK1_ENMODE_MASK,
+                       .enable_mask = BUCK2_ENMODE_MASK,
                        .ramp_reg = PCA9450_REG_BUCK2CTRL,
                        .ramp_mask = BUCK2_RAMP_MASK,
                        .ramp_delay_table = pca9450_dvs_buck_ramp_table,
@@ -502,7 +502,7 @@ static const struct pca9450_regulator_desc pca9450bc_regulators[] = {
                        .vsel_reg = PCA9450_REG_BUCK2OUT_DVS0,
                        .vsel_mask = BUCK2OUT_DVS0_MASK,
                        .enable_reg = PCA9450_REG_BUCK2CTRL,
-                       .enable_mask = BUCK1_ENMODE_MASK,
+                       .enable_mask = BUCK2_ENMODE_MASK,
                        .ramp_reg = PCA9450_REG_BUCK2CTRL,
                        .ramp_mask = BUCK2_RAMP_MASK,
                        .ramp_delay_table = pca9450_dvs_buck_ramp_table,
index b0a58c6..f3b280a 100644 (file)
@@ -1057,21 +1057,21 @@ static const struct rpmh_vreg_init_data pm8450_vreg_data[] = {
 };
 
 static const struct rpmh_vreg_init_data pm8550_vreg_data[] = {
-       RPMH_VREG("ldo1",   "ldo%s1",  &pmic5_pldo,    "vdd-l1-l4-l10"),
+       RPMH_VREG("ldo1",   "ldo%s1",  &pmic5_nldo515,    "vdd-l1-l4-l10"),
        RPMH_VREG("ldo2",   "ldo%s2",  &pmic5_pldo,    "vdd-l2-l13-l14"),
-       RPMH_VREG("ldo3",   "ldo%s3",  &pmic5_nldo,    "vdd-l3"),
-       RPMH_VREG("ldo4",   "ldo%s4",  &pmic5_nldo,    "vdd-l1-l4-l10"),
+       RPMH_VREG("ldo3",   "ldo%s3",  &pmic5_nldo515,    "vdd-l3"),
+       RPMH_VREG("ldo4",   "ldo%s4",  &pmic5_nldo515,    "vdd-l1-l4-l10"),
        RPMH_VREG("ldo5",   "ldo%s5",  &pmic5_pldo,    "vdd-l5-l16"),
-       RPMH_VREG("ldo6",   "ldo%s6",  &pmic5_pldo_lv, "vdd-l6-l7"),
-       RPMH_VREG("ldo7",   "ldo%s7",  &pmic5_pldo_lv, "vdd-l6-l7"),
-       RPMH_VREG("ldo8",   "ldo%s8",  &pmic5_pldo_lv, "vdd-l8-l9"),
+       RPMH_VREG("ldo6",   "ldo%s6",  &pmic5_pldo, "vdd-l6-l7"),
+       RPMH_VREG("ldo7",   "ldo%s7",  &pmic5_pldo, "vdd-l6-l7"),
+       RPMH_VREG("ldo8",   "ldo%s8",  &pmic5_pldo, "vdd-l8-l9"),
        RPMH_VREG("ldo9",   "ldo%s9",  &pmic5_pldo,    "vdd-l8-l9"),
-       RPMH_VREG("ldo10",  "ldo%s10", &pmic5_nldo,    "vdd-l1-l4-l10"),
-       RPMH_VREG("ldo11",  "ldo%s11", &pmic5_nldo,    "vdd-l11"),
+       RPMH_VREG("ldo10",  "ldo%s10", &pmic5_nldo515,    "vdd-l1-l4-l10"),
+       RPMH_VREG("ldo11",  "ldo%s11", &pmic5_nldo515,    "vdd-l11"),
        RPMH_VREG("ldo12",  "ldo%s12", &pmic5_pldo,    "vdd-l12"),
        RPMH_VREG("ldo13",  "ldo%s13", &pmic5_pldo,    "vdd-l2-l13-l14"),
        RPMH_VREG("ldo14",  "ldo%s14", &pmic5_pldo,    "vdd-l2-l13-l14"),
-       RPMH_VREG("ldo15",  "ldo%s15", &pmic5_pldo,    "vdd-l15"),
+       RPMH_VREG("ldo15",  "ldo%s15", &pmic5_nldo515,    "vdd-l15"),
        RPMH_VREG("ldo16",  "ldo%s16", &pmic5_pldo,    "vdd-l5-l16"),
        RPMH_VREG("ldo17",  "ldo%s17", &pmic5_pldo,    "vdd-l17"),
        RPMH_VREG("bob1",   "bob%s1",  &pmic5_bob,     "vdd-bob1"),
@@ -1086,9 +1086,9 @@ static const struct rpmh_vreg_init_data pm8550vs_vreg_data[] = {
        RPMH_VREG("smps4",  "smp%s4",  &pmic5_ftsmps525_lv, "vdd-s4"),
        RPMH_VREG("smps5",  "smp%s5",  &pmic5_ftsmps525_lv, "vdd-s5"),
        RPMH_VREG("smps6",  "smp%s6",  &pmic5_ftsmps525_mv, "vdd-s6"),
-       RPMH_VREG("ldo1",   "ldo%s1",  &pmic5_nldo,   "vdd-l1"),
-       RPMH_VREG("ldo2",   "ldo%s2",  &pmic5_nldo,   "vdd-l2"),
-       RPMH_VREG("ldo3",   "ldo%s3",  &pmic5_nldo,   "vdd-l3"),
+       RPMH_VREG("ldo1",   "ldo%s1",  &pmic5_nldo515,   "vdd-l1"),
+       RPMH_VREG("ldo2",   "ldo%s2",  &pmic5_nldo515,   "vdd-l2"),
+       RPMH_VREG("ldo3",   "ldo%s3",  &pmic5_nldo515,   "vdd-l3"),
        {}
 };
 
@@ -1101,9 +1101,9 @@ static const struct rpmh_vreg_init_data pm8550ve_vreg_data[] = {
        RPMH_VREG("smps6", "smp%s6", &pmic5_ftsmps525_lv, "vdd-s6"),
        RPMH_VREG("smps7", "smp%s7", &pmic5_ftsmps525_lv, "vdd-s7"),
        RPMH_VREG("smps8", "smp%s8", &pmic5_ftsmps525_lv, "vdd-s8"),
-       RPMH_VREG("ldo1",  "ldo%s1", &pmic5_nldo,   "vdd-l1"),
-       RPMH_VREG("ldo2",  "ldo%s2", &pmic5_nldo,   "vdd-l2"),
-       RPMH_VREG("ldo3",  "ldo%s3", &pmic5_nldo,   "vdd-l3"),
+       RPMH_VREG("ldo1",  "ldo%s1", &pmic5_nldo515,   "vdd-l1"),
+       RPMH_VREG("ldo2",  "ldo%s2", &pmic5_nldo515,   "vdd-l2"),
+       RPMH_VREG("ldo3",  "ldo%s3", &pmic5_nldo515,   "vdd-l3"),
        {}
 };
 
index 9fbfce7..4578895 100644 (file)
@@ -3234,12 +3234,12 @@ struct blk_mq_ops dasd_mq_ops = {
        .exit_hctx = dasd_exit_hctx,
 };
 
-static int dasd_open(struct block_device *bdev, fmode_t mode)
+static int dasd_open(struct gendisk *disk, blk_mode_t mode)
 {
        struct dasd_device *base;
        int rc;
 
-       base = dasd_device_from_gendisk(bdev->bd_disk);
+       base = dasd_device_from_gendisk(disk);
        if (!base)
                return -ENODEV;
 
@@ -3268,14 +3268,12 @@ static int dasd_open(struct block_device *bdev, fmode_t mode)
                rc = -ENODEV;
                goto out;
        }
-
-       if ((mode & FMODE_WRITE) &&
+       if ((mode & BLK_OPEN_WRITE) &&
            (test_bit(DASD_FLAG_DEVICE_RO, &base->flags) ||
             (base->features & DASD_FEATURE_READONLY))) {
                rc = -EROFS;
                goto out;
        }
-
        dasd_put_device(base);
        return 0;
 
@@ -3287,7 +3285,7 @@ unlock:
        return rc;
 }
 
-static void dasd_release(struct gendisk *disk, fmode_t mode)
+static void dasd_release(struct gendisk *disk)
 {
        struct dasd_device *base = dasd_device_from_gendisk(disk);
        if (base) {
index ade1369..113c509 100644 (file)
@@ -127,6 +127,8 @@ static int prepare_itcw(struct itcw *, unsigned int, unsigned int, int,
                        struct dasd_device *, struct dasd_device *,
                        unsigned int, int, unsigned int, unsigned int,
                        unsigned int, unsigned int);
+static int dasd_eckd_query_pprc_status(struct dasd_device *,
+                                      struct dasd_pprc_data_sc4 *);
 
 /* initial attempt at a probe function. this can be simplified once
  * the other detection code is gone */
@@ -3733,6 +3735,26 @@ static int count_exts(unsigned int from, unsigned int to, int trks_per_ext)
        return count;
 }
 
+static int dasd_in_copy_relation(struct dasd_device *device)
+{
+       struct dasd_pprc_data_sc4 *temp;
+       int rc;
+
+       if (!dasd_eckd_pprc_enabled(device))
+               return 0;
+
+       temp = kzalloc(sizeof(*temp), GFP_KERNEL);
+       if (!temp)
+               return -ENOMEM;
+
+       rc = dasd_eckd_query_pprc_status(device, temp);
+       if (!rc)
+               rc = temp->dev_info[0].state;
+
+       kfree(temp);
+       return rc;
+}
+
 /*
  * Release allocated space for a given range or an entire volume.
  */
@@ -3749,6 +3771,7 @@ dasd_eckd_dso_ras(struct dasd_device *device, struct dasd_block *block,
        int cur_to_trk, cur_from_trk;
        struct dasd_ccw_req *cqr;
        u32 beg_cyl, end_cyl;
+       int copy_relation;
        struct ccw1 *ccw;
        int trks_per_ext;
        size_t ras_size;
@@ -3760,6 +3783,10 @@ dasd_eckd_dso_ras(struct dasd_device *device, struct dasd_block *block,
        if (dasd_eckd_ras_sanity_checks(device, first_trk, last_trk))
                return ERR_PTR(-EINVAL);
 
+       copy_relation = dasd_in_copy_relation(device);
+       if (copy_relation < 0)
+               return ERR_PTR(copy_relation);
+
        rq = req ? blk_mq_rq_to_pdu(req) : NULL;
 
        features = &private->features;
@@ -3788,9 +3815,11 @@ dasd_eckd_dso_ras(struct dasd_device *device, struct dasd_block *block,
        /*
         * This bit guarantees initialisation of tracks within an extent that is
         * not fully specified, but is only supported with a certain feature
-        * subset.
+        * subset and for devices not in a copy relation.
         */
-       ras_data->op_flags.guarantee_init = !!(features->feature[56] & 0x01);
+       if (features->feature[56] & 0x01 && !copy_relation)
+               ras_data->op_flags.guarantee_init = 1;
+
        ras_data->lss = private->conf.ned->ID;
        ras_data->dev_addr = private->conf.ned->unit_addr;
        ras_data->nr_exts = nr_exts;
index 998a961..fe5108a 100644 (file)
@@ -130,7 +130,8 @@ int dasd_scan_partitions(struct dasd_block *block)
        struct block_device *bdev;
        int rc;
 
-       bdev = blkdev_get_by_dev(disk_devt(block->gdp), FMODE_READ, NULL);
+       bdev = blkdev_get_by_dev(disk_devt(block->gdp), BLK_OPEN_READ, NULL,
+                                NULL);
        if (IS_ERR(bdev)) {
                DBF_DEV_EVENT(DBF_ERR, block->base,
                              "scan partitions error, blkdev_get returned %ld",
@@ -179,7 +180,7 @@ void dasd_destroy_partitions(struct dasd_block *block)
        mutex_unlock(&bdev->bd_disk->open_mutex);
 
        /* Matching blkdev_put to the blkdev_get in dasd_scan_partitions. */
-       blkdev_put(bdev, FMODE_READ);
+       blkdev_put(bdev, NULL);
 }
 
 int dasd_gendisk_init(void)
index 33f812f..0aa5635 100644 (file)
@@ -965,7 +965,8 @@ int dasd_scan_partitions(struct dasd_block *);
 void dasd_destroy_partitions(struct dasd_block *);
 
 /* externals in dasd_ioctl.c */
-int dasd_ioctl(struct block_device *, fmode_t, unsigned int, unsigned long);
+int dasd_ioctl(struct block_device *bdev, blk_mode_t mode, unsigned int cmd,
+               unsigned long arg);
 int dasd_set_read_only(struct block_device *bdev, bool ro);
 
 /* externals in dasd_proc.c */
index 9327dcd..513a7e6 100644 (file)
@@ -552,10 +552,10 @@ static int __dasd_ioctl_information(struct dasd_block *block,
 
        memcpy(dasd_info->type, base->discipline->name, 4);
 
-       spin_lock_irqsave(&block->queue_lock, flags);
+       spin_lock_irqsave(get_ccwdev_lock(base->cdev), flags);
        list_for_each(l, &base->ccw_queue)
                dasd_info->chanq_len++;
-       spin_unlock_irqrestore(&block->queue_lock, flags);
+       spin_unlock_irqrestore(get_ccwdev_lock(base->cdev), flags);
        return 0;
 }
 
@@ -612,7 +612,7 @@ static int dasd_ioctl_readall_cmb(struct dasd_block *block, unsigned int cmd,
        return ret;
 }
 
-int dasd_ioctl(struct block_device *bdev, fmode_t mode,
+int dasd_ioctl(struct block_device *bdev, blk_mode_t mode,
               unsigned int cmd, unsigned long arg)
 {
        struct dasd_block *block;
index c09f2e0..200f88f 100644 (file)
@@ -28,8 +28,8 @@
 #define DCSSBLK_PARM_LEN 400
 #define DCSS_BUS_ID_SIZE 20
 
-static int dcssblk_open(struct block_device *bdev, fmode_t mode);
-static void dcssblk_release(struct gendisk *disk, fmode_t mode);
+static int dcssblk_open(struct gendisk *disk, blk_mode_t mode);
+static void dcssblk_release(struct gendisk *disk);
 static void dcssblk_submit_bio(struct bio *bio);
 static long dcssblk_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
                long nr_pages, enum dax_access_mode mode, void **kaddr,
@@ -809,12 +809,11 @@ out_buf:
 }
 
 static int
-dcssblk_open(struct block_device *bdev, fmode_t mode)
+dcssblk_open(struct gendisk *disk, blk_mode_t mode)
 {
-       struct dcssblk_dev_info *dev_info;
+       struct dcssblk_dev_info *dev_info = disk->private_data;
        int rc;
 
-       dev_info = bdev->bd_disk->private_data;
        if (NULL == dev_info) {
                rc = -ENODEV;
                goto out;
@@ -826,7 +825,7 @@ out:
 }
 
 static void
-dcssblk_release(struct gendisk *disk, fmode_t mode)
+dcssblk_release(struct gendisk *disk)
 {
        struct dcssblk_dev_info *dev_info = disk->private_data;
        struct segment_info *entry;
index 599f547..942c73a 100644 (file)
@@ -51,6 +51,7 @@ static struct dentry *zcore_dir;
 static struct dentry *zcore_reipl_file;
 static struct dentry *zcore_hsa_file;
 static struct ipl_parameter_block *zcore_ipl_block;
+static unsigned long os_info_flags;
 
 static DEFINE_MUTEX(hsa_buf_mutex);
 static char hsa_buf[PAGE_SIZE] __aligned(PAGE_SIZE);
@@ -139,7 +140,13 @@ static ssize_t zcore_reipl_write(struct file *filp, const char __user *buf,
 {
        if (zcore_ipl_block) {
                diag308(DIAG308_SET, zcore_ipl_block);
-               diag308(DIAG308_LOAD_CLEAR, NULL);
+               if (os_info_flags & OS_INFO_FLAG_REIPL_CLEAR)
+                       diag308(DIAG308_LOAD_CLEAR, NULL);
+               /* Use special diag308 subcode for CCW normal ipl */
+               if (zcore_ipl_block->pb0_hdr.pbt == IPL_PBT_CCW)
+                       diag308(DIAG308_LOAD_NORMAL_DUMP, NULL);
+               else
+                       diag308(DIAG308_LOAD_NORMAL, NULL);
        }
        return count;
 }
@@ -212,7 +219,10 @@ static int __init check_sdias(void)
  */
 static int __init zcore_reipl_init(void)
 {
+       struct os_info_entry *entry;
        struct ipib_info ipib_info;
+       unsigned long os_info_addr;
+       struct os_info *os_info;
        int rc;
 
        rc = memcpy_hsa_kernel(&ipib_info, __LC_DUMP_REIPL, sizeof(ipib_info));
@@ -234,6 +244,35 @@ static int __init zcore_reipl_init(void)
                free_page((unsigned long) zcore_ipl_block);
                zcore_ipl_block = NULL;
        }
+       /*
+        * Read the bit-flags field from os_info flags entry.
+        * Return zero even for os_info read or entry checksum errors in order
+        * to continue dump processing, considering that os_info could be
+        * corrupted on the panicked system.
+        */
+       os_info = (void *)__get_free_page(GFP_KERNEL);
+       if (!os_info)
+               return -ENOMEM;
+       rc = memcpy_hsa_kernel(&os_info_addr, __LC_OS_INFO, sizeof(os_info_addr));
+       if (rc)
+               goto out;
+       if (os_info_addr < sclp.hsa_size)
+               rc = memcpy_hsa_kernel(os_info, os_info_addr, PAGE_SIZE);
+       else
+               rc = memcpy_real(os_info, os_info_addr, PAGE_SIZE);
+       if (rc || os_info_csum(os_info) != os_info->csum)
+               goto out;
+       entry = &os_info->entry[OS_INFO_FLAGS_ENTRY];
+       if (entry->addr && entry->size) {
+               if (entry->addr < sclp.hsa_size)
+                       rc = memcpy_hsa_kernel(&os_info_flags, entry->addr, sizeof(os_info_flags));
+               else
+                       rc = memcpy_real(&os_info_flags, entry->addr, sizeof(os_info_flags));
+               if (rc || (__force u32)csum_partial(&os_info_flags, entry->size, 0) != entry->csum)
+                       os_info_flags = 0;
+       }
+out:
+       free_page((unsigned long)os_info);
        return 0;
 }
 
index 8eb089b..c0d620f 100644 (file)
@@ -1111,6 +1111,8 @@ static void io_subchannel_verify(struct subchannel *sch)
        cdev = sch_get_cdev(sch);
        if (cdev)
                dev_fsm_event(cdev, DEV_EVENT_VERIFY);
+       else
+               css_schedule_eval(sch->schid);
 }
 
 static void io_subchannel_terminate_path(struct subchannel *sch, u8 mask)
@@ -1374,6 +1376,7 @@ void ccw_device_set_notoper(struct ccw_device *cdev)
 enum io_sch_action {
        IO_SCH_UNREG,
        IO_SCH_ORPH_UNREG,
+       IO_SCH_UNREG_CDEV,
        IO_SCH_ATTACH,
        IO_SCH_UNREG_ATTACH,
        IO_SCH_ORPH_ATTACH,
@@ -1406,7 +1409,7 @@ static enum io_sch_action sch_get_action(struct subchannel *sch)
        }
        if ((sch->schib.pmcw.pam & sch->opm) == 0) {
                if (ccw_device_notify(cdev, CIO_NO_PATH) != NOTIFY_OK)
-                       return IO_SCH_UNREG;
+                       return IO_SCH_UNREG_CDEV;
                return IO_SCH_DISC;
        }
        if (device_is_disconnected(cdev))
@@ -1468,6 +1471,7 @@ static int io_subchannel_sch_event(struct subchannel *sch, int process)
        case IO_SCH_ORPH_ATTACH:
                ccw_device_set_disconnected(cdev);
                break;
+       case IO_SCH_UNREG_CDEV:
        case IO_SCH_UNREG_ATTACH:
        case IO_SCH_UNREG:
                if (!cdev)
@@ -1501,6 +1505,7 @@ static int io_subchannel_sch_event(struct subchannel *sch, int process)
                if (rc)
                        goto out;
                break;
+       case IO_SCH_UNREG_CDEV:
        case IO_SCH_UNREG_ATTACH:
                spin_lock_irqsave(sch->lock, flags);
                sch_set_cdev(sch, NULL);
index 5ea6249..641f0db 100644 (file)
@@ -95,7 +95,7 @@ static inline int do_sqbs(u64 token, unsigned char state, int queue,
                "       lgr     1,%[token]\n"
                "       .insn   rsy,0xeb000000008a,%[qs],%[ccq],0(%[state])"
                : [ccq] "+&d" (_ccq), [qs] "+&d" (_queuestart)
-               : [state] "d" ((unsigned long)state), [token] "d" (token)
+               : [state] "a" ((unsigned long)state), [token] "d" (token)
                : "memory", "cc", "1");
        *count = _ccq & 0xff;
        *start = _queuestart & 0xff;
index ff538a0..4360181 100644 (file)
@@ -171,7 +171,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
                return -ENODEV;
        }
 
-       parent = kzalloc(sizeof(*parent), GFP_KERNEL);
+       parent = kzalloc(struct_size(parent, mdev_types, 1), GFP_KERNEL);
        if (!parent)
                return -ENOMEM;
 
index b441ae6..b62bbc5 100644 (file)
@@ -79,7 +79,7 @@ struct vfio_ccw_parent {
 
        struct mdev_parent      parent;
        struct mdev_type        mdev_type;
-       struct mdev_type        *mdev_types[1];
+       struct mdev_type        *mdev_types[];
 };
 
 /**
index 5a05d1c..e58bfd2 100644 (file)
@@ -2,7 +2,8 @@
 /*
  *  pkey device driver
  *
- *  Copyright IBM Corp. 2017,2019
+ *  Copyright IBM Corp. 2017, 2023
+ *
  *  Author(s): Harald Freudenberger
  */
 
@@ -32,8 +33,10 @@ MODULE_AUTHOR("IBM Corporation");
 MODULE_DESCRIPTION("s390 protected key interface");
 
 #define KEYBLOBBUFSIZE 8192    /* key buffer size used for internal processing */
+#define MINKEYBLOBBUFSIZE (sizeof(struct keytoken_header))
 #define PROTKEYBLOBBUFSIZE 256 /* protected key buffer size used internally */
 #define MAXAPQNSINLIST 64      /* max 64 apqns within an apqn list */
+#define AES_WK_VP_SIZE 32      /* Size of WK VP block appended to a prot key */
 
 /*
  * debug feature data and functions
@@ -71,49 +74,106 @@ struct protaeskeytoken {
 } __packed;
 
 /* inside view of a clear key token (type 0x00 version 0x02) */
-struct clearaeskeytoken {
-       u8  type;        /* 0x00 for PAES specific key tokens */
+struct clearkeytoken {
+       u8  type;       /* 0x00 for PAES specific key tokens */
        u8  res0[3];
-       u8  version;     /* 0x02 for clear AES key token */
+       u8  version;    /* 0x02 for clear key token */
        u8  res1[3];
-       u32 keytype;     /* key type, one of the PKEY_KEYTYPE values */
-       u32 len;         /* bytes actually stored in clearkey[] */
+       u32 keytype;    /* key type, one of the PKEY_KEYTYPE_* values */
+       u32 len;        /* bytes actually stored in clearkey[] */
        u8  clearkey[]; /* clear key value */
 } __packed;
 
+/* helper function which translates the PKEY_KEYTYPE_AES_* to their keysize */
+static inline u32 pkey_keytype_aes_to_size(u32 keytype)
+{
+       switch (keytype) {
+       case PKEY_KEYTYPE_AES_128:
+               return 16;
+       case PKEY_KEYTYPE_AES_192:
+               return 24;
+       case PKEY_KEYTYPE_AES_256:
+               return 32;
+       default:
+               return 0;
+       }
+}
+
 /*
- * Create a protected key from a clear key value.
+ * Create a protected key from a clear key value via PCKMO instruction.
  */
-static int pkey_clr2protkey(u32 keytype,
-                           const struct pkey_clrkey *clrkey,
-                           struct pkey_protkey *protkey)
+static int pkey_clr2protkey(u32 keytype, const u8 *clrkey,
+                           u8 *protkey, u32 *protkeylen, u32 *protkeytype)
 {
        /* mask of available pckmo subfunctions */
        static cpacf_mask_t pckmo_functions;
 
-       long fc;
+       u8 paramblock[112];
+       u32 pkeytype;
        int keysize;
-       u8 paramblock[64];
+       long fc;
 
        switch (keytype) {
        case PKEY_KEYTYPE_AES_128:
+               /* 16 byte key, 32 byte aes wkvp, total 48 bytes */
                keysize = 16;
+               pkeytype = keytype;
                fc = CPACF_PCKMO_ENC_AES_128_KEY;
                break;
        case PKEY_KEYTYPE_AES_192:
+               /* 24 byte key, 32 byte aes wkvp, total 56 bytes */
                keysize = 24;
+               pkeytype = keytype;
                fc = CPACF_PCKMO_ENC_AES_192_KEY;
                break;
        case PKEY_KEYTYPE_AES_256:
+               /* 32 byte key, 32 byte aes wkvp, total 64 bytes */
                keysize = 32;
+               pkeytype = keytype;
                fc = CPACF_PCKMO_ENC_AES_256_KEY;
                break;
+       case PKEY_KEYTYPE_ECC_P256:
+               /* 32 byte key, 32 byte aes wkvp, total 64 bytes */
+               keysize = 32;
+               pkeytype = PKEY_KEYTYPE_ECC;
+               fc = CPACF_PCKMO_ENC_ECC_P256_KEY;
+               break;
+       case PKEY_KEYTYPE_ECC_P384:
+               /* 48 byte key, 32 byte aes wkvp, total 80 bytes */
+               keysize = 48;
+               pkeytype = PKEY_KEYTYPE_ECC;
+               fc = CPACF_PCKMO_ENC_ECC_P384_KEY;
+               break;
+       case PKEY_KEYTYPE_ECC_P521:
+               /* 80 byte key, 32 byte aes wkvp, total 112 bytes */
+               keysize = 80;
+               pkeytype = PKEY_KEYTYPE_ECC;
+               fc = CPACF_PCKMO_ENC_ECC_P521_KEY;
+               break;
+       case PKEY_KEYTYPE_ECC_ED25519:
+               /* 32 byte key, 32 byte aes wkvp, total 64 bytes */
+               keysize = 32;
+               pkeytype = PKEY_KEYTYPE_ECC;
+               fc = CPACF_PCKMO_ENC_ECC_ED25519_KEY;
+               break;
+       case PKEY_KEYTYPE_ECC_ED448:
+               /* 64 byte key, 32 byte aes wkvp, total 96 bytes */
+               keysize = 64;
+               pkeytype = PKEY_KEYTYPE_ECC;
+               fc = CPACF_PCKMO_ENC_ECC_ED448_KEY;
+               break;
        default:
-               DEBUG_ERR("%s unknown/unsupported keytype %d\n",
+               DEBUG_ERR("%s unknown/unsupported keytype %u\n",
                          __func__, keytype);
                return -EINVAL;
        }
 
+       if (*protkeylen < keysize + AES_WK_VP_SIZE) {
+               DEBUG_ERR("%s prot key buffer size too small: %u < %d\n",
+                         __func__, *protkeylen, keysize + AES_WK_VP_SIZE);
+               return -EINVAL;
+       }
+
        /* Did we already check for PCKMO ? */
        if (!pckmo_functions.bytes[0]) {
                /* no, so check now */
@@ -128,15 +188,15 @@ static int pkey_clr2protkey(u32 keytype,
 
        /* prepare param block */
        memset(paramblock, 0, sizeof(paramblock));
-       memcpy(paramblock, clrkey->clrkey, keysize);
+       memcpy(paramblock, clrkey, keysize);
 
        /* call the pckmo instruction */
        cpacf_pckmo(fc, paramblock);
 
-       /* copy created protected key */
-       protkey->type = keytype;
-       protkey->len = keysize + 32;
-       memcpy(protkey->protkey, paramblock, keysize + 32);
+       /* copy created protected key to key buffer including the wkvp block */
+       *protkeylen = keysize + AES_WK_VP_SIZE;
+       memcpy(protkey, paramblock, *protkeylen);
+       *protkeytype = pkeytype;
 
        return 0;
 }
@@ -144,11 +204,12 @@ static int pkey_clr2protkey(u32 keytype,
 /*
  * Find card and transform secure key into protected key.
  */
-static int pkey_skey2pkey(const u8 *key, struct pkey_protkey *pkey)
+static int pkey_skey2pkey(const u8 *key, u8 *protkey,
+                         u32 *protkeylen, u32 *protkeytype)
 {
-       int rc, verify;
-       u16 cardnr, domain;
        struct keytoken_header *hdr = (struct keytoken_header *)key;
+       u16 cardnr, domain;
+       int rc, verify;
 
        zcrypt_wait_api_operational();
 
@@ -167,14 +228,13 @@ static int pkey_skey2pkey(const u8 *key, struct pkey_protkey *pkey)
                        continue;
                switch (hdr->version) {
                case TOKVER_CCA_AES:
-                       rc = cca_sec2protkey(cardnr, domain,
-                                            key, pkey->protkey,
-                                            &pkey->len, &pkey->type);
+                       rc = cca_sec2protkey(cardnr, domain, key,
+                                            protkey, protkeylen, protkeytype);
                        break;
                case TOKVER_CCA_VLSC:
-                       rc = cca_cipher2protkey(cardnr, domain,
-                                               key, pkey->protkey,
-                                               &pkey->len, &pkey->type);
+                       rc = cca_cipher2protkey(cardnr, domain, key,
+                                               protkey, protkeylen,
+                                               protkeytype);
                        break;
                default:
                        return -EINVAL;
@@ -195,9 +255,9 @@ static int pkey_skey2pkey(const u8 *key, struct pkey_protkey *pkey)
 static int pkey_clr2ep11key(const u8 *clrkey, size_t clrkeylen,
                            u8 *keybuf, size_t *keybuflen)
 {
-       int i, rc;
-       u16 card, dom;
        u32 nr_apqns, *apqns = NULL;
+       u16 card, dom;
+       int i, rc;
 
        zcrypt_wait_api_operational();
 
@@ -227,12 +287,13 @@ out:
 /*
  * Find card and transform EP11 secure key into protected key.
  */
-static int pkey_ep11key2pkey(const u8 *key, struct pkey_protkey *pkey)
+static int pkey_ep11key2pkey(const u8 *key, u8 *protkey,
+                            u32 *protkeylen, u32 *protkeytype)
 {
-       int i, rc;
-       u16 card, dom;
-       u32 nr_apqns, *apqns = NULL;
        struct ep11keyblob *kb = (struct ep11keyblob *)key;
+       u32 nr_apqns, *apqns = NULL;
+       u16 card, dom;
+       int i, rc;
 
        zcrypt_wait_api_operational();
 
@@ -246,9 +307,8 @@ static int pkey_ep11key2pkey(const u8 *key, struct pkey_protkey *pkey)
        for (rc = -ENODEV, i = 0; i < nr_apqns; i++) {
                card = apqns[i] >> 16;
                dom = apqns[i] & 0xFFFF;
-               pkey->len = sizeof(pkey->protkey);
                rc = ep11_kblob2protkey(card, dom, key, kb->head.len,
-                                       pkey->protkey, &pkey->len, &pkey->type);
+                                       protkey, protkeylen, protkeytype);
                if (rc == 0)
                        break;
        }
@@ -306,38 +366,31 @@ out:
 /*
  * Generate a random protected key
  */
-static int pkey_genprotkey(u32 keytype, struct pkey_protkey *protkey)
+static int pkey_genprotkey(u32 keytype, u8 *protkey,
+                          u32 *protkeylen, u32 *protkeytype)
 {
-       struct pkey_clrkey clrkey;
+       u8 clrkey[32];
        int keysize;
        int rc;
 
-       switch (keytype) {
-       case PKEY_KEYTYPE_AES_128:
-               keysize = 16;
-               break;
-       case PKEY_KEYTYPE_AES_192:
-               keysize = 24;
-               break;
-       case PKEY_KEYTYPE_AES_256:
-               keysize = 32;
-               break;
-       default:
+       keysize = pkey_keytype_aes_to_size(keytype);
+       if (!keysize) {
                DEBUG_ERR("%s unknown/unsupported keytype %d\n", __func__,
                          keytype);
                return -EINVAL;
        }
 
        /* generate a dummy random clear key */
-       get_random_bytes(clrkey.clrkey, keysize);
+       get_random_bytes(clrkey, keysize);
 
        /* convert it to a dummy protected key */
-       rc = pkey_clr2protkey(keytype, &clrkey, protkey);
+       rc = pkey_clr2protkey(keytype, clrkey,
+                             protkey, protkeylen, protkeytype);
        if (rc)
                return rc;
 
        /* replace the key part of the protected key with random bytes */
-       get_random_bytes(protkey->protkey, keysize);
+       get_random_bytes(protkey, keysize);
 
        return 0;
 }
@@ -345,37 +398,46 @@ static int pkey_genprotkey(u32 keytype, struct pkey_protkey *protkey)
 /*
  * Verify if a protected key is still valid
  */
-static int pkey_verifyprotkey(const struct pkey_protkey *protkey)
+static int pkey_verifyprotkey(const u8 *protkey, u32 protkeylen,
+                             u32 protkeytype)
 {
-       unsigned long fc;
        struct {
                u8 iv[AES_BLOCK_SIZE];
                u8 key[MAXPROTKEYSIZE];
        } param;
        u8 null_msg[AES_BLOCK_SIZE];
        u8 dest_buf[AES_BLOCK_SIZE];
-       unsigned int k;
+       unsigned int k, pkeylen;
+       unsigned long fc;
 
-       switch (protkey->type) {
+       switch (protkeytype) {
        case PKEY_KEYTYPE_AES_128:
+               pkeylen = 16 + AES_WK_VP_SIZE;
                fc = CPACF_KMC_PAES_128;
                break;
        case PKEY_KEYTYPE_AES_192:
+               pkeylen = 24 + AES_WK_VP_SIZE;
                fc = CPACF_KMC_PAES_192;
                break;
        case PKEY_KEYTYPE_AES_256:
+               pkeylen = 32 + AES_WK_VP_SIZE;
                fc = CPACF_KMC_PAES_256;
                break;
        default:
-               DEBUG_ERR("%s unknown/unsupported keytype %d\n", __func__,
-                         protkey->type);
+               DEBUG_ERR("%s unknown/unsupported keytype %u\n", __func__,
+                         protkeytype);
+               return -EINVAL;
+       }
+       if (protkeylen != pkeylen) {
+               DEBUG_ERR("%s invalid protected key size %u for keytype %u\n",
+                         __func__, protkeylen, protkeytype);
                return -EINVAL;
        }
 
        memset(null_msg, 0, sizeof(null_msg));
 
        memset(param.iv, 0, sizeof(param.iv));
-       memcpy(param.key, protkey->protkey, sizeof(param.key));
+       memcpy(param.key, protkey, protkeylen);
 
        k = cpacf_kmc(fc | CPACF_ENCRYPT, &param, null_msg, dest_buf,
                      sizeof(null_msg));
@@ -387,15 +449,119 @@ static int pkey_verifyprotkey(const struct pkey_protkey *protkey)
        return 0;
 }
 
+/* Helper for pkey_nonccatok2pkey, handles aes clear key token */
+static int nonccatokaes2pkey(const struct clearkeytoken *t,
+                            u8 *protkey, u32 *protkeylen, u32 *protkeytype)
+{
+       size_t tmpbuflen = max_t(size_t, SECKEYBLOBSIZE, MAXEP11AESKEYBLOBSIZE);
+       u8 *tmpbuf = NULL;
+       u32 keysize;
+       int rc;
+
+       keysize = pkey_keytype_aes_to_size(t->keytype);
+       if (!keysize) {
+               DEBUG_ERR("%s unknown/unsupported keytype %u\n",
+                         __func__, t->keytype);
+               return -EINVAL;
+       }
+       if (t->len != keysize) {
+               DEBUG_ERR("%s non clear key aes token: invalid key len %u\n",
+                         __func__, t->len);
+               return -EINVAL;
+       }
+
+       /* try direct way with the PCKMO instruction */
+       rc = pkey_clr2protkey(t->keytype, t->clearkey,
+                             protkey, protkeylen, protkeytype);
+       if (!rc)
+               goto out;
+
+       /* PCKMO failed, so try the CCA secure key way */
+       tmpbuf = kmalloc(tmpbuflen, GFP_ATOMIC);
+       if (!tmpbuf)
+               return -ENOMEM;
+       zcrypt_wait_api_operational();
+       rc = cca_clr2seckey(0xFFFF, 0xFFFF, t->keytype, t->clearkey, tmpbuf);
+       if (rc)
+               goto try_via_ep11;
+       rc = pkey_skey2pkey(tmpbuf,
+                           protkey, protkeylen, protkeytype);
+       if (!rc)
+               goto out;
+
+try_via_ep11:
+       /* if the CCA way also failed, let's try via EP11 */
+       rc = pkey_clr2ep11key(t->clearkey, t->len,
+                             tmpbuf, &tmpbuflen);
+       if (rc)
+               goto failure;
+       rc = pkey_ep11key2pkey(tmpbuf,
+                              protkey, protkeylen, protkeytype);
+       if (!rc)
+               goto out;
+
+failure:
+       DEBUG_ERR("%s unable to build protected key from clear", __func__);
+
+out:
+       kfree(tmpbuf);
+       return rc;
+}
+
+/* Helper for pkey_nonccatok2pkey, handles ecc clear key token */
+static int nonccatokecc2pkey(const struct clearkeytoken *t,
+                            u8 *protkey, u32 *protkeylen, u32 *protkeytype)
+{
+       u32 keylen;
+       int rc;
+
+       switch (t->keytype) {
+       case PKEY_KEYTYPE_ECC_P256:
+               keylen = 32;
+               break;
+       case PKEY_KEYTYPE_ECC_P384:
+               keylen = 48;
+               break;
+       case PKEY_KEYTYPE_ECC_P521:
+               keylen = 80;
+               break;
+       case PKEY_KEYTYPE_ECC_ED25519:
+               keylen = 32;
+               break;
+       case PKEY_KEYTYPE_ECC_ED448:
+               keylen = 64;
+               break;
+       default:
+               DEBUG_ERR("%s unknown/unsupported keytype %u\n",
+                         __func__, t->keytype);
+               return -EINVAL;
+       }
+
+       if (t->len != keylen) {
+               DEBUG_ERR("%s non clear key ecc token: invalid key len %u\n",
+                         __func__, t->len);
+               return -EINVAL;
+       }
+
+       /* only one path possible: via PCKMO instruction */
+       rc = pkey_clr2protkey(t->keytype, t->clearkey,
+                             protkey, protkeylen, protkeytype);
+       if (rc) {
+               DEBUG_ERR("%s unable to build protected key from clear",
+                         __func__);
+       }
+
+       return rc;
+}
+
 /*
  * Transform a non-CCA key token into a protected key
  */
 static int pkey_nonccatok2pkey(const u8 *key, u32 keylen,
-                              struct pkey_protkey *protkey)
+                              u8 *protkey, u32 *protkeylen, u32 *protkeytype)
 {
-       int rc = -EINVAL;
-       u8 *tmpbuf = NULL;
        struct keytoken_header *hdr = (struct keytoken_header *)key;
+       int rc = -EINVAL;
 
        switch (hdr->version) {
        case TOKVER_PROTECTED_KEY: {
@@ -404,59 +570,40 @@ static int pkey_nonccatok2pkey(const u8 *key, u32 keylen,
                if (keylen != sizeof(struct protaeskeytoken))
                        goto out;
                t = (struct protaeskeytoken *)key;
-               protkey->len = t->len;
-               protkey->type = t->keytype;
-               memcpy(protkey->protkey, t->protkey,
-                      sizeof(protkey->protkey));
-               rc = pkey_verifyprotkey(protkey);
+               rc = pkey_verifyprotkey(t->protkey, t->len, t->keytype);
+               if (rc)
+                       goto out;
+               memcpy(protkey, t->protkey, t->len);
+               *protkeylen = t->len;
+               *protkeytype = t->keytype;
                break;
        }
        case TOKVER_CLEAR_KEY: {
-               struct clearaeskeytoken *t;
-               struct pkey_clrkey ckey;
-               union u_tmpbuf {
-                       u8 skey[SECKEYBLOBSIZE];
-                       u8 ep11key[MAXEP11AESKEYBLOBSIZE];
-               };
-               size_t tmpbuflen = sizeof(union u_tmpbuf);
-
-               if (keylen < sizeof(struct clearaeskeytoken))
-                       goto out;
-               t = (struct clearaeskeytoken *)key;
-               if (keylen != sizeof(*t) + t->len)
-                       goto out;
-               if ((t->keytype == PKEY_KEYTYPE_AES_128 && t->len == 16) ||
-                   (t->keytype == PKEY_KEYTYPE_AES_192 && t->len == 24) ||
-                   (t->keytype == PKEY_KEYTYPE_AES_256 && t->len == 32))
-                       memcpy(ckey.clrkey, t->clearkey, t->len);
-               else
-                       goto out;
-               /* alloc temp key buffer space */
-               tmpbuf = kmalloc(tmpbuflen, GFP_ATOMIC);
-               if (!tmpbuf) {
-                       rc = -ENOMEM;
+               struct clearkeytoken *t = (struct clearkeytoken *)key;
+
+               if (keylen < sizeof(struct clearkeytoken) ||
+                   keylen != sizeof(*t) + t->len)
                        goto out;
-               }
-               /* try direct way with the PCKMO instruction */
-               rc = pkey_clr2protkey(t->keytype, &ckey, protkey);
-               if (rc == 0)
+               switch (t->keytype) {
+               case PKEY_KEYTYPE_AES_128:
+               case PKEY_KEYTYPE_AES_192:
+               case PKEY_KEYTYPE_AES_256:
+                       rc = nonccatokaes2pkey(t, protkey,
+                                              protkeylen, protkeytype);
                        break;
-               /* PCKMO failed, so try the CCA secure key way */
-               zcrypt_wait_api_operational();
-               rc = cca_clr2seckey(0xFFFF, 0xFFFF, t->keytype,
-                                   ckey.clrkey, tmpbuf);
-               if (rc == 0)
-                       rc = pkey_skey2pkey(tmpbuf, protkey);
-               if (rc == 0)
+               case PKEY_KEYTYPE_ECC_P256:
+               case PKEY_KEYTYPE_ECC_P384:
+               case PKEY_KEYTYPE_ECC_P521:
+               case PKEY_KEYTYPE_ECC_ED25519:
+               case PKEY_KEYTYPE_ECC_ED448:
+                       rc = nonccatokecc2pkey(t, protkey,
+                                              protkeylen, protkeytype);
                        break;
-               /* if the CCA way also failed, let's try via EP11 */
-               rc = pkey_clr2ep11key(ckey.clrkey, t->len,
-                                     tmpbuf, &tmpbuflen);
-               if (rc == 0)
-                       rc = pkey_ep11key2pkey(tmpbuf, protkey);
-               /* now we should really have an protected key */
-               DEBUG_ERR("%s unable to build protected key from clear",
-                         __func__);
+               default:
+                       DEBUG_ERR("%s unknown/unsupported non cca clear key type %u\n",
+                                 __func__, t->keytype);
+                       return -EINVAL;
+               }
                break;
        }
        case TOKVER_EP11_AES: {
@@ -464,7 +611,8 @@ static int pkey_nonccatok2pkey(const u8 *key, u32 keylen,
                rc = ep11_check_aes_key(debug_info, 3, key, keylen, 1);
                if (rc)
                        goto out;
-               rc = pkey_ep11key2pkey(key, protkey);
+               rc = pkey_ep11key2pkey(key,
+                                      protkey, protkeylen, protkeytype);
                break;
        }
        case TOKVER_EP11_AES_WITH_HEADER:
@@ -473,16 +621,14 @@ static int pkey_nonccatok2pkey(const u8 *key, u32 keylen,
                if (rc)
                        goto out;
                rc = pkey_ep11key2pkey(key + sizeof(struct ep11kblob_header),
-                                      protkey);
+                                      protkey, protkeylen, protkeytype);
                break;
        default:
                DEBUG_ERR("%s unknown/unsupported non-CCA token version %d\n",
                          __func__, hdr->version);
-               rc = -EINVAL;
        }
 
 out:
-       kfree(tmpbuf);
        return rc;
 }
 
@@ -490,7 +636,7 @@ out:
  * Transform a CCA internal key token into a protected key
  */
 static int pkey_ccainttok2pkey(const u8 *key, u32 keylen,
-                              struct pkey_protkey *protkey)
+                              u8 *protkey, u32 *protkeylen, u32 *protkeytype)
 {
        struct keytoken_header *hdr = (struct keytoken_header *)key;
 
@@ -509,17 +655,17 @@ static int pkey_ccainttok2pkey(const u8 *key, u32 keylen,
                return -EINVAL;
        }
 
-       return pkey_skey2pkey(key, protkey);
+       return pkey_skey2pkey(key, protkey, protkeylen, protkeytype);
 }
 
 /*
  * Transform a key blob (of any type) into a protected key
  */
 int pkey_keyblob2pkey(const u8 *key, u32 keylen,
-                     struct pkey_protkey *protkey)
+                     u8 *protkey, u32 *protkeylen, u32 *protkeytype)
 {
-       int rc;
        struct keytoken_header *hdr = (struct keytoken_header *)key;
+       int rc;
 
        if (keylen < sizeof(struct keytoken_header)) {
                DEBUG_ERR("%s invalid keylen %d\n", __func__, keylen);
@@ -528,10 +674,12 @@ int pkey_keyblob2pkey(const u8 *key, u32 keylen,
 
        switch (hdr->type) {
        case TOKTYPE_NON_CCA:
-               rc = pkey_nonccatok2pkey(key, keylen, protkey);
+               rc = pkey_nonccatok2pkey(key, keylen,
+                                        protkey, protkeylen, protkeytype);
                break;
        case TOKTYPE_CCA_INTERNAL:
-               rc = pkey_ccainttok2pkey(key, keylen, protkey);
+               rc = pkey_ccainttok2pkey(key, keylen,
+                                        protkey, protkeylen, protkeytype);
                break;
        default:
                DEBUG_ERR("%s unknown/unsupported blob type %d\n",
@@ -663,9 +811,9 @@ static int pkey_verifykey2(const u8 *key, size_t keylen,
                           enum pkey_key_type *ktype,
                           enum pkey_key_size *ksize, u32 *flags)
 {
-       int rc;
-       u32 _nr_apqns, *_apqns = NULL;
        struct keytoken_header *hdr = (struct keytoken_header *)key;
+       u32 _nr_apqns, *_apqns = NULL;
+       int rc;
 
        if (keylen < sizeof(struct keytoken_header))
                return -EINVAL;
@@ -771,10 +919,10 @@ out:
 
 static int pkey_keyblob2pkey2(const struct pkey_apqn *apqns, size_t nr_apqns,
                              const u8 *key, size_t keylen,
-                             struct pkey_protkey *pkey)
+                             u8 *protkey, u32 *protkeylen, u32 *protkeytype)
 {
-       int i, card, dom, rc;
        struct keytoken_header *hdr = (struct keytoken_header *)key;
+       int i, card, dom, rc;
 
        /* check for at least one apqn given */
        if (!apqns || !nr_apqns)
@@ -806,7 +954,9 @@ static int pkey_keyblob2pkey2(const struct pkey_apqn *apqns, size_t nr_apqns,
                        if (ep11_check_aes_key(debug_info, 3, key, keylen, 1))
                                return -EINVAL;
                } else {
-                       return pkey_nonccatok2pkey(key, keylen, pkey);
+                       return pkey_nonccatok2pkey(key, keylen,
+                                                  protkey, protkeylen,
+                                                  protkeytype);
                }
        } else {
                DEBUG_ERR("%s unknown/unsupported blob type %d\n",
@@ -822,20 +972,20 @@ static int pkey_keyblob2pkey2(const struct pkey_apqn *apqns, size_t nr_apqns,
                dom = apqns[i].domain;
                if (hdr->type == TOKTYPE_CCA_INTERNAL &&
                    hdr->version == TOKVER_CCA_AES) {
-                       rc = cca_sec2protkey(card, dom, key, pkey->protkey,
-                                            &pkey->len, &pkey->type);
+                       rc = cca_sec2protkey(card, dom, key,
+                                            protkey, protkeylen, protkeytype);
                } else if (hdr->type == TOKTYPE_CCA_INTERNAL &&
                           hdr->version == TOKVER_CCA_VLSC) {
-                       rc = cca_cipher2protkey(card, dom, key, pkey->protkey,
-                                               &pkey->len, &pkey->type);
+                       rc = cca_cipher2protkey(card, dom, key,
+                                               protkey, protkeylen,
+                                               protkeytype);
                } else {
                        /* EP11 AES secure key blob */
                        struct ep11keyblob *kb = (struct ep11keyblob *)key;
 
-                       pkey->len = sizeof(pkey->protkey);
                        rc = ep11_kblob2protkey(card, dom, key, kb->head.len,
-                                               pkey->protkey, &pkey->len,
-                                               &pkey->type);
+                                               protkey, protkeylen,
+                                               protkeytype);
                }
                if (rc == 0)
                        break;
@@ -847,9 +997,9 @@ static int pkey_keyblob2pkey2(const struct pkey_apqn *apqns, size_t nr_apqns,
 static int pkey_apqns4key(const u8 *key, size_t keylen, u32 flags,
                          struct pkey_apqn *apqns, size_t *nr_apqns)
 {
-       int rc;
-       u32 _nr_apqns, *_apqns = NULL;
        struct keytoken_header *hdr = (struct keytoken_header *)key;
+       u32 _nr_apqns, *_apqns = NULL;
+       int rc;
 
        if (keylen < sizeof(struct keytoken_header) || flags == 0)
                return -EINVAL;
@@ -860,9 +1010,9 @@ static int pkey_apqns4key(const u8 *key, size_t keylen, u32 flags,
            (hdr->version == TOKVER_EP11_AES_WITH_HEADER ||
             hdr->version == TOKVER_EP11_ECC_WITH_HEADER) &&
            is_ep11_keyblob(key + sizeof(struct ep11kblob_header))) {
-               int minhwtype = 0, api = 0;
                struct ep11keyblob *kb = (struct ep11keyblob *)
                        (key + sizeof(struct ep11kblob_header));
+               int minhwtype = 0, api = 0;
 
                if (flags != PKEY_FLAGS_MATCH_CUR_MKVP)
                        return -EINVAL;
@@ -877,8 +1027,8 @@ static int pkey_apqns4key(const u8 *key, size_t keylen, u32 flags,
        } else if (hdr->type == TOKTYPE_NON_CCA &&
                   hdr->version == TOKVER_EP11_AES &&
                   is_ep11_keyblob(key)) {
-               int minhwtype = 0, api = 0;
                struct ep11keyblob *kb = (struct ep11keyblob *)key;
+               int minhwtype = 0, api = 0;
 
                if (flags != PKEY_FLAGS_MATCH_CUR_MKVP)
                        return -EINVAL;
@@ -891,8 +1041,8 @@ static int pkey_apqns4key(const u8 *key, size_t keylen, u32 flags,
                if (rc)
                        goto out;
        } else if (hdr->type == TOKTYPE_CCA_INTERNAL) {
-               int minhwtype = ZCRYPT_CEX3C;
                u64 cur_mkvp = 0, old_mkvp = 0;
+               int minhwtype = ZCRYPT_CEX3C;
 
                if (hdr->version == TOKVER_CCA_AES) {
                        struct secaeskeytoken *t = (struct secaeskeytoken *)key;
@@ -919,8 +1069,8 @@ static int pkey_apqns4key(const u8 *key, size_t keylen, u32 flags,
                if (rc)
                        goto out;
        } else if (hdr->type == TOKTYPE_CCA_INTERNAL_PKA) {
-               u64 cur_mkvp = 0, old_mkvp = 0;
                struct eccprivkeytoken *t = (struct eccprivkeytoken *)key;
+               u64 cur_mkvp = 0, old_mkvp = 0;
 
                if (t->secid == 0x20) {
                        if (flags & PKEY_FLAGS_MATCH_CUR_MKVP)
@@ -957,8 +1107,8 @@ static int pkey_apqns4keytype(enum pkey_key_type ktype,
                              u8 cur_mkvp[32], u8 alt_mkvp[32], u32 flags,
                              struct pkey_apqn *apqns, size_t *nr_apqns)
 {
-       int rc;
        u32 _nr_apqns, *_apqns = NULL;
+       int rc;
 
        zcrypt_wait_api_operational();
 
@@ -1020,11 +1170,11 @@ out:
 }
 
 static int pkey_keyblob2pkey3(const struct pkey_apqn *apqns, size_t nr_apqns,
-                             const u8 *key, size_t keylen, u32 *protkeytype,
-                             u8 *protkey, u32 *protkeylen)
+                             const u8 *key, size_t keylen,
+                             u8 *protkey, u32 *protkeylen, u32 *protkeytype)
 {
-       int i, card, dom, rc;
        struct keytoken_header *hdr = (struct keytoken_header *)key;
+       int i, card, dom, rc;
 
        /* check for at least one apqn given */
        if (!apqns || !nr_apqns)
@@ -1076,15 +1226,8 @@ static int pkey_keyblob2pkey3(const struct pkey_apqn *apqns, size_t nr_apqns,
                if (cca_check_sececckeytoken(debug_info, 3, key, keylen, 1))
                        return -EINVAL;
        } else if (hdr->type == TOKTYPE_NON_CCA) {
-               struct pkey_protkey pkey;
-
-               rc = pkey_nonccatok2pkey(key, keylen, &pkey);
-               if (rc)
-                       return rc;
-               memcpy(protkey, pkey.protkey, pkey.len);
-               *protkeylen = pkey.len;
-               *protkeytype = pkey.type;
-               return 0;
+               return pkey_nonccatok2pkey(key, keylen,
+                                          protkey, protkeylen, protkeytype);
        } else {
                DEBUG_ERR("%s unknown/unsupported blob type %d\n",
                          __func__, hdr->type);
@@ -1130,7 +1273,7 @@ static int pkey_keyblob2pkey3(const struct pkey_apqn *apqns, size_t nr_apqns,
 
 static void *_copy_key_from_user(void __user *ukey, size_t keylen)
 {
-       if (!ukey || keylen < MINKEYBLOBSIZE || keylen > KEYBLOBBUFSIZE)
+       if (!ukey || keylen < MINKEYBLOBBUFSIZE || keylen > KEYBLOBBUFSIZE)
                return ERR_PTR(-EINVAL);
 
        return memdup_user(ukey, keylen);
@@ -1187,6 +1330,7 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd,
 
                if (copy_from_user(&ksp, usp, sizeof(ksp)))
                        return -EFAULT;
+               ksp.protkey.len = sizeof(ksp.protkey.protkey);
                rc = cca_sec2protkey(ksp.cardnr, ksp.domain,
                                     ksp.seckey.seckey, ksp.protkey.protkey,
                                     &ksp.protkey.len, &ksp.protkey.type);
@@ -1203,8 +1347,10 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd,
 
                if (copy_from_user(&kcp, ucp, sizeof(kcp)))
                        return -EFAULT;
-               rc = pkey_clr2protkey(kcp.keytype,
-                                     &kcp.clrkey, &kcp.protkey);
+               kcp.protkey.len = sizeof(kcp.protkey.protkey);
+               rc = pkey_clr2protkey(kcp.keytype, kcp.clrkey.clrkey,
+                                     kcp.protkey.protkey,
+                                     &kcp.protkey.len, &kcp.protkey.type);
                DEBUG_DBG("%s pkey_clr2protkey()=%d\n", __func__, rc);
                if (rc)
                        break;
@@ -1234,7 +1380,9 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd,
 
                if (copy_from_user(&ksp, usp, sizeof(ksp)))
                        return -EFAULT;
-               rc = pkey_skey2pkey(ksp.seckey.seckey, &ksp.protkey);
+               ksp.protkey.len = sizeof(ksp.protkey.protkey);
+               rc = pkey_skey2pkey(ksp.seckey.seckey, ksp.protkey.protkey,
+                                   &ksp.protkey.len, &ksp.protkey.type);
                DEBUG_DBG("%s pkey_skey2pkey()=%d\n", __func__, rc);
                if (rc)
                        break;
@@ -1263,7 +1411,9 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd,
 
                if (copy_from_user(&kgp, ugp, sizeof(kgp)))
                        return -EFAULT;
-               rc = pkey_genprotkey(kgp.keytype, &kgp.protkey);
+               kgp.protkey.len = sizeof(kgp.protkey.protkey);
+               rc = pkey_genprotkey(kgp.keytype, kgp.protkey.protkey,
+                                    &kgp.protkey.len, &kgp.protkey.type);
                DEBUG_DBG("%s pkey_genprotkey()=%d\n", __func__, rc);
                if (rc)
                        break;
@@ -1277,7 +1427,8 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd,
 
                if (copy_from_user(&kvp, uvp, sizeof(kvp)))
                        return -EFAULT;
-               rc = pkey_verifyprotkey(&kvp.protkey);
+               rc = pkey_verifyprotkey(kvp.protkey.protkey,
+                                       kvp.protkey.len, kvp.protkey.type);
                DEBUG_DBG("%s pkey_verifyprotkey()=%d\n", __func__, rc);
                break;
        }
@@ -1291,8 +1442,11 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd,
                kkey = _copy_key_from_user(ktp.key, ktp.keylen);
                if (IS_ERR(kkey))
                        return PTR_ERR(kkey);
-               rc = pkey_keyblob2pkey(kkey, ktp.keylen, &ktp.protkey);
+               ktp.protkey.len = sizeof(ktp.protkey.protkey);
+               rc = pkey_keyblob2pkey(kkey, ktp.keylen, ktp.protkey.protkey,
+                                      &ktp.protkey.len, &ktp.protkey.type);
                DEBUG_DBG("%s pkey_keyblob2pkey()=%d\n", __func__, rc);
+               memzero_explicit(kkey, ktp.keylen);
                kfree(kkey);
                if (rc)
                        break;
@@ -1302,9 +1456,9 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd,
        }
        case PKEY_GENSECK2: {
                struct pkey_genseck2 __user *ugs = (void __user *)arg;
+               size_t klen = KEYBLOBBUFSIZE;
                struct pkey_genseck2 kgs;
                struct pkey_apqn *apqns;
-               size_t klen = KEYBLOBBUFSIZE;
                u8 *kkey;
 
                if (copy_from_user(&kgs, ugs, sizeof(kgs)))
@@ -1344,9 +1498,9 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd,
        }
        case PKEY_CLR2SECK2: {
                struct pkey_clr2seck2 __user *ucs = (void __user *)arg;
+               size_t klen = KEYBLOBBUFSIZE;
                struct pkey_clr2seck2 kcs;
                struct pkey_apqn *apqns;
-               size_t klen = KEYBLOBBUFSIZE;
                u8 *kkey;
 
                if (copy_from_user(&kcs, ucs, sizeof(kcs)))
@@ -1408,8 +1562,8 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd,
        }
        case PKEY_KBLOB2PROTK2: {
                struct pkey_kblob2pkey2 __user *utp = (void __user *)arg;
-               struct pkey_kblob2pkey2 ktp;
                struct pkey_apqn *apqns = NULL;
+               struct pkey_kblob2pkey2 ktp;
                u8 *kkey;
 
                if (copy_from_user(&ktp, utp, sizeof(ktp)))
@@ -1422,10 +1576,14 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd,
                        kfree(apqns);
                        return PTR_ERR(kkey);
                }
+               ktp.protkey.len = sizeof(ktp.protkey.protkey);
                rc = pkey_keyblob2pkey2(apqns, ktp.apqn_entries,
-                                       kkey, ktp.keylen, &ktp.protkey);
+                                       kkey, ktp.keylen,
+                                       ktp.protkey.protkey, &ktp.protkey.len,
+                                       &ktp.protkey.type);
                DEBUG_DBG("%s pkey_keyblob2pkey2()=%d\n", __func__, rc);
                kfree(apqns);
+               memzero_explicit(kkey, ktp.keylen);
                kfree(kkey);
                if (rc)
                        break;
@@ -1435,8 +1593,8 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd,
        }
        case PKEY_APQNS4K: {
                struct pkey_apqns4key __user *uak = (void __user *)arg;
-               struct pkey_apqns4key kak;
                struct pkey_apqn *apqns = NULL;
+               struct pkey_apqns4key kak;
                size_t nr_apqns, len;
                u8 *kkey;
 
@@ -1484,8 +1642,8 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd,
        }
        case PKEY_APQNS4KT: {
                struct pkey_apqns4keytype __user *uat = (void __user *)arg;
-               struct pkey_apqns4keytype kat;
                struct pkey_apqn *apqns = NULL;
+               struct pkey_apqns4keytype kat;
                size_t nr_apqns, len;
 
                if (copy_from_user(&kat, uat, sizeof(kat)))
@@ -1526,9 +1684,9 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd,
        }
        case PKEY_KBLOB2PROTK3: {
                struct pkey_kblob2pkey3 __user *utp = (void __user *)arg;
-               struct pkey_kblob2pkey3 ktp;
-               struct pkey_apqn *apqns = NULL;
                u32 protkeylen = PROTKEYBLOBBUFSIZE;
+               struct pkey_apqn *apqns = NULL;
+               struct pkey_kblob2pkey3 ktp;
                u8 *kkey, *protkey;
 
                if (copy_from_user(&ktp, utp, sizeof(ktp)))
@@ -1547,11 +1705,12 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd,
                        kfree(kkey);
                        return -ENOMEM;
                }
-               rc = pkey_keyblob2pkey3(apqns, ktp.apqn_entries, kkey,
-                                       ktp.keylen, &ktp.pkeytype,
-                                       protkey, &protkeylen);
+               rc = pkey_keyblob2pkey3(apqns, ktp.apqn_entries,
+                                       kkey, ktp.keylen,
+                                       protkey, &protkeylen, &ktp.pkeytype);
                DEBUG_DBG("%s pkey_keyblob2pkey3()=%d\n", __func__, rc);
                kfree(apqns);
+               memzero_explicit(kkey, ktp.keylen);
                kfree(kkey);
                if (rc) {
                        kfree(protkey);
@@ -1609,7 +1768,9 @@ static ssize_t pkey_protkey_aes_attr_read(u32 keytype, bool is_xts, char *buf,
        protkeytoken.version = TOKVER_PROTECTED_KEY;
        protkeytoken.keytype = keytype;
 
-       rc = pkey_genprotkey(protkeytoken.keytype, &protkey);
+       protkey.len = sizeof(protkey.protkey);
+       rc = pkey_genprotkey(protkeytoken.keytype,
+                            protkey.protkey, &protkey.len, &protkey.type);
        if (rc)
                return rc;
 
@@ -1619,7 +1780,10 @@ static ssize_t pkey_protkey_aes_attr_read(u32 keytype, bool is_xts, char *buf,
        memcpy(buf, &protkeytoken, sizeof(protkeytoken));
 
        if (is_xts) {
-               rc = pkey_genprotkey(protkeytoken.keytype, &protkey);
+               /* xts needs a second protected key, reuse protkey struct */
+               protkey.len = sizeof(protkey.protkey);
+               rc = pkey_genprotkey(protkeytoken.keytype,
+                                    protkey.protkey, &protkey.len, &protkey.type);
                if (rc)
                        return rc;
 
@@ -1714,8 +1878,8 @@ static struct attribute_group protkey_attr_group = {
 static ssize_t pkey_ccadata_aes_attr_read(u32 keytype, bool is_xts, char *buf,
                                          loff_t off, size_t count)
 {
-       int rc;
        struct pkey_seckey *seckey = (struct pkey_seckey *)buf;
+       int rc;
 
        if (off != 0 || count < sizeof(struct secaeskeytoken))
                return -EINVAL;
@@ -1821,9 +1985,9 @@ static ssize_t pkey_ccacipher_aes_attr_read(enum pkey_key_size keybits,
                                            bool is_xts, char *buf, loff_t off,
                                            size_t count)
 {
-       int i, rc, card, dom;
-       u32 nr_apqns, *apqns = NULL;
        size_t keysize = CCACIPHERTOKENSIZE;
+       u32 nr_apqns, *apqns = NULL;
+       int i, rc, card, dom;
 
        if (off != 0 || count < CCACIPHERTOKENSIZE)
                return -EINVAL;
@@ -1944,9 +2108,9 @@ static ssize_t pkey_ep11_aes_attr_read(enum pkey_key_size keybits,
                                       bool is_xts, char *buf, loff_t off,
                                       size_t count)
 {
-       int i, rc, card, dom;
-       u32 nr_apqns, *apqns = NULL;
        size_t keysize = MAXEP11AESKEYBLOBSIZE;
+       u32 nr_apqns, *apqns = NULL;
+       int i, rc, card, dom;
 
        if (off != 0 || count < MAXEP11AESKEYBLOBSIZE)
                return -EINVAL;
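
Across the pkey_api.c hunks above, the conversion helpers drop the struct pkey_protkey argument in favour of a raw destination buffer plus protkeylen/protkeytype pointers: the caller pre-loads *protkeylen with the buffer size, gets the actual length and key type written back, and copied-in key blobs are scrubbed with memzero_explicit() before being freed. A minimal sketch of that calling contract, using a hypothetical in-kernel wrapper (only pkey_genprotkey()'s new parameter order is taken from these hunks; the wrapper itself and the PKEY_KEYTYPE_AES_256 constant from the existing pkey UAPI are illustrative):

/* Hypothetical caller sketch: *len carries the buffer size in and the
 * generated protected-key length out, *type receives the key type.
 */
static int example_gen_protkey(u8 *buf, u32 bufsize, u32 *type)
{
        u32 len = bufsize;      /* tell the helper how much room it has */
        int rc;

        rc = pkey_genprotkey(PKEY_KEYTYPE_AES_256, buf, &len, type);
        if (rc)
                return rc;

        return len;             /* number of protected-key bytes written */
}
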
index cfbcb86..a8f58e1 100644 (file)
@@ -716,6 +716,7 @@ static int vfio_ap_mdev_probe(struct mdev_device *mdev)
        ret = vfio_register_emulated_iommu_dev(&matrix_mdev->vdev);
        if (ret)
                goto err_put_vdev;
+       matrix_mdev->req_trigger = NULL;
        dev_set_drvdata(&mdev->dev, matrix_mdev);
        mutex_lock(&matrix_dev->mdevs_lock);
        list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
@@ -1735,6 +1736,26 @@ static void vfio_ap_mdev_close_device(struct vfio_device *vdev)
        vfio_ap_mdev_unset_kvm(matrix_mdev);
 }
 
+static void vfio_ap_mdev_request(struct vfio_device *vdev, unsigned int count)
+{
+       struct device *dev = vdev->dev;
+       struct ap_matrix_mdev *matrix_mdev;
+
+       matrix_mdev = container_of(vdev, struct ap_matrix_mdev, vdev);
+
+       if (matrix_mdev->req_trigger) {
+               if (!(count % 10))
+                       dev_notice_ratelimited(dev,
+                                              "Relaying device request to user (#%u)\n",
+                                              count);
+
+               eventfd_signal(matrix_mdev->req_trigger, 1);
+       } else if (count == 0) {
+               dev_notice(dev,
+                          "No device request registered, blocked until released by user\n");
+       }
+}
+
 static int vfio_ap_mdev_get_device_info(unsigned long arg)
 {
        unsigned long minsz;
@@ -1750,11 +1771,115 @@ static int vfio_ap_mdev_get_device_info(unsigned long arg)
 
        info.flags = VFIO_DEVICE_FLAGS_AP | VFIO_DEVICE_FLAGS_RESET;
        info.num_regions = 0;
-       info.num_irqs = 0;
+       info.num_irqs = VFIO_AP_NUM_IRQS;
 
        return copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0;
 }
 
+static ssize_t vfio_ap_get_irq_info(unsigned long arg)
+{
+       unsigned long minsz;
+       struct vfio_irq_info info;
+
+       minsz = offsetofend(struct vfio_irq_info, count);
+
+       if (copy_from_user(&info, (void __user *)arg, minsz))
+               return -EFAULT;
+
+       if (info.argsz < minsz || info.index >= VFIO_AP_NUM_IRQS)
+               return -EINVAL;
+
+       switch (info.index) {
+       case VFIO_AP_REQ_IRQ_INDEX:
+               info.count = 1;
+               info.flags = VFIO_IRQ_INFO_EVENTFD;
+               break;
+       default:
+               return -EINVAL;
+       }
+
+       return copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0;
+}
+
+static int vfio_ap_irq_set_init(struct vfio_irq_set *irq_set, unsigned long arg)
+{
+       int ret;
+       size_t data_size;
+       unsigned long minsz;
+
+       minsz = offsetofend(struct vfio_irq_set, count);
+
+       if (copy_from_user(irq_set, (void __user *)arg, minsz))
+               return -EFAULT;
+
+       ret = vfio_set_irqs_validate_and_prepare(irq_set, 1, VFIO_AP_NUM_IRQS,
+                                                &data_size);
+       if (ret)
+               return ret;
+
+       if (!(irq_set->flags & VFIO_IRQ_SET_ACTION_TRIGGER))
+               return -EINVAL;
+
+       return 0;
+}
+
+static int vfio_ap_set_request_irq(struct ap_matrix_mdev *matrix_mdev,
+                                  unsigned long arg)
+{
+       s32 fd;
+       void __user *data;
+       unsigned long minsz;
+       struct eventfd_ctx *req_trigger;
+
+       minsz = offsetofend(struct vfio_irq_set, count);
+       data = (void __user *)(arg + minsz);
+
+       if (get_user(fd, (s32 __user *)data))
+               return -EFAULT;
+
+       if (fd == -1) {
+               if (matrix_mdev->req_trigger)
+                       eventfd_ctx_put(matrix_mdev->req_trigger);
+               matrix_mdev->req_trigger = NULL;
+       } else if (fd >= 0) {
+               req_trigger = eventfd_ctx_fdget(fd);
+               if (IS_ERR(req_trigger))
+                       return PTR_ERR(req_trigger);
+
+               if (matrix_mdev->req_trigger)
+                       eventfd_ctx_put(matrix_mdev->req_trigger);
+
+               matrix_mdev->req_trigger = req_trigger;
+       } else {
+               return -EINVAL;
+       }
+
+       return 0;
+}
+
+static int vfio_ap_set_irqs(struct ap_matrix_mdev *matrix_mdev,
+                           unsigned long arg)
+{
+       int ret;
+       struct vfio_irq_set irq_set;
+
+       ret = vfio_ap_irq_set_init(&irq_set, arg);
+       if (ret)
+               return ret;
+
+       switch (irq_set.flags & VFIO_IRQ_SET_DATA_TYPE_MASK) {
+       case VFIO_IRQ_SET_DATA_EVENTFD:
+               switch (irq_set.index) {
+               case VFIO_AP_REQ_IRQ_INDEX:
+                       return vfio_ap_set_request_irq(matrix_mdev, arg);
+               default:
+                       return -EINVAL;
+               }
+       default:
+               return -EINVAL;
+       }
+}
+
 static ssize_t vfio_ap_mdev_ioctl(struct vfio_device *vdev,
                                    unsigned int cmd, unsigned long arg)
 {
@@ -1770,6 +1895,12 @@ static ssize_t vfio_ap_mdev_ioctl(struct vfio_device *vdev,
        case VFIO_DEVICE_RESET:
                ret = vfio_ap_mdev_reset_queues(&matrix_mdev->qtable);
                break;
+       case VFIO_DEVICE_GET_IRQ_INFO:
+               ret = vfio_ap_get_irq_info(arg);
+               break;
+       case VFIO_DEVICE_SET_IRQS:
+               ret = vfio_ap_set_irqs(matrix_mdev, arg);
+               break;
        default:
                ret = -EOPNOTSUPP;
                break;
@@ -1844,6 +1975,7 @@ static const struct vfio_device_ops vfio_ap_matrix_dev_ops = {
        .bind_iommufd = vfio_iommufd_emulated_bind,
        .unbind_iommufd = vfio_iommufd_emulated_unbind,
        .attach_ioas = vfio_iommufd_emulated_attach_ioas,
+       .request = vfio_ap_mdev_request
 };
 
 static struct mdev_driver vfio_ap_matrix_driver = {
index 976a65f..4642bbd 100644 (file)
@@ -15,6 +15,7 @@
 #include <linux/types.h>
 #include <linux/mdev.h>
 #include <linux/delay.h>
+#include <linux/eventfd.h>
 #include <linux/mutex.h>
 #include <linux/kvm_host.h>
 #include <linux/vfio.h>
@@ -103,6 +104,7 @@ struct ap_queue_table {
  *             PQAP(AQIC) instruction.
  * @mdev:      the mediated device
  * @qtable:    table of queues (struct vfio_ap_queue) assigned to the mdev
+ * @req_trigger: eventfd ctx for signaling userspace to return a device
  * @apm_add:   bitmap of APIDs added to the host's AP configuration
  * @aqm_add:   bitmap of APQIs added to the host's AP configuration
  * @adm_add:   bitmap of control domain numbers added to the host's AP
@@ -117,6 +119,7 @@ struct ap_matrix_mdev {
        crypto_hook pqap_hook;
        struct mdev_device *mdev;
        struct ap_queue_table qtable;
+       struct eventfd_ctx *req_trigger;
        DECLARE_BITMAP(apm_add, AP_DEVICES);
        DECLARE_BITMAP(aqm_add, AP_DOMAINS);
        DECLARE_BITMAP(adm_add, AP_DOMAINS);
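
The request notification plumbed in above is driven from userspace through the standard VFIO_DEVICE_SET_IRQS path: the payload for VFIO_AP_REQ_IRQ_INDEX is a single s32 eventfd descriptor, and passing -1 unregisters the trigger again. A rough userspace sketch, assuming the generic VFIO irq-set UAPI from <linux/vfio.h>; only VFIO_AP_REQ_IRQ_INDEX is introduced by this series:

/* Userspace sketch (illustrative, not part of the patch): register an
 * eventfd so vfio_ap_mdev_request() can relay "please release this mdev"
 * requests; poll the returned fd to observe them.
 */
#include <linux/vfio.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static int vfio_ap_register_req_eventfd(int device_fd)
{
        size_t argsz = sizeof(struct vfio_irq_set) + sizeof(int32_t);
        struct vfio_irq_set *set = calloc(1, argsz);
        int32_t efd;
        int rc;

        if (!set)
                return -1;

        efd = eventfd(0, EFD_CLOEXEC);
        if (efd < 0) {
                free(set);
                return -1;
        }

        set->argsz = argsz;
        set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
        set->index = VFIO_AP_REQ_IRQ_INDEX;
        set->start = 0;
        set->count = 1;
        memcpy(set->data, &efd, sizeof(efd));   /* a value of -1 here would unregister */

        rc = ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set);
        free(set);
        return rc ? -1 : efd;
}
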
index 8acb9eb..c2096e4 100644 (file)
@@ -771,14 +771,6 @@ static int __init ism_init(void)
 
 static void __exit ism_exit(void)
 {
-       struct ism_dev *ism;
-
-       mutex_lock(&ism_dev_list.mutex);
-       list_for_each_entry(ism, &ism_dev_list.list, list) {
-               ism_dev_exit(ism);
-       }
-       mutex_unlock(&ism_dev_list.mutex);
-
        pci_unregister_driver(&ism_driver);
        debug_unregister(ism_debug_info);
 }
index ca85bdd..cea3a79 100644 (file)
@@ -417,7 +417,7 @@ static int NCR5380_init(struct Scsi_Host *instance, int flags)
        INIT_WORK(&hostdata->main_task, NCR5380_main);
        hostdata->work_q = alloc_workqueue("ncr5380_%d",
                                WQ_UNBOUND | WQ_MEM_RECLAIM,
-                               1, instance->host_no);
+                               0, instance->host_no);
        if (!hostdata->work_q)
                return -ENOMEM;
 
index 5e115e8..7c6efde 100644 (file)
@@ -1678,6 +1678,7 @@ struct aac_dev
        u32                     handle_pci_error;
        bool                    init_reset;
        u8                      soft_reset_support;
+       u8                      use_map_queue;
 };
 
 #define aac_adapter_interrupt(dev) \
index deb32c9..3f062e4 100644 (file)
@@ -223,8 +223,12 @@ int aac_fib_setup(struct aac_dev * dev)
 struct fib *aac_fib_alloc_tag(struct aac_dev *dev, struct scsi_cmnd *scmd)
 {
        struct fib *fibptr;
+       u32 blk_tag;
+       int i;
 
-       fibptr = &dev->fibs[scsi_cmd_to_rq(scmd)->tag];
+       blk_tag = blk_mq_unique_tag(scsi_cmd_to_rq(scmd));
+       i = blk_mq_unique_tag_to_tag(blk_tag);
+       fibptr = &dev->fibs[i];
        /*
         *      Null out fields that depend on being zero at the start of
         *      each I/O
index 68f4dbc..c4a36c0 100644 (file)
@@ -19,6 +19,7 @@
 
 #include <linux/compat.h>
 #include <linux/blkdev.h>
+#include <linux/blk-mq-pci.h>
 #include <linux/completion.h>
 #include <linux/init.h>
 #include <linux/interrupt.h>
@@ -504,6 +505,15 @@ common_config:
        return 0;
 }
 
+static void aac_map_queues(struct Scsi_Host *shost)
+{
+       struct aac_dev *aac = (struct aac_dev *)shost->hostdata;
+
+       blk_mq_pci_map_queues(&shost->tag_set.map[HCTX_TYPE_DEFAULT],
+                             aac->pdev, 0);
+       aac->use_map_queue = true;
+}
+
 /**
  *     aac_change_queue_depth          -       alter queue depths
  *     @sdev:  SCSI device we are considering
@@ -1488,6 +1498,7 @@ static const struct scsi_host_template aac_driver_template = {
        .bios_param                     = aac_biosparm,
        .shost_groups                   = aac_host_groups,
        .slave_configure                = aac_slave_configure,
+       .map_queues                     = aac_map_queues,
        .change_queue_depth             = aac_change_queue_depth,
        .sdev_groups                    = aac_dev_groups,
        .eh_abort_handler               = aac_eh_abort,
@@ -1775,6 +1786,8 @@ static int aac_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
        shost->max_lun = AAC_MAX_LUN;
 
        pci_set_drvdata(pdev, shost);
+       shost->nr_hw_queues = aac->max_msix;
+       shost->host_tagset = 1;
 
        error = scsi_add_host(shost, &pdev->dev);
        if (error)
@@ -1906,6 +1919,7 @@ static void aac_remove_one(struct pci_dev *pdev)
        struct aac_dev *aac = (struct aac_dev *)shost->hostdata;
 
        aac_cancel_rescan_worker(aac);
+       aac->use_map_queue = false;
        scsi_remove_host(shost);
 
        __aac_shutdown(aac);
index 11ef582..61949f3 100644 (file)
@@ -493,6 +493,10 @@ static int aac_src_deliver_message(struct fib *fib)
 #endif
 
        u16 vector_no;
+       struct scsi_cmnd *scmd;
+       u32 blk_tag;
+       struct Scsi_Host *shost = dev->scsi_host_ptr;
+       struct blk_mq_queue_map *qmap;
 
        atomic_inc(&q->numpending);
 
@@ -505,8 +509,25 @@ static int aac_src_deliver_message(struct fib *fib)
                if ((dev->comm_interface == AAC_COMM_MESSAGE_TYPE3)
                        && dev->sa_firmware)
                        vector_no = aac_get_vector(dev);
-               else
-                       vector_no = fib->vector_no;
+               else {
+                       if (!fib->vector_no || !fib->callback_data) {
+                               if (shost && dev->use_map_queue) {
+                                       qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
+                                       vector_no = qmap->mq_map[raw_smp_processor_id()];
+                               }
+                               /*
+                                *      We hardcode the vector_no for
+                                *      reserved commands as a valid shost is
+                                *      absent during the init
+                                */
+                               else
+                                       vector_no = 0;
+                       } else {
+                               scmd = (struct scsi_cmnd *)fib->callback_data;
+                               blk_tag = blk_mq_unique_tag(scsi_cmd_to_rq(scmd));
+                               vector_no = blk_mq_unique_tag_to_hwq(blk_tag);
+                       }
+               }
 
                if (native_hba) {
                        if (fib->flags & FIB_CONTEXT_FLAG_NATIVE_HBA_TMF) {
index ac648bb..cb0a399 100644 (file)
@@ -877,7 +877,8 @@ static long ch_ioctl(struct file *file,
        }
 
        default:
-               return scsi_ioctl(ch->device, file->f_mode, cmd, argp);
+               return scsi_ioctl(ch->device, file->f_mode & FMODE_WRITE, cmd,
+                                 argp);
 
        }
 }
index 9a322a3..595dca9 100644 (file)
@@ -889,7 +889,7 @@ lpfc_bsg_ct_unsol_event(struct lpfc_hba *phba, struct lpfc_sli_ring *pring,
                        struct lpfc_iocbq *piocbq)
 {
        uint32_t evt_req_id = 0;
-       uint32_t cmd;
+       u16 cmd;
        struct lpfc_dmabuf *dmabuf = NULL;
        struct lpfc_bsg_event *evt;
        struct event_data *evt_dat = NULL;
@@ -915,7 +915,7 @@ lpfc_bsg_ct_unsol_event(struct lpfc_hba *phba, struct lpfc_sli_ring *pring,
 
        ct_req = (struct lpfc_sli_ct_request *)bdeBuf1->virt;
        evt_req_id = ct_req->FsType;
-       cmd = ct_req->CommandResponse.bits.CmdRsp;
+       cmd = be16_to_cpu(ct_req->CommandResponse.bits.CmdRsp);
 
        spin_lock_irqsave(&phba->ct_ev_lock, flags);
        list_for_each_entry(evt, &phba->ct_ev_waiters, node) {
@@ -3186,8 +3186,8 @@ lpfc_bsg_diag_loopback_run(struct bsg_job *job)
                        ctreq->RevisionId.bits.InId = 0;
                        ctreq->FsType = SLI_CT_ELX_LOOPBACK;
                        ctreq->FsSubType = 0;
-                       ctreq->CommandResponse.bits.CmdRsp = ELX_LOOPBACK_DATA;
-                       ctreq->CommandResponse.bits.Size   = size;
+                       ctreq->CommandResponse.bits.CmdRsp = cpu_to_be16(ELX_LOOPBACK_DATA);
+                       ctreq->CommandResponse.bits.Size   = cpu_to_be16(size);
                        segment_offset = ELX_LOOPBACK_HEADER_SZ;
                } else
                        segment_offset = 0;
index df5e5b7..84aa357 100644 (file)
@@ -3796,6 +3796,7 @@ struct qla_qpair {
        uint64_t retry_term_jiff;
        struct qla_tgt_counters tgt_counters;
        uint16_t cpuid;
+       bool cpu_mapped;
        struct qla_fw_resources fwres ____cacheline_aligned;
        struct  qla_buf_pool buf_pool;
        u32     cmd_cnt;
index ec0423e..1a955c3 100644 (file)
@@ -9426,6 +9426,9 @@ struct qla_qpair *qla2xxx_create_qpair(struct scsi_qla_host *vha, int qos,
                qpair->rsp->req = qpair->req;
                qpair->rsp->qpair = qpair;
 
+               if (!qpair->cpu_mapped)
+                       qla_cpu_update(qpair, raw_smp_processor_id());
+
                if (IS_T10_PI_CAPABLE(ha) && ql2xenabledif) {
                        if (ha->fw_attributes & BIT_4)
                                qpair->difdix_supported = 1;
index cce6e42..7b42558 100644 (file)
@@ -539,11 +539,14 @@ qla_mapq_init_qp_cpu_map(struct qla_hw_data *ha,
        if (!ha->qp_cpu_map)
                return;
        mask = pci_irq_get_affinity(ha->pdev, msix->vector_base0);
+       if (!mask)
+               return;
        qpair->cpuid = cpumask_first(mask);
        for_each_cpu(cpu, mask) {
                ha->qp_cpu_map[cpu] = qpair;
        }
        msix->cpuid = qpair->cpuid;
+       qpair->cpu_mapped = true;
 }
 
 static inline void
index 71feda2..245e3a5 100644 (file)
@@ -3770,6 +3770,9 @@ void qla24xx_process_response_queue(struct scsi_qla_host *vha,
 
        if (rsp->qpair->cpuid != smp_processor_id() || !rsp->qpair->rcv_intr) {
                rsp->qpair->rcv_intr = 1;
+
+               if (!rsp->qpair->cpu_mapped)
+                       qla_cpu_update(rsp->qpair, raw_smp_processor_id());
        }
 
 #define __update_rsp_in(_is_shadow_hba, _rsp, _rsp_in)                 \
index 96ee352..a9a9ec0 100644 (file)
@@ -10,7 +10,7 @@
 #define uptr64(val) ((void __user *)(uintptr_t)(val))
 
 static int scsi_bsg_sg_io_fn(struct request_queue *q, struct sg_io_v4 *hdr,
-               fmode_t mode, unsigned int timeout)
+               bool open_for_write, unsigned int timeout)
 {
        struct scsi_cmnd *scmd;
        struct request *rq;
@@ -42,7 +42,7 @@ static int scsi_bsg_sg_io_fn(struct request_queue *q, struct sg_io_v4 *hdr,
        if (copy_from_user(scmd->cmnd, uptr64(hdr->request), scmd->cmd_len))
                goto out_put_request;
        ret = -EPERM;
-       if (!scsi_cmd_allowed(scmd->cmnd, mode))
+       if (!scsi_cmd_allowed(scmd->cmnd, open_for_write))
                goto out_put_request;
 
        ret = 0;
index e3b31d3..6f6c597 100644 (file)
@@ -248,7 +248,7 @@ static int scsi_send_start_stop(struct scsi_device *sdev, int data)
  * Only a subset of commands are allowed for unprivileged users. Commands used
  * to format the media, update the firmware, etc. are not permitted.
  */
-bool scsi_cmd_allowed(unsigned char *cmd, fmode_t mode)
+bool scsi_cmd_allowed(unsigned char *cmd, bool open_for_write)
 {
        /* root can do any command. */
        if (capable(CAP_SYS_RAWIO))
@@ -338,7 +338,7 @@ bool scsi_cmd_allowed(unsigned char *cmd, fmode_t mode)
        case GPCMD_SET_READ_AHEAD:
        /* ZBC */
        case ZBC_OUT:
-               return (mode & FMODE_WRITE);
+               return open_for_write;
        default:
                return false;
        }
@@ -346,7 +346,7 @@ bool scsi_cmd_allowed(unsigned char *cmd, fmode_t mode)
 EXPORT_SYMBOL(scsi_cmd_allowed);
 
 static int scsi_fill_sghdr_rq(struct scsi_device *sdev, struct request *rq,
-               struct sg_io_hdr *hdr, fmode_t mode)
+               struct sg_io_hdr *hdr, bool open_for_write)
 {
        struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(rq);
 
@@ -354,7 +354,7 @@ static int scsi_fill_sghdr_rq(struct scsi_device *sdev, struct request *rq,
                return -EMSGSIZE;
        if (copy_from_user(scmd->cmnd, hdr->cmdp, hdr->cmd_len))
                return -EFAULT;
-       if (!scsi_cmd_allowed(scmd->cmnd, mode))
+       if (!scsi_cmd_allowed(scmd->cmnd, open_for_write))
                return -EPERM;
        scmd->cmd_len = hdr->cmd_len;
 
@@ -407,7 +407,8 @@ static int scsi_complete_sghdr_rq(struct request *rq, struct sg_io_hdr *hdr,
        return ret;
 }
 
-static int sg_io(struct scsi_device *sdev, struct sg_io_hdr *hdr, fmode_t mode)
+static int sg_io(struct scsi_device *sdev, struct sg_io_hdr *hdr,
+               bool open_for_write)
 {
        unsigned long start_time;
        ssize_t ret = 0;
@@ -448,7 +449,7 @@ static int sg_io(struct scsi_device *sdev, struct sg_io_hdr *hdr, fmode_t mode)
                goto out_put_request;
        }
 
-       ret = scsi_fill_sghdr_rq(sdev, rq, hdr, mode);
+       ret = scsi_fill_sghdr_rq(sdev, rq, hdr, open_for_write);
        if (ret < 0)
                goto out_put_request;
 
@@ -477,8 +478,7 @@ out_put_request:
 /**
  * sg_scsi_ioctl  --  handle deprecated SCSI_IOCTL_SEND_COMMAND ioctl
  * @q:         request queue to send scsi commands down
- * @mode:      mode used to open the file through which the ioctl has been
- *             submitted
+ * @open_for_write: is the file / block device opened for writing?
  * @sic:       userspace structure describing the command to perform
  *
  * Send down the scsi command described by @sic to the device below
@@ -501,7 +501,7 @@ out_put_request:
  *      Positive numbers returned are the compacted SCSI error codes (4
  *      bytes in one int) where the lowest byte is the SCSI status.
  */
-static int sg_scsi_ioctl(struct request_queue *q, fmode_t mode,
+static int sg_scsi_ioctl(struct request_queue *q, bool open_for_write,
                struct scsi_ioctl_command __user *sic)
 {
        struct request *rq;
@@ -554,7 +554,7 @@ static int sg_scsi_ioctl(struct request_queue *q, fmode_t mode,
                goto error;
 
        err = -EPERM;
-       if (!scsi_cmd_allowed(scmd->cmnd, mode))
+       if (!scsi_cmd_allowed(scmd->cmnd, open_for_write))
                goto error;
 
        /* default.  possible overridden later */
@@ -776,7 +776,7 @@ static int scsi_put_cdrom_generic_arg(const struct cdrom_generic_command *cgc,
        return 0;
 }
 
-static int scsi_cdrom_send_packet(struct scsi_device *sdev, fmode_t mode,
+static int scsi_cdrom_send_packet(struct scsi_device *sdev, bool open_for_write,
                void __user *arg)
 {
        struct cdrom_generic_command cgc;
@@ -817,7 +817,7 @@ static int scsi_cdrom_send_packet(struct scsi_device *sdev, fmode_t mode,
        hdr.cmdp = ((struct cdrom_generic_command __user *) arg)->cmd;
        hdr.cmd_len = sizeof(cgc.cmd);
 
-       err = sg_io(sdev, &hdr, mode);
+       err = sg_io(sdev, &hdr, open_for_write);
        if (err == -EFAULT)
                return -EFAULT;
 
@@ -832,7 +832,7 @@ static int scsi_cdrom_send_packet(struct scsi_device *sdev, fmode_t mode,
        return err;
 }
 
-static int scsi_ioctl_sg_io(struct scsi_device *sdev, fmode_t mode,
+static int scsi_ioctl_sg_io(struct scsi_device *sdev, bool open_for_write,
                void __user *argp)
 {
        struct sg_io_hdr hdr;
@@ -841,7 +841,7 @@ static int scsi_ioctl_sg_io(struct scsi_device *sdev, fmode_t mode,
        error = get_sg_io_hdr(&hdr, argp);
        if (error)
                return error;
-       error = sg_io(sdev, &hdr, mode);
+       error = sg_io(sdev, &hdr, open_for_write);
        if (error == -EFAULT)
                return error;
        if (put_sg_io_hdr(&hdr, argp))
@@ -852,7 +852,7 @@ static int scsi_ioctl_sg_io(struct scsi_device *sdev, fmode_t mode,
 /**
  * scsi_ioctl - Dispatch ioctl to scsi device
  * @sdev: scsi device receiving ioctl
- * @mode: mode the block/char device is opened with
+ * @open_for_write: is the file / block device opened for writing?
  * @cmd: which ioctl is it
  * @arg: data associated with ioctl
  *
@@ -860,7 +860,7 @@ static int scsi_ioctl_sg_io(struct scsi_device *sdev, fmode_t mode,
  * does not take a major/minor number as the dev field.  Rather, it takes
  * a pointer to a &struct scsi_device.
  */
-int scsi_ioctl(struct scsi_device *sdev, fmode_t mode, int cmd,
+int scsi_ioctl(struct scsi_device *sdev, bool open_for_write, int cmd,
                void __user *arg)
 {
        struct request_queue *q = sdev->request_queue;
@@ -896,11 +896,11 @@ int scsi_ioctl(struct scsi_device *sdev, fmode_t mode, int cmd,
        case SG_EMULATED_HOST:
                return sg_emulated_host(q, arg);
        case SG_IO:
-               return scsi_ioctl_sg_io(sdev, mode, arg);
+               return scsi_ioctl_sg_io(sdev, open_for_write, arg);
        case SCSI_IOCTL_SEND_COMMAND:
-               return sg_scsi_ioctl(q, mode, arg);
+               return sg_scsi_ioctl(q, open_for_write, arg);
        case CDROM_SEND_PACKET:
-               return scsi_cdrom_send_packet(sdev, mode, arg);
+               return scsi_cdrom_send_packet(sdev, open_for_write, arg);
        case CDROMCLOSETRAY:
                return scsi_send_start_stop(sdev, 3);
        case CDROMEJECT:
index b7c569a..0226c92 100644 (file)
@@ -1463,6 +1463,8 @@ static int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
        struct Scsi_Host *host = cmd->device->host;
        int rtn = 0;
 
+       atomic_inc(&cmd->device->iorequest_cnt);
+
        /* check if the device is still usable */
        if (unlikely(cmd->device->sdev_state == SDEV_DEL)) {
                /* in SDEV_DEL we error all commands. DID_NO_CONNECT
@@ -1483,6 +1485,7 @@ static int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
                 */
                SCSI_LOG_MLQUEUE(3, scmd_printk(KERN_INFO, cmd,
                        "queuecommand : device blocked\n"));
+               atomic_dec(&cmd->device->iorequest_cnt);
                return SCSI_MLQUEUE_DEVICE_BUSY;
        }
 
@@ -1515,6 +1518,7 @@ static int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
        trace_scsi_dispatch_cmd_start(cmd);
        rtn = host->hostt->queuecommand(host, cmd);
        if (rtn) {
+               atomic_dec(&cmd->device->iorequest_cnt);
                trace_scsi_dispatch_cmd_error(cmd, rtn);
                if (rtn != SCSI_MLQUEUE_DEVICE_BUSY &&
                    rtn != SCSI_MLQUEUE_TARGET_BUSY)
@@ -1761,7 +1765,6 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
                goto out_dec_host_busy;
        }
 
-       atomic_inc(&cmd->device->iorequest_cnt);
        return BLK_STS_OK;
 
 out_dec_host_busy:
index 1624d52..ab21697 100644 (file)
@@ -1280,11 +1280,10 @@ static void sd_uninit_command(struct scsi_cmnd *SCpnt)
                mempool_free(rq->special_vec.bv_page, sd_page_pool);
 }
 
-static bool sd_need_revalidate(struct block_device *bdev,
-               struct scsi_disk *sdkp)
+static bool sd_need_revalidate(struct gendisk *disk, struct scsi_disk *sdkp)
 {
        if (sdkp->device->removable || sdkp->write_prot) {
-               if (bdev_check_media_change(bdev))
+               if (disk_check_media_change(disk))
                        return true;
        }
 
@@ -1293,13 +1292,13 @@ static bool sd_need_revalidate(struct block_device *bdev,
         * nothing to do with partitions, BLKRRPART is used to force a full
         * revalidate after things like a format for historical reasons.
         */
-       return test_bit(GD_NEED_PART_SCAN, &bdev->bd_disk->state);
+       return test_bit(GD_NEED_PART_SCAN, &disk->state);
 }
 
 /**
  *     sd_open - open a scsi disk device
- *     @bdev: Block device of the scsi disk to open
- *     @mode: FMODE_* mask
+ *     @disk: disk to open
+ *     @mode: open mode
  *
  *     Returns 0 if successful. Returns a negated errno value in case 
  *     of error.
@@ -1309,11 +1308,11 @@ static bool sd_need_revalidate(struct block_device *bdev,
  *     In the latter case @inode and @filp carry an abridged amount
  *     of information as noted above.
  *
- *     Locking: called with bdev->bd_disk->open_mutex held.
+ *     Locking: called with disk->open_mutex held.
  **/
-static int sd_open(struct block_device *bdev, fmode_t mode)
+static int sd_open(struct gendisk *disk, blk_mode_t mode)
 {
-       struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk);
+       struct scsi_disk *sdkp = scsi_disk(disk);
        struct scsi_device *sdev = sdkp->device;
        int retval;
 
@@ -1330,14 +1329,15 @@ static int sd_open(struct block_device *bdev, fmode_t mode)
        if (!scsi_block_when_processing_errors(sdev))
                goto error_out;
 
-       if (sd_need_revalidate(bdev, sdkp))
-               sd_revalidate_disk(bdev->bd_disk);
+       if (sd_need_revalidate(disk, sdkp))
+               sd_revalidate_disk(disk);
 
        /*
         * If the drive is empty, just let the open fail.
         */
        retval = -ENOMEDIUM;
-       if (sdev->removable && !sdkp->media_present && !(mode & FMODE_NDELAY))
+       if (sdev->removable && !sdkp->media_present &&
+           !(mode & BLK_OPEN_NDELAY))
                goto error_out;
 
        /*
@@ -1345,7 +1345,7 @@ static int sd_open(struct block_device *bdev, fmode_t mode)
         * if the user expects to be able to write to the thing.
         */
        retval = -EROFS;
-       if (sdkp->write_prot && (mode & FMODE_WRITE))
+       if (sdkp->write_prot && (mode & BLK_OPEN_WRITE))
                goto error_out;
 
        /*
@@ -1374,16 +1374,15 @@ error_out:
  *     sd_release - invoked when the (last) close(2) is called on this
  *     scsi disk.
  *     @disk: disk to release
- *     @mode: FMODE_* mask
  *
  *     Returns 0. 
  *
  *     Note: may block (uninterruptible) if error recovery is underway
  *     on this disk.
  *
- *     Locking: called with bdev->bd_disk->open_mutex held.
+ *     Locking: called with disk->open_mutex held.
  **/
-static void sd_release(struct gendisk *disk, fmode_t mode)
+static void sd_release(struct gendisk *disk)
 {
        struct scsi_disk *sdkp = scsi_disk(disk);
        struct scsi_device *sdev = sdkp->device;
@@ -1426,7 +1425,7 @@ static int sd_getgeo(struct block_device *bdev, struct hd_geometry *geo)
 /**
  *     sd_ioctl - process an ioctl
  *     @bdev: target block device
- *     @mode: FMODE_* mask
+ *     @mode: open mode
  *     @cmd: ioctl command number
  *     @arg: this is third argument given to ioctl(2) system call.
  *     Often contains a pointer.
@@ -1437,7 +1436,7 @@ static int sd_getgeo(struct block_device *bdev, struct hd_geometry *geo)
 *     Note: most ioctls are forwarded on to the block subsystem or further
  *     down in the scsi subsystem.
  **/
-static int sd_ioctl(struct block_device *bdev, fmode_t mode,
+static int sd_ioctl(struct block_device *bdev, blk_mode_t mode,
                    unsigned int cmd, unsigned long arg)
 {
        struct gendisk *disk = bdev->bd_disk;
@@ -1459,13 +1458,13 @@ static int sd_ioctl(struct block_device *bdev, fmode_t mode,
         * access to the device is prohibited.
         */
        error = scsi_ioctl_block_when_processing_errors(sdp, cmd,
-                       (mode & FMODE_NDELAY) != 0);
+                       (mode & BLK_OPEN_NDELAY));
        if (error)
                return error;
 
        if (is_sed_ioctl(cmd))
                return sed_ioctl(sdkp->opal_dev, cmd, p);
-       return scsi_ioctl(sdp, mode, cmd, p);
+       return scsi_ioctl(sdp, mode & BLK_OPEN_WRITE, cmd, p);
 }
 
 static void set_media_not_present(struct scsi_disk *sdkp)
index 037f8c9..dcb7378 100644 (file)
@@ -237,7 +237,7 @@ static int sg_allow_access(struct file *filp, unsigned char *cmd)
 
        if (sfp->parentdp->device->type == TYPE_SCANNER)
                return 0;
-       if (!scsi_cmd_allowed(cmd, filp->f_mode))
+       if (!scsi_cmd_allowed(cmd, filp->f_mode & FMODE_WRITE))
                return -EPERM;
        return 0;
 }
@@ -1103,7 +1103,8 @@ sg_ioctl_common(struct file *filp, Sg_device *sdp, Sg_fd *sfp,
        case SCSI_IOCTL_SEND_COMMAND:
                if (atomic_read(&sdp->detaching))
                        return -ENODEV;
-               return scsi_ioctl(sdp->device, filp->f_mode, cmd_in, p);
+               return scsi_ioctl(sdp->device, filp->f_mode & FMODE_WRITE,
+                                 cmd_in, p);
        case SG_SET_DEBUG:
                result = get_user(val, ip);
                if (result)
@@ -1159,7 +1160,7 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
        ret = sg_ioctl_common(filp, sdp, sfp, cmd_in, p);
        if (ret != -ENOIOCTLCMD)
                return ret;
-       return scsi_ioctl(sdp->device, filp->f_mode, cmd_in, p);
+       return scsi_ioctl(sdp->device, filp->f_mode & FMODE_WRITE, cmd_in, p);
 }
 
 static __poll_t
@@ -1496,6 +1497,10 @@ sg_add_device(struct device *cl_dev)
        int error;
        unsigned long iflags;
 
+       error = blk_get_queue(scsidp->request_queue);
+       if (error)
+               return error;
+
        error = -ENOMEM;
        cdev = cdev_alloc();
        if (!cdev) {
@@ -1553,6 +1558,7 @@ cdev_add_err:
 out:
        if (cdev)
                cdev_del(cdev);
+       blk_put_queue(scsidp->request_queue);
        return error;
 }
 
@@ -1560,6 +1566,7 @@ static void
 sg_device_destroy(struct kref *kref)
 {
        struct sg_device *sdp = container_of(kref, struct sg_device, d_ref);
+       struct request_queue *q = sdp->device->request_queue;
        unsigned long flags;
 
        /* CAUTION!  Note that the device can still be found via idr_find()
@@ -1567,6 +1574,9 @@ sg_device_destroy(struct kref *kref)
         * any other cleanup.
         */
 
+       blk_trace_remove(q);
+       blk_put_queue(q);
+
        write_lock_irqsave(&sg_index_lock, flags);
        idr_remove(&sg_index_idr, sdp->index);
        write_unlock_irqrestore(&sg_index_lock, flags);
index 12869e6..ce886c8 100644 (file)
@@ -484,9 +484,9 @@ static void sr_revalidate_disk(struct scsi_cd *cd)
        get_sectorsize(cd);
 }
 
-static int sr_block_open(struct block_device *bdev, fmode_t mode)
+static int sr_block_open(struct gendisk *disk, blk_mode_t mode)
 {
-       struct scsi_cd *cd = scsi_cd(bdev->bd_disk);
+       struct scsi_cd *cd = scsi_cd(disk);
        struct scsi_device *sdev = cd->device;
        int ret;
 
@@ -494,11 +494,11 @@ static int sr_block_open(struct block_device *bdev, fmode_t mode)
                return -ENXIO;
 
        scsi_autopm_get_device(sdev);
-       if (bdev_check_media_change(bdev))
+       if (disk_check_media_change(disk))
                sr_revalidate_disk(cd);
 
        mutex_lock(&cd->lock);
-       ret = cdrom_open(&cd->cdi, bdev, mode);
+       ret = cdrom_open(&cd->cdi, mode);
        mutex_unlock(&cd->lock);
 
        scsi_autopm_put_device(sdev);
@@ -507,19 +507,19 @@ static int sr_block_open(struct block_device *bdev, fmode_t mode)
        return ret;
 }
 
-static void sr_block_release(struct gendisk *disk, fmode_t mode)
+static void sr_block_release(struct gendisk *disk)
 {
        struct scsi_cd *cd = scsi_cd(disk);
 
        mutex_lock(&cd->lock);
-       cdrom_release(&cd->cdi, mode);
+       cdrom_release(&cd->cdi);
        mutex_unlock(&cd->lock);
 
        scsi_device_put(cd->device);
 }
 
-static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
-                         unsigned long arg)
+static int sr_block_ioctl(struct block_device *bdev, blk_mode_t mode,
+               unsigned cmd, unsigned long arg)
 {
        struct scsi_cd *cd = scsi_cd(bdev->bd_disk);
        struct scsi_device *sdev = cd->device;
@@ -532,18 +532,18 @@ static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
        mutex_lock(&cd->lock);
 
        ret = scsi_ioctl_block_when_processing_errors(sdev, cmd,
-                       (mode & FMODE_NDELAY) != 0);
+                       (mode & BLK_OPEN_NDELAY));
        if (ret)
                goto out;
 
        scsi_autopm_get_device(sdev);
 
        if (cmd != CDROMCLOSETRAY && cmd != CDROMEJECT) {
-               ret = cdrom_ioctl(&cd->cdi, bdev, mode, cmd, arg);
+               ret = cdrom_ioctl(&cd->cdi, bdev, cmd, arg);
                if (ret != -ENOSYS)
                        goto put;
        }
-       ret = scsi_ioctl(sdev, mode, cmd, argp);
+       ret = scsi_ioctl(sdev, mode & BLK_OPEN_WRITE, cmd, argp);
 
 put:
        scsi_autopm_put_device(sdev);
index b90a440..14d7981 100644 (file)
@@ -3832,7 +3832,7 @@ static long st_ioctl(struct file *file, unsigned int cmd_in, unsigned long arg)
                break;
        }
 
-       retval = scsi_ioctl(STp->device, file->f_mode, cmd_in, p);
+       retval = scsi_ioctl(STp->device, file->f_mode & FMODE_WRITE, cmd_in, p);
        if (!retval && cmd_in == SCSI_IOCTL_STOP_UNIT) {
                /* unload */
                STp->rew_at_close = 0;
index 5b230e1..8ffb75b 100644 (file)
@@ -109,7 +109,9 @@ enum {
        TASK_ATTRIBUTE_HEADOFQUEUE              = 0x1,
        TASK_ATTRIBUTE_ORDERED                  = 0x2,
        TASK_ATTRIBUTE_ACA                      = 0x4,
+};
 
+enum {
        SS_STS_NORMAL                           = 0x80000000,
        SS_STS_DONE                             = 0x40000000,
        SS_STS_HANDSHAKE                        = 0x20000000,
@@ -121,7 +123,9 @@ enum {
        SS_I2H_REQUEST_RESET                    = 0x2000,
 
        SS_MU_OPERATIONAL                       = 0x80000000,
+};
 
+enum {
        STEX_CDB_LENGTH                         = 16,
        STATUS_VAR_LEN                          = 128,
 
index d9ce379..659196a 100644 (file)
@@ -1567,6 +1567,8 @@ static int storvsc_device_configure(struct scsi_device *sdevice)
 {
        blk_queue_rq_timeout(sdevice->request_queue, (storvsc_timeout * HZ));
 
+       /* storvsc devices don't support MAINTENANCE_IN SCSI cmd */
+       sdevice->no_report_opcodes = 1;
        sdevice->no_write_same = 1;
 
        /*
@@ -1780,7 +1782,7 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 
        length = scsi_bufflen(scmnd);
        payload = (struct vmbus_packet_mpb_array *)&cmd_request->mpb;
-       payload_sz = sizeof(cmd_request->mpb);
+       payload_sz = 0;
 
        if (scsi_sg_count(scmnd)) {
                unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
@@ -1789,10 +1791,10 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
                unsigned long hvpfn, hvpfns_to_add;
                int j, i = 0, sg_count;
 
-               if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
+               payload_sz = (hvpg_count * sizeof(u64) +
+                             sizeof(struct vmbus_packet_mpb_array));
 
-                       payload_sz = (hvpg_count * sizeof(u64) +
-                                     sizeof(struct vmbus_packet_mpb_array));
+               if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
                        payload = kzalloc(payload_sz, GFP_ATOMIC);
                        if (!payload)
                                return SCSI_MLQUEUE_DEVICE_BUSY;
index 7268c2f..e0d0966 100644 (file)
@@ -36,7 +36,7 @@ config UCC
 config CPM_TSA
        tristate "CPM TSA support"
        depends on OF && HAS_IOMEM
-       depends on CPM1 || COMPILE_TEST
+       depends on CPM1 || (CPM && COMPILE_TEST)
        help
          Freescale CPM Time Slot Assigner (TSA)
          controller.
@@ -47,7 +47,7 @@ config CPM_TSA
 config CPM_QMC
        tristate "CPM QMC support"
        depends on OF && HAS_IOMEM
-       depends on CPM1 || (FSL_SOC && COMPILE_TEST)
+       depends on CPM1 || (FSL_SOC && CPM && COMPILE_TEST)
        depends on CPM_TSA
        help
          Freescale CPM QUICC Multichannel Controller
index 0f43a88..89b7755 100644 (file)
@@ -32,4 +32,5 @@ obj-$(CONFIG_QCOM_RPMHPD) += rpmhpd.o
 obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o
 obj-$(CONFIG_QCOM_KRYO_L2_ACCESSORS) +=        kryo-l2-accessors.o
 obj-$(CONFIG_QCOM_ICC_BWMON)   += icc-bwmon.o
-obj-$(CONFIG_QCOM_INLINE_CRYPTO_ENGINE)        += ice.o
+qcom_ice-objs                  += ice.o
+obj-$(CONFIG_QCOM_INLINE_CRYPTO_ENGINE)        += qcom_ice.o
index fd58c5b..f65bfec 100644 (file)
@@ -773,12 +773,12 @@ static int bwmon_probe(struct platform_device *pdev)
        bwmon->max_bw_kbps = UINT_MAX;
        opp = dev_pm_opp_find_bw_floor(dev, &bwmon->max_bw_kbps, 0);
        if (IS_ERR(opp))
-               return dev_err_probe(dev, ret, "failed to find max peak bandwidth\n");
+               return dev_err_probe(dev, PTR_ERR(opp), "failed to find max peak bandwidth\n");
 
        bwmon->min_bw_kbps = 0;
        opp = dev_pm_opp_find_bw_ceil(dev, &bwmon->min_bw_kbps, 0);
        if (IS_ERR(opp))
-               return dev_err_probe(dev, ret, "failed to find min peak bandwidth\n");
+               return dev_err_probe(dev, PTR_ERR(opp), "failed to find min peak bandwidth\n");
 
        bwmon->dev = dev;
 
index dc74d2a..5e3ba0b 100644 (file)
@@ -296,7 +296,7 @@ static int qcom_ramp_controller_probe(struct platform_device *pdev)
                return -ENOMEM;
 
        qrc->desc = device_get_match_data(&pdev->dev);
-       if (!qrc)
+       if (!qrc->desc)
                return -EINVAL;
 
        qrc->regmap = devm_regmap_init_mmio(&pdev->dev, base, &qrc_regmap_config);
index ce48a9f..f83811f 100644 (file)
@@ -233,6 +233,7 @@ static int qcom_rmtfs_mem_probe(struct platform_device *pdev)
                num_vmids = 0;
        } else if (num_vmids < 0) {
                dev_err(&pdev->dev, "failed to count qcom,vmid elements: %d\n", num_vmids);
+               ret = num_vmids;
                goto remove_cdev;
        } else if (num_vmids > NUM_MAX_VMIDS) {
                dev_warn(&pdev->dev,
index f93544f..0dd4363 100644 (file)
@@ -1073,7 +1073,7 @@ static int rpmh_rsc_probe(struct platform_device *pdev)
        drv->ver.minor = rsc_id & (MINOR_VER_MASK << MINOR_VER_SHIFT);
        drv->ver.minor >>= MINOR_VER_SHIFT;
 
-       if (drv->ver.major == 3 && drv->ver.minor >= 0)
+       if (drv->ver.major == 3)
                drv->regs = rpmh_rsc_reg_offset_ver_3_0;
        else
                drv->regs = rpmh_rsc_reg_offset_ver_2_7;
index f20e2a4..63c35a3 100644 (file)
@@ -342,6 +342,21 @@ static const struct rpmhpd_desc sm8150_desc = {
        .num_pds = ARRAY_SIZE(sm8150_rpmhpds),
 };
 
+static struct rpmhpd *sa8155p_rpmhpds[] = {
+       [SA8155P_CX] = &cx_w_mx_parent,
+       [SA8155P_CX_AO] = &cx_ao_w_mx_parent,
+       [SA8155P_EBI] = &ebi,
+       [SA8155P_GFX] = &gfx,
+       [SA8155P_MSS] = &mss,
+       [SA8155P_MX] = &mx,
+       [SA8155P_MX_AO] = &mx_ao,
+};
+
+static const struct rpmhpd_desc sa8155p_desc = {
+       .rpmhpds = sa8155p_rpmhpds,
+       .num_pds = ARRAY_SIZE(sa8155p_rpmhpds),
+};
+
 /* SM8250 RPMH powerdomains */
 static struct rpmhpd *sm8250_rpmhpds[] = {
        [SM8250_CX] = &cx_w_mx_parent,
@@ -519,6 +534,7 @@ static const struct rpmhpd_desc sc8280xp_desc = {
 
 static const struct of_device_id rpmhpd_match_table[] = {
        { .compatible = "qcom,qdu1000-rpmhpd", .data = &qdu1000_desc },
+       { .compatible = "qcom,sa8155p-rpmhpd", .data = &sa8155p_desc },
        { .compatible = "qcom,sa8540p-rpmhpd", .data = &sa8540p_desc },
        { .compatible = "qcom,sa8775p-rpmhpd", .data = &sa8775p_desc },
        { .compatible = "qcom,sc7180-rpmhpd", .data = &sc7180_desc },
index 58ea013..2a1096d 100644 (file)
@@ -100,6 +100,13 @@ static const struct dmi_system_id adr_remap_quirk_table[] = {
                .driver_data = (void *)intel_tgl_bios,
        },
        {
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "HP"),
+                       DMI_MATCH(DMI_BOARD_NAME, "8709"),
+               },
+               .driver_data = (void *)intel_tgl_bios,
+       },
+       {
                /* quirk used for NUC15 'Bishop County' LAPBC510 and LAPBC710 skews */
                .matches = {
                        DMI_MATCH(DMI_SYS_VENDOR, "Intel(R) Client Systems"),
index c296e0b..280455f 100644 (file)
@@ -1099,8 +1099,10 @@ static int qcom_swrm_startup(struct snd_pcm_substream *substream,
        }
 
        sruntime = sdw_alloc_stream(dai->name);
-       if (!sruntime)
-               return -ENOMEM;
+       if (!sruntime) {
+               ret = -ENOMEM;
+               goto err_alloc;
+       }
 
        ctrl->sruntime[dai->id] = sruntime;
 
@@ -1110,12 +1112,19 @@ static int qcom_swrm_startup(struct snd_pcm_substream *substream,
                if (ret < 0 && ret != -ENOTSUPP) {
                        dev_err(dai->dev, "Failed to set sdw stream on %s\n",
                                codec_dai->name);
-                       sdw_release_stream(sruntime);
-                       return ret;
+                       goto err_set_stream;
                }
        }
 
        return 0;
+
+err_set_stream:
+       sdw_release_stream(sruntime);
+err_alloc:
+       pm_runtime_mark_last_busy(ctrl->dev);
+       pm_runtime_put_autosuspend(ctrl->dev);
+
+       return ret;
 }
 
 static void qcom_swrm_shutdown(struct snd_pcm_substream *substream,
index c2191c0..379228f 100644 (file)
@@ -2021,8 +2021,10 @@ int sdw_stream_add_slave(struct sdw_slave *slave,
 
 skip_alloc_master_rt:
        s_rt = sdw_slave_rt_find(slave, stream);
-       if (s_rt)
+       if (s_rt) {
+               alloc_slave_rt = false;
                goto skip_alloc_slave_rt;
+       }
 
        s_rt = sdw_slave_rt_alloc(slave, m_rt);
        if (!s_rt) {
index 6ddb2df..32449be 100644 (file)
@@ -1756,8 +1756,11 @@ static int cqspi_probe(struct platform_device *pdev)
                        cqspi->slow_sram = true;
 
                if (of_device_is_compatible(pdev->dev.of_node,
-                                           "xlnx,versal-ospi-1.0"))
-                       dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
+                                           "xlnx,versal-ospi-1.0")) {
+                       ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
+                       if (ret)
+                               goto probe_reset_failed;
+               }
        }
 
        ret = devm_request_irq(dev, irq, cqspi_irq_handler, 0,
index ac85d55..26e6633 100644 (file)
@@ -12,6 +12,7 @@
 #include <linux/gpio/consumer.h>
 #include <linux/interrupt.h>
 #include <linux/io.h>
+#include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/of_irq.h>
 #include <linux/of_address.h>
@@ -301,49 +302,43 @@ static int cdns_spi_setup_transfer(struct spi_device *spi,
 }
 
 /**
- * cdns_spi_fill_tx_fifo - Fills the TX FIFO with as many bytes as possible
+ * cdns_spi_process_fifo - Fills the TX FIFO and drains the RX FIFO
  * @xspi:      Pointer to the cdns_spi structure
+ * @ntx:       Number of bytes to pack into the TX FIFO
+ * @nrx:       Number of bytes to drain from the RX FIFO
  */
-static void cdns_spi_fill_tx_fifo(struct cdns_spi *xspi)
+static void cdns_spi_process_fifo(struct cdns_spi *xspi, int ntx, int nrx)
 {
-       unsigned long trans_cnt = 0;
+       ntx = clamp(ntx, 0, xspi->tx_bytes);
+       nrx = clamp(nrx, 0, xspi->rx_bytes);
 
-       while ((trans_cnt < xspi->tx_fifo_depth) &&
-              (xspi->tx_bytes > 0)) {
+       xspi->tx_bytes -= ntx;
+       xspi->rx_bytes -= nrx;
 
+       while (ntx || nrx) {
                /* When xspi in busy condition, bytes may send failed,
                 * then spi control didn't work thoroughly, add one byte delay
                 */
-               if (cdns_spi_read(xspi, CDNS_SPI_ISR) &
-                   CDNS_SPI_IXR_TXFULL)
+               if (cdns_spi_read(xspi, CDNS_SPI_ISR) & CDNS_SPI_IXR_TXFULL)
                        udelay(10);
 
-               if (xspi->txbuf)
-                       cdns_spi_write(xspi, CDNS_SPI_TXD, *xspi->txbuf++);
-               else
-                       cdns_spi_write(xspi, CDNS_SPI_TXD, 0);
+               if (ntx) {
+                       if (xspi->txbuf)
+                               cdns_spi_write(xspi, CDNS_SPI_TXD, *xspi->txbuf++);
+                       else
+                               cdns_spi_write(xspi, CDNS_SPI_TXD, 0);
 
-               xspi->tx_bytes--;
-               trans_cnt++;
-       }
-}
+                       ntx--;
+               }
 
-/**
- * cdns_spi_read_rx_fifo - Reads the RX FIFO with as many bytes as possible
- * @xspi:       Pointer to the cdns_spi structure
- * @count:     Read byte count
- */
-static void cdns_spi_read_rx_fifo(struct cdns_spi *xspi, unsigned long count)
-{
-       u8 data;
-
-       /* Read out the data from the RX FIFO */
-       while (count > 0) {
-               data = cdns_spi_read(xspi, CDNS_SPI_RXD);
-               if (xspi->rxbuf)
-                       *xspi->rxbuf++ = data;
-               xspi->rx_bytes--;
-               count--;
+               if (nrx) {
+                       u8 data = cdns_spi_read(xspi, CDNS_SPI_RXD);
+
+                       if (xspi->rxbuf)
+                               *xspi->rxbuf++ = data;
+
+                       nrx--;
+               }
        }
 }
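
The rewritten cdns_spi_process_fifo() merges the old fill-TX and drain-RX helpers: the requested byte counts are clamped to what remains of the transfer, the per-transfer counters are charged up front, and one loop then interleaves writes and reads. A standalone sketch of that clamp-and-interleave structure, operating on plain buffers rather than FIFO registers, looks roughly like this:

#include <stdio.h>

#define CLAMP(v, lo, hi) ((v) < (lo) ? (lo) : (v) > (hi) ? (hi) : (v))

struct xfer {
	const unsigned char *txbuf;
	unsigned char *rxbuf;
	int tx_bytes;
	int rx_bytes;
};

/* Move up to ntx bytes out and nrx bytes in, never past the transfer end. */
static void process_fifo(struct xfer *x, int ntx, int nrx)
{
	ntx = CLAMP(ntx, 0, x->tx_bytes);
	nrx = CLAMP(nrx, 0, x->rx_bytes);

	x->tx_bytes -= ntx;
	x->rx_bytes -= nrx;

	while (ntx || nrx) {
		if (ntx) {
			printf("tx %02x\n", x->txbuf ? *x->txbuf++ : 0);
			ntx--;
		}
		if (nrx) {
			if (x->rxbuf)
				*x->rxbuf++ = 0xaa;	/* pretend data read back */
			nrx--;
		}
	}
}

int main(void)
{
	unsigned char out[4] = { 1, 2, 3, 4 }, in[4] = { 0 };
	struct xfer x = { out, in, 4, 4 };

	process_fifo(&x, 16, 0);	/* request more than remains: clamped to 4 */
	process_fifo(&x, 0, 16);	/* drain what is left of the RX side */
	printf("remaining tx=%d rx=%d\n", x.tx_bytes, x.rx_bytes);
	return 0;
}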
 
@@ -381,33 +376,22 @@ static irqreturn_t cdns_spi_irq(int irq, void *dev_id)
                spi_finalize_current_transfer(ctlr);
                status = IRQ_HANDLED;
        } else if (intr_status & CDNS_SPI_IXR_TXOW) {
-               int trans_cnt = cdns_spi_read(xspi, CDNS_SPI_THLD);
+               int threshold = cdns_spi_read(xspi, CDNS_SPI_THLD);
+               int trans_cnt = xspi->rx_bytes - xspi->tx_bytes;
+
+               if (threshold > 1)
+                       trans_cnt -= threshold;
+
                /* Set threshold to one if the number of pending bytes is
                 * less than half the FIFO depth
                 */
                if (xspi->tx_bytes < xspi->tx_fifo_depth >> 1)
                        cdns_spi_write(xspi, CDNS_SPI_THLD, 1);
 
-               while (trans_cnt) {
-                       cdns_spi_read_rx_fifo(xspi, 1);
-
-                       if (xspi->tx_bytes) {
-                               if (xspi->txbuf)
-                                       cdns_spi_write(xspi, CDNS_SPI_TXD,
-                                                      *xspi->txbuf++);
-                               else
-                                       cdns_spi_write(xspi, CDNS_SPI_TXD, 0);
-                               xspi->tx_bytes--;
-                       }
-                       trans_cnt--;
-               }
-               if (!xspi->tx_bytes) {
-                       /* Fixed delay due to controller limitation with
-                        * RX_NEMPTY incorrect status
-                        * Xilinx AR:65885 contains more details
-                        */
-                       udelay(10);
-                       cdns_spi_read_rx_fifo(xspi, xspi->rx_bytes);
+               if (xspi->tx_bytes) {
+                       cdns_spi_process_fifo(xspi, trans_cnt, trans_cnt);
+               } else {
+                       cdns_spi_process_fifo(xspi, 0, trans_cnt);
                        cdns_spi_write(xspi, CDNS_SPI_IDR,
                                       CDNS_SPI_IXR_DEFAULT);
                        spi_finalize_current_transfer(ctlr);
@@ -450,16 +434,17 @@ static int cdns_transfer_one(struct spi_controller *ctlr,
        xspi->tx_bytes = transfer->len;
        xspi->rx_bytes = transfer->len;
 
-       if (!spi_controller_is_slave(ctlr))
+       if (!spi_controller_is_slave(ctlr)) {
                cdns_spi_setup_transfer(spi, transfer);
+       } else {
+               /* Set TX empty threshold to half of FIFO depth
+                * only if TX bytes are more than half FIFO depth.
+                */
+               if (xspi->tx_bytes > xspi->tx_fifo_depth)
+                       cdns_spi_write(xspi, CDNS_SPI_THLD, xspi->tx_fifo_depth >> 1);
+       }
 
-       /* Set TX empty threshold to half of FIFO depth
-        * only if TX bytes are more than half FIFO depth.
-        */
-       if (xspi->tx_bytes > (xspi->tx_fifo_depth >> 1))
-               cdns_spi_write(xspi, CDNS_SPI_THLD, xspi->tx_fifo_depth >> 1);
-
-       cdns_spi_fill_tx_fifo(xspi);
+       cdns_spi_process_fifo(xspi, xspi->tx_fifo_depth, 0);
        spi_transfer_delay_exec(transfer);
 
        cdns_spi_write(xspi, CDNS_SPI_IER, CDNS_SPI_IXR_DEFAULT);
index 5e6faa9..15f5e9c 100644 (file)
@@ -264,17 +264,17 @@ static void dw_spi_elba_set_cs(struct spi_device *spi, bool enable)
        struct regmap *syscon = dwsmmio->priv;
        u8 cs;
 
-       cs = spi->chip_select;
+       cs = spi_get_chipselect(spi, 0);
        if (cs < 2)
-               dw_spi_elba_override_cs(syscon, spi->chip_select, enable);
+               dw_spi_elba_override_cs(syscon, spi_get_chipselect(spi, 0), enable);
 
        /*
         * The DW SPI controller needs a native CS bit selected to start
         * the serial engine.
         */
-       spi->chip_select = 0;
+       spi_set_chipselect(spi, 0, 0);
        dw_spi_set_cs(spi, enable);
-       spi->chip_select = cs;
+       spi_set_chipselect(spi, 0, cs);
 }
 
 static int dw_spi_elba_init(struct platform_device *pdev,
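
The dw_spi_elba_set_cs() hunk swaps direct accesses to spi->chip_select for the spi_get_chipselect()/spi_set_chipselect() helpers while keeping the same save/override/restore sequence around dw_spi_set_cs(). The sketch below shows that sequence in isolation with generic accessor functions; the struct and helper names are illustrative only.

#include <stdio.h>

struct dev { unsigned cs[2]; };

static unsigned dev_get_cs(const struct dev *d, int idx)           { return d->cs[idx]; }
static void     dev_set_cs(struct dev *d, int idx, unsigned value) { d->cs[idx] = value; }

/* The controller only starts when a native CS (0) is selected, so
 * temporarily override the logical CS, kick the hardware, then restore. */
static void set_cs(struct dev *d, int enable)
{
	unsigned saved = dev_get_cs(d, 0);

	dev_set_cs(d, 0, 0);
	printf("hardware CS %s with native cs=%u\n",
	       enable ? "asserted" : "deasserted", dev_get_cs(d, 0));
	dev_set_cs(d, 0, saved);
}

int main(void)
{
	struct dev d = { { 1, 0 } };

	set_cs(&d, 1);
	printf("logical cs restored to %u\n", dev_get_cs(&d, 0));
	return 0;
}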
index 4339485..674cfe0 100644 (file)
@@ -1002,7 +1002,9 @@ static int dspi_transfer_one_message(struct spi_controller *ctlr,
 static int dspi_setup(struct spi_device *spi)
 {
        struct fsl_dspi *dspi = spi_controller_get_devdata(spi->controller);
+       u32 period_ns = DIV_ROUND_UP(NSEC_PER_SEC, spi->max_speed_hz);
        unsigned char br = 0, pbr = 0, pcssck = 0, cssck = 0;
+       u32 quarter_period_ns = DIV_ROUND_UP(period_ns, 4);
        u32 cs_sck_delay = 0, sck_cs_delay = 0;
        struct fsl_dspi_platform_data *pdata;
        unsigned char pasc = 0, asc = 0;
@@ -1031,6 +1033,19 @@ static int dspi_setup(struct spi_device *spi)
                sck_cs_delay = pdata->sck_cs_delay;
        }
 
+       /* Since tCSC and tASC apply to continuous transfers too, avoid SCK
+        * glitches of half a cycle by never allowing tCSC + tASC to go below
+        * half a SCK period.
+        */
+       if (cs_sck_delay < quarter_period_ns)
+               cs_sck_delay = quarter_period_ns;
+       if (sck_cs_delay < quarter_period_ns)
+               sck_cs_delay = quarter_period_ns;
+
+       dev_dbg(&spi->dev,
+               "DSPI controller timing params: CS-to-SCK delay %u ns, SCK-to-CS delay %u ns\n",
+               cs_sck_delay, sck_cs_delay);
+
        clkrate = clk_get_rate(dspi->clk);
        hz_to_spi_baud(&pbr, &br, spi->max_speed_hz, clkrate);
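
The dspi_setup() addition derives the SCK period from max_speed_hz with a round-up division and then raises both the CS-to-SCK and SCK-to-CS delays to at least a quarter period, so tCSC + tASC never drops below half an SCK period. The arithmetic can be checked in isolation; the sketch below reproduces it with the same rounding, using an assumed 10 MHz clock and example platform-data delays.

#include <stdio.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

int main(void)
{
	uint64_t max_speed_hz = 10000000;	/* example: 10 MHz SCK */
	uint64_t period_ns = DIV_ROUND_UP(NSEC_PER_SEC, max_speed_hz);
	uint64_t quarter_ns = DIV_ROUND_UP(period_ns, 4);
	uint64_t cs_sck_delay = 10, sck_cs_delay = 0;	/* example platform values */

	/* Never let either delay fall below a quarter period, so that the
	 * sum of the two delays is at least half an SCK period. */
	if (cs_sck_delay < quarter_ns)
		cs_sck_delay = quarter_ns;
	if (sck_cs_delay < quarter_ns)
		sck_cs_delay = quarter_ns;

	printf("period=%llu ns, quarter=%llu ns, tCSC=%llu ns, tASC=%llu ns\n",
	       (unsigned long long)period_ns, (unsigned long long)quarter_ns,
	       (unsigned long long)cs_sck_delay, (unsigned long long)sck_cs_delay);
	return 0;
}

At 10 MHz the period is 100 ns and the quarter period 25 ns, so each delay becomes at least 25 ns and their sum at least half a period.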
 
index f2341ab..4b70038 100644 (file)
@@ -910,9 +910,14 @@ static int fsl_lpspi_probe(struct platform_device *pdev)
        ret = fsl_lpspi_dma_init(&pdev->dev, fsl_lpspi, controller);
        if (ret == -EPROBE_DEFER)
                goto out_pm_get;
-
        if (ret < 0)
                dev_err(&pdev->dev, "dma setup error %d, use pio\n", ret);
+       else
+               /*
+                * disable the LPSPI module IRQ once DMA mode is enabled successfully,
+                * to prevent unexpected LPSPI module IRQ events.
+                */
+               disable_irq(irq);
 
        ret = devm_spi_register_controller(&pdev->dev, controller);
        if (ret < 0) {
index ba7be50..b293428 100644 (file)
@@ -294,6 +294,8 @@ static void spi_geni_set_cs(struct spi_device *slv, bool set_flag)
        mas->cs_flag = set_flag;
        /* set xfer_mode to FIFO to complete cs_done in isr */
        mas->cur_xfer_mode = GENI_SE_FIFO;
+       geni_se_select_mode(se, mas->cur_xfer_mode);
+
        reinit_completion(&mas->cs_done);
        if (set_flag)
                geni_se_setup_m_cmd(se, SPI_CS_ASSERT, 0);
@@ -644,6 +646,8 @@ static int spi_geni_init(struct spi_geni_master *mas)
                        geni_se_select_mode(se, GENI_GPI_DMA);
                        dev_dbg(mas->dev, "Using GPI DMA mode for SPI\n");
                        break;
+               } else if (ret == -EPROBE_DEFER) {
+                       goto out_pm;
                }
                /*
                 * in case of failure to get gpi dma channel, we can still do the
index 21c321f..d7432e2 100644 (file)
@@ -1275,6 +1275,9 @@ static int mtk_spi_remove(struct platform_device *pdev)
        struct mtk_spi *mdata = spi_master_get_devdata(master);
        int ret;
 
+       if (mdata->use_spimem && !completion_done(&mdata->spimem_done))
+               complete(&mdata->spimem_done);
+
        ret = pm_runtime_resume_and_get(&pdev->dev);
        if (ret < 0)
                return ret;
index 944ef6b..00e5e88 100644 (file)
@@ -1028,23 +1028,8 @@ static int spi_qup_probe(struct platform_device *pdev)
                return -ENXIO;
        }
 
-       ret = clk_prepare_enable(cclk);
-       if (ret) {
-               dev_err(dev, "cannot enable core clock\n");
-               return ret;
-       }
-
-       ret = clk_prepare_enable(iclk);
-       if (ret) {
-               clk_disable_unprepare(cclk);
-               dev_err(dev, "cannot enable iface clock\n");
-               return ret;
-       }
-
        master = spi_alloc_master(dev, sizeof(struct spi_qup));
        if (!master) {
-               clk_disable_unprepare(cclk);
-               clk_disable_unprepare(iclk);
                dev_err(dev, "cannot allocate master\n");
                return -ENOMEM;
        }
@@ -1092,6 +1077,19 @@ static int spi_qup_probe(struct platform_device *pdev)
        spin_lock_init(&controller->lock);
        init_completion(&controller->done);
 
+       ret = clk_prepare_enable(cclk);
+       if (ret) {
+               dev_err(dev, "cannot enable core clock\n");
+               goto error_dma;
+       }
+
+       ret = clk_prepare_enable(iclk);
+       if (ret) {
+               clk_disable_unprepare(cclk);
+               dev_err(dev, "cannot enable iface clock\n");
+               goto error_dma;
+       }
+
        iomode = readl_relaxed(base + QUP_IO_M_MODES);
 
        size = QUP_IO_M_OUTPUT_BLOCK_SIZE(iomode);
@@ -1121,7 +1119,7 @@ static int spi_qup_probe(struct platform_device *pdev)
        ret = spi_qup_set_state(controller, QUP_STATE_RESET);
        if (ret) {
                dev_err(dev, "cannot set RESET state\n");
-               goto error_dma;
+               goto error_clk;
        }
 
        writel_relaxed(0, base + QUP_OPERATIONAL);
@@ -1145,7 +1143,7 @@ static int spi_qup_probe(struct platform_device *pdev)
        ret = devm_request_irq(dev, irq, spi_qup_qup_irq,
                               IRQF_TRIGGER_HIGH, pdev->name, controller);
        if (ret)
-               goto error_dma;
+               goto error_clk;
 
        pm_runtime_set_autosuspend_delay(dev, MSEC_PER_SEC);
        pm_runtime_use_autosuspend(dev);
@@ -1160,11 +1158,12 @@ static int spi_qup_probe(struct platform_device *pdev)
 
 disable_pm:
        pm_runtime_disable(&pdev->dev);
+error_clk:
+       clk_disable_unprepare(cclk);
+       clk_disable_unprepare(iclk);
 error_dma:
        spi_qup_release_dma(master);
 error:
-       clk_disable_unprepare(cclk);
-       clk_disable_unprepare(iclk);
        spi_master_put(master);
        return ret;
 }
index 63de214..c079368 100644 (file)
@@ -373,7 +373,7 @@ static int ov2680_get_fmt(struct v4l2_subdev *sd,
 static int ov2680_detect(struct i2c_client *client)
 {
        struct i2c_adapter *adapter = client->adapter;
-       u32 high, low;
+       u32 high = 0, low = 0;
        int ret;
        u16 id;
        u8 revision;
@@ -383,7 +383,7 @@ static int ov2680_detect(struct i2c_client *client)
 
        ret = ov_read_reg8(client, OV2680_SC_CMMN_CHIP_ID_H, &high);
        if (ret) {
-               dev_err(&client->dev, "sensor_id_high = 0x%x\n", high);
+               dev_err(&client->dev, "sensor_id_high read failed (%d)\n", ret);
                return -ENODEV;
        }
        ret = ov_read_reg8(client, OV2680_SC_CMMN_CHIP_ID_L, &low);
index 32700cb..ca2efcc 100644 (file)
@@ -354,7 +354,7 @@ static int imx8mq_mipi_csi_start_stream(struct csi_state *state,
                                        struct v4l2_subdev_state *sd_state)
 {
        int ret;
-       u32 hs_settle;
+       u32 hs_settle = 0;
 
        ret = imx8mq_mipi_csi_sw_reset(state);
        if (ret)
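
The ov2680 and imx8mq hunks here, and the OP-TEE one further down, share one theme: a variable written only by a call that can fail must be initialized up front, otherwise error paths or later users consume indeterminate data. A short sketch with a hypothetical read_reg() stand-in:

#include <stdio.h>

/* Stand-in for a register read that may fail without touching *val. */
static int read_reg(int fail, unsigned *val)
{
	if (fail)
		return -1;
	*val = 0x26;
	return 0;
}

int main(void)
{
	unsigned high = 0, low = 0;	/* defined even if the reads fail */
	int ret;

	ret = read_reg(1, &high);
	if (ret)
		fprintf(stderr, "read failed (%d), high stays %u\n", ret, high);

	ret = read_reg(0, &low);
	printf("high=0x%02x low=0x%02x (ret=%d)\n", high, low, ret);
	return 0;
}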
index 67a0a1f..044e48e 100644 (file)
@@ -6,4 +6,3 @@ TODO:
        - make driver self-contained instead of being split between staging and
          arch/mips/cavium-octeon.
 
-Contact: Aaro Koskinen <aaro.koskinen@iki.fi>
index 834cce5..b516c28 100644 (file)
@@ -364,8 +364,6 @@ struct iscsi_np *iscsit_add_np(
        init_completion(&np->np_restart_comp);
        INIT_LIST_HEAD(&np->np_list);
 
-       timer_setup(&np->np_login_timer, iscsi_handle_login_thread_timeout, 0);
-
        ret = iscsi_target_setup_login_socket(np, sockaddr);
        if (ret != 0) {
                kfree(np);
index 274bdd7..90b870f 100644 (file)
@@ -811,59 +811,6 @@ void iscsi_post_login_handler(
        iscsit_dec_conn_usage_count(conn);
 }
 
-void iscsi_handle_login_thread_timeout(struct timer_list *t)
-{
-       struct iscsi_np *np = from_timer(np, t, np_login_timer);
-
-       spin_lock_bh(&np->np_thread_lock);
-       pr_err("iSCSI Login timeout on Network Portal %pISpc\n",
-                       &np->np_sockaddr);
-
-       if (np->np_login_timer_flags & ISCSI_TF_STOP) {
-               spin_unlock_bh(&np->np_thread_lock);
-               return;
-       }
-
-       if (np->np_thread)
-               send_sig(SIGINT, np->np_thread, 1);
-
-       np->np_login_timer_flags &= ~ISCSI_TF_RUNNING;
-       spin_unlock_bh(&np->np_thread_lock);
-}
-
-static void iscsi_start_login_thread_timer(struct iscsi_np *np)
-{
-       /*
-        * This used the TA_LOGIN_TIMEOUT constant because at this
-        * point we do not have access to ISCSI_TPG_ATTRIB(tpg)->login_timeout
-        */
-       spin_lock_bh(&np->np_thread_lock);
-       np->np_login_timer_flags &= ~ISCSI_TF_STOP;
-       np->np_login_timer_flags |= ISCSI_TF_RUNNING;
-       mod_timer(&np->np_login_timer, jiffies + TA_LOGIN_TIMEOUT * HZ);
-
-       pr_debug("Added timeout timer to iSCSI login request for"
-                       " %u seconds.\n", TA_LOGIN_TIMEOUT);
-       spin_unlock_bh(&np->np_thread_lock);
-}
-
-static void iscsi_stop_login_thread_timer(struct iscsi_np *np)
-{
-       spin_lock_bh(&np->np_thread_lock);
-       if (!(np->np_login_timer_flags & ISCSI_TF_RUNNING)) {
-               spin_unlock_bh(&np->np_thread_lock);
-               return;
-       }
-       np->np_login_timer_flags |= ISCSI_TF_STOP;
-       spin_unlock_bh(&np->np_thread_lock);
-
-       del_timer_sync(&np->np_login_timer);
-
-       spin_lock_bh(&np->np_thread_lock);
-       np->np_login_timer_flags &= ~ISCSI_TF_RUNNING;
-       spin_unlock_bh(&np->np_thread_lock);
-}
-
 int iscsit_setup_np(
        struct iscsi_np *np,
        struct sockaddr_storage *sockaddr)
@@ -1123,10 +1070,13 @@ static struct iscsit_conn *iscsit_alloc_conn(struct iscsi_np *np)
        spin_lock_init(&conn->nopin_timer_lock);
        spin_lock_init(&conn->response_queue_lock);
        spin_lock_init(&conn->state_lock);
+       spin_lock_init(&conn->login_worker_lock);
+       spin_lock_init(&conn->login_timer_lock);
 
        timer_setup(&conn->nopin_response_timer,
                    iscsit_handle_nopin_response_timeout, 0);
        timer_setup(&conn->nopin_timer, iscsit_handle_nopin_timeout, 0);
+       timer_setup(&conn->login_timer, iscsit_login_timeout, 0);
 
        if (iscsit_conn_set_transport(conn, np->np_transport) < 0)
                goto free_conn;
@@ -1304,7 +1254,7 @@ static int __iscsi_target_login_thread(struct iscsi_np *np)
                goto new_sess_out;
        }
 
-       iscsi_start_login_thread_timer(np);
+       iscsit_start_login_timer(conn, current);
 
        pr_debug("Moving to TARG_CONN_STATE_XPT_UP.\n");
        conn->conn_state = TARG_CONN_STATE_XPT_UP;
@@ -1417,8 +1367,6 @@ static int __iscsi_target_login_thread(struct iscsi_np *np)
        if (ret < 0)
                goto new_sess_out;
 
-       iscsi_stop_login_thread_timer(np);
-
        if (ret == 1) {
                tpg_np = conn->tpg_np;
 
@@ -1434,7 +1382,7 @@ static int __iscsi_target_login_thread(struct iscsi_np *np)
 new_sess_out:
        new_sess = true;
 old_sess_out:
-       iscsi_stop_login_thread_timer(np);
+       iscsit_stop_login_timer(conn);
        tpg_np = conn->tpg_np;
        iscsi_target_login_sess_out(conn, zero_tsih, new_sess);
        new_sess = false;
@@ -1448,7 +1396,6 @@ old_sess_out:
        return 1;
 
 exit:
-       iscsi_stop_login_thread_timer(np);
        spin_lock_bh(&np->np_thread_lock);
        np->np_thread_state = ISCSI_NP_THREAD_EXIT;
        spin_unlock_bh(&np->np_thread_lock);
index 24040c1..fa3fb5f 100644 (file)
@@ -535,25 +535,6 @@ static void iscsi_target_login_drop(struct iscsit_conn *conn, struct iscsi_login
        iscsi_target_login_sess_out(conn, zero_tsih, true);
 }
 
-struct conn_timeout {
-       struct timer_list timer;
-       struct iscsit_conn *conn;
-};
-
-static void iscsi_target_login_timeout(struct timer_list *t)
-{
-       struct conn_timeout *timeout = from_timer(timeout, t, timer);
-       struct iscsit_conn *conn = timeout->conn;
-
-       pr_debug("Entering iscsi_target_login_timeout >>>>>>>>>>>>>>>>>>>\n");
-
-       if (conn->login_kworker) {
-               pr_debug("Sending SIGINT to conn->login_kworker %s/%d\n",
-                        conn->login_kworker->comm, conn->login_kworker->pid);
-               send_sig(SIGINT, conn->login_kworker, 1);
-       }
-}
-
 static void iscsi_target_do_login_rx(struct work_struct *work)
 {
        struct iscsit_conn *conn = container_of(work,
@@ -562,12 +543,15 @@ static void iscsi_target_do_login_rx(struct work_struct *work)
        struct iscsi_np *np = login->np;
        struct iscsi_portal_group *tpg = conn->tpg;
        struct iscsi_tpg_np *tpg_np = conn->tpg_np;
-       struct conn_timeout timeout;
        int rc, zero_tsih = login->zero_tsih;
        bool state;
 
        pr_debug("entering iscsi_target_do_login_rx, conn: %p, %s:%d\n",
                        conn, current->comm, current->pid);
+
+       spin_lock(&conn->login_worker_lock);
+       set_bit(LOGIN_FLAGS_WORKER_RUNNING, &conn->login_flags);
+       spin_unlock(&conn->login_worker_lock);
        /*
         * If iscsi_target_do_login_rx() has been invoked by ->sk_data_ready()
         * before initial PDU processing in iscsi_target_start_negotiation()
@@ -597,19 +581,16 @@ static void iscsi_target_do_login_rx(struct work_struct *work)
                goto err;
        }
 
-       conn->login_kworker = current;
        allow_signal(SIGINT);
-
-       timeout.conn = conn;
-       timer_setup_on_stack(&timeout.timer, iscsi_target_login_timeout, 0);
-       mod_timer(&timeout.timer, jiffies + TA_LOGIN_TIMEOUT * HZ);
-       pr_debug("Starting login timer for %s/%d\n", current->comm, current->pid);
+       rc = iscsit_set_login_timer_kworker(conn, current);
+       if (rc < 0) {
+               /* The login timer has already expired */
+               pr_debug("iscsi_target_do_login_rx, login failed\n");
+               goto err;
+       }
 
        rc = conn->conn_transport->iscsit_get_login_rx(conn, login);
-       del_timer_sync(&timeout.timer);
-       destroy_timer_on_stack(&timeout.timer);
        flush_signals(current);
-       conn->login_kworker = NULL;
 
        if (rc < 0)
                goto err;
@@ -646,7 +627,17 @@ static void iscsi_target_do_login_rx(struct work_struct *work)
                if (iscsi_target_sk_check_and_clear(conn,
                                                    LOGIN_FLAGS_WRITE_ACTIVE))
                        goto err;
+
+               /*
+                * Set the login timer thread pointer to NULL to prevent the
+                * login process from getting stuck if the initiator
+                * stops sending data.
+                */
+               rc = iscsit_set_login_timer_kworker(conn, NULL);
+               if (rc < 0)
+                       goto err;
        } else if (rc == 1) {
+               iscsit_stop_login_timer(conn);
                cancel_delayed_work(&conn->login_work);
                iscsi_target_nego_release(conn);
                iscsi_post_login_handler(np, conn, zero_tsih);
@@ -656,6 +647,7 @@ static void iscsi_target_do_login_rx(struct work_struct *work)
 
 err:
        iscsi_target_restore_sock_callbacks(conn);
+       iscsit_stop_login_timer(conn);
        cancel_delayed_work(&conn->login_work);
        iscsi_target_login_drop(conn, login);
        iscsit_deaccess_np(np, tpg, tpg_np);
@@ -1130,6 +1122,7 @@ int iscsi_target_locate_portal(
        iscsi_target_set_sock_callbacks(conn);
 
        login->np = np;
+       conn->tpg = NULL;
 
        login_req = (struct iscsi_login_req *) login->req;
        payload_length = ntoh24(login_req->dlength);
@@ -1197,7 +1190,6 @@ int iscsi_target_locate_portal(
         */
        sessiontype = strncmp(s_buf, DISCOVERY, 9);
        if (!sessiontype) {
-               conn->tpg = iscsit_global->discovery_tpg;
                if (!login->leading_connection)
                        goto get_target;
 
@@ -1214,9 +1206,11 @@ int iscsi_target_locate_portal(
                 * Serialize access across the discovery struct iscsi_portal_group to
                 * process login attempt.
                 */
+               conn->tpg = iscsit_global->discovery_tpg;
                if (iscsit_access_np(np, conn->tpg) < 0) {
                        iscsit_tx_login_rsp(conn, ISCSI_STATUS_CLS_TARGET_ERR,
                                ISCSI_LOGIN_STATUS_SVC_UNAVAILABLE);
+                       conn->tpg = NULL;
                        ret = -1;
                        goto out;
                }
@@ -1368,14 +1362,30 @@ int iscsi_target_start_negotiation(
         * and perform connection cleanup now.
         */
        ret = iscsi_target_do_login(conn, login);
-       if (!ret && iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_INITIAL_PDU))
-               ret = -1;
+       if (!ret) {
+               spin_lock(&conn->login_worker_lock);
+
+               if (iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_INITIAL_PDU))
+                       ret = -1;
+               else if (!test_bit(LOGIN_FLAGS_WORKER_RUNNING, &conn->login_flags)) {
+                       if (iscsit_set_login_timer_kworker(conn, NULL) < 0) {
+                               /*
+                                * The timeout has expired already.
+                                * Schedule login_work to perform the cleanup.
+                                */
+                               schedule_delayed_work(&conn->login_work, 0);
+                       }
+               }
+
+               spin_unlock(&conn->login_worker_lock);
+       }
 
        if (ret < 0) {
                iscsi_target_restore_sock_callbacks(conn);
                iscsi_remove_failed_auth_entry(conn);
        }
        if (ret != 0) {
+               iscsit_stop_login_timer(conn);
                cancel_delayed_work_sync(&conn->login_work);
                iscsi_target_nego_release(conn);
        }
index 26dc8ed..b14835f 100644 (file)
@@ -1040,6 +1040,57 @@ void iscsit_stop_nopin_timer(struct iscsit_conn *conn)
        spin_unlock_bh(&conn->nopin_timer_lock);
 }
 
+void iscsit_login_timeout(struct timer_list *t)
+{
+       struct iscsit_conn *conn = from_timer(conn, t, login_timer);
+       struct iscsi_login *login = conn->login;
+
+       pr_debug("Entering iscsi_target_login_timeout >>>>>>>>>>>>>>>>>>>\n");
+
+       spin_lock_bh(&conn->login_timer_lock);
+       login->login_failed = 1;
+
+       if (conn->login_kworker) {
+               pr_debug("Sending SIGINT to conn->login_kworker %s/%d\n",
+                        conn->login_kworker->comm, conn->login_kworker->pid);
+               send_sig(SIGINT, conn->login_kworker, 1);
+       } else {
+               schedule_delayed_work(&conn->login_work, 0);
+       }
+       spin_unlock_bh(&conn->login_timer_lock);
+}
+
+void iscsit_start_login_timer(struct iscsit_conn *conn, struct task_struct *kthr)
+{
+       pr_debug("Login timer started\n");
+
+       conn->login_kworker = kthr;
+       mod_timer(&conn->login_timer, jiffies + TA_LOGIN_TIMEOUT * HZ);
+}
+
+int iscsit_set_login_timer_kworker(struct iscsit_conn *conn, struct task_struct *kthr)
+{
+       struct iscsi_login *login = conn->login;
+       int ret = 0;
+
+       spin_lock_bh(&conn->login_timer_lock);
+       if (login->login_failed) {
+               /* The timer has already expired */
+               ret = -1;
+       } else {
+               conn->login_kworker = kthr;
+       }
+       spin_unlock_bh(&conn->login_timer_lock);
+
+       return ret;
+}
+
+void iscsit_stop_login_timer(struct iscsit_conn *conn)
+{
+       pr_debug("Login timer stopped\n");
+       timer_delete_sync(&conn->login_timer);
+}
+
 int iscsit_send_tx_data(
        struct iscsit_cmd *cmd,
        struct iscsit_conn *conn,
index 33ea799..24b8e57 100644 (file)
@@ -56,6 +56,10 @@ extern void iscsit_handle_nopin_timeout(struct timer_list *t);
 extern void __iscsit_start_nopin_timer(struct iscsit_conn *);
 extern void iscsit_start_nopin_timer(struct iscsit_conn *);
 extern void iscsit_stop_nopin_timer(struct iscsit_conn *);
+extern void iscsit_login_timeout(struct timer_list *t);
+extern void iscsit_start_login_timer(struct iscsit_conn *, struct task_struct *kthr);
+extern void iscsit_stop_login_timer(struct iscsit_conn *);
+extern int iscsit_set_login_timer_kworker(struct iscsit_conn *, struct task_struct *kthr);
 extern int iscsit_send_tx_data(struct iscsit_cmd *, struct iscsit_conn *, int);
 extern int iscsit_fe_sendpage_sg(struct iscsit_cmd *, struct iscsit_conn *);
 extern int iscsit_tx_login_rsp(struct iscsit_conn *, u8, u8);
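
Taken together, the login changes replace the per-portal timer and the on-stack conn_timeout with a single per-connection login timer: a kworker registers itself via iscsit_set_login_timer_kworker() before blocking, registration fails if the timer has already expired, and an expiring timer either signals the registered kworker or schedules login_work for cleanup. The userspace model below, using a pthread and a mutex in place of the timer, spinlock and SIGINT, illustrates only that registration/expiry handshake, not the surrounding login state machine.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

struct conn {
	pthread_mutex_t lock;
	bool timed_out;
	bool worker_registered;
};

/* Register (or deregister) the worker; refuse if the timer already fired. */
static int set_timer_worker(struct conn *c, bool registered)
{
	int ret = 0;

	pthread_mutex_lock(&c->lock);
	if (c->timed_out)
		ret = -1;			/* too late: timer already expired */
	else
		c->worker_registered = registered;
	pthread_mutex_unlock(&c->lock);
	return ret;
}

static void *login_timer(void *arg)
{
	struct conn *c = arg;

	usleep(1000);				/* stand-in for the login timeout */
	pthread_mutex_lock(&c->lock);
	c->timed_out = true;
	if (c->worker_registered)
		puts("timer: interrupt the registered worker");
	else
		puts("timer: no worker, schedule cleanup work");
	pthread_mutex_unlock(&c->lock);
	return NULL;
}

int main(void)
{
	struct conn c = { PTHREAD_MUTEX_INITIALIZER, false, false };
	pthread_t t;

	pthread_create(&t, NULL, login_timer, &c);

	if (set_timer_worker(&c, true) == 0) {	/* register before blocking I/O */
		puts("worker: registered, would block in login rx");
		set_timer_worker(&c, false);	/* deregister when done */
	} else {
		puts("worker: timer already expired, abort login");
	}

	pthread_join(&t, NULL);
	return 0;
}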
index cc838ff..3c462d6 100644 (file)
@@ -90,7 +90,7 @@ static int iblock_configure_device(struct se_device *dev)
        struct request_queue *q;
        struct block_device *bd = NULL;
        struct blk_integrity *bi;
-       fmode_t mode;
+       blk_mode_t mode = BLK_OPEN_READ;
        unsigned int max_write_zeroes_sectors;
        int ret;
 
@@ -108,13 +108,12 @@ static int iblock_configure_device(struct se_device *dev)
        pr_debug( "IBLOCK: Claiming struct block_device: %s\n",
                        ib_dev->ibd_udev_path);
 
-       mode = FMODE_READ|FMODE_EXCL;
        if (!ib_dev->ibd_readonly)
-               mode |= FMODE_WRITE;
+               mode |= BLK_OPEN_WRITE;
        else
                dev->dev_flags |= DF_READ_ONLY;
 
-       bd = blkdev_get_by_path(ib_dev->ibd_udev_path, mode, ib_dev);
+       bd = blkdev_get_by_path(ib_dev->ibd_udev_path, mode, ib_dev, NULL);
        if (IS_ERR(bd)) {
                ret = PTR_ERR(bd);
                goto out_free_bioset;
@@ -175,7 +174,7 @@ static int iblock_configure_device(struct se_device *dev)
        return 0;
 
 out_blkdev_put:
-       blkdev_put(ib_dev->ibd_bd, FMODE_WRITE|FMODE_READ|FMODE_EXCL);
+       blkdev_put(ib_dev->ibd_bd, ib_dev);
 out_free_bioset:
        bioset_exit(&ib_dev->ibd_bio_set);
 out:
@@ -201,7 +200,7 @@ static void iblock_destroy_device(struct se_device *dev)
        struct iblock_dev *ib_dev = IBLOCK_DEV(dev);
 
        if (ib_dev->ibd_bd != NULL)
-               blkdev_put(ib_dev->ibd_bd, FMODE_WRITE|FMODE_READ|FMODE_EXCL);
+               blkdev_put(ib_dev->ibd_bd, ib_dev);
        bioset_exit(&ib_dev->ibd_bio_set);
 }
 
index e742554..0d4f096 100644 (file)
@@ -366,8 +366,8 @@ static int pscsi_create_type_disk(struct se_device *dev, struct scsi_device *sd)
         * Claim exclusive struct block_device access to struct scsi_device
         * for TYPE_DISK and TYPE_ZBC using supplied udev_path
         */
-       bd = blkdev_get_by_path(dev->udev_path,
-                               FMODE_WRITE|FMODE_READ|FMODE_EXCL, pdv);
+       bd = blkdev_get_by_path(dev->udev_path, BLK_OPEN_WRITE | BLK_OPEN_READ,
+                               pdv, NULL);
        if (IS_ERR(bd)) {
                pr_err("pSCSI: blkdev_get_by_path() failed\n");
                scsi_device_put(sd);
@@ -377,7 +377,7 @@ static int pscsi_create_type_disk(struct se_device *dev, struct scsi_device *sd)
 
        ret = pscsi_add_device_to_list(dev, sd);
        if (ret) {
-               blkdev_put(pdv->pdv_bd, FMODE_WRITE|FMODE_READ|FMODE_EXCL);
+               blkdev_put(pdv->pdv_bd, pdv);
                scsi_device_put(sd);
                return ret;
        }
@@ -565,8 +565,7 @@ static void pscsi_destroy_device(struct se_device *dev)
                 */
                if ((sd->type == TYPE_DISK || sd->type == TYPE_ZBC) &&
                    pdv->pdv_bd) {
-                       blkdev_put(pdv->pdv_bd,
-                                  FMODE_WRITE|FMODE_READ|FMODE_EXCL);
+                       blkdev_put(pdv->pdv_bd, pdv);
                        pdv->pdv_bd = NULL;
                }
                /*
index 86adff2..687adc9 100644 (file)
@@ -504,6 +504,8 @@ target_setup_session(struct se_portal_group *tpg,
 
 free_sess:
        transport_free_session(sess);
+       return ERR_PTR(rc);
+
 free_cnt:
        target_free_cmd_counter(cmd_cnt);
        return ERR_PTR(rc);
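
The one-line target_setup_session() fix adds a return before the free_cnt label; previously the free_sess path fell straight through into free_cnt, so both cleanup actions ran for a single failure. The sketch below shows the fallthrough hazard with two cleanup labels and a hypothetical session/counter pair.

#include <stdio.h>
#include <stdlib.h>

struct session { void *counter; };

static void free_counter(void *c)
{
	puts("counter freed");
	free(c);
}

/* Freeing the session also releases the counter it owns. */
static void free_session(struct session *s)
{
	free_counter(s->counter);
	free(s);
}

static int setup(int fail_sess, int fail_counter_only)
{
	void *cnt;
	struct session *sess;

	cnt = malloc(8);
	if (!cnt)
		return -1;

	sess = malloc(sizeof(*sess));
	if (!sess || fail_counter_only)
		goto free_cnt;
	sess->counter = cnt;

	if (fail_sess)
		goto free_sess;
	free_session(sess);		/* demo only: normally kept on success */
	return 0;

free_sess:
	free_session(sess);
	return -1;			/* without this return we would fall
					 * through and free the counter twice */
free_cnt:
	free(sess);
	free_counter(cnt);
	return -1;
}

int main(void)
{
	setup(1, 0);			/* exercise the free_sess path */
	return 0;
}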
index ff48c3e..e2014e2 100644 (file)
@@ -118,16 +118,18 @@ struct tee_cmd_unmap_shared_mem {
 
 /**
  * struct tee_cmd_load_ta - load Trusted Application (TA) binary into TEE
- * @low_addr:    [in] bits [31:0] of the physical address of the TA binary
- * @hi_addr:     [in] bits [63:32] of the physical address of the TA binary
- * @size:        [in] size of TA binary in bytes
- * @ta_handle:   [out] return handle of the loaded TA
+ * @low_addr:       [in] bits [31:0] of the physical address of the TA binary
+ * @hi_addr:        [in] bits [63:32] of the physical address of the TA binary
+ * @size:           [in] size of TA binary in bytes
+ * @ta_handle:      [out] return handle of the loaded TA
+ * @return_origin:  [out] origin of return code after TEE processing
  */
 struct tee_cmd_load_ta {
        u32 low_addr;
        u32 hi_addr;
        u32 size;
        u32 ta_handle;
+       u32 return_origin;
 };
 
 /**
index e8cd9aa..e9b63dc 100644 (file)
@@ -423,19 +423,23 @@ int handle_load_ta(void *data, u32 size, struct tee_ioctl_open_session_arg *arg)
        if (ret) {
                arg->ret_origin = TEEC_ORIGIN_COMMS;
                arg->ret = TEEC_ERROR_COMMUNICATION;
-       } else if (arg->ret == TEEC_SUCCESS) {
-               ret = get_ta_refcount(load_cmd.ta_handle);
-               if (!ret) {
-                       arg->ret_origin = TEEC_ORIGIN_COMMS;
-                       arg->ret = TEEC_ERROR_OUT_OF_MEMORY;
-
-                       /* Unload the TA on error */
-                       unload_cmd.ta_handle = load_cmd.ta_handle;
-                       psp_tee_process_cmd(TEE_CMD_ID_UNLOAD_TA,
-                                           (void *)&unload_cmd,
-                                           sizeof(unload_cmd), &ret);
-               } else {
-                       set_session_id(load_cmd.ta_handle, 0, &arg->session);
+       } else {
+               arg->ret_origin = load_cmd.return_origin;
+
+               if (arg->ret == TEEC_SUCCESS) {
+                       ret = get_ta_refcount(load_cmd.ta_handle);
+                       if (!ret) {
+                               arg->ret_origin = TEEC_ORIGIN_COMMS;
+                               arg->ret = TEEC_ERROR_OUT_OF_MEMORY;
+
+                               /* Unload the TA on error */
+                               unload_cmd.ta_handle = load_cmd.ta_handle;
+                               psp_tee_process_cmd(TEE_CMD_ID_UNLOAD_TA,
+                                                   (void *)&unload_cmd,
+                                                   sizeof(unload_cmd), &ret);
+                       } else {
+                               set_session_id(load_cmd.ta_handle, 0, &arg->session);
+                       }
                }
        }
        mutex_unlock(&ta_refcount_mutex);
index 49702cb..3861ae0 100644 (file)
@@ -1004,8 +1004,10 @@ static u32 get_async_notif_value(optee_invoke_fn *invoke_fn, bool *value_valid,
 
        invoke_fn(OPTEE_SMC_GET_ASYNC_NOTIF_VALUE, 0, 0, 0, 0, 0, 0, 0, &res);
 
-       if (res.a0)
+       if (res.a0) {
+               *value_valid = false;
                return 0;
+       }
        *value_valid = (res.a2 & OPTEE_SMC_ASYNC_NOTIF_VALUE_VALID);
        *value_pending = (res.a2 & OPTEE_SMC_ASYNC_NOTIF_VALUE_PENDING);
        return res.a1;
index 4cd7ab7..19a4b33 100644 (file)
@@ -130,6 +130,14 @@ config THERMAL_DEFAULT_GOV_POWER_ALLOCATOR
          system and device power allocation. This governor can only
          operate on cooling devices that implement the power API.
 
+config THERMAL_DEFAULT_GOV_BANG_BANG
+       bool "bang_bang"
+       depends on THERMAL_GOV_BANG_BANG
+       help
+         Use the bang_bang governor as default. This throttles the
+         devices one step at a time, taking into account the trip
+         point hysteresis.
+
 endchoice
 
 config THERMAL_GOV_FAIR_SHARE
index 3abc2dc..756b218 100644 (file)
@@ -282,8 +282,7 @@ static int amlogic_thermal_probe(struct platform_device *pdev)
                return ret;
        }
 
-       if (devm_thermal_add_hwmon_sysfs(&pdev->dev, pdata->tzd))
-               dev_warn(&pdev->dev, "Failed to add hwmon sysfs attributes\n");
+       devm_thermal_add_hwmon_sysfs(&pdev->dev, pdata->tzd);
 
        ret = amlogic_thermal_initialize(pdata);
        if (ret)
index 0e8dfa6..9f6dc4f 100644 (file)
@@ -231,7 +231,7 @@ static void armada380_init(struct platform_device *pdev,
        regmap_write(priv->syscon, data->syscon_control0_off, reg);
 }
 
-static void armada_ap806_init(struct platform_device *pdev,
+static void armada_ap80x_init(struct platform_device *pdev,
                              struct armada_thermal_priv *priv)
 {
        struct armada_thermal_data *data = priv->data;
@@ -614,7 +614,7 @@ static const struct armada_thermal_data armada380_data = {
 };
 
 static const struct armada_thermal_data armada_ap806_data = {
-       .init = armada_ap806_init,
+       .init = armada_ap80x_init,
        .is_valid_bit = BIT(16),
        .temp_shift = 0,
        .temp_mask = 0x3ff,
@@ -637,6 +637,30 @@ static const struct armada_thermal_data armada_ap806_data = {
        .cpu_nr = 4,
 };
 
+static const struct armada_thermal_data armada_ap807_data = {
+       .init = armada_ap80x_init,
+       .is_valid_bit = BIT(16),
+       .temp_shift = 0,
+       .temp_mask = 0x3ff,
+       .thresh_shift = 3,
+       .hyst_shift = 19,
+       .hyst_mask = 0x3,
+       .coef_b = -128900LL,
+       .coef_m = 394ULL,
+       .coef_div = 1,
+       .inverted = true,
+       .signed_sample = true,
+       .syscon_control0_off = 0x84,
+       .syscon_control1_off = 0x88,
+       .syscon_status_off = 0x8C,
+       .dfx_irq_cause_off = 0x108,
+       .dfx_irq_mask_off = 0x10C,
+       .dfx_overheat_irq = BIT(22),
+       .dfx_server_irq_mask_off = 0x104,
+       .dfx_server_irq_en = BIT(1),
+       .cpu_nr = 4,
+};
+
 static const struct armada_thermal_data armada_cp110_data = {
        .init = armada_cp110_init,
        .is_valid_bit = BIT(10),
@@ -681,6 +705,10 @@ static const struct of_device_id armada_thermal_id_table[] = {
                .data       = &armada_ap806_data,
        },
        {
+               .compatible = "marvell,armada-ap807-thermal",
+               .data       = &armada_ap807_data,
+       },
+       {
                .compatible = "marvell,armada-cp110-thermal",
                .data       = &armada_cp110_data,
        },
index d8005e9..d4b4086 100644 (file)
@@ -343,8 +343,7 @@ static int imx8mm_tmu_probe(struct platform_device *pdev)
                }
                tmu->sensors[i].hw_id = i;
 
-               if (devm_thermal_add_hwmon_sysfs(&pdev->dev, tmu->sensors[i].tzd))
-                       dev_warn(&pdev->dev, "failed to add hwmon sysfs attributes\n");
+               devm_thermal_add_hwmon_sysfs(&pdev->dev, tmu->sensors[i].tzd);
        }
 
        platform_set_drvdata(pdev, tmu);
index 839bb99..8d6b4ef 100644 (file)
@@ -116,8 +116,7 @@ static int imx_sc_thermal_probe(struct platform_device *pdev)
                        return ret;
                }
 
-               if (devm_thermal_add_hwmon_sysfs(&pdev->dev, sensor->tzd))
-                       dev_warn(&pdev->dev, "failed to add hwmon sysfs attributes\n");
+               devm_thermal_add_hwmon_sysfs(&pdev->dev, sensor->tzd);
        }
 
        return 0;
index 01b8033..dc519a6 100644 (file)
@@ -203,6 +203,151 @@ end:
 }
 EXPORT_SYMBOL(acpi_parse_art);
 
+/*
+ * acpi_parse_psvt - Passive Table (PSVT) for passive cooling
+ *
+ * @handle: ACPI handle of the device which contains PSVT
+ * @psvt_count: the number of valid entries resulting from parsing the PSVT
+ * @psvtp: pointer to array of psvt entries
+ *
+ */
+static int acpi_parse_psvt(acpi_handle handle, int *psvt_count, struct psvt **psvtp)
+{
+       struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
+       int nr_bad_entries = 0, revision = 0;
+       union acpi_object *p;
+       acpi_status status;
+       int i, result = 0;
+       struct psvt *psvts;
+
+       if (!acpi_has_method(handle, "PSVT"))
+               return -ENODEV;
+
+       status = acpi_evaluate_object(handle, "PSVT", NULL, &buffer);
+       if (ACPI_FAILURE(status))
+               return -ENODEV;
+
+       p = buffer.pointer;
+       if (!p || (p->type != ACPI_TYPE_PACKAGE)) {
+               result = -EFAULT;
+               goto end;
+       }
+
+       /* first package is the revision number */
+       if (p->package.count > 0) {
+               union acpi_object *prev = &(p->package.elements[0]);
+
+               if (prev->type == ACPI_TYPE_INTEGER)
+                       revision = (int)prev->integer.value;
+       } else {
+               result = -EFAULT;
+               goto end;
+       }
+
+       /* Support only version 2 */
+       if (revision != 2) {
+               result = -EFAULT;
+               goto end;
+       }
+
+       *psvt_count = p->package.count - 1;
+       if (!*psvt_count) {
+               result = -EFAULT;
+               goto end;
+       }
+
+       psvts = kcalloc(*psvt_count, sizeof(*psvts), GFP_KERNEL);
+       if (!psvts) {
+               result = -ENOMEM;
+               goto end;
+       }
+
+       /* Start index is 1 because the first package is the revision number */
+       for (i = 1; i < p->package.count; i++) {
+               struct acpi_buffer psvt_int_format = { sizeof("RRNNNNNNNNNN"), "RRNNNNNNNNNN" };
+               struct acpi_buffer psvt_str_format = { sizeof("RRNNNNNSNNNN"), "RRNNNNNSNNNN" };
+               union acpi_object *package = &(p->package.elements[i]);
+               struct psvt *psvt = &psvts[i - 1 - nr_bad_entries];
+               struct acpi_buffer *psvt_format = &psvt_int_format;
+               struct acpi_buffer element = { 0, NULL };
+               union acpi_object *knob;
+               struct acpi_device *res;
+               struct psvt *psvt_ptr;
+
+               element.length = ACPI_ALLOCATE_BUFFER;
+               element.pointer = NULL;
+
+               if (package->package.count >= ACPI_NR_PSVT_ELEMENTS) {
+                       knob = &(package->package.elements[ACPI_PSVT_CONTROL_KNOB]);
+               } else {
+                       nr_bad_entries++;
+                       pr_info("PSVT package %d is invalid, ignored\n", i);
+                       continue;
+               }
+
+               if (knob->type == ACPI_TYPE_STRING) {
+                       psvt_format = &psvt_str_format;
+                       if (knob->string.length > ACPI_LIMIT_STR_MAX_LEN - 1) {
+                               pr_info("PSVT package %d limit string len exceeds max\n", i);
+                               knob->string.length = ACPI_LIMIT_STR_MAX_LEN - 1;
+                       }
+               }
+
+               status = acpi_extract_package(&(p->package.elements[i]), psvt_format, &element);
+               if (ACPI_FAILURE(status)) {
+                       nr_bad_entries++;
+                       pr_info("PSVT package %d is invalid, ignored\n", i);
+                       continue;
+               }
+
+               psvt_ptr = (struct psvt *)element.pointer;
+
+               memcpy(psvt, psvt_ptr, sizeof(*psvt));
+
+               /* The limit element can be string or U64 */
+               psvt->control_knob_type = (u64)knob->type;
+
+               if (knob->type == ACPI_TYPE_STRING) {
+                       memset(&psvt->limit, 0, sizeof(u64));
+                       strncpy(psvt->limit.string, psvt_ptr->limit.str_ptr, knob->string.length);
+               } else {
+                       psvt->limit.integer = psvt_ptr->limit.integer;
+               }
+
+               kfree(element.pointer);
+
+               res = acpi_fetch_acpi_dev(psvt->source);
+               if (!res) {
+                       nr_bad_entries++;
+                       pr_info("Failed to get source ACPI device\n");
+                       continue;
+               }
+
+               res = acpi_fetch_acpi_dev(psvt->target);
+               if (!res) {
+                       nr_bad_entries++;
+                       pr_info("Failed to get target ACPI device\n");
+                       continue;
+               }
+       }
+
+       /* don't count bad entries */
+       *psvt_count -= nr_bad_entries;
+
+       if (!*psvt_count) {
+               result = -EFAULT;
+               kfree(psvts);
+               goto end;
+       }
+
+       *psvtp = psvts;
+
+       return 0;
+
+end:
+       kfree(buffer.pointer);
+       return result;
+}
 
 /* get device name from acpi handle */
 static void get_single_name(acpi_handle handle, char *name)
@@ -289,6 +434,57 @@ free_trt:
        return ret;
 }
 
+static int fill_psvt(char __user *ubuf)
+{
+       int i, ret, count, psvt_len;
+       union psvt_object *psvt_user;
+       struct psvt *psvts;
+
+       ret = acpi_parse_psvt(acpi_thermal_rel_handle, &count, &psvts);
+       if (ret)
+               return ret;
+
+       psvt_len = count * sizeof(*psvt_user);
+
+       psvt_user = kzalloc(psvt_len, GFP_KERNEL);
+       if (!psvt_user) {
+               ret = -ENOMEM;
+               goto free_psvt;
+       }
+
+       /* now fill in user psvt data */
+       for (i = 0; i < count; i++) {
+               /* userspace psvt needs device name instead of acpi reference */
+               get_single_name(psvts[i].source, psvt_user[i].source_device);
+               get_single_name(psvts[i].target, psvt_user[i].target_device);
+
+               psvt_user[i].priority = psvts[i].priority;
+               psvt_user[i].sample_period = psvts[i].sample_period;
+               psvt_user[i].passive_temp = psvts[i].passive_temp;
+               psvt_user[i].source_domain = psvts[i].source_domain;
+               psvt_user[i].control_knob = psvts[i].control_knob;
+               psvt_user[i].step_size = psvts[i].step_size;
+               psvt_user[i].limit_coeff = psvts[i].limit_coeff;
+               psvt_user[i].unlimit_coeff = psvts[i].unlimit_coeff;
+               psvt_user[i].control_knob_type = psvts[i].control_knob_type;
+               if (psvt_user[i].control_knob_type == ACPI_TYPE_STRING)
+                       strncpy(psvt_user[i].limit.string, psvts[i].limit.string,
+                               ACPI_LIMIT_STR_MAX_LEN);
+               else
+                       psvt_user[i].limit.integer = psvts[i].limit.integer;
+
+       }
+
+       if (copy_to_user(ubuf, psvt_user, psvt_len))
+               ret = -EFAULT;
+
+       kfree(psvt_user);
+
+free_psvt:
+       kfree(psvts);
+       return ret;
+}
+
 static long acpi_thermal_rel_ioctl(struct file *f, unsigned int cmd,
                                   unsigned long __arg)
 {
@@ -298,6 +494,7 @@ static long acpi_thermal_rel_ioctl(struct file *f, unsigned int cmd,
        char __user *arg = (void __user *)__arg;
        struct trt *trts = NULL;
        struct art *arts = NULL;
+       struct psvt *psvts;
 
        switch (cmd) {
        case ACPI_THERMAL_GET_TRT_COUNT:
@@ -336,6 +533,27 @@ static long acpi_thermal_rel_ioctl(struct file *f, unsigned int cmd,
        case ACPI_THERMAL_GET_ART:
                return fill_art(arg);
 
+       case ACPI_THERMAL_GET_PSVT_COUNT:
+               ret = acpi_parse_psvt(acpi_thermal_rel_handle, &count, &psvts);
+               if (!ret) {
+                       kfree(psvts);
+                       return put_user(count, (unsigned long __user *)__arg);
+               }
+               return ret;
+
+       case ACPI_THERMAL_GET_PSVT_LEN:
+               /* total length of the data retrieved (count * PSVT entry size) */
+               ret = acpi_parse_psvt(acpi_thermal_rel_handle, &count, &psvts);
+               length = count * sizeof(union psvt_object);
+               if (!ret) {
+                       kfree(psvts);
+                       return put_user(length, (unsigned long __user *)__arg);
+               }
+               return ret;
+
+       case ACPI_THERMAL_GET_PSVT:
+               return fill_psvt(arg);
+
        default:
                return -ENOTTY;
        }
index 78d9424..ac376d8 100644 (file)
 #define ACPI_THERMAL_GET_TRT   _IOR(ACPI_THERMAL_MAGIC, 5, unsigned long)
 #define ACPI_THERMAL_GET_ART   _IOR(ACPI_THERMAL_MAGIC, 6, unsigned long)
 
+/*
+ * ACPI_THERMAL_GET_PSVT_COUNT = Number of PSVT entries
+ * ACPI_THERMAL_GET_PSVT_LEN = Total return data size (PSVT count x each
+ * PSVT entry size)
+ * ACPI_THERMAL_GET_PSVT = Get the data as an array of psvt_objects
+ */
+#define ACPI_THERMAL_GET_PSVT_LEN _IOR(ACPI_THERMAL_MAGIC, 7, unsigned long)
+#define ACPI_THERMAL_GET_PSVT_COUNT _IOR(ACPI_THERMAL_MAGIC, 8, unsigned long)
+#define ACPI_THERMAL_GET_PSVT  _IOR(ACPI_THERMAL_MAGIC, 9, unsigned long)
+
 struct art {
        acpi_handle source;
        acpi_handle target;
@@ -43,6 +53,32 @@ struct trt {
        u64 reserved4;
 } __packed;
 
+#define ACPI_NR_PSVT_ELEMENTS  12
+#define ACPI_PSVT_CONTROL_KNOB 7
+#define ACPI_LIMIT_STR_MAX_LEN 8
+
+struct psvt {
+       acpi_handle source;
+       acpi_handle target;
+       u64 priority;
+       u64 sample_period;
+       u64 passive_temp;
+       u64 source_domain;
+       u64 control_knob;
+       union {
+               /* For limit_type = ACPI_TYPE_INTEGER */
+               u64 integer;
+               /* For limit_type = ACPI_TYPE_STRING */
+               char string[ACPI_LIMIT_STR_MAX_LEN];
+               char *str_ptr;
+       } limit;
+       u64 step_size;
+       u64 limit_coeff;
+       u64 unlimit_coeff;
+       /* Spec calls this field reserved, so we borrow it for type info */
+       u64 control_knob_type; /* ACPI_TYPE_STRING or ACPI_TYPE_INTEGER */
+} __packed;
+
 #define ACPI_NR_ART_ELEMENTS 13
 /* for usrspace */
 union art_object {
@@ -77,6 +113,27 @@ union trt_object {
        u64 __data[8];
 };
 
+union psvt_object {
+       struct {
+               char source_device[8];
+               char target_device[8];
+               u64 priority;
+               u64 sample_period;
+               u64 passive_temp;
+               u64 source_domain;
+               u64 control_knob;
+               union {
+                       u64 integer;
+                       char string[ACPI_LIMIT_STR_MAX_LEN];
+               } limit;
+               u64 step_size;
+               u64 limit_coeff;
+               u64 unlimit_coeff;
+               u64 control_knob_type;
+       };
+       u64 __data[ACPI_NR_PSVT_ELEMENTS];
+};
+
 #ifdef __KERNEL__
 int acpi_thermal_rel_misc_device_add(acpi_handle handle);
 int acpi_thermal_rel_misc_device_remove(acpi_handle handle);
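
The new ioctls are meant to be used in sequence: ACPI_THERMAL_GET_PSVT_COUNT returns the number of entries, ACPI_THERMAL_GET_PSVT_LEN the total size in bytes (count times sizeof(union psvt_object)), and ACPI_THERMAL_GET_PSVT copies out the array itself. A hedged sketch of a userspace reader follows; the /dev node name and the locally copied uapi header are assumptions made for the example, and error handling is minimal.

/* Hypothetical userspace reader for the new PSVT ioctls. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include "acpi_thermal_rel.h"	/* assumed copy of the header changed above */

int main(void)
{
	unsigned long count = 0, len = 0;
	union psvt_object *psvt;
	int fd = open("/dev/acpi_thermal_rel", O_RDONLY);	/* assumed node name */

	if (fd < 0)
		return 1;
	if (ioctl(fd, ACPI_THERMAL_GET_PSVT_COUNT, &count) ||
	    ioctl(fd, ACPI_THERMAL_GET_PSVT_LEN, &len)) {
		close(fd);
		return 1;
	}

	psvt = malloc(len);				/* len == count * sizeof(*psvt) */
	if (psvt && !ioctl(fd, ACPI_THERMAL_GET_PSVT, psvt)) {
		for (unsigned long i = 0; i < count; i++)
			printf("%.8s -> %.8s, passive_temp=%llu\n",
			       psvt[i].source_device, psvt[i].target_device,
			       (unsigned long long)psvt[i].passive_temp);
	}

	free(psvt);
	close(fd);
	return 0;
}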
index 810231b..5e11642 100644 (file)
@@ -131,7 +131,7 @@ static ssize_t available_uuids_show(struct device *dev,
 
        for (i = 0; i < INT3400_THERMAL_MAXIMUM_UUID; i++) {
                if (priv->uuid_bitmap & (1 << i))
-                       length += sysfs_emit_at(buf, length, int3400_thermal_uuids[i]);
+                       length += sysfs_emit_at(buf, length, "%s\n", int3400_thermal_uuids[i]);
        }
 
        return length;
@@ -149,7 +149,7 @@ static ssize_t current_uuid_show(struct device *dev,
 
        for (i = 0; i <= INT3400_THERMAL_CRITICAL; i++) {
                if (priv->os_uuid_mask & BIT(i))
-                       length += sysfs_emit_at(buf, length, int3400_thermal_uuids[i]);
+                       length += sysfs_emit_at(buf, length, "%s\n", int3400_thermal_uuids[i]);
        }
 
        if (length)
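
The int3400 fix passes each UUID string through an explicit "%s\n" format instead of handing it to sysfs_emit_at() as the format itself; a string that happens to contain '%' would otherwise be parsed as conversion specifiers. The same point in two lines of plain printf():

#include <stdio.h>

int main(void)
{
	const char *s = "100% duty-cycle";	/* any string containing '%' */

	/* Wrong: the string is interpreted as a format, so '%'-sequences
	 * are parsed and may read garbage arguments. */
	/* printf(s); */

	/* Right: a constant format, the string is only data. */
	printf("%s\n", s);
	return 0;
}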
index a205221..013f163 100644 (file)
@@ -15,8 +15,8 @@ static const struct rapl_mmio_regs rapl_mmio_default = {
        .reg_unit = 0x5938,
        .regs[RAPL_DOMAIN_PACKAGE] = { 0x59a0, 0x593c, 0x58f0, 0, 0x5930},
        .regs[RAPL_DOMAIN_DRAM] = { 0x58e0, 0x58e8, 0x58ec, 0, 0},
-       .limits[RAPL_DOMAIN_PACKAGE] = 2,
-       .limits[RAPL_DOMAIN_DRAM] = 2,
+       .limits[RAPL_DOMAIN_PACKAGE] = BIT(POWER_LIMIT2),
+       .limits[RAPL_DOMAIN_DRAM] = BIT(POWER_LIMIT2),
 };
 
 static int rapl_mmio_cpu_online(unsigned int cpu)
@@ -27,9 +27,9 @@ static int rapl_mmio_cpu_online(unsigned int cpu)
        if (topology_physical_package_id(cpu))
                return 0;
 
-       rp = rapl_find_package_domain(cpu, &rapl_mmio_priv);
+       rp = rapl_find_package_domain(cpu, &rapl_mmio_priv, true);
        if (!rp) {
-               rp = rapl_add_package(cpu, &rapl_mmio_priv);
+               rp = rapl_add_package(cpu, &rapl_mmio_priv, true);
                if (IS_ERR(rp))
                        return PTR_ERR(rp);
        }
@@ -42,7 +42,7 @@ static int rapl_mmio_cpu_down_prep(unsigned int cpu)
        struct rapl_package *rp;
        int lead_cpu;
 
-       rp = rapl_find_package_domain(cpu, &rapl_mmio_priv);
+       rp = rapl_find_package_domain(cpu, &rapl_mmio_priv, true);
        if (!rp)
                return 0;
 
@@ -97,6 +97,7 @@ int proc_thermal_rapl_add(struct pci_dev *pdev, struct proc_thermal_device *proc
                                                rapl_regs->regs[domain][reg];
                rapl_mmio_priv.limits[domain] = rapl_regs->limits[domain];
        }
+       rapl_mmio_priv.type = RAPL_IF_MMIO;
        rapl_mmio_priv.reg_unit = (u64)proc_priv->mmio_base + rapl_regs->reg_unit;
 
        rapl_mmio_priv.read_raw = rapl_mmio_read_raw;
index f99dc7e..db97499 100644 (file)
@@ -398,7 +398,7 @@ struct intel_soc_dts_sensors *intel_soc_dts_iosf_init(
        spin_lock_init(&sensors->intr_notify_lock);
        mutex_init(&sensors->dts_update_lock);
        sensors->intr_type = intr_type;
-       sensors->tj_max = tj_max;
+       sensors->tj_max = tj_max * 1000;
        if (intr_type == INTEL_SOC_DTS_INTERRUPT_NONE)
                notification = false;
        else
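
The intel_soc_dts_iosf fix stores tj_max in millidegrees Celsius, the unit the rest of the thermal code works in, rather than in whole degrees. A two-line arithmetic check, with an assumed Tj,max of 90 °C and a reading 20 °C below it:

#include <stdio.h>

int main(void)
{
	int tj_max_c = 90;			/* example Tj,max in degrees C */
	int tj_max_mc = tj_max_c * 1000;	/* thermal code works in millidegrees */
	int raw_offset_mc = 20000;		/* sensor reading: 20 C below Tj,max */

	printf("reported temperature: %d millidegrees C\n",
	       tj_max_mc - raw_offset_mc);
	return 0;
}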
index 7912104..1c3e590 100644 (file)
@@ -222,8 +222,7 @@ static int k3_bandgap_probe(struct platform_device *pdev)
                        goto err_alloc;
                }
 
-               if (devm_thermal_add_hwmon_sysfs(dev, data[id].tzd))
-                       dev_warn(dev, "Failed to add hwmon sysfs attributes\n");
+               devm_thermal_add_hwmon_sysfs(dev, data[id].tzd);
        }
 
        platform_set_drvdata(pdev, bgp);
index 0b55288..f59d36d 100644 (file)
@@ -1222,12 +1222,7 @@ static int mtk_thermal_probe(struct platform_device *pdev)
                return -ENODEV;
        }
 
-       auxadc_base = devm_of_iomap(&pdev->dev, auxadc, 0, NULL);
-       if (IS_ERR(auxadc_base)) {
-               of_node_put(auxadc);
-               return PTR_ERR(auxadc_base);
-       }
-
+       auxadc_base = of_iomap(auxadc, 0);
        auxadc_phys_base = of_get_phys_base(auxadc);
 
        of_node_put(auxadc);
@@ -1243,12 +1238,7 @@ static int mtk_thermal_probe(struct platform_device *pdev)
                return -ENODEV;
        }
 
-       apmixed_base = devm_of_iomap(&pdev->dev, apmixedsys, 0, NULL);
-       if (IS_ERR(apmixed_base)) {
-               of_node_put(apmixedsys);
-               return PTR_ERR(apmixed_base);
-       }
-
+       apmixed_base = of_iomap(apmixedsys, 0);
        apmixed_phys_base = of_get_phys_base(apmixedsys);
 
        of_node_put(apmixedsys);
index d0a3f95..b693fac 100644 (file)
@@ -19,6 +19,8 @@
 #include <linux/thermal.h>
 #include <dt-bindings/thermal/mediatek,lvts-thermal.h>
 
+#include "../thermal_hwmon.h"
+
 #define LVTS_MONCTL0(__base)   (__base + 0x0000)
 #define LVTS_MONCTL1(__base)   (__base + 0x0004)
 #define LVTS_MONCTL2(__base)   (__base + 0x0008)
@@ -996,6 +998,8 @@ static int lvts_ctrl_start(struct device *dev, struct lvts_ctrl *lvts_ctrl)
                        return PTR_ERR(tz);
                }
 
+               devm_thermal_add_hwmon_sysfs(dev, tz);
+
                /*
                 * The thermal zone pointer will be needed in the
                 * interrupt handler, we store it in the sensor
index 5749149..5ddc39b 100644 (file)
@@ -689,9 +689,7 @@ static int adc_tm5_register_tzd(struct adc_tm5_chip *adc_tm)
                        return PTR_ERR(tzd);
                }
                adc_tm->channels[i].tzd = tzd;
-               if (devm_thermal_add_hwmon_sysfs(adc_tm->dev, tzd))
-                       dev_warn(adc_tm->dev,
-                                "Failed to add hwmon sysfs attributes\n");
+               devm_thermal_add_hwmon_sysfs(adc_tm->dev, tzd);
        }
 
        return 0;
index 0f88e98..0e8ebfc 100644 (file)
@@ -411,22 +411,19 @@ static int qpnp_tm_probe(struct platform_device *pdev)
        chip->base = res;
 
        ret = qpnp_tm_read(chip, QPNP_TM_REG_TYPE, &type);
-       if (ret < 0) {
-               dev_err(&pdev->dev, "could not read type\n");
-               return ret;
-       }
+       if (ret < 0)
+               return dev_err_probe(&pdev->dev, ret,
+                                    "could not read type\n");
 
        ret = qpnp_tm_read(chip, QPNP_TM_REG_SUBTYPE, &subtype);
-       if (ret < 0) {
-               dev_err(&pdev->dev, "could not read subtype\n");
-               return ret;
-       }
+       if (ret < 0)
+               return dev_err_probe(&pdev->dev, ret,
+                                    "could not read subtype\n");
 
        ret = qpnp_tm_read(chip, QPNP_TM_REG_DIG_MAJOR, &dig_major);
-       if (ret < 0) {
-               dev_err(&pdev->dev, "could not read dig_major\n");
-               return ret;
-       }
+       if (ret < 0)
+               return dev_err_probe(&pdev->dev, ret,
+                                    "could not read dig_major\n");
 
        if (type != QPNP_TM_TYPE || (subtype != QPNP_TM_SUBTYPE_GEN1
                                     && subtype != QPNP_TM_SUBTYPE_GEN2)) {
@@ -448,20 +445,15 @@ static int qpnp_tm_probe(struct platform_device *pdev)
         */
        chip->tz_dev = devm_thermal_of_zone_register(
                &pdev->dev, 0, chip, &qpnp_tm_sensor_ops);
-       if (IS_ERR(chip->tz_dev)) {
-               dev_err(&pdev->dev, "failed to register sensor\n");
-               return PTR_ERR(chip->tz_dev);
-       }
+       if (IS_ERR(chip->tz_dev))
+               return dev_err_probe(&pdev->dev, PTR_ERR(chip->tz_dev),
+                                    "failed to register sensor\n");
 
        ret = qpnp_tm_init(chip);
-       if (ret < 0) {
-               dev_err(&pdev->dev, "init failed\n");
-               return ret;
-       }
+       if (ret < 0)
+               return dev_err_probe(&pdev->dev, ret, "init failed\n");
 
-       if (devm_thermal_add_hwmon_sysfs(&pdev->dev, chip->tz_dev))
-               dev_warn(&pdev->dev,
-                        "Failed to add hwmon sysfs attributes\n");
+       devm_thermal_add_hwmon_sysfs(&pdev->dev, chip->tz_dev);
 
        ret = devm_request_threaded_irq(&pdev->dev, irq, NULL, qpnp_tm_isr,
                                        IRQF_ONESHOT, node->name, chip);
index e89c6f3..a941b42 100644 (file)
@@ -39,26 +39,6 @@ struct tsens_legacy_calibration_format tsens_8916_nvmem = {
        },
 };
 
-struct tsens_legacy_calibration_format tsens_8939_nvmem = {
-       .base_len = 8,
-       .base_shift = 2,
-       .sp_len = 6,
-       .mode = { 12, 0 },
-       .invalid = { 12, 2 },
-       .base = { { 0, 0 }, { 1, 24 } },
-       .sp = {
-               { { 12, 3 },  { 12, 9 } },
-               { { 12, 15 }, { 12, 21 } },
-               { { 12, 27 }, { 13, 1 } },
-               { { 13, 7 },  { 13, 13 } },
-               { { 13, 19 }, { 13, 25 } },
-               { { 0, 8 },   { 0, 14 } },
-               { { 0, 20 },  { 0, 26 } },
-               { { 1, 0 },   { 1, 6 } },
-               { { 1, 12 },  { 1, 18 } },
-       },
-};
-
 struct tsens_legacy_calibration_format tsens_8974_nvmem = {
        .base_len = 8,
        .base_shift = 2,
@@ -103,22 +83,6 @@ struct tsens_legacy_calibration_format tsens_8974_backup_nvmem = {
        },
 };
 
-struct tsens_legacy_calibration_format tsens_9607_nvmem = {
-       .base_len = 8,
-       .base_shift = 2,
-       .sp_len = 6,
-       .mode = { 2, 20 },
-       .invalid = { 2, 22 },
-       .base = { { 0, 0 }, { 2, 12 } },
-       .sp = {
-               { { 0, 8 },  { 0, 14 } },
-               { { 0, 20 }, { 0, 26 } },
-               { { 1, 0 },  { 1, 6 } },
-               { { 1, 12 }, { 1, 18 } },
-               { { 2, 0 },  { 2, 6 } },
-       },
-};
-
 static int calibrate_8916(struct tsens_priv *priv)
 {
        u32 p1[5], p2[5];
@@ -243,6 +207,39 @@ static int calibrate_8974(struct tsens_priv *priv)
        return 0;
 }
 
+static int __init init_8226(struct tsens_priv *priv)
+{
+       priv->sensor[0].slope = 2901;
+       priv->sensor[1].slope = 2846;
+       priv->sensor[2].slope = 3038;
+       priv->sensor[3].slope = 2955;
+       priv->sensor[4].slope = 2901;
+       priv->sensor[5].slope = 2846;
+
+       return init_common(priv);
+}
+
+static int __init init_8909(struct tsens_priv *priv)
+{
+       int i;
+
+       for (i = 0; i < priv->num_sensors; ++i)
+               priv->sensor[i].slope = 3000;
+
+       priv->sensor[0].p1_calib_offset = 0;
+       priv->sensor[0].p2_calib_offset = 0;
+       priv->sensor[1].p1_calib_offset = -10;
+       priv->sensor[1].p2_calib_offset = -6;
+       priv->sensor[2].p1_calib_offset = 0;
+       priv->sensor[2].p2_calib_offset = 0;
+       priv->sensor[3].p1_calib_offset = -9;
+       priv->sensor[3].p2_calib_offset = -9;
+       priv->sensor[4].p1_calib_offset = -8;
+       priv->sensor[4].p2_calib_offset = -10;
+
+       return init_common(priv);
+}
+
 static int __init init_8939(struct tsens_priv *priv) {
        priv->sensor[0].slope = 2911;
        priv->sensor[1].slope = 2789;
@@ -258,7 +255,28 @@ static int __init init_8939(struct tsens_priv *priv) {
        return init_common(priv);
 }
 
-/* v0.1: 8916, 8939, 8974, 9607 */
+static int __init init_9607(struct tsens_priv *priv)
+{
+       int i;
+
+       for (i = 0; i < priv->num_sensors; ++i)
+               priv->sensor[i].slope = 3000;
+
+       priv->sensor[0].p1_calib_offset = 1;
+       priv->sensor[0].p2_calib_offset = 1;
+       priv->sensor[1].p1_calib_offset = -4;
+       priv->sensor[1].p2_calib_offset = -2;
+       priv->sensor[2].p1_calib_offset = 4;
+       priv->sensor[2].p2_calib_offset = 8;
+       priv->sensor[3].p1_calib_offset = -3;
+       priv->sensor[3].p2_calib_offset = -5;
+       priv->sensor[4].p1_calib_offset = -4;
+       priv->sensor[4].p2_calib_offset = -4;
+
+       return init_common(priv);
+}
+
+/* v0.1: 8226, 8909, 8916, 8939, 8974, 9607 */
 
 static struct tsens_features tsens_v0_1_feat = {
        .ver_major      = VER_0_1,
@@ -313,6 +331,32 @@ static const struct tsens_ops ops_v0_1 = {
        .get_temp       = get_temp_common,
 };
 
+static const struct tsens_ops ops_8226 = {
+       .init           = init_8226,
+       .calibrate      = tsens_calibrate_common,
+       .get_temp       = get_temp_common,
+};
+
+struct tsens_plat_data data_8226 = {
+       .num_sensors    = 6,
+       .ops            = &ops_8226,
+       .feat           = &tsens_v0_1_feat,
+       .fields = tsens_v0_1_regfields,
+};
+
+static const struct tsens_ops ops_8909 = {
+       .init           = init_8909,
+       .calibrate      = tsens_calibrate_common,
+       .get_temp       = get_temp_common,
+};
+
+struct tsens_plat_data data_8909 = {
+       .num_sensors    = 5,
+       .ops            = &ops_8909,
+       .feat           = &tsens_v0_1_feat,
+       .fields = tsens_v0_1_regfields,
+};
+
 static const struct tsens_ops ops_8916 = {
        .init           = init_common,
        .calibrate      = calibrate_8916,
@@ -356,9 +400,15 @@ struct tsens_plat_data data_8974 = {
        .fields = tsens_v0_1_regfields,
 };
 
+static const struct tsens_ops ops_9607 = {
+       .init           = init_9607,
+       .calibrate      = tsens_calibrate_common,
+       .get_temp       = get_temp_common,
+};
+
 struct tsens_plat_data data_9607 = {
        .num_sensors    = 5,
-       .ops            = &ops_v0_1,
+       .ops            = &ops_9607,
        .feat           = &tsens_v0_1_feat,
        .fields = tsens_v0_1_regfields,
 };
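
Wiring a new TSENS v0.1 target therefore consists of three pieces in this file: a per-SoC init that fills in sensor slopes (and, for 8909/9607, the calibration offsets consumed by the common code), a tsens_ops that points .init at it while reusing the common calibrate/get_temp helpers, and a tsens_plat_data referenced from the compatible table in tsens.c. A condensed sketch for a hypothetical SoC, with illustrative names only:

	static int __init init_example(struct tsens_priv *priv)
	{
		int i;

		/* per-sensor characterisation data goes here */
		for (i = 0; i < priv->num_sensors; i++)
			priv->sensor[i].slope = 3000;

		return init_common(priv);
	}

	static const struct tsens_ops ops_example = {
		.init		= init_example,
		.calibrate	= tsens_calibrate_common,
		.get_temp	= get_temp_common,
	};

	struct tsens_plat_data data_example = {
		.num_sensors	= 5,
		.ops		= &ops_example,
		.feat		= &tsens_v0_1_feat,
		.fields		= tsens_v0_1_regfields,
	};
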
index b822a42..5132243 100644 (file)
@@ -42,28 +42,6 @@ struct tsens_legacy_calibration_format tsens_qcs404_nvmem = {
        },
 };
 
-struct tsens_legacy_calibration_format tsens_8976_nvmem = {
-       .base_len = 8,
-       .base_shift = 2,
-       .sp_len = 6,
-       .mode = { 4, 0 },
-       .invalid = { 4, 2 },
-       .base = { { 0, 0 }, { 2, 8 } },
-       .sp = {
-               { { 0, 8 },  { 0, 14 } },
-               { { 0, 20 }, { 0, 26 } },
-               { { 1, 0 },  { 1, 6 } },
-               { { 1, 12 }, { 1, 18 } },
-               { { 2, 8 },  { 2, 14 } },
-               { { 2, 20 }, { 2, 26 } },
-               { { 3, 0 },  { 3, 6 } },
-               { { 3, 12 }, { 3, 18 } },
-               { { 4, 2 },  { 4, 9 } },
-               { { 4, 14 }, { 4, 21 } },
-               { { 4, 26 }, { 5, 1 } },
-       },
-};
-
 static int calibrate_v1(struct tsens_priv *priv)
 {
        u32 p1[10], p2[10];
index d321812..98c356a 100644 (file)
@@ -134,10 +134,12 @@ int tsens_read_calibration(struct tsens_priv *priv, int shift, u32 *p1, u32 *p2,
                        p1[i] = p1[i] + (base1 << shift);
                break;
        case TWO_PT_CALIB:
+       case TWO_PT_CALIB_NO_OFFSET:
                for (i = 0; i < priv->num_sensors; i++)
                        p2[i] = (p2[i] + base2) << shift;
                fallthrough;
        case ONE_PT_CALIB2:
+       case ONE_PT_CALIB2_NO_OFFSET:
                for (i = 0; i < priv->num_sensors; i++)
                        p1[i] = (p1[i] + base1) << shift;
                break;
@@ -149,6 +151,18 @@ int tsens_read_calibration(struct tsens_priv *priv, int shift, u32 *p1, u32 *p2,
                }
        }
 
+       /* Apply calibration offset workaround except for _NO_OFFSET modes */
+       switch (mode) {
+       case TWO_PT_CALIB:
+               for (i = 0; i < priv->num_sensors; i++)
+                       p2[i] += priv->sensor[i].p2_calib_offset;
+               fallthrough;
+       case ONE_PT_CALIB2:
+               for (i = 0; i < priv->num_sensors; i++)
+                       p1[i] += priv->sensor[i].p1_calib_offset;
+               break;
+       }
+
        return mode;
 }
 
@@ -254,7 +268,7 @@ void compute_intercept_slope(struct tsens_priv *priv, u32 *p1,
 
                if (!priv->sensor[i].slope)
                        priv->sensor[i].slope = SLOPE_DEFAULT;
-               if (mode == TWO_PT_CALIB) {
+               if (mode == TWO_PT_CALIB || mode == TWO_PT_CALIB_NO_OFFSET) {
                        /*
                         * slope (m) = adc_code2 - adc_code1 (y2 - y1)/
                         *      temp_120_degc - temp_30_degc (x2 - x1)
@@ -1096,6 +1110,12 @@ static const struct of_device_id tsens_table[] = {
                .compatible = "qcom,mdm9607-tsens",
                .data = &data_9607,
        }, {
+               .compatible = "qcom,msm8226-tsens",
+               .data = &data_8226,
+       }, {
+               .compatible = "qcom,msm8909-tsens",
+               .data = &data_8909,
+       }, {
                .compatible = "qcom,msm8916-tsens",
                .data = &data_8916,
        }, {
@@ -1189,9 +1209,7 @@ static int tsens_register(struct tsens_priv *priv)
                if (priv->ops->enable)
                        priv->ops->enable(priv, i);
 
-               if (devm_thermal_add_hwmon_sysfs(priv->dev, tzd))
-                       dev_warn(priv->dev,
-                                "Failed to add hwmon sysfs attributes\n");
+               devm_thermal_add_hwmon_sysfs(priv->dev, tzd);
        }
 
        /* VER_0 requires setting MIN and MAX THRESH
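
For a concrete feel of the slope computed by compute_intercept_slope() from two-point calibration (see the comment in the hunk above), take the constants from tsens.h (SLOPE_FACTOR = 1000, CAL_DEGC_PT1 = 30, CAL_DEGC_PT2 = 120) and hypothetical fused codes p1 = 500, p2 = 770:

	slope = (p2 - p1) * SLOPE_FACTOR / (CAL_DEGC_PT2 - CAL_DEGC_PT1)
	      = (770 - 500) * 1000 / (120 - 30)
	      = 3000

which matches the default slope the new 8909 and 9607 init routines program before calling init_common().
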
index dba9cd3..2805de1 100644 (file)
@@ -10,6 +10,8 @@
 #define ONE_PT_CALIB           0x1
 #define ONE_PT_CALIB2          0x2
 #define TWO_PT_CALIB           0x3
+#define ONE_PT_CALIB2_NO_OFFSET        0x6
+#define TWO_PT_CALIB_NO_OFFSET 0x7
 #define CAL_DEGC_PT1           30
 #define CAL_DEGC_PT2           120
 #define SLOPE_FACTOR           1000
@@ -57,6 +59,8 @@ struct tsens_sensor {
        unsigned int                    hw_id;
        int                             slope;
        u32                             status;
+       int                             p1_calib_offset;
+       int                             p2_calib_offset;
 };
 
 /**
@@ -635,7 +639,7 @@ int get_temp_common(const struct tsens_sensor *s, int *temp);
 extern struct tsens_plat_data data_8960;
 
 /* TSENS v0.1 targets */
-extern struct tsens_plat_data data_8916, data_8939, data_8974, data_9607;
+extern struct tsens_plat_data data_8226, data_8909, data_8916, data_8939, data_8974, data_9607;
 
 /* TSENS v1 targets */
 extern struct tsens_plat_data data_tsens_v1, data_8976, data_8956;
index e587563..ccc2eea 100644 (file)
@@ -31,7 +31,6 @@
 #define TMR_DISABLE    0x0
 #define TMR_ME         0x80000000
 #define TMR_ALPF       0x0c000000
-#define TMR_MSITE_ALL  GENMASK(15, 0)
 
 #define REGS_TMTMIR    0x008   /* Temperature measurement interval Register */
 #define TMTMIR_DEFAULT 0x0000000f
@@ -51,6 +50,7 @@
                                            * Site Register
                                            */
 #define TRITSR_V       BIT(31)
+#define TRITSR_TP5     BIT(9)
 #define REGS_V2_TMSAR(n)       (0x304 + 16 * (n))      /* TMU monitoring
                                                * site adjustment register
                                                */
@@ -105,6 +105,11 @@ static int tmu_get_temp(struct thermal_zone_device *tz, int *temp)
         * within sensor range. TEMP is a 9-bit value representing
         * temperature in kelvin.
         */
+
+       regmap_read(qdata->regmap, REGS_TMR, &val);
+       if (!(val & TMR_ME))
+               return -EAGAIN;
+
        if (regmap_read_poll_timeout(qdata->regmap,
                                     REGS_TRITSR(qsensor->id),
                                     val,
@@ -113,10 +118,15 @@ static int tmu_get_temp(struct thermal_zone_device *tz, int *temp)
                                     10 * USEC_PER_MSEC))
                return -ENODATA;
 
-       if (qdata->ver == TMU_VER1)
+       if (qdata->ver == TMU_VER1) {
                *temp = (val & GENMASK(7, 0)) * MILLIDEGREE_PER_DEGREE;
-       else
-               *temp = kelvin_to_millicelsius(val & GENMASK(8, 0));
+       } else {
+               if (val & TRITSR_TP5)
+                       *temp = milli_kelvin_to_millicelsius((val & GENMASK(8, 0)) *
+                                                            MILLIDEGREE_PER_DEGREE + 500);
+               else
+                       *temp = kelvin_to_millicelsius(val & GENMASK(8, 0));
+       }
 
        return 0;
 }
@@ -128,15 +138,7 @@ static const struct thermal_zone_device_ops tmu_tz_ops = {
 static int qoriq_tmu_register_tmu_zone(struct device *dev,
                                       struct qoriq_tmu_data *qdata)
 {
-       int id;
-
-       if (qdata->ver == TMU_VER1) {
-               regmap_write(qdata->regmap, REGS_TMR,
-                            TMR_MSITE_ALL | TMR_ME | TMR_ALPF);
-       } else {
-               regmap_write(qdata->regmap, REGS_V2_TMSR, TMR_MSITE_ALL);
-               regmap_write(qdata->regmap, REGS_TMR, TMR_ME | TMR_ALPF_V2);
-       }
+       int id, sites = 0;
 
        for (id = 0; id < SITES_MAX; id++) {
                struct thermal_zone_device *tzd;
@@ -153,14 +155,24 @@ static int qoriq_tmu_register_tmu_zone(struct device *dev,
                        if (ret == -ENODEV)
                                continue;
 
-                       regmap_write(qdata->regmap, REGS_TMR, TMR_DISABLE);
                        return ret;
                }
 
-               if (devm_thermal_add_hwmon_sysfs(dev, tzd))
-                       dev_warn(dev,
-                                "Failed to add hwmon sysfs attributes\n");
+               if (qdata->ver == TMU_VER1)
+                       sites |= 0x1 << (15 - id);
+               else
+                       sites |= 0x1 << id;
+
+               devm_thermal_add_hwmon_sysfs(dev, tzd);
+       }
 
+       if (sites) {
+               if (qdata->ver == TMU_VER1) {
+                       regmap_write(qdata->regmap, REGS_TMR, TMR_ME | TMR_ALPF | sites);
+               } else {
+                       regmap_write(qdata->regmap, REGS_V2_TMSR, sites);
+                       regmap_write(qdata->regmap, REGS_TMR, TMR_ME | TMR_ALPF_V2);
+               }
        }
 
        return 0;
@@ -208,8 +220,6 @@ static int qoriq_tmu_calibration(struct device *dev,
 
 static void qoriq_tmu_init_device(struct qoriq_tmu_data *data)
 {
-       int i;
-
        /* Disable interrupt, using polling instead */
        regmap_write(data->regmap, REGS_TIER, TIER_DISABLE);
 
@@ -220,8 +230,6 @@ static void qoriq_tmu_init_device(struct qoriq_tmu_data *data)
        } else {
                regmap_write(data->regmap, REGS_V2_TMTMIR, TMTMIR_DEFAULT);
                regmap_write(data->regmap, REGS_V2_TEUMR(0), TEUMR0_V2);
-               for (i = 0; i < SITES_MAX; i++)
-                       regmap_write(data->regmap, REGS_V2_TMSAR(i), TMSARA_V2);
        }
 
        /* Disable monitoring */
@@ -230,7 +238,7 @@ static void qoriq_tmu_init_device(struct qoriq_tmu_data *data)
 
 static const struct regmap_range qoriq_yes_ranges[] = {
        regmap_reg_range(REGS_TMR, REGS_TSCFGR),
-       regmap_reg_range(REGS_TTRnCR(0), REGS_TTRnCR(3)),
+       regmap_reg_range(REGS_TTRnCR(0), REGS_TTRnCR(15)),
        regmap_reg_range(REGS_V2_TEUMR(0), REGS_V2_TEUMR(2)),
        regmap_reg_range(REGS_V2_TMSAR(0), REGS_V2_TMSAR(15)),
        regmap_reg_range(REGS_IPBRR(0), REGS_IPBRR(1)),
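
Note that the rework above only enables monitoring for sites that actually registered a thermal zone, and that the site-to-bit mapping differs between TMU versions. As a hypothetical example, if only sites 0 and 2 register:

	/* TMU v1: site id counts down from bit 15 */
	sites = BIT(15 - 0) | BIT(15 - 2);	/* 0xa000 */

	/* TMU v2: site id maps directly onto the bit index */
	sites = BIT(0) | BIT(2);		/* 0x0005 */

so the final regmap_write() of REGS_TMR (v1) or REGS_V2_TMSR (v2) only switches on the populated sites.
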
index 42a4724..9029d01 100644 (file)
 #define REG_GEN3_PTAT2         0x60
 #define REG_GEN3_PTAT3         0x64
 #define REG_GEN3_THSCP         0x68
+#define REG_GEN4_THSFMON00     0x180
+#define REG_GEN4_THSFMON01     0x184
+#define REG_GEN4_THSFMON02     0x188
+#define REG_GEN4_THSFMON15     0x1BC
+#define REG_GEN4_THSFMON16     0x1C0
+#define REG_GEN4_THSFMON17     0x1C4
 
 /* IRQ{STR,MSK,EN} bits */
 #define IRQ_TEMP1              BIT(0)
@@ -55,6 +61,7 @@
 
 #define MCELSIUS(temp) ((temp) * 1000)
 #define GEN3_FUSE_MASK 0xFFF
+#define GEN4_FUSE_MASK 0xFFF
 
 #define TSC_MAX_NUM    5
 
@@ -66,6 +73,13 @@ struct equation_coefs {
        int b2;
 };
 
+struct rcar_gen3_thermal_priv;
+
+struct rcar_thermal_info {
+       int ths_tj_1;
+       void (*read_fuses)(struct rcar_gen3_thermal_priv *priv);
+};
+
 struct rcar_gen3_thermal_tsc {
        void __iomem *base;
        struct thermal_zone_device *zone;
@@ -79,6 +93,7 @@ struct rcar_gen3_thermal_priv {
        struct thermal_zone_device_ops ops;
        unsigned int num_tscs;
        int ptat[3];
+       const struct rcar_thermal_info *info;
 };
 
 static inline u32 rcar_gen3_thermal_read(struct rcar_gen3_thermal_tsc *tsc,
@@ -236,6 +251,62 @@ static irqreturn_t rcar_gen3_thermal_irq(int irq, void *data)
        return IRQ_HANDLED;
 }
 
+static void rcar_gen3_thermal_read_fuses_gen3(struct rcar_gen3_thermal_priv *priv)
+{
+       unsigned int i;
+
+       /*
+        * Set the pseudo calibration points with fused values.
+        * PTAT is shared between all TSCs but only fused for the first
+        * TSC while THCODEs are fused for each TSC.
+        */
+       priv->ptat[0] = rcar_gen3_thermal_read(priv->tscs[0], REG_GEN3_PTAT1) &
+               GEN3_FUSE_MASK;
+       priv->ptat[1] = rcar_gen3_thermal_read(priv->tscs[0], REG_GEN3_PTAT2) &
+               GEN3_FUSE_MASK;
+       priv->ptat[2] = rcar_gen3_thermal_read(priv->tscs[0], REG_GEN3_PTAT3) &
+               GEN3_FUSE_MASK;
+
+       for (i = 0; i < priv->num_tscs; i++) {
+               struct rcar_gen3_thermal_tsc *tsc = priv->tscs[i];
+
+               tsc->thcode[0] = rcar_gen3_thermal_read(tsc, REG_GEN3_THCODE1) &
+                       GEN3_FUSE_MASK;
+               tsc->thcode[1] = rcar_gen3_thermal_read(tsc, REG_GEN3_THCODE2) &
+                       GEN3_FUSE_MASK;
+               tsc->thcode[2] = rcar_gen3_thermal_read(tsc, REG_GEN3_THCODE3) &
+                       GEN3_FUSE_MASK;
+       }
+}
+
+static void rcar_gen3_thermal_read_fuses_gen4(struct rcar_gen3_thermal_priv *priv)
+{
+       unsigned int i;
+
+       /*
+        * Set the pseudo calibration points with fused values.
+        * PTAT is shared between all TSCs but only fused for the first
+        * TSC while THCODEs are fused for each TSC.
+        */
+       priv->ptat[0] = rcar_gen3_thermal_read(priv->tscs[0], REG_GEN4_THSFMON16) &
+               GEN4_FUSE_MASK;
+       priv->ptat[1] = rcar_gen3_thermal_read(priv->tscs[0], REG_GEN4_THSFMON17) &
+               GEN4_FUSE_MASK;
+       priv->ptat[2] = rcar_gen3_thermal_read(priv->tscs[0], REG_GEN4_THSFMON15) &
+               GEN4_FUSE_MASK;
+
+       for (i = 0; i < priv->num_tscs; i++) {
+               struct rcar_gen3_thermal_tsc *tsc = priv->tscs[i];
+
+               tsc->thcode[0] = rcar_gen3_thermal_read(tsc, REG_GEN4_THSFMON01) &
+                       GEN4_FUSE_MASK;
+               tsc->thcode[1] = rcar_gen3_thermal_read(tsc, REG_GEN4_THSFMON02) &
+                       GEN4_FUSE_MASK;
+               tsc->thcode[2] = rcar_gen3_thermal_read(tsc, REG_GEN4_THSFMON00) &
+                       GEN4_FUSE_MASK;
+       }
+}
+
 static bool rcar_gen3_thermal_read_fuses(struct rcar_gen3_thermal_priv *priv)
 {
        unsigned int i;
@@ -243,7 +314,8 @@ static bool rcar_gen3_thermal_read_fuses(struct rcar_gen3_thermal_priv *priv)
 
        /* If fuses are not set, fallback to pseudo values. */
        thscp = rcar_gen3_thermal_read(priv->tscs[0], REG_GEN3_THSCP);
-       if ((thscp & THSCP_COR_PARA_VLD) != THSCP_COR_PARA_VLD) {
+       if (!priv->info->read_fuses ||
+           (thscp & THSCP_COR_PARA_VLD) != THSCP_COR_PARA_VLD) {
                /* Default THCODE values in case FUSEs are not set. */
                static const int thcodes[TSC_MAX_NUM][3] = {
                        { 3397, 2800, 2221 },
@@ -268,29 +340,7 @@ static bool rcar_gen3_thermal_read_fuses(struct rcar_gen3_thermal_priv *priv)
                return false;
        }
 
-       /*
-        * Set the pseudo calibration points with fused values.
-        * PTAT is shared between all TSCs but only fused for the first
-        * TSC while THCODEs are fused for each TSC.
-        */
-       priv->ptat[0] = rcar_gen3_thermal_read(priv->tscs[0], REG_GEN3_PTAT1) &
-               GEN3_FUSE_MASK;
-       priv->ptat[1] = rcar_gen3_thermal_read(priv->tscs[0], REG_GEN3_PTAT2) &
-               GEN3_FUSE_MASK;
-       priv->ptat[2] = rcar_gen3_thermal_read(priv->tscs[0], REG_GEN3_PTAT3) &
-               GEN3_FUSE_MASK;
-
-       for (i = 0; i < priv->num_tscs; i++) {
-               struct rcar_gen3_thermal_tsc *tsc = priv->tscs[i];
-
-               tsc->thcode[0] = rcar_gen3_thermal_read(tsc, REG_GEN3_THCODE1) &
-                       GEN3_FUSE_MASK;
-               tsc->thcode[1] = rcar_gen3_thermal_read(tsc, REG_GEN3_THCODE2) &
-                       GEN3_FUSE_MASK;
-               tsc->thcode[2] = rcar_gen3_thermal_read(tsc, REG_GEN3_THCODE3) &
-                       GEN3_FUSE_MASK;
-       }
-
+       priv->info->read_fuses(priv);
        return true;
 }
 
@@ -318,52 +368,65 @@ static void rcar_gen3_thermal_init(struct rcar_gen3_thermal_priv *priv,
        usleep_range(1000, 2000);
 }
 
-static const int rcar_gen3_ths_tj_1 = 126;
-static const int rcar_gen3_ths_tj_1_m3_w = 116;
+static const struct rcar_thermal_info rcar_m3w_thermal_info = {
+       .ths_tj_1 = 116,
+       .read_fuses = rcar_gen3_thermal_read_fuses_gen3,
+};
+
+static const struct rcar_thermal_info rcar_gen3_thermal_info = {
+       .ths_tj_1 = 126,
+       .read_fuses = rcar_gen3_thermal_read_fuses_gen3,
+};
+
+static const struct rcar_thermal_info rcar_gen4_thermal_info = {
+       .ths_tj_1 = 126,
+       .read_fuses = rcar_gen3_thermal_read_fuses_gen4,
+};
+
 static const struct of_device_id rcar_gen3_thermal_dt_ids[] = {
        {
                .compatible = "renesas,r8a774a1-thermal",
-               .data = &rcar_gen3_ths_tj_1_m3_w,
+               .data = &rcar_m3w_thermal_info,
        },
        {
                .compatible = "renesas,r8a774b1-thermal",
-               .data = &rcar_gen3_ths_tj_1,
+               .data = &rcar_gen3_thermal_info,
        },
        {
                .compatible = "renesas,r8a774e1-thermal",
-               .data = &rcar_gen3_ths_tj_1,
+               .data = &rcar_gen3_thermal_info,
        },
        {
                .compatible = "renesas,r8a7795-thermal",
-               .data = &rcar_gen3_ths_tj_1,
+               .data = &rcar_gen3_thermal_info,
        },
        {
                .compatible = "renesas,r8a7796-thermal",
-               .data = &rcar_gen3_ths_tj_1_m3_w,
+               .data = &rcar_m3w_thermal_info,
        },
        {
                .compatible = "renesas,r8a77961-thermal",
-               .data = &rcar_gen3_ths_tj_1_m3_w,
+               .data = &rcar_m3w_thermal_info,
        },
        {
                .compatible = "renesas,r8a77965-thermal",
-               .data = &rcar_gen3_ths_tj_1,
+               .data = &rcar_gen3_thermal_info,
        },
        {
                .compatible = "renesas,r8a77980-thermal",
-               .data = &rcar_gen3_ths_tj_1,
+               .data = &rcar_gen3_thermal_info,
        },
        {
                .compatible = "renesas,r8a779a0-thermal",
-               .data = &rcar_gen3_ths_tj_1,
+               .data = &rcar_gen3_thermal_info,
        },
        {
                .compatible = "renesas,r8a779f0-thermal",
-               .data = &rcar_gen3_ths_tj_1,
+               .data = &rcar_gen4_thermal_info,
        },
        {
                .compatible = "renesas,r8a779g0-thermal",
-               .data = &rcar_gen3_ths_tj_1,
+               .data = &rcar_gen4_thermal_info,
        },
        {},
 };
@@ -418,7 +481,6 @@ static int rcar_gen3_thermal_probe(struct platform_device *pdev)
 {
        struct rcar_gen3_thermal_priv *priv;
        struct device *dev = &pdev->dev;
-       const int *ths_tj_1 = of_device_get_match_data(dev);
        struct resource *res;
        struct thermal_zone_device *zone;
        unsigned int i;
@@ -430,6 +492,7 @@ static int rcar_gen3_thermal_probe(struct platform_device *pdev)
 
        priv->ops = rcar_gen3_tz_of_ops;
 
+       priv->info = of_device_get_match_data(dev);
        platform_set_drvdata(pdev, priv);
 
        if (rcar_gen3_thermal_request_irqs(priv, pdev))
@@ -469,7 +532,7 @@ static int rcar_gen3_thermal_probe(struct platform_device *pdev)
                struct rcar_gen3_thermal_tsc *tsc = priv->tscs[i];
 
                rcar_gen3_thermal_init(priv, tsc);
-               rcar_gen3_thermal_calc_coefs(priv, tsc, *ths_tj_1);
+               rcar_gen3_thermal_calc_coefs(priv, tsc, priv->info->ths_tj_1);
 
                zone = devm_thermal_of_zone_register(dev, i, tsc, &priv->ops);
                if (IS_ERR(zone)) {
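
The R-Car driver thus replaces the bare per-SoC integer previously stored in .data with a small info structure carrying both the Tj constant and a generation-specific fuse-read callback, fetched once in probe via of_device_get_match_data(). A stripped-down sketch of the pattern, not the driver itself (struct example_priv and the info struct are assumptions):

	#include <linux/of_device.h>
	#include <linux/platform_device.h>

	struct example_priv;

	struct example_info {
		int ths_tj_1;					/* per-SoC constant */
		void (*read_fuses)(struct example_priv *priv);	/* per-generation layout */
	};

	static int example_probe(struct platform_device *pdev)
	{
		const struct example_info *info = of_device_get_match_data(&pdev->dev);

		if (!info)
			return -ENODEV;

		/* coefficients use info->ths_tj_1; fuses, when valid, go through info->read_fuses() */
		return 0;
	}
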
index 2d30420..0d6249b 100644 (file)
@@ -227,14 +227,12 @@ sensor_off:
 }
 EXPORT_SYMBOL_GPL(st_thermal_register);
 
-int st_thermal_unregister(struct platform_device *pdev)
+void st_thermal_unregister(struct platform_device *pdev)
 {
        struct st_thermal_sensor *sensor = platform_get_drvdata(pdev);
 
        st_thermal_sensor_off(sensor);
        thermal_zone_device_unregister(sensor->thermal_dev);
-
-       return 0;
 }
 EXPORT_SYMBOL_GPL(st_thermal_unregister);
 
index d661b2f..75a84e6 100644 (file)
@@ -94,7 +94,7 @@ struct st_thermal_sensor {
 
 extern int st_thermal_register(struct platform_device *pdev,
                               const struct of_device_id *st_thermal_of_match);
-extern int st_thermal_unregister(struct platform_device *pdev);
+extern void st_thermal_unregister(struct platform_device *pdev);
 extern const struct dev_pm_ops st_thermal_pm_ops;
 
 #endif /* __STI_RESET_SYSCFG_H */
index d68596c..e8cfa83 100644 (file)
@@ -172,9 +172,9 @@ static int st_mmap_probe(struct platform_device *pdev)
        return st_thermal_register(pdev,  st_mmap_thermal_of_match);
 }
 
-static int st_mmap_remove(struct platform_device *pdev)
+static void st_mmap_remove(struct platform_device *pdev)
 {
-       return st_thermal_unregister(pdev);
+       st_thermal_unregister(pdev);
 }
 
 static struct platform_driver st_mmap_thermal_driver = {
@@ -184,7 +184,7 @@ static struct platform_driver st_mmap_thermal_driver = {
                .of_match_table = st_mmap_thermal_of_match,
        },
        .probe          = st_mmap_probe,
-       .remove         = st_mmap_remove,
+       .remove_new     = st_mmap_remove,
 };
 
 module_platform_driver(st_mmap_thermal_driver);
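
st_thermal_unregister() returning void is what lets the glue driver move to the newer .remove_new callback, whose return type is void as well; a remove path has nothing useful to do with an error anyway, since the driver core ignores it. Minimal sketch of the convention with illustrative names:

	static int example_probe(struct platform_device *pdev)
	{
		return 0;
	}

	static void example_remove(struct platform_device *pdev)
	{
		/* tear down resources; no error can be reported from here */
	}

	static struct platform_driver example_driver = {
		.probe		= example_probe,
		.remove_new	= example_remove,
		.driver		= {
			.name	= "example",
		},
	};
	module_platform_driver(example_driver);
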
index 793ddce..195f3c5 100644 (file)
@@ -319,6 +319,11 @@ out:
        return ret;
 }
 
+static void sun8i_ths_reset_control_assert(void *data)
+{
+       reset_control_assert(data);
+}
+
 static int sun8i_ths_resource_init(struct ths_device *tmdev)
 {
        struct device *dev = tmdev->dev;
@@ -339,47 +344,35 @@ static int sun8i_ths_resource_init(struct ths_device *tmdev)
                if (IS_ERR(tmdev->reset))
                        return PTR_ERR(tmdev->reset);
 
-               tmdev->bus_clk = devm_clk_get(&pdev->dev, "bus");
+               ret = reset_control_deassert(tmdev->reset);
+               if (ret)
+                       return ret;
+
+               ret = devm_add_action_or_reset(dev, sun8i_ths_reset_control_assert,
+                                              tmdev->reset);
+               if (ret)
+                       return ret;
+
+               tmdev->bus_clk = devm_clk_get_enabled(&pdev->dev, "bus");
                if (IS_ERR(tmdev->bus_clk))
                        return PTR_ERR(tmdev->bus_clk);
        }
 
        if (tmdev->chip->has_mod_clk) {
-               tmdev->mod_clk = devm_clk_get(&pdev->dev, "mod");
+               tmdev->mod_clk = devm_clk_get_enabled(&pdev->dev, "mod");
                if (IS_ERR(tmdev->mod_clk))
                        return PTR_ERR(tmdev->mod_clk);
        }
 
-       ret = reset_control_deassert(tmdev->reset);
-       if (ret)
-               return ret;
-
-       ret = clk_prepare_enable(tmdev->bus_clk);
-       if (ret)
-               goto assert_reset;
-
        ret = clk_set_rate(tmdev->mod_clk, 24000000);
        if (ret)
-               goto bus_disable;
-
-       ret = clk_prepare_enable(tmdev->mod_clk);
-       if (ret)
-               goto bus_disable;
+               return ret;
 
        ret = sun8i_ths_calibrate(tmdev);
        if (ret)
-               goto mod_disable;
+               return ret;
 
        return 0;
-
-mod_disable:
-       clk_disable_unprepare(tmdev->mod_clk);
-bus_disable:
-       clk_disable_unprepare(tmdev->bus_clk);
-assert_reset:
-       reset_control_assert(tmdev->reset);
-
-       return ret;
 }
 
 static int sun8i_h3_thermal_init(struct ths_device *tmdev)
@@ -475,9 +468,7 @@ static int sun8i_ths_register(struct ths_device *tmdev)
                if (IS_ERR(tmdev->sensor[i].tzd))
                        return PTR_ERR(tmdev->sensor[i].tzd);
 
-               if (devm_thermal_add_hwmon_sysfs(tmdev->dev, tmdev->sensor[i].tzd))
-                       dev_warn(tmdev->dev,
-                                "Failed to add hwmon sysfs attributes\n");
+               devm_thermal_add_hwmon_sysfs(tmdev->dev, tmdev->sensor[i].tzd);
        }
 
        return 0;
@@ -530,17 +521,6 @@ static int sun8i_ths_probe(struct platform_device *pdev)
        return 0;
 }
 
-static int sun8i_ths_remove(struct platform_device *pdev)
-{
-       struct ths_device *tmdev = platform_get_drvdata(pdev);
-
-       clk_disable_unprepare(tmdev->mod_clk);
-       clk_disable_unprepare(tmdev->bus_clk);
-       reset_control_assert(tmdev->reset);
-
-       return 0;
-}
-
 static const struct ths_thermal_chip sun8i_a83t_ths = {
        .sensor_num = 3,
        .scale = 705,
@@ -642,7 +622,6 @@ MODULE_DEVICE_TABLE(of, of_ths_match);
 
 static struct platform_driver ths_driver = {
        .probe = sun8i_ths_probe,
-       .remove = sun8i_ths_remove,
        .driver = {
                .name = "sun8i-thermal",
                .of_match_table = of_ths_match,
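
With the conversion above, sun8i_ths_resource_init() leans entirely on managed helpers, so the unwind labels and the whole sun8i_ths_remove() go away: the reset line is re-asserted by a devm action and the clocks arrive already enabled and are disabled automatically on unbind. A condensed sketch of that pattern for a generic driver (struct example and its fields are assumptions):

	static void example_reset_assert(void *data)
	{
		reset_control_assert(data);
	}

	static int example_resources_init(struct device *dev, struct example *ex)
	{
		int ret;

		ret = reset_control_deassert(ex->reset);
		if (ret)
			return ret;

		/* undo the deassert on probe failure or driver unbind */
		ret = devm_add_action_or_reset(dev, example_reset_assert, ex->reset);
		if (ret)
			return ret;

		/* clock comes back prepared and enabled; cleanup is automatic */
		ex->bus_clk = devm_clk_get_enabled(dev, "bus");
		return PTR_ERR_OR_ZERO(ex->bus_clk);
	}
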
index cb584a5..c243e9d 100644 (file)
@@ -523,8 +523,7 @@ static int tegra_tsensor_register_channel(struct tegra_tsensor *ts,
                return 0;
        }
 
-       if (devm_thermal_add_hwmon_sysfs(ts->dev, tsc->tzd))
-               dev_warn(ts->dev, "failed to add hwmon sysfs attributes\n");
+       devm_thermal_add_hwmon_sysfs(ts->dev, tsc->tzd);
 
        return 0;
 }
index 017b0ce..f4f1a04 100644 (file)
@@ -13,6 +13,8 @@
 #include <linux/slab.h>
 #include <linux/thermal.h>
 
+#include "thermal_hwmon.h"
+
 struct gadc_thermal_info {
        struct device *dev;
        struct thermal_zone_device *tz_dev;
@@ -153,6 +155,8 @@ static int gadc_thermal_probe(struct platform_device *pdev)
                return ret;
        }
 
+       devm_thermal_add_hwmon_sysfs(&pdev->dev, gti->tz_dev);
+
        return 0;
 }
 
index 3d4a787..17c1bbe 100644 (file)
@@ -23,6 +23,8 @@
 #define DEFAULT_THERMAL_GOVERNOR       "user_space"
 #elif defined(CONFIG_THERMAL_DEFAULT_GOV_POWER_ALLOCATOR)
 #define DEFAULT_THERMAL_GOVERNOR       "power_allocator"
+#elif defined(CONFIG_THERMAL_DEFAULT_GOV_BANG_BANG)
+#define DEFAULT_THERMAL_GOVERNOR       "bang_bang"
 #endif
 
 /* Initial state of a cooling device during binding */
index fbe5550..c3ae446 100644 (file)
@@ -271,11 +271,14 @@ int devm_thermal_add_hwmon_sysfs(struct device *dev, struct thermal_zone_device
 
        ptr = devres_alloc(devm_thermal_hwmon_release, sizeof(*ptr),
                           GFP_KERNEL);
-       if (!ptr)
+       if (!ptr) {
+               dev_warn(dev, "Failed to allocate device resource data\n");
                return -ENOMEM;
+       }
 
        ret = thermal_add_hwmon_sysfs(tz);
        if (ret) {
+               dev_warn(dev, "Failed to add hwmon sysfs attributes\n");
                devres_free(ptr);
                return ret;
        }
index 6a53359..d414a4b 100644 (file)
@@ -182,8 +182,7 @@ int ti_thermal_expose_sensor(struct ti_bandgap *bgp, int id,
        ti_bandgap_write_update_interval(bgp, data->sensor_id,
                                         TI_BANDGAP_UPDATE_INTERVAL_MS);
 
-       if (devm_thermal_add_hwmon_sysfs(bgp->dev, data->ti_thermal))
-               dev_warn(bgp->dev, "failed to add hwmon sysfs attributes\n");
+       devm_thermal_add_hwmon_sysfs(bgp->dev, data->ti_thermal);
 
        return 0;
 }
index 3bedecb..14bb6de 100644 (file)
@@ -192,9 +192,9 @@ static int dma_test_start_rings(struct dma_test *dt)
        }
 
        ret = tb_xdomain_enable_paths(dt->xd, dt->tx_hopid,
-                                     dt->tx_ring ? dt->tx_ring->hop : 0,
+                                     dt->tx_ring ? dt->tx_ring->hop : -1,
                                      dt->rx_hopid,
-                                     dt->rx_ring ? dt->rx_ring->hop : 0);
+                                     dt->rx_ring ? dt->rx_ring->hop : -1);
        if (ret) {
                dma_test_free_rings(dt);
                return ret;
@@ -218,9 +218,9 @@ static void dma_test_stop_rings(struct dma_test *dt)
                tb_ring_stop(dt->tx_ring);
 
        ret = tb_xdomain_disable_paths(dt->xd, dt->tx_hopid,
-                                      dt->tx_ring ? dt->tx_ring->hop : 0,
+                                      dt->tx_ring ? dt->tx_ring->hop : -1,
                                       dt->rx_hopid,
-                                      dt->rx_ring ? dt->rx_ring->hop : 0);
+                                      dt->rx_ring ? dt->rx_ring->hop : -1);
        if (ret)
                dev_warn(&dt->svc->dev, "failed to disable DMA paths\n");
 
index d76e923..e58beac 100644 (file)
@@ -54,6 +54,26 @@ static int ring_interrupt_index(const struct tb_ring *ring)
        return bit;
 }
 
+static void nhi_mask_interrupt(struct tb_nhi *nhi, int mask, int ring)
+{
+       if (nhi->quirks & QUIRK_AUTO_CLEAR_INT) {
+               u32 val;
+
+               val = ioread32(nhi->iobase + REG_RING_INTERRUPT_BASE + ring);
+               iowrite32(val & ~mask, nhi->iobase + REG_RING_INTERRUPT_BASE + ring);
+       } else {
+               iowrite32(mask, nhi->iobase + REG_RING_INTERRUPT_MASK_CLEAR_BASE + ring);
+       }
+}
+
+static void nhi_clear_interrupt(struct tb_nhi *nhi, int ring)
+{
+       if (nhi->quirks & QUIRK_AUTO_CLEAR_INT)
+               ioread32(nhi->iobase + REG_RING_NOTIFY_BASE + ring);
+       else
+               iowrite32(~0, nhi->iobase + REG_RING_INT_CLEAR + ring);
+}
+
 /*
  * ring_interrupt_active() - activate/deactivate interrupts for a single ring
  *
@@ -61,8 +81,8 @@ static int ring_interrupt_index(const struct tb_ring *ring)
  */
 static void ring_interrupt_active(struct tb_ring *ring, bool active)
 {
-       int reg = REG_RING_INTERRUPT_BASE +
-                 ring_interrupt_index(ring) / 32 * 4;
+       int index = ring_interrupt_index(ring) / 32 * 4;
+       int reg = REG_RING_INTERRUPT_BASE + index;
        int interrupt_bit = ring_interrupt_index(ring) & 31;
        int mask = 1 << interrupt_bit;
        u32 old, new;
@@ -123,7 +143,11 @@ static void ring_interrupt_active(struct tb_ring *ring, bool active)
                                         "interrupt for %s %d is already %s\n",
                                         RING_TYPE(ring), ring->hop,
                                         active ? "enabled" : "disabled");
-       iowrite32(new, ring->nhi->iobase + reg);
+
+       if (active)
+               iowrite32(new, ring->nhi->iobase + reg);
+       else
+               nhi_mask_interrupt(ring->nhi, mask, index);
 }
 
 /*
@@ -136,11 +160,11 @@ static void nhi_disable_interrupts(struct tb_nhi *nhi)
        int i = 0;
        /* disable interrupts */
        for (i = 0; i < RING_INTERRUPT_REG_COUNT(nhi); i++)
-               iowrite32(0, nhi->iobase + REG_RING_INTERRUPT_BASE + 4 * i);
+               nhi_mask_interrupt(nhi, ~0, 4 * i);
 
        /* clear interrupt status bits */
        for (i = 0; i < RING_NOTIFY_REG_COUNT(nhi); i++)
-               ioread32(nhi->iobase + REG_RING_NOTIFY_BASE + 4 * i);
+               nhi_clear_interrupt(nhi, 4 * i);
 }
 
 /* ring helper methods */
index faef165..6ba2958 100644 (file)
@@ -93,6 +93,8 @@ struct ring_desc {
 #define REG_RING_INTERRUPT_BASE        0x38200
 #define RING_INTERRUPT_REG_COUNT(nhi) ((31 + 2 * nhi->hop_count) / 32)
 
+#define REG_RING_INTERRUPT_MASK_CLEAR_BASE     0x38208
+
 #define REG_INT_THROTTLING_RATE        0x38c00
 
 /* Interrupt Vector Allocation */
index 7bfbc9c..c1af712 100644 (file)
@@ -737,6 +737,7 @@ static void tb_scan_port(struct tb_port *port)
 {
        struct tb_cm *tcm = tb_priv(port->sw->tb);
        struct tb_port *upstream_port;
+       bool discovery = false;
        struct tb_switch *sw;
        int ret;
 
@@ -804,8 +805,10 @@ static void tb_scan_port(struct tb_port *port)
         * tunnels and know which switches were authorized already by
         * the boot firmware.
         */
-       if (!tcm->hotplug_active)
+       if (!tcm->hotplug_active) {
                dev_set_uevent_suppress(&sw->dev, true);
+               discovery = true;
+       }
 
        /*
         * At the moment Thunderbolt 2 and beyond (devices with LC) we
@@ -835,10 +838,14 @@ static void tb_scan_port(struct tb_port *port)
         * CL0s and CL1 are enabled and supported together.
         * Silently ignore CLx enabling in case CLx is not supported.
         */
-       ret = tb_switch_enable_clx(sw, TB_CL1);
-       if (ret && ret != -EOPNOTSUPP)
-               tb_sw_warn(sw, "failed to enable %s on upstream port\n",
-                          tb_switch_clx_name(TB_CL1));
+       if (discovery) {
+               tb_sw_dbg(sw, "discovery, not touching CL states\n");
+       } else {
+               ret = tb_switch_enable_clx(sw, TB_CL1);
+               if (ret && ret != -EOPNOTSUPP)
+                       tb_sw_warn(sw, "failed to enable %s on upstream port\n",
+                                  tb_switch_clx_name(TB_CL1));
+       }
 
        if (tb_switch_is_clx_enabled(sw, TB_CL1))
                /*
index 9099ae7..4f22267 100644 (file)
@@ -526,7 +526,7 @@ static int tb_dp_xchg_caps(struct tb_tunnel *tunnel)
         * Perform connection manager handshake between IN and OUT ports
         * before capabilities exchange can take place.
         */
-       ret = tb_dp_cm_handshake(in, out, 1500);
+       ret = tb_dp_cm_handshake(in, out, 3000);
        if (ret)
                return ret;
 
index f801b1f..af0e1c0 100644 (file)
@@ -1012,7 +1012,7 @@ static int brcmuart_probe(struct platform_device *pdev)
        of_property_read_u32(np, "clock-frequency", &clk_rate);
 
        /* See if a Baud clock has been specified */
-       baud_mux_clk = of_clk_get_by_name(np, "sw_baud");
+       baud_mux_clk = devm_clk_get(dev, "sw_baud");
        if (IS_ERR(baud_mux_clk)) {
                if (PTR_ERR(baud_mux_clk) == -EPROBE_DEFER) {
                        ret = -EPROBE_DEFER;
@@ -1032,7 +1032,7 @@ static int brcmuart_probe(struct platform_device *pdev)
        if (clk_rate == 0) {
                dev_err(dev, "clock-frequency or clk not defined\n");
                ret = -EINVAL;
-               goto release_dma;
+               goto err_clk_disable;
        }
 
        dev_dbg(dev, "DMA is %senabled\n", priv->dma_enabled ? "" : "not ");
@@ -1119,6 +1119,8 @@ err1:
        serial8250_unregister_port(priv->line);
 err:
        brcmuart_free_bufs(dev, priv);
+err_clk_disable:
+       clk_disable_unprepare(baud_mux_clk);
 release_dma:
        if (priv->dma_enabled)
                brcmuart_arbitration(priv, 0);
@@ -1133,6 +1135,7 @@ static int brcmuart_remove(struct platform_device *pdev)
        hrtimer_cancel(&priv->hrt);
        serial8250_unregister_port(priv->line);
        brcmuart_free_bufs(&pdev->dev, priv);
+       clk_disable_unprepare(priv->baud_mux_clk);
        if (priv->dma_enabled)
                brcmuart_arbitration(priv, 0);
        return 0;
index 64770c6..b406cba 100644 (file)
 #define PCI_DEVICE_ID_COMMTECH_4224PCIE                0x0020
 #define PCI_DEVICE_ID_COMMTECH_4228PCIE                0x0021
 #define PCI_DEVICE_ID_COMMTECH_4222PCIE                0x0022
+
 #define PCI_DEVICE_ID_EXAR_XR17V4358           0x4358
 #define PCI_DEVICE_ID_EXAR_XR17V8358           0x8358
 
+#define PCI_SUBDEVICE_ID_USR_2980              0x0128
+#define PCI_SUBDEVICE_ID_USR_2981              0x0129
+
 #define PCI_DEVICE_ID_SEALEVEL_710xC           0x1001
 #define PCI_DEVICE_ID_SEALEVEL_720xC           0x1002
 #define PCI_DEVICE_ID_SEALEVEL_740xC           0x1004
@@ -829,6 +833,15 @@ static const struct exar8250_board pbn_exar_XR17V8358 = {
                (kernel_ulong_t)&bd                     \
        }
 
+#define USR_DEVICE(devid, sdevid, bd) {                        \
+       PCI_DEVICE_SUB(                                 \
+               PCI_VENDOR_ID_USR,                      \
+               PCI_DEVICE_ID_EXAR_##devid,             \
+               PCI_VENDOR_ID_EXAR,                     \
+               PCI_SUBDEVICE_ID_USR_##sdevid), 0, 0,   \
+               (kernel_ulong_t)&bd                     \
+       }
+
 static const struct pci_device_id exar_pci_tbl[] = {
        EXAR_DEVICE(ACCESSIO, COM_2S, pbn_exar_XR17C15x),
        EXAR_DEVICE(ACCESSIO, COM_4S, pbn_exar_XR17C15x),
@@ -853,6 +866,10 @@ static const struct pci_device_id exar_pci_tbl[] = {
 
        IBM_DEVICE(XR17C152, SATURN_SERIAL_ONE_PORT, pbn_exar_ibm_saturn),
 
+       /* USRobotics USR298x-OEM PCI Modems */
+       USR_DEVICE(XR17C152, 2980, pbn_exar_XR17C15x),
+       USR_DEVICE(XR17C152, 2981, pbn_exar_XR17C15x),
+
        /* Exar Corp. XR17C15[248] Dual/Quad/Octal UART */
        EXAR_DEVICE(EXAR, XR17C152, pbn_exar_XR17C15x),
        EXAR_DEVICE(EXAR, XR17C154, pbn_exar_XR17C15x),
index c55be6f..e80c4f6 100644 (file)
@@ -1920,6 +1920,8 @@ pci_moxa_setup(struct serial_private *priv,
 #define PCI_SUBDEVICE_ID_SIIG_DUAL_30  0x2530
 #define PCI_VENDOR_ID_ADVANTECH                0x13fe
 #define PCI_DEVICE_ID_INTEL_CE4100_UART 0x2e66
+#define PCI_DEVICE_ID_ADVANTECH_PCI1600        0x1600
+#define PCI_DEVICE_ID_ADVANTECH_PCI1600_1611   0x1611
 #define PCI_DEVICE_ID_ADVANTECH_PCI3620        0x3620
 #define PCI_DEVICE_ID_ADVANTECH_PCI3618        0x3618
 #define PCI_DEVICE_ID_ADVANTECH_PCIf618        0xf618
@@ -4085,6 +4087,9 @@ static SIMPLE_DEV_PM_OPS(pciserial_pm_ops, pciserial_suspend_one,
                         pciserial_resume_one);
 
 static const struct pci_device_id serial_pci_tbl[] = {
+       {       PCI_VENDOR_ID_ADVANTECH, PCI_DEVICE_ID_ADVANTECH_PCI1600,
+               PCI_DEVICE_ID_ADVANTECH_PCI1600_1611, PCI_ANY_ID, 0, 0,
+               pbn_b0_4_921600 },
        /* Advantech use PCI_DEVICE_ID_ADVANTECH_PCI3620 (0x3620) as 'PCI_SUBVENDOR_ID' */
        {       PCI_VENDOR_ID_ADVANTECH, PCI_DEVICE_ID_ADVANTECH_PCI3620,
                PCI_DEVICE_ID_ADVANTECH_PCI3620, 0x0001, 0, 0,
index fe8d79c..c153ba3 100644 (file)
@@ -669,6 +669,7 @@ EXPORT_SYMBOL_GPL(serial8250_em485_supported);
 /**
  * serial8250_em485_config() - generic ->rs485_config() callback
  * @port: uart port
+ * @termios: termios structure
  * @rs485: rs485 settings
  *
  * Generic callback usable by 8250 uart drivers to activate rs485 settings
index 2509e7f..89956bb 100644 (file)
@@ -113,13 +113,15 @@ static int tegra_uart_probe(struct platform_device *pdev)
 
        ret = serial8250_register_8250_port(&port8250);
        if (ret < 0)
-               goto err_clkdisable;
+               goto err_ctrl_assert;
 
        platform_set_drvdata(pdev, uart);
        uart->line = ret;
 
        return 0;
 
+err_ctrl_assert:
+       reset_control_assert(uart->rst);
 err_clkdisable:
        clk_disable_unprepare(uart->clk);
 
index 398e5aa..71a7a3e 100644 (file)
@@ -450,8 +450,8 @@ config SERIAL_SA1100
        help
          If you have a machine based on a SA1100/SA1110 StrongARM(R) CPU you
          can enable its onboard serial port by enabling this option.
-         Please read <file:Documentation/arm/sa1100/serial_uart.rst> for further
-         info.
+         Please read <file:Documentation/arch/arm/sa1100/serial_uart.rst> for
+         further info.
 
 config SERIAL_SA1100_CONSOLE
        bool "Console on SA1100 serial port"
@@ -762,7 +762,7 @@ config SERIAL_PMACZILOG_CONSOLE
 
 config SERIAL_CPM
        tristate "CPM SCC/SMC serial port support"
-       depends on CPM2 || CPM1 || (PPC32 && COMPILE_TEST)
+       depends on CPM2 || CPM1
        select SERIAL_CORE
        help
          This driver supports the SCC and SMC serial ports on Motorola 
index 59e25f2..4b2512e 100644 (file)
@@ -606,10 +606,11 @@ static int arc_serial_probe(struct platform_device *pdev)
        }
        uart->baud = val;
 
-       port->membase = of_iomap(np, 0);
-       if (!port->membase)
+       port->membase = devm_platform_ioremap_resource(pdev, 0);
+       if (IS_ERR(port->membase)) {
                /* No point of dev_err since UART itself is hosed here */
-               return -ENXIO;
+               return PTR_ERR(port->membase);
+       }
 
        port->irq = irq_of_parse_and_map(np, 0);
 
index 0577618..46c03ed 100644 (file)
@@ -19,8 +19,6 @@ struct gpio_desc;
 #include "cpm_uart_cpm2.h"
 #elif defined(CONFIG_CPM1)
 #include "cpm_uart_cpm1.h"
-#elif defined(CONFIG_COMPILE_TEST)
-#include "cpm_uart_cpm2.h"
 #endif
 
 #define SERIAL_CPM_MAJOR       204
index c91916e..7fd30fc 100644 (file)
@@ -310,7 +310,7 @@ static const struct lpuart_soc_data ls1021a_data = {
 static const struct lpuart_soc_data ls1028a_data = {
        .devtype = LS1028A_LPUART,
        .iotype = UPIO_MEM32,
-       .rx_watermark = 1,
+       .rx_watermark = 0,
 };
 
 static struct lpuart_soc_data imx7ulp_data = {
@@ -1495,34 +1495,36 @@ static void lpuart_break_ctl(struct uart_port *port, int break_state)
 
 static void lpuart32_break_ctl(struct uart_port *port, int break_state)
 {
-       unsigned long temp, modem;
-       struct tty_struct *tty;
-       unsigned int cflag = 0;
-
-       tty = tty_port_tty_get(&port->state->port);
-       if (tty) {
-               cflag = tty->termios.c_cflag;
-               tty_kref_put(tty);
-       }
+       unsigned long temp;
 
-       temp = lpuart32_read(port, UARTCTRL) & ~UARTCTRL_SBK;
-       modem = lpuart32_read(port, UARTMODIR);
+       temp = lpuart32_read(port, UARTCTRL);
 
+       /*
+        * The LPUART IP has two known bugs. First, CTS has higher priority than
+        * the break signal, so a break sent through UARTCTRL_SBK may be impacted
+        * by the CTS input if HW flow control is enabled; this affects all
+        * platforms supported by this driver.
+        * Second, the i.MX8QM LPUART may send an additional break character
+        * after SBK is cleared.
+        * To avoid both bugs, use the Transmit Data Inversion function to send
+        * the break signal instead of UARTCTRL_SBK.
+        */
        if (break_state != 0) {
-               temp |= UARTCTRL_SBK;
                /*
-                * LPUART CTS has higher priority than SBK, need to disable CTS before
-                * asserting SBK to avoid any interference if flow control is enabled.
+                * Disable the transmitter to prevent any data from being sent out
+                * during break, then invert the TX line to send break.
                 */
-               if (cflag & CRTSCTS && modem & UARTMODIR_TXCTSE)
-                       lpuart32_write(port, modem & ~UARTMODIR_TXCTSE, UARTMODIR);
+               temp &= ~UARTCTRL_TE;
+               lpuart32_write(port, temp, UARTCTRL);
+               temp |= UARTCTRL_TXINV;
+               lpuart32_write(port, temp, UARTCTRL);
        } else {
-               /* Re-enable the CTS when break off. */
-               if (cflag & CRTSCTS && !(modem & UARTMODIR_TXCTSE))
-                       lpuart32_write(port, modem | UARTMODIR_TXCTSE, UARTMODIR);
+               /* Disable the TXINV to turn off break and re-enable transmitter. */
+               temp &= ~UARTCTRL_TXINV;
+               lpuart32_write(port, temp, UARTCTRL);
+               temp |= UARTCTRL_TE;
+               lpuart32_write(port, temp, UARTCTRL);
        }
-
-       lpuart32_write(port, temp, UARTCTRL);
 }
 
 static void lpuart_setup_watermark(struct lpuart_port *sport)
index a58e927..f1387f1 100644 (file)
@@ -250,6 +250,7 @@ lqasc_err_int(int irq, void *_port)
        struct ltq_uart_port *ltq_port = to_ltq_uart_port(port);
 
        spin_lock_irqsave(&ltq_port->lock, flags);
+       __raw_writel(ASC_IRNCR_EIR, port->membase + LTQ_ASC_IRNCR);
        /* clear any pending interrupts */
        asc_update_bits(0, ASCWHBSTATE_CLRPE | ASCWHBSTATE_CLRFE |
                ASCWHBSTATE_CLRROE, port->membase + LTQ_ASC_WHBSTATE);
index 08dc3e2..8582479 100644 (file)
@@ -1664,19 +1664,18 @@ static int qcom_geni_serial_probe(struct platform_device *pdev)
        uport->private_data = &port->private_data;
        platform_set_drvdata(pdev, port);
 
-       ret = uart_add_one_port(drv, uport);
-       if (ret)
-               return ret;
-
        irq_set_status_flags(uport->irq, IRQ_NOAUTOEN);
        ret = devm_request_irq(uport->dev, uport->irq, qcom_geni_serial_isr,
                        IRQF_TRIGGER_HIGH, port->name, uport);
        if (ret) {
                dev_err(uport->dev, "Failed to get IRQ ret %d\n", ret);
-               uart_remove_one_port(drv, uport);
                return ret;
        }
 
+       ret = uart_add_one_port(drv, uport);
+       if (ret)
+               return ret;
+
        /*
         * Set pm_runtime status as ACTIVE so that wakeup_irq gets
         * enabled/disabled from dev_pm_arm_wake_irq during system
index c84be40..4737a8f 100644 (file)
@@ -466,7 +466,7 @@ static const struct file_operations tty_fops = {
        .llseek         = no_llseek,
        .read_iter      = tty_read,
        .write_iter     = tty_write,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = copy_splice_read,
        .splice_write   = iter_file_splice_write,
        .poll           = tty_poll,
        .unlocked_ioctl = tty_ioctl,
@@ -481,7 +481,7 @@ static const struct file_operations console_fops = {
        .llseek         = no_llseek,
        .read_iter      = tty_read,
        .write_iter     = redirected_tty_write,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = copy_splice_read,
        .splice_write   = iter_file_splice_write,
        .poll           = tty_poll,
        .unlocked_ioctl = tty_ioctl,
index 498ba9c..829c4be 100644 (file)
@@ -656,10 +656,17 @@ vcs_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos)
                        }
                }
 
-               /* The vcs_size might have changed while we slept to grab
-                * the user buffer, so recheck.
+               /* The vc might have been freed or vcs_size might have changed
+                * while we slept to grab the user buffer, so recheck.
                 * Return data written up to now on failure.
                 */
+               vc = vcs_vc(inode, &viewed);
+               if (!vc) {
+                       if (written)
+                               break;
+                       ret = -ENXIO;
+                       goto unlock_out;
+               }
                size = vcs_size(vc, attr, false);
                if (size < 0) {
                        if (written)
index 202ff71..51b3c6a 100644 (file)
@@ -150,7 +150,8 @@ static int ufshcd_mcq_config_nr_queues(struct ufs_hba *hba)
        u32 hba_maxq, rem, tot_queues;
        struct Scsi_Host *host = hba->host;
 
-       hba_maxq = FIELD_GET(MAX_QUEUE_SUP, hba->mcq_capabilities);
+       /* maxq is a 0-based value */
+       hba_maxq = FIELD_GET(MAX_QUEUE_SUP, hba->mcq_capabilities) + 1;
 
        tot_queues = UFS_MCQ_NUM_DEV_CMD_QUEUES + read_queues + poll_queues +
                        rw_queues;
@@ -265,7 +266,7 @@ static int ufshcd_mcq_get_tag(struct ufs_hba *hba,
        addr = (le64_to_cpu(cqe->command_desc_base_addr) & CQE_UCD_BA) -
                hba->ucdl_dma_addr;
 
-       return div_u64(addr, sizeof(struct utp_transfer_cmd_desc));
+       return div_u64(addr, ufshcd_get_ucd_size(hba));
 }
 
 static void ufshcd_mcq_process_cqe(struct ufs_hba *hba,
index 17d7bb8..e7e79f5 100644 (file)
@@ -2849,10 +2849,10 @@ static void ufshcd_map_queues(struct Scsi_Host *shost)
 static void ufshcd_init_lrb(struct ufs_hba *hba, struct ufshcd_lrb *lrb, int i)
 {
        struct utp_transfer_cmd_desc *cmd_descp = (void *)hba->ucdl_base_addr +
-               i * sizeof_utp_transfer_cmd_desc(hba);
+               i * ufshcd_get_ucd_size(hba);
        struct utp_transfer_req_desc *utrdlp = hba->utrdl_base_addr;
        dma_addr_t cmd_desc_element_addr = hba->ucdl_dma_addr +
-               i * sizeof_utp_transfer_cmd_desc(hba);
+               i * ufshcd_get_ucd_size(hba);
        u16 response_offset = offsetof(struct utp_transfer_cmd_desc,
                                       response_upiu);
        u16 prdt_offset = offsetof(struct utp_transfer_cmd_desc, prd_table);
@@ -3761,7 +3761,7 @@ static int ufshcd_memory_alloc(struct ufs_hba *hba)
        size_t utmrdl_size, utrdl_size, ucdl_size;
 
        /* Allocate memory for UTP command descriptors */
-       ucdl_size = sizeof_utp_transfer_cmd_desc(hba) * hba->nutrs;
+       ucdl_size = ufshcd_get_ucd_size(hba) * hba->nutrs;
        hba->ucdl_base_addr = dmam_alloc_coherent(hba->dev,
                                                  ucdl_size,
                                                  &hba->ucdl_dma_addr,
@@ -3861,7 +3861,7 @@ static void ufshcd_host_memory_configure(struct ufs_hba *hba)
        prdt_offset =
                offsetof(struct utp_transfer_cmd_desc, prd_table);
 
-       cmd_desc_size = sizeof_utp_transfer_cmd_desc(hba);
+       cmd_desc_size = ufshcd_get_ucd_size(hba);
        cmd_desc_dma_addr = hba->ucdl_dma_addr;
 
        for (i = 0; i < hba->nutrs; i++) {
@@ -8452,7 +8452,7 @@ static void ufshcd_release_sdb_queue(struct ufs_hba *hba, int nutrs)
 {
        size_t ucdl_size, utrdl_size;
 
-       ucdl_size = sizeof(struct utp_transfer_cmd_desc) * nutrs;
+       ucdl_size = ufshcd_get_ucd_size(hba) * nutrs;
        dmam_free_coherent(hba->dev, ucdl_size, hba->ucdl_base_addr,
                           hba->ucdl_dma_addr);
 
@@ -9459,8 +9459,16 @@ static int __ufshcd_wl_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
                         * that performance might be impacted.
                         */
                        ret = ufshcd_urgent_bkops(hba);
-                       if (ret)
+                       if (ret) {
+                               /*
+                                * If an error is returned in the suspend flow,
+                                * I/O will hang. Trigger the error handler and
+                                * abort suspend so error recovery can run.
+                                */
+                               ufshcd_force_error_recovery(hba);
+                               ret = -EBUSY;
                                goto enable_scaling;
+                       }
                } else {
                        /* make sure that auto bkops is disabled */
                        ufshcd_disable_auto_bkops(hba);
index ccfaebc..1dcadef 100644 (file)
@@ -2097,6 +2097,19 @@ int cdns3_ep_config(struct cdns3_endpoint *priv_ep, bool enable)
        else
                priv_ep->trb_burst_size = 16;
 
+       /*
+        * In versions preceding DEV_VER_V2 (for example, i.MX8QM) there are bugs
+        * in the DMA. They occur when trb_burst_size exceeds 16 and the buffer
+        * address is not aligned to 128 bytes (the product of the 64-bit AXI bus
+        * width and the AXI maximum burst length of 16, i.e. 0xF+1 in
+        * dma_axi_ctrl0[3:0]). This results in data corruption when a transfer
+        * crosses a 4K boundary, specifically from (4K - (address & 0x7F)) to 4K.
+        *
+        * So force trb_burst_size to 16 on such platforms.
+        */
+       if (priv_dev->dev_ver < DEV_VER_V2)
+               priv_ep->trb_burst_size = 16;
+
        mult = min_t(u8, mult, EP_CFG_MULT_MAX);
        buffering = min_t(u8, buffering, EP_CFG_BUFFERING_MAX);
        maxburst = min_t(u8, maxburst, EP_CFG_MAXBURST_MAX);
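
To make the corruption window in the comment concrete: for a hypothetical TRB buffer address whose low bits are 0x40, the affected span of each crossed 4K region is

	4K - (address & 0x7F) = 4096 - 64 = 4032

i.e. bytes 4032..4095, which is why the burst size is capped at 16 on pre-DEV_VER_V2 parts.
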
index 4bb6d30..311007b 100644 (file)
@@ -1928,6 +1928,8 @@ static int usbtmc_ioctl_request(struct usbtmc_device_data *data,
 
        if (request.req.wLength > USBTMC_BUFSIZE)
                return -EMSGSIZE;
+       if (request.req.wLength == 0)   /* Length-0 requests are never IN */
+               request.req.bRequestType &= ~USB_DIR_IN;
 
        is_in = request.req.bRequestType & USB_DIR_IN;
 
index fbb087b..268ccbe 100644 (file)
@@ -172,3 +172,44 @@ void hcd_buffer_free(
        }
        dma_free_coherent(hcd->self.sysdev, size, addr, dma);
 }
+
+void *hcd_buffer_alloc_pages(struct usb_hcd *hcd,
+               size_t size, gfp_t mem_flags, dma_addr_t *dma)
+{
+       if (size == 0)
+               return NULL;
+
+       if (hcd->localmem_pool)
+               return gen_pool_dma_alloc_align(hcd->localmem_pool,
+                               size, dma, PAGE_SIZE);
+
+       /* some USB hosts just use PIO */
+       if (!hcd_uses_dma(hcd)) {
+               *dma = DMA_MAPPING_ERROR;
+               return (void *)__get_free_pages(mem_flags,
+                               get_order(size));
+       }
+
+       return dma_alloc_coherent(hcd->self.sysdev,
+                       size, dma, mem_flags);
+}
+
+void hcd_buffer_free_pages(struct usb_hcd *hcd,
+               size_t size, void *addr, dma_addr_t dma)
+{
+       if (!addr)
+               return;
+
+       if (hcd->localmem_pool) {
+               gen_pool_free(hcd->localmem_pool,
+                               (unsigned long)addr, size);
+               return;
+       }
+
+       if (!hcd_uses_dma(hcd)) {
+               free_pages((unsigned long)addr, get_order(size));
+               return;
+       }
+
+       dma_free_coherent(hcd->self.sysdev, size, addr, dma);
+}
index e501a03..fcf6881 100644 (file)
@@ -186,6 +186,7 @@ static int connected(struct usb_dev_state *ps)
 static void dec_usb_memory_use_count(struct usb_memory *usbm, int *count)
 {
        struct usb_dev_state *ps = usbm->ps;
+       struct usb_hcd *hcd = bus_to_hcd(ps->dev->bus);
        unsigned long flags;
 
        spin_lock_irqsave(&ps->lock, flags);
@@ -194,8 +195,8 @@ static void dec_usb_memory_use_count(struct usb_memory *usbm, int *count)
                list_del(&usbm->memlist);
                spin_unlock_irqrestore(&ps->lock, flags);
 
-               usb_free_coherent(ps->dev, usbm->size, usbm->mem,
-                               usbm->dma_handle);
+               hcd_buffer_free_pages(hcd, usbm->size,
+                               usbm->mem, usbm->dma_handle);
                usbfs_decrease_memory_usage(
                        usbm->size + sizeof(struct usb_memory));
                kfree(usbm);
@@ -234,7 +235,7 @@ static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
        size_t size = vma->vm_end - vma->vm_start;
        void *mem;
        unsigned long flags;
-       dma_addr_t dma_handle;
+       dma_addr_t dma_handle = DMA_MAPPING_ERROR;
        int ret;
 
        ret = usbfs_increase_memory_usage(size + sizeof(struct usb_memory));
@@ -247,8 +248,8 @@ static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
                goto error_decrease_mem;
        }
 
-       mem = usb_alloc_coherent(ps->dev, size, GFP_USER | __GFP_NOWARN,
-                       &dma_handle);
+       mem = hcd_buffer_alloc_pages(hcd,
+                       size, GFP_USER | __GFP_NOWARN, &dma_handle);
        if (!mem) {
                ret = -ENOMEM;
                goto error_free_usbm;
@@ -264,7 +265,14 @@ static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
        usbm->vma_use_count = 1;
        INIT_LIST_HEAD(&usbm->memlist);
 
-       if (hcd->localmem_pool || !hcd_uses_dma(hcd)) {
+       /*
+        * In DMA-unavailable cases, hcd_buffer_alloc_pages allocates
+        * normal pages and assigns DMA_MAPPING_ERROR to dma_handle. Check
+        * whether we are in such a case, and then use remap_pfn_range (or
+        * dma_mmap_coherent) to map normal (or DMA) pages into userspace,
+        * respectively.
+        */
+       if (dma_handle == DMA_MAPPING_ERROR) {
                if (remap_pfn_range(vma, vma->vm_start,
                                    virt_to_phys(usbm->mem) >> PAGE_SHIFT,
                                    size, vma->vm_page_prot) < 0) {
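
For context, a hedged sketch of the full mapping decision this hunk truncates; the dma_mmap_coherent() branch lies outside the hunk, so it is an assumption about the surrounding code rather than a quote of it:

        if (dma_handle == DMA_MAPPING_ERROR) {
                /* plain pages from __get_free_pages() or the localmem pool */
                if (remap_pfn_range(vma, vma->vm_start,
                                    virt_to_phys(usbm->mem) >> PAGE_SHIFT,
                                    size, vma->vm_page_prot) < 0)
                        goto err;       /* illustrative error label */
        } else {
                /* coherent DMA memory */
                if (dma_mmap_coherent(hcd->self.sysdev, vma, usbm->mem,
                                      usbm->dma_handle, size) < 0)
                        goto err;
        }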
index 0beaab9..d68958e 100644 (file)
@@ -1137,7 +1137,7 @@ static int dwc3_core_init(struct dwc3 *dwc)
 
        dwc3_set_incr_burst_type(dwc);
 
-       dwc3_phy_power_on(dwc);
+       ret = dwc3_phy_power_on(dwc);
        if (ret)
                goto err_exit_phy;
 
@@ -1929,6 +1929,11 @@ static int dwc3_remove(struct platform_device *pdev)
        pm_runtime_disable(&pdev->dev);
        pm_runtime_dont_use_autosuspend(&pdev->dev);
        pm_runtime_put_noidle(&pdev->dev);
+       /*
+        * HACK: Clear the driver data, which is currently accessed by parent
+        * glue drivers, before allowing the parent to suspend.
+        */
+       platform_set_drvdata(pdev, NULL);
        pm_runtime_set_suspended(&pdev->dev);
 
        dwc3_free_event_buffers(dwc);
index d56457c..1f043c3 100644 (file)
@@ -1116,6 +1116,7 @@ struct dwc3_scratchpad_array {
  * @dis_metastability_quirk: set to disable metastability quirk.
  * @dis_split_quirk: set to disable split boundary.
  * @wakeup_configured: set if the device is configured for remote wakeup.
+ * @suspended: set to track suspend event due to U3/L2.
  * @imod_interval: set the interrupt moderation interval in 250ns
  *                     increments or 0 to disable.
  * @max_cfg_eps: current max number of IN eps used across all USB configs.
@@ -1332,6 +1333,7 @@ struct dwc3 {
        unsigned                dis_split_quirk:1;
        unsigned                async_callbacks:1;
        unsigned                wakeup_configured:1;
+       unsigned                suspended:1;
 
        u16                     imod_interval;
 
index e4a2560..ebf0346 100644 (file)
@@ -332,6 +332,11 @@ static int dwc3_lsp_show(struct seq_file *s, void *unused)
        unsigned int            current_mode;
        unsigned long           flags;
        u32                     reg;
+       int                     ret;
+
+       ret = pm_runtime_resume_and_get(dwc->dev);
+       if (ret < 0)
+               return ret;
 
        spin_lock_irqsave(&dwc->lock, flags);
        reg = dwc3_readl(dwc->regs, DWC3_GSTS);
@@ -350,6 +355,8 @@ static int dwc3_lsp_show(struct seq_file *s, void *unused)
        }
        spin_unlock_irqrestore(&dwc->lock, flags);
 
+       pm_runtime_put_sync(dwc->dev);
+
        return 0;
 }
 
@@ -395,6 +402,11 @@ static int dwc3_mode_show(struct seq_file *s, void *unused)
        struct dwc3             *dwc = s->private;
        unsigned long           flags;
        u32                     reg;
+       int                     ret;
+
+       ret = pm_runtime_resume_and_get(dwc->dev);
+       if (ret < 0)
+               return ret;
 
        spin_lock_irqsave(&dwc->lock, flags);
        reg = dwc3_readl(dwc->regs, DWC3_GCTL);
@@ -414,6 +426,8 @@ static int dwc3_mode_show(struct seq_file *s, void *unused)
                seq_printf(s, "UNKNOWN %08x\n", DWC3_GCTL_PRTCAP(reg));
        }
 
+       pm_runtime_put_sync(dwc->dev);
+
        return 0;
 }
 
@@ -463,6 +477,11 @@ static int dwc3_testmode_show(struct seq_file *s, void *unused)
        struct dwc3             *dwc = s->private;
        unsigned long           flags;
        u32                     reg;
+       int                     ret;
+
+       ret = pm_runtime_resume_and_get(dwc->dev);
+       if (ret < 0)
+               return ret;
 
        spin_lock_irqsave(&dwc->lock, flags);
        reg = dwc3_readl(dwc->regs, DWC3_DCTL);
@@ -493,6 +512,8 @@ static int dwc3_testmode_show(struct seq_file *s, void *unused)
                seq_printf(s, "UNKNOWN %d\n", reg);
        }
 
+       pm_runtime_put_sync(dwc->dev);
+
        return 0;
 }
 
@@ -509,6 +530,7 @@ static ssize_t dwc3_testmode_write(struct file *file,
        unsigned long           flags;
        u32                     testmode = 0;
        char                    buf[32];
+       int                     ret;
 
        if (copy_from_user(&buf, ubuf, min_t(size_t, sizeof(buf) - 1, count)))
                return -EFAULT;
@@ -526,10 +548,16 @@ static ssize_t dwc3_testmode_write(struct file *file,
        else
                testmode = 0;
 
+       ret = pm_runtime_resume_and_get(dwc->dev);
+       if (ret < 0)
+               return ret;
+
        spin_lock_irqsave(&dwc->lock, flags);
        dwc3_gadget_set_test_mode(dwc, testmode);
        spin_unlock_irqrestore(&dwc->lock, flags);
 
+       pm_runtime_put_sync(dwc->dev);
+
        return count;
 }
 
@@ -548,12 +576,18 @@ static int dwc3_link_state_show(struct seq_file *s, void *unused)
        enum dwc3_link_state    state;
        u32                     reg;
        u8                      speed;
+       int                     ret;
+
+       ret = pm_runtime_resume_and_get(dwc->dev);
+       if (ret < 0)
+               return ret;
 
        spin_lock_irqsave(&dwc->lock, flags);
        reg = dwc3_readl(dwc->regs, DWC3_GSTS);
        if (DWC3_GSTS_CURMOD(reg) != DWC3_GSTS_CURMOD_DEVICE) {
                seq_puts(s, "Not available\n");
                spin_unlock_irqrestore(&dwc->lock, flags);
+               pm_runtime_put_sync(dwc->dev);
                return 0;
        }
 
@@ -566,6 +600,8 @@ static int dwc3_link_state_show(struct seq_file *s, void *unused)
                   dwc3_gadget_hs_link_string(state));
        spin_unlock_irqrestore(&dwc->lock, flags);
 
+       pm_runtime_put_sync(dwc->dev);
+
        return 0;
 }
 
@@ -584,6 +620,7 @@ static ssize_t dwc3_link_state_write(struct file *file,
        char                    buf[32];
        u32                     reg;
        u8                      speed;
+       int                     ret;
 
        if (copy_from_user(&buf, ubuf, min_t(size_t, sizeof(buf) - 1, count)))
                return -EFAULT;
@@ -603,10 +640,15 @@ static ssize_t dwc3_link_state_write(struct file *file,
        else
                return -EINVAL;
 
+       ret = pm_runtime_resume_and_get(dwc->dev);
+       if (ret < 0)
+               return ret;
+
        spin_lock_irqsave(&dwc->lock, flags);
        reg = dwc3_readl(dwc->regs, DWC3_GSTS);
        if (DWC3_GSTS_CURMOD(reg) != DWC3_GSTS_CURMOD_DEVICE) {
                spin_unlock_irqrestore(&dwc->lock, flags);
+               pm_runtime_put_sync(dwc->dev);
                return -EINVAL;
        }
 
@@ -616,12 +658,15 @@ static ssize_t dwc3_link_state_write(struct file *file,
        if (speed < DWC3_DSTS_SUPERSPEED &&
            state != DWC3_LINK_STATE_RECOV) {
                spin_unlock_irqrestore(&dwc->lock, flags);
+               pm_runtime_put_sync(dwc->dev);
                return -EINVAL;
        }
 
        dwc3_gadget_set_link_state(dwc, state);
        spin_unlock_irqrestore(&dwc->lock, flags);
 
+       pm_runtime_put_sync(dwc->dev);
+
        return count;
 }
 
@@ -645,6 +690,11 @@ static int dwc3_tx_fifo_size_show(struct seq_file *s, void *unused)
        unsigned long           flags;
        u32                     mdwidth;
        u32                     val;
+       int                     ret;
+
+       ret = pm_runtime_resume_and_get(dwc->dev);
+       if (ret < 0)
+               return ret;
 
        spin_lock_irqsave(&dwc->lock, flags);
        val = dwc3_core_fifo_space(dep, DWC3_TXFIFO);
@@ -657,6 +707,8 @@ static int dwc3_tx_fifo_size_show(struct seq_file *s, void *unused)
        seq_printf(s, "%u\n", val);
        spin_unlock_irqrestore(&dwc->lock, flags);
 
+       pm_runtime_put_sync(dwc->dev);
+
        return 0;
 }
 
@@ -667,6 +719,11 @@ static int dwc3_rx_fifo_size_show(struct seq_file *s, void *unused)
        unsigned long           flags;
        u32                     mdwidth;
        u32                     val;
+       int                     ret;
+
+       ret = pm_runtime_resume_and_get(dwc->dev);
+       if (ret < 0)
+               return ret;
 
        spin_lock_irqsave(&dwc->lock, flags);
        val = dwc3_core_fifo_space(dep, DWC3_RXFIFO);
@@ -679,6 +736,8 @@ static int dwc3_rx_fifo_size_show(struct seq_file *s, void *unused)
        seq_printf(s, "%u\n", val);
        spin_unlock_irqrestore(&dwc->lock, flags);
 
+       pm_runtime_put_sync(dwc->dev);
+
        return 0;
 }
 
@@ -688,12 +747,19 @@ static int dwc3_tx_request_queue_show(struct seq_file *s, void *unused)
        struct dwc3             *dwc = dep->dwc;
        unsigned long           flags;
        u32                     val;
+       int                     ret;
+
+       ret = pm_runtime_resume_and_get(dwc->dev);
+       if (ret < 0)
+               return ret;
 
        spin_lock_irqsave(&dwc->lock, flags);
        val = dwc3_core_fifo_space(dep, DWC3_TXREQQ);
        seq_printf(s, "%u\n", val);
        spin_unlock_irqrestore(&dwc->lock, flags);
 
+       pm_runtime_put_sync(dwc->dev);
+
        return 0;
 }
 
@@ -703,12 +769,19 @@ static int dwc3_rx_request_queue_show(struct seq_file *s, void *unused)
        struct dwc3             *dwc = dep->dwc;
        unsigned long           flags;
        u32                     val;
+       int                     ret;
+
+       ret = pm_runtime_resume_and_get(dwc->dev);
+       if (ret < 0)
+               return ret;
 
        spin_lock_irqsave(&dwc->lock, flags);
        val = dwc3_core_fifo_space(dep, DWC3_RXREQQ);
        seq_printf(s, "%u\n", val);
        spin_unlock_irqrestore(&dwc->lock, flags);
 
+       pm_runtime_put_sync(dwc->dev);
+
        return 0;
 }
 
@@ -718,12 +791,19 @@ static int dwc3_rx_info_queue_show(struct seq_file *s, void *unused)
        struct dwc3             *dwc = dep->dwc;
        unsigned long           flags;
        u32                     val;
+       int                     ret;
+
+       ret = pm_runtime_resume_and_get(dwc->dev);
+       if (ret < 0)
+               return ret;
 
        spin_lock_irqsave(&dwc->lock, flags);
        val = dwc3_core_fifo_space(dep, DWC3_RXINFOQ);
        seq_printf(s, "%u\n", val);
        spin_unlock_irqrestore(&dwc->lock, flags);
 
+       pm_runtime_put_sync(dwc->dev);
+
        return 0;
 }
 
@@ -733,12 +813,19 @@ static int dwc3_descriptor_fetch_queue_show(struct seq_file *s, void *unused)
        struct dwc3             *dwc = dep->dwc;
        unsigned long           flags;
        u32                     val;
+       int                     ret;
+
+       ret = pm_runtime_resume_and_get(dwc->dev);
+       if (ret < 0)
+               return ret;
 
        spin_lock_irqsave(&dwc->lock, flags);
        val = dwc3_core_fifo_space(dep, DWC3_DESCFETCHQ);
        seq_printf(s, "%u\n", val);
        spin_unlock_irqrestore(&dwc->lock, flags);
 
+       pm_runtime_put_sync(dwc->dev);
+
        return 0;
 }
 
@@ -748,12 +835,19 @@ static int dwc3_event_queue_show(struct seq_file *s, void *unused)
        struct dwc3             *dwc = dep->dwc;
        unsigned long           flags;
        u32                     val;
+       int                     ret;
+
+       ret = pm_runtime_resume_and_get(dwc->dev);
+       if (ret < 0)
+               return ret;
 
        spin_lock_irqsave(&dwc->lock, flags);
        val = dwc3_core_fifo_space(dep, DWC3_EVENTQ);
        seq_printf(s, "%u\n", val);
        spin_unlock_irqrestore(&dwc->lock, flags);
 
+       pm_runtime_put_sync(dwc->dev);
+
        return 0;
 }
 
@@ -798,6 +892,11 @@ static int dwc3_trb_ring_show(struct seq_file *s, void *unused)
        struct dwc3             *dwc = dep->dwc;
        unsigned long           flags;
        int                     i;
+       int                     ret;
+
+       ret = pm_runtime_resume_and_get(dwc->dev);
+       if (ret < 0)
+               return ret;
 
        spin_lock_irqsave(&dwc->lock, flags);
        if (dep->number <= 1) {
@@ -827,6 +926,8 @@ static int dwc3_trb_ring_show(struct seq_file *s, void *unused)
 out:
        spin_unlock_irqrestore(&dwc->lock, flags);
 
+       pm_runtime_put_sync(dwc->dev);
+
        return 0;
 }
 
@@ -839,6 +940,11 @@ static int dwc3_ep_info_register_show(struct seq_file *s, void *unused)
        u32                     lower_32_bits;
        u32                     upper_32_bits;
        u32                     reg;
+       int                     ret;
+
+       ret = pm_runtime_resume_and_get(dwc->dev);
+       if (ret < 0)
+               return ret;
 
        spin_lock_irqsave(&dwc->lock, flags);
        reg = DWC3_GDBGLSPMUX_EPSELECT(dep->number);
@@ -851,6 +957,8 @@ static int dwc3_ep_info_register_show(struct seq_file *s, void *unused)
        seq_printf(s, "0x%016llx\n", ep_info);
        spin_unlock_irqrestore(&dwc->lock, flags);
 
+       pm_runtime_put_sync(dwc->dev);
+
        return 0;
 }
 
@@ -910,6 +1018,7 @@ void dwc3_debugfs_init(struct dwc3 *dwc)
        dwc->regset->regs = dwc3_regs;
        dwc->regset->nregs = ARRAY_SIZE(dwc3_regs);
        dwc->regset->base = dwc->regs - DWC3_GLOBALS_REGS_START;
+       dwc->regset->dev = dwc->dev;
 
        root = debugfs_create_dir(dev_name(dwc->dev), usb_debug_root);
        dwc->debug_root = root;
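
All of the hunks in this file apply the same runtime-PM bracket around register access; a minimal sketch of the pattern, using an illustrative handler name rather than one of the driver's:

        static int dwc3_example_show(struct seq_file *s, void *unused)
        {
                struct dwc3     *dwc = s->private;
                int             ret;

                /* make sure the controller is powered before touching registers */
                ret = pm_runtime_resume_and_get(dwc->dev);
                if (ret < 0)
                        return ret;

                /* ... read registers under dwc->lock and seq_printf() the result ... */

                pm_runtime_put_sync(dwc->dev);
                return 0;
        }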
index 959fc92..79b22ab 100644 (file)
@@ -308,7 +308,16 @@ static void dwc3_qcom_interconnect_exit(struct dwc3_qcom *qcom)
 /* Only usable in contexts where the role can not change. */
 static bool dwc3_qcom_is_host(struct dwc3_qcom *qcom)
 {
-       struct dwc3 *dwc = platform_get_drvdata(qcom->dwc3);
+       struct dwc3 *dwc;
+
+       /*
+        * FIXME: Fix this layering violation.
+        */
+       dwc = platform_get_drvdata(qcom->dwc3);
+
+       /* Core driver may not have probed yet. */
+       if (!dwc)
+               return false;
 
        return dwc->xhci;
 }
index c0ca4d1..b78599d 100644 (file)
@@ -198,6 +198,7 @@ static void dwc3_gadget_del_and_unmap_request(struct dwc3_ep *dep,
        list_del(&req->list);
        req->remaining = 0;
        req->needs_extra_trb = false;
+       req->num_trbs = 0;
 
        if (req->request.status == -EINPROGRESS)
                req->request.status = status;
@@ -2440,6 +2441,7 @@ static int dwc3_gadget_func_wakeup(struct usb_gadget *g, int intf_id)
                        return -EINVAL;
                }
                dwc3_resume_gadget(dwc);
+               dwc->suspended = false;
                dwc->link_state = DWC3_LINK_STATE_U0;
        }
 
@@ -2699,6 +2701,21 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
        return ret;
 }
 
+static int dwc3_gadget_soft_connect(struct dwc3 *dwc)
+{
+       /*
+        * The Synopsys DWC_usb31 1.90a programming guide, section
+        * 4.1.9, specifies that a reconnect after a device-initiated
+        * disconnect requires a core soft reset (DCTL.CSftRst) before
+        * enabling the run/stop bit.
+        */
+       dwc3_core_soft_reset(dwc);
+
+       dwc3_event_buffers_setup(dwc);
+       __dwc3_gadget_start(dwc);
+       return dwc3_gadget_run_stop(dwc, true);
+}
+
 static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
 {
        struct dwc3             *dwc = gadget_to_dwc(g);
@@ -2737,21 +2754,10 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
 
        synchronize_irq(dwc->irq_gadget);
 
-       if (!is_on) {
+       if (!is_on)
                ret = dwc3_gadget_soft_disconnect(dwc);
-       } else {
-               /*
-                * In the Synopsys DWC_usb31 1.90a programming guide section
-                * 4.1.9, it specifies that for a reconnect after a
-                * device-initiated disconnect requires a core soft reset
-                * (DCTL.CSftRst) before enabling the run/stop bit.
-                */
-               dwc3_core_soft_reset(dwc);
-
-               dwc3_event_buffers_setup(dwc);
-               __dwc3_gadget_start(dwc);
-               ret = dwc3_gadget_run_stop(dwc, true);
-       }
+       else
+               ret = dwc3_gadget_soft_connect(dwc);
 
        pm_runtime_put(dwc->dev);
 
@@ -3938,6 +3944,8 @@ static void dwc3_gadget_disconnect_interrupt(struct dwc3 *dwc)
 {
        int                     reg;
 
+       dwc->suspended = false;
+
        dwc3_gadget_set_link_state(dwc, DWC3_LINK_STATE_RX_DET);
 
        reg = dwc3_readl(dwc->regs, DWC3_DCTL);
@@ -3962,6 +3970,8 @@ static void dwc3_gadget_reset_interrupt(struct dwc3 *dwc)
 {
        u32                     reg;
 
+       dwc->suspended = false;
+
        /*
         * Ideally, dwc3_reset_gadget() would trigger the function
         * drivers to stop any active transfers through ep disable.
@@ -4180,6 +4190,8 @@ static void dwc3_gadget_conndone_interrupt(struct dwc3 *dwc)
 
 static void dwc3_gadget_wakeup_interrupt(struct dwc3 *dwc, unsigned int evtinfo)
 {
+       dwc->suspended = false;
+
        /*
         * TODO take core out of low power mode when that's
         * implemented.
@@ -4277,6 +4289,7 @@ static void dwc3_gadget_linksts_change_interrupt(struct dwc3 *dwc,
                if (dwc->gadget->wakeup_armed) {
                        dwc3_gadget_enable_linksts_evts(dwc, false);
                        dwc3_resume_gadget(dwc);
+                       dwc->suspended = false;
                }
                break;
        case DWC3_LINK_STATE_U1:
@@ -4303,8 +4316,10 @@ static void dwc3_gadget_suspend_interrupt(struct dwc3 *dwc,
 {
        enum dwc3_link_state next = evtinfo & DWC3_LINK_STATE_MASK;
 
-       if (dwc->link_state != next && next == DWC3_LINK_STATE_U3)
+       if (!dwc->suspended && next == DWC3_LINK_STATE_U3) {
+               dwc->suspended = true;
                dwc3_suspend_gadget(dwc);
+       }
 
        dwc->link_state = next;
 }
@@ -4655,42 +4670,39 @@ void dwc3_gadget_exit(struct dwc3 *dwc)
 int dwc3_gadget_suspend(struct dwc3 *dwc)
 {
        unsigned long flags;
+       int ret;
 
        if (!dwc->gadget_driver)
                return 0;
 
-       dwc3_gadget_run_stop(dwc, false);
+       ret = dwc3_gadget_soft_disconnect(dwc);
+       if (ret)
+               goto err;
 
        spin_lock_irqsave(&dwc->lock, flags);
        dwc3_disconnect_gadget(dwc);
-       __dwc3_gadget_stop(dwc);
        spin_unlock_irqrestore(&dwc->lock, flags);
 
        return 0;
+
+err:
+       /*
+        * Attempt to reset the controller's state. Likely no
+        * communication can be established until the host
+        * performs a port reset.
+        */
+       if (dwc->softconnect)
+               dwc3_gadget_soft_connect(dwc);
+
+       return ret;
 }
 
 int dwc3_gadget_resume(struct dwc3 *dwc)
 {
-       int                     ret;
-
        if (!dwc->gadget_driver || !dwc->softconnect)
                return 0;
 
-       ret = __dwc3_gadget_start(dwc);
-       if (ret < 0)
-               goto err0;
-
-       ret = dwc3_gadget_run_stop(dwc, true);
-       if (ret < 0)
-               goto err1;
-
-       return 0;
-
-err1:
-       __dwc3_gadget_stop(dwc);
-
-err0:
-       return ret;
+       return dwc3_gadget_soft_connect(dwc);
 }
 
 void dwc3_gadget_process_pending_events(struct dwc3 *dwc)
index a13c946..f41a385 100644 (file)
@@ -3535,6 +3535,7 @@ static void ffs_func_unbind(struct usb_configuration *c,
        /* Drain any pending AIO completions */
        drain_workqueue(ffs->io_completion_wq);
 
+       ffs_event_add(ffs, FUNCTIONFS_UNBIND);
        if (!--opts->refcnt)
                functionfs_unbind(ffs);
 
@@ -3559,7 +3560,6 @@ static void ffs_func_unbind(struct usb_configuration *c,
        func->function.ssp_descriptors = NULL;
        func->interfaces_nums = NULL;
 
-       ffs_event_add(ffs, FUNCTIONFS_UNBIND);
 }
 
 static struct usb_function *ffs_alloc(struct usb_function_instance *fi)
index 6956ad8..a366abb 100644 (file)
@@ -17,6 +17,7 @@
 #include <linux/etherdevice.h>
 #include <linux/ethtool.h>
 #include <linux/if_vlan.h>
+#include <linux/string_helpers.h>
 #include <linux/usb/composite.h>
 
 #include "u_ether.h"
@@ -965,6 +966,8 @@ int gether_get_host_addr_cdc(struct net_device *net, char *host_addr, int len)
        dev = netdev_priv(net);
        snprintf(host_addr, len, "%pm", dev->host_mac);
 
+       string_upper(host_addr, host_addr);
+
        return strlen(host_addr);
 }
 EXPORT_SYMBOL_GPL(gether_get_host_addr_cdc);
index c80f9bd..a36913a 100644 (file)
@@ -170,6 +170,9 @@ static int udc_pci_probe(
                retval = -ENODEV;
                goto err_probe;
        }
+
+       udc = dev;
+
        return 0;
 
 err_probe:
index 4641153..83fd1de 100644 (file)
@@ -37,10 +37,14 @@ static const struct bus_type gadget_bus_type;
  * @vbus: for udcs who care about vbus status, this value is real vbus status;
  * for udcs who do not care about vbus status, this value is always true
  * @started: the UDC's started state. True if the UDC had started.
- * @connect_lock: protects udc->vbus, udc->started, gadget->connect, gadget->deactivate related
- * functions. usb_gadget_connect_locked, usb_gadget_disconnect_locked,
- * usb_udc_connect_control_locked, usb_gadget_udc_start_locked, usb_gadget_udc_stop_locked are
- * called with this lock held.
+ * @allow_connect: Indicates whether UDC is allowed to be pulled up.
+ * Set/cleared by gadget_(un)bind_driver() after gadget driver is bound or
+ * unbound.
+ * @connect_lock: protects udc->started, gadget->connect,
+ * gadget->allow_connect and gadget->deactivate. The routines
+ * usb_gadget_connect_locked(), usb_gadget_disconnect_locked(),
+ * usb_udc_connect_control_locked(), usb_gadget_udc_start_locked() and
+ * usb_gadget_udc_stop_locked() are called with this lock held.
  *
  * This represents the internal data structure which is used by the UDC-class
  * to hold information about udc driver and gadget together.
@@ -52,6 +56,8 @@ struct usb_udc {
        struct list_head                list;
        bool                            vbus;
        bool                            started;
+       bool                            allow_connect;
+       struct work_struct              vbus_work;
        struct mutex                    connect_lock;
 };
 
@@ -692,7 +698,6 @@ out:
 }
 EXPORT_SYMBOL_GPL(usb_gadget_vbus_disconnect);
 
-/* Internal version of usb_gadget_connect needs to be called with connect_lock held. */
 static int usb_gadget_connect_locked(struct usb_gadget *gadget)
        __must_hold(&gadget->udc->connect_lock)
 {
@@ -703,15 +708,12 @@ static int usb_gadget_connect_locked(struct usb_gadget *gadget)
                goto out;
        }
 
-       if (gadget->connected)
-               goto out;
-
-       if (gadget->deactivated || !gadget->udc->started) {
+       if (gadget->deactivated || !gadget->udc->allow_connect || !gadget->udc->started) {
                /*
-                * If gadget is deactivated we only save new state.
-                * Gadget will be connected automatically after activation.
-                *
-                * udc first needs to be started before gadget can be pulled up.
+                * If the gadget isn't usable (because it is deactivated,
+                * unbound, or not yet started), we only save the new state.
+                * The gadget will be connected automatically when it is
+                * activated/bound/started.
                 */
                gadget->connected = true;
                goto out;
@@ -749,7 +751,6 @@ int usb_gadget_connect(struct usb_gadget *gadget)
 }
 EXPORT_SYMBOL_GPL(usb_gadget_connect);
 
-/* Internal version of usb_gadget_disconnect needs to be called with connect_lock held. */
 static int usb_gadget_disconnect_locked(struct usb_gadget *gadget)
        __must_hold(&gadget->udc->connect_lock)
 {
@@ -767,8 +768,6 @@ static int usb_gadget_disconnect_locked(struct usb_gadget *gadget)
                /*
                 * If gadget is deactivated we only save new state.
                 * Gadget will stay disconnected after activation.
-                *
-                * udc should have been started before gadget being pulled down.
                 */
                gadget->connected = false;
                goto out;
@@ -829,10 +828,10 @@ int usb_gadget_deactivate(struct usb_gadget *gadget)
 {
        int ret = 0;
 
+       mutex_lock(&gadget->udc->connect_lock);
        if (gadget->deactivated)
-               goto out;
+               goto unlock;
 
-       mutex_lock(&gadget->udc->connect_lock);
        if (gadget->connected) {
                ret = usb_gadget_disconnect_locked(gadget);
                if (ret)
@@ -848,7 +847,6 @@ int usb_gadget_deactivate(struct usb_gadget *gadget)
 
 unlock:
        mutex_unlock(&gadget->udc->connect_lock);
-out:
        trace_usb_gadget_deactivate(gadget, ret);
 
        return ret;
@@ -868,10 +866,10 @@ int usb_gadget_activate(struct usb_gadget *gadget)
 {
        int ret = 0;
 
+       mutex_lock(&gadget->udc->connect_lock);
        if (!gadget->deactivated)
-               goto out;
+               goto unlock;
 
-       mutex_lock(&gadget->udc->connect_lock);
        gadget->deactivated = false;
 
        /*
@@ -882,7 +880,8 @@ int usb_gadget_activate(struct usb_gadget *gadget)
                ret = usb_gadget_connect_locked(gadget);
        mutex_unlock(&gadget->udc->connect_lock);
 
-out:
+unlock:
+       mutex_unlock(&gadget->udc->connect_lock);
        trace_usb_gadget_activate(gadget, ret);
 
        return ret;
@@ -1124,12 +1123,21 @@ EXPORT_SYMBOL_GPL(usb_gadget_set_state);
 /* Acquire connect_lock before calling this function. */
 static void usb_udc_connect_control_locked(struct usb_udc *udc) __must_hold(&udc->connect_lock)
 {
-       if (udc->vbus && udc->started)
+       if (udc->vbus)
                usb_gadget_connect_locked(udc->gadget);
        else
                usb_gadget_disconnect_locked(udc->gadget);
 }
 
+static void vbus_event_work(struct work_struct *work)
+{
+       struct usb_udc *udc = container_of(work, struct usb_udc, vbus_work);
+
+       mutex_lock(&udc->connect_lock);
+       usb_udc_connect_control_locked(udc);
+       mutex_unlock(&udc->connect_lock);
+}
+
 /**
  * usb_udc_vbus_handler - updates the udc core vbus status, and try to
  * connect or disconnect gadget
@@ -1138,17 +1146,23 @@ static void usb_udc_connect_control_locked(struct usb_udc *udc) __must_hold(&udc
  *
  * The udc driver calls it when it wants to connect or disconnect gadget
  * according to vbus status.
+ *
+ * This function can be invoked from interrupt context by irq handlers of
+ * the gadget drivers; however, usb_udc_connect_control() has to run in
+ * non-atomic context for the following reasons:
+ * a. Some of the gadget driver implementations expect the ->pullup
+ * callback to be invoked in non-atomic context.
+ * b. usb_gadget_disconnect() acquires udc_lock, which is a mutex.
+ * Hence the invocation of usb_udc_connect_control() is offloaded to a
+ * workqueue.
  */
 void usb_udc_vbus_handler(struct usb_gadget *gadget, bool status)
 {
        struct usb_udc *udc = gadget->udc;
 
-       mutex_lock(&udc->connect_lock);
        if (udc) {
                udc->vbus = status;
-               usb_udc_connect_control_locked(udc);
+               schedule_work(&udc->vbus_work);
        }
-       mutex_unlock(&udc->connect_lock);
 }
 EXPORT_SYMBOL_GPL(usb_udc_vbus_handler);
 
@@ -1381,6 +1395,7 @@ int usb_add_gadget(struct usb_gadget *gadget)
        mutex_lock(&udc_lock);
        list_add_tail(&udc->list, &udc_list);
        mutex_unlock(&udc_lock);
+       INIT_WORK(&udc->vbus_work, vbus_event_work);
 
        ret = device_add(&udc->dev);
        if (ret)
@@ -1512,6 +1527,7 @@ void usb_del_gadget(struct usb_gadget *gadget)
        flush_work(&gadget->work);
        device_del(&gadget->dev);
        ida_free(&gadget_id_numbers, gadget->id_number);
+       cancel_work_sync(&udc->vbus_work);
        device_unregister(&udc->dev);
 }
 EXPORT_SYMBOL_GPL(usb_del_gadget);
@@ -1583,6 +1599,7 @@ static int gadget_bind_driver(struct device *dev)
                goto err_start;
        }
        usb_gadget_enable_async_callbacks(udc);
+       udc->allow_connect = true;
        usb_udc_connect_control_locked(udc);
        mutex_unlock(&udc->connect_lock);
 
@@ -1615,6 +1632,8 @@ static void gadget_unbind_driver(struct device *dev)
 
        kobject_uevent(&udc->dev.kobj, KOBJ_CHANGE);
 
+       udc->allow_connect = false;
+       cancel_work_sync(&udc->vbus_work);
        mutex_lock(&udc->connect_lock);
        usb_gadget_disconnect_locked(gadget);
        usb_gadget_disable_async_callbacks(udc);
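
With the connect/disconnect work moved to vbus_work, usb_udc_vbus_handler() can be called straight from a UDC driver's VBUS interrupt handler; a hedged sketch, where example_udc and its register-read helper are hypothetical:

        static irqreturn_t example_udc_vbus_irq(int irq, void *data)
        {
                struct example_udc *udc = data;
                bool vbus = example_udc_vbus_present(udc);      /* hypothetical register read */

                /* safe in atomic context: pullup handling is deferred to vbus_work */
                usb_udc_vbus_handler(&udc->gadget, vbus);

                return IRQ_HANDLED;
        }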
index aac8bc1..eb008e8 100644 (file)
@@ -2877,9 +2877,9 @@ static int renesas_usb3_probe(struct platform_device *pdev)
                struct rzv2m_usb3drd *ddata = dev_get_drvdata(pdev->dev.parent);
 
                usb3->drd_reg = ddata->reg;
-               ret = devm_request_irq(ddata->dev, ddata->drd_irq,
+               ret = devm_request_irq(&pdev->dev, ddata->drd_irq,
                                       renesas_usb3_otg_irq, 0,
-                                      dev_name(ddata->dev), usb3);
+                                      dev_name(&pdev->dev), usb3);
                if (ret < 0)
                        return ret;
        }
index 3592f75..7bd2fdd 100644 (file)
@@ -119,11 +119,13 @@ static int uhci_pci_init(struct usb_hcd *hcd)
 
        uhci->rh_numports = uhci_count_ports(hcd);
 
-       /* Intel controllers report the OverCurrent bit active on.
-        * VIA controllers report it active off, so we'll adjust the
-        * bit value.  (It's not standardized in the UHCI spec.)
+       /*
+        * Intel controllers report the OverCurrent bit active on.  VIA
+        * and ZHAOXIN controllers report it active off, so we'll adjust
+        * the bit value.  (It's not standardized in the UHCI spec.)
         */
-       if (to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_VIA)
+       if (to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_VIA ||
+                       to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_ZHAOXIN)
                uhci->oc_low = 1;
 
        /* HP's server management chip requires a longer port reset delay. */
index ddb79f2..79b3691 100644 (file)
@@ -13,6 +13,7 @@
 #include <linux/module.h>
 #include <linux/acpi.h>
 #include <linux/reset.h>
+#include <linux/suspend.h>
 
 #include "xhci.h"
 #include "xhci-trace.h"
@@ -387,7 +388,7 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
 
        if (pdev->vendor == PCI_VENDOR_ID_AMD &&
                pdev->device == PCI_DEVICE_ID_AMD_RENOIR_XHCI)
-               xhci->quirks |= XHCI_BROKEN_D3COLD;
+               xhci->quirks |= XHCI_BROKEN_D3COLD_S2I;
 
        if (pdev->vendor == PCI_VENDOR_ID_INTEL) {
                xhci->quirks |= XHCI_LPM_SUPPORT;
@@ -801,9 +802,16 @@ static int xhci_pci_suspend(struct usb_hcd *hcd, bool do_wakeup)
         * Systems with the TI redriver that loses port status change events
         * need to have the registers polled during D3, so avoid D3cold.
         */
-       if (xhci->quirks & (XHCI_COMP_MODE_QUIRK | XHCI_BROKEN_D3COLD))
+       if (xhci->quirks & XHCI_COMP_MODE_QUIRK)
                pci_d3cold_disable(pdev);
 
+#ifdef CONFIG_SUSPEND
+       /* d3cold is broken, but only when s2idle is used */
+       if (pm_suspend_target_state == PM_SUSPEND_TO_IDLE &&
+           xhci->quirks & (XHCI_BROKEN_D3COLD_S2I))
+               pci_d3cold_disable(pdev);
+#endif
+
        if (xhci->quirks & XHCI_PME_STUCK_QUIRK)
                xhci_pme_quirk(hcd);
 
index 1ad12d5..2bc82b3 100644 (file)
@@ -276,6 +276,26 @@ static void inc_enq(struct xhci_hcd *xhci, struct xhci_ring *ring,
        trace_xhci_inc_enq(ring);
 }
 
+static int xhci_num_trbs_to(struct xhci_segment *start_seg, union xhci_trb *start,
+                           struct xhci_segment *end_seg, union xhci_trb *end,
+                           unsigned int num_segs)
+{
+       union xhci_trb *last_on_seg;
+       int num = 0;
+       int i = 0;
+
+       do {
+               if (start_seg == end_seg && end >= start)
+                       return num + (end - start);
+               last_on_seg = &start_seg->trbs[TRBS_PER_SEGMENT - 1];
+               num += last_on_seg - start;
+               start_seg = start_seg->next;
+               start = start_seg->trbs;
+       } while (i++ <= num_segs);
+
+       return -EINVAL;
+}
+
 /*
  * Check to see if there's room to enqueue num_trbs on the ring and make sure
  * enqueue pointer will not advance into dequeue segment. See rules above.
@@ -2140,6 +2160,7 @@ static int finish_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
                     u32 trb_comp_code)
 {
        struct xhci_ep_ctx *ep_ctx;
+       int trbs_freed;
 
        ep_ctx = xhci_get_ep_ctx(xhci, ep->vdev->out_ctx, ep->ep_index);
 
@@ -2209,9 +2230,15 @@ static int finish_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
        }
 
        /* Update ring dequeue pointer */
+       trbs_freed = xhci_num_trbs_to(ep_ring->deq_seg, ep_ring->dequeue,
+                                     td->last_trb_seg, td->last_trb,
+                                     ep_ring->num_segs);
+       if (trbs_freed < 0)
+               xhci_dbg(xhci, "Failed to count freed trbs at TD finish\n");
+       else
+               ep_ring->num_trbs_free += trbs_freed;
        ep_ring->dequeue = td->last_trb;
        ep_ring->deq_seg = td->last_trb_seg;
-       ep_ring->num_trbs_free += td->num_trbs - 1;
        inc_deq(xhci, ep_ring);
 
        return xhci_td_cleanup(xhci, td, ep_ring, td->status);
index 08d7219..6b690ec 100644 (file)
@@ -1901,7 +1901,7 @@ struct xhci_hcd {
 #define XHCI_DISABLE_SPARSE    BIT_ULL(38)
 #define XHCI_SG_TRB_CACHE_SIZE_QUIRK   BIT_ULL(39)
 #define XHCI_NO_SOFT_RETRY     BIT_ULL(40)
-#define XHCI_BROKEN_D3COLD     BIT_ULL(41)
+#define XHCI_BROKEN_D3COLD_S2I BIT_ULL(41)
 #define XHCI_EP_CTX_BROKEN_DCS BIT_ULL(42)
 #define XHCI_SUSPEND_RESUME_CLKS       BIT_ULL(43)
 #define XHCI_RESET_TO_DEFAULT  BIT_ULL(44)
index 644a554..fd42e3a 100644 (file)
@@ -248,6 +248,8 @@ static void option_instat_callback(struct urb *urb);
 #define QUECTEL_VENDOR_ID                      0x2c7c
 /* These Quectel products use Quectel's vendor ID */
 #define QUECTEL_PRODUCT_EC21                   0x0121
+#define QUECTEL_PRODUCT_EM061K_LTA             0x0123
+#define QUECTEL_PRODUCT_EM061K_LMS             0x0124
 #define QUECTEL_PRODUCT_EC25                   0x0125
 #define QUECTEL_PRODUCT_EG91                   0x0191
 #define QUECTEL_PRODUCT_EG95                   0x0195
@@ -266,6 +268,8 @@ static void option_instat_callback(struct urb *urb);
 #define QUECTEL_PRODUCT_RM520N                 0x0801
 #define QUECTEL_PRODUCT_EC200U                 0x0901
 #define QUECTEL_PRODUCT_EC200S_CN              0x6002
+#define QUECTEL_PRODUCT_EM061K_LWW             0x6008
+#define QUECTEL_PRODUCT_EM061K_LCN             0x6009
 #define QUECTEL_PRODUCT_EC200T                 0x6026
 #define QUECTEL_PRODUCT_RM500K                 0x7001
 
@@ -1189,6 +1193,18 @@ static const struct usb_device_id option_ids[] = {
        { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EM060K, 0xff, 0x00, 0x40) },
        { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EM060K, 0xff, 0xff, 0x30) },
        { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EM060K, 0xff, 0xff, 0x40) },
+       { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EM061K_LCN, 0xff, 0xff, 0x30) },
+       { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EM061K_LCN, 0xff, 0x00, 0x40) },
+       { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EM061K_LCN, 0xff, 0xff, 0x40) },
+       { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EM061K_LMS, 0xff, 0xff, 0x30) },
+       { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EM061K_LMS, 0xff, 0x00, 0x40) },
+       { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EM061K_LMS, 0xff, 0xff, 0x40) },
+       { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EM061K_LTA, 0xff, 0xff, 0x30) },
+       { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EM061K_LTA, 0xff, 0x00, 0x40) },
+       { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EM061K_LTA, 0xff, 0xff, 0x40) },
+       { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EM061K_LWW, 0xff, 0xff, 0x30) },
+       { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EM061K_LWW, 0xff, 0x00, 0x40) },
+       { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EM061K_LWW, 0xff, 0xff, 0x40) },
        { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EM12, 0xff, 0xff, 0xff),
          .driver_info = RSVD(1) | RSVD(2) | RSVD(3) | RSVD(4) | NUMEP2 },
        { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EM12, 0xff, 0, 0) },
index 8931df5..c54e980 100644 (file)
@@ -406,22 +406,25 @@ static DEF_SCSI_QCMD(queuecommand)
  ***********************************************************************/
 
 /* Command timeout and abort */
-static int command_abort(struct scsi_cmnd *srb)
+static int command_abort_matching(struct us_data *us, struct scsi_cmnd *srb_match)
 {
-       struct us_data *us = host_to_us(srb->device->host);
-
-       usb_stor_dbg(us, "%s called\n", __func__);
-
        /*
         * us->srb together with the TIMED_OUT, RESETTING, and ABORTING
         * bits are protected by the host lock.
         */
        scsi_lock(us_to_host(us));
 
-       /* Is this command still active? */
-       if (us->srb != srb) {
+       /* Is there any pending command to abort? */
+       if (!us->srb) {
                scsi_unlock(us_to_host(us));
                usb_stor_dbg(us, "-- nothing to abort\n");
+               return SUCCESS;
+       }
+
+       /* Does the command match the passed srb, if any? */
+       if (srb_match && us->srb != srb_match) {
+               scsi_unlock(us_to_host(us));
+               usb_stor_dbg(us, "-- pending command mismatch\n");
                return FAILED;
        }
 
@@ -444,6 +447,14 @@ static int command_abort(struct scsi_cmnd *srb)
        return SUCCESS;
 }
 
+static int command_abort(struct scsi_cmnd *srb)
+{
+       struct us_data *us = host_to_us(srb->device->host);
+
+       usb_stor_dbg(us, "%s called\n", __func__);
+       return command_abort_matching(us, srb);
+}
+
 /*
  * This invokes the transport reset mechanism to reset the state of the
  * device
@@ -455,6 +466,9 @@ static int device_reset(struct scsi_cmnd *srb)
 
        usb_stor_dbg(us, "%s called\n", __func__);
 
+       /* abort any pending command before reset */
+       command_abort_matching(us, NULL);
+
        /* lock the device pointers and do the reset */
        mutex_lock(&(us->dev_mutex));
        result = us->transport_reset(us);
index 8f3e884..66de880 100644 (file)
@@ -516,6 +516,10 @@ static ssize_t pin_assignment_show(struct device *dev,
 
        mutex_unlock(&dp->lock);
 
+       /* get_current_pin_assignments can return 0 when no matching pin assignments are found */
+       if (len == 0)
+               len++;
+
        buf[len - 1] = '\n';
        return len;
 }
index 0bcde1f..8cc66e4 100644 (file)
@@ -95,7 +95,7 @@ peak_current_show(struct device *dev, struct device_attribute *attr, char *buf)
 static ssize_t
 fast_role_swap_current_show(struct device *dev, struct device_attribute *attr, char *buf)
 {
-       return sysfs_emit(buf, "%u\n", to_pdo(dev)->pdo >> PDO_FIXED_FRS_CURR_SHIFT) & 3;
+       return sysfs_emit(buf, "%u\n", (to_pdo(dev)->pdo >> PDO_FIXED_FRS_CURR_SHIFT) & 3);
 }
 static DEVICE_ATTR_RO(fast_role_swap_current);
 
index 8b075ca..603dbd4 100644 (file)
@@ -886,6 +886,9 @@ static void tps6598x_remove(struct i2c_client *client)
 {
        struct tps6598x *tps = i2c_get_clientdata(client);
 
+       if (!client->irq)
+               cancel_delayed_work_sync(&tps->wq_poll);
+
        tps6598x_disconnect(tps, 0);
        typec_unregister_port(tps->port);
        usb_role_switch_put(tps->role_sw);
@@ -917,7 +920,7 @@ static int __maybe_unused tps6598x_resume(struct device *dev)
                enable_irq(client->irq);
        }
 
-       if (client->irq)
+       if (!client->irq)
                queue_delayed_work(system_power_efficient_wq, &tps->wq_poll,
                                   msecs_to_jiffies(POLL_INTERVAL));
 
index 2b472ec..b664ecb 100644 (file)
@@ -132,10 +132,8 @@ static int ucsi_exec_command(struct ucsi *ucsi, u64 cmd)
        if (ret)
                return ret;
 
-       if (cci & UCSI_CCI_BUSY) {
-               ucsi->ops->async_write(ucsi, UCSI_CANCEL, NULL, 0);
-               return -EBUSY;
-       }
+       if (cmd != UCSI_CANCEL && cci & UCSI_CCI_BUSY)
+               return ucsi_exec_command(ucsi, UCSI_CANCEL);
 
        if (!(cci & UCSI_CCI_COMMAND_COMPLETE))
                return -EIO;
@@ -149,6 +147,11 @@ static int ucsi_exec_command(struct ucsi *ucsi, u64 cmd)
                return ucsi_read_error(ucsi);
        }
 
+       if (cmd == UCSI_CANCEL && cci & UCSI_CCI_CANCEL_COMPLETE) {
+               ret = ucsi_acknowledge_command(ucsi);
+               return ret ? ret : -EBUSY;
+       }
+
        return UCSI_CCI_LENGTH(cci);
 }
 
index e29e32b..279ac6a 100644 (file)
@@ -3349,10 +3349,10 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *
        mlx5_vdpa_remove_debugfs(ndev->debugfs);
        ndev->debugfs = NULL;
        unregister_link_notifier(ndev);
+       _vdpa_unregister_device(dev);
        wq = mvdev->wq;
        mvdev->wq = NULL;
        destroy_workqueue(wq);
-       _vdpa_unregister_device(dev);
        mgtdev->ndev = NULL;
 }
 
index de97e38..5f5c216 100644 (file)
@@ -1685,6 +1685,9 @@ static bool vduse_validate_config(struct vduse_dev_config *config)
        if (config->vq_num > 0xffff)
                return false;
 
+       if (!config->name[0])
+               return false;
+
        if (!device_is_allowed(config->device_id))
                return false;
 
index 3d4dd94..0d2f805 100644 (file)
@@ -860,6 +860,11 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
                if (ret)
                        goto pin_unwind;
 
+               if (!pfn_valid(phys_pfn)) {
+                       ret = -EINVAL;
+                       goto pin_unwind;
+               }
+
                ret = vfio_add_to_pfn_list(dma, iova, phys_pfn);
                if (ret) {
                        if (put_pfn(phys_pfn, dma->prot) && do_accounting)
index 07181cd..ae22731 100644 (file)
@@ -935,13 +935,18 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 
                err = sock->ops->sendmsg(sock, &msg, len);
                if (unlikely(err < 0)) {
+                       bool retry = err == -EAGAIN || err == -ENOMEM || err == -ENOBUFS;
+
                        if (zcopy_used) {
                                if (vq->heads[ubuf->desc].len == VHOST_DMA_IN_PROGRESS)
                                        vhost_net_ubuf_put(ubufs);
-                               nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
-                                       % UIO_MAXIOV;
+                               if (retry)
+                                       nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
+                                               % UIO_MAXIOV;
+                               else
+                                       vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
                        }
-                       if (err == -EAGAIN || err == -ENOMEM || err == -ENOBUFS) {
+                       if (retry) {
                                vhost_discard_vq_desc(vq, 1);
                                vhost_net_enable_vq(net, vq);
                                break;
index 8c1aefc..bf77924 100644 (file)
@@ -407,7 +407,10 @@ static long vhost_vdpa_set_features(struct vhost_vdpa *v, u64 __user *featurep)
 {
        struct vdpa_device *vdpa = v->vdpa;
        const struct vdpa_config_ops *ops = vdpa->config;
+       struct vhost_dev *d = &v->vdev;
+       u64 actual_features;
        u64 features;
+       int i;
 
        /*
         * It's not allowed to change the features after they have
@@ -422,6 +425,16 @@ static long vhost_vdpa_set_features(struct vhost_vdpa *v, u64 __user *featurep)
        if (vdpa_set_features(vdpa, features))
                return -EINVAL;
 
+       /* let the vqs know what has been configured */
+       actual_features = ops->get_driver_features(vdpa);
+       for (i = 0; i < d->nvqs; ++i) {
+               struct vhost_virtqueue *vq = d->vqs[i];
+
+               mutex_lock(&vq->mutex);
+               vq->acked_features = actual_features;
+               mutex_unlock(&vq->mutex);
+       }
+
        return 0;
 }
 
@@ -594,7 +607,14 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
                if (r)
                        return r;
 
-               vq->last_avail_idx = vq_state.split.avail_index;
+               if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
+                       vq->last_avail_idx = vq_state.packed.last_avail_idx |
+                                            (vq_state.packed.last_avail_counter << 15);
+                       vq->last_used_idx = vq_state.packed.last_used_idx |
+                                           (vq_state.packed.last_used_counter << 15);
+               } else {
+                       vq->last_avail_idx = vq_state.split.avail_index;
+               }
                break;
        }
 
@@ -612,9 +632,15 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
                break;
 
        case VHOST_SET_VRING_BASE:
-               vq_state.split.avail_index = vq->last_avail_idx;
-               if (ops->set_vq_state(vdpa, idx, &vq_state))
-                       r = -EINVAL;
+               if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
+                       vq_state.packed.last_avail_idx = vq->last_avail_idx & 0x7fff;
+                       vq_state.packed.last_avail_counter = !!(vq->last_avail_idx & 0x8000);
+                       vq_state.packed.last_used_idx = vq->last_used_idx & 0x7fff;
+                       vq_state.packed.last_used_counter = !!(vq->last_used_idx & 0x8000);
+               } else {
+                       vq_state.split.avail_index = vq->last_avail_idx;
+               }
+               r = ops->set_vq_state(vdpa, idx, &vq_state);
                break;
 
        case VHOST_SET_VRING_CALL:
index a92af08..60c9ebd 100644 (file)
@@ -235,7 +235,7 @@ void vhost_dev_flush(struct vhost_dev *dev)
 {
        struct vhost_flush_struct flush;
 
-       if (dev->worker) {
+       if (dev->worker.vtsk) {
                init_completion(&flush.wait_event);
                vhost_work_init(&flush.work, vhost_flush_work);
 
@@ -247,7 +247,7 @@ EXPORT_SYMBOL_GPL(vhost_dev_flush);
 
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 {
-       if (!dev->worker)
+       if (!dev->worker.vtsk)
                return;
 
        if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
@@ -255,8 +255,8 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
                 * sure it was not in the list.
                 * test_and_set_bit() implies a memory barrier.
                 */
-               llist_add(&work->node, &dev->worker->work_list);
-               wake_up_process(dev->worker->vtsk->task);
+               llist_add(&work->node, &dev->worker.work_list);
+               vhost_task_wake(dev->worker.vtsk);
        }
 }
 EXPORT_SYMBOL_GPL(vhost_work_queue);
@@ -264,7 +264,7 @@ EXPORT_SYMBOL_GPL(vhost_work_queue);
 /* A lockless hint for busy polling code to exit the loop */
 bool vhost_has_work(struct vhost_dev *dev)
 {
-       return dev->worker && !llist_empty(&dev->worker->work_list);
+       return !llist_empty(&dev->worker.work_list);
 }
 EXPORT_SYMBOL_GPL(vhost_has_work);
 
@@ -333,31 +333,21 @@ static void vhost_vq_reset(struct vhost_dev *dev,
        __vhost_vq_meta_reset(vq);
 }
 
-static int vhost_worker(void *data)
+static bool vhost_worker(void *data)
 {
        struct vhost_worker *worker = data;
        struct vhost_work *work, *work_next;
        struct llist_node *node;
 
-       for (;;) {
-               /* mb paired w/ kthread_stop */
-               set_current_state(TASK_INTERRUPTIBLE);
-
-               if (vhost_task_should_stop(worker->vtsk)) {
-                       __set_current_state(TASK_RUNNING);
-                       break;
-               }
-
-               node = llist_del_all(&worker->work_list);
-               if (!node)
-                       schedule();
+       node = llist_del_all(&worker->work_list);
+       if (node) {
+               __set_current_state(TASK_RUNNING);
 
                node = llist_reverse_order(node);
                /* make sure flag is seen after deletion */
                smp_wmb();
                llist_for_each_entry_safe(work, work_next, node, node) {
                        clear_bit(VHOST_WORK_QUEUED, &work->flags);
-                       __set_current_state(TASK_RUNNING);
                        kcov_remote_start_common(worker->kcov_handle);
                        work->fn(work);
                        kcov_remote_stop();
@@ -365,7 +355,7 @@ static int vhost_worker(void *data)
                }
        }
 
-       return 0;
+       return !!node;
 }
 
 static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq)
@@ -468,7 +458,8 @@ void vhost_dev_init(struct vhost_dev *dev,
        dev->umem = NULL;
        dev->iotlb = NULL;
        dev->mm = NULL;
-       dev->worker = NULL;
+       memset(&dev->worker, 0, sizeof(dev->worker));
+       init_llist_head(&dev->worker.work_list);
        dev->iov_limit = iov_limit;
        dev->weight = weight;
        dev->byte_weight = byte_weight;
@@ -542,47 +533,30 @@ static void vhost_detach_mm(struct vhost_dev *dev)
 
 static void vhost_worker_free(struct vhost_dev *dev)
 {
-       struct vhost_worker *worker = dev->worker;
-
-       if (!worker)
+       if (!dev->worker.vtsk)
                return;
 
-       dev->worker = NULL;
-       WARN_ON(!llist_empty(&worker->work_list));
-       vhost_task_stop(worker->vtsk);
-       kfree(worker);
+       WARN_ON(!llist_empty(&dev->worker.work_list));
+       vhost_task_stop(dev->worker.vtsk);
+       dev->worker.kcov_handle = 0;
+       dev->worker.vtsk = NULL;
 }
 
 static int vhost_worker_create(struct vhost_dev *dev)
 {
-       struct vhost_worker *worker;
        struct vhost_task *vtsk;
        char name[TASK_COMM_LEN];
-       int ret;
-
-       worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT);
-       if (!worker)
-               return -ENOMEM;
 
-       dev->worker = worker;
-       worker->kcov_handle = kcov_common_handle();
-       init_llist_head(&worker->work_list);
        snprintf(name, sizeof(name), "vhost-%d", current->pid);
 
-       vtsk = vhost_task_create(vhost_worker, worker, name);
-       if (!vtsk) {
-               ret = -ENOMEM;
-               goto free_worker;
-       }
+       vtsk = vhost_task_create(vhost_worker, &dev->worker, name);
+       if (!vtsk)
+               return -ENOMEM;
 
-       worker->vtsk = vtsk;
+       dev->worker.kcov_handle = kcov_common_handle();
+       dev->worker.vtsk = vtsk;
        vhost_task_start(vtsk);
        return 0;
-
-free_worker:
-       kfree(worker);
-       dev->worker = NULL;
-       return ret;
 }
 
 /* Caller should have device mutex */
@@ -1626,17 +1600,25 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
                        r = -EFAULT;
                        break;
                }
-               if (s.num > 0xffff) {
-                       r = -EINVAL;
-                       break;
+               if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
+                       vq->last_avail_idx = s.num & 0xffff;
+                       vq->last_used_idx = (s.num >> 16) & 0xffff;
+               } else {
+                       if (s.num > 0xffff) {
+                               r = -EINVAL;
+                               break;
+                       }
+                       vq->last_avail_idx = s.num;
                }
-               vq->last_avail_idx = s.num;
                /* Forget the cached index value. */
                vq->avail_idx = vq->last_avail_idx;
                break;
        case VHOST_GET_VRING_BASE:
                s.index = idx;
-               s.num = vq->last_avail_idx;
+               if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+                       s.num = (u32)vq->last_avail_idx | ((u32)vq->last_used_idx << 16);
+               else
+                       s.num = vq->last_avail_idx;
                if (copy_to_user(argp, &s, sizeof s))
                        r = -EFAULT;
                break;
@@ -2575,12 +2557,11 @@ EXPORT_SYMBOL_GPL(vhost_disable_notify);
 /* Create a new message. */
 struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type)
 {
-       struct vhost_msg_node *node = kmalloc(sizeof *node, GFP_KERNEL);
+       /* Make sure all padding within the structure is initialized. */
+       struct vhost_msg_node *node = kzalloc(sizeof(*node), GFP_KERNEL);
        if (!node)
                return NULL;
 
-       /* Make sure all padding within the structure is initialized. */
-       memset(&node->msg, 0, sizeof node->msg);
        node->vq = vq;
        node->msg.type = type;
        return node;
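
A schematic of the contract the conversion above assumes from the vhost_task infrastructure (not the actual kernel/vhost_task.c source): the task repeatedly invokes the worker callback, sleeps when it reports an empty work list, and vhost_task_wake() makes it run again.

        /* Schematic only; stop_requested() and sleep_until_woken() are hypothetical. */
        static void vhost_task_loop(struct vhost_dev *dev)
        {
                for (;;) {
                        if (stop_requested())
                                break;
                        /* vhost_worker() returns false when the work list was empty */
                        if (!vhost_worker(&dev->worker))
                                sleep_until_woken();    /* woken by vhost_task_wake() */
                }
        }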
index 0308638..fc900be 100644 (file)
@@ -92,13 +92,17 @@ struct vhost_virtqueue {
        /* The routine to call when the Guest pings us, or timeout. */
        vhost_work_fn_t handle_kick;
 
-       /* Last available index we saw. */
+       /* Last available index we saw.
+        * Values are limited to 0x7fff, and the high bit is used as
+        * a wrap counter when using VIRTIO_F_RING_PACKED. */
        u16 last_avail_idx;
 
        /* Caches available index value from user. */
        u16 avail_idx;
 
-       /* Last index we used. */
+       /* Last index we used.
+        * Values are limited to 0x7fff, and the high bit is used as
+        * a wrap counter when using VIRTIO_F_RING_PACKED. */
        u16 last_used_idx;
 
        /* Used flags */
@@ -154,7 +158,7 @@ struct vhost_dev {
        struct vhost_virtqueue **vqs;
        int nvqs;
        struct eventfd_ctx *log_ctx;
-       struct vhost_worker *worker;
+       struct vhost_worker worker;
        struct vhost_iotlb *umem;
        struct vhost_iotlb *iotlb;
        spinlock_t iotlb_lock;
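
The comments above describe the index encoding used by the VHOST_GET/SET_VRING_BASE handling in vhost.c and vhost_vdpa.c; a small sketch of that packing, with illustrative helper names:

        static inline u16 vring_pack_idx(u16 idx, bool wrap)
        {
                return (idx & 0x7fff) | (wrap ? 0x8000 : 0);
        }

        static inline void vring_unpack_idx(u16 packed, u16 *idx, bool *wrap)
        {
                *idx = packed & 0x7fff;
                *wrap = !!(packed & 0x8000);
        }

        /* VHOST_GET_VRING_BASE then reports both packed indices in one u32: */
        /* s.num = (u32)last_avail_idx | ((u32)last_used_idx << 16); */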
index 3ccf46f..07d6e8d 100644 (file)
@@ -124,7 +124,7 @@ static u_long get_line_length(int xres_virtual, int bpp)
      *  First part, xxxfb_check_var, must not write anything
      *  to hardware, it should only verify and adjust var.
      *  This means it doesn't alter par but it does use hardware
-     *  data from it to check this var. 
+     *  data from it to check this var.
      */
 
 static int mc68x328fb_check_var(struct fb_var_screeninfo *var,
@@ -182,7 +182,7 @@ static int mc68x328fb_check_var(struct fb_var_screeninfo *var,
 
        /*
         * Now that we checked it we alter var. The reason being is that the video
-        * mode passed in might not work but slight changes to it might make it 
+        * mode passed in might not work but slight changes to it might make it
         * work. This way we let the user know what is acceptable.
         */
        switch (var->bits_per_pixel) {
@@ -257,8 +257,8 @@ static int mc68x328fb_check_var(struct fb_var_screeninfo *var,
 }
 
 /* This routine actually sets the video mode. It's in here where we
- * the hardware state info->par and fix which can be affected by the 
- * change in par. For this driver it doesn't do much. 
+ * the hardware state info->par and fix which can be affected by the
+ * change in par. For this driver it doesn't do much.
  */
 static int mc68x328fb_set_par(struct fb_info *info)
 {
@@ -295,7 +295,7 @@ static int mc68x328fb_setcolreg(u_int regno, u_int red, u_int green, u_int blue,
         *   {hardwarespecific} contains width of RAMDAC
         *   cmap[X] is programmed to (X << red.offset) | (X << green.offset) | (X << blue.offset)
         *   RAMDAC[X] is programmed to (red, green, blue)
-        * 
+        *
         * Pseudocolor:
         *    uses offset = 0 && length = RAMDAC register width.
         *    var->{color}.offset is 0
@@ -384,7 +384,7 @@ static int mc68x328fb_pan_display(struct fb_var_screeninfo *var,
 }
 
     /*
-     *  Most drivers don't need their own mmap function 
+     *  Most drivers don't need their own mmap function
      */
 
 static int mc68x328fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
index 96e9157..0fdf5f4 100644 (file)
@@ -124,7 +124,7 @@ config FB_PROVIDE_GET_FB_UNMAPPED_AREA
        depends on FB
        help
          Allow generic frame-buffer to provide get_fb_unmapped_area
-         function.
+         function to provide shareable character device support on nommu.
 
 menuconfig FB_FOREIGN_ENDIAN
        bool "Framebuffer foreign endianness support"
index 45e6401..08d15e4 100644 (file)
@@ -523,7 +523,7 @@ static int arcfb_probe(struct platform_device *dev)
 
        info = framebuffer_alloc(sizeof(struct arcfb_par), &dev->dev);
        if (!info)
-               goto err;
+               goto err_fb_alloc;
 
        info->screen_base = (char __iomem *)videomemory;
        info->fbops = &arcfb_ops;
@@ -535,7 +535,7 @@ static int arcfb_probe(struct platform_device *dev)
 
        if (!dio_addr || !cio_addr || !c2io_addr) {
                printk(KERN_WARNING "no IO addresses supplied\n");
-               goto err1;
+               goto err_addr;
        }
        par->dio_addr = dio_addr;
        par->cio_addr = cio_addr;
@@ -551,12 +551,12 @@ static int arcfb_probe(struct platform_device *dev)
                        printk(KERN_INFO
                                "arcfb: Failed req IRQ %d\n", par->irq);
                        retval = -EBUSY;
-                       goto err1;
+                       goto err_addr;
                }
        }
        retval = register_framebuffer(info);
        if (retval < 0)
-               goto err1;
+               goto err_register_fb;
        platform_set_drvdata(dev, info);
        fb_info(info, "Arc frame buffer device, using %dK of video memory\n",
                videomemorysize >> 10);
@@ -580,14 +580,17 @@ static int arcfb_probe(struct platform_device *dev)
        }
 
        return 0;
-err1:
+
+err_register_fb:
+       free_irq(par->irq, info);
+err_addr:
        framebuffer_release(info);
-err:
+err_fb_alloc:
        vfree(videomemory);
        return retval;
 }
 
-static int arcfb_remove(struct platform_device *dev)
+static void arcfb_remove(struct platform_device *dev)
 {
        struct fb_info *info = platform_get_drvdata(dev);
 
@@ -598,12 +601,11 @@ static int arcfb_remove(struct platform_device *dev)
                vfree((void __force *)info->screen_base);
                framebuffer_release(info);
        }
-       return 0;
 }
 
 static struct platform_driver arcfb_driver = {
        .probe  = arcfb_probe,
-       .remove = arcfb_remove,
+       .remove_new = arcfb_remove,
        .driver = {
                .name   = "arcfb",
        },
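
The arcfb conversion above, and the au1100fb, au1200fb, broadsheetfb and bw2 conversions that follow, all move to the platform_driver remove_new callback, which returns void instead of int. The sketch below only approximates the shape of such a conversion with made-up names; it is not any of the drivers in this series:

#include <linux/platform_device.h>
#include <linux/fb.h>

/* Hypothetical probe: allocate and register the framebuffer, set drvdata. */
static int examplefb_probe(struct platform_device *pdev)
{
        return 0;
}

/* remove_new returns void: there is no error code to propagate (the old
 * int return was ignored anyway), so cleanup runs unconditionally. */
static void examplefb_remove(struct platform_device *pdev)
{
        struct fb_info *info = platform_get_drvdata(pdev);

        if (info) {
                unregister_framebuffer(info);
                framebuffer_release(info);
        }
}

static struct platform_driver examplefb_driver = {
        .probe          = examplefb_probe,
        .remove_new     = examplefb_remove,
        .driver         = {
                .name   = "examplefb",
        },
};
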
index 8187a7c..987c5f5 100644 (file)
@@ -317,7 +317,7 @@ static inline void atmel_lcdfb_free_video_memory(struct atmel_lcdfb_info *sinfo)
 /**
  *     atmel_lcdfb_alloc_video_memory - Allocate framebuffer memory
  *     @sinfo: the frame buffer to allocate memory for
- *     
+ *
  *     This function is called only from the atmel_lcdfb_probe()
  *     so no locking by fb_info->mm_lock around smem_len setting is needed.
  */
index b02e4e6..cba2b11 100644 (file)
@@ -3498,11 +3498,6 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
        if (ret)
                goto atyfb_setup_generic_fail;
 #endif
-       if (!(aty_ld_le32(CRTC_GEN_CNTL, par) & CRTC_EXT_DISP_EN))
-               par->clk_wr_offset = (inb(R_GENMO) & 0x0CU) >> 2;
-       else
-               par->clk_wr_offset = aty_ld_8(CLOCK_CNTL, par) & 0x03U;
-
        /* according to ATI, we should use clock 3 for acelerated mode */
        par->clk_wr_offset = 3;
 
index 519313b..648d6ca 100644 (file)
@@ -520,13 +520,10 @@ failed:
        return -ENODEV;
 }
 
-int au1100fb_drv_remove(struct platform_device *dev)
+void au1100fb_drv_remove(struct platform_device *dev)
 {
        struct au1100fb_device *fbdev = NULL;
 
-       if (!dev)
-               return -ENODEV;
-
        fbdev = platform_get_drvdata(dev);
 
 #if !defined(CONFIG_FRAMEBUFFER_CONSOLE) && defined(CONFIG_LOGO)
@@ -543,8 +540,6 @@ int au1100fb_drv_remove(struct platform_device *dev)
                clk_disable_unprepare(fbdev->lcdclk);
                clk_put(fbdev->lcdclk);
        }
-
-       return 0;
 }
 
 #ifdef CONFIG_PM
@@ -593,9 +588,9 @@ static struct platform_driver au1100fb_driver = {
                .name           = "au1100-lcd",
        },
        .probe          = au1100fb_drv_probe,
-        .remove                = au1100fb_drv_remove,
+       .remove_new     = au1100fb_drv_remove,
        .suspend        = au1100fb_drv_suspend,
-        .resume                = au1100fb_drv_resume,
+       .resume         = au1100fb_drv_resume,
 };
 module_platform_driver(au1100fb_driver);
 
index b6b22fa..aed88ce 100644 (file)
@@ -1765,7 +1765,7 @@ failed:
        return ret;
 }
 
-static int au1200fb_drv_remove(struct platform_device *dev)
+static void au1200fb_drv_remove(struct platform_device *dev)
 {
        struct au1200fb_platdata *pd = platform_get_drvdata(dev);
        struct fb_info *fbi;
@@ -1788,8 +1788,6 @@ static int au1200fb_drv_remove(struct platform_device *dev)
        }
 
        free_irq(platform_get_irq(dev, 0), (void *)dev);
-
-       return 0;
 }
 
 #ifdef CONFIG_PM
@@ -1840,7 +1838,7 @@ static struct platform_driver au1200fb_driver = {
                .pm     = AU1200FB_PMOPS,
        },
        .probe          = au1200fb_drv_probe,
-       .remove         = au1200fb_drv_remove,
+       .remove_new     = au1200fb_drv_remove,
 };
 module_platform_driver(au1200fb_driver);
 
index 55e62dd..b518cac 100644 (file)
@@ -1193,7 +1193,7 @@ err:
 
 }
 
-static int broadsheetfb_remove(struct platform_device *dev)
+static void broadsheetfb_remove(struct platform_device *dev)
 {
        struct fb_info *info = platform_get_drvdata(dev);
 
@@ -1209,12 +1209,11 @@ static int broadsheetfb_remove(struct platform_device *dev)
                module_put(par->board->owner);
                framebuffer_release(info);
        }
-       return 0;
 }
 
 static struct platform_driver broadsheetfb_driver = {
        .probe  = broadsheetfb_probe,
-       .remove = broadsheetfb_remove,
+       .remove_new = broadsheetfb_remove,
        .driver = {
                .name   = "broadsheetfb",
        },
index 9cbadcd..025d663 100644 (file)
@@ -352,7 +352,7 @@ out_err:
        return err;
 }
 
-static int bw2_remove(struct platform_device *op)
+static void bw2_remove(struct platform_device *op)
 {
        struct fb_info *info = dev_get_drvdata(&op->dev);
        struct bw2_par *par = info->par;
@@ -363,8 +363,6 @@ static int bw2_remove(struct platform_device *op)
        of_iounmap(&op->resource[0], info->screen_base, info->fix.smem_len);
 
        framebuffer_release(info);
-
-       return 0;
 }
 
 static const struct of_device_id bw2_match[] = {
@@ -381,7 +379,7 @@ static struct platform_driver bw2_driver = {
                .of_match_table = bw2_match,
        },
        .probe          = bw2_probe,
-       .remove         = bw2_remove,
+       .remove_new     = bw2_remove,
 };
 
 static int __init bw2_init(void)
index a028ede..832a82f 100644 (file)
@@ -512,7 +512,7 @@ static int cg14_probe(struct platform_device *op)
        is_8mb = (resource_size(&op->resource[1]) == (8 * 1024 * 1024));
 
        BUILD_BUG_ON(sizeof(par->mmap_map) != sizeof(__cg14_mmap_map));
-               
+
        memcpy(&par->mmap_map, &__cg14_mmap_map, sizeof(par->mmap_map));
 
        for (i = 0; i < CG14_MMAP_ENTRIES; i++) {
index 77dbf94..82eeb13 100644 (file)
@@ -113,14 +113,14 @@ struct fb_info_control {
        struct fb_info          info;
        struct fb_par_control   par;
        u32                     pseudo_palette[16];
-               
+
        struct cmap_regs        __iomem *cmap_regs;
        unsigned long           cmap_regs_phys;
-       
+
        struct control_regs     __iomem *control_regs;
        unsigned long           control_regs_phys;
        unsigned long           control_regs_size;
-       
+
        __u8                    __iomem *frame_buffer;
        unsigned long           frame_buffer_phys;
        unsigned long           fb_orig_base;
@@ -196,7 +196,7 @@ static void set_control_clock(unsigned char *params)
                while (!req.complete)
                        cuda_poll();
        }
-#endif 
+#endif
 }
 
 /*
@@ -233,19 +233,19 @@ static void control_set_hardware(struct fb_info_control *p, struct fb_par_contro
                if (p->par.xoffset != par->xoffset ||
                    p->par.yoffset != par->yoffset)
                        set_screen_start(par->xoffset, par->yoffset, p);
-                       
+
                return;
        }
-       
+
        p->par = *par;
        cmode = p->par.cmode;
        r = &par->regvals;
-       
+
        /* Turn off display */
        out_le32(CNTRL_REG(p,ctrl), 0x400 | par->ctrl);
-       
+
        set_control_clock(r->clock_params);
-       
+
        RADACAL_WRITE(0x20, r->radacal_ctrl);
        RADACAL_WRITE(0x21, p->control_use_bank2 ? 0 : 1);
        RADACAL_WRITE(0x10, 0);
@@ -254,7 +254,7 @@ static void control_set_hardware(struct fb_info_control *p, struct fb_par_contro
        rp = &p->control_regs->vswin;
        for (i = 0; i < 16; ++i, ++rp)
                out_le32(&rp->r, r->regs[i]);
-       
+
        out_le32(CNTRL_REG(p,pitch), par->pitch);
        out_le32(CNTRL_REG(p,mode), r->mode);
        out_le32(CNTRL_REG(p,vram_attr), p->vram_attr);
@@ -366,7 +366,7 @@ static int read_control_sense(struct fb_info_control *p)
        sense |= (in_le32(CNTRL_REG(p,mon_sense)) & 0x180) >> 7;
 
        out_le32(CNTRL_REG(p,mon_sense), 077);  /* turn off drivers */
-       
+
        return sense;
 }
 
@@ -558,9 +558,9 @@ static int control_var_to_par(struct fb_var_screeninfo *var,
 static void control_par_to_var(struct fb_par_control *par, struct fb_var_screeninfo *var)
 {
        struct control_regints *rv;
-       
+
        rv = (struct control_regints *) par->regvals.regs;
-       
+
        memset(var, 0, sizeof(*var));
        var->xres = par->xres;
        var->yres = par->yres;
@@ -568,7 +568,7 @@ static void control_par_to_var(struct fb_par_control *par, struct fb_var_screeni
        var->yres_virtual = par->vyres;
        var->xoffset = par->xoffset;
        var->yoffset = par->yoffset;
-       
+
        switch(par->cmode) {
        default:
        case CMODE_8:
@@ -634,7 +634,7 @@ static int controlfb_check_var (struct fb_var_screeninfo *var, struct fb_info *i
 
        err = control_var_to_par(var, &par, info);
        if (err)
-               return err;     
+               return err;
        control_par_to_var(&par, var);
 
        return 0;
@@ -655,7 +655,7 @@ static int controlfb_set_par (struct fb_info *info)
                                 " control_var_to_par: %d.\n", err);
                return err;
        }
-       
+
        control_set_hardware(p, &par);
 
        info->fix.visual = (p->par.cmode == CMODE_8) ?
@@ -840,7 +840,7 @@ static int __init init_control(struct fb_info_control *p)
        int full, sense, vmode, cmode, vyres;
        struct fb_var_screeninfo var;
        int rc;
-       
+
        printk(KERN_INFO "controlfb: ");
 
        full = p->total_vram == 0x400000;
index f98e8f2..8587c9d 100644 (file)
@@ -247,6 +247,9 @@ static void bit_cursor(struct vc_data *vc, struct fb_info *info, int mode,
 
        cursor.set = 0;
 
+       if (!vc->vc_font.data)
+               return;
+
        c = scr_readw((u16 *) vc->vc_pos);
        attribute = get_attribute(info, c);
        src = vc->vc_font.data + ((c & charmask) * (w * vc->vc_font.height));
index e808dc8..28739f1 100644 (file)
@@ -1468,7 +1468,7 @@ __releases(&info->lock)
 }
 
 #if defined(CONFIG_FB_PROVIDE_GET_FB_UNMAPPED_AREA) && !defined(CONFIG_MMU)
-unsigned long get_fb_unmapped_area(struct file *filp,
+static unsigned long get_fb_unmapped_area(struct file *filp,
                                   unsigned long addr, unsigned long len,
                                   unsigned long pgoff, unsigned long flags)
 {
index 23cf8eb..f7e019d 100644 (file)
@@ -257,6 +257,11 @@ static const struct fb_videomode modedb[] = {
        { NULL, 72, 480, 300, 33386, 40, 24, 11, 19, 80, 3, 0,
                FB_VMODE_DOUBLE },
 
+       /* 1920x1080 @ 60 Hz, 67.3 kHz hsync */
+       { NULL, 60, 1920, 1080, 6734, 148, 88, 36, 4, 44, 5, 0,
+               FB_SYNC_HOR_HIGH_ACT | FB_SYNC_VERT_HIGH_ACT,
+               FB_VMODE_NONINTERLACED },
+
        /* 1920x1200 @ 60 Hz, 74.5 Khz hsync */
        { NULL, 60, 1920, 1200, 5177, 128, 336, 1, 38, 208, 3,
                FB_SYNC_HOR_HIGH_ACT | FB_SYNC_VERT_HIGH_ACT,
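
The new 1920x1080 modedb entry can be sanity-checked from its fields: pixclock is in picoseconds per pixel, and the total line and frame sizes come from adding the margins and sync lengths to the visible resolution. A rough stand-alone check (plain user-space C, purely illustrative):

#include <stdio.h>

int main(void)
{
        /* Fields of the new entry: 1920x1080, pixclock 6734 ps,
         * h margins 148/88, v margins 36/4, sync lengths 44/5. */
        double pixclk_hz = 1e12 / 6734.0;          /* ~148.5 MHz */
        double htotal    = 1920 + 148 + 88 + 44;   /* 2200 pixels */
        double vtotal    = 1080 + 36 + 4 + 5;      /* 1125 lines */

        printf("hsync   ~ %.1f kHz\n", pixclk_hz / htotal / 1e3);
        printf("refresh ~ %.1f Hz\n",  pixclk_hz / htotal / vtotal);
        return 0;
}

This lands at roughly the horizontal rate and the 60 Hz refresh quoted in the new comment.
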
index 05837a3..c5b7673 100644 (file)
@@ -6,7 +6,7 @@
  *
  *  This driver is based on tgafb.c
  *
- *     Copyright (C) 1997 Geert Uytterhoeven 
+ *     Copyright (C) 1997 Geert Uytterhoeven
  *     Copyright (C) 1995  Jay Estabrook
  *
  *  This file is subject to the terms and conditions of the GNU General Public
@@ -28,7 +28,7 @@
 #include <asm/io.h>
 #include <asm/jazz.h>
 
-/* 
+/*
  * Various defines for the G364
  */
 #define G364_MEM_BASE   0xe4400000
@@ -125,7 +125,7 @@ static const struct fb_ops g364fb_ops = {
  *
  *  This call looks only at xoffset, yoffset and the FB_VMODE_YWRAP flag
  */
-static int g364fb_pan_display(struct fb_var_screeninfo *var, 
+static int g364fb_pan_display(struct fb_var_screeninfo *var,
                              struct fb_info *info)
 {
        if (var->xoffset ||
index 20bdab7..0af5801 100644 (file)
@@ -1,6 +1,6 @@
 /*
  * linux/drivers/video/hgafb.c -- Hercules graphics adaptor frame buffer device
- * 
+ *
  *      Created 25 Nov 1999 by Ferenc Bakonyi (fero@drama.obuda.kando.hu)
  *      Based on skeletonfb.c by Geert Uytterhoeven and
  *               mdacon.c by Andrew Apted
@@ -8,14 +8,14 @@
  * History:
  *
  * - Revision 0.1.8 (23 Oct 2002): Ported to new framebuffer api.
- * 
- * - Revision 0.1.7 (23 Jan 2001): fix crash resulting from MDA only cards 
+ *
+ * - Revision 0.1.7 (23 Jan 2001): fix crash resulting from MDA only cards
  *                                being detected as Hercules.   (Paul G.)
  * - Revision 0.1.6 (17 Aug 2000): new style structs
  *                                 documentation
  * - Revision 0.1.5 (13 Mar 2000): spinlocks instead of saveflags();cli();etc
  *                                 minor fixes
- * - Revision 0.1.4 (24 Jan 2000): fixed a bug in hga_card_detect() for 
+ * - Revision 0.1.4 (24 Jan 2000): fixed a bug in hga_card_detect() for
  *                                  HGA-only systems
  * - Revision 0.1.3 (22 Jan 2000): modified for the new fb_info structure
  *                                 screen is cleared after rmmod
@@ -143,7 +143,7 @@ static bool nologo = 0;
 
 static void write_hga_b(unsigned int val, unsigned char reg)
 {
-       outb_p(reg, HGA_INDEX_PORT); 
+       outb_p(reg, HGA_INDEX_PORT);
        outb_p(val, HGA_VALUE_PORT);
 }
 
@@ -155,7 +155,7 @@ static void write_hga_w(unsigned int val, unsigned char reg)
 
 static int test_hga_b(unsigned char val, unsigned char reg)
 {
-       outb_p(reg, HGA_INDEX_PORT); 
+       outb_p(reg, HGA_INDEX_PORT);
        outb  (val, HGA_VALUE_PORT);
        udelay(20); val = (inb_p(HGA_VALUE_PORT) == val);
        return val;
@@ -244,7 +244,7 @@ static void hga_show_logo(struct fb_info *info)
        void __iomem *dest = hga_vram;
        char *logo = linux_logo_bw;
        int x, y;
-       
+
        for (y = 134; y < 134 + 80 ; y++) * this needs some cleanup *
                for (x = 0; x < 10 ; x++)
                        writeb(~*(logo++),(dest + HGA_ROWADDR(y) + x + 40));
@@ -255,7 +255,7 @@ static void hga_pan(unsigned int xoffset, unsigned int yoffset)
 {
        unsigned int base;
        unsigned long flags;
-       
+
        base = (yoffset / 8) * 90 + xoffset;
        spin_lock_irqsave(&hga_reg_lock, flags);
        write_hga_w(base, 0x0c);        /* start address */
@@ -310,7 +310,7 @@ static int hga_card_detect(void)
        /* Ok, there is definitely a card registering at the correct
         * memory location, so now we do an I/O port test.
         */
-       
+
        if (!test_hga_b(0x66, 0x0f))        /* cursor low register */
                goto error;
 
@@ -321,7 +321,7 @@ static int hga_card_detect(void)
         * bit of the status register is changing.  This test lasts for
         * approximately 1/10th of a second.
         */
-       
+
        p_save = q_save = inb_p(HGA_STATUS_PORT) & HGA_STATUS_VSYNC;
 
        for (count=0; count < 50000 && p_save == q_save; count++) {
@@ -329,7 +329,7 @@ static int hga_card_detect(void)
                udelay(2);
        }
 
-       if (p_save == q_save) 
+       if (p_save == q_save)
                goto error;
 
        switch (inb_p(HGA_STATUS_PORT) & 0x70) {
@@ -415,7 +415,7 @@ static int hgafb_setcolreg(u_int regno, u_int red, u_int green, u_int blue,
  *     @info:pointer to fb_info object containing info for current hga board
  *
  *     This function looks only at xoffset, yoffset and the %FB_VMODE_YWRAP
- *     flag in @var. If input parameters are correct it calls hga_pan() to 
+ *     flag in @var. If input parameters are correct it calls hga_pan() to
  *     program the hardware. @info->var is updated to the new values.
  *     A zero is returned on success and %-EINVAL for failure.
  */
@@ -442,9 +442,9 @@ static int hgafb_pan_display(struct fb_var_screeninfo *var,
  *     hgafb_blank - (un)blank the screen
  *     @blank_mode:blanking method to use
  *     @info:unused
- *     
- *     Blank the screen if blank_mode != 0, else unblank. 
- *     Implements VESA suspend and powerdown modes on hardware that supports 
+ *
+ *     Blank the screen if blank_mode != 0, else unblank.
+ *     Implements VESA suspend and powerdown modes on hardware that supports
  *     disabling hsync/vsync:
  *             @blank_mode == 2 means suspend vsync,
  *             @blank_mode == 3 means suspend hsync,
@@ -539,15 +539,15 @@ static const struct fb_ops hgafb_ops = {
        .fb_copyarea    = hgafb_copyarea,
        .fb_imageblit   = hgafb_imageblit,
 };
-               
+
 /* ------------------------------------------------------------------------- *
  *
  * Functions in fb_info
- * 
+ *
  * ------------------------------------------------------------------------- */
 
 /* ------------------------------------------------------------------------- */
-    
+
        /*
         *  Initialization
         */
index cdd44e5..77fbff4 100644 (file)
@@ -92,7 +92,7 @@ static int hpfb_setcolreg(unsigned regno, unsigned red, unsigned green,
 
        if (regno >= info->cmap.len)
                return 1;
-       
+
        while (in_be16(fb_regs + 0x6002) & 0x4) udelay(1);
 
        out_be16(fb_regs + 0x60ba, 0xff);
@@ -143,7 +143,7 @@ static void topcat_blit(int x0, int y0, int x1, int y1, int w, int h, int rr)
        out_8(fb_regs + WMOVE, fb_bitmask);
 }
 
-static void hpfb_copyarea(struct fb_info *info, const struct fb_copyarea *area) 
+static void hpfb_copyarea(struct fb_info *info, const struct fb_copyarea *area)
 {
        topcat_blit(area->sx, area->sy, area->dx, area->dy, area->width, area->height, RR_COPY);
 }
@@ -315,7 +315,7 @@ unmap_screen_base:
        return ret;
 }
 
-/* 
+/*
  * Check that the secondary ID indicates that we have some hope of working with this
  * framebuffer.  The catseye boards are pretty much like topcats and we can muddle through.
  */
@@ -323,7 +323,7 @@ unmap_screen_base:
 #define topcat_sid_ok(x)  (((x) == DIO_ID2_LRCATSEYE) || ((x) == DIO_ID2_HRCCATSEYE)    \
                           || ((x) == DIO_ID2_HRMCATSEYE) || ((x) == DIO_ID2_TOPCAT))
 
-/* 
+/*
  * Initialise the framebuffer
  */
 static int hpfb_dio_probe(struct dio_dev *d, const struct dio_device_id *ent)
index b4b3670..2082b5c 100644 (file)
@@ -14,6 +14,7 @@
 
 #include "i810_regs.h"
 #include "i810.h"
+#include "i810_main.h"
 
 struct mode_registers std_modes[] = {
        /* 640x480 @ 60Hz */
@@ -276,7 +277,7 @@ void i810fb_fill_var_timings(struct fb_var_screeninfo *var)
        var->upper_margin = total - (yres + var->lower_margin + var->vsync_len);
 }
 
-u32 i810_get_watermark(struct fb_var_screeninfo *var,
+u32 i810_get_watermark(const struct fb_var_screeninfo *var,
                       struct i810fb_par *par)
 {
        struct mode_registers *params = &par->regs;
index bea4564..ee7d01a 100644 (file)
@@ -1347,7 +1347,7 @@ static const struct fb_ops imsttfb_ops = {
        .fb_ioctl       = imsttfb_ioctl,
 };
 
-static void init_imstt(struct fb_info *info)
+static int init_imstt(struct fb_info *info)
 {
        struct imstt_par *par = info->par;
        __u32 i, tmp, *ip, *end;
@@ -1420,7 +1420,7 @@ static void init_imstt(struct fb_info *info)
            || !(compute_imstt_regvals(par, info->var.xres, info->var.yres))) {
                printk("imsttfb: %ux%ux%u not supported\n", info->var.xres, info->var.yres, info->var.bits_per_pixel);
                framebuffer_release(info);
-               return;
+               return -ENODEV;
        }
 
        sprintf(info->fix.id, "IMS TT (%s)", par->ramdac == IBM ? "IBM" : "TVP");
@@ -1452,16 +1452,21 @@ static void init_imstt(struct fb_info *info)
                      FBINFO_HWACCEL_FILLRECT |
                      FBINFO_HWACCEL_YPAN;
 
-       fb_alloc_cmap(&info->cmap, 0, 0);
+       if (fb_alloc_cmap(&info->cmap, 0, 0)) {
+               framebuffer_release(info);
+               return -ENODEV;
+       }
 
        if (register_framebuffer(info) < 0) {
+               fb_dealloc_cmap(&info->cmap);
                framebuffer_release(info);
-               return;
+               return -ENODEV;
        }
 
        tmp = (read_reg_le32(par->dc_regs, SSTATUS) & 0x0f00) >> 8;
        fb_info(info, "%s frame buffer; %uMB vram; chip version %u\n",
                info->fix.id, info->fix.smem_len >> 20, tmp);
+       return 0;
 }
 
 static int imsttfb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
@@ -1529,10 +1534,12 @@ static int imsttfb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
        if (!par->cmap_regs)
                goto error;
        info->pseudo_palette = par->palette;
-       init_imstt(info);
+       ret = init_imstt(info);
+       if (ret)
+               goto error;
 
        pci_set_drvdata(pdev, info);
-       return 0;
+       return ret;
 
 error:
        if (par->dc_regs)
index 312e35c..44ff860 100644 (file)
@@ -339,7 +339,7 @@ static int civic_setpalette(unsigned int regno, unsigned int red,
 {
        unsigned long flags;
        int clut_status;
-       
+
        local_irq_save(flags);
 
        /* Set the register address */
@@ -439,7 +439,7 @@ static int macfb_setcolreg(unsigned regno, unsigned red, unsigned green,
         * (according to the entries in the `var' structure).
         * Return non-zero for invalid regno.
         */
-       
+
        if (regno >= fb_info->cmap.len)
                return 1;
 
@@ -548,7 +548,7 @@ static int __init macfb_init(void)
                return -ENODEV;
        macfb_setup(option);
 
-       if (!MACH_IS_MAC) 
+       if (!MACH_IS_MAC)
                return -ENODEV;
 
        if (mac_bi_data.id == MAC_MODEL_Q630 ||
@@ -644,7 +644,7 @@ static int __init macfb_init(void)
                err = -EINVAL;
                goto fail_unmap;
        }
-       
+
        /*
         * We take a wild guess that if the video physical address is
         * in nubus slot space, that the nubus card is driving video.
@@ -774,7 +774,7 @@ static int __init macfb_init(void)
                        civic_cmap_regs = ioremap(CIVIC_BASE, 0x1000);
                        break;
 
-               
+
                /*
                 * Assorted weirdos
                 * We think this may be like the LC II
index 727a10a..b15a8ad 100644 (file)
@@ -1291,7 +1291,7 @@ static struct i2c_driver maven_driver={
        .driver = {
                .name   = "maven",
        },
-       .probe_new      = maven_probe,
+       .probe          = maven_probe,
        .remove         = maven_remove,
        .id_table       = maven_id,
 };
index ae1a42b..4e6b052 100644 (file)
@@ -138,7 +138,7 @@ int __init maxinefb_init(void)
                *(volatile unsigned char *)fboff = 0x0;
 
        maxinefb_fix.smem_start = fb_start;
-       
+
        /* erase hardware cursor */
        for (i = 0; i < 512; i++) {
                maxinefb_ims332_write_register(IMS332_REG_CURSOR_RAM + i,
index 1eaa35c..477789c 100644 (file)
@@ -491,7 +491,8 @@ static int tpo_td043_probe(struct spi_device *spi)
 
        ddata->vcc_reg = devm_regulator_get(&spi->dev, "vcc");
        if (IS_ERR(ddata->vcc_reg)) {
-               r = dev_err_probe(&spi->dev, r, "failed to get LCD VCC regulator\n");
+               r = dev_err_probe(&spi->dev, PTR_ERR(ddata->vcc_reg),
+                                 "failed to get LCD VCC regulator\n");
                goto err_regulator;
        }
 
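
The tpo_td043 fix above passes PTR_ERR() of the failed regulator lookup to dev_err_probe() instead of a stale local. dev_err_probe() logs the message (quietly deferring for -EPROBE_DEFER) and returns the error code it was given, so the code it receives has to be the real one. A hedged sketch of the idiom, with hypothetical function names:

#include <linux/device.h>
#include <linux/err.h>
#include <linux/regulator/consumer.h>

/* Hypothetical probe fragment showing the dev_err_probe() idiom. */
static int example_get_vcc(struct device *dev, struct regulator **out)
{
        struct regulator *vcc = devm_regulator_get(dev, "vcc");

        if (IS_ERR(vcc))
                /* Pass the real error, PTR_ERR(vcc), not an unrelated
                 * variable; the same value is handed back for returning. */
                return dev_err_probe(dev, PTR_ERR(vcc),
                                     "failed to get LCD VCC regulator\n");

        *out = vcc;
        return 0;
}
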
index 3e44f95..0876962 100644 (file)
@@ -65,7 +65,7 @@ static const struct fb_ops p9100_ops = {
 #define P9100_FB_OFF 0x0UL
 
 /* 3 bits: 2=8bpp 3=16bpp 5=32bpp 7=24bpp */
-#define SYS_CONFIG_PIXELSIZE_SHIFT 26 
+#define SYS_CONFIG_PIXELSIZE_SHIFT 26
 
 #define SCREENPAINT_TIMECTL1_ENABLE_VIDEO 0x20 /* 0 = off, 1 = on */
 
@@ -110,7 +110,7 @@ struct p9100_regs {
        u32 vram_xxx[25];
 
        /* Registers for IBM RGB528 Palette */
-       u32 ramdac_cmap_wridx; 
+       u32 ramdac_cmap_wridx;
        u32 ramdac_palette_data;
        u32 ramdac_pixel_mask;
        u32 ramdac_palette_rdaddr;
index 82f019f..f8283fc 100644 (file)
@@ -52,17 +52,17 @@ struct fb_info_platinum {
                __u8 red, green, blue;
        }                               palette[256];
        u32                             pseudo_palette[16];
-       
+
        volatile struct cmap_regs       __iomem *cmap_regs;
        unsigned long                   cmap_regs_phys;
-       
+
        volatile struct platinum_regs   __iomem *platinum_regs;
        unsigned long                   platinum_regs_phys;
-       
+
        __u8                            __iomem *frame_buffer;
        volatile __u8                   __iomem *base_frame_buffer;
        unsigned long                   frame_buffer_phys;
-       
+
        unsigned long                   total_vram;
        int                             clktype;
        int                             dactype;
@@ -133,7 +133,7 @@ static int platinumfb_set_par (struct fb_info *info)
        platinum_set_hardware(pinfo);
 
        init = platinum_reg_init[pinfo->vmode-1];
-       
+
        if ((pinfo->vmode == VMODE_832_624_75) && (pinfo->cmode > CMODE_8))
                offset = 0x10;
 
@@ -214,7 +214,7 @@ static int platinumfb_setcolreg(u_int regno, u_int red, u_int green, u_int blue,
                        break;
                }
        }
-       
+
        return 0;
 }
 
@@ -269,7 +269,7 @@ static void platinum_set_hardware(struct fb_info_platinum *pinfo)
        struct platinum_regvals         *init;
        int                             i;
        int                             vmode, cmode;
-       
+
        vmode = pinfo->vmode;
        cmode = pinfo->cmode;
 
@@ -436,7 +436,7 @@ static int read_platinum_sense(struct fb_info_platinum *info)
  * This routine takes a user-supplied var, and picks the best vmode/cmode from it.
  * It also updates the var structure to the actual mode data obtained
  */
-static int platinum_var_to_par(struct fb_var_screeninfo *var, 
+static int platinum_var_to_par(struct fb_var_screeninfo *var,
                               struct fb_info_platinum *pinfo,
                               int check_only)
 {
@@ -478,12 +478,12 @@ static int platinum_var_to_par(struct fb_var_screeninfo *var,
        pinfo->yoffset = 0;
        pinfo->vxres = pinfo->xres;
        pinfo->vyres = pinfo->yres;
-       
+
        return 0;
 }
 
 
-/* 
+/*
  * Parse user specified options (`video=platinumfb:')
  */
 static int __init platinumfb_setup(char *options)
@@ -624,7 +624,7 @@ static int platinumfb_probe(struct platform_device* odev)
                break;
        }
        dev_set_drvdata(&odev->dev, info);
-       
+
        rc = platinum_init_fb(info);
        if (rc != 0) {
                iounmap(pinfo->frame_buffer);
@@ -640,9 +640,9 @@ static void platinumfb_remove(struct platform_device* odev)
 {
        struct fb_info          *info = dev_get_drvdata(&odev->dev);
        struct fb_info_platinum *pinfo = info->par;
-       
+
         unregister_framebuffer (info);
-       
+
        /* Unmap frame buffer and registers */
        iounmap(pinfo->frame_buffer);
        iounmap(pinfo->platinum_regs);
@@ -656,7 +656,7 @@ static void platinumfb_remove(struct platform_device* odev)
        framebuffer_release(info);
 }
 
-static struct of_device_id platinumfb_match[] = 
+static struct of_device_id platinumfb_match[] =
 {
        {
        .name           = "platinum",
@@ -664,7 +664,7 @@ static struct of_device_id platinumfb_match[] =
        {},
 };
 
-static struct platform_driver platinum_driver = 
+static struct platform_driver platinum_driver =
 {
        .driver = {
                .name = "platinumfb",
index b1b8ccd..a2408bf 100644 (file)
  *     - Driver appears to be working for Brutus 320x200x8bpp mode.  Other
  *       resolutions are working, but only the 8bpp mode is supported.
  *       Changes need to be made to the palette encode and decode routines
- *       to support 4 and 16 bpp modes.  
+ *       to support 4 and 16 bpp modes.
  *       Driver is not designed to be a module.  The FrameBuffer is statically
- *       allocated since dynamic allocation of a 300k buffer cannot be 
- *       guaranteed. 
+ *       allocated since dynamic allocation of a 300k buffer cannot be
+ *       guaranteed.
  *
  * 1999/06/17:
  *     - FrameBuffer memory is now allocated at run-time when the
- *       driver is initialized.    
+ *       driver is initialized.
  *
  * 2000/04/10: Nicolas Pitre <nico@fluxnic.net>
  *     - Big cleanup for dynamic selection of machine type at run time.
@@ -74,8 +74,8 @@
  *
  * 2000/08/07: Tak-Shing Chan <tchan.rd@idthk.com>
  *            Jeff Sutherland <jsutherland@accelent.com>
- *     - Resolved an issue caused by a change made to the Assabet's PLD 
- *       earlier this year which broke the framebuffer driver for newer 
+ *     - Resolved an issue caused by a change made to the Assabet's PLD
+ *       earlier this year which broke the framebuffer driver for newer
  *       Phase 4 Assabets.  Some other parameters were changed to optimize
  *       for the Sharp display.
  *
  * 2000/11/23: Eric Peng <ericpeng@coventive.com>
  *     - Freebird add
  *
- * 2001/02/07: Jamey Hicks <jamey.hicks@compaq.com> 
+ * 2001/02/07: Jamey Hicks <jamey.hicks@compaq.com>
  *            Cliff Brake <cbrake@accelent.com>
  *     - Added PM callback
  *
@@ -500,7 +500,7 @@ sa1100fb_set_cmap(struct fb_cmap *cmap, int kspc, int con,
  *     the shortest recovery time
  *  Suspend
  *     This refers to a level of power management in which substantial power
- *     reduction is achieved by the display.  The display can have a longer 
+ *     reduction is achieved by the display.  The display can have a longer
  *     recovery time from this state than from the Stand-by state
  *  Off
  *     This indicates that the display is consuming the lowest level of power
@@ -522,9 +522,9 @@ sa1100fb_set_cmap(struct fb_cmap *cmap, int kspc, int con,
  */
 /*
  * sa1100fb_blank():
- *     Blank the display by setting all palette values to zero.  Note, the 
+ *     Blank the display by setting all palette values to zero.  Note, the
  *     12 and 16 bpp modes don't really use the palette, so this will not
- *      blank the display in all modes.  
+ *      blank the display in all modes.
  */
 static int sa1100fb_blank(int blank, struct fb_info *info)
 {
@@ -603,8 +603,8 @@ static inline unsigned int get_pcd(struct sa1100fb_info *fbi,
 
 /*
  * sa1100fb_activate_var():
- *     Configures LCD Controller based on entries in var parameter.  Settings are      
- *     only written to the controller if changes were made.  
+ *     Configures LCD Controller based on entries in var parameter.  Settings are
+ *     only written to the controller if changes were made.
  */
 static int sa1100fb_activate_var(struct fb_var_screeninfo *var, struct sa1100fb_info *fbi)
 {
@@ -747,7 +747,7 @@ static void sa1100fb_setup_gpio(struct sa1100fb_info *fbi)
         *
         * SA1110 spec update nr. 25 says we can and should
         * clear LDD15 to 12 for 4 or 8bpp modes with active
-        * panels.  
+        * panels.
         */
        if ((fbi->reg_lccr0 & LCCR0_CMS) == LCCR0_Color &&
            (fbi->reg_lccr0 & (LCCR0_Dual|LCCR0_Act)) != 0) {
@@ -1020,9 +1020,9 @@ static int sa1100fb_resume(struct platform_device *dev)
 
 /*
  * sa1100fb_map_video_memory():
- *      Allocates the DRAM memory for the frame buffer.  This buffer is  
- *     remapped into a non-cached, non-buffered, memory region to  
- *      allow palette and pixel writes to occur without flushing the 
+ *      Allocates the DRAM memory for the frame buffer.  This buffer is
+ *     remapped into a non-cached, non-buffered, memory region to
+ *      allow palette and pixel writes to occur without flushing the
  *      cache.  Once this area is remapped, all virtual memory
  *      access to the video memory should occur at the new region.
  */
index 046b999..132d1a2 100644 (file)
@@ -844,7 +844,7 @@ static const struct i2c_device_id ssd1307fb_i2c_id[] = {
 MODULE_DEVICE_TABLE(i2c, ssd1307fb_i2c_id);
 
 static struct i2c_driver ssd1307fb_driver = {
-       .probe_new = ssd1307fb_probe,
+       .probe = ssd1307fb_probe,
        .remove = ssd1307fb_remove,
        .id_table = ssd1307fb_i2c_id,
        .driver = {
index ef8a4c5..686a234 100644 (file)
@@ -1,11 +1,11 @@
 /*
- * linux/drivers/video/stifb.c - 
- * Low level Frame buffer driver for HP workstations with 
+ * linux/drivers/video/stifb.c -
+ * Low level Frame buffer driver for HP workstations with
  * STI (standard text interface) video firmware.
  *
  * Copyright (C) 2001-2006 Helge Deller <deller@gmx.de>
  * Portions Copyright (C) 2001 Thomas Bogendoerfer <tsbogend@alpha.franken.de>
- * 
+ *
  * Based on:
  * - linux/drivers/video/artistfb.c -- Artist frame buffer driver
  *     Copyright (C) 2000 Philipp Rumpf <prumpf@tux.org>
@@ -14,7 +14,7 @@
  * - HP Xhp cfb-based X11 window driver for XFree86
  *     (c)Copyright 1992 Hewlett-Packard Co.
  *
- * 
+ *
  *  The following graphics display devices (NGLE family) are supported by this driver:
  *
  *  HPA4070A   known as "HCRX", a 1280x1024 color device with 8 planes
@@ -30,7 +30,7 @@
  *             supports 1280x1024 color displays with 8 planes.
  *  HP710G     same as HP710C, 1280x1024 grayscale only
  *  HP710L     same as HP710C, 1024x768 color only
- *  HP712      internal graphics support on HP9000s712 SPU, supports 640x480, 
+ *  HP712      internal graphics support on HP9000s712 SPU, supports 640x480,
  *             1024x768 or 1280x1024 color displays on 8 planes (Artist)
  *
  * This file is subject to the terms and conditions of the GNU General Public
@@ -92,7 +92,7 @@ typedef struct {
        __s32   misc_video_end;
 } video_setup_t;
 
-typedef struct {                  
+typedef struct {
        __s16   sizeof_ngle_data;
        __s16   x_size_visible;     /* visible screen dim in pixels  */
        __s16   y_size_visible;
@@ -177,10 +177,10 @@ static int __initdata stifb_bpp_pref[MAX_STI_ROMS];
 #endif /* DEBUG_STIFB_REGS */
 
 
-#define ENABLE 1       /* for enabling/disabling screen */     
+#define ENABLE 1       /* for enabling/disabling screen */
 #define DISABLE 0
 
-#define NGLE_LOCK(fb_info)     do { } while (0) 
+#define NGLE_LOCK(fb_info)     do { } while (0)
 #define NGLE_UNLOCK(fb_info)   do { } while (0)
 
 static void
@@ -198,9 +198,9 @@ SETUP_HW(struct stifb_info *fb)
 
 static void
 SETUP_FB(struct stifb_info *fb)
-{      
+{
        unsigned int reg10_value = 0;
-       
+
        SETUP_HW(fb);
        switch (fb->id)
        {
@@ -210,15 +210,15 @@ SETUP_FB(struct stifb_info *fb)
                        reg10_value = 0x13601000;
                        break;
                case S9000_ID_A1439A:
-                       if (fb->info.var.bits_per_pixel == 32)                                          
+                       if (fb->info.var.bits_per_pixel == 32)
                                reg10_value = 0xBBA0A000;
-                       else 
+                       else
                                reg10_value = 0x13601000;
                        break;
                case S9000_ID_HCRX:
                        if (fb->info.var.bits_per_pixel == 32)
                                reg10_value = 0xBBA0A000;
-                       else                                    
+                       else
                                reg10_value = 0x13602000;
                        break;
                case S9000_ID_TIMBER:
@@ -243,7 +243,7 @@ START_IMAGE_COLORMAP_ACCESS(struct stifb_info *fb)
 }
 
 static void
-WRITE_IMAGE_COLOR(struct stifb_info *fb, int index, int color) 
+WRITE_IMAGE_COLOR(struct stifb_info *fb, int index, int color)
 {
        SETUP_HW(fb);
        WRITE_WORD(((0x100+index)<<2), fb, REG_3);
@@ -251,30 +251,30 @@ WRITE_IMAGE_COLOR(struct stifb_info *fb, int index, int color)
 }
 
 static void
-FINISH_IMAGE_COLORMAP_ACCESS(struct stifb_info *fb) 
-{              
+FINISH_IMAGE_COLORMAP_ACCESS(struct stifb_info *fb)
+{
        WRITE_WORD(0x400, fb, REG_2);
        if (fb->info.var.bits_per_pixel == 32) {
                WRITE_WORD(0x83000100, fb, REG_1);
        } else {
                if (fb->id == S9000_ID_ARTIST || fb->id == CRT_ID_VISUALIZE_EG)
                        WRITE_WORD(0x80000100, fb, REG_26);
-               else                                                    
+               else
                        WRITE_WORD(0x80000100, fb, REG_1);
        }
        SETUP_FB(fb);
 }
 
 static void
-SETUP_RAMDAC(struct stifb_info *fb) 
+SETUP_RAMDAC(struct stifb_info *fb)
 {
        SETUP_HW(fb);
        WRITE_WORD(0x04000000, fb, 0x1020);
        WRITE_WORD(0xff000000, fb, 0x1028);
 }
 
-static void 
-CRX24_SETUP_RAMDAC(struct stifb_info *fb) 
+static void
+CRX24_SETUP_RAMDAC(struct stifb_info *fb)
 {
        SETUP_HW(fb);
        WRITE_WORD(0x04000000, fb, 0x1000);
@@ -286,14 +286,14 @@ CRX24_SETUP_RAMDAC(struct stifb_info *fb)
 }
 
 #if 0
-static void 
+static void
 HCRX_SETUP_RAMDAC(struct stifb_info *fb)
 {
        WRITE_WORD(0xffffffff, fb, REG_32);
 }
 #endif
 
-static void 
+static void
 CRX24_SET_OVLY_MASK(struct stifb_info *fb)
 {
        SETUP_HW(fb);
@@ -314,7 +314,7 @@ ENABLE_DISABLE_DISPLAY(struct stifb_info *fb, int enable)
         WRITE_WORD(value,      fb, 0x1038);
 }
 
-static void 
+static void
 CRX24_ENABLE_DISABLE_DISPLAY(struct stifb_info *fb, int enable)
 {
        unsigned int value = enable ? 0x10000000 : 0x30000000;
@@ -325,11 +325,11 @@ CRX24_ENABLE_DISABLE_DISPLAY(struct stifb_info *fb, int enable)
 }
 
 static void
-ARTIST_ENABLE_DISABLE_DISPLAY(struct stifb_info *fb, int enable) 
+ARTIST_ENABLE_DISABLE_DISPLAY(struct stifb_info *fb, int enable)
 {
        u32 DregsMiscVideo = REG_21;
        u32 DregsMiscCtl = REG_27;
-       
+
        SETUP_HW(fb);
        if (enable) {
          WRITE_WORD(READ_WORD(fb, DregsMiscVideo) | 0x0A000000, fb, DregsMiscVideo);
@@ -344,7 +344,7 @@ ARTIST_ENABLE_DISABLE_DISPLAY(struct stifb_info *fb, int enable)
        (READ_BYTE(fb, REG_16b3) - 1)
 
 #define HYPER_CONFIG_PLANES_24 0x00000100
-       
+
 #define IS_24_DEVICE(fb) \
        (fb->deviceSpecificConfig & HYPER_CONFIG_PLANES_24)
 
@@ -470,15 +470,15 @@ SETUP_ATTR_ACCESS(struct stifb_info *fb, unsigned BufferNumber)
 }
 
 static void
-SET_ATTR_SIZE(struct stifb_info *fb, int width, int height) 
+SET_ATTR_SIZE(struct stifb_info *fb, int width, int height)
 {
-       /* REG_6 seems to have special values when run on a 
+       /* REG_6 seems to have special values when run on a
           RDI precisionbook parisc laptop (INTERNAL_EG_DX1024 or
           INTERNAL_EG_X1024).  The values are:
                0x2f0: internal (LCD) & external display enabled
                0x2a0: external display only
                0x000: zero on standard artist graphic cards
-       */ 
+       */
        WRITE_WORD(0x00000000, fb, REG_6);
        WRITE_WORD((width<<16) | height, fb, REG_9);
        WRITE_WORD(0x05000000, fb, REG_6);
@@ -486,7 +486,7 @@ SET_ATTR_SIZE(struct stifb_info *fb, int width, int height)
 }
 
 static void
-FINISH_ATTR_ACCESS(struct stifb_info *fb) 
+FINISH_ATTR_ACCESS(struct stifb_info *fb)
 {
        SETUP_HW(fb);
        WRITE_WORD(0x00000000, fb, REG_12);
@@ -499,7 +499,7 @@ elkSetupPlanes(struct stifb_info *fb)
        SETUP_FB(fb);
 }
 
-static void 
+static void
 ngleSetupAttrPlanes(struct stifb_info *fb, int BufferNumber)
 {
        SETUP_ATTR_ACCESS(fb, BufferNumber);
@@ -519,7 +519,7 @@ rattlerSetupPlanes(struct stifb_info *fb)
         * read mask register for overlay planes, not image planes).
         */
        CRX24_SETUP_RAMDAC(fb);
-    
+
        /* change fb->id temporarily to fool SETUP_FB() */
        saved_id = fb->id;
        fb->id = CRX24_OVERLAY_PLANES;
@@ -565,7 +565,7 @@ setNgleLutBltCtl(struct stifb_info *fb, int offsetWithinLut, int length)
        lutBltCtl.all           = 0x80000000;
        lutBltCtl.fields.length = length;
 
-       switch (fb->id) 
+       switch (fb->id)
        {
        case S9000_ID_A1439A:           /* CRX24 */
                if (fb->var.bits_per_pixel == 8) {
@@ -576,12 +576,12 @@ setNgleLutBltCtl(struct stifb_info *fb, int offsetWithinLut, int length)
                        lutBltCtl.fields.lutOffset = 0 * 256;
                }
                break;
-               
+
        case S9000_ID_ARTIST:
                lutBltCtl.fields.lutType = NGLE_CMAP_INDEXED0_TYPE;
                lutBltCtl.fields.lutOffset = 0 * 256;
                break;
-               
+
        default:
                lutBltCtl.fields.lutType = NGLE_CMAP_INDEXED0_TYPE;
                lutBltCtl.fields.lutOffset = 0;
@@ -596,7 +596,7 @@ setNgleLutBltCtl(struct stifb_info *fb, int offsetWithinLut, int length)
 #endif
 
 static NgleLutBltCtl
-setHyperLutBltCtl(struct stifb_info *fb, int offsetWithinLut, int length) 
+setHyperLutBltCtl(struct stifb_info *fb, int offsetWithinLut, int length)
 {
        NgleLutBltCtl lutBltCtl;
 
@@ -633,7 +633,7 @@ static void hyperUndoITE(struct stifb_info *fb)
 
        /* Hardware setup for full-depth write to "magic" location */
        GET_FIFO_SLOTS(fb, nFreeFifoSlots, 7);
-       NGLE_QUICK_SET_DST_BM_ACCESS(fb, 
+       NGLE_QUICK_SET_DST_BM_ACCESS(fb,
                BA(IndexedDcd, Otc04, Ots08, AddrLong,
                BAJustPoint(0), BINovly, BAIndexBase(0)));
        NGLE_QUICK_SET_IMAGE_BITMAP_OP(fb,
@@ -653,13 +653,13 @@ static void hyperUndoITE(struct stifb_info *fb)
        NGLE_UNLOCK(fb);
 }
 
-static void 
+static void
 ngleDepth8_ClearImagePlanes(struct stifb_info *fb)
 {
        /* FIXME! */
 }
 
-static void 
+static void
 ngleDepth24_ClearImagePlanes(struct stifb_info *fb)
 {
        /* FIXME! */
@@ -675,7 +675,7 @@ ngleResetAttrPlanes(struct stifb_info *fb, unsigned int ctlPlaneReg)
        NGLE_LOCK(fb);
 
        GET_FIFO_SLOTS(fb, nFreeFifoSlots, 4);
-       NGLE_QUICK_SET_DST_BM_ACCESS(fb, 
+       NGLE_QUICK_SET_DST_BM_ACCESS(fb,
                                     BA(IndexedDcd, Otc32, OtsIndirect,
                                        AddrLong, BAJustPoint(0),
                                        BINattr, BAIndexBase(0)));
@@ -713,22 +713,22 @@ ngleResetAttrPlanes(struct stifb_info *fb, unsigned int ctlPlaneReg)
        /**** Finally, set the Control Plane Register back to zero: ****/
        GET_FIFO_SLOTS(fb, nFreeFifoSlots, 1);
        NGLE_QUICK_SET_CTL_PLN_REG(fb, 0);
-       
+
        NGLE_UNLOCK(fb);
 }
-    
+
 static void
 ngleClearOverlayPlanes(struct stifb_info *fb, int mask, int data)
 {
        int nFreeFifoSlots = 0;
        u32 packed_dst;
        u32 packed_len;
-    
+
        NGLE_LOCK(fb);
 
        /* Hardware setup */
        GET_FIFO_SLOTS(fb, nFreeFifoSlots, 8);
-       NGLE_QUICK_SET_DST_BM_ACCESS(fb, 
+       NGLE_QUICK_SET_DST_BM_ACCESS(fb,
                                     BA(IndexedDcd, Otc04, Ots08, AddrLong,
                                        BAJustPoint(0), BINovly, BAIndexBase(0)));
 
@@ -736,23 +736,23 @@ ngleClearOverlayPlanes(struct stifb_info *fb, int mask, int data)
 
         NGLE_REALLY_SET_IMAGE_FG_COLOR(fb, data);
         NGLE_REALLY_SET_IMAGE_PLANEMASK(fb, mask);
-    
+
         packed_dst = 0;
         packed_len = (fb->info.var.xres << 16) | fb->info.var.yres;
         NGLE_SET_DSTXY(fb, packed_dst);
-    
-        /* Write zeroes to overlay planes */                  
+
+       /* Write zeroes to overlay planes */
        NGLE_QUICK_SET_IMAGE_BITMAP_OP(fb,
                                       IBOvals(RopSrc, MaskAddrOffset(0),
                                               BitmapExtent08, StaticReg(0),
                                               DataDynamic, MaskOtc, BGx(0), FGx(0)));
-                      
+
         SET_LENXY_START_RECFILL(fb, packed_len);
 
        NGLE_UNLOCK(fb);
 }
 
-static void 
+static void
 hyperResetPlanes(struct stifb_info *fb, int enable)
 {
        unsigned int controlPlaneReg;
@@ -783,7 +783,7 @@ hyperResetPlanes(struct stifb_info *fb, int enable)
                ngleClearOverlayPlanes(fb, 0xff, 255);
 
                /**************************************************
-                ** Also need to counteract ITE settings 
+                ** Also need to counteract ITE settings
                 **************************************************/
                hyperUndoITE(fb);
                break;
@@ -803,13 +803,13 @@ hyperResetPlanes(struct stifb_info *fb, int enable)
                ngleResetAttrPlanes(fb, controlPlaneReg);
                break;
        }
-       
+
        NGLE_UNLOCK(fb);
 }
 
 /* Return pointer to in-memory structure holding ELK device-dependent ROM values. */
 
-static void 
+static void
 ngleGetDeviceRomData(struct stifb_info *fb)
 {
 #if 0
@@ -821,7 +821,7 @@ XXX: FIXME: !!!
        char    *pCard8;
        int     i;
        char    *mapOrigin = NULL;
-    
+
        int romTableIdx;
 
        pPackedDevRomData = fb->ngle_rom;
@@ -888,7 +888,7 @@ SETUP_HCRX(struct stifb_info *fb)
 
        /* Initialize Hyperbowl registers */
        GET_FIFO_SLOTS(fb, nFreeFifoSlots, 7);
-       
+
        if (IS_24_DEVICE(fb)) {
                hyperbowl = (fb->info.var.bits_per_pixel == 32) ?
                        HYPERBOWL_MODE01_8_24_LUT0_TRANSPARENT_LUT1_OPAQUE :
@@ -897,9 +897,9 @@ SETUP_HCRX(struct stifb_info *fb)
                /* First write to Hyperbowl must happen twice (bug) */
                WRITE_WORD(hyperbowl, fb, REG_40);
                WRITE_WORD(hyperbowl, fb, REG_40);
-               
+
                WRITE_WORD(HYPERBOWL_MODE2_8_24, fb, REG_39);
-               
+
                WRITE_WORD(0x014c0148, fb, REG_42); /* Set lut 0 to be the direct color */
                WRITE_WORD(0x404c4048, fb, REG_43);
                WRITE_WORD(0x034c0348, fb, REG_44);
@@ -990,7 +990,7 @@ stifb_setcolreg(u_int regno, u_int red, u_int green,
                                0,      /* Offset w/i LUT */
                                256);   /* Load entire LUT */
                NGLE_BINC_SET_SRCADDR(fb,
-                               NGLE_LONG_FB_ADDRESS(0, 0x100, 0)); 
+                               NGLE_LONG_FB_ADDRESS(0, 0x100, 0));
                                /* 0x100 is same as used in WRITE_IMAGE_COLOR() */
                START_COLORMAPLOAD(fb, lutBltCtl.all);
                SETUP_FB(fb);
@@ -1028,7 +1028,7 @@ stifb_blank(int blank_mode, struct fb_info *info)
                ENABLE_DISABLE_DISPLAY(fb, enable);
                break;
        }
-       
+
        SETUP_FB(fb);
        return 0;
 }
@@ -1114,15 +1114,15 @@ stifb_init_display(struct stifb_info *fb)
 
        /* HCRX specific initialization */
        SETUP_HCRX(fb);
-       
+
        /*
        if (id == S9000_ID_HCRX)
                hyperInitSprite(fb);
        else
                ngleInitSprite(fb);
        */
-       
-       /* Initialize the image planes. */ 
+
+       /* Initialize the image planes. */
         switch (id) {
         case S9000_ID_HCRX:
            hyperResetPlanes(fb, ENABLE);
@@ -1194,7 +1194,7 @@ static int __init stifb_init_fb(struct sti_struct *sti, int bpp_pref)
        fb = kzalloc(sizeof(*fb), GFP_ATOMIC);
        if (!fb)
                return -ENOMEM;
-       
+
        info = &fb->info;
 
        /* set struct to a known state */
@@ -1235,7 +1235,7 @@ static int __init stifb_init_fb(struct sti_struct *sti, int bpp_pref)
                        dev_name, fb->id);
                goto out_err0;
        }
-       
+
        /* default to 8 bpp on most graphic chips */
        bpp = 8;
        xres = sti_onscreen_x(fb->sti);
@@ -1256,7 +1256,7 @@ static int __init stifb_init_fb(struct sti_struct *sti, int bpp_pref)
                fb->id = S9000_ID_A1659A;
                break;
        case S9000_ID_TIMBER:   /* HP9000/710 Any (may be a grayscale device) */
-               if (strstr(dev_name, "GRAYSCALE") || 
+               if (strstr(dev_name, "GRAYSCALE") ||
                    strstr(dev_name, "Grayscale") ||
                    strstr(dev_name, "grayscale"))
                        var->grayscale = 1;
@@ -1295,16 +1295,16 @@ static int __init stifb_init_fb(struct sti_struct *sti, int bpp_pref)
        case CRT_ID_VISUALIZE_EG:
        case S9000_ID_ARTIST:   /* Artist */
                break;
-       default: 
+       default:
 #ifdef FALLBACK_TO_1BPP
-               printk(KERN_WARNING 
+               printk(KERN_WARNING
                        "stifb: Unsupported graphics card (id=0x%08x) "
                                "- now trying 1bpp mode instead\n",
                        fb->id);
                bpp = 1;        /* default to 1 bpp */
                break;
 #else
-               printk(KERN_WARNING 
+               printk(KERN_WARNING
                        "stifb: Unsupported graphics card (id=0x%08x) "
                                "- skipping.\n",
                        fb->id);
@@ -1320,11 +1320,11 @@ static int __init stifb_init_fb(struct sti_struct *sti, int bpp_pref)
        fix->line_length = (fb->sti->glob_cfg->total_x * bpp) / 8;
        if (!fix->line_length)
                fix->line_length = 2048; /* default */
-       
+
        /* limit fbsize to max visible screen size */
        if (fix->smem_len > yres*fix->line_length)
                fix->smem_len = ALIGN(yres*fix->line_length, 4*1024*1024);
-       
+
        fix->accel = FB_ACCEL_NONE;
 
        switch (bpp) {
@@ -1350,7 +1350,7 @@ static int __init stifb_init_fb(struct sti_struct *sti, int bpp_pref)
            default:
                break;
        }
-       
+
        var->xres = var->xres_virtual = xres;
        var->yres = var->yres_virtual = yres;
        var->bits_per_pixel = bpp;
@@ -1379,7 +1379,7 @@ static int __init stifb_init_fb(struct sti_struct *sti, int bpp_pref)
                                fix->smem_start, fix->smem_start+fix->smem_len);
                goto out_err2;
        }
-               
+
        if (!request_mem_region(fix->mmio_start, fix->mmio_len, "stifb mmio")) {
                printk(KERN_ERR "stifb: cannot reserve sti mmio region 0x%04lx-0x%04lx\n",
                                fix->mmio_start, fix->mmio_start+fix->mmio_len);
@@ -1393,11 +1393,11 @@ static int __init stifb_init_fb(struct sti_struct *sti, int bpp_pref)
 
        fb_info(&fb->info, "%s %dx%d-%d frame buffer device, %s, id: %04x, mmio: 0x%04lx\n",
                fix->id,
-               var->xres, 
+               var->xres,
                var->yres,
                var->bits_per_pixel,
                dev_name,
-               fb->id, 
+               fb->id,
                fix->mmio_start);
 
        return 0;
@@ -1413,6 +1413,7 @@ out_err1:
        iounmap(info->screen_base);
 out_err0:
        kfree(fb);
+       sti->info = NULL;
        return -ENXIO;
 }
 
@@ -1426,7 +1427,7 @@ static int __init stifb_init(void)
        struct sti_struct *sti;
        struct sti_struct *def_sti;
        int i;
-       
+
 #ifndef MODULE
        char *option = NULL;
 
@@ -1438,7 +1439,7 @@ static int __init stifb_init(void)
                printk(KERN_INFO "stifb: disabled by \"stifb=off\" kernel parameter\n");
                return -ENXIO;
        }
-       
+
        def_sti = sti_get_rom(0);
        if (def_sti) {
                for (i = 1; i <= MAX_STI_ROMS; i++) {
@@ -1472,7 +1473,7 @@ stifb_cleanup(void)
 {
        struct sti_struct *sti;
        int i;
-       
+
        for (i = 1; i <= MAX_STI_ROMS; i++) {
                sti = sti_get_rom(i);
                if (!sti)
@@ -1495,10 +1496,10 @@ int __init
 stifb_setup(char *options)
 {
        int i;
-       
+
        if (!options || !*options)
                return 1;
-       
+
        if (strncmp(options, "off", 3) == 0) {
                stifb_disabled = 1;
                options += 3;
index 216d49c..dabc30a 100644 (file)
@@ -27,6 +27,8 @@
 #include <video/udlfb.h>
 #include "edid.h"
 
+#define OUT_EP_NUM     1       /* The endpoint number we will use */
+
 static const struct fb_fix_screeninfo dlfb_fix = {
        .id =           "udlfb",
        .type =         FB_TYPE_PACKED_PIXELS,
@@ -1541,24 +1543,16 @@ static const struct device_attribute fb_device_attrs[] = {
 static int dlfb_select_std_channel(struct dlfb_data *dlfb)
 {
        int ret;
-       void *buf;
        static const u8 set_def_chn[] = {
                                0x57, 0xCD, 0xDC, 0xA7,
                                0x1C, 0x88, 0x5E, 0x15,
                                0x60, 0xFE, 0xC6, 0x97,
                                0x16, 0x3D, 0x47, 0xF2  };
 
-       buf = kmemdup(set_def_chn, sizeof(set_def_chn), GFP_KERNEL);
-
-       if (!buf)
-               return -ENOMEM;
-
-       ret = usb_control_msg(dlfb->udev, usb_sndctrlpipe(dlfb->udev, 0),
-                       NR_USB_REQUEST_CHANNEL,
+       ret = usb_control_msg_send(dlfb->udev, 0, NR_USB_REQUEST_CHANNEL,
                        (USB_DIR_OUT | USB_TYPE_VENDOR), 0, 0,
-                       buf, sizeof(set_def_chn), USB_CTRL_SET_TIMEOUT);
-
-       kfree(buf);
+                       &set_def_chn, sizeof(set_def_chn), USB_CTRL_SET_TIMEOUT,
+                       GFP_KERNEL);
 
        return ret;
 }
@@ -1652,7 +1646,7 @@ static int dlfb_usb_probe(struct usb_interface *intf,
        struct fb_info *info;
        int retval;
        struct usb_device *usbdev = interface_to_usbdev(intf);
-       struct usb_endpoint_descriptor *out;
+       static u8 out_ep[] = {OUT_EP_NUM + USB_DIR_OUT, 0};
 
        /* usb initialization */
        dlfb = kzalloc(sizeof(*dlfb), GFP_KERNEL);
@@ -1666,9 +1660,9 @@ static int dlfb_usb_probe(struct usb_interface *intf,
        dlfb->udev = usb_get_dev(usbdev);
        usb_set_intfdata(intf, dlfb);
 
-       retval = usb_find_common_endpoints(intf->cur_altsetting, NULL, &out, NULL, NULL);
-       if (retval) {
-               dev_err(&intf->dev, "Device should have at lease 1 bulk endpoint!\n");
+       if (!usb_check_bulk_endpoints(intf, out_ep)) {
+               dev_err(&intf->dev, "Invalid DisplayLink device!\n");
+               retval = -EINVAL;
                goto error;
        }
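
For context, usb_check_bulk_endpoints() takes a zero-terminated array of endpoint addresses and returns true only if every listed endpoint exists on the interface's current altsetting and is a bulk endpoint, which is why probe can now reject a malformed device up front instead of hunting for "any bulk endpoint". A hedged sketch of the calling convention, with illustrative endpoint numbers:

	/* Require bulk-OUT endpoint 1 and bulk-IN endpoint 2; the trailing 0 ends the list. */
	static const u8 wanted_eps[] = { USB_DIR_OUT | 1, USB_DIR_IN | 2, 0 };

	if (!usb_check_bulk_endpoints(intf, wanted_eps)) {
		dev_err(&intf->dev, "missing or non-bulk endpoint\n");
		return -EINVAL;
	}
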
 
@@ -1927,7 +1921,8 @@ retry:
                }
 
                /* urb->transfer_buffer_length set to actual before submit */
-               usb_fill_bulk_urb(urb, dlfb->udev, usb_sndbulkpipe(dlfb->udev, 1),
+               usb_fill_bulk_urb(urb, dlfb->udev,
+                       usb_sndbulkpipe(dlfb->udev, OUT_EP_NUM),
                        buf, size, dlfb_urb_completion, unode);
                urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP;
 
index 1007023..b166b7c 100644 (file)
@@ -1,7 +1,7 @@
 /*
  *  valkyriefb.c -- frame buffer device for the PowerMac 'valkyrie' display
  *
- *  Created 8 August 1998 by 
+ *  Created 8 August 1998 by
  *  Martin Costabel <costabel@wanadoo.fr> and Kevin Schoedel
  *
  *  Vmode-switching changes and vmode 15/17 modifications created 29 August
@@ -77,13 +77,13 @@ struct fb_info_valkyrie {
        struct fb_par_valkyrie  par;
        struct cmap_regs        __iomem *cmap_regs;
        unsigned long           cmap_regs_phys;
-       
+
        struct valkyrie_regs    __iomem *valkyrie_regs;
        unsigned long           valkyrie_regs_phys;
-       
+
        __u8                    __iomem *frame_buffer;
        unsigned long           frame_buffer_phys;
-       
+
        int                     sense;
        unsigned long           total_vram;
 
@@ -244,7 +244,7 @@ static inline int valkyrie_vram_reqd(int video_mode, int color_mode)
 {
        int pitch;
        struct valkyrie_regvals *init = valkyrie_reg_init[video_mode-1];
-       
+
        if ((pitch = init->pitch[color_mode]) == 0)
                pitch = 2 * init->pitch[0];
        return init->vres * pitch;
@@ -467,7 +467,7 @@ static int valkyrie_var_to_par(struct fb_var_screeninfo *var,
                printk(KERN_ERR "valkyriefb: vmode %d not valid.\n", vmode);
                return -EINVAL;
        }
-       
+
        if (cmode != CMODE_8 && cmode != CMODE_16) {
                printk(KERN_ERR "valkyriefb: cmode %d not valid.\n", cmode);
                return -EINVAL;
@@ -516,7 +516,7 @@ static void valkyrie_init_fix(struct fb_fix_screeninfo *fix, struct fb_info_valk
        fix->ywrapstep = 0;
        fix->ypanstep = 0;
        fix->xpanstep = 0;
-       
+
 }
 
 /* Fix must already be inited above */
index a945739..6f19909 100644 (file)
@@ -111,7 +111,7 @@ static u_long get_line_length(int xres_virtual, int bpp)
      *  First part, xxxfb_check_var, must not write anything
      *  to hardware, it should only verify and adjust var.
      *  This means it doesn't alter par but it does use hardware
-     *  data from it to check this var. 
+     *  data from it to check this var.
      */
 
 static int vfb_check_var(struct fb_var_screeninfo *var,
@@ -169,7 +169,7 @@ static int vfb_check_var(struct fb_var_screeninfo *var,
 
        /*
         * Now that we checked it we alter var. The reason being is that the video
-        * mode passed in might not work but slight changes to it might make it 
+        * mode passed in might not work but slight changes to it might make it
         * work. This way we let the user know what is acceptable.
         */
        switch (var->bits_per_pixel) {
@@ -235,8 +235,8 @@ static int vfb_check_var(struct fb_var_screeninfo *var,
 }
 
 /* This routine actually sets the video mode. It's in here where we
- * the hardware state info->par and fix which can be affected by the 
- * change in par. For this driver it doesn't do much. 
+ * the hardware state info->par and fix which can be affected by the
+ * change in par. For this driver it doesn't do much.
  */
 static int vfb_set_par(struct fb_info *info)
 {
@@ -379,7 +379,7 @@ static int vfb_pan_display(struct fb_var_screeninfo *var,
 }
 
     /*
-     *  Most drivers don't need their own mmap function 
+     *  Most drivers don't need their own mmap function
      */
 
 static int vfb_mmap(struct fb_info *info,
index d75ab3f..cecdc1c 100644 (file)
@@ -576,8 +576,8 @@ static void ioreq_resume(void)
 int acrn_ioreq_intr_setup(void)
 {
        acrn_setup_intr_handler(ioreq_intr_handler);
-       ioreq_wq = alloc_workqueue("ioreq_wq",
-                                  WQ_HIGHPRI | WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
+       ioreq_wq = alloc_ordered_workqueue("ioreq_wq",
+                                          WQ_HIGHPRI | WQ_MEM_RECLAIM);
        if (!ioreq_wq) {
                dev_err(acrn_dev.this_device, "Failed to alloc workqueue!\n");
                acrn_remove_intr_handler();
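
Several hunks in this pull convert alloc_workqueue(name, WQ_UNBOUND | ..., 1) callers to alloc_ordered_workqueue(). The two are equivalent when max_active is 1: the ordered variant is a wrapper that adds WQ_UNBOUND and the internal ordered flag and pins max_active to one, so work items still run strictly one at a time and in queueing order, only now with the intent stated explicitly. Roughly (a sketch, not the exact acrn code):

	struct workqueue_struct *wq;

	/* Before: an unbound workqueue limited to a single in-flight work item. */
	wq = alloc_workqueue("ioreq_wq", WQ_HIGHPRI | WQ_MEM_RECLAIM | WQ_UNBOUND, 1);

	/* After: identical single-threaded, FIFO semantics, spelled out. */
	wq = alloc_ordered_workqueue("ioreq_wq", WQ_HIGHPRI | WQ_MEM_RECLAIM);
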
index f9db079..da2d7ca 100644 (file)
@@ -2,6 +2,7 @@ config SEV_GUEST
        tristate "AMD SEV Guest driver"
        default m
        depends on AMD_MEM_ENCRYPT
+       select CRYPTO
        select CRYPTO_AEAD2
        select CRYPTO_GCM
        help
index 1f5219e..d525934 100644 (file)
@@ -325,8 +325,10 @@ static struct sock_mapping *pvcalls_new_active_socket(
        void *page;
 
        map = kzalloc(sizeof(*map), GFP_KERNEL);
-       if (map == NULL)
+       if (map == NULL) {
+               sock_release(sock);
                return NULL;
+       }
 
        map->fedata = fedata;
        map->sock = sock;
@@ -361,7 +363,7 @@ static struct sock_mapping *pvcalls_new_active_socket(
        map->data.in = map->bytes;
        map->data.out = map->bytes + XEN_FLEX_RING_SIZE(map->ring_order);
 
-       map->ioworker.wq = alloc_workqueue("pvcalls_io", WQ_UNBOUND, 1);
+       map->ioworker.wq = alloc_ordered_workqueue("pvcalls_io", 0);
        if (!map->ioworker.wq)
                goto out;
        atomic_set(&map->io, 1);
@@ -418,10 +420,8 @@ static int pvcalls_back_connect(struct xenbus_device *dev,
                                        req->u.connect.ref,
                                        req->u.connect.evtchn,
                                        sock);
-       if (!map) {
+       if (!map)
                ret = -EFAULT;
-               sock_release(sock);
-       }
 
 out:
        rsp = RING_GET_RESPONSE(&fedata->ring, fedata->ring.rsp_prod_pvt++);
@@ -561,7 +561,6 @@ static void __pvcalls_back_accept(struct work_struct *work)
                                        sock);
        if (!map) {
                ret = -EFAULT;
-               sock_release(sock);
                goto out_error;
        }
 
@@ -637,7 +636,7 @@ static int pvcalls_back_bind(struct xenbus_device *dev,
 
        INIT_WORK(&map->register_work, __pvcalls_back_accept);
        spin_lock_init(&map->copy_lock);
-       map->wq = alloc_workqueue("pvcalls_wq", WQ_UNBOUND, 1);
+       map->wq = alloc_ordered_workqueue("pvcalls_wq", 0);
        if (!map->wq) {
                ret = -ENOMEM;
                goto out;
index 6c31b8c..2996fb0 100644 (file)
@@ -374,6 +374,28 @@ v9fs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
        return ret;
 }
 
+/*
+ * v9fs_file_splice_read - splice-read from a file
+ * @in: The 9p file to read from
+ * @ppos: Where to find/update the file position
+ * @pipe: The pipe to splice into
+ * @len: The maximum amount of data to splice
+ * @flags: SPLICE_F_* flags
+ */
+static ssize_t v9fs_file_splice_read(struct file *in, loff_t *ppos,
+                                    struct pipe_inode_info *pipe,
+                                    size_t len, unsigned int flags)
+{
+       struct p9_fid *fid = in->private_data;
+
+       p9_debug(P9_DEBUG_VFS, "fid %d count %zu offset %lld\n",
+                fid->fid, len, *ppos);
+
+       if (fid->mode & P9L_DIRECT)
+               return copy_splice_read(in, ppos, pipe, len, flags);
+       return filemap_splice_read(in, ppos, pipe, len, flags);
+}
+
 /**
  * v9fs_file_write_iter - write to a file
  * @iocb: The operation parameters
@@ -569,7 +591,7 @@ const struct file_operations v9fs_file_operations = {
        .release = v9fs_dir_release,
        .lock = v9fs_file_lock,
        .mmap = generic_file_readonly_mmap,
-       .splice_read = generic_file_splice_read,
+       .splice_read = v9fs_file_splice_read,
        .splice_write = iter_file_splice_write,
        .fsync = v9fs_file_fsync,
 };
@@ -583,7 +605,7 @@ const struct file_operations v9fs_file_operations_dotl = {
        .lock = v9fs_file_lock_dotl,
        .flock = v9fs_file_flock_dotl,
        .mmap = v9fs_file_mmap,
-       .splice_read = generic_file_splice_read,
+       .splice_read = v9fs_file_splice_read,
        .splice_write = iter_file_splice_write,
        .fsync = v9fs_file_fsync_dotl,
 };
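
For background on the ->splice_read conversions in this pull: generic_file_splice_read() is being phased out in favour of filemap_splice_read() (page-cache backed) and copy_splice_read() (which reads into newly allocated pages, bypassing the page cache, used above for P9L_DIRECT fids); the new 9p and afs wrappers only choose between the two. A small userspace sketch of what exercises this path; the mount point is hypothetical and error handling is omitted:

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <unistd.h>

	int main(void)
	{
		int pfd[2];
		int fd = open("/mnt/9p/somefile", O_RDONLY);	/* hypothetical 9p mount */

		pipe(pfd);
		/* splice() from a regular file into a pipe invokes the filesystem's
		 * ->splice_read(), i.e. v9fs_file_splice_read() on this mount. */
		splice(fd, NULL, pfd[1], NULL, 65536, 0);
		close(fd);
		return 0;
	}
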
index cc07a0c..18d034e 100644 (file)
@@ -368,14 +368,7 @@ config NFS_V4_2_SSC_HELPER
 source "net/sunrpc/Kconfig"
 source "fs/ceph/Kconfig"
 
-source "fs/cifs/Kconfig"
-source "fs/ksmbd/Kconfig"
-
-config SMBFS_COMMON
-       tristate
-       default y if CIFS=y || SMB_SERVER=y
-       default m if CIFS=m || SMB_SERVER=m
-
+source "fs/smb/Kconfig"
 source "fs/coda/Kconfig"
 source "fs/afs/Kconfig"
 source "fs/9p/Kconfig"
index 834f1c3..e513aae 100644 (file)
@@ -17,14 +17,8 @@ obj-y :=     open.o read_write.o file_table.o super.o \
                fs_types.o fs_context.o fs_parser.o fsopen.o init.o \
                kernel_read_file.o mnt_idmapping.o remap_range.o
 
-ifeq ($(CONFIG_BLOCK),y)
-obj-y +=       buffer.o mpage.o
-else
-obj-y +=       no-block.o
-endif
-
-obj-$(CONFIG_PROC_FS) += proc_namespace.o
-
+obj-$(CONFIG_BLOCK)            += buffer.o mpage.o
+obj-$(CONFIG_PROC_FS)          += proc_namespace.o
 obj-$(CONFIG_LEGACY_DIRECT_IO) += direct-io.o
 obj-y                          += notify/
 obj-$(CONFIG_EPOLL)            += eventpoll.o
@@ -95,9 +89,7 @@ obj-$(CONFIG_LOCKD)           += lockd/
 obj-$(CONFIG_NLS)              += nls/
 obj-y                          += unicode/
 obj-$(CONFIG_SYSV_FS)          += sysv/
-obj-$(CONFIG_SMBFS_COMMON)     += smbfs_common/
-obj-$(CONFIG_CIFS)             += cifs/
-obj-$(CONFIG_SMB_SERVER)       += ksmbd/
+obj-$(CONFIG_SMBFS)            += smb/
 obj-$(CONFIG_HPFS_FS)          += hpfs/
 obj-$(CONFIG_NTFS_FS)          += ntfs/
 obj-$(CONFIG_NTFS3_FS)         += ntfs3/
index 754afb1..ee80718 100644 (file)
@@ -28,7 +28,7 @@ const struct file_operations adfs_file_operations = {
        .mmap           = generic_file_mmap,
        .fsync          = generic_file_fsync,
        .write_iter     = generic_file_write_iter,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
 };
 
 const struct inode_operations adfs_file_inode_operations = {
index 8daeed3..e43f2f0 100644 (file)
@@ -1001,7 +1001,7 @@ const struct file_operations affs_file_operations = {
        .open           = affs_file_open,
        .release        = affs_file_release,
        .fsync          = affs_file_fsync,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
 };
 
 const struct inode_operations affs_file_inode_operations = {
index 4dd97af..5219182 100644 (file)
@@ -1358,6 +1358,7 @@ static int afs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
        op->dentry      = dentry;
        op->create.mode = S_IFDIR | mode;
        op->create.reason = afs_edit_dir_for_mkdir;
+       op->mtime       = current_time(dir);
        op->ops         = &afs_mkdir_operation;
        return afs_do_sync_operation(op);
 }
@@ -1661,6 +1662,7 @@ static int afs_create(struct mnt_idmap *idmap, struct inode *dir,
        op->dentry      = dentry;
        op->create.mode = S_IFREG | mode;
        op->create.reason = afs_edit_dir_for_create;
+       op->mtime       = current_time(dir);
        op->ops         = &afs_create_operation;
        return afs_do_sync_operation(op);
 
@@ -1796,6 +1798,7 @@ static int afs_symlink(struct mnt_idmap *idmap, struct inode *dir,
        op->ops                 = &afs_symlink_operation;
        op->create.reason       = afs_edit_dir_for_symlink;
        op->create.symlink      = content;
+       op->mtime               = current_time(dir);
        return afs_do_sync_operation(op);
 
 error:
index 719b313..d37dd20 100644 (file)
@@ -25,6 +25,9 @@ static void afs_invalidate_folio(struct folio *folio, size_t offset,
 static bool afs_release_folio(struct folio *folio, gfp_t gfp_flags);
 
 static ssize_t afs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter);
+static ssize_t afs_file_splice_read(struct file *in, loff_t *ppos,
+                                   struct pipe_inode_info *pipe,
+                                   size_t len, unsigned int flags);
 static void afs_vm_open(struct vm_area_struct *area);
 static void afs_vm_close(struct vm_area_struct *area);
 static vm_fault_t afs_vm_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff);
@@ -36,7 +39,7 @@ const struct file_operations afs_file_operations = {
        .read_iter      = afs_file_read_iter,
        .write_iter     = afs_file_write,
        .mmap           = afs_file_mmap,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = afs_file_splice_read,
        .splice_write   = iter_file_splice_write,
        .fsync          = afs_fsync,
        .lock           = afs_lock,
@@ -587,3 +590,18 @@ static ssize_t afs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 
        return generic_file_read_iter(iocb, iter);
 }
+
+static ssize_t afs_file_splice_read(struct file *in, loff_t *ppos,
+                                   struct pipe_inode_info *pipe,
+                                   size_t len, unsigned int flags)
+{
+       struct afs_vnode *vnode = AFS_FS_I(file_inode(in));
+       struct afs_file *af = in->private_data;
+       int ret;
+
+       ret = afs_validate(vnode, af->key);
+       if (ret < 0)
+               return ret;
+
+       return filemap_splice_read(in, ppos, pipe, len, flags);
+}
index d1c7068..58452b8 100644 (file)
@@ -115,8 +115,8 @@ responded:
                }
        }
 
-       if (rxrpc_kernel_get_srtt(call->net->socket, call->rxcall, &rtt_us) &&
-           rtt_us < server->probe.rtt) {
+       rxrpc_kernel_get_srtt(call->net->socket, call->rxcall, &rtt_us);
+       if (rtt_us < server->probe.rtt) {
                server->probe.rtt = rtt_us;
                server->rtt = rtt_us;
                alist->preferred = index;
index c822d60..8750b99 100644 (file)
@@ -731,6 +731,7 @@ static int afs_writepages_region(struct address_space *mapping,
                         * (changing page->mapping to NULL), or even swizzled
                         * back from swapper_space to tmpfs file mapping
                         */
+try_again:
                        if (wbc->sync_mode != WB_SYNC_NONE) {
                                ret = folio_lock_killable(folio);
                                if (ret < 0) {
@@ -757,12 +758,14 @@ static int afs_writepages_region(struct address_space *mapping,
 #ifdef CONFIG_AFS_FSCACHE
                                        folio_wait_fscache(folio);
 #endif
-                               } else {
-                                       start += folio_size(folio);
+                                       goto try_again;
                                }
+
+                               start += folio_size(folio);
                                if (wbc->sync_mode == WB_SYNC_NONE) {
                                        if (skips >= 5 || need_resched()) {
                                                *_next = start;
+                                               folio_batch_release(&fbatch);
                                                _leave(" = 0 [%llx]", *_next);
                                                return 0;
                                        }
index b0b17bd..77e3361 100644 (file)
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -530,7 +530,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
        for (i = 0; i < nr_pages; i++) {
                struct page *page;
                page = find_or_create_page(file->f_mapping,
-                                          i, GFP_HIGHUSER | __GFP_ZERO);
+                                          i, GFP_USER | __GFP_ZERO);
                if (!page)
                        break;
                pr_debug("pid(%d) page[%d]->count=%d\n",
@@ -571,7 +571,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
        ctx->user_id = ctx->mmap_base;
        ctx->nr_events = nr_events; /* trusted copy */
 
-       ring = kmap_atomic(ctx->ring_pages[0]);
+       ring = page_address(ctx->ring_pages[0]);
        ring->nr = nr_events;   /* user copy */
        ring->id = ~0U;
        ring->head = ring->tail = 0;
@@ -579,7 +579,6 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
        ring->compat_features = AIO_RING_COMPAT_FEATURES;
        ring->incompat_features = AIO_RING_INCOMPAT_FEATURES;
        ring->header_length = sizeof(struct aio_ring);
-       kunmap_atomic(ring);
        flush_dcache_page(ctx->ring_pages[0]);
 
        return 0;
@@ -682,9 +681,8 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
                                         * we are protected from page migration
                                         * changes ring_pages by ->ring_lock.
                                         */
-                                       ring = kmap_atomic(ctx->ring_pages[0]);
+                                       ring = page_address(ctx->ring_pages[0]);
                                        ring->id = ctx->id;
-                                       kunmap_atomic(ring);
                                        return 0;
                                }
 
@@ -1025,9 +1023,8 @@ static void user_refill_reqs_available(struct kioctx *ctx)
                 * against ctx->completed_events below will make sure we do the
                 * safe/right thing.
                 */
-               ring = kmap_atomic(ctx->ring_pages[0]);
+               ring = page_address(ctx->ring_pages[0]);
                head = ring->head;
-               kunmap_atomic(ring);
 
                refill_reqs_available(ctx, head, ctx->tail);
        }
@@ -1133,12 +1130,11 @@ static void aio_complete(struct aio_kiocb *iocb)
        if (++tail >= ctx->nr_events)
                tail = 0;
 
-       ev_page = kmap_atomic(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
+       ev_page = page_address(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
        event = ev_page + pos % AIO_EVENTS_PER_PAGE;
 
        *event = iocb->ki_res;
 
-       kunmap_atomic(ev_page);
        flush_dcache_page(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
 
        pr_debug("%p[%u]: %p: %p %Lx %Lx %Lx\n", ctx, tail, iocb,
@@ -1152,10 +1148,9 @@ static void aio_complete(struct aio_kiocb *iocb)
 
        ctx->tail = tail;
 
-       ring = kmap_atomic(ctx->ring_pages[0]);
+       ring = page_address(ctx->ring_pages[0]);
        head = ring->head;
        ring->tail = tail;
-       kunmap_atomic(ring);
        flush_dcache_page(ctx->ring_pages[0]);
 
        ctx->completed_events++;
@@ -1215,10 +1210,9 @@ static long aio_read_events_ring(struct kioctx *ctx,
        mutex_lock(&ctx->ring_lock);
 
        /* Access to ->ring_pages here is protected by ctx->ring_lock. */
-       ring = kmap_atomic(ctx->ring_pages[0]);
+       ring = page_address(ctx->ring_pages[0]);
        head = ring->head;
        tail = ring->tail;
-       kunmap_atomic(ring);
 
        /*
         * Ensure that once we've read the current tail pointer, that
@@ -1250,10 +1244,9 @@ static long aio_read_events_ring(struct kioctx *ctx,
                avail = min(avail, nr - ret);
                avail = min_t(long, avail, AIO_EVENTS_PER_PAGE - pos);
 
-               ev = kmap(page);
+               ev = page_address(page);
                copy_ret = copy_to_user(event + ret, ev + pos,
                                        sizeof(*ev) * avail);
-               kunmap(page);
 
                if (unlikely(copy_ret)) {
                        ret = -EFAULT;
@@ -1265,9 +1258,8 @@ static long aio_read_events_ring(struct kioctx *ctx,
                head %= ctx->nr_events;
        }
 
-       ring = kmap_atomic(ctx->ring_pages[0]);
+       ring = page_address(ctx->ring_pages[0]);
        ring->head = head;
-       kunmap_atomic(ring);
        flush_dcache_page(ctx->ring_pages[0]);
 
        pr_debug("%li  h%u t%u\n", ret, head, tail);
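
The aio conversion above works because the ring pages are now allocated with GFP_USER rather than GFP_HIGHUSER: lowmem pages always have a permanent kernel mapping, so page_address() is valid and the temporary kmap_atomic()/kmap() mappings (and their unmap pairs) are no longer needed. A sketch of the before/after pattern, assuming the page is known not to be highmem:

	/* Highmem-capable page: must be mapped and unmapped around each access. */
	ring = kmap_atomic(ctx->ring_pages[0]);
	ring->head = head;
	kunmap_atomic(ring);

	/* Lowmem (GFP_USER) page: the permanent mapping can be used directly. */
	ring = page_address(ctx->ring_pages[0]);
	ring->head = head;
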
index 6baf90b..93046c9 100644 (file)
@@ -600,7 +600,7 @@ static int autofs_dir_symlink(struct mnt_idmap *idmap,
        p_ino = autofs_dentry_ino(dentry->d_parent);
        p_ino->count++;
 
-       dir->i_mtime = current_time(dir);
+       dir->i_mtime = dir->i_ctime = current_time(dir);
 
        return 0;
 }
@@ -633,7 +633,7 @@ static int autofs_dir_unlink(struct inode *dir, struct dentry *dentry)
        d_inode(dentry)->i_size = 0;
        clear_nlink(d_inode(dentry));
 
-       dir->i_mtime = current_time(dir);
+       dir->i_mtime = dir->i_ctime = current_time(dir);
 
        spin_lock(&sbi->lookup_lock);
        __autofs_add_expiring(dentry);
@@ -749,7 +749,7 @@ static int autofs_dir_mkdir(struct mnt_idmap *idmap,
        p_ino = autofs_dentry_ino(dentry->d_parent);
        p_ino->count++;
        inc_nlink(dir);
-       dir->i_mtime = current_time(dir);
+       dir->i_mtime = dir->i_ctime = current_time(dir);
 
        return 0;
 }
index 57ae5ee..adc2230 100644 (file)
@@ -27,7 +27,7 @@ const struct file_operations bfs_file_operations = {
        .read_iter      = generic_file_read_iter,
        .write_iter     = generic_file_write_iter,
        .mmap           = generic_file_mmap,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
 };
 
 static int bfs_move_block(unsigned long from, unsigned long to,
index aac2404..ce083e9 100644 (file)
@@ -71,6 +71,16 @@ bool btrfs_workqueue_normal_congested(const struct btrfs_workqueue *wq)
        return atomic_read(&wq->pending) > wq->thresh * 2;
 }
 
+static void btrfs_init_workqueue(struct btrfs_workqueue *wq,
+                                struct btrfs_fs_info *fs_info)
+{
+       wq->fs_info = fs_info;
+       atomic_set(&wq->pending, 0);
+       INIT_LIST_HEAD(&wq->ordered_list);
+       spin_lock_init(&wq->list_lock);
+       spin_lock_init(&wq->thres_lock);
+}
+
 struct btrfs_workqueue *btrfs_alloc_workqueue(struct btrfs_fs_info *fs_info,
                                              const char *name, unsigned int flags,
                                              int limit_active, int thresh)
@@ -80,9 +90,9 @@ struct btrfs_workqueue *btrfs_alloc_workqueue(struct btrfs_fs_info *fs_info,
        if (!ret)
                return NULL;
 
-       ret->fs_info = fs_info;
+       btrfs_init_workqueue(ret, fs_info);
+
        ret->limit_active = limit_active;
-       atomic_set(&ret->pending, 0);
        if (thresh == 0)
                thresh = DFT_THRESHOLD;
        /* For low threshold, disabling threshold is a better choice */
@@ -106,9 +116,33 @@ struct btrfs_workqueue *btrfs_alloc_workqueue(struct btrfs_fs_info *fs_info,
                return NULL;
        }
 
-       INIT_LIST_HEAD(&ret->ordered_list);
-       spin_lock_init(&ret->list_lock);
-       spin_lock_init(&ret->thres_lock);
+       trace_btrfs_workqueue_alloc(ret, name);
+       return ret;
+}
+
+struct btrfs_workqueue *btrfs_alloc_ordered_workqueue(
+                               struct btrfs_fs_info *fs_info, const char *name,
+                               unsigned int flags)
+{
+       struct btrfs_workqueue *ret;
+
+       ret = kzalloc(sizeof(*ret), GFP_KERNEL);
+       if (!ret)
+               return NULL;
+
+       btrfs_init_workqueue(ret, fs_info);
+
+       /* Ordered workqueues don't allow @max_active adjustments. */
+       ret->limit_active = 1;
+       ret->current_active = 1;
+       ret->thresh = NO_THRESHOLD;
+
+       ret->normal_wq = alloc_ordered_workqueue("btrfs-%s", flags, name);
+       if (!ret->normal_wq) {
+               kfree(ret);
+               return NULL;
+       }
+
        trace_btrfs_workqueue_alloc(ret, name);
        return ret;
 }
index 6e2596d..30f66c5 100644 (file)
@@ -31,6 +31,9 @@ struct btrfs_workqueue *btrfs_alloc_workqueue(struct btrfs_fs_info *fs_info,
                                              unsigned int flags,
                                              int limit_active,
                                              int thresh);
+struct btrfs_workqueue *btrfs_alloc_ordered_workqueue(
+                               struct btrfs_fs_info *fs_info, const char *name,
+                               unsigned int flags);
 void btrfs_init_work(struct btrfs_work *work, btrfs_func_t func,
                     btrfs_func_t ordered_func, btrfs_func_t ordered_free);
 void btrfs_queue_work(struct btrfs_workqueue *wq,
index e54f088..79336fa 100644 (file)
@@ -45,7 +45,8 @@ static int check_extent_in_eb(struct btrfs_backref_walk_ctx *ctx,
        int root_count;
        bool cached;
 
-       if (!btrfs_file_extent_compression(eb, fi) &&
+       if (!ctx->ignore_extent_item_pos &&
+           !btrfs_file_extent_compression(eb, fi) &&
            !btrfs_file_extent_encryption(eb, fi) &&
            !btrfs_file_extent_other_encoding(eb, fi)) {
                u64 data_offset;
@@ -552,7 +553,7 @@ static int add_all_parents(struct btrfs_backref_walk_ctx *ctx,
                                count++;
                        else
                                goto next;
-                       if (!ctx->ignore_extent_item_pos) {
+                       if (!ctx->skip_inode_ref_list) {
                                ret = check_extent_in_eb(ctx, &key, eb, fi, &eie);
                                if (ret == BTRFS_ITERATE_EXTENT_INODES_STOP ||
                                    ret < 0)
@@ -564,7 +565,7 @@ static int add_all_parents(struct btrfs_backref_walk_ctx *ctx,
                                                  eie, (void **)&old, GFP_NOFS);
                        if (ret < 0)
                                break;
-                       if (!ret && !ctx->ignore_extent_item_pos) {
+                       if (!ret && !ctx->skip_inode_ref_list) {
                                while (old->next)
                                        old = old->next;
                                old->next = eie;
@@ -1606,7 +1607,7 @@ again:
                                goto out;
                }
                if (ref->count && ref->parent) {
-                       if (!ctx->ignore_extent_item_pos && !ref->inode_list &&
+                       if (!ctx->skip_inode_ref_list && !ref->inode_list &&
                            ref->level == 0) {
                                struct btrfs_tree_parent_check check = { 0 };
                                struct extent_buffer *eb;
@@ -1647,7 +1648,7 @@ again:
                                                  (void **)&eie, GFP_NOFS);
                        if (ret < 0)
                                goto out;
-                       if (!ret && !ctx->ignore_extent_item_pos) {
+                       if (!ret && !ctx->skip_inode_ref_list) {
                                /*
                                 * We've recorded that parent, so we must extend
                                 * its inode list here.
@@ -1743,7 +1744,7 @@ int btrfs_find_all_leafs(struct btrfs_backref_walk_ctx *ctx)
 static int btrfs_find_all_roots_safe(struct btrfs_backref_walk_ctx *ctx)
 {
        const u64 orig_bytenr = ctx->bytenr;
-       const bool orig_ignore_extent_item_pos = ctx->ignore_extent_item_pos;
+       const bool orig_skip_inode_ref_list = ctx->skip_inode_ref_list;
        bool roots_ulist_allocated = false;
        struct ulist_iterator uiter;
        int ret = 0;
@@ -1764,7 +1765,7 @@ static int btrfs_find_all_roots_safe(struct btrfs_backref_walk_ctx *ctx)
                roots_ulist_allocated = true;
        }
 
-       ctx->ignore_extent_item_pos = true;
+       ctx->skip_inode_ref_list = true;
 
        ULIST_ITER_INIT(&uiter);
        while (1) {
@@ -1789,7 +1790,7 @@ static int btrfs_find_all_roots_safe(struct btrfs_backref_walk_ctx *ctx)
        ulist_free(ctx->refs);
        ctx->refs = NULL;
        ctx->bytenr = orig_bytenr;
-       ctx->ignore_extent_item_pos = orig_ignore_extent_item_pos;
+       ctx->skip_inode_ref_list = orig_skip_inode_ref_list;
 
        return ret;
 }
@@ -1912,7 +1913,7 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr,
                goto out_trans;
        }
 
-       walk_ctx.ignore_extent_item_pos = true;
+       walk_ctx.skip_inode_ref_list = true;
        walk_ctx.trans = trans;
        walk_ctx.fs_info = fs_info;
        walk_ctx.refs = &ctx->refs;
index ef6bbea..1616e3e 100644 (file)
@@ -60,6 +60,12 @@ struct btrfs_backref_walk_ctx {
         * @extent_item_pos is ignored.
         */
        bool ignore_extent_item_pos;
+       /*
+        * If true and bytenr corresponds to a data extent, then the inode list
+        * (each member describing inode number, file offset and root) is not
+        * added to each reference added to the @refs ulist.
+        */
+       bool skip_inode_ref_list;
        /* A valid transaction handle or NULL. */
        struct btrfs_trans_handle *trans;
        /*
index 5379c47..12b1244 100644 (file)
@@ -27,6 +27,17 @@ struct btrfs_failed_bio {
        atomic_t repair_count;
 };
 
+/* Is this a data path I/O that needs storage layer checksum and repair? */
+static inline bool is_data_bbio(struct btrfs_bio *bbio)
+{
+       return bbio->inode && is_data_inode(&bbio->inode->vfs_inode);
+}
+
+static bool bbio_has_ordered_extent(struct btrfs_bio *bbio)
+{
+       return is_data_bbio(bbio) && btrfs_op(&bbio->bio) == BTRFS_MAP_WRITE;
+}
+
 /*
  * Initialize a btrfs_bio structure.  This skips the embedded bio itself as it
  * is already initialized by the block layer.
@@ -61,20 +72,6 @@ struct btrfs_bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf,
        return bbio;
 }
 
-static blk_status_t btrfs_bio_extract_ordered_extent(struct btrfs_bio *bbio)
-{
-       struct btrfs_ordered_extent *ordered;
-       int ret;
-
-       ordered = btrfs_lookup_ordered_extent(bbio->inode, bbio->file_offset);
-       if (WARN_ON_ONCE(!ordered))
-               return BLK_STS_IOERR;
-       ret = btrfs_extract_ordered_extent(bbio, ordered);
-       btrfs_put_ordered_extent(ordered);
-
-       return errno_to_blk_status(ret);
-}
-
 static struct btrfs_bio *btrfs_split_bio(struct btrfs_fs_info *fs_info,
                                         struct btrfs_bio *orig_bbio,
                                         u64 map_length, bool use_append)
@@ -95,13 +92,41 @@ static struct btrfs_bio *btrfs_split_bio(struct btrfs_fs_info *fs_info,
        btrfs_bio_init(bbio, fs_info, NULL, orig_bbio);
        bbio->inode = orig_bbio->inode;
        bbio->file_offset = orig_bbio->file_offset;
-       if (!(orig_bbio->bio.bi_opf & REQ_BTRFS_ONE_ORDERED))
-               orig_bbio->file_offset += map_length;
-
+       orig_bbio->file_offset += map_length;
+       if (bbio_has_ordered_extent(bbio)) {
+               refcount_inc(&orig_bbio->ordered->refs);
+               bbio->ordered = orig_bbio->ordered;
+       }
        atomic_inc(&orig_bbio->pending_ios);
        return bbio;
 }
 
+/* Free a bio that was never submitted to the underlying device. */
+static void btrfs_cleanup_bio(struct btrfs_bio *bbio)
+{
+       if (bbio_has_ordered_extent(bbio))
+               btrfs_put_ordered_extent(bbio->ordered);
+       bio_put(&bbio->bio);
+}
+
+static void __btrfs_bio_end_io(struct btrfs_bio *bbio)
+{
+       if (bbio_has_ordered_extent(bbio)) {
+               struct btrfs_ordered_extent *ordered = bbio->ordered;
+
+               bbio->end_io(bbio);
+               btrfs_put_ordered_extent(ordered);
+       } else {
+               bbio->end_io(bbio);
+       }
+}
+
+void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
+{
+       bbio->bio.bi_status = status;
+       __btrfs_bio_end_io(bbio);
+}
+
 static void btrfs_orig_write_end_io(struct bio *bio);
 
 static void btrfs_bbio_propagate_error(struct btrfs_bio *bbio,
@@ -130,12 +155,12 @@ static void btrfs_orig_bbio_end_io(struct btrfs_bio *bbio)
 
                if (bbio->bio.bi_status)
                        btrfs_bbio_propagate_error(bbio, orig_bbio);
-               bio_put(&bbio->bio);
+               btrfs_cleanup_bio(bbio);
                bbio = orig_bbio;
        }
 
        if (atomic_dec_and_test(&bbio->pending_ios))
-               bbio->end_io(bbio);
+               __btrfs_bio_end_io(bbio);
 }
 
 static int next_repair_mirror(struct btrfs_failed_bio *fbio, int cur_mirror)
@@ -327,10 +352,10 @@ static void btrfs_end_bio_work(struct work_struct *work)
        struct btrfs_bio *bbio = container_of(work, struct btrfs_bio, end_io_work);
 
        /* Metadata reads are checked and repaired by the submitter. */
-       if (bbio->inode && !(bbio->bio.bi_opf & REQ_META))
+       if (is_data_bbio(bbio))
                btrfs_check_read_bio(bbio, bbio->bio.bi_private);
        else
-               bbio->end_io(bbio);
+               btrfs_orig_bbio_end_io(bbio);
 }
 
 static void btrfs_simple_end_io(struct bio *bio)
@@ -348,7 +373,7 @@ static void btrfs_simple_end_io(struct bio *bio)
                INIT_WORK(&bbio->end_io_work, btrfs_end_bio_work);
                queue_work(btrfs_end_io_wq(fs_info, bio), &bbio->end_io_work);
        } else {
-               if (bio_op(bio) == REQ_OP_ZONE_APPEND)
+               if (bio_op(bio) == REQ_OP_ZONE_APPEND && !bio->bi_status)
                        btrfs_record_physical_zoned(bbio);
                btrfs_orig_bbio_end_io(bbio);
        }
@@ -361,8 +386,7 @@ static void btrfs_raid56_end_io(struct bio *bio)
 
        btrfs_bio_counter_dec(bioc->fs_info);
        bbio->mirror_num = bioc->mirror_num;
-       if (bio_op(bio) == REQ_OP_READ && bbio->inode &&
-           !(bbio->bio.bi_opf & REQ_META))
+       if (bio_op(bio) == REQ_OP_READ && is_data_bbio(bbio))
                btrfs_check_read_bio(bbio, NULL);
        else
                btrfs_orig_bbio_end_io(bbio);
@@ -472,13 +496,12 @@ static void btrfs_submit_mirrored_bio(struct btrfs_io_context *bioc, int dev_nr)
 static void __btrfs_submit_bio(struct bio *bio, struct btrfs_io_context *bioc,
                               struct btrfs_io_stripe *smap, int mirror_num)
 {
-       /* Do not leak our private flag into the block layer. */
-       bio->bi_opf &= ~REQ_BTRFS_ONE_ORDERED;
-
        if (!bioc) {
                /* Single mirror read/write fast path. */
                btrfs_bio(bio)->mirror_num = mirror_num;
                bio->bi_iter.bi_sector = smap->physical >> SECTOR_SHIFT;
+               if (bio_op(bio) != REQ_OP_READ)
+                       btrfs_bio(bio)->orig_physical = smap->physical;
                bio->bi_private = smap->dev;
                bio->bi_end_io = btrfs_simple_end_io;
                btrfs_submit_dev_bio(smap->dev, bio);
@@ -574,27 +597,20 @@ static void run_one_async_free(struct btrfs_work *work)
 
 static bool should_async_write(struct btrfs_bio *bbio)
 {
-       /*
-        * If the I/O is not issued by fsync and friends, (->sync_writers != 0),
-        * then try to defer the submission to a workqueue to parallelize the
-        * checksum calculation.
-        */
-       if (atomic_read(&bbio->inode->sync_writers))
+       /* Submit synchronously if the checksum implementation is fast. */
+       if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
                return false;
 
        /*
-        * Submit metadata writes synchronously if the checksum implementation
-        * is fast, or we are on a zoned device that wants I/O to be submitted
-        * in order.
+        * Try to defer the submission to a workqueue to parallelize the
+        * checksum calculation unless the I/O is issued synchronously.
         */
-       if (bbio->bio.bi_opf & REQ_META) {
-               struct btrfs_fs_info *fs_info = bbio->fs_info;
+       if (op_is_sync(bbio->bio.bi_opf))
+               return false;
 
-               if (btrfs_is_zoned(fs_info))
-                       return false;
-               if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))
-                       return false;
-       }
+       /* Zoned devices require I/O to be submitted in order. */
+       if ((bbio->bio.bi_opf & REQ_META) && btrfs_is_zoned(bbio->fs_info))
+               return false;
 
        return true;
 }
@@ -622,10 +638,7 @@ static bool btrfs_wq_submit_bio(struct btrfs_bio *bbio,
 
        btrfs_init_work(&async->work, run_one_async_start, run_one_async_done,
                        run_one_async_free);
-       if (op_is_sync(bbio->bio.bi_opf))
-               btrfs_queue_work(fs_info->hipri_workers, &async->work);
-       else
-               btrfs_queue_work(fs_info->workers, &async->work);
+       btrfs_queue_work(fs_info->workers, &async->work);
        return true;
 }
 
@@ -635,7 +648,7 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
        struct btrfs_fs_info *fs_info = bbio->fs_info;
        struct btrfs_bio *orig_bbio = bbio;
        struct bio *bio = &bbio->bio;
-       u64 logical = bio->bi_iter.bi_sector << 9;
+       u64 logical = bio->bi_iter.bi_sector << SECTOR_SHIFT;
        u64 length = bio->bi_iter.bi_size;
        u64 map_length = length;
        bool use_append = btrfs_use_zone_append(bbio);
@@ -645,8 +658,8 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
        int error;
 
        btrfs_bio_counter_inc_blocked(fs_info);
-       error = __btrfs_map_block(fs_info, btrfs_op(bio), logical, &map_length,
-                                 &bioc, &smap, &mirror_num, 1);
+       error = btrfs_map_block(fs_info, btrfs_op(bio), logical, &map_length,
+                               &bioc, &smap, &mirror_num, 1);
        if (error) {
                ret = errno_to_blk_status(error);
                goto fail;
@@ -665,7 +678,7 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
         * Save the iter for the end_io handler and preload the checksums for
         * data reads.
         */
-       if (bio_op(bio) == REQ_OP_READ && inode && !(bio->bi_opf & REQ_META)) {
+       if (bio_op(bio) == REQ_OP_READ && is_data_bbio(bbio)) {
                bbio->saved_iter = bio->bi_iter;
                ret = btrfs_lookup_bio_sums(bbio);
                if (ret)
@@ -676,9 +689,6 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
                if (use_append) {
                        bio->bi_opf &= ~REQ_OP_WRITE;
                        bio->bi_opf |= REQ_OP_ZONE_APPEND;
-                       ret = btrfs_bio_extract_ordered_extent(bbio);
-                       if (ret)
-                               goto fail_put_bio;
                }
 
                /*
@@ -695,6 +705,10 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
                        ret = btrfs_bio_csum(bbio);
                        if (ret)
                                goto fail_put_bio;
+               } else if (use_append) {
+                       ret = btrfs_alloc_dummy_sum(bbio);
+                       if (ret)
+                               goto fail_put_bio;
                }
        }
 
@@ -704,7 +718,7 @@ done:
 
 fail_put_bio:
        if (map_length < length)
-               bio_put(bio);
+               btrfs_cleanup_bio(bbio);
 fail:
        btrfs_bio_counter_dec(fs_info);
        btrfs_bio_end_io(orig_bbio, ret);
@@ -811,10 +825,6 @@ void btrfs_submit_repair_write(struct btrfs_bio *bbio, int mirror_num, bool dev_
                goto fail;
 
        if (dev_replace) {
-               if (btrfs_op(&bbio->bio) == BTRFS_MAP_WRITE && btrfs_is_zoned(fs_info)) {
-                       bbio->bio.bi_opf &= ~REQ_OP_WRITE;
-                       bbio->bio.bi_opf |= REQ_OP_ZONE_APPEND;
-               }
                ASSERT(smap.dev == fs_info->dev_replace.srcdev);
                smap.dev = fs_info->dev_replace.tgtdev;
        }
index a8eca3a..ca79dec 100644 (file)
@@ -39,8 +39,8 @@ struct btrfs_bio {
 
        union {
                /*
-                * Data checksumming and original I/O information for internal
-                * use in the btrfs_submit_bio machinery.
+                * For data reads: checksumming and original I/O information.
+                * (for internal use in the btrfs_submit_bio machinery only)
                 */
                struct {
                        u8 *csum;
@@ -48,7 +48,20 @@ struct btrfs_bio {
                        struct bvec_iter saved_iter;
                };
 
-               /* For metadata parentness verification. */
+               /*
+                * For data writes:
+                * - ordered extent covering the bio
+                * - pointer to the checksums for this bio
+                * - original physical address from the allocator
+                *   (for zone append only)
+                */
+               struct {
+                       struct btrfs_ordered_extent *ordered;
+                       struct btrfs_ordered_sum *sums;
+                       u64 orig_physical;
+               };
+
+               /* For metadata reads: parentness verification. */
                struct btrfs_tree_parent_check parent_check;
        };
 
@@ -84,15 +97,7 @@ void btrfs_bio_init(struct btrfs_bio *bbio, struct btrfs_fs_info *fs_info,
 struct btrfs_bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf,
                                  struct btrfs_fs_info *fs_info,
                                  btrfs_bio_end_io_t end_io, void *private);
-
-static inline void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
-{
-       bbio->bio.bi_status = status;
-       bbio->end_io(bbio);
-}
-
-/* Bio only refers to one ordered extent. */
-#define REQ_BTRFS_ONE_ORDERED                  REQ_DRV
+void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status);
 
 /* Submit using blkcg_punt_bio_submit. */
 #define REQ_BTRFS_CGROUP_PUNT                  REQ_FS_PRIVATE
index 957ad1c..48ae509 100644 (file)
@@ -95,14 +95,21 @@ static u64 btrfs_reduce_alloc_profile(struct btrfs_fs_info *fs_info, u64 flags)
        }
        allowed &= flags;
 
-       if (allowed & BTRFS_BLOCK_GROUP_RAID6)
+       /* Select the highest-redundancy RAID level. */
+       if (allowed & BTRFS_BLOCK_GROUP_RAID1C4)
+               allowed = BTRFS_BLOCK_GROUP_RAID1C4;
+       else if (allowed & BTRFS_BLOCK_GROUP_RAID6)
                allowed = BTRFS_BLOCK_GROUP_RAID6;
+       else if (allowed & BTRFS_BLOCK_GROUP_RAID1C3)
+               allowed = BTRFS_BLOCK_GROUP_RAID1C3;
        else if (allowed & BTRFS_BLOCK_GROUP_RAID5)
                allowed = BTRFS_BLOCK_GROUP_RAID5;
        else if (allowed & BTRFS_BLOCK_GROUP_RAID10)
                allowed = BTRFS_BLOCK_GROUP_RAID10;
        else if (allowed & BTRFS_BLOCK_GROUP_RAID1)
                allowed = BTRFS_BLOCK_GROUP_RAID1;
+       else if (allowed & BTRFS_BLOCK_GROUP_DUP)
+               allowed = BTRFS_BLOCK_GROUP_DUP;
        else if (allowed & BTRFS_BLOCK_GROUP_RAID0)
                allowed = BTRFS_BLOCK_GROUP_RAID0;
 
@@ -1633,11 +1640,14 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
 {
        struct btrfs_fs_info *fs_info = bg->fs_info;
 
+       trace_btrfs_add_unused_block_group(bg);
        spin_lock(&fs_info->unused_bgs_lock);
        if (list_empty(&bg->bg_list)) {
                btrfs_get_block_group(bg);
-               trace_btrfs_add_unused_block_group(bg);
                list_add_tail(&bg->bg_list, &fs_info->unused_bgs);
+       } else {
+               /* Pull out the block group from the reclaim_bgs list. */
+               list_move_tail(&bg->bg_list, &fs_info->unused_bgs);
        }
        spin_unlock(&fs_info->unused_bgs_lock);
 }
@@ -1791,8 +1801,15 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
                }
                spin_unlock(&bg->lock);
 
-               /* Get out fast, in case we're unmounting the filesystem */
-               if (btrfs_fs_closing(fs_info)) {
+               /*
+                * Get out fast, in case we're read-only or unmounting the
+                * filesystem. It is OK to drop block groups from the list even
+                * for the read-only case. As we did sb_start_write(),
+                * "mount -o remount,ro" won't happen and read-only filesystem
+                * means it is forced read-only due to a fatal error. So, it
+                * never gets back to read-write to let us reclaim again.
+                */
+               if (btrfs_need_cleaner_sleep(fs_info)) {
                        up_write(&space_info->groups_sem);
                        goto next;
                }
@@ -1823,11 +1840,27 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
                }
 
 next:
+               if (ret)
+                       btrfs_mark_bg_to_reclaim(bg);
                btrfs_put_block_group(bg);
+
+               mutex_unlock(&fs_info->reclaim_bgs_lock);
+               /*
+                * Reclaiming all the block groups in the list can take really
+                * long.  Prioritize cleaning up unused block groups.
+                */
+               btrfs_delete_unused_bgs(fs_info);
+               /*
+                * If we are interrupted by a balance, we can just bail out. The
+                * cleaner thread will restart the reclaim if necessary.
+                */
+               if (!mutex_trylock(&fs_info->reclaim_bgs_lock))
+                       goto end;
                spin_lock(&fs_info->unused_bgs_lock);
        }
        spin_unlock(&fs_info->unused_bgs_lock);
        mutex_unlock(&fs_info->reclaim_bgs_lock);
+end:
        btrfs_exclop_finish(fs_info);
        sb_end_write(fs_info->sb);
 }
@@ -1973,7 +2006,7 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
 
        /* For RAID5/6 adjust to a full IO stripe length */
        if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK)
-               io_stripe_size = nr_data_stripes(map) << BTRFS_STRIPE_LEN_SHIFT;
+               io_stripe_size = btrfs_stripe_nr_to_offset(nr_data_stripes(map));
 
        buf = kcalloc(map->num_stripes, sizeof(u64), GFP_NOFS);
        if (!buf) {
@@ -2818,10 +2851,20 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
        }
 
        ret = inc_block_group_ro(cache, 0);
-       if (!do_chunk_alloc || ret == -ETXTBSY)
-               goto unlock_out;
        if (!ret)
                goto out;
+       if (ret == -ETXTBSY)
+               goto unlock_out;
+
+       /*
+        * Skip chunk allocation if the bg is SYSTEM; this avoids a system
+        * chunk allocation storm that could exhaust the system chunk array.
+        * Otherwise we still try our best to mark the block group read-only.
+        */
+       if (!do_chunk_alloc && ret == -ENOSPC &&
+           (cache->flags & BTRFS_BLOCK_GROUP_SYSTEM))
+               goto unlock_out;
+
        alloc_flags = btrfs_get_alloc_profile(fs_info, cache->space_info->flags);
        ret = btrfs_chunk_alloc(trans, alloc_flags, CHUNK_ALLOC_FORCE);
        if (ret < 0)
@@ -3511,9 +3554,9 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
                        spin_unlock(&cache->lock);
                        spin_unlock(&space_info->lock);
 
-                       set_extent_dirty(&trans->transaction->pinned_extents,
-                                        bytenr, bytenr + num_bytes - 1,
-                                        GFP_NOFS | __GFP_NOFAIL);
+                       set_extent_bit(&trans->transaction->pinned_extents,
+                                      bytenr, bytenr + num_bytes - 1,
+                                      EXTENT_DIRTY, NULL);
                }
 
                spin_lock(&trans->transaction->dirty_bgs_lock);
index cc0e4b3..f204add 100644 (file)
@@ -162,7 +162,14 @@ struct btrfs_block_group {
         */
        struct list_head cluster_list;
 
-       /* For delayed block group creation or deletion of empty block groups */
+       /*
+        * Used for several lists:
+        *
+        * 1) struct btrfs_fs_info::unused_bgs
+        * 2) struct btrfs_fs_info::reclaim_bgs
+        * 3) struct btrfs_transaction::deleted_bgs
+        * 4) struct btrfs_trans_handle::new_bgs
+        */
        struct list_head bg_list;
 
        /* For read-only block groups */
index 3ab707e..6279d20 100644 (file)
@@ -124,7 +124,8 @@ static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
        } else {
                num_bytes = 0;
        }
-       if (block_rsv->qgroup_rsv_reserved >= block_rsv->qgroup_rsv_size) {
+       if (qgroup_to_release_ret &&
+           block_rsv->qgroup_rsv_reserved >= block_rsv->qgroup_rsv_size) {
                qgroup_to_release = block_rsv->qgroup_rsv_reserved -
                                    block_rsv->qgroup_rsv_size;
                block_rsv->qgroup_rsv_reserved = block_rsv->qgroup_rsv_size;
@@ -540,3 +541,22 @@ try_reserve:
 
        return ERR_PTR(ret);
 }
+
+int btrfs_check_trunc_cache_free_space(struct btrfs_fs_info *fs_info,
+                                      struct btrfs_block_rsv *rsv)
+{
+       u64 needed_bytes;
+       int ret;
+
+       /* 1 for slack space, 1 for updating the inode */
+       needed_bytes = btrfs_calc_insert_metadata_size(fs_info, 1) +
+               btrfs_calc_metadata_size(fs_info, 1);
+
+       spin_lock(&rsv->lock);
+       if (rsv->reserved < needed_bytes)
+               ret = -ENOSPC;
+       else
+               ret = 0;
+       spin_unlock(&rsv->lock);
+       return ret;
+}
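
btrfs_check_trunc_cache_free_space(), moved into block-rsv.c here, only answers whether a reservation still covers one metadata insertion plus one inode update before a free-space-cache inode is truncated. A hedged sketch of how a caller might use it; the choice of the global block reservation is illustrative:

	/* Bail out early if truncating the free space cache could run out of
	 * metadata space (sketch; the helper returns 0 or -ENOSPC). */
	ret = btrfs_check_trunc_cache_free_space(fs_info, &fs_info->global_block_rsv);
	if (ret)
		return ret;
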
index 6dc7817..b0bd12b 100644 (file)
@@ -82,6 +82,8 @@ void btrfs_release_global_block_rsv(struct btrfs_fs_info *fs_info);
 struct btrfs_block_rsv *btrfs_use_block_rsv(struct btrfs_trans_handle *trans,
                                            struct btrfs_root *root,
                                            u32 blocksize);
+int btrfs_check_trunc_cache_free_space(struct btrfs_fs_info *fs_info,
+                                      struct btrfs_block_rsv *rsv);
 static inline void btrfs_unuse_block_rsv(struct btrfs_fs_info *fs_info,
                                         struct btrfs_block_rsv *block_rsv,
                                         u32 blocksize)
index ec2ae44..d47a927 100644 (file)
@@ -116,9 +116,6 @@ struct btrfs_inode {
 
        unsigned long runtime_flags;
 
-       /* Keep track of who's O_SYNC/fsyncing currently */
-       atomic_t sync_writers;
-
        /* full 64 bit generation number, struct vfs_inode doesn't have a big
         * enough field for this.
         */
@@ -335,7 +332,7 @@ static inline void btrfs_mod_outstanding_extents(struct btrfs_inode *inode,
        if (btrfs_is_free_space_inode(inode))
                return;
        trace_btrfs_inode_mod_outstanding_extents(inode->root, btrfs_ino(inode),
-                                                 mod);
+                                                 mod, inode->outstanding_extents);
 }
 
 /*
@@ -407,30 +404,12 @@ static inline bool btrfs_inode_can_compress(const struct btrfs_inode *inode)
        return true;
 }
 
-/*
- * btrfs_inode_item stores flags in a u64, btrfs_inode stores them in two
- * separate u32s. These two functions convert between the two representations.
- */
-static inline u64 btrfs_inode_combine_flags(u32 flags, u32 ro_flags)
-{
-       return (flags | ((u64)ro_flags << 32));
-}
-
-static inline void btrfs_inode_split_flags(u64 inode_item_flags,
-                                          u32 *flags, u32 *ro_flags)
-{
-       *flags = (u32)inode_item_flags;
-       *ro_flags = (u32)(inode_item_flags >> 32);
-}
-
 /* Array of bytes with variable length, hexadecimal format 0x1234 */
 #define CSUM_FMT                               "0x%*phN"
 #define CSUM_FMT_VALUE(size, bytes)            size, bytes
 
 int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
                            u32 pgoff, u8 *csum, const u8 * const csum_expected);
-int btrfs_extract_ordered_extent(struct btrfs_bio *bbio,
-                                struct btrfs_ordered_extent *ordered);
 bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
                        u32 bio_offset, struct bio_vec *bv);
 noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
index 82e49d9..3caf339 100644 (file)
@@ -1459,13 +1459,13 @@ static int btrfsic_map_block(struct btrfsic_state *state, u64 bytenr, u32 len,
        struct btrfs_fs_info *fs_info = state->fs_info;
        int ret;
        u64 length;
-       struct btrfs_io_context *multi = NULL;
+       struct btrfs_io_context *bioc = NULL;
+       struct btrfs_io_stripe smap, *map;
        struct btrfs_device *device;
 
        length = len;
-       ret = btrfs_map_block(fs_info, BTRFS_MAP_READ,
-                             bytenr, &length, &multi, mirror_num);
-
+       ret = btrfs_map_block(fs_info, BTRFS_MAP_READ, bytenr, &length, &bioc,
+                             NULL, &mirror_num, 0);
        if (ret) {
                block_ctx_out->start = 0;
                block_ctx_out->dev_bytenr = 0;
@@ -1478,21 +1478,26 @@ static int btrfsic_map_block(struct btrfsic_state *state, u64 bytenr, u32 len,
                return ret;
        }
 
-       device = multi->stripes[0].dev;
+       if (bioc)
+               map = &bioc->stripes[0];
+       else
+               map = &smap;
+
+       device = map->dev;
        if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state) ||
            !device->bdev || !device->name)
                block_ctx_out->dev = NULL;
        else
                block_ctx_out->dev = btrfsic_dev_state_lookup(
                                                        device->bdev->bd_dev);
-       block_ctx_out->dev_bytenr = multi->stripes[0].physical;
+       block_ctx_out->dev_bytenr = map->physical;
        block_ctx_out->start = bytenr;
        block_ctx_out->len = len;
        block_ctx_out->datav = NULL;
        block_ctx_out->pagev = NULL;
        block_ctx_out->mem_to_free = NULL;
 
-       kfree(multi);
+       kfree(bioc);
        if (NULL == block_ctx_out->dev) {
                ret = -ENXIO;
                pr_info("btrfsic: error, cannot lookup dev (#1)!\n");
@@ -1565,7 +1570,7 @@ static int btrfsic_read_block(struct btrfsic_state *state,
 
                bio = bio_alloc(block_ctx->dev->bdev, num_pages - i,
                                REQ_OP_READ, GFP_NOFS);
-               bio->bi_iter.bi_sector = dev_bytenr >> 9;
+               bio->bi_iter.bi_sector = dev_bytenr >> SECTOR_SHIFT;
 
                for (j = i; j < num_pages; j++) {
                        ret = bio_add_page(bio, block_ctx->pagev[j],
index 2d0493f..8818ed5 100644 (file)
@@ -37,7 +37,7 @@
 #include "file-item.h"
 #include "super.h"
 
-struct bio_set btrfs_compressed_bioset;
+static struct bio_set btrfs_compressed_bioset;
 
 static const char* const btrfs_compress_types[] = { "", "zlib", "lzo", "zstd" };
 
@@ -211,8 +211,6 @@ static noinline void end_compressed_writeback(const struct compressed_bio *cb)
                for (i = 0; i < ret; i++) {
                        struct folio *folio = fbatch.folios[i];
 
-                       if (errno)
-                               folio_set_error(folio);
                        btrfs_page_clamp_clear_writeback(fs_info, &folio->page,
                                                         cb->start, cb->len);
                }
@@ -226,13 +224,8 @@ static void btrfs_finish_compressed_write_work(struct work_struct *work)
        struct compressed_bio *cb =
                container_of(work, struct compressed_bio, write_end_work);
 
-       /*
-        * Ok, we're the last bio for this extent, step one is to call back
-        * into the FS and do all the end_io operations.
-        */
-       btrfs_writepage_endio_finish_ordered(cb->bbio.inode, NULL,
-                       cb->start, cb->start + cb->len - 1,
-                       cb->bbio.bio.bi_status == BLK_STS_OK);
+       btrfs_finish_ordered_extent(cb->bbio.ordered, NULL, cb->start, cb->len,
+                                   cb->bbio.bio.bi_status == BLK_STS_OK);
 
        if (cb->writeback)
                end_compressed_writeback(cb);
@@ -281,32 +274,31 @@ static void btrfs_add_compressed_bio_pages(struct compressed_bio *cb)
  * This also checksums the file bytes and gets things ready for
  * the end io hooks.
  */
-void btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
-                                unsigned int len, u64 disk_start,
-                                unsigned int compressed_len,
-                                struct page **compressed_pages,
-                                unsigned int nr_pages,
-                                blk_opf_t write_flags,
-                                bool writeback)
+void btrfs_submit_compressed_write(struct btrfs_ordered_extent *ordered,
+                                  struct page **compressed_pages,
+                                  unsigned int nr_pages,
+                                  blk_opf_t write_flags,
+                                  bool writeback)
 {
+       struct btrfs_inode *inode = BTRFS_I(ordered->inode);
        struct btrfs_fs_info *fs_info = inode->root->fs_info;
        struct compressed_bio *cb;
 
-       ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
-              IS_ALIGNED(len, fs_info->sectorsize));
-
-       write_flags |= REQ_BTRFS_ONE_ORDERED;
+       ASSERT(IS_ALIGNED(ordered->file_offset, fs_info->sectorsize));
+       ASSERT(IS_ALIGNED(ordered->num_bytes, fs_info->sectorsize));
 
-       cb = alloc_compressed_bio(inode, start, REQ_OP_WRITE | write_flags,
+       cb = alloc_compressed_bio(inode, ordered->file_offset,
+                                 REQ_OP_WRITE | write_flags,
                                  end_compressed_bio_write);
-       cb->start = start;
-       cb->len = len;
+       cb->start = ordered->file_offset;
+       cb->len = ordered->num_bytes;
        cb->compressed_pages = compressed_pages;
-       cb->compressed_len = compressed_len;
+       cb->compressed_len = ordered->disk_num_bytes;
        cb->writeback = writeback;
        INIT_WORK(&cb->write_end_work, btrfs_finish_compressed_write_work);
        cb->nr_pages = nr_pages;
-       cb->bbio.bio.bi_iter.bi_sector = disk_start >> SECTOR_SHIFT;
+       cb->bbio.bio.bi_iter.bi_sector = ordered->disk_bytenr >> SECTOR_SHIFT;
+       cb->bbio.ordered = ordered;
        btrfs_add_compressed_bio_pages(cb);
 
        btrfs_submit_bio(&cb->bbio, 0);
@@ -421,7 +413,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
                 */
                if (!em || cur < em->start ||
                    (cur + fs_info->sectorsize > extent_map_end(em)) ||
-                   (em->block_start >> 9) != orig_bio->bi_iter.bi_sector) {
+                   (em->block_start >> SECTOR_SHIFT) != orig_bio->bi_iter.bi_sector) {
                        free_extent_map(em);
                        unlock_extent(tree, cur, page_end, NULL);
                        unlock_page(page);
@@ -472,7 +464,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
  * After the compressed pages are read, we copy the bytes into the
  * bio we were passed and then invoke the bio's end_io callbacks
  */
-void btrfs_submit_compressed_read(struct btrfs_bio *bbio, int mirror_num)
+void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
 {
        struct btrfs_inode *inode = bbio->inode;
        struct btrfs_fs_info *fs_info = inode->root->fs_info;
@@ -538,7 +530,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio, int mirror_num)
        if (memstall)
                psi_memstall_leave(&pflags);
 
-       btrfs_submit_bio(&cb->bbio, mirror_num);
+       btrfs_submit_bio(&cb->bbio, 0);
        return;
 
 out_free_compressed_pages:
index 19ab2ab..03bb9d1 100644 (file)
@@ -10,6 +10,7 @@
 #include "bio.h"
 
 struct btrfs_inode;
+struct btrfs_ordered_extent;
 
 /*
  * We want to make sure that amount of RAM required to uncompress an extent is
@@ -86,14 +87,12 @@ int btrfs_decompress(int type, const u8 *data_in, struct page *dest_page,
 int btrfs_decompress_buf2page(const char *buf, u32 buf_len,
                              struct compressed_bio *cb, u32 decompressed);
 
-void btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
-                                 unsigned int len, u64 disk_start,
-                                 unsigned int compressed_len,
+void btrfs_submit_compressed_write(struct btrfs_ordered_extent *ordered,
                                  struct page **compressed_pages,
                                  unsigned int nr_pages,
                                  blk_opf_t write_flags,
                                  bool writeback);
-void btrfs_submit_compressed_read(struct btrfs_bio *bbio, int mirror_num);
+void btrfs_submit_compressed_read(struct btrfs_bio *bbio);
 
 unsigned int btrfs_compress_str2level(unsigned int type, const char *str);
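
With the prototypes above, the compressed write path now derives the logical start, length, compressed length and disk bytenr from the ordered extent (file_offset, num_bytes, disk_num_bytes, disk_bytenr) instead of taking them as separate arguments, and the compressed read path no longer takes a mirror number. A caller-side sketch under that interface, assuming an already-created ordered extent and a populated compressed page array (the variable names are placeholders, not from the patch):

        /*
         * Sketch only: the ordered extent carries both the file range and the
         * on-disk location, so the caller hands it over together with the
         * compressed pages.
         */
        btrfs_submit_compressed_write(ordered, compressed_pages, nr_pages,
                                      0 /* extra write_flags */, true /* writeback */);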
 
index 3c983c7..a4cb4b6 100644 (file)
@@ -37,8 +37,6 @@ static int push_node_left(struct btrfs_trans_handle *trans,
 static int balance_node_right(struct btrfs_trans_handle *trans,
                              struct extent_buffer *dst_buf,
                              struct extent_buffer *src_buf);
-static void del_ptr(struct btrfs_root *root, struct btrfs_path *path,
-                   int level, int slot);
 
 static const struct btrfs_csums {
        u16             size;
@@ -150,13 +148,19 @@ static inline void copy_leaf_items(const struct extent_buffer *dst,
                              nr_items * sizeof(struct btrfs_item));
 }
 
+/* This exists for btrfs-progs usages. */
+u16 btrfs_csum_type_size(u16 type)
+{
+       return btrfs_csums[type].size;
+}
+
 int btrfs_super_csum_size(const struct btrfs_super_block *s)
 {
        u16 t = btrfs_super_csum_type(s);
        /*
         * csum type is validated at mount time
         */
-       return btrfs_csums[t].size;
+       return btrfs_csum_type_size(t);
 }
 
 const char *btrfs_super_csum_name(u16 csum_type)
@@ -417,9 +421,13 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
                                               &refs, &flags);
                if (ret)
                        return ret;
-               if (refs == 0) {
-                       ret = -EROFS;
-                       btrfs_handle_fs_error(fs_info, ret, NULL);
+               if (unlikely(refs == 0)) {
+                       btrfs_crit(fs_info,
+               "found 0 references for tree block at bytenr %llu level %d root %llu",
+                                  buf->start, btrfs_header_level(buf),
+                                  btrfs_root_id(root));
+                       ret = -EUCLEAN;
+                       btrfs_abort_transaction(trans, ret);
                        return ret;
                }
        } else {
@@ -464,10 +472,7 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
                                return ret;
                }
                if (new_flags != 0) {
-                       int level = btrfs_header_level(buf);
-
-                       ret = btrfs_set_disk_extent_flags(trans, buf,
-                                                         new_flags, level);
+                       ret = btrfs_set_disk_extent_flags(trans, buf, new_flags);
                        if (ret)
                                return ret;
                }
@@ -583,9 +588,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
                    btrfs_header_backref_rev(buf) < BTRFS_MIXED_BACKREF_REV)
                        parent_start = buf->start;
 
-               atomic_inc(&cow->refs);
                ret = btrfs_tree_mod_log_insert_root(root->node, cow, true);
-               BUG_ON(ret < 0);
+               if (ret < 0) {
+                       btrfs_tree_unlock(cow);
+                       free_extent_buffer(cow);
+                       btrfs_abort_transaction(trans, ret);
+                       return ret;
+               }
+               atomic_inc(&cow->refs);
                rcu_assign_pointer(root->node, cow);
 
                btrfs_free_tree_block(trans, btrfs_root_id(root), buf,
@@ -594,8 +604,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
                add_root_to_dirty_list(root);
        } else {
                WARN_ON(trans->transid != btrfs_header_generation(parent));
-               btrfs_tree_mod_log_insert_key(parent, parent_slot,
-                                             BTRFS_MOD_LOG_KEY_REPLACE);
+               ret = btrfs_tree_mod_log_insert_key(parent, parent_slot,
+                                                   BTRFS_MOD_LOG_KEY_REPLACE);
+               if (ret) {
+                       btrfs_tree_unlock(cow);
+                       free_extent_buffer(cow);
+                       btrfs_abort_transaction(trans, ret);
+                       return ret;
+               }
                btrfs_set_node_blockptr(parent, parent_slot,
                                        cow->start);
                btrfs_set_node_ptr_generation(parent, parent_slot,
@@ -1028,8 +1044,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
                child = btrfs_read_node_slot(mid, 0);
                if (IS_ERR(child)) {
                        ret = PTR_ERR(child);
-                       btrfs_handle_fs_error(fs_info, ret, NULL);
-                       goto enospc;
+                       goto out;
                }
 
                btrfs_tree_lock(child);
@@ -1038,11 +1053,16 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
                if (ret) {
                        btrfs_tree_unlock(child);
                        free_extent_buffer(child);
-                       goto enospc;
+                       goto out;
                }
 
                ret = btrfs_tree_mod_log_insert_root(root->node, child, true);
-               BUG_ON(ret < 0);
+               if (ret < 0) {
+                       btrfs_tree_unlock(child);
+                       free_extent_buffer(child);
+                       btrfs_abort_transaction(trans, ret);
+                       goto out;
+               }
                rcu_assign_pointer(root->node, child);
 
                add_root_to_dirty_list(root);
@@ -1070,7 +1090,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
                if (IS_ERR(left)) {
                        ret = PTR_ERR(left);
                        left = NULL;
-                       goto enospc;
+                       goto out;
                }
 
                __btrfs_tree_lock(left, BTRFS_NESTING_LEFT);
@@ -1079,7 +1099,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
                                       BTRFS_NESTING_LEFT_COW);
                if (wret) {
                        ret = wret;
-                       goto enospc;
+                       goto out;
                }
        }
 
@@ -1088,7 +1108,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
                if (IS_ERR(right)) {
                        ret = PTR_ERR(right);
                        right = NULL;
-                       goto enospc;
+                       goto out;
                }
 
                __btrfs_tree_lock(right, BTRFS_NESTING_RIGHT);
@@ -1097,7 +1117,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
                                       BTRFS_NESTING_RIGHT_COW);
                if (wret) {
                        ret = wret;
-                       goto enospc;
+                       goto out;
                }
        }
 
@@ -1119,7 +1139,12 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
                if (btrfs_header_nritems(right) == 0) {
                        btrfs_clear_buffer_dirty(trans, right);
                        btrfs_tree_unlock(right);
-                       del_ptr(root, path, level + 1, pslot + 1);
+                       ret = btrfs_del_ptr(trans, root, path, level + 1, pslot + 1);
+                       if (ret < 0) {
+                               free_extent_buffer_stale(right);
+                               right = NULL;
+                               goto out;
+                       }
                        root_sub_used(root, right->len);
                        btrfs_free_tree_block(trans, btrfs_root_id(root), right,
                                              0, 1);
@@ -1130,7 +1155,10 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
                        btrfs_node_key(right, &right_key, 0);
                        ret = btrfs_tree_mod_log_insert_key(parent, pslot + 1,
                                        BTRFS_MOD_LOG_KEY_REPLACE);
-                       BUG_ON(ret < 0);
+                       if (ret < 0) {
+                               btrfs_abort_transaction(trans, ret);
+                               goto out;
+                       }
                        btrfs_set_node_key(parent, &right_key, pslot + 1);
                        btrfs_mark_buffer_dirty(parent);
                }
@@ -1145,15 +1173,19 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
                 * otherwise we would have pulled some pointers from the
                 * right
                 */
-               if (!left) {
-                       ret = -EROFS;
-                       btrfs_handle_fs_error(fs_info, ret, NULL);
-                       goto enospc;
+               if (unlikely(!left)) {
+                       btrfs_crit(fs_info,
+"missing left child when middle child only has 1 item, parent bytenr %llu level %d mid bytenr %llu root %llu",
+                                  parent->start, btrfs_header_level(parent),
+                                  mid->start, btrfs_root_id(root));
+                       ret = -EUCLEAN;
+                       btrfs_abort_transaction(trans, ret);
+                       goto out;
                }
                wret = balance_node_right(trans, mid, left);
                if (wret < 0) {
                        ret = wret;
-                       goto enospc;
+                       goto out;
                }
                if (wret == 1) {
                        wret = push_node_left(trans, left, mid, 1);
@@ -1165,7 +1197,12 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
        if (btrfs_header_nritems(mid) == 0) {
                btrfs_clear_buffer_dirty(trans, mid);
                btrfs_tree_unlock(mid);
-               del_ptr(root, path, level + 1, pslot);
+               ret = btrfs_del_ptr(trans, root, path, level + 1, pslot);
+               if (ret < 0) {
+                       free_extent_buffer_stale(mid);
+                       mid = NULL;
+                       goto out;
+               }
                root_sub_used(root, mid->len);
                btrfs_free_tree_block(trans, btrfs_root_id(root), mid, 0, 1);
                free_extent_buffer_stale(mid);
@@ -1176,7 +1213,10 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
                btrfs_node_key(mid, &mid_key, 0);
                ret = btrfs_tree_mod_log_insert_key(parent, pslot,
                                                    BTRFS_MOD_LOG_KEY_REPLACE);
-               BUG_ON(ret < 0);
+               if (ret < 0) {
+                       btrfs_abort_transaction(trans, ret);
+                       goto out;
+               }
                btrfs_set_node_key(parent, &mid_key, pslot);
                btrfs_mark_buffer_dirty(parent);
        }
@@ -1202,7 +1242,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
        if (orig_ptr !=
            btrfs_node_blockptr(path->nodes[level], path->slots[level]))
                BUG();
-enospc:
+out:
        if (right) {
                btrfs_tree_unlock(right);
                free_extent_buffer(right);
@@ -1278,7 +1318,12 @@ static noinline int push_nodes_for_insert(struct btrfs_trans_handle *trans,
                        btrfs_node_key(mid, &disk_key, 0);
                        ret = btrfs_tree_mod_log_insert_key(parent, pslot,
                                        BTRFS_MOD_LOG_KEY_REPLACE);
-                       BUG_ON(ret < 0);
+                       if (ret < 0) {
+                               btrfs_tree_unlock(left);
+                               free_extent_buffer(left);
+                               btrfs_abort_transaction(trans, ret);
+                               return ret;
+                       }
                        btrfs_set_node_key(parent, &disk_key, pslot);
                        btrfs_mark_buffer_dirty(parent);
                        if (btrfs_header_nritems(left) > orig_slot) {
@@ -1333,7 +1378,12 @@ static noinline int push_nodes_for_insert(struct btrfs_trans_handle *trans,
                        btrfs_node_key(right, &disk_key, 0);
                        ret = btrfs_tree_mod_log_insert_key(parent, pslot + 1,
                                        BTRFS_MOD_LOG_KEY_REPLACE);
-                       BUG_ON(ret < 0);
+                       if (ret < 0) {
+                               btrfs_tree_unlock(right);
+                               free_extent_buffer(right);
+                               btrfs_abort_transaction(trans, ret);
+                               return ret;
+                       }
                        btrfs_set_node_key(parent, &disk_key, pslot + 1);
                        btrfs_mark_buffer_dirty(parent);
 
@@ -2379,6 +2429,87 @@ done:
 }
 
 /*
+ * Search the tree again to find a leaf with smaller keys.
+ * Returns 0 if it found something.
+ * Returns 1 if there are no smaller keys.
+ * Returns < 0 on error.
+ *
+ * This may release the path, and so you may lose any locks held at the
+ * time you call it.
+ */
+static int btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path)
+{
+       struct btrfs_key key;
+       struct btrfs_key orig_key;
+       struct btrfs_disk_key found_key;
+       int ret;
+
+       btrfs_item_key_to_cpu(path->nodes[0], &key, 0);
+       orig_key = key;
+
+       if (key.offset > 0) {
+               key.offset--;
+       } else if (key.type > 0) {
+               key.type--;
+               key.offset = (u64)-1;
+       } else if (key.objectid > 0) {
+               key.objectid--;
+               key.type = (u8)-1;
+               key.offset = (u64)-1;
+       } else {
+               return 1;
+       }
+
+       btrfs_release_path(path);
+       ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+       if (ret <= 0)
+               return ret;
+
+       /*
+        * Previous key not found. Even if we were at slot 0 of the leaf we had
+        * before releasing the path and calling btrfs_search_slot(), we now may
+        * be in a slot pointing to the same original key - this can happen if
+        * after we released the path, one or more items were moved from a
+        * sibling leaf into the front of the leaf we had due to an insertion
+        * (see push_leaf_right()).
+        * If we hit this case and our slot is > 0, just decrement the slot
+        * so that the caller does not process the same key again, which may or
+        * may not break the caller, depending on its logic.
+        */
+       if (path->slots[0] < btrfs_header_nritems(path->nodes[0])) {
+               btrfs_item_key(path->nodes[0], &found_key, path->slots[0]);
+               ret = comp_keys(&found_key, &orig_key);
+               if (ret == 0) {
+                       if (path->slots[0] > 0) {
+                               path->slots[0]--;
+                               return 0;
+                       }
+                       /*
+                        * At slot 0, same key as before, it means orig_key is
+                        * the lowest, leftmost, key in the tree. We're done.
+                        */
+                       return 1;
+               }
+       }
+
+       btrfs_item_key(path->nodes[0], &found_key, 0);
+       ret = comp_keys(&found_key, &key);
+       /*
+        * We might have had an item with the previous key in the tree right
+        * before we released our path. And after we released our path, that
+        * item might have been pushed to the first slot (0) of the leaf we
+        * were holding due to a tree balance. Alternatively, an item with the
+        * previous key can exist as the only element of a leaf (big fat item).
+        * Therefore account for these 2 cases, so that our callers (like
+        * btrfs_previous_item) don't miss an existing item with a key matching
+        * the previous key we computed above.
+        */
+       if (ret <= 0)
+               return 0;
+       return 1;
+}
+
+/*
  * helper to use instead of search slot if no exact match is needed but
  * instead the next or previous item should be returned.
  * When find_higher is true, the next higher item is returned, the next lower
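
The decrement step at the top of the new btrfs_prev_leaf() above treats (objectid, type, offset) as a single composite value: decrement the lowest non-zero component and saturate everything below it, which yields the largest key strictly smaller than the original, or signals that no smaller key exists. A standalone sketch of just that step (the helper name is ours, not the kernel's; u64/u8 as in the kernel):

        /*
         * Sketch: compute the immediate predecessor of a btrfs key.
         * Returns 1 if the key is already (0, 0, 0), i.e. no smaller key
         * exists, and 0 otherwise.
         */
        static int key_predecessor(struct btrfs_key *key)
        {
                if (key->offset > 0) {
                        key->offset--;                  /* (o, t, off - 1) */
                } else if (key->type > 0) {
                        key->type--;
                        key->offset = (u64)-1;          /* (o, t - 1, MAX) */
                } else if (key->objectid > 0) {
                        key->objectid--;
                        key->type = (u8)-1;
                        key->offset = (u64)-1;          /* (o - 1, MAX, MAX) */
                } else {
                        return 1;
                }
                return 0;
        }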
@@ -2552,6 +2683,7 @@ void btrfs_set_item_key_safe(struct btrfs_fs_info *fs_info,
        if (slot > 0) {
                btrfs_item_key(eb, &disk_key, slot - 1);
                if (unlikely(comp_keys(&disk_key, new_key) >= 0)) {
+                       btrfs_print_leaf(eb);
                        btrfs_crit(fs_info,
                "slot %u key (%llu %u %llu) new key (%llu %u %llu)",
                                   slot, btrfs_disk_key_objectid(&disk_key),
@@ -2559,13 +2691,13 @@ void btrfs_set_item_key_safe(struct btrfs_fs_info *fs_info,
                                   btrfs_disk_key_offset(&disk_key),
                                   new_key->objectid, new_key->type,
                                   new_key->offset);
-                       btrfs_print_leaf(eb);
                        BUG();
                }
        }
        if (slot < btrfs_header_nritems(eb) - 1) {
                btrfs_item_key(eb, &disk_key, slot + 1);
                if (unlikely(comp_keys(&disk_key, new_key) <= 0)) {
+                       btrfs_print_leaf(eb);
                        btrfs_crit(fs_info,
                "slot %u key (%llu %u %llu) new key (%llu %u %llu)",
                                   slot, btrfs_disk_key_objectid(&disk_key),
@@ -2573,7 +2705,6 @@ void btrfs_set_item_key_safe(struct btrfs_fs_info *fs_info,
                                   btrfs_disk_key_offset(&disk_key),
                                   new_key->objectid, new_key->type,
                                   new_key->offset);
-                       btrfs_print_leaf(eb);
                        BUG();
                }
        }
@@ -2626,7 +2757,11 @@ static bool check_sibling_keys(struct extent_buffer *left,
                btrfs_item_key_to_cpu(right, &right_first, 0);
        }
 
-       if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
+       if (unlikely(btrfs_comp_cpu_keys(&left_last, &right_first) >= 0)) {
+               btrfs_crit(left->fs_info, "left extent buffer:");
+               btrfs_print_tree(left, false);
+               btrfs_crit(left->fs_info, "right extent buffer:");
+               btrfs_print_tree(right, false);
                btrfs_crit(left->fs_info,
 "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)",
                           left_last.objectid, left_last.type,
@@ -2699,8 +2834,8 @@ static int push_node_left(struct btrfs_trans_handle *trans,
 
        if (push_items < src_nritems) {
                /*
-                * Don't call btrfs_tree_mod_log_insert_move() here, key removal
-                * was already fully logged by btrfs_tree_mod_log_eb_copy() above.
+                * btrfs_tree_mod_log_eb_copy handles logging the move, so we
+                * don't need to do an explicit tree mod log operation for it.
                 */
                memmove_extent_buffer(src, btrfs_node_key_ptr_offset(src, 0),
                                      btrfs_node_key_ptr_offset(src, push_items),
@@ -2761,8 +2896,11 @@ static int balance_node_right(struct btrfs_trans_handle *trans,
                btrfs_abort_transaction(trans, ret);
                return ret;
        }
-       ret = btrfs_tree_mod_log_insert_move(dst, push_items, 0, dst_nritems);
-       BUG_ON(ret < 0);
+
+       /*
+        * btrfs_tree_mod_log_eb_copy handles logging the move, so we don't
+        * need to do an explicit tree mod log operation for it.
+        */
        memmove_extent_buffer(dst, btrfs_node_key_ptr_offset(dst, push_items),
                                      btrfs_node_key_ptr_offset(dst, 0),
                                      (dst_nritems) *
@@ -2836,7 +2974,12 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans,
 
        old = root->node;
        ret = btrfs_tree_mod_log_insert_root(root->node, c, false);
-       BUG_ON(ret < 0);
+       if (ret < 0) {
+               btrfs_free_tree_block(trans, btrfs_root_id(root), c, 0, 1);
+               btrfs_tree_unlock(c);
+               free_extent_buffer(c);
+               return ret;
+       }
        rcu_assign_pointer(root->node, c);
 
        /* the super has an extra ref to root->node */
@@ -2857,10 +3000,10 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans,
  * slot and level indicate where you want the key to go, and
  * blocknr is the block the key points to.
  */
-static void insert_ptr(struct btrfs_trans_handle *trans,
-                      struct btrfs_path *path,
-                      struct btrfs_disk_key *key, u64 bytenr,
-                      int slot, int level)
+static int insert_ptr(struct btrfs_trans_handle *trans,
+                     struct btrfs_path *path,
+                     struct btrfs_disk_key *key, u64 bytenr,
+                     int slot, int level)
 {
        struct extent_buffer *lower;
        int nritems;
@@ -2876,7 +3019,10 @@ static void insert_ptr(struct btrfs_trans_handle *trans,
                if (level) {
                        ret = btrfs_tree_mod_log_insert_move(lower, slot + 1,
                                        slot, nritems - slot);
-                       BUG_ON(ret < 0);
+                       if (ret < 0) {
+                               btrfs_abort_transaction(trans, ret);
+                               return ret;
+                       }
                }
                memmove_extent_buffer(lower,
                              btrfs_node_key_ptr_offset(lower, slot + 1),
@@ -2886,7 +3032,10 @@ static void insert_ptr(struct btrfs_trans_handle *trans,
        if (level) {
                ret = btrfs_tree_mod_log_insert_key(lower, slot,
                                                    BTRFS_MOD_LOG_KEY_ADD);
-               BUG_ON(ret < 0);
+               if (ret < 0) {
+                       btrfs_abort_transaction(trans, ret);
+                       return ret;
+               }
        }
        btrfs_set_node_key(lower, key, slot);
        btrfs_set_node_blockptr(lower, slot, bytenr);
@@ -2894,6 +3043,8 @@ static void insert_ptr(struct btrfs_trans_handle *trans,
        btrfs_set_node_ptr_generation(lower, slot, trans->transid);
        btrfs_set_header_nritems(lower, nritems + 1);
        btrfs_mark_buffer_dirty(lower);
+
+       return 0;
 }
 
 /*
@@ -2958,6 +3109,8 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
 
        ret = btrfs_tree_mod_log_eb_copy(split, c, 0, mid, c_nritems - mid);
        if (ret) {
+               btrfs_tree_unlock(split);
+               free_extent_buffer(split);
                btrfs_abort_transaction(trans, ret);
                return ret;
        }
@@ -2971,8 +3124,13 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
        btrfs_mark_buffer_dirty(c);
        btrfs_mark_buffer_dirty(split);
 
-       insert_ptr(trans, path, &disk_key, split->start,
-                  path->slots[level + 1] + 1, level + 1);
+       ret = insert_ptr(trans, path, &disk_key, split->start,
+                        path->slots[level + 1] + 1, level + 1);
+       if (ret < 0) {
+               btrfs_tree_unlock(split);
+               free_extent_buffer(split);
+               return ret;
+       }
 
        if (path->slots[level] >= mid) {
                path->slots[level] -= mid;
@@ -2992,7 +3150,7 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
  * and nr indicate which items in the leaf to check.  This totals up the
  * space used both by the item structs and the item data
  */
-static int leaf_space_used(struct extent_buffer *l, int start, int nr)
+static int leaf_space_used(const struct extent_buffer *l, int start, int nr)
 {
        int data_len;
        int nritems = btrfs_header_nritems(l);
@@ -3012,7 +3170,7 @@ static int leaf_space_used(struct extent_buffer *l, int start, int nr)
  * the start of the leaf data.  IOW, how much room
  * the leaf has left for both items and data
  */
-noinline int btrfs_leaf_free_space(struct extent_buffer *leaf)
+int btrfs_leaf_free_space(const struct extent_buffer *leaf)
 {
        struct btrfs_fs_info *fs_info = leaf->fs_info;
        int nritems = btrfs_header_nritems(leaf);
@@ -3215,6 +3373,7 @@ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root
 
        if (check_sibling_keys(left, right)) {
                ret = -EUCLEAN;
+               btrfs_abort_transaction(trans, ret);
                btrfs_tree_unlock(right);
                free_extent_buffer(right);
                return ret;
@@ -3433,6 +3592,7 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root
 
        if (check_sibling_keys(left, right)) {
                ret = -EUCLEAN;
+               btrfs_abort_transaction(trans, ret);
                goto out;
        }
        return __push_leaf_left(trans, path, min_data_size, empty, left,
@@ -3447,16 +3607,17 @@ out:
  * split the path's leaf in two, making sure there is at least data_size
  * available for the resulting leaf level of the path.
  */
-static noinline void copy_for_split(struct btrfs_trans_handle *trans,
-                                   struct btrfs_path *path,
-                                   struct extent_buffer *l,
-                                   struct extent_buffer *right,
-                                   int slot, int mid, int nritems)
+static noinline int copy_for_split(struct btrfs_trans_handle *trans,
+                                  struct btrfs_path *path,
+                                  struct extent_buffer *l,
+                                  struct extent_buffer *right,
+                                  int slot, int mid, int nritems)
 {
        struct btrfs_fs_info *fs_info = trans->fs_info;
        int data_copy_size;
        int rt_data_off;
        int i;
+       int ret;
        struct btrfs_disk_key disk_key;
        struct btrfs_map_token token;
 
@@ -3481,7 +3642,9 @@ static noinline void copy_for_split(struct btrfs_trans_handle *trans,
 
        btrfs_set_header_nritems(l, mid);
        btrfs_item_key(right, &disk_key, 0);
-       insert_ptr(trans, path, &disk_key, right->start, path->slots[1] + 1, 1);
+       ret = insert_ptr(trans, path, &disk_key, right->start, path->slots[1] + 1, 1);
+       if (ret < 0)
+               return ret;
 
        btrfs_mark_buffer_dirty(right);
        btrfs_mark_buffer_dirty(l);
@@ -3499,6 +3662,8 @@ static noinline void copy_for_split(struct btrfs_trans_handle *trans,
        }
 
        BUG_ON(path->slots[0] < 0);
+
+       return 0;
 }
 
 /*
@@ -3697,8 +3862,13 @@ again:
        if (split == 0) {
                if (mid <= slot) {
                        btrfs_set_header_nritems(right, 0);
-                       insert_ptr(trans, path, &disk_key,
-                                  right->start, path->slots[1] + 1, 1);
+                       ret = insert_ptr(trans, path, &disk_key,
+                                        right->start, path->slots[1] + 1, 1);
+                       if (ret < 0) {
+                               btrfs_tree_unlock(right);
+                               free_extent_buffer(right);
+                               return ret;
+                       }
                        btrfs_tree_unlock(path->nodes[0]);
                        free_extent_buffer(path->nodes[0]);
                        path->nodes[0] = right;
@@ -3706,8 +3876,13 @@ again:
                        path->slots[1] += 1;
                } else {
                        btrfs_set_header_nritems(right, 0);
-                       insert_ptr(trans, path, &disk_key,
-                                  right->start, path->slots[1], 1);
+                       ret = insert_ptr(trans, path, &disk_key,
+                                        right->start, path->slots[1], 1);
+                       if (ret < 0) {
+                               btrfs_tree_unlock(right);
+                               free_extent_buffer(right);
+                               return ret;
+                       }
                        btrfs_tree_unlock(path->nodes[0]);
                        free_extent_buffer(path->nodes[0]);
                        path->nodes[0] = right;
@@ -3723,7 +3898,12 @@ again:
                return ret;
        }
 
-       copy_for_split(trans, path, l, right, slot, mid, nritems);
+       ret = copy_for_split(trans, path, l, right, slot, mid, nritems);
+       if (ret < 0) {
+               btrfs_tree_unlock(right);
+               free_extent_buffer(right);
+               return ret;
+       }
 
        if (split == 2) {
                BUG_ON(num_doubles != 0);
@@ -3820,7 +4000,12 @@ static noinline int split_item(struct btrfs_path *path,
        struct btrfs_disk_key disk_key;
 
        leaf = path->nodes[0];
-       BUG_ON(btrfs_leaf_free_space(leaf) < sizeof(struct btrfs_item));
+       /*
+        * Shouldn't happen because the caller must have previously called
+        * setup_leaf_for_split() to make room for the new item in the leaf.
+        */
+       if (WARN_ON(btrfs_leaf_free_space(leaf) < sizeof(struct btrfs_item)))
+               return -ENOSPC;
 
        orig_slot = path->slots[0];
        orig_offset = btrfs_item_offset(leaf, path->slots[0]);
@@ -4267,9 +4452,11 @@ int btrfs_duplicate_item(struct btrfs_trans_handle *trans,
  *
  * the tree should have been previously balanced so the deletion does not
  * empty a node.
+ *
+ * This is exported for use inside btrfs-progs, don't un-export it.
  */
-static void del_ptr(struct btrfs_root *root, struct btrfs_path *path,
-                   int level, int slot)
+int btrfs_del_ptr(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+                 struct btrfs_path *path, int level, int slot)
 {
        struct extent_buffer *parent = path->nodes[level];
        u32 nritems;
@@ -4280,7 +4467,10 @@ static void del_ptr(struct btrfs_root *root, struct btrfs_path *path,
                if (level) {
                        ret = btrfs_tree_mod_log_insert_move(parent, slot,
                                        slot + 1, nritems - slot - 1);
-                       BUG_ON(ret < 0);
+                       if (ret < 0) {
+                               btrfs_abort_transaction(trans, ret);
+                               return ret;
+                       }
                }
                memmove_extent_buffer(parent,
                              btrfs_node_key_ptr_offset(parent, slot),
@@ -4290,7 +4480,10 @@ static void del_ptr(struct btrfs_root *root, struct btrfs_path *path,
        } else if (level) {
                ret = btrfs_tree_mod_log_insert_key(parent, slot,
                                                    BTRFS_MOD_LOG_KEY_REMOVE);
-               BUG_ON(ret < 0);
+               if (ret < 0) {
+                       btrfs_abort_transaction(trans, ret);
+                       return ret;
+               }
        }
 
        nritems--;
@@ -4306,6 +4499,7 @@ static void del_ptr(struct btrfs_root *root, struct btrfs_path *path,
                fixup_low_keys(path, &disk_key, level + 1);
        }
        btrfs_mark_buffer_dirty(parent);
+       return 0;
 }
 
 /*
@@ -4318,13 +4512,17 @@ static void del_ptr(struct btrfs_root *root, struct btrfs_path *path,
  * The path must have already been setup for deleting the leaf, including
  * all the proper balancing.  path->nodes[1] must be locked.
  */
-static noinline void btrfs_del_leaf(struct btrfs_trans_handle *trans,
-                                   struct btrfs_root *root,
-                                   struct btrfs_path *path,
-                                   struct extent_buffer *leaf)
+static noinline int btrfs_del_leaf(struct btrfs_trans_handle *trans,
+                                  struct btrfs_root *root,
+                                  struct btrfs_path *path,
+                                  struct extent_buffer *leaf)
 {
+       int ret;
+
        WARN_ON(btrfs_header_generation(leaf) != trans->transid);
-       del_ptr(root, path, 1, path->slots[1]);
+       ret = btrfs_del_ptr(trans, root, path, 1, path->slots[1]);
+       if (ret < 0)
+               return ret;
 
        /*
         * btrfs_free_extent is expensive, we want to make sure we
@@ -4337,6 +4535,7 @@ static noinline void btrfs_del_leaf(struct btrfs_trans_handle *trans,
        atomic_inc(&leaf->refs);
        btrfs_free_tree_block(trans, btrfs_root_id(root), leaf, 0, 1);
        free_extent_buffer_stale(leaf);
+       return 0;
 }
 /*
  * delete the item at the leaf level in path.  If that empties
@@ -4386,7 +4585,9 @@ int btrfs_del_items(struct btrfs_trans_handle *trans, struct btrfs_root *root,
                        btrfs_set_header_level(leaf, 0);
                } else {
                        btrfs_clear_buffer_dirty(trans, leaf);
-                       btrfs_del_leaf(trans, root, path, leaf);
+                       ret = btrfs_del_leaf(trans, root, path, leaf);
+                       if (ret < 0)
+                               return ret;
                }
        } else {
                int used = leaf_space_used(leaf, 0, nritems);
@@ -4410,7 +4611,7 @@ int btrfs_del_items(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 
                        /* push_leaf_left fixes the path.
                         * make sure the path still points to our leaf
-                        * for possible call to del_ptr below
+                        * for possible call to btrfs_del_ptr below
                         */
                        slot = path->slots[1];
                        atomic_inc(&leaf->refs);
@@ -4447,7 +4648,9 @@ int btrfs_del_items(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 
                        if (btrfs_header_nritems(leaf) == 0) {
                                path->slots[1] = slot;
-                               btrfs_del_leaf(trans, root, path, leaf);
+                               ret = btrfs_del_leaf(trans, root, path, leaf);
+                               if (ret < 0)
+                                       return ret;
                                free_extent_buffer(leaf);
                                ret = 0;
                        } else {
@@ -4468,56 +4671,6 @@ int btrfs_del_items(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 }
 
 /*
- * search the tree again to find a leaf with lesser keys
- * returns 0 if it found something or 1 if there are no lesser leaves.
- * returns < 0 on io errors.
- *
- * This may release the path, and so you may lose any locks held at the
- * time you call it.
- */
-int btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path)
-{
-       struct btrfs_key key;
-       struct btrfs_disk_key found_key;
-       int ret;
-
-       btrfs_item_key_to_cpu(path->nodes[0], &key, 0);
-
-       if (key.offset > 0) {
-               key.offset--;
-       } else if (key.type > 0) {
-               key.type--;
-               key.offset = (u64)-1;
-       } else if (key.objectid > 0) {
-               key.objectid--;
-               key.type = (u8)-1;
-               key.offset = (u64)-1;
-       } else {
-               return 1;
-       }
-
-       btrfs_release_path(path);
-       ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
-       if (ret < 0)
-               return ret;
-       btrfs_item_key(path->nodes[0], &found_key, 0);
-       ret = comp_keys(&found_key, &key);
-       /*
-        * We might have had an item with the previous key in the tree right
-        * before we released our path. And after we released our path, that
-        * item might have been pushed to the first slot (0) of the leaf we
-        * were holding due to a tree balance. Alternatively, an item with the
-        * previous key can exist as the only element of a leaf (big fat item).
-        * Therefore account for these 2 cases, so that our callers (like
-        * btrfs_previous_item) don't miss an existing item with a key matching
-        * the previous key we computed above.
-        */
-       if (ret <= 0)
-               return 0;
-       return 1;
-}
-
-/*
  * A helper function to walk down the tree starting at min_key, and looking
  * for nodes or leaves that have a minimum transaction id.
  * This is used by the btree defrag code, and tree logging
index 4c1986c..f2d2b31 100644 (file)
@@ -541,6 +541,8 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
                      struct extent_buffer **cow_ret, u64 new_root_objectid);
 int btrfs_block_can_be_shared(struct btrfs_root *root,
                              struct extent_buffer *buf);
+int btrfs_del_ptr(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+                 struct btrfs_path *path, int level, int slot);
 void btrfs_extend_item(struct btrfs_path *path, u32 data_size);
 void btrfs_truncate_item(struct btrfs_path *path, u32 new_size, int from_end);
 int btrfs_split_item(struct btrfs_trans_handle *trans,
@@ -633,7 +635,6 @@ static inline int btrfs_insert_empty_item(struct btrfs_trans_handle *trans,
        return btrfs_insert_empty_items(trans, root, path, &batch);
 }
 
-int btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path);
 int btrfs_next_old_leaf(struct btrfs_root *root, struct btrfs_path *path,
                        u64 time_seq);
 
@@ -686,7 +687,7 @@ static inline int btrfs_next_item(struct btrfs_root *root, struct btrfs_path *p)
 {
        return btrfs_next_old_item(root, p, 0);
 }
-int btrfs_leaf_free_space(struct extent_buffer *leaf);
+int btrfs_leaf_free_space(const struct extent_buffer *leaf);
 
 static inline int is_fstree(u64 rootid)
 {
@@ -702,6 +703,7 @@ static inline bool btrfs_is_data_reloc_root(const struct btrfs_root *root)
        return root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID;
 }
 
+u16 btrfs_csum_type_size(u16 type);
 int btrfs_super_csum_size(const struct btrfs_super_block *s);
 const char *btrfs_super_csum_name(u16 csum_type);
 const char *btrfs_super_csum_driver(u16 csum_type);
index 8065341..f2ff4cb 100644 (file)
@@ -1040,7 +1040,8 @@ static int defrag_one_locked_target(struct btrfs_inode *inode,
        clear_extent_bit(&inode->io_tree, start, start + len - 1,
                         EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING |
                         EXTENT_DEFRAG, cached_state);
-       set_extent_defrag(&inode->io_tree, start, start + len - 1, cached_state);
+       set_extent_bit(&inode->io_tree, start, start + len - 1,
+                      EXTENT_DELALLOC | EXTENT_DEFRAG, cached_state);
 
        /* Update the page status */
        for (i = start_index - first_index; i <= last_index - first_index; i++) {
index 0b32432..6a13cf0 100644 (file)
@@ -407,7 +407,6 @@ static inline void drop_delayed_ref(struct btrfs_delayed_ref_root *delayed_refs,
        RB_CLEAR_NODE(&ref->ref_node);
        if (!list_empty(&ref->add_list))
                list_del(&ref->add_list);
-       ref->in_tree = 0;
        btrfs_put_delayed_ref(ref);
        atomic_dec(&delayed_refs->num_entries);
 }
@@ -507,6 +506,7 @@ struct btrfs_delayed_ref_head *btrfs_select_ref_head(
 {
        struct btrfs_delayed_ref_head *head;
 
+       lockdep_assert_held(&delayed_refs->lock);
 again:
        head = find_ref_head(delayed_refs, delayed_refs->run_delayed_start,
                             true);
@@ -531,7 +531,7 @@ again:
                                href_node);
        }
 
-       head->processing = 1;
+       head->processing = true;
        WARN_ON(delayed_refs->num_heads_ready == 0);
        delayed_refs->num_heads_ready--;
        delayed_refs->run_delayed_start = head->bytenr +
@@ -549,31 +549,35 @@ void btrfs_delete_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
        RB_CLEAR_NODE(&head->href_node);
        atomic_dec(&delayed_refs->num_entries);
        delayed_refs->num_heads--;
-       if (head->processing == 0)
+       if (!head->processing)
                delayed_refs->num_heads_ready--;
 }
 
 /*
  * Helper to insert the ref_node to the tail or merge with tail.
  *
- * Return 0 for insert.
- * Return >0 for merge.
+ * Return false if the ref was inserted.
+ * Return true if the ref was merged into an existing one (and therefore can be
+ * freed by the caller).
  */
-static int insert_delayed_ref(struct btrfs_delayed_ref_root *root,
-                             struct btrfs_delayed_ref_head *href,
-                             struct btrfs_delayed_ref_node *ref)
+static bool insert_delayed_ref(struct btrfs_delayed_ref_root *root,
+                              struct btrfs_delayed_ref_head *href,
+                              struct btrfs_delayed_ref_node *ref)
 {
        struct btrfs_delayed_ref_node *exist;
        int mod;
-       int ret = 0;
 
        spin_lock(&href->lock);
        exist = tree_insert(&href->ref_tree, ref);
-       if (!exist)
-               goto inserted;
+       if (!exist) {
+               if (ref->action == BTRFS_ADD_DELAYED_REF)
+                       list_add_tail(&ref->add_list, &href->ref_add_list);
+               atomic_inc(&root->num_entries);
+               spin_unlock(&href->lock);
+               return false;
+       }
 
        /* Now we are sure we can merge */
-       ret = 1;
        if (exist->action == ref->action) {
                mod = ref->ref_mod;
        } else {
@@ -600,13 +604,7 @@ static int insert_delayed_ref(struct btrfs_delayed_ref_root *root,
        if (exist->ref_mod == 0)
                drop_delayed_ref(root, href, exist);
        spin_unlock(&href->lock);
-       return ret;
-inserted:
-       if (ref->action == BTRFS_ADD_DELAYED_REF)
-               list_add_tail(&ref->add_list, &href->ref_add_list);
-       atomic_inc(&root->num_entries);
-       spin_unlock(&href->lock);
-       return ret;
+       return true;
 }
 
 /*
@@ -699,34 +697,38 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref,
                                  bool is_system)
 {
        int count_mod = 1;
-       int must_insert_reserved = 0;
+       bool must_insert_reserved = false;
 
        /* If reserved is provided, it must be a data extent. */
        BUG_ON(!is_data && reserved);
 
-       /*
-        * The head node stores the sum of all the mods, so dropping a ref
-        * should drop the sum in the head node by one.
-        */
-       if (action == BTRFS_UPDATE_DELAYED_HEAD)
+       switch (action) {
+       case BTRFS_UPDATE_DELAYED_HEAD:
                count_mod = 0;
-       else if (action == BTRFS_DROP_DELAYED_REF)
+               break;
+       case BTRFS_DROP_DELAYED_REF:
+               /*
+                * The head node stores the sum of all the mods, so dropping a ref
+                * should drop the sum in the head node by one.
+                */
                count_mod = -1;
-
-       /*
-        * BTRFS_ADD_DELAYED_EXTENT means that we need to update the reserved
-        * accounting when the extent is finally added, or if a later
-        * modification deletes the delayed ref without ever inserting the
-        * extent into the extent allocation tree.  ref->must_insert_reserved
-        * is the flag used to record that accounting mods are required.
-        *
-        * Once we record must_insert_reserved, switch the action to
-        * BTRFS_ADD_DELAYED_REF because other special casing is not required.
-        */
-       if (action == BTRFS_ADD_DELAYED_EXTENT)
-               must_insert_reserved = 1;
-       else
-               must_insert_reserved = 0;
+               break;
+       case BTRFS_ADD_DELAYED_EXTENT:
+               /*
+                * BTRFS_ADD_DELAYED_EXTENT means that we need to update the
+                * reserved accounting when the extent is finally added, or if a
+                * later modification deletes the delayed ref without ever
+                * inserting the extent into the extent allocation tree.
+                * ref->must_insert_reserved is the flag used to record that
+                * accounting mods are required.
+                *
+                * Once we record must_insert_reserved, switch the action to
+                * BTRFS_ADD_DELAYED_REF because other special casing is not
+                * required.
+                */
+               must_insert_reserved = true;
+               break;
+       }
 
        refcount_set(&head_ref->refs, 1);
        head_ref->bytenr = bytenr;
@@ -738,7 +740,7 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref,
        head_ref->ref_tree = RB_ROOT_CACHED;
        INIT_LIST_HEAD(&head_ref->ref_add_list);
        RB_CLEAR_NODE(&head_ref->href_node);
-       head_ref->processing = 0;
+       head_ref->processing = false;
        head_ref->total_ref_mod = count_mod;
        spin_lock_init(&head_ref->lock);
        mutex_init(&head_ref->mutex);
@@ -763,11 +765,11 @@ static noinline struct btrfs_delayed_ref_head *
 add_delayed_ref_head(struct btrfs_trans_handle *trans,
                     struct btrfs_delayed_ref_head *head_ref,
                     struct btrfs_qgroup_extent_record *qrecord,
-                    int action, int *qrecord_inserted_ret)
+                    int action, bool *qrecord_inserted_ret)
 {
        struct btrfs_delayed_ref_head *existing;
        struct btrfs_delayed_ref_root *delayed_refs;
-       int qrecord_inserted = 0;
+       bool qrecord_inserted = false;
 
        delayed_refs = &trans->transaction->delayed_refs;
 
@@ -777,7 +779,7 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
                                        delayed_refs, qrecord))
                        kfree(qrecord);
                else
-                       qrecord_inserted = 1;
+                       qrecord_inserted = true;
        }
 
        trace_add_delayed_ref_head(trans->fs_info, head_ref, action);
@@ -853,8 +855,6 @@ static void init_delayed_ref_common(struct btrfs_fs_info *fs_info,
        ref->num_bytes = num_bytes;
        ref->ref_mod = 1;
        ref->action = action;
-       ref->is_head = 0;
-       ref->in_tree = 1;
        ref->seq = seq;
        ref->type = ref_type;
        RB_CLEAR_NODE(&ref->ref_node);
@@ -875,11 +875,11 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
        struct btrfs_delayed_ref_head *head_ref;
        struct btrfs_delayed_ref_root *delayed_refs;
        struct btrfs_qgroup_extent_record *record = NULL;
-       int qrecord_inserted;
+       bool qrecord_inserted;
        bool is_system;
+       bool merged;
        int action = generic_ref->action;
        int level = generic_ref->tree_ref.level;
-       int ret;
        u64 bytenr = generic_ref->bytenr;
        u64 num_bytes = generic_ref->len;
        u64 parent = generic_ref->parent;
@@ -935,7 +935,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
        head_ref = add_delayed_ref_head(trans, head_ref, record,
                                        action, &qrecord_inserted);
 
-       ret = insert_delayed_ref(delayed_refs, head_ref, &ref->node);
+       merged = insert_delayed_ref(delayed_refs, head_ref, &ref->node);
        spin_unlock(&delayed_refs->lock);
 
        /*
@@ -947,7 +947,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
        trace_add_delayed_tree_ref(fs_info, &ref->node, ref,
                                   action == BTRFS_ADD_DELAYED_EXTENT ?
                                   BTRFS_ADD_DELAYED_REF : action);
-       if (ret > 0)
+       if (merged)
                kmem_cache_free(btrfs_delayed_tree_ref_cachep, ref);
 
        if (qrecord_inserted)
@@ -968,9 +968,9 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans,
        struct btrfs_delayed_ref_head *head_ref;
        struct btrfs_delayed_ref_root *delayed_refs;
        struct btrfs_qgroup_extent_record *record = NULL;
-       int qrecord_inserted;
+       bool qrecord_inserted;
        int action = generic_ref->action;
-       int ret;
+       bool merged;
        u64 bytenr = generic_ref->bytenr;
        u64 num_bytes = generic_ref->len;
        u64 parent = generic_ref->parent;
@@ -1027,7 +1027,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans,
        head_ref = add_delayed_ref_head(trans, head_ref, record,
                                        action, &qrecord_inserted);
 
-       ret = insert_delayed_ref(delayed_refs, head_ref, &ref->node);
+       merged = insert_delayed_ref(delayed_refs, head_ref, &ref->node);
        spin_unlock(&delayed_refs->lock);
 
        /*
@@ -1039,7 +1039,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans,
        trace_add_delayed_data_ref(trans->fs_info, &ref->node, ref,
                                   action == BTRFS_ADD_DELAYED_EXTENT ?
                                   BTRFS_ADD_DELAYED_REF : action);
-       if (ret > 0)
+       if (merged)
                kmem_cache_free(btrfs_delayed_data_ref_cachep, ref);
 
 
index b54261f..b8e14b0 100644 (file)
@@ -48,9 +48,6 @@ struct btrfs_delayed_ref_node {
 
        unsigned int action:8;
        unsigned int type:8;
-       /* is this node still in the rbtree? */
-       unsigned int is_head:1;
-       unsigned int in_tree:1;
 };
 
 struct btrfs_delayed_extent_op {
@@ -70,20 +67,26 @@ struct btrfs_delayed_extent_op {
 struct btrfs_delayed_ref_head {
        u64 bytenr;
        u64 num_bytes;
-       refcount_t refs;
+       /*
+        * For insertion into struct btrfs_delayed_ref_root::href_root.
+        * Keep it in the same cache line as 'bytenr' for more efficient
+        * searches in the rbtree.
+        */
+       struct rb_node href_node;
        /*
         * the mutex is held while running the refs, and it is also
         * held when checking the sum of reference modifications.
         */
        struct mutex mutex;
 
+       refcount_t refs;
+
+       /* Protects 'ref_tree' and 'ref_add_list'. */
        spinlock_t lock;
        struct rb_root_cached ref_tree;
        /* accumulate add BTRFS_ADD_DELAYED_REF nodes to this ref_add_list. */
        struct list_head ref_add_list;
 
-       struct rb_node href_node;
-
        struct btrfs_delayed_extent_op *extent_op;
 
        /*
@@ -113,10 +116,10 @@ struct btrfs_delayed_ref_head {
         * we need to update the in ram accounting to properly reflect
         * the free has happened.
         */
-       unsigned int must_insert_reserved:1;
-       unsigned int is_data:1;
-       unsigned int is_system:1;
-       unsigned int processing:1;
+       bool must_insert_reserved;
+       bool is_data;
+       bool is_system;
+       bool processing;
 };
 
 struct btrfs_delayed_tree_ref {
@@ -337,7 +340,7 @@ static inline void btrfs_put_delayed_ref(struct btrfs_delayed_ref_node *ref)
 {
        WARN_ON(refcount_read(&ref->refs) == 0);
        if (refcount_dec_and_test(&ref->refs)) {
-               WARN_ON(ref->in_tree);
+               WARN_ON(!RB_EMPTY_NODE(&ref->ref_node));
                switch (ref->type) {
                case BTRFS_TREE_BLOCK_REF_KEY:
                case BTRFS_SHARED_BLOCK_REF_KEY:
index 78696d3..5f10965 100644 (file)
@@ -41,7 +41,7 @@
  *   All new writes will be written to both target and source devices, so even
  *   if replace gets canceled, the source device still contains up-to-date data.
  *
- *   Location:         handle_ops_on_dev_replace() from __btrfs_map_block()
+ *   Location:         handle_ops_on_dev_replace() from btrfs_map_block()
  *   Start:            btrfs_dev_replace_start()
  *   End:              btrfs_dev_replace_finishing()
  *   Content:          Latest data/metadata
@@ -257,8 +257,8 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
                return -EINVAL;
        }
 
-       bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
-                                 fs_info->bdev_holder);
+       bdev = blkdev_get_by_path(device_path, BLK_OPEN_WRITE,
+                                 fs_info->bdev_holder, NULL);
        if (IS_ERR(bdev)) {
                btrfs_err(fs_info, "target device %s is invalid!", device_path);
                return PTR_ERR(bdev);
@@ -315,7 +315,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
        device->bdev = bdev;
        set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
        set_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
-       device->mode = FMODE_EXCL;
+       device->holder = fs_info->bdev_holder;
        device->dev_stats_valid = 1;
        set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
        device->fs_devices = fs_devices;
@@ -334,7 +334,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
        return 0;
 
 error:
-       blkdev_put(bdev, FMODE_EXCL);
+       blkdev_put(bdev, fs_info->bdev_holder);
        return ret;
 }
 
@@ -795,8 +795,8 @@ static int btrfs_set_target_alloc_state(struct btrfs_device *srcdev,
        while (!find_first_extent_bit(&srcdev->alloc_state, start,
                                      &found_start, &found_end,
                                      CHUNK_ALLOCATED, &cached_state)) {
-               ret = set_extent_bits(&tgtdev->alloc_state, found_start,
-                                     found_end, CHUNK_ALLOCATED);
+               ret = set_extent_bit(&tgtdev->alloc_state, found_start,
+                                    found_end, CHUNK_ALLOCATED, NULL);
                if (ret)
                        break;
                start = found_end + 1;
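
The device-replace target is now opened with the BLK_OPEN_* flags instead of FMODE_WRITE | FMODE_EXCL, and the holder given at open time is handed back to blkdev_put(). A minimal sketch of that pairing with hypothetical wrapper names (real callers may need additional BLK_OPEN_* flags):

#include <linux/blkdev.h>

static struct block_device *example_open_target(const char *path, void *holder)
{
        /* Holder and (optional) holder ops are passed at open time. */
        return blkdev_get_by_path(path, BLK_OPEN_WRITE, holder, NULL);
}

static void example_close_target(struct block_device *bdev, void *holder)
{
        /* The same holder is required on put instead of FMODE_EXCL. */
        blkdev_put(bdev, holder);
}
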
index a6d77fe..944a734 100644
@@ -73,6 +73,23 @@ static struct list_head *get_discard_list(struct btrfs_discard_ctl *discard_ctl,
        return &discard_ctl->discard_list[block_group->discard_index];
 }
 
+/*
+ * Determine if async discard should be running.
+ *
+ * @discard_ctl: discard control
+ *
+ * Check if the file system is writeable and BTRFS_FS_DISCARD_RUNNING is set.
+ */
+static bool btrfs_run_discard_work(struct btrfs_discard_ctl *discard_ctl)
+{
+       struct btrfs_fs_info *fs_info = container_of(discard_ctl,
+                                                    struct btrfs_fs_info,
+                                                    discard_ctl);
+
+       return (!(fs_info->sb->s_flags & SB_RDONLY) &&
+               test_bit(BTRFS_FS_DISCARD_RUNNING, &fs_info->flags));
+}
+
 static void __add_to_discard_list(struct btrfs_discard_ctl *discard_ctl,
                                  struct btrfs_block_group *block_group)
 {
@@ -545,23 +562,6 @@ static void btrfs_discard_workfn(struct work_struct *work)
 }
 
 /*
- * Determine if async discard should be running.
- *
- * @discard_ctl: discard control
- *
- * Check if the file system is writeable and BTRFS_FS_DISCARD_RUNNING is set.
- */
-bool btrfs_run_discard_work(struct btrfs_discard_ctl *discard_ctl)
-{
-       struct btrfs_fs_info *fs_info = container_of(discard_ctl,
-                                                    struct btrfs_fs_info,
-                                                    discard_ctl);
-
-       return (!(fs_info->sb->s_flags & SB_RDONLY) &&
-               test_bit(BTRFS_FS_DISCARD_RUNNING, &fs_info->flags));
-}
-
-/*
  * Recalculate the base delay.
  *
  * @discard_ctl: discard control
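
btrfs_run_discard_work() becomes file-local (its declaration is removed from discard.h below) and still derives the owning fs_info from the embedded discard_ctl via container_of(). A generic sketch of that back-pointer idiom with hypothetical types:

#include <linux/container_of.h>
#include <linux/types.h>

struct example_ctl {
        bool running;
};

struct example_fs {
        unsigned long flags;
        struct example_ctl discard_ctl;
};

static struct example_fs *example_fs_of(struct example_ctl *ctl)
{
        /* Walk back from the embedded member to its enclosing structure. */
        return container_of(ctl, struct example_fs, discard_ctl);
}
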
index 57b9202..dddb0f9 100644
@@ -24,7 +24,6 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
                              struct btrfs_block_group *block_group);
 void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl,
                                 bool override);
-bool btrfs_run_discard_work(struct btrfs_discard_ctl *discard_ctl);
 
 /* Update operations */
 void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl);
index 59ea049..7513388 100644
                                 BTRFS_SUPER_FLAG_METADUMP |\
                                 BTRFS_SUPER_FLAG_METADUMP_V2)
 
-static void btrfs_destroy_ordered_extents(struct btrfs_root *root);
-static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
-                                     struct btrfs_fs_info *fs_info);
-static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root);
-static int btrfs_destroy_marked_extents(struct btrfs_fs_info *fs_info,
-                                       struct extent_io_tree *dirty_pages,
-                                       int mark);
-static int btrfs_destroy_pinned_extent(struct btrfs_fs_info *fs_info,
-                                      struct extent_io_tree *pinned_extents);
 static int btrfs_cleanup_transaction(struct btrfs_fs_info *fs_info);
 static void btrfs_error_commit_super(struct btrfs_fs_info *fs_info);
 
@@ -96,7 +87,7 @@ static void csum_tree_block(struct extent_buffer *buf, u8 *result)
        crypto_shash_update(shash, kaddr + BTRFS_CSUM_SIZE,
                            first_page_part - BTRFS_CSUM_SIZE);
 
-       for (i = 1; i < num_pages; i++) {
+       for (i = 1; i < num_pages && INLINE_EXTENT_BUFFER_PAGES > 1; i++) {
                kaddr = page_address(buf->pages[i]);
                crypto_shash_update(shash, kaddr, PAGE_SIZE);
        }
@@ -110,35 +101,27 @@ static void csum_tree_block(struct extent_buffer *buf, u8 *result)
  * detect blocks that either didn't get written at all or got written
  * in the wrong place.
  */
-static int verify_parent_transid(struct extent_io_tree *io_tree,
-                                struct extent_buffer *eb, u64 parent_transid,
-                                int atomic)
+int btrfs_buffer_uptodate(struct extent_buffer *eb, u64 parent_transid, int atomic)
 {
-       struct extent_state *cached_state = NULL;
-       int ret;
+       if (!extent_buffer_uptodate(eb))
+               return 0;
 
        if (!parent_transid || btrfs_header_generation(eb) == parent_transid)
-               return 0;
+               return 1;
 
        if (atomic)
                return -EAGAIN;
 
-       lock_extent(io_tree, eb->start, eb->start + eb->len - 1, &cached_state);
-       if (extent_buffer_uptodate(eb) &&
-           btrfs_header_generation(eb) == parent_transid) {
-               ret = 0;
-               goto out;
-       }
-       btrfs_err_rl(eb->fs_info,
+       if (!extent_buffer_uptodate(eb) ||
+           btrfs_header_generation(eb) != parent_transid) {
+               btrfs_err_rl(eb->fs_info,
 "parent transid verify failed on logical %llu mirror %u wanted %llu found %llu",
                        eb->start, eb->read_mirror,
                        parent_transid, btrfs_header_generation(eb));
-       ret = 1;
-       clear_extent_buffer_uptodate(eb);
-out:
-       unlock_extent(io_tree, eb->start, eb->start + eb->len - 1,
-                     &cached_state);
-       return ret;
+               clear_extent_buffer_uptodate(eb);
+               return 0;
+       }
+       return 1;
 }
 
 static bool btrfs_supported_super_csum(u16 csum_type)
@@ -180,69 +163,10 @@ int btrfs_check_super_csum(struct btrfs_fs_info *fs_info,
        return 0;
 }
 
-int btrfs_verify_level_key(struct extent_buffer *eb, int level,
-                          struct btrfs_key *first_key, u64 parent_transid)
-{
-       struct btrfs_fs_info *fs_info = eb->fs_info;
-       int found_level;
-       struct btrfs_key found_key;
-       int ret;
-
-       found_level = btrfs_header_level(eb);
-       if (found_level != level) {
-               WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG),
-                    KERN_ERR "BTRFS: tree level check failed\n");
-               btrfs_err(fs_info,
-"tree level mismatch detected, bytenr=%llu level expected=%u has=%u",
-                         eb->start, level, found_level);
-               return -EIO;
-       }
-
-       if (!first_key)
-               return 0;
-
-       /*
-        * For live tree block (new tree blocks in current transaction),
-        * we need proper lock context to avoid race, which is impossible here.
-        * So we only checks tree blocks which is read from disk, whose
-        * generation <= fs_info->last_trans_committed.
-        */
-       if (btrfs_header_generation(eb) > fs_info->last_trans_committed)
-               return 0;
-
-       /* We have @first_key, so this @eb must have at least one item */
-       if (btrfs_header_nritems(eb) == 0) {
-               btrfs_err(fs_info,
-               "invalid tree nritems, bytenr=%llu nritems=0 expect >0",
-                         eb->start);
-               WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
-               return -EUCLEAN;
-       }
-
-       if (found_level)
-               btrfs_node_key_to_cpu(eb, &found_key, 0);
-       else
-               btrfs_item_key_to_cpu(eb, &found_key, 0);
-       ret = btrfs_comp_cpu_keys(first_key, &found_key);
-
-       if (ret) {
-               WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG),
-                    KERN_ERR "BTRFS: tree first key check failed\n");
-               btrfs_err(fs_info,
-"tree first key mismatch detected, bytenr=%llu parent_transid=%llu key expected=(%llu,%u,%llu) has=(%llu,%u,%llu)",
-                         eb->start, parent_transid, first_key->objectid,
-                         first_key->type, first_key->offset,
-                         found_key.objectid, found_key.type,
-                         found_key.offset);
-       }
-       return ret;
-}
-
 static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb,
                                      int mirror_num)
 {
        struct btrfs_fs_info *fs_info = eb->fs_info;
-       u64 start = eb->start;
        int i, num_pages = num_extent_pages(eb);
        int ret = 0;
 
@@ -251,12 +175,14 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb,
 
        for (i = 0; i < num_pages; i++) {
                struct page *p = eb->pages[i];
+               u64 start = max_t(u64, eb->start, page_offset(p));
+               u64 end = min_t(u64, eb->start + eb->len, page_offset(p) + PAGE_SIZE);
+               u32 len = end - start;
 
-               ret = btrfs_repair_io_failure(fs_info, 0, start, PAGE_SIZE,
-                               start, p, start - page_offset(p), mirror_num);
+               ret = btrfs_repair_io_failure(fs_info, 0, start, len,
+                               start, p, offset_in_page(start), mirror_num);
                if (ret)
                        break;
-               start += PAGE_SIZE;
        }
 
        return ret;
@@ -311,12 +237,34 @@ int btrfs_read_extent_buffer(struct extent_buffer *eb,
        return ret;
 }
 
-static int csum_one_extent_buffer(struct extent_buffer *eb)
+/*
+ * Checksum a dirty tree block before IO.
+ */
+blk_status_t btree_csum_one_bio(struct btrfs_bio *bbio)
 {
+       struct extent_buffer *eb = bbio->private;
        struct btrfs_fs_info *fs_info = eb->fs_info;
+       u64 found_start = btrfs_header_bytenr(eb);
        u8 result[BTRFS_CSUM_SIZE];
        int ret;
 
+       /* Btree blocks are always contiguous on disk. */
+       if (WARN_ON_ONCE(bbio->file_offset != eb->start))
+               return BLK_STS_IOERR;
+       if (WARN_ON_ONCE(bbio->bio.bi_iter.bi_size != eb->len))
+               return BLK_STS_IOERR;
+
+       if (test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags)) {
+               WARN_ON_ONCE(found_start != 0);
+               return BLK_STS_OK;
+       }
+
+       if (WARN_ON_ONCE(found_start != eb->start))
+               return BLK_STS_IOERR;
+       if (WARN_ON(!btrfs_page_test_uptodate(fs_info, eb->pages[0], eb->start,
+                                             eb->len)))
+               return BLK_STS_IOERR;
+
        ASSERT(memcmp_extent_buffer(eb, fs_info->fs_devices->metadata_uuid,
                                    offsetof(struct btrfs_header, fsid),
                                    BTRFS_FSID_SIZE) == 0);
@@ -325,7 +273,7 @@ static int csum_one_extent_buffer(struct extent_buffer *eb)
        if (btrfs_header_level(eb))
                ret = btrfs_check_node(eb);
        else
-               ret = btrfs_check_leaf_full(eb);
+               ret = btrfs_check_leaf(eb);
 
        if (ret < 0)
                goto error;
@@ -343,8 +291,7 @@ static int csum_one_extent_buffer(struct extent_buffer *eb)
                goto error;
        }
        write_extent_buffer(eb, result, 0, fs_info->csum_size);
-
-       return 0;
+       return BLK_STS_OK;
 
 error:
        btrfs_print_tree(eb, 0);
@@ -358,103 +305,10 @@ error:
         */
        WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG) ||
                btrfs_header_owner(eb) == BTRFS_TREE_LOG_OBJECTID);
-       return ret;
-}
-
-/* Checksum all dirty extent buffers in one bio_vec */
-static int csum_dirty_subpage_buffers(struct btrfs_fs_info *fs_info,
-                                     struct bio_vec *bvec)
-{
-       struct page *page = bvec->bv_page;
-       u64 bvec_start = page_offset(page) + bvec->bv_offset;
-       u64 cur;
-       int ret = 0;
-
-       for (cur = bvec_start; cur < bvec_start + bvec->bv_len;
-            cur += fs_info->nodesize) {
-               struct extent_buffer *eb;
-               bool uptodate;
-
-               eb = find_extent_buffer(fs_info, cur);
-               uptodate = btrfs_subpage_test_uptodate(fs_info, page, cur,
-                                                      fs_info->nodesize);
-
-               /* A dirty eb shouldn't disappear from buffer_radix */
-               if (WARN_ON(!eb))
-                       return -EUCLEAN;
-
-               if (WARN_ON(cur != btrfs_header_bytenr(eb))) {
-                       free_extent_buffer(eb);
-                       return -EUCLEAN;
-               }
-               if (WARN_ON(!uptodate)) {
-                       free_extent_buffer(eb);
-                       return -EUCLEAN;
-               }
-
-               ret = csum_one_extent_buffer(eb);
-               free_extent_buffer(eb);
-               if (ret < 0)
-                       return ret;
-       }
-       return ret;
-}
-
-/*
- * Checksum a dirty tree block before IO.  This has extra checks to make sure
- * we only fill in the checksum field in the first page of a multi-page block.
- * For subpage extent buffers we need bvec to also read the offset in the page.
- */
-static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec)
-{
-       struct page *page = bvec->bv_page;
-       u64 start = page_offset(page);
-       u64 found_start;
-       struct extent_buffer *eb;
-
-       if (fs_info->nodesize < PAGE_SIZE)
-               return csum_dirty_subpage_buffers(fs_info, bvec);
-
-       eb = (struct extent_buffer *)page->private;
-       if (page != eb->pages[0])
-               return 0;
-
-       found_start = btrfs_header_bytenr(eb);
-
-       if (test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags)) {
-               WARN_ON(found_start != 0);
-               return 0;
-       }
-
-       /*
-        * Please do not consolidate these warnings into a single if.
-        * It is useful to know what went wrong.
-        */
-       if (WARN_ON(found_start != start))
-               return -EUCLEAN;
-       if (WARN_ON(!PageUptodate(page)))
-               return -EUCLEAN;
-
-       return csum_one_extent_buffer(eb);
-}
-
-blk_status_t btree_csum_one_bio(struct btrfs_bio *bbio)
-{
-       struct btrfs_fs_info *fs_info = bbio->inode->root->fs_info;
-       struct bvec_iter iter;
-       struct bio_vec bv;
-       int ret = 0;
-
-       bio_for_each_segment(bv, &bbio->bio, iter) {
-               ret = csum_dirty_buffer(fs_info, &bv);
-               if (ret)
-                       break;
-       }
-
        return errno_to_blk_status(ret);
 }
 
-static int check_tree_block_fsid(struct extent_buffer *eb)
+static bool check_tree_block_fsid(struct extent_buffer *eb)
 {
        struct btrfs_fs_info *fs_info = eb->fs_info;
        struct btrfs_fs_devices *fs_devices = fs_info->fs_devices, *seed_devs;
@@ -474,18 +328,18 @@ static int check_tree_block_fsid(struct extent_buffer *eb)
                metadata_uuid = fs_devices->fsid;
 
        if (!memcmp(fsid, metadata_uuid, BTRFS_FSID_SIZE))
-               return 0;
+               return false;
 
        list_for_each_entry(seed_devs, &fs_devices->seed_list, seed_list)
                if (!memcmp(fsid, seed_devs->fsid, BTRFS_FSID_SIZE))
-                       return 0;
+                       return false;
 
-       return 1;
+       return true;
 }
 
 /* Do basic extent buffer checks at read time */
-static int validate_extent_buffer(struct extent_buffer *eb,
-                                 struct btrfs_tree_parent_check *check)
+int btrfs_validate_extent_buffer(struct extent_buffer *eb,
+                                struct btrfs_tree_parent_check *check)
 {
        struct btrfs_fs_info *fs_info = eb->fs_info;
        u64 found_start;
@@ -582,7 +436,7 @@ static int validate_extent_buffer(struct extent_buffer *eb,
         * that we don't try and read the other copies of this block, just
         * return -EIO.
         */
-       if (found_level == 0 && btrfs_check_leaf_full(eb)) {
+       if (found_level == 0 && btrfs_check_leaf(eb)) {
                set_bit(EXTENT_BUFFER_CORRUPT, &eb->bflags);
                ret = -EIO;
        }
@@ -590,9 +444,7 @@ static int validate_extent_buffer(struct extent_buffer *eb,
        if (found_level > 0 && btrfs_check_node(eb))
                ret = -EIO;
 
-       if (!ret)
-               set_extent_buffer_uptodate(eb);
-       else
+       if (ret)
                btrfs_err(fs_info,
                "read time tree block corruption detected on logical %llu mirror %u",
                          eb->start, eb->read_mirror);
@@ -600,105 +452,6 @@ out:
        return ret;
 }
 
-static int validate_subpage_buffer(struct page *page, u64 start, u64 end,
-                                  int mirror, struct btrfs_tree_parent_check *check)
-{
-       struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
-       struct extent_buffer *eb;
-       bool reads_done;
-       int ret = 0;
-
-       ASSERT(check);
-
-       /*
-        * We don't allow bio merge for subpage metadata read, so we should
-        * only get one eb for each endio hook.
-        */
-       ASSERT(end == start + fs_info->nodesize - 1);
-       ASSERT(PagePrivate(page));
-
-       eb = find_extent_buffer(fs_info, start);
-       /*
-        * When we are reading one tree block, eb must have been inserted into
-        * the radix tree. If not, something is wrong.
-        */
-       ASSERT(eb);
-
-       reads_done = atomic_dec_and_test(&eb->io_pages);
-       /* Subpage read must finish in page read */
-       ASSERT(reads_done);
-
-       eb->read_mirror = mirror;
-       if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
-               ret = -EIO;
-               goto err;
-       }
-       ret = validate_extent_buffer(eb, check);
-       if (ret < 0)
-               goto err;
-
-       set_extent_buffer_uptodate(eb);
-
-       free_extent_buffer(eb);
-       return ret;
-err:
-       /*
-        * end_bio_extent_readpage decrements io_pages in case of error,
-        * make sure it has something to decrement.
-        */
-       atomic_inc(&eb->io_pages);
-       clear_extent_buffer_uptodate(eb);
-       free_extent_buffer(eb);
-       return ret;
-}
-
-int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio,
-                                  struct page *page, u64 start, u64 end,
-                                  int mirror)
-{
-       struct extent_buffer *eb;
-       int ret = 0;
-       int reads_done;
-
-       ASSERT(page->private);
-
-       if (btrfs_sb(page->mapping->host->i_sb)->nodesize < PAGE_SIZE)
-               return validate_subpage_buffer(page, start, end, mirror,
-                                              &bbio->parent_check);
-
-       eb = (struct extent_buffer *)page->private;
-
-       /*
-        * The pending IO might have been the only thing that kept this buffer
-        * in memory.  Make sure we have a ref for all this other checks
-        */
-       atomic_inc(&eb->refs);
-
-       reads_done = atomic_dec_and_test(&eb->io_pages);
-       if (!reads_done)
-               goto err;
-
-       eb->read_mirror = mirror;
-       if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
-               ret = -EIO;
-               goto err;
-       }
-       ret = validate_extent_buffer(eb, &bbio->parent_check);
-err:
-       if (ret) {
-               /*
-                * our io error hook is going to dec the io pages
-                * again, we have to make sure it has something
-                * to decrement
-                */
-               atomic_inc(&eb->io_pages);
-               clear_extent_buffer_uptodate(eb);
-       }
-       free_extent_buffer(eb);
-
-       return ret;
-}
-
 #ifdef CONFIG_MIGRATION
 static int btree_migrate_folio(struct address_space *mapping,
                struct folio *dst, struct folio *src, enum migrate_mode mode)
@@ -995,13 +748,18 @@ int btrfs_global_root_insert(struct btrfs_root *root)
 {
        struct btrfs_fs_info *fs_info = root->fs_info;
        struct rb_node *tmp;
+       int ret = 0;
 
        write_lock(&fs_info->global_root_lock);
        tmp = rb_find_add(&root->rb_node, &fs_info->global_root_tree, global_root_cmp);
        write_unlock(&fs_info->global_root_lock);
-       ASSERT(!tmp);
 
-       return tmp ? -EEXIST : 0;
+       if (tmp) {
+               ret = -EEXIST;
+               btrfs_warn(fs_info, "global root %llu %llu already exists",
+                               root->root_key.objectid, root->root_key.offset);
+       }
+       return ret;
 }
 
 void btrfs_global_root_delete(struct btrfs_root *root)
@@ -1390,8 +1148,7 @@ static struct btrfs_root *btrfs_lookup_fs_root(struct btrfs_fs_info *fs_info,
        spin_lock(&fs_info->fs_roots_radix_lock);
        root = radix_tree_lookup(&fs_info->fs_roots_radix,
                                 (unsigned long)root_id);
-       if (root)
-               root = btrfs_grab_root(root);
+       root = btrfs_grab_root(root);
        spin_unlock(&fs_info->fs_roots_radix_lock);
        return root;
 }
@@ -1405,31 +1162,28 @@ static struct btrfs_root *btrfs_get_global_root(struct btrfs_fs_info *fs_info,
                .offset = 0,
        };
 
-       if (objectid == BTRFS_ROOT_TREE_OBJECTID)
+       switch (objectid) {
+       case BTRFS_ROOT_TREE_OBJECTID:
                return btrfs_grab_root(fs_info->tree_root);
-       if (objectid == BTRFS_EXTENT_TREE_OBJECTID)
+       case BTRFS_EXTENT_TREE_OBJECTID:
                return btrfs_grab_root(btrfs_global_root(fs_info, &key));
-       if (objectid == BTRFS_CHUNK_TREE_OBJECTID)
+       case BTRFS_CHUNK_TREE_OBJECTID:
                return btrfs_grab_root(fs_info->chunk_root);
-       if (objectid == BTRFS_DEV_TREE_OBJECTID)
+       case BTRFS_DEV_TREE_OBJECTID:
                return btrfs_grab_root(fs_info->dev_root);
-       if (objectid == BTRFS_CSUM_TREE_OBJECTID)
+       case BTRFS_CSUM_TREE_OBJECTID:
+               return btrfs_grab_root(btrfs_global_root(fs_info, &key));
+       case BTRFS_QUOTA_TREE_OBJECTID:
+               return btrfs_grab_root(fs_info->quota_root);
+       case BTRFS_UUID_TREE_OBJECTID:
+               return btrfs_grab_root(fs_info->uuid_root);
+       case BTRFS_BLOCK_GROUP_TREE_OBJECTID:
+               return btrfs_grab_root(fs_info->block_group_root);
+       case BTRFS_FREE_SPACE_TREE_OBJECTID:
                return btrfs_grab_root(btrfs_global_root(fs_info, &key));
-       if (objectid == BTRFS_QUOTA_TREE_OBJECTID)
-               return btrfs_grab_root(fs_info->quota_root) ?
-                       fs_info->quota_root : ERR_PTR(-ENOENT);
-       if (objectid == BTRFS_UUID_TREE_OBJECTID)
-               return btrfs_grab_root(fs_info->uuid_root) ?
-                       fs_info->uuid_root : ERR_PTR(-ENOENT);
-       if (objectid == BTRFS_BLOCK_GROUP_TREE_OBJECTID)
-               return btrfs_grab_root(fs_info->block_group_root) ?
-                       fs_info->block_group_root : ERR_PTR(-ENOENT);
-       if (objectid == BTRFS_FREE_SPACE_TREE_OBJECTID) {
-               struct btrfs_root *root = btrfs_global_root(fs_info, &key);
-
-               return btrfs_grab_root(root) ? root : ERR_PTR(-ENOENT);
-       }
-       return NULL;
+       default:
+               return NULL;
+       }
 }
 
 int btrfs_insert_fs_root(struct btrfs_fs_info *fs_info,
@@ -1985,7 +1739,6 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
 {
        btrfs_destroy_workqueue(fs_info->fixup_workers);
        btrfs_destroy_workqueue(fs_info->delalloc_workers);
-       btrfs_destroy_workqueue(fs_info->hipri_workers);
        btrfs_destroy_workqueue(fs_info->workers);
        if (fs_info->endio_workers)
                destroy_workqueue(fs_info->endio_workers);
@@ -2177,12 +1930,10 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
 {
        u32 max_active = fs_info->thread_pool_size;
        unsigned int flags = WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND;
+       unsigned int ordered_flags = WQ_MEM_RECLAIM | WQ_FREEZABLE;
 
        fs_info->workers =
                btrfs_alloc_workqueue(fs_info, "worker", flags, max_active, 16);
-       fs_info->hipri_workers =
-               btrfs_alloc_workqueue(fs_info, "worker-high",
-                                     flags | WQ_HIGHPRI, max_active, 16);
 
        fs_info->delalloc_workers =
                btrfs_alloc_workqueue(fs_info, "delalloc",
@@ -2196,7 +1947,7 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
                btrfs_alloc_workqueue(fs_info, "cache", flags, max_active, 0);
 
        fs_info->fixup_workers =
-               btrfs_alloc_workqueue(fs_info, "fixup", flags, 1, 0);
+               btrfs_alloc_ordered_workqueue(fs_info, "fixup", ordered_flags);
 
        fs_info->endio_workers =
                alloc_workqueue("btrfs-endio", flags, max_active);
@@ -2215,11 +1966,12 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
                btrfs_alloc_workqueue(fs_info, "delayed-meta", flags,
                                      max_active, 0);
        fs_info->qgroup_rescan_workers =
-               btrfs_alloc_workqueue(fs_info, "qgroup-rescan", flags, 1, 0);
+               btrfs_alloc_ordered_workqueue(fs_info, "qgroup-rescan",
+                                             ordered_flags);
        fs_info->discard_ctl.discard_workers =
-               alloc_workqueue("btrfs_discard", WQ_UNBOUND | WQ_FREEZABLE, 1);
+               alloc_ordered_workqueue("btrfs_discard", WQ_FREEZABLE);
 
-       if (!(fs_info->workers && fs_info->hipri_workers &&
+       if (!(fs_info->workers &&
              fs_info->delalloc_workers && fs_info->flush_workers &&
              fs_info->endio_workers && fs_info->endio_meta_workers &&
              fs_info->compressed_write_workers &&
@@ -2259,6 +2011,9 @@ static int btrfs_init_csum_hash(struct btrfs_fs_info *fs_info, u16 csum_type)
                if (!strstr(crypto_shash_driver_name(csum_shash), "generic"))
                        set_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags);
                break;
+       case BTRFS_CSUM_TYPE_XXHASH:
+               set_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags);
+               break;
        default:
                break;
        }
@@ -2636,6 +2391,14 @@ int btrfs_validate_super(struct btrfs_fs_info *fs_info,
                ret = -EINVAL;
        }
 
+       if (memcmp(fs_info->fs_devices->metadata_uuid, sb->dev_item.fsid,
+                  BTRFS_FSID_SIZE) != 0) {
+               btrfs_err(fs_info,
+                       "dev_item UUID does not match metadata fsid: %pU != %pU",
+                       fs_info->fs_devices->metadata_uuid, sb->dev_item.fsid);
+               ret = -EINVAL;
+       }
+
        /*
         * Artificial requirement for block-group-tree to force newer features
         * (free-space-tree, no-holes) so the test matrix is smaller.
@@ -2648,14 +2411,6 @@ int btrfs_validate_super(struct btrfs_fs_info *fs_info,
                ret = -EINVAL;
        }
 
-       if (memcmp(fs_info->fs_devices->metadata_uuid, sb->dev_item.fsid,
-                  BTRFS_FSID_SIZE) != 0) {
-               btrfs_err(fs_info,
-                       "dev_item UUID does not match metadata fsid: %pU != %pU",
-                       fs_info->fs_devices->metadata_uuid, sb->dev_item.fsid);
-               ret = -EINVAL;
-       }
-
        /*
         * Hint to catch really bogus numbers, bitflips or so, more exact checks are
         * done later
@@ -2841,6 +2596,7 @@ static int __cold init_tree_roots(struct btrfs_fs_info *fs_info)
                        /* We can't trust the free space cache either */
                        btrfs_set_opt(fs_info->mount_opt, CLEAR_CACHE);
 
+                       btrfs_warn(fs_info, "try to load backup roots slot %d", i);
                        ret = read_backup_root(fs_info, i);
                        backup_index = ret;
                        if (ret < 0)
@@ -3121,23 +2877,34 @@ int btrfs_start_pre_rw_mount(struct btrfs_fs_info *fs_info)
 {
        int ret;
        const bool cache_opt = btrfs_test_opt(fs_info, SPACE_CACHE);
-       bool clear_free_space_tree = false;
+       bool rebuild_free_space_tree = false;
 
        if (btrfs_test_opt(fs_info, CLEAR_CACHE) &&
            btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) {
-               clear_free_space_tree = true;
+               rebuild_free_space_tree = true;
        } else if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE) &&
                   !btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE_VALID)) {
                btrfs_warn(fs_info, "free space tree is invalid");
-               clear_free_space_tree = true;
+               rebuild_free_space_tree = true;
        }
 
-       if (clear_free_space_tree) {
-               btrfs_info(fs_info, "clearing free space tree");
-               ret = btrfs_clear_free_space_tree(fs_info);
+       if (rebuild_free_space_tree) {
+               btrfs_info(fs_info, "rebuilding free space tree");
+               ret = btrfs_rebuild_free_space_tree(fs_info);
                if (ret) {
                        btrfs_warn(fs_info,
-                                  "failed to clear free space tree: %d", ret);
+                                  "failed to rebuild free space tree: %d", ret);
+                       goto out;
+               }
+       }
+
+       if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE) &&
+           !btrfs_test_opt(fs_info, FREE_SPACE_TREE)) {
+               btrfs_info(fs_info, "disabling free space tree");
+               ret = btrfs_delete_free_space_tree(fs_info);
+               if (ret) {
+                       btrfs_warn(fs_info,
+                                  "failed to disable free space tree: %d", ret);
                        goto out;
                }
        }
@@ -4644,28 +4411,10 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
        btrfs_close_devices(fs_info->fs_devices);
 }
 
-int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid,
-                         int atomic)
-{
-       int ret;
-       struct inode *btree_inode = buf->pages[0]->mapping->host;
-
-       ret = extent_buffer_uptodate(buf);
-       if (!ret)
-               return ret;
-
-       ret = verify_parent_transid(&BTRFS_I(btree_inode)->io_tree, buf,
-                                   parent_transid, atomic);
-       if (ret == -EAGAIN)
-               return ret;
-       return !ret;
-}
-
 void btrfs_mark_buffer_dirty(struct extent_buffer *buf)
 {
        struct btrfs_fs_info *fs_info = buf->fs_info;
        u64 transid = btrfs_header_generation(buf);
-       int was_dirty;
 
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
        /*
@@ -4680,19 +4429,13 @@ void btrfs_mark_buffer_dirty(struct extent_buffer *buf)
        if (transid != fs_info->generation)
                WARN(1, KERN_CRIT "btrfs transid mismatch buffer %llu, found %llu running %llu\n",
                        buf->start, transid, fs_info->generation);
-       was_dirty = set_extent_buffer_dirty(buf);
-       if (!was_dirty)
-               percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
-                                        buf->len,
-                                        fs_info->dirty_metadata_batch);
+       set_extent_buffer_dirty(buf);
 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
        /*
-        * Since btrfs_mark_buffer_dirty() can be called with item pointer set
-        * but item data not updated.
-        * So here we should only check item pointers, not item data.
+        * btrfs_check_leaf() won't check item data if we don't have WRITTEN
+        * set, so this will only validate the basic structure of the items.
         */
-       if (btrfs_header_level(buf) == 0 &&
-           btrfs_check_leaf_relaxed(buf)) {
+       if (btrfs_header_level(buf) == 0 && btrfs_check_leaf(buf)) {
                btrfs_print_leaf(buf);
                ASSERT(0);
        }
@@ -4822,13 +4565,12 @@ static void btrfs_destroy_all_ordered_extents(struct btrfs_fs_info *fs_info)
        btrfs_wait_ordered_roots(fs_info, U64_MAX, 0, (u64)-1);
 }
 
-static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
-                                     struct btrfs_fs_info *fs_info)
+static void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
+                                      struct btrfs_fs_info *fs_info)
 {
        struct rb_node *node;
        struct btrfs_delayed_ref_root *delayed_refs;
        struct btrfs_delayed_ref_node *ref;
-       int ret = 0;
 
        delayed_refs = &trans->delayed_refs;
 
@@ -4836,7 +4578,7 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
        if (atomic_read(&delayed_refs->num_entries) == 0) {
                spin_unlock(&delayed_refs->lock);
                btrfs_debug(fs_info, "delayed_refs has NO entry");
-               return ret;
+               return;
        }
 
        while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) {
@@ -4853,7 +4595,6 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
                while ((n = rb_first_cached(&head->ref_tree)) != NULL) {
                        ref = rb_entry(n, struct btrfs_delayed_ref_node,
                                       ref_node);
-                       ref->in_tree = 0;
                        rb_erase_cached(&ref->ref_node, &head->ref_tree);
                        RB_CLEAR_NODE(&ref->ref_node);
                        if (!list_empty(&ref->add_list))
@@ -4898,8 +4639,6 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
        btrfs_qgroup_destroy_extent_records(trans);
 
        spin_unlock(&delayed_refs->lock);
-
-       return ret;
 }
 
 static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root)
@@ -4925,7 +4664,11 @@ static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root)
                 */
                inode = igrab(&btrfs_inode->vfs_inode);
                if (inode) {
+                       unsigned int nofs_flag;
+
+                       nofs_flag = memalloc_nofs_save();
                        invalidate_inode_pages2(inode->i_mapping);
+                       memalloc_nofs_restore(nofs_flag);
                        iput(inode);
                }
                spin_lock(&root->delalloc_lock);
@@ -5031,7 +4774,12 @@ static void btrfs_cleanup_bg_io(struct btrfs_block_group *cache)
 
        inode = cache->io_ctl.inode;
        if (inode) {
+               unsigned int nofs_flag;
+
+               nofs_flag = memalloc_nofs_save();
                invalidate_inode_pages2(inode->i_mapping);
+               memalloc_nofs_restore(nofs_flag);
+
                BTRFS_I(inode)->generation = 0;
                cache->io_ctl.inode = NULL;
                iput(inode);
@@ -5115,8 +4863,6 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans,
                                     EXTENT_DIRTY);
        btrfs_destroy_pinned_extent(fs_info, &cur_trans->pinned_extents);
 
-       btrfs_free_redirty_list(cur_trans);
-
        cur_trans->state =TRANS_STATE_COMPLETED;
        wake_up(&cur_trans->commit_wait);
 }
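
The invalidate_inode_pages2() calls above now run inside a memalloc_nofs_save()/memalloc_nofs_restore() window. A minimal sketch of that scoped-allocation pattern, using a hypothetical helper:

#include <linux/fs.h>
#include <linux/sched/mm.h>

static void example_invalidate_nofs(struct inode *inode)
{
        unsigned int nofs_flag;

        /*
         * Within this window the allocator implicitly drops __GFP_FS, so
         * memory reclaim cannot recurse back into the filesystem.
         */
        nofs_flag = memalloc_nofs_save();
        invalidate_inode_pages2(inode->i_mapping);
        memalloc_nofs_restore(nofs_flag);
}
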
index 4d57723..b03767f 100644
@@ -31,8 +31,6 @@ struct btrfs_tree_parent_check;
 
 void btrfs_check_leaked_roots(struct btrfs_fs_info *fs_info);
 void btrfs_init_fs_info(struct btrfs_fs_info *fs_info);
-int btrfs_verify_level_key(struct extent_buffer *eb, int level,
-                          struct btrfs_key *first_key, u64 parent_transid);
 struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
                                      struct btrfs_tree_parent_check *check);
 struct extent_buffer *btrfs_find_create_tree_block(
@@ -84,9 +82,8 @@ void btrfs_btree_balance_dirty(struct btrfs_fs_info *fs_info);
 void btrfs_btree_balance_dirty_nodelay(struct btrfs_fs_info *fs_info);
 void btrfs_drop_and_free_fs_root(struct btrfs_fs_info *fs_info,
                                 struct btrfs_root *root);
-int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio,
-                                  struct page *page, u64 start, u64 end,
-                                  int mirror);
+int btrfs_validate_extent_buffer(struct extent_buffer *eb,
+                                struct btrfs_tree_parent_check *check);
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 struct btrfs_root *btrfs_alloc_dummy_root(struct btrfs_fs_info *fs_info);
 #endif
index 29a2258..a2315a4 100644
@@ -533,6 +533,16 @@ static struct extent_state *clear_state_bit(struct extent_io_tree *tree,
 }
 
 /*
+ * Detect if extent bits request NOWAIT semantics and set the gfp mask accordingly,
+ * unset the EXTENT_NOWAIT bit.
+ */
+static void set_gfp_mask_from_bits(u32 *bits, gfp_t *mask)
+{
+       *mask = (*bits & EXTENT_NOWAIT ? GFP_NOWAIT : GFP_NOFS);
+       *bits &= EXTENT_NOWAIT - 1;
+}
+
+/*
  * Clear some bits on a range in the tree.  This may require splitting or
  * inserting elements in the tree, so the gfp mask is used to indicate which
  * allocations or sleeping are allowed.
@@ -546,7 +556,7 @@ static struct extent_state *clear_state_bit(struct extent_io_tree *tree,
  */
 int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
                       u32 bits, struct extent_state **cached_state,
-                      gfp_t mask, struct extent_changeset *changeset)
+                      struct extent_changeset *changeset)
 {
        struct extent_state *state;
        struct extent_state *cached;
@@ -556,7 +566,9 @@ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
        int clear = 0;
        int wake;
        int delete = (bits & EXTENT_CLEAR_ALL_BITS);
+       gfp_t mask;
 
+       set_gfp_mask_from_bits(&bits, &mask);
        btrfs_debug_check_extent_io_range(tree, start, end);
        trace_btrfs_clear_extent_bit(tree, start, end - start + 1, bits);
 
@@ -953,7 +965,8 @@ out:
 
 /*
  * Set some bits on a range in the tree.  This may require allocations or
- * sleeping, so the gfp mask is used to indicate what is allowed.
+ * sleeping. By default all allocations use GFP_NOFS, use EXTENT_NOWAIT for
+ * GFP_NOWAIT.
  *
  * If any of the exclusive bits are set, this will fail with -EEXIST if some
  * part of the range already has the desired bits set.  The extent_state of the
@@ -968,7 +981,7 @@ static int __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
                            u32 bits, u64 *failed_start,
                            struct extent_state **failed_state,
                            struct extent_state **cached_state,
-                           struct extent_changeset *changeset, gfp_t mask)
+                           struct extent_changeset *changeset)
 {
        struct extent_state *state;
        struct extent_state *prealloc = NULL;
@@ -978,7 +991,9 @@ static int __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
        u64 last_start;
        u64 last_end;
        u32 exclusive_bits = (bits & EXTENT_LOCKED);
+       gfp_t mask;
 
+       set_gfp_mask_from_bits(&bits, &mask);
        btrfs_debug_check_extent_io_range(tree, start, end);
        trace_btrfs_set_extent_bit(tree, start, end - start + 1, bits);
 
@@ -1188,10 +1203,10 @@ out:
 }
 
 int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-                  u32 bits, struct extent_state **cached_state, gfp_t mask)
+                  u32 bits, struct extent_state **cached_state)
 {
        return __set_extent_bit(tree, start, end, bits, NULL, NULL,
-                               cached_state, NULL, mask);
+                               cached_state, NULL);
 }
 
 /*
@@ -1687,8 +1702,7 @@ int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
         */
        ASSERT(!(bits & EXTENT_LOCKED));
 
-       return __set_extent_bit(tree, start, end, bits, NULL, NULL, NULL,
-                               changeset, GFP_NOFS);
+       return __set_extent_bit(tree, start, end, bits, NULL, NULL, NULL, changeset);
 }
 
 int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
@@ -1700,8 +1714,7 @@ int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
         */
        ASSERT(!(bits & EXTENT_LOCKED));
 
-       return __clear_extent_bit(tree, start, end, bits, NULL, GFP_NOFS,
-                                 changeset);
+       return __clear_extent_bit(tree, start, end, bits, NULL, changeset);
 }
 
 int try_lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
@@ -1711,7 +1724,7 @@ int try_lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
        u64 failed_start;
 
        err = __set_extent_bit(tree, start, end, EXTENT_LOCKED, &failed_start,
-                              NULL, cached, NULL, GFP_NOFS);
+                              NULL, cached, NULL);
        if (err == -EEXIST) {
                if (failed_start > start)
                        clear_extent_bit(tree, start, failed_start - 1,
@@ -1733,7 +1746,7 @@ int lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
        u64 failed_start;
 
        err = __set_extent_bit(tree, start, end, EXTENT_LOCKED, &failed_start,
-                              &failed_state, cached_state, NULL, GFP_NOFS);
+                              &failed_state, cached_state, NULL);
        while (err == -EEXIST) {
                if (failed_start != start)
                        clear_extent_bit(tree, start, failed_start - 1,
@@ -1743,7 +1756,7 @@ int lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
                                &failed_state);
                err = __set_extent_bit(tree, start, end, EXTENT_LOCKED,
                                       &failed_start, &failed_state,
-                                      cached_state, NULL, GFP_NOFS);
+                                      cached_state, NULL);
        }
        return err;
 }
index 21766e4..fbd3b27 100644
@@ -43,6 +43,15 @@ enum {
         * want the extent states to go away.
         */
        ENUM_BIT(EXTENT_CLEAR_ALL_BITS),
+
+       /*
+        * This must be last.
+        *
+        * Bit not representing a state but a request for NOWAIT semantics,
+        * e.g. when allocating memory, and must be masked out from the other
+        * bits.
+        */
+       ENUM_BIT(EXTENT_NOWAIT)
 };
 
 #define EXTENT_DO_ACCOUNTING    (EXTENT_CLEAR_META_RESV | \
@@ -127,22 +136,20 @@ int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end,
 int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
                             u32 bits, struct extent_changeset *changeset);
 int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-                      u32 bits, struct extent_state **cached, gfp_t mask,
+                      u32 bits, struct extent_state **cached,
                       struct extent_changeset *changeset);
 
 static inline int clear_extent_bit(struct extent_io_tree *tree, u64 start,
                                   u64 end, u32 bits,
                                   struct extent_state **cached)
 {
-       return __clear_extent_bit(tree, start, end, bits, cached,
-                                 GFP_NOFS, NULL);
+       return __clear_extent_bit(tree, start, end, bits, cached, NULL);
 }
 
 static inline int unlock_extent(struct extent_io_tree *tree, u64 start, u64 end,
                                struct extent_state **cached)
 {
-       return __clear_extent_bit(tree, start, end, EXTENT_LOCKED, cached,
-                                 GFP_NOFS, NULL);
+       return __clear_extent_bit(tree, start, end, EXTENT_LOCKED, cached, NULL);
 }
 
 static inline int clear_extent_bits(struct extent_io_tree *tree, u64 start,
@@ -154,31 +161,13 @@ static inline int clear_extent_bits(struct extent_io_tree *tree, u64 start,
 int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
                           u32 bits, struct extent_changeset *changeset);
 int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-                  u32 bits, struct extent_state **cached_state, gfp_t mask);
-
-static inline int set_extent_bits_nowait(struct extent_io_tree *tree, u64 start,
-                                        u64 end, u32 bits)
-{
-       return set_extent_bit(tree, start, end, bits, NULL, GFP_NOWAIT);
-}
-
-static inline int set_extent_bits(struct extent_io_tree *tree, u64 start,
-               u64 end, u32 bits)
-{
-       return set_extent_bit(tree, start, end, bits, NULL, GFP_NOFS);
-}
+                  u32 bits, struct extent_state **cached_state);
 
 static inline int clear_extent_uptodate(struct extent_io_tree *tree, u64 start,
                u64 end, struct extent_state **cached_state)
 {
        return __clear_extent_bit(tree, start, end, EXTENT_UPTODATE,
-                                 cached_state, GFP_NOFS, NULL);
-}
-
-static inline int set_extent_dirty(struct extent_io_tree *tree, u64 start,
-               u64 end, gfp_t mask)
-{
-       return set_extent_bit(tree, start, end, EXTENT_DIRTY, NULL, mask);
+                                 cached_state, NULL);
 }
 
 static inline int clear_extent_dirty(struct extent_io_tree *tree, u64 start,
@@ -193,29 +182,6 @@ int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
                       u32 bits, u32 clear_bits,
                       struct extent_state **cached_state);
 
-static inline int set_extent_delalloc(struct extent_io_tree *tree, u64 start,
-                                     u64 end, u32 extra_bits,
-                                     struct extent_state **cached_state)
-{
-       return set_extent_bit(tree, start, end,
-                             EXTENT_DELALLOC | extra_bits,
-                             cached_state, GFP_NOFS);
-}
-
-static inline int set_extent_defrag(struct extent_io_tree *tree, u64 start,
-               u64 end, struct extent_state **cached_state)
-{
-       return set_extent_bit(tree, start, end,
-                             EXTENT_DELALLOC | EXTENT_DEFRAG,
-                             cached_state, GFP_NOFS);
-}
-
-static inline int set_extent_new(struct extent_io_tree *tree, u64 start,
-               u64 end)
-{
-       return set_extent_bit(tree, start, end, EXTENT_NEW, NULL, GFP_NOFS);
-}
-
 int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
                          u64 *start_ret, u64 *end_ret, u32 bits,
                          struct extent_state **cached_state);
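
With the gfp_t argument gone from __set_extent_bit()/__clear_extent_bit() and their wrappers, callers that used to pass GFP_NOWAIT (for example through the removed set_extent_bits_nowait() helper) now OR EXTENT_NOWAIT into the bits; since it is the highest bit, masking with EXTENT_NOWAIT - 1 strips it without touching the real state bits. A minimal sketch with a hypothetical caller:

#include <linux/types.h>
#include "extent-io-tree.h"

static int example_set_dirty_nowait(struct extent_io_tree *tree, u64 start, u64 end)
{
        /*
         * EXTENT_NOWAIT is masked out internally and selects GFP_NOWAIT;
         * without it the default allocation mask is GFP_NOFS.
         */
        return set_extent_bit(tree, start, end, EXTENT_DIRTY | EXTENT_NOWAIT, NULL);
}
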
index 5cd289d..911908e 100644
@@ -73,8 +73,8 @@ int btrfs_add_excluded_extent(struct btrfs_fs_info *fs_info,
                              u64 start, u64 num_bytes)
 {
        u64 end = start + num_bytes - 1;
-       set_extent_bits(&fs_info->excluded_extents, start, end,
-                       EXTENT_UPTODATE);
+       set_extent_bit(&fs_info->excluded_extents, start, end,
+                      EXTENT_UPTODATE, NULL);
        return 0;
 }
 
@@ -402,7 +402,7 @@ int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb,
                }
        }
 
-       btrfs_print_leaf((struct extent_buffer *)eb);
+       btrfs_print_leaf(eb);
        btrfs_err(eb->fs_info,
                  "eb %llu iref 0x%lx invalid extent inline ref type %d",
                  eb->start, (unsigned long)iref, type);
@@ -1164,15 +1164,10 @@ int insert_inline_extent_backref(struct btrfs_trans_handle *trans,
                 * should not happen at all.
                 */
                if (owner < BTRFS_FIRST_FREE_OBJECTID) {
+                       btrfs_print_leaf(path->nodes[0]);
                        btrfs_crit(trans->fs_info,
-"adding refs to an existing tree ref, bytenr %llu num_bytes %llu root_objectid %llu",
-                                  bytenr, num_bytes, root_objectid);
-                       if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
-                               WARN_ON(1);
-                               btrfs_crit(trans->fs_info,
-                       "path->slots[0]=%d path->nodes[0]:", path->slots[0]);
-                               btrfs_print_leaf(path->nodes[0]);
-                       }
+"adding refs to an existing tree ref, bytenr %llu num_bytes %llu root_objectid %llu slot %u",
+                                  bytenr, num_bytes, root_objectid, path->slots[0]);
                        return -EUCLEAN;
                }
                update_inline_extent_backref(path, iref, refs_to_add, extent_op);
@@ -1208,11 +1203,11 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
 {
        int j, ret = 0;
        u64 bytes_left, end;
-       u64 aligned_start = ALIGN(start, 1 << 9);
+       u64 aligned_start = ALIGN(start, 1 << SECTOR_SHIFT);
 
        if (WARN_ON(start != aligned_start)) {
                len -= aligned_start - start;
-               len = round_down(len, 1 << 9);
+               len = round_down(len, 1 << SECTOR_SHIFT);
                start = aligned_start;
        }
 
@@ -1250,7 +1245,8 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
                }
 
                if (size) {
-                       ret = blkdev_issue_discard(bdev, start >> 9, size >> 9,
+                       ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
+                                                  size >> SECTOR_SHIFT,
                                                   GFP_NOFS);
                        if (!ret)
                                *discarded_bytes += size;
@@ -1267,7 +1263,8 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
        }
 
        if (bytes_left) {
-               ret = blkdev_issue_discard(bdev, start >> 9, bytes_left >> 9,
+               ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
+                                          bytes_left >> SECTOR_SHIFT,
                                           GFP_NOFS);
                if (!ret)
                        *discarded_bytes += bytes_left;
@@ -1500,7 +1497,7 @@ out:
 static int run_delayed_data_ref(struct btrfs_trans_handle *trans,
                                struct btrfs_delayed_ref_node *node,
                                struct btrfs_delayed_extent_op *extent_op,
-                               int insert_reserved)
+                               bool insert_reserved)
 {
        int ret = 0;
        struct btrfs_delayed_data_ref *ref;
@@ -1650,7 +1647,7 @@ out:
 static int run_delayed_tree_ref(struct btrfs_trans_handle *trans,
                                struct btrfs_delayed_ref_node *node,
                                struct btrfs_delayed_extent_op *extent_op,
-                               int insert_reserved)
+                               bool insert_reserved)
 {
        int ret = 0;
        struct btrfs_delayed_tree_ref *ref;
@@ -1690,7 +1687,7 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans,
 static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
                               struct btrfs_delayed_ref_node *node,
                               struct btrfs_delayed_extent_op *extent_op,
-                              int insert_reserved)
+                              bool insert_reserved)
 {
        int ret = 0;
 
@@ -1748,7 +1745,7 @@ static void unselect_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_ref
                                      struct btrfs_delayed_ref_head *head)
 {
        spin_lock(&delayed_refs->lock);
-       head->processing = 0;
+       head->processing = false;
        delayed_refs->num_heads_ready++;
        spin_unlock(&delayed_refs->lock);
        btrfs_delayed_ref_unlock(head);
@@ -1900,7 +1897,7 @@ static int btrfs_run_delayed_refs_for_head(struct btrfs_trans_handle *trans,
        struct btrfs_delayed_ref_root *delayed_refs;
        struct btrfs_delayed_extent_op *extent_op;
        struct btrfs_delayed_ref_node *ref;
-       int must_insert_reserved = 0;
+       bool must_insert_reserved;
        int ret;
 
        delayed_refs = &trans->transaction->delayed_refs;
@@ -1916,7 +1913,6 @@ static int btrfs_run_delayed_refs_for_head(struct btrfs_trans_handle *trans,
                        return -EAGAIN;
                }
 
-               ref->in_tree = 0;
                rb_erase_cached(&ref->ref_node, &locked_ref->ref_tree);
                RB_CLEAR_NODE(&ref->ref_node);
                if (!list_empty(&ref->add_list))
@@ -1943,7 +1939,7 @@ static int btrfs_run_delayed_refs_for_head(struct btrfs_trans_handle *trans,
                 * spin lock.
                 */
                must_insert_reserved = locked_ref->must_insert_reserved;
-               locked_ref->must_insert_reserved = 0;
+               locked_ref->must_insert_reserved = false;
 
                extent_op = locked_ref->extent_op;
                locked_ref->extent_op = NULL;
@@ -2155,10 +2151,10 @@ out:
 }
 
 int btrfs_set_disk_extent_flags(struct btrfs_trans_handle *trans,
-                               struct extent_buffer *eb, u64 flags,
-                               int level)
+                               struct extent_buffer *eb, u64 flags)
 {
        struct btrfs_delayed_extent_op *extent_op;
+       int level = btrfs_header_level(eb);
        int ret;
 
        extent_op = btrfs_alloc_delayed_extent_op();
@@ -2510,8 +2506,8 @@ static int pin_down_extent(struct btrfs_trans_handle *trans,
        spin_unlock(&cache->lock);
        spin_unlock(&cache->space_info->lock);
 
-       set_extent_dirty(&trans->transaction->pinned_extents, bytenr,
-                        bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
+       set_extent_bit(&trans->transaction->pinned_extents, bytenr,
+                      bytenr + num_bytes - 1, EXTENT_DIRTY, NULL);
        return 0;
 }
 
@@ -2838,6 +2834,13 @@ static int do_free_extent_accounting(struct btrfs_trans_handle *trans,
        return ret;
 }
 
+#define abort_and_dump(trans, path, fmt, args...)      \
+({                                                     \
+       btrfs_abort_transaction(trans, -EUCLEAN);       \
+       btrfs_print_leaf(path->nodes[0]);               \
+       btrfs_crit(trans->fs_info, fmt, ##args);        \
+})
+
 /*
  * Drop one or more refs of @node.
  *
@@ -2978,10 +2981,11 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 
                if (!found_extent) {
                        if (iref) {
-                               btrfs_crit(info,
-"invalid iref, no EXTENT/METADATA_ITEM found but has inline extent ref");
-                               btrfs_abort_transaction(trans, -EUCLEAN);
-                               goto err_dump;
+                               abort_and_dump(trans, path,
+"invalid iref slot %u, no EXTENT/METADATA_ITEM found but has inline extent ref",
+                                          path->slots[0]);
+                               ret = -EUCLEAN;
+                               goto out;
                        }
                        /* Must be SHARED_* item, remove the backref first */
                        ret = remove_extent_backref(trans, extent_root, path,
@@ -3029,11 +3033,11 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
                        }
 
                        if (ret) {
-                               btrfs_err(info,
-                                         "umm, got %d back from search, was looking for %llu",
-                                         ret, bytenr);
                                if (ret > 0)
                                        btrfs_print_leaf(path->nodes[0]);
+                               btrfs_err(info,
+                       "umm, got %d back from search, was looking for %llu, slot %d",
+                                         ret, bytenr, path->slots[0]);
                        }
                        if (ret < 0) {
                                btrfs_abort_transaction(trans, ret);
@@ -3042,12 +3046,10 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
                        extent_slot = path->slots[0];
                }
        } else if (WARN_ON(ret == -ENOENT)) {
-               btrfs_print_leaf(path->nodes[0]);
-               btrfs_err(info,
-                       "unable to find ref byte nr %llu parent %llu root %llu  owner %llu offset %llu",
-                       bytenr, parent, root_objectid, owner_objectid,
-                       owner_offset);
-               btrfs_abort_transaction(trans, ret);
+               abort_and_dump(trans, path,
+"unable to find ref byte nr %llu parent %llu root %llu owner %llu offset %llu slot %d",
+                              bytenr, parent, root_objectid, owner_objectid,
+                              owner_offset, path->slots[0]);
                goto out;
        } else {
                btrfs_abort_transaction(trans, ret);
@@ -3067,14 +3069,15 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
        if (owner_objectid < BTRFS_FIRST_FREE_OBJECTID &&
            key.type == BTRFS_EXTENT_ITEM_KEY) {
                struct btrfs_tree_block_info *bi;
+
                if (item_size < sizeof(*ei) + sizeof(*bi)) {
-                       btrfs_crit(info,
-"invalid extent item size for key (%llu, %u, %llu) owner %llu, has %u expect >= %zu",
-                                  key.objectid, key.type, key.offset,
-                                  owner_objectid, item_size,
-                                  sizeof(*ei) + sizeof(*bi));
-                       btrfs_abort_transaction(trans, -EUCLEAN);
-                       goto err_dump;
+                       abort_and_dump(trans, path,
+"invalid extent item size for key (%llu, %u, %llu) slot %u owner %llu, has %u expect >= %zu",
+                                      key.objectid, key.type, key.offset,
+                                      path->slots[0], owner_objectid, item_size,
+                                      sizeof(*ei) + sizeof(*bi));
+                       ret = -EUCLEAN;
+                       goto out;
                }
                bi = (struct btrfs_tree_block_info *)(ei + 1);
                WARN_ON(owner_objectid != btrfs_tree_block_level(leaf, bi));
@@ -3082,11 +3085,11 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 
        refs = btrfs_extent_refs(leaf, ei);
        if (refs < refs_to_drop) {
-               btrfs_crit(info,
-               "trying to drop %d refs but we only have %llu for bytenr %llu",
-                         refs_to_drop, refs, bytenr);
-               btrfs_abort_transaction(trans, -EUCLEAN);
-               goto err_dump;
+               abort_and_dump(trans, path,
+               "trying to drop %d refs but we only have %llu for bytenr %llu slot %u",
+                              refs_to_drop, refs, bytenr, path->slots[0]);
+               ret = -EUCLEAN;
+               goto out;
        }
        refs -= refs_to_drop;
 
@@ -3099,10 +3102,11 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
                 */
                if (iref) {
                        if (!found_extent) {
-                               btrfs_crit(info,
-"invalid iref, got inlined extent ref but no EXTENT/METADATA_ITEM found");
-                               btrfs_abort_transaction(trans, -EUCLEAN);
-                               goto err_dump;
+                               abort_and_dump(trans, path,
+"invalid iref, got inlined extent ref but no EXTENT/METADATA_ITEM found, slot %u",
+                                              path->slots[0]);
+                               ret = -EUCLEAN;
+                               goto out;
                        }
                } else {
                        btrfs_set_extent_refs(leaf, ei, refs);
@@ -3121,21 +3125,21 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
                if (found_extent) {
                        if (is_data && refs_to_drop !=
                            extent_data_ref_count(path, iref)) {
-                               btrfs_crit(info,
-               "invalid refs_to_drop, current refs %u refs_to_drop %u",
-                                          extent_data_ref_count(path, iref),
-                                          refs_to_drop);
-                               btrfs_abort_transaction(trans, -EUCLEAN);
-                               goto err_dump;
+                               abort_and_dump(trans, path,
+               "invalid refs_to_drop, current refs %u refs_to_drop %u slot %u",
+                                              extent_data_ref_count(path, iref),
+                                              refs_to_drop, path->slots[0]);
+                               ret = -EUCLEAN;
+                               goto out;
                        }
                        if (iref) {
                                if (path->slots[0] != extent_slot) {
-                                       btrfs_crit(info,
-"invalid iref, extent item key (%llu %u %llu) doesn't have wanted iref",
-                                                  key.objectid, key.type,
-                                                  key.offset);
-                                       btrfs_abort_transaction(trans, -EUCLEAN);
-                                       goto err_dump;
+                                       abort_and_dump(trans, path,
+"invalid iref, extent item key (%llu %u %llu) slot %u doesn't have wanted iref",
+                                                      key.objectid, key.type,
+                                                      key.offset, path->slots[0]);
+                                       ret = -EUCLEAN;
+                                       goto out;
                                }
                        } else {
                                /*
@@ -3145,10 +3149,11 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
                                 * [ EXTENT/METADATA_ITEM ][ SHARED_* ITEM ]
                                 */
                                if (path->slots[0] != extent_slot + 1) {
-                                       btrfs_crit(info,
-       "invalid SHARED_* item, previous item is not EXTENT/METADATA_ITEM");
-                                       btrfs_abort_transaction(trans, -EUCLEAN);
-                                       goto err_dump;
+                                       abort_and_dump(trans, path,
+       "invalid SHARED_* item slot %u, previous item is not EXTENT/METADATA_ITEM",
+                                                      path->slots[0]);
+                                       ret = -EUCLEAN;
+                                       goto out;
                                }
                                path->slots[0] = extent_slot;
                                num_to_del = 2;
@@ -3170,19 +3175,6 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 out:
        btrfs_free_path(path);
        return ret;
-err_dump:
-       /*
-        * Leaf dump can take up a lot of log buffer, so we only do full leaf
-        * dump for debug build.
-        */
-       if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
-               btrfs_crit(info, "path->slots[0]=%d extent_slot=%d",
-                          path->slots[0], extent_slot);
-               btrfs_print_leaf(path->nodes[0]);
-       }
-
-       btrfs_free_path(path);
-       return -EUCLEAN;
 }
 
 /*
@@ -3219,7 +3211,7 @@ static noinline int check_ref_cleanup(struct btrfs_trans_handle *trans,
                goto out;
 
        btrfs_delete_ref_head(delayed_refs, head);
-       head->processing = 0;
+       head->processing = false;
 
        spin_unlock(&head->lock);
        spin_unlock(&delayed_refs->lock);
@@ -4804,7 +4796,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
            !test_bit(BTRFS_ROOT_RESET_LOCKDEP_CLASS, &root->state))
                lockdep_owner = BTRFS_FS_TREE_OBJECTID;
 
-       /* btrfs_clean_tree_block() accesses generation field. */
+       /* btrfs_clear_buffer_dirty() accesses generation field. */
        btrfs_set_header_generation(buf, trans->transid);
 
        /*
@@ -4836,15 +4828,17 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
                 * EXTENT bit to differentiate dirty pages.
                 */
                if (buf->log_index == 0)
-                       set_extent_dirty(&root->dirty_log_pages, buf->start,
-                                       buf->start + buf->len - 1, GFP_NOFS);
+                       set_extent_bit(&root->dirty_log_pages, buf->start,
+                                      buf->start + buf->len - 1,
+                                      EXTENT_DIRTY, NULL);
                else
-                       set_extent_new(&root->dirty_log_pages, buf->start,
-                                       buf->start + buf->len - 1);
+                       set_extent_bit(&root->dirty_log_pages, buf->start,
+                                      buf->start + buf->len - 1,
+                                      EXTENT_NEW, NULL);
        } else {
                buf->log_index = -1;
-               set_extent_dirty(&trans->transaction->dirty_pages, buf->start,
-                        buf->start + buf->len - 1, GFP_NOFS);
+               set_extent_bit(&trans->transaction->dirty_pages, buf->start,
+                              buf->start + buf->len - 1, EXTENT_DIRTY, NULL);
        }
        /* this returns a buffer locked for blocking */
        return buf;
@@ -5102,8 +5096,7 @@ static noinline int walk_down_proc(struct btrfs_trans_handle *trans,
                BUG_ON(ret); /* -ENOMEM */
                ret = btrfs_dec_ref(trans, root, eb, 0);
                BUG_ON(ret); /* -ENOMEM */
-               ret = btrfs_set_disk_extent_flags(trans, eb, flag,
-                                                 btrfs_header_level(eb));
+               ret = btrfs_set_disk_extent_flags(trans, eb, flag);
                BUG_ON(ret); /* -ENOMEM */
                wc->flags[level] |= flag;
        }
@@ -5985,9 +5978,8 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
                ret = btrfs_issue_discard(device->bdev, start, len,
                                          &bytes);
                if (!ret)
-                       set_extent_bits(&device->alloc_state, start,
-                                       start + bytes - 1,
-                                       CHUNK_TRIMMED);
+                       set_extent_bit(&device->alloc_state, start,
+                                      start + bytes - 1, CHUNK_TRIMMED, NULL);
                mutex_unlock(&fs_info->chunk_mutex);
 
                if (ret)
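For reference, a minimal before/after sketch of the extent-bit helper conversion applied throughout the hunks above (identifiers are taken verbatim from the diff; this is illustrative only, not a new API):

	/* before: per-bit wrapper with an explicit gfp mask */
	set_extent_dirty(&trans->transaction->pinned_extents, bytenr,
			 bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);

	/* after: the generic helper, the bit passed directly, no gfp argument */
	set_extent_bit(&trans->transaction->pinned_extents, bytenr,
		       bytenr + num_bytes - 1, EXTENT_DIRTY, NULL);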
index 0c958fc..429d5c5 100644 (file)
@@ -141,7 +141,7 @@ int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 int btrfs_dec_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
                  struct extent_buffer *buf, int full_backref);
 int btrfs_set_disk_extent_flags(struct btrfs_trans_handle *trans,
-                               struct extent_buffer *eb, u64 flags, int level);
+                               struct extent_buffer *eb, u64 flags);
 int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_ref *ref);
 
 int btrfs_free_reserved_extent(struct btrfs_fs_info *fs_info,
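Similarly, a minimal sketch of what the btrfs_set_disk_extent_flags() signature change above means at a call site (both forms appear in the hunks; shown side by side for clarity):

	/* before: the caller passed the level explicitly */
	ret = btrfs_set_disk_extent_flags(trans, eb, flag, btrfs_header_level(eb));

	/* after: the level is read from the buffer header inside the helper */
	ret = btrfs_set_disk_extent_flags(trans, eb, flag);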
index a1adadd..a91d5ad 100644 (file)
@@ -98,33 +98,16 @@ void btrfs_extent_buffer_leak_debug_check(struct btrfs_fs_info *fs_info)
  */
 struct btrfs_bio_ctrl {
        struct btrfs_bio *bbio;
-       int mirror_num;
        enum btrfs_compression_type compress_type;
        u32 len_to_oe_boundary;
        blk_opf_t opf;
        btrfs_bio_end_io_t end_io_func;
        struct writeback_control *wbc;
-
-       /*
-        * This is for metadata read, to provide the extra needed verification
-        * info.  This has to be provided for submit_one_bio(), as
-        * submit_one_bio() can submit a bio if it ends at stripe boundary.  If
-        * no such parent_check is provided, the metadata can hit false alert at
-        * endio time.
-        */
-       struct btrfs_tree_parent_check *parent_check;
-
-       /*
-        * Tell writepage not to lock the state bits for this range, it still
-        * does the unlocking.
-        */
-       bool extent_locked;
 };
 
 static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
 {
        struct btrfs_bio *bbio = bio_ctrl->bbio;
-       int mirror_num = bio_ctrl->mirror_num;
 
        if (!bbio)
                return;
@@ -132,25 +115,11 @@ static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
        /* Caller should ensure the bio has at least some range added */
        ASSERT(bbio->bio.bi_iter.bi_size);
 
-       if (!is_data_inode(&bbio->inode->vfs_inode)) {
-               if (btrfs_op(&bbio->bio) != BTRFS_MAP_WRITE) {
-                       /*
-                        * For metadata read, we should have the parent_check,
-                        * and copy it to bbio for metadata verification.
-                        */
-                       ASSERT(bio_ctrl->parent_check);
-                       memcpy(&bbio->parent_check,
-                              bio_ctrl->parent_check,
-                              sizeof(struct btrfs_tree_parent_check));
-               }
-               bbio->bio.bi_opf |= REQ_META;
-       }
-
        if (btrfs_op(&bbio->bio) == BTRFS_MAP_READ &&
            bio_ctrl->compress_type != BTRFS_COMPRESS_NONE)
-               btrfs_submit_compressed_read(bbio, mirror_num);
+               btrfs_submit_compressed_read(bbio);
        else
-               btrfs_submit_bio(bbio, mirror_num);
+               btrfs_submit_bio(bbio, 0);
 
        /* The bbio is owned by the end_io handler now */
        bio_ctrl->bbio = NULL;
@@ -248,8 +217,6 @@ static int process_one_page(struct btrfs_fs_info *fs_info,
 
        if (page_ops & PAGE_SET_ORDERED)
                btrfs_page_clamp_set_ordered(fs_info, page, start, len);
-       if (page_ops & PAGE_SET_ERROR)
-               btrfs_page_clamp_set_error(fs_info, page, start, len);
        if (page_ops & PAGE_START_WRITEBACK) {
                btrfs_page_clamp_clear_dirty(fs_info, page, start, len);
                btrfs_page_clamp_set_writeback(fs_info, page, start, len);
@@ -295,9 +262,6 @@ static int __process_pages_contig(struct address_space *mapping,
                ASSERT(processed_end && *processed_end == start);
        }
 
-       if ((page_ops & PAGE_SET_ERROR) && start_index <= end_index)
-               mapping_set_error(mapping, -EIO);
-
        folio_batch_init(&fbatch);
        while (index <= end_index) {
                int found_folios;
@@ -506,6 +470,15 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
                               start, end, page_ops, NULL);
 }
 
+static bool btrfs_verify_page(struct page *page, u64 start)
+{
+       if (!fsverity_active(page->mapping->host) ||
+           PageUptodate(page) ||
+           start >= i_size_read(page->mapping->host))
+               return true;
+       return fsverity_verify_page(page);
+}
+
 static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 {
        struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
@@ -513,20 +486,10 @@ static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
        ASSERT(page_offset(page) <= start &&
               start + len <= page_offset(page) + PAGE_SIZE);
 
-       if (uptodate) {
-               if (fsverity_active(page->mapping->host) &&
-                   !PageError(page) &&
-                   !PageUptodate(page) &&
-                   start < i_size_read(page->mapping->host) &&
-                   !fsverity_verify_page(page)) {
-                       btrfs_page_set_error(fs_info, page, start, len);
-               } else {
-                       btrfs_page_set_uptodate(fs_info, page, start, len);
-               }
-       } else {
+       if (uptodate && btrfs_verify_page(page, start))
+               btrfs_page_set_uptodate(fs_info, page, start, len);
+       else
                btrfs_page_clear_uptodate(fs_info, page, start, len);
-               btrfs_page_set_error(fs_info, page, start, len);
-       }
 
        if (!btrfs_is_subpage(fs_info, page))
                unlock_page(page);
@@ -554,7 +517,6 @@ void end_extent_writepage(struct page *page, int err, u64 start, u64 end)
                len = end + 1 - start;
 
                btrfs_page_clear_uptodate(fs_info, page, start, len);
-               btrfs_page_set_error(fs_info, page, start, len);
                ret = err < 0 ? err : -EIO;
                mapping_set_error(page->mapping, ret);
        }
@@ -574,8 +536,6 @@ static void end_bio_extent_writepage(struct btrfs_bio *bbio)
        struct bio *bio = &bbio->bio;
        int error = blk_status_to_errno(bio->bi_status);
        struct bio_vec *bvec;
-       u64 start;
-       u64 end;
        struct bvec_iter_all iter_all;
 
        ASSERT(!bio_flagged(bio, BIO_CLONED));
@@ -584,6 +544,8 @@ static void end_bio_extent_writepage(struct btrfs_bio *bbio)
                struct inode *inode = page->mapping->host;
                struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
                const u32 sectorsize = fs_info->sectorsize;
+               u64 start = page_offset(page) + bvec->bv_offset;
+               u32 len = bvec->bv_len;
 
                /* Our read/write should always be sector aligned. */
                if (!IS_ALIGNED(bvec->bv_offset, sectorsize))
@@ -595,12 +557,12 @@ static void end_bio_extent_writepage(struct btrfs_bio *bbio)
                "incomplete page write with offset %u and length %u",
                                   bvec->bv_offset, bvec->bv_len);
 
-               start = page_offset(page) + bvec->bv_offset;
-               end = start + bvec->bv_len - 1;
-
-               end_extent_writepage(page, error, start, end);
-
-               btrfs_page_clear_writeback(fs_info, page, start, bvec->bv_len);
+               btrfs_finish_ordered_extent(bbio->ordered, page, start, len, !error);
+               if (error) {
+                       btrfs_page_clear_uptodate(fs_info, page, start, len);
+                       mapping_set_error(page->mapping, error);
+               }
+               btrfs_page_clear_writeback(fs_info, page, start, len);
        }
 
        bio_put(bio);
@@ -686,35 +648,6 @@ static void begin_page_read(struct btrfs_fs_info *fs_info, struct page *page)
 }
 
 /*
- * Find extent buffer for a givne bytenr.
- *
- * This is for end_bio_extent_readpage(), thus we can't do any unsafe locking
- * in endio context.
- */
-static struct extent_buffer *find_extent_buffer_readpage(
-               struct btrfs_fs_info *fs_info, struct page *page, u64 bytenr)
-{
-       struct extent_buffer *eb;
-
-       /*
-        * For regular sectorsize, we can use page->private to grab extent
-        * buffer
-        */
-       if (fs_info->nodesize >= PAGE_SIZE) {
-               ASSERT(PagePrivate(page) && page->private);
-               return (struct extent_buffer *)page->private;
-       }
-
-       /* For subpage case, we need to lookup buffer radix tree */
-       rcu_read_lock();
-       eb = radix_tree_lookup(&fs_info->buffer_radix,
-                              bytenr >> fs_info->sectorsize_bits);
-       rcu_read_unlock();
-       ASSERT(eb);
-       return eb;
-}
-
-/*
  * after a readpage IO is done, we need to:
  * clear the uptodate bits on error
  * set the uptodate bits if things worked
@@ -735,7 +668,6 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
         * larger than UINT_MAX, u32 here is enough.
         */
        u32 bio_offset = 0;
-       int mirror;
        struct bvec_iter_all iter_all;
 
        ASSERT(!bio_flagged(bio, BIO_CLONED));
@@ -775,11 +707,6 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
                end = start + bvec->bv_len - 1;
                len = bvec->bv_len;
 
-               mirror = bbio->mirror_num;
-               if (uptodate && !is_data_inode(inode) &&
-                   btrfs_validate_metadata_buffer(bbio, page, start, end, mirror))
-                       uptodate = false;
-
                if (likely(uptodate)) {
                        loff_t i_size = i_size_read(inode);
                        pgoff_t end_index = i_size >> PAGE_SHIFT;
@@ -800,19 +727,12 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
                                zero_user_segment(page, zero_start,
                                                  offset_in_page(end) + 1);
                        }
-               } else if (!is_data_inode(inode)) {
-                       struct extent_buffer *eb;
-
-                       eb = find_extent_buffer_readpage(fs_info, page, start);
-                       set_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
-                       eb->read_mirror = mirror;
-                       atomic_dec(&eb->io_pages);
                }
 
                /* Update page status and unlock. */
                end_page_read(page, uptodate, start, len);
                endio_readpage_release_extent(&processed, BTRFS_I(inode),
-                                             start, end, PageUptodate(page));
+                                             start, end, uptodate);
 
                ASSERT(bio_offset + len > bio_offset);
                bio_offset += len;
@@ -906,13 +826,8 @@ static void alloc_new_bio(struct btrfs_inode *inode,
        bio_ctrl->bbio = bbio;
        bio_ctrl->len_to_oe_boundary = U32_MAX;
 
-       /*
-        * Limit the extent to the ordered boundary for Zone Append.
-        * Compressed bios aren't submitted directly, so it doesn't apply to
-        * them.
-        */
-       if (bio_ctrl->compress_type == BTRFS_COMPRESS_NONE &&
-           btrfs_use_zone_append(bbio)) {
+       /* Limit data write bios to the ordered boundary. */
+       if (bio_ctrl->wbc) {
                struct btrfs_ordered_extent *ordered;
 
                ordered = btrfs_lookup_ordered_extent(inode, file_offset);
@@ -920,11 +835,9 @@ static void alloc_new_bio(struct btrfs_inode *inode,
                        bio_ctrl->len_to_oe_boundary = min_t(u32, U32_MAX,
                                        ordered->file_offset +
                                        ordered->disk_num_bytes - file_offset);
-                       btrfs_put_ordered_extent(ordered);
+                       bbio->ordered = ordered;
                }
-       }
 
-       if (bio_ctrl->wbc) {
                /*
                 * Pick the last added device to support cgroup writeback.  For
                 * multi-device file systems this means blk-cgroup policies have
@@ -1125,7 +1038,6 @@ static int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
        ret = set_page_extent_mapped(page);
        if (ret < 0) {
                unlock_extent(tree, start, end, NULL);
-               btrfs_page_set_error(fs_info, page, start, PAGE_SIZE);
                unlock_page(page);
                return ret;
        }
@@ -1329,11 +1241,9 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
                }
                ret = btrfs_run_delalloc_range(inode, page, delalloc_start,
                                delalloc_end, &page_started, &nr_written, wbc);
-               if (ret) {
-                       btrfs_page_set_error(inode->root->fs_info, page,
-                                            page_offset(page), PAGE_SIZE);
+               if (ret)
                        return ret;
-               }
+
                /*
                 * delalloc_end is already one less than the total length, so
                 * we don't subtract one from PAGE_SIZE
@@ -1438,7 +1348,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
        struct extent_map *em;
        int ret = 0;
        int nr = 0;
-       bool compressed;
 
        ret = btrfs_writepage_cow_fixup(page);
        if (ret) {
@@ -1448,12 +1357,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
                return 1;
        }
 
-       /*
-        * we don't want to touch the inode after unlocking the page,
-        * so we update the mapping writeback index now
-        */
-       bio_ctrl->wbc->nr_to_write--;
-
        bio_ctrl->end_io_func = end_bio_extent_writepage;
        while (cur <= end) {
                u64 disk_bytenr;
@@ -1486,7 +1389,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 
                em = btrfs_get_extent(inode, NULL, 0, cur, end - cur + 1);
                if (IS_ERR(em)) {
-                       btrfs_page_set_error(fs_info, page, cur, end - cur + 1);
                        ret = PTR_ERR_OR_ZERO(em);
                        goto out_error;
                }
@@ -1497,10 +1399,14 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
                ASSERT(cur < end);
                ASSERT(IS_ALIGNED(em->start, fs_info->sectorsize));
                ASSERT(IS_ALIGNED(em->len, fs_info->sectorsize));
+
                block_start = em->block_start;
-               compressed = test_bit(EXTENT_FLAG_COMPRESSED, &em->flags);
                disk_bytenr = em->block_start + extent_offset;
 
+               ASSERT(!test_bit(EXTENT_FLAG_COMPRESSED, &em->flags));
+               ASSERT(block_start != EXTENT_MAP_HOLE);
+               ASSERT(block_start != EXTENT_MAP_INLINE);
+
                /*
                 * Note that em_end from extent_map_end() and dirty_range_end from
                 * find_next_dirty_byte() are all exclusive
@@ -1509,22 +1415,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
                free_extent_map(em);
                em = NULL;
 
-               /*
-                * compressed and inline extents are written through other
-                * paths in the FS
-                */
-               if (compressed || block_start == EXTENT_MAP_HOLE ||
-                   block_start == EXTENT_MAP_INLINE) {
-                       if (compressed)
-                               nr++;
-                       else
-                               btrfs_writepage_endio_finish_ordered(inode,
-                                               page, cur, cur + iosize - 1, true);
-                       btrfs_page_clear_dirty(fs_info, page, cur, iosize);
-                       cur += iosize;
-                       continue;
-               }
-
                btrfs_set_range_writeback(inode, cur, cur + iosize - 1);
                if (!PageWriteback(page)) {
                        btrfs_err(inode->root->fs_info,
@@ -1572,7 +1462,6 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
 {
        struct folio *folio = page_folio(page);
        struct inode *inode = page->mapping->host;
-       struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
        const u64 page_start = page_offset(page);
        const u64 page_end = page_start + PAGE_SIZE - 1;
        int ret;
@@ -1585,9 +1474,6 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
 
        WARN_ON(!PageLocked(page));
 
-       btrfs_page_clear_error(btrfs_sb(inode->i_sb), page,
-                              page_offset(page), PAGE_SIZE);
-
        pg_offset = offset_in_page(i_size);
        if (page->index > end_index ||
           (page->index == end_index && !pg_offset)) {
@@ -1600,77 +1486,30 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
                memzero_page(page, pg_offset, PAGE_SIZE - pg_offset);
 
        ret = set_page_extent_mapped(page);
-       if (ret < 0) {
-               SetPageError(page);
+       if (ret < 0)
                goto done;
-       }
 
-       if (!bio_ctrl->extent_locked) {
-               ret = writepage_delalloc(BTRFS_I(inode), page, bio_ctrl->wbc);
-               if (ret == 1)
-                       return 0;
-               if (ret)
-                       goto done;
-       }
+       ret = writepage_delalloc(BTRFS_I(inode), page, bio_ctrl->wbc);
+       if (ret == 1)
+               return 0;
+       if (ret)
+               goto done;
 
        ret = __extent_writepage_io(BTRFS_I(inode), page, bio_ctrl, i_size, &nr);
        if (ret == 1)
                return 0;
 
+       bio_ctrl->wbc->nr_to_write--;
+
 done:
        if (nr == 0) {
                /* make sure the mapping tag for page dirty gets cleared */
                set_page_writeback(page);
                end_page_writeback(page);
        }
-       /*
-        * Here we used to have a check for PageError() and then set @ret and
-        * call end_extent_writepage().
-        *
-        * But in fact setting @ret here will cause different error paths
-        * between subpage and regular sectorsize.
-        *
-        * For regular page size, we never submit current page, but only add
-        * current page to current bio.
-        * The bio submission can only happen in next page.
-        * Thus if we hit the PageError() branch, @ret is already set to
-        * non-zero value and will not get updated for regular sectorsize.
-        *
-        * But for subpage case, it's possible we submit part of current page,
-        * thus can get PageError() set by submitted bio of the same page,
-        * while our @ret is still 0.
-        *
-        * So here we unify the behavior and don't set @ret.
-        * Error can still be properly passed to higher layer as page will
-        * be set error, here we just don't handle the IO failure.
-        *
-        * NOTE: This is just a hotfix for subpage.
-        * The root fix will be properly ending ordered extent when we hit
-        * an error during writeback.
-        *
-        * But that needs a bigger refactoring, as we not only need to grab the
-        * submitted OE, but also need to know exactly at which bytenr we hit
-        * the error.
-        * Currently the full page based __extent_writepage_io() is not
-        * capable of that.
-        */
-       if (PageError(page))
+       if (ret)
                end_extent_writepage(page, ret, page_start, page_end);
-       if (bio_ctrl->extent_locked) {
-               struct writeback_control *wbc = bio_ctrl->wbc;
-
-               /*
-                * If bio_ctrl->extent_locked, it's from extent_write_locked_range(),
-                * the page can either be locked by lock_page() or
-                * process_one_page().
-                * Let btrfs_page_unlock_writer() handle both cases.
-                */
-               ASSERT(wbc);
-               btrfs_page_unlock_writer(fs_info, page, wbc->range_start,
-                                        wbc->range_end + 1 - wbc->range_start);
-       } else {
-               unlock_page(page);
-       }
+       unlock_page(page);
        ASSERT(ret <= 0);
        return ret;
 }
@@ -1681,52 +1520,26 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb)
                       TASK_UNINTERRUPTIBLE);
 }
 
-static void end_extent_buffer_writeback(struct extent_buffer *eb)
-{
-       clear_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags);
-       smp_mb__after_atomic();
-       wake_up_bit(&eb->bflags, EXTENT_BUFFER_WRITEBACK);
-}
-
 /*
  * Lock extent buffer status and pages for writeback.
  *
- * May try to flush write bio if we can't get the lock.
- *
- * Return  0 if the extent buffer doesn't need to be submitted.
- *           (E.g. the extent buffer is not dirty)
- * Return >0 is the extent buffer is submitted to bio.
- * Return <0 if something went wrong, no page is locked.
+ * Return %false if the extent buffer doesn't need to be submitted (e.g. the
+ * extent buffer is not dirty)
+ * Return %true if the extent buffer is submitted to bio.
  */
-static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb,
-                         struct btrfs_bio_ctrl *bio_ctrl)
+static noinline_for_stack bool lock_extent_buffer_for_io(struct extent_buffer *eb,
+                         struct writeback_control *wbc)
 {
        struct btrfs_fs_info *fs_info = eb->fs_info;
-       int i, num_pages;
-       int flush = 0;
-       int ret = 0;
+       bool ret = false;
 
-       if (!btrfs_try_tree_write_lock(eb)) {
-               submit_write_bio(bio_ctrl, 0);
-               flush = 1;
-               btrfs_tree_lock(eb);
-       }
-
-       if (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags)) {
+       btrfs_tree_lock(eb);
+       while (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags)) {
                btrfs_tree_unlock(eb);
-               if (bio_ctrl->wbc->sync_mode != WB_SYNC_ALL)
-                       return 0;
-               if (!flush) {
-                       submit_write_bio(bio_ctrl, 0);
-                       flush = 1;
-               }
-               while (1) {
-                       wait_on_extent_buffer_writeback(eb);
-                       btrfs_tree_lock(eb);
-                       if (!test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags))
-                               break;
-                       btrfs_tree_unlock(eb);
-               }
+               if (wbc->sync_mode != WB_SYNC_ALL)
+                       return false;
+               wait_on_extent_buffer_writeback(eb);
+               btrfs_tree_lock(eb);
        }
 
        /*
@@ -1742,45 +1555,19 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
                percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
                                         -eb->len,
                                         fs_info->dirty_metadata_batch);
-               ret = 1;
+               ret = true;
        } else {
                spin_unlock(&eb->refs_lock);
        }
-
        btrfs_tree_unlock(eb);
-
-       /*
-        * Either we don't need to submit any tree block, or we're submitting
-        * subpage eb.
-        * Subpage metadata doesn't use page locking at all, so we can skip
-        * the page locking.
-        */
-       if (!ret || fs_info->nodesize < PAGE_SIZE)
-               return ret;
-
-       num_pages = num_extent_pages(eb);
-       for (i = 0; i < num_pages; i++) {
-               struct page *p = eb->pages[i];
-
-               if (!trylock_page(p)) {
-                       if (!flush) {
-                               submit_write_bio(bio_ctrl, 0);
-                               flush = 1;
-                       }
-                       lock_page(p);
-               }
-       }
-
        return ret;
 }
 
-static void set_btree_ioerr(struct page *page, struct extent_buffer *eb)
+static void set_btree_ioerr(struct extent_buffer *eb)
 {
        struct btrfs_fs_info *fs_info = eb->fs_info;
 
-       btrfs_page_set_error(fs_info, page, eb->start, eb->len);
-       if (test_and_set_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags))
-               return;
+       set_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags);
 
        /*
         * A read may stumble upon this buffer later, make sure that it gets an
@@ -1794,7 +1581,7 @@ static void set_btree_ioerr(struct page *page, struct extent_buffer *eb)
         * return a 0 because we are readonly if we don't modify the err seq for
         * the superblock.
         */
-       mapping_set_error(page->mapping, -EIO);
+       mapping_set_error(eb->fs_info->btree_inode->i_mapping, -EIO);
 
        /*
         * If writeback for a btree extent that doesn't belong to a log tree
@@ -1869,101 +1656,34 @@ static struct extent_buffer *find_extent_buffer_nolock(
        return NULL;
 }
 
-/*
- * The endio function for subpage extent buffer write.
- *
- * Unlike end_bio_extent_buffer_writepage(), we only call end_page_writeback()
- * after all extent buffers in the page has finished their writeback.
- */
-static void end_bio_subpage_eb_writepage(struct btrfs_bio *bbio)
+static void extent_buffer_write_end_io(struct btrfs_bio *bbio)
 {
-       struct bio *bio = &bbio->bio;
-       struct btrfs_fs_info *fs_info;
-       struct bio_vec *bvec;
+       struct extent_buffer *eb = bbio->private;
+       struct btrfs_fs_info *fs_info = eb->fs_info;
+       bool uptodate = !bbio->bio.bi_status;
        struct bvec_iter_all iter_all;
+       struct bio_vec *bvec;
+       u32 bio_offset = 0;
 
-       fs_info = btrfs_sb(bio_first_page_all(bio)->mapping->host->i_sb);
-       ASSERT(fs_info->nodesize < PAGE_SIZE);
+       if (!uptodate)
+               set_btree_ioerr(eb);
 
-       ASSERT(!bio_flagged(bio, BIO_CLONED));
-       bio_for_each_segment_all(bvec, bio, iter_all) {
+       bio_for_each_segment_all(bvec, &bbio->bio, iter_all) {
+               u64 start = eb->start + bio_offset;
                struct page *page = bvec->bv_page;
-               u64 bvec_start = page_offset(page) + bvec->bv_offset;
-               u64 bvec_end = bvec_start + bvec->bv_len - 1;
-               u64 cur_bytenr = bvec_start;
-
-               ASSERT(IS_ALIGNED(bvec->bv_len, fs_info->nodesize));
-
-               /* Iterate through all extent buffers in the range */
-               while (cur_bytenr <= bvec_end) {
-                       struct extent_buffer *eb;
-                       int done;
-
-                       /*
-                        * Here we can't use find_extent_buffer(), as it may
-                        * try to lock eb->refs_lock, which is not safe in endio
-                        * context.
-                        */
-                       eb = find_extent_buffer_nolock(fs_info, cur_bytenr);
-                       ASSERT(eb);
-
-                       cur_bytenr = eb->start + eb->len;
-
-                       ASSERT(test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags));
-                       done = atomic_dec_and_test(&eb->io_pages);
-                       ASSERT(done);
-
-                       if (bio->bi_status ||
-                           test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)) {
-                               ClearPageUptodate(page);
-                               set_btree_ioerr(page, eb);
-                       }
+               u32 len = bvec->bv_len;
 
-                       btrfs_subpage_clear_writeback(fs_info, page, eb->start,
-                                                     eb->len);
-                       end_extent_buffer_writeback(eb);
-                       /*
-                        * free_extent_buffer() will grab spinlock which is not
-                        * safe in endio context. Thus here we manually dec
-                        * the ref.
-                        */
-                       atomic_dec(&eb->refs);
-               }
+               if (!uptodate)
+                       btrfs_page_clear_uptodate(fs_info, page, start, len);
+               btrfs_page_clear_writeback(fs_info, page, start, len);
+               bio_offset += len;
        }
-       bio_put(bio);
-}
 
-static void end_bio_extent_buffer_writepage(struct btrfs_bio *bbio)
-{
-       struct bio *bio = &bbio->bio;
-       struct bio_vec *bvec;
-       struct extent_buffer *eb;
-       int done;
-       struct bvec_iter_all iter_all;
-
-       ASSERT(!bio_flagged(bio, BIO_CLONED));
-       bio_for_each_segment_all(bvec, bio, iter_all) {
-               struct page *page = bvec->bv_page;
-
-               eb = (struct extent_buffer *)page->private;
-               BUG_ON(!eb);
-               done = atomic_dec_and_test(&eb->io_pages);
-
-               if (bio->bi_status ||
-                   test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)) {
-                       ClearPageUptodate(page);
-                       set_btree_ioerr(page, eb);
-               }
-
-               end_page_writeback(page);
-
-               if (!done)
-                       continue;
-
-               end_extent_buffer_writeback(eb);
-       }
+       clear_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags);
+       smp_mb__after_atomic();
+       wake_up_bit(&eb->bflags, EXTENT_BUFFER_WRITEBACK);
 
-       bio_put(bio);
+       bio_put(&bbio->bio);
 }
 
 static void prepare_eb_write(struct extent_buffer *eb)
@@ -1973,7 +1693,6 @@ static void prepare_eb_write(struct extent_buffer *eb)
        unsigned long end;
 
        clear_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags);
-       atomic_set(&eb->io_pages, num_extent_pages(eb));
 
        /* Set btree blocks beyond nritems with 0 to avoid stale content */
        nritems = btrfs_header_nritems(eb);
@@ -1995,63 +1714,49 @@ static void prepare_eb_write(struct extent_buffer *eb)
        }
 }
 
-/*
- * Unlike the work in write_one_eb(), we rely completely on extent locking.
- * Page locking is only utilized at minimum to keep the VMM code happy.
- */
-static void write_one_subpage_eb(struct extent_buffer *eb,
-                                struct btrfs_bio_ctrl *bio_ctrl)
-{
-       struct btrfs_fs_info *fs_info = eb->fs_info;
-       struct page *page = eb->pages[0];
-       bool no_dirty_ebs = false;
-
-       prepare_eb_write(eb);
-
-       /* clear_page_dirty_for_io() in subpage helper needs page locked */
-       lock_page(page);
-       btrfs_subpage_set_writeback(fs_info, page, eb->start, eb->len);
-
-       /* Check if this is the last dirty bit to update nr_written */
-       no_dirty_ebs = btrfs_subpage_clear_and_test_dirty(fs_info, page,
-                                                         eb->start, eb->len);
-       if (no_dirty_ebs)
-               clear_page_dirty_for_io(page);
-
-       bio_ctrl->end_io_func = end_bio_subpage_eb_writepage;
-
-       submit_extent_page(bio_ctrl, eb->start, page, eb->len,
-                          eb->start - page_offset(page));
-       unlock_page(page);
-       /*
-        * Submission finished without problem, if no range of the page is
-        * dirty anymore, we have submitted a page.  Update nr_written in wbc.
-        */
-       if (no_dirty_ebs)
-               bio_ctrl->wbc->nr_to_write--;
-}
-
 static noinline_for_stack void write_one_eb(struct extent_buffer *eb,
-                       struct btrfs_bio_ctrl *bio_ctrl)
+                                           struct writeback_control *wbc)
 {
-       u64 disk_bytenr = eb->start;
-       int i, num_pages;
+       struct btrfs_fs_info *fs_info = eb->fs_info;
+       struct btrfs_bio *bbio;
 
        prepare_eb_write(eb);
 
-       bio_ctrl->end_io_func = end_bio_extent_buffer_writepage;
-
-       num_pages = num_extent_pages(eb);
-       for (i = 0; i < num_pages; i++) {
-               struct page *p = eb->pages[i];
-
-               clear_page_dirty_for_io(p);
-               set_page_writeback(p);
-               submit_extent_page(bio_ctrl, disk_bytenr, p, PAGE_SIZE, 0);
-               disk_bytenr += PAGE_SIZE;
-               bio_ctrl->wbc->nr_to_write--;
+       bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
+                              REQ_OP_WRITE | REQ_META | wbc_to_write_flags(wbc),
+                              eb->fs_info, extent_buffer_write_end_io, eb);
+       bbio->bio.bi_iter.bi_sector = eb->start >> SECTOR_SHIFT;
+       bio_set_dev(&bbio->bio, fs_info->fs_devices->latest_dev->bdev);
+       wbc_init_bio(wbc, &bbio->bio);
+       bbio->inode = BTRFS_I(eb->fs_info->btree_inode);
+       bbio->file_offset = eb->start;
+       if (fs_info->nodesize < PAGE_SIZE) {
+               struct page *p = eb->pages[0];
+
+               lock_page(p);
+               btrfs_subpage_set_writeback(fs_info, p, eb->start, eb->len);
+               if (btrfs_subpage_clear_and_test_dirty(fs_info, p, eb->start,
+                                                      eb->len)) {
+                       clear_page_dirty_for_io(p);
+                       wbc->nr_to_write--;
+               }
+               __bio_add_page(&bbio->bio, p, eb->len, eb->start - page_offset(p));
+               wbc_account_cgroup_owner(wbc, p, eb->len);
                unlock_page(p);
+       } else {
+               for (int i = 0; i < num_extent_pages(eb); i++) {
+                       struct page *p = eb->pages[i];
+
+                       lock_page(p);
+                       clear_page_dirty_for_io(p);
+                       set_page_writeback(p);
+                       __bio_add_page(&bbio->bio, p, PAGE_SIZE, 0);
+                       wbc_account_cgroup_owner(wbc, p, PAGE_SIZE);
+                       wbc->nr_to_write--;
+                       unlock_page(p);
+               }
        }
+       btrfs_submit_bio(bbio, 0);
 }
 
 /*
@@ -2068,14 +1773,13 @@ static noinline_for_stack void write_one_eb(struct extent_buffer *eb,
  * Return >=0 for the number of submitted extent buffers.
  * Return <0 for fatal error.
  */
-static int submit_eb_subpage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl)
+static int submit_eb_subpage(struct page *page, struct writeback_control *wbc)
 {
        struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
        int submitted = 0;
        u64 page_start = page_offset(page);
        int bit_start = 0;
        int sectors_per_node = fs_info->nodesize >> fs_info->sectorsize_bits;
-       int ret;
 
        /* Lock and write each dirty extent buffers in the range */
        while (bit_start < fs_info->subpage_info->bitmap_nr_bits) {
@@ -2121,25 +1825,13 @@ static int submit_eb_subpage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl)
                if (!eb)
                        continue;
 
-               ret = lock_extent_buffer_for_io(eb, bio_ctrl);
-               if (ret == 0) {
-                       free_extent_buffer(eb);
-                       continue;
+               if (lock_extent_buffer_for_io(eb, wbc)) {
+                       write_one_eb(eb, wbc);
+                       submitted++;
                }
-               if (ret < 0) {
-                       free_extent_buffer(eb);
-                       goto cleanup;
-               }
-               write_one_subpage_eb(eb, bio_ctrl);
                free_extent_buffer(eb);
-               submitted++;
        }
        return submitted;
-
-cleanup:
-       /* We hit error, end bio for the submitted extent buffers */
-       submit_write_bio(bio_ctrl, ret);
-       return ret;
 }
 
 /*
@@ -2162,7 +1854,7 @@ cleanup:
  * previous call.
  * Return <0 for fatal error.
  */
-static int submit_eb_page(struct page *page, struct btrfs_bio_ctrl *bio_ctrl,
+static int submit_eb_page(struct page *page, struct writeback_control *wbc,
                          struct extent_buffer **eb_context)
 {
        struct address_space *mapping = page->mapping;
@@ -2174,7 +1866,7 @@ static int submit_eb_page(struct page *page, struct btrfs_bio_ctrl *bio_ctrl,
                return 0;
 
        if (btrfs_sb(page->mapping->host->i_sb)->nodesize < PAGE_SIZE)
-               return submit_eb_subpage(page, bio_ctrl);
+               return submit_eb_subpage(page, wbc);
 
        spin_lock(&mapping->private_lock);
        if (!PagePrivate(page)) {
@@ -2207,8 +1899,7 @@ static int submit_eb_page(struct page *page, struct btrfs_bio_ctrl *bio_ctrl,
                 * If for_sync, this hole will be filled with
                 * transaction commit.
                 */
-               if (bio_ctrl->wbc->sync_mode == WB_SYNC_ALL &&
-                   !bio_ctrl->wbc->for_sync)
+               if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync)
                        ret = -EAGAIN;
                else
                        ret = 0;
@@ -2218,13 +1909,12 @@ static int submit_eb_page(struct page *page, struct btrfs_bio_ctrl *bio_ctrl,
 
        *eb_context = eb;
 
-       ret = lock_extent_buffer_for_io(eb, bio_ctrl);
-       if (ret <= 0) {
+       if (!lock_extent_buffer_for_io(eb, wbc)) {
                btrfs_revert_meta_write_pointer(cache, eb);
                if (cache)
                        btrfs_put_block_group(cache);
                free_extent_buffer(eb);
-               return ret;
+               return 0;
        }
        if (cache) {
                /*
@@ -2233,7 +1923,7 @@ static int submit_eb_page(struct page *page, struct btrfs_bio_ctrl *bio_ctrl,
                btrfs_schedule_zone_finish_bg(cache, eb);
                btrfs_put_block_group(cache);
        }
-       write_one_eb(eb, bio_ctrl);
+       write_one_eb(eb, wbc);
        free_extent_buffer(eb);
        return 1;
 }
@@ -2242,11 +1932,6 @@ int btree_write_cache_pages(struct address_space *mapping,
                                   struct writeback_control *wbc)
 {
        struct extent_buffer *eb_context = NULL;
-       struct btrfs_bio_ctrl bio_ctrl = {
-               .wbc = wbc,
-               .opf = REQ_OP_WRITE | wbc_to_write_flags(wbc),
-               .extent_locked = 0,
-       };
        struct btrfs_fs_info *fs_info = BTRFS_I(mapping->host)->root->fs_info;
        int ret = 0;
        int done = 0;
@@ -2288,7 +1973,7 @@ retry:
                for (i = 0; i < nr_folios; i++) {
                        struct folio *folio = fbatch.folios[i];
 
-                       ret = submit_eb_page(&folio->page, &bio_ctrl, &eb_context);
+                       ret = submit_eb_page(&folio->page, wbc, &eb_context);
                        if (ret == 0)
                                continue;
                        if (ret < 0) {
@@ -2349,8 +2034,6 @@ retry:
                ret = 0;
        if (!ret && BTRFS_FS_ERROR(fs_info))
                ret = -EROFS;
-       submit_write_bio(&bio_ctrl, ret);
-
        btrfs_zoned_meta_io_unlock(fs_info);
        return ret;
 }
@@ -2520,38 +2203,31 @@ retry:
 * already been run (aka, ordered extent inserted) and all pages are still
  * locked.
  */
-int extent_write_locked_range(struct inode *inode, u64 start, u64 end)
+int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
+                             struct writeback_control *wbc)
 {
        bool found_error = false;
        int first_error = 0;
        int ret = 0;
        struct address_space *mapping = inode->i_mapping;
-       struct page *page;
+       struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+       const u32 sectorsize = fs_info->sectorsize;
+       loff_t i_size = i_size_read(inode);
        u64 cur = start;
-       unsigned long nr_pages;
-       const u32 sectorsize = btrfs_sb(inode->i_sb)->sectorsize;
-       struct writeback_control wbc_writepages = {
-               .sync_mode      = WB_SYNC_ALL,
-               .range_start    = start,
-               .range_end      = end + 1,
-               .no_cgroup_owner = 1,
-       };
        struct btrfs_bio_ctrl bio_ctrl = {
-               .wbc = &wbc_writepages,
-               /* We're called from an async helper function */
-               .opf = REQ_OP_WRITE | REQ_BTRFS_CGROUP_PUNT |
-                       wbc_to_write_flags(&wbc_writepages),
-               .extent_locked = 1,
+               .wbc = wbc,
+               .opf = REQ_OP_WRITE | wbc_to_write_flags(wbc),
        };
 
+       if (wbc->no_cgroup_owner)
+               bio_ctrl.opf |= REQ_BTRFS_CGROUP_PUNT;
+
        ASSERT(IS_ALIGNED(start, sectorsize) && IS_ALIGNED(end + 1, sectorsize));
-       nr_pages = (round_up(end, PAGE_SIZE) - round_down(start, PAGE_SIZE)) >>
-                  PAGE_SHIFT;
-       wbc_writepages.nr_to_write = nr_pages * 2;
 
-       wbc_attach_fdatawrite_inode(&wbc_writepages, inode);
        while (cur <= end) {
                u64 cur_end = min(round_down(cur, PAGE_SIZE) + PAGE_SIZE - 1, end);
+               struct page *page;
+               int nr = 0;
 
                page = find_get_page(mapping, cur >> PAGE_SHIFT);
                /*
@@ -2562,19 +2238,31 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end)
                ASSERT(PageLocked(page));
                ASSERT(PageDirty(page));
                clear_page_dirty_for_io(page);
-               ret = __extent_writepage(page, &bio_ctrl);
-               ASSERT(ret <= 0);
+
+               ret = __extent_writepage_io(BTRFS_I(inode), page, &bio_ctrl,
+                                           i_size, &nr);
+               if (ret == 1)
+                       goto next_page;
+
+               /* Make sure the mapping tag for page dirty gets cleared. */
+               if (nr == 0) {
+                       set_page_writeback(page);
+                       end_page_writeback(page);
+               }
+               if (ret)
+                       end_extent_writepage(page, ret, cur, cur_end);
+               btrfs_page_unlock_writer(fs_info, page, cur, cur_end + 1 - cur);
                if (ret < 0) {
                        found_error = true;
                        first_error = ret;
                }
+next_page:
                put_page(page);
                cur = cur_end + 1;
        }
 
        submit_write_bio(&bio_ctrl, found_error ? ret : 0);
 
-       wbc_detach_inode(&wbc_writepages);
        if (found_error)
                return first_error;
        return ret;
@@ -2588,7 +2276,6 @@ int extent_writepages(struct address_space *mapping,
        struct btrfs_bio_ctrl bio_ctrl = {
                .wbc = wbc,
                .opf = REQ_OP_WRITE | wbc_to_write_flags(wbc),
-               .extent_locked = 0,
        };
 
        /*
@@ -2679,8 +2366,7 @@ static int try_release_extent_state(struct extent_io_tree *tree,
                 * The delalloc new bit will be cleared by ordered extent
                 * completion.
                 */
-               ret = __clear_extent_bit(tree, start, end, clear_bits, NULL,
-                                        mask, NULL);
+               ret = __clear_extent_bit(tree, start, end, clear_bits, NULL, NULL);
 
                /* if clear_extent_bit failed for enomem reasons,
                 * we can't allow the release to continue.
@@ -3421,10 +3107,9 @@ static void __free_extent_buffer(struct extent_buffer *eb)
        kmem_cache_free(extent_buffer_cache, eb);
 }
 
-int extent_buffer_under_io(const struct extent_buffer *eb)
+static int extent_buffer_under_io(const struct extent_buffer *eb)
 {
-       return (atomic_read(&eb->io_pages) ||
-               test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags) ||
+       return (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags) ||
                test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
 }
 
@@ -3557,11 +3242,9 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
        init_rwsem(&eb->lock);
 
        btrfs_leak_debug_add_eb(eb);
-       INIT_LIST_HEAD(&eb->release_list);
 
        spin_lock_init(&eb->refs_lock);
        atomic_set(&eb->refs, 1);
-       atomic_set(&eb->io_pages, 0);
 
        ASSERT(len <= BTRFS_MAX_METADATA_BLOCKSIZE);
 
@@ -3678,9 +3361,9 @@ static void check_buffer_tree_ref(struct extent_buffer *eb)
         * adequately protected by the refcount, but the TREE_REF bit and
         * its corresponding reference are not. To protect against this
         * class of races, we call check_buffer_tree_ref from the codepaths
-        * which trigger io after they set eb->io_pages. Note that once io is
-        * initiated, TREE_REF can no longer be cleared, so that is the
-        * moment at which any such race is best fixed.
+        * which trigger io. Note that once io is initiated, TREE_REF can no
+        * longer be cleared, so that is the moment at which any such race is
+        * best fixed.
         */
        refs = atomic_read(&eb->refs);
        if (refs >= 2 && test_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags))
@@ -3939,7 +3622,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 
                WARN_ON(btrfs_page_test_dirty(fs_info, p, eb->start, eb->len));
                eb->pages[i] = p;
-               if (!PageUptodate(p))
+               if (!btrfs_page_test_uptodate(fs_info, p, eb->start, eb->len))
                        uptodate = 0;
 
                /*
@@ -4142,13 +3825,12 @@ void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
                        continue;
                lock_page(page);
                btree_clear_page_dirty(page);
-               ClearPageError(page);
                unlock_page(page);
        }
        WARN_ON(atomic_read(&eb->refs) == 0);
 }
 
-bool set_extent_buffer_dirty(struct extent_buffer *eb)
+void set_extent_buffer_dirty(struct extent_buffer *eb)
 {
        int i;
        int num_pages;
@@ -4183,13 +3865,14 @@ bool set_extent_buffer_dirty(struct extent_buffer *eb)
                                             eb->start, eb->len);
                if (subpage)
                        unlock_page(eb->pages[0]);
+               percpu_counter_add_batch(&eb->fs_info->dirty_metadata_bytes,
+                                        eb->len,
+                                        eb->fs_info->dirty_metadata_batch);
        }
 #ifdef CONFIG_BTRFS_DEBUG
        for (i = 0; i < num_pages; i++)
                ASSERT(PageDirty(eb->pages[i]));
 #endif
-
-       return was_dirty;
 }
 
 void clear_extent_buffer_uptodate(struct extent_buffer *eb)
@@ -4242,84 +3925,54 @@ void set_extent_buffer_uptodate(struct extent_buffer *eb)
        }
 }
 
-static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
-                                     int mirror_num,
-                                     struct btrfs_tree_parent_check *check)
+static void extent_buffer_read_end_io(struct btrfs_bio *bbio)
 {
+       struct extent_buffer *eb = bbio->private;
        struct btrfs_fs_info *fs_info = eb->fs_info;
-       struct extent_io_tree *io_tree;
-       struct page *page = eb->pages[0];
-       struct extent_state *cached_state = NULL;
-       struct btrfs_bio_ctrl bio_ctrl = {
-               .opf = REQ_OP_READ,
-               .mirror_num = mirror_num,
-               .parent_check = check,
-       };
-       int ret;
+       bool uptodate = !bbio->bio.bi_status;
+       struct bvec_iter_all iter_all;
+       struct bio_vec *bvec;
+       u32 bio_offset = 0;
 
-       ASSERT(!test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags));
-       ASSERT(PagePrivate(page));
-       ASSERT(check);
-       io_tree = &BTRFS_I(fs_info->btree_inode)->io_tree;
+       eb->read_mirror = bbio->mirror_num;
 
-       if (wait == WAIT_NONE) {
-               if (!try_lock_extent(io_tree, eb->start, eb->start + eb->len - 1,
-                                    &cached_state))
-                       return -EAGAIN;
-       } else {
-               ret = lock_extent(io_tree, eb->start, eb->start + eb->len - 1,
-                                 &cached_state);
-               if (ret < 0)
-                       return ret;
-       }
+       if (uptodate &&
+           btrfs_validate_extent_buffer(eb, &bbio->parent_check) < 0)
+               uptodate = false;
 
-       if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags) ||
-           PageUptodate(page) ||
-           btrfs_subpage_test_uptodate(fs_info, page, eb->start, eb->len)) {
-               set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
-               unlock_extent(io_tree, eb->start, eb->start + eb->len - 1,
-                             &cached_state);
-               return 0;
+       if (uptodate) {
+               set_extent_buffer_uptodate(eb);
+       } else {
+               clear_extent_buffer_uptodate(eb);
+               set_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
        }
 
-       clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
-       eb->read_mirror = 0;
-       atomic_set(&eb->io_pages, 1);
-       check_buffer_tree_ref(eb);
-       bio_ctrl.end_io_func = end_bio_extent_readpage;
+       bio_for_each_segment_all(bvec, &bbio->bio, iter_all) {
+               u64 start = eb->start + bio_offset;
+               struct page *page = bvec->bv_page;
+               u32 len = bvec->bv_len;
 
-       btrfs_subpage_clear_error(fs_info, page, eb->start, eb->len);
+               if (uptodate)
+                       btrfs_page_set_uptodate(fs_info, page, start, len);
+               else
+                       btrfs_page_clear_uptodate(fs_info, page, start, len);
 
-       btrfs_subpage_start_reader(fs_info, page, eb->start, eb->len);
-       submit_extent_page(&bio_ctrl, eb->start, page, eb->len,
-                          eb->start - page_offset(page));
-       submit_one_bio(&bio_ctrl);
-       if (wait != WAIT_COMPLETE) {
-               free_extent_state(cached_state);
-               return 0;
+               bio_offset += len;
        }
 
-       wait_extent_bit(io_tree, eb->start, eb->start + eb->len - 1,
-                       EXTENT_LOCKED, &cached_state);
-       if (!test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
-               return -EIO;
-       return 0;
+       clear_bit(EXTENT_BUFFER_READING, &eb->bflags);
+       smp_mb__after_atomic();
+       wake_up_bit(&eb->bflags, EXTENT_BUFFER_READING);
+       free_extent_buffer(eb);
+
+       bio_put(&bbio->bio);
 }
 
 int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
                             struct btrfs_tree_parent_check *check)
 {
-       int i;
-       struct page *page;
-       int locked_pages = 0;
-       int all_uptodate = 1;
-       int num_pages;
-       unsigned long num_reads = 0;
-       struct btrfs_bio_ctrl bio_ctrl = {
-               .opf = REQ_OP_READ,
-               .mirror_num = mirror_num,
-               .parent_check = check,
-       };
+       int num_pages = num_extent_pages(eb), i;
+       struct btrfs_bio *bbio;
 
        if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
                return 0;
@@ -4332,87 +3985,39 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
        if (unlikely(test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)))
                return -EIO;
 
-       if (eb->fs_info->nodesize < PAGE_SIZE)
-               return read_extent_buffer_subpage(eb, wait, mirror_num, check);
-
-       num_pages = num_extent_pages(eb);
-       for (i = 0; i < num_pages; i++) {
-               page = eb->pages[i];
-               if (wait == WAIT_NONE) {
-                       /*
-                        * WAIT_NONE is only utilized by readahead. If we can't
-                        * acquire the lock atomically it means either the eb
-                        * is being read out or under modification.
-                        * Either way the eb will be or has been cached,
-                        * readahead can exit safely.
-                        */
-                       if (!trylock_page(page))
-                               goto unlock_exit;
-               } else {
-                       lock_page(page);
-               }
-               locked_pages++;
-       }
-       /*
-        * We need to firstly lock all pages to make sure that
-        * the uptodate bit of our pages won't be affected by
-        * clear_extent_buffer_uptodate().
-        */
-       for (i = 0; i < num_pages; i++) {
-               page = eb->pages[i];
-               if (!PageUptodate(page)) {
-                       num_reads++;
-                       all_uptodate = 0;
-               }
-       }
-
-       if (all_uptodate) {
-               set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
-               goto unlock_exit;
-       }
+       /* Someone else is already reading the buffer, just wait for it. */
+       if (test_and_set_bit(EXTENT_BUFFER_READING, &eb->bflags))
+               goto done;
 
        clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
        eb->read_mirror = 0;
-       atomic_set(&eb->io_pages, num_reads);
-       /*
-        * It is possible for release_folio to clear the TREE_REF bit before we
-        * set io_pages. See check_buffer_tree_ref for a more detailed comment.
-        */
        check_buffer_tree_ref(eb);
-       bio_ctrl.end_io_func = end_bio_extent_readpage;
-       for (i = 0; i < num_pages; i++) {
-               page = eb->pages[i];
-
-               if (!PageUptodate(page)) {
-                       ClearPageError(page);
-                       submit_extent_page(&bio_ctrl, page_offset(page), page,
-                                          PAGE_SIZE, 0);
-               } else {
-                       unlock_page(page);
-               }
+       atomic_inc(&eb->refs);
+
+       bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
+                              REQ_OP_READ | REQ_META, eb->fs_info,
+                              extent_buffer_read_end_io, eb);
+       bbio->bio.bi_iter.bi_sector = eb->start >> SECTOR_SHIFT;
+       bbio->inode = BTRFS_I(eb->fs_info->btree_inode);
+       bbio->file_offset = eb->start;
+       memcpy(&bbio->parent_check, check, sizeof(*check));
+       if (eb->fs_info->nodesize < PAGE_SIZE) {
+               __bio_add_page(&bbio->bio, eb->pages[0], eb->len,
+                              eb->start - page_offset(eb->pages[0]));
+       } else {
+               for (i = 0; i < num_pages; i++)
+                       __bio_add_page(&bbio->bio, eb->pages[i], PAGE_SIZE, 0);
        }
+       btrfs_submit_bio(bbio, mirror_num);
 
-       submit_one_bio(&bio_ctrl);
-
-       if (wait != WAIT_COMPLETE)
-               return 0;
-
-       for (i = 0; i < num_pages; i++) {
-               page = eb->pages[i];
-               wait_on_page_locked(page);
-               if (!PageUptodate(page))
+done:
+       if (wait == WAIT_COMPLETE) {
+               wait_on_bit_io(&eb->bflags, EXTENT_BUFFER_READING, TASK_UNINTERRUPTIBLE);
+               if (!test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
                        return -EIO;
        }
 
        return 0;
-
-unlock_exit:
-       while (locked_pages > 0) {
-               locked_pages--;
-               page = eb->pages[locked_pages];
-               unlock_page(page);
-       }
-       return 0;
 }
 
 static bool report_eb_range(const struct extent_buffer *eb, unsigned long start,
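
The rewritten read path above serializes concurrent readers on a single flag bit instead of per-page locks: the first caller claims EXTENT_BUFFER_READING with test_and_set_bit(), later callers skip straight to waiting, and the read completion clears the bit and wakes everyone up. A minimal sketch of that pattern outside the btrfs context (the my_buf names are illustrative; only the bitops and wait-bit helpers are real kernel APIs):

    #include <linux/bitops.h>
    #include <linux/sched.h>
    #include <linux/wait_bit.h>

    #define MY_BUF_READING  0       /* bit number inside buf->flags */

    struct my_buf {
            unsigned long flags;
    };

    /* Returns true only for the caller that must actually issue the read. */
    static bool my_buf_claim_read(struct my_buf *buf)
    {
            return !test_and_set_bit(MY_BUF_READING, &buf->flags);
    }

    /* Completion side: drop the flag, then wake any waiters. */
    static void my_buf_read_done(struct my_buf *buf)
    {
            clear_bit(MY_BUF_READING, &buf->flags);
            smp_mb__after_atomic();
            wake_up_bit(&buf->flags, MY_BUF_READING);
    }

    /* Synchronous readers block here until the in-flight read finishes. */
    static void my_buf_wait_for_read(struct my_buf *buf)
    {
            wait_on_bit_io(&buf->flags, MY_BUF_READING, TASK_UNINTERRUPTIBLE);
    }
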
@@ -4561,18 +4166,17 @@ static void assert_eb_page_uptodate(const struct extent_buffer *eb,
         * looked up.  We don't want to complain in this case, as the page was
         * valid before, we just didn't write it out.  Instead we want to catch
         * the case where we didn't actually read the block properly, which
-        * would have !PageUptodate && !PageError, as we clear PageError before
-        * reading.
+        * would have !PageUptodate and !EXTENT_BUFFER_WRITE_ERR.
         */
-       if (fs_info->nodesize < PAGE_SIZE) {
-               bool uptodate, error;
+       if (test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags))
+               return;
 
-               uptodate = btrfs_subpage_test_uptodate(fs_info, page,
-                                                      eb->start, eb->len);
-               error = btrfs_subpage_test_error(fs_info, page, eb->start, eb->len);
-               WARN_ON(!uptodate && !error);
+       if (fs_info->nodesize < PAGE_SIZE) {
+               if (WARN_ON(!btrfs_subpage_test_uptodate(fs_info, page,
+                                                        eb->start, eb->len)))
+                       btrfs_subpage_dump_bitmap(fs_info, page, eb->start, eb->len);
        } else {
-               WARN_ON(!PageUptodate(page) && !PageError(page));
+               WARN_ON(!PageUptodate(page));
        }
 }
 
index 4341ad9..c5fae3a 100644 (file)
@@ -29,6 +29,8 @@ enum {
        /* write IO error */
        EXTENT_BUFFER_WRITE_ERR,
        EXTENT_BUFFER_NO_CHECK,
+       /* Indicate that extent buffer pages are being read */
+       EXTENT_BUFFER_READING,
 };
 
 /* these are flags for __process_pages_contig */
@@ -38,7 +40,6 @@ enum {
        ENUM_BIT(PAGE_START_WRITEBACK),
        ENUM_BIT(PAGE_END_WRITEBACK),
        ENUM_BIT(PAGE_SET_ORDERED),
-       ENUM_BIT(PAGE_SET_ERROR),
        ENUM_BIT(PAGE_LOCK),
 };
 
@@ -79,7 +80,6 @@ struct extent_buffer {
        struct btrfs_fs_info *fs_info;
        spinlock_t refs_lock;
        atomic_t refs;
-       atomic_t io_pages;
        int read_mirror;
        struct rcu_head rcu_head;
        pid_t lock_owner;
@@ -89,7 +89,6 @@ struct extent_buffer {
        struct rw_semaphore lock;
 
        struct page *pages[INLINE_EXTENT_BUFFER_PAGES];
-       struct list_head release_list;
 #ifdef CONFIG_BTRFS_DEBUG
        struct list_head leak_list;
 #endif
@@ -179,7 +178,8 @@ int try_release_extent_mapping(struct page *page, gfp_t mask);
 int try_release_extent_buffer(struct page *page);
 
 int btrfs_read_folio(struct file *file, struct folio *folio);
-int extent_write_locked_range(struct inode *inode, u64 start, u64 end);
+int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
+                             struct writeback_control *wbc);
 int extent_writepages(struct address_space *mapping,
                      struct writeback_control *wbc);
 int btree_write_cache_pages(struct address_space *mapping,
@@ -262,10 +262,9 @@ void extent_buffer_bitmap_set(const struct extent_buffer *eb, unsigned long star
 void extent_buffer_bitmap_clear(const struct extent_buffer *eb,
                                unsigned long start, unsigned long pos,
                                unsigned long len);
-bool set_extent_buffer_dirty(struct extent_buffer *eb);
+void set_extent_buffer_dirty(struct extent_buffer *eb);
 void set_extent_buffer_uptodate(struct extent_buffer *eb);
 void clear_extent_buffer_uptodate(struct extent_buffer *eb);
-int extent_buffer_under_io(const struct extent_buffer *eb);
 void extent_range_clear_dirty_for_io(struct inode *inode, u64 start, u64 end);
 void extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end);
 void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
index 138afa9..0cdb3e8 100644 (file)
@@ -364,8 +364,9 @@ static void extent_map_device_set_bits(struct extent_map *em, unsigned bits)
                struct btrfs_io_stripe *stripe = &map->stripes[i];
                struct btrfs_device *device = stripe->dev;
 
-               set_extent_bits_nowait(&device->alloc_state, stripe->physical,
-                                stripe->physical + stripe_size - 1, bits);
+               set_extent_bit(&device->alloc_state, stripe->physical,
+                              stripe->physical + stripe_size - 1,
+                              bits | EXTENT_NOWAIT, NULL);
        }
 }
 
@@ -380,8 +381,9 @@ static void extent_map_device_clear_bits(struct extent_map *em, unsigned bits)
                struct btrfs_device *device = stripe->dev;
 
                __clear_extent_bit(&device->alloc_state, stripe->physical,
-                                  stripe->physical + stripe_size - 1, bits,
-                                  NULL, GFP_NOWAIT, NULL);
+                                  stripe->physical + stripe_size - 1,
+                                  bits | EXTENT_NOWAIT,
+                                  NULL, NULL);
        }
 }
 
@@ -502,10 +504,10 @@ void remove_extent_mapping(struct extent_map_tree *tree, struct extent_map *em)
        RB_CLEAR_NODE(&em->rb_node);
 }
 
-void replace_extent_mapping(struct extent_map_tree *tree,
-                           struct extent_map *cur,
-                           struct extent_map *new,
-                           int modified)
+static void replace_extent_mapping(struct extent_map_tree *tree,
+                                  struct extent_map *cur,
+                                  struct extent_map *new,
+                                  int modified)
 {
        lockdep_assert_held_write(&tree->lock);
 
@@ -959,3 +961,95 @@ int btrfs_replace_extent_map_range(struct btrfs_inode *inode,
 
        return ret;
 }
+
+/*
+ * Split off the first pre bytes from the extent_map at [start, start + len],
+ * and set the block_start for it to new_logical.
+ *
+ * This function is used when an ordered_extent needs to be split.
+ */
+int split_extent_map(struct btrfs_inode *inode, u64 start, u64 len, u64 pre,
+                    u64 new_logical)
+{
+       struct extent_map_tree *em_tree = &inode->extent_tree;
+       struct extent_map *em;
+       struct extent_map *split_pre = NULL;
+       struct extent_map *split_mid = NULL;
+       int ret = 0;
+       unsigned long flags;
+
+       ASSERT(pre != 0);
+       ASSERT(pre < len);
+
+       split_pre = alloc_extent_map();
+       if (!split_pre)
+               return -ENOMEM;
+       split_mid = alloc_extent_map();
+       if (!split_mid) {
+               ret = -ENOMEM;
+               goto out_free_pre;
+       }
+
+       lock_extent(&inode->io_tree, start, start + len - 1, NULL);
+       write_lock(&em_tree->lock);
+       em = lookup_extent_mapping(em_tree, start, len);
+       if (!em) {
+               ret = -EIO;
+               goto out_unlock;
+       }
+
+       ASSERT(em->len == len);
+       ASSERT(!test_bit(EXTENT_FLAG_COMPRESSED, &em->flags));
+       ASSERT(em->block_start < EXTENT_MAP_LAST_BYTE);
+       ASSERT(test_bit(EXTENT_FLAG_PINNED, &em->flags));
+       ASSERT(!test_bit(EXTENT_FLAG_LOGGING, &em->flags));
+       ASSERT(!list_empty(&em->list));
+
+       flags = em->flags;
+       clear_bit(EXTENT_FLAG_PINNED, &em->flags);
+
+       /* First, replace the em with a new extent_map starting from em->start */
+       split_pre->start = em->start;
+       split_pre->len = pre;
+       split_pre->orig_start = split_pre->start;
+       split_pre->block_start = new_logical;
+       split_pre->block_len = split_pre->len;
+       split_pre->orig_block_len = split_pre->block_len;
+       split_pre->ram_bytes = split_pre->len;
+       split_pre->flags = flags;
+       split_pre->compress_type = em->compress_type;
+       split_pre->generation = em->generation;
+
+       replace_extent_mapping(em_tree, em, split_pre, 1);
+
+       /*
+        * Now we only have an extent_map at:
+        *     [em->start, em->start + pre]
+        */
+
+       /* Insert the middle extent_map. */
+       split_mid->start = em->start + pre;
+       split_mid->len = em->len - pre;
+       split_mid->orig_start = split_mid->start;
+       split_mid->block_start = em->block_start + pre;
+       split_mid->block_len = split_mid->len;
+       split_mid->orig_block_len = split_mid->block_len;
+       split_mid->ram_bytes = split_mid->len;
+       split_mid->flags = flags;
+       split_mid->compress_type = em->compress_type;
+       split_mid->generation = em->generation;
+       add_extent_mapping(em_tree, split_mid, 1);
+
+       /* Once for us */
+       free_extent_map(em);
+       /* Once for the tree */
+       free_extent_map(em);
+
+out_unlock:
+       write_unlock(&em_tree->lock);
+       unlock_extent(&inode->io_tree, start, start + len - 1, NULL);
+       free_extent_map(split_mid);
+out_free_pre:
+       free_extent_map(split_pre);
+       return ret;
+}
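
split_extent_map() above is intended for the case where an ordered extent has to be cut in two, e.g. when the first part of a write ends up at a different logical address than originally assigned. A hypothetical caller sketch follows; the demo_ helper and the surrounding ordered-extent handling are assumptions for illustration, not the actual btrfs call site:

    /*
     * Illustrative only: carve the first 'pre' bytes of the mapping that
     * backs an ordered extent into its own extent_map at 'new_logical'.
     */
    static int demo_split_for_ordered(struct btrfs_inode *inode,
                                      struct btrfs_ordered_extent *ordered,
                                      u64 pre, u64 new_logical)
    {
            int ret;

            ret = split_extent_map(inode, ordered->file_offset,
                                   ordered->num_bytes, pre, new_logical);
            if (ret)
                    return ret;

            /*
             * A real caller would then split the ordered extent itself so
             * both structures keep describing the same ranges.
             */
            return 0;
    }
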
index ad31186..35d27c7 100644 (file)
@@ -90,10 +90,8 @@ struct extent_map *lookup_extent_mapping(struct extent_map_tree *tree,
 int add_extent_mapping(struct extent_map_tree *tree,
                       struct extent_map *em, int modified);
 void remove_extent_mapping(struct extent_map_tree *tree, struct extent_map *em);
-void replace_extent_mapping(struct extent_map_tree *tree,
-                           struct extent_map *cur,
-                           struct extent_map *new,
-                           int modified);
+int split_extent_map(struct btrfs_inode *inode, u64 start, u64 len, u64 pre,
+                    u64 new_logical);
 
 struct extent_map *alloc_extent_map(void);
 void free_extent_map(struct extent_map *em);
index 018c711..696bf69 100644 (file)
@@ -52,13 +52,13 @@ void btrfs_inode_safe_disk_i_size_write(struct btrfs_inode *inode, u64 new_i_siz
        u64 start, end, i_size;
        int ret;
 
+       spin_lock(&inode->lock);
        i_size = new_i_size ?: i_size_read(&inode->vfs_inode);
        if (btrfs_fs_incompat(fs_info, NO_HOLES)) {
                inode->disk_i_size = i_size;
-               return;
+               goto out_unlock;
        }
 
-       spin_lock(&inode->lock);
        ret = find_contiguous_extent_bit(&inode->file_extent_tree, 0, &start,
                                         &end, EXTENT_DIRTY);
        if (!ret && start == 0)
@@ -66,6 +66,7 @@ void btrfs_inode_safe_disk_i_size_write(struct btrfs_inode *inode, u64 new_i_siz
        else
                i_size = 0;
        inode->disk_i_size = i_size;
+out_unlock:
        spin_unlock(&inode->lock);
 }
 
@@ -93,8 +94,8 @@ int btrfs_inode_set_file_extent_range(struct btrfs_inode *inode, u64 start,
 
        if (btrfs_fs_incompat(inode->root->fs_info, NO_HOLES))
                return 0;
-       return set_extent_bits(&inode->file_extent_tree, start, start + len - 1,
-                              EXTENT_DIRTY);
+       return set_extent_bit(&inode->file_extent_tree, start, start + len - 1,
+                             EXTENT_DIRTY, NULL);
 }
 
 /*
@@ -437,9 +438,9 @@ blk_status_t btrfs_lookup_bio_sums(struct btrfs_bio *bbio)
                            BTRFS_DATA_RELOC_TREE_OBJECTID) {
                                u64 file_offset = bbio->file_offset + bio_offset;
 
-                               set_extent_bits(&inode->io_tree, file_offset,
-                                               file_offset + sectorsize - 1,
-                                               EXTENT_NODATASUM);
+                               set_extent_bit(&inode->io_tree, file_offset,
+                                              file_offset + sectorsize - 1,
+                                              EXTENT_NODATASUM, NULL);
                        } else {
                                btrfs_warn_rl(fs_info,
                        "csum hole found for disk bytenr range [%llu, %llu)",
@@ -559,8 +560,8 @@ int btrfs_lookup_csums_list(struct btrfs_root *root, u64 start, u64 end,
                                goto fail;
                        }
 
-                       sums->bytenr = start;
-                       sums->len = (int)size;
+                       sums->logical = start;
+                       sums->len = size;
 
                        offset = bytes_to_csum_size(fs_info, start - key.offset);
 
@@ -720,20 +721,17 @@ fail:
  */
 blk_status_t btrfs_csum_one_bio(struct btrfs_bio *bbio)
 {
+       struct btrfs_ordered_extent *ordered = bbio->ordered;
        struct btrfs_inode *inode = bbio->inode;
        struct btrfs_fs_info *fs_info = inode->root->fs_info;
        SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
        struct bio *bio = &bbio->bio;
-       u64 offset = bbio->file_offset;
        struct btrfs_ordered_sum *sums;
-       struct btrfs_ordered_extent *ordered = NULL;
        char *data;
        struct bvec_iter iter;
        struct bio_vec bvec;
        int index;
        unsigned int blockcount;
-       unsigned long total_bytes = 0;
-       unsigned long this_sum_bytes = 0;
        int i;
        unsigned nofs_flag;
 
@@ -748,59 +746,17 @@ blk_status_t btrfs_csum_one_bio(struct btrfs_bio *bbio)
        sums->len = bio->bi_iter.bi_size;
        INIT_LIST_HEAD(&sums->list);
 
-       sums->bytenr = bio->bi_iter.bi_sector << 9;
+       sums->logical = bio->bi_iter.bi_sector << SECTOR_SHIFT;
        index = 0;
 
        shash->tfm = fs_info->csum_shash;
 
        bio_for_each_segment(bvec, bio, iter) {
-               if (!ordered) {
-                       ordered = btrfs_lookup_ordered_extent(inode, offset);
-                       /*
-                        * The bio range is not covered by any ordered extent,
-                        * must be a code logic error.
-                        */
-                       if (unlikely(!ordered)) {
-                               WARN(1, KERN_WARNING
-                       "no ordered extent for root %llu ino %llu offset %llu\n",
-                                    inode->root->root_key.objectid,
-                                    btrfs_ino(inode), offset);
-                               kvfree(sums);
-                               return BLK_STS_IOERR;
-                       }
-               }
-
                blockcount = BTRFS_BYTES_TO_BLKS(fs_info,
                                                 bvec.bv_len + fs_info->sectorsize
                                                 - 1);
 
                for (i = 0; i < blockcount; i++) {
-                       if (!(bio->bi_opf & REQ_BTRFS_ONE_ORDERED) &&
-                           !in_range(offset, ordered->file_offset,
-                                     ordered->num_bytes)) {
-                               unsigned long bytes_left;
-
-                               sums->len = this_sum_bytes;
-                               this_sum_bytes = 0;
-                               btrfs_add_ordered_sum(ordered, sums);
-                               btrfs_put_ordered_extent(ordered);
-
-                               bytes_left = bio->bi_iter.bi_size - total_bytes;
-
-                               nofs_flag = memalloc_nofs_save();
-                               sums = kvzalloc(btrfs_ordered_sum_size(fs_info,
-                                                     bytes_left), GFP_KERNEL);
-                               memalloc_nofs_restore(nofs_flag);
-                               BUG_ON(!sums); /* -ENOMEM */
-                               sums->len = bytes_left;
-                               ordered = btrfs_lookup_ordered_extent(inode,
-                                                               offset);
-                               ASSERT(ordered); /* Logic error */
-                               sums->bytenr = (bio->bi_iter.bi_sector << 9)
-                                       + total_bytes;
-                               index = 0;
-                       }
-
                        data = bvec_kmap_local(&bvec);
                        crypto_shash_digest(shash,
                                            data + (i * fs_info->sectorsize),
@@ -808,15 +764,28 @@ blk_status_t btrfs_csum_one_bio(struct btrfs_bio *bbio)
                                            sums->sums + index);
                        kunmap_local(data);
                        index += fs_info->csum_size;
-                       offset += fs_info->sectorsize;
-                       this_sum_bytes += fs_info->sectorsize;
-                       total_bytes += fs_info->sectorsize;
                }
 
        }
-       this_sum_bytes = 0;
+
+       bbio->sums = sums;
        btrfs_add_ordered_sum(ordered, sums);
-       btrfs_put_ordered_extent(ordered);
+       return 0;
+}
+
+/*
+ * Nodatasum I/O on zoned file systems still requires a btrfs_ordered_sum to
+ * record the updated logical address on Zone Append completion.
+ * Allocate just the structure with an empty sums array here for that case.
+ */
+blk_status_t btrfs_alloc_dummy_sum(struct btrfs_bio *bbio)
+{
+       bbio->sums = kmalloc(sizeof(*bbio->sums), GFP_NOFS);
+       if (!bbio->sums)
+               return BLK_STS_RESOURCE;
+       bbio->sums->len = bbio->bio.bi_iter.bi_size;
+       bbio->sums->logical = bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT;
+       btrfs_add_ordered_sum(bbio->ordered, bbio->sums);
        return 0;
 }
 
@@ -1083,7 +1052,7 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans,
 again:
        next_offset = (u64)-1;
        found_next = 0;
-       bytenr = sums->bytenr + total_bytes;
+       bytenr = sums->logical + total_bytes;
        file_key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
        file_key.offset = bytenr;
        file_key.type = BTRFS_EXTENT_CSUM_KEY;
index 6be8725..4ec669b 100644 (file)
@@ -50,6 +50,7 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans,
                           struct btrfs_root *root,
                           struct btrfs_ordered_sum *sums);
 blk_status_t btrfs_csum_one_bio(struct btrfs_bio *bbio);
+blk_status_t btrfs_alloc_dummy_sum(struct btrfs_bio *bbio);
 int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
                             struct list_head *list, int search_commit,
                             bool nowait);
index f649647..ba5b0c9 100644 (file)
@@ -1651,7 +1651,6 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
        struct file *file = iocb->ki_filp;
        struct btrfs_inode *inode = BTRFS_I(file_inode(file));
        ssize_t num_written, num_sync;
-       const bool sync = iocb_is_dsync(iocb);
 
        /*
         * If the fs flips readonly due to some impossible error, although we
@@ -1664,9 +1663,6 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
        if (encoded && (iocb->ki_flags & IOCB_NOWAIT))
                return -EOPNOTSUPP;
 
-       if (sync)
-               atomic_inc(&inode->sync_writers);
-
        if (encoded) {
                num_written = btrfs_encoded_write(iocb, from, encoded);
                num_sync = encoded->len;
@@ -1686,9 +1682,6 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
                        num_written = num_sync;
        }
 
-       if (sync)
-               atomic_dec(&inode->sync_writers);
-
        current->backing_dev_info = NULL;
        return num_written;
 }
@@ -1733,9 +1726,7 @@ static int start_ordered_ops(struct inode *inode, loff_t start, loff_t end)
         * several segments of stripe length (currently 64K).
         */
        blk_start_plug(&plug);
-       atomic_inc(&BTRFS_I(inode)->sync_writers);
        ret = btrfs_fdatawrite_range(inode, start, end);
-       atomic_dec(&BTRFS_I(inode)->sync_writers);
        blk_finish_plug(&plug);
 
        return ret;
@@ -3709,7 +3700,8 @@ static int btrfs_file_open(struct inode *inode, struct file *filp)
 {
        int ret;
 
-       filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC;
+       filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC |
+                       FMODE_CAN_ODIRECT;
 
        ret = fsverity_file_open(inode, filp);
        if (ret)
@@ -3825,7 +3817,7 @@ static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 const struct file_operations btrfs_file_operations = {
        .llseek         = btrfs_file_llseek,
        .read_iter      = btrfs_file_read_iter,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
        .write_iter     = btrfs_file_write_iter,
        .splice_write   = iter_file_splice_write,
        .mmap           = btrfs_file_mmap,
index d84cef8..8808004 100644 (file)
@@ -292,25 +292,6 @@ out:
        return ret;
 }
 
-int btrfs_check_trunc_cache_free_space(struct btrfs_fs_info *fs_info,
-                                      struct btrfs_block_rsv *rsv)
-{
-       u64 needed_bytes;
-       int ret;
-
-       /* 1 for slack space, 1 for updating the inode */
-       needed_bytes = btrfs_calc_insert_metadata_size(fs_info, 1) +
-               btrfs_calc_metadata_size(fs_info, 1);
-
-       spin_lock(&rsv->lock);
-       if (rsv->reserved < needed_bytes)
-               ret = -ENOSPC;
-       else
-               ret = 0;
-       spin_unlock(&rsv->lock);
-       return ret;
-}
-
 int btrfs_truncate_free_space_cache(struct btrfs_trans_handle *trans,
                                    struct btrfs_block_group *block_group,
                                    struct inode *vfs_inode)
@@ -870,15 +851,16 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
                        }
                        spin_lock(&ctl->tree_lock);
                        ret = link_free_space(ctl, e);
-                       ctl->total_bitmaps++;
-                       recalculate_thresholds(ctl);
-                       spin_unlock(&ctl->tree_lock);
                        if (ret) {
+                               spin_unlock(&ctl->tree_lock);
                                btrfs_err(fs_info,
                                        "Duplicate entries in free space cache, dumping");
                                kmem_cache_free(btrfs_free_space_cachep, e);
                                goto free_cache;
                        }
+                       ctl->total_bitmaps++;
+                       recalculate_thresholds(ctl);
+                       spin_unlock(&ctl->tree_lock);
                        list_add_tail(&e->list, &bitmaps);
                }
 
@@ -922,27 +904,31 @@ static int copy_free_space_cache(struct btrfs_block_group *block_group,
        while (!ret && (n = rb_first(&ctl->free_space_offset)) != NULL) {
                info = rb_entry(n, struct btrfs_free_space, offset_index);
                if (!info->bitmap) {
+                       const u64 offset = info->offset;
+                       const u64 bytes = info->bytes;
+
                        unlink_free_space(ctl, info, true);
-                       ret = btrfs_add_free_space(block_group, info->offset,
-                                                  info->bytes);
+                       spin_unlock(&ctl->tree_lock);
                        kmem_cache_free(btrfs_free_space_cachep, info);
+                       ret = btrfs_add_free_space(block_group, offset, bytes);
+                       spin_lock(&ctl->tree_lock);
                } else {
                        u64 offset = info->offset;
                        u64 bytes = ctl->unit;
 
-                       while (search_bitmap(ctl, info, &offset, &bytes,
-                                            false) == 0) {
+                       ret = search_bitmap(ctl, info, &offset, &bytes, false);
+                       if (ret == 0) {
+                               bitmap_clear_bits(ctl, info, offset, bytes, true);
+                               spin_unlock(&ctl->tree_lock);
                                ret = btrfs_add_free_space(block_group, offset,
                                                           bytes);
-                               if (ret)
-                                       break;
-                               bitmap_clear_bits(ctl, info, offset, bytes, true);
-                               offset = info->offset;
-                               bytes = ctl->unit;
+                               spin_lock(&ctl->tree_lock);
+                       } else {
+                               free_bitmap(ctl, info);
+                               ret = 0;
                        }
-                       free_bitmap(ctl, info);
                }
-               cond_resched();
+               cond_resched_lock(&ctl->tree_lock);
        }
        return ret;
 }
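
The reworked loop in copy_free_space_cache() above copies the entry's offset and length, drops ctl->tree_lock around btrfs_add_free_space() (which may block), and uses cond_resched_lock() to yield between iterations. Reduced to a generic sketch, with placeholder types and a placeholder consume_value() callee:

    #include <linux/list.h>
    #include <linux/sched.h>
    #include <linux/slab.h>
    #include <linux/spinlock.h>

    struct my_item {
            struct list_head list;
            u64 value;
    };

    struct my_ctl {
            spinlock_t lock;
            struct list_head items;
    };

    /* Drain items without ever calling a sleeping function under the lock. */
    static void drain_items(struct my_ctl *ctl, void (*consume_value)(u64))
    {
            spin_lock(&ctl->lock);
            while (!list_empty(&ctl->items)) {
                    struct my_item *item = list_first_entry(&ctl->items,
                                                            struct my_item, list);
                    u64 value = item->value;        /* copy out what we need */

                    list_del(&item->list);
                    spin_unlock(&ctl->lock);

                    consume_value(value);           /* may sleep */
                    kfree(item);

                    spin_lock(&ctl->lock);
                    cond_resched_lock(&ctl->lock);  /* drop/retake if resched needed */
            }
            spin_unlock(&ctl->lock);
    }
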
@@ -1036,7 +1022,9 @@ int load_free_space_cache(struct btrfs_block_group *block_group)
                                          block_group->bytes_super));
 
        if (matched) {
+               spin_lock(&tmp_ctl.tree_lock);
                ret = copy_free_space_cache(block_group, &tmp_ctl);
+               spin_unlock(&tmp_ctl.tree_lock);
                /*
                 * ret == 1 means we successfully loaded the free space cache,
                 * so we need to re-set it here.
@@ -1595,20 +1583,34 @@ static inline u64 offset_to_bitmap(struct btrfs_free_space_ctl *ctl,
        return bitmap_start;
 }
 
-static int tree_insert_offset(struct rb_root *root, u64 offset,
-                             struct rb_node *node, int bitmap)
+static int tree_insert_offset(struct btrfs_free_space_ctl *ctl,
+                             struct btrfs_free_cluster *cluster,
+                             struct btrfs_free_space *new_entry)
 {
-       struct rb_node **p = &root->rb_node;
+       struct rb_root *root;
+       struct rb_node **p;
        struct rb_node *parent = NULL;
-       struct btrfs_free_space *info;
+
+       lockdep_assert_held(&ctl->tree_lock);
+
+       if (cluster) {
+               lockdep_assert_held(&cluster->lock);
+               root = &cluster->root;
+       } else {
+               root = &ctl->free_space_offset;
+       }
+
+       p = &root->rb_node;
 
        while (*p) {
+               struct btrfs_free_space *info;
+
                parent = *p;
                info = rb_entry(parent, struct btrfs_free_space, offset_index);
 
-               if (offset < info->offset) {
+               if (new_entry->offset < info->offset) {
                        p = &(*p)->rb_left;
-               } else if (offset > info->offset) {
+               } else if (new_entry->offset > info->offset) {
                        p = &(*p)->rb_right;
                } else {
                        /*
@@ -1624,7 +1626,7 @@ static int tree_insert_offset(struct rb_root *root, u64 offset,
                         * found a bitmap, we want to go left, or before
                         * logically.
                         */
-                       if (bitmap) {
+                       if (new_entry->bitmap) {
                                if (info->bitmap) {
                                        WARN_ON_ONCE(1);
                                        return -EEXIST;
@@ -1640,8 +1642,8 @@ static int tree_insert_offset(struct rb_root *root, u64 offset,
                }
        }
 
-       rb_link_node(node, parent, p);
-       rb_insert_color(node, root);
+       rb_link_node(&new_entry->offset_index, parent, p);
+       rb_insert_color(&new_entry->offset_index, root);
 
        return 0;
 }
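
Passing the free space ctl (and optional cluster) into tree_insert_offset() lets the helper verify its own locking contract with lockdep_assert_held() instead of relying on every caller to remember it. The idiom is small; a sketch with a hypothetical helper built on the real btrfs_free_space_ctl fields:

    /* Sketch: a helper that documents and checks its locking rule itself. */
    static void account_free_extent(struct btrfs_free_space_ctl *ctl, u64 bytes)
    {
            /* No-op unless lockdep is enabled; warns if the lock is not held. */
            lockdep_assert_held(&ctl->tree_lock);

            ctl->free_space += bytes;
            ctl->free_extents++;
    }
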
@@ -1704,6 +1706,8 @@ tree_search_offset(struct btrfs_free_space_ctl *ctl,
        struct rb_node *n = ctl->free_space_offset.rb_node;
        struct btrfs_free_space *entry = NULL, *prev = NULL;
 
+       lockdep_assert_held(&ctl->tree_lock);
+
        /* find entry that is closest to the 'offset' */
        while (n) {
                entry = rb_entry(n, struct btrfs_free_space, offset_index);
@@ -1813,6 +1817,8 @@ static inline void unlink_free_space(struct btrfs_free_space_ctl *ctl,
                                     struct btrfs_free_space *info,
                                     bool update_stat)
 {
+       lockdep_assert_held(&ctl->tree_lock);
+
        rb_erase(&info->offset_index, &ctl->free_space_offset);
        rb_erase_cached(&info->bytes_index, &ctl->free_space_bytes);
        ctl->free_extents--;
@@ -1831,9 +1837,10 @@ static int link_free_space(struct btrfs_free_space_ctl *ctl,
 {
        int ret = 0;
 
+       lockdep_assert_held(&ctl->tree_lock);
+
        ASSERT(info->bytes || info->bitmap);
-       ret = tree_insert_offset(&ctl->free_space_offset, info->offset,
-                                &info->offset_index, (info->bitmap != NULL));
+       ret = tree_insert_offset(ctl, NULL, info);
        if (ret)
                return ret;
 
@@ -1861,6 +1868,8 @@ static void relink_bitmap_entry(struct btrfs_free_space_ctl *ctl,
        if (RB_EMPTY_NODE(&info->bytes_index))
                return;
 
+       lockdep_assert_held(&ctl->tree_lock);
+
        rb_erase_cached(&info->bytes_index, &ctl->free_space_bytes);
        rb_add_cached(&info->bytes_index, &ctl->free_space_bytes, entry_less);
 }
@@ -2446,6 +2455,7 @@ static bool try_merge_free_space(struct btrfs_free_space_ctl *ctl,
        u64 offset = info->offset;
        u64 bytes = info->bytes;
        const bool is_trimmed = btrfs_free_space_trimmed(info);
+       struct rb_node *right_prev = NULL;
 
        /*
         * first we want to see if there is free space adjacent to the range we
@@ -2453,9 +2463,11 @@ static bool try_merge_free_space(struct btrfs_free_space_ctl *ctl,
         * cover the entire range
         */
        right_info = tree_search_offset(ctl, offset + bytes, 0, 0);
-       if (right_info && rb_prev(&right_info->offset_index))
-               left_info = rb_entry(rb_prev(&right_info->offset_index),
-                                    struct btrfs_free_space, offset_index);
+       if (right_info)
+               right_prev = rb_prev(&right_info->offset_index);
+
+       if (right_prev)
+               left_info = rb_entry(right_prev, struct btrfs_free_space, offset_index);
        else if (!right_info)
                left_info = tree_search_offset(ctl, offset - 1, 0, 0);
 
@@ -2968,9 +2980,10 @@ static void __btrfs_return_cluster_to_free_space(
                             struct btrfs_free_cluster *cluster)
 {
        struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
-       struct btrfs_free_space *entry;
        struct rb_node *node;
 
+       lockdep_assert_held(&ctl->tree_lock);
+
        spin_lock(&cluster->lock);
        if (cluster->block_group != block_group) {
                spin_unlock(&cluster->lock);
@@ -2983,15 +2996,14 @@ static void __btrfs_return_cluster_to_free_space(
 
        node = rb_first(&cluster->root);
        while (node) {
-               bool bitmap;
+               struct btrfs_free_space *entry;
 
                entry = rb_entry(node, struct btrfs_free_space, offset_index);
                node = rb_next(&entry->offset_index);
                rb_erase(&entry->offset_index, &cluster->root);
                RB_CLEAR_NODE(&entry->offset_index);
 
-               bitmap = (entry->bitmap != NULL);
-               if (!bitmap) {
+               if (!entry->bitmap) {
                        /* Merging treats extents as if they were new */
                        if (!btrfs_free_space_trimmed(entry)) {
                                ctl->discardable_extents[BTRFS_STAT_CURR]--;
@@ -3009,8 +3021,7 @@ static void __btrfs_return_cluster_to_free_space(
                                        entry->bytes;
                        }
                }
-               tree_insert_offset(&ctl->free_space_offset,
-                                  entry->offset, &entry->offset_index, bitmap);
+               tree_insert_offset(ctl, NULL, entry);
                rb_add_cached(&entry->bytes_index, &ctl->free_space_bytes,
                              entry_less);
        }
@@ -3323,6 +3334,8 @@ static int btrfs_bitmap_cluster(struct btrfs_block_group *block_group,
        unsigned long total_found = 0;
        int ret;
 
+       lockdep_assert_held(&ctl->tree_lock);
+
        i = offset_to_bit(entry->offset, ctl->unit,
                          max_t(u64, offset, entry->offset));
        want_bits = bytes_to_bits(bytes, ctl->unit);
@@ -3384,8 +3397,7 @@ again:
         */
        RB_CLEAR_NODE(&entry->bytes_index);
 
-       ret = tree_insert_offset(&cluster->root, entry->offset,
-                                &entry->offset_index, 1);
+       ret = tree_insert_offset(ctl, cluster, entry);
        ASSERT(!ret); /* -EEXIST; Logic error */
 
        trace_btrfs_setup_cluster(block_group, cluster,
@@ -3413,6 +3425,8 @@ setup_cluster_no_bitmap(struct btrfs_block_group *block_group,
        u64 max_extent;
        u64 total_size = 0;
 
+       lockdep_assert_held(&ctl->tree_lock);
+
        entry = tree_search_offset(ctl, offset, 0, 1);
        if (!entry)
                return -ENOSPC;
@@ -3475,8 +3489,7 @@ setup_cluster_no_bitmap(struct btrfs_block_group *block_group,
 
                rb_erase(&entry->offset_index, &ctl->free_space_offset);
                rb_erase_cached(&entry->bytes_index, &ctl->free_space_bytes);
-               ret = tree_insert_offset(&cluster->root, entry->offset,
-                                        &entry->offset_index, 0);
+               ret = tree_insert_offset(ctl, cluster, entry);
                total_size += entry->bytes;
                ASSERT(!ret); /* -EEXIST; Logic error */
        } while (node && entry != last);
@@ -3670,7 +3683,7 @@ static int do_trimming(struct btrfs_block_group *block_group,
                __btrfs_add_free_space(block_group, reserved_start,
                                       start - reserved_start,
                                       reserved_trim_state);
-       if (start + bytes < reserved_start + reserved_bytes)
+       if (end < reserved_end)
                __btrfs_add_free_space(block_group, end, reserved_end - end,
                                       reserved_trim_state);
        __btrfs_add_free_space(block_group, start, bytes, trim_state);
index a855e04..33b4da3 100644 (file)
@@ -101,8 +101,6 @@ int btrfs_remove_free_space_inode(struct btrfs_trans_handle *trans,
                                  struct inode *inode,
                                  struct btrfs_block_group *block_group);
 
-int btrfs_check_trunc_cache_free_space(struct btrfs_fs_info *fs_info,
-                                      struct btrfs_block_rsv *rsv);
 int btrfs_truncate_free_space_cache(struct btrfs_trans_handle *trans,
                                    struct btrfs_block_group *block_group,
                                    struct inode *inode);
index 4d155a4..045ddce 100644 (file)
@@ -1252,7 +1252,7 @@ out:
        return ret;
 }
 
-int btrfs_clear_free_space_tree(struct btrfs_fs_info *fs_info)
+int btrfs_delete_free_space_tree(struct btrfs_fs_info *fs_info)
 {
        struct btrfs_trans_handle *trans;
        struct btrfs_root *tree_root = fs_info->tree_root;
@@ -1280,7 +1280,10 @@ int btrfs_clear_free_space_tree(struct btrfs_fs_info *fs_info)
                goto abort;
 
        btrfs_global_root_delete(free_space_root);
+
+       spin_lock(&fs_info->trans_lock);
        list_del(&free_space_root->dirty_list);
+       spin_unlock(&fs_info->trans_lock);
 
        btrfs_tree_lock(free_space_root->node);
        btrfs_clear_buffer_dirty(trans, free_space_root->node);
@@ -1298,6 +1301,54 @@ abort:
        return ret;
 }
 
+int btrfs_rebuild_free_space_tree(struct btrfs_fs_info *fs_info)
+{
+       struct btrfs_trans_handle *trans;
+       struct btrfs_key key = {
+               .objectid = BTRFS_FREE_SPACE_TREE_OBJECTID,
+               .type = BTRFS_ROOT_ITEM_KEY,
+               .offset = 0,
+       };
+       struct btrfs_root *free_space_root = btrfs_global_root(fs_info, &key);
+       struct rb_node *node;
+       int ret;
+
+       trans = btrfs_start_transaction(free_space_root, 1);
+       if (IS_ERR(trans))
+               return PTR_ERR(trans);
+
+       set_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags);
+       set_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags);
+
+       ret = clear_free_space_tree(trans, free_space_root);
+       if (ret)
+               goto abort;
+
+       node = rb_first_cached(&fs_info->block_group_cache_tree);
+       while (node) {
+               struct btrfs_block_group *block_group;
+
+               block_group = rb_entry(node, struct btrfs_block_group,
+                                      cache_node);
+               ret = populate_free_space_tree(trans, block_group);
+               if (ret)
+                       goto abort;
+               node = rb_next(node);
+       }
+
+       btrfs_set_fs_compat_ro(fs_info, FREE_SPACE_TREE);
+       btrfs_set_fs_compat_ro(fs_info, FREE_SPACE_TREE_VALID);
+       clear_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags);
+
+       ret = btrfs_commit_transaction(trans);
+       clear_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags);
+       return ret;
+abort:
+       btrfs_abort_transaction(trans, ret);
+       btrfs_end_transaction(trans);
+       return ret;
+}
+
 static int __add_block_group_free_space(struct btrfs_trans_handle *trans,
                                        struct btrfs_block_group *block_group,
                                        struct btrfs_path *path)
index dc2463e..6d5551d 100644 (file)
@@ -18,7 +18,8 @@ struct btrfs_caching_control;
 
 void set_free_space_tree_thresholds(struct btrfs_block_group *block_group);
 int btrfs_create_free_space_tree(struct btrfs_fs_info *fs_info);
-int btrfs_clear_free_space_tree(struct btrfs_fs_info *fs_info);
+int btrfs_delete_free_space_tree(struct btrfs_fs_info *fs_info);
+int btrfs_rebuild_free_space_tree(struct btrfs_fs_info *fs_info);
 int load_free_space_tree(struct btrfs_caching_control *caching_ctl);
 int add_block_group_free_space(struct btrfs_trans_handle *trans,
                               struct btrfs_block_group *block_group);
index 0d98fc5..203d2a2 100644 (file)
@@ -543,7 +543,6 @@ struct btrfs_fs_info {
         * A third pool does submit_bio to avoid deadlocking with the other two.
         */
        struct btrfs_workqueue *workers;
-       struct btrfs_workqueue *hipri_workers;
        struct btrfs_workqueue *delalloc_workers;
        struct btrfs_workqueue *flush_workers;
        struct workqueue_struct *endio_workers;
@@ -577,6 +576,7 @@ struct btrfs_fs_info {
        s32 dirty_metadata_batch;
        s32 delalloc_batch;
 
+       /* Protected by 'trans_lock'. */
        struct list_head dirty_cowonly_roots;
 
        struct btrfs_fs_devices *fs_devices;
@@ -643,7 +643,6 @@ struct btrfs_fs_info {
         */
        refcount_t scrub_workers_refcnt;
        struct workqueue_struct *scrub_workers;
-       struct workqueue_struct *scrub_wr_completion_workers;
        struct btrfs_subpage_info *subpage_info;
 
        struct btrfs_discard_ctl discard_ctl;
@@ -854,7 +853,7 @@ static inline u64 btrfs_calc_metadata_size(const struct btrfs_fs_info *fs_info,
 
 static inline bool btrfs_is_zoned(const struct btrfs_fs_info *fs_info)
 {
-       return fs_info->zone_size > 0;
+       return IS_ENABLED(CONFIG_BLK_DEV_ZONED) && fs_info->zone_size > 0;
 }
 
 /*
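
Folding IS_ENABLED(CONFIG_BLK_DEV_ZONED) into btrfs_is_zoned() makes the helper a compile-time constant false on kernels built without zoned block device support, so zoned-only branches in its callers are eliminated by the compiler. The idiom in general terms (CONFIG_MY_FEATURE and the my_* names are placeholders):

    struct my_fs_info {
            u64 feature_size;
    };

    void handle_feature(struct my_fs_info *info);

    /* Sketch: gate a runtime check on the feature's Kconfig symbol. */
    static inline bool my_feature_enabled(const struct my_fs_info *info)
    {
            /* IS_ENABLED() expands to 0 when CONFIG_MY_FEATURE is not set,
             * so the whole expression folds to false at compile time. */
            return IS_ENABLED(CONFIG_MY_FEATURE) && info->feature_size > 0;
    }

    static void do_work(struct my_fs_info *info)
    {
            if (my_feature_enabled(info))
                    handle_feature(info);   /* dead code when the option is off */
    }
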
index b80aeb7..ede43b6 100644 (file)
@@ -60,6 +60,22 @@ struct btrfs_truncate_control {
        bool clear_extent_range;
 };
 
+/*
+ * btrfs_inode_item stores flags in a u64, btrfs_inode stores them in two
+ * separate u32s. These two functions convert between the two representations.
+ */
+static inline u64 btrfs_inode_combine_flags(u32 flags, u32 ro_flags)
+{
+       return (flags | ((u64)ro_flags << 32));
+}
+
+static inline void btrfs_inode_split_flags(u64 inode_item_flags,
+                                          u32 *flags, u32 *ro_flags)
+{
+       *flags = (u32)inode_item_flags;
+       *ro_flags = (u32)(inode_item_flags >> 32);
+}
+
 int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
                               struct btrfs_root *root,
                               struct btrfs_truncate_control *control);
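
btrfs_inode_combine_flags() and btrfs_inode_split_flags() above pack the in-memory pair of 32-bit flag words into the single 64-bit field used by the on-disk inode item and unpack it again: flags occupy the low 32 bits, ro_flags the high 32 bits. A quick round-trip, purely illustrative:

    static void demo_inode_flags_roundtrip(void)
    {
            u32 flags = 0x10;       /* some ordinary inode flag */
            u32 ro_flags = 0x1;     /* some read-only compat flag */
            u64 on_disk = btrfs_inode_combine_flags(flags, ro_flags);
            u32 f, ro;

            /* on_disk == 0x0000000100000010 */
            btrfs_inode_split_flags(on_disk, &f, &ro);
            /* f == 0x10 and ro == 0x1 again */
    }
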
index 57d0700..dbbb672 100644 (file)
@@ -70,6 +70,7 @@
 #include "verity.h"
 #include "super.h"
 #include "orphan.h"
+#include "backref.h"
 
 struct btrfs_iget_args {
        u64 ino;
@@ -100,6 +101,18 @@ struct btrfs_rename_ctx {
        u64 index;
 };
 
+/*
+ * Used by data_reloc_print_warning_inode() to pass needed info for filename
+ * resolution and output of error message.
+ */
+struct data_reloc_warn {
+       struct btrfs_path path;
+       struct btrfs_fs_info *fs_info;
+       u64 extent_item_size;
+       u64 logical;
+       int mirror_num;
+};
+
 static const struct inode_operations btrfs_dir_inode_operations;
 static const struct inode_operations btrfs_symlink_inode_operations;
 static const struct inode_operations btrfs_special_inode_operations;
@@ -122,12 +135,198 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
                                       u64 ram_bytes, int compress_type,
                                       int type);
 
+static int data_reloc_print_warning_inode(u64 inum, u64 offset, u64 num_bytes,
+                                         u64 root, void *warn_ctx)
+{
+       struct data_reloc_warn *warn = warn_ctx;
+       struct btrfs_fs_info *fs_info = warn->fs_info;
+       struct extent_buffer *eb;
+       struct btrfs_inode_item *inode_item;
+       struct inode_fs_paths *ipath = NULL;
+       struct btrfs_root *local_root;
+       struct btrfs_key key;
+       unsigned int nofs_flag;
+       u32 nlink;
+       int ret;
+
+       local_root = btrfs_get_fs_root(fs_info, root, true);
+       if (IS_ERR(local_root)) {
+               ret = PTR_ERR(local_root);
+               goto err;
+       }
+
+       /* This makes the path point to (inum INODE_ITEM ioff). */
+       key.objectid = inum;
+       key.type = BTRFS_INODE_ITEM_KEY;
+       key.offset = 0;
+
+       ret = btrfs_search_slot(NULL, local_root, &key, &warn->path, 0, 0);
+       if (ret) {
+               btrfs_put_root(local_root);
+               btrfs_release_path(&warn->path);
+               goto err;
+       }
+
+       eb = warn->path.nodes[0];
+       inode_item = btrfs_item_ptr(eb, warn->path.slots[0], struct btrfs_inode_item);
+       nlink = btrfs_inode_nlink(eb, inode_item);
+       btrfs_release_path(&warn->path);
+
+       nofs_flag = memalloc_nofs_save();
+       ipath = init_ipath(4096, local_root, &warn->path);
+       memalloc_nofs_restore(nofs_flag);
+       if (IS_ERR(ipath)) {
+               btrfs_put_root(local_root);
+               ret = PTR_ERR(ipath);
+               ipath = NULL;
+               /*
+                * -ENOMEM, not a critical error, just output a generic error
+                * without filename.
+                */
+               btrfs_warn(fs_info,
+"checksum error at logical %llu mirror %u root %llu, inode %llu offset %llu",
+                          warn->logical, warn->mirror_num, root, inum, offset);
+               return ret;
+       }
+       ret = paths_from_inode(inum, ipath);
+       if (ret < 0)
+               goto err;
+
+       /*
+        * We deliberately ignore the fact that ipath might have been too
+        * small to hold all of the paths here.
+        */
+       for (int i = 0; i < ipath->fspath->elem_cnt; i++) {
+               btrfs_warn(fs_info,
+"checksum error at logical %llu mirror %u root %llu inode %llu offset %llu length %u links %u (path: %s)",
+                          warn->logical, warn->mirror_num, root, inum, offset,
+                          fs_info->sectorsize, nlink,
+                          (char *)(unsigned long)ipath->fspath->val[i]);
+       }
+
+       btrfs_put_root(local_root);
+       free_ipath(ipath);
+       return 0;
+
+err:
+       btrfs_warn(fs_info,
+"checksum error at logical %llu mirror %u root %llu inode %llu offset %llu, path resolving failed with ret=%d",
+                  warn->logical, warn->mirror_num, root, inum, offset, ret);
+
+       free_ipath(ipath);
+       return ret;
+}
+
+/*
+ * Do extra user-friendly error output (e.g. lookup all the affected files).
+ *
+ * If the backref lookup fails, fall back to the old, less detailed error
+ * message.
+ */
+static void print_data_reloc_error(const struct btrfs_inode *inode, u64 file_off,
+                                  const u8 *csum, const u8 *csum_expected,
+                                  int mirror_num)
+{
+       struct btrfs_fs_info *fs_info = inode->root->fs_info;
+       struct btrfs_path path = { 0 };
+       struct btrfs_key found_key = { 0 };
+       struct extent_buffer *eb;
+       struct btrfs_extent_item *ei;
+       const u32 csum_size = fs_info->csum_size;
+       u64 logical;
+       u64 flags;
+       u32 item_size;
+       int ret;
+
+       mutex_lock(&fs_info->reloc_mutex);
+       logical = btrfs_get_reloc_bg_bytenr(fs_info);
+       mutex_unlock(&fs_info->reloc_mutex);
+
+       if (logical == U64_MAX) {
+               btrfs_warn_rl(fs_info, "has data reloc tree but no running relocation");
+               btrfs_warn_rl(fs_info,
+"csum failed root %lld ino %llu off %llu csum " CSUM_FMT " expected csum " CSUM_FMT " mirror %d",
+                       inode->root->root_key.objectid, btrfs_ino(inode), file_off,
+                       CSUM_FMT_VALUE(csum_size, csum),
+                       CSUM_FMT_VALUE(csum_size, csum_expected),
+                       mirror_num);
+               return;
+       }
+
+       logical += file_off;
+       btrfs_warn_rl(fs_info,
+"csum failed root %lld ino %llu off %llu logical %llu csum " CSUM_FMT " expected csum " CSUM_FMT " mirror %d",
+                       inode->root->root_key.objectid,
+                       btrfs_ino(inode), file_off, logical,
+                       CSUM_FMT_VALUE(csum_size, csum),
+                       CSUM_FMT_VALUE(csum_size, csum_expected),
+                       mirror_num);
+
+       ret = extent_from_logical(fs_info, logical, &path, &found_key, &flags);
+       if (ret < 0) {
+               btrfs_err_rl(fs_info, "failed to lookup extent item for logical %llu: %d",
+                            logical, ret);
+               return;
+       }
+       eb = path.nodes[0];
+       ei = btrfs_item_ptr(eb, path.slots[0], struct btrfs_extent_item);
+       item_size = btrfs_item_size(eb, path.slots[0]);
+       if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
+               unsigned long ptr = 0;
+               u64 ref_root;
+               u8 ref_level;
+
+               while (true) {
+                       ret = tree_backref_for_extent(&ptr, eb, &found_key, ei,
+                                                     item_size, &ref_root,
+                                                     &ref_level);
+                       if (ret < 0) {
+                               btrfs_warn_rl(fs_info,
+                               "failed to resolve tree backref for logical %llu: %d",
+                                             logical, ret);
+                               break;
+                       }
+                       if (ret > 0)
+                               break;
+
+                       btrfs_warn_rl(fs_info,
+"csum error at logical %llu mirror %u: metadata %s (level %d) in tree %llu",
+                               logical, mirror_num,
+                               (ref_level ? "node" : "leaf"),
+                               ref_level, ref_root);
+               }
+               btrfs_release_path(&path);
+       } else {
+               struct btrfs_backref_walk_ctx ctx = { 0 };
+               struct data_reloc_warn reloc_warn = { 0 };
+
+               btrfs_release_path(&path);
+
+               ctx.bytenr = found_key.objectid;
+               ctx.extent_item_pos = logical - found_key.objectid;
+               ctx.fs_info = fs_info;
+
+               reloc_warn.logical = logical;
+               reloc_warn.extent_item_size = found_key.offset;
+               reloc_warn.mirror_num = mirror_num;
+               reloc_warn.fs_info = fs_info;
+
+               iterate_extent_inodes(&ctx, true,
+                                     data_reloc_print_warning_inode, &reloc_warn);
+       }
+}
+
 static void __cold btrfs_print_data_csum_error(struct btrfs_inode *inode,
                u64 logical_start, u8 *csum, u8 *csum_expected, int mirror_num)
 {
        struct btrfs_root *root = inode->root;
        const u32 csum_size = root->fs_info->csum_size;
 
+       /* For data reloc tree, it's better to do a backref lookup instead. */
+       if (root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID)
+               return print_data_reloc_error(inode, logical_start, csum,
+                                             csum_expected, mirror_num);
+
        /* Output without objectid, which is more meaningful */
        if (root->root_key.objectid >= BTRFS_LAST_FREE_OBJECTID) {
                btrfs_warn_rl(root->fs_info,
@@ -636,6 +835,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 {
        struct btrfs_inode *inode = async_chunk->inode;
        struct btrfs_fs_info *fs_info = inode->root->fs_info;
+       struct address_space *mapping = inode->vfs_inode.i_mapping;
        u64 blocksize = fs_info->sectorsize;
        u64 start = async_chunk->start;
        u64 end = async_chunk->end;
@@ -750,7 +950,7 @@ again:
                /* Compression level is applied here and only here */
                ret = btrfs_compress_pages(
                        compress_type | (fs_info->compress_level << 4),
-                                          inode->vfs_inode.i_mapping, start,
+                                          mapping, start,
                                           pages,
                                           &nr_pages,
                                           &total_in,
@@ -793,9 +993,9 @@ cont:
                        unsigned long clear_flags = EXTENT_DELALLOC |
                                EXTENT_DELALLOC_NEW | EXTENT_DEFRAG |
                                EXTENT_DO_ACCOUNTING;
-                       unsigned long page_error_op;
 
-                       page_error_op = ret < 0 ? PAGE_SET_ERROR : 0;
+                       if (ret < 0)
+                               mapping_set_error(mapping, -EIO);
 
                        /*
                         * inline extent creation worked or returned error,
@@ -812,7 +1012,6 @@ cont:
                                                     clear_flags,
                                                     PAGE_UNLOCK |
                                                     PAGE_START_WRITEBACK |
-                                                    page_error_op |
                                                     PAGE_END_WRITEBACK);
 
                        /*
@@ -934,6 +1133,12 @@ static int submit_uncompressed_range(struct btrfs_inode *inode,
        unsigned long nr_written = 0;
        int page_started = 0;
        int ret;
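+       /*
+        * Writeback control for extent_write_locked_range() below, flushing
+        * the whole range synchronously.
+        */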
+       struct writeback_control wbc = {
+               .sync_mode              = WB_SYNC_ALL,
+               .range_start            = start,
+               .range_end              = end,
+               .no_cgroup_owner        = 1,
+       };
 
        /*
         * Call cow_file_range() to run the delalloc range directly, since we
@@ -954,8 +1159,6 @@ static int submit_uncompressed_range(struct btrfs_inode *inode,
                        const u64 page_start = page_offset(locked_page);
                        const u64 page_end = page_start + PAGE_SIZE - 1;
 
-                       btrfs_page_set_error(inode->root->fs_info, locked_page,
-                                            page_start, PAGE_SIZE);
                        set_page_writeback(locked_page);
                        end_page_writeback(locked_page);
                        end_extent_writepage(locked_page, ret, page_start, page_end);
@@ -965,7 +1168,10 @@ static int submit_uncompressed_range(struct btrfs_inode *inode,
        }
 
        /* All pages will be unlocked, including @locked_page */
-       return extent_write_locked_range(&inode->vfs_inode, start, end);
+       wbc_attach_fdatawrite_inode(&wbc, &inode->vfs_inode);
+       ret = extent_write_locked_range(&inode->vfs_inode, start, end, &wbc);
+       wbc_detach_inode(&wbc);
+       return ret;
 }
 
 static int submit_one_async_extent(struct btrfs_inode *inode,
@@ -976,6 +1182,7 @@ static int submit_one_async_extent(struct btrfs_inode *inode,
        struct extent_io_tree *io_tree = &inode->io_tree;
        struct btrfs_root *root = inode->root;
        struct btrfs_fs_info *fs_info = root->fs_info;
+       struct btrfs_ordered_extent *ordered;
        struct btrfs_key ins;
        struct page *locked_page = NULL;
        struct extent_map *em;
@@ -1037,7 +1244,7 @@ static int submit_one_async_extent(struct btrfs_inode *inode,
        }
        free_extent_map(em);
 
-       ret = btrfs_add_ordered_extent(inode, start,            /* file_offset */
+       ordered = btrfs_alloc_ordered_extent(inode, start,      /* file_offset */
                                       async_extent->ram_size,  /* num_bytes */
                                       async_extent->ram_size,  /* ram_bytes */
                                       ins.objectid,            /* disk_bytenr */
@@ -1045,8 +1252,9 @@ static int submit_one_async_extent(struct btrfs_inode *inode,
                                       0,                       /* offset */
                                       1 << BTRFS_ORDERED_COMPRESSED,
                                       async_extent->compress_type);
-       if (ret) {
+       if (IS_ERR(ordered)) {
                btrfs_drop_extent_map_range(inode, start, end, false);
+               ret = PTR_ERR(ordered);
                goto out_free_reserve;
        }
        btrfs_dec_block_group_reservations(fs_info, ins.objectid);
@@ -1055,11 +1263,7 @@ static int submit_one_async_extent(struct btrfs_inode *inode,
        extent_clear_unlock_delalloc(inode, start, end,
                        NULL, EXTENT_LOCKED | EXTENT_DELALLOC,
                        PAGE_UNLOCK | PAGE_START_WRITEBACK);
-
-       btrfs_submit_compressed_write(inode, start,     /* file_offset */
-                           async_extent->ram_size,     /* num_bytes */
-                           ins.objectid,               /* disk_bytenr */
-                           ins.offset,                 /* compressed_len */
+       btrfs_submit_compressed_write(ordered,
                            async_extent->pages,        /* compressed_pages */
                            async_extent->nr_pages,
                            async_chunk->write_flags, true);
@@ -1074,12 +1278,13 @@ out_free_reserve:
        btrfs_dec_block_group_reservations(fs_info, ins.objectid);
        btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 1);
 out_free:
+       mapping_set_error(inode->vfs_inode.i_mapping, -EIO);
        extent_clear_unlock_delalloc(inode, start, end,
                                     NULL, EXTENT_LOCKED | EXTENT_DELALLOC |
                                     EXTENT_DELALLOC_NEW |
                                     EXTENT_DEFRAG | EXTENT_DO_ACCOUNTING,
                                     PAGE_UNLOCK | PAGE_START_WRITEBACK |
-                                    PAGE_END_WRITEBACK | PAGE_SET_ERROR);
+                                    PAGE_END_WRITEBACK);
        free_async_extent_pages(async_extent);
        goto done;
 }
@@ -1287,6 +1492,8 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
                min_alloc_size = fs_info->sectorsize;
 
        while (num_bytes > 0) {
+               struct btrfs_ordered_extent *ordered;
+
                cur_alloc_size = num_bytes;
                ret = btrfs_reserve_extent(root, cur_alloc_size, cur_alloc_size,
                                           min_alloc_size, 0, alloc_hint,
@@ -1311,16 +1518,18 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
                }
                free_extent_map(em);
 
-               ret = btrfs_add_ordered_extent(inode, start, ram_size, ram_size,
-                                              ins.objectid, cur_alloc_size, 0,
-                                              1 << BTRFS_ORDERED_REGULAR,
-                                              BTRFS_COMPRESS_NONE);
-               if (ret)
+               ordered = btrfs_alloc_ordered_extent(inode, start, ram_size,
+                                       ram_size, ins.objectid, cur_alloc_size,
+                                       0, 1 << BTRFS_ORDERED_REGULAR,
+                                       BTRFS_COMPRESS_NONE);
+               if (IS_ERR(ordered)) {
+                       ret = PTR_ERR(ordered);
                        goto out_drop_extent_cache;
+               }
 
                if (btrfs_is_data_reloc_root(root)) {
-                       ret = btrfs_reloc_clone_csums(inode, start,
-                                                     cur_alloc_size);
+                       ret = btrfs_reloc_clone_csums(ordered);
+
                        /*
                         * Only drop cache here, and process as normal.
                         *
@@ -1337,6 +1546,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
                                                            start + ram_size - 1,
                                                            false);
                }
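+               /* Release the caller's reference, the ordered tree keeps its own. */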
+               btrfs_put_ordered_extent(ordered);
 
                btrfs_dec_block_group_reservations(fs_info, ins.objectid);
 
@@ -1494,7 +1704,7 @@ static noinline void async_cow_submit(struct btrfs_work *work)
         * ->inode could be NULL if async_chunk_start has failed to compress,
         * in which case we don't have anything to submit, yet we need to
         * always adjust ->async_delalloc_pages as it's paired with the init
-        * happening in cow_file_range_async
+        * happening in run_delalloc_compressed
         */
        if (async_chunk->inode)
                submit_compressed_extents(async_chunk);
@@ -1521,58 +1731,36 @@ static noinline void async_cow_free(struct btrfs_work *work)
                kvfree(async_cow);
 }
 
-static int cow_file_range_async(struct btrfs_inode *inode,
-                               struct writeback_control *wbc,
-                               struct page *locked_page,
-                               u64 start, u64 end, int *page_started,
-                               unsigned long *nr_written)
+static bool run_delalloc_compressed(struct btrfs_inode *inode,
+                                   struct writeback_control *wbc,
+                                   struct page *locked_page,
+                                   u64 start, u64 end, int *page_started,
+                                   unsigned long *nr_written)
 {
        struct btrfs_fs_info *fs_info = inode->root->fs_info;
        struct cgroup_subsys_state *blkcg_css = wbc_blkcg_css(wbc);
        struct async_cow *ctx;
        struct async_chunk *async_chunk;
        unsigned long nr_pages;
-       u64 cur_end;
        u64 num_chunks = DIV_ROUND_UP(end - start, SZ_512K);
        int i;
-       bool should_compress;
        unsigned nofs_flag;
        const blk_opf_t write_flags = wbc_to_write_flags(wbc);
 
-       unlock_extent(&inode->io_tree, start, end, NULL);
-
-       if (inode->flags & BTRFS_INODE_NOCOMPRESS &&
-           !btrfs_test_opt(fs_info, FORCE_COMPRESS)) {
-               num_chunks = 1;
-               should_compress = false;
-       } else {
-               should_compress = true;
-       }
-
        nofs_flag = memalloc_nofs_save();
        ctx = kvmalloc(struct_size(ctx, chunks, num_chunks), GFP_KERNEL);
        memalloc_nofs_restore(nofs_flag);
+       if (!ctx)
+               return false;
 
-       if (!ctx) {
-               unsigned clear_bits = EXTENT_LOCKED | EXTENT_DELALLOC |
-                       EXTENT_DELALLOC_NEW | EXTENT_DEFRAG |
-                       EXTENT_DO_ACCOUNTING;
-               unsigned long page_ops = PAGE_UNLOCK | PAGE_START_WRITEBACK |
-                                        PAGE_END_WRITEBACK | PAGE_SET_ERROR;
-
-               extent_clear_unlock_delalloc(inode, start, end, locked_page,
-                                            clear_bits, page_ops);
-               return -ENOMEM;
-       }
+       unlock_extent(&inode->io_tree, start, end, NULL);
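+       /* Flag the inode as having async extents before queuing the chunks. */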
+       set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags);
 
        async_chunk = ctx->chunks;
        atomic_set(&ctx->num_chunks, num_chunks);
 
        for (i = 0; i < num_chunks; i++) {
-               if (should_compress)
-                       cur_end = min(end, start + SZ_512K - 1);
-               else
-                       cur_end = end;
+               u64 cur_end = min(end, start + SZ_512K - 1);
 
                /*
                 * igrab is called higher up in the call chain, take only the
@@ -1633,13 +1821,14 @@ static int cow_file_range_async(struct btrfs_inode *inode,
                start = cur_end + 1;
        }
        *page_started = 1;
-       return 0;
+       return true;
 }
 
 static noinline int run_delalloc_zoned(struct btrfs_inode *inode,
                                       struct page *locked_page, u64 start,
                                       u64 end, int *page_started,
-                                      unsigned long *nr_written)
+                                      unsigned long *nr_written,
+                                      struct writeback_control *wbc)
 {
        u64 done_offset = end;
        int ret;
@@ -1671,8 +1860,8 @@ static noinline int run_delalloc_zoned(struct btrfs_inode *inode,
                        account_page_redirty(locked_page);
                }
                locked_page_done = true;
-               extent_write_locked_range(&inode->vfs_inode, start, done_offset);
-
+               extent_write_locked_range(&inode->vfs_inode, start, done_offset,
+                                         wbc);
                start = done_offset + 1;
        }
 
@@ -1864,7 +2053,7 @@ static int can_nocow_file_extent(struct btrfs_path *path,
 
        ret = btrfs_cross_ref_exist(root, btrfs_ino(inode),
                                    key->offset - args->extent_offset,
-                                   args->disk_bytenr, false, path);
+                                   args->disk_bytenr, args->strict, path);
        WARN_ON_ONCE(ret > 0 && is_freespace_inode);
        if (ret != 0)
                goto out;
@@ -1947,6 +2136,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
        nocow_args.writeback_path = true;
 
        while (1) {
+               struct btrfs_ordered_extent *ordered;
                struct btrfs_key found_key;
                struct btrfs_file_extent_item *fi;
                struct extent_buffer *leaf;
@@ -1954,6 +2144,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
                u64 ram_bytes;
                u64 nocow_end;
                int extent_type;
+               bool is_prealloc;
 
                nocow = false;
 
@@ -2092,8 +2283,8 @@ out_check:
                }
 
                nocow_end = cur_offset + nocow_args.num_bytes - 1;
-
-               if (extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
+               is_prealloc = extent_type == BTRFS_FILE_EXTENT_PREALLOC;
+               if (is_prealloc) {
                        u64 orig_start = found_key.offset - nocow_args.extent_offset;
                        struct extent_map *em;
 
@@ -2109,29 +2300,22 @@ out_check:
                                goto error;
                        }
                        free_extent_map(em);
-                       ret = btrfs_add_ordered_extent(inode,
-                                       cur_offset, nocow_args.num_bytes,
-                                       nocow_args.num_bytes,
-                                       nocow_args.disk_bytenr,
-                                       nocow_args.num_bytes, 0,
-                                       1 << BTRFS_ORDERED_PREALLOC,
-                                       BTRFS_COMPRESS_NONE);
-                       if (ret) {
+               }
+
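+               /* Create the ordered extent as either a PREALLOC or a plain NOCOW one. */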
+               ordered = btrfs_alloc_ordered_extent(inode, cur_offset,
+                               nocow_args.num_bytes, nocow_args.num_bytes,
+                               nocow_args.disk_bytenr, nocow_args.num_bytes, 0,
+                               is_prealloc
+                               ? (1 << BTRFS_ORDERED_PREALLOC)
+                               : (1 << BTRFS_ORDERED_NOCOW),
+                               BTRFS_COMPRESS_NONE);
+               if (IS_ERR(ordered)) {
+                       if (is_prealloc) {
                                btrfs_drop_extent_map_range(inode, cur_offset,
                                                            nocow_end, false);
-                               goto error;
                        }
-               } else {
-                       ret = btrfs_add_ordered_extent(inode, cur_offset,
-                                                      nocow_args.num_bytes,
-                                                      nocow_args.num_bytes,
-                                                      nocow_args.disk_bytenr,
-                                                      nocow_args.num_bytes,
-                                                      0,
-                                                      1 << BTRFS_ORDERED_NOCOW,
-                                                      BTRFS_COMPRESS_NONE);
-                       if (ret)
-                               goto error;
+                       ret = PTR_ERR(ordered);
+                       goto error;
                }
 
                if (nocow) {
@@ -2145,8 +2329,8 @@ out_check:
                         * extent_clear_unlock_delalloc() in error handler
                         * from freeing metadata of created ordered extent.
                         */
-                       ret = btrfs_reloc_clone_csums(inode, cur_offset,
-                                                     nocow_args.num_bytes);
+                       ret = btrfs_reloc_clone_csums(ordered);
+               btrfs_put_ordered_extent(ordered);
 
                extent_clear_unlock_delalloc(inode, cur_offset, nocow_end,
                                             locked_page, EXTENT_LOCKED |
@@ -2214,7 +2398,7 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
                u64 start, u64 end, int *page_started, unsigned long *nr_written,
                struct writeback_control *wbc)
 {
-       int ret;
+       int ret = 0;
        const bool zoned = btrfs_is_zoned(inode->root->fs_info);
 
        /*
@@ -2235,19 +2419,23 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
                ASSERT(!zoned || btrfs_is_data_reloc_root(inode->root));
                ret = run_delalloc_nocow(inode, locked_page, start, end,
                                         page_started, nr_written);
-       } else if (!btrfs_inode_can_compress(inode) ||
-                  !inode_need_compress(inode, start, end)) {
-               if (zoned)
-                       ret = run_delalloc_zoned(inode, locked_page, start, end,
-                                                page_started, nr_written);
-               else
-                       ret = cow_file_range(inode, locked_page, start, end,
-                                            page_started, nr_written, 1, NULL);
-       } else {
-               set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags);
-               ret = cow_file_range_async(inode, wbc, locked_page, start, end,
-                                          page_started, nr_written);
+               goto out;
        }
+
+       if (btrfs_inode_can_compress(inode) &&
+           inode_need_compress(inode, start, end) &&
+           run_delalloc_compressed(inode, wbc, locked_page, start,
+                                   end, page_started, nr_written))
+               goto out;
+
+       if (zoned)
+               ret = run_delalloc_zoned(inode, locked_page, start, end,
+                                        page_started, nr_written, wbc);
+       else
+               ret = cow_file_range(inode, locked_page, start, end,
+                                    page_started, nr_written, 1, NULL);
+
+out:
        ASSERT(ret <= 0);
        if (ret)
                btrfs_cleanup_ordered_extents(inode, locked_page, start,
@@ -2515,125 +2703,42 @@ void btrfs_clear_delalloc_extent(struct btrfs_inode *inode,
        }
 }
 
-/*
- * Split off the first pre bytes from the extent_map at [start, start + len]
- *
- * This function is intended to be used only for extract_ordered_extent().
- */
-static int split_extent_map(struct btrfs_inode *inode, u64 start, u64 len, u64 pre)
-{
-       struct extent_map_tree *em_tree = &inode->extent_tree;
-       struct extent_map *em;
-       struct extent_map *split_pre = NULL;
-       struct extent_map *split_mid = NULL;
-       int ret = 0;
-       unsigned long flags;
-
-       ASSERT(pre != 0);
-       ASSERT(pre < len);
-
-       split_pre = alloc_extent_map();
-       if (!split_pre)
-               return -ENOMEM;
-       split_mid = alloc_extent_map();
-       if (!split_mid) {
-               ret = -ENOMEM;
-               goto out_free_pre;
-       }
-
-       lock_extent(&inode->io_tree, start, start + len - 1, NULL);
-       write_lock(&em_tree->lock);
-       em = lookup_extent_mapping(em_tree, start, len);
-       if (!em) {
-               ret = -EIO;
-               goto out_unlock;
-       }
-
-       ASSERT(em->len == len);
-       ASSERT(!test_bit(EXTENT_FLAG_COMPRESSED, &em->flags));
-       ASSERT(em->block_start < EXTENT_MAP_LAST_BYTE);
-       ASSERT(test_bit(EXTENT_FLAG_PINNED, &em->flags));
-       ASSERT(!test_bit(EXTENT_FLAG_LOGGING, &em->flags));
-       ASSERT(!list_empty(&em->list));
-
-       flags = em->flags;
-       clear_bit(EXTENT_FLAG_PINNED, &em->flags);
-
-       /* First, replace the em with a new extent_map starting from * em->start */
-       split_pre->start = em->start;
-       split_pre->len = pre;
-       split_pre->orig_start = split_pre->start;
-       split_pre->block_start = em->block_start;
-       split_pre->block_len = split_pre->len;
-       split_pre->orig_block_len = split_pre->block_len;
-       split_pre->ram_bytes = split_pre->len;
-       split_pre->flags = flags;
-       split_pre->compress_type = em->compress_type;
-       split_pre->generation = em->generation;
-
-       replace_extent_mapping(em_tree, em, split_pre, 1);
-
-       /*
-        * Now we only have an extent_map at:
-        *     [em->start, em->start + pre]
-        */
-
-       /* Insert the middle extent_map. */
-       split_mid->start = em->start + pre;
-       split_mid->len = em->len - pre;
-       split_mid->orig_start = split_mid->start;
-       split_mid->block_start = em->block_start + pre;
-       split_mid->block_len = split_mid->len;
-       split_mid->orig_block_len = split_mid->block_len;
-       split_mid->ram_bytes = split_mid->len;
-       split_mid->flags = flags;
-       split_mid->compress_type = em->compress_type;
-       split_mid->generation = em->generation;
-       add_extent_mapping(em_tree, split_mid, 1);
-
-       /* Once for us */
-       free_extent_map(em);
-       /* Once for the tree */
-       free_extent_map(em);
-
-out_unlock:
-       write_unlock(&em_tree->lock);
-       unlock_extent(&inode->io_tree, start, start + len - 1, NULL);
-       free_extent_map(split_mid);
-out_free_pre:
-       free_extent_map(split_pre);
-       return ret;
-}
-
-int btrfs_extract_ordered_extent(struct btrfs_bio *bbio,
-                                struct btrfs_ordered_extent *ordered)
+static int btrfs_extract_ordered_extent(struct btrfs_bio *bbio,
+                                       struct btrfs_ordered_extent *ordered)
 {
        u64 start = (u64)bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT;
        u64 len = bbio->bio.bi_iter.bi_size;
-       struct btrfs_inode *inode = bbio->inode;
-       u64 ordered_len = ordered->num_bytes;
-       int ret = 0;
+       struct btrfs_ordered_extent *new;
+       int ret;
 
        /* Must always be called for the beginning of an ordered extent. */
        if (WARN_ON_ONCE(start != ordered->disk_bytenr))
                return -EINVAL;
 
        /* No need to split if the ordered extent covers the entire bio. */
-       if (ordered->disk_num_bytes == len)
+       if (ordered->disk_num_bytes == len) {
+               refcount_inc(&ordered->refs);
+               bbio->ordered = ordered;
                return 0;
-
-       ret = btrfs_split_ordered_extent(ordered, len);
-       if (ret)
-               return ret;
+       }
 
        /*
         * Don't split the extent_map for NOCOW extents, as we're writing into
         * a pre-existing one.
         */
-       if (test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags))
-               return 0;
+       if (!test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags)) {
+               ret = split_extent_map(bbio->inode, bbio->file_offset,
+                                      ordered->num_bytes, len,
+                                      ordered->disk_bytenr);
+               if (ret)
+                       return ret;
+       }
 
-       return split_extent_map(inode, bbio->file_offset, ordered_len, len);
+       new = btrfs_split_ordered_extent(ordered, len);
+       if (IS_ERR(new))
+               return PTR_ERR(new);
+       bbio->ordered = new;
+       return 0;
 }
 
 /*
@@ -2651,7 +2756,7 @@ static int add_pending_csums(struct btrfs_trans_handle *trans,
                trans->adding_csums = true;
                if (!csum_root)
                        csum_root = btrfs_csum_root(trans->fs_info,
-                                                   sum->bytenr);
+                                                   sum->logical);
                ret = btrfs_csum_file_blocks(trans, csum_root, sum);
                trans->adding_csums = false;
                if (ret)
@@ -2689,8 +2794,7 @@ static int btrfs_find_new_delalloc_bytes(struct btrfs_inode *inode,
 
                ret = set_extent_bit(&inode->io_tree, search_start,
                                     search_start + em_len - 1,
-                                    EXTENT_DELALLOC_NEW, cached_state,
-                                    GFP_NOFS);
+                                    EXTENT_DELALLOC_NEW, cached_state);
 next:
                search_start = extent_map_end(em);
                free_extent_map(em);
@@ -2723,8 +2827,8 @@ int btrfs_set_extent_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
                        return ret;
        }
 
-       return set_extent_delalloc(&inode->io_tree, start, end, extra_bits,
-                                  cached_state);
+       return set_extent_bit(&inode->io_tree, start, end,
+                             EXTENT_DELALLOC | extra_bits, cached_state);
 }
 
 /* see btrfs_writepage_start_hook for details on why this is required */
@@ -2847,7 +2951,6 @@ out_page:
                mapping_set_error(page->mapping, ret);
                end_extent_writepage(page, ret, page_start, page_end);
                clear_page_dirty_for_io(page);
-               SetPageError(page);
        }
        btrfs_page_clear_checked(inode->root->fs_info, page, page_start, PAGE_SIZE);
        unlock_page(page);
@@ -3068,7 +3171,7 @@ static int insert_ordered_extent_file_extent(struct btrfs_trans_handle *trans,
  * an ordered extent if the range of bytes in the file it covers is
  * fully written.
  */
-int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
+int btrfs_finish_one_ordered(struct btrfs_ordered_extent *ordered_extent)
 {
        struct btrfs_inode *inode = BTRFS_I(ordered_extent->inode);
        struct btrfs_root *root = inode->root;
@@ -3103,12 +3206,9 @@ int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
                goto out;
        }
 
-       /* A valid ->physical implies a write on a sequential zone. */
-       if (ordered_extent->physical != (u64)-1) {
-               btrfs_rewrite_logical_zoned(ordered_extent);
+       if (btrfs_is_zoned(fs_info))
                btrfs_zone_finish_endio(fs_info, ordered_extent->disk_bytenr,
                                        ordered_extent->disk_num_bytes);
-       }
 
        if (test_bit(BTRFS_ORDERED_TRUNCATED, &ordered_extent->flags)) {
                truncated = true;
@@ -3276,6 +3376,14 @@ out:
        return ret;
 }
 
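+/*
+ * Finish an ordered extent: do any zoned specific completion first (unless
+ * the IO failed), then the common ordered extent finishing.
+ */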
+int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered)
+{
+       if (btrfs_is_zoned(btrfs_sb(ordered->inode->i_sb)) &&
+           !test_bit(BTRFS_ORDERED_IOERR, &ordered->flags))
+               btrfs_finish_ordered_zoned(ordered);
+       return btrfs_finish_one_ordered(ordered);
+}
+
 void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode,
                                          struct page *page, u64 start,
                                          u64 end, bool uptodate)
@@ -4223,7 +4331,7 @@ static int btrfs_unlink(struct inode *dir, struct dentry *dentry)
        }
 
        btrfs_record_unlink_dir(trans, BTRFS_I(dir), BTRFS_I(d_inode(dentry)),
-                       0);
+                               false);
 
        ret = btrfs_unlink_inode(trans, BTRFS_I(dir), BTRFS_I(d_inode(dentry)),
                                 &fname.disk_name);
@@ -4798,7 +4906,7 @@ again:
 
        if (only_release_metadata)
                set_extent_bit(&inode->io_tree, block_start, block_end,
-                              EXTENT_NORESERVE, NULL, GFP_NOFS);
+                              EXTENT_NORESERVE, NULL);
 
 out_unlock:
        if (ret) {
@@ -7261,7 +7369,7 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 static int btrfs_get_blocks_direct_write(struct extent_map **map,
                                         struct inode *inode,
                                         struct btrfs_dio_data *dio_data,
-                                        u64 start, u64 len,
+                                        u64 start, u64 *lenp,
                                         unsigned int iomap_flags)
 {
        const bool nowait = (iomap_flags & IOMAP_NOWAIT);
@@ -7272,6 +7380,7 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
        struct btrfs_block_group *bg;
        bool can_nocow = false;
        bool space_reserved = false;
+       u64 len = *lenp;
        u64 prev_len;
        int ret = 0;
 
@@ -7342,15 +7451,19 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
                free_extent_map(em);
                *map = NULL;
 
-               if (nowait)
-                       return -EAGAIN;
+               if (nowait) {
+                       ret = -EAGAIN;
+                       goto out;
+               }
 
                /*
                 * If we could not allocate data space before locking the file
                 * range and we can't do a NOCOW write, then we have to fail.
                 */
-               if (!dio_data->data_space_reserved)
-                       return -ENOSPC;
+               if (!dio_data->data_space_reserved) {
+                       ret = -ENOSPC;
+                       goto out;
+               }
 
                /*
                 * We have to COW and we have already reserved data space before,
@@ -7391,6 +7504,7 @@ out:
                btrfs_delalloc_release_extents(BTRFS_I(inode), len);
                btrfs_delalloc_release_metadata(BTRFS_I(inode), len, true);
        }
+       *lenp = len;
        return ret;
 }
 
@@ -7567,7 +7681,7 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 
        if (write) {
                ret = btrfs_get_blocks_direct_write(&em, inode, dio_data,
-                                                   start, len, flags);
+                                                   start, &len, flags);
                if (ret < 0)
                        goto unlock_err;
                unlock_extents = true;
@@ -7661,8 +7775,8 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
                pos += submitted;
                length -= submitted;
                if (write)
-                       btrfs_mark_ordered_io_finished(BTRFS_I(inode), NULL,
-                                                      pos, length, false);
+                       btrfs_finish_ordered_extent(dio_data->ordered, NULL,
+                                                   pos, length, false);
                else
                        unlock_extent(&BTRFS_I(inode)->io_tree, pos,
                                      pos + length - 1, NULL);
@@ -7692,12 +7806,14 @@ static void btrfs_dio_end_io(struct btrfs_bio *bbio)
                           dip->file_offset, dip->bytes, bio->bi_status);
        }
 
-       if (btrfs_op(bio) == BTRFS_MAP_WRITE)
-               btrfs_mark_ordered_io_finished(inode, NULL, dip->file_offset,
-                                              dip->bytes, !bio->bi_status);
-       else
+       if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
+               btrfs_finish_ordered_extent(bbio->ordered, NULL,
+                                           dip->file_offset, dip->bytes,
+                                           !bio->bi_status);
+       } else {
                unlock_extent(&inode->io_tree, dip->file_offset,
                              dip->file_offset + dip->bytes - 1, NULL);
+       }
 
        bbio->bio.bi_private = bbio->private;
        iomap_dio_bio_end_io(bio);
@@ -7733,7 +7849,8 @@ static void btrfs_dio_submit_io(const struct iomap_iter *iter, struct bio *bio,
 
                ret = btrfs_extract_ordered_extent(bbio, dio_data->ordered);
                if (ret) {
-                       btrfs_bio_end_io(bbio, errno_to_blk_status(ret));
+                       bbio->bio.bi_status = errno_to_blk_status(ret);
+                       btrfs_dio_end_io(bbio);
                        return;
                }
        }
@@ -8227,7 +8344,7 @@ static int btrfs_truncate(struct btrfs_inode *inode, bool skip_writeback)
        int ret;
        struct btrfs_trans_handle *trans;
        u64 mask = fs_info->sectorsize - 1;
-       u64 min_size = btrfs_calc_metadata_size(fs_info, 1);
+       const u64 min_size = btrfs_calc_metadata_size(fs_info, 1);
 
        if (!skip_writeback) {
                ret = btrfs_wait_ordered_range(&inode->vfs_inode,
@@ -8284,7 +8401,15 @@ static int btrfs_truncate(struct btrfs_inode *inode, bool skip_writeback)
        /* Migrate the slack space for the truncate to our reserve */
        ret = btrfs_block_rsv_migrate(&fs_info->trans_block_rsv, rsv,
                                      min_size, false);
-       BUG_ON(ret);
+       /*
+        * We have reserved 2 metadata units when we started the transaction and
+        * min_size matches 1 unit, so this should never fail, but if it does,
+        * it's not critical, we just fail the truncation.
+        */
+       if (WARN_ON(ret)) {
+               btrfs_end_transaction(trans);
+               goto out;
+       }
 
        trans->block_rsv = rsv;
 
@@ -8332,7 +8457,14 @@ static int btrfs_truncate(struct btrfs_inode *inode, bool skip_writeback)
                btrfs_block_rsv_release(fs_info, rsv, -1, NULL);
                ret = btrfs_block_rsv_migrate(&fs_info->trans_block_rsv,
                                              rsv, min_size, false);
-               BUG_ON(ret);    /* shouldn't happen */
+               /*
+                * We have reserved 2 metadata units when we started the
+                * transaction and min_size matches 1 unit, so this should never
+                * fail, but if it does, it's not critical, we just fail the truncation.
+                */
+               if (WARN_ON(ret))
+                       break;
+
                trans->block_rsv = rsv;
        }
 
@@ -8459,7 +8591,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
        ei->io_tree.inode = ei;
        extent_io_tree_init(fs_info, &ei->file_extent_tree,
                            IO_TREE_INODE_FILE_EXTENT);
-       atomic_set(&ei->sync_writers, 0);
        mutex_init(&ei->log_mutex);
        btrfs_ordered_inode_tree_init(&ei->ordered_tree);
        INIT_LIST_HEAD(&ei->delalloc_inodes);
@@ -8630,7 +8761,7 @@ static int btrfs_getattr(struct mnt_idmap *idmap,
        inode_bytes = inode_get_bytes(inode);
        spin_unlock(&BTRFS_I(inode)->lock);
        stat->blocks = (ALIGN(inode_bytes, blocksize) +
-                       ALIGN(delalloc_bytes, blocksize)) >> 9;
+                       ALIGN(delalloc_bytes, blocksize)) >> SECTOR_SHIFT;
        return 0;
 }
 
@@ -8786,9 +8917,9 @@ static int btrfs_rename_exchange(struct inode *old_dir,
 
        if (old_dentry->d_parent != new_dentry->d_parent) {
                btrfs_record_unlink_dir(trans, BTRFS_I(old_dir),
-                               BTRFS_I(old_inode), 1);
+                                       BTRFS_I(old_inode), true);
                btrfs_record_unlink_dir(trans, BTRFS_I(new_dir),
-                               BTRFS_I(new_inode), 1);
+                                       BTRFS_I(new_inode), true);
        }
 
        /* src is a subvolume */
@@ -9054,7 +9185,7 @@ static int btrfs_rename(struct mnt_idmap *idmap,
 
        if (old_dentry->d_parent != new_dentry->d_parent)
                btrfs_record_unlink_dir(trans, BTRFS_I(old_dir),
-                               BTRFS_I(old_inode), 1);
+                                       BTRFS_I(old_inode), true);
 
        if (unlikely(old_ino == BTRFS_FIRST_FREE_OBJECTID)) {
                ret = btrfs_unlink_subvol(trans, BTRFS_I(old_dir), old_dentry);
@@ -10161,6 +10292,7 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
        struct extent_io_tree *io_tree = &inode->io_tree;
        struct extent_changeset *data_reserved = NULL;
        struct extent_state *cached_state = NULL;
+       struct btrfs_ordered_extent *ordered;
        int compression;
        size_t orig_count;
        u64 start, end;
@@ -10337,14 +10469,15 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
        }
        free_extent_map(em);
 
-       ret = btrfs_add_ordered_extent(inode, start, num_bytes, ram_bytes,
+       ordered = btrfs_alloc_ordered_extent(inode, start, num_bytes, ram_bytes,
                                       ins.objectid, ins.offset,
                                       encoded->unencoded_offset,
                                       (1 << BTRFS_ORDERED_ENCODED) |
                                       (1 << BTRFS_ORDERED_COMPRESSED),
                                       compression);
-       if (ret) {
+       if (IS_ERR(ordered)) {
                btrfs_drop_extent_map_range(inode, start, end, false);
+               ret = PTR_ERR(ordered);
                goto out_free_reserved;
        }
        btrfs_dec_block_group_reservations(fs_info, ins.objectid);
@@ -10356,8 +10489,7 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
 
        btrfs_delalloc_release_extents(inode, num_bytes);
 
-       btrfs_submit_compressed_write(inode, start, num_bytes, ins.objectid,
-                                         ins.offset, pages, nr_pages, 0, false);
+       btrfs_submit_compressed_write(ordered, pages, nr_pages, 0, false);
        ret = orig_count;
        goto out;
 
@@ -10894,7 +11026,6 @@ static const struct address_space_operations btrfs_aops = {
        .read_folio     = btrfs_read_folio,
        .writepages     = btrfs_writepages,
        .readahead      = btrfs_readahead,
-       .direct_IO      = noop_direct_IO,
        .invalidate_folio = btrfs_invalidate_folio,
        .release_folio  = btrfs_release_folio,
        .migrate_folio  = btrfs_migrate_folio,
index 25833b4..a895d10 100644 (file)
@@ -454,7 +454,9 @@ void btrfs_exclop_balance(struct btrfs_fs_info *fs_info,
        case BTRFS_EXCLOP_BALANCE_PAUSED:
                spin_lock(&fs_info->super_lock);
                ASSERT(fs_info->exclusive_operation == BTRFS_EXCLOP_BALANCE ||
-                      fs_info->exclusive_operation == BTRFS_EXCLOP_DEV_ADD);
+                      fs_info->exclusive_operation == BTRFS_EXCLOP_DEV_ADD ||
+                      fs_info->exclusive_operation == BTRFS_EXCLOP_NONE ||
+                      fs_info->exclusive_operation == BTRFS_EXCLOP_BALANCE_PAUSED);
                fs_info->exclusive_operation = BTRFS_EXCLOP_BALANCE_PAUSED;
                spin_unlock(&fs_info->super_lock);
                break;
@@ -647,6 +649,8 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
        }
        trans->block_rsv = &block_rsv;
        trans->bytes_reserved = block_rsv.size;
+       /* Tree log can't currently deal with an inode which is a new root. */
+       btrfs_set_log_full_commit(trans);
 
        ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit);
        if (ret)
@@ -755,10 +759,7 @@ out:
        trans->bytes_reserved = 0;
        btrfs_subvolume_release_metadata(root, &block_rsv);
 
-       if (ret)
-               btrfs_end_transaction(trans);
-       else
-               ret = btrfs_commit_transaction(trans);
+       btrfs_end_transaction(trans);
 out_new_inode_args:
        btrfs_new_inode_args_destroy(&new_inode_args);
 out_inode:
@@ -2670,7 +2671,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
        struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
        struct btrfs_ioctl_vol_args_v2 *vol_args;
        struct block_device *bdev = NULL;
-       fmode_t mode;
+       void *holder;
        int ret;
        bool cancel = false;
 
@@ -2707,7 +2708,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
                goto err_drop;
 
        /* Exclusive operation is now claimed */
-       ret = btrfs_rm_device(fs_info, &args, &bdev, &mode);
+       ret = btrfs_rm_device(fs_info, &args, &bdev, &holder);
 
        btrfs_exclop_finish(fs_info);
 
@@ -2722,7 +2723,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 err_drop:
        mnt_drop_write_file(file);
        if (bdev)
-               blkdev_put(bdev, mode);
+               blkdev_put(bdev, holder);
 out:
        btrfs_put_dev_args_from_path(&args);
        kfree(vol_args);
@@ -2736,7 +2737,7 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
        struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
        struct btrfs_ioctl_vol_args *vol_args;
        struct block_device *bdev = NULL;
-       fmode_t mode;
+       void *holder;
        int ret;
        bool cancel = false;
 
@@ -2763,7 +2764,7 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
        ret = exclop_start_or_cancel_reloc(fs_info, BTRFS_EXCLOP_DEV_REMOVE,
                                           cancel);
        if (ret == 0) {
-               ret = btrfs_rm_device(fs_info, &args, &bdev, &mode);
+               ret = btrfs_rm_device(fs_info, &args, &bdev, &holder);
                if (!ret)
                        btrfs_info(fs_info, "disk deleted %s", vol_args->name);
                btrfs_exclop_finish(fs_info);
@@ -2771,7 +2772,7 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
 
        mnt_drop_write_file(file);
        if (bdev)
-               blkdev_put(bdev, mode);
+               blkdev_put(bdev, holder);
 out:
        btrfs_put_dev_args_from_path(&args);
        kfree(vol_args);
@@ -3111,6 +3112,13 @@ static noinline long btrfs_ioctl_start_sync(struct btrfs_root *root,
        struct btrfs_trans_handle *trans;
        u64 transid;
 
+       /*
+        * Start orphan cleanup here for the given root in case it hasn't been
+        * started already by other means. Errors are handled in the other
+        * functions during transaction commit.
+        */
+       btrfs_orphan_cleanup(root);
+
        trans = btrfs_attach_transaction_barrier(root);
        if (IS_ERR(trans)) {
                if (PTR_ERR(trans) != -ENOENT)
@@ -3132,14 +3140,13 @@ out:
 static noinline long btrfs_ioctl_wait_sync(struct btrfs_fs_info *fs_info,
                                           void __user *argp)
 {
-       u64 transid;
+       /* By default wait for the current transaction. */
+       u64 transid = 0;
 
-       if (argp) {
+       if (argp)
                if (copy_from_user(&transid, argp, sizeof(transid)))
                        return -EFAULT;
-       } else {
-               transid = 0;  /* current trans */
-       }
+
        return btrfs_wait_for_commit(fs_info, transid);
 }
 
index 3a496b0..7979449 100644 (file)
@@ -57,8 +57,8 @@
 
 static struct btrfs_lockdep_keyset {
        u64                     id;             /* root objectid */
-       /* Longest entry: btrfs-free-space-00 */
-       char                    names[BTRFS_MAX_LEVEL][20];
+       /* Longest entry: btrfs-block-group-00 */
+       char                    names[BTRFS_MAX_LEVEL][24];
        struct lock_class_key   keys[BTRFS_MAX_LEVEL];
 } btrfs_lockdep_keysets[] = {
        { .id = BTRFS_ROOT_TREE_OBJECTID,       DEFINE_NAME("root")     },
@@ -72,6 +72,7 @@ static struct btrfs_lockdep_keyset {
        { .id = BTRFS_DATA_RELOC_TREE_OBJECTID, DEFINE_NAME("dreloc")   },
        { .id = BTRFS_UUID_TREE_OBJECTID,       DEFINE_NAME("uuid")     },
        { .id = BTRFS_FREE_SPACE_TREE_OBJECTID, DEFINE_NAME("free-space") },
+       { .id = BTRFS_BLOCK_GROUP_TREE_OBJECTID, DEFINE_NAME("block-group") },
        { .id = 0,                              DEFINE_NAME("tree")     },
 };
 
index 3a095b9..d3fcfc6 100644 (file)
@@ -88,9 +88,9 @@ struct list_head *lzo_alloc_workspace(unsigned int level)
        if (!workspace)
                return ERR_PTR(-ENOMEM);
 
-       workspace->mem = kvmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
-       workspace->buf = kvmalloc(WORKSPACE_BUF_LENGTH, GFP_KERNEL);
-       workspace->cbuf = kvmalloc(WORKSPACE_CBUF_LENGTH, GFP_KERNEL);
+       workspace->mem = kvmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL | __GFP_NOWARN);
+       workspace->buf = kvmalloc(WORKSPACE_BUF_LENGTH, GFP_KERNEL | __GFP_NOWARN);
+       workspace->cbuf = kvmalloc(WORKSPACE_CBUF_LENGTH, GFP_KERNEL | __GFP_NOWARN);
        if (!workspace->mem || !workspace->buf || !workspace->cbuf)
                goto fail;
 
index 310a05c..23fc11a 100644 (file)
@@ -252,14 +252,6 @@ void __cold _btrfs_printk(const struct btrfs_fs_info *fs_info, const char *fmt,
 }
 #endif
 
-#ifdef CONFIG_BTRFS_ASSERT
-void __cold __noreturn btrfs_assertfail(const char *expr, const char *file, int line)
-{
-       pr_err("assertion failed: %s, in %s:%d\n", expr, file, line);
-       BUG();
-}
-#endif
-
 void __cold btrfs_print_v0_err(struct btrfs_fs_info *fs_info)
 {
        btrfs_err(fs_info,
index ac2d198..deedc1a 100644 (file)
@@ -4,14 +4,23 @@
 #define BTRFS_MESSAGES_H
 
 #include <linux/types.h>
+#include <linux/printk.h>
+#include <linux/bug.h>
 
 struct btrfs_fs_info;
 
+/*
+ * We want to be able to override this in btrfs-progs.
+ */
+#ifdef __KERNEL__
+
 static inline __printf(2, 3) __cold
 void btrfs_no_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...)
 {
 }
 
+#endif
+
 #ifdef CONFIG_PRINTK
 
 #define btrfs_printk(fs_info, fmt, args...)                            \
@@ -160,7 +169,11 @@ do {                                                               \
 } while (0)
 
 #ifdef CONFIG_BTRFS_ASSERT
-void __cold __noreturn btrfs_assertfail(const char *expr, const char *file, int line);
+
+#define btrfs_assertfail(expr, file, line)     ({                              \
+       pr_err("assertion failed: %s, in %s:%d\n", (expr), (file), (line));     \
+       BUG();                                                          \
+})
 
 #define ASSERT(expr)                                           \
        (likely(expr) ? (void)0 : btrfs_assertfail(#expr, __FILE__, __LINE__))
index 768583a..005751a 100644 (file)
@@ -143,4 +143,24 @@ static inline struct rb_node *rb_simple_insert(struct rb_root *root, u64 bytenr,
        return NULL;
 }
 
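+/* Return true if every bit in the range [start, start + nbits) is set. */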
+static inline bool bitmap_test_range_all_set(const unsigned long *addr,
+                                            unsigned long start,
+                                            unsigned long nbits)
+{
+       unsigned long found_zero;
+
+       found_zero = find_next_zero_bit(addr, start + nbits, start);
+       return (found_zero == start + nbits);
+}
+
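+/* Return true if every bit in the range [start, start + nbits) is clear. */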
+static inline bool bitmap_test_range_all_zero(const unsigned long *addr,
+                                             unsigned long start,
+                                             unsigned long nbits)
+{
+       unsigned long found_set;
+
+       found_set = find_next_bit(addr, start + nbits, start);
+       return (found_set == start + nbits);
+}
+
 #endif
index a9778a9..a629532 100644 (file)
@@ -146,35 +146,11 @@ static inline struct rb_node *tree_search(struct btrfs_ordered_inode_tree *tree,
        return ret;
 }
 
-/*
- * Add an ordered extent to the per-inode tree.
- *
- * @inode:           Inode that this extent is for.
- * @file_offset:     Logical offset in file where the extent starts.
- * @num_bytes:       Logical length of extent in file.
- * @ram_bytes:       Full length of unencoded data.
- * @disk_bytenr:     Offset of extent on disk.
- * @disk_num_bytes:  Size of extent on disk.
- * @offset:          Offset into unencoded data where file data starts.
- * @flags:           Flags specifying type of extent (1 << BTRFS_ORDERED_*).
- * @compress_type:   Compression algorithm used for data.
- *
- * Most of these parameters correspond to &struct btrfs_file_extent_item. The
- * tree is given a single reference on the ordered extent that was inserted, and
- * the returned pointer is given a second reference.
- *
- * Return: the new ordered extent or error pointer.
- */
-struct btrfs_ordered_extent *btrfs_alloc_ordered_extent(
-                       struct btrfs_inode *inode, u64 file_offset,
-                       u64 num_bytes, u64 ram_bytes, u64 disk_bytenr,
-                       u64 disk_num_bytes, u64 offset, unsigned long flags,
-                       int compress_type)
+static struct btrfs_ordered_extent *alloc_ordered_extent(
+                       struct btrfs_inode *inode, u64 file_offset, u64 num_bytes,
+                       u64 ram_bytes, u64 disk_bytenr, u64 disk_num_bytes,
+                       u64 offset, unsigned long flags, int compress_type)
 {
-       struct btrfs_root *root = inode->root;
-       struct btrfs_fs_info *fs_info = root->fs_info;
-       struct btrfs_ordered_inode_tree *tree = &inode->ordered_tree;
-       struct rb_node *node;
        struct btrfs_ordered_extent *entry;
        int ret;
 
@@ -184,7 +160,6 @@ struct btrfs_ordered_extent *btrfs_alloc_ordered_extent(
                ret = btrfs_qgroup_free_data(inode, NULL, file_offset, num_bytes);
                if (ret < 0)
                        return ERR_PTR(ret);
-               ret = 0;
        } else {
                /*
                 * The ordered extent has reserved qgroup space, release now
@@ -209,15 +184,7 @@ struct btrfs_ordered_extent *btrfs_alloc_ordered_extent(
        entry->compress_type = compress_type;
        entry->truncated_len = (u64)-1;
        entry->qgroup_rsv = ret;
-       entry->physical = (u64)-1;
-
-       ASSERT((flags & ~BTRFS_ORDERED_TYPE_FLAGS) == 0);
        entry->flags = flags;
-
-       percpu_counter_add_batch(&fs_info->ordered_bytes, num_bytes,
-                                fs_info->delalloc_batch);
-
-       /* one ref for the tree */
        refcount_set(&entry->refs, 1);
        init_waitqueue_head(&entry->wait);
        INIT_LIST_HEAD(&entry->list);
@@ -226,15 +193,40 @@ struct btrfs_ordered_extent *btrfs_alloc_ordered_extent(
        INIT_LIST_HEAD(&entry->work_list);
        init_completion(&entry->completion);
 
+       /*
+        * We don't need the count_max_extents here, we can assume that all of
+        * that work has been done at higher layers, so this is truly the
+        * smallest the extent is going to get.
+        */
+       spin_lock(&inode->lock);
+       btrfs_mod_outstanding_extents(inode, 1);
+       spin_unlock(&inode->lock);
+
+       return entry;
+}
+
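+/*
+ * Insert a newly allocated ordered extent into the inode's ordered tree,
+ * taking an extra reference that the tree owns.
+ */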
+static void insert_ordered_extent(struct btrfs_ordered_extent *entry)
+{
+       struct btrfs_inode *inode = BTRFS_I(entry->inode);
+       struct btrfs_ordered_inode_tree *tree = &inode->ordered_tree;
+       struct btrfs_root *root = inode->root;
+       struct btrfs_fs_info *fs_info = root->fs_info;
+       struct rb_node *node;
+
        trace_btrfs_ordered_extent_add(inode, entry);
 
+       percpu_counter_add_batch(&fs_info->ordered_bytes, entry->num_bytes,
+                                fs_info->delalloc_batch);
+
+       /* One ref for the tree. */
+       refcount_inc(&entry->refs);
+
        spin_lock_irq(&tree->lock);
-       node = tree_insert(&tree->tree, file_offset,
-                          &entry->rb_node);
+       node = tree_insert(&tree->tree, entry->file_offset, &entry->rb_node);
        if (node)
                btrfs_panic(fs_info, -EEXIST,
                                "inconsistency in ordered tree at offset %llu",
-                               file_offset);
+                               entry->file_offset);
        spin_unlock_irq(&tree->lock);
 
        spin_lock(&root->ordered_extent_lock);
@@ -248,43 +240,43 @@ struct btrfs_ordered_extent *btrfs_alloc_ordered_extent(
                spin_unlock(&fs_info->ordered_root_lock);
        }
        spin_unlock(&root->ordered_extent_lock);
-
-       /*
-        * We don't need the count_max_extents here, we can assume that all of
-        * that work has been done at higher layers, so this is truly the
-        * smallest the extent is going to get.
-        */
-       spin_lock(&inode->lock);
-       btrfs_mod_outstanding_extents(inode, 1);
-       spin_unlock(&inode->lock);
-
-       /* One ref for the returned entry to match semantics of lookup. */
-       refcount_inc(&entry->refs);
-
-       return entry;
 }
 
 /*
- * Add a new btrfs_ordered_extent for the range, but drop the reference instead
- * of returning it to the caller.
+ * Add an ordered extent to the per-inode tree.
+ *
+ * @inode:           Inode that this extent is for.
+ * @file_offset:     Logical offset in file where the extent starts.
+ * @num_bytes:       Logical length of extent in file.
+ * @ram_bytes:       Full length of unencoded data.
+ * @disk_bytenr:     Offset of extent on disk.
+ * @disk_num_bytes:  Size of extent on disk.
+ * @offset:          Offset into unencoded data where file data starts.
+ * @flags:           Flags specifying type of extent (1 << BTRFS_ORDERED_*).
+ * @compress_type:   Compression algorithm used for data.
+ *
+ * Most of these parameters correspond to &struct btrfs_file_extent_item. The
+ * tree is given a single reference on the ordered extent that was inserted, and
+ * the returned pointer is given a second reference.
+ *
+ * Return: the new ordered extent or error pointer.
  */
-int btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset,
-                            u64 num_bytes, u64 ram_bytes, u64 disk_bytenr,
-                            u64 disk_num_bytes, u64 offset, unsigned long flags,
-                            int compress_type)
+struct btrfs_ordered_extent *btrfs_alloc_ordered_extent(
+                       struct btrfs_inode *inode, u64 file_offset,
+                       u64 num_bytes, u64 ram_bytes, u64 disk_bytenr,
+                       u64 disk_num_bytes, u64 offset, unsigned long flags,
+                       int compress_type)
 {
-       struct btrfs_ordered_extent *ordered;
-
-       ordered = btrfs_alloc_ordered_extent(inode, file_offset, num_bytes,
-                                            ram_bytes, disk_bytenr,
-                                            disk_num_bytes, offset, flags,
-                                            compress_type);
+       struct btrfs_ordered_extent *entry;
 
-       if (IS_ERR(ordered))
-               return PTR_ERR(ordered);
-       btrfs_put_ordered_extent(ordered);
+       ASSERT((flags & ~BTRFS_ORDERED_TYPE_FLAGS) == 0);
 
-       return 0;
+       entry = alloc_ordered_extent(inode, file_offset, num_bytes, ram_bytes,
+                                    disk_bytenr, disk_num_bytes, offset, flags,
+                                    compress_type);
+       if (!IS_ERR(entry))
+               insert_ordered_extent(entry);
+       return entry;
 }
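With btrfs_add_ordered_extent() removed, call sites that only needed the side effect of registering the range now allocate the ordered extent themselves and drop the extra reference, which is exactly what the deleted wrapper did. A minimal caller sketch (variable names are illustrative, not taken from this series):

	struct btrfs_ordered_extent *ordered;

	ordered = btrfs_alloc_ordered_extent(inode, file_offset, num_bytes,
					     ram_bytes, disk_bytenr,
					     disk_num_bytes, 0, flags,
					     BTRFS_COMPRESS_NONE);
	if (IS_ERR(ordered))
		return PTR_ERR(ordered);
	/* Drop the caller's reference; the per-inode tree keeps its own. */
	btrfs_put_ordered_extent(ordered);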
 
 /*
@@ -311,6 +303,90 @@ static void finish_ordered_fn(struct btrfs_work *work)
        btrfs_finish_ordered_io(ordered_extent);
 }
 
+static bool can_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
+                                     struct page *page, u64 file_offset,
+                                     u64 len, bool uptodate)
+{
+       struct btrfs_inode *inode = BTRFS_I(ordered->inode);
+       struct btrfs_fs_info *fs_info = inode->root->fs_info;
+
+       lockdep_assert_held(&inode->ordered_tree.lock);
+
+       if (page) {
+               ASSERT(page->mapping);
+               ASSERT(page_offset(page) <= file_offset);
+               ASSERT(file_offset + len <= page_offset(page) + PAGE_SIZE);
+
+               /*
+                * Ordered (Private2) bit indicates whether we still have
+                * pending io unfinished for the ordered extent.
+                *
+                * If there's no such bit, we need to skip to next range.
+                */
+               if (!btrfs_page_test_ordered(fs_info, page, file_offset, len))
+                       return false;
+               btrfs_page_clear_ordered(fs_info, page, file_offset, len);
+       }
+
+       /* Now we're fine to update the accounting. */
+       if (WARN_ON_ONCE(len > ordered->bytes_left)) {
+               btrfs_crit(fs_info,
+"bad ordered extent accounting, root=%llu ino=%llu OE offset=%llu OE len=%llu to_dec=%llu left=%llu",
+                          inode->root->root_key.objectid, btrfs_ino(inode),
+                          ordered->file_offset, ordered->num_bytes,
+                          len, ordered->bytes_left);
+               ordered->bytes_left = 0;
+       } else {
+               ordered->bytes_left -= len;
+       }
+
+       if (!uptodate)
+               set_bit(BTRFS_ORDERED_IOERR, &ordered->flags);
+
+       if (ordered->bytes_left)
+               return false;
+
+       /*
+        * All the IO of the ordered extent is finished, so queue the
+        * finish_func to be executed.
+        */
+       set_bit(BTRFS_ORDERED_IO_DONE, &ordered->flags);
+       cond_wake_up(&ordered->wait);
+       refcount_inc(&ordered->refs);
+       trace_btrfs_ordered_extent_mark_finished(inode, ordered);
+       return true;
+}
+
+static void btrfs_queue_ordered_fn(struct btrfs_ordered_extent *ordered)
+{
+       struct btrfs_inode *inode = BTRFS_I(ordered->inode);
+       struct btrfs_fs_info *fs_info = inode->root->fs_info;
+       struct btrfs_workqueue *wq = btrfs_is_free_space_inode(inode) ?
+               fs_info->endio_freespace_worker : fs_info->endio_write_workers;
+
+       btrfs_init_work(&ordered->work, finish_ordered_fn, NULL, NULL);
+       btrfs_queue_work(wq, &ordered->work);
+}
+
+bool btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
+                                struct page *page, u64 file_offset, u64 len,
+                                bool uptodate)
+{
+       struct btrfs_inode *inode = BTRFS_I(ordered->inode);
+       unsigned long flags;
+       bool ret;
+
+       trace_btrfs_finish_ordered_extent(inode, file_offset, len, uptodate);
+
+       spin_lock_irqsave(&inode->ordered_tree.lock, flags);
+       ret = can_finish_ordered_extent(ordered, page, file_offset, len, uptodate);
+       spin_unlock_irqrestore(&inode->ordered_tree.lock, flags);
+
+       if (ret)
+               btrfs_queue_ordered_fn(ordered);
+       return ret;
+}
+
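btrfs_finish_ordered_extent() gives bio completion handlers a way to account a single finished range against an ordered extent they already hold, without re-searching the tree as btrfs_mark_ordered_io_finished() does. A hedged sketch of how a write end-io path might call it (the fragment is illustrative, not part of this diff):

	/*
	 * Hypothetical write end-io fragment: @ordered is the ordered extent
	 * the bio was issued against, and [file_offset, file_offset + len) is
	 * the range that just completed.
	 */
	bool uptodate = !bbio->bio.bi_status;

	if (btrfs_finish_ordered_extent(ordered, page, file_offset, len, uptodate)) {
		/*
		 * The last pending range of @ordered completed;
		 * finish_ordered_fn() has already been queued on the proper
		 * endio workqueue, nothing more to do here.
		 */
	}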
 /*
  * Mark all ordered extents io inside the specified range finished.
  *
@@ -329,22 +405,11 @@ void btrfs_mark_ordered_io_finished(struct btrfs_inode *inode,
                                    u64 num_bytes, bool uptodate)
 {
        struct btrfs_ordered_inode_tree *tree = &inode->ordered_tree;
-       struct btrfs_fs_info *fs_info = inode->root->fs_info;
-       struct btrfs_workqueue *wq;
        struct rb_node *node;
        struct btrfs_ordered_extent *entry = NULL;
        unsigned long flags;
        u64 cur = file_offset;
 
-       if (btrfs_is_free_space_inode(inode))
-               wq = fs_info->endio_freespace_worker;
-       else
-               wq = fs_info->endio_write_workers;
-
-       if (page)
-               ASSERT(page->mapping && page_offset(page) <= file_offset &&
-                      file_offset + num_bytes <= page_offset(page) + PAGE_SIZE);
-
        spin_lock_irqsave(&tree->lock, flags);
        while (cur < file_offset + num_bytes) {
                u64 entry_end;
@@ -397,50 +462,9 @@ void btrfs_mark_ordered_io_finished(struct btrfs_inode *inode,
                ASSERT(end + 1 - cur < U32_MAX);
                len = end + 1 - cur;
 
-               if (page) {
-                       /*
-                        * Ordered (Private2) bit indicates whether we still
-                        * have pending io unfinished for the ordered extent.
-                        *
-                        * If there's no such bit, we need to skip to next range.
-                        */
-                       if (!btrfs_page_test_ordered(fs_info, page, cur, len)) {
-                               cur += len;
-                               continue;
-                       }
-                       btrfs_page_clear_ordered(fs_info, page, cur, len);
-               }
-
-               /* Now we're fine to update the accounting */
-               if (unlikely(len > entry->bytes_left)) {
-                       WARN_ON(1);
-                       btrfs_crit(fs_info,
-"bad ordered extent accounting, root=%llu ino=%llu OE offset=%llu OE len=%llu to_dec=%u left=%llu",
-                                  inode->root->root_key.objectid,
-                                  btrfs_ino(inode),
-                                  entry->file_offset,
-                                  entry->num_bytes,
-                                  len, entry->bytes_left);
-                       entry->bytes_left = 0;
-               } else {
-                       entry->bytes_left -= len;
-               }
-
-               if (!uptodate)
-                       set_bit(BTRFS_ORDERED_IOERR, &entry->flags);
-
-               /*
-                * All the IO of the ordered extent is finished, we need to queue
-                * the finish_func to be executed.
-                */
-               if (entry->bytes_left == 0) {
-                       set_bit(BTRFS_ORDERED_IO_DONE, &entry->flags);
-                       cond_wake_up(&entry->wait);
-                       refcount_inc(&entry->refs);
-                       trace_btrfs_ordered_extent_mark_finished(inode, entry);
+               if (can_finish_ordered_extent(entry, page, cur, len, uptodate)) {
                        spin_unlock_irqrestore(&tree->lock, flags);
-                       btrfs_init_work(&entry->work, finish_ordered_fn, NULL, NULL);
-                       btrfs_queue_work(wq, &entry->work);
+                       btrfs_queue_ordered_fn(entry);
                        spin_lock_irqsave(&tree->lock, flags);
                }
                cur += len;
@@ -564,7 +588,7 @@ void btrfs_remove_ordered_extent(struct btrfs_inode *btrfs_inode,
        freespace_inode = btrfs_is_free_space_inode(btrfs_inode);
 
        btrfs_lockdep_acquire(fs_info, btrfs_trans_pending_ordered);
-       /* This is paired with btrfs_add_ordered_extent. */
+       /* This is paired with btrfs_alloc_ordered_extent. */
        spin_lock(&btrfs_inode->lock);
        btrfs_mod_outstanding_extents(btrfs_inode, -1);
        spin_unlock(&btrfs_inode->lock);
@@ -1117,17 +1141,22 @@ bool btrfs_try_lock_ordered_range(struct btrfs_inode *inode, u64 start, u64 end,
 }
 
 /* Split out a new ordered extent for the first @len bytes of @ordered. */
-int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 len)
+struct btrfs_ordered_extent *btrfs_split_ordered_extent(
+                       struct btrfs_ordered_extent *ordered, u64 len)
 {
-       struct inode *inode = ordered->inode;
-       struct btrfs_ordered_inode_tree *tree = &BTRFS_I(inode)->ordered_tree;
-       struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+       struct btrfs_inode *inode = BTRFS_I(ordered->inode);
+       struct btrfs_ordered_inode_tree *tree = &inode->ordered_tree;
+       struct btrfs_root *root = inode->root;
+       struct btrfs_fs_info *fs_info = root->fs_info;
        u64 file_offset = ordered->file_offset;
        u64 disk_bytenr = ordered->disk_bytenr;
-       unsigned long flags = ordered->flags & BTRFS_ORDERED_TYPE_FLAGS;
+       unsigned long flags = ordered->flags;
+       struct btrfs_ordered_sum *sum, *tmpsum;
+       struct btrfs_ordered_extent *new;
        struct rb_node *node;
+       u64 offset = 0;
 
-       trace_btrfs_ordered_extent_split(BTRFS_I(inode), ordered);
+       trace_btrfs_ordered_extent_split(inode, ordered);
 
        ASSERT(!(flags & (1U << BTRFS_ORDERED_COMPRESSED)));
 
@@ -1136,18 +1165,27 @@ int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 len)
         * reduce the original extent to a zero length either.
         */
        if (WARN_ON_ONCE(len >= ordered->num_bytes))
-               return -EINVAL;
-       /* We cannot split once ordered extent is past end_bio. */
-       if (WARN_ON_ONCE(ordered->bytes_left != ordered->disk_num_bytes))
-               return -EINVAL;
+               return ERR_PTR(-EINVAL);
+       /* We cannot split partially completed ordered extents. */
+       if (ordered->bytes_left) {
+               ASSERT(!(flags & ~BTRFS_ORDERED_TYPE_FLAGS));
+               if (WARN_ON_ONCE(ordered->bytes_left != ordered->disk_num_bytes))
+                       return ERR_PTR(-EINVAL);
+       }
        /* We cannot split a compressed ordered extent. */
        if (WARN_ON_ONCE(ordered->disk_num_bytes != ordered->num_bytes))
-               return -EINVAL;
-       /* Checksum list should be empty. */
-       if (WARN_ON_ONCE(!list_empty(&ordered->list)))
-               return -EINVAL;
+               return ERR_PTR(-EINVAL);
 
-       spin_lock_irq(&tree->lock);
+       new = alloc_ordered_extent(inode, file_offset, len, len, disk_bytenr,
+                                  len, 0, flags, ordered->compress_type);
+       if (IS_ERR(new))
+               return new;
+
+       /* One ref for the tree. */
+       refcount_inc(&new->refs);
+
+       spin_lock_irq(&root->ordered_extent_lock);
+       spin_lock(&tree->lock);
        /* Remove from tree once */
        node = &ordered->rb_node;
        rb_erase(node, &tree->tree);
@@ -1159,26 +1197,48 @@ int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 len)
        ordered->disk_bytenr += len;
        ordered->num_bytes -= len;
        ordered->disk_num_bytes -= len;
-       ordered->bytes_left -= len;
+
+       if (test_bit(BTRFS_ORDERED_IO_DONE, &ordered->flags)) {
+               ASSERT(ordered->bytes_left == 0);
+               new->bytes_left = 0;
+       } else {
+               ordered->bytes_left -= len;
+       }
+
+       if (test_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags)) {
+               if (ordered->truncated_len > len) {
+                       ordered->truncated_len -= len;
+               } else {
+                       new->truncated_len = ordered->truncated_len;
+                       ordered->truncated_len = 0;
+               }
+       }
+
+       list_for_each_entry_safe(sum, tmpsum, &ordered->list, list) {
+               if (offset == len)
+                       break;
+               list_move_tail(&sum->list, &new->list);
+               offset += sum->len;
+       }
 
        /* Re-insert the node */
        node = tree_insert(&tree->tree, ordered->file_offset, &ordered->rb_node);
        if (node)
                btrfs_panic(fs_info, -EEXIST,
                        "zoned: inconsistency in ordered tree at offset %llu",
-                           ordered->file_offset);
+                       ordered->file_offset);
 
-       spin_unlock_irq(&tree->lock);
-
-       /*
-        * The splitting extent is already counted and will be added again in
-        * btrfs_add_ordered_extent(). Subtract len to avoid double counting.
-        */
-       percpu_counter_add_batch(&fs_info->ordered_bytes, -len, fs_info->delalloc_batch);
+       node = tree_insert(&tree->tree, new->file_offset, &new->rb_node);
+       if (node)
+               btrfs_panic(fs_info, -EEXIST,
+                       "zoned: inconsistency in ordered tree at offset %llu",
+                       new->file_offset);
+       spin_unlock(&tree->lock);
 
-       return btrfs_add_ordered_extent(BTRFS_I(inode), file_offset, len, len,
-                                       disk_bytenr, len, 0, flags,
-                                       ordered->compress_type);
+       list_add_tail(&new->root_extent_list, &root->ordered_extents);
+       root->nr_ordered_extents++;
+       spin_unlock_irq(&root->ordered_extent_lock);
+       return new;
 }
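Callers of btrfs_split_ordered_extent() now receive the newly split ordered extent (or an ERR_PTR) instead of an int, and own a reference on it. A hedged caller sketch, assuming @ordered and a split length @pre defined by the caller:

	struct btrfs_ordered_extent *new;

	new = btrfs_split_ordered_extent(ordered, pre);
	if (IS_ERR(new))
		return PTR_ERR(new);
	/* ... use @new for the first @pre bytes, e.g. remap its disk_bytenr ... */
	btrfs_put_ordered_extent(new);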
 
 int __init ordered_data_init(void)
index f0f1138..173bd5c 100644 (file)
@@ -14,13 +14,13 @@ struct btrfs_ordered_inode_tree {
 };
 
 struct btrfs_ordered_sum {
-       /* bytenr is the start of this extent on disk */
-       u64 bytenr;
-
        /*
-        * this is the length in bytes covered by the sums array below.
+        * Logical start address and length of the blocks covered by
+        * the sums array.
         */
-       int len;
+       u64 logical;
+       u32 len;
+
        struct list_head list;
        /* last field is a variable length array of csums */
        u8 sums[];
@@ -151,12 +151,6 @@ struct btrfs_ordered_extent {
        struct completion completion;
        struct btrfs_work flush_work;
        struct list_head work_list;
-
-       /*
-        * Used to reverse-map physical address returned from ZONE_APPEND write
-        * command in a workqueue context
-        */
-       u64 physical;
 };
 
 static inline void
@@ -167,11 +161,15 @@ btrfs_ordered_inode_tree_init(struct btrfs_ordered_inode_tree *t)
        t->last = NULL;
 }
 
+int btrfs_finish_one_ordered(struct btrfs_ordered_extent *ordered_extent);
 int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent);
 
 void btrfs_put_ordered_extent(struct btrfs_ordered_extent *entry);
 void btrfs_remove_ordered_extent(struct btrfs_inode *btrfs_inode,
                                struct btrfs_ordered_extent *entry);
+bool btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
+                                struct page *page, u64 file_offset, u64 len,
+                                bool uptodate);
 void btrfs_mark_ordered_io_finished(struct btrfs_inode *inode,
                                struct page *page, u64 file_offset,
                                u64 num_bytes, bool uptodate);
@@ -183,10 +181,6 @@ struct btrfs_ordered_extent *btrfs_alloc_ordered_extent(
                        u64 num_bytes, u64 ram_bytes, u64 disk_bytenr,
                        u64 disk_num_bytes, u64 offset, unsigned long flags,
                        int compress_type);
-int btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset,
-                            u64 num_bytes, u64 ram_bytes, u64 disk_bytenr,
-                            u64 disk_num_bytes, u64 offset, unsigned long flags,
-                            int compress_type);
 void btrfs_add_ordered_sum(struct btrfs_ordered_extent *entry,
                           struct btrfs_ordered_sum *sum);
 struct btrfs_ordered_extent *btrfs_lookup_ordered_extent(struct btrfs_inode *inode,
@@ -212,7 +206,8 @@ void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
                                        struct extent_state **cached_state);
 bool btrfs_try_lock_ordered_range(struct btrfs_inode *inode, u64 start, u64 end,
                                  struct extent_state **cached_state);
-int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 len);
+struct btrfs_ordered_extent *btrfs_split_ordered_extent(
+                       struct btrfs_ordered_extent *ordered, u64 len);
 int __init ordered_data_init(void);
 void __cold ordered_data_exit(void);
 
index b93c962..aa06d9c 100644 (file)
@@ -49,7 +49,7 @@ const char *btrfs_root_name(const struct btrfs_key *key, char *buf)
        return buf;
 }
 
-static void print_chunk(struct extent_buffer *eb, struct btrfs_chunk *chunk)
+static void print_chunk(const struct extent_buffer *eb, struct btrfs_chunk *chunk)
 {
        int num_stripes = btrfs_chunk_num_stripes(eb, chunk);
        int i;
@@ -62,7 +62,7 @@ static void print_chunk(struct extent_buffer *eb, struct btrfs_chunk *chunk)
                      btrfs_stripe_offset_nr(eb, chunk, i));
        }
 }
-static void print_dev_item(struct extent_buffer *eb,
+static void print_dev_item(const struct extent_buffer *eb,
                           struct btrfs_dev_item *dev_item)
 {
        pr_info("\t\tdev item devid %llu total_bytes %llu bytes used %llu\n",
@@ -70,7 +70,7 @@ static void print_dev_item(struct extent_buffer *eb,
               btrfs_device_total_bytes(eb, dev_item),
               btrfs_device_bytes_used(eb, dev_item));
 }
-static void print_extent_data_ref(struct extent_buffer *eb,
+static void print_extent_data_ref(const struct extent_buffer *eb,
                                  struct btrfs_extent_data_ref *ref)
 {
        pr_cont("extent data backref root %llu objectid %llu offset %llu count %u\n",
@@ -80,7 +80,7 @@ static void print_extent_data_ref(struct extent_buffer *eb,
               btrfs_extent_data_ref_count(eb, ref));
 }
 
-static void print_extent_item(struct extent_buffer *eb, int slot, int type)
+static void print_extent_item(const struct extent_buffer *eb, int slot, int type)
 {
        struct btrfs_extent_item *ei;
        struct btrfs_extent_inline_ref *iref;
@@ -151,10 +151,10 @@ static void print_extent_item(struct extent_buffer *eb, int slot, int type)
                        pr_cont("shared data backref parent %llu count %u\n",
                               offset, btrfs_shared_data_ref_count(eb, sref));
                        /*
-                        * offset is supposed to be a tree block which
-                        * must be aligned to nodesize.
+                        * Offset is supposed to be a tree block which must be
+                        * aligned to sectorsize.
                         */
-                       if (!IS_ALIGNED(offset, eb->fs_info->nodesize))
+                       if (!IS_ALIGNED(offset, eb->fs_info->sectorsize))
                                pr_info(
                        "\t\t\t(parent %llu not aligned to sectorsize %u)\n",
                                     offset, eb->fs_info->sectorsize);
@@ -169,7 +169,7 @@ static void print_extent_item(struct extent_buffer *eb, int slot, int type)
        WARN_ON(ptr > end);
 }
 
-static void print_uuid_item(struct extent_buffer *l, unsigned long offset,
+static void print_uuid_item(const struct extent_buffer *l, unsigned long offset,
                            u32 item_size)
 {
        if (!IS_ALIGNED(item_size, sizeof(u64))) {
@@ -191,7 +191,7 @@ static void print_uuid_item(struct extent_buffer *l, unsigned long offset,
  * Helper to output refs and locking status of extent buffer.  Useful to debug
  * race condition related problems.
  */
-static void print_eb_refs_lock(struct extent_buffer *eb)
+static void print_eb_refs_lock(const struct extent_buffer *eb)
 {
 #ifdef CONFIG_BTRFS_DEBUG
        btrfs_info(eb->fs_info, "refs %u lock_owner %u current %u",
@@ -199,7 +199,7 @@ static void print_eb_refs_lock(struct extent_buffer *eb)
 #endif
 }
 
-void btrfs_print_leaf(struct extent_buffer *l)
+void btrfs_print_leaf(const struct extent_buffer *l)
 {
        struct btrfs_fs_info *fs_info;
        int i;
@@ -355,7 +355,7 @@ void btrfs_print_leaf(struct extent_buffer *l)
        }
 }
 
-void btrfs_print_tree(struct extent_buffer *c, bool follow)
+void btrfs_print_tree(const struct extent_buffer *c, bool follow)
 {
        struct btrfs_fs_info *fs_info;
        int i; u32 nr;
index 8c3e931..c42bc66 100644 (file)
@@ -9,8 +9,8 @@
 /* Buffer size to contain tree name and possibly additional data (offset) */
 #define BTRFS_ROOT_NAME_BUF_LEN                                48
 
-void btrfs_print_leaf(struct extent_buffer *l);
-void btrfs_print_tree(struct extent_buffer *c, bool follow);
+void btrfs_print_leaf(const struct extent_buffer *l);
+void btrfs_print_tree(const struct extent_buffer *c, bool follow);
 const char *btrfs_root_name(const struct btrfs_key *key, char *buf);
 
 #endif
index f41da7a..da1f84a 100644 (file)
@@ -1232,12 +1232,23 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
        int ret = 0;
 
        /*
-        * We need to have subvol_sem write locked, to prevent races between
-        * concurrent tasks trying to disable quotas, because we will unlock
-        * and relock qgroup_ioctl_lock across BTRFS_FS_QUOTA_ENABLED changes.
+        * We need to have subvol_sem write locked to prevent races with
+        * snapshot creation.
         */
        lockdep_assert_held_write(&fs_info->subvol_sem);
 
+       /*
+        * Lock the cleaner mutex to prevent races with concurrent relocation,
+        * because relocation may be building backrefs for blocks of the quota
+        * root while we are deleting the root. This is like dropping fs roots
+        * of deleted snapshots/subvolumes, we need the same protection.
+        *
+        * This also prevents races between concurrent tasks trying to disable
+        * quotas, because we will unlock and relock qgroup_ioctl_lock across
+        * BTRFS_FS_QUOTA_ENABLED changes.
+        */
+       mutex_lock(&fs_info->cleaner_mutex);
+
        mutex_lock(&fs_info->qgroup_ioctl_lock);
        if (!fs_info->quota_root)
                goto out;
@@ -1301,7 +1312,9 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
                goto out;
        }
 
+       spin_lock(&fs_info->trans_lock);
        list_del(&quota_root->dirty_list);
+       spin_unlock(&fs_info->trans_lock);
 
        btrfs_tree_lock(quota_root->node);
        btrfs_clear_buffer_dirty(trans, quota_root->node);
@@ -1317,6 +1330,7 @@ out:
                btrfs_end_transaction(trans);
        else if (trans)
                ret = btrfs_end_transaction(trans);
+       mutex_unlock(&fs_info->cleaner_mutex);
 
        return ret;
 }
index 2fab37f..f37b925 100644 (file)
@@ -1079,7 +1079,7 @@ static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
 
        /* see if we can add this page onto our existing bio */
        if (last) {
-               u64 last_end = last->bi_iter.bi_sector << 9;
+               u64 last_end = last->bi_iter.bi_sector << SECTOR_SHIFT;
                last_end += last->bi_iter.bi_size;
 
                /*
@@ -1099,7 +1099,7 @@ static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
        bio = bio_alloc(stripe->dev->bdev,
                        max(BTRFS_STRIPE_LEN >> PAGE_SHIFT, 1),
                        op, GFP_NOFS);
-       bio->bi_iter.bi_sector = disk_start >> 9;
+       bio->bi_iter.bi_sector = disk_start >> SECTOR_SHIFT;
        bio->bi_private = rbio;
 
        __bio_add_page(bio, sector->page, sectorsize, sector->pgoff);
@@ -2747,3 +2747,48 @@ void raid56_parity_submit_scrub_rbio(struct btrfs_raid_bio *rbio)
        if (!lock_stripe_add(rbio))
                start_async_work(rbio, scrub_rbio_work_locked);
 }
+
+/*
+ * This is for scrub call sites where we already have correct data contents.
+ * This allows us to avoid reading data stripes again.
+ *
+ * Unfortunately here we have to copy the pages rather than reuse them, because
+ * the rbio has its own page management for its cache.
+ */
+void raid56_parity_cache_data_pages(struct btrfs_raid_bio *rbio,
+                                   struct page **data_pages, u64 data_logical)
+{
+       const u64 offset_in_full_stripe = data_logical -
+                                         rbio->bioc->full_stripe_logical;
+       const int page_index = offset_in_full_stripe >> PAGE_SHIFT;
+       const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
+       const u32 sectors_per_page = PAGE_SIZE / sectorsize;
+       int ret;
+
+       /*
+        * If we hit ENOMEM here but the later allocation at
+        * raid56_parity_submit_scrub_rbio() time succeeds, we just do the
+        * extra read, which is not a big deal.
+        *
+        * If we also hit ENOMEM at raid56_parity_submit_scrub_rbio() time,
+        * the bio will get a proper error number set.
+        */
+       ret = alloc_rbio_data_pages(rbio);
+       if (ret < 0)
+               return;
+
+       /* data_logical must be at stripe boundary and inside the full stripe. */
+       ASSERT(IS_ALIGNED(offset_in_full_stripe, BTRFS_STRIPE_LEN));
+       ASSERT(offset_in_full_stripe < (rbio->nr_data << BTRFS_STRIPE_LEN_SHIFT));
+
+       for (int page_nr = 0; page_nr < (BTRFS_STRIPE_LEN >> PAGE_SHIFT); page_nr++) {
+               struct page *dst = rbio->stripe_pages[page_nr + page_index];
+               struct page *src = data_pages[page_nr];
+
+               memcpy_page(dst, 0, src, 0, PAGE_SIZE);
+               for (int sector_nr = sectors_per_page * page_index;
+                    sector_nr < sectors_per_page * (page_index + 1);
+                    sector_nr++)
+                       rbio->stripe_sectors[sector_nr].uptodate = true;
+       }
+}
index 0f7f31c..0e84c9c 100644 (file)
@@ -193,6 +193,9 @@ struct btrfs_raid_bio *raid56_parity_alloc_scrub_rbio(struct bio *bio,
                                unsigned long *dbitmap, int stripe_nsectors);
 void raid56_parity_submit_scrub_rbio(struct btrfs_raid_bio *rbio);
 
+void raid56_parity_cache_data_pages(struct btrfs_raid_bio *rbio,
+                                   struct page **data_pages, u64 data_logical);
+
 int btrfs_alloc_stripe_hash_table(struct btrfs_fs_info *info);
 void btrfs_free_stripe_hash_table(struct btrfs_fs_info *info);
 
index 09b1988..25a3361 100644 (file)
@@ -174,8 +174,8 @@ static void mark_block_processed(struct reloc_control *rc,
            in_range(node->bytenr, rc->block_group->start,
                     rc->block_group->length)) {
                blocksize = rc->extent_root->fs_info->nodesize;
-               set_extent_bits(&rc->processed_blocks, node->bytenr,
-                               node->bytenr + blocksize - 1, EXTENT_DIRTY);
+               set_extent_bit(&rc->processed_blocks, node->bytenr,
+                              node->bytenr + blocksize - 1, EXTENT_DIRTY, NULL);
        }
        node->processed = 1;
 }
@@ -3051,9 +3051,9 @@ static int relocate_one_page(struct inode *inode, struct file_ra_state *ra,
                        u64 boundary_end = boundary_start +
                                           fs_info->sectorsize - 1;
 
-                       set_extent_bits(&BTRFS_I(inode)->io_tree,
-                                       boundary_start, boundary_end,
-                                       EXTENT_BOUNDARY);
+                       set_extent_bit(&BTRFS_I(inode)->io_tree,
+                                      boundary_start, boundary_end,
+                                      EXTENT_BOUNDARY, NULL);
                }
                unlock_extent(&BTRFS_I(inode)->io_tree, clamped_start, clamped_end,
                              &cached_state);
@@ -3422,7 +3422,7 @@ int add_data_references(struct reloc_control *rc,
        btrfs_release_path(path);
 
        ctx.bytenr = extent_key->objectid;
-       ctx.ignore_extent_item_pos = true;
+       ctx.skip_inode_ref_list = true;
        ctx.fs_info = rc->extent_root->fs_info;
 
        ret = btrfs_find_all_leafs(&ctx);
@@ -4342,29 +4342,25 @@ out:
  * cloning checksum properly handles the nodatasum extents.
  * It also saves CPU time by avoiding re-calculation of the checksums.
  */
-int btrfs_reloc_clone_csums(struct btrfs_inode *inode, u64 file_pos, u64 len)
+int btrfs_reloc_clone_csums(struct btrfs_ordered_extent *ordered)
 {
+       struct btrfs_inode *inode = BTRFS_I(ordered->inode);
        struct btrfs_fs_info *fs_info = inode->root->fs_info;
-       struct btrfs_root *csum_root;
-       struct btrfs_ordered_sum *sums;
-       struct btrfs_ordered_extent *ordered;
-       int ret;
-       u64 disk_bytenr;
-       u64 new_bytenr;
+       u64 disk_bytenr = ordered->file_offset + inode->index_cnt;
+       struct btrfs_root *csum_root = btrfs_csum_root(fs_info, disk_bytenr);
        LIST_HEAD(list);
+       int ret;
 
-       ordered = btrfs_lookup_ordered_extent(inode, file_pos);
-       BUG_ON(ordered->file_offset != file_pos || ordered->num_bytes != len);
-
-       disk_bytenr = file_pos + inode->index_cnt;
-       csum_root = btrfs_csum_root(fs_info, disk_bytenr);
        ret = btrfs_lookup_csums_list(csum_root, disk_bytenr,
-                                     disk_bytenr + len - 1, &list, 0, false);
+                                     disk_bytenr + ordered->num_bytes - 1,
+                                     &list, 0, false);
        if (ret)
-               goto out;
+               return ret;
 
        while (!list_empty(&list)) {
-               sums = list_entry(list.next, struct btrfs_ordered_sum, list);
+               struct btrfs_ordered_sum *sums =
+                       list_entry(list.next, struct btrfs_ordered_sum, list);
+
                list_del_init(&sums->list);
 
                /*
@@ -4379,14 +4375,11 @@ int btrfs_reloc_clone_csums(struct btrfs_inode *inode, u64 file_pos, u64 len)
                 * disk_len vs real len like with real inodes since it's all
                 * disk length.
                 */
-               new_bytenr = ordered->disk_bytenr + sums->bytenr - disk_bytenr;
-               sums->bytenr = new_bytenr;
-
+               sums->logical = ordered->disk_bytenr + sums->logical - disk_bytenr;
                btrfs_add_ordered_sum(ordered, sums);
        }
-out:
-       btrfs_put_ordered_extent(ordered);
-       return ret;
+
+       return 0;
 }
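Since the ordered extent is now passed in directly, callers no longer pay for a tree lookup by file offset. A hedged sketch of a call site in the data relocation write path, assuming @ordered was just allocated for the relocation inode:

	ret = btrfs_reloc_clone_csums(ordered);
	if (ret < 0) {
		/* Csum cloning failed; surface the error to the write path. */
		return ret;
	}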
 
 int btrfs_reloc_cow_block(struct btrfs_trans_handle *trans,
@@ -4523,3 +4516,19 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
                ret = clone_backref_node(trans, rc, root, reloc_root);
        return ret;
 }
+
+/*
+ * Get the current bytenr for the block group which is being relocated.
+ *
+ * Return U64_MAX if no running relocation.
+ */
+u64 btrfs_get_reloc_bg_bytenr(struct btrfs_fs_info *fs_info)
+{
+       u64 logical = U64_MAX;
+
+       lockdep_assert_held(&fs_info->reloc_mutex);
+
+       if (fs_info->reloc_ctl && fs_info->reloc_ctl->block_group)
+               logical = fs_info->reloc_ctl->block_group->start;
+       return logical;
+}
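A hedged usage sketch for the new helper; the caller shown is illustrative and only wants to know, under reloc_mutex, whether a relocation is running and which block group it covers:

	u64 reloc_bytenr;

	mutex_lock(&fs_info->reloc_mutex);
	reloc_bytenr = btrfs_get_reloc_bg_bytenr(fs_info);
	mutex_unlock(&fs_info->reloc_mutex);

	if (reloc_bytenr == U64_MAX) {
		/* No relocation in progress. */
	} else {
		/* The block group starting at @reloc_bytenr is being relocated. */
	}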
index 2041a86..77d69f6 100644 (file)
@@ -8,7 +8,7 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans, struct btrfs_root *r
 int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
                            struct btrfs_root *root);
 int btrfs_recover_relocation(struct btrfs_fs_info *fs_info);
-int btrfs_reloc_clone_csums(struct btrfs_inode *inode, u64 file_pos, u64 len);
+int btrfs_reloc_clone_csums(struct btrfs_ordered_extent *ordered);
 int btrfs_reloc_cow_block(struct btrfs_trans_handle *trans,
                          struct btrfs_root *root, struct extent_buffer *buf,
                          struct extent_buffer *cow);
@@ -19,5 +19,6 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
 int btrfs_should_cancel_balance(struct btrfs_fs_info *fs_info);
 struct btrfs_root *find_reloc_root(struct btrfs_fs_info *fs_info, u64 bytenr);
 int btrfs_should_ignore_reloc_root(struct btrfs_root *root);
+u64 btrfs_get_reloc_bg_bytenr(struct btrfs_fs_info *fs_info);
 
 #endif
index 836725a..4cae41b 100644 (file)
@@ -134,8 +134,14 @@ struct scrub_stripe {
         * The errors hit during the initial read of the stripe.
         *
         * Would be utilized for error reporting and repair.
+        *
+        * The remaining init_nr_* counters record the number of errors hit,
+        * and are only used for error reporting.
         */
        unsigned long init_error_bitmap;
+       unsigned int init_nr_io_errors;
+       unsigned int init_nr_csum_errors;
+       unsigned int init_nr_meta_errors;
 
        /*
         * The following error bitmaps are all for the current status.
@@ -171,7 +177,6 @@ struct scrub_ctx {
        struct btrfs_fs_info    *fs_info;
        int                     first_free;
        int                     cur_stripe;
-       struct list_head        csum_list;
        atomic_t                cancel_req;
        int                     readonly;
        int                     sectors_per_bio;
@@ -303,17 +308,6 @@ static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
        scrub_pause_off(fs_info);
 }
 
-static void scrub_free_csums(struct scrub_ctx *sctx)
-{
-       while (!list_empty(&sctx->csum_list)) {
-               struct btrfs_ordered_sum *sum;
-               sum = list_first_entry(&sctx->csum_list,
-                                      struct btrfs_ordered_sum, list);
-               list_del(&sum->list);
-               kfree(sum);
-       }
-}
-
 static noinline_for_stack void scrub_free_ctx(struct scrub_ctx *sctx)
 {
        int i;
@@ -324,7 +318,6 @@ static noinline_for_stack void scrub_free_ctx(struct scrub_ctx *sctx)
        for (i = 0; i < SCRUB_STRIPES_PER_SCTX; i++)
                release_scrub_stripe(&sctx->stripes[i]);
 
-       scrub_free_csums(sctx);
        kfree(sctx);
 }
 
@@ -346,7 +339,6 @@ static noinline_for_stack struct scrub_ctx *scrub_setup_ctx(
        refcount_set(&sctx->refs, 1);
        sctx->is_dev_replace = is_dev_replace;
        sctx->fs_info = fs_info;
-       INIT_LIST_HEAD(&sctx->csum_list);
        for (i = 0; i < SCRUB_STRIPES_PER_SCTX; i++) {
                int ret;
 
@@ -473,11 +465,8 @@ static void scrub_print_common_warning(const char *errstr, struct btrfs_device *
        struct extent_buffer *eb;
        struct btrfs_extent_item *ei;
        struct scrub_warning swarn;
-       unsigned long ptr = 0;
        u64 flags = 0;
-       u64 ref_root;
        u32 item_size;
-       u8 ref_level = 0;
        int ret;
 
        /* Super block error, no need to search extent tree. */
@@ -507,19 +496,28 @@ static void scrub_print_common_warning(const char *errstr, struct btrfs_device *
        item_size = btrfs_item_size(eb, path->slots[0]);
 
        if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
-               do {
+               unsigned long ptr = 0;
+               u8 ref_level;
+               u64 ref_root;
+
+               while (true) {
                        ret = tree_backref_for_extent(&ptr, eb, &found_key, ei,
                                                      item_size, &ref_root,
                                                      &ref_level);
+                       if (ret < 0) {
+                               btrfs_warn(fs_info,
+                               "failed to resolve tree backref for logical %llu: %d",
+                                                 swarn.logical, ret);
+                               break;
+                       }
+                       if (ret > 0)
+                               break;
                        btrfs_warn_in_rcu(fs_info,
 "%s at logical %llu on dev %s, physical %llu: metadata %s (level %d) in tree %llu",
-                               errstr, swarn.logical,
-                               btrfs_dev_name(dev),
-                               swarn.physical,
-                               ref_level ? "node" : "leaf",
-                               ret < 0 ? -1 : ref_level,
-                               ret < 0 ? -1 : ref_root);
-               } while (ret != 1);
+                               errstr, swarn.logical, btrfs_dev_name(dev),
+                               swarn.physical, (ref_level ? "node" : "leaf"),
+                               ref_level, ref_root);
+               }
                btrfs_release_path(path);
        } else {
                struct btrfs_backref_walk_ctx ctx = { 0 };
@@ -540,48 +538,6 @@ out:
        btrfs_free_path(path);
 }
 
-static inline int scrub_nr_raid_mirrors(struct btrfs_io_context *bioc)
-{
-       if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID5)
-               return 2;
-       else if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID6)
-               return 3;
-       else
-               return (int)bioc->num_stripes;
-}
-
-static inline void scrub_stripe_index_and_offset(u64 logical, u64 map_type,
-                                                u64 full_stripe_logical,
-                                                int nstripes, int mirror,
-                                                int *stripe_index,
-                                                u64 *stripe_offset)
-{
-       int i;
-
-       if (map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
-               const int nr_data_stripes = (map_type & BTRFS_BLOCK_GROUP_RAID5) ?
-                                           nstripes - 1 : nstripes - 2;
-
-               /* RAID5/6 */
-               for (i = 0; i < nr_data_stripes; i++) {
-                       const u64 data_stripe_start = full_stripe_logical +
-                                               (i * BTRFS_STRIPE_LEN);
-
-                       if (logical >= data_stripe_start &&
-                           logical < data_stripe_start + BTRFS_STRIPE_LEN)
-                               break;
-               }
-
-               *stripe_index = i;
-               *stripe_offset = (logical - full_stripe_logical) &
-                                BTRFS_STRIPE_LEN_MASK;
-       } else {
-               /* The other RAID type */
-               *stripe_index = mirror;
-               *stripe_offset = 0;
-       }
-}
-
 static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical)
 {
        int ret = 0;
@@ -918,8 +874,9 @@ static void scrub_stripe_report_errors(struct scrub_ctx *sctx,
 
                /* For scrub, our mirror_num should always start at 1. */
                ASSERT(stripe->mirror_num >= 1);
-               ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS,
-                                      stripe->logical, &mapped_len, &bioc);
+               ret = btrfs_map_block(fs_info, BTRFS_MAP_GET_READ_MIRRORS,
+                                     stripe->logical, &mapped_len, &bioc,
+                                     NULL, NULL, 1);
                /*
                 * If we failed, dev will be NULL, and later detailed reports
                 * will just be skipped.
@@ -1003,12 +960,9 @@ skip:
        sctx->stat.data_bytes_scrubbed += nr_data_sectors << fs_info->sectorsize_bits;
        sctx->stat.tree_bytes_scrubbed += nr_meta_sectors << fs_info->sectorsize_bits;
        sctx->stat.no_csum += nr_nodatacsum_sectors;
-       sctx->stat.read_errors +=
-               bitmap_weight(&stripe->io_error_bitmap, stripe->nr_sectors);
-       sctx->stat.csum_errors +=
-               bitmap_weight(&stripe->csum_error_bitmap, stripe->nr_sectors);
-       sctx->stat.verify_errors +=
-               bitmap_weight(&stripe->meta_error_bitmap, stripe->nr_sectors);
+       sctx->stat.read_errors += stripe->init_nr_io_errors;
+       sctx->stat.csum_errors += stripe->init_nr_csum_errors;
+       sctx->stat.verify_errors += stripe->init_nr_meta_errors;
        sctx->stat.uncorrectable_errors +=
                bitmap_weight(&stripe->error_bitmap, stripe->nr_sectors);
        sctx->stat.corrected_errors += nr_repaired_sectors;
@@ -1041,6 +995,12 @@ static void scrub_stripe_read_repair_worker(struct work_struct *work)
        scrub_verify_one_stripe(stripe, stripe->extent_sector_bitmap);
        /* Save the initial failed bitmap for later repair and report usage. */
        stripe->init_error_bitmap = stripe->error_bitmap;
+       stripe->init_nr_io_errors = bitmap_weight(&stripe->io_error_bitmap,
+                                                 stripe->nr_sectors);
+       stripe->init_nr_csum_errors = bitmap_weight(&stripe->csum_error_bitmap,
+                                                   stripe->nr_sectors);
+       stripe->init_nr_meta_errors = bitmap_weight(&stripe->meta_error_bitmap,
+                                                   stripe->nr_sectors);
 
        if (bitmap_empty(&stripe->init_error_bitmap, stripe->nr_sectors))
                goto out;
@@ -1137,6 +1097,35 @@ static void scrub_write_endio(struct btrfs_bio *bbio)
                wake_up(&stripe->io_wait);
 }
 
+static void scrub_submit_write_bio(struct scrub_ctx *sctx,
+                                  struct scrub_stripe *stripe,
+                                  struct btrfs_bio *bbio, bool dev_replace)
+{
+       struct btrfs_fs_info *fs_info = sctx->fs_info;
+       u32 bio_len = bbio->bio.bi_iter.bi_size;
+       u32 bio_off = (bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT) -
+                     stripe->logical;
+
+       fill_writer_pointer_gap(sctx, stripe->physical + bio_off);
+       atomic_inc(&stripe->pending_io);
+       btrfs_submit_repair_write(bbio, stripe->mirror_num, dev_replace);
+       if (!btrfs_is_zoned(fs_info))
+               return;
+       /*
+        * For zoned writeback, queue depth must be 1, thus we must wait for
+        * the write to finish before the next write.
+        */
+       wait_scrub_stripe_io(stripe);
+
+       /*
+        * Also update the write pointer if the write finished
+        * successfully.
+        */
+       if (!test_bit(bio_off >> fs_info->sectorsize_bits,
+                     &stripe->write_error_bitmap))
+               sctx->write_pointer += bio_len;
+}
+
 /*
  * Submit the write bio(s) for the sectors specified by @write_bitmap.
  *
@@ -1155,7 +1144,6 @@ static void scrub_write_sectors(struct scrub_ctx *sctx, struct scrub_stripe *str
 {
        struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
        struct btrfs_bio *bbio = NULL;
-       const bool zoned = btrfs_is_zoned(fs_info);
        int sector_nr;
 
        for_each_set_bit(sector_nr, &write_bitmap, stripe->nr_sectors) {
@@ -1168,13 +1156,7 @@ static void scrub_write_sectors(struct scrub_ctx *sctx, struct scrub_stripe *str
 
                /* Cannot merge with previous sector, submit the current one. */
                if (bbio && sector_nr && !test_bit(sector_nr - 1, &write_bitmap)) {
-                       fill_writer_pointer_gap(sctx, stripe->physical +
-                                       (sector_nr << fs_info->sectorsize_bits));
-                       atomic_inc(&stripe->pending_io);
-                       btrfs_submit_repair_write(bbio, stripe->mirror_num, dev_replace);
-                       /* For zoned writeback, queue depth must be 1. */
-                       if (zoned)
-                               wait_scrub_stripe_io(stripe);
+                       scrub_submit_write_bio(sctx, stripe, bbio, dev_replace);
                        bbio = NULL;
                }
                if (!bbio) {
@@ -1187,14 +1169,8 @@ static void scrub_write_sectors(struct scrub_ctx *sctx, struct scrub_stripe *str
                ret = bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
                ASSERT(ret == fs_info->sectorsize);
        }
-       if (bbio) {
-               fill_writer_pointer_gap(sctx, bbio->bio.bi_iter.bi_sector <<
-                                       SECTOR_SHIFT);
-               atomic_inc(&stripe->pending_io);
-               btrfs_submit_repair_write(bbio, stripe->mirror_num, dev_replace);
-               if (zoned)
-                       wait_scrub_stripe_io(stripe);
-       }
+       if (bbio)
+               scrub_submit_write_bio(sctx, stripe, bbio, dev_replace);
 }
 
 /*
@@ -1279,7 +1255,7 @@ static int get_raid56_logic_offset(u64 physical, int num,
                u32 stripe_index;
                u32 rot;
 
-               *offset = last_offset + (i << BTRFS_STRIPE_LEN_SHIFT);
+               *offset = last_offset + btrfs_stripe_nr_to_offset(i);
 
                stripe_nr = (u32)(*offset >> BTRFS_STRIPE_LEN_SHIFT) / data_stripes;
 
@@ -1294,7 +1270,7 @@ static int get_raid56_logic_offset(u64 physical, int num,
                if (stripe_index < num)
                        j++;
        }
-       *offset = last_offset + (j << BTRFS_STRIPE_LEN_SHIFT);
+       *offset = last_offset + btrfs_stripe_nr_to_offset(j);
        return 1;
 }
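The conversions from open-coded `nr << BTRFS_STRIPE_LEN_SHIFT` to btrfs_stripe_nr_to_offset() are about integer width: shifting a 32-bit stripe count can overflow before it is widened to u64. Assuming the helper is the obvious widening wrapper (a sketch, not quoted from this series):

	static inline u64 btrfs_stripe_nr_to_offset(u32 nr)
	{
		return (u64)nr << BTRFS_STRIPE_LEN_SHIFT;
	}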
 
@@ -1474,6 +1450,9 @@ static void scrub_stripe_reset_bitmaps(struct scrub_stripe *stripe)
 {
        stripe->extent_sector_bitmap = 0;
        stripe->init_error_bitmap = 0;
+       stripe->init_nr_io_errors = 0;
+       stripe->init_nr_csum_errors = 0;
+       stripe->init_nr_meta_errors = 0;
        stripe->error_bitmap = 0;
        stripe->io_error_bitmap = 0;
        stripe->csum_error_bitmap = 0;
@@ -1687,7 +1666,7 @@ static int flush_scrub_stripes(struct scrub_ctx *sctx)
        ASSERT(test_bit(SCRUB_STRIPE_FLAG_INITIALIZED, &sctx->stripes[0].state));
 
        scrub_throttle_dev_io(sctx, sctx->stripes[0].dev,
-                             nr_stripes << BTRFS_STRIPE_LEN_SHIFT);
+                             btrfs_stripe_nr_to_offset(nr_stripes));
        for (int i = 0; i < nr_stripes; i++) {
                stripe = &sctx->stripes[i];
                scrub_submit_initial_read(sctx, stripe);
@@ -1714,7 +1693,7 @@ static int flush_scrub_stripes(struct scrub_ctx *sctx)
                                break;
                        }
                }
-       } else {
+       } else if (!sctx->readonly) {
                for (int i = 0; i < nr_stripes; i++) {
                        unsigned long repaired;
 
@@ -1810,7 +1789,7 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
        bool all_empty = true;
        const int data_stripes = nr_data_stripes(map);
        unsigned long extent_bitmap = 0;
-       u64 length = data_stripes << BTRFS_STRIPE_LEN_SHIFT;
+       u64 length = btrfs_stripe_nr_to_offset(data_stripes);
        int ret;
 
        ASSERT(sctx->raid56_data_stripes);
@@ -1825,13 +1804,13 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
                              data_stripes) >> BTRFS_STRIPE_LEN_SHIFT;
                stripe_index = (i + rot) % map->num_stripes;
                physical = map->stripes[stripe_index].physical +
-                          (rot << BTRFS_STRIPE_LEN_SHIFT);
+                          btrfs_stripe_nr_to_offset(rot);
 
                scrub_reset_stripe(stripe);
                set_bit(SCRUB_STRIPE_FLAG_NO_REPORT, &stripe->state);
                ret = scrub_find_fill_first_stripe(bg,
                                map->stripes[stripe_index].dev, physical, 1,
-                               full_stripe_start + (i << BTRFS_STRIPE_LEN_SHIFT),
+                               full_stripe_start + btrfs_stripe_nr_to_offset(i),
                                BTRFS_STRIPE_LEN, stripe);
                if (ret < 0)
                        goto out;
@@ -1841,7 +1820,7 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
                 */
                if (ret > 0) {
                        stripe->logical = full_stripe_start +
-                                         (i << BTRFS_STRIPE_LEN_SHIFT);
+                                         btrfs_stripe_nr_to_offset(i);
                        stripe->dev = map->stripes[stripe_index].dev;
                        stripe->mirror_num = 1;
                        set_bit(SCRUB_STRIPE_FLAG_INITIALIZED, &stripe->state);
@@ -1929,8 +1908,8 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
        bio->bi_end_io = raid56_scrub_wait_endio;
 
        btrfs_bio_counter_inc_blocked(fs_info);
-       ret = btrfs_map_sblock(fs_info, BTRFS_MAP_WRITE, full_stripe_start,
-                              &length, &bioc);
+       ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, full_stripe_start,
+                             &length, &bioc, NULL, NULL, 1);
        if (ret < 0) {
                btrfs_put_bioc(bioc);
                btrfs_bio_counter_dec(fs_info);
@@ -1944,6 +1923,13 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
                btrfs_bio_counter_dec(fs_info);
                goto out;
        }
+       /* Use the recovered stripes as cache to avoid reading them from disk again. */
+       for (int i = 0; i < data_stripes; i++) {
+               stripe = &sctx->raid56_data_stripes[i];
+
+               raid56_parity_cache_data_pages(rbio, stripe->pages,
+                               full_stripe_start + (i << BTRFS_STRIPE_LEN_SHIFT));
+       }
        raid56_parity_submit_scrub_rbio(rbio);
        wait_for_completion_io(&io_done);
        ret = blk_status_to_errno(bio->bi_status);
@@ -2034,7 +2020,7 @@ static u64 simple_stripe_full_stripe_len(const struct map_lookup *map)
        ASSERT(map->type & (BTRFS_BLOCK_GROUP_RAID0 |
                            BTRFS_BLOCK_GROUP_RAID10));
 
-       return (map->num_stripes / map->sub_stripes) << BTRFS_STRIPE_LEN_SHIFT;
+       return btrfs_stripe_nr_to_offset(map->num_stripes / map->sub_stripes);
 }
 
 /* Get the logical bytenr for the stripe */
@@ -2050,7 +2036,7 @@ static u64 simple_stripe_get_logical(struct map_lookup *map,
         * (stripe_index / sub_stripes) gives how many data stripes we need to
         * skip.
         */
-       return ((stripe_index / map->sub_stripes) << BTRFS_STRIPE_LEN_SHIFT) +
+       return btrfs_stripe_nr_to_offset(stripe_index / map->sub_stripes) +
               bg->start;
 }
 
@@ -2176,7 +2162,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
        }
        if (profile & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID10)) {
                ret = scrub_simple_stripe(sctx, bg, map, scrub_dev, stripe_index);
-               offset = (stripe_index / map->sub_stripes) << BTRFS_STRIPE_LEN_SHIFT;
+               offset = btrfs_stripe_nr_to_offset(stripe_index / map->sub_stripes);
                goto out;
        }
 
@@ -2191,7 +2177,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 
        /* Initialize @offset in case we need to go to out: label */
        get_raid56_logic_offset(physical, stripe_index, map, &offset, NULL);
-       increment = nr_data_stripes(map) << BTRFS_STRIPE_LEN_SHIFT;
+       increment = btrfs_stripe_nr_to_offset(nr_data_stripes(map));
 
        /*
         * Due to the rotation, for RAID56 it's better to iterate each stripe
@@ -2238,7 +2224,7 @@ next:
        }
 out:
        ret2 = flush_scrub_stripes(sctx);
-       if (!ret2)
+       if (!ret)
                ret = ret2;
        if (sctx->raid56_data_stripes) {
                for (int i = 0; i < nr_data_stripes(map); i++)
@@ -2518,13 +2504,20 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 
                if (ret == 0) {
                        ro_set = 1;
-               } else if (ret == -ENOSPC && !sctx->is_dev_replace) {
+               } else if (ret == -ENOSPC && !sctx->is_dev_replace &&
+                          !(cache->flags & BTRFS_BLOCK_GROUP_RAID56_MASK)) {
                        /*
                         * btrfs_inc_block_group_ro() returns -ENOSPC when it
                         * fails to create a new chunk for metadata.
                         * It is not a problem for scrub, because
                         * metadata are always cowed, and our scrub paused
                         * commit_transactions.
+                        *
+                        * For RAID56 chunks, we have to mark them read-only
+                        * for scrub, as later we would use our own cache
+                        * outside of the RAID56 realm.
+                        * Thus we want the RAID56 bg to be marked RO to
+                        * prevent RMW from screwing up our cache.
                         */
                        ro_set = 0;
                } else if (ret == -ETXTBSY) {
@@ -2705,17 +2698,12 @@ static void scrub_workers_put(struct btrfs_fs_info *fs_info)
        if (refcount_dec_and_mutex_lock(&fs_info->scrub_workers_refcnt,
                                        &fs_info->scrub_lock)) {
                struct workqueue_struct *scrub_workers = fs_info->scrub_workers;
-               struct workqueue_struct *scrub_wr_comp =
-                                               fs_info->scrub_wr_completion_workers;
 
                fs_info->scrub_workers = NULL;
-               fs_info->scrub_wr_completion_workers = NULL;
                mutex_unlock(&fs_info->scrub_lock);
 
                if (scrub_workers)
                        destroy_workqueue(scrub_workers);
-               if (scrub_wr_comp)
-                       destroy_workqueue(scrub_wr_comp);
        }
 }
 
@@ -2726,7 +2714,6 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
                                                int is_dev_replace)
 {
        struct workqueue_struct *scrub_workers = NULL;
-       struct workqueue_struct *scrub_wr_comp = NULL;
        unsigned int flags = WQ_FREEZABLE | WQ_UNBOUND;
        int max_active = fs_info->thread_pool_size;
        int ret = -ENOMEM;
@@ -2734,21 +2721,17 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
        if (refcount_inc_not_zero(&fs_info->scrub_workers_refcnt))
                return 0;
 
-       scrub_workers = alloc_workqueue("btrfs-scrub", flags,
-                                       is_dev_replace ? 1 : max_active);
+       if (is_dev_replace)
+               scrub_workers = alloc_ordered_workqueue("btrfs-scrub", flags);
+       else
+               scrub_workers = alloc_workqueue("btrfs-scrub", flags, max_active);
        if (!scrub_workers)
-               goto fail_scrub_workers;
-
-       scrub_wr_comp = alloc_workqueue("btrfs-scrubwrc", flags, max_active);
-       if (!scrub_wr_comp)
-               goto fail_scrub_wr_completion_workers;
+               return -ENOMEM;
 
        mutex_lock(&fs_info->scrub_lock);
        if (refcount_read(&fs_info->scrub_workers_refcnt) == 0) {
-               ASSERT(fs_info->scrub_workers == NULL &&
-                      fs_info->scrub_wr_completion_workers == NULL);
+               ASSERT(fs_info->scrub_workers == NULL);
                fs_info->scrub_workers = scrub_workers;
-               fs_info->scrub_wr_completion_workers = scrub_wr_comp;
                refcount_set(&fs_info->scrub_workers_refcnt, 1);
                mutex_unlock(&fs_info->scrub_lock);
                return 0;
@@ -2759,10 +2742,7 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
 
        ret = 0;
 
-       destroy_workqueue(scrub_wr_comp);
-fail_scrub_wr_completion_workers:
        destroy_workqueue(scrub_workers);
-fail_scrub_workers:
        return ret;
 }
 
index af2e153..8bfd447 100644 (file)
@@ -1774,9 +1774,21 @@ static int read_symlink(struct btrfs_root *root,
        ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
                        struct btrfs_file_extent_item);
        type = btrfs_file_extent_type(path->nodes[0], ei);
+       if (unlikely(type != BTRFS_FILE_EXTENT_INLINE)) {
+               ret = -EUCLEAN;
+               btrfs_crit(root->fs_info,
+"send: found symlink extent that is not inline, ino %llu root %llu extent type %d",
+                          ino, btrfs_root_id(root), type);
+               goto out;
+       }
        compression = btrfs_file_extent_compression(path->nodes[0], ei);
-       BUG_ON(type != BTRFS_FILE_EXTENT_INLINE);
-       BUG_ON(compression);
+       if (unlikely(compression != BTRFS_COMPRESS_NONE)) {
+               ret = -EUCLEAN;
+               btrfs_crit(root->fs_info,
+"send: found symlink extent with compression, ino %llu root %llu compression type %d",
+                          ino, btrfs_root_id(root), compression);
+               goto out;
+       }
 
        off = btrfs_file_extent_inline_start(ei);
        len = btrfs_file_extent_ram_bytes(path->nodes[0], ei);
index dd46b97..1b999c6 100644 (file)
@@ -100,9 +100,6 @@ void btrfs_init_subpage_info(struct btrfs_subpage_info *subpage_info, u32 sector
        subpage_info->uptodate_offset = cur;
        cur += nr_bits;
 
-       subpage_info->error_offset = cur;
-       cur += nr_bits;
-
        subpage_info->dirty_offset = cur;
        cur += nr_bits;
 
@@ -367,28 +364,6 @@ void btrfs_page_end_writer_lock(const struct btrfs_fs_info *fs_info,
                unlock_page(page);
 }
 
-static bool bitmap_test_range_all_set(unsigned long *addr, unsigned int start,
-                                     unsigned int nbits)
-{
-       unsigned int found_zero;
-
-       found_zero = find_next_zero_bit(addr, start + nbits, start);
-       if (found_zero == start + nbits)
-               return true;
-       return false;
-}
-
-static bool bitmap_test_range_all_zero(unsigned long *addr, unsigned int start,
-                                      unsigned int nbits)
-{
-       unsigned int found_set;
-
-       found_set = find_next_bit(addr, start + nbits, start);
-       if (found_set == start + nbits)
-               return true;
-       return false;
-}
-
 #define subpage_calc_start_bit(fs_info, page, name, start, len)                \
 ({                                                                     \
        unsigned int start_bit;                                         \
@@ -438,35 +413,6 @@ void btrfs_subpage_clear_uptodate(const struct btrfs_fs_info *fs_info,
        spin_unlock_irqrestore(&subpage->lock, flags);
 }
 
-void btrfs_subpage_set_error(const struct btrfs_fs_info *fs_info,
-               struct page *page, u64 start, u32 len)
-{
-       struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
-       unsigned int start_bit = subpage_calc_start_bit(fs_info, page,
-                                                       error, start, len);
-       unsigned long flags;
-
-       spin_lock_irqsave(&subpage->lock, flags);
-       bitmap_set(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
-       SetPageError(page);
-       spin_unlock_irqrestore(&subpage->lock, flags);
-}
-
-void btrfs_subpage_clear_error(const struct btrfs_fs_info *fs_info,
-               struct page *page, u64 start, u32 len)
-{
-       struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
-       unsigned int start_bit = subpage_calc_start_bit(fs_info, page,
-                                                       error, start, len);
-       unsigned long flags;
-
-       spin_lock_irqsave(&subpage->lock, flags);
-       bitmap_clear(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
-       if (subpage_test_bitmap_all_zero(fs_info, subpage, error))
-               ClearPageError(page);
-       spin_unlock_irqrestore(&subpage->lock, flags);
-}
-
 void btrfs_subpage_set_dirty(const struct btrfs_fs_info *fs_info,
                struct page *page, u64 start, u32 len)
 {
@@ -628,7 +574,6 @@ bool btrfs_subpage_test_##name(const struct btrfs_fs_info *fs_info, \
        return ret;                                                     \
 }
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(uptodate);
-IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(error);
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(dirty);
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(writeback);
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(ordered);
@@ -696,7 +641,6 @@ bool btrfs_page_clamp_test_##name(const struct btrfs_fs_info *fs_info,      \
 }
 IMPLEMENT_BTRFS_PAGE_OPS(uptodate, SetPageUptodate, ClearPageUptodate,
                         PageUptodate);
-IMPLEMENT_BTRFS_PAGE_OPS(error, SetPageError, ClearPageError, PageError);
 IMPLEMENT_BTRFS_PAGE_OPS(dirty, set_page_dirty, clear_page_dirty_for_io,
                         PageDirty);
 IMPLEMENT_BTRFS_PAGE_OPS(writeback, set_page_writeback, end_page_writeback,
@@ -767,3 +711,44 @@ void btrfs_page_unlock_writer(struct btrfs_fs_info *fs_info, struct page *page,
        /* Have writers, use proper subpage helper to end it */
        btrfs_page_end_writer_lock(fs_info, page, start, len);
 }
+
+#define GET_SUBPAGE_BITMAP(subpage, subpage_info, name, dst)           \
+       bitmap_cut(dst, subpage->bitmaps, 0,                            \
+                  subpage_info->name##_offset, subpage_info->bitmap_nr_bits)
+
+void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info,
+                                     struct page *page, u64 start, u32 len)
+{
+       struct btrfs_subpage_info *subpage_info = fs_info->subpage_info;
+       struct btrfs_subpage *subpage;
+       unsigned long uptodate_bitmap;
+       unsigned long error_bitmap;
+       unsigned long dirty_bitmap;
+       unsigned long writeback_bitmap;
+       unsigned long ordered_bitmap;
+       unsigned long checked_bitmap;
+       unsigned long flags;
+
+       ASSERT(PagePrivate(page) && page->private);
+       ASSERT(subpage_info);
+       subpage = (struct btrfs_subpage *)page->private;
+
+       spin_lock_irqsave(&subpage->lock, flags);
+       GET_SUBPAGE_BITMAP(subpage, subpage_info, uptodate, &uptodate_bitmap);
+       GET_SUBPAGE_BITMAP(subpage, subpage_info, dirty, &dirty_bitmap);
+       GET_SUBPAGE_BITMAP(subpage, subpage_info, writeback, &writeback_bitmap);
+       GET_SUBPAGE_BITMAP(subpage, subpage_info, ordered, &ordered_bitmap);
+       GET_SUBPAGE_BITMAP(subpage, subpage_info, checked, &checked_bitmap);
+       spin_unlock_irqrestore(&subpage->lock, flags);
+
+       dump_page(page, "btrfs subpage dump");
+       btrfs_warn(fs_info,
+"start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl error=%*pbl dirty=%*pbl writeback=%*pbl ordered=%*pbl checked=%*pbl",
+                   start, len, page_offset(page),
+                   subpage_info->bitmap_nr_bits, &uptodate_bitmap,
+                   subpage_info->bitmap_nr_bits, &error_bitmap,
+                   subpage_info->bitmap_nr_bits, &dirty_bitmap,
+                   subpage_info->bitmap_nr_bits, &writeback_bitmap,
+                   subpage_info->bitmap_nr_bits, &ordered_bitmap,
+                   subpage_info->bitmap_nr_bits, &checked_bitmap);
+}
index 0e80ad3..5cbf67c 100644 (file)
@@ -8,17 +8,17 @@
 /*
  * Extra info for subpage bitmap.
  *
- * For subpage we pack all uptodate/error/dirty/writeback/ordered bitmaps into
+ * For subpage we pack all uptodate/dirty/writeback/ordered bitmaps into
  * one larger bitmap.
  *
  * This structure records how they are organized in the bitmap:
  *
- * /- uptodate_offset  /- error_offset /- dirty_offset
+ * /- uptodate_offset  /- dirty_offset /- ordered_offset
  * |                   |               |
  * v                   v               v
- * |u|u|u|u|........|u|u|e|e|.......|e|e| ...  |o|o|
+ * |u|u|u|u|........|u|u|d|d|.......|d|d|o|o|.......|o|o|
  * |<- bitmap_nr_bits ->|
- * |<--------------- total_nr_bits ---------------->|
+ * |<----------------- total_nr_bits ------------------>|
  */
 struct btrfs_subpage_info {
        /* Number of bits for each bitmap */
@@ -32,7 +32,6 @@ struct btrfs_subpage_info {
         * @bitmap_size, which is calculated from PAGE_SIZE / sectorsize.
         */
        unsigned int uptodate_offset;
-       unsigned int error_offset;
        unsigned int dirty_offset;
        unsigned int writeback_offset;
        unsigned int ordered_offset;
@@ -141,7 +140,6 @@ bool btrfs_page_clamp_test_##name(const struct btrfs_fs_info *fs_info,      \
                struct page *page, u64 start, u32 len);
 
 DECLARE_BTRFS_SUBPAGE_OPS(uptodate);
-DECLARE_BTRFS_SUBPAGE_OPS(error);
 DECLARE_BTRFS_SUBPAGE_OPS(dirty);
 DECLARE_BTRFS_SUBPAGE_OPS(writeback);
 DECLARE_BTRFS_SUBPAGE_OPS(ordered);
@@ -154,5 +152,7 @@ void btrfs_page_assert_not_dirty(const struct btrfs_fs_info *fs_info,
                                 struct page *page);
 void btrfs_page_unlock_writer(struct btrfs_fs_info *fs_info, struct page *page,
                              u64 start, u32 len);
+void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info,
+                                     struct page *page, u64 start, u32 len);
 
 #endif
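
The layout comment above packs the per-type bitmaps back to back, each bitmap_nr_bits wide, in the order they are set up in btrfs_init_subpage_info(). A minimal worked example of the resulting offsets, assuming a 64K page with 4K sectors (so bitmap_nr_bits = 16); the struct below is illustrative only, not part of the change:

/*
 * Illustrative offsets for the packed subpage bitmap described above,
 * assuming PAGE_SIZE = 64K and sectorsize = 4K, i.e. bitmap_nr_bits = 16.
 * With the error bitmap removed, dirty now follows uptodate directly.
 */
struct example_subpage_layout {
	unsigned int bitmap_nr_bits;	/* 16 */
	unsigned int uptodate_offset;	/*  0 */
	unsigned int dirty_offset;	/* 16 */
	unsigned int writeback_offset;	/* 32 */
	unsigned int ordered_offset;	/* 48 */
	unsigned int checked_offset;	/* 64 */
	unsigned int total_nr_bits;	/* 80 */
};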
index 6cb97ef..f1dd172 100644 (file)
@@ -826,7 +826,11 @@ out:
            !btrfs_test_opt(info, CLEAR_CACHE)) {
                btrfs_err(info, "cannot disable free space tree");
                ret = -EINVAL;
-
+       }
+       if (btrfs_fs_compat_ro(info, BLOCK_GROUP_TREE) &&
+            !btrfs_test_opt(info, FREE_SPACE_TREE)) {
+               btrfs_err(info, "cannot disable free space tree with block-group-tree feature");
+               ret = -EINVAL;
        }
        if (!ret)
                ret = btrfs_check_mountopts_zoned(info);
@@ -845,8 +849,7 @@ out:
  * All other options will be parsed on much later in the mount process and
  * only when we need to allocate a new super block.
  */
-static int btrfs_parse_device_options(const char *options, fmode_t flags,
-                                     void *holder)
+static int btrfs_parse_device_options(const char *options, blk_mode_t flags)
 {
        substring_t args[MAX_OPT_ARGS];
        char *device_name, *opts, *orig, *p;
@@ -880,8 +883,7 @@ static int btrfs_parse_device_options(const char *options, fmode_t flags,
                                error = -ENOMEM;
                                goto out;
                        }
-                       device = btrfs_scan_one_device(device_name, flags,
-                                       holder);
+                       device = btrfs_scan_one_device(device_name, flags);
                        kfree(device_name);
                        if (IS_ERR(device)) {
                                error = PTR_ERR(device);
@@ -1438,12 +1440,9 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
        struct btrfs_fs_devices *fs_devices = NULL;
        struct btrfs_fs_info *fs_info = NULL;
        void *new_sec_opts = NULL;
-       fmode_t mode = FMODE_READ;
+       blk_mode_t mode = sb_open_mode(flags);
        int error = 0;
 
-       if (!(flags & SB_RDONLY))
-               mode |= FMODE_WRITE;
-
        if (data) {
                error = security_sb_eat_lsm_opts(data, &new_sec_opts);
                if (error)
@@ -1473,13 +1472,13 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
        }
 
        mutex_lock(&uuid_mutex);
-       error = btrfs_parse_device_options(data, mode, fs_type);
+       error = btrfs_parse_device_options(data, mode);
        if (error) {
                mutex_unlock(&uuid_mutex);
                goto error_fs_info;
        }
 
-       device = btrfs_scan_one_device(device_name, mode, fs_type);
+       device = btrfs_scan_one_device(device_name, mode);
        if (IS_ERR(device)) {
                mutex_unlock(&uuid_mutex);
                error = PTR_ERR(device);
@@ -1627,7 +1626,6 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info,
               old_pool_size, new_pool_size);
 
        btrfs_workqueue_set_max(fs_info->workers, new_pool_size);
-       btrfs_workqueue_set_max(fs_info->hipri_workers, new_pool_size);
        btrfs_workqueue_set_max(fs_info->delalloc_workers, new_pool_size);
        btrfs_workqueue_set_max(fs_info->caching_workers, new_pool_size);
        workqueue_set_max_active(fs_info->endio_workers, new_pool_size);
@@ -1837,6 +1835,12 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
                btrfs_clear_sb_rdonly(sb);
 
                set_bit(BTRFS_FS_OPEN, &fs_info->flags);
+
+               /*
+                * If we've gone from readonly -> read/write, we need to get
+                * our sync/async discard lists in the right state.
+                */
+               btrfs_discard_resume(fs_info);
        }
 out:
        /*
@@ -2186,8 +2190,7 @@ static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
        switch (cmd) {
        case BTRFS_IOC_SCAN_DEV:
                mutex_lock(&uuid_mutex);
-               device = btrfs_scan_one_device(vol->name, FMODE_READ,
-                                              &btrfs_root_fs_type);
+               device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ);
                ret = PTR_ERR_OR_ZERO(device);
                mutex_unlock(&uuid_mutex);
                break;
@@ -2201,8 +2204,7 @@ static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
                break;
        case BTRFS_IOC_DEVICES_READY:
                mutex_lock(&uuid_mutex);
-               device = btrfs_scan_one_device(vol->name, FMODE_READ,
-                                              &btrfs_root_fs_type);
+               device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ);
                if (IS_ERR(device)) {
                        mutex_unlock(&uuid_mutex);
                        ret = PTR_ERR(device);
index dfc5c7f..f6bc6d7 100644 (file)
@@ -159,7 +159,7 @@ static int test_find_delalloc(u32 sectorsize)
         * |--- delalloc ---|
         * |---  search  ---|
         */
-       set_extent_delalloc(tmp, 0, sectorsize - 1, 0, NULL);
+       set_extent_bit(tmp, 0, sectorsize - 1, EXTENT_DELALLOC, NULL);
        start = 0;
        end = start + PAGE_SIZE - 1;
        found = find_lock_delalloc_range(inode, locked_page, &start,
@@ -190,7 +190,7 @@ static int test_find_delalloc(u32 sectorsize)
                test_err("couldn't find the locked page");
                goto out_bits;
        }
-       set_extent_delalloc(tmp, sectorsize, max_bytes - 1, 0, NULL);
+       set_extent_bit(tmp, sectorsize, max_bytes - 1, EXTENT_DELALLOC, NULL);
        start = test_start;
        end = start + PAGE_SIZE - 1;
        found = find_lock_delalloc_range(inode, locked_page, &start,
@@ -245,7 +245,7 @@ static int test_find_delalloc(u32 sectorsize)
         *
         * We are re-using our test_start from above since it works out well.
         */
-       set_extent_delalloc(tmp, max_bytes, total_dirty - 1, 0, NULL);
+       set_extent_bit(tmp, max_bytes, total_dirty - 1, EXTENT_DELALLOC, NULL);
        start = test_start;
        end = start + PAGE_SIZE - 1;
        found = find_lock_delalloc_range(inode, locked_page, &start,
@@ -503,8 +503,8 @@ static int test_find_first_clear_extent_bit(void)
         * Set 1M-4M alloc/discard and 32M-64M thus leaving a hole between
         * 4M-32M
         */
-       set_extent_bits(&tree, SZ_1M, SZ_4M - 1,
-                       CHUNK_TRIMMED | CHUNK_ALLOCATED);
+       set_extent_bit(&tree, SZ_1M, SZ_4M - 1,
+                      CHUNK_TRIMMED | CHUNK_ALLOCATED, NULL);
 
        find_first_clear_extent_bit(&tree, SZ_512K, &start, &end,
                                    CHUNK_TRIMMED | CHUNK_ALLOCATED);
@@ -516,8 +516,8 @@ static int test_find_first_clear_extent_bit(void)
        }
 
        /* Now add 32M-64M so that we have a hole between 4M-32M */
-       set_extent_bits(&tree, SZ_32M, SZ_64M - 1,
-                       CHUNK_TRIMMED | CHUNK_ALLOCATED);
+       set_extent_bit(&tree, SZ_32M, SZ_64M - 1,
+                      CHUNK_TRIMMED | CHUNK_ALLOCATED, NULL);
 
        /*
         * Request first hole starting at 12M, we should get 4M-32M
@@ -548,7 +548,7 @@ static int test_find_first_clear_extent_bit(void)
         * Set 64M-72M with CHUNK_ALLOC flag, then search for CHUNK_TRIMMED flag
         * being unset in this range, we should get the entry in range 64M-72M
         */
-       set_extent_bits(&tree, SZ_64M, SZ_64M + SZ_8M - 1, CHUNK_ALLOCATED);
+       set_extent_bit(&tree, SZ_64M, SZ_64M + SZ_8M - 1, CHUNK_ALLOCATED, NULL);
        find_first_clear_extent_bit(&tree, SZ_64M + SZ_1M, &start, &end,
                                    CHUNK_TRIMMED);
 
index 8b6a99b..cf30635 100644 (file)
@@ -374,8 +374,6 @@ loop:
        spin_lock_init(&cur_trans->dirty_bgs_lock);
        INIT_LIST_HEAD(&cur_trans->deleted_bgs);
        spin_lock_init(&cur_trans->dropped_roots_lock);
-       INIT_LIST_HEAD(&cur_trans->releasing_ebs);
-       spin_lock_init(&cur_trans->releasing_ebs_lock);
        list_add_tail(&cur_trans->list, &fs_info->trans_list);
        extent_io_tree_init(fs_info, &cur_trans->dirty_pages,
                        IO_TREE_TRANS_DIRTY_PAGES);
@@ -1056,7 +1054,6 @@ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info,
        u64 start = 0;
        u64 end;
 
-       atomic_inc(&BTRFS_I(fs_info->btree_inode)->sync_writers);
        while (!find_first_extent_bit(dirty_pages, start, &start, &end,
                                      mark, &cached_state)) {
                bool wait_writeback = false;
@@ -1092,7 +1089,6 @@ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info,
                cond_resched();
                start = end + 1;
        }
-       atomic_dec(&BTRFS_I(fs_info->btree_inode)->sync_writers);
        return werr;
 }
 
@@ -1688,7 +1684,10 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
         * insert the directory item
         */
        ret = btrfs_set_inode_index(BTRFS_I(parent_inode), &index);
-       BUG_ON(ret); /* -ENOMEM */
+       if (ret) {
+               btrfs_abort_transaction(trans, ret);
+               goto fail;
+       }
 
        /* check if there is a file/dir which has the same name. */
        dir_item = btrfs_lookup_dir_item(NULL, parent_root, path,
@@ -2484,13 +2483,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
                goto scrub_continue;
        }
 
-       /*
-        * At this point, we should have written all the tree blocks allocated
-        * in this transaction. So it's now safe to free the redirtyied extent
-        * buffers.
-        */
-       btrfs_free_redirty_list(cur_trans);
-
        ret = write_all_supers(fs_info, 0);
        /*
         * the super is written, we can safely allow the tree-loggers
index fa728ab..8e9fa23 100644 (file)
@@ -94,9 +94,6 @@ struct btrfs_transaction {
         */
        atomic_t pending_ordered;
        wait_queue_head_t pending_wait;
-
-       spinlock_t releasing_ebs_lock;
-       struct list_head releasing_ebs;
 };
 
 enum {
index e2b5479..038dfa8 100644 (file)
 #include "compression.h"
 #include "volumes.h"
 #include "misc.h"
-#include "btrfs_inode.h"
 #include "fs.h"
 #include "accessors.h"
 #include "file-item.h"
+#include "inode-item.h"
 
 /*
  * Error message should follow the following format:
@@ -857,10 +857,10 @@ int btrfs_check_chunk_valid(struct extent_buffer *leaf,
         *
         * Thus it should be a good way to catch obvious bitflips.
         */
-       if (unlikely(length >= ((u64)U32_MAX << BTRFS_STRIPE_LEN_SHIFT))) {
+       if (unlikely(length >= btrfs_stripe_nr_to_offset(U32_MAX))) {
                chunk_err(leaf, chunk, logical,
                          "chunk length too large: have %llu limit %llu",
-                         length, (u64)U32_MAX << BTRFS_STRIPE_LEN_SHIFT);
+                         length, btrfs_stripe_nr_to_offset(U32_MAX));
                return -EUCLEAN;
        }
        if (unlikely(type & ~(BTRFS_BLOCK_GROUP_TYPE_MASK |
@@ -1620,9 +1620,10 @@ static int check_inode_ref(struct extent_buffer *leaf,
 /*
  * Common point to switch the item-specific validation.
  */
-static int check_leaf_item(struct extent_buffer *leaf,
-                          struct btrfs_key *key, int slot,
-                          struct btrfs_key *prev_key)
+static enum btrfs_tree_block_status check_leaf_item(struct extent_buffer *leaf,
+                                                   struct btrfs_key *key,
+                                                   int slot,
+                                                   struct btrfs_key *prev_key)
 {
        int ret = 0;
        struct btrfs_chunk *chunk;
@@ -1671,10 +1672,13 @@ static int check_leaf_item(struct extent_buffer *leaf,
                ret = check_extent_data_ref(leaf, key, slot);
                break;
        }
-       return ret;
+
+       if (ret)
+               return BTRFS_TREE_BLOCK_INVALID_ITEM;
+       return BTRFS_TREE_BLOCK_CLEAN;
 }
 
-static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
+enum btrfs_tree_block_status __btrfs_check_leaf(struct extent_buffer *leaf)
 {
        struct btrfs_fs_info *fs_info = leaf->fs_info;
        /* No valid key type is 0, so all key should be larger than this key */
@@ -1687,7 +1691,7 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
                generic_err(leaf, 0,
                        "invalid level for leaf, have %d expect 0",
                        btrfs_header_level(leaf));
-               return -EUCLEAN;
+               return BTRFS_TREE_BLOCK_INVALID_LEVEL;
        }
 
        /*
@@ -1710,32 +1714,32 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
                        generic_err(leaf, 0,
                        "invalid root, root %llu must never be empty",
                                    owner);
-                       return -EUCLEAN;
+                       return BTRFS_TREE_BLOCK_INVALID_NRITEMS;
                }
 
                /* Unknown tree */
                if (unlikely(owner == 0)) {
                        generic_err(leaf, 0,
                                "invalid owner, root 0 is not defined");
-                       return -EUCLEAN;
+                       return BTRFS_TREE_BLOCK_INVALID_OWNER;
                }
 
                /* EXTENT_TREE_V2 can have empty extent trees. */
                if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
-                       return 0;
+                       return BTRFS_TREE_BLOCK_CLEAN;
 
                if (unlikely(owner == BTRFS_EXTENT_TREE_OBJECTID)) {
                        generic_err(leaf, 0,
                        "invalid root, root %llu must never be empty",
                                    owner);
-                       return -EUCLEAN;
+                       return BTRFS_TREE_BLOCK_INVALID_NRITEMS;
                }
 
-               return 0;
+               return BTRFS_TREE_BLOCK_CLEAN;
        }
 
        if (unlikely(nritems == 0))
-               return 0;
+               return BTRFS_TREE_BLOCK_CLEAN;
 
        /*
         * Check the following things to make sure this is a good leaf, and
@@ -1751,7 +1755,6 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
        for (slot = 0; slot < nritems; slot++) {
                u32 item_end_expected;
                u64 item_data_end;
-               int ret;
 
                btrfs_item_key_to_cpu(leaf, &key, slot);
 
@@ -1762,7 +1765,7 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
                                prev_key.objectid, prev_key.type,
                                prev_key.offset, key.objectid, key.type,
                                key.offset);
-                       return -EUCLEAN;
+                       return BTRFS_TREE_BLOCK_BAD_KEY_ORDER;
                }
 
                item_data_end = (u64)btrfs_item_offset(leaf, slot) +
@@ -1781,7 +1784,7 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
                        generic_err(leaf, slot,
                                "unexpected item end, have %llu expect %u",
                                item_data_end, item_end_expected);
-                       return -EUCLEAN;
+                       return BTRFS_TREE_BLOCK_INVALID_OFFSETS;
                }
 
                /*
@@ -1793,7 +1796,7 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
                        generic_err(leaf, slot,
                        "slot end outside of leaf, have %llu expect range [0, %u]",
                                item_data_end, BTRFS_LEAF_DATA_SIZE(fs_info));
-                       return -EUCLEAN;
+                       return BTRFS_TREE_BLOCK_INVALID_OFFSETS;
                }
 
                /* Also check if the item pointer overlaps with btrfs item. */
@@ -1804,16 +1807,22 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
                                btrfs_item_nr_offset(leaf, slot) +
                                sizeof(struct btrfs_item),
                                btrfs_item_ptr_offset(leaf, slot));
-                       return -EUCLEAN;
+                       return BTRFS_TREE_BLOCK_INVALID_OFFSETS;
                }
 
-               if (check_item_data) {
+               /*
+                * We only want to do this if WRITTEN is set, otherwise the leaf
+                * may be in some intermediate state and won't appear valid.
+                */
+               if (btrfs_header_flag(leaf, BTRFS_HEADER_FLAG_WRITTEN)) {
+                       enum btrfs_tree_block_status ret;
+
                        /*
                         * Check if the item size and content meet other
                         * criteria
                         */
                        ret = check_leaf_item(leaf, &key, slot, &prev_key);
-                       if (unlikely(ret < 0))
+                       if (unlikely(ret != BTRFS_TREE_BLOCK_CLEAN))
                                return ret;
                }
 
@@ -1822,21 +1831,21 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
                prev_key.offset = key.offset;
        }
 
-       return 0;
+       return BTRFS_TREE_BLOCK_CLEAN;
 }
 
-int btrfs_check_leaf_full(struct extent_buffer *leaf)
+int btrfs_check_leaf(struct extent_buffer *leaf)
 {
-       return check_leaf(leaf, true);
-}
-ALLOW_ERROR_INJECTION(btrfs_check_leaf_full, ERRNO);
+       enum btrfs_tree_block_status ret;
 
-int btrfs_check_leaf_relaxed(struct extent_buffer *leaf)
-{
-       return check_leaf(leaf, false);
+       ret = __btrfs_check_leaf(leaf);
+       if (unlikely(ret != BTRFS_TREE_BLOCK_CLEAN))
+               return -EUCLEAN;
+       return 0;
 }
+ALLOW_ERROR_INJECTION(btrfs_check_leaf, ERRNO);
 
-int btrfs_check_node(struct extent_buffer *node)
+enum btrfs_tree_block_status __btrfs_check_node(struct extent_buffer *node)
 {
        struct btrfs_fs_info *fs_info = node->fs_info;
        unsigned long nr = btrfs_header_nritems(node);
@@ -1844,13 +1853,12 @@ int btrfs_check_node(struct extent_buffer *node)
        int slot;
        int level = btrfs_header_level(node);
        u64 bytenr;
-       int ret = 0;
 
        if (unlikely(level <= 0 || level >= BTRFS_MAX_LEVEL)) {
                generic_err(node, 0,
                        "invalid level for node, have %d expect [1, %d]",
                        level, BTRFS_MAX_LEVEL - 1);
-               return -EUCLEAN;
+               return BTRFS_TREE_BLOCK_INVALID_LEVEL;
        }
        if (unlikely(nr == 0 || nr > BTRFS_NODEPTRS_PER_BLOCK(fs_info))) {
                btrfs_crit(fs_info,
@@ -1858,7 +1866,7 @@ int btrfs_check_node(struct extent_buffer *node)
                           btrfs_header_owner(node), node->start,
                           nr == 0 ? "small" : "large", nr,
                           BTRFS_NODEPTRS_PER_BLOCK(fs_info));
-               return -EUCLEAN;
+               return BTRFS_TREE_BLOCK_INVALID_NRITEMS;
        }
 
        for (slot = 0; slot < nr - 1; slot++) {
@@ -1869,15 +1877,13 @@ int btrfs_check_node(struct extent_buffer *node)
                if (unlikely(!bytenr)) {
                        generic_err(node, slot,
                                "invalid NULL node pointer");
-                       ret = -EUCLEAN;
-                       goto out;
+                       return BTRFS_TREE_BLOCK_INVALID_BLOCKPTR;
                }
                if (unlikely(!IS_ALIGNED(bytenr, fs_info->sectorsize))) {
                        generic_err(node, slot,
                        "unaligned pointer, have %llu should be aligned to %u",
                                bytenr, fs_info->sectorsize);
-                       ret = -EUCLEAN;
-                       goto out;
+                       return BTRFS_TREE_BLOCK_INVALID_BLOCKPTR;
                }
 
                if (unlikely(btrfs_comp_cpu_keys(&key, &next_key) >= 0)) {
@@ -1886,12 +1892,20 @@ int btrfs_check_node(struct extent_buffer *node)
                                key.objectid, key.type, key.offset,
                                next_key.objectid, next_key.type,
                                next_key.offset);
-                       ret = -EUCLEAN;
-                       goto out;
+                       return BTRFS_TREE_BLOCK_BAD_KEY_ORDER;
                }
        }
-out:
-       return ret;
+       return BTRFS_TREE_BLOCK_CLEAN;
+}
+
+int btrfs_check_node(struct extent_buffer *node)
+{
+       enum btrfs_tree_block_status ret;
+
+       ret = __btrfs_check_node(node);
+       if (unlikely(ret != BTRFS_TREE_BLOCK_CLEAN))
+               return -EUCLEAN;
+       return 0;
 }
 ALLOW_ERROR_INJECTION(btrfs_check_node, ERRNO);
 
@@ -1949,3 +1963,61 @@ int btrfs_check_eb_owner(const struct extent_buffer *eb, u64 root_owner)
        }
        return 0;
 }
+
+int btrfs_verify_level_key(struct extent_buffer *eb, int level,
+                          struct btrfs_key *first_key, u64 parent_transid)
+{
+       struct btrfs_fs_info *fs_info = eb->fs_info;
+       int found_level;
+       struct btrfs_key found_key;
+       int ret;
+
+       found_level = btrfs_header_level(eb);
+       if (found_level != level) {
+               WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG),
+                    KERN_ERR "BTRFS: tree level check failed\n");
+               btrfs_err(fs_info,
+"tree level mismatch detected, bytenr=%llu level expected=%u has=%u",
+                         eb->start, level, found_level);
+               return -EIO;
+       }
+
+       if (!first_key)
+               return 0;
+
+       /*
+        * For live tree blocks (new tree blocks in the current transaction),
+        * we need proper lock context to avoid races, which is impossible here.
+        * So we only check tree blocks which are read from disk, whose
+        * generation <= fs_info->last_trans_committed.
+        */
+       if (btrfs_header_generation(eb) > fs_info->last_trans_committed)
+               return 0;
+
+       /* We have @first_key, so this @eb must have at least one item */
+       if (btrfs_header_nritems(eb) == 0) {
+               btrfs_err(fs_info,
+               "invalid tree nritems, bytenr=%llu nritems=0 expect >0",
+                         eb->start);
+               WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+               return -EUCLEAN;
+       }
+
+       if (found_level)
+               btrfs_node_key_to_cpu(eb, &found_key, 0);
+       else
+               btrfs_item_key_to_cpu(eb, &found_key, 0);
+       ret = btrfs_comp_cpu_keys(first_key, &found_key);
+
+       if (ret) {
+               WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG),
+                    KERN_ERR "BTRFS: tree first key check failed\n");
+               btrfs_err(fs_info,
+"tree first key mismatch detected, bytenr=%llu parent_transid=%llu key expected=(%llu,%u,%llu) has=(%llu,%u,%llu)",
+                         eb->start, parent_transid, first_key->objectid,
+                         first_key->type, first_key->offset,
+                         found_key.objectid, found_key.type,
+                         found_key.offset);
+       }
+       return ret;
+}
index bfb5efa..3c2a02a 100644 (file)
@@ -40,22 +40,33 @@ struct btrfs_tree_parent_check {
        u8 level;
 };
 
-/*
- * Comprehensive leaf checker.
- * Will check not only the item pointers, but also every possible member
- * in item data.
- */
-int btrfs_check_leaf_full(struct extent_buffer *leaf);
+enum btrfs_tree_block_status {
+       BTRFS_TREE_BLOCK_CLEAN,
+       BTRFS_TREE_BLOCK_INVALID_NRITEMS,
+       BTRFS_TREE_BLOCK_INVALID_PARENT_KEY,
+       BTRFS_TREE_BLOCK_BAD_KEY_ORDER,
+       BTRFS_TREE_BLOCK_INVALID_LEVEL,
+       BTRFS_TREE_BLOCK_INVALID_FREE_SPACE,
+       BTRFS_TREE_BLOCK_INVALID_OFFSETS,
+       BTRFS_TREE_BLOCK_INVALID_BLOCKPTR,
+       BTRFS_TREE_BLOCK_INVALID_ITEM,
+       BTRFS_TREE_BLOCK_INVALID_OWNER,
+};
 
 /*
- * Less strict leaf checker.
- * Will only check item pointers, not reading item data.
+ * Exported simply for btrfs-progs which wants to have the
+ * btrfs_tree_block_status return codes.
  */
-int btrfs_check_leaf_relaxed(struct extent_buffer *leaf);
+enum btrfs_tree_block_status __btrfs_check_leaf(struct extent_buffer *leaf);
+enum btrfs_tree_block_status __btrfs_check_node(struct extent_buffer *node);
+
+int btrfs_check_leaf(struct extent_buffer *leaf);
 int btrfs_check_node(struct extent_buffer *node);
 
 int btrfs_check_chunk_valid(struct extent_buffer *leaf,
                            struct btrfs_chunk *chunk, u64 logical);
 int btrfs_check_eb_owner(const struct extent_buffer *eb, u64 root_owner);
+int btrfs_verify_level_key(struct extent_buffer *eb, int level,
+                          struct btrfs_key *first_key, u64 parent_transid);
 
 #endif
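
The header now exposes both the detailed checkers, which return an enum btrfs_tree_block_status for consumers such as btrfs-progs, and the kernel-facing wrappers that collapse any non-clean status into -EUCLEAN. A minimal caller sketch under that assumption; the function name below is hypothetical:

/*
 * Illustrative caller of the split checker API declared above. The enum
 * variant reports which class of problem was found; kernel callers that
 * only need pass/fail use the int wrappers instead. Name is made up.
 */
static int example_validate_leaf(struct extent_buffer *leaf)
{
	enum btrfs_tree_block_status status = __btrfs_check_leaf(leaf);

	if (status == BTRFS_TREE_BLOCK_CLEAN)
		return 0;
	return -EUCLEAN;
}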
index 9b212e8..365a1cc 100644 (file)
@@ -859,10 +859,10 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
                                                struct btrfs_ordered_sum,
                                                list);
                                csum_root = btrfs_csum_root(fs_info,
-                                                           sums->bytenr);
+                                                           sums->logical);
                                if (!ret)
                                        ret = btrfs_del_csums(trans, csum_root,
-                                                             sums->bytenr,
+                                                             sums->logical,
                                                              sums->len);
                                if (!ret)
                                        ret = btrfs_csum_file_blocks(trans,
@@ -3252,7 +3252,7 @@ int btrfs_free_log_root_tree(struct btrfs_trans_handle *trans,
  * Returns 1 if the inode was logged before in the transaction, 0 if it was not,
  * and < 0 on error.
  */
-static int inode_logged(struct btrfs_trans_handle *trans,
+static int inode_logged(const struct btrfs_trans_handle *trans,
                        struct btrfs_inode *inode,
                        struct btrfs_path *path_in)
 {
@@ -4056,14 +4056,14 @@ static int drop_inode_items(struct btrfs_trans_handle *trans,
 
        while (1) {
                ret = btrfs_search_slot(trans, log, &key, path, -1, 1);
-               BUG_ON(ret == 0); /* Logic error */
-               if (ret < 0)
-                       break;
-
-               if (path->slots[0] == 0)
+               if (ret < 0) {
                        break;
+               } else if (ret > 0) {
+                       if (path->slots[0] == 0)
+                               break;
+                       path->slots[0]--;
+               }
 
-               path->slots[0]--;
                btrfs_item_key_to_cpu(path->nodes[0], &found_key,
                                      path->slots[0]);
 
@@ -4221,7 +4221,7 @@ static int log_csums(struct btrfs_trans_handle *trans,
                     struct btrfs_root *log_root,
                     struct btrfs_ordered_sum *sums)
 {
-       const u64 lock_end = sums->bytenr + sums->len - 1;
+       const u64 lock_end = sums->logical + sums->len - 1;
        struct extent_state *cached_state = NULL;
        int ret;
 
@@ -4239,7 +4239,7 @@ static int log_csums(struct btrfs_trans_handle *trans,
         * file which happens to refer to the same extent as well. Such races
         * can leave checksum items in the log with overlapping ranges.
         */
-       ret = lock_extent(&log_root->log_csum_range, sums->bytenr, lock_end,
+       ret = lock_extent(&log_root->log_csum_range, sums->logical, lock_end,
                          &cached_state);
        if (ret)
                return ret;
@@ -4252,11 +4252,11 @@ static int log_csums(struct btrfs_trans_handle *trans,
         * some checksums missing in the fs/subvolume tree. So just delete (or
         * trim and adjust) any existing csum items in the log for this range.
         */
-       ret = btrfs_del_csums(trans, log_root, sums->bytenr, sums->len);
+       ret = btrfs_del_csums(trans, log_root, sums->logical, sums->len);
        if (!ret)
                ret = btrfs_csum_file_blocks(trans, log_root, sums);
 
-       unlock_extent(&log_root->log_csum_range, sums->bytenr, lock_end,
+       unlock_extent(&log_root->log_csum_range, sums->logical, lock_end,
                      &cached_state);
 
        return ret;
@@ -5303,7 +5303,7 @@ out:
  * multiple times when multiple tasks have joined the same log transaction.
  */
 static bool need_log_inode(const struct btrfs_trans_handle *trans,
-                          const struct btrfs_inode *inode)
+                          struct btrfs_inode *inode)
 {
        /*
         * If a directory was not modified, no dentries added or removed, we can
@@ -5321,7 +5321,7 @@ static bool need_log_inode(const struct btrfs_trans_handle *trans,
         * logged_trans will be 0, in which case we have to fully log it since
         * logged_trans is a transient field, not persisted.
         */
-       if (inode->logged_trans == trans->transid &&
+       if (inode_logged(trans, inode, NULL) == 1 &&
            !test_bit(BTRFS_INODE_COPY_EVERYTHING, &inode->runtime_flags))
                return false;
 
@@ -6158,7 +6158,7 @@ static int log_delayed_deletions_incremental(struct btrfs_trans_handle *trans,
 {
        struct btrfs_root *log = inode->root->log_root;
        const struct btrfs_delayed_item *curr;
-       u64 last_range_start;
+       u64 last_range_start = 0;
        u64 last_range_end = 0;
        struct btrfs_key key;
 
@@ -7309,7 +7309,7 @@ error:
  */
 void btrfs_record_unlink_dir(struct btrfs_trans_handle *trans,
                             struct btrfs_inode *dir, struct btrfs_inode *inode,
-                            int for_rename)
+                            bool for_rename)
 {
        /*
         * when we're logging a file, if it hasn't been renamed
@@ -7325,18 +7325,25 @@ void btrfs_record_unlink_dir(struct btrfs_trans_handle *trans,
        inode->last_unlink_trans = trans->transid;
        mutex_unlock(&inode->log_mutex);
 
+       if (!for_rename)
+               return;
+
        /*
-        * if this directory was already logged any new
-        * names for this file/dir will get recorded
+        * If this directory was already logged, any new names will be logged
+        * with btrfs_log_new_name() and old names will be deleted from the log
+        * tree with btrfs_del_dir_entries_in_log() or with
+        * btrfs_del_inode_ref_in_log().
         */
-       if (dir->logged_trans == trans->transid)
+       if (inode_logged(trans, dir, NULL) == 1)
                return;
 
        /*
-        * if the inode we're about to unlink was logged,
-        * the log will be properly updated for any new names
+        * If the inode we're about to unlink was logged before, the log will be
+        * properly updated with the new name with btrfs_log_new_name() and the
+        * old name removed with btrfs_del_dir_entries_in_log() or with
+        * btrfs_del_inode_ref_in_log().
         */
-       if (inode->logged_trans == trans->transid)
+       if (inode_logged(trans, inode, NULL) == 1)
                return;
 
        /*
@@ -7346,13 +7353,6 @@ void btrfs_record_unlink_dir(struct btrfs_trans_handle *trans,
         * properly.  So, we have to be conservative and force commits
         * so the new name gets discovered.
         */
-       if (for_rename)
-               goto record;
-
-       /* we can safely do the unlink without any special recording */
-       return;
-
-record:
        mutex_lock(&dir->log_mutex);
        dir->last_unlink_trans = trans->transid;
        mutex_unlock(&dir->log_mutex);
index bdeb521..a550a8a 100644 (file)
@@ -100,7 +100,7 @@ void btrfs_end_log_trans(struct btrfs_root *root);
 void btrfs_pin_log_trans(struct btrfs_root *root);
 void btrfs_record_unlink_dir(struct btrfs_trans_handle *trans,
                             struct btrfs_inode *dir, struct btrfs_inode *inode,
-                            int for_rename);
+                            bool for_rename);
 void btrfs_record_snapshot_destroy(struct btrfs_trans_handle *trans,
                                   struct btrfs_inode *dir);
 void btrfs_log_new_name(struct btrfs_trans_handle *trans,
index a555baa..3df6153 100644 (file)
@@ -226,21 +226,32 @@ int btrfs_tree_mod_log_insert_key(struct extent_buffer *eb, int slot,
                                  enum btrfs_mod_log_op op)
 {
        struct tree_mod_elem *tm;
-       int ret;
+       int ret = 0;
 
        if (!tree_mod_need_log(eb->fs_info, eb))
                return 0;
 
        tm = alloc_tree_mod_elem(eb, slot, op);
        if (!tm)
-               return -ENOMEM;
+               ret = -ENOMEM;
 
        if (tree_mod_dont_log(eb->fs_info, eb)) {
                kfree(tm);
+               /*
+                * Don't error if we failed to allocate memory because we don't
+                * need to log.
+                */
                return 0;
+       } else if (ret != 0) {
+               /*
+                * We previously failed to allocate memory and we need to log,
+                * so we have to fail.
+                */
+               goto out_unlock;
        }
 
        ret = tree_mod_log_insert(eb->fs_info, tm);
+out_unlock:
        write_unlock(&eb->fs_info->tree_mod_log_lock);
        if (ret)
                kfree(tm);
@@ -248,6 +259,26 @@ int btrfs_tree_mod_log_insert_key(struct extent_buffer *eb, int slot,
        return ret;
 }
 
+static struct tree_mod_elem *tree_mod_log_alloc_move(struct extent_buffer *eb,
+                                                    int dst_slot, int src_slot,
+                                                    int nr_items)
+{
+       struct tree_mod_elem *tm;
+
+       tm = kzalloc(sizeof(*tm), GFP_NOFS);
+       if (!tm)
+               return ERR_PTR(-ENOMEM);
+
+       tm->logical = eb->start;
+       tm->slot = src_slot;
+       tm->move.dst_slot = dst_slot;
+       tm->move.nr_items = nr_items;
+       tm->op = BTRFS_MOD_LOG_MOVE_KEYS;
+       RB_CLEAR_NODE(&tm->node);
+
+       return tm;
+}
+
 int btrfs_tree_mod_log_insert_move(struct extent_buffer *eb,
                                   int dst_slot, int src_slot,
                                   int nr_items)
@@ -262,35 +293,46 @@ int btrfs_tree_mod_log_insert_move(struct extent_buffer *eb,
                return 0;
 
        tm_list = kcalloc(nr_items, sizeof(struct tree_mod_elem *), GFP_NOFS);
-       if (!tm_list)
-               return -ENOMEM;
-
-       tm = kzalloc(sizeof(*tm), GFP_NOFS);
-       if (!tm) {
+       if (!tm_list) {
                ret = -ENOMEM;
-               goto free_tms;
+               goto lock;
        }
 
-       tm->logical = eb->start;
-       tm->slot = src_slot;
-       tm->move.dst_slot = dst_slot;
-       tm->move.nr_items = nr_items;
-       tm->op = BTRFS_MOD_LOG_MOVE_KEYS;
+       tm = tree_mod_log_alloc_move(eb, dst_slot, src_slot, nr_items);
+       if (IS_ERR(tm)) {
+               ret = PTR_ERR(tm);
+               tm = NULL;
+               goto lock;
+       }
 
        for (i = 0; i + dst_slot < src_slot && i < nr_items; i++) {
                tm_list[i] = alloc_tree_mod_elem(eb, i + dst_slot,
                                BTRFS_MOD_LOG_KEY_REMOVE_WHILE_MOVING);
                if (!tm_list[i]) {
                        ret = -ENOMEM;
-                       goto free_tms;
+                       goto lock;
                }
        }
 
-       if (tree_mod_dont_log(eb->fs_info, eb))
+lock:
+       if (tree_mod_dont_log(eb->fs_info, eb)) {
+               /*
+                * Don't error if we failed to allocate memory because we don't
+                * need to log.
+                */
+               ret = 0;
                goto free_tms;
+       }
        locked = true;
 
        /*
+        * We previously failed to allocate memory and we need to log, so we
+        * have to fail.
+        */
+       if (ret != 0)
+               goto free_tms;
+
+       /*
         * When we override something during the move, we log these removals.
         * This can only happen when we move towards the beginning of the
         * buffer, i.e. dst_slot < src_slot.
@@ -310,10 +352,12 @@ int btrfs_tree_mod_log_insert_move(struct extent_buffer *eb,
        return 0;
 
 free_tms:
-       for (i = 0; i < nr_items; i++) {
-               if (tm_list[i] && !RB_EMPTY_NODE(&tm_list[i]->node))
-                       rb_erase(&tm_list[i]->node, &eb->fs_info->tree_mod_log);
-               kfree(tm_list[i]);
+       if (tm_list) {
+               for (i = 0; i < nr_items; i++) {
+                       if (tm_list[i] && !RB_EMPTY_NODE(&tm_list[i]->node))
+                               rb_erase(&tm_list[i]->node, &eb->fs_info->tree_mod_log);
+                       kfree(tm_list[i]);
+               }
        }
        if (locked)
                write_unlock(&eb->fs_info->tree_mod_log_lock);
@@ -363,14 +407,14 @@ int btrfs_tree_mod_log_insert_root(struct extent_buffer *old_root,
                                  GFP_NOFS);
                if (!tm_list) {
                        ret = -ENOMEM;
-                       goto free_tms;
+                       goto lock;
                }
                for (i = 0; i < nritems; i++) {
                        tm_list[i] = alloc_tree_mod_elem(old_root, i,
                            BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING);
                        if (!tm_list[i]) {
                                ret = -ENOMEM;
-                               goto free_tms;
+                               goto lock;
                        }
                }
        }
@@ -378,7 +422,7 @@ int btrfs_tree_mod_log_insert_root(struct extent_buffer *old_root,
        tm = kzalloc(sizeof(*tm), GFP_NOFS);
        if (!tm) {
                ret = -ENOMEM;
-               goto free_tms;
+               goto lock;
        }
 
        tm->logical = new_root->start;
@@ -387,14 +431,28 @@ int btrfs_tree_mod_log_insert_root(struct extent_buffer *old_root,
        tm->generation = btrfs_header_generation(old_root);
        tm->op = BTRFS_MOD_LOG_ROOT_REPLACE;
 
-       if (tree_mod_dont_log(fs_info, NULL))
+lock:
+       if (tree_mod_dont_log(fs_info, NULL)) {
+               /*
+                * Don't error if we failed to allocate memory because we don't
+                * need to log.
+                */
+               ret = 0;
                goto free_tms;
+       } else if (ret != 0) {
+               /*
+                * We previously failed to allocate memory and we need to log,
+                * so we have to fail.
+                */
+               goto out_unlock;
+       }
 
        if (tm_list)
                ret = tree_mod_log_free_eb(fs_info, tm_list, nritems);
        if (!ret)
                ret = tree_mod_log_insert(fs_info, tm);
 
+out_unlock:
        write_unlock(&fs_info->tree_mod_log_lock);
        if (ret)
                goto free_tms;
@@ -486,9 +544,14 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
        struct btrfs_fs_info *fs_info = dst->fs_info;
        int ret = 0;
        struct tree_mod_elem **tm_list = NULL;
-       struct tree_mod_elem **tm_list_add, **tm_list_rem;
+       struct tree_mod_elem **tm_list_add = NULL;
+       struct tree_mod_elem **tm_list_rem = NULL;
        int i;
        bool locked = false;
+       struct tree_mod_elem *dst_move_tm = NULL;
+       struct tree_mod_elem *src_move_tm = NULL;
+       u32 dst_move_nr_items = btrfs_header_nritems(dst) - dst_offset;
+       u32 src_move_nr_items = btrfs_header_nritems(src) - (src_offset + nr_items);
 
        if (!tree_mod_need_log(fs_info, NULL))
                return 0;
@@ -498,8 +561,30 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
 
        tm_list = kcalloc(nr_items * 2, sizeof(struct tree_mod_elem *),
                          GFP_NOFS);
-       if (!tm_list)
-               return -ENOMEM;
+       if (!tm_list) {
+               ret = -ENOMEM;
+               goto lock;
+       }
+
+       if (dst_move_nr_items) {
+               dst_move_tm = tree_mod_log_alloc_move(dst, dst_offset + nr_items,
+                                                     dst_offset, dst_move_nr_items);
+               if (IS_ERR(dst_move_tm)) {
+                       ret = PTR_ERR(dst_move_tm);
+                       dst_move_tm = NULL;
+                       goto lock;
+               }
+       }
+       if (src_move_nr_items) {
+               src_move_tm = tree_mod_log_alloc_move(src, src_offset,
+                                                     src_offset + nr_items,
+                                                     src_move_nr_items);
+               if (IS_ERR(src_move_tm)) {
+                       ret = PTR_ERR(src_move_tm);
+                       src_move_tm = NULL;
+                       goto lock;
+               }
+       }
 
        tm_list_add = tm_list;
        tm_list_rem = tm_list + nr_items;
@@ -508,21 +593,40 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
                                                     BTRFS_MOD_LOG_KEY_REMOVE);
                if (!tm_list_rem[i]) {
                        ret = -ENOMEM;
-                       goto free_tms;
+                       goto lock;
                }
 
                tm_list_add[i] = alloc_tree_mod_elem(dst, i + dst_offset,
                                                     BTRFS_MOD_LOG_KEY_ADD);
                if (!tm_list_add[i]) {
                        ret = -ENOMEM;
-                       goto free_tms;
+                       goto lock;
                }
        }
 
-       if (tree_mod_dont_log(fs_info, NULL))
+lock:
+       if (tree_mod_dont_log(fs_info, NULL)) {
+               /*
+                * Don't error if we failed to allocate memory because we don't
+                * need to log.
+                */
+               ret = 0;
                goto free_tms;
+       }
        locked = true;
 
+       /*
+        * We previously failed to allocate memory and we need to log, so we
+        * have to fail.
+        */
+       if (ret != 0)
+               goto free_tms;
+
+       if (dst_move_tm) {
+               ret = tree_mod_log_insert(fs_info, dst_move_tm);
+               if (ret)
+                       goto free_tms;
+       }
        for (i = 0; i < nr_items; i++) {
                ret = tree_mod_log_insert(fs_info, tm_list_rem[i]);
                if (ret)
@@ -531,6 +635,11 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
                if (ret)
                        goto free_tms;
        }
+       if (src_move_tm) {
+               ret = tree_mod_log_insert(fs_info, src_move_tm);
+               if (ret)
+                       goto free_tms;
+       }
 
        write_unlock(&fs_info->tree_mod_log_lock);
        kfree(tm_list);
@@ -538,10 +647,18 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
        return 0;
 
 free_tms:
-       for (i = 0; i < nr_items * 2; i++) {
-               if (tm_list[i] && !RB_EMPTY_NODE(&tm_list[i]->node))
-                       rb_erase(&tm_list[i]->node, &fs_info->tree_mod_log);
-               kfree(tm_list[i]);
+       if (dst_move_tm && !RB_EMPTY_NODE(&dst_move_tm->node))
+               rb_erase(&dst_move_tm->node, &fs_info->tree_mod_log);
+       kfree(dst_move_tm);
+       if (src_move_tm && !RB_EMPTY_NODE(&src_move_tm->node))
+               rb_erase(&src_move_tm->node, &fs_info->tree_mod_log);
+       kfree(src_move_tm);
+       if (tm_list) {
+               for (i = 0; i < nr_items * 2; i++) {
+                       if (tm_list[i] && !RB_EMPTY_NODE(&tm_list[i]->node))
+                               rb_erase(&tm_list[i]->node, &fs_info->tree_mod_log);
+                       kfree(tm_list[i]);
+               }
        }
        if (locked)
                write_unlock(&fs_info->tree_mod_log_lock);
@@ -562,22 +679,38 @@ int btrfs_tree_mod_log_free_eb(struct extent_buffer *eb)
 
        nritems = btrfs_header_nritems(eb);
        tm_list = kcalloc(nritems, sizeof(struct tree_mod_elem *), GFP_NOFS);
-       if (!tm_list)
-               return -ENOMEM;
+       if (!tm_list) {
+               ret = -ENOMEM;
+               goto lock;
+       }
 
        for (i = 0; i < nritems; i++) {
                tm_list[i] = alloc_tree_mod_elem(eb, i,
                                    BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING);
                if (!tm_list[i]) {
                        ret = -ENOMEM;
-                       goto free_tms;
+                       goto lock;
                }
        }
 
-       if (tree_mod_dont_log(eb->fs_info, eb))
+lock:
+       if (tree_mod_dont_log(eb->fs_info, eb)) {
+               /*
+                * Don't error if we failed to allocate memory because we don't
+                * need to log.
+                */
+               ret = 0;
                goto free_tms;
+       } else if (ret != 0) {
+               /*
+                * We previously failed to allocate memory and we need to log,
+                * so we have to fail.
+                */
+               goto out_unlock;
+       }
 
        ret = tree_mod_log_free_eb(eb->fs_info, tm_list, nritems);
+out_unlock:
        write_unlock(&eb->fs_info->tree_mod_log_lock);
        if (ret)
                goto free_tms;
@@ -586,9 +719,11 @@ int btrfs_tree_mod_log_free_eb(struct extent_buffer *eb)
        return 0;
 
 free_tms:
-       for (i = 0; i < nritems; i++)
-               kfree(tm_list[i]);
-       kfree(tm_list);
+       if (tm_list) {
+               for (i = 0; i < nritems; i++)
+                       kfree(tm_list[i]);
+               kfree(tm_list);
+       }
 
        return ret;
 }
@@ -664,10 +799,27 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
        unsigned long o_dst;
        unsigned long o_src;
        unsigned long p_size = sizeof(struct btrfs_key_ptr);
+       /*
+        * max_slot tracks the maximum valid slot of the rewind eb at every
+        * step of the rewind. This is in contrast with 'n' which eventually
+        * matches the number of items, but can be wrong during moves or if
+        * removes overlap on already valid slots (which is probably separately
+        * a bug). We do this to validate the offsets of memmoves for rewinding
+        * moves and detect invalid memmoves.
+        *
+        * Since a rewind eb can start empty, max_slot is a signed integer with
+        * a special meaning for -1, which is that no slot is valid to move out
+        * of. Any other negative value is invalid.
+        */
+       int max_slot;
+       int move_src_end_slot;
+       int move_dst_end_slot;
 
        n = btrfs_header_nritems(eb);
+       max_slot = n - 1;
        read_lock(&fs_info->tree_mod_log_lock);
        while (tm && tm->seq >= time_seq) {
+               ASSERT(max_slot >= -1);
                /*
                 * All the operations are recorded with the operator used for
                 * the modification. As we're going backwards, we do the
@@ -684,6 +836,8 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
                        btrfs_set_node_ptr_generation(eb, tm->slot,
                                                      tm->generation);
                        n++;
+                       if (tm->slot > max_slot)
+                               max_slot = tm->slot;
                        break;
                case BTRFS_MOD_LOG_KEY_REPLACE:
                        BUG_ON(tm->slot >= n);
@@ -693,14 +847,37 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
                                                      tm->generation);
                        break;
                case BTRFS_MOD_LOG_KEY_ADD:
+                       /*
+                        * It is possible we could have already removed keys
+                        * behind the known max slot, so this will be an
+                        * overestimate. In practice, the copy operation
+                        * inserts them in increasing order, and overestimating
+                        * just means we miss some warnings, so it's OK. It
+                        * isn't worth carefully tracking the full array of
+                        * valid slots to check against when moving.
+                        */
+                       if (tm->slot == max_slot)
+                               max_slot--;
                        /* if a move operation is needed it's in the log */
                        n--;
                        break;
                case BTRFS_MOD_LOG_MOVE_KEYS:
+                       ASSERT(tm->move.nr_items > 0);
+                       move_src_end_slot = tm->move.dst_slot + tm->move.nr_items - 1;
+                       move_dst_end_slot = tm->slot + tm->move.nr_items - 1;
                        o_dst = btrfs_node_key_ptr_offset(eb, tm->slot);
                        o_src = btrfs_node_key_ptr_offset(eb, tm->move.dst_slot);
+                       if (WARN_ON(move_src_end_slot > max_slot ||
+                                   tm->move.nr_items <= 0)) {
+                               btrfs_warn(fs_info,
+"move from invalid tree mod log slot eb %llu slot %d dst_slot %d nr_items %d seq %llu n %u max_slot %d",
+                                          eb->start, tm->slot,
+                                          tm->move.dst_slot, tm->move.nr_items,
+                                          tm->seq, n, max_slot);
+                       }
                        memmove_extent_buffer(eb, o_dst, o_src,
                                              tm->move.nr_items * p_size);
+                       max_slot = move_dst_end_slot;
                        break;
                case BTRFS_MOD_LOG_ROOT_REPLACE:
                        /*
index 03f52e4..73f9ea7 100644 (file)
@@ -370,6 +370,8 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid,
 {
        struct btrfs_fs_devices *fs_devs;
 
+       ASSERT(fsid || !metadata_fsid);
+
        fs_devs = kzalloc(sizeof(*fs_devs), GFP_KERNEL);
        if (!fs_devs)
                return ERR_PTR(-ENOMEM);
@@ -380,21 +382,21 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid,
        INIT_LIST_HEAD(&fs_devs->alloc_list);
        INIT_LIST_HEAD(&fs_devs->fs_list);
        INIT_LIST_HEAD(&fs_devs->seed_list);
-       if (fsid)
-               memcpy(fs_devs->fsid, fsid, BTRFS_FSID_SIZE);
 
-       if (metadata_fsid)
-               memcpy(fs_devs->metadata_uuid, metadata_fsid, BTRFS_FSID_SIZE);
-       else if (fsid)
-               memcpy(fs_devs->metadata_uuid, fsid, BTRFS_FSID_SIZE);
+       if (fsid) {
+               memcpy(fs_devs->fsid, fsid, BTRFS_FSID_SIZE);
+               memcpy(fs_devs->metadata_uuid,
+                      metadata_fsid ?: fsid, BTRFS_FSID_SIZE);
+       }
 
        return fs_devs;
 }
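
The collapsed branch above uses the GNU C conditional with an omitted middle operand, "metadata_fsid ?: fsid", which evaluates to the first operand when it is non-NULL and to the second otherwise, evaluating the first operand only once. A minimal standalone illustration (not kernel code), compiled as GNU C by gcc or clang:

#include <stdio.h>

int main(void)
{
	const char *metadata_fsid = NULL;
	const char *fsid = "example-fsid";

	/* GNU extension: a ?: b is equivalent to a ? a : b, with a evaluated once. */
	const char *chosen = metadata_fsid ?: fsid;

	printf("%s\n", chosen);		/* prints "example-fsid" */
	return 0;
}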
 
-void btrfs_free_device(struct btrfs_device *device)
+static void btrfs_free_device(struct btrfs_device *device)
 {
        WARN_ON(!list_empty(&device->post_commit_list));
        rcu_string_free(device->name);
+       extent_io_tree_release(&device->alloc_state);
        btrfs_destroy_dev_zone_info(device);
        kfree(device);
 }
@@ -425,6 +427,21 @@ void __exit btrfs_cleanup_fs_uuids(void)
        }
 }
 
+static bool match_fsid_fs_devices(const struct btrfs_fs_devices *fs_devices,
+                                 const u8 *fsid, const u8 *metadata_fsid)
+{
+       if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) != 0)
+               return false;
+
+       if (!metadata_fsid)
+               return true;
+
+       if (memcmp(metadata_fsid, fs_devices->metadata_uuid, BTRFS_FSID_SIZE) != 0)
+               return false;
+
+       return true;
+}
+
 static noinline struct btrfs_fs_devices *find_fsid(
                const u8 *fsid, const u8 *metadata_fsid)
 {
@@ -434,19 +451,25 @@ static noinline struct btrfs_fs_devices *find_fsid(
 
        /* Handle non-split brain cases */
        list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
-               if (metadata_fsid) {
-                       if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0
-                           && memcmp(metadata_fsid, fs_devices->metadata_uuid,
-                                     BTRFS_FSID_SIZE) == 0)
-                               return fs_devices;
-               } else {
-                       if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0)
-                               return fs_devices;
-               }
+               if (match_fsid_fs_devices(fs_devices, fsid, metadata_fsid))
+                       return fs_devices;
        }
        return NULL;
 }
 
+/*
+ * First check if the metadata_uuid is different from the fsid in the given
+ * fs_devices. Then check if the given fsid is the same as the metadata_uuid
+ * in the fs_devices. If it is, return true; otherwise, return false.
+ */
+static inline bool check_fsid_changed(const struct btrfs_fs_devices *fs_devices,
+                                     const u8 *fsid)
+{
+       return memcmp(fs_devices->fsid, fs_devices->metadata_uuid,
+                     BTRFS_FSID_SIZE) != 0 &&
+              memcmp(fs_devices->metadata_uuid, fsid, BTRFS_FSID_SIZE) == 0;
+}
+
 static struct btrfs_fs_devices *find_fsid_with_metadata_uuid(
                                struct btrfs_super_block *disk_super)
 {
@@ -460,14 +483,14 @@ static struct btrfs_fs_devices *find_fsid_with_metadata_uuid(
         * at all and the CHANGING_FSID_V2 flag set.
         */
        list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
-               if (fs_devices->fsid_change &&
-                   memcmp(disk_super->metadata_uuid, fs_devices->fsid,
-                          BTRFS_FSID_SIZE) == 0 &&
-                   memcmp(fs_devices->fsid, fs_devices->metadata_uuid,
-                          BTRFS_FSID_SIZE) == 0) {
+               if (!fs_devices->fsid_change)
+                       continue;
+
+               if (match_fsid_fs_devices(fs_devices, disk_super->metadata_uuid,
+                                         fs_devices->fsid))
                        return fs_devices;
-               }
        }
+
        /*
         * Handle scanned device having completed its fsid change but
         * belonging to a fs_devices that was created by a device that
@@ -475,13 +498,11 @@ static struct btrfs_fs_devices *find_fsid_with_metadata_uuid(
         * CHANGING_FSID_V2 flag set.
         */
        list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
-               if (fs_devices->fsid_change &&
-                   memcmp(fs_devices->metadata_uuid,
-                          fs_devices->fsid, BTRFS_FSID_SIZE) != 0 &&
-                   memcmp(disk_super->metadata_uuid, fs_devices->metadata_uuid,
-                          BTRFS_FSID_SIZE) == 0) {
+               if (!fs_devices->fsid_change)
+                       continue;
+
+               if (check_fsid_changed(fs_devices, disk_super->metadata_uuid))
                        return fs_devices;
-               }
        }
 
        return find_fsid(disk_super->fsid, disk_super->metadata_uuid);
@@ -489,13 +510,13 @@ static struct btrfs_fs_devices *find_fsid_with_metadata_uuid(
 
 
 static int
-btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
+btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
                      int flush, struct block_device **bdev,
                      struct btrfs_super_block **disk_super)
 {
        int ret;
 
-       *bdev = blkdev_get_by_path(device_path, flags, holder);
+       *bdev = blkdev_get_by_path(device_path, flags, holder, NULL);
 
        if (IS_ERR(*bdev)) {
                ret = PTR_ERR(*bdev);
@@ -506,14 +527,14 @@ btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
                sync_blockdev(*bdev);
        ret = set_blocksize(*bdev, BTRFS_BDEV_BLOCKSIZE);
        if (ret) {
-               blkdev_put(*bdev, flags);
+               blkdev_put(*bdev, holder);
                goto error;
        }
        invalidate_bdev(*bdev);
        *disk_super = btrfs_read_dev_super(*bdev);
        if (IS_ERR(*disk_super)) {
                ret = PTR_ERR(*disk_super);
-               blkdev_put(*bdev, flags);
+               blkdev_put(*bdev, holder);
                goto error;
        }
 
@@ -589,7 +610,7 @@ static int btrfs_free_stale_devices(dev_t devt, struct btrfs_device *skip_device
  * fs_devices->device_list_mutex here.
  */
 static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
-                       struct btrfs_device *device, fmode_t flags,
+                       struct btrfs_device *device, blk_mode_t flags,
                        void *holder)
 {
        struct block_device *bdev;
@@ -641,7 +662,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 
        device->bdev = bdev;
        clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
-       device->mode = flags;
+       device->holder = holder;
 
        fs_devices->open_devices++;
        if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) &&
@@ -655,7 +676,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 
 error_free_page:
        btrfs_release_disk_super(disk_super);
-       blkdev_put(bdev, flags);
+       blkdev_put(bdev, holder);
 
        return -EINVAL;
 }
@@ -672,18 +693,16 @@ static struct btrfs_fs_devices *find_fsid_inprogress(
        struct btrfs_fs_devices *fs_devices;
 
        list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
-               if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid,
-                          BTRFS_FSID_SIZE) != 0 &&
-                   memcmp(fs_devices->metadata_uuid, disk_super->fsid,
-                          BTRFS_FSID_SIZE) == 0 && !fs_devices->fsid_change) {
+               if (fs_devices->fsid_change)
+                       continue;
+
+               if (check_fsid_changed(fs_devices,  disk_super->fsid))
                        return fs_devices;
-               }
        }
 
        return find_fsid(disk_super->fsid, NULL);
 }
 
-
 static struct btrfs_fs_devices *find_fsid_changed(
                                        struct btrfs_super_block *disk_super)
 {
@@ -700,10 +719,7 @@ static struct btrfs_fs_devices *find_fsid_changed(
         */
        list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
                /* Changed UUIDs */
-               if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid,
-                          BTRFS_FSID_SIZE) != 0 &&
-                   memcmp(fs_devices->metadata_uuid, disk_super->metadata_uuid,
-                          BTRFS_FSID_SIZE) == 0 &&
+               if (check_fsid_changed(fs_devices, disk_super->metadata_uuid) &&
                    memcmp(fs_devices->fsid, disk_super->fsid,
                           BTRFS_FSID_SIZE) != 0)
                        return fs_devices;
@@ -734,11 +750,10 @@ static struct btrfs_fs_devices *find_fsid_reverted_metadata(
         * fs_devices equal to the FSID of the disk.
         */
        list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
-               if (memcmp(fs_devices->fsid, fs_devices->metadata_uuid,
-                          BTRFS_FSID_SIZE) != 0 &&
-                   memcmp(fs_devices->metadata_uuid, disk_super->fsid,
-                          BTRFS_FSID_SIZE) == 0 &&
-                   fs_devices->fsid_change)
+               if (!fs_devices->fsid_change)
+                       continue;
+
+               if (check_fsid_changed(fs_devices, disk_super->fsid))
                        return fs_devices;
        }
 
@@ -789,12 +804,8 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 
 
        if (!fs_devices) {
-               if (has_metadata_uuid)
-                       fs_devices = alloc_fs_devices(disk_super->fsid,
-                                                     disk_super->metadata_uuid);
-               else
-                       fs_devices = alloc_fs_devices(disk_super->fsid, NULL);
-
+               fs_devices = alloc_fs_devices(disk_super->fsid,
+                               has_metadata_uuid ? disk_super->metadata_uuid : NULL);
                if (IS_ERR(fs_devices))
                        return ERR_CAST(fs_devices);
 
@@ -1056,7 +1067,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
                        continue;
 
                if (device->bdev) {
-                       blkdev_put(device->bdev, device->mode);
+                       blkdev_put(device->bdev, device->holder);
                        device->bdev = NULL;
                        fs_devices->open_devices--;
                }
@@ -1102,7 +1113,7 @@ static void btrfs_close_bdev(struct btrfs_device *device)
                invalidate_bdev(device->bdev);
        }
 
-       blkdev_put(device->bdev, device->mode);
+       blkdev_put(device->bdev, device->holder);
 }
 
 static void btrfs_close_one_device(struct btrfs_device *device)
@@ -1206,14 +1217,12 @@ void btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
 }
 
 static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
-                               fmode_t flags, void *holder)
+                               blk_mode_t flags, void *holder)
 {
        struct btrfs_device *device;
        struct btrfs_device *latest_dev = NULL;
        struct btrfs_device *tmp_device;
 
-       flags |= FMODE_EXCL;
-
        list_for_each_entry_safe(device, tmp_device, &fs_devices->devices,
                                 dev_list) {
                int ret;
@@ -1256,7 +1265,7 @@ static int devid_cmp(void *priv, const struct list_head *a,
 }
 
 int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
-                      fmode_t flags, void *holder)
+                      blk_mode_t flags, void *holder)
 {
        int ret;
 
@@ -1347,8 +1356,7 @@ int btrfs_forget_devices(dev_t devt)
  * and we are not allowed to call set_blocksize during the scan. The superblock
  * is read via pagecache
  */
-struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
-                                          void *holder)
+struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags)
 {
        struct btrfs_super_block *disk_super;
        bool new_device_added = false;
@@ -1367,16 +1375,16 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
         */
 
        /*
-        * Avoid using flag |= FMODE_EXCL here, as the systemd-udev may
-        * initiate the device scan which may race with the user's mount
-        * or mkfs command, resulting in failure.
-        * Since the device scan is solely for reading purposes, there is
-        * no need for FMODE_EXCL. Additionally, the devices are read again
+        * Avoid an exclusive open here, as the systemd-udev may initiate the
+        * device scan which may race with the user's mount or mkfs command,
+        * resulting in failure.
+        * Since the device scan is solely for reading purposes, there is no
+        * need for an exclusive open. Additionally, the devices are read again
         * during the mount process. It is ok to get some inconsistent
         * values temporarily, as the device paths of the fsid are the only
         * required information for assembling the volume.
         */
-       bdev = blkdev_get_by_path(path, flags, holder);
+       bdev = blkdev_get_by_path(path, flags, NULL, NULL);
        if (IS_ERR(bdev))
                return ERR_CAST(bdev);
 
@@ -1400,7 +1408,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
        btrfs_release_disk_super(disk_super);
 
 error_bdev_put:
-       blkdev_put(bdev, flags);
+       blkdev_put(bdev, NULL);
 
        return device;
 }
@@ -1917,7 +1925,7 @@ static void update_dev_time(const char *device_path)
                return;
 
        now = current_time(d_inode(path.dentry));
-       inode_update_time(d_inode(path.dentry), &now, S_MTIME | S_CTIME);
+       inode_update_time(d_inode(path.dentry), &now, S_MTIME | S_CTIME | S_VERSION);
        path_put(&path);
 }
 
@@ -2087,7 +2095,7 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
 
 int btrfs_rm_device(struct btrfs_fs_info *fs_info,
                    struct btrfs_dev_lookup_args *args,
-                   struct block_device **bdev, fmode_t *mode)
+                   struct block_device **bdev, void **holder)
 {
        struct btrfs_trans_handle *trans;
        struct btrfs_device *device;
@@ -2226,7 +2234,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info,
        }
 
        *bdev = device->bdev;
-       *mode = device->mode;
+       *holder = device->holder;
        synchronize_rcu();
        btrfs_free_device(device);
 
@@ -2380,7 +2388,7 @@ int btrfs_get_dev_args_from_path(struct btrfs_fs_info *fs_info,
                return -ENOMEM;
        }
 
-       ret = btrfs_get_bdev_and_sb(path, FMODE_READ, fs_info->bdev_holder, 0,
+       ret = btrfs_get_bdev_and_sb(path, BLK_OPEN_READ, NULL, 0,
                                    &bdev, &disk_super);
        if (ret) {
                btrfs_put_dev_args_from_path(args);
@@ -2394,7 +2402,7 @@ int btrfs_get_dev_args_from_path(struct btrfs_fs_info *fs_info,
        else
                memcpy(args->fsid, disk_super->fsid, BTRFS_FSID_SIZE);
        btrfs_release_disk_super(disk_super);
-       blkdev_put(bdev, FMODE_READ);
+       blkdev_put(bdev, NULL);
        return 0;
 }
 
@@ -2627,8 +2635,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
        if (sb_rdonly(sb) && !fs_devices->seeding)
                return -EROFS;
 
-       bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
-                                 fs_info->bdev_holder);
+       bdev = blkdev_get_by_path(device_path, BLK_OPEN_WRITE,
+                                 fs_info->bdev_holder, NULL);
        if (IS_ERR(bdev))
                return PTR_ERR(bdev);
 
@@ -2690,7 +2698,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
        device->commit_total_bytes = device->total_bytes;
        set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
        clear_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
-       device->mode = FMODE_EXCL;
+       device->holder = fs_info->bdev_holder;
        device->dev_stats_valid = 1;
        set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
 
@@ -2848,7 +2856,7 @@ error_free_zone:
 error_free_device:
        btrfs_free_device(device);
 error:
-       blkdev_put(bdev, FMODE_EXCL);
+       blkdev_put(bdev, fs_info->bdev_holder);
        if (locked) {
                mutex_unlock(&uuid_mutex);
                up_write(&sb->s_umount);
@@ -5124,7 +5132,7 @@ static void init_alloc_chunk_ctl_policy_regular(
        /* We don't want a chunk larger than 10% of writable space */
        ctl->max_chunk_size = min(mult_perc(fs_devices->total_rw_bytes, 10),
                                  ctl->max_chunk_size);
-       ctl->dev_extent_min = ctl->dev_stripes << BTRFS_STRIPE_LEN_SHIFT;
+       ctl->dev_extent_min = btrfs_stripe_nr_to_offset(ctl->dev_stripes);
 }
 
 static void init_alloc_chunk_ctl_policy_zoned(
@@ -5800,7 +5808,7 @@ unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info,
        if (!WARN_ON(IS_ERR(em))) {
                map = em->map_lookup;
                if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK)
-                       len = nr_data_stripes(map) << BTRFS_STRIPE_LEN_SHIFT;
+                       len = btrfs_stripe_nr_to_offset(nr_data_stripes(map));
                free_extent_map(em);
        }
        return len;
@@ -5974,12 +5982,12 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info,
        stripe_nr = offset >> BTRFS_STRIPE_LEN_SHIFT;
 
        /* stripe_offset is the offset of this block in its stripe */
-       stripe_offset = offset - (stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
+       stripe_offset = offset - btrfs_stripe_nr_to_offset(stripe_nr);
 
        stripe_nr_end = round_up(offset + length, BTRFS_STRIPE_LEN) >>
                        BTRFS_STRIPE_LEN_SHIFT;
        stripe_cnt = stripe_nr_end - stripe_nr;
-       stripe_end_offset = (stripe_nr_end << BTRFS_STRIPE_LEN_SHIFT) -
+       stripe_end_offset = btrfs_stripe_nr_to_offset(stripe_nr_end) -
                            (offset + length);
        /*
         * after this, stripe_nr is the number of stripes on this
@@ -6022,12 +6030,12 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info,
        for (i = 0; i < *num_stripes; i++) {
                stripes[i].physical =
                        map->stripes[stripe_index].physical +
-                       stripe_offset + (stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
+                       stripe_offset + btrfs_stripe_nr_to_offset(stripe_nr);
                stripes[i].dev = map->stripes[stripe_index].dev;
 
                if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
                                 BTRFS_BLOCK_GROUP_RAID10)) {
-                       stripes[i].length = stripes_per_dev << BTRFS_STRIPE_LEN_SHIFT;
+                       stripes[i].length = btrfs_stripe_nr_to_offset(stripes_per_dev);
 
                        if (i / sub_stripes < remaining_stripes)
                                stripes[i].length += BTRFS_STRIPE_LEN;
@@ -6162,17 +6170,10 @@ static void handle_ops_on_dev_replace(enum btrfs_map_op op,
        bioc->replace_nr_stripes = nr_extra_stripes;
 }
 
-static bool need_full_stripe(enum btrfs_map_op op)
-{
-       return (op == BTRFS_MAP_WRITE || op == BTRFS_MAP_GET_READ_MIRRORS);
-}
-
 static u64 btrfs_max_io_len(struct map_lookup *map, enum btrfs_map_op op,
                            u64 offset, u32 *stripe_nr, u64 *stripe_offset,
                            u64 *full_stripe_start)
 {
-       ASSERT(op != BTRFS_MAP_DISCARD);
-
        /*
         * Stripe_nr is the stripe where this block falls.  stripe_offset is
         * the offset of this block in its stripe.
@@ -6182,8 +6183,8 @@ static u64 btrfs_max_io_len(struct map_lookup *map, enum btrfs_map_op op,
        ASSERT(*stripe_offset < U32_MAX);
 
        if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
-               unsigned long full_stripe_len = nr_data_stripes(map) <<
-                                               BTRFS_STRIPE_LEN_SHIFT;
+               unsigned long full_stripe_len =
+                       btrfs_stripe_nr_to_offset(nr_data_stripes(map));
 
                /*
                 * For full stripe start, we use previously calculated
@@ -6195,9 +6196,11 @@ static u64 btrfs_max_io_len(struct map_lookup *map, enum btrfs_map_op op,
                 * not ensured to be power of 2.
                 */
                *full_stripe_start =
-                       rounddown(*stripe_nr, nr_data_stripes(map)) <<
-                       BTRFS_STRIPE_LEN_SHIFT;
+                       btrfs_stripe_nr_to_offset(
+                               rounddown(*stripe_nr, nr_data_stripes(map)));
 
+               ASSERT(*full_stripe_start + full_stripe_len > offset);
+               ASSERT(*full_stripe_start <= offset);
                /*
                 * For writes to RAID56, allow to write a full stripe set, but
                 * no straddling of stripe sets.
@@ -6220,14 +6223,14 @@ static void set_io_stripe(struct btrfs_io_stripe *dst, const struct map_lookup *
 {
        dst->dev = map->stripes[stripe_index].dev;
        dst->physical = map->stripes[stripe_index].physical +
-                       stripe_offset + (stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
+                       stripe_offset + btrfs_stripe_nr_to_offset(stripe_nr);
 }
 
-int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
-                     u64 logical, u64 *length,
-                     struct btrfs_io_context **bioc_ret,
-                     struct btrfs_io_stripe *smap, int *mirror_num_ret,
-                     int need_raid_map)
+int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
+                   u64 logical, u64 *length,
+                   struct btrfs_io_context **bioc_ret,
+                   struct btrfs_io_stripe *smap, int *mirror_num_ret,
+                   int need_raid_map)
 {
        struct extent_map *em;
        struct map_lookup *map;
@@ -6250,7 +6253,6 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
        u64 max_len;
 
        ASSERT(bioc_ret);
-       ASSERT(op != BTRFS_MAP_DISCARD);
 
        num_copies = btrfs_num_copies(fs_info, logical, fs_info->sectorsize);
        if (mirror_num > num_copies)
@@ -6282,21 +6284,21 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
        if (map->type & BTRFS_BLOCK_GROUP_RAID0) {
                stripe_index = stripe_nr % map->num_stripes;
                stripe_nr /= map->num_stripes;
-               if (!need_full_stripe(op))
+               if (op == BTRFS_MAP_READ)
                        mirror_num = 1;
        } else if (map->type & BTRFS_BLOCK_GROUP_RAID1_MASK) {
-               if (need_full_stripe(op))
+               if (op != BTRFS_MAP_READ) {
                        num_stripes = map->num_stripes;
-               else if (mirror_num)
+               } else if (mirror_num) {
                        stripe_index = mirror_num - 1;
-               else {
+               } else {
                        stripe_index = find_live_mirror(fs_info, map, 0,
                                            dev_replace_is_ongoing);
                        mirror_num = stripe_index + 1;
                }
 
        } else if (map->type & BTRFS_BLOCK_GROUP_DUP) {
-               if (need_full_stripe(op)) {
+               if (op != BTRFS_MAP_READ) {
                        num_stripes = map->num_stripes;
                } else if (mirror_num) {
                        stripe_index = mirror_num - 1;
@@ -6310,7 +6312,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
                stripe_index = (stripe_nr % factor) * map->sub_stripes;
                stripe_nr /= factor;
 
-               if (need_full_stripe(op))
+               if (op != BTRFS_MAP_READ)
                        num_stripes = map->sub_stripes;
                else if (mirror_num)
                        stripe_index += mirror_num - 1;
@@ -6323,7 +6325,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
                }
 
        } else if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
-               if (need_raid_map && (need_full_stripe(op) || mirror_num > 1)) {
+               if (need_raid_map && (op != BTRFS_MAP_READ || mirror_num > 1)) {
                        /*
                         * Push stripe_nr back to the start of the full stripe
                         * For those cases needing a full stripe, @stripe_nr
@@ -6342,7 +6344,8 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
                        /* Return the length to the full stripe end */
                        *length = min(logical + *length,
                                      raid56_full_stripe_start + em->start +
-                                     (data_stripes << BTRFS_STRIPE_LEN_SHIFT)) - logical;
+                                     btrfs_stripe_nr_to_offset(data_stripes)) -
+                                 logical;
                        stripe_index = 0;
                        stripe_offset = 0;
                } else {
@@ -6358,7 +6361,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 
                        /* We distribute the parity blocks across stripes */
                        stripe_index = (stripe_nr + stripe_index) % map->num_stripes;
-                       if (!need_full_stripe(op) && mirror_num <= 1)
+                       if (op == BTRFS_MAP_READ && mirror_num <= 1)
                                mirror_num = 1;
                }
        } else {
@@ -6398,7 +6401,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
         */
        if (smap && num_alloc_stripes == 1 &&
            !((map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) && mirror_num > 1) &&
-           (!need_full_stripe(op) || !dev_replace_is_ongoing ||
+           (op == BTRFS_MAP_READ || !dev_replace_is_ongoing ||
             !dev_replace->tgtdev)) {
                set_io_stripe(smap, map, stripe_index, stripe_offset, stripe_nr);
                *mirror_num_ret = mirror_num;
@@ -6422,7 +6425,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
         * It's still mostly the same as other profiles, just with extra rotation.
         */
        if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK && need_raid_map &&
-           (need_full_stripe(op) || mirror_num > 1)) {
+           (op != BTRFS_MAP_READ || mirror_num > 1)) {
                /*
                 * For RAID56 @stripe_nr is already the number of full stripes
                 * before us, which is also the rotation value (needs to modulo
@@ -6432,7 +6435,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
                 * modulo, to reduce one modulo call.
                 */
                bioc->full_stripe_logical = em->start +
-                       ((stripe_nr * data_stripes) << BTRFS_STRIPE_LEN_SHIFT);
+                       btrfs_stripe_nr_to_offset(stripe_nr * data_stripes);
                for (i = 0; i < num_stripes; i++)
                        set_io_stripe(&bioc->stripes[i], map,
                                      (i + stripe_nr) % num_stripes,
@@ -6449,11 +6452,11 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
                }
        }
 
-       if (need_full_stripe(op))
+       if (op != BTRFS_MAP_READ)
                max_errors = btrfs_chunk_max_errors(map);
 
        if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL &&
-           need_full_stripe(op)) {
+           op != BTRFS_MAP_READ) {
                handle_ops_on_dev_replace(op, bioc, dev_replace, logical,
                                          &num_stripes, &max_errors);
        }
@@ -6473,23 +6476,6 @@ out:
        return ret;
 }
 
-int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
-                     u64 logical, u64 *length,
-                     struct btrfs_io_context **bioc_ret, int mirror_num)
-{
-       return __btrfs_map_block(fs_info, op, logical, length, bioc_ret,
-                                NULL, &mirror_num, 0);
-}
-
-/* For Scrub/replace */
-int btrfs_map_sblock(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
-                    u64 logical, u64 *length,
-                    struct btrfs_io_context **bioc_ret)
-{
-       return __btrfs_map_block(fs_info, op, logical, length, bioc_ret,
-                                NULL, NULL, 1);
-}
-
 static bool dev_args_match_fs_devices(const struct btrfs_dev_lookup_args *args,
                                      const struct btrfs_fs_devices *fs_devices)
 {
@@ -6909,7 +6895,7 @@ static struct btrfs_fs_devices *open_seed_devices(struct btrfs_fs_info *fs_info,
        if (IS_ERR(fs_devices))
                return fs_devices;
 
-       ret = open_fs_devices(fs_devices, FMODE_READ, fs_info->bdev_holder);
+       ret = open_fs_devices(fs_devices, BLK_OPEN_READ, fs_info->bdev_holder);
        if (ret) {
                free_fs_devices(fs_devices);
                return ERR_PTR(ret);
@@ -8029,7 +8015,7 @@ static void map_raid56_repair_block(struct btrfs_io_context *bioc,
 
        for (i = 0; i < data_stripes; i++) {
                u64 stripe_start = bioc->full_stripe_logical +
-                                  (i << BTRFS_STRIPE_LEN_SHIFT);
+                                  btrfs_stripe_nr_to_offset(i);
 
                if (logical >= stripe_start &&
                    logical < stripe_start + BTRFS_STRIPE_LEN)
@@ -8066,8 +8052,8 @@ int btrfs_map_repair_block(struct btrfs_fs_info *fs_info,
 
        ASSERT(mirror_num > 0);
 
-       ret = __btrfs_map_block(fs_info, BTRFS_MAP_WRITE, logical, &map_length,
-                               &bioc, smap, &mirror_ret, true);
+       ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, logical, &map_length,
+                             &bioc, smap, &mirror_ret, true);
        if (ret < 0)
                return ret;
 
index bf47a1a..b8c51f1 100644 (file)
@@ -94,8 +94,8 @@ struct btrfs_device {
 
        struct btrfs_zoned_device_info *zone_info;
 
-       /* the mode sent to blkdev_get */
-       fmode_t mode;
+       /* block device holder for blkdev_get/put */
+       void *holder;
 
        /*
         * Device's major-minor number. Must be set even if the device is not
@@ -280,8 +280,19 @@ enum btrfs_read_policy {
 
 struct btrfs_fs_devices {
        u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
+
+       /*
+        * UUID written into the btree blocks:
+        *
+        * - If metadata_uuid != fsid then super block must have
+        *   BTRFS_FEATURE_INCOMPAT_METADATA_UUID flag set.
+        *
+        * - Following shall be true at all times:
+        *   - metadata_uuid == btrfs_header::fsid
+        *   - metadata_uuid == btrfs_dev_item::fsid
+        */
        u8 metadata_uuid[BTRFS_FSID_SIZE];
-       bool fsid_change;
+
        struct list_head fs_list;
 
        /*
@@ -319,34 +330,32 @@ struct btrfs_fs_devices {
         */
        struct btrfs_device *latest_dev;
 
-       /* all of the devices in the FS, protected by a mutex
-        * so we can safely walk it to write out the supers without
-        * worrying about add/remove by the multi-device code.
-        * Scrubbing super can kick off supers writing by holding
-        * this mutex lock.
+       /*
+        * All of the devices in the filesystem, protected by a mutex so we can
+        * safely walk it to write out the super blocks without worrying about
+        * adding/removing by the multi-device code. Scrubbing super block can
+        * kick off supers writing by holding this mutex lock.
         */
        struct mutex device_list_mutex;
 
        /* List of all devices, protected by device_list_mutex */
        struct list_head devices;
 
-       /*
-        * Devices which can satisfy space allocation. Protected by
-        * chunk_mutex
-        */
+       /* Devices which can satisfy space allocation. Protected by chunk_mutex. */
        struct list_head alloc_list;
 
        struct list_head seed_list;
-       bool seeding;
 
+       /* Count fs-devices opened. */
        int opened;
 
-       /* set when we find or add a device that doesn't have the
-        * nonrot flag set
-        */
+       /* Set when we find or add a device that doesn't have the nonrot flag set. */
        bool rotating;
-       /* Devices support TRIM/discard commands */
+       /* Devices support TRIM/discard commands. */
        bool discardable;
+       bool fsid_change;
+       /* The filesystem is a seed filesystem. */
+       bool seeding;
 
        struct btrfs_fs_info *fs_info;
        /* sysfs kobjects */
@@ -357,7 +366,7 @@ struct btrfs_fs_devices {
 
        enum btrfs_chunk_allocation_policy chunk_alloc_policy;
 
-       /* Policy used to read the mirrored stripes */
+       /* Policy used to read the mirrored stripes. */
        enum btrfs_read_policy read_policy;
 };
 
@@ -547,15 +556,12 @@ struct btrfs_dev_lookup_args {
 enum btrfs_map_op {
        BTRFS_MAP_READ,
        BTRFS_MAP_WRITE,
-       BTRFS_MAP_DISCARD,
        BTRFS_MAP_GET_READ_MIRRORS,
 };
 
 static inline enum btrfs_map_op btrfs_op(struct bio *bio)
 {
        switch (bio_op(bio)) {
-       case REQ_OP_DISCARD:
-               return BTRFS_MAP_DISCARD;
        case REQ_OP_WRITE:
        case REQ_OP_ZONE_APPEND:
                return BTRFS_MAP_WRITE;
@@ -574,19 +580,24 @@ static inline unsigned long btrfs_chunk_item_size(int num_stripes)
                sizeof(struct btrfs_stripe) * (num_stripes - 1);
 }
 
+/*
+ * Do the type safe conversion from stripe_nr to offset inside the chunk.
+ *
+ * @stripe_nr is u32, with left shift it can overflow u32 for chunks larger
+ * than 4G.  This does the proper type cast to avoid overflow.
+ */
+static inline u64 btrfs_stripe_nr_to_offset(u32 stripe_nr)
+{
+       return (u64)stripe_nr << BTRFS_STRIPE_LEN_SHIFT;
+}
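
A standalone sketch (not kernel code) of the failure mode this helper avoids, assuming the in-tree 64K stripe length, i.e. a shift of 16:

#include <stdint.h>
#include <stdio.h>

#define STRIPE_LEN_SHIFT 16		/* assumption: BTRFS_STRIPE_LEN is 64K */

int main(void)
{
	uint32_t stripe_nr = 0x10000;	/* 65536 stripes, i.e. at or past the 4G mark */

	/* Shifting the u32 directly wraps: the result is reduced modulo 2^32. */
	uint32_t truncated = stripe_nr << STRIPE_LEN_SHIFT;

	/* Widening first, as btrfs_stripe_nr_to_offset() does, preserves the value. */
	uint64_t offset = (uint64_t)stripe_nr << STRIPE_LEN_SHIFT;

	printf("truncated=%u offset=%llu\n",
	       truncated, (unsigned long long)offset);
	return 0;
}

Both shifts are well defined for unsigned operands; only the width of the result type differs.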
+
 void btrfs_get_bioc(struct btrfs_io_context *bioc);
 void btrfs_put_bioc(struct btrfs_io_context *bioc);
 int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
                    u64 logical, u64 *length,
-                   struct btrfs_io_context **bioc_ret, int mirror_num);
-int btrfs_map_sblock(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
-                    u64 logical, u64 *length,
-                    struct btrfs_io_context **bioc_ret);
-int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
-                     u64 logical, u64 *length,
-                     struct btrfs_io_context **bioc_ret,
-                     struct btrfs_io_stripe *smap, int *mirror_num_ret,
-                     int need_raid_map);
+                   struct btrfs_io_context **bioc_ret,
+                   struct btrfs_io_stripe *smap, int *mirror_num_ret,
+                   int need_raid_map);
 int btrfs_map_repair_block(struct btrfs_fs_info *fs_info,
                           struct btrfs_io_stripe *smap, u64 logical,
                           u32 length, int mirror_num);
@@ -599,9 +610,8 @@ struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
                                            u64 type);
 void btrfs_mapping_tree_free(struct extent_map_tree *tree);
 int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
-                      fmode_t flags, void *holder);
-struct btrfs_device *btrfs_scan_one_device(const char *path,
-                                          fmode_t flags, void *holder);
+                      blk_mode_t flags, void *holder);
+struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags);
 int btrfs_forget_devices(dev_t devt);
 void btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
 void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices);
@@ -617,10 +627,9 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
                                        const u64 *devid, const u8 *uuid,
                                        const char *path);
 void btrfs_put_dev_args_from_path(struct btrfs_dev_lookup_args *args);
-void btrfs_free_device(struct btrfs_device *device);
 int btrfs_rm_device(struct btrfs_fs_info *fs_info,
                    struct btrfs_dev_lookup_args *args,
-                   struct block_device **bdev, fmode_t *mode);
+                   struct block_device **bdev, void **holder);
 void __exit btrfs_cleanup_fs_uuids(void);
 int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len);
 int btrfs_grow_device(struct btrfs_trans_handle *trans,
index 8acb05e..6c231a1 100644 (file)
@@ -63,7 +63,7 @@ struct list_head *zlib_alloc_workspace(unsigned int level)
 
        workspacesize = max(zlib_deflate_workspacesize(MAX_WBITS, MAX_MEM_LEVEL),
                        zlib_inflate_workspacesize());
-       workspace->strm.workspace = kvzalloc(workspacesize, GFP_KERNEL);
+       workspace->strm.workspace = kvzalloc(workspacesize, GFP_KERNEL | __GFP_NOWARN);
        workspace->level = level;
        workspace->buf = NULL;
        /*
index a9b32ba..85b8b33 100644 (file)
@@ -15,6 +15,7 @@
 #include "transaction.h"
 #include "dev-replace.h"
 #include "space-info.h"
+#include "super.h"
 #include "fs.h"
 #include "accessors.h"
 #include "bio.h"
@@ -122,10 +123,9 @@ static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
                int i;
 
                for (i = 0; i < BTRFS_NR_SB_LOG_ZONES; i++) {
-                       u64 bytenr;
-
-                       bytenr = ((zones[i].start + zones[i].len)
-                                  << SECTOR_SHIFT) - BTRFS_SUPER_INFO_SIZE;
+                       u64 zone_end = (zones[i].start + zones[i].capacity) << SECTOR_SHIFT;
+                       u64 bytenr = ALIGN_DOWN(zone_end, BTRFS_SUPER_INFO_SIZE) -
+                                               BTRFS_SUPER_INFO_SIZE;
 
                        page[i] = read_cache_page_gfp(mapping,
                                        bytenr >> PAGE_SHIFT, GFP_NOFS);
@@ -1058,7 +1058,7 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start,
 
                /* Check if zones in the region are all empty */
                if (btrfs_dev_is_sequential(device, pos) &&
-                   find_next_zero_bit(zinfo->empty_zones, end, begin) != end) {
+                   !bitmap_test_range_all_set(zinfo->empty_zones, begin, nzones)) {
                        pos += zinfo->zone_size;
                        continue;
                }
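
bitmap_test_range_all_set(map, start, nbits) reports whether every bit in [start, start + nbits) is set, and bitmap_test_range_all_zero() is its counterpart; the conversions in this file swap the open-coded find_next_bit()/find_next_zero_bit() scans for these range predicates. A toy standalone model of the semantics being relied on (not the kernel implementation, which operates on unsigned long words):

#include <stdbool.h>
#include <stdio.h>

/* Model the predicate over a plain bool array instead of a real bitmap. */
static bool test_range_all_set(const bool *bits, unsigned long start,
			       unsigned long nbits)
{
	for (unsigned long i = start; i < start + nbits; i++)
		if (!bits[i])
			return false;
	return true;
}

int main(void)
{
	/* One "zone" in the middle is not empty. */
	bool empty_zones[8] = { true, true, true, false, true, true, true, true };

	printf("zones 0..2 all empty: %d\n", test_range_all_set(empty_zones, 0, 3));
	printf("zones 2..4 all empty: %d\n", test_range_all_set(empty_zones, 2, 3));
	return 0;
}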
@@ -1157,23 +1157,23 @@ int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size)
        struct btrfs_zoned_device_info *zinfo = device->zone_info;
        const u8 shift = zinfo->zone_size_shift;
        unsigned long begin = start >> shift;
-       unsigned long end = (start + size) >> shift;
+       unsigned long nbits = size >> shift;
        u64 pos;
        int ret;
 
        ASSERT(IS_ALIGNED(start, zinfo->zone_size));
        ASSERT(IS_ALIGNED(size, zinfo->zone_size));
 
-       if (end > zinfo->nr_zones)
+       if (begin + nbits > zinfo->nr_zones)
                return -ERANGE;
 
        /* All the zones are conventional */
-       if (find_next_bit(zinfo->seq_zones, begin, end) == end)
+       if (bitmap_test_range_all_zero(zinfo->seq_zones, begin, nbits))
                return 0;
 
        /* All the zones are sequential and empty */
-       if (find_next_zero_bit(zinfo->seq_zones, begin, end) == end &&
-           find_next_zero_bit(zinfo->empty_zones, begin, end) == end)
+       if (bitmap_test_range_all_set(zinfo->seq_zones, begin, nbits) &&
+           bitmap_test_range_all_set(zinfo->empty_zones, begin, nbits))
                return 0;
 
        for (pos = start; pos < start + size; pos += zinfo->zone_size) {
@@ -1603,37 +1603,17 @@ void btrfs_calc_zone_unusable(struct btrfs_block_group *cache)
 void btrfs_redirty_list_add(struct btrfs_transaction *trans,
                            struct extent_buffer *eb)
 {
-       struct btrfs_fs_info *fs_info = eb->fs_info;
-
-       if (!btrfs_is_zoned(fs_info) ||
-           btrfs_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN) ||
-           !list_empty(&eb->release_list))
+       if (!btrfs_is_zoned(eb->fs_info) ||
+           btrfs_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN))
                return;
 
-       set_extent_buffer_dirty(eb);
-       set_extent_bits_nowait(&trans->dirty_pages, eb->start,
-                              eb->start + eb->len - 1, EXTENT_DIRTY);
+       ASSERT(!test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
+
        memzero_extent_buffer(eb, 0, eb->len);
        set_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags);
-
-       spin_lock(&trans->releasing_ebs_lock);
-       list_add_tail(&eb->release_list, &trans->releasing_ebs);
-       spin_unlock(&trans->releasing_ebs_lock);
-       atomic_inc(&eb->refs);
-}
-
-void btrfs_free_redirty_list(struct btrfs_transaction *trans)
-{
-       spin_lock(&trans->releasing_ebs_lock);
-       while (!list_empty(&trans->releasing_ebs)) {
-               struct extent_buffer *eb;
-
-               eb = list_first_entry(&trans->releasing_ebs,
-                                     struct extent_buffer, release_list);
-               list_del_init(&eb->release_list);
-               free_extent_buffer(eb);
-       }
-       spin_unlock(&trans->releasing_ebs_lock);
+       set_extent_buffer_dirty(eb);
+       set_extent_bit(&trans->dirty_pages, eb->start, eb->start + eb->len - 1,
+                       EXTENT_DIRTY | EXTENT_NOWAIT, NULL);
 }
 
 bool btrfs_use_zone_append(struct btrfs_bio *bbio)
@@ -1678,63 +1658,89 @@ bool btrfs_use_zone_append(struct btrfs_bio *bbio)
 void btrfs_record_physical_zoned(struct btrfs_bio *bbio)
 {
        const u64 physical = bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT;
-       struct btrfs_ordered_extent *ordered;
-
-       ordered = btrfs_lookup_ordered_extent(bbio->inode, bbio->file_offset);
-       if (WARN_ON(!ordered))
-               return;
+       struct btrfs_ordered_sum *sum = bbio->sums;
 
-       ordered->physical = physical;
-       btrfs_put_ordered_extent(ordered);
+       if (physical < bbio->orig_physical)
+               sum->logical -= bbio->orig_physical - physical;
+       else
+               sum->logical += physical - bbio->orig_physical;
 }
 
-void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered)
+static void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered,
+                                       u64 logical)
 {
-       struct btrfs_inode *inode = BTRFS_I(ordered->inode);
-       struct btrfs_fs_info *fs_info = inode->root->fs_info;
-       struct extent_map_tree *em_tree;
+       struct extent_map_tree *em_tree = &BTRFS_I(ordered->inode)->extent_tree;
        struct extent_map *em;
-       struct btrfs_ordered_sum *sum;
-       u64 orig_logical = ordered->disk_bytenr;
-       struct map_lookup *map;
-       u64 physical = ordered->physical;
-       u64 chunk_start_phys;
-       u64 logical;
-
-       em = btrfs_get_chunk_map(fs_info, orig_logical, 1);
-       if (IS_ERR(em))
-               return;
-       map = em->map_lookup;
-       chunk_start_phys = map->stripes[0].physical;
-
-       if (WARN_ON_ONCE(map->num_stripes > 1) ||
-           WARN_ON_ONCE((map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) != 0) ||
-           WARN_ON_ONCE(physical < chunk_start_phys) ||
-           WARN_ON_ONCE(physical > chunk_start_phys + em->orig_block_len)) {
-               free_extent_map(em);
-               return;
-       }
-       logical = em->start + (physical - map->stripes[0].physical);
-       free_extent_map(em);
-
-       if (orig_logical == logical)
-               return;
 
        ordered->disk_bytenr = logical;
 
-       em_tree = &inode->extent_tree;
        write_lock(&em_tree->lock);
        em = search_extent_mapping(em_tree, ordered->file_offset,
                                   ordered->num_bytes);
        em->block_start = logical;
        free_extent_map(em);
        write_unlock(&em_tree->lock);
+}
 
-       list_for_each_entry(sum, &ordered->list, list) {
-               if (logical < orig_logical)
-                       sum->bytenr -= orig_logical - logical;
-               else
-                       sum->bytenr += logical - orig_logical;
+static bool btrfs_zoned_split_ordered(struct btrfs_ordered_extent *ordered,
+                                     u64 logical, u64 len)
+{
+       struct btrfs_ordered_extent *new;
+
+       if (!test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags) &&
+           split_extent_map(BTRFS_I(ordered->inode), ordered->file_offset,
+                            ordered->num_bytes, len, logical))
+               return false;
+
+       new = btrfs_split_ordered_extent(ordered, len);
+       if (IS_ERR(new))
+               return false;
+       new->disk_bytenr = logical;
+       btrfs_finish_one_ordered(new);
+       return true;
+}
+
+void btrfs_finish_ordered_zoned(struct btrfs_ordered_extent *ordered)
+{
+       struct btrfs_inode *inode = BTRFS_I(ordered->inode);
+       struct btrfs_fs_info *fs_info = inode->root->fs_info;
+       struct btrfs_ordered_sum *sum =
+               list_first_entry(&ordered->list, typeof(*sum), list);
+       u64 logical = sum->logical;
+       u64 len = sum->len;
+
+       while (len < ordered->disk_num_bytes) {
+               sum = list_next_entry(sum, list);
+               if (sum->logical == logical + len) {
+                       len += sum->len;
+                       continue;
+               }
+               if (!btrfs_zoned_split_ordered(ordered, logical, len)) {
+                       set_bit(BTRFS_ORDERED_IOERR, &ordered->flags);
+                       btrfs_err(fs_info, "failed to split ordered extent");
+                       goto out;
+               }
+               logical = sum->logical;
+               len = sum->len;
+       }
+
+       if (ordered->disk_bytenr != logical)
+               btrfs_rewrite_logical_zoned(ordered, logical);
+
+out:
+       /*
+        * If we end up here for nodatasum I/O, the btrfs_ordered_sum structures
+        * were allocated by btrfs_alloc_dummy_sum only to record the logical
+        * addresses and don't contain actual checksums.  We thus must free them
+        * here so that we don't attempt to log the csums later.
+        */
+       if ((inode->flags & BTRFS_INODE_NODATASUM) ||
+           test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state)) {
+               while ((sum = list_first_entry_or_null(&ordered->list,
+                                                      typeof(*sum), list))) {
+                       list_del(&sum->list);
+                       kfree(sum);
+               }
        }
 }
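
The loop above walks the per-bio checksum records in submission order, extends the current run while each record starts exactly where the previous one ended, and splits the ordered extent whenever the recorded logical address jumps (which happens when zone append placed part of the write somewhere else). A toy standalone model of that coalescing decision (not kernel code, with made-up numbers):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct sum { uint64_t logical; uint64_t len; };

static void split(uint64_t logical, uint64_t len)
{
	printf("split: logical=%llu len=%llu\n",
	       (unsigned long long)logical, (unsigned long long)len);
}

int main(void)
{
	/* Hypothetical records: the third chunk landed at a different address. */
	struct sum sums[] = { { 0, 64 }, { 64, 64 }, { 4096, 64 } };
	uint64_t logical = sums[0].logical;
	uint64_t len = sums[0].len;

	for (size_t i = 1; i < sizeof(sums) / sizeof(sums[0]); i++) {
		if (sums[i].logical == logical + len) {
			len += sums[i].len;	/* contiguous: keep growing the run */
			continue;
		}
		split(logical, len);		/* discontinuity: split off the run so far */
		logical = sums[i].logical;
		len = sums[i].len;
	}
	printf("remaining run: logical=%llu len=%llu\n",
	       (unsigned long long)logical, (unsigned long long)len);
	return 0;
}

In the kernel the remaining run is handled by btrfs_rewrite_logical_zoned() when its start no longer matches ordered->disk_bytenr.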
 
@@ -1793,8 +1799,8 @@ static int read_zone_info(struct btrfs_fs_info *fs_info, u64 logical,
        int nmirrors;
        int i, ret;
 
-       ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS, logical,
-                              &mapped_length, &bioc);
+       ret = btrfs_map_block(fs_info, BTRFS_MAP_GET_READ_MIRRORS, logical,
+                             &mapped_length, &bioc, NULL, NULL, 1);
        if (ret || !bioc || mapped_length < PAGE_SIZE) {
                ret = -EIO;
                goto out_put_bioc;
index c0570d3..27322b9 100644 (file)
@@ -30,6 +30,8 @@ struct btrfs_zoned_device_info {
        struct blk_zone sb_zones[2 * BTRFS_SUPER_MIRROR_MAX];
 };
 
+void btrfs_finish_ordered_zoned(struct btrfs_ordered_extent *ordered);
+
 #ifdef CONFIG_BLK_DEV_ZONED
 int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
                       struct blk_zone *zone);
@@ -54,10 +56,8 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new);
 void btrfs_calc_zone_unusable(struct btrfs_block_group *cache);
 void btrfs_redirty_list_add(struct btrfs_transaction *trans,
                            struct extent_buffer *eb);
-void btrfs_free_redirty_list(struct btrfs_transaction *trans);
 bool btrfs_use_zone_append(struct btrfs_bio *bbio);
 void btrfs_record_physical_zoned(struct btrfs_bio *bbio);
-void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered);
 bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
                                    struct extent_buffer *eb,
                                    struct btrfs_block_group **cache_ret);
@@ -179,7 +179,6 @@ static inline void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) { }
 
 static inline void btrfs_redirty_list_add(struct btrfs_transaction *trans,
                                          struct extent_buffer *eb) { }
-static inline void btrfs_free_redirty_list(struct btrfs_transaction *trans) { }
 
 static inline bool btrfs_use_zone_append(struct btrfs_bio *bbio)
 {
@@ -190,9 +189,6 @@ static inline void btrfs_record_physical_zoned(struct btrfs_bio *bbio)
 {
 }
 
-static inline void btrfs_rewrite_logical_zoned(
-                               struct btrfs_ordered_extent *ordered) { }
-
 static inline bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
                               struct extent_buffer *eb,
                               struct btrfs_block_group **cache_ret)
index f798da2..e7ac4ec 100644 (file)
@@ -356,7 +356,7 @@ struct list_head *zstd_alloc_workspace(unsigned int level)
        workspace->level = level;
        workspace->req_level = level;
        workspace->last_used = jiffies;
-       workspace->mem = kvmalloc(workspace->size, GFP_KERNEL);
+       workspace->mem = kvmalloc(workspace->size, GFP_KERNEL | __GFP_NOWARN);
        workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
        if (!workspace->mem || !workspace->buf)
                goto fail;
index a7fc561..93c7446 100644 (file)
@@ -111,7 +111,6 @@ void buffer_check_dirty_writeback(struct folio *folio,
                bh = bh->b_this_page;
        } while (bh != head);
 }
-EXPORT_SYMBOL(buffer_check_dirty_writeback);
 
 /*
  * Block until a buffer comes unlocked.  This doesn't stop it
@@ -2760,8 +2759,7 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
 
        bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
 
-       bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
-       BUG_ON(bio->bi_iter.bi_size != bh->b_size);
+       __bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
 
        bio->bi_end_io = end_bio_bh_io_sync;
        bio->bi_private = bh;
index 82219a8..d9d22d0 100644 (file)
@@ -451,9 +451,10 @@ struct file *cachefiles_create_tmpfile(struct cachefiles_object *object)
 
        ret = cachefiles_inject_write_error();
        if (ret == 0) {
-               file = vfs_tmpfile_open(&nop_mnt_idmap, &parentpath, S_IFREG,
-                                       O_RDWR | O_LARGEFILE | O_DIRECT,
-                                       cache->cache_cred);
+               file = kernel_tmpfile_open(&nop_mnt_idmap, &parentpath,
+                                          S_IFREG | 0600,
+                                          O_RDWR | O_LARGEFILE | O_DIRECT,
+                                          cache->cache_cred);
                ret = PTR_ERR_OR_ZERO(file);
        }
        if (ret) {
@@ -560,8 +561,8 @@ static bool cachefiles_open_file(struct cachefiles_object *object,
         */
        path.mnt = cache->mnt;
        path.dentry = dentry;
-       file = open_with_fake_path(&path, O_RDWR | O_LARGEFILE | O_DIRECT,
-                                  d_backing_inode(dentry), cache->cache_cred);
+       file = kernel_file_open(&path, O_RDWR | O_LARGEFILE | O_DIRECT,
+                               d_backing_inode(dentry), cache->cache_cred);
        if (IS_ERR(file)) {
                trace_cachefiles_vfs_error(object, d_backing_inode(dentry),
                                           PTR_ERR(file),
index 789be30..2321e5d 100644 (file)
@@ -1627,6 +1627,7 @@ void ceph_flush_snaps(struct ceph_inode_info *ci,
        struct inode *inode = &ci->netfs.inode;
        struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
        struct ceph_mds_session *session = NULL;
+       bool need_put = false;
        int mds;
 
        dout("ceph_flush_snaps %p\n", inode);
@@ -1671,8 +1672,13 @@ out:
                ceph_put_mds_session(session);
        /* we flushed them all; remove this inode from the queue */
        spin_lock(&mdsc->snap_flush_lock);
+       if (!list_empty(&ci->i_snap_flush_item))
+               need_put = true;
        list_del_init(&ci->i_snap_flush_item);
        spin_unlock(&mdsc->snap_flush_lock);
+
+       if (need_put)
+               iput(inode);
 }
 
 /*
index f4d8bf7..4285f6c 100644 (file)
@@ -1746,6 +1746,69 @@ again:
 }
 
 /*
+ * Wrap filemap_splice_read with checks for cap bits on the inode.
+ * Atomically grab references, so that those bits are not released
+ * back to the MDS mid-read.
+ */
+static ssize_t ceph_splice_read(struct file *in, loff_t *ppos,
+                               struct pipe_inode_info *pipe,
+                               size_t len, unsigned int flags)
+{
+       struct ceph_file_info *fi = in->private_data;
+       struct inode *inode = file_inode(in);
+       struct ceph_inode_info *ci = ceph_inode(inode);
+       ssize_t ret;
+       int want = 0, got = 0;
+       CEPH_DEFINE_RW_CONTEXT(rw_ctx, 0);
+
+       dout("splice_read %p %llx.%llx %llu~%zu trying to get caps on %p\n",
+            inode, ceph_vinop(inode), *ppos, len, inode);
+
+       if (ceph_inode_is_shutdown(inode))
+               return -ESTALE;
+
+       if (ceph_has_inline_data(ci) ||
+           (fi->flags & CEPH_F_SYNC))
+               return copy_splice_read(in, ppos, pipe, len, flags);
+
+       ceph_start_io_read(inode);
+
+       want = CEPH_CAP_FILE_CACHE;
+       if (fi->fmode & CEPH_FILE_MODE_LAZY)
+               want |= CEPH_CAP_FILE_LAZYIO;
+
+       ret = ceph_get_caps(in, CEPH_CAP_FILE_RD, want, -1, &got);
+       if (ret < 0)
+               goto out_end;
+
+       if ((got & (CEPH_CAP_FILE_CACHE | CEPH_CAP_FILE_LAZYIO)) == 0) {
+               dout("splice_read/sync %p %llx.%llx %llu~%zu got cap refs on %s\n",
+                    inode, ceph_vinop(inode), *ppos, len,
+                    ceph_cap_string(got));
+
+               ceph_put_cap_refs(ci, got);
+               ceph_end_io_read(inode);
+               return copy_splice_read(in, ppos, pipe, len, flags);
+       }
+
+       dout("splice_read %p %llx.%llx %llu~%zu got cap refs on %s\n",
+            inode, ceph_vinop(inode), *ppos, len, ceph_cap_string(got));
+
+       rw_ctx.caps = got;
+       ceph_add_rw_context(fi, &rw_ctx);
+       ret = filemap_splice_read(in, ppos, pipe, len, flags);
+       ceph_del_rw_context(fi, &rw_ctx);
+
+       dout("splice_read %p %llx.%llx dropping cap refs on %s = %zd\n",
+            inode, ceph_vinop(inode), ceph_cap_string(got), ret);
+
+       ceph_put_cap_refs(ci, got);
+out_end:
+       ceph_end_io_read(inode);
+       return ret;
+}
+
+/*
  * Take cap references to avoid releasing caps to MDS mid-write.
  *
  * If we are synchronous, and write with an old snap context, the OSD
@@ -2593,7 +2656,7 @@ const struct file_operations ceph_file_fops = {
        .lock = ceph_lock,
        .setlease = simple_nosetlease,
        .flock = ceph_flock,
-       .splice_read = generic_file_splice_read,
+       .splice_read = ceph_splice_read,
        .splice_write = iter_file_splice_write,
        .unlocked_ioctl = ceph_ioctl,
        .compat_ioctl = compat_ptr_ioctl,
index 29cf002..4c0f22a 100644 (file)
@@ -3942,7 +3942,7 @@ static int reconnect_caps_cb(struct inode *inode, int mds, void *arg)
        struct dentry *dentry;
        struct ceph_cap *cap;
        char *path;
-       int pathlen = 0, err = 0;
+       int pathlen = 0, err;
        u64 pathbase;
        u64 snap_follows;
 
@@ -3965,6 +3965,7 @@ static int reconnect_caps_cb(struct inode *inode, int mds, void *arg)
        cap = __get_cap_for_mds(ci, mds);
        if (!cap) {
                spin_unlock(&ci->i_ceph_lock);
+               err = 0;
                goto out_err;
        }
        dout(" adding %p ino %llx.%llx cap %p %lld %s\n",
index 8700720..2e73ba6 100644 (file)
@@ -693,8 +693,10 @@ int __ceph_finish_cap_snap(struct ceph_inode_info *ci,
             capsnap->size);
 
        spin_lock(&mdsc->snap_flush_lock);
-       if (list_empty(&ci->i_snap_flush_item))
+       if (list_empty(&ci->i_snap_flush_item)) {
+               ihold(inode);
                list_add_tail(&ci->i_snap_flush_item, &mdsc->snap_flush_list);
+       }
        spin_unlock(&mdsc->snap_flush_lock);
        return 1;  /* caller may want to ceph_flush_snaps */
 }
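
Taken together with the first ceph hunk above, this pairs an ihold() when the inode is first queued on snap_flush_list with an iput() once it has been removed, so the inode cannot be evicted while queued; note that the iput() only happens after the spinlock is dropped. A condensed sketch of the pattern, using the names from the hunks above, for illustration only:

	/* enqueue side: pin the inode while it sits on the list */
	spin_lock(&mdsc->snap_flush_lock);
	if (list_empty(&ci->i_snap_flush_item)) {
		ihold(inode);
		list_add_tail(&ci->i_snap_flush_item, &mdsc->snap_flush_list);
	}
	spin_unlock(&mdsc->snap_flush_lock);

	/* dequeue side: drop that reference once the inode is off the list */
	spin_lock(&mdsc->snap_flush_lock);
	need_put = !list_empty(&ci->i_snap_flush_item);
	list_del_init(&ci->i_snap_flush_item);
	spin_unlock(&mdsc->snap_flush_lock);
	if (need_put)
		iput(inode);	/* safe: not under snap_flush_lock */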
@@ -1111,6 +1113,19 @@ skip_inode:
                                continue;
                        adjust_snap_realm_parent(mdsc, child, realm->ino);
                }
+       } else {
+               /*
+                * In the non-split case both 'num_split_inos' and
+                * 'num_split_realms' should be 0, making this a no-op.
+                * However the MDS happens to populate 'split_realms' list
+                * However the MDS happens to populate the 'split_realms' list
+                *
+                * Skip both lists just in case to ensure that 'p' is
+                * positioned at the start of realm info, as expected by
+                * ceph_update_snap_trace().
+                */
+               p += sizeof(u64) * num_split_inos;
+               p += sizeof(u64) * num_split_realms;
        }
 
        /*
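
The skip above presumably reflects the on-wire layout of the snap message: two arrays of u64 entries precede the realm records, so advancing 'p' past both arrays leaves it where ceph_update_snap_trace() expects the realm info to begin. A rough sketch of that layout, inferred from the pointer arithmetic in the hunk:

	/*
	 * Implied wire layout after the op header (sketch only):
	 *
	 *   num_split_inos   x u64   split inode numbers
	 *   num_split_realms x u64   split realm numbers
	 *   realm records            parsed by ceph_update_snap_trace()
	 */
	p += sizeof(u64) * num_split_inos;
	p += sizeof(u64) * num_split_realms;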
index 13deb45..950b691 100644 (file)
@@ -150,7 +150,7 @@ __register_chrdev_region(unsigned int major, unsigned int baseminor,
        cd->major = major;
        cd->baseminor = baseminor;
        cd->minorct = minorct;
-       strlcpy(cd->name, name, sizeof(cd->name));
+       strscpy(cd->name, name, sizeof(cd->name));
 
        if (!prev) {
                cd->next = curr;
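
strscpy() is the preferred replacement here because, unlike strlcpy(), it never walks past the destination size while measuring the source, and its return value reports truncation directly (-E2BIG) rather than returning strlen(src). A small kernel-context sketch of the difference in return-value handling ('src' is a hypothetical source string):

	char name[16];
	ssize_t n;

	n = strscpy(name, src, sizeof(name));
	if (n == -E2BIG)
		pr_warn("name truncated\n");	/* strlcpy() would have returned strlen(src) */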
index 3f3c81e..12b26bd 100644 (file)
@@ -23,6 +23,7 @@
 #include <linux/slab.h>
 #include <linux/uaccess.h>
 #include <linux/uio.h>
+#include <linux/splice.h>
 
 #include <linux/coda.h>
 #include "coda_psdev.h"
@@ -94,6 +95,32 @@ finish_write:
        return ret;
 }
 
+static ssize_t
+coda_file_splice_read(struct file *coda_file, loff_t *ppos,
+                     struct pipe_inode_info *pipe,
+                     size_t len, unsigned int flags)
+{
+       struct inode *coda_inode = file_inode(coda_file);
+       struct coda_file_info *cfi = coda_ftoc(coda_file);
+       struct file *in = cfi->cfi_container;
+       loff_t ki_pos = *ppos;
+       ssize_t ret;
+
+       ret = venus_access_intent(coda_inode->i_sb, coda_i2f(coda_inode),
+                                 &cfi->cfi_access_intent,
+                                 len, ki_pos, CODA_ACCESS_TYPE_READ);
+       if (ret)
+               goto finish_read;
+
+       ret = vfs_splice_read(in, ppos, pipe, len, flags);
+
+finish_read:
+       venus_access_intent(coda_inode->i_sb, coda_i2f(coda_inode),
+                           &cfi->cfi_access_intent,
+                           len, ki_pos, CODA_ACCESS_TYPE_READ_FINISH);
+       return ret;
+}
+
 static void
 coda_vm_open(struct vm_area_struct *vma)
 {
@@ -302,5 +329,5 @@ const struct file_operations coda_file_operations = {
        .open           = coda_open,
        .release        = coda_release,
        .fsync          = coda_fsync,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = coda_file_splice_read,
 };
index ece7bad..9d235fa 100644 (file)
@@ -371,7 +371,9 @@ static int zap_process(struct task_struct *start, int exit_code)
                if (t != current && !(t->flags & PF_POSTCOREDUMP)) {
                        sigaddset(&t->pending.signal, SIGKILL);
                        signal_wake_up(t, 1);
-                       nr++;
+                       /* The vhost_worker does not participate in coredumps */
+                       if ((t->flags & (PF_USER_WORKER | PF_IO_WORKER)) != PF_USER_WORKER)
+                               nr++;
                }
        }
 
@@ -646,7 +648,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
        } else {
                struct mnt_idmap *idmap;
                struct inode *inode;
-               int open_flags = O_CREAT | O_RDWR | O_NOFOLLOW |
+               int open_flags = O_CREAT | O_WRONLY | O_NOFOLLOW |
                                 O_LARGEFILE | O_EXCL;
 
                if (cprm.limit < binfmt->min_coredump)
index 006ef68..27c6597 100644 (file)
@@ -473,7 +473,7 @@ static unsigned int cramfs_physmem_mmap_capabilities(struct file *file)
 static const struct file_operations cramfs_physmem_fops = {
        .llseek                 = generic_file_llseek,
        .read_iter              = generic_file_read_iter,
-       .splice_read            = generic_file_splice_read,
+       .splice_read            = filemap_splice_read,
        .mmap                   = cramfs_physmem_mmap,
 #ifndef CONFIG_MMU
        .get_unmapped_area      = cramfs_physmem_get_unmapped_area,
index 7ab5a7b..2d63da4 100644 (file)
@@ -171,7 +171,7 @@ fscrypt_policy_flags(const union fscrypt_policy *policy)
  */
 struct fscrypt_symlink_data {
        __le16 len;
-       char encrypted_path[1];
+       char encrypted_path[];
 } __packed;
 
 /**
index 9e786ae..6238dbc 100644 (file)
@@ -255,10 +255,10 @@ int fscrypt_prepare_symlink(struct inode *dir, const char *target,
         * for now since filesystems will assume it is there and subtract it.
         */
        if (!__fscrypt_fname_encrypted_size(policy, len,
-                                           max_len - sizeof(struct fscrypt_symlink_data),
+                                           max_len - sizeof(struct fscrypt_symlink_data) - 1,
                                            &disk_link->len))
                return -ENAMETOOLONG;
-       disk_link->len += sizeof(struct fscrypt_symlink_data);
+       disk_link->len += sizeof(struct fscrypt_symlink_data) + 1;
 
        disk_link->name = NULL;
        return 0;
@@ -289,7 +289,7 @@ int __fscrypt_encrypt_symlink(struct inode *inode, const char *target,
                if (!sd)
                        return -ENOMEM;
        }
-       ciphertext_len = disk_link->len - sizeof(*sd);
+       ciphertext_len = disk_link->len - sizeof(*sd) - 1;
        sd->len = cpu_to_le16(ciphertext_len);
 
        err = fscrypt_fname_encrypt(inode, &iname, sd->encrypted_path,
@@ -367,7 +367,7 @@ const char *fscrypt_get_symlink(struct inode *inode, const void *caddr,
         * the ciphertext length, even though this is redundant with i_size.
         */
 
-       if (max_size < sizeof(*sd))
+       if (max_size < sizeof(*sd) + 1)
                return ERR_PTR(-EUCLEAN);
        sd = caddr;
        cstr.name = (unsigned char *)sd->encrypted_path;
@@ -376,7 +376,7 @@ const char *fscrypt_get_symlink(struct inode *inode, const void *caddr,
        if (cstr.len == 0)
                return ERR_PTR(-EUCLEAN);
 
-       if (cstr.len + sizeof(*sd) - 1 > max_size)
+       if (cstr.len + sizeof(*sd) > max_size)
                return ERR_PTR(-EUCLEAN);
 
        err = fscrypt_fname_alloc_buffer(cstr.len, &pstr);
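
The ±1 adjustments above follow from turning encrypted_path into a flexible array member: sizeof(struct fscrypt_symlink_data) previously included the single byte declared as encrypted_path[1], and after the change it no longer does, so that byte has to be added or subtracted explicitly wherever the struct size enters the arithmetic. A sketch of the size difference (old_sd/new_sd are illustrative names only):

	struct old_sd { __le16 len; char encrypted_path[1]; } __packed;  /* sizeof == 3 */
	struct new_sd { __le16 len; char encrypted_path[];  } __packed;  /* sizeof == 2 */

	/* before: disk_link->len += sizeof(struct old_sd);       (extra byte included)     */
	/* after:  disk_link->len += sizeof(struct new_sd) + 1;   (extra byte made explicit) */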
index 56a6ee4..5f4da5c 100644 (file)
@@ -7,6 +7,7 @@
 #include <linux/slab.h>
 #include <linux/prefetch.h>
 #include "mount.h"
+#include "internal.h"
 
 struct prepend_buffer {
        char *buf;
index 0b380bb..2ceb378 100644 (file)
@@ -42,8 +42,8 @@
 #include "internal.h"
 
 /*
- * How many user pages to map in one call to get_user_pages().  This determines
- * the size of a structure in the slab cache
+ * How many user pages to map in one call to iov_iter_extract_pages().  This
+ * determines the size of a structure in the slab cache
  */
 #define DIO_PAGES      64
 
@@ -121,12 +121,13 @@ struct dio {
        struct inode *inode;
        loff_t i_size;                  /* i_size when submitted */
        dio_iodone_t *end_io;           /* IO completion function */
+       bool is_pinned;                 /* T if we have pins on the pages */
 
        void *private;                  /* copy from map_bh.b_private */
 
        /* BIO completion state */
        spinlock_t bio_lock;            /* protects BIO fields below */
-       int page_errors;                /* errno from get_user_pages() */
+       int page_errors;                /* err from iov_iter_extract_pages() */
        int is_async;                   /* is IO async ? */
        bool defer_completion;          /* defer AIO completion to workqueue? */
        bool should_dirty;              /* if pages should be dirtied */
@@ -165,14 +166,14 @@ static inline unsigned dio_pages_present(struct dio_submit *sdio)
  */
 static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 {
+       struct page **pages = dio->pages;
        const enum req_op dio_op = dio->opf & REQ_OP_MASK;
        ssize_t ret;
 
-       ret = iov_iter_get_pages2(sdio->iter, dio->pages, LONG_MAX, DIO_PAGES,
-                               &sdio->from);
+       ret = iov_iter_extract_pages(sdio->iter, &pages, LONG_MAX,
+                                    DIO_PAGES, 0, &sdio->from);
 
        if (ret < 0 && sdio->blocks_available && dio_op == REQ_OP_WRITE) {
-               struct page *page = ZERO_PAGE(0);
                /*
                 * A memory fault, but the filesystem has some outstanding
                 * mapped blocks.  We need to use those blocks up to avoid
@@ -180,8 +181,7 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
                 */
                if (dio->page_errors == 0)
                        dio->page_errors = ret;
-               get_page(page);
-               dio->pages[0] = page;
+               dio->pages[0] = ZERO_PAGE(0);
                sdio->head = 0;
                sdio->tail = 1;
                sdio->from = 0;
@@ -201,9 +201,9 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 
 /*
  * Get another userspace page.  Returns an ERR_PTR on error.  Pages are
- * buffered inside the dio so that we can call get_user_pages() against a
- * decent number of pages, less frequently.  To provide nicer use of the
- * L1 cache.
+ * buffered inside the dio so that we can call iov_iter_extract_pages()
+ * against a decent number of pages, less frequently.  To provide nicer use of
+ * the L1 cache.
  */
 static inline struct page *dio_get_page(struct dio *dio,
                                        struct dio_submit *sdio)
@@ -219,6 +219,18 @@ static inline struct page *dio_get_page(struct dio *dio,
        return dio->pages[sdio->head];
 }
 
+static void dio_pin_page(struct dio *dio, struct page *page)
+{
+       if (dio->is_pinned)
+               folio_add_pin(page_folio(page));
+}
+
+static void dio_unpin_page(struct dio *dio, struct page *page)
+{
+       if (dio->is_pinned)
+               unpin_user_page(page);
+}
+
 /*
  * dio_complete() - called when all DIO BIO I/O has been completed
  *
@@ -402,6 +414,8 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
                bio->bi_end_io = dio_bio_end_aio;
        else
                bio->bi_end_io = dio_bio_end_io;
+       if (dio->is_pinned)
+               bio_set_flag(bio, BIO_PAGE_PINNED);
        sdio->bio = bio;
        sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
 }
@@ -442,8 +456,10 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
  */
 static inline void dio_cleanup(struct dio *dio, struct dio_submit *sdio)
 {
-       while (sdio->head < sdio->tail)
-               put_page(dio->pages[sdio->head++]);
+       if (dio->is_pinned)
+               unpin_user_pages(dio->pages + sdio->head,
+                                sdio->tail - sdio->head);
+       sdio->head = sdio->tail;
 }
 
 /*
@@ -674,7 +690,7 @@ out:
  *
  * Return zero on success.  Non-zero means the caller needs to start a new BIO.
  */
-static inline int dio_bio_add_page(struct dio_submit *sdio)
+static inline int dio_bio_add_page(struct dio *dio, struct dio_submit *sdio)
 {
        int ret;
 
@@ -686,7 +702,7 @@ static inline int dio_bio_add_page(struct dio_submit *sdio)
                 */
                if ((sdio->cur_page_len + sdio->cur_page_offset) == PAGE_SIZE)
                        sdio->pages_in_io--;
-               get_page(sdio->cur_page);
+               dio_pin_page(dio, sdio->cur_page);
                sdio->final_block_in_bio = sdio->cur_page_block +
                        (sdio->cur_page_len >> sdio->blkbits);
                ret = 0;
@@ -741,11 +757,11 @@ static inline int dio_send_cur_page(struct dio *dio, struct dio_submit *sdio,
                        goto out;
        }
 
-       if (dio_bio_add_page(sdio) != 0) {
+       if (dio_bio_add_page(dio, sdio) != 0) {
                dio_bio_submit(dio, sdio);
                ret = dio_new_bio(dio, sdio, sdio->cur_page_block, map_bh);
                if (ret == 0) {
-                       ret = dio_bio_add_page(sdio);
+                       ret = dio_bio_add_page(dio, sdio);
                        BUG_ON(ret != 0);
                }
        }
@@ -802,13 +818,13 @@ submit_page_section(struct dio *dio, struct dio_submit *sdio, struct page *page,
         */
        if (sdio->cur_page) {
                ret = dio_send_cur_page(dio, sdio, map_bh);
-               put_page(sdio->cur_page);
+               dio_unpin_page(dio, sdio->cur_page);
                sdio->cur_page = NULL;
                if (ret)
                        return ret;
        }
 
-       get_page(page);         /* It is in dio */
+       dio_pin_page(dio, page);                /* It is in dio */
        sdio->cur_page = page;
        sdio->cur_page_offset = offset;
        sdio->cur_page_len = len;
@@ -823,7 +839,7 @@ out:
                ret = dio_send_cur_page(dio, sdio, map_bh);
                if (sdio->bio)
                        dio_bio_submit(dio, sdio);
-               put_page(sdio->cur_page);
+               dio_unpin_page(dio, sdio->cur_page);
                sdio->cur_page = NULL;
        }
        return ret;
@@ -924,7 +940,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 
                                ret = get_more_blocks(dio, sdio, map_bh);
                                if (ret) {
-                                       put_page(page);
+                                       dio_unpin_page(dio, page);
                                        goto out;
                                }
                                if (!buffer_mapped(map_bh))
@@ -969,7 +985,7 @@ do_holes:
 
                                /* AKPM: eargh, -ENOTBLK is a hack */
                                if (dio_op == REQ_OP_WRITE) {
-                                       put_page(page);
+                                       dio_unpin_page(dio, page);
                                        return -ENOTBLK;
                                }
 
@@ -982,7 +998,7 @@ do_holes:
                                if (sdio->block_in_file >=
                                                i_size_aligned >> blkbits) {
                                        /* We hit eof */
-                                       put_page(page);
+                                       dio_unpin_page(dio, page);
                                        goto out;
                                }
                                zero_user(page, from, 1 << blkbits);
@@ -1022,7 +1038,7 @@ do_holes:
                                                  sdio->next_block_for_io,
                                                  map_bh);
                        if (ret) {
-                               put_page(page);
+                               dio_unpin_page(dio, page);
                                goto out;
                        }
                        sdio->next_block_for_io += this_chunk_blocks;
@@ -1037,8 +1053,8 @@ next_block:
                                break;
                }
 
-               /* Drop the ref which was taken in get_user_pages() */
-               put_page(page);
+               /* Drop the pin which was taken in iov_iter_extract_pages() */
+               dio_unpin_page(dio, page);
        }
 out:
        return ret;
@@ -1133,6 +1149,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
                /* will be released by direct_io_worker */
                inode_lock(inode);
        }
+       dio->is_pinned = iov_iter_extract_will_pin(iter);
 
        /* Once we sampled i_size check for reads beyond EOF */
        dio->i_size = i_size_read(inode);
@@ -1257,7 +1274,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
                ret2 = dio_send_cur_page(dio, &sdio, &map_bh);
                if (retval == 0)
                        retval = ret2;
-               put_page(sdio.cur_page);
+               dio_unpin_page(dio, sdio.cur_page);
                sdio.cur_page = NULL;
        }
        if (sdio.bio)
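
The direct-io conversion above switches page references to pins taken by iov_iter_extract_pages(), but only when the iterator is user-backed: dio->is_pinned is sampled once from iov_iter_extract_will_pin(), and every former get_page()/put_page() becomes a conditional pin/unpin. A condensed sketch of that pattern, mirroring the helpers introduced above (illustration only):

	dio->is_pinned = iov_iter_extract_will_pin(iter);	/* user-backed iterator? */

	/* take an extra pin (e.g. when a page becomes sdio->cur_page) */
	if (dio->is_pinned)
		folio_add_pin(page_folio(page));

	/* release a single page */
	if (dio->is_pinned)
		unpin_user_page(page);

	/* bulk-release pages that were extracted but never consumed */
	if (dio->is_pinned)
		unpin_user_pages(dio->pages + sdio->head, sdio->tail - sdio->head);
	sdio->head = sdio->tail;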
index 268b744..ce0a3c5 100644 (file)
@@ -44,6 +44,31 @@ static ssize_t ecryptfs_read_update_atime(struct kiocb *iocb,
        return rc;
 }
 
+/*
+ * ecryptfs_splice_read_update_atime
+ *
+ * filemap_splice_read updates the atime of the upper layer inode.  But, it
+ * doesn't give us a chance to update the atime of the lower layer inode.  This
+ * function is a wrapper around filemap_splice_read.  It updates the atime of
+ * the lower level inode if filemap_splice_read returns without any errors.
+ * This is to be used only for file reads.  The function to be used for
+ * directory reads is ecryptfs_read.
+ */
+static ssize_t ecryptfs_splice_read_update_atime(struct file *in, loff_t *ppos,
+                                                struct pipe_inode_info *pipe,
+                                                size_t len, unsigned int flags)
+{
+       ssize_t rc;
+       const struct path *path;
+
+       rc = filemap_splice_read(in, ppos, pipe, len, flags);
+       if (rc >= 0) {
+               path = ecryptfs_dentry_to_lower_path(in->f_path.dentry);
+               touch_atime(path);
+       }
+       return rc;
+}
+
 struct ecryptfs_getdents_callback {
        struct dir_context ctx;
        struct dir_context *caller;
@@ -414,5 +439,5 @@ const struct file_operations ecryptfs_main_fops = {
        .release = ecryptfs_release,
        .fsync = ecryptfs_fsync,
        .fasync = ecryptfs_fasync,
-       .splice_read = generic_file_splice_read,
+       .splice_read = ecryptfs_splice_read_update_atime,
 };
index 704fb59..f259d92 100644 (file)
@@ -121,6 +121,7 @@ config EROFS_FS_PCPU_KTHREAD
 config EROFS_FS_PCPU_KTHREAD_HIPRI
        bool "EROFS high priority per-CPU kthread workers"
        depends on EROFS_FS_ZIP && EROFS_FS_PCPU_KTHREAD
+       default y
        help
          This permits EROFS to configure per-CPU kthread workers to run
          at higher priority.
index 99bbc59..a3a98fc 100644 (file)
@@ -1,8 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
 obj-$(CONFIG_EROFS_FS) += erofs.o
-erofs-objs := super.o inode.o data.o namei.o dir.o utils.o pcpubuf.o sysfs.o
+erofs-objs := super.o inode.o data.o namei.o dir.o utils.o sysfs.o
 erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o
-erofs-$(CONFIG_EROFS_FS_ZIP) += decompressor.o zmap.o zdata.o
+erofs-$(CONFIG_EROFS_FS_ZIP) += decompressor.o zmap.o zdata.o pcpubuf.o
 erofs-$(CONFIG_EROFS_FS_ZIP_LZMA) += decompressor_lzma.o
 erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o
index 26fa170..b1b8465 100644 (file)
@@ -89,8 +89,7 @@ static inline bool erofs_page_is_managed(const struct erofs_sb_info *sbi,
 
 int z_erofs_fixup_insize(struct z_erofs_decompress_req *rq, const char *padbuf,
                         unsigned int padbufsize);
-int z_erofs_decompress(struct z_erofs_decompress_req *rq,
-                      struct page **pagepool);
+extern const struct z_erofs_decompressor erofs_decompressors[];
 
 /* prototypes for specific algorithms */
 int z_erofs_lzma_decompress(struct z_erofs_decompress_req *rq,
index 6fe9a77..db5e4b7 100644 (file)
@@ -448,5 +448,5 @@ const struct file_operations erofs_file_fops = {
        .llseek         = generic_file_llseek,
        .read_iter      = erofs_file_read_iter,
        .mmap           = erofs_file_mmap,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
 };
index 7021e2c..2a29943 100644 (file)
@@ -363,7 +363,7 @@ static int z_erofs_transform_plain(struct z_erofs_decompress_req *rq,
        return 0;
 }
 
-static struct z_erofs_decompressor decompressors[] = {
+const struct z_erofs_decompressor erofs_decompressors[] = {
        [Z_EROFS_COMPRESSION_SHIFTED] = {
                .decompress = z_erofs_transform_plain,
                .name = "shifted"
@@ -383,9 +383,3 @@ static struct z_erofs_decompressor decompressors[] = {
        },
 #endif
 };
-
-int z_erofs_decompress(struct z_erofs_decompress_req *rq,
-                      struct page **pagepool)
-{
-       return decompressors[rq->alg].decompress(rq, pagepool);
-}
index af0431a..36e32fa 100644 (file)
@@ -208,46 +208,12 @@ enum {
        EROFS_ZIP_CACHE_READAROUND
 };
 
-#define EROFS_LOCKED_MAGIC     (INT_MIN | 0xE0F510CCL)
-
 /* basic unit of the workstation of a super_block */
 struct erofs_workgroup {
-       /* the workgroup index in the workstation */
        pgoff_t index;
-
-       /* overall workgroup reference count */
-       atomic_t refcount;
+       struct lockref lockref;
 };
 
-static inline bool erofs_workgroup_try_to_freeze(struct erofs_workgroup *grp,
-                                                int val)
-{
-       preempt_disable();
-       if (val != atomic_cmpxchg(&grp->refcount, val, EROFS_LOCKED_MAGIC)) {
-               preempt_enable();
-               return false;
-       }
-       return true;
-}
-
-static inline void erofs_workgroup_unfreeze(struct erofs_workgroup *grp,
-                                           int orig_val)
-{
-       /*
-        * other observers should notice all modifications
-        * in the freezing period.
-        */
-       smp_mb();
-       atomic_set(&grp->refcount, orig_val);
-       preempt_enable();
-}
-
-static inline int erofs_wait_on_workgroup_freezed(struct erofs_workgroup *grp)
-{
-       return atomic_cond_read_relaxed(&grp->refcount,
-                                       VAL != EROFS_LOCKED_MAGIC);
-}
-
 enum erofs_kmap_type {
        EROFS_NO_KMAP,          /* don't map the buffer */
        EROFS_KMAP,             /* use kmap_local_page() to map the buffer */
@@ -472,12 +438,6 @@ static inline void *erofs_vm_map_ram(struct page **pages, unsigned int count)
        return NULL;
 }
 
-void *erofs_get_pcpubuf(unsigned int requiredpages);
-void erofs_put_pcpubuf(void *ptr);
-int erofs_pcpubuf_growsize(unsigned int nrpages);
-void __init erofs_pcpubuf_init(void);
-void erofs_pcpubuf_exit(void);
-
 int erofs_register_sysfs(struct super_block *sb);
 void erofs_unregister_sysfs(struct super_block *sb);
 int __init erofs_init_sysfs(void);
@@ -492,7 +452,7 @@ static inline void erofs_pagepool_add(struct page **pagepool, struct page *page)
 void erofs_release_pages(struct page **pagepool);
 
 #ifdef CONFIG_EROFS_FS_ZIP
-int erofs_workgroup_put(struct erofs_workgroup *grp);
+void erofs_workgroup_put(struct erofs_workgroup *grp);
 struct erofs_workgroup *erofs_find_workgroup(struct super_block *sb,
                                             pgoff_t index);
 struct erofs_workgroup *erofs_insert_workgroup(struct super_block *sb,
@@ -506,12 +466,17 @@ int __init z_erofs_init_zip_subsystem(void);
 void z_erofs_exit_zip_subsystem(void);
 int erofs_try_to_free_all_cached_pages(struct erofs_sb_info *sbi,
                                       struct erofs_workgroup *egrp);
-int erofs_try_to_free_cached_page(struct page *page);
 int z_erofs_load_lz4_config(struct super_block *sb,
                            struct erofs_super_block *dsb,
                            struct z_erofs_lz4_cfgs *lz4, int len);
 int z_erofs_map_blocks_iter(struct inode *inode, struct erofs_map_blocks *map,
                            int flags);
+void *erofs_get_pcpubuf(unsigned int requiredpages);
+void erofs_put_pcpubuf(void *ptr);
+int erofs_pcpubuf_growsize(unsigned int nrpages);
+void __init erofs_pcpubuf_init(void);
+void erofs_pcpubuf_exit(void);
+int erofs_init_managed_cache(struct super_block *sb);
 #else
 static inline void erofs_shrinker_register(struct super_block *sb) {}
 static inline void erofs_shrinker_unregister(struct super_block *sb) {}
@@ -529,6 +494,9 @@ static inline int z_erofs_load_lz4_config(struct super_block *sb,
        }
        return 0;
 }
+static inline void erofs_pcpubuf_init(void) {}
+static inline void erofs_pcpubuf_exit(void) {}
+static inline int erofs_init_managed_cache(struct super_block *sb) { return 0; }
 #endif /* !CONFIG_EROFS_FS_ZIP */
 
 #ifdef CONFIG_EROFS_FS_ZIP_LZMA
index 811ab66..9d6a3c6 100644 (file)
@@ -19,6 +19,7 @@
 #include <trace/events/erofs.h>
 
 static struct kmem_cache *erofs_inode_cachep __read_mostly;
+struct file_system_type erofs_fs_type;
 
 void _erofs_err(struct super_block *sb, const char *function,
                const char *fmt, ...)
@@ -253,8 +254,8 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
                        return PTR_ERR(fscache);
                dif->fscache = fscache;
        } else if (!sbi->devs->flatdev) {
-               bdev = blkdev_get_by_path(dif->path, FMODE_READ | FMODE_EXCL,
-                                         sb->s_type);
+               bdev = blkdev_get_by_path(dif->path, BLK_OPEN_READ, sb->s_type,
+                                         NULL);
                if (IS_ERR(bdev))
                        return PTR_ERR(bdev);
                dif->bdev = bdev;
@@ -599,68 +600,6 @@ static int erofs_fc_parse_param(struct fs_context *fc,
        return 0;
 }
 
-#ifdef CONFIG_EROFS_FS_ZIP
-static const struct address_space_operations managed_cache_aops;
-
-static bool erofs_managed_cache_release_folio(struct folio *folio, gfp_t gfp)
-{
-       bool ret = true;
-       struct address_space *const mapping = folio->mapping;
-
-       DBG_BUGON(!folio_test_locked(folio));
-       DBG_BUGON(mapping->a_ops != &managed_cache_aops);
-
-       if (folio_test_private(folio))
-               ret = erofs_try_to_free_cached_page(&folio->page);
-
-       return ret;
-}
-
-/*
- * It will be called only on inode eviction. In case that there are still some
- * decompression requests in progress, wait with rescheduling for a bit here.
- * We could introduce an extra locking instead but it seems unnecessary.
- */
-static void erofs_managed_cache_invalidate_folio(struct folio *folio,
-                                              size_t offset, size_t length)
-{
-       const size_t stop = length + offset;
-
-       DBG_BUGON(!folio_test_locked(folio));
-
-       /* Check for potential overflow in debug mode */
-       DBG_BUGON(stop > folio_size(folio) || stop < length);
-
-       if (offset == 0 && stop == folio_size(folio))
-               while (!erofs_managed_cache_release_folio(folio, GFP_NOFS))
-                       cond_resched();
-}
-
-static const struct address_space_operations managed_cache_aops = {
-       .release_folio = erofs_managed_cache_release_folio,
-       .invalidate_folio = erofs_managed_cache_invalidate_folio,
-};
-
-static int erofs_init_managed_cache(struct super_block *sb)
-{
-       struct erofs_sb_info *const sbi = EROFS_SB(sb);
-       struct inode *const inode = new_inode(sb);
-
-       if (!inode)
-               return -ENOMEM;
-
-       set_nlink(inode, 1);
-       inode->i_size = OFFSET_MAX;
-
-       inode->i_mapping->a_ops = &managed_cache_aops;
-       mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
-       sbi->managed_cache = inode;
-       return 0;
-}
-#else
-static int erofs_init_managed_cache(struct super_block *sb) { return 0; }
-#endif
-
 static struct inode *erofs_nfs_get_inode(struct super_block *sb,
                                         u64 ino, u32 generation)
 {
@@ -877,7 +816,7 @@ static int erofs_release_device_info(int id, void *ptr, void *data)
 
        fs_put_dax(dif->dax_dev, NULL);
        if (dif->bdev)
-               blkdev_put(dif->bdev, FMODE_READ | FMODE_EXCL);
+               blkdev_put(dif->bdev, &erofs_fs_type);
        erofs_fscache_unregister_cookie(dif->fscache);
        dif->fscache = NULL;
        kfree(dif->path);
@@ -1016,10 +955,8 @@ static int __init erofs_module_init(void)
                                               sizeof(struct erofs_inode), 0,
                                               SLAB_RECLAIM_ACCOUNT,
                                               erofs_inode_init_once);
-       if (!erofs_inode_cachep) {
-               err = -ENOMEM;
-               goto icache_err;
-       }
+       if (!erofs_inode_cachep)
+               return -ENOMEM;
 
        err = erofs_init_shrinker();
        if (err)
@@ -1054,7 +991,6 @@ lzma_err:
        erofs_exit_shrinker();
 shrinker_err:
        kmem_cache_destroy(erofs_inode_cachep);
-icache_err:
        return err;
 }
 
index 46627cb..cc6fb9e 100644 (file)
@@ -4,7 +4,6 @@
  *             https://www.huawei.com/
  */
 #include "internal.h"
-#include <linux/pagevec.h>
 
 struct page *erofs_allocpage(struct page **pagepool, gfp_t gfp)
 {
@@ -33,22 +32,21 @@ void erofs_release_pages(struct page **pagepool)
 /* global shrink count (for all mounted EROFS instances) */
 static atomic_long_t erofs_global_shrink_cnt;
 
-static int erofs_workgroup_get(struct erofs_workgroup *grp)
+static bool erofs_workgroup_get(struct erofs_workgroup *grp)
 {
-       int o;
+       if (lockref_get_not_zero(&grp->lockref))
+               return true;
 
-repeat:
-       o = erofs_wait_on_workgroup_freezed(grp);
-       if (o <= 0)
-               return -1;
-
-       if (atomic_cmpxchg(&grp->refcount, o, o + 1) != o)
-               goto repeat;
+       spin_lock(&grp->lockref.lock);
+       if (__lockref_is_dead(&grp->lockref)) {
+               spin_unlock(&grp->lockref.lock);
+               return false;
+       }
 
-       /* decrease refcount paired by erofs_workgroup_put */
-       if (o == 1)
+       if (!grp->lockref.count++)
                atomic_long_dec(&erofs_global_shrink_cnt);
-       return 0;
+       spin_unlock(&grp->lockref.lock);
+       return true;
 }
 
 struct erofs_workgroup *erofs_find_workgroup(struct super_block *sb,
@@ -61,7 +59,7 @@ repeat:
        rcu_read_lock();
        grp = xa_load(&sbi->managed_pslots, index);
        if (grp) {
-               if (erofs_workgroup_get(grp)) {
+               if (!erofs_workgroup_get(grp)) {
                        /* prefer to relax rcu read side */
                        rcu_read_unlock();
                        goto repeat;
@@ -80,11 +78,10 @@ struct erofs_workgroup *erofs_insert_workgroup(struct super_block *sb,
        struct erofs_workgroup *pre;
 
        /*
-        * Bump up a reference count before making this visible
-        * to others for the XArray in order to avoid potential
-        * UAF without serialized by xa_lock.
+        * Bump up before making this visible to others for the XArray in order
+        * to avoid potential UAF without serialized by xa_lock.
         */
-       atomic_inc(&grp->refcount);
+       lockref_get(&grp->lockref);
 
 repeat:
        xa_lock(&sbi->managed_pslots);
@@ -93,13 +90,13 @@ repeat:
        if (pre) {
                if (xa_is_err(pre)) {
                        pre = ERR_PTR(xa_err(pre));
-               } else if (erofs_workgroup_get(pre)) {
+               } else if (!erofs_workgroup_get(pre)) {
                        /* try to legitimize the current in-tree one */
                        xa_unlock(&sbi->managed_pslots);
                        cond_resched();
                        goto repeat;
                }
-               atomic_dec(&grp->refcount);
+               lockref_put_return(&grp->lockref);
                grp = pre;
        }
        xa_unlock(&sbi->managed_pslots);
@@ -112,38 +109,34 @@ static void  __erofs_workgroup_free(struct erofs_workgroup *grp)
        erofs_workgroup_free_rcu(grp);
 }
 
-int erofs_workgroup_put(struct erofs_workgroup *grp)
+void erofs_workgroup_put(struct erofs_workgroup *grp)
 {
-       int count = atomic_dec_return(&grp->refcount);
+       if (lockref_put_or_lock(&grp->lockref))
+               return;
 
-       if (count == 1)
+       DBG_BUGON(__lockref_is_dead(&grp->lockref));
+       if (grp->lockref.count == 1)
                atomic_long_inc(&erofs_global_shrink_cnt);
-       else if (!count)
-               __erofs_workgroup_free(grp);
-       return count;
+       --grp->lockref.count;
+       spin_unlock(&grp->lockref.lock);
 }
 
 static bool erofs_try_to_release_workgroup(struct erofs_sb_info *sbi,
                                           struct erofs_workgroup *grp)
 {
-       /*
-        * If managed cache is on, refcount of workgroups
-        * themselves could be < 0 (freezed). In other words,
-        * there is no guarantee that all refcounts > 0.
-        */
-       if (!erofs_workgroup_try_to_freeze(grp, 1))
-               return false;
+       int free = false;
+
+       spin_lock(&grp->lockref.lock);
+       if (grp->lockref.count)
+               goto out;
 
        /*
-        * Note that all cached pages should be unattached
-        * before deleted from the XArray. Otherwise some
-        * cached pages could be still attached to the orphan
-        * old workgroup when the new one is available in the tree.
+        * Note that all cached pages should be detached before deleted from
+        * the XArray. Otherwise some cached pages could be still attached to
+        * the orphan old workgroup when the new one is available in the tree.
         */
-       if (erofs_try_to_free_all_cached_pages(sbi, grp)) {
-               erofs_workgroup_unfreeze(grp, 1);
-               return false;
-       }
+       if (erofs_try_to_free_all_cached_pages(sbi, grp))
+               goto out;
 
        /*
         * It's impossible to fail after the workgroup is freezed,
@@ -152,10 +145,13 @@ static bool erofs_try_to_release_workgroup(struct erofs_sb_info *sbi,
         */
        DBG_BUGON(__xa_erase(&sbi->managed_pslots, grp->index) != grp);
 
-       /* last refcount should be connected with its managed pslot.  */
-       erofs_workgroup_unfreeze(grp, 0);
-       __erofs_workgroup_free(grp);
-       return true;
+       lockref_mark_dead(&grp->lockref);
+       free = true;
+out:
+       spin_unlock(&grp->lockref.lock);
+       if (free)
+               __erofs_workgroup_free(grp);
+       return free;
 }
 
 static unsigned long erofs_shrink_workstation(struct erofs_sb_info *sbi,
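
The workgroup rework above drops the open-coded freeze-by-cmpxchg reference count in favour of a struct lockref: lockref_get_not_zero() serves as the lockless fast path, while the slow paths take lockref.lock and manipulate count directly, with __lockref_is_dead()/lockref_mark_dead() marking a workgroup that is being torn down. A condensed sketch of the get/put pair as it appears above (names shortened; grp_get/grp_put are not the real function names):

	static bool grp_get(struct erofs_workgroup *grp)
	{
		if (lockref_get_not_zero(&grp->lockref))	/* fast path */
			return true;
		spin_lock(&grp->lockref.lock);
		if (__lockref_is_dead(&grp->lockref)) {		/* being torn down */
			spin_unlock(&grp->lockref.lock);
			return false;
		}
		if (!grp->lockref.count++)			/* 0 -> 1: no longer shrinkable */
			atomic_long_dec(&erofs_global_shrink_cnt);
		spin_unlock(&grp->lockref.lock);
		return true;
	}

	static void grp_put(struct erofs_workgroup *grp)
	{
		if (lockref_put_or_lock(&grp->lockref))		/* count was > 1 */
			return;
		if (grp->lockref.count == 1)			/* 1 -> 0: becomes shrinkable */
			atomic_long_inc(&erofs_global_shrink_cnt);
		--grp->lockref.count;
		spin_unlock(&grp->lockref.lock);
	}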
index cd80499..40178b6 100644 (file)
@@ -7,32 +7,27 @@
 #include <linux/security.h>
 #include "xattr.h"
 
-static inline erofs_blk_t erofs_xattr_blkaddr(struct super_block *sb,
-                                             unsigned int xattr_id)
-{
-       return EROFS_SB(sb)->xattr_blkaddr +
-              erofs_blknr(sb, xattr_id * sizeof(__u32));
-}
-
-static inline unsigned int erofs_xattr_blkoff(struct super_block *sb,
-                                             unsigned int xattr_id)
-{
-       return erofs_blkoff(sb, xattr_id * sizeof(__u32));
-}
-
-struct xattr_iter {
+struct erofs_xattr_iter {
        struct super_block *sb;
        struct erofs_buf buf;
+       erofs_off_t pos;
        void *kaddr;
 
-       erofs_blk_t blkaddr;
-       unsigned int ofs;
+       char *buffer;
+       int buffer_size, buffer_ofs;
+
+       /* getxattr */
+       int index, infix_len;
+       struct qstr name;
+
+       /* listxattr */
+       struct dentry *dentry;
 };
 
 static int erofs_init_inode_xattrs(struct inode *inode)
 {
        struct erofs_inode *const vi = EROFS_I(inode);
-       struct xattr_iter it;
+       struct erofs_xattr_iter it;
        unsigned int i;
        struct erofs_xattr_ibody_header *ih;
        struct super_block *sb = inode->i_sb;
@@ -81,17 +76,17 @@ static int erofs_init_inode_xattrs(struct inode *inode)
        }
 
        it.buf = __EROFS_BUF_INITIALIZER;
-       it.blkaddr = erofs_blknr(sb, erofs_iloc(inode) + vi->inode_isize);
-       it.ofs = erofs_blkoff(sb, erofs_iloc(inode) + vi->inode_isize);
+       erofs_init_metabuf(&it.buf, sb);
+       it.pos = erofs_iloc(inode) + vi->inode_isize;
 
        /* read in shared xattr array (non-atomic, see kmalloc below) */
-       it.kaddr = erofs_read_metabuf(&it.buf, sb, it.blkaddr, EROFS_KMAP);
+       it.kaddr = erofs_bread(&it.buf, erofs_blknr(sb, it.pos), EROFS_KMAP);
        if (IS_ERR(it.kaddr)) {
                ret = PTR_ERR(it.kaddr);
                goto out_unlock;
        }
 
-       ih = (struct erofs_xattr_ibody_header *)(it.kaddr + it.ofs);
+       ih = it.kaddr + erofs_blkoff(sb, it.pos);
        vi->xattr_shared_count = ih->h_shared_count;
        vi->xattr_shared_xattrs = kmalloc_array(vi->xattr_shared_count,
                                                sizeof(uint), GFP_KERNEL);
@@ -102,26 +97,20 @@ static int erofs_init_inode_xattrs(struct inode *inode)
        }
 
        /* let's skip ibody header */
-       it.ofs += sizeof(struct erofs_xattr_ibody_header);
+       it.pos += sizeof(struct erofs_xattr_ibody_header);
 
        for (i = 0; i < vi->xattr_shared_count; ++i) {
-               if (it.ofs >= sb->s_blocksize) {
-                       /* cannot be unaligned */
-                       DBG_BUGON(it.ofs != sb->s_blocksize);
-
-                       it.kaddr = erofs_read_metabuf(&it.buf, sb, ++it.blkaddr,
-                                                     EROFS_KMAP);
-                       if (IS_ERR(it.kaddr)) {
-                               kfree(vi->xattr_shared_xattrs);
-                               vi->xattr_shared_xattrs = NULL;
-                               ret = PTR_ERR(it.kaddr);
-                               goto out_unlock;
-                       }
-                       it.ofs = 0;
+               it.kaddr = erofs_bread(&it.buf, erofs_blknr(sb, it.pos),
+                                      EROFS_KMAP);
+               if (IS_ERR(it.kaddr)) {
+                       kfree(vi->xattr_shared_xattrs);
+                       vi->xattr_shared_xattrs = NULL;
+                       ret = PTR_ERR(it.kaddr);
+                       goto out_unlock;
                }
-               vi->xattr_shared_xattrs[i] =
-                       le32_to_cpu(*(__le32 *)(it.kaddr + it.ofs));
-               it.ofs += sizeof(__le32);
+               vi->xattr_shared_xattrs[i] = le32_to_cpu(*(__le32 *)
+                               (it.kaddr + erofs_blkoff(sb, it.pos)));
+               it.pos += sizeof(__le32);
        }
        erofs_put_metabuf(&it.buf);
 
@@ -134,287 +123,6 @@ out_unlock:
        return ret;
 }
 
-/*
- * the general idea for these return values is
- * if    0 is returned, go on processing the current xattr;
- *       1 (> 0) is returned, skip this round to process the next xattr;
- *    -err (< 0) is returned, an error (maybe ENOXATTR) occurred
- *                            and need to be handled
- */
-struct xattr_iter_handlers {
-       int (*entry)(struct xattr_iter *_it, struct erofs_xattr_entry *entry);
-       int (*name)(struct xattr_iter *_it, unsigned int processed, char *buf,
-                   unsigned int len);
-       int (*alloc_buffer)(struct xattr_iter *_it, unsigned int value_sz);
-       void (*value)(struct xattr_iter *_it, unsigned int processed, char *buf,
-                     unsigned int len);
-};
-
-static inline int xattr_iter_fixup(struct xattr_iter *it)
-{
-       if (it->ofs < it->sb->s_blocksize)
-               return 0;
-
-       it->blkaddr += erofs_blknr(it->sb, it->ofs);
-       it->kaddr = erofs_read_metabuf(&it->buf, it->sb, it->blkaddr,
-                                      EROFS_KMAP);
-       if (IS_ERR(it->kaddr))
-               return PTR_ERR(it->kaddr);
-       it->ofs = erofs_blkoff(it->sb, it->ofs);
-       return 0;
-}
-
-static int inline_xattr_iter_begin(struct xattr_iter *it,
-                                  struct inode *inode)
-{
-       struct erofs_inode *const vi = EROFS_I(inode);
-       unsigned int xattr_header_sz, inline_xattr_ofs;
-
-       xattr_header_sz = sizeof(struct erofs_xattr_ibody_header) +
-                         sizeof(u32) * vi->xattr_shared_count;
-       if (xattr_header_sz >= vi->xattr_isize) {
-               DBG_BUGON(xattr_header_sz > vi->xattr_isize);
-               return -ENOATTR;
-       }
-
-       inline_xattr_ofs = vi->inode_isize + xattr_header_sz;
-
-       it->blkaddr = erofs_blknr(it->sb, erofs_iloc(inode) + inline_xattr_ofs);
-       it->ofs = erofs_blkoff(it->sb, erofs_iloc(inode) + inline_xattr_ofs);
-       it->kaddr = erofs_read_metabuf(&it->buf, inode->i_sb, it->blkaddr,
-                                      EROFS_KMAP);
-       if (IS_ERR(it->kaddr))
-               return PTR_ERR(it->kaddr);
-       return vi->xattr_isize - xattr_header_sz;
-}
-
-/*
- * Regardless of success or failure, `xattr_foreach' will end up with
- * `ofs' pointing to the next xattr item rather than an arbitrary position.
- */
-static int xattr_foreach(struct xattr_iter *it,
-                        const struct xattr_iter_handlers *op,
-                        unsigned int *tlimit)
-{
-       struct erofs_xattr_entry entry;
-       unsigned int value_sz, processed, slice;
-       int err;
-
-       /* 0. fixup blkaddr, ofs, ipage */
-       err = xattr_iter_fixup(it);
-       if (err)
-               return err;
-
-       /*
-        * 1. read xattr entry to the memory,
-        *    since we do EROFS_XATTR_ALIGN
-        *    therefore entry should be in the page
-        */
-       entry = *(struct erofs_xattr_entry *)(it->kaddr + it->ofs);
-       if (tlimit) {
-               unsigned int entry_sz = erofs_xattr_entry_size(&entry);
-
-               /* xattr on-disk corruption: xattr entry beyond xattr_isize */
-               if (*tlimit < entry_sz) {
-                       DBG_BUGON(1);
-                       return -EFSCORRUPTED;
-               }
-               *tlimit -= entry_sz;
-       }
-
-       it->ofs += sizeof(struct erofs_xattr_entry);
-       value_sz = le16_to_cpu(entry.e_value_size);
-
-       /* handle entry */
-       err = op->entry(it, &entry);
-       if (err) {
-               it->ofs += entry.e_name_len + value_sz;
-               goto out;
-       }
-
-       /* 2. handle xattr name (ofs will finally be at the end of name) */
-       processed = 0;
-
-       while (processed < entry.e_name_len) {
-               if (it->ofs >= it->sb->s_blocksize) {
-                       DBG_BUGON(it->ofs > it->sb->s_blocksize);
-
-                       err = xattr_iter_fixup(it);
-                       if (err)
-                               goto out;
-                       it->ofs = 0;
-               }
-
-               slice = min_t(unsigned int, it->sb->s_blocksize - it->ofs,
-                             entry.e_name_len - processed);
-
-               /* handle name */
-               err = op->name(it, processed, it->kaddr + it->ofs, slice);
-               if (err) {
-                       it->ofs += entry.e_name_len - processed + value_sz;
-                       goto out;
-               }
-
-               it->ofs += slice;
-               processed += slice;
-       }
-
-       /* 3. handle xattr value */
-       processed = 0;
-
-       if (op->alloc_buffer) {
-               err = op->alloc_buffer(it, value_sz);
-               if (err) {
-                       it->ofs += value_sz;
-                       goto out;
-               }
-       }
-
-       while (processed < value_sz) {
-               if (it->ofs >= it->sb->s_blocksize) {
-                       DBG_BUGON(it->ofs > it->sb->s_blocksize);
-
-                       err = xattr_iter_fixup(it);
-                       if (err)
-                               goto out;
-                       it->ofs = 0;
-               }
-
-               slice = min_t(unsigned int, it->sb->s_blocksize - it->ofs,
-                             value_sz - processed);
-               op->value(it, processed, it->kaddr + it->ofs, slice);
-               it->ofs += slice;
-               processed += slice;
-       }
-
-out:
-       /* xattrs should be 4-byte aligned (on-disk constraint) */
-       it->ofs = EROFS_XATTR_ALIGN(it->ofs);
-       return err < 0 ? err : 0;
-}
-
-struct getxattr_iter {
-       struct xattr_iter it;
-
-       char *buffer;
-       int buffer_size, index, infix_len;
-       struct qstr name;
-};
-
-static int erofs_xattr_long_entrymatch(struct getxattr_iter *it,
-                                      struct erofs_xattr_entry *entry)
-{
-       struct erofs_sb_info *sbi = EROFS_SB(it->it.sb);
-       struct erofs_xattr_prefix_item *pf = sbi->xattr_prefixes +
-               (entry->e_name_index & EROFS_XATTR_LONG_PREFIX_MASK);
-
-       if (pf >= sbi->xattr_prefixes + sbi->xattr_prefix_count)
-               return -ENOATTR;
-
-       if (it->index != pf->prefix->base_index ||
-           it->name.len != entry->e_name_len + pf->infix_len)
-               return -ENOATTR;
-
-       if (memcmp(it->name.name, pf->prefix->infix, pf->infix_len))
-               return -ENOATTR;
-
-       it->infix_len = pf->infix_len;
-       return 0;
-}
-
-static int xattr_entrymatch(struct xattr_iter *_it,
-                           struct erofs_xattr_entry *entry)
-{
-       struct getxattr_iter *it = container_of(_it, struct getxattr_iter, it);
-
-       /* should also match the infix for long name prefixes */
-       if (entry->e_name_index & EROFS_XATTR_LONG_PREFIX)
-               return erofs_xattr_long_entrymatch(it, entry);
-
-       if (it->index != entry->e_name_index ||
-           it->name.len != entry->e_name_len)
-               return -ENOATTR;
-       it->infix_len = 0;
-       return 0;
-}
-
-static int xattr_namematch(struct xattr_iter *_it,
-                          unsigned int processed, char *buf, unsigned int len)
-{
-       struct getxattr_iter *it = container_of(_it, struct getxattr_iter, it);
-
-       if (memcmp(buf, it->name.name + it->infix_len + processed, len))
-               return -ENOATTR;
-       return 0;
-}
-
-static int xattr_checkbuffer(struct xattr_iter *_it,
-                            unsigned int value_sz)
-{
-       struct getxattr_iter *it = container_of(_it, struct getxattr_iter, it);
-       int err = it->buffer_size < value_sz ? -ERANGE : 0;
-
-       it->buffer_size = value_sz;
-       return !it->buffer ? 1 : err;
-}
-
-static void xattr_copyvalue(struct xattr_iter *_it,
-                           unsigned int processed,
-                           char *buf, unsigned int len)
-{
-       struct getxattr_iter *it = container_of(_it, struct getxattr_iter, it);
-
-       memcpy(it->buffer + processed, buf, len);
-}
-
-static const struct xattr_iter_handlers find_xattr_handlers = {
-       .entry = xattr_entrymatch,
-       .name = xattr_namematch,
-       .alloc_buffer = xattr_checkbuffer,
-       .value = xattr_copyvalue
-};
-
-static int inline_getxattr(struct inode *inode, struct getxattr_iter *it)
-{
-       int ret;
-       unsigned int remaining;
-
-       ret = inline_xattr_iter_begin(&it->it, inode);
-       if (ret < 0)
-               return ret;
-
-       remaining = ret;
-       while (remaining) {
-               ret = xattr_foreach(&it->it, &find_xattr_handlers, &remaining);
-               if (ret != -ENOATTR)
-                       break;
-       }
-       return ret ? ret : it->buffer_size;
-}
-
-static int shared_getxattr(struct inode *inode, struct getxattr_iter *it)
-{
-       struct erofs_inode *const vi = EROFS_I(inode);
-       struct super_block *const sb = it->it.sb;
-       unsigned int i, xsid;
-       int ret = -ENOATTR;
-
-       for (i = 0; i < vi->xattr_shared_count; ++i) {
-               xsid = vi->xattr_shared_xattrs[i];
-               it->it.blkaddr = erofs_xattr_blkaddr(sb, xsid);
-               it->it.ofs = erofs_xattr_blkoff(sb, xsid);
-               it->it.kaddr = erofs_read_metabuf(&it->it.buf, sb,
-                                                 it->it.blkaddr, EROFS_KMAP);
-               if (IS_ERR(it->it.kaddr))
-                       return PTR_ERR(it->it.kaddr);
-
-               ret = xattr_foreach(&it->it, &find_xattr_handlers, NULL);
-               if (ret != -ENOATTR)
-                       break;
-       }
-       return ret ? ret : it->buffer_size;
-}
-
 static bool erofs_xattr_user_list(struct dentry *dentry)
 {
        return test_opt(&EROFS_SB(dentry->d_sb)->opt, XATTR_USER);
@@ -425,39 +133,6 @@ static bool erofs_xattr_trusted_list(struct dentry *dentry)
        return capable(CAP_SYS_ADMIN);
 }
 
-int erofs_getxattr(struct inode *inode, int index,
-                  const char *name,
-                  void *buffer, size_t buffer_size)
-{
-       int ret;
-       struct getxattr_iter it;
-
-       if (!name)
-               return -EINVAL;
-
-       ret = erofs_init_inode_xattrs(inode);
-       if (ret)
-               return ret;
-
-       it.index = index;
-       it.name.len = strlen(name);
-       if (it.name.len > EROFS_NAME_LEN)
-               return -ERANGE;
-
-       it.it.buf = __EROFS_BUF_INITIALIZER;
-       it.name.name = name;
-
-       it.buffer = buffer;
-       it.buffer_size = buffer_size;
-
-       it.it.sb = inode->i_sb;
-       ret = inline_getxattr(inode, &it);
-       if (ret == -ENOATTR)
-               ret = shared_getxattr(inode, &it);
-       erofs_put_metabuf(&it.it.buf);
-       return ret;
-}
-
 static int erofs_xattr_generic_get(const struct xattr_handler *handler,
                                   struct dentry *unused, struct inode *inode,
                                   const char *name, void *buffer, size_t size)
@@ -500,30 +175,49 @@ const struct xattr_handler *erofs_xattr_handlers[] = {
        NULL,
 };
 
-struct listxattr_iter {
-       struct xattr_iter it;
-
-       struct dentry *dentry;
-       char *buffer;
-       int buffer_size, buffer_ofs;
-};
+static int erofs_xattr_copy_to_buffer(struct erofs_xattr_iter *it,
+                                     unsigned int len)
+{
+       unsigned int slice, processed;
+       struct super_block *sb = it->sb;
+       void *src;
+
+       for (processed = 0; processed < len; processed += slice) {
+               it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
+                                       EROFS_KMAP);
+               if (IS_ERR(it->kaddr))
+                       return PTR_ERR(it->kaddr);
+
+               src = it->kaddr + erofs_blkoff(sb, it->pos);
+               slice = min_t(unsigned int, sb->s_blocksize -
+                               erofs_blkoff(sb, it->pos), len - processed);
+               memcpy(it->buffer + it->buffer_ofs, src, slice);
+               it->buffer_ofs += slice;
+               it->pos += slice;
+       }
+       return 0;
+}
 
-static int xattr_entrylist(struct xattr_iter *_it,
-                          struct erofs_xattr_entry *entry)
+static int erofs_listxattr_foreach(struct erofs_xattr_iter *it)
 {
-       struct listxattr_iter *it =
-               container_of(_it, struct listxattr_iter, it);
-       unsigned int base_index = entry->e_name_index;
-       unsigned int prefix_len, infix_len = 0;
+       struct erofs_xattr_entry entry;
+       unsigned int base_index, name_total, prefix_len, infix_len = 0;
        const char *prefix, *infix = NULL;
+       int err;
+
+       /* 1. handle xattr entry */
+       entry = *(struct erofs_xattr_entry *)
+                       (it->kaddr + erofs_blkoff(it->sb, it->pos));
+       it->pos += sizeof(struct erofs_xattr_entry);
 
-       if (entry->e_name_index & EROFS_XATTR_LONG_PREFIX) {
-               struct erofs_sb_info *sbi = EROFS_SB(_it->sb);
+       base_index = entry.e_name_index;
+       if (entry.e_name_index & EROFS_XATTR_LONG_PREFIX) {
+               struct erofs_sb_info *sbi = EROFS_SB(it->sb);
                struct erofs_xattr_prefix_item *pf = sbi->xattr_prefixes +
-                       (entry->e_name_index & EROFS_XATTR_LONG_PREFIX_MASK);
+                       (entry.e_name_index & EROFS_XATTR_LONG_PREFIX_MASK);
 
                if (pf >= sbi->xattr_prefixes + sbi->xattr_prefix_count)
-                       return 1;
+                       return 0;
                infix = pf->prefix->infix;
                infix_len = pf->infix_len;
                base_index = pf->prefix->base_index;
@@ -531,120 +225,228 @@ static int xattr_entrylist(struct xattr_iter *_it,
 
        prefix = erofs_xattr_prefix(base_index, it->dentry);
        if (!prefix)
-               return 1;
+               return 0;
        prefix_len = strlen(prefix);
+       name_total = prefix_len + infix_len + entry.e_name_len + 1;
 
        if (!it->buffer) {
-               it->buffer_ofs += prefix_len + infix_len +
-                                       entry->e_name_len + 1;
-               return 1;
+               it->buffer_ofs += name_total;
+               return 0;
        }
 
-       if (it->buffer_ofs + prefix_len + infix_len +
-               + entry->e_name_len + 1 > it->buffer_size)
+       if (it->buffer_ofs + name_total > it->buffer_size)
                return -ERANGE;
 
        memcpy(it->buffer + it->buffer_ofs, prefix, prefix_len);
        memcpy(it->buffer + it->buffer_ofs + prefix_len, infix, infix_len);
        it->buffer_ofs += prefix_len + infix_len;
-       return 0;
-}
 
-static int xattr_namelist(struct xattr_iter *_it,
-                         unsigned int processed, char *buf, unsigned int len)
-{
-       struct listxattr_iter *it =
-               container_of(_it, struct listxattr_iter, it);
+       /* 2. handle xattr name */
+       err = erofs_xattr_copy_to_buffer(it, entry.e_name_len);
+       if (err)
+               return err;
 
-       memcpy(it->buffer + it->buffer_ofs, buf, len);
-       it->buffer_ofs += len;
+       it->buffer[it->buffer_ofs++] = '\0';
        return 0;
 }
 
-static int xattr_skipvalue(struct xattr_iter *_it,
-                          unsigned int value_sz)
+static int erofs_getxattr_foreach(struct erofs_xattr_iter *it)
 {
-       struct listxattr_iter *it =
-               container_of(_it, struct listxattr_iter, it);
+       struct super_block *sb = it->sb;
+       struct erofs_xattr_entry entry;
+       unsigned int slice, processed, value_sz;
 
-       it->buffer[it->buffer_ofs++] = '\0';
-       return 1;
-}
+       /* 1. handle xattr entry */
+       entry = *(struct erofs_xattr_entry *)
+                       (it->kaddr + erofs_blkoff(sb, it->pos));
+       it->pos += sizeof(struct erofs_xattr_entry);
+       value_sz = le16_to_cpu(entry.e_value_size);
 
-static const struct xattr_iter_handlers list_xattr_handlers = {
-       .entry = xattr_entrylist,
-       .name = xattr_namelist,
-       .alloc_buffer = xattr_skipvalue,
-       .value = NULL
-};
+       /* should also match the infix for long name prefixes */
+       if (entry.e_name_index & EROFS_XATTR_LONG_PREFIX) {
+               struct erofs_sb_info *sbi = EROFS_SB(sb);
+               struct erofs_xattr_prefix_item *pf = sbi->xattr_prefixes +
+                       (entry.e_name_index & EROFS_XATTR_LONG_PREFIX_MASK);
+
+               if (pf >= sbi->xattr_prefixes + sbi->xattr_prefix_count)
+                       return -ENOATTR;
+
+               if (it->index != pf->prefix->base_index ||
+                   it->name.len != entry.e_name_len + pf->infix_len)
+                       return -ENOATTR;
+
+               if (memcmp(it->name.name, pf->prefix->infix, pf->infix_len))
+                       return -ENOATTR;
+
+               it->infix_len = pf->infix_len;
+       } else {
+               if (it->index != entry.e_name_index ||
+                   it->name.len != entry.e_name_len)
+                       return -ENOATTR;
 
-static int inline_listxattr(struct listxattr_iter *it)
+               it->infix_len = 0;
+       }
+
+       /* 2. handle xattr name */
+       for (processed = 0; processed < entry.e_name_len; processed += slice) {
+               it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
+                                       EROFS_KMAP);
+               if (IS_ERR(it->kaddr))
+                       return PTR_ERR(it->kaddr);
+
+               slice = min_t(unsigned int,
+                               sb->s_blocksize - erofs_blkoff(sb, it->pos),
+                               entry.e_name_len - processed);
+               if (memcmp(it->name.name + it->infix_len + processed,
+                          it->kaddr + erofs_blkoff(sb, it->pos), slice))
+                       return -ENOATTR;
+               it->pos += slice;
+       }
+
+       /* 3. handle xattr value */
+       if (!it->buffer) {
+               it->buffer_ofs = value_sz;
+               return 0;
+       }
+
+       if (it->buffer_size < value_sz)
+               return -ERANGE;
+
+       return erofs_xattr_copy_to_buffer(it, value_sz);
+}
+
+static int erofs_xattr_iter_inline(struct erofs_xattr_iter *it,
+                                  struct inode *inode, bool getxattr)
 {
+       struct erofs_inode *const vi = EROFS_I(inode);
+       unsigned int xattr_header_sz, remaining, entry_sz;
+       erofs_off_t next_pos;
        int ret;
-       unsigned int remaining;
 
-       ret = inline_xattr_iter_begin(&it->it, d_inode(it->dentry));
-       if (ret < 0)
-               return ret;
+       xattr_header_sz = sizeof(struct erofs_xattr_ibody_header) +
+                         sizeof(u32) * vi->xattr_shared_count;
+       if (xattr_header_sz >= vi->xattr_isize) {
+               DBG_BUGON(xattr_header_sz > vi->xattr_isize);
+               return -ENOATTR;
+       }
+
+       remaining = vi->xattr_isize - xattr_header_sz;
+       it->pos = erofs_iloc(inode) + vi->inode_isize + xattr_header_sz;
 
-       remaining = ret;
        while (remaining) {
-               ret = xattr_foreach(&it->it, &list_xattr_handlers, &remaining);
-               if (ret)
+               it->kaddr = erofs_bread(&it->buf, erofs_blknr(it->sb, it->pos),
+                                       EROFS_KMAP);
+               if (IS_ERR(it->kaddr))
+                       return PTR_ERR(it->kaddr);
+
+               entry_sz = erofs_xattr_entry_size(it->kaddr +
+                               erofs_blkoff(it->sb, it->pos));
+               /* xattr on-disk corruption: xattr entry beyond xattr_isize */
+               if (remaining < entry_sz) {
+                       DBG_BUGON(1);
+                       return -EFSCORRUPTED;
+               }
+               remaining -= entry_sz;
+               next_pos = it->pos + entry_sz;
+
+               if (getxattr)
+                       ret = erofs_getxattr_foreach(it);
+               else
+                       ret = erofs_listxattr_foreach(it);
+               if ((getxattr && ret != -ENOATTR) || (!getxattr && ret))
                        break;
+
+               it->pos = next_pos;
        }
-       return ret ? ret : it->buffer_ofs;
+       return ret;
 }
 
-static int shared_listxattr(struct listxattr_iter *it)
+static int erofs_xattr_iter_shared(struct erofs_xattr_iter *it,
+                                  struct inode *inode, bool getxattr)
 {
-       struct inode *const inode = d_inode(it->dentry);
        struct erofs_inode *const vi = EROFS_I(inode);
-       struct super_block *const sb = it->it.sb;
-       unsigned int i, xsid;
-       int ret = 0;
+       struct super_block *const sb = it->sb;
+       struct erofs_sb_info *sbi = EROFS_SB(sb);
+       unsigned int i;
+       int ret = -ENOATTR;
 
        for (i = 0; i < vi->xattr_shared_count; ++i) {
-               xsid = vi->xattr_shared_xattrs[i];
-               it->it.blkaddr = erofs_xattr_blkaddr(sb, xsid);
-               it->it.ofs = erofs_xattr_blkoff(sb, xsid);
-               it->it.kaddr = erofs_read_metabuf(&it->it.buf, sb,
-                                                 it->it.blkaddr, EROFS_KMAP);
-               if (IS_ERR(it->it.kaddr))
-                       return PTR_ERR(it->it.kaddr);
-
-               ret = xattr_foreach(&it->it, &list_xattr_handlers, NULL);
-               if (ret)
+               it->pos = erofs_pos(sb, sbi->xattr_blkaddr) +
+                               vi->xattr_shared_xattrs[i] * sizeof(__le32);
+               it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
+                                       EROFS_KMAP);
+               if (IS_ERR(it->kaddr))
+                       return PTR_ERR(it->kaddr);
+
+               if (getxattr)
+                       ret = erofs_getxattr_foreach(it);
+               else
+                       ret = erofs_listxattr_foreach(it);
+               if ((getxattr && ret != -ENOATTR) || (!getxattr && ret))
                        break;
        }
-       return ret ? ret : it->buffer_ofs;
+       return ret;
+}
+
+int erofs_getxattr(struct inode *inode, int index, const char *name,
+                  void *buffer, size_t buffer_size)
+{
+       int ret;
+       struct erofs_xattr_iter it;
+
+       if (!name)
+               return -EINVAL;
+
+       ret = erofs_init_inode_xattrs(inode);
+       if (ret)
+               return ret;
+
+       it.index = index;
+       it.name = (struct qstr)QSTR_INIT(name, strlen(name));
+       if (it.name.len > EROFS_NAME_LEN)
+               return -ERANGE;
+
+       it.sb = inode->i_sb;
+       it.buf = __EROFS_BUF_INITIALIZER;
+       erofs_init_metabuf(&it.buf, it.sb);
+       it.buffer = buffer;
+       it.buffer_size = buffer_size;
+       it.buffer_ofs = 0;
+
+       ret = erofs_xattr_iter_inline(&it, inode, true);
+       if (ret == -ENOATTR)
+               ret = erofs_xattr_iter_shared(&it, inode, true);
+       erofs_put_metabuf(&it.buf);
+       return ret ? ret : it.buffer_ofs;
 }
 
-ssize_t erofs_listxattr(struct dentry *dentry,
-                       char *buffer, size_t buffer_size)
+ssize_t erofs_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size)
 {
        int ret;
-       struct listxattr_iter it;
+       struct erofs_xattr_iter it;
+       struct inode *inode = d_inode(dentry);
 
-       ret = erofs_init_inode_xattrs(d_inode(dentry));
+       ret = erofs_init_inode_xattrs(inode);
        if (ret == -ENOATTR)
                return 0;
        if (ret)
                return ret;
 
-       it.it.buf = __EROFS_BUF_INITIALIZER;
+       it.sb = dentry->d_sb;
+       it.buf = __EROFS_BUF_INITIALIZER;
+       erofs_init_metabuf(&it.buf, it.sb);
        it.dentry = dentry;
        it.buffer = buffer;
        it.buffer_size = buffer_size;
        it.buffer_ofs = 0;
 
-       it.it.sb = dentry->d_sb;
-
-       ret = inline_listxattr(&it);
-       if (ret >= 0 || ret == -ENOATTR)
-               ret = shared_listxattr(&it);
-       erofs_put_metabuf(&it.it.buf);
-       return ret;
+       ret = erofs_xattr_iter_inline(&it, inode, false);
+       if (!ret || ret == -ENOATTR)
+               ret = erofs_xattr_iter_shared(&it, inode, false);
+       if (ret == -ENOATTR)
+               ret = 0;
+       erofs_put_metabuf(&it.buf);
+       return ret ? ret : it.buffer_ofs;
 }
 
 void erofs_xattr_prefixes_cleanup(struct super_block *sb)
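For reference, a minimal sketch (hypothetical caller, not part of this series) of how the new erofs_getxattr() above behaves: a NULL buffer returns only the value size, an undersized buffer yields -ERANGE, a missing attribute yields -ENOATTR, and success returns the number of bytes copied.

	/* Hypothetical kernel-internal helper; illustrates the call pattern only. */
	static int example_fetch_xattr(struct inode *inode, int index,
				       const char *name, void **out)
	{
		int len = erofs_getxattr(inode, index, name, NULL, 0);

		if (len < 0)		/* -ENOATTR, -ERANGE, -EFSCORRUPTED, ... */
			return len;
		*out = kmalloc(len, GFP_KERNEL);
		if (!*out)
			return -ENOMEM;
		return erofs_getxattr(inode, index, name, *out, len);
	}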
@@ -675,7 +477,7 @@ int erofs_xattr_prefixes_init(struct super_block *sb)
        if (!pfs)
                return -ENOMEM;
 
-       if (erofs_sb_has_fragments(sbi))
+       if (sbi->packed_inode)
                buf.inode = sbi->packed_inode;
        else
                erofs_init_metabuf(&buf, sb);
index 45f21db..5f1890e 100644 (file)
@@ -5,7 +5,6 @@
  * Copyright (C) 2022 Alibaba Cloud
  */
 #include "compress.h"
-#include <linux/prefetch.h>
 #include <linux/psi.h>
 #include <linux/cpuhotplug.h>
 #include <trace/events/erofs.h>
@@ -92,13 +91,8 @@ struct z_erofs_pcluster {
        struct z_erofs_bvec compressed_bvecs[];
 };
 
-/* let's avoid the valid 32-bit kernel addresses */
-
-/* the chained workgroup has't submitted io (still open) */
-#define Z_EROFS_PCLUSTER_TAIL           ((void *)0x5F0ECAFE)
-/* the chained workgroup has already submitted io */
-#define Z_EROFS_PCLUSTER_TAIL_CLOSED    ((void *)0x5F0EDEAD)
-
+/* the end of a chain of pclusters */
+#define Z_EROFS_PCLUSTER_TAIL           ((void *) 0x700 + POISON_POINTER_DELTA)
 #define Z_EROFS_PCLUSTER_NIL            (NULL)
 
 struct z_erofs_decompressqueue {
@@ -241,14 +235,20 @@ static void z_erofs_bvec_iter_begin(struct z_erofs_bvec_iter *iter,
 
 static int z_erofs_bvec_enqueue(struct z_erofs_bvec_iter *iter,
                                struct z_erofs_bvec *bvec,
-                               struct page **candidate_bvpage)
+                               struct page **candidate_bvpage,
+                               struct page **pagepool)
 {
-       if (iter->cur == iter->nr) {
-               if (!*candidate_bvpage)
-                       return -EAGAIN;
-
+       if (iter->cur >= iter->nr) {
+               struct page *nextpage = *candidate_bvpage;
+
+               if (!nextpage) {
+                       nextpage = erofs_allocpage(pagepool, GFP_NOFS);
+                       if (!nextpage)
+                               return -ENOMEM;
+                       set_page_private(nextpage, Z_EROFS_SHORTLIVED_PAGE);
+               }
                DBG_BUGON(iter->bvset->nextpage);
-               iter->bvset->nextpage = *candidate_bvpage;
+               iter->bvset->nextpage = nextpage;
                z_erofs_bvset_flip(iter);
 
                iter->bvset->nextpage = NULL;
@@ -369,8 +369,6 @@ static struct kthread_worker *erofs_init_percpu_worker(int cpu)
                return worker;
        if (IS_ENABLED(CONFIG_EROFS_FS_PCPU_KTHREAD_HIPRI))
                sched_set_fifo_low(worker->task);
-       else
-               sched_set_normal(worker->task, 0);
        return worker;
 }
 
@@ -502,20 +500,6 @@ out_error_pcluster_pool:
 enum z_erofs_pclustermode {
        Z_EROFS_PCLUSTER_INFLIGHT,
        /*
-        * The current pclusters was the tail of an exist chain, in addition
-        * that the previous processed chained pclusters are all decided to
-        * be hooked up to it.
-        * A new chain will be created for the remaining pclusters which are
-        * not processed yet, so different from Z_EROFS_PCLUSTER_FOLLOWED,
-        * the next pcluster cannot reuse the whole page safely for inplace I/O
-        * in the following scenario:
-        *  ________________________________________________________________
-        * |      tail (partial) page     |       head (partial) page       |
-        * |   (belongs to the next pcl)  |   (belongs to the current pcl)  |
-        * |_______PCLUSTER_FOLLOWED______|________PCLUSTER_HOOKED__________|
-        */
-       Z_EROFS_PCLUSTER_HOOKED,
-       /*
         * a weak form of Z_EROFS_PCLUSTER_FOLLOWED, the difference is that it
         * could be dispatched into bypass queue later due to uptodated managed
         * pages. All related online pages cannot be reused for inplace I/O (or
@@ -532,8 +516,8 @@ enum z_erofs_pclustermode {
         *  ________________________________________________________________
         * |  tail (partial) page |          head (partial) page           |
         * |  (of the current cl) |      (of the previous collection)      |
-        * | PCLUSTER_FOLLOWED or |                                        |
-        * |_____PCLUSTER_HOOKED__|___________PCLUSTER_FOLLOWED____________|
+        * |                      |                                        |
+        * |__PCLUSTER_FOLLOWED___|___________PCLUSTER_FOLLOWED____________|
         *
         * [  (*) the above page can be used as inplace I/O.               ]
         */
@@ -545,12 +529,12 @@ struct z_erofs_decompress_frontend {
        struct erofs_map_blocks map;
        struct z_erofs_bvec_iter biter;
 
+       struct page *pagepool;
        struct page *candidate_bvpage;
-       struct z_erofs_pcluster *pcl, *tailpcl;
+       struct z_erofs_pcluster *pcl;
        z_erofs_next_pcluster_t owned_head;
        enum z_erofs_pclustermode mode;
 
-       bool readahead;
        /* used for applying cache strategy on the fly */
        bool backmost;
        erofs_off_t headoffset;
@@ -580,8 +564,7 @@ static bool z_erofs_should_alloc_cache(struct z_erofs_decompress_frontend *fe)
        return false;
 }
 
-static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe,
-                              struct page **pagepool)
+static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe)
 {
        struct address_space *mc = MNGD_MAPPING(EROFS_I_SB(fe->inode));
        struct z_erofs_pcluster *pcl = fe->pcl;
@@ -622,7 +605,7 @@ static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe,
                         * succeeds or fallback to in-place I/O instead
                         * to avoid any direct reclaim.
                         */
-                       newpage = erofs_allocpage(pagepool, gfp);
+                       newpage = erofs_allocpage(&fe->pagepool, gfp);
                        if (!newpage)
                                continue;
                        set_page_private(newpage, Z_EROFS_PREALLOCATED_PAGE);
@@ -635,7 +618,7 @@ static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe,
                if (page)
                        put_page(page);
                else if (newpage)
-                       erofs_pagepool_add(pagepool, newpage);
+                       erofs_pagepool_add(&fe->pagepool, newpage);
        }
 
        /*
@@ -656,7 +639,7 @@ int erofs_try_to_free_all_cached_pages(struct erofs_sb_info *sbi,
 
        DBG_BUGON(z_erofs_is_inline_pcluster(pcl));
        /*
-        * refcount of workgroup is now freezed as 1,
+        * refcount of workgroup is now frozen at 0,
         * therefore no need to worry about available decompression users.
         */
        for (i = 0; i < pcl->pclusterpages; ++i) {
@@ -680,29 +663,73 @@ int erofs_try_to_free_all_cached_pages(struct erofs_sb_info *sbi,
        return 0;
 }
 
-int erofs_try_to_free_cached_page(struct page *page)
+static bool z_erofs_cache_release_folio(struct folio *folio, gfp_t gfp)
 {
-       struct z_erofs_pcluster *const pcl = (void *)page_private(page);
-       int ret, i;
+       struct z_erofs_pcluster *pcl = folio_get_private(folio);
+       bool ret;
+       int i;
 
-       if (!erofs_workgroup_try_to_freeze(&pcl->obj, 1))
-               return 0;
+       if (!folio_test_private(folio))
+               return true;
+
+       ret = false;
+       spin_lock(&pcl->obj.lockref.lock);
+       if (pcl->obj.lockref.count > 0)
+               goto out;
 
-       ret = 0;
        DBG_BUGON(z_erofs_is_inline_pcluster(pcl));
        for (i = 0; i < pcl->pclusterpages; ++i) {
-               if (pcl->compressed_bvecs[i].page == page) {
+               if (pcl->compressed_bvecs[i].page == &folio->page) {
                        WRITE_ONCE(pcl->compressed_bvecs[i].page, NULL);
-                       ret = 1;
+                       ret = true;
                        break;
                }
        }
-       erofs_workgroup_unfreeze(&pcl->obj, 1);
        if (ret)
-               detach_page_private(page);
+               folio_detach_private(folio);
+out:
+       spin_unlock(&pcl->obj.lockref.lock);
        return ret;
 }
 
+/*
+ * It will be called only on inode eviction. If there are still some
+ * decompression requests in progress, wait here and reschedule for a bit.
+ * An extra lock could be introduced instead but it seems unnecessary.
+ */
+static void z_erofs_cache_invalidate_folio(struct folio *folio,
+                                          size_t offset, size_t length)
+{
+       const size_t stop = length + offset;
+
+       /* Check for potential overflow in debug mode */
+       DBG_BUGON(stop > folio_size(folio) || stop < length);
+
+       if (offset == 0 && stop == folio_size(folio))
+               while (!z_erofs_cache_release_folio(folio, GFP_NOFS))
+                       cond_resched();
+}
+
+static const struct address_space_operations z_erofs_cache_aops = {
+       .release_folio = z_erofs_cache_release_folio,
+       .invalidate_folio = z_erofs_cache_invalidate_folio,
+};
+
+int erofs_init_managed_cache(struct super_block *sb)
+{
+       struct inode *const inode = new_inode(sb);
+
+       if (!inode)
+               return -ENOMEM;
+
+       set_nlink(inode, 1);
+       inode->i_size = OFFSET_MAX;
+       inode->i_mapping->a_ops = &z_erofs_cache_aops;
+       mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
+       EROFS_SB(sb)->managed_cache = inode;
+       return 0;
+}
+
 static bool z_erofs_try_inplace_io(struct z_erofs_decompress_frontend *fe,
                                   struct z_erofs_bvec *bvec)
 {
@@ -733,7 +760,8 @@ static int z_erofs_attach_page(struct z_erofs_decompress_frontend *fe,
                    !fe->candidate_bvpage)
                        fe->candidate_bvpage = bvec->page;
        }
-       ret = z_erofs_bvec_enqueue(&fe->biter, bvec, &fe->candidate_bvpage);
+       ret = z_erofs_bvec_enqueue(&fe->biter, bvec, &fe->candidate_bvpage,
+                                  &fe->pagepool);
        fe->pcl->vcnt += (ret >= 0);
        return ret;
 }
@@ -752,19 +780,7 @@ static void z_erofs_try_to_claim_pcluster(struct z_erofs_decompress_frontend *f)
                return;
        }
 
-       /*
-        * type 2, link to the end of an existing open chain, be careful
-        * that its submission is controlled by the original attached chain.
-        */
-       if (*owned_head != &pcl->next && pcl != f->tailpcl &&
-           cmpxchg(&pcl->next, Z_EROFS_PCLUSTER_TAIL,
-                   *owned_head) == Z_EROFS_PCLUSTER_TAIL) {
-               *owned_head = Z_EROFS_PCLUSTER_TAIL;
-               f->mode = Z_EROFS_PCLUSTER_HOOKED;
-               f->tailpcl = NULL;
-               return;
-       }
-       /* type 3, it belongs to a chain, but it isn't the end of the chain */
+       /* type 2, it belongs to an ongoing chain */
        f->mode = Z_EROFS_PCLUSTER_INFLIGHT;
 }
 
@@ -788,7 +804,7 @@ static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe)
        if (IS_ERR(pcl))
                return PTR_ERR(pcl);
 
-       atomic_set(&pcl->obj.refcount, 1);
+       spin_lock_init(&pcl->obj.lockref.lock);
        pcl->algorithmformat = map->m_algorithmformat;
        pcl->length = 0;
        pcl->partial = true;
@@ -825,9 +841,6 @@ static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe)
                        goto err_out;
                }
        }
-       /* used to check tail merging loop due to corrupted images */
-       if (fe->owned_head == Z_EROFS_PCLUSTER_TAIL)
-               fe->tailpcl = pcl;
        fe->owned_head = &pcl->next;
        fe->pcl = pcl;
        return 0;
@@ -848,7 +861,6 @@ static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe)
 
        /* must be Z_EROFS_PCLUSTER_TAIL or pointed to previous pcluster */
        DBG_BUGON(fe->owned_head == Z_EROFS_PCLUSTER_NIL);
-       DBG_BUGON(fe->owned_head == Z_EROFS_PCLUSTER_TAIL_CLOSED);
 
        if (!(map->m_flags & EROFS_MAP_META)) {
                grp = erofs_find_workgroup(fe->inode->i_sb,
@@ -867,10 +879,6 @@ static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe)
 
        if (ret == -EEXIST) {
                mutex_lock(&fe->pcl->lock);
-               /* used to check tail merging loop due to corrupted images */
-               if (fe->owned_head == Z_EROFS_PCLUSTER_TAIL)
-                       fe->tailpcl = fe->pcl;
-
                z_erofs_try_to_claim_pcluster(fe);
        } else if (ret) {
                return ret;
@@ -910,10 +918,8 @@ static bool z_erofs_collector_end(struct z_erofs_decompress_frontend *fe)
        z_erofs_bvec_iter_end(&fe->biter);
        mutex_unlock(&pcl->lock);
 
-       if (fe->candidate_bvpage) {
-               DBG_BUGON(z_erofs_is_shortlived_page(fe->candidate_bvpage));
+       if (fe->candidate_bvpage)
                fe->candidate_bvpage = NULL;
-       }
 
        /*
         * if all pending pages are added, don't hold its reference
@@ -960,7 +966,7 @@ static int z_erofs_read_fragment(struct inode *inode, erofs_off_t pos,
 }
 
 static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
-                               struct page *page, struct page **pagepool)
+                               struct page *page)
 {
        struct inode *const inode = fe->inode;
        struct erofs_map_blocks *const map = &fe->map;
@@ -1018,7 +1024,7 @@ repeat:
                fe->mode = Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE;
        } else {
                /* bind cache first when cached decompression is preferred */
-               z_erofs_bind_cache(fe, pagepool);
+               z_erofs_bind_cache(fe);
        }
 hitted:
        /*
@@ -1027,8 +1033,7 @@ hitted:
         * those chains are handled asynchronously thus the page cannot be used
         * for inplace I/O or bvpage (should be processed in a strict order.)
         */
-       tight &= (fe->mode >= Z_EROFS_PCLUSTER_HOOKED &&
-                 fe->mode != Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE);
+       tight &= (fe->mode > Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE);
 
        cur = end - min_t(unsigned int, offset + end - map->m_la, end);
        if (!(map->m_flags & EROFS_MAP_MAPPED)) {
@@ -1058,24 +1063,13 @@ hitted:
        if (cur)
                tight &= (fe->mode >= Z_EROFS_PCLUSTER_FOLLOWED);
 
-retry:
        err = z_erofs_attach_page(fe, &((struct z_erofs_bvec) {
                                        .page = page,
                                        .offset = offset - map->m_la,
                                        .end = end,
                                  }), exclusive);
-       /* should allocate an additional short-lived page for bvset */
-       if (err == -EAGAIN && !fe->candidate_bvpage) {
-               fe->candidate_bvpage = alloc_page(GFP_NOFS | __GFP_NOFAIL);
-               set_page_private(fe->candidate_bvpage,
-                                Z_EROFS_SHORTLIVED_PAGE);
-               goto retry;
-       }
-
-       if (err) {
-               DBG_BUGON(err == -EAGAIN && fe->candidate_bvpage);
+       if (err)
                goto out;
-       }
 
        z_erofs_onlinepage_split(page);
        /* bump up the number of split parts of a page */
@@ -1106,7 +1100,7 @@ out:
        return err;
 }
 
-static bool z_erofs_get_sync_decompress_policy(struct erofs_sb_info *sbi,
+static bool z_erofs_is_sync_decompress(struct erofs_sb_info *sbi,
                                       unsigned int readahead_pages)
 {
        /* auto: enable for read_folio, disable for readahead */
@@ -1285,6 +1279,8 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
        struct erofs_sb_info *const sbi = EROFS_SB(be->sb);
        struct z_erofs_pcluster *pcl = be->pcl;
        unsigned int pclusterpages = z_erofs_pclusterpages(pcl);
+       const struct z_erofs_decompressor *decompressor =
+                               &erofs_decompressors[pcl->algorithmformat];
        unsigned int i, inputsize;
        int err2;
        struct page *page;
@@ -1328,7 +1324,7 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
        else
                inputsize = pclusterpages * PAGE_SIZE;
 
-       err = z_erofs_decompress(&(struct z_erofs_decompress_req) {
+       err = decompressor->decompress(&(struct z_erofs_decompress_req) {
                                        .sb = be->sb,
                                        .in = be->compressed_pages,
                                        .out = be->decompressed_pages,
@@ -1406,10 +1402,7 @@ static void z_erofs_decompress_queue(const struct z_erofs_decompressqueue *io,
        };
        z_erofs_next_pcluster_t owned = io->head;
 
-       while (owned != Z_EROFS_PCLUSTER_TAIL_CLOSED) {
-               /* impossible that 'owned' equals Z_EROFS_WORK_TPTR_TAIL */
-               DBG_BUGON(owned == Z_EROFS_PCLUSTER_TAIL);
-               /* impossible that 'owned' equals Z_EROFS_PCLUSTER_NIL */
+       while (owned != Z_EROFS_PCLUSTER_TAIL) {
                DBG_BUGON(owned == Z_EROFS_PCLUSTER_NIL);
 
                be.pcl = container_of(owned, struct z_erofs_pcluster, next);
@@ -1426,7 +1419,7 @@ static void z_erofs_decompressqueue_work(struct work_struct *work)
                container_of(work, struct z_erofs_decompressqueue, u.work);
        struct page *pagepool = NULL;
 
-       DBG_BUGON(bgq->head == Z_EROFS_PCLUSTER_TAIL_CLOSED);
+       DBG_BUGON(bgq->head == Z_EROFS_PCLUSTER_TAIL);
        z_erofs_decompress_queue(bgq, &pagepool);
        erofs_release_pages(&pagepool);
        kvfree(bgq);
@@ -1454,7 +1447,7 @@ static void z_erofs_decompress_kickoff(struct z_erofs_decompressqueue *io,
        if (atomic_add_return(bios, &io->pending_bios))
                return;
        /* Use (kthread_)work and sync decompression for atomic contexts only */
-       if (in_atomic() || irqs_disabled()) {
+       if (!in_task() || irqs_disabled() || rcu_read_lock_any_held()) {
 #ifdef CONFIG_EROFS_FS_PCPU_KTHREAD
                struct kthread_worker *worker;
 
@@ -1614,7 +1607,7 @@ fg_out:
                q->sync = true;
        }
        q->sb = sb;
-       q->head = Z_EROFS_PCLUSTER_TAIL_CLOSED;
+       q->head = Z_EROFS_PCLUSTER_TAIL;
        return q;
 }
 
@@ -1632,11 +1625,7 @@ static void move_to_bypass_jobqueue(struct z_erofs_pcluster *pcl,
        z_erofs_next_pcluster_t *const submit_qtail = qtail[JQ_SUBMIT];
        z_erofs_next_pcluster_t *const bypass_qtail = qtail[JQ_BYPASS];
 
-       DBG_BUGON(owned_head == Z_EROFS_PCLUSTER_TAIL_CLOSED);
-       if (owned_head == Z_EROFS_PCLUSTER_TAIL)
-               owned_head = Z_EROFS_PCLUSTER_TAIL_CLOSED;
-
-       WRITE_ONCE(pcl->next, Z_EROFS_PCLUSTER_TAIL_CLOSED);
+       WRITE_ONCE(pcl->next, Z_EROFS_PCLUSTER_TAIL);
 
        WRITE_ONCE(*submit_qtail, owned_head);
        WRITE_ONCE(*bypass_qtail, &pcl->next);
@@ -1670,9 +1659,8 @@ static void z_erofs_decompressqueue_endio(struct bio *bio)
 }
 
 static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
-                                struct page **pagepool,
                                 struct z_erofs_decompressqueue *fgq,
-                                bool *force_fg)
+                                bool *force_fg, bool readahead)
 {
        struct super_block *sb = f->inode->i_sb;
        struct address_space *mc = MNGD_MAPPING(EROFS_SB(sb));
@@ -1707,15 +1695,10 @@ static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
                unsigned int i = 0;
                bool bypass = true;
 
-               /* no possible 'owned_head' equals the following */
-               DBG_BUGON(owned_head == Z_EROFS_PCLUSTER_TAIL_CLOSED);
                DBG_BUGON(owned_head == Z_EROFS_PCLUSTER_NIL);
-
                pcl = container_of(owned_head, struct z_erofs_pcluster, next);
+               owned_head = READ_ONCE(pcl->next);
 
-               /* close the main owned chain at first */
-               owned_head = cmpxchg(&pcl->next, Z_EROFS_PCLUSTER_TAIL,
-                                    Z_EROFS_PCLUSTER_TAIL_CLOSED);
                if (z_erofs_is_inline_pcluster(pcl)) {
                        move_to_bypass_jobqueue(pcl, qtail, owned_head);
                        continue;
@@ -1733,8 +1716,8 @@ static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
                do {
                        struct page *page;
 
-                       page = pickup_page_for_submission(pcl, i++, pagepool,
-                                                         mc);
+                       page = pickup_page_for_submission(pcl, i++,
+                                       &f->pagepool, mc);
                        if (!page)
                                continue;
 
@@ -1763,7 +1746,7 @@ submit_bio_retry:
                                bio->bi_iter.bi_sector = (sector_t)cur <<
                                        (sb->s_blocksize_bits - 9);
                                bio->bi_private = q[JQ_SUBMIT];
-                               if (f->readahead)
+                               if (readahead)
                                        bio->bi_opf |= REQ_RAHEAD;
                                ++nr_bios;
                        }
@@ -1799,16 +1782,16 @@ submit_bio_retry:
 }
 
 static void z_erofs_runqueue(struct z_erofs_decompress_frontend *f,
-                            struct page **pagepool, bool force_fg)
+                            bool force_fg, bool ra)
 {
        struct z_erofs_decompressqueue io[NR_JOBQUEUES];
 
        if (f->owned_head == Z_EROFS_PCLUSTER_TAIL)
                return;
-       z_erofs_submit_queue(f, pagepool, io, &force_fg);
+       z_erofs_submit_queue(f, io, &force_fg, ra);
 
        /* handle bypass queue (no i/o pclusters) immediately */
-       z_erofs_decompress_queue(&io[JQ_BYPASS], pagepool);
+       z_erofs_decompress_queue(&io[JQ_BYPASS], &f->pagepool);
 
        if (!force_fg)
                return;
@@ -1817,7 +1800,7 @@ static void z_erofs_runqueue(struct z_erofs_decompress_frontend *f,
        wait_for_completion_io(&io[JQ_SUBMIT].u.done);
 
        /* handle synchronous decompress queue in the caller context */
-       z_erofs_decompress_queue(&io[JQ_SUBMIT], pagepool);
+       z_erofs_decompress_queue(&io[JQ_SUBMIT], &f->pagepool);
 }
 
 /*
@@ -1825,29 +1808,28 @@ static void z_erofs_runqueue(struct z_erofs_decompress_frontend *f,
  * approximate readmore strategies as a start.
  */
 static void z_erofs_pcluster_readmore(struct z_erofs_decompress_frontend *f,
-                                     struct readahead_control *rac,
-                                     erofs_off_t end,
-                                     struct page **pagepool,
-                                     bool backmost)
+               struct readahead_control *rac, bool backmost)
 {
        struct inode *inode = f->inode;
        struct erofs_map_blocks *map = &f->map;
-       erofs_off_t cur;
+       erofs_off_t cur, end, headoffset = f->headoffset;
        int err;
 
        if (backmost) {
+               if (rac)
+                       end = headoffset + readahead_length(rac) - 1;
+               else
+                       end = headoffset + PAGE_SIZE - 1;
                map->m_la = end;
                err = z_erofs_map_blocks_iter(inode, map,
                                              EROFS_GET_BLOCKS_READMORE);
                if (err)
                        return;
 
-               /* expend ra for the trailing edge if readahead */
+               /* expand ra for the trailing edge if readahead */
                if (rac) {
-                       loff_t newstart = readahead_pos(rac);
-
                        cur = round_up(map->m_la + map->m_llen, PAGE_SIZE);
-                       readahead_expand(rac, newstart, cur - newstart);
+                       readahead_expand(rac, headoffset, cur - headoffset);
                        return;
                }
                end = round_up(end, PAGE_SIZE);
@@ -1868,7 +1850,7 @@ static void z_erofs_pcluster_readmore(struct z_erofs_decompress_frontend *f,
                        if (PageUptodate(page)) {
                                unlock_page(page);
                        } else {
-                               err = z_erofs_do_read_page(f, page, pagepool);
+                               err = z_erofs_do_read_page(f, page);
                                if (err)
                                        erofs_err(inode->i_sb,
                                                  "readmore error at page %lu @ nid %llu",
@@ -1889,28 +1871,24 @@ static int z_erofs_read_folio(struct file *file, struct folio *folio)
        struct inode *const inode = page->mapping->host;
        struct erofs_sb_info *const sbi = EROFS_I_SB(inode);
        struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode);
-       struct page *pagepool = NULL;
        int err;
 
        trace_erofs_readpage(page, false);
        f.headoffset = (erofs_off_t)page->index << PAGE_SHIFT;
 
-       z_erofs_pcluster_readmore(&f, NULL, f.headoffset + PAGE_SIZE - 1,
-                                 &pagepool, true);
-       err = z_erofs_do_read_page(&f, page, &pagepool);
-       z_erofs_pcluster_readmore(&f, NULL, 0, &pagepool, false);
-
+       z_erofs_pcluster_readmore(&f, NULL, true);
+       err = z_erofs_do_read_page(&f, page);
+       z_erofs_pcluster_readmore(&f, NULL, false);
        (void)z_erofs_collector_end(&f);
 
        /* if some compressed cluster ready, need submit them anyway */
-       z_erofs_runqueue(&f, &pagepool,
-                        z_erofs_get_sync_decompress_policy(sbi, 0));
+       z_erofs_runqueue(&f, z_erofs_is_sync_decompress(sbi, 0), false);
 
        if (err)
                erofs_err(inode->i_sb, "failed to read, err [%d]", err);
 
        erofs_put_metabuf(&f.map.buf);
-       erofs_release_pages(&pagepool);
+       erofs_release_pages(&f.pagepool);
        return err;
 }
 
@@ -1919,14 +1897,12 @@ static void z_erofs_readahead(struct readahead_control *rac)
        struct inode *const inode = rac->mapping->host;
        struct erofs_sb_info *const sbi = EROFS_I_SB(inode);
        struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode);
-       struct page *pagepool = NULL, *head = NULL, *page;
+       struct page *head = NULL, *page;
        unsigned int nr_pages;
 
-       f.readahead = true;
        f.headoffset = readahead_pos(rac);
 
-       z_erofs_pcluster_readmore(&f, rac, f.headoffset +
-                                 readahead_length(rac) - 1, &pagepool, true);
+       z_erofs_pcluster_readmore(&f, rac, true);
        nr_pages = readahead_count(rac);
        trace_erofs_readpages(inode, readahead_index(rac), nr_pages, false);
 
@@ -1942,20 +1918,19 @@ static void z_erofs_readahead(struct readahead_control *rac)
                /* traversal in reverse order */
                head = (void *)page_private(page);
 
-               err = z_erofs_do_read_page(&f, page, &pagepool);
+               err = z_erofs_do_read_page(&f, page);
                if (err)
                        erofs_err(inode->i_sb,
                                  "readahead error at page %lu @ nid %llu",
                                  page->index, EROFS_I(inode)->nid);
                put_page(page);
        }
-       z_erofs_pcluster_readmore(&f, rac, 0, &pagepool, false);
+       z_erofs_pcluster_readmore(&f, rac, false);
        (void)z_erofs_collector_end(&f);
 
-       z_erofs_runqueue(&f, &pagepool,
-                        z_erofs_get_sync_decompress_policy(sbi, nr_pages));
+       z_erofs_runqueue(&f, z_erofs_is_sync_decompress(sbi, nr_pages), true);
        erofs_put_metabuf(&f.map.buf);
-       erofs_release_pages(&pagepool);
+       erofs_release_pages(&f.pagepool);
 }
 
 const struct address_space_operations z_erofs_aops = {
index d37c5c8..1909dda 100644 (file)
@@ -22,8 +22,8 @@ struct z_erofs_maprecorder {
        bool partialref;
 };
 
-static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
-                                        unsigned long lcn)
+static int z_erofs_load_full_lcluster(struct z_erofs_maprecorder *m,
+                                     unsigned long lcn)
 {
        struct inode *const inode = m->inode;
        struct erofs_inode *const vi = EROFS_I(inode);
@@ -129,7 +129,7 @@ static int unpack_compacted_index(struct z_erofs_maprecorder *m,
        u8 *in, type;
        bool big_pcluster;
 
-       if (1 << amortizedshift == 4)
+       if (1 << amortizedshift == 4 && lclusterbits <= 14)
                vcnt = 2;
        else if (1 << amortizedshift == 2 && lclusterbits == 12)
                vcnt = 16;
@@ -226,12 +226,11 @@ static int unpack_compacted_index(struct z_erofs_maprecorder *m,
        return 0;
 }
 
-static int compacted_load_cluster_from_disk(struct z_erofs_maprecorder *m,
-                                           unsigned long lcn, bool lookahead)
+static int z_erofs_load_compact_lcluster(struct z_erofs_maprecorder *m,
+                                        unsigned long lcn, bool lookahead)
 {
        struct inode *const inode = m->inode;
        struct erofs_inode *const vi = EROFS_I(inode);
-       const unsigned int lclusterbits = vi->z_logical_clusterbits;
        const erofs_off_t ebase = sizeof(struct z_erofs_map_header) +
                ALIGN(erofs_iloc(inode) + vi->inode_isize + vi->xattr_isize, 8);
        unsigned int totalidx = erofs_iblks(inode);
@@ -239,9 +238,6 @@ static int compacted_load_cluster_from_disk(struct z_erofs_maprecorder *m,
        unsigned int amortizedshift;
        erofs_off_t pos;
 
-       if (lclusterbits != 12)
-               return -EOPNOTSUPP;
-
        if (lcn >= totalidx)
                return -EINVAL;
 
@@ -281,23 +277,23 @@ out:
        return unpack_compacted_index(m, amortizedshift, pos, lookahead);
 }
 
-static int z_erofs_load_cluster_from_disk(struct z_erofs_maprecorder *m,
-                                         unsigned int lcn, bool lookahead)
+static int z_erofs_load_lcluster_from_disk(struct z_erofs_maprecorder *m,
+                                          unsigned int lcn, bool lookahead)
 {
-       const unsigned int datamode = EROFS_I(m->inode)->datalayout;
-
-       if (datamode == EROFS_INODE_COMPRESSED_FULL)
-               return legacy_load_cluster_from_disk(m, lcn);
-
-       if (datamode == EROFS_INODE_COMPRESSED_COMPACT)
-               return compacted_load_cluster_from_disk(m, lcn, lookahead);
-
-       return -EINVAL;
+       switch (EROFS_I(m->inode)->datalayout) {
+       case EROFS_INODE_COMPRESSED_FULL:
+               return z_erofs_load_full_lcluster(m, lcn);
+       case EROFS_INODE_COMPRESSED_COMPACT:
+               return z_erofs_load_compact_lcluster(m, lcn, lookahead);
+       default:
+               return -EINVAL;
+       }
 }
 
 static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
                                   unsigned int lookback_distance)
 {
+       struct super_block *sb = m->inode->i_sb;
        struct erofs_inode *const vi = EROFS_I(m->inode);
        const unsigned int lclusterbits = vi->z_logical_clusterbits;
 
@@ -305,21 +301,15 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
                unsigned long lcn = m->lcn - lookback_distance;
                int err;
 
-               /* load extent head logical cluster if needed */
-               err = z_erofs_load_cluster_from_disk(m, lcn, false);
+               err = z_erofs_load_lcluster_from_disk(m, lcn, false);
                if (err)
                        return err;
 
                switch (m->type) {
                case Z_EROFS_LCLUSTER_TYPE_NONHEAD:
-                       if (!m->delta[0]) {
-                               erofs_err(m->inode->i_sb,
-                                         "invalid lookback distance 0 @ nid %llu",
-                                         vi->nid);
-                               DBG_BUGON(1);
-                               return -EFSCORRUPTED;
-                       }
                        lookback_distance = m->delta[0];
+                       if (!lookback_distance)
+                               goto err_bogus;
                        continue;
                case Z_EROFS_LCLUSTER_TYPE_PLAIN:
                case Z_EROFS_LCLUSTER_TYPE_HEAD1:
@@ -328,16 +318,15 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
                        m->map->m_la = (lcn << lclusterbits) | m->clusterofs;
                        return 0;
                default:
-                       erofs_err(m->inode->i_sb,
-                                 "unknown type %u @ lcn %lu of nid %llu",
+                       erofs_err(sb, "unknown type %u @ lcn %lu of nid %llu",
                                  m->type, lcn, vi->nid);
                        DBG_BUGON(1);
                        return -EOPNOTSUPP;
                }
        }
-
-       erofs_err(m->inode->i_sb, "bogus lookback distance @ nid %llu",
-                 vi->nid);
+err_bogus:
+       erofs_err(sb, "bogus lookback distance %u @ lcn %lu of nid %llu",
+                 lookback_distance, m->lcn, vi->nid);
        DBG_BUGON(1);
        return -EFSCORRUPTED;
 }
@@ -369,7 +358,7 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
        if (m->compressedblks)
                goto out;
 
-       err = z_erofs_load_cluster_from_disk(m, lcn, false);
+       err = z_erofs_load_lcluster_from_disk(m, lcn, false);
        if (err)
                return err;
 
@@ -401,9 +390,8 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
                        break;
                fallthrough;
        default:
-               erofs_err(m->inode->i_sb,
-                         "cannot found CBLKCNT @ lcn %lu of nid %llu",
-                         lcn, vi->nid);
+               erofs_err(sb, "cannot found CBLKCNT @ lcn %lu of nid %llu", lcn,
+                         vi->nid);
                DBG_BUGON(1);
                return -EFSCORRUPTED;
        }
@@ -411,9 +399,7 @@ out:
        map->m_plen = erofs_pos(sb, m->compressedblks);
        return 0;
 err_bonus_cblkcnt:
-       erofs_err(m->inode->i_sb,
-                 "bogus CBLKCNT @ lcn %lu of nid %llu",
-                 lcn, vi->nid);
+       erofs_err(sb, "bogus CBLKCNT @ lcn %lu of nid %llu", lcn, vi->nid);
        DBG_BUGON(1);
        return -EFSCORRUPTED;
 }
@@ -434,7 +420,7 @@ static int z_erofs_get_extent_decompressedlen(struct z_erofs_maprecorder *m)
                        return 0;
                }
 
-               err = z_erofs_load_cluster_from_disk(m, lcn, true);
+               err = z_erofs_load_lcluster_from_disk(m, lcn, true);
                if (err)
                        return err;
 
@@ -481,7 +467,7 @@ static int z_erofs_do_map_blocks(struct inode *inode,
        initial_lcn = ofs >> lclusterbits;
        endoff = ofs & ((1 << lclusterbits) - 1);
 
-       err = z_erofs_load_cluster_from_disk(&m, initial_lcn, false);
+       err = z_erofs_load_lcluster_from_disk(&m, initial_lcn, false);
        if (err)
                goto unmap_out;
 
@@ -539,8 +525,7 @@ static int z_erofs_do_map_blocks(struct inode *inode,
        if (flags & EROFS_GET_BLOCKS_FINDTAIL) {
                vi->z_tailextent_headlcn = m.lcn;
                /* for non-compact indexes, fragmentoff is 64 bits */
-               if (fragment &&
-                   vi->datalayout == EROFS_INODE_COMPRESSED_FULL)
+               if (fragment && vi->datalayout == EROFS_INODE_COMPRESSED_FULL)
                        vi->z_fragmentoff |= (u64)m.pblk << 32;
        }
        if (ztailpacking && m.lcn == vi->z_tailextent_headlcn) {
index 95850a1..8aa36cd 100644 (file)
@@ -33,17 +33,17 @@ struct eventfd_ctx {
        /*
         * Every time that a write(2) is performed on an eventfd, the
         * value of the __u64 being written is added to "count" and a
-        * wakeup is performed on "wqh". A read(2) will return the "count"
-        * value to userspace, and will reset "count" to zero. The kernel
-        * side eventfd_signal() also, adds to the "count" counter and
-        * issue a wakeup.
+        * wakeup is performed on "wqh". If the EFD_SEMAPHORE flag was not
+        * specified, a read(2) will return the "count" value to userspace,
+        * and will reset "count" to zero. The kernel-side eventfd_signal()
+        * also adds to the "count" counter and issues a wakeup.
         */
        __u64 count;
        unsigned int flags;
        int id;
 };
 
-__u64 eventfd_signal_mask(struct eventfd_ctx *ctx, __u64 n, unsigned mask)
+__u64 eventfd_signal_mask(struct eventfd_ctx *ctx, __u64 n, __poll_t mask)
 {
        unsigned long flags;
 
@@ -301,6 +301,8 @@ static void eventfd_show_fdinfo(struct seq_file *m, struct file *f)
                   (unsigned long long)ctx->count);
        spin_unlock_irq(&ctx->wqh.lock);
        seq_printf(m, "eventfd-id: %d\n", ctx->id);
+       seq_printf(m, "eventfd-semaphore: %d\n",
+                  !!(ctx->flags & EFD_SEMAPHORE));
 }
 #endif
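For reference, a minimal userspace sketch (not part of this patch) of the read(2) semantics described in the eventfd comment above: with EFD_SEMAPHORE each read returns 1 and decrements the counter, otherwise read returns the whole counter and resets it to zero.

	#include <stdint.h>
	#include <stdio.h>
	#include <unistd.h>
	#include <sys/eventfd.h>

	int main(void)
	{
		uint64_t val = 3, out = 0;
		int efd = eventfd(0, EFD_SEMAPHORE);

		if (efd < 0 || write(efd, &val, sizeof(val)) != sizeof(val))
			return 1;
		if (read(efd, &out, sizeof(out)) != sizeof(out))
			return 1;
		/* out == 1 here; the remaining count is 2 */
		printf("semaphore read: %llu\n", (unsigned long long)out);
		close(efd);
		return 0;
	}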
 
index 9804834..4b1b336 100644 (file)
@@ -536,7 +536,7 @@ static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi,
 #else
 
 static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi,
-                            unsigned pollflags)
+                            __poll_t pollflags)
 {
        wake_up_poll(&ep->poll_wait, EPOLLIN | pollflags);
 }
@@ -1805,7 +1805,11 @@ static int ep_autoremove_wake_function(struct wait_queue_entry *wq_entry,
 {
        int ret = default_wake_function(wq_entry, mode, sync, key);
 
-       list_del_init(&wq_entry->entry);
+       /*
+        * Pairs with list_empty_careful in ep_poll, and ensures future loop
+        * iterations see the cause of this wakeup.
+        */
+       list_del_init_careful(&wq_entry->entry);
        return ret;
 }
 
index e99183a..3cbd270 100644 (file)
@@ -389,7 +389,7 @@ const struct file_operations exfat_file_operations = {
 #endif
        .mmap           = generic_file_mmap,
        .fsync          = exfat_file_fsync,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
        .splice_write   = iter_file_splice_write,
 };
 
index 6b4bebe..d1ae0f0 100644 (file)
@@ -192,7 +192,7 @@ const struct file_operations ext2_file_operations = {
        .release        = ext2_release_file,
        .fsync          = ext2_fsync,
        .get_unmapped_area = thp_get_unmapped_area,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
        .splice_write   = iter_file_splice_write,
 };
 
index 0942694..1f72f97 100644 (file)
@@ -305,6 +305,36 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block *sb,
        return desc;
 }
 
+static ext4_fsblk_t ext4_valid_block_bitmap_padding(struct super_block *sb,
+                                                   ext4_group_t block_group,
+                                                   struct buffer_head *bh)
+{
+       ext4_grpblk_t next_zero_bit;
+       unsigned long bitmap_size = sb->s_blocksize * 8;
+       unsigned int offset = num_clusters_in_group(sb, block_group);
+
+       if (bitmap_size <= offset)
+               return 0;
+
+       next_zero_bit = ext4_find_next_zero_bit(bh->b_data, bitmap_size, offset);
+
+       return (next_zero_bit < bitmap_size ? next_zero_bit : 0);
+}
+
+struct ext4_group_info *ext4_get_group_info(struct super_block *sb,
+                                           ext4_group_t group)
+{
+       struct ext4_group_info **grp_info;
+       long indexv, indexh;
+
+       if (unlikely(group >= EXT4_SB(sb)->s_groups_count))
+               return NULL;
+       indexv = group >> (EXT4_DESC_PER_BLOCK_BITS(sb));
+       indexh = group & ((EXT4_DESC_PER_BLOCK(sb)) - 1);
+       grp_info = sbi_array_rcu_deref(EXT4_SB(sb), s_group_info, indexv);
+       return grp_info[indexh];
+}
+
 /*
  * Return the block number which was discovered to be invalid, or 0 if
  * the block bitmap is valid.
@@ -379,7 +409,7 @@ static int ext4_validate_block_bitmap(struct super_block *sb,
 
        if (buffer_verified(bh))
                return 0;
-       if (EXT4_MB_GRP_BBITMAP_CORRUPT(grp))
+       if (!grp || EXT4_MB_GRP_BBITMAP_CORRUPT(grp))
                return -EFSCORRUPTED;
 
        ext4_lock_group(sb, block_group);
@@ -402,6 +432,15 @@ static int ext4_validate_block_bitmap(struct super_block *sb,
                                        EXT4_GROUP_INFO_BBITMAP_CORRUPT);
                return -EFSCORRUPTED;
        }
+       blk = ext4_valid_block_bitmap_padding(sb, block_group, bh);
+       if (unlikely(blk != 0)) {
+               ext4_unlock_group(sb, block_group);
+               ext4_error(sb, "bg %u: block %llu: padding at end of block bitmap is not set",
+                          block_group, blk);
+               ext4_mark_group_bitmap_corrupted(sb, block_group,
+                                                EXT4_GROUP_INFO_BBITMAP_CORRUPT);
+               return -EFSCORRUPTED;
+       }
        set_buffer_verified(bh);
 verified:
        ext4_unlock_group(sb, block_group);
@@ -845,7 +884,10 @@ static unsigned long ext4_bg_num_gdb_nometa(struct super_block *sb,
        if (!ext4_bg_has_super(sb, group))
                return 0;
 
-       return EXT4_SB(sb)->s_gdb_count;
+       if (ext4_has_feature_meta_bg(sb))
+               return le32_to_cpu(EXT4_SB(sb)->s_es->s_first_meta_bg);
+       else
+               return EXT4_SB(sb)->s_gdb_count;
 }
 
 /**
index 18cb268..02fa8a6 100644 (file)
@@ -918,11 +918,13 @@ do {                                                                             \
  *                       where the second inode has larger inode number
  *                       than the first
  *  I_DATA_SEM_QUOTA  - Used for quota inodes only
+ *  I_DATA_SEM_EA     - Used for ea_inodes only
  */
 enum {
        I_DATA_SEM_NORMAL = 0,
        I_DATA_SEM_OTHER,
        I_DATA_SEM_QUOTA,
+       I_DATA_SEM_EA
 };
 
 
@@ -1684,6 +1686,30 @@ static inline struct ext4_inode_info *EXT4_I(struct inode *inode)
        return container_of(inode, struct ext4_inode_info, vfs_inode);
 }
 
+static inline int ext4_writepages_down_read(struct super_block *sb)
+{
+       percpu_down_read(&EXT4_SB(sb)->s_writepages_rwsem);
+       return memalloc_nofs_save();
+}
+
+static inline void ext4_writepages_up_read(struct super_block *sb, int ctx)
+{
+       memalloc_nofs_restore(ctx);
+       percpu_up_read(&EXT4_SB(sb)->s_writepages_rwsem);
+}
+
+static inline int ext4_writepages_down_write(struct super_block *sb)
+{
+       percpu_down_write(&EXT4_SB(sb)->s_writepages_rwsem);
+       return memalloc_nofs_save();
+}
+
+static inline void ext4_writepages_up_write(struct super_block *sb, int ctx)
+{
+       memalloc_nofs_restore(ctx);
+       percpu_up_write(&EXT4_SB(sb)->s_writepages_rwsem);
+}
+
 static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
 {
        return ino == EXT4_ROOT_INO ||
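A minimal sketch (hypothetical function name, not ext4's actual writeback path) of how the new ext4_writepages_down_read()/ext4_writepages_up_read() helpers above are intended to be used: they bracket a writeback section with the per-cpu rwsem and a memalloc_nofs scope, so allocations inside behave as GFP_NOFS and cannot recurse into the filesystem.

	static int example_writepages_section(struct super_block *sb)
	{
		int nofs = ext4_writepages_down_read(sb);	/* rwsem + enter NOFS scope */
		int ret = 0;

		/* ... writeback work; GFP_KERNEL allocations are implicitly NOFS here ... */

		ext4_writepages_up_read(sb, nofs);		/* leave NOFS scope + release rwsem */
		return ret;
	}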
@@ -2625,6 +2651,8 @@ extern void ext4_check_blocks_bitmap(struct super_block *);
 extern struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb,
                                                    ext4_group_t block_group,
                                                    struct buffer_head ** bh);
+extern struct ext4_group_info *ext4_get_group_info(struct super_block *sb,
+                                                  ext4_group_t group);
 extern int ext4_should_retry_alloc(struct super_block *sb, int *retries);
 
 extern struct buffer_head *ext4_read_block_bitmap_nowait(struct super_block *sb,
@@ -2875,7 +2903,8 @@ typedef enum {
        EXT4_IGET_NORMAL =      0,
        EXT4_IGET_SPECIAL =     0x0001, /* OK to iget a system inode */
        EXT4_IGET_HANDLE =      0x0002, /* Inode # is from a handle */
-       EXT4_IGET_BAD =         0x0004  /* Allow to iget a bad inode */
+       EXT4_IGET_BAD =         0x0004, /* Allow to iget a bad inode */
+       EXT4_IGET_EA_INODE =    0x0008  /* Inode should contain an EA value */
 } ext4_iget_flags;
 
 extern struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
@@ -2939,6 +2968,7 @@ int ext4_fileattr_set(struct mnt_idmap *idmap,
 int ext4_fileattr_get(struct dentry *dentry, struct fileattr *fa);
 extern void ext4_reset_inode_seed(struct inode *inode);
 int ext4_update_overhead(struct super_block *sb, bool force);
+int ext4_force_shutdown(struct super_block *sb, u32 flags);
 
 /* migrate.c */
 extern int ext4_ext_migrate(struct inode *);
@@ -3232,19 +3262,6 @@ static inline void ext4_isize_set(struct ext4_inode *raw_inode, loff_t i_size)
        raw_inode->i_size_high = cpu_to_le32(i_size >> 32);
 }
 
-static inline
-struct ext4_group_info *ext4_get_group_info(struct super_block *sb,
-                                           ext4_group_t group)
-{
-        struct ext4_group_info **grp_info;
-        long indexv, indexh;
-        BUG_ON(group >= EXT4_SB(sb)->s_groups_count);
-        indexv = group >> (EXT4_DESC_PER_BLOCK_BITS(sb));
-        indexh = group & ((EXT4_DESC_PER_BLOCK(sb)) - 1);
-        grp_info = sbi_array_rcu_deref(EXT4_SB(sb), s_group_info, indexv);
-        return grp_info[indexh];
-}
-
 /*
  * Reading s_groups_count requires using smp_rmb() afterwards.  See
  * the locking protocol documented in the comments of ext4_group_add()
index 7bc2210..595abb9 100644 (file)
@@ -267,14 +267,12 @@ static void __es_find_extent_range(struct inode *inode,
 
        /* see if the extent has been cached */
        es->es_lblk = es->es_len = es->es_pblk = 0;
-       if (tree->cache_es) {
-               es1 = tree->cache_es;
-               if (in_range(lblk, es1->es_lblk, es1->es_len)) {
-                       es_debug("%u cached by [%u/%u) %llu %x\n",
-                                lblk, es1->es_lblk, es1->es_len,
-                                ext4_es_pblock(es1), ext4_es_status(es1));
-                       goto out;
-               }
+       es1 = READ_ONCE(tree->cache_es);
+       if (es1 && in_range(lblk, es1->es_lblk, es1->es_len)) {
+               es_debug("%u cached by [%u/%u) %llu %x\n",
+                        lblk, es1->es_lblk, es1->es_len,
+                        ext4_es_pblock(es1), ext4_es_status(es1));
+               goto out;
        }
 
        es1 = __es_tree_search(&tree->root, lblk);
@@ -293,7 +291,7 @@ out:
        }
 
        if (es1 && matching_fn(es1)) {
-               tree->cache_es = es1;
+               WRITE_ONCE(tree->cache_es, es1);
                es->es_lblk = es1->es_lblk;
                es->es_len = es1->es_len;
                es->es_pblk = es1->es_pblk;
@@ -931,14 +929,12 @@ int ext4_es_lookup_extent(struct inode *inode, ext4_lblk_t lblk,
 
        /* find extent in cache firstly */
        es->es_lblk = es->es_len = es->es_pblk = 0;
-       if (tree->cache_es) {
-               es1 = tree->cache_es;
-               if (in_range(lblk, es1->es_lblk, es1->es_len)) {
-                       es_debug("%u cached by [%u/%u)\n",
-                                lblk, es1->es_lblk, es1->es_len);
-                       found = 1;
-                       goto out;
-               }
+       es1 = READ_ONCE(tree->cache_es);
+       if (es1 && in_range(lblk, es1->es_lblk, es1->es_len)) {
+               es_debug("%u cached by [%u/%u)\n",
+                        lblk, es1->es_lblk, es1->es_len);
+               found = 1;
+               goto out;
        }
 
        node = tree->root.rb_node;
index d101b3b..e826190 100644 (file)
@@ -147,6 +147,17 @@ static ssize_t ext4_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
        return generic_file_read_iter(iocb, to);
 }
 
+static ssize_t ext4_file_splice_read(struct file *in, loff_t *ppos,
+                                    struct pipe_inode_info *pipe,
+                                    size_t len, unsigned int flags)
+{
+       struct inode *inode = file_inode(in);
+
+       if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
+               return -EIO;
+       return filemap_splice_read(in, ppos, pipe, len, flags);
+}
+
 /*
  * Called when an inode is released. Note that this is different
  * from ext4_file_open: open gets called at every open, but release
@@ -957,7 +968,7 @@ const struct file_operations ext4_file_operations = {
        .release        = ext4_release_file,
        .fsync          = ext4_sync_file,
        .get_unmapped_area = thp_get_unmapped_area,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = ext4_file_splice_read,
        .splice_write   = iter_file_splice_write,
        .fallocate      = ext4_fallocate,
 };
index f65fdb2..2a14320 100644 (file)
@@ -108,6 +108,13 @@ static int ext4_fsync_journal(struct inode *inode, bool datasync,
        journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
        tid_t commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid;
 
+       /*
+        * Fastcommit does not really support fsync on directories or other
+        * special files. Force a full commit.
+        */
+       if (!S_ISREG(inode->i_mode))
+               return ext4_force_commit(inode->i_sb);
+
        if (journal->j_flags & JBD2_BARRIER &&
            !jbd2_trans_will_send_data_barrier(journal, commit_tid))
                *needs_barrier = true;
index 147b524..46c3423 100644 (file)
@@ -277,7 +277,11 @@ static int __ext4fs_dirhash(const struct inode *dir, const char *name, int len,
        }
        default:
                hinfo->hash = 0;
-               return -1;
+               hinfo->minor_hash = 0;
+               ext4_warning(dir->i_sb,
+                            "invalid/unsupported hash tree version %u",
+                            hinfo->hash_version);
+               return -EINVAL;
        }
        hash = hash & ~1;
        if (hash == (EXT4_HTREE_EOF_32BIT << 1))
index 787ab89..754f961 100644 (file)
@@ -91,7 +91,7 @@ static int ext4_validate_inode_bitmap(struct super_block *sb,
 
        if (buffer_verified(bh))
                return 0;
-       if (EXT4_MB_GRP_IBITMAP_CORRUPT(grp))
+       if (!grp || EXT4_MB_GRP_IBITMAP_CORRUPT(grp))
                return -EFSCORRUPTED;
 
        ext4_lock_group(sb, block_group);
@@ -293,7 +293,7 @@ void ext4_free_inode(handle_t *handle, struct inode *inode)
        }
        if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
                grp = ext4_get_group_info(sb, block_group);
-               if (unlikely(EXT4_MB_GRP_IBITMAP_CORRUPT(grp))) {
+               if (!grp || unlikely(EXT4_MB_GRP_IBITMAP_CORRUPT(grp))) {
                        fatal = -EFSCORRUPTED;
                        goto error_return;
                }
@@ -1046,7 +1046,7 @@ got_group:
                         * Skip groups with already-known suspicious inode
                         * tables
                         */
-                       if (EXT4_MB_GRP_IBITMAP_CORRUPT(grp))
+                       if (!grp || EXT4_MB_GRP_IBITMAP_CORRUPT(grp))
                                goto next_group;
                }
 
@@ -1183,6 +1183,10 @@ got:
 
                if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
                        grp = ext4_get_group_info(sb, group);
+                       if (!grp) {
+                               err = -EFSCORRUPTED;
+                               goto out;
+                       }
                        down_read(&grp->alloc_sem); /*
                                                     * protect vs itable
                                                     * lazyinit
@@ -1526,7 +1530,7 @@ int ext4_init_inode_table(struct super_block *sb, ext4_group_t group,
        }
 
        gdp = ext4_get_group_desc(sb, group, &group_desc_bh);
-       if (!gdp)
+       if (!gdp || !grp)
                goto out;
 
        /*
index 859bc4e..5854bd5 100644 (file)
@@ -34,6 +34,7 @@ static int get_max_inline_xattr_value_size(struct inode *inode,
        struct ext4_xattr_ibody_header *header;
        struct ext4_xattr_entry *entry;
        struct ext4_inode *raw_inode;
+       void *end;
        int free, min_offs;
 
        if (!EXT4_INODE_HAS_XATTR_SPACE(inode))
@@ -57,14 +58,23 @@ static int get_max_inline_xattr_value_size(struct inode *inode,
        raw_inode = ext4_raw_inode(iloc);
        header = IHDR(inode, raw_inode);
        entry = IFIRST(header);
+       end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size;
 
        /* Compute min_offs. */
-       for (; !IS_LAST_ENTRY(entry); entry = EXT4_XATTR_NEXT(entry)) {
+       while (!IS_LAST_ENTRY(entry)) {
+               void *next = EXT4_XATTR_NEXT(entry);
+
+               if (next >= end) {
+                       EXT4_ERROR_INODE(inode,
+                                        "corrupt xattr in inline inode");
+                       return 0;
+               }
                if (!entry->e_value_inum && entry->e_value_size) {
                        size_t offs = le16_to_cpu(entry->e_value_offs);
                        if (offs < min_offs)
                                min_offs = offs;
                }
+               entry = next;
        }
        free = min_offs -
                ((void *)entry - (void *)IFIRST(header)) - sizeof(__u32);
@@ -350,7 +360,7 @@ static int ext4_update_inline_data(handle_t *handle, struct inode *inode,
 
        error = ext4_xattr_ibody_get(inode, i.name_index, i.name,
                                     value, len);
-       if (error == -ENODATA)
+       if (error < 0)
                goto out;
 
        BUFFER_TRACE(is.iloc.bh, "get_write_access");
@@ -1175,6 +1185,7 @@ static int ext4_finish_convert_inline_dir(handle_t *handle,
                ext4_initialize_dirent_tail(dir_block,
                                            inode->i_sb->s_blocksize);
        set_buffer_uptodate(dir_block);
+       unlock_buffer(dir_block);
        err = ext4_handle_dirty_dirblock(handle, inode, dir_block);
        if (err)
                return err;
@@ -1249,6 +1260,7 @@ static int ext4_convert_inline_data_nolock(handle_t *handle,
        if (!S_ISDIR(inode->i_mode)) {
                memcpy(data_bh->b_data, buf, inline_size);
                set_buffer_uptodate(data_bh);
+               unlock_buffer(data_bh);
                error = ext4_handle_dirty_metadata(handle,
                                                   inode, data_bh);
        } else {
@@ -1256,7 +1268,6 @@ static int ext4_convert_inline_data_nolock(handle_t *handle,
                                                       buf, inline_size);
        }
 
-       unlock_buffer(data_bh);
 out_restore:
        if (error)
                ext4_restore_inline_data(handle, inode, iloc, buf, inline_size);
index 0d5ba92..02de439 100644 (file)
@@ -2783,11 +2783,12 @@ static int ext4_writepages(struct address_space *mapping,
                .can_map = 1,
        };
        int ret;
+       int alloc_ctx;
 
        if (unlikely(ext4_forced_shutdown(EXT4_SB(sb))))
                return -EIO;
 
-       percpu_down_read(&EXT4_SB(sb)->s_writepages_rwsem);
+       alloc_ctx = ext4_writepages_down_read(sb);
        ret = ext4_do_writepages(&mpd);
        /*
         * For data=journal writeback we could have come across pages marked
@@ -2796,7 +2797,7 @@ static int ext4_writepages(struct address_space *mapping,
         */
        if (!ret && mpd.journalled_more_data)
                ret = ext4_do_writepages(&mpd);
-       percpu_up_read(&EXT4_SB(sb)->s_writepages_rwsem);
+       ext4_writepages_up_read(sb, alloc_ctx);
 
        return ret;
 }
@@ -2824,17 +2825,18 @@ static int ext4_dax_writepages(struct address_space *mapping,
        long nr_to_write = wbc->nr_to_write;
        struct inode *inode = mapping->host;
        struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb);
+       int alloc_ctx;
 
        if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
                return -EIO;
 
-       percpu_down_read(&sbi->s_writepages_rwsem);
+       alloc_ctx = ext4_writepages_down_read(inode->i_sb);
        trace_ext4_writepages(inode, wbc);
 
        ret = dax_writeback_mapping_range(mapping, sbi->s_daxdev, wbc);
        trace_ext4_writepages_result(inode, wbc, ret,
                                     nr_to_write - wbc->nr_to_write);
-       percpu_up_read(&sbi->s_writepages_rwsem);
+       ext4_writepages_up_read(inode->i_sb, alloc_ctx);
        return ret;
 }
 
@@ -3375,7 +3377,7 @@ static int ext4_iomap_overwrite_begin(struct inode *inode, loff_t offset,
         */
        flags &= ~IOMAP_WRITE;
        ret = ext4_iomap_begin(inode, offset, length, flags, iomap, srcmap);
-       WARN_ON_ONCE(iomap->type != IOMAP_MAPPED);
+       WARN_ON_ONCE(!ret && iomap->type != IOMAP_MAPPED);
        return ret;
 }
 
@@ -4639,6 +4641,24 @@ static inline void ext4_inode_set_iversion_queried(struct inode *inode, u64 val)
                inode_set_iversion_queried(inode, val);
 }
 
+static const char *check_igot_inode(struct inode *inode, ext4_iget_flags flags)
+{
+       if (flags & EXT4_IGET_EA_INODE) {
+               if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
+                       return "missing EA_INODE flag";
+               if (ext4_test_inode_state(inode, EXT4_STATE_XATTR) ||
+                   EXT4_I(inode)->i_file_acl)
+                       return "ea_inode with extended attributes";
+       } else {
+               if ((EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
+                       return "unexpected EA_INODE flag";
+       }
+       if (is_bad_inode(inode) && !(flags & EXT4_IGET_BAD))
+               return "unexpected bad inode w/o EXT4_IGET_BAD";
+       return NULL;
+}
+
 struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
                          ext4_iget_flags flags, const char *function,
                          unsigned int line)
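
check_igot_inode() above centralizes the sanity checks that used to be scattered around
__ext4_iget() callers: an inode fetched for EA lookup must carry the EA_INODE flag and must
not have xattrs of its own, an ordinary lookup must never return an EA inode, and a bad inode
is only tolerated when the caller passed EXT4_IGET_BAD. A hedged userspace model of the same
decision table (the struct layout and names below are illustrative, not the ext4 structures):

#include <stdbool.h>
#include <stdio.h>

struct toy_inode {
	bool ea_inode_flag;   /* models EXT4_EA_INODE_FL */
	bool has_own_xattrs;  /* models EXT4_STATE_XATTR / i_file_acl */
	bool bad;             /* models is_bad_inode() */
};

static const char *toy_check_igot(const struct toy_inode *i,
				  bool want_ea, bool allow_bad)
{
	if (want_ea) {
		if (!i->ea_inode_flag)
			return "missing EA_INODE flag";
		if (i->has_own_xattrs)
			return "ea_inode with extended attributes";
	} else if (i->ea_inode_flag) {
		return "unexpected EA_INODE flag";
	}
	if (i->bad && !allow_bad)
		return "unexpected bad inode";
	return NULL;          /* looks sane */
}

int main(void)
{
	struct toy_inode regular = { 0 };
	struct toy_inode ea = { .ea_inode_flag = true };
	const char *r;

	r = toy_check_igot(&ea, true, false);
	printf("%s\n", r ? r : "ok");   /* ok */
	r = toy_check_igot(&ea, false, false);
	printf("%s\n", r ? r : "ok");   /* unexpected EA_INODE flag */
	r = toy_check_igot(&regular, false, false);
	printf("%s\n", r ? r : "ok");   /* ok */
	return 0;
}
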
@@ -4648,6 +4668,7 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
        struct ext4_inode_info *ei;
        struct ext4_super_block *es = EXT4_SB(sb)->s_es;
        struct inode *inode;
+       const char *err_str;
        journal_t *journal = EXT4_SB(sb)->s_journal;
        long ret;
        loff_t size;
@@ -4675,8 +4696,14 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
        inode = iget_locked(sb, ino);
        if (!inode)
                return ERR_PTR(-ENOMEM);
-       if (!(inode->i_state & I_NEW))
+       if (!(inode->i_state & I_NEW)) {
+               if ((err_str = check_igot_inode(inode, flags)) != NULL) {
+                       ext4_error_inode(inode, function, line, 0, err_str);
+                       iput(inode);
+                       return ERR_PTR(-EFSCORRUPTED);
+               }
                return inode;
+       }
 
        ei = EXT4_I(inode);
        iloc.bh = NULL;
@@ -4942,10 +4969,9 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
        if (IS_CASEFOLDED(inode) && !ext4_has_feature_casefold(inode->i_sb))
                ext4_error_inode(inode, function, line, 0,
                                 "casefold flag without casefold feature");
-       if (is_bad_inode(inode) && !(flags & EXT4_IGET_BAD)) {
-               ext4_error_inode(inode, function, line, 0,
-                                "bad inode without EXT4_IGET_BAD flag");
-               ret = -EUCLEAN;
+       if ((err_str = check_igot_inode(inode, flags)) != NULL) {
+               ext4_error_inode(inode, function, line, 0, err_str);
+               ret = -EFSCORRUPTED;
                goto bad_inode;
        }
 
@@ -5928,7 +5954,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
        journal_t *journal;
        handle_t *handle;
        int err;
-       struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+       int alloc_ctx;
 
        /*
         * We have to be very careful here: changing a data block's
@@ -5966,7 +5992,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
                }
        }
 
-       percpu_down_write(&sbi->s_writepages_rwsem);
+       alloc_ctx = ext4_writepages_down_write(inode->i_sb);
        jbd2_journal_lock_updates(journal);
 
        /*
@@ -5983,7 +6009,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
                err = jbd2_journal_flush(journal, 0);
                if (err < 0) {
                        jbd2_journal_unlock_updates(journal);
-                       percpu_up_write(&sbi->s_writepages_rwsem);
+                       ext4_writepages_up_write(inode->i_sb, alloc_ctx);
                        return err;
                }
                ext4_clear_inode_flag(inode, EXT4_INODE_JOURNAL_DATA);
@@ -5991,7 +6017,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
        ext4_set_aops(inode);
 
        jbd2_journal_unlock_updates(journal);
-       percpu_up_write(&sbi->s_writepages_rwsem);
+       ext4_writepages_up_write(inode->i_sb, alloc_ctx);
 
        if (val)
                filemap_invalidate_unlock(inode->i_mapping);
index f9a4301..961284c 100644 (file)
@@ -793,16 +793,9 @@ static int ext4_ioctl_setproject(struct inode *inode, __u32 projid)
 }
 #endif
 
-static int ext4_shutdown(struct super_block *sb, unsigned long arg)
+int ext4_force_shutdown(struct super_block *sb, u32 flags)
 {
        struct ext4_sb_info *sbi = EXT4_SB(sb);
-       __u32 flags;
-
-       if (!capable(CAP_SYS_ADMIN))
-               return -EPERM;
-
-       if (get_user(flags, (__u32 __user *)arg))
-               return -EFAULT;
 
        if (flags > EXT4_GOING_FLAGS_NOLOGFLUSH)
                return -EINVAL;
@@ -838,6 +831,19 @@ static int ext4_shutdown(struct super_block *sb, unsigned long arg)
        return 0;
 }
 
+static int ext4_ioctl_shutdown(struct super_block *sb, unsigned long arg)
+{
+       u32 flags;
+
+       if (!capable(CAP_SYS_ADMIN))
+               return -EPERM;
+
+       if (get_user(flags, (__u32 __user *)arg))
+               return -EFAULT;
+
+       return ext4_force_shutdown(sb, flags);
+}
+
 struct getfsmap_info {
        struct super_block      *gi_sb;
        struct fsmap_head __user *gi_data;
@@ -1566,7 +1572,7 @@ resizefs_out:
                return ext4_ioctl_get_es_cache(filp, arg);
 
        case EXT4_IOC_SHUTDOWN:
-               return ext4_shutdown(sb, arg);
+               return ext4_ioctl_shutdown(sb, arg);
 
        case FS_IOC_ENABLE_VERITY:
                if (!ext4_has_feature_verity(sb))
index 78259bd..20f67a2 100644 (file)
@@ -745,6 +745,8 @@ static int __mb_check_buddy(struct ext4_buddy *e4b, char *file,
        MB_CHECK_ASSERT(e4b->bd_info->bb_fragments == fragments);
 
        grp = ext4_get_group_info(sb, e4b->bd_group);
+       if (!grp)
+               return 0;
        list_for_each(cur, &grp->bb_prealloc_list) {
                ext4_group_t groupnr;
                struct ext4_prealloc_space *pa;
@@ -1060,9 +1062,9 @@ mb_set_largest_free_order(struct super_block *sb, struct ext4_group_info *grp)
 
 static noinline_for_stack
 void ext4_mb_generate_buddy(struct super_block *sb,
-                               void *buddy, void *bitmap, ext4_group_t group)
+                           void *buddy, void *bitmap, ext4_group_t group,
+                           struct ext4_group_info *grp)
 {
-       struct ext4_group_info *grp = ext4_get_group_info(sb, group);
        struct ext4_sb_info *sbi = EXT4_SB(sb);
        ext4_grpblk_t max = EXT4_CLUSTERS_PER_GROUP(sb);
        ext4_grpblk_t i = 0;
@@ -1181,6 +1183,8 @@ static int ext4_mb_init_cache(struct page *page, char *incore, gfp_t gfp)
                        break;
 
                grinfo = ext4_get_group_info(sb, group);
+               if (!grinfo)
+                       continue;
                /*
                 * If page is uptodate then we came here after online resize
                 * which added some new uninitialized group info structs, so
@@ -1246,6 +1250,10 @@ static int ext4_mb_init_cache(struct page *page, char *incore, gfp_t gfp)
                                group, page->index, i * blocksize);
                        trace_ext4_mb_buddy_bitmap_load(sb, group);
                        grinfo = ext4_get_group_info(sb, group);
+                       if (!grinfo) {
+                               err = -EFSCORRUPTED;
+                               goto out;
+                       }
                        grinfo->bb_fragments = 0;
                        memset(grinfo->bb_counters, 0,
                               sizeof(*grinfo->bb_counters) *
@@ -1256,7 +1264,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore, gfp_t gfp)
                        ext4_lock_group(sb, group);
                        /* init the buddy */
                        memset(data, 0xff, blocksize);
-                       ext4_mb_generate_buddy(sb, data, incore, group);
+                       ext4_mb_generate_buddy(sb, data, incore, group, grinfo);
                        ext4_unlock_group(sb, group);
                        incore = NULL;
                } else {
@@ -1370,6 +1378,9 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group, gfp_t gfp)
        might_sleep();
        mb_debug(sb, "init group %u\n", group);
        this_grp = ext4_get_group_info(sb, group);
+       if (!this_grp)
+               return -EFSCORRUPTED;
+
        /*
         * This ensures that we don't reinit the buddy cache
         * page which map to the group from which we are already
@@ -1444,6 +1455,8 @@ ext4_mb_load_buddy_gfp(struct super_block *sb, ext4_group_t group,
 
        blocks_per_page = PAGE_SIZE / sb->s_blocksize;
        grp = ext4_get_group_info(sb, group);
+       if (!grp)
+               return -EFSCORRUPTED;
 
        e4b->bd_blkbits = sb->s_blocksize_bits;
        e4b->bd_info = grp;
@@ -2049,7 +2062,7 @@ static void ext4_mb_check_limits(struct ext4_allocation_context *ac,
        if (bex->fe_len < gex->fe_len)
                return;
 
-       if (finish_group)
+       if (finish_group || ac->ac_found > sbi->s_mb_min_to_scan)
                ext4_mb_use_best_found(ac, e4b);
 }
 
@@ -2061,6 +2074,20 @@ static void ext4_mb_check_limits(struct ext4_allocation_context *ac,
  * in the context. Later, the best found extent will be used, if
  * mballoc can't find good enough extent.
  *
+ * The algorithm used is roughly as follows:
+ *
+ * * If free extent found is exactly as big as goal, then
+ *   stop the scan and use it immediately.
+ *
+ * * If free extent found is smaller than goal, then keep retrying
+ *   up to a max of sbi->s_mb_max_to_scan times (default 200). After
+ *   that stop scanning and use whatever we have.
+ *
+ * * If free extent found is bigger than goal, then keep retrying
+ *   up to a max of sbi->s_mb_min_to_scan times (default 10) before
+ *   stopping the scan and using the extent.
+ *
  * FIXME: real allocation policy is to be designed yet!
  */
 static void ext4_mb_measure_extent(struct ext4_allocation_context *ac,
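
The comment block above documents the scan cutoffs; as a rough sketch (not the mballoc code
itself, and ignoring the best-extent bookkeeping), the stop decision can be modelled in plain
C, with max_to_scan/min_to_scan standing in for sbi->s_mb_max_to_scan (default 200) and
sbi->s_mb_min_to_scan (default 10):

#include <stdbool.h>
#include <stdio.h>

static bool should_stop_scan(unsigned int found_len, unsigned int goal_len,
			     unsigned int nr_found, unsigned int max_to_scan,
			     unsigned int min_to_scan)
{
	if (found_len == goal_len)
		return true;                     /* exact fit: use it immediately */
	if (found_len < goal_len)
		return nr_found >= max_to_scan;  /* too small: give up after many tries */
	return nr_found >= min_to_scan;          /* bigger than goal: settle sooner */
}

int main(void)
{
	printf("%d\n", should_stop_scan(8, 8, 1, 200, 10));    /* 1: exact match */
	printf("%d\n", should_stop_scan(4, 8, 150, 200, 10));  /* 0: keep scanning */
	printf("%d\n", should_stop_scan(16, 8, 12, 200, 10));  /* 1: good enough */
	return 0;
}
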
@@ -2159,6 +2186,8 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
        struct ext4_group_info *grp = ext4_get_group_info(ac->ac_sb, group);
        struct ext4_free_extent ex;
 
+       if (!grp)
+               return -EFSCORRUPTED;
        if (!(ac->ac_flags & (EXT4_MB_HINT_TRY_GOAL | EXT4_MB_HINT_GOAL_ONLY)))
                return 0;
        if (grp->bb_free == 0)
@@ -2385,7 +2414,7 @@ static bool ext4_mb_good_group(struct ext4_allocation_context *ac,
 
        BUG_ON(cr < 0 || cr >= 4);
 
-       if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(grp)))
+       if (unlikely(!grp || EXT4_MB_GRP_BBITMAP_CORRUPT(grp)))
                return false;
 
        free = grp->bb_free;
@@ -2454,6 +2483,8 @@ static int ext4_mb_good_group_nolock(struct ext4_allocation_context *ac,
        ext4_grpblk_t free;
        int ret = 0;
 
+       if (!grp)
+               return -EFSCORRUPTED;
        if (sbi->s_mb_stats)
                atomic64_inc(&sbi->s_bal_cX_groups_considered[ac->ac_criteria]);
        if (should_lock) {
@@ -2534,7 +2565,7 @@ ext4_group_t ext4_mb_prefetch(struct super_block *sb, ext4_group_t group,
                 * prefetch once, so we avoid getblk() call, which can
                 * be expensive.
                 */
-               if (!EXT4_MB_GRP_TEST_AND_SET_READ(grp) &&
+               if (gdp && grp && !EXT4_MB_GRP_TEST_AND_SET_READ(grp) &&
                    EXT4_MB_GRP_NEED_INIT(grp) &&
                    ext4_free_group_clusters(sb, gdp) > 0 &&
                    !(ext4_has_group_desc_csum(sb) &&
@@ -2578,7 +2609,7 @@ void ext4_mb_prefetch_fini(struct super_block *sb, ext4_group_t group,
                gdp = ext4_get_group_desc(sb, group, NULL);
                grp = ext4_get_group_info(sb, group);
 
-               if (EXT4_MB_GRP_NEED_INIT(grp) &&
+               if (grp && gdp && EXT4_MB_GRP_NEED_INIT(grp) &&
                    ext4_free_group_clusters(sb, gdp) > 0 &&
                    !(ext4_has_group_desc_csum(sb) &&
                      (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)))) {
@@ -2837,6 +2868,8 @@ static int ext4_mb_seq_groups_show(struct seq_file *seq, void *v)
                sizeof(struct ext4_group_info);
 
        grinfo = ext4_get_group_info(sb, group);
+       if (!grinfo)
+               return 0;
        /* Load the group info in memory only if not already loaded. */
        if (unlikely(EXT4_MB_GRP_NEED_INIT(grinfo))) {
                err = ext4_mb_load_buddy(sb, group, &e4b);
@@ -2847,7 +2880,7 @@ static int ext4_mb_seq_groups_show(struct seq_file *seq, void *v)
                buddy_loaded = 1;
        }
 
-       memcpy(&sg, ext4_get_group_info(sb, group), i);
+       memcpy(&sg, grinfo, i);
 
        if (buddy_loaded)
                ext4_mb_unload_buddy(&e4b);
@@ -3208,8 +3241,12 @@ static int ext4_mb_init_backend(struct super_block *sb)
 
 err_freebuddy:
        cachep = get_groupinfo_cache(sb->s_blocksize_bits);
-       while (i-- > 0)
-               kmem_cache_free(cachep, ext4_get_group_info(sb, i));
+       while (i-- > 0) {
+               struct ext4_group_info *grp = ext4_get_group_info(sb, i);
+
+               if (grp)
+                       kmem_cache_free(cachep, grp);
+       }
        i = sbi->s_group_info_size;
        rcu_read_lock();
        group_info = rcu_dereference(sbi->s_group_info);
@@ -3522,6 +3559,8 @@ int ext4_mb_release(struct super_block *sb)
                for (i = 0; i < ngroups; i++) {
                        cond_resched();
                        grinfo = ext4_get_group_info(sb, i);
+                       if (!grinfo)
+                               continue;
                        mb_group_bb_bitmap_free(grinfo);
                        ext4_lock_group(sb, i);
                        count = ext4_mb_cleanup_pa(grinfo);
@@ -4606,6 +4645,8 @@ static void ext4_mb_generate_from_freelist(struct super_block *sb, void *bitmap,
        struct ext4_free_data *entry;
 
        grp = ext4_get_group_info(sb, group);
+       if (!grp)
+               return;
        n = rb_first(&(grp->bb_free_root));
 
        while (n) {
@@ -4633,6 +4674,9 @@ void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap,
        int preallocated = 0;
        int len;
 
+       if (!grp)
+               return;
+
        /* all form of preallocation discards first load group,
         * so the only competing code is preallocation use.
         * we don't need any locking here
@@ -4869,6 +4913,8 @@ adjust_bex:
 
        ei = EXT4_I(ac->ac_inode);
        grp = ext4_get_group_info(sb, ac->ac_b_ex.fe_group);
+       if (!grp)
+               return;
 
        pa->pa_node_lock.inode_lock = &ei->i_prealloc_lock;
        pa->pa_inode = ac->ac_inode;
@@ -4918,6 +4964,8 @@ ext4_mb_new_group_pa(struct ext4_allocation_context *ac)
        atomic_add(pa->pa_free, &EXT4_SB(sb)->s_mb_preallocated);
 
        grp = ext4_get_group_info(sb, ac->ac_b_ex.fe_group);
+       if (!grp)
+               return;
        lg = ac->ac_lg;
        BUG_ON(lg == NULL);
 
@@ -5013,7 +5061,11 @@ ext4_mb_release_group_pa(struct ext4_buddy *e4b,
        trace_ext4_mb_release_group_pa(sb, pa);
        BUG_ON(pa->pa_deleted == 0);
        ext4_get_group_no_and_offset(sb, pa->pa_pstart, &group, &bit);
-       BUG_ON(group != e4b->bd_group && pa->pa_len != 0);
+       if (unlikely(group != e4b->bd_group && pa->pa_len != 0)) {
+               ext4_warning(sb, "bad group: expected %u, group %u, pa_start %llu",
+                            e4b->bd_group, group, pa->pa_pstart);
+               return 0;
+       }
        mb_free_blocks(pa->pa_inode, e4b, bit, pa->pa_len);
        atomic_add(pa->pa_len, &EXT4_SB(sb)->s_mb_discarded);
        trace_ext4_mballoc_discard(sb, NULL, group, bit, pa->pa_len);
@@ -5043,6 +5095,8 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
        int err;
        int free = 0;
 
+       if (!grp)
+               return 0;
        mb_debug(sb, "discard preallocation for group %u\n", group);
        if (list_empty(&grp->bb_prealloc_list))
                goto out_dbg;
@@ -5297,6 +5351,9 @@ static inline void ext4_mb_show_pa(struct super_block *sb)
                struct ext4_prealloc_space *pa;
                ext4_grpblk_t start;
                struct list_head *cur;
+
+               if (!grp)
+                       continue;
                ext4_lock_group(sb, i);
                list_for_each(cur, &grp->bb_prealloc_list) {
                        pa = list_entry(cur, struct ext4_prealloc_space,
@@ -6064,6 +6121,7 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
        struct buffer_head *bitmap_bh = NULL;
        struct super_block *sb = inode->i_sb;
        struct ext4_group_desc *gdp;
+       struct ext4_group_info *grp;
        unsigned int overflow;
        ext4_grpblk_t bit;
        struct buffer_head *gd_bh;
@@ -6089,8 +6147,8 @@ do_more:
        overflow = 0;
        ext4_get_group_no_and_offset(sb, block, &block_group, &bit);
 
-       if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(
-                       ext4_get_group_info(sb, block_group))))
+       grp = ext4_get_group_info(sb, block_group);
+       if (unlikely(!grp || EXT4_MB_GRP_BBITMAP_CORRUPT(grp)))
                return;
 
        /*
@@ -6692,6 +6750,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
 
        for (group = first_group; group <= last_group; group++) {
                grp = ext4_get_group_info(sb, group);
+               if (!grp)
+                       continue;
                /* We only do this if the grp has never been initialized */
                if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
                        ret = ext4_mb_init_group(sb, group, GFP_NOFS);
index a19a966..d98ac2a 100644 (file)
@@ -408,7 +408,6 @@ static int free_ext_block(handle_t *handle, struct inode *inode)
 
 int ext4_ext_migrate(struct inode *inode)
 {
-       struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
        handle_t *handle;
        int retval = 0, i;
        __le32 *i_data;
@@ -418,6 +417,7 @@ int ext4_ext_migrate(struct inode *inode)
        unsigned long max_entries;
        __u32 goal, tmp_csum_seed;
        uid_t owner[2];
+       int alloc_ctx;
 
        /*
         * If the filesystem does not support extents, or the inode
@@ -434,7 +434,7 @@ int ext4_ext_migrate(struct inode *inode)
                 */
                return retval;
 
-       percpu_down_write(&sbi->s_writepages_rwsem);
+       alloc_ctx = ext4_writepages_down_write(inode->i_sb);
 
        /*
         * Worst case we can touch the allocation bitmaps and a block
@@ -586,7 +586,7 @@ out_tmp_inode:
        unlock_new_inode(tmp_inode);
        iput(tmp_inode);
 out_unlock:
-       percpu_up_write(&sbi->s_writepages_rwsem);
+       ext4_writepages_up_write(inode->i_sb, alloc_ctx);
        return retval;
 }
 
@@ -605,6 +605,7 @@ int ext4_ind_migrate(struct inode *inode)
        ext4_fsblk_t                    blk;
        handle_t                        *handle;
        int                             ret, ret2 = 0;
+       int                             alloc_ctx;
 
        if (!ext4_has_feature_extents(inode->i_sb) ||
            (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)))
@@ -621,7 +622,7 @@ int ext4_ind_migrate(struct inode *inode)
        if (test_opt(inode->i_sb, DELALLOC))
                ext4_alloc_da_blocks(inode);
 
-       percpu_down_write(&sbi->s_writepages_rwsem);
+       alloc_ctx = ext4_writepages_down_write(inode->i_sb);
 
        handle = ext4_journal_start(inode, EXT4_HT_MIGRATE, 1);
        if (IS_ERR(handle)) {
@@ -665,6 +666,6 @@ errout:
        ext4_journal_stop(handle);
        up_write(&EXT4_I(inode)->i_data_sem);
 out_unlock:
-       percpu_up_write(&sbi->s_writepages_rwsem);
+       ext4_writepages_up_write(inode->i_sb, alloc_ctx);
        return ret;
 }
index 4022bc7..0aaf38f 100644 (file)
@@ -39,28 +39,36 @@ static void ext4_mmp_csum_set(struct super_block *sb, struct mmp_struct *mmp)
  * Write the MMP block using REQ_SYNC to try to get the block on-disk
  * faster.
  */
-static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
+static int write_mmp_block_thawed(struct super_block *sb,
+                                 struct buffer_head *bh)
 {
        struct mmp_struct *mmp = (struct mmp_struct *)(bh->b_data);
 
-       /*
-        * We protect against freezing so that we don't create dirty buffers
-        * on frozen filesystem.
-        */
-       sb_start_write(sb);
        ext4_mmp_csum_set(sb, mmp);
        lock_buffer(bh);
        bh->b_end_io = end_buffer_write_sync;
        get_bh(bh);
        submit_bh(REQ_OP_WRITE | REQ_SYNC | REQ_META | REQ_PRIO, bh);
        wait_on_buffer(bh);
-       sb_end_write(sb);
        if (unlikely(!buffer_uptodate(bh)))
                return -EIO;
-
        return 0;
 }
 
+static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
+{
+       int err;
+
+       /*
+        * We protect against freezing so that we don't create dirty buffers
+        * on frozen filesystem.
+        */
+       sb_start_write(sb);
+       err = write_mmp_block_thawed(sb, bh);
+       sb_end_write(sb);
+       return err;
+}
+
 /*
  * Read the MMP block. It _must_ be read from disk and hence we clear the
  * uptodate flag on the buffer.
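
The split above is the usual "raw helper plus protected wrapper" shape: write_mmp_block_thawed()
does the actual I/O, and write_mmp_block() only wraps it in sb_start_write()/sb_end_write() so
normal callers cannot dirty buffers on a frozen filesystem, while mount/remount (already
serialized by s_umount, see the later hunk) can call the thawed variant directly and avoid the
lockdep complaint. A hedged userspace sketch of that shape, with all names below as
illustrative stand-ins:

#include <stdio.h>

static int write_block_raw(void)        /* stands in for write_mmp_block_thawed() */
{
	puts("write MMP block");
	return 0;
}

static void freeze_protect(void)   { puts("sb_start_write"); }
static void freeze_unprotect(void) { puts("sb_end_write"); }

static int write_block(void)            /* stands in for write_mmp_block() */
{
	int err;

	freeze_protect();
	err = write_block_raw();
	freeze_unprotect();
	return err;
}

int main(void)
{
	write_block();       /* normal path: takes freeze protection */
	write_block_raw();   /* mount/remount path: caller already serialized */
	return 0;
}
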
@@ -344,7 +352,11 @@ skip:
        seq = mmp_new_seq();
        mmp->mmp_seq = cpu_to_le32(seq);
 
-       retval = write_mmp_block(sb, bh);
+       /*
+        * On mount / remount we are protected against fs freezing (by the
+        * s_umount semaphore) and grabbing freeze protection upsets lockdep.
+        */
+       retval = write_mmp_block_thawed(sb, bh);
        if (retval)
                goto failed;
 
index a5010b5..0caf6c7 100644 (file)
@@ -674,7 +674,7 @@ static struct stats dx_show_leaf(struct inode *dir,
                                len = de->name_len;
                                if (!IS_ENCRYPTED(dir)) {
                                        /* Directory is not encrypted */
-                                       ext4fs_dirhash(dir, de->name,
+                                       (void) ext4fs_dirhash(dir, de->name,
                                                de->name_len, &h);
                                        printk("%*.s:(U)%x.%u ", len,
                                               name, h.hash,
@@ -709,8 +709,9 @@ static struct stats dx_show_leaf(struct inode *dir,
                                        if (IS_CASEFOLDED(dir))
                                                h.hash = EXT4_DIRENT_HASH(de);
                                        else
-                                               ext4fs_dirhash(dir, de->name,
-                                                      de->name_len, &h);
+                                               (void) ext4fs_dirhash(dir,
+                                                       de->name,
+                                                       de->name_len, &h);
                                        printk("%*.s:(E)%x.%u ", len, name,
                                               h.hash, (unsigned) ((char *) de
                                                                   - base));
@@ -720,7 +721,8 @@ static struct stats dx_show_leaf(struct inode *dir,
 #else
                                int len = de->name_len;
                                char *name = de->name;
-                               ext4fs_dirhash(dir, de->name, de->name_len, &h);
+                               (void) ext4fs_dirhash(dir, de->name,
+                                                     de->name_len, &h);
                                printk("%*.s:%x.%u ", len, name, h.hash,
                                       (unsigned) ((char *) de - base));
 #endif
@@ -849,8 +851,14 @@ dx_probe(struct ext4_filename *fname, struct inode *dir,
        hinfo->seed = EXT4_SB(dir->i_sb)->s_hash_seed;
        /* hash is already computed for encrypted casefolded directory */
        if (fname && fname_name(fname) &&
-                               !(IS_ENCRYPTED(dir) && IS_CASEFOLDED(dir)))
-               ext4fs_dirhash(dir, fname_name(fname), fname_len(fname), hinfo);
+           !(IS_ENCRYPTED(dir) && IS_CASEFOLDED(dir))) {
+               int ret = ext4fs_dirhash(dir, fname_name(fname),
+                                        fname_len(fname), hinfo);
+               if (ret < 0) {
+                       ret_err = ERR_PTR(ret);
+                       goto fail;
+               }
+       }
        hash = hinfo->hash;
 
        if (root->info.unused_flags & 1) {
@@ -1111,7 +1119,12 @@ static int htree_dirblock_to_tree(struct file *dir_file,
                                hinfo->minor_hash = 0;
                        }
                } else {
-                       ext4fs_dirhash(dir, de->name, de->name_len, hinfo);
+                       err = ext4fs_dirhash(dir, de->name,
+                                            de->name_len, hinfo);
+                       if (err < 0) {
+                               count = err;
+                               goto errout;
+                       }
                }
                if ((hinfo->hash < start_hash) ||
                    ((hinfo->hash == start_hash) &&
@@ -1313,8 +1326,12 @@ static int dx_make_map(struct inode *dir, struct buffer_head *bh,
                if (de->name_len && de->inode) {
                        if (ext4_hash_in_dirent(dir))
                                h.hash = EXT4_DIRENT_HASH(de);
-                       else
-                               ext4fs_dirhash(dir, de->name, de->name_len, &h);
+                       else {
+                               int err = ext4fs_dirhash(dir, de->name,
+                                                    de->name_len, &h);
+                               if (err < 0)
+                                       return err;
+                       }
                        map_tail--;
                        map_tail->hash = h.hash;
                        map_tail->offs = ((char *) de - base)>>2;
@@ -1452,10 +1469,9 @@ int ext4_fname_setup_ci_filename(struct inode *dir, const struct qstr *iname,
        hinfo->hash_version = DX_HASH_SIPHASH;
        hinfo->seed = NULL;
        if (cf_name->name)
-               ext4fs_dirhash(dir, cf_name->name, cf_name->len, hinfo);
+               return ext4fs_dirhash(dir, cf_name->name, cf_name->len, hinfo);
        else
-               ext4fs_dirhash(dir, iname->name, iname->len, hinfo);
-       return 0;
+               return ext4fs_dirhash(dir, iname->name, iname->len, hinfo);
 }
 #endif
 
@@ -2298,10 +2314,15 @@ static int make_indexed_dir(handle_t *handle, struct ext4_filename *fname,
        fname->hinfo.seed = EXT4_SB(dir->i_sb)->s_hash_seed;
 
        /* casefolded encrypted hashes are computed on fname setup */
-       if (!ext4_hash_in_dirent(dir))
-               ext4fs_dirhash(dir, fname_name(fname),
-                               fname_len(fname), &fname->hinfo);
-
+       if (!ext4_hash_in_dirent(dir)) {
+               int err = ext4fs_dirhash(dir, fname_name(fname),
+                                        fname_len(fname), &fname->hinfo);
+               if (err < 0) {
+                       brelse(bh2);
+                       brelse(bh);
+                       return err;
+               }
+       }
        memset(frames, 0, sizeof(frames));
        frame = frames;
        frame->entries = entries;
@@ -3813,19 +3834,10 @@ static int ext4_rename(struct mnt_idmap *idmap, struct inode *old_dir,
                        return retval;
        }
 
-       /*
-        * We need to protect against old.inode directory getting converted
-        * from inline directory format into a normal one.
-        */
-       if (S_ISDIR(old.inode->i_mode))
-               inode_lock_nested(old.inode, I_MUTEX_NONDIR2);
-
        old.bh = ext4_find_entry(old.dir, &old.dentry->d_name, &old.de,
                                 &old.inlined);
-       if (IS_ERR(old.bh)) {
-               retval = PTR_ERR(old.bh);
-               goto unlock_moved_dir;
-       }
+       if (IS_ERR(old.bh))
+               return PTR_ERR(old.bh);
 
        /*
         *  Check for inode number is _not_ due to possible IO errors.
@@ -4022,10 +4034,6 @@ release_bh:
        brelse(old.bh);
        brelse(new.bh);
 
-unlock_moved_dir:
-       if (S_ISDIR(old.inode->i_mode))
-               inode_unlock(old.inode);
-
        return retval;
 }
 
index d39f386..eaa5858 100644 (file)
@@ -1048,6 +1048,8 @@ void ext4_mark_group_bitmap_corrupted(struct super_block *sb,
        struct ext4_group_desc *gdp = ext4_get_group_desc(sb, group, NULL);
        int ret;
 
+       if (!grp || !gdp)
+               return;
        if (flags & EXT4_GROUP_INFO_BBITMAP_CORRUPT) {
                ret = ext4_test_and_set_bit(EXT4_GROUP_INFO_BBITMAP_CORRUPT_BIT,
                                            &grp->bb_state);
@@ -1094,6 +1096,15 @@ void ext4_update_dynamic_rev(struct super_block *sb)
         */
 }
 
+static void ext4_bdev_mark_dead(struct block_device *bdev)
+{
+       ext4_force_shutdown(bdev->bd_holder, EXT4_GOING_FLAGS_NOLOGFLUSH);
+}
+
+static const struct blk_holder_ops ext4_holder_ops = {
+       .mark_dead              = ext4_bdev_mark_dead,
+};
+
 /*
  * Open the external journal device
  */
@@ -1101,7 +1112,8 @@ static struct block_device *ext4_blkdev_get(dev_t dev, struct super_block *sb)
 {
        struct block_device *bdev;
 
-       bdev = blkdev_get_by_dev(dev, FMODE_READ|FMODE_WRITE|FMODE_EXCL, sb);
+       bdev = blkdev_get_by_dev(dev, BLK_OPEN_READ | BLK_OPEN_WRITE, sb,
+                                &ext4_holder_ops);
        if (IS_ERR(bdev))
                goto fail;
        return bdev;
@@ -1116,17 +1128,12 @@ fail:
 /*
  * Release the journal device
  */
-static void ext4_blkdev_put(struct block_device *bdev)
-{
-       blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
-}
-
 static void ext4_blkdev_remove(struct ext4_sb_info *sbi)
 {
        struct block_device *bdev;
        bdev = sbi->s_journal_bdev;
        if (bdev) {
-               ext4_blkdev_put(bdev);
+               blkdev_put(bdev, sbi->s_sb);
                sbi->s_journal_bdev = NULL;
        }
 }
@@ -1447,6 +1454,11 @@ static void ext4_destroy_inode(struct inode *inode)
                         EXT4_I(inode)->i_reserved_data_blocks);
 }
 
+static void ext4_shutdown(struct super_block *sb)
+{
+       ext4_force_shutdown(sb, EXT4_GOING_FLAGS_NOLOGFLUSH);
+}
+
 static void init_once(void *foo)
 {
        struct ext4_inode_info *ei = foo;
@@ -1607,6 +1619,7 @@ static const struct super_operations ext4_sops = {
        .unfreeze_fs    = ext4_unfreeze,
        .statfs         = ext4_statfs,
        .show_options   = ext4_show_options,
+       .shutdown       = ext4_shutdown,
 #ifdef CONFIG_QUOTA
        .quota_read     = ext4_quota_read,
        .quota_write    = ext4_quota_write,
@@ -3238,11 +3251,9 @@ static __le16 ext4_group_desc_csum(struct super_block *sb, __u32 block_group,
        crc = crc16(crc, (__u8 *)gdp, offset);
        offset += sizeof(gdp->bg_checksum); /* skip checksum */
        /* for checksum of struct ext4_group_desc do the rest...*/
-       if (ext4_has_feature_64bit(sb) &&
-           offset < le16_to_cpu(sbi->s_es->s_desc_size))
+       if (ext4_has_feature_64bit(sb) && offset < sbi->s_desc_size)
                crc = crc16(crc, (__u8 *)gdp + offset,
-                           le16_to_cpu(sbi->s_es->s_desc_size) -
-                               offset);
+                           sbi->s_desc_size - offset);
 
 out:
        return cpu_to_le16(crc);
@@ -5684,8 +5695,9 @@ static int ext4_fill_super(struct super_block *sb, struct fs_context *fc)
                descr = "out journal";
 
        if (___ratelimit(&ext4_mount_msg_ratelimit, "EXT4-fs mount"))
-               ext4_msg(sb, KERN_INFO, "mounted filesystem %pU with%s. "
-                        "Quota mode: %s.", &sb->s_uuid, descr,
+               ext4_msg(sb, KERN_INFO, "mounted filesystem %pU %s with%s. "
+                        "Quota mode: %s.", &sb->s_uuid,
+                        sb_rdonly(sb) ? "ro" : "r/w", descr,
                         ext4_quota_mode(sb));
 
        /* Update the s_overhead_clusters if necessary */
@@ -5898,7 +5910,7 @@ static journal_t *ext4_get_dev_journal(struct super_block *sb,
 out_journal:
        jbd2_journal_destroy(journal);
 out_bdev:
-       ext4_blkdev_put(bdev);
+       blkdev_put(bdev, sb);
        return NULL;
 }
 
@@ -6587,18 +6599,6 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
        }
 
        /*
-        * Reinitialize lazy itable initialization thread based on
-        * current settings
-        */
-       if (sb_rdonly(sb) || !test_opt(sb, INIT_INODE_TABLE))
-               ext4_unregister_li_request(sb);
-       else {
-               ext4_group_t first_not_zeroed;
-               first_not_zeroed = ext4_has_uninit_itable(sb);
-               ext4_register_li_request(sb, first_not_zeroed);
-       }
-
-       /*
         * Handle creation of system zone data early because it can fail.
         * Releasing of existing data is done when we are sure remount will
         * succeed.
@@ -6616,9 +6616,6 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
        }
 
 #ifdef CONFIG_QUOTA
-       /* Release old quota file names */
-       for (i = 0; i < EXT4_MAXQUOTAS; i++)
-               kfree(old_opts.s_qf_names[i]);
        if (enable_quota) {
                if (sb_any_quota_suspended(sb))
                        dquot_resume(sb, -1);
@@ -6628,16 +6625,38 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
                                goto restore_opts;
                }
        }
+       /* Release old quota file names */
+       for (i = 0; i < EXT4_MAXQUOTAS; i++)
+               kfree(old_opts.s_qf_names[i]);
 #endif
        if (!test_opt(sb, BLOCK_VALIDITY) && sbi->s_system_blks)
                ext4_release_system_zone(sb);
 
+       /*
+        * Reinitialize lazy itable initialization thread based on
+        * current settings
+        */
+       if (sb_rdonly(sb) || !test_opt(sb, INIT_INODE_TABLE))
+               ext4_unregister_li_request(sb);
+       else {
+               ext4_group_t first_not_zeroed;
+               first_not_zeroed = ext4_has_uninit_itable(sb);
+               ext4_register_li_request(sb, first_not_zeroed);
+       }
+
        if (!ext4_has_feature_mmp(sb) || sb_rdonly(sb))
                ext4_stop_mmpd(sbi);
 
        return 0;
 
 restore_opts:
+       /*
+        * If there was a failing r/w to ro transition, we may need to
+        * re-enable quota
+        */
+       if ((sb->s_flags & SB_RDONLY) && !(old_sb_flags & SB_RDONLY) &&
+           sb_any_quota_suspended(sb))
+               dquot_resume(sb, -1);
        sb->s_flags = old_sb_flags;
        sbi->s_mount_opt = old_opts.s_mount_opt;
        sbi->s_mount_opt2 = old_opts.s_mount_opt2;
@@ -6678,8 +6697,9 @@ static int ext4_reconfigure(struct fs_context *fc)
        if (ret < 0)
                return ret;
 
-       ext4_msg(sb, KERN_INFO, "re-mounted %pU. Quota mode: %s.",
-                &sb->s_uuid, ext4_quota_mode(sb));
+       ext4_msg(sb, KERN_INFO, "re-mounted %pU %s. Quota mode: %s.",
+                &sb->s_uuid, sb_rdonly(sb) ? "ro" : "r/w",
+                ext4_quota_mode(sb));
 
        return 0;
 }
index dadad29..321e3a8 100644 (file)
@@ -121,7 +121,11 @@ ext4_expand_inode_array(struct ext4_xattr_inode_array **ea_inode_array,
 #ifdef CONFIG_LOCKDEP
 void ext4_xattr_inode_set_class(struct inode *ea_inode)
 {
+       struct ext4_inode_info *ei = EXT4_I(ea_inode);
+
        lockdep_set_subclass(&ea_inode->i_rwsem, 1);
+       (void) ei;      /* shut up clang warning if !CONFIG_LOCKDEP */
+       lockdep_set_subclass(&ei->i_data_sem, I_DATA_SEM_EA);
 }
 #endif
 
@@ -433,7 +437,7 @@ static int ext4_xattr_inode_iget(struct inode *parent, unsigned long ea_ino,
                return -EFSCORRUPTED;
        }
 
-       inode = ext4_iget(parent->i_sb, ea_ino, EXT4_IGET_NORMAL);
+       inode = ext4_iget(parent->i_sb, ea_ino, EXT4_IGET_EA_INODE);
        if (IS_ERR(inode)) {
                err = PTR_ERR(inode);
                ext4_error(parent->i_sb,
@@ -441,23 +445,6 @@ static int ext4_xattr_inode_iget(struct inode *parent, unsigned long ea_ino,
                           err);
                return err;
        }
-
-       if (is_bad_inode(inode)) {
-               ext4_error(parent->i_sb,
-                          "error while reading EA inode %lu is_bad_inode",
-                          ea_ino);
-               err = -EIO;
-               goto error;
-       }
-
-       if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL)) {
-               ext4_error(parent->i_sb,
-                          "EA inode %lu does not have EXT4_EA_INODE_FL flag",
-                           ea_ino);
-               err = -EINVAL;
-               goto error;
-       }
-
        ext4_xattr_inode_set_class(inode);
 
        /*
@@ -478,9 +465,6 @@ static int ext4_xattr_inode_iget(struct inode *parent, unsigned long ea_ino,
 
        *ea_inode = inode;
        return 0;
-error:
-       iput(inode);
-       return err;
 }
 
 /* Remove entry from mbcache when EA inode is getting evicted */
@@ -1556,11 +1540,11 @@ ext4_xattr_inode_cache_find(struct inode *inode, const void *value,
 
        while (ce) {
                ea_inode = ext4_iget(inode->i_sb, ce->e_value,
-                                    EXT4_IGET_NORMAL);
-               if (!IS_ERR(ea_inode) &&
-                   !is_bad_inode(ea_inode) &&
-                   (EXT4_I(ea_inode)->i_flags & EXT4_EA_INODE_FL) &&
-                   i_size_read(ea_inode) == value_len &&
+                                    EXT4_IGET_EA_INODE);
+               if (IS_ERR(ea_inode))
+                       goto next_entry;
+               ext4_xattr_inode_set_class(ea_inode);
+               if (i_size_read(ea_inode) == value_len &&
                    !ext4_xattr_inode_read(ea_inode, ea_data, value_len) &&
                    !ext4_xattr_inode_verify_hashes(ea_inode, NULL, ea_data,
                                                    value_len) &&
@@ -1570,9 +1554,8 @@ ext4_xattr_inode_cache_find(struct inode *inode, const void *value,
                        kvfree(ea_data);
                        return ea_inode;
                }
-
-               if (!IS_ERR(ea_inode))
-                       iput(ea_inode);
+               iput(ea_inode);
+       next_entry:
                ce = mb_cache_entry_find_next(ea_inode_cache, ce);
        }
        kvfree(ea_data);
@@ -2073,8 +2056,9 @@ inserted:
                        else {
                                u32 ref;
 
+#ifdef EXT4_XATTR_DEBUG
                                WARN_ON_ONCE(dquot_initialize_needed(inode));
-
+#endif
                                /* The old block is released after updating
                                   the inode. */
                                error = dquot_alloc_block(inode,
@@ -2137,8 +2121,9 @@ inserted:
                        /* We need to allocate a new block */
                        ext4_fsblk_t goal, block;
 
+#ifdef EXT4_XATTR_DEBUG
                        WARN_ON_ONCE(dquot_initialize_needed(inode));
-
+#endif
                        goal = ext4_group_first_block_no(sb,
                                                EXT4_I(inode)->i_block_group);
                        block = ext4_new_meta_blocks(handle, inode, goal, 0,
@@ -2614,6 +2599,7 @@ static int ext4_xattr_move_to_block(handle_t *handle, struct inode *inode,
                .in_inode = !!entry->e_value_inum,
        };
        struct ext4_xattr_ibody_header *header = IHDR(inode, raw_inode);
+       int needs_kvfree = 0;
        int error;
 
        is = kzalloc(sizeof(struct ext4_xattr_ibody_find), GFP_NOFS);
@@ -2636,7 +2622,7 @@ static int ext4_xattr_move_to_block(handle_t *handle, struct inode *inode,
                        error = -ENOMEM;
                        goto out;
                }
-
+               needs_kvfree = 1;
                error = ext4_xattr_inode_get(inode, entry, buffer, value_size);
                if (error)
                        goto out;
@@ -2675,7 +2661,7 @@ static int ext4_xattr_move_to_block(handle_t *handle, struct inode *inode,
 
 out:
        kfree(b_entry_name);
-       if (entry->e_value_inum && buffer)
+       if (needs_kvfree && buffer)
                kvfree(buffer);
        if (is)
                brelse(is->iloc.bh);
index 5ac53d2..3fce122 100644 (file)
@@ -4367,22 +4367,23 @@ out:
        return ret;
 }
 
-static void f2fs_trace_rw_file_path(struct kiocb *iocb, size_t count, int rw)
+static void f2fs_trace_rw_file_path(struct file *file, loff_t pos, size_t count,
+                                   int rw)
 {
-       struct inode *inode = file_inode(iocb->ki_filp);
+       struct inode *inode = file_inode(file);
        char *buf, *path;
 
        buf = f2fs_getname(F2FS_I_SB(inode));
        if (!buf)
                return;
-       path = dentry_path_raw(file_dentry(iocb->ki_filp), buf, PATH_MAX);
+       path = dentry_path_raw(file_dentry(file), buf, PATH_MAX);
        if (IS_ERR(path))
                goto free_buf;
        if (rw == WRITE)
-               trace_f2fs_datawrite_start(inode, iocb->ki_pos, count,
+               trace_f2fs_datawrite_start(inode, pos, count,
                                current->pid, path, current->comm);
        else
-               trace_f2fs_dataread_start(inode, iocb->ki_pos, count,
+               trace_f2fs_dataread_start(inode, pos, count,
                                current->pid, path, current->comm);
 free_buf:
        f2fs_putname(buf);
@@ -4398,7 +4399,8 @@ static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
                return -EOPNOTSUPP;
 
        if (trace_f2fs_dataread_start_enabled())
-               f2fs_trace_rw_file_path(iocb, iov_iter_count(to), READ);
+               f2fs_trace_rw_file_path(iocb->ki_filp, iocb->ki_pos,
+                                       iov_iter_count(to), READ);
 
        if (f2fs_should_use_dio(inode, iocb, to)) {
                ret = f2fs_dio_read_iter(iocb, to);
@@ -4413,6 +4415,30 @@ static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
        return ret;
 }
 
+static ssize_t f2fs_file_splice_read(struct file *in, loff_t *ppos,
+                                    struct pipe_inode_info *pipe,
+                                    size_t len, unsigned int flags)
+{
+       struct inode *inode = file_inode(in);
+       const loff_t pos = *ppos;
+       ssize_t ret;
+
+       if (!f2fs_is_compress_backend_ready(inode))
+               return -EOPNOTSUPP;
+
+       if (trace_f2fs_dataread_start_enabled())
+               f2fs_trace_rw_file_path(in, pos, len, READ);
+
+       ret = filemap_splice_read(in, ppos, pipe, len, flags);
+       if (ret > 0)
+               f2fs_update_iostat(F2FS_I_SB(inode), inode,
+                                  APP_BUFFERED_READ_IO, ret);
+
+       if (trace_f2fs_dataread_end_enabled())
+               trace_f2fs_dataread_end(inode, pos, ret);
+       return ret;
+}
+
 static ssize_t f2fs_write_checks(struct kiocb *iocb, struct iov_iter *from)
 {
        struct file *file = iocb->ki_filp;
@@ -4714,7 +4740,8 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
                ret = preallocated;
        } else {
                if (trace_f2fs_datawrite_start_enabled())
-                       f2fs_trace_rw_file_path(iocb, orig_count, WRITE);
+                       f2fs_trace_rw_file_path(iocb->ki_filp, iocb->ki_pos,
+                                               orig_count, WRITE);
 
                /* Do the actual write. */
                ret = dio ?
@@ -4919,7 +4946,7 @@ const struct file_operations f2fs_file_operations = {
 #ifdef CONFIG_COMPAT
        .compat_ioctl   = f2fs_compat_ioctl,
 #endif
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = f2fs_file_splice_read,
        .splice_write   = iter_file_splice_write,
        .fadvise        = f2fs_file_fadvise,
 };
index 77a7127..ad597b4 100644 (file)
@@ -995,20 +995,12 @@ static int f2fs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
                        goto out;
        }
 
-       /*
-        * Copied from ext4_rename: we need to protect against old.inode
-        * directory getting converted from inline directory format into
-        * a normal one.
-        */
-       if (S_ISDIR(old_inode->i_mode))
-               inode_lock_nested(old_inode, I_MUTEX_NONDIR2);
-
        err = -ENOENT;
        old_entry = f2fs_find_entry(old_dir, &old_dentry->d_name, &old_page);
        if (!old_entry) {
                if (IS_ERR(old_page))
                        err = PTR_ERR(old_page);
-               goto out_unlock_old;
+               goto out;
        }
 
        if (S_ISDIR(old_inode->i_mode)) {
@@ -1116,9 +1108,6 @@ static int f2fs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 
        f2fs_unlock_op(sbi);
 
-       if (S_ISDIR(old_inode->i_mode))
-               inode_unlock(old_inode);
-
        if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir))
                f2fs_sync_fs(sbi->sb, 1);
 
@@ -1133,9 +1122,6 @@ out_dir:
                f2fs_put_page(old_dir_page, 0);
 out_old:
        f2fs_put_page(old_page, 0);
-out_unlock_old:
-       if (S_ISDIR(old_inode->i_mode))
-               inode_unlock(old_inode);
 out:
        iput(whiteout);
        return err;
index 9f15b03..e34197a 100644 (file)
@@ -1538,7 +1538,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi)
        int i;
 
        for (i = 0; i < sbi->s_ndevs; i++) {
-               blkdev_put(FDEV(i).bdev, FMODE_EXCL);
+               blkdev_put(FDEV(i).bdev, sbi->sb->s_type);
 #ifdef CONFIG_BLK_DEV_ZONED
                kvfree(FDEV(i).blkz_seq);
 #endif
@@ -3993,6 +3993,7 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
        struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
        unsigned int max_devices = MAX_DEVICES;
        unsigned int logical_blksize;
+       blk_mode_t mode = sb_open_mode(sbi->sb->s_flags);
        int i;
 
        /* Initialize single device information */
@@ -4024,8 +4025,8 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
                if (max_devices == 1) {
                        /* Single zoned block device mount */
                        FDEV(0).bdev =
-                               blkdev_get_by_dev(sbi->sb->s_bdev->bd_dev,
-                                       sbi->sb->s_mode, sbi->sb->s_type);
+                               blkdev_get_by_dev(sbi->sb->s_bdev->bd_dev, mode,
+                                                 sbi->sb->s_type, NULL);
                } else {
                        /* Multi-device mount */
                        memcpy(FDEV(i).path, RDEV(i).path, MAX_PATH_LEN);
@@ -4043,8 +4044,9 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
                                        (FDEV(i).total_segments <<
                                        sbi->log_blocks_per_seg) - 1;
                        }
-                       FDEV(i).bdev = blkdev_get_by_path(FDEV(i).path,
-                                       sbi->sb->s_mode, sbi->sb->s_type);
+                       FDEV(i).bdev = blkdev_get_by_path(FDEV(i).path, mode,
+                                                         sbi->sb->s_type,
+                                                         NULL);
                }
                if (IS_ERR(FDEV(i).bdev))
                        return PTR_ERR(FDEV(i).bdev);
index 795a4fa..4564779 100644 (file)
@@ -209,7 +209,7 @@ const struct file_operations fat_file_operations = {
        .unlocked_ioctl = fat_generic_ioctl,
        .compat_ioctl   = compat_ptr_ioctl,
        .fsync          = fat_file_fsync,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
        .splice_write   = iter_file_splice_write,
        .fallocate      = fat_fallocate,
 };
index 372653b..e06c68e 100644 (file)
@@ -44,18 +44,40 @@ static struct kmem_cache *filp_cachep __read_mostly;
 
 static struct percpu_counter nr_files __cacheline_aligned_in_smp;
 
+/* Container for backing file with optional real path */
+struct backing_file {
+       struct file file;
+       struct path real_path;
+};
+
+static inline struct backing_file *backing_file(struct file *f)
+{
+       return container_of(f, struct backing_file, file);
+}
+
+struct path *backing_file_real_path(struct file *f)
+{
+       return &backing_file(f)->real_path;
+}
+EXPORT_SYMBOL_GPL(backing_file_real_path);
+
 static void file_free_rcu(struct rcu_head *head)
 {
        struct file *f = container_of(head, struct file, f_rcuhead);
 
        put_cred(f->f_cred);
-       kmem_cache_free(filp_cachep, f);
+       if (unlikely(f->f_mode & FMODE_BACKING))
+               kfree(backing_file(f));
+       else
+               kmem_cache_free(filp_cachep, f);
 }
 
 static inline void file_free(struct file *f)
 {
        security_file_free(f);
-       if (!(f->f_mode & FMODE_NOACCOUNT))
+       if (unlikely(f->f_mode & FMODE_BACKING))
+               path_put(backing_file_real_path(f));
+       if (likely(!(f->f_mode & FMODE_NOACCOUNT)))
                percpu_counter_dec(&nr_files);
        call_rcu(&f->f_rcuhead, file_free_rcu);
 }
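
The backing_file container above relies on the standard embed-and-recover pattern: struct file
is the first member of a larger allocation, so a struct file pointer whose f_mode has
FMODE_BACKING set can be converted back to the container with container_of() to reach the
private real_path. A minimal userspace sketch of that pattern, assuming toy types rather than
the VFS ones:

#include <stddef.h>
#include <stdio.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_file { int mode; };

struct toy_backing_file {
	struct toy_file file;     /* embedded base object, first member */
	const char *real_path;    /* extra, private state */
};

int main(void)
{
	struct toy_backing_file bf = {
		.file = { .mode = 1 },
		.real_path = "/some/underlying/path",
	};
	struct toy_file *f = &bf.file;

	/* Recover the container, and with it the private field. */
	printf("%s\n", container_of(f, struct toy_backing_file, file)->real_path);
	return 0;
}
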
@@ -131,20 +153,15 @@ static int __init init_fs_stat_sysctls(void)
 fs_initcall(init_fs_stat_sysctls);
 #endif
 
-static struct file *__alloc_file(int flags, const struct cred *cred)
+static int init_file(struct file *f, int flags, const struct cred *cred)
 {
-       struct file *f;
        int error;
 
-       f = kmem_cache_zalloc(filp_cachep, GFP_KERNEL);
-       if (unlikely(!f))
-               return ERR_PTR(-ENOMEM);
-
        f->f_cred = get_cred(cred);
        error = security_file_alloc(f);
        if (unlikely(error)) {
                file_free_rcu(&f->f_rcuhead);
-               return ERR_PTR(error);
+               return error;
        }
 
        atomic_long_set(&f->f_count, 1);
@@ -155,7 +172,7 @@ static struct file *__alloc_file(int flags, const struct cred *cred)
        f->f_mode = OPEN_FMODE(flags);
        /* f->f_version: 0 */
 
-       return f;
+       return 0;
 }
 
 /* Find an unused file structure and return a pointer to it.
@@ -172,6 +189,7 @@ struct file *alloc_empty_file(int flags, const struct cred *cred)
 {
        static long old_max;
        struct file *f;
+       int error;
 
        /*
         * Privileged users can go above max_files
@@ -185,9 +203,15 @@ struct file *alloc_empty_file(int flags, const struct cred *cred)
                        goto over;
        }
 
-       f = __alloc_file(flags, cred);
-       if (!IS_ERR(f))
-               percpu_counter_inc(&nr_files);
+       f = kmem_cache_zalloc(filp_cachep, GFP_KERNEL);
+       if (unlikely(!f))
+               return ERR_PTR(-ENOMEM);
+
+       error = init_file(f, flags, cred);
+       if (unlikely(error))
+               return ERR_PTR(error);
+
+       percpu_counter_inc(&nr_files);
 
        return f;
 
@@ -203,18 +227,51 @@ over:
 /*
  * Variant of alloc_empty_file() that doesn't check and modify nr_files.
  *
- * Should not be used unless there's a very good reason to do so.
+ * This is only for kernel internal use, and the allocated file must not be
+ * installed into file tables or such.
  */
 struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred)
 {
-       struct file *f = __alloc_file(flags, cred);
+       struct file *f;
+       int error;
 
-       if (!IS_ERR(f))
-               f->f_mode |= FMODE_NOACCOUNT;
+       f = kmem_cache_zalloc(filp_cachep, GFP_KERNEL);
+       if (unlikely(!f))
+               return ERR_PTR(-ENOMEM);
+
+       error = init_file(f, flags, cred);
+       if (unlikely(error))
+               return ERR_PTR(error);
+
+       f->f_mode |= FMODE_NOACCOUNT;
 
        return f;
 }
 
+/*
+ * Variant of alloc_empty_file() that allocates a backing_file container
+ * and doesn't check and modify nr_files.
+ *
+ * This is only for kernel internal use, and the allocated file must not be
+ * installed into file tables or such.
+ */
+struct file *alloc_empty_backing_file(int flags, const struct cred *cred)
+{
+       struct backing_file *ff;
+       int error;
+
+       ff = kzalloc(sizeof(struct backing_file), GFP_KERNEL);
+       if (unlikely(!ff))
+               return ERR_PTR(-ENOMEM);
+
+       error = init_file(&ff->file, flags, cred);
+       if (unlikely(error))
+               return ERR_PTR(error);
+
+       ff->file.f_mode |= FMODE_BACKING | FMODE_NOACCOUNT;
+       return &ff->file;
+}
+
 /**
  * alloc_file - allocate and initialize a 'struct file'
  *
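Note on the backing_file change above: embedding struct file inside a larger container lets extra state (the real path) travel with certain files without growing struct file for every user, and FMODE_BACKING tells file_free_rcu() to kfree() the container instead of returning the object to filp_cachep. Below is a minimal userspace sketch of the same container_of() idiom; the names file_like, backing_file_like and real_path_of are made up for illustration and are not kernel APIs.

#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

struct file_like {              /* stands in for struct file         */
        int f_mode;
};

struct backing_file_like {      /* stands in for struct backing_file */
        struct file_like file;
        const char *real_path;
};

/* Given only a pointer to the embedded member, recover the container. */
static const char *real_path_of(struct file_like *f)
{
        return container_of(f, struct backing_file_like, file)->real_path;
}

int main(void)
{
        struct backing_file_like bf = { .file = { .f_mode = 1 },
                                        .real_path = "/some/real/path" };
        struct file_like *f = &bf.file;   /* callers only ever see this pointer */

        printf("%s\n", real_path_of(f));  /* prints /some/real/path */
        return 0;
}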
index 24ce12f..851214d 100644 (file)
@@ -561,7 +561,8 @@ static int legacy_parse_param(struct fs_context *fc, struct fs_parameter *param)
                        return -ENOMEM;
        }
 
-       ctx->legacy_data[size++] = ',';
+       if (size)
+               ctx->legacy_data[size++] = ',';
        len = strlen(param->key);
        memcpy(ctx->legacy_data + size, param->key, len);
        size += len;
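Note on the fs_context fix above: legacy_parse_param() no longer emits the separating ',' before the very first option, which previously produced legacy mount data with a leading comma. A small standalone sketch of the corrected join logic; append_opt() is a hypothetical helper, not the kernel function.

#include <stdio.h>
#include <string.h>

/* Append a key into buf, separating entries with ','.  The separator is
 * only written when the buffer already holds something, so the first
 * entry does not get a spurious leading comma. */
static size_t append_opt(char *buf, size_t size, const char *key)
{
        size_t len = strlen(key);

        if (size)
                buf[size++] = ',';
        memcpy(buf + size, key, len);
        return size + len;
}

int main(void)
{
        char buf[64];
        size_t size = 0;

        size = append_opt(buf, size, "noatime");
        size = append_opt(buf, size, "ro");
        buf[size] = '\0';
        printf("%s\n", buf);   /* "noatime,ro", not ",noatime,ro" */
        return 0;
}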
index 89d97f6..4553124 100644 (file)
@@ -3252,7 +3252,7 @@ static const struct file_operations fuse_file_operations = {
        .lock           = fuse_file_lock,
        .get_unmapped_area = thp_get_unmapped_area,
        .flock          = fuse_file_flock,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
        .splice_write   = iter_file_splice_write,
        .unlocked_ioctl = fuse_file_ioctl,
        .compat_ioctl   = fuse_file_compat_ioctl,
index 300844f..1d679a3 100644 (file)
@@ -784,9 +784,13 @@ static inline bool should_fault_in_pages(struct iov_iter *i,
        if (!user_backed_iter(i))
                return false;
 
+       /*
+        * Try to fault in multiple pages initially.  When that doesn't result
+        * in any progress, fall back to a single page.
+        */
        size = PAGE_SIZE;
        offs = offset_in_page(iocb->ki_pos);
-       if (*prev_count != count || !*window_size) {
+       if (*prev_count != count) {
                size_t nr_dirtied;
 
                nr_dirtied = max(current->nr_dirtied_pause -
@@ -870,6 +874,7 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
        struct gfs2_inode *ip = GFS2_I(inode);
        size_t prev_count = 0, window_size = 0;
        size_t written = 0;
+       bool enough_retries;
        ssize_t ret;
 
        /*
@@ -913,11 +918,17 @@ retry:
        if (ret > 0)
                written = ret;
 
+       enough_retries = prev_count == iov_iter_count(from) &&
+                        window_size <= PAGE_SIZE;
        if (should_fault_in_pages(from, iocb, &prev_count, &window_size)) {
                gfs2_glock_dq(gh);
                window_size -= fault_in_iov_iter_readable(from, window_size);
-               if (window_size)
-                       goto retry;
+               if (window_size) {
+                       if (!enough_retries)
+                               goto retry;
+                       /* fall back to buffered I/O */
+                       ret = 0;
+               }
        }
 out_unlock:
        if (gfs2_holder_queued(gh))
@@ -1568,7 +1579,7 @@ const struct file_operations gfs2_file_fops = {
        .fsync          = gfs2_fsync,
        .lock           = gfs2_lock,
        .flock          = gfs2_flock,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
        .splice_write   = gfs2_file_splice_write,
        .setlease       = simple_nosetlease,
        .fallocate      = gfs2_fallocate,
@@ -1599,7 +1610,7 @@ const struct file_operations gfs2_file_fops_nolock = {
        .open           = gfs2_open,
        .release        = gfs2_release,
        .fsync          = gfs2_fsync,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
        .splice_write   = gfs2_file_splice_write,
        .setlease       = generic_setlease,
        .fallocate      = gfs2_fallocate,
index 9af9ddb..cd96298 100644 (file)
@@ -254,7 +254,7 @@ static int gfs2_read_super(struct gfs2_sbd *sdp, sector_t sector, int silent)
 
        bio = bio_alloc(sb->s_bdev, 1, REQ_OP_READ | REQ_META, GFP_NOFS);
        bio->bi_iter.bi_sector = sector * (sb->s_blocksize >> 9);
-       bio_add_page(bio, page, PAGE_SIZE, 0);
+       __bio_add_page(bio, page, PAGE_SIZE, 0);
 
        bio->bi_end_io = end_bio_io_page;
        bio->bi_private = page;
index 5eed8c2..a84bf64 100644 (file)
@@ -1419,6 +1419,14 @@ static void gfs2_evict_inode(struct inode *inode)
        if (inode->i_nlink || sb_rdonly(sb) || !ip->i_no_addr)
                goto out;
 
+       /*
+        * In case of an incomplete mount, gfs2_evict_inode() may be called for
+        * system files without having an active journal to write to.  In that
+        * case, skip the filesystem evict.
+        */
+       if (!sdp->sd_jdesc)
+               goto out;
+
        gfs2_holder_mark_uninitialized(&gh);
        ret = evict_should_delete(inode, &gh);
        if (ret == SHOULD_DEFER_EVICTION)
index 1f7bd06..441d7fc 100644 (file)
@@ -694,7 +694,7 @@ static const struct file_operations hfs_file_operations = {
        .read_iter      = generic_file_read_iter,
        .write_iter     = generic_file_write_iter,
        .mmap           = generic_file_mmap,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
        .fsync          = hfs_file_fsync,
        .open           = hfs_file_open,
        .release        = hfs_file_release,
index b216604..7d1a675 100644 (file)
@@ -372,7 +372,7 @@ static const struct file_operations hfsplus_file_operations = {
        .read_iter      = generic_file_read_iter,
        .write_iter     = generic_file_write_iter,
        .mmap           = generic_file_mmap,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
        .fsync          = hfsplus_file_fsync,
        .open           = hfsplus_file_open,
        .release        = hfsplus_file_release,
index 69cb796..0239e3a 100644 (file)
@@ -65,6 +65,7 @@ struct hostfs_stat {
        unsigned long long blocks;
        unsigned int maj;
        unsigned int min;
+       dev_t dev;
 };
 
 extern int stat_file(const char *path, struct hostfs_stat *p, int fd);
index 28b4f15..4638709 100644 (file)
@@ -26,6 +26,7 @@ struct hostfs_inode_info {
        fmode_t mode;
        struct inode vfs_inode;
        struct mutex open_mutex;
+       dev_t dev;
 };
 
 static inline struct hostfs_inode_info *HOSTFS_I(struct inode *inode)
@@ -182,14 +183,6 @@ static char *follow_link(char *link)
        return ERR_PTR(n);
 }
 
-static struct inode *hostfs_iget(struct super_block *sb)
-{
-       struct inode *inode = new_inode(sb);
-       if (!inode)
-               return ERR_PTR(-ENOMEM);
-       return inode;
-}
-
 static int hostfs_statfs(struct dentry *dentry, struct kstatfs *sf)
 {
        /*
@@ -228,6 +221,7 @@ static struct inode *hostfs_alloc_inode(struct super_block *sb)
                return NULL;
        hi->fd = -1;
        hi->mode = 0;
+       hi->dev = 0;
        inode_init_once(&hi->vfs_inode);
        mutex_init(&hi->open_mutex);
        return &hi->vfs_inode;
@@ -240,6 +234,7 @@ static void hostfs_evict_inode(struct inode *inode)
        if (HOSTFS_I(inode)->fd != -1) {
                close_file(&HOSTFS_I(inode)->fd);
                HOSTFS_I(inode)->fd = -1;
+               HOSTFS_I(inode)->dev = 0;
        }
 }
 
@@ -265,6 +260,7 @@ static int hostfs_show_options(struct seq_file *seq, struct dentry *root)
 static const struct super_operations hostfs_sbops = {
        .alloc_inode    = hostfs_alloc_inode,
        .free_inode     = hostfs_free_inode,
+       .drop_inode     = generic_delete_inode,
        .evict_inode    = hostfs_evict_inode,
        .statfs         = hostfs_statfs,
        .show_options   = hostfs_show_options,
@@ -381,7 +377,7 @@ static int hostfs_fsync(struct file *file, loff_t start, loff_t end,
 
 static const struct file_operations hostfs_file_fops = {
        .llseek         = generic_file_llseek,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
        .splice_write   = iter_file_splice_write,
        .read_iter      = generic_file_read_iter,
        .write_iter     = generic_file_write_iter,
@@ -512,18 +508,31 @@ static const struct address_space_operations hostfs_aops = {
        .write_end      = hostfs_write_end,
 };
 
-static int read_name(struct inode *ino, char *name)
+static int hostfs_inode_update(struct inode *ino, const struct hostfs_stat *st)
+{
+       set_nlink(ino, st->nlink);
+       i_uid_write(ino, st->uid);
+       i_gid_write(ino, st->gid);
+       ino->i_atime =
+               (struct timespec64){ st->atime.tv_sec, st->atime.tv_nsec };
+       ino->i_mtime =
+               (struct timespec64){ st->mtime.tv_sec, st->mtime.tv_nsec };
+       ino->i_ctime =
+               (struct timespec64){ st->ctime.tv_sec, st->ctime.tv_nsec };
+       ino->i_size = st->size;
+       ino->i_blocks = st->blocks;
+       return 0;
+}
+
+static int hostfs_inode_set(struct inode *ino, void *data)
 {
+       struct hostfs_stat *st = data;
        dev_t rdev;
-       struct hostfs_stat st;
-       int err = stat_file(name, &st, -1);
-       if (err)
-               return err;
 
        /* Reencode maj and min with the kernel encoding.*/
-       rdev = MKDEV(st.maj, st.min);
+       rdev = MKDEV(st->maj, st->min);
 
-       switch (st.mode & S_IFMT) {
+       switch (st->mode & S_IFMT) {
        case S_IFLNK:
                ino->i_op = &hostfs_link_iops;
                break;
@@ -535,7 +544,7 @@ static int read_name(struct inode *ino, char *name)
        case S_IFBLK:
        case S_IFIFO:
        case S_IFSOCK:
-               init_special_inode(ino, st.mode & S_IFMT, rdev);
+               init_special_inode(ino, st->mode & S_IFMT, rdev);
                ino->i_op = &hostfs_iops;
                break;
        case S_IFREG:
@@ -547,17 +556,42 @@ static int read_name(struct inode *ino, char *name)
                return -EIO;
        }
 
-       ino->i_ino = st.ino;
-       ino->i_mode = st.mode;
-       set_nlink(ino, st.nlink);
-       i_uid_write(ino, st.uid);
-       i_gid_write(ino, st.gid);
-       ino->i_atime = (struct timespec64){ st.atime.tv_sec, st.atime.tv_nsec };
-       ino->i_mtime = (struct timespec64){ st.mtime.tv_sec, st.mtime.tv_nsec };
-       ino->i_ctime = (struct timespec64){ st.ctime.tv_sec, st.ctime.tv_nsec };
-       ino->i_size = st.size;
-       ino->i_blocks = st.blocks;
-       return 0;
+       HOSTFS_I(ino)->dev = st->dev;
+       ino->i_ino = st->ino;
+       ino->i_mode = st->mode;
+       return hostfs_inode_update(ino, st);
+}
+
+static int hostfs_inode_test(struct inode *inode, void *data)
+{
+       const struct hostfs_stat *st = data;
+
+       return inode->i_ino == st->ino && HOSTFS_I(inode)->dev == st->dev;
+}
+
+static struct inode *hostfs_iget(struct super_block *sb, char *name)
+{
+       struct inode *inode;
+       struct hostfs_stat st;
+       int err = stat_file(name, &st, -1);
+
+       if (err)
+               return ERR_PTR(err);
+
+       inode = iget5_locked(sb, st.ino, hostfs_inode_test, hostfs_inode_set,
+                            &st);
+       if (!inode)
+               return ERR_PTR(-ENOMEM);
+
+       if (inode->i_state & I_NEW) {
+               unlock_new_inode(inode);
+       } else {
+               spin_lock(&inode->i_lock);
+               hostfs_inode_update(inode, &st);
+               spin_unlock(&inode->i_lock);
+       }
+
+       return inode;
 }
 
 static int hostfs_create(struct mnt_idmap *idmap, struct inode *dir,
@@ -565,62 +599,48 @@ static int hostfs_create(struct mnt_idmap *idmap, struct inode *dir,
 {
        struct inode *inode;
        char *name;
-       int error, fd;
-
-       inode = hostfs_iget(dir->i_sb);
-       if (IS_ERR(inode)) {
-               error = PTR_ERR(inode);
-               goto out;
-       }
+       int fd;
 
-       error = -ENOMEM;
        name = dentry_name(dentry);
        if (name == NULL)
-               goto out_put;
+               return -ENOMEM;
 
        fd = file_create(name, mode & 0777);
-       if (fd < 0)
-               error = fd;
-       else
-               error = read_name(inode, name);
+       if (fd < 0) {
+               __putname(name);
+               return fd;
+       }
 
+       inode = hostfs_iget(dir->i_sb, name);
        __putname(name);
-       if (error)
-               goto out_put;
+       if (IS_ERR(inode))
+               return PTR_ERR(inode);
 
        HOSTFS_I(inode)->fd = fd;
        HOSTFS_I(inode)->mode = FMODE_READ | FMODE_WRITE;
        d_instantiate(dentry, inode);
        return 0;
-
- out_put:
-       iput(inode);
- out:
-       return error;
 }
 
 static struct dentry *hostfs_lookup(struct inode *ino, struct dentry *dentry,
                                    unsigned int flags)
 {
-       struct inode *inode;
+       struct inode *inode = NULL;
        char *name;
-       int err;
-
-       inode = hostfs_iget(ino->i_sb);
-       if (IS_ERR(inode))
-               goto out;
 
-       err = -ENOMEM;
        name = dentry_name(dentry);
-       if (name) {
-               err = read_name(inode, name);
-               __putname(name);
-       }
-       if (err) {
-               iput(inode);
-               inode = (err == -ENOENT) ? NULL : ERR_PTR(err);
+       if (name == NULL)
+               return ERR_PTR(-ENOMEM);
+
+       inode = hostfs_iget(ino->i_sb, name);
+       __putname(name);
+       if (IS_ERR(inode)) {
+               if (PTR_ERR(inode) == -ENOENT)
+                       inode = NULL;
+               else
+                       return ERR_CAST(inode);
        }
- out:
+
        return d_splice_alias(inode, dentry);
 }
 
@@ -704,35 +724,23 @@ static int hostfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
        char *name;
        int err;
 
-       inode = hostfs_iget(dir->i_sb);
-       if (IS_ERR(inode)) {
-               err = PTR_ERR(inode);
-               goto out;
-       }
-
-       err = -ENOMEM;
        name = dentry_name(dentry);
        if (name == NULL)
-               goto out_put;
+               return -ENOMEM;
 
        err = do_mknod(name, mode, MAJOR(dev), MINOR(dev));
-       if (err)
-               goto out_free;
+       if (err) {
+               __putname(name);
+               return err;
+       }
 
-       err = read_name(inode, name);
+       inode = hostfs_iget(dir->i_sb, name);
        __putname(name);
-       if (err)
-               goto out_put;
+       if (IS_ERR(inode))
+               return PTR_ERR(inode);
 
        d_instantiate(dentry, inode);
        return 0;
-
- out_free:
-       __putname(name);
- out_put:
-       iput(inode);
- out:
-       return err;
 }
 
 static int hostfs_rename2(struct mnt_idmap *idmap,
@@ -929,49 +937,40 @@ static int hostfs_fill_sb_common(struct super_block *sb, void *d, int silent)
        sb->s_maxbytes = MAX_LFS_FILESIZE;
        err = super_setup_bdi(sb);
        if (err)
-               goto out;
+               return err;
 
        /* NULL is printed as '(null)' by printf(): avoid that. */
        if (req_root == NULL)
                req_root = "";
 
-       err = -ENOMEM;
        sb->s_fs_info = host_root_path =
                kasprintf(GFP_KERNEL, "%s/%s", root_ino, req_root);
        if (host_root_path == NULL)
-               goto out;
-
-       root_inode = new_inode(sb);
-       if (!root_inode)
-               goto out;
+               return -ENOMEM;
 
-       err = read_name(root_inode, host_root_path);
-       if (err)
-               goto out_put;
+       root_inode = hostfs_iget(sb, host_root_path);
+       if (IS_ERR(root_inode))
+               return PTR_ERR(root_inode);
 
        if (S_ISLNK(root_inode->i_mode)) {
-               char *name = follow_link(host_root_path);
-               if (IS_ERR(name)) {
-                       err = PTR_ERR(name);
-                       goto out_put;
-               }
-               err = read_name(root_inode, name);
+               char *name;
+
+               iput(root_inode);
+               name = follow_link(host_root_path);
+               if (IS_ERR(name))
+                       return PTR_ERR(name);
+
+               root_inode = hostfs_iget(sb, name);
                kfree(name);
-               if (err)
-                       goto out_put;
+               if (IS_ERR(root_inode))
+                       return PTR_ERR(root_inode);
        }
 
-       err = -ENOMEM;
        sb->s_root = d_make_root(root_inode);
        if (sb->s_root == NULL)
-               goto out;
+               return -ENOMEM;
 
        return 0;
-
-out_put:
-       iput(root_inode);
-out:
-       return err;
 }
 
 static struct dentry *hostfs_read_sb(struct file_system_type *type,
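Note on the hostfs rework above: the old new_inode()-per-lookup scheme is replaced by iget5_locked(), keyed on the host inode number plus the newly recorded host st_dev, because inode numbers are only unique within a single host filesystem. A minimal sketch of why the key needs both parts; host_key and same_host_inode are illustrative names, not hostfs code.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Mirror of what hostfs_inode_test() checks: a cached inode matches only
 * if both the host device and the host inode number agree. */
struct host_key {
        uint64_t dev;
        uint64_t ino;
};

static bool same_host_inode(const struct host_key *a, const struct host_key *b)
{
        return a->dev == b->dev && a->ino == b->ino;
}

int main(void)
{
        struct host_key cached = { .dev = 1, .ino = 42 };
        struct host_key probe  = { .dev = 2, .ino = 42 };  /* same ino, other host fs */

        /* Without the dev part these two would wrongly alias to one inode. */
        printf("%s\n", same_host_inode(&cached, &probe) ? "match" : "no match");
        return 0;
}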
index 5ecc470..840619e 100644 (file)
@@ -36,6 +36,7 @@ static void stat64_to_hostfs(const struct stat64 *buf, struct hostfs_stat *p)
        p->blocks = buf->st_blocks;
        p->maj = os_major(buf->st_rdev);
        p->min = os_minor(buf->st_rdev);
+       p->dev = buf->st_dev;
 }
 
 int stat_file(const char *path, struct hostfs_stat *p, int fd)
index 88952d4..1bb8d97 100644 (file)
@@ -259,7 +259,7 @@ const struct file_operations hpfs_file_ops =
        .mmap           = generic_file_mmap,
        .release        = hpfs_file_release,
        .fsync          = hpfs_file_fsync,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
        .unlocked_ioctl = hpfs_ioctl,
        .compat_ioctl   = compat_ptr_ioctl,
 };
index 577799b..d37fad9 100644 (file)
@@ -1104,9 +1104,51 @@ void discard_new_inode(struct inode *inode)
 EXPORT_SYMBOL(discard_new_inode);
 
 /**
+ * lock_two_inodes - lock two inodes (may be regular files but also dirs)
+ *
+ * Lock any non-NULL argument. The caller must make sure that when passing
+ * in two directories, one is not an ancestor of the other.  Zero, one or two
+ * objects may be locked by this function.
+ *
+ * @inode1: first inode to lock
+ * @inode2: second inode to lock
+ * @subclass1: inode lock subclass for the first lock obtained
+ * @subclass2: inode lock subclass for the second lock obtained
+ */
+void lock_two_inodes(struct inode *inode1, struct inode *inode2,
+                    unsigned subclass1, unsigned subclass2)
+{
+       if (!inode1 || !inode2) {
+               /*
+                * Make sure @subclass1 will be used for the acquired lock.
+                * This is not strictly necessary (no current caller cares) but
+                * let's keep things consistent.
+                */
+               if (!inode1)
+                       swap(inode1, inode2);
+               goto lock;
+       }
+
+       /*
+        * If one object is a directory and the other is not, we must make sure
+        * to lock directory first as the other object may be its child.
+        */
+       if (S_ISDIR(inode2->i_mode) == S_ISDIR(inode1->i_mode)) {
+               if (inode1 > inode2)
+                       swap(inode1, inode2);
+       } else if (!S_ISDIR(inode1->i_mode))
+               swap(inode1, inode2);
+lock:
+       if (inode1)
+               inode_lock_nested(inode1, subclass1);
+       if (inode2 && inode2 != inode1)
+               inode_lock_nested(inode2, subclass2);
+}
+
+/**
  * lock_two_nondirectories - take two i_mutexes on non-directory objects
  *
- * Lock any non-NULL argument that is not a directory.
+ * Lock any non-NULL argument. Passed objects must not be directories.
  * Zero, one or two objects may be locked by this function.
  *
  * @inode1: first inode to lock
@@ -1114,13 +1156,9 @@ EXPORT_SYMBOL(discard_new_inode);
  */
 void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
 {
-       if (inode1 > inode2)
-               swap(inode1, inode2);
-
-       if (inode1 && !S_ISDIR(inode1->i_mode))
-               inode_lock(inode1);
-       if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
-               inode_lock_nested(inode2, I_MUTEX_NONDIR2);
+       WARN_ON_ONCE(S_ISDIR(inode1->i_mode));
+       WARN_ON_ONCE(S_ISDIR(inode2->i_mode));
+       lock_two_inodes(inode1, inode2, I_MUTEX_NORMAL, I_MUTEX_NONDIR2);
 }
 EXPORT_SYMBOL(lock_two_nondirectories);
 
@@ -1131,10 +1169,14 @@ EXPORT_SYMBOL(lock_two_nondirectories);
  */
 void unlock_two_nondirectories(struct inode *inode1, struct inode *inode2)
 {
-       if (inode1 && !S_ISDIR(inode1->i_mode))
+       if (inode1) {
+               WARN_ON_ONCE(S_ISDIR(inode1->i_mode));
                inode_unlock(inode1);
-       if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
+       }
+       if (inode2 && inode2 != inode1) {
+               WARN_ON_ONCE(S_ISDIR(inode2->i_mode));
                inode_unlock(inode2);
+       }
 }
 EXPORT_SYMBOL(unlock_two_nondirectories);
 
@@ -2264,7 +2306,8 @@ void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev)
                inode->i_fop = &def_chr_fops;
                inode->i_rdev = rdev;
        } else if (S_ISBLK(mode)) {
-               inode->i_fop = &def_blk_fops;
+               if (IS_ENABLED(CONFIG_BLOCK))
+                       inode->i_fop = &def_blk_fops;
                inode->i_rdev = rdev;
        } else if (S_ISFIFO(mode))
                inode->i_fop = &pipefifo_fops;
index bd3b281..f7a3dc1 100644 (file)
@@ -97,8 +97,9 @@ extern void chroot_fs_refs(const struct path *, const struct path *);
 /*
  * file_table.c
  */
-extern struct file *alloc_empty_file(int, const struct cred *);
-extern struct file *alloc_empty_file_noaccount(int, const struct cred *);
+struct file *alloc_empty_file(int flags, const struct cred *cred);
+struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred);
+struct file *alloc_empty_backing_file(int flags, const struct cred *cred);
 
 static inline void put_file_access(struct file *file)
 {
@@ -121,6 +122,47 @@ extern bool mount_capable(struct fs_context *);
 int sb_init_dio_done_wq(struct super_block *sb);
 
 /*
+ * Prepare superblock for changing its read-only state (i.e., either remount
+ * read-write superblock read-only or vice versa). After this function returns
+ * mnt_is_readonly() will return true for any mount of the superblock if its
+ * caller is able to observe any changes done by the remount. This holds until
+ * sb_end_ro_state_change() is called.
+ */
+static inline void sb_start_ro_state_change(struct super_block *sb)
+{
+       WRITE_ONCE(sb->s_readonly_remount, 1);
+       /*
+        * For RO->RW transition, the barrier pairs with the barrier in
+        * mnt_is_readonly() making sure if mnt_is_readonly() sees SB_RDONLY
+        * cleared, it will see s_readonly_remount set.
+        * For RW->RO transition, the barrier pairs with the barrier in
+        * __mnt_want_write() before the mnt_is_readonly() check. The barrier
+        * makes sure if __mnt_want_write() sees MNT_WRITE_HOLD already
+        * cleared, it will see s_readonly_remount set.
+        */
+       smp_wmb();
+}
+
+/*
+ * Ends section changing read-only state of the superblock. After this function
+ * returns, if mnt_is_readonly() returns false, the caller will be able to
+ * observe all the changes remount did to the superblock.
+ */
+static inline void sb_end_ro_state_change(struct super_block *sb)
+{
+       /*
+        * This barrier provides release semantics that pairs with
+        * the smp_rmb() acquire semantics in mnt_is_readonly().
+        * This barrier pair ensures that when mnt_is_readonly() sees
+        * 0 for sb->s_readonly_remount, it will also see all the
+        * preceding flag changes that were made during the RO state
+        * change.
+        */
+       smp_wmb();
+       WRITE_ONCE(sb->s_readonly_remount, 0);
+}
+
+/*
  * open.c
  */
 struct open_flags {
@@ -152,6 +194,8 @@ extern long prune_icache_sb(struct super_block *sb, struct shrink_control *sc);
 int dentry_needs_remove_privs(struct mnt_idmap *, struct dentry *dentry);
 bool in_group_or_capable(struct mnt_idmap *idmap,
                         const struct inode *inode, vfsgid_t vfsgid);
+void lock_two_inodes(struct inode *inode1, struct inode *inode2,
+                    unsigned subclass1, unsigned subclass2);
 
 /*
  * fs-writeback.c
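Note on the sb_start_ro_state_change()/sb_end_ro_state_change() helpers above: their smp_wmb() pairs with the smp_rmb() in mnt_is_readonly(), so a reader that observes s_readonly_remount cleared is guaranteed to also observe every flag update the remount made before clearing it. The sketch below models just that "end of remount" half with C11 fences in userspace; it is an analogy, not the kernel primitives, and end_ro_state_change/read_flags are invented names.

#include <stdatomic.h>
#include <stdbool.h>

static int        sb_flags;            /* data published by the writer        */
static atomic_int readonly_remount;    /* stand-in for sb->s_readonly_remount */

/* Writer: publish the new flags, then clear the "remount in progress"
 * marker.  The release fence orders the flag store before the clear. */
void end_ro_state_change(int new_flags)
{
        sb_flags = new_flags;
        atomic_thread_fence(memory_order_release);          /* ~ smp_wmb() */
        atomic_store_explicit(&readonly_remount, 0, memory_order_relaxed);
}

/* Reader: if the marker is already 0, the acquire fence guarantees the
 * following read of sb_flags sees the writer's update. */
bool read_flags(int *out)
{
        if (atomic_load_explicit(&readonly_remount, memory_order_relaxed))
                return false;                               /* remount in flight */
        atomic_thread_fence(memory_order_acquire);          /* ~ smp_rmb() */
        *out = sb_flags;
        return true;
}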
index 063133e..0edab9d 100644 (file)
@@ -312,7 +312,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
                        ctx->bio->bi_opf |= REQ_RAHEAD;
                ctx->bio->bi_iter.bi_sector = sector;
                ctx->bio->bi_end_io = iomap_read_end_io;
-               bio_add_folio(ctx->bio, folio, plen, poff);
+               bio_add_folio_nofail(ctx->bio, folio, plen, poff);
        }
 
 done:
@@ -539,7 +539,7 @@ static int iomap_read_folio_sync(loff_t block_start, struct folio *folio,
 
        bio_init(&bio, iomap->bdev, &bvec, 1, REQ_OP_READ);
        bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
-       bio_add_folio(&bio, folio, plen, poff);
+       bio_add_folio_nofail(&bio, folio, plen, poff);
        return submit_bio_wait(&bio);
 }
 
@@ -1582,7 +1582,7 @@ iomap_add_to_ioend(struct inode *inode, loff_t pos, struct folio *folio,
 
        if (!bio_add_folio(wpc->ioend->io_bio, folio, len, poff)) {
                wpc->ioend->io_bio = iomap_chain_bio(wpc->ioend->io_bio);
-               bio_add_folio(wpc->ioend->io_bio, folio, len, poff);
+               bio_add_folio_nofail(wpc->ioend->io_bio, folio, len, poff);
        }
 
        if (iop)
index 019cc87..08873f0 100644 (file)
@@ -203,7 +203,6 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
        bio->bi_private = dio;
        bio->bi_end_io = iomap_dio_bio_end_io;
 
-       get_page(page);
        __bio_add_page(bio, page, len, 0);
        iomap_dio_submit_bio(iter, dio, bio, pos);
 }
index 837cd55..6ae9d6f 100644 (file)
@@ -211,7 +211,10 @@ static int jffs2_build_filesystem(struct jffs2_sb_info *c)
                ic->scan_dents = NULL;
                cond_resched();
        }
-       jffs2_build_xattr_subsystem(c);
+       ret = jffs2_build_xattr_subsystem(c);
+       if (ret)
+               goto exit;
+
        c->flags &= ~JFFS2_SB_FLAG_BUILDING;
 
        dbg_fsbuild("FS build complete\n");
index 96b0275..2345ca3 100644 (file)
@@ -56,7 +56,7 @@ const struct file_operations jffs2_file_operations =
        .unlocked_ioctl=jffs2_ioctl,
        .mmap =         generic_file_readonly_mmap,
        .fsync =        jffs2_fsync,
-       .splice_read =  generic_file_splice_read,
+       .splice_read =  filemap_splice_read,
        .splice_write = iter_file_splice_write,
 };
 
index aa4048a..3b6bdc9 100644 (file)
@@ -772,10 +772,10 @@ void jffs2_clear_xattr_subsystem(struct jffs2_sb_info *c)
 }
 
 #define XREF_TMPHASH_SIZE      (128)
-void jffs2_build_xattr_subsystem(struct jffs2_sb_info *c)
+int jffs2_build_xattr_subsystem(struct jffs2_sb_info *c)
 {
        struct jffs2_xattr_ref *ref, *_ref;
-       struct jffs2_xattr_ref *xref_tmphash[XREF_TMPHASH_SIZE];
+       struct jffs2_xattr_ref **xref_tmphash;
        struct jffs2_xattr_datum *xd, *_xd;
        struct jffs2_inode_cache *ic;
        struct jffs2_raw_node_ref *raw;
@@ -784,9 +784,12 @@ void jffs2_build_xattr_subsystem(struct jffs2_sb_info *c)
 
        BUG_ON(!(c->flags & JFFS2_SB_FLAG_BUILDING));
 
+       xref_tmphash = kcalloc(XREF_TMPHASH_SIZE,
+                              sizeof(struct jffs2_xattr_ref *), GFP_KERNEL);
+       if (!xref_tmphash)
+               return -ENOMEM;
+
        /* Phase.1 : Merge same xref */
-       for (i=0; i < XREF_TMPHASH_SIZE; i++)
-               xref_tmphash[i] = NULL;
        for (ref=c->xref_temp; ref; ref=_ref) {
                struct jffs2_xattr_ref *tmp;
 
@@ -884,6 +887,8 @@ void jffs2_build_xattr_subsystem(struct jffs2_sb_info *c)
                     "%u of xref (%u dead, %u orphan) found.\n",
                     xdatum_count, xdatum_unchecked_count, xdatum_orphan_count,
                     xref_count, xref_dead_count, xref_orphan_count);
+       kfree(xref_tmphash);
+       return 0;
 }
 
 struct jffs2_xattr_datum *jffs2_setup_xattr_datum(struct jffs2_sb_info *c,
index 720007b..1b5030a 100644 (file)
@@ -71,7 +71,7 @@ static inline int is_xattr_ref_dead(struct jffs2_xattr_ref *ref)
 #ifdef CONFIG_JFFS2_FS_XATTR
 
 extern void jffs2_init_xattr_subsystem(struct jffs2_sb_info *c);
-extern void jffs2_build_xattr_subsystem(struct jffs2_sb_info *c);
+extern int jffs2_build_xattr_subsystem(struct jffs2_sb_info *c);
 extern void jffs2_clear_xattr_subsystem(struct jffs2_sb_info *c);
 
 extern struct jffs2_xattr_datum *jffs2_setup_xattr_datum(struct jffs2_sb_info *c,
@@ -103,7 +103,7 @@ extern ssize_t jffs2_listxattr(struct dentry *, char *, size_t);
 #else
 
 #define jffs2_init_xattr_subsystem(c)
-#define jffs2_build_xattr_subsystem(c)
+#define jffs2_build_xattr_subsystem(c)         (0)
 #define jffs2_clear_xattr_subsystem(c)
 
 #define jffs2_xattr_do_crccheck_inode(c, ic)
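Note on the jffs2 change above: the 128-entry xref_tmphash array (1 KiB of pointers on 64-bit) moves off the stack onto the heap, and jffs2_build_xattr_subsystem() now returns an int so -ENOMEM can be propagated; the !CONFIG_JFFS2_FS_XATTR stub becomes (0) so callers can check the result either way. The shape of that pattern, sketched with a hypothetical build_with_tmphash():

#include <errno.h>
#include <stdlib.h>

#define TMPHASH_SIZE 128

/* Temporary hash table comes from the heap; allocation failure is
 * reported to the caller instead of being impossible to signal. */
static int build_with_tmphash(void)
{
        void **tmphash = calloc(TMPHASH_SIZE, sizeof(*tmphash));

        if (!tmphash)
                return -ENOMEM;         /* caller aborts the build/mount */

        /* ... phase 1: merge duplicate entries using tmphash ... */

        free(tmphash);
        return 0;
}

int main(void)
{
        return build_with_tmphash() ? 1 : 0;
}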
index 2ee35be..01b6912 100644 (file)
@@ -144,7 +144,7 @@ const struct file_operations jfs_file_operations = {
        .read_iter      = generic_file_read_iter,
        .write_iter     = generic_file_write_iter,
        .mmap           = generic_file_mmap,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
        .splice_write   = iter_file_splice_write,
        .fsync          = jfs_fsync,
        .release        = jfs_release,
index 695415c..e855b8f 100644 (file)
@@ -1100,8 +1100,8 @@ int lmLogOpen(struct super_block *sb)
         * file systems to log may have n-to-1 relationship;
         */
 
-       bdev = blkdev_get_by_dev(sbi->logdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL,
-                                log);
+       bdev = blkdev_get_by_dev(sbi->logdev, BLK_OPEN_READ | BLK_OPEN_WRITE,
+                                log, NULL);
        if (IS_ERR(bdev)) {
                rc = PTR_ERR(bdev);
                goto free;
@@ -1141,7 +1141,7 @@ journal_found:
        lbmLogShutdown(log);
 
       close:           /* close external log device */
-       blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
+       blkdev_put(bdev, log);
 
       free:            /* free log descriptor */
        mutex_unlock(&jfs_log_mutex);
@@ -1485,7 +1485,7 @@ int lmLogClose(struct super_block *sb)
        bdev = log->bdev;
        rc = lmLogShutdown(log);
 
-       blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
+       blkdev_put(bdev, log);
 
        kfree(log);
 
@@ -1974,7 +1974,7 @@ static int lbmRead(struct jfs_log * log, int pn, struct lbuf ** bpp)
 
        bio = bio_alloc(log->bdev, 1, REQ_OP_READ, GFP_NOFS);
        bio->bi_iter.bi_sector = bp->l_blkno << (log->l2bsize - 9);
-       bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset);
+       __bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset);
        BUG_ON(bio->bi_iter.bi_size != LOGPSIZE);
 
        bio->bi_end_io = lbmIODone;
@@ -2115,7 +2115,7 @@ static void lbmStartIO(struct lbuf * bp)
 
        bio = bio_alloc(log->bdev, 1, REQ_OP_WRITE | REQ_SYNC, GFP_NOFS);
        bio->bi_iter.bi_sector = bp->l_blkno << (log->l2bsize - 9);
-       bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset);
+       __bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset);
        BUG_ON(bio->bi_iter.bi_size != LOGPSIZE);
 
        bio->bi_end_io = lbmIODone;
index b29d68b..494b9f4 100644 (file)
@@ -876,7 +876,7 @@ static int jfs_symlink(struct mnt_idmap *idmap, struct inode *dip,
        tid_t tid;
        ino_t ino = 0;
        struct component_name dname;
-       int ssize;              /* source pathname size */
+       u32 ssize;              /* source pathname size */
        struct btstack btstack;
        struct inode *ip = d_inode(dentry);
        s64 xlen = 0;
@@ -957,7 +957,7 @@ static int jfs_symlink(struct mnt_idmap *idmap, struct inode *dip,
                if (ssize > sizeof (JFS_IP(ip)->i_inline))
                        JFS_IP(ip)->mode2 &= ~INLINEEA;
 
-               jfs_info("jfs_symlink: fast symlink added  ssize:%d name:%s ",
+               jfs_info("jfs_symlink: fast symlink added  ssize:%u name:%s ",
                         ssize, name);
        }
        /*
@@ -987,7 +987,7 @@ static int jfs_symlink(struct mnt_idmap *idmap, struct inode *dip,
                ip->i_size = ssize - 1;
                while (ssize) {
                        /* This is kind of silly since PATH_MAX == 4K */
-                       int copy_size = min(ssize, PSIZE);
+                       u32 copy_size = min_t(u32, ssize, PSIZE);
 
                        mp = get_metapage(ip, xaddr, PSIZE, 1);
 
index 40c4661..180906c 100644 (file)
@@ -1011,7 +1011,7 @@ const struct file_operations kernfs_file_fops = {
        .release        = kernfs_fop_release,
        .poll           = kernfs_fop_poll,
        .fsync          = noop_fsync,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = copy_splice_read,
        .splice_write   = iter_file_splice_write,
 };
 
index bb94949..22d3ff3 100644 (file)
@@ -77,9 +77,9 @@ static const unsigned long    nlm_grace_period_min = 0;
 static const unsigned long     nlm_grace_period_max = 240;
 static const unsigned long     nlm_timeout_min = 3;
 static const unsigned long     nlm_timeout_max = 20;
-static const int               nlm_port_min = 0, nlm_port_max = 65535;
 
 #ifdef CONFIG_SYSCTL
+static const int               nlm_port_min = 0, nlm_port_max = 65535;
 static struct ctl_table_header * nlm_sysctl_table;
 #endif
 
@@ -355,7 +355,6 @@ static int lockd_get(void)
        int error;
 
        if (nlmsvc_serv) {
-               svc_get(nlmsvc_serv);
                nlmsvc_users++;
                return 0;
        }
index 0dd05d4..906d192 100644 (file)
@@ -19,7 +19,7 @@ const struct file_operations minix_file_operations = {
        .write_iter     = generic_file_write_iter,
        .mmap           = generic_file_mmap,
        .fsync          = generic_file_fsync,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
 };
 
 static int minix_setattr(struct mnt_idmap *idmap,
index e4fe087..91171da 100644 (file)
@@ -3028,8 +3028,8 @@ static struct dentry *lock_two_directories(struct dentry *p1, struct dentry *p2)
                return p;
        }
 
-       inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
-       inode_lock_nested(p2->d_inode, I_MUTEX_PARENT2);
+       lock_two_inodes(p1->d_inode, p2->d_inode,
+                       I_MUTEX_PARENT, I_MUTEX_PARENT2);
        return NULL;
 }
 
@@ -3703,7 +3703,7 @@ static int vfs_tmpfile(struct mnt_idmap *idmap,
 }
 
 /**
- * vfs_tmpfile_open - open a tmpfile for kernel internal use
+ * kernel_tmpfile_open - open a tmpfile for kernel internal use
  * @idmap:     idmap of the mount the inode was found from
  * @parentpath:        path of the base directory
  * @mode:      mode of the new tmpfile
@@ -3714,24 +3714,26 @@ static int vfs_tmpfile(struct mnt_idmap *idmap,
  * hence this is only for kernel internal use, and must not be installed into
  * file tables or such.
  */
-struct file *vfs_tmpfile_open(struct mnt_idmap *idmap,
-                         const struct path *parentpath,
-                         umode_t mode, int open_flag, const struct cred *cred)
+struct file *kernel_tmpfile_open(struct mnt_idmap *idmap,
+                                const struct path *parentpath,
+                                umode_t mode, int open_flag,
+                                const struct cred *cred)
 {
        struct file *file;
        int error;
 
        file = alloc_empty_file_noaccount(open_flag, cred);
-       if (!IS_ERR(file)) {
-               error = vfs_tmpfile(idmap, parentpath, file, mode);
-               if (error) {
-                       fput(file);
-                       file = ERR_PTR(error);
-               }
+       if (IS_ERR(file))
+               return file;
+
+       error = vfs_tmpfile(idmap, parentpath, file, mode);
+       if (error) {
+               fput(file);
+               file = ERR_PTR(error);
        }
        return file;
 }
-EXPORT_SYMBOL(vfs_tmpfile_open);
+EXPORT_SYMBOL(kernel_tmpfile_open);
 
 static int do_tmpfile(struct nameidata *nd, unsigned flags,
                const struct open_flags *op,
@@ -4731,7 +4733,7 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
  *        sb->s_vfs_rename_mutex. We might be more accurate, but that's another
  *        story.
  *     c) we have to lock _four_ objects - parents and victim (if it exists),
- *        and source (if it is not a directory).
+ *        and source.
  *        And that - after we got ->i_mutex on parents (until then we don't know
  *        whether the target exists).  Solution: try to be smart with locking
  *        order for inodes.  We rely on the fact that tree topology may change
@@ -4815,10 +4817,16 @@ int vfs_rename(struct renamedata *rd)
 
        take_dentry_name_snapshot(&old_name, old_dentry);
        dget(new_dentry);
-       if (!is_dir || (flags & RENAME_EXCHANGE))
-               lock_two_nondirectories(source, target);
-       else if (target)
-               inode_lock(target);
+       /*
+        * Lock all moved children. Moved directories may need to change their
+        * parent pointer, so they need the lock to protect against concurrent
+        * directory changes moving the parent pointer. For regular files we've
+        * historically always done this. The lockdep locking subclasses are
+        * somewhat arbitrary but RENAME_EXCHANGE in particular can swap
+        * regular files and directories so it's difficult to tell which
+        * subclasses to use.
+        */
+       lock_two_inodes(source, target, I_MUTEX_NORMAL, I_MUTEX_NONDIR2);
 
        error = -EPERM;
        if (IS_SWAPFILE(source) || (target && IS_SWAPFILE(target)))
@@ -4866,9 +4874,9 @@ int vfs_rename(struct renamedata *rd)
                        d_exchange(old_dentry, new_dentry);
        }
 out:
-       if (!is_dir || (flags & RENAME_EXCHANGE))
-               unlock_two_nondirectories(source, target);
-       else if (target)
+       if (source)
+               inode_unlock(source);
+       if (target)
                inode_unlock(target);
        dput(new_dentry);
        if (!error) {
index 54847db..e157efc 100644 (file)
@@ -309,9 +309,16 @@ static unsigned int mnt_get_writers(struct mount *mnt)
 
 static int mnt_is_readonly(struct vfsmount *mnt)
 {
-       if (mnt->mnt_sb->s_readonly_remount)
+       if (READ_ONCE(mnt->mnt_sb->s_readonly_remount))
                return 1;
-       /* Order wrt setting s_flags/s_readonly_remount in do_remount() */
+       /*
+        * The barrier pairs with the barrier in sb_start_ro_state_change()
+        * making sure if we don't see s_readonly_remount set yet, we also will
+        * not see any superblock / mount flag changes done by remount.
+        * It also pairs with the barrier in sb_end_ro_state_change()
+        * assuring that if we see s_readonly_remount already cleared, we will
+        * see the values of superblock / mount flags updated by remount.
+        */
        smp_rmb();
        return __mnt_is_readonly(mnt);
 }
@@ -364,9 +371,11 @@ int __mnt_want_write(struct vfsmount *m)
                }
        }
        /*
-        * After the slowpath clears MNT_WRITE_HOLD, mnt_is_readonly will
-        * be set to match its requirements. So we must not load that until
-        * MNT_WRITE_HOLD is cleared.
+        * The barrier pairs with the barrier sb_start_ro_state_change() making
+        * sure that if we see MNT_WRITE_HOLD cleared, we will also see
+        * s_readonly_remount set (or even SB_RDONLY / MNT_READONLY flags) in
+        * mnt_is_readonly() and bail in case we are racing with remount
+        * read-only.
         */
        smp_rmb();
        if (mnt_is_readonly(m)) {
@@ -588,10 +597,8 @@ int sb_prepare_remount_readonly(struct super_block *sb)
        if (!err && atomic_long_read(&sb->s_remove_count))
                err = -EBUSY;
 
-       if (!err) {
-               sb->s_readonly_remount = 1;
-               smp_wmb();
-       }
+       if (!err)
+               sb_start_ro_state_change(sb);
        list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
                if (mnt->mnt.mnt_flags & MNT_WRITE_HOLD)
                        mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
@@ -658,9 +665,25 @@ static bool legitimize_mnt(struct vfsmount *bastard, unsigned seq)
        return false;
 }
 
-/*
- * find the first mount at @dentry on vfsmount @mnt.
- * call under rcu_read_lock()
+/**
+ * __lookup_mnt - find first child mount
+ * @mnt:       parent mount
+ * @dentry:    mountpoint
+ *
+ * If @mnt has a child mount @c mounted at @dentry, find and return it.
+ *
+ * Note that the child mount @c need not be unique. There are cases
+ * where shadow mounts are created. For example, during mount
+ * propagation when a source mount @mnt whose root got overmounted by a
+ * mount @o after path lookup but before @namespace_sem could be
+ * acquired gets copied and propagated. So @mnt gets copied including
+ * @o. When @mnt is propagated to a destination mount @d that already
+ * has another mount @n mounted at the same mountpoint then the source
+ * mount @mnt will be tucked beneath @n, i.e., @n will be mounted on
+ * @mnt and @mnt mounted on @d. Now both @n and @o are mounted at @mnt
+ * on @dentry.
+ *
+ * Return: The first child of @mnt mounted at @dentry or NULL.
  */
 struct mount *__lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
 {
@@ -910,6 +933,33 @@ void mnt_set_mountpoint(struct mount *mnt,
        hlist_add_head(&child_mnt->mnt_mp_list, &mp->m_list);
 }
 
+/**
+ * mnt_set_mountpoint_beneath - mount a mount beneath another one
+ *
+ * @new_parent: the source mount
+ * @top_mnt:    the mount beneath which @new_parent is mounted
+ * @new_mp:     the new mountpoint of @top_mnt on @new_parent
+ *
+ * Remove @top_mnt from its current mountpoint @top_mnt->mnt_mp and
+ * parent @top_mnt->mnt_parent and mount it on top of @new_parent at
+ * @new_mp. And mount @new_parent on the old parent and old
+ * mountpoint of @top_mnt.
+ *
+ * Context: This function expects namespace_lock() and lock_mount_hash()
+ *          to have been acquired in that order.
+ */
+static void mnt_set_mountpoint_beneath(struct mount *new_parent,
+                                      struct mount *top_mnt,
+                                      struct mountpoint *new_mp)
+{
+       struct mount *old_top_parent = top_mnt->mnt_parent;
+       struct mountpoint *old_top_mp = top_mnt->mnt_mp;
+
+       mnt_set_mountpoint(old_top_parent, old_top_mp, new_parent);
+       mnt_change_mountpoint(new_parent, new_mp, top_mnt);
+}
+
+
 static void __attach_mnt(struct mount *mnt, struct mount *parent)
 {
        hlist_add_head_rcu(&mnt->mnt_hash,
@@ -917,15 +967,42 @@ static void __attach_mnt(struct mount *mnt, struct mount *parent)
        list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
 }
 
-/*
- * vfsmount lock must be held for write
+/**
+ * attach_mnt - mount a mount, attach to @mount_hashtable and parent's
+ *              list of child mounts
+ * @parent:  the parent
+ * @mnt:     the new mount
+ * @mp:      the new mountpoint
+ * @beneath: whether to mount @mnt beneath or on top of @parent
+ *
+ * If @beneath is false, mount @mnt at @mp on @parent. Then attach @mnt
+ * to @parent's child mount list and to @mount_hashtable.
+ *
+ * If @beneath is true, remove @mnt from its current parent and
+ * mountpoint and mount it on @mp on @parent, and mount @parent on the
+ * old parent and old mountpoint of @mnt. Finally, attach @parent to
+ * @mnt_hashtable and @parent->mnt_parent->mnt_mounts.
+ *
+ * Note, when __attach_mnt() is called @mnt->mnt_parent already points
+ * to the correct parent.
+ *
+ * Context: This function expects namespace_lock() and lock_mount_hash()
+ *          to have been acquired in that order.
  */
-static void attach_mnt(struct mount *mnt,
-                       struct mount *parent,
-                       struct mountpoint *mp)
+static void attach_mnt(struct mount *mnt, struct mount *parent,
+                      struct mountpoint *mp, bool beneath)
 {
-       mnt_set_mountpoint(parent, mp, mnt);
-       __attach_mnt(mnt, parent);
+       if (beneath)
+               mnt_set_mountpoint_beneath(mnt, parent, mp);
+       else
+               mnt_set_mountpoint(parent, mp, mnt);
+       /*
+        * Note, @mnt->mnt_parent has to be used. If @mnt was mounted
+        * beneath @parent then @mnt will need to be attached to
+        * @parent's old parent, not @parent. IOW, @mnt->mnt_parent
+        * isn't the same mount as @parent.
+        */
+       __attach_mnt(mnt, mnt->mnt_parent);
 }
 
 void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp, struct mount *mnt)
@@ -937,7 +1014,7 @@ void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp, struct m
        hlist_del_init(&mnt->mnt_mp_list);
        hlist_del_init_rcu(&mnt->mnt_hash);
 
-       attach_mnt(mnt, parent, mp);
+       attach_mnt(mnt, parent, mp, false);
 
        put_mountpoint(old_mp);
        mnt_add_count(old_parent, -1);
@@ -1767,6 +1844,19 @@ bool may_mount(void)
        return ns_capable(current->nsproxy->mnt_ns->user_ns, CAP_SYS_ADMIN);
 }
 
+/**
+ * path_mounted - check whether path is mounted
+ * @path: path to check
+ *
+ * Determine whether @path refers to the root of a mount.
+ *
+ * Return: true if @path is the root of a mount, false if not.
+ */
+static inline bool path_mounted(const struct path *path)
+{
+       return path->mnt->mnt_root == path->dentry;
+}
+
 static void warn_mandlock(void)
 {
        pr_warn_once("=======================================================\n"
@@ -1782,7 +1872,7 @@ static int can_umount(const struct path *path, int flags)
 
        if (!may_mount())
                return -EPERM;
-       if (path->dentry != path->mnt->mnt_root)
+       if (!path_mounted(path))
                return -EINVAL;
        if (!check_mnt(mnt))
                return -EINVAL;
@@ -1925,7 +2015,7 @@ struct mount *copy_tree(struct mount *mnt, struct dentry *dentry,
                                goto out;
                        lock_mount_hash();
                        list_add_tail(&q->mnt_list, &res->mnt_list);
-                       attach_mnt(q, parent, p->mnt_mp);
+                       attach_mnt(q, parent, p->mnt_mp, false);
                        unlock_mount_hash();
                }
        }
@@ -2134,12 +2224,17 @@ int count_mounts(struct mnt_namespace *ns, struct mount *mnt)
        return 0;
 }
 
-/*
- *  @source_mnt : mount tree to be attached
- *  @nd         : place the mount tree @source_mnt is attached
- *  @parent_nd  : if non-null, detach the source_mnt from its parent and
- *                store the parent mount and mountpoint dentry.
- *                (done when source_mnt is moved)
+enum mnt_tree_flags_t {
+       MNT_TREE_MOVE = BIT(0),
+       MNT_TREE_BENEATH = BIT(1),
+};
+
+/**
+ * attach_recursive_mnt - attach a source mount tree
+ * @source_mnt: mount tree to be attached
+ * @top_mnt:    mount that @source_mnt will be mounted on or mounted beneath
+ * @dest_mp:    the mountpoint @source_mnt will be mounted at
+ * @flags:      modify how @source_mnt is supposed to be attached
  *
  *  NOTE: in the table below explains the semantics when a source mount
  *  of a given type is attached to a destination mount of a given type.
@@ -2196,22 +2291,28 @@ int count_mounts(struct mnt_namespace *ns, struct mount *mnt)
  * applied to each mount in the tree.
  * Must be called without spinlocks held, since this function can sleep
  * in allocations.
+ *
+ * Context: The function expects namespace_lock() to be held.
+ * Return: If @source_mnt was successfully attached 0 is returned.
+ *         Otherwise a negative error code is returned.
  */
 static int attach_recursive_mnt(struct mount *source_mnt,
-                       struct mount *dest_mnt,
-                       struct mountpoint *dest_mp,
-                       bool moving)
+                               struct mount *top_mnt,
+                               struct mountpoint *dest_mp,
+                               enum mnt_tree_flags_t flags)
 {
        struct user_namespace *user_ns = current->nsproxy->mnt_ns->user_ns;
        HLIST_HEAD(tree_list);
-       struct mnt_namespace *ns = dest_mnt->mnt_ns;
+       struct mnt_namespace *ns = top_mnt->mnt_ns;
        struct mountpoint *smp;
-       struct mount *child, *p;
+       struct mount *child, *dest_mnt, *p;
        struct hlist_node *n;
-       int err;
+       int err = 0;
+       bool moving = flags & MNT_TREE_MOVE, beneath = flags & MNT_TREE_BENEATH;
 
-       /* Preallocate a mountpoint in case the new mounts need
-        * to be tucked under other mounts.
+       /*
+        * Preallocate a mountpoint in case the new mounts need to be
+        * mounted beneath mounts on the same mountpoint.
         */
        smp = get_mountpoint(source_mnt->mnt.mnt_root);
        if (IS_ERR(smp))
@@ -2224,29 +2325,41 @@ static int attach_recursive_mnt(struct mount *source_mnt,
                        goto out;
        }
 
+       if (beneath)
+               dest_mnt = top_mnt->mnt_parent;
+       else
+               dest_mnt = top_mnt;
+
        if (IS_MNT_SHARED(dest_mnt)) {
                err = invent_group_ids(source_mnt, true);
                if (err)
                        goto out;
                err = propagate_mnt(dest_mnt, dest_mp, source_mnt, &tree_list);
-               lock_mount_hash();
-               if (err)
-                       goto out_cleanup_ids;
+       }
+       lock_mount_hash();
+       if (err)
+               goto out_cleanup_ids;
+
+       if (IS_MNT_SHARED(dest_mnt)) {
                for (p = source_mnt; p; p = next_mnt(p, source_mnt))
                        set_mnt_shared(p);
-       } else {
-               lock_mount_hash();
        }
+
        if (moving) {
+               if (beneath)
+                       dest_mp = smp;
                unhash_mnt(source_mnt);
-               attach_mnt(source_mnt, dest_mnt, dest_mp);
+               attach_mnt(source_mnt, top_mnt, dest_mp, beneath);
                touch_mnt_namespace(source_mnt->mnt_ns);
        } else {
                if (source_mnt->mnt_ns) {
                        /* move from anon - the caller will destroy */
                        list_del_init(&source_mnt->mnt_ns->list);
                }
-               mnt_set_mountpoint(dest_mnt, dest_mp, source_mnt);
+               if (beneath)
+                       mnt_set_mountpoint_beneath(source_mnt, top_mnt, smp);
+               else
+                       mnt_set_mountpoint(dest_mnt, dest_mp, source_mnt);
                commit_tree(source_mnt);
        }
 
@@ -2286,33 +2399,101 @@ static int attach_recursive_mnt(struct mount *source_mnt,
        return err;
 }
 
-static struct mountpoint *lock_mount(struct path *path)
+/**
+ * do_lock_mount - lock mount and mountpoint
+ * @path:    target path
+ * @beneath: whether the intention is to mount beneath @path
+ *
+ * Follow the mount stack on @path until the top mount @mnt is found. If
+ * the initial @path->{mnt,dentry} is a mountpoint, look up the first
+ * mount stacked on top of it. Then simply follow @{mnt,mnt->mnt_root}
+ * until nothing is stacked on top of it anymore.
+ *
+ * Acquire the inode_lock() on the top mount's ->mnt_root to protect
+ * against concurrent removal of the new mountpoint from another mount
+ * namespace.
+ *
+ * If @beneath is requested, the inode_lock() on @mnt's mountpoint
+ * @mp on @mnt->mnt_parent must be acquired. This protects against a
+ * concurrent unlink of @mp->mnt_dentry from another mount namespace
+ * where @mnt doesn't have a child mount mounted @mp. A concurrent
+ * removal of @mnt->mnt_root doesn't matter as nothing will be mounted
+ * on top of it for @beneath.
+ *
+ * In addition, @beneath needs to make sure that @mnt hasn't been
+ * unmounted or moved from its current mountpoint in between dropping
+ * @mount_lock and acquiring @namespace_sem. For the !@beneath case @mnt
+ * being unmounted would be detected later by e.g., calling
+ * check_mnt(mnt) in the function it's called from. For the @beneath
+ * case however, it's useful to detect it directly in do_lock_mount().
+ * If @mnt hasn't been unmounted then @mnt->mnt_mountpoint still points
+ * to @mnt->mnt_mp->m_dentry. But if @mnt has been unmounted it will
+ * point to @mnt->mnt_root and @mnt->mnt_mp will be NULL.
+ *
+ * Return: Either the target mountpoint on the top mount or the top
+ *         mount's mountpoint.
+ */
+static struct mountpoint *do_lock_mount(struct path *path, bool beneath)
 {
-       struct vfsmount *mnt;
-       struct dentry *dentry = path->dentry;
-retry:
-       inode_lock(dentry->d_inode);
-       if (unlikely(cant_mount(dentry))) {
-               inode_unlock(dentry->d_inode);
-               return ERR_PTR(-ENOENT);
-       }
-       namespace_lock();
-       mnt = lookup_mnt(path);
-       if (likely(!mnt)) {
-               struct mountpoint *mp = get_mountpoint(dentry);
-               if (IS_ERR(mp)) {
+       struct vfsmount *mnt = path->mnt;
+       struct dentry *dentry;
+       struct mountpoint *mp = ERR_PTR(-ENOENT);
+
+       for (;;) {
+               struct mount *m;
+
+               if (beneath) {
+                       m = real_mount(mnt);
+                       read_seqlock_excl(&mount_lock);
+                       dentry = dget(m->mnt_mountpoint);
+                       read_sequnlock_excl(&mount_lock);
+               } else {
+                       dentry = path->dentry;
+               }
+
+               inode_lock(dentry->d_inode);
+               if (unlikely(cant_mount(dentry))) {
+                       inode_unlock(dentry->d_inode);
+                       goto out;
+               }
+
+               namespace_lock();
+
+               if (beneath && (!is_mounted(mnt) || m->mnt_mountpoint != dentry)) {
                        namespace_unlock();
                        inode_unlock(dentry->d_inode);
-                       return mp;
+                       goto out;
                }
-               return mp;
+
+               mnt = lookup_mnt(path);
+               if (likely(!mnt))
+                       break;
+
+               namespace_unlock();
+               inode_unlock(dentry->d_inode);
+               if (beneath)
+                       dput(dentry);
+               path_put(path);
+               path->mnt = mnt;
+               path->dentry = dget(mnt->mnt_root);
        }
-       namespace_unlock();
-       inode_unlock(path->dentry->d_inode);
-       path_put(path);
-       path->mnt = mnt;
-       dentry = path->dentry = dget(mnt->mnt_root);
-       goto retry;
+
+       mp = get_mountpoint(dentry);
+       if (IS_ERR(mp)) {
+               namespace_unlock();
+               inode_unlock(dentry->d_inode);
+       }
+
+out:
+       if (beneath)
+               dput(dentry);
+
+       return mp;
+}
+
+static inline struct mountpoint *lock_mount(struct path *path)
+{
+       return do_lock_mount(path, false);
 }
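For orientation, the @beneath handling above is the kernel side of the new MOVE_MOUNT_BENEATH move_mount(2) flag introduced by this series. A minimal, hypothetical userspace sketch (the flag name is taken from the uapi <linux/mount.h> of this series; paths and error handling are illustrative only):

#define _GNU_SOURCE
#include <fcntl.h>		/* AT_FDCWD */
#include <linux/mount.h>	/* MOVE_MOUNT_BENEATH */
#include <sys/syscall.h>
#include <unistd.h>

/* Move the mount at @from beneath the topmost mount at @to. */
static int move_mount_beneath(const char *from, const char *to)
{
	return syscall(SYS_move_mount, AT_FDCWD, from, AT_FDCWD, to,
		       MOVE_MOUNT_BENEATH);
}

On kernels that predate the flag, the unknown-flag check in move_mount(2) is expected to reject such a call with EINVAL.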
 
 static void unlock_mount(struct mountpoint *where)
@@ -2336,7 +2517,7 @@ static int graft_tree(struct mount *mnt, struct mount *p, struct mountpoint *mp)
              d_is_dir(mnt->mnt.mnt_root))
                return -ENOTDIR;
 
-       return attach_recursive_mnt(mnt, p, mp, false);
+       return attach_recursive_mnt(mnt, p, mp, 0);
 }
 
 /*
@@ -2367,7 +2548,7 @@ static int do_change_type(struct path *path, int ms_flags)
        int type;
        int err = 0;
 
-       if (path->dentry != path->mnt->mnt_root)
+       if (!path_mounted(path))
                return -EINVAL;
 
        type = flags_to_propagation_type(ms_flags);
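The path_mounted() helper used in the hunks above and below is added earlier in this patch, outside the excerpt shown here; a minimal sketch of the assumed definition:

/* Presumed helper: does @path refer to the root of its vfsmount? */
static inline bool path_mounted(const struct path *path)
{
	return path->mnt->mnt_root == path->dentry;
}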
@@ -2643,7 +2824,7 @@ static int do_reconfigure_mnt(struct path *path, unsigned int mnt_flags)
        if (!check_mnt(mnt))
                return -EINVAL;
 
-       if (path->dentry != mnt->mnt.mnt_root)
+       if (!path_mounted(path))
                return -EINVAL;
 
        if (!can_change_locked_flags(mnt, mnt_flags))
@@ -2682,7 +2863,7 @@ static int do_remount(struct path *path, int ms_flags, int sb_flags,
        if (!check_mnt(mnt))
                return -EINVAL;
 
-       if (path->dentry != path->mnt->mnt_root)
+       if (!path_mounted(path))
                return -EINVAL;
 
        if (!can_change_locked_flags(mnt, mnt_flags))
@@ -2772,9 +2953,9 @@ static int do_set_group(struct path *from_path, struct path *to_path)
 
        err = -EINVAL;
        /* To and From paths should be mount roots */
-       if (from_path->dentry != from_path->mnt->mnt_root)
+       if (!path_mounted(from_path))
                goto out;
-       if (to_path->dentry != to_path->mnt->mnt_root)
+       if (!path_mounted(to_path))
                goto out;
 
        /* Setting sharing groups is only allowed across same superblock */
@@ -2818,7 +2999,110 @@ out:
        return err;
 }
 
-static int do_move_mount(struct path *old_path, struct path *new_path)
+/**
+ * path_overmounted - check if path is overmounted
+ * @path: path to check
+ *
+ * Check if path is overmounted, i.e., if there's a mount on top of
+ * @path->mnt with @path->dentry as mountpoint.
+ *
+ * Context: This function expects namespace_lock() to be held.
+ * Return: %true if @path is overmounted, %false otherwise.
+ */
+static inline bool path_overmounted(const struct path *path)
+{
+       rcu_read_lock();
+       if (unlikely(__lookup_mnt(path->mnt, path->dentry))) {
+               rcu_read_unlock();
+               return true;
+       }
+       rcu_read_unlock();
+       return false;
+}
+
+/**
+ * can_move_mount_beneath - check that we can mount beneath the top mount
+ * @from: mount to mount beneath
+ * @to:   mount under which to mount
+ * @mp:   mountpoint of @to
+ *
+ * - Make sure that @to->dentry is actually the root of a mount under
+ *   which we can mount another mount.
+ * - Make sure that nothing can be mounted beneath the caller's current
+ *   root or the rootfs of the namespace.
+ * - Make sure that the caller can unmount the topmost mount ensuring
+ *   that the caller could reveal the underlying mountpoint.
+ * - Ensure that nothing has been mounted on top of @from before we
+ *   grabbed @namespace_sem to avoid creating pointless shadow mounts.
+ * - Prevent mounting beneath a mount if the propagation relationship
+ *   between the source mount, parent mount, and top mount would lead to
+ *   nonsensical mount trees.
+ *
+ * Context: This function expects namespace_lock() to be held.
+ * Return: 0 on success, a negative error code on failure.
+ */
+static int can_move_mount_beneath(const struct path *from,
+                                 const struct path *to,
+                                 const struct mountpoint *mp)
+{
+       struct mount *mnt_from = real_mount(from->mnt),
+                    *mnt_to = real_mount(to->mnt),
+                    *parent_mnt_to = mnt_to->mnt_parent;
+
+       if (!mnt_has_parent(mnt_to))
+               return -EINVAL;
+
+       if (!path_mounted(to))
+               return -EINVAL;
+
+       if (IS_MNT_LOCKED(mnt_to))
+               return -EINVAL;
+
+       /* Avoid creating shadow mounts during mount propagation. */
+       if (path_overmounted(from))
+               return -EINVAL;
+
+       /*
+        * Mounting beneath the rootfs only makes sense when the
+        * semantics of pivot_root(".", ".") are used.
+        */
+       if (&mnt_to->mnt == current->fs->root.mnt)
+               return -EINVAL;
+       if (parent_mnt_to == current->nsproxy->mnt_ns->root)
+               return -EINVAL;
+
+       for (struct mount *p = mnt_from; mnt_has_parent(p); p = p->mnt_parent)
+               if (p == mnt_to)
+                       return -EINVAL;
+
+       /*
+        * If the parent mount propagates to the child mount this would
+        * mean mounting @mnt_from on @mnt_to->mnt_parent and then
+        * propagating a copy @c of @mnt_from on top of @mnt_to. This
+        * defeats the whole purpose of mounting beneath another mount.
+        */
+       if (propagation_would_overmount(parent_mnt_to, mnt_to, mp))
+               return -EINVAL;
+
+       /*
+        * If @mnt_to->mnt_parent propagates to @mnt_from this would
+        * mean propagating a copy @c of @mnt_from on top of @mnt_from.
+        * Afterwards @mnt_from would be mounted on top of
+        * @mnt_to->mnt_parent and @mnt_to would be unmounted from
+        * @mnt_to->mnt_parent and remounted on @mnt_from. But since @c is
+        * already mounted on @mnt_from, @mnt_to would ultimately be
+        * remounted on top of @c. Afterwards, @mnt_from would be
+        * covered by a copy @c of @mnt_from and @c would be covered by
+        * @mnt_from itself. This defeats the whole purpose of mounting
+        * @mnt_from beneath @mnt_to.
+        */
+       if (propagation_would_overmount(parent_mnt_to, mnt_from, mp))
+               return -EINVAL;
+
+       return 0;
+}
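The two propagation checks above rely on a pnode helper added elsewhere in this series; the declaration implied by the call sites is roughly:

/*
 * Presumed declaration (fs/pnode.h): would propagation from @from
 * place a copy on top of @to at mountpoint @mp?
 */
bool propagation_would_overmount(const struct mount *from,
				 const struct mount *to,
				 const struct mountpoint *mp);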
+
+static int do_move_mount(struct path *old_path, struct path *new_path,
+                        bool beneath)
 {
        struct mnt_namespace *ns;
        struct mount *p;
@@ -2827,8 +3111,9 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
        struct mountpoint *mp, *old_mp;
        int err;
        bool attached;
+       enum mnt_tree_flags_t flags = 0;
 
-       mp = lock_mount(new_path);
+       mp = do_lock_mount(new_path, beneath);
        if (IS_ERR(mp))
                return PTR_ERR(mp);
 
@@ -2836,6 +3121,8 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
        p = real_mount(new_path->mnt);
        parent = old->mnt_parent;
        attached = mnt_has_parent(old);
+       if (attached)
+               flags |= MNT_TREE_MOVE;
        old_mp = old->mnt_mp;
        ns = old->mnt_ns;
 
@@ -2855,7 +3142,7 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
        if (old->mnt.mnt_flags & MNT_LOCKED)
                goto out;
 
-       if (old_path->dentry != old_path->mnt->mnt_root)
+       if (!path_mounted(old_path))
                goto out;
 
        if (d_is_dir(new_path->dentry) !=
@@ -2866,6 +3153,17 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
         */
        if (attached && IS_MNT_SHARED(parent))
                goto out;
+
+       if (beneath) {
+               err = can_move_mount_beneath(old_path, new_path, mp);
+               if (err)
+                       goto out;
+
+               err = -EINVAL;
+               p = p->mnt_parent;
+               flags |= MNT_TREE_BENEATH;
+       }
+
        /*
         * Don't move a mount tree containing unbindable mounts to a destination
         * mount which is shared.
@@ -2879,8 +3177,7 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
                if (p == old)
                        goto out;
 
-       err = attach_recursive_mnt(old, real_mount(new_path->mnt), mp,
-                                  attached);
+       err = attach_recursive_mnt(old, real_mount(new_path->mnt), mp, flags);
        if (err)
                goto out;
 
@@ -2912,7 +3209,7 @@ static int do_move_mount_old(struct path *path, const char *old_name)
        if (err)
                return err;
 
-       err = do_move_mount(&old_path, path);
+       err = do_move_mount(&old_path, path, false);
        path_put(&old_path);
        return err;
 }
@@ -2937,8 +3234,7 @@ static int do_add_mount(struct mount *newmnt, struct mountpoint *mp,
        }
 
        /* Refuse the same filesystem on the same mount point */
-       if (path->mnt->mnt_sb == newmnt->mnt.mnt_sb &&
-           path->mnt->mnt_root == path->dentry)
+       if (path->mnt->mnt_sb == newmnt->mnt.mnt_sb && path_mounted(path))
                return -EBUSY;
 
        if (d_is_symlink(newmnt->mnt.mnt_root))
@@ -3079,13 +3375,10 @@ int finish_automount(struct vfsmount *m, const struct path *path)
                err = -ENOENT;
                goto discard_locked;
        }
-       rcu_read_lock();
-       if (unlikely(__lookup_mnt(path->mnt, dentry))) {
-               rcu_read_unlock();
+       if (path_overmounted(path)) {
                err = 0;
                goto discard_locked;
        }
-       rcu_read_unlock();
        mp = get_mountpoint(dentry);
        if (IS_ERR(mp)) {
                err = PTR_ERR(mp);
@@ -3777,6 +4070,10 @@ SYSCALL_DEFINE5(move_mount,
        if (flags & ~MOVE_MOUNT__MASK)
                return -EINVAL;
 
+       if ((flags & (MOVE_MOUNT_BENEATH | MOVE_MOUNT_SET_GROUP)) ==
+           (MOVE_MOUNT_BENEATH | MOVE_MOUNT_SET_GROUP))
+               return -EINVAL;
+
        /* If someone gives a pathname, they aren't permitted to move
         * from an fd that requires unmount as we can't get at the flag
         * to clear it afterwards.
@@ -3806,7 +4103,8 @@ SYSCALL_DEFINE5(move_mount,
        if (flags & MOVE_MOUNT_SET_GROUP)
                ret = do_set_group(&from_path, &to_path);
        else
-               ret = do_move_mount(&from_path, &to_path);
+               ret = do_move_mount(&from_path, &to_path,
+                                   (flags & MOVE_MOUNT_BENEATH));
 
 out_to:
        path_put(&to_path);
@@ -3917,11 +4215,11 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
        if (new_mnt == root_mnt || old_mnt == root_mnt)
                goto out4; /* loop, on the same file system  */
        error = -EINVAL;
-       if (root.mnt->mnt_root != root.dentry)
+       if (!path_mounted(&root))
                goto out4; /* not a mountpoint */
        if (!mnt_has_parent(root_mnt))
                goto out4; /* not attached */
-       if (new.mnt->mnt_root != new.dentry)
+       if (!path_mounted(&new))
                goto out4; /* not a mountpoint */
        if (!mnt_has_parent(new_mnt))
                goto out4; /* not attached */
@@ -3939,9 +4237,9 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
                root_mnt->mnt.mnt_flags &= ~MNT_LOCKED;
        }
        /* mount old root on put_old */
-       attach_mnt(root_mnt, old_mnt, old_mp);
+       attach_mnt(root_mnt, old_mnt, old_mp, false);
        /* mount new_root on / */
-       attach_mnt(new_mnt, root_parent, root_mp);
+       attach_mnt(new_mnt, root_parent, root_mp, false);
        mnt_add_count(root_parent, -1);
        touch_mnt_namespace(current->nsproxy->mnt_ns);
        /* A moved mount should not expire automatically */
@@ -4124,7 +4422,7 @@ static int do_mount_setattr(struct path *path, struct mount_kattr *kattr)
        struct mount *mnt = real_mount(path->mnt);
        int err = 0;
 
-       if (path->dentry != mnt->mnt.mnt_root)
+       if (!path_mounted(path))
                return -EINVAL;
 
        if (kattr->mnt_userns) {
index fea5f88..70f5563 100644 (file)
@@ -35,7 +35,7 @@ bl_free_device(struct pnfs_block_dev *dev)
                }
 
                if (dev->bdev)
-                       blkdev_put(dev->bdev, FMODE_READ | FMODE_WRITE);
+                       blkdev_put(dev->bdev, NULL);
        }
 }
 
@@ -243,7 +243,8 @@ bl_parse_simple(struct nfs_server *server, struct pnfs_block_dev *d,
        if (!dev)
                return -EIO;
 
-       bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_WRITE, NULL);
+       bdev = blkdev_get_by_dev(dev, BLK_OPEN_READ | BLK_OPEN_WRITE, NULL,
+                                NULL);
        if (IS_ERR(bdev)) {
                printk(KERN_WARNING "pNFS: failed to open device %d:%d (%ld)\n",
                        MAJOR(dev), MINOR(dev), PTR_ERR(bdev));
@@ -312,7 +313,8 @@ bl_open_path(struct pnfs_block_volume *v, const char *prefix)
        if (!devname)
                return ERR_PTR(-ENOMEM);
 
-       bdev = blkdev_get_by_path(devname, FMODE_READ | FMODE_WRITE, NULL);
+       bdev = blkdev_get_by_path(devname, BLK_OPEN_READ | BLK_OPEN_WRITE, NULL,
+                                 NULL);
        if (IS_ERR(bdev)) {
                pr_warn("pNFS: failed to open device %s (%ld)\n",
                        devname, PTR_ERR(bdev));
@@ -373,7 +375,7 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
        return 0;
 
 out_blkdev_put:
-       blkdev_put(d->bdev, FMODE_READ | FMODE_WRITE);
+       blkdev_put(d->bdev, NULL);
        return error;
 }
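These hunks follow the block layer open API change in this cycle: the FMODE_READ/FMODE_WRITE open modes become BLK_OPEN_READ/BLK_OPEN_WRITE, the open helpers gain a struct blk_holder_ops argument, and blkdev_put() takes the holder instead of the mode. The prototypes assumed by the new call sites are roughly:

/* Presumed prototypes (include/linux/blkdev.h in this cycle): */
struct block_device *blkdev_get_by_dev(dev_t dev, blk_mode_t mode,
				       void *holder,
				       const struct blk_holder_ops *hops);
struct block_device *blkdev_get_by_path(const char *path, blk_mode_t mode,
					void *holder,
					const struct blk_holder_ops *hops);
void blkdev_put(struct block_device *bdev, void *holder);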
 
index bacad0c..8f3112e 100644 (file)
@@ -317,7 +317,7 @@ static int nfs_readdir_folio_array_append(struct folio *folio,
 
        name = nfs_readdir_copy_name(entry->name, entry->len);
 
-       array = kmap_atomic(folio_page(folio, 0));
+       array = kmap_local_folio(folio, 0);
        if (!name)
                goto out;
        ret = nfs_readdir_array_can_expand(array);
@@ -340,7 +340,7 @@ static int nfs_readdir_folio_array_append(struct folio *folio,
                nfs_readdir_array_set_eof(array);
 out:
        *cookie = array->last_cookie;
-       kunmap_atomic(array);
+       kunmap_local(array);
        return ret;
 }
 
@@ -402,7 +402,7 @@ static struct folio *nfs_readdir_folio_get_locked(struct address_space *mapping,
        struct folio *folio;
 
        folio = filemap_grab_folio(mapping, index);
-       if (!folio)
+       if (IS_ERR(folio))
                return NULL;
        nfs_readdir_folio_init_and_validate(folio, cookie, change_attr);
        return folio;
index f0edf5a..3855f3c 100644 (file)
@@ -178,6 +178,27 @@ nfs_file_read(struct kiocb *iocb, struct iov_iter *to)
 }
 EXPORT_SYMBOL_GPL(nfs_file_read);
 
+ssize_t
+nfs_file_splice_read(struct file *in, loff_t *ppos, struct pipe_inode_info *pipe,
+                    size_t len, unsigned int flags)
+{
+       struct inode *inode = file_inode(in);
+       ssize_t result;
+
+       dprintk("NFS: splice_read(%pD2, %zu@%llu)\n", in, len, *ppos);
+
+       nfs_start_io_read(inode);
+       result = nfs_revalidate_mapping(inode, in->f_mapping);
+       if (!result) {
+               result = filemap_splice_read(in, ppos, pipe, len, flags);
+               if (result > 0)
+                       nfs_add_stats(inode, NFSIOS_NORMALREADBYTES, result);
+       }
+       nfs_end_io_read(inode);
+       return result;
+}
+EXPORT_SYMBOL_GPL(nfs_file_splice_read);
+
 int
 nfs_file_mmap(struct file * file, struct vm_area_struct * vma)
 {
@@ -879,7 +900,7 @@ const struct file_operations nfs_file_operations = {
        .fsync          = nfs_file_fsync,
        .lock           = nfs_lock,
        .flock          = nfs_flock,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = nfs_file_splice_read,
        .splice_write   = iter_file_splice_write,
        .check_flags    = nfs_check_flags,
        .setlease       = simple_nosetlease,
index 3cc027d..b5f21d3 100644 (file)
@@ -416,6 +416,8 @@ static inline __u32 nfs_access_xattr_mask(const struct nfs_server *server)
 int nfs_file_fsync(struct file *file, loff_t start, loff_t end, int datasync);
 loff_t nfs_file_llseek(struct file *, loff_t, int);
 ssize_t nfs_file_read(struct kiocb *, struct iov_iter *);
+ssize_t nfs_file_splice_read(struct file *in, loff_t *ppos, struct pipe_inode_info *pipe,
+                            size_t len, unsigned int flags);
 int nfs_file_mmap(struct file *, struct vm_area_struct *);
 ssize_t nfs_file_write(struct kiocb *, struct iov_iter *);
 int nfs_file_release(struct inode *, struct file *);
index 2563ed8..4aeadd6 100644 (file)
@@ -454,7 +454,7 @@ const struct file_operations nfs4_file_operations = {
        .fsync          = nfs_file_fsync,
        .lock           = nfs_lock,
        .flock          = nfs_flock,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = nfs_file_splice_read,
        .splice_write   = iter_file_splice_write,
        .check_flags    = nfs_check_flags,
        .setlease       = nfs4_setlease,
index 18f25ff..d366539 100644 (file)
@@ -5437,10 +5437,18 @@ static bool nfs4_read_plus_not_supported(struct rpc_task *task,
        return false;
 }
 
-static int nfs4_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
+static inline void nfs4_read_plus_scratch_free(struct nfs_pgio_header *hdr)
 {
-       if (hdr->res.scratch)
+       if (hdr->res.scratch) {
                kfree(hdr->res.scratch);
+               hdr->res.scratch = NULL;
+       }
+}
+
+static int nfs4_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
+{
+       nfs4_read_plus_scratch_free(hdr);
+
        if (!nfs4_sequence_done(task, &hdr->res.seq_res))
                return -EAGAIN;
        if (nfs4_read_stateid_changed(task, &hdr->args))
index f21259e..4c9b878 100644 (file)
@@ -80,6 +80,8 @@ enum {
 
 int    nfsd_drc_slab_create(void);
 void   nfsd_drc_slab_free(void);
+int    nfsd_net_reply_cache_init(struct nfsd_net *nn);
+void   nfsd_net_reply_cache_destroy(struct nfsd_net *nn);
 int    nfsd_reply_cache_init(struct nfsd_net *);
 void   nfsd_reply_cache_shutdown(struct nfsd_net *);
 int    nfsd_cache_lookup(struct svc_rqst *);
index ae85257..11a0eaa 100644 (file)
@@ -97,7 +97,7 @@ static int expkey_parse(struct cache_detail *cd, char *mesg, int mlen)
                goto out;
 
        err = -EINVAL;
-       if ((len=qword_get(&mesg, buf, PAGE_SIZE)) <= 0)
+       if (qword_get(&mesg, buf, PAGE_SIZE) <= 0)
                goto out;
 
        err = -ENOENT;
@@ -107,7 +107,7 @@ static int expkey_parse(struct cache_detail *cd, char *mesg, int mlen)
        dprintk("found domain %s\n", buf);
 
        err = -EINVAL;
-       if ((len=qword_get(&mesg, buf, PAGE_SIZE)) <= 0)
+       if (qword_get(&mesg, buf, PAGE_SIZE) <= 0)
                goto out;
        fsidtype = simple_strtoul(buf, &ep, 10);
        if (*ep)
@@ -593,7 +593,6 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
 {
        /* client path expiry [flags anonuid anongid fsid] */
        char *buf;
-       int len;
        int err;
        struct auth_domain *dom = NULL;
        struct svc_export exp = {}, *expp;
@@ -609,8 +608,7 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
 
        /* client */
        err = -EINVAL;
-       len = qword_get(&mesg, buf, PAGE_SIZE);
-       if (len <= 0)
+       if (qword_get(&mesg, buf, PAGE_SIZE) <= 0)
                goto out;
 
        err = -ENOENT;
@@ -620,7 +618,7 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
 
        /* path */
        err = -EINVAL;
-       if ((len = qword_get(&mesg, buf, PAGE_SIZE)) <= 0)
+       if (qword_get(&mesg, buf, PAGE_SIZE) <= 0)
                goto out1;
 
        err = kern_path(buf, 0, &exp.ex_path);
@@ -665,7 +663,7 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
                        goto out3;
                exp.ex_fsid = an_int;
 
-               while ((len = qword_get(&mesg, buf, PAGE_SIZE)) > 0) {
+               while (qword_get(&mesg, buf, PAGE_SIZE) > 0) {
                        if (strcmp(buf, "fsloc") == 0)
                                err = fsloc_parse(&mesg, buf, &exp.ex_fslocs);
                        else if (strcmp(buf, "uuid") == 0)
index e6bb8ee..fc8d5b7 100644 (file)
@@ -151,8 +151,6 @@ nfsd3_proc_read(struct svc_rqst *rqstp)
 {
        struct nfsd3_readargs *argp = rqstp->rq_argp;
        struct nfsd3_readres *resp = rqstp->rq_resp;
-       unsigned int len;
-       int v;
 
        dprintk("nfsd: READ(3) %s %lu bytes at %Lu\n",
                                SVCFH_fmt(&argp->fh),
@@ -166,17 +164,7 @@ nfsd3_proc_read(struct svc_rqst *rqstp)
        if (argp->offset + argp->count > (u64)OFFSET_MAX)
                argp->count = (u64)OFFSET_MAX - argp->offset;
 
-       v = 0;
-       len = argp->count;
        resp->pages = rqstp->rq_next_page;
-       while (len > 0) {
-               struct page *page = *(rqstp->rq_next_page++);
-
-               rqstp->rq_vec[v].iov_base = page_address(page);
-               rqstp->rq_vec[v].iov_len = min_t(unsigned int, len, PAGE_SIZE);
-               len -= rqstp->rq_vec[v].iov_len;
-               v++;
-       }
 
        /* Obtain buffer pointer for payload.
         * 1 (status) + 22 (post_op_attr) + 1 (count) + 1 (eof)
@@ -187,7 +175,7 @@ nfsd3_proc_read(struct svc_rqst *rqstp)
 
        fh_copy(&resp->fh, &argp->fh);
        resp->status = nfsd_read(rqstp, &resp->fh, argp->offset,
-                                rqstp->rq_vec, v, &resp->count, &resp->eof);
+                                &resp->count, &resp->eof);
        return rpc_success;
 }
 
index 3308dd6..f321289 100644 (file)
@@ -828,7 +828,8 @@ nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
                        return false;
                if (xdr_stream_encode_u32(xdr, resp->len) < 0)
                        return false;
-               xdr_write_pages(xdr, resp->pages, 0, resp->len);
+               svcxdr_encode_opaque_pages(rqstp, xdr, resp->pages, 0,
+                                          resp->len);
                if (svc_encode_result_payload(rqstp, head->iov_len, resp->len) < 0)
                        return false;
                break;
@@ -859,8 +860,9 @@ nfs3svc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
                        return false;
                if (xdr_stream_encode_u32(xdr, resp->count) < 0)
                        return false;
-               xdr_write_pages(xdr, resp->pages, rqstp->rq_res.page_base,
-                               resp->count);
+               svcxdr_encode_opaque_pages(rqstp, xdr, resp->pages,
+                                          rqstp->rq_res.page_base,
+                                          resp->count);
                if (svc_encode_result_payload(rqstp, head->iov_len, resp->count) < 0)
                        return false;
                break;
@@ -961,7 +963,8 @@ nfs3svc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
                        return false;
                if (!svcxdr_encode_cookieverf3(xdr, resp->verf))
                        return false;
-               xdr_write_pages(xdr, dirlist->pages, 0, dirlist->len);
+               svcxdr_encode_opaque_pages(rqstp, xdr, dirlist->pages, 0,
+                                          dirlist->len);
                /* no more entries */
                if (xdr_stream_encode_item_absent(xdr) < 0)
                        return false;
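svcxdr_encode_opaque_pages() replaces the open-coded xdr_write_pages() calls in these encoders so that result payload pages and the XDR padding tail are set up consistently. The declaration implied by the call sites (a SUNRPC server-side helper added in this cycle; the void return is an assumption) is roughly:

/* Presumed declaration (include/linux/sunrpc/svc.h): */
void svcxdr_encode_opaque_pages(struct svc_rqst *rqstp,
				struct xdr_stream *xdr,
				struct page **pages,
				unsigned int base, unsigned int len);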
index 76db2fe..26b1343 100644 (file)
@@ -2541,6 +2541,20 @@ static __be32 *encode_change(__be32 *p, struct kstat *stat, struct inode *inode,
        return p;
 }
 
+static __be32 nfsd4_encode_nfstime4(struct xdr_stream *xdr,
+                                   struct timespec64 *tv)
+{
+       __be32 *p;
+
+       p = xdr_reserve_space(xdr, XDR_UNIT * 3);
+       if (!p)
+               return nfserr_resource;
+
+       p = xdr_encode_hyper(p, (s64)tv->tv_sec);
+       *p = cpu_to_be32(tv->tv_nsec);
+       return nfs_ok;
+}
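The three XDR_UNITs reserved here match the nfstime4 wire format defined in RFC 7531: a 64-bit seconds field followed by a 32-bit nseconds field.

/* nfstime4 (RFC 7531): */
struct nfstime4 {
	int64_t		seconds;
	uint32_t	nseconds;
};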
+
 /*
  * ctime (in NFSv4, time_metadata) is not writeable, and the client
  * doesn't really care what resolution could theoretically be stored by
@@ -2566,12 +2580,16 @@ static __be32 *encode_time_delta(__be32 *p, struct inode *inode)
        return p;
 }
 
-static __be32 *encode_cinfo(__be32 *p, struct nfsd4_change_info *c)
+static __be32
+nfsd4_encode_change_info4(struct xdr_stream *xdr, struct nfsd4_change_info *c)
 {
-       *p++ = cpu_to_be32(c->atomic);
-       p = xdr_encode_hyper(p, c->before_change);
-       p = xdr_encode_hyper(p, c->after_change);
-       return p;
+       if (xdr_stream_encode_bool(xdr, c->atomic) < 0)
+               return nfserr_resource;
+       if (xdr_stream_encode_u64(xdr, c->before_change) < 0)
+               return nfserr_resource;
+       if (xdr_stream_encode_u64(xdr, c->after_change) < 0)
+               return nfserr_resource;
+       return nfs_ok;
 }
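For reference, the change_info4 wire format being encoded (RFC 7531) is a boolean followed by two 64-bit change identifiers, which is also why the open-coded callers below used to reserve 20 bytes (4 + 8 + 8):

/* change_info4 (RFC 7531); changeid4 is a uint64_t */
struct change_info4 {
	bool		atomic;
	changeid4	before;
	changeid4	after;
};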
 
 /* Encode as an array of strings the string given with components
@@ -3348,11 +3366,9 @@ out_acl:
                p = xdr_encode_hyper(p, dummy64);
        }
        if (bmval1 & FATTR4_WORD1_TIME_ACCESS) {
-               p = xdr_reserve_space(xdr, 12);
-               if (!p)
-                       goto out_resource;
-               p = xdr_encode_hyper(p, (s64)stat.atime.tv_sec);
-               *p++ = cpu_to_be32(stat.atime.tv_nsec);
+               status = nfsd4_encode_nfstime4(xdr, &stat.atime);
+               if (status)
+                       goto out;
        }
        if (bmval1 & FATTR4_WORD1_TIME_DELTA) {
                p = xdr_reserve_space(xdr, 12);
@@ -3361,25 +3377,19 @@ out_acl:
                p = encode_time_delta(p, d_inode(dentry));
        }
        if (bmval1 & FATTR4_WORD1_TIME_METADATA) {
-               p = xdr_reserve_space(xdr, 12);
-               if (!p)
-                       goto out_resource;
-               p = xdr_encode_hyper(p, (s64)stat.ctime.tv_sec);
-               *p++ = cpu_to_be32(stat.ctime.tv_nsec);
+               status = nfsd4_encode_nfstime4(xdr, &stat.ctime);
+               if (status)
+                       goto out;
        }
        if (bmval1 & FATTR4_WORD1_TIME_MODIFY) {
-               p = xdr_reserve_space(xdr, 12);
-               if (!p)
-                       goto out_resource;
-               p = xdr_encode_hyper(p, (s64)stat.mtime.tv_sec);
-               *p++ = cpu_to_be32(stat.mtime.tv_nsec);
+               status = nfsd4_encode_nfstime4(xdr, &stat.mtime);
+               if (status)
+                       goto out;
        }
        if (bmval1 & FATTR4_WORD1_TIME_CREATE) {
-               p = xdr_reserve_space(xdr, 12);
-               if (!p)
-                       goto out_resource;
-               p = xdr_encode_hyper(p, (s64)stat.btime.tv_sec);
-               *p++ = cpu_to_be32(stat.btime.tv_nsec);
+               status = nfsd4_encode_nfstime4(xdr, &stat.btime);
+               if (status)
+                       goto out;
        }
        if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) {
                u64 ino = stat.ino;
@@ -3689,6 +3699,30 @@ fail:
 }
 
 static __be32
+nfsd4_encode_verifier4(struct xdr_stream *xdr, const nfs4_verifier *verf)
+{
+       __be32 *p;
+
+       p = xdr_reserve_space(xdr, NFS4_VERIFIER_SIZE);
+       if (!p)
+               return nfserr_resource;
+       memcpy(p, verf->data, sizeof(verf->data));
+       return nfs_ok;
+}
+
+static __be32
+nfsd4_encode_clientid4(struct xdr_stream *xdr, const clientid_t *clientid)
+{
+       __be32 *p;
+
+       p = xdr_reserve_space(xdr, sizeof(__be64));
+       if (!p)
+               return nfserr_resource;
+       memcpy(p, clientid, sizeof(*clientid));
+       return nfs_ok;
+}
+
+static __be32
 nfsd4_encode_stateid(struct xdr_stream *xdr, stateid_t *sid)
 {
        __be32 *p;
@@ -3752,15 +3786,8 @@ nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr,
                    union nfsd4_op_u *u)
 {
        struct nfsd4_commit *commit = &u->commit;
-       struct xdr_stream *xdr = resp->xdr;
-       __be32 *p;
 
-       p = xdr_reserve_space(xdr, NFS4_VERIFIER_SIZE);
-       if (!p)
-               return nfserr_resource;
-       p = xdr_encode_opaque_fixed(p, commit->co_verf.data,
-                                               NFS4_VERIFIER_SIZE);
-       return 0;
+       return nfsd4_encode_verifier4(resp->xdr, &commit->co_verf);
 }
 
 static __be32
@@ -3769,12 +3796,10 @@ nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr,
 {
        struct nfsd4_create *create = &u->create;
        struct xdr_stream *xdr = resp->xdr;
-       __be32 *p;
 
-       p = xdr_reserve_space(xdr, 20);
-       if (!p)
-               return nfserr_resource;
-       encode_cinfo(p, &create->cr_cinfo);
+       nfserr = nfsd4_encode_change_info4(xdr, &create->cr_cinfo);
+       if (nfserr)
+               return nfserr;
        return nfsd4_encode_bitmap(xdr, create->cr_bmval[0],
                        create->cr_bmval[1], create->cr_bmval[2]);
 }
@@ -3892,13 +3917,8 @@ nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr,
 {
        struct nfsd4_link *link = &u->link;
        struct xdr_stream *xdr = resp->xdr;
-       __be32 *p;
 
-       p = xdr_reserve_space(xdr, 20);
-       if (!p)
-               return nfserr_resource;
-       p = encode_cinfo(p, &link->li_cinfo);
-       return 0;
+       return nfsd4_encode_change_info4(xdr, &link->li_cinfo);
 }
 
 
@@ -3913,11 +3933,11 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr,
        nfserr = nfsd4_encode_stateid(xdr, &open->op_stateid);
        if (nfserr)
                return nfserr;
-       p = xdr_reserve_space(xdr, 24);
-       if (!p)
+       nfserr = nfsd4_encode_change_info4(xdr, &open->op_cinfo);
+       if (nfserr)
+               return nfserr;
+       if (xdr_stream_encode_u32(xdr, open->op_rflags) < 0)
                return nfserr_resource;
-       p = encode_cinfo(p, &open->op_cinfo);
-       *p++ = cpu_to_be32(open->op_rflags);
 
        nfserr = nfsd4_encode_bitmap(xdr, open->op_bmval[0], open->op_bmval[1],
                                        open->op_bmval[2]);
@@ -3956,7 +3976,7 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr,
                p = xdr_reserve_space(xdr, 32);
                if (!p)
                        return nfserr_resource;
-               *p++ = cpu_to_be32(0);
+               *p++ = cpu_to_be32(open->op_recall);
 
                /*
                 * TODO: space_limit's in delegations
@@ -4018,6 +4038,11 @@ nfsd4_encode_open_downgrade(struct nfsd4_compoundres *resp, __be32 nfserr,
        return nfsd4_encode_stateid(xdr, &od->od_stateid);
 }
 
+/*
+ * This function assumes it encodes the only READ operation in the
+ * COMPOUND. If there are multiple READs, nfsd4_encode_readv() is
+ * used instead.
+ */
 static __be32 nfsd4_encode_splice_read(
                                struct nfsd4_compoundres *resp,
                                struct nfsd4_read *read,
@@ -4028,8 +4053,12 @@ static __be32 nfsd4_encode_splice_read(
        int status, space_left;
        __be32 nfserr;
 
-       /* Make sure there will be room for padding if needed */
-       if (xdr->end - xdr->p < 1)
+       /*
+        * Make sure there is room at the end of buf->head for
+        * svcxdr_encode_opaque_pages() to create a tail buffer
+        * to XDR-pad the payload.
+        */
+       if (xdr->iov != xdr->buf->head || xdr->end - xdr->p < 1)
                return nfserr_resource;
 
        nfserr = nfsd_splice_read(read->rd_rqstp, read->rd_fhp,
@@ -4038,6 +4067,8 @@ static __be32 nfsd4_encode_splice_read(
        read->rd_length = maxcount;
        if (nfserr)
                goto out_err;
+       svcxdr_encode_opaque_pages(read->rd_rqstp, xdr, buf->pages,
+                                  buf->page_base, maxcount);
        status = svc_encode_result_payload(read->rd_rqstp,
                                           buf->head[0].iov_len, maxcount);
        if (status) {
@@ -4045,31 +4076,19 @@ static __be32 nfsd4_encode_splice_read(
                goto out_err;
        }
 
-       buf->page_len = maxcount;
-       buf->len += maxcount;
-       xdr->page_ptr += (buf->page_base + maxcount + PAGE_SIZE - 1)
-                                                       / PAGE_SIZE;
-
-       /* Use rest of head for padding and remaining ops: */
-       buf->tail[0].iov_base = xdr->p;
-       buf->tail[0].iov_len = 0;
-       xdr->iov = buf->tail;
-       if (maxcount&3) {
-               int pad = 4 - (maxcount&3);
-
-               *(xdr->p++) = 0;
-
-               buf->tail[0].iov_base += maxcount&3;
-               buf->tail[0].iov_len = pad;
-               buf->len += pad;
-       }
-
+       /*
+        * Prepare to encode subsequent operations.
+        *
+        * xdr_truncate_encode() is not safe to use after a successful
+        * splice read has been done, so the following stream
+        * manipulations are open-coded.
+        */
        space_left = min_t(int, (void *)xdr->end - (void *)xdr->p,
                                buf->buflen - buf->len);
        buf->buflen = buf->len + space_left;
        xdr->end = (__be32 *)((void *)xdr->end + space_left);
 
-       return 0;
+       return nfs_ok;
 
 out_err:
        /*
@@ -4090,13 +4109,13 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
        __be32 zero = xdr_zero;
        __be32 nfserr;
 
-       read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, maxcount);
-       if (read->rd_vlen < 0)
+       if (xdr_reserve_space_vec(xdr, maxcount) < 0)
                return nfserr_resource;
 
-       nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
-                           resp->rqstp->rq_vec, read->rd_vlen, &maxcount,
-                           &read->rd_eof);
+       nfserr = nfsd_iter_read(resp->rqstp, read->rd_fhp, file,
+                               read->rd_offset, &maxcount,
+                               xdr->buf->page_len & ~PAGE_MASK,
+                               &read->rd_eof);
        read->rd_length = maxcount;
        if (nfserr)
                return nfserr;
@@ -4213,15 +4232,9 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr,
        int starting_len = xdr->buf->len;
        __be32 *p;
 
-       p = xdr_reserve_space(xdr, NFS4_VERIFIER_SIZE);
-       if (!p)
-               return nfserr_resource;
-
-       /* XXX: Following NFSv3, we ignore the READDIR verifier for now. */
-       *p++ = cpu_to_be32(0);
-       *p++ = cpu_to_be32(0);
-       xdr->buf->head[0].iov_len = (char *)xdr->p -
-                                   (char *)xdr->buf->head[0].iov_base;
+       nfserr = nfsd4_encode_verifier4(xdr, &readdir->rd_verf);
+       if (nfserr != nfs_ok)
+               return nfserr;
 
        /*
         * Number of bytes left for directory entries allowing for the
@@ -4299,13 +4312,8 @@ nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr,
 {
        struct nfsd4_remove *remove = &u->remove;
        struct xdr_stream *xdr = resp->xdr;
-       __be32 *p;
 
-       p = xdr_reserve_space(xdr, 20);
-       if (!p)
-               return nfserr_resource;
-       p = encode_cinfo(p, &remove->rm_cinfo);
-       return 0;
+       return nfsd4_encode_change_info4(xdr, &remove->rm_cinfo);
 }
 
 static __be32
@@ -4314,14 +4322,11 @@ nfsd4_encode_rename(struct nfsd4_compoundres *resp, __be32 nfserr,
 {
        struct nfsd4_rename *rename = &u->rename;
        struct xdr_stream *xdr = resp->xdr;
-       __be32 *p;
 
-       p = xdr_reserve_space(xdr, 40);
-       if (!p)
-               return nfserr_resource;
-       p = encode_cinfo(p, &rename->rn_sinfo);
-       p = encode_cinfo(p, &rename->rn_tinfo);
-       return 0;
+       nfserr = nfsd4_encode_change_info4(xdr, &rename->rn_sinfo);
+       if (nfserr)
+               return nfserr;
+       return nfsd4_encode_change_info4(xdr, &rename->rn_tinfo);
 }
 
 static __be32
@@ -4448,23 +4453,25 @@ nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr,
 {
        struct nfsd4_setclientid *scd = &u->setclientid;
        struct xdr_stream *xdr = resp->xdr;
-       __be32 *p;
 
        if (!nfserr) {
-               p = xdr_reserve_space(xdr, 8 + NFS4_VERIFIER_SIZE);
-               if (!p)
-                       return nfserr_resource;
-               p = xdr_encode_opaque_fixed(p, &scd->se_clientid, 8);
-               p = xdr_encode_opaque_fixed(p, &scd->se_confirm,
-                                               NFS4_VERIFIER_SIZE);
-       }
-       else if (nfserr == nfserr_clid_inuse) {
-               p = xdr_reserve_space(xdr, 8);
-               if (!p)
-                       return nfserr_resource;
-               *p++ = cpu_to_be32(0);
-               *p++ = cpu_to_be32(0);
+               nfserr = nfsd4_encode_clientid4(xdr, &scd->se_clientid);
+               if (nfserr != nfs_ok)
+                       goto out;
+               nfserr = nfsd4_encode_verifier4(xdr, &scd->se_confirm);
+       } else if (nfserr == nfserr_clid_inuse) {
+               /* empty network id */
+               if (xdr_stream_encode_u32(xdr, 0) < 0) {
+                       nfserr = nfserr_resource;
+                       goto out;
+               }
+               /* empty universal address */
+               if (xdr_stream_encode_u32(xdr, 0) < 0) {
+                       nfserr = nfserr_resource;
+                       goto out;
+               }
        }
+out:
        return nfserr;
 }
 
@@ -4473,17 +4480,12 @@ nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr,
                   union nfsd4_op_u *u)
 {
        struct nfsd4_write *write = &u->write;
-       struct xdr_stream *xdr = resp->xdr;
-       __be32 *p;
 
-       p = xdr_reserve_space(xdr, 16);
-       if (!p)
+       if (xdr_stream_encode_u32(resp->xdr, write->wr_bytes_written) < 0)
                return nfserr_resource;
-       *p++ = cpu_to_be32(write->wr_bytes_written);
-       *p++ = cpu_to_be32(write->wr_how_written);
-       p = xdr_encode_opaque_fixed(p, write->wr_verifier.data,
-                                               NFS4_VERIFIER_SIZE);
-       return 0;
+       if (xdr_stream_encode_u32(resp->xdr, write->wr_how_written) < 0)
+               return nfserr_resource;
+       return nfsd4_encode_verifier4(resp->xdr, &write->wr_verifier);
 }
 
 static __be32
@@ -4505,20 +4507,15 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
        server_scope = nn->nfsd_name;
        server_scope_sz = strlen(nn->nfsd_name);
 
-       p = xdr_reserve_space(xdr,
-               8 /* eir_clientid */ +
-               4 /* eir_sequenceid */ +
-               4 /* eir_flags */ +
-               4 /* spr_how */);
-       if (!p)
+       if (nfsd4_encode_clientid4(xdr, &exid->clientid) != nfs_ok)
+               return nfserr_resource;
+       if (xdr_stream_encode_u32(xdr, exid->seqid) < 0)
+               return nfserr_resource;
+       if (xdr_stream_encode_u32(xdr, exid->flags) < 0)
                return nfserr_resource;
 
-       p = xdr_encode_opaque_fixed(p, &exid->clientid, 8);
-       *p++ = cpu_to_be32(exid->seqid);
-       *p++ = cpu_to_be32(exid->flags);
-
-       *p++ = cpu_to_be32(exid->spa_how);
-
+       if (xdr_stream_encode_u32(xdr, exid->spa_how) < 0)
+               return nfserr_resource;
        switch (exid->spa_how) {
        case SP4_NONE:
                break;
@@ -5099,15 +5096,8 @@ nfsd4_encode_setxattr(struct nfsd4_compoundres *resp, __be32 nfserr,
 {
        struct nfsd4_setxattr *setxattr = &u->setxattr;
        struct xdr_stream *xdr = resp->xdr;
-       __be32 *p;
 
-       p = xdr_reserve_space(xdr, 20);
-       if (!p)
-               return nfserr_resource;
-
-       encode_cinfo(p, &setxattr->setxa_cinfo);
-
-       return 0;
+       return nfsd4_encode_change_info4(xdr, &setxattr->setxa_cinfo);
 }
 
 /*
@@ -5253,14 +5243,8 @@ nfsd4_encode_removexattr(struct nfsd4_compoundres *resp, __be32 nfserr,
 {
        struct nfsd4_removexattr *removexattr = &u->removexattr;
        struct xdr_stream *xdr = resp->xdr;
-       __be32 *p;
 
-       p = xdr_reserve_space(xdr, 20);
-       if (!p)
-               return nfserr_resource;
-
-       p = encode_cinfo(p, &removexattr->rmxa_cinfo);
-       return 0;
+       return nfsd4_encode_change_info4(xdr, &removexattr->rmxa_cinfo);
 }
 
 typedef __be32(*nfsd4_enc)(struct nfsd4_compoundres *, __be32, union nfsd4_op_u *u);
@@ -5460,6 +5444,12 @@ status:
 release:
        if (opdesc && opdesc->op_release)
                opdesc->op_release(&op->u);
+
+       /*
+        * Account for pages consumed while encoding this operation.
+        * The xdr_stream primitives don't manage rq_next_page.
+        */
+       rqstp->rq_next_page = xdr->page_ptr + 1;
 }
 
 /* 
@@ -5528,9 +5518,6 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
        p = resp->statusp;
 
        *p++ = resp->cstate.status;
-
-       rqstp->rq_next_page = xdr->page_ptr + 1;
-
        *p++ = htonl(resp->taglen);
        memcpy(p, resp->tag, resp->taglen);
        p += XDR_QUADLEN(resp->taglen);
index 041faa1..a8eda1c 100644 (file)
@@ -148,12 +148,23 @@ void nfsd_drc_slab_free(void)
        kmem_cache_destroy(drc_slab);
 }
 
-static int nfsd_reply_cache_stats_init(struct nfsd_net *nn)
+/**
+ * nfsd_net_reply_cache_init - per net namespace reply cache set-up
+ * @nn: nfsd_net being initialized
+ *
+ * Returns zero on success; otherwise a negative errno is returned.
+ */
+int nfsd_net_reply_cache_init(struct nfsd_net *nn)
 {
        return nfsd_percpu_counters_init(nn->counter, NFSD_NET_COUNTERS_NUM);
 }
 
-static void nfsd_reply_cache_stats_destroy(struct nfsd_net *nn)
+/**
+ * nfsd_net_reply_cache_destroy - per net namespace reply cache tear-down
+ * @nn: nfsd_net being freed
+ *
+ */
+void nfsd_net_reply_cache_destroy(struct nfsd_net *nn)
 {
        nfsd_percpu_counters_destroy(nn->counter, NFSD_NET_COUNTERS_NUM);
 }
@@ -169,17 +180,13 @@ int nfsd_reply_cache_init(struct nfsd_net *nn)
        hashsize = nfsd_hashsize(nn->max_drc_entries);
        nn->maskbits = ilog2(hashsize);
 
-       status = nfsd_reply_cache_stats_init(nn);
-       if (status)
-               goto out_nomem;
-
        nn->nfsd_reply_cache_shrinker.scan_objects = nfsd_reply_cache_scan;
        nn->nfsd_reply_cache_shrinker.count_objects = nfsd_reply_cache_count;
        nn->nfsd_reply_cache_shrinker.seeks = 1;
        status = register_shrinker(&nn->nfsd_reply_cache_shrinker,
                                   "nfsd-reply:%s", nn->nfsd_name);
        if (status)
-               goto out_stats_destroy;
+               return status;
 
        nn->drc_hashtbl = kvzalloc(array_size(hashsize,
                                sizeof(*nn->drc_hashtbl)), GFP_KERNEL);
@@ -195,9 +202,6 @@ int nfsd_reply_cache_init(struct nfsd_net *nn)
        return 0;
 out_shrinker:
        unregister_shrinker(&nn->nfsd_reply_cache_shrinker);
-out_stats_destroy:
-       nfsd_reply_cache_stats_destroy(nn);
-out_nomem:
        printk(KERN_ERR "nfsd: failed to allocate reply cache\n");
        return -ENOMEM;
 }
@@ -217,7 +221,6 @@ void nfsd_reply_cache_shutdown(struct nfsd_net *nn)
                                                                        rp, nn);
                }
        }
-       nfsd_reply_cache_stats_destroy(nn);
 
        kvfree(nn->drc_hashtbl);
        nn->drc_hashtbl = NULL;
index 7b8f17e..1b8b1aa 100644 (file)
@@ -25,6 +25,7 @@
 #include "netns.h"
 #include "pnfs.h"
 #include "filecache.h"
+#include "trace.h"
 
 /*
  *     We have a single directory with several nodes in it.
@@ -109,12 +110,12 @@ static ssize_t nfsctl_transaction_write(struct file *file, const char __user *bu
        if (IS_ERR(data))
                return PTR_ERR(data);
 
-       rv =  write_op[ino](file, data, size);
-       if (rv >= 0) {
-               simple_transaction_set(file, rv);
-               rv = size;
-       }
-       return rv;
+       rv = write_op[ino](file, data, size);
+       if (rv < 0)
+               return rv;
+
+       simple_transaction_set(file, rv);
+       return size;
 }
 
 static ssize_t nfsctl_transaction_read(struct file *file, char __user *buf, size_t size, loff_t *pos)
@@ -153,18 +154,6 @@ static int exports_net_open(struct net *net, struct file *file)
        return 0;
 }
 
-static int exports_proc_open(struct inode *inode, struct file *file)
-{
-       return exports_net_open(current->nsproxy->net_ns, file);
-}
-
-static const struct proc_ops exports_proc_ops = {
-       .proc_open      = exports_proc_open,
-       .proc_read      = seq_read,
-       .proc_lseek     = seq_lseek,
-       .proc_release   = seq_release,
-};
-
 static int exports_nfsd_open(struct inode *inode, struct file *file)
 {
        return exports_net_open(inode->i_sb->s_fs_info, file);
@@ -242,6 +231,7 @@ static ssize_t write_unlock_ip(struct file *file, char *buf, size_t size)
        if (rpc_pton(net, fo_path, size, sap, salen) == 0)
                return -EINVAL;
 
+       trace_nfsd_ctl_unlock_ip(net, buf);
        return nlmsvc_unlock_all_by_ip(sap);
 }
 
@@ -275,7 +265,7 @@ static ssize_t write_unlock_fs(struct file *file, char *buf, size_t size)
        fo_path = buf;
        if (qword_get(&buf, fo_path, size) < 0)
                return -EINVAL;
-
+       trace_nfsd_ctl_unlock_fs(netns(file), fo_path);
        error = kern_path(fo_path, 0, &path);
        if (error)
                return error;
@@ -336,7 +326,7 @@ static ssize_t write_filehandle(struct file *file, char *buf, size_t size)
        len = qword_get(&mesg, dname, size);
        if (len <= 0)
                return -EINVAL;
-       
+
        path = dname+len+1;
        len = qword_get(&mesg, path, size);
        if (len <= 0)
@@ -350,15 +340,17 @@ static ssize_t write_filehandle(struct file *file, char *buf, size_t size)
                return -EINVAL;
        maxsize = min(maxsize, NFS3_FHSIZE);
 
-       if (qword_get(&mesg, mesg, size)>0)
+       if (qword_get(&mesg, mesg, size) > 0)
                return -EINVAL;
 
+       trace_nfsd_ctl_filehandle(netns(file), dname, path, maxsize);
+
        /* we have all the words, they are in buf.. */
        dom = unix_domain_find(dname);
        if (!dom)
                return -ENOMEM;
 
-       len = exp_rootfh(netns(file), dom, path, &fh,  maxsize);
+       len = exp_rootfh(netns(file), dom, path, &fh, maxsize);
        auth_domain_put(dom);
        if (len)
                return len;
@@ -411,6 +403,7 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size)
                        return rv;
                if (newthreads < 0)
                        return -EINVAL;
+               trace_nfsd_ctl_threads(net, newthreads);
                rv = nfsd_svc(newthreads, net, file->f_cred);
                if (rv < 0)
                        return rv;
@@ -430,8 +423,8 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size)
  * OR
  *
  * Input:
- *                     buf:            C string containing whitespace-
- *                                     separated unsigned integer values
+ *                     buf:            C string containing whitespace-
+ *                                     separated unsigned integer values
  *                                     representing the number of NFSD
  *                                     threads to start in each pool
  *                     size:           non-zero length of C string in @buf
@@ -483,6 +476,7 @@ static ssize_t write_pool_threads(struct file *file, char *buf, size_t size)
                        rv = -EINVAL;
                        if (nthreads[i] < 0)
                                goto out_free;
+                       trace_nfsd_ctl_pool_threads(net, i, nthreads[i]);
                }
                rv = nfsd_set_nrthreads(i, nthreads, net);
                if (rv)
@@ -538,7 +532,7 @@ static ssize_t __write_versions(struct file *file, char *buf, size_t size)
        char *sep;
        struct nfsd_net *nn = net_generic(netns(file), nfsd_net_id);
 
-       if (size>0) {
+       if (size > 0) {
                if (nn->nfsd_serv)
                        /* Cannot change versions without updating
                         * nn->nfsd_serv->sv_xdrsize, and reallocing
@@ -548,6 +542,7 @@ static ssize_t __write_versions(struct file *file, char *buf, size_t size)
                if (buf[size-1] != '\n')
                        return -EINVAL;
                buf[size-1] = 0;
+               trace_nfsd_ctl_version(netns(file), buf);
 
                vers = mesg;
                len = qword_get(&mesg, vers, size);
@@ -649,11 +644,11 @@ out:
  * OR
  *
  * Input:
- *                     buf:            C string containing whitespace-
- *                                     separated positive or negative
- *                                     integer values representing NFS
- *                                     protocol versions to enable ("+n")
- *                                     or disable ("-n")
+ *                     buf:            C string containing whitespace-
+ *                                     separated positive or negative
+ *                                     integer values representing NFS
+ *                                     protocol versions to enable ("+n")
+ *                                     or disable ("-n")
  *                     size:           non-zero length of C string in @buf
  * Output:
  *     On success:     status of zero or more protocol versions has
@@ -701,17 +696,13 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred
        err = get_int(&mesg, &fd);
        if (err != 0 || fd < 0)
                return -EINVAL;
-
-       if (svc_alien_sock(net, fd)) {
-               printk(KERN_ERR "%s: socket net is different to NFSd's one\n", __func__);
-               return -EINVAL;
-       }
+       trace_nfsd_ctl_ports_addfd(net, fd);
 
        err = nfsd_create_serv(net);
        if (err != 0)
                return err;
 
-       err = svc_addsock(nn->nfsd_serv, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred);
+       err = svc_addsock(nn->nfsd_serv, net, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred);
 
        if (err >= 0 &&
            !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1))
@@ -722,7 +713,7 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred
 }
 
 /*
- * A transport listener is added by writing it's transport name and
+ * A transport listener is added by writing its transport name and
  * a port number.
  */
 static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cred *cred)
@@ -737,6 +728,7 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr
 
        if (port < 1 || port > USHRT_MAX)
                return -EINVAL;
+       trace_nfsd_ctl_ports_addxprt(net, transport, port);
 
        err = nfsd_create_serv(net);
        if (err != 0)
@@ -849,9 +841,9 @@ int nfsd_max_blksize;
  * OR
  *
  * Input:
- *                     buf:            C string containing an unsigned
- *                                     integer value representing the new
- *                                     NFS blksize
+ *                     buf:            C string containing an unsigned
+ *                                     integer value representing the new
+ *                                     NFS blksize
  *                     size:           non-zero length of C string in @buf
  * Output:
  *     On success:     passed-in buffer filled with '\n'-terminated C string
@@ -870,6 +862,8 @@ static ssize_t write_maxblksize(struct file *file, char *buf, size_t size)
                int rv = get_int(&mesg, &bsize);
                if (rv)
                        return rv;
+               trace_nfsd_ctl_maxblksize(netns(file), bsize);
+
                /* force bsize into allowed range and
                 * required alignment.
                 */
@@ -898,9 +892,9 @@ static ssize_t write_maxblksize(struct file *file, char *buf, size_t size)
  * OR
  *
  * Input:
- *                     buf:            C string containing an unsigned
- *                                     integer value representing the new
- *                                     number of max connections
+ *                     buf:            C string containing an unsigned
+ *                                     integer value representing the new
+ *                                     number of max connections
  *                     size:           non-zero length of C string in @buf
  * Output:
  *     On success:     passed-in buffer filled with '\n'-terminated C string
@@ -920,6 +914,7 @@ static ssize_t write_maxconn(struct file *file, char *buf, size_t size)
 
                if (rv)
                        return rv;
+               trace_nfsd_ctl_maxconn(netns(file), maxconn);
                nn->max_connections = maxconn;
        }
 
@@ -930,6 +925,7 @@ static ssize_t write_maxconn(struct file *file, char *buf, size_t size)
 static ssize_t __nfsd4_write_time(struct file *file, char *buf, size_t size,
                                  time64_t *time, struct nfsd_net *nn)
 {
+       struct dentry *dentry = file_dentry(file);
        char *mesg = buf;
        int rv, i;
 
@@ -939,6 +935,9 @@ static ssize_t __nfsd4_write_time(struct file *file, char *buf, size_t size,
                rv = get_int(&mesg, &i);
                if (rv)
                        return rv;
+               trace_nfsd_ctl_time(netns(file), dentry->d_name.name,
+                                   dentry->d_name.len, i);
+
                /*
                 * Some sanity checking.  We don't have a reason for
                 * these particular numbers, but problems with the
@@ -1031,6 +1030,7 @@ static ssize_t __write_recoverydir(struct file *file, char *buf, size_t size,
                len = qword_get(&mesg, recdir, size);
                if (len <= 0)
                        return -EINVAL;
+               trace_nfsd_ctl_recoverydir(netns(file), recdir);
 
                status = nfs4_reset_recoverydir(recdir);
                if (status)
@@ -1082,7 +1082,7 @@ static ssize_t write_recoverydir(struct file *file, char *buf, size_t size)
  * OR
  *
  * Input:
- *                     buf:            any value
+ *                     buf:            any value
  *                     size:           non-zero length of C string in @buf
  * Output:
  *                     passed-in buffer filled with "Y" or "N" with a newline
@@ -1104,7 +1104,7 @@ static ssize_t write_v4_end_grace(struct file *file, char *buf, size_t size)
                case '1':
                        if (!nn->nfsd_serv)
                                return -EBUSY;
-                       nfsd4_end_grace(nn);
+                       trace_nfsd_end_grace(netns(file));
+                       nfsd4_end_grace(nn);
                        break;
                default:
                        return -EINVAL;
@@ -1209,8 +1209,8 @@ static int __nfsd_symlink(struct inode *dir, struct dentry *dentry,
  * @content is assumed to be a NUL-terminated string that lives
  * longer than the symlink itself.
  */
-static void nfsd_symlink(struct dentry *parent, const char *name,
-                        const char *content)
+static void _nfsd_symlink(struct dentry *parent, const char *name,
+                         const char *content)
 {
        struct inode *dir = parent->d_inode;
        struct dentry *dentry;
@@ -1227,8 +1227,8 @@ out:
        inode_unlock(dir);
 }
 #else
-static inline void nfsd_symlink(struct dentry *parent, const char *name,
-                               const char *content)
+static inline void _nfsd_symlink(struct dentry *parent, const char *name,
+                                const char *content)
 {
 }
 
@@ -1406,8 +1406,8 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc)
        ret = simple_fill_super(sb, 0x6e667364, nfsd_files);
        if (ret)
                return ret;
-       nfsd_symlink(sb->s_root, "supported_krb5_enctypes",
-                    "/proc/net/rpc/gss_krb5_enctypes");
+       _nfsd_symlink(sb->s_root, "supported_krb5_enctypes",
+                     "/proc/net/rpc/gss_krb5_enctypes");
        dentry = nfsd_mkdir(sb->s_root, NULL, "clients");
        if (IS_ERR(dentry))
                return PTR_ERR(dentry);
@@ -1458,6 +1458,19 @@ static struct file_system_type nfsd_fs_type = {
 MODULE_ALIAS_FS("nfsd");
 
 #ifdef CONFIG_PROC_FS
+
+static int exports_proc_open(struct inode *inode, struct file *file)
+{
+       return exports_net_open(current->nsproxy->net_ns, file);
+}
+
+static const struct proc_ops exports_proc_ops = {
+       .proc_open      = exports_proc_open,
+       .proc_read      = seq_read,
+       .proc_lseek     = seq_lseek,
+       .proc_release   = seq_release,
+};
+
 static int create_proc_exports_entry(void)
 {
        struct proc_dir_entry *entry;
@@ -1481,7 +1494,17 @@ static int create_proc_exports_entry(void)
 
 unsigned int nfsd_net_id;
 
-static __net_init int nfsd_init_net(struct net *net)
+/**
+ * nfsd_net_init - Prepare the nfsd_net portion of a new net namespace
+ * @net: a freshly-created network namespace
+ *
+ * This information stays around as long as the network namespace is
+ * alive whether or not there is an NFSD instance running in the
+ * namespace.
+ *
+ * Returns zero on success, or a negative errno otherwise.
+ */
+static __net_init int nfsd_net_init(struct net *net)
 {
        int retval;
        struct nfsd_net *nn = net_generic(net, nfsd_net_id);
@@ -1492,6 +1515,9 @@ static __net_init int nfsd_init_net(struct net *net)
        retval = nfsd_idmap_init(net);
        if (retval)
                goto out_idmap_error;
+       retval = nfsd_net_reply_cache_init(nn);
+       if (retval)
+               goto out_repcache_error;
        nn->nfsd_versions = NULL;
        nn->nfsd4_minorversions = NULL;
        nfsd4_init_leases_net(nn);
@@ -1500,22 +1526,32 @@ static __net_init int nfsd_init_net(struct net *net)
 
        return 0;
 
+out_repcache_error:
+       nfsd_idmap_shutdown(net);
 out_idmap_error:
        nfsd_export_shutdown(net);
 out_export_error:
        return retval;
 }
 
-static __net_exit void nfsd_exit_net(struct net *net)
+/**
+ * nfsd_net_exit - Release the nfsd_net portion of a net namespace
+ * @net: a network namespace that is about to be destroyed
+ *
+ */
+static __net_exit void nfsd_net_exit(struct net *net)
 {
+       struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+
+       nfsd_net_reply_cache_destroy(nn);
        nfsd_idmap_shutdown(net);
        nfsd_export_shutdown(net);
-       nfsd_netns_free_versions(net_generic(net, nfsd_net_id));
+       nfsd_netns_free_versions(nn);
 }
 
 static struct pernet_operations nfsd_net_ops = {
-       .init = nfsd_init_net,
-       .exit = nfsd_exit_net,
+       .init = nfsd_net_init,
+       .exit = nfsd_net_exit,
        .id   = &nfsd_net_id,
        .size = sizeof(struct nfsd_net),
 };
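
The rename above pairs nfsd_net_init()/nfsd_net_exit() with kernel-doc and adds reply-cache setup with goto-based unwinding on failure. For orientation, here is a minimal sketch of the same pernet_operations pattern; the example_* names are hypothetical and not part of nfsd:

#include <linux/bug.h>
#include <linux/list.h>
#include <net/net_namespace.h>
#include <net/netns/generic.h>

/* Hypothetical per-netns state, illustration only. */
struct example_net {
	struct list_head items;
};

static unsigned int example_net_id;

static __net_init int example_net_init(struct net *net)
{
	struct example_net *en = net_generic(net, example_net_id);

	/* Set up per-namespace state; on failure, undo in reverse order. */
	INIT_LIST_HEAD(&en->items);
	return 0;
}

static __net_exit void example_net_exit(struct net *net)
{
	struct example_net *en = net_generic(net, example_net_id);

	/* Tear down in the reverse order of example_net_init(). */
	WARN_ON_ONCE(!list_empty(&en->items));
}

static struct pernet_operations example_net_ops = {
	.init = example_net_init,
	.exit = example_net_exit,
	.id   = &example_net_id,
	.size = sizeof(struct example_net),
};

A subsystem registers this once with register_pernet_subsys(&example_net_ops); the core then calls .init for every network namespace that is created and .exit for every one that is destroyed.
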
index ccd8485..e8e13ae 100644 (file)
@@ -623,16 +623,9 @@ void fh_fill_pre_attrs(struct svc_fh *fhp)
 
        inode = d_inode(fhp->fh_dentry);
        err = fh_getattr(fhp, &stat);
-       if (err) {
-               /* Grab the times from inode anyway */
-               stat.mtime = inode->i_mtime;
-               stat.ctime = inode->i_ctime;
-               stat.size  = inode->i_size;
-               if (v4 && IS_I_VERSION(inode)) {
-                       stat.change_cookie = inode_query_iversion(inode);
-                       stat.result_mask |= STATX_CHANGE_COOKIE;
-               }
-       }
+       if (err)
+               return;
+
        if (v4)
                fhp->fh_pre_change = nfsd4_change_attribute(&stat, inode);
 
@@ -660,15 +653,10 @@ void fh_fill_post_attrs(struct svc_fh *fhp)
                printk("nfsd: inode locked twice during operation.\n");
 
        err = fh_getattr(fhp, &fhp->fh_post_attr);
-       if (err) {
-               fhp->fh_post_saved = false;
-               fhp->fh_post_attr.ctime = inode->i_ctime;
-               if (v4 && IS_I_VERSION(inode)) {
-                       fhp->fh_post_attr.change_cookie = inode_query_iversion(inode);
-                       fhp->fh_post_attr.result_mask |= STATX_CHANGE_COOKIE;
-               }
-       } else
-               fhp->fh_post_saved = true;
+       if (err)
+               return;
+
+       fhp->fh_post_saved = true;
        if (v4)
                fhp->fh_post_change =
                        nfsd4_change_attribute(&fhp->fh_post_attr, inode);
index c371955..a731592 100644 (file)
@@ -176,9 +176,7 @@ nfsd_proc_read(struct svc_rqst *rqstp)
 {
        struct nfsd_readargs *argp = rqstp->rq_argp;
        struct nfsd_readres *resp = rqstp->rq_resp;
-       unsigned int len;
        u32 eof;
-       int v;
 
        dprintk("nfsd: READ    %s %d bytes at %d\n",
                SVCFH_fmt(&argp->fh),
@@ -187,17 +185,7 @@ nfsd_proc_read(struct svc_rqst *rqstp)
        argp->count = min_t(u32, argp->count, NFSSVC_MAXBLKSIZE_V2);
        argp->count = min_t(u32, argp->count, rqstp->rq_res.buflen);
 
-       v = 0;
-       len = argp->count;
        resp->pages = rqstp->rq_next_page;
-       while (len > 0) {
-               struct page *page = *(rqstp->rq_next_page++);
-
-               rqstp->rq_vec[v].iov_base = page_address(page);
-               rqstp->rq_vec[v].iov_len = min_t(unsigned int, len, PAGE_SIZE);
-               len -= rqstp->rq_vec[v].iov_len;
-               v++;
-       }
 
        /* Obtain buffer pointer for payload. 19 is 1 word for
         * status, 17 words for fattr, and 1 word for the byte count.
@@ -207,7 +195,7 @@ nfsd_proc_read(struct svc_rqst *rqstp)
        resp->count = argp->count;
        fh_copy(&resp->fh, &argp->fh);
        resp->status = nfsd_read(rqstp, &resp->fh, argp->offset,
-                                rqstp->rq_vec, v, &resp->count, &eof);
+                                &resp->count, &eof);
        if (resp->status == nfs_ok)
                resp->status = fh_getattr(&resp->fh, &resp->stat);
        else if (resp->status == nfserr_jukebox)
index 9c7b1ef..2154fa6 100644 (file)
@@ -402,6 +402,11 @@ void nfsd_reset_write_verifier(struct nfsd_net *nn)
        write_sequnlock(&nn->writeverf_lock);
 }
 
+/*
+ * Crank up a set of per-namespace resources for a new NFSD instance,
+ * including lockd, a duplicate reply cache, an open file cache
+ * instance, and a cache of NFSv4 state objects.
+ */
 static int nfsd_startup_net(struct net *net, const struct cred *cred)
 {
        struct nfsd_net *nn = net_generic(net, nfsd_net_id);
index caf6355..5777f40 100644 (file)
@@ -468,7 +468,8 @@ nfssvc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
        case nfs_ok:
                if (xdr_stream_encode_u32(xdr, resp->len) < 0)
                        return false;
-               xdr_write_pages(xdr, &resp->page, 0, resp->len);
+               svcxdr_encode_opaque_pages(rqstp, xdr, &resp->page, 0,
+                                          resp->len);
                if (svc_encode_result_payload(rqstp, head->iov_len, resp->len) < 0)
                        return false;
                break;
@@ -491,8 +492,9 @@ nfssvc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
                        return false;
                if (xdr_stream_encode_u32(xdr, resp->count) < 0)
                        return false;
-               xdr_write_pages(xdr, resp->pages, rqstp->rq_res.page_base,
-                               resp->count);
+               svcxdr_encode_opaque_pages(rqstp, xdr, resp->pages,
+                                          rqstp->rq_res.page_base,
+                                          resp->count);
                if (svc_encode_result_payload(rqstp, head->iov_len, resp->count) < 0)
                        return false;
                break;
@@ -511,7 +513,8 @@ nfssvc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
                return false;
        switch (resp->status) {
        case nfs_ok:
-               xdr_write_pages(xdr, dirlist->pages, 0, dirlist->len);
+               svcxdr_encode_opaque_pages(rqstp, xdr, dirlist->pages, 0,
+                                          dirlist->len);
                /* no more entries */
                if (xdr_stream_encode_item_absent(xdr) < 0)
                        return false;
index 4183819..2af7498 100644 (file)
@@ -1365,19 +1365,19 @@ TRACE_EVENT(nfsd_cb_setup,
                __field(u32, cl_id)
                __field(unsigned long, authflavor)
                __sockaddr(addr, clp->cl_cb_conn.cb_addrlen)
-               __array(unsigned char, netid, 8)
+               __string(netid, netid)
        ),
        TP_fast_assign(
                __entry->cl_boot = clp->cl_clientid.cl_boot;
                __entry->cl_id = clp->cl_clientid.cl_id;
-               strlcpy(__entry->netid, netid, sizeof(__entry->netid));
+               __assign_str(netid, netid);
                __entry->authflavor = authflavor;
                __assign_sockaddr(addr, &clp->cl_cb_conn.cb_addr,
                                  clp->cl_cb_conn.cb_addrlen)
        ),
        TP_printk("addr=%pISpc client %08x:%08x proto=%s flavor=%s",
                __get_sockaddr(addr), __entry->cl_boot, __entry->cl_id,
-               __entry->netid, show_nfsd_authflavor(__entry->authflavor))
+               __get_str(netid), show_nfsd_authflavor(__entry->authflavor))
 );
 
 TRACE_EVENT(nfsd_cb_setup_err,
@@ -1581,6 +1581,265 @@ TRACE_EVENT(nfsd_cb_recall_any_done,
        )
 );
 
+TRACE_EVENT(nfsd_ctl_unlock_ip,
+       TP_PROTO(
+               const struct net *net,
+               const char *address
+       ),
+       TP_ARGS(net, address),
+       TP_STRUCT__entry(
+               __field(unsigned int, netns_ino)
+               __string(address, address)
+       ),
+       TP_fast_assign(
+               __entry->netns_ino = net->ns.inum;
+               __assign_str(address, address);
+       ),
+       TP_printk("address=%s",
+               __get_str(address)
+       )
+);
+
+TRACE_EVENT(nfsd_ctl_unlock_fs,
+       TP_PROTO(
+               const struct net *net,
+               const char *path
+       ),
+       TP_ARGS(net, path),
+       TP_STRUCT__entry(
+               __field(unsigned int, netns_ino)
+               __string(path, path)
+       ),
+       TP_fast_assign(
+               __entry->netns_ino = net->ns.inum;
+               __assign_str(path, path);
+       ),
+       TP_printk("path=%s",
+               __get_str(path)
+       )
+);
+
+TRACE_EVENT(nfsd_ctl_filehandle,
+       TP_PROTO(
+               const struct net *net,
+               const char *domain,
+               const char *path,
+               int maxsize
+       ),
+       TP_ARGS(net, domain, path, maxsize),
+       TP_STRUCT__entry(
+               __field(unsigned int, netns_ino)
+               __field(int, maxsize)
+               __string(domain, domain)
+               __string(path, path)
+       ),
+       TP_fast_assign(
+               __entry->netns_ino = net->ns.inum;
+               __entry->maxsize = maxsize;
+               __assign_str(domain, domain);
+               __assign_str(path, path);
+       ),
+       TP_printk("domain=%s path=%s maxsize=%d",
+               __get_str(domain), __get_str(path), __entry->maxsize
+       )
+);
+
+TRACE_EVENT(nfsd_ctl_threads,
+       TP_PROTO(
+               const struct net *net,
+               int newthreads
+       ),
+       TP_ARGS(net, newthreads),
+       TP_STRUCT__entry(
+               __field(unsigned int, netns_ino)
+               __field(int, newthreads)
+       ),
+       TP_fast_assign(
+               __entry->netns_ino = net->ns.inum;
+               __entry->newthreads = newthreads;
+       ),
+       TP_printk("newthreads=%d",
+               __entry->newthreads
+       )
+);
+
+TRACE_EVENT(nfsd_ctl_pool_threads,
+       TP_PROTO(
+               const struct net *net,
+               int pool,
+               int nrthreads
+       ),
+       TP_ARGS(net, pool, nrthreads),
+       TP_STRUCT__entry(
+               __field(unsigned int, netns_ino)
+               __field(int, pool)
+               __field(int, nrthreads)
+       ),
+       TP_fast_assign(
+               __entry->netns_ino = net->ns.inum;
+               __entry->pool = pool;
+               __entry->nrthreads = nrthreads;
+       ),
+       TP_printk("pool=%d nrthreads=%d",
+               __entry->pool, __entry->nrthreads
+       )
+);
+
+TRACE_EVENT(nfsd_ctl_version,
+       TP_PROTO(
+               const struct net *net,
+               const char *mesg
+       ),
+       TP_ARGS(net, mesg),
+       TP_STRUCT__entry(
+               __field(unsigned int, netns_ino)
+               __string(mesg, mesg)
+       ),
+       TP_fast_assign(
+               __entry->netns_ino = net->ns.inum;
+               __assign_str(mesg, mesg);
+       ),
+       TP_printk("%s",
+               __get_str(mesg)
+       )
+);
+
+TRACE_EVENT(nfsd_ctl_ports_addfd,
+       TP_PROTO(
+               const struct net *net,
+               int fd
+       ),
+       TP_ARGS(net, fd),
+       TP_STRUCT__entry(
+               __field(unsigned int, netns_ino)
+               __field(int, fd)
+       ),
+       TP_fast_assign(
+               __entry->netns_ino = net->ns.inum;
+               __entry->fd = fd;
+       ),
+       TP_printk("fd=%d",
+               __entry->fd
+       )
+);
+
+TRACE_EVENT(nfsd_ctl_ports_addxprt,
+       TP_PROTO(
+               const struct net *net,
+               const char *transport,
+               int port
+       ),
+       TP_ARGS(net, transport, port),
+       TP_STRUCT__entry(
+               __field(unsigned int, netns_ino)
+               __field(int, port)
+               __string(transport, transport)
+       ),
+       TP_fast_assign(
+               __entry->netns_ino = net->ns.inum;
+               __entry->port = port;
+               __assign_str(transport, transport);
+       ),
+       TP_printk("transport=%s port=%d",
+               __get_str(transport), __entry->port
+       )
+);
+
+TRACE_EVENT(nfsd_ctl_maxblksize,
+       TP_PROTO(
+               const struct net *net,
+               int bsize
+       ),
+       TP_ARGS(net, bsize),
+       TP_STRUCT__entry(
+               __field(unsigned int, netns_ino)
+               __field(int, bsize)
+       ),
+       TP_fast_assign(
+               __entry->netns_ino = net->ns.inum;
+               __entry->bsize = bsize;
+       ),
+       TP_printk("bsize=%d",
+               __entry->bsize
+       )
+);
+
+TRACE_EVENT(nfsd_ctl_maxconn,
+       TP_PROTO(
+               const struct net *net,
+               int maxconn
+       ),
+       TP_ARGS(net, maxconn),
+       TP_STRUCT__entry(
+               __field(unsigned int, netns_ino)
+               __field(int, maxconn)
+       ),
+       TP_fast_assign(
+               __entry->netns_ino = net->ns.inum;
+               __entry->maxconn = maxconn;
+       ),
+       TP_printk("maxconn=%d",
+               __entry->maxconn
+       )
+);
+
+TRACE_EVENT(nfsd_ctl_time,
+       TP_PROTO(
+               const struct net *net,
+               const char *name,
+               size_t namelen,
+               int time
+       ),
+       TP_ARGS(net, name, namelen, time),
+       TP_STRUCT__entry(
+               __field(unsigned int, netns_ino)
+               __field(int, time)
+               __string_len(name, name, namelen)
+       ),
+       TP_fast_assign(
+               __entry->netns_ino = net->ns.inum;
+               __entry->time = time;
+               __assign_str_len(name, name, namelen);
+       ),
+       TP_printk("file=%s time=%d",
+               __get_str(name), __entry->time
+       )
+);
+
+TRACE_EVENT(nfsd_ctl_recoverydir,
+       TP_PROTO(
+               const struct net *net,
+               const char *recdir
+       ),
+       TP_ARGS(net, recdir),
+       TP_STRUCT__entry(
+               __field(unsigned int, netns_ino)
+               __string(recdir, recdir)
+       ),
+       TP_fast_assign(
+               __entry->netns_ino = net->ns.inum;
+               __assign_str(recdir, recdir);
+       ),
+       TP_printk("recdir=%s",
+               __get_str(recdir)
+       )
+);
+
+TRACE_EVENT(nfsd_end_grace,
+       TP_PROTO(
+               const struct net *net
+       ),
+       TP_ARGS(net),
+       TP_STRUCT__entry(
+               __field(unsigned int, netns_ino)
+       ),
+       TP_fast_assign(
+               __entry->netns_ino = net->ns.inum;
+       ),
+       TP_printk("nn=%d", __entry->netns_ino
+       )
+);
+
 #endif /* _NFSD_TRACE_H */
 
 #undef TRACE_INCLUDE_PATH
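
The block of nfsd_ctl_* events above relies on the ftrace string helpers: __string()/__string_len() reserve space in TP_STRUCT__entry, __assign_str()/__assign_str_len() copy the data in TP_fast_assign, and __get_str() retrieves it in TP_printk. A minimal, hypothetical event showing the same wiring (the usual trace-header boilerplate such as the TRACE_SYSTEM definition, #include <linux/tracepoint.h>, and CREATE_TRACE_POINTS is omitted):

/* Hypothetical tracepoint, illustration only (not part of fs/nfsd/trace.h). */
TRACE_EVENT(example_event,
	TP_PROTO(
		const struct net *net,
		const char *label
	),
	TP_ARGS(net, label),
	TP_STRUCT__entry(
		__field(unsigned int, netns_ino)
		__string(label, label)		/* reserves ring-buffer space */
	),
	TP_fast_assign(
		__entry->netns_ino = net->ns.inum;
		__assign_str(label, label);	/* copies the string in */
	),
	TP_printk("label=%s", __get_str(label))
);

Callers then invoke trace_example_event(net, "some label") exactly as the nfsctl hunks earlier invoke trace_nfsd_ctl_time() and friends.
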
index bb9d471..59b7d60 100644 (file)
@@ -388,7 +388,9 @@ nfsd_sanitize_attrs(struct inode *inode, struct iattr *iap)
                                iap->ia_mode &= ~S_ISGID;
                } else {
                        /* set ATTR_KILL_* bits and let VFS handle it */
-                       iap->ia_valid |= (ATTR_KILL_SUID | ATTR_KILL_SGID);
+                       iap->ia_valid |= ATTR_KILL_SUID;
+                       iap->ia_valid |=
+                               setattr_should_drop_sgid(&nop_mnt_idmap, inode);
                }
        }
 }
@@ -536,7 +538,15 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp,
 
        inode_lock(inode);
        for (retries = 1;;) {
-               host_err = __nfsd_setattr(dentry, iap);
+               struct iattr attrs;
+
+               /*
+                * notify_change() can alter its iattr argument, making
+                * @iap unsuitable for submission multiple times. Make a
+                * copy for every loop iteration.
+                */
+               attrs = *iap;
+               host_err = __nfsd_setattr(dentry, &attrs);
                if (host_err != -EAGAIN || !retries--)
                        break;
                if (!nfsd_wait_for_delegreturn(rqstp, inode))
@@ -993,6 +1003,18 @@ static __be32 nfsd_finish_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
        }
 }
 
+/**
+ * nfsd_splice_read - Perform a VFS read using a splice pipe
+ * @rqstp: RPC transaction context
+ * @fhp: file handle of file to be read
+ * @file: opened struct file of file to be read
+ * @offset: starting byte offset
+ * @count: IN: requested number of bytes; OUT: number of bytes read
+ * @eof: OUT: set non-zero if operation reached the end of the file
+ *
+ * Returns nfs_ok on success, otherwise an nfserr stat value is
+ * returned.
+ */
 __be32 nfsd_splice_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
                        struct file *file, loff_t offset, unsigned long *count,
                        u32 *eof)
@@ -1006,22 +1028,50 @@ __be32 nfsd_splice_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
        ssize_t host_err;
 
        trace_nfsd_read_splice(rqstp, fhp, offset, *count);
-       rqstp->rq_next_page = rqstp->rq_respages + 1;
        host_err = splice_direct_to_actor(file, &sd, nfsd_direct_splice_actor);
        return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err);
 }
 
-__be32 nfsd_readv(struct svc_rqst *rqstp, struct svc_fh *fhp,
-                 struct file *file, loff_t offset,
-                 struct kvec *vec, int vlen, unsigned long *count,
-                 u32 *eof)
+/**
+ * nfsd_iter_read - Perform a VFS read using an iterator
+ * @rqstp: RPC transaction context
+ * @fhp: file handle of file to be read
+ * @file: opened struct file of file to be read
+ * @offset: starting byte offset
+ * @count: IN: requested number of bytes; OUT: number of bytes read
+ * @base: offset in first page of read buffer
+ * @eof: OUT: set non-zero if operation reached the end of the file
+ *
+ * Some filesystems or situations cannot use nfsd_splice_read. This
+ * function is the slightly less-performant fallback for those cases.
+ *
+ * Returns nfs_ok on success, otherwise an nfserr stat value is
+ * returned.
+ */
+__be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
+                     struct file *file, loff_t offset, unsigned long *count,
+                     unsigned int base, u32 *eof)
 {
+       unsigned long v, total;
        struct iov_iter iter;
        loff_t ppos = offset;
+       struct page *page;
        ssize_t host_err;
 
+       v = 0;
+       total = *count;
+       while (total) {
+               page = *(rqstp->rq_next_page++);
+               rqstp->rq_vec[v].iov_base = page_address(page) + base;
+               rqstp->rq_vec[v].iov_len = min_t(size_t, total, PAGE_SIZE - base);
+               total -= rqstp->rq_vec[v].iov_len;
+               ++v;
+               base = 0;
+       }
+       WARN_ON_ONCE(v > ARRAY_SIZE(rqstp->rq_vec));
+
        trace_nfsd_read_vector(rqstp, fhp, offset, *count);
-       iov_iter_kvec(&iter, ITER_DEST, vec, vlen, *count);
+       iov_iter_kvec(&iter, ITER_DEST, rqstp->rq_vec, v, *count);
        host_err = vfs_iter_read(file, &iter, &ppos, 0);
        return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err);
 }
@@ -1151,14 +1201,24 @@ out_nfserr:
        return nfserr;
 }
 
-/*
- * Read data from a file. count must contain the requested read count
- * on entry. On return, *count contains the number of bytes actually read.
+/**
+ * nfsd_read - Read data from a file
+ * @rqstp: RPC transaction context
+ * @fhp: file handle of file to be read
+ * @offset: starting byte offset
+ * @count: IN: requested number of bytes; OUT: number of bytes read
+ * @eof: OUT: set non-zero if operation reached the end of the file
+ *
+ * The caller must verify that there is enough space in @rqstp.rq_res
+ * to perform this operation.
+ *
  * N.B. After this call fhp needs an fh_put
+ *
+ * Returns nfs_ok on success, otherwise an nfserr stat value is
+ * returned.
  */
 __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
-       loff_t offset, struct kvec *vec, int vlen, unsigned long *count,
-       u32 *eof)
+                loff_t offset, unsigned long *count, u32 *eof)
 {
        struct nfsd_file        *nf;
        struct file *file;
@@ -1173,12 +1233,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
        if (file->f_op->splice_read && test_bit(RQ_SPLICE_OK, &rqstp->rq_flags))
                err = nfsd_splice_read(rqstp, fhp, file, offset, count, eof);
        else
-               err = nfsd_readv(rqstp, fhp, file, offset, vec, vlen, count, eof);
+               err = nfsd_iter_read(rqstp, fhp, file, offset, count, 0, eof);
 
        nfsd_file_put(nf);
-
        trace_nfsd_read_done(rqstp, fhp, offset, *count);
-
        return err;
 }
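
nfsd_iter_read() above absorbs the rq_vec setup that nfsd_proc_read() used to do and then hands the kvec array to the iterator API. Stripped of the page bookkeeping, the final step reduces to the pattern below; example_kvec_read() is a hypothetical helper, not nfsd code:

#include <linux/fs.h>
#include <linux/uio.h>

/* Hypothetical helper: read @len bytes from @file at *@pos into @buf. */
static ssize_t example_kvec_read(struct file *file, void *buf, size_t len,
				 loff_t *pos)
{
	struct kvec vec = {
		.iov_base = buf,
		.iov_len  = len,
	};
	struct iov_iter iter;

	iov_iter_kvec(&iter, ITER_DEST, &vec, 1, len);
	return vfs_iter_read(file, &iter, pos, 0);
}
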
 
index 43fb57a..a6890ea 100644 (file)
@@ -110,13 +110,12 @@ __be32            nfsd_splice_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
                                struct file *file, loff_t offset,
                                unsigned long *count,
                                u32 *eof);
-__be32         nfsd_readv(struct svc_rqst *rqstp, struct svc_fh *fhp,
+__be32         nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
                                struct file *file, loff_t offset,
-                               struct kvec *vec, int vlen,
-                               unsigned long *count,
+                               unsigned long *count, unsigned int base,
                                u32 *eof);
-__be32                 nfsd_read(struct svc_rqst *, struct svc_fh *,
-                               loff_t, struct kvec *, int, unsigned long *,
+__be32         nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
+                               loff_t offset, unsigned long *count,
                                u32 *eof);
 __be32                 nfsd_write(struct svc_rqst *, struct svc_fh *, loff_t,
                                struct kvec *, int, unsigned long *,
index e956f88..5710833 100644 (file)
@@ -285,6 +285,14 @@ void nilfs_btnode_abort_change_key(struct address_space *btnc,
        if (nbh == NULL) {      /* blocksize == pagesize */
                xa_erase_irq(&btnc->i_pages, newkey);
                unlock_page(ctxt->bh->b_page);
-       } else
-               brelse(nbh);
+       } else {
+               /*
+                * When canceling a buffer that a prepare operation has
+                * allocated to copy a node block to another location, use
+                * nilfs_btnode_delete() to initialize and release the buffer
+                * so that the buffer flags will not be in an inconsistent
+                * state when it is reallocated.
+                */
+               nilfs_btnode_delete(nbh);
+       }
 }
index a265d39..a9eb348 100644 (file)
@@ -140,7 +140,7 @@ const struct file_operations nilfs_file_operations = {
        .open           = generic_file_open,
        /* .release     = nilfs_release_file, */
        .fsync          = nilfs_sync_file,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
        .splice_write   = iter_file_splice_write,
 };
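
This nilfs2 hunk is one instance of a conversion repeated throughout this section: pagecache-backed filesystems now point .splice_read at filemap_splice_read() rather than generic_file_splice_read(), while filesystems that need pre-checks (ntfs3, ocfs2, orangefs, overlayfs below) wrap it. A sketch of the plain case, using a hypothetical example_file_operations:

#include <linux/fs.h>

/* Hypothetical file_operations for a simple pagecache-backed file. */
static const struct file_operations example_file_operations = {
	.llseek		= generic_file_llseek,
	.read_iter	= generic_file_read_iter,
	.write_iter	= generic_file_write_iter,
	.mmap		= generic_file_mmap,
	.splice_read	= filemap_splice_read,	/* was generic_file_splice_read */
	.splice_write	= iter_file_splice_write,
};
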
 
index 1310d2d..a8ce522 100644 (file)
@@ -917,6 +917,7 @@ void nilfs_evict_inode(struct inode *inode)
        struct nilfs_transaction_info ti;
        struct super_block *sb = inode->i_sb;
        struct nilfs_inode_info *ii = NILFS_I(inode);
+       struct the_nilfs *nilfs;
        int ret;
 
        if (inode->i_nlink || !ii->i_root || unlikely(is_bad_inode(inode))) {
@@ -929,6 +930,23 @@ void nilfs_evict_inode(struct inode *inode)
 
        truncate_inode_pages_final(&inode->i_data);
 
+       nilfs = sb->s_fs_info;
+       if (unlikely(sb_rdonly(sb) || !nilfs->ns_writer)) {
+               /*
+                * If this inode is about to be disposed after the file system
+                * has been degraded to read-only due to file system corruption
+                * or after the writer has been detached, do not make any
+                * changes that cause writes, just clear it.
+                * Do this check after read-locking ns_segctor_sem by
+                * nilfs_transaction_begin() in order to avoid a race with
+                * the writer detach operation.
+                */
+               clear_inode(inode);
+               nilfs_clear_inode(inode);
+               nilfs_transaction_abort(sb);
+               return;
+       }
+
        /* TODO: some of the following operations may fail.  */
        nilfs_truncate_bmap(ii, 0);
        nilfs_mark_inode_dirty(inode);
index 5cf3082..b4e54d0 100644 (file)
@@ -370,7 +370,15 @@ void nilfs_clear_dirty_pages(struct address_space *mapping, bool silent)
                        struct folio *folio = fbatch.folios[i];
 
                        folio_lock(folio);
-                       nilfs_clear_dirty_page(&folio->page, silent);
+
+                       /*
+                        * This folio may have been removed from the address
+                        * space by truncation or invalidation when the lock
+                        * was acquired.  Skip processing in that case.
+                        */
+                       if (likely(folio->mapping == mapping))
+                               nilfs_clear_dirty_page(&folio->page, silent);
+
                        folio_unlock(folio);
                }
                folio_batch_release(&fbatch);
index 1362ccb..6e59dc1 100644 (file)
@@ -101,6 +101,12 @@ int nilfs_segbuf_extend_segsum(struct nilfs_segment_buffer *segbuf)
        if (unlikely(!bh))
                return -ENOMEM;
 
+       lock_buffer(bh);
+       if (!buffer_uptodate(bh)) {
+               memset(bh->b_data, 0, bh->b_size);
+               set_buffer_uptodate(bh);
+       }
+       unlock_buffer(bh);
        nilfs_segbuf_add_segsum_buffer(segbuf, bh);
        return 0;
 }
index ac949fd..c255302 100644 (file)
@@ -981,10 +981,13 @@ static void nilfs_segctor_fill_in_super_root(struct nilfs_sc_info *sci,
        unsigned int isz, srsz;
 
        bh_sr = NILFS_LAST_SEGBUF(&sci->sc_segbufs)->sb_super_root;
+
+       lock_buffer(bh_sr);
        raw_sr = (struct nilfs_super_root *)bh_sr->b_data;
        isz = nilfs->ns_inode_size;
        srsz = NILFS_SR_BYTES(isz);
 
+       raw_sr->sr_sum = 0;  /* Ensure initialization within this update */
        raw_sr->sr_bytes = cpu_to_le16(srsz);
        raw_sr->sr_nongc_ctime
                = cpu_to_le64(nilfs_doing_gc() ?
@@ -998,6 +1001,8 @@ static void nilfs_segctor_fill_in_super_root(struct nilfs_sc_info *sci,
        nilfs_write_inode_common(nilfs->ns_sufile, (void *)raw_sr +
                                 NILFS_SR_SUFILE_OFFSET(isz), 1);
        memset((void *)raw_sr + srsz, 0, nilfs->ns_blocksize - srsz);
+       set_buffer_uptodate(bh_sr);
+       unlock_buffer(bh_sr);
 }
 
 static void nilfs_redirty_inodes(struct list_head *head)
@@ -1780,6 +1785,7 @@ static void nilfs_abort_logs(struct list_head *logs, int err)
        list_for_each_entry(segbuf, logs, sb_list) {
                list_for_each_entry(bh, &segbuf->sb_segsum_buffers,
                                    b_assoc_buffers) {
+                       clear_buffer_uptodate(bh);
                        if (bh->b_page != bd_page) {
                                if (bd_page)
                                        end_page_writeback(bd_page);
@@ -1791,6 +1797,7 @@ static void nilfs_abort_logs(struct list_head *logs, int err)
                                    b_assoc_buffers) {
                        clear_buffer_async_write(bh);
                        if (bh == segbuf->sb_super_root) {
+                               clear_buffer_uptodate(bh);
                                if (bh->b_page != bd_page) {
                                        end_page_writeback(bd_page);
                                        bd_page = bh->b_page;
index dc359b5..2c6078a 100644 (file)
@@ -779,6 +779,15 @@ int nilfs_sufile_resize(struct inode *sufile, __u64 newnsegs)
                        goto out_header;
 
                sui->ncleansegs -= nsegs - newnsegs;
+
+               /*
+                * If the sufile is successfully truncated, immediately adjust
+                * the segment allocation space while locking the semaphore
+                * "mi_sem" so that nilfs_sufile_alloc() never allocates
+                * segments in the truncated space.
+                */
+               sui->allocmax = newnsegs - 1;
+               sui->allocmin = 0;
        }
 
        kaddr = kmap_atomic(header_bh->b_page);
index 77f1e57..0ef8c71 100644 (file)
@@ -372,10 +372,31 @@ static int nilfs_move_2nd_super(struct super_block *sb, loff_t sb2off)
                goto out;
        }
        nsbp = (void *)nsbh->b_data + offset;
-       memset(nsbp, 0, nilfs->ns_blocksize);
 
+       lock_buffer(nsbh);
        if (sb2i >= 0) {
+               /*
+                * The position of the second superblock only changes by 4KiB,
+                * which is larger than the maximum superblock data size
+                * (= 1KiB), so there is no need to use memmove() to allow
+                * overlap between source and destination.
+                */
                memcpy(nsbp, nilfs->ns_sbp[sb2i], nilfs->ns_sbsize);
+
+               /*
+                * Zero fill after copy to avoid overwriting in case of move
+                * within the same block.
+                */
+               memset(nsbh->b_data, 0, offset);
+               memset((void *)nsbp + nilfs->ns_sbsize, 0,
+                      nsbh->b_size - offset - nilfs->ns_sbsize);
+       } else {
+               memset(nsbh->b_data, 0, nsbh->b_size);
+       }
+       set_buffer_uptodate(nsbh);
+       unlock_buffer(nsbh);
+
+       if (sb2i >= 0) {
                brelse(nilfs->ns_sbh[sb2i]);
                nilfs->ns_sbh[sb2i] = nsbh;
                nilfs->ns_sbp[sb2i] = nsbp;
@@ -1278,14 +1299,11 @@ nilfs_mount(struct file_system_type *fs_type, int flags,
 {
        struct nilfs_super_data sd;
        struct super_block *s;
-       fmode_t mode = FMODE_READ | FMODE_EXCL;
        struct dentry *root_dentry;
        int err, s_new = false;
 
-       if (!(flags & SB_RDONLY))
-               mode |= FMODE_WRITE;
-
-       sd.bdev = blkdev_get_by_path(dev_name, mode, fs_type);
+       sd.bdev = blkdev_get_by_path(dev_name, sb_open_mode(flags), fs_type,
+                                    NULL);
        if (IS_ERR(sd.bdev))
                return ERR_CAST(sd.bdev);
 
@@ -1319,7 +1337,6 @@ nilfs_mount(struct file_system_type *fs_type, int flags,
                s_new = true;
 
                /* New superblock instance created */
-               s->s_mode = mode;
                snprintf(s->s_id, sizeof(s->s_id), "%pg", sd.bdev);
                sb_set_blocksize(s, block_size(sd.bdev));
 
@@ -1357,7 +1374,7 @@ nilfs_mount(struct file_system_type *fs_type, int flags,
        }
 
        if (!s_new)
-               blkdev_put(sd.bdev, mode);
+               blkdev_put(sd.bdev, fs_type);
 
        return root_dentry;
 
@@ -1366,7 +1383,7 @@ nilfs_mount(struct file_system_type *fs_type, int flags,
 
  failed:
        if (!s_new)
-               blkdev_put(sd.bdev, mode);
+               blkdev_put(sd.bdev, fs_type);
        return ERR_PTR(err);
 }
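
The nilfs_mount() changes above track this cycle's block-layer API change: FMODE_* open modes become BLK_OPEN_* flags, blkdev_get_by_path() takes an explicit holder plus optional blk_holder_ops, and blkdev_put() is passed the holder rather than a mode. A hedged sketch of the new calling convention, with hypothetical example_* helpers:

#include <linux/blkdev.h>

/* Hypothetical: open a block device read/write with an explicit @holder. */
static struct block_device *example_open_bdev(const char *dev_name,
					      void *holder)
{
	/* Final argument is an optional struct blk_holder_ops. */
	return blkdev_get_by_path(dev_name, BLK_OPEN_READ | BLK_OPEN_WRITE,
				  holder, NULL);
}

static void example_close_bdev(struct block_device *bdev, void *holder)
{
	/* The holder passed at open time must be passed back at put time. */
	blkdev_put(bdev, holder);
}
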
 
index 2894152..0f06679 100644 (file)
@@ -405,6 +405,18 @@ unsigned long nilfs_nrsvsegs(struct the_nilfs *nilfs, unsigned long nsegs)
                                  100));
 }
 
+/**
+ * nilfs_max_segment_count - calculate the maximum number of segments
+ * @nilfs: nilfs object
+ */
+static u64 nilfs_max_segment_count(struct the_nilfs *nilfs)
+{
+       u64 max_count = U64_MAX;
+
+       do_div(max_count, nilfs->ns_blocks_per_segment);
+       return min_t(u64, max_count, ULONG_MAX);
+}
+
 void nilfs_set_nsegments(struct the_nilfs *nilfs, unsigned long nsegs)
 {
        nilfs->ns_nsegments = nsegs;
@@ -414,6 +426,8 @@ void nilfs_set_nsegments(struct the_nilfs *nilfs, unsigned long nsegs)
 static int nilfs_store_disk_layout(struct the_nilfs *nilfs,
                                   struct nilfs_super_block *sbp)
 {
+       u64 nsegments, nblocks;
+
        if (le32_to_cpu(sbp->s_rev_level) < NILFS_MIN_SUPP_REV) {
                nilfs_err(nilfs->ns_sb,
                          "unsupported revision (superblock rev.=%d.%d, current rev.=%d.%d). Please check the version of mkfs.nilfs(2).",
@@ -457,7 +471,34 @@ static int nilfs_store_disk_layout(struct the_nilfs *nilfs,
                return -EINVAL;
        }
 
-       nilfs_set_nsegments(nilfs, le64_to_cpu(sbp->s_nsegments));
+       nsegments = le64_to_cpu(sbp->s_nsegments);
+       if (nsegments > nilfs_max_segment_count(nilfs)) {
+               nilfs_err(nilfs->ns_sb,
+                         "segment count %llu exceeds upper limit (%llu segments)",
+                         (unsigned long long)nsegments,
+                         (unsigned long long)nilfs_max_segment_count(nilfs));
+               return -EINVAL;
+       }
+
+       nblocks = sb_bdev_nr_blocks(nilfs->ns_sb);
+       if (nblocks) {
+               u64 min_block_count = nsegments * nilfs->ns_blocks_per_segment;
+               /*
+                * To avoid failing to mount early device images without a
+                * second superblock, exclude that block count from the
+                * "min_block_count" calculation.
+                */
+
+               if (nblocks < min_block_count) {
+                       nilfs_err(nilfs->ns_sb,
+                                 "total number of segment blocks %llu exceeds device size (%llu blocks)",
+                                 (unsigned long long)min_block_count,
+                                 (unsigned long long)nblocks);
+                       return -EINVAL;
+               }
+       }
+
+       nilfs_set_nsegments(nilfs, nsegments);
        nilfs->ns_crc_seed = le32_to_cpu(sbp->s_crc_seed);
        return 0;
 }
diff --git a/fs/no-block.c b/fs/no-block.c
deleted file mode 100644 (file)
index 481c0f0..0000000
+++ /dev/null
@@ -1,19 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/* no-block.c: implementation of routines required for non-BLOCK configuration
- *
- * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- */
-
-#include <linux/kernel.h>
-#include <linux/fs.h>
-
-static int no_blkdev_open(struct inode * inode, struct file * filp)
-{
-       return -ENODEV;
-}
-
-const struct file_operations def_blk_fops = {
-       .open           = no_blkdev_open,
-       .llseek         = noop_llseek,
-};
index 49cfe2a..993375f 100644 (file)
@@ -65,7 +65,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask,
        struct fsnotify_event *fsn_event;
        struct fsnotify_group *group = inode_mark->group;
        int ret;
-       int len = 0;
+       int len = 0, wd;
        int alloc_len = sizeof(struct inotify_event_info);
        struct mem_cgroup *old_memcg;
 
@@ -81,6 +81,13 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask,
                              fsn_mark);
 
        /*
+        * We can be racing with mark being detached. Don't report event with
+        * invalid wd.
+        */
+       wd = READ_ONCE(i_mark->wd);
+       if (wd == -1)
+               return 0;
+       /*
         * Whoever is interested in the event, pays for the allocation. Do not
         * trigger OOM killer in the target monitoring memcg as it may have
         * security repercussion.
@@ -110,7 +117,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask,
        fsn_event = &event->fse;
        fsnotify_init_event(fsn_event);
        event->mask = mask;
-       event->wd = i_mark->wd;
+       event->wd = wd;
        event->sync_cookie = cookie;
        event->name_len = len;
        if (len)
index a3865bc..f79408f 100644 (file)
@@ -2491,7 +2491,7 @@ conv_err_out:
  * byte offset @ofs inside the attribute with the constant byte @val.
  *
  * This function is effectively like memset() applied to an ntfs attribute.
- * Note thie function actually only operates on the page cache pages belonging
+ * Note this function actually only operates on the page cache pages belonging
  * to the ntfs attribute and it marks them dirty after doing the memset().
  * Thus it relies on the vm dirty page write code paths to cause the modified
  * pages to be written to the mft record/disk.
index f9cb180..761aaa0 100644 (file)
@@ -161,7 +161,7 @@ static int ntfs_decompress(struct page *dest_pages[], int completed_pages[],
         */
        u8 *cb_end = cb_start + cb_size; /* End of cb. */
        u8 *cb = cb_start;      /* Current position in cb. */
-       u8 *cb_sb_start = cb;   /* Beginning of the current sb in the cb. */
+       u8 *cb_sb_start;        /* Beginning of the current sb in the cb. */
        u8 *cb_sb_end;          /* End of current sb / beginning of next sb. */
 
        /* Variables for uncompressed data / destination. */
index c481b14..e5e0ed5 100644 (file)
@@ -1992,7 +1992,7 @@ const struct file_operations ntfs_file_ops = {
 #endif /* NTFS_RW */
        .mmap           = generic_file_mmap,
        .open           = ntfs_file_open,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
 };
 
 const struct inode_operations ntfs_file_inode_ops = {
index 4803089..0155f10 100644 (file)
@@ -1955,36 +1955,38 @@ undo_alloc:
                                "attribute.%s", es);
                NVolSetErrors(vol);
        }
-       a = ctx->attr;
+
        if (ntfs_rl_truncate_nolock(vol, &mft_ni->runlist, old_last_vcn)) {
                ntfs_error(vol->sb, "Failed to truncate mft data attribute "
                                "runlist.%s", es);
                NVolSetErrors(vol);
        }
-       if (mp_rebuilt && !IS_ERR(ctx->mrec)) {
-               if (ntfs_mapping_pairs_build(vol, (u8*)a + le16_to_cpu(
+       if (ctx) {
+               a = ctx->attr;
+               if (mp_rebuilt && !IS_ERR(ctx->mrec)) {
+                       if (ntfs_mapping_pairs_build(vol, (u8 *)a + le16_to_cpu(
                                a->data.non_resident.mapping_pairs_offset),
                                old_alen - le16_to_cpu(
-                               a->data.non_resident.mapping_pairs_offset),
+                                       a->data.non_resident.mapping_pairs_offset),
                                rl2, ll, -1, NULL)) {
-                       ntfs_error(vol->sb, "Failed to restore mapping pairs "
+                               ntfs_error(vol->sb, "Failed to restore mapping pairs "
                                        "array.%s", es);
-                       NVolSetErrors(vol);
-               }
-               if (ntfs_attr_record_resize(ctx->mrec, a, old_alen)) {
-                       ntfs_error(vol->sb, "Failed to restore attribute "
+                               NVolSetErrors(vol);
+                       }
+                       if (ntfs_attr_record_resize(ctx->mrec, a, old_alen)) {
+                               ntfs_error(vol->sb, "Failed to restore attribute "
                                        "record.%s", es);
+                               NVolSetErrors(vol);
+                       }
+                       flush_dcache_mft_record_page(ctx->ntfs_ino);
+                       mark_mft_record_dirty(ctx->ntfs_ino);
+               } else if (IS_ERR(ctx->mrec)) {
+                       ntfs_error(vol->sb, "Failed to restore attribute search "
+                               "context.%s", es);
                        NVolSetErrors(vol);
                }
-               flush_dcache_mft_record_page(ctx->ntfs_ino);
-               mark_mft_record_dirty(ctx->ntfs_ino);
-       } else if (IS_ERR(ctx->mrec)) {
-               ntfs_error(vol->sb, "Failed to restore attribute search "
-                               "context.%s", es);
-               NVolSetErrors(vol);
-       }
-       if (ctx)
                ntfs_attr_put_search_ctx(ctx);
+       }
        if (!IS_ERR(mrec))
                unmap_mft_record(mft_ni);
        up_write(&mft_ni->runlist.lock);
index 2643a08..56a7d5b 100644 (file)
@@ -1620,7 +1620,7 @@ read_partial_attrdef_page:
                memcpy((u8*)vol->attrdef + (index++ << PAGE_SHIFT),
                                page_address(page), size);
                ntfs_unmap_page(page);
-       };
+       }
        if (size == PAGE_SIZE) {
                size = i_size & ~PAGE_MASK;
                if (size)
@@ -1689,7 +1689,7 @@ read_partial_upcase_page:
                memcpy((char*)vol->upcase + (index++ << PAGE_SHIFT),
                                page_address(page), size);
                ntfs_unmap_page(page);
-       };
+       }
        if (size == PAGE_SIZE) {
                size = i_size & ~PAGE_MASK;
                if (size)
index 9a3d55c..036efd8 100644 (file)
@@ -744,6 +744,35 @@ static ssize_t ntfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
        return generic_file_read_iter(iocb, iter);
 }
 
+static ssize_t ntfs_file_splice_read(struct file *in, loff_t *ppos,
+                                    struct pipe_inode_info *pipe,
+                                    size_t len, unsigned int flags)
+{
+       struct inode *inode = in->f_mapping->host;
+       struct ntfs_inode *ni = ntfs_i(inode);
+
+       if (is_encrypted(ni)) {
+               ntfs_inode_warn(inode, "encrypted i/o not supported");
+               return -EOPNOTSUPP;
+       }
+
+#ifndef CONFIG_NTFS3_LZX_XPRESS
+       if (ni->ni_flags & NI_FLAG_COMPRESSED_MASK) {
+               ntfs_inode_warn(
+                       inode,
+                       "activate CONFIG_NTFS3_LZX_XPRESS to read external compressed files");
+               return -EOPNOTSUPP;
+       }
+#endif
+
+       if (is_dedup(ni)) {
+               ntfs_inode_warn(inode, "read deduplicated not supported");
+               return -EOPNOTSUPP;
+       }
+
+       return filemap_splice_read(in, ppos, pipe, len, flags);
+}
+
 /*
  * ntfs_get_frame_pages
  *
@@ -1159,7 +1188,7 @@ const struct file_operations ntfs_file_operations = {
 #ifdef CONFIG_COMPAT
        .compat_ioctl   = ntfs_compat_ioctl,
 #endif
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = ntfs_file_splice_read,
        .mmap           = ntfs_file_mmap,
        .open           = ntfs_file_open,
        .fsync          = generic_file_fsync,
index 60b97c9..21472e3 100644 (file)
@@ -1503,7 +1503,7 @@ static void o2hb_region_release(struct config_item *item)
        }
 
        if (reg->hr_bdev)
-               blkdev_put(reg->hr_bdev, FMODE_READ|FMODE_WRITE);
+               blkdev_put(reg->hr_bdev, NULL);
 
        kfree(reg->hr_slots);
 
@@ -1786,7 +1786,8 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
                goto out2;
 
        reg->hr_bdev = blkdev_get_by_dev(f.file->f_mapping->host->i_rdev,
-                                        FMODE_WRITE | FMODE_READ, NULL);
+                                        BLK_OPEN_WRITE | BLK_OPEN_READ, NULL,
+                                        NULL);
        if (IS_ERR(reg->hr_bdev)) {
                ret = PTR_ERR(reg->hr_bdev);
                reg->hr_bdev = NULL;
@@ -1893,7 +1894,7 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
 
 out3:
        if (ret < 0) {
-               blkdev_put(reg->hr_bdev, FMODE_READ | FMODE_WRITE);
+               blkdev_put(reg->hr_bdev, NULL);
                reg->hr_bdev = NULL;
        }
 out2:
index efb09de..91a1945 100644 (file)
@@ -2100,14 +2100,20 @@ static long ocfs2_fallocate(struct file *file, int mode, loff_t offset,
        struct ocfs2_space_resv sr;
        int change_size = 1;
        int cmd = OCFS2_IOC_RESVSP64;
+       int ret = 0;
 
        if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
                return -EOPNOTSUPP;
        if (!ocfs2_writes_unwritten_extents(osb))
                return -EOPNOTSUPP;
 
-       if (mode & FALLOC_FL_KEEP_SIZE)
+       if (mode & FALLOC_FL_KEEP_SIZE) {
                change_size = 0;
+       } else {
+               ret = inode_newsize_ok(inode, offset + len);
+               if (ret)
+                       return ret;
+       }
 
        if (mode & FALLOC_FL_PUNCH_HOLE)
                cmd = OCFS2_IOC_UNRESVSP64;
@@ -2552,7 +2558,7 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
         *
         * Take and drop the meta data lock to update inode fields
         * like i_size. This allows the checks down below
-        * generic_file_read_iter() a chance of actually working.
+        * copy_splice_read() a chance of actually working.
         */
        ret = ocfs2_inode_lock_atime(inode, filp->f_path.mnt, &lock_level,
                                     !nowait);
@@ -2581,6 +2587,43 @@ bail:
        return ret;
 }
 
+static ssize_t ocfs2_file_splice_read(struct file *in, loff_t *ppos,
+                                     struct pipe_inode_info *pipe,
+                                     size_t len, unsigned int flags)
+{
+       struct inode *inode = file_inode(in);
+       ssize_t ret = 0;
+       int lock_level = 0;
+
+       trace_ocfs2_file_splice_read(inode, in, in->f_path.dentry,
+                                    (unsigned long long)OCFS2_I(inode)->ip_blkno,
+                                    in->f_path.dentry->d_name.len,
+                                    in->f_path.dentry->d_name.name,
+                                    flags);
+
+       /*
+        * We're fine letting folks race truncates and extending writes with
+        * read across the cluster, just like they can locally.  Hence no
+        * rw_lock during read.
+        *
+        * Take and drop the meta data lock to update inode fields like i_size.
+        * This allows the checks down below filemap_splice_read() a chance of
+        * actually working.
+        */
+       ret = ocfs2_inode_lock_atime(inode, in->f_path.mnt, &lock_level, 1);
+       if (ret < 0) {
+               if (ret != -EAGAIN)
+                       mlog_errno(ret);
+               goto bail;
+       }
+       ocfs2_inode_unlock(inode, lock_level);
+
+       ret = filemap_splice_read(in, ppos, pipe, len, flags);
+       trace_filemap_splice_read_ret(ret);
+bail:
+       return ret;
+}
+
 /* Refer generic_file_llseek_unlocked() */
 static loff_t ocfs2_file_llseek(struct file *file, loff_t offset, int whence)
 {
@@ -2744,7 +2787,7 @@ const struct file_operations ocfs2_fops = {
 #endif
        .lock           = ocfs2_lock,
        .flock          = ocfs2_flock,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = ocfs2_file_splice_read,
        .splice_write   = iter_file_splice_write,
        .fallocate      = ocfs2_fallocate,
        .remap_file_range = ocfs2_remap_file_range,
@@ -2790,7 +2833,7 @@ const struct file_operations ocfs2_fops_no_plocks = {
        .compat_ioctl   = ocfs2_compat_ioctl,
 #endif
        .flock          = ocfs2_flock,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
        .splice_write   = iter_file_splice_write,
        .fallocate      = ocfs2_fallocate,
        .remap_file_range = ocfs2_remap_file_range,
index dc4bce1..b8c3d17 100644 (file)
@@ -1319,6 +1319,8 @@ DEFINE_OCFS2_FILE_OPS(ocfs2_file_splice_write);
 
 DEFINE_OCFS2_FILE_OPS(ocfs2_file_read_iter);
 
+DEFINE_OCFS2_FILE_OPS(ocfs2_file_splice_read);
+
 DEFINE_OCFS2_ULL_ULL_ULL_EVENT(ocfs2_truncate_file);
 
 DEFINE_OCFS2_ULL_ULL_EVENT(ocfs2_truncate_file_error);
@@ -1470,6 +1472,7 @@ TRACE_EVENT(ocfs2_prepare_inode_for_write,
 );
 
 DEFINE_OCFS2_INT_EVENT(generic_file_read_iter_ret);
+DEFINE_OCFS2_INT_EVENT(filemap_splice_read_ret);
 
 /* End of trace events for fs/ocfs2/file.c. */
 
index 0b0e6a1..988d1c0 100644 (file)
@@ -952,8 +952,10 @@ static void ocfs2_disable_quotas(struct ocfs2_super *osb)
        for (type = 0; type < OCFS2_MAXQUOTAS; type++) {
                if (!sb_has_quota_loaded(sb, type))
                        continue;
-               oinfo = sb_dqinfo(sb, type)->dqi_priv;
-               cancel_delayed_work_sync(&oinfo->dqi_sync_work);
+               if (!sb_has_quota_suspended(sb, type)) {
+                       oinfo = sb_dqinfo(sb, type)->dqi_priv;
+                       cancel_delayed_work_sync(&oinfo->dqi_sync_work);
+               }
                inode = igrab(sb->s_dquot.files[type]);
                /* Turn off quotas. This will remove all dquot structures from
                 * memory and so they will be automatically synced to global
index 0101f1f..de8f57e 100644 (file)
@@ -334,7 +334,7 @@ const struct file_operations omfs_file_operations = {
        .write_iter = generic_file_write_iter,
        .mmap = generic_file_mmap,
        .fsync = generic_file_fsync,
-       .splice_read = generic_file_splice_read,
+       .splice_read = filemap_splice_read,
 };
 
 static int omfs_setattr(struct mnt_idmap *idmap,
index 4478adc..fb07b28 100644 (file)
--- a/fs/open.c
+++ b/fs/open.c
@@ -700,10 +700,7 @@ SYSCALL_DEFINE2(chmod, const char __user *, filename, umode_t, mode)
        return do_fchmodat(AT_FDCWD, filename, mode);
 }
 
-/**
- * setattr_vfsuid - check and set ia_fsuid attribute
- * @kuid: new inode owner
- *
+/*
  * Check whether @kuid is valid and if so generate and set vfsuid_t in
  * ia_vfsuid.
  *
@@ -718,10 +715,7 @@ static inline bool setattr_vfsuid(struct iattr *attr, kuid_t kuid)
        return true;
 }
 
-/**
- * setattr_vfsgid - check and set ia_fsgid attribute
- * @kgid: new inode owner
- *
+/*
  * Check whether @kgid is valid and if so generate and set vfsgid_t in
  * ia_vfsgid.
  *
@@ -989,7 +983,6 @@ cleanup_file:
  * @file: file pointer
  * @dentry: pointer to dentry
  * @open: open callback
- * @opened: state of open
  *
  * This can be used to finish opening a file passed to i_op->atomic_open().
  *
@@ -1043,7 +1036,6 @@ EXPORT_SYMBOL(file_path);
  * vfs_open - open the file at the given path
  * @path: path to open
  * @file: newly allocated file with f_flag initialized
- * @cred: credentials to use
  */
 int vfs_open(const struct path *path, struct file *file)
 {
@@ -1116,23 +1108,77 @@ struct file *dentry_create(const struct path *path, int flags, umode_t mode,
 }
 EXPORT_SYMBOL(dentry_create);
 
-struct file *open_with_fake_path(const struct path *path, int flags,
+/**
+ * kernel_file_open - open a file for kernel internal use
+ * @path:      path of the file to open
+ * @flags:     open flags
+ * @inode:     the inode
+ * @cred:      credentials for open
+ *
+ * Open a file for use by in-kernel consumers. The file is not accounted
+ * against nr_files and must not be installed into the file descriptor
+ * table.
+ *
+ * Return: Opened file on success, an error pointer on failure.
+ */
+struct file *kernel_file_open(const struct path *path, int flags,
                                struct inode *inode, const struct cred *cred)
 {
-       struct file *f = alloc_empty_file_noaccount(flags, cred);
-       if (!IS_ERR(f)) {
-               int error;
+       struct file *f;
+       int error;
 
-               f->f_path = *path;
-               error = do_dentry_open(f, inode, NULL);
-               if (error) {
-                       fput(f);
-                       f = ERR_PTR(error);
-               }
+       f = alloc_empty_file_noaccount(flags, cred);
+       if (IS_ERR(f))
+               return f;
+
+       f->f_path = *path;
+       error = do_dentry_open(f, inode, NULL);
+       if (error) {
+               fput(f);
+               f = ERR_PTR(error);
        }
        return f;
 }
-EXPORT_SYMBOL(open_with_fake_path);
+EXPORT_SYMBOL_GPL(kernel_file_open);
+
+/**
+ * backing_file_open - open a backing file for kernel internal use
+ * @path:      path of the file to open
+ * @flags:     open flags
+ * @real_path: path of the backing file
+ * @cred:      credentials for open
+ *
+ * Open a backing file for a stackable filesystem (e.g., overlayfs).
+ * @path may be on the stackable filesystem and backing inode on the
+ * underlying filesystem. In this case, we want to be able to return
+ * the @real_path of the backing inode. This is done by embedding the
+ * returned file into a container structure that also stores the path of
+ * the backing inode on the underlying filesystem, which can be
+ * retrieved using backing_file_real_path().
+ */
+struct file *backing_file_open(const struct path *path, int flags,
+                              const struct path *real_path,
+                              const struct cred *cred)
+{
+       struct file *f;
+       int error;
+
+       f = alloc_empty_backing_file(flags, cred);
+       if (IS_ERR(f))
+               return f;
+
+       f->f_path = *path;
+       path_get(real_path);
+       *backing_file_real_path(f) = *real_path;
+       error = do_dentry_open(f, d_inode(real_path->dentry), NULL);
+       if (error) {
+               fput(f);
+               f = ERR_PTR(error);
+       }
+
+       return f;
+}
+EXPORT_SYMBOL_GPL(backing_file_open);
 
 #define WILL_CREATE(flags)     (flags & (O_CREAT | __O_TMPFILE))
 #define O_PATH_FLAGS           (O_DIRECTORY | O_NOFOLLOW | O_PATH | O_CLOEXEC)
@@ -1156,7 +1202,7 @@ inline struct open_how build_open_how(int flags, umode_t mode)
 inline int build_open_flags(const struct open_how *how, struct open_flags *op)
 {
        u64 flags = how->flags;
-       u64 strip = FMODE_NONOTIFY | O_CLOEXEC;
+       u64 strip = __FMODE_NONOTIFY | O_CLOEXEC;
        int lookup_flags = 0;
        int acc_mode = ACC_MODE(flags);
 
index 1a4301a..d683722 100644 (file)
@@ -337,6 +337,26 @@ out:
        return ret;
 }
 
+static ssize_t orangefs_file_splice_read(struct file *in, loff_t *ppos,
+                                        struct pipe_inode_info *pipe,
+                                        size_t len, unsigned int flags)
+{
+       struct inode *inode = file_inode(in);
+       ssize_t ret;
+
+       orangefs_stats.reads++;
+
+       down_read(&inode->i_rwsem);
+       ret = orangefs_revalidate_mapping(inode);
+       if (ret)
+               goto out;
+
+       ret = filemap_splice_read(in, ppos, pipe, len, flags);
+out:
+       up_read(&inode->i_rwsem);
+       return ret;
+}
+
 static ssize_t orangefs_file_write_iter(struct kiocb *iocb,
     struct iov_iter *iter)
 {
@@ -556,7 +576,7 @@ const struct file_operations orangefs_file_operations = {
        .lock           = orangefs_lock,
        .mmap           = orangefs_file_mmap,
        .open           = generic_file_open,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = orangefs_file_splice_read,
        .splice_write   = iter_file_splice_write,
        .flush          = orangefs_flush,
        .release        = orangefs_file_release,
index 7c04f03..1f93a3a 100644 (file)
@@ -34,8 +34,8 @@ static char ovl_whatisit(struct inode *inode, struct inode *realinode)
                return 'm';
 }
 
-/* No atime modification nor notify on underlying */
-#define OVL_OPEN_FLAGS (O_NOATIME | FMODE_NONOTIFY)
+/* No atime modification on underlying */
+#define OVL_OPEN_FLAGS (O_NOATIME)
 
 static struct file *ovl_open_realfile(const struct file *file,
                                      const struct path *realpath)
@@ -61,8 +61,8 @@ static struct file *ovl_open_realfile(const struct file *file,
                if (!inode_owner_or_capable(real_idmap, realinode))
                        flags &= ~O_NOATIME;
 
-               realfile = open_with_fake_path(&file->f_path, flags, realinode,
-                                              current_cred());
+               realfile = backing_file_open(&file->f_path, flags, realpath,
+                                            current_cred());
        }
        revert_creds(old_cred);
 
@@ -419,6 +419,27 @@ out_unlock:
        return ret;
 }
 
+static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
+                              struct pipe_inode_info *pipe, size_t len,
+                              unsigned int flags)
+{
+       const struct cred *old_cred;
+       struct fd real;
+       ssize_t ret;
+
+       ret = ovl_real_fdget(in, &real);
+       if (ret)
+               return ret;
+
+       old_cred = ovl_override_creds(file_inode(in)->i_sb);
+       ret = vfs_splice_read(real.file, ppos, pipe, len, flags);
+       revert_creds(old_cred);
+       ovl_file_accessed(in);
+
+       fdput(real);
+       return ret;
+}
+
 /*
  * Calling iter_file_splice_write() directly from overlay's f_op may deadlock
  * due to lock order inversion between pipe->mutex in iter_file_splice_write()
@@ -695,7 +716,7 @@ const struct file_operations ovl_file_operations = {
        .fallocate      = ovl_fallocate,
        .fadvise        = ovl_fadvise,
        .flush          = ovl_flush,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = ovl_splice_read,
        .splice_write   = ovl_splice_write,
 
        .copy_file_range        = ovl_copy_file_range,
index 4d0b278..23686e8 100644 (file)
@@ -329,8 +329,9 @@ static inline struct file *ovl_do_tmpfile(struct ovl_fs *ofs,
                                          struct dentry *dentry, umode_t mode)
 {
        struct path path = { .mnt = ovl_upper_mnt(ofs), .dentry = dentry };
-       struct file *file = vfs_tmpfile_open(ovl_upper_mnt_idmap(ofs), &path, mode,
-                                       O_LARGEFILE | O_WRONLY, current_cred());
+       struct file *file = kernel_tmpfile_open(ovl_upper_mnt_idmap(ofs), &path,
+                                               mode, O_LARGEFILE | O_WRONLY,
+                                               current_cred());
        int err = PTR_ERR_OR_ZERO(file);
 
        pr_debug("tmpfile(%pd2, 0%o) = %i\n", dentry, mode, err);
index ceb17d2..2d88f73 100644 (file)
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -342,7 +342,8 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
                        break;
                if (ret)
                        break;
-               if (filp->f_flags & O_NONBLOCK) {
+               if ((filp->f_flags & O_NONBLOCK) ||
+                   (iocb->ki_flags & IOCB_NOWAIT)) {
                        ret = -EAGAIN;
                        break;
                }
@@ -547,7 +548,8 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
                        continue;
 
                /* Wait for buffer space to become available. */
-               if (filp->f_flags & O_NONBLOCK) {
+               if ((filp->f_flags & O_NONBLOCK) ||
+                   (iocb->ki_flags & IOCB_NOWAIT)) {
                        if (!ret)
                                ret = -EAGAIN;
                        break;
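
With the two hunks above, pipe reads and writes honour IOCB_NOWAIT in addition to O_NONBLOCK. From user space that flag is reached via RWF_NOWAIT on preadv2()/pwritev2(); the sketch below assumes the pipe's struct file has FMODE_NOWAIT set (done by a separate change), otherwise the syscall fails with EOPNOTSUPP rather than EAGAIN.

#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
        int fds[2];
        char buf[16];
        struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };

        if (pipe(fds))
                return 1;

        /* Empty pipe, no O_NONBLOCK on the fd: with RWF_NOWAIT the read
         * returns immediately with EAGAIN instead of blocking. */
        if (preadv2(fds[0], &iov, 1, -1, RWF_NOWAIT) < 0)
                printf("preadv2: %s\n", strerror(errno));

        close(fds[0]);
        close(fds[1]);
        return 0;
}
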
index 3cede8b..e4d0340 100644 (file)
@@ -216,7 +216,7 @@ static struct mount *next_group(struct mount *m, struct mount *origin)
 static struct mount *last_dest, *first_source, *last_source, *dest_master;
 static struct hlist_head *list;
 
-static inline bool peers(struct mount *m1, struct mount *m2)
+static inline bool peers(const struct mount *m1, const struct mount *m2)
 {
        return m1->mnt_group_id == m2->mnt_group_id && m1->mnt_group_id;
 }
@@ -354,6 +354,46 @@ static inline int do_refcount_check(struct mount *mnt, int count)
        return mnt_get_count(mnt) > count;
 }
 
+/**
+ * propagation_would_overmount - check whether propagation from @from
+ *                               would overmount @to
+ * @from: shared mount
+ * @to:   mount to check
+ * @mp:   future mountpoint of @to on @from
+ *
+ * If @from propagates mounts to @to, @from and @to must either be peers
+ * or one of the masters in the hierarchy of masters of @to must be a
+ * peer of @from.
+ *
+ * If the root of the @to mount is equal to the future mountpoint @mp of
+ * the @to mount on @from then @to will be overmounted by whatever is
+ * propagated to it.
+ *
+ * Context: This function expects namespace_lock() to be held and that
+ *          @mp is stable.
+ * Return: If @from overmounts @to, true is returned, false if not.
+ */
+bool propagation_would_overmount(const struct mount *from,
+                                const struct mount *to,
+                                const struct mountpoint *mp)
+{
+       if (!IS_MNT_SHARED(from))
+               return false;
+
+       if (IS_MNT_NEW(to))
+               return false;
+
+       if (to->mnt.mnt_root != mp->m_dentry)
+               return false;
+
+       for (const struct mount *m = to; m; m = m->mnt_master) {
+               if (peers(from, m))
+                       return true;
+       }
+
+       return false;
+}
+
 /*
  * check if the mount 'mnt' can be unmounted successfully.
  * @mnt: the mount to be checked for unmount
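
The kernel-doc for propagation_would_overmount() above reduces to a walk up the master chain of @to looking for a peer of @from, plus the check that @to's root is the future mountpoint. Below is a self-contained toy model of just the peer/master walk; the toy_* names are invented here, they are not the kernel structures, and the mountpoint/dentry comparison is left out.

#include <stdbool.h>
#include <stdio.h>

struct toy_mount {
        int group_id;                /* 0 means "not shared" */
        struct toy_mount *master;    /* mount we are a slave of, if any */
};

static bool toy_peers(const struct toy_mount *a, const struct toy_mount *b)
{
        return a->group_id == b->group_id && a->group_id;
}

static bool toy_receives_propagation(const struct toy_mount *from,
                                     const struct toy_mount *to)
{
        for (const struct toy_mount *m = to; m; m = m->master)
                if (toy_peers(from, m))
                        return true;
        return false;
}

int main(void)
{
        struct toy_mount from   = { .group_id = 1 };
        struct toy_mount master = { .group_id = 1 };              /* peer of "from" */
        struct toy_mount to     = { .group_id = 2, .master = &master };

        /* "to" is a slave of a peer of "from", so propagation reaches it */
        printf("%d\n", toy_receives_propagation(&from, &to));     /* 1 */
        return 0;
}
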
index 988f1aa..0b02a63 100644 (file)
@@ -53,4 +53,7 @@ struct mount *copy_tree(struct mount *, struct dentry *, int);
 bool is_path_reachable(struct mount *, struct dentry *,
                         const struct path *root);
 int count_mounts(struct mnt_namespace *ns, struct mount *mnt);
+bool propagation_would_overmount(const struct mount *from,
+                                const struct mount *to,
+                                const struct mountpoint *mp);
 #endif /* _LINUX_PNODE_H */
index f495fdb..67b09a1 100644 (file)
@@ -591,7 +591,7 @@ static const struct file_operations proc_iter_file_ops = {
        .llseek         = proc_reg_llseek,
        .read_iter      = proc_reg_read_iter,
        .write          = proc_reg_write,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = copy_splice_read,
        .poll           = proc_reg_poll,
        .unlocked_ioctl = proc_reg_unlocked_ioctl,
        .mmap           = proc_reg_mmap,
@@ -617,7 +617,7 @@ static const struct file_operations proc_reg_file_ops_compat = {
 static const struct file_operations proc_iter_file_ops_compat = {
        .llseek         = proc_reg_llseek,
        .read_iter      = proc_reg_read_iter,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = copy_splice_read,
        .write          = proc_reg_write,
        .poll           = proc_reg_poll,
        .unlocked_ioctl = proc_reg_unlocked_ioctl,
index b43d0bd..8dca4d6 100644 (file)
@@ -168,6 +168,11 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
                    global_zone_page_state(NR_FREE_CMA_PAGES));
 #endif
 
+#ifdef CONFIG_UNACCEPTED_MEMORY
+       show_val_kb(m, "Unaccepted:     ",
+                   global_zone_page_state(NR_UNACCEPTED));
+#endif
+
        hugetlb_report_meminfo(m);
 
        arch_report_meminfo(m);
index 8038833..ae832e9 100644 (file)
@@ -868,7 +868,7 @@ static const struct file_operations proc_sys_file_operations = {
        .poll           = proc_sys_poll,
        .read_iter      = proc_sys_read,
        .write_iter     = proc_sys_write,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = copy_splice_read,
        .splice_write   = iter_file_splice_write,
        .llseek         = default_llseek,
 };
index 846f945..250eb5b 100644 (file)
@@ -324,7 +324,7 @@ static int mountstats_open(struct inode *inode, struct file *file)
 const struct file_operations proc_mounts_operations = {
        .open           = mounts_open,
        .read_iter      = seq_read_iter,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = copy_splice_read,
        .llseek         = seq_lseek,
        .release        = mounts_release,
        .poll           = mounts_poll,
@@ -333,7 +333,7 @@ const struct file_operations proc_mounts_operations = {
 const struct file_operations proc_mountinfo_operations = {
        .open           = mountinfo_open,
        .read_iter      = seq_read_iter,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = copy_splice_read,
        .llseek         = seq_lseek,
        .release        = mounts_release,
        .poll           = mounts_poll,
@@ -342,7 +342,7 @@ const struct file_operations proc_mountinfo_operations = {
 const struct file_operations proc_mountstats_operations = {
        .open           = mountstats_open,
        .read_iter      = seq_read_iter,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = copy_splice_read,
        .llseek         = seq_lseek,
        .release        = mounts_release,
 };
index 4ae0cfc..de8cf5d 100644 (file)
@@ -263,9 +263,9 @@ static __init const char *early_boot_devpath(const char *initial_devname)
         * same scheme to find the device that we use for mounting
         * the root file system.
         */
-       dev_t dev = name_to_dev_t(initial_devname);
+       dev_t dev;
 
-       if (!dev) {
+       if (early_lookup_bdev(initial_devname, &dev)) {
                pr_err("failed to resolve '%s'!\n", initial_devname);
                return initial_devname;
        }
index 12af049..c7a1aa3 100644 (file)
@@ -43,7 +43,7 @@ const struct file_operations ramfs_file_operations = {
        .write_iter     = generic_file_write_iter,
        .mmap           = generic_file_mmap,
        .fsync          = noop_fsync,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
        .splice_write   = iter_file_splice_write,
        .llseek         = generic_file_llseek,
        .get_unmapped_area      = ramfs_mmu_get_unmapped_area,
index 9fbb9b5..efb1b4c 100644 (file)
@@ -43,7 +43,7 @@ const struct file_operations ramfs_file_operations = {
        .read_iter              = generic_file_read_iter,
        .write_iter             = generic_file_write_iter,
        .fsync                  = noop_fsync,
-       .splice_read            = generic_file_splice_read,
+       .splice_read            = filemap_splice_read,
        .splice_write           = iter_file_splice_write,
        .llseek                 = generic_file_llseek,
 };
index a21ba3b..b07de77 100644 (file)
@@ -29,7 +29,7 @@ const struct file_operations generic_ro_fops = {
        .llseek         = generic_file_llseek,
        .read_iter      = generic_file_read_iter,
        .mmap           = generic_file_readonly_mmap,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
 };
 
 EXPORT_SYMBOL(generic_ro_fops);
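
Many hunks in this section retire generic_file_splice_read() in favour of either filemap_splice_read(), which splices directly out of the page cache, or copy_splice_read(), which copies the data into freshly allocated pipe pages and is used where splicing cached pages is not suitable (procfs, O_DIRECT paths). The switch is invisible to user space; a plain splice(2) call such as the hedged sketch below simply ends up in whichever helper the file_operations now name.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        int pfd[2];
        char buf[256];
        ssize_t n;
        int in = open("/proc/self/mounts", O_RDONLY);

        if (in < 0 || pipe(pfd))
                return 1;

        /* file -> pipe; for this procfs file the new ->splice_read is
         * copy_splice_read(), for regular files it is filemap_splice_read() */
        n = splice(in, NULL, pfd[1], NULL, sizeof(buf), 0);
        if (n > 0 && (n = read(pfd[0], buf, sizeof(buf))) > 0)
                fwrite(buf, 1, (size_t)n, stdout);

        close(in);
        close(pfd[0]);
        close(pfd[1]);
        return 0;
}
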
index 9c53edb..b264ce6 100644 (file)
@@ -131,7 +131,7 @@ struct old_linux_dirent {
        unsigned long   d_ino;
        unsigned long   d_offset;
        unsigned short  d_namlen;
-       char            d_name[1];
+       char            d_name[];
 };
 
 struct readdir_callback {
@@ -208,7 +208,7 @@ struct linux_dirent {
        unsigned long   d_ino;
        unsigned long   d_off;
        unsigned short  d_reclen;
-       char            d_name[1];
+       char            d_name[];
 };
 
 struct getdents_callback {
@@ -388,7 +388,7 @@ struct compat_old_linux_dirent {
        compat_ulong_t  d_ino;
        compat_ulong_t  d_offset;
        unsigned short  d_namlen;
-       char            d_name[1];
+       char            d_name[];
 };
 
 struct compat_readdir_callback {
@@ -460,7 +460,7 @@ struct compat_linux_dirent {
        compat_ulong_t  d_ino;
        compat_ulong_t  d_off;
        unsigned short  d_reclen;
-       char            d_name[1];
+       char            d_name[];
 };
 
 struct compat_getdents_callback {
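
The four hunks above convert the trailing d_name[1] members to C99 flexible array members, so the declared structure size no longer pretends to include one name byte and bounds checkers can see the real extent of the name. A stand-alone sketch of the allocation pattern with a flexible array member; demo_dirent is an invented stand-in, not the kernel layout.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct demo_dirent {
        unsigned long  d_ino;
        unsigned long  d_off;
        unsigned short d_reclen;
        char           d_name[];    /* flexible array member, was d_name[1] */
};

int main(void)
{
        const char *name = "example.txt";
        /* one allocation: header plus name plus NUL, all of it visible to
         * the compiler/FORTIFY as the bounds of d_name */
        struct demo_dirent *d = malloc(sizeof(*d) + strlen(name) + 1);

        if (!d)
                return 1;
        d->d_ino = 1;
        d->d_off = 0;
        strcpy(d->d_name, name);
        d->d_reclen = (unsigned short)(sizeof(*d) + strlen(name) + 1);
        printf("%s reclen=%u\n", d->d_name, d->d_reclen);
        free(d);
        return 0;
}
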
index b54cc70..8eb3ad3 100644 (file)
@@ -247,7 +247,7 @@ const struct file_operations reiserfs_file_operations = {
        .fsync = reiserfs_sync_file,
        .read_iter = generic_file_read_iter,
        .write_iter = generic_file_write_iter,
-       .splice_read = generic_file_splice_read,
+       .splice_read = filemap_splice_read,
        .splice_write = iter_file_splice_write,
        .llseek = generic_file_llseek,
 };
index 4d11d60..479aa4a 100644 (file)
@@ -2589,7 +2589,12 @@ static void release_journal_dev(struct super_block *super,
                               struct reiserfs_journal *journal)
 {
        if (journal->j_dev_bd != NULL) {
-               blkdev_put(journal->j_dev_bd, journal->j_dev_mode);
+               void *holder = NULL;
+
+               if (journal->j_dev_bd->bd_dev != super->s_dev)
+                       holder = journal;
+
+               blkdev_put(journal->j_dev_bd, holder);
                journal->j_dev_bd = NULL;
        }
 }
@@ -2598,9 +2603,10 @@ static int journal_init_dev(struct super_block *super,
                            struct reiserfs_journal *journal,
                            const char *jdev_name)
 {
+       blk_mode_t blkdev_mode = BLK_OPEN_READ;
+       void *holder = journal;
        int result;
        dev_t jdev;
-       fmode_t blkdev_mode = FMODE_READ | FMODE_WRITE | FMODE_EXCL;
 
        result = 0;
 
@@ -2608,16 +2614,15 @@ static int journal_init_dev(struct super_block *super,
        jdev = SB_ONDISK_JOURNAL_DEVICE(super) ?
            new_decode_dev(SB_ONDISK_JOURNAL_DEVICE(super)) : super->s_dev;
 
-       if (bdev_read_only(super->s_bdev))
-               blkdev_mode = FMODE_READ;
+       if (!bdev_read_only(super->s_bdev))
+               blkdev_mode |= BLK_OPEN_WRITE;
 
        /* there is no "jdev" option and journal is on separate device */
        if ((!jdev_name || !jdev_name[0])) {
                if (jdev == super->s_dev)
-                       blkdev_mode &= ~FMODE_EXCL;
-               journal->j_dev_bd = blkdev_get_by_dev(jdev, blkdev_mode,
-                                                     journal);
-               journal->j_dev_mode = blkdev_mode;
+                       holder = NULL;
+               journal->j_dev_bd = blkdev_get_by_dev(jdev, blkdev_mode, holder,
+                                                     NULL);
                if (IS_ERR(journal->j_dev_bd)) {
                        result = PTR_ERR(journal->j_dev_bd);
                        journal->j_dev_bd = NULL;
@@ -2631,8 +2636,8 @@ static int journal_init_dev(struct super_block *super,
                return 0;
        }
 
-       journal->j_dev_mode = blkdev_mode;
-       journal->j_dev_bd = blkdev_get_by_path(jdev_name, blkdev_mode, journal);
+       journal->j_dev_bd = blkdev_get_by_path(jdev_name, blkdev_mode, holder,
+                                              NULL);
        if (IS_ERR(journal->j_dev_bd)) {
                result = PTR_ERR(journal->j_dev_bd);
                journal->j_dev_bd = NULL;
index 1bccf6a..55e8525 100644 (file)
@@ -300,7 +300,6 @@ struct reiserfs_journal {
        struct reiserfs_journal_cnode *j_first;
 
        struct block_device *j_dev_bd;
-       fmode_t j_dev_mode;
 
        /* first block on s_dev of reserved area journal */
        int j_1st_reserved_block;
index 1331a89..87ae4f0 100644 (file)
@@ -15,6 +15,7 @@
 #include <linux/mount.h>
 #include <linux/fs.h>
 #include <linux/dax.h>
+#include <linux/overflow.h>
 #include "internal.h"
 
 #include <linux/uaccess.h>
@@ -101,10 +102,12 @@ static int generic_remap_checks(struct file *file_in, loff_t pos_in,
 static int remap_verify_area(struct file *file, loff_t pos, loff_t len,
                             bool write)
 {
+       loff_t tmp;
+
        if (unlikely(pos < 0 || len < 0))
                return -EINVAL;
 
-       if (unlikely((loff_t) (pos + len) < 0))
+       if (unlikely(check_add_overflow(pos, len, &tmp)))
                return -EINVAL;
 
        return security_file_permission(file, write ? MAY_WRITE : MAY_READ);
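
The hunk above replaces the "(loff_t)(pos + len) < 0" test, which relies on signed overflow being observable, with check_add_overflow() from <linux/overflow.h>. The kernel macro is built on the compiler's overflow builtins; the user-space sketch below uses the builtin directly to show the semantics (add_overflows() is an invented stand-in).

#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

static bool add_overflows(long long pos, long long len, long long *sum)
{
        /* true when pos + len does not fit in *sum, the same convention
         * as the kernel's check_add_overflow() */
        return __builtin_add_overflow(pos, len, sum);
}

int main(void)
{
        long long sum;

        printf("%d\n", add_overflows(1, 2, &sum));          /* 0, sum == 3 */
        printf("%d\n", add_overflows(LLONG_MAX, 1, &sum));  /* 1, overflow */
        return 0;
}
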
index 4578dc4..4520ca4 100644 (file)
@@ -78,7 +78,7 @@ static unsigned romfs_mmap_capabilities(struct file *file)
 const struct file_operations romfs_ro_fops = {
        .llseek                 = generic_file_llseek,
        .read_iter              = generic_file_read_iter,
-       .splice_read            = generic_file_splice_read,
+       .splice_read            = filemap_splice_read,
        .mmap                   = romfs_mmap,
        .get_unmapped_area      = romfs_get_unmapped_area,
        .mmap_capabilities      = romfs_mmap_capabilities,
diff --git a/fs/smb/Kconfig b/fs/smb/Kconfig
new file mode 100644 (file)
index 0000000..ef42578
--- /dev/null
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# smbfs configuration
+
+source "fs/smb/client/Kconfig"
+source "fs/smb/server/Kconfig"
+
+config SMBFS
+       tristate
+       default y if CIFS=y || SMB_SERVER=y
+       default m if CIFS=m || SMB_SERVER=m
diff --git a/fs/smb/Makefile b/fs/smb/Makefile
new file mode 100644 (file)
index 0000000..9a1bf59
--- /dev/null
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_SMBFS)            += common/
+obj-$(CONFIG_CIFS)             += client/
+obj-$(CONFIG_SMB_SERVER)       += server/
similarity index 100%
rename from fs/cifs/Kconfig
rename to fs/smb/client/Kconfig
similarity index 100%
rename from fs/cifs/Makefile
rename to fs/smb/client/Makefile
similarity index 100%
rename from fs/cifs/asn1.c
rename to fs/smb/client/asn1.c
similarity index 95%
rename from fs/cifs/cifs_debug.c
rename to fs/smb/client/cifs_debug.c
index d4ed200..b279f74 100644 (file)
@@ -12,6 +12,7 @@
 #include <linux/module.h>
 #include <linux/proc_fs.h>
 #include <linux/uaccess.h>
+#include <uapi/linux/ethtool.h>
 #include "cifspdu.h"
 #include "cifsglob.h"
 #include "cifsproto.h"
@@ -108,7 +109,7 @@ static void cifs_debug_tcon(struct seq_file *m, struct cifs_tcon *tcon)
        if ((tcon->seal) ||
            (tcon->ses->session_flags & SMB2_SESSION_FLAG_ENCRYPT_DATA) ||
            (tcon->share_flags & SHI1005_FLAGS_ENCRYPT_DATA))
-               seq_printf(m, " Encrypted");
+               seq_puts(m, " encrypted");
        if (tcon->nocase)
                seq_printf(m, " nocase");
        if (tcon->unix_ext)
@@ -130,12 +131,14 @@ cifs_dump_channel(struct seq_file *m, int i, struct cifs_chan *chan)
        struct TCP_Server_Info *server = chan->server;
 
        seq_printf(m, "\n\n\t\tChannel: %d ConnectionId: 0x%llx"
-                  "\n\t\tNumber of credits: %d Dialect 0x%x"
+                  "\n\t\tNumber of credits: %d,%d,%d Dialect 0x%x"
                   "\n\t\tTCP status: %d Instance: %d"
                   "\n\t\tLocal Users To Server: %d SecMode: 0x%x Req On Wire: %d"
                   "\n\t\tIn Send: %d In MaxReq Wait: %d",
                   i+1, server->conn_id,
                   server->credits,
+                  server->echo_credits,
+                  server->oplock_credits,
                   server->dialect,
                   server->tcpStatus,
                   server->reconnect_instance,
@@ -146,18 +149,62 @@ cifs_dump_channel(struct seq_file *m, int i, struct cifs_chan *chan)
                   atomic_read(&server->num_waiters));
 }
 
+static inline const char *smb_speed_to_str(size_t bps)
+{
+       size_t mbps = bps / 1000 / 1000;
+
+       switch (mbps) {
+       case SPEED_10:
+               return "10Mbps";
+       case SPEED_100:
+               return "100Mbps";
+       case SPEED_1000:
+               return "1Gbps";
+       case SPEED_2500:
+               return "2.5Gbps";
+       case SPEED_5000:
+               return "5Gbps";
+       case SPEED_10000:
+               return "10Gbps";
+       case SPEED_14000:
+               return "14Gbps";
+       case SPEED_20000:
+               return "20Gbps";
+       case SPEED_25000:
+               return "25Gbps";
+       case SPEED_40000:
+               return "40Gbps";
+       case SPEED_50000:
+               return "50Gbps";
+       case SPEED_56000:
+               return "56Gbps";
+       case SPEED_100000:
+               return "100Gbps";
+       case SPEED_200000:
+               return "200Gbps";
+       case SPEED_400000:
+               return "400Gbps";
+       case SPEED_800000:
+               return "800Gbps";
+       default:
+               return "Unknown";
+       }
+}
+
 static void
 cifs_dump_iface(struct seq_file *m, struct cifs_server_iface *iface)
 {
        struct sockaddr_in *ipv4 = (struct sockaddr_in *)&iface->sockaddr;
        struct sockaddr_in6 *ipv6 = (struct sockaddr_in6 *)&iface->sockaddr;
 
-       seq_printf(m, "\tSpeed: %zu bps\n", iface->speed);
+       seq_printf(m, "\tSpeed: %s\n", smb_speed_to_str(iface->speed));
        seq_puts(m, "\t\tCapabilities: ");
        if (iface->rdma_capable)
                seq_puts(m, "rdma ");
        if (iface->rss_capable)
                seq_puts(m, "rss ");
+       if (!iface->rdma_capable && !iface->rss_capable)
+               seq_puts(m, "None");
        seq_putc(m, '\n');
        if (iface->sockaddr.ss_family == AF_INET)
                seq_printf(m, "\t\tIPv4: %pI4\n", &ipv4->sin_addr);
@@ -350,8 +397,11 @@ static int cifs_debug_data_proc_show(struct seq_file *m, void *v)
                        atomic_read(&server->smbd_conn->mr_used_count));
 skip_rdma:
 #endif
-               seq_printf(m, "\nNumber of credits: %d Dialect 0x%x",
-                       server->credits,  server->dialect);
+               seq_printf(m, "\nNumber of credits: %d,%d,%d Dialect 0x%x",
+                       server->credits,
+                       server->echo_credits,
+                       server->oplock_credits,
+                       server->dialect);
                if (server->compress_algorithm == SMB3_COMPRESS_LZNT1)
                        seq_printf(m, " COMPRESS_LZNT1");
                else if (server->compress_algorithm == SMB3_COMPRESS_LZ77)
@@ -415,8 +465,12 @@ skip_rdma:
 
                        /* dump session id helpful for use with network trace */
                        seq_printf(m, " SessionId: 0x%llx", ses->Suid);
-                       if (ses->session_flags & SMB2_SESSION_FLAG_ENCRYPT_DATA)
+                       if (ses->session_flags & SMB2_SESSION_FLAG_ENCRYPT_DATA) {
                                seq_puts(m, " encrypted");
+                               /* can help in debugging to show encryption type */
+                               if (server->cipher_type == SMB2_ENCRYPTION_AES256_GCM)
+                                       seq_puts(m, "(gcm256)");
+                       }
                        if (ses->sign)
                                seq_puts(m, " signed");
 
similarity index 100%
rename from fs/cifs/cifs_swn.c
rename to fs/smb/client/cifs_swn.c
similarity index 100%
rename from fs/cifs/cifs_swn.h
rename to fs/smb/client/cifs_swn.h
similarity index 100%
rename from fs/cifs/cifsacl.c
rename to fs/smb/client/cifsacl.c
similarity index 100%
rename from fs/cifs/cifsacl.h
rename to fs/smb/client/cifsacl.h
similarity index 99%
rename from fs/cifs/cifsencrypt.c
rename to fs/smb/client/cifsencrypt.c
index 357bd27..ef4c2e3 100644 (file)
@@ -21,7 +21,7 @@
 #include <linux/random.h>
 #include <linux/highmem.h>
 #include <linux/fips.h>
-#include "../smbfs_common/arc4.h"
+#include "../common/arc4.h"
 #include <crypto/aead.h>
 
 /*
similarity index 98%
rename from fs/cifs/cifsfs.c
rename to fs/smb/client/cifsfs.c
index 32f7c81..4f4492e 100644 (file)
@@ -246,7 +246,7 @@ cifs_read_super(struct super_block *sb)
        if (cifs_sb->ctx->rasize)
                sb->s_bdi->ra_pages = cifs_sb->ctx->rasize / PAGE_SIZE;
        else
-               sb->s_bdi->ra_pages = cifs_sb->ctx->rsize / PAGE_SIZE;
+               sb->s_bdi->ra_pages = 2 * (cifs_sb->ctx->rsize / PAGE_SIZE);
 
        sb->s_blocksize = CIFS_MAX_MSGSIZE;
        sb->s_blocksize_bits = 14;      /* default 2**14 = CIFS_MAX_MSGSIZE */
@@ -744,6 +744,7 @@ static void cifs_umount_begin(struct super_block *sb)
        spin_unlock(&tcon->tc_lock);
        spin_unlock(&cifs_tcp_ses_lock);
 
+       cifs_close_all_deferred_files(tcon);
        /* cancel_brl_requests(tcon); */ /* BB mark all brl mids as exiting */
        /* cancel_notify_requests(tcon); */
        if (tcon->ses && tcon->ses->server) {
@@ -759,6 +760,20 @@ static void cifs_umount_begin(struct super_block *sb)
        return;
 }
 
+static int cifs_freeze(struct super_block *sb)
+{
+       struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
+       struct cifs_tcon *tcon;
+
+       if (cifs_sb == NULL)
+               return 0;
+
+       tcon = cifs_sb_master_tcon(cifs_sb);
+
+       cifs_close_all_deferred_files(tcon);
+       return 0;
+}
+
 #ifdef CONFIG_CIFS_STATS2
 static int cifs_show_stats(struct seq_file *s, struct dentry *root)
 {
@@ -797,6 +812,7 @@ static const struct super_operations cifs_super_ops = {
        as opens */
        .show_options = cifs_show_options,
        .umount_begin   = cifs_umount_begin,
+       .freeze_fs      = cifs_freeze,
 #ifdef CONFIG_CIFS_STATS2
        .show_stats = cifs_show_stats,
 #endif
@@ -1360,7 +1376,7 @@ const struct file_operations cifs_file_ops = {
        .fsync = cifs_fsync,
        .flush = cifs_flush,
        .mmap  = cifs_file_mmap,
-       .splice_read = cifs_splice_read,
+       .splice_read = filemap_splice_read,
        .splice_write = iter_file_splice_write,
        .llseek = cifs_llseek,
        .unlocked_ioctl = cifs_ioctl,
@@ -1380,7 +1396,7 @@ const struct file_operations cifs_file_strict_ops = {
        .fsync = cifs_strict_fsync,
        .flush = cifs_flush,
        .mmap = cifs_file_strict_mmap,
-       .splice_read = cifs_splice_read,
+       .splice_read = filemap_splice_read,
        .splice_write = iter_file_splice_write,
        .llseek = cifs_llseek,
        .unlocked_ioctl = cifs_ioctl,
@@ -1400,7 +1416,7 @@ const struct file_operations cifs_file_direct_ops = {
        .fsync = cifs_fsync,
        .flush = cifs_flush,
        .mmap = cifs_file_mmap,
-       .splice_read = direct_splice_read,
+       .splice_read = copy_splice_read,
        .splice_write = iter_file_splice_write,
        .unlocked_ioctl  = cifs_ioctl,
        .copy_file_range = cifs_copy_file_range,
@@ -1418,7 +1434,7 @@ const struct file_operations cifs_file_nobrl_ops = {
        .fsync = cifs_fsync,
        .flush = cifs_flush,
        .mmap  = cifs_file_mmap,
-       .splice_read = cifs_splice_read,
+       .splice_read = filemap_splice_read,
        .splice_write = iter_file_splice_write,
        .llseek = cifs_llseek,
        .unlocked_ioctl = cifs_ioctl,
@@ -1436,7 +1452,7 @@ const struct file_operations cifs_file_strict_nobrl_ops = {
        .fsync = cifs_strict_fsync,
        .flush = cifs_flush,
        .mmap = cifs_file_strict_mmap,
-       .splice_read = cifs_splice_read,
+       .splice_read = filemap_splice_read,
        .splice_write = iter_file_splice_write,
        .llseek = cifs_llseek,
        .unlocked_ioctl = cifs_ioctl,
@@ -1454,7 +1470,7 @@ const struct file_operations cifs_file_direct_nobrl_ops = {
        .fsync = cifs_fsync,
        .flush = cifs_flush,
        .mmap = cifs_file_mmap,
-       .splice_read = direct_splice_read,
+       .splice_read = copy_splice_read,
        .splice_write = iter_file_splice_write,
        .unlocked_ioctl  = cifs_ioctl,
        .copy_file_range = cifs_copy_file_range,
similarity index 98%
rename from fs/cifs/cifsfs.h
rename to fs/smb/client/cifsfs.h
index 74cd6fa..d7274ee 100644 (file)
@@ -100,9 +100,6 @@ extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
 extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
 extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from);
 extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
-extern ssize_t cifs_splice_read(struct file *in, loff_t *ppos,
-                               struct pipe_inode_info *pipe, size_t len,
-                               unsigned int flags);
 extern int cifs_flock(struct file *pfile, int cmd, struct file_lock *plock);
 extern int cifs_lock(struct file *, int, struct file_lock *);
 extern int cifs_fsync(struct file *, loff_t, loff_t, int);
similarity index 98%
rename from fs/cifs/cifsglob.h
rename to fs/smb/client/cifsglob.h
index 414685c..b212a4e 100644 (file)
@@ -24,7 +24,7 @@
 #include "cifsacl.h"
 #include <crypto/internal/hash.h>
 #include <uapi/linux/cifs/cifs_mount.h>
-#include "../smbfs_common/smb2pdu.h"
+#include "../common/smb2pdu.h"
 #include "smb2pdu.h"
 #include <linux/filelock.h>
 
@@ -424,8 +424,8 @@ struct smb_version_operations {
        /* check for STATUS_NETWORK_SESSION_EXPIRED */
        bool (*is_session_expired)(char *);
        /* send oplock break response */
-       int (*oplock_response)(struct cifs_tcon *, struct cifs_fid *,
-                              struct cifsInodeInfo *);
+       int (*oplock_response)(struct cifs_tcon *tcon, __u64 persistent_fid, __u64 volatile_fid,
+                       __u16 net_fid, struct cifsInodeInfo *cifs_inode);
        /* query remote filesystem */
        int (*queryfs)(const unsigned int, struct cifs_tcon *,
                       struct cifs_sb_info *, struct kstatfs *);
@@ -970,43 +970,6 @@ release_iface(struct kref *ref)
        kfree(iface);
 }
 
-/*
- * compare two interfaces a and b
- * return 0 if everything matches.
- * return 1 if a has higher link speed, or rdma capable, or rss capable
- * return -1 otherwise.
- */
-static inline int
-iface_cmp(struct cifs_server_iface *a, struct cifs_server_iface *b)
-{
-       int cmp_ret = 0;
-
-       WARN_ON(!a || !b);
-       if (a->speed == b->speed) {
-               if (a->rdma_capable == b->rdma_capable) {
-                       if (a->rss_capable == b->rss_capable) {
-                               cmp_ret = memcmp(&a->sockaddr, &b->sockaddr,
-                                                sizeof(a->sockaddr));
-                               if (!cmp_ret)
-                                       return 0;
-                               else if (cmp_ret > 0)
-                                       return 1;
-                               else
-                                       return -1;
-                       } else if (a->rss_capable > b->rss_capable)
-                               return 1;
-                       else
-                               return -1;
-               } else if (a->rdma_capable > b->rdma_capable)
-                       return 1;
-               else
-                       return -1;
-       } else if (a->speed > b->speed)
-               return 1;
-       else
-               return -1;
-}
-
 struct cifs_chan {
        unsigned int in_reconnect : 1; /* if session setup in progress for this channel */
        struct TCP_Server_Info *server;
similarity index 99%
rename from fs/cifs/cifspdu.h
rename to fs/smb/client/cifspdu.h
index 445e3ea..e17222f 100644 (file)
@@ -11,7 +11,7 @@
 
 #include <net/sock.h>
 #include <asm/unaligned.h>
-#include "../smbfs_common/smbfsctl.h"
+#include "../common/smbfsctl.h"
 
 #define CIFS_PROT   0
 #define POSIX_PROT  (CIFS_PROT+1)
similarity index 99%
rename from fs/cifs/cifsproto.h
rename to fs/smb/client/cifsproto.h
index c1c7049..d127ade 100644 (file)
@@ -87,6 +87,7 @@ extern int cifs_handle_standard(struct TCP_Server_Info *server,
                                struct mid_q_entry *mid);
 extern int smb3_parse_devname(const char *devname, struct smb3_fs_context *ctx);
 extern int smb3_parse_opt(const char *options, const char *key, char **val);
+extern int cifs_ipaddr_cmp(struct sockaddr *srcaddr, struct sockaddr *rhs);
 extern bool cifs_match_ipaddr(struct sockaddr *srcaddr, struct sockaddr *rhs);
 extern int cifs_discard_remaining_data(struct TCP_Server_Info *server);
 extern int cifs_call_async(struct TCP_Server_Info *server,
similarity index 100%
rename from fs/cifs/cifsroot.c
rename to fs/smb/client/cifsroot.c
similarity index 100%
rename from fs/cifs/cifssmb.c
rename to fs/smb/client/cifssmb.c
similarity index 98%
rename from fs/cifs/connect.c
rename to fs/smb/client/connect.c
index eeeed6f..9d16626 100644 (file)
@@ -1288,6 +1288,56 @@ next_pdu:
        module_put_and_kthread_exit(0);
 }
 
+int
+cifs_ipaddr_cmp(struct sockaddr *srcaddr, struct sockaddr *rhs)
+{
+       struct sockaddr_in *saddr4 = (struct sockaddr_in *)srcaddr;
+       struct sockaddr_in *vaddr4 = (struct sockaddr_in *)rhs;
+       struct sockaddr_in6 *saddr6 = (struct sockaddr_in6 *)srcaddr;
+       struct sockaddr_in6 *vaddr6 = (struct sockaddr_in6 *)rhs;
+
+       switch (srcaddr->sa_family) {
+       case AF_UNSPEC:
+               switch (rhs->sa_family) {
+               case AF_UNSPEC:
+                       return 0;
+               case AF_INET:
+               case AF_INET6:
+                       return 1;
+               default:
+                       return -1;
+               }
+       case AF_INET: {
+               switch (rhs->sa_family) {
+               case AF_UNSPEC:
+                       return -1;
+               case AF_INET:
+                       return memcmp(saddr4, vaddr4,
+                                     sizeof(struct sockaddr_in));
+               case AF_INET6:
+                       return 1;
+               default:
+                       return -1;
+               }
+       }
+       case AF_INET6: {
+               switch (rhs->sa_family) {
+               case AF_UNSPEC:
+               case AF_INET:
+                       return -1;
+               case AF_INET6:
+                       return memcmp(saddr6,
+                                     vaddr6,
+                                     sizeof(struct sockaddr_in6));
+               default:
+                       return -1;
+               }
+       }
+       default:
+               return -1; /* don't expect to be here */
+       }
+}
+
 /*
  * Returns true if srcaddr isn't specified and rhs isn't specified, or
  * if srcaddr is specified and matches the IP address of the rhs argument
@@ -2709,6 +2759,13 @@ cifs_match_super(struct super_block *sb, void *data)
 
        spin_lock(&cifs_tcp_ses_lock);
        cifs_sb = CIFS_SB(sb);
+
+       /* We do not want to use a superblock that has been shut down */
+       if (CIFS_MOUNT_SHUTDOWN & cifs_sb->mnt_cifs_flags) {
+               spin_unlock(&cifs_tcp_ses_lock);
+               return 0;
+       }
+
        tlink = cifs_get_tlink(cifs_sb_master_tlink(cifs_sb));
        if (tlink == NULL) {
                /* can not match superblock if tlink were ever null */
@@ -4079,16 +4136,17 @@ int cifs_tree_connect(const unsigned int xid, struct cifs_tcon *tcon, const stru
 
        /* only send once per connect */
        spin_lock(&tcon->tc_lock);
+       if (tcon->status == TID_GOOD) {
+               spin_unlock(&tcon->tc_lock);
+               return 0;
+       }
+
        if (tcon->status != TID_NEW &&
            tcon->status != TID_NEED_TCON) {
                spin_unlock(&tcon->tc_lock);
                return -EHOSTDOWN;
        }
 
-       if (tcon->status == TID_GOOD) {
-               spin_unlock(&tcon->tc_lock);
-               return 0;
-       }
        tcon->status = TID_IN_TCON;
        spin_unlock(&tcon->tc_lock);
 
similarity index 99%
rename from fs/cifs/dfs.c
rename to fs/smb/client/dfs.c
index a93dbca..2390b2f 100644 (file)
@@ -303,7 +303,7 @@ int dfs_mount_share(struct cifs_mount_ctx *mnt_ctx, bool *isdfs)
        if (!nodfs) {
                rc = dfs_get_referral(mnt_ctx, ctx->UNC + 1, NULL, NULL);
                if (rc) {
-                       if (rc != -ENOENT && rc != -EOPNOTSUPP)
+                       if (rc != -ENOENT && rc != -EOPNOTSUPP && rc != -EIO)
                                goto out;
                        nodfs = true;
                }
@@ -575,16 +575,17 @@ int cifs_tree_connect(const unsigned int xid, struct cifs_tcon *tcon, const stru
 
        /* only send once per connect */
        spin_lock(&tcon->tc_lock);
+       if (tcon->status == TID_GOOD) {
+               spin_unlock(&tcon->tc_lock);
+               return 0;
+       }
+
        if (tcon->status != TID_NEW &&
            tcon->status != TID_NEED_TCON) {
                spin_unlock(&tcon->tc_lock);
                return -EHOSTDOWN;
        }
 
-       if (tcon->status == TID_GOOD) {
-               spin_unlock(&tcon->tc_lock);
-               return 0;
-       }
        tcon->status = TID_IN_TCON;
        spin_unlock(&tcon->tc_lock);
 
similarity index 100%
rename from fs/cifs/dfs.h
rename to fs/smb/client/dfs.h
similarity index 100%
rename from fs/cifs/dir.c
rename to fs/smb/client/dir.c
similarity index 100%
rename from fs/cifs/export.c
rename to fs/smb/client/export.c
similarity index 99%
rename from fs/cifs/file.c
rename to fs/smb/client/file.c
index c5fcefd..f30f6dd 100644 (file)
@@ -3353,9 +3353,10 @@ static size_t cifs_limit_bvec_subset(const struct iov_iter *iter, size_t max_siz
        while (n && ix < nbv) {
                len = min3(n, bvecs[ix].bv_len - skip, max_size);
                span += len;
+               max_size -= len;
                nsegs++;
                ix++;
-               if (span >= max_size || nsegs >= max_segs)
+               if (max_size == 0 || nsegs >= max_segs)
                        break;
                skip = 0;
                n -= len;
@@ -4881,9 +4882,9 @@ void cifs_oplock_break(struct work_struct *work)
        struct cifs_tcon *tcon = tlink_tcon(cfile->tlink);
        struct TCP_Server_Info *server = tcon->ses->server;
        int rc = 0;
-       bool purge_cache = false;
-       struct cifs_deferred_close *dclose;
-       bool is_deferred = false;
+       bool purge_cache = false, oplock_break_cancelled;
+       __u64 persistent_fid, volatile_fid;
+       __u16 net_fid;
 
        wait_on_bit(&cinode->flags, CIFS_INODE_PENDING_WRITERS,
                        TASK_UNINTERRUPTIBLE);
@@ -4924,28 +4925,32 @@ oplock_break_ack:
         * file handles but cached, then schedule deferred close immediately.
         * So, new open will not use cached handle.
         */
-       spin_lock(&CIFS_I(inode)->deferred_lock);
-       is_deferred = cifs_is_deferred_close(cfile, &dclose);
-       spin_unlock(&CIFS_I(inode)->deferred_lock);
 
-       if (!CIFS_CACHE_HANDLE(cinode) && is_deferred &&
-                       cfile->deferred_close_scheduled && delayed_work_pending(&cfile->deferred)) {
+       if (!CIFS_CACHE_HANDLE(cinode) && !list_empty(&cinode->deferred_closes))
                cifs_close_deferred_file(cinode);
-       }
 
+       persistent_fid = cfile->fid.persistent_fid;
+       volatile_fid = cfile->fid.volatile_fid;
+       net_fid = cfile->fid.netfid;
+       oplock_break_cancelled = cfile->oplock_break_cancelled;
+
+       _cifsFileInfo_put(cfile, false /* do not wait for ourself */, false);
        /*
         * releasing stale oplock after recent reconnect of smb session using
         * a now incorrect file handle is not a data integrity issue but do
         * not bother sending an oplock release if session to server still is
         * disconnected since oplock already released by the server
         */
-       if (!cfile->oplock_break_cancelled) {
-               rc = tcon->ses->server->ops->oplock_response(tcon, &cfile->fid,
-                                                            cinode);
-               cifs_dbg(FYI, "Oplock release rc = %d\n", rc);
+       if (!oplock_break_cancelled) {
+               /* check for server null since can race with kill_sb calling tree disconnect */
+               if (tcon->ses && tcon->ses->server) {
+                       rc = tcon->ses->server->ops->oplock_response(tcon, persistent_fid,
+                               volatile_fid, net_fid, cinode);
+                       cifs_dbg(FYI, "Oplock release rc = %d\n", rc);
+               } else
+                       pr_warn_once("lease break not sent for unmounted share\n");
        }
 
-       _cifsFileInfo_put(cfile, false /* do not wait for ourself */, false);
        cifs_done_oplock_break(cinode);
 }
 
@@ -5078,19 +5083,3 @@ const struct address_space_operations cifs_addr_ops_smallbuf = {
        .launder_folio = cifs_launder_folio,
        .migrate_folio = filemap_migrate_folio,
 };
-
-/*
- * Splice data from a file into a pipe.
- */
-ssize_t cifs_splice_read(struct file *in, loff_t *ppos,
-                        struct pipe_inode_info *pipe, size_t len,
-                        unsigned int flags)
-{
-       if (unlikely(*ppos >= file_inode(in)->i_sb->s_maxbytes))
-               return 0;
-       if (unlikely(!len))
-               return 0;
-       if (in->f_flags & O_DIRECT)
-               return direct_splice_read(in, ppos, pipe, len, flags);
-       return filemap_splice_read(in, ppos, pipe, len, flags);
-}
similarity index 99%
rename from fs/cifs/fs_context.c
rename to fs/smb/client/fs_context.c
index ace11a1..1bda756 100644 (file)
@@ -904,6 +904,14 @@ static int smb3_fs_context_parse_param(struct fs_context *fc,
                        ctx->sfu_remap = false; /* disable SFU mapping */
                }
                break;
+       case Opt_mapchars:
+               if (result.negated)
+                       ctx->sfu_remap = false;
+               else {
+                       ctx->sfu_remap = true;
+                       ctx->remap = false; /* disable SFM (mapposix) mapping */
+               }
+               break;
        case Opt_user_xattr:
                if (result.negated)
                        ctx->no_xattr = 1;
similarity index 100%
rename from fs/cifs/fscache.c
rename to fs/smb/client/fscache.c
similarity index 100%
rename from fs/cifs/fscache.h
rename to fs/smb/client/fscache.h
similarity index 100%
rename from fs/cifs/inode.c
rename to fs/smb/client/inode.c
similarity index 98%
rename from fs/cifs/ioctl.c
rename to fs/smb/client/ioctl.c
index cb3be58..fff092b 100644 (file)
@@ -321,7 +321,11 @@ long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg)
        struct tcon_link *tlink;
        struct cifs_sb_info *cifs_sb;
        __u64   ExtAttrBits = 0;
+#ifdef CONFIG_CIFS_POSIX
+#ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY
        __u64   caps;
+#endif /* CONFIG_CIFS_ALLOW_INSECURE_LEGACY */
+#endif /* CONFIG_CIFS_POSIX */
 
        xid = get_xid();
 
@@ -331,9 +335,9 @@ long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg)
                        if (pSMBFile == NULL)
                                break;
                        tcon = tlink_tcon(pSMBFile->tlink);
-                       caps = le64_to_cpu(tcon->fsUnixInfo.Capability);
 #ifdef CONFIG_CIFS_POSIX
 #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY
+                       caps = le64_to_cpu(tcon->fsUnixInfo.Capability);
                        if (CIFS_UNIX_EXTATTR_CAP & caps) {
                                __u64   ExtAttrMask = 0;
                                rc = CIFSGetExtAttr(xid, tcon,
similarity index 100%
rename from fs/cifs/link.c
rename to fs/smb/client/link.c
similarity index 100%
rename from fs/cifs/misc.c
rename to fs/smb/client/misc.c
similarity index 100%
rename from fs/cifs/netlink.c
rename to fs/smb/client/netlink.c
similarity index 100%
rename from fs/cifs/netlink.h
rename to fs/smb/client/netlink.h
similarity index 100%
rename from fs/cifs/netmisc.c
rename to fs/smb/client/netmisc.c
similarity index 100%
rename from fs/cifs/nterr.c
rename to fs/smb/client/nterr.c
similarity index 100%
rename from fs/cifs/nterr.h
rename to fs/smb/client/nterr.h
similarity index 100%
rename from fs/cifs/ntlmssp.h
rename to fs/smb/client/ntlmssp.h
similarity index 100%
rename from fs/cifs/readdir.c
rename to fs/smb/client/readdir.c
similarity index 100%
rename from fs/cifs/sess.c
rename to fs/smb/client/sess.c
similarity index 99%
rename from fs/cifs/smb1ops.c
rename to fs/smb/client/smb1ops.c
index abda614..7d1b3fc 100644 (file)
@@ -897,12 +897,11 @@ cifs_close_dir(const unsigned int xid, struct cifs_tcon *tcon,
 }
 
 static int
-cifs_oplock_response(struct cifs_tcon *tcon, struct cifs_fid *fid,
-                    struct cifsInodeInfo *cinode)
+cifs_oplock_response(struct cifs_tcon *tcon, __u64 persistent_fid,
+               __u64 volatile_fid, __u16 net_fid, struct cifsInodeInfo *cinode)
 {
-       return CIFSSMBLock(0, tcon, fid->netfid, current->tgid, 0, 0, 0, 0,
-                          LOCKING_ANDX_OPLOCK_RELEASE, false,
-                          CIFS_CACHE_READ(cinode) ? 1 : 0);
+       return CIFSSMBLock(0, tcon, net_fid, current->tgid, 0, 0, 0, 0,
+                          LOCKING_ANDX_OPLOCK_RELEASE, false, CIFS_CACHE_READ(cinode) ? 1 : 0);
 }
 
 static int
similarity index 100%
rename from fs/cifs/smb2file.c
rename to fs/smb/client/smb2file.c
similarity index 100%
rename from fs/cifs/smb2glob.h
rename to fs/smb/client/smb2glob.h
similarity index 100%
rename from fs/cifs/smb2misc.c
rename to fs/smb/client/smb2misc.c
similarity index 99%
rename from fs/cifs/smb2ops.c
rename to fs/smb/client/smb2ops.c
index a817582..a8bb9d0 100644 (file)
@@ -34,6 +34,8 @@ static int
 change_conf(struct TCP_Server_Info *server)
 {
        server->credits += server->echo_credits + server->oplock_credits;
+       if (server->credits > server->max_credits)
+               server->credits = server->max_credits;
        server->oplock_credits = server->echo_credits = 0;
        switch (server->credits) {
        case 0:
@@ -91,6 +93,7 @@ smb2_add_credits(struct TCP_Server_Info *server,
                                            server->conn_id, server->hostname, *val,
                                            add, server->in_flight);
        }
+       WARN_ON_ONCE(server->in_flight == 0);
        server->in_flight--;
        if (server->in_flight == 0 &&
           ((optype & CIFS_OP_MASK) != CIFS_NEG_OP) &&
@@ -510,6 +513,43 @@ smb3_negotiate_rsize(struct cifs_tcon *tcon, struct smb3_fs_context *ctx)
        return rsize;
 }
 
+/*
+ * compare two interfaces a and b
+ * return 0 if everything matches.
+ * return 1 if a is rdma capable, or rss capable, or has higher link speed
+ * return -1 otherwise.
+ */
+static int
+iface_cmp(struct cifs_server_iface *a, struct cifs_server_iface *b)
+{
+       int cmp_ret = 0;
+
+       WARN_ON(!a || !b);
+       if (a->rdma_capable == b->rdma_capable) {
+               if (a->rss_capable == b->rss_capable) {
+                       if (a->speed == b->speed) {
+                               cmp_ret = cifs_ipaddr_cmp((struct sockaddr *) &a->sockaddr,
+                                                         (struct sockaddr *) &b->sockaddr);
+                               if (!cmp_ret)
+                                       return 0;
+                               else if (cmp_ret > 0)
+                                       return 1;
+                               else
+                                       return -1;
+                       } else if (a->speed > b->speed)
+                               return 1;
+                       else
+                               return -1;
+               } else if (a->rss_capable > b->rss_capable)
+                       return 1;
+               else
+                       return -1;
+       } else if (a->rdma_capable > b->rdma_capable)
+               return 1;
+       else
+               return -1;
+}
+
 static int
 parse_server_interfaces(struct network_interface_info_ioctl_rsp *buf,
                        size_t buf_len, struct cifs_ses *ses, bool in_mount)
@@ -618,7 +658,6 @@ parse_server_interfaces(struct network_interface_info_ioctl_rsp *buf,
                 * Add a new one instead
                 */
                spin_lock(&ses->iface_lock);
-               iface = niface = NULL;
                list_for_each_entry_safe(iface, niface, &ses->iface_list,
                                         iface_head) {
                        ret = iface_cmp(iface, &tmp_iface);
@@ -1682,7 +1721,7 @@ smb2_copychunk_range(const unsigned int xid,
                pcchunk->SourceOffset = cpu_to_le64(src_off);
                pcchunk->TargetOffset = cpu_to_le64(dest_off);
                pcchunk->Length =
-                       cpu_to_le32(min_t(u32, len, tcon->max_bytes_chunk));
+                       cpu_to_le32(min_t(u64, len, tcon->max_bytes_chunk));
 
                /* Request server copy to target from src identified by key */
                kfree(retbuf);
@@ -2383,15 +2422,14 @@ smb2_is_network_name_deleted(char *buf, struct TCP_Server_Info *server)
 }
 
 static int
-smb2_oplock_response(struct cifs_tcon *tcon, struct cifs_fid *fid,
-                    struct cifsInodeInfo *cinode)
+smb2_oplock_response(struct cifs_tcon *tcon, __u64 persistent_fid,
+               __u64 volatile_fid, __u16 net_fid, struct cifsInodeInfo *cinode)
 {
        if (tcon->ses->server->capabilities & SMB2_GLOBAL_CAP_LEASING)
                return SMB2_lease_break(0, tcon, cinode->lease_key,
                                        smb2_get_lease_state(cinode));
 
-       return SMB2_oplock_break(0, tcon, fid->persistent_fid,
-                                fid->volatile_fid,
+       return SMB2_oplock_break(0, tcon, persistent_fid, volatile_fid,
                                 CIFS_CACHE_READ(cinode) ? 1 : 0);
 }
 
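
iface_cmp() moves out of the shared header here and, per the comments above, now ranks interfaces by rdma capability, then rss capability, then link speed, breaking ties with cifs_ipaddr_cmp() (added in connect.c above) rather than a raw memcmp of the whole sockaddr_storage. A toy model of the resulting ordering; the demo_* names are invented here and in_addr stands in for the full sockaddr handling.

#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>

struct demo_iface {
        int rdma, rss;
        unsigned long speed;
        struct in_addr addr;
};

/* rdma beats rss beats link speed; ties fall through to the address */
static int demo_iface_cmp(const struct demo_iface *a, const struct demo_iface *b)
{
        if (a->rdma != b->rdma)
                return a->rdma > b->rdma ? 1 : -1;
        if (a->rss != b->rss)
                return a->rss > b->rss ? 1 : -1;
        if (a->speed != b->speed)
                return a->speed > b->speed ? 1 : -1;
        return memcmp(&a->addr, &b->addr, sizeof(a->addr));
}

int main(void)
{
        struct demo_iface a = { .rdma = 1, .speed = 1000 };
        struct demo_iface b = { .rss = 1, .speed = 10000 };

        inet_pton(AF_INET, "192.0.2.1", &a.addr);
        inet_pton(AF_INET, "192.0.2.2", &b.addr);
        /* the rdma-capable interface ranks above the faster rdma-less one */
        printf("%d\n", demo_iface_cmp(&a, &b));    /* 1 */
        return 0;
}
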
similarity index 99%
rename from fs/cifs/smb2pdu.c
rename to fs/smb/client/smb2pdu.c
index e33ca0d..17fe212 100644 (file)
@@ -1305,7 +1305,12 @@ SMB2_sess_alloc_buffer(struct SMB2_sess_data *sess_data)
        }
 
        /* enough to enable echos and oplocks and one max size write */
-       req->hdr.CreditRequest = cpu_to_le16(130);
+       if (server->credits >= server->max_credits)
+               req->hdr.CreditRequest = cpu_to_le16(0);
+       else
+               req->hdr.CreditRequest = cpu_to_le16(
+                       min_t(int, server->max_credits -
+                             server->credits, 130));
 
        /* only one of SMB2 signing flags may be set in SMB2 request */
        if (server->sign)
@@ -1899,7 +1904,12 @@ SMB2_tcon(const unsigned int xid, struct cifs_ses *ses, const char *tree,
        rqst.rq_nvec = 2;
 
        /* Need 64 for max size write so ask for more in case not there yet */
-       req->hdr.CreditRequest = cpu_to_le16(64);
+       if (server->credits >= server->max_credits)
+               req->hdr.CreditRequest = cpu_to_le16(0);
+       else
+               req->hdr.CreditRequest = cpu_to_le16(
+                       min_t(int, server->max_credits -
+                             server->credits, 64));
 
        rc = cifs_send_recv(xid, ses, server,
                            &rqst, &resp_buftype, flags, &rsp_iov);
@@ -1947,6 +1957,9 @@ SMB2_tcon(const unsigned int xid, struct cifs_ses *ses, const char *tree,
        init_copy_chunk_defaults(tcon);
        if (server->ops->validate_negotiate)
                rc = server->ops->validate_negotiate(xid, tcon);
+       if (rc == 0) /* See MS-SMB2 2.2.10 and 3.2.5.5 */
+               if (tcon->share_flags & SMB2_SHAREFLAG_ISOLATED_TRANSPORT)
+                       server->nosharesock = true;
 tcon_exit:
 
        free_rsp_buf(resp_buftype, rsp);
@@ -3722,7 +3735,7 @@ SMB2_change_notify(const unsigned int xid, struct cifs_tcon *tcon,
                if (*out_data == NULL) {
                        rc = -ENOMEM;
                        goto cnotify_exit;
-               } else
+               } else if (plen)
                        *plen = le32_to_cpu(smb_rsp->OutputBufferLength);
        }
 
@@ -4224,6 +4237,7 @@ smb2_async_readv(struct cifs_readdata *rdata)
        struct TCP_Server_Info *server;
        struct cifs_tcon *tcon = tlink_tcon(rdata->cfile->tlink);
        unsigned int total_len;
+       int credit_request;
 
        cifs_dbg(FYI, "%s: offset=%llu bytes=%u\n",
                 __func__, rdata->offset, rdata->bytes);
@@ -4255,7 +4269,13 @@ smb2_async_readv(struct cifs_readdata *rdata)
        if (rdata->credits.value > 0) {
                shdr->CreditCharge = cpu_to_le16(DIV_ROUND_UP(rdata->bytes,
                                                SMB2_MAX_BUFFER_SIZE));
-               shdr->CreditRequest = cpu_to_le16(le16_to_cpu(shdr->CreditCharge) + 8);
+               credit_request = le16_to_cpu(shdr->CreditCharge) + 8;
+               if (server->credits >= server->max_credits)
+                       shdr->CreditRequest = cpu_to_le16(0);
+               else
+                       shdr->CreditRequest = cpu_to_le16(
+                               min_t(int, server->max_credits -
+                                               server->credits, credit_request));
 
                rc = adjust_credits(server, &rdata->credits, rdata->bytes);
                if (rc)
@@ -4465,6 +4485,7 @@ smb2_async_writev(struct cifs_writedata *wdata,
        unsigned int total_len;
        struct cifs_io_parms _io_parms;
        struct cifs_io_parms *io_parms = NULL;
+       int credit_request;
 
        if (!wdata->server)
                server = wdata->server = cifs_pick_channel(tcon->ses);
@@ -4569,7 +4590,13 @@ smb2_async_writev(struct cifs_writedata *wdata,
        if (wdata->credits.value > 0) {
                shdr->CreditCharge = cpu_to_le16(DIV_ROUND_UP(wdata->bytes,
                                                    SMB2_MAX_BUFFER_SIZE));
-               shdr->CreditRequest = cpu_to_le16(le16_to_cpu(shdr->CreditCharge) + 8);
+               credit_request = le16_to_cpu(shdr->CreditCharge) + 8;
+               if (server->credits >= server->max_credits)
+                       shdr->CreditRequest = cpu_to_le16(0);
+               else
+                       shdr->CreditRequest = cpu_to_le16(
+                               min_t(int, server->max_credits -
+                                               server->credits, credit_request));
 
                rc = adjust_credits(server, &wdata->credits, io_parms->length);
                if (rc)
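
The CreditRequest changes above all follow one pattern: never ask the server for more credits than would take the connection past its configured max_credits, and ask for none once that ceiling is reached. Restated as a small hypothetical helper; clamp_credit_request() is invented here and is not part of the patch.

#include <stdio.h>

static unsigned short clamp_credit_request(int max_credits, int credits, int wanted)
{
        int room = max_credits - credits;

        if (room <= 0)
                return 0;
        return (unsigned short)(room < wanted ? room : wanted);
}

int main(void)
{
        printf("%u\n", clamp_credit_request(512, 500, 130));  /* 12  */
        printf("%u\n", clamp_credit_request(512, 512, 130));  /* 0   */
        printf("%u\n", clamp_credit_request(512,  10, 130));  /* 130 */
        return 0;
}
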
similarity index 100%
rename from fs/cifs/smb2pdu.h
rename to fs/smb/client/smb2pdu.h
similarity index 98%
rename from fs/cifs/smbencrypt.c
rename to fs/smb/client/smbencrypt.c
index 4a04877..f0ce264 100644 (file)
@@ -24,7 +24,7 @@
 #include "cifsglob.h"
 #include "cifs_debug.h"
 #include "cifsproto.h"
-#include "../smbfs_common/md4.h"
+#include "../common/md4.h"
 
 #ifndef false
 #define false 0
similarity index 100%
rename from fs/cifs/smberr.h
rename to fs/smb/client/smberr.h
similarity index 100%
rename from fs/cifs/trace.c
rename to fs/smb/client/trace.c
similarity index 100%
rename from fs/cifs/trace.h
rename to fs/smb/client/trace.h
similarity index 99%
rename from fs/cifs/transport.c
rename to fs/smb/client/transport.c
index 24bdd5f..0474d0b 100644 (file)
@@ -55,7 +55,7 @@ alloc_mid(const struct smb_hdr *smb_buffer, struct TCP_Server_Info *server)
        temp->pid = current->pid;
        temp->command = cpu_to_le16(smb_buffer->Command);
        cifs_dbg(FYI, "For smb_command %d\n", smb_buffer->Command);
-       /*      do_gettimeofday(&temp->when_sent);*/ /* easier to use jiffies */
+       /* easier to use jiffies */
        /* when mid allocated can be before when sent */
        temp->when_alloc = jiffies;
        temp->server = server;
similarity index 100%
rename from fs/cifs/unc.c
rename to fs/smb/client/unc.c
similarity index 100%
rename from fs/cifs/winucase.c
rename to fs/smb/client/winucase.c
similarity index 100%
rename from fs/cifs/xattr.c
rename to fs/smb/client/xattr.c
similarity index 59%
rename from fs/smbfs_common/Makefile
rename to fs/smb/common/Makefile
index cafc61a..c66dbbc 100644 (file)
@@ -3,5 +3,5 @@
 # Makefile for Linux filesystem routines that are shared by client and server.
 #
 
-obj-$(CONFIG_SMBFS_COMMON) += cifs_arc4.o
-obj-$(CONFIG_SMBFS_COMMON) += cifs_md4.o
+obj-$(CONFIG_SMBFS) += cifs_arc4.o
+obj-$(CONFIG_SMBFS) += cifs_md4.o
similarity index 100%
rename from fs/smbfs_common/arc4.h
rename to fs/smb/common/arc4.h
similarity index 100%
rename from fs/smbfs_common/md4.h
rename to fs/smb/common/md4.h
similarity index 100%
rename from fs/ksmbd/Kconfig
rename to fs/smb/server/Kconfig
similarity index 100%
rename from fs/ksmbd/Makefile
rename to fs/smb/server/Makefile
similarity index 100%
rename from fs/ksmbd/asn1.c
rename to fs/smb/server/asn1.c
similarity index 100%
rename from fs/ksmbd/asn1.h
rename to fs/smb/server/asn1.h
similarity index 99%
rename from fs/ksmbd/auth.c
rename to fs/smb/server/auth.c
index df8fb07..5e5e120 100644 (file)
@@ -29,7 +29,7 @@
 #include "mgmt/user_config.h"
 #include "crypto_ctx.h"
 #include "transport_ipc.h"
-#include "../smbfs_common/arc4.h"
+#include "../common/arc4.h"
 
 /*
  * Fixed format data defining GSS header and fixed string
similarity index 100%
rename from fs/ksmbd/auth.h
rename to fs/smb/server/auth.h
similarity index 96%
rename from fs/ksmbd/connection.c
rename to fs/smb/server/connection.c
index 4ed379f..2a717d1 100644 (file)
@@ -294,6 +294,9 @@ bool ksmbd_conn_alive(struct ksmbd_conn *conn)
        return true;
 }
 
+#define SMB1_MIN_SUPPORTED_HEADER_SIZE (sizeof(struct smb_hdr))
+#define SMB2_MIN_SUPPORTED_HEADER_SIZE (sizeof(struct smb2_hdr) + 4)
+
 /**
  * ksmbd_conn_handler_loop() - session thread to listen on new smb requests
  * @p:         connection instance
@@ -350,15 +353,17 @@ int ksmbd_conn_handler_loop(void *p)
                if (pdu_size > MAX_STREAM_PROT_LEN)
                        break;
 
+               if (pdu_size < SMB1_MIN_SUPPORTED_HEADER_SIZE)
+                       break;
+
                /* 4 for rfc1002 length field */
-               size = pdu_size + 4;
+               /* 1 for implied bcc[0] */
+               size = pdu_size + 4 + 1;
                conn->request_buf = kvmalloc(size, GFP_KERNEL);
                if (!conn->request_buf)
                        break;
 
                memcpy(conn->request_buf, hdr_buf, sizeof(hdr_buf));
-               if (!ksmbd_smb_request(conn))
-                       break;
 
                /*
                 * We already read 4 bytes to find out PDU size, now
@@ -376,6 +381,15 @@ int ksmbd_conn_handler_loop(void *p)
                        continue;
                }
 
+               if (!ksmbd_smb_request(conn))
+                       break;
+
+               if (((struct smb2_hdr *)smb2_get_msg(conn->request_buf))->ProtocolId ==
+                   SMB2_PROTO_NUMBER) {
+                       if (pdu_size < SMB2_MIN_SUPPORTED_HEADER_SIZE)
+                               break;
+               }
+
                if (!default_conn_ops.process_fn) {
                        pr_err("No connection request callback\n");
                        break;
similarity index 100%
rename from fs/ksmbd/glob.h
rename to fs/smb/server/glob.h
similarity index 100%
rename from fs/ksmbd/misc.c
rename to fs/smb/server/misc.c
similarity index 100%
rename from fs/ksmbd/misc.h
rename to fs/smb/server/misc.h
similarity index 100%
rename from fs/ksmbd/ndr.c
rename to fs/smb/server/ndr.c
similarity index 100%
rename from fs/ksmbd/ndr.h
rename to fs/smb/server/ndr.h
similarity index 100%
rename from fs/ksmbd/nterr.h
rename to fs/smb/server/nterr.h
similarity index 100%
rename from fs/ksmbd/ntlmssp.h
rename to fs/smb/server/ntlmssp.h
similarity index 95%
rename from fs/ksmbd/oplock.c
rename to fs/smb/server/oplock.c
index 2e54ded..844b303 100644 (file)
@@ -157,13 +157,42 @@ static struct oplock_info *opinfo_get_list(struct ksmbd_inode *ci)
        rcu_read_lock();
        opinfo = list_first_or_null_rcu(&ci->m_op_list, struct oplock_info,
                                        op_entry);
-       if (opinfo && !atomic_inc_not_zero(&opinfo->refcount))
-               opinfo = NULL;
+       if (opinfo) {
+               if (!atomic_inc_not_zero(&opinfo->refcount))
+                       opinfo = NULL;
+               else {
+                       atomic_inc(&opinfo->conn->r_count);
+                       if (ksmbd_conn_releasing(opinfo->conn)) {
+                               atomic_dec(&opinfo->conn->r_count);
+                               atomic_dec(&opinfo->refcount);
+                               opinfo = NULL;
+                       }
+               }
+       }
+
        rcu_read_unlock();
 
        return opinfo;
 }
 
+static void opinfo_conn_put(struct oplock_info *opinfo)
+{
+       struct ksmbd_conn *conn;
+
+       if (!opinfo)
+               return;
+
+       conn = opinfo->conn;
+       /*
+        * Check the waitqueue to drop pending requests on
+        * disconnection. waitqueue_active() is safe because the
+        * condition check uses an atomic operation.
+        */
+       if (!atomic_dec_return(&conn->r_count) && waitqueue_active(&conn->r_count_q))
+               wake_up(&conn->r_count_q);
+       opinfo_put(opinfo);
+}
+
 void opinfo_put(struct oplock_info *opinfo)
 {
        if (!atomic_dec_and_test(&opinfo->refcount))
@@ -666,13 +695,6 @@ static void __smb2_oplock_break_noti(struct work_struct *wk)
 
 out:
        ksmbd_free_work_struct(work);
-       /*
-        * Checking waitqueue to dropping pending requests on
-        * disconnection. waitqueue_active is safe because it
-        * uses atomic operation for condition.
-        */
-       if (!atomic_dec_return(&conn->r_count) && waitqueue_active(&conn->r_count_q))
-               wake_up(&conn->r_count_q);
 }
 
 /**
@@ -706,7 +728,6 @@ static int smb2_oplock_break_noti(struct oplock_info *opinfo)
        work->conn = conn;
        work->sess = opinfo->sess;
 
-       atomic_inc(&conn->r_count);
        if (opinfo->op_state == OPLOCK_ACK_WAIT) {
                INIT_WORK(&work->work, __smb2_oplock_break_noti);
                ksmbd_queue_work(work);
@@ -776,13 +797,6 @@ static void __smb2_lease_break_noti(struct work_struct *wk)
 
 out:
        ksmbd_free_work_struct(work);
-       /*
-        * Checking waitqueue to dropping pending requests on
-        * disconnection. waitqueue_active is safe because it
-        * uses atomic operation for condition.
-        */
-       if (!atomic_dec_return(&conn->r_count) && waitqueue_active(&conn->r_count_q))
-               wake_up(&conn->r_count_q);
 }
 
 /**
@@ -822,7 +836,6 @@ static int smb2_lease_break_noti(struct oplock_info *opinfo)
        work->conn = conn;
        work->sess = opinfo->sess;
 
-       atomic_inc(&conn->r_count);
        if (opinfo->op_state == OPLOCK_ACK_WAIT) {
                list_for_each_safe(tmp, t, &opinfo->interim_list) {
                        struct ksmbd_work *in_work;
@@ -1144,8 +1157,10 @@ int smb_grant_oplock(struct ksmbd_work *work, int req_op_level, u64 pid,
        }
        prev_opinfo = opinfo_get_list(ci);
        if (!prev_opinfo ||
-           (prev_opinfo->level == SMB2_OPLOCK_LEVEL_NONE && lctx))
+           (prev_opinfo->level == SMB2_OPLOCK_LEVEL_NONE && lctx)) {
+               opinfo_conn_put(prev_opinfo);
                goto set_lev;
+       }
        prev_op_has_lease = prev_opinfo->is_lease;
        if (prev_op_has_lease)
                prev_op_state = prev_opinfo->o_lease->state;
@@ -1153,19 +1168,19 @@ int smb_grant_oplock(struct ksmbd_work *work, int req_op_level, u64 pid,
        if (share_ret < 0 &&
            prev_opinfo->level == SMB2_OPLOCK_LEVEL_EXCLUSIVE) {
                err = share_ret;
-               opinfo_put(prev_opinfo);
+               opinfo_conn_put(prev_opinfo);
                goto err_out;
        }
 
        if (prev_opinfo->level != SMB2_OPLOCK_LEVEL_BATCH &&
            prev_opinfo->level != SMB2_OPLOCK_LEVEL_EXCLUSIVE) {
-               opinfo_put(prev_opinfo);
+               opinfo_conn_put(prev_opinfo);
                goto op_break_not_needed;
        }
 
        list_add(&work->interim_entry, &prev_opinfo->interim_list);
        err = oplock_break(prev_opinfo, SMB2_OPLOCK_LEVEL_II);
-       opinfo_put(prev_opinfo);
+       opinfo_conn_put(prev_opinfo);
        if (err == -ENOENT)
                goto set_lev;
        /* Check all oplock was freed by close */
@@ -1228,14 +1243,14 @@ static void smb_break_all_write_oplock(struct ksmbd_work *work,
                return;
        if (brk_opinfo->level != SMB2_OPLOCK_LEVEL_BATCH &&
            brk_opinfo->level != SMB2_OPLOCK_LEVEL_EXCLUSIVE) {
-               opinfo_put(brk_opinfo);
+               opinfo_conn_put(brk_opinfo);
                return;
        }
 
        brk_opinfo->open_trunc = is_trunc;
        list_add(&work->interim_entry, &brk_opinfo->interim_list);
        oplock_break(brk_opinfo, SMB2_OPLOCK_LEVEL_II);
-       opinfo_put(brk_opinfo);
+       opinfo_conn_put(brk_opinfo);
 }
 
 /**
@@ -1263,6 +1278,13 @@ void smb_break_all_levII_oplock(struct ksmbd_work *work, struct ksmbd_file *fp,
        list_for_each_entry_rcu(brk_op, &ci->m_op_list, op_entry) {
                if (!atomic_inc_not_zero(&brk_op->refcount))
                        continue;
+
+               atomic_inc(&brk_op->conn->r_count);
+               if (ksmbd_conn_releasing(brk_op->conn)) {
+                       atomic_dec(&brk_op->conn->r_count);
+                       continue;
+               }
+
                rcu_read_unlock();
                if (brk_op->is_lease && (brk_op->o_lease->state &
                    (~(SMB2_LEASE_READ_CACHING_LE |
@@ -1292,7 +1314,7 @@ void smb_break_all_levII_oplock(struct ksmbd_work *work, struct ksmbd_file *fp,
                brk_op->open_trunc = is_trunc;
                oplock_break(brk_op, SMB2_OPLOCK_LEVEL_NONE);
 next:
-               opinfo_put(brk_op);
+               opinfo_conn_put(brk_op);
                rcu_read_lock();
        }
        rcu_read_unlock();
@@ -1393,67 +1415,50 @@ void create_lease_buf(u8 *rbuf, struct lease *lease)
  */
 struct lease_ctx_info *parse_lease_state(void *open_req)
 {
-       char *data_offset;
        struct create_context *cc;
-       unsigned int next = 0;
-       char *name;
-       bool found = false;
        struct smb2_create_req *req = (struct smb2_create_req *)open_req;
-       struct lease_ctx_info *lreq = kzalloc(sizeof(struct lease_ctx_info),
-               GFP_KERNEL);
+       struct lease_ctx_info *lreq;
+
+       cc = smb2_find_context_vals(req, SMB2_CREATE_REQUEST_LEASE, 4);
+       if (IS_ERR_OR_NULL(cc))
+               return NULL;
+
+       lreq = kzalloc(sizeof(struct lease_ctx_info), GFP_KERNEL);
        if (!lreq)
                return NULL;
 
-       data_offset = (char *)req + le32_to_cpu(req->CreateContextsOffset);
-       cc = (struct create_context *)data_offset;
-       do {
-               cc = (struct create_context *)((char *)cc + next);
-               name = le16_to_cpu(cc->NameOffset) + (char *)cc;
-               if (le16_to_cpu(cc->NameLength) != 4 ||
-                   strncmp(name, SMB2_CREATE_REQUEST_LEASE, 4)) {
-                       next = le32_to_cpu(cc->Next);
-                       continue;
-               }
-               found = true;
-               break;
-       } while (next != 0);
+       if (sizeof(struct lease_context_v2) == le32_to_cpu(cc->DataLength)) {
+               struct create_lease_v2 *lc = (struct create_lease_v2 *)cc;
 
-       if (found) {
-               if (sizeof(struct lease_context_v2) == le32_to_cpu(cc->DataLength)) {
-                       struct create_lease_v2 *lc = (struct create_lease_v2 *)cc;
-
-                       memcpy(lreq->lease_key, lc->lcontext.LeaseKey, SMB2_LEASE_KEY_SIZE);
-                       lreq->req_state = lc->lcontext.LeaseState;
-                       lreq->flags = lc->lcontext.LeaseFlags;
-                       lreq->duration = lc->lcontext.LeaseDuration;
-                       memcpy(lreq->parent_lease_key, lc->lcontext.ParentLeaseKey,
-                              SMB2_LEASE_KEY_SIZE);
-                       lreq->version = 2;
-               } else {
-                       struct create_lease *lc = (struct create_lease *)cc;
+               memcpy(lreq->lease_key, lc->lcontext.LeaseKey, SMB2_LEASE_KEY_SIZE);
+               lreq->req_state = lc->lcontext.LeaseState;
+               lreq->flags = lc->lcontext.LeaseFlags;
+               lreq->duration = lc->lcontext.LeaseDuration;
+               memcpy(lreq->parent_lease_key, lc->lcontext.ParentLeaseKey,
+                               SMB2_LEASE_KEY_SIZE);
+               lreq->version = 2;
+       } else {
+               struct create_lease *lc = (struct create_lease *)cc;
 
-                       memcpy(lreq->lease_key, lc->lcontext.LeaseKey, SMB2_LEASE_KEY_SIZE);
-                       lreq->req_state = lc->lcontext.LeaseState;
-                       lreq->flags = lc->lcontext.LeaseFlags;
-                       lreq->duration = lc->lcontext.LeaseDuration;
-                       lreq->version = 1;
-               }
-               return lreq;
+               memcpy(lreq->lease_key, lc->lcontext.LeaseKey, SMB2_LEASE_KEY_SIZE);
+               lreq->req_state = lc->lcontext.LeaseState;
+               lreq->flags = lc->lcontext.LeaseFlags;
+               lreq->duration = lc->lcontext.LeaseDuration;
+               lreq->version = 1;
        }
-
-       kfree(lreq);
-       return NULL;
+       return lreq;
 }
 
 /**
  * smb2_find_context_vals() - find a particular context info in open request
  * @open_req:  buffer containing smb2 file open(create) request
  * @tag:       context name to search for
+ * @tag_len:   the length of tag
  *
 * Return:     pointer to requested context, NULL if @tag context not found
  *             or error pointer if name length is invalid.
  */
-struct create_context *smb2_find_context_vals(void *open_req, const char *tag)
+struct create_context *smb2_find_context_vals(void *open_req, const char *tag, int tag_len)
 {
        struct create_context *cc;
        unsigned int next = 0;
@@ -1492,7 +1497,7 @@ struct create_context *smb2_find_context_vals(void *open_req, const char *tag)
                        return ERR_PTR(-EINVAL);
 
                name = (char *)cc + name_off;
-               if (memcmp(name, tag, name_len) == 0)
+               if (name_len == tag_len && !memcmp(name, tag, name_len))
                        return cc;
 
                remain_len -= next;
similarity index 99%
rename from fs/ksmbd/oplock.h
rename to fs/smb/server/oplock.h
index 0975344..4b0fe6d 100644 (file)
@@ -118,7 +118,7 @@ void create_durable_v2_rsp_buf(char *cc, struct ksmbd_file *fp);
 void create_mxac_rsp_buf(char *cc, int maximal_access);
 void create_disk_id_rsp_buf(char *cc, __u64 file_id, __u64 vol_id);
 void create_posix_rsp_buf(char *cc, struct ksmbd_file *fp);
-struct create_context *smb2_find_context_vals(void *open_req, const char *str);
+struct create_context *smb2_find_context_vals(void *open_req, const char *tag, int tag_len);
 struct oplock_info *lookup_lease_in_table(struct ksmbd_conn *conn,
                                          char *lease_key);
 int find_same_lease_key(struct ksmbd_session *sess, struct ksmbd_inode *ci,
similarity index 96%
rename from fs/ksmbd/server.c
rename to fs/smb/server/server.c
index f9b2e0f..ced7a9e 100644 (file)
@@ -185,24 +185,31 @@ static void __handle_ksmbd_work(struct ksmbd_work *work,
                goto send;
        }
 
-       if (conn->ops->check_user_session) {
-               rc = conn->ops->check_user_session(work);
-               if (rc < 0) {
-                       command = conn->ops->get_cmd_val(work);
-                       conn->ops->set_rsp_status(work,
-                                       STATUS_USER_SESSION_DELETED);
-                       goto send;
-               } else if (rc > 0) {
-                       rc = conn->ops->get_ksmbd_tcon(work);
+       do {
+               if (conn->ops->check_user_session) {
+                       rc = conn->ops->check_user_session(work);
                        if (rc < 0) {
-                               conn->ops->set_rsp_status(work,
-                                       STATUS_NETWORK_NAME_DELETED);
+                               if (rc == -EINVAL)
+                                       conn->ops->set_rsp_status(work,
+                                               STATUS_INVALID_PARAMETER);
+                               else
+                                       conn->ops->set_rsp_status(work,
+                                               STATUS_USER_SESSION_DELETED);
                                goto send;
+                       } else if (rc > 0) {
+                               rc = conn->ops->get_ksmbd_tcon(work);
+                               if (rc < 0) {
+                                       if (rc == -EINVAL)
+                                               conn->ops->set_rsp_status(work,
+                                                       STATUS_INVALID_PARAMETER);
+                                       else
+                                               conn->ops->set_rsp_status(work,
+                                                       STATUS_NETWORK_NAME_DELETED);
+                                       goto send;
+                               }
                        }
                }
-       }
 
-       do {
                rc = __process_request(work, conn, &command);
                if (rc == SERVER_HANDLER_ABORT)
                        break;
similarity index 100%
rename from fs/ksmbd/server.h
rename to fs/smb/server/server.h
similarity index 93%
rename from fs/ksmbd/smb2misc.c
rename to fs/smb/server/smb2misc.c
index fbdde42..33b7e6c 100644 (file)
@@ -351,9 +351,16 @@ int ksmbd_smb2_check_message(struct ksmbd_work *work)
        int command;
        __u32 clc_len;  /* calculated length */
        __u32 len = get_rfc1002_len(work->request_buf);
+       __u32 req_struct_size, next_cmd = le32_to_cpu(hdr->NextCommand);
 
-       if (le32_to_cpu(hdr->NextCommand) > 0)
-               len = le32_to_cpu(hdr->NextCommand);
+       if ((u64)work->next_smb2_rcv_hdr_off + next_cmd > len) {
+               pr_err("next command(%u) offset exceeds smb msg size\n",
+                               next_cmd);
+               return 1;
+       }
+
+       if (next_cmd > 0)
+               len = next_cmd;
        else if (work->next_smb2_rcv_hdr_off)
                len -= work->next_smb2_rcv_hdr_off;
 
@@ -373,17 +380,9 @@ int ksmbd_smb2_check_message(struct ksmbd_work *work)
        }
 
        if (smb2_req_struct_sizes[command] != pdu->StructureSize2) {
-               if (command != SMB2_OPLOCK_BREAK_HE &&
-                   (hdr->Status == 0 || pdu->StructureSize2 != SMB2_ERROR_STRUCTURE_SIZE2_LE)) {
-                       /* error packets have 9 byte structure size */
-                       ksmbd_debug(SMB,
-                                   "Illegal request size %u for command %d\n",
-                                   le16_to_cpu(pdu->StructureSize2), command);
-                       return 1;
-               } else if (command == SMB2_OPLOCK_BREAK_HE &&
-                          hdr->Status == 0 &&
-                          le16_to_cpu(pdu->StructureSize2) != OP_BREAK_STRUCT_SIZE_20 &&
-                          le16_to_cpu(pdu->StructureSize2) != OP_BREAK_STRUCT_SIZE_21) {
+               if (command == SMB2_OPLOCK_BREAK_HE &&
+                   le16_to_cpu(pdu->StructureSize2) != OP_BREAK_STRUCT_SIZE_20 &&
+                   le16_to_cpu(pdu->StructureSize2) != OP_BREAK_STRUCT_SIZE_21) {
                        /* special case for SMB2.1 lease break message */
                        ksmbd_debug(SMB,
                                    "Illegal request size %d for oplock break\n",
@@ -392,6 +391,14 @@ int ksmbd_smb2_check_message(struct ksmbd_work *work)
                }
        }
 
+       req_struct_size = le16_to_cpu(pdu->StructureSize2) +
+               __SMB2_HEADER_STRUCTURE_SIZE;
+       if (command == SMB2_LOCK_HE)
+               req_struct_size -= sizeof(struct smb2_lock_element);
+
+       if (req_struct_size > len + 1)
+               return 1;
+
        if (smb2_calc_size(hdr, &clc_len))
                return 1;
 
@@ -416,8 +423,11 @@ int ksmbd_smb2_check_message(struct ksmbd_work *work)
 
                /*
                 * Allow a message that is padded to an 8-byte boundary.
+                * Linux 4.19.217 with SMB 3.0.2 sometimes sends
+                * messages where clc_len is exactly 8 bytes less
+                * than len.
                 */
-               if (clc_len < len && (len - clc_len) < 8)
+               if (clc_len < len && (len - clc_len) <= 8)
                        goto validate_credit;
 
                pr_err_ratelimited(
similarity index 100%
rename from fs/ksmbd/smb2ops.c
rename to fs/smb/server/smb2ops.c
similarity index 98%
rename from fs/ksmbd/smb2pdu.c
rename to fs/smb/server/smb2pdu.c
index cb93fd2..da1787c 100644 (file)
@@ -91,7 +91,6 @@ int smb2_get_ksmbd_tcon(struct ksmbd_work *work)
        unsigned int cmd = le16_to_cpu(req_hdr->Command);
        int tree_id;
 
-       work->tcon = NULL;
        if (cmd == SMB2_TREE_CONNECT_HE ||
            cmd ==  SMB2_CANCEL_HE ||
            cmd ==  SMB2_LOGOFF_HE) {
@@ -105,10 +104,28 @@ int smb2_get_ksmbd_tcon(struct ksmbd_work *work)
        }
 
        tree_id = le32_to_cpu(req_hdr->Id.SyncId.TreeId);
+
+       /*
+        * If the request is not the first in a compound request,
+        * just validate the tree id in the header against work->tcon->id.
+        */
+       if (work->next_smb2_rcv_hdr_off) {
+               if (!work->tcon) {
+                       pr_err("The first operation in the compound does not have tcon\n");
+                       return -EINVAL;
+               }
+               if (work->tcon->id != tree_id) {
+                       pr_err("tree id(%u) does not match id(%u) in the first operation\n",
+                                       tree_id, work->tcon->id);
+                       return -EINVAL;
+               }
+               return 1;
+       }
+
        work->tcon = ksmbd_tree_conn_lookup(work->sess, tree_id);
        if (!work->tcon) {
                pr_err("Invalid tid %d\n", tree_id);
-               return -EINVAL;
+               return -ENOENT;
        }
 
        return 1;
@@ -326,13 +343,9 @@ int smb2_set_rsp_credits(struct ksmbd_work *work)
        if (hdr->Command == SMB2_NEGOTIATE)
                aux_max = 1;
        else
-               aux_max = conn->vals->max_credits - credit_charge;
+               aux_max = conn->vals->max_credits - conn->total_credits;
        credits_granted = min_t(unsigned short, credits_requested, aux_max);
 
-       if (conn->vals->max_credits - conn->total_credits < credits_granted)
-               credits_granted = conn->vals->max_credits -
-                       conn->total_credits;
-
        conn->total_credits += credits_granted;
        work->credits_granted += credits_granted;
 
@@ -551,7 +564,6 @@ int smb2_check_user_session(struct ksmbd_work *work)
        unsigned int cmd = conn->ops->get_cmd_val(work);
        unsigned long long sess_id;
 
-       work->sess = NULL;
        /*
         * SMB2_ECHO, SMB2_NEGOTIATE, SMB2_SESSION_SETUP command do not
         * require a session id, so no need to validate user session's for
@@ -562,15 +574,33 @@ int smb2_check_user_session(struct ksmbd_work *work)
                return 0;
 
        if (!ksmbd_conn_good(conn))
-               return -EINVAL;
+               return -EIO;
 
        sess_id = le64_to_cpu(req_hdr->SessionId);
+
+       /*
+        * If the request is not the first in a compound request,
+        * just validate the session id in the header against work->sess->id.
+        */
+       if (work->next_smb2_rcv_hdr_off) {
+               if (!work->sess) {
+                       pr_err("The first operation in the compound does not have sess\n");
+                       return -EINVAL;
+               }
+               if (work->sess->id != sess_id) {
+                       pr_err("session id(%llu) does not match the first operation's session id(%lld)\n",
+                                       sess_id, work->sess->id);
+                       return -EINVAL;
+               }
+               return 1;
+       }
+
        /* Check for validity of user session */
        work->sess = ksmbd_session_lookup_all(conn, sess_id);
        if (work->sess)
                return 1;
        ksmbd_debug(SMB, "Invalid user session, Uid %llu\n", sess_id);
-       return -EINVAL;
+       return -ENOENT;
 }
 
 static void destroy_previous_session(struct ksmbd_conn *conn,
@@ -849,13 +879,14 @@ static void assemble_neg_contexts(struct ksmbd_conn *conn,
 
 static __le32 decode_preauth_ctxt(struct ksmbd_conn *conn,
                                  struct smb2_preauth_neg_context *pneg_ctxt,
-                                 int len_of_ctxts)
+                                 int ctxt_len)
 {
        /*
         * sizeof(smb2_preauth_neg_context) assumes SMB311_SALT_SIZE Salt,
         * which may not be present. Only check for used HashAlgorithms[1].
         */
-       if (len_of_ctxts < MIN_PREAUTH_CTXT_DATA_LEN)
+       if (ctxt_len <
+           sizeof(struct smb2_neg_context) + MIN_PREAUTH_CTXT_DATA_LEN)
                return STATUS_INVALID_PARAMETER;
 
        if (pneg_ctxt->HashAlgorithms != SMB2_PREAUTH_INTEGRITY_SHA512)
@@ -867,15 +898,23 @@ static __le32 decode_preauth_ctxt(struct ksmbd_conn *conn,
 
 static void decode_encrypt_ctxt(struct ksmbd_conn *conn,
                                struct smb2_encryption_neg_context *pneg_ctxt,
-                               int len_of_ctxts)
+                               int ctxt_len)
 {
-       int cph_cnt = le16_to_cpu(pneg_ctxt->CipherCount);
-       int i, cphs_size = cph_cnt * sizeof(__le16);
+       int cph_cnt;
+       int i, cphs_size;
+
+       if (sizeof(struct smb2_encryption_neg_context) > ctxt_len) {
+               pr_err("Invalid SMB2_ENCRYPTION_CAPABILITIES context size\n");
+               return;
+       }
 
        conn->cipher_type = 0;
 
+       cph_cnt = le16_to_cpu(pneg_ctxt->CipherCount);
+       cphs_size = cph_cnt * sizeof(__le16);
+
        if (sizeof(struct smb2_encryption_neg_context) + cphs_size >
-           len_of_ctxts) {
+           ctxt_len) {
                pr_err("Invalid cipher count(%d)\n", cph_cnt);
                return;
        }
@@ -923,15 +962,22 @@ static void decode_compress_ctxt(struct ksmbd_conn *conn,
 
 static void decode_sign_cap_ctxt(struct ksmbd_conn *conn,
                                 struct smb2_signing_capabilities *pneg_ctxt,
-                                int len_of_ctxts)
+                                int ctxt_len)
 {
-       int sign_algo_cnt = le16_to_cpu(pneg_ctxt->SigningAlgorithmCount);
-       int i, sign_alos_size = sign_algo_cnt * sizeof(__le16);
+       int sign_algo_cnt;
+       int i, sign_alos_size;
+
+       if (sizeof(struct smb2_signing_capabilities) > ctxt_len) {
+               pr_err("Invalid SMB2_SIGNING_CAPABILITIES context length\n");
+               return;
+       }
 
        conn->signing_negotiated = false;
+       sign_algo_cnt = le16_to_cpu(pneg_ctxt->SigningAlgorithmCount);
+       sign_alos_size = sign_algo_cnt * sizeof(__le16);
 
        if (sizeof(struct smb2_signing_capabilities) + sign_alos_size >
-           len_of_ctxts) {
+           ctxt_len) {
                pr_err("Invalid signing algorithm count(%d)\n", sign_algo_cnt);
                return;
        }
@@ -951,13 +997,13 @@ static void decode_sign_cap_ctxt(struct ksmbd_conn *conn,
 
 static __le32 deassemble_neg_contexts(struct ksmbd_conn *conn,
                                      struct smb2_negotiate_req *req,
-                                     int len_of_smb)
+                                     unsigned int len_of_smb)
 {
        /* +4 is to account for the RFC1001 len field */
        struct smb2_neg_context *pctx = (struct smb2_neg_context *)req;
        int i = 0, len_of_ctxts;
-       int offset = le32_to_cpu(req->NegotiateContextOffset);
-       int neg_ctxt_cnt = le16_to_cpu(req->NegotiateContextCount);
+       unsigned int offset = le32_to_cpu(req->NegotiateContextOffset);
+       unsigned int neg_ctxt_cnt = le16_to_cpu(req->NegotiateContextCount);
        __le32 status = STATUS_INVALID_PARAMETER;
 
        ksmbd_debug(SMB, "decoding %d negotiate contexts\n", neg_ctxt_cnt);
@@ -969,18 +1015,16 @@ static __le32 deassemble_neg_contexts(struct ksmbd_conn *conn,
        len_of_ctxts = len_of_smb - offset;
 
        while (i++ < neg_ctxt_cnt) {
-               int clen;
-
-               /* check that offset is not beyond end of SMB */
-               if (len_of_ctxts == 0)
-                       break;
+               int clen, ctxt_len;
 
-               if (len_of_ctxts < sizeof(struct smb2_neg_context))
+               if (len_of_ctxts < (int)sizeof(struct smb2_neg_context))
                        break;
 
                pctx = (struct smb2_neg_context *)((char *)pctx + offset);
                clen = le16_to_cpu(pctx->DataLength);
-               if (clen + sizeof(struct smb2_neg_context) > len_of_ctxts)
+               ctxt_len = clen + sizeof(struct smb2_neg_context);
+
+               if (ctxt_len > len_of_ctxts)
                        break;
 
                if (pctx->ContextType == SMB2_PREAUTH_INTEGRITY_CAPABILITIES) {
@@ -991,7 +1035,7 @@ static __le32 deassemble_neg_contexts(struct ksmbd_conn *conn,
 
                        status = decode_preauth_ctxt(conn,
                                                     (struct smb2_preauth_neg_context *)pctx,
-                                                    len_of_ctxts);
+                                                    ctxt_len);
                        if (status != STATUS_SUCCESS)
                                break;
                } else if (pctx->ContextType == SMB2_ENCRYPTION_CAPABILITIES) {
@@ -1002,7 +1046,7 @@ static __le32 deassemble_neg_contexts(struct ksmbd_conn *conn,
 
                        decode_encrypt_ctxt(conn,
                                            (struct smb2_encryption_neg_context *)pctx,
-                                           len_of_ctxts);
+                                           ctxt_len);
                } else if (pctx->ContextType == SMB2_COMPRESSION_CAPABILITIES) {
                        ksmbd_debug(SMB,
                                    "deassemble SMB2_COMPRESSION_CAPABILITIES context\n");
@@ -1021,15 +1065,15 @@ static __le32 deassemble_neg_contexts(struct ksmbd_conn *conn,
                } else if (pctx->ContextType == SMB2_SIGNING_CAPABILITIES) {
                        ksmbd_debug(SMB,
                                    "deassemble SMB2_SIGNING_CAPABILITIES context\n");
+
                        decode_sign_cap_ctxt(conn,
                                             (struct smb2_signing_capabilities *)pctx,
-                                            len_of_ctxts);
+                                            ctxt_len);
                }
 
                /* offsets must be 8 byte aligned */
-               clen = (clen + 7) & ~0x7;
-               offset = clen + sizeof(struct smb2_neg_context);
-               len_of_ctxts -= clen + sizeof(struct smb2_neg_context);
+               offset = (ctxt_len + 7) & ~0x7;
+               len_of_ctxts -= offset;
        }
        return status;
 }
@@ -1057,16 +1101,16 @@ int smb2_handle_negotiate(struct ksmbd_work *work)
                return rc;
        }
 
-       if (req->DialectCount == 0) {
-               pr_err("malformed packet\n");
+       smb2_buf_len = get_rfc1002_len(work->request_buf);
+       smb2_neg_size = offsetof(struct smb2_negotiate_req, Dialects);
+       if (smb2_neg_size > smb2_buf_len) {
                rsp->hdr.Status = STATUS_INVALID_PARAMETER;
                rc = -EINVAL;
                goto err_out;
        }
 
-       smb2_buf_len = get_rfc1002_len(work->request_buf);
-       smb2_neg_size = offsetof(struct smb2_negotiate_req, Dialects);
-       if (smb2_neg_size > smb2_buf_len) {
+       if (req->DialectCount == 0) {
+               pr_err("malformed packet\n");
                rsp->hdr.Status = STATUS_INVALID_PARAMETER;
                rc = -EINVAL;
                goto err_out;
@@ -1356,7 +1400,7 @@ static struct ksmbd_user *session_user(struct ksmbd_conn *conn,
        struct authenticate_message *authblob;
        struct ksmbd_user *user;
        char *name;
-       unsigned int auth_msg_len, name_off, name_len, secbuf_len;
+       unsigned int name_off, name_len, secbuf_len;
 
        secbuf_len = le16_to_cpu(req->SecurityBufferLength);
        if (secbuf_len < sizeof(struct authenticate_message)) {
@@ -1366,9 +1410,8 @@ static struct ksmbd_user *session_user(struct ksmbd_conn *conn,
        authblob = user_authblob(conn, req);
        name_off = le32_to_cpu(authblob->UserName.BufferOffset);
        name_len = le16_to_cpu(authblob->UserName.Length);
-       auth_msg_len = le16_to_cpu(req->SecurityBufferOffset) + secbuf_len;
 
-       if (auth_msg_len < (u64)name_off + name_len)
+       if (secbuf_len < (u64)name_off + name_len)
                return NULL;
 
        name = smb_strndup_from_utf16((const char *)authblob + name_off,
@@ -2240,7 +2283,7 @@ static int smb2_set_ea(struct smb2_ea_info *eabuf, unsigned int buf_len,
                        /* delete the EA only when it exists */
                        if (rc > 0) {
                                rc = ksmbd_vfs_remove_xattr(idmap,
-                                                           path->dentry,
+                                                           path,
                                                            attr_name);
 
                                if (rc < 0) {
@@ -2254,8 +2297,7 @@ static int smb2_set_ea(struct smb2_ea_info *eabuf, unsigned int buf_len,
                        /* if the EA doesn't exist, just do nothing. */
                        rc = 0;
                } else {
-                       rc = ksmbd_vfs_setxattr(idmap,
-                                               path->dentry, attr_name, value,
+                       rc = ksmbd_vfs_setxattr(idmap, path, attr_name, value,
                                                le16_to_cpu(eabuf->EaValueLength), 0);
                        if (rc < 0) {
                                ksmbd_debug(SMB,
@@ -2312,8 +2354,7 @@ static noinline int smb2_set_stream_name_xattr(const struct path *path,
                return -EBADF;
        }
 
-       rc = ksmbd_vfs_setxattr(idmap, path->dentry,
-                               xattr_stream_name, NULL, 0, 0);
+       rc = ksmbd_vfs_setxattr(idmap, path, xattr_stream_name, NULL, 0, 0);
        if (rc < 0)
                pr_err("Failed to store XATTR stream name :%d\n", rc);
        return 0;
@@ -2341,7 +2382,7 @@ static int smb2_remove_smb_xattrs(const struct path *path)
                if (!strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) &&
                    !strncmp(&name[XATTR_USER_PREFIX_LEN], STREAM_PREFIX,
                             STREAM_PREFIX_LEN)) {
-                       err = ksmbd_vfs_remove_xattr(idmap, path->dentry,
+                       err = ksmbd_vfs_remove_xattr(idmap, path,
                                                     name);
                        if (err)
                                ksmbd_debug(SMB, "remove xattr failed : %s\n",
@@ -2388,8 +2429,7 @@ static void smb2_new_xattrs(struct ksmbd_tree_connect *tcon, const struct path *
        da.flags = XATTR_DOSINFO_ATTRIB | XATTR_DOSINFO_CREATE_TIME |
                XATTR_DOSINFO_ITIME;
 
-       rc = ksmbd_vfs_set_dos_attrib_xattr(mnt_idmap(path->mnt),
-                                           path->dentry, &da);
+       rc = ksmbd_vfs_set_dos_attrib_xattr(mnt_idmap(path->mnt), path, &da);
        if (rc)
                ksmbd_debug(SMB, "failed to store file attribute into xattr\n");
 }
@@ -2464,7 +2504,7 @@ static int smb2_create_sd_buffer(struct ksmbd_work *work,
                return -ENOENT;
 
        /* Parse SD BUFFER create contexts */
-       context = smb2_find_context_vals(req, SMB2_CREATE_SD_BUFFER);
+       context = smb2_find_context_vals(req, SMB2_CREATE_SD_BUFFER, 4);
        if (!context)
                return -ENOENT;
        else if (IS_ERR(context))
@@ -2666,7 +2706,7 @@ int smb2_open(struct ksmbd_work *work)
 
        if (req->CreateContextsOffset) {
                /* Parse non-durable handle create contexts */
-               context = smb2_find_context_vals(req, SMB2_CREATE_EA_BUFFER);
+               context = smb2_find_context_vals(req, SMB2_CREATE_EA_BUFFER, 4);
                if (IS_ERR(context)) {
                        rc = PTR_ERR(context);
                        goto err_out1;
@@ -2686,7 +2726,7 @@ int smb2_open(struct ksmbd_work *work)
                }
 
                context = smb2_find_context_vals(req,
-                                                SMB2_CREATE_QUERY_MAXIMAL_ACCESS_REQUEST);
+                                                SMB2_CREATE_QUERY_MAXIMAL_ACCESS_REQUEST, 4);
                if (IS_ERR(context)) {
                        rc = PTR_ERR(context);
                        goto err_out1;
@@ -2697,7 +2737,7 @@ int smb2_open(struct ksmbd_work *work)
                }
 
                context = smb2_find_context_vals(req,
-                                                SMB2_CREATE_TIMEWARP_REQUEST);
+                                                SMB2_CREATE_TIMEWARP_REQUEST, 4);
                if (IS_ERR(context)) {
                        rc = PTR_ERR(context);
                        goto err_out1;
@@ -2709,7 +2749,7 @@ int smb2_open(struct ksmbd_work *work)
 
                if (tcon->posix_extensions) {
                        context = smb2_find_context_vals(req,
-                                                        SMB2_CREATE_TAG_POSIX);
+                                                        SMB2_CREATE_TAG_POSIX, 16);
                        if (IS_ERR(context)) {
                                rc = PTR_ERR(context);
                                goto err_out1;
@@ -2963,7 +3003,7 @@ int smb2_open(struct ksmbd_work *work)
                struct inode *inode = d_inode(path.dentry);
 
                posix_acl_rc = ksmbd_vfs_inherit_posix_acl(idmap,
-                                                          path.dentry,
+                                                          &path,
                                                           d_inode(path.dentry->d_parent));
                if (posix_acl_rc)
                        ksmbd_debug(SMB, "inherit posix acl failed : %d\n", posix_acl_rc);
@@ -2979,7 +3019,7 @@ int smb2_open(struct ksmbd_work *work)
                        if (rc) {
                                if (posix_acl_rc)
                                        ksmbd_vfs_set_init_posix_acl(idmap,
-                                                                    path.dentry);
+                                                                    &path);
 
                                if (test_share_config_flag(work->tcon->share_conf,
                                                           KSMBD_SHARE_FLAG_ACL_XATTR)) {
@@ -3019,7 +3059,7 @@ int smb2_open(struct ksmbd_work *work)
 
                                        rc = ksmbd_vfs_set_sd_xattr(conn,
                                                                    idmap,
-                                                                   path.dentry,
+                                                                   &path,
                                                                    pntsd,
                                                                    pntsd_size);
                                        kfree(pntsd);
@@ -3107,7 +3147,7 @@ int smb2_open(struct ksmbd_work *work)
                struct create_alloc_size_req *az_req;
 
                az_req = (struct create_alloc_size_req *)smb2_find_context_vals(req,
-                                       SMB2_CREATE_ALLOCATION_SIZE);
+                                       SMB2_CREATE_ALLOCATION_SIZE, 4);
                if (IS_ERR(az_req)) {
                        rc = PTR_ERR(az_req);
                        goto err_out;
@@ -3134,7 +3174,7 @@ int smb2_open(struct ksmbd_work *work)
                                            err);
                }
 
-               context = smb2_find_context_vals(req, SMB2_CREATE_QUERY_ON_DISK_ID);
+               context = smb2_find_context_vals(req, SMB2_CREATE_QUERY_ON_DISK_ID, 4);
                if (IS_ERR(context)) {
                        rc = PTR_ERR(context);
                        goto err_out;
@@ -4359,21 +4399,6 @@ static int get_file_basic_info(struct smb2_query_info_rsp *rsp,
        return 0;
 }
 
-static unsigned long long get_allocation_size(struct inode *inode,
-                                             struct kstat *stat)
-{
-       unsigned long long alloc_size = 0;
-
-       if (!S_ISDIR(stat->mode)) {
-               if ((inode->i_blocks << 9) <= stat->size)
-                       alloc_size = stat->size;
-               else
-                       alloc_size = inode->i_blocks << 9;
-       }
-
-       return alloc_size;
-}
-
 static void get_file_standard_info(struct smb2_query_info_rsp *rsp,
                                   struct ksmbd_file *fp, void *rsp_org)
 {
@@ -4388,7 +4413,7 @@ static void get_file_standard_info(struct smb2_query_info_rsp *rsp,
        sinfo = (struct smb2_file_standard_info *)rsp->Buffer;
        delete_pending = ksmbd_inode_pending_delete(fp);
 
-       sinfo->AllocationSize = cpu_to_le64(get_allocation_size(inode, &stat));
+       sinfo->AllocationSize = cpu_to_le64(inode->i_blocks << 9);
        sinfo->EndOfFile = S_ISDIR(stat.mode) ? 0 : cpu_to_le64(stat.size);
        sinfo->NumberOfLinks = cpu_to_le32(get_nlink(&stat) - delete_pending);
        sinfo->DeletePending = delete_pending;
@@ -4453,7 +4478,7 @@ static int get_file_all_info(struct ksmbd_work *work,
        file_info->Attributes = fp->f_ci->m_fattr;
        file_info->Pad1 = 0;
        file_info->AllocationSize =
-               cpu_to_le64(get_allocation_size(inode, &stat));
+               cpu_to_le64(inode->i_blocks << 9);
        file_info->EndOfFile = S_ISDIR(stat.mode) ? 0 : cpu_to_le64(stat.size);
        file_info->NumberOfLinks =
                        cpu_to_le32(get_nlink(&stat) - delete_pending);
@@ -4642,7 +4667,7 @@ static int get_file_network_open_info(struct smb2_query_info_rsp *rsp,
        file_info->ChangeTime = cpu_to_le64(time);
        file_info->Attributes = fp->f_ci->m_fattr;
        file_info->AllocationSize =
-               cpu_to_le64(get_allocation_size(inode, &stat));
+               cpu_to_le64(inode->i_blocks << 9);
        file_info->EndOfFile = S_ISDIR(stat.mode) ? 0 : cpu_to_le64(stat.size);
        file_info->Reserved = cpu_to_le32(0);
        rsp->OutputBufferLength =
@@ -5470,7 +5495,7 @@ static int smb2_rename(struct ksmbd_work *work,
                        goto out;
 
                rc = ksmbd_vfs_setxattr(file_mnt_idmap(fp->filp),
-                                       fp->filp->f_path.dentry,
+                                       &fp->filp->f_path,
                                        xattr_stream_name,
                                        NULL, 0, 0);
                if (rc < 0) {
@@ -5507,7 +5532,7 @@ static int smb2_create_link(struct ksmbd_work *work,
 {
        char *link_name = NULL, *target_name = NULL, *pathname = NULL;
        struct path path;
-       bool file_present = true;
+       bool file_present = false;
        int rc;
 
        if (buf_len < (u64)sizeof(struct smb2_file_link_info) +
@@ -5540,8 +5565,8 @@ static int smb2_create_link(struct ksmbd_work *work,
        if (rc) {
                if (rc != -ENOENT)
                        goto out;
-               file_present = false;
-       }
+       } else
+               file_present = true;
 
        if (file_info->ReplaceIfExists) {
                if (file_present) {
@@ -5635,8 +5660,7 @@ static int set_file_basic_info(struct ksmbd_file *fp,
                da.flags = XATTR_DOSINFO_ATTRIB | XATTR_DOSINFO_CREATE_TIME |
                        XATTR_DOSINFO_ITIME;
 
-               rc = ksmbd_vfs_set_dos_attrib_xattr(idmap,
-                                                   filp->f_path.dentry, &da);
+               rc = ksmbd_vfs_set_dos_attrib_xattr(idmap, &filp->f_path, &da);
                if (rc)
                        ksmbd_debug(SMB,
                                    "failed to restore file attribute in EA\n");
@@ -7491,7 +7515,7 @@ static inline int fsctl_set_sparse(struct ksmbd_work *work, u64 id,
 
                da.attr = le32_to_cpu(fp->f_ci->m_fattr);
                ret = ksmbd_vfs_set_dos_attrib_xattr(idmap,
-                                                    fp->filp->f_path.dentry, &da);
+                                                    &fp->filp->f_path, &da);
                if (ret)
                        fp->f_ci->m_fattr = old_fattr;
        }
similarity index 100%
rename from fs/ksmbd/smb2pdu.h
rename to fs/smb/server/smb2pdu.h
similarity index 98%
rename from fs/ksmbd/smb_common.c
rename to fs/smb/server/smb_common.c
index af0c2a9..569e5ee 100644 (file)
@@ -158,7 +158,19 @@ int ksmbd_verify_smb_message(struct ksmbd_work *work)
  */
 bool ksmbd_smb_request(struct ksmbd_conn *conn)
 {
-       return conn->request_buf[0] == 0;
+       __le32 *proto = (__le32 *)smb2_get_msg(conn->request_buf);
+
+       if (*proto == SMB2_COMPRESSION_TRANSFORM_ID) {
+               pr_err_ratelimited("smb2 compression is not supported yet\n");
+               return false;
+       }
+
+       if (*proto != SMB1_PROTO_NUMBER &&
+           *proto != SMB2_PROTO_NUMBER &&
+           *proto != SMB2_TRANSFORM_PROTO_NUM)
+               return false;
+
+       return true;
 }
 
 static bool supported_protocol(int idx)
similarity index 99%
rename from fs/ksmbd/smb_common.h
rename to fs/smb/server/smb_common.h
index 9130d2e..6b0d5f1 100644 (file)
@@ -10,7 +10,7 @@
 
 #include "glob.h"
 #include "nterr.h"
-#include "../smbfs_common/smb2pdu.h"
+#include "../common/smb2pdu.h"
 #include "smb2pdu.h"
 
 /* ksmbd's Specific ERRNO */
similarity index 99%
rename from fs/ksmbd/smbacl.c
rename to fs/smb/server/smbacl.c
index 6d6cfb6..ad919a4 100644 (file)
@@ -1162,8 +1162,7 @@ pass:
                        pntsd_size += sizeof(struct smb_acl) + nt_size;
                }
 
-               ksmbd_vfs_set_sd_xattr(conn, idmap,
-                                      path->dentry, pntsd, pntsd_size);
+               ksmbd_vfs_set_sd_xattr(conn, idmap, path, pntsd, pntsd_size);
                kfree(pntsd);
        }
 
@@ -1290,7 +1289,7 @@ int smb_check_perm_dacl(struct ksmbd_conn *conn, const struct path *path,
 
        if (IS_ENABLED(CONFIG_FS_POSIX_ACL)) {
                posix_acls = get_inode_acl(d_inode(path->dentry), ACL_TYPE_ACCESS);
-               if (posix_acls && !found) {
+               if (!IS_ERR_OR_NULL(posix_acls) && !found) {
                        unsigned int id = -1;
 
                        pa_entry = posix_acls->a_entries;
@@ -1314,7 +1313,7 @@ int smb_check_perm_dacl(struct ksmbd_conn *conn, const struct path *path,
                                }
                        }
                }
-               if (posix_acls)
+               if (!IS_ERR_OR_NULL(posix_acls))
                        posix_acl_release(posix_acls);
        }
 
@@ -1383,7 +1382,7 @@ int set_info_sec(struct ksmbd_conn *conn, struct ksmbd_tree_connect *tcon,
        newattrs.ia_valid |= ATTR_MODE;
        newattrs.ia_mode = (inode->i_mode & ~0777) | (fattr.cf_mode & 0777);
 
-       ksmbd_vfs_remove_acl_xattrs(idmap, path->dentry);
+       ksmbd_vfs_remove_acl_xattrs(idmap, path);
        /* Update posix acls */
        if (IS_ENABLED(CONFIG_FS_POSIX_ACL) && fattr.cf_dacls) {
                rc = set_posix_acl(idmap, path->dentry,
@@ -1414,9 +1413,8 @@ int set_info_sec(struct ksmbd_conn *conn, struct ksmbd_tree_connect *tcon,
 
        if (test_share_config_flag(tcon->share_conf, KSMBD_SHARE_FLAG_ACL_XATTR)) {
                /* Update WinACL in xattr */
-               ksmbd_vfs_remove_sd_xattrs(idmap, path->dentry);
-               ksmbd_vfs_set_sd_xattr(conn, idmap,
-                                      path->dentry, pntsd, ntsd_len);
+               ksmbd_vfs_remove_sd_xattrs(idmap, path);
+               ksmbd_vfs_set_sd_xattr(conn, idmap, path, pntsd, ntsd_len);
        }
 
 out:
similarity index 100%
rename from fs/ksmbd/smbacl.h
rename to fs/smb/server/smbacl.h
similarity index 98%
rename from fs/ksmbd/smbfsctl.h
rename to fs/smb/server/smbfsctl.h
index b98418a..ecdf8f6 100644 (file)
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: LGPL-2.1+ */
 /*
- *   fs/cifs/smbfsctl.h: SMB, CIFS, SMB2 FSCTL definitions
+ *   fs/smb/server/smbfsctl.h: SMB, CIFS, SMB2 FSCTL definitions
  *
  *   Copyright (c) International Business Machines  Corp., 2002,2009
  *   Author(s): Steve French (sfrench@us.ibm.com)
similarity index 99%
rename from fs/ksmbd/smbstatus.h
rename to fs/smb/server/smbstatus.h
index 108a8b6..8963deb 100644 (file)
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: LGPL-2.1+ */
 /*
- *   fs/cifs/smb2status.h
+ *   fs/smb/server/smbstatus.h
  *
  *   SMB2 Status code (network error) definitions
  *   Definitions are from MS-ERREF
similarity index 100%
rename from fs/ksmbd/unicode.c
rename to fs/smb/server/unicode.c
similarity index 100%
rename from fs/ksmbd/unicode.h
rename to fs/smb/server/unicode.h
similarity index 100%
rename from fs/ksmbd/uniupr.h
rename to fs/smb/server/uniupr.h
similarity index 95%
rename from fs/ksmbd/vfs.c
rename to fs/smb/server/vfs.c
index 778c152..81489fd 100644 (file)
@@ -86,12 +86,14 @@ static int ksmbd_vfs_path_lookup_locked(struct ksmbd_share_config *share_conf,
        err = vfs_path_parent_lookup(filename, flags,
                                     &parent_path, &last, &type,
                                     root_share_path);
-       putname(filename);
-       if (err)
+       if (err) {
+               putname(filename);
                return err;
+       }
 
        if (unlikely(type != LAST_NORM)) {
                path_put(&parent_path);
+               putname(filename);
                return -ENOENT;
        }
 
@@ -108,12 +110,14 @@ static int ksmbd_vfs_path_lookup_locked(struct ksmbd_share_config *share_conf,
        path->dentry = d;
        path->mnt = share_conf->vfs_path.mnt;
        path_put(&parent_path);
+       putname(filename);
 
        return 0;
 
 err_out:
        inode_unlock(parent_path.dentry->d_inode);
        path_put(&parent_path);
+       putname(filename);
        return -ENOENT;
 }
 
@@ -166,6 +170,10 @@ int ksmbd_vfs_create(struct ksmbd_work *work, const char *name, umode_t mode)
                return err;
        }
 
+       err = mnt_want_write(path.mnt);
+       if (err)
+               goto out_err;
+
        mode |= S_IFREG;
        err = vfs_create(mnt_idmap(path.mnt), d_inode(path.dentry),
                         dentry, mode, true);
@@ -175,6 +183,9 @@ int ksmbd_vfs_create(struct ksmbd_work *work, const char *name, umode_t mode)
        } else {
                pr_err("File(%s): creation failed (err:%d)\n", name, err);
        }
+       mnt_drop_write(path.mnt);
+
+out_err:
        done_path_create(&path, dentry);
        return err;
 }
@@ -205,30 +216,35 @@ int ksmbd_vfs_mkdir(struct ksmbd_work *work, const char *name, umode_t mode)
                return err;
        }
 
+       err = mnt_want_write(path.mnt);
+       if (err)
+               goto out_err2;
+
        idmap = mnt_idmap(path.mnt);
        mode |= S_IFDIR;
        err = vfs_mkdir(idmap, d_inode(path.dentry), dentry, mode);
-       if (err) {
-               goto out;
-       } else if (d_unhashed(dentry)) {
+       if (!err && d_unhashed(dentry)) {
                struct dentry *d;
 
                d = lookup_one(idmap, dentry->d_name.name, dentry->d_parent,
                               dentry->d_name.len);
                if (IS_ERR(d)) {
                        err = PTR_ERR(d);
-                       goto out;
+                       goto out_err1;
                }
                if (unlikely(d_is_negative(d))) {
                        dput(d);
                        err = -ENOENT;
-                       goto out;
+                       goto out_err1;
                }
 
                ksmbd_vfs_inherit_owner(work, d_inode(path.dentry), d_inode(d));
                dput(d);
        }
-out:
+
+out_err1:
+       mnt_drop_write(path.mnt);
+out_err2:
        done_path_create(&path, dentry);
        if (err)
                pr_err("mkdir(%s): creation failed (err:%d)\n", name, err);
@@ -439,7 +455,7 @@ static int ksmbd_vfs_stream_write(struct ksmbd_file *fp, char *buf, loff_t *pos,
        memcpy(&stream_buf[*pos], buf, count);
 
        err = ksmbd_vfs_setxattr(idmap,
-                                fp->filp->f_path.dentry,
+                                &fp->filp->f_path,
                                 fp->stream.name,
                                 (void *)stream_buf,
                                 size,
@@ -585,6 +601,10 @@ int ksmbd_vfs_remove_file(struct ksmbd_work *work, const struct path *path)
                goto out_err;
        }
 
+       err = mnt_want_write(path->mnt);
+       if (err)
+               goto out_err;
+
        idmap = mnt_idmap(path->mnt);
        if (S_ISDIR(d_inode(path->dentry)->i_mode)) {
                err = vfs_rmdir(idmap, d_inode(parent), path->dentry);
@@ -595,6 +615,7 @@ int ksmbd_vfs_remove_file(struct ksmbd_work *work, const struct path *path)
                if (err)
                        ksmbd_debug(VFS, "unlink failed, err %d\n", err);
        }
+       mnt_drop_write(path->mnt);
 
 out_err:
        ksmbd_revert_fsids(work);
@@ -640,11 +661,16 @@ int ksmbd_vfs_link(struct ksmbd_work *work, const char *oldname,
                goto out3;
        }
 
+       err = mnt_want_write(newpath.mnt);
+       if (err)
+               goto out3;
+
        err = vfs_link(oldpath.dentry, mnt_idmap(newpath.mnt),
                       d_inode(newpath.dentry),
                       dentry, NULL);
        if (err)
                ksmbd_debug(VFS, "vfs_link failed err %d\n", err);
+       mnt_drop_write(newpath.mnt);
 
 out3:
        done_path_create(&newpath, dentry);
@@ -690,6 +716,10 @@ retry:
                goto out2;
        }
 
+       err = mnt_want_write(old_path->mnt);
+       if (err)
+               goto out2;
+
        trap = lock_rename_child(old_child, new_path.dentry);
 
        old_parent = dget(old_child->d_parent);
@@ -743,6 +773,7 @@ retry:
        rd.new_dir              = new_path.dentry->d_inode,
        rd.new_dentry           = new_dentry,
        rd.flags                = flags,
+       rd.delegated_inode      = NULL,
        err = vfs_rename(&rd);
        if (err)
                ksmbd_debug(VFS, "vfs_rename failed err %d\n", err);
@@ -752,6 +783,7 @@ out4:
 out3:
        dput(old_parent);
        unlock_rename(old_parent, new_path.dentry);
+       mnt_drop_write(old_path->mnt);
 out2:
        path_put(&new_path);
 
@@ -892,19 +924,24 @@ ssize_t ksmbd_vfs_getxattr(struct mnt_idmap *idmap,
  * Return:     0 on success, otherwise error
  */
 int ksmbd_vfs_setxattr(struct mnt_idmap *idmap,
-                      struct dentry *dentry, const char *attr_name,
+                      const struct path *path, const char *attr_name,
                       void *attr_value, size_t attr_size, int flags)
 {
        int err;
 
+       err = mnt_want_write(path->mnt);
+       if (err)
+               return err;
+
        err = vfs_setxattr(idmap,
-                          dentry,
+                          path->dentry,
                           attr_name,
                           attr_value,
                           attr_size,
                           flags);
        if (err)
                ksmbd_debug(VFS, "setxattr failed, err %d\n", err);
+       mnt_drop_write(path->mnt);
        return err;
 }
 
@@ -1008,9 +1045,18 @@ int ksmbd_vfs_fqar_lseek(struct ksmbd_file *fp, loff_t start, loff_t length,
 }
 
 int ksmbd_vfs_remove_xattr(struct mnt_idmap *idmap,
-                          struct dentry *dentry, char *attr_name)
+                          const struct path *path, char *attr_name)
 {
-       return vfs_removexattr(idmap, dentry, attr_name);
+       int err;
+
+       err = mnt_want_write(path->mnt);
+       if (err)
+               return err;
+
+       err = vfs_removexattr(idmap, path->dentry, attr_name);
+       mnt_drop_write(path->mnt);
+
+       return err;
 }
 
 int ksmbd_vfs_unlink(struct file *filp)
@@ -1019,6 +1065,10 @@ int ksmbd_vfs_unlink(struct file *filp)
        struct dentry *dir, *dentry = filp->f_path.dentry;
        struct mnt_idmap *idmap = file_mnt_idmap(filp);
 
+       err = mnt_want_write(filp->f_path.mnt);
+       if (err)
+               return err;
+
        dir = dget_parent(dentry);
        err = ksmbd_vfs_lock_parent(dir, dentry);
        if (err)
@@ -1036,6 +1086,7 @@ int ksmbd_vfs_unlink(struct file *filp)
                ksmbd_debug(VFS, "failed to delete, err %d\n", err);
 out:
        dput(dir);
+       mnt_drop_write(filp->f_path.mnt);
 
        return err;
 }
@@ -1239,13 +1290,13 @@ struct dentry *ksmbd_vfs_kern_path_create(struct ksmbd_work *work,
 }
 
 int ksmbd_vfs_remove_acl_xattrs(struct mnt_idmap *idmap,
-                               struct dentry *dentry)
+                               const struct path *path)
 {
        char *name, *xattr_list = NULL;
        ssize_t xattr_list_len;
        int err = 0;
 
-       xattr_list_len = ksmbd_vfs_listxattr(dentry, &xattr_list);
+       xattr_list_len = ksmbd_vfs_listxattr(path->dentry, &xattr_list);
        if (xattr_list_len < 0) {
                goto out;
        } else if (!xattr_list_len) {
@@ -1253,6 +1304,10 @@ int ksmbd_vfs_remove_acl_xattrs(struct mnt_idmap *idmap,
                goto out;
        }
 
+       err = mnt_want_write(path->mnt);
+       if (err)
+               goto out;
+
        for (name = xattr_list; name - xattr_list < xattr_list_len;
             name += strlen(name) + 1) {
                ksmbd_debug(SMB, "%s, len %zd\n", name, strlen(name));
@@ -1261,25 +1316,26 @@ int ksmbd_vfs_remove_acl_xattrs(struct mnt_idmap *idmap,
                             sizeof(XATTR_NAME_POSIX_ACL_ACCESS) - 1) ||
                    !strncmp(name, XATTR_NAME_POSIX_ACL_DEFAULT,
                             sizeof(XATTR_NAME_POSIX_ACL_DEFAULT) - 1)) {
-                       err = vfs_remove_acl(idmap, dentry, name);
+                       err = vfs_remove_acl(idmap, path->dentry, name);
                        if (err)
                                ksmbd_debug(SMB,
                                            "remove acl xattr failed : %s\n", name);
                }
        }
+       mnt_drop_write(path->mnt);
+
 out:
        kvfree(xattr_list);
        return err;
 }
 
-int ksmbd_vfs_remove_sd_xattrs(struct mnt_idmap *idmap,
-                              struct dentry *dentry)
+int ksmbd_vfs_remove_sd_xattrs(struct mnt_idmap *idmap, const struct path *path)
 {
        char *name, *xattr_list = NULL;
        ssize_t xattr_list_len;
        int err = 0;
 
-       xattr_list_len = ksmbd_vfs_listxattr(dentry, &xattr_list);
+       xattr_list_len = ksmbd_vfs_listxattr(path->dentry, &xattr_list);
        if (xattr_list_len < 0) {
                goto out;
        } else if (!xattr_list_len) {
@@ -1292,7 +1348,7 @@ int ksmbd_vfs_remove_sd_xattrs(struct mnt_idmap *idmap,
                ksmbd_debug(SMB, "%s, len %zd\n", name, strlen(name));
 
                if (!strncmp(name, XATTR_NAME_SD, XATTR_NAME_SD_LEN)) {
-                       err = ksmbd_vfs_remove_xattr(idmap, dentry, name);
+                       err = ksmbd_vfs_remove_xattr(idmap, path, name);
                        if (err)
                                ksmbd_debug(SMB, "remove xattr failed : %s\n", name);
                }
@@ -1316,7 +1372,7 @@ static struct xattr_smb_acl *ksmbd_vfs_make_xattr_posix_acl(struct mnt_idmap *id
                return NULL;
 
        posix_acls = get_inode_acl(inode, acl_type);
-       if (!posix_acls)
+       if (IS_ERR_OR_NULL(posix_acls))
                return NULL;
 
        smb_acl = kzalloc(sizeof(struct xattr_smb_acl) +
@@ -1369,13 +1425,14 @@ out:
 
 int ksmbd_vfs_set_sd_xattr(struct ksmbd_conn *conn,
                           struct mnt_idmap *idmap,
-                          struct dentry *dentry,
+                          const struct path *path,
                           struct smb_ntsd *pntsd, int len)
 {
        int rc;
        struct ndr sd_ndr = {0}, acl_ndr = {0};
        struct xattr_ntacl acl = {0};
        struct xattr_smb_acl *smb_acl, *def_smb_acl = NULL;
+       struct dentry *dentry = path->dentry;
        struct inode *inode = d_inode(dentry);
 
        acl.version = 4;
@@ -1427,7 +1484,7 @@ int ksmbd_vfs_set_sd_xattr(struct ksmbd_conn *conn,
                goto out;
        }
 
-       rc = ksmbd_vfs_setxattr(idmap, dentry,
+       rc = ksmbd_vfs_setxattr(idmap, path,
                                XATTR_NAME_SD, sd_ndr.data,
                                sd_ndr.offset, 0);
        if (rc < 0)
@@ -1517,7 +1574,7 @@ free_n_data:
 }
 
 int ksmbd_vfs_set_dos_attrib_xattr(struct mnt_idmap *idmap,
-                                  struct dentry *dentry,
+                                  const struct path *path,
                                   struct xattr_dos_attrib *da)
 {
        struct ndr n;
@@ -1527,7 +1584,7 @@ int ksmbd_vfs_set_dos_attrib_xattr(struct mnt_idmap *idmap,
        if (err)
                return err;
 
-       err = ksmbd_vfs_setxattr(idmap, dentry, XATTR_NAME_DOS_ATTRIBUTE,
+       err = ksmbd_vfs_setxattr(idmap, path, XATTR_NAME_DOS_ATTRIBUTE,
                                 (void *)n.data, n.offset, 0);
        if (err)
                ksmbd_debug(SMB, "failed to store dos attribute in xattr\n");
@@ -1764,10 +1821,11 @@ void ksmbd_vfs_posix_lock_unblock(struct file_lock *flock)
 }
 
 int ksmbd_vfs_set_init_posix_acl(struct mnt_idmap *idmap,
-                                struct dentry *dentry)
+                                struct path *path)
 {
        struct posix_acl_state acl_state;
        struct posix_acl *acls;
+       struct dentry *dentry = path->dentry;
        struct inode *inode = d_inode(dentry);
        int rc;
 
@@ -1797,6 +1855,11 @@ int ksmbd_vfs_set_init_posix_acl(struct mnt_idmap *idmap,
                return -ENOMEM;
        }
        posix_state_to_acl(&acl_state, acls->a_entries);
+
+       rc = mnt_want_write(path->mnt);
+       if (rc)
+               goto out_err;
+
        rc = set_posix_acl(idmap, dentry, ACL_TYPE_ACCESS, acls);
        if (rc < 0)
                ksmbd_debug(SMB, "Set posix acl(ACL_TYPE_ACCESS) failed, rc : %d\n",
@@ -1808,16 +1871,20 @@ int ksmbd_vfs_set_init_posix_acl(struct mnt_idmap *idmap,
                        ksmbd_debug(SMB, "Set posix acl(ACL_TYPE_DEFAULT) failed, rc : %d\n",
                                    rc);
        }
+       mnt_drop_write(path->mnt);
+
+out_err:
        free_acl_state(&acl_state);
        posix_acl_release(acls);
        return rc;
 }
 
 int ksmbd_vfs_inherit_posix_acl(struct mnt_idmap *idmap,
-                               struct dentry *dentry, struct inode *parent_inode)
+                               struct path *path, struct inode *parent_inode)
 {
        struct posix_acl *acls;
        struct posix_acl_entry *pace;
+       struct dentry *dentry = path->dentry;
        struct inode *inode = d_inode(dentry);
        int rc, i;
 
@@ -1825,7 +1892,7 @@ int ksmbd_vfs_inherit_posix_acl(struct mnt_idmap *idmap,
                return -EOPNOTSUPP;
 
        acls = get_inode_acl(parent_inode, ACL_TYPE_DEFAULT);
-       if (!acls)
+       if (IS_ERR_OR_NULL(acls))
                return -ENOENT;
        pace = acls->a_entries;
 
@@ -1836,6 +1903,10 @@ int ksmbd_vfs_inherit_posix_acl(struct mnt_idmap *idmap,
                }
        }
 
+       rc = mnt_want_write(path->mnt);
+       if (rc)
+               goto out_err;
+
        rc = set_posix_acl(idmap, dentry, ACL_TYPE_ACCESS, acls);
        if (rc < 0)
                ksmbd_debug(SMB, "Set posix acl(ACL_TYPE_ACCESS) failed, rc : %d\n",
@@ -1847,6 +1918,9 @@ int ksmbd_vfs_inherit_posix_acl(struct mnt_idmap *idmap,
                        ksmbd_debug(SMB, "Set posix acl(ACL_TYPE_DEFAULT) failed, rc : %d\n",
                                    rc);
        }
+       mnt_drop_write(path->mnt);
+
+out_err:
        posix_acl_release(acls);
        return rc;
 }
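
Two of the smaller fixes in this file replace bare NULL checks on get_inode_acl() with IS_ERR_OR_NULL(), since that call can also return an ERR_PTR() when the filesystem fails to read the ACL. A hedged sketch of the resulting calling convention; the helper name is invented and only the error handling mirrors the hunks above.

#include <linux/err.h>
#include <linux/posix_acl.h>

/* Treat both "no ACL" (NULL) and a lookup failure (ERR_PTR) as "nothing to
 * inherit"; the caller then falls back to the plain mode bits. */
static struct posix_acl *example_default_acl(struct inode *dir)
{
        struct posix_acl *acl = get_inode_acl(dir, ACL_TYPE_DEFAULT);

        if (IS_ERR_OR_NULL(acl))
                return NULL;
        return acl;     /* caller must posix_acl_release() this reference */
}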
similarity index 94%
rename from fs/ksmbd/vfs.h
rename to fs/smb/server/vfs.h
index a4ae89f..8c0931d 100644 (file)
@@ -108,12 +108,12 @@ ssize_t ksmbd_vfs_casexattr_len(struct mnt_idmap *idmap,
                                struct dentry *dentry, char *attr_name,
                                int attr_name_len);
 int ksmbd_vfs_setxattr(struct mnt_idmap *idmap,
-                      struct dentry *dentry, const char *attr_name,
+                      const struct path *path, const char *attr_name,
                       void *attr_value, size_t attr_size, int flags);
 int ksmbd_vfs_xattr_stream_name(char *stream_name, char **xattr_stream_name,
                                size_t *xattr_stream_name_size, int s_type);
 int ksmbd_vfs_remove_xattr(struct mnt_idmap *idmap,
-                          struct dentry *dentry, char *attr_name);
+                          const struct path *path, char *attr_name);
 int ksmbd_vfs_kern_path_locked(struct ksmbd_work *work, char *name,
                               unsigned int flags, struct path *path,
                               bool caseless);
@@ -139,26 +139,25 @@ void ksmbd_vfs_posix_lock_wait(struct file_lock *flock);
 int ksmbd_vfs_posix_lock_wait_timeout(struct file_lock *flock, long timeout);
 void ksmbd_vfs_posix_lock_unblock(struct file_lock *flock);
 int ksmbd_vfs_remove_acl_xattrs(struct mnt_idmap *idmap,
-                               struct dentry *dentry);
-int ksmbd_vfs_remove_sd_xattrs(struct mnt_idmap *idmap,
-                              struct dentry *dentry);
+                               const struct path *path);
+int ksmbd_vfs_remove_sd_xattrs(struct mnt_idmap *idmap, const struct path *path);
 int ksmbd_vfs_set_sd_xattr(struct ksmbd_conn *conn,
                           struct mnt_idmap *idmap,
-                          struct dentry *dentry,
+                          const struct path *path,
                           struct smb_ntsd *pntsd, int len);
 int ksmbd_vfs_get_sd_xattr(struct ksmbd_conn *conn,
                           struct mnt_idmap *idmap,
                           struct dentry *dentry,
                           struct smb_ntsd **pntsd);
 int ksmbd_vfs_set_dos_attrib_xattr(struct mnt_idmap *idmap,
-                                  struct dentry *dentry,
+                                  const struct path *path,
                                   struct xattr_dos_attrib *da);
 int ksmbd_vfs_get_dos_attrib_xattr(struct mnt_idmap *idmap,
                                   struct dentry *dentry,
                                   struct xattr_dos_attrib *da);
 int ksmbd_vfs_set_init_posix_acl(struct mnt_idmap *idmap,
-                                struct dentry *dentry);
+                                struct path *path);
 int ksmbd_vfs_inherit_posix_acl(struct mnt_idmap *idmap,
-                               struct dentry *dentry,
+                               struct path *path,
                                struct inode *parent_inode);
 #endif /* __KSMBD_VFS_H__ */
similarity index 99%
rename from fs/ksmbd/vfs_cache.c
rename to fs/smb/server/vfs_cache.c
index 2d0138e..f41f8d6 100644 (file)
@@ -252,7 +252,7 @@ static void __ksmbd_inode_close(struct ksmbd_file *fp)
        if (ksmbd_stream_fd(fp) && (ci->m_flags & S_DEL_ON_CLS_STREAM)) {
                ci->m_flags &= ~S_DEL_ON_CLS_STREAM;
                err = ksmbd_vfs_remove_xattr(file_mnt_idmap(filp),
-                                            filp->f_path.dentry,
+                                            &filp->f_path,
                                             fp->stream.name);
                if (err)
                        pr_err("remove xattr failed : %s\n",
similarity index 100%
rename from fs/ksmbd/xattr.h
rename to fs/smb/server/xattr.h
index 3e06611..7a9565d 100644 (file)
@@ -299,20 +299,36 @@ void splice_shrink_spd(struct splice_pipe_desc *spd)
        kfree(spd->partial);
 }
 
-/*
- * Splice data from an O_DIRECT file into pages and then add them to the output
- * pipe.
+/**
+ * copy_splice_read -  Copy data from a file and splice the copy into a pipe
+ * @in: The file to read from
+ * @ppos: Pointer to the file position to read from
+ * @pipe: The pipe to splice into
+ * @len: The amount to splice
+ * @flags: The SPLICE_F_* flags
+ *
+ * This function allocates a bunch of pages sufficient to hold the requested
+ * amount of data (but limited by the remaining pipe capacity), passes it to
+ * the file's ->read_iter() to read into and then splices the used pages into
+ * the pipe.
+ *
+ * Return: On success, the number of bytes read will be returned and *@ppos
+ * will be updated if appropriate; 0 will be returned if there is no more data
+ * to be read; -EAGAIN will be returned if the pipe had no space, and some
+ * other negative error code will be returned on error.  A short read may occur
+ * if the pipe has insufficient space, we reach the end of the data or we hit a
+ * hole.
  */
-ssize_t direct_splice_read(struct file *in, loff_t *ppos,
-                          struct pipe_inode_info *pipe,
-                          size_t len, unsigned int flags)
+ssize_t copy_splice_read(struct file *in, loff_t *ppos,
+                        struct pipe_inode_info *pipe,
+                        size_t len, unsigned int flags)
 {
        struct iov_iter to;
        struct bio_vec *bv;
        struct kiocb kiocb;
        struct page **pages;
        ssize_t ret;
-       size_t used, npages, chunk, remain, reclaim;
+       size_t used, npages, chunk, remain, keep = 0;
        int i;
 
        /* Work out how much data we can actually add into the pipe */
@@ -326,7 +342,7 @@ ssize_t direct_splice_read(struct file *in, loff_t *ppos,
        if (!bv)
                return -ENOMEM;
 
-       pages = (void *)(bv + npages);
+       pages = (struct page **)(bv + npages);
        npages = alloc_pages_bulk_array(GFP_USER, npages, pages);
        if (!npages) {
                kfree(bv);
@@ -349,31 +365,25 @@ ssize_t direct_splice_read(struct file *in, loff_t *ppos,
        kiocb.ki_pos = *ppos;
        ret = call_read_iter(in, &kiocb, &to);
 
-       reclaim = npages * PAGE_SIZE;
-       remain = 0;
        if (ret > 0) {
-               reclaim -= ret;
-               remain = ret;
+               keep = DIV_ROUND_UP(ret, PAGE_SIZE);
                *ppos = kiocb.ki_pos;
-               file_accessed(in);
-       } else if (ret < 0) {
-               /*
-                * callers of ->splice_read() expect -EAGAIN on
-                * "can't put anything in there", rather than -EFAULT.
-                */
-               if (ret == -EFAULT)
-                       ret = -EAGAIN;
        }
 
+       /*
+        * Callers of ->splice_read() expect -EAGAIN on "can't put anything in
+        * there", rather than -EFAULT.
+        */
+       if (ret == -EFAULT)
+               ret = -EAGAIN;
+
        /* Free any pages that didn't get touched at all. */
-       reclaim /= PAGE_SIZE;
-       if (reclaim) {
-               npages -= reclaim;
-               release_pages(pages + npages, reclaim);
-       }
+       if (keep < npages)
+               release_pages(pages + keep, npages - keep);
 
        /* Push the remaining pages into the pipe. */
-       for (i = 0; i < npages; i++) {
+       remain = ret;
+       for (i = 0; i < keep; i++) {
                struct pipe_buffer *buf = pipe_head_buf(pipe);
 
                chunk = min_t(size_t, remain, PAGE_SIZE);
@@ -390,50 +400,7 @@ ssize_t direct_splice_read(struct file *in, loff_t *ppos,
        kfree(bv);
        return ret;
 }
-EXPORT_SYMBOL(direct_splice_read);
-
-/**
- * generic_file_splice_read - splice data from file to a pipe
- * @in:                file to splice from
- * @ppos:      position in @in
- * @pipe:      pipe to splice to
- * @len:       number of bytes to splice
- * @flags:     splice modifier flags
- *
- * Description:
- *    Will read pages from given file and fill them into a pipe. Can be
- *    used as long as it has more or less sane ->read_iter().
- *
- */
-ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
-                                struct pipe_inode_info *pipe, size_t len,
-                                unsigned int flags)
-{
-       struct iov_iter to;
-       struct kiocb kiocb;
-       int ret;
-
-       iov_iter_pipe(&to, ITER_DEST, pipe, len);
-       init_sync_kiocb(&kiocb, in);
-       kiocb.ki_pos = *ppos;
-       ret = call_read_iter(in, &kiocb, &to);
-       if (ret > 0) {
-               *ppos = kiocb.ki_pos;
-               file_accessed(in);
-       } else if (ret < 0) {
-               /* free what was emitted */
-               pipe_discard_from(pipe, to.start_head);
-               /*
-                * callers of ->splice_read() expect -EAGAIN on
-                * "can't put anything in there", rather than -EFAULT.
-                */
-               if (ret == -EFAULT)
-                       ret = -EAGAIN;
-       }
-
-       return ret;
-}
-EXPORT_SYMBOL(generic_file_splice_read);
+EXPORT_SYMBOL(copy_splice_read);
 
 const struct pipe_buf_operations default_pipe_buf_ops = {
        .release        = generic_pipe_buf_release,
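
The rewritten page accounting in copy_splice_read() is easier to follow with a concrete example. The numbers below are assumed for illustration only; the arithmetic is the DIV_ROUND_UP()/release_pages() logic from the hunks above.

/* Assume the pipe had room for 4 pages (npages = 4) and ->read_iter()
 * returned 9000 bytes with a 4096-byte PAGE_SIZE.  Then:
 *
 *     keep = DIV_ROUND_UP(9000, 4096) = 3        pages actually used
 *     release_pages(pages + 3, 4 - 3)            frees the one untouched page
 *
 * and the push loop splices 4096 + 4096 + 808 bytes into the pipe, because
 * each chunk is min_t(size_t, remain, PAGE_SIZE).
 */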
@@ -873,18 +840,32 @@ static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
        return out->f_op->splice_write(pipe, out, ppos, len, flags);
 }
 
-/*
- * Attempt to initiate a splice from a file to a pipe.
+/**
+ * vfs_splice_read - Read data from a file and splice it into a pipe
+ * @in:                File to splice from
+ * @ppos:      Input file offset
+ * @pipe:      Pipe to splice to
+ * @len:       Number of bytes to splice
+ * @flags:     Splice modifier flags (SPLICE_F_*)
+ *
+ * Splice the requested amount of data from the input file to the pipe.  This
+ * is synchronous as the caller must hold the pipe lock across the entire
+ * operation.
+ *
+ * If successful, it returns the amount of data spliced, 0 if it hit the EOF or
+ * a hole and a negative error code otherwise.
  */
-static long do_splice_to(struct file *in, loff_t *ppos,
-                        struct pipe_inode_info *pipe, size_t len,
-                        unsigned int flags)
+long vfs_splice_read(struct file *in, loff_t *ppos,
+                    struct pipe_inode_info *pipe, size_t len,
+                    unsigned int flags)
 {
        unsigned int p_space;
        int ret;
 
        if (unlikely(!(in->f_mode & FMODE_READ)))
                return -EBADF;
+       if (!len)
+               return 0;
 
        /* Don't try to read more than the pipe has space for. */
        p_space = pipe->max_usage - pipe_occupancy(pipe->head, pipe->tail);
@@ -899,8 +880,15 @@ static long do_splice_to(struct file *in, loff_t *ppos,
 
        if (unlikely(!in->f_op->splice_read))
                return warn_unsupported(in, "read");
+       /*
+        * O_DIRECT and DAX don't deal with the pagecache, so we allocate a
+        * buffer, copy into it and splice that into the pipe.
+        */
+       if ((in->f_flags & O_DIRECT) || IS_DAX(in->f_mapping->host))
+               return copy_splice_read(in, ppos, pipe, len, flags);
        return in->f_op->splice_read(in, ppos, pipe, len, flags);
 }
+EXPORT_SYMBOL_GPL(vfs_splice_read);
 
 /**
  * splice_direct_to_actor - splices data directly between two non-pipes
@@ -970,7 +958,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
                size_t read_len;
                loff_t pos = sd->pos, prev_pos = pos;
 
-               ret = do_splice_to(in, &pos, pipe, len, flags);
+               ret = vfs_splice_read(in, &pos, pipe, len, flags);
                if (unlikely(ret <= 0))
                        goto out_release;
 
@@ -1118,7 +1106,7 @@ long splice_file_to_pipe(struct file *in,
        pipe_lock(opipe);
        ret = wait_for_space(opipe, flags);
        if (!ret)
-               ret = do_splice_to(in, offset, opipe, len, flags);
+               ret = vfs_splice_read(in, offset, opipe, len, flags);
        pipe_unlock(opipe);
        if (ret > 0)
                wakeup_pipe_readers(opipe);
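
With do_splice_to() promoted to the exported vfs_splice_read(), in-kernel users no longer need to open-code the O_DIRECT/DAX special case: the helper itself falls back to copy_splice_read() for those files and to the file's ->splice_read() otherwise. A rough usage sketch, assuming the declaration is picked up from linux/splice.h as in this series; the caller must already hold the pipe lock, as the kerneldoc above states.

#include <linux/fs.h>
#include <linux/splice.h>

/* Illustrative only: splice up to "len" bytes from "in" at *pos into an
 * already-locked pipe.  Returns bytes spliced, 0 at EOF/hole, or -errno. */
static long example_fill_pipe(struct file *in, loff_t *pos,
                              struct pipe_inode_info *pipe, size_t len)
{
        return vfs_splice_read(in, pos, pipe, len, 0);
}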
index 0ba34c1..96d1c3e 100644 (file)
@@ -130,6 +130,7 @@ static int do_statfs_native(struct kstatfs *st, struct statfs __user *p)
        if (sizeof(buf) == sizeof(*st))
                memcpy(&buf, st, sizeof(*st));
        else {
+               memset(&buf, 0, sizeof(buf));
                if (sizeof buf.f_blocks == 4) {
                        if ((st->f_blocks | st->f_bfree | st->f_bavail |
                             st->f_bsize | st->f_frsize) &
@@ -158,7 +159,6 @@ static int do_statfs_native(struct kstatfs *st, struct statfs __user *p)
                buf.f_namelen = st->f_namelen;
                buf.f_frsize = st->f_frsize;
                buf.f_flags = st->f_flags;
-               memset(buf.f_spare, 0, sizeof(buf.f_spare));
        }
        if (copy_to_user(p, &buf, sizeof(buf)))
                return -EFAULT;
@@ -171,6 +171,7 @@ static int do_statfs64(struct kstatfs *st, struct statfs64 __user *p)
        if (sizeof(buf) == sizeof(*st))
                memcpy(&buf, st, sizeof(*st));
        else {
+               memset(&buf, 0, sizeof(buf));
                buf.f_type = st->f_type;
                buf.f_bsize = st->f_bsize;
                buf.f_blocks = st->f_blocks;
@@ -182,7 +183,6 @@ static int do_statfs64(struct kstatfs *st, struct statfs64 __user *p)
                buf.f_namelen = st->f_namelen;
                buf.f_frsize = st->f_frsize;
                buf.f_flags = st->f_flags;
-               memset(buf.f_spare, 0, sizeof(buf.f_spare));
        }
        if (copy_to_user(p, &buf, sizeof(buf)))
                return -EFAULT;
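
The two statfs hunks close a kernel-to-user infoleak: the userspace structures contain spare fields and padding, and clearing only f_spare left other uninitialised bytes in the stack copy that was then copied out. The safe pattern is to zero the whole structure before filling it; a sketch with an invented structure for illustration.

#include <linux/string.h>
#include <linux/types.h>
#include <linux/uaccess.h>

struct example_user_buf {
        u32 value;
        u32 spare[4];   /* spare fields and padding also reach userspace */
};

static int example_copy_out(struct example_user_buf __user *p, u32 value)
{
        struct example_user_buf buf;

        memset(&buf, 0, sizeof(buf));   /* zero everything, padding included */
        buf.value = value;
        return copy_to_user(p, &buf, sizeof(buf)) ? -EFAULT : 0;
}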
index 34afe41..05ff6ab 100644 (file)
@@ -54,7 +54,7 @@ static char *sb_writers_name[SB_FREEZE_LEVELS] = {
  * One thing we have to be careful of with a per-sb shrinker is that we don't
  * drop the last active reference to the superblock from within the shrinker.
  * If that happens we could trigger unregistering the shrinker from within the
- * shrinker path and that leads to deadlock on the shrinker_mutex. Hence we
+ * shrinker path and that leads to deadlock on the shrinker_rwsem. Hence we
  * take a passive reference to the superblock to avoid this from occurring.
  */
 static unsigned long super_cache_scan(struct shrinker *shrink,
@@ -595,7 +595,7 @@ retry:
        fc->s_fs_info = NULL;
        s->s_type = fc->fs_type;
        s->s_iflags |= fc->s_iflags;
-       strlcpy(s->s_id, s->s_type->name, sizeof(s->s_id));
+       strscpy(s->s_id, s->s_type->name, sizeof(s->s_id));
        list_add_tail(&s->s_list, &super_blocks);
        hlist_add_head(&s->s_instances, &s->s_type->fs_supers);
        spin_unlock(&sb_lock);
@@ -674,7 +674,7 @@ retry:
                return ERR_PTR(err);
        }
        s->s_type = type;
-       strlcpy(s->s_id, type->name, sizeof(s->s_id));
+       strscpy(s->s_id, type->name, sizeof(s->s_id));
        list_add_tail(&s->s_list, &super_blocks);
        hlist_add_head(&s->s_instances, &type->fs_supers);
        spin_unlock(&sb_lock);
@@ -903,6 +903,7 @@ int reconfigure_super(struct fs_context *fc)
        struct super_block *sb = fc->root->d_sb;
        int retval;
        bool remount_ro = false;
+       bool remount_rw = false;
        bool force = fc->sb_flags & SB_FORCE;
 
        if (fc->sb_flags_mask & ~MS_RMT_MASK)
@@ -920,7 +921,7 @@ int reconfigure_super(struct fs_context *fc)
                    bdev_read_only(sb->s_bdev))
                        return -EACCES;
 #endif
-
+               remount_rw = !(fc->sb_flags & SB_RDONLY) && sb_rdonly(sb);
                remount_ro = (fc->sb_flags & SB_RDONLY) && !sb_rdonly(sb);
        }
 
@@ -943,13 +944,18 @@ int reconfigure_super(struct fs_context *fc)
         */
        if (remount_ro) {
                if (force) {
-                       sb->s_readonly_remount = 1;
-                       smp_wmb();
+                       sb_start_ro_state_change(sb);
                } else {
                        retval = sb_prepare_remount_readonly(sb);
                        if (retval)
                                return retval;
                }
+       } else if (remount_rw) {
+               /*
+                * Protect filesystem's reconfigure code from writes from
+                * userspace until reconfigure finishes.
+                */
+               sb_start_ro_state_change(sb);
        }
 
        if (fc->ops->reconfigure) {
@@ -965,9 +971,7 @@ int reconfigure_super(struct fs_context *fc)
 
        WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) |
                                 (fc->sb_flags & fc->sb_flags_mask)));
-       /* Needs to be ordered wrt mnt_is_readonly() */
-       smp_wmb();
-       sb->s_readonly_remount = 0;
+       sb_end_ro_state_change(sb);
 
        /*
         * Some filesystems modify their metadata via some other path than the
@@ -982,7 +986,7 @@ int reconfigure_super(struct fs_context *fc)
        return 0;
 
 cancel_readonly:
-       sb->s_readonly_remount = 0;
+       sb_end_ro_state_change(sb);
        return retval;
 }
 
@@ -1206,6 +1210,22 @@ int get_tree_keyed(struct fs_context *fc,
 EXPORT_SYMBOL(get_tree_keyed);
 
 #ifdef CONFIG_BLOCK
+static void fs_mark_dead(struct block_device *bdev)
+{
+       struct super_block *sb;
+
+       sb = get_super(bdev);
+       if (!sb)
+               return;
+
+       if (sb->s_op->shutdown)
+               sb->s_op->shutdown(sb);
+       drop_super(sb);
+}
+
+static const struct blk_holder_ops fs_holder_ops = {
+       .mark_dead              = fs_mark_dead,
+};
 
 static int set_bdev_super(struct super_block *s, void *data)
 {
@@ -1239,16 +1259,13 @@ int get_tree_bdev(struct fs_context *fc,
 {
        struct block_device *bdev;
        struct super_block *s;
-       fmode_t mode = FMODE_READ | FMODE_EXCL;
        int error = 0;
 
-       if (!(fc->sb_flags & SB_RDONLY))
-               mode |= FMODE_WRITE;
-
        if (!fc->source)
                return invalf(fc, "No source specified");
 
-       bdev = blkdev_get_by_path(fc->source, mode, fc->fs_type);
+       bdev = blkdev_get_by_path(fc->source, sb_open_mode(fc->sb_flags),
+                                 fc->fs_type, &fs_holder_ops);
        if (IS_ERR(bdev)) {
                errorf(fc, "%s: Can't open blockdev", fc->source);
                return PTR_ERR(bdev);
@@ -1262,7 +1279,7 @@ int get_tree_bdev(struct fs_context *fc,
        if (bdev->bd_fsfreeze_count > 0) {
                mutex_unlock(&bdev->bd_fsfreeze_mutex);
                warnf(fc, "%pg: Can't mount, blockdev is frozen", bdev);
-               blkdev_put(bdev, mode);
+               blkdev_put(bdev, fc->fs_type);
                return -EBUSY;
        }
 
@@ -1271,7 +1288,7 @@ int get_tree_bdev(struct fs_context *fc,
        s = sget_fc(fc, test_bdev_super_fc, set_bdev_super_fc);
        mutex_unlock(&bdev->bd_fsfreeze_mutex);
        if (IS_ERR(s)) {
-               blkdev_put(bdev, mode);
+               blkdev_put(bdev, fc->fs_type);
                return PTR_ERR(s);
        }
 
@@ -1280,7 +1297,7 @@ int get_tree_bdev(struct fs_context *fc,
                if ((fc->sb_flags ^ s->s_flags) & SB_RDONLY) {
                        warnf(fc, "%pg: Can't mount, would change RO state", bdev);
                        deactivate_locked_super(s);
-                       blkdev_put(bdev, mode);
+                       blkdev_put(bdev, fc->fs_type);
                        return -EBUSY;
                }
 
@@ -1292,10 +1309,9 @@ int get_tree_bdev(struct fs_context *fc,
                 * holding an active reference.
                 */
                up_write(&s->s_umount);
-               blkdev_put(bdev, mode);
+               blkdev_put(bdev, fc->fs_type);
                down_write(&s->s_umount);
        } else {
-               s->s_mode = mode;
                snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev);
                shrinker_debugfs_rename(&s->s_shrink, "sb-%s:%s",
                                        fc->fs_type->name, s->s_id);
@@ -1327,13 +1343,10 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
 {
        struct block_device *bdev;
        struct super_block *s;
-       fmode_t mode = FMODE_READ | FMODE_EXCL;
        int error = 0;
 
-       if (!(flags & SB_RDONLY))
-               mode |= FMODE_WRITE;
-
-       bdev = blkdev_get_by_path(dev_name, mode, fs_type);
+       bdev = blkdev_get_by_path(dev_name, sb_open_mode(flags), fs_type,
+                                 &fs_holder_ops);
        if (IS_ERR(bdev))
                return ERR_CAST(bdev);
 
@@ -1369,10 +1382,9 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
                 * holding an active reference.
                 */
                up_write(&s->s_umount);
-               blkdev_put(bdev, mode);
+               blkdev_put(bdev, fs_type);
                down_write(&s->s_umount);
        } else {
-               s->s_mode = mode;
                snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev);
                shrinker_debugfs_rename(&s->s_shrink, "sb-%s:%s",
                                        fs_type->name, s->s_id);
@@ -1392,7 +1404,7 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
 error_s:
        error = PTR_ERR(s);
 error_bdev:
-       blkdev_put(bdev, mode);
+       blkdev_put(bdev, fs_type);
 error:
        return ERR_PTR(error);
 }
@@ -1401,13 +1413,11 @@ EXPORT_SYMBOL(mount_bdev);
 void kill_block_super(struct super_block *sb)
 {
        struct block_device *bdev = sb->s_bdev;
-       fmode_t mode = sb->s_mode;
 
        bdev->bd_super = NULL;
        generic_shutdown_super(sb);
        sync_blockdev(bdev);
-       WARN_ON_ONCE(!(mode & FMODE_EXCL));
-       blkdev_put(bdev, mode | FMODE_EXCL);
+       blkdev_put(bdev, sb->s_type);
 }
 
 EXPORT_SYMBOL(kill_block_super);
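
The super.c changes switch block-device opens to the holder-ops interface: the filesystem type is now the holder passed to blkdev_put(), and the fs_holder_ops table added above lets the block layer call back into the VFS when the device is marked dead, which in turn invokes the superblock's ->shutdown method if one is provided. A hedged sketch of what an individual filesystem plugs into that path; the examplefs names are invented and the body is deliberately left as a comment.

#include <linux/fs.h>

/* Called via fs_mark_dead() -> sb->s_op->shutdown() when the underlying
 * block device goes away (e.g. surprise removal). */
static void examplefs_shutdown(struct super_block *sb)
{
        /* Hypothetical: a real filesystem would flag its private state here
         * so that subsequent writes and syncs fail fast instead of touching
         * the dead device. */
}

static const struct super_operations examplefs_sops = {
        .shutdown       = examplefs_shutdown,
};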
index cdb3d63..0140010 100644 (file)
@@ -52,7 +52,7 @@ static int sysv_handle_dirsync(struct inode *dir)
 }
 
 /*
- * Calls to dir_get_page()/put_and_unmap_page() must be nested according to the
+ * Calls to dir_get_page()/unmap_and_put_page() must be nested according to the
  * rules documented in mm/highmem.rst.
  *
  * NOTE: sysv_find_entry() and sysv_dotdot() act as calls to dir_get_page()
@@ -103,11 +103,11 @@ static int sysv_readdir(struct file *file, struct dir_context *ctx)
                        if (!dir_emit(ctx, name, strnlen(name,SYSV_NAMELEN),
                                        fs16_to_cpu(SYSV_SB(sb), de->inode),
                                        DT_UNKNOWN)) {
-                               put_and_unmap_page(page, kaddr);
+                               unmap_and_put_page(page, kaddr);
                                return 0;
                        }
                }
-               put_and_unmap_page(page, kaddr);
+               unmap_and_put_page(page, kaddr);
        }
        return 0;
 }
@@ -131,7 +131,7 @@ static inline int namecompare(int len, int maxlen,
  * itself (as a parameter - res_dir). It does NOT read the inode of the
  * entry - you'll have to do that yourself if you want to.
  *
- * On Success put_and_unmap_page() should be called on *res_page.
+ * On Success unmap_and_put_page() should be called on *res_page.
  *
  * sysv_find_entry() acts as a call to dir_get_page() and must be treated
  * accordingly for nesting purposes.
@@ -166,7 +166,7 @@ struct sysv_dir_entry *sysv_find_entry(struct dentry *dentry, struct page **res_
                                                        name, de->name))
                                        goto found;
                        }
-                       put_and_unmap_page(page, kaddr);
+                       unmap_and_put_page(page, kaddr);
                }
 
                if (++n >= npages)
@@ -209,7 +209,7 @@ int sysv_add_link(struct dentry *dentry, struct inode *inode)
                                goto out_page;
                        de++;
                }
-               put_and_unmap_page(page, kaddr);
+               unmap_and_put_page(page, kaddr);
        }
        BUG();
        return -EINVAL;
@@ -228,7 +228,7 @@ got_it:
        mark_inode_dirty(dir);
        err = sysv_handle_dirsync(dir);
 out_page:
-       put_and_unmap_page(page, kaddr);
+       unmap_and_put_page(page, kaddr);
        return err;
 out_unlock:
        unlock_page(page);
@@ -321,12 +321,12 @@ int sysv_empty_dir(struct inode * inode)
                        if (de->name[1] != '.' || de->name[2])
                                goto not_empty;
                }
-               put_and_unmap_page(page, kaddr);
+               unmap_and_put_page(page, kaddr);
        }
        return 1;
 
 not_empty:
-       put_and_unmap_page(page, kaddr);
+       unmap_and_put_page(page, kaddr);
        return 0;
 }
 
@@ -352,7 +352,7 @@ int sysv_set_link(struct sysv_dir_entry *de, struct page *page,
 }
 
 /*
- * Calls to dir_get_page()/put_and_unmap_page() must be nested according to the
+ * Calls to dir_get_page()/unmap_and_put_page() must be nested according to the
  * rules documented in mm/highmem.rst.
  *
  * sysv_dotdot() acts as a call to dir_get_page() and must be treated
@@ -376,7 +376,7 @@ ino_t sysv_inode_by_name(struct dentry *dentry)
        
        if (de) {
                res = fs16_to_cpu(SYSV_SB(dentry->d_sb), de->inode);
-               put_and_unmap_page(page, de);
+               unmap_and_put_page(page, de);
        }
        return res;
 }
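
The sysv hunks are a mechanical rename from put_and_unmap_page() to unmap_and_put_page(), but the pairing discipline the comments keep referring to is worth spelling out. Below is a simplified sketch of the lookup/release pattern; the lookup helper is invented and much reduced compared with the real dir_get_page().

#include <linux/err.h>
#include <linux/highmem.h>
#include <linux/pagemap.h>

/* Map one directory page; the caller releases it with unmap_and_put_page(),
 * and nested mappings must be released in reverse order per mm/highmem.rst. */
static void *example_dir_get_page(struct inode *dir, unsigned long n,
                                  struct page **pagep)
{
        struct page *page = read_mapping_page(dir->i_mapping, n, NULL);

        if (IS_ERR(page))
                return ERR_CAST(page);
        *pagep = page;
        return kmap_local_page(page);
}

/* ... walk the entries in the returned kaddr ... then:
 *         unmap_and_put_page(page, kaddr);
 */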
index 50eb925..c645f60 100644 (file)
@@ -26,7 +26,7 @@ const struct file_operations sysv_file_operations = {
        .write_iter     = generic_file_write_iter,
        .mmap           = generic_file_mmap,
        .fsync          = generic_file_fsync,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
 };
 
 static int sysv_setattr(struct mnt_idmap *idmap,
index b22764f..58d7f43 100644 (file)
@@ -145,6 +145,10 @@ static int alloc_branch(struct inode *inode,
                 */
                parent = block_to_cpu(SYSV_SB(inode->i_sb), branch[n-1].key);
                bh = sb_getblk(inode->i_sb, parent);
+               if (!bh) {
+                       sysv_free_block(inode->i_sb, branch[n].key);
+                       break;
+               }
                lock_buffer(bh);
                memset(bh->b_data, 0, blocksize);
                branch[n].bh = bh;
index 2b2dba4..fcf163f 100644 (file)
@@ -164,7 +164,7 @@ static int sysv_unlink(struct inode * dir, struct dentry * dentry)
                inode->i_ctime = dir->i_ctime;
                inode_dec_link_count(inode);
        }
-       put_and_unmap_page(page, de);
+       unmap_and_put_page(page, de);
        return err;
 }
 
@@ -227,7 +227,7 @@ static int sysv_rename(struct mnt_idmap *idmap, struct inode *old_dir,
                if (!new_de)
                        goto out_dir;
                err = sysv_set_link(new_de, new_page, old_inode);
-               put_and_unmap_page(new_page, new_de);
+               unmap_and_put_page(new_page, new_de);
                if (err)
                        goto out_dir;
                new_inode->i_ctime = current_time(new_inode);
@@ -256,9 +256,9 @@ static int sysv_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 
 out_dir:
        if (dir_de)
-               put_and_unmap_page(dir_page, dir_de);
+               unmap_and_put_page(dir_page, dir_de);
 out_old:
-       put_and_unmap_page(old_page, old_de);
+       unmap_and_put_page(old_page, old_de);
 out:
        return err;
 }
index 979ab1d..6738fe4 100644 (file)
@@ -1669,7 +1669,7 @@ const struct file_operations ubifs_file_operations = {
        .mmap           = ubifs_file_mmap,
        .fsync          = ubifs_fsync,
        .unlocked_ioctl = ubifs_ioctl,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
        .splice_write   = iter_file_splice_write,
        .open           = fscrypt_file_open,
 #ifdef CONFIG_COMPAT
index 8238f74..29daf5d 100644 (file)
@@ -209,7 +209,7 @@ const struct file_operations udf_file_operations = {
        .write_iter             = udf_file_write_iter,
        .release                = udf_release_file,
        .fsync                  = generic_file_fsync,
-       .splice_read            = generic_file_splice_read,
+       .splice_read            = filemap_splice_read,
        .splice_write           = iter_file_splice_write,
        .llseek                 = generic_file_llseek,
 };
index fd20423..fd29a66 100644 (file)
@@ -793,11 +793,6 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
                        if (!empty_dir(new_inode))
                                goto out_oiter;
                }
-               /*
-                * We need to protect against old_inode getting converted from
-                * ICB to normal directory.
-                */
-               inode_lock_nested(old_inode, I_MUTEX_NONDIR2);
                retval = udf_fiiter_find_entry(old_inode, &dotdot_name,
                                               &diriter);
                if (retval == -ENOENT) {
@@ -806,10 +801,8 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
                                old_inode->i_ino);
                        retval = -EFSCORRUPTED;
                }
-               if (retval) {
-                       inode_unlock(old_inode);
+               if (retval)
                        goto out_oiter;
-               }
                has_diriter = true;
                tloc = lelb_to_cpu(diriter.fi.icb.extLocation);
                if (udf_get_lb_pblock(old_inode->i_sb, &tloc, 0) !=
@@ -889,7 +882,6 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
                               udf_dir_entry_len(&diriter.fi));
                udf_fiiter_write_fi(&diriter, NULL);
                udf_fiiter_release(&diriter);
-               inode_unlock(old_inode);
 
                inode_dec_link_count(old_dir);
                if (new_inode)
@@ -901,10 +893,8 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
        }
        return 0;
 out_oiter:
-       if (has_diriter) {
+       if (has_diriter)
                udf_fiiter_release(&diriter);
-               inode_unlock(old_inode);
-       }
        udf_fiiter_release(&oiter);
 
        return retval;
index 7e08758..6558882 100644 (file)
@@ -41,5 +41,5 @@ const struct file_operations ufs_file_operations = {
        .mmap           = generic_file_mmap,
        .open           = generic_file_open,
        .fsync          = generic_file_fsync,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = filemap_splice_read,
 };
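
A long run of one-line hunks in this merge replaces generic_file_splice_read (removed from splice.c above) with filemap_splice_read in each filesystem's file_operations. For a plain pagecache-backed file the table ends up looking like the sketch below; the examplefs name is invented, the callbacks are the stock generic helpers used by the filesystems touched here.

#include <linux/fs.h>

static const struct file_operations examplefs_file_operations = {
        .llseek         = generic_file_llseek,
        .read_iter      = generic_file_read_iter,
        .write_iter     = generic_file_write_iter,
        .mmap           = generic_file_mmap,
        .fsync          = generic_file_fsync,
        .splice_read    = filemap_splice_read,  /* was generic_file_splice_read */
        .splice_write   = iter_file_splice_write,
};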
index 0fd96d6..4e800bb 100644 (file)
@@ -1332,6 +1332,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
        bool basic_ioctls;
        unsigned long start, end, vma_end;
        struct vma_iterator vmi;
+       pgoff_t pgoff;
 
        user_uffdio_register = (struct uffdio_register __user *) arg;
 
@@ -1459,6 +1460,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 
        vma_iter_set(&vmi, start);
        prev = vma_prev(&vmi);
+       if (vma->vm_start < start)
+               prev = vma;
 
        ret = 0;
        for_each_vma_range(vmi, vma, end) {
@@ -1482,8 +1485,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
                vma_end = min(end, vma->vm_end);
 
                new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags;
+               pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
                prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
-                                vma->anon_vma, vma->vm_file, vma->vm_pgoff,
+                                vma->anon_vma, vma->vm_file, pgoff,
                                 vma_policy(vma),
                                 ((struct vm_userfaultfd_ctx){ ctx }),
                                 anon_vma_name(vma));
@@ -1563,6 +1567,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
        unsigned long start, end, vma_end;
        const void __user *buf = (void __user *)arg;
        struct vma_iterator vmi;
+       pgoff_t pgoff;
 
        ret = -EFAULT;
        if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
@@ -1625,6 +1630,9 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 
        vma_iter_set(&vmi, start);
        prev = vma_prev(&vmi);
+       if (vma->vm_start < start)
+               prev = vma;
+
        ret = 0;
        for_each_vma_range(vmi, vma, end) {
                cond_resched();
@@ -1662,8 +1670,9 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
                        uffd_wp_range(vma, start, vma_end - start, false);
 
                new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
+               pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
                prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
-                                vma->anon_vma, vma->vm_file, vma->vm_pgoff,
+                                vma->anon_vma, vma->vm_file, pgoff,
                                 vma_policy(vma),
                                 NULL_VM_UFFD_CTX, anon_vma_name(vma));
                if (prev) {
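
The userfaultfd fix recomputes the file offset before handing a partially covered VMA to vma_merge(). A worked example of the arithmetic, with addresses assumed purely for illustration:

/* Suppose a file-backed VMA spans [0x1000, 0x9000) with vm_pgoff = 0 and the
 * register range starts at 0x5000.  The region handed to vma_merge() must
 * describe file offsets starting at page 4, so:
 *
 *     pgoff = vm_pgoff + ((start - vm_start) >> PAGE_SHIFT)
 *           = 0        + ((0x5000 - 0x1000) >> 12)
 *           = 4
 *
 * Passing the unadjusted vm_pgoff (0) would let vma_merge() glue together
 * ranges whose file offsets do not actually line up.
 */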
index 572aa1c..2307f80 100644 (file)
@@ -217,7 +217,7 @@ const struct file_operations vboxsf_reg_fops = {
        .open = vboxsf_file_open,
        .release = vboxsf_file_release,
        .fsync = noop_fsync,
-       .splice_read = generic_file_splice_read,
+       .splice_read = filemap_splice_read,
 };
 
 const struct inode_operations vboxsf_reg_iops = {
index a7ffd71..e1036e5 100644 (file)
@@ -39,14 +39,14 @@ config FS_VERITY_BUILTIN_SIGNATURES
        depends on FS_VERITY
        select SYSTEM_DATA_VERIFICATION
        help
-         Support verifying signatures of verity files against the X.509
-         certificates that have been loaded into the ".fs-verity"
-         kernel keyring.
+         This option adds support for in-kernel verification of
+         fs-verity builtin signatures.
 
-         This is meant as a relatively simple mechanism that can be
-         used to provide an authenticity guarantee for verity files, as
-         an alternative to IMA appraisal.  Userspace programs still
-         need to check that the verity bit is set in order to get an
-         authenticity guarantee.
+         Please take great care before using this feature.  It is not
+         the only way to do signatures with fs-verity, and the
+         alternatives (such as userspace signature verification, and
+         IMA appraisal) can be much better.  For details about the
+         limitations of this feature, see
+         Documentation/filesystems/fsverity.rst.
 
          If unsure, say N.
index fc4c50e..c284f46 100644 (file)
@@ -7,6 +7,7 @@
 
 #include "fsverity_private.h"
 
+#include <crypto/hash.h>
 #include <linux/mount.h>
 #include <linux/sched/signal.h>
 #include <linux/uaccess.h>
@@ -20,7 +21,7 @@ struct block_buffer {
 /* Hash a block, writing the result to the next level's pending block buffer. */
 static int hash_one_block(struct inode *inode,
                          const struct merkle_tree_params *params,
-                         struct ahash_request *req, struct block_buffer *cur)
+                         struct block_buffer *cur)
 {
        struct block_buffer *next = cur + 1;
        int err;
@@ -36,8 +37,7 @@ static int hash_one_block(struct inode *inode,
        /* Zero-pad the block if it's shorter than the block size. */
        memset(&cur->data[cur->filled], 0, params->block_size - cur->filled);
 
-       err = fsverity_hash_block(params, inode, req, virt_to_page(cur->data),
-                                 offset_in_page(cur->data),
+       err = fsverity_hash_block(params, inode, cur->data,
                                  &next->data[next->filled]);
        if (err)
                return err;
@@ -76,7 +76,6 @@ static int build_merkle_tree(struct file *filp,
        struct inode *inode = file_inode(filp);
        const u64 data_size = inode->i_size;
        const int num_levels = params->num_levels;
-       struct ahash_request *req;
        struct block_buffer _buffers[1 + FS_VERITY_MAX_LEVELS + 1] = {};
        struct block_buffer *buffers = &_buffers[1];
        unsigned long level_offset[FS_VERITY_MAX_LEVELS];
@@ -90,9 +89,6 @@ static int build_merkle_tree(struct file *filp,
                return 0;
        }
 
-       /* This allocation never fails, since it's mempool-backed. */
-       req = fsverity_alloc_hash_request(params->hash_alg, GFP_KERNEL);
-
        /*
         * Allocate the block buffers.  Buffer "-1" is for data blocks.
         * Buffers 0 <= level < num_levels are for the actual tree levels.
@@ -130,7 +126,7 @@ static int build_merkle_tree(struct file *filp,
                        fsverity_err(inode, "Short read of file data");
                        goto out;
                }
-               err = hash_one_block(inode, params, req, &buffers[-1]);
+               err = hash_one_block(inode, params, &buffers[-1]);
                if (err)
                        goto out;
                for (level = 0; level < num_levels; level++) {
@@ -141,8 +137,7 @@ static int build_merkle_tree(struct file *filp,
                        }
                        /* Next block at @level is full */
 
-                       err = hash_one_block(inode, params, req,
-                                            &buffers[level]);
+                       err = hash_one_block(inode, params, &buffers[level]);
                        if (err)
                                goto out;
                        err = write_merkle_tree_block(inode,
@@ -162,8 +157,7 @@ static int build_merkle_tree(struct file *filp,
        /* Finish all nonempty pending tree blocks. */
        for (level = 0; level < num_levels; level++) {
                if (buffers[level].filled != 0) {
-                       err = hash_one_block(inode, params, req,
-                                            &buffers[level]);
+                       err = hash_one_block(inode, params, &buffers[level]);
                        if (err)
                                goto out;
                        err = write_merkle_tree_block(inode,
@@ -183,7 +177,6 @@ static int build_merkle_tree(struct file *filp,
 out:
        for (level = -1; level < num_levels; level++)
                kfree(buffers[level].data);
-       fsverity_free_hash_request(params->hash_alg, req);
        return err;
 }
 
@@ -215,7 +208,7 @@ static int enable_verity(struct file *filp,
        }
        desc->salt_size = arg->salt_size;
 
-       /* Get the signature if the user provided one */
+       /* Get the builtin signature if the user provided one */
        if (arg->sig_size &&
            copy_from_user(desc->signature, u64_to_user_ptr(arg->sig_ptr),
                           arg->sig_size)) {
index d34dcc0..49bf3a1 100644 (file)
@@ -11,9 +11,6 @@
 #define pr_fmt(fmt) "fs-verity: " fmt
 
 #include <linux/fsverity.h>
-#include <linux/mempool.h>
-
-struct ahash_request;
 
 /*
  * Implementation limit: maximum depth of the Merkle tree.  For now 8 is plenty;
@@ -23,11 +20,10 @@ struct ahash_request;
 
 /* A hash algorithm supported by fs-verity */
 struct fsverity_hash_alg {
-       struct crypto_ahash *tfm; /* hash tfm, allocated on demand */
+       struct crypto_shash *tfm; /* hash tfm, allocated on demand */
        const char *name;         /* crypto API name, e.g. sha256 */
        unsigned int digest_size; /* digest size in bytes, e.g. 32 for SHA-256 */
        unsigned int block_size;  /* block size in bytes, e.g. 64 for SHA-256 */
-       mempool_t req_pool;       /* mempool with a preallocated hash request */
        /*
         * The HASH_ALGO_* constant for this algorithm.  This is different from
         * FS_VERITY_HASH_ALG_*, which uses a different numbering scheme.
@@ -37,7 +33,7 @@ struct fsverity_hash_alg {
 
 /* Merkle tree parameters: hash algorithm, initial hash state, and topology */
 struct merkle_tree_params {
-       struct fsverity_hash_alg *hash_alg; /* the hash algorithm */
+       const struct fsverity_hash_alg *hash_alg; /* the hash algorithm */
        const u8 *hashstate;            /* initial hash state or NULL */
        unsigned int digest_size;       /* same as hash_alg->digest_size */
        unsigned int block_size;        /* size of data and tree blocks */
@@ -83,18 +79,13 @@ struct fsverity_info {
 
 extern struct fsverity_hash_alg fsverity_hash_algs[];
 
-struct fsverity_hash_alg *fsverity_get_hash_alg(const struct inode *inode,
-                                               unsigned int num);
-struct ahash_request *fsverity_alloc_hash_request(struct fsverity_hash_alg *alg,
-                                                 gfp_t gfp_flags);
-void fsverity_free_hash_request(struct fsverity_hash_alg *alg,
-                               struct ahash_request *req);
-const u8 *fsverity_prepare_hash_state(struct fsverity_hash_alg *alg,
+const struct fsverity_hash_alg *fsverity_get_hash_alg(const struct inode *inode,
+                                                     unsigned int num);
+const u8 *fsverity_prepare_hash_state(const struct fsverity_hash_alg *alg,
                                      const u8 *salt, size_t salt_size);
 int fsverity_hash_block(const struct merkle_tree_params *params,
-                       const struct inode *inode, struct ahash_request *req,
-                       struct page *page, unsigned int offset, u8 *out);
-int fsverity_hash_buffer(struct fsverity_hash_alg *alg,
+                       const struct inode *inode, const void *data, u8 *out);
+int fsverity_hash_buffer(const struct fsverity_hash_alg *alg,
                         const void *data, size_t size, u8 *out);
 void __init fsverity_check_hash_algs(void);
 
index ea00dbe..c598d20 100644 (file)
@@ -8,7 +8,6 @@
 #include "fsverity_private.h"
 
 #include <crypto/hash.h>
-#include <linux/scatterlist.h>
 
 /* The hash algorithms supported by fs-verity */
 struct fsverity_hash_alg fsverity_hash_algs[] = {
@@ -40,11 +39,11 @@ static DEFINE_MUTEX(fsverity_hash_alg_init_mutex);
  *
  * Return: pointer to the hash alg on success, else an ERR_PTR()
  */
-struct fsverity_hash_alg *fsverity_get_hash_alg(const struct inode *inode,
-                                               unsigned int num)
+const struct fsverity_hash_alg *fsverity_get_hash_alg(const struct inode *inode,
+                                                     unsigned int num)
 {
        struct fsverity_hash_alg *alg;
-       struct crypto_ahash *tfm;
+       struct crypto_shash *tfm;
        int err;
 
        if (num >= ARRAY_SIZE(fsverity_hash_algs) ||
@@ -63,11 +62,7 @@ struct fsverity_hash_alg *fsverity_get_hash_alg(const struct inode *inode,
        if (alg->tfm != NULL)
                goto out_unlock;
 
-       /*
-        * Using the shash API would make things a bit simpler, but the ahash
-        * API is preferable as it allows the use of crypto accelerators.
-        */
-       tfm = crypto_alloc_ahash(alg->name, 0, 0);
+       tfm = crypto_alloc_shash(alg->name, 0, 0);
        if (IS_ERR(tfm)) {
                if (PTR_ERR(tfm) == -ENOENT) {
                        fsverity_warn(inode,
@@ -84,26 +79,20 @@ struct fsverity_hash_alg *fsverity_get_hash_alg(const struct inode *inode,
        }
 
        err = -EINVAL;
-       if (WARN_ON_ONCE(alg->digest_size != crypto_ahash_digestsize(tfm)))
+       if (WARN_ON_ONCE(alg->digest_size != crypto_shash_digestsize(tfm)))
                goto err_free_tfm;
-       if (WARN_ON_ONCE(alg->block_size != crypto_ahash_blocksize(tfm)))
-               goto err_free_tfm;
-
-       err = mempool_init_kmalloc_pool(&alg->req_pool, 1,
-                                       sizeof(struct ahash_request) +
-                                       crypto_ahash_reqsize(tfm));
-       if (err)
+       if (WARN_ON_ONCE(alg->block_size != crypto_shash_blocksize(tfm)))
                goto err_free_tfm;
 
        pr_info("%s using implementation \"%s\"\n",
-               alg->name, crypto_ahash_driver_name(tfm));
+               alg->name, crypto_shash_driver_name(tfm));
 
        /* pairs with smp_load_acquire() above */
        smp_store_release(&alg->tfm, tfm);
        goto out_unlock;
 
 err_free_tfm:
-       crypto_free_ahash(tfm);
+       crypto_free_shash(tfm);
        alg = ERR_PTR(err);
 out_unlock:
        mutex_unlock(&fsverity_hash_alg_init_mutex);
@@ -111,42 +100,6 @@ out_unlock:
 }
 
 /**
- * fsverity_alloc_hash_request() - allocate a hash request object
- * @alg: the hash algorithm for which to allocate the request
- * @gfp_flags: memory allocation flags
- *
- * This is mempool-backed, so this never fails if __GFP_DIRECT_RECLAIM is set in
- * @gfp_flags.  However, in that case this might need to wait for all
- * previously-allocated requests to be freed.  So to avoid deadlocks, callers
- * must never need multiple requests at a time to make forward progress.
- *
- * Return: the request object on success; NULL on failure (but see above)
- */
-struct ahash_request *fsverity_alloc_hash_request(struct fsverity_hash_alg *alg,
-                                                 gfp_t gfp_flags)
-{
-       struct ahash_request *req = mempool_alloc(&alg->req_pool, gfp_flags);
-
-       if (req)
-               ahash_request_set_tfm(req, alg->tfm);
-       return req;
-}
-
-/**
- * fsverity_free_hash_request() - free a hash request object
- * @alg: the hash algorithm
- * @req: the hash request object to free
- */
-void fsverity_free_hash_request(struct fsverity_hash_alg *alg,
-                               struct ahash_request *req)
-{
-       if (req) {
-               ahash_request_zero(req);
-               mempool_free(req, &alg->req_pool);
-       }
-}
-
-/**
  * fsverity_prepare_hash_state() - precompute the initial hash state
  * @alg: hash algorithm
  * @salt: a salt which is to be prepended to all data to be hashed
@@ -155,27 +108,24 @@ void fsverity_free_hash_request(struct fsverity_hash_alg *alg,
  * Return: NULL if the salt is empty, otherwise the kmalloc()'ed precomputed
  *        initial hash state on success or an ERR_PTR() on failure.
  */
-const u8 *fsverity_prepare_hash_state(struct fsverity_hash_alg *alg,
+const u8 *fsverity_prepare_hash_state(const struct fsverity_hash_alg *alg,
                                      const u8 *salt, size_t salt_size)
 {
        u8 *hashstate = NULL;
-       struct ahash_request *req = NULL;
+       SHASH_DESC_ON_STACK(desc, alg->tfm);
        u8 *padded_salt = NULL;
        size_t padded_salt_size;
-       struct scatterlist sg;
-       DECLARE_CRYPTO_WAIT(wait);
        int err;
 
+       desc->tfm = alg->tfm;
+
        if (salt_size == 0)
                return NULL;
 
-       hashstate = kmalloc(crypto_ahash_statesize(alg->tfm), GFP_KERNEL);
+       hashstate = kmalloc(crypto_shash_statesize(alg->tfm), GFP_KERNEL);
        if (!hashstate)
                return ERR_PTR(-ENOMEM);
 
-       /* This allocation never fails, since it's mempool-backed. */
-       req = fsverity_alloc_hash_request(alg, GFP_KERNEL);
-
        /*
         * Zero-pad the salt to the next multiple of the input size of the hash
         * algorithm's compression function, e.g. 64 bytes for SHA-256 or 128
@@ -190,26 +140,18 @@ const u8 *fsverity_prepare_hash_state(struct fsverity_hash_alg *alg,
                goto err_free;
        }
        memcpy(padded_salt, salt, salt_size);
-
-       sg_init_one(&sg, padded_salt, padded_salt_size);
-       ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
-                                       CRYPTO_TFM_REQ_MAY_BACKLOG,
-                                  crypto_req_done, &wait);
-       ahash_request_set_crypt(req, &sg, NULL, padded_salt_size);
-
-       err = crypto_wait_req(crypto_ahash_init(req), &wait);
+       err = crypto_shash_init(desc);
        if (err)
                goto err_free;
 
-       err = crypto_wait_req(crypto_ahash_update(req), &wait);
+       err = crypto_shash_update(desc, padded_salt, padded_salt_size);
        if (err)
                goto err_free;
 
-       err = crypto_ahash_export(req, hashstate);
+       err = crypto_shash_export(desc, hashstate);
        if (err)
                goto err_free;
 out:
-       fsverity_free_hash_request(alg, req);
        kfree(padded_salt);
        return hashstate;
 
@@ -223,9 +165,7 @@ err_free:
  * fsverity_hash_block() - hash a single data or hash block
  * @params: the Merkle tree's parameters
  * @inode: inode for which the hashing is being done
- * @req: preallocated hash request
- * @page: the page containing the block to hash
- * @offset: the offset of the block within @page
+ * @data: virtual address of a buffer containing the block to hash
  * @out: output digest, size 'params->digest_size' bytes
  *
  * Hash a single data or hash block.  The hash is salted if a salt is specified
@@ -234,33 +174,24 @@ err_free:
  * Return: 0 on success, -errno on failure
  */
 int fsverity_hash_block(const struct merkle_tree_params *params,
-                       const struct inode *inode, struct ahash_request *req,
-                       struct page *page, unsigned int offset, u8 *out)
+                       const struct inode *inode, const void *data, u8 *out)
 {
-       struct scatterlist sg;
-       DECLARE_CRYPTO_WAIT(wait);
+       SHASH_DESC_ON_STACK(desc, params->hash_alg->tfm);
        int err;
 
-       sg_init_table(&sg, 1);
-       sg_set_page(&sg, page, params->block_size, offset);
-       ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
-                                       CRYPTO_TFM_REQ_MAY_BACKLOG,
-                                  crypto_req_done, &wait);
-       ahash_request_set_crypt(req, &sg, out, params->block_size);
+       desc->tfm = params->hash_alg->tfm;
 
        if (params->hashstate) {
-               err = crypto_ahash_import(req, params->hashstate);
+               err = crypto_shash_import(desc, params->hashstate);
                if (err) {
                        fsverity_err(inode,
                                     "Error %d importing hash state", err);
                        return err;
                }
-               err = crypto_ahash_finup(req);
+               err = crypto_shash_finup(desc, data, params->block_size, out);
        } else {
-               err = crypto_ahash_digest(req);
+               err = crypto_shash_digest(desc, data, params->block_size, out);
        }
-
-       err = crypto_wait_req(err, &wait);
        if (err)
                fsverity_err(inode, "Error %d computing block hash", err);
        return err;
@@ -273,32 +204,12 @@ int fsverity_hash_block(const struct merkle_tree_params *params,
  * @size: size of data to hash, in bytes
  * @out: output digest, size 'alg->digest_size' bytes
  *
- * Hash some data which is located in physically contiguous memory (i.e. memory
- * allocated by kmalloc(), not by vmalloc()).  No salt is used.
- *
  * Return: 0 on success, -errno on failure
  */
-int fsverity_hash_buffer(struct fsverity_hash_alg *alg,
+int fsverity_hash_buffer(const struct fsverity_hash_alg *alg,
                         const void *data, size_t size, u8 *out)
 {
-       struct ahash_request *req;
-       struct scatterlist sg;
-       DECLARE_CRYPTO_WAIT(wait);
-       int err;
-
-       /* This allocation never fails, since it's mempool-backed. */
-       req = fsverity_alloc_hash_request(alg, GFP_KERNEL);
-
-       sg_init_one(&sg, data, size);
-       ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
-                                       CRYPTO_TFM_REQ_MAY_BACKLOG,
-                                  crypto_req_done, &wait);
-       ahash_request_set_crypt(req, &sg, out, size);
-
-       err = crypto_wait_req(crypto_ahash_digest(req), &wait);
-
-       fsverity_free_hash_request(alg, req);
-       return err;
+       return crypto_shash_tfm_digest(alg->tfm, data, size, out);
 }
 
 void __init fsverity_check_hash_algs(void)
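The fs-verity hunks above move block hashing from the ahash API to the synchronous shash API: the salt is hashed once, the partial state is exported, and each block hash later imports that state before calling finup. The sketch below illustrates that export/import pattern in isolation. It is only an illustrative example, not fs-verity code; the "sha256" algorithm name, the helper name, and the buffer handling are assumptions.

#include <crypto/hash.h>
#include <linux/err.h>
#include <linux/slab.h>

/*
 * Minimal sketch of the shash export/import pattern: pre-hash a salt once,
 * save the partial state, then resume from that state to hash one block.
 */
static int example_salted_block_hash(const u8 *salt, size_t salt_size,
				     const void *block, size_t block_size,
				     u8 *out)
{
	struct crypto_shash *tfm;
	u8 *state;
	int err;

	tfm = crypto_alloc_shash("sha256", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	state = kmalloc(crypto_shash_statesize(tfm), GFP_KERNEL);
	if (!state) {
		err = -ENOMEM;
		goto out_free_tfm;
	}

	{
		SHASH_DESC_ON_STACK(desc, tfm);

		desc->tfm = tfm;

		/* Hash the salt once and export the partial state. */
		err = crypto_shash_init(desc) ?:
		      crypto_shash_update(desc, salt, salt_size) ?:
		      crypto_shash_export(desc, state);

		/* Resume from the salted state and hash one block. */
		if (!err)
			err = crypto_shash_import(desc, state) ?:
			      crypto_shash_finup(desc, block, block_size, out);
	}

	kfree(state);
out_free_tfm:
	crypto_free_shash(tfm);
	return err;
}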
index 5c79ea1..eec5956 100644 (file)
@@ -61,27 +61,42 @@ EXPORT_SYMBOL_GPL(fsverity_ioctl_measure);
 /**
  * fsverity_get_digest() - get a verity file's digest
  * @inode: inode to get digest of
- * @digest: (out) pointer to the digest
- * @alg: (out) pointer to the hash algorithm enumeration
+ * @raw_digest: (out) the raw file digest
+ * @alg: (out) the digest's algorithm, as a FS_VERITY_HASH_ALG_* value
+ * @halg: (out) the digest's algorithm, as a HASH_ALGO_* value
  *
- * Return the file hash algorithm and digest of an fsverity protected file.
- * Assumption: before calling this, the file must have been opened.
+ * Retrieves the fsverity digest of the given file.  The file must have been
+ * opened at least once since the inode was last loaded into the inode cache;
+ * otherwise this function will not recognize when fsverity is enabled.
  *
- * Return: 0 on success, -errno on failure
+ * The file's fsverity digest consists of @raw_digest in combination with either
+ * @alg or @halg.  (The caller can choose which one of @alg or @halg to use.)
+ *
+ * IMPORTANT: Callers *must* make use of one of the two algorithm IDs, since
+ * @raw_digest is meaningless without knowing which algorithm it uses!  fsverity
+ * provides no security guarantee for users who ignore the algorithm ID, even if
+ * they use the digest size (since algorithms can share the same digest size).
+ *
+ * Return: The size of the raw digest in bytes, or 0 if the file doesn't have
+ *        fsverity enabled.
  */
 int fsverity_get_digest(struct inode *inode,
-                       u8 digest[FS_VERITY_MAX_DIGEST_SIZE],
-                       enum hash_algo *alg)
+                       u8 raw_digest[FS_VERITY_MAX_DIGEST_SIZE],
+                       u8 *alg, enum hash_algo *halg)
 {
        const struct fsverity_info *vi;
        const struct fsverity_hash_alg *hash_alg;
 
        vi = fsverity_get_info(inode);
        if (!vi)
-               return -ENODATA; /* not a verity file */
+               return 0; /* not a verity file */
 
        hash_alg = vi->tree_params.hash_alg;
-       memcpy(digest, vi->file_digest, hash_alg->digest_size);
-       *alg = hash_alg->algo_id;
-       return 0;
+       memcpy(raw_digest, vi->file_digest, hash_alg->digest_size);
+       if (alg)
+               *alg = hash_alg - fsverity_hash_algs;
+       if (halg)
+               *halg = hash_alg->algo_id;
+       return hash_alg->digest_size;
 }
+EXPORT_SYMBOL_GPL(fsverity_get_digest);
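For reference, a hedged sketch of a caller using the reworked fsverity_get_digest() interface from this hunk. The helper name and the choice to request only the HASH_ALGO_* identifier are assumptions for the example; the key point is that the return value is now the digest size, with 0 meaning fs-verity is not enabled on the file.

#include <crypto/hash_info.h>
#include <linux/fsverity.h>
#include <linux/printk.h>

/* Hypothetical caller: fetch and log a file's fs-verity digest, if any. */
static int example_log_verity_digest(struct inode *inode)
{
	u8 raw_digest[FS_VERITY_MAX_DIGEST_SIZE];
	enum hash_algo halg;
	int size;

	size = fsverity_get_digest(inode, raw_digest, NULL, &halg);
	if (size == 0)
		return -ENODATA;	/* fs-verity is not enabled */

	pr_info("fs-verity digest %s:%*phN\n",
		hash_algo_name[halg], size, raw_digest);
	return 0;
}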
index 52048b7..1db5106 100644 (file)
@@ -32,7 +32,7 @@ int fsverity_init_merkle_tree_params(struct merkle_tree_params *params,
                                     unsigned int log_blocksize,
                                     const u8 *salt, size_t salt_size)
 {
-       struct fsverity_hash_alg *hash_alg;
+       const struct fsverity_hash_alg *hash_alg;
        int err;
        u64 blocks;
        u64 blocks_in_level[FS_VERITY_MAX_LEVELS];
@@ -156,9 +156,9 @@ out_err:
 
 /*
  * Compute the file digest by hashing the fsverity_descriptor excluding the
- * signature and with the sig_size field set to 0.
+ * builtin signature and with the sig_size field set to 0.
  */
-static int compute_file_digest(struct fsverity_hash_alg *hash_alg,
+static int compute_file_digest(const struct fsverity_hash_alg *hash_alg,
                               struct fsverity_descriptor *desc,
                               u8 *file_digest)
 {
@@ -174,7 +174,7 @@ static int compute_file_digest(struct fsverity_hash_alg *hash_alg,
 
 /*
  * Create a new fsverity_info from the given fsverity_descriptor (with optional
- * appended signature), and check the signature if present.  The
+ * appended builtin signature), and check the signature if present.  The
  * fsverity_descriptor must have already undergone basic validation.
  */
 struct fsverity_info *fsverity_create_info(const struct inode *inode,
@@ -319,8 +319,8 @@ static bool validate_fsverity_descriptor(struct inode *inode,
 }
 
 /*
- * Read the inode's fsverity_descriptor (with optional appended signature) from
- * the filesystem, and do basic validation of it.
+ * Read the inode's fsverity_descriptor (with optional appended builtin
+ * signature) from the filesystem, and do basic validation of it.
  */
 int fsverity_get_descriptor(struct inode *inode,
                            struct fsverity_descriptor **desc_ret)
index 2aefc55..f584327 100644 (file)
@@ -105,7 +105,7 @@ static int fsverity_read_descriptor(struct inode *inode,
        if (res)
                return res;
 
-       /* don't include the signature */
+       /* don't include the builtin signature */
        desc_size = offsetof(struct fsverity_descriptor, signature);
        desc->sig_size = 0;
 
@@ -131,7 +131,7 @@ static int fsverity_read_signature(struct inode *inode,
        }
 
        /*
-        * Include only the signature.  Note that fsverity_get_descriptor()
+        * Include only the builtin signature.  fsverity_get_descriptor()
         * already verified that sig_size is in-bounds.
         */
        res = fsverity_read_buffer(buf, offset, length, desc->signature,
index b8c51ad..72034bc 100644 (file)
@@ -5,6 +5,14 @@
  * Copyright 2019 Google LLC
  */
 
+/*
+ * This file implements verification of fs-verity builtin signatures.  Please
+ * take great care before using this feature.  It is not the only way to do
+ * signatures with fs-verity, and the alternatives (such as userspace signature
+ * verification, and IMA appraisal) can be much better.  For details about the
+ * limitations of this feature, see Documentation/filesystems/fsverity.rst.
+ */
+
 #include "fsverity_private.h"
 
 #include <linux/cred.h>
index e250822..433cef5 100644 (file)
 
 static struct workqueue_struct *fsverity_read_workqueue;
 
-static inline int cmp_hashes(const struct fsverity_info *vi,
-                            const u8 *want_hash, const u8 *real_hash,
-                            u64 data_pos, int level)
-{
-       const unsigned int hsize = vi->tree_params.digest_size;
-
-       if (memcmp(want_hash, real_hash, hsize) == 0)
-               return 0;
-
-       fsverity_err(vi->inode,
-                    "FILE CORRUPTED! pos=%llu, level=%d, want_hash=%s:%*phN, real_hash=%s:%*phN",
-                    data_pos, level,
-                    vi->tree_params.hash_alg->name, hsize, want_hash,
-                    vi->tree_params.hash_alg->name, hsize, real_hash);
-       return -EBADMSG;
-}
-
-static bool data_is_zeroed(struct inode *inode, struct page *page,
-                          unsigned int len, unsigned int offset)
-{
-       void *virt = kmap_local_page(page);
-
-       if (memchr_inv(virt + offset, 0, len)) {
-               kunmap_local(virt);
-               fsverity_err(inode,
-                            "FILE CORRUPTED!  Data past EOF is not zeroed");
-               return false;
-       }
-       kunmap_local(virt);
-       return true;
-}
-
 /*
  * Returns true if the hash block with index @hblock_idx in the tree, located in
  * @hpage, has already been verified.
@@ -122,9 +90,7 @@ static bool is_hash_block_verified(struct fsverity_info *vi, struct page *hpage,
  */
 static bool
 verify_data_block(struct inode *inode, struct fsverity_info *vi,
-                 struct ahash_request *req, struct page *data_page,
-                 u64 data_pos, unsigned int dblock_offset_in_page,
-                 unsigned long max_ra_pages)
+                 const void *data, u64 data_pos, unsigned long max_ra_pages)
 {
        const struct merkle_tree_params *params = &vi->tree_params;
        const unsigned int hsize = params->digest_size;
@@ -136,11 +102,11 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
        struct {
                /* Page containing the hash block */
                struct page *page;
+               /* Mapped address of the hash block (will be within @page) */
+               const void *addr;
                /* Index of the hash block in the tree overall */
                unsigned long index;
-               /* Byte offset of the hash block within @page */
-               unsigned int offset_in_page;
-               /* Byte offset of the wanted hash within @page */
+               /* Byte offset of the wanted hash relative to @addr */
                unsigned int hoffset;
        } hblocks[FS_VERITY_MAX_LEVELS];
        /*
@@ -148,7 +114,9 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
         * index of that block's hash within the current level.
         */
        u64 hidx = data_pos >> params->log_blocksize;
-       int err;
+
+       /* Up to 1 + FS_VERITY_MAX_LEVELS pages may be mapped at once */
+       BUILD_BUG_ON(1 + FS_VERITY_MAX_LEVELS > KM_MAX_IDX);
 
        if (unlikely(data_pos >= inode->i_size)) {
                /*
@@ -159,8 +127,12 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
                 * any part past EOF should be all zeroes.  Therefore, we need
                 * to verify that any data blocks fully past EOF are all zeroes.
                 */
-               return data_is_zeroed(inode, data_page, params->block_size,
-                                     dblock_offset_in_page);
+               if (memchr_inv(data, 0, params->block_size)) {
+                       fsverity_err(inode,
+                                    "FILE CORRUPTED!  Data past EOF is not zeroed");
+                       return false;
+               }
+               return true;
        }
 
        /*
@@ -175,6 +147,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
                unsigned int hblock_offset_in_page;
                unsigned int hoffset;
                struct page *hpage;
+               const void *haddr;
 
                /*
                 * The index of the block in the current level; also the index
@@ -192,30 +165,30 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
                hblock_offset_in_page =
                        (hblock_idx << params->log_blocksize) & ~PAGE_MASK;
 
-               /* Byte offset of the hash within the page */
-               hoffset = hblock_offset_in_page +
-                         ((hidx << params->log_digestsize) &
-                          (params->block_size - 1));
+               /* Byte offset of the hash within the block */
+               hoffset = (hidx << params->log_digestsize) &
+                         (params->block_size - 1);
 
                hpage = inode->i_sb->s_vop->read_merkle_tree_page(inode,
                                hpage_idx, level == 0 ? min(max_ra_pages,
                                        params->tree_pages - hpage_idx) : 0);
                if (IS_ERR(hpage)) {
-                       err = PTR_ERR(hpage);
                        fsverity_err(inode,
-                                    "Error %d reading Merkle tree page %lu",
-                                    err, hpage_idx);
-                       goto out;
+                                    "Error %ld reading Merkle tree page %lu",
+                                    PTR_ERR(hpage), hpage_idx);
+                       goto error;
                }
+               haddr = kmap_local_page(hpage) + hblock_offset_in_page;
                if (is_hash_block_verified(vi, hpage, hblock_idx)) {
-                       memcpy_from_page(_want_hash, hpage, hoffset, hsize);
+                       memcpy(_want_hash, haddr + hoffset, hsize);
                        want_hash = _want_hash;
+                       kunmap_local(haddr);
                        put_page(hpage);
                        goto descend;
                }
                hblocks[level].page = hpage;
+               hblocks[level].addr = haddr;
                hblocks[level].index = hblock_idx;
-               hblocks[level].offset_in_page = hblock_offset_in_page;
                hblocks[level].hoffset = hoffset;
                hidx = next_hidx;
        }
@@ -225,18 +198,14 @@ descend:
        /* Descend the tree verifying hash blocks. */
        for (; level > 0; level--) {
                struct page *hpage = hblocks[level - 1].page;
+               const void *haddr = hblocks[level - 1].addr;
                unsigned long hblock_idx = hblocks[level - 1].index;
-               unsigned int hblock_offset_in_page =
-                       hblocks[level - 1].offset_in_page;
                unsigned int hoffset = hblocks[level - 1].hoffset;
 
-               err = fsverity_hash_block(params, inode, req, hpage,
-                                         hblock_offset_in_page, real_hash);
-               if (err)
-                       goto out;
-               err = cmp_hashes(vi, want_hash, real_hash, data_pos, level - 1);
-               if (err)
-                       goto out;
+               if (fsverity_hash_block(params, inode, haddr, real_hash) != 0)
+                       goto error;
+               if (memcmp(want_hash, real_hash, hsize) != 0)
+                       goto corrupted;
                /*
                 * Mark the hash block as verified.  This must be atomic and
                 * idempotent, as the same hash block might be verified by
@@ -246,29 +215,39 @@ descend:
                        set_bit(hblock_idx, vi->hash_block_verified);
                else
                        SetPageChecked(hpage);
-               memcpy_from_page(_want_hash, hpage, hoffset, hsize);
+               memcpy(_want_hash, haddr + hoffset, hsize);
                want_hash = _want_hash;
+               kunmap_local(haddr);
                put_page(hpage);
        }
 
        /* Finally, verify the data block. */
-       err = fsverity_hash_block(params, inode, req, data_page,
-                                 dblock_offset_in_page, real_hash);
-       if (err)
-               goto out;
-       err = cmp_hashes(vi, want_hash, real_hash, data_pos, -1);
-out:
-       for (; level > 0; level--)
-               put_page(hblocks[level - 1].page);
+       if (fsverity_hash_block(params, inode, data, real_hash) != 0)
+               goto error;
+       if (memcmp(want_hash, real_hash, hsize) != 0)
+               goto corrupted;
+       return true;
 
-       return err == 0;
+corrupted:
+       fsverity_err(inode,
+                    "FILE CORRUPTED! pos=%llu, level=%d, want_hash=%s:%*phN, real_hash=%s:%*phN",
+                    data_pos, level - 1,
+                    params->hash_alg->name, hsize, want_hash,
+                    params->hash_alg->name, hsize, real_hash);
+error:
+       for (; level > 0; level--) {
+               kunmap_local(hblocks[level - 1].addr);
+               put_page(hblocks[level - 1].page);
+       }
+       return false;
 }
 
 static bool
-verify_data_blocks(struct inode *inode, struct fsverity_info *vi,
-                  struct ahash_request *req, struct folio *data_folio,
-                  size_t len, size_t offset, unsigned long max_ra_pages)
+verify_data_blocks(struct folio *data_folio, size_t len, size_t offset,
+                  unsigned long max_ra_pages)
 {
+       struct inode *inode = data_folio->mapping->host;
+       struct fsverity_info *vi = inode->i_verity_info;
        const unsigned int block_size = vi->tree_params.block_size;
        u64 pos = (u64)data_folio->index << PAGE_SHIFT;
 
@@ -278,11 +257,14 @@ verify_data_blocks(struct inode *inode, struct fsverity_info *vi,
                         folio_test_uptodate(data_folio)))
                return false;
        do {
-               struct page *data_page =
-                       folio_page(data_folio, offset >> PAGE_SHIFT);
-
-               if (!verify_data_block(inode, vi, req, data_page, pos + offset,
-                                      offset & ~PAGE_MASK, max_ra_pages))
+               void *data;
+               bool valid;
+
+               data = kmap_local_folio(data_folio, offset);
+               valid = verify_data_block(inode, vi, data, pos + offset,
+                                         max_ra_pages);
+               kunmap_local(data);
+               if (!valid)
                        return false;
                offset += block_size;
                len -= block_size;
@@ -304,19 +286,7 @@ verify_data_blocks(struct inode *inode, struct fsverity_info *vi,
  */
 bool fsverity_verify_blocks(struct folio *folio, size_t len, size_t offset)
 {
-       struct inode *inode = folio->mapping->host;
-       struct fsverity_info *vi = inode->i_verity_info;
-       struct ahash_request *req;
-       bool valid;
-
-       /* This allocation never fails, since it's mempool-backed. */
-       req = fsverity_alloc_hash_request(vi->tree_params.hash_alg, GFP_NOFS);
-
-       valid = verify_data_blocks(inode, vi, req, folio, len, offset, 0);
-
-       fsverity_free_hash_request(vi->tree_params.hash_alg, req);
-
-       return valid;
+       return verify_data_blocks(folio, len, offset, 0);
 }
 EXPORT_SYMBOL_GPL(fsverity_verify_blocks);
 
@@ -337,15 +307,9 @@ EXPORT_SYMBOL_GPL(fsverity_verify_blocks);
  */
 void fsverity_verify_bio(struct bio *bio)
 {
-       struct inode *inode = bio_first_page_all(bio)->mapping->host;
-       struct fsverity_info *vi = inode->i_verity_info;
-       struct ahash_request *req;
        struct folio_iter fi;
        unsigned long max_ra_pages = 0;
 
-       /* This allocation never fails, since it's mempool-backed. */
-       req = fsverity_alloc_hash_request(vi->tree_params.hash_alg, GFP_NOFS);
-
        if (bio->bi_opf & REQ_RAHEAD) {
                /*
                 * If this bio is for data readahead, then we also do readahead
@@ -360,14 +324,12 @@ void fsverity_verify_bio(struct bio *bio)
        }
 
        bio_for_each_folio_all(fi, bio) {
-               if (!verify_data_blocks(inode, vi, req, fi.folio, fi.length,
-                                       fi.offset, max_ra_pages)) {
+               if (!verify_data_blocks(fi.folio, fi.length, fi.offset,
+                                       max_ra_pages)) {
                        bio->bi_status = BLK_STS_IOERR;
                        break;
                }
        }
-
-       fsverity_free_hash_request(vi->tree_params.hash_alg, req);
 }
 EXPORT_SYMBOL_GPL(fsverity_verify_bio);
 #endif /* CONFIG_BLOCK */
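The verification hunks above replace page/offset plumbing with short-lived kernel mappings: each data or hash block is mapped with kmap_local_folio() or kmap_local_page(), hashed through its virtual address, and unmapped immediately. A minimal sketch of that per-block mapping loop follows; the callback type and function name are assumptions for illustration, and blocks are assumed not to cross page boundaries (as in fs-verity).

#include <linux/highmem.h>
#include <linux/pagemap.h>

/*
 * Illustrative per-block mapping loop: map one block at a time, pass its
 * kernel virtual address to a verification callback, and unmap before moving
 * on.  The callback is a placeholder, not an fs-verity interface.
 */
static bool example_verify_folio_blocks(struct folio *folio, size_t len,
					size_t offset, unsigned int block_size,
					bool (*verify)(const void *data))
{
	do {
		void *data = kmap_local_folio(folio, offset);
		bool valid = verify(data);

		kunmap_local(data);
		if (!valid)
			return false;
		offset += block_size;
		len -= block_size;
	} while (len);
	return true;
}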
index fcf67d8..e7bbb7f 100644 (file)
@@ -985,9 +985,16 @@ int xattr_list_one(char **buffer, ssize_t *remaining_size, const char *name)
        return 0;
 }
 
-/*
+/**
+ * generic_listxattr - run through a dentry's xattr list() operations
+ * @dentry: dentry to list the xattrs
+ * @buffer: result buffer
+ * @buffer_size: size of @buffer
+ *
  * Combine the results of the list() operation from every xattr_handler in the
- * list.
+ * xattr_handler stack.
+ *
+ * Note that this will not include the entries for POSIX ACLs.
  */
 ssize_t
 generic_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size)
@@ -996,10 +1003,6 @@ generic_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size)
        ssize_t remaining_size = buffer_size;
        int err = 0;
 
-       err = posix_acl_listxattr(d_inode(dentry), &buffer, &remaining_size);
-       if (err)
-               return err;
-
        for_each_xattr_handler(handlers, handler) {
                if (!handler->name || (handler->list && !handler->list(dentry)))
                        continue;
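With POSIX ACL names no longer included by generic_listxattr() itself (per the kerneldoc above), filesystems keep wiring the helper up as before. A minimal sketch of the usual hookup; the "examplefs" name is a placeholder.

#include <linux/fs.h>
#include <linux/xattr.h>

/* Placeholder inode operations showing the typical generic_listxattr hookup. */
static const struct inode_operations examplefs_file_inode_operations = {
	.listxattr	= generic_listxattr,
};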
index 1b078bb..ee84835 100644 (file)
@@ -495,10 +495,12 @@ xfs_freesp_init_recs(
                ASSERT(start >= mp->m_ag_prealloc_blocks);
                if (start != mp->m_ag_prealloc_blocks) {
                        /*
-                        * Modify first record to pad stripe align of log
+                        * Modify first record to pad stripe align of log and
+                        * bump the record count.
                         */
                        arec->ar_blockcount = cpu_to_be32(start -
                                                mp->m_ag_prealloc_blocks);
+                       be16_add_cpu(&block->bb_numrecs, 1);
                        nrec = arec + 1;
 
                        /*
@@ -509,7 +511,6 @@ xfs_freesp_init_recs(
                                        be32_to_cpu(arec->ar_startblock) +
                                        be32_to_cpu(arec->ar_blockcount));
                        arec = nrec;
-                       be16_add_cpu(&block->bb_numrecs, 1);
                }
                /*
                 * Change record start to after the internal log
@@ -518,15 +519,13 @@ xfs_freesp_init_recs(
        }
 
        /*
-        * Calculate the record block count and check for the case where
-        * the log might have consumed all available space in the AG. If
-        * so, reset the record count to 0 to avoid exposure of an invalid
-        * record start block.
+        * Calculate the block count of this record; if it is nonzero,
+        * increment the record count.
         */
        arec->ar_blockcount = cpu_to_be32(id->agsize -
                                          be32_to_cpu(arec->ar_startblock));
-       if (!arec->ar_blockcount)
-               block->bb_numrecs = 0;
+       if (arec->ar_blockcount)
+               be16_add_cpu(&block->bb_numrecs, 1);
 }
 
 /*
@@ -538,7 +537,7 @@ xfs_bnoroot_init(
        struct xfs_buf          *bp,
        struct aghdr_init_data  *id)
 {
-       xfs_btree_init_block(mp, bp, XFS_BTNUM_BNO, 0, 1, id->agno);
+       xfs_btree_init_block(mp, bp, XFS_BTNUM_BNO, 0, 0, id->agno);
        xfs_freesp_init_recs(mp, bp, id);
 }
 
@@ -548,7 +547,7 @@ xfs_cntroot_init(
        struct xfs_buf          *bp,
        struct aghdr_init_data  *id)
 {
-       xfs_btree_init_block(mp, bp, XFS_BTNUM_CNT, 0, 1, id->agno);
+       xfs_btree_init_block(mp, bp, XFS_BTNUM_CNT, 0, 0, id->agno);
        xfs_freesp_init_recs(mp, bp, id);
 }
 
@@ -985,7 +984,10 @@ xfs_ag_shrink_space(
                if (err2 != -ENOSPC)
                        goto resv_err;
 
-               __xfs_free_extent_later(*tpp, args.fsbno, delta, NULL, true);
+               err2 = __xfs_free_extent_later(*tpp, args.fsbno, delta, NULL,
+                               true);
+               if (err2)
+                       goto resv_err;
 
                /*
                 * Roll the transaction before trying to re-init the per-ag
index fdfa08c..c20fe99 100644 (file)
@@ -628,6 +628,25 @@ xfs_alloc_fixup_trees(
        return 0;
 }
 
+/*
+ * We do not verify the AGFL contents against AGF-based index counters here,
+ * even though we may have access to the perag that contains shadow copies. We
+ * don't know if the AGF based counters have been checked, and if they have they
+ * still may be inconsistent because they haven't yet been reset on the first
+ * allocation after the AGF has been read in.
+ *
+ * This means we can only check that all agfl entries contain valid or null
+ * values because we can't reliably determine the active range to exclude
+ * NULLAGBNO as a valid value.
+ *
+ * However, we can't even do that for v4 format filesystems because there are
+ * old versions of mkfs out there that do not initialise the AGFL to known,
+ * verifiable values. Hence we can't tell the difference between an AGFL block
+ * allocated by mkfs and a corrupted AGFL block here on v4 filesystems.
+ *
+ * As a result, we can only fully validate AGFL block numbers when we pull them
+ * from the freelist in xfs_alloc_get_freelist().
+ */
 static xfs_failaddr_t
 xfs_agfl_verify(
        struct xfs_buf  *bp)
@@ -637,12 +656,6 @@ xfs_agfl_verify(
        __be32          *agfl_bno = xfs_buf_to_agfl_bno(bp);
        int             i;
 
-       /*
-        * There is no verification of non-crc AGFLs because mkfs does not
-        * initialise the AGFL to zero or NULL. Hence the only valid part of the
-        * AGFL is what the AGF says is active. We can't get to the AGF, so we
-        * can't verify just those entries are valid.
-        */
        if (!xfs_has_crc(mp))
                return NULL;
 
@@ -2321,12 +2334,16 @@ xfs_free_agfl_block(
 }
 
 /*
- * Check the agfl fields of the agf for inconsistency or corruption. The purpose
- * is to detect an agfl header padding mismatch between current and early v5
- * kernels. This problem manifests as a 1-slot size difference between the
- * on-disk flcount and the active [first, last] range of a wrapped agfl. This
- * may also catch variants of agfl count corruption unrelated to padding. Either
- * way, we'll reset the agfl and warn the user.
+ * Check the agfl fields of the agf for inconsistency or corruption.
+ *
+ * The original purpose was to detect an agfl header padding mismatch between
+ * current and early v5 kernels. This problem manifests as a 1-slot size
+ * difference between the on-disk flcount and the active [first, last] range of
+ * a wrapped agfl.
+ *
+ * However, we need to use these same checks to catch agfl count corruptions
+ * unrelated to padding. This could occur on any v4 or v5 filesystem, so either
+ * way, we need to reset the agfl and warn the user.
  *
  * Return true if a reset is required before the agfl can be used, false
  * otherwise.
@@ -2342,10 +2359,6 @@ xfs_agfl_needs_reset(
        int                     agfl_size = xfs_agfl_size(mp);
        int                     active;
 
-       /* no agfl header on v4 supers */
-       if (!xfs_has_crc(mp))
-               return false;
-
        /*
         * The agf read verifier catches severe corruption of these fields.
         * Repeat some sanity checks to cover a packed -> unpacked mismatch if
@@ -2418,7 +2431,7 @@ xfs_agfl_reset(
  * the real allocation can proceed. Deferring the free disconnects freeing up
  * the AGFL slot from freeing the block.
  */
-STATIC void
+static int
 xfs_defer_agfl_block(
        struct xfs_trans                *tp,
        xfs_agnumber_t                  agno,
@@ -2437,17 +2450,21 @@ xfs_defer_agfl_block(
        xefi->xefi_blockcount = 1;
        xefi->xefi_owner = oinfo->oi_owner;
 
+       if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbno(mp, xefi->xefi_startblock)))
+               return -EFSCORRUPTED;
+
        trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
 
        xfs_extent_free_get_group(mp, xefi);
        xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &xefi->xefi_list);
+       return 0;
 }
 
 /*
  * Add the extent to the list of extents to be free at transaction end.
  * The list is maintained sorted (by block number).
  */
-void
+int
 __xfs_free_extent_later(
        struct xfs_trans                *tp,
        xfs_fsblock_t                   bno,
@@ -2474,6 +2491,9 @@ __xfs_free_extent_later(
 #endif
        ASSERT(xfs_extfree_item_cache != NULL);
 
+       if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbext(mp, bno, len)))
+               return -EFSCORRUPTED;
+
        xefi = kmem_cache_zalloc(xfs_extfree_item_cache,
                               GFP_KERNEL | __GFP_NOFAIL);
        xefi->xefi_startblock = bno;
@@ -2497,6 +2517,7 @@ __xfs_free_extent_later(
 
        xfs_extent_free_get_group(mp, xefi);
        xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &xefi->xefi_list);
+       return 0;
 }
 
 #ifdef DEBUG
@@ -2657,7 +2678,9 @@ xfs_alloc_fix_freelist(
                        goto out_agbp_relse;
 
                /* defer agfl frees */
-               xfs_defer_agfl_block(tp, args->agno, bno, &targs.oinfo);
+               error = xfs_defer_agfl_block(tp, args->agno, bno, &targs.oinfo);
+               if (error)
+                       goto out_agbp_relse;
        }
 
        targs.tp = tp;
@@ -2767,6 +2790,9 @@ xfs_alloc_get_freelist(
         */
        agfl_bno = xfs_buf_to_agfl_bno(agflbp);
        bno = be32_to_cpu(agfl_bno[be32_to_cpu(agf->agf_flfirst)]);
+       if (XFS_IS_CORRUPT(tp->t_mountp, !xfs_verify_agbno(pag, bno)))
+               return -EFSCORRUPTED;
+
        be32_add_cpu(&agf->agf_flfirst, 1);
        xfs_trans_brelse(tp, agflbp);
        if (be32_to_cpu(agf->agf_flfirst) == xfs_agfl_size(mp))
@@ -2889,6 +2915,19 @@ xfs_alloc_put_freelist(
        return 0;
 }
 
+/*
+ * Verify the AGF is consistent.
+ *
+ * We do not verify the AGFL indexes in the AGF are fully consistent here
+ * because of issues with variable on-disk structure sizes. Instead, we check
+ * the agfl indexes for consistency when we initialise the perag from the AGF
+ * information after a read completes.
+ *
+ * If the index is inconsistent, then we mark the perag as needing an AGFL
+ * reset. The first AGFL update performed then resets the AGFL indexes and
+ * refills the AGFL with known good free blocks, allowing the filesystem to
+ * continue operating normally at the cost of a few leaked free space blocks.
+ */
 static xfs_failaddr_t
 xfs_agf_verify(
        struct xfs_buf          *bp)
@@ -2962,7 +3001,6 @@ xfs_agf_verify(
                return __this_address;
 
        return NULL;
-
 }
 
 static void
@@ -3187,7 +3225,8 @@ xfs_alloc_vextent_check_args(
  */
 static int
 xfs_alloc_vextent_prepare_ag(
-       struct xfs_alloc_arg    *args)
+       struct xfs_alloc_arg    *args,
+       uint32_t                flags)
 {
        bool                    need_pag = !args->pag;
        int                     error;
@@ -3196,7 +3235,7 @@ xfs_alloc_vextent_prepare_ag(
                args->pag = xfs_perag_get(args->mp, args->agno);
 
        args->agbp = NULL;
-       error = xfs_alloc_fix_freelist(args, 0);
+       error = xfs_alloc_fix_freelist(args, flags);
        if (error) {
                trace_xfs_alloc_vextent_nofix(args);
                if (need_pag)
@@ -3336,7 +3375,7 @@ xfs_alloc_vextent_this_ag(
                return error;
        }
 
-       error = xfs_alloc_vextent_prepare_ag(args);
+       error = xfs_alloc_vextent_prepare_ag(args, 0);
        if (!error && args->agbp)
                error = xfs_alloc_ag_vextent_size(args);
 
@@ -3380,7 +3419,7 @@ restart:
        for_each_perag_wrap_range(mp, start_agno, restart_agno,
                        mp->m_sb.sb_agcount, agno, args->pag) {
                args->agno = agno;
-               error = xfs_alloc_vextent_prepare_ag(args);
+               error = xfs_alloc_vextent_prepare_ag(args, flags);
                if (error)
                        break;
                if (!args->agbp) {
@@ -3546,7 +3585,7 @@ xfs_alloc_vextent_exact_bno(
                return error;
        }
 
-       error = xfs_alloc_vextent_prepare_ag(args);
+       error = xfs_alloc_vextent_prepare_ag(args, 0);
        if (!error && args->agbp)
                error = xfs_alloc_ag_vextent_exact(args);
 
@@ -3587,7 +3626,7 @@ xfs_alloc_vextent_near_bno(
        if (needs_perag)
                args->pag = xfs_perag_grab(mp, args->agno);
 
-       error = xfs_alloc_vextent_prepare_ag(args);
+       error = xfs_alloc_vextent_prepare_ag(args, 0);
        if (!error && args->agbp)
                error = xfs_alloc_ag_vextent_near(args);
 
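Several of the hunks above stop trusting on-disk values: block numbers pulled from the AGFL or queued for deferred freeing are now validated and the operation fails with -EFSCORRUPTED instead of propagating a garbage value. A small sketch of that validate-before-use pattern; the wrapper function is hypothetical and assumes the usual XFS internal headers.

/*
 * Hypothetical helper showing the corruption-check pattern added above:
 * reject an out-of-range AG block number rather than trusting on-disk data.
 */
static int example_check_agbno(struct xfs_perag *pag, xfs_agblock_t agbno)
{
	if (XFS_IS_CORRUPT(pag->pag_mount, !xfs_verify_agbno(pag, agbno)))
		return -EFSCORRUPTED;
	return 0;
}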
index 5dbb255..85ac470 100644 (file)
@@ -230,7 +230,7 @@ xfs_buf_to_agfl_bno(
        return bp->b_addr;
 }
 
-void __xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno,
+int __xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno,
                xfs_filblks_t len, const struct xfs_owner_info *oinfo,
                bool skip_discard);
 
@@ -254,14 +254,14 @@ void xfs_extent_free_get_group(struct xfs_mount *mp,
 #define XFS_EFI_ATTR_FORK      (1U << 1) /* freeing attr fork block */
 #define XFS_EFI_BMBT_BLOCK     (1U << 2) /* freeing bmap btree block */
 
-static inline void
+static inline int
 xfs_free_extent_later(
        struct xfs_trans                *tp,
        xfs_fsblock_t                   bno,
        xfs_filblks_t                   len,
        const struct xfs_owner_info     *oinfo)
 {
-       __xfs_free_extent_later(tp, bno, len, oinfo, false);
+       return __xfs_free_extent_later(tp, bno, len, oinfo, false);
 }
 
 
index b512de0..fef3569 100644 (file)
@@ -572,8 +572,12 @@ xfs_bmap_btree_to_extents(
        cblock = XFS_BUF_TO_BLOCK(cbp);
        if ((error = xfs_btree_check_block(cur, cblock, 0, cbp)))
                return error;
+
        xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork);
-       xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo);
+       error = xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo);
+       if (error)
+               return error;
+
        ip->i_nblocks--;
        xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L);
        xfs_trans_binval(tp, cbp);
@@ -3494,8 +3498,10 @@ xfs_bmap_btalloc_at_eof(
                if (!caller_pag)
                        args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
                error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
-               if (!caller_pag)
+               if (!caller_pag) {
                        xfs_perag_put(args->pag);
+                       args->pag = NULL;
+               }
                if (error)
                        return error;
 
@@ -3505,7 +3511,6 @@ xfs_bmap_btalloc_at_eof(
                 * Exact allocation failed. Reset to try an aligned allocation
                 * according to the original allocation specification.
                 */
-               args->pag = NULL;
                args->alignment = stripe_align;
                args->minlen = nextminlen;
                args->minalignslop = 0;
@@ -5229,10 +5234,12 @@ xfs_bmap_del_extent_real(
                if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) {
                        xfs_refcount_decrease_extent(tp, del);
                } else {
-                       __xfs_free_extent_later(tp, del->br_startblock,
+                       error = __xfs_free_extent_later(tp, del->br_startblock,
                                        del->br_blockcount, NULL,
                                        (bflags & XFS_BMAPI_NODISCARD) ||
                                        del->br_state == XFS_EXT_UNWRITTEN);
+                       if (error)
+                               goto done;
                }
        }
 
index 1b40e5f..36564ae 100644 (file)
@@ -268,11 +268,14 @@ xfs_bmbt_free_block(
        struct xfs_trans        *tp = cur->bc_tp;
        xfs_fsblock_t           fsbno = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp));
        struct xfs_owner_info   oinfo;
+       int                     error;
 
        xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, cur->bc_ino.whichfork);
-       xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo);
-       ip->i_nblocks--;
+       error = xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo);
+       if (error)
+               return error;
 
+       ip->i_nblocks--;
        xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
        xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L);
        return 0;
index a16d5de..34600f9 100644 (file)
@@ -1834,7 +1834,7 @@ retry:
  * might be sparse and only free the regions that are allocated as part of the
  * chunk.
  */
-STATIC void
+static int
 xfs_difree_inode_chunk(
        struct xfs_trans                *tp,
        xfs_agnumber_t                  agno,
@@ -1851,10 +1851,10 @@ xfs_difree_inode_chunk(
 
        if (!xfs_inobt_issparse(rec->ir_holemask)) {
                /* not sparse, calculate extent info directly */
-               xfs_free_extent_later(tp, XFS_AGB_TO_FSB(mp, agno, sagbno),
-                                 M_IGEO(mp)->ialloc_blks,
-                                 &XFS_RMAP_OINFO_INODES);
-               return;
+               return xfs_free_extent_later(tp,
+                               XFS_AGB_TO_FSB(mp, agno, sagbno),
+                               M_IGEO(mp)->ialloc_blks,
+                               &XFS_RMAP_OINFO_INODES);
        }
 
        /* holemask is only 16-bits (fits in an unsigned long) */
@@ -1871,6 +1871,8 @@ xfs_difree_inode_chunk(
                                                XFS_INOBT_HOLEMASK_BITS);
        nextbit = startidx + 1;
        while (startidx < XFS_INOBT_HOLEMASK_BITS) {
+               int error;
+
                nextbit = find_next_zero_bit(holemask, XFS_INOBT_HOLEMASK_BITS,
                                             nextbit);
                /*
@@ -1896,8 +1898,11 @@ xfs_difree_inode_chunk(
 
                ASSERT(agbno % mp->m_sb.sb_spino_align == 0);
                ASSERT(contigblk % mp->m_sb.sb_spino_align == 0);
-               xfs_free_extent_later(tp, XFS_AGB_TO_FSB(mp, agno, agbno),
-                                 contigblk, &XFS_RMAP_OINFO_INODES);
+               error = xfs_free_extent_later(tp,
+                               XFS_AGB_TO_FSB(mp, agno, agbno),
+                               contigblk, &XFS_RMAP_OINFO_INODES);
+               if (error)
+                       return error;
 
                /* reset range to current bit and carry on... */
                startidx = endidx = nextbit;
@@ -1905,6 +1910,7 @@ xfs_difree_inode_chunk(
 next:
                nextbit++;
        }
+       return 0;
 }
 
 STATIC int
@@ -2003,7 +2009,9 @@ xfs_difree_inobt(
                        goto error0;
                }
 
-               xfs_difree_inode_chunk(tp, pag->pag_agno, &rec);
+               error = xfs_difree_inode_chunk(tp, pag->pag_agno, &rec);
+               if (error)
+                       goto error0;
        } else {
                xic->deleted = false;
 
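The sparse-chunk path of xfs_difree_inode_chunk() above walks the inode hole mask, frees each contiguous allocated run, and now propagates failures from the deferred free. The sketch below isolates the bitmask-walking part of that pattern; the callback and bit count are assumptions, and set bits are treated as holes, matching the XFS holemask convention.

#include <linux/bitops.h>

/*
 * Illustrative walk over a hole mask: zero bits mark allocated regions, set
 * bits mark holes.  Each contiguous allocated run is handed to a callback.
 */
static void example_walk_allocated_runs(const unsigned long *holemask,
					unsigned int nbits,
					void (*process)(unsigned int start,
							unsigned int len))
{
	unsigned int start = 0;

	while (start < nbits) {
		unsigned int end;

		/* Skip holes (set bits) to the start of the next run. */
		start = find_next_zero_bit(holemask, nbits, start);
		if (start >= nbits)
			break;
		/* The run ends at the next hole or at the end of the mask. */
		end = find_next_bit(holemask, nbits, start);
		process(start, end - start);
		start = end;
	}
}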
index f13e080..269573c 100644 (file)
@@ -324,7 +324,6 @@ struct xfs_inode_log_format_32 {
 #define XFS_ILOG_DOWNER        0x200   /* change the data fork owner on replay */
 #define XFS_ILOG_AOWNER        0x400   /* change the attr fork owner on replay */
 
-
 /*
  * The timestamps are dirty, but not necessarily anything else in the inode
  * core.  Unlike the other fields above this one must never make it to disk
@@ -333,6 +332,14 @@ struct xfs_inode_log_format_32 {
  */
 #define XFS_ILOG_TIMESTAMP     0x4000
 
+/*
+ * The version field has been changed, but not necessarily anything else of
+ * interest. This must never make it to disk - it is used purely to ensure that
+ * the inode item ->precommit operation can update the fsync flag triggers
+ * in the inode item correctly.
+ */
+#define XFS_ILOG_IVERSION      0x8000
+
 #define        XFS_ILOG_NONCORE        (XFS_ILOG_DDATA | XFS_ILOG_DEXT | \
                                 XFS_ILOG_DBROOT | XFS_ILOG_DEV | \
                                 XFS_ILOG_ADATA | XFS_ILOG_AEXT | \
index c1c6577..b6e2143 100644 (file)
@@ -1151,8 +1151,10 @@ xfs_refcount_adjust_extents(
                                fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
                                                cur->bc_ag.pag->pag_agno,
                                                tmp.rc_startblock);
-                               xfs_free_extent_later(cur->bc_tp, fsbno,
+                               error = xfs_free_extent_later(cur->bc_tp, fsbno,
                                                  tmp.rc_blockcount, NULL);
+                               if (error)
+                                       goto out_error;
                        }
 
                        (*agbno) += tmp.rc_blockcount;
@@ -1210,8 +1212,10 @@ xfs_refcount_adjust_extents(
                        fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
                                        cur->bc_ag.pag->pag_agno,
                                        ext.rc_startblock);
-                       xfs_free_extent_later(cur->bc_tp, fsbno,
+                       error = xfs_free_extent_later(cur->bc_tp, fsbno,
                                        ext.rc_blockcount, NULL);
+                       if (error)
+                               goto out_error;
                }
 
 skip:
@@ -1976,7 +1980,10 @@ xfs_refcount_recover_cow_leftovers(
                                rr->rr_rrec.rc_blockcount);
 
                /* Free the block. */
-               xfs_free_extent_later(tp, fsb, rr->rr_rrec.rc_blockcount, NULL);
+               error = xfs_free_extent_later(tp, fsb,
+                               rr->rr_rrec.rc_blockcount, NULL);
+               if (error)
+                       goto out_trans;
 
                error = xfs_trans_commit(tp);
                if (error)
index 8b55470..cb4796b 100644 (file)
@@ -40,9 +40,8 @@ xfs_trans_ijoin(
        iip->ili_lock_flags = lock_flags;
        ASSERT(!xfs_iflags_test(ip, XFS_ISTALE));
 
-       /*
-        * Get a log_item_desc to point at the new item.
-        */
+       /* Reset the per-tx dirty context and add the item to the tx. */
+       iip->ili_dirty_flags = 0;
        xfs_trans_add_item(tp, &iip->ili_item);
 }
 
@@ -76,17 +75,10 @@ xfs_trans_ichgtime(
 /*
  * This is called to mark the fields indicated in fieldmask as needing to be
  * logged when the transaction is committed.  The inode must already be
- * associated with the given transaction.
- *
- * The values for fieldmask are defined in xfs_inode_item.h.  We always log all
- * of the core inode if any of it has changed, and we always log all of the
- * inline data/extents/b-tree root if any of them has changed.
- *
- * Grab and pin the cluster buffer associated with this inode to avoid RMW
- * cycles at inode writeback time. Avoid the need to add error handling to every
- * xfs_trans_log_inode() call by shutting down on read error.  This will cause
- * transactions to fail and everything to error out, just like if we return a
- * read error in a dirty transaction and cancel it.
+ * associated with the given transaction. All we do here is record where the
+ * inode was dirtied and mark the transaction and inode log item dirty;
+ * everything else is done in the ->precommit log item operation after the
+ * changes in the transaction have been completed.
  */
 void
 xfs_trans_log_inode(
@@ -96,7 +88,6 @@ xfs_trans_log_inode(
 {
        struct xfs_inode_log_item *iip = ip->i_itemp;
        struct inode            *inode = VFS_I(ip);
-       uint                    iversion_flags = 0;
 
        ASSERT(iip);
        ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
@@ -105,18 +96,6 @@ xfs_trans_log_inode(
        tp->t_flags |= XFS_TRANS_DIRTY;
 
        /*
-        * Don't bother with i_lock for the I_DIRTY_TIME check here, as races
-        * don't matter - we either will need an extra transaction in 24 hours
-        * to log the timestamps, or will clear already cleared fields in the
-        * worst case.
-        */
-       if (inode->i_state & I_DIRTY_TIME) {
-               spin_lock(&inode->i_lock);
-               inode->i_state &= ~I_DIRTY_TIME;
-               spin_unlock(&inode->i_lock);
-       }
-
-       /*
         * First time we log the inode in a transaction, bump the inode change
         * counter if it is configured for this to occur. While we have the
         * inode locked exclusively for metadata modification, we can usually
@@ -128,86 +107,10 @@ xfs_trans_log_inode(
        if (!test_and_set_bit(XFS_LI_DIRTY, &iip->ili_item.li_flags)) {
                if (IS_I_VERSION(inode) &&
                    inode_maybe_inc_iversion(inode, flags & XFS_ILOG_CORE))
-                       iversion_flags = XFS_ILOG_CORE;
-       }
-
-       /*
-        * If we're updating the inode core or the timestamps and it's possible
-        * to upgrade this inode to bigtime format, do so now.
-        */
-       if ((flags & (XFS_ILOG_CORE | XFS_ILOG_TIMESTAMP)) &&
-           xfs_has_bigtime(ip->i_mount) &&
-           !xfs_inode_has_bigtime(ip)) {
-               ip->i_diflags2 |= XFS_DIFLAG2_BIGTIME;
-               flags |= XFS_ILOG_CORE;
-       }
-
-       /*
-        * Inode verifiers do not check that the extent size hint is an integer
-        * multiple of the rt extent size on a directory with both rtinherit
-        * and extszinherit flags set.  If we're logging a directory that is
-        * misconfigured in this way, clear the hint.
-        */
-       if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) &&
-           (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) &&
-           (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) {
-               ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE |
-                                  XFS_DIFLAG_EXTSZINHERIT);
-               ip->i_extsize = 0;
-               flags |= XFS_ILOG_CORE;
+                       flags |= XFS_ILOG_IVERSION;
        }
 
-       /*
-        * Record the specific change for fdatasync optimisation. This allows
-        * fdatasync to skip log forces for inodes that are only timestamp
-        * dirty.
-        */
-       spin_lock(&iip->ili_lock);
-       iip->ili_fsync_fields |= flags;
-
-       if (!iip->ili_item.li_buf) {
-               struct xfs_buf  *bp;
-               int             error;
-
-               /*
-                * We hold the ILOCK here, so this inode is not going to be
-                * flushed while we are here. Further, because there is no
-                * buffer attached to the item, we know that there is no IO in
-                * progress, so nothing will clear the ili_fields while we read
-                * in the buffer. Hence we can safely drop the spin lock and
-                * read the buffer knowing that the state will not change from
-                * here.
-                */
-               spin_unlock(&iip->ili_lock);
-               error = xfs_imap_to_bp(ip->i_mount, tp, &ip->i_imap, &bp);
-               if (error) {
-                       xfs_force_shutdown(ip->i_mount, SHUTDOWN_META_IO_ERROR);
-                       return;
-               }
-
-               /*
-                * We need an explicit buffer reference for the log item but
-                * don't want the buffer to remain attached to the transaction.
-                * Hold the buffer but release the transaction reference once
-                * we've attached the inode log item to the buffer log item
-                * list.
-                */
-               xfs_buf_hold(bp);
-               spin_lock(&iip->ili_lock);
-               iip->ili_item.li_buf = bp;
-               bp->b_flags |= _XBF_INODES;
-               list_add_tail(&iip->ili_item.li_bio_list, &bp->b_li_list);
-               xfs_trans_brelse(tp, bp);
-       }
-
-       /*
-        * Always OR in the bits from the ili_last_fields field.  This is to
-        * coordinate with the xfs_iflush() and xfs_buf_inode_iodone() routines
-        * in the eventual clearing of the ili_fields bits.  See the big comment
-        * in xfs_iflush() for an explanation of this coordination mechanism.
-        */
-       iip->ili_fields |= (flags | iip->ili_last_fields | iversion_flags);
-       spin_unlock(&iip->ili_lock);
+       iip->ili_dirty_flags |= flags;
 }
 
 int
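The rewritten xfs_trans_log_inode() above now only records the dirty flags in ili_dirty_flags and marks the item and transaction dirty; the work it used to do inline (bigtime upgrade, extent size hint fixup, cluster buffer attachment) is deferred to the inode log item's ->precommit step. The caller-side contract is unchanged; a hedged sketch of a typical caller, with placeholder names and assuming the usual XFS internal headers.

/* Hypothetical caller that joins an inode to an existing transaction and
 * dirties its core fields. */
static void example_dirty_inode_core(struct xfs_trans *tp, struct xfs_inode *ip)
{
	xfs_ilock(ip, XFS_ILOCK_EXCL);
	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
}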
index 87ab9f9..5bf4326 100644 (file)
@@ -42,12 +42,12 @@ xchk_setup_inode_bmap(
        xfs_ilock(sc->ip, XFS_IOLOCK_EXCL);
 
        /*
-        * We don't want any ephemeral data fork updates sitting around
+        * We don't want any ephemeral data/cow fork updates sitting around
         * while we inspect block mappings, so wait for directio to finish
         * and flush dirty data if we have delalloc reservations.
         */
        if (S_ISREG(VFS_I(sc->ip)->i_mode) &&
-           sc->sm->sm_type == XFS_SCRUB_TYPE_BMBTD) {
+           sc->sm->sm_type != XFS_SCRUB_TYPE_BMBTA) {
                struct address_space    *mapping = VFS_I(sc->ip)->i_mapping;
 
                sc->ilock_flags |= XFS_MMAPLOCK_EXCL;
@@ -769,14 +769,14 @@ xchk_are_bmaps_contiguous(
  * mapping or false if there are no more mappings.  Caller must ensure that
  * @info.icur is zeroed before the first call.
  */
-static int
+static bool
 xchk_bmap_iext_iter(
        struct xchk_bmap_info   *info,
        struct xfs_bmbt_irec    *irec)
 {
        struct xfs_bmbt_irec    got;
        struct xfs_ifork        *ifp;
-       xfs_filblks_t           prev_len;
+       unsigned int            nr = 0;
 
        ifp = xfs_ifork_ptr(info->sc->ip, info->whichfork);
 
@@ -790,12 +790,12 @@ xchk_bmap_iext_iter(
                                irec->br_startoff);
                return false;
        }
+       nr++;
 
        /*
         * Iterate subsequent iextent records and merge them with the one
         * that we just read, if possible.
         */
-       prev_len = irec->br_blockcount;
        while (xfs_iext_peek_next_extent(ifp, &info->icur, &got)) {
                if (!xchk_are_bmaps_contiguous(irec, &got))
                        break;
@@ -805,20 +805,21 @@ xchk_bmap_iext_iter(
                                        got.br_startoff);
                        return false;
                }
-
-               /*
-                * Notify the user of mergeable records in the data or attr
-                * forks.  CoW forks only exist in memory so we ignore them.
-                */
-               if (info->whichfork != XFS_COW_FORK &&
-                   prev_len + got.br_blockcount > BMBT_BLOCKCOUNT_MASK)
-                       xchk_ino_set_preen(info->sc, info->sc->ip->i_ino);
+               nr++;
 
                irec->br_blockcount += got.br_blockcount;
-               prev_len = got.br_blockcount;
                xfs_iext_next(ifp, &info->icur);
        }
 
+       /*
+        * If the merged mapping could be expressed with fewer bmbt records
+        * than we actually found, notify the user that this fork could be
+        * optimized.  CoW forks only exist in memory so we ignore them.
+        */
+       if (nr > 1 && info->whichfork != XFS_COW_FORK &&
+           howmany_64(irec->br_blockcount, XFS_MAX_BMBT_EXTLEN) < nr)
+               xchk_ino_set_preen(info->sc, info->sc->ip->i_ino);
+
        return true;
 }
 
index 9aa7966..7a20256 100644 (file)
@@ -1164,32 +1164,6 @@ xchk_metadata_inode_forks(
        return 0;
 }
 
-/* Pause background reaping of resources. */
-void
-xchk_stop_reaping(
-       struct xfs_scrub        *sc)
-{
-       sc->flags |= XCHK_REAPING_DISABLED;
-       xfs_blockgc_stop(sc->mp);
-       xfs_inodegc_stop(sc->mp);
-}
-
-/* Restart background reaping of resources. */
-void
-xchk_start_reaping(
-       struct xfs_scrub        *sc)
-{
-       /*
-        * Readonly filesystems do not perform inactivation or speculative
-        * preallocation, so there's no need to restart the workers.
-        */
-       if (!xfs_is_readonly(sc->mp)) {
-               xfs_inodegc_start(sc->mp);
-               xfs_blockgc_start(sc->mp);
-       }
-       sc->flags &= ~XCHK_REAPING_DISABLED;
-}
-
 /*
  * Enable filesystem hooks (i.e. runtime code patching) before starting a scrub
  * operation.  Callers must not hold any locks that intersect with the CPU
index 18b5f2b..791235c 100644 (file)
@@ -156,8 +156,6 @@ static inline bool xchk_skip_xref(struct xfs_scrub_metadata *sm)
 }
 
 int xchk_metadata_inode_forks(struct xfs_scrub *sc);
-void xchk_stop_reaping(struct xfs_scrub *sc);
-void xchk_start_reaping(struct xfs_scrub *sc);
 
 /*
  * Setting up a hook to wait for intents to drain is costly -- we have to take
index faa315b..e382a35 100644 (file)
@@ -150,13 +150,6 @@ xchk_setup_fscounters(
        if (error)
                return error;
 
-       /*
-        * Pause background reclaim while we're scrubbing to reduce the
-        * likelihood of background perturbations to the counters throwing off
-        * our calculations.
-        */
-       xchk_stop_reaping(sc);
-
        return xchk_trans_alloc(sc, 0);
 }
 
@@ -454,6 +447,12 @@ xchk_fscounters(
                xchk_set_corrupt(sc);
 
        /*
+        * XXX: We can't quiesce percpu counter updates, so exit early.
+        * This can be re-enabled when we gain exclusive freeze functionality.
+        */
+       return 0;
+
+       /*
         * If ifree exceeds icount by more than the minimum variance then
         * something's probably wrong with the counters.
         */
index 02819be..3d98f60 100644 (file)
@@ -186,8 +186,6 @@ xchk_teardown(
        }
        if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
                mnt_drop_write_file(sc->file);
-       if (sc->flags & XCHK_REAPING_DISABLED)
-               xchk_start_reaping(sc);
        if (sc->buf) {
                if (sc->buf_cleanup)
                        sc->buf_cleanup(sc->buf);
index e719034..e113f2f 100644 (file)
@@ -105,11 +105,10 @@ struct xfs_scrub {
 };
 
 /* XCHK state flags grow up from zero, XREP state flags grown down from 2^31 */
-#define XCHK_TRY_HARDER                (1 << 0)  /* can't get resources, try again */
-#define XCHK_REAPING_DISABLED  (1 << 1)  /* background block reaping paused */
-#define XCHK_FSGATES_DRAIN     (1 << 2)  /* defer ops draining enabled */
-#define XCHK_NEED_DRAIN                (1 << 3)  /* scrub needs to drain defer ops */
-#define XREP_ALREADY_FIXED     (1 << 31) /* checking our repair work */
+#define XCHK_TRY_HARDER                (1U << 0)  /* can't get resources, try again */
+#define XCHK_FSGATES_DRAIN     (1U << 2)  /* defer ops draining enabled */
+#define XCHK_NEED_DRAIN                (1U << 3)  /* scrub needs to drain defer ops */
+#define XREP_ALREADY_FIXED     (1U << 31) /* checking our repair work */
 
 /*
  * The XCHK_FSGATES* flags reflect functionality in the main filesystem that
index 68efd6f..b3894da 100644 (file)
@@ -98,7 +98,6 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_FSCOUNTERS);
 
 #define XFS_SCRUB_STATE_STRINGS \
        { XCHK_TRY_HARDER,                      "try_harder" }, \
-       { XCHK_REAPING_DISABLED,                "reaping_disabled" }, \
        { XCHK_FSGATES_DRAIN,                   "fsgates_drain" }, \
        { XCHK_NEED_DRAIN,                      "need_drain" }, \
        { XREP_ALREADY_FIXED,                   "already_fixed" }
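The scrub state flags shown earlier are also renumbered as unsigned literals (1U << n). That matters for the top bit: with a 32-bit int, 1 << 31 shifts into the sign bit and is undefined behaviour, while 1U << 31 is well defined. A tiny user-space illustration (plain C, not kernel code):

#include <stdio.h>

int main(void)
{
	unsigned int repair_flag = 1U << 31;	/* well defined: 0x80000000 */
	/* int bad = 1 << 31;  <-- undefined behaviour when int is 32 bits */

	printf("0x%x\n", repair_flag);
	return 0;
}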
index f032d3a..fbb6755 100644 (file)
@@ -558,7 +558,9 @@ xfs_getbmap(
                if (!xfs_iext_next_extent(ifp, &icur, &got)) {
                        xfs_fileoff_t   end = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
 
-                       out[bmv->bmv_entries - 1].bmv_oflags |= BMV_OF_LAST;
+                       if (bmv->bmv_entries > 0)
+                               out[bmv->bmv_entries - 1].bmv_oflags |=
+                                                               BMV_OF_LAST;
 
                        if (whichfork != XFS_ATTR_FORK && bno < end &&
                            !xfs_getbmap_full(bmv)) {
index df7322e..023d4e0 100644 (file)
@@ -452,10 +452,18 @@ xfs_buf_item_format(
  * This is called to pin the buffer associated with the buf log item in memory
  * so it cannot be written out.
  *
- * We also always take a reference to the buffer log item here so that the bli
- * is held while the item is pinned in memory. This means that we can
- * unconditionally drop the reference count a transaction holds when the
- * transaction is completed.
+ * We take a reference to the buffer log item here so that the BLI life cycle
+ * extends at least until the buffer is unpinned via xfs_buf_item_unpin() and
+ * inserted into the AIL.
+ *
+ * We also need to take a reference to the buffer itself as the BLI unpin
+ * processing requires accessing the buffer after the BLI has dropped the final
+ * BLI reference. See xfs_buf_item_unpin() for an explanation.
+ * If unpins race to drop the final BLI reference and only the BLI owns a
+ * reference to the buffer, then the loser of the race can have the buffer
+ * freed from under it (e.g. on shutdown). Taking a buffer reference per pin
+ * count ensures the life cycle of the buffer extends for as long as we hold
+ * the buffer pin reference in xfs_buf_item_unpin().
  */
 STATIC void
 xfs_buf_item_pin(
@@ -470,13 +478,30 @@ xfs_buf_item_pin(
 
        trace_xfs_buf_item_pin(bip);
 
+       xfs_buf_hold(bip->bli_buf);
        atomic_inc(&bip->bli_refcount);
        atomic_inc(&bip->bli_buf->b_pin_count);
 }
 
 /*
- * This is called to unpin the buffer associated with the buf log item which
- * was previously pinned with a call to xfs_buf_item_pin().
+ * This is called to unpin the buffer associated with the buf log item which was
+ * previously pinned with a call to xfs_buf_item_pin().  We enter this function
+ * with a buffer pin count, a buffer reference and a BLI reference.
+ *
+ * We must drop the BLI reference before we unpin the buffer because the AIL
+ * doesn't acquire a BLI reference whenever it accesses it. Therefore if the
+ * refcount drops to zero, the bli could still be AIL resident and the buffer
+ * submitted for I/O at any point before we return. This can result in IO
+ * completion freeing the buffer while we are still trying to access it here.
+ * This race condition can also occur in shutdown situations where we abort and
+ * unpin buffers from contexts other than journal IO completion.
+ *
+ * Hence we have to hold a buffer reference per pin count to ensure that the
+ * buffer cannot be freed until we have finished processing the unpin operation.
+ * The reference is taken in xfs_buf_item_pin(), and we must hold it until we
+ * are done processing the buffer state. In the case of an abort (remove =
+ * true) then we re-use the current pin reference as the IO reference we hand
+ * off to IO failure handling.
  */
 STATIC void
 xfs_buf_item_unpin(
@@ -493,24 +518,18 @@ xfs_buf_item_unpin(
 
        trace_xfs_buf_item_unpin(bip);
 
-       /*
-        * Drop the bli ref associated with the pin and grab the hold required
-        * for the I/O simulation failure in the abort case. We have to do this
-        * before the pin count drops because the AIL doesn't acquire a bli
-        * reference. Therefore if the refcount drops to zero, the bli could
-        * still be AIL resident and the buffer submitted for I/O (and freed on
-        * completion) at any point before we return. This can be removed once
-        * the AIL properly holds a reference on the bli.
-        */
        freed = atomic_dec_and_test(&bip->bli_refcount);
-       if (freed && !stale && remove)
-               xfs_buf_hold(bp);
        if (atomic_dec_and_test(&bp->b_pin_count))
                wake_up_all(&bp->b_waiters);
 
-        /* nothing to do but drop the pin count if the bli is active */
-       if (!freed)
+       /*
+        * Nothing to do but drop the buffer pin reference if the BLI is
+        * still active.
+        */
+       if (!freed) {
+               xfs_buf_rele(bp);
                return;
+       }
 
        if (stale) {
                ASSERT(bip->bli_flags & XFS_BLI_STALE);
@@ -523,6 +542,15 @@ xfs_buf_item_unpin(
                trace_xfs_buf_item_unpin_stale(bip);
 
                /*
+                * The buffer has been locked and referenced since it was marked
+                * stale so we own both lock and reference exclusively here. We
+                * do not need the pin reference any more, so drop it now so
+                * that we only have one reference to drop once item completion
+                * processing is complete.
+                */
+               xfs_buf_rele(bp);
+
+               /*
                 * If we get called here because of an IO error, we may or may
                 * not have the item on the AIL. xfs_trans_ail_delete() will
                 * take care of that situation. xfs_trans_ail_delete() drops
@@ -538,16 +566,30 @@ xfs_buf_item_unpin(
                        ASSERT(bp->b_log_item == NULL);
                }
                xfs_buf_relse(bp);
-       } else if (remove) {
+               return;
+       }
+
+       if (remove) {
                /*
-                * The buffer must be locked and held by the caller to simulate
-                * an async I/O failure. We acquired the hold for this case
-                * before the buffer was unpinned.
+                * We need to simulate an async IO failure here to ensure that
+                * the correct error completion is run on this buffer. This
+                * requires a reference to the buffer and for the buffer to be
+                * locked. We can safely pass ownership of the pin reference to
+                * the IO to ensure that nothing can free the buffer while we
+                * wait for the lock and then run the IO failure completion.
                 */
                xfs_buf_lock(bp);
                bp->b_flags |= XBF_ASYNC;
                xfs_buf_ioend_fail(bp);
+               return;
        }
+
+       /*
+        * BLI has no more active references - it will be moved to the AIL to
+        * manage the remaining BLI/buffer life cycle. There is nothing left for
+        * us to do here so drop the pin reference to the buffer.
+        */
+       xfs_buf_rele(bp);
 }
 
 STATIC uint
index aede746..08d6326 100644 (file)
@@ -306,6 +306,34 @@ xfs_file_read_iter(
        return ret;
 }
 
+STATIC ssize_t
+xfs_file_splice_read(
+       struct file             *in,
+       loff_t                  *ppos,
+       struct pipe_inode_info  *pipe,
+       size_t                  len,
+       unsigned int            flags)
+{
+       struct inode            *inode = file_inode(in);
+       struct xfs_inode        *ip = XFS_I(inode);
+       struct xfs_mount        *mp = ip->i_mount;
+       ssize_t                 ret = 0;
+
+       XFS_STATS_INC(mp, xs_read_calls);
+
+       if (xfs_is_shutdown(mp))
+               return -EIO;
+
+       trace_xfs_file_splice_read(ip, *ppos, len);
+
+       xfs_ilock(ip, XFS_IOLOCK_SHARED);
+       ret = filemap_splice_read(in, ppos, pipe, len, flags);
+       xfs_iunlock(ip, XFS_IOLOCK_SHARED);
+       if (ret > 0)
+               XFS_STATS_ADD(mp, xs_read_bytes, ret);
+       return ret;
+}
+
 /*
  * Common pre-write limit and setup checks.
  *
@@ -1423,7 +1451,7 @@ const struct file_operations xfs_file_operations = {
        .llseek         = xfs_file_llseek,
        .read_iter      = xfs_file_read_iter,
        .write_iter     = xfs_file_write_iter,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = xfs_file_splice_read,
        .splice_write   = iter_file_splice_write,
        .iopoll         = iocb_bio_iopoll,
        .unlocked_ioctl = xfs_file_ioctl,
index 22c1393..2fc98d3 100644 (file)
@@ -78,7 +78,6 @@ restart:
                *longest = 0;
                err = xfs_bmap_longest_free_extent(pag, NULL, longest);
                if (err) {
-                       xfs_perag_rele(pag);
                        if (err != -EAGAIN)
                                break;
                        /* Couldn't lock the AGF, skip this AG. */
index 13851c0..9ebb833 100644 (file)
@@ -534,6 +534,9 @@ xfs_do_force_shutdown(
        } else if (flags & SHUTDOWN_CORRUPT_ONDISK) {
                tag = XFS_PTAG_SHUTDOWN_CORRUPT;
                why = "Corruption of on-disk metadata";
+       } else if (flags & SHUTDOWN_DEVICE_REMOVED) {
+               tag = XFS_PTAG_SHUTDOWN_IOERROR;
+               why = "Block device removal";
        } else {
                tag = XFS_PTAG_SHUTDOWN_IOERROR;
                why = "Metadata I/O Error";
index 351849f..4538909 100644 (file)
@@ -435,18 +435,44 @@ xfs_iget_check_free_state(
 }
 
 /* Make all pending inactivation work start immediately. */
-static void
+static bool
 xfs_inodegc_queue_all(
        struct xfs_mount        *mp)
 {
        struct xfs_inodegc      *gc;
        int                     cpu;
+       bool                    ret = false;
 
        for_each_online_cpu(cpu) {
                gc = per_cpu_ptr(mp->m_inodegc, cpu);
-               if (!llist_empty(&gc->list))
+               if (!llist_empty(&gc->list)) {
                        mod_delayed_work_on(cpu, mp->m_inodegc_wq, &gc->work, 0);
+                       ret = true;
+               }
+       }
+
+       return ret;
+}
+
+/* Wait for all queued work and collect errors */
+static int
+xfs_inodegc_wait_all(
+       struct xfs_mount        *mp)
+{
+       int                     cpu;
+       int                     error = 0;
+
+       flush_workqueue(mp->m_inodegc_wq);
+       for_each_online_cpu(cpu) {
+               struct xfs_inodegc      *gc;
+
+               gc = per_cpu_ptr(mp->m_inodegc, cpu);
+               if (gc->error && !error)
+                       error = gc->error;
+               gc->error = 0;
        }
+
+       return error;
 }
 
 /*
@@ -1486,15 +1512,14 @@ xfs_blockgc_free_space(
        if (error)
                return error;
 
-       xfs_inodegc_flush(mp);
-       return 0;
+       return xfs_inodegc_flush(mp);
 }
 
 /*
  * Reclaim all the free space that we can by scheduling the background blockgc
  * and inodegc workers immediately and waiting for them all to clear.
  */
-void
+int
 xfs_blockgc_flush_all(
        struct xfs_mount        *mp)
 {
@@ -1515,7 +1540,7 @@ xfs_blockgc_flush_all(
        for_each_perag_tag(mp, agno, pag, XFS_ICI_BLOCKGC_TAG)
                flush_delayed_work(&pag->pag_blockgc_work);
 
-       xfs_inodegc_flush(mp);
+       return xfs_inodegc_flush(mp);
 }
 
 /*
@@ -1837,13 +1862,17 @@ xfs_inodegc_set_reclaimable(
  * This is the last chance to make changes to an otherwise unreferenced file
  * before incore reclamation happens.
  */
-static void
+static int
 xfs_inodegc_inactivate(
        struct xfs_inode        *ip)
 {
+       int                     error;
+
        trace_xfs_inode_inactivating(ip);
-       xfs_inactive(ip);
+       error = xfs_inactive(ip);
        xfs_inodegc_set_reclaimable(ip);
+       return error;
+
 }
 
 void
@@ -1856,6 +1885,8 @@ xfs_inodegc_worker(
        struct xfs_inode        *ip, *n;
        unsigned int            nofs_flag;
 
+       ASSERT(gc->cpu == smp_processor_id());
+
        WRITE_ONCE(gc->items, 0);
 
        if (!node)
@@ -1873,8 +1904,12 @@ xfs_inodegc_worker(
 
        WRITE_ONCE(gc->shrinker_hits, 0);
        llist_for_each_entry_safe(ip, n, node, i_gclist) {
+               int     error;
+
                xfs_iflags_set(ip, XFS_INACTIVATING);
-               xfs_inodegc_inactivate(ip);
+               error = xfs_inodegc_inactivate(ip);
+               if (error && !gc->error)
+                       gc->error = error;
        }
 
        memalloc_nofs_restore(nofs_flag);
@@ -1898,35 +1933,52 @@ xfs_inodegc_push(
  * Force all currently queued inode inactivation work to run immediately and
  * wait for the work to finish.
  */
-void
+int
 xfs_inodegc_flush(
        struct xfs_mount        *mp)
 {
        xfs_inodegc_push(mp);
        trace_xfs_inodegc_flush(mp, __return_address);
-       flush_workqueue(mp->m_inodegc_wq);
+       return xfs_inodegc_wait_all(mp);
 }
 
 /*
  * Flush all the pending work and then disable the inode inactivation background
- * workers and wait for them to stop.
+ * workers and wait for them to stop.  Caller must hold sb->s_umount to
+ * coordinate changes in the inodegc_enabled state.
  */
 void
 xfs_inodegc_stop(
        struct xfs_mount        *mp)
 {
+       bool                    rerun;
+
        if (!xfs_clear_inodegc_enabled(mp))
                return;
 
+       /*
+        * Drain all pending inodegc work, including inodes that could be
+        * queued by racing xfs_inodegc_queue or xfs_inodegc_shrinker_scan
+        * threads that sample the inodegc state just prior to us clearing it.
+        * The inodegc flag state prevents new threads from queuing more
+        * inodes, so we queue pending work items and flush the workqueue until
+        * all inodegc lists are empty.  IOWs, we cannot use drain_workqueue
+        * here because it does not allow other unserialized mechanisms to
+        * reschedule inodegc work while this draining is in progress.
+        */
        xfs_inodegc_queue_all(mp);
-       drain_workqueue(mp->m_inodegc_wq);
+       do {
+               flush_workqueue(mp->m_inodegc_wq);
+               rerun = xfs_inodegc_queue_all(mp);
+       } while (rerun);
 
        trace_xfs_inodegc_stop(mp, __return_address);
 }
 
 /*
  * Enable the inode inactivation background workers and schedule deferred inode
- * inactivation work if there is any.
+ * inactivation work if there is any.  Caller must hold sb->s_umount to
+ * coordinate changes in the inodegc_enabled state.
  */
 void
 xfs_inodegc_start(
@@ -2069,7 +2121,8 @@ xfs_inodegc_queue(
                queue_delay = 0;
 
        trace_xfs_inodegc_queue(mp, __return_address);
-       mod_delayed_work(mp->m_inodegc_wq, &gc->work, queue_delay);
+       mod_delayed_work_on(current_cpu(), mp->m_inodegc_wq, &gc->work,
+                       queue_delay);
        put_cpu_ptr(gc);
 
        if (xfs_inodegc_want_flush_work(ip, items, shrinker_hits)) {
@@ -2113,7 +2166,8 @@ xfs_inodegc_cpu_dead(
 
        if (xfs_is_inodegc_enabled(mp)) {
                trace_xfs_inodegc_queue(mp, __return_address);
-               mod_delayed_work(mp->m_inodegc_wq, &gc->work, 0);
+               mod_delayed_work_on(current_cpu(), mp->m_inodegc_wq, &gc->work,
+                               0);
        }
        put_cpu_ptr(gc);
 }
index 8791019..1dcdcb2 100644 (file)
@@ -62,7 +62,7 @@ int xfs_blockgc_free_dquots(struct xfs_mount *mp, struct xfs_dquot *udqp,
                unsigned int iwalk_flags);
 int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int iwalk_flags);
 int xfs_blockgc_free_space(struct xfs_mount *mp, struct xfs_icwalk *icm);
-void xfs_blockgc_flush_all(struct xfs_mount *mp);
+int xfs_blockgc_flush_all(struct xfs_mount *mp);
 
 void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
 void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);
@@ -80,7 +80,7 @@ void xfs_blockgc_start(struct xfs_mount *mp);
 
 void xfs_inodegc_worker(struct work_struct *work);
 void xfs_inodegc_push(struct xfs_mount *mp);
-void xfs_inodegc_flush(struct xfs_mount *mp);
+int xfs_inodegc_flush(struct xfs_mount *mp);
 void xfs_inodegc_stop(struct xfs_mount *mp);
 void xfs_inodegc_start(struct xfs_mount *mp);
 void xfs_inodegc_cpu_dead(struct xfs_mount *mp, unsigned int cpu);
index 5808aba..9e62cc5 100644 (file)
@@ -1620,16 +1620,7 @@ xfs_inactive_ifree(
         */
        xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_ICOUNT, -1);
 
-       /*
-        * Just ignore errors at this point.  There is nothing we can do except
-        * to try to keep going. Make sure it's not a silent error.
-        */
-       error = xfs_trans_commit(tp);
-       if (error)
-               xfs_notice(mp, "%s: xfs_trans_commit returned error %d",
-                       __func__, error);
-
-       return 0;
+       return xfs_trans_commit(tp);
 }
 
 /*
@@ -1693,12 +1684,12 @@ xfs_inode_needs_inactive(
  * now be truncated.  Also, we clear all of the read-ahead state
  * kept for the inode here since the file is now closed.
  */
-void
+int
 xfs_inactive(
        xfs_inode_t     *ip)
 {
        struct xfs_mount        *mp;
-       int                     error;
+       int                     error = 0;
        int                     truncate = 0;
 
        /*
@@ -1736,7 +1727,7 @@ xfs_inactive(
                 * reference to the inode at this point anyways.
                 */
                if (xfs_can_free_eofblocks(ip, true))
-                       xfs_free_eofblocks(ip);
+                       error = xfs_free_eofblocks(ip);
 
                goto out;
        }
@@ -1773,7 +1764,7 @@ xfs_inactive(
        /*
         * Free the inode.
         */
-       xfs_inactive_ifree(ip);
+       error = xfs_inactive_ifree(ip);
 
 out:
        /*
@@ -1781,6 +1772,7 @@ out:
         * the attached dquots.
         */
        xfs_qm_dqdetach(ip);
+       return error;
 }
 
 /*
index 69d21e4..7547caf 100644 (file)
@@ -470,7 +470,7 @@ enum layout_break_reason {
        (xfs_has_grpid((pip)->i_mount) || (VFS_I(pip)->i_mode & S_ISGID))
 
 int            xfs_release(struct xfs_inode *ip);
-void           xfs_inactive(struct xfs_inode *ip);
+int            xfs_inactive(struct xfs_inode *ip);
 int            xfs_lookup(struct xfs_inode *dp, const struct xfs_name *name,
                           struct xfs_inode **ipp, struct xfs_name *ci_name);
 int            xfs_create(struct mnt_idmap *idmap,
index ca2941a..91c847a 100644 (file)
@@ -29,6 +29,153 @@ static inline struct xfs_inode_log_item *INODE_ITEM(struct xfs_log_item *lip)
        return container_of(lip, struct xfs_inode_log_item, ili_item);
 }
 
+static uint64_t
+xfs_inode_item_sort(
+       struct xfs_log_item     *lip)
+{
+       return INODE_ITEM(lip)->ili_inode->i_ino;
+}
+
+/*
+ * Prior to finally logging the inode, we have to ensure that all the
+ * per-modification inode state changes are applied. This includes VFS inode
+ * state updates, format conversions, verifier state synchronisation and
+ * ensuring the inode buffer remains in memory whilst the inode is dirty.
+ *
+ * We have to be careful when we grab the inode cluster buffer due to lock
+ * ordering constraints. The unlinked inode modifications (xfs_iunlink_item)
+ * require AGI -> inode cluster buffer lock order. The inode cluster buffer is
+ * not locked until ->precommit, so it happens after everything else has been
+ * modified.
+ *
+ * Further, we have AGI -> AGF lock ordering, and with O_TMPFILE handling we
+ * have AGI -> AGF -> iunlink item -> inode cluster buffer lock order. Hence we
+ * cannot safely lock the inode cluster buffer in xfs_trans_log_inode() because
+ * it can be called on an inode (e.g. via bumplink/droplink) before we take the
+ * AGF lock modifying directory blocks.
+ *
+ * Rather than force a complete rework of all the transactions to call
+ * xfs_trans_log_inode() once and once only at the end of every transaction, we
+ * move the pinning of the inode cluster buffer to a ->precommit operation. This
+ * matches how the xfs_iunlink_item locks the inode cluster buffer, and it
+ * ensures that the inode cluster buffer locking is always done last in a
+ * transaction. i.e. we ensure the lock order is always AGI -> AGF -> inode
+ * cluster buffer.
+ *
+ * If we return the inode number as the precommit sort key then we'll also
+ * guarantee that the inode cluster buffer locking order is the same for all
+ * the inodes and unlink items in the transaction.
+ */
+static int
+xfs_inode_item_precommit(
+       struct xfs_trans        *tp,
+       struct xfs_log_item     *lip)
+{
+       struct xfs_inode_log_item *iip = INODE_ITEM(lip);
+       struct xfs_inode        *ip = iip->ili_inode;
+       struct inode            *inode = VFS_I(ip);
+       unsigned int            flags = iip->ili_dirty_flags;
+
+       /*
+        * Don't bother with i_lock for the I_DIRTY_TIME check here, as races
+        * don't matter - we either will need an extra transaction in 24 hours
+        * to log the timestamps, or will clear already cleared fields in the
+        * worst case.
+        */
+       if (inode->i_state & I_DIRTY_TIME) {
+               spin_lock(&inode->i_lock);
+               inode->i_state &= ~I_DIRTY_TIME;
+               spin_unlock(&inode->i_lock);
+       }
+
+       /*
+        * If we're updating the inode core or the timestamps and it's possible
+        * to upgrade this inode to bigtime format, do so now.
+        */
+       if ((flags & (XFS_ILOG_CORE | XFS_ILOG_TIMESTAMP)) &&
+           xfs_has_bigtime(ip->i_mount) &&
+           !xfs_inode_has_bigtime(ip)) {
+               ip->i_diflags2 |= XFS_DIFLAG2_BIGTIME;
+               flags |= XFS_ILOG_CORE;
+       }
+
+       /*
+        * Inode verifiers do not check that the extent size hint is an integer
+        * multiple of the rt extent size on a directory with both rtinherit
+        * and extszinherit flags set.  If we're logging a directory that is
+        * misconfigured in this way, clear the hint.
+        */
+       if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) &&
+           (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) &&
+           (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) {
+               ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE |
+                                  XFS_DIFLAG_EXTSZINHERIT);
+               ip->i_extsize = 0;
+               flags |= XFS_ILOG_CORE;
+       }
+
+       /*
+        * Record the specific change for fdatasync optimisation. This allows
+        * fdatasync to skip log forces for inodes that are only timestamp
+        * dirty. Once we've processed the XFS_ILOG_IVERSION flag, convert it
+        * to XFS_ILOG_CORE so that the actual on-disk dirty tracking
+        * (ili_fields) correctly tracks that the version has changed.
+        */
+       spin_lock(&iip->ili_lock);
+       iip->ili_fsync_fields |= (flags & ~XFS_ILOG_IVERSION);
+       if (flags & XFS_ILOG_IVERSION)
+               flags = ((flags & ~XFS_ILOG_IVERSION) | XFS_ILOG_CORE);
+
+       if (!iip->ili_item.li_buf) {
+               struct xfs_buf  *bp;
+               int             error;
+
+               /*
+                * We hold the ILOCK here, so this inode is not going to be
+                * flushed while we are here. Further, because there is no
+                * buffer attached to the item, we know that there is no IO in
+                * progress, so nothing will clear the ili_fields while we read
+                * in the buffer. Hence we can safely drop the spin lock and
+                * read the buffer knowing that the state will not change from
+                * here.
+                */
+               spin_unlock(&iip->ili_lock);
+               error = xfs_imap_to_bp(ip->i_mount, tp, &ip->i_imap, &bp);
+               if (error)
+                       return error;
+
+               /*
+                * We need an explicit buffer reference for the log item but
+                * don't want the buffer to remain attached to the transaction.
+                * Hold the buffer but release the transaction reference once
+                * we've attached the inode log item to the buffer log item
+                * list.
+                */
+               xfs_buf_hold(bp);
+               spin_lock(&iip->ili_lock);
+               iip->ili_item.li_buf = bp;
+               bp->b_flags |= _XBF_INODES;
+               list_add_tail(&iip->ili_item.li_bio_list, &bp->b_li_list);
+               xfs_trans_brelse(tp, bp);
+       }
+
+       /*
+        * Always OR in the bits from the ili_last_fields field.  This is to
+        * coordinate with the xfs_iflush() and xfs_buf_inode_iodone() routines
+        * in the eventual clearing of the ili_fields bits.  See the big comment
+        * in xfs_iflush() for an explanation of this coordination mechanism.
+        */
+       iip->ili_fields |= (flags | iip->ili_last_fields);
+       spin_unlock(&iip->ili_lock);
+
+       /*
+        * We are done with the log item transaction dirty state, so clear it so
+        * that it doesn't pollute future transactions.
+        */
+       iip->ili_dirty_flags = 0;
+       return 0;
+}
+
 /*
  * The logged size of an inode fork is always the current size of the inode
  * fork. This means that when an inode fork is relogged, the size of the logged
@@ -662,6 +809,8 @@ xfs_inode_item_committing(
 }
 
 static const struct xfs_item_ops xfs_inode_item_ops = {
+       .iop_sort       = xfs_inode_item_sort,
+       .iop_precommit  = xfs_inode_item_precommit,
        .iop_size       = xfs_inode_item_size,
        .iop_format     = xfs_inode_item_format,
        .iop_pin        = xfs_inode_item_pin,
index bbd836a..377e060 100644 (file)
@@ -17,6 +17,7 @@ struct xfs_inode_log_item {
        struct xfs_log_item     ili_item;          /* common portion */
        struct xfs_inode        *ili_inode;        /* inode ptr */
        unsigned short          ili_lock_flags;    /* inode lock flags */
+       unsigned int            ili_dirty_flags;   /* dirty in current tx */
        /*
         * The ili_lock protects the interactions between the dirty state and
         * the flush state of the inode log item. This allows us to do atomic
index 285885c..18c8f16 100644 (file)
@@ -1006,8 +1006,9 @@ xfs_buffered_write_iomap_begin(
        if (eof)
                imap.br_startoff = end_fsb; /* fake hole until the end */
 
-       /* We never need to allocate blocks for zeroing a hole. */
-       if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
+       /* We never need to allocate blocks for zeroing or unsharing a hole. */
+       if ((flags & (IOMAP_UNSHARE | IOMAP_ZERO)) &&
+           imap.br_startoff > offset_fsb) {
                xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
                goto out_unlock;
        }
index 322eb2e..82c81d2 100644 (file)
@@ -2711,7 +2711,9 @@ xlog_recover_iunlink_bucket(
                         * just to flush the inodegc queue and wait for it to
                         * complete.
                         */
-                       xfs_inodegc_flush(mp);
+                       error = xfs_inodegc_flush(mp);
+                       if (error)
+                               break;
                }
 
                prev_agino = agino;
@@ -2719,10 +2721,15 @@ xlog_recover_iunlink_bucket(
        }
 
        if (prev_ip) {
+               int     error2;
+
                ip->i_prev_unlinked = prev_agino;
                xfs_irele(prev_ip);
+
+               error2 = xfs_inodegc_flush(mp);
+               if (error2 && !error)
+                       return error2;
        }
-       xfs_inodegc_flush(mp);
        return error;
 }
 
@@ -2789,7 +2796,6 @@ xlog_recover_iunlink_ag(
                         * bucket and remaining inodes on it unreferenced and
                         * unfreeable.
                         */
-                       xfs_inodegc_flush(pag->pag_mount);
                        xlog_recover_clear_agi_bucket(pag, bucket);
                }
        }
@@ -2806,13 +2812,6 @@ xlog_recover_process_iunlinks(
 
        for_each_perag(log->l_mp, agno, pag)
                xlog_recover_iunlink_ag(pag);
-
-       /*
-        * Flush the pending unlinked inodes to ensure that the inactivations
-        * are fully completed on disk and the incore inodes can be reclaimed
-        * before we signal that recovery is complete.
-        */
-       xfs_inodegc_flush(log->l_mp);
 }
 
 STATIC void
index f3269c0..e2866e7 100644 (file)
@@ -62,10 +62,14 @@ struct xfs_error_cfg {
 struct xfs_inodegc {
        struct llist_head       list;
        struct delayed_work     work;
+       int                     error;
 
        /* approximate count of inodes in the list */
        unsigned int            items;
        unsigned int            shrinker_hits;
+#if defined(DEBUG) || defined(XFS_WARN)
+       unsigned int            cpu;
+#endif
 };
 
 /*
@@ -454,12 +458,14 @@ void xfs_do_force_shutdown(struct xfs_mount *mp, uint32_t flags, char *fname,
 #define SHUTDOWN_FORCE_UMOUNT  (1u << 2) /* shutdown from a forced unmount */
 #define SHUTDOWN_CORRUPT_INCORE        (1u << 3) /* corrupt in-memory structures */
 #define SHUTDOWN_CORRUPT_ONDISK        (1u << 4)  /* corrupt metadata on device */
+#define SHUTDOWN_DEVICE_REMOVED        (1u << 5) /* device removed underneath us */
 
 #define XFS_SHUTDOWN_STRINGS \
        { SHUTDOWN_META_IO_ERROR,       "metadata_io" }, \
        { SHUTDOWN_LOG_IO_ERROR,        "log_io" }, \
        { SHUTDOWN_FORCE_UMOUNT,        "force_umount" }, \
-       { SHUTDOWN_CORRUPT_INCORE,      "corruption" }
+       { SHUTDOWN_CORRUPT_INCORE,      "corruption" }, \
+       { SHUTDOWN_DEVICE_REMOVED,      "device_removed" }
 
 /*
  * Flags for xfs_mountfs
index f5dc46c..abcc559 100644 (file)
@@ -616,8 +616,10 @@ xfs_reflink_cancel_cow_blocks(
                        xfs_refcount_free_cow_extent(*tpp, del.br_startblock,
                                        del.br_blockcount);
 
-                       xfs_free_extent_later(*tpp, del.br_startblock,
+                       error = xfs_free_extent_later(*tpp, del.br_startblock,
                                          del.br_blockcount, NULL);
+                       if (error)
+                               break;
 
                        /* Roll the transaction */
                        error = xfs_defer_finish(tpp);
index 4d2e874..d910b14 100644 (file)
@@ -377,6 +377,17 @@ disable_dax:
        return 0;
 }
 
+static void
+xfs_bdev_mark_dead(
+       struct block_device     *bdev)
+{
+       xfs_force_shutdown(bdev->bd_holder, SHUTDOWN_DEVICE_REMOVED);
+}
+
+static const struct blk_holder_ops xfs_holder_ops = {
+       .mark_dead              = xfs_bdev_mark_dead,
+};
+
 STATIC int
 xfs_blkdev_get(
        xfs_mount_t             *mp,
@@ -385,8 +396,8 @@ xfs_blkdev_get(
 {
        int                     error = 0;
 
-       *bdevp = blkdev_get_by_path(name, FMODE_READ|FMODE_WRITE|FMODE_EXCL,
-                                   mp);
+       *bdevp = blkdev_get_by_path(name, BLK_OPEN_READ | BLK_OPEN_WRITE, mp,
+                                   &xfs_holder_ops);
        if (IS_ERR(*bdevp)) {
                error = PTR_ERR(*bdevp);
                xfs_warn(mp, "Invalid device [%s], error=%d", name, error);
@@ -397,10 +408,11 @@ xfs_blkdev_get(
 
 STATIC void
 xfs_blkdev_put(
+       struct xfs_mount        *mp,
        struct block_device     *bdev)
 {
        if (bdev)
-               blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
+               blkdev_put(bdev, mp);
 }
 
 STATIC void
@@ -411,13 +423,13 @@ xfs_close_devices(
                struct block_device *logdev = mp->m_logdev_targp->bt_bdev;
 
                xfs_free_buftarg(mp->m_logdev_targp);
-               xfs_blkdev_put(logdev);
+               xfs_blkdev_put(mp, logdev);
        }
        if (mp->m_rtdev_targp) {
                struct block_device *rtdev = mp->m_rtdev_targp->bt_bdev;
 
                xfs_free_buftarg(mp->m_rtdev_targp);
-               xfs_blkdev_put(rtdev);
+               xfs_blkdev_put(mp, rtdev);
        }
        xfs_free_buftarg(mp->m_ddev_targp);
 }
@@ -492,10 +504,10 @@ xfs_open_devices(
  out_free_ddev_targ:
        xfs_free_buftarg(mp->m_ddev_targp);
  out_close_rtdev:
-       xfs_blkdev_put(rtdev);
+       xfs_blkdev_put(mp, rtdev);
  out_close_logdev:
        if (logdev && logdev != ddev)
-               xfs_blkdev_put(logdev);
+               xfs_blkdev_put(mp, logdev);
        return error;
 }
 
@@ -1095,8 +1107,12 @@ xfs_inodegc_init_percpu(
 
        for_each_possible_cpu(cpu) {
                gc = per_cpu_ptr(mp->m_inodegc, cpu);
+#if defined(DEBUG) || defined(XFS_WARN)
+               gc->cpu = cpu;
+#endif
                init_llist_head(&gc->list);
                gc->items = 0;
+               gc->error = 0;
                INIT_DELAYED_WORK(&gc->work, xfs_inodegc_worker);
        }
        return 0;
@@ -1156,6 +1172,13 @@ xfs_fs_free_cached_objects(
        return xfs_reclaim_inodes_nr(XFS_M(sb), sc->nr_to_scan);
 }
 
+static void
+xfs_fs_shutdown(
+       struct super_block      *sb)
+{
+       xfs_force_shutdown(XFS_M(sb), SHUTDOWN_DEVICE_REMOVED);
+}
+
 static const struct super_operations xfs_super_operations = {
        .alloc_inode            = xfs_fs_alloc_inode,
        .destroy_inode          = xfs_fs_destroy_inode,
@@ -1169,6 +1192,7 @@ static const struct super_operations xfs_super_operations = {
        .show_options           = xfs_fs_show_options,
        .nr_cached_objects      = xfs_fs_nr_cached_objects,
        .free_cached_objects    = xfs_fs_free_cached_objects,
+       .shutdown               = xfs_fs_shutdown,
 };
 
 static int
index cd4ca5b..4db6692 100644 (file)
@@ -1445,7 +1445,6 @@ DEFINE_RW_EVENT(xfs_file_direct_write);
 DEFINE_RW_EVENT(xfs_file_dax_write);
 DEFINE_RW_EVENT(xfs_reflink_bounce_dio_write);
 
-
 DECLARE_EVENT_CLASS(xfs_imap_class,
        TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count,
                 int whichfork, struct xfs_bmbt_irec *irec),
@@ -1535,6 +1534,7 @@ DEFINE_SIMPLE_IO_EVENT(xfs_zero_eof);
 DEFINE_SIMPLE_IO_EVENT(xfs_end_io_direct_write);
 DEFINE_SIMPLE_IO_EVENT(xfs_end_io_direct_write_unwritten);
 DEFINE_SIMPLE_IO_EVENT(xfs_end_io_direct_write_append);
+DEFINE_SIMPLE_IO_EVENT(xfs_file_splice_read);
 
 DECLARE_EVENT_CLASS(xfs_itrunc_class,
        TP_PROTO(struct xfs_inode *ip, xfs_fsize_t new_size),
index 8afc0c0..8c0bfc9 100644 (file)
@@ -290,7 +290,9 @@ retry:
                 * Do not perform a synchronous scan because callers can hold
                 * other locks.
                 */
-               xfs_blockgc_flush_all(mp);
+               error = xfs_blockgc_flush_all(mp);
+               if (error)
+                       return error;
                want_retry = false;
                goto retry;
        }
@@ -970,6 +972,11 @@ __xfs_trans_commit(
                error = xfs_defer_finish_noroll(&tp);
                if (error)
                        goto out_unreserve;
+
+               /* Run precommits from final tx in defer chain. */
+               error = xfs_trans_run_precommits(tp);
+               if (error)
+                       goto out_unreserve;
        }
 
        /*
index 132f01d..1451e7b 100644 (file)
@@ -181,7 +181,6 @@ const struct address_space_operations zonefs_file_aops = {
        .migrate_folio          = filemap_migrate_folio,
        .is_partially_uptodate  = iomap_is_partially_uptodate,
        .error_remove_page      = generic_error_remove_page,
-       .direct_IO              = noop_direct_IO,
        .swap_activate          = zonefs_swap_activate,
 };
 
@@ -342,6 +341,77 @@ static loff_t zonefs_file_llseek(struct file *file, loff_t offset, int whence)
        return generic_file_llseek_size(file, offset, whence, isize, isize);
 }
 
+struct zonefs_zone_append_bio {
+       /* The target inode of the BIO */
+       struct inode *inode;
+
+       /* For sync writes, the target append write offset */
+       u64 append_offset;
+
+       /*
+        * This member must come last, bio_alloc_bioset will allocate enough
+        * bytes for the entire zonefs_zone_append_bio but relies on bio being last.
+        */
+       struct bio bio;
+};
+
+static inline struct zonefs_zone_append_bio *
+zonefs_zone_append_bio(struct bio *bio)
+{
+       return container_of(bio, struct zonefs_zone_append_bio, bio);
+}
+
+static void zonefs_file_zone_append_dio_bio_end_io(struct bio *bio)
+{
+       struct zonefs_zone_append_bio *za_bio = zonefs_zone_append_bio(bio);
+       struct zonefs_zone *z = zonefs_inode_zone(za_bio->inode);
+       sector_t za_sector;
+
+       if (bio->bi_status != BLK_STS_OK)
+               goto bio_end;
+
+       /*
+        * If the file zone was written underneath the file system, the zone
+        * append operation can still succeed (if the zone is not full) but
+        * the write append location will not be where we expect it to be.
+        * Check that we wrote where we intended to, that is, at z->z_wpoffset.
+        */
+       za_sector = z->z_sector + (za_bio->append_offset >> SECTOR_SHIFT);
+       if (bio->bi_iter.bi_sector != za_sector) {
+               zonefs_warn(za_bio->inode->i_sb,
+                           "Invalid write sector %llu for zone at %llu\n",
+                           bio->bi_iter.bi_sector, z->z_sector);
+               bio->bi_status = BLK_STS_IOERR;
+       }
+
+bio_end:
+       iomap_dio_bio_end_io(bio);
+}
+
+static void zonefs_file_zone_append_dio_submit_io(const struct iomap_iter *iter,
+                                                 struct bio *bio,
+                                                 loff_t file_offset)
+{
+       struct zonefs_zone_append_bio *za_bio = zonefs_zone_append_bio(bio);
+       struct inode *inode = iter->inode;
+       struct zonefs_zone *z = zonefs_inode_zone(inode);
+
+       /*
+        * Issue a zone append BIO to process sync dio writes. The append
+        * file offset is saved to check the zone append write location
+        * on completion of the BIO.
+        */
+       za_bio->inode = inode;
+       za_bio->append_offset = file_offset;
+
+       bio->bi_opf &= ~REQ_OP_WRITE;
+       bio->bi_opf |= REQ_OP_ZONE_APPEND;
+       bio->bi_iter.bi_sector = z->z_sector;
+       bio->bi_end_io = zonefs_file_zone_append_dio_bio_end_io;
+
+       submit_bio(bio);
+}
+
 static int zonefs_file_write_dio_end_io(struct kiocb *iocb, ssize_t size,
                                        int error, unsigned int flags)
 {
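The append-BIO wrapper above relies on the usual bio_set front-padding pattern: the pad reserved in front of the embedded bio holds the caller's private fields (zonefs_file_bioset_init() later in this diff passes offsetof(struct zonefs_zone_append_bio, bio) as that pad), and container_of() recovers the wrapper from the bio pointer, as zonefs_zone_append_bio() does. A stripped-down, userspace-only sketch of that recovery idea, with all names hypothetical:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stddef.h>

    /* Private fields live in front of the embedded object; the embedded
     * object must come last so any trailing storage can follow it. */
    struct wrapper {
            int     private_data;   /* the "front pad" */
            double  obj;            /* embedded object handed out to callers */
    };

    #define obj_to_wrapper(p) \
            ((struct wrapper *)((char *)(p) - offsetof(struct wrapper, obj)))

    int main(void)
    {
            struct wrapper *w = malloc(sizeof(*w));
            double *obj;

            if (!w)
                    return 1;
            w->private_data = 42;
            obj = &w->obj;          /* callers only ever see this pointer */

            /* recover the wrapper from the embedded object, container_of-style */
            printf("%d\n", obj_to_wrapper(obj)->private_data);
            free(w);
            return 0;
    }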
@@ -372,93 +442,17 @@ static int zonefs_file_write_dio_end_io(struct kiocb *iocb, ssize_t size,
        return 0;
 }
 
-static const struct iomap_dio_ops zonefs_write_dio_ops = {
-       .end_io                 = zonefs_file_write_dio_end_io,
-};
+static struct bio_set zonefs_zone_append_bio_set;
 
-static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
-{
-       struct inode *inode = file_inode(iocb->ki_filp);
-       struct zonefs_zone *z = zonefs_inode_zone(inode);
-       struct block_device *bdev = inode->i_sb->s_bdev;
-       unsigned int max = bdev_max_zone_append_sectors(bdev);
-       pgoff_t start, end;
-       struct bio *bio;
-       ssize_t size = 0;
-       int nr_pages;
-       ssize_t ret;
-
-       max = ALIGN_DOWN(max << SECTOR_SHIFT, inode->i_sb->s_blocksize);
-       iov_iter_truncate(from, max);
-
-       /*
-        * If the inode block size (zone write granularity) is smaller than the
-        * page size, we may be appending data belonging to the last page of the
-        * inode straddling inode->i_size, with that page already cached due to
-        * a buffered read or readahead. So make sure to invalidate that page.
-        * This will always be a no-op for the case where the block size is
-        * equal to the page size.
-        */
-       start = iocb->ki_pos >> PAGE_SHIFT;
-       end = (iocb->ki_pos + iov_iter_count(from) - 1) >> PAGE_SHIFT;
-       if (invalidate_inode_pages2_range(inode->i_mapping, start, end))
-               return -EBUSY;
-
-       nr_pages = iov_iter_npages(from, BIO_MAX_VECS);
-       if (!nr_pages)
-               return 0;
-
-       bio = bio_alloc(bdev, nr_pages,
-                       REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE, GFP_NOFS);
-       bio->bi_iter.bi_sector = z->z_sector;
-       bio->bi_ioprio = iocb->ki_ioprio;
-       if (iocb_is_dsync(iocb))
-               bio->bi_opf |= REQ_FUA;
-
-       ret = bio_iov_iter_get_pages(bio, from);
-       if (unlikely(ret))
-               goto out_release;
-
-       size = bio->bi_iter.bi_size;
-       task_io_account_write(size);
-
-       if (iocb->ki_flags & IOCB_HIPRI)
-               bio_set_polled(bio, iocb);
-
-       ret = submit_bio_wait(bio);
-
-       /*
-        * If the file zone was written underneath the file system, the zone
-        * write pointer may not be where we expect it to be, but the zone
-        * append write can still succeed. So check manually that we wrote where
-        * we intended to, that is, at zi->i_wpoffset.
-        */
-       if (!ret) {
-               sector_t wpsector =
-                       z->z_sector + (z->z_wpoffset >> SECTOR_SHIFT);
-
-               if (bio->bi_iter.bi_sector != wpsector) {
-                       zonefs_warn(inode->i_sb,
-                               "Corrupted write pointer %llu for zone at %llu\n",
-                               bio->bi_iter.bi_sector, z->z_sector);
-                       ret = -EIO;
-               }
-       }
-
-       zonefs_file_write_dio_end_io(iocb, size, ret, 0);
-       trace_zonefs_file_dio_append(inode, size, ret);
-
-out_release:
-       bio_release_pages(bio, false);
-       bio_put(bio);
-
-       if (ret >= 0) {
-               iocb->ki_pos += size;
-               return size;
-       }
+static const struct iomap_dio_ops zonefs_zone_append_dio_ops = {
+       .submit_io      = zonefs_file_zone_append_dio_submit_io,
+       .end_io         = zonefs_file_write_dio_end_io,
+       .bio_set        = &zonefs_zone_append_bio_set,
+};
 
-       return ret;
-}
+static const struct iomap_dio_ops zonefs_write_dio_ops = {
+       .end_io         = zonefs_file_write_dio_end_io,
+};
 
 /*
  * Do not exceed the LFS limits nor the file zone size. If pos is under the
@@ -539,6 +533,7 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
        struct zonefs_inode_info *zi = ZONEFS_I(inode);
        struct zonefs_zone *z = zonefs_inode_zone(inode);
        struct super_block *sb = inode->i_sb;
+       const struct iomap_dio_ops *dio_ops;
        bool sync = is_sync_kiocb(iocb);
        bool append = false;
        ssize_t ret, count;
@@ -582,20 +577,26 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
        }
 
        if (append) {
-               ret = zonefs_file_dio_append(iocb, from);
+               unsigned int max = bdev_max_zone_append_sectors(sb->s_bdev);
+
+               max = ALIGN_DOWN(max << SECTOR_SHIFT, sb->s_blocksize);
+               iov_iter_truncate(from, max);
+
+               dio_ops = &zonefs_zone_append_dio_ops;
        } else {
-               /*
-                * iomap_dio_rw() may return ENOTBLK if there was an issue with
-                * page invalidation. Overwrite that error code with EBUSY to
-                * be consistent with zonefs_file_dio_append() return value for
-                * similar issues.
-                */
-               ret = iomap_dio_rw(iocb, from, &zonefs_write_iomap_ops,
-                                  &zonefs_write_dio_ops, 0, NULL, 0);
-               if (ret == -ENOTBLK)
-                       ret = -EBUSY;
+               dio_ops = &zonefs_write_dio_ops;
        }
 
+       /*
+        * iomap_dio_rw() may return ENOTBLK if there was an issue with
+        * page invalidation. Overwrite that error code with EBUSY so that
+        * the user can make sense of the error.
+        */
+       ret = iomap_dio_rw(iocb, from, &zonefs_write_iomap_ops,
+                          dio_ops, 0, NULL, 0);
+       if (ret == -ENOTBLK)
+               ret = -EBUSY;
+
        if (zonefs_zone_is_seq(z) &&
            (ret > 0 || ret == -EIOCBQUEUED)) {
                if (ret > 0)
@@ -752,6 +753,44 @@ inode_unlock:
        return ret;
 }
 
+static ssize_t zonefs_file_splice_read(struct file *in, loff_t *ppos,
+                                      struct pipe_inode_info *pipe,
+                                      size_t len, unsigned int flags)
+{
+       struct inode *inode = file_inode(in);
+       struct zonefs_inode_info *zi = ZONEFS_I(inode);
+       struct zonefs_zone *z = zonefs_inode_zone(inode);
+       loff_t isize;
+       ssize_t ret = 0;
+
+       /* Offline zones cannot be read */
+       if (unlikely(IS_IMMUTABLE(inode) && !(inode->i_mode & 0777)))
+               return -EPERM;
+
+       if (*ppos >= z->z_capacity)
+               return 0;
+
+       inode_lock_shared(inode);
+
+       /* Limit read operations to written data */
+       mutex_lock(&zi->i_truncate_mutex);
+       isize = i_size_read(inode);
+       if (*ppos >= isize)
+               len = 0;
+       else
+               len = min_t(loff_t, len, isize - *ppos);
+       mutex_unlock(&zi->i_truncate_mutex);
+
+       if (len > 0) {
+               ret = filemap_splice_read(in, ppos, pipe, len, flags);
+               if (ret == -EIO)
+                       zonefs_io_error(inode, false);
+       }
+
+       inode_unlock_shared(inode);
+       return ret;
+}
+
 /*
  * Write open accounting is done only for sequential files.
  */
@@ -813,6 +852,7 @@ static int zonefs_file_open(struct inode *inode, struct file *file)
 {
        int ret;
 
+       file->f_mode |= FMODE_CAN_ODIRECT;
        ret = generic_file_open(inode, file);
        if (ret)
                return ret;
@@ -896,7 +936,19 @@ const struct file_operations zonefs_file_operations = {
        .llseek         = zonefs_file_llseek,
        .read_iter      = zonefs_file_read_iter,
        .write_iter     = zonefs_file_write_iter,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = zonefs_file_splice_read,
        .splice_write   = iter_file_splice_write,
        .iopoll         = iocb_bio_iopoll,
 };
+
+int zonefs_file_bioset_init(void)
+{
+       return bioset_init(&zonefs_zone_append_bio_set, BIO_POOL_SIZE,
+                          offsetof(struct zonefs_zone_append_bio, bio),
+                          BIOSET_NEED_BVECS);
+}
+
+void zonefs_file_bioset_exit(void)
+{
+       bioset_exit(&zonefs_zone_append_bio_set);
+}
index 23b8b29..bbe44a2 100644 (file)
@@ -1128,7 +1128,7 @@ static int zonefs_read_super(struct super_block *sb)
 
        bio_init(&bio, sb->s_bdev, &bio_vec, 1, REQ_OP_READ);
        bio.bi_iter.bi_sector = 0;
-       bio_add_page(&bio, page, PAGE_SIZE, 0);
+       __bio_add_page(&bio, page, PAGE_SIZE, 0);
 
        ret = submit_bio_wait(&bio);
        if (ret)
@@ -1412,10 +1412,14 @@ static int __init zonefs_init(void)
 
        BUILD_BUG_ON(sizeof(struct zonefs_super) != ZONEFS_SUPER_SIZE);
 
-       ret = zonefs_init_inodecache();
+       ret = zonefs_file_bioset_init();
        if (ret)
                return ret;
 
+       ret = zonefs_init_inodecache();
+       if (ret)
+               goto destroy_bioset;
+
        ret = zonefs_sysfs_init();
        if (ret)
                goto destroy_inodecache;
@@ -1430,6 +1434,8 @@ sysfs_exit:
        zonefs_sysfs_exit();
 destroy_inodecache:
        zonefs_destroy_inodecache();
+destroy_bioset:
+       zonefs_file_bioset_exit();
 
        return ret;
 }
@@ -1439,6 +1445,7 @@ static void __exit zonefs_exit(void)
        unregister_filesystem(&zonefs_type);
        zonefs_sysfs_exit();
        zonefs_destroy_inodecache();
+       zonefs_file_bioset_exit();
 }
 
 MODULE_AUTHOR("Damien Le Moal");
index 8175652..f663b8e 100644 (file)
@@ -279,6 +279,8 @@ extern const struct file_operations zonefs_dir_operations;
 extern const struct address_space_operations zonefs_file_aops;
 extern const struct file_operations zonefs_file_operations;
 int zonefs_file_truncate(struct inode *inode, loff_t isize);
+int zonefs_file_bioset_init(void);
+void zonefs_file_bioset_exit(void);
 
 /* In sysfs.c */
 int zonefs_sysfs_register(struct super_block *sb);
index a6affc0..c941d99 100644 (file)
@@ -289,6 +289,8 @@ struct acpi_dep_data {
        acpi_handle supplier;
        acpi_handle consumer;
        bool honor_dep;
+       bool met;
+       bool free_when_met;
 };
 
 /* Performance Management */
index e6098a0..9ffdc04 100644 (file)
@@ -761,6 +761,7 @@ ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status
                                                     acpi_event_status
                                                     *event_status))
 ACPI_HW_DEPENDENT_RETURN_UINT32(u32 acpi_dispatch_gpe(acpi_handle gpe_device, u32 gpe_number))
+ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_hw_disable_all_gpes(void))
 ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_disable_all_gpes(void))
 ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_enable_all_runtime_gpes(void))
 ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_enable_all_wakeup_gpes(void))
index e5dfb6f..451f627 100644 (file)
@@ -307,7 +307,8 @@ enum acpi_preferred_pm_profiles {
        PM_SOHO_SERVER = 5,
        PM_APPLIANCE_PC = 6,
        PM_PERFORMANCE_SERVER = 7,
-       PM_TABLET = 8
+       PM_TABLET = 8,
+       NR_PM_PROFILES = 9
 };
 
 /* Values for sleep_status and sleep_control registers (V5+ FADT) */
index e271d67..22142c7 100644 (file)
@@ -130,7 +130,4 @@ ATOMIC_OP(xor, ^)
 #define arch_atomic_read(v)                    READ_ONCE((v)->counter)
 #define arch_atomic_set(v, i)                  WRITE_ONCE(((v)->counter), (i))
 
-#define arch_atomic_xchg(ptr, v)               (arch_xchg(&(ptr)->counter, (u32)(v)))
-#define arch_atomic_cmpxchg(v, old, new)       (arch_cmpxchg(&((v)->counter), (u32)(old), (u32)(new)))
-
 #endif /* __ASM_GENERIC_ATOMIC_H */
index 71ab4ba..e076e07 100644 (file)
@@ -15,21 +15,21 @@ static __always_inline void
 arch_set_bit(unsigned int nr, volatile unsigned long *p)
 {
        p += BIT_WORD(nr);
-       arch_atomic_long_or(BIT_MASK(nr), (atomic_long_t *)p);
+       raw_atomic_long_or(BIT_MASK(nr), (atomic_long_t *)p);
 }
 
 static __always_inline void
 arch_clear_bit(unsigned int nr, volatile unsigned long *p)
 {
        p += BIT_WORD(nr);
-       arch_atomic_long_andnot(BIT_MASK(nr), (atomic_long_t *)p);
+       raw_atomic_long_andnot(BIT_MASK(nr), (atomic_long_t *)p);
 }
 
 static __always_inline void
 arch_change_bit(unsigned int nr, volatile unsigned long *p)
 {
        p += BIT_WORD(nr);
-       arch_atomic_long_xor(BIT_MASK(nr), (atomic_long_t *)p);
+       raw_atomic_long_xor(BIT_MASK(nr), (atomic_long_t *)p);
 }
 
 static __always_inline int
@@ -39,7 +39,7 @@ arch_test_and_set_bit(unsigned int nr, volatile unsigned long *p)
        unsigned long mask = BIT_MASK(nr);
 
        p += BIT_WORD(nr);
-       old = arch_atomic_long_fetch_or(mask, (atomic_long_t *)p);
+       old = raw_atomic_long_fetch_or(mask, (atomic_long_t *)p);
        return !!(old & mask);
 }
 
@@ -50,7 +50,7 @@ arch_test_and_clear_bit(unsigned int nr, volatile unsigned long *p)
        unsigned long mask = BIT_MASK(nr);
 
        p += BIT_WORD(nr);
-       old = arch_atomic_long_fetch_andnot(mask, (atomic_long_t *)p);
+       old = raw_atomic_long_fetch_andnot(mask, (atomic_long_t *)p);
        return !!(old & mask);
 }
 
@@ -61,7 +61,7 @@ arch_test_and_change_bit(unsigned int nr, volatile unsigned long *p)
        unsigned long mask = BIT_MASK(nr);
 
        p += BIT_WORD(nr);
-       old = arch_atomic_long_fetch_xor(mask, (atomic_long_t *)p);
+       old = raw_atomic_long_fetch_xor(mask, (atomic_long_t *)p);
        return !!(old & mask);
 }
 
index 630f2f6..4091351 100644 (file)
@@ -25,7 +25,7 @@ arch_test_and_set_bit_lock(unsigned int nr, volatile unsigned long *p)
        if (READ_ONCE(*p) & mask)
                return 1;
 
-       old = arch_atomic_long_fetch_or_acquire(mask, (atomic_long_t *)p);
+       old = raw_atomic_long_fetch_or_acquire(mask, (atomic_long_t *)p);
        return !!(old & mask);
 }
 
@@ -41,7 +41,7 @@ static __always_inline void
 arch_clear_bit_unlock(unsigned int nr, volatile unsigned long *p)
 {
        p += BIT_WORD(nr);
-       arch_atomic_long_fetch_andnot_release(BIT_MASK(nr), (atomic_long_t *)p);
+       raw_atomic_long_fetch_andnot_release(BIT_MASK(nr), (atomic_long_t *)p);
 }
 
 /**
@@ -63,7 +63,7 @@ arch___clear_bit_unlock(unsigned int nr, volatile unsigned long *p)
        p += BIT_WORD(nr);
        old = READ_ONCE(*p);
        old &= ~BIT_MASK(nr);
-       arch_atomic_long_set_release((atomic_long_t *)p, old);
+       raw_atomic_long_set_release((atomic_long_t *)p, old);
 }
 
 /**
@@ -83,7 +83,7 @@ static inline bool arch_clear_bit_unlock_is_negative_byte(unsigned int nr,
        unsigned long mask = BIT_MASK(nr);
 
        p += BIT_WORD(nr);
-       old = arch_atomic_long_fetch_andnot_release(mask, (atomic_long_t *)p);
+       old = raw_atomic_long_fetch_andnot_release(mask, (atomic_long_t *)p);
        return !!(old & BIT(7));
 }
 #define arch_clear_bit_unlock_is_negative_byte arch_clear_bit_unlock_is_negative_byte
diff --git a/include/asm-generic/bugs.h b/include/asm-generic/bugs.h
deleted file mode 100644 (file)
index 6902183..0000000
+++ /dev/null
@@ -1,11 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __ASM_GENERIC_BUGS_H
-#define __ASM_GENERIC_BUGS_H
-/*
- * This file is included by 'init/main.c' to check for
- * architecture-dependent bugs.
- */
-
-static inline void check_bugs(void) { }
-
-#endif /* __ASM_GENERIC_BUGS_H */
index 6432a7f..94cbd50 100644 (file)
@@ -89,27 +89,35 @@ do {                                                                        \
        __ret;                                                          \
 })
 
-#define raw_cpu_generic_cmpxchg(pcp, oval, nval)                       \
+#define __cpu_fallback_try_cmpxchg(pcp, ovalp, nval, _cmpxchg)         \
+({                                                                     \
+       typeof(pcp) __val, __old = *(ovalp);                            \
+       __val = _cmpxchg(pcp, __old, nval);                             \
+       if (__val != __old)                                             \
+               *(ovalp) = __val;                                       \
+       __val == __old;                                                 \
+})
+
+#define raw_cpu_generic_try_cmpxchg(pcp, ovalp, nval)                  \
 ({                                                                     \
        typeof(pcp) *__p = raw_cpu_ptr(&(pcp));                         \
-       typeof(pcp) __ret;                                              \
-       __ret = *__p;                                                   \
-       if (__ret == (oval))                                            \
+       typeof(pcp) __val = *__p, ___old = *(ovalp);                    \
+       bool __ret;                                                     \
+       if (__val == ___old) {                                          \
                *__p = nval;                                            \
+               __ret = true;                                           \
+       } else {                                                        \
+               *(ovalp) = __val;                                       \
+               __ret = false;                                          \
+       }                                                               \
        __ret;                                                          \
 })
 
-#define raw_cpu_generic_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2) \
+#define raw_cpu_generic_cmpxchg(pcp, oval, nval)                       \
 ({                                                                     \
-       typeof(pcp1) *__p1 = raw_cpu_ptr(&(pcp1));                      \
-       typeof(pcp2) *__p2 = raw_cpu_ptr(&(pcp2));                      \
-       int __ret = 0;                                                  \
-       if (*__p1 == (oval1) && *__p2  == (oval2)) {                    \
-               *__p1 = nval1;                                          \
-               *__p2 = nval2;                                          \
-               __ret = 1;                                              \
-       }                                                               \
-       (__ret);                                                        \
+       typeof(pcp) __old = (oval);                                     \
+       raw_cpu_generic_try_cmpxchg(pcp, &__old, nval);                 \
+       __old;                                                          \
 })
 
 #define __this_cpu_generic_read_nopreempt(pcp)                         \
@@ -170,23 +178,22 @@ do {                                                                      \
        __ret;                                                          \
 })
 
-#define this_cpu_generic_cmpxchg(pcp, oval, nval)                      \
+#define this_cpu_generic_try_cmpxchg(pcp, ovalp, nval)                 \
 ({                                                                     \
-       typeof(pcp) __ret;                                              \
+       bool __ret;                                                     \
        unsigned long __flags;                                          \
        raw_local_irq_save(__flags);                                    \
-       __ret = raw_cpu_generic_cmpxchg(pcp, oval, nval);               \
+       __ret = raw_cpu_generic_try_cmpxchg(pcp, ovalp, nval);          \
        raw_local_irq_restore(__flags);                                 \
        __ret;                                                          \
 })
 
-#define this_cpu_generic_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)        \
+#define this_cpu_generic_cmpxchg(pcp, oval, nval)                      \
 ({                                                                     \
-       int __ret;                                                      \
+       typeof(pcp) __ret;                                              \
        unsigned long __flags;                                          \
        raw_local_irq_save(__flags);                                    \
-       __ret = raw_cpu_generic_cmpxchg_double(pcp1, pcp2,              \
-                       oval1, oval2, nval1, nval2);                    \
+       __ret = raw_cpu_generic_cmpxchg(pcp, oval, nval);               \
        raw_local_irq_restore(__flags);                                 \
        __ret;                                                          \
 })
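The generic try_cmpxchg fallbacks above all share one calling convention: return true on success, and on failure write the value actually observed back through ovalp so the caller's retry loop needs no explicit reload. A single-threaded userspace sketch of that convention (demo_try_cmpxchg is illustrative only and ignores the atomicity the real per-cpu ops provide):

    #include <stdbool.h>
    #include <stdio.h>

    /* Mirrors the try_cmpxchg convention: true on success, otherwise the
     * observed value is written back through ovalp for the next attempt. */
    static bool demo_try_cmpxchg(unsigned long *p, unsigned long *ovalp,
                                 unsigned long nval)
    {
            if (*p == *ovalp) {
                    *p = nval;
                    return true;
            }
            *ovalp = *p;
            return false;
    }

    int main(void)
    {
            unsigned long counter = 5;
            unsigned long old = counter;

            /* typical retry loop: no explicit reload of 'old' on failure */
            while (!demo_try_cmpxchg(&counter, &old, old + 1))
                    ;
            printf("counter = %lu\n", counter);     /* prints counter = 6 */
            return 0;
    }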
@@ -282,6 +289,62 @@ do {                                                                       \
 #define raw_cpu_xchg_8(pcp, nval)      raw_cpu_generic_xchg(pcp, nval)
 #endif
 
+#ifndef raw_cpu_try_cmpxchg_1
+#ifdef raw_cpu_cmpxchg_1
+#define raw_cpu_try_cmpxchg_1(pcp, ovalp, nval) \
+       __cpu_fallback_try_cmpxchg(pcp, ovalp, nval, raw_cpu_cmpxchg_1)
+#else
+#define raw_cpu_try_cmpxchg_1(pcp, ovalp, nval) \
+       raw_cpu_generic_try_cmpxchg(pcp, ovalp, nval)
+#endif
+#endif
+#ifndef raw_cpu_try_cmpxchg_2
+#ifdef raw_cpu_cmpxchg_2
+#define raw_cpu_try_cmpxchg_2(pcp, ovalp, nval) \
+       __cpu_fallback_try_cmpxchg(pcp, ovalp, nval, raw_cpu_cmpxchg_2)
+#else
+#define raw_cpu_try_cmpxchg_2(pcp, ovalp, nval) \
+       raw_cpu_generic_try_cmpxchg(pcp, ovalp, nval)
+#endif
+#endif
+#ifndef raw_cpu_try_cmpxchg_4
+#ifdef raw_cpu_cmpxchg_4
+#define raw_cpu_try_cmpxchg_4(pcp, ovalp, nval) \
+       __cpu_fallback_try_cmpxchg(pcp, ovalp, nval, raw_cpu_cmpxchg_4)
+#else
+#define raw_cpu_try_cmpxchg_4(pcp, ovalp, nval) \
+       raw_cpu_generic_try_cmpxchg(pcp, ovalp, nval)
+#endif
+#endif
+#ifndef raw_cpu_try_cmpxchg_8
+#ifdef raw_cpu_cmpxchg_8
+#define raw_cpu_try_cmpxchg_8(pcp, ovalp, nval) \
+       __cpu_fallback_try_cmpxchg(pcp, ovalp, nval, raw_cpu_cmpxchg_8)
+#else
+#define raw_cpu_try_cmpxchg_8(pcp, ovalp, nval) \
+       raw_cpu_generic_try_cmpxchg(pcp, ovalp, nval)
+#endif
+#endif
+
+#ifndef raw_cpu_try_cmpxchg64
+#ifdef raw_cpu_cmpxchg64
+#define raw_cpu_try_cmpxchg64(pcp, ovalp, nval) \
+       __cpu_fallback_try_cmpxchg(pcp, ovalp, nval, raw_cpu_cmpxchg64)
+#else
+#define raw_cpu_try_cmpxchg64(pcp, ovalp, nval) \
+       raw_cpu_generic_try_cmpxchg(pcp, ovalp, nval)
+#endif
+#endif
+#ifndef raw_cpu_try_cmpxchg128
+#ifdef raw_cpu_cmpxchg128
+#define raw_cpu_try_cmpxchg128(pcp, ovalp, nval) \
+       __cpu_fallback_try_cmpxchg(pcp, ovalp, nval, raw_cpu_cmpxchg128)
+#else
+#define raw_cpu_try_cmpxchg128(pcp, ovalp, nval) \
+       raw_cpu_generic_try_cmpxchg(pcp, ovalp, nval)
+#endif
+#endif
+
 #ifndef raw_cpu_cmpxchg_1
 #define raw_cpu_cmpxchg_1(pcp, oval, nval) \
        raw_cpu_generic_cmpxchg(pcp, oval, nval)
@@ -299,21 +362,13 @@ do {                                                                      \
        raw_cpu_generic_cmpxchg(pcp, oval, nval)
 #endif
 
-#ifndef raw_cpu_cmpxchg_double_1
-#define raw_cpu_cmpxchg_double_1(pcp1, pcp2, oval1, oval2, nval1, nval2) \
-       raw_cpu_generic_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
-#endif
-#ifndef raw_cpu_cmpxchg_double_2
-#define raw_cpu_cmpxchg_double_2(pcp1, pcp2, oval1, oval2, nval1, nval2) \
-       raw_cpu_generic_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
-#endif
-#ifndef raw_cpu_cmpxchg_double_4
-#define raw_cpu_cmpxchg_double_4(pcp1, pcp2, oval1, oval2, nval1, nval2) \
-       raw_cpu_generic_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
+#ifndef raw_cpu_cmpxchg64
+#define raw_cpu_cmpxchg64(pcp, oval, nval) \
+       raw_cpu_generic_cmpxchg(pcp, oval, nval)
 #endif
-#ifndef raw_cpu_cmpxchg_double_8
-#define raw_cpu_cmpxchg_double_8(pcp1, pcp2, oval1, oval2, nval1, nval2) \
-       raw_cpu_generic_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
+#ifndef raw_cpu_cmpxchg128
+#define raw_cpu_cmpxchg128(pcp, oval, nval) \
+       raw_cpu_generic_cmpxchg(pcp, oval, nval)
 #endif
 
 #ifndef this_cpu_read_1
@@ -407,6 +462,62 @@ do {                                                                       \
 #define this_cpu_xchg_8(pcp, nval)     this_cpu_generic_xchg(pcp, nval)
 #endif
 
+#ifndef this_cpu_try_cmpxchg_1
+#ifdef this_cpu_cmpxchg_1
+#define this_cpu_try_cmpxchg_1(pcp, ovalp, nval) \
+       __cpu_fallback_try_cmpxchg(pcp, ovalp, nval, this_cpu_cmpxchg_1)
+#else
+#define this_cpu_try_cmpxchg_1(pcp, ovalp, nval) \
+       this_cpu_generic_try_cmpxchg(pcp, ovalp, nval)
+#endif
+#endif
+#ifndef this_cpu_try_cmpxchg_2
+#ifdef this_cpu_cmpxchg_2
+#define this_cpu_try_cmpxchg_2(pcp, ovalp, nval) \
+       __cpu_fallback_try_cmpxchg(pcp, ovalp, nval, this_cpu_cmpxchg_2)
+#else
+#define this_cpu_try_cmpxchg_2(pcp, ovalp, nval) \
+       this_cpu_generic_try_cmpxchg(pcp, ovalp, nval)
+#endif
+#endif
+#ifndef this_cpu_try_cmpxchg_4
+#ifdef this_cpu_cmpxchg_4
+#define this_cpu_try_cmpxchg_4(pcp, ovalp, nval) \
+       __cpu_fallback_try_cmpxchg(pcp, ovalp, nval, this_cpu_cmpxchg_4)
+#else
+#define this_cpu_try_cmpxchg_4(pcp, ovalp, nval) \
+       this_cpu_generic_try_cmpxchg(pcp, ovalp, nval)
+#endif
+#endif
+#ifndef this_cpu_try_cmpxchg_8
+#ifdef this_cpu_cmpxchg_8
+#define this_cpu_try_cmpxchg_8(pcp, ovalp, nval) \
+       __cpu_fallback_try_cmpxchg(pcp, ovalp, nval, this_cpu_cmpxchg_8)
+#else
+#define this_cpu_try_cmpxchg_8(pcp, ovalp, nval) \
+       this_cpu_generic_try_cmpxchg(pcp, ovalp, nval)
+#endif
+#endif
+
+#ifndef this_cpu_try_cmpxchg64
+#ifdef this_cpu_cmpxchg64
+#define this_cpu_try_cmpxchg64(pcp, ovalp, nval) \
+       __cpu_fallback_try_cmpxchg(pcp, ovalp, nval, this_cpu_cmpxchg64)
+#else
+#define this_cpu_try_cmpxchg64(pcp, ovalp, nval) \
+       this_cpu_generic_try_cmpxchg(pcp, ovalp, nval)
+#endif
+#endif
+#ifndef this_cpu_try_cmpxchg128
+#ifdef this_cpu_cmpxchg128
+#define this_cpu_try_cmpxchg128(pcp, ovalp, nval) \
+       __cpu_fallback_try_cmpxchg(pcp, ovalp, nval, this_cpu_cmpxchg128)
+#else
+#define this_cpu_try_cmpxchg128(pcp, ovalp, nval) \
+       this_cpu_generic_try_cmpxchg(pcp, ovalp, nval)
+#endif
+#endif
+
 #ifndef this_cpu_cmpxchg_1
 #define this_cpu_cmpxchg_1(pcp, oval, nval) \
        this_cpu_generic_cmpxchg(pcp, oval, nval)
@@ -424,21 +535,13 @@ do {                                                                      \
        this_cpu_generic_cmpxchg(pcp, oval, nval)
 #endif
 
-#ifndef this_cpu_cmpxchg_double_1
-#define this_cpu_cmpxchg_double_1(pcp1, pcp2, oval1, oval2, nval1, nval2) \
-       this_cpu_generic_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
-#endif
-#ifndef this_cpu_cmpxchg_double_2
-#define this_cpu_cmpxchg_double_2(pcp1, pcp2, oval1, oval2, nval1, nval2) \
-       this_cpu_generic_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
-#endif
-#ifndef this_cpu_cmpxchg_double_4
-#define this_cpu_cmpxchg_double_4(pcp1, pcp2, oval1, oval2, nval1, nval2) \
-       this_cpu_generic_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
+#ifndef this_cpu_cmpxchg64
+#define this_cpu_cmpxchg64(pcp, oval, nval) \
+       this_cpu_generic_cmpxchg(pcp, oval, nval)
 #endif
-#ifndef this_cpu_cmpxchg_double_8
-#define this_cpu_cmpxchg_double_8(pcp1, pcp2, oval1, oval2, nval1, nval2) \
-       this_cpu_generic_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
+#ifndef this_cpu_cmpxchg128
+#define this_cpu_cmpxchg128(pcp, oval, nval) \
+       this_cpu_generic_cmpxchg(pcp, oval, nval)
 #endif
 
 #endif /* _ASM_GENERIC_PERCPU_H_ */
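
For illustration, a minimal sketch of the calling convention the new
try_cmpxchg per-cpu ops enable. The counter and function names below are
made up, and it assumes the this_cpu_try_cmpxchg() wrapper added elsewhere
in the same series:

#include <linux/percpu.h>
#include <linux/minmax.h>

static DEFINE_PER_CPU(unsigned long, demo_count);

/* Hypothetical clamped per-CPU add; not taken from any kernel user. */
static void demo_add_clamped(unsigned long delta, unsigned long limit)
{
	unsigned long old = this_cpu_read(demo_count);
	unsigned long new;

	do {
		new = min(old + delta, limit);
		/* On failure the observed value is written back through
		 * &old, so no extra this_cpu_read() is needed per retry. */
	} while (!this_cpu_try_cmpxchg(demo_count, &old, new));
}
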
index d1f57e4..da9e562 100644 (file)
 
 #ifdef CONFIG_UNWINDER_ORC
 #define ORC_UNWIND_TABLE                                               \
+       .orc_header : AT(ADDR(.orc_header) - LOAD_OFFSET) {             \
+               BOUNDED_SECTION_BY(.orc_header, _orc_header)            \
+       }                                                               \
        . = ALIGN(4);                                                   \
        .orc_unwind_ip : AT(ADDR(.orc_unwind_ip) - LOAD_OFFSET) {       \
                BOUNDED_SECTION_BY(.orc_unwind_ip, _orc_unwind_ip)      \
 /*
  * Discard .note.GNU-stack, which is emitted as PROGBITS by the compiler.
  * Otherwise, the type of .notes section would become PROGBITS instead of NOTES.
+ *
+ * Also, discard .note.gnu.property, otherwise it forces the notes section to
+ * be 8-byte aligned which causes alignment mismatches with the kernel's custom
+ * 4-byte aligned notes.
  */
 #define NOTES                                                          \
-       /DISCARD/ : { *(.note.GNU-stack) }                              \
+       /DISCARD/ : {                                                   \
+               *(.note.GNU-stack)                                      \
+               *(.note.gnu.property)                                   \
+       }                                                               \
        .notes : AT(ADDR(.notes) - LOAD_OFFSET) {                       \
                BOUNDED_SECTION_BY(.note.*, _notes)                     \
        } NOTES_HEADERS                                                 \
index 536f897..6cdc873 100644 (file)
@@ -38,8 +38,9 @@ extern void hv_remap_tsc_clocksource(void);
 extern unsigned long hv_get_tsc_pfn(void);
 extern struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
 
-static inline notrace u64
-hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg, u64 *cur_tsc)
+static __always_inline bool
+hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
+                    u64 *cur_tsc, u64 *time)
 {
        u64 scale, offset;
        u32 sequence;
@@ -63,7 +64,7 @@ hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg, u64 *cur_tsc)
        do {
                sequence = READ_ONCE(tsc_pg->tsc_sequence);
                if (!sequence)
-                       return U64_MAX;
+                       return false;
                /*
                 * Make sure we read sequence before we read other values from
                 * TSC page.
@@ -82,15 +83,8 @@ hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg, u64 *cur_tsc)
 
        } while (READ_ONCE(tsc_pg->tsc_sequence) != sequence);
 
-       return mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
-}
-
-static inline notrace u64
-hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
-{
-       u64 cur_tsc;
-
-       return hv_read_tsc_page_tsc(tsc_pg, &cur_tsc);
+       *time = mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
+       return true;
 }
 
 #else /* CONFIG_HYPERV_TIMER */
@@ -104,10 +98,10 @@ static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
        return NULL;
 }
 
-static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
-                                      u64 *cur_tsc)
+static __always_inline bool
+hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg, u64 *cur_tsc, u64 *time)
 {
-       return U64_MAX;
+       return false;
 }
 
 static inline int hv_stimer_cleanup(unsigned int cpu) { return 0; }
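
A minimal caller sketch for the new boolean contract; the function name and
fallback below are illustrative, not taken from the Hyper-V clocksource code:

static u64 demo_read_hv_time(const struct ms_hyperv_tsc_page *tsc_pg)
{
	u64 tsc, now;

	/* The helper now reports validity instead of returning U64_MAX,
	 * so the caller picks its own fallback path explicitly. */
	if (!hv_read_tsc_page_tsc(tsc_pg, &tsc, &now))
		return U64_MAX;		/* e.g. fall back to an MSR read */

	return now;
}
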
index 0b8e6bc..f3b37cb 100644 (file)
 #include <linux/types.h>
 
 typedef struct {
-       u64 a, b;
-} u128;
-
-typedef struct {
        __be64 a, b;
 } be128;
 
@@ -61,20 +57,16 @@ typedef struct {
        __le64 b, a;
 } le128;
 
-static inline void u128_xor(u128 *r, const u128 *p, const u128 *q)
+static inline void be128_xor(be128 *r, const be128 *p, const be128 *q)
 {
        r->a = p->a ^ q->a;
        r->b = p->b ^ q->b;
 }
 
-static inline void be128_xor(be128 *r, const be128 *p, const be128 *q)
-{
-       u128_xor((u128 *)r, (u128 *)p, (u128 *)q);
-}
-
 static inline void le128_xor(le128 *r, const le128 *p, const le128 *q)
 {
-       u128_xor((u128 *)r, (u128 *)p, (u128 *)q);
+       r->a = p->a ^ q->a;
+       r->b = p->b ^ q->b;
 }
 
 #endif /* _CRYPTO_B128OPS_H */
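
For illustration only, a trivial use of the remaining helper (the values are
arbitrary):

static void demo_b128(void)
{
	be128 a = { cpu_to_be64(0x0123456789abcdefULL), cpu_to_be64(1) };
	be128 b = { cpu_to_be64(0x1111111111111111ULL), cpu_to_be64(2) };
	be128 r;

	be128_xor(&r, &a, &b);	/* r.a = a.a ^ b.a; r.b = a.b ^ b.b */
}
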
index 358db4a..f8813c1 100644 (file)
 
 #define DP_DSC_MAX_BITS_PER_PIXEL_HI        0x068   /* eDP 1.4 */
 # define DP_DSC_MAX_BITS_PER_PIXEL_HI_MASK  (0x3 << 0)
-# define DP_DSC_MAX_BITS_PER_PIXEL_HI_SHIFT 8
-# define DP_DSC_MAX_BPP_DELTA_VERSION_MASK  0x06
-# define DP_DSC_MAX_BPP_DELTA_AVAILABILITY  0x08
+# define DP_DSC_MAX_BPP_DELTA_VERSION_MASK  (0x3 << 5) /* eDP 1.5 & DP 2.0 */
+# define DP_DSC_MAX_BPP_DELTA_AVAILABILITY  (1 << 7)   /* eDP 1.5 & DP 2.0 */
 
 #define DP_DSC_DEC_COLOR_FORMAT_CAP         0x069
 # define DP_DSC_RGB                         (1 << 0)
index 533d3ee..86f24a7 100644 (file)
@@ -181,9 +181,8 @@ static inline u16
 drm_edp_dsc_sink_output_bpp(const u8 dsc_dpcd[DP_DSC_RECEIVER_CAP_SIZE])
 {
        return dsc_dpcd[DP_DSC_MAX_BITS_PER_PIXEL_LOW - DP_DSC_SUPPORT] |
-               (dsc_dpcd[DP_DSC_MAX_BITS_PER_PIXEL_HI - DP_DSC_SUPPORT] &
-                DP_DSC_MAX_BITS_PER_PIXEL_HI_MASK <<
-                DP_DSC_MAX_BITS_PER_PIXEL_HI_SHIFT);
+               ((dsc_dpcd[DP_DSC_MAX_BITS_PER_PIXEL_HI - DP_DSC_SUPPORT] &
+                 DP_DSC_MAX_BITS_PER_PIXEL_HI_MASK) << 8);
 }
 
 static inline u32
index 3598839..ad08f83 100644 (file)
@@ -105,6 +105,22 @@ char *drmm_kstrdup(struct drm_device *dev, const char *s, gfp_t gfp);
 
 void drmm_kfree(struct drm_device *dev, void *data);
 
-int drmm_mutex_init(struct drm_device *dev, struct mutex *lock);
+void __drmm_mutex_release(struct drm_device *dev, void *res);
+
+/**
+ * drmm_mutex_init - &drm_device-managed mutex_init()
+ * @dev: DRM device
+ * @lock: lock to be initialized
+ *
+ * Returns:
+ * 0 on success, or a negative errno code otherwise.
+ *
+ * This is a &drm_device-managed version of mutex_init(). The initialized
+ * lock is automatically destroyed on the final drm_dev_put().
+ */
+#define drmm_mutex_init(dev, lock) ({                                       \
+       mutex_init(lock);                                                    \
+       drmm_add_action_or_reset(dev, __drmm_mutex_release, lock);           \
+})                                                                          \
 
 #endif
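
A short usage sketch under a hypothetical driver structure; it only
illustrates the calling convention documented above:

struct demo_device {
	struct drm_device drm;
	struct mutex state_lock;
};

static int demo_init_locks(struct demo_device *demo)
{
	/* The mutex is destroyed automatically on the final drm_dev_put(). */
	return drmm_mutex_init(&demo->drm, &demo->state_lock);
}
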
index 1bf8e87..867b18e 100644 (file)
 #define SM8150_MMCX    9
 #define SM8150_MMCX_AO 10
 
+/* SA8155P is a special case, kept for backwards compatibility */
+#define SA8155P_CX     SM8150_CX
+#define SA8155P_CX_AO  SM8150_CX_AO
+#define SA8155P_EBI    SM8150_EBI
+#define SA8155P_GFX    SM8150_GFX
+#define SA8155P_MSS    SM8150_MSS
+#define SA8155P_MX     SM8150_MX
+#define SA8155P_MX_AO  SM8150_MX_AO
+
 /* SM8250 Power Domain Indexes */
 #define SM8250_CX      0
 #define SM8250_CX_AO   1
index c0d88b3..c7383e9 100644 (file)
@@ -387,4 +387,96 @@ static inline int kunit_destroy_named_resource(struct kunit *test,
  */
 void kunit_remove_resource(struct kunit *test, struct kunit_resource *res);
 
+/* A 'deferred action' function to be used with kunit_add_action. */
+typedef void (kunit_action_t)(void *);
+
+/**
+ * kunit_add_action() - Call a function when the test ends.
+ * @test: Test case to associate the action with.
+ * @action: The function to run on test exit
+ * @ctx: Data passed into @action
+ *
+ * Defer the execution of a function until the test exits, either normally or
+ * due to a failure.  @ctx is passed as additional context. All functions
+ * registered with kunit_add_action() will execute in the opposite order to
+ * that in which they were registered.
+ *
+ * This is useful for cleaning up allocated memory and resources, as these
+ * functions are called even if the test aborts early due to, e.g., a failed
+ * assertion.
+ *
+ * See also: devm_add_action() for the devres equivalent.
+ *
+ * Returns:
+ *   0 on success, an error if the action could not be deferred.
+ */
+int kunit_add_action(struct kunit *test, kunit_action_t *action, void *ctx);
+
+/**
+ * kunit_add_action_or_reset() - Call a function when the test ends.
+ * @test: Test case to associate the action with.
+ * @action: The function to run on test exit
+ * @ctx: Data passed into @action
+ *
+ * Defer the execution of a function until the test exits, either normally or
+ * due to a failure.  @ctx is passed as additional context. All functions
+ * registered with kunit_add_action() will execute in the opposite order to
+ * that in which they were registered.
+ *
+ * This is useful for cleaning up allocated memory and resources, as these
+ * functions are called even if the test aborts early due to, e.g., a failed
+ * assertion.
+ *
+ * If the action cannot be created (e.g., due to the system being out of memory),
+ * then action(ctx) will be called immediately, and an error will be returned.
+ *
+ * See also: devm_add_action_or_reset() for the devres equivalent.
+ *
+ * Returns:
+ *   0 on success, an error if the action could not be deferred.
+ */
+int kunit_add_action_or_reset(struct kunit *test, kunit_action_t *action,
+                             void *ctx);
+
+/**
+ * kunit_remove_action() - Cancel a matching deferred action.
+ * @test: Test case the action is associated with.
+ * @action: The deferred function to cancel.
+ * @ctx: The context passed to the deferred function to cancel.
+ *
+ * Prevent an action deferred via kunit_add_action() from executing when the
+ * test terminates.
+ *
+ * If the function/context pair was deferred multiple times, only the most
+ * recent one will be cancelled.
+ *
+ * See also: devm_remove_action() for the devres equivalent.
+ */
+void kunit_remove_action(struct kunit *test,
+                        kunit_action_t *action,
+                        void *ctx);
+
+/**
+ * kunit_release_action() - Run a matching action call immediately.
+ * @test: Test case the action is associated with.
+ * @action: The deferred function to trigger.
+ * @ctx: The context passed to the deferred function to trigger.
+ *
+ * Execute a function deferred via kunit_add_action() immediately, rather than
+ * when the test ends.
+ *
+ * If the function/context pair was deferred multiple times, it will only be
+ * executed once here. The most recent deferral will no longer execute when
+ * the test ends.
+ *
+ * kunit_release_action(test, func, ctx);
+ * is equivalent to
+ * func(ctx);
+ * kunit_remove_action(test, func, ctx);
+ *
+ * See also: devm_release_action() for the devres equivalent.
+ */
+void kunit_release_action(struct kunit *test,
+                         kunit_action_t *action,
+                         void *ctx);
 #endif /* _KUNIT_RESOURCE_H */
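
A minimal sketch of the intended usage; the test body and buffer below are
made up for illustration:

#include <kunit/test.h>
#include <kunit/resource.h>
#include <linux/slab.h>

static void demo_free_buffer(void *ctx)
{
	kfree(ctx);
}

static void demo_test(struct kunit *test)
{
	char *buf = kmalloc(16, GFP_KERNEL);

	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, buf);
	/* Freed when the test exits, even on failure; if registration itself
	 * fails, the action runs immediately and an error is returned. */
	KUNIT_ASSERT_EQ(test, 0,
			kunit_add_action_or_reset(test, demo_free_buffer, buf));

	/* ... exercise buf ... */
}
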
index 57b309c..23120d5 100644 (file)
@@ -47,6 +47,7 @@ struct kunit;
  * sub-subtest.  See the "Subtests" section in
  * https://node-tap.org/tap-protocol/
  */
+#define KUNIT_INDENT_LEN               4
 #define KUNIT_SUBTEST_INDENT           "    "
 #define KUNIT_SUBSUBTEST_INDENT                "        "
 
@@ -168,6 +169,9 @@ static inline char *kunit_status_to_ok_not_ok(enum kunit_status status)
  * test case, similar to the notion of a *test fixture* or a *test class*
  * in other unit testing frameworks like JUnit or Googletest.
  *
+ * Note that @exit and @suite_exit will run even if @init or @suite_init
+ * fail: make sure they can handle any inconsistent state which may result.
+ *
  * Every &struct kunit_case must be associated with a kunit_suite for KUnit
  * to run it.
  */
@@ -321,8 +325,11 @@ enum kunit_status kunit_suite_has_succeeded(struct kunit_suite *suite);
  * @gfp: flags passed to underlying kmalloc().
  *
  * Just like `kmalloc_array(...)`, except the allocation is managed by the test case
- * and is automatically cleaned up after the test case concludes. See &struct
- * kunit_resource for more information.
+ * and is automatically cleaned up after the test case concludes. See kunit_add_action()
+ * for more information.
+ *
+ * Note that some internal context data is also allocated with GFP_KERNEL,
+ * regardless of the gfp passed in.
  */
 void *kunit_kmalloc_array(struct kunit *test, size_t n, size_t size, gfp_t gfp);
 
@@ -333,6 +340,9 @@ void *kunit_kmalloc_array(struct kunit *test, size_t n, size_t size, gfp_t gfp);
  * @gfp: flags passed to underlying kmalloc().
  *
  * See kmalloc() and kunit_kmalloc_array() for more information.
+ *
+ * Note that some internal context data is also allocated with GFP_KERNEL,
+ * regardless of the gfp passed in.
  */
 static inline void *kunit_kmalloc(struct kunit *test, size_t size, gfp_t gfp)
 {
@@ -472,7 +482,9 @@ void __printf(2, 3) kunit_log_append(char *log, const char *fmt, ...);
  */
 #define KUNIT_SUCCEED(test) do {} while (0)
 
-void kunit_do_failed_assertion(struct kunit *test,
+void __noreturn __kunit_abort(struct kunit *test);
+
+void __kunit_do_failed_assertion(struct kunit *test,
                               const struct kunit_loc *loc,
                               enum kunit_assert_type type,
                               const struct kunit_assert *assert,
@@ -482,13 +494,15 @@ void kunit_do_failed_assertion(struct kunit *test,
 #define _KUNIT_FAILED(test, assert_type, assert_class, assert_format, INITIALIZER, fmt, ...) do { \
        static const struct kunit_loc __loc = KUNIT_CURRENT_LOC;               \
        const struct assert_class __assertion = INITIALIZER;                   \
-       kunit_do_failed_assertion(test,                                        \
-                                 &__loc,                                      \
-                                 assert_type,                                 \
-                                 &__assertion.assert,                         \
-                                 assert_format,                               \
-                                 fmt,                                         \
-                                 ##__VA_ARGS__);                              \
+       __kunit_do_failed_assertion(test,                                      \
+                                   &__loc,                                    \
+                                   assert_type,                               \
+                                   &__assertion.assert,                       \
+                                   assert_format,                             \
+                                   fmt,                                       \
+                                   ##__VA_ARGS__);                            \
+       if (assert_type == KUNIT_ASSERTION)                                    \
+               __kunit_abort(test);                                           \
 } while (0)
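
For context, a short sketch of what this means for test authors (the
arithmetic is just a stand-in):

static void demo_abort_semantics(struct kunit *test)
{
	/* An EXPECT failure is recorded and the test keeps running... */
	KUNIT_EXPECT_EQ(test, 1 + 1, 2);

	/* ...while an ASSERT failure now aborts the test via the
	 * out-of-line __kunit_abort() emitted by _KUNIT_FAILED(). */
	KUNIT_ASSERT_EQ(test, 2 + 2, 4);
}
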
 
 
index 7b71dd7..5ef126a 100644 (file)
@@ -1507,6 +1507,12 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
 }
 #endif
 
+#ifdef CONFIG_ARM64
+void acpi_arm_init(void);
+#else
+static inline void acpi_arm_init(void) { }
+#endif
+
 #ifdef CONFIG_ACPI_PCC
 void acpi_init_pcc(void);
 #else
diff --git a/include/linux/acpi_agdi.h b/include/linux/acpi_agdi.h
deleted file mode 100644 (file)
index f477f0b..0000000
+++ /dev/null
@@ -1,13 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-
-#ifndef __ACPI_AGDI_H__
-#define __ACPI_AGDI_H__
-
-#include <linux/acpi.h>
-
-#ifdef CONFIG_ACPI_AGDI
-void __init acpi_agdi_init(void);
-#else
-static inline void acpi_agdi_init(void) {}
-#endif
-#endif /* __ACPI_AGDI_H__ */
diff --git a/include/linux/acpi_apmt.h b/include/linux/acpi_apmt.h
deleted file mode 100644 (file)
index 40bd634..0000000
+++ /dev/null
@@ -1,19 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0
- *
- * ARM CoreSight PMU driver.
- * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
- *
- */
-
-#ifndef __ACPI_APMT_H__
-#define __ACPI_APMT_H__
-
-#include <linux/acpi.h>
-
-#ifdef CONFIG_ACPI_APMT
-void acpi_apmt_init(void);
-#else
-static inline void acpi_apmt_init(void) { }
-#endif /* CONFIG_ACPI_APMT */
-
-#endif /* __ACPI_APMT_H__ */
index b43be09..ee7cb6a 100644 (file)
@@ -26,13 +26,13 @@ int iort_register_domain_token(int trans_id, phys_addr_t base,
                               struct fwnode_handle *fw_node);
 void iort_deregister_domain_token(int trans_id);
 struct fwnode_handle *iort_find_domain_token(int trans_id);
+int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
+
 #ifdef CONFIG_ACPI_IORT
-void acpi_iort_init(void);
 u32 iort_msi_map_id(struct device *dev, u32 id);
 struct irq_domain *iort_get_device_domain(struct device *dev, u32 id,
                                          enum irq_domain_bus_token bus_token);
 void acpi_configure_pmsi_domain(struct device *dev);
-int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
 void iort_get_rmr_sids(struct fwnode_handle *iommu_fwnode,
                       struct list_head *head);
 void iort_put_rmr_sids(struct fwnode_handle *iommu_fwnode,
@@ -43,7 +43,6 @@ int iort_iommu_configure_id(struct device *dev, const u32 *id_in);
 void iort_iommu_get_resv_regions(struct device *dev, struct list_head *head);
 phys_addr_t acpi_iort_dma_get_max_cpu_address(void);
 #else
-static inline void acpi_iort_init(void) { }
 static inline u32 iort_msi_map_id(struct device *dev, u32 id)
 { return id; }
 static inline struct irq_domain *iort_get_device_domain(
index c10ebf8..446394f 100644 (file)
@@ -94,7 +94,8 @@ struct amd_cpudata {
  * enum amd_pstate_mode - driver working mode of amd pstate
  */
 enum amd_pstate_mode {
-       AMD_PSTATE_DISABLE = 0,
+       AMD_PSTATE_UNDEFINED = 0,
+       AMD_PSTATE_DISABLE,
        AMD_PSTATE_PASSIVE,
        AMD_PSTATE_ACTIVE,
        AMD_PSTATE_GUIDED,
@@ -102,6 +103,7 @@ enum amd_pstate_mode {
 };
 
 static const char * const amd_pstate_mode_string[] = {
+       [AMD_PSTATE_UNDEFINED]   = "undefined",
        [AMD_PSTATE_DISABLE]     = "disable",
        [AMD_PSTATE_PASSIVE]     = "passive",
        [AMD_PSTATE_ACTIVE]      = "active",
index c87aeec..583fe3b 100644 (file)
@@ -96,6 +96,7 @@
 
 /* FFA Bus/Device/Driver related */
 struct ffa_device {
+       u32 id;
        int vm_id;
        bool mode_32bit;
        uuid_t uuid;
index a6e4437..18f5744 100644 (file)
 
 #include <linux/compiler.h>
 
-#ifndef arch_xchg_relaxed
-#define arch_xchg_acquire arch_xchg
-#define arch_xchg_release arch_xchg
-#define arch_xchg_relaxed arch_xchg
-#else /* arch_xchg_relaxed */
-
-#ifndef arch_xchg_acquire
-#define arch_xchg_acquire(...) \
-       __atomic_op_acquire(arch_xchg, __VA_ARGS__)
+#if defined(arch_xchg)
+#define raw_xchg arch_xchg
+#elif defined(arch_xchg_relaxed)
+#define raw_xchg(...) \
+       __atomic_op_fence(arch_xchg, __VA_ARGS__)
+#else
+extern void raw_xchg_not_implemented(void);
+#define raw_xchg(...) raw_xchg_not_implemented()
 #endif
 
-#ifndef arch_xchg_release
-#define arch_xchg_release(...) \
-       __atomic_op_release(arch_xchg, __VA_ARGS__)
+#if defined(arch_xchg_acquire)
+#define raw_xchg_acquire arch_xchg_acquire
+#elif defined(arch_xchg_relaxed)
+#define raw_xchg_acquire(...) \
+       __atomic_op_acquire(arch_xchg, __VA_ARGS__)
+#elif defined(arch_xchg)
+#define raw_xchg_acquire arch_xchg
+#else
+extern void raw_xchg_acquire_not_implemented(void);
+#define raw_xchg_acquire(...) raw_xchg_acquire_not_implemented()
 #endif
 
-#ifndef arch_xchg
-#define arch_xchg(...) \
-       __atomic_op_fence(arch_xchg, __VA_ARGS__)
+#if defined(arch_xchg_release)
+#define raw_xchg_release arch_xchg_release
+#elif defined(arch_xchg_relaxed)
+#define raw_xchg_release(...) \
+       __atomic_op_release(arch_xchg, __VA_ARGS__)
+#elif defined(arch_xchg)
+#define raw_xchg_release arch_xchg
+#else
+extern void raw_xchg_release_not_implemented(void);
+#define raw_xchg_release(...) raw_xchg_release_not_implemented()
+#endif
+
+#if defined(arch_xchg_relaxed)
+#define raw_xchg_relaxed arch_xchg_relaxed
+#elif defined(arch_xchg)
+#define raw_xchg_relaxed arch_xchg
+#else
+extern void raw_xchg_relaxed_not_implemented(void);
+#define raw_xchg_relaxed(...) raw_xchg_relaxed_not_implemented()
+#endif
+
+#if defined(arch_cmpxchg)
+#define raw_cmpxchg arch_cmpxchg
+#elif defined(arch_cmpxchg_relaxed)
+#define raw_cmpxchg(...) \
+       __atomic_op_fence(arch_cmpxchg, __VA_ARGS__)
+#else
+extern void raw_cmpxchg_not_implemented(void);
+#define raw_cmpxchg(...) raw_cmpxchg_not_implemented()
 #endif
 
-#endif /* arch_xchg_relaxed */
-
-#ifndef arch_cmpxchg_relaxed
-#define arch_cmpxchg_acquire arch_cmpxchg
-#define arch_cmpxchg_release arch_cmpxchg
-#define arch_cmpxchg_relaxed arch_cmpxchg
-#else /* arch_cmpxchg_relaxed */
-
-#ifndef arch_cmpxchg_acquire
-#define arch_cmpxchg_acquire(...) \
+#if defined(arch_cmpxchg_acquire)
+#define raw_cmpxchg_acquire arch_cmpxchg_acquire
+#elif defined(arch_cmpxchg_relaxed)
+#define raw_cmpxchg_acquire(...) \
        __atomic_op_acquire(arch_cmpxchg, __VA_ARGS__)
+#elif defined(arch_cmpxchg)
+#define raw_cmpxchg_acquire arch_cmpxchg
+#else
+extern void raw_cmpxchg_acquire_not_implemented(void);
+#define raw_cmpxchg_acquire(...) raw_cmpxchg_acquire_not_implemented()
 #endif
 
-#ifndef arch_cmpxchg_release
-#define arch_cmpxchg_release(...) \
+#if defined(arch_cmpxchg_release)
+#define raw_cmpxchg_release arch_cmpxchg_release
+#elif defined(arch_cmpxchg_relaxed)
+#define raw_cmpxchg_release(...) \
        __atomic_op_release(arch_cmpxchg, __VA_ARGS__)
+#elif defined(arch_cmpxchg)
+#define raw_cmpxchg_release arch_cmpxchg
+#else
+extern void raw_cmpxchg_release_not_implemented(void);
+#define raw_cmpxchg_release(...) raw_cmpxchg_release_not_implemented()
+#endif
+
+#if defined(arch_cmpxchg_relaxed)
+#define raw_cmpxchg_relaxed arch_cmpxchg_relaxed
+#elif defined(arch_cmpxchg)
+#define raw_cmpxchg_relaxed arch_cmpxchg
+#else
+extern void raw_cmpxchg_relaxed_not_implemented(void);
+#define raw_cmpxchg_relaxed(...) raw_cmpxchg_relaxed_not_implemented()
+#endif
+
+#if defined(arch_cmpxchg64)
+#define raw_cmpxchg64 arch_cmpxchg64
+#elif defined(arch_cmpxchg64_relaxed)
+#define raw_cmpxchg64(...) \
+       __atomic_op_fence(arch_cmpxchg64, __VA_ARGS__)
+#else
+extern void raw_cmpxchg64_not_implemented(void);
+#define raw_cmpxchg64(...) raw_cmpxchg64_not_implemented()
 #endif
 
-#ifndef arch_cmpxchg
-#define arch_cmpxchg(...) \
-       __atomic_op_fence(arch_cmpxchg, __VA_ARGS__)
-#endif
-
-#endif /* arch_cmpxchg_relaxed */
-
-#ifndef arch_cmpxchg64_relaxed
-#define arch_cmpxchg64_acquire arch_cmpxchg64
-#define arch_cmpxchg64_release arch_cmpxchg64
-#define arch_cmpxchg64_relaxed arch_cmpxchg64
-#else /* arch_cmpxchg64_relaxed */
-
-#ifndef arch_cmpxchg64_acquire
-#define arch_cmpxchg64_acquire(...) \
+#if defined(arch_cmpxchg64_acquire)
+#define raw_cmpxchg64_acquire arch_cmpxchg64_acquire
+#elif defined(arch_cmpxchg64_relaxed)
+#define raw_cmpxchg64_acquire(...) \
        __atomic_op_acquire(arch_cmpxchg64, __VA_ARGS__)
+#elif defined(arch_cmpxchg64)
+#define raw_cmpxchg64_acquire arch_cmpxchg64
+#else
+extern void raw_cmpxchg64_acquire_not_implemented(void);
+#define raw_cmpxchg64_acquire(...) raw_cmpxchg64_acquire_not_implemented()
 #endif
 
-#ifndef arch_cmpxchg64_release
-#define arch_cmpxchg64_release(...) \
+#if defined(arch_cmpxchg64_release)
+#define raw_cmpxchg64_release arch_cmpxchg64_release
+#elif defined(arch_cmpxchg64_relaxed)
+#define raw_cmpxchg64_release(...) \
        __atomic_op_release(arch_cmpxchg64, __VA_ARGS__)
+#elif defined(arch_cmpxchg64)
+#define raw_cmpxchg64_release arch_cmpxchg64
+#else
+extern void raw_cmpxchg64_release_not_implemented(void);
+#define raw_cmpxchg64_release(...) raw_cmpxchg64_release_not_implemented()
+#endif
+
+#if defined(arch_cmpxchg64_relaxed)
+#define raw_cmpxchg64_relaxed arch_cmpxchg64_relaxed
+#elif defined(arch_cmpxchg64)
+#define raw_cmpxchg64_relaxed arch_cmpxchg64
+#else
+extern void raw_cmpxchg64_relaxed_not_implemented(void);
+#define raw_cmpxchg64_relaxed(...) raw_cmpxchg64_relaxed_not_implemented()
+#endif
+
+#if defined(arch_cmpxchg128)
+#define raw_cmpxchg128 arch_cmpxchg128
+#elif defined(arch_cmpxchg128_relaxed)
+#define raw_cmpxchg128(...) \
+       __atomic_op_fence(arch_cmpxchg128, __VA_ARGS__)
+#else
+extern void raw_cmpxchg128_not_implemented(void);
+#define raw_cmpxchg128(...) raw_cmpxchg128_not_implemented()
+#endif
+
+#if defined(arch_cmpxchg128_acquire)
+#define raw_cmpxchg128_acquire arch_cmpxchg128_acquire
+#elif defined(arch_cmpxchg128_relaxed)
+#define raw_cmpxchg128_acquire(...) \
+       __atomic_op_acquire(arch_cmpxchg128, __VA_ARGS__)
+#elif defined(arch_cmpxchg128)
+#define raw_cmpxchg128_acquire arch_cmpxchg128
+#else
+extern void raw_cmpxchg128_acquire_not_implemented(void);
+#define raw_cmpxchg128_acquire(...) raw_cmpxchg128_acquire_not_implemented()
+#endif
+
+#if defined(arch_cmpxchg128_release)
+#define raw_cmpxchg128_release arch_cmpxchg128_release
+#elif defined(arch_cmpxchg128_relaxed)
+#define raw_cmpxchg128_release(...) \
+       __atomic_op_release(arch_cmpxchg128, __VA_ARGS__)
+#elif defined(arch_cmpxchg128)
+#define raw_cmpxchg128_release arch_cmpxchg128
+#else
+extern void raw_cmpxchg128_release_not_implemented(void);
+#define raw_cmpxchg128_release(...) raw_cmpxchg128_release_not_implemented()
+#endif
+
+#if defined(arch_cmpxchg128_relaxed)
+#define raw_cmpxchg128_relaxed arch_cmpxchg128_relaxed
+#elif defined(arch_cmpxchg128)
+#define raw_cmpxchg128_relaxed arch_cmpxchg128
+#else
+extern void raw_cmpxchg128_relaxed_not_implemented(void);
+#define raw_cmpxchg128_relaxed(...) raw_cmpxchg128_relaxed_not_implemented()
+#endif
+
+#if defined(arch_try_cmpxchg)
+#define raw_try_cmpxchg arch_try_cmpxchg
+#elif defined(arch_try_cmpxchg_relaxed)
+#define raw_try_cmpxchg(...) \
+       __atomic_op_fence(arch_try_cmpxchg, __VA_ARGS__)
+#else
+#define raw_try_cmpxchg(_ptr, _oldp, _new) \
+({ \
+       typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+       ___r = raw_cmpxchg((_ptr), ___o, (_new)); \
+       if (unlikely(___r != ___o)) \
+               *___op = ___r; \
+       likely(___r == ___o); \
+})
 #endif
 
-#ifndef arch_cmpxchg64
-#define arch_cmpxchg64(...) \
-       __atomic_op_fence(arch_cmpxchg64, __VA_ARGS__)
+#if defined(arch_try_cmpxchg_acquire)
+#define raw_try_cmpxchg_acquire arch_try_cmpxchg_acquire
+#elif defined(arch_try_cmpxchg_relaxed)
+#define raw_try_cmpxchg_acquire(...) \
+       __atomic_op_acquire(arch_try_cmpxchg, __VA_ARGS__)
+#elif defined(arch_try_cmpxchg)
+#define raw_try_cmpxchg_acquire arch_try_cmpxchg
+#else
+#define raw_try_cmpxchg_acquire(_ptr, _oldp, _new) \
+({ \
+       typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+       ___r = raw_cmpxchg_acquire((_ptr), ___o, (_new)); \
+       if (unlikely(___r != ___o)) \
+               *___op = ___r; \
+       likely(___r == ___o); \
+})
 #endif
 
-#endif /* arch_cmpxchg64_relaxed */
-
-#ifndef arch_try_cmpxchg_relaxed
-#ifdef arch_try_cmpxchg
-#define arch_try_cmpxchg_acquire arch_try_cmpxchg
-#define arch_try_cmpxchg_release arch_try_cmpxchg
-#define arch_try_cmpxchg_relaxed arch_try_cmpxchg
-#endif /* arch_try_cmpxchg */
-
-#ifndef arch_try_cmpxchg
-#define arch_try_cmpxchg(_ptr, _oldp, _new) \
+#if defined(arch_try_cmpxchg_release)
+#define raw_try_cmpxchg_release arch_try_cmpxchg_release
+#elif defined(arch_try_cmpxchg_relaxed)
+#define raw_try_cmpxchg_release(...) \
+       __atomic_op_release(arch_try_cmpxchg, __VA_ARGS__)
+#elif defined(arch_try_cmpxchg)
+#define raw_try_cmpxchg_release arch_try_cmpxchg
+#else
+#define raw_try_cmpxchg_release(_ptr, _oldp, _new) \
 ({ \
        typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
-       ___r = arch_cmpxchg((_ptr), ___o, (_new)); \
+       ___r = raw_cmpxchg_release((_ptr), ___o, (_new)); \
        if (unlikely(___r != ___o)) \
                *___op = ___r; \
        likely(___r == ___o); \
 })
-#endif /* arch_try_cmpxchg */
+#endif
 
-#ifndef arch_try_cmpxchg_acquire
-#define arch_try_cmpxchg_acquire(_ptr, _oldp, _new) \
+#if defined(arch_try_cmpxchg_relaxed)
+#define raw_try_cmpxchg_relaxed arch_try_cmpxchg_relaxed
+#elif defined(arch_try_cmpxchg)
+#define raw_try_cmpxchg_relaxed arch_try_cmpxchg
+#else
+#define raw_try_cmpxchg_relaxed(_ptr, _oldp, _new) \
 ({ \
        typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
-       ___r = arch_cmpxchg_acquire((_ptr), ___o, (_new)); \
+       ___r = raw_cmpxchg_relaxed((_ptr), ___o, (_new)); \
        if (unlikely(___r != ___o)) \
                *___op = ___r; \
        likely(___r == ___o); \
 })
-#endif /* arch_try_cmpxchg_acquire */
+#endif
 
-#ifndef arch_try_cmpxchg_release
-#define arch_try_cmpxchg_release(_ptr, _oldp, _new) \
+#if defined(arch_try_cmpxchg64)
+#define raw_try_cmpxchg64 arch_try_cmpxchg64
+#elif defined(arch_try_cmpxchg64_relaxed)
+#define raw_try_cmpxchg64(...) \
+       __atomic_op_fence(arch_try_cmpxchg64, __VA_ARGS__)
+#else
+#define raw_try_cmpxchg64(_ptr, _oldp, _new) \
 ({ \
        typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
-       ___r = arch_cmpxchg_release((_ptr), ___o, (_new)); \
+       ___r = raw_cmpxchg64((_ptr), ___o, (_new)); \
        if (unlikely(___r != ___o)) \
                *___op = ___r; \
        likely(___r == ___o); \
 })
-#endif /* arch_try_cmpxchg_release */
+#endif
 
-#ifndef arch_try_cmpxchg_relaxed
-#define arch_try_cmpxchg_relaxed(_ptr, _oldp, _new) \
+#if defined(arch_try_cmpxchg64_acquire)
+#define raw_try_cmpxchg64_acquire arch_try_cmpxchg64_acquire
+#elif defined(arch_try_cmpxchg64_relaxed)
+#define raw_try_cmpxchg64_acquire(...) \
+       __atomic_op_acquire(arch_try_cmpxchg64, __VA_ARGS__)
+#elif defined(arch_try_cmpxchg64)
+#define raw_try_cmpxchg64_acquire arch_try_cmpxchg64
+#else
+#define raw_try_cmpxchg64_acquire(_ptr, _oldp, _new) \
 ({ \
        typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
-       ___r = arch_cmpxchg_relaxed((_ptr), ___o, (_new)); \
+       ___r = raw_cmpxchg64_acquire((_ptr), ___o, (_new)); \
        if (unlikely(___r != ___o)) \
                *___op = ___r; \
        likely(___r == ___o); \
 })
-#endif /* arch_try_cmpxchg_relaxed */
-
-#else /* arch_try_cmpxchg_relaxed */
-
-#ifndef arch_try_cmpxchg_acquire
-#define arch_try_cmpxchg_acquire(...) \
-       __atomic_op_acquire(arch_try_cmpxchg, __VA_ARGS__)
 #endif
 
-#ifndef arch_try_cmpxchg_release
-#define arch_try_cmpxchg_release(...) \
-       __atomic_op_release(arch_try_cmpxchg, __VA_ARGS__)
+#if defined(arch_try_cmpxchg64_release)
+#define raw_try_cmpxchg64_release arch_try_cmpxchg64_release
+#elif defined(arch_try_cmpxchg64_relaxed)
+#define raw_try_cmpxchg64_release(...) \
+       __atomic_op_release(arch_try_cmpxchg64, __VA_ARGS__)
+#elif defined(arch_try_cmpxchg64)
+#define raw_try_cmpxchg64_release arch_try_cmpxchg64
+#else
+#define raw_try_cmpxchg64_release(_ptr, _oldp, _new) \
+({ \
+       typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+       ___r = raw_cmpxchg64_release((_ptr), ___o, (_new)); \
+       if (unlikely(___r != ___o)) \
+               *___op = ___r; \
+       likely(___r == ___o); \
+})
 #endif
 
-#ifndef arch_try_cmpxchg
-#define arch_try_cmpxchg(...) \
-       __atomic_op_fence(arch_try_cmpxchg, __VA_ARGS__)
+#if defined(arch_try_cmpxchg64_relaxed)
+#define raw_try_cmpxchg64_relaxed arch_try_cmpxchg64_relaxed
+#elif defined(arch_try_cmpxchg64)
+#define raw_try_cmpxchg64_relaxed arch_try_cmpxchg64
+#else
+#define raw_try_cmpxchg64_relaxed(_ptr, _oldp, _new) \
+({ \
+       typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+       ___r = raw_cmpxchg64_relaxed((_ptr), ___o, (_new)); \
+       if (unlikely(___r != ___o)) \
+               *___op = ___r; \
+       likely(___r == ___o); \
+})
 #endif
 
-#endif /* arch_try_cmpxchg_relaxed */
-
-#ifndef arch_try_cmpxchg64_relaxed
-#ifdef arch_try_cmpxchg64
-#define arch_try_cmpxchg64_acquire arch_try_cmpxchg64
-#define arch_try_cmpxchg64_release arch_try_cmpxchg64
-#define arch_try_cmpxchg64_relaxed arch_try_cmpxchg64
-#endif /* arch_try_cmpxchg64 */
-
-#ifndef arch_try_cmpxchg64
-#define arch_try_cmpxchg64(_ptr, _oldp, _new) \
+#if defined(arch_try_cmpxchg128)
+#define raw_try_cmpxchg128 arch_try_cmpxchg128
+#elif defined(arch_try_cmpxchg128_relaxed)
+#define raw_try_cmpxchg128(...) \
+       __atomic_op_fence(arch_try_cmpxchg128, __VA_ARGS__)
+#else
+#define raw_try_cmpxchg128(_ptr, _oldp, _new) \
 ({ \
        typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
-       ___r = arch_cmpxchg64((_ptr), ___o, (_new)); \
+       ___r = raw_cmpxchg128((_ptr), ___o, (_new)); \
        if (unlikely(___r != ___o)) \
                *___op = ___r; \
        likely(___r == ___o); \
 })
-#endif /* arch_try_cmpxchg64 */
+#endif
 
-#ifndef arch_try_cmpxchg64_acquire
-#define arch_try_cmpxchg64_acquire(_ptr, _oldp, _new) \
+#if defined(arch_try_cmpxchg128_acquire)
+#define raw_try_cmpxchg128_acquire arch_try_cmpxchg128_acquire
+#elif defined(arch_try_cmpxchg128_relaxed)
+#define raw_try_cmpxchg128_acquire(...) \
+       __atomic_op_acquire(arch_try_cmpxchg128, __VA_ARGS__)
+#elif defined(arch_try_cmpxchg128)
+#define raw_try_cmpxchg128_acquire arch_try_cmpxchg128
+#else
+#define raw_try_cmpxchg128_acquire(_ptr, _oldp, _new) \
 ({ \
        typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
-       ___r = arch_cmpxchg64_acquire((_ptr), ___o, (_new)); \
+       ___r = raw_cmpxchg128_acquire((_ptr), ___o, (_new)); \
        if (unlikely(___r != ___o)) \
                *___op = ___r; \
        likely(___r == ___o); \
 })
-#endif /* arch_try_cmpxchg64_acquire */
+#endif
 
-#ifndef arch_try_cmpxchg64_release
-#define arch_try_cmpxchg64_release(_ptr, _oldp, _new) \
+#if defined(arch_try_cmpxchg128_release)
+#define raw_try_cmpxchg128_release arch_try_cmpxchg128_release
+#elif defined(arch_try_cmpxchg128_relaxed)
+#define raw_try_cmpxchg128_release(...) \
+       __atomic_op_release(arch_try_cmpxchg128, __VA_ARGS__)
+#elif defined(arch_try_cmpxchg128)
+#define raw_try_cmpxchg128_release arch_try_cmpxchg128
+#else
+#define raw_try_cmpxchg128_release(_ptr, _oldp, _new) \
 ({ \
        typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
-       ___r = arch_cmpxchg64_release((_ptr), ___o, (_new)); \
+       ___r = raw_cmpxchg128_release((_ptr), ___o, (_new)); \
        if (unlikely(___r != ___o)) \
                *___op = ___r; \
        likely(___r == ___o); \
 })
-#endif /* arch_try_cmpxchg64_release */
+#endif
 
-#ifndef arch_try_cmpxchg64_relaxed
-#define arch_try_cmpxchg64_relaxed(_ptr, _oldp, _new) \
+#if defined(arch_try_cmpxchg128_relaxed)
+#define raw_try_cmpxchg128_relaxed arch_try_cmpxchg128_relaxed
+#elif defined(arch_try_cmpxchg128)
+#define raw_try_cmpxchg128_relaxed arch_try_cmpxchg128
+#else
+#define raw_try_cmpxchg128_relaxed(_ptr, _oldp, _new) \
 ({ \
        typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
-       ___r = arch_cmpxchg64_relaxed((_ptr), ___o, (_new)); \
+       ___r = raw_cmpxchg128_relaxed((_ptr), ___o, (_new)); \
        if (unlikely(___r != ___o)) \
                *___op = ___r; \
        likely(___r == ___o); \
 })
-#endif /* arch_try_cmpxchg64_relaxed */
-
-#else /* arch_try_cmpxchg64_relaxed */
-
-#ifndef arch_try_cmpxchg64_acquire
-#define arch_try_cmpxchg64_acquire(...) \
-       __atomic_op_acquire(arch_try_cmpxchg64, __VA_ARGS__)
 #endif
 
-#ifndef arch_try_cmpxchg64_release
-#define arch_try_cmpxchg64_release(...) \
-       __atomic_op_release(arch_try_cmpxchg64, __VA_ARGS__)
-#endif
+#define raw_cmpxchg_local arch_cmpxchg_local
 
-#ifndef arch_try_cmpxchg64
-#define arch_try_cmpxchg64(...) \
-       __atomic_op_fence(arch_try_cmpxchg64, __VA_ARGS__)
+#ifdef arch_try_cmpxchg_local
+#define raw_try_cmpxchg_local arch_try_cmpxchg_local
+#else
+#define raw_try_cmpxchg_local(_ptr, _oldp, _new) \
+({ \
+       typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+       ___r = raw_cmpxchg_local((_ptr), ___o, (_new)); \
+       if (unlikely(___r != ___o)) \
+               *___op = ___r; \
+       likely(___r == ___o); \
+})
 #endif
 
-#endif /* arch_try_cmpxchg64_relaxed */
+#define raw_cmpxchg64_local arch_cmpxchg64_local
 
-#ifndef arch_try_cmpxchg_local
-#define arch_try_cmpxchg_local(_ptr, _oldp, _new) \
+#ifdef arch_try_cmpxchg64_local
+#define raw_try_cmpxchg64_local arch_try_cmpxchg64_local
+#else
+#define raw_try_cmpxchg64_local(_ptr, _oldp, _new) \
 ({ \
        typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
-       ___r = arch_cmpxchg_local((_ptr), ___o, (_new)); \
+       ___r = raw_cmpxchg64_local((_ptr), ___o, (_new)); \
        if (unlikely(___r != ___o)) \
                *___op = ___r; \
        likely(___r == ___o); \
 })
-#endif /* arch_try_cmpxchg_local */
+#endif
 
-#ifndef arch_try_cmpxchg64_local
-#define arch_try_cmpxchg64_local(_ptr, _oldp, _new) \
+#define raw_cmpxchg128_local arch_cmpxchg128_local
+
+#ifdef arch_try_cmpxchg128_local
+#define raw_try_cmpxchg128_local arch_try_cmpxchg128_local
+#else
+#define raw_try_cmpxchg128_local(_ptr, _oldp, _new) \
 ({ \
        typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
-       ___r = arch_cmpxchg64_local((_ptr), ___o, (_new)); \
+       ___r = raw_cmpxchg128_local((_ptr), ___o, (_new)); \
        if (unlikely(___r != ___o)) \
                *___op = ___r; \
        likely(___r == ___o); \
 })
-#endif /* arch_try_cmpxchg64_local */
+#endif
+
+#define raw_sync_cmpxchg arch_sync_cmpxchg
+
+/**
+ * raw_atomic_read() - atomic load with relaxed ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically loads the value of @v with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_read() elsewhere.
+ *
+ * Return: The value loaded from @v.
+ */
+static __always_inline int
+raw_atomic_read(const atomic_t *v)
+{
+       return arch_atomic_read(v);
+}
 
-#ifndef arch_atomic_read_acquire
+/**
+ * raw_atomic_read_acquire() - atomic load with acquire ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically loads the value of @v with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_read_acquire() elsewhere.
+ *
+ * Return: The value loaded from @v.
+ */
 static __always_inline int
-arch_atomic_read_acquire(const atomic_t *v)
+raw_atomic_read_acquire(const atomic_t *v)
 {
+#if defined(arch_atomic_read_acquire)
+       return arch_atomic_read_acquire(v);
+#elif defined(arch_atomic_read)
+       return arch_atomic_read(v);
+#else
        int ret;
 
        if (__native_word(atomic_t)) {
                ret = smp_load_acquire(&(v)->counter);
        } else {
-               ret = arch_atomic_read(v);
+               ret = raw_atomic_read(v);
                __atomic_acquire_fence();
        }
 
        return ret;
-}
-#define arch_atomic_read_acquire arch_atomic_read_acquire
 #endif
+}
+
+/**
+ * raw_atomic_set() - atomic set with relaxed ordering
+ * @v: pointer to atomic_t
+ * @i: int value to assign
+ *
+ * Atomically sets @v to @i with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_set() elsewhere.
+ *
+ * Return: Nothing.
+ */
+static __always_inline void
+raw_atomic_set(atomic_t *v, int i)
+{
+       arch_atomic_set(v, i);
+}
 
-#ifndef arch_atomic_set_release
+/**
+ * raw_atomic_set_release() - atomic set with release ordering
+ * @v: pointer to atomic_t
+ * @i: int value to assign
+ *
+ * Atomically sets @v to @i with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_set_release() elsewhere.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
-arch_atomic_set_release(atomic_t *v, int i)
+raw_atomic_set_release(atomic_t *v, int i)
 {
+#if defined(arch_atomic_set_release)
+       arch_atomic_set_release(v, i);
+#elif defined(arch_atomic_set)
+       arch_atomic_set(v, i);
+#else
        if (__native_word(atomic_t)) {
                smp_store_release(&(v)->counter, i);
        } else {
                __atomic_release_fence();
-               arch_atomic_set(v, i);
+               raw_atomic_set(v, i);
        }
-}
-#define arch_atomic_set_release arch_atomic_set_release
 #endif
+}
+
+/**
+ * raw_atomic_add() - atomic add with relaxed ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_add() elsewhere.
+ *
+ * Return: Nothing.
+ */
+static __always_inline void
+raw_atomic_add(int i, atomic_t *v)
+{
+       arch_atomic_add(i, v);
+}
 
-#ifndef arch_atomic_add_return_relaxed
-#define arch_atomic_add_return_acquire arch_atomic_add_return
-#define arch_atomic_add_return_release arch_atomic_add_return
-#define arch_atomic_add_return_relaxed arch_atomic_add_return
-#else /* arch_atomic_add_return_relaxed */
+/**
+ * raw_atomic_add_return() - atomic add with full ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_add_return() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
+static __always_inline int
+raw_atomic_add_return(int i, atomic_t *v)
+{
+#if defined(arch_atomic_add_return)
+       return arch_atomic_add_return(i, v);
+#elif defined(arch_atomic_add_return_relaxed)
+       int ret;
+       __atomic_pre_full_fence();
+       ret = arch_atomic_add_return_relaxed(i, v);
+       __atomic_post_full_fence();
+       return ret;
+#else
+#error "Unable to define raw_atomic_add_return"
+#endif
+}
 
-#ifndef arch_atomic_add_return_acquire
+/**
+ * raw_atomic_add_return_acquire() - atomic add with acquire ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_add_return_acquire() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
-arch_atomic_add_return_acquire(int i, atomic_t *v)
+raw_atomic_add_return_acquire(int i, atomic_t *v)
 {
+#if defined(arch_atomic_add_return_acquire)
+       return arch_atomic_add_return_acquire(i, v);
+#elif defined(arch_atomic_add_return_relaxed)
        int ret = arch_atomic_add_return_relaxed(i, v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic_add_return_acquire arch_atomic_add_return_acquire
+#elif defined(arch_atomic_add_return)
+       return arch_atomic_add_return(i, v);
+#else
+#error "Unable to define raw_atomic_add_return_acquire"
 #endif
+}
 
-#ifndef arch_atomic_add_return_release
+/**
+ * raw_atomic_add_return_release() - atomic add with release ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_add_return_release() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
-arch_atomic_add_return_release(int i, atomic_t *v)
+raw_atomic_add_return_release(int i, atomic_t *v)
 {
+#if defined(arch_atomic_add_return_release)
+       return arch_atomic_add_return_release(i, v);
+#elif defined(arch_atomic_add_return_relaxed)
        __atomic_release_fence();
        return arch_atomic_add_return_relaxed(i, v);
+#elif defined(arch_atomic_add_return)
+       return arch_atomic_add_return(i, v);
+#else
+#error "Unable to define raw_atomic_add_return_release"
+#endif
 }
-#define arch_atomic_add_return_release arch_atomic_add_return_release
+
+/**
+ * raw_atomic_add_return_relaxed() - atomic add with relaxed ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_add_return_relaxed() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
+static __always_inline int
+raw_atomic_add_return_relaxed(int i, atomic_t *v)
+{
+#if defined(arch_atomic_add_return_relaxed)
+       return arch_atomic_add_return_relaxed(i, v);
+#elif defined(arch_atomic_add_return)
+       return arch_atomic_add_return(i, v);
+#else
+#error "Unable to define raw_atomic_add_return_relaxed"
 #endif
+}
 
-#ifndef arch_atomic_add_return
+/**
+ * raw_atomic_fetch_add() - atomic add with full ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_add() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_add_return(int i, atomic_t *v)
+raw_atomic_fetch_add(int i, atomic_t *v)
 {
+#if defined(arch_atomic_fetch_add)
+       return arch_atomic_fetch_add(i, v);
+#elif defined(arch_atomic_fetch_add_relaxed)
        int ret;
        __atomic_pre_full_fence();
-       ret = arch_atomic_add_return_relaxed(i, v);
+       ret = arch_atomic_fetch_add_relaxed(i, v);
        __atomic_post_full_fence();
        return ret;
-}
-#define arch_atomic_add_return arch_atomic_add_return
+#else
+#error "Unable to define raw_atomic_fetch_add"
 #endif
+}
 
-#endif /* arch_atomic_add_return_relaxed */
-
-#ifndef arch_atomic_fetch_add_relaxed
-#define arch_atomic_fetch_add_acquire arch_atomic_fetch_add
-#define arch_atomic_fetch_add_release arch_atomic_fetch_add
-#define arch_atomic_fetch_add_relaxed arch_atomic_fetch_add
-#else /* arch_atomic_fetch_add_relaxed */
-
-#ifndef arch_atomic_fetch_add_acquire
+/**
+ * raw_atomic_fetch_add_acquire() - atomic add with acquire ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_add_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_add_acquire(int i, atomic_t *v)
+raw_atomic_fetch_add_acquire(int i, atomic_t *v)
 {
+#if defined(arch_atomic_fetch_add_acquire)
+       return arch_atomic_fetch_add_acquire(i, v);
+#elif defined(arch_atomic_fetch_add_relaxed)
        int ret = arch_atomic_fetch_add_relaxed(i, v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic_fetch_add_acquire arch_atomic_fetch_add_acquire
+#elif defined(arch_atomic_fetch_add)
+       return arch_atomic_fetch_add(i, v);
+#else
+#error "Unable to define raw_atomic_fetch_add_acquire"
 #endif
+}
 
-#ifndef arch_atomic_fetch_add_release
+/**
+ * raw_atomic_fetch_add_release() - atomic add with release ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_add_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_add_release(int i, atomic_t *v)
+raw_atomic_fetch_add_release(int i, atomic_t *v)
 {
+#if defined(arch_atomic_fetch_add_release)
+       return arch_atomic_fetch_add_release(i, v);
+#elif defined(arch_atomic_fetch_add_relaxed)
        __atomic_release_fence();
        return arch_atomic_fetch_add_relaxed(i, v);
+#elif defined(arch_atomic_fetch_add)
+       return arch_atomic_fetch_add(i, v);
+#else
+#error "Unable to define raw_atomic_fetch_add_release"
+#endif
 }
-#define arch_atomic_fetch_add_release arch_atomic_fetch_add_release
+
+/**
+ * raw_atomic_fetch_add_relaxed() - atomic add with relaxed ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_add_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
+static __always_inline int
+raw_atomic_fetch_add_relaxed(int i, atomic_t *v)
+{
+#if defined(arch_atomic_fetch_add_relaxed)
+       return arch_atomic_fetch_add_relaxed(i, v);
+#elif defined(arch_atomic_fetch_add)
+       return arch_atomic_fetch_add(i, v);
+#else
+#error "Unable to define raw_atomic_fetch_add_relaxed"
 #endif
+}
+
+/**
+ * raw_atomic_sub() - atomic subtract with relaxed ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_sub() elsewhere.
+ *
+ * Return: Nothing.
+ */
+static __always_inline void
+raw_atomic_sub(int i, atomic_t *v)
+{
+       arch_atomic_sub(i, v);
+}
 
-#ifndef arch_atomic_fetch_add
+/**
+ * raw_atomic_sub_return() - atomic subtract with full ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_sub_return() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_add(int i, atomic_t *v)
+raw_atomic_sub_return(int i, atomic_t *v)
 {
+#if defined(arch_atomic_sub_return)
+       return arch_atomic_sub_return(i, v);
+#elif defined(arch_atomic_sub_return_relaxed)
        int ret;
        __atomic_pre_full_fence();
-       ret = arch_atomic_fetch_add_relaxed(i, v);
+       ret = arch_atomic_sub_return_relaxed(i, v);
        __atomic_post_full_fence();
        return ret;
-}
-#define arch_atomic_fetch_add arch_atomic_fetch_add
+#else
+#error "Unable to define raw_atomic_sub_return"
 #endif
+}
 
-#endif /* arch_atomic_fetch_add_relaxed */
-
-#ifndef arch_atomic_sub_return_relaxed
-#define arch_atomic_sub_return_acquire arch_atomic_sub_return
-#define arch_atomic_sub_return_release arch_atomic_sub_return
-#define arch_atomic_sub_return_relaxed arch_atomic_sub_return
-#else /* arch_atomic_sub_return_relaxed */
-
-#ifndef arch_atomic_sub_return_acquire
+/**
+ * raw_atomic_sub_return_acquire() - atomic subtract with acquire ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_sub_return_acquire() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
-arch_atomic_sub_return_acquire(int i, atomic_t *v)
+raw_atomic_sub_return_acquire(int i, atomic_t *v)
 {
+#if defined(arch_atomic_sub_return_acquire)
+       return arch_atomic_sub_return_acquire(i, v);
+#elif defined(arch_atomic_sub_return_relaxed)
        int ret = arch_atomic_sub_return_relaxed(i, v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic_sub_return_acquire arch_atomic_sub_return_acquire
+#elif defined(arch_atomic_sub_return)
+       return arch_atomic_sub_return(i, v);
+#else
+#error "Unable to define raw_atomic_sub_return_acquire"
 #endif
+}
 
-#ifndef arch_atomic_sub_return_release
+/**
+ * raw_atomic_sub_return_release() - atomic subtract with release ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_sub_return_release() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
-arch_atomic_sub_return_release(int i, atomic_t *v)
+raw_atomic_sub_return_release(int i, atomic_t *v)
 {
+#if defined(arch_atomic_sub_return_release)
+       return arch_atomic_sub_return_release(i, v);
+#elif defined(arch_atomic_sub_return_relaxed)
        __atomic_release_fence();
        return arch_atomic_sub_return_relaxed(i, v);
+#elif defined(arch_atomic_sub_return)
+       return arch_atomic_sub_return(i, v);
+#else
+#error "Unable to define raw_atomic_sub_return_release"
+#endif
 }
-#define arch_atomic_sub_return_release arch_atomic_sub_return_release
+
+/**
+ * raw_atomic_sub_return_relaxed() - atomic subtract with relaxed ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_sub_return_relaxed() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
+static __always_inline int
+raw_atomic_sub_return_relaxed(int i, atomic_t *v)
+{
+#if defined(arch_atomic_sub_return_relaxed)
+       return arch_atomic_sub_return_relaxed(i, v);
+#elif defined(arch_atomic_sub_return)
+       return arch_atomic_sub_return(i, v);
+#else
+#error "Unable to define raw_atomic_sub_return_relaxed"
 #endif
+}
 
-#ifndef arch_atomic_sub_return
+/**
+ * raw_atomic_fetch_sub() - atomic subtract with full ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_sub() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_sub_return(int i, atomic_t *v)
+raw_atomic_fetch_sub(int i, atomic_t *v)
 {
+#if defined(arch_atomic_fetch_sub)
+       return arch_atomic_fetch_sub(i, v);
+#elif defined(arch_atomic_fetch_sub_relaxed)
        int ret;
        __atomic_pre_full_fence();
-       ret = arch_atomic_sub_return_relaxed(i, v);
+       ret = arch_atomic_fetch_sub_relaxed(i, v);
        __atomic_post_full_fence();
        return ret;
-}
-#define arch_atomic_sub_return arch_atomic_sub_return
+#else
+#error "Unable to define raw_atomic_fetch_sub"
 #endif
+}
 
-#endif /* arch_atomic_sub_return_relaxed */
-
-#ifndef arch_atomic_fetch_sub_relaxed
-#define arch_atomic_fetch_sub_acquire arch_atomic_fetch_sub
-#define arch_atomic_fetch_sub_release arch_atomic_fetch_sub
-#define arch_atomic_fetch_sub_relaxed arch_atomic_fetch_sub
-#else /* arch_atomic_fetch_sub_relaxed */
-
-#ifndef arch_atomic_fetch_sub_acquire
+/**
+ * raw_atomic_fetch_sub_acquire() - atomic subtract with acquire ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_sub_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_sub_acquire(int i, atomic_t *v)
+raw_atomic_fetch_sub_acquire(int i, atomic_t *v)
 {
+#if defined(arch_atomic_fetch_sub_acquire)
+       return arch_atomic_fetch_sub_acquire(i, v);
+#elif defined(arch_atomic_fetch_sub_relaxed)
        int ret = arch_atomic_fetch_sub_relaxed(i, v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic_fetch_sub_acquire arch_atomic_fetch_sub_acquire
+#elif defined(arch_atomic_fetch_sub)
+       return arch_atomic_fetch_sub(i, v);
+#else
+#error "Unable to define raw_atomic_fetch_sub_acquire"
 #endif
+}
 
-#ifndef arch_atomic_fetch_sub_release
+/**
+ * raw_atomic_fetch_sub_release() - atomic subtract with release ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_sub_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_sub_release(int i, atomic_t *v)
+raw_atomic_fetch_sub_release(int i, atomic_t *v)
 {
+#if defined(arch_atomic_fetch_sub_release)
+       return arch_atomic_fetch_sub_release(i, v);
+#elif defined(arch_atomic_fetch_sub_relaxed)
        __atomic_release_fence();
        return arch_atomic_fetch_sub_relaxed(i, v);
-}
-#define arch_atomic_fetch_sub_release arch_atomic_fetch_sub_release
+#elif defined(arch_atomic_fetch_sub)
+       return arch_atomic_fetch_sub(i, v);
+#else
+#error "Unable to define raw_atomic_fetch_sub_release"
 #endif
+}
 
-#ifndef arch_atomic_fetch_sub
+/**
+ * raw_atomic_fetch_sub_relaxed() - atomic subtract with relaxed ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_sub_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_sub(int i, atomic_t *v)
+raw_atomic_fetch_sub_relaxed(int i, atomic_t *v)
 {
-       int ret;
-       __atomic_pre_full_fence();
-       ret = arch_atomic_fetch_sub_relaxed(i, v);
-       __atomic_post_full_fence();
-       return ret;
-}
-#define arch_atomic_fetch_sub arch_atomic_fetch_sub
+#if defined(arch_atomic_fetch_sub_relaxed)
+       return arch_atomic_fetch_sub_relaxed(i, v);
+#elif defined(arch_atomic_fetch_sub)
+       return arch_atomic_fetch_sub(i, v);
+#else
+#error "Unable to define raw_atomic_fetch_sub_relaxed"
 #endif
-
-#endif /* arch_atomic_fetch_sub_relaxed */
-
-#ifndef arch_atomic_inc
-static __always_inline void
-arch_atomic_inc(atomic_t *v)
-{
-       arch_atomic_add(1, v);
 }
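For illustration only, a made-up budget counter could consume credits with the fully ordered subtract-and-return form; the helper name and the simplified semantics are assumptions, not code from this series.

	/* Hypothetical example, not from this patch. */
	static bool budget_consume(atomic_t *budget, int cost)
	{
		/*
		 * sub_return is fully ordered; a non-negative result means the
		 * caller still had credit (simplified: the counter is allowed
		 * to dip below zero on a failed attempt).
		 */
		return raw_atomic_sub_return(cost, budget) >= 0;
	}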
-#define arch_atomic_inc arch_atomic_inc
-#endif
-
-#ifndef arch_atomic_inc_return_relaxed
-#ifdef arch_atomic_inc_return
-#define arch_atomic_inc_return_acquire arch_atomic_inc_return
-#define arch_atomic_inc_return_release arch_atomic_inc_return
-#define arch_atomic_inc_return_relaxed arch_atomic_inc_return
-#endif /* arch_atomic_inc_return */
 
-#ifndef arch_atomic_inc_return
-static __always_inline int
-arch_atomic_inc_return(atomic_t *v)
+/**
+ * raw_atomic_inc() - atomic increment with relaxed ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_inc() elsewhere.
+ *
+ * Return: Nothing.
+ */
+static __always_inline void
+raw_atomic_inc(atomic_t *v)
 {
-       return arch_atomic_add_return(1, v);
-}
-#define arch_atomic_inc_return arch_atomic_inc_return
+#if defined(arch_atomic_inc)
+       arch_atomic_inc(v);
+#else
+       raw_atomic_add(1, v);
 #endif
-
-#ifndef arch_atomic_inc_return_acquire
-static __always_inline int
-arch_atomic_inc_return_acquire(atomic_t *v)
-{
-       return arch_atomic_add_return_acquire(1, v);
 }
-#define arch_atomic_inc_return_acquire arch_atomic_inc_return_acquire
-#endif
 
-#ifndef arch_atomic_inc_return_release
+/**
+ * raw_atomic_inc_return() - atomic increment with full ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_inc_return() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
-arch_atomic_inc_return_release(atomic_t *v)
+raw_atomic_inc_return(atomic_t *v)
 {
-       return arch_atomic_add_return_release(1, v);
-}
-#define arch_atomic_inc_return_release arch_atomic_inc_return_release
+#if defined(arch_atomic_inc_return)
+       return arch_atomic_inc_return(v);
+#elif defined(arch_atomic_inc_return_relaxed)
+       int ret;
+       __atomic_pre_full_fence();
+       ret = arch_atomic_inc_return_relaxed(v);
+       __atomic_post_full_fence();
+       return ret;
+#else
+       return raw_atomic_add_return(1, v);
 #endif
-
-#ifndef arch_atomic_inc_return_relaxed
-static __always_inline int
-arch_atomic_inc_return_relaxed(atomic_t *v)
-{
-       return arch_atomic_add_return_relaxed(1, v);
 }
-#define arch_atomic_inc_return_relaxed arch_atomic_inc_return_relaxed
-#endif
 
-#else /* arch_atomic_inc_return_relaxed */
-
-#ifndef arch_atomic_inc_return_acquire
+/**
+ * raw_atomic_inc_return_acquire() - atomic increment with acquire ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_inc_return_acquire() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
-arch_atomic_inc_return_acquire(atomic_t *v)
+raw_atomic_inc_return_acquire(atomic_t *v)
 {
+#if defined(arch_atomic_inc_return_acquire)
+       return arch_atomic_inc_return_acquire(v);
+#elif defined(arch_atomic_inc_return_relaxed)
        int ret = arch_atomic_inc_return_relaxed(v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic_inc_return_acquire arch_atomic_inc_return_acquire
+#elif defined(arch_atomic_inc_return)
+       return arch_atomic_inc_return(v);
+#else
+       return raw_atomic_add_return_acquire(1, v);
 #endif
+}
 
-#ifndef arch_atomic_inc_return_release
+/**
+ * raw_atomic_inc_return_release() - atomic increment with release ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_inc_return_release() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
-arch_atomic_inc_return_release(atomic_t *v)
+raw_atomic_inc_return_release(atomic_t *v)
 {
+#if defined(arch_atomic_inc_return_release)
+       return arch_atomic_inc_return_release(v);
+#elif defined(arch_atomic_inc_return_relaxed)
        __atomic_release_fence();
        return arch_atomic_inc_return_relaxed(v);
-}
-#define arch_atomic_inc_return_release arch_atomic_inc_return_release
+#elif defined(arch_atomic_inc_return)
+       return arch_atomic_inc_return(v);
+#else
+       return raw_atomic_add_return_release(1, v);
 #endif
-
-#ifndef arch_atomic_inc_return
-static __always_inline int
-arch_atomic_inc_return(atomic_t *v)
-{
-       int ret;
-       __atomic_pre_full_fence();
-       ret = arch_atomic_inc_return_relaxed(v);
-       __atomic_post_full_fence();
-       return ret;
 }
-#define arch_atomic_inc_return arch_atomic_inc_return
-#endif
-
-#endif /* arch_atomic_inc_return_relaxed */
 
-#ifndef arch_atomic_fetch_inc_relaxed
-#ifdef arch_atomic_fetch_inc
-#define arch_atomic_fetch_inc_acquire arch_atomic_fetch_inc
-#define arch_atomic_fetch_inc_release arch_atomic_fetch_inc
-#define arch_atomic_fetch_inc_relaxed arch_atomic_fetch_inc
-#endif /* arch_atomic_fetch_inc */
-
-#ifndef arch_atomic_fetch_inc
+/**
+ * raw_atomic_inc_return_relaxed() - atomic increment with relaxed ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_inc_return_relaxed() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_inc(atomic_t *v)
+raw_atomic_inc_return_relaxed(atomic_t *v)
 {
-       return arch_atomic_fetch_add(1, v);
-}
-#define arch_atomic_fetch_inc arch_atomic_fetch_inc
+#if defined(arch_atomic_inc_return_relaxed)
+       return arch_atomic_inc_return_relaxed(v);
+#elif defined(arch_atomic_inc_return)
+       return arch_atomic_inc_return(v);
+#else
+       return raw_atomic_add_return_relaxed(1, v);
 #endif
-
-#ifndef arch_atomic_fetch_inc_acquire
-static __always_inline int
-arch_atomic_fetch_inc_acquire(atomic_t *v)
-{
-       return arch_atomic_fetch_add_acquire(1, v);
 }
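A minimal sketch (assumed, not from the patch) of how inc_return can hand out unique sequence numbers, since every caller observes a distinct updated value.

	/* Hypothetical example, not from this patch. */
	static int seq_next(atomic_t *seq)
	{
		/* Each caller receives a distinct, monotonically increasing id. */
		return raw_atomic_inc_return(seq);
	}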
-#define arch_atomic_fetch_inc_acquire arch_atomic_fetch_inc_acquire
-#endif
 
-#ifndef arch_atomic_fetch_inc_release
+/**
+ * raw_atomic_fetch_inc() - atomic increment with full ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_inc() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_inc_release(atomic_t *v)
+raw_atomic_fetch_inc(atomic_t *v)
 {
-       return arch_atomic_fetch_add_release(1, v);
-}
-#define arch_atomic_fetch_inc_release arch_atomic_fetch_inc_release
+#if defined(arch_atomic_fetch_inc)
+       return arch_atomic_fetch_inc(v);
+#elif defined(arch_atomic_fetch_inc_relaxed)
+       int ret;
+       __atomic_pre_full_fence();
+       ret = arch_atomic_fetch_inc_relaxed(v);
+       __atomic_post_full_fence();
+       return ret;
+#else
+       return raw_atomic_fetch_add(1, v);
 #endif
-
-#ifndef arch_atomic_fetch_inc_relaxed
-static __always_inline int
-arch_atomic_fetch_inc_relaxed(atomic_t *v)
-{
-       return arch_atomic_fetch_add_relaxed(1, v);
 }
-#define arch_atomic_fetch_inc_relaxed arch_atomic_fetch_inc_relaxed
-#endif
 
-#else /* arch_atomic_fetch_inc_relaxed */
-
-#ifndef arch_atomic_fetch_inc_acquire
+/**
+ * raw_atomic_fetch_inc_acquire() - atomic increment with acquire ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_inc_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_inc_acquire(atomic_t *v)
+raw_atomic_fetch_inc_acquire(atomic_t *v)
 {
+#if defined(arch_atomic_fetch_inc_acquire)
+       return arch_atomic_fetch_inc_acquire(v);
+#elif defined(arch_atomic_fetch_inc_relaxed)
        int ret = arch_atomic_fetch_inc_relaxed(v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic_fetch_inc_acquire arch_atomic_fetch_inc_acquire
+#elif defined(arch_atomic_fetch_inc)
+       return arch_atomic_fetch_inc(v);
+#else
+       return raw_atomic_fetch_add_acquire(1, v);
 #endif
+}
 
-#ifndef arch_atomic_fetch_inc_release
+/**
+ * raw_atomic_fetch_inc_release() - atomic increment with release ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_inc_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_inc_release(atomic_t *v)
+raw_atomic_fetch_inc_release(atomic_t *v)
 {
+#if defined(arch_atomic_fetch_inc_release)
+       return arch_atomic_fetch_inc_release(v);
+#elif defined(arch_atomic_fetch_inc_relaxed)
        __atomic_release_fence();
        return arch_atomic_fetch_inc_relaxed(v);
-}
-#define arch_atomic_fetch_inc_release arch_atomic_fetch_inc_release
+#elif defined(arch_atomic_fetch_inc)
+       return arch_atomic_fetch_inc(v);
+#else
+       return raw_atomic_fetch_add_release(1, v);
 #endif
+}
 
-#ifndef arch_atomic_fetch_inc
+/**
+ * raw_atomic_fetch_inc_relaxed() - atomic increment with relaxed ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_inc_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_inc(atomic_t *v)
+raw_atomic_fetch_inc_relaxed(atomic_t *v)
 {
-       int ret;
-       __atomic_pre_full_fence();
-       ret = arch_atomic_fetch_inc_relaxed(v);
-       __atomic_post_full_fence();
-       return ret;
-}
-#define arch_atomic_fetch_inc arch_atomic_fetch_inc
+#if defined(arch_atomic_fetch_inc_relaxed)
+       return arch_atomic_fetch_inc_relaxed(v);
+#elif defined(arch_atomic_fetch_inc)
+       return arch_atomic_fetch_inc(v);
+#else
+       return raw_atomic_fetch_add_relaxed(1, v);
 #endif
-
-#endif /* arch_atomic_fetch_inc_relaxed */
-
-#ifndef arch_atomic_dec
-static __always_inline void
-arch_atomic_dec(atomic_t *v)
-{
-       arch_atomic_sub(1, v);
 }
-#define arch_atomic_dec arch_atomic_dec
-#endif
-
-#ifndef arch_atomic_dec_return_relaxed
-#ifdef arch_atomic_dec_return
-#define arch_atomic_dec_return_acquire arch_atomic_dec_return
-#define arch_atomic_dec_return_release arch_atomic_dec_return
-#define arch_atomic_dec_return_relaxed arch_atomic_dec_return
-#endif /* arch_atomic_dec_return */
 
-#ifndef arch_atomic_dec_return
-static __always_inline int
-arch_atomic_dec_return(atomic_t *v)
+/**
+ * raw_atomic_dec() - atomic decrement with relaxed ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_dec() elsewhere.
+ *
+ * Return: Nothing.
+ */
+static __always_inline void
+raw_atomic_dec(atomic_t *v)
 {
-       return arch_atomic_sub_return(1, v);
-}
-#define arch_atomic_dec_return arch_atomic_dec_return
+#if defined(arch_atomic_dec)
+       arch_atomic_dec(v);
+#else
+       raw_atomic_sub(1, v);
 #endif
-
-#ifndef arch_atomic_dec_return_acquire
-static __always_inline int
-arch_atomic_dec_return_acquire(atomic_t *v)
-{
-       return arch_atomic_sub_return_acquire(1, v);
 }
-#define arch_atomic_dec_return_acquire arch_atomic_dec_return_acquire
-#endif
 
-#ifndef arch_atomic_dec_return_release
+/**
+ * raw_atomic_dec_return() - atomic decrement with full ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_dec_return() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
-arch_atomic_dec_return_release(atomic_t *v)
+raw_atomic_dec_return(atomic_t *v)
 {
-       return arch_atomic_sub_return_release(1, v);
-}
-#define arch_atomic_dec_return_release arch_atomic_dec_return_release
+#if defined(arch_atomic_dec_return)
+       return arch_atomic_dec_return(v);
+#elif defined(arch_atomic_dec_return_relaxed)
+       int ret;
+       __atomic_pre_full_fence();
+       ret = arch_atomic_dec_return_relaxed(v);
+       __atomic_post_full_fence();
+       return ret;
+#else
+       return raw_atomic_sub_return(1, v);
 #endif
-
-#ifndef arch_atomic_dec_return_relaxed
-static __always_inline int
-arch_atomic_dec_return_relaxed(atomic_t *v)
-{
-       return arch_atomic_sub_return_relaxed(1, v);
 }
-#define arch_atomic_dec_return_relaxed arch_atomic_dec_return_relaxed
-#endif
 
-#else /* arch_atomic_dec_return_relaxed */
-
-#ifndef arch_atomic_dec_return_acquire
+/**
+ * raw_atomic_dec_return_acquire() - atomic decrement with acquire ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_dec_return_acquire() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
-arch_atomic_dec_return_acquire(atomic_t *v)
+raw_atomic_dec_return_acquire(atomic_t *v)
 {
+#if defined(arch_atomic_dec_return_acquire)
+       return arch_atomic_dec_return_acquire(v);
+#elif defined(arch_atomic_dec_return_relaxed)
        int ret = arch_atomic_dec_return_relaxed(v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic_dec_return_acquire arch_atomic_dec_return_acquire
+#elif defined(arch_atomic_dec_return)
+       return arch_atomic_dec_return(v);
+#else
+       return raw_atomic_sub_return_acquire(1, v);
 #endif
+}
 
-#ifndef arch_atomic_dec_return_release
+/**
+ * raw_atomic_dec_return_release() - atomic decrement with release ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_dec_return_release() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
-arch_atomic_dec_return_release(atomic_t *v)
+raw_atomic_dec_return_release(atomic_t *v)
 {
+#if defined(arch_atomic_dec_return_release)
+       return arch_atomic_dec_return_release(v);
+#elif defined(arch_atomic_dec_return_relaxed)
        __atomic_release_fence();
        return arch_atomic_dec_return_relaxed(v);
+#elif defined(arch_atomic_dec_return)
+       return arch_atomic_dec_return(v);
+#else
+       return raw_atomic_sub_return_release(1, v);
+#endif
 }
-#define arch_atomic_dec_return_release arch_atomic_dec_return_release
+
+/**
+ * raw_atomic_dec_return_relaxed() - atomic decrement with relaxed ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_dec_return_relaxed() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
+static __always_inline int
+raw_atomic_dec_return_relaxed(atomic_t *v)
+{
+#if defined(arch_atomic_dec_return_relaxed)
+       return arch_atomic_dec_return_relaxed(v);
+#elif defined(arch_atomic_dec_return)
+       return arch_atomic_dec_return(v);
+#else
+       return raw_atomic_sub_return_relaxed(1, v);
 #endif
+}
 
-#ifndef arch_atomic_dec_return
+/**
+ * raw_atomic_fetch_dec() - atomic decrement with full ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_dec() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_dec_return(atomic_t *v)
+raw_atomic_fetch_dec(atomic_t *v)
 {
+#if defined(arch_atomic_fetch_dec)
+       return arch_atomic_fetch_dec(v);
+#elif defined(arch_atomic_fetch_dec_relaxed)
        int ret;
        __atomic_pre_full_fence();
-       ret = arch_atomic_dec_return_relaxed(v);
+       ret = arch_atomic_fetch_dec_relaxed(v);
        __atomic_post_full_fence();
        return ret;
-}
-#define arch_atomic_dec_return arch_atomic_dec_return
+#else
+       return raw_atomic_fetch_sub(1, v);
 #endif
-
-#endif /* arch_atomic_dec_return_relaxed */
-
-#ifndef arch_atomic_fetch_dec_relaxed
-#ifdef arch_atomic_fetch_dec
-#define arch_atomic_fetch_dec_acquire arch_atomic_fetch_dec
-#define arch_atomic_fetch_dec_release arch_atomic_fetch_dec
-#define arch_atomic_fetch_dec_relaxed arch_atomic_fetch_dec
-#endif /* arch_atomic_fetch_dec */
-
-#ifndef arch_atomic_fetch_dec
-static __always_inline int
-arch_atomic_fetch_dec(atomic_t *v)
-{
-       return arch_atomic_fetch_sub(1, v);
 }
-#define arch_atomic_fetch_dec arch_atomic_fetch_dec
-#endif
 
-#ifndef arch_atomic_fetch_dec_acquire
+/**
+ * raw_atomic_fetch_dec_acquire() - atomic decrement with acquire ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_dec_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_dec_acquire(atomic_t *v)
+raw_atomic_fetch_dec_acquire(atomic_t *v)
 {
-       return arch_atomic_fetch_sub_acquire(1, v);
-}
-#define arch_atomic_fetch_dec_acquire arch_atomic_fetch_dec_acquire
+#if defined(arch_atomic_fetch_dec_acquire)
+       return arch_atomic_fetch_dec_acquire(v);
+#elif defined(arch_atomic_fetch_dec_relaxed)
+       int ret = arch_atomic_fetch_dec_relaxed(v);
+       __atomic_acquire_fence();
+       return ret;
+#elif defined(arch_atomic_fetch_dec)
+       return arch_atomic_fetch_dec(v);
+#else
+       return raw_atomic_fetch_sub_acquire(1, v);
 #endif
-
-#ifndef arch_atomic_fetch_dec_release
-static __always_inline int
-arch_atomic_fetch_dec_release(atomic_t *v)
-{
-       return arch_atomic_fetch_sub_release(1, v);
 }
-#define arch_atomic_fetch_dec_release arch_atomic_fetch_dec_release
-#endif
 
-#ifndef arch_atomic_fetch_dec_relaxed
+/**
+ * raw_atomic_fetch_dec_release() - atomic decrement with release ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_dec_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_dec_relaxed(atomic_t *v)
+raw_atomic_fetch_dec_release(atomic_t *v)
 {
-       return arch_atomic_fetch_sub_relaxed(1, v);
-}
-#define arch_atomic_fetch_dec_relaxed arch_atomic_fetch_dec_relaxed
+#if defined(arch_atomic_fetch_dec_release)
+       return arch_atomic_fetch_dec_release(v);
+#elif defined(arch_atomic_fetch_dec_relaxed)
+       __atomic_release_fence();
+       return arch_atomic_fetch_dec_relaxed(v);
+#elif defined(arch_atomic_fetch_dec)
+       return arch_atomic_fetch_dec(v);
+#else
+       return raw_atomic_fetch_sub_release(1, v);
 #endif
+}
 
-#else /* arch_atomic_fetch_dec_relaxed */
-
-#ifndef arch_atomic_fetch_dec_acquire
+/**
+ * raw_atomic_fetch_dec_relaxed() - atomic decrement with relaxed ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_dec_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_dec_acquire(atomic_t *v)
+raw_atomic_fetch_dec_relaxed(atomic_t *v)
 {
-       int ret = arch_atomic_fetch_dec_relaxed(v);
-       __atomic_acquire_fence();
-       return ret;
-}
-#define arch_atomic_fetch_dec_acquire arch_atomic_fetch_dec_acquire
+#if defined(arch_atomic_fetch_dec_relaxed)
+       return arch_atomic_fetch_dec_relaxed(v);
+#elif defined(arch_atomic_fetch_dec)
+       return arch_atomic_fetch_dec(v);
+#else
+       return raw_atomic_fetch_sub_relaxed(1, v);
 #endif
+}
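Because fetch_dec returns the value prior to the decrement, a caller can detect the final decrement without a separate test. A hypothetical countdown helper (name assumed, not from this patch):

	/* Hypothetical example, not from this patch. */
	static bool countdown_finished(atomic_t *remaining)
	{
		/* The caller that sees the old value 1 took the counter to zero. */
		return raw_atomic_fetch_dec(remaining) == 1;
	}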
 
-#ifndef arch_atomic_fetch_dec_release
-static __always_inline int
-arch_atomic_fetch_dec_release(atomic_t *v)
+/**
+ * raw_atomic_and() - atomic bitwise AND with relaxed ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_and() elsewhere.
+ *
+ * Return: Nothing.
+ */
+static __always_inline void
+raw_atomic_and(int i, atomic_t *v)
 {
-       __atomic_release_fence();
-       return arch_atomic_fetch_dec_relaxed(v);
+       arch_atomic_and(i, v);
 }
-#define arch_atomic_fetch_dec_release arch_atomic_fetch_dec_release
-#endif
 
-#ifndef arch_atomic_fetch_dec
+/**
+ * raw_atomic_fetch_and() - atomic bitwise AND with full ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_and() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_dec(atomic_t *v)
+raw_atomic_fetch_and(int i, atomic_t *v)
 {
+#if defined(arch_atomic_fetch_and)
+       return arch_atomic_fetch_and(i, v);
+#elif defined(arch_atomic_fetch_and_relaxed)
        int ret;
        __atomic_pre_full_fence();
-       ret = arch_atomic_fetch_dec_relaxed(v);
+       ret = arch_atomic_fetch_and_relaxed(i, v);
        __atomic_post_full_fence();
        return ret;
-}
-#define arch_atomic_fetch_dec arch_atomic_fetch_dec
+#else
+#error "Unable to define raw_atomic_fetch_and"
 #endif
+}
 
-#endif /* arch_atomic_fetch_dec_relaxed */
-
-#ifndef arch_atomic_fetch_and_relaxed
-#define arch_atomic_fetch_and_acquire arch_atomic_fetch_and
-#define arch_atomic_fetch_and_release arch_atomic_fetch_and
-#define arch_atomic_fetch_and_relaxed arch_atomic_fetch_and
-#else /* arch_atomic_fetch_and_relaxed */
-
-#ifndef arch_atomic_fetch_and_acquire
+/**
+ * raw_atomic_fetch_and_acquire() - atomic bitwise AND with acquire ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_and_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_and_acquire(int i, atomic_t *v)
+raw_atomic_fetch_and_acquire(int i, atomic_t *v)
 {
+#if defined(arch_atomic_fetch_and_acquire)
+       return arch_atomic_fetch_and_acquire(i, v);
+#elif defined(arch_atomic_fetch_and_relaxed)
        int ret = arch_atomic_fetch_and_relaxed(i, v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic_fetch_and_acquire arch_atomic_fetch_and_acquire
+#elif defined(arch_atomic_fetch_and)
+       return arch_atomic_fetch_and(i, v);
+#else
+#error "Unable to define raw_atomic_fetch_and_acquire"
 #endif
+}
 
-#ifndef arch_atomic_fetch_and_release
+/**
+ * raw_atomic_fetch_and_release() - atomic bitwise AND with release ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_and_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_and_release(int i, atomic_t *v)
+raw_atomic_fetch_and_release(int i, atomic_t *v)
 {
+#if defined(arch_atomic_fetch_and_release)
+       return arch_atomic_fetch_and_release(i, v);
+#elif defined(arch_atomic_fetch_and_relaxed)
        __atomic_release_fence();
        return arch_atomic_fetch_and_relaxed(i, v);
-}
-#define arch_atomic_fetch_and_release arch_atomic_fetch_and_release
+#elif defined(arch_atomic_fetch_and)
+       return arch_atomic_fetch_and(i, v);
+#else
+#error "Unable to define raw_atomic_fetch_and_release"
 #endif
+}
 
-#ifndef arch_atomic_fetch_and
+/**
+ * raw_atomic_fetch_and_relaxed() - atomic bitwise AND with relaxed ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_and_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_and(int i, atomic_t *v)
+raw_atomic_fetch_and_relaxed(int i, atomic_t *v)
 {
-       int ret;
-       __atomic_pre_full_fence();
-       ret = arch_atomic_fetch_and_relaxed(i, v);
-       __atomic_post_full_fence();
-       return ret;
-}
-#define arch_atomic_fetch_and arch_atomic_fetch_and
+#if defined(arch_atomic_fetch_and_relaxed)
+       return arch_atomic_fetch_and_relaxed(i, v);
+#elif defined(arch_atomic_fetch_and)
+       return arch_atomic_fetch_and(i, v);
+#else
+#error "Unable to define raw_atomic_fetch_and_relaxed"
 #endif
+}
 
-#endif /* arch_atomic_fetch_and_relaxed */
-
-#ifndef arch_atomic_andnot
+/**
+ * raw_atomic_andnot() - atomic bitwise AND NOT with relaxed ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & ~@i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_andnot() elsewhere.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
-arch_atomic_andnot(int i, atomic_t *v)
+raw_atomic_andnot(int i, atomic_t *v)
 {
-       arch_atomic_and(~i, v);
-}
-#define arch_atomic_andnot arch_atomic_andnot
+#if defined(arch_atomic_andnot)
+       arch_atomic_andnot(i, v);
+#else
+       raw_atomic_and(~i, v);
 #endif
-
-#ifndef arch_atomic_fetch_andnot_relaxed
-#ifdef arch_atomic_fetch_andnot
-#define arch_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot
-#define arch_atomic_fetch_andnot_release arch_atomic_fetch_andnot
-#define arch_atomic_fetch_andnot_relaxed arch_atomic_fetch_andnot
-#endif /* arch_atomic_fetch_andnot */
-
-#ifndef arch_atomic_fetch_andnot
-static __always_inline int
-arch_atomic_fetch_andnot(int i, atomic_t *v)
-{
-       return arch_atomic_fetch_and(~i, v);
 }
-#define arch_atomic_fetch_andnot arch_atomic_fetch_andnot
-#endif
 
-#ifndef arch_atomic_fetch_andnot_acquire
-static __always_inline int
-arch_atomic_fetch_andnot_acquire(int i, atomic_t *v)
-{
-       return arch_atomic_fetch_and_acquire(~i, v);
-}
-#define arch_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire
-#endif
-
-#ifndef arch_atomic_fetch_andnot_release
+/**
+ * raw_atomic_fetch_andnot() - atomic bitwise AND NOT with full ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & ~@i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_andnot() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_andnot_release(int i, atomic_t *v)
+raw_atomic_fetch_andnot(int i, atomic_t *v)
 {
-       return arch_atomic_fetch_and_release(~i, v);
-}
-#define arch_atomic_fetch_andnot_release arch_atomic_fetch_andnot_release
+#if defined(arch_atomic_fetch_andnot)
+       return arch_atomic_fetch_andnot(i, v);
+#elif defined(arch_atomic_fetch_andnot_relaxed)
+       int ret;
+       __atomic_pre_full_fence();
+       ret = arch_atomic_fetch_andnot_relaxed(i, v);
+       __atomic_post_full_fence();
+       return ret;
+#else
+       return raw_atomic_fetch_and(~i, v);
 #endif
-
-#ifndef arch_atomic_fetch_andnot_relaxed
-static __always_inline int
-arch_atomic_fetch_andnot_relaxed(int i, atomic_t *v)
-{
-       return arch_atomic_fetch_and_relaxed(~i, v);
 }
-#define arch_atomic_fetch_andnot_relaxed arch_atomic_fetch_andnot_relaxed
-#endif
-
-#else /* arch_atomic_fetch_andnot_relaxed */
 
-#ifndef arch_atomic_fetch_andnot_acquire
+/**
+ * raw_atomic_fetch_andnot_acquire() - atomic bitwise AND NOT with acquire ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & ~@i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_andnot_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_andnot_acquire(int i, atomic_t *v)
+raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
 {
+#if defined(arch_atomic_fetch_andnot_acquire)
+       return arch_atomic_fetch_andnot_acquire(i, v);
+#elif defined(arch_atomic_fetch_andnot_relaxed)
        int ret = arch_atomic_fetch_andnot_relaxed(i, v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire
+#elif defined(arch_atomic_fetch_andnot)
+       return arch_atomic_fetch_andnot(i, v);
+#else
+       return raw_atomic_fetch_and_acquire(~i, v);
 #endif
+}
 
-#ifndef arch_atomic_fetch_andnot_release
+/**
+ * raw_atomic_fetch_andnot_release() - atomic bitwise AND NOT with release ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & ~@i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_andnot_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_andnot_release(int i, atomic_t *v)
+raw_atomic_fetch_andnot_release(int i, atomic_t *v)
 {
+#if defined(arch_atomic_fetch_andnot_release)
+       return arch_atomic_fetch_andnot_release(i, v);
+#elif defined(arch_atomic_fetch_andnot_relaxed)
        __atomic_release_fence();
        return arch_atomic_fetch_andnot_relaxed(i, v);
+#elif defined(arch_atomic_fetch_andnot)
+       return arch_atomic_fetch_andnot(i, v);
+#else
+       return raw_atomic_fetch_and_release(~i, v);
+#endif
 }
-#define arch_atomic_fetch_andnot_release arch_atomic_fetch_andnot_release
+
+/**
+ * raw_atomic_fetch_andnot_relaxed() - atomic bitwise AND NOT with relaxed ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & ~@i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_andnot_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
+static __always_inline int
+raw_atomic_fetch_andnot_relaxed(int i, atomic_t *v)
+{
+#if defined(arch_atomic_fetch_andnot_relaxed)
+       return arch_atomic_fetch_andnot_relaxed(i, v);
+#elif defined(arch_atomic_fetch_andnot)
+       return arch_atomic_fetch_andnot(i, v);
+#else
+       return raw_atomic_fetch_and_relaxed(~i, v);
 #endif
+}
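As an illustrative sketch under assumed names, fetch_andnot pairs naturally with flag words: clear a bit and learn whether it had been set. The flag macro and helper below are made up for the example.

	/* Hypothetical example, not from this patch. */
	#define EXAMPLE_FLAG_PENDING	0x1	/* made-up flag bit */

	static bool pending_test_and_clear(atomic_t *flags)
	{
		/* Clears the bit and reports whether it was previously set. */
		return raw_atomic_fetch_andnot(EXAMPLE_FLAG_PENDING, flags) &
		       EXAMPLE_FLAG_PENDING;
	}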
+
+/**
+ * raw_atomic_or() - atomic bitwise OR with relaxed ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v | @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_or() elsewhere.
+ *
+ * Return: Nothing.
+ */
+static __always_inline void
+raw_atomic_or(int i, atomic_t *v)
+{
+       arch_atomic_or(i, v);
+}
 
-#ifndef arch_atomic_fetch_andnot
+/**
+ * raw_atomic_fetch_or() - atomic bitwise OR with full ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v | @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_or() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_andnot(int i, atomic_t *v)
+raw_atomic_fetch_or(int i, atomic_t *v)
 {
+#if defined(arch_atomic_fetch_or)
+       return arch_atomic_fetch_or(i, v);
+#elif defined(arch_atomic_fetch_or_relaxed)
        int ret;
        __atomic_pre_full_fence();
-       ret = arch_atomic_fetch_andnot_relaxed(i, v);
+       ret = arch_atomic_fetch_or_relaxed(i, v);
        __atomic_post_full_fence();
        return ret;
-}
-#define arch_atomic_fetch_andnot arch_atomic_fetch_andnot
+#else
+#error "Unable to define raw_atomic_fetch_or"
 #endif
+}
 
-#endif /* arch_atomic_fetch_andnot_relaxed */
-
-#ifndef arch_atomic_fetch_or_relaxed
-#define arch_atomic_fetch_or_acquire arch_atomic_fetch_or
-#define arch_atomic_fetch_or_release arch_atomic_fetch_or
-#define arch_atomic_fetch_or_relaxed arch_atomic_fetch_or
-#else /* arch_atomic_fetch_or_relaxed */
-
-#ifndef arch_atomic_fetch_or_acquire
+/**
+ * raw_atomic_fetch_or_acquire() - atomic bitwise OR with acquire ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v | @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_or_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_or_acquire(int i, atomic_t *v)
+raw_atomic_fetch_or_acquire(int i, atomic_t *v)
 {
+#if defined(arch_atomic_fetch_or_acquire)
+       return arch_atomic_fetch_or_acquire(i, v);
+#elif defined(arch_atomic_fetch_or_relaxed)
        int ret = arch_atomic_fetch_or_relaxed(i, v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic_fetch_or_acquire arch_atomic_fetch_or_acquire
+#elif defined(arch_atomic_fetch_or)
+       return arch_atomic_fetch_or(i, v);
+#else
+#error "Unable to define raw_atomic_fetch_or_acquire"
 #endif
+}
 
-#ifndef arch_atomic_fetch_or_release
+/**
+ * raw_atomic_fetch_or_release() - atomic bitwise OR with release ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v | @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_or_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_or_release(int i, atomic_t *v)
+raw_atomic_fetch_or_release(int i, atomic_t *v)
 {
+#if defined(arch_atomic_fetch_or_release)
+       return arch_atomic_fetch_or_release(i, v);
+#elif defined(arch_atomic_fetch_or_relaxed)
        __atomic_release_fence();
        return arch_atomic_fetch_or_relaxed(i, v);
+#elif defined(arch_atomic_fetch_or)
+       return arch_atomic_fetch_or(i, v);
+#else
+#error "Unable to define raw_atomic_fetch_or_release"
+#endif
 }
-#define arch_atomic_fetch_or_release arch_atomic_fetch_or_release
+
+/**
+ * raw_atomic_fetch_or_relaxed() - atomic bitwise OR with relaxed ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v | @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_or_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
+static __always_inline int
+raw_atomic_fetch_or_relaxed(int i, atomic_t *v)
+{
+#if defined(arch_atomic_fetch_or_relaxed)
+       return arch_atomic_fetch_or_relaxed(i, v);
+#elif defined(arch_atomic_fetch_or)
+       return arch_atomic_fetch_or(i, v);
+#else
+#error "Unable to define raw_atomic_fetch_or_relaxed"
 #endif
+}
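Correspondingly, a hedged sketch of fetch_or used as a lightweight test-and-set on a flag word; the flag name and helper are assumptions, not part of this series.

	/* Hypothetical example, not from this patch. */
	#define EXAMPLE_FLAG_PENDING	0x1	/* made-up flag bit */

	static bool pending_test_and_set(atomic_t *flags)
	{
		/* True if the bit was already set, i.e. work was already queued. */
		return raw_atomic_fetch_or(EXAMPLE_FLAG_PENDING, flags) &
		       EXAMPLE_FLAG_PENDING;
	}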
 
-#ifndef arch_atomic_fetch_or
+/**
+ * raw_atomic_xor() - atomic bitwise XOR with relaxed ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v ^ @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_xor() elsewhere.
+ *
+ * Return: Nothing.
+ */
+static __always_inline void
+raw_atomic_xor(int i, atomic_t *v)
+{
+       arch_atomic_xor(i, v);
+}
+
+/**
+ * raw_atomic_fetch_xor() - atomic bitwise XOR with full ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v ^ @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_xor() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_or(int i, atomic_t *v)
+raw_atomic_fetch_xor(int i, atomic_t *v)
 {
+#if defined(arch_atomic_fetch_xor)
+       return arch_atomic_fetch_xor(i, v);
+#elif defined(arch_atomic_fetch_xor_relaxed)
        int ret;
        __atomic_pre_full_fence();
-       ret = arch_atomic_fetch_or_relaxed(i, v);
+       ret = arch_atomic_fetch_xor_relaxed(i, v);
        __atomic_post_full_fence();
        return ret;
-}
-#define arch_atomic_fetch_or arch_atomic_fetch_or
+#else
+#error "Unable to define raw_atomic_fetch_xor"
 #endif
+}
 
-#endif /* arch_atomic_fetch_or_relaxed */
-
-#ifndef arch_atomic_fetch_xor_relaxed
-#define arch_atomic_fetch_xor_acquire arch_atomic_fetch_xor
-#define arch_atomic_fetch_xor_release arch_atomic_fetch_xor
-#define arch_atomic_fetch_xor_relaxed arch_atomic_fetch_xor
-#else /* arch_atomic_fetch_xor_relaxed */
-
-#ifndef arch_atomic_fetch_xor_acquire
+/**
+ * raw_atomic_fetch_xor_acquire() - atomic bitwise XOR with acquire ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v ^ @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_xor_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_xor_acquire(int i, atomic_t *v)
+raw_atomic_fetch_xor_acquire(int i, atomic_t *v)
 {
+#if defined(arch_atomic_fetch_xor_acquire)
+       return arch_atomic_fetch_xor_acquire(i, v);
+#elif defined(arch_atomic_fetch_xor_relaxed)
        int ret = arch_atomic_fetch_xor_relaxed(i, v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic_fetch_xor_acquire arch_atomic_fetch_xor_acquire
+#elif defined(arch_atomic_fetch_xor)
+       return arch_atomic_fetch_xor(i, v);
+#else
+#error "Unable to define raw_atomic_fetch_xor_acquire"
 #endif
+}
 
-#ifndef arch_atomic_fetch_xor_release
+/**
+ * raw_atomic_fetch_xor_release() - atomic bitwise XOR with release ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v ^ @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_xor_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_xor_release(int i, atomic_t *v)
+raw_atomic_fetch_xor_release(int i, atomic_t *v)
 {
+#if defined(arch_atomic_fetch_xor_release)
+       return arch_atomic_fetch_xor_release(i, v);
+#elif defined(arch_atomic_fetch_xor_relaxed)
        __atomic_release_fence();
        return arch_atomic_fetch_xor_relaxed(i, v);
+#elif defined(arch_atomic_fetch_xor)
+       return arch_atomic_fetch_xor(i, v);
+#else
+#error "Unable to define raw_atomic_fetch_xor_release"
+#endif
 }
-#define arch_atomic_fetch_xor_release arch_atomic_fetch_xor_release
+
+/**
+ * raw_atomic_fetch_xor_relaxed() - atomic bitwise XOR with relaxed ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v ^ @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_xor_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
+static __always_inline int
+raw_atomic_fetch_xor_relaxed(int i, atomic_t *v)
+{
+#if defined(arch_atomic_fetch_xor_relaxed)
+       return arch_atomic_fetch_xor_relaxed(i, v);
+#elif defined(arch_atomic_fetch_xor)
+       return arch_atomic_fetch_xor(i, v);
+#else
+#error "Unable to define raw_atomic_fetch_xor_relaxed"
 #endif
+}
 
-#ifndef arch_atomic_fetch_xor
+/**
+ * raw_atomic_xchg() - atomic exchange with full ordering
+ * @v: pointer to atomic_t
+ * @new: int value to assign
+ *
+ * Atomically updates @v to @new with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_xchg() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_fetch_xor(int i, atomic_t *v)
+raw_atomic_xchg(atomic_t *v, int new)
 {
+#if defined(arch_atomic_xchg)
+       return arch_atomic_xchg(v, new);
+#elif defined(arch_atomic_xchg_relaxed)
        int ret;
        __atomic_pre_full_fence();
-       ret = arch_atomic_fetch_xor_relaxed(i, v);
+       ret = arch_atomic_xchg_relaxed(v, new);
        __atomic_post_full_fence();
        return ret;
-}
-#define arch_atomic_fetch_xor arch_atomic_fetch_xor
+#else
+       return raw_xchg(&v->counter, new);
 #endif
+}
 
-#endif /* arch_atomic_fetch_xor_relaxed */
-
-#ifndef arch_atomic_xchg_relaxed
-#define arch_atomic_xchg_acquire arch_atomic_xchg
-#define arch_atomic_xchg_release arch_atomic_xchg
-#define arch_atomic_xchg_relaxed arch_atomic_xchg
-#else /* arch_atomic_xchg_relaxed */
-
-#ifndef arch_atomic_xchg_acquire
+/**
+ * raw_atomic_xchg_acquire() - atomic exchange with acquire ordering
+ * @v: pointer to atomic_t
+ * @new: int value to assign
+ *
+ * Atomically updates @v to @new with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_xchg_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_xchg_acquire(atomic_t *v, int i)
+raw_atomic_xchg_acquire(atomic_t *v, int new)
 {
-       int ret = arch_atomic_xchg_relaxed(v, i);
+#if defined(arch_atomic_xchg_acquire)
+       return arch_atomic_xchg_acquire(v, new);
+#elif defined(arch_atomic_xchg_relaxed)
+       int ret = arch_atomic_xchg_relaxed(v, new);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic_xchg_acquire arch_atomic_xchg_acquire
+#elif defined(arch_atomic_xchg)
+       return arch_atomic_xchg(v, new);
+#else
+       return raw_xchg_acquire(&v->counter, new);
 #endif
+}
 
-#ifndef arch_atomic_xchg_release
+/**
+ * raw_atomic_xchg_release() - atomic exchange with release ordering
+ * @v: pointer to atomic_t
+ * @new: int value to assign
+ *
+ * Atomically updates @v to @new with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_xchg_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_xchg_release(atomic_t *v, int i)
+raw_atomic_xchg_release(atomic_t *v, int new)
 {
+#if defined(arch_atomic_xchg_release)
+       return arch_atomic_xchg_release(v, new);
+#elif defined(arch_atomic_xchg_relaxed)
        __atomic_release_fence();
-       return arch_atomic_xchg_relaxed(v, i);
+       return arch_atomic_xchg_relaxed(v, new);
+#elif defined(arch_atomic_xchg)
+       return arch_atomic_xchg(v, new);
+#else
+       return raw_xchg_release(&v->counter, new);
+#endif
 }
-#define arch_atomic_xchg_release arch_atomic_xchg_release
+
+/**
+ * raw_atomic_xchg_relaxed() - atomic exchange with relaxed ordering
+ * @v: pointer to atomic_t
+ * @new: int value to assign
+ *
+ * Atomically updates @v to @new with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_xchg_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
+static __always_inline int
+raw_atomic_xchg_relaxed(atomic_t *v, int new)
+{
+#if defined(arch_atomic_xchg_relaxed)
+       return arch_atomic_xchg_relaxed(v, new);
+#elif defined(arch_atomic_xchg)
+       return arch_atomic_xchg(v, new);
+#else
+       return raw_xchg_relaxed(&v->counter, new);
 #endif
+}
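A small assumed example of xchg as grab-and-reset: take an accumulated count and zero it in one atomic step (helper name made up for illustration).

	/* Hypothetical example, not from this patch. */
	static int pending_drain(atomic_t *pending)
	{
		/* Atomically takes the accumulated count and resets it to zero. */
		return raw_atomic_xchg(pending, 0);
	}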
 
-#ifndef arch_atomic_xchg
+/**
+ * raw_atomic_cmpxchg() - atomic compare and exchange with full ordering
+ * @v: pointer to atomic_t
+ * @old: int value to compare with
+ * @new: int value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_cmpxchg() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_xchg(atomic_t *v, int i)
+raw_atomic_cmpxchg(atomic_t *v, int old, int new)
 {
+#if defined(arch_atomic_cmpxchg)
+       return arch_atomic_cmpxchg(v, old, new);
+#elif defined(arch_atomic_cmpxchg_relaxed)
        int ret;
        __atomic_pre_full_fence();
-       ret = arch_atomic_xchg_relaxed(v, i);
+       ret = arch_atomic_cmpxchg_relaxed(v, old, new);
        __atomic_post_full_fence();
        return ret;
-}
-#define arch_atomic_xchg arch_atomic_xchg
-#endif
-
-#endif /* arch_atomic_xchg_relaxed */
-
-#ifndef arch_atomic_cmpxchg_relaxed
-#define arch_atomic_cmpxchg_acquire arch_atomic_cmpxchg
-#define arch_atomic_cmpxchg_release arch_atomic_cmpxchg
-#define arch_atomic_cmpxchg_relaxed arch_atomic_cmpxchg
-#else /* arch_atomic_cmpxchg_relaxed */
+#else
+       return raw_cmpxchg(&v->counter, old, new);
+#endif
+}
 
-#ifndef arch_atomic_cmpxchg_acquire
+/**
+ * raw_atomic_cmpxchg_acquire() - atomic compare and exchange with acquire ordering
+ * @v: pointer to atomic_t
+ * @old: int value to compare with
+ * @new: int value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_cmpxchg_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_cmpxchg_acquire(atomic_t *v, int old, int new)
+raw_atomic_cmpxchg_acquire(atomic_t *v, int old, int new)
 {
+#if defined(arch_atomic_cmpxchg_acquire)
+       return arch_atomic_cmpxchg_acquire(v, old, new);
+#elif defined(arch_atomic_cmpxchg_relaxed)
        int ret = arch_atomic_cmpxchg_relaxed(v, old, new);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic_cmpxchg_acquire arch_atomic_cmpxchg_acquire
+#elif defined(arch_atomic_cmpxchg)
+       return arch_atomic_cmpxchg(v, old, new);
+#else
+       return raw_cmpxchg_acquire(&v->counter, old, new);
 #endif
+}
 
-#ifndef arch_atomic_cmpxchg_release
+/**
+ * raw_atomic_cmpxchg_release() - atomic compare and exchange with release ordering
+ * @v: pointer to atomic_t
+ * @old: int value to compare with
+ * @new: int value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_cmpxchg_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_cmpxchg_release(atomic_t *v, int old, int new)
+raw_atomic_cmpxchg_release(atomic_t *v, int old, int new)
 {
+#if defined(arch_atomic_cmpxchg_release)
+       return arch_atomic_cmpxchg_release(v, old, new);
+#elif defined(arch_atomic_cmpxchg_relaxed)
        __atomic_release_fence();
        return arch_atomic_cmpxchg_relaxed(v, old, new);
-}
-#define arch_atomic_cmpxchg_release arch_atomic_cmpxchg_release
+#elif defined(arch_atomic_cmpxchg)
+       return arch_atomic_cmpxchg(v, old, new);
+#else
+       return raw_cmpxchg_release(&v->counter, old, new);
 #endif
+}
 
-#ifndef arch_atomic_cmpxchg
+/**
+ * raw_atomic_cmpxchg_relaxed() - atomic compare and exchange with relaxed ordering
+ * @v: pointer to atomic_t
+ * @old: int value to compare with
+ * @new: int value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_cmpxchg_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-arch_atomic_cmpxchg(atomic_t *v, int old, int new)
+raw_atomic_cmpxchg_relaxed(atomic_t *v, int old, int new)
 {
-       int ret;
-       __atomic_pre_full_fence();
-       ret = arch_atomic_cmpxchg_relaxed(v, old, new);
-       __atomic_post_full_fence();
-       return ret;
-}
-#define arch_atomic_cmpxchg arch_atomic_cmpxchg
+#if defined(arch_atomic_cmpxchg_relaxed)
+       return arch_atomic_cmpxchg_relaxed(v, old, new);
+#elif defined(arch_atomic_cmpxchg)
+       return arch_atomic_cmpxchg(v, old, new);
+#else
+       return raw_cmpxchg_relaxed(&v->counter, old, new);
 #endif
+}
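For illustration (names assumed, not from this patch), cmpxchg enables a one-shot claim: only the caller that observes the expected old value wins.

	/* Hypothetical example, not from this patch: 0 = free, 1 = claimed. */
	static bool state_claim(atomic_t *state)
	{
		/* Only the caller that sees the old value 0 takes ownership. */
		return raw_atomic_cmpxchg(state, 0, 1) == 0;
	}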
 
-#endif /* arch_atomic_cmpxchg_relaxed */
-
-#ifndef arch_atomic_try_cmpxchg_relaxed
-#ifdef arch_atomic_try_cmpxchg
-#define arch_atomic_try_cmpxchg_acquire arch_atomic_try_cmpxchg
-#define arch_atomic_try_cmpxchg_release arch_atomic_try_cmpxchg
-#define arch_atomic_try_cmpxchg_relaxed arch_atomic_try_cmpxchg
-#endif /* arch_atomic_try_cmpxchg */
-
-#ifndef arch_atomic_try_cmpxchg
+/**
+ * raw_atomic_try_cmpxchg() - atomic compare and exchange with full ordering
+ * @v: pointer to atomic_t
+ * @old: pointer to int value to compare with
+ * @new: int value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with full ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Safe to use in noinstr code; prefer atomic_try_cmpxchg() elsewhere.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_try_cmpxchg(atomic_t *v, int *old, int new)
+raw_atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 {
+#if defined(arch_atomic_try_cmpxchg)
+       return arch_atomic_try_cmpxchg(v, old, new);
+#elif defined(arch_atomic_try_cmpxchg_relaxed)
+       bool ret;
+       __atomic_pre_full_fence();
+       ret = arch_atomic_try_cmpxchg_relaxed(v, old, new);
+       __atomic_post_full_fence();
+       return ret;
+#else
        int r, o = *old;
-       r = arch_atomic_cmpxchg(v, o, new);
+       r = raw_atomic_cmpxchg(v, o, new);
        if (unlikely(r != o))
                *old = r;
        return likely(r == o);
-}
-#define arch_atomic_try_cmpxchg arch_atomic_try_cmpxchg
 #endif
+}
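The try_cmpxchg form feeds the failure value back through @old, which keeps compare-and-swap loops tight. A hedged sketch of the usual "increment unless at limit" idiom, assuming raw_atomic_read() from earlier in this header; the helper name is made up.

	/* Hypothetical example, not from this patch. */
	static bool counter_inc_below(atomic_t *v, int limit)
	{
		int old = raw_atomic_read(v);

		/*
		 * On failure try_cmpxchg refreshes 'old' with the current
		 * value, so the loop need not re-read @v by hand.
		 */
		do {
			if (old >= limit)
				return false;
		} while (!raw_atomic_try_cmpxchg(v, &old, old + 1));

		return true;
	}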
 
-#ifndef arch_atomic_try_cmpxchg_acquire
+/**
+ * raw_atomic_try_cmpxchg_acquire() - atomic compare and exchange with acquire ordering
+ * @v: pointer to atomic_t
+ * @old: pointer to int value to compare with
+ * @new: int value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with acquire ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Safe to use in noinstr code; prefer atomic_try_cmpxchg_acquire() elsewhere.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
+raw_atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
 {
+#if defined(arch_atomic_try_cmpxchg_acquire)
+       return arch_atomic_try_cmpxchg_acquire(v, old, new);
+#elif defined(arch_atomic_try_cmpxchg_relaxed)
+       bool ret = arch_atomic_try_cmpxchg_relaxed(v, old, new);
+       __atomic_acquire_fence();
+       return ret;
+#elif defined(arch_atomic_try_cmpxchg)
+       return arch_atomic_try_cmpxchg(v, old, new);
+#else
        int r, o = *old;
-       r = arch_atomic_cmpxchg_acquire(v, o, new);
+       r = raw_atomic_cmpxchg_acquire(v, o, new);
        if (unlikely(r != o))
                *old = r;
        return likely(r == o);
-}
-#define arch_atomic_try_cmpxchg_acquire arch_atomic_try_cmpxchg_acquire
 #endif
+}
 
-#ifndef arch_atomic_try_cmpxchg_release
+/**
+ * raw_atomic_try_cmpxchg_release() - atomic compare and exchange with release ordering
+ * @v: pointer to atomic_t
+ * @old: pointer to int value to compare with
+ * @new: int value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with release ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Safe to use in noinstr code; prefer atomic_try_cmpxchg_release() elsewhere.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
+raw_atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
 {
+#if defined(arch_atomic_try_cmpxchg_release)
+       return arch_atomic_try_cmpxchg_release(v, old, new);
+#elif defined(arch_atomic_try_cmpxchg_relaxed)
+       __atomic_release_fence();
+       return arch_atomic_try_cmpxchg_relaxed(v, old, new);
+#elif defined(arch_atomic_try_cmpxchg)
+       return arch_atomic_try_cmpxchg(v, old, new);
+#else
        int r, o = *old;
-       r = arch_atomic_cmpxchg_release(v, o, new);
+       r = raw_atomic_cmpxchg_release(v, o, new);
        if (unlikely(r != o))
                *old = r;
        return likely(r == o);
-}
-#define arch_atomic_try_cmpxchg_release arch_atomic_try_cmpxchg_release
 #endif
+}
 
-#ifndef arch_atomic_try_cmpxchg_relaxed
+/**
+ * raw_atomic_try_cmpxchg_relaxed() - atomic compare and exchange with relaxed ordering
+ * @v: pointer to atomic_t
+ * @old: pointer to int value to compare with
+ * @new: int value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with relaxed ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Safe to use in noinstr code; prefer atomic_try_cmpxchg_relaxed() elsewhere.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new)
+raw_atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new)
 {
+#if defined(arch_atomic_try_cmpxchg_relaxed)
+       return arch_atomic_try_cmpxchg_relaxed(v, old, new);
+#elif defined(arch_atomic_try_cmpxchg)
+       return arch_atomic_try_cmpxchg(v, old, new);
+#else
        int r, o = *old;
-       r = arch_atomic_cmpxchg_relaxed(v, o, new);
+       r = raw_atomic_cmpxchg_relaxed(v, o, new);
        if (unlikely(r != o))
                *old = r;
        return likely(r == o);
-}
-#define arch_atomic_try_cmpxchg_relaxed arch_atomic_try_cmpxchg_relaxed
-#endif
-
-#else /* arch_atomic_try_cmpxchg_relaxed */
-
-#ifndef arch_atomic_try_cmpxchg_acquire
-static __always_inline bool
-arch_atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
-{
-       bool ret = arch_atomic_try_cmpxchg_relaxed(v, old, new);
-       __atomic_acquire_fence();
-       return ret;
-}
-#define arch_atomic_try_cmpxchg_acquire arch_atomic_try_cmpxchg_acquire
-#endif
-
-#ifndef arch_atomic_try_cmpxchg_release
-static __always_inline bool
-arch_atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
-{
-       __atomic_release_fence();
-       return arch_atomic_try_cmpxchg_relaxed(v, old, new);
-}
-#define arch_atomic_try_cmpxchg_release arch_atomic_try_cmpxchg_release
 #endif
-
-#ifndef arch_atomic_try_cmpxchg
-static __always_inline bool
-arch_atomic_try_cmpxchg(atomic_t *v, int *old, int new)
-{
-       bool ret;
-       __atomic_pre_full_fence();
-       ret = arch_atomic_try_cmpxchg_relaxed(v, old, new);
-       __atomic_post_full_fence();
-       return ret;
 }
-#define arch_atomic_try_cmpxchg arch_atomic_try_cmpxchg
-#endif
 
-#endif /* arch_atomic_try_cmpxchg_relaxed */
-
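
The try_cmpxchg variants are intended for compare-and-swap loops: on failure they write the value actually observed back through @old, so the caller does not need a separate re-read before retrying. A minimal usage sketch, assuming a kernel context where <linux/atomic.h> is available (the helper name and the bound are hypothetical; ordinary code would call the instrumented atomic_*() wrappers rather than the raw_*() forms shown here). It follows the same read/try_cmpxchg loop that raw_atomic_fetch_add_unless() uses further down:

#include <linux/atomic.h>

/* Hypothetical helper: increment @v, but never beyond @limit. */
static inline bool sketch_inc_below(atomic_t *v, int limit)
{
        int c = raw_atomic_read(v);

        do {
                if (c >= limit)
                        return false;
                /* On failure, @c now holds the current value of @v. */
        } while (!raw_atomic_try_cmpxchg(v, &c, c + 1));

        return true;
}
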
-#ifndef arch_atomic_sub_and_test
 /**
- * arch_atomic_sub_and_test - subtract value from variable and test result
- * @i: integer value to subtract
- * @v: pointer of type atomic_t
+ * raw_atomic_sub_and_test() - atomic subtract and test if zero with full ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with full ordering.
  *
- * Atomically subtracts @i from @v and returns
- * true if the result is zero, or false for all
- * other cases.
+ * Safe to use in noinstr code; prefer atomic_sub_and_test() elsewhere.
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
  */
 static __always_inline bool
-arch_atomic_sub_and_test(int i, atomic_t *v)
+raw_atomic_sub_and_test(int i, atomic_t *v)
 {
-       return arch_atomic_sub_return(i, v) == 0;
-}
-#define arch_atomic_sub_and_test arch_atomic_sub_and_test
+#if defined(arch_atomic_sub_and_test)
+       return arch_atomic_sub_and_test(i, v);
+#else
+       return raw_atomic_sub_return(i, v) == 0;
 #endif
+}
 
-#ifndef arch_atomic_dec_and_test
 /**
- * arch_atomic_dec_and_test - decrement and test
- * @v: pointer of type atomic_t
+ * raw_atomic_dec_and_test() - atomic decrement and test if zero with full ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with full ordering.
  *
- * Atomically decrements @v by 1 and
- * returns true if the result is 0, or false for all other
- * cases.
+ * Safe to use in noinstr code; prefer atomic_dec_and_test() elsewhere.
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
  */
 static __always_inline bool
-arch_atomic_dec_and_test(atomic_t *v)
+raw_atomic_dec_and_test(atomic_t *v)
 {
-       return arch_atomic_dec_return(v) == 0;
-}
-#define arch_atomic_dec_and_test arch_atomic_dec_and_test
+#if defined(arch_atomic_dec_and_test)
+       return arch_atomic_dec_and_test(v);
+#else
+       return raw_atomic_dec_return(v) == 0;
 #endif
+}
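
dec_and_test() is the building block of reference-count release paths: the thread whose decrement takes the count to zero is the only one that sees true, and is therefore the one responsible for freeing the object. A hedged sketch (object type and free path are placeholders; new code would normally use refcount_t instead of open-coding this on atomic_t):

#include <linux/atomic.h>
#include <linux/slab.h>

struct sketch_obj {
        atomic_t refcnt;
        /* ... payload ... */
};

static void sketch_obj_put(struct sketch_obj *obj)
{
        /* Full ordering: earlier accesses to *obj happen before the free. */
        if (raw_atomic_dec_and_test(&obj->refcnt))
                kfree(obj);
}
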
 
-#ifndef arch_atomic_inc_and_test
 /**
- * arch_atomic_inc_and_test - increment and test
- * @v: pointer of type atomic_t
+ * raw_atomic_inc_and_test() - atomic increment and test if zero with full ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with full ordering.
  *
- * Atomically increments @v by 1
- * and returns true if the result is zero, or false for all
- * other cases.
+ * Safe to use in noinstr code; prefer atomic_inc_and_test() elsewhere.
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
  */
 static __always_inline bool
-arch_atomic_inc_and_test(atomic_t *v)
+raw_atomic_inc_and_test(atomic_t *v)
 {
-       return arch_atomic_inc_return(v) == 0;
-}
-#define arch_atomic_inc_and_test arch_atomic_inc_and_test
+#if defined(arch_atomic_inc_and_test)
+       return arch_atomic_inc_and_test(v);
+#else
+       return raw_atomic_inc_return(v) == 0;
 #endif
+}
 
-#ifndef arch_atomic_add_negative_relaxed
-#ifdef arch_atomic_add_negative
-#define arch_atomic_add_negative_acquire arch_atomic_add_negative
-#define arch_atomic_add_negative_release arch_atomic_add_negative
-#define arch_atomic_add_negative_relaxed arch_atomic_add_negative
-#endif /* arch_atomic_add_negative */
-
-#ifndef arch_atomic_add_negative
 /**
- * arch_atomic_add_negative - Add and test if negative
- * @i: integer value to add
- * @v: pointer of type atomic_t
+ * raw_atomic_add_negative() - atomic add and test if negative with full ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
  *
- * Atomically adds @i to @v and returns true if the result is negative,
- * or false when the result is greater than or equal to zero.
+ * Atomically updates @v to (@v + @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_add_negative() elsewhere.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
  */
 static __always_inline bool
-arch_atomic_add_negative(int i, atomic_t *v)
+raw_atomic_add_negative(int i, atomic_t *v)
 {
-       return arch_atomic_add_return(i, v) < 0;
-}
-#define arch_atomic_add_negative arch_atomic_add_negative
+#if defined(arch_atomic_add_negative)
+       return arch_atomic_add_negative(i, v);
+#elif defined(arch_atomic_add_negative_relaxed)
+       bool ret;
+       __atomic_pre_full_fence();
+       ret = arch_atomic_add_negative_relaxed(i, v);
+       __atomic_post_full_fence();
+       return ret;
+#else
+       return raw_atomic_add_return(i, v) < 0;
 #endif
+}
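
add_negative() fuses the addition and the sign test into one atomic step, which matters when the sign of the new value decides who takes a slow path. A tiny illustrative sketch (the budget semantics are invented for the example):

/* Hypothetical: consume one unit of budget; true once we have overdrawn. */
static inline bool sketch_consume_budget(atomic_t *budget)
{
        return raw_atomic_add_negative(-1, budget);
}
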
 
-#ifndef arch_atomic_add_negative_acquire
 /**
- * arch_atomic_add_negative_acquire - Add and test if negative
- * @i: integer value to add
- * @v: pointer of type atomic_t
+ * raw_atomic_add_negative_acquire() - atomic add and test if negative with acquire ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with acquire ordering.
  *
- * Atomically adds @i to @v and returns true if the result is negative,
- * or false when the result is greater than or equal to zero.
+ * Safe to use in noinstr code; prefer atomic_add_negative_acquire() elsewhere.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
  */
 static __always_inline bool
-arch_atomic_add_negative_acquire(int i, atomic_t *v)
+raw_atomic_add_negative_acquire(int i, atomic_t *v)
 {
-       return arch_atomic_add_return_acquire(i, v) < 0;
-}
-#define arch_atomic_add_negative_acquire arch_atomic_add_negative_acquire
+#if defined(arch_atomic_add_negative_acquire)
+       return arch_atomic_add_negative_acquire(i, v);
+#elif defined(arch_atomic_add_negative_relaxed)
+       bool ret = arch_atomic_add_negative_relaxed(i, v);
+       __atomic_acquire_fence();
+       return ret;
+#elif defined(arch_atomic_add_negative)
+       return arch_atomic_add_negative(i, v);
+#else
+       return raw_atomic_add_return_acquire(i, v) < 0;
 #endif
+}
 
-#ifndef arch_atomic_add_negative_release
 /**
- * arch_atomic_add_negative_release - Add and test if negative
- * @i: integer value to add
- * @v: pointer of type atomic_t
+ * raw_atomic_add_negative_release() - atomic add and test if negative with release ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
  *
- * Atomically adds @i to @v and returns true if the result is negative,
- * or false when the result is greater than or equal to zero.
+ * Atomically updates @v to (@v + @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_add_negative_release() elsewhere.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
  */
 static __always_inline bool
-arch_atomic_add_negative_release(int i, atomic_t *v)
+raw_atomic_add_negative_release(int i, atomic_t *v)
 {
-       return arch_atomic_add_return_release(i, v) < 0;
-}
-#define arch_atomic_add_negative_release arch_atomic_add_negative_release
+#if defined(arch_atomic_add_negative_release)
+       return arch_atomic_add_negative_release(i, v);
+#elif defined(arch_atomic_add_negative_relaxed)
+       __atomic_release_fence();
+       return arch_atomic_add_negative_relaxed(i, v);
+#elif defined(arch_atomic_add_negative)
+       return arch_atomic_add_negative(i, v);
+#else
+       return raw_atomic_add_return_release(i, v) < 0;
 #endif
+}
 
-#ifndef arch_atomic_add_negative_relaxed
 /**
- * arch_atomic_add_negative_relaxed - Add and test if negative
- * @i: integer value to add
- * @v: pointer of type atomic_t
+ * raw_atomic_add_negative_relaxed() - atomic add and test if negative with relaxed ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_add_negative_relaxed() elsewhere.
  *
- * Atomically adds @i to @v and returns true if the result is negative,
- * or false when the result is greater than or equal to zero.
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
  */
 static __always_inline bool
-arch_atomic_add_negative_relaxed(int i, atomic_t *v)
-{
-       return arch_atomic_add_return_relaxed(i, v) < 0;
-}
-#define arch_atomic_add_negative_relaxed arch_atomic_add_negative_relaxed
-#endif
-
-#else /* arch_atomic_add_negative_relaxed */
-
-#ifndef arch_atomic_add_negative_acquire
-static __always_inline bool
-arch_atomic_add_negative_acquire(int i, atomic_t *v)
-{
-       bool ret = arch_atomic_add_negative_relaxed(i, v);
-       __atomic_acquire_fence();
-       return ret;
-}
-#define arch_atomic_add_negative_acquire arch_atomic_add_negative_acquire
-#endif
-
-#ifndef arch_atomic_add_negative_release
-static __always_inline bool
-arch_atomic_add_negative_release(int i, atomic_t *v)
+raw_atomic_add_negative_relaxed(int i, atomic_t *v)
 {
-       __atomic_release_fence();
+#if defined(arch_atomic_add_negative_relaxed)
        return arch_atomic_add_negative_relaxed(i, v);
-}
-#define arch_atomic_add_negative_release arch_atomic_add_negative_release
+#elif defined(arch_atomic_add_negative)
+       return arch_atomic_add_negative(i, v);
+#else
+       return raw_atomic_add_return_relaxed(i, v) < 0;
 #endif
-
-#ifndef arch_atomic_add_negative
-static __always_inline bool
-arch_atomic_add_negative(int i, atomic_t *v)
-{
-       bool ret;
-       __atomic_pre_full_fence();
-       ret = arch_atomic_add_negative_relaxed(i, v);
-       __atomic_post_full_fence();
-       return ret;
 }
-#define arch_atomic_add_negative arch_atomic_add_negative
-#endif
-
-#endif /* arch_atomic_add_negative_relaxed */
 
-#ifndef arch_atomic_fetch_add_unless
 /**
- * arch_atomic_fetch_add_unless - add unless the number is already a given value
- * @v: pointer of type atomic_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
+ * raw_atomic_fetch_add_unless() - atomic add unless value with full ordering
+ * @v: pointer to atomic_t
+ * @a: int value to add
+ * @u: int value to compare with
+ *
+ * If (@v != @u), atomically updates @v to (@v + @a) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_fetch_add_unless() elsewhere.
  *
- * Atomically adds @a to @v, so long as @v was not already @u.
- * Returns original value of @v
+ * Return: The original value of @v.
  */
 static __always_inline int
-arch_atomic_fetch_add_unless(atomic_t *v, int a, int u)
+raw_atomic_fetch_add_unless(atomic_t *v, int a, int u)
 {
-       int c = arch_atomic_read(v);
+#if defined(arch_atomic_fetch_add_unless)
+       return arch_atomic_fetch_add_unless(v, a, u);
+#else
+       int c = raw_atomic_read(v);
 
        do {
                if (unlikely(c == u))
                        break;
-       } while (!arch_atomic_try_cmpxchg(v, &c, c + a));
+       } while (!raw_atomic_try_cmpxchg(v, &c, c + a));
 
        return c;
-}
-#define arch_atomic_fetch_add_unless arch_atomic_fetch_add_unless
 #endif
+}
 
-#ifndef arch_atomic_add_unless
 /**
- * arch_atomic_add_unless - add unless the number is already a given value
- * @v: pointer of type atomic_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
+ * raw_atomic_add_unless() - atomic add unless value with full ordering
+ * @v: pointer to atomic_t
+ * @a: int value to add
+ * @u: int value to compare with
+ *
+ * If (@v != @u), atomically updates @v to (@v + @a) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_add_unless() elsewhere.
  *
- * Atomically adds @a to @v, if @v was not already @u.
- * Returns true if the addition was done.
+ * Return: @true if @v was updated, @false otherwise.
  */
 static __always_inline bool
-arch_atomic_add_unless(atomic_t *v, int a, int u)
+raw_atomic_add_unless(atomic_t *v, int a, int u)
 {
-       return arch_atomic_fetch_add_unless(v, a, u) != u;
-}
-#define arch_atomic_add_unless arch_atomic_add_unless
+#if defined(arch_atomic_add_unless)
+       return arch_atomic_add_unless(v, a, u);
+#else
+       return raw_atomic_fetch_add_unless(v, a, u) != u;
 #endif
+}
 
-#ifndef arch_atomic_inc_not_zero
 /**
- * arch_atomic_inc_not_zero - increment unless the number is zero
- * @v: pointer of type atomic_t
+ * raw_atomic_inc_not_zero() - atomic increment unless zero with full ordering
+ * @v: pointer to atomic_t
+ *
+ * If (@v != 0), atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_inc_not_zero() elsewhere.
  *
- * Atomically increments @v by 1, if @v is non-zero.
- * Returns true if the increment was done.
+ * Return: @true if @v was updated, @false otherwise.
  */
 static __always_inline bool
-arch_atomic_inc_not_zero(atomic_t *v)
+raw_atomic_inc_not_zero(atomic_t *v)
 {
-       return arch_atomic_add_unless(v, 1, 0);
-}
-#define arch_atomic_inc_not_zero arch_atomic_inc_not_zero
+#if defined(arch_atomic_inc_not_zero)
+       return arch_atomic_inc_not_zero(v);
+#else
+       return raw_atomic_add_unless(v, 1, 0);
 #endif
+}
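
inc_not_zero() is the "take a reference only if the object is still alive" idiom: once the count has reached zero, a concurrent lookup must fail rather than resurrect the object. A self-contained sketch with placeholder names:

struct sketch_node {
        atomic_t refcnt;
};

static struct sketch_node *sketch_node_tryget(struct sketch_node *node)
{
        if (node && raw_atomic_inc_not_zero(&node->refcnt))
                return node;

        return NULL;    /* count already hit zero; teardown is in progress */
}
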
 
-#ifndef arch_atomic_inc_unless_negative
+/**
+ * raw_atomic_inc_unless_negative() - atomic increment unless negative with full ordering
+ * @v: pointer to atomic_t
+ *
+ * If (@v >= 0), atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_inc_unless_negative() elsewhere.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_inc_unless_negative(atomic_t *v)
+raw_atomic_inc_unless_negative(atomic_t *v)
 {
-       int c = arch_atomic_read(v);
+#if defined(arch_atomic_inc_unless_negative)
+       return arch_atomic_inc_unless_negative(v);
+#else
+       int c = raw_atomic_read(v);
 
        do {
                if (unlikely(c < 0))
                        return false;
-       } while (!arch_atomic_try_cmpxchg(v, &c, c + 1));
+       } while (!raw_atomic_try_cmpxchg(v, &c, c + 1));
 
        return true;
-}
-#define arch_atomic_inc_unless_negative arch_atomic_inc_unless_negative
 #endif
+}
 
-#ifndef arch_atomic_dec_unless_positive
+/**
+ * raw_atomic_dec_unless_positive() - atomic decrement unless positive with full ordering
+ * @v: pointer to atomic_t
+ *
+ * If (@v <= 0), atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_dec_unless_positive() elsewhere.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_dec_unless_positive(atomic_t *v)
+raw_atomic_dec_unless_positive(atomic_t *v)
 {
-       int c = arch_atomic_read(v);
+#if defined(arch_atomic_dec_unless_positive)
+       return arch_atomic_dec_unless_positive(v);
+#else
+       int c = raw_atomic_read(v);
 
        do {
                if (unlikely(c > 0))
                        return false;
-       } while (!arch_atomic_try_cmpxchg(v, &c, c - 1));
+       } while (!raw_atomic_try_cmpxchg(v, &c, c - 1));
 
        return true;
-}
-#define arch_atomic_dec_unless_positive arch_atomic_dec_unless_positive
 #endif
+}
 
-#ifndef arch_atomic_dec_if_positive
+/**
+ * raw_atomic_dec_if_positive() - atomic decrement if positive with full ordering
+ * @v: pointer to atomic_t
+ *
+ * If (@v > 0), atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_dec_if_positive() elsewhere.
+ *
+ * Return: The old value of (@v - 1), regardless of whether @v was updated.
+ */
 static __always_inline int
-arch_atomic_dec_if_positive(atomic_t *v)
+raw_atomic_dec_if_positive(atomic_t *v)
 {
-       int dec, c = arch_atomic_read(v);
+#if defined(arch_atomic_dec_if_positive)
+       return arch_atomic_dec_if_positive(v);
+#else
+       int dec, c = raw_atomic_read(v);
 
        do {
                dec = c - 1;
                if (unlikely(dec < 0))
                        break;
-       } while (!arch_atomic_try_cmpxchg(v, &c, dec));
+       } while (!raw_atomic_try_cmpxchg(v, &c, dec));
 
        return dec;
-}
-#define arch_atomic_dec_if_positive arch_atomic_dec_if_positive
 #endif
+}
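
dec_if_positive() has the shape of a try-acquire on a counting resource: it only consumes a unit when one is available, and the sign of the return value tells the caller whether it did. A minimal sketch (the "slot" framing is hypothetical):

/* Hypothetical: claim one of a fixed pool of slots, if any remain. */
static inline bool sketch_claim_slot(atomic_t *free_slots)
{
        /* The decremented value is returned; negative means nothing was taken. */
        return raw_atomic_dec_if_positive(free_slots) >= 0;
}
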
 
 #ifdef CONFIG_GENERIC_ATOMIC64
 #include <asm-generic/atomic64.h>
 #endif
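
On 32-bit architectures without native 64-bit atomics, CONFIG_GENERIC_ATOMIC64 pulls in the asm-generic declarations backed by the library implementation in lib/atomic64.c, which serialises operations with a small hash of spinlocks, so the arch_atomic64_*() names used below remain available everywhere. The practical consequence is that a 64-bit counter which must stay coherent on 32-bit kernels should be an atomic64_t rather than a plain u64 updated with +=, which can tear. A short hedged sketch (the statistics counter is hypothetical):

#include <linux/atomic.h>

static atomic64_t sketch_bytes_rx = ATOMIC64_INIT(0);

static inline void sketch_account_rx(s64 len)
{
        raw_atomic64_add(len, &sketch_bytes_rx);        /* relaxed is enough here */
}

static inline s64 sketch_read_rx(void)
{
        return raw_atomic64_read(&sketch_bytes_rx);
}
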
 
-#ifndef arch_atomic64_read_acquire
+/**
+ * raw_atomic64_read() - atomic load with relaxed ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically loads the value of @v with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_read() elsewhere.
+ *
+ * Return: The value loaded from @v.
+ */
+static __always_inline s64
+raw_atomic64_read(const atomic64_t *v)
+{
+       return arch_atomic64_read(v);
+}
+
+/**
+ * raw_atomic64_read_acquire() - atomic load with acquire ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically loads the value of @v with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_read_acquire() elsewhere.
+ *
+ * Return: The value loaded from @v.
+ */
 static __always_inline s64
-arch_atomic64_read_acquire(const atomic64_t *v)
+raw_atomic64_read_acquire(const atomic64_t *v)
 {
+#if defined(arch_atomic64_read_acquire)
+       return arch_atomic64_read_acquire(v);
+#elif defined(arch_atomic64_read)
+       return arch_atomic64_read(v);
+#else
        s64 ret;
 
        if (__native_word(atomic64_t)) {
                ret = smp_load_acquire(&(v)->counter);
        } else {
-               ret = arch_atomic64_read(v);
+               ret = raw_atomic64_read(v);
                __atomic_acquire_fence();
        }
 
        return ret;
-}
-#define arch_atomic64_read_acquire arch_atomic64_read_acquire
 #endif
+}
+
+/**
+ * raw_atomic64_set() - atomic set with relaxed ordering
+ * @v: pointer to atomic64_t
+ * @i: s64 value to assign
+ *
+ * Atomically sets @v to @i with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_set() elsewhere.
+ *
+ * Return: Nothing.
+ */
+static __always_inline void
+raw_atomic64_set(atomic64_t *v, s64 i)
+{
+       arch_atomic64_set(v, i);
+}
 
-#ifndef arch_atomic64_set_release
+/**
+ * raw_atomic64_set_release() - atomic set with release ordering
+ * @v: pointer to atomic64_t
+ * @i: s64 value to assign
+ *
+ * Atomically sets @v to @i with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_set_release() elsewhere.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
-arch_atomic64_set_release(atomic64_t *v, s64 i)
+raw_atomic64_set_release(atomic64_t *v, s64 i)
 {
+#if defined(arch_atomic64_set_release)
+       arch_atomic64_set_release(v, i);
+#elif defined(arch_atomic64_set)
+       arch_atomic64_set(v, i);
+#else
        if (__native_word(atomic64_t)) {
                smp_store_release(&(v)->counter, i);
        } else {
                __atomic_release_fence();
-               arch_atomic64_set(v, i);
+               raw_atomic64_set(v, i);
        }
-}
-#define arch_atomic64_set_release arch_atomic64_set_release
 #endif
+}
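
Together, raw_atomic64_set_release() and raw_atomic64_read_acquire() above form a publish/consume handshake on a 64-bit flag: the writer's release store orders its earlier stores before the flag update, and the reader's acquire load orders the flag check before its later loads. A hedged two-thread sketch with invented names (ordinary code on plain variables would typically use smp_store_release()/smp_load_acquire() instead):

#include <linux/atomic.h>

static u64 sketch_payload;
static atomic64_t sketch_ready = ATOMIC64_INIT(0);

/* Writer: publish the payload, then raise the flag with release ordering. */
static void sketch_publish(u64 val)
{
        sketch_payload = val;
        raw_atomic64_set_release(&sketch_ready, 1);
}

/* Reader: only touch the payload after seeing the flag with acquire ordering. */
static bool sketch_consume(u64 *out)
{
        if (!raw_atomic64_read_acquire(&sketch_ready))
                return false;

        *out = sketch_payload;
        return true;
}
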
+
+/**
+ * raw_atomic64_add() - atomic add with relaxed ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_add() elsewhere.
+ *
+ * Return: Nothing.
+ */
+static __always_inline void
+raw_atomic64_add(s64 i, atomic64_t *v)
+{
+       arch_atomic64_add(i, v);
+}
 
-#ifndef arch_atomic64_add_return_relaxed
-#define arch_atomic64_add_return_acquire arch_atomic64_add_return
-#define arch_atomic64_add_return_release arch_atomic64_add_return
-#define arch_atomic64_add_return_relaxed arch_atomic64_add_return
-#else /* arch_atomic64_add_return_relaxed */
+/**
+ * raw_atomic64_add_return() - atomic add with full ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_add_return() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
+static __always_inline s64
+raw_atomic64_add_return(s64 i, atomic64_t *v)
+{
+#if defined(arch_atomic64_add_return)
+       return arch_atomic64_add_return(i, v);
+#elif defined(arch_atomic64_add_return_relaxed)
+       s64 ret;
+       __atomic_pre_full_fence();
+       ret = arch_atomic64_add_return_relaxed(i, v);
+       __atomic_post_full_fence();
+       return ret;
+#else
+#error "Unable to define raw_atomic64_add_return"
+#endif
+}
 
-#ifndef arch_atomic64_add_return_acquire
+/**
+ * raw_atomic64_add_return_acquire() - atomic add with acquire ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_add_return_acquire() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
-arch_atomic64_add_return_acquire(s64 i, atomic64_t *v)
+raw_atomic64_add_return_acquire(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_add_return_acquire)
+       return arch_atomic64_add_return_acquire(i, v);
+#elif defined(arch_atomic64_add_return_relaxed)
        s64 ret = arch_atomic64_add_return_relaxed(i, v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic64_add_return_acquire arch_atomic64_add_return_acquire
+#elif defined(arch_atomic64_add_return)
+       return arch_atomic64_add_return(i, v);
+#else
+#error "Unable to define raw_atomic64_add_return_acquire"
 #endif
+}
 
-#ifndef arch_atomic64_add_return_release
+/**
+ * raw_atomic64_add_return_release() - atomic add with release ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_add_return_release() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
-arch_atomic64_add_return_release(s64 i, atomic64_t *v)
+raw_atomic64_add_return_release(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_add_return_release)
+       return arch_atomic64_add_return_release(i, v);
+#elif defined(arch_atomic64_add_return_relaxed)
        __atomic_release_fence();
        return arch_atomic64_add_return_relaxed(i, v);
+#elif defined(arch_atomic64_add_return)
+       return arch_atomic64_add_return(i, v);
+#else
+#error "Unable to define raw_atomic64_add_return_release"
+#endif
 }
-#define arch_atomic64_add_return_release arch_atomic64_add_return_release
+
+/**
+ * raw_atomic64_add_return_relaxed() - atomic add with relaxed ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_add_return_relaxed() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
+static __always_inline s64
+raw_atomic64_add_return_relaxed(s64 i, atomic64_t *v)
+{
+#if defined(arch_atomic64_add_return_relaxed)
+       return arch_atomic64_add_return_relaxed(i, v);
+#elif defined(arch_atomic64_add_return)
+       return arch_atomic64_add_return(i, v);
+#else
+#error "Unable to define raw_atomic64_add_return_relaxed"
 #endif
+}
 
-#ifndef arch_atomic64_add_return
+/**
+ * raw_atomic64_fetch_add() - atomic add with full ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_add() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_add_return(s64 i, atomic64_t *v)
+raw_atomic64_fetch_add(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_add)
+       return arch_atomic64_fetch_add(i, v);
+#elif defined(arch_atomic64_fetch_add_relaxed)
        s64 ret;
        __atomic_pre_full_fence();
-       ret = arch_atomic64_add_return_relaxed(i, v);
+       ret = arch_atomic64_fetch_add_relaxed(i, v);
        __atomic_post_full_fence();
        return ret;
-}
-#define arch_atomic64_add_return arch_atomic64_add_return
+#else
+#error "Unable to define raw_atomic64_fetch_add"
 #endif
+}
 
-#endif /* arch_atomic64_add_return_relaxed */
-
-#ifndef arch_atomic64_fetch_add_relaxed
-#define arch_atomic64_fetch_add_acquire arch_atomic64_fetch_add
-#define arch_atomic64_fetch_add_release arch_atomic64_fetch_add
-#define arch_atomic64_fetch_add_relaxed arch_atomic64_fetch_add
-#else /* arch_atomic64_fetch_add_relaxed */
-
-#ifndef arch_atomic64_fetch_add_acquire
+/**
+ * raw_atomic64_fetch_add_acquire() - atomic add with acquire ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_add_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_add_acquire(s64 i, atomic64_t *v)
+raw_atomic64_fetch_add_acquire(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_add_acquire)
+       return arch_atomic64_fetch_add_acquire(i, v);
+#elif defined(arch_atomic64_fetch_add_relaxed)
        s64 ret = arch_atomic64_fetch_add_relaxed(i, v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic64_fetch_add_acquire arch_atomic64_fetch_add_acquire
+#elif defined(arch_atomic64_fetch_add)
+       return arch_atomic64_fetch_add(i, v);
+#else
+#error "Unable to define raw_atomic64_fetch_add_acquire"
 #endif
+}
 
-#ifndef arch_atomic64_fetch_add_release
+/**
+ * raw_atomic64_fetch_add_release() - atomic add with release ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_add_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_add_release(s64 i, atomic64_t *v)
+raw_atomic64_fetch_add_release(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_add_release)
+       return arch_atomic64_fetch_add_release(i, v);
+#elif defined(arch_atomic64_fetch_add_relaxed)
        __atomic_release_fence();
        return arch_atomic64_fetch_add_relaxed(i, v);
+#elif defined(arch_atomic64_fetch_add)
+       return arch_atomic64_fetch_add(i, v);
+#else
+#error "Unable to define raw_atomic64_fetch_add_release"
+#endif
 }
-#define arch_atomic64_fetch_add_release arch_atomic64_fetch_add_release
+
+/**
+ * raw_atomic64_fetch_add_relaxed() - atomic add with relaxed ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_add_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
+static __always_inline s64
+raw_atomic64_fetch_add_relaxed(s64 i, atomic64_t *v)
+{
+#if defined(arch_atomic64_fetch_add_relaxed)
+       return arch_atomic64_fetch_add_relaxed(i, v);
+#elif defined(arch_atomic64_fetch_add)
+       return arch_atomic64_fetch_add(i, v);
+#else
+#error "Unable to define raw_atomic64_fetch_add_relaxed"
 #endif
+}
+
+/**
+ * raw_atomic64_sub() - atomic subtract with relaxed ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_sub() elsewhere.
+ *
+ * Return: Nothing.
+ */
+static __always_inline void
+raw_atomic64_sub(s64 i, atomic64_t *v)
+{
+       arch_atomic64_sub(i, v);
+}
 
-#ifndef arch_atomic64_fetch_add
+/**
+ * raw_atomic64_sub_return() - atomic subtract with full ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_sub_return() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_add(s64 i, atomic64_t *v)
+raw_atomic64_sub_return(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_sub_return)
+       return arch_atomic64_sub_return(i, v);
+#elif defined(arch_atomic64_sub_return_relaxed)
        s64 ret;
        __atomic_pre_full_fence();
-       ret = arch_atomic64_fetch_add_relaxed(i, v);
+       ret = arch_atomic64_sub_return_relaxed(i, v);
        __atomic_post_full_fence();
        return ret;
-}
-#define arch_atomic64_fetch_add arch_atomic64_fetch_add
+#else
+#error "Unable to define raw_atomic64_sub_return"
 #endif
+}
 
-#endif /* arch_atomic64_fetch_add_relaxed */
-
-#ifndef arch_atomic64_sub_return_relaxed
-#define arch_atomic64_sub_return_acquire arch_atomic64_sub_return
-#define arch_atomic64_sub_return_release arch_atomic64_sub_return
-#define arch_atomic64_sub_return_relaxed arch_atomic64_sub_return
-#else /* arch_atomic64_sub_return_relaxed */
-
-#ifndef arch_atomic64_sub_return_acquire
+/**
+ * raw_atomic64_sub_return_acquire() - atomic subtract with acquire ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_sub_return_acquire() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
-arch_atomic64_sub_return_acquire(s64 i, atomic64_t *v)
+raw_atomic64_sub_return_acquire(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_sub_return_acquire)
+       return arch_atomic64_sub_return_acquire(i, v);
+#elif defined(arch_atomic64_sub_return_relaxed)
        s64 ret = arch_atomic64_sub_return_relaxed(i, v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic64_sub_return_acquire arch_atomic64_sub_return_acquire
+#elif defined(arch_atomic64_sub_return)
+       return arch_atomic64_sub_return(i, v);
+#else
+#error "Unable to define raw_atomic64_sub_return_acquire"
 #endif
+}
 
-#ifndef arch_atomic64_sub_return_release
+/**
+ * raw_atomic64_sub_return_release() - atomic subtract with release ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_sub_return_release() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
-arch_atomic64_sub_return_release(s64 i, atomic64_t *v)
+raw_atomic64_sub_return_release(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_sub_return_release)
+       return arch_atomic64_sub_return_release(i, v);
+#elif defined(arch_atomic64_sub_return_relaxed)
        __atomic_release_fence();
        return arch_atomic64_sub_return_relaxed(i, v);
+#elif defined(arch_atomic64_sub_return)
+       return arch_atomic64_sub_return(i, v);
+#else
+#error "Unable to define raw_atomic64_sub_return_release"
+#endif
 }
-#define arch_atomic64_sub_return_release arch_atomic64_sub_return_release
+
+/**
+ * raw_atomic64_sub_return_relaxed() - atomic subtract with relaxed ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_sub_return_relaxed() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
+static __always_inline s64
+raw_atomic64_sub_return_relaxed(s64 i, atomic64_t *v)
+{
+#if defined(arch_atomic64_sub_return_relaxed)
+       return arch_atomic64_sub_return_relaxed(i, v);
+#elif defined(arch_atomic64_sub_return)
+       return arch_atomic64_sub_return(i, v);
+#else
+#error "Unable to define raw_atomic64_sub_return_relaxed"
 #endif
+}
 
-#ifndef arch_atomic64_sub_return
+/**
+ * raw_atomic64_fetch_sub() - atomic subtract with full ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_sub() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_sub_return(s64 i, atomic64_t *v)
+raw_atomic64_fetch_sub(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_sub)
+       return arch_atomic64_fetch_sub(i, v);
+#elif defined(arch_atomic64_fetch_sub_relaxed)
        s64 ret;
        __atomic_pre_full_fence();
-       ret = arch_atomic64_sub_return_relaxed(i, v);
+       ret = arch_atomic64_fetch_sub_relaxed(i, v);
        __atomic_post_full_fence();
        return ret;
-}
-#define arch_atomic64_sub_return arch_atomic64_sub_return
+#else
+#error "Unable to define raw_atomic64_fetch_sub"
 #endif
+}
 
-#endif /* arch_atomic64_sub_return_relaxed */
-
-#ifndef arch_atomic64_fetch_sub_relaxed
-#define arch_atomic64_fetch_sub_acquire arch_atomic64_fetch_sub
-#define arch_atomic64_fetch_sub_release arch_atomic64_fetch_sub
-#define arch_atomic64_fetch_sub_relaxed arch_atomic64_fetch_sub
-#else /* arch_atomic64_fetch_sub_relaxed */
-
-#ifndef arch_atomic64_fetch_sub_acquire
+/**
+ * raw_atomic64_fetch_sub_acquire() - atomic subtract with acquire ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_sub_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_sub_acquire(s64 i, atomic64_t *v)
+raw_atomic64_fetch_sub_acquire(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_sub_acquire)
+       return arch_atomic64_fetch_sub_acquire(i, v);
+#elif defined(arch_atomic64_fetch_sub_relaxed)
        s64 ret = arch_atomic64_fetch_sub_relaxed(i, v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic64_fetch_sub_acquire arch_atomic64_fetch_sub_acquire
+#elif defined(arch_atomic64_fetch_sub)
+       return arch_atomic64_fetch_sub(i, v);
+#else
+#error "Unable to define raw_atomic64_fetch_sub_acquire"
 #endif
+}
 
-#ifndef arch_atomic64_fetch_sub_release
+/**
+ * raw_atomic64_fetch_sub_release() - atomic subtract with release ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_sub_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_sub_release(s64 i, atomic64_t *v)
+raw_atomic64_fetch_sub_release(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_sub_release)
+       return arch_atomic64_fetch_sub_release(i, v);
+#elif defined(arch_atomic64_fetch_sub_relaxed)
        __atomic_release_fence();
        return arch_atomic64_fetch_sub_relaxed(i, v);
-}
-#define arch_atomic64_fetch_sub_release arch_atomic64_fetch_sub_release
+#elif defined(arch_atomic64_fetch_sub)
+       return arch_atomic64_fetch_sub(i, v);
+#else
+#error "Unable to define raw_atomic64_fetch_sub_release"
 #endif
+}
 
-#ifndef arch_atomic64_fetch_sub
+/**
+ * raw_atomic64_fetch_sub_relaxed() - atomic subtract with relaxed ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_sub_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_sub(s64 i, atomic64_t *v)
+raw_atomic64_fetch_sub_relaxed(s64 i, atomic64_t *v)
 {
-       s64 ret;
-       __atomic_pre_full_fence();
-       ret = arch_atomic64_fetch_sub_relaxed(i, v);
-       __atomic_post_full_fence();
-       return ret;
-}
-#define arch_atomic64_fetch_sub arch_atomic64_fetch_sub
+#if defined(arch_atomic64_fetch_sub_relaxed)
+       return arch_atomic64_fetch_sub_relaxed(i, v);
+#elif defined(arch_atomic64_fetch_sub)
+       return arch_atomic64_fetch_sub(i, v);
+#else
+#error "Unable to define raw_atomic64_fetch_sub_relaxed"
 #endif
-
-#endif /* arch_atomic64_fetch_sub_relaxed */
-
-#ifndef arch_atomic64_inc
-static __always_inline void
-arch_atomic64_inc(atomic64_t *v)
-{
-       arch_atomic64_add(1, v);
 }
-#define arch_atomic64_inc arch_atomic64_inc
-#endif
 
-#ifndef arch_atomic64_inc_return_relaxed
-#ifdef arch_atomic64_inc_return
-#define arch_atomic64_inc_return_acquire arch_atomic64_inc_return
-#define arch_atomic64_inc_return_release arch_atomic64_inc_return
-#define arch_atomic64_inc_return_relaxed arch_atomic64_inc_return
-#endif /* arch_atomic64_inc_return */
-
-#ifndef arch_atomic64_inc_return
-static __always_inline s64
-arch_atomic64_inc_return(atomic64_t *v)
+/**
+ * raw_atomic64_inc() - atomic increment with relaxed ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + 1) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_inc() elsewhere.
+ *
+ * Return: Nothing.
+ */
+static __always_inline void
+raw_atomic64_inc(atomic64_t *v)
 {
-       return arch_atomic64_add_return(1, v);
-}
-#define arch_atomic64_inc_return arch_atomic64_inc_return
+#if defined(arch_atomic64_inc)
+       arch_atomic64_inc(v);
+#else
+       raw_atomic64_add(1, v);
 #endif
-
-#ifndef arch_atomic64_inc_return_acquire
-static __always_inline s64
-arch_atomic64_inc_return_acquire(atomic64_t *v)
-{
-       return arch_atomic64_add_return_acquire(1, v);
 }
-#define arch_atomic64_inc_return_acquire arch_atomic64_inc_return_acquire
-#endif
 
-#ifndef arch_atomic64_inc_return_release
+/**
+ * raw_atomic64_inc_return() - atomic increment with full ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_inc_return() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
-arch_atomic64_inc_return_release(atomic64_t *v)
+raw_atomic64_inc_return(atomic64_t *v)
 {
-       return arch_atomic64_add_return_release(1, v);
-}
-#define arch_atomic64_inc_return_release arch_atomic64_inc_return_release
+#if defined(arch_atomic64_inc_return)
+       return arch_atomic64_inc_return(v);
+#elif defined(arch_atomic64_inc_return_relaxed)
+       s64 ret;
+       __atomic_pre_full_fence();
+       ret = arch_atomic64_inc_return_relaxed(v);
+       __atomic_post_full_fence();
+       return ret;
+#else
+       return raw_atomic64_add_return(1, v);
 #endif
-
-#ifndef arch_atomic64_inc_return_relaxed
-static __always_inline s64
-arch_atomic64_inc_return_relaxed(atomic64_t *v)
-{
-       return arch_atomic64_add_return_relaxed(1, v);
 }
-#define arch_atomic64_inc_return_relaxed arch_atomic64_inc_return_relaxed
-#endif
 
-#else /* arch_atomic64_inc_return_relaxed */
-
-#ifndef arch_atomic64_inc_return_acquire
+/**
+ * raw_atomic64_inc_return_acquire() - atomic increment with acquire ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + 1) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_inc_return_acquire() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
-arch_atomic64_inc_return_acquire(atomic64_t *v)
+raw_atomic64_inc_return_acquire(atomic64_t *v)
 {
+#if defined(arch_atomic64_inc_return_acquire)
+       return arch_atomic64_inc_return_acquire(v);
+#elif defined(arch_atomic64_inc_return_relaxed)
        s64 ret = arch_atomic64_inc_return_relaxed(v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic64_inc_return_acquire arch_atomic64_inc_return_acquire
+#elif defined(arch_atomic64_inc_return)
+       return arch_atomic64_inc_return(v);
+#else
+       return raw_atomic64_add_return_acquire(1, v);
 #endif
+}
 
-#ifndef arch_atomic64_inc_return_release
+/**
+ * raw_atomic64_inc_return_release() - atomic increment with release ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + 1) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_inc_return_release() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
-arch_atomic64_inc_return_release(atomic64_t *v)
+raw_atomic64_inc_return_release(atomic64_t *v)
 {
+#if defined(arch_atomic64_inc_return_release)
+       return arch_atomic64_inc_return_release(v);
+#elif defined(arch_atomic64_inc_return_relaxed)
        __atomic_release_fence();
        return arch_atomic64_inc_return_relaxed(v);
-}
-#define arch_atomic64_inc_return_release arch_atomic64_inc_return_release
+#elif defined(arch_atomic64_inc_return)
+       return arch_atomic64_inc_return(v);
+#else
+       return raw_atomic64_add_return_release(1, v);
 #endif
-
-#ifndef arch_atomic64_inc_return
-static __always_inline s64
-arch_atomic64_inc_return(atomic64_t *v)
-{
-       s64 ret;
-       __atomic_pre_full_fence();
-       ret = arch_atomic64_inc_return_relaxed(v);
-       __atomic_post_full_fence();
-       return ret;
 }
-#define arch_atomic64_inc_return arch_atomic64_inc_return
-#endif
-
-#endif /* arch_atomic64_inc_return_relaxed */
 
-#ifndef arch_atomic64_fetch_inc_relaxed
-#ifdef arch_atomic64_fetch_inc
-#define arch_atomic64_fetch_inc_acquire arch_atomic64_fetch_inc
-#define arch_atomic64_fetch_inc_release arch_atomic64_fetch_inc
-#define arch_atomic64_fetch_inc_relaxed arch_atomic64_fetch_inc
-#endif /* arch_atomic64_fetch_inc */
-
-#ifndef arch_atomic64_fetch_inc
+/**
+ * raw_atomic64_inc_return_relaxed() - atomic increment with relaxed ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + 1) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_inc_return_relaxed() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_inc(atomic64_t *v)
+raw_atomic64_inc_return_relaxed(atomic64_t *v)
 {
-       return arch_atomic64_fetch_add(1, v);
-}
-#define arch_atomic64_fetch_inc arch_atomic64_fetch_inc
+#if defined(arch_atomic64_inc_return_relaxed)
+       return arch_atomic64_inc_return_relaxed(v);
+#elif defined(arch_atomic64_inc_return)
+       return arch_atomic64_inc_return(v);
+#else
+       return raw_atomic64_add_return_relaxed(1, v);
 #endif
-
-#ifndef arch_atomic64_fetch_inc_acquire
-static __always_inline s64
-arch_atomic64_fetch_inc_acquire(atomic64_t *v)
-{
-       return arch_atomic64_fetch_add_acquire(1, v);
 }
-#define arch_atomic64_fetch_inc_acquire arch_atomic64_fetch_inc_acquire
-#endif
 
-#ifndef arch_atomic64_fetch_inc_release
+/**
+ * raw_atomic64_fetch_inc() - atomic increment with full ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_inc() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_inc_release(atomic64_t *v)
+raw_atomic64_fetch_inc(atomic64_t *v)
 {
-       return arch_atomic64_fetch_add_release(1, v);
-}
-#define arch_atomic64_fetch_inc_release arch_atomic64_fetch_inc_release
+#if defined(arch_atomic64_fetch_inc)
+       return arch_atomic64_fetch_inc(v);
+#elif defined(arch_atomic64_fetch_inc_relaxed)
+       s64 ret;
+       __atomic_pre_full_fence();
+       ret = arch_atomic64_fetch_inc_relaxed(v);
+       __atomic_post_full_fence();
+       return ret;
+#else
+       return raw_atomic64_fetch_add(1, v);
 #endif
-
-#ifndef arch_atomic64_fetch_inc_relaxed
-static __always_inline s64
-arch_atomic64_fetch_inc_relaxed(atomic64_t *v)
-{
-       return arch_atomic64_fetch_add_relaxed(1, v);
 }
-#define arch_atomic64_fetch_inc_relaxed arch_atomic64_fetch_inc_relaxed
-#endif
-
-#else /* arch_atomic64_fetch_inc_relaxed */
 
-#ifndef arch_atomic64_fetch_inc_acquire
+/**
+ * raw_atomic64_fetch_inc_acquire() - atomic increment with acquire ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + 1) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_inc_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_inc_acquire(atomic64_t *v)
+raw_atomic64_fetch_inc_acquire(atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_inc_acquire)
+       return arch_atomic64_fetch_inc_acquire(v);
+#elif defined(arch_atomic64_fetch_inc_relaxed)
        s64 ret = arch_atomic64_fetch_inc_relaxed(v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic64_fetch_inc_acquire arch_atomic64_fetch_inc_acquire
+#elif defined(arch_atomic64_fetch_inc)
+       return arch_atomic64_fetch_inc(v);
+#else
+       return raw_atomic64_fetch_add_acquire(1, v);
 #endif
+}
 
-#ifndef arch_atomic64_fetch_inc_release
+/**
+ * raw_atomic64_fetch_inc_release() - atomic increment with release ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + 1) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_inc_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_inc_release(atomic64_t *v)
+raw_atomic64_fetch_inc_release(atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_inc_release)
+       return arch_atomic64_fetch_inc_release(v);
+#elif defined(arch_atomic64_fetch_inc_relaxed)
        __atomic_release_fence();
        return arch_atomic64_fetch_inc_relaxed(v);
-}
-#define arch_atomic64_fetch_inc_release arch_atomic64_fetch_inc_release
-#endif
-
-#ifndef arch_atomic64_fetch_inc
-static __always_inline s64
-arch_atomic64_fetch_inc(atomic64_t *v)
-{
-       s64 ret;
-       __atomic_pre_full_fence();
-       ret = arch_atomic64_fetch_inc_relaxed(v);
-       __atomic_post_full_fence();
-       return ret;
-}
-#define arch_atomic64_fetch_inc arch_atomic64_fetch_inc
+#elif defined(arch_atomic64_fetch_inc)
+       return arch_atomic64_fetch_inc(v);
+#else
+       return raw_atomic64_fetch_add_release(1, v);
 #endif
-
-#endif /* arch_atomic64_fetch_inc_relaxed */
-
-#ifndef arch_atomic64_dec
-static __always_inline void
-arch_atomic64_dec(atomic64_t *v)
-{
-       arch_atomic64_sub(1, v);
 }
-#define arch_atomic64_dec arch_atomic64_dec
-#endif
 
-#ifndef arch_atomic64_dec_return_relaxed
-#ifdef arch_atomic64_dec_return
-#define arch_atomic64_dec_return_acquire arch_atomic64_dec_return
-#define arch_atomic64_dec_return_release arch_atomic64_dec_return
-#define arch_atomic64_dec_return_relaxed arch_atomic64_dec_return
-#endif /* arch_atomic64_dec_return */
-
-#ifndef arch_atomic64_dec_return
+/**
+ * raw_atomic64_fetch_inc_relaxed() - atomic increment with relaxed ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + 1) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_inc_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_dec_return(atomic64_t *v)
+raw_atomic64_fetch_inc_relaxed(atomic64_t *v)
 {
-       return arch_atomic64_sub_return(1, v);
-}
-#define arch_atomic64_dec_return arch_atomic64_dec_return
+#if defined(arch_atomic64_fetch_inc_relaxed)
+       return arch_atomic64_fetch_inc_relaxed(v);
+#elif defined(arch_atomic64_fetch_inc)
+       return arch_atomic64_fetch_inc(v);
+#else
+       return raw_atomic64_fetch_add_relaxed(1, v);
 #endif
+}
 
-#ifndef arch_atomic64_dec_return_acquire
-static __always_inline s64
-arch_atomic64_dec_return_acquire(atomic64_t *v)
+/**
+ * raw_atomic64_dec() - atomic decrement with relaxed ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - 1) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_dec() elsewhere.
+ *
+ * Return: Nothing.
+ */
+static __always_inline void
+raw_atomic64_dec(atomic64_t *v)
 {
-       return arch_atomic64_sub_return_acquire(1, v);
-}
-#define arch_atomic64_dec_return_acquire arch_atomic64_dec_return_acquire
+#if defined(arch_atomic64_dec)
+       arch_atomic64_dec(v);
+#else
+       raw_atomic64_sub(1, v);
 #endif
-
-#ifndef arch_atomic64_dec_return_release
-static __always_inline s64
-arch_atomic64_dec_return_release(atomic64_t *v)
-{
-       return arch_atomic64_sub_return_release(1, v);
 }
-#define arch_atomic64_dec_return_release arch_atomic64_dec_return_release
-#endif
 
-#ifndef arch_atomic64_dec_return_relaxed
+/**
+ * raw_atomic64_dec_return() - atomic decrement with full ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_dec_return() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
-arch_atomic64_dec_return_relaxed(atomic64_t *v)
+raw_atomic64_dec_return(atomic64_t *v)
 {
-       return arch_atomic64_sub_return_relaxed(1, v);
-}
-#define arch_atomic64_dec_return_relaxed arch_atomic64_dec_return_relaxed
+#if defined(arch_atomic64_dec_return)
+       return arch_atomic64_dec_return(v);
+#elif defined(arch_atomic64_dec_return_relaxed)
+       s64 ret;
+       __atomic_pre_full_fence();
+       ret = arch_atomic64_dec_return_relaxed(v);
+       __atomic_post_full_fence();
+       return ret;
+#else
+       return raw_atomic64_sub_return(1, v);
 #endif
+}
 
-#else /* arch_atomic64_dec_return_relaxed */
-
-#ifndef arch_atomic64_dec_return_acquire
+/**
+ * raw_atomic64_dec_return_acquire() - atomic decrement with acquire ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - 1) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_dec_return_acquire() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
-arch_atomic64_dec_return_acquire(atomic64_t *v)
+raw_atomic64_dec_return_acquire(atomic64_t *v)
 {
+#if defined(arch_atomic64_dec_return_acquire)
+       return arch_atomic64_dec_return_acquire(v);
+#elif defined(arch_atomic64_dec_return_relaxed)
        s64 ret = arch_atomic64_dec_return_relaxed(v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic64_dec_return_acquire arch_atomic64_dec_return_acquire
+#elif defined(arch_atomic64_dec_return)
+       return arch_atomic64_dec_return(v);
+#else
+       return raw_atomic64_sub_return_acquire(1, v);
 #endif
+}
 
-#ifndef arch_atomic64_dec_return_release
+/**
+ * raw_atomic64_dec_return_release() - atomic decrement with release ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - 1) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_dec_return_release() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
-arch_atomic64_dec_return_release(atomic64_t *v)
+raw_atomic64_dec_return_release(atomic64_t *v)
 {
+#if defined(arch_atomic64_dec_return_release)
+       return arch_atomic64_dec_return_release(v);
+#elif defined(arch_atomic64_dec_return_relaxed)
        __atomic_release_fence();
        return arch_atomic64_dec_return_relaxed(v);
+#elif defined(arch_atomic64_dec_return)
+       return arch_atomic64_dec_return(v);
+#else
+       return raw_atomic64_sub_return_release(1, v);
+#endif
 }
-#define arch_atomic64_dec_return_release arch_atomic64_dec_return_release
+
+/**
+ * raw_atomic64_dec_return_relaxed() - atomic decrement with relaxed ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - 1) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_dec_return_relaxed() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
+static __always_inline s64
+raw_atomic64_dec_return_relaxed(atomic64_t *v)
+{
+#if defined(arch_atomic64_dec_return_relaxed)
+       return arch_atomic64_dec_return_relaxed(v);
+#elif defined(arch_atomic64_dec_return)
+       return arch_atomic64_dec_return(v);
+#else
+       return raw_atomic64_sub_return_relaxed(1, v);
 #endif
+}
 
-#ifndef arch_atomic64_dec_return
+/**
+ * raw_atomic64_fetch_dec() - atomic decrement with full ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_dec() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_dec_return(atomic64_t *v)
+raw_atomic64_fetch_dec(atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_dec)
+       return arch_atomic64_fetch_dec(v);
+#elif defined(arch_atomic64_fetch_dec_relaxed)
        s64 ret;
        __atomic_pre_full_fence();
-       ret = arch_atomic64_dec_return_relaxed(v);
+       ret = arch_atomic64_fetch_dec_relaxed(v);
        __atomic_post_full_fence();
        return ret;
-}
-#define arch_atomic64_dec_return arch_atomic64_dec_return
+#else
+       return raw_atomic64_fetch_sub(1, v);
 #endif
-
-#endif /* arch_atomic64_dec_return_relaxed */
-
-#ifndef arch_atomic64_fetch_dec_relaxed
-#ifdef arch_atomic64_fetch_dec
-#define arch_atomic64_fetch_dec_acquire arch_atomic64_fetch_dec
-#define arch_atomic64_fetch_dec_release arch_atomic64_fetch_dec
-#define arch_atomic64_fetch_dec_relaxed arch_atomic64_fetch_dec
-#endif /* arch_atomic64_fetch_dec */
-
-#ifndef arch_atomic64_fetch_dec
-static __always_inline s64
-arch_atomic64_fetch_dec(atomic64_t *v)
-{
-       return arch_atomic64_fetch_sub(1, v);
 }
-#define arch_atomic64_fetch_dec arch_atomic64_fetch_dec
-#endif
 
-#ifndef arch_atomic64_fetch_dec_acquire
+/**
+ * raw_atomic64_fetch_dec_acquire() - atomic decrement with acquire ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - 1) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_dec_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_dec_acquire(atomic64_t *v)
+raw_atomic64_fetch_dec_acquire(atomic64_t *v)
 {
-       return arch_atomic64_fetch_sub_acquire(1, v);
-}
-#define arch_atomic64_fetch_dec_acquire arch_atomic64_fetch_dec_acquire
+#if defined(arch_atomic64_fetch_dec_acquire)
+       return arch_atomic64_fetch_dec_acquire(v);
+#elif defined(arch_atomic64_fetch_dec_relaxed)
+       s64 ret = arch_atomic64_fetch_dec_relaxed(v);
+       __atomic_acquire_fence();
+       return ret;
+#elif defined(arch_atomic64_fetch_dec)
+       return arch_atomic64_fetch_dec(v);
+#else
+       return raw_atomic64_fetch_sub_acquire(1, v);
 #endif
-
-#ifndef arch_atomic64_fetch_dec_release
-static __always_inline s64
-arch_atomic64_fetch_dec_release(atomic64_t *v)
-{
-       return arch_atomic64_fetch_sub_release(1, v);
 }
-#define arch_atomic64_fetch_dec_release arch_atomic64_fetch_dec_release
-#endif
 
-#ifndef arch_atomic64_fetch_dec_relaxed
+/**
+ * raw_atomic64_fetch_dec_release() - atomic decrement with release ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - 1) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_dec_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_dec_relaxed(atomic64_t *v)
+raw_atomic64_fetch_dec_release(atomic64_t *v)
 {
-       return arch_atomic64_fetch_sub_relaxed(1, v);
-}
-#define arch_atomic64_fetch_dec_relaxed arch_atomic64_fetch_dec_relaxed
+#if defined(arch_atomic64_fetch_dec_release)
+       return arch_atomic64_fetch_dec_release(v);
+#elif defined(arch_atomic64_fetch_dec_relaxed)
+       __atomic_release_fence();
+       return arch_atomic64_fetch_dec_relaxed(v);
+#elif defined(arch_atomic64_fetch_dec)
+       return arch_atomic64_fetch_dec(v);
+#else
+       return raw_atomic64_fetch_sub_release(1, v);
 #endif
+}
 
-#else /* arch_atomic64_fetch_dec_relaxed */
-
-#ifndef arch_atomic64_fetch_dec_acquire
+/**
+ * raw_atomic64_fetch_dec_relaxed() - atomic decrement with relaxed ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - 1) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_dec_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_dec_acquire(atomic64_t *v)
+raw_atomic64_fetch_dec_relaxed(atomic64_t *v)
 {
-       s64 ret = arch_atomic64_fetch_dec_relaxed(v);
-       __atomic_acquire_fence();
-       return ret;
-}
-#define arch_atomic64_fetch_dec_acquire arch_atomic64_fetch_dec_acquire
+#if defined(arch_atomic64_fetch_dec_relaxed)
+       return arch_atomic64_fetch_dec_relaxed(v);
+#elif defined(arch_atomic64_fetch_dec)
+       return arch_atomic64_fetch_dec(v);
+#else
+       return raw_atomic64_fetch_sub_relaxed(1, v);
 #endif
+}
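
Illustration only, not part of the diff: a minimal use of the fetch_dec() fallback defined above, assuming a kernel build context. The counter and function names are hypothetical.

#include <linux/atomic.h>

/* Hypothetical nesting counter; the raw_ op keeps this usable from noinstr code. */
static atomic64_t example_nesting = ATOMIC64_INIT(0);

static s64 example_exit_nesting(void)
{
	/* Returns the value held *before* the decrement, per the kerneldoc. */
	return raw_atomic64_fetch_dec(&example_nesting);
}
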
 
-#ifndef arch_atomic64_fetch_dec_release
-static __always_inline s64
-arch_atomic64_fetch_dec_release(atomic64_t *v)
+/**
+ * raw_atomic64_and() - atomic bitwise AND with relaxed ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_and() elsewhere.
+ *
+ * Return: Nothing.
+ */
+static __always_inline void
+raw_atomic64_and(s64 i, atomic64_t *v)
 {
-       __atomic_release_fence();
-       return arch_atomic64_fetch_dec_relaxed(v);
+       arch_atomic64_and(i, v);
 }
-#define arch_atomic64_fetch_dec_release arch_atomic64_fetch_dec_release
-#endif
 
-#ifndef arch_atomic64_fetch_dec
+/**
+ * raw_atomic64_fetch_and() - atomic bitwise AND with full ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_and() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_dec(atomic64_t *v)
+raw_atomic64_fetch_and(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_and)
+       return arch_atomic64_fetch_and(i, v);
+#elif defined(arch_atomic64_fetch_and_relaxed)
        s64 ret;
        __atomic_pre_full_fence();
-       ret = arch_atomic64_fetch_dec_relaxed(v);
+       ret = arch_atomic64_fetch_and_relaxed(i, v);
        __atomic_post_full_fence();
        return ret;
-}
-#define arch_atomic64_fetch_dec arch_atomic64_fetch_dec
+#else
+#error "Unable to define raw_atomic64_fetch_and"
 #endif
+}
 
-#endif /* arch_atomic64_fetch_dec_relaxed */
-
-#ifndef arch_atomic64_fetch_and_relaxed
-#define arch_atomic64_fetch_and_acquire arch_atomic64_fetch_and
-#define arch_atomic64_fetch_and_release arch_atomic64_fetch_and
-#define arch_atomic64_fetch_and_relaxed arch_atomic64_fetch_and
-#else /* arch_atomic64_fetch_and_relaxed */
-
-#ifndef arch_atomic64_fetch_and_acquire
+/**
+ * raw_atomic64_fetch_and_acquire() - atomic bitwise AND with acquire ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_and_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_and_acquire(s64 i, atomic64_t *v)
+raw_atomic64_fetch_and_acquire(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_and_acquire)
+       return arch_atomic64_fetch_and_acquire(i, v);
+#elif defined(arch_atomic64_fetch_and_relaxed)
        s64 ret = arch_atomic64_fetch_and_relaxed(i, v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic64_fetch_and_acquire arch_atomic64_fetch_and_acquire
+#elif defined(arch_atomic64_fetch_and)
+       return arch_atomic64_fetch_and(i, v);
+#else
+#error "Unable to define raw_atomic64_fetch_and_acquire"
 #endif
+}
 
-#ifndef arch_atomic64_fetch_and_release
+/**
+ * raw_atomic64_fetch_and_release() - atomic bitwise AND with release ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_and_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_and_release(s64 i, atomic64_t *v)
+raw_atomic64_fetch_and_release(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_and_release)
+       return arch_atomic64_fetch_and_release(i, v);
+#elif defined(arch_atomic64_fetch_and_relaxed)
        __atomic_release_fence();
        return arch_atomic64_fetch_and_relaxed(i, v);
-}
-#define arch_atomic64_fetch_and_release arch_atomic64_fetch_and_release
+#elif defined(arch_atomic64_fetch_and)
+       return arch_atomic64_fetch_and(i, v);
+#else
+#error "Unable to define raw_atomic64_fetch_and_release"
 #endif
+}
 
-#ifndef arch_atomic64_fetch_and
+/**
+ * raw_atomic64_fetch_and_relaxed() - atomic bitwise AND with relaxed ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_and_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_and(s64 i, atomic64_t *v)
+raw_atomic64_fetch_and_relaxed(s64 i, atomic64_t *v)
 {
-       s64 ret;
-       __atomic_pre_full_fence();
-       ret = arch_atomic64_fetch_and_relaxed(i, v);
-       __atomic_post_full_fence();
-       return ret;
-}
-#define arch_atomic64_fetch_and arch_atomic64_fetch_and
+#if defined(arch_atomic64_fetch_and_relaxed)
+       return arch_atomic64_fetch_and_relaxed(i, v);
+#elif defined(arch_atomic64_fetch_and)
+       return arch_atomic64_fetch_and(i, v);
+#else
+#error "Unable to define raw_atomic64_fetch_and_relaxed"
 #endif
-
-#endif /* arch_atomic64_fetch_and_relaxed */
-
-#ifndef arch_atomic64_andnot
-static __always_inline void
-arch_atomic64_andnot(s64 i, atomic64_t *v)
-{
-       arch_atomic64_and(~i, v);
 }
-#define arch_atomic64_andnot arch_atomic64_andnot
-#endif
-
-#ifndef arch_atomic64_fetch_andnot_relaxed
-#ifdef arch_atomic64_fetch_andnot
-#define arch_atomic64_fetch_andnot_acquire arch_atomic64_fetch_andnot
-#define arch_atomic64_fetch_andnot_release arch_atomic64_fetch_andnot
-#define arch_atomic64_fetch_andnot_relaxed arch_atomic64_fetch_andnot
-#endif /* arch_atomic64_fetch_andnot */
 
-#ifndef arch_atomic64_fetch_andnot
-static __always_inline s64
-arch_atomic64_fetch_andnot(s64 i, atomic64_t *v)
+/**
+ * raw_atomic64_andnot() - atomic bitwise AND NOT with relaxed ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & ~@i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_andnot() elsewhere.
+ *
+ * Return: Nothing.
+ */
+static __always_inline void
+raw_atomic64_andnot(s64 i, atomic64_t *v)
 {
-       return arch_atomic64_fetch_and(~i, v);
-}
-#define arch_atomic64_fetch_andnot arch_atomic64_fetch_andnot
+#if defined(arch_atomic64_andnot)
+       arch_atomic64_andnot(i, v);
+#else
+       raw_atomic64_and(~i, v);
 #endif
-
-#ifndef arch_atomic64_fetch_andnot_acquire
-static __always_inline s64
-arch_atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
-{
-       return arch_atomic64_fetch_and_acquire(~i, v);
 }
-#define arch_atomic64_fetch_andnot_acquire arch_atomic64_fetch_andnot_acquire
-#endif
 
-#ifndef arch_atomic64_fetch_andnot_release
+/**
+ * raw_atomic64_fetch_andnot() - atomic bitwise AND NOT with full ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & ~@i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_andnot() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
+raw_atomic64_fetch_andnot(s64 i, atomic64_t *v)
 {
-       return arch_atomic64_fetch_and_release(~i, v);
-}
-#define arch_atomic64_fetch_andnot_release arch_atomic64_fetch_andnot_release
+#if defined(arch_atomic64_fetch_andnot)
+       return arch_atomic64_fetch_andnot(i, v);
+#elif defined(arch_atomic64_fetch_andnot_relaxed)
+       s64 ret;
+       __atomic_pre_full_fence();
+       ret = arch_atomic64_fetch_andnot_relaxed(i, v);
+       __atomic_post_full_fence();
+       return ret;
+#else
+       return raw_atomic64_fetch_and(~i, v);
 #endif
-
-#ifndef arch_atomic64_fetch_andnot_relaxed
-static __always_inline s64
-arch_atomic64_fetch_andnot_relaxed(s64 i, atomic64_t *v)
-{
-       return arch_atomic64_fetch_and_relaxed(~i, v);
 }
-#define arch_atomic64_fetch_andnot_relaxed arch_atomic64_fetch_andnot_relaxed
-#endif
-
-#else /* arch_atomic64_fetch_andnot_relaxed */
 
-#ifndef arch_atomic64_fetch_andnot_acquire
+/**
+ * raw_atomic64_fetch_andnot_acquire() - atomic bitwise AND NOT with acquire ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & ~@i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_andnot_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
+raw_atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_andnot_acquire)
+       return arch_atomic64_fetch_andnot_acquire(i, v);
+#elif defined(arch_atomic64_fetch_andnot_relaxed)
        s64 ret = arch_atomic64_fetch_andnot_relaxed(i, v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic64_fetch_andnot_acquire arch_atomic64_fetch_andnot_acquire
+#elif defined(arch_atomic64_fetch_andnot)
+       return arch_atomic64_fetch_andnot(i, v);
+#else
+       return raw_atomic64_fetch_and_acquire(~i, v);
 #endif
+}
 
-#ifndef arch_atomic64_fetch_andnot_release
+/**
+ * raw_atomic64_fetch_andnot_release() - atomic bitwise AND NOT with release ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & ~@i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_andnot_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
+raw_atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_andnot_release)
+       return arch_atomic64_fetch_andnot_release(i, v);
+#elif defined(arch_atomic64_fetch_andnot_relaxed)
        __atomic_release_fence();
        return arch_atomic64_fetch_andnot_relaxed(i, v);
+#elif defined(arch_atomic64_fetch_andnot)
+       return arch_atomic64_fetch_andnot(i, v);
+#else
+       return raw_atomic64_fetch_and_release(~i, v);
+#endif
 }
-#define arch_atomic64_fetch_andnot_release arch_atomic64_fetch_andnot_release
+
+/**
+ * raw_atomic64_fetch_andnot_relaxed() - atomic bitwise AND NOT with relaxed ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & ~@i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_andnot_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
+static __always_inline s64
+raw_atomic64_fetch_andnot_relaxed(s64 i, atomic64_t *v)
+{
+#if defined(arch_atomic64_fetch_andnot_relaxed)
+       return arch_atomic64_fetch_andnot_relaxed(i, v);
+#elif defined(arch_atomic64_fetch_andnot)
+       return arch_atomic64_fetch_andnot(i, v);
+#else
+       return raw_atomic64_fetch_and_relaxed(~i, v);
 #endif
+}
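
Illustration only, not part of the diff: clearing a flag bit with the fetch_andnot() helper generated above; the flag and variable names are hypothetical.

#include <linux/atomic.h>
#include <linux/bits.h>

#define EX_FLAG_PENDING	BIT_ULL(0)

static atomic64_t example_flags = ATOMIC64_INIT(0);

static bool example_clear_pending(void)
{
	/* The old value is returned, so the caller learns whether the bit was set. */
	s64 old = raw_atomic64_fetch_andnot(EX_FLAG_PENDING, &example_flags);

	return old & EX_FLAG_PENDING;
}
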
+
+/**
+ * raw_atomic64_or() - atomic bitwise OR with relaxed ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v | @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_or() elsewhere.
+ *
+ * Return: Nothing.
+ */
+static __always_inline void
+raw_atomic64_or(s64 i, atomic64_t *v)
+{
+       arch_atomic64_or(i, v);
+}
 
-#ifndef arch_atomic64_fetch_andnot
+/**
+ * raw_atomic64_fetch_or() - atomic bitwise OR with full ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v | @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_or() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_andnot(s64 i, atomic64_t *v)
+raw_atomic64_fetch_or(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_or)
+       return arch_atomic64_fetch_or(i, v);
+#elif defined(arch_atomic64_fetch_or_relaxed)
        s64 ret;
        __atomic_pre_full_fence();
-       ret = arch_atomic64_fetch_andnot_relaxed(i, v);
+       ret = arch_atomic64_fetch_or_relaxed(i, v);
        __atomic_post_full_fence();
        return ret;
-}
-#define arch_atomic64_fetch_andnot arch_atomic64_fetch_andnot
+#else
+#error "Unable to define raw_atomic64_fetch_or"
 #endif
+}
 
-#endif /* arch_atomic64_fetch_andnot_relaxed */
-
-#ifndef arch_atomic64_fetch_or_relaxed
-#define arch_atomic64_fetch_or_acquire arch_atomic64_fetch_or
-#define arch_atomic64_fetch_or_release arch_atomic64_fetch_or
-#define arch_atomic64_fetch_or_relaxed arch_atomic64_fetch_or
-#else /* arch_atomic64_fetch_or_relaxed */
-
-#ifndef arch_atomic64_fetch_or_acquire
+/**
+ * raw_atomic64_fetch_or_acquire() - atomic bitwise OR with acquire ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v | @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_or_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_or_acquire(s64 i, atomic64_t *v)
+raw_atomic64_fetch_or_acquire(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_or_acquire)
+       return arch_atomic64_fetch_or_acquire(i, v);
+#elif defined(arch_atomic64_fetch_or_relaxed)
        s64 ret = arch_atomic64_fetch_or_relaxed(i, v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic64_fetch_or_acquire arch_atomic64_fetch_or_acquire
+#elif defined(arch_atomic64_fetch_or)
+       return arch_atomic64_fetch_or(i, v);
+#else
+#error "Unable to define raw_atomic64_fetch_or_acquire"
 #endif
+}
 
-#ifndef arch_atomic64_fetch_or_release
+/**
+ * raw_atomic64_fetch_or_release() - atomic bitwise OR with release ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v | @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_or_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_or_release(s64 i, atomic64_t *v)
+raw_atomic64_fetch_or_release(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_or_release)
+       return arch_atomic64_fetch_or_release(i, v);
+#elif defined(arch_atomic64_fetch_or_relaxed)
        __atomic_release_fence();
        return arch_atomic64_fetch_or_relaxed(i, v);
+#elif defined(arch_atomic64_fetch_or)
+       return arch_atomic64_fetch_or(i, v);
+#else
+#error "Unable to define raw_atomic64_fetch_or_release"
+#endif
 }
-#define arch_atomic64_fetch_or_release arch_atomic64_fetch_or_release
+
+/**
+ * raw_atomic64_fetch_or_relaxed() - atomic bitwise OR with relaxed ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v | @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_or_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
+static __always_inline s64
+raw_atomic64_fetch_or_relaxed(s64 i, atomic64_t *v)
+{
+#if defined(arch_atomic64_fetch_or_relaxed)
+       return arch_atomic64_fetch_or_relaxed(i, v);
+#elif defined(arch_atomic64_fetch_or)
+       return arch_atomic64_fetch_or(i, v);
+#else
+#error "Unable to define raw_atomic64_fetch_or_relaxed"
 #endif
+}
+
+/**
+ * raw_atomic64_xor() - atomic bitwise XOR with relaxed ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v ^ @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_xor() elsewhere.
+ *
+ * Return: Nothing.
+ */
+static __always_inline void
+raw_atomic64_xor(s64 i, atomic64_t *v)
+{
+       arch_atomic64_xor(i, v);
+}
 
-#ifndef arch_atomic64_fetch_or
+/**
+ * raw_atomic64_fetch_xor() - atomic bitwise XOR with full ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v ^ @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_xor() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_or(s64 i, atomic64_t *v)
+raw_atomic64_fetch_xor(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_xor)
+       return arch_atomic64_fetch_xor(i, v);
+#elif defined(arch_atomic64_fetch_xor_relaxed)
        s64 ret;
        __atomic_pre_full_fence();
-       ret = arch_atomic64_fetch_or_relaxed(i, v);
+       ret = arch_atomic64_fetch_xor_relaxed(i, v);
        __atomic_post_full_fence();
        return ret;
-}
-#define arch_atomic64_fetch_or arch_atomic64_fetch_or
-#endif
-
-#endif /* arch_atomic64_fetch_or_relaxed */
-
-#ifndef arch_atomic64_fetch_xor_relaxed
-#define arch_atomic64_fetch_xor_acquire arch_atomic64_fetch_xor
-#define arch_atomic64_fetch_xor_release arch_atomic64_fetch_xor
-#define arch_atomic64_fetch_xor_relaxed arch_atomic64_fetch_xor
-#else /* arch_atomic64_fetch_xor_relaxed */
+#else
+#error "Unable to define raw_atomic64_fetch_xor"
+#endif
+}
 
-#ifndef arch_atomic64_fetch_xor_acquire
+/**
+ * raw_atomic64_fetch_xor_acquire() - atomic bitwise XOR with acquire ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v ^ @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_xor_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_xor_acquire(s64 i, atomic64_t *v)
+raw_atomic64_fetch_xor_acquire(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_xor_acquire)
+       return arch_atomic64_fetch_xor_acquire(i, v);
+#elif defined(arch_atomic64_fetch_xor_relaxed)
        s64 ret = arch_atomic64_fetch_xor_relaxed(i, v);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic64_fetch_xor_acquire arch_atomic64_fetch_xor_acquire
+#elif defined(arch_atomic64_fetch_xor)
+       return arch_atomic64_fetch_xor(i, v);
+#else
+#error "Unable to define raw_atomic64_fetch_xor_acquire"
 #endif
+}
 
-#ifndef arch_atomic64_fetch_xor_release
+/**
+ * raw_atomic64_fetch_xor_release() - atomic bitwise XOR with release ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v ^ @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_xor_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_xor_release(s64 i, atomic64_t *v)
+raw_atomic64_fetch_xor_release(s64 i, atomic64_t *v)
 {
+#if defined(arch_atomic64_fetch_xor_release)
+       return arch_atomic64_fetch_xor_release(i, v);
+#elif defined(arch_atomic64_fetch_xor_relaxed)
        __atomic_release_fence();
        return arch_atomic64_fetch_xor_relaxed(i, v);
+#elif defined(arch_atomic64_fetch_xor)
+       return arch_atomic64_fetch_xor(i, v);
+#else
+#error "Unable to define raw_atomic64_fetch_xor_release"
+#endif
 }
-#define arch_atomic64_fetch_xor_release arch_atomic64_fetch_xor_release
+
+/**
+ * raw_atomic64_fetch_xor_relaxed() - atomic bitwise XOR with relaxed ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v ^ @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_fetch_xor_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
+static __always_inline s64
+raw_atomic64_fetch_xor_relaxed(s64 i, atomic64_t *v)
+{
+#if defined(arch_atomic64_fetch_xor_relaxed)
+       return arch_atomic64_fetch_xor_relaxed(i, v);
+#elif defined(arch_atomic64_fetch_xor)
+       return arch_atomic64_fetch_xor(i, v);
+#else
+#error "Unable to define raw_atomic64_fetch_xor_relaxed"
 #endif
+}
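
Illustration only, not part of the diff: the relaxed bitwise helpers above used to set and toggle bits in a hypothetical mask.

#include <linux/atomic.h>
#include <linux/bits.h>

static atomic64_t example_mask = ATOMIC64_INIT(0);

static void example_update_mask(void)
{
	raw_atomic64_or(BIT_ULL(0), &example_mask);	/* set bit 0  */
	raw_atomic64_xor(BIT_ULL(1), &example_mask);	/* flip bit 1 */
}
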
 
-#ifndef arch_atomic64_fetch_xor
+/**
+ * raw_atomic64_xchg() - atomic exchange with full ordering
+ * @v: pointer to atomic64_t
+ * @new: s64 value to assign
+ *
+ * Atomically updates @v to @new with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_xchg() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
+raw_atomic64_xchg(atomic64_t *v, s64 new)
 {
+#if defined(arch_atomic64_xchg)
+       return arch_atomic64_xchg(v, new);
+#elif defined(arch_atomic64_xchg_relaxed)
        s64 ret;
        __atomic_pre_full_fence();
-       ret = arch_atomic64_fetch_xor_relaxed(i, v);
+       ret = arch_atomic64_xchg_relaxed(v, new);
        __atomic_post_full_fence();
        return ret;
-}
-#define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
+#else
+       return raw_xchg(&v->counter, new);
 #endif
+}
 
-#endif /* arch_atomic64_fetch_xor_relaxed */
-
-#ifndef arch_atomic64_xchg_relaxed
-#define arch_atomic64_xchg_acquire arch_atomic64_xchg
-#define arch_atomic64_xchg_release arch_atomic64_xchg
-#define arch_atomic64_xchg_relaxed arch_atomic64_xchg
-#else /* arch_atomic64_xchg_relaxed */
-
-#ifndef arch_atomic64_xchg_acquire
+/**
+ * raw_atomic64_xchg_acquire() - atomic exchange with acquire ordering
+ * @v: pointer to atomic64_t
+ * @new: s64 value to assign
+ *
+ * Atomically updates @v to @new with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_xchg_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_xchg_acquire(atomic64_t *v, s64 i)
+raw_atomic64_xchg_acquire(atomic64_t *v, s64 new)
 {
-       s64 ret = arch_atomic64_xchg_relaxed(v, i);
+#if defined(arch_atomic64_xchg_acquire)
+       return arch_atomic64_xchg_acquire(v, new);
+#elif defined(arch_atomic64_xchg_relaxed)
+       s64 ret = arch_atomic64_xchg_relaxed(v, new);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic64_xchg_acquire arch_atomic64_xchg_acquire
+#elif defined(arch_atomic64_xchg)
+       return arch_atomic64_xchg(v, new);
+#else
+       return raw_xchg_acquire(&v->counter, new);
 #endif
+}
 
-#ifndef arch_atomic64_xchg_release
+/**
+ * raw_atomic64_xchg_release() - atomic exchange with release ordering
+ * @v: pointer to atomic64_t
+ * @new: s64 value to assign
+ *
+ * Atomically updates @v to @new with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_xchg_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_xchg_release(atomic64_t *v, s64 i)
+raw_atomic64_xchg_release(atomic64_t *v, s64 new)
 {
+#if defined(arch_atomic64_xchg_release)
+       return arch_atomic64_xchg_release(v, new);
+#elif defined(arch_atomic64_xchg_relaxed)
        __atomic_release_fence();
-       return arch_atomic64_xchg_relaxed(v, i);
+       return arch_atomic64_xchg_relaxed(v, new);
+#elif defined(arch_atomic64_xchg)
+       return arch_atomic64_xchg(v, new);
+#else
+       return raw_xchg_release(&v->counter, new);
+#endif
 }
-#define arch_atomic64_xchg_release arch_atomic64_xchg_release
+
+/**
+ * raw_atomic64_xchg_relaxed() - atomic exchange with relaxed ordering
+ * @v: pointer to atomic64_t
+ * @new: s64 value to assign
+ *
+ * Atomically updates @v to @new with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_xchg_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
+static __always_inline s64
+raw_atomic64_xchg_relaxed(atomic64_t *v, s64 new)
+{
+#if defined(arch_atomic64_xchg_relaxed)
+       return arch_atomic64_xchg_relaxed(v, new);
+#elif defined(arch_atomic64_xchg)
+       return arch_atomic64_xchg(v, new);
+#else
+       return raw_xchg_relaxed(&v->counter, new);
 #endif
+}
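
Illustration only, not part of the diff: a full-ordered exchange used to drain a hypothetical accumulator in a single step.

#include <linux/atomic.h>

static atomic64_t example_pending_bytes = ATOMIC64_INIT(0);

static s64 example_drain_pending(void)
{
	/* Hand back whatever was accumulated and leave zero behind. */
	return raw_atomic64_xchg(&example_pending_bytes, 0);
}
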
 
-#ifndef arch_atomic64_xchg
+/**
+ * raw_atomic64_cmpxchg() - atomic compare and exchange with full ordering
+ * @v: pointer to atomic64_t
+ * @old: s64 value to compare with
+ * @new: s64 value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_cmpxchg() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_xchg(atomic64_t *v, s64 i)
+raw_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
 {
+#if defined(arch_atomic64_cmpxchg)
+       return arch_atomic64_cmpxchg(v, old, new);
+#elif defined(arch_atomic64_cmpxchg_relaxed)
        s64 ret;
        __atomic_pre_full_fence();
-       ret = arch_atomic64_xchg_relaxed(v, i);
+       ret = arch_atomic64_cmpxchg_relaxed(v, old, new);
        __atomic_post_full_fence();
        return ret;
-}
-#define arch_atomic64_xchg arch_atomic64_xchg
+#else
+       return raw_cmpxchg(&v->counter, old, new);
 #endif
+}
 
-#endif /* arch_atomic64_xchg_relaxed */
-
-#ifndef arch_atomic64_cmpxchg_relaxed
-#define arch_atomic64_cmpxchg_acquire arch_atomic64_cmpxchg
-#define arch_atomic64_cmpxchg_release arch_atomic64_cmpxchg
-#define arch_atomic64_cmpxchg_relaxed arch_atomic64_cmpxchg
-#else /* arch_atomic64_cmpxchg_relaxed */
-
-#ifndef arch_atomic64_cmpxchg_acquire
+/**
+ * raw_atomic64_cmpxchg_acquire() - atomic compare and exchange with acquire ordering
+ * @v: pointer to atomic64_t
+ * @old: s64 value to compare with
+ * @new: s64 value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_cmpxchg_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_cmpxchg_acquire(atomic64_t *v, s64 old, s64 new)
+raw_atomic64_cmpxchg_acquire(atomic64_t *v, s64 old, s64 new)
 {
+#if defined(arch_atomic64_cmpxchg_acquire)
+       return arch_atomic64_cmpxchg_acquire(v, old, new);
+#elif defined(arch_atomic64_cmpxchg_relaxed)
        s64 ret = arch_atomic64_cmpxchg_relaxed(v, old, new);
        __atomic_acquire_fence();
        return ret;
-}
-#define arch_atomic64_cmpxchg_acquire arch_atomic64_cmpxchg_acquire
+#elif defined(arch_atomic64_cmpxchg)
+       return arch_atomic64_cmpxchg(v, old, new);
+#else
+       return raw_cmpxchg_acquire(&v->counter, old, new);
 #endif
+}
 
-#ifndef arch_atomic64_cmpxchg_release
+/**
+ * raw_atomic64_cmpxchg_release() - atomic compare and exchange with release ordering
+ * @v: pointer to atomic64_t
+ * @old: s64 value to compare with
+ * @new: s64 value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_cmpxchg_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_cmpxchg_release(atomic64_t *v, s64 old, s64 new)
+raw_atomic64_cmpxchg_release(atomic64_t *v, s64 old, s64 new)
 {
+#if defined(arch_atomic64_cmpxchg_release)
+       return arch_atomic64_cmpxchg_release(v, old, new);
+#elif defined(arch_atomic64_cmpxchg_relaxed)
        __atomic_release_fence();
        return arch_atomic64_cmpxchg_relaxed(v, old, new);
-}
-#define arch_atomic64_cmpxchg_release arch_atomic64_cmpxchg_release
+#elif defined(arch_atomic64_cmpxchg)
+       return arch_atomic64_cmpxchg(v, old, new);
+#else
+       return raw_cmpxchg_release(&v->counter, old, new);
 #endif
+}
 
-#ifndef arch_atomic64_cmpxchg
+/**
+ * raw_atomic64_cmpxchg_relaxed() - atomic compare and exchange with relaxed ordering
+ * @v: pointer to atomic64_t
+ * @old: s64 value to compare with
+ * @new: s64 value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_cmpxchg_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
+raw_atomic64_cmpxchg_relaxed(atomic64_t *v, s64 old, s64 new)
 {
-       s64 ret;
-       __atomic_pre_full_fence();
-       ret = arch_atomic64_cmpxchg_relaxed(v, old, new);
-       __atomic_post_full_fence();
-       return ret;
-}
-#define arch_atomic64_cmpxchg arch_atomic64_cmpxchg
+#if defined(arch_atomic64_cmpxchg_relaxed)
+       return arch_atomic64_cmpxchg_relaxed(v, old, new);
+#elif defined(arch_atomic64_cmpxchg)
+       return arch_atomic64_cmpxchg(v, old, new);
+#else
+       return raw_cmpxchg_relaxed(&v->counter, old, new);
 #endif
+}
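
Illustration only, not part of the diff: a one-shot claim built on the full-ordered compare-and-exchange above; the owner field is hypothetical.

#include <linux/atomic.h>

static atomic64_t example_owner = ATOMIC64_INIT(0);

static bool example_claim(s64 me)
{
	/* cmpxchg() returns the old value; the claim succeeds only if it was 0. */
	return raw_atomic64_cmpxchg(&example_owner, 0, me) == 0;
}
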
 
-#endif /* arch_atomic64_cmpxchg_relaxed */
-
-#ifndef arch_atomic64_try_cmpxchg_relaxed
-#ifdef arch_atomic64_try_cmpxchg
-#define arch_atomic64_try_cmpxchg_acquire arch_atomic64_try_cmpxchg
-#define arch_atomic64_try_cmpxchg_release arch_atomic64_try_cmpxchg
-#define arch_atomic64_try_cmpxchg_relaxed arch_atomic64_try_cmpxchg
-#endif /* arch_atomic64_try_cmpxchg */
-
-#ifndef arch_atomic64_try_cmpxchg
+/**
+ * raw_atomic64_try_cmpxchg() - atomic compare and exchange with full ordering
+ * @v: pointer to atomic64_t
+ * @old: pointer to s64 value to compare with
+ * @new: s64 value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with full ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Safe to use in noinstr code; prefer atomic64_try_cmpxchg() elsewhere.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
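
Illustration only, not part of the diff: the canonical try_cmpxchg() loop, mirroring the fallback bodies above. On failure @old is refreshed with the current value, so the loop never has to re-read @v itself. The function name and limit semantics are hypothetical.

#include <linux/atomic.h>

static bool example_inc_below(atomic64_t *v, s64 limit)
{
	s64 old = raw_atomic64_read(v);

	do {
		if (old >= limit)
			return false;
	} while (!raw_atomic64_try_cmpxchg(v, &old, old + 1));

	return true;
}
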
+ */
 static __always_inline bool
-arch_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
+raw_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
 {
+#if defined(arch_atomic64_try_cmpxchg)
+       return arch_atomic64_try_cmpxchg(v, old, new);
+#elif defined(arch_atomic64_try_cmpxchg_relaxed)
+       bool ret;
+       __atomic_pre_full_fence();
+       ret = arch_atomic64_try_cmpxchg_relaxed(v, old, new);
+       __atomic_post_full_fence();
+       return ret;
+#else
        s64 r, o = *old;
-       r = arch_atomic64_cmpxchg(v, o, new);
+       r = raw_atomic64_cmpxchg(v, o, new);
        if (unlikely(r != o))
                *old = r;
        return likely(r == o);
-}
-#define arch_atomic64_try_cmpxchg arch_atomic64_try_cmpxchg
 #endif
+}
 
-#ifndef arch_atomic64_try_cmpxchg_acquire
+/**
+ * raw_atomic64_try_cmpxchg_acquire() - atomic compare and exchange with acquire ordering
+ * @v: pointer to atomic64_t
+ * @old: pointer to s64 value to compare with
+ * @new: s64 value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with acquire ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Safe to use in noinstr code; prefer atomic64_try_cmpxchg_acquire() elsewhere.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
+raw_atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
 {
+#if defined(arch_atomic64_try_cmpxchg_acquire)
+       return arch_atomic64_try_cmpxchg_acquire(v, old, new);
+#elif defined(arch_atomic64_try_cmpxchg_relaxed)
+       bool ret = arch_atomic64_try_cmpxchg_relaxed(v, old, new);
+       __atomic_acquire_fence();
+       return ret;
+#elif defined(arch_atomic64_try_cmpxchg)
+       return arch_atomic64_try_cmpxchg(v, old, new);
+#else
        s64 r, o = *old;
-       r = arch_atomic64_cmpxchg_acquire(v, o, new);
+       r = raw_atomic64_cmpxchg_acquire(v, o, new);
        if (unlikely(r != o))
                *old = r;
        return likely(r == o);
-}
-#define arch_atomic64_try_cmpxchg_acquire arch_atomic64_try_cmpxchg_acquire
 #endif
+}
 
-#ifndef arch_atomic64_try_cmpxchg_release
+/**
+ * raw_atomic64_try_cmpxchg_release() - atomic compare and exchange with release ordering
+ * @v: pointer to atomic64_t
+ * @old: pointer to s64 value to compare with
+ * @new: s64 value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with release ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Safe to use in noinstr code; prefer atomic64_try_cmpxchg_release() elsewhere.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
+raw_atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
 {
+#if defined(arch_atomic64_try_cmpxchg_release)
+       return arch_atomic64_try_cmpxchg_release(v, old, new);
+#elif defined(arch_atomic64_try_cmpxchg_relaxed)
+       __atomic_release_fence();
+       return arch_atomic64_try_cmpxchg_relaxed(v, old, new);
+#elif defined(arch_atomic64_try_cmpxchg)
+       return arch_atomic64_try_cmpxchg(v, old, new);
+#else
        s64 r, o = *old;
-       r = arch_atomic64_cmpxchg_release(v, o, new);
+       r = raw_atomic64_cmpxchg_release(v, o, new);
        if (unlikely(r != o))
                *old = r;
        return likely(r == o);
-}
-#define arch_atomic64_try_cmpxchg_release arch_atomic64_try_cmpxchg_release
 #endif
+}
 
-#ifndef arch_atomic64_try_cmpxchg_relaxed
+/**
+ * raw_atomic64_try_cmpxchg_relaxed() - atomic compare and exchange with relaxed ordering
+ * @v: pointer to atomic64_t
+ * @old: pointer to s64 value to compare with
+ * @new: s64 value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with relaxed ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Safe to use in noinstr code; prefer atomic64_try_cmpxchg_relaxed() elsewhere.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic64_try_cmpxchg_relaxed(atomic64_t *v, s64 *old, s64 new)
+raw_atomic64_try_cmpxchg_relaxed(atomic64_t *v, s64 *old, s64 new)
 {
+#if defined(arch_atomic64_try_cmpxchg_relaxed)
+       return arch_atomic64_try_cmpxchg_relaxed(v, old, new);
+#elif defined(arch_atomic64_try_cmpxchg)
+       return arch_atomic64_try_cmpxchg(v, old, new);
+#else
        s64 r, o = *old;
-       r = arch_atomic64_cmpxchg_relaxed(v, o, new);
+       r = raw_atomic64_cmpxchg_relaxed(v, o, new);
        if (unlikely(r != o))
                *old = r;
        return likely(r == o);
-}
-#define arch_atomic64_try_cmpxchg_relaxed arch_atomic64_try_cmpxchg_relaxed
-#endif
-
-#else /* arch_atomic64_try_cmpxchg_relaxed */
-
-#ifndef arch_atomic64_try_cmpxchg_acquire
-static __always_inline bool
-arch_atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
-{
-       bool ret = arch_atomic64_try_cmpxchg_relaxed(v, old, new);
-       __atomic_acquire_fence();
-       return ret;
-}
-#define arch_atomic64_try_cmpxchg_acquire arch_atomic64_try_cmpxchg_acquire
-#endif
-
-#ifndef arch_atomic64_try_cmpxchg_release
-static __always_inline bool
-arch_atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
-{
-       __atomic_release_fence();
-       return arch_atomic64_try_cmpxchg_relaxed(v, old, new);
-}
-#define arch_atomic64_try_cmpxchg_release arch_atomic64_try_cmpxchg_release
 #endif
-
-#ifndef arch_atomic64_try_cmpxchg
-static __always_inline bool
-arch_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
-{
-       bool ret;
-       __atomic_pre_full_fence();
-       ret = arch_atomic64_try_cmpxchg_relaxed(v, old, new);
-       __atomic_post_full_fence();
-       return ret;
 }
-#define arch_atomic64_try_cmpxchg arch_atomic64_try_cmpxchg
-#endif
-
-#endif /* arch_atomic64_try_cmpxchg_relaxed */
 
-#ifndef arch_atomic64_sub_and_test
 /**
- * arch_atomic64_sub_and_test - subtract value from variable and test result
- * @i: integer value to subtract
- * @v: pointer of type atomic64_t
+ * raw_atomic64_sub_and_test() - atomic subtract and test if zero with full ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
  *
- * Atomically subtracts @i from @v and returns
- * true if the result is zero, or false for all
- * other cases.
+ * Atomically updates @v to (@v - @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_sub_and_test() elsewhere.
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
  */
 static __always_inline bool
-arch_atomic64_sub_and_test(s64 i, atomic64_t *v)
+raw_atomic64_sub_and_test(s64 i, atomic64_t *v)
 {
-       return arch_atomic64_sub_return(i, v) == 0;
-}
-#define arch_atomic64_sub_and_test arch_atomic64_sub_and_test
+#if defined(arch_atomic64_sub_and_test)
+       return arch_atomic64_sub_and_test(i, v);
+#else
+       return raw_atomic64_sub_return(i, v) == 0;
 #endif
+}
 
-#ifndef arch_atomic64_dec_and_test
 /**
- * arch_atomic64_dec_and_test - decrement and test
- * @v: pointer of type atomic64_t
+ * raw_atomic64_dec_and_test() - atomic decrement and test if zero with full ordering
+ * @v: pointer to atomic64_t
  *
- * Atomically decrements @v by 1 and
- * returns true if the result is 0, or false for all other
- * cases.
+ * Atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_dec_and_test() elsewhere.
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
  */
 static __always_inline bool
-arch_atomic64_dec_and_test(atomic64_t *v)
+raw_atomic64_dec_and_test(atomic64_t *v)
 {
-       return arch_atomic64_dec_return(v) == 0;
-}
-#define arch_atomic64_dec_and_test arch_atomic64_dec_and_test
+#if defined(arch_atomic64_dec_and_test)
+       return arch_atomic64_dec_and_test(v);
+#else
+       return raw_atomic64_dec_return(v) == 0;
 #endif
+}
 
-#ifndef arch_atomic64_inc_and_test
 /**
- * arch_atomic64_inc_and_test - increment and test
- * @v: pointer of type atomic64_t
+ * raw_atomic64_inc_and_test() - atomic increment and test if zero with full ordering
+ * @v: pointer to atomic64_t
  *
- * Atomically increments @v by 1
- * and returns true if the result is zero, or false for all
- * other cases.
+ * Atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_inc_and_test() elsewhere.
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
  */
 static __always_inline bool
-arch_atomic64_inc_and_test(atomic64_t *v)
+raw_atomic64_inc_and_test(atomic64_t *v)
 {
-       return arch_atomic64_inc_return(v) == 0;
-}
-#define arch_atomic64_inc_and_test arch_atomic64_inc_and_test
+#if defined(arch_atomic64_inc_and_test)
+       return arch_atomic64_inc_and_test(v);
+#else
+       return raw_atomic64_inc_return(v) == 0;
 #endif
+}
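
Illustration only, not part of the diff: the usual last-reference pattern built on dec_and_test() as defined above; names are hypothetical.

#include <linux/atomic.h>

static void example_put(atomic64_t *refs, void (*release)(void))
{
	/* Only the caller whose decrement reached zero runs the release step. */
	if (raw_atomic64_dec_and_test(refs))
		release();
}
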
 
-#ifndef arch_atomic64_add_negative_relaxed
-#ifdef arch_atomic64_add_negative
-#define arch_atomic64_add_negative_acquire arch_atomic64_add_negative
-#define arch_atomic64_add_negative_release arch_atomic64_add_negative
-#define arch_atomic64_add_negative_relaxed arch_atomic64_add_negative
-#endif /* arch_atomic64_add_negative */
-
-#ifndef arch_atomic64_add_negative
 /**
- * arch_atomic64_add_negative - Add and test if negative
- * @i: integer value to add
- * @v: pointer of type atomic64_t
+ * raw_atomic64_add_negative() - atomic add and test if negative with full ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with full ordering.
  *
- * Atomically adds @i to @v and returns true if the result is negative,
- * or false when the result is greater than or equal to zero.
+ * Safe to use in noinstr code; prefer atomic64_add_negative() elsewhere.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
  */
 static __always_inline bool
-arch_atomic64_add_negative(s64 i, atomic64_t *v)
+raw_atomic64_add_negative(s64 i, atomic64_t *v)
 {
-       return arch_atomic64_add_return(i, v) < 0;
-}
-#define arch_atomic64_add_negative arch_atomic64_add_negative
+#if defined(arch_atomic64_add_negative)
+       return arch_atomic64_add_negative(i, v);
+#elif defined(arch_atomic64_add_negative_relaxed)
+       bool ret;
+       __atomic_pre_full_fence();
+       ret = arch_atomic64_add_negative_relaxed(i, v);
+       __atomic_post_full_fence();
+       return ret;
+#else
+       return raw_atomic64_add_return(i, v) < 0;
 #endif
+}
 
-#ifndef arch_atomic64_add_negative_acquire
 /**
- * arch_atomic64_add_negative_acquire - Add and test if negative
- * @i: integer value to add
- * @v: pointer of type atomic64_t
+ * raw_atomic64_add_negative_acquire() - atomic add and test if negative with acquire ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_add_negative_acquire() elsewhere.
  *
- * Atomically adds @i to @v and returns true if the result is negative,
- * or false when the result is greater than or equal to zero.
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
  */
 static __always_inline bool
-arch_atomic64_add_negative_acquire(s64 i, atomic64_t *v)
+raw_atomic64_add_negative_acquire(s64 i, atomic64_t *v)
 {
-       return arch_atomic64_add_return_acquire(i, v) < 0;
-}
-#define arch_atomic64_add_negative_acquire arch_atomic64_add_negative_acquire
+#if defined(arch_atomic64_add_negative_acquire)
+       return arch_atomic64_add_negative_acquire(i, v);
+#elif defined(arch_atomic64_add_negative_relaxed)
+       bool ret = arch_atomic64_add_negative_relaxed(i, v);
+       __atomic_acquire_fence();
+       return ret;
+#elif defined(arch_atomic64_add_negative)
+       return arch_atomic64_add_negative(i, v);
+#else
+       return raw_atomic64_add_return_acquire(i, v) < 0;
 #endif
+}
 
-#ifndef arch_atomic64_add_negative_release
 /**
- * arch_atomic64_add_negative_release - Add and test if negative
- * @i: integer value to add
- * @v: pointer of type atomic64_t
+ * raw_atomic64_add_negative_release() - atomic add and test if negative with release ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with release ordering.
  *
- * Atomically adds @i to @v and returns true if the result is negative,
- * or false when the result is greater than or equal to zero.
+ * Safe to use in noinstr code; prefer atomic64_add_negative_release() elsewhere.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
  */
 static __always_inline bool
-arch_atomic64_add_negative_release(s64 i, atomic64_t *v)
+raw_atomic64_add_negative_release(s64 i, atomic64_t *v)
 {
-       return arch_atomic64_add_return_release(i, v) < 0;
-}
-#define arch_atomic64_add_negative_release arch_atomic64_add_negative_release
+#if defined(arch_atomic64_add_negative_release)
+       return arch_atomic64_add_negative_release(i, v);
+#elif defined(arch_atomic64_add_negative_relaxed)
+       __atomic_release_fence();
+       return arch_atomic64_add_negative_relaxed(i, v);
+#elif defined(arch_atomic64_add_negative)
+       return arch_atomic64_add_negative(i, v);
+#else
+       return raw_atomic64_add_return_release(i, v) < 0;
 #endif
+}
 
-#ifndef arch_atomic64_add_negative_relaxed
 /**
- * arch_atomic64_add_negative_relaxed - Add and test if negative
- * @i: integer value to add
- * @v: pointer of type atomic64_t
+ * raw_atomic64_add_negative_relaxed() - atomic add and test if negative with relaxed ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
  *
- * Atomically adds @i to @v and returns true if the result is negative,
- * or false when the result is greater than or equal to zero.
+ * Safe to use in noinstr code; prefer atomic64_add_negative_relaxed() elsewhere.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
  */
 static __always_inline bool
-arch_atomic64_add_negative_relaxed(s64 i, atomic64_t *v)
-{
-       return arch_atomic64_add_return_relaxed(i, v) < 0;
-}
-#define arch_atomic64_add_negative_relaxed arch_atomic64_add_negative_relaxed
-#endif
-
-#else /* arch_atomic64_add_negative_relaxed */
-
-#ifndef arch_atomic64_add_negative_acquire
-static __always_inline bool
-arch_atomic64_add_negative_acquire(s64 i, atomic64_t *v)
-{
-       bool ret = arch_atomic64_add_negative_relaxed(i, v);
-       __atomic_acquire_fence();
-       return ret;
-}
-#define arch_atomic64_add_negative_acquire arch_atomic64_add_negative_acquire
-#endif
-
-#ifndef arch_atomic64_add_negative_release
-static __always_inline bool
-arch_atomic64_add_negative_release(s64 i, atomic64_t *v)
+raw_atomic64_add_negative_relaxed(s64 i, atomic64_t *v)
 {
-       __atomic_release_fence();
+#if defined(arch_atomic64_add_negative_relaxed)
        return arch_atomic64_add_negative_relaxed(i, v);
-}
-#define arch_atomic64_add_negative_release arch_atomic64_add_negative_release
+#elif defined(arch_atomic64_add_negative)
+       return arch_atomic64_add_negative(i, v);
+#else
+       return raw_atomic64_add_return_relaxed(i, v) < 0;
 #endif
-
-#ifndef arch_atomic64_add_negative
-static __always_inline bool
-arch_atomic64_add_negative(s64 i, atomic64_t *v)
-{
-       bool ret;
-       __atomic_pre_full_fence();
-       ret = arch_atomic64_add_negative_relaxed(i, v);
-       __atomic_post_full_fence();
-       return ret;
 }
-#define arch_atomic64_add_negative arch_atomic64_add_negative
-#endif
 
-#endif /* arch_atomic64_add_negative_relaxed */
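
Illustration only, not part of the diff: add_negative() used to notice when a hypothetical signed budget drops below zero.

#include <linux/atomic.h>

static atomic64_t example_budget = ATOMIC64_INIT(0);

static bool example_charge(s64 amount)
{
	/* True when the post-add value of the budget is negative. */
	return raw_atomic64_add_negative(-amount, &example_budget);
}
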
-
-#ifndef arch_atomic64_fetch_add_unless
 /**
- * arch_atomic64_fetch_add_unless - add unless the number is already a given value
- * @v: pointer of type atomic64_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
+ * raw_atomic64_fetch_add_unless() - atomic add unless value with full ordering
+ * @v: pointer to atomic64_t
+ * @a: s64 value to add
+ * @u: s64 value to compare with
+ *
+ * If (@v != @u), atomically updates @v to (@v + @a) with full ordering.
  *
- * Atomically adds @a to @v, so long as @v was not already @u.
- * Returns original value of @v
+ * Safe to use in noinstr code; prefer atomic64_fetch_add_unless() elsewhere.
+ *
+ * Return: The original value of @v.
  */
 static __always_inline s64
-arch_atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
+raw_atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
 {
-       s64 c = arch_atomic64_read(v);
+#if defined(arch_atomic64_fetch_add_unless)
+       return arch_atomic64_fetch_add_unless(v, a, u);
+#else
+       s64 c = raw_atomic64_read(v);
 
        do {
                if (unlikely(c == u))
                        break;
-       } while (!arch_atomic64_try_cmpxchg(v, &c, c + a));
+       } while (!raw_atomic64_try_cmpxchg(v, &c, c + a));
 
        return c;
-}
-#define arch_atomic64_fetch_add_unless arch_atomic64_fetch_add_unless
 #endif
+}
 
-#ifndef arch_atomic64_add_unless
 /**
- * arch_atomic64_add_unless - add unless the number is already a given value
- * @v: pointer of type atomic64_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
+ * raw_atomic64_add_unless() - atomic add unless value with full ordering
+ * @v: pointer to atomic64_t
+ * @a: s64 value to add
+ * @u: s64 value to compare with
+ *
+ * If (@v != @u), atomically updates @v to (@v + @a) with full ordering.
  *
- * Atomically adds @a to @v, if @v was not already @u.
- * Returns true if the addition was done.
+ * Safe to use in noinstr code; prefer atomic64_add_unless() elsewhere.
+ *
+ * Return: @true if @v was updated, @false otherwise.
  */
 static __always_inline bool
-arch_atomic64_add_unless(atomic64_t *v, s64 a, s64 u)
+raw_atomic64_add_unless(atomic64_t *v, s64 a, s64 u)
 {
-       return arch_atomic64_fetch_add_unless(v, a, u) != u;
-}
-#define arch_atomic64_add_unless arch_atomic64_add_unless
+#if defined(arch_atomic64_add_unless)
+       return arch_atomic64_add_unless(v, a, u);
+#else
+       return raw_atomic64_fetch_add_unless(v, a, u) != u;
 #endif
+}
 
-#ifndef arch_atomic64_inc_not_zero
 /**
- * arch_atomic64_inc_not_zero - increment unless the number is zero
- * @v: pointer of type atomic64_t
+ * raw_atomic64_inc_not_zero() - atomic increment unless zero with full ordering
+ * @v: pointer to atomic64_t
+ *
+ * If (@v != 0), atomically updates @v to (@v + 1) with full ordering.
  *
- * Atomically increments @v by 1, if @v is non-zero.
- * Returns true if the increment was done.
+ * Safe to use in noinstr code; prefer atomic64_inc_not_zero() elsewhere.
+ *
+ * Return: @true if @v was updated, @false otherwise.
  */
 static __always_inline bool
-arch_atomic64_inc_not_zero(atomic64_t *v)
+raw_atomic64_inc_not_zero(atomic64_t *v)
 {
-       return arch_atomic64_add_unless(v, 1, 0);
-}
-#define arch_atomic64_inc_not_zero arch_atomic64_inc_not_zero
+#if defined(arch_atomic64_inc_not_zero)
+       return arch_atomic64_inc_not_zero(v);
+#else
+       return raw_atomic64_add_unless(v, 1, 0);
 #endif
+}
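
Illustration only, not part of the diff: the "take a reference unless it already hit zero" idiom that inc_not_zero() above exists for; the function name is hypothetical.

#include <linux/atomic.h>

static bool example_tryget(atomic64_t *refs)
{
	/* Fails once the count has dropped to zero (object on its way out). */
	return raw_atomic64_inc_not_zero(refs);
}
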
 
-#ifndef arch_atomic64_inc_unless_negative
+/**
+ * raw_atomic64_inc_unless_negative() - atomic increment unless negative with full ordering
+ * @v: pointer to atomic64_t
+ *
+ * If (@v >= 0), atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_inc_unless_negative() elsewhere.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic64_inc_unless_negative(atomic64_t *v)
+raw_atomic64_inc_unless_negative(atomic64_t *v)
 {
-       s64 c = arch_atomic64_read(v);
+#if defined(arch_atomic64_inc_unless_negative)
+       return arch_atomic64_inc_unless_negative(v);
+#else
+       s64 c = raw_atomic64_read(v);
 
        do {
                if (unlikely(c < 0))
                        return false;
-       } while (!arch_atomic64_try_cmpxchg(v, &c, c + 1));
+       } while (!raw_atomic64_try_cmpxchg(v, &c, c + 1));
 
        return true;
-}
-#define arch_atomic64_inc_unless_negative arch_atomic64_inc_unless_negative
 #endif
+}
 
-#ifndef arch_atomic64_dec_unless_positive
+/**
+ * raw_atomic64_dec_unless_positive() - atomic decrement unless positive with full ordering
+ * @v: pointer to atomic64_t
+ *
+ * If (@v <= 0), atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_dec_unless_positive() elsewhere.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic64_dec_unless_positive(atomic64_t *v)
+raw_atomic64_dec_unless_positive(atomic64_t *v)
 {
-       s64 c = arch_atomic64_read(v);
+#if defined(arch_atomic64_dec_unless_positive)
+       return arch_atomic64_dec_unless_positive(v);
+#else
+       s64 c = raw_atomic64_read(v);
 
        do {
                if (unlikely(c > 0))
                        return false;
-       } while (!arch_atomic64_try_cmpxchg(v, &c, c - 1));
+       } while (!raw_atomic64_try_cmpxchg(v, &c, c - 1));
 
        return true;
-}
-#define arch_atomic64_dec_unless_positive arch_atomic64_dec_unless_positive
 #endif
+}
 
-#ifndef arch_atomic64_dec_if_positive
+/**
+ * raw_atomic64_dec_if_positive() - atomic decrement if positive with full ordering
+ * @v: pointer to atomic64_t
+ *
+ * If (@v > 0), atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic64_dec_if_positive() elsewhere.
+ *
+ * Return: The old value of @v minus one, regardless of whether @v was updated.
+ */
 static __always_inline s64
-arch_atomic64_dec_if_positive(atomic64_t *v)
+raw_atomic64_dec_if_positive(atomic64_t *v)
 {
-       s64 dec, c = arch_atomic64_read(v);
+#if defined(arch_atomic64_dec_if_positive)
+       return arch_atomic64_dec_if_positive(v);
+#else
+       s64 dec, c = raw_atomic64_read(v);
 
        do {
                dec = c - 1;
                if (unlikely(dec < 0))
                        break;
-       } while (!arch_atomic64_try_cmpxchg(v, &c, dec));
+       } while (!raw_atomic64_try_cmpxchg(v, &c, dec));
 
        return dec;
-}
-#define arch_atomic64_dec_if_positive arch_atomic64_dec_if_positive
 #endif
+}
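
Illustration only, not part of the diff: consuming a token only while some remain, using dec_if_positive() as defined above.

#include <linux/atomic.h>

static bool example_take_token(atomic64_t *tokens)
{
	/* A non-negative result means the decrement actually happened. */
	return raw_atomic64_dec_if_positive(tokens) >= 0;
}
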
 
 #endif /* _LINUX_ATOMIC_FALLBACK_H */
-// ad2e2b4d168dbc60a73922616047a9bfa446af36
+// 202b45c7db600ce36198eb1f1fc2c2d5268ace2d
index 03a232a..d401b40 100644 (file)
@@ -4,15 +4,10 @@
 // DO NOT MODIFY THIS FILE DIRECTLY
 
 /*
- * This file provides wrappers with KASAN instrumentation for atomic operations.
- * To use this functionality an arch's atomic.h file needs to define all
- * atomic operations with arch_ prefix (e.g. arch_atomic_read()) and include
- * this file at the end. This file provides atomic_read() that forwards to
- * arch_atomic_read() for actual atomic operation.
- * Note: if an arch atomic operation is implemented by means of other atomic
- * operations (e.g. atomic_read()/atomic_cmpxchg() loop), then it needs to use
- * arch_ variants (i.e. arch_atomic_read()/arch_atomic_cmpxchg()) to avoid
- * double instrumentation.
+ * This file provides atomic operations with explicit instrumentation (e.g.
+ * KASAN, KCSAN), which should be used unless it is necessary to avoid
+ * instrumentation. Where it is necessary to avoid instrumentation, the
+ * raw_atomic*() operations should be used.
  */
 #ifndef _LINUX_ATOMIC_INSTRUMENTED_H
 #define _LINUX_ATOMIC_INSTRUMENTED_H
 #include <linux/compiler.h>
 #include <linux/instrumented.h>
 
+/**
+ * atomic_read() - atomic load with relaxed ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically loads the value of @v with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_read() there.
+ *
+ * Return: The value loaded from @v.
+ */
 static __always_inline int
 atomic_read(const atomic_t *v)
 {
        instrument_atomic_read(v, sizeof(*v));
-       return arch_atomic_read(v);
-}
-
+       return raw_atomic_read(v);
+}
+
+/**
+ * atomic_read_acquire() - atomic load with acquire ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically loads the value of @v with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_read_acquire() there.
+ *
+ * Return: The value loaded from @v.
+ */
 static __always_inline int
 atomic_read_acquire(const atomic_t *v)
 {
        instrument_atomic_read(v, sizeof(*v));
-       return arch_atomic_read_acquire(v);
-}
-
+       return raw_atomic_read_acquire(v);
+}
+
+/**
+ * atomic_set() - atomic set with relaxed ordering
+ * @v: pointer to atomic_t
+ * @i: int value to assign
+ *
+ * Atomically sets @v to @i with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_set() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_set(atomic_t *v, int i)
 {
        instrument_atomic_write(v, sizeof(*v));
-       arch_atomic_set(v, i);
-}
-
+       raw_atomic_set(v, i);
+}
+
+/**
+ * atomic_set_release() - atomic set with release ordering
+ * @v: pointer to atomic_t
+ * @i: int value to assign
+ *
+ * Atomically sets @v to @i with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_set_release() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_set_release(atomic_t *v, int i)
 {
        kcsan_release();
        instrument_atomic_write(v, sizeof(*v));
-       arch_atomic_set_release(v, i);
-}
-
+       raw_atomic_set_release(v, i);
+}
+
+/**
+ * atomic_add() - atomic add with relaxed ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_add() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_add(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic_add(i, v);
+       raw_atomic_add(i, v);
 }
 
+/**
+ * atomic_add_return() - atomic add with full ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_add_return() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
 atomic_add_return(int i, atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_add_return(i, v);
+       return raw_atomic_add_return(i, v);
 }
 
+/**
+ * atomic_add_return_acquire() - atomic add with acquire ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_add_return_acquire() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
 atomic_add_return_acquire(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_add_return_acquire(i, v);
+       return raw_atomic_add_return_acquire(i, v);
 }
 
+/**
+ * atomic_add_return_release() - atomic add with release ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_add_return_release() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
 atomic_add_return_release(int i, atomic_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_add_return_release(i, v);
+       return raw_atomic_add_return_release(i, v);
 }
 
+/**
+ * atomic_add_return_relaxed() - atomic add with relaxed ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_add_return_relaxed() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
 atomic_add_return_relaxed(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_add_return_relaxed(i, v);
+       return raw_atomic_add_return_relaxed(i, v);
 }
 
+/**
+ * atomic_fetch_add() - atomic add with full ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_add() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_add(int i, atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_add(i, v);
+       return raw_atomic_fetch_add(i, v);
 }
 
+/**
+ * atomic_fetch_add_acquire() - atomic add with acquire ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_add_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_add_acquire(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_add_acquire(i, v);
+       return raw_atomic_fetch_add_acquire(i, v);
 }
 
+/**
+ * atomic_fetch_add_release() - atomic add with release ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_add_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_add_release(int i, atomic_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_add_release(i, v);
+       return raw_atomic_fetch_add_release(i, v);
 }
 
+/**
+ * atomic_fetch_add_relaxed() - atomic add with relaxed ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_add_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_add_relaxed(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_add_relaxed(i, v);
+       return raw_atomic_fetch_add_relaxed(i, v);
 }
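
The fetch_add variants return the value before the addition, which is what makes them usable as simple allocators. A sketch under that assumption, using a hypothetical sequence-number source (not from this patch):

#include <linux/atomic.h>

static atomic_t next_seq = ATOMIC_INIT(0);	/* hypothetical sequence source */

static int alloc_seq(void)
{
	/*
	 * Returns the pre-increment value, so concurrent callers each get
	 * a unique number; atomic_fetch_add() is fully ordered, and the
	 * *_relaxed variant would do if no ordering is required.
	 */
	return atomic_fetch_add(1, &next_seq);
}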
 
+/**
+ * atomic_sub() - atomic subtract with relaxed ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_sub() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_sub(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic_sub(i, v);
+       raw_atomic_sub(i, v);
 }
 
+/**
+ * atomic_sub_return() - atomic subtract with full ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_sub_return() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
 atomic_sub_return(int i, atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_sub_return(i, v);
+       return raw_atomic_sub_return(i, v);
 }
 
+/**
+ * atomic_sub_return_acquire() - atomic subtract with acquire ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_sub_return_acquire() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
 atomic_sub_return_acquire(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_sub_return_acquire(i, v);
+       return raw_atomic_sub_return_acquire(i, v);
 }
 
+/**
+ * atomic_sub_return_release() - atomic subtract with release ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_sub_return_release() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
 atomic_sub_return_release(int i, atomic_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_sub_return_release(i, v);
+       return raw_atomic_sub_return_release(i, v);
 }
 
+/**
+ * atomic_sub_return_relaxed() - atomic subtract with relaxed ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_sub_return_relaxed() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
 atomic_sub_return_relaxed(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_sub_return_relaxed(i, v);
+       return raw_atomic_sub_return_relaxed(i, v);
 }
 
+/**
+ * atomic_fetch_sub() - atomic subtract with full ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_sub() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_sub(int i, atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_sub(i, v);
+       return raw_atomic_fetch_sub(i, v);
 }
 
+/**
+ * atomic_fetch_sub_acquire() - atomic subtract with acquire ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_sub_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_sub_acquire(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_sub_acquire(i, v);
+       return raw_atomic_fetch_sub_acquire(i, v);
 }
 
+/**
+ * atomic_fetch_sub_release() - atomic subtract with release ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_sub_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_sub_release(int i, atomic_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_sub_release(i, v);
+       return raw_atomic_fetch_sub_release(i, v);
 }
 
+/**
+ * atomic_fetch_sub_relaxed() - atomic subtract with relaxed ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_sub_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_sub_relaxed(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_sub_relaxed(i, v);
+       return raw_atomic_fetch_sub_relaxed(i, v);
 }
 
+/**
+ * atomic_inc() - atomic increment with relaxed ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_inc() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_inc(atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic_inc(v);
+       raw_atomic_inc(v);
 }
 
+/**
+ * atomic_inc_return() - atomic increment with full ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_inc_return() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
 atomic_inc_return(atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_inc_return(v);
+       return raw_atomic_inc_return(v);
 }
 
+/**
+ * atomic_inc_return_acquire() - atomic increment with acquire ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_inc_return_acquire() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
 atomic_inc_return_acquire(atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_inc_return_acquire(v);
+       return raw_atomic_inc_return_acquire(v);
 }
 
+/**
+ * atomic_inc_return_release() - atomic increment with release ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_inc_return_release() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
 atomic_inc_return_release(atomic_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_inc_return_release(v);
+       return raw_atomic_inc_return_release(v);
 }
 
+/**
+ * atomic_inc_return_relaxed() - atomic increment with relaxed ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_inc_return_relaxed() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
 atomic_inc_return_relaxed(atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_inc_return_relaxed(v);
+       return raw_atomic_inc_return_relaxed(v);
 }
 
+/**
+ * atomic_fetch_inc() - atomic increment with full ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_inc() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_inc(atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_inc(v);
+       return raw_atomic_fetch_inc(v);
 }
 
+/**
+ * atomic_fetch_inc_acquire() - atomic increment with acquire ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_inc_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_inc_acquire(atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_inc_acquire(v);
+       return raw_atomic_fetch_inc_acquire(v);
 }
 
+/**
+ * atomic_fetch_inc_release() - atomic increment with release ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_inc_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_inc_release(atomic_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_inc_release(v);
+       return raw_atomic_fetch_inc_release(v);
 }
 
+/**
+ * atomic_fetch_inc_relaxed() - atomic increment with relaxed ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_inc_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_inc_relaxed(atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_inc_relaxed(v);
+       return raw_atomic_fetch_inc_relaxed(v);
 }
 
+/**
+ * atomic_dec() - atomic decrement with relaxed ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_dec() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_dec(atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic_dec(v);
+       raw_atomic_dec(v);
 }
 
+/**
+ * atomic_dec_return() - atomic decrement with full ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_dec_return() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
 atomic_dec_return(atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_dec_return(v);
+       return raw_atomic_dec_return(v);
 }
 
+/**
+ * atomic_dec_return_acquire() - atomic decrement with acquire ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_dec_return_acquire() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
 atomic_dec_return_acquire(atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_dec_return_acquire(v);
+       return raw_atomic_dec_return_acquire(v);
 }
 
+/**
+ * atomic_dec_return_release() - atomic decrement with release ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_dec_return_release() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
 atomic_dec_return_release(atomic_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_dec_return_release(v);
+       return raw_atomic_dec_return_release(v);
 }
 
+/**
+ * atomic_dec_return_relaxed() - atomic decrement with relaxed ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_dec_return_relaxed() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline int
 atomic_dec_return_relaxed(atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_dec_return_relaxed(v);
+       return raw_atomic_dec_return_relaxed(v);
 }
 
+/**
+ * atomic_fetch_dec() - atomic decrement with full ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_dec() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_dec(atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_dec(v);
+       return raw_atomic_fetch_dec(v);
 }
 
+/**
+ * atomic_fetch_dec_acquire() - atomic decrement with acquire ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_dec_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_dec_acquire(atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_dec_acquire(v);
+       return raw_atomic_fetch_dec_acquire(v);
 }
 
+/**
+ * atomic_fetch_dec_release() - atomic decrement with release ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_dec_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_dec_release(atomic_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_dec_release(v);
+       return raw_atomic_fetch_dec_release(v);
 }
 
+/**
+ * atomic_fetch_dec_relaxed() - atomic decrement with relaxed ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_dec_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_dec_relaxed(atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_dec_relaxed(v);
+       return raw_atomic_fetch_dec_relaxed(v);
 }
 
+/**
+ * atomic_and() - atomic bitwise AND with relaxed ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_and() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_and(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic_and(i, v);
+       raw_atomic_and(i, v);
 }
 
+/**
+ * atomic_fetch_and() - atomic bitwise AND with full ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_and() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_and(int i, atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_and(i, v);
+       return raw_atomic_fetch_and(i, v);
 }
 
+/**
+ * atomic_fetch_and_acquire() - atomic bitwise AND with acquire ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_and_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_and_acquire(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_and_acquire(i, v);
+       return raw_atomic_fetch_and_acquire(i, v);
 }
 
+/**
+ * atomic_fetch_and_release() - atomic bitwise AND with release ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_and_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_and_release(int i, atomic_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_and_release(i, v);
+       return raw_atomic_fetch_and_release(i, v);
 }
 
+/**
+ * atomic_fetch_and_relaxed() - atomic bitwise AND with relaxed ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_and_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_and_relaxed(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_and_relaxed(i, v);
+       return raw_atomic_fetch_and_relaxed(i, v);
 }
 
+/**
+ * atomic_andnot() - atomic bitwise AND NOT with relaxed ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & ~@i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_andnot() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_andnot(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic_andnot(i, v);
+       raw_atomic_andnot(i, v);
 }
 
+/**
+ * atomic_fetch_andnot() - atomic bitwise AND NOT with full ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & ~@i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_andnot() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_andnot(int i, atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_andnot(i, v);
+       return raw_atomic_fetch_andnot(i, v);
 }
 
+/**
+ * atomic_fetch_andnot_acquire() - atomic bitwise AND NOT with acquire ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & ~@i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_andnot_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_andnot_acquire(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_andnot_acquire(i, v);
+       return raw_atomic_fetch_andnot_acquire(i, v);
 }
 
+/**
+ * atomic_fetch_andnot_release() - atomic bitwise AND NOT with release ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & ~@i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_andnot_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_andnot_release(int i, atomic_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_andnot_release(i, v);
+       return raw_atomic_fetch_andnot_release(i, v);
 }
 
+/**
+ * atomic_fetch_andnot_relaxed() - atomic bitwise AND NOT with relaxed ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v & ~@i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_andnot_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_andnot_relaxed(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_andnot_relaxed(i, v);
+       return raw_atomic_fetch_andnot_relaxed(i, v);
 }
 
+/**
+ * atomic_or() - atomic bitwise OR with relaxed ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v | @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_or() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_or(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic_or(i, v);
+       raw_atomic_or(i, v);
 }
 
+/**
+ * atomic_fetch_or() - atomic bitwise OR with full ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v | @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_or() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_or(int i, atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_or(i, v);
+       return raw_atomic_fetch_or(i, v);
 }
 
+/**
+ * atomic_fetch_or_acquire() - atomic bitwise OR with acquire ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v | @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_or_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_or_acquire(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_or_acquire(i, v);
+       return raw_atomic_fetch_or_acquire(i, v);
 }
 
+/**
+ * atomic_fetch_or_release() - atomic bitwise OR with release ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v | @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_or_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_or_release(int i, atomic_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_or_release(i, v);
+       return raw_atomic_fetch_or_release(i, v);
 }
 
+/**
+ * atomic_fetch_or_relaxed() - atomic bitwise OR with relaxed ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v | @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_or_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_or_relaxed(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_or_relaxed(i, v);
+       return raw_atomic_fetch_or_relaxed(i, v);
 }
 
+/**
+ * atomic_xor() - atomic bitwise XOR with relaxed ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v ^ @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_xor() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_xor(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic_xor(i, v);
+       raw_atomic_xor(i, v);
 }
 
+/**
+ * atomic_fetch_xor() - atomic bitwise XOR with full ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v ^ @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_xor() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_xor(int i, atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_xor(i, v);
+       return raw_atomic_fetch_xor(i, v);
 }
 
+/**
+ * atomic_fetch_xor_acquire() - atomic bitwise XOR with acquire ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v ^ @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_xor_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_xor_acquire(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_xor_acquire(i, v);
+       return raw_atomic_fetch_xor_acquire(i, v);
 }
 
+/**
+ * atomic_fetch_xor_release() - atomic bitwise XOR with release ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v ^ @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_xor_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_xor_release(int i, atomic_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_xor_release(i, v);
+       return raw_atomic_fetch_xor_release(i, v);
 }
 
+/**
+ * atomic_fetch_xor_relaxed() - atomic bitwise XOR with relaxed ordering
+ * @i: int value
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v ^ @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_xor_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_xor_relaxed(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_xor_relaxed(i, v);
+       return raw_atomic_fetch_xor_relaxed(i, v);
 }
 
+/**
+ * atomic_xchg() - atomic exchange with full ordering
+ * @v: pointer to atomic_t
+ * @new: int value to assign
+ *
+ * Atomically updates @v to @new with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_xchg() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-atomic_xchg(atomic_t *v, int i)
+atomic_xchg(atomic_t *v, int new)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_xchg(v, i);
+       return raw_atomic_xchg(v, new);
 }
 
+/**
+ * atomic_xchg_acquire() - atomic exchange with acquire ordering
+ * @v: pointer to atomic_t
+ * @new: int value to assign
+ *
+ * Atomically updates @v to @new with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_xchg_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-atomic_xchg_acquire(atomic_t *v, int i)
+atomic_xchg_acquire(atomic_t *v, int new)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_xchg_acquire(v, i);
+       return raw_atomic_xchg_acquire(v, new);
 }
 
+/**
+ * atomic_xchg_release() - atomic exchange with release ordering
+ * @v: pointer to atomic_t
+ * @new: int value to assign
+ *
+ * Atomically updates @v to @new with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_xchg_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-atomic_xchg_release(atomic_t *v, int i)
+atomic_xchg_release(atomic_t *v, int new)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_xchg_release(v, i);
+       return raw_atomic_xchg_release(v, new);
 }
 
+/**
+ * atomic_xchg_relaxed() - atomic exchange with relaxed ordering
+ * @v: pointer to atomic_t
+ * @new: int value to assign
+ *
+ * Atomically updates @v to @new with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_xchg_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
-atomic_xchg_relaxed(atomic_t *v, int i)
+atomic_xchg_relaxed(atomic_t *v, int new)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_xchg_relaxed(v, i);
+       return raw_atomic_xchg_relaxed(v, new);
 }
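
A common use of the fully ordered exchange is to atomically consume a value while leaving a sentinel behind; a sketch with a hypothetical pending-work word (illustrative only):

#include <linux/atomic.h>

/* hypothetical: grab all pending bits and reset the word to 0 */
static int consume_pending(atomic_t *pending)
{
	return atomic_xchg(pending, 0);
}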
 
+/**
+ * atomic_cmpxchg() - atomic compare and exchange with full ordering
+ * @v: pointer to atomic_t
+ * @old: int value to compare with
+ * @new: int value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_cmpxchg() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_cmpxchg(atomic_t *v, int old, int new)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_cmpxchg(v, old, new);
+       return raw_atomic_cmpxchg(v, old, new);
 }
 
+/**
+ * atomic_cmpxchg_acquire() - atomic compare and exchange with acquire ordering
+ * @v: pointer to atomic_t
+ * @old: int value to compare with
+ * @new: int value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_cmpxchg_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_cmpxchg_acquire(atomic_t *v, int old, int new)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_cmpxchg_acquire(v, old, new);
+       return raw_atomic_cmpxchg_acquire(v, old, new);
 }
 
+/**
+ * atomic_cmpxchg_release() - atomic compare and exchange with release ordering
+ * @v: pointer to atomic_t
+ * @old: int value to compare with
+ * @new: int value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_cmpxchg_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_cmpxchg_release(atomic_t *v, int old, int new)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_cmpxchg_release(v, old, new);
+       return raw_atomic_cmpxchg_release(v, old, new);
 }
 
+/**
+ * atomic_cmpxchg_relaxed() - atomic compare and exchange with relaxed ordering
+ * @v: pointer to atomic_t
+ * @old: int value to compare with
+ * @new: int value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_cmpxchg_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_cmpxchg_relaxed(atomic_t *v, int old, int new)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_cmpxchg_relaxed(v, old, new);
+       return raw_atomic_cmpxchg_relaxed(v, old, new);
 }
 
+/**
+ * atomic_try_cmpxchg() - atomic compare and exchange with full ordering
+ * @v: pointer to atomic_t
+ * @old: pointer to int value to compare with
+ * @new: int value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with full ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_try_cmpxchg() there.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
 atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
        instrument_atomic_read_write(old, sizeof(*old));
-       return arch_atomic_try_cmpxchg(v, old, new);
-}
-
+       return raw_atomic_try_cmpxchg(v, old, new);
+}
+
+/**
+ * atomic_try_cmpxchg_acquire() - atomic compare and exchange with acquire ordering
+ * @v: pointer to atomic_t
+ * @old: pointer to int value to compare with
+ * @new: int value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with acquire ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_try_cmpxchg_acquire() there.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
 atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
 {
        instrument_atomic_read_write(v, sizeof(*v));
        instrument_atomic_read_write(old, sizeof(*old));
-       return arch_atomic_try_cmpxchg_acquire(v, old, new);
-}
-
+       return raw_atomic_try_cmpxchg_acquire(v, old, new);
+}
+
+/**
+ * atomic_try_cmpxchg_release() - atomic compare and exchange with release ordering
+ * @v: pointer to atomic_t
+ * @old: pointer to int value to compare with
+ * @new: int value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with release ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_try_cmpxchg_release() there.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
 atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
        instrument_atomic_read_write(old, sizeof(*old));
-       return arch_atomic_try_cmpxchg_release(v, old, new);
-}
-
+       return raw_atomic_try_cmpxchg_release(v, old, new);
+}
+
+/**
+ * atomic_try_cmpxchg_relaxed() - atomic compare and exchange with relaxed ordering
+ * @v: pointer to atomic_t
+ * @old: pointer to int value to compare with
+ * @new: int value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with relaxed ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_try_cmpxchg_relaxed() there.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
 atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new)
 {
        instrument_atomic_read_write(v, sizeof(*v));
        instrument_atomic_read_write(old, sizeof(*old));
-       return arch_atomic_try_cmpxchg_relaxed(v, old, new);
-}
-
+       return raw_atomic_try_cmpxchg_relaxed(v, old, new);
+}
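
The usual pattern built on try_cmpxchg is an update loop: on failure @old is refreshed with the current value, so no extra atomic_read() is needed per iteration. A sketch with a hypothetical clamped-increment helper (illustrative only):

#include <linux/atomic.h>

/* hypothetical helper: increment @v, but never beyond @max */
static bool inc_below(atomic_t *v, int max)
{
	int old = atomic_read(v);

	do {
		if (old >= max)
			return false;
		/* on failure, @old is updated to the current value of @v */
	} while (!atomic_try_cmpxchg(v, &old, old + 1));

	return true;
}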
+
+/**
+ * atomic_sub_and_test() - atomic subtract and test if zero with full ordering
+ * @i: int value to subtract
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_sub_and_test() there.
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
+ */
 static __always_inline bool
 atomic_sub_and_test(int i, atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_sub_and_test(i, v);
+       return raw_atomic_sub_and_test(i, v);
 }
 
+/**
+ * atomic_dec_and_test() - atomic decrement and test if zero with full ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_dec_and_test() there.
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
+ */
 static __always_inline bool
 atomic_dec_and_test(atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_dec_and_test(v);
+       return raw_atomic_dec_and_test(v);
 }
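
dec_and_test is the building block for "free on last put" reference counting; in-tree code would normally use refcount_t, but as a sketch of the raw primitive with a hypothetical object (not from this patch):

#include <linux/atomic.h>
#include <linux/slab.h>

struct obj {				/* hypothetical refcounted object */
	atomic_t refcnt;
	/* ... payload ... */
};

static void obj_put(struct obj *o)
{
	/*
	 * Fully ordered, so all prior accesses to *o happen before the
	 * final kfree() on the last put.
	 */
	if (atomic_dec_and_test(&o->refcnt))
		kfree(o);
}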
 
+/**
+ * atomic_inc_and_test() - atomic increment and test if zero with full ordering
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_inc_and_test() there.
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
+ */
 static __always_inline bool
 atomic_inc_and_test(atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_inc_and_test(v);
+       return raw_atomic_inc_and_test(v);
 }
 
+/**
+ * atomic_add_negative() - atomic add and test if negative with full ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_add_negative() there.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
+ */
 static __always_inline bool
 atomic_add_negative(int i, atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_add_negative(i, v);
+       return raw_atomic_add_negative(i, v);
 }
 
+/**
+ * atomic_add_negative_acquire() - atomic add and test if negative with acquire ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_add_negative_acquire() there.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
+ */
 static __always_inline bool
 atomic_add_negative_acquire(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_add_negative_acquire(i, v);
+       return raw_atomic_add_negative_acquire(i, v);
 }
 
+/**
+ * atomic_add_negative_release() - atomic add and test if negative with release ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_add_negative_release() there.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
+ */
 static __always_inline bool
 atomic_add_negative_release(int i, atomic_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_add_negative_release(i, v);
+       return raw_atomic_add_negative_release(i, v);
 }
 
+/**
+ * atomic_add_negative_relaxed() - atomic add and test if negative with relaxed ordering
+ * @i: int value to add
+ * @v: pointer to atomic_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_add_negative_relaxed() there.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
+ */
 static __always_inline bool
 atomic_add_negative_relaxed(int i, atomic_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_add_negative_relaxed(i, v);
+       return raw_atomic_add_negative_relaxed(i, v);
 }
 
+/**
+ * atomic_fetch_add_unless() - atomic add unless value with full ordering
+ * @v: pointer to atomic_t
+ * @a: int value to add
+ * @u: int value to compare with
+ *
+ * If (@v != @u), atomically updates @v to (@v + @a) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_fetch_add_unless() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline int
 atomic_fetch_add_unless(atomic_t *v, int a, int u)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_fetch_add_unless(v, a, u);
+       return raw_atomic_fetch_add_unless(v, a, u);
 }
 
+/**
+ * atomic_add_unless() - atomic add unless value with full ordering
+ * @v: pointer to atomic_t
+ * @a: int value to add
+ * @u: int value to compare with
+ *
+ * If (@v != @u), atomically updates @v to (@v + @a) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_add_unless() there.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
 atomic_add_unless(atomic_t *v, int a, int u)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_add_unless(v, a, u);
+       return raw_atomic_add_unless(v, a, u);
 }
 
+/**
+ * atomic_inc_not_zero() - atomic increment unless zero with full ordering
+ * @v: pointer to atomic_t
+ *
+ * If (@v != 0), atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_inc_not_zero() there.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
 atomic_inc_not_zero(atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_inc_not_zero(v);
+       return raw_atomic_inc_not_zero(v);
 }
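
inc_not_zero is the lookup-side counterpart of the put sketched above: it only takes a reference if the count has not already dropped to zero. A one-line sketch with a hypothetical helper name:

#include <linux/atomic.h>

/* hypothetical: take a reference unless the object is already being freed */
static bool get_ref_unless_dead(atomic_t *refcnt)
{
	return atomic_inc_not_zero(refcnt);
}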
 
+/**
+ * atomic_inc_unless_negative() - atomic increment unless negative with full ordering
+ * @v: pointer to atomic_t
+ *
+ * If (@v >= 0), atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_inc_unless_negative() there.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
 atomic_inc_unless_negative(atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_inc_unless_negative(v);
+       return raw_atomic_inc_unless_negative(v);
 }
 
+/**
+ * atomic_dec_unless_positive() - atomic decrement unless positive with full ordering
+ * @v: pointer to atomic_t
+ *
+ * If (@v <= 0), atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_dec_unless_positive() there.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
 atomic_dec_unless_positive(atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_dec_unless_positive(v);
+       return raw_atomic_dec_unless_positive(v);
 }
 
+/**
+ * atomic_dec_if_positive() - atomic decrement if positive with full ordering
+ * @v: pointer to atomic_t
+ *
+ * If (@v > 0), atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_dec_if_positive() there.
+ *
+ * Return: The old value of (@v - 1), regardless of whether @v was updated.
+ */
 static __always_inline int
 atomic_dec_if_positive(atomic_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_dec_if_positive(v);
+       return raw_atomic_dec_if_positive(v);
 }
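
dec_if_positive returns the decremented value whether or not the store happened, so a negative result means nothing was taken. A sketch with a hypothetical token pool (illustrative only):

#include <linux/atomic.h>

/* hypothetical: take one token if any remain */
static bool take_token(atomic_t *tokens)
{
	/* result < 0 means the pool was empty and @tokens is unchanged */
	return atomic_dec_if_positive(tokens) >= 0;
}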
 
+/**
+ * atomic64_read() - atomic load with relaxed ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically loads the value of @v with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_read() there.
+ *
+ * Return: The value loaded from @v.
+ */
 static __always_inline s64
 atomic64_read(const atomic64_t *v)
 {
        instrument_atomic_read(v, sizeof(*v));
-       return arch_atomic64_read(v);
-}
-
+       return raw_atomic64_read(v);
+}
+
+/**
+ * atomic64_read_acquire() - atomic load with acquire ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically loads the value of @v with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_read_acquire() there.
+ *
+ * Return: The value loaded from @v.
+ */
 static __always_inline s64
 atomic64_read_acquire(const atomic64_t *v)
 {
        instrument_atomic_read(v, sizeof(*v));
-       return arch_atomic64_read_acquire(v);
-}
-
+       return raw_atomic64_read_acquire(v);
+}
+
+/**
+ * atomic64_set() - atomic set with relaxed ordering
+ * @v: pointer to atomic64_t
+ * @i: s64 value to assign
+ *
+ * Atomically sets @v to @i with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_set() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic64_set(atomic64_t *v, s64 i)
 {
        instrument_atomic_write(v, sizeof(*v));
-       arch_atomic64_set(v, i);
-}
-
+       raw_atomic64_set(v, i);
+}
+
+/**
+ * atomic64_set_release() - atomic set with release ordering
+ * @v: pointer to atomic64_t
+ * @i: s64 value to assign
+ *
+ * Atomically sets @v to @i with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_set_release() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic64_set_release(atomic64_t *v, s64 i)
 {
        kcsan_release();
        instrument_atomic_write(v, sizeof(*v));
-       arch_atomic64_set_release(v, i);
-}
-
+       raw_atomic64_set_release(v, i);
+}
+
+/**
+ * atomic64_add() - atomic add with relaxed ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_add() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic64_add(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic64_add(i, v);
+       raw_atomic64_add(i, v);
 }
 
+/**
+ * atomic64_add_return() - atomic add with full ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_add_return() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
 atomic64_add_return(s64 i, atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_add_return(i, v);
+       return raw_atomic64_add_return(i, v);
 }
 
+/**
+ * atomic64_add_return_acquire() - atomic add with acquire ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_add_return_acquire() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
 atomic64_add_return_acquire(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_add_return_acquire(i, v);
+       return raw_atomic64_add_return_acquire(i, v);
 }
 
+/**
+ * atomic64_add_return_release() - atomic add with release ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_add_return_release() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
 atomic64_add_return_release(s64 i, atomic64_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_add_return_release(i, v);
+       return raw_atomic64_add_return_release(i, v);
 }
 
+/**
+ * atomic64_add_return_relaxed() - atomic add with relaxed ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_add_return_relaxed() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
 atomic64_add_return_relaxed(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_add_return_relaxed(i, v);
+       return raw_atomic64_add_return_relaxed(i, v);
 }
 
+/**
+ * atomic64_fetch_add() - atomic add with full ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_add() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_add(s64 i, atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_add(i, v);
+       return raw_atomic64_fetch_add(i, v);
 }
 
+/**
+ * atomic64_fetch_add_acquire() - atomic add with acquire ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_add_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_add_acquire(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_add_acquire(i, v);
+       return raw_atomic64_fetch_add_acquire(i, v);
 }
 
+/**
+ * atomic64_fetch_add_release() - atomic add with release ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_add_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_add_release(s64 i, atomic64_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_add_release(i, v);
+       return raw_atomic64_fetch_add_release(i, v);
 }
 
+/**
+ * atomic64_fetch_add_relaxed() - atomic add with relaxed ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_add_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_add_relaxed(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_add_relaxed(i, v);
+       return raw_atomic64_fetch_add_relaxed(i, v);
 }
 
+/**
+ * atomic64_sub() - atomic subtract with relaxed ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_sub() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic64_sub(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic64_sub(i, v);
+       raw_atomic64_sub(i, v);
 }
 
+/**
+ * atomic64_sub_return() - atomic subtract with full ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_sub_return() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
 atomic64_sub_return(s64 i, atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_sub_return(i, v);
+       return raw_atomic64_sub_return(i, v);
 }
 
+/**
+ * atomic64_sub_return_acquire() - atomic subtract with acquire ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_sub_return_acquire() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
 atomic64_sub_return_acquire(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_sub_return_acquire(i, v);
+       return raw_atomic64_sub_return_acquire(i, v);
 }
 
+/**
+ * atomic64_sub_return_release() - atomic subtract with release ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_sub_return_release() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
 atomic64_sub_return_release(s64 i, atomic64_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_sub_return_release(i, v);
+       return raw_atomic64_sub_return_release(i, v);
 }
 
+/**
+ * atomic64_sub_return_relaxed() - atomic subtract with relaxed ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_sub_return_relaxed() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
 atomic64_sub_return_relaxed(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_sub_return_relaxed(i, v);
+       return raw_atomic64_sub_return_relaxed(i, v);
 }
 
+/**
+ * atomic64_fetch_sub() - atomic subtract with full ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_sub() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_sub(s64 i, atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_sub(i, v);
+       return raw_atomic64_fetch_sub(i, v);
 }
 
+/**
+ * atomic64_fetch_sub_acquire() - atomic subtract with acquire ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_sub_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_sub_acquire(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_sub_acquire(i, v);
+       return raw_atomic64_fetch_sub_acquire(i, v);
 }
 
+/**
+ * atomic64_fetch_sub_release() - atomic subtract with release ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_sub_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_sub_release(s64 i, atomic64_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_sub_release(i, v);
+       return raw_atomic64_fetch_sub_release(i, v);
 }
 
+/**
+ * atomic64_fetch_sub_relaxed() - atomic subtract with relaxed ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_sub_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_sub_relaxed(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_sub_relaxed(i, v);
+       return raw_atomic64_fetch_sub_relaxed(i, v);
 }
 
+/**
+ * atomic64_inc() - atomic increment with relaxed ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + 1) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_inc() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic64_inc(atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic64_inc(v);
+       raw_atomic64_inc(v);
 }
 
+/**
+ * atomic64_inc_return() - atomic increment with full ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_inc_return() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
 atomic64_inc_return(atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_inc_return(v);
+       return raw_atomic64_inc_return(v);
 }
 
+/**
+ * atomic64_inc_return_acquire() - atomic increment with acquire ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + 1) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_inc_return_acquire() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
 atomic64_inc_return_acquire(atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_inc_return_acquire(v);
+       return raw_atomic64_inc_return_acquire(v);
 }
 
+/**
+ * atomic64_inc_return_release() - atomic increment with release ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + 1) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_inc_return_release() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
 atomic64_inc_return_release(atomic64_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_inc_return_release(v);
+       return raw_atomic64_inc_return_release(v);
 }
 
+/**
+ * atomic64_inc_return_relaxed() - atomic increment with relaxed ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + 1) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_inc_return_relaxed() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
 atomic64_inc_return_relaxed(atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_inc_return_relaxed(v);
+       return raw_atomic64_inc_return_relaxed(v);
 }
 
+/**
+ * atomic64_fetch_inc() - atomic increment with full ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_inc() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_inc(atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_inc(v);
+       return raw_atomic64_fetch_inc(v);
 }
 
+/**
+ * atomic64_fetch_inc_acquire() - atomic increment with acquire ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + 1) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_inc_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_inc_acquire(atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_inc_acquire(v);
+       return raw_atomic64_fetch_inc_acquire(v);
 }
 
+/**
+ * atomic64_fetch_inc_release() - atomic increment with release ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + 1) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_inc_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_inc_release(atomic64_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_inc_release(v);
+       return raw_atomic64_fetch_inc_release(v);
 }
 
+/**
+ * atomic64_fetch_inc_relaxed() - atomic increment with relaxed ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + 1) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_inc_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_inc_relaxed(atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_inc_relaxed(v);
+       return raw_atomic64_fetch_inc_relaxed(v);
 }
 
+/**
+ * atomic64_dec() - atomic decrement with relaxed ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - 1) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_dec() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic64_dec(atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic64_dec(v);
+       raw_atomic64_dec(v);
 }
 
+/**
+ * atomic64_dec_return() - atomic decrement with full ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_dec_return() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
 atomic64_dec_return(atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_dec_return(v);
+       return raw_atomic64_dec_return(v);
 }
 
+/**
+ * atomic64_dec_return_acquire() - atomic decrement with acquire ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - 1) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_dec_return_acquire() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
 atomic64_dec_return_acquire(atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_dec_return_acquire(v);
+       return raw_atomic64_dec_return_acquire(v);
 }
 
+/**
+ * atomic64_dec_return_release() - atomic decrement with release ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - 1) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_dec_return_release() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
 atomic64_dec_return_release(atomic64_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_dec_return_release(v);
+       return raw_atomic64_dec_return_release(v);
 }
 
+/**
+ * atomic64_dec_return_relaxed() - atomic decrement with relaxed ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - 1) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_dec_return_relaxed() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline s64
 atomic64_dec_return_relaxed(atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_dec_return_relaxed(v);
+       return raw_atomic64_dec_return_relaxed(v);
 }
 
+/**
+ * atomic64_fetch_dec() - atomic decrement with full ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_dec() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_dec(atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_dec(v);
+       return raw_atomic64_fetch_dec(v);
 }
 
+/**
+ * atomic64_fetch_dec_acquire() - atomic decrement with acquire ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - 1) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_dec_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_dec_acquire(atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_dec_acquire(v);
+       return raw_atomic64_fetch_dec_acquire(v);
 }
 
+/**
+ * atomic64_fetch_dec_release() - atomic decrement with release ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - 1) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_dec_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_dec_release(atomic64_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_dec_release(v);
+       return raw_atomic64_fetch_dec_release(v);
 }
 
+/**
+ * atomic64_fetch_dec_relaxed() - atomic decrement with relaxed ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - 1) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_dec_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_dec_relaxed(atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_dec_relaxed(v);
+       return raw_atomic64_fetch_dec_relaxed(v);
 }
 
+/**
+ * atomic64_and() - atomic bitwise AND with relaxed ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_and() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic64_and(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic64_and(i, v);
+       raw_atomic64_and(i, v);
 }
 
+/**
+ * atomic64_fetch_and() - atomic bitwise AND with full ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_and() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_and(s64 i, atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_and(i, v);
+       return raw_atomic64_fetch_and(i, v);
 }
 
+/**
+ * atomic64_fetch_and_acquire() - atomic bitwise AND with acquire ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_and_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_and_acquire(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_and_acquire(i, v);
+       return raw_atomic64_fetch_and_acquire(i, v);
 }
 
+/**
+ * atomic64_fetch_and_release() - atomic bitwise AND with release ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_and_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_and_release(s64 i, atomic64_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_and_release(i, v);
+       return raw_atomic64_fetch_and_release(i, v);
 }
 
+/**
+ * atomic64_fetch_and_relaxed() - atomic bitwise AND with relaxed ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_and_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_and_relaxed(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_and_relaxed(i, v);
+       return raw_atomic64_fetch_and_relaxed(i, v);
 }
 
+/**
+ * atomic64_andnot() - atomic bitwise AND NOT with relaxed ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & ~@i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_andnot() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic64_andnot(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic64_andnot(i, v);
+       raw_atomic64_andnot(i, v);
 }
 
+/**
+ * atomic64_fetch_andnot() - atomic bitwise AND NOT with full ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & ~@i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_andnot() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_andnot(s64 i, atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_andnot(i, v);
+       return raw_atomic64_fetch_andnot(i, v);
 }
 
+/**
+ * atomic64_fetch_andnot_acquire() - atomic bitwise AND NOT with acquire ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & ~@i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_andnot_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_andnot_acquire(i, v);
+       return raw_atomic64_fetch_andnot_acquire(i, v);
 }
 
+/**
+ * atomic64_fetch_andnot_release() - atomic bitwise AND NOT with release ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & ~@i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_andnot_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_andnot_release(i, v);
+       return raw_atomic64_fetch_andnot_release(i, v);
 }
 
+/**
+ * atomic64_fetch_andnot_relaxed() - atomic bitwise AND NOT with relaxed ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v & ~@i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_andnot_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_andnot_relaxed(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_andnot_relaxed(i, v);
+       return raw_atomic64_fetch_andnot_relaxed(i, v);
 }
 
+/**
+ * atomic64_or() - atomic bitwise OR with relaxed ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v | @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_or() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic64_or(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic64_or(i, v);
+       raw_atomic64_or(i, v);
 }
 
+/**
+ * atomic64_fetch_or() - atomic bitwise OR with full ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v | @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_or() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_or(s64 i, atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_or(i, v);
+       return raw_atomic64_fetch_or(i, v);
 }
 
+/**
+ * atomic64_fetch_or_acquire() - atomic bitwise OR with acquire ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v | @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_or_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_or_acquire(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_or_acquire(i, v);
+       return raw_atomic64_fetch_or_acquire(i, v);
 }
 
+/**
+ * atomic64_fetch_or_release() - atomic bitwise OR with release ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v | @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_or_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_or_release(s64 i, atomic64_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_or_release(i, v);
+       return raw_atomic64_fetch_or_release(i, v);
 }
 
+/**
+ * atomic64_fetch_or_relaxed() - atomic bitwise OR with relaxed ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v | @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_or_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_or_relaxed(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_or_relaxed(i, v);
+       return raw_atomic64_fetch_or_relaxed(i, v);
 }
 
+/**
+ * atomic64_xor() - atomic bitwise XOR with relaxed ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v ^ @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_xor() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic64_xor(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic64_xor(i, v);
+       raw_atomic64_xor(i, v);
 }
 
+/**
+ * atomic64_fetch_xor() - atomic bitwise XOR with full ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v ^ @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_xor() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_xor(s64 i, atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_xor(i, v);
+       return raw_atomic64_fetch_xor(i, v);
 }
 
+/**
+ * atomic64_fetch_xor_acquire() - atomic bitwise XOR with acquire ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v ^ @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_xor_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_xor_acquire(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_xor_acquire(i, v);
+       return raw_atomic64_fetch_xor_acquire(i, v);
 }
 
+/**
+ * atomic64_fetch_xor_release() - atomic bitwise XOR with release ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v ^ @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_xor_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_xor_release(s64 i, atomic64_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_xor_release(i, v);
+       return raw_atomic64_fetch_xor_release(i, v);
 }
 
+/**
+ * atomic64_fetch_xor_relaxed() - atomic bitwise XOR with relaxed ordering
+ * @i: s64 value
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v ^ @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_xor_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_xor_relaxed(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_xor_relaxed(i, v);
+       return raw_atomic64_fetch_xor_relaxed(i, v);
 }
 
+/**
+ * atomic64_xchg() - atomic exchange with full ordering
+ * @v: pointer to atomic64_t
+ * @new: s64 value to assign
+ *
+ * Atomically updates @v to @new with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_xchg() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-atomic64_xchg(atomic64_t *v, s64 i)
+atomic64_xchg(atomic64_t *v, s64 new)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_xchg(v, i);
+       return raw_atomic64_xchg(v, new);
 }
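/*
 * Illustrative sketch (editorial, not part of this header): a common use of
 * atomic64_xchg() is to atomically read an accumulated counter and reset it
 * to zero in one fully ordered step, e.g. when draining statistics. The
 * helper name below is hypothetical.
 */
static inline s64 example_drain_stat(atomic64_t *stat)
{
	/* returns the accumulated value and leaves 0 behind */
	return atomic64_xchg(stat, 0);
}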
 
+/**
+ * atomic64_xchg_acquire() - atomic exchange with acquire ordering
+ * @v: pointer to atomic64_t
+ * @new: s64 value to assign
+ *
+ * Atomically updates @v to @new with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_xchg_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-atomic64_xchg_acquire(atomic64_t *v, s64 i)
+atomic64_xchg_acquire(atomic64_t *v, s64 new)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_xchg_acquire(v, i);
+       return raw_atomic64_xchg_acquire(v, new);
 }
 
+/**
+ * atomic64_xchg_release() - atomic exchange with release ordering
+ * @v: pointer to atomic64_t
+ * @new: s64 value to assign
+ *
+ * Atomically updates @v to @new with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_xchg_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-atomic64_xchg_release(atomic64_t *v, s64 i)
+atomic64_xchg_release(atomic64_t *v, s64 new)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_xchg_release(v, i);
+       return raw_atomic64_xchg_release(v, new);
 }
 
+/**
+ * atomic64_xchg_relaxed() - atomic exchange with relaxed ordering
+ * @v: pointer to atomic64_t
+ * @new: s64 value to assign
+ *
+ * Atomically updates @v to @new with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_xchg_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
-atomic64_xchg_relaxed(atomic64_t *v, s64 i)
+atomic64_xchg_relaxed(atomic64_t *v, s64 new)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_xchg_relaxed(v, i);
+       return raw_atomic64_xchg_relaxed(v, new);
 }
 
+/**
+ * atomic64_cmpxchg() - atomic compare and exchange with full ordering
+ * @v: pointer to atomic64_t
+ * @old: s64 value to compare with
+ * @new: s64 value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_cmpxchg() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_cmpxchg(v, old, new);
+       return raw_atomic64_cmpxchg(v, old, new);
 }
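/*
 * Illustrative sketch (editorial, not part of this header): the classic
 * read/compute/cmpxchg retry loop. The hypothetical helper below atomically
 * doubles a counter, retrying until no other CPU has changed it in between.
 */
static inline void example_atomic64_double(atomic64_t *counter)
{
	s64 old, seen;

	old = atomic64_read(counter);
	for (;;) {
		seen = atomic64_cmpxchg(counter, old, old * 2);
		if (seen == old)
			break;		/* exchange happened */
		old = seen;		/* lost the race, retry with the new value */
	}
}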
 
+/**
+ * atomic64_cmpxchg_acquire() - atomic compare and exchange with acquire ordering
+ * @v: pointer to atomic64_t
+ * @old: s64 value to compare with
+ * @new: s64 value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_cmpxchg_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_cmpxchg_acquire(atomic64_t *v, s64 old, s64 new)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_cmpxchg_acquire(v, old, new);
+       return raw_atomic64_cmpxchg_acquire(v, old, new);
 }
 
+/**
+ * atomic64_cmpxchg_release() - atomic compare and exchange with release ordering
+ * @v: pointer to atomic64_t
+ * @old: s64 value to compare with
+ * @new: s64 value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_cmpxchg_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_cmpxchg_release(atomic64_t *v, s64 old, s64 new)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_cmpxchg_release(v, old, new);
+       return raw_atomic64_cmpxchg_release(v, old, new);
 }
 
+/**
+ * atomic64_cmpxchg_relaxed() - atomic compare and exchange with relaxed ordering
+ * @v: pointer to atomic64_t
+ * @old: s64 value to compare with
+ * @new: s64 value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_cmpxchg_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_cmpxchg_relaxed(atomic64_t *v, s64 old, s64 new)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_cmpxchg_relaxed(v, old, new);
+       return raw_atomic64_cmpxchg_relaxed(v, old, new);
 }
 
+/**
+ * atomic64_try_cmpxchg() - atomic compare and exchange with full ordering
+ * @v: pointer to atomic64_t
+ * @old: pointer to s64 value to compare with
+ * @new: s64 value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with full ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_try_cmpxchg() there.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.

+ */
 static __always_inline bool
 atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
        instrument_atomic_read_write(old, sizeof(*old));
-       return arch_atomic64_try_cmpxchg(v, old, new);
-}
-
+       return raw_atomic64_try_cmpxchg(v, old, new);
+}
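/*
 * Illustrative sketch (editorial, not part of this header): try_cmpxchg()
 * updates @old on failure, which makes retry loops terser than the plain
 * cmpxchg() form. The hypothetical helper below increments a counter only
 * while it is below a limit.
 */
static inline bool example_inc_below(atomic64_t *v, s64 limit)
{
	s64 old = atomic64_read(v);

	do {
		if (old >= limit)
			return false;
	} while (!atomic64_try_cmpxchg(v, &old, old + 1));

	return true;
}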
+
+/**
+ * atomic64_try_cmpxchg_acquire() - atomic compare and exchange with acquire ordering
+ * @v: pointer to atomic64_t
+ * @old: pointer to s64 value to compare with
+ * @new: s64 value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with acquire ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_try_cmpxchg_acquire() there.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
 atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
 {
        instrument_atomic_read_write(v, sizeof(*v));
        instrument_atomic_read_write(old, sizeof(*old));
-       return arch_atomic64_try_cmpxchg_acquire(v, old, new);
-}
-
+       return raw_atomic64_try_cmpxchg_acquire(v, old, new);
+}
+
+/**
+ * atomic64_try_cmpxchg_release() - atomic compare and exchange with release ordering
+ * @v: pointer to atomic64_t
+ * @old: pointer to s64 value to compare with
+ * @new: s64 value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with release ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_try_cmpxchg_release() there.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
 atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
        instrument_atomic_read_write(old, sizeof(*old));
-       return arch_atomic64_try_cmpxchg_release(v, old, new);
-}
-
+       return raw_atomic64_try_cmpxchg_release(v, old, new);
+}
+
+/**
+ * atomic64_try_cmpxchg_relaxed() - atomic compare and exchange with relaxed ordering
+ * @v: pointer to atomic64_t
+ * @old: pointer to s64 value to compare with
+ * @new: s64 value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with relaxed ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_try_cmpxchg_relaxed() there.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
 atomic64_try_cmpxchg_relaxed(atomic64_t *v, s64 *old, s64 new)
 {
        instrument_atomic_read_write(v, sizeof(*v));
        instrument_atomic_read_write(old, sizeof(*old));
-       return arch_atomic64_try_cmpxchg_relaxed(v, old, new);
-}
-
+       return raw_atomic64_try_cmpxchg_relaxed(v, old, new);
+}
+
+/**
+ * atomic64_sub_and_test() - atomic subtract and test if zero with full ordering
+ * @i: s64 value to subtract
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_sub_and_test() there.
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
+ */
 static __always_inline bool
 atomic64_sub_and_test(s64 i, atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_sub_and_test(i, v);
+       return raw_atomic64_sub_and_test(i, v);
 }
 
+/**
+ * atomic64_dec_and_test() - atomic decrement and test if zero with full ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_dec_and_test() there.
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
+ */
 static __always_inline bool
 atomic64_dec_and_test(atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_dec_and_test(v);
+       return raw_atomic64_dec_and_test(v);
 }
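/*
 * Illustrative sketch (editorial, not part of this header): the reference
 * drop pattern. Full ordering guarantees that all prior accesses to the
 * object are complete before the final holder frees it. The struct and the
 * free helper are hypothetical.
 */
struct example_obj {
	atomic64_t refs;
};

extern void example_obj_free(struct example_obj *obj);

static inline void example_obj_put(struct example_obj *obj)
{
	if (atomic64_dec_and_test(&obj->refs))
		example_obj_free(obj);
}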
 
+/**
+ * atomic64_inc_and_test() - atomic increment and test if zero with full ordering
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_inc_and_test() there.
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
+ */
 static __always_inline bool
 atomic64_inc_and_test(atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_inc_and_test(v);
+       return raw_atomic64_inc_and_test(v);
 }
 
+/**
+ * atomic64_add_negative() - atomic add and test if negative with full ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_add_negative() there.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
+ */
 static __always_inline bool
 atomic64_add_negative(s64 i, atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_add_negative(i, v);
+       return raw_atomic64_add_negative(i, v);
 }
 
+/**
+ * atomic64_add_negative_acquire() - atomic add and test if negative with acquire ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_add_negative_acquire() there.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
+ */
 static __always_inline bool
 atomic64_add_negative_acquire(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_add_negative_acquire(i, v);
+       return raw_atomic64_add_negative_acquire(i, v);
 }
 
+/**
+ * atomic64_add_negative_release() - atomic add and test if negative with release ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_add_negative_release() there.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
+ */
 static __always_inline bool
 atomic64_add_negative_release(s64 i, atomic64_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_add_negative_release(i, v);
+       return raw_atomic64_add_negative_release(i, v);
 }
 
+/**
+ * atomic64_add_negative_relaxed() - atomic add and test if negative with relaxed ordering
+ * @i: s64 value to add
+ * @v: pointer to atomic64_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_add_negative_relaxed() there.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
+ */
 static __always_inline bool
 atomic64_add_negative_relaxed(s64 i, atomic64_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_add_negative_relaxed(i, v);
+       return raw_atomic64_add_negative_relaxed(i, v);
 }
 
+/**
+ * atomic64_fetch_add_unless() - atomic add unless value with full ordering
+ * @v: pointer to atomic64_t
+ * @a: s64 value to add
+ * @u: s64 value to compare with
+ *
+ * If (@v != @u), atomically updates @v to (@v + @a) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_fetch_add_unless() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline s64
 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_fetch_add_unless(v, a, u);
+       return raw_atomic64_fetch_add_unless(v, a, u);
 }
 
+/**
+ * atomic64_add_unless() - atomic add unless value with full ordering
+ * @v: pointer to atomic64_t
+ * @a: s64 value to add
+ * @u: s64 value to compare with
+ *
+ * If (@v != @u), atomically updates @v to (@v + @a) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_add_unless() there.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
 atomic64_add_unless(atomic64_t *v, s64 a, s64 u)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_add_unless(v, a, u);
+       return raw_atomic64_add_unless(v, a, u);
 }
 
+/**
+ * atomic64_inc_not_zero() - atomic increment unless zero with full ordering
+ * @v: pointer to atomic64_t
+ *
+ * If (@v != 0), atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_inc_not_zero() there.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
 atomic64_inc_not_zero(atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_inc_not_zero(v);
+       return raw_atomic64_inc_not_zero(v);
 }
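/*
 * Illustrative sketch (editorial, not part of this header): the counterpart
 * to the put pattern above, taking a reference only if the count has not
 * already dropped to zero (e.g. during a lookup racing with teardown). The
 * struct is hypothetical.
 */
struct example_ref {
	atomic64_t refs;
};

static inline bool example_ref_tryget(struct example_ref *ref)
{
	return atomic64_inc_not_zero(&ref->refs);
}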
 
+/**
+ * atomic64_inc_unless_negative() - atomic increment unless negative with full ordering
+ * @v: pointer to atomic64_t
+ *
+ * If (@v >= 0), atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_inc_unless_negative() there.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
 atomic64_inc_unless_negative(atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_inc_unless_negative(v);
+       return raw_atomic64_inc_unless_negative(v);
 }
 
+/**
+ * atomic64_dec_unless_positive() - atomic decrement unless positive with full ordering
+ * @v: pointer to atomic64_t
+ *
+ * If (@v <= 0), atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_dec_unless_positive() there.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
 atomic64_dec_unless_positive(atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_dec_unless_positive(v);
+       return raw_atomic64_dec_unless_positive(v);
 }
 
+/**
+ * atomic64_dec_if_positive() - atomic decrement if positive with full ordering
+ * @v: pointer to atomic64_t
+ *
+ * If (@v > 0), atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic64_dec_if_positive() there.
+ *
+ * Return: The old value of (@v - 1), regardless of whether @v was updated.
+ */
 static __always_inline s64
 atomic64_dec_if_positive(atomic64_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic64_dec_if_positive(v);
+       return raw_atomic64_dec_if_positive(v);
 }
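/*
 * Illustrative sketch (editorial, not part of this header): consuming one
 * credit from a budget counter, but only if a credit is available. Since
 * atomic64_dec_if_positive() returns the old value minus one, a
 * non-negative result means a credit was actually taken. The helper name
 * is hypothetical.
 */
static inline bool example_take_credit(atomic64_t *budget)
{
	return atomic64_dec_if_positive(budget) >= 0;
}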
 
+/**
+ * atomic_long_read() - atomic load with relaxed ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically loads the value of @v with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_read() there.
+ *
+ * Return: The value loaded from @v.
+ */
 static __always_inline long
 atomic_long_read(const atomic_long_t *v)
 {
        instrument_atomic_read(v, sizeof(*v));
-       return arch_atomic_long_read(v);
-}
-
+       return raw_atomic_long_read(v);
+}
+
+/**
+ * atomic_long_read_acquire() - atomic load with acquire ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically loads the value of @v with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_read_acquire() there.
+ *
+ * Return: The value loaded from @v.
+ */
 static __always_inline long
 atomic_long_read_acquire(const atomic_long_t *v)
 {
        instrument_atomic_read(v, sizeof(*v));
-       return arch_atomic_long_read_acquire(v);
-}
-
+       return raw_atomic_long_read_acquire(v);
+}
+
+/**
+ * atomic_long_set() - atomic set with relaxed ordering
+ * @v: pointer to atomic_long_t
+ * @i: long value to assign
+ *
+ * Atomically sets @v to @i with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_set() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_long_set(atomic_long_t *v, long i)
 {
        instrument_atomic_write(v, sizeof(*v));
-       arch_atomic_long_set(v, i);
-}
-
+       raw_atomic_long_set(v, i);
+}
+
+/**
+ * atomic_long_set_release() - atomic set with release ordering
+ * @v: pointer to atomic_long_t
+ * @i: long value to assign
+ *
+ * Atomically sets @v to @i with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_set_release() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_long_set_release(atomic_long_t *v, long i)
 {
        kcsan_release();
        instrument_atomic_write(v, sizeof(*v));
-       arch_atomic_long_set_release(v, i);
-}
-
+       raw_atomic_long_set_release(v, i);
+}
+
+/**
+ * atomic_long_add() - atomic add with relaxed ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_add() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_long_add(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic_long_add(i, v);
+       raw_atomic_long_add(i, v);
 }
 
+/**
+ * atomic_long_add_return() - atomic add with full ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_add_return() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
 atomic_long_add_return(long i, atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_add_return(i, v);
+       return raw_atomic_long_add_return(i, v);
 }
 
+/**
+ * atomic_long_add_return_acquire() - atomic add with acquire ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_add_return_acquire() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
 atomic_long_add_return_acquire(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_add_return_acquire(i, v);
+       return raw_atomic_long_add_return_acquire(i, v);
 }
 
+/**
+ * atomic_long_add_return_release() - atomic add with release ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_add_return_release() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
 atomic_long_add_return_release(long i, atomic_long_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_add_return_release(i, v);
+       return raw_atomic_long_add_return_release(i, v);
 }
 
+/**
+ * atomic_long_add_return_relaxed() - atomic add with relaxed ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_add_return_relaxed() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
 atomic_long_add_return_relaxed(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_add_return_relaxed(i, v);
+       return raw_atomic_long_add_return_relaxed(i, v);
 }
 
+/**
+ * atomic_long_fetch_add() - atomic add with full ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_add() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_add(long i, atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_add(i, v);
+       return raw_atomic_long_fetch_add(i, v);
 }
 
+/**
+ * atomic_long_fetch_add_acquire() - atomic add with acquire ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_add_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_add_acquire(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_add_acquire(i, v);
+       return raw_atomic_long_fetch_add_acquire(i, v);
 }
 
+/**
+ * atomic_long_fetch_add_release() - atomic add with release ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_add_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_add_release(long i, atomic_long_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_add_release(i, v);
+       return raw_atomic_long_fetch_add_release(i, v);
 }
 
+/**
+ * atomic_long_fetch_add_relaxed() - atomic add with relaxed ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_add_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_add_relaxed(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_add_relaxed(i, v);
+       return raw_atomic_long_fetch_add_relaxed(i, v);
 }
 
+/**
+ * atomic_long_sub() - atomic subtract with relaxed ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_sub() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_long_sub(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic_long_sub(i, v);
+       raw_atomic_long_sub(i, v);
 }
 
+/**
+ * atomic_long_sub_return() - atomic subtract with full ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_sub_return() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
 atomic_long_sub_return(long i, atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_sub_return(i, v);
+       return raw_atomic_long_sub_return(i, v);
 }
 
+/**
+ * atomic_long_sub_return_acquire() - atomic subtract with acquire ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_sub_return_acquire() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
 atomic_long_sub_return_acquire(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_sub_return_acquire(i, v);
+       return raw_atomic_long_sub_return_acquire(i, v);
 }
 
+/**
+ * atomic_long_sub_return_release() - atomic subtract with release ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_sub_return_release() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
 atomic_long_sub_return_release(long i, atomic_long_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_sub_return_release(i, v);
+       return raw_atomic_long_sub_return_release(i, v);
 }
 
+/**
+ * atomic_long_sub_return_relaxed() - atomic subtract with relaxed ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_sub_return_relaxed() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
 atomic_long_sub_return_relaxed(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_sub_return_relaxed(i, v);
+       return raw_atomic_long_sub_return_relaxed(i, v);
 }
 
+/**
+ * atomic_long_fetch_sub() - atomic subtract with full ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_sub() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_sub(long i, atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_sub(i, v);
+       return raw_atomic_long_fetch_sub(i, v);
 }
 
+/**
+ * atomic_long_fetch_sub_acquire() - atomic subtract with acquire ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_sub_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_sub_acquire(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_sub_acquire(i, v);
+       return raw_atomic_long_fetch_sub_acquire(i, v);
 }
 
+/**
+ * atomic_long_fetch_sub_release() - atomic subtract with release ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_sub_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_sub_release(long i, atomic_long_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_sub_release(i, v);
+       return raw_atomic_long_fetch_sub_release(i, v);
 }
 
+/**
+ * atomic_long_fetch_sub_relaxed() - atomic subtract with relaxed ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_sub_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_sub_relaxed(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_sub_relaxed(i, v);
+       return raw_atomic_long_fetch_sub_relaxed(i, v);
 }
 
+/**
+ * atomic_long_inc() - atomic increment with relaxed ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_inc() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_long_inc(atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic_long_inc(v);
+       raw_atomic_long_inc(v);
 }
 
+/**
+ * atomic_long_inc_return() - atomic increment with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_inc_return() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
 atomic_long_inc_return(atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_inc_return(v);
+       return raw_atomic_long_inc_return(v);
 }
 
+/**
+ * atomic_long_inc_return_acquire() - atomic increment with acquire ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_inc_return_acquire() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
 atomic_long_inc_return_acquire(atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_inc_return_acquire(v);
+       return raw_atomic_long_inc_return_acquire(v);
 }
 
+/**
+ * atomic_long_inc_return_release() - atomic increment with release ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_inc_return_release() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
 atomic_long_inc_return_release(atomic_long_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_inc_return_release(v);
+       return raw_atomic_long_inc_return_release(v);
 }
 
+/**
+ * atomic_long_inc_return_relaxed() - atomic increment with relaxed ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_inc_return_relaxed() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
 atomic_long_inc_return_relaxed(atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_inc_return_relaxed(v);
+       return raw_atomic_long_inc_return_relaxed(v);
 }
 
+/**
+ * atomic_long_fetch_inc() - atomic increment with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_inc() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_inc(atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_inc(v);
+       return raw_atomic_long_fetch_inc(v);
 }
 
+/**
+ * atomic_long_fetch_inc_acquire() - atomic increment with acquire ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_inc_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_inc_acquire(atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_inc_acquire(v);
+       return raw_atomic_long_fetch_inc_acquire(v);
 }
 
+/**
+ * atomic_long_fetch_inc_release() - atomic increment with release ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_inc_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_inc_release(atomic_long_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_inc_release(v);
+       return raw_atomic_long_fetch_inc_release(v);
 }
 
+/**
+ * atomic_long_fetch_inc_relaxed() - atomic increment with relaxed ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_inc_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_inc_relaxed(atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_inc_relaxed(v);
+       return raw_atomic_long_fetch_inc_relaxed(v);
 }
 
+/**
+ * atomic_long_dec() - atomic decrement with relaxed ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_dec() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_long_dec(atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic_long_dec(v);
+       raw_atomic_long_dec(v);
 }
 
+/**
+ * atomic_long_dec_return() - atomic decrement with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_dec_return() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
 atomic_long_dec_return(atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_dec_return(v);
+       return raw_atomic_long_dec_return(v);
 }
 
+/**
+ * atomic_long_dec_return_acquire() - atomic decrement with acquire ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_dec_return_acquire() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
 atomic_long_dec_return_acquire(atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_dec_return_acquire(v);
+       return raw_atomic_long_dec_return_acquire(v);
 }
 
+/**
+ * atomic_long_dec_return_release() - atomic decrement with release ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_dec_return_release() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
 atomic_long_dec_return_release(atomic_long_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_dec_return_release(v);
+       return raw_atomic_long_dec_return_release(v);
 }
 
+/**
+ * atomic_long_dec_return_relaxed() - atomic decrement with relaxed ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_dec_return_relaxed() there.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
 atomic_long_dec_return_relaxed(atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_dec_return_relaxed(v);
+       return raw_atomic_long_dec_return_relaxed(v);
 }
 
+/**
+ * atomic_long_fetch_dec() - atomic decrement with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_dec() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_dec(atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_dec(v);
+       return raw_atomic_long_fetch_dec(v);
 }
 
+/**
+ * atomic_long_fetch_dec_acquire() - atomic decrement with acquire ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_dec_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_dec_acquire(atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_dec_acquire(v);
+       return raw_atomic_long_fetch_dec_acquire(v);
 }
 
+/**
+ * atomic_long_fetch_dec_release() - atomic decrement with release ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_dec_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_dec_release(atomic_long_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_dec_release(v);
+       return raw_atomic_long_fetch_dec_release(v);
 }
 
+/**
+ * atomic_long_fetch_dec_relaxed() - atomic decrement with relaxed ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_dec_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_dec_relaxed(atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_dec_relaxed(v);
+       return raw_atomic_long_fetch_dec_relaxed(v);
 }
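
As the kernel-doc above spells out, the _return variants hand back the updated value while the _fetch variants hand back the original one. A minimal illustrative sketch of that difference (not part of the patch; the counter and function name are made up for the example):

/* Illustrative only: contrast the _return and _fetch decrement variants. */
#include <linux/atomic.h>

static atomic_long_t example_counter = ATOMIC_LONG_INIT(4);

static void example_fetch_vs_return(void)
{
        long after  = atomic_long_dec_return(&example_counter);  /* returns 3, counter is now 3 */
        long before = atomic_long_fetch_dec(&example_counter);   /* returns 3, counter is now 2 */

        (void)after;
        (void)before;
}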
 
+/**
+ * atomic_long_and() - atomic bitwise AND with relaxed ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_and() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_long_and(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic_long_and(i, v);
+       raw_atomic_long_and(i, v);
 }
 
+/**
+ * atomic_long_fetch_and() - atomic bitwise AND with full ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_and() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_and(long i, atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_and(i, v);
+       return raw_atomic_long_fetch_and(i, v);
 }
 
+/**
+ * atomic_long_fetch_and_acquire() - atomic bitwise AND with acquire ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_and_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_and_acquire(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_and_acquire(i, v);
+       return raw_atomic_long_fetch_and_acquire(i, v);
 }
 
+/**
+ * atomic_long_fetch_and_release() - atomic bitwise AND with release ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_and_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_and_release(long i, atomic_long_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_and_release(i, v);
+       return raw_atomic_long_fetch_and_release(i, v);
 }
 
+/**
+ * atomic_long_fetch_and_relaxed() - atomic bitwise AND with relaxed ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_and_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_and_relaxed(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_and_relaxed(i, v);
+       return raw_atomic_long_fetch_and_relaxed(i, v);
 }
 
+/**
+ * atomic_long_andnot() - atomic bitwise AND NOT with relaxed ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & ~@i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_andnot() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_long_andnot(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic_long_andnot(i, v);
+       raw_atomic_long_andnot(i, v);
 }
 
+/**
+ * atomic_long_fetch_andnot() - atomic bitwise AND NOT with full ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & ~@i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_andnot() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_andnot(long i, atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_andnot(i, v);
+       return raw_atomic_long_fetch_andnot(i, v);
 }
 
+/**
+ * atomic_long_fetch_andnot_acquire() - atomic bitwise AND NOT with acquire ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & ~@i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_andnot_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_andnot_acquire(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_andnot_acquire(i, v);
+       return raw_atomic_long_fetch_andnot_acquire(i, v);
 }
 
+/**
+ * atomic_long_fetch_andnot_release() - atomic bitwise AND NOT with release ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & ~@i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_andnot_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_andnot_release(long i, atomic_long_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_andnot_release(i, v);
+       return raw_atomic_long_fetch_andnot_release(i, v);
 }
 
+/**
+ * atomic_long_fetch_andnot_relaxed() - atomic bitwise AND NOT with relaxed ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & ~@i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_andnot_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_andnot_relaxed(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_andnot_relaxed(i, v);
+       return raw_atomic_long_fetch_andnot_relaxed(i, v);
 }
 
+/**
+ * atomic_long_or() - atomic bitwise OR with relaxed ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v | @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_or() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_long_or(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic_long_or(i, v);
+       raw_atomic_long_or(i, v);
 }
 
+/**
+ * atomic_long_fetch_or() - atomic bitwise OR with full ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v | @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_or() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_or(long i, atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_or(i, v);
+       return raw_atomic_long_fetch_or(i, v);
 }
 
+/**
+ * atomic_long_fetch_or_acquire() - atomic bitwise OR with acquire ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v | @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_or_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_or_acquire(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_or_acquire(i, v);
+       return raw_atomic_long_fetch_or_acquire(i, v);
 }
 
+/**
+ * atomic_long_fetch_or_release() - atomic bitwise OR with release ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v | @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_or_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_or_release(long i, atomic_long_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_or_release(i, v);
+       return raw_atomic_long_fetch_or_release(i, v);
 }
 
+/**
+ * atomic_long_fetch_or_relaxed() - atomic bitwise OR with relaxed ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v | @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_or_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_or_relaxed(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_or_relaxed(i, v);
+       return raw_atomic_long_fetch_or_relaxed(i, v);
 }
 
+/**
+ * atomic_long_xor() - atomic bitwise XOR with relaxed ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v ^ @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_xor() there.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
 atomic_long_xor(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       arch_atomic_long_xor(i, v);
+       raw_atomic_long_xor(i, v);
 }
 
+/**
+ * atomic_long_fetch_xor() - atomic bitwise XOR with full ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v ^ @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_xor() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_xor(long i, atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_xor(i, v);
+       return raw_atomic_long_fetch_xor(i, v);
 }
 
+/**
+ * atomic_long_fetch_xor_acquire() - atomic bitwise XOR with acquire ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v ^ @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_xor_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_xor_acquire(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_xor_acquire(i, v);
+       return raw_atomic_long_fetch_xor_acquire(i, v);
 }
 
+/**
+ * atomic_long_fetch_xor_release() - atomic bitwise XOR with release ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v ^ @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_xor_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_xor_release(long i, atomic_long_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_xor_release(i, v);
+       return raw_atomic_long_fetch_xor_release(i, v);
 }
 
+/**
+ * atomic_long_fetch_xor_relaxed() - atomic bitwise XOR with relaxed ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v ^ @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_xor_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_xor_relaxed(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_xor_relaxed(i, v);
+       return raw_atomic_long_fetch_xor_relaxed(i, v);
 }
 
+/**
+ * atomic_long_xchg() - atomic exchange with full ordering
+ * @v: pointer to atomic_long_t
+ * @new: long value to assign
+ *
+ * Atomically updates @v to @new with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_xchg() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-atomic_long_xchg(atomic_long_t *v, long i)
+atomic_long_xchg(atomic_long_t *v, long new)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_xchg(v, i);
+       return raw_atomic_long_xchg(v, new);
 }
 
+/**
+ * atomic_long_xchg_acquire() - atomic exchange with acquire ordering
+ * @v: pointer to atomic_long_t
+ * @new: long value to assign
+ *
+ * Atomically updates @v to @new with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_xchg_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-atomic_long_xchg_acquire(atomic_long_t *v, long i)
+atomic_long_xchg_acquire(atomic_long_t *v, long new)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_xchg_acquire(v, i);
+       return raw_atomic_long_xchg_acquire(v, new);
 }
 
+/**
+ * atomic_long_xchg_release() - atomic exchange with release ordering
+ * @v: pointer to atomic_long_t
+ * @new: long value to assign
+ *
+ * Atomically updates @v to @new with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_xchg_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-atomic_long_xchg_release(atomic_long_t *v, long i)
+atomic_long_xchg_release(atomic_long_t *v, long new)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_xchg_release(v, i);
+       return raw_atomic_long_xchg_release(v, new);
 }
 
+/**
+ * atomic_long_xchg_relaxed() - atomic exchange with relaxed ordering
+ * @v: pointer to atomic_long_t
+ * @new: long value to assign
+ *
+ * Atomically updates @v to @new with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_xchg_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-atomic_long_xchg_relaxed(atomic_long_t *v, long i)
+atomic_long_xchg_relaxed(atomic_long_t *v, long new)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_xchg_relaxed(v, i);
+       return raw_atomic_long_xchg_relaxed(v, new);
 }
 
+/**
+ * atomic_long_cmpxchg() - atomic compare and exchange with full ordering
+ * @v: pointer to atomic_long_t
+ * @old: long value to compare with
+ * @new: long value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_cmpxchg() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_cmpxchg(atomic_long_t *v, long old, long new)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_cmpxchg(v, old, new);
+       return raw_atomic_long_cmpxchg(v, old, new);
 }
 
+/**
+ * atomic_long_cmpxchg_acquire() - atomic compare and exchange with acquire ordering
+ * @v: pointer to atomic_long_t
+ * @old: long value to compare with
+ * @new: long value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_cmpxchg_acquire() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_cmpxchg_acquire(atomic_long_t *v, long old, long new)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_cmpxchg_acquire(v, old, new);
+       return raw_atomic_long_cmpxchg_acquire(v, old, new);
 }
 
+/**
+ * atomic_long_cmpxchg_release() - atomic compare and exchange with release ordering
+ * @v: pointer to atomic_long_t
+ * @old: long value to compare with
+ * @new: long value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_cmpxchg_release() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_cmpxchg_release(atomic_long_t *v, long old, long new)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_cmpxchg_release(v, old, new);
+       return raw_atomic_long_cmpxchg_release(v, old, new);
 }
 
+/**
+ * atomic_long_cmpxchg_relaxed() - atomic compare and exchange with relaxed ordering
+ * @v: pointer to atomic_long_t
+ * @old: long value to compare with
+ * @new: long value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_cmpxchg_relaxed() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_cmpxchg_relaxed(atomic_long_t *v, long old, long new)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_cmpxchg_relaxed(v, old, new);
+       return raw_atomic_long_cmpxchg_relaxed(v, old, new);
 }
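
The cmpxchg family returns the original value of @v, so callers detect whether the swap happened by comparing that return value with the expected one. A minimal sketch, assuming a made-up one-shot "claim" flag (not part of the patch):

/* Illustrative only: cmpxchg-style update, success checked via the returned old value. */
#include <linux/atomic.h>

static atomic_long_t example_state = ATOMIC_LONG_INIT(0);

static bool example_claim(void)
{
        /* The swap happened iff the original value really was 0. */
        return atomic_long_cmpxchg(&example_state, 0, 1) == 0;
}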
 
+/**
+ * atomic_long_try_cmpxchg() - atomic compare and exchange with full ordering
+ * @v: pointer to atomic_long_t
+ * @old: pointer to long value to compare with
+ * @new: long value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with full ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_try_cmpxchg() there.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
 atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
        instrument_atomic_read_write(old, sizeof(*old));
-       return arch_atomic_long_try_cmpxchg(v, old, new);
-}
-
+       return raw_atomic_long_try_cmpxchg(v, old, new);
+}
+
+/**
+ * atomic_long_try_cmpxchg_acquire() - atomic compare and exchange with acquire ordering
+ * @v: pointer to atomic_long_t
+ * @old: pointer to long value to compare with
+ * @new: long value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with acquire ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_try_cmpxchg_acquire() there.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
 atomic_long_try_cmpxchg_acquire(atomic_long_t *v, long *old, long new)
 {
        instrument_atomic_read_write(v, sizeof(*v));
        instrument_atomic_read_write(old, sizeof(*old));
-       return arch_atomic_long_try_cmpxchg_acquire(v, old, new);
-}
-
+       return raw_atomic_long_try_cmpxchg_acquire(v, old, new);
+}
+
+/**
+ * atomic_long_try_cmpxchg_release() - atomic compare and exchange with release ordering
+ * @v: pointer to atomic_long_t
+ * @old: pointer to long value to compare with
+ * @new: long value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with release ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_try_cmpxchg_release() there.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
 atomic_long_try_cmpxchg_release(atomic_long_t *v, long *old, long new)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
        instrument_atomic_read_write(old, sizeof(*old));
-       return arch_atomic_long_try_cmpxchg_release(v, old, new);
-}
-
+       return raw_atomic_long_try_cmpxchg_release(v, old, new);
+}
+
+/**
+ * atomic_long_try_cmpxchg_relaxed() - atomic compare and exchange with relaxed ordering
+ * @v: pointer to atomic_long_t
+ * @old: pointer to long value to compare with
+ * @new: long value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with relaxed ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_try_cmpxchg_relaxed() there.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
 atomic_long_try_cmpxchg_relaxed(atomic_long_t *v, long *old, long new)
 {
        instrument_atomic_read_write(v, sizeof(*v));
        instrument_atomic_read_write(old, sizeof(*old));
-       return arch_atomic_long_try_cmpxchg_relaxed(v, old, new);
-}
-
+       return raw_atomic_long_try_cmpxchg_relaxed(v, old, new);
+}
+
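Because try_cmpxchg() refreshes @old with the current value of @v on failure, a retry loop does not need to re-read @v by hand. A minimal sketch of the usual pattern (illustrative only; the saturating-increment helper is made up):

/* Illustrative only: typical try_cmpxchg() retry loop. */
#include <linux/atomic.h>
#include <linux/limits.h>

static void example_saturating_inc(atomic_long_t *v)
{
        long old = atomic_long_read(v);

        do {
                if (old == LONG_MAX)
                        return;         /* already saturated, nothing to do */
        } while (!atomic_long_try_cmpxchg(v, &old, old + 1));
}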
+/**
+ * atomic_long_sub_and_test() - atomic subtract and test if zero with full ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_sub_and_test() there.
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
+ */
 static __always_inline bool
 atomic_long_sub_and_test(long i, atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_sub_and_test(i, v);
+       return raw_atomic_long_sub_and_test(i, v);
 }
 
+/**
+ * atomic_long_dec_and_test() - atomic decrement and test if zero with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_dec_and_test() there.
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
+ */
 static __always_inline bool
 atomic_long_dec_and_test(atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_dec_and_test(v);
+       return raw_atomic_long_dec_and_test(v);
 }
 
+/**
+ * atomic_long_inc_and_test() - atomic increment and test if zero with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_inc_and_test() there.
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
+ */
 static __always_inline bool
 atomic_long_inc_and_test(atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_inc_and_test(v);
+       return raw_atomic_long_inc_and_test(v);
 }
 
+/**
+ * atomic_long_add_negative() - atomic add and test if negative with full ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_add_negative() there.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
+ */
 static __always_inline bool
 atomic_long_add_negative(long i, atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_add_negative(i, v);
+       return raw_atomic_long_add_negative(i, v);
 }
 
+/**
+ * atomic_long_add_negative_acquire() - atomic add and test if negative with acquire ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with acquire ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_add_negative_acquire() there.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
+ */
 static __always_inline bool
 atomic_long_add_negative_acquire(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_add_negative_acquire(i, v);
+       return raw_atomic_long_add_negative_acquire(i, v);
 }
 
+/**
+ * atomic_long_add_negative_release() - atomic add and test if negative with release ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with release ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_add_negative_release() there.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
+ */
 static __always_inline bool
 atomic_long_add_negative_release(long i, atomic_long_t *v)
 {
        kcsan_release();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_add_negative_release(i, v);
+       return raw_atomic_long_add_negative_release(i, v);
 }
 
+/**
+ * atomic_long_add_negative_relaxed() - atomic add and test if negative with relaxed ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_add_negative_relaxed() there.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
+ */
 static __always_inline bool
 atomic_long_add_negative_relaxed(long i, atomic_long_t *v)
 {
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_add_negative_relaxed(i, v);
+       return raw_atomic_long_add_negative_relaxed(i, v);
 }
 
+/**
+ * atomic_long_fetch_add_unless() - atomic add unless value with full ordering
+ * @v: pointer to atomic_long_t
+ * @a: long value to add
+ * @u: long value to compare with
+ *
+ * If (@v != @u), atomically updates @v to (@v + @a) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_fetch_add_unless() there.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
 atomic_long_fetch_add_unless(atomic_long_t *v, long a, long u)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_fetch_add_unless(v, a, u);
+       return raw_atomic_long_fetch_add_unless(v, a, u);
 }
 
+/**
+ * atomic_long_add_unless() - atomic add unless value with full ordering
+ * @v: pointer to atomic_long_t
+ * @a: long value to add
+ * @u: long value to compare with
+ *
+ * If (@v != @u), atomically updates @v to (@v + @a) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_add_unless() there.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
 atomic_long_add_unless(atomic_long_t *v, long a, long u)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_add_unless(v, a, u);
+       return raw_atomic_long_add_unless(v, a, u);
 }
 
+/**
+ * atomic_long_inc_not_zero() - atomic increment unless zero with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * If (@v != 0), atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_inc_not_zero() there.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
 atomic_long_inc_not_zero(atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_inc_not_zero(v);
+       return raw_atomic_long_inc_not_zero(v);
 }
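
dec_and_test() and inc_not_zero() are the building blocks of the classic get/put reference pattern the comments above describe. A minimal sketch under that assumption (illustrative only; real code would normally reach for refcount_t, and the helper names are invented):

/* Illustrative only: get/put built on the conditional atomics documented above. */
#include <linux/atomic.h>

static bool example_get(atomic_long_t *refs)
{
        /* Fails once the count has reached zero, so a dying object is not revived. */
        return atomic_long_inc_not_zero(refs);
}

static bool example_put(atomic_long_t *refs)
{
        /* Returns true only for the caller that dropped the last reference. */
        return atomic_long_dec_and_test(refs);
}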
 
+/**
+ * atomic_long_inc_unless_negative() - atomic increment unless negative with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * If (@v >= 0), atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_inc_unless_negative() there.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
 atomic_long_inc_unless_negative(atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_inc_unless_negative(v);
+       return raw_atomic_long_inc_unless_negative(v);
 }
 
+/**
+ * atomic_long_dec_unless_positive() - atomic decrement unless positive with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * If (@v <= 0), atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_dec_unless_positive() there.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
 atomic_long_dec_unless_positive(atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_dec_unless_positive(v);
+       return raw_atomic_long_dec_unless_positive(v);
 }
 
+/**
+ * atomic_long_dec_if_positive() - atomic decrement if positive with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * If (@v > 0), atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Unsafe to use in noinstr code; use raw_atomic_long_dec_if_positive() there.
+ *
+ * Return: The old value of (@v - 1), regardless of whether @v was updated.
+ */
 static __always_inline long
 atomic_long_dec_if_positive(atomic_long_t *v)
 {
        kcsan_mb();
        instrument_atomic_read_write(v, sizeof(*v));
-       return arch_atomic_long_dec_if_positive(v);
+       return raw_atomic_long_dec_if_positive(v);
 }
 
 #define xchg(ptr, ...) \
@@ -1949,14 +4713,14 @@ atomic_long_dec_if_positive(atomic_long_t *v)
        typeof(ptr) __ai_ptr = (ptr); \
        kcsan_mb(); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
-       arch_xchg(__ai_ptr, __VA_ARGS__); \
+       raw_xchg(__ai_ptr, __VA_ARGS__); \
 })
 
 #define xchg_acquire(ptr, ...) \
 ({ \
        typeof(ptr) __ai_ptr = (ptr); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
-       arch_xchg_acquire(__ai_ptr, __VA_ARGS__); \
+       raw_xchg_acquire(__ai_ptr, __VA_ARGS__); \
 })
 
 #define xchg_release(ptr, ...) \
@@ -1964,14 +4728,14 @@ atomic_long_dec_if_positive(atomic_long_t *v)
        typeof(ptr) __ai_ptr = (ptr); \
        kcsan_release(); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
-       arch_xchg_release(__ai_ptr, __VA_ARGS__); \
+       raw_xchg_release(__ai_ptr, __VA_ARGS__); \
 })
 
 #define xchg_relaxed(ptr, ...) \
 ({ \
        typeof(ptr) __ai_ptr = (ptr); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
-       arch_xchg_relaxed(__ai_ptr, __VA_ARGS__); \
+       raw_xchg_relaxed(__ai_ptr, __VA_ARGS__); \
 })
 
 #define cmpxchg(ptr, ...) \
@@ -1979,14 +4743,14 @@ atomic_long_dec_if_positive(atomic_long_t *v)
        typeof(ptr) __ai_ptr = (ptr); \
        kcsan_mb(); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
-       arch_cmpxchg(__ai_ptr, __VA_ARGS__); \
+       raw_cmpxchg(__ai_ptr, __VA_ARGS__); \
 })
 
 #define cmpxchg_acquire(ptr, ...) \
 ({ \
        typeof(ptr) __ai_ptr = (ptr); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
-       arch_cmpxchg_acquire(__ai_ptr, __VA_ARGS__); \
+       raw_cmpxchg_acquire(__ai_ptr, __VA_ARGS__); \
 })
 
 #define cmpxchg_release(ptr, ...) \
@@ -1994,14 +4758,14 @@ atomic_long_dec_if_positive(atomic_long_t *v)
        typeof(ptr) __ai_ptr = (ptr); \
        kcsan_release(); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
-       arch_cmpxchg_release(__ai_ptr, __VA_ARGS__); \
+       raw_cmpxchg_release(__ai_ptr, __VA_ARGS__); \
 })
 
 #define cmpxchg_relaxed(ptr, ...) \
 ({ \
        typeof(ptr) __ai_ptr = (ptr); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
-       arch_cmpxchg_relaxed(__ai_ptr, __VA_ARGS__); \
+       raw_cmpxchg_relaxed(__ai_ptr, __VA_ARGS__); \
 })
 
 #define cmpxchg64(ptr, ...) \
@@ -2009,14 +4773,14 @@ atomic_long_dec_if_positive(atomic_long_t *v)
        typeof(ptr) __ai_ptr = (ptr); \
        kcsan_mb(); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
-       arch_cmpxchg64(__ai_ptr, __VA_ARGS__); \
+       raw_cmpxchg64(__ai_ptr, __VA_ARGS__); \
 })
 
 #define cmpxchg64_acquire(ptr, ...) \
 ({ \
        typeof(ptr) __ai_ptr = (ptr); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
-       arch_cmpxchg64_acquire(__ai_ptr, __VA_ARGS__); \
+       raw_cmpxchg64_acquire(__ai_ptr, __VA_ARGS__); \
 })
 
 #define cmpxchg64_release(ptr, ...) \
@@ -2024,14 +4788,44 @@ atomic_long_dec_if_positive(atomic_long_t *v)
        typeof(ptr) __ai_ptr = (ptr); \
        kcsan_release(); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
-       arch_cmpxchg64_release(__ai_ptr, __VA_ARGS__); \
+       raw_cmpxchg64_release(__ai_ptr, __VA_ARGS__); \
 })
 
 #define cmpxchg64_relaxed(ptr, ...) \
 ({ \
        typeof(ptr) __ai_ptr = (ptr); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
-       arch_cmpxchg64_relaxed(__ai_ptr, __VA_ARGS__); \
+       raw_cmpxchg64_relaxed(__ai_ptr, __VA_ARGS__); \
+})
+
+#define cmpxchg128(ptr, ...) \
+({ \
+       typeof(ptr) __ai_ptr = (ptr); \
+       kcsan_mb(); \
+       instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
+       raw_cmpxchg128(__ai_ptr, __VA_ARGS__); \
+})
+
+#define cmpxchg128_acquire(ptr, ...) \
+({ \
+       typeof(ptr) __ai_ptr = (ptr); \
+       instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
+       raw_cmpxchg128_acquire(__ai_ptr, __VA_ARGS__); \
+})
+
+#define cmpxchg128_release(ptr, ...) \
+({ \
+       typeof(ptr) __ai_ptr = (ptr); \
+       kcsan_release(); \
+       instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
+       raw_cmpxchg128_release(__ai_ptr, __VA_ARGS__); \
+})
+
+#define cmpxchg128_relaxed(ptr, ...) \
+({ \
+       typeof(ptr) __ai_ptr = (ptr); \
+       instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
+       raw_cmpxchg128_relaxed(__ai_ptr, __VA_ARGS__); \
 })
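The newly added cmpxchg128() variants follow the same calling convention as cmpxchg() but operate on a naturally aligned 16-byte quantity. A hedged sketch, assuming a u128 type and architecture support for 128-bit cmpxchg are available (illustrative only, not part of the patch):

/* Illustrative only: 128-bit compare-and-exchange on an aligned 16-byte slot. */
static bool example_swap_pair(u128 *slot, u128 old, u128 new)
{
        /* As with cmpxchg(), the original value is returned; equality means success. */
        return cmpxchg128(slot, old, new) == old;
}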
 
 #define try_cmpxchg(ptr, oldp, ...) \
@@ -2041,7 +4835,7 @@ atomic_long_dec_if_positive(atomic_long_t *v)
        kcsan_mb(); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
        instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
-       arch_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+       raw_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
 })
 
 #define try_cmpxchg_acquire(ptr, oldp, ...) \
@@ -2050,7 +4844,7 @@ atomic_long_dec_if_positive(atomic_long_t *v)
        typeof(oldp) __ai_oldp = (oldp); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
        instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
-       arch_try_cmpxchg_acquire(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+       raw_try_cmpxchg_acquire(__ai_ptr, __ai_oldp, __VA_ARGS__); \
 })
 
 #define try_cmpxchg_release(ptr, oldp, ...) \
@@ -2060,7 +4854,7 @@ atomic_long_dec_if_positive(atomic_long_t *v)
        kcsan_release(); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
        instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
-       arch_try_cmpxchg_release(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+       raw_try_cmpxchg_release(__ai_ptr, __ai_oldp, __VA_ARGS__); \
 })
 
 #define try_cmpxchg_relaxed(ptr, oldp, ...) \
@@ -2069,7 +4863,7 @@ atomic_long_dec_if_positive(atomic_long_t *v)
        typeof(oldp) __ai_oldp = (oldp); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
        instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
-       arch_try_cmpxchg_relaxed(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+       raw_try_cmpxchg_relaxed(__ai_ptr, __ai_oldp, __VA_ARGS__); \
 })
 
 #define try_cmpxchg64(ptr, oldp, ...) \
@@ -2079,7 +4873,7 @@ atomic_long_dec_if_positive(atomic_long_t *v)
        kcsan_mb(); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
        instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
-       arch_try_cmpxchg64(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+       raw_try_cmpxchg64(__ai_ptr, __ai_oldp, __VA_ARGS__); \
 })
 
 #define try_cmpxchg64_acquire(ptr, oldp, ...) \
@@ -2088,7 +4882,7 @@ atomic_long_dec_if_positive(atomic_long_t *v)
        typeof(oldp) __ai_oldp = (oldp); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
        instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
-       arch_try_cmpxchg64_acquire(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+       raw_try_cmpxchg64_acquire(__ai_ptr, __ai_oldp, __VA_ARGS__); \
 })
 
 #define try_cmpxchg64_release(ptr, oldp, ...) \
@@ -2098,7 +4892,7 @@ atomic_long_dec_if_positive(atomic_long_t *v)
        kcsan_release(); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
        instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
-       arch_try_cmpxchg64_release(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+       raw_try_cmpxchg64_release(__ai_ptr, __ai_oldp, __VA_ARGS__); \
 })
 
 #define try_cmpxchg64_relaxed(ptr, oldp, ...) \
@@ -2107,21 +4901,66 @@ atomic_long_dec_if_positive(atomic_long_t *v)
        typeof(oldp) __ai_oldp = (oldp); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
        instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
-       arch_try_cmpxchg64_relaxed(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+       raw_try_cmpxchg64_relaxed(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
+#define try_cmpxchg128(ptr, oldp, ...) \
+({ \
+       typeof(ptr) __ai_ptr = (ptr); \
+       typeof(oldp) __ai_oldp = (oldp); \
+       kcsan_mb(); \
+       instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
+       instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
+       raw_try_cmpxchg128(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
+#define try_cmpxchg128_acquire(ptr, oldp, ...) \
+({ \
+       typeof(ptr) __ai_ptr = (ptr); \
+       typeof(oldp) __ai_oldp = (oldp); \
+       instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
+       instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
+       raw_try_cmpxchg128_acquire(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
+#define try_cmpxchg128_release(ptr, oldp, ...) \
+({ \
+       typeof(ptr) __ai_ptr = (ptr); \
+       typeof(oldp) __ai_oldp = (oldp); \
+       kcsan_release(); \
+       instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
+       instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
+       raw_try_cmpxchg128_release(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
+#define try_cmpxchg128_relaxed(ptr, oldp, ...) \
+({ \
+       typeof(ptr) __ai_ptr = (ptr); \
+       typeof(oldp) __ai_oldp = (oldp); \
+       instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
+       instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
+       raw_try_cmpxchg128_relaxed(__ai_ptr, __ai_oldp, __VA_ARGS__); \
 })
 
 #define cmpxchg_local(ptr, ...) \
 ({ \
        typeof(ptr) __ai_ptr = (ptr); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
-       arch_cmpxchg_local(__ai_ptr, __VA_ARGS__); \
+       raw_cmpxchg_local(__ai_ptr, __VA_ARGS__); \
 })
 
 #define cmpxchg64_local(ptr, ...) \
 ({ \
        typeof(ptr) __ai_ptr = (ptr); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
-       arch_cmpxchg64_local(__ai_ptr, __VA_ARGS__); \
+       raw_cmpxchg64_local(__ai_ptr, __VA_ARGS__); \
+})
+
+#define cmpxchg128_local(ptr, ...) \
+({ \
+       typeof(ptr) __ai_ptr = (ptr); \
+       instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
+       raw_cmpxchg128_local(__ai_ptr, __VA_ARGS__); \
 })
 
 #define sync_cmpxchg(ptr, ...) \
@@ -2129,7 +4968,7 @@ atomic_long_dec_if_positive(atomic_long_t *v)
        typeof(ptr) __ai_ptr = (ptr); \
        kcsan_mb(); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
-       arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
+       raw_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
 })
 
 #define try_cmpxchg_local(ptr, oldp, ...) \
@@ -2138,7 +4977,7 @@ atomic_long_dec_if_positive(atomic_long_t *v)
        typeof(oldp) __ai_oldp = (oldp); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
        instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
-       arch_try_cmpxchg_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+       raw_try_cmpxchg_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
 })
 
 #define try_cmpxchg64_local(ptr, oldp, ...) \
@@ -2147,24 +4986,18 @@ atomic_long_dec_if_positive(atomic_long_t *v)
        typeof(oldp) __ai_oldp = (oldp); \
        instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
        instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
-       arch_try_cmpxchg64_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+       raw_try_cmpxchg64_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
 })
 
-#define cmpxchg_double(ptr, ...) \
+#define try_cmpxchg128_local(ptr, oldp, ...) \
 ({ \
        typeof(ptr) __ai_ptr = (ptr); \
-       kcsan_mb(); \
-       instrument_atomic_read_write(__ai_ptr, 2 * sizeof(*__ai_ptr)); \
-       arch_cmpxchg_double(__ai_ptr, __VA_ARGS__); \
+       typeof(oldp) __ai_oldp = (oldp); \
+       instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
+       instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
+       raw_try_cmpxchg128_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
 })
 
 
-#define cmpxchg_double_local(ptr, ...) \
-({ \
-       typeof(ptr) __ai_ptr = (ptr); \
-       instrument_atomic_read_write(__ai_ptr, 2 * sizeof(*__ai_ptr)); \
-       arch_cmpxchg_double_local(__ai_ptr, __VA_ARGS__); \
-})
-
 #endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
-// 6b513a42e1a1b5962532a019b7fc91eaa044ad5e
+// 1568f875fef72097413caab8339120c065a39aa4
index 2fc51ba..c829471 100644 (file)
@@ -21,1030 +21,1778 @@ typedef atomic_t atomic_long_t;
 #define atomic_long_cond_read_relaxed  atomic_cond_read_relaxed
 #endif
 
-#ifdef CONFIG_64BIT
-
-static __always_inline long
-arch_atomic_long_read(const atomic_long_t *v)
-{
-       return arch_atomic64_read(v);
-}
-
-static __always_inline long
-arch_atomic_long_read_acquire(const atomic_long_t *v)
-{
-       return arch_atomic64_read_acquire(v);
-}
-
-static __always_inline void
-arch_atomic_long_set(atomic_long_t *v, long i)
-{
-       arch_atomic64_set(v, i);
-}
-
-static __always_inline void
-arch_atomic_long_set_release(atomic_long_t *v, long i)
-{
-       arch_atomic64_set_release(v, i);
-}
-
-static __always_inline void
-arch_atomic_long_add(long i, atomic_long_t *v)
-{
-       arch_atomic64_add(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_add_return(long i, atomic_long_t *v)
-{
-       return arch_atomic64_add_return(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_add_return_acquire(long i, atomic_long_t *v)
-{
-       return arch_atomic64_add_return_acquire(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_add_return_release(long i, atomic_long_t *v)
-{
-       return arch_atomic64_add_return_release(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_add_return_relaxed(long i, atomic_long_t *v)
-{
-       return arch_atomic64_add_return_relaxed(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_add(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_add(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_add_acquire(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_add_acquire(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_add_release(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_add_release(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_add_relaxed(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_add_relaxed(i, v);
-}
-
-static __always_inline void
-arch_atomic_long_sub(long i, atomic_long_t *v)
-{
-       arch_atomic64_sub(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_sub_return(long i, atomic_long_t *v)
-{
-       return arch_atomic64_sub_return(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_sub_return_acquire(long i, atomic_long_t *v)
-{
-       return arch_atomic64_sub_return_acquire(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_sub_return_release(long i, atomic_long_t *v)
-{
-       return arch_atomic64_sub_return_release(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_sub_return_relaxed(long i, atomic_long_t *v)
-{
-       return arch_atomic64_sub_return_relaxed(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_sub(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_sub(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_sub_acquire(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_sub_acquire(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_sub_release(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_sub_release(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_sub_relaxed(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_sub_relaxed(i, v);
-}
-
-static __always_inline void
-arch_atomic_long_inc(atomic_long_t *v)
-{
-       arch_atomic64_inc(v);
-}
-
-static __always_inline long
-arch_atomic_long_inc_return(atomic_long_t *v)
-{
-       return arch_atomic64_inc_return(v);
-}
-
-static __always_inline long
-arch_atomic_long_inc_return_acquire(atomic_long_t *v)
-{
-       return arch_atomic64_inc_return_acquire(v);
-}
-
-static __always_inline long
-arch_atomic_long_inc_return_release(atomic_long_t *v)
-{
-       return arch_atomic64_inc_return_release(v);
-}
-
-static __always_inline long
-arch_atomic_long_inc_return_relaxed(atomic_long_t *v)
-{
-       return arch_atomic64_inc_return_relaxed(v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_inc(atomic_long_t *v)
-{
-       return arch_atomic64_fetch_inc(v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_inc_acquire(atomic_long_t *v)
-{
-       return arch_atomic64_fetch_inc_acquire(v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_inc_release(atomic_long_t *v)
-{
-       return arch_atomic64_fetch_inc_release(v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_inc_relaxed(atomic_long_t *v)
-{
-       return arch_atomic64_fetch_inc_relaxed(v);
-}
-
-static __always_inline void
-arch_atomic_long_dec(atomic_long_t *v)
-{
-       arch_atomic64_dec(v);
-}
-
-static __always_inline long
-arch_atomic_long_dec_return(atomic_long_t *v)
-{
-       return arch_atomic64_dec_return(v);
-}
-
-static __always_inline long
-arch_atomic_long_dec_return_acquire(atomic_long_t *v)
-{
-       return arch_atomic64_dec_return_acquire(v);
-}
-
-static __always_inline long
-arch_atomic_long_dec_return_release(atomic_long_t *v)
-{
-       return arch_atomic64_dec_return_release(v);
-}
-
-static __always_inline long
-arch_atomic_long_dec_return_relaxed(atomic_long_t *v)
-{
-       return arch_atomic64_dec_return_relaxed(v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_dec(atomic_long_t *v)
-{
-       return arch_atomic64_fetch_dec(v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_dec_acquire(atomic_long_t *v)
-{
-       return arch_atomic64_fetch_dec_acquire(v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_dec_release(atomic_long_t *v)
-{
-       return arch_atomic64_fetch_dec_release(v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_dec_relaxed(atomic_long_t *v)
-{
-       return arch_atomic64_fetch_dec_relaxed(v);
-}
-
-static __always_inline void
-arch_atomic_long_and(long i, atomic_long_t *v)
-{
-       arch_atomic64_and(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_and(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_and(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_and_acquire(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_and_acquire(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_and_release(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_and_release(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_and_relaxed(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_and_relaxed(i, v);
-}
-
-static __always_inline void
-arch_atomic_long_andnot(long i, atomic_long_t *v)
-{
-       arch_atomic64_andnot(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_andnot(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_andnot(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_andnot_acquire(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_andnot_acquire(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_andnot_release(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_andnot_release(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_andnot_relaxed(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_andnot_relaxed(i, v);
-}
-
-static __always_inline void
-arch_atomic_long_or(long i, atomic_long_t *v)
-{
-       arch_atomic64_or(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_or(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_or(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_or_acquire(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_or_acquire(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_or_release(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_or_release(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_or_relaxed(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_or_relaxed(i, v);
-}
-
-static __always_inline void
-arch_atomic_long_xor(long i, atomic_long_t *v)
-{
-       arch_atomic64_xor(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_xor(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_xor(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_xor_acquire(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_xor_acquire(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_xor_release(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_xor_release(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_xor_relaxed(long i, atomic_long_t *v)
-{
-       return arch_atomic64_fetch_xor_relaxed(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_xchg(atomic_long_t *v, long i)
-{
-       return arch_atomic64_xchg(v, i);
-}
-
-static __always_inline long
-arch_atomic_long_xchg_acquire(atomic_long_t *v, long i)
-{
-       return arch_atomic64_xchg_acquire(v, i);
-}
-
-static __always_inline long
-arch_atomic_long_xchg_release(atomic_long_t *v, long i)
-{
-       return arch_atomic64_xchg_release(v, i);
-}
-
-static __always_inline long
-arch_atomic_long_xchg_relaxed(atomic_long_t *v, long i)
-{
-       return arch_atomic64_xchg_relaxed(v, i);
-}
-
-static __always_inline long
-arch_atomic_long_cmpxchg(atomic_long_t *v, long old, long new)
-{
-       return arch_atomic64_cmpxchg(v, old, new);
-}
-
-static __always_inline long
-arch_atomic_long_cmpxchg_acquire(atomic_long_t *v, long old, long new)
-{
-       return arch_atomic64_cmpxchg_acquire(v, old, new);
-}
-
-static __always_inline long
-arch_atomic_long_cmpxchg_release(atomic_long_t *v, long old, long new)
-{
-       return arch_atomic64_cmpxchg_release(v, old, new);
-}
-
-static __always_inline long
-arch_atomic_long_cmpxchg_relaxed(atomic_long_t *v, long old, long new)
-{
-       return arch_atomic64_cmpxchg_relaxed(v, old, new);
-}
-
-static __always_inline bool
-arch_atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
-{
-       return arch_atomic64_try_cmpxchg(v, (s64 *)old, new);
-}
-
-static __always_inline bool
-arch_atomic_long_try_cmpxchg_acquire(atomic_long_t *v, long *old, long new)
-{
-       return arch_atomic64_try_cmpxchg_acquire(v, (s64 *)old, new);
-}
-
-static __always_inline bool
-arch_atomic_long_try_cmpxchg_release(atomic_long_t *v, long *old, long new)
-{
-       return arch_atomic64_try_cmpxchg_release(v, (s64 *)old, new);
-}
-
-static __always_inline bool
-arch_atomic_long_try_cmpxchg_relaxed(atomic_long_t *v, long *old, long new)
-{
-       return arch_atomic64_try_cmpxchg_relaxed(v, (s64 *)old, new);
-}
-
-static __always_inline bool
-arch_atomic_long_sub_and_test(long i, atomic_long_t *v)
-{
-       return arch_atomic64_sub_and_test(i, v);
-}
-
-static __always_inline bool
-arch_atomic_long_dec_and_test(atomic_long_t *v)
-{
-       return arch_atomic64_dec_and_test(v);
-}
-
-static __always_inline bool
-arch_atomic_long_inc_and_test(atomic_long_t *v)
-{
-       return arch_atomic64_inc_and_test(v);
-}
-
-static __always_inline bool
-arch_atomic_long_add_negative(long i, atomic_long_t *v)
-{
-       return arch_atomic64_add_negative(i, v);
-}
-
-static __always_inline bool
-arch_atomic_long_add_negative_acquire(long i, atomic_long_t *v)
-{
-       return arch_atomic64_add_negative_acquire(i, v);
-}
-
-static __always_inline bool
-arch_atomic_long_add_negative_release(long i, atomic_long_t *v)
-{
-       return arch_atomic64_add_negative_release(i, v);
-}
-
-static __always_inline bool
-arch_atomic_long_add_negative_relaxed(long i, atomic_long_t *v)
-{
-       return arch_atomic64_add_negative_relaxed(i, v);
-}
-
-static __always_inline long
-arch_atomic_long_fetch_add_unless(atomic_long_t *v, long a, long u)
-{
-       return arch_atomic64_fetch_add_unless(v, a, u);
-}
-
-static __always_inline bool
-arch_atomic_long_add_unless(atomic_long_t *v, long a, long u)
-{
-       return arch_atomic64_add_unless(v, a, u);
-}
-
-static __always_inline bool
-arch_atomic_long_inc_not_zero(atomic_long_t *v)
-{
-       return arch_atomic64_inc_not_zero(v);
-}
-
-static __always_inline bool
-arch_atomic_long_inc_unless_negative(atomic_long_t *v)
-{
-       return arch_atomic64_inc_unless_negative(v);
-}
-
-static __always_inline bool
-arch_atomic_long_dec_unless_positive(atomic_long_t *v)
-{
-       return arch_atomic64_dec_unless_positive(v);
-}
-
-static __always_inline long
-arch_atomic_long_dec_if_positive(atomic_long_t *v)
-{
-       return arch_atomic64_dec_if_positive(v);
-}
-
-#else /* CONFIG_64BIT */
-
-static __always_inline long
-arch_atomic_long_read(const atomic_long_t *v)
+/**
+ * raw_atomic_long_read() - atomic load with relaxed ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically loads the value of @v with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_read() elsewhere.
+ *
+ * Return: The value loaded from @v.
+ */
+static __always_inline long
+raw_atomic_long_read(const atomic_long_t *v)
 {
-       return arch_atomic_read(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_read(v);
+#else
+       return raw_atomic_read(v);
+#endif
 }
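
Per the kernel-doc above, the raw_ variants are the ones safe in noinstr code because they emit no KASAN/KCSAN instrumentation; ordinary code should keep using the instrumented wrappers. A minimal sketch of that split (illustrative only; the function names are made up):

/* Illustrative only: raw_ variant in noinstr code, instrumented variant elsewhere. */
#include <linux/atomic.h>
#include <linux/compiler_types.h>

static noinstr long example_peek_early(atomic_long_t *v)
{
        return raw_atomic_long_read(v);         /* no instrumentation emitted */
}

static long example_peek(atomic_long_t *v)
{
        return atomic_long_read(v);             /* instrumented wrapper for normal code */
}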
 
+/**
+ * raw_atomic_long_read_acquire() - atomic load with acquire ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically loads the value of @v with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_read_acquire() elsewhere.
+ *
+ * Return: The value loaded from @v.
+ */
 static __always_inline long
-arch_atomic_long_read_acquire(const atomic_long_t *v)
+raw_atomic_long_read_acquire(const atomic_long_t *v)
 {
-       return arch_atomic_read_acquire(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_read_acquire(v);
+#else
+       return raw_atomic_read_acquire(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_set() - atomic set with relaxed ordering
+ * @v: pointer to atomic_long_t
+ * @i: long value to assign
+ *
+ * Atomically sets @v to @i with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_set() elsewhere.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
-arch_atomic_long_set(atomic_long_t *v, long i)
+raw_atomic_long_set(atomic_long_t *v, long i)
 {
-       arch_atomic_set(v, i);
+#ifdef CONFIG_64BIT
+       raw_atomic64_set(v, i);
+#else
+       raw_atomic_set(v, i);
+#endif
 }
 
+/**
+ * raw_atomic_long_set_release() - atomic set with release ordering
+ * @v: pointer to atomic_long_t
+ * @i: long value to assign
+ *
+ * Atomically sets @v to @i with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_set_release() elsewhere.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
-arch_atomic_long_set_release(atomic_long_t *v, long i)
+raw_atomic_long_set_release(atomic_long_t *v, long i)
 {
-       arch_atomic_set_release(v, i);
+#ifdef CONFIG_64BIT
+       raw_atomic64_set_release(v, i);
+#else
+       raw_atomic_set_release(v, i);
+#endif
 }
 
+/**
+ * raw_atomic_long_add() - atomic add with relaxed ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_add() elsewhere.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
-arch_atomic_long_add(long i, atomic_long_t *v)
+raw_atomic_long_add(long i, atomic_long_t *v)
 {
-       arch_atomic_add(i, v);
+#ifdef CONFIG_64BIT
+       raw_atomic64_add(i, v);
+#else
+       raw_atomic_add(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_add_return() - atomic add with full ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_add_return() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
-arch_atomic_long_add_return(long i, atomic_long_t *v)
+raw_atomic_long_add_return(long i, atomic_long_t *v)
 {
-       return arch_atomic_add_return(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_add_return(i, v);
+#else
+       return raw_atomic_add_return(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_add_return_acquire() - atomic add with acquire ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_add_return_acquire() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
-arch_atomic_long_add_return_acquire(long i, atomic_long_t *v)
+raw_atomic_long_add_return_acquire(long i, atomic_long_t *v)
 {
-       return arch_atomic_add_return_acquire(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_add_return_acquire(i, v);
+#else
+       return raw_atomic_add_return_acquire(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_add_return_release() - atomic add with release ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_add_return_release() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
-arch_atomic_long_add_return_release(long i, atomic_long_t *v)
+raw_atomic_long_add_return_release(long i, atomic_long_t *v)
 {
-       return arch_atomic_add_return_release(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_add_return_release(i, v);
+#else
+       return raw_atomic_add_return_release(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_add_return_relaxed() - atomic add with relaxed ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_add_return_relaxed() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
-arch_atomic_long_add_return_relaxed(long i, atomic_long_t *v)
+raw_atomic_long_add_return_relaxed(long i, atomic_long_t *v)
 {
-       return arch_atomic_add_return_relaxed(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_add_return_relaxed(i, v);
+#else
+       return raw_atomic_add_return_relaxed(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_add() - atomic add with full ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_add() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_add(long i, atomic_long_t *v)
+raw_atomic_long_fetch_add(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_add(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_add(i, v);
+#else
+       return raw_atomic_fetch_add(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_add_acquire() - atomic add with acquire ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_add_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_add_acquire(long i, atomic_long_t *v)
+raw_atomic_long_fetch_add_acquire(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_add_acquire(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_add_acquire(i, v);
+#else
+       return raw_atomic_fetch_add_acquire(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_add_release() - atomic add with release ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_add_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_add_release(long i, atomic_long_t *v)
+raw_atomic_long_fetch_add_release(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_add_release(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_add_release(i, v);
+#else
+       return raw_atomic_fetch_add_release(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_add_relaxed() - atomic add with relaxed ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_add_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_add_relaxed(long i, atomic_long_t *v)
+raw_atomic_long_fetch_add_relaxed(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_add_relaxed(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_add_relaxed(i, v);
+#else
+       return raw_atomic_fetch_add_relaxed(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_sub() - atomic subtract with relaxed ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_sub() elsewhere.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
-arch_atomic_long_sub(long i, atomic_long_t *v)
+raw_atomic_long_sub(long i, atomic_long_t *v)
 {
-       arch_atomic_sub(i, v);
+#ifdef CONFIG_64BIT
+       raw_atomic64_sub(i, v);
+#else
+       raw_atomic_sub(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_sub_return() - atomic subtract with full ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_sub_return() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
-arch_atomic_long_sub_return(long i, atomic_long_t *v)
+raw_atomic_long_sub_return(long i, atomic_long_t *v)
 {
-       return arch_atomic_sub_return(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_sub_return(i, v);
+#else
+       return raw_atomic_sub_return(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_sub_return_acquire() - atomic subtract with acquire ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_sub_return_acquire() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
-arch_atomic_long_sub_return_acquire(long i, atomic_long_t *v)
+raw_atomic_long_sub_return_acquire(long i, atomic_long_t *v)
 {
-       return arch_atomic_sub_return_acquire(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_sub_return_acquire(i, v);
+#else
+       return raw_atomic_sub_return_acquire(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_sub_return_release() - atomic subtract with release ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_sub_return_release() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
-arch_atomic_long_sub_return_release(long i, atomic_long_t *v)
+raw_atomic_long_sub_return_release(long i, atomic_long_t *v)
 {
-       return arch_atomic_sub_return_release(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_sub_return_release(i, v);
+#else
+       return raw_atomic_sub_return_release(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_sub_return_relaxed() - atomic subtract with relaxed ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_sub_return_relaxed() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
-arch_atomic_long_sub_return_relaxed(long i, atomic_long_t *v)
+raw_atomic_long_sub_return_relaxed(long i, atomic_long_t *v)
 {
-       return arch_atomic_sub_return_relaxed(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_sub_return_relaxed(i, v);
+#else
+       return raw_atomic_sub_return_relaxed(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_sub() - atomic subtract with full ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_sub() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_sub(long i, atomic_long_t *v)
+raw_atomic_long_fetch_sub(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_sub(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_sub(i, v);
+#else
+       return raw_atomic_fetch_sub(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_sub_acquire() - atomic subtract with acquire ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_sub_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_sub_acquire(long i, atomic_long_t *v)
+raw_atomic_long_fetch_sub_acquire(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_sub_acquire(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_sub_acquire(i, v);
+#else
+       return raw_atomic_fetch_sub_acquire(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_sub_release() - atomic subtract with release ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_sub_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_sub_release(long i, atomic_long_t *v)
+raw_atomic_long_fetch_sub_release(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_sub_release(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_sub_release(i, v);
+#else
+       return raw_atomic_fetch_sub_release(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_sub_relaxed() - atomic subtract with relaxed ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_sub_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_sub_relaxed(long i, atomic_long_t *v)
+raw_atomic_long_fetch_sub_relaxed(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_sub_relaxed(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_sub_relaxed(i, v);
+#else
+       return raw_atomic_fetch_sub_relaxed(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_inc() - atomic increment with relaxed ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_inc() elsewhere.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
-arch_atomic_long_inc(atomic_long_t *v)
+raw_atomic_long_inc(atomic_long_t *v)
 {
-       arch_atomic_inc(v);
+#ifdef CONFIG_64BIT
+       raw_atomic64_inc(v);
+#else
+       raw_atomic_inc(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_inc_return() - atomic increment with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_inc_return() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
-arch_atomic_long_inc_return(atomic_long_t *v)
+raw_atomic_long_inc_return(atomic_long_t *v)
 {
-       return arch_atomic_inc_return(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_inc_return(v);
+#else
+       return raw_atomic_inc_return(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_inc_return_acquire() - atomic increment with acquire ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_inc_return_acquire() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
-arch_atomic_long_inc_return_acquire(atomic_long_t *v)
+raw_atomic_long_inc_return_acquire(atomic_long_t *v)
 {
-       return arch_atomic_inc_return_acquire(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_inc_return_acquire(v);
+#else
+       return raw_atomic_inc_return_acquire(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_inc_return_release() - atomic increment with release ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_inc_return_release() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
-arch_atomic_long_inc_return_release(atomic_long_t *v)
+raw_atomic_long_inc_return_release(atomic_long_t *v)
 {
-       return arch_atomic_inc_return_release(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_inc_return_release(v);
+#else
+       return raw_atomic_inc_return_release(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_inc_return_relaxed() - atomic increment with relaxed ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_inc_return_relaxed() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
-arch_atomic_long_inc_return_relaxed(atomic_long_t *v)
+raw_atomic_long_inc_return_relaxed(atomic_long_t *v)
 {
-       return arch_atomic_inc_return_relaxed(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_inc_return_relaxed(v);
+#else
+       return raw_atomic_inc_return_relaxed(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_inc() - atomic increment with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_inc() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_inc(atomic_long_t *v)
+raw_atomic_long_fetch_inc(atomic_long_t *v)
 {
-       return arch_atomic_fetch_inc(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_inc(v);
+#else
+       return raw_atomic_fetch_inc(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_inc_acquire() - atomic increment with acquire ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_inc_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_inc_acquire(atomic_long_t *v)
+raw_atomic_long_fetch_inc_acquire(atomic_long_t *v)
 {
-       return arch_atomic_fetch_inc_acquire(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_inc_acquire(v);
+#else
+       return raw_atomic_fetch_inc_acquire(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_inc_release() - atomic increment with release ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_inc_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_inc_release(atomic_long_t *v)
+raw_atomic_long_fetch_inc_release(atomic_long_t *v)
 {
-       return arch_atomic_fetch_inc_release(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_inc_release(v);
+#else
+       return raw_atomic_fetch_inc_release(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_inc_relaxed() - atomic increment with relaxed ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_inc_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_inc_relaxed(atomic_long_t *v)
+raw_atomic_long_fetch_inc_relaxed(atomic_long_t *v)
 {
-       return arch_atomic_fetch_inc_relaxed(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_inc_relaxed(v);
+#else
+       return raw_atomic_fetch_inc_relaxed(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_dec() - atomic decrement with relaxed ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_dec() elsewhere.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
-arch_atomic_long_dec(atomic_long_t *v)
+raw_atomic_long_dec(atomic_long_t *v)
 {
-       arch_atomic_dec(v);
+#ifdef CONFIG_64BIT
+       raw_atomic64_dec(v);
+#else
+       raw_atomic_dec(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_dec_return() - atomic decrement with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_dec_return() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
-arch_atomic_long_dec_return(atomic_long_t *v)
+raw_atomic_long_dec_return(atomic_long_t *v)
 {
-       return arch_atomic_dec_return(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_dec_return(v);
+#else
+       return raw_atomic_dec_return(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_dec_return_acquire() - atomic decrement with acquire ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_dec_return_acquire() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
-arch_atomic_long_dec_return_acquire(atomic_long_t *v)
+raw_atomic_long_dec_return_acquire(atomic_long_t *v)
 {
-       return arch_atomic_dec_return_acquire(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_dec_return_acquire(v);
+#else
+       return raw_atomic_dec_return_acquire(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_dec_return_release() - atomic decrement with release ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_dec_return_release() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
-arch_atomic_long_dec_return_release(atomic_long_t *v)
+raw_atomic_long_dec_return_release(atomic_long_t *v)
 {
-       return arch_atomic_dec_return_release(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_dec_return_release(v);
+#else
+       return raw_atomic_dec_return_release(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_dec_return_relaxed() - atomic decrement with relaxed ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_dec_return_relaxed() elsewhere.
+ *
+ * Return: The updated value of @v.
+ */
 static __always_inline long
-arch_atomic_long_dec_return_relaxed(atomic_long_t *v)
+raw_atomic_long_dec_return_relaxed(atomic_long_t *v)
 {
-       return arch_atomic_dec_return_relaxed(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_dec_return_relaxed(v);
+#else
+       return raw_atomic_dec_return_relaxed(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_dec() - atomic decrement with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_dec() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_dec(atomic_long_t *v)
+raw_atomic_long_fetch_dec(atomic_long_t *v)
 {
-       return arch_atomic_fetch_dec(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_dec(v);
+#else
+       return raw_atomic_fetch_dec(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_dec_acquire() - atomic decrement with acquire ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_dec_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_dec_acquire(atomic_long_t *v)
+raw_atomic_long_fetch_dec_acquire(atomic_long_t *v)
 {
-       return arch_atomic_fetch_dec_acquire(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_dec_acquire(v);
+#else
+       return raw_atomic_fetch_dec_acquire(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_dec_release() - atomic decrement with release ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_dec_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_dec_release(atomic_long_t *v)
+raw_atomic_long_fetch_dec_release(atomic_long_t *v)
 {
-       return arch_atomic_fetch_dec_release(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_dec_release(v);
+#else
+       return raw_atomic_fetch_dec_release(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_dec_relaxed() - atomic decrement with relaxed ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_dec_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_dec_relaxed(atomic_long_t *v)
+raw_atomic_long_fetch_dec_relaxed(atomic_long_t *v)
 {
-       return arch_atomic_fetch_dec_relaxed(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_dec_relaxed(v);
+#else
+       return raw_atomic_fetch_dec_relaxed(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_and() - atomic bitwise AND with relaxed ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_and() elsewhere.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
-arch_atomic_long_and(long i, atomic_long_t *v)
+raw_atomic_long_and(long i, atomic_long_t *v)
 {
-       arch_atomic_and(i, v);
+#ifdef CONFIG_64BIT
+       raw_atomic64_and(i, v);
+#else
+       raw_atomic_and(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_and() - atomic bitwise AND with full ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_and() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_and(long i, atomic_long_t *v)
+raw_atomic_long_fetch_and(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_and(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_and(i, v);
+#else
+       return raw_atomic_fetch_and(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_and_acquire() - atomic bitwise AND with acquire ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_and_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_and_acquire(long i, atomic_long_t *v)
+raw_atomic_long_fetch_and_acquire(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_and_acquire(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_and_acquire(i, v);
+#else
+       return raw_atomic_fetch_and_acquire(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_and_release() - atomic bitwise AND with release ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_and_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_and_release(long i, atomic_long_t *v)
+raw_atomic_long_fetch_and_release(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_and_release(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_and_release(i, v);
+#else
+       return raw_atomic_fetch_and_release(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_and_relaxed() - atomic bitwise AND with relaxed ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_and_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_and_relaxed(long i, atomic_long_t *v)
+raw_atomic_long_fetch_and_relaxed(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_and_relaxed(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_and_relaxed(i, v);
+#else
+       return raw_atomic_fetch_and_relaxed(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_andnot() - atomic bitwise AND NOT with relaxed ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & ~@i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_andnot() elsewhere.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
-arch_atomic_long_andnot(long i, atomic_long_t *v)
+raw_atomic_long_andnot(long i, atomic_long_t *v)
 {
-       arch_atomic_andnot(i, v);
+#ifdef CONFIG_64BIT
+       raw_atomic64_andnot(i, v);
+#else
+       raw_atomic_andnot(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_andnot() - atomic bitwise AND NOT with full ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & ~@i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_andnot() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_andnot(long i, atomic_long_t *v)
+raw_atomic_long_fetch_andnot(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_andnot(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_andnot(i, v);
+#else
+       return raw_atomic_fetch_andnot(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_andnot_acquire() - atomic bitwise AND NOT with acquire ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & ~@i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_andnot_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_andnot_acquire(long i, atomic_long_t *v)
+raw_atomic_long_fetch_andnot_acquire(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_andnot_acquire(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_andnot_acquire(i, v);
+#else
+       return raw_atomic_fetch_andnot_acquire(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_andnot_release() - atomic bitwise AND NOT with release ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & ~@i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_andnot_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_andnot_release(long i, atomic_long_t *v)
+raw_atomic_long_fetch_andnot_release(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_andnot_release(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_andnot_release(i, v);
+#else
+       return raw_atomic_fetch_andnot_release(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_andnot_relaxed() - atomic bitwise AND NOT with relaxed ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v & ~@i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_andnot_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_andnot_relaxed(long i, atomic_long_t *v)
+raw_atomic_long_fetch_andnot_relaxed(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_andnot_relaxed(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_andnot_relaxed(i, v);
+#else
+       return raw_atomic_fetch_andnot_relaxed(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_or() - atomic bitwise OR with relaxed ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v | @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_or() elsewhere.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
-arch_atomic_long_or(long i, atomic_long_t *v)
+raw_atomic_long_or(long i, atomic_long_t *v)
 {
-       arch_atomic_or(i, v);
+#ifdef CONFIG_64BIT
+       raw_atomic64_or(i, v);
+#else
+       raw_atomic_or(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_or() - atomic bitwise OR with full ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v | @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_or() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_or(long i, atomic_long_t *v)
+raw_atomic_long_fetch_or(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_or(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_or(i, v);
+#else
+       return raw_atomic_fetch_or(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_or_acquire() - atomic bitwise OR with acquire ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v | @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_or_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_or_acquire(long i, atomic_long_t *v)
+raw_atomic_long_fetch_or_acquire(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_or_acquire(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_or_acquire(i, v);
+#else
+       return raw_atomic_fetch_or_acquire(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_or_release() - atomic bitwise OR with release ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v | @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_or_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_or_release(long i, atomic_long_t *v)
+raw_atomic_long_fetch_or_release(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_or_release(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_or_release(i, v);
+#else
+       return raw_atomic_fetch_or_release(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_or_relaxed() - atomic bitwise OR with relaxed ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v | @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_or_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_or_relaxed(long i, atomic_long_t *v)
+raw_atomic_long_fetch_or_relaxed(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_or_relaxed(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_or_relaxed(i, v);
+#else
+       return raw_atomic_fetch_or_relaxed(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_xor() - atomic bitwise XOR with relaxed ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v ^ @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_xor() elsewhere.
+ *
+ * Return: Nothing.
+ */
 static __always_inline void
-arch_atomic_long_xor(long i, atomic_long_t *v)
+raw_atomic_long_xor(long i, atomic_long_t *v)
 {
-       arch_atomic_xor(i, v);
+#ifdef CONFIG_64BIT
+       raw_atomic64_xor(i, v);
+#else
+       raw_atomic_xor(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_xor() - atomic bitwise XOR with full ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v ^ @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_xor() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_xor(long i, atomic_long_t *v)
+raw_atomic_long_fetch_xor(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_xor(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_xor(i, v);
+#else
+       return raw_atomic_fetch_xor(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_xor_acquire() - atomic bitwise XOR with acquire ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v ^ @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_xor_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_xor_acquire(long i, atomic_long_t *v)
+raw_atomic_long_fetch_xor_acquire(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_xor_acquire(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_xor_acquire(i, v);
+#else
+       return raw_atomic_fetch_xor_acquire(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_xor_release() - atomic bitwise XOR with release ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v ^ @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_xor_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_xor_release(long i, atomic_long_t *v)
+raw_atomic_long_fetch_xor_release(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_xor_release(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_xor_release(i, v);
+#else
+       return raw_atomic_fetch_xor_release(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_xor_relaxed() - atomic bitwise XOR with relaxed ordering
+ * @i: long value
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v ^ @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_xor_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_xor_relaxed(long i, atomic_long_t *v)
+raw_atomic_long_fetch_xor_relaxed(long i, atomic_long_t *v)
 {
-       return arch_atomic_fetch_xor_relaxed(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_xor_relaxed(i, v);
+#else
+       return raw_atomic_fetch_xor_relaxed(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_xchg() - atomic exchange with full ordering
+ * @v: pointer to atomic_long_t
+ * @new: long value to assign
+ *
+ * Atomically updates @v to @new with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_xchg() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_xchg(atomic_long_t *v, long i)
+raw_atomic_long_xchg(atomic_long_t *v, long new)
 {
-       return arch_atomic_xchg(v, i);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_xchg(v, new);
+#else
+       return raw_atomic_xchg(v, new);
+#endif
 }
 
+/**
+ * raw_atomic_long_xchg_acquire() - atomic exchange with acquire ordering
+ * @v: pointer to atomic_long_t
+ * @new: long value to assign
+ *
+ * Atomically updates @v to @new with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_xchg_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_xchg_acquire(atomic_long_t *v, long i)
+raw_atomic_long_xchg_acquire(atomic_long_t *v, long new)
 {
-       return arch_atomic_xchg_acquire(v, i);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_xchg_acquire(v, new);
+#else
+       return raw_atomic_xchg_acquire(v, new);
+#endif
 }
 
+/**
+ * raw_atomic_long_xchg_release() - atomic exchange with release ordering
+ * @v: pointer to atomic_long_t
+ * @new: long value to assign
+ *
+ * Atomically updates @v to @new with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_xchg_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_xchg_release(atomic_long_t *v, long i)
+raw_atomic_long_xchg_release(atomic_long_t *v, long new)
 {
-       return arch_atomic_xchg_release(v, i);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_xchg_release(v, new);
+#else
+       return raw_atomic_xchg_release(v, new);
+#endif
 }
 
+/**
+ * raw_atomic_long_xchg_relaxed() - atomic exchange with relaxed ordering
+ * @v: pointer to atomic_long_t
+ * @new: long value to assign
+ *
+ * Atomically updates @v to @new with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_xchg_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_xchg_relaxed(atomic_long_t *v, long i)
+raw_atomic_long_xchg_relaxed(atomic_long_t *v, long new)
 {
-       return arch_atomic_xchg_relaxed(v, i);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_xchg_relaxed(v, new);
+#else
+       return raw_atomic_xchg_relaxed(v, new);
+#endif
 }
 
+/**
+ * raw_atomic_long_cmpxchg() - atomic compare and exchange with full ordering
+ * @v: pointer to atomic_long_t
+ * @old: long value to compare with
+ * @new: long value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_cmpxchg() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_cmpxchg(atomic_long_t *v, long old, long new)
+raw_atomic_long_cmpxchg(atomic_long_t *v, long old, long new)
 {
-       return arch_atomic_cmpxchg(v, old, new);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_cmpxchg(v, old, new);
+#else
+       return raw_atomic_cmpxchg(v, old, new);
+#endif
 }
 
+/**
+ * raw_atomic_long_cmpxchg_acquire() - atomic compare and exchange with acquire ordering
+ * @v: pointer to atomic_long_t
+ * @old: long value to compare with
+ * @new: long value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_cmpxchg_acquire() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_cmpxchg_acquire(atomic_long_t *v, long old, long new)
+raw_atomic_long_cmpxchg_acquire(atomic_long_t *v, long old, long new)
 {
-       return arch_atomic_cmpxchg_acquire(v, old, new);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_cmpxchg_acquire(v, old, new);
+#else
+       return raw_atomic_cmpxchg_acquire(v, old, new);
+#endif
 }
 
+/**
+ * raw_atomic_long_cmpxchg_release() - atomic compare and exchange with release ordering
+ * @v: pointer to atomic_long_t
+ * @old: long value to compare with
+ * @new: long value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_cmpxchg_release() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_cmpxchg_release(atomic_long_t *v, long old, long new)
+raw_atomic_long_cmpxchg_release(atomic_long_t *v, long old, long new)
 {
-       return arch_atomic_cmpxchg_release(v, old, new);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_cmpxchg_release(v, old, new);
+#else
+       return raw_atomic_cmpxchg_release(v, old, new);
+#endif
 }
 
+/**
+ * raw_atomic_long_cmpxchg_relaxed() - atomic compare and exchange with relaxed ordering
+ * @v: pointer to atomic_long_t
+ * @old: long value to compare with
+ * @new: long value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_cmpxchg_relaxed() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_cmpxchg_relaxed(atomic_long_t *v, long old, long new)
+raw_atomic_long_cmpxchg_relaxed(atomic_long_t *v, long old, long new)
 {
-       return arch_atomic_cmpxchg_relaxed(v, old, new);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_cmpxchg_relaxed(v, old, new);
+#else
+       return raw_atomic_cmpxchg_relaxed(v, old, new);
+#endif
 }
 
+/**
+ * raw_atomic_long_try_cmpxchg() - atomic compare and exchange with full ordering
+ * @v: pointer to atomic_long_t
+ * @old: pointer to long value to compare with
+ * @new: long value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with full ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_try_cmpxchg() elsewhere.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
+raw_atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
 {
-       return arch_atomic_try_cmpxchg(v, (int *)old, new);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_try_cmpxchg(v, (s64 *)old, new);
+#else
+       return raw_atomic_try_cmpxchg(v, (int *)old, new);
+#endif
 }
 
+/**
+ * raw_atomic_long_try_cmpxchg_acquire() - atomic compare and exchange with acquire ordering
+ * @v: pointer to atomic_long_t
+ * @old: pointer to long value to compare with
+ * @new: long value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with acquire ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_try_cmpxchg_acquire() elsewhere.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_long_try_cmpxchg_acquire(atomic_long_t *v, long *old, long new)
+raw_atomic_long_try_cmpxchg_acquire(atomic_long_t *v, long *old, long new)
 {
-       return arch_atomic_try_cmpxchg_acquire(v, (int *)old, new);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_try_cmpxchg_acquire(v, (s64 *)old, new);
+#else
+       return raw_atomic_try_cmpxchg_acquire(v, (int *)old, new);
+#endif
 }
 
+/**
+ * raw_atomic_long_try_cmpxchg_release() - atomic compare and exchange with release ordering
+ * @v: pointer to atomic_long_t
+ * @old: pointer to long value to compare with
+ * @new: long value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with release ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_try_cmpxchg_release() elsewhere.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_long_try_cmpxchg_release(atomic_long_t *v, long *old, long new)
+raw_atomic_long_try_cmpxchg_release(atomic_long_t *v, long *old, long new)
 {
-       return arch_atomic_try_cmpxchg_release(v, (int *)old, new);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_try_cmpxchg_release(v, (s64 *)old, new);
+#else
+       return raw_atomic_try_cmpxchg_release(v, (int *)old, new);
+#endif
 }
 
+/**
+ * raw_atomic_long_try_cmpxchg_relaxed() - atomic compare and exchange with relaxed ordering
+ * @v: pointer to atomic_long_t
+ * @old: pointer to long value to compare with
+ * @new: long value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with relaxed ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_try_cmpxchg_relaxed() elsewhere.
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_long_try_cmpxchg_relaxed(atomic_long_t *v, long *old, long new)
+raw_atomic_long_try_cmpxchg_relaxed(atomic_long_t *v, long *old, long new)
 {
-       return arch_atomic_try_cmpxchg_relaxed(v, (int *)old, new);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_try_cmpxchg_relaxed(v, (s64 *)old, new);
+#else
+       return raw_atomic_try_cmpxchg_relaxed(v, (int *)old, new);
+#endif
 }
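
A brief usage sketch, not part of the generated diff: because the try_cmpxchg variants update @old on failure, the usual compare-and-swap retry loop needs no separate re-read. The helper name and the saturation limit below are hypothetical.

#include <linux/atomic.h>

/* Hypothetical: increment *v, but never beyond 'limit', using a CAS loop. */
static inline bool example_inc_below(atomic_long_t *v, long limit)
{
        long old = raw_atomic_long_read(v);

        do {
                if (old >= limit)
                        return false;   /* already at the limit */
                /* On failure, 'old' is refreshed with the current value of *v. */
        } while (!raw_atomic_long_try_cmpxchg(v, &old, old + 1));

        return true;
}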
 
+/**
+ * raw_atomic_long_sub_and_test() - atomic subtract and test if zero with full ordering
+ * @i: long value to subtract
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_sub_and_test() elsewhere.
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_long_sub_and_test(long i, atomic_long_t *v)
+raw_atomic_long_sub_and_test(long i, atomic_long_t *v)
 {
-       return arch_atomic_sub_and_test(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_sub_and_test(i, v);
+#else
+       return raw_atomic_sub_and_test(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_dec_and_test() - atomic decrement and test if zero with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_dec_and_test() elsewhere.
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_long_dec_and_test(atomic_long_t *v)
+raw_atomic_long_dec_and_test(atomic_long_t *v)
 {
-       return arch_atomic_dec_and_test(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_dec_and_test(v);
+#else
+       return raw_atomic_dec_and_test(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_inc_and_test() - atomic increment and test if zero with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_inc_and_test() elsewhere.
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_long_inc_and_test(atomic_long_t *v)
+raw_atomic_long_inc_and_test(atomic_long_t *v)
 {
-       return arch_atomic_inc_and_test(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_inc_and_test(v);
+#else
+       return raw_atomic_inc_and_test(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_add_negative() - atomic add and test if negative with full ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_add_negative() elsewhere.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_long_add_negative(long i, atomic_long_t *v)
+raw_atomic_long_add_negative(long i, atomic_long_t *v)
 {
-       return arch_atomic_add_negative(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_add_negative(i, v);
+#else
+       return raw_atomic_add_negative(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_add_negative_acquire() - atomic add and test if negative with acquire ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with acquire ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_add_negative_acquire() elsewhere.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_long_add_negative_acquire(long i, atomic_long_t *v)
+raw_atomic_long_add_negative_acquire(long i, atomic_long_t *v)
 {
-       return arch_atomic_add_negative_acquire(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_add_negative_acquire(i, v);
+#else
+       return raw_atomic_add_negative_acquire(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_add_negative_release() - atomic add and test if negative with release ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with release ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_add_negative_release() elsewhere.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_long_add_negative_release(long i, atomic_long_t *v)
+raw_atomic_long_add_negative_release(long i, atomic_long_t *v)
 {
-       return arch_atomic_add_negative_release(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_add_negative_release(i, v);
+#else
+       return raw_atomic_add_negative_release(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_add_negative_relaxed() - atomic add and test if negative with relaxed ordering
+ * @i: long value to add
+ * @v: pointer to atomic_long_t
+ *
+ * Atomically updates @v to (@v + @i) with relaxed ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_add_negative_relaxed() elsewhere.
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_long_add_negative_relaxed(long i, atomic_long_t *v)
+raw_atomic_long_add_negative_relaxed(long i, atomic_long_t *v)
 {
-       return arch_atomic_add_negative_relaxed(i, v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_add_negative_relaxed(i, v);
+#else
+       return raw_atomic_add_negative_relaxed(i, v);
+#endif
 }
 
+/**
+ * raw_atomic_long_fetch_add_unless() - atomic add unless value with full ordering
+ * @v: pointer to atomic_long_t
+ * @a: long value to add
+ * @u: long value to compare with
+ *
+ * If (@v != @u), atomically updates @v to (@v + @a) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_fetch_add_unless() elsewhere.
+ *
+ * Return: The original value of @v.
+ */
 static __always_inline long
-arch_atomic_long_fetch_add_unless(atomic_long_t *v, long a, long u)
+raw_atomic_long_fetch_add_unless(atomic_long_t *v, long a, long u)
 {
-       return arch_atomic_fetch_add_unless(v, a, u);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_fetch_add_unless(v, a, u);
+#else
+       return raw_atomic_fetch_add_unless(v, a, u);
+#endif
 }
 
+/**
+ * raw_atomic_long_add_unless() - atomic add unless value with full ordering
+ * @v: pointer to atomic_long_t
+ * @a: long value to add
+ * @u: long value to compare with
+ *
+ * If (@v != @u), atomically updates @v to (@v + @a) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_add_unless() elsewhere.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_long_add_unless(atomic_long_t *v, long a, long u)
+raw_atomic_long_add_unless(atomic_long_t *v, long a, long u)
 {
-       return arch_atomic_add_unless(v, a, u);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_add_unless(v, a, u);
+#else
+       return raw_atomic_add_unless(v, a, u);
+#endif
 }
 
+/**
+ * raw_atomic_long_inc_not_zero() - atomic increment unless zero with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * If (@v != 0), atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_inc_not_zero() elsewhere.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_long_inc_not_zero(atomic_long_t *v)
+raw_atomic_long_inc_not_zero(atomic_long_t *v)
 {
-       return arch_atomic_inc_not_zero(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_inc_not_zero(v);
+#else
+       return raw_atomic_inc_not_zero(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_inc_unless_negative() - atomic increment unless negative with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * If (@v >= 0), atomically updates @v to (@v + 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_inc_unless_negative() elsewhere.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_long_inc_unless_negative(atomic_long_t *v)
+raw_atomic_long_inc_unless_negative(atomic_long_t *v)
 {
-       return arch_atomic_inc_unless_negative(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_inc_unless_negative(v);
+#else
+       return raw_atomic_inc_unless_negative(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_dec_unless_positive() - atomic decrement unless positive with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * If (@v <= 0), atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_dec_unless_positive() elsewhere.
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
 static __always_inline bool
-arch_atomic_long_dec_unless_positive(atomic_long_t *v)
+raw_atomic_long_dec_unless_positive(atomic_long_t *v)
 {
-       return arch_atomic_dec_unless_positive(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_dec_unless_positive(v);
+#else
+       return raw_atomic_dec_unless_positive(v);
+#endif
 }
 
+/**
+ * raw_atomic_long_dec_if_positive() - atomic decrement if positive with full ordering
+ * @v: pointer to atomic_long_t
+ *
+ * If (@v > 0), atomically updates @v to (@v - 1) with full ordering.
+ *
+ * Safe to use in noinstr code; prefer atomic_long_dec_if_positive() elsewhere.
+ *
+ * Return: (@v - 1), computed from the original value of @v, regardless of whether @v was updated.
+ */
 static __always_inline long
-arch_atomic_long_dec_if_positive(atomic_long_t *v)
+raw_atomic_long_dec_if_positive(atomic_long_t *v)
 {
-       return arch_atomic_dec_if_positive(v);
+#ifdef CONFIG_64BIT
+       return raw_atomic64_dec_if_positive(v);
+#else
+       return raw_atomic_dec_if_positive(v);
+#endif
 }
 
-#endif /* CONFIG_64BIT */
 #endif /* _LINUX_ATOMIC_LONG_H */
-// a194c07d7d2f4b0e178d3c118c919775d5d65f50
+// 4ef23f98c73cff96d239896175fd26b10b88899e
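
A brief illustration, not part of the generated header: the kerneldoc above points ordinary code at the instrumented atomic_long_*() wrappers and reserves the raw_*() forms for noinstr paths. The counter and function names are hypothetical.

#include <linux/atomic.h>
#include <linux/printk.h>

static atomic_long_t example_events = ATOMIC_LONG_INIT(0);

/* Code that must stay free of KASAN/KCSAN instrumentation uses the raw form. */
static void example_noinstr_count(void)
{
        raw_atomic_long_inc(&example_events);
}

/* Everything else should prefer the instrumented wrapper. */
static void example_drain_one(void)
{
        if (atomic_long_dec_and_test(&example_events))
                pr_debug("all example events drained\n");
}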
index 31086a7..6a3a9e1 100644 (file)
@@ -130,8 +130,6 @@ extern unsigned compat_dir_class[];
 extern unsigned compat_chattr_class[];
 extern unsigned compat_signal_class[];
 
-extern int audit_classify_compat_syscall(int abi, unsigned syscall);
-
 /* audit_names->type values */
 #define        AUDIT_TYPE_UNKNOWN      0       /* we don't know yet */
 #define        AUDIT_TYPE_NORMAL       1       /* a "normal" audit record */
index 8fdb1af..0e34d67 100644 (file)
@@ -21,4 +21,6 @@ enum auditsc_class_t {
        AUDITSC_NVALS /* count */
 };
 
+extern int audit_classify_compat_syscall(int abi, unsigned syscall);
+
 #endif
index b3e7529..c4f5b52 100644 (file)
@@ -229,7 +229,7 @@ static inline void bio_cnt_set(struct bio *bio, unsigned int count)
 
 static inline bool bio_flagged(struct bio *bio, unsigned int bit)
 {
-       return (bio->bi_flags & (1U << bit)) != 0;
+       return bio->bi_flags & (1U << bit);
 }
 
 static inline void bio_set_flag(struct bio *bio, unsigned int bit)
@@ -465,14 +465,18 @@ extern void bio_uninit(struct bio *);
 void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf);
 void bio_chain(struct bio *, struct bio *);
 
-int bio_add_page(struct bio *, struct page *, unsigned len, unsigned off);
-bool bio_add_folio(struct bio *, struct folio *, size_t len, size_t off);
+int __must_check bio_add_page(struct bio *bio, struct page *page, unsigned len,
+                             unsigned off);
+bool __must_check bio_add_folio(struct bio *bio, struct folio *folio,
+                               size_t len, size_t off);
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
                           unsigned int, unsigned int);
 int bio_add_zone_append_page(struct bio *bio, struct page *page,
                             unsigned int len, unsigned int offset);
 void __bio_add_page(struct bio *bio, struct page *page,
                unsigned int len, unsigned int off);
+void bio_add_folio_nofail(struct bio *bio, struct folio *folio, size_t len,
+                         size_t off);
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
 void bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter);
 void __bio_release_pages(struct bio *bio, bool mark_dirty);
@@ -488,7 +492,7 @@ void zero_fill_bio(struct bio *bio);
 
 static inline void bio_release_pages(struct bio *bio, bool mark_dirty)
 {
-       if (!bio_flagged(bio, BIO_NO_PAGE_REF))
+       if (bio_flagged(bio, BIO_PAGE_PINNED))
                __bio_release_pages(bio, mark_dirty);
 }
 
index 06caacd..f401067 100644 (file)
@@ -28,8 +28,6 @@ typedef __u32 __bitwise req_flags_t;
 
 /* drive already may have started this one */
 #define RQF_STARTED            ((__force req_flags_t)(1 << 1))
-/* may not be passed by ioscheduler */
-#define RQF_SOFTBARRIER                ((__force req_flags_t)(1 << 3))
 /* request for flush sequence */
 #define RQF_FLUSH_SEQ          ((__force req_flags_t)(1 << 4))
 /* merge of different types, fail separately */
@@ -38,12 +36,14 @@ typedef __u32 __bitwise req_flags_t;
 #define RQF_MQ_INFLIGHT                ((__force req_flags_t)(1 << 6))
 /* don't call prep for this one */
 #define RQF_DONTPREP           ((__force req_flags_t)(1 << 7))
+/* use hctx->sched_tags */
+#define RQF_SCHED_TAGS         ((__force req_flags_t)(1 << 8))
+/* use an I/O scheduler for this request */
+#define RQF_USE_SCHED          ((__force req_flags_t)(1 << 9))
 /* vaguely specified driver internal error.  Ignored by the block layer */
 #define RQF_FAILED             ((__force req_flags_t)(1 << 10))
 /* don't warn about errors */
 #define RQF_QUIET              ((__force req_flags_t)(1 << 11))
-/* elevator private data attached */
-#define RQF_ELVPRIV            ((__force req_flags_t)(1 << 12))
 /* account into disk and partition IO statistics */
 #define RQF_IO_STAT            ((__force req_flags_t)(1 << 13))
 /* runtime pm request */
@@ -59,13 +59,11 @@ typedef __u32 __bitwise req_flags_t;
 #define RQF_ZONE_WRITE_LOCKED  ((__force req_flags_t)(1 << 19))
 /* ->timeout has been called, don't expire again */
 #define RQF_TIMED_OUT          ((__force req_flags_t)(1 << 21))
-/* queue has elevator attached */
-#define RQF_ELV                        ((__force req_flags_t)(1 << 22))
-#define RQF_RESV                       ((__force req_flags_t)(1 << 23))
+#define RQF_RESV               ((__force req_flags_t)(1 << 23))
 
 /* flags that prevent us from merging requests: */
 #define RQF_NOMERGE_FLAGS \
-       (RQF_STARTED | RQF_SOFTBARRIER | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD)
+       (RQF_STARTED | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD)
 
 enum mq_rq_state {
        MQ_RQ_IDLE              = 0,
@@ -169,25 +167,20 @@ struct request {
                void *completion_data;
        };
 
-
        /*
         * Three pointers are available for the IO schedulers, if they need
-        * more they have to dynamically allocate it.  Flush requests are
-        * never put on the IO scheduler. So let the flush fields share
-        * space with the elevator data.
+        * more they have to dynamically allocate it.
         */
-       union {
-               struct {
-                       struct io_cq            *icq;
-                       void                    *priv[2];
-               } elv;
-
-               struct {
-                       unsigned int            seq;
-                       struct list_head        list;
-                       rq_end_io_fn            *saved_end_io;
-               } flush;
-       };
+       struct {
+               struct io_cq            *icq;
+               void                    *priv[2];
+       } elv;
+
+       struct {
+               unsigned int            seq;
+               struct list_head        list;
+               rq_end_io_fn            *saved_end_io;
+       } flush;
 
        union {
                struct __call_single_data csd;
@@ -208,7 +201,7 @@ static inline enum req_op req_op(const struct request *req)
 
 static inline bool blk_rq_is_passthrough(struct request *rq)
 {
-       return blk_op_is_passthrough(req_op(rq));
+       return blk_op_is_passthrough(rq->cmd_flags);
 }
 
 static inline unsigned short req_get_ioprio(struct request *req)
@@ -746,8 +739,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 struct blk_mq_tags {
        unsigned int nr_tags;
        unsigned int nr_reserved_tags;
-
-       atomic_t active_queues;
+       unsigned int active_queues;
 
        struct sbitmap_queue bitmap_tags;
        struct sbitmap_queue breserved_tags;
@@ -844,7 +836,7 @@ void blk_mq_end_request_batch(struct io_comp_batch *ib);
  */
 static inline bool blk_mq_need_time_stamp(struct request *rq)
 {
-       return (rq->rq_flags & (RQF_IO_STAT | RQF_STATS | RQF_ELV));
+       return (rq->rq_flags & (RQF_IO_STAT | RQF_STATS | RQF_USE_SCHED));
 }
 
 static inline bool blk_mq_is_reserved_rq(struct request *rq)
@@ -860,7 +852,7 @@ static inline bool blk_mq_add_to_batch(struct request *req,
                                       struct io_comp_batch *iob, int ioerror,
                                       void (*complete)(struct io_comp_batch *))
 {
-       if (!iob || (req->rq_flags & RQF_ELV) || ioerror ||
+       if (!iob || (req->rq_flags & RQF_USE_SCHED) || ioerror ||
                        (req->end_io && !blk_rq_is_passthrough(req)))
                return false;
 
@@ -1164,6 +1156,18 @@ static inline unsigned int blk_rq_zone_is_seq(struct request *rq)
        return disk_zone_is_seq(rq->q->disk, blk_rq_pos(rq));
 }
 
+/**
+ * blk_rq_is_seq_zoned_write() - Check if @rq requires write serialization.
+ * @rq: Request to examine.
+ *
+ * Note: REQ_OP_ZONE_APPEND requests do not require serialization.
+ */
+static inline bool blk_rq_is_seq_zoned_write(struct request *rq)
+{
+       return op_needs_zoned_write_locking(req_op(rq)) &&
+               blk_rq_zone_is_seq(rq);
+}
+
 bool blk_req_needs_zone_write_lock(struct request *rq);
 bool blk_req_zone_write_trylock(struct request *rq);
 void __blk_req_zone_write_lock(struct request *rq);
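
blk_rq_is_seq_zoned_write() gives the block layer and drivers one predicate for "this request must be serialized within its zone", and the !CONFIG_BLK_DEV_ZONED stub below lets callers use it unconditionally. A hedged sketch of how it might be combined with the existing zone write locking helpers (the wrapper function is hypothetical):

/* Hypothetical dispatch-side check built on the declarations above. */
static bool hypo_can_dispatch(struct request *rq)
{
        /* Reads, REQ_OP_ZONE_APPEND and conventional zones need no ordering. */
        if (!blk_rq_is_seq_zoned_write(rq))
                return true;

        /* Sequential-zone writes must take the per-zone write lock first. */
        return blk_req_zone_write_trylock(rq);
}
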
@@ -1194,6 +1198,11 @@ static inline bool blk_req_can_dispatch_to_zone(struct request *rq)
        return !blk_req_zone_is_write_locked(rq);
 }
 #else /* CONFIG_BLK_DEV_ZONED */
+static inline bool blk_rq_is_seq_zoned_write(struct request *rq)
+{
+       return false;
+}
+
 static inline bool blk_req_needs_zone_write_lock(struct request *rq)
 {
        return false;
index 740afe8..752a54e 100644 (file)
@@ -55,6 +55,8 @@ struct block_device {
        struct super_block *    bd_super;
        void *                  bd_claiming;
        void *                  bd_holder;
+       const struct blk_holder_ops *bd_holder_ops;
+       struct mutex            bd_holder_lock;
        /* The counter of freeze processes */
        int                     bd_fsfreeze_count;
        int                     bd_holders;
@@ -323,7 +325,7 @@ struct bio {
  * bio flags
  */
 enum {
-       BIO_NO_PAGE_REF,        /* don't put release vec pages */
+       BIO_PAGE_PINNED,        /* Unpin pages in bio_release_pages() */
        BIO_CLONED,             /* doesn't own data */
        BIO_BOUNCED,            /* bio is a bounce bio */
        BIO_QUIET,              /* Make BIO Quiet */
index b441e63..ed44a99 100644 (file)
@@ -41,7 +41,7 @@ struct blk_stat_callback;
 struct blk_crypto_profile;
 
 extern const struct device_type disk_type;
-extern struct device_type part_type;
+extern const struct device_type part_type;
 extern struct class block_class;
 
 /*
@@ -112,6 +112,19 @@ struct blk_integrity {
        unsigned char                           tag_size;
 };
 
+typedef unsigned int __bitwise blk_mode_t;
+
+/* open for reading */
+#define BLK_OPEN_READ          ((__force blk_mode_t)(1 << 0))
+/* open for writing */
+#define BLK_OPEN_WRITE         ((__force blk_mode_t)(1 << 1))
+/* open exclusively (vs other exclusive openers) */
+#define BLK_OPEN_EXCL          ((__force blk_mode_t)(1 << 2))
+/* opened with O_NDELAY */
+#define BLK_OPEN_NDELAY                ((__force blk_mode_t)(1 << 3))
+/* open for "writes" only for ioctls (special hack for floppy.c) */
+#define BLK_OPEN_WRITE_IOCTL   ((__force blk_mode_t)(1 << 4))
+
 struct gendisk {
        /*
         * major/first_minor/minors should not be set by any new driver, the
@@ -187,6 +200,7 @@ struct gendisk {
        struct badblocks *bb;
        struct lockdep_map lockdep_map;
        u64 diskseq;
+       blk_mode_t open_mode;
 
        /*
         * Independent sector access ranges. This is always NULL for
@@ -318,7 +332,6 @@ typedef int (*report_zones_cb)(struct blk_zone *zone, unsigned int idx,
 void disk_set_zoned(struct gendisk *disk, enum blk_zoned_model model);
 
 #ifdef CONFIG_BLK_DEV_ZONED
-
 #define BLK_ALL_ZONES  ((unsigned int)-1)
 int blkdev_report_zones(struct block_device *bdev, sector_t sector,
                        unsigned int nr_zones, report_zones_cb cb, void *data);
@@ -328,33 +341,11 @@ extern int blkdev_zone_mgmt(struct block_device *bdev, enum req_op op,
                            gfp_t gfp_mask);
 int blk_revalidate_disk_zones(struct gendisk *disk,
                              void (*update_driver_data)(struct gendisk *disk));
-
-extern int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
-                                    unsigned int cmd, unsigned long arg);
-extern int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode,
-                                 unsigned int cmd, unsigned long arg);
-
 #else /* CONFIG_BLK_DEV_ZONED */
-
 static inline unsigned int bdev_nr_zones(struct block_device *bdev)
 {
        return 0;
 }
-
-static inline int blkdev_report_zones_ioctl(struct block_device *bdev,
-                                           fmode_t mode, unsigned int cmd,
-                                           unsigned long arg)
-{
-       return -ENOTTY;
-}
-
-static inline int blkdev_zone_mgmt_ioctl(struct block_device *bdev,
-                                        fmode_t mode, unsigned int cmd,
-                                        unsigned long arg)
-{
-       return -ENOTTY;
-}
-
 #endif /* CONFIG_BLK_DEV_ZONED */
 
 /*
@@ -392,6 +383,7 @@ struct request_queue {
 
        struct blk_queue_stats  *stats;
        struct rq_qos           *rq_qos;
+       struct mutex            rq_qos_mutex;
 
        const struct blk_mq_ops *mq_ops;
 
@@ -487,6 +479,7 @@ struct request_queue {
         * for flush operations
         */
        struct blk_flush_queue  *fq;
+       struct list_head        flush_list;
 
        struct list_head        requeue_list;
        spinlock_t              requeue_lock;
@@ -815,7 +808,7 @@ int __register_blkdev(unsigned int major, const char *name,
        __register_blkdev(major, name, NULL)
 void unregister_blkdev(unsigned int major, const char *name);
 
-bool bdev_check_media_change(struct block_device *bdev);
+bool disk_check_media_change(struct gendisk *disk);
 int __invalidate_device(struct block_device *bdev, bool kill_dirty);
 void set_capacity(struct gendisk *disk, sector_t size);
 
@@ -836,7 +829,6 @@ static inline void bd_unlink_disk_holder(struct block_device *bdev,
 
 dev_t part_devt(struct gendisk *disk, u8 partno);
 void inc_diskseq(struct gendisk *disk);
-dev_t blk_lookup_devt(const char *name, int partno);
 void blk_request_module(dev_t devt);
 
 extern int blk_register_queue(struct gendisk *disk);
@@ -1281,15 +1273,18 @@ static inline unsigned int bdev_zone_no(struct block_device *bdev, sector_t sec)
        return disk_zone_no(bdev->bd_disk, sec);
 }
 
-static inline bool bdev_op_is_zoned_write(struct block_device *bdev,
-                                         blk_opf_t op)
+/* Whether write serialization is required for @op on zoned devices. */
+static inline bool op_needs_zoned_write_locking(enum req_op op)
 {
-       if (!bdev_is_zoned(bdev))
-               return false;
-
        return op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES;
 }
 
+static inline bool bdev_op_is_zoned_write(struct block_device *bdev,
+                                         enum req_op op)
+{
+       return bdev_is_zoned(bdev) && op_needs_zoned_write_locking(op);
+}
+
 static inline sector_t bdev_zone_sectors(struct block_device *bdev)
 {
        struct request_queue *q = bdev_get_queue(bdev);
@@ -1376,16 +1371,16 @@ enum blk_unique_id {
        BLK_UID_NAA     = 3,
 };
 
-#define NFL4_UFLG_MASK                 0x0000003F
-
 struct block_device_operations {
        void (*submit_bio)(struct bio *bio);
        int (*poll_bio)(struct bio *bio, struct io_comp_batch *iob,
                        unsigned int flags);
-       int (*open) (struct block_device *, fmode_t);
-       void (*release) (struct gendisk *, fmode_t);
-       int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
-       int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
+       int (*open)(struct gendisk *disk, blk_mode_t mode);
+       void (*release)(struct gendisk *disk);
+       int (*ioctl)(struct block_device *bdev, blk_mode_t mode,
+                       unsigned cmd, unsigned long arg);
+       int (*compat_ioctl)(struct block_device *bdev, blk_mode_t mode,
+                       unsigned cmd, unsigned long arg);
        unsigned int (*check_events) (struct gendisk *disk,
                                      unsigned int clearing);
        void (*unlock_native_capacity) (struct gendisk *);
@@ -1412,7 +1407,7 @@ struct block_device_operations {
 };
 
 #ifdef CONFIG_COMPAT
-extern int blkdev_compat_ptr_ioctl(struct block_device *, fmode_t,
+extern int blkdev_compat_ptr_ioctl(struct block_device *, blk_mode_t,
                                      unsigned int, unsigned long);
 #else
 #define blkdev_compat_ptr_ioctl NULL
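
With ->open() and ->release() now taking the gendisk and a blk_mode_t, drivers test BLK_OPEN_* bits instead of FMODE_* bits and no longer receive a mode on release. A hedged sketch of what a converted driver might look like (all names, including the read_only field, are hypothetical):

static int hypo_open(struct gendisk *disk, blk_mode_t mode)
{
        struct hypo_dev *dev = disk->private_data;

        if ((mode & BLK_OPEN_WRITE) && dev->read_only)
                return -EROFS;
        return 0;
}

static void hypo_release(struct gendisk *disk)
{
        /* No mode argument any more; per-open state has to live elsewhere. */
}

static const struct block_device_operations hypo_fops = {
        .owner          = THIS_MODULE,
        .open           = hypo_open,
        .release        = hypo_release,
};
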
@@ -1465,22 +1460,31 @@ void blkdev_show(struct seq_file *seqf, off_t offset);
 #define BLKDEV_MAJOR_MAX       0
 #endif
 
-struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
-               void *holder);
-struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder);
-int bd_prepare_to_claim(struct block_device *bdev, void *holder);
+struct blk_holder_ops {
+       void (*mark_dead)(struct block_device *bdev);
+};
+
+/*
+ * Return the correct open flags for blkdev_get_by_* for super block flags
+ * as stored in sb->s_flags.
+ */
+#define sb_open_mode(flags) \
+       (BLK_OPEN_READ | (((flags) & SB_RDONLY) ? 0 : BLK_OPEN_WRITE))
+
+struct block_device *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
+               const struct blk_holder_ops *hops);
+struct block_device *blkdev_get_by_path(const char *path, blk_mode_t mode,
+               void *holder, const struct blk_holder_ops *hops);
+int bd_prepare_to_claim(struct block_device *bdev, void *holder,
+               const struct blk_holder_ops *hops);
 void bd_abort_claiming(struct block_device *bdev, void *holder);
-void blkdev_put(struct block_device *bdev, fmode_t mode);
+void blkdev_put(struct block_device *bdev, void *holder);
 
 /* just for blk-cgroup, don't use elsewhere */
 struct block_device *blkdev_get_no_open(dev_t dev);
 void blkdev_put_no_open(struct block_device *bdev);
 
-struct block_device *bdev_alloc(struct gendisk *disk, u8 partno);
-void bdev_add(struct block_device *bdev, dev_t dev);
 struct block_device *I_BDEV(struct inode *inode);
-int truncate_bdev_range(struct block_device *bdev, fmode_t mode, loff_t lstart,
-               loff_t lend);
 
 #ifdef CONFIG_BLOCK
 void invalidate_bdev(struct block_device *bdev);
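
The open helpers now take a blk_mode_t plus an optional blk_holder_ops table whose ->mark_dead() callback lets the holder react when the device goes away, and blkdev_put() identifies the opener by holder rather than by mode; sb_open_mode() maps sb->s_flags to the matching BLK_OPEN_* bits. A hedged sketch of a filesystem-style caller (names and error handling are illustrative):

static void hypo_mark_dead(struct block_device *bdev)
{
        /* e.g. force the owning filesystem read-only or shut it down */
}

static const struct blk_holder_ops hypo_holder_ops = {
        .mark_dead      = hypo_mark_dead,
};

static struct block_device *hypo_open_bdev(const char *path,
                                           struct super_block *sb)
{
        return blkdev_get_by_path(path, sb_open_mode(sb->s_flags), sb,
                                  &hypo_holder_ops);
}

/* ...and on teardown the holder, not a mode, is passed back:
 *      blkdev_put(bdev, sb);
 */
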
@@ -1490,6 +1494,7 @@ int sync_blockdev_nowait(struct block_device *bdev);
 void sync_bdevs(bool wait);
 void bdev_statx_dioalign(struct inode *inode, struct kstat *stat);
 void printk_all_partitions(void);
+int __init early_lookup_bdev(const char *pathname, dev_t *dev);
 #else
 static inline void invalidate_bdev(struct block_device *bdev)
 {
@@ -1511,6 +1516,10 @@ static inline void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
 static inline void printk_all_partitions(void)
 {
 }
+static inline int early_lookup_bdev(const char *pathname, dev_t *dev)
+{
+       return -EINVAL;
+}
 #endif /* CONFIG_BLOCK */
 
 int fsync_bdev(struct block_device *bdev);
index cfbda11..122c62e 100644 (file)
@@ -85,10 +85,14 @@ extern int blk_trace_remove(struct request_queue *q);
 # define blk_add_driver_data(rq, data, len)            do {} while (0)
 # define blk_trace_setup(q, name, dev, bdev, arg)      (-ENOTTY)
 # define blk_trace_startstop(q, start)                 (-ENOTTY)
-# define blk_trace_remove(q)                           (-ENOTTY)
 # define blk_add_trace_msg(q, fmt, ...)                        do { } while (0)
 # define blk_add_cgroup_trace_msg(q, cg, fmt, ...)     do { } while (0)
 # define blk_trace_note_message_enabled(q)             (false)
+
+static inline int blk_trace_remove(struct request_queue *q)
+{
+       return -ENOTTY;
+}
 #endif /* CONFIG_BLK_DEV_IO_TRACE */
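
Replacing the bare (-ENOTTY) macro with a static inline stub keeps the argument type-checked and evaluated the same way in both configurations, so callers can use the return value without config-dependent warnings. For instance (q is assumed to be a struct request_queue pointer):

int ret = blk_trace_remove(q);  /* -ENOTTY when blktrace is compiled out */

if (ret && ret != -ENOTTY)
        pr_warn("blktrace teardown failed: %d\n", ret);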
 
 #ifdef CONFIG_COMPAT
index 1ac81c8..ee2df73 100644 (file)
@@ -9,7 +9,7 @@ struct device;
 struct request_queue;
 
 typedef int (bsg_sg_io_fn)(struct request_queue *, struct sg_io_v4 *hdr,
-               fmode_t mode, unsigned int timeout);
+               bool open_for_write, unsigned int timeout);
 
 struct bsg_device *bsg_register_queue(struct request_queue *q,
                struct device *parent, const char *name,
index 67caa90..98c6fd0 100644 (file)
@@ -13,6 +13,7 @@
 
 #include <linux/fs.h>          /* not really needed, later.. */
 #include <linux/list.h>
+#include <linux/blkdev.h>
 #include <scsi/scsi_common.h>
 #include <uapi/linux/cdrom.h>
 
@@ -61,9 +62,9 @@ struct cdrom_device_info {
        __u8 last_sense;
        __u8 media_written;             /* dirty flag, DVD+RW bookkeeping */
        unsigned short mmc3_profile;    /* current MMC3 profile */
-       int for_data;
        int (*exit)(struct cdrom_device_info *);
        int mrw_mode_page;
+       bool opened_for_data;
        __s64 last_media_change_ms;
 };
 
@@ -101,11 +102,10 @@ int cdrom_read_tocentry(struct cdrom_device_info *cdi,
                struct cdrom_tocentry *entry);
 
 /* the general block_device operations structure: */
-extern int cdrom_open(struct cdrom_device_info *cdi, struct block_device *bdev,
-                       fmode_t mode);
-extern void cdrom_release(struct cdrom_device_info *cdi, fmode_t mode);
-extern int cdrom_ioctl(struct cdrom_device_info *cdi, struct block_device *bdev,
-                      fmode_t mode, unsigned int cmd, unsigned long arg);
+int cdrom_open(struct cdrom_device_info *cdi, blk_mode_t mode);
+void cdrom_release(struct cdrom_device_info *cdi);
+int cdrom_ioctl(struct cdrom_device_info *cdi, struct block_device *bdev,
+               unsigned int cmd, unsigned long arg);
 extern unsigned int cdrom_check_events(struct cdrom_device_info *cdi,
                                       unsigned int clearing);
 
index 885f539..1261a47 100644 (file)
@@ -118,7 +118,6 @@ int cgroup_rm_cftypes(struct cftype *cfts);
 void cgroup_file_notify(struct cgroup_file *cfile);
 void cgroup_file_show(struct cgroup_file *cfile, bool show);
 
-int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen);
 int cgroupstats_build(struct cgroupstats *stats, struct dentry *dentry);
 int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
                     struct pid *pid, struct task_struct *tsk);
index 947a60b..d7779a1 100644 (file)
  * Note: DISABLE_BRANCH_PROFILING can be used by special lowlevel code
  * to disable branch tracing on a per file basis.
  */
-#if defined(CONFIG_TRACE_BRANCH_PROFILING) \
-    && !defined(DISABLE_BRANCH_PROFILING) && !defined(__CHECKER__)
 void ftrace_likely_update(struct ftrace_likely_data *f, int val,
                          int expect, int is_constant);
-
+#if defined(CONFIG_TRACE_BRANCH_PROFILING) \
+    && !defined(DISABLE_BRANCH_PROFILING) && !defined(__CHECKER__)
 #define likely_notrace(x)      __builtin_expect(!!(x), 1)
 #define unlikely_notrace(x)    __builtin_expect(!!(x), 0)
 
index e659cb6..8486476 100644 (file)
 #define __noreturn                      __attribute__((__noreturn__))
 
 /*
+ * Optional: only supported since GCC >= 11.1, clang >= 7.0.
+ *
+ *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-no_005fstack_005fprotector-function-attribute
+ *   clang: https://clang.llvm.org/docs/AttributeReference.html#no-stack-protector-safebuffers
+ */
+#if __has_attribute(__no_stack_protector__)
+# define __no_stack_protector          __attribute__((__no_stack_protector__))
+#else
+# define __no_stack_protector
+#endif
+
+/*
  * Optional: not supported by gcc.
  *
  * clang: https://clang.llvm.org/docs/AttributeReference.html#overloadable
index d3cbb6c..6e76b9d 100644 (file)
@@ -119,7 +119,7 @@ extern void ct_idle_exit(void);
  */
 static __always_inline bool rcu_dynticks_curr_cpu_in_eqs(void)
 {
-       return !(arch_atomic_read(this_cpu_ptr(&context_tracking.state)) & RCU_DYNTICKS_IDX);
+       return !(raw_atomic_read(this_cpu_ptr(&context_tracking.state)) & RCU_DYNTICKS_IDX);
 }
 
 /*
@@ -128,7 +128,7 @@ static __always_inline bool rcu_dynticks_curr_cpu_in_eqs(void)
  */
 static __always_inline unsigned long ct_state_inc(int incby)
 {
-       return arch_atomic_add_return(incby, this_cpu_ptr(&context_tracking.state));
+       return raw_atomic_add_return(incby, this_cpu_ptr(&context_tracking.state));
 }
 
 static __always_inline bool warn_rcu_enter(void)
index fdd537e..bbff5f7 100644 (file)
@@ -51,7 +51,7 @@ DECLARE_PER_CPU(struct context_tracking, context_tracking);
 #ifdef CONFIG_CONTEXT_TRACKING_USER
 static __always_inline int __ct_state(void)
 {
-       return arch_atomic_read(this_cpu_ptr(&context_tracking.state)) & CT_STATE_MASK;
+       return raw_atomic_read(this_cpu_ptr(&context_tracking.state)) & CT_STATE_MASK;
 }
 #endif
 
index eacb7dd..c1a7dc3 100644 (file)
@@ -572,4 +572,10 @@ void cper_print_proc_ia(const char *pfx,
 int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg);
 int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg);
 
+struct acpi_hest_generic_status;
+void cper_estatus_print(const char *pfx,
+                       const struct acpi_hest_generic_status *estatus);
+int cper_estatus_check_header(const struct acpi_hest_generic_status *estatus);
+int cper_estatus_check(const struct acpi_hest_generic_status *estatus);
+
 #endif
index 8582a71..6e6e57e 100644 (file)
@@ -184,8 +184,12 @@ void arch_cpu_idle_enter(void);
 void arch_cpu_idle_exit(void);
 void __noreturn arch_cpu_idle_dead(void);
 
-int cpu_report_state(int cpu);
-int cpu_check_up_prepare(int cpu);
+#ifdef CONFIG_ARCH_HAS_CPU_FINALIZE_INIT
+void arch_cpu_finalize_init(void);
+#else
+static inline void arch_cpu_finalize_init(void) { }
+#endif
+
 void cpu_set_state_online(int cpu);
 void play_idle_precise(u64 duration_ns, u64 latency_ns);
 
@@ -195,8 +199,6 @@ static inline void play_idle(unsigned long duration_us)
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-bool cpu_wait_death(unsigned int cpu, int seconds);
-bool cpu_report_death(void);
 void cpuhp_report_idle_dead(void);
 #else
 static inline void cpuhp_report_idle_dead(void) { }
index 26e2eb3..172ff51 100644 (file)
@@ -340,7 +340,10 @@ struct cpufreq_driver {
        /*
         * ->fast_switch() replacement for drivers that use an internal
         * representation of performance levels and can pass hints other than
-        * the target performance level to the hardware.
+        * the target performance level to the hardware. This can only be set
+        * if ->fast_switch is set too, because in those cases (under specific
+        * conditions) scale invariance can be disabled, which causes the
+        * schedutil governor to fall back to ->fast_switch().
         */
        void            (*adjust_perf)(unsigned int cpu,
                                       unsigned long min_perf,
index 0f1001d..25b6e6e 100644 (file)
@@ -133,6 +133,7 @@ enum cpuhp_state {
        CPUHP_MIPS_SOC_PREPARE,
        CPUHP_BP_PREPARE_DYN,
        CPUHP_BP_PREPARE_DYN_END                = CPUHP_BP_PREPARE_DYN + 20,
+       CPUHP_BP_KICK_AP,
        CPUHP_BRINGUP_CPU,
 
        /*
@@ -200,6 +201,7 @@ enum cpuhp_state {
 
        /* Online section invoked on the hotplugged CPU from the hotplug thread */
        CPUHP_AP_ONLINE_IDLE,
+       CPUHP_AP_HYPERV_ONLINE,
        CPUHP_AP_KVM_ONLINE,
        CPUHP_AP_SCHED_WAIT_EMPTY,
        CPUHP_AP_SMPBOOT_THREADS,
@@ -517,4 +519,20 @@ void cpuhp_online_idle(enum cpuhp_state state);
 static inline void cpuhp_online_idle(enum cpuhp_state state) { }
 #endif
 
+struct task_struct;
+
+void cpuhp_ap_sync_alive(void);
+void arch_cpuhp_sync_state_poll(void);
+void arch_cpuhp_cleanup_kick_cpu(unsigned int cpu);
+int arch_cpuhp_kick_ap_alive(unsigned int cpu, struct task_struct *tidle);
+bool arch_cpuhp_init_parallel_bringup(void);
+
+#ifdef CONFIG_HOTPLUG_CORE_SYNC_DEAD
+void cpuhp_ap_report_dead(void);
+void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu);
+#else
+static inline void cpuhp_ap_report_dead(void) { }
+static inline void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu) { }
+#endif
+
 #endif
index ca736b0..0d2e2a3 100644 (file)
@@ -1071,7 +1071,7 @@ static inline const struct cpumask *get_cpu_mask(unsigned int cpu)
  */
 static __always_inline unsigned int num_online_cpus(void)
 {
-       return arch_atomic_read(&__num_online_cpus);
+       return raw_atomic_read(&__num_online_cpus);
 }
 #define num_possible_cpus()    cpumask_weight(cpu_possible_mask)
 #define num_present_cpus()     cpumask_weight(cpu_present_mask)
index 980b76a..d629094 100644 (file)
@@ -71,8 +71,10 @@ extern void cpuset_init_smp(void);
 extern void cpuset_force_rebuild(void);
 extern void cpuset_update_active_cpus(void);
 extern void cpuset_wait_for_hotplug(void);
-extern void cpuset_read_lock(void);
-extern void cpuset_read_unlock(void);
+extern void inc_dl_tasks_cs(struct task_struct *task);
+extern void dec_dl_tasks_cs(struct task_struct *task);
+extern void cpuset_lock(void);
+extern void cpuset_unlock(void);
 extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
 extern bool cpuset_cpus_allowed_fallback(struct task_struct *p);
 extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
@@ -189,8 +191,10 @@ static inline void cpuset_update_active_cpus(void)
 
 static inline void cpuset_wait_for_hotplug(void) { }
 
-static inline void cpuset_read_lock(void) { }
-static inline void cpuset_read_unlock(void) { }
+static inline void inc_dl_tasks_cs(struct task_struct *task) { }
+static inline void dec_dl_tasks_cs(struct task_struct *task) { }
+static inline void cpuset_lock(void) { }
+static inline void cpuset_unlock(void) { }
 
 static inline void cpuset_cpus_allowed(struct task_struct *p,
                                       struct cpumask *mask)
index 7fd704b..d312ffb 100644 (file)
@@ -108,7 +108,6 @@ struct devfreq_dev_profile {
        unsigned long initial_freq;
        unsigned int polling_ms;
        enum devfreq_timer timer;
-       bool is_cooling_device;
 
        int (*target)(struct device *dev, unsigned long *freq, u32 flags);
        int (*get_dev_status)(struct device *dev,
@@ -118,6 +117,8 @@ struct devfreq_dev_profile {
 
        unsigned long *freq_table;
        unsigned int max_state;
+
+       bool is_cooling_device;
 };
 
 /**
index a52d2b9..69d0435 100644 (file)
@@ -166,17 +166,15 @@ void dm_error(const char *message);
 struct dm_dev {
        struct block_device *bdev;
        struct dax_device *dax_dev;
-       fmode_t mode;
+       blk_mode_t mode;
        char name[16];
 };
 
-dev_t dm_get_dev_t(const char *path);
-
 /*
  * Constructors should call these functions to ensure destination devices
  * are opened/closed correctly.
  */
-int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
+int dm_get_device(struct dm_target *ti, const char *path, blk_mode_t mode,
                  struct dm_dev **result);
 void dm_put_device(struct dm_target *ti, struct dm_dev *d);
 
@@ -545,7 +543,7 @@ int dm_set_geometry(struct mapped_device *md, struct hd_geometry *geo);
 /*
  * First create an empty table.
  */
-int dm_table_create(struct dm_table **result, fmode_t mode,
+int dm_table_create(struct dm_table **result, blk_mode_t mode,
                    unsigned int num_targets, struct mapped_device *md);
 
 /*
@@ -588,7 +586,7 @@ void dm_sync_table(struct mapped_device *md);
  * Queries
  */
 sector_t dm_table_get_size(struct dm_table *t);
-fmode_t dm_table_get_mode(struct dm_table *t);
+blk_mode_t dm_table_get_mode(struct dm_table *t);
 struct mapped_device *dm_table_get_md(struct dm_table *t);
 const char *dm_table_device_name(struct dm_table *t);
 
index 9deeaeb..abf3d3b 100644 (file)
@@ -74,6 +74,7 @@ struct class {
 struct class_dev_iter {
        struct klist_iter               ki;
        const struct device_type        *type;
+       struct subsys_private           *sp;
 };
 
 int __must_check class_register(const struct class *class);
index c244267..7738f45 100644 (file)
@@ -126,7 +126,7 @@ int __must_check driver_register(struct device_driver *drv);
 void driver_unregister(struct device_driver *drv);
 
 struct device_driver *driver_find(const char *name, const struct bus_type *bus);
-int driver_probe_done(void);
+bool __init driver_probe_done(void);
 void wait_for_device_probe(void);
 void __init wait_for_init_devices_probe(void);
 
index 6c57339..f343bc9 100644 (file)
@@ -236,8 +236,9 @@ void dim_park_tired(struct dim *dim);
  *
  * Calculate the delta between two samples (in data rates).
  * Takes into consideration counter wrap-around.
+ * Returned boolean indicates whether curr_stats are reliable.
  */
-void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
+bool dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
                    struct dim_stats *curr_stats);
 
 /**
index 725d5e6..27dbd4c 100644 (file)
@@ -202,67 +202,74 @@ static inline void detect_intel_iommu(void)
 
 struct irte {
        union {
-               /* Shared between remapped and posted mode*/
                struct {
-                       __u64   present         : 1,  /*  0      */
-                               fpd             : 1,  /*  1      */
-                               __res0          : 6,  /*  2 -  6 */
-                               avail           : 4,  /*  8 - 11 */
-                               __res1          : 3,  /* 12 - 14 */
-                               pst             : 1,  /* 15      */
-                               vector          : 8,  /* 16 - 23 */
-                               __res2          : 40; /* 24 - 63 */
+                       union {
+                               /* Shared between remapped and posted mode*/
+                               struct {
+                                       __u64   present         : 1,  /*  0      */
+                                               fpd             : 1,  /*  1      */
+                                               __res0          : 6,  /*  2 -  6 */
+                                               avail           : 4,  /*  8 - 11 */
+                                               __res1          : 3,  /* 12 - 14 */
+                                               pst             : 1,  /* 15      */
+                                               vector          : 8,  /* 16 - 23 */
+                                               __res2          : 40; /* 24 - 63 */
+                               };
+
+                               /* Remapped mode */
+                               struct {
+                                       __u64   r_present       : 1,  /*  0      */
+                                               r_fpd           : 1,  /*  1      */
+                                               dst_mode        : 1,  /*  2      */
+                                               redir_hint      : 1,  /*  3      */
+                                               trigger_mode    : 1,  /*  4      */
+                                               dlvry_mode      : 3,  /*  5 -  7 */
+                                               r_avail         : 4,  /*  8 - 11 */
+                                               r_res0          : 4,  /* 12 - 15 */
+                                               r_vector        : 8,  /* 16 - 23 */
+                                               r_res1          : 8,  /* 24 - 31 */
+                                               dest_id         : 32; /* 32 - 63 */
+                               };
+
+                               /* Posted mode */
+                               struct {
+                                       __u64   p_present       : 1,  /*  0      */
+                                               p_fpd           : 1,  /*  1      */
+                                               p_res0          : 6,  /*  2 -  7 */
+                                               p_avail         : 4,  /*  8 - 11 */
+                                               p_res1          : 2,  /* 12 - 13 */
+                                               p_urgent        : 1,  /* 14      */
+                                               p_pst           : 1,  /* 15      */
+                                               p_vector        : 8,  /* 16 - 23 */
+                                               p_res2          : 14, /* 24 - 37 */
+                                               pda_l           : 26; /* 38 - 63 */
+                               };
+                               __u64 low;
+                       };
+
+                       union {
+                               /* Shared between remapped and posted mode*/
+                               struct {
+                                       __u64   sid             : 16,  /* 64 - 79  */
+                                               sq              : 2,   /* 80 - 81  */
+                                               svt             : 2,   /* 82 - 83  */
+                                               __res3          : 44;  /* 84 - 127 */
+                               };
+
+                               /* Posted mode*/
+                               struct {
+                                       __u64   p_sid           : 16,  /* 64 - 79  */
+                                               p_sq            : 2,   /* 80 - 81  */
+                                               p_svt           : 2,   /* 82 - 83  */
+                                               p_res3          : 12,  /* 84 - 95  */
+                                               pda_h           : 32;  /* 96 - 127 */
+                               };
+                               __u64 high;
+                       };
                };
-
-               /* Remapped mode */
-               struct {
-                       __u64   r_present       : 1,  /*  0      */
-                               r_fpd           : 1,  /*  1      */
-                               dst_mode        : 1,  /*  2      */
-                               redir_hint      : 1,  /*  3      */
-                               trigger_mode    : 1,  /*  4      */
-                               dlvry_mode      : 3,  /*  5 -  7 */
-                               r_avail         : 4,  /*  8 - 11 */
-                               r_res0          : 4,  /* 12 - 15 */
-                               r_vector        : 8,  /* 16 - 23 */
-                               r_res1          : 8,  /* 24 - 31 */
-                               dest_id         : 32; /* 32 - 63 */
-               };
-
-               /* Posted mode */
-               struct {
-                       __u64   p_present       : 1,  /*  0      */
-                               p_fpd           : 1,  /*  1      */
-                               p_res0          : 6,  /*  2 -  7 */
-                               p_avail         : 4,  /*  8 - 11 */
-                               p_res1          : 2,  /* 12 - 13 */
-                               p_urgent        : 1,  /* 14      */
-                               p_pst           : 1,  /* 15      */
-                               p_vector        : 8,  /* 16 - 23 */
-                               p_res2          : 14, /* 24 - 37 */
-                               pda_l           : 26; /* 38 - 63 */
-               };
-               __u64 low;
-       };
-
-       union {
-               /* Shared between remapped and posted mode*/
-               struct {
-                       __u64   sid             : 16,  /* 64 - 79  */
-                               sq              : 2,   /* 80 - 81  */
-                               svt             : 2,   /* 82 - 83  */
-                               __res3          : 44;  /* 84 - 127 */
-               };
-
-               /* Posted mode*/
-               struct {
-                       __u64   p_sid           : 16,  /* 64 - 79  */
-                               p_sq            : 2,   /* 80 - 81  */
-                               p_svt           : 2,   /* 82 - 83  */
-                               p_res3          : 12,  /* 84 - 95  */
-                               pda_h           : 32;  /* 96 - 127 */
-               };
-               __u64 high;
+#ifdef CONFIG_IRQ_REMAP
+               __u128 irte;
+#endif
        };
 };
 
index 7aa62c9..18d83a6 100644 (file)
@@ -108,7 +108,8 @@ typedef     struct {
 #define EFI_MEMORY_MAPPED_IO_PORT_SPACE        12
 #define EFI_PAL_CODE                   13
 #define EFI_PERSISTENT_MEMORY          14
-#define EFI_MAX_MEMORY_TYPE            15
+#define EFI_UNACCEPTED_MEMORY          15
+#define EFI_MAX_MEMORY_TYPE            16
 
 /* Attribute values: */
 #define EFI_MEMORY_UC          ((u64)0x0000000000000001ULL)    /* uncached */
@@ -417,6 +418,7 @@ void efi_native_runtime_setup(void);
 #define LINUX_EFI_MOK_VARIABLE_TABLE_GUID      EFI_GUID(0xc451ed2b, 0x9694, 0x45d3,  0xba, 0xba, 0xed, 0x9f, 0x89, 0x88, 0xa3, 0x89)
 #define LINUX_EFI_COCO_SECRET_AREA_GUID                EFI_GUID(0xadf956ad, 0xe98c, 0x484c,  0xae, 0x11, 0xb5, 0x1c, 0x7d, 0x33, 0x64, 0x47)
 #define LINUX_EFI_BOOT_MEMMAP_GUID             EFI_GUID(0x800f683f, 0xd08b, 0x423a,  0xa2, 0x93, 0x96, 0x5c, 0x3c, 0x6f, 0xe2, 0xb4)
+#define LINUX_EFI_UNACCEPTED_MEM_TABLE_GUID    EFI_GUID(0xd5d1de3c, 0x105c, 0x44f9,  0x9e, 0xa9, 0xbc, 0xef, 0x98, 0x12, 0x00, 0x31)
 
 #define RISCV_EFI_BOOT_PROTOCOL_GUID           EFI_GUID(0xccd15fec, 0x6f73, 0x4eec,  0x83, 0x95, 0x3e, 0x69, 0xe4, 0xb9, 0x40, 0xbf)
 
@@ -435,6 +437,9 @@ void efi_native_runtime_setup(void);
 #define DELLEMC_EFI_RCI2_TABLE_GUID            EFI_GUID(0x2d9f28a2, 0xa886, 0x456a,  0x97, 0xa8, 0xf1, 0x1e, 0xf2, 0x4f, 0xf4, 0x55)
 #define AMD_SEV_MEM_ENCRYPT_GUID               EFI_GUID(0x0cf29b71, 0x9e51, 0x433a,  0xa3, 0xb7, 0x81, 0xf3, 0xab, 0x16, 0xb8, 0x75)
 
+/* OVMF protocol GUIDs */
+#define OVMF_SEV_MEMORY_ACCEPTANCE_PROTOCOL_GUID       EFI_GUID(0xc5a010fe, 0x38a7, 0x4531,  0x8a, 0x4a, 0x05, 0x00, 0xd2, 0xfd, 0x16, 0x49)
+
 typedef struct {
        efi_guid_t guid;
        u64 table;
@@ -534,6 +539,14 @@ struct efi_boot_memmap {
        efi_memory_desc_t       map[];
 };
 
+struct efi_unaccepted_memory {
+       u32 version;
+       u32 unit_size;
+       u64 phys_base;
+       u64 size;
+       unsigned long bitmap[];
+};
+
 /*
  * Architecture independent structure for describing a memory map for the
  * benefit of efi_memmap_init_early(), and for passing context between
@@ -636,6 +649,7 @@ extern struct efi {
        unsigned long                   tpm_final_log;          /* TPM2 Final Events Log table */
        unsigned long                   mokvar_table;           /* MOK variable config table */
        unsigned long                   coco_secret;            /* Confidential computing secret table */
+       unsigned long                   unaccepted;             /* Unaccepted memory table */
 
        efi_get_time_t                  *get_time;
        efi_set_time_t                  *set_time;
@@ -1338,4 +1352,6 @@ bool efi_config_table_is_usable(const efi_guid_t *guid, unsigned long table)
        return xen_efi_config_table_is_usable(guid, table);
 }
 
+umode_t efi_attr_is_visible(struct kobject *kobj, struct attribute *attr, int n);
+
 #endif /* _LINUX_EFI_H */
index a139c64..b5d9bb2 100644 (file)
 
 #ifndef __ASSEMBLY__
 
+/**
+ * IS_ERR_VALUE - Detect an error pointer.
+ * @x: The pointer to check.
+ *
+ * Like IS_ERR(), but does not generate a compiler warning if result is unused.
+ */
 #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)
 
+/**
+ * ERR_PTR - Create an error pointer.
+ * @error: A negative error code.
+ *
+ * Encodes @error into a pointer value. Users should consider the result
+ * opaque and not assume anything about how the error is encoded.
+ *
+ * Return: A pointer with @error encoded within its value.
+ */
 static inline void * __must_check ERR_PTR(long error)
 {
        return (void *) error;
 }
 
+/**
+ * PTR_ERR - Extract the error code from an error pointer.
+ * @ptr: An error pointer.
+ * Return: The error code within @ptr.
+ */
 static inline long __must_check PTR_ERR(__force const void *ptr)
 {
        return (long) ptr;
 }
 
+/**
+ * IS_ERR - Detect an error pointer.
+ * @ptr: The pointer to check.
+ * Return: true if @ptr is an error pointer, false otherwise.
+ */
 static inline bool __must_check IS_ERR(__force const void *ptr)
 {
        return IS_ERR_VALUE((unsigned long)ptr);
 }
 
+/**
+ * IS_ERR_OR_NULL - Detect an error pointer or a null pointer.
+ * @ptr: The pointer to check.
+ *
+ * Like IS_ERR(), but also returns true for a null pointer.
+ */
 static inline bool __must_check IS_ERR_OR_NULL(__force const void *ptr)
 {
        return unlikely(!ptr) || IS_ERR_VALUE((unsigned long)ptr);
@@ -54,6 +85,23 @@ static inline void * __must_check ERR_CAST(__force const void *ptr)
        return (void *) ptr;
 }
 
+/**
+ * PTR_ERR_OR_ZERO - Extract the error code from a pointer if it has one.
+ * @ptr: A potential error pointer.
+ *
+ * Convenience function that can be used inside a function that returns
+ * an error code to propagate errors received as error pointers.
+ * For example, ``return PTR_ERR_OR_ZERO(ptr);`` replaces:
+ *
+ * .. code-block:: c
+ *
+ *     if (IS_ERR(ptr))
+ *             return PTR_ERR(ptr);
+ *     else
+ *             return 0;
+ *
+ * Return: The error code within @ptr if it is an error pointer; 0 otherwise.
+ */
 static inline int __must_check PTR_ERR_OR_ZERO(__force const void *ptr)
 {
        if (IS_ERR(ptr))
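
The newly documented helpers encode a small negative errno in the pointer value itself, which is the standard way for kernel lookup and creation functions to return either an object or an error without an extra out-parameter. A minimal hedged sketch of the round trip (hypo_lookup() and hypo_find() are hypothetical):

static struct hypo_obj *hypo_lookup(int id)
{
        struct hypo_obj *obj;

        if (id < 0)
                return ERR_PTR(-EINVAL);        /* encode the errno */

        obj = hypo_find(id);                    /* hypothetical helper */
        if (!obj)
                return ERR_PTR(-ENOENT);
        return obj;
}

/* Callers decode with IS_ERR()/PTR_ERR(), or use PTR_ERR_OR_ZERO() when they
 * only need to propagate an error code:
 *
 *      return PTR_ERR_OR_ZERO(hypo_lookup(id));
 */
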
index 36a4865..b9d8365 100644 (file)
@@ -9,12 +9,12 @@
 #ifndef _LINUX_EVENTFD_H
 #define _LINUX_EVENTFD_H
 
-#include <linux/fcntl.h>
 #include <linux/wait.h>
 #include <linux/err.h>
 #include <linux/percpu-defs.h>
 #include <linux/percpu.h>
 #include <linux/sched.h>
+#include <uapi/linux/eventfd.h>
 
 /*
  * CAREFUL: Check include/uapi/asm-generic/fcntl.h when defining
  * from eventfd, in order to leave a free define-space for
  * shared O_* flags.
  */
-#define EFD_SEMAPHORE (1 << 0)
-#define EFD_CLOEXEC O_CLOEXEC
-#define EFD_NONBLOCK O_NONBLOCK
-
 #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
 #define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE)
 
@@ -40,7 +36,7 @@ struct file *eventfd_fget(int fd);
 struct eventfd_ctx *eventfd_ctx_fdget(int fd);
 struct eventfd_ctx *eventfd_ctx_fileget(struct file *file);
 __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n);
-__u64 eventfd_signal_mask(struct eventfd_ctx *ctx, __u64 n, unsigned mask);
+__u64 eventfd_signal_mask(struct eventfd_ctx *ctx, __u64 n, __poll_t mask);
 int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_entry_t *wait,
                                  __u64 *cnt);
 void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt);
index 1716c01..efb6e2c 100644 (file)
@@ -391,7 +391,7 @@ struct fw_iso_packet {
        u32 tag:2;              /* tx: Tag in packet header             */
        u32 sy:4;               /* tx: Sy in packet header              */
        u32 header_length:8;    /* Length of immediate header           */
-       u32 header[0];          /* tx: Top of 1394 isoch. data_block    */
+       u32 header[];           /* tx: Top of 1394 isoch. data_block    */
 };
 
 #define FW_ISO_CONTEXT_TRANSMIT                        0
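
Converting the one-element header[0] into a C99 flexible array member lets allocation sizes be computed with struct_size() and keeps array-bounds checking tools honest. A hedged allocation sketch (the helper name is hypothetical, and header_length is assumed to be in bytes):

static struct fw_iso_packet *hypo_alloc_packet(unsigned int quadlets)
{
        struct fw_iso_packet *p;

        p = kzalloc(struct_size(p, header, quadlets), GFP_KERNEL);
        if (!p)
                return NULL;
        p->header_length = quadlets * sizeof(p->header[0]);
        return p;
}
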
index 21a9816..ed5b32d 100644 (file)
@@ -119,13 +119,6 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 #define FMODE_PWRITE           ((__force fmode_t)0x10)
 /* File is opened for execution with sys_execve / sys_uselib */
 #define FMODE_EXEC             ((__force fmode_t)0x20)
-/* File is opened with O_NDELAY (only set for block devices) */
-#define FMODE_NDELAY           ((__force fmode_t)0x40)
-/* File is opened with O_EXCL (only set for block devices) */
-#define FMODE_EXCL             ((__force fmode_t)0x80)
-/* File is opened using open(.., 3, ..) and is writeable only for ioctls
-   (specialy hack for floppy.c) */
-#define FMODE_WRITE_IOCTL      ((__force fmode_t)0x100)
 /* 32bit hashes as llseek() offset (for directories) */
 #define FMODE_32BITHASH         ((__force fmode_t)0x200)
 /* 64bit hashes as llseek() offset (for directories) */
@@ -171,6 +164,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 /* File supports non-exclusive O_DIRECT writes from multiple threads */
 #define FMODE_DIO_PARALLEL_WRITE       ((__force fmode_t)0x1000000)
 
+/* File is embedded in backing_file object */
+#define FMODE_BACKING          ((__force fmode_t)0x2000000)
+
 /* File was opened by fanotify and shouldn't generate fanotify events */
 #define FMODE_NONOTIFY         ((__force fmode_t)0x4000000)
 
@@ -956,29 +952,35 @@ static inline int ra_has_index(struct file_ra_state *ra, pgoff_t index)
                index <  ra->start + ra->size);
 }
 
+/*
+ * f_{lock,count,pos_lock} members can be highly contended and share
+ * the same cacheline. f_{lock,mode} are very frequently used together
+ * and so share the same cacheline as well. The read-mostly
+ * f_{path,inode,op} are kept on a separate cacheline.
+ */
 struct file {
        union {
                struct llist_node       f_llist;
                struct rcu_head         f_rcuhead;
                unsigned int            f_iocb_flags;
        };
-       struct path             f_path;
-       struct inode            *f_inode;       /* cached value */
-       const struct file_operations    *f_op;
 
        /*
         * Protects f_ep, f_flags.
         * Must not be taken from IRQ context.
         */
        spinlock_t              f_lock;
-       atomic_long_t           f_count;
-       unsigned int            f_flags;
        fmode_t                 f_mode;
+       atomic_long_t           f_count;
        struct mutex            f_pos_lock;
        loff_t                  f_pos;
+       unsigned int            f_flags;
        struct fown_struct      f_owner;
        const struct cred       *f_cred;
        struct file_ra_state    f_ra;
+       struct path             f_path;
+       struct inode            *f_inode;       /* cached value */
+       const struct file_operations    *f_op;
 
        u64                     f_version;
 #ifdef CONFIG_SECURITY
@@ -1076,29 +1078,29 @@ extern int send_sigurg(struct fown_struct *fown);
  * sb->s_flags.  Note that these mirror the equivalent MS_* flags where
  * represented in both.
  */
-#define SB_RDONLY       1      /* Mount read-only */
-#define SB_NOSUID       2      /* Ignore suid and sgid bits */
-#define SB_NODEV        4      /* Disallow access to device special files */
-#define SB_NOEXEC       8      /* Disallow program execution */
-#define SB_SYNCHRONOUS 16      /* Writes are synced at once */
-#define SB_MANDLOCK    64      /* Allow mandatory locks on an FS */
-#define SB_DIRSYNC     128     /* Directory modifications are synchronous */
-#define SB_NOATIME     1024    /* Do not update access times. */
-#define SB_NODIRATIME  2048    /* Do not update directory access times */
-#define SB_SILENT      32768
-#define SB_POSIXACL    (1<<16) /* VFS does not apply the umask */
-#define SB_INLINECRYPT (1<<17) /* Use blk-crypto for encrypted files */
-#define SB_KERNMOUNT   (1<<22) /* this is a kern_mount call */
-#define SB_I_VERSION   (1<<23) /* Update inode I_version field */
-#define SB_LAZYTIME    (1<<25) /* Update the on-disk [acm]times lazily */
+#define SB_RDONLY       BIT(0) /* Mount read-only */
+#define SB_NOSUID       BIT(1) /* Ignore suid and sgid bits */
+#define SB_NODEV        BIT(2) /* Disallow access to device special files */
+#define SB_NOEXEC       BIT(3) /* Disallow program execution */
+#define SB_SYNCHRONOUS  BIT(4) /* Writes are synced at once */
+#define SB_MANDLOCK     BIT(6) /* Allow mandatory locks on an FS */
+#define SB_DIRSYNC      BIT(7) /* Directory modifications are synchronous */
+#define SB_NOATIME      BIT(10)        /* Do not update access times. */
+#define SB_NODIRATIME   BIT(11)        /* Do not update directory access times */
+#define SB_SILENT       BIT(15)
+#define SB_POSIXACL     BIT(16)        /* VFS does not apply the umask */
+#define SB_INLINECRYPT  BIT(17)        /* Use blk-crypto for encrypted files */
+#define SB_KERNMOUNT    BIT(22)        /* this is a kern_mount call */
+#define SB_I_VERSION    BIT(23)        /* Update inode I_version field */
+#define SB_LAZYTIME     BIT(25)        /* Update the on-disk [acm]times lazily */
 
 /* These sb flags are internal to the kernel */
-#define SB_SUBMOUNT     (1<<26)
-#define SB_FORCE       (1<<27)
-#define SB_NOSEC       (1<<28)
-#define SB_BORN                (1<<29)
-#define SB_ACTIVE      (1<<30)
-#define SB_NOUSER      (1<<31)
+#define SB_SUBMOUNT     BIT(26)
+#define SB_FORCE        BIT(27)
+#define SB_NOSEC        BIT(28)
+#define SB_BORN         BIT(29)
+#define SB_ACTIVE       BIT(30)
+#define SB_NOUSER       BIT(31)
 
 /* These flags relate to encoding and casefolding */
 #define SB_ENC_STRICT_MODE_FL  (1 << 0)
@@ -1215,7 +1217,6 @@ struct super_block {
        uuid_t                  s_uuid;         /* UUID */
 
        unsigned int            s_max_links;
-       fmode_t                 s_mode;
 
        /*
         * The next field is for VFS *only*. No filesystems have any business
@@ -1242,7 +1243,7 @@ struct super_block {
         */
        atomic_long_t s_fsnotify_connectors;
 
-       /* Being remounted read-only */
+       /* Read-only state of the superblock is being changed */
        int s_readonly_remount;
 
        /* per-sb errseq_t for reporting writeback errors via syncfs */
@@ -1672,9 +1673,12 @@ static inline int vfs_whiteout(struct mnt_idmap *idmap,
                         WHITEOUT_DEV);
 }
 
-struct file *vfs_tmpfile_open(struct mnt_idmap *idmap,
-                       const struct path *parentpath,
-                       umode_t mode, int open_flag, const struct cred *cred);
+struct file *kernel_tmpfile_open(struct mnt_idmap *idmap,
+                                const struct path *parentpath,
+                                umode_t mode, int open_flag,
+                                const struct cred *cred);
+struct file *kernel_file_open(const struct path *path, int flags,
+                             struct inode *inode, const struct cred *cred);
 
 int vfs_mkobj(struct dentry *, umode_t,
                int (*f)(struct dentry *, umode_t, void *),
@@ -1932,6 +1936,7 @@ struct super_operations {
                                  struct shrink_control *);
        long (*free_cached_objects)(struct super_block *,
                                    struct shrink_control *);
+       void (*shutdown)(struct super_block *sb);
 };
 
 /*
@@ -2349,11 +2354,31 @@ static inline struct file *file_open_root_mnt(struct vfsmount *mnt,
        return file_open_root(&(struct path){.mnt = mnt, .dentry = mnt->mnt_root},
                              name, flags, mode);
 }
-extern struct file * dentry_open(const struct path *, int, const struct cred *);
-extern struct file *dentry_create(const struct path *path, int flags,
-                                 umode_t mode, const struct cred *cred);
-extern struct file * open_with_fake_path(const struct path *, int,
-                                        struct inode*, const struct cred *);
+struct file *dentry_open(const struct path *path, int flags,
+                        const struct cred *creds);
+struct file *dentry_create(const struct path *path, int flags, umode_t mode,
+                          const struct cred *cred);
+struct file *backing_file_open(const struct path *path, int flags,
+                              const struct path *real_path,
+                              const struct cred *cred);
+struct path *backing_file_real_path(struct file *f);
+
+/*
+ * file_real_path - get the path corresponding to f_inode
+ *
+ * When opening a backing file for a stackable filesystem (e.g.,
+ * overlayfs) f_path may be on the stackable filesystem and f_inode on
+ * the underlying filesystem.  When the path associated with f_inode is
+ * needed, this helper should be used instead of accessing f_path
+ * directly.
+*/
+static inline const struct path *file_real_path(struct file *f)
+{
+       if (unlikely(f->f_mode & FMODE_BACKING))
+               return backing_file_real_path(f);
+       return &f->f_path;
+}
+
 static inline struct file *file_clone_open(struct file *file)
 {
        return dentry_open(&file->f_path, file->f_flags, file->f_cred);
@@ -2669,7 +2694,7 @@ extern void evict_inodes(struct super_block *sb);
 void dump_mapping(const struct address_space *);
 
 /*
- * Userspace may rely on the the inode number being non-zero. For example, glibc
+ * Userspace may rely on the inode number being non-zero. For example, glibc
  * simply ignores files with zero i_ino in unlink() and other places.
  *
  * As an additional complication, if userspace was compiled with
@@ -2752,11 +2777,9 @@ ssize_t vfs_iocb_iter_write(struct file *file, struct kiocb *iocb,
 ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
                            struct pipe_inode_info *pipe,
                            size_t len, unsigned int flags);
-ssize_t direct_splice_read(struct file *in, loff_t *ppos,
-                          struct pipe_inode_info *pipe,
-                          size_t len, unsigned int flags);
-extern ssize_t generic_file_splice_read(struct file *, loff_t *,
-               struct pipe_inode_info *, size_t, unsigned int);
+ssize_t copy_splice_read(struct file *in, loff_t *ppos,
+                        struct pipe_inode_info *pipe,
+                        size_t len, unsigned int flags);
 extern ssize_t iter_file_splice_write(struct pipe_inode_info *,
                struct file *, loff_t *, size_t, unsigned int);
 extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe,
index bb8467c..ed48e4f 100644 (file)
@@ -91,11 +91,13 @@ static inline void fsnotify_dentry(struct dentry *dentry, __u32 mask)
 
 static inline int fsnotify_file(struct file *file, __u32 mask)
 {
-       const struct path *path = &file->f_path;
+       const struct path *path;
 
        if (file->f_mode & FMODE_NONOTIFY)
                return 0;
 
+       /* Overlayfs internal files have fake f_path */
+       path = file_real_path(file);
        return fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH);
 }
 
index e76605d..1eb7eae 100644 (file)
@@ -143,8 +143,8 @@ int fsverity_ioctl_enable(struct file *filp, const void __user *arg);
 
 int fsverity_ioctl_measure(struct file *filp, void __user *arg);
 int fsverity_get_digest(struct inode *inode,
-                       u8 digest[FS_VERITY_MAX_DIGEST_SIZE],
-                       enum hash_algo *alg);
+                       u8 raw_digest[FS_VERITY_MAX_DIGEST_SIZE],
+                       u8 *alg, enum hash_algo *halg);
 
 /* open.c */
 
@@ -197,10 +197,14 @@ static inline int fsverity_ioctl_measure(struct file *filp, void __user *arg)
 }
 
 static inline int fsverity_get_digest(struct inode *inode,
-                                     u8 digest[FS_VERITY_MAX_DIGEST_SIZE],
-                                     enum hash_algo *alg)
+                                     u8 raw_digest[FS_VERITY_MAX_DIGEST_SIZE],
+                                     u8 *alg, enum hash_algo *halg)
 {
-       return -EOPNOTSUPP;
+       /*
+        * fsverity is not enabled in the kernel configuration, so always report
+        * that the file doesn't have fsverity enabled (digest size 0).
+        */
+       return 0;
 }
 
 /* open.c */
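A hedged sketch of a caller under the new signature, assuming the return value is the digest size (0 when the file has no verity digest, including when fsverity is compiled out) and that either output pointer may be left NULL:

    u8 raw_digest[FS_VERITY_MAX_DIGEST_SIZE];
    enum hash_algo halg;
    int size;

    size = fsverity_get_digest(inode, raw_digest, NULL, &halg);
    if (size == 0) {
            /* not a verity file, or fsverity not built in */
    } else {
            /* raw_digest[0..size-1] is valid; halg names the hash algorithm */
    }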
index 5c6db55..67b8774 100644 (file)
@@ -252,6 +252,14 @@ struct gpio_irq_chip {
        bool initialized;
 
        /**
+        * @domain_is_allocated_externally:
+        *
+        * True if the irq_domain was allocated outside of gpiolib, in which
+        * case gpiolib won't free the irq_domain itself.
+        */
+       bool domain_is_allocated_externally;
+
+       /**
         * @init_hw: optional routine to initialize hardware before
         * an IRQ chip will be added. This is quite useful when
         * a particular driver wants to clear IRQ related registers
index 4de1dbc..68da306 100644 (file)
@@ -507,7 +507,7 @@ static inline void folio_zero_range(struct folio *folio,
        zero_user_segments(&folio->page, start, start + length, 0, 0);
 }
 
-static inline void put_and_unmap_page(struct page *page, void *addr)
+static inline void unmap_and_put_page(struct page *page, void *addr)
 {
        kunmap_local(addr);
        put_page(page);
index fc985e5..8de6b6e 100644 (file)
@@ -208,6 +208,7 @@ struct team {
        bool queue_override_enabled;
        struct list_head *qom_lists; /* array of queue override mapping lists */
        bool port_mtu_change_allowed;
+       bool notifier_ctx;
        struct {
                unsigned int count;
                unsigned int interval; /* in ms */
index 0f40f37..6ba7195 100644 (file)
@@ -637,6 +637,23 @@ static inline __be16 vlan_get_protocol(const struct sk_buff *skb)
        return __vlan_get_protocol(skb, skb->protocol, NULL);
 }
 
+/* This version of __vlan_get_protocol() also pulls mac header in skb->head */
+static inline __be16 vlan_get_protocol_and_depth(struct sk_buff *skb,
+                                                __be16 type, int *depth)
+{
+       int maclen;
+
+       type = __vlan_get_protocol(skb, type, &maclen);
+
+       if (type) {
+               if (!pskb_may_pull(skb, maclen))
+                       type = 0;
+               else if (depth)
+                       *depth = maclen;
+       }
+       return type;
+}
+
 /* A getter for the SKB protocol field which will handle VLAN tags consistently
  * whether VLAN acceleration is enabled or not.
  */
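A short usage sketch for the new helper, assuming a receive path that needs both the encapsulated protocol and the MAC header length (variable names are illustrative):

    int depth;
    __be16 proto = vlan_get_protocol_and_depth(skb, skb->protocol, &depth);

    if (!proto)
            return;         /* MAC header could not be pulled into skb->head */
    /* depth now holds the MAC header length, including any VLAN tags */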
index dd64e54..9cb6c80 100644 (file)
@@ -135,7 +135,7 @@ static inline int iio_gts_find_int_time_by_sel(struct iio_gts *gts, int sel)
 /**
  * iio_gts_find_sel_by_int_time - find selector matching integration time
  * @gts:       Gain time scale descriptor
- * @gain:      HW-gain for which matching selector is searched for
+ * @time:      Integration time for which matching selector is searched for
  *
  * Return:     a selector matching given integration time or -EINVAL if
  *             selector was not found.
index 9f4b6f5..e6936cb 100644 (file)
 #include <linux/powercap.h>
 #include <linux/cpuhotplug.h>
 
+enum rapl_if_type {
+       RAPL_IF_MSR,    /* RAPL I/F using MSR registers */
+       RAPL_IF_MMIO,   /* RAPL I/F using MMIO registers */
+       RAPL_IF_TPMI,   /* RAPL I/F using TPMI registers */
+};
+
 enum rapl_domain_type {
        RAPL_DOMAIN_PACKAGE,    /* entire package/socket */
        RAPL_DOMAIN_PP0,        /* core power plane */
@@ -30,17 +36,23 @@ enum rapl_domain_reg_id {
        RAPL_DOMAIN_REG_POLICY,
        RAPL_DOMAIN_REG_INFO,
        RAPL_DOMAIN_REG_PL4,
+       RAPL_DOMAIN_REG_UNIT,
+       RAPL_DOMAIN_REG_PL2,
        RAPL_DOMAIN_REG_MAX,
 };
 
 struct rapl_domain;
 
 enum rapl_primitives {
-       ENERGY_COUNTER,
        POWER_LIMIT1,
        POWER_LIMIT2,
        POWER_LIMIT4,
+       ENERGY_COUNTER,
        FW_LOCK,
+       FW_HIGH_LOCK,
+       PL1_LOCK,
+       PL2_LOCK,
+       PL4_LOCK,
 
        PL1_ENABLE,             /* power limit 1, aka long term */
        PL1_CLAMP,              /* allow frequency to go below OS request */
@@ -74,12 +86,13 @@ struct rapl_domain_data {
        unsigned long timestamp;
 };
 
-#define NR_POWER_LIMITS (3)
+#define NR_POWER_LIMITS        (POWER_LIMIT4 + 1)
+
 struct rapl_power_limit {
        struct powercap_zone_constraint *constraint;
-       int prim_id;            /* primitive ID used to enable */
        struct rapl_domain *domain;
        const char *name;
+       bool locked;
        u64 last_power_limit;
 };
 
@@ -96,7 +109,9 @@ struct rapl_domain {
        struct rapl_power_limit rpl[NR_POWER_LIMITS];
        u64 attr_map;           /* track capabilities */
        unsigned int state;
-       unsigned int domain_energy_unit;
+       unsigned int power_unit;
+       unsigned int energy_unit;
+       unsigned int time_unit;
        struct rapl_package *rp;
 };
 
@@ -121,16 +136,20 @@ struct reg_action {
  *                             registers.
  * @write_raw:                 Callback for writing RAPL interface specific
  *                             registers.
+ * @defaults:                  internal pointer to interface default settings
+ * @rpi:                       internal pointer to interface primitive info
  */
 struct rapl_if_priv {
+       enum rapl_if_type type;
        struct powercap_control_type *control_type;
-       struct rapl_domain *platform_rapl_domain;
        enum cpuhp_state pcap_rapl_online;
        u64 reg_unit;
        u64 regs[RAPL_DOMAIN_MAX][RAPL_DOMAIN_REG_MAX];
        int limits[RAPL_DOMAIN_MAX];
-       int (*read_raw)(int cpu, struct reg_action *ra);
-       int (*write_raw)(int cpu, struct reg_action *ra);
+       int (*read_raw)(int id, struct reg_action *ra);
+       int (*write_raw)(int id, struct reg_action *ra);
+       void *defaults;
+       void *rpi;
 };
 
 /* maximum rapl package domain name: package-%d-die-%d */
@@ -140,9 +159,6 @@ struct rapl_package {
        unsigned int id;        /* logical die id, equals physical 1-die systems */
        unsigned int nr_domains;
        unsigned long domain_map;       /* bit map of active domains */
-       unsigned int power_unit;
-       unsigned int energy_unit;
-       unsigned int time_unit;
        struct rapl_domain *domains;    /* array of domains, sized at runtime */
        struct powercap_zone *power_zone;       /* keep track of parent zone */
        unsigned long power_limit_irq;  /* keep track of package power limit
@@ -156,8 +172,8 @@ struct rapl_package {
        struct rapl_if_priv *priv;
 };
 
-struct rapl_package *rapl_find_package_domain(int cpu, struct rapl_if_priv *priv);
-struct rapl_package *rapl_add_package(int cpu, struct rapl_if_priv *priv);
+struct rapl_package *rapl_find_package_domain(int id, struct rapl_if_priv *priv, bool id_is_cpu);
+struct rapl_package *rapl_add_package(int id, struct rapl_if_priv *priv, bool id_is_cpu);
 void rapl_remove_package(struct rapl_package *rp);
 
 #endif /* __INTEL_RAPL_H__ */
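Under the reworked interface, CPU/MSR based drivers presumably keep passing their CPU number but now set id_is_cpu; a hedged sketch of a probe path (the priv pointer and error handling are illustrative):

    struct rapl_package *rp;

    rp = rapl_find_package_domain(cpu, priv, true);
    if (!rp) {
            rp = rapl_add_package(cpu, priv, true);
            if (IS_ERR(rp))
                    return PTR_ERR(rp);
    }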
index 308f4f0..7304f2a 100644 (file)
@@ -68,6 +68,11 @@ void *devm_memremap(struct device *dev, resource_size_t offset,
                size_t size, unsigned long flags);
 void devm_memunmap(struct device *dev, void *addr);
 
+/* architectures can override this */
+pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
+                                       unsigned long size, pgprot_t prot);
+
+
 #ifdef CONFIG_PCI
 /*
  * The PCI specifications (Rev 3.0, 3.2.5 "Transaction Ordering and
index 3399d97..bb9c666 100644 (file)
@@ -36,18 +36,33 @@ struct io_uring_cmd {
        u8              pdu[32]; /* available inline for free use */
 };
 
+static inline const void *io_uring_sqe_cmd(const struct io_uring_sqe *sqe)
+{
+       return sqe->cmd;
+}
+
 #if defined(CONFIG_IO_URING)
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
                              struct iov_iter *iter, void *ioucmd);
 void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2,
                        unsigned issue_flags);
-void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
-                       void (*task_work_cb)(struct io_uring_cmd *, unsigned));
 struct sock *io_uring_get_socket(struct file *file);
 void __io_uring_cancel(bool cancel_all);
 void __io_uring_free(struct task_struct *tsk);
 void io_uring_unreg_ringfd(void);
 const char *io_uring_get_opcode(u8 opcode);
+void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
+                           void (*task_work_cb)(struct io_uring_cmd *, unsigned),
+                           unsigned flags);
+/* users should follow semantics of IOU_F_TWQ_LAZY_WAKE */
+void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
+                       void (*task_work_cb)(struct io_uring_cmd *, unsigned));
+
+static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
+                       void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+{
+       __io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
+}
 
 static inline void io_uring_files_cancel(void)
 {
@@ -66,11 +81,6 @@ static inline void io_uring_free(struct task_struct *tsk)
        if (tsk->io_uring)
                __io_uring_free(tsk);
 }
-
-static inline const void *io_uring_sqe_cmd(const struct io_uring_sqe *sqe)
-{
-       return sqe->cmd;
-}
 #else
 static inline int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
                              struct iov_iter *iter, void *ioucmd)
@@ -85,6 +95,10 @@ static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
                        void (*task_work_cb)(struct io_uring_cmd *, unsigned))
 {
 }
+static inline void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
+                       void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+{
+}
 static inline struct sock *io_uring_get_socket(struct file *file)
 {
        return NULL;
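A hedged sketch of a driver deferring its ->uring_cmd() completion to task context with the new lazy helper (the callback name is illustrative):

    static void my_uring_cmd_done(struct io_uring_cmd *cmd, unsigned int issue_flags)
    {
            io_uring_cmd_done(cmd, 0, 0, issue_flags);
    }

    /* wakeups are batched per the IOU_F_TWQ_LAZY_WAKE semantics */
    io_uring_cmd_do_in_task_lazy(cmd, my_uring_cmd_done);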
index 1b2a20a..f04ce51 100644 (file)
@@ -211,6 +211,16 @@ struct io_ring_ctx {
                unsigned int            compat: 1;
 
                enum task_work_notify_mode      notify_method;
+
+               /*
+                * If IORING_SETUP_NO_MMAP is used, then the below holds
+                * the gup'ed pages for the two rings, and the sqes.
+                */
+               unsigned short          n_ring_pages;
+               unsigned short          n_sqe_pages;
+               struct page             **ring_pages;
+               struct page             **sqe_pages;
+
                struct io_rings                 *rings;
                struct task_struct              *submitter_task;
                struct percpu_ref               refs;
index b1b28af..d8a6fdc 100644 (file)
@@ -223,32 +223,35 @@ struct irq_data {
  *                               irq_chip::irq_set_affinity() when deactivated.
  * IRQD_IRQ_ENABLED_ON_SUSPEND - Interrupt is enabled on suspend by irq pm if
  *                               irqchip have flag IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND set.
+ * IRQD_RESEND_WHEN_IN_PROGRESS        - Interrupt may fire when already in progress in which
+ *                               case it must be resent at the next available opportunity.
  */
 enum {
        IRQD_TRIGGER_MASK               = 0xf,
-       IRQD_SETAFFINITY_PENDING        = (1 <<  8),
-       IRQD_ACTIVATED                  = (1 <<  9),
-       IRQD_NO_BALANCING               = (1 << 10),
-       IRQD_PER_CPU                    = (1 << 11),
-       IRQD_AFFINITY_SET               = (1 << 12),
-       IRQD_LEVEL                      = (1 << 13),
-       IRQD_WAKEUP_STATE               = (1 << 14),
-       IRQD_MOVE_PCNTXT                = (1 << 15),
-       IRQD_IRQ_DISABLED               = (1 << 16),
-       IRQD_IRQ_MASKED                 = (1 << 17),
-       IRQD_IRQ_INPROGRESS             = (1 << 18),
-       IRQD_WAKEUP_ARMED               = (1 << 19),
-       IRQD_FORWARDED_TO_VCPU          = (1 << 20),
-       IRQD_AFFINITY_MANAGED           = (1 << 21),
-       IRQD_IRQ_STARTED                = (1 << 22),
-       IRQD_MANAGED_SHUTDOWN           = (1 << 23),
-       IRQD_SINGLE_TARGET              = (1 << 24),
-       IRQD_DEFAULT_TRIGGER_SET        = (1 << 25),
-       IRQD_CAN_RESERVE                = (1 << 26),
-       IRQD_MSI_NOMASK_QUIRK           = (1 << 27),
-       IRQD_HANDLE_ENFORCE_IRQCTX      = (1 << 28),
-       IRQD_AFFINITY_ON_ACTIVATE       = (1 << 29),
-       IRQD_IRQ_ENABLED_ON_SUSPEND     = (1 << 30),
+       IRQD_SETAFFINITY_PENDING        = BIT(8),
+       IRQD_ACTIVATED                  = BIT(9),
+       IRQD_NO_BALANCING               = BIT(10),
+       IRQD_PER_CPU                    = BIT(11),
+       IRQD_AFFINITY_SET               = BIT(12),
+       IRQD_LEVEL                      = BIT(13),
+       IRQD_WAKEUP_STATE               = BIT(14),
+       IRQD_MOVE_PCNTXT                = BIT(15),
+       IRQD_IRQ_DISABLED               = BIT(16),
+       IRQD_IRQ_MASKED                 = BIT(17),
+       IRQD_IRQ_INPROGRESS             = BIT(18),
+       IRQD_WAKEUP_ARMED               = BIT(19),
+       IRQD_FORWARDED_TO_VCPU          = BIT(20),
+       IRQD_AFFINITY_MANAGED           = BIT(21),
+       IRQD_IRQ_STARTED                = BIT(22),
+       IRQD_MANAGED_SHUTDOWN           = BIT(23),
+       IRQD_SINGLE_TARGET              = BIT(24),
+       IRQD_DEFAULT_TRIGGER_SET        = BIT(25),
+       IRQD_CAN_RESERVE                = BIT(26),
+       IRQD_MSI_NOMASK_QUIRK           = BIT(27),
+       IRQD_HANDLE_ENFORCE_IRQCTX      = BIT(28),
+       IRQD_AFFINITY_ON_ACTIVATE       = BIT(29),
+       IRQD_IRQ_ENABLED_ON_SUSPEND     = BIT(30),
+       IRQD_RESEND_WHEN_IN_PROGRESS    = BIT(31),
 };
 
 #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
@@ -448,6 +451,16 @@ static inline bool irqd_affinity_on_activate(struct irq_data *d)
        return __irqd_to_state(d) & IRQD_AFFINITY_ON_ACTIVATE;
 }
 
+static inline void irqd_set_resend_when_in_progress(struct irq_data *d)
+{
+       __irqd_to_state(d) |= IRQD_RESEND_WHEN_IN_PROGRESS;
+}
+
+static inline bool irqd_needs_resend_when_in_progress(struct irq_data *d)
+{
+       return __irqd_to_state(d) & IRQD_RESEND_WHEN_IN_PROGRESS;
+}
+
 #undef __irqd_to_state
 
 static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
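A hedged sketch of an irqchip driver opting in to the new resend behaviour for interrupts that can re-fire while still marked in progress (names are illustrative):

    static int my_domain_alloc(struct irq_domain *domain, unsigned int virq,
                               unsigned int nr_irqs, void *arg)
    {
            /* ... per-virq chip/handler setup ... */
            irqd_set_resend_when_in_progress(irq_get_irq_data(virq));
            return 0;
    }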
diff --git a/include/linux/irqchip/mmp.h b/include/linux/irqchip/mmp.h
deleted file mode 100644 (file)
index aa18137..0000000
+++ /dev/null
@@ -1,10 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef        __IRQCHIP_MMP_H
-#define        __IRQCHIP_MMP_H
-
-extern struct irq_chip icu_irq_chip;
-
-extern void icu_init_irq(void);
-extern void mmp2_init_icu(void);
-
-#endif /* __IRQCHIP_MMP_H */
diff --git a/include/linux/irqchip/mxs.h b/include/linux/irqchip/mxs.h
deleted file mode 100644 (file)
index 4f447e3..0000000
+++ /dev/null
@@ -1,11 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2013 Freescale Semiconductor, Inc.
- */
-
-#ifndef __LINUX_IRQCHIP_MXS_H
-#define __LINUX_IRQCHIP_MXS_H
-
-extern void icoll_handle_irq(struct pt_regs *);
-
-#endif
index 844a8e3..d9451d4 100644 (file)
@@ -102,6 +102,9 @@ struct irq_desc {
        int                     parent_irq;
        struct module           *owner;
        const char              *name;
+#ifdef CONFIG_HARDIRQS_SW_RESEND
+       struct hlist_node       resend_node;
+#endif
 } ____cacheline_internodealigned_in_smp;
 
 #ifdef CONFIG_SPARSE_IRQ
index 790e7fc..e274274 100644 (file)
  */
 extern phys_addr_t ibft_phys_addr;
 
+#ifdef CONFIG_ISCSI_IBFT_FIND
+
 /*
  * Routine used to find and reserve the iSCSI Boot Format Table. The
  * physical address is set in the ibft_phys_addr variable.
  */
-#ifdef CONFIG_ISCSI_IBFT_FIND
 void reserve_ibft_region(void);
+
+/*
+ * Physical bounds to search for the iSCSI Boot Format Table.
+ */
+#define IBFT_START 0x80000 /* 512kB */
+#define IBFT_END 0x100000 /* 1MB */
+
 #else
 static inline void reserve_ibft_region(void) {}
 #endif
index 4e968eb..f0a949b 100644 (file)
@@ -257,7 +257,7 @@ extern enum jump_label_type jump_label_init_type(struct jump_entry *entry);
 
 static __always_inline int static_key_count(struct static_key *key)
 {
-       return arch_atomic_read(&key->enabled);
+       return raw_atomic_read(&key->enabled);
 }
 
 static __always_inline void jump_label_init(void)
index 30e5bec..f1f95a7 100644 (file)
@@ -89,6 +89,7 @@ int kthread_stop(struct task_struct *k);
 bool kthread_should_stop(void);
 bool kthread_should_park(void);
 bool __kthread_should_park(struct task_struct *k);
+bool kthread_should_stop_or_park(void);
 bool kthread_freezable_should_stop(bool *was_frozen);
 void *kthread_func(struct task_struct *k);
 void *kthread_data(struct task_struct *k);
index 311cd93..dd5797f 100644 (file)
@@ -836,7 +836,7 @@ struct ata_port {
 
        struct mutex            scsi_scan_mutex;
        struct delayed_work     hotplug_task;
-       struct work_struct      scsi_rescan_task;
+       struct delayed_work     scsi_rescan_task;
 
        unsigned int            hsm_task_state;
 
index b32256e..310f859 100644 (file)
@@ -344,6 +344,16 @@ extern void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie);
 #define lockdep_repin_lock(l,c)        lock_repin_lock(&(l)->dep_map, (c))
 #define lockdep_unpin_lock(l,c)        lock_unpin_lock(&(l)->dep_map, (c))
 
+/*
+ * Must use lock_map_acquire_try() with override maps to avoid
+ * lockdep thinking they participate in the block chain.
+ */
+#define DEFINE_WAIT_OVERRIDE_MAP(_name, _wait_type)    \
+       struct lockdep_map _name = {                    \
+               .name = #_name "-wait-type-override",   \
+               .wait_type_inner = _wait_type,          \
+               .lock_type = LD_LOCK_WAIT_OVERRIDE, }
+
 #else /* !CONFIG_LOCKDEP */
 
 static inline void lockdep_init_task(struct task_struct *task)
@@ -432,8 +442,19 @@ extern int lockdep_is_held(const void *);
 #define lockdep_repin_lock(l, c)               do { (void)(l); (void)(c); } while (0)
 #define lockdep_unpin_lock(l, c)               do { (void)(l); (void)(c); } while (0)
 
+#define DEFINE_WAIT_OVERRIDE_MAP(_name, _wait_type)    \
+       struct lockdep_map __maybe_unused _name = {}
+
 #endif /* !LOCKDEP */
 
+#ifdef CONFIG_PROVE_LOCKING
+void lockdep_set_lock_cmp_fn(struct lockdep_map *, lock_cmp_fn, lock_print_fn);
+
+#define lock_set_cmp_fn(lock, ...)     lockdep_set_lock_cmp_fn(&(lock)->dep_map, __VA_ARGS__)
+#else
+#define lock_set_cmp_fn(lock, ...)     do { } while (0)
+#endif
+
 enum xhlock_context_t {
        XHLOCK_HARD,
        XHLOCK_SOFT,
@@ -556,6 +577,7 @@ do {                                                                        \
 #define rwsem_release(l, i)                    lock_release(l, i)
 
 #define lock_map_acquire(l)                    lock_acquire_exclusive(l, 0, 0, NULL, _THIS_IP_)
+#define lock_map_acquire_try(l)                        lock_acquire_exclusive(l, 0, 1, NULL, _THIS_IP_)
 #define lock_map_acquire_read(l)               lock_acquire_shared_recursive(l, 0, 0, NULL, _THIS_IP_)
 #define lock_map_acquire_tryread(l)            lock_acquire_shared_recursive(l, 0, 1, NULL, _THIS_IP_)
 #define lock_map_release(l)                    lock_release(l, _THIS_IP_)
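lock_set_cmp_fn() lets a lock class define a total order between its instances so lockdep can distinguish valid nesting from a real deadlock; a minimal sketch, assuming a lock embedded in some object (ordering rule and names are illustrative):

    static int my_lock_cmp_fn(const struct lockdep_map *a,
                              const struct lockdep_map *b)
    {
            /* any stable total order will do; here: lower address first */
            return a < b ? -1 : (a > b ? 1 : 0);
    }

    static void my_lock_print_fn(const struct lockdep_map *map)
    {
            /* optionally dump extra state when an ordering violation is reported */
    }

    /* at init time, on e.g. a spinlock_t embedded in the object: */
    lock_set_cmp_fn(&obj->lock, my_lock_cmp_fn, my_lock_print_fn);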
index d224308..2ebc323 100644 (file)
@@ -33,6 +33,7 @@ enum lockdep_wait_type {
 enum lockdep_lock_type {
        LD_LOCK_NORMAL = 0,     /* normal, catch all */
        LD_LOCK_PERCPU,         /* percpu */
+       LD_LOCK_WAIT_OVERRIDE,  /* annotation */
        LD_LOCK_MAX,
 };
 
@@ -84,6 +85,11 @@ struct lock_trace;
 
 #define LOCKSTAT_POINTS                4
 
+struct lockdep_map;
+typedef int (*lock_cmp_fn)(const struct lockdep_map *a,
+                          const struct lockdep_map *b);
+typedef void (*lock_print_fn)(const struct lockdep_map *map);
+
 /*
  * The lock-class itself. The order of the structure members matters.
  * reinit_class() zeroes the key member and all subsequent members.
@@ -109,6 +115,9 @@ struct lock_class {
        struct list_head                locks_after, locks_before;
 
        const struct lockdep_subclass_key *key;
+       lock_cmp_fn                     cmp_fn;
+       lock_print_fn                   print_fn;
+
        unsigned int                    subclass;
        unsigned int                    dep_gen_id;
 
index 8b9191a..bf74478 100644 (file)
@@ -168,7 +168,7 @@ static __always_inline u64 mul_u64_u32_shr(u64 a, u32 mul, unsigned int shift)
 #endif /* mul_u64_u32_shr */
 
 #ifndef mul_u64_u64_shr
-static inline u64 mul_u64_u64_shr(u64 a, u64 mul, unsigned int shift)
+static __always_inline u64 mul_u64_u64_shr(u64 a, u64 mul, unsigned int shift)
 {
        return (u64)(((unsigned __int128)a * mul) >> shift);
 }
index a4c4f73..4b9626c 100644 (file)
@@ -1093,6 +1093,7 @@ void mlx5_cmdif_debugfs_cleanup(struct mlx5_core_dev *dev);
 int mlx5_core_create_psv(struct mlx5_core_dev *dev, u32 pdn,
                         int npsvs, u32 *sig_index);
 int mlx5_core_destroy_psv(struct mlx5_core_dev *dev, int psv_num);
+__be32 mlx5_core_get_terminate_scatter_list_mkey(struct mlx5_core_dev *dev);
 void mlx5_core_put_rsc(struct mlx5_core_rsc_common *common);
 int mlx5_query_odp_caps(struct mlx5_core_dev *dev,
                        struct mlx5_odp_caps *odp_caps);
@@ -1237,6 +1238,18 @@ static inline u16 mlx5_core_max_vfs(const struct mlx5_core_dev *dev)
        return dev->priv.sriov.max_vfs;
 }
 
+static inline int mlx5_lag_is_lacp_owner(struct mlx5_core_dev *dev)
+{
+       /* LACP owner conditions:
+        * 1) Function is physical.
+        * 2) LAG is supported by FW.
+        * 3) LAG is managed by driver (currently the only option).
+        */
+       return  MLX5_CAP_GEN(dev, vport_group_manager) &&
+                  (MLX5_CAP_GEN(dev, num_lag_ports) > 1) &&
+                   MLX5_CAP_GEN(dev, lag_master);
+}
+
 static inline int mlx5_get_gid_table_len(u16 param)
 {
        if (param > 4) {
index dc5e2cb..b89778d 100644 (file)
@@ -1705,7 +1705,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
        u8         rc[0x1];
 
        u8         uar_4k[0x1];
-       u8         reserved_at_241[0x9];
+       u8         reserved_at_241[0x7];
+       u8         fl_rc_qp_when_roce_disabled[0x1];
+       u8         regexp_params[0x1];
        u8         uar_sz[0x6];
        u8         port_selection_cap[0x1];
        u8         reserved_at_248[0x1];
index 27ce770..fec1495 100644 (file)
@@ -1910,6 +1910,28 @@ static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
        return page_maybe_dma_pinned(page);
 }
 
+/**
+ * is_zero_page - Query if a page is a zero page
+ * @page: The page to query
+ *
+ * This returns true if @page is one of the permanent zero pages.
+ */
+static inline bool is_zero_page(const struct page *page)
+{
+       return is_zero_pfn(page_to_pfn(page));
+}
+
+/**
+ * is_zero_folio - Query if a folio is a zero page
+ * @folio: The folio to query
+ *
+ * This returns true if @folio is one of the permanent zero pages.
+ */
+static inline bool is_zero_folio(const struct folio *folio)
+{
+       return is_zero_page(&folio->page);
+}
+
 /* MIGRATE_CMA and ZONE_MOVABLE do not allow pin pages */
 #ifdef CONFIG_MIGRATION
 static inline bool is_longterm_pinnable_page(struct page *page)
@@ -1920,8 +1942,8 @@ static inline bool is_longterm_pinnable_page(struct page *page)
        if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
                return false;
 #endif
-       /* The zero page may always be pinned */
-       if (is_zero_pfn(page_to_pfn(page)))
+       /* The zero page can be "pinned" but gets special handling. */
+       if (is_zero_page(page))
                return true;
 
        /* Coherent device memory must always allow eviction. */
@@ -2383,6 +2405,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
                        unsigned int gup_flags, struct page **pages);
 int pin_user_pages_fast(unsigned long start, int nr_pages,
                        unsigned int gup_flags, struct page **pages);
+void folio_add_pin(struct folio *folio);
 
 int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
 int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
@@ -3816,4 +3839,23 @@ madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
 }
 #endif
 
+#ifdef CONFIG_UNACCEPTED_MEMORY
+
+bool range_contains_unaccepted_memory(phys_addr_t start, phys_addr_t end);
+void accept_memory(phys_addr_t start, phys_addr_t end);
+
+#else
+
+static inline bool range_contains_unaccepted_memory(phys_addr_t start,
+                                                   phys_addr_t end)
+{
+       return false;
+}
+
+static inline void accept_memory(phys_addr_t start, phys_addr_t end)
+{
+}
+
+#endif
+
 #endif /* _LINUX_MM_H */
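A hedged sketch of the expected allocator-side pattern for these helpers (the in-tree call sites live in the page allocator and memblock paths):

    /* accept the range before handing it out; compiles away without
     * CONFIG_UNACCEPTED_MEMORY, where the check always returns false */
    if (range_contains_unaccepted_memory(start, end))
            accept_memory(start, end);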
index a4889c9..6c1c2fc 100644 (file)
@@ -143,6 +143,9 @@ enum zone_stat_item {
        NR_ZSPAGES,             /* allocated in zsmalloc */
 #endif
        NR_FREE_CMA_PAGES,
+#ifdef CONFIG_UNACCEPTED_MEMORY
+       NR_UNACCEPTED,
+#endif
        NR_VM_ZONE_STAT_ITEMS };
 
 enum node_stat_item {
@@ -910,6 +913,11 @@ struct zone {
        /* free areas of different sizes */
        struct free_area        free_area[MAX_ORDER + 1];
 
+#ifdef CONFIG_UNACCEPTED_MEMORY
+       /* Pages to be accepted. All pages on the list are MAX_ORDER */
+       struct list_head        unaccepted_pages;
+#endif
+
        /* zone flags, see below */
        unsigned long           flags;
 
index 1ea326c..4b81ea9 100644 (file)
@@ -107,7 +107,6 @@ extern struct vfsmount *vfs_submount(const struct dentry *mountpoint,
 extern void mnt_set_expiry(struct vfsmount *mnt, struct list_head *expiry_list);
 extern void mark_mounts_for_expiry(struct list_head *mounts);
 
-extern dev_t name_to_dev_t(const char *name);
 extern bool path_is_mountpoint(const struct path *path);
 
 extern bool our_mnt(struct vfsmount *mnt);
index cdb14a1..a50ea79 100644 (file)
@@ -383,6 +383,13 @@ int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc);
 void arch_teardown_msi_irq(unsigned int irq);
 int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
 void arch_teardown_msi_irqs(struct pci_dev *dev);
+#endif /* CONFIG_PCI_MSI_ARCH_FALLBACKS */
+
+/*
+ * Xen uses non-default msi_domain_ops and hence needs a way to populate sysfs
+ * entries of MSI IRQs.
+ */
+#if defined(CONFIG_PCI_XEN) || defined(CONFIG_PCI_MSI_ARCH_FALLBACKS)
 #ifdef CONFIG_SYSFS
 int msi_device_populate_sysfs(struct device *dev);
 void msi_device_destroy_sysfs(struct device *dev);
@@ -390,7 +397,7 @@ void msi_device_destroy_sysfs(struct device *dev);
 static inline int msi_device_populate_sysfs(struct device *dev) { return 0; }
 static inline void msi_device_destroy_sysfs(struct device *dev) { }
 #endif /* !CONFIG_SYSFS */
-#endif /* CONFIG_PCI_MSI_ARCH_FALLBACKS */
+#endif /* CONFIG_PCI_XEN || CONFIG_PCI_MSI_ARCH_FALLBACKS */
 
 /*
  * The restore hook is still available even for fully irq domain based
index 15cc9b9..6e47143 100644 (file)
@@ -34,7 +34,7 @@ struct mtd_blktrans_dev {
        struct blk_mq_tag_set *tag_set;
        spinlock_t queue_lock;
        void *priv;
-       fmode_t file_mode;
+       bool writable;
 };
 
 struct mtd_blktrans_ops {
index 08fbd46..c2f0c60 100644 (file)
@@ -620,7 +620,7 @@ struct netdev_queue {
        netdevice_tracker       dev_tracker;
 
        struct Qdisc __rcu      *qdisc;
-       struct Qdisc            *qdisc_sleeping;
+       struct Qdisc __rcu      *qdisc_sleeping;
 #ifdef CONFIG_SYSFS
        struct kobject          kobj;
 #endif
@@ -768,8 +768,11 @@ static inline void rps_record_sock_flow(struct rps_sock_flow_table *table,
                /* We only give a hint, preemption can change CPU under us */
                val |= raw_smp_processor_id();
 
-               if (table->ents[index] != val)
-                       table->ents[index] = val;
+               /* The following WRITE_ONCE() is paired with the READ_ONCE()
+                * here, and another one in get_rps_cpu().
+                */
+               if (READ_ONCE(table->ents[index]) != val)
+                       WRITE_ONCE(table->ents[index], val);
        }
 }
 
index 2aba751..8654470 100644 (file)
@@ -106,12 +106,22 @@ extern void srcu_init_notifier_head(struct srcu_notifier_head *nh);
 #define RAW_NOTIFIER_INIT(name)        {                               \
                .head = NULL }
 
+#ifdef CONFIG_TREE_SRCU
 #define SRCU_NOTIFIER_INIT(name, pcpu)                         \
        {                                                       \
                .mutex = __MUTEX_INITIALIZER(name.mutex),       \
                .head = NULL,                                   \
+               .srcuu = __SRCU_USAGE_INIT(name.srcuu),         \
                .srcu = __SRCU_STRUCT_INIT(name.srcu, name.srcuu, pcpu), \
        }
+#else
+#define SRCU_NOTIFIER_INIT(name, pcpu)                         \
+       {                                                       \
+               .mutex = __MUTEX_INITIALIZER(name.mutex),       \
+               .head = NULL,                                   \
+               .srcu = __SRCU_STRUCT_INIT(name.srcu, name.srcuu, pcpu), \
+       }
+#endif
 
 #define ATOMIC_NOTIFIER_HEAD(name)                             \
        struct atomic_notifier_head name =                      \
index 392fc6c..bdcd85e 100644 (file)
@@ -93,6 +93,7 @@ extern struct bus_type nubus_bus_type;
 
 /* Generic NuBus interface functions, modelled after the PCI interface */
 #ifdef CONFIG_PROC_FS
+extern bool nubus_populate_procfs;
 void nubus_proc_init(void);
 struct proc_dir_entry *nubus_proc_add_board(struct nubus_board *board);
 struct proc_dir_entry *nubus_proc_add_rsrc_dir(struct proc_dir_entry *procdir,
index fa092b9..4109f1b 100644 (file)
@@ -185,7 +185,6 @@ enum nvmefc_fcp_datadir {
  * @first_sgl: memory for 1st scatter/gather list segment for payload data
  * @sg_cnt:    number of elements in the scatter/gather list
  * @io_dir:    direction of the FCP request (see NVMEFC_FCP_xxx)
- * @sqid:      The nvme SQID the command is being issued on
  * @done:      The callback routine the LLDD is to invoke upon completion of
  *             the FCP operation. req argument is the pointer to the original
  *             FCP IO operation.
@@ -194,12 +193,13 @@ enum nvmefc_fcp_datadir {
  *             while processing the operation. The length of the buffer
  *             corresponds to the fcprqst_priv_sz value specified in the
  *             nvme_fc_port_template supplied by the LLDD.
+ * @sqid:      The nvme SQID the command is being issued on
  *
  * Values set by the LLDD indicating completion status of the FCP operation.
  * Must be set prior to calling the done() callback.
+ * @rcv_rsplen: length, in bytes, of the FCP RSP IU received.
  * @transferred_length: amount of payload data, in bytes, that were
  *             transferred. Should equal payload_length on success.
- * @rcv_rsplen: length, in bytes, of the FCP RSP IU received.
  * @status:    Completion status of the FCP operation. must be 0 upon success,
  *             negative errno value upon failure (ex: -EIO). Note: this is
  *             NOT a reflection of the NVME CQE completion status. Only the
@@ -219,14 +219,14 @@ struct nvmefc_fcp_req {
        int                     sg_cnt;
        enum nvmefc_fcp_datadir io_dir;
 
-       __le16                  sqid;
-
        void (*done)(struct nvmefc_fcp_req *req);
 
        void                    *private;
 
-       u32                     transferred_length;
+       __le16                  sqid;
+
        u16                     rcv_rsplen;
+       u32                     transferred_length;
        u32                     status;
 } __aligned(sizeof(u64));      /* alignment for other things alloc'd with */
 
index c460236..3c2891d 100644 (file)
@@ -56,6 +56,8 @@ extern int olpc_ec_sci_query(u16 *sci_value);
 
 extern bool olpc_ec_wakeup_available(void);
 
+asmlinkage int xo1_do_sleep(u8 sleep_state);
+
 #else
 
 static inline int olpc_ec_cmd(u8 cmd, u8 *inbuf, size_t inlen, u8 *outbuf,
index 1c68d67..92a2063 100644 (file)
@@ -617,6 +617,12 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
  * Please note that, confusingly, "page_mapping" refers to the inode
  * address_space which maps the page from disk; whereas "page_mapped"
  * refers to user virtual address space into which the page is mapped.
+ *
+ * For slab pages, since slab reuses the bits in struct page to store its
+ * internal states, the page->mapping does not exist as such, nor do these
+ * flags below.  So in order to avoid testing non-existent bits, please
+ * make sure that PageSlab(page) actually evaluates to false before calling
+ * the following functions (e.g., PageAnon).  See mm/slab.h.
  */
 #define PAGE_MAPPING_ANON      0x1
 #define PAGE_MAPPING_MOVABLE   0x2
index 45c3d62..a99b1fc 100644 (file)
 #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F3 0x166d
 #define PCI_DEVICE_ID_AMD_19H_M60H_DF_F3 0x14e3
 #define PCI_DEVICE_ID_AMD_19H_M70H_DF_F3 0x14f3
+#define PCI_DEVICE_ID_AMD_19H_M78H_DF_F3 0x12fb
+#define PCI_DEVICE_ID_AMD_MI200_DF_F3  0x14d3
 #define PCI_DEVICE_ID_AMD_CNB17H_F3    0x1703
 #define PCI_DEVICE_ID_AMD_LANCE                0x2000
 #define PCI_DEVICE_ID_AMD_LANCE_HOME   0x2001
index 5e1e115..fdf9c95 100644 (file)
 #include <linux/types.h>
 
 /*
- * Linux EFI stub v1.0 adds the following functionality:
- * - Loading initrd from the LINUX_EFI_INITRD_MEDIA_GUID device path,
- * - Loading/starting the kernel from firmware that targets a different
- *   machine type, via the entrypoint exposed in the .compat PE/COFF section.
+ * Starting from version v3.0, the major version field should be interpreted as
+ * a bit mask of features supported by the kernel's EFI stub:
+ * - 0x1: initrd loading from the LINUX_EFI_INITRD_MEDIA_GUID device path,
+ * - 0x2: initrd loading using the initrd= command line option, where the file
+ *        may be specified using device path notation, and is not required to
+ *        reside on the same volume as the loaded kernel image.
  *
  * The recommended way of loading and starting v1.0 or later kernels is to use
  * the LoadImage() and StartImage() EFI boot services, and expose the initrd
  * via the LINUX_EFI_INITRD_MEDIA_GUID device path.
  *
- * Versions older than v1.0 support initrd loading via the image load options
- * (using initrd=, limited to the volume from which the kernel itself was
- * loaded), or via arch specific means (bootparams, DT, etc).
+ * Versions older than v1.0 may support initrd loading via the image load
+ * options (using initrd=, limited to the volume from which the kernel itself
+ * was loaded), or only via arch specific means (bootparams, DT, etc).
  *
- * On x86, LoadImage() and StartImage() can be omitted if the EFI handover
- * protocol is implemented, which can be inferred from the version,
- * handover_offset and xloadflags fields in the bootparams structure.
+ * The minor version field must remain 0x0.
+ * (https://lore.kernel.org/all/efd6f2d4-547c-1378-1faa-53c044dbd297@gmail.com/)
  */
-#define LINUX_EFISTUB_MAJOR_VERSION            0x1
-#define LINUX_EFISTUB_MINOR_VERSION            0x1
+#define LINUX_EFISTUB_MAJOR_VERSION            0x3
+#define LINUX_EFISTUB_MINOR_VERSION            0x0
 
 /*
  * LINUX_PE_MAGIC appears at offset 0x38 into the MS-DOS header of EFI bootable
index e60727b..ec35731 100644 (file)
@@ -343,31 +343,19 @@ static __always_inline void __this_cpu_preempt_check(const char *op) { }
        pscr2_ret__;                                                    \
 })
 
-/*
- * Special handling for cmpxchg_double.  cmpxchg_double is passed two
- * percpu variables.  The first has to be aligned to a double word
- * boundary and the second has to follow directly thereafter.
- * We enforce this on all architectures even if they don't support
- * a double cmpxchg instruction, since it's a cheap requirement, and it
- * avoids breaking the requirement for architectures with the instruction.
- */
-#define __pcpu_double_call_return_bool(stem, pcp1, pcp2, ...)          \
+#define __pcpu_size_call_return2bool(stem, variable, ...)              \
 ({                                                                     \
-       bool pdcrb_ret__;                                               \
-       __verify_pcpu_ptr(&(pcp1));                                     \
-       BUILD_BUG_ON(sizeof(pcp1) != sizeof(pcp2));                     \
-       VM_BUG_ON((unsigned long)(&(pcp1)) % (2 * sizeof(pcp1)));       \
-       VM_BUG_ON((unsigned long)(&(pcp2)) !=                           \
-                 (unsigned long)(&(pcp1)) + sizeof(pcp1));             \
-       switch(sizeof(pcp1)) {                                          \
-       case 1: pdcrb_ret__ = stem##1(pcp1, pcp2, __VA_ARGS__); break;  \
-       case 2: pdcrb_ret__ = stem##2(pcp1, pcp2, __VA_ARGS__); break;  \
-       case 4: pdcrb_ret__ = stem##4(pcp1, pcp2, __VA_ARGS__); break;  \
-       case 8: pdcrb_ret__ = stem##8(pcp1, pcp2, __VA_ARGS__); break;  \
+       bool pscr2_ret__;                                               \
+       __verify_pcpu_ptr(&(variable));                                 \
+       switch(sizeof(variable)) {                                      \
+       case 1: pscr2_ret__ = stem##1(variable, __VA_ARGS__); break;    \
+       case 2: pscr2_ret__ = stem##2(variable, __VA_ARGS__); break;    \
+       case 4: pscr2_ret__ = stem##4(variable, __VA_ARGS__); break;    \
+       case 8: pscr2_ret__ = stem##8(variable, __VA_ARGS__); break;    \
        default:                                                        \
                __bad_size_call_parameter(); break;                     \
        }                                                               \
-       pdcrb_ret__;                                                    \
+       pscr2_ret__;                                                    \
 })
 
 #define __pcpu_size_call(stem, variable, ...)                          \
@@ -426,9 +414,8 @@ do {                                                                        \
 #define raw_cpu_xchg(pcp, nval)                __pcpu_size_call_return2(raw_cpu_xchg_, pcp, nval)
 #define raw_cpu_cmpxchg(pcp, oval, nval) \
        __pcpu_size_call_return2(raw_cpu_cmpxchg_, pcp, oval, nval)
-#define raw_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2) \
-       __pcpu_double_call_return_bool(raw_cpu_cmpxchg_double_, pcp1, pcp2, oval1, oval2, nval1, nval2)
-
+#define raw_cpu_try_cmpxchg(pcp, ovalp, nval) \
+       __pcpu_size_call_return2bool(raw_cpu_try_cmpxchg_, pcp, ovalp, nval)
 #define raw_cpu_sub(pcp, val)          raw_cpu_add(pcp, -(val))
 #define raw_cpu_inc(pcp)               raw_cpu_add(pcp, 1)
 #define raw_cpu_dec(pcp)               raw_cpu_sub(pcp, 1)
@@ -488,11 +475,6 @@ do {                                                                       \
        raw_cpu_cmpxchg(pcp, oval, nval);                               \
 })
 
-#define __this_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2) \
-({     __this_cpu_preempt_check("cmpxchg_double");                     \
-       raw_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2); \
-})
-
 #define __this_cpu_sub(pcp, val)       __this_cpu_add(pcp, -(typeof(pcp))(val))
 #define __this_cpu_inc(pcp)            __this_cpu_add(pcp, 1)
 #define __this_cpu_dec(pcp)            __this_cpu_sub(pcp, 1)
@@ -513,9 +495,8 @@ do {                                                                        \
 #define this_cpu_xchg(pcp, nval)       __pcpu_size_call_return2(this_cpu_xchg_, pcp, nval)
 #define this_cpu_cmpxchg(pcp, oval, nval) \
        __pcpu_size_call_return2(this_cpu_cmpxchg_, pcp, oval, nval)
-#define this_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2) \
-       __pcpu_double_call_return_bool(this_cpu_cmpxchg_double_, pcp1, pcp2, oval1, oval2, nval1, nval2)
-
+#define this_cpu_try_cmpxchg(pcp, ovalp, nval) \
+       __pcpu_size_call_return2bool(this_cpu_try_cmpxchg_, pcp, ovalp, nval)
 #define this_cpu_sub(pcp, val)         this_cpu_add(pcp, -(typeof(pcp))(val))
 #define this_cpu_inc(pcp)              this_cpu_add(pcp, 1)
 #define this_cpu_dec(pcp)              this_cpu_sub(pcp, 1)
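The new per-CPU try_cmpxchg form follows the generic try_cmpxchg() convention: on failure the old value is updated in place, so retry loops avoid an extra read. A small sketch (variable and function names are illustrative):

    DEFINE_PER_CPU(unsigned long, my_counter);

    static void my_counter_add(unsigned long val)
    {
            unsigned long old = this_cpu_read(my_counter);

            /* on failure, 'old' is refreshed with the value that was found */
            while (!this_cpu_try_cmpxchg(my_counter, &old, old + val))
                    ;
    }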
index 525b5d6..c0e4baf 100644 (file)
  */
 #define ARMPMU_EVT_64BIT               0x00001 /* Event uses a 64bit counter */
 #define ARMPMU_EVT_47BIT               0x00002 /* Event uses a 47bit counter */
+#define ARMPMU_EVT_63BIT               0x00004 /* Event uses a 63bit counter */
 
 static_assert((PERF_EVENT_FLAG_ARCH & ARMPMU_EVT_64BIT) == ARMPMU_EVT_64BIT);
 static_assert((PERF_EVENT_FLAG_ARCH & ARMPMU_EVT_47BIT) == ARMPMU_EVT_47BIT);
+static_assert((PERF_EVENT_FLAG_ARCH & ARMPMU_EVT_63BIT) == ARMPMU_EVT_63BIT);
 
 #define HW_OP_UNSUPPORTED              0xFFFF
 #define C(_x)                          PERF_COUNT_HW_CACHE_##_x
index d5628a7..b528be0 100644 (file)
@@ -295,6 +295,8 @@ struct perf_event_pmu_context;
 
 struct perf_output_handle;
 
+#define PMU_NULL_DEV   ((void *)(~0UL))
+
 /**
  * struct pmu - generic performance monitoring unit
  */
@@ -827,6 +829,14 @@ struct perf_event {
        void *security;
 #endif
        struct list_head                sb_list;
+
+       /*
+        * Certain events get forwarded to another pmu internally by over-
+        * writing the kernel copy of event->attr.type without the user being
+        * aware of it. event->orig_type contains the original 'type' requested
+        * by the user.
+        */
+       __u32                           orig_type;
 #endif /* CONFIG_PERF_EVENTS */
 };
 
@@ -1845,9 +1855,9 @@ int perf_event_exit_cpu(unsigned int cpu);
 #define perf_event_exit_cpu    NULL
 #endif
 
-extern void __weak arch_perf_update_userpage(struct perf_event *event,
-                                            struct perf_event_mmap_page *userpg,
-                                            u64 now);
+extern void arch_perf_update_userpage(struct perf_event *event,
+                                     struct perf_event_mmap_page *userpg,
+                                     u64 now);
 
 #ifdef CONFIG_MMU
 extern __weak u64 arch_perf_get_page_size(struct mm_struct *mm, unsigned long addr);
index c5a0dc8..6478838 100644 (file)
@@ -1900,10 +1900,8 @@ void phy_package_leave(struct phy_device *phydev);
 int devm_phy_package_join(struct device *dev, struct phy_device *phydev,
                          int addr, size_t priv_size);
 
-#if IS_ENABLED(CONFIG_PHYLIB)
 int __init mdio_bus_init(void);
 void mdio_bus_exit(void);
-#endif
 
 int phy_ethtool_get_strings(struct phy_device *phydev, u8 *data);
 int phy_ethtool_get_sset_count(struct phy_device *phydev);
index d2c3f16..02e0086 100644 (file)
@@ -261,18 +261,14 @@ void generic_pipe_buf_release(struct pipe_inode_info *, struct pipe_buffer *);
 
 extern const struct pipe_buf_operations nosteal_pipe_buf_ops;
 
-#ifdef CONFIG_WATCH_QUEUE
 unsigned long account_pipe_buffers(struct user_struct *user,
                                   unsigned long old, unsigned long new);
 bool too_many_pipe_buffers_soft(unsigned long user_bufs);
 bool too_many_pipe_buffers_hard(unsigned long user_bufs);
 bool pipe_is_unprivileged_user(void);
-#endif
 
 /* for F_SETPIPE_SZ and F_GETPIPE_SZ */
-#ifdef CONFIG_WATCH_QUEUE
 int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots);
-#endif
 long pipe_fcntl(struct file *, unsigned int, unsigned long arg);
 struct pipe_inode_info *get_pipe_info(struct file *file, bool for_splice);
 
index f9c5ac8..80cb00d 100644 (file)
@@ -156,7 +156,6 @@ struct pktcdvd_device
 {
        struct block_device     *bdev;          /* dev attached */
        dev_t                   pkt_dev;        /* our dev */
-       char                    name[20];
        struct packet_settings  settings;
        struct packet_stats     stats;
        int                     refcnt;         /* Open count */
index a1aa681..7c8d654 100644 (file)
@@ -2,6 +2,8 @@
 #ifndef __LINUX_BQ27X00_BATTERY_H__
 #define __LINUX_BQ27X00_BATTERY_H__
 
+#include <linux/power_supply.h>
+
 enum bq27xxx_chip {
        BQ27000 = 1, /* bq27000, bq27200 */
        BQ27010, /* bq27010, bq27210 */
@@ -68,7 +70,9 @@ struct bq27xxx_device_info {
        struct bq27xxx_access_methods bus;
        struct bq27xxx_reg_cache cache;
        int charge_design_full;
+       bool removed;
        unsigned long last_update;
+       union power_supply_propval last_status;
        struct delayed_work work;
        struct power_supply *bat;
        struct list_head list;
index 0260f5e..253f267 100644 (file)
@@ -158,6 +158,8 @@ int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns,
                        struct pid *pid, struct task_struct *task);
 #endif /* CONFIG_PROC_PID_ARCH_STATUS */
 
+void arch_report_meminfo(struct seq_file *m);
+
 #else /* CONFIG_PROC_FS */
 
 static inline void proc_root_init(void)
index 3d1a9e7..6a0999c 100644 (file)
@@ -206,7 +206,7 @@ latch_tree_find(void *key, struct latch_tree_root *root,
        do {
                seq = raw_read_seqcount_latch(&root->seq);
                node = __lt_find(key, root, seq & 1, ops->comp);
-       } while (read_seqcount_latch_retry(&root->seq, seq));
+       } while (raw_read_seqcount_latch_retry(&root->seq, seq));
 
        return node;
 }
index dcd2cf1..7d9c2a6 100644 (file)
@@ -156,31 +156,6 @@ static inline int rcu_nocb_cpu_deoffload(int cpu) { return 0; }
 static inline void rcu_nocb_flush_deferred_wakeup(void) { }
 #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
 
-/**
- * RCU_NONIDLE - Indicate idle-loop code that needs RCU readers
- * @a: Code that RCU needs to pay attention to.
- *
- * RCU read-side critical sections are forbidden in the inner idle loop,
- * that is, between the ct_idle_enter() and the ct_idle_exit() -- RCU
- * will happily ignore any such read-side critical sections.  However,
- * things like powertop need tracepoints in the inner idle loop.
- *
- * This macro provides the way out:  RCU_NONIDLE(do_something_with_RCU())
- * will tell RCU that it needs to pay attention, invoke its argument
- * (in this example, calling the do_something_with_RCU() function),
- * and then tell RCU to go back to ignoring this CPU.  It is permissible
- * to nest RCU_NONIDLE() wrappers, but not indefinitely (but the limit is
- * on the order of a million or so, even on 32-bit systems).  It is
- * not legal to block within RCU_NONIDLE(), nor is it permissible to
- * transfer control either into or out of RCU_NONIDLE()'s statement.
- */
-#define RCU_NONIDLE(a) \
-       do { \
-               ct_irq_enter_irqson(); \
-               do { a; } while (0); \
-               ct_irq_exit_irqson(); \
-       } while (0)
-
 /*
  * Note a quasi-voluntary context switch for RCU-tasks's benefit.
  * This is a macro rather than an inline function to avoid #include hell.
@@ -957,9 +932,8 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
 
 /**
  * kfree_rcu() - kfree an object after a grace period.
- * @ptr: pointer to kfree for both single- and double-argument invocations.
- * @rhf: the name of the struct rcu_head within the type of @ptr,
- *       but only for double-argument invocations.
+ * @ptr: pointer to kfree for double-argument invocations.
+ * @rhf: the name of the struct rcu_head within the type of @ptr.
  *
  * Many rcu callbacks functions just call kfree() on the base structure.
  * These functions are trivial, but their size adds up, and furthermore
@@ -984,26 +958,18 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
  * The BUILD_BUG_ON check must not involve any function calls, hence the
  * checks are done in macros here.
  */
-#define kfree_rcu(ptr, rhf...) kvfree_rcu(ptr, ## rhf)
+#define kfree_rcu(ptr, rhf) kvfree_rcu_arg_2(ptr, rhf)
+#define kvfree_rcu(ptr, rhf) kvfree_rcu_arg_2(ptr, rhf)
 
 /**
- * kvfree_rcu() - kvfree an object after a grace period.
- *
- * This macro consists of one or two arguments and it is
- * based on whether an object is head-less or not. If it
- * has a head then a semantic stays the same as it used
- * to be before:
- *
- *     kvfree_rcu(ptr, rhf);
- *
- * where @ptr is a pointer to kvfree(), @rhf is the name
- * of the rcu_head structure within the type of @ptr.
+ * kfree_rcu_mightsleep() - kfree an object after a grace period.
+ * @ptr: pointer to kfree for single-argument invocations.
  *
  * When it comes to head-less variant, only one argument
  * is passed and that is just a pointer which has to be
  * freed after a grace period. Therefore the semantic is
  *
- *     kvfree_rcu(ptr);
+ *     kfree_rcu_mightsleep(ptr);
  *
  * where @ptr is the pointer to be freed by kvfree().
  *
@@ -1012,13 +978,9 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
  * annotation. Otherwise, please switch and embed the
  * rcu_head structure within the type of @ptr.
  */
-#define kvfree_rcu(...) KVFREE_GET_MACRO(__VA_ARGS__,          \
-       kvfree_rcu_arg_2, kvfree_rcu_arg_1)(__VA_ARGS__)
-
+#define kfree_rcu_mightsleep(ptr) kvfree_rcu_arg_1(ptr)
 #define kvfree_rcu_mightsleep(ptr) kvfree_rcu_arg_1(ptr)
-#define kfree_rcu_mightsleep(ptr) kvfree_rcu_mightsleep(ptr)
 
-#define KVFREE_GET_MACRO(_1, _2, NAME, ...) NAME
 #define kvfree_rcu_arg_2(ptr, rhf)                                     \
 do {                                                                   \
        typeof (ptr) ___p = (ptr);                                      \
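With the single-argument form split out, the two spellings look like this (struct and variable names are illustrative; the objects must already be unlinked from every RCU-visible structure):

    struct foo {
            int data;
            struct rcu_head rcu;
    };

    kfree_rcu(p, rcu);              /* two arguments: never allocates, atomic-safe */

    /* head-less objects (no rcu_head member) must use the _mightsleep variant */
    kfree_rcu_mightsleep(q);        /* may sleep, so only from sleepable context */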
index 3c01c2b..505c908 100644 (file)
@@ -196,11 +196,11 @@ enum {
 
 /* PCA9450_REG_LDO3_VOLT bits */
 #define LDO3_EN_MASK                   0xC0
-#define LDO3OUT_MASK                   0x0F
+#define LDO3OUT_MASK                   0x1F
 
 /* PCA9450_REG_LDO4_VOLT bits */
 #define LDO4_EN_MASK                   0xC0
-#define LDO4OUT_MASK                   0x0F
+#define LDO4OUT_MASK                   0x1F
 
 /* PCA9450_REG_LDO5_VOLT bits */
 #define LDO5L_EN_MASK                  0xC0
index 4e78651..847c9a0 100644 (file)
@@ -9,15 +9,8 @@
 enum {
        Root_NFS = MKDEV(UNNAMED_MAJOR, 255),
        Root_CIFS = MKDEV(UNNAMED_MAJOR, 254),
+       Root_Generic = MKDEV(UNNAMED_MAJOR, 253),
        Root_RAM0 = MKDEV(RAMDISK_MAJOR, 0),
-       Root_RAM1 = MKDEV(RAMDISK_MAJOR, 1),
-       Root_FD0 = MKDEV(FLOPPY_MAJOR, 0),
-       Root_HDA1 = MKDEV(IDE0_MAJOR, 1),
-       Root_HDA2 = MKDEV(IDE0_MAJOR, 2),
-       Root_SDA1 = MKDEV(SCSI_DISK0_MAJOR, 1),
-       Root_SDA2 = MKDEV(SCSI_DISK0_MAJOR, 2),
-       Root_HDC1 = MKDEV(IDE1_MAJOR, 1),
-       Root_SR0 = MKDEV(SCSI_CDROM_MAJOR, 0),
 };
 
 extern dev_t ROOT_DEV;
index eed5d65..b0011c5 100644 (file)
@@ -1852,7 +1852,9 @@ current_restore_flags(unsigned long orig_flags, unsigned long flags)
 }
 
 extern int cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
-extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_effective_cpus);
+extern int task_can_attach(struct task_struct *p);
+extern int dl_bw_alloc(int cpu, u64 dl_bw);
+extern void dl_bw_free(int cpu, u64 dl_bw);
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask);
 extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask);
@@ -2006,15 +2008,12 @@ static __always_inline void scheduler_ipi(void)
         */
        preempt_fold_need_resched();
 }
-extern unsigned long wait_task_inactive(struct task_struct *, unsigned int match_state);
 #else
 static inline void scheduler_ipi(void) { }
-static inline unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state)
-{
-       return 1;
-}
 #endif
 
+extern unsigned long wait_task_inactive(struct task_struct *, unsigned int match_state);
+
 /*
  * Set thread flags in other task's structures.
  * See asm/thread_info.h for TIF_xxxx flags available:
index ca008f7..196f0ca 100644 (file)
  *
  * Please use one of the three interfaces below.
  */
-extern unsigned long long notrace sched_clock(void);
+extern u64 sched_clock(void);
+
+#if defined(CONFIG_ARCH_WANTS_NO_INSTR) || defined(CONFIG_GENERIC_SCHED_CLOCK)
+extern u64 sched_clock_noinstr(void);
+#else
+static __always_inline u64 sched_clock_noinstr(void)
+{
+       return sched_clock();
+}
+#endif
 
 /*
  * See the comment in kernel/sched/clock.c
@@ -45,6 +54,11 @@ static inline u64 cpu_clock(int cpu)
        return sched_clock();
 }
 
+static __always_inline u64 local_clock_noinstr(void)
+{
+       return sched_clock_noinstr();
+}
+
 static __always_inline u64 local_clock(void)
 {
        return sched_clock();
@@ -79,6 +93,7 @@ static inline u64 cpu_clock(int cpu)
        return sched_clock_cpu(cpu);
 }
 
+extern u64 local_clock_noinstr(void);
 extern u64 local_clock(void);
 
 #endif
index 57bde66..fad77b5 100644 (file)
@@ -132,12 +132,9 @@ SD_FLAG(SD_SERIALIZE, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
 /*
  * Place busy tasks earlier in the domain
  *
- * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
- *               up, but currently assumed to be set from the base domain
- *               upwards (see update_top_cache_domain()).
  * NEEDS_GROUPS: Load balancing flag.
  */
-SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
+SD_FLAG(SD_ASYM_PACKING, SDF_NEEDS_GROUPS)
 
 /*
  * Prefer to place tasks in a sibling domain
index 2009926..669e8cf 100644 (file)
@@ -135,7 +135,7 @@ struct signal_struct {
 #ifdef CONFIG_POSIX_TIMERS
 
        /* POSIX.1b Interval Timers */
-       int                     posix_timer_id;
+       unsigned int            next_posix_timer_id;
        struct list_head        posix_timers;
 
        /* ITIMER_REAL timer for the process */
index 537cbf9..e0f5ac9 100644 (file)
@@ -29,7 +29,6 @@ struct kernel_clone_args {
        u32 io_thread:1;
        u32 user_worker:1;
        u32 no_files:1;
-       u32 ignore_signals:1;
        unsigned long stack;
        unsigned long stack_size;
        unsigned long tls;
index 816df6c..67b573d 100644 (file)
@@ -203,7 +203,7 @@ struct sched_domain_topology_level {
 #endif
 };
 
-extern void set_sched_topology(struct sched_domain_topology_level *tl);
+extern void __init set_sched_topology(struct sched_domain_topology_level *tl);
 
 #ifdef CONFIG_SCHED_DEBUG
 # define SD_INIT_NAME(type)            .name = #type
index 6123c10..837a236 100644 (file)
@@ -2,22 +2,13 @@
 #ifndef _LINUX_VHOST_TASK_H
 #define _LINUX_VHOST_TASK_H
 
-#include <linux/completion.h>
 
-struct task_struct;
+struct vhost_task;
 
-struct vhost_task {
-       int (*fn)(void *data);
-       void *data;
-       struct completion exited;
-       unsigned long flags;
-       struct task_struct *task;
-};
-
-struct vhost_task *vhost_task_create(int (*fn)(void *), void *arg,
+struct vhost_task *vhost_task_create(bool (*fn)(void *), void *arg,
                                     const char *name);
 void vhost_task_start(struct vhost_task *vtsk);
 void vhost_task_stop(struct vhost_task *vtsk);
-bool vhost_task_should_stop(struct vhost_task *vtsk);
+void vhost_task_wake(struct vhost_task *vtsk);
 
 #endif
index 3926e90..987a59d 100644 (file)
@@ -671,9 +671,9 @@ typedef struct {
  *
  * Return: sequence counter raw value. Use the lowest bit as an index for
  * picking which data copy to read. The full counter must then be checked
- * with read_seqcount_latch_retry().
+ * with raw_read_seqcount_latch_retry().
  */
-static inline unsigned raw_read_seqcount_latch(const seqcount_latch_t *s)
+static __always_inline unsigned raw_read_seqcount_latch(const seqcount_latch_t *s)
 {
        /*
         * Pairs with the first smp_wmb() in raw_write_seqcount_latch().
@@ -683,16 +683,17 @@ static inline unsigned raw_read_seqcount_latch(const seqcount_latch_t *s)
 }
 
 /**
- * read_seqcount_latch_retry() - end a seqcount_latch_t read section
+ * raw_read_seqcount_latch_retry() - end a seqcount_latch_t read section
  * @s:         Pointer to seqcount_latch_t
  * @start:     count, from raw_read_seqcount_latch()
  *
  * Return: true if a read section retry is required, else false
  */
-static inline int
-read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
+static __always_inline int
+raw_read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
 {
-       return read_seqcount_retry(&s->seqcount, start);
+       smp_rmb();
+       return unlikely(READ_ONCE(s->seqcount.sequence) != start);
 }
 
 /**
@@ -752,7 +753,7 @@ read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
  *                     entry = data_query(latch->data[idx], ...);
  *
  *             // This includes needed smp_rmb()
- *             } while (read_seqcount_latch_retry(&latch->seq, seq));
+ *             } while (raw_read_seqcount_latch_retry(&latch->seq, seq));
  *
  *             return entry;
  *     }
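
Spelled out as a complete consumer loop (a sketch following the pattern in the comment above; latch, data_query() and key are placeholders):

	unsigned int seq;
	struct entry *entry;

	do {
		seq   = raw_read_seqcount_latch(&latch->seq);
		entry = data_query(latch->data[seq & 1], key);
		/* raw_read_seqcount_latch_retry() supplies the needed smp_rmb() */
	} while (raw_read_seqcount_latch_retry(&latch->seq, seq));
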
index 7bde8e1..224293b 100644 (file)
@@ -107,7 +107,10 @@ extern void synchronize_shrinkers(void);
 
 #ifdef CONFIG_SHRINKER_DEBUG
 extern int shrinker_debugfs_add(struct shrinker *shrinker);
-extern struct dentry *shrinker_debugfs_remove(struct shrinker *shrinker);
+extern struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker,
+                                             int *debugfs_id);
+extern void shrinker_debugfs_remove(struct dentry *debugfs_entry,
+                                   int debugfs_id);
 extern int __printf(2, 3) shrinker_debugfs_rename(struct shrinker *shrinker,
                                                  const char *fmt, ...);
 #else /* CONFIG_SHRINKER_DEBUG */
@@ -115,10 +118,16 @@ static inline int shrinker_debugfs_add(struct shrinker *shrinker)
 {
        return 0;
 }
-static inline struct dentry *shrinker_debugfs_remove(struct shrinker *shrinker)
+static inline struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker,
+                                                    int *debugfs_id)
 {
+       *debugfs_id = -1;
        return NULL;
 }
+static inline void shrinker_debugfs_remove(struct dentry *debugfs_entry,
+                                          int debugfs_id)
+{
+}
 static inline __printf(2, 3)
 int shrinker_debugfs_rename(struct shrinker *shrinker, const char *fmt, ...)
 {
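
shrinker_debugfs_remove() is split into a detach step and a later free so the debugfs dentry can be dropped outside the locks held during unregistration. A caller-side sketch of the intended two-phase teardown (the surrounding unregister path is assumed, not shown here):

	struct dentry *debugfs_entry;
	int debugfs_id;

	debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id);
	/* ... unlink the shrinker and wait for in-flight users ... */
	shrinker_debugfs_remove(debugfs_entry, debugfs_id);
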
index 738776a..0b40417 100644 (file)
@@ -1587,6 +1587,16 @@ static inline void skb_copy_hash(struct sk_buff *to, const struct sk_buff *from)
        to->l4_hash = from->l4_hash;
 };
 
+static inline int skb_cmp_decrypted(const struct sk_buff *skb1,
+                                   const struct sk_buff *skb2)
+{
+#ifdef CONFIG_TLS_DEVICE
+       return skb2->decrypted - skb1->decrypted;
+#else
+       return 0;
+#endif
+}
+
 static inline void skb_copy_decrypted(struct sk_buff *to,
                                      const struct sk_buff *from)
 {
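
skb_cmp_decrypted() returns zero when both skbs carry the same TLS-offload decrypted state, so a merging path can refuse to mix decrypted and not-yet-decrypted data. A hedged sketch (the helper name and call site are illustrative only):

	static bool can_coalesce_decrypted(const struct sk_buff *skb1,
					   const struct sk_buff *skb2)
	{
		/* Never merge decrypted (TLS offload) payload with encrypted payload. */
		return skb_cmp_decrypted(skb1, skb2) == 0;
	}
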
index 84f7874..054d791 100644 (file)
@@ -71,7 +71,6 @@ struct sk_psock_link {
 };
 
 struct sk_psock_work_state {
-       struct sk_buff                  *skb;
        u32                             len;
        u32                             off;
 };
@@ -105,7 +104,7 @@ struct sk_psock {
        struct proto                    *sk_proto;
        struct mutex                    work_mutex;
        struct sk_psock_work_state      work_state;
-       struct work_struct              work;
+       struct delayed_work             work;
        struct rcu_work                 rwork;
 };
 
index f6df03f..deb90cf 100644 (file)
@@ -39,7 +39,8 @@ enum stat_item {
        CPU_PARTIAL_FREE,       /* Refill cpu partial on free */
        CPU_PARTIAL_NODE,       /* Refill cpu partial from node partial */
        CPU_PARTIAL_DRAIN,      /* Drain cpu partial to node partial */
-       NR_SLUB_STAT_ITEMS };
+       NR_SLUB_STAT_ITEMS
+};
 
 #ifndef CONFIG_SLUB_TINY
 /*
@@ -47,8 +48,13 @@ enum stat_item {
  * with this_cpu_cmpxchg_double() alignment requirements.
  */
 struct kmem_cache_cpu {
-       void **freelist;        /* Pointer to next available object */
-       unsigned long tid;      /* Globally unique transaction id */
+       union {
+               struct {
+                       void **freelist;        /* Pointer to next available object */
+                       unsigned long tid;      /* Globally unique transaction id */
+               };
+               freelist_aba_t freelist_tid;
+       };
        struct slab *slab;      /* The slab from which we are allocating */
 #ifdef CONFIG_SLUB_CPU_PARTIAL
        struct slab *partial;   /* Partially allocated frozen slabs */
index 423220e..93417ba 100644 (file)
@@ -69,9 +69,6 @@ struct llcc_slice_desc {
 /**
  * struct llcc_edac_reg_data - llcc edac registers data for each error type
  * @name: Name of the error
- * @synd_reg: Syndrome register address
- * @count_status_reg: Status register address to read the error count
- * @ways_status_reg: Status register address to read the error ways
  * @reg_cnt: Number of registers
  * @count_mask: Mask value to get the error count
  * @ways_mask: Mask value to get the error ways
@@ -80,9 +77,6 @@ struct llcc_slice_desc {
  */
 struct llcc_edac_reg_data {
        char *name;
-       u64 synd_reg;
-       u64 count_status_reg;
-       u64 ways_status_reg;
        u32 reg_cnt;
        u32 count_mask;
        u32 ways_mask;
index a55179f..8f052c3 100644 (file)
@@ -76,6 +76,9 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *,
                              struct splice_pipe_desc *);
 extern ssize_t add_to_pipe(struct pipe_inode_info *,
                              struct pipe_buffer *);
+long vfs_splice_read(struct file *in, loff_t *ppos,
+                    struct pipe_inode_info *pipe, size_t len,
+                    unsigned int flags);
 extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,
                                      splice_direct_actor *);
 extern long do_splice(struct file *in, loff_t *off_in,
index 41c4b26..eb92a50 100644 (file)
@@ -212,7 +212,7 @@ static inline int srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp)
 
        srcu_check_nmi_safety(ssp, false);
        retval = __srcu_read_lock(ssp);
-       srcu_lock_acquire(&(ssp)->dep_map);
+       srcu_lock_acquire(&ssp->dep_map);
        return retval;
 }
 
@@ -229,7 +229,7 @@ static inline int srcu_read_lock_nmisafe(struct srcu_struct *ssp) __acquires(ssp
 
        srcu_check_nmi_safety(ssp, true);
        retval = __srcu_read_lock_nmisafe(ssp);
-       rcu_lock_acquire(&(ssp)->dep_map);
+       rcu_lock_acquire(&ssp->dep_map);
        return retval;
 }
 
@@ -284,7 +284,7 @@ static inline void srcu_read_unlock(struct srcu_struct *ssp, int idx)
 {
        WARN_ON_ONCE(idx & ~0x1);
        srcu_check_nmi_safety(ssp, false);
-       srcu_lock_release(&(ssp)->dep_map);
+       srcu_lock_release(&ssp->dep_map);
        __srcu_read_unlock(ssp, idx);
 }
 
@@ -300,7 +300,7 @@ static inline void srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx)
 {
        WARN_ON_ONCE(idx & ~0x1);
        srcu_check_nmi_safety(ssp, true);
-       rcu_lock_release(&(ssp)->dep_map);
+       rcu_lock_release(&ssp->dep_map);
        __srcu_read_unlock_nmisafe(ssp, idx);
 }
 
index 762d723..3b10636 100644 (file)
@@ -509,6 +509,27 @@ static inline void svcxdr_init_encode(struct svc_rqst *rqstp)
 }
 
 /**
+ * svcxdr_encode_opaque_pages - Insert pages into an xdr_stream
+ * @xdr: xdr_stream to be updated
+ * @pages: array of pages to insert
+ * @base: starting offset of first data byte in @pages
+ * @len: number of data bytes in @pages to insert
+ *
+ * After the @pages are added, the tail iovec is instantiated pointing
+ * to end of the head buffer, and the stream is set up to encode
+ * subsequent items into the tail.
+ */
+static inline void svcxdr_encode_opaque_pages(struct svc_rqst *rqstp,
+                                             struct xdr_stream *xdr,
+                                             struct page **pages,
+                                             unsigned int base,
+                                             unsigned int len)
+{
+       xdr_write_pages(xdr, pages, base, len);
+       xdr->page_ptr = rqstp->rq_next_page - 1;
+}
+
+/**
  * svcxdr_set_auth_slack -
  * @rqstp: RPC transaction
  * @slack: buffer space to reserve for the transaction's security flavor
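
A usage sketch for the helper added above, e.g. from a server-side read encoder (the page array, base and length shown are placeholders for whatever the caller already has):

	/* Hand the payload pages to the stream; the tail is then ready
	 * for whatever items the encoder emits next. */
	svcxdr_encode_opaque_pages(rqstp, xdr, rqstp->rq_pages, base, payload_len);
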
index 24aa159..a5ee0af 100644 (file)
@@ -135,7 +135,6 @@ struct svc_rdma_recv_ctxt {
        struct ib_sge           rc_recv_sge;
        void                    *rc_recv_buf;
        struct xdr_stream       rc_stream;
-       bool                    rc_temp;
        u32                     rc_byte_len;
        unsigned int            rc_page_count;
        u32                     rc_inv_rkey;
@@ -155,12 +154,12 @@ struct svc_rdma_send_ctxt {
 
        struct ib_send_wr       sc_send_wr;
        struct ib_cqe           sc_cqe;
-       struct completion       sc_done;
        struct xdr_buf          sc_hdrbuf;
        struct xdr_stream       sc_stream;
        void                    *sc_xprt_buf;
+       int                     sc_page_count;
        int                     sc_cur_sge_no;
-
+       struct page             *sc_pages[RPCSVC_MAXPAGES];
        struct ib_sge           sc_sges[];
 };
 
@@ -176,7 +175,7 @@ extern struct svc_rdma_recv_ctxt *
 extern void svc_rdma_recv_ctxt_put(struct svcxprt_rdma *rdma,
                                   struct svc_rdma_recv_ctxt *ctxt);
 extern void svc_rdma_flush_recv_queues(struct svcxprt_rdma *rdma);
-extern void svc_rdma_release_rqst(struct svc_rqst *rqstp);
+extern void svc_rdma_release_ctxt(struct svc_xprt *xprt, void *ctxt);
 extern int svc_rdma_recvfrom(struct svc_rqst *);
 
 /* svc_rdma_rw.c */
index 8674792..a6b1263 100644 (file)
@@ -23,7 +23,7 @@ struct svc_xprt_ops {
        int             (*xpo_sendto)(struct svc_rqst *);
        int             (*xpo_result_payload)(struct svc_rqst *, unsigned int,
                                              unsigned int);
-       void            (*xpo_release_rqst)(struct svc_rqst *);
+       void            (*xpo_release_ctxt)(struct svc_xprt *xprt, void *ctxt);
        void            (*xpo_detach)(struct svc_xprt *);
        void            (*xpo_free)(struct svc_xprt *);
        void            (*xpo_kill_temp_xprt)(struct svc_xprt *);
index d16ae62..a711604 100644 (file)
@@ -61,10 +61,9 @@ int          svc_recv(struct svc_rqst *, long);
 void           svc_send(struct svc_rqst *rqstp);
 void           svc_drop(struct svc_rqst *);
 void           svc_sock_update_bufs(struct svc_serv *serv);
-bool           svc_alien_sock(struct net *net, int fd);
-int            svc_addsock(struct svc_serv *serv, const int fd,
-                                       char *name_return, const size_t len,
-                                       const struct cred *cred);
+int            svc_addsock(struct svc_serv *serv, struct net *net,
+                           const int fd, char *name_return, const size_t len,
+                           const struct cred *cred);
 void           svc_init_xprt_sock(void);
 void           svc_cleanup_xprt_sock(void);
 struct svc_xprt *svc_sock_create(struct svc_serv *serv, int prot);
index 72014c9..f89ec4b 100644 (file)
@@ -242,8 +242,7 @@ extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf,
 extern void xdr_init_encode_pages(struct xdr_stream *xdr, struct xdr_buf *buf,
                           struct page **pages, struct rpc_rqst *rqst);
 extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes);
-extern int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec,
-               size_t nbytes);
+extern int xdr_reserve_space_vec(struct xdr_stream *xdr, size_t nbytes);
 extern void __xdr_commit_encode(struct xdr_stream *xdr);
 extern void xdr_truncate_encode(struct xdr_stream *xdr, size_t len);
 extern void xdr_truncate_decode(struct xdr_stream *xdr, size_t len);
index df81043..42b249b 100644 (file)
@@ -243,11 +243,7 @@ static inline bool is_ssam_device(struct device *d)
  * Return: Returns the pointer to the &struct ssam_device_driver wrapping the
  * given device driver @d.
  */
-static inline
-struct ssam_device_driver *to_ssam_device_driver(struct device_driver *d)
-{
-       return container_of(d, struct ssam_device_driver, driver);
-}
+#define to_ssam_device_driver(d)       container_of_const(d, struct ssam_device_driver, driver)
 
 const struct ssam_device_id *ssam_device_id_match(const struct ssam_device_id *table,
                                                  const struct ssam_device_uid uid);
index d0d4598..4d0095e 100644 (file)
@@ -202,6 +202,7 @@ struct platform_s2idle_ops {
 };
 
 #ifdef CONFIG_SUSPEND
+extern suspend_state_t pm_suspend_target_state;
 extern suspend_state_t mem_sleep_current;
 extern suspend_state_t mem_sleep_default;
 
@@ -337,6 +338,8 @@ extern bool sync_on_suspend_enabled;
 #else /* !CONFIG_SUSPEND */
 #define suspend_valid_only_mem NULL
 
+#define pm_suspend_target_state        (PM_SUSPEND_ON)
+
 static inline void pm_suspend_clear_flags(void) {}
 static inline void pm_set_suspend_via_firmware(void) {}
 static inline void pm_set_resume_via_firmware(void) {}
@@ -452,6 +455,10 @@ extern struct pbe *restore_pblist;
 int pfn_is_nosave(unsigned long pfn);
 
 int hibernate_quiet_exec(int (*func)(void *data), void *data);
+int hibernate_resume_nonboot_cpu_disable(void);
+int arch_hibernation_header_save(void *addr, unsigned int max_size);
+int arch_hibernation_header_restore(void *addr);
+
 #else /* CONFIG_HIBERNATION */
 static inline void register_nosave_region(unsigned long b, unsigned long e) {}
 static inline int swsusp_page_is_forbidden(struct page *p) { return 0; }
@@ -468,6 +475,8 @@ static inline int hibernate_quiet_exec(int (*func)(void *data), void *data) {
 }
 #endif /* CONFIG_HIBERNATION */
 
+int arch_resume_nosmt(void);
+
 #ifdef CONFIG_HIBERNATION_SNAPSHOT_DEV
 int is_hibernate_resume_dev(dev_t dev);
 #else
@@ -503,7 +512,6 @@ extern void pm_report_max_hw_sleep(u64 t);
 
 /* drivers/base/power/wakeup.c */
 extern bool events_check_enabled;
-extern suspend_state_t pm_suspend_target_state;
 
 extern bool pm_wakeup_pending(void);
 extern void pm_system_wakeup(void);
@@ -551,6 +559,7 @@ static inline void unlock_system_sleep(unsigned int flags) {}
 #ifdef CONFIG_PM_SLEEP_DEBUG
 extern bool pm_print_times_enabled;
 extern bool pm_debug_messages_on;
+extern bool pm_debug_messages_should_print(void);
 static inline int pm_dyn_debug_messages_on(void)
 {
 #ifdef CONFIG_DYNAMIC_DEBUG
@@ -564,14 +573,14 @@ static inline int pm_dyn_debug_messages_on(void)
 #endif
 #define __pm_pr_dbg(fmt, ...)                                  \
        do {                                                    \
-               if (pm_debug_messages_on)                       \
+               if (pm_debug_messages_should_print())           \
                        printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__);  \
                else if (pm_dyn_debug_messages_on())            \
                        pr_debug(fmt, ##__VA_ARGS__);   \
        } while (0)
 #define __pm_deferred_pr_dbg(fmt, ...)                         \
        do {                                                    \
-               if (pm_debug_messages_on)                       \
+               if (pm_debug_messages_should_print())           \
                        printk_deferred(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__); \
        } while (0)
 #else
@@ -589,7 +598,8 @@ static inline int pm_dyn_debug_messages_on(void)
 /**
  * pm_pr_dbg - print pm sleep debug messages
  *
- * If pm_debug_messages_on is enabled, print message.
+ * If pm_debug_messages_on is enabled and the system is entering/leaving
+ *      suspend, print message.
  * If pm_debug_messages_on is disabled and CONFIG_DYNAMIC_DEBUG is enabled,
  *     print message only from instances explicitly enabled on dynamic debug's
  *     control.
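
Usage is unchanged; only the gating differs. A one-line sketch (message and variable are illustrative):

	pm_pr_dbg("device resume took %lld ms\n", elapsed_ms);
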
index 33a0ee3..24871f8 100644 (file)
@@ -1280,6 +1280,7 @@ asmlinkage long sys_ni_syscall(void);
 
 #endif /* CONFIG_ARCH_HAS_SYSCALL_WRAPPER */
 
+asmlinkage long sys_ni_posix_timers(void);
 
 /*
  * Kernel code should not call syscalls (i.e., sys_xyzyyz()) directly.
index bb9d3f5..03d9c5a 100644 (file)
@@ -44,7 +44,6 @@ struct time_namespace *copy_time_ns(unsigned long flags,
                                    struct time_namespace *old_ns);
 void free_time_ns(struct time_namespace *ns);
 void timens_on_fork(struct nsproxy *nsproxy, struct task_struct *tsk);
-struct vdso_data *arch_get_vdso_data(void *vvar_page);
 struct page *find_timens_vvar_page(struct vm_area_struct *vma);
 
 static inline void put_time_ns(struct time_namespace *ns)
@@ -163,4 +162,6 @@ static inline ktime_t timens_ktime_to_host(clockid_t clockid, ktime_t tim)
 }
 #endif
 
+struct vdso_data *arch_get_vdso_data(void *vvar_page);
+
 #endif /* _LINUX_TIMENS_H */
index 7769338..6a1e8f1 100644 (file)
@@ -282,6 +282,7 @@ enum tpm_chip_flags {
        TPM_CHIP_FLAG_ALWAYS_POWERED            = BIT(5),
        TPM_CHIP_FLAG_FIRMWARE_POWER_MANAGED    = BIT(6),
        TPM_CHIP_FLAG_FIRMWARE_UPGRADE          = BIT(7),
+       TPM_CHIP_FLAG_SUSPENDED                 = BIT(8),
 };
 
 #define to_tpm_chip(d) container_of(d, struct tpm_chip, dev)
index 0e37322..7c4a0b7 100644 (file)
@@ -806,6 +806,7 @@ enum {
        FILTER_TRACE_FN,
        FILTER_COMM,
        FILTER_CPU,
+       FILTER_STACKTRACE,
 };
 
 extern int trace_event_raw_init(struct trace_event_call *call);
index 688fb94..becb8cd 100644 (file)
 #define DECLARE_BITMAP(name,bits) \
        unsigned long name[BITS_TO_LONGS(bits)]
 
+#ifdef __SIZEOF_INT128__
+typedef __s128 s128;
+typedef __u128 u128;
+#endif
+
 typedef u32 __kernel_dev_t;
 
 typedef __kernel_fd_set                fd_set;
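
Where the compiler provides __SIZEOF_INT128__, the new s128/u128 typedefs make full-width 64x64 arithmetic straightforward; a sketch with a and b as u64 placeholders:

	/* Full-width 64x64 multiply without losing the high bits. */
	u128 prod = (u128)a * b;
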
index 044c1d8..8e7d2c4 100644 (file)
@@ -11,7 +11,6 @@
 #include <uapi/linux/uio.h>
 
 struct page;
-struct pipe_inode_info;
 
 typedef unsigned int __bitwise iov_iter_extraction_t;
 
@@ -25,7 +24,6 @@ enum iter_type {
        ITER_IOVEC,
        ITER_KVEC,
        ITER_BVEC,
-       ITER_PIPE,
        ITER_XARRAY,
        ITER_DISCARD,
        ITER_UBUF,
@@ -74,7 +72,6 @@ struct iov_iter {
                                const struct kvec *kvec;
                                const struct bio_vec *bvec;
                                struct xarray *xarray;
-                               struct pipe_inode_info *pipe;
                                void __user *ubuf;
                        };
                        size_t count;
@@ -82,10 +79,6 @@ struct iov_iter {
        };
        union {
                unsigned long nr_segs;
-               struct {
-                       unsigned int head;
-                       unsigned int start_head;
-               };
                loff_t xarray_start;
        };
 };
@@ -133,11 +126,6 @@ static inline bool iov_iter_is_bvec(const struct iov_iter *i)
        return iov_iter_type(i) == ITER_BVEC;
 }
 
-static inline bool iov_iter_is_pipe(const struct iov_iter *i)
-{
-       return iov_iter_type(i) == ITER_PIPE;
-}
-
 static inline bool iov_iter_is_discard(const struct iov_iter *i)
 {
        return iov_iter_type(i) == ITER_DISCARD;
@@ -286,19 +274,11 @@ void iov_iter_kvec(struct iov_iter *i, unsigned int direction, const struct kvec
                        unsigned long nr_segs, size_t count);
 void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_vec *bvec,
                        unsigned long nr_segs, size_t count);
-void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode_info *pipe,
-                       size_t count);
 void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count);
 void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray,
                     loff_t start, size_t count);
-ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
-               size_t maxsize, unsigned maxpages, size_t *start,
-               iov_iter_extraction_t extraction_flags);
 ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
                        size_t maxsize, unsigned maxpages, size_t *start);
-ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
-               struct page ***pages, size_t maxsize, size_t *start,
-               iov_iter_extraction_t extraction_flags);
 ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages,
                        size_t maxsize, size_t *start);
 int iov_iter_npages(const struct iov_iter *i, int maxpages);
index a2448e9..07531c4 100644 (file)
@@ -443,7 +443,7 @@ static inline struct usb_composite_driver *to_cdriver(
  * @bcd_webusb_version: 0x0100 by default, WebUSB specification version
  * @b_webusb_vendor_code: 0x0 by default, vendor code for WebUSB
  * @landing_page: empty by default, landing page to announce in WebUSB
- * @use_webusb:: false by default, interested gadgets set it
+ * @use_webusb: false by default, interested gadgets set it
  * @os_desc_config: the configuration to be used with OS descriptors
  * @setup_pending: true when setup request is queued but not completed
  * @os_desc_pending: true when os_desc request is queued but not completed
index 094c77e..0c7eff9 100644 (file)
@@ -501,6 +501,11 @@ void *hcd_buffer_alloc(struct usb_bus *bus, size_t size,
 void hcd_buffer_free(struct usb_bus *bus, size_t size,
        void *addr, dma_addr_t dma);
 
+void *hcd_buffer_alloc_pages(struct usb_hcd *hcd,
+               size_t size, gfp_t mem_flags, dma_addr_t *dma);
+void hcd_buffer_free_pages(struct usb_hcd *hcd,
+               size_t size, void *addr, dma_addr_t dma);
+
 /* generic bus glue, needed for host controllers that don't use PCI */
 extern irqreturn_t usb_hcd_irq(int irq, void *__hcd);
 
index 2847f5a..8afa8c3 100644 (file)
 
 #ifdef CONFIG_USER_EVENTS
 struct user_event_mm {
-       struct list_head        link;
+       struct list_head        mms_link;
        struct list_head        enablers;
        struct mm_struct        *mm;
+       /* Used for one-shot lists, protected by event_mutex */
        struct user_event_mm    *next;
        refcount_t              refcnt;
        refcount_t              tasks;
index fc6bba2..45cd42f 100644 (file)
@@ -38,7 +38,7 @@ struct watch_filter {
 struct watch_queue {
        struct rcu_head         rcu;
        struct watch_filter __rcu *filter;
-       struct pipe_inode_info  *pipe;          /* The pipe we're using as a buffer */
+       struct pipe_inode_info  *pipe;          /* Pipe we use as a buffer, NULL if queue closed */
        struct hlist_head       watches;        /* Contributory watches */
        struct page             **notes;        /* Preallocated notifications */
        unsigned long           *notes_bitmap;  /* Allocation bitmap for notes */
@@ -46,7 +46,6 @@ struct watch_queue {
        spinlock_t              lock;
        unsigned int            nr_notes;       /* Number of notes */
        unsigned int            nr_pages;       /* Number of pages in notes[] */
-       bool                    defunct;        /* T when queues closed */
 };
 
 /*
index 3992c99..683efe2 100644 (file)
@@ -68,7 +68,6 @@ enum {
        WORK_OFFQ_FLAG_BASE     = WORK_STRUCT_COLOR_SHIFT,
 
        __WORK_OFFQ_CANCELING   = WORK_OFFQ_FLAG_BASE,
-       WORK_OFFQ_CANCELING     = (1 << __WORK_OFFQ_CANCELING),
 
        /*
         * When a work item is off queue, its high bits point to the last
@@ -79,12 +78,6 @@ enum {
        WORK_OFFQ_POOL_SHIFT    = WORK_OFFQ_FLAG_BASE + WORK_OFFQ_FLAG_BITS,
        WORK_OFFQ_LEFT          = BITS_PER_LONG - WORK_OFFQ_POOL_SHIFT,
        WORK_OFFQ_POOL_BITS     = WORK_OFFQ_LEFT <= 31 ? WORK_OFFQ_LEFT : 31,
-       WORK_OFFQ_POOL_NONE     = (1LU << WORK_OFFQ_POOL_BITS) - 1,
-
-       /* convenience constants */
-       WORK_STRUCT_FLAG_MASK   = (1UL << WORK_STRUCT_FLAG_BITS) - 1,
-       WORK_STRUCT_WQ_DATA_MASK = ~WORK_STRUCT_FLAG_MASK,
-       WORK_STRUCT_NO_POOL     = (unsigned long)WORK_OFFQ_POOL_NONE << WORK_OFFQ_POOL_SHIFT,
 
        /* bit mask for work_busy() return values */
        WORK_BUSY_PENDING       = 1 << 0,
@@ -94,6 +87,14 @@ enum {
        WORKER_DESC_LEN         = 24,
 };
 
+/* Convenience constants - of type 'unsigned long', not 'enum'! */
+#define WORK_OFFQ_CANCELING    (1ul << __WORK_OFFQ_CANCELING)
+#define WORK_OFFQ_POOL_NONE    ((1ul << WORK_OFFQ_POOL_BITS) - 1)
+#define WORK_STRUCT_NO_POOL    (WORK_OFFQ_POOL_NONE << WORK_OFFQ_POOL_SHIFT)
+
+#define WORK_STRUCT_FLAG_MASK    ((1ul << WORK_STRUCT_FLAG_BITS) - 1)
+#define WORK_STRUCT_WQ_DATA_MASK (~WORK_STRUCT_FLAG_MASK)
+
 struct work_struct {
        atomic_long_t data;
        struct list_head entry;
index 9980b1d..4a921ea 100644 (file)
@@ -39,6 +39,9 @@ struct net_device;
  * @exit:              flag to indicate when the device is being removed.
  * @demux:             pointer to &struct dmx_demux.
  * @ioctl_mutex:       protect access to this struct.
+ * @remove_mutex:      mutex that avoids a race condition between a callback
+ *                     called when the hardware is disconnected and the
+ *                     file_operations of dvb_net.
  *
  * Currently, the core supports up to %DVB_NET_DEVICES_MAX (10) network
  * devices.
@@ -51,6 +54,7 @@ struct dvb_net {
        unsigned int exit:1;
        struct dmx_demux *demux;
        struct mutex ioctl_mutex;
+       struct mutex remove_mutex;
 };
 
 /**
index 29d25c8..8958e5e 100644 (file)
@@ -194,6 +194,21 @@ struct dvb_device {
 };
 
 /**
+ * struct dvbdevfops_node - fops nodes registered in dvbdevfops_list
+ *
+ * @fops:              Dynamically allocated fops for ->owner registration
+ * @type:              type of dvb_device
+ * @template:          dvb_device used for registration
+ * @list_head:         list_head for dvbdevfops_list
+ */
+struct dvbdevfops_node {
+       struct file_operations *fops;
+       enum dvb_device_type type;
+       const struct dvb_device *template;
+       struct list_head list_head;
+};
+
+/**
  * dvb_device_get - Increase dvb_device reference
  *
  * @dvbdev:    pointer to struct dvb_device
index cfd19e7..b325df0 100644 (file)
@@ -1119,6 +1119,7 @@ struct v4l2_subdev {
  * @vfh: pointer to &struct v4l2_fh
  * @state: pointer to &struct v4l2_subdev_state
  * @owner: module pointer to the owner of this file handle
+ * @client_caps: bitmask of ``V4L2_SUBDEV_CLIENT_CAP_*``
  */
 struct v4l2_subdev_fh {
        struct v4l2_fh vfh;
index 07df96c..872dcb9 100644 (file)
@@ -350,6 +350,7 @@ enum {
 enum {
        HCI_SETUP,
        HCI_CONFIG,
+       HCI_DEBUGFS_CREATED,
        HCI_AUTO_OFF,
        HCI_RFKILLED,
        HCI_MGMT,
index a6c8aee..9654567 100644 (file)
@@ -515,6 +515,7 @@ struct hci_dev {
        struct work_struct      cmd_sync_work;
        struct list_head        cmd_sync_work_list;
        struct mutex            cmd_sync_work_lock;
+       struct mutex            unregister_lock;
        struct work_struct      cmd_sync_cancel_work;
        struct work_struct      reenable_adv_work;
 
@@ -1201,7 +1202,8 @@ static inline struct hci_conn *hci_conn_hash_lookup_cis(struct hci_dev *hdev,
                if (id != BT_ISO_QOS_CIS_UNSET && id != c->iso_qos.ucast.cis)
                        continue;
 
-               if (ba_type == c->dst_type && !bacmp(&c->dst, ba)) {
+               /* Match destination address if set */
+               if (!ba || (ba_type == c->dst_type && !bacmp(&c->dst, ba))) {
                        rcu_read_unlock();
                        return c;
                }
@@ -1327,7 +1329,7 @@ int hci_le_create_cis(struct hci_conn *conn);
 
 struct hci_conn *hci_conn_add(struct hci_dev *hdev, int type, bdaddr_t *dst,
                              u8 role);
-int hci_conn_del(struct hci_conn *conn);
+void hci_conn_del(struct hci_conn *conn);
 void hci_conn_hash_flush(struct hci_dev *hdev);
 void hci_conn_check_pending(struct hci_dev *hdev);
 
index a60a249..59955ac 100644 (file)
@@ -221,6 +221,7 @@ struct bonding {
        struct   bond_up_slave __rcu *usable_slaves;
        struct   bond_up_slave __rcu *all_slaves;
        bool     force_primary;
+       bool     notifier_ctx;
        s32      slave_cnt; /* never change this value outside the attach/detach wrappers */
        int     (*recv_probe)(const struct sk_buff *, struct bonding *,
                              struct slave *);
@@ -233,7 +234,7 @@ struct bonding {
         */
        spinlock_t mode_lock;
        spinlock_t stats_lock;
-       u8       send_peer_notif;
+       u32      send_peer_notif;
        u8       igmp_retrans;
 #ifdef CONFIG_PROC_FS
        struct   proc_dir_entry *proc_entry;
index 8903053..ab0f0a5 100644 (file)
@@ -959,6 +959,14 @@ struct dsa_switch_ops {
        void    (*port_disable)(struct dsa_switch *ds, int port);
 
        /*
+        * Compatibility between device trees defining multiple CPU ports and
+        * drivers which are not OK to use by default the numerically smallest
+        * CPU port of a switch for its local ports. This can return NULL,
+        * meaning "don't know/don't care".
+        */
+       struct dsa_port *(*preferred_default_local_cpu_port)(struct dsa_switch *ds);
+
+       /*
         * Port's MAC EEE settings
         */
        int     (*set_mac_eee)(struct dsa_switch *ds, int port,
index 3352b1a..2e26e43 100644 (file)
@@ -24,6 +24,7 @@ struct tls_handshake_args {
        struct socket           *ta_sock;
        tls_done_func_t         ta_done;
        void                    *ta_data;
+       const char              *ta_peername;
        unsigned int            ta_timeout_ms;
        key_serial_t            ta_keyring;
        key_serial_t            ta_my_cert;
index c3fffaa..acec504 100644 (file)
@@ -76,6 +76,7 @@ struct ipcm_cookie {
        __be32                  addr;
        int                     oif;
        struct ip_options_rcu   *opt;
+       __u8                    protocol;
        __u8                    ttl;
        __s16                   tos;
        char                    priority;
@@ -96,6 +97,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm,
        ipcm->sockc.tsflags = inet->sk.sk_tsflags;
        ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if);
        ipcm->addr = inet->inet_saddr;
+       ipcm->protocol = inet->inet_num;
 }
 
 #define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
index cd386aa..9eef199 100644 (file)
@@ -347,10 +347,8 @@ struct mana_tx_qp {
 struct mana_ethtool_stats {
        u64 stop_queue;
        u64 wake_queue;
-       u64 tx_cqes;
        u64 tx_cqe_err;
        u64 tx_cqe_unknown_type;
-       u64 rx_cqes;
        u64 rx_coalesced_err;
        u64 rx_cqe_unknown_type;
 };
index 3fa5774..f6a8ecc 100644 (file)
@@ -180,7 +180,7 @@ struct pneigh_entry {
        netdevice_tracker       dev_tracker;
        u32                     flags;
        u8                      protocol;
-       u8                      key[];
+       u32                     key[];
 };
 
 /*
index ebb28ec..f37f9f3 100644 (file)
@@ -268,7 +268,7 @@ int flow_offload_route_init(struct flow_offload *flow,
 
 int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow);
 void flow_offload_refresh(struct nf_flowtable *flow_table,
-                         struct flow_offload *flow);
+                         struct flow_offload *flow, bool force);
 
 struct flow_offload_tuple_rhash *flow_offload_lookup(struct nf_flowtable *flow_table,
                                                     struct flow_offload_tuple *tuple);
index 2e24ea1..ee47d71 100644 (file)
@@ -462,7 +462,8 @@ struct nft_set_ops {
                                               const struct nft_set *set,
                                               const struct nft_set_elem *elem,
                                               unsigned int flags);
-
+       void                            (*commit)(const struct nft_set *set);
+       void                            (*abort)(const struct nft_set *set);
        u64                             (*privsize)(const struct nlattr * const nla[],
                                                    const struct nft_set_desc *desc);
        bool                            (*estimate)(const struct nft_set_desc *desc,
@@ -471,7 +472,8 @@ struct nft_set_ops {
        int                             (*init)(const struct nft_set *set,
                                                const struct nft_set_desc *desc,
                                                const struct nlattr * const nla[]);
-       void                            (*destroy)(const struct nft_set *set);
+       void                            (*destroy)(const struct nft_ctx *ctx,
+                                                  const struct nft_set *set);
        void                            (*gc_init)(const struct nft_set *set);
 
        unsigned int                    elemsize;
@@ -557,6 +559,7 @@ struct nft_set {
        u16                             policy;
        u16                             udlen;
        unsigned char                   *udata;
+       struct list_head                pending_update;
        /* runtime data below here */
        const struct nft_set_ops        *ops ____cacheline_aligned;
        u16                             flags:14,
@@ -807,6 +810,8 @@ int nft_set_elem_expr_clone(const struct nft_ctx *ctx, struct nft_set *set,
                            struct nft_expr *expr_array[]);
 void nft_set_elem_destroy(const struct nft_set *set, void *elem,
                          bool destroy_expr);
+void nf_tables_set_elem_destroy(const struct nft_ctx *ctx,
+                               const struct nft_set *set, void *elem);
 
 /**
  *     struct nft_set_gc_batch_head - nf_tables set garbage collection batch
@@ -899,6 +904,7 @@ struct nft_expr_type {
 
 enum nft_trans_phase {
        NFT_TRANS_PREPARE,
+       NFT_TRANS_PREPARE_ERROR,
        NFT_TRANS_ABORT,
        NFT_TRANS_COMMIT,
        NFT_TRANS_RELEASE
@@ -1007,7 +1013,10 @@ static inline struct nft_userdata *nft_userdata(const struct nft_rule *rule)
        return (void *)&rule->data[rule->dlen];
 }
 
-void nf_tables_rule_release(const struct nft_ctx *ctx, struct nft_rule *rule);
+void nft_rule_expr_activate(const struct nft_ctx *ctx, struct nft_rule *rule);
+void nft_rule_expr_deactivate(const struct nft_ctx *ctx, struct nft_rule *rule,
+                             enum nft_trans_phase phase);
+void nf_tables_rule_destroy(const struct nft_ctx *ctx, struct nft_rule *rule);
 
 static inline void nft_set_elem_update_expr(const struct nft_set_ext *ext,
                                            struct nft_regs *regs,
@@ -1102,6 +1111,8 @@ int nft_setelem_validate(const struct nft_ctx *ctx, struct nft_set *set,
                         const struct nft_set_iter *iter,
                         struct nft_set_elem *elem);
 int nft_set_catchall_validate(const struct nft_ctx *ctx, struct nft_set *set);
+int nf_tables_bind_chain(const struct nft_ctx *ctx, struct nft_chain *chain);
+void nf_tables_unbind_chain(const struct nft_ctx *ctx, struct nft_chain *chain);
 
 enum nft_chain_types {
        NFT_CHAIN_T_DEFAULT = 0,
@@ -1138,11 +1149,17 @@ int nft_chain_validate_dependency(const struct nft_chain *chain,
 int nft_chain_validate_hooks(const struct nft_chain *chain,
                              unsigned int hook_flags);
 
+static inline bool nft_chain_binding(const struct nft_chain *chain)
+{
+       return chain->flags & NFT_CHAIN_BINDING;
+}
+
 static inline bool nft_chain_is_bound(struct nft_chain *chain)
 {
        return (chain->flags & NFT_CHAIN_BINDING) && chain->bound;
 }
 
+int nft_chain_add(struct nft_table *table, struct nft_chain *chain);
 void nft_chain_del(struct nft_chain *chain);
 void nf_tables_chain_destroy(struct nft_ctx *ctx);
 
@@ -1556,6 +1573,7 @@ static inline void nft_set_elem_clear_busy(struct nft_set_ext *ext)
  *     struct nft_trans - nf_tables object update in transaction
  *
  *     @list: used internally
+ *     @binding_list: list of objects with possible bindings
  *     @msg_type: message type
  *     @put_net: ctx->net needs to be put
  *     @ctx: transaction context
@@ -1563,6 +1581,7 @@ static inline void nft_set_elem_clear_busy(struct nft_set_ext *ext)
  */
 struct nft_trans {
        struct list_head                list;
+       struct list_head                binding_list;
        int                             msg_type;
        bool                            put_net;
        struct nft_ctx                  ctx;
@@ -1573,6 +1592,7 @@ struct nft_trans_rule {
        struct nft_rule                 *rule;
        struct nft_flow_rule            *flow;
        u32                             rule_id;
+       bool                            bound;
 };
 
 #define nft_trans_rule(trans)  \
@@ -1581,6 +1601,8 @@ struct nft_trans_rule {
        (((struct nft_trans_rule *)trans->data)->flow)
 #define nft_trans_rule_id(trans)       \
        (((struct nft_trans_rule *)trans->data)->rule_id)
+#define nft_trans_rule_bound(trans)    \
+       (((struct nft_trans_rule *)trans->data)->bound)
 
 struct nft_trans_set {
        struct nft_set                  *set;
@@ -1605,15 +1627,19 @@ struct nft_trans_set {
        (((struct nft_trans_set *)trans->data)->gc_int)
 
 struct nft_trans_chain {
+       struct nft_chain                *chain;
        bool                            update;
        char                            *name;
        struct nft_stats __percpu       *stats;
        u8                              policy;
+       bool                            bound;
        u32                             chain_id;
        struct nft_base_chain           *basechain;
        struct list_head                hook_list;
 };
 
+#define nft_trans_chain(trans) \
+       (((struct nft_trans_chain *)trans->data)->chain)
 #define nft_trans_chain_update(trans)  \
        (((struct nft_trans_chain *)trans->data)->update)
 #define nft_trans_chain_name(trans)    \
@@ -1622,6 +1648,8 @@ struct nft_trans_chain {
        (((struct nft_trans_chain *)trans->data)->stats)
 #define nft_trans_chain_policy(trans)  \
        (((struct nft_trans_chain *)trans->data)->policy)
+#define nft_trans_chain_bound(trans)   \
+       (((struct nft_trans_chain *)trans->data)->bound)
 #define nft_trans_chain_id(trans)      \
        (((struct nft_trans_chain *)trans->data)->chain_id)
 #define nft_trans_basechain(trans)     \
@@ -1698,6 +1726,7 @@ static inline int nft_request_module(struct net *net, const char *fmt, ...) { re
 struct nftables_pernet {
        struct list_head        tables;
        struct list_head        commit_list;
+       struct list_head        binding_list;
        struct list_head        module_list;
        struct list_head        notify_list;
        struct mutex            commit_mutex;
index 3cceb3e..5f2cfd8 100644 (file)
@@ -53,7 +53,7 @@ struct netns_sysctl_ipv6 {
        int seg6_flowlabel;
        u32 ioam6_id;
        u64 ioam6_id_wide;
-       bool skip_notify_on_dev_down;
+       u8 skip_notify_on_dev_down;
        u8 fib_notify_on_flag_change;
        u8 icmpv6_error_anycast_as_unicast;
 };
index 9fa291a..2b12725 100644 (file)
@@ -497,29 +497,6 @@ static inline struct fib6_nh *nexthop_fib6_nh(struct nexthop *nh)
        return NULL;
 }
 
-/* Variant of nexthop_fib6_nh().
- * Caller should either hold rcu_read_lock(), or RTNL.
- */
-static inline struct fib6_nh *nexthop_fib6_nh_bh(struct nexthop *nh)
-{
-       struct nh_info *nhi;
-
-       if (nh->is_group) {
-               struct nh_group *nh_grp;
-
-               nh_grp = rcu_dereference_rtnl(nh->nh_grp);
-               nh = nexthop_mpath_select(nh_grp, 0);
-               if (!nh)
-                       return NULL;
-       }
-
-       nhi = rcu_dereference_rtnl(nh->nh_info);
-       if (nhi->family == AF_INET6)
-               return &nhi->fib6_nh;
-
-       return NULL;
-}
-
 static inline struct net_device *fib6_info_nh_dev(struct fib6_info *f6i)
 {
        struct fib6_nh *fib6_nh;
index c8ec2f3..126f9e2 100644 (file)
@@ -399,22 +399,4 @@ static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid)
                page_pool_update_nid(pool, new_nid);
 }
 
-static inline void page_pool_ring_lock(struct page_pool *pool)
-       __acquires(&pool->ring.producer_lock)
-{
-       if (in_softirq())
-               spin_lock(&pool->ring.producer_lock);
-       else
-               spin_lock_bh(&pool->ring.producer_lock);
-}
-
-static inline void page_pool_ring_unlock(struct page_pool *pool)
-       __releases(&pool->ring.producer_lock)
-{
-       if (in_softirq())
-               spin_unlock(&pool->ring.producer_lock);
-       else
-               spin_unlock_bh(&pool->ring.producer_lock);
-}
-
 #endif /* _NET_PAGE_POOL_H */
index 9233ad3..bc77792 100644 (file)
 #define PING_HTABLE_SIZE       64
 #define PING_HTABLE_MASK       (PING_HTABLE_SIZE-1)
 
-/*
- * gid_t is either uint or ushort.  We want to pass it to
- * proc_dointvec_minmax(), so it must not be larger than MAX_INT
- */
-#define GID_T_MAX (((gid_t)~0U) >> 1)
+#define GID_T_MAX (((gid_t)~0U) - 1)
 
 /* Compatibility glue so we can support IPv6 when it's compiled as a module */
 struct pingv6_ops {
index f436688..5722931 100644 (file)
@@ -127,6 +127,8 @@ static inline void qdisc_run(struct Qdisc *q)
        }
 }
 
+extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
+
 /* Calculate maximal size of packet seen by hard_start_xmit
    routine of this device.
  */
index 308ef0a..30fe780 100644 (file)
@@ -23,9 +23,6 @@ static inline int rpl_init(void)
 static inline void rpl_exit(void) {}
 #endif
 
-/* Worst decompression memory usage ipv6 address (16) + pad 7 */
-#define IPV6_RPL_SRH_WORST_SWAP_SIZE (sizeof(struct in6_addr) + 7)
-
 size_t ipv6_rpl_srh_size(unsigned char n, unsigned char cmpri,
                         unsigned char cmpre);
 
index fab5ba3..12eadec 100644 (file)
@@ -137,6 +137,13 @@ static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
        refcount_inc(&qdisc->refcnt);
 }
 
+static inline bool qdisc_refcount_dec_if_one(struct Qdisc *qdisc)
+{
+       if (qdisc->flags & TCQ_F_BUILTIN)
+               return true;
+       return refcount_dec_if_one(&qdisc->refcnt);
+}
+
 /* Intended to be used by unlocked users, when concurrent qdisc release is
  * possible.
  */
@@ -545,7 +552,7 @@ static inline struct Qdisc *qdisc_root_bh(const struct Qdisc *qdisc)
 
 static inline struct Qdisc *qdisc_root_sleeping(const struct Qdisc *qdisc)
 {
-       return qdisc->dev_queue->qdisc_sleeping;
+       return rcu_dereference_rtnl(qdisc->dev_queue->qdisc_sleeping);
 }
 
 static inline spinlock_t *qdisc_root_sleeping_lock(const struct Qdisc *qdisc)
@@ -652,6 +659,7 @@ void dev_deactivate_many(struct list_head *head);
 struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
                              struct Qdisc *qdisc);
 void qdisc_reset(struct Qdisc *qdisc);
+void qdisc_destroy(struct Qdisc *qdisc);
 void qdisc_put(struct Qdisc *qdisc);
 void qdisc_put_unlocked(struct Qdisc *qdisc);
 void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, int n, int len);
@@ -754,7 +762,9 @@ static inline bool qdisc_tx_changing(const struct net_device *dev)
 
        for (i = 0; i < dev->num_tx_queues; i++) {
                struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
-               if (rcu_access_pointer(txq->qdisc) != txq->qdisc_sleeping)
+
+               if (rcu_access_pointer(txq->qdisc) !=
+                   rcu_access_pointer(txq->qdisc_sleeping))
                        return true;
        }
        return false;
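
qdisc_refcount_dec_if_one() and the newly exported qdisc_destroy() let a caller free a qdisc immediately, but only when it holds the last reference. A hedged sketch of that pattern (not the exact call site in this series):

	if (qdisc_refcount_dec_if_one(old))
		qdisc_destroy(old);	/* we held the only reference: free now */
	else
		qdisc_put(old);		/* still referenced elsewhere: plain refcounted put */
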
index 8b7ed71..6f428a7 100644 (file)
@@ -336,6 +336,7 @@ struct sk_filter;
   *    @sk_cgrp_data: cgroup data for this cgroup
   *    @sk_memcg: this socket's memory cgroup association
   *    @sk_write_pending: a write to stream socket waits to start
+  *    @sk_wait_pending: number of threads blocked on this socket
   *    @sk_state_change: callback to indicate change in the state of the sock
   *    @sk_data_ready: callback to indicate there is data to be processed
   *    @sk_write_space: callback to indicate there is bf sending space available
@@ -428,6 +429,7 @@ struct sock {
        unsigned int            sk_napi_id;
 #endif
        int                     sk_rcvbuf;
+       int                     sk_wait_pending;
 
        struct sk_filter __rcu  *sk_filter;
        union {
@@ -1150,8 +1152,12 @@ static inline void sock_rps_record_flow(const struct sock *sk)
                 * OR   an additional socket flag
                 * [1] : sk_state and sk_prot are in the same cache line.
                 */
-               if (sk->sk_state == TCP_ESTABLISHED)
-                       sock_rps_record_flow_hash(sk->sk_rxhash);
+               if (sk->sk_state == TCP_ESTABLISHED) {
+                       /* This READ_ONCE() is paired with the WRITE_ONCE()
+                        * from sock_rps_save_rxhash() and sock_rps_reset_rxhash().
+                        */
+                       sock_rps_record_flow_hash(READ_ONCE(sk->sk_rxhash));
+               }
        }
 #endif
 }
@@ -1160,20 +1166,25 @@ static inline void sock_rps_save_rxhash(struct sock *sk,
                                        const struct sk_buff *skb)
 {
 #ifdef CONFIG_RPS
-       if (unlikely(sk->sk_rxhash != skb->hash))
-               sk->sk_rxhash = skb->hash;
+       /* The following WRITE_ONCE() is paired with the READ_ONCE()
+        * here, and another one in sock_rps_record_flow().
+        */
+       if (unlikely(READ_ONCE(sk->sk_rxhash) != skb->hash))
+               WRITE_ONCE(sk->sk_rxhash, skb->hash);
 #endif
 }
 
 static inline void sock_rps_reset_rxhash(struct sock *sk)
 {
 #ifdef CONFIG_RPS
-       sk->sk_rxhash = 0;
+       /* Paired with READ_ONCE() in sock_rps_record_flow() */
+       WRITE_ONCE(sk->sk_rxhash, 0);
 #endif
 }
 
 #define sk_wait_event(__sk, __timeo, __condition, __wait)              \
        ({      int __rc;                                               \
+               __sk->sk_wait_pending++;                                \
                release_sock(__sk);                                     \
                __rc = __condition;                                     \
                if (!__rc) {                                            \
@@ -1183,6 +1194,7 @@ static inline void sock_rps_reset_rxhash(struct sock *sk)
                }                                                       \
                sched_annotate_sleep();                                 \
                lock_sock(__sk);                                        \
+               __sk->sk_wait_pending--;                                \
                __rc = __condition;                                     \
                __rc;                                                   \
        })
@@ -2718,7 +2730,7 @@ static inline void sock_recv_cmsgs(struct msghdr *msg, struct sock *sk,
                __sock_recv_cmsgs(msg, sk, skb);
        else if (unlikely(sock_flag(sk, SOCK_TIMESTAMP)))
                sock_write_timestamp(sk, skb->tstamp);
-       else if (unlikely(sk->sk_stamp == SK_DEFAULT_STAMP))
+       else if (unlikely(sock_read_timestamp(sk) == SK_DEFAULT_STAMP))
                sock_write_timestamp(sk, 0);
 }
 
index 04a3164..5066e45 100644 (file)
@@ -632,6 +632,7 @@ void tcp_reset(struct sock *sk, struct sk_buff *skb);
 void tcp_skb_mark_lost_uncond_verify(struct tcp_sock *tp, struct sk_buff *skb);
 void tcp_fin(struct sock *sk);
 void tcp_check_space(struct sock *sk);
+void tcp_sack_compress_send_ack(struct sock *sk);
 
 /* tcp_timer.c */
 void tcp_init_xmit_timers(struct sock *);
@@ -1470,6 +1471,8 @@ static inline void tcp_adjust_rcv_ssthresh(struct sock *sk)
 }
 
 void tcp_cleanup_rbuf(struct sock *sk, int copied);
+void __tcp_cleanup_rbuf(struct sock *sk, int copied);
+
 
 /* We provision sk_rcvbuf around 200% of sk_rcvlowat.
  * If 87.5 % (7/8) of the space has been consumed, we want to override
@@ -2326,6 +2329,14 @@ int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore);
 void tcp_bpf_clone(const struct sock *sk, struct sock *newsk);
 #endif /* CONFIG_BPF_SYSCALL */
 
+#ifdef CONFIG_INET
+void tcp_eat_skb(struct sock *sk, struct sk_buff *skb);
+#else
+static inline void tcp_eat_skb(struct sock *sk, struct sk_buff *skb)
+{
+}
+#endif
+
 int tcp_bpf_sendmsg_redir(struct sock *sk, bool ingress,
                          struct sk_msg *msg, u32 bytes, int flags);
 #endif /* CONFIG_NET_SOCK_MSG */
index 6056ce5..596595c 100644 (file)
@@ -126,6 +126,7 @@ struct tls_strparser {
        u32 mark : 8;
        u32 stopped : 1;
        u32 copy_mode : 1;
+       u32 mixed_decrypted : 1;
        u32 msg_ready : 1;
 
        struct strp_msg stm;
index 33ee3f5..151ca95 100644 (file)
@@ -1054,6 +1054,7 @@ struct xfrm_offload {
 struct sec_path {
        int                     len;
        int                     olen;
+       int                     verified_cnt;
 
        struct xfrm_state       *xvec[XFRM_MAX_DEPTH];
        struct xfrm_offload     ovec[XFRM_MAX_OFFLOAD_DEPTH];
index d808dc3..811a0f1 100644 (file)
@@ -194,29 +194,6 @@ static inline enum ib_mtu iboe_get_mtu(int mtu)
                return 0;
 }
 
-static inline int iboe_get_rate(struct net_device *dev)
-{
-       struct ethtool_link_ksettings cmd;
-       int err;
-
-       rtnl_lock();
-       err = __ethtool_get_link_ksettings(dev, &cmd);
-       rtnl_unlock();
-       if (err)
-               return IB_RATE_PORT_CURRENT;
-
-       if (cmd.base.speed >= 40000)
-               return IB_RATE_40_GBPS;
-       else if (cmd.base.speed >= 30000)
-               return IB_RATE_30_GBPS;
-       else if (cmd.base.speed >= 20000)
-               return IB_RATE_20_GBPS;
-       else if (cmd.base.speed >= 10000)
-               return IB_RATE_10_GBPS;
-       else
-               return IB_RATE_PORT_CURRENT;
-}
-
 static inline int rdma_link_local_addr(struct in6_addr *addr)
 {
        if (addr->s6_addr32[0] == htonl(0xfe800000) &&
index beac64e..a207c07 100644 (file)
@@ -45,11 +45,11 @@ typedef struct scsi_fctargaddress {
 
 int scsi_ioctl_block_when_processing_errors(struct scsi_device *sdev,
                int cmd, bool ndelay);
-int scsi_ioctl(struct scsi_device *sdev, fmode_t mode, int cmd,
+int scsi_ioctl(struct scsi_device *sdev, bool open_for_write, int cmd,
                void __user *arg);
 int get_sg_io_hdr(struct sg_io_hdr *hdr, const void __user *argp);
 int put_sg_io_hdr(const struct sg_io_hdr *hdr, void __user *argp);
-bool scsi_cmd_allowed(unsigned char *cmd, fmode_t mode);
+bool scsi_cmd_allowed(unsigned char *cmd, bool open_for_write);
 
 #endif /* __KERNEL__ */
 #endif /* _SCSI_IOCTL_H */
diff --git a/include/soc/imx/timer.h b/include/soc/imx/timer.h
deleted file mode 100644 (file)
index 25f29c6..0000000
+++ /dev/null
@@ -1,16 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright 2015 Linaro Ltd.
- */
-
-#ifndef __SOC_IMX_TIMER_H__
-#define __SOC_IMX_TIMER_H__
-
-enum imx_gpt_type {
-       GPT_TYPE_IMX1,          /* i.MX1 */
-       GPT_TYPE_IMX21,         /* i.MX21/27 */
-       GPT_TYPE_IMX31,         /* i.MX31/35/25/37/51/6Q */
-       GPT_TYPE_IMX6DL,        /* i.MX6DL/SX/SL */
-};
-
-#endif  /* __SOC_IMX_TIMER_H__ */
index dbc47af..4f44f0b 100644 (file)
@@ -44,6 +44,9 @@ int hdac_bus_eml_sdw_power_down_unlocked(struct hdac_bus *bus, int sublink);
 
 int hdac_bus_eml_sdw_set_lsdiid(struct hdac_bus *bus, int sublink, int dev_num);
 
+int hdac_bus_eml_sdw_map_stream_ch(struct hdac_bus *bus, int sublink, int y,
+                                  int channel_mask, int stream_id, int dir);
+
 void hda_bus_ml_put_all(struct hdac_bus *bus);
 void hda_bus_ml_reset_losidv(struct hdac_bus *bus);
 int hda_bus_ml_resume(struct hdac_bus *bus);
@@ -51,6 +54,7 @@ int hda_bus_ml_suspend(struct hdac_bus *bus);
 
 struct hdac_ext_link *hdac_bus_eml_ssp_get_hlink(struct hdac_bus *bus);
 struct hdac_ext_link *hdac_bus_eml_dmic_get_hlink(struct hdac_bus *bus);
+struct hdac_ext_link *hdac_bus_eml_sdw_get_hlink(struct hdac_bus *bus);
 
 struct mutex *hdac_bus_eml_get_mutex(struct hdac_bus *bus, bool alt, int elid);
 
@@ -144,6 +148,13 @@ hdac_bus_eml_sdw_power_down_unlocked(struct hdac_bus *bus, int sublink) { return
 static inline int
 hdac_bus_eml_sdw_set_lsdiid(struct hdac_bus *bus, int sublink, int dev_num) { return 0; }
 
+static inline int
+hdac_bus_eml_sdw_map_stream_ch(struct hdac_bus *bus, int sublink, int y,
+                              int channel_mask, int stream_id, int dir)
+{
+       return 0;
+}
+
 static inline void hda_bus_ml_put_all(struct hdac_bus *bus) { }
 static inline void hda_bus_ml_reset_losidv(struct hdac_bus *bus) { }
 static inline int hda_bus_ml_resume(struct hdac_bus *bus) { return 0; }
@@ -155,6 +166,9 @@ hdac_bus_eml_ssp_get_hlink(struct hdac_bus *bus) { return NULL; }
 static inline struct hdac_ext_link *
 hdac_bus_eml_dmic_get_hlink(struct hdac_bus *bus) { return NULL; }
 
+static inline struct hdac_ext_link *
+hdac_bus_eml_sdw_get_hlink(struct hdac_bus *bus) { return NULL; }
+
 static inline struct mutex *
 hdac_bus_eml_get_mutex(struct hdac_bus *bus, bool alt, int elid) { return NULL; }
 
index b38fd25..5282790 100644 (file)
@@ -170,6 +170,7 @@ struct snd_soc_acpi_link_adr {
 /* Descriptor for SST ASoC machine driver */
 struct snd_soc_acpi_mach {
        u8 id[ACPI_ID_LEN];
+       const char *uid;
        const struct snd_soc_acpi_codecs *comp_ids;
        const u32 link_mask;
        const struct snd_soc_acpi_link_adr *links;
index 4d6ac76..ebd2475 100644 (file)
@@ -122,6 +122,10 @@ int snd_soc_dpcm_can_be_free_stop(struct snd_soc_pcm_runtime *fe,
 int snd_soc_dpcm_can_be_params(struct snd_soc_pcm_runtime *fe,
                struct snd_soc_pcm_runtime *be, int stream);
 
+/* can this BE perform prepare */
+int snd_soc_dpcm_can_be_prepared(struct snd_soc_pcm_runtime *fe,
+                                struct snd_soc_pcm_runtime *be, int stream);
+
 /* is the current PCM operation for this FE ? */
 int snd_soc_dpcm_fe_can_update(struct snd_soc_pcm_runtime *fe, int stream);
 
index 2291181..4c15420 100644 (file)
@@ -562,12 +562,13 @@ struct iscsit_conn {
 #define LOGIN_FLAGS_READ_ACTIVE                2
 #define LOGIN_FLAGS_WRITE_ACTIVE       3
 #define LOGIN_FLAGS_CLOSED             4
+#define LOGIN_FLAGS_WORKER_RUNNING     5
        unsigned long           login_flags;
        struct delayed_work     login_work;
        struct iscsi_login      *login;
        struct timer_list       nopin_timer;
        struct timer_list       nopin_response_timer;
-       struct timer_list       transport_timer;
+       struct timer_list       login_timer;
        struct task_struct      *login_kworker;
        /* Spinlock used for add/deleting cmd's from conn_cmd_list */
        spinlock_t              cmd_lock;
@@ -576,6 +577,8 @@ struct iscsit_conn {
        spinlock_t              nopin_timer_lock;
        spinlock_t              response_queue_lock;
        spinlock_t              state_lock;
+       spinlock_t              login_timer_lock;
+       spinlock_t              login_worker_lock;
        /* libcrypto RX and TX contexts for crc32c */
        struct ahash_request    *conn_rx_hash;
        struct ahash_request    *conn_tx_hash;
@@ -792,7 +795,6 @@ struct iscsi_np {
        enum np_thread_state_table np_thread_state;
        bool                    enabled;
        atomic_t                np_reset_count;
-       enum iscsi_timer_flags_table np_login_timer_flags;
        u32                     np_exports;
        enum np_flags_table     np_flags;
        spinlock_t              np_thread_lock;
@@ -800,7 +802,6 @@ struct iscsi_np {
        struct socket           *np_socket;
        struct sockaddr_storage np_sockaddr;
        struct task_struct      *np_thread;
-       struct timer_list       np_login_timer;
        void                    *np_context;
        struct iscsit_transport *np_transport;
        struct list_head        np_list;
index 7f4dfbd..40e60c3 100644 (file)
@@ -246,6 +246,32 @@ DEFINE_EVENT(block_rq, block_rq_merge,
 );
 
 /**
+ * block_io_start - insert a request for execution
+ * @rq: block IO operation request
+ *
+ * Called when block operation request @rq is queued for execution
+ */
+DEFINE_EVENT(block_rq, block_io_start,
+
+       TP_PROTO(struct request *rq),
+
+       TP_ARGS(rq)
+);
+
+/**
+ * block_io_done - block IO operation request completed
+ * @rq: block IO operation request
+ *
+ * Called when block operation request @rq is completed
+ */
+DEFINE_EVENT(block_rq, block_io_done,
+
+       TP_PROTO(struct request *rq),
+
+       TP_ARGS(rq)
+);
+
+/**
  * block_bio_complete - completed all work on the block operation
  * @q: queue holding the block operation
  * @bio: block operation completed
index 8ea9cea..a8206f5 100644 (file)
@@ -661,6 +661,35 @@ DEFINE_EVENT(btrfs__ordered_extent, btrfs_ordered_extent_mark_finished,
             TP_ARGS(inode, ordered)
 );
 
+TRACE_EVENT(btrfs_finish_ordered_extent,
+
+       TP_PROTO(const struct btrfs_inode *inode, u64 start, u64 len,
+                bool uptodate),
+
+       TP_ARGS(inode, start, len, uptodate),
+
+       TP_STRUCT__entry_btrfs(
+               __field(        u64,     ino            )
+               __field(        u64,     start          )
+               __field(        u64,     len            )
+               __field(        bool,    uptodate       )
+               __field(        u64,     root_objectid  )
+       ),
+
+       TP_fast_assign_btrfs(inode->root->fs_info,
+               __entry->ino    = btrfs_ino(inode);
+               __entry->start  = start;
+               __entry->len    = len;
+               __entry->uptodate = uptodate;
+               __entry->root_objectid = inode->root->root_key.objectid;
+       ),
+
+       TP_printk_btrfs("root=%llu(%s) ino=%llu start=%llu len=%llu uptodate=%d",
+                 show_root_type(__entry->root_objectid),
+                 __entry->ino, __entry->start,
+                 __entry->len, !!__entry->uptodate)
+);
+
 DECLARE_EVENT_CLASS(btrfs__writepage,
 
        TP_PROTO(const struct page *page, const struct inode *inode,
@@ -1982,25 +2011,27 @@ DEFINE_EVENT(btrfs__prelim_ref, btrfs_prelim_ref_insert,
 );
 
 TRACE_EVENT(btrfs_inode_mod_outstanding_extents,
-       TP_PROTO(const struct btrfs_root *root, u64 ino, int mod),
+       TP_PROTO(const struct btrfs_root *root, u64 ino, int mod, unsigned outstanding),
 
-       TP_ARGS(root, ino, mod),
+       TP_ARGS(root, ino, mod, outstanding),
 
        TP_STRUCT__entry_btrfs(
                __field(        u64, root_objectid      )
                __field(        u64, ino                )
                __field(        int, mod                )
+               __field(        unsigned, outstanding   )
        ),
 
        TP_fast_assign_btrfs(root->fs_info,
                __entry->root_objectid  = root->root_key.objectid;
                __entry->ino            = ino;
                __entry->mod            = mod;
+               __entry->outstanding    = outstanding;
        ),
 
-       TP_printk_btrfs("root=%llu(%s) ino=%llu mod=%d",
+       TP_printk_btrfs("root=%llu(%s) ino=%llu mod=%d outstanding=%u",
                        show_root_type(__entry->root_objectid),
-                       __entry->ino, __entry->mod)
+                       __entry->ino, __entry->mod, __entry->outstanding)
 );
 
 DECLARE_EVENT_CLASS(btrfs__block_group,
diff --git a/include/trace/events/csd.h b/include/trace/events/csd.h
new file mode 100644 (file)
index 0000000..67e9d01
--- /dev/null
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM csd
+
+#if !defined(_TRACE_CSD_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_CSD_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(csd_queue_cpu,
+
+       TP_PROTO(const unsigned int cpu,
+               unsigned long callsite,
+               smp_call_func_t func,
+               struct __call_single_data *csd),
+
+       TP_ARGS(cpu, callsite, func, csd),
+
+       TP_STRUCT__entry(
+               __field(unsigned int, cpu)
+               __field(void *, callsite)
+               __field(void *, func)
+               __field(void *, csd)
+               ),
+
+           TP_fast_assign(
+               __entry->cpu = cpu;
+               __entry->callsite = (void *)callsite;
+               __entry->func = func;
+               __entry->csd  = csd;
+               ),
+
+       TP_printk("cpu=%u callsite=%pS func=%ps csd=%p",
+               __entry->cpu, __entry->callsite, __entry->func, __entry->csd)
+       );
+
+/*
+ * Tracepoints for a function which is called as an effect of smp_call_function.*
+ */
+DECLARE_EVENT_CLASS(csd_function,
+
+       TP_PROTO(smp_call_func_t func, struct __call_single_data *csd),
+
+       TP_ARGS(func, csd),
+
+       TP_STRUCT__entry(
+               __field(void *, func)
+               __field(void *, csd)
+       ),
+
+       TP_fast_assign(
+               __entry->func   = func;
+               __entry->csd    = csd;
+       ),
+
+       TP_printk("func=%ps, csd=%p", __entry->func, __entry->csd)
+);
+
+DEFINE_EVENT(csd_function, csd_function_entry,
+       TP_PROTO(smp_call_func_t func, struct __call_single_data *csd),
+       TP_ARGS(func, csd)
+);
+
+DEFINE_EVENT(csd_function, csd_function_exit,
+       TP_PROTO(smp_call_func_t func, struct __call_single_data *csd),
+       TP_ARGS(func, csd)
+);
+
+#endif /* _TRACE_CSD_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
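
The csd_queue_cpu event records the enqueue site of a cross-CPU function call, and the csd_function_entry/csd_function_exit pair brackets its execution on the target CPU. A minimal sketch of the kind of caller these events observe, assuming nothing beyond the standard smp_call_function_single() API; flush_remote_state() and example_kick_cpu() are hypothetical names used only for illustration:

        #include <linux/smp.h>

        static void flush_remote_state(void *info)
        {
                /* runs on the target CPU; appears as func=flush_remote_state
                 * in csd_function_entry/csd_function_exit */
        }

        static void example_kick_cpu(int cpu)
        {
                /* csd_queue_cpu logs this call site (callsite=%pS) when the
                 * request is queued for @cpu */
                smp_call_function_single(cpu, flush_remote_state, NULL, 1);
        }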
index 8f461e0..f8069ef 100644 (file)
@@ -2112,6 +2112,14 @@ DEFINE_POST_CHUNK_EVENT(read);
 DEFINE_POST_CHUNK_EVENT(write);
 DEFINE_POST_CHUNK_EVENT(reply);
 
+DEFINE_EVENT(svcrdma_post_chunk_class, svcrdma_cc_release,
+       TP_PROTO(
+               const struct rpc_rdma_cid *cid,
+               int sqecount
+       ),
+       TP_ARGS(cid, sqecount)
+);
+
 TRACE_EVENT(svcrdma_wc_read,
        TP_PROTO(
                const struct ib_wc *wc,
index 31bc702..69e42ef 100644 (file)
@@ -2104,31 +2104,46 @@ DEFINE_SVC_DEFERRED_EVENT(drop);
 DEFINE_SVC_DEFERRED_EVENT(queue);
 DEFINE_SVC_DEFERRED_EVENT(recv);
 
-TRACE_EVENT(svcsock_new_socket,
+DECLARE_EVENT_CLASS(svcsock_lifetime_class,
        TP_PROTO(
+               const void *svsk,
                const struct socket *socket
        ),
-
-       TP_ARGS(socket),
-
+       TP_ARGS(svsk, socket),
        TP_STRUCT__entry(
+               __field(unsigned int, netns_ino)
+               __field(const void *, svsk)
+               __field(const void *, sk)
                __field(unsigned long, type)
                __field(unsigned long, family)
-               __field(bool, listener)
+               __field(unsigned long, state)
        ),
-
        TP_fast_assign(
+               struct sock *sk = socket->sk;
+
+               __entry->netns_ino = sock_net(sk)->ns.inum;
+               __entry->svsk = svsk;
+               __entry->sk = sk;
                __entry->type = socket->type;
-               __entry->family = socket->sk->sk_family;
-               __entry->listener = (socket->sk->sk_state == TCP_LISTEN);
+               __entry->family = sk->sk_family;
+               __entry->state = sk->sk_state;
        ),
-
-       TP_printk("type=%s family=%s%s",
-               show_socket_type(__entry->type),
+       TP_printk("svsk=%p type=%s family=%s%s",
+               __entry->svsk, show_socket_type(__entry->type),
                rpc_show_address_family(__entry->family),
-               __entry->listener ? " (listener)" : ""
+               __entry->state == TCP_LISTEN ? " (listener)" : ""
        )
 );
+#define DEFINE_SVCSOCK_LIFETIME_EVENT(name) \
+       DEFINE_EVENT(svcsock_lifetime_class, name, \
+               TP_PROTO( \
+                       const void *svsk, \
+                       const struct socket *socket \
+               ), \
+               TP_ARGS(svsk, socket))
+
+DEFINE_SVCSOCK_LIFETIME_EVENT(svcsock_new);
+DEFINE_SVCSOCK_LIFETIME_EVENT(svcsock_free);
 
 TRACE_EVENT(svcsock_marker,
        TP_PROTO(
index 3e8619c..b4bc282 100644 (file)
@@ -158,7 +158,11 @@ DEFINE_EVENT(timer_class, timer_cancel,
                { HRTIMER_MODE_ABS_SOFT,        "ABS|SOFT"      },      \
                { HRTIMER_MODE_REL_SOFT,        "REL|SOFT"      },      \
                { HRTIMER_MODE_ABS_PINNED_SOFT, "ABS|PINNED|SOFT" },    \
-               { HRTIMER_MODE_REL_PINNED_SOFT, "REL|PINNED|SOFT" })
+               { HRTIMER_MODE_REL_PINNED_SOFT, "REL|PINNED|SOFT" },    \
+               { HRTIMER_MODE_ABS_HARD,        "ABS|HARD" },           \
+               { HRTIMER_MODE_REL_HARD,        "REL|HARD" },           \
+               { HRTIMER_MODE_ABS_PINNED_HARD, "ABS|PINNED|HARD" },    \
+               { HRTIMER_MODE_REL_PINNED_HARD, "REL|PINNED|HARD" })
 
 /**
  * hrtimer_init - called when the hrtimer is initialized
index 86b2a82..54e353c 100644 (file)
@@ -68,7 +68,7 @@ DECLARE_EVENT_CLASS(writeback_folio_template,
                strscpy_pad(__entry->name,
                            bdi_dev_name(mapping ? inode_to_bdi(mapping->host) :
                                         NULL), 32);
-               __entry->ino = mapping ? mapping->host->i_ino : 0;
+               __entry->ino = (mapping && mapping->host) ? mapping->host->i_ino : 0;
                __entry->index = folio->index;
        ),
 
index 5e2fb84..a5aff2e 100644 (file)
@@ -7,42 +7,42 @@
 /* Just the needed definitions for the RDB of an Amiga HD. */
 
 struct RigidDiskBlock {
-       __u32   rdb_ID;
+       __be32  rdb_ID;
        __be32  rdb_SummedLongs;
-       __s32   rdb_ChkSum;
-       __u32   rdb_HostID;
+       __be32  rdb_ChkSum;
+       __be32  rdb_HostID;
        __be32  rdb_BlockBytes;
-       __u32   rdb_Flags;
-       __u32   rdb_BadBlockList;
+       __be32  rdb_Flags;
+       __be32  rdb_BadBlockList;
        __be32  rdb_PartitionList;
-       __u32   rdb_FileSysHeaderList;
-       __u32   rdb_DriveInit;
-       __u32   rdb_Reserved1[6];
-       __u32   rdb_Cylinders;
-       __u32   rdb_Sectors;
-       __u32   rdb_Heads;
-       __u32   rdb_Interleave;
-       __u32   rdb_Park;
-       __u32   rdb_Reserved2[3];
-       __u32   rdb_WritePreComp;
-       __u32   rdb_ReducedWrite;
-       __u32   rdb_StepRate;
-       __u32   rdb_Reserved3[5];
-       __u32   rdb_RDBBlocksLo;
-       __u32   rdb_RDBBlocksHi;
-       __u32   rdb_LoCylinder;
-       __u32   rdb_HiCylinder;
-       __u32   rdb_CylBlocks;
-       __u32   rdb_AutoParkSeconds;
-       __u32   rdb_HighRDSKBlock;
-       __u32   rdb_Reserved4;
+       __be32  rdb_FileSysHeaderList;
+       __be32  rdb_DriveInit;
+       __be32  rdb_Reserved1[6];
+       __be32  rdb_Cylinders;
+       __be32  rdb_Sectors;
+       __be32  rdb_Heads;
+       __be32  rdb_Interleave;
+       __be32  rdb_Park;
+       __be32  rdb_Reserved2[3];
+       __be32  rdb_WritePreComp;
+       __be32  rdb_ReducedWrite;
+       __be32  rdb_StepRate;
+       __be32  rdb_Reserved3[5];
+       __be32  rdb_RDBBlocksLo;
+       __be32  rdb_RDBBlocksHi;
+       __be32  rdb_LoCylinder;
+       __be32  rdb_HiCylinder;
+       __be32  rdb_CylBlocks;
+       __be32  rdb_AutoParkSeconds;
+       __be32  rdb_HighRDSKBlock;
+       __be32  rdb_Reserved4;
        char    rdb_DiskVendor[8];
        char    rdb_DiskProduct[16];
        char    rdb_DiskRevision[4];
        char    rdb_ControllerVendor[8];
        char    rdb_ControllerProduct[16];
        char    rdb_ControllerRevision[4];
-       __u32   rdb_Reserved5[10];
+       __be32  rdb_Reserved5[10];
 };
 
 #define        IDNAME_RIGIDDISK        0x5244534B      /* "RDSK" */
@@ -50,16 +50,16 @@ struct RigidDiskBlock {
 struct PartitionBlock {
        __be32  pb_ID;
        __be32  pb_SummedLongs;
-       __s32   pb_ChkSum;
-       __u32   pb_HostID;
+       __be32  pb_ChkSum;
+       __be32  pb_HostID;
        __be32  pb_Next;
-       __u32   pb_Flags;
-       __u32   pb_Reserved1[2];
-       __u32   pb_DevFlags;
+       __be32  pb_Flags;
+       __be32  pb_Reserved1[2];
+       __be32  pb_DevFlags;
        __u8    pb_DriveName[32];
-       __u32   pb_Reserved2[15];
+       __be32  pb_Reserved2[15];
        __be32  pb_Environment[17];
-       __u32   pb_EReserved[15];
+       __be32  pb_EReserved[15];
 };
 
 #define        IDNAME_PARTITION        0x50415254      /* "PART" */
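
With every multi-byte RigidDiskBlock and PartitionBlock field now declared __be32, in-kernel users are expected to go through the byte-order helpers rather than reading the raw words. A short sketch under that assumption; rdb_looks_valid() is a hypothetical helper and the SummedLongs bound is purely illustrative:

        #include <linux/affs_hardblocks.h>
        #include <asm/byteorder.h>

        static bool rdb_looks_valid(const struct RigidDiskBlock *rdb)
        {
                /* convert the on-disk big-endian fields before comparing */
                if (be32_to_cpu(rdb->rdb_ID) != IDNAME_RIGIDDISK)
                        return false;
                return be32_to_cpu(rdb->rdb_SummedLongs) <= 128;
        }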
index 1bb11a6..c994ff5 100644 (file)
@@ -1035,6 +1035,7 @@ enum bpf_attach_type {
        BPF_TRACE_KPROBE_MULTI,
        BPF_LSM_CGROUP,
        BPF_STRUCT_OPS,
+       BPF_NETFILTER,
        __MAX_BPF_ATTACH_TYPE
 };
 
index 1ebf8d4..73e2c10 100644 (file)
@@ -783,7 +783,7 @@ enum {
 
        /* add new constants above here */
        __ETHTOOL_A_STATS_GRP_CNT,
-       ETHTOOL_A_STATS_GRP_MAX = (__ETHTOOL_A_STATS_CNT - 1)
+       ETHTOOL_A_STATS_GRP_MAX = (__ETHTOOL_A_STATS_GRP_CNT - 1)
 };
 
 enum {
diff --git a/include/uapi/linux/eventfd.h b/include/uapi/linux/eventfd.h
new file mode 100644 (file)
index 0000000..2eb9ab6
--- /dev/null
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_EVENTFD_H
+#define _UAPI_LINUX_EVENTFD_H
+
+#include <linux/fcntl.h>
+
+#define EFD_SEMAPHORE (1 << 0)
+#define EFD_CLOEXEC O_CLOEXEC
+#define EFD_NONBLOCK O_NONBLOCK
+
+#endif /* _UAPI_LINUX_EVENTFD_H */
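
The new uapi header only makes the EFD_* flag values visible to userspace headers; the eventfd(2) interface itself is unchanged. A small userspace sketch using the glibc wrapper, where make_semaphore_fd() is a hypothetical helper name:

        #include <sys/eventfd.h>

        int make_semaphore_fd(unsigned int initial)
        {
                /* EFD_SEMAPHORE: each read() decrements the counter by one
                 * instead of resetting it to zero */
                return eventfd(initial, EFD_SEMAPHORE | EFD_CLOEXEC);
        }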
index 1de4d0b..3d7ea58 100644 (file)
@@ -44,6 +44,7 @@ enum {
        HANDSHAKE_A_ACCEPT_AUTH_MODE,
        HANDSHAKE_A_ACCEPT_PEER_IDENTITY,
        HANDSHAKE_A_ACCEPT_CERTIFICATE,
+       HANDSHAKE_A_ACCEPT_PEERNAME,
 
        __HANDSHAKE_A_ACCEPT_MAX,
        HANDSHAKE_A_ACCEPT_MAX = (__HANDSHAKE_A_ACCEPT_MAX - 1)
index 4b7f2df..e682ab6 100644 (file)
@@ -163,6 +163,7 @@ struct in_addr {
 #define IP_MULTICAST_ALL               49
 #define IP_UNICAST_IF                  50
 #define IP_LOCAL_PORT_RANGE            51
+#define IP_PROTOCOL                    52
 
 #define MCAST_EXCLUDE  0
 #define MCAST_INCLUDE  1
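
IP_PROTOCOL is a new socket option number next to IP_LOCAL_PORT_RANGE; assuming it is a read-only getsockopt at the SOL_IP level that reports the socket's transport protocol, usage would look roughly like the sketch below (socket_ip_protocol() is a hypothetical helper, not taken from this diff):

        #include <netinet/in.h>
        #include <sys/socket.h>

        int socket_ip_protocol(int fd)
        {
                int proto = 0;
                socklen_t len = sizeof(proto);

                if (getsockopt(fd, SOL_IP, IP_PROTOCOL, &proto, &len) < 0)
                        return -1;
                return proto;   /* e.g. IPPROTO_TCP or IPPROTO_UDP */
        }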
index 0716cb1..f222d26 100644 (file)
@@ -173,6 +173,18 @@ enum {
  */
 #define IORING_SETUP_DEFER_TASKRUN     (1U << 13)
 
+/*
+ * Application provides the memory for the rings
+ */
+#define IORING_SETUP_NO_MMAP           (1U << 14)
+
+/*
+ * Register the ring fd in itself for use with
+ * IORING_REGISTER_USE_REGISTERED_RING; return a registered fd index rather
+ * than an fd.
+ */
+#define IORING_SETUP_REGISTERED_FD_ONLY        (1U << 15)
+
 enum io_uring_op {
        IORING_OP_NOP,
        IORING_OP_READV,
@@ -406,7 +418,7 @@ struct io_sqring_offsets {
        __u32 dropped;
        __u32 array;
        __u32 resv1;
-       __u64 resv2;
+       __u64 user_addr;
 };
 
 /*
@@ -425,7 +437,7 @@ struct io_cqring_offsets {
        __u32 cqes;
        __u32 flags;
        __u32 resv1;
-       __u64 resv2;
+       __u64 user_addr;
 };
 
 /*
index 4d93967..8eb0d7b 100644 (file)
@@ -74,7 +74,8 @@
 #define MOVE_MOUNT_T_AUTOMOUNTS                0x00000020 /* Follow automounts on to path */
 #define MOVE_MOUNT_T_EMPTY_PATH                0x00000040 /* Empty to path permitted */
 #define MOVE_MOUNT_SET_GROUP           0x00000100 /* Set sharing group instead */
-#define MOVE_MOUNT__MASK               0x00000177
+#define MOVE_MOUNT_BENEATH             0x00000200 /* Mount beneath top mount */
+#define MOVE_MOUNT__MASK               0x00000377
 
 /*
  * fsopen() flags.
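
MOVE_MOUNT_BENEATH lets move_mount(2) attach the source mount underneath the mount currently covering the target path instead of stacking on top of it, and MOVE_MOUNT__MASK grows accordingly. A hedged userspace sketch, assuming a libc that exposes the MOVE_MOUNT_* constants (otherwise they come from <linux/mount.h>); mount_beneath() is a hypothetical helper and from_fd is expected to come from open_tree() or fsmount():

        #define _GNU_SOURCE
        #include <sys/syscall.h>
        #include <sys/mount.h>
        #include <fcntl.h>
        #include <unistd.h>

        static int mount_beneath(int from_fd)
        {
                /* attach the detached tree referred to by from_fd beneath
                 * the mount currently covering /mnt */
                return syscall(SYS_move_mount, from_fd, "",
                               AT_FDCWD, "/mnt",
                               MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_BENEATH);
        }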
index 6a5552d..987a302 100644 (file)
@@ -16,6 +16,7 @@
 #include <linux/types.h>
 
 /*
+ * UNUSED:
  * 1 for normal debug messages, 2 is very verbose. 0 to turn it off.
  */
 #define PACKET_DEBUG           1
index 308433b..6375a06 100644 (file)
 
 #include <linux/posix_types.h>
 
+#ifdef __SIZEOF_INT128__
+typedef __signed__ __int128 __s128 __attribute__((aligned(16)));
+typedef unsigned __int128 __u128 __attribute__((aligned(16)));
+#endif
 
 /*
  * Below are truly Linux-specific types that should never collide with
index 640bf68..4b8558d 100644 (file)
        _IOWR('u', UBLK_CMD_END_USER_RECOVERY, struct ublksrv_ctrl_cmd)
 #define UBLK_U_CMD_GET_DEV_INFO2       \
        _IOR('u', UBLK_CMD_GET_DEV_INFO2, struct ublksrv_ctrl_cmd)
+#define UBLK_U_CMD_GET_FEATURES        \
+       _IOR('u', 0x13, struct ublksrv_ctrl_cmd)
+
+/*
+ * 64bits are enough now, and it should be easy to extend in case of
+ * running out of feature flags
+ */
+#define UBLK_FEATURES_LEN  8
 
 /*
  * IO commands, issued by ublk server, and handled by ublk driver.
 #define UBLKSRV_CMD_BUF_OFFSET 0
 #define UBLKSRV_IO_BUF_OFFSET  0x80000000
 
-/* tag bit is 12bit, so at most 4096 IOs for each queue */
+/* tag bit is 16bit, so far limit at most 4096 IOs for each queue */
 #define UBLK_MAX_QUEUE_DEPTH   4096
 
+/* single IO buffer max size is 32MB */
+#define UBLK_IO_BUF_OFF                0
+#define UBLK_IO_BUF_BITS       25
+#define UBLK_IO_BUF_BITS_MASK  ((1ULL << UBLK_IO_BUF_BITS) - 1)
+
+/* so at most 64K IOs for each queue */
+#define UBLK_TAG_OFF           UBLK_IO_BUF_BITS
+#define UBLK_TAG_BITS          16
+#define UBLK_TAG_BITS_MASK     ((1ULL << UBLK_TAG_BITS) - 1)
+
+/* max 4096 queues */
+#define UBLK_QID_OFF           (UBLK_TAG_OFF + UBLK_TAG_BITS)
+#define UBLK_QID_BITS          12
+#define UBLK_QID_BITS_MASK     ((1ULL << UBLK_QID_BITS) - 1)
+
+#define UBLK_MAX_NR_QUEUES     (1U << UBLK_QID_BITS)
+
+#define UBLKSRV_IO_BUF_TOTAL_BITS      (UBLK_QID_OFF + UBLK_QID_BITS)
+#define UBLKSRV_IO_BUF_TOTAL_SIZE      (1ULL << UBLKSRV_IO_BUF_TOTAL_BITS)
+
 /*
  * zero copy requires 4k block size, and can remap ublk driver's io
  * request into ublksrv's vm space
 /* use ioctl encoding for uring command */
 #define UBLK_F_CMD_IOCTL_ENCODE        (1UL << 6)
 
+/* Copy between request and user buffer by pread()/pwrite() */
+#define UBLK_F_USER_COPY       (1UL << 7)
+
 /* device state */
 #define UBLK_S_DEV_DEAD        0
 #define UBLK_S_DEV_LIVE        1
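
Taken together, the new macros describe how the UBLK_F_USER_COPY file offset is packed: bits 0-24 carry the byte offset inside a request's buffer (hence the 32MB cap), bits 25-40 the tag, and bits 41-52 the queue id, for a 2^53-byte window overall. A hedged sketch of how a ublk server could build the pread()/pwrite() position from these macros; adding the UBLKSRV_IO_BUF_OFFSET base is an assumption here, and ublk_user_copy_pos() is a hypothetical helper:

        static inline __u64 ublk_user_copy_pos(__u16 q_id, __u16 tag, __u32 offset)
        {
                return UBLKSRV_IO_BUF_OFFSET +
                       (((__u64)q_id << UBLK_QID_OFF) |
                        ((__u64)tag << UBLK_TAG_OFF) |
                        (offset & UBLK_IO_BUF_BITS_MASK));
        }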
index 0552e8d..b71276b 100644 (file)
@@ -646,6 +646,15 @@ enum {
        VFIO_CCW_NUM_IRQS
 };
 
+/*
+ * The vfio-ap bus driver makes use of the following IRQ index mapping.
+ * Unimplemented IRQ types return a count of zero.
+ */
+enum {
+       VFIO_AP_REQ_IRQ_INDEX,
+       VFIO_AP_NUM_IRQS
+};
+
 /**
  * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
  *                                           struct vfio_pci_hot_reset_info)
index f29899b..4bf9c4f 100644 (file)
@@ -66,7 +66,8 @@ enum skl_ch_cfg {
        SKL_CH_CFG_DUAL_MONO = 9,
        SKL_CH_CFG_I2S_DUAL_STEREO_0 = 10,
        SKL_CH_CFG_I2S_DUAL_STEREO_1 = 11,
-       SKL_CH_CFG_4_CHANNEL = 12,
+       SKL_CH_CFG_7_1 = 12,
+       SKL_CH_CFG_4_CHANNEL = SKL_CH_CFG_7_1,
        SKL_CH_CFG_INVALID
 };
 
index bbc3787..e9ec7e4 100644 (file)
 #define SOF_TKN_CAVS_AUDIO_FORMAT_IN_INTERLEAVING_STYLE        1906
 #define SOF_TKN_CAVS_AUDIO_FORMAT_IN_FMT_CFG   1907
 #define SOF_TKN_CAVS_AUDIO_FORMAT_IN_SAMPLE_TYPE       1908
-#define SOF_TKN_CAVS_AUDIO_FORMAT_PIN_INDEX            1909
+#define SOF_TKN_CAVS_AUDIO_FORMAT_INPUT_PIN_INDEX      1909
 /* intentional token numbering discontinuity, reserved for future use */
 #define SOF_TKN_CAVS_AUDIO_FORMAT_OUT_RATE     1930
 #define SOF_TKN_CAVS_AUDIO_FORMAT_OUT_BIT_DEPTH        1931
 #define SOF_TKN_CAVS_AUDIO_FORMAT_OUT_INTERLEAVING_STYLE       1936
 #define SOF_TKN_CAVS_AUDIO_FORMAT_OUT_FMT_CFG  1937
 #define SOF_TKN_CAVS_AUDIO_FORMAT_OUT_SAMPLE_TYPE      1938
+#define SOF_TKN_CAVS_AUDIO_FORMAT_OUTPUT_PIN_INDEX     1939
 /* intentional token numbering discontinuity, reserved for future use */
 #define SOF_TKN_CAVS_AUDIO_FORMAT_IBS          1970
 #define SOF_TKN_CAVS_AUDIO_FORMAT_OBS          1971
index f755329..df1d04f 100644 (file)
@@ -1133,7 +1133,7 @@ static inline size_t ufshcd_sg_entry_size(const struct ufs_hba *hba)
        ({ (void)(hba); BUILD_BUG_ON(sg_entry_size != sizeof(struct ufshcd_sg_entry)); })
 #endif
 
-static inline size_t sizeof_utp_transfer_cmd_desc(const struct ufs_hba *hba)
+static inline size_t ufshcd_get_ucd_size(const struct ufs_hba *hba)
 {
        return sizeof(struct utp_transfer_cmd_desc) + SG_ALL * ufshcd_sg_entry_size(hba);
 }
index 44c2855..ac1281c 100644 (file)
@@ -138,4 +138,7 @@ int xen_test_irq_shared(int irq);
 
 /* initialize Xen IRQ subsystem */
 void xen_init_IRQ(void);
+
+irqreturn_t xen_debug_interrupt(int irq, void *dev_id);
+
 #endif /* _XEN_EVENTS_H */
index 0efeb65..f989162 100644 (file)
@@ -31,6 +31,9 @@ extern uint32_t xen_start_flags;
 
 #include <xen/interface/hvm/start_info.h>
 extern struct hvm_start_info pvh_start_info;
+void xen_prepare_pvh(void);
+struct pt_regs;
+void xen_pv_evtchn_do_upcall(struct pt_regs *regs);
 
 #ifdef CONFIG_XEN_DOM0
 #include <xen/interface/xen.h>
index 811e94d..2a970f6 100644 (file)
@@ -28,7 +28,6 @@
 #include "do_mounts.h"
 
 int root_mountflags = MS_RDONLY | MS_SILENT;
-static char * __initdata root_device_name;
 static char __initdata saved_root_name[64];
 static int root_wait;
 
@@ -60,240 +59,6 @@ static int __init readwrite(char *str)
 __setup("ro", readonly);
 __setup("rw", readwrite);
 
-#ifdef CONFIG_BLOCK
-struct uuidcmp {
-       const char *uuid;
-       int len;
-};
-
-/**
- * match_dev_by_uuid - callback for finding a partition using its uuid
- * @dev:       device passed in by the caller
- * @data:      opaque pointer to the desired struct uuidcmp to match
- *
- * Returns 1 if the device matches, and 0 otherwise.
- */
-static int match_dev_by_uuid(struct device *dev, const void *data)
-{
-       struct block_device *bdev = dev_to_bdev(dev);
-       const struct uuidcmp *cmp = data;
-
-       if (!bdev->bd_meta_info ||
-           strncasecmp(cmp->uuid, bdev->bd_meta_info->uuid, cmp->len))
-               return 0;
-       return 1;
-}
-
-/**
- * devt_from_partuuid - looks up the dev_t of a partition by its UUID
- * @uuid_str:  char array containing ascii UUID
- *
- * The function will return the first partition which contains a matching
- * UUID value in its partition_meta_info struct.  This does not search
- * by filesystem UUIDs.
- *
- * If @uuid_str is followed by a "/PARTNROFF=%d", then the number will be
- * extracted and used as an offset from the partition identified by the UUID.
- *
- * Returns the matching dev_t on success or 0 on failure.
- */
-static dev_t devt_from_partuuid(const char *uuid_str)
-{
-       struct uuidcmp cmp;
-       struct device *dev = NULL;
-       dev_t devt = 0;
-       int offset = 0;
-       char *slash;
-
-       cmp.uuid = uuid_str;
-
-       slash = strchr(uuid_str, '/');
-       /* Check for optional partition number offset attributes. */
-       if (slash) {
-               char c = 0;
-
-               /* Explicitly fail on poor PARTUUID syntax. */
-               if (sscanf(slash + 1, "PARTNROFF=%d%c", &offset, &c) != 1)
-                       goto clear_root_wait;
-               cmp.len = slash - uuid_str;
-       } else {
-               cmp.len = strlen(uuid_str);
-       }
-
-       if (!cmp.len)
-               goto clear_root_wait;
-
-       dev = class_find_device(&block_class, NULL, &cmp, &match_dev_by_uuid);
-       if (!dev)
-               return 0;
-
-       if (offset) {
-               /*
-                * Attempt to find the requested partition by adding an offset
-                * to the partition number found by UUID.
-                */
-               devt = part_devt(dev_to_disk(dev),
-                                dev_to_bdev(dev)->bd_partno + offset);
-       } else {
-               devt = dev->devt;
-       }
-
-       put_device(dev);
-       return devt;
-
-clear_root_wait:
-       pr_err("VFS: PARTUUID= is invalid.\n"
-              "Expected PARTUUID=<valid-uuid-id>[/PARTNROFF=%%d]\n");
-       if (root_wait)
-               pr_err("Disabling rootwait; root= is invalid.\n");
-       root_wait = 0;
-       return 0;
-}
-
-/**
- * match_dev_by_label - callback for finding a partition using its label
- * @dev:       device passed in by the caller
- * @data:      opaque pointer to the label to match
- *
- * Returns 1 if the device matches, and 0 otherwise.
- */
-static int match_dev_by_label(struct device *dev, const void *data)
-{
-       struct block_device *bdev = dev_to_bdev(dev);
-       const char *label = data;
-
-       if (!bdev->bd_meta_info || strcmp(label, bdev->bd_meta_info->volname))
-               return 0;
-       return 1;
-}
-
-static dev_t devt_from_partlabel(const char *label)
-{
-       struct device *dev;
-       dev_t devt = 0;
-
-       dev = class_find_device(&block_class, NULL, label, &match_dev_by_label);
-       if (dev) {
-               devt = dev->devt;
-               put_device(dev);
-       }
-
-       return devt;
-}
-
-static dev_t devt_from_devname(const char *name)
-{
-       dev_t devt = 0;
-       int part;
-       char s[32];
-       char *p;
-
-       if (strlen(name) > 31)
-               return 0;
-       strcpy(s, name);
-       for (p = s; *p; p++) {
-               if (*p == '/')
-                       *p = '!';
-       }
-
-       devt = blk_lookup_devt(s, 0);
-       if (devt)
-               return devt;
-
-       /*
-        * Try non-existent, but valid partition, which may only exist after
-        * opening the device, like partitioned md devices.
-        */
-       while (p > s && isdigit(p[-1]))
-               p--;
-       if (p == s || !*p || *p == '0')
-               return 0;
-
-       /* try disk name without <part number> */
-       part = simple_strtoul(p, NULL, 10);
-       *p = '\0';
-       devt = blk_lookup_devt(s, part);
-       if (devt)
-               return devt;
-
-       /* try disk name without p<part number> */
-       if (p < s + 2 || !isdigit(p[-2]) || p[-1] != 'p')
-               return 0;
-       p[-1] = '\0';
-       return blk_lookup_devt(s, part);
-}
-#endif /* CONFIG_BLOCK */
-
-static dev_t devt_from_devnum(const char *name)
-{
-       unsigned maj, min, offset;
-       dev_t devt = 0;
-       char *p, dummy;
-
-       if (sscanf(name, "%u:%u%c", &maj, &min, &dummy) == 2 ||
-           sscanf(name, "%u:%u:%u:%c", &maj, &min, &offset, &dummy) == 3) {
-               devt = MKDEV(maj, min);
-               if (maj != MAJOR(devt) || min != MINOR(devt))
-                       return 0;
-       } else {
-               devt = new_decode_dev(simple_strtoul(name, &p, 16));
-               if (*p)
-                       return 0;
-       }
-
-       return devt;
-}
-
-/*
- *     Convert a name into device number.  We accept the following variants:
- *
- *     1) <hex_major><hex_minor> device number in hexadecimal represents itself
- *         no leading 0x, for example b302.
- *     2) /dev/nfs represents Root_NFS (0xff)
- *     3) /dev/<disk_name> represents the device number of disk
- *     4) /dev/<disk_name><decimal> represents the device number
- *         of partition - device number of disk plus the partition number
- *     5) /dev/<disk_name>p<decimal> - same as the above, that form is
- *        used when disk name of partitioned disk ends on a digit.
- *     6) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
- *        unique id of a partition if the partition table provides it.
- *        The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
- *        partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
- *        filled hex representation of the 32-bit "NT disk signature", and PP
- *        is a zero-filled hex representation of the 1-based partition number.
- *     7) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to
- *        a partition with a known unique id.
- *     8) <major>:<minor> major and minor number of the device separated by
- *        a colon.
- *     9) PARTLABEL=<name> with name being the GPT partition label.
- *        MSDOS partitions do not support labels!
- *     10) /dev/cifs represents Root_CIFS (0xfe)
- *
- *     If name doesn't have fall into the categories above, we return (0,0).
- *     block_class is used to check if something is a disk name. If the disk
- *     name contains slashes, the device name has them replaced with
- *     bangs.
- */
-dev_t name_to_dev_t(const char *name)
-{
-       if (strcmp(name, "/dev/nfs") == 0)
-               return Root_NFS;
-       if (strcmp(name, "/dev/cifs") == 0)
-               return Root_CIFS;
-       if (strcmp(name, "/dev/ram") == 0)
-               return Root_RAM0;
-#ifdef CONFIG_BLOCK
-       if (strncmp(name, "PARTUUID=", 9) == 0)
-               return devt_from_partuuid(name + 9);
-       if (strncmp(name, "PARTLABEL=", 10) == 0)
-               return devt_from_partlabel(name + 10);
-       if (strncmp(name, "/dev/", 5) == 0)
-               return devt_from_devname(name + 5);
-#endif
-       return devt_from_devnum(name);
-}
-EXPORT_SYMBOL_GPL(name_to_dev_t);
-
 static int __init root_dev_setup(char *line)
 {
        strscpy(saved_root_name, line, sizeof(saved_root_name));
@@ -338,7 +103,7 @@ __setup("rootfstype=", fs_names_setup);
 __setup("rootdelay=", root_delay_setup);
 
 /* This can return zero length strings. Caller should check */
-static int __init split_fs_names(char *page, size_t size, char *names)
+static int __init split_fs_names(char *page, size_t size)
 {
        int count = 1;
        char *p = page;
@@ -391,7 +156,7 @@ out:
        return ret;
 }
 
-void __init mount_block_root(char *name, int flags)
+void __init mount_root_generic(char *name, char *pretty_name, int flags)
 {
        struct page *page = alloc_page(GFP_KERNEL);
        char *fs_names = page_address(page);
@@ -402,7 +167,7 @@ void __init mount_block_root(char *name, int flags)
        scnprintf(b, BDEVNAME_SIZE, "unknown-block(%u,%u)",
                  MAJOR(ROOT_DEV), MINOR(ROOT_DEV));
        if (root_fs_names)
-               num_fs = split_fs_names(fs_names, PAGE_SIZE, root_fs_names);
+               num_fs = split_fs_names(fs_names, PAGE_SIZE);
        else
                num_fs = list_bdev_fs_names(fs_names, PAGE_SIZE);
 retry:
@@ -425,7 +190,7 @@ retry:
                 * and give them a list of the available devices
                 */
                printk("VFS: Cannot open root device \"%s\" or %s: error %d\n",
-                               root_device_name, b, err);
+                               pretty_name, b, err);
                printk("Please append a correct \"root=\" boot option; here are the available partitions:\n");
 
                printk_all_partitions();
@@ -453,15 +218,14 @@ out:
 #define NFSROOT_TIMEOUT_MAX    30
 #define NFSROOT_RETRY_MAX      5
 
-static int __init mount_nfs_root(void)
+static void __init mount_nfs_root(void)
 {
        char *root_dev, *root_data;
        unsigned int timeout;
-       int try, err;
+       int try;
 
-       err = nfs_root_data(&root_dev, &root_data);
-       if (err != 0)
-               return 0;
+       if (nfs_root_data(&root_dev, &root_data))
+               goto fail;
 
        /*
         * The server or network may not be ready, so try several
@@ -470,10 +234,8 @@ static int __init mount_nfs_root(void)
         */
        timeout = NFSROOT_TIMEOUT_MIN;
        for (try = 1; ; try++) {
-               err = do_mount_root(root_dev, "nfs",
-                                       root_mountflags, root_data);
-               if (err == 0)
-                       return 1;
+               if (!do_mount_root(root_dev, "nfs", root_mountflags, root_data))
+                       return;
                if (try > NFSROOT_RETRY_MAX)
                        break;
 
@@ -483,9 +245,14 @@ static int __init mount_nfs_root(void)
                if (timeout > NFSROOT_TIMEOUT_MAX)
                        timeout = NFSROOT_TIMEOUT_MAX;
        }
-       return 0;
+fail:
+       pr_err("VFS: Unable to mount root fs via NFS.\n");
+}
+#else
+static inline void mount_nfs_root(void)
+{
 }
-#endif
+#endif /* CONFIG_ROOT_NFS */
 
 #ifdef CONFIG_CIFS_ROOT
 
@@ -495,22 +262,20 @@ extern int cifs_root_data(char **dev, char **opts);
 #define CIFSROOT_TIMEOUT_MAX   30
 #define CIFSROOT_RETRY_MAX     5
 
-static int __init mount_cifs_root(void)
+static void __init mount_cifs_root(void)
 {
        char *root_dev, *root_data;
        unsigned int timeout;
-       int try, err;
+       int try;
 
-       err = cifs_root_data(&root_dev, &root_data);
-       if (err != 0)
-               return 0;
+       if (cifs_root_data(&root_dev, &root_data))
+               goto fail;
 
        timeout = CIFSROOT_TIMEOUT_MIN;
        for (try = 1; ; try++) {
-               err = do_mount_root(root_dev, "cifs", root_mountflags,
-                                   root_data);
-               if (err == 0)
-                       return 1;
+               if (!do_mount_root(root_dev, "cifs", root_mountflags,
+                                  root_data))
+                       return;
                if (try > CIFSROOT_RETRY_MAX)
                        break;
 
@@ -519,9 +284,14 @@ static int __init mount_cifs_root(void)
                if (timeout > CIFSROOT_TIMEOUT_MAX)
                        timeout = CIFSROOT_TIMEOUT_MAX;
        }
-       return 0;
+fail:
+       pr_err("VFS: Unable to mount root fs via SMB.\n");
 }
-#endif
+#else
+static inline void mount_cifs_root(void)
+{
+}
+#endif /* CONFIG_CIFS_ROOT */
 
 static bool __init fs_is_nodev(char *fstype)
 {
@@ -536,7 +306,7 @@ static bool __init fs_is_nodev(char *fstype)
        return ret;
 }
 
-static int __init mount_nodev_root(void)
+static int __init mount_nodev_root(char *root_device_name)
 {
        char *fs_names, *fstype;
        int err = -EINVAL;
@@ -545,7 +315,7 @@ static int __init mount_nodev_root(void)
        fs_names = (void *)__get_free_page(GFP_KERNEL);
        if (!fs_names)
                return -EINVAL;
-       num_fs = split_fs_names(fs_names, PAGE_SIZE, root_fs_names);
+       num_fs = split_fs_names(fs_names, PAGE_SIZE);
 
        for (i = 0, fstype = fs_names; i < num_fs;
             i++, fstype += strlen(fstype) + 1) {
@@ -563,35 +333,84 @@ static int __init mount_nodev_root(void)
        return err;
 }
 
-void __init mount_root(void)
+#ifdef CONFIG_BLOCK
+static void __init mount_block_root(char *root_device_name)
 {
-#ifdef CONFIG_ROOT_NFS
-       if (ROOT_DEV == Root_NFS) {
-               if (!mount_nfs_root())
-                       printk(KERN_ERR "VFS: Unable to mount root fs via NFS.\n");
-               return;
+       int err = create_dev("/dev/root", ROOT_DEV);
+
+       if (err < 0)
+               pr_emerg("Failed to create /dev/root: %d\n", err);
+       mount_root_generic("/dev/root", root_device_name, root_mountflags);
+}
+#else
+static inline void mount_block_root(char *root_device_name)
+{
+}
+#endif /* CONFIG_BLOCK */
+
+void __init mount_root(char *root_device_name)
+{
+       switch (ROOT_DEV) {
+       case Root_NFS:
+               mount_nfs_root();
+               break;
+       case Root_CIFS:
+               mount_cifs_root();
+               break;
+       case Root_Generic:
+               mount_root_generic(root_device_name, root_device_name,
+                                  root_mountflags);
+               break;
+       case 0:
+               if (root_device_name && root_fs_names &&
+                   mount_nodev_root(root_device_name) == 0)
+                       break;
+               fallthrough;
+       default:
+               mount_block_root(root_device_name);
+               break;
        }
-#endif
-#ifdef CONFIG_CIFS_ROOT
-       if (ROOT_DEV == Root_CIFS) {
-               if (!mount_cifs_root())
-                       printk(KERN_ERR "VFS: Unable to mount root fs via SMB.\n");
+}
+
+/* wait for any asynchronous scanning to complete */
+static void __init wait_for_root(char *root_device_name)
+{
+       if (ROOT_DEV != 0)
                return;
-       }
-#endif
-       if (ROOT_DEV == 0 && root_device_name && root_fs_names) {
-               if (mount_nodev_root() == 0)
-                       return;
-       }
-#ifdef CONFIG_BLOCK
-       {
-               int err = create_dev("/dev/root", ROOT_DEV);
 
-               if (err < 0)
-                       pr_emerg("Failed to create /dev/root: %d\n", err);
-               mount_block_root("/dev/root", root_mountflags);
+       pr_info("Waiting for root device %s...\n", root_device_name);
+
+       while (!driver_probe_done() ||
+              early_lookup_bdev(root_device_name, &ROOT_DEV) < 0)
+               msleep(5);
+       async_synchronize_full();
+
+}
+
+static dev_t __init parse_root_device(char *root_device_name)
+{
+       int error;
+       dev_t dev;
+
+       if (!strncmp(root_device_name, "mtd", 3) ||
+           !strncmp(root_device_name, "ubi", 3))
+               return Root_Generic;
+       if (strcmp(root_device_name, "/dev/nfs") == 0)
+               return Root_NFS;
+       if (strcmp(root_device_name, "/dev/cifs") == 0)
+               return Root_CIFS;
+       if (strcmp(root_device_name, "/dev/ram") == 0)
+               return Root_RAM0;
+
+       error = early_lookup_bdev(root_device_name, &dev);
+       if (error) {
+               if (error == -EINVAL && root_wait) {
+                       pr_err("Disabling rootwait; root= is invalid.\n");
+                       root_wait = 0;
+               }
+               return 0;
        }
-#endif
+       return dev;
 }
 
 /*
@@ -616,32 +435,15 @@ void __init prepare_namespace(void)
 
        md_run_setup();
 
-       if (saved_root_name[0]) {
-               root_device_name = saved_root_name;
-               if (!strncmp(root_device_name, "mtd", 3) ||
-                   !strncmp(root_device_name, "ubi", 3)) {
-                       mount_block_root(root_device_name, root_mountflags);
-                       goto out;
-               }
-               ROOT_DEV = name_to_dev_t(root_device_name);
-               if (strncmp(root_device_name, "/dev/", 5) == 0)
-                       root_device_name += 5;
-       }
+       if (saved_root_name[0])
+               ROOT_DEV = parse_root_device(saved_root_name);
 
-       if (initrd_load())
+       if (initrd_load(saved_root_name))
                goto out;
 
-       /* wait for any asynchronous scanning to complete */
-       if ((ROOT_DEV == 0) && root_wait) {
-               printk(KERN_INFO "Waiting for root device %s...\n",
-                       saved_root_name);
-               while (driver_probe_done() != 0 ||
-                       (ROOT_DEV = name_to_dev_t(saved_root_name)) == 0)
-                       msleep(5);
-               async_synchronize_full();
-       }
-
-       mount_root();
+       if (root_wait)
+               wait_for_root(saved_root_name);
+       mount_root(saved_root_name);
 out:
        devtmpfs_mount();
        init_mount(".", "/", NULL, MS_MOVE, NULL);
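
The net effect of the refactor is that root= handling becomes a single classification step in parse_root_device() followed by the mount_root() switch. Roughly, and assuming early_lookup_bdev() inherited the PARTUUID=/PARTLABEL= parsing that was removed from this file:

        root=mtd2, root=ubi0:rootfs       -> Root_Generic, mounted by name
        root=/dev/nfs                     -> Root_NFS   (mount_nfs_root)
        root=/dev/cifs                    -> Root_CIFS  (mount_cifs_root)
        root=/dev/ram                     -> Root_RAM0
        root=/dev/sda2, root=PARTUUID=..  -> dev_t from early_lookup_bdev(),
                                             mounted via mount_block_root()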
index 7a29ac3..15e372b 100644 (file)
@@ -10,8 +10,8 @@
 #include <linux/root_dev.h>
 #include <linux/init_syscalls.h>
 
-void  mount_block_root(char *name, int flags);
-void  mount_root(void);
+void  mount_root_generic(char *name, char *pretty_name, int flags);
+void  mount_root(char *root_device_name);
 extern int root_mountflags;
 
 static inline __init int create_dev(char *name, dev_t dev)
@@ -33,11 +33,11 @@ static inline int rd_load_image(char *from) { return 0; }
 #endif
 
 #ifdef CONFIG_BLK_DEV_INITRD
-
-bool __init initrd_load(void);
-
+bool __init initrd_load(char *root_device_name);
 #else
-
-static inline bool initrd_load(void) { return false; }
+static inline bool initrd_load(char *root_device_name)
+{
+       return false;
+       }
 
 #endif
index 3473124..425f4bc 100644 (file)
@@ -83,7 +83,7 @@ static int __init init_linuxrc(struct subprocess_info *info, struct cred *new)
        return 0;
 }
 
-static void __init handle_initrd(void)
+static void __init handle_initrd(char *root_device_name)
 {
        struct subprocess_info *info;
        static char *argv[] = { "linuxrc", NULL, };
@@ -95,7 +95,8 @@ static void __init handle_initrd(void)
        real_root_dev = new_encode_dev(ROOT_DEV);
        create_dev("/dev/root.old", Root_RAM0);
        /* mount initrd on rootfs' /root */
-       mount_block_root("/dev/root.old", root_mountflags & ~MS_RDONLY);
+       mount_root_generic("/dev/root.old", root_device_name,
+                          root_mountflags & ~MS_RDONLY);
        init_mkdir("/old", 0700);
        init_chdir("/old");
 
@@ -117,7 +118,7 @@ static void __init handle_initrd(void)
 
        init_chdir("/");
        ROOT_DEV = new_decode_dev(real_root_dev);
-       mount_root();
+       mount_root(root_device_name);
 
        printk(KERN_NOTICE "Trying to move old root to /initrd ... ");
        error = init_mount("/old", "/root/initrd", NULL, MS_MOVE, NULL);
@@ -133,7 +134,7 @@ static void __init handle_initrd(void)
        }
 }
 
-bool __init initrd_load(void)
+bool __init initrd_load(char *root_device_name)
 {
        if (mount_initrd) {
                create_dev("/dev/ram", Root_RAM0);
@@ -145,7 +146,7 @@ bool __init initrd_load(void)
                 */
                if (rd_load_image("/initrd.image") && ROOT_DEV != Root_RAM0) {
                        init_unlink("/initrd.image");
-                       handle_initrd();
+                       handle_initrd(root_device_name);
                        return true;
                }
        }
index af50044..0d2ccef 100644 (file)
@@ -95,7 +95,6 @@
 #include <linux/cache.h>
 #include <linux/rodata_test.h>
 #include <linux/jump_label.h>
-#include <linux/mem_encrypt.h>
 #include <linux/kcsan.h>
 #include <linux/init_syscalls.h>
 #include <linux/stackdepot.h>
 #include <net/net_namespace.h>
 
 #include <asm/io.h>
-#include <asm/bugs.h>
 #include <asm/setup.h>
 #include <asm/sections.h>
 #include <asm/cacheflush.h>
@@ -787,8 +785,6 @@ void __init __weak thread_stack_cache_init(void)
 }
 #endif
 
-void __init __weak mem_encrypt_init(void) { }
-
 void __init __weak poking_init(void) { }
 
 void __init __weak pgtable_cache_init(void) { }
@@ -877,7 +873,8 @@ static void __init print_unknown_bootoptions(void)
        memblock_free(unknown_options, len);
 }
 
-asmlinkage __visible void __init __no_sanitize_address __noreturn start_kernel(void)
+asmlinkage __visible __init __no_sanitize_address __noreturn __no_stack_protector
+void start_kernel(void)
 {
        char *command_line;
        char *after_dashes;
@@ -1042,15 +1039,7 @@ asmlinkage __visible void __init __no_sanitize_address __noreturn start_kernel(v
        sched_clock_init();
        calibrate_delay();
 
-       /*
-        * This needs to be called before any devices perform DMA
-        * operations that might use the SWIOTLB bounce buffers. It will
-        * mark the bounce buffers as decrypted so that their usage will
-        * not cause "plain-text" data to be decrypted when accessed. It
-        * must be called after late_time_init() so that Hyper-V x86/x64
-        * hypercalls work when the SWIOTLB bounce buffers are decrypted.
-        */
-       mem_encrypt_init();
+       arch_cpu_finalize_init();
 
        pid_idr_init();
        anon_vma_init();
@@ -1078,8 +1067,6 @@ asmlinkage __visible void __init __no_sanitize_address __noreturn start_kernel(v
        taskstats_init_early();
        delayacct_init();
 
-       check_bugs();
-
        acpi_subsystem_init();
        arch_post_acpi_subsys_init();
        kcsan_init();
@@ -1087,7 +1074,13 @@ asmlinkage __visible void __init __no_sanitize_address __noreturn start_kernel(v
        /* Do the rest non-__init'ed, we're now alive */
        arch_call_rest_init();
 
+       /*
+        * Avoid stack canaries in callers of boot_init_stack_canary for gcc-10
+        * and older.
+        */
+#if !__has_attribute(__no_stack_protector__)
        prevent_tail_call_optimization();
+#endif
 }
 
 /* Call all constructor functions linked into the kernel. */
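
start_kernel() now funnels the late CPU setup that used to be spread across mem_encrypt_init() and check_bugs() through a single arch_cpu_finalize_init() call. The declaration (and any default stub) lives outside this hunk, so the following is only a sketch of what an architecture override might look like, assuming the hook keeps the void(void) __init shape implied by the call site above:

        void __init arch_cpu_finalize_init(void)
        {
                /* illustrative only: an architecture would fold its old
                 * check_bugs()/mem_encrypt_init() work in here */
        }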
index b4f5dfa..58c46c8 100644 (file)
@@ -216,13 +216,10 @@ static int __io_sync_cancel(struct io_uring_task *tctx,
        /* fixed must be grabbed every time since we drop the uring_lock */
        if ((cd->flags & IORING_ASYNC_CANCEL_FD) &&
            (cd->flags & IORING_ASYNC_CANCEL_FD_FIXED)) {
-               unsigned long file_ptr;
-
                if (unlikely(fd >= ctx->nr_user_files))
                        return -EBADF;
                fd = array_index_nospec(fd, ctx->nr_user_files);
-               file_ptr = io_fixed_file_slot(&ctx->file_table, fd)->file_ptr;
-               cd->file = (struct file *) (file_ptr & FFS_MASK);
+               cd->file = io_file_from_index(&ctx->file_table, fd);
                if (!cd->file)
                        return -EBADF;
        }
index 9aa74d2..89bff20 100644 (file)
@@ -25,10 +25,6 @@ int io_epoll_ctl_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
        struct io_epoll *epoll = io_kiocb_to_cmd(req, struct io_epoll);
 
-       pr_warn_once("%s: epoll_ctl support in io_uring is deprecated and will "
-                    "be removed in a future Linux kernel version.\n",
-                    current->comm);
-
        if (sqe->buf_index || sqe->splice_fd_in)
                return -EINVAL;
 
index 0f6fa79..e7d7499 100644 (file)
@@ -78,10 +78,8 @@ static int io_install_fixed_file(struct io_ring_ctx *ctx, struct file *file,
        file_slot = io_fixed_file_slot(&ctx->file_table, slot_index);
 
        if (file_slot->file_ptr) {
-               struct file *old_file;
-
-               old_file = (struct file *)(file_slot->file_ptr & FFS_MASK);
-               ret = io_queue_rsrc_removal(ctx->file_data, slot_index, old_file);
+               ret = io_queue_rsrc_removal(ctx->file_data, slot_index,
+                                           io_slot_file(file_slot));
                if (ret)
                        return ret;
 
@@ -140,7 +138,6 @@ int io_fixed_fd_install(struct io_kiocb *req, unsigned int issue_flags,
 int io_fixed_fd_remove(struct io_ring_ctx *ctx, unsigned int offset)
 {
        struct io_fixed_file *file_slot;
-       struct file *file;
        int ret;
 
        if (unlikely(!ctx->file_data))
@@ -153,8 +150,8 @@ int io_fixed_fd_remove(struct io_ring_ctx *ctx, unsigned int offset)
        if (!file_slot->file_ptr)
                return -EBADF;
 
-       file = (struct file *)(file_slot->file_ptr & FFS_MASK);
-       ret = io_queue_rsrc_removal(ctx->file_data, offset, file);
+       ret = io_queue_rsrc_removal(ctx->file_data, offset,
+                                   io_slot_file(file_slot));
        if (ret)
                return ret;
 
index 351111f..b47adf1 100644 (file)
@@ -5,10 +5,6 @@
 #include <linux/file.h>
 #include <linux/io_uring_types.h>
 
-#define FFS_NOWAIT             0x1UL
-#define FFS_ISREG              0x2UL
-#define FFS_MASK               ~(FFS_NOWAIT|FFS_ISREG)
-
 bool io_alloc_file_tables(struct io_file_table *table, unsigned nr_files);
 void io_free_file_tables(struct io_file_table *table);
 
@@ -43,21 +39,31 @@ io_fixed_file_slot(struct io_file_table *table, unsigned i)
        return &table->files[i];
 }
 
+#define FFS_NOWAIT             0x1UL
+#define FFS_ISREG              0x2UL
+#define FFS_MASK               ~(FFS_NOWAIT|FFS_ISREG)
+
+static inline unsigned int io_slot_flags(struct io_fixed_file *slot)
+{
+       return (slot->file_ptr & ~FFS_MASK) << REQ_F_SUPPORT_NOWAIT_BIT;
+}
+
+static inline struct file *io_slot_file(struct io_fixed_file *slot)
+{
+       return (struct file *)(slot->file_ptr & FFS_MASK);
+}
+
 static inline struct file *io_file_from_index(struct io_file_table *table,
                                              int index)
 {
-       struct io_fixed_file *slot = io_fixed_file_slot(table, index);
-
-       return (struct file *) (slot->file_ptr & FFS_MASK);
+       return io_slot_file(io_fixed_file_slot(table, index));
 }
 
 static inline void io_fixed_file_set(struct io_fixed_file *file_slot,
                                     struct file *file)
 {
-       unsigned long file_ptr = (unsigned long) file;
-
-       file_ptr |= io_file_get_flags(file);
-       file_slot->file_ptr = file_ptr;
+       file_slot->file_ptr = (unsigned long)file |
+               (io_file_get_flags(file) >> REQ_F_SUPPORT_NOWAIT_BIT);
 }
 
 static inline void io_reset_alloc_hint(struct io_ring_ctx *ctx)
index b271598..399e9a1 100644 (file)
@@ -220,10 +220,12 @@ static void io_worker_exit(struct io_worker *worker)
        list_del_rcu(&worker->all_list);
        raw_spin_unlock(&wq->lock);
        io_wq_dec_running(worker);
-       worker->flags = 0;
-       preempt_disable();
-       current->flags &= ~PF_IO_WORKER;
-       preempt_enable();
+       /*
+        * this worker is a goner, clear ->worker_private to avoid any
+        * inc/dec running calls that could happen as part of exit from
+        * touching 'worker'.
+        */
+       current->worker_private = NULL;
 
        kfree_rcu(worker, rcu);
        io_worker_ref_put(wq);
index 3bca7a7..1b53a2a 100644 (file)
@@ -95,6 +95,7 @@
 
 #include "timeout.h"
 #include "poll.h"
+#include "rw.h"
 #include "alloc_cache.h"
 
 #define IORING_MAX_ENTRIES     32768
@@ -145,8 +146,6 @@ static bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
                                         struct task_struct *task,
                                         bool cancel_all);
 
-static void io_dismantle_req(struct io_kiocb *req);
-static void io_clean_op(struct io_kiocb *req);
 static void io_queue_sqe(struct io_kiocb *req);
 static void io_move_task_work_from_local(struct io_ring_ctx *ctx);
 static void __io_submit_flush_completions(struct io_ring_ctx *ctx);
@@ -367,6 +366,39 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq)
        return false;
 }
 
+static void io_clean_op(struct io_kiocb *req)
+{
+       if (req->flags & REQ_F_BUFFER_SELECTED) {
+               spin_lock(&req->ctx->completion_lock);
+               io_put_kbuf_comp(req);
+               spin_unlock(&req->ctx->completion_lock);
+       }
+
+       if (req->flags & REQ_F_NEED_CLEANUP) {
+               const struct io_cold_def *def = &io_cold_defs[req->opcode];
+
+               if (def->cleanup)
+                       def->cleanup(req);
+       }
+       if ((req->flags & REQ_F_POLLED) && req->apoll) {
+               kfree(req->apoll->double_poll);
+               kfree(req->apoll);
+               req->apoll = NULL;
+       }
+       if (req->flags & REQ_F_INFLIGHT) {
+               struct io_uring_task *tctx = req->task->io_uring;
+
+               atomic_dec(&tctx->inflight_tracked);
+       }
+       if (req->flags & REQ_F_CREDS)
+               put_cred(req->creds);
+       if (req->flags & REQ_F_ASYNC_DATA) {
+               kfree(req->async_data);
+               req->async_data = NULL;
+       }
+       req->flags &= ~IO_REQ_CLEAN_FLAGS;
+}
+
 static inline void io_req_track_inflight(struct io_kiocb *req)
 {
        if (!(req->flags & REQ_F_INFLIGHT)) {
@@ -423,8 +455,8 @@ static void io_prep_async_work(struct io_kiocb *req)
        if (req->flags & REQ_F_FORCE_ASYNC)
                req->work.flags |= IO_WQ_WORK_CONCURRENT;
 
-       if (req->file && !io_req_ffs_set(req))
-               req->flags |= io_file_get_flags(req->file) << REQ_F_SUPPORT_NOWAIT_BIT;
+       if (req->file && !(req->flags & REQ_F_FIXED_FILE))
+               req->flags |= io_file_get_flags(req->file);
 
        if (req->file && (req->flags & REQ_F_ISREG)) {
                bool should_hash = def->hash_reg_file;
@@ -594,42 +626,18 @@ void __io_commit_cqring_flush(struct io_ring_ctx *ctx)
 }
 
 static inline void __io_cq_lock(struct io_ring_ctx *ctx)
-       __acquires(ctx->completion_lock)
 {
        if (!ctx->task_complete)
                spin_lock(&ctx->completion_lock);
 }
 
-static inline void __io_cq_unlock(struct io_ring_ctx *ctx)
-{
-       if (!ctx->task_complete)
-               spin_unlock(&ctx->completion_lock);
-}
-
 static inline void io_cq_lock(struct io_ring_ctx *ctx)
        __acquires(ctx->completion_lock)
 {
        spin_lock(&ctx->completion_lock);
 }
 
-static inline void io_cq_unlock(struct io_ring_ctx *ctx)
-       __releases(ctx->completion_lock)
-{
-       spin_unlock(&ctx->completion_lock);
-}
-
-/* keep it inlined for io_submit_flush_completions() */
 static inline void __io_cq_unlock_post(struct io_ring_ctx *ctx)
-       __releases(ctx->completion_lock)
-{
-       io_commit_cqring(ctx);
-       __io_cq_unlock(ctx);
-       io_commit_cqring_flush(ctx);
-       io_cqring_wake(ctx);
-}
-
-static void __io_cq_unlock_post_flush(struct io_ring_ctx *ctx)
-       __releases(ctx->completion_lock)
 {
        io_commit_cqring(ctx);
 
@@ -641,13 +649,13 @@ static void __io_cq_unlock_post_flush(struct io_ring_ctx *ctx)
                 */
                io_commit_cqring_flush(ctx);
        } else {
-               __io_cq_unlock(ctx);
+               spin_unlock(&ctx->completion_lock);
                io_commit_cqring_flush(ctx);
                io_cqring_wake(ctx);
        }
 }
 
-void io_cq_unlock_post(struct io_ring_ctx *ctx)
+static void io_cq_unlock_post(struct io_ring_ctx *ctx)
        __releases(ctx->completion_lock)
 {
        io_commit_cqring(ctx);
@@ -662,10 +670,10 @@ static void io_cqring_overflow_kill(struct io_ring_ctx *ctx)
        struct io_overflow_cqe *ocqe;
        LIST_HEAD(list);
 
-       io_cq_lock(ctx);
+       spin_lock(&ctx->completion_lock);
        list_splice_init(&ctx->cq_overflow_list, &list);
        clear_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq);
-       io_cq_unlock(ctx);
+       spin_unlock(&ctx->completion_lock);
 
        while (!list_empty(&list)) {
                ocqe = list_first_entry(&list, struct io_overflow_cqe, list);
@@ -722,29 +730,29 @@ static void io_cqring_overflow_flush(struct io_ring_ctx *ctx)
 }
 
 /* can be called by any task */
-static void io_put_task_remote(struct task_struct *task, int nr)
+static void io_put_task_remote(struct task_struct *task)
 {
        struct io_uring_task *tctx = task->io_uring;
 
-       percpu_counter_sub(&tctx->inflight, nr);
+       percpu_counter_sub(&tctx->inflight, 1);
        if (unlikely(atomic_read(&tctx->in_cancel)))
                wake_up(&tctx->wait);
-       put_task_struct_many(task, nr);
+       put_task_struct(task);
 }
 
 /* used by a task to put its own references */
-static void io_put_task_local(struct task_struct *task, int nr)
+static void io_put_task_local(struct task_struct *task)
 {
-       task->io_uring->cached_refs += nr;
+       task->io_uring->cached_refs++;
 }
 
 /* must to be called somewhat shortly after putting a request */
-static inline void io_put_task(struct task_struct *task, int nr)
+static inline void io_put_task(struct task_struct *task)
 {
        if (likely(task == current))
-               io_put_task_local(task, nr);
+               io_put_task_local(task);
        else
-               io_put_task_remote(task, nr);
+               io_put_task_remote(task);
 }
 
 void io_task_refs_refill(struct io_uring_task *tctx)
@@ -934,20 +942,19 @@ bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags
        return __io_post_aux_cqe(ctx, user_data, res, cflags, true);
 }
 
-bool io_aux_cqe(struct io_ring_ctx *ctx, bool defer, u64 user_data, s32 res, u32 cflags,
+bool io_aux_cqe(const struct io_kiocb *req, bool defer, s32 res, u32 cflags,
                bool allow_overflow)
 {
+       struct io_ring_ctx *ctx = req->ctx;
+       u64 user_data = req->cqe.user_data;
        struct io_uring_cqe *cqe;
-       unsigned int length;
 
        if (!defer)
                return __io_post_aux_cqe(ctx, user_data, res, cflags, allow_overflow);
 
-       length = ARRAY_SIZE(ctx->submit_state.cqes);
-
        lockdep_assert_held(&ctx->uring_lock);
 
-       if (ctx->submit_state.cqes_count == length) {
+       if (ctx->submit_state.cqes_count == ARRAY_SIZE(ctx->submit_state.cqes)) {
                __io_cq_lock(ctx);
                __io_flush_post_cqes(ctx);
                /* no need to flush - flush is deferred */
@@ -991,14 +998,18 @@ static void __io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
                        }
                }
                io_put_kbuf_comp(req);
-               io_dismantle_req(req);
+               if (unlikely(req->flags & IO_REQ_CLEAN_FLAGS))
+                       io_clean_op(req);
+               if (!(req->flags & REQ_F_FIXED_FILE))
+                       io_put_file(req->file);
+
                rsrc_node = req->rsrc_node;
                /*
                 * Selected buffer deallocation in io_clean_op() assumes that
                 * we don't hold ->completion_lock. Clean them here to avoid
                 * deadlocks.
                 */
-               io_put_task_remote(req->task, 1);
+               io_put_task_remote(req->task);
                wq_list_add_head(&req->comp_list, &ctx->locked_free_list);
                ctx->locked_free_nr++;
        }
@@ -1111,36 +1122,13 @@ __cold bool __io_alloc_req_refill(struct io_ring_ctx *ctx)
        return true;
 }
 
-static inline void io_dismantle_req(struct io_kiocb *req)
-{
-       unsigned int flags = req->flags;
-
-       if (unlikely(flags & IO_REQ_CLEAN_FLAGS))
-               io_clean_op(req);
-       if (!(flags & REQ_F_FIXED_FILE))
-               io_put_file(req->file);
-}
-
-static __cold void io_free_req_tw(struct io_kiocb *req, struct io_tw_state *ts)
-{
-       struct io_ring_ctx *ctx = req->ctx;
-
-       if (req->rsrc_node) {
-               io_tw_lock(ctx, ts);
-               io_put_rsrc_node(ctx, req->rsrc_node);
-       }
-       io_dismantle_req(req);
-       io_put_task_remote(req->task, 1);
-
-       spin_lock(&ctx->completion_lock);
-       wq_list_add_head(&req->comp_list, &ctx->locked_free_list);
-       ctx->locked_free_nr++;
-       spin_unlock(&ctx->completion_lock);
-}
-
 __cold void io_free_req(struct io_kiocb *req)
 {
-       req->io_task_work.func = io_free_req_tw;
+       /* refs were already put, restore them for io_req_task_complete() */
+       req->flags &= ~REQ_F_REFCOUNT;
+       /* we only want to free it, don't post CQEs */
+       req->flags |= REQ_F_CQE_SKIP;
+       req->io_task_work.func = io_req_task_complete;
        io_req_task_work_add(req);
 }
 
@@ -1205,7 +1193,9 @@ static unsigned int handle_tw_list(struct llist_node *node,
                        ts->locked = mutex_trylock(&(*ctx)->uring_lock);
                        percpu_ref_get(&(*ctx)->refs);
                }
-               req->io_task_work.func(req, ts);
+               INDIRECT_CALL_2(req->io_task_work.func,
+                               io_poll_task_func, io_req_rw_complete,
+                               req, ts);
                node = next;
                count++;
                if (unlikely(need_resched())) {
@@ -1303,7 +1293,7 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx)
        }
 }
 
-static void io_req_local_work_add(struct io_kiocb *req, unsigned flags)
+static inline void io_req_local_work_add(struct io_kiocb *req, unsigned flags)
 {
        struct io_ring_ctx *ctx = req->ctx;
        unsigned nr_wait, nr_tw, nr_tw_prev;
@@ -1354,19 +1344,11 @@ static void io_req_local_work_add(struct io_kiocb *req, unsigned flags)
        wake_up_state(ctx->submitter_task, TASK_INTERRUPTIBLE);
 }
 
-void __io_req_task_work_add(struct io_kiocb *req, unsigned flags)
+static void io_req_normal_work_add(struct io_kiocb *req)
 {
        struct io_uring_task *tctx = req->task->io_uring;
        struct io_ring_ctx *ctx = req->ctx;
 
-       if (!(flags & IOU_F_TWQ_FORCE_NORMAL) &&
-           (ctx->flags & IORING_SETUP_DEFER_TASKRUN)) {
-               rcu_read_lock();
-               io_req_local_work_add(req, flags);
-               rcu_read_unlock();
-               return;
-       }
-
        /* task_work already pending, we're done */
        if (!llist_add(&req->io_task_work.node, &tctx->task_list))
                return;
@@ -1380,6 +1362,17 @@ void __io_req_task_work_add(struct io_kiocb *req, unsigned flags)
        io_fallback_tw(tctx);
 }
 
+void __io_req_task_work_add(struct io_kiocb *req, unsigned flags)
+{
+       if (req->ctx->flags & IORING_SETUP_DEFER_TASKRUN) {
+               rcu_read_lock();
+               io_req_local_work_add(req, flags);
+               rcu_read_unlock();
+       } else {
+               io_req_normal_work_add(req);
+       }
+}
+
 static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx)
 {
        struct llist_node *node;
@@ -1390,7 +1383,7 @@ static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx)
                                                    io_task_work.node);
 
                node = node->next;
-               __io_req_task_work_add(req, IOU_F_TWQ_FORCE_NORMAL);
+               io_req_normal_work_add(req);
        }
 }
 
@@ -1405,13 +1398,19 @@ static int __io_run_local_work(struct io_ring_ctx *ctx, struct io_tw_state *ts)
        if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
                atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
 again:
-       node = io_llist_xchg(&ctx->work_llist, NULL);
+       /*
+        * The llist is in reverse (LIFO) order, flip it back the right way
+        * before running the pending items.
+        */
+       node = llist_reverse_order(io_llist_xchg(&ctx->work_llist, NULL));
        while (node) {
                struct llist_node *next = node->next;
                struct io_kiocb *req = container_of(node, struct io_kiocb,
                                                    io_task_work.node);
                prefetch(container_of(next, struct io_kiocb, io_task_work.node));
-               req->io_task_work.func(req, ts);
+               INDIRECT_CALL_2(req->io_task_work.func,
+                               io_poll_task_func, io_req_rw_complete,
+                               req, ts);
                ret++;
                node = next;
        }
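The reversal matters because llist_add() pushes entries at the head, so walking ctx->work_llist as-is would run task_work newest-first. llist_reverse_order() (include/linux/llist.h) restores submission order; the general consumer pattern, as a minimal sketch:

    struct llist_node *node = llist_del_all(&list);  /* detach the LIFO list */

    node = llist_reverse_order(node);                /* oldest entry first */
    while (node) {
            struct llist_node *next = node->next;
            /* handle the object embedding 'node' via container_of() */
            node = next;
    }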
@@ -1498,9 +1497,6 @@ void io_queue_next(struct io_kiocb *req)
 void io_free_batch_list(struct io_ring_ctx *ctx, struct io_wq_work_node *node)
        __must_hold(&ctx->uring_lock)
 {
-       struct task_struct *task = NULL;
-       int task_refs = 0;
-
        do {
                struct io_kiocb *req = container_of(node, struct io_kiocb,
                                                    comp_list);
@@ -1530,19 +1526,10 @@ void io_free_batch_list(struct io_ring_ctx *ctx, struct io_wq_work_node *node)
 
                io_req_put_rsrc_locked(req, ctx);
 
-               if (req->task != task) {
-                       if (task)
-                               io_put_task(task, task_refs);
-                       task = req->task;
-                       task_refs = 0;
-               }
-               task_refs++;
+               io_put_task(req->task);
                node = req->comp_list.next;
                io_req_add_to_cache(req, ctx);
        } while (node);
-
-       if (task)
-               io_put_task(task, task_refs);
 }
 
 static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
@@ -1570,7 +1557,7 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
                        }
                }
        }
-       __io_cq_unlock_post_flush(ctx);
+       __io_cq_unlock_post(ctx);
 
        if (!wq_list_empty(&ctx->submit_state.compl_reqs)) {
                io_free_batch_list(ctx, state->compl_reqs.first);
@@ -1578,22 +1565,6 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
        }
 }
 
-/*
- * Drop reference to request, return next in chain (if there is one) if this
- * was the last reference to this request.
- */
-static inline struct io_kiocb *io_put_req_find_next(struct io_kiocb *req)
-{
-       struct io_kiocb *nxt = NULL;
-
-       if (req_ref_put_and_test(req)) {
-               if (unlikely(req->flags & IO_REQ_LINK_FLAGS))
-                       nxt = io_req_find_next(req);
-               io_free_req(req);
-       }
-       return nxt;
-}
-
 static unsigned io_cqring_events(struct io_ring_ctx *ctx)
 {
        /* See comment at the top of this file */
@@ -1758,54 +1729,14 @@ static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags)
        }
 }
 
-static bool io_bdev_nowait(struct block_device *bdev)
-{
-       return !bdev || bdev_nowait(bdev);
-}
-
-/*
- * If we tracked the file through the SCM inflight mechanism, we could support
- * any file. For now, just ensure that anything potentially problematic is done
- * inline.
- */
-static bool __io_file_supports_nowait(struct file *file, umode_t mode)
-{
-       if (S_ISBLK(mode)) {
-               if (IS_ENABLED(CONFIG_BLOCK) &&
-                   io_bdev_nowait(I_BDEV(file->f_mapping->host)))
-                       return true;
-               return false;
-       }
-       if (S_ISSOCK(mode))
-               return true;
-       if (S_ISREG(mode)) {
-               if (IS_ENABLED(CONFIG_BLOCK) &&
-                   io_bdev_nowait(file->f_inode->i_sb->s_bdev) &&
-                   !io_is_uring_fops(file))
-                       return true;
-               return false;
-       }
-
-       /* any ->read/write should understand O_NONBLOCK */
-       if (file->f_flags & O_NONBLOCK)
-               return true;
-       return file->f_mode & FMODE_NOWAIT;
-}
-
-/*
- * If we tracked the file through the SCM inflight mechanism, we could support
- * any file. For now, just ensure that anything potentially problematic is done
- * inline.
- */
 unsigned int io_file_get_flags(struct file *file)
 {
-       umode_t mode = file_inode(file)->i_mode;
        unsigned int res = 0;
 
-       if (S_ISREG(mode))
-               res |= FFS_ISREG;
-       if (__io_file_supports_nowait(file, mode))
-               res |= FFS_NOWAIT;
+       if (S_ISREG(file_inode(file)->i_mode))
+               res |= REQ_F_ISREG;
+       if ((file->f_flags & O_NONBLOCK) || (file->f_mode & FMODE_NOWAIT))
+               res |= REQ_F_SUPPORT_NOWAIT;
        return res;
 }
 
@@ -1891,39 +1822,6 @@ queue:
        spin_unlock(&ctx->completion_lock);
 }
 
-static void io_clean_op(struct io_kiocb *req)
-{
-       if (req->flags & REQ_F_BUFFER_SELECTED) {
-               spin_lock(&req->ctx->completion_lock);
-               io_put_kbuf_comp(req);
-               spin_unlock(&req->ctx->completion_lock);
-       }
-
-       if (req->flags & REQ_F_NEED_CLEANUP) {
-               const struct io_cold_def *def = &io_cold_defs[req->opcode];
-
-               if (def->cleanup)
-                       def->cleanup(req);
-       }
-       if ((req->flags & REQ_F_POLLED) && req->apoll) {
-               kfree(req->apoll->double_poll);
-               kfree(req->apoll);
-               req->apoll = NULL;
-       }
-       if (req->flags & REQ_F_INFLIGHT) {
-               struct io_uring_task *tctx = req->task->io_uring;
-
-               atomic_dec(&tctx->inflight_tracked);
-       }
-       if (req->flags & REQ_F_CREDS)
-               put_cred(req->creds);
-       if (req->flags & REQ_F_ASYNC_DATA) {
-               kfree(req->async_data);
-               req->async_data = NULL;
-       }
-       req->flags &= ~IO_REQ_CLEAN_FLAGS;
-}
-
 static bool io_assign_file(struct io_kiocb *req, const struct io_issue_def *def,
                           unsigned int issue_flags)
 {
@@ -1986,9 +1884,14 @@ int io_poll_issue(struct io_kiocb *req, struct io_tw_state *ts)
 struct io_wq_work *io_wq_free_work(struct io_wq_work *work)
 {
        struct io_kiocb *req = container_of(work, struct io_kiocb, work);
+       struct io_kiocb *nxt = NULL;
 
-       req = io_put_req_find_next(req);
-       return req ? &req->work : NULL;
+       if (req_ref_put_and_test(req)) {
+               if (req->flags & IO_REQ_LINK_FLAGS)
+                       nxt = io_req_find_next(req);
+               io_free_req(req);
+       }
+       return nxt ? &nxt->work : NULL;
 }
 
 void io_wq_submit_work(struct io_wq_work *work)
@@ -2060,19 +1963,17 @@ inline struct file *io_file_get_fixed(struct io_kiocb *req, int fd,
                                      unsigned int issue_flags)
 {
        struct io_ring_ctx *ctx = req->ctx;
+       struct io_fixed_file *slot;
        struct file *file = NULL;
-       unsigned long file_ptr;
 
        io_ring_submit_lock(ctx, issue_flags);
 
        if (unlikely((unsigned int)fd >= ctx->nr_user_files))
                goto out;
        fd = array_index_nospec(fd, ctx->nr_user_files);
-       file_ptr = io_fixed_file_slot(&ctx->file_table, fd)->file_ptr;
-       file = (struct file *) (file_ptr & FFS_MASK);
-       file_ptr &= ~FFS_MASK;
-       /* mask in overlapping REQ_F and FFS bits */
-       req->flags |= (file_ptr << REQ_F_SUPPORT_NOWAIT_BIT);
+       slot = io_fixed_file_slot(&ctx->file_table, fd);
+       file = io_slot_file(slot);
+       req->flags |= io_slot_flags(slot);
        io_req_set_rsrc_node(req, ctx, 0);
 out:
        io_ring_submit_unlock(ctx, issue_flags);
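io_slot_file() and io_slot_flags() are small helpers added to io_uring/filetable.h by this series; they decode the same tagged file_ptr that the removed FFS_MASK open-coding did. Roughly (a paraphrase under that assumption, not the verbatim header):

    static inline struct file *io_slot_file(struct io_fixed_file *slot)
    {
            return (struct file *)(slot->file_ptr & FFS_MASK);
    }

    static inline unsigned int io_slot_flags(struct io_fixed_file *slot)
    {
            /* the stored bits now map directly onto REQ_F_* request flags */
            return (slot->file_ptr & ~FFS_MASK) << REQ_F_SUPPORT_NOWAIT_BIT;
    }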
@@ -2709,11 +2610,96 @@ static void io_mem_free(void *ptr)
                free_compound_page(page);
 }
 
+static void io_pages_free(struct page ***pages, int npages)
+{
+       struct page **page_array;
+       int i;
+
+       if (!pages)
+               return;
+       page_array = *pages;
+       for (i = 0; i < npages; i++)
+               unpin_user_page(page_array[i]);
+       kvfree(page_array);
+       *pages = NULL;
+}
+
+static void *__io_uaddr_map(struct page ***pages, unsigned short *npages,
+                           unsigned long uaddr, size_t size)
+{
+       struct page **page_array;
+       unsigned int nr_pages;
+       int ret;
+
+       *npages = 0;
+
+       if (uaddr & (PAGE_SIZE - 1) || !size)
+               return ERR_PTR(-EINVAL);
+
+       nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+       if (nr_pages > USHRT_MAX)
+               return ERR_PTR(-EINVAL);
+       page_array = kvmalloc_array(nr_pages, sizeof(struct page *), GFP_KERNEL);
+       if (!page_array)
+               return ERR_PTR(-ENOMEM);
+
+       ret = pin_user_pages_fast(uaddr, nr_pages, FOLL_WRITE | FOLL_LONGTERM,
+                                       page_array);
+       if (ret != nr_pages) {
+err:
+               io_pages_free(&page_array, ret > 0 ? ret : 0);
+               return ret < 0 ? ERR_PTR(ret) : ERR_PTR(-EFAULT);
+       }
+       /*
+        * Should be a single page. If the ring is small enough that we can
+        * use a normal page, that is fine. If we need multiple pages, then
+        * userspace should use a huge page. That's the only way to guarantee
+        * that we get contiguous memory, outside of just being lucky or
+        * (currently) having low memory fragmentation.
+        */
+       if (page_array[0] != page_array[ret - 1])
+               goto err;
+       *pages = page_array;
+       *npages = nr_pages;
+       return page_to_virt(page_array[0]);
+}
+
+static void *io_rings_map(struct io_ring_ctx *ctx, unsigned long uaddr,
+                         size_t size)
+{
+       return __io_uaddr_map(&ctx->ring_pages, &ctx->n_ring_pages, uaddr,
+                               size);
+}
+
+static void *io_sqes_map(struct io_ring_ctx *ctx, unsigned long uaddr,
+                        size_t size)
+{
+       return __io_uaddr_map(&ctx->sqe_pages, &ctx->n_sqe_pages, uaddr,
+                               size);
+}
+
+static void io_rings_free(struct io_ring_ctx *ctx)
+{
+       if (!(ctx->flags & IORING_SETUP_NO_MMAP)) {
+               io_mem_free(ctx->rings);
+               io_mem_free(ctx->sq_sqes);
+               ctx->rings = NULL;
+               ctx->sq_sqes = NULL;
+       } else {
+               io_pages_free(&ctx->ring_pages, ctx->n_ring_pages);
+               io_pages_free(&ctx->sqe_pages, ctx->n_sqe_pages);
+       }
+}
+
 static void *io_mem_alloc(size_t size)
 {
        gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP;
+       void *ret;
 
-       return (void *) __get_free_pages(gfp, get_order(size));
+       ret = (void *) __get_free_pages(gfp, get_order(size));
+       if (ret)
+               return ret;
+       return ERR_PTR(-ENOMEM);
 }
 
 static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries,
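For context on the IORING_SETUP_NO_MMAP path above: the application allocates the ring and SQE memory itself, passes the addresses in through the new user_addr fields, and the kernel pins the pages with pin_user_pages_fast() instead of allocating them. A minimal userspace sketch under those assumptions (raw syscall, no error handling; huge pages keep the pinned range physically contiguous):

    #include <linux/io_uring.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static int setup_user_mem_ring(unsigned int entries)
    {
            struct io_uring_params p;
            void *ring_mem, *sqe_mem;

            /* one huge page each for the SQ/CQ rings and for the SQE array */
            ring_mem = mmap(NULL, 2 * 1024 * 1024, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
            sqe_mem = mmap(NULL, 2 * 1024 * 1024, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

            memset(&p, 0, sizeof(p));
            p.flags = IORING_SETUP_NO_MMAP;
            p.cq_off.user_addr = (uint64_t)(uintptr_t)ring_mem; /* rings */
            p.sq_off.user_addr = (uint64_t)(uintptr_t)sqe_mem;  /* SQE array */

            return (int)syscall(__NR_io_uring_setup, entries, &p);
    }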
@@ -2869,8 +2855,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
                mmdrop(ctx->mm_account);
                ctx->mm_account = NULL;
        }
-       io_mem_free(ctx->rings);
-       io_mem_free(ctx->sq_sqes);
+       io_rings_free(ctx);
 
        percpu_ref_exit(&ctx->refs);
        free_uid(ctx->user);
@@ -3050,7 +3035,18 @@ static __cold void io_ring_exit_work(struct work_struct *work)
                        /* there is little hope left, don't run it too often */
                        interval = HZ * 60;
                }
-       } while (!wait_for_completion_timeout(&ctx->ref_comp, interval));
+               /*
+                * This is really an uninterruptible wait, as it has to be
+                * complete. But it's also run from a kworker, which doesn't
+                * take signals, so it's fine to make it interruptible. This
+                * avoids scenarios where we knowingly can wait much longer
+                * on completions, for example if someone does a SIGSTOP on
+                * a task that needs to finish task_work to make this loop
+                * complete. That's a synthetic situation that should not
+                * cause a stuck task backtrace, and hence a potential panic
+                * on stuck tasks if that is enabled.
+                */
+       } while (!wait_for_completion_interruptible_timeout(&ctx->ref_comp, interval));
 
        init_completion(&exit.completion);
        init_task_work(&exit.task_work, io_tctx_exit_cb);
@@ -3074,7 +3070,12 @@ static __cold void io_ring_exit_work(struct work_struct *work)
                        continue;
 
                mutex_unlock(&ctx->uring_lock);
-               wait_for_completion(&exit.completion);
+               /*
+                * See comment above for
+                * wait_for_completion_interruptible_timeout() on why this
+                * wait is marked as interruptible.
+                */
+               wait_for_completion_interruptible(&exit.completion);
                mutex_lock(&ctx->uring_lock);
        }
        mutex_unlock(&ctx->uring_lock);
@@ -3348,6 +3349,10 @@ static void *io_uring_validate_mmap_request(struct file *file,
        struct page *page;
        void *ptr;
 
+       /* Don't allow mmap if the ring was setup without it */
+       if (ctx->flags & IORING_SETUP_NO_MMAP)
+               return ERR_PTR(-EINVAL);
+
        switch (offset & IORING_OFF_MMAP_MASK) {
        case IORING_OFF_SQ_RING:
        case IORING_OFF_CQ_RING:
@@ -3673,6 +3678,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 {
        struct io_rings *rings;
        size_t size, sq_array_offset;
+       void *ptr;
 
        /* make sure these are sane, as we already accounted them */
        ctx->sq_entries = p->sq_entries;
@@ -3682,9 +3688,13 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
        if (size == SIZE_MAX)
                return -EOVERFLOW;
 
-       rings = io_mem_alloc(size);
-       if (!rings)
-               return -ENOMEM;
+       if (!(ctx->flags & IORING_SETUP_NO_MMAP))
+               rings = io_mem_alloc(size);
+       else
+               rings = io_rings_map(ctx, p->cq_off.user_addr, size);
+
+       if (IS_ERR(rings))
+               return PTR_ERR(rings);
 
        ctx->rings = rings;
        ctx->sq_array = (u32 *)((char *)rings + sq_array_offset);
@@ -3698,34 +3708,31 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
        else
                size = array_size(sizeof(struct io_uring_sqe), p->sq_entries);
        if (size == SIZE_MAX) {
-               io_mem_free(ctx->rings);
-               ctx->rings = NULL;
+               io_rings_free(ctx);
                return -EOVERFLOW;
        }
 
-       ctx->sq_sqes = io_mem_alloc(size);
-       if (!ctx->sq_sqes) {
-               io_mem_free(ctx->rings);
-               ctx->rings = NULL;
-               return -ENOMEM;
+       if (!(ctx->flags & IORING_SETUP_NO_MMAP))
+               ptr = io_mem_alloc(size);
+       else
+               ptr = io_sqes_map(ctx, p->sq_off.user_addr, size);
+
+       if (IS_ERR(ptr)) {
+               io_rings_free(ctx);
+               return PTR_ERR(ptr);
        }
 
+       ctx->sq_sqes = ptr;
        return 0;
 }
 
-static int io_uring_install_fd(struct io_ring_ctx *ctx, struct file *file)
+static int io_uring_install_fd(struct file *file)
 {
-       int ret, fd;
+       int fd;
 
        fd = get_unused_fd_flags(O_RDWR | O_CLOEXEC);
        if (fd < 0)
                return fd;
-
-       ret = __io_uring_add_tctx_node(ctx);
-       if (ret) {
-               put_unused_fd(fd);
-               return ret;
-       }
        fd_install(fd, file);
        return fd;
 }
@@ -3765,6 +3772,7 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
                                  struct io_uring_params __user *params)
 {
        struct io_ring_ctx *ctx;
+       struct io_uring_task *tctx;
        struct file *file;
        int ret;
 
@@ -3776,6 +3784,10 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
                entries = IORING_MAX_ENTRIES;
        }
 
+       if ((p->flags & IORING_SETUP_REGISTERED_FD_ONLY)
+           && !(p->flags & IORING_SETUP_NO_MMAP))
+               return -EINVAL;
+
        /*
         * Use twice as many entries for the CQ ring. It's possible for the
         * application to drive a higher depth than the size of the SQ ring,
@@ -3887,7 +3899,6 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
        if (ret)
                goto err;
 
-       memset(&p->sq_off, 0, sizeof(p->sq_off));
        p->sq_off.head = offsetof(struct io_rings, sq.head);
        p->sq_off.tail = offsetof(struct io_rings, sq.tail);
        p->sq_off.ring_mask = offsetof(struct io_rings, sq_ring_mask);
@@ -3895,8 +3906,10 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
        p->sq_off.flags = offsetof(struct io_rings, sq_flags);
        p->sq_off.dropped = offsetof(struct io_rings, sq_dropped);
        p->sq_off.array = (char *)ctx->sq_array - (char *)ctx->rings;
+       p->sq_off.resv1 = 0;
+       if (!(ctx->flags & IORING_SETUP_NO_MMAP))
+               p->sq_off.user_addr = 0;
 
-       memset(&p->cq_off, 0, sizeof(p->cq_off));
        p->cq_off.head = offsetof(struct io_rings, cq.head);
        p->cq_off.tail = offsetof(struct io_rings, cq.tail);
        p->cq_off.ring_mask = offsetof(struct io_rings, cq_ring_mask);
@@ -3904,6 +3917,9 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
        p->cq_off.overflow = offsetof(struct io_rings, cq_overflow);
        p->cq_off.cqes = offsetof(struct io_rings, cqes);
        p->cq_off.flags = offsetof(struct io_rings, cq_flags);
+       p->cq_off.resv1 = 0;
+       if (!(ctx->flags & IORING_SETUP_NO_MMAP))
+               p->cq_off.user_addr = 0;
 
        p->features = IORING_FEAT_SINGLE_MMAP | IORING_FEAT_NODROP |
                        IORING_FEAT_SUBMIT_STABLE | IORING_FEAT_RW_CUR_POS |
@@ -3928,22 +3944,30 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
                goto err;
        }
 
+       ret = __io_uring_add_tctx_node(ctx);
+       if (ret)
+               goto err_fput;
+       tctx = current->io_uring;
+
        /*
         * Install ring fd as the very last thing, so we don't risk someone
         * having closed it before we finish setup
         */
-       ret = io_uring_install_fd(ctx, file);
-       if (ret < 0) {
-               /* fput will clean it up */
-               fput(file);
-               return ret;
-       }
+       if (p->flags & IORING_SETUP_REGISTERED_FD_ONLY)
+               ret = io_ring_add_registered_file(tctx, file, 0, IO_RINGFD_REG_MAX);
+       else
+               ret = io_uring_install_fd(file);
+       if (ret < 0)
+               goto err_fput;
 
        trace_io_uring_create(ret, ctx, p->sq_entries, p->cq_entries, p->flags);
        return ret;
 err:
        io_ring_ctx_wait_and_kill(ctx);
        return ret;
+err_fput:
+       fput(file);
+       return ret;
 }
 
 /*
@@ -3969,7 +3993,8 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
                        IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
                        IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
                        IORING_SETUP_SQE128 | IORING_SETUP_CQE32 |
-                       IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN))
+                       IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN |
+                       IORING_SETUP_NO_MMAP | IORING_SETUP_REGISTERED_FD_ONLY))
                return -EINVAL;
 
        return io_uring_create(entries, &p, params);
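IORING_SETUP_REGISTERED_FD_ONLY (which, per the check in io_uring_create() above, also requires IORING_SETUP_NO_MMAP) makes setup return a registered-ring index instead of installing a file descriptor, so later io_uring_enter() calls must flag the first argument accordingly. A hedged sketch of the calling convention, continuing the userspace example above:

    /* 'ring' is what io_uring_setup() returned with REGISTERED_FD_ONLY set:
     * an index into the task's registered-ring table, not a real fd. */
    static int enter_registered_ring(unsigned int ring, unsigned int to_submit,
                                     unsigned int min_complete)
    {
            return (int)syscall(__NR_io_uring_enter, ring, to_submit,
                                min_complete,
                                IORING_ENTER_GETEVENTS | IORING_ENTER_REGISTERED_RING,
                                NULL, 0);
    }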
index 259bf79..d3606d3 100644 (file)
@@ -16,9 +16,6 @@
 #endif
 
 enum {
-       /* don't use deferred task_work */
-       IOU_F_TWQ_FORCE_NORMAL                  = 1,
-
        /*
         * A hint to not wake right away but delay until there are enough of
         * tw's queued to match the number of CQEs the task is waiting for.
@@ -26,7 +23,7 @@ enum {
         * Must not be used with requests generating more than one CQE.
         * It's also ignored unless IORING_SETUP_DEFER_TASKRUN is set.
         */
-       IOU_F_TWQ_LAZY_WAKE                     = 2,
+       IOU_F_TWQ_LAZY_WAKE                     = 1,
 };
 
 enum {
@@ -47,7 +44,7 @@ int io_run_task_work_sig(struct io_ring_ctx *ctx);
 void io_req_defer_failed(struct io_kiocb *req, s32 res);
 void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags);
 bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags);
-bool io_aux_cqe(struct io_ring_ctx *ctx, bool defer, u64 user_data, s32 res, u32 cflags,
+bool io_aux_cqe(const struct io_kiocb *req, bool defer, s32 res, u32 cflags,
                bool allow_overflow);
 void __io_commit_cqring_flush(struct io_ring_ctx *ctx);
 
@@ -57,11 +54,6 @@ struct file *io_file_get_normal(struct io_kiocb *req, int fd);
 struct file *io_file_get_fixed(struct io_kiocb *req, int fd,
                               unsigned issue_flags);
 
-static inline bool io_req_ffs_set(struct io_kiocb *req)
-{
-       return req->flags & REQ_F_FIXED_FILE;
-}
-
 void __io_req_task_work_add(struct io_kiocb *req, unsigned flags);
 bool io_is_uring_fops(struct file *file);
 bool io_alloc_async_data(struct io_kiocb *req);
@@ -75,6 +67,9 @@ __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd);
 int io_uring_alloc_task_context(struct task_struct *task,
                                struct io_ring_ctx *ctx);
 
+int io_ring_add_registered_file(struct io_uring_task *tctx, struct file *file,
+                                    int start, int end);
+
 int io_poll_issue(struct io_kiocb *req, struct io_tw_state *ts);
 int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr);
 int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin);
@@ -115,8 +110,6 @@ static inline void io_req_task_work_add(struct io_kiocb *req)
 #define io_for_each_link(pos, head) \
        for (pos = (head); pos; pos = pos->link)
 
-void io_cq_unlock_post(struct io_ring_ctx *ctx);
-
 static inline struct io_uring_cqe *io_get_cqe_overflow(struct io_ring_ctx *ctx,
                                                       bool overflow)
 {
index 85fd7ce..cd6dcf6 100644 (file)
@@ -162,14 +162,12 @@ static struct file *io_msg_grab_file(struct io_kiocb *req, unsigned int issue_fl
        struct io_msg *msg = io_kiocb_to_cmd(req, struct io_msg);
        struct io_ring_ctx *ctx = req->ctx;
        struct file *file = NULL;
-       unsigned long file_ptr;
        int idx = msg->src_fd;
 
        io_ring_submit_lock(ctx, issue_flags);
        if (likely(idx < ctx->nr_user_files)) {
                idx = array_index_nospec(idx, ctx->nr_user_files);
-               file_ptr = io_fixed_file_slot(&ctx->file_table, idx)->file_ptr;
-               file = (struct file *) (file_ptr & FFS_MASK);
+               file = io_file_from_index(&ctx->file_table, idx);
                if (file)
                        get_file(file);
        }
index 89e8390..a8e3037 100644 (file)
@@ -65,6 +65,7 @@ struct io_sr_msg {
        u16                             addr_len;
        u16                             buf_group;
        void __user                     *addr;
+       void __user                     *msg_control;
        /* used only for send zerocopy */
        struct io_kiocb                 *notif;
 };
@@ -195,11 +196,15 @@ static int io_sendmsg_copy_hdr(struct io_kiocb *req,
                               struct io_async_msghdr *iomsg)
 {
        struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
+       int ret;
 
        iomsg->msg.msg_name = &iomsg->addr;
        iomsg->free_iov = iomsg->fast_iov;
-       return sendmsg_copy_msghdr(&iomsg->msg, sr->umsg, sr->msg_flags,
+       ret = sendmsg_copy_msghdr(&iomsg->msg, sr->umsg, sr->msg_flags,
                                        &iomsg->free_iov);
+       /* save msg_control as sys_sendmsg() overwrites it */
+       sr->msg_control = iomsg->msg.msg_control_user;
+       return ret;
 }
 
 int io_send_prep_async(struct io_kiocb *req)
@@ -297,6 +302,7 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 
        if (req_has_async_data(req)) {
                kmsg = req->async_data;
+               kmsg->msg.msg_control_user = sr->msg_control;
        } else {
                ret = io_sendmsg_copy_hdr(req, &iomsg);
                if (ret)
@@ -320,6 +326,8 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
                if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
                        return io_setup_async_msg(req, kmsg, issue_flags);
                if (ret > 0 && io_net_retry(sock, flags)) {
+                       kmsg->msg.msg_controllen = 0;
+                       kmsg->msg.msg_control = NULL;
                        sr->done_io += ret;
                        req->flags |= REQ_F_PARTIAL_IO;
                        return io_setup_async_msg(req, kmsg, issue_flags);
@@ -616,9 +624,15 @@ static inline void io_recv_prep_retry(struct io_kiocb *req)
  * again (for multishot).
  */
 static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
-                                 unsigned int cflags, bool mshot_finished,
+                                 struct msghdr *msg, bool mshot_finished,
                                  unsigned issue_flags)
 {
+       unsigned int cflags;
+
+       cflags = io_put_kbuf(req, issue_flags);
+       if (msg->msg_inq && msg->msg_inq != -1U)
+               cflags |= IORING_CQE_F_SOCK_NONEMPTY;
+
        if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
                io_req_set_res(req, *ret, cflags);
                *ret = IOU_OK;
@@ -626,10 +640,18 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
        }
 
        if (!mshot_finished) {
-               if (io_aux_cqe(req->ctx, issue_flags & IO_URING_F_COMPLETE_DEFER,
-                              req->cqe.user_data, *ret, cflags | IORING_CQE_F_MORE, true)) {
+               if (io_aux_cqe(req, issue_flags & IO_URING_F_COMPLETE_DEFER,
+                              *ret, cflags | IORING_CQE_F_MORE, true)) {
                        io_recv_prep_retry(req);
-                       return false;
+                       /* Known not-empty or unknown state, retry */
+                       if (cflags & IORING_CQE_F_SOCK_NONEMPTY ||
+                           msg->msg_inq == -1U)
+                               return false;
+                       if (issue_flags & IO_URING_F_MULTISHOT)
+                               *ret = IOU_ISSUE_SKIP_COMPLETE;
+                       else
+                               *ret = -EAGAIN;
+                       return true;
                }
                /* Otherwise stop multishot but use the current result. */
        }
@@ -732,7 +754,6 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
        struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
        struct io_async_msghdr iomsg, *kmsg;
        struct socket *sock;
-       unsigned int cflags;
        unsigned flags;
        int ret, min_ret = 0;
        bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
@@ -781,16 +802,20 @@ retry_multishot:
        flags = sr->msg_flags;
        if (force_nonblock)
                flags |= MSG_DONTWAIT;
-       if (flags & MSG_WAITALL)
-               min_ret = iov_iter_count(&kmsg->msg.msg_iter);
 
        kmsg->msg.msg_get_inq = 1;
-       if (req->flags & REQ_F_APOLL_MULTISHOT)
+       kmsg->msg.msg_inq = -1U;
+       if (req->flags & REQ_F_APOLL_MULTISHOT) {
                ret = io_recvmsg_multishot(sock, sr, kmsg, flags,
                                           &mshot_finished);
-       else
+       } else {
+               /* disable partial retry for recvmsg with cmsg attached */
+               if (flags & MSG_WAITALL && !kmsg->msg.msg_controllen)
+                       min_ret = iov_iter_count(&kmsg->msg.msg_iter);
+
                ret = __sys_recvmsg_sock(sock, &kmsg->msg, sr->umsg,
                                         kmsg->uaddr, flags);
+       }
 
        if (ret < min_ret) {
                if (ret == -EAGAIN && force_nonblock) {
@@ -820,11 +845,7 @@ retry_multishot:
        else
                io_kbuf_recycle(req, issue_flags);
 
-       cflags = io_put_kbuf(req, issue_flags);
-       if (kmsg->msg.msg_inq)
-               cflags |= IORING_CQE_F_SOCK_NONEMPTY;
-
-       if (!io_recv_finish(req, &ret, cflags, mshot_finished, issue_flags))
+       if (!io_recv_finish(req, &ret, &kmsg->msg, mshot_finished, issue_flags))
                goto retry_multishot;
 
        if (mshot_finished) {
@@ -843,7 +864,6 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
        struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
        struct msghdr msg;
        struct socket *sock;
-       unsigned int cflags;
        unsigned flags;
        int ret, min_ret = 0;
        bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
@@ -860,6 +880,14 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
        if (unlikely(!sock))
                return -ENOTSOCK;
 
+       msg.msg_name = NULL;
+       msg.msg_namelen = 0;
+       msg.msg_control = NULL;
+       msg.msg_get_inq = 1;
+       msg.msg_controllen = 0;
+       msg.msg_iocb = NULL;
+       msg.msg_ubuf = NULL;
+
 retry_multishot:
        if (io_do_buffer_select(req)) {
                void __user *buf;
@@ -874,14 +902,8 @@ retry_multishot:
        if (unlikely(ret))
                goto out_free;
 
-       msg.msg_name = NULL;
-       msg.msg_namelen = 0;
-       msg.msg_control = NULL;
-       msg.msg_get_inq = 1;
+       msg.msg_inq = -1U;
        msg.msg_flags = 0;
-       msg.msg_controllen = 0;
-       msg.msg_iocb = NULL;
-       msg.msg_ubuf = NULL;
 
        flags = sr->msg_flags;
        if (force_nonblock)
@@ -921,11 +943,7 @@ out_free:
        else
                io_kbuf_recycle(req, issue_flags);
 
-       cflags = io_put_kbuf(req, issue_flags);
-       if (msg.msg_inq)
-               cflags |= IORING_CQE_F_SOCK_NONEMPTY;
-
-       if (!io_recv_finish(req, &ret, cflags, ret <= 0, issue_flags))
+       if (!io_recv_finish(req, &ret, &msg, ret <= 0, issue_flags))
                goto retry_multishot;
 
        return ret;
@@ -1297,7 +1315,6 @@ int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 
 int io_accept(struct io_kiocb *req, unsigned int issue_flags)
 {
-       struct io_ring_ctx *ctx = req->ctx;
        struct io_accept *accept = io_kiocb_to_cmd(req, struct io_accept);
        bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
        unsigned int file_flags = force_nonblock ? O_NONBLOCK : 0;
@@ -1347,8 +1364,8 @@ retry:
 
        if (ret < 0)
                return ret;
-       if (io_aux_cqe(ctx, issue_flags & IO_URING_F_COMPLETE_DEFER,
-                      req->cqe.user_data, ret, IORING_CQE_F_MORE, true))
+       if (io_aux_cqe(req, issue_flags & IO_URING_F_COMPLETE_DEFER, ret,
+                      IORING_CQE_F_MORE, true))
                goto retry;
 
        return -ECANCELED;
index c90e47d..d4597ef 100644 (file)
@@ -300,8 +300,8 @@ static int io_poll_check_events(struct io_kiocb *req, struct io_tw_state *ts)
                        __poll_t mask = mangle_poll(req->cqe.res &
                                                    req->apoll_events);
 
-                       if (!io_aux_cqe(req->ctx, ts->locked, req->cqe.user_data,
-                                       mask, IORING_CQE_F_MORE, false)) {
+                       if (!io_aux_cqe(req, ts->locked, mask,
+                                       IORING_CQE_F_MORE, false)) {
                                io_req_set_res(req, mask, 0);
                                return IOU_POLL_REMOVE_POLL_USE_RES;
                        }
@@ -326,7 +326,7 @@ static int io_poll_check_events(struct io_kiocb *req, struct io_tw_state *ts)
        return IOU_POLL_NO_ACTION;
 }
 
-static void io_poll_task_func(struct io_kiocb *req, struct io_tw_state *ts)
+void io_poll_task_func(struct io_kiocb *req, struct io_tw_state *ts)
 {
        int ret;
 
@@ -977,8 +977,9 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
        struct io_hash_bucket *bucket;
        struct io_kiocb *preq;
        int ret2, ret = 0;
-       struct io_tw_state ts = {};
+       struct io_tw_state ts = { .locked = true };
 
+       io_ring_submit_lock(ctx, issue_flags);
        preq = io_poll_find(ctx, true, &cd, &ctx->cancel_table, &bucket);
        ret2 = io_poll_disarm(preq);
        if (bucket)
@@ -990,12 +991,10 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
                goto out;
        }
 
-       io_ring_submit_lock(ctx, issue_flags);
        preq = io_poll_find(ctx, true, &cd, &ctx->cancel_table_locked, &bucket);
        ret2 = io_poll_disarm(preq);
        if (bucket)
                spin_unlock(&bucket->lock);
-       io_ring_submit_unlock(ctx, issue_flags);
        if (ret2) {
                ret = ret2;
                goto out;
@@ -1019,7 +1018,7 @@ found:
                if (poll_update->update_user_data)
                        preq->cqe.user_data = poll_update->new_user_data;
 
-               ret2 = io_poll_add(preq, issue_flags);
+               ret2 = io_poll_add(preq, issue_flags & ~IO_URING_F_UNLOCKED);
                /* successfully updated, don't complete poll request */
                if (!ret2 || ret2 == -EIOCBQUEUED)
                        goto out;
@@ -1027,9 +1026,9 @@ found:
 
        req_set_fail(preq);
        io_req_set_res(preq, -ECANCELED, 0);
-       ts.locked = !(issue_flags & IO_URING_F_UNLOCKED);
        io_req_task_complete(preq, &ts);
 out:
+       io_ring_submit_unlock(ctx, issue_flags);
        if (ret < 0) {
                req_set_fail(req);
                return ret;
index b2393b4..ff4d5d7 100644 (file)
@@ -38,3 +38,5 @@ bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
                        bool cancel_all);
 
 void io_apoll_cache_free(struct io_cache_entry *entry);
+
+void io_poll_task_func(struct io_kiocb *req, struct io_tw_state *ts);
index d46f72a..a2dce7e 100644 (file)
@@ -354,7 +354,6 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx,
        __s32 __user *fds = u64_to_user_ptr(up->data);
        struct io_rsrc_data *data = ctx->file_data;
        struct io_fixed_file *file_slot;
-       struct file *file;
        int fd, i, err = 0;
        unsigned int done;
 
@@ -382,15 +381,16 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx,
                file_slot = io_fixed_file_slot(&ctx->file_table, i);
 
                if (file_slot->file_ptr) {
-                       file = (struct file *)(file_slot->file_ptr & FFS_MASK);
-                       err = io_queue_rsrc_removal(data, i, file);
+                       err = io_queue_rsrc_removal(data, i,
+                                                   io_slot_file(file_slot));
                        if (err)
                                break;
                        file_slot->file_ptr = 0;
                        io_file_bitmap_clear(&ctx->file_table, i);
                }
                if (fd != -1) {
-                       file = fget(fd);
+                       struct file *file = fget(fd);
+
                        if (!file) {
                                err = -EBADF;
                                break;
index 3f118ed..1bce220 100644 (file)
@@ -283,7 +283,7 @@ static inline int io_fixup_rw_res(struct io_kiocb *req, long res)
        return res;
 }
 
-static void io_req_rw_complete(struct io_kiocb *req, struct io_tw_state *ts)
+void io_req_rw_complete(struct io_kiocb *req, struct io_tw_state *ts)
 {
        io_req_io_end(req);
 
@@ -666,8 +666,8 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
        if (unlikely(!file || !(file->f_mode & mode)))
                return -EBADF;
 
-       if (!io_req_ffs_set(req))
-               req->flags |= io_file_get_flags(file) << REQ_F_SUPPORT_NOWAIT_BIT;
+       if (!(req->flags & REQ_F_FIXED_FILE))
+               req->flags |= io_file_get_flags(file);
 
        kiocb->ki_flags = file->f_iocb_flags;
        ret = kiocb_set_rw_flags(kiocb, rw->flags);
index 3b733f4..4b89f96 100644 (file)
@@ -22,3 +22,4 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags);
 int io_writev_prep_async(struct io_kiocb *req);
 void io_readv_writev_cleanup(struct io_kiocb *req);
 void io_rw_fail(struct io_kiocb *req);
+void io_req_rw_complete(struct io_kiocb *req, struct io_tw_state *ts);
index 9db4bc1..5e329e3 100644 (file)
@@ -255,9 +255,13 @@ static int io_sq_thread(void *data)
                        sqt_spin = true;
 
                if (sqt_spin || !time_after(jiffies, timeout)) {
-                       cond_resched();
                        if (sqt_spin)
                                timeout = jiffies + sqd->sq_thread_idle;
+                       if (unlikely(need_resched())) {
+                               mutex_unlock(&sqd->lock);
+                               cond_resched();
+                               mutex_lock(&sqd->lock);
+                       }
                        continue;
                }
 
index 3a8d1dd..c043fe9 100644 (file)
@@ -208,31 +208,40 @@ void io_uring_unreg_ringfd(void)
        }
 }
 
-static int io_ring_add_registered_fd(struct io_uring_task *tctx, int fd,
+int io_ring_add_registered_file(struct io_uring_task *tctx, struct file *file,
                                     int start, int end)
 {
-       struct file *file;
        int offset;
-
        for (offset = start; offset < end; offset++) {
                offset = array_index_nospec(offset, IO_RINGFD_REG_MAX);
                if (tctx->registered_rings[offset])
                        continue;
 
-               file = fget(fd);
-               if (!file) {
-                       return -EBADF;
-               } else if (!io_is_uring_fops(file)) {
-                       fput(file);
-                       return -EOPNOTSUPP;
-               }
                tctx->registered_rings[offset] = file;
                return offset;
        }
-
        return -EBUSY;
 }
 
+static int io_ring_add_registered_fd(struct io_uring_task *tctx, int fd,
+                                    int start, int end)
+{
+       struct file *file;
+       int offset;
+
+       file = fget(fd);
+       if (!file) {
+               return -EBADF;
+       } else if (!io_is_uring_fops(file)) {
+               fput(file);
+               return -EOPNOTSUPP;
+       }
+       offset = io_ring_add_registered_file(tctx, file, start, end);
+       if (offset < 0)
+               fput(file);
+       return offset;
+}
+
 /*
  * Register a ring fd to avoid fdget/fdput for each io_uring_enter()
  * invocation. User passes in an array of struct io_uring_rsrc_update
index fc95017..fb0547b 100644 (file)
@@ -73,8 +73,8 @@ static void io_timeout_complete(struct io_kiocb *req, struct io_tw_state *ts)
 
        if (!io_timeout_finish(timeout, data)) {
                bool filled;
-               filled = io_aux_cqe(ctx, ts->locked, req->cqe.user_data, -ETIME,
-                                   IORING_CQE_F_MORE, false);
+               filled = io_aux_cqe(req, ts->locked, -ETIME, IORING_CQE_F_MORE,
+                                   false);
                if (filled) {
                        /* re-arm timer */
                        spin_lock_irq(&ctx->timeout_lock);
@@ -594,7 +594,7 @@ int io_timeout(struct io_kiocb *req, unsigned int issue_flags)
                goto add;
        }
 
-       tail = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts);
+       tail = data_race(ctx->cached_cq_tail) - atomic_read(&ctx->cq_timeouts);
        timeout->target_seq = tail + off;
 
        /* Update the last seq here in case io_flush_timeouts() hasn't.
index 5e32db4..476c787 100644 (file)
@@ -20,16 +20,24 @@ static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
        ioucmd->task_work_cb(ioucmd, issue_flags);
 }
 
-void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
-                       void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
+                       void (*task_work_cb)(struct io_uring_cmd *, unsigned),
+                       unsigned flags)
 {
        struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
 
        ioucmd->task_work_cb = task_work_cb;
        req->io_task_work.func = io_uring_cmd_work;
-       io_req_task_work_add(req);
+       __io_req_task_work_add(req, flags);
+}
+EXPORT_SYMBOL_GPL(__io_uring_cmd_do_in_task);
+
+void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
+                       void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+{
+       __io_uring_cmd_do_in_task(ioucmd, task_work_cb, IOU_F_TWQ_LAZY_WAKE);
 }
-EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task);
+EXPORT_SYMBOL_GPL(io_uring_cmd_do_in_task_lazy);
 
 static inline void io_req_set_cqe32_extra(struct io_kiocb *req,
                                          u64 extra1, u64 extra2)
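io_uring_cmd_do_in_task_lazy() gives ->uring_cmd() providers a way to punt completion handling to task context while opting into the lazy-wakeup batching used with IORING_SETUP_DEFER_TASKRUN. A hedged driver-side sketch (the callback names are invented for illustration; io_uring_cmd_done() is the existing completion helper):

    static void my_cmd_task_work(struct io_uring_cmd *ioucmd,
                                 unsigned int issue_flags)
    {
            /* runs later, in the submitting task's context */
            io_uring_cmd_done(ioucmd, 0, 0, issue_flags);
    }

    /* e.g. called from the driver's irq/completion path */
    static void my_cmd_hw_complete(struct io_uring_cmd *ioucmd)
    {
            io_uring_cmd_do_in_task_lazy(ioucmd, my_cmd_task_work);
    }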
index c57b008..94738bc 100644 (file)
@@ -259,8 +259,8 @@ extern struct tty_struct *audit_get_tty(void);
 extern void audit_put_tty(struct tty_struct *tty);
 
 /* audit watch/mark/tree functions */
-#ifdef CONFIG_AUDITSYSCALL
 extern unsigned int audit_serial(void);
+#ifdef CONFIG_AUDITSYSCALL
 extern int auditsc_get_stamp(struct audit_context *ctx,
                              struct timespec64 *t, unsigned int *serial);
 
index 6b682b8..72b32b7 100644 (file)
@@ -744,13 +744,12 @@ static bool btf_name_offset_valid(const struct btf *btf, u32 offset)
        return offset < btf->hdr.str_len;
 }
 
-static bool __btf_name_char_ok(char c, bool first, bool dot_ok)
+static bool __btf_name_char_ok(char c, bool first)
 {
        if ((first ? !isalpha(c) :
                     !isalnum(c)) &&
            c != '_' &&
-           ((c == '.' && !dot_ok) ||
-             c != '.'))
+           c != '.')
                return false;
        return true;
 }
@@ -767,20 +766,20 @@ static const char *btf_str_by_offset(const struct btf *btf, u32 offset)
        return NULL;
 }
 
-static bool __btf_name_valid(const struct btf *btf, u32 offset, bool dot_ok)
+static bool __btf_name_valid(const struct btf *btf, u32 offset)
 {
        /* offset must be valid */
        const char *src = btf_str_by_offset(btf, offset);
        const char *src_limit;
 
-       if (!__btf_name_char_ok(*src, true, dot_ok))
+       if (!__btf_name_char_ok(*src, true))
                return false;
 
        /* set a limit on identifier length */
        src_limit = src + KSYM_NAME_LEN;
        src++;
        while (*src && src < src_limit) {
-               if (!__btf_name_char_ok(*src, false, dot_ok))
+               if (!__btf_name_char_ok(*src, false))
                        return false;
                src++;
        }
@@ -788,17 +787,14 @@ static bool __btf_name_valid(const struct btf *btf, u32 offset, bool dot_ok)
        return !*src;
 }
 
-/* Only C-style identifier is permitted. This can be relaxed if
- * necessary.
- */
 static bool btf_name_valid_identifier(const struct btf *btf, u32 offset)
 {
-       return __btf_name_valid(btf, offset, false);
+       return __btf_name_valid(btf, offset);
 }
 
 static bool btf_name_valid_section(const struct btf *btf, u32 offset)
 {
-       return __btf_name_valid(btf, offset, true);
+       return __btf_name_valid(btf, offset);
 }
 
 static const char *__btf_name_by_offset(const struct btf *btf, u32 offset)
@@ -4422,7 +4418,7 @@ static s32 btf_var_check_meta(struct btf_verifier_env *env,
        }
 
        if (!t->name_off ||
-           !__btf_name_valid(env->btf, t->name_off, true)) {
+           !__btf_name_valid(env->btf, t->name_off)) {
                btf_verifier_log_type(env, t, "Invalid name");
                return -EINVAL;
        }
index 00c253b..9901efe 100644 (file)
@@ -1215,7 +1215,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value
 
        ret = htab_lock_bucket(htab, b, hash, &flags);
        if (ret)
-               return ret;
+               goto err_lock_bucket;
 
        l_old = lookup_elem_raw(head, hash, key, key_size);
 
@@ -1236,6 +1236,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value
 err:
        htab_unlock_bucket(htab, b, hash, flags);
 
+err_lock_bucket:
        if (ret)
                htab_lru_push_free(htab, l_new);
        else if (l_old)
@@ -1338,7 +1339,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
 
        ret = htab_lock_bucket(htab, b, hash, &flags);
        if (ret)
-               return ret;
+               goto err_lock_bucket;
 
        l_old = lookup_elem_raw(head, hash, key, key_size);
 
@@ -1361,6 +1362,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
        ret = 0;
 err:
        htab_unlock_bucket(htab, b, hash, flags);
+err_lock_bucket:
        if (l_new)
                bpf_lru_push_free(&htab->lru, &l_new->lru_node);
        return ret;
index 2c5c64c..cd5eafa 100644 (file)
@@ -69,9 +69,13 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
        /* Misc members not needed in bpf_map_meta_equal() check. */
        inner_map_meta->ops = inner_map->ops;
        if (inner_map->ops == &array_map_ops) {
+               struct bpf_array *inner_array_meta =
+                       container_of(inner_map_meta, struct bpf_array, map);
+               struct bpf_array *inner_array = container_of(inner_map, struct bpf_array, map);
+
+               inner_array_meta->index_mask = inner_array->index_mask;
+               inner_array_meta->elem_size = inner_array->elem_size;
                inner_map_meta->bypass_spec_v1 = inner_map->bypass_spec_v1;
-               container_of(inner_map_meta, struct bpf_array, map)->index_mask =
-                    container_of(inner_map, struct bpf_array, map)->index_mask;
        }
 
        fdput(f);
index d9c9f45..8a26cd8 100644 (file)
@@ -859,4 +859,4 @@ static int __init bpf_offload_init(void)
        return rhashtable_init(&offdevs, &offdevs_params);
 }
 
-late_initcall(bpf_offload_init);
+core_initcall(bpf_offload_init);
index 14f39c1..f1c8733 100644 (file)
@@ -2433,6 +2433,10 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
                default:
                        return -EINVAL;
                }
+       case BPF_PROG_TYPE_NETFILTER:
+               if (expected_attach_type == BPF_NETFILTER)
+                       return 0;
+               return -EINVAL;
        case BPF_PROG_TYPE_SYSCALL:
        case BPF_PROG_TYPE_EXT:
                if (expected_attach_type)
@@ -3436,6 +3440,11 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
                return prog->enforce_expected_attach_type &&
                        prog->expected_attach_type != attach_type ?
                        -EINVAL : 0;
+       case BPF_PROG_TYPE_KPROBE:
+               if (prog->expected_attach_type == BPF_TRACE_KPROBE_MULTI &&
+                   attach_type != BPF_TRACE_KPROBE_MULTI)
+                       return -EINVAL;
+               return 0;
        default:
                return 0;
        }
@@ -4590,7 +4599,12 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
 
        switch (prog->type) {
        case BPF_PROG_TYPE_EXT:
+               break;
        case BPF_PROG_TYPE_NETFILTER:
+               if (attr->link_create.attach_type != BPF_NETFILTER) {
+                       ret = -EINVAL;
+                       goto out;
+               }
                break;
        case BPF_PROG_TYPE_PERF_EVENT:
        case BPF_PROG_TYPE_TRACEPOINT:
index fbcf5a4..cf5f230 100644 (file)
@@ -3868,6 +3868,9 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
                                return err;
                }
                save_register_state(state, spi, reg, size);
+               /* Break the relation on a narrowing spill. */
+               if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
+                       state->stack[spi].spilled_ptr.id = 0;
        } else if (!reg && !(off % BPF_REG_SIZE) && is_bpf_st_mem(insn) &&
                   insn->imm != 0 && env->bpf_capable) {
                struct bpf_reg_state fake_reg = {};
@@ -17033,7 +17036,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
                                        insn_buf[cnt++] = BPF_ALU64_IMM(BPF_RSH,
                                                                        insn->dst_reg,
                                                                        shift);
-                               insn_buf[cnt++] = BPF_ALU64_IMM(BPF_AND, insn->dst_reg,
+                               insn_buf[cnt++] = BPF_ALU32_IMM(BPF_AND, insn->dst_reg,
                                                                (1ULL << size * 8) - 1);
                        }
                }
@@ -17214,9 +17217,10 @@ static int jit_subprogs(struct bpf_verifier_env *env)
        }
 
        /* finally lock prog and jit images for all functions and
-        * populate kallsysm
+        * populate kallsyms. Begin at the first subprogram, since
+        * bpf_prog_load will add the kallsyms for the main program.
         */
-       for (i = 0; i < env->subprog_cnt; i++) {
+       for (i = 1; i < env->subprog_cnt; i++) {
                bpf_prog_lock_ro(func[i]);
                bpf_prog_kallsyms_add(func[i]);
        }
@@ -17242,6 +17246,8 @@ static int jit_subprogs(struct bpf_verifier_env *env)
        prog->jited = 1;
        prog->bpf_func = func[0]->bpf_func;
        prog->jited_len = func[0]->jited_len;
+       prog->aux->extable = func[0]->aux->extable;
+       prog->aux->num_exentries = func[0]->aux->num_exentries;
        prog->aux->func = func;
        prog->aux->func_cnt = env->subprog_cnt;
        bpf_prog_jit_attempt_done(prog);
index 367b0a4..c56071f 100644 (file)
@@ -220,8 +220,6 @@ static inline void get_css_set(struct css_set *cset)
 
 bool cgroup_ssid_enabled(int ssid);
 bool cgroup_on_dfl(const struct cgroup *cgrp);
-bool cgroup_is_thread_root(struct cgroup *cgrp);
-bool cgroup_is_threaded(struct cgroup *cgrp);
 
 struct cgroup_root *cgroup_root_from_kf(struct kernfs_root *kf_root);
 struct cgroup *task_cgroup_from_root(struct task_struct *task,
index aeef06c..8304431 100644 (file)
@@ -108,7 +108,7 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
 
        cgroup_lock();
 
-       percpu_down_write(&cgroup_threadgroup_rwsem);
+       cgroup_attach_lock(true);
 
        /* all tasks in @from are being moved, all csets are source */
        spin_lock_irq(&css_set_lock);
@@ -144,7 +144,7 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
        } while (task && !ret);
 out_err:
        cgroup_migrate_finish(&mgctx);
-       percpu_up_write(&cgroup_threadgroup_rwsem);
+       cgroup_attach_unlock(true);
        cgroup_unlock();
        return ret;
 }
@@ -563,7 +563,7 @@ static ssize_t cgroup_release_agent_write(struct kernfs_open_file *of,
        if (!cgrp)
                return -ENODEV;
        spin_lock(&release_agent_path_lock);
-       strlcpy(cgrp->root->release_agent_path, strstrip(buf),
+       strscpy(cgrp->root->release_agent_path, strstrip(buf),
                sizeof(cgrp->root->release_agent_path));
        spin_unlock(&release_agent_path_lock);
        cgroup_kn_unlock(of->kn);
@@ -797,7 +797,7 @@ void cgroup1_release_agent(struct work_struct *work)
                goto out_free;
 
        spin_lock(&release_agent_path_lock);
-       strlcpy(agentbuf, cgrp->root->release_agent_path, PATH_MAX);
+       strscpy(agentbuf, cgrp->root->release_agent_path, PATH_MAX);
        spin_unlock(&release_agent_path_lock);
        if (!agentbuf[0])
                goto out_free;
index 625d748..bfe3cd8 100644 (file)
@@ -57,6 +57,7 @@
 #include <linux/file.h>
 #include <linux/fs_parser.h>
 #include <linux/sched/cputime.h>
+#include <linux/sched/deadline.h>
 #include <linux/psi.h>
 #include <net/sock.h>
 
@@ -312,8 +313,6 @@ bool cgroup_ssid_enabled(int ssid)
  *   masks of ancestors.
  *
  * - blkcg: blk-throttle becomes properly hierarchical.
- *
- * - debug: disallowed on the default hierarchy.
  */
 bool cgroup_on_dfl(const struct cgroup *cgrp)
 {
@@ -356,7 +355,7 @@ static bool cgroup_has_tasks(struct cgroup *cgrp)
        return cgrp->nr_populated_csets;
 }
 
-bool cgroup_is_threaded(struct cgroup *cgrp)
+static bool cgroup_is_threaded(struct cgroup *cgrp)
 {
        return cgrp->dom_cgrp != cgrp;
 }
@@ -395,7 +394,7 @@ static bool cgroup_can_be_thread_root(struct cgroup *cgrp)
 }
 
 /* is @cgrp root of a threaded subtree? */
-bool cgroup_is_thread_root(struct cgroup *cgrp)
+static bool cgroup_is_thread_root(struct cgroup *cgrp)
 {
        /* thread root should be a domain */
        if (cgroup_is_threaded(cgrp))
@@ -618,7 +617,7 @@ EXPORT_SYMBOL_GPL(cgroup_get_e_css);
 static void cgroup_get_live(struct cgroup *cgrp)
 {
        WARN_ON_ONCE(cgroup_is_dead(cgrp));
-       css_get(&cgrp->self);
+       cgroup_get(cgrp);
 }
 
 /**
@@ -690,21 +689,6 @@ EXPORT_SYMBOL_GPL(of_css);
                else
 
 /**
- * for_each_e_css - iterate all effective css's of a cgroup
- * @css: the iteration cursor
- * @ssid: the index of the subsystem, CGROUP_SUBSYS_COUNT after reaching the end
- * @cgrp: the target cgroup to iterate css's of
- *
- * Should be called under cgroup_[tree_]mutex.
- */
-#define for_each_e_css(css, ssid, cgrp)                                            \
-       for ((ssid) = 0; (ssid) < CGROUP_SUBSYS_COUNT; (ssid)++)            \
-               if (!((css) = cgroup_e_css_by_mask(cgrp,                    \
-                                                  cgroup_subsys[(ssid)]))) \
-                       ;                                                   \
-               else
-
-/**
  * do_each_subsys_mask - filter for_each_subsys with a bitmask
  * @ss: the iteration cursor
  * @ssid: the index of @ss, CGROUP_SUBSYS_COUNT after reaching the end
@@ -1798,7 +1782,7 @@ int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
 {
        struct cgroup *dcgrp = &dst_root->cgrp;
        struct cgroup_subsys *ss;
-       int ssid, i, ret;
+       int ssid, ret;
        u16 dfl_disable_ss_mask = 0;
 
        lockdep_assert_held(&cgroup_mutex);
@@ -1842,7 +1826,8 @@ int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
                struct cgroup_root *src_root = ss->root;
                struct cgroup *scgrp = &src_root->cgrp;
                struct cgroup_subsys_state *css = cgroup_css(scgrp, ss);
-               struct css_set *cset;
+               struct css_set *cset, *cset_pos;
+               struct css_task_iter *it;
 
                WARN_ON(!css || cgroup_css(dcgrp, ss));
 
@@ -1860,9 +1845,22 @@ int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
                css->cgroup = dcgrp;
 
                spin_lock_irq(&css_set_lock);
-               hash_for_each(css_set_table, i, cset, hlist)
+               WARN_ON(!list_empty(&dcgrp->e_csets[ss->id]));
+               list_for_each_entry_safe(cset, cset_pos, &scgrp->e_csets[ss->id],
+                                        e_cset_node[ss->id]) {
                        list_move_tail(&cset->e_cset_node[ss->id],
                                       &dcgrp->e_csets[ss->id]);
+                       /*
+                        * All css_sets of scgrp move to dcgrp together, in the
+                        * same order, so patch in-flight iterators to keep the
+                        * iteration correct. An iterator is always advanced
+                        * right away and finishes when it->cset_pos meets
+                        * it->cset_head, so updating it->cset_head is enough.
+                        */
+                       list_for_each_entry(it, &cset->task_iters, iters_node)
+                               if (it->cset_head == &scgrp->e_csets[ss->id])
+                                       it->cset_head = &dcgrp->e_csets[ss->id];
+               }
                spin_unlock_irq(&css_set_lock);
 
                if (ss->css_rstat_flush) {
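
The comment in the hunk above relies on two properties of css_task_iter: an iterator is advanced as soon as it is positioned, and it terminates when cset_pos reaches cset_head. Because every css_set of the source cgroup is moved to the destination list in the same relative order, retargeting the head pointer of any in-flight iterator is sufficient. A self-contained sketch of that idea with a plain circular list (names and structure are illustrative, not the kernel's):

#include <stdio.h>

/* Minimal circular doubly-linked list in the spirit of struct list_head. */
struct node { struct node *prev, *next; int id; };

static void list_init(struct node *n) { n->prev = n->next = n; }

static void list_move_tail(struct node *n, struct node *head)
{
        n->prev->next = n->next;        /* unlink */
        n->next->prev = n->prev;
        n->prev = head->prev;           /* insert before head, i.e. at the tail */
        n->next = head;
        head->prev->next = n;
        head->prev = n;
}

/* Like css_task_iter, this iterator finishes when pos meets head. */
struct iter { struct node *pos, *head; };

int main(void)
{
        struct node old_head, new_head;
        struct node a = {.id = 1}, b = {.id = 2}, c = {.id = 3};
        struct iter it;

        list_init(&old_head); list_init(&new_head);
        list_init(&a); list_init(&b); list_init(&c);
        list_move_tail(&a, &old_head);
        list_move_tail(&b, &old_head);
        list_move_tail(&c, &old_head);

        /* An in-flight iterator, already past 'a', currently at 'b'. */
        it.pos = &b;
        it.head = &old_head;

        /* Move every node, in order, to the new list, then patch the
         * iterator's head; the analogue of updating it->cset_head. */
        while (old_head.next != &old_head)
                list_move_tail(old_head.next, &new_head);
        if (it.head == &old_head)
                it.head = &new_head;

        /* Iteration continues seamlessly: prints 2, then 3, then stops. */
        for (; it.pos != it.head; it.pos = it.pos->next)
                printf("visit %d\n", it.pos->id);
        return 0;
}
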
@@ -2379,45 +2377,6 @@ int cgroup_path_ns(struct cgroup *cgrp, char *buf, size_t buflen,
 EXPORT_SYMBOL_GPL(cgroup_path_ns);
 
 /**
- * task_cgroup_path - cgroup path of a task in the first cgroup hierarchy
- * @task: target task
- * @buf: the buffer to write the path into
- * @buflen: the length of the buffer
- *
- * Determine @task's cgroup on the first (the one with the lowest non-zero
- * hierarchy_id) cgroup hierarchy and copy its path into @buf.  This
- * function grabs cgroup_mutex and shouldn't be used inside locks used by
- * cgroup controller callbacks.
- *
- * Return value is the same as kernfs_path().
- */
-int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
-{
-       struct cgroup_root *root;
-       struct cgroup *cgrp;
-       int hierarchy_id = 1;
-       int ret;
-
-       cgroup_lock();
-       spin_lock_irq(&css_set_lock);
-
-       root = idr_get_next(&cgroup_hierarchy_idr, &hierarchy_id);
-
-       if (root) {
-               cgrp = task_cgroup_from_root(task, root);
-               ret = cgroup_path_ns_locked(cgrp, buf, buflen, &init_cgroup_ns);
-       } else {
-               /* if no hierarchy exists, everyone is in "/" */
-               ret = strscpy(buf, "/", buflen);
-       }
-
-       spin_unlock_irq(&css_set_lock);
-       cgroup_unlock();
-       return ret;
-}
-EXPORT_SYMBOL_GPL(task_cgroup_path);
-
-/**
  * cgroup_attach_lock - Lock for ->attach()
  * @lock_threadgroup: whether to down_write cgroup_threadgroup_rwsem
  *
@@ -2871,9 +2830,9 @@ int cgroup_migrate(struct task_struct *leader, bool threadgroup,
        struct task_struct *task;
 
        /*
-        * Prevent freeing of tasks while we take a snapshot. Tasks that are
-        * already PF_EXITING could be freed from underneath us unless we
-        * take an rcu_read_lock.
+        * The following thread iteration should be inside an RCU critical
+        * section to prevent tasks from being freed while taking the snapshot.
+        * spin_lock_irq() implies RCU critical section here.
         */
        spin_lock_irq(&css_set_lock);
        task = leader;
@@ -3877,6 +3836,14 @@ static __poll_t cgroup_pressure_poll(struct kernfs_open_file *of,
        return psi_trigger_poll(&ctx->psi.trigger, of->file, pt);
 }
 
+static int cgroup_pressure_open(struct kernfs_open_file *of)
+{
+       if (of->file->f_mode & FMODE_WRITE && !capable(CAP_SYS_RESOURCE))
+               return -EPERM;
+
+       return 0;
+}
+
 static void cgroup_pressure_release(struct kernfs_open_file *of)
 {
        struct cgroup_file_ctx *ctx = of->priv;
@@ -5276,6 +5243,7 @@ static struct cftype cgroup_psi_files[] = {
        {
                .name = "io.pressure",
                .file_offset = offsetof(struct cgroup, psi_files[PSI_IO]),
+               .open = cgroup_pressure_open,
                .seq_show = cgroup_io_pressure_show,
                .write = cgroup_io_pressure_write,
                .poll = cgroup_pressure_poll,
@@ -5284,6 +5252,7 @@ static struct cftype cgroup_psi_files[] = {
        {
                .name = "memory.pressure",
                .file_offset = offsetof(struct cgroup, psi_files[PSI_MEM]),
+               .open = cgroup_pressure_open,
                .seq_show = cgroup_memory_pressure_show,
                .write = cgroup_memory_pressure_write,
                .poll = cgroup_pressure_poll,
@@ -5292,6 +5261,7 @@ static struct cftype cgroup_psi_files[] = {
        {
                .name = "cpu.pressure",
                .file_offset = offsetof(struct cgroup, psi_files[PSI_CPU]),
+               .open = cgroup_pressure_open,
                .seq_show = cgroup_cpu_pressure_show,
                .write = cgroup_cpu_pressure_write,
                .poll = cgroup_pressure_poll,
@@ -5301,6 +5271,7 @@ static struct cftype cgroup_psi_files[] = {
        {
                .name = "irq.pressure",
                .file_offset = offsetof(struct cgroup, psi_files[PSI_IRQ]),
+               .open = cgroup_pressure_open,
                .seq_show = cgroup_irq_pressure_show,
                .write = cgroup_irq_pressure_write,
                .poll = cgroup_pressure_poll,
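
With cgroup_pressure_open() wired into the cftype entries above, opening any of the cgroup PSI pressure files for writing (i.e. to install a trigger) now fails at open() time unless the caller has CAP_SYS_RESOURCE; read-only polling stays unprivileged. A small userspace sketch of what a consumer would observe (the path assumes cgroup2 is mounted at /sys/fs/cgroup):

#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        const char *path = "/sys/fs/cgroup/cpu.pressure";   /* adjust to your mount */
        int rfd, wfd, werr;

        rfd = open(path, O_RDONLY);     /* plain reads stay unprivileged */
        wfd = open(path, O_RDWR);       /* trigger writes need CAP_SYS_RESOURCE */
        werr = errno;

        if (rfd >= 0)
                printf("read-only open ok (fd=%d)\n", rfd);
        if (wfd < 0)
                printf("write open rejected: %s\n", strerror(werr));
        else
                printf("write open ok, caller has CAP_SYS_RESOURCE\n");

        if (rfd >= 0)
                close(rfd);
        if (wfd >= 0)
                close(wfd);
        return 0;
}
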
@@ -6486,19 +6457,18 @@ err:
 static void cgroup_css_set_put_fork(struct kernel_clone_args *kargs)
        __releases(&cgroup_threadgroup_rwsem) __releases(&cgroup_mutex)
 {
+       struct cgroup *cgrp = kargs->cgrp;
+       struct css_set *cset = kargs->cset;
+
        cgroup_threadgroup_change_end(current);
 
-       if (kargs->flags & CLONE_INTO_CGROUP) {
-               struct cgroup *cgrp = kargs->cgrp;
-               struct css_set *cset = kargs->cset;
+       if (cset) {
+               put_css_set(cset);
+               kargs->cset = NULL;
+       }
 
+       if (kargs->flags & CLONE_INTO_CGROUP) {
                cgroup_unlock();
-
-               if (cset) {
-                       put_css_set(cset);
-                       kargs->cset = NULL;
-               }
-
                if (cgrp) {
                        cgroup_put(cgrp);
                        kargs->cgrp = NULL;
@@ -6683,6 +6653,9 @@ void cgroup_exit(struct task_struct *tsk)
        list_add_tail(&tsk->cg_list, &cset->dying_tasks);
        cset->nr_tasks--;
 
+       if (dl_task(tsk))
+               dec_dl_tasks_cs(tsk);
+
        WARN_ON_ONCE(cgroup_task_frozen(tsk));
        if (unlikely(!(tsk->flags & PF_KTHREAD) &&
                     test_bit(CGRP_FREEZE, &task_dfl_cgroup(tsk)->flags)))
index e4ca2dd..58e6f18 100644 (file)
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
 #include <linux/cpuset.h>
-#include <linux/err.h>
-#include <linux/errno.h>
-#include <linux/file.h>
-#include <linux/fs.h>
 #include <linux/init.h>
 #include <linux/interrupt.h>
 #include <linux/kernel.h>
-#include <linux/kmod.h>
-#include <linux/kthread.h>
-#include <linux/list.h>
 #include <linux/mempolicy.h>
 #include <linux/mm.h>
 #include <linux/memory.h>
 #include <linux/export.h>
-#include <linux/mount.h>
-#include <linux/fs_context.h>
-#include <linux/namei.h>
-#include <linux/pagemap.h>
-#include <linux/proc_fs.h>
 #include <linux/rcupdate.h>
 #include <linux/sched.h>
 #include <linux/sched/deadline.h>
 #include <linux/sched/mm.h>
 #include <linux/sched/task.h>
-#include <linux/seq_file.h>
 #include <linux/security.h>
-#include <linux/slab.h>
 #include <linux/spinlock.h>
-#include <linux/stat.h>
-#include <linux/string.h>
-#include <linux/time.h>
-#include <linux/time64.h>
-#include <linux/backing-dev.h>
-#include <linux/sort.h>
 #include <linux/oom.h>
 #include <linux/sched/isolation.h>
-#include <linux/uaccess.h>
-#include <linux/atomic.h>
-#include <linux/mutex.h>
 #include <linux/cgroup.h>
 #include <linux/wait.h>
 
@@ -193,6 +170,14 @@ struct cpuset {
        int use_parent_ecpus;
        int child_ecpus_count;
 
+       /*
+        * number of SCHED_DEADLINE tasks attached to this cpuset, so that we
+        * know when to rebuild associated root domain bandwidth information.
+        */
+       int nr_deadline_tasks;
+       int nr_migrate_dl_tasks;
+       u64 sum_migrate_dl_bw;
+
        /* Invalid partition error code, not lock protected */
        enum prs_errcode prs_err;
 
@@ -245,6 +230,20 @@ static inline struct cpuset *parent_cs(struct cpuset *cs)
        return css_cs(cs->css.parent);
 }
 
+void inc_dl_tasks_cs(struct task_struct *p)
+{
+       struct cpuset *cs = task_cs(p);
+
+       cs->nr_deadline_tasks++;
+}
+
+void dec_dl_tasks_cs(struct task_struct *p)
+{
+       struct cpuset *cs = task_cs(p);
+
+       cs->nr_deadline_tasks--;
+}
+
 /* bits in struct cpuset flags field */
 typedef enum {
        CS_ONLINE,
@@ -366,22 +365,23 @@ static struct cpuset top_cpuset = {
                if (is_cpuset_online(((des_cs) = css_cs((pos_css)))))
 
 /*
- * There are two global locks guarding cpuset structures - cpuset_rwsem and
+ * There are two global locks guarding cpuset structures - cpuset_mutex and
  * callback_lock. We also require taking task_lock() when dereferencing a
  * task's cpuset pointer. See "The task_lock() exception", at the end of this
- * comment.  The cpuset code uses only cpuset_rwsem write lock.  Other
- * kernel subsystems can use cpuset_read_lock()/cpuset_read_unlock() to
- * prevent change to cpuset structures.
+ * comment.  The cpuset code uses only cpuset_mutex. Other kernel subsystems
+ * can use cpuset_lock()/cpuset_unlock() to prevent change to cpuset
+ * structures. Note that cpuset_mutex needs to be a mutex as it is used in
+ * paths that rely on priority inheritance (e.g. scheduler - on RT) for
+ * correctness.
  *
  * A task must hold both locks to modify cpusets.  If a task holds
- * cpuset_rwsem, it blocks others wanting that rwsem, ensuring that it
- * is the only task able to also acquire callback_lock and be able to
- * modify cpusets.  It can perform various checks on the cpuset structure
- * first, knowing nothing will change.  It can also allocate memory while
- * just holding cpuset_rwsem.  While it is performing these checks, various
- * callback routines can briefly acquire callback_lock to query cpusets.
- * Once it is ready to make the changes, it takes callback_lock, blocking
- * everyone else.
+ * cpuset_mutex, it blocks others, ensuring that it is the only task able to
+ * also acquire callback_lock and be able to modify cpusets.  It can perform
+ * various checks on the cpuset structure first, knowing nothing will change.
+ * It can also allocate memory while just holding cpuset_mutex.  While it is
+ * performing these checks, various callback routines can briefly acquire
+ * callback_lock to query cpusets.  Once it is ready to make the changes, it
+ * takes callback_lock, blocking everyone else.
  *
  * Calls to the kernel memory allocator can not be made while holding
  * callback_lock, as that would risk double tripping on callback_lock
@@ -403,16 +403,16 @@ static struct cpuset top_cpuset = {
  * guidelines for accessing subsystem state in kernel/cgroup.c
  */
 
-DEFINE_STATIC_PERCPU_RWSEM(cpuset_rwsem);
+static DEFINE_MUTEX(cpuset_mutex);
 
-void cpuset_read_lock(void)
+void cpuset_lock(void)
 {
-       percpu_down_read(&cpuset_rwsem);
+       mutex_lock(&cpuset_mutex);
 }
 
-void cpuset_read_unlock(void)
+void cpuset_unlock(void)
 {
-       percpu_up_read(&cpuset_rwsem);
+       mutex_unlock(&cpuset_mutex);
 }
 
 static DEFINE_SPINLOCK(callback_lock);
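
The updated comment above gives the reason for moving from a per-CPU rwsem to a plain mutex: on PREEMPT_RT a kernel mutex is rtmutex-based and therefore supports priority inheritance, which the scheduler-side paths taking cpuset_lock() rely on for correctness; a percpu rwsem offers no such boosting. As a userspace analogy only (not kernel code), POSIX exposes the same property explicitly through PTHREAD_PRIO_INHERIT:

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>

int main(void)
{
        pthread_mutexattr_t attr;
        pthread_mutex_t lock;

        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        pthread_mutex_init(&lock, &attr);

        pthread_mutex_lock(&lock);
        /* A low-priority holder is boosted to the priority of the
         * highest-priority waiter until it releases the lock. */
        pthread_mutex_unlock(&lock);

        pthread_mutex_destroy(&lock);
        pthread_mutexattr_destroy(&attr);
        printf("priority-inheriting mutex exercised\n");
        return 0;
}

(Build with -pthread.)
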
@@ -496,7 +496,7 @@ static inline bool partition_is_populated(struct cpuset *cs,
  * One way or another, we guarantee to return some non-empty subset
  * of cpu_online_mask.
  *
- * Call with callback_lock or cpuset_rwsem held.
+ * Call with callback_lock or cpuset_mutex held.
  */
 static void guarantee_online_cpus(struct task_struct *tsk,
                                  struct cpumask *pmask)
@@ -538,7 +538,7 @@ out_unlock:
  * One way or another, we guarantee to return some non-empty subset
  * of node_states[N_MEMORY].
  *
- * Call with callback_lock or cpuset_rwsem held.
+ * Call with callback_lock or cpuset_mutex held.
  */
 static void guarantee_online_mems(struct cpuset *cs, nodemask_t *pmask)
 {
@@ -550,7 +550,7 @@ static void guarantee_online_mems(struct cpuset *cs, nodemask_t *pmask)
 /*
  * update task's spread flag if cpuset's page/slab spread flag is set
  *
- * Call with callback_lock or cpuset_rwsem held. The check can be skipped
+ * Call with callback_lock or cpuset_mutex held. The check can be skipped
  * if on default hierarchy.
  */
 static void cpuset_update_task_spread_flags(struct cpuset *cs,
@@ -575,7 +575,7 @@ static void cpuset_update_task_spread_flags(struct cpuset *cs,
  *
  * One cpuset is a subset of another if all its allowed CPUs and
  * Memory Nodes are a subset of the other, and its exclusive flags
- * are only set if the other's are set.  Call holding cpuset_rwsem.
+ * are only set if the other's are set.  Call holding cpuset_mutex.
  */
 
 static int is_cpuset_subset(const struct cpuset *p, const struct cpuset *q)
@@ -713,7 +713,7 @@ out:
  * If we replaced the flag and mask values of the current cpuset
  * (cur) with those values in the trial cpuset (trial), would
  * our various subset and exclusive rules still be valid?  Presumes
- * cpuset_rwsem held.
+ * cpuset_mutex held.
  *
  * 'cur' is the address of an actual, in-use cpuset.  Operations
  * such as list traversal that depend on the actual address of the
@@ -829,7 +829,7 @@ static void update_domain_attr_tree(struct sched_domain_attr *dattr,
        rcu_read_unlock();
 }
 
-/* Must be called with cpuset_rwsem held.  */
+/* Must be called with cpuset_mutex held.  */
 static inline int nr_cpusets(void)
 {
        /* jump label reference count + the top-level cpuset */
@@ -855,7 +855,7 @@ static inline int nr_cpusets(void)
  * domains when operating in the severe memory shortage situations
  * that could cause allocation failures below.
  *
- * Must be called with cpuset_rwsem held.
+ * Must be called with cpuset_mutex held.
  *
  * The three key local variables below are:
  *    cp - cpuset pointer, used (together with pos_css) to perform a
@@ -1066,11 +1066,14 @@ done:
        return ndoms;
 }
 
-static void update_tasks_root_domain(struct cpuset *cs)
+static void dl_update_tasks_root_domain(struct cpuset *cs)
 {
        struct css_task_iter it;
        struct task_struct *task;
 
+       if (cs->nr_deadline_tasks == 0)
+               return;
+
        css_task_iter_start(&cs->css, 0, &it);
 
        while ((task = css_task_iter_next(&it)))
@@ -1079,12 +1082,12 @@ static void update_tasks_root_domain(struct cpuset *cs)
        css_task_iter_end(&it);
 }
 
-static void rebuild_root_domains(void)
+static void dl_rebuild_rd_accounting(void)
 {
        struct cpuset *cs = NULL;
        struct cgroup_subsys_state *pos_css;
 
-       percpu_rwsem_assert_held(&cpuset_rwsem);
+       lockdep_assert_held(&cpuset_mutex);
        lockdep_assert_cpus_held();
        lockdep_assert_held(&sched_domains_mutex);
 
@@ -1107,7 +1110,7 @@ static void rebuild_root_domains(void)
 
                rcu_read_unlock();
 
-               update_tasks_root_domain(cs);
+               dl_update_tasks_root_domain(cs);
 
                rcu_read_lock();
                css_put(&cs->css);
@@ -1121,7 +1124,7 @@ partition_and_rebuild_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
 {
        mutex_lock(&sched_domains_mutex);
        partition_sched_domains_locked(ndoms_new, doms_new, dattr_new);
-       rebuild_root_domains();
+       dl_rebuild_rd_accounting();
        mutex_unlock(&sched_domains_mutex);
 }
 
@@ -1134,7 +1137,7 @@ partition_and_rebuild_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
  * 'cpus' is removed, then call this routine to rebuild the
  * scheduler's dynamic sched domains.
  *
- * Call with cpuset_rwsem held.  Takes cpus_read_lock().
+ * Call with cpuset_mutex held.  Takes cpus_read_lock().
  */
 static void rebuild_sched_domains_locked(void)
 {
@@ -1145,7 +1148,7 @@ static void rebuild_sched_domains_locked(void)
        int ndoms;
 
        lockdep_assert_cpus_held();
-       percpu_rwsem_assert_held(&cpuset_rwsem);
+       lockdep_assert_held(&cpuset_mutex);
 
        /*
         * If we have raced with CPU hotplug, return early to avoid
@@ -1196,9 +1199,9 @@ static void rebuild_sched_domains_locked(void)
 void rebuild_sched_domains(void)
 {
        cpus_read_lock();
-       percpu_down_write(&cpuset_rwsem);
+       mutex_lock(&cpuset_mutex);
        rebuild_sched_domains_locked();
-       percpu_up_write(&cpuset_rwsem);
+       mutex_unlock(&cpuset_mutex);
        cpus_read_unlock();
 }
 
@@ -1208,7 +1211,7 @@ void rebuild_sched_domains(void)
  * @new_cpus: the temp variable for the new effective_cpus mask
  *
  * Iterate through each task of @cs updating its cpus_allowed to the
- * effective cpuset's.  As this function is called with cpuset_rwsem held,
+ * effective cpuset's.  As this function is called with cpuset_mutex held,
  * cpuset membership stays stable. For top_cpuset, task_cpu_possible_mask()
  * is used instead of effective_cpus to make sure all offline CPUs are also
  * included as hotplug code won't update cpumasks for tasks in top_cpuset.
@@ -1322,7 +1325,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
        int old_prs, new_prs;
        int part_error = PERR_NONE;     /* Partition error? */
 
-       percpu_rwsem_assert_held(&cpuset_rwsem);
+       lockdep_assert_held(&cpuset_mutex);
 
        /*
         * The parent must be a partition root.
@@ -1545,7 +1548,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
  *
  * On legacy hierarchy, effective_cpus will be the same as cpus_allowed.
  *
- * Called with cpuset_rwsem held
+ * Called with cpuset_mutex held
  */
 static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
                                 bool force)
@@ -1705,7 +1708,7 @@ static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs,
        struct cpuset *sibling;
        struct cgroup_subsys_state *pos_css;
 
-       percpu_rwsem_assert_held(&cpuset_rwsem);
+       lockdep_assert_held(&cpuset_mutex);
 
        /*
         * Check all its siblings and call update_cpumasks_hier()
@@ -1955,12 +1958,12 @@ static void *cpuset_being_rebound;
  * @cs: the cpuset in which each task's mems_allowed mask needs to be changed
  *
  * Iterate through each task of @cs updating its mems_allowed to the
- * effective cpuset's.  As this function is called with cpuset_rwsem held,
+ * effective cpuset's.  As this function is called with cpuset_mutex held,
  * cpuset membership stays stable.
  */
 static void update_tasks_nodemask(struct cpuset *cs)
 {
-       static nodemask_t newmems;      /* protected by cpuset_rwsem */
+       static nodemask_t newmems;      /* protected by cpuset_mutex */
        struct css_task_iter it;
        struct task_struct *task;
 
@@ -1973,7 +1976,7 @@ static void update_tasks_nodemask(struct cpuset *cs)
         * take while holding tasklist_lock.  Forks can happen - the
         * mpol_dup() cpuset_being_rebound check will catch such forks,
         * and rebind their vma mempolicies too.  Because we still hold
-        * the global cpuset_rwsem, we know that no other rebind effort
+        * the global cpuset_mutex, we know that no other rebind effort
         * will be contending for the global variable cpuset_being_rebound.
         * It's ok if we rebind the same mm twice; mpol_rebind_mm()
         * is idempotent.  Also migrate pages in each mm to new nodes.
@@ -2019,7 +2022,7 @@ static void update_tasks_nodemask(struct cpuset *cs)
  *
  * On legacy hierarchy, effective_mems will be the same as mems_allowed.
  *
- * Called with cpuset_rwsem held
+ * Called with cpuset_mutex held
  */
 static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)
 {
@@ -2072,7 +2075,7 @@ static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)
  * mempolicies and if the cpuset is marked 'memory_migrate',
  * migrate the tasks pages to the new memory.
  *
- * Call with cpuset_rwsem held. May take callback_lock during call.
+ * Call with cpuset_mutex held. May take callback_lock during call.
  * Will take tasklist_lock, scan tasklist for tasks in cpuset cs,
  * lock each such tasks mm->mmap_lock, scan its vma's and rebind
  * their mempolicies to the cpusets new mems_allowed.
@@ -2164,7 +2167,7 @@ static int update_relax_domain_level(struct cpuset *cs, s64 val)
  * @cs: the cpuset in which each task's spread flags needs to be changed
  *
  * Iterate through each task of @cs updating its spread flags.  As this
- * function is called with cpuset_rwsem held, cpuset membership stays
+ * function is called with cpuset_mutex held, cpuset membership stays
  * stable.
  */
 static void update_tasks_flags(struct cpuset *cs)
@@ -2184,7 +2187,7 @@ static void update_tasks_flags(struct cpuset *cs)
  * cs:         the cpuset to update
  * turning_on:         whether the flag is being set or cleared
  *
- * Call with cpuset_rwsem held.
+ * Call with cpuset_mutex held.
  */
 
 static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
@@ -2234,7 +2237,7 @@ out:
  * @new_prs: new partition root state
  * Return: 0 if successful, != 0 if error
  *
- * Call with cpuset_rwsem held.
+ * Call with cpuset_mutex held.
  */
 static int update_prstate(struct cpuset *cs, int new_prs)
 {
@@ -2472,19 +2475,26 @@ static int cpuset_can_attach_check(struct cpuset *cs)
        return 0;
 }
 
-/* Called by cgroups to determine if a cpuset is usable; cpuset_rwsem held */
+static void reset_migrate_dl_data(struct cpuset *cs)
+{
+       cs->nr_migrate_dl_tasks = 0;
+       cs->sum_migrate_dl_bw = 0;
+}
+
+/* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */
 static int cpuset_can_attach(struct cgroup_taskset *tset)
 {
        struct cgroup_subsys_state *css;
-       struct cpuset *cs;
+       struct cpuset *cs, *oldcs;
        struct task_struct *task;
        int ret;
 
        /* used later by cpuset_attach() */
        cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css));
+       oldcs = cpuset_attach_old_cs;
        cs = css_cs(css);
 
-       percpu_down_write(&cpuset_rwsem);
+       mutex_lock(&cpuset_mutex);
 
        /* Check to see if task is allowed in the cpuset */
        ret = cpuset_can_attach_check(cs);
@@ -2492,21 +2502,46 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
                goto out_unlock;
 
        cgroup_taskset_for_each(task, css, tset) {
-               ret = task_can_attach(task, cs->effective_cpus);
+               ret = task_can_attach(task);
                if (ret)
                        goto out_unlock;
                ret = security_task_setscheduler(task);
                if (ret)
                        goto out_unlock;
+
+               if (dl_task(task)) {
+                       cs->nr_migrate_dl_tasks++;
+                       cs->sum_migrate_dl_bw += task->dl.dl_bw;
+               }
        }
 
+       if (!cs->nr_migrate_dl_tasks)
+               goto out_success;
+
+       if (!cpumask_intersects(oldcs->effective_cpus, cs->effective_cpus)) {
+               int cpu = cpumask_any_and(cpu_active_mask, cs->effective_cpus);
+
+               if (unlikely(cpu >= nr_cpu_ids)) {
+                       reset_migrate_dl_data(cs);
+                       ret = -EINVAL;
+                       goto out_unlock;
+               }
+
+               ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
+               if (ret) {
+                       reset_migrate_dl_data(cs);
+                       goto out_unlock;
+               }
+       }
+
+out_success:
        /*
         * Mark attach is in progress.  This makes validate_change() fail
         * changes which zero cpus/mems_allowed.
         */
        cs->attach_in_progress++;
 out_unlock:
-       percpu_up_write(&cpuset_rwsem);
+       mutex_unlock(&cpuset_mutex);
        return ret;
 }
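
The new block in cpuset_can_attach() above counts the SCHED_DEADLINE tasks being moved and accumulates their dl_bw so that, when the destination cpuset shares no CPU with the source, the total bandwidth can be reserved on the destination root domain via dl_bw_alloc() before the attach proceeds (and handed back on cancel). Roughly, each deadline task contributes its runtime/period utilisation; the kernel keeps this as a fixed-point value, but as a back-of-the-envelope illustration:

#include <stdio.h>

/* Illustration only: what sum_migrate_dl_bw conceptually adds up.
 * The in-kernel dl_bw values are fixed-point, not doubles. */
int main(void)
{
        double task_a = 10e6 / 100e6;   /* 10ms runtime every 100ms -> 0.10 CPU */
        double task_b = 30e6 / 200e6;   /* 30ms runtime every 200ms -> 0.15 CPU */
        double sum = task_a + task_b;   /* bandwidth to reserve on the target */

        printf("sum of migrating DL bandwidth ~= %.2f of one CPU\n", sum);
        return 0;
}
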
 
@@ -2518,15 +2553,23 @@ static void cpuset_cancel_attach(struct cgroup_taskset *tset)
        cgroup_taskset_first(tset, &css);
        cs = css_cs(css);
 
-       percpu_down_write(&cpuset_rwsem);
+       mutex_lock(&cpuset_mutex);
        cs->attach_in_progress--;
        if (!cs->attach_in_progress)
                wake_up(&cpuset_attach_wq);
-       percpu_up_write(&cpuset_rwsem);
+
+       if (cs->nr_migrate_dl_tasks) {
+               int cpu = cpumask_any(cs->effective_cpus);
+
+               dl_bw_free(cpu, cs->sum_migrate_dl_bw);
+               reset_migrate_dl_data(cs);
+       }
+
+       mutex_unlock(&cpuset_mutex);
 }
 
 /*
- * Protected by cpuset_rwsem. cpus_attach is used only by cpuset_attach_task()
+ * Protected by cpuset_mutex. cpus_attach is used only by cpuset_attach_task()
  * but we can't allocate it dynamically there.  Define it global and
  * allocate from cpuset_init().
  */
@@ -2535,7 +2578,7 @@ static nodemask_t cpuset_attach_nodemask_to;
 
 static void cpuset_attach_task(struct cpuset *cs, struct task_struct *task)
 {
-       percpu_rwsem_assert_held(&cpuset_rwsem);
+       lockdep_assert_held(&cpuset_mutex);
 
        if (cs != &top_cpuset)
                guarantee_online_cpus(task, cpus_attach);
@@ -2565,7 +2608,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
        cs = css_cs(css);
 
        lockdep_assert_cpus_held();     /* see cgroup_attach_lock() */
-       percpu_down_write(&cpuset_rwsem);
+       mutex_lock(&cpuset_mutex);
        cpus_updated = !cpumask_equal(cs->effective_cpus,
                                      oldcs->effective_cpus);
        mems_updated = !nodes_equal(cs->effective_mems, oldcs->effective_mems);
@@ -2622,11 +2665,17 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 out:
        cs->old_mems_allowed = cpuset_attach_nodemask_to;
 
+       if (cs->nr_migrate_dl_tasks) {
+               cs->nr_deadline_tasks += cs->nr_migrate_dl_tasks;
+               oldcs->nr_deadline_tasks -= cs->nr_migrate_dl_tasks;
+               reset_migrate_dl_data(cs);
+       }
+
        cs->attach_in_progress--;
        if (!cs->attach_in_progress)
                wake_up(&cpuset_attach_wq);
 
-       percpu_up_write(&cpuset_rwsem);
+       mutex_unlock(&cpuset_mutex);
 }
 
 /* The various types of files and directories in a cpuset file system */
@@ -2658,7 +2707,7 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
        int retval = 0;
 
        cpus_read_lock();
-       percpu_down_write(&cpuset_rwsem);
+       mutex_lock(&cpuset_mutex);
        if (!is_cpuset_online(cs)) {
                retval = -ENODEV;
                goto out_unlock;
@@ -2694,7 +2743,7 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
                break;
        }
 out_unlock:
-       percpu_up_write(&cpuset_rwsem);
+       mutex_unlock(&cpuset_mutex);
        cpus_read_unlock();
        return retval;
 }
@@ -2707,7 +2756,7 @@ static int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
        int retval = -ENODEV;
 
        cpus_read_lock();
-       percpu_down_write(&cpuset_rwsem);
+       mutex_lock(&cpuset_mutex);
        if (!is_cpuset_online(cs))
                goto out_unlock;
 
@@ -2720,7 +2769,7 @@ static int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
                break;
        }
 out_unlock:
-       percpu_up_write(&cpuset_rwsem);
+       mutex_unlock(&cpuset_mutex);
        cpus_read_unlock();
        return retval;
 }
@@ -2753,7 +2802,7 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
         * operation like this one can lead to a deadlock through kernfs
         * active_ref protection.  Let's break the protection.  Losing the
         * protection is okay as we check whether @cs is online after
-        * grabbing cpuset_rwsem anyway.  This only happens on the legacy
+        * grabbing cpuset_mutex anyway.  This only happens on the legacy
         * hierarchies.
         */
        css_get(&cs->css);
@@ -2761,7 +2810,7 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
        flush_work(&cpuset_hotplug_work);
 
        cpus_read_lock();
-       percpu_down_write(&cpuset_rwsem);
+       mutex_lock(&cpuset_mutex);
        if (!is_cpuset_online(cs))
                goto out_unlock;
 
@@ -2785,7 +2834,7 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
 
        free_cpuset(trialcs);
 out_unlock:
-       percpu_up_write(&cpuset_rwsem);
+       mutex_unlock(&cpuset_mutex);
        cpus_read_unlock();
        kernfs_unbreak_active_protection(of->kn);
        css_put(&cs->css);
@@ -2933,13 +2982,13 @@ static ssize_t sched_partition_write(struct kernfs_open_file *of, char *buf,
 
        css_get(&cs->css);
        cpus_read_lock();
-       percpu_down_write(&cpuset_rwsem);
+       mutex_lock(&cpuset_mutex);
        if (!is_cpuset_online(cs))
                goto out_unlock;
 
        retval = update_prstate(cs, val);
 out_unlock:
-       percpu_up_write(&cpuset_rwsem);
+       mutex_unlock(&cpuset_mutex);
        cpus_read_unlock();
        css_put(&cs->css);
        return retval ?: nbytes;
@@ -3156,7 +3205,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
                return 0;
 
        cpus_read_lock();
-       percpu_down_write(&cpuset_rwsem);
+       mutex_lock(&cpuset_mutex);
 
        set_bit(CS_ONLINE, &cs->flags);
        if (is_spread_page(parent))
@@ -3207,7 +3256,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
        cpumask_copy(cs->effective_cpus, parent->cpus_allowed);
        spin_unlock_irq(&callback_lock);
 out_unlock:
-       percpu_up_write(&cpuset_rwsem);
+       mutex_unlock(&cpuset_mutex);
        cpus_read_unlock();
        return 0;
 }
@@ -3228,7 +3277,7 @@ static void cpuset_css_offline(struct cgroup_subsys_state *css)
        struct cpuset *cs = css_cs(css);
 
        cpus_read_lock();
-       percpu_down_write(&cpuset_rwsem);
+       mutex_lock(&cpuset_mutex);
 
        if (is_partition_valid(cs))
                update_prstate(cs, 0);
@@ -3247,7 +3296,7 @@ static void cpuset_css_offline(struct cgroup_subsys_state *css)
        cpuset_dec();
        clear_bit(CS_ONLINE, &cs->flags);
 
-       percpu_up_write(&cpuset_rwsem);
+       mutex_unlock(&cpuset_mutex);
        cpus_read_unlock();
 }
 
@@ -3260,7 +3309,7 @@ static void cpuset_css_free(struct cgroup_subsys_state *css)
 
 static void cpuset_bind(struct cgroup_subsys_state *root_css)
 {
-       percpu_down_write(&cpuset_rwsem);
+       mutex_lock(&cpuset_mutex);
        spin_lock_irq(&callback_lock);
 
        if (is_in_v2_mode()) {
@@ -3273,7 +3322,7 @@ static void cpuset_bind(struct cgroup_subsys_state *root_css)
        }
 
        spin_unlock_irq(&callback_lock);
-       percpu_up_write(&cpuset_rwsem);
+       mutex_unlock(&cpuset_mutex);
 }
 
 /*
@@ -3294,14 +3343,14 @@ static int cpuset_can_fork(struct task_struct *task, struct css_set *cset)
                return 0;
 
        lockdep_assert_held(&cgroup_mutex);
-       percpu_down_write(&cpuset_rwsem);
+       mutex_lock(&cpuset_mutex);
 
        /* Check to see if task is allowed in the cpuset */
        ret = cpuset_can_attach_check(cs);
        if (ret)
                goto out_unlock;
 
-       ret = task_can_attach(task, cs->effective_cpus);
+       ret = task_can_attach(task);
        if (ret)
                goto out_unlock;
 
@@ -3315,7 +3364,7 @@ static int cpuset_can_fork(struct task_struct *task, struct css_set *cset)
         */
        cs->attach_in_progress++;
 out_unlock:
-       percpu_up_write(&cpuset_rwsem);
+       mutex_unlock(&cpuset_mutex);
        return ret;
 }
 
@@ -3331,11 +3380,11 @@ static void cpuset_cancel_fork(struct task_struct *task, struct css_set *cset)
        if (same_cs)
                return;
 
-       percpu_down_write(&cpuset_rwsem);
+       mutex_lock(&cpuset_mutex);
        cs->attach_in_progress--;
        if (!cs->attach_in_progress)
                wake_up(&cpuset_attach_wq);
-       percpu_up_write(&cpuset_rwsem);
+       mutex_unlock(&cpuset_mutex);
 }
 
 /*
@@ -3363,7 +3412,7 @@ static void cpuset_fork(struct task_struct *task)
        }
 
        /* CLONE_INTO_CGROUP */
-       percpu_down_write(&cpuset_rwsem);
+       mutex_lock(&cpuset_mutex);
        guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
        cpuset_attach_task(cs, task);
 
@@ -3371,7 +3420,7 @@ static void cpuset_fork(struct task_struct *task)
        if (!cs->attach_in_progress)
                wake_up(&cpuset_attach_wq);
 
-       percpu_up_write(&cpuset_rwsem);
+       mutex_unlock(&cpuset_mutex);
 }
 
 struct cgroup_subsys cpuset_cgrp_subsys = {
@@ -3472,7 +3521,7 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
        is_empty = cpumask_empty(cs->cpus_allowed) ||
                   nodes_empty(cs->mems_allowed);
 
-       percpu_up_write(&cpuset_rwsem);
+       mutex_unlock(&cpuset_mutex);
 
        /*
         * Move tasks to the nearest ancestor with execution resources,
@@ -3482,7 +3531,7 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
        if (is_empty)
                remove_tasks_in_empty_cpuset(cs);
 
-       percpu_down_write(&cpuset_rwsem);
+       mutex_lock(&cpuset_mutex);
 }
 
 static void
@@ -3533,14 +3582,14 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
 retry:
        wait_event(cpuset_attach_wq, cs->attach_in_progress == 0);
 
-       percpu_down_write(&cpuset_rwsem);
+       mutex_lock(&cpuset_mutex);
 
        /*
         * We have raced with task attaching. We wait until attaching
         * is finished, so we won't attach a task to an empty cpuset.
         */
        if (cs->attach_in_progress) {
-               percpu_up_write(&cpuset_rwsem);
+               mutex_unlock(&cpuset_mutex);
                goto retry;
        }
 
@@ -3637,7 +3686,7 @@ update_tasks:
                                            cpus_updated, mems_updated);
 
 unlock:
-       percpu_up_write(&cpuset_rwsem);
+       mutex_unlock(&cpuset_mutex);
 }
 
 /**
@@ -3667,7 +3716,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
        if (on_dfl && !alloc_cpumasks(NULL, &tmp))
                ptmp = &tmp;
 
-       percpu_down_write(&cpuset_rwsem);
+       mutex_lock(&cpuset_mutex);
 
        /* fetch the available cpus/mems and find out which changed how */
        cpumask_copy(&new_cpus, cpu_active_mask);
@@ -3724,7 +3773,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
                update_tasks_nodemask(&top_cpuset);
        }
 
-       percpu_up_write(&cpuset_rwsem);
+       mutex_unlock(&cpuset_mutex);
 
        /* if cpus or mems changed, we need to propagate to descendants */
        if (cpus_updated || mems_updated) {
@@ -4155,7 +4204,7 @@ void __cpuset_memory_pressure_bump(void)
  *  - Used for /proc/<pid>/cpuset.
  *  - No need to task_lock(tsk) on this tsk->cpuset reference, as it
  *    doesn't really matter if tsk->cpuset changes after we read it,
- *    and we take cpuset_rwsem, keeping cpuset_attach() from changing it
+ *    and we take cpuset_mutex, keeping cpuset_attach() from changing it
  *    anyway.
  */
 int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns,
index 9364732..122dacb 100644 (file)
@@ -108,16 +108,18 @@ static int freezer_css_online(struct cgroup_subsys_state *css)
        struct freezer *freezer = css_freezer(css);
        struct freezer *parent = parent_freezer(freezer);
 
+       cpus_read_lock();
        mutex_lock(&freezer_mutex);
 
        freezer->state |= CGROUP_FREEZER_ONLINE;
 
        if (parent && (parent->state & CGROUP_FREEZING)) {
                freezer->state |= CGROUP_FREEZING_PARENT | CGROUP_FROZEN;
-               static_branch_inc(&freezer_active);
+               static_branch_inc_cpuslocked(&freezer_active);
        }
 
        mutex_unlock(&freezer_mutex);
+       cpus_read_unlock();
        return 0;
 }
 
@@ -132,14 +134,16 @@ static void freezer_css_offline(struct cgroup_subsys_state *css)
 {
        struct freezer *freezer = css_freezer(css);
 
+       cpus_read_lock();
        mutex_lock(&freezer_mutex);
 
        if (freezer->state & CGROUP_FREEZING)
-               static_branch_dec(&freezer_active);
+               static_branch_dec_cpuslocked(&freezer_active);
 
        freezer->state = 0;
 
        mutex_unlock(&freezer_mutex);
+       cpus_read_unlock();
 }
 
 static void freezer_css_free(struct cgroup_subsys_state *css)
index fe3e8a0..ae2f4dd 100644 (file)
@@ -357,7 +357,6 @@ static struct cftype misc_cg_files[] = {
        {
                .name = "current",
                .seq_show = misc_cg_current_show,
-               .flags = CFTYPE_NOT_ON_ROOT,
        },
        {
                .name = "capacity",
index 3135406..ef5878f 100644 (file)
@@ -197,6 +197,7 @@ uncharge_cg_locked(struct rdma_cgroup *cg,
 
 /**
  * rdmacg_uncharge_hierarchy - hierarchically uncharge rdma resource count
+ * @cg: pointer to cg to uncharge and all parents in hierarchy
  * @device: pointer to rdmacg device
  * @stop_cg: while traversing the hierarchy, stop uncharging when the
  *           stop_cg cgroup is met
@@ -221,6 +222,7 @@ static void rdmacg_uncharge_hierarchy(struct rdma_cgroup *cg,
 
 /**
  * rdmacg_uncharge - hierarchically uncharge rdma resource count
+ * @cg: pointer to cg to uncharge and all parents in hierarchy
  * @device: pointer to rdmacg device
  * @index: index of the resource to uncharge in cgroup in given resource pool
  */
index a09f1c1..6ef0b35 100644 (file)
@@ -510,7 +510,7 @@ void noinstr __ct_user_enter(enum ctx_state state)
                         * In this case we don't care about any concurrency/ordering.
                         */
                        if (!IS_ENABLED(CONFIG_CONTEXT_TRACKING_IDLE))
-                               arch_atomic_set(&ct->state, state);
+                               raw_atomic_set(&ct->state, state);
                } else {
                        /*
                         * Even if context tracking is disabled on this CPU, because it's outside
@@ -527,7 +527,7 @@ void noinstr __ct_user_enter(enum ctx_state state)
                         */
                        if (!IS_ENABLED(CONFIG_CONTEXT_TRACKING_IDLE)) {
                                /* Tracking for vtime only, no concurrent RCU EQS accounting */
-                               arch_atomic_set(&ct->state, state);
+                               raw_atomic_set(&ct->state, state);
                        } else {
                                /*
                                 * Tracking for vtime and RCU EQS. Make sure we don't race
@@ -535,7 +535,7 @@ void noinstr __ct_user_enter(enum ctx_state state)
                                 * RCU only requires RCU_DYNTICKS_IDX increments to be fully
                                 * ordered.
                                 */
-                               arch_atomic_add(state, &ct->state);
+                               raw_atomic_add(state, &ct->state);
                        }
                }
        }
@@ -630,12 +630,12 @@ void noinstr __ct_user_exit(enum ctx_state state)
                         * In this case we don't care about any concurrency/ordering.
                         */
                        if (!IS_ENABLED(CONFIG_CONTEXT_TRACKING_IDLE))
-                               arch_atomic_set(&ct->state, CONTEXT_KERNEL);
+                               raw_atomic_set(&ct->state, CONTEXT_KERNEL);
 
                } else {
                        if (!IS_ENABLED(CONFIG_CONTEXT_TRACKING_IDLE)) {
                                /* Tracking for vtime only, no concurrent RCU EQS accounting */
-                               arch_atomic_set(&ct->state, CONTEXT_KERNEL);
+                               raw_atomic_set(&ct->state, CONTEXT_KERNEL);
                        } else {
                                /*
                                 * Tracking for vtime and RCU EQS. Make sure we don't race
@@ -643,7 +643,7 @@ void noinstr __ct_user_exit(enum ctx_state state)
                                 * RCU only requires RCU_DYNTICKS_IDX increments to be fully
                                 * ordered.
                                 */
-                               arch_atomic_sub(state, &ct->state);
+                               raw_atomic_sub(state, &ct->state);
                        }
                }
        }
index f4a2c58..88a7ede 100644 (file)
@@ -17,6 +17,7 @@
 #include <linux/cpu.h>
 #include <linux/oom.h>
 #include <linux/rcupdate.h>
+#include <linux/delay.h>
 #include <linux/export.h>
 #include <linux/bug.h>
 #include <linux/kthread.h>
@@ -59,6 +60,7 @@
  * @last:      For multi-instance rollback, remember how far we got
  * @cb_state:  The state for a single callback (install/uninstall)
  * @result:    Result of the operation
+ * @ap_sync_state:     State for AP synchronization
  * @done_up:   Signal completion to the issuer of the task for cpu-up
  * @done_down: Signal completion to the issuer of the task for cpu-down
  */
@@ -76,6 +78,7 @@ struct cpuhp_cpu_state {
        struct hlist_node       *last;
        enum cpuhp_state        cb_state;
        int                     result;
+       atomic_t                ap_sync_state;
        struct completion       done_up;
        struct completion       done_down;
 #endif
@@ -276,6 +279,182 @@ static bool cpuhp_is_atomic_state(enum cpuhp_state state)
        return CPUHP_AP_IDLE_DEAD <= state && state < CPUHP_AP_ONLINE;
 }
 
+/* Synchronization state management */
+enum cpuhp_sync_state {
+       SYNC_STATE_DEAD,
+       SYNC_STATE_KICKED,
+       SYNC_STATE_SHOULD_DIE,
+       SYNC_STATE_ALIVE,
+       SYNC_STATE_SHOULD_ONLINE,
+       SYNC_STATE_ONLINE,
+};
+
+#ifdef CONFIG_HOTPLUG_CORE_SYNC
+/**
+ * cpuhp_ap_update_sync_state - Update synchronization state during bringup/teardown
+ * @state:     The synchronization state to set
+ *
+ * No synchronization point. Just update of the synchronization state, but implies
+ * a full barrier so that the AP changes are visible before the control CPU proceeds.
+ */
+static inline void cpuhp_ap_update_sync_state(enum cpuhp_sync_state state)
+{
+       atomic_t *st = this_cpu_ptr(&cpuhp_state.ap_sync_state);
+
+       (void)atomic_xchg(st, state);
+}
+
+void __weak arch_cpuhp_sync_state_poll(void) { cpu_relax(); }
+
+static bool cpuhp_wait_for_sync_state(unsigned int cpu, enum cpuhp_sync_state state,
+                                     enum cpuhp_sync_state next_state)
+{
+       atomic_t *st = per_cpu_ptr(&cpuhp_state.ap_sync_state, cpu);
+       ktime_t now, end, start = ktime_get();
+       int sync;
+
+       end = start + 10ULL * NSEC_PER_SEC;
+
+       sync = atomic_read(st);
+       while (1) {
+               if (sync == state) {
+                       if (!atomic_try_cmpxchg(st, &sync, next_state))
+                               continue;
+                       return true;
+               }
+
+               now = ktime_get();
+               if (now > end) {
+                       /* Timeout. Leave the state unchanged */
+                       return false;
+               } else if (now - start < NSEC_PER_MSEC) {
+                       /* Poll for one millisecond */
+                       arch_cpuhp_sync_state_poll();
+               } else {
+                       usleep_range_state(USEC_PER_MSEC, 2 * USEC_PER_MSEC, TASK_UNINTERRUPTIBLE);
+               }
+               sync = atomic_read(st);
+       }
+       return true;
+}
+#else  /* CONFIG_HOTPLUG_CORE_SYNC */
+static inline void cpuhp_ap_update_sync_state(enum cpuhp_sync_state state) { }
+#endif /* !CONFIG_HOTPLUG_CORE_SYNC */
+
+#ifdef CONFIG_HOTPLUG_CORE_SYNC_DEAD
+/**
+ * cpuhp_ap_report_dead - Update synchronization state to DEAD
+ *
+ * No synchronization point. Just update of the synchronization state.
+ */
+void cpuhp_ap_report_dead(void)
+{
+       cpuhp_ap_update_sync_state(SYNC_STATE_DEAD);
+}
+
+void __weak arch_cpuhp_cleanup_dead_cpu(unsigned int cpu) { }
+
+/*
+ * Late CPU shutdown synchronization point. Cannot use cpuhp_state::done_down
+ * because the AP cannot issue complete() at this stage.
+ */
+static void cpuhp_bp_sync_dead(unsigned int cpu)
+{
+       atomic_t *st = per_cpu_ptr(&cpuhp_state.ap_sync_state, cpu);
+       int sync = atomic_read(st);
+
+       do {
+               /* CPU can have reported dead already. Don't overwrite that! */
+               if (sync == SYNC_STATE_DEAD)
+                       break;
+       } while (!atomic_try_cmpxchg(st, &sync, SYNC_STATE_SHOULD_DIE));
+
+       if (cpuhp_wait_for_sync_state(cpu, SYNC_STATE_DEAD, SYNC_STATE_DEAD)) {
+               /* CPU reached dead state. Invoke the cleanup function */
+               arch_cpuhp_cleanup_dead_cpu(cpu);
+               return;
+       }
+
+       /* No further action possible. Emit message and give up. */
+       pr_err("CPU%u failed to report dead state\n", cpu);
+}
+#else /* CONFIG_HOTPLUG_CORE_SYNC_DEAD */
+static inline void cpuhp_bp_sync_dead(unsigned int cpu) { }
+#endif /* !CONFIG_HOTPLUG_CORE_SYNC_DEAD */
+
+#ifdef CONFIG_HOTPLUG_CORE_SYNC_FULL
+/**
+ * cpuhp_ap_sync_alive - Synchronize AP with the control CPU once it is alive
+ *
+ * Updates the AP synchronization state to SYNC_STATE_ALIVE and waits
+ * for the BP to release it.
+ */
+void cpuhp_ap_sync_alive(void)
+{
+       atomic_t *st = this_cpu_ptr(&cpuhp_state.ap_sync_state);
+
+       cpuhp_ap_update_sync_state(SYNC_STATE_ALIVE);
+
+       /* Wait for the control CPU to release it. */
+       while (atomic_read(st) != SYNC_STATE_SHOULD_ONLINE)
+               cpu_relax();
+}
+
+static bool cpuhp_can_boot_ap(unsigned int cpu)
+{
+       atomic_t *st = per_cpu_ptr(&cpuhp_state.ap_sync_state, cpu);
+       int sync = atomic_read(st);
+
+again:
+       switch (sync) {
+       case SYNC_STATE_DEAD:
+               /* CPU is properly dead */
+               break;
+       case SYNC_STATE_KICKED:
+               /* CPU did not come up in previous attempt */
+               break;
+       case SYNC_STATE_ALIVE:
+               /* CPU is stuck in cpuhp_ap_sync_alive(). */
+               break;
+       default:
+               /* CPU failed to report online or dead and is in limbo state. */
+               return false;
+       }
+
+       /* Prepare for booting */
+       if (!atomic_try_cmpxchg(st, &sync, SYNC_STATE_KICKED))
+               goto again;
+
+       return true;
+}
+
+void __weak arch_cpuhp_cleanup_kick_cpu(unsigned int cpu) { }
+
+/*
+ * Early CPU bringup synchronization point. Cannot use cpuhp_state::done_up
+ * because the AP cannot issue complete() so early in the bringup.
+ */
+static int cpuhp_bp_sync_alive(unsigned int cpu)
+{
+       int ret = 0;
+
+       if (!IS_ENABLED(CONFIG_HOTPLUG_CORE_SYNC_FULL))
+               return 0;
+
+       if (!cpuhp_wait_for_sync_state(cpu, SYNC_STATE_ALIVE, SYNC_STATE_SHOULD_ONLINE)) {
+               pr_err("CPU%u failed to report alive state\n", cpu);
+               ret = -EIO;
+       }
+
+       /* Let the architecture cleanup the kick alive mechanics. */
+       arch_cpuhp_cleanup_kick_cpu(cpu);
+       return ret;
+}
+#else /* CONFIG_HOTPLUG_CORE_SYNC_FULL */
+static inline int cpuhp_bp_sync_alive(unsigned int cpu) { return 0; }
+static inline bool cpuhp_can_boot_ap(unsigned int cpu) { return true; }
+#endif /* !CONFIG_HOTPLUG_CORE_SYNC_FULL */
+
 /* Serializes the updates to cpu_online_mask, cpu_present_mask */
 static DEFINE_MUTEX(cpu_add_remove_lock);
 bool cpuhp_tasks_frozen;
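
The SYNC_STATE_* machinery introduced above is a lock-free handshake between the control CPU (BP) and the hotplugged CPU (AP): the AP publishes ALIVE, ONLINE or DEAD through a per-CPU atomic, and the BP advances the state with try_cmpxchg plus a bounded poll-then-sleep wait in cpuhp_wait_for_sync_state(). A compilable userspace sketch of the same pattern with C11 atomics and two threads; only the state names mirror the enum above, everything else is illustrative:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

enum { ST_DEAD, ST_KICKED, ST_ALIVE, ST_SHOULD_ONLINE, ST_ONLINE };

static _Atomic int sync_state = ST_DEAD;

static void *ap_thread(void *arg)
{
        (void)arg;
        /* AP reports alive, then waits for the BP to release it. */
        atomic_store(&sync_state, ST_ALIVE);
        while (atomic_load(&sync_state) != ST_SHOULD_ONLINE)
                ;       /* spin; the kernel uses cpu_relax()/an arch poll here */
        atomic_store(&sync_state, ST_ONLINE);
        return NULL;
}

/* BP side: wait for 'want', then atomically advance to 'next'
 * (bounded retries instead of the kernel's 10 second timeout). */
static int bp_wait_for(int want, int next)
{
        int seen = atomic_load(&sync_state);

        for (int tries = 0; tries < 10000; tries++) {
                if (seen == want &&
                    atomic_compare_exchange_strong(&sync_state, &seen, next))
                        return 0;
                usleep(100);
                seen = atomic_load(&sync_state);
        }
        return -1;
}

int main(void)
{
        pthread_t ap;

        atomic_store(&sync_state, ST_KICKED);   /* "kick the AP alive" */
        pthread_create(&ap, NULL, ap_thread, NULL);

        if (bp_wait_for(ST_ALIVE, ST_SHOULD_ONLINE) == 0 &&
            bp_wait_for(ST_ONLINE, ST_ONLINE) == 0)
                printf("AP reported online\n");
        else
                printf("AP failed to report state in time\n");

        pthread_join(ap, NULL);
        return 0;
}

(Build with -pthread.)
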
@@ -470,8 +649,23 @@ bool cpu_smt_possible(void)
                cpu_smt_control != CPU_SMT_NOT_SUPPORTED;
 }
 EXPORT_SYMBOL_GPL(cpu_smt_possible);
+
+static inline bool cpuhp_smt_aware(void)
+{
+       return topology_smt_supported();
+}
+
+static inline const struct cpumask *cpuhp_get_primary_thread_mask(void)
+{
+       return cpu_primary_thread_mask;
+}
 #else
 static inline bool cpu_smt_allowed(unsigned int cpu) { return true; }
+static inline bool cpuhp_smt_aware(void) { return false; }
+static inline const struct cpumask *cpuhp_get_primary_thread_mask(void)
+{
+       return cpu_present_mask;
+}
 #endif
 
 static inline enum cpuhp_state
@@ -558,7 +752,7 @@ static int cpuhp_kick_ap(int cpu, struct cpuhp_cpu_state *st,
        return ret;
 }
 
-static int bringup_wait_for_ap(unsigned int cpu)
+static int bringup_wait_for_ap_online(unsigned int cpu)
 {
        struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 
@@ -579,38 +773,94 @@ static int bringup_wait_for_ap(unsigned int cpu)
         */
        if (!cpu_smt_allowed(cpu))
                return -ECANCELED;
+       return 0;
+}
+
+#ifdef CONFIG_HOTPLUG_SPLIT_STARTUP
+static int cpuhp_kick_ap_alive(unsigned int cpu)
+{
+       if (!cpuhp_can_boot_ap(cpu))
+               return -EAGAIN;
+
+       return arch_cpuhp_kick_ap_alive(cpu, idle_thread_get(cpu));
+}
+
+static int cpuhp_bringup_ap(unsigned int cpu)
+{
+       struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
+       int ret;
+
+       /*
+        * Some architectures have to walk the irq descriptors to
+        * setup the vector space for the cpu which comes online.
+        * Prevent irq alloc/free across the bringup.
+        */
+       irq_lock_sparse();
+
+       ret = cpuhp_bp_sync_alive(cpu);
+       if (ret)
+               goto out_unlock;
+
+       ret = bringup_wait_for_ap_online(cpu);
+       if (ret)
+               goto out_unlock;
+
+       irq_unlock_sparse();
 
        if (st->target <= CPUHP_AP_ONLINE_IDLE)
                return 0;
 
        return cpuhp_kick_ap(cpu, st, st->target);
-}
 
+out_unlock:
+       irq_unlock_sparse();
+       return ret;
+}
+#else
 static int bringup_cpu(unsigned int cpu)
 {
+       struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
        struct task_struct *idle = idle_thread_get(cpu);
        int ret;
 
-       /*
-        * Reset stale stack state from the last time this CPU was online.
-        */
-       scs_task_reset(idle);
-       kasan_unpoison_task_stack(idle);
+       if (!cpuhp_can_boot_ap(cpu))
+               return -EAGAIN;
 
        /*
         * Some architectures have to walk the irq descriptors to
         * setup the vector space for the cpu which comes online.
-        * Prevent irq alloc/free across the bringup.
+        *
+        * Prevent irq alloc/free across the bringup by acquiring the
+        * sparse irq lock. Hold it until the upcoming CPU completes the
+        * startup in cpuhp_online_idle(), which makes it possible to avoid
+        * intermediate synchronization points in the architecture code.
         */
        irq_lock_sparse();
 
-       /* Arch-specific enabling code. */
        ret = __cpu_up(cpu, idle);
-       irq_unlock_sparse();
        if (ret)
-               return ret;
-       return bringup_wait_for_ap(cpu);
+               goto out_unlock;
+
+       ret = cpuhp_bp_sync_alive(cpu);
+       if (ret)
+               goto out_unlock;
+
+       ret = bringup_wait_for_ap_online(cpu);
+       if (ret)
+               goto out_unlock;
+
+       irq_unlock_sparse();
+
+       if (st->target <= CPUHP_AP_ONLINE_IDLE)
+               return 0;
+
+       return cpuhp_kick_ap(cpu, st, st->target);
+
+out_unlock:
+       irq_unlock_sparse();
+       return ret;
 }
+#endif
 
 static int finish_cpu(unsigned int cpu)
 {
@@ -1099,6 +1349,8 @@ static int takedown_cpu(unsigned int cpu)
        /* This actually kills the CPU. */
        __cpu_die(cpu);
 
+       cpuhp_bp_sync_dead(cpu);
+
        tick_cleanup_dead_cpu(cpu);
        rcutree_migrate_callbacks(cpu);
        return 0;
@@ -1345,8 +1597,10 @@ void cpuhp_online_idle(enum cpuhp_state state)
        if (state != CPUHP_AP_ONLINE_IDLE)
                return;
 
+       cpuhp_ap_update_sync_state(SYNC_STATE_ONLINE);
+
        /*
-        * Unpart the stopper thread before we start the idle loop (and start
+        * Unpark the stopper thread before we start the idle loop (and start
         * scheduling); this ensures the stopper task is always available.
         */
        stop_machine_unpark(smp_processor_id());
@@ -1383,6 +1637,12 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
                        ret = PTR_ERR(idle);
                        goto out;
                }
+
+               /*
+                * Reset stale stack state from the last time this CPU was online.
+                */
+               scs_task_reset(idle);
+               kasan_unpoison_task_stack(idle);
        }
 
        cpuhp_tasks_frozen = tasks_frozen;
@@ -1502,18 +1762,96 @@ int bringup_hibernate_cpu(unsigned int sleep_cpu)
        return 0;
 }
 
-void bringup_nonboot_cpus(unsigned int setup_max_cpus)
+static void __init cpuhp_bringup_mask(const struct cpumask *mask, unsigned int ncpus,
+                                     enum cpuhp_state target)
 {
        unsigned int cpu;
 
-       for_each_present_cpu(cpu) {
-               if (num_online_cpus() >= setup_max_cpus)
+       for_each_cpu(cpu, mask) {
+               struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
+
+               if (cpu_up(cpu, target) && can_rollback_cpu(st)) {
+                       /*
+                        * If this failed then cpu_up() might have only
+                        * rolled back to CPUHP_BP_KICK_AP for the final
+                        * online. Clean it up. NOOP if already rolled back.
+                        */
+                       WARN_ON(cpuhp_invoke_callback_range(false, cpu, st, CPUHP_OFFLINE));
+               }
+
+               if (!--ncpus)
                        break;
-               if (!cpu_online(cpu))
-                       cpu_up(cpu, CPUHP_ONLINE);
        }
 }
 
+#ifdef CONFIG_HOTPLUG_PARALLEL
+static bool __cpuhp_parallel_bringup __ro_after_init = true;
+
+static int __init parallel_bringup_parse_param(char *arg)
+{
+       return kstrtobool(arg, &__cpuhp_parallel_bringup);
+}
+early_param("cpuhp.parallel", parallel_bringup_parse_param);
+
+/*
+ * On architectures which have enabled parallel bringup this invokes all BP
+ * prepare states for each of the to be onlined APs first. The last state
+ * sends the startup IPI to the APs. The APs proceed through the low level
+ * bringup code in parallel and then wait for the control CPU to release
+ * them one by one for the final onlining procedure.
+ *
+ * This avoids waiting for each AP to respond to the startup IPI in
+ * CPUHP_BRINGUP_CPU.
+ */
+static bool __init cpuhp_bringup_cpus_parallel(unsigned int ncpus)
+{
+       const struct cpumask *mask = cpu_present_mask;
+
+       if (__cpuhp_parallel_bringup)
+               __cpuhp_parallel_bringup = arch_cpuhp_init_parallel_bringup();
+       if (!__cpuhp_parallel_bringup)
+               return false;
+
+       if (cpuhp_smt_aware()) {
+               const struct cpumask *pmask = cpuhp_get_primary_thread_mask();
+               static struct cpumask tmp_mask __initdata;
+
+               /*
+                * X86 requires, for various reasons, that SMT siblings stay
+                * stopped while the primary thread does a microcode update.
+                * Bring the primary threads up first.
+                */
+               cpumask_and(&tmp_mask, mask, pmask);
+               cpuhp_bringup_mask(&tmp_mask, ncpus, CPUHP_BP_KICK_AP);
+               cpuhp_bringup_mask(&tmp_mask, ncpus, CPUHP_ONLINE);
+               /* Account for the online CPUs */
+               ncpus -= num_online_cpus();
+               if (!ncpus)
+                       return true;
+               /* Create the mask for secondary CPUs */
+               cpumask_andnot(&tmp_mask, mask, pmask);
+               mask = &tmp_mask;
+       }
+
+       /* Bring the not-yet started CPUs up */
+       cpuhp_bringup_mask(mask, ncpus, CPUHP_BP_KICK_AP);
+       cpuhp_bringup_mask(mask, ncpus, CPUHP_ONLINE);
+       return true;
+}
+#else
+static inline bool cpuhp_bringup_cpus_parallel(unsigned int ncpus) { return false; }
+#endif /* CONFIG_HOTPLUG_PARALLEL */
+
+void __init bringup_nonboot_cpus(unsigned int setup_max_cpus)
+{
+       /* Try parallel bringup optimization if enabled */
+       if (cpuhp_bringup_cpus_parallel(setup_max_cpus))
+               return;
+
+       /* Full per CPU serialized bringup */
+       cpuhp_bringup_mask(cpu_present_mask, setup_max_cpus, CPUHP_ONLINE);
+}
+
 #ifdef CONFIG_PM_SLEEP_SMP
 static cpumask_var_t frozen_cpus;
 
@@ -1740,13 +2078,38 @@ static struct cpuhp_step cpuhp_hp_states[] = {
                .startup.single         = timers_prepare_cpu,
                .teardown.single        = timers_dead_cpu,
        },
-       /* Kicks the plugged cpu into life */
+
+#ifdef CONFIG_HOTPLUG_SPLIT_STARTUP
+       /*
+        * Kicks the AP alive. The AP will wait in cpuhp_ap_sync_alive() until
+        * the next step releases it.
+        */
+       [CPUHP_BP_KICK_AP] = {
+               .name                   = "cpu:kick_ap",
+               .startup.single         = cpuhp_kick_ap_alive,
+       },
+
+       /*
+        * Waits for the AP to reach cpuhp_ap_sync_alive() and then
+        * releases it for the complete bringup.
+        */
+       [CPUHP_BRINGUP_CPU] = {
+               .name                   = "cpu:bringup",
+               .startup.single         = cpuhp_bringup_ap,
+               .teardown.single        = finish_cpu,
+               .cant_stop              = true,
+       },
+#else
+       /*
+        * All-in-one CPU bringup state which includes the kick alive.
+        */
        [CPUHP_BRINGUP_CPU] = {
                .name                   = "cpu:bringup",
                .startup.single         = bringup_cpu,
                .teardown.single        = finish_cpu,
                .cant_stop              = true,
        },
+#endif
        /* Final state before CPU kills itself */
        [CPUHP_AP_IDLE_DEAD] = {
                .name                   = "idle:dead",
@@ -2723,6 +3086,7 @@ void __init boot_cpu_hotplug_init(void)
 {
 #ifdef CONFIG_SMP
        cpumask_set_cpu(smp_processor_id(), &cpus_booted_once_mask);
+       atomic_set(this_cpu_ptr(&cpuhp_state.ap_sync_state), SYNC_STATE_ONLINE);
 #endif
        this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
        this_cpu_write(cpuhp_state.target, CPUHP_ONLINE);
index 68baa81..b2f1053 100644 (file)
@@ -6647,7 +6647,7 @@ static void perf_sigtrap(struct perf_event *event)
                return;
 
        send_sig_perf((void __user *)event->pending_addr,
-                     event->attr.type, event->attr.sig_data);
+                     event->orig_type, event->attr.sig_data);
 }
 
 /*
@@ -9951,6 +9951,9 @@ static void sw_perf_event_destroy(struct perf_event *event)
        swevent_hlist_put();
 }
 
+static struct pmu perf_cpu_clock; /* fwd declaration */
+static struct pmu perf_task_clock;
+
 static int perf_swevent_init(struct perf_event *event)
 {
        u64 event_id = event->attr.config;
@@ -9966,7 +9969,10 @@ static int perf_swevent_init(struct perf_event *event)
 
        switch (event_id) {
        case PERF_COUNT_SW_CPU_CLOCK:
+               event->attr.type = perf_cpu_clock.type;
+               return -ENOENT;
        case PERF_COUNT_SW_TASK_CLOCK:
+               event->attr.type = perf_task_clock.type;
                return -ENOENT;
 
        default:
@@ -10150,8 +10156,20 @@ void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size,
        perf_trace_buf_update(record, event_type);
 
        hlist_for_each_entry_rcu(event, head, hlist_entry) {
-               if (perf_tp_event_match(event, &data, regs))
+               if (perf_tp_event_match(event, &data, regs)) {
                        perf_swevent_event(event, count, &data, regs);
+
+                       /*
+                        * Use the same on-stack perf_sample_data here;
+                        * some members in data are event-specific and
+                        * need to be re-computed for different swevents.
+                        * Re-initialize data->sample_flags safely to avoid
+                        * the problem that the next event skips preparing data
+                        * because data->sample_flags is set.
+                        */
+                       perf_sample_data_init(&data, 0, 0);
+                       perf_sample_save_raw_data(&data, &raw);
+               }
        }
 
        /*
@@ -11086,7 +11104,7 @@ static void cpu_clock_event_read(struct perf_event *event)
 
 static int cpu_clock_event_init(struct perf_event *event)
 {
-       if (event->attr.type != PERF_TYPE_SOFTWARE)
+       if (event->attr.type != perf_cpu_clock.type)
                return -ENOENT;
 
        if (event->attr.config != PERF_COUNT_SW_CPU_CLOCK)
@@ -11107,6 +11125,7 @@ static struct pmu perf_cpu_clock = {
        .task_ctx_nr    = perf_sw_context,
 
        .capabilities   = PERF_PMU_CAP_NO_NMI,
+       .dev            = PMU_NULL_DEV,
 
        .event_init     = cpu_clock_event_init,
        .add            = cpu_clock_event_add,
@@ -11167,7 +11186,7 @@ static void task_clock_event_read(struct perf_event *event)
 
 static int task_clock_event_init(struct perf_event *event)
 {
-       if (event->attr.type != PERF_TYPE_SOFTWARE)
+       if (event->attr.type != perf_task_clock.type)
                return -ENOENT;
 
        if (event->attr.config != PERF_COUNT_SW_TASK_CLOCK)
@@ -11188,6 +11207,7 @@ static struct pmu perf_task_clock = {
        .task_ctx_nr    = perf_sw_context,
 
        .capabilities   = PERF_PMU_CAP_NO_NMI,
+       .dev            = PMU_NULL_DEV,
 
        .event_init     = task_clock_event_init,
        .add            = task_clock_event_add,
@@ -11415,31 +11435,31 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
                goto unlock;
 
        pmu->type = -1;
-       if (!name)
-               goto skip_type;
+       if (WARN_ONCE(!name, "Can not register anonymous pmu.\n")) {
+               ret = -EINVAL;
+               goto free_pdc;
+       }
+
        pmu->name = name;
 
-       if (type != PERF_TYPE_SOFTWARE) {
-               if (type >= 0)
-                       max = type;
+       if (type >= 0)
+               max = type;
 
-               ret = idr_alloc(&pmu_idr, pmu, max, 0, GFP_KERNEL);
-               if (ret < 0)
-                       goto free_pdc;
+       ret = idr_alloc(&pmu_idr, pmu, max, 0, GFP_KERNEL);
+       if (ret < 0)
+               goto free_pdc;
 
-               WARN_ON(type >= 0 && ret != type);
+       WARN_ON(type >= 0 && ret != type);
 
-               type = ret;
-       }
+       type = ret;
        pmu->type = type;
 
-       if (pmu_bus_running) {
+       if (pmu_bus_running && !pmu->dev) {
                ret = pmu_dev_alloc(pmu);
                if (ret)
                        goto free_idr;
        }
 
-skip_type:
        ret = -ENOMEM;
        pmu->cpu_pmu_context = alloc_percpu(struct perf_cpu_pmu_context);
        if (!pmu->cpu_pmu_context)
@@ -11481,16 +11501,7 @@ skip_type:
        if (!pmu->event_idx)
                pmu->event_idx = perf_event_idx_default;
 
-       /*
-        * Ensure the TYPE_SOFTWARE PMUs are at the head of the list,
-        * since these cannot be in the IDR. This way the linear search
-        * is fast, provided a valid software event is provided.
-        */
-       if (type == PERF_TYPE_SOFTWARE || !name)
-               list_add_rcu(&pmu->entry, &pmus);
-       else
-               list_add_tail_rcu(&pmu->entry, &pmus);
-
+       list_add_rcu(&pmu->entry, &pmus);
        atomic_set(&pmu->exclusive_cnt, 0);
        ret = 0;
 unlock:
@@ -11499,12 +11510,13 @@ unlock:
        return ret;
 
 free_dev:
-       device_del(pmu->dev);
-       put_device(pmu->dev);
+       if (pmu->dev && pmu->dev != PMU_NULL_DEV) {
+               device_del(pmu->dev);
+               put_device(pmu->dev);
+       }
 
 free_idr:
-       if (pmu->type != PERF_TYPE_SOFTWARE)
-               idr_remove(&pmu_idr, pmu->type);
+       idr_remove(&pmu_idr, pmu->type);
 
 free_pdc:
        free_percpu(pmu->pmu_disable_count);
@@ -11525,9 +11537,8 @@ void perf_pmu_unregister(struct pmu *pmu)
        synchronize_rcu();
 
        free_percpu(pmu->pmu_disable_count);
-       if (pmu->type != PERF_TYPE_SOFTWARE)
-               idr_remove(&pmu_idr, pmu->type);
-       if (pmu_bus_running) {
+       idr_remove(&pmu_idr, pmu->type);
+       if (pmu_bus_running && pmu->dev && pmu->dev != PMU_NULL_DEV) {
                if (pmu->nr_addr_filters)
                        device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
                device_del(pmu->dev);
@@ -11601,6 +11612,12 @@ static struct pmu *perf_init_event(struct perf_event *event)
 
        idx = srcu_read_lock(&pmus_srcu);
 
+       /*
+        * Save the original type before calling pmu->event_init(), since certain
+        * pmus overwrite event->attr.type to forward the event to another pmu.
+        */
+       event->orig_type = event->attr.type;
+
        /* Try parent's PMU first: */
        if (event->parent && event->parent->pmu) {
                pmu = event->parent->pmu;
@@ -13640,8 +13657,8 @@ void __init perf_event_init(void)
        perf_event_init_all_cpus();
        init_srcu_struct(&pmus_srcu);
        perf_pmu_register(&perf_swevent, "software", PERF_TYPE_SOFTWARE);
-       perf_pmu_register(&perf_cpu_clock, NULL, -1);
-       perf_pmu_register(&perf_task_clock, NULL, -1);
+       perf_pmu_register(&perf_cpu_clock, "cpu_clock", -1);
+       perf_pmu_register(&perf_task_clock, "task_clock", -1);
        perf_tp_register();
        perf_event_init_cpu(smp_processor_id());
        register_reboot_notifier(&perf_reboot_notifier);
@@ -13684,7 +13701,7 @@ static int __init perf_event_sysfs_init(void)
                goto unlock;
 
        list_for_each_entry(pmu, &pmus, entry) {
-               if (!pmu->name || pmu->type < 0)
+               if (pmu->dev)
                        continue;
 
                ret = pmu_dev_alloc(pmu);
index 34b90e2..edb50b4 100644 (file)
@@ -411,7 +411,10 @@ static void coredump_task_exit(struct task_struct *tsk)
        tsk->flags |= PF_POSTCOREDUMP;
        core_state = tsk->signal->core_state;
        spin_unlock_irq(&tsk->sighand->siglock);
-       if (core_state) {
+
+       /* The vhost_worker does not participate in coredumps */
+       if (core_state &&
+           ((tsk->flags & (PF_IO_WORKER | PF_USER_WORKER)) != PF_USER_WORKER)) {
                struct core_thread self;
 
                self.task = current;
index ed4e01d..41c9641 100644 (file)
@@ -627,6 +627,7 @@ void free_task(struct task_struct *tsk)
        arch_release_task_struct(tsk);
        if (tsk->flags & PF_KTHREAD)
                free_kthread_struct(tsk);
+       bpf_task_storage_free(tsk);
        free_task_struct(tsk);
 }
 EXPORT_SYMBOL(free_task);
@@ -979,7 +980,6 @@ void __put_task_struct(struct task_struct *tsk)
        cgroup_free(tsk);
        task_numa_free(tsk, true);
        security_task_free(tsk);
-       bpf_task_storage_free(tsk);
        exit_creds(tsk);
        delayacct_tsk_free(tsk);
        put_signal_struct(tsk->signal);
@@ -2336,16 +2336,16 @@ __latent_entropy struct task_struct *copy_process(
        p->flags &= ~PF_KTHREAD;
        if (args->kthread)
                p->flags |= PF_KTHREAD;
-       if (args->user_worker)
-               p->flags |= PF_USER_WORKER;
-       if (args->io_thread) {
+       if (args->user_worker) {
                /*
-                * Mark us an IO worker, and block any signal that isn't
+                * Mark us a user worker, and block any signal that isn't
                 * fatal or STOP
                 */
-               p->flags |= PF_IO_WORKER;
+               p->flags |= PF_USER_WORKER;
                siginitsetinv(&p->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
        }
+       if (args->io_thread)
+               p->flags |= PF_IO_WORKER;
 
        if (args->name)
                strscpy_pad(p->comm, args->name, sizeof(p->comm));
@@ -2517,9 +2517,6 @@ __latent_entropy struct task_struct *copy_process(
        if (retval)
                goto bad_fork_cleanup_io;
 
-       if (args->ignore_signals)
-               ignore_signals(p);
-
        stackleak_task_init(p);
 
        if (pid != &init_struct_pid) {
index 49e7bc8..ee8c0ac 100644 (file)
@@ -306,6 +306,7 @@ static void __irq_disable(struct irq_desc *desc, bool mask);
 void irq_shutdown(struct irq_desc *desc)
 {
        if (irqd_is_started(&desc->irq_data)) {
+               clear_irq_resend(desc);
                desc->depth = 1;
                if (desc->irq_data.chip->irq_shutdown) {
                        desc->irq_data.chip->irq_shutdown(&desc->irq_data);
@@ -692,8 +693,16 @@ void handle_fasteoi_irq(struct irq_desc *desc)
 
        raw_spin_lock(&desc->lock);
 
-       if (!irq_may_run(desc))
+       /*
+        * When an affinity change races with IRQ handling, the next interrupt
+        * can arrive on the new CPU before the original CPU has completed
+        * handling the previous one - it may need to be resent.
+        */
+       if (!irq_may_run(desc)) {
+               if (irqd_needs_resend_when_in_progress(&desc->irq_data))
+                       desc->istate |= IRQS_PENDING;
                goto out;
+       }
 
        desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
 
@@ -715,6 +724,12 @@ void handle_fasteoi_irq(struct irq_desc *desc)
 
        cond_unmask_eoi_irq(desc, chip);
 
+       /*
+        * When the race described above happens this will resend the interrupt.
+        */
+       if (unlikely(desc->istate & IRQS_PENDING))
+               check_irq_resend(desc, false);
+
        raw_spin_unlock(&desc->lock);
        return;
 out:
index bbcaac6..5971a66 100644 (file)
@@ -133,6 +133,8 @@ static const struct irq_bit_descr irqdata_states[] = {
        BIT_MASK_DESCR(IRQD_HANDLE_ENFORCE_IRQCTX),
 
        BIT_MASK_DESCR(IRQD_IRQ_ENABLED_ON_SUSPEND),
+
+       BIT_MASK_DESCR(IRQD_RESEND_WHEN_IN_PROGRESS),
 };
 
 static const struct irq_bit_descr irqdesc_states[] = {
index 5fdc0b5..bdd35bb 100644 (file)
@@ -12,9 +12,9 @@
 #include <linux/sched/clock.h>
 
 #ifdef CONFIG_SPARSE_IRQ
-# define IRQ_BITMAP_BITS       (NR_IRQS + 8196)
+# define MAX_SPARSE_IRQS       INT_MAX
 #else
-# define IRQ_BITMAP_BITS       NR_IRQS
+# define MAX_SPARSE_IRQS       NR_IRQS
 #endif
 
 #define istate core_internal_state__do_not_mess_with_it
@@ -47,9 +47,12 @@ enum {
  *                               detection
  * IRQS_POLL_INPROGRESS                - polling in progress
  * IRQS_ONESHOT                        - irq is not unmasked in primary handler
- * IRQS_REPLAY                 - irq is replayed
+ * IRQS_REPLAY                 - irq has been resent and will not be resent
+ *                               again until the handler has run and cleared
+ *                               this flag.
  * IRQS_WAITING                        - irq is waiting
- * IRQS_PENDING                        - irq is pending and replayed later
+ * IRQS_PENDING                        - irq needs to be resent and should be resent
+ *                               at the next available opportunity.
  * IRQS_SUSPENDED              - irq is suspended
  * IRQS_NMI                    - irq line is used to deliver NMIs
  * IRQS_SYSFS                  - descriptor has been added to sysfs
@@ -113,6 +116,8 @@ irqreturn_t handle_irq_event(struct irq_desc *desc);
 
 /* Resending of interrupts :*/
 int check_irq_resend(struct irq_desc *desc, bool inject);
+void clear_irq_resend(struct irq_desc *desc);
+void irq_resend_init(struct irq_desc *desc);
 bool irq_wait_for_poll(struct irq_desc *desc);
 void __irq_wake_thread(struct irq_desc *desc, struct irqaction *action);
 
index 240e145..27ca1c8 100644 (file)
@@ -12,8 +12,7 @@
 #include <linux/export.h>
 #include <linux/interrupt.h>
 #include <linux/kernel_stat.h>
-#include <linux/radix-tree.h>
-#include <linux/bitmap.h>
+#include <linux/maple_tree.h>
 #include <linux/irqdomain.h>
 #include <linux/sysfs.h>
 
@@ -131,7 +130,40 @@ int nr_irqs = NR_IRQS;
 EXPORT_SYMBOL_GPL(nr_irqs);
 
 static DEFINE_MUTEX(sparse_irq_lock);
-static DECLARE_BITMAP(allocated_irqs, IRQ_BITMAP_BITS);
+static struct maple_tree sparse_irqs = MTREE_INIT_EXT(sparse_irqs,
+                                       MT_FLAGS_ALLOC_RANGE |
+                                       MT_FLAGS_LOCK_EXTERN |
+                                       MT_FLAGS_USE_RCU,
+                                       sparse_irq_lock);
+
+static int irq_find_free_area(unsigned int from, unsigned int cnt)
+{
+       MA_STATE(mas, &sparse_irqs, 0, 0);
+
+       if (mas_empty_area(&mas, from, MAX_SPARSE_IRQS, cnt))
+               return -ENOSPC;
+       return mas.index;
+}
+
+static unsigned int irq_find_at_or_after(unsigned int offset)
+{
+       unsigned long index = offset;
+       struct irq_desc *desc = mt_find(&sparse_irqs, &index, nr_irqs);
+
+       return desc ? irq_desc_get_irq(desc) : nr_irqs;
+}
+
+static void irq_insert_desc(unsigned int irq, struct irq_desc *desc)
+{
+       MA_STATE(mas, &sparse_irqs, irq, irq);
+       WARN_ON(mas_store_gfp(&mas, desc, GFP_KERNEL) != 0);
+}
+
+static void delete_irq_desc(unsigned int irq)
+{
+       MA_STATE(mas, &sparse_irqs, irq, irq);
+       mas_erase(&mas);
+}
 
 #ifdef CONFIG_SPARSE_IRQ
 
@@ -344,26 +376,14 @@ static void irq_sysfs_del(struct irq_desc *desc) {}
 
 #endif /* CONFIG_SYSFS */
 
-static RADIX_TREE(irq_desc_tree, GFP_KERNEL);
-
-static void irq_insert_desc(unsigned int irq, struct irq_desc *desc)
-{
-       radix_tree_insert(&irq_desc_tree, irq, desc);
-}
-
 struct irq_desc *irq_to_desc(unsigned int irq)
 {
-       return radix_tree_lookup(&irq_desc_tree, irq);
+       return mtree_load(&sparse_irqs, irq);
 }
 #ifdef CONFIG_KVM_BOOK3S_64_HV_MODULE
 EXPORT_SYMBOL_GPL(irq_to_desc);
 #endif
 
-static void delete_irq_desc(unsigned int irq)
-{
-       radix_tree_delete(&irq_desc_tree, irq);
-}
-
 #ifdef CONFIG_SMP
 static void free_masks(struct irq_desc *desc)
 {
@@ -415,6 +435,7 @@ static struct irq_desc *alloc_desc(int irq, int node, unsigned int flags,
        desc_set_defaults(irq, desc, node, affinity, owner);
        irqd_set(&desc->irq_data, flags);
        kobject_init(&desc->kobj, &irq_kobj_type);
+       irq_resend_init(desc);
 
        return desc;
 
@@ -505,7 +526,6 @@ static int alloc_descs(unsigned int start, unsigned int cnt, int node,
                irq_sysfs_add(start + i, desc);
                irq_add_debugfs_entry(start + i, desc);
        }
-       bitmap_set(allocated_irqs, start, cnt);
        return start;
 
 err:
@@ -516,7 +536,7 @@ err:
 
 static int irq_expand_nr_irqs(unsigned int nr)
 {
-       if (nr > IRQ_BITMAP_BITS)
+       if (nr > MAX_SPARSE_IRQS)
                return -ENOMEM;
        nr_irqs = nr;
        return 0;
@@ -534,18 +554,17 @@ int __init early_irq_init(void)
        printk(KERN_INFO "NR_IRQS: %d, nr_irqs: %d, preallocated irqs: %d\n",
               NR_IRQS, nr_irqs, initcnt);
 
-       if (WARN_ON(nr_irqs > IRQ_BITMAP_BITS))
-               nr_irqs = IRQ_BITMAP_BITS;
+       if (WARN_ON(nr_irqs > MAX_SPARSE_IRQS))
+               nr_irqs = MAX_SPARSE_IRQS;
 
-       if (WARN_ON(initcnt > IRQ_BITMAP_BITS))
-               initcnt = IRQ_BITMAP_BITS;
+       if (WARN_ON(initcnt > MAX_SPARSE_IRQS))
+               initcnt = MAX_SPARSE_IRQS;
 
        if (initcnt > nr_irqs)
                nr_irqs = initcnt;
 
        for (i = 0; i < initcnt; i++) {
                desc = alloc_desc(i, node, 0, NULL, NULL);
-               set_bit(i, allocated_irqs);
                irq_insert_desc(i, desc);
        }
        return arch_early_irq_init();
@@ -581,6 +600,7 @@ int __init early_irq_init(void)
                mutex_init(&desc[i].request_mutex);
                init_waitqueue_head(&desc[i].wait_for_threads);
                desc_set_defaults(i, &desc[i], node, NULL, NULL);
+               irq_resend_init(desc);
        }
        return arch_early_irq_init();
 }
@@ -599,6 +619,7 @@ static void free_desc(unsigned int irq)
        raw_spin_lock_irqsave(&desc->lock, flags);
        desc_set_defaults(irq, desc, irq_desc_get_node(desc), NULL, NULL);
        raw_spin_unlock_irqrestore(&desc->lock, flags);
+       delete_irq_desc(irq);
 }
 
 static inline int alloc_descs(unsigned int start, unsigned int cnt, int node,
@@ -611,8 +632,8 @@ static inline int alloc_descs(unsigned int start, unsigned int cnt, int node,
                struct irq_desc *desc = irq_to_desc(start + i);
 
                desc->owner = owner;
+               irq_insert_desc(start + i, desc);
        }
-       bitmap_set(allocated_irqs, start, cnt);
        return start;
 }
 
@@ -624,7 +645,7 @@ static int irq_expand_nr_irqs(unsigned int nr)
 void irq_mark_irq(unsigned int irq)
 {
        mutex_lock(&sparse_irq_lock);
-       bitmap_set(allocated_irqs, irq, 1);
+       irq_insert_desc(irq, irq_desc + irq);
        mutex_unlock(&sparse_irq_lock);
 }
 
@@ -768,7 +789,6 @@ void irq_free_descs(unsigned int from, unsigned int cnt)
        for (i = 0; i < cnt; i++)
                free_desc(from + i);
 
-       bitmap_clear(allocated_irqs, from, cnt);
        mutex_unlock(&sparse_irq_lock);
 }
 EXPORT_SYMBOL_GPL(irq_free_descs);
@@ -810,8 +830,7 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
 
        mutex_lock(&sparse_irq_lock);
 
-       start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
-                                          from, cnt, 0);
+       start = irq_find_free_area(from, cnt);
        ret = -EEXIST;
        if (irq >=0 && start != irq)
                goto unlock;
@@ -836,7 +855,7 @@ EXPORT_SYMBOL_GPL(__irq_alloc_descs);
  */
 unsigned int irq_get_next_irq(unsigned int offset)
 {
-       return find_next_bit(allocated_irqs, nr_irqs, offset);
+       return irq_find_at_or_after(offset);
 }
 
 struct irq_desc *
index f34760a..5bd0162 100644 (file)
@@ -1915,6 +1915,8 @@ static void irq_domain_check_hierarchy(struct irq_domain *domain)
 #endif /* CONFIG_IRQ_DOMAIN_HIERARCHY */
 
 #ifdef CONFIG_GENERIC_IRQ_DEBUGFS
+#include "internals.h"
+
 static struct dentry *domain_dir;
 
 static void
index 7a97bcb..b4c31a5 100644 (file)
@@ -542,7 +542,7 @@ fail:
        return ret;
 }
 
-#ifdef CONFIG_PCI_MSI_ARCH_FALLBACKS
+#if defined(CONFIG_PCI_MSI_ARCH_FALLBACKS) || defined(CONFIG_PCI_XEN)
 /**
  * msi_device_populate_sysfs - Populate msi_irqs sysfs entries for a device
  * @dev:       The device (PCI, platform etc) which will get sysfs entries
@@ -574,7 +574,7 @@ void msi_device_destroy_sysfs(struct device *dev)
        msi_for_each_desc(desc, dev, MSI_DESC_ALL)
                msi_sysfs_remove_desc(dev, desc);
 }
-#endif /* CONFIG_PCI_MSI_ARCH_FALLBACK */
+#endif /* CONFIG_PCI_MSI_ARCH_FALLBACK || CONFIG_PCI_XEN */
 #else /* CONFIG_SYSFS */
 static inline int msi_sysfs_create_group(struct device *dev) { return 0; }
 static inline int msi_sysfs_populate_desc(struct device *dev, struct msi_desc *desc) { return 0; }
index 0c46e9f..edec335 100644 (file)
@@ -21,8 +21,9 @@
 
 #ifdef CONFIG_HARDIRQS_SW_RESEND
 
-/* Bitmap to handle software resend of interrupts: */
-static DECLARE_BITMAP(irqs_resend, IRQ_BITMAP_BITS);
+/* hlist_head to handle software resend of interrupts: */
+static HLIST_HEAD(irq_resend_list);
+static DEFINE_RAW_SPINLOCK(irq_resend_lock);
 
 /*
  * Run software resends of IRQ's
@@ -30,18 +31,17 @@ static DECLARE_BITMAP(irqs_resend, IRQ_BITMAP_BITS);
 static void resend_irqs(struct tasklet_struct *unused)
 {
        struct irq_desc *desc;
-       int irq;
-
-       while (!bitmap_empty(irqs_resend, nr_irqs)) {
-               irq = find_first_bit(irqs_resend, nr_irqs);
-               clear_bit(irq, irqs_resend);
-               desc = irq_to_desc(irq);
-               if (!desc)
-                       continue;
-               local_irq_disable();
+
+       raw_spin_lock_irq(&irq_resend_lock);
+       while (!hlist_empty(&irq_resend_list)) {
+               desc = hlist_entry(irq_resend_list.first, struct irq_desc,
+                                  resend_node);
+               hlist_del_init(&desc->resend_node);
+               raw_spin_unlock(&irq_resend_lock);
                desc->handle_irq(desc);
-               local_irq_enable();
+               raw_spin_lock(&irq_resend_lock);
        }
+       raw_spin_unlock_irq(&irq_resend_lock);
 }
 
 /* Tasklet to handle resend: */
@@ -49,8 +49,6 @@ static DECLARE_TASKLET(resend_tasklet, resend_irqs);
 
 static int irq_sw_resend(struct irq_desc *desc)
 {
-       unsigned int irq = irq_desc_get_irq(desc);
-
        /*
         * Validate whether this interrupt can be safely injected from
         * non interrupt context
@@ -70,16 +68,31 @@ static int irq_sw_resend(struct irq_desc *desc)
                 */
                if (!desc->parent_irq)
                        return -EINVAL;
-               irq = desc->parent_irq;
        }
 
-       /* Set it pending and activate the softirq: */
-       set_bit(irq, irqs_resend);
+       /* Add to resend_list and activate the softirq: */
+       raw_spin_lock(&irq_resend_lock);
+       hlist_add_head(&desc->resend_node, &irq_resend_list);
+       raw_spin_unlock(&irq_resend_lock);
        tasklet_schedule(&resend_tasklet);
        return 0;
 }
 
+void clear_irq_resend(struct irq_desc *desc)
+{
+       raw_spin_lock(&irq_resend_lock);
+       hlist_del_init(&desc->resend_node);
+       raw_spin_unlock(&irq_resend_lock);
+}
+
+void irq_resend_init(struct irq_desc *desc)
+{
+       INIT_HLIST_NODE(&desc->resend_node);
+}
 #else
+void clear_irq_resend(struct irq_desc *desc) {}
+void irq_resend_init(struct irq_desc *desc) {}
+
 static int irq_sw_resend(struct irq_desc *desc)
 {
        return -EINVAL;
index f989f5f..69ee4a2 100644 (file)
@@ -901,10 +901,22 @@ static int kexec_purgatory_setup_sechdrs(struct purgatory_info *pi,
                }
 
                offset = ALIGN(offset, align);
+
+               /*
+                * Check if the segment contains the entry point; if so,
+                * calculate the value of image->start based on it.
+                * If the compiler has produced more than one .text section
+                * (e.g. .text.hot), they generally come after the main .text
+                * section and must not be used to calculate image->start.
+                * So do not re-calculate image->start if it is no longer
+                * set to the initial value, and warn the user so they
+                * have a chance to fix their purgatory's linker script.
+                */
                if (sechdrs[i].sh_flags & SHF_EXECINSTR &&
                    pi->ehdr->e_entry >= sechdrs[i].sh_addr &&
                    pi->ehdr->e_entry < (sechdrs[i].sh_addr
-                                        + sechdrs[i].sh_size)) {
+                                        + sechdrs[i].sh_size) &&
+                   !WARN_ON(kbuf->image->start != pi->ehdr->e_entry)) {
                        kbuf->image->start -= sechdrs[i].sh_addr;
                        kbuf->image->start += kbuf->mem + offset;
                }
index 490792b..07a0570 100644 (file)
@@ -182,6 +182,16 @@ bool kthread_should_park(void)
 }
 EXPORT_SYMBOL_GPL(kthread_should_park);
 
+bool kthread_should_stop_or_park(void)
+{
+       struct kthread *kthread = __to_kthread(current);
+
+       if (!kthread)
+               return false;
+
+       return kthread->flags & (BIT(KTHREAD_SHOULD_STOP) | BIT(KTHREAD_SHOULD_PARK));
+}
+
 /**
  * kthread_freezable_should_stop - should this freezable kthread return now?
  * @was_frozen: optional out parameter, indicates whether %current was frozen
index dcd1d5b..111607d 100644 (file)
@@ -709,7 +709,7 @@ void get_usage_chars(struct lock_class *class, char usage[LOCK_USAGE_CHARS])
        usage[i] = '\0';
 }
 
-static void __print_lock_name(struct lock_class *class)
+static void __print_lock_name(struct held_lock *hlock, struct lock_class *class)
 {
        char str[KSYM_NAME_LEN];
        const char *name;
@@ -724,17 +724,19 @@ static void __print_lock_name(struct lock_class *class)
                        printk(KERN_CONT "#%d", class->name_version);
                if (class->subclass)
                        printk(KERN_CONT "/%d", class->subclass);
+               if (hlock && class->print_fn)
+                       class->print_fn(hlock->instance);
        }
 }
 
-static void print_lock_name(struct lock_class *class)
+static void print_lock_name(struct held_lock *hlock, struct lock_class *class)
 {
        char usage[LOCK_USAGE_CHARS];
 
        get_usage_chars(class, usage);
 
        printk(KERN_CONT " (");
-       __print_lock_name(class);
+       __print_lock_name(hlock, class);
        printk(KERN_CONT "){%s}-{%d:%d}", usage,
                        class->wait_type_outer ?: class->wait_type_inner,
                        class->wait_type_inner);
@@ -772,7 +774,7 @@ static void print_lock(struct held_lock *hlock)
        }
 
        printk(KERN_CONT "%px", hlock->instance);
-       print_lock_name(lock);
+       print_lock_name(hlock, lock);
        printk(KERN_CONT ", at: %pS\n", (void *)hlock->acquire_ip);
 }
 
@@ -1868,7 +1870,7 @@ print_circular_bug_entry(struct lock_list *target, int depth)
        if (debug_locks_silent)
                return;
        printk("\n-> #%u", depth);
-       print_lock_name(target->class);
+       print_lock_name(NULL, target->class);
        printk(KERN_CONT ":\n");
        print_lock_trace(target->trace, 6);
 }
@@ -1899,11 +1901,11 @@ print_circular_lock_scenario(struct held_lock *src,
         */
        if (parent != source) {
                printk("Chain exists of:\n  ");
-               __print_lock_name(source);
+               __print_lock_name(src, source);
                printk(KERN_CONT " --> ");
-               __print_lock_name(parent);
+               __print_lock_name(NULL, parent);
                printk(KERN_CONT " --> ");
-               __print_lock_name(target);
+               __print_lock_name(tgt, target);
                printk(KERN_CONT "\n\n");
        }
 
@@ -1914,13 +1916,13 @@ print_circular_lock_scenario(struct held_lock *src,
                printk("  rlock(");
        else
                printk("  lock(");
-       __print_lock_name(target);
+       __print_lock_name(tgt, target);
        printk(KERN_CONT ");\n");
        printk("                               lock(");
-       __print_lock_name(parent);
+       __print_lock_name(NULL, parent);
        printk(KERN_CONT ");\n");
        printk("                               lock(");
-       __print_lock_name(target);
+       __print_lock_name(tgt, target);
        printk(KERN_CONT ");\n");
        if (src_read != 0)
                printk("  rlock(");
@@ -1928,7 +1930,7 @@ print_circular_lock_scenario(struct held_lock *src,
                printk("  sync(");
        else
                printk("  lock(");
-       __print_lock_name(source);
+       __print_lock_name(src, source);
        printk(KERN_CONT ");\n");
        printk("\n *** DEADLOCK ***\n\n");
 }
@@ -2154,6 +2156,8 @@ check_path(struct held_lock *target, struct lock_list *src_entry,
        return ret;
 }
 
+static void print_deadlock_bug(struct task_struct *, struct held_lock *, struct held_lock *);
+
 /*
  * Prove that the dependency graph starting at <src> can not
  * lead to <target>. If it can, there is a circle when adding
@@ -2185,7 +2189,10 @@ check_noncircular(struct held_lock *src, struct held_lock *target,
                        *trace = save_trace();
                }
 
-               print_circular_bug(&src_entry, target_entry, src, target);
+               if (src->class_idx == target->class_idx)
+                       print_deadlock_bug(current, src, target);
+               else
+                       print_circular_bug(&src_entry, target_entry, src, target);
        }
 
        return ret;
@@ -2263,6 +2270,9 @@ static inline bool usage_match(struct lock_list *entry, void *mask)
 
 static inline bool usage_skip(struct lock_list *entry, void *mask)
 {
+       if (entry->class->lock_type == LD_LOCK_NORMAL)
+               return false;
+
        /*
         * Skip local_lock() for irq inversion detection.
         *
@@ -2289,14 +2299,16 @@ static inline bool usage_skip(struct lock_list *entry, void *mask)
         * As a result, we will skip local_lock(), when we search for irq
         * inversion bugs.
         */
-       if (entry->class->lock_type == LD_LOCK_PERCPU) {
-               if (DEBUG_LOCKS_WARN_ON(entry->class->wait_type_inner < LD_WAIT_CONFIG))
-                       return false;
+       if (entry->class->lock_type == LD_LOCK_PERCPU &&
+           DEBUG_LOCKS_WARN_ON(entry->class->wait_type_inner < LD_WAIT_CONFIG))
+               return false;
 
-               return true;
-       }
+       /*
+        * Skip WAIT_OVERRIDE for irq inversion detection -- it's not actually
+        * a lock and only used to override the wait_type.
+        */
 
-       return false;
+       return true;
 }
 
 /*
@@ -2341,7 +2353,7 @@ static void print_lock_class_header(struct lock_class *class, int depth)
        int bit;
 
        printk("%*s->", depth, "");
-       print_lock_name(class);
+       print_lock_name(NULL, class);
 #ifdef CONFIG_DEBUG_LOCKDEP
        printk(KERN_CONT " ops: %lu", debug_class_ops_read(class));
 #endif
@@ -2523,11 +2535,11 @@ print_irq_lock_scenario(struct lock_list *safe_entry,
         */
        if (middle_class != unsafe_class) {
                printk("Chain exists of:\n  ");
-               __print_lock_name(safe_class);
+               __print_lock_name(NULL, safe_class);
                printk(KERN_CONT " --> ");
-               __print_lock_name(middle_class);
+               __print_lock_name(NULL, middle_class);
                printk(KERN_CONT " --> ");
-               __print_lock_name(unsafe_class);
+               __print_lock_name(NULL, unsafe_class);
                printk(KERN_CONT "\n\n");
        }
 
@@ -2535,18 +2547,18 @@ print_irq_lock_scenario(struct lock_list *safe_entry,
        printk("       CPU0                    CPU1\n");
        printk("       ----                    ----\n");
        printk("  lock(");
-       __print_lock_name(unsafe_class);
+       __print_lock_name(NULL, unsafe_class);
        printk(KERN_CONT ");\n");
        printk("                               local_irq_disable();\n");
        printk("                               lock(");
-       __print_lock_name(safe_class);
+       __print_lock_name(NULL, safe_class);
        printk(KERN_CONT ");\n");
        printk("                               lock(");
-       __print_lock_name(middle_class);
+       __print_lock_name(NULL, middle_class);
        printk(KERN_CONT ");\n");
        printk("  <Interrupt>\n");
        printk("    lock(");
-       __print_lock_name(safe_class);
+       __print_lock_name(NULL, safe_class);
        printk(KERN_CONT ");\n");
        printk("\n *** DEADLOCK ***\n\n");
 }
@@ -2583,20 +2595,20 @@ print_bad_irq_dependency(struct task_struct *curr,
        pr_warn("\nand this task is already holding:\n");
        print_lock(prev);
        pr_warn("which would create a new lock dependency:\n");
-       print_lock_name(hlock_class(prev));
+       print_lock_name(prev, hlock_class(prev));
        pr_cont(" ->");
-       print_lock_name(hlock_class(next));
+       print_lock_name(next, hlock_class(next));
        pr_cont("\n");
 
        pr_warn("\nbut this new dependency connects a %s-irq-safe lock:\n",
                irqclass);
-       print_lock_name(backwards_entry->class);
+       print_lock_name(NULL, backwards_entry->class);
        pr_warn("\n... which became %s-irq-safe at:\n", irqclass);
 
        print_lock_trace(backwards_entry->class->usage_traces[bit1], 1);
 
        pr_warn("\nto a %s-irq-unsafe lock:\n", irqclass);
-       print_lock_name(forwards_entry->class);
+       print_lock_name(NULL, forwards_entry->class);
        pr_warn("\n... which became %s-irq-unsafe at:\n", irqclass);
        pr_warn("...");
 
@@ -2966,10 +2978,10 @@ print_deadlock_scenario(struct held_lock *nxt, struct held_lock *prv)
        printk("       CPU0\n");
        printk("       ----\n");
        printk("  lock(");
-       __print_lock_name(prev);
+       __print_lock_name(prv, prev);
        printk(KERN_CONT ");\n");
        printk("  lock(");
-       __print_lock_name(next);
+       __print_lock_name(nxt, next);
        printk(KERN_CONT ");\n");
        printk("\n *** DEADLOCK ***\n\n");
        printk(" May be due to missing lock nesting notation\n\n");
@@ -2979,6 +2991,8 @@ static void
 print_deadlock_bug(struct task_struct *curr, struct held_lock *prev,
                   struct held_lock *next)
 {
+       struct lock_class *class = hlock_class(prev);
+
        if (!debug_locks_off_graph_unlock() || debug_locks_silent)
                return;
 
@@ -2993,6 +3007,11 @@ print_deadlock_bug(struct task_struct *curr, struct held_lock *prev,
        pr_warn("\nbut task is already holding lock:\n");
        print_lock(prev);
 
+       if (class->cmp_fn) {
+               pr_warn("and the lock comparison function returns %i:\n",
+                       class->cmp_fn(prev->instance, next->instance));
+       }
+
        pr_warn("\nother info that might help us debug this:\n");
        print_deadlock_scenario(next, prev);
        lockdep_print_held_locks(curr);
@@ -3014,6 +3033,7 @@ print_deadlock_bug(struct task_struct *curr, struct held_lock *prev,
 static int
 check_deadlock(struct task_struct *curr, struct held_lock *next)
 {
+       struct lock_class *class;
        struct held_lock *prev;
        struct held_lock *nest = NULL;
        int i;
@@ -3034,6 +3054,12 @@ check_deadlock(struct task_struct *curr, struct held_lock *next)
                if ((next->read == 2) && prev->read)
                        continue;
 
+               class = hlock_class(prev);
+
+               if (class->cmp_fn &&
+                   class->cmp_fn(prev->instance, next->instance) < 0)
+                       continue;
+
                /*
                 * We're holding the nest_lock, which serializes this lock's
                 * nesting behaviour.
@@ -3095,6 +3121,14 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
                return 2;
        }
 
+       if (prev->class_idx == next->class_idx) {
+               struct lock_class *class = hlock_class(prev);
+
+               if (class->cmp_fn &&
+                   class->cmp_fn(prev->instance, next->instance) < 0)
+                       return 2;
+       }
+
        /*
         * Prove that the new <prev> -> <next> dependency would not
         * create a circular dependency in the graph. (We do this by
@@ -3571,7 +3605,7 @@ static void print_chain_keys_chain(struct lock_chain *chain)
                hlock_id = chain_hlocks[chain->base + i];
                chain_key = print_chain_key_iteration(hlock_id, chain_key);
 
-               print_lock_name(lock_classes + chain_hlock_class_idx(hlock_id));
+               print_lock_name(NULL, lock_classes + chain_hlock_class_idx(hlock_id));
                printk("\n");
        }
 }
@@ -3928,11 +3962,11 @@ static void print_usage_bug_scenario(struct held_lock *lock)
        printk("       CPU0\n");
        printk("       ----\n");
        printk("  lock(");
-       __print_lock_name(class);
+       __print_lock_name(lock, class);
        printk(KERN_CONT ");\n");
        printk("  <Interrupt>\n");
        printk("    lock(");
-       __print_lock_name(class);
+       __print_lock_name(lock, class);
        printk(KERN_CONT ");\n");
        printk("\n *** DEADLOCK ***\n\n");
 }
@@ -4018,7 +4052,7 @@ print_irq_inversion_bug(struct task_struct *curr,
                pr_warn("but this lock took another, %s-unsafe lock in the past:\n", irqclass);
        else
                pr_warn("but this lock was taken by another, %s-safe lock in the past:\n", irqclass);
-       print_lock_name(other->class);
+       print_lock_name(NULL, other->class);
        pr_warn("\n\nand interrupts could create inverse lock ordering between them.\n\n");
 
        pr_warn("\nother info that might help us debug this:\n");
@@ -4768,7 +4802,8 @@ static int check_wait_context(struct task_struct *curr, struct held_lock *next)
 
        for (; depth < curr->lockdep_depth; depth++) {
                struct held_lock *prev = curr->held_locks + depth;
-               u8 prev_inner = hlock_class(prev)->wait_type_inner;
+               struct lock_class *class = hlock_class(prev);
+               u8 prev_inner = class->wait_type_inner;
 
                if (prev_inner) {
                        /*
@@ -4778,6 +4813,14 @@ static int check_wait_context(struct task_struct *curr, struct held_lock *next)
                         * Also due to trylocks.
                         */
                        curr_inner = min(curr_inner, prev_inner);
+
+                       /*
+                        * Allow override for annotations -- this is typically
+                        * only valid/needed for code that only exists when
+                        * CONFIG_PREEMPT_RT=n.
+                        */
+                       if (unlikely(class->lock_type == LD_LOCK_WAIT_OVERRIDE))
+                               curr_inner = prev_inner;
                }
        }
 
@@ -4882,6 +4925,33 @@ EXPORT_SYMBOL_GPL(lockdep_init_map_type);
 struct lock_class_key __lockdep_no_validate__;
 EXPORT_SYMBOL_GPL(__lockdep_no_validate__);
 
+#ifdef CONFIG_PROVE_LOCKING
+void lockdep_set_lock_cmp_fn(struct lockdep_map *lock, lock_cmp_fn cmp_fn,
+                            lock_print_fn print_fn)
+{
+       struct lock_class *class = lock->class_cache[0];
+       unsigned long flags;
+
+       raw_local_irq_save(flags);
+       lockdep_recursion_inc();
+
+       if (!class)
+               class = register_lock_class(lock, 0, 0);
+
+       if (class) {
+               WARN_ON(class->cmp_fn   && class->cmp_fn != cmp_fn);
+               WARN_ON(class->print_fn && class->print_fn != print_fn);
+
+               class->cmp_fn   = cmp_fn;
+               class->print_fn = print_fn;
+       }
+
+       lockdep_recursion_finish();
+       raw_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(lockdep_set_lock_cmp_fn);
+#endif
+
 static void
 print_lock_nested_lock_not_held(struct task_struct *curr,
                                struct held_lock *hlock)
index 153ddc4..949d3de 100644 (file)
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com>");
 
-torture_param(int, nwriters_stress, -1,
-            "Number of write-locking stress-test threads");
-torture_param(int, nreaders_stress, -1,
-            "Number of read-locking stress-test threads");
+torture_param(int, nwriters_stress, -1, "Number of write-locking stress-test threads");
+torture_param(int, nreaders_stress, -1, "Number of read-locking stress-test threads");
+torture_param(int, long_hold, 100, "Do occasional long hold of lock (ms), 0=disable");
 torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)");
-torture_param(int, onoff_interval, 0,
-            "Time between CPU hotplugs (s), 0=disable");
-torture_param(int, shuffle_interval, 3,
-            "Number of jiffies between shuffles, 0=disable");
+torture_param(int, onoff_interval, 0, "Time between CPU hotplugs (s), 0=disable");
+torture_param(int, shuffle_interval, 3, "Number of jiffies between shuffles, 0=disable");
 torture_param(int, shutdown_secs, 0, "Shutdown time (j), <= zero to disable.");
-torture_param(int, stat_interval, 60,
-            "Number of seconds between stats printk()s");
+torture_param(int, stat_interval, 60, "Number of seconds between stats printk()s");
 torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable");
 torture_param(int, rt_boost, 2,
-               "Do periodic rt-boost. 0=Disable, 1=Only for rt_mutex, 2=For all lock types.");
+                  "Do periodic rt-boost. 0=Disable, 1=Only for rt_mutex, 2=For all lock types.");
 torture_param(int, rt_boost_factor, 50, "A factor determining how often rt-boost happens.");
-torture_param(int, verbose, 1,
-            "Enable verbose debugging printk()s");
+torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
 torture_param(int, nested_locks, 0, "Number of nested locks (max = 8)");
 /* Going much higher trips "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!" errors */
 #define MAX_NESTED_LOCKS 8
@@ -120,7 +115,7 @@ static int torture_lock_busted_write_lock(int tid __maybe_unused)
 
 static void torture_lock_busted_write_delay(struct torture_random_state *trsp)
 {
-       const unsigned long longdelay_ms = 100;
+       const unsigned long longdelay_ms = long_hold ? long_hold : ULONG_MAX;
 
        /* We want a long delay occasionally to force massive contention.  */
        if (!(torture_random(trsp) %
@@ -198,16 +193,18 @@ __acquires(torture_spinlock)
 static void torture_spin_lock_write_delay(struct torture_random_state *trsp)
 {
        const unsigned long shortdelay_us = 2;
-       const unsigned long longdelay_ms = 100;
+       const unsigned long longdelay_ms = long_hold ? long_hold : ULONG_MAX;
+       unsigned long j;
 
        /* We want a short delay mostly to emulate likely code, and
         * we want a long delay occasionally to force massive contention.
         */
-       if (!(torture_random(trsp) %
-             (cxt.nrealwriters_stress * 2000 * longdelay_ms)))
+       if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 2000 * longdelay_ms))) {
+               j = jiffies;
                mdelay(longdelay_ms);
-       if (!(torture_random(trsp) %
-             (cxt.nrealwriters_stress * 2 * shortdelay_us)))
+               pr_alert("%s: delay = %lu jiffies.\n", __func__, jiffies - j);
+       }
+       if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 200 * shortdelay_us)))
                udelay(shortdelay_us);
        if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000)))
                torture_preempt_schedule();  /* Allow test to be preempted. */
@@ -322,7 +319,7 @@ __acquires(torture_rwlock)
 static void torture_rwlock_write_delay(struct torture_random_state *trsp)
 {
        const unsigned long shortdelay_us = 2;
-       const unsigned long longdelay_ms = 100;
+       const unsigned long longdelay_ms = long_hold ? long_hold : ULONG_MAX;
 
        /* We want a short delay mostly to emulate likely code, and
         * we want a long delay occasionally to force massive contention.
@@ -455,14 +452,12 @@ __acquires(torture_mutex)
 
 static void torture_mutex_delay(struct torture_random_state *trsp)
 {
-       const unsigned long longdelay_ms = 100;
+       const unsigned long longdelay_ms = long_hold ? long_hold : ULONG_MAX;
 
        /* We want a long delay occasionally to force massive contention.  */
        if (!(torture_random(trsp) %
              (cxt.nrealwriters_stress * 2000 * longdelay_ms)))
                mdelay(longdelay_ms * 5);
-       else
-               mdelay(longdelay_ms / 5);
        if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000)))
                torture_preempt_schedule();  /* Allow test to be preempted. */
 }
@@ -630,7 +625,7 @@ __acquires(torture_rtmutex)
 static void torture_rtmutex_delay(struct torture_random_state *trsp)
 {
        const unsigned long shortdelay_us = 2;
-       const unsigned long longdelay_ms = 100;
+       const unsigned long longdelay_ms = long_hold ? long_hold : ULONG_MAX;
 
        /*
         * We want a short delay mostly to emulate likely code, and
@@ -640,7 +635,7 @@ static void torture_rtmutex_delay(struct torture_random_state *trsp)
              (cxt.nrealwriters_stress * 2000 * longdelay_ms)))
                mdelay(longdelay_ms);
        if (!(torture_random(trsp) %
-             (cxt.nrealwriters_stress * 2 * shortdelay_us)))
+             (cxt.nrealwriters_stress * 200 * shortdelay_us)))
                udelay(shortdelay_us);
        if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000)))
                torture_preempt_schedule();  /* Allow test to be preempted. */
@@ -695,14 +690,12 @@ __acquires(torture_rwsem)
 
 static void torture_rwsem_write_delay(struct torture_random_state *trsp)
 {
-       const unsigned long longdelay_ms = 100;
+       const unsigned long longdelay_ms = long_hold ? long_hold : ULONG_MAX;
 
        /* We want a long delay occasionally to force massive contention.  */
        if (!(torture_random(trsp) %
              (cxt.nrealwriters_stress * 2000 * longdelay_ms)))
                mdelay(longdelay_ms * 10);
-       else
-               mdelay(longdelay_ms / 10);
        if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000)))
                torture_preempt_schedule();  /* Allow test to be preempted. */
 }
@@ -848,8 +841,8 @@ static int lock_torture_writer(void *arg)
 
                        lwsp->n_lock_acquired++;
                }
-               cxt.cur_ops->write_delay(&rand);
                if (!skip_main_lock) {
+                       cxt.cur_ops->write_delay(&rand);
                        lock_is_write_held = false;
                        WRITE_ONCE(last_lock_release, jiffies);
                        cxt.cur_ops->writeunlock(tid);
index acb5a50..9eabd58 100644 (file)
@@ -1240,7 +1240,7 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
 /*
  * lock for reading
  */
-static inline int __down_read_common(struct rw_semaphore *sem, int state)
+static __always_inline int __down_read_common(struct rw_semaphore *sem, int state)
 {
        int ret = 0;
        long count;
@@ -1258,17 +1258,17 @@ out:
        return ret;
 }
 
-static inline void __down_read(struct rw_semaphore *sem)
+static __always_inline void __down_read(struct rw_semaphore *sem)
 {
        __down_read_common(sem, TASK_UNINTERRUPTIBLE);
 }
 
-static inline int __down_read_interruptible(struct rw_semaphore *sem)
+static __always_inline int __down_read_interruptible(struct rw_semaphore *sem)
 {
        return __down_read_common(sem, TASK_INTERRUPTIBLE);
 }
 
-static inline int __down_read_killable(struct rw_semaphore *sem)
+static __always_inline int __down_read_killable(struct rw_semaphore *sem)
 {
        return __down_read_common(sem, TASK_KILLABLE);
 }
index e97232b..8a5d6d6 100644 (file)
@@ -257,7 +257,7 @@ static ssize_t module_zstd_decompress(struct load_info *info,
        do {
                struct page *page = module_get_next_page(info);
 
-               if (!IS_ERR(page)) {
+               if (IS_ERR(page)) {
                        retval = PTR_ERR(page);
                        goto out;
                }
index 044aa2c..4e2cf78 100644 (file)
@@ -1521,14 +1521,14 @@ static void __layout_sections(struct module *mod, struct load_info *info, bool i
                MOD_RODATA,
                MOD_RO_AFTER_INIT,
                MOD_DATA,
-               MOD_INVALID,    /* This is needed to match the masks array */
+               MOD_DATA,
        };
        static const int init_m_to_mem_type[] = {
                MOD_INIT_TEXT,
                MOD_INIT_RODATA,
                MOD_INVALID,
                MOD_INIT_DATA,
-               MOD_INVALID,    /* This is needed to match the masks array */
+               MOD_INIT_DATA,
        };
 
        for (m = 0; m < ARRAY_SIZE(masks); ++m) {
index ad7b6ad..6ab2c94 100644 (file)
@@ -276,6 +276,7 @@ static ssize_t read_file_mod_stats(struct file *file, char __user *user_buf,
        struct mod_fail_load *mod_fail;
        unsigned int len, size, count_failed = 0;
        char *buf;
+       int ret;
        u32 live_mod_count, fkreads, fdecompress, fbecoming, floads;
        unsigned long total_size, text_size, ikread_bytes, ibecoming_bytes,
                idecompress_bytes, imod_bytes, total_virtual_lost;
@@ -390,8 +391,9 @@ static ssize_t read_file_mod_stats(struct file *file, char __user *user_buf,
 out_unlock:
        mutex_unlock(&module_mutex);
 out:
+       ret = simple_read_from_buffer(user_buf, count, ppos, buf, len);
        kfree(buf);
-        return simple_read_from_buffer(user_buf, count, ppos, buf, len);
+       return ret;
 }
 #undef MAX_PREAMBLE
 #undef MAX_FAILED_MOD_PRINT
index 30d1274..f62e89d 100644 (file)
@@ -11,6 +11,7 @@
 
 #define pr_fmt(fmt) "PM: hibernation: " fmt
 
+#include <linux/blkdev.h>
 #include <linux/export.h>
 #include <linux/suspend.h>
 #include <linux/reboot.h>
@@ -64,7 +65,6 @@ enum {
 static int hibernation_mode = HIBERNATION_SHUTDOWN;
 
 bool freezer_test_done;
-bool snapshot_test;
 
 static const struct platform_hibernation_ops *hibernation_ops;
 
@@ -684,26 +684,22 @@ static void power_down(void)
                cpu_relax();
 }
 
-static int load_image_and_restore(void)
+static int load_image_and_restore(bool snapshot_test)
 {
        int error;
        unsigned int flags;
-       fmode_t mode = FMODE_READ;
-
-       if (snapshot_test)
-               mode |= FMODE_EXCL;
 
        pm_pr_dbg("Loading hibernation image.\n");
 
        lock_device_hotplug();
        error = create_basic_memory_bitmaps();
        if (error) {
-               swsusp_close(mode);
+               swsusp_close(snapshot_test);
                goto Unlock;
        }
 
        error = swsusp_read(&flags);
-       swsusp_close(mode);
+       swsusp_close(snapshot_test);
        if (!error)
                error = hibernation_restore(flags & SF_PLATFORM_MODE);
 
@@ -721,6 +717,7 @@ static int load_image_and_restore(void)
  */
 int hibernate(void)
 {
+       bool snapshot_test = false;
        unsigned int sleep_flags;
        int error;
 
@@ -748,9 +745,6 @@ int hibernate(void)
        if (error)
                goto Exit;
 
-       /* protected by system_transition_mutex */
-       snapshot_test = false;
-
        lock_device_hotplug();
        /* Allocate memory management structures */
        error = create_basic_memory_bitmaps();
@@ -792,9 +786,9 @@ int hibernate(void)
        unlock_device_hotplug();
        if (snapshot_test) {
                pm_pr_dbg("Checking hibernation image\n");
-               error = swsusp_check();
+               error = swsusp_check(snapshot_test);
                if (!error)
-                       error = load_image_and_restore();
+                       error = load_image_and_restore(snapshot_test);
        }
        thaw_processes();
 
@@ -910,52 +904,10 @@ unlock:
 }
 EXPORT_SYMBOL_GPL(hibernate_quiet_exec);
 
-/**
- * software_resume - Resume from a saved hibernation image.
- *
- * This routine is called as a late initcall, when all devices have been
- * discovered and initialized already.
- *
- * The image reading code is called to see if there is a hibernation image
- * available for reading.  If that is the case, devices are quiesced and the
- * contents of memory is restored from the saved image.
- *
- * If this is successful, control reappears in the restored target kernel in
- * hibernation_snapshot() which returns to hibernate().  Otherwise, the routine
- * attempts to recover gracefully and make the kernel return to the normal mode
- * of operation.
- */
-static int software_resume(void)
+static int __init find_resume_device(void)
 {
-       int error;
-
-       /*
-        * If the user said "noresume".. bail out early.
-        */
-       if (noresume || !hibernation_available())
-               return 0;
-
-       /*
-        * name_to_dev_t() below takes a sysfs buffer mutex when sysfs
-        * is configured into the kernel. Since the regular hibernate
-        * trigger path is via sysfs which takes a buffer mutex before
-        * calling hibernate functions (which take system_transition_mutex)
-        * this can cause lockdep to complain about a possible ABBA deadlock
-        * which cannot happen since we're in the boot code here and
-        * sysfs can't be invoked yet. Therefore, we use a subclass
-        * here to avoid lockdep complaining.
-        */
-       mutex_lock_nested(&system_transition_mutex, SINGLE_DEPTH_NESTING);
-
-       snapshot_test = false;
-
-       if (swsusp_resume_device)
-               goto Check_image;
-
-       if (!strlen(resume_file)) {
-               error = -ENOENT;
-               goto Unlock;
-       }
+       if (!strlen(resume_file))
+               return -ENOENT;
 
        pm_pr_dbg("Checking hibernation image partition %s\n", resume_file);
 
@@ -966,40 +918,41 @@ static int software_resume(void)
        }
 
        /* Check if the device is there */
-       swsusp_resume_device = name_to_dev_t(resume_file);
-       if (!swsusp_resume_device) {
-               /*
-                * Some device discovery might still be in progress; we need
-                * to wait for this to finish.
-                */
-               wait_for_device_probe();
-
-               if (resume_wait) {
-                       while ((swsusp_resume_device = name_to_dev_t(resume_file)) == 0)
-                               msleep(10);
-                       async_synchronize_full();
-               }
+       if (!early_lookup_bdev(resume_file, &swsusp_resume_device))
+               return 0;
 
-               swsusp_resume_device = name_to_dev_t(resume_file);
-               if (!swsusp_resume_device) {
-                       error = -ENODEV;
-                       goto Unlock;
-               }
+       /*
+        * Some device discovery might still be in progress; we need to wait for
+        * this to finish.
+        */
+       wait_for_device_probe();
+       if (resume_wait) {
+               while (early_lookup_bdev(resume_file, &swsusp_resume_device))
+                       msleep(10);
+               async_synchronize_full();
        }
 
- Check_image:
+       return early_lookup_bdev(resume_file, &swsusp_resume_device);
+}
+
+static int software_resume(void)
+{
+       int error;
+
        pm_pr_dbg("Hibernation image partition %d:%d present\n",
                MAJOR(swsusp_resume_device), MINOR(swsusp_resume_device));
 
        pm_pr_dbg("Looking for hibernation image.\n");
-       error = swsusp_check();
+
+       mutex_lock(&system_transition_mutex);
+       error = swsusp_check(false);
        if (error)
                goto Unlock;
 
        /* The snapshot device should not be opened while we're running */
        if (!hibernate_acquire()) {
                error = -EBUSY;
-               swsusp_close(FMODE_READ | FMODE_EXCL);
+               swsusp_close(false);
                goto Unlock;
        }
 
@@ -1020,7 +973,7 @@ static int software_resume(void)
                goto Close_Finish;
        }
 
-       error = load_image_and_restore();
+       error = load_image_and_restore(false);
        thaw_processes();
  Finish:
        pm_notifier_call_chain(PM_POST_RESTORE);
@@ -1034,11 +987,43 @@ static int software_resume(void)
        pm_pr_dbg("Hibernation image not present or could not be loaded.\n");
        return error;
  Close_Finish:
-       swsusp_close(FMODE_READ | FMODE_EXCL);
+       swsusp_close(false);
        goto Finish;
 }
 
-late_initcall_sync(software_resume);
+/**
+ * software_resume_initcall - Resume from a saved hibernation image.
+ *
+ * This routine is called as a late initcall, when all devices have been
+ * discovered and initialized already.
+ *
+ * The image reading code is called to see if there is a hibernation image
+ * available for reading.  If that is the case, devices are quiesced and the
+ * contents of memory is restored from the saved image.
+ *
+ * If this is successful, control reappears in the restored target kernel in
+ * hibernation_snapshot() which returns to hibernate().  Otherwise, the routine
+ * attempts to recover gracefully and make the kernel return to the normal mode
+ * of operation.
+ */
+static int __init software_resume_initcall(void)
+{
+       /*
+        * If the user said "noresume".. bail out early.
+        */
+       if (noresume || !hibernation_available())
+               return 0;
+
+       if (!swsusp_resume_device) {
+               int error = find_resume_device();
+
+               if (error)
+                       return error;
+       }
+
+       return software_resume();
+}
+late_initcall_sync(software_resume_initcall);
 
 
 static const char * const hibernation_modes[] = {
@@ -1177,7 +1162,11 @@ static ssize_t resume_store(struct kobject *kobj, struct kobj_attribute *attr,
        unsigned int sleep_flags;
        int len = n;
        char *name;
-       dev_t res;
+       dev_t dev;
+       int error;
+
+       if (!hibernation_available())
+               return 0;
 
        if (len && buf[len-1] == '\n')
                len--;
@@ -1185,13 +1174,29 @@ static ssize_t resume_store(struct kobject *kobj, struct kobj_attribute *attr,
        if (!name)
                return -ENOMEM;
 
-       res = name_to_dev_t(name);
+       error = lookup_bdev(name, &dev);
+       if (error) {
+               unsigned maj, min, offset;
+               char *p, dummy;
+
+               if (sscanf(name, "%u:%u%c", &maj, &min, &dummy) == 2 ||
+                   sscanf(name, "%u:%u:%u:%c", &maj, &min, &offset,
+                               &dummy) == 3) {
+                       dev = MKDEV(maj, min);
+                       if (maj != MAJOR(dev) || min != MINOR(dev))
+                               error = -EINVAL;
+               } else {
+                       dev = new_decode_dev(simple_strtoul(name, &p, 16));
+                       if (*p)
+                               error = -EINVAL;
+               }
+       }
        kfree(name);
-       if (!res)
-               return -EINVAL;
+       if (error)
+               return error;
 
        sleep_flags = lock_system_sleep();
-       swsusp_resume_device = res;
+       swsusp_resume_device = dev;
        unlock_system_sleep(sleep_flags);
 
        pm_pr_dbg("Configured hibernation resume from disk to %u\n",
index 3113ec2..daa5350 100644 (file)
@@ -556,6 +556,12 @@ power_attr_ro(pm_wakeup_irq);
 
 bool pm_debug_messages_on __read_mostly;
 
+bool pm_debug_messages_should_print(void)
+{
+       return pm_debug_messages_on && pm_suspend_target_state != PM_SUSPEND_ON;
+}
+EXPORT_SYMBOL_GPL(pm_debug_messages_should_print);
+
 static ssize_t pm_debug_messages_show(struct kobject *kobj,
                                      struct kobj_attribute *attr, char *buf)
 {
index b83c8d5..f4a380b 100644 (file)
@@ -26,9 +26,6 @@ extern void __init hibernate_image_size_init(void);
 /* Maximum size of architecture specific data in a hibernation header */
 #define MAX_ARCH_HEADER_SIZE   (sizeof(struct new_utsname) + 4)
 
-extern int arch_hibernation_header_save(void *addr, unsigned int max_size);
-extern int arch_hibernation_header_restore(void *addr);
-
 static inline int init_header_complete(struct swsusp_info *info)
 {
        return arch_hibernation_header_save(info, MAX_ARCH_HEADER_SIZE);
@@ -41,8 +38,6 @@ static inline const char *check_image_kernel(struct swsusp_info *info)
 }
 #endif /* CONFIG_ARCH_HIBERNATION_HEADER */
 
-extern int hibernate_resume_nonboot_cpu_disable(void);
-
 /*
  * Keep some memory free so that I/O operations can succeed without paging
  * [Might this be more than 4 MB?]
@@ -59,7 +54,6 @@ asmlinkage int swsusp_save(void);
 
 /* kernel/power/hibernate.c */
 extern bool freezer_test_done;
-extern bool snapshot_test;
 
 extern int hibernation_snapshot(int platform_mode);
 extern int hibernation_restore(int platform_mode);
@@ -174,11 +168,11 @@ extern int swsusp_swap_in_use(void);
 #define SF_HW_SIG              8
 
 /* kernel/power/hibernate.c */
-extern int swsusp_check(void);
+int swsusp_check(bool snapshot_test);
 extern void swsusp_free(void);
 extern int swsusp_read(unsigned int *flags_p);
 extern int swsusp_write(unsigned int flags);
-extern void swsusp_close(fmode_t);
+void swsusp_close(bool snapshot_test);
 #ifdef CONFIG_SUSPEND
 extern int swsusp_unmark(void);
 #endif
index cd8b7b3..b27affb 100644 (file)
@@ -398,7 +398,7 @@ struct mem_zone_bm_rtree {
        unsigned int blocks;            /* Number of Bitmap Blocks     */
 };
 
-/* strcut bm_position is used for browsing memory bitmaps */
+/* struct bm_position is used for browsing memory bitmaps */
 
 struct bm_position {
        struct mem_zone_bm_rtree *zone;
index 92e41ed..f6ebcd0 100644 (file)
@@ -356,14 +356,14 @@ static int swsusp_swap_check(void)
                return res;
        root_swap = res;
 
-       hib_resume_bdev = blkdev_get_by_dev(swsusp_resume_device, FMODE_WRITE,
-                       NULL);
+       hib_resume_bdev = blkdev_get_by_dev(swsusp_resume_device,
+                       BLK_OPEN_WRITE, NULL, NULL);
        if (IS_ERR(hib_resume_bdev))
                return PTR_ERR(hib_resume_bdev);
 
        res = set_blocksize(hib_resume_bdev, PAGE_SIZE);
        if (res < 0)
-               blkdev_put(hib_resume_bdev, FMODE_WRITE);
+               blkdev_put(hib_resume_bdev, NULL);
 
        return res;
 }
@@ -443,7 +443,7 @@ static int get_swap_writer(struct swap_map_handle *handle)
 err_rel:
        release_swap_writer(handle);
 err_close:
-       swsusp_close(FMODE_WRITE);
+       swsusp_close(false);
        return ret;
 }
 
@@ -508,7 +508,7 @@ static int swap_writer_finish(struct swap_map_handle *handle,
        if (error)
                free_all_swap_pages(root_swap);
        release_swap_writer(handle);
-       swsusp_close(FMODE_WRITE);
+       swsusp_close(false);
 
        return error;
 }
@@ -1510,21 +1510,19 @@ end:
        return error;
 }
 
+static void *swsusp_holder;
+
 /**
  *      swsusp_check - Check for swsusp signature in the resume device
  */
 
-int swsusp_check(void)
+int swsusp_check(bool snapshot_test)
 {
+       void *holder = snapshot_test ? &swsusp_holder : NULL;
        int error;
-       void *holder;
-       fmode_t mode = FMODE_READ;
 
-       if (snapshot_test)
-               mode |= FMODE_EXCL;
-
-       hib_resume_bdev = blkdev_get_by_dev(swsusp_resume_device,
-                                           mode, &holder);
+       hib_resume_bdev = blkdev_get_by_dev(swsusp_resume_device, BLK_OPEN_READ,
+                                           holder, NULL);
        if (!IS_ERR(hib_resume_bdev)) {
                set_blocksize(hib_resume_bdev, PAGE_SIZE);
                clear_page(swsusp_header);
@@ -1551,7 +1549,7 @@ int swsusp_check(void)
 
 put:
                if (error)
-                       blkdev_put(hib_resume_bdev, mode);
+                       blkdev_put(hib_resume_bdev, holder);
                else
                        pr_debug("Image signature found, resuming\n");
        } else {
@@ -1568,14 +1566,14 @@ put:
  *     swsusp_close - close swap device.
  */
 
-void swsusp_close(fmode_t mode)
+void swsusp_close(bool snapshot_test)
 {
        if (IS_ERR(hib_resume_bdev)) {
                pr_debug("Image device not initialised\n");
                return;
        }
 
-       blkdev_put(hib_resume_bdev, mode);
+       blkdev_put(hib_resume_bdev, snapshot_test ? &swsusp_holder : NULL);
 }
 
 /**
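The swap.c hunks above replace fmode_t open flags with the newer blkdev_get_by_dev()/blkdev_put() holder convention: passing &swsusp_holder claims the resume device exclusively for the snapshot-test case, while passing NULL does not. The snippet below is only a loose user-space analogy of that owner-token idea, not the block-layer API; struct res, claim_resource() and release_resource() are invented names.

/* Loose analogy of a holder-style exclusive claim: the resource remembers who
 * claimed it, and only the matching owner token may release it. */
#include <stdatomic.h>
#include <stdio.h>

struct res {
        _Atomic(void *) holder;         /* NULL means unclaimed */
};

static int claim_resource(struct res *r, void *holder)
{
        void *expected = NULL;

        /* Succeed only if nobody holds the resource yet. */
        return atomic_compare_exchange_strong(&r->holder, &expected, holder) ? 0 : -1;
}

static void release_resource(struct res *r, void *holder)
{
        void *expected = holder;

        /* Only the matching owner token may drop the claim. */
        atomic_compare_exchange_strong(&r->holder, &expected, NULL);
}

int main(void)
{
        static struct res dev;
        static int token_a, token_b;

        printf("A claims: %d\n", claim_resource(&dev, &token_a));      /* 0  */
        printf("B claims: %d\n", claim_resource(&dev, &token_b));      /* -1 */
        release_resource(&dev, &token_a);
        printf("B claims: %d\n", claim_resource(&dev, &token_b));      /* 0  */
        return 0;
}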
index 6a333ad..357a4d1 100644 (file)
@@ -528,7 +528,7 @@ static u64 latched_seq_read_nolock(struct latched_seq *ls)
                seq = raw_read_seqcount_latch(&ls->latch);
                idx = seq & 0x1;
                val = ls->val[idx];
-       } while (read_seqcount_latch_retry(&ls->latch, seq));
+       } while (raw_read_seqcount_latch_retry(&ls->latch, seq));
 
        return val;
 }
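The printk change above switches latched_seq_read_nolock() to raw_read_seqcount_latch_retry(). The underlying latch pattern keeps two copies of the value: the writer flips a sequence counter around each update, and readers pick the copy selected by the counter's low bit, retrying if the counter moved underneath them. Here is a rough C11 sketch of that reader/writer pairing using plain seq_cst atomics rather than the kernel's finer-grained barriers; names are invented.

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Two copies of the value; the low bit of seq selects the copy readers use. */
struct latched_u64 {
        atomic_uint seq;
        _Atomic uint64_t val[2];
};

/* Writer: make seq odd, rewrite copy 0, make seq even, rewrite copy 1.
 * The copy readers select via (seq & 1) is never the one being modified. */
static void latch_write(struct latched_u64 *l, uint64_t v)
{
        unsigned int seq = atomic_load(&l->seq);

        atomic_store(&l->seq, seq + 1);         /* odd: readers use val[1] */
        atomic_store(&l->val[0], v);
        atomic_store(&l->seq, seq + 2);         /* even: readers use val[0] */
        atomic_store(&l->val[1], v);
}

/* Reader: sample seq, read the selected copy, retry if seq changed. */
static uint64_t latch_read(struct latched_u64 *l)
{
        unsigned int seq;
        uint64_t v;

        do {
                seq = atomic_load(&l->seq);
                v = atomic_load(&l->val[seq & 1]);
        } while (atomic_load(&l->seq) != seq);

        return v;
}

int main(void)
{
        static struct latched_u64 l;

        latch_write(&l, 42);
        printf("latched value: %llu\n", (unsigned long long)latch_read(&l));
        return 0;
}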
index 9071182..bdd7ead 100644 (file)
@@ -314,4 +314,22 @@ config RCU_LAZY
          To save power, batch RCU callbacks and flush after delay, memory
          pressure, or callback list growing too big.
 
+config RCU_DOUBLE_CHECK_CB_TIME
+       bool "RCU callback-batch backup time check"
+       depends on RCU_EXPERT
+       default n
+       help
+         Use this option to provide more precise enforcement of the
+         rcutree.rcu_resched_ns module parameter in situations where
+         a single RCU callback might run for hundreds of microseconds,
+         thus defeating the 32-callback batching used to amortize the
+         cost of the fine-grained but expensive local_clock() function.
+
+         This option rounds rcutree.rcu_resched_ns up to the next
+         jiffy, and overrides the 32-callback batching if this limit
+         is exceeded.
+
+         Say Y here if you need tighter callback-limit enforcement.
+         Say N here if you are unsure.
+
 endmenu # "RCU Subsystem"
index 4a1b962..98c1544 100644 (file)
@@ -642,4 +642,10 @@ void show_rcu_tasks_trace_gp_kthread(void);
 static inline void show_rcu_tasks_trace_gp_kthread(void) {}
 #endif
 
+#ifdef CONFIG_TINY_RCU
+static inline bool rcu_cpu_beenfullyonline(int cpu) { return true; }
+#else
+bool rcu_cpu_beenfullyonline(int cpu);
+#endif
+
 #endif /* __LINUX_RCU_H */
index e82ec9f..d122173 100644 (file)
@@ -522,89 +522,6 @@ rcu_scale_print_module_parms(struct rcu_scale_ops *cur_ops, const char *tag)
                 scale_type, tag, nrealreaders, nrealwriters, verbose, shutdown);
 }
 
-static void
-rcu_scale_cleanup(void)
-{
-       int i;
-       int j;
-       int ngps = 0;
-       u64 *wdp;
-       u64 *wdpp;
-
-       /*
-        * Would like warning at start, but everything is expedited
-        * during the mid-boot phase, so have to wait till the end.
-        */
-       if (rcu_gp_is_expedited() && !rcu_gp_is_normal() && !gp_exp)
-               SCALEOUT_ERRSTRING("All grace periods expedited, no normal ones to measure!");
-       if (rcu_gp_is_normal() && gp_exp)
-               SCALEOUT_ERRSTRING("All grace periods normal, no expedited ones to measure!");
-       if (gp_exp && gp_async)
-               SCALEOUT_ERRSTRING("No expedited async GPs, so went with async!");
-
-       if (torture_cleanup_begin())
-               return;
-       if (!cur_ops) {
-               torture_cleanup_end();
-               return;
-       }
-
-       if (reader_tasks) {
-               for (i = 0; i < nrealreaders; i++)
-                       torture_stop_kthread(rcu_scale_reader,
-                                            reader_tasks[i]);
-               kfree(reader_tasks);
-       }
-
-       if (writer_tasks) {
-               for (i = 0; i < nrealwriters; i++) {
-                       torture_stop_kthread(rcu_scale_writer,
-                                            writer_tasks[i]);
-                       if (!writer_n_durations)
-                               continue;
-                       j = writer_n_durations[i];
-                       pr_alert("%s%s writer %d gps: %d\n",
-                                scale_type, SCALE_FLAG, i, j);
-                       ngps += j;
-               }
-               pr_alert("%s%s start: %llu end: %llu duration: %llu gps: %d batches: %ld\n",
-                        scale_type, SCALE_FLAG,
-                        t_rcu_scale_writer_started, t_rcu_scale_writer_finished,
-                        t_rcu_scale_writer_finished -
-                        t_rcu_scale_writer_started,
-                        ngps,
-                        rcuscale_seq_diff(b_rcu_gp_test_finished,
-                                          b_rcu_gp_test_started));
-               for (i = 0; i < nrealwriters; i++) {
-                       if (!writer_durations)
-                               break;
-                       if (!writer_n_durations)
-                               continue;
-                       wdpp = writer_durations[i];
-                       if (!wdpp)
-                               continue;
-                       for (j = 0; j < writer_n_durations[i]; j++) {
-                               wdp = &wdpp[j];
-                               pr_alert("%s%s %4d writer-duration: %5d %llu\n",
-                                       scale_type, SCALE_FLAG,
-                                       i, j, *wdp);
-                               if (j % 100 == 0)
-                                       schedule_timeout_uninterruptible(1);
-                       }
-                       kfree(writer_durations[i]);
-               }
-               kfree(writer_tasks);
-               kfree(writer_durations);
-               kfree(writer_n_durations);
-       }
-
-       /* Do torture-type-specific cleanup operations.  */
-       if (cur_ops->cleanup != NULL)
-               cur_ops->cleanup();
-
-       torture_cleanup_end();
-}
-
 /*
  * Return the number if non-negative.  If -1, the number of CPUs.
  * If less than -1, that much less than the number of CPUs, but
@@ -625,20 +542,6 @@ static int compute_real(int n)
 }
 
 /*
- * RCU scalability shutdown kthread.  Just waits to be awakened, then shuts
- * down system.
- */
-static int
-rcu_scale_shutdown(void *arg)
-{
-       wait_event_idle(shutdown_wq, atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters);
-       smp_mb(); /* Wake before output. */
-       rcu_scale_cleanup();
-       kernel_power_off();
-       return -EINVAL;
-}
-
-/*
  * kfree_rcu() scalability tests: Start a kfree_rcu() loop on all CPUs for number
  * of iterations and measure total time and number of GP for all iterations to complete.
  */
@@ -874,6 +777,108 @@ unwind:
        return firsterr;
 }
 
+static void
+rcu_scale_cleanup(void)
+{
+       int i;
+       int j;
+       int ngps = 0;
+       u64 *wdp;
+       u64 *wdpp;
+
+       /*
+        * Would like warning at start, but everything is expedited
+        * during the mid-boot phase, so have to wait till the end.
+        */
+       if (rcu_gp_is_expedited() && !rcu_gp_is_normal() && !gp_exp)
+               SCALEOUT_ERRSTRING("All grace periods expedited, no normal ones to measure!");
+       if (rcu_gp_is_normal() && gp_exp)
+               SCALEOUT_ERRSTRING("All grace periods normal, no expedited ones to measure!");
+       if (gp_exp && gp_async)
+               SCALEOUT_ERRSTRING("No expedited async GPs, so went with async!");
+
+       if (kfree_rcu_test) {
+               kfree_scale_cleanup();
+               return;
+       }
+
+       if (torture_cleanup_begin())
+               return;
+       if (!cur_ops) {
+               torture_cleanup_end();
+               return;
+       }
+
+       if (reader_tasks) {
+               for (i = 0; i < nrealreaders; i++)
+                       torture_stop_kthread(rcu_scale_reader,
+                                            reader_tasks[i]);
+               kfree(reader_tasks);
+       }
+
+       if (writer_tasks) {
+               for (i = 0; i < nrealwriters; i++) {
+                       torture_stop_kthread(rcu_scale_writer,
+                                            writer_tasks[i]);
+                       if (!writer_n_durations)
+                               continue;
+                       j = writer_n_durations[i];
+                       pr_alert("%s%s writer %d gps: %d\n",
+                                scale_type, SCALE_FLAG, i, j);
+                       ngps += j;
+               }
+               pr_alert("%s%s start: %llu end: %llu duration: %llu gps: %d batches: %ld\n",
+                        scale_type, SCALE_FLAG,
+                        t_rcu_scale_writer_started, t_rcu_scale_writer_finished,
+                        t_rcu_scale_writer_finished -
+                        t_rcu_scale_writer_started,
+                        ngps,
+                        rcuscale_seq_diff(b_rcu_gp_test_finished,
+                                          b_rcu_gp_test_started));
+               for (i = 0; i < nrealwriters; i++) {
+                       if (!writer_durations)
+                               break;
+                       if (!writer_n_durations)
+                               continue;
+                       wdpp = writer_durations[i];
+                       if (!wdpp)
+                               continue;
+                       for (j = 0; j < writer_n_durations[i]; j++) {
+                               wdp = &wdpp[j];
+                               pr_alert("%s%s %4d writer-duration: %5d %llu\n",
+                                       scale_type, SCALE_FLAG,
+                                       i, j, *wdp);
+                               if (j % 100 == 0)
+                                       schedule_timeout_uninterruptible(1);
+                       }
+                       kfree(writer_durations[i]);
+               }
+               kfree(writer_tasks);
+               kfree(writer_durations);
+               kfree(writer_n_durations);
+       }
+
+       /* Do torture-type-specific cleanup operations.  */
+       if (cur_ops->cleanup != NULL)
+               cur_ops->cleanup();
+
+       torture_cleanup_end();
+}
+
+/*
+ * RCU scalability shutdown kthread.  Just waits to be awakened, then shuts
+ * down system.
+ */
+static int
+rcu_scale_shutdown(void *arg)
+{
+       wait_event_idle(shutdown_wq, atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters);
+       smp_mb(); /* Wake before output. */
+       rcu_scale_cleanup();
+       kernel_power_off();
+       return -EINVAL;
+}
+
 static int __init
 rcu_scale_init(void)
 {
index 5f4fc81..b770add 100644 (file)
@@ -241,7 +241,6 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
        if (rcu_task_enqueue_lim < 0) {
                rcu_task_enqueue_lim = 1;
                rcu_task_cb_adjust = true;
-               pr_info("%s: Setting adjustable number of callback queues.\n", __func__);
        } else if (rcu_task_enqueue_lim == 0) {
                rcu_task_enqueue_lim = 1;
        }
@@ -272,7 +271,9 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
                raw_spin_unlock_rcu_node(rtpcp); // irqs remain disabled.
        }
        raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
-       pr_info("%s: Setting shift to %d and lim to %d.\n", __func__, data_race(rtp->percpu_enqueue_shift), data_race(rtp->percpu_enqueue_lim));
+
+       pr_info("%s: Setting shift to %d and lim to %d rcu_task_cb_adjust=%d.\n", rtp->name,
+                       data_race(rtp->percpu_enqueue_shift), data_race(rtp->percpu_enqueue_lim), rcu_task_cb_adjust);
 }
 
 // IRQ-work handler that does deferred wakeup for call_rcu_tasks_generic().
@@ -463,6 +464,7 @@ static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu
 {
        int cpu;
        int cpunext;
+       int cpuwq;
        unsigned long flags;
        int len;
        struct rcu_head *rhp;
@@ -473,11 +475,13 @@ static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu
        cpunext = cpu * 2 + 1;
        if (cpunext < smp_load_acquire(&rtp->percpu_dequeue_lim)) {
                rtpcp_next = per_cpu_ptr(rtp->rtpcpu, cpunext);
-               queue_work_on(cpunext, system_wq, &rtpcp_next->rtp_work);
+               cpuwq = rcu_cpu_beenfullyonline(cpunext) ? cpunext : WORK_CPU_UNBOUND;
+               queue_work_on(cpuwq, system_wq, &rtpcp_next->rtp_work);
                cpunext++;
                if (cpunext < smp_load_acquire(&rtp->percpu_dequeue_lim)) {
                        rtpcp_next = per_cpu_ptr(rtp->rtpcpu, cpunext);
-                       queue_work_on(cpunext, system_wq, &rtpcp_next->rtp_work);
+                       cpuwq = rcu_cpu_beenfullyonline(cpunext) ? cpunext : WORK_CPU_UNBOUND;
+                       queue_work_on(cpuwq, system_wq, &rtpcp_next->rtp_work);
                }
        }
 
index f52ff72..1449cb6 100644 (file)
@@ -2046,19 +2046,35 @@ rcu_check_quiescent_state(struct rcu_data *rdp)
        rcu_report_qs_rdp(rdp);
 }
 
+/* Return true if callback-invocation time limit exceeded. */
+static bool rcu_do_batch_check_time(long count, long tlimit,
+                                   bool jlimit_check, unsigned long jlimit)
+{
+       // Invoke local_clock() only once per 32 consecutive callbacks.
+       return unlikely(tlimit) &&
+              (!likely(count & 31) ||
+               (IS_ENABLED(CONFIG_RCU_DOUBLE_CHECK_CB_TIME) &&
+                jlimit_check && time_after(jiffies, jlimit))) &&
+              local_clock() >= tlimit;
+}
+
 /*
  * Invoke any RCU callbacks that have made it to the end of their grace
  * period.  Throttle as specified by rdp->blimit.
  */
 static void rcu_do_batch(struct rcu_data *rdp)
 {
+       long bl;
+       long count = 0;
        int div;
        bool __maybe_unused empty;
        unsigned long flags;
-       struct rcu_head *rhp;
+       unsigned long jlimit;
+       bool jlimit_check = false;
+       long pending;
        struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
-       long bl, count = 0;
-       long pending, tlimit = 0;
+       struct rcu_head *rhp;
+       long tlimit = 0;
 
        /* If no callbacks are ready, just return. */
        if (!rcu_segcblist_ready_cbs(&rdp->cblist)) {
@@ -2082,11 +2098,15 @@ static void rcu_do_batch(struct rcu_data *rdp)
        div = READ_ONCE(rcu_divisor);
        div = div < 0 ? 7 : div > sizeof(long) * 8 - 2 ? sizeof(long) * 8 - 2 : div;
        bl = max(rdp->blimit, pending >> div);
-       if (in_serving_softirq() && unlikely(bl > 100)) {
+       if ((in_serving_softirq() || rdp->rcu_cpu_kthread_status == RCU_KTHREAD_RUNNING) &&
+           (IS_ENABLED(CONFIG_RCU_DOUBLE_CHECK_CB_TIME) || unlikely(bl > 100))) {
+               const long npj = NSEC_PER_SEC / HZ;
                long rrn = READ_ONCE(rcu_resched_ns);
 
                rrn = rrn < NSEC_PER_MSEC ? NSEC_PER_MSEC : rrn > NSEC_PER_SEC ? NSEC_PER_SEC : rrn;
                tlimit = local_clock() + rrn;
+               jlimit = jiffies + (rrn + npj + 1) / npj;
+               jlimit_check = true;
        }
        trace_rcu_batch_start(rcu_state.name,
                              rcu_segcblist_n_cbs(&rdp->cblist), bl);
@@ -2126,21 +2146,23 @@ static void rcu_do_batch(struct rcu_data *rdp)
                         * Make sure we don't spend too much time here and deprive other
                         * softirq vectors of CPU cycles.
                         */
-                       if (unlikely(tlimit)) {
-                               /* only call local_clock() every 32 callbacks */
-                               if (likely((count & 31) || local_clock() < tlimit))
-                                       continue;
-                               /* Exceeded the time limit, so leave. */
+                       if (rcu_do_batch_check_time(count, tlimit, jlimit_check, jlimit))
                                break;
-                       }
                } else {
-                       // In rcuoc context, so no worries about depriving
-                       // other softirq vectors of CPU cycles.
+                       // In rcuc/rcuoc context, so no worries about
+                       // depriving other softirq vectors of CPU cycles.
                        local_bh_enable();
                        lockdep_assert_irqs_enabled();
                        cond_resched_tasks_rcu_qs();
                        lockdep_assert_irqs_enabled();
                        local_bh_disable();
+                       // But rcuc kthreads can delay quiescent-state
+                       // reporting, so check time limits for them.
+                       if (rdp->rcu_cpu_kthread_status == RCU_KTHREAD_RUNNING &&
+                           rcu_do_batch_check_time(count, tlimit, jlimit_check, jlimit)) {
+                               rdp->rcu_cpu_has_work = 1;
+                               break;
+                       }
                }
        }
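rcu_do_batch_check_time() above keeps the common case cheap: the fine-grained local_clock() is consulted only once per 32 callbacks, and the optional CONFIG_RCU_DOUBLE_CHECK_CB_TIME path adds a coarse jiffies backstop so a single long-running callback cannot overshoot the budget by a whole batch. Below is a user-space sketch of the same idea, assuming a 2 ms budget and using Linux's CLOCK_MONOTONIC_COARSE as a stand-in for a jiffies read; over_budget() and process_batch() are invented names.

#define _GNU_SOURCE
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static uint64_t now_ns(clockid_t clk)
{
        struct timespec ts;

        clock_gettime(clk, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/*
 * Fine clock only every 32 items; the cheap coarse clock backs it up so one
 * slow item cannot blow far past the budget before the next fine check.
 */
static bool over_budget(long count, uint64_t tlimit, uint64_t jlimit)
{
        if ((count & 31) && now_ns(CLOCK_MONOTONIC_COARSE) < jlimit)
                return false;
        return now_ns(CLOCK_MONOTONIC) >= tlimit;
}

static long process_batch(long nitems, uint64_t budget_ns)
{
        uint64_t tlimit = now_ns(CLOCK_MONOTONIC) + budget_ns;
        uint64_t jlimit = tlimit;       /* the kernel rounds this up to whole jiffies */
        long count;

        for (count = 0; count < nitems; count++) {
                /* ... process item 'count' here ... */
                if (over_budget(count + 1, tlimit, jlimit))
                        break;          /* yield; remaining items run in a later pass */
        }
        return count;
}

int main(void)
{
        printf("processed %ld items within a 2 ms budget\n",
               process_batch(10000000, 2 * 1000 * 1000));
        return 0;
}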
 
@@ -2459,12 +2481,12 @@ static void rcu_cpu_kthread(unsigned int cpu)
                *statusp = RCU_KTHREAD_RUNNING;
                local_irq_disable();
                work = *workp;
-               *workp = 0;
+               WRITE_ONCE(*workp, 0);
                local_irq_enable();
                if (work)
                        rcu_core();
                local_bh_enable();
-               if (*workp == 0) {
+               if (!READ_ONCE(*workp)) {
                        trace_rcu_utilization(TPS("End CPU kthread@rcu_wait"));
                        *statusp = RCU_KTHREAD_WAITING;
                        return;
@@ -2756,7 +2778,7 @@ EXPORT_SYMBOL_GPL(call_rcu);
  */
 struct kvfree_rcu_bulk_data {
        struct list_head list;
-       unsigned long gp_snap;
+       struct rcu_gp_oldstate gp_snap;
        unsigned long nr_records;
        void *records[];
 };
@@ -2773,6 +2795,7 @@ struct kvfree_rcu_bulk_data {
  * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests
  * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period
  * @head_free: List of kfree_rcu() objects waiting for a grace period
+ * @head_free_gp_snap: Grace-period snapshot to check for attempted premature frees.
  * @bulk_head_free: Bulk-List of kvfree_rcu() objects waiting for a grace period
  * @krcp: Pointer to @kfree_rcu_cpu structure
  */
@@ -2780,6 +2803,7 @@ struct kvfree_rcu_bulk_data {
 struct kfree_rcu_cpu_work {
        struct rcu_work rcu_work;
        struct rcu_head *head_free;
+       struct rcu_gp_oldstate head_free_gp_snap;
        struct list_head bulk_head_free[FREE_N_CHANNELS];
        struct kfree_rcu_cpu *krcp;
 };
@@ -2900,6 +2924,9 @@ drain_page_cache(struct kfree_rcu_cpu *krcp)
        struct llist_node *page_list, *pos, *n;
        int freed = 0;
 
+       if (!rcu_min_cached_objs)
+               return 0;
+
        raw_spin_lock_irqsave(&krcp->lock, flags);
        page_list = llist_del_all(&krcp->bkvcache);
        WRITE_ONCE(krcp->nr_bkv_objs, 0);
@@ -2920,24 +2947,25 @@ kvfree_rcu_bulk(struct kfree_rcu_cpu *krcp,
        unsigned long flags;
        int i;
 
-       debug_rcu_bhead_unqueue(bnode);
-
-       rcu_lock_acquire(&rcu_callback_map);
-       if (idx == 0) { // kmalloc() / kfree().
-               trace_rcu_invoke_kfree_bulk_callback(
-                       rcu_state.name, bnode->nr_records,
-                       bnode->records);
-
-               kfree_bulk(bnode->nr_records, bnode->records);
-       } else { // vmalloc() / vfree().
-               for (i = 0; i < bnode->nr_records; i++) {
-                       trace_rcu_invoke_kvfree_callback(
-                               rcu_state.name, bnode->records[i], 0);
-
-                       vfree(bnode->records[i]);
+       if (!WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&bnode->gp_snap))) {
+               debug_rcu_bhead_unqueue(bnode);
+               rcu_lock_acquire(&rcu_callback_map);
+               if (idx == 0) { // kmalloc() / kfree().
+                       trace_rcu_invoke_kfree_bulk_callback(
+                               rcu_state.name, bnode->nr_records,
+                               bnode->records);
+
+                       kfree_bulk(bnode->nr_records, bnode->records);
+               } else { // vmalloc() / vfree().
+                       for (i = 0; i < bnode->nr_records; i++) {
+                               trace_rcu_invoke_kvfree_callback(
+                                       rcu_state.name, bnode->records[i], 0);
+
+                               vfree(bnode->records[i]);
+                       }
                }
+               rcu_lock_release(&rcu_callback_map);
        }
-       rcu_lock_release(&rcu_callback_map);
 
        raw_spin_lock_irqsave(&krcp->lock, flags);
        if (put_cached_bnode(krcp, bnode))
@@ -2984,6 +3012,7 @@ static void kfree_rcu_work(struct work_struct *work)
        struct rcu_head *head;
        struct kfree_rcu_cpu *krcp;
        struct kfree_rcu_cpu_work *krwp;
+       struct rcu_gp_oldstate head_gp_snap;
        int i;
 
        krwp = container_of(to_rcu_work(work),
@@ -2998,6 +3027,7 @@ static void kfree_rcu_work(struct work_struct *work)
        // Channel 3.
        head = krwp->head_free;
        krwp->head_free = NULL;
+       head_gp_snap = krwp->head_free_gp_snap;
        raw_spin_unlock_irqrestore(&krcp->lock, flags);
 
        // Handle the first two channels.
@@ -3014,7 +3044,8 @@ static void kfree_rcu_work(struct work_struct *work)
         * queued on a linked list through their rcu_head structures.
         * This list is named "Channel 3".
         */
-       kvfree_rcu_list(head);
+       if (head && !WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&head_gp_snap)))
+               kvfree_rcu_list(head);
 }
 
 static bool
@@ -3081,7 +3112,7 @@ kvfree_rcu_drain_ready(struct kfree_rcu_cpu *krcp)
                INIT_LIST_HEAD(&bulk_ready[i]);
 
                list_for_each_entry_safe_reverse(bnode, n, &krcp->bulk_head[i], list) {
-                       if (!poll_state_synchronize_rcu(bnode->gp_snap))
+                       if (!poll_state_synchronize_rcu_full(&bnode->gp_snap))
                                break;
 
                        atomic_sub(bnode->nr_records, &krcp->bulk_count[i]);
@@ -3146,6 +3177,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
                        // objects queued on the linked list.
                        if (!krwp->head_free) {
                                krwp->head_free = krcp->head;
+                               get_state_synchronize_rcu_full(&krwp->head_free_gp_snap);
                                atomic_set(&krcp->head_count, 0);
                                WRITE_ONCE(krcp->head, NULL);
                        }
@@ -3194,7 +3226,7 @@ static void fill_page_cache_func(struct work_struct *work)
        nr_pages = atomic_read(&krcp->backoff_page_cache_fill) ?
                1 : rcu_min_cached_objs;
 
-       for (i = 0; i < nr_pages; i++) {
+       for (i = READ_ONCE(krcp->nr_bkv_objs); i < nr_pages; i++) {
                bnode = (struct kvfree_rcu_bulk_data *)
                        __get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
 
@@ -3218,6 +3250,10 @@ static void fill_page_cache_func(struct work_struct *work)
 static void
 run_page_cache_worker(struct kfree_rcu_cpu *krcp)
 {
+       // If cache disabled, bail out.
+       if (!rcu_min_cached_objs)
+               return;
+
        if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
                        !atomic_xchg(&krcp->work_in_progress, 1)) {
                if (atomic_read(&krcp->backoff_page_cache_fill)) {
@@ -3272,7 +3308,7 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
                        // scenarios.
                        bnode = (struct kvfree_rcu_bulk_data *)
                                __get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
-                       *krcp = krc_this_cpu_lock(flags);
+                       raw_spin_lock_irqsave(&(*krcp)->lock, *flags);
                }
 
                if (!bnode)
@@ -3285,7 +3321,7 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
 
        // Finally insert and update the GP for this page.
        bnode->records[bnode->nr_records++] = ptr;
-       bnode->gp_snap = get_state_synchronize_rcu();
+       get_state_synchronize_rcu_full(&bnode->gp_snap);
        atomic_inc(&(*krcp)->bulk_count[idx]);
 
        return true;
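The kvfree_rcu() hunks above stamp each queued page or object list with a full grace-period snapshot via get_state_synchronize_rcu_full() and refuse to free it, with a warning, unless poll_state_synchronize_rcu_full() confirms a grace period really elapsed. Stripped of the RCU specifics this is a "stamp at enqueue, verify before free" pattern; here is a toy single-threaded sketch with an invented generation counter standing in for the grace-period sequence, not the kernel's machinery.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Invented stand-in for the grace-period sequence. */
static unsigned long gp_seq;

static unsigned long get_state(void)         { return gp_seq + 1; }  /* cookie: next GP */
static bool poll_state(unsigned long cookie)  { return gp_seq >= cookie; }
static void run_grace_period(void)            { gp_seq++; }

struct deferred {
        void *ptr;
        unsigned long gp_snap;          /* stamped when the object is queued */
};

static void queue_free(struct deferred *d, void *ptr)
{
        d->ptr = ptr;
        d->gp_snap = get_state();
}

static void maybe_free(struct deferred *d)
{
        /* Refuse (and complain) rather than free early, like the WARN_ON_ONCE(). */
        if (!poll_state(d->gp_snap)) {
                fprintf(stderr, "premature free attempt, skipping\n");
                return;
        }
        free(d->ptr);
        d->ptr = NULL;
}

int main(void)
{
        struct deferred d;

        queue_free(&d, malloc(64));
        maybe_free(&d);                 /* too early: no grace period has elapsed */
        run_grace_period();
        maybe_free(&d);                 /* now safe */
        assert(d.ptr == NULL);
        return 0;
}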
@@ -4283,7 +4319,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
         */
        rnp = rdp->mynode;
        raw_spin_lock_rcu_node(rnp);            /* irqs already disabled. */
-       rdp->beenonline = true;  /* We have now been online. */
        rdp->gp_seq = READ_ONCE(rnp->gp_seq);
        rdp->gp_seq_needed = rdp->gp_seq;
        rdp->cpu_no_qs.b.norm = true;
@@ -4311,6 +4346,16 @@ static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
 }
 
 /*
+ * Has the specified (known valid) CPU ever been fully online?
+ */
+bool rcu_cpu_beenfullyonline(int cpu)
+{
+       struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+
+       return smp_load_acquire(&rdp->beenonline);
+}
+
+/*
  * Near the end of the CPU-online process.  Pretty much all services
  * enabled, and the CPU is now very much alive.
  */
@@ -4368,15 +4413,16 @@ int rcutree_offline_cpu(unsigned int cpu)
  * Note that this function is special in that it is invoked directly
  * from the incoming CPU rather than from the cpuhp_step mechanism.
  * This is because this function must be invoked at a precise location.
+ * This incoming CPU must not have enabled interrupts yet.
  */
 void rcu_cpu_starting(unsigned int cpu)
 {
-       unsigned long flags;
        unsigned long mask;
        struct rcu_data *rdp;
        struct rcu_node *rnp;
        bool newcpu;
 
+       lockdep_assert_irqs_disabled();
        rdp = per_cpu_ptr(&rcu_data, cpu);
        if (rdp->cpu_started)
                return;
@@ -4384,7 +4430,6 @@ void rcu_cpu_starting(unsigned int cpu)
 
        rnp = rdp->mynode;
        mask = rdp->grpmask;
-       local_irq_save(flags);
        arch_spin_lock(&rcu_state.ofl_lock);
        rcu_dynticks_eqs_online();
        raw_spin_lock(&rcu_state.barrier_lock);
@@ -4403,17 +4448,17 @@ void rcu_cpu_starting(unsigned int cpu)
        /* An incoming CPU should never be blocking a grace period. */
        if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */
                /* rcu_report_qs_rnp() *really* wants some flags to restore */
-               unsigned long flags2;
+               unsigned long flags;
 
-               local_irq_save(flags2);
+               local_irq_save(flags);
                rcu_disable_urgency_upon_qs(rdp);
                /* Report QS -after- changing ->qsmaskinitnext! */
-               rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags2);
+               rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
        } else {
                raw_spin_unlock_rcu_node(rnp);
        }
        arch_spin_unlock(&rcu_state.ofl_lock);
-       local_irq_restore(flags);
+       smp_store_release(&rdp->beenonline, true);
        smp_mb(); /* Ensure RCU read-side usage follows above initialization. */
 }
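In rcu_cpu_starting() above, rdp->beenonline is now published with smp_store_release() only after the incoming CPU's node state is set up, and rcu_cpu_beenfullyonline() reads it with smp_load_acquire(), so a true result guarantees that the earlier initialization is visible to the caller. The same publish/observe pairing in portable C11 atomics; struct percpu_state and its fields are invented for the sketch, build with -pthread.

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct percpu_state {
        int data_a, data_b;             /* ordinary fields set up during bring-up */
        atomic_bool beenonline;         /* published last, with release semantics */
};

static struct percpu_state cpu_state;

static void *bringup_thread(void *arg)
{
        (void)arg;
        cpu_state.data_a = 1;
        cpu_state.data_b = 2;
        /* Release: everything above is visible to an acquire reader that sees true. */
        atomic_store_explicit(&cpu_state.beenonline, true, memory_order_release);
        return NULL;
}

static bool cpu_beenfullyonline(void)
{
        /* Acquire: pairs with the release store in bringup_thread(). */
        return atomic_load_explicit(&cpu_state.beenonline, memory_order_acquire);
}

int main(void)
{
        pthread_t t;

        pthread_create(&t, NULL, bringup_thread, NULL);
        while (!cpu_beenfullyonline())
                ;                       /* spin until the flag is published */
        /* Safe: the acquire load ordered these reads after the initialization. */
        printf("a=%d b=%d\n", cpu_state.data_a, cpu_state.data_b);
        pthread_join(t, NULL);
        return 0;
}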
 
index 3b7abb5..8239b39 100644 (file)
@@ -643,7 +643,7 @@ static void synchronize_rcu_expedited_wait(void)
                                        "O."[!!cpu_online(cpu)],
                                        "o."[!!(rdp->grpmask & rnp->expmaskinit)],
                                        "N."[!!(rdp->grpmask & rnp->expmaskinitnext)],
-                                       "D."[!!(rdp->cpu_no_qs.b.exp)]);
+                                       "D."[!!data_race(rdp->cpu_no_qs.b.exp)]);
                        }
                }
                pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
index f228061..43229d2 100644 (file)
@@ -1319,13 +1319,22 @@ lazy_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
        int cpu;
        unsigned long count = 0;
 
+       if (WARN_ON_ONCE(!cpumask_available(rcu_nocb_mask)))
+               return 0;
+
+       /*  Protect rcu_nocb_mask against concurrent (de-)offloading. */
+       if (!mutex_trylock(&rcu_state.barrier_mutex))
+               return 0;
+
        /* Snapshot count of all CPUs */
-       for_each_possible_cpu(cpu) {
+       for_each_cpu(cpu, rcu_nocb_mask) {
                struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 
                count +=  READ_ONCE(rdp->lazy_len);
        }
 
+       mutex_unlock(&rcu_state.barrier_mutex);
+
        return count ? count : SHRINK_EMPTY;
 }
 
@@ -1336,15 +1345,45 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
        unsigned long flags;
        unsigned long count = 0;
 
+       if (WARN_ON_ONCE(!cpumask_available(rcu_nocb_mask)))
+               return 0;
+       /*
+        * Protect against concurrent (de-)offloading. Otherwise nocb locking
+        * may be ignored or imbalanced.
+        */
+       if (!mutex_trylock(&rcu_state.barrier_mutex)) {
+               /*
+                * But really don't insist if barrier_mutex is contended since we
+                * can't guarantee that it will never engage in a dependency
+                * chain involving memory allocation. The lock is seldom contended
+                * anyway.
+                */
+               return 0;
+       }
+
        /* Snapshot count of all CPUs */
-       for_each_possible_cpu(cpu) {
+       for_each_cpu(cpu, rcu_nocb_mask) {
                struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
-               int _count = READ_ONCE(rdp->lazy_len);
+               int _count;
+
+               if (WARN_ON_ONCE(!rcu_rdp_is_offloaded(rdp)))
+                       continue;
 
-               if (_count == 0)
+               if (!READ_ONCE(rdp->lazy_len))
                        continue;
+
                rcu_nocb_lock_irqsave(rdp, flags);
-               WRITE_ONCE(rdp->lazy_len, 0);
+               /*
+                * Recheck under the nocb lock. Since we are not holding the bypass
+                * lock we may still race with increments from the enqueuer but still
+                * we know for sure if there is at least one lazy callback.
+                */
+               _count = READ_ONCE(rdp->lazy_len);
+               if (!_count) {
+                       rcu_nocb_unlock_irqrestore(rdp, flags);
+                       continue;
+               }
+               WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
                rcu_nocb_unlock_irqrestore(rdp, flags);
                wake_nocb_gp(rdp, false);
                sc->nr_to_scan -= _count;
@@ -1352,6 +1391,9 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
                if (sc->nr_to_scan <= 0)
                        break;
        }
+
+       mutex_unlock(&rcu_state.barrier_mutex);
+
        return count ? count : SHRINK_STOP;
 }
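The lazy-callback shrinker above now takes rcu_state.barrier_mutex with mutex_trylock() and simply reports nothing when the lock is contended, since a shrinker runs in reclaim context and must not wait on a lock whose holder might itself be allocating memory. A small pthread sketch of that "try the lock, skip the scan if contended" policy; offload_lock and scan_object_count() are invented names.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t offload_lock = PTHREAD_MUTEX_INITIALIZER;

/* Invented stand-in for counting reclaimable objects under the lock. */
static unsigned long scan_object_count(void)
{
        return 128;
}

/* Called from a context that must never block on offload_lock. */
static unsigned long shrinker_count(void)
{
        unsigned long count;

        if (pthread_mutex_trylock(&offload_lock) != 0)
                return 0;               /* contended: report nothing rather than wait */

        count = scan_object_count();
        pthread_mutex_unlock(&offload_lock);
        return count;
}

int main(void)
{
        printf("shrinker sees %lu objects\n", shrinker_count());

        pthread_mutex_lock(&offload_lock);              /* simulate contention */
        printf("while contended: %lu objects\n", shrinker_count());
        pthread_mutex_unlock(&offload_lock);
        return 0;
}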
 
index 7b0fe74..4102108 100644 (file)
@@ -257,6 +257,8 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
         * GP should not be able to end until we report, so there should be
         * no need to check for a subsequent expedited GP.  (Though we are
         * still in a quiescent state in any case.)
+        *
+        * Interrupts are disabled, so ->cpu_no_qs.b.exp cannot change.
         */
        if (blkd_state & RCU_EXP_BLKD && rdp->cpu_no_qs.b.exp)
                rcu_report_exp_rdp(rdp);
@@ -941,7 +943,7 @@ notrace void rcu_preempt_deferred_qs(struct task_struct *t)
 {
        struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 
-       if (rdp->cpu_no_qs.b.exp)
+       if (READ_ONCE(rdp->cpu_no_qs.b.exp))
                rcu_report_exp_rdp(rdp);
 }
 
index b5cc2b5..3c6193d 100644 (file)
@@ -266,7 +266,7 @@ static __always_inline u64 sched_clock_local(struct sched_clock_data *scd)
        s64 delta;
 
 again:
-       now = sched_clock();
+       now = sched_clock_noinstr();
        delta = now - scd->tick_raw;
        if (unlikely(delta < 0))
                delta = 0;
@@ -287,28 +287,35 @@ again:
        clock = wrap_max(clock, min_clock);
        clock = wrap_min(clock, max_clock);
 
-       if (!arch_try_cmpxchg64(&scd->clock, &old_clock, clock))
+       if (!raw_try_cmpxchg64(&scd->clock, &old_clock, clock))
                goto again;
 
        return clock;
 }
 
-noinstr u64 local_clock(void)
+noinstr u64 local_clock_noinstr(void)
 {
        u64 clock;
 
        if (static_branch_likely(&__sched_clock_stable))
-               return sched_clock() + __sched_clock_offset;
+               return sched_clock_noinstr() + __sched_clock_offset;
 
        if (!static_branch_likely(&sched_clock_running))
-               return sched_clock();
+               return sched_clock_noinstr();
 
-       preempt_disable_notrace();
        clock = sched_clock_local(this_scd());
-       preempt_enable_notrace();
 
        return clock;
 }
+
+u64 local_clock(void)
+{
+       u64 now;
+       preempt_disable_notrace();
+       now = local_clock_noinstr();
+       preempt_enable_notrace();
+       return now;
+}
 EXPORT_SYMBOL_GPL(local_clock);
 
 static notrace u64 sched_clock_remote(struct sched_clock_data *scd)
index 944c3ae..c52c2eb 100644 (file)
@@ -2213,6 +2213,154 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
                rq_clock_skip_update(rq);
 }
 
+static __always_inline
+int __task_state_match(struct task_struct *p, unsigned int state)
+{
+       if (READ_ONCE(p->__state) & state)
+               return 1;
+
+#ifdef CONFIG_PREEMPT_RT
+       if (READ_ONCE(p->saved_state) & state)
+               return -1;
+#endif
+       return 0;
+}
+
+static __always_inline
+int task_state_match(struct task_struct *p, unsigned int state)
+{
+#ifdef CONFIG_PREEMPT_RT
+       int match;
+
+       /*
+        * Serialize against current_save_and_set_rtlock_wait_state() and
+        * current_restore_rtlock_saved_state().
+        */
+       raw_spin_lock_irq(&p->pi_lock);
+       match = __task_state_match(p, state);
+       raw_spin_unlock_irq(&p->pi_lock);
+
+       return match;
+#else
+       return __task_state_match(p, state);
+#endif
+}
+
+/*
+ * wait_task_inactive - wait for a thread to unschedule.
+ *
+ * Wait for the thread to block in any of the states set in @match_state.
+ * If it changes, i.e. @p might have woken up, then return zero.  When we
+ * succeed in waiting for @p to be off its CPU, we return a positive number
+ * (its total switch count).  If a second call a short while later returns the
+ * same number, the caller can be sure that @p has remained unscheduled the
+ * whole time.
+ *
+ * The caller must ensure that the task *will* unschedule sometime soon,
+ * else this function might spin for a *long* time. This function can't
+ * be called with interrupts off, or it may introduce deadlock with
+ * smp_call_function() if an IPI is sent by the same process we are
+ * waiting to become inactive.
+ */
+unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state)
+{
+       int running, queued, match;
+       struct rq_flags rf;
+       unsigned long ncsw;
+       struct rq *rq;
+
+       for (;;) {
+               /*
+                * We do the initial early heuristics without holding
+                * any task-queue locks at all. We'll only try to get
+                * the runqueue lock when things look like they will
+                * work out!
+                */
+               rq = task_rq(p);
+
+               /*
+                * If the task is actively running on another CPU
+                * still, just relax and busy-wait without holding
+                * any locks.
+                *
+                * NOTE! Since we don't hold any locks, it's not
+                * even sure that "rq" stays as the right runqueue!
+                * But we don't care, since "task_on_cpu()" will
+                * return false if the runqueue has changed and p
+                * is actually now running somewhere else!
+                */
+               while (task_on_cpu(rq, p)) {
+                       if (!task_state_match(p, match_state))
+                               return 0;
+                       cpu_relax();
+               }
+
+               /*
+                * Ok, time to look more closely! We need the rq
+                * lock now, to be *sure*. If we're wrong, we'll
+                * just go back and repeat.
+                */
+               rq = task_rq_lock(p, &rf);
+               trace_sched_wait_task(p);
+               running = task_on_cpu(rq, p);
+               queued = task_on_rq_queued(p);
+               ncsw = 0;
+               if ((match = __task_state_match(p, match_state))) {
+                       /*
+                        * When matching on p->saved_state, consider this task
+                        * still queued so it will wait.
+                        */
+                       if (match < 0)
+                               queued = 1;
+                       ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
+               }
+               task_rq_unlock(rq, p, &rf);
+
+               /*
+                * If it changed from the expected state, bail out now.
+                */
+               if (unlikely(!ncsw))
+                       break;
+
+               /*
+                * Was it really running after all now that we
+                * checked with the proper locks actually held?
+                *
+                * Oops. Go back and try again..
+                */
+               if (unlikely(running)) {
+                       cpu_relax();
+                       continue;
+               }
+
+               /*
+                * It's not enough that it's not actively running,
+                * it must be off the runqueue _entirely_, and not
+                * preempted!
+                *
+                * So if it was still runnable (but just not actively
+                * running right now), it's preempted, and we should
+                * yield - it could be a while.
+                */
+               if (unlikely(queued)) {
+                       ktime_t to = NSEC_PER_SEC / HZ;
+
+                       set_current_state(TASK_UNINTERRUPTIBLE);
+                       schedule_hrtimeout(&to, HRTIMER_MODE_REL_HARD);
+                       continue;
+               }
+
+               /*
+                * Ahh, all good. It wasn't running, and it wasn't
+                * runnable, which means that it will never become
+                * running in the future either. We're all done!
+                */
+               break;
+       }
+
+       return ncsw;
+}
+
 #ifdef CONFIG_SMP
 
 static void
@@ -2398,7 +2546,6 @@ static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
        if (!is_cpu_allowed(p, dest_cpu))
                return rq;
 
-       update_rq_clock(rq);
        rq = move_queued_task(rq, rf, p, dest_cpu);
 
        return rq;
@@ -2456,10 +2603,12 @@ static int migration_cpu_stop(void *data)
                                goto out;
                }
 
-               if (task_on_rq_queued(p))
+               if (task_on_rq_queued(p)) {
+                       update_rq_clock(rq);
                        rq = __migrate_task(rq, &rf, p, arg->dest_cpu);
-               else
+               } else {
                        p->wake_cpu = arg->dest_cpu;
+               }
 
                /*
                 * XXX __migrate_task() can fail, at which point we might end
@@ -3341,114 +3490,6 @@ out:
 }
 #endif /* CONFIG_NUMA_BALANCING */
 
-/*
- * wait_task_inactive - wait for a thread to unschedule.
- *
- * Wait for the thread to block in any of the states set in @match_state.
- * If it changes, i.e. @p might have woken up, then return zero.  When we
- * succeed in waiting for @p to be off its CPU, we return a positive number
- * (its total switch count).  If a second call a short while later returns the
- * same number, the caller can be sure that @p has remained unscheduled the
- * whole time.
- *
- * The caller must ensure that the task *will* unschedule sometime soon,
- * else this function might spin for a *long* time. This function can't
- * be called with interrupts off, or it may introduce deadlock with
- * smp_call_function() if an IPI is sent by the same process we are
- * waiting to become inactive.
- */
-unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state)
-{
-       int running, queued;
-       struct rq_flags rf;
-       unsigned long ncsw;
-       struct rq *rq;
-
-       for (;;) {
-               /*
-                * We do the initial early heuristics without holding
-                * any task-queue locks at all. We'll only try to get
-                * the runqueue lock when things look like they will
-                * work out!
-                */
-               rq = task_rq(p);
-
-               /*
-                * If the task is actively running on another CPU
-                * still, just relax and busy-wait without holding
-                * any locks.
-                *
-                * NOTE! Since we don't hold any locks, it's not
-                * even sure that "rq" stays as the right runqueue!
-                * But we don't care, since "task_on_cpu()" will
-                * return false if the runqueue has changed and p
-                * is actually now running somewhere else!
-                */
-               while (task_on_cpu(rq, p)) {
-                       if (!(READ_ONCE(p->__state) & match_state))
-                               return 0;
-                       cpu_relax();
-               }
-
-               /*
-                * Ok, time to look more closely! We need the rq
-                * lock now, to be *sure*. If we're wrong, we'll
-                * just go back and repeat.
-                */
-               rq = task_rq_lock(p, &rf);
-               trace_sched_wait_task(p);
-               running = task_on_cpu(rq, p);
-               queued = task_on_rq_queued(p);
-               ncsw = 0;
-               if (READ_ONCE(p->__state) & match_state)
-                       ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
-               task_rq_unlock(rq, p, &rf);
-
-               /*
-                * If it changed from the expected state, bail out now.
-                */
-               if (unlikely(!ncsw))
-                       break;
-
-               /*
-                * Was it really running after all now that we
-                * checked with the proper locks actually held?
-                *
-                * Oops. Go back and try again..
-                */
-               if (unlikely(running)) {
-                       cpu_relax();
-                       continue;
-               }
-
-               /*
-                * It's not enough that it's not actively running,
-                * it must be off the runqueue _entirely_, and not
-                * preempted!
-                *
-                * So if it was still runnable (but just not actively
-                * running right now), it's preempted, and we should
-                * yield - it could be a while.
-                */
-               if (unlikely(queued)) {
-                       ktime_t to = NSEC_PER_SEC / HZ;
-
-                       set_current_state(TASK_UNINTERRUPTIBLE);
-                       schedule_hrtimeout(&to, HRTIMER_MODE_REL_HARD);
-                       continue;
-               }
-
-               /*
-                * Ahh, all good. It wasn't running, and it wasn't
-                * runnable, which means that it will never become
-                * running in the future either. We're all done!
-                */
-               break;
-       }
-
-       return ncsw;
-}
-
 /***
  * kick_process - kick a running thread to enter/exit the kernel
  * @p: the to-be-kicked thread
@@ -4003,15 +4044,14 @@ static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
 static __always_inline
 bool ttwu_state_match(struct task_struct *p, unsigned int state, int *success)
 {
+       int match;
+
        if (IS_ENABLED(CONFIG_DEBUG_PREEMPT)) {
                WARN_ON_ONCE((state & TASK_RTLOCK_WAIT) &&
                             state != TASK_RTLOCK_WAIT);
        }
 
-       if (READ_ONCE(p->__state) & state) {
-               *success = 1;
-               return true;
-       }
+       *success = !!(match = __task_state_match(p, state));
 
 #ifdef CONFIG_PREEMPT_RT
        /*
@@ -4027,12 +4067,10 @@ bool ttwu_state_match(struct task_struct *p, unsigned int state, int *success)
         * p::saved_state to TASK_RUNNING so any further tests will
         * not result in false positives vs. @success
         */
-       if (p->saved_state & state) {
+       if (match < 0)
                p->saved_state = TASK_RUNNING;
-               *success = 1;
-       }
 #endif
-       return false;
+       return match > 0;
 }
 
 /*
@@ -5632,6 +5670,9 @@ void scheduler_tick(void)
 
        perf_event_task_tick();
 
+       if (curr->flags & PF_WQ_WORKER)
+               wq_worker_tick(curr);
+
 #ifdef CONFIG_SMP
        rq->idle_balance = idle_cpu(cpu);
        trigger_load_balance(rq);
@@ -7590,6 +7631,7 @@ static int __sched_setscheduler(struct task_struct *p,
        int reset_on_fork;
        int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK;
        struct rq *rq;
+       bool cpuset_locked = false;
 
        /* The pi code expects interrupts enabled */
        BUG_ON(pi && in_interrupt());
@@ -7639,8 +7681,14 @@ recheck:
                        return retval;
        }
 
-       if (pi)
-               cpuset_read_lock();
+       /*
+        * SCHED_DEADLINE bandwidth accounting relies on stable cpusets
+        * information.
+        */
+       if (dl_policy(policy) || dl_policy(p->policy)) {
+               cpuset_locked = true;
+               cpuset_lock();
+       }
 
        /*
         * Make sure no PI-waiters arrive (or leave) while we are
@@ -7716,8 +7764,8 @@ change:
        if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
                policy = oldpolicy = -1;
                task_rq_unlock(rq, p, &rf);
-               if (pi)
-                       cpuset_read_unlock();
+               if (cpuset_locked)
+                       cpuset_unlock();
                goto recheck;
        }
 
@@ -7784,7 +7832,8 @@ change:
        task_rq_unlock(rq, p, &rf);
 
        if (pi) {
-               cpuset_read_unlock();
+               if (cpuset_locked)
+                       cpuset_unlock();
                rt_mutex_adjust_pi(p);
        }
 
@@ -7796,8 +7845,8 @@ change:
 
 unlock:
        task_rq_unlock(rq, p, &rf);
-       if (pi)
-               cpuset_read_unlock();
+       if (cpuset_locked)
+               cpuset_unlock();
        return retval;
 }
 
@@ -9286,8 +9335,7 @@ int cpuset_cpumask_can_shrink(const struct cpumask *cur,
        return ret;
 }
 
-int task_can_attach(struct task_struct *p,
-                   const struct cpumask *cs_effective_cpus)
+int task_can_attach(struct task_struct *p)
 {
        int ret = 0;
 
@@ -9300,21 +9348,9 @@ int task_can_attach(struct task_struct *p,
         * success of set_cpus_allowed_ptr() on all attached tasks
         * before cpus_mask may be changed.
         */
-       if (p->flags & PF_NO_SETAFFINITY) {
+       if (p->flags & PF_NO_SETAFFINITY)
                ret = -EINVAL;
-               goto out;
-       }
-
-       if (dl_task(p) && !cpumask_intersects(task_rq(p)->rd->span,
-                                             cs_effective_cpus)) {
-               int cpu = cpumask_any_and(cpu_active_mask, cs_effective_cpus);
-
-               if (unlikely(cpu >= nr_cpu_ids))
-                       return -EINVAL;
-               ret = dl_cpu_busy(cpu, p);
-       }
 
-out:
        return ret;
 }
 
@@ -9548,6 +9584,7 @@ void set_rq_offline(struct rq *rq)
        if (rq->online) {
                const struct sched_class *class;
 
+               update_rq_clock(rq);
                for_each_class(class) {
                        if (class->rq_offline)
                                class->rq_offline(rq);
@@ -9596,7 +9633,7 @@ static void cpuset_cpu_active(void)
 static int cpuset_cpu_inactive(unsigned int cpu)
 {
        if (!cpuhp_tasks_frozen) {
-               int ret = dl_cpu_busy(cpu, NULL);
+               int ret = dl_bw_check_overflow(cpu);
 
                if (ret)
                        return ret;
@@ -9689,7 +9726,6 @@ int sched_cpu_deactivate(unsigned int cpu)
 
        rq_lock_irqsave(rq, &rf);
        if (rq->rd) {
-               update_rq_clock(rq);
                BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
                set_rq_offline(rq);
        }
@@ -11492,7 +11528,7 @@ void call_trace_sched_update_nr_running(struct rq *rq, int count)
 
 #ifdef CONFIG_SCHED_MM_CID
 
-/**
+/*
  * @cid_lock: Guarantee forward-progress of cid allocation.
  *
  * Concurrency ID allocation within a bitmap is mostly lock-free. The cid_lock
@@ -11501,7 +11537,7 @@ void call_trace_sched_update_nr_running(struct rq *rq, int count)
  */
 DEFINE_RAW_SPINLOCK(cid_lock);
 
-/**
+/*
  * @use_cid_lock: Select cid allocation behavior: lock-free vs spinlock.
  *
  * When @use_cid_lock is 0, the cid allocation is lock-free. When contention is
index e321145..4492608 100644 (file)
@@ -155,10 +155,11 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 
 static void sugov_get_util(struct sugov_cpu *sg_cpu)
 {
+       unsigned long util = cpu_util_cfs_boost(sg_cpu->cpu);
        struct rq *rq = cpu_rq(sg_cpu->cpu);
 
        sg_cpu->bw_dl = cpu_bw_dl(rq);
-       sg_cpu->util = effective_cpu_util(sg_cpu->cpu, cpu_util_cfs(sg_cpu->cpu),
+       sg_cpu->util = effective_cpu_util(sg_cpu->cpu, util,
                                          FREQUENCY_UTIL, NULL);
 }
 
index 5a9a4b8..58b542b 100644 (file)
@@ -16,6 +16,8 @@
  *                    Fabio Checconi <fchecconi@gmail.com>
  */
 
+#include <linux/cpuset.h>
+
 /*
  * Default limits for DL period; on the top end we guard against small util
  * tasks still getting ridiculously long effective runtimes, on the bottom end we
@@ -489,13 +491,6 @@ static inline int is_leftmost(struct task_struct *p, struct dl_rq *dl_rq)
 
 static void init_dl_rq_bw_ratio(struct dl_rq *dl_rq);
 
-void init_dl_bandwidth(struct dl_bandwidth *dl_b, u64 period, u64 runtime)
-{
-       raw_spin_lock_init(&dl_b->dl_runtime_lock);
-       dl_b->dl_period = period;
-       dl_b->dl_runtime = runtime;
-}
-
 void init_dl_bw(struct dl_bw *dl_b)
 {
        raw_spin_lock_init(&dl_b->lock);
@@ -1260,43 +1255,39 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
 }
 
 /*
- * This function implements the GRUB accounting rule:
- * according to the GRUB reclaiming algorithm, the runtime is
- * not decreased as "dq = -dt", but as
- * "dq = -max{u / Umax, (1 - Uinact - Uextra)} dt",
+ * This function implements the GRUB accounting rule. According to the
+ * GRUB reclaiming algorithm, the runtime is not decreased as "dq = -dt",
+ * but as "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt",
  * where u is the utilization of the task, Umax is the maximum reclaimable
  * utilization, Uinact is the (per-runqueue) inactive utilization, computed
  * as the difference between the "total runqueue utilization" and the
- * runqueue active utilization, and Uextra is the (per runqueue) extra
+ * "runqueue active utilization", and Uextra is the (per runqueue) extra
  * reclaimable utilization.
- * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
- * multiplied by 2^BW_SHIFT, the result has to be shifted right by
- * BW_SHIFT.
- * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT,
- * dl_bw is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
- * Since delta is a 64 bit variable, to have an overflow its value
- * should be larger than 2^(64 - 20 - 8), which is more than 64 seconds.
- * So, overflow is not an issue here.
+ * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations multiplied
+ * by 2^BW_SHIFT, the result has to be shifted right by BW_SHIFT.
+ * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT, dl_bw
+ * is multiplied by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
+ * Since delta is a 64 bit variable, to have an overflow its value should be
+ * larger than 2^(64 - 20 - 8), which is more than 64 seconds. So, overflow is
+ * not an issue here.
  */
 static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
 {
-       u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
        u64 u_act;
-       u64 u_act_min = (dl_se->dl_bw * rq->dl.bw_ratio) >> RATIO_SHIFT;
+       u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
 
        /*
-        * Instead of computing max{u * bw_ratio, (1 - u_inact - u_extra)},
-        * we compare u_inact + rq->dl.extra_bw with
-        * 1 - (u * rq->dl.bw_ratio >> RATIO_SHIFT), because
-        * u_inact + rq->dl.extra_bw can be larger than
-        * 1 * (so, 1 - u_inact - rq->dl.extra_bw would be negative
-        * leading to wrong results)
+        * Instead of computing max{u, (u_max - u_inact - u_extra)}, we
+        * compare u_inact + u_extra with u_max - u, because u_inact + u_extra
+        * can be larger than u_max. So, u_max - u_inact - u_extra would be
+        * negative leading to wrong results.
         */
-       if (u_inact + rq->dl.extra_bw > BW_UNIT - u_act_min)
-               u_act = u_act_min;
+       if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
+               u_act = dl_se->dl_bw;
        else
-               u_act = BW_UNIT - u_inact - rq->dl.extra_bw;
+               u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;
 
+       u_act = (u_act * rq->dl.bw_ratio) >> RATIO_SHIFT;
        return (delta * u_act) >> BW_SHIFT;
 }
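To make the reshaped rule concrete, here is a worked example with made-up numbers (nothing here comes from the patch), assuming the default rt bandwidth of 0.95s runtime per 1s period so that Umax = 0.95, rq->dl.max_bw ~= 0.95 << BW_SHIFT and rq->dl.bw_ratio ~= (1 / 0.95) << RATIO_SHIFT:

    u       = 0.25    (dl_se->dl_bw, the task's own bandwidth)
    u_inact = 0.10    (this_bw - running_bw)
    u_extra = 0.30    (rq->dl.extra_bw)

    u_inact + u_extra = 0.40  <  max_bw - u = 0.70
        -> u_act = max_bw - u_inact - u_extra = 0.55
        -> u_act = u_act / Umax = 0.55 / 0.95 ~= 0.579

so the reclaiming task's runtime is depleted at roughly 58% of wall-clock
time, i.e. dq ~= -0.58 dt, which matches max{u, Umax - Uinact - Uextra} / Umax.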
 
@@ -2596,6 +2587,12 @@ static void switched_from_dl(struct rq *rq, struct task_struct *p)
        if (task_on_rq_queued(p) && p->dl.dl_runtime)
                task_non_contending(p);
 
+       /*
+        * In case a task is setscheduled out from SCHED_DEADLINE we need to
+        * keep track of that on its cpuset (for correct bandwidth tracking).
+        */
+       dec_dl_tasks_cs(p);
+
        if (!task_on_rq_queued(p)) {
                /*
                 * Inactive timer is armed. However, p is leaving DEADLINE and
@@ -2636,6 +2633,12 @@ static void switched_to_dl(struct rq *rq, struct task_struct *p)
        if (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1)
                put_task_struct(p);
 
+       /*
+        * In case a task is setscheduled to SCHED_DEADLINE we need to keep
+        * track of that on its cpuset (for correct bandwidth tracking).
+        */
+       inc_dl_tasks_cs(p);
+
        /* If p is not queued we will update its parameters at next wakeup. */
        if (!task_on_rq_queued(p)) {
                add_rq_bw(&p->dl, &rq->dl);
@@ -2795,12 +2798,12 @@ static void init_dl_rq_bw_ratio(struct dl_rq *dl_rq)
 {
        if (global_rt_runtime() == RUNTIME_INF) {
                dl_rq->bw_ratio = 1 << RATIO_SHIFT;
-               dl_rq->extra_bw = 1 << BW_SHIFT;
+               dl_rq->max_bw = dl_rq->extra_bw = 1 << BW_SHIFT;
        } else {
                dl_rq->bw_ratio = to_ratio(global_rt_runtime(),
                          global_rt_period()) >> (BW_SHIFT - RATIO_SHIFT);
-               dl_rq->extra_bw = to_ratio(global_rt_period(),
-                                                   global_rt_runtime());
+               dl_rq->max_bw = dl_rq->extra_bw =
+                       to_ratio(global_rt_period(), global_rt_runtime());
        }
 }
 
@@ -3044,26 +3047,38 @@ int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur,
        return ret;
 }
 
-int dl_cpu_busy(int cpu, struct task_struct *p)
+enum dl_bw_request {
+       dl_bw_req_check_overflow = 0,
+       dl_bw_req_alloc,
+       dl_bw_req_free
+};
+
+static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw)
 {
-       unsigned long flags, cap;
+       unsigned long flags;
        struct dl_bw *dl_b;
-       bool overflow;
+       bool overflow = 0;
 
        rcu_read_lock_sched();
        dl_b = dl_bw_of(cpu);
        raw_spin_lock_irqsave(&dl_b->lock, flags);
-       cap = dl_bw_capacity(cpu);
-       overflow = __dl_overflow(dl_b, cap, 0, p ? p->dl.dl_bw : 0);
 
-       if (!overflow && p) {
-               /*
-                * We reserve space for this task in the destination
-                * root_domain, as we can't fail after this point.
-                * We will free resources in the source root_domain
-                * later on (see set_cpus_allowed_dl()).
-                */
-               __dl_add(dl_b, p->dl.dl_bw, dl_bw_cpus(cpu));
+       if (req == dl_bw_req_free) {
+               __dl_sub(dl_b, dl_bw, dl_bw_cpus(cpu));
+       } else {
+               unsigned long cap = dl_bw_capacity(cpu);
+
+               overflow = __dl_overflow(dl_b, cap, 0, dl_bw);
+
+               if (req == dl_bw_req_alloc && !overflow) {
+                       /*
+                        * We reserve space in the destination
+                        * root_domain, as we can't fail after this point.
+                        * We will free resources in the source root_domain
+                        * later on (see set_cpus_allowed_dl()).
+                        */
+                       __dl_add(dl_b, dl_bw, dl_bw_cpus(cpu));
+               }
        }
 
        raw_spin_unlock_irqrestore(&dl_b->lock, flags);
@@ -3071,6 +3086,21 @@ int dl_cpu_busy(int cpu, struct task_struct *p)
 
        return overflow ? -EBUSY : 0;
 }
+
+int dl_bw_check_overflow(int cpu)
+{
+       return dl_bw_manage(dl_bw_req_check_overflow, cpu, 0);
+}
+
+int dl_bw_alloc(int cpu, u64 dl_bw)
+{
+       return dl_bw_manage(dl_bw_req_alloc, cpu, dl_bw);
+}
+
+void dl_bw_free(int cpu, u64 dl_bw)
+{
+       dl_bw_manage(dl_bw_req_free, cpu, dl_bw);
+}
 #endif
 
 #ifdef CONFIG_SCHED_DEBUG
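The dl_bw_alloc()/dl_bw_free() pair exported above lets a caller reserve DEADLINE bandwidth on a CPU's root domain up front and hand it back if a later step fails; the real callers live in the cpuset code outside this section, so the pattern below is only a hypothetical sketch (dest_cpu and attach_rest() are placeholders):

    int ret = dl_bw_alloc(dest_cpu, p->dl.dl_bw);
    if (ret)
            return ret;              /* -EBUSY: not enough DL bandwidth */

    ret = attach_rest(p, dest_cpu);  /* placeholder for the remaining work */
    if (ret)
            dl_bw_free(dest_cpu, p->dl.dl_bw);
    return ret;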
index 0b2340a..066ff1c 100644 (file)
@@ -777,7 +777,7 @@ static void print_cpu(struct seq_file *m, int cpu)
 #define P(x)                                                           \
 do {                                                                   \
        if (sizeof(rq->x) == 4)                                         \
-               SEQ_printf(m, "  .%-30s: %ld\n", #x, (long)(rq->x));    \
+               SEQ_printf(m, "  .%-30s: %d\n", #x, (int)(rq->x));      \
        else                                                            \
                SEQ_printf(m, "  .%-30s: %Ld\n", #x, (long long)(rq->x));\
 } while (0)
index 373ff5f..a80a739 100644 (file)
@@ -1064,6 +1064,23 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
  * Scheduling class queueing methods:
  */
 
+static inline bool is_core_idle(int cpu)
+{
+#ifdef CONFIG_SCHED_SMT
+       int sibling;
+
+       for_each_cpu(sibling, cpu_smt_mask(cpu)) {
+               if (cpu == sibling)
+                       continue;
+
+               if (!idle_cpu(sibling))
+                       return false;
+       }
+#endif
+
+       return true;
+}
+
 #ifdef CONFIG_NUMA
 #define NUMA_IMBALANCE_MIN 2
 
@@ -1700,23 +1717,6 @@ struct numa_stats {
        int idle_cpu;
 };
 
-static inline bool is_core_idle(int cpu)
-{
-#ifdef CONFIG_SCHED_SMT
-       int sibling;
-
-       for_each_cpu(sibling, cpu_smt_mask(cpu)) {
-               if (cpu == sibling)
-                       continue;
-
-               if (!idle_cpu(sibling))
-                       return false;
-       }
-#endif
-
-       return true;
-}
-
 struct task_numa_env {
        struct task_struct *p;
 
@@ -5577,6 +5577,14 @@ static void __cfsb_csd_unthrottle(void *arg)
        rq_lock(rq, &rf);
 
        /*
+        * Iterating over the list can trigger several calls to
+        * update_rq_clock() in unthrottle_cfs_rq().
+        * Do it once and skip the potential next ones.
+        */
+       update_rq_clock(rq);
+       rq_clock_start_loop_update(rq);
+
+       /*
         * Since we hold rq lock we're safe from concurrent manipulation of
         * the CSD list. However, this RCU critical section annotates the
         * fact that we pair with sched_free_group_rcu(), so that we cannot
@@ -5595,6 +5603,7 @@ static void __cfsb_csd_unthrottle(void *arg)
 
        rcu_read_unlock();
 
+       rq_clock_stop_loop_update(rq);
        rq_unlock(rq, &rf);
 }
 
@@ -6115,6 +6124,13 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 
        lockdep_assert_rq_held(rq);
 
+       /*
+        * The rq clock has already been updated in
+        * set_rq_offline(), so we should skip updating
+        * the rq clock again in unthrottle_cfs_rq().
+        */
+       rq_clock_start_loop_update(rq);
+
        rcu_read_lock();
        list_for_each_entry_rcu(tg, &task_groups, list) {
                struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
@@ -6137,6 +6153,8 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
                        unthrottle_cfs_rq(cfs_rq);
        }
        rcu_read_unlock();
+
+       rq_clock_stop_loop_update(rq);
 }
 
 #else /* CONFIG_CFS_BANDWIDTH */
@@ -7202,14 +7220,58 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
        return target;
 }
 
-/*
- * Predicts what cpu_util(@cpu) would return if @p was removed from @cpu
- * (@dst_cpu = -1) or migrated to @dst_cpu.
- */
-static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
+/**
+ * cpu_util() - Estimates the amount of CPU capacity used by CFS tasks.
+ * @cpu: the CPU to get the utilization for
+ * @p: task for which the CPU utilization should be predicted or NULL
+ * @dst_cpu: CPU @p migrates to, -1 if @p moves from @cpu or @p == NULL
+ * @boost: 1 to enable boosting, otherwise 0
+ *
+ * The unit of the return value must be the same as the one of CPU capacity
+ * so that CPU utilization can be compared with CPU capacity.
+ *
+ * CPU utilization is the sum of running time of runnable tasks plus the
+ * recent utilization of currently non-runnable tasks on that CPU.
+ * It represents the amount of CPU capacity currently used by CFS tasks in
+ * the range [0..max CPU capacity] with max CPU capacity being the CPU
+ * capacity at f_max.
+ *
+ * The estimated CPU utilization is defined as the maximum between CPU
+ * utilization and sum of the estimated utilization of the currently
+ * runnable tasks on that CPU. It preserves a utilization "snapshot" of
+ * previously-executed tasks, which helps better deduce how busy a CPU will
+ * be when a long-sleeping task wakes up. The contribution to CPU utilization
+ * of such a task would be significantly decayed at this point of time.
+ *
+ * Boosted CPU utilization is defined as max(CPU runnable, CPU utilization).
+ * CPU contention for CFS tasks can be detected by CPU runnable > CPU
+ * utilization. Boosting is implemented in cpu_util() so that internal
+ * users (e.g. EAS) can use it next to external users (e.g. schedutil),
+ * the latter via cpu_util_cfs_boost().
+ *
+ * CPU utilization can be higher than the current CPU capacity
+ * (f_curr/f_max * max CPU capacity) or even the max CPU capacity because
+ * of rounding errors as well as task migrations or wakeups of new tasks.
+ * CPU utilization has to be capped to fit into the [0..max CPU capacity]
+ * range. Otherwise a group of CPUs (CPU0 util = 121% + CPU1 util = 80%)
+ * could be seen as over-utilized even though CPU1 has 20% of spare CPU
+ * capacity. CPU utilization is allowed to overshoot current CPU capacity
+ * though since this is useful for predicting the CPU capacity required
+ * after task migrations (scheduler-driven DVFS).
+ *
+ * Return: (Boosted) (estimated) utilization for the specified CPU.
+ */
+static unsigned long
+cpu_util(int cpu, struct task_struct *p, int dst_cpu, int boost)
 {
        struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
        unsigned long util = READ_ONCE(cfs_rq->avg.util_avg);
+       unsigned long runnable;
+
+       if (boost) {
+               runnable = READ_ONCE(cfs_rq->avg.runnable_avg);
+               util = max(util, runnable);
+       }
 
        /*
         * If @dst_cpu is -1 or @p migrates from @cpu to @dst_cpu remove its
@@ -7217,9 +7279,9 @@ static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
         * contribution. In all the other cases @cpu is not impacted by the
         * migration so its util_avg is already correct.
         */
-       if (task_cpu(p) == cpu && dst_cpu != cpu)
+       if (p && task_cpu(p) == cpu && dst_cpu != cpu)
                lsub_positive(&util, task_util(p));
-       else if (task_cpu(p) != cpu && dst_cpu == cpu)
+       else if (p && task_cpu(p) != cpu && dst_cpu == cpu)
                util += task_util(p);
 
        if (sched_feat(UTIL_EST)) {
@@ -7227,6 +7289,9 @@ static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
 
                util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued);
 
+               if (boost)
+                       util_est = max(util_est, runnable);
+
                /*
                 * During wake-up @p isn't enqueued yet and doesn't contribute
                 * to any cpu_rq(cpu)->cfs.avg.util_est.enqueued.
@@ -7255,7 +7320,7 @@ static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
                 */
                if (dst_cpu == cpu)
                        util_est += _task_util_est(p);
-               else if (unlikely(task_on_rq_queued(p) || current == p))
+               else if (p && unlikely(task_on_rq_queued(p) || current == p))
                        lsub_positive(&util_est, _task_util_est(p));
 
                util = max(util, util_est);
@@ -7264,6 +7329,16 @@ static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
        return min(util, capacity_orig_of(cpu));
 }
 
+unsigned long cpu_util_cfs(int cpu)
+{
+       return cpu_util(cpu, NULL, -1, 0);
+}
+
+unsigned long cpu_util_cfs_boost(int cpu)
+{
+       return cpu_util(cpu, NULL, -1, 1);
+}
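A short worked trace of the two wrappers, with made-up per-CPU signals (util_avg = 300, runnable_avg = 450, util_est.enqueued = 350, capacity_orig comfortably larger), shows what the boost flag changes:

    cpu_util_cfs(cpu):        util = 300, util_est = 350
                              -> max(300, 350) = 350

    cpu_util_cfs_boost(cpu):  util     = max(300, 450) = 450
                              util_est = max(350, 450) = 450
                              -> max(450, 450) = 450

Boosting therefore only raises the estimate when runnable_avg exceeds the
other signals, i.e. when runnable tasks are contending for the CPU.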
+
 /*
  * cpu_util_without: compute cpu utilization without any contributions from *p
  * @cpu: the CPU which utilization is requested
@@ -7281,9 +7356,9 @@ static unsigned long cpu_util_without(int cpu, struct task_struct *p)
 {
        /* Task has no contribution or is new */
        if (cpu != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
-               return cpu_util_cfs(cpu);
+               p = NULL;
 
-       return cpu_util_next(cpu, p, -1);
+       return cpu_util(cpu, p, -1, 0);
 }
 
 /*
@@ -7330,7 +7405,7 @@ static inline void eenv_task_busy_time(struct energy_env *eenv,
  * cpu_capacity.
  *
  * The contribution of the task @p for which we want to estimate the
- * energy cost is removed (by cpu_util_next()) and must be calculated
+ * energy cost is removed (by cpu_util()) and must be calculated
  * separately (see eenv_task_busy_time). This ensures:
  *
  *   - A stable PD utilization, no matter which CPU of that PD we want to place
@@ -7351,7 +7426,7 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
        int cpu;
 
        for_each_cpu(cpu, pd_cpus) {
-               unsigned long util = cpu_util_next(cpu, p, -1);
+               unsigned long util = cpu_util(cpu, p, -1, 0);
 
                busy_time += effective_cpu_util(cpu, util, ENERGY_UTIL, NULL);
        }
@@ -7375,8 +7450,8 @@ eenv_pd_max_util(struct energy_env *eenv, struct cpumask *pd_cpus,
 
        for_each_cpu(cpu, pd_cpus) {
                struct task_struct *tsk = (cpu == dst_cpu) ? p : NULL;
-               unsigned long util = cpu_util_next(cpu, p, dst_cpu);
-               unsigned long cpu_util;
+               unsigned long util = cpu_util(cpu, p, dst_cpu, 1);
+               unsigned long eff_util;
 
                /*
                 * Performance domain frequency: utilization clamping
@@ -7385,8 +7460,8 @@ eenv_pd_max_util(struct energy_env *eenv, struct cpumask *pd_cpus,
                 * NOTE: in case RT tasks are running, by default the
                 * FREQUENCY_UTIL's utilization can be max OPP.
                 */
-               cpu_util = effective_cpu_util(cpu, util, FREQUENCY_UTIL, tsk);
-               max_util = max(max_util, cpu_util);
+               eff_util = effective_cpu_util(cpu, util, FREQUENCY_UTIL, tsk);
+               max_util = max(max_util, eff_util);
        }
 
        return min(max_util, eenv->cpu_cap);
@@ -7521,7 +7596,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
                        if (!cpumask_test_cpu(cpu, p->cpus_ptr))
                                continue;
 
-                       util = cpu_util_next(cpu, p, cpu);
+                       util = cpu_util(cpu, p, cpu, 0);
                        cpu_cap = capacity_of(cpu);
 
                        /*
@@ -9331,96 +9406,61 @@ group_type group_classify(unsigned int imbalance_pct,
 }
 
 /**
- * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
- * @dst_cpu:   Destination CPU of the load balancing
+ * sched_use_asym_prio - Check whether asym_packing priority must be used
+ * @sd:                The scheduling domain of the load balancing
+ * @cpu:       A CPU
+ *
+ * Always use CPU priority when balancing load between SMT siblings. When
+ * balancing load between cores, it is not sufficient that @cpu is idle. Only
+ * use CPU priority if the whole core is idle.
+ *
+ * Returns: True if the priority of @cpu must be followed. False otherwise.
+ */
+static bool sched_use_asym_prio(struct sched_domain *sd, int cpu)
+{
+       if (!sched_smt_active())
+               return true;
+
+       return sd->flags & SD_SHARE_CPUCAPACITY || is_core_idle(cpu);
+}
+
+/**
+ * sched_asym - Check if the destination CPU can do asym_packing load balance
+ * @env:       The load balancing environment
  * @sds:       Load-balancing data with statistics of the local group
  * @sgs:       Load-balancing statistics of the candidate busiest group
- * @sg:                The candidate busiest group
+ * @group:     The candidate busiest group
  *
- * Check the state of the SMT siblings of both @sds::local and @sg and decide
- * if @dst_cpu can pull tasks.
+ * @env::dst_cpu can do asym_packing if it has higher priority than the
+ * preferred CPU of @group.
  *
- * If @dst_cpu does not have SMT siblings, it can pull tasks if two or more of
- * the SMT siblings of @sg are busy. If only one CPU in @sg is busy, pull tasks
- * only if @dst_cpu has higher priority.
+ * SMT is a special case. If we are balancing load between cores, @env::dst_cpu
+ * can do asym_packing balance only if all its SMT siblings are idle. Also, it
+ * can only do it if @group is an SMT group and has exactly one busy CPU. Larger
+ * imbalances in the number of CPUs are dealt with in find_busiest_group().
  *
- * If both @dst_cpu and @sg have SMT siblings, and @sg has exactly one more
- * busy CPU than @sds::local, let @dst_cpu pull tasks if it has higher priority.
- * Bigger imbalances in the number of busy CPUs will be dealt with in
- * update_sd_pick_busiest().
+ * If we are balancing load within an SMT core, or at DIE domain level, always
+ * proceed.
  *
- * If @sg does not have SMT siblings, only pull tasks if all of the SMT siblings
- * of @dst_cpu are idle and @sg has lower priority.
- *
- * Return: true if @dst_cpu can pull tasks, false otherwise.
+ * Return: true if @env::dst_cpu can do asym_packing load balance. False
+ * otherwise.
  */
-static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
-                                   struct sg_lb_stats *sgs,
-                                   struct sched_group *sg)
+static inline bool
+sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
+          struct sched_group *group)
 {
-#ifdef CONFIG_SCHED_SMT
-       bool local_is_smt, sg_is_smt;
-       int sg_busy_cpus;
-
-       local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
-       sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
-
-       sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
-
-       if (!local_is_smt) {
-               /*
-                * If we are here, @dst_cpu is idle and does not have SMT
-                * siblings. Pull tasks if candidate group has two or more
-                * busy CPUs.
-                */
-               if (sg_busy_cpus >= 2) /* implies sg_is_smt */
-                       return true;
-
-               /*
-                * @dst_cpu does not have SMT siblings. @sg may have SMT
-                * siblings and only one is busy. In such case, @dst_cpu
-                * can help if it has higher priority and is idle (i.e.,
-                * it has no running tasks).
-                */
-               return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
-       }
-
-       /* @dst_cpu has SMT siblings. */
-
-       if (sg_is_smt) {
-               int local_busy_cpus = sds->local->group_weight -
-                                     sds->local_stat.idle_cpus;
-               int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
-
-               if (busy_cpus_delta == 1)
-                       return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
-
+       /* Ensure that the whole local core is idle, if applicable. */
+       if (!sched_use_asym_prio(env->sd, env->dst_cpu))
                return false;
-       }
 
        /*
-        * @sg does not have SMT siblings. Ensure that @sds::local does not end
-        * up with more than one busy SMT sibling and only pull tasks if there
-        * are not busy CPUs (i.e., no CPU has running tasks).
+        * CPU priorities do not make sense for SMT cores with more than one
+        * busy sibling.
         */
-       if (!sds->local_stat.sum_nr_running)
-               return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
-
-       return false;
-#else
-       /* Always return false so that callers deal with non-SMT cases. */
-       return false;
-#endif
-}
-
-static inline bool
-sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
-          struct sched_group *group)
-{
-       /* Only do SMT checks if either local or candidate have SMT siblings */
-       if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
-           (group->flags & SD_SHARE_CPUCAPACITY))
-               return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);
+       if (group->flags & SD_SHARE_CPUCAPACITY) {
+               if (sgs->group_weight - sgs->idle_cpus != 1)
+                       return false;
+       }
 
        return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
 }
@@ -9610,10 +9650,22 @@ static bool update_sd_pick_busiest(struct lb_env *env,
                 * contention when accessing shared HW resources.
                 *
                 * XXX for now avg_load is not computed and always 0 so we
-                * select the 1st one.
+                * select the 1st one, except if @sg is composed of SMT
+                * siblings.
                 */
-               if (sgs->avg_load <= busiest->avg_load)
+
+               if (sgs->avg_load < busiest->avg_load)
                        return false;
+
+               if (sgs->avg_load == busiest->avg_load) {
+                       /*
+                        * SMT sched groups need more help than non-SMT groups.
+                        * If @sg happens to also be SMT, either choice is good.
+                        */
+                       if (sds->busiest->flags & SD_SHARE_CPUCAPACITY)
+                               return false;
+               }
+
                break;
 
        case group_has_spare:
@@ -10088,7 +10140,6 @@ static void update_idle_cpu_scan(struct lb_env *env,
 
 static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sds)
 {
-       struct sched_domain *child = env->sd->child;
        struct sched_group *sg = env->sd->groups;
        struct sg_lb_stats *local = &sds->local_stat;
        struct sg_lb_stats tmp_sgs;
@@ -10129,8 +10180,13 @@ next_group:
                sg = sg->next;
        } while (sg != env->sd->groups);
 
-       /* Tag domain that child domain prefers tasks go to siblings first */
-       sds->prefer_sibling = child && child->flags & SD_PREFER_SIBLING;
+       /*
+        * Indicate that the child domain of the busiest group prefers tasks
+        * go to a child's sibling domains first. NB the flags of a sched group
+        * are those of the child domain.
+        */
+       if (sds->busiest)
+               sds->prefer_sibling = !!(sds->busiest->flags & SD_PREFER_SIBLING);
 
 
        if (env->sd->flags & SD_NUMA)
@@ -10440,7 +10496,10 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
                        goto out_balanced;
        }
 
-       /* Try to move all excess tasks to child's sibling domain */
+       /*
+        * Try to move all excess tasks to a sibling domain of the busiest
+        * group's child domain.
+        */
        if (sds.prefer_sibling && local->group_type == group_has_spare &&
            busiest->sum_nr_running > local->sum_nr_running + 1)
                goto force_balance;
@@ -10542,8 +10601,15 @@ static struct rq *find_busiest_queue(struct lb_env *env,
                    nr_running == 1)
                        continue;
 
-               /* Make sure we only pull tasks from a CPU of lower priority */
+               /*
+                * Make sure we only pull tasks from a CPU of lower priority
+                * when balancing between SMT siblings.
+                *
+                * If balancing between cores, let lower priority CPUs help
+                * SMT cores with more than one busy sibling.
+                */
                if ((env->sd->flags & SD_ASYM_PACKING) &&
+                   sched_use_asym_prio(env->sd, i) &&
                    sched_asym_prefer(i, env->dst_cpu) &&
                    nr_running == 1)
                        continue;
@@ -10581,7 +10647,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
                        break;
 
                case migrate_util:
-                       util = cpu_util_cfs(i);
+                       util = cpu_util_cfs_boost(i);
 
                        /*
                         * Don't try to pull utilization from a CPU with one
@@ -10632,12 +10698,19 @@ static inline bool
 asym_active_balance(struct lb_env *env)
 {
        /*
-        * ASYM_PACKING needs to force migrate tasks from busy but
-        * lower priority CPUs in order to pack all tasks in the
-        * highest priority CPUs.
+        * ASYM_PACKING needs to force migrate tasks from busy but lower
+        * priority CPUs in order to pack all tasks in the highest priority
+        * CPUs. When done between cores, do it only if the whole core is
+        * idle.
+        *
+        * If @env::src_cpu is an SMT core with busy siblings, let
+        * the lower priority @env::dst_cpu help it. Do not follow
+        * CPU priority.
         */
        return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) &&
-              sched_asym_prefer(env->dst_cpu, env->src_cpu);
+              sched_use_asym_prio(env->sd, env->dst_cpu) &&
+              (sched_asym_prefer(env->dst_cpu, env->src_cpu) ||
+               !sched_use_asym_prio(env->sd, env->src_cpu));
 }
 
 static inline bool
@@ -10744,7 +10817,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
                .sd             = sd,
                .dst_cpu        = this_cpu,
                .dst_rq         = this_rq,
-               .dst_grpmask    = sched_group_span(sd->groups),
+               .dst_grpmask    = group_balance_mask(sd->groups),
                .idle           = idle,
                .loop_break     = SCHED_NR_MIGRATE_BREAK,
                .cpus           = cpus,
@@ -11371,9 +11444,13 @@ static void nohz_balancer_kick(struct rq *rq)
                 * When ASYM_PACKING; see if there's a more preferred CPU
                 * currently idle; in which case, kick the ILB to move tasks
                 * around.
+                *
+                * When balancing between cores, all the SMT siblings of the
+                * preferred CPU must be idle.
                 */
                for_each_cpu_and(i, sched_domain_span(sd), nohz.idle_cpus_mask) {
-                       if (sched_asym_prefer(i, cpu)) {
+                       if (sched_use_asym_prio(sd, i) &&
+                           sched_asym_prefer(i, cpu)) {
                                flags = NOHZ_STATS_KICK | NOHZ_BALANCE_KICK;
                                goto unlock;
                        }
index e072f6b..81fca77 100644 (file)
@@ -160,7 +160,6 @@ __setup("psi=", setup_psi);
 #define EXP_300s       2034            /* 1/exp(2s/300s) */
 
 /* PSI trigger definitions */
-#define WINDOW_MIN_US 500000   /* Min window size is 500ms */
 #define WINDOW_MAX_US 10000000 /* Max window size is 10s */
 #define UPDATES_PER_WINDOW 10  /* 10 updates per window */
 
@@ -1305,8 +1304,7 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
        if (state >= PSI_NONIDLE)
                return ERR_PTR(-EINVAL);
 
-       if (window_us < WINDOW_MIN_US ||
-               window_us > WINDOW_MAX_US)
+       if (window_us == 0 || window_us > WINDOW_MAX_US)
                return ERR_PTR(-EINVAL);
 
        /*
@@ -1409,11 +1407,16 @@ void psi_trigger_destroy(struct psi_trigger *t)
                        group->rtpoll_nr_triggers[t->state]--;
                        if (!group->rtpoll_nr_triggers[t->state])
                                group->rtpoll_states &= ~(1 << t->state);
-                       /* reset min update period for the remaining triggers */
-                       list_for_each_entry(tmp, &group->rtpoll_triggers, node)
-                               period = min(period, div_u64(tmp->win.size,
-                                               UPDATES_PER_WINDOW));
-                       group->rtpoll_min_period = period;
+                       /*
+                        * Reset min update period for the remaining triggers
+                        * iff the destroying trigger had the min window size.
+                        */
+                       if (group->rtpoll_min_period == div_u64(t->win.size, UPDATES_PER_WINDOW)) {
+                               list_for_each_entry(tmp, &group->rtpoll_triggers, node)
+                                       period = min(period, div_u64(tmp->win.size,
+                                                       UPDATES_PER_WINDOW));
+                               group->rtpoll_min_period = period;
+                       }
                        /* Destroy rtpoll_task when the last trigger is destroyed */
                        if (group->rtpoll_states == 0) {
                                group->rtpoll_until = 0;
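A quick numeric check of the new condition, using made-up triggers rather than anything from the patch: with UPDATES_PER_WINDOW = 10 and two triggers whose windows are 1s and 4s, rtpoll_min_period starts as min(100ms, 400ms) = 100ms.

    destroy the 4s trigger: 4s / 10 = 400ms != 100ms -> no recompute needed
    destroy the 1s trigger: 1s / 10 = 100ms == min   -> recompute over the
                                                        remaining triggers;
                                                        min period becomes 400ms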
index ec7b3e0..e93e006 100644 (file)
@@ -286,12 +286,6 @@ struct rt_bandwidth {
 
 void __dl_clear_params(struct task_struct *p);
 
-struct dl_bandwidth {
-       raw_spinlock_t          dl_runtime_lock;
-       u64                     dl_runtime;
-       u64                     dl_period;
-};
-
 static inline int dl_bandwidth_enabled(void)
 {
        return sysctl_sched_rt_runtime >= 0;
@@ -330,7 +324,7 @@ extern void __getparam_dl(struct task_struct *p, struct sched_attr *attr);
 extern bool __checkparam_dl(const struct sched_attr *attr);
 extern bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr);
 extern int  dl_cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
-extern int  dl_cpu_busy(int cpu, struct task_struct *p);
+extern int  dl_bw_check_overflow(int cpu);
 
 #ifdef CONFIG_CGROUP_SCHED
 
@@ -754,6 +748,12 @@ struct dl_rq {
        u64                     extra_bw;
 
        /*
+        * Maximum available bandwidth for reclaiming by SCHED_FLAG_RECLAIM
+        * tasks of this rq. Used in calculation of reclaimable bandwidth (GRUB).
+        */
+       u64                     max_bw;
+
+       /*
         * Inverse of the fraction of CPU utilization that can be reclaimed
         * by the GRUB algorithm.
         */
@@ -1546,6 +1546,28 @@ static inline void rq_clock_cancel_skipupdate(struct rq *rq)
        rq->clock_update_flags &= ~RQCF_REQ_SKIP;
 }
 
+/*
+ * During cpu offlining and rq wide unthrottling, we can trigger
+ * an update_rq_clock() for several cfs and rt runqueues (typically
+ * when using list_for_each_entry_*).
+ * rq_clock_start_loop_update() can be called after updating the clock
+ * once and before iterating over the list to prevent multiple updates.
+ * After the iterative traversal, rq_clock_stop_loop_update() must be
+ * called to clear RQCF_ACT_SKIP in rq->clock_update_flags.
+ */
+static inline void rq_clock_start_loop_update(struct rq *rq)
+{
+       lockdep_assert_rq_held(rq);
+       SCHED_WARN_ON(rq->clock_update_flags & RQCF_ACT_SKIP);
+       rq->clock_update_flags |= RQCF_ACT_SKIP;
+}
+
+static inline void rq_clock_stop_loop_update(struct rq *rq)
+{
+       lockdep_assert_rq_held(rq);
+       rq->clock_update_flags &= ~RQCF_ACT_SKIP;
+}
+
 struct rq_flags {
        unsigned long flags;
        struct pin_cookie cookie;
@@ -1772,6 +1794,13 @@ queue_balance_callback(struct rq *rq,
        for (__sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd); \
                        __sd; __sd = __sd->parent)
 
+/* A mask of all the SD flags that have the SDF_SHARED_CHILD metaflag */
+#define SD_FLAG(name, mflags) (name * !!((mflags) & SDF_SHARED_CHILD)) |
+static const unsigned int SD_SHARED_CHILD_MASK =
+#include <linux/sched/sd_flags.h>
+0;
+#undef SD_FLAG
+
 /**
  * highest_flag_domain - Return highest sched_domain containing flag.
  * @cpu:       The CPU whose highest level of sched domain is to
@@ -1779,16 +1808,25 @@ queue_balance_callback(struct rq *rq,
  * @flag:      The flag to check for the highest sched_domain
  *             for the given CPU.
  *
- * Returns the highest sched_domain of a CPU which contains the given flag.
+ * Returns the highest sched_domain of a CPU which contains @flag. If @flag has
+ * the SDF_SHARED_CHILD metaflag, all the children domains also have @flag.
  */
 static inline struct sched_domain *highest_flag_domain(int cpu, int flag)
 {
        struct sched_domain *sd, *hsd = NULL;
 
        for_each_domain(cpu, sd) {
-               if (!(sd->flags & flag))
+               if (sd->flags & flag) {
+                       hsd = sd;
+                       continue;
+               }
+
+               /*
+                * Stop the search if @flag is known to be shared at lower
+                * levels. It will not be found further up.
+                */
+               if (flag & SD_SHARED_CHILD_MASK)
                        break;
-               hsd = sd;
        }
 
        return hsd;
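The SD_SHARED_CHILD_MASK construction a few hunks up leans on the X-macro list in <linux/sched/sd_flags.h>: each SD_FLAG(name, mflags) entry expands to a term that contributes the flag's bit only when its metaflags include SDF_SHARED_CHILD. Roughly (the exact flag list and metaflags live in that header, so treat this as an illustration):

    static const unsigned int SD_SHARED_CHILD_MASK =
            (SD_SHARE_CPUCAPACITY   * 1) |  /* has SDF_SHARED_CHILD */
            (SD_SHARE_PKG_RESOURCES * 1) |  /* has SDF_SHARED_CHILD */
            (SD_ASYM_CPUCAPACITY    * 0) |  /* shared by parents instead */
            /* ...remaining flags from sd_flags.h... */
            0;

With that mask in hand, highest_flag_domain() can stop climbing as soon as a shared-child flag is absent at the current level, since no higher domain can carry it either.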
@@ -2378,7 +2416,6 @@ extern struct rt_bandwidth def_rt_bandwidth;
 extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime);
 extern bool sched_rt_bandwidth_account(struct rt_rq *rt_rq);
 
-extern void init_dl_bandwidth(struct dl_bandwidth *dl_b, u64 period, u64 runtime);
 extern void init_dl_task_timer(struct sched_dl_entity *dl_se);
 extern void init_dl_inactive_task_timer(struct sched_dl_entity *dl_se);
 
@@ -2946,53 +2983,9 @@ static inline unsigned long cpu_util_dl(struct rq *rq)
        return READ_ONCE(rq->avg_dl.util_avg);
 }
 
-/**
- * cpu_util_cfs() - Estimates the amount of CPU capacity used by CFS tasks.
- * @cpu: the CPU to get the utilization for.
- *
- * The unit of the return value must be the same as the one of CPU capacity
- * so that CPU utilization can be compared with CPU capacity.
- *
- * CPU utilization is the sum of running time of runnable tasks plus the
- * recent utilization of currently non-runnable tasks on that CPU.
- * It represents the amount of CPU capacity currently used by CFS tasks in
- * the range [0..max CPU capacity] with max CPU capacity being the CPU
- * capacity at f_max.
- *
- * The estimated CPU utilization is defined as the maximum between CPU
- * utilization and sum of the estimated utilization of the currently
- * runnable tasks on that CPU. It preserves a utilization "snapshot" of
- * previously-executed tasks, which helps better deduce how busy a CPU will
- * be when a long-sleeping task wakes up. The contribution to CPU utilization
- * of such a task would be significantly decayed at this point of time.
- *
- * CPU utilization can be higher than the current CPU capacity
- * (f_curr/f_max * max CPU capacity) or even the max CPU capacity because
- * of rounding errors as well as task migrations or wakeups of new tasks.
- * CPU utilization has to be capped to fit into the [0..max CPU capacity]
- * range. Otherwise a group of CPUs (CPU0 util = 121% + CPU1 util = 80%)
- * could be seen as over-utilized even though CPU1 has 20% of spare CPU
- * capacity. CPU utilization is allowed to overshoot current CPU capacity
- * though since this is useful for predicting the CPU capacity required
- * after task migrations (scheduler-driven DVFS).
- *
- * Return: (Estimated) utilization for the specified CPU.
- */
-static inline unsigned long cpu_util_cfs(int cpu)
-{
-       struct cfs_rq *cfs_rq;
-       unsigned long util;
-
-       cfs_rq = &cpu_rq(cpu)->cfs;
-       util = READ_ONCE(cfs_rq->avg.util_avg);
-
-       if (sched_feat(UTIL_EST)) {
-               util = max_t(unsigned long, util,
-                            READ_ONCE(cfs_rq->avg.util_est.enqueued));
-       }
 
-       return min(util, capacity_orig_of(cpu));
-}
+extern unsigned long cpu_util_cfs(int cpu);
+extern unsigned long cpu_util_cfs_boost(int cpu);
 
 static inline unsigned long cpu_util_rt(struct rq *rq)
 {
index 6682535..d3a3b26 100644 (file)
@@ -487,9 +487,9 @@ static void free_rootdomain(struct rcu_head *rcu)
 void rq_attach_root(struct rq *rq, struct root_domain *rd)
 {
        struct root_domain *old_rd = NULL;
-       unsigned long flags;
+       struct rq_flags rf;
 
-       raw_spin_rq_lock_irqsave(rq, flags);
+       rq_lock_irqsave(rq, &rf);
 
        if (rq->rd) {
                old_rd = rq->rd;
@@ -515,7 +515,7 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
        if (cpumask_test_cpu(rq->cpu, cpu_active_mask))
                set_rq_online(rq);
 
-       raw_spin_rq_unlock_irqrestore(rq, flags);
+       rq_unlock_irqrestore(rq, &rf);
 
        if (old_rd)
                call_rcu(&old_rd->rcu, free_rootdomain);
@@ -719,8 +719,13 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
 
                if (sd_parent_degenerate(tmp, parent)) {
                        tmp->parent = parent->parent;
-                       if (parent->parent)
+
+                       if (parent->parent) {
                                parent->parent->child = tmp;
+                               if (tmp->flags & SD_SHARE_CPUCAPACITY)
+                                       parent->parent->groups->flags |= SD_SHARE_CPUCAPACITY;
+                       }
+
                        /*
                         * Transfer SD_PREFER_SIBLING down in case of a
                         * degenerate parent; the spans match for this
@@ -1676,7 +1681,7 @@ static struct sched_domain_topology_level *sched_domain_topology_saved;
 #define for_each_sd_topology(tl)                       \
        for (tl = sched_domain_topology; tl->mask; tl++)
 
-void set_sched_topology(struct sched_domain_topology_level *tl)
+void __init set_sched_topology(struct sched_domain_topology_level *tl)
 {
        if (WARN_ON_ONCE(sched_smp_initialized))
                return;
index 133b747..48c53e4 100644 (file)
@@ -425,11 +425,6 @@ int autoremove_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, i
 }
 EXPORT_SYMBOL(autoremove_wake_function);
 
-static inline bool is_kthread_should_stop(void)
-{
-       return (current->flags & PF_KTHREAD) && kthread_should_stop();
-}
-
 /*
  * DEFINE_WAIT_FUNC(wait, woken_wake_func);
  *
@@ -459,7 +454,7 @@ long wait_woken(struct wait_queue_entry *wq_entry, unsigned mode, long timeout)
         * or woken_wake_function() sees our store to current->state.
         */
        set_current_state(mode); /* A */
-       if (!(wq_entry->flags & WQ_FLAG_WOKEN) && !is_kthread_should_stop())
+       if (!(wq_entry->flags & WQ_FLAG_WOKEN) && !kthread_should_stop_or_park())
                timeout = schedule_timeout(timeout);
        __set_current_state(TASK_RUNNING);
 
index 8f6330f..2547fa7 100644 (file)
@@ -1368,7 +1368,9 @@ int zap_other_threads(struct task_struct *p)
 
        while_each_thread(p, t) {
                task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
-               count++;
+               /* Don't require de_thread to wait for the vhost_worker */
+               if ((t->flags & (PF_IO_WORKER | PF_USER_WORKER)) != PF_USER_WORKER)
+                       count++;
 
                /* Don't bother with already dead threads */
                if (t->exit_state)
@@ -2861,11 +2863,11 @@ relock:
                }
 
                /*
-                * PF_IO_WORKER threads will catch and exit on fatal signals
+                * PF_USER_WORKER threads will catch and exit on fatal signals
                 * themselves. They have cleanup that must be performed, so
                 * we cannot call do_exit() on their behalf.
                 */
-               if (current->flags & PF_IO_WORKER)
+               if (current->flags & PF_USER_WORKER)
                        goto out;
 
                /*
index ab3e5da..385179d 100644 (file)
@@ -27,6 +27,9 @@
 #include <linux/jump_label.h>
 
 #include <trace/events/ipi.h>
+#define CREATE_TRACE_POINTS
+#include <trace/events/csd.h>
+#undef CREATE_TRACE_POINTS
 
 #include "smpboot.h"
 #include "sched/smp.h"
@@ -121,6 +124,14 @@ send_call_function_ipi_mask(struct cpumask *mask)
        arch_send_call_function_ipi_mask(mask);
 }
 
+static __always_inline void
+csd_do_func(smp_call_func_t func, void *info, struct __call_single_data *csd)
+{
+       trace_csd_function_entry(func, csd);
+       func(info);
+       trace_csd_function_exit(func, csd);
+}
+
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
 
 static DEFINE_STATIC_KEY_MAYBE(CONFIG_CSD_LOCK_WAIT_DEBUG_DEFAULT, csdlock_debug_enabled);
@@ -329,7 +340,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
         * even if we haven't sent the smp_call IPI yet (e.g. the stopper
         * executes migration_cpu_stop() on the remote CPU).
         */
-       if (trace_ipi_send_cpu_enabled()) {
+       if (trace_csd_queue_cpu_enabled()) {
                call_single_data_t *csd;
                smp_call_func_t func;
 
@@ -337,7 +348,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
                func = CSD_TYPE(csd) == CSD_TYPE_TTWU ?
                        sched_ttwu_pending : csd->func;
 
-               trace_ipi_send_cpu(cpu, _RET_IP_, func);
+               trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
        }
 
        /*
@@ -375,7 +386,7 @@ static int generic_exec_single(int cpu, struct __call_single_data *csd)
                csd_lock_record(csd);
                csd_unlock(csd);
                local_irq_save(flags);
-               func(info);
+               csd_do_func(func, info, NULL);
                csd_lock_record(NULL);
                local_irq_restore(flags);
                return 0;
@@ -477,7 +488,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
                        }
 
                        csd_lock_record(csd);
-                       func(info);
+                       csd_do_func(func, info, csd);
                        csd_unlock(csd);
                        csd_lock_record(NULL);
                } else {
@@ -508,7 +519,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
 
                                csd_lock_record(csd);
                                csd_unlock(csd);
-                               func(info);
+                               csd_do_func(func, info, csd);
                                csd_lock_record(NULL);
                        } else if (type == CSD_TYPE_IRQ_WORK) {
                                irq_work_single(csd);
@@ -522,8 +533,10 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
        /*
         * Third; only CSD_TYPE_TTWU is left, issue those.
         */
-       if (entry)
-               sched_ttwu_pending(entry);
+       if (entry) {
+               csd = llist_entry(entry, typeof(*csd), node.llist);
+               csd_do_func(sched_ttwu_pending, entry, csd);
+       }
 }
 
 
@@ -728,7 +741,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
        int cpu, last_cpu, this_cpu = smp_processor_id();
        struct call_function_data *cfd;
        bool wait = scf_flags & SCF_WAIT;
-       int nr_cpus = 0, nr_queued = 0;
+       int nr_cpus = 0;
        bool run_remote = false;
        bool run_local = false;
 
@@ -786,22 +799,16 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
                        csd->node.src = smp_processor_id();
                        csd->node.dst = cpu;
 #endif
+                       trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
+
                        if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu))) {
                                __cpumask_set_cpu(cpu, cfd->cpumask_ipi);
                                nr_cpus++;
                                last_cpu = cpu;
                        }
-                       nr_queued++;
                }
 
                /*
-                * Trace each smp_function_call_*() as an IPI, actual IPIs
-                * will be traced with func==generic_smp_call_function_single_ipi().
-                */
-               if (nr_queued)
-                       trace_ipi_send_cpumask(cfd->cpumask, _RET_IP_, func);
-
-               /*
                 * Choose the most efficient way to send an IPI. Note that the
                 * number of CPUs might be zero due to concurrent changes to the
                 * provided mask.
@@ -816,7 +823,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
                unsigned long flags;
 
                local_irq_save(flags);
-               func(info);
+               csd_do_func(func, info, NULL);
                local_irq_restore(flags);
        }
 
@@ -892,7 +899,7 @@ EXPORT_SYMBOL(setup_max_cpus);
  * SMP mode to <NUM>.
  */
 
-void __weak arch_disable_smp_support(void) { }
+void __weak __init arch_disable_smp_support(void) { }
 
 static int __init nosmp(char *str)
 {
index 2c7396d..f47d8f3 100644 (file)
@@ -325,166 +325,3 @@ void smpboot_unregister_percpu_thread(struct smp_hotplug_thread *plug_thread)
        cpus_read_unlock();
 }
 EXPORT_SYMBOL_GPL(smpboot_unregister_percpu_thread);
-
-static DEFINE_PER_CPU(atomic_t, cpu_hotplug_state) = ATOMIC_INIT(CPU_POST_DEAD);
-
-/*
- * Called to poll specified CPU's state, for example, when waiting for
- * a CPU to come online.
- */
-int cpu_report_state(int cpu)
-{
-       return atomic_read(&per_cpu(cpu_hotplug_state, cpu));
-}
-
-/*
- * If CPU has died properly, set its state to CPU_UP_PREPARE and
- * return success.  Otherwise, return -EBUSY if the CPU died after
- * cpu_wait_death() timed out.  And yet otherwise again, return -EAGAIN
- * if cpu_wait_death() timed out and the CPU still hasn't gotten around
- * to dying.  In the latter two cases, the CPU might not be set up
- * properly, but it is up to the arch-specific code to decide.
- * Finally, -EIO indicates an unanticipated problem.
- *
- * Note that it is permissible to omit this call entirely, as is
- * done in architectures that do no CPU-hotplug error checking.
- */
-int cpu_check_up_prepare(int cpu)
-{
-       if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
-               atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_UP_PREPARE);
-               return 0;
-       }
-
-       switch (atomic_read(&per_cpu(cpu_hotplug_state, cpu))) {
-
-       case CPU_POST_DEAD:
-
-               /* The CPU died properly, so just start it up again. */
-               atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_UP_PREPARE);
-               return 0;
-
-       case CPU_DEAD_FROZEN:
-
-               /*
-                * Timeout during CPU death, so let caller know.
-                * The outgoing CPU completed its processing, but after
-                * cpu_wait_death() timed out and reported the error. The
-                * caller is free to proceed, in which case the state
-                * will be reset properly by cpu_set_state_online().
-                * Proceeding despite this -EBUSY return makes sense
-                * for systems where the outgoing CPUs take themselves
-                * offline, with no post-death manipulation required from
-                * a surviving CPU.
-                */
-               return -EBUSY;
-
-       case CPU_BROKEN:
-
-               /*
-                * The most likely reason we got here is that there was
-                * a timeout during CPU death, and the outgoing CPU never
-                * did complete its processing.  This could happen on
-                * a virtualized system if the outgoing VCPU gets preempted
-                * for more than five seconds, and the user attempts to
-                * immediately online that same CPU.  Trying again later
-                * might return -EBUSY above, hence -EAGAIN.
-                */
-               return -EAGAIN;
-
-       case CPU_UP_PREPARE:
-               /*
-                * Timeout while waiting for the CPU to show up. Allow to try
-                * again later.
-                */
-               return 0;
-
-       default:
-
-               /* Should not happen.  Famous last words. */
-               return -EIO;
-       }
-}
-
-/*
- * Mark the specified CPU online.
- *
- * Note that it is permissible to omit this call entirely, as is
- * done in architectures that do no CPU-hotplug error checking.
- */
-void cpu_set_state_online(int cpu)
-{
-       (void)atomic_xchg(&per_cpu(cpu_hotplug_state, cpu), CPU_ONLINE);
-}
-
-#ifdef CONFIG_HOTPLUG_CPU
-
-/*
- * Wait for the specified CPU to exit the idle loop and die.
- */
-bool cpu_wait_death(unsigned int cpu, int seconds)
-{
-       int jf_left = seconds * HZ;
-       int oldstate;
-       bool ret = true;
-       int sleep_jf = 1;
-
-       might_sleep();
-
-       /* The outgoing CPU will normally get done quite quickly. */
-       if (atomic_read(&per_cpu(cpu_hotplug_state, cpu)) == CPU_DEAD)
-               goto update_state_early;
-       udelay(5);
-
-       /* But if the outgoing CPU dawdles, wait increasingly long times. */
-       while (atomic_read(&per_cpu(cpu_hotplug_state, cpu)) != CPU_DEAD) {
-               schedule_timeout_uninterruptible(sleep_jf);
-               jf_left -= sleep_jf;
-               if (jf_left <= 0)
-                       break;
-               sleep_jf = DIV_ROUND_UP(sleep_jf * 11, 10);
-       }
-update_state_early:
-       oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
-update_state:
-       if (oldstate == CPU_DEAD) {
-               /* Outgoing CPU died normally, update state. */
-               smp_mb(); /* atomic_read() before update. */
-               atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_POST_DEAD);
-       } else {
-               /* Outgoing CPU still hasn't died, set state accordingly. */
-               if (!atomic_try_cmpxchg(&per_cpu(cpu_hotplug_state, cpu),
-                                       &oldstate, CPU_BROKEN))
-                       goto update_state;
-               ret = false;
-       }
-       return ret;
-}
-
-/*
- * Called by the outgoing CPU to report its successful death.  Return
- * false if this report follows the surviving CPU's timing out.
- *
- * A separate "CPU_DEAD_FROZEN" is used when the surviving CPU
- * timed out.  This approach allows architectures to omit calls to
- * cpu_check_up_prepare() and cpu_set_state_online() without defeating
- * the next cpu_wait_death()'s polling loop.
- */
-bool cpu_report_death(void)
-{
-       int oldstate;
-       int newstate;
-       int cpu = smp_processor_id();
-
-       oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
-       do {
-               if (oldstate != CPU_BROKEN)
-                       newstate = CPU_DEAD;
-               else
-                       newstate = CPU_DEAD_FROZEN;
-       } while (!atomic_try_cmpxchg(&per_cpu(cpu_hotplug_state, cpu),
-                                    &oldstate, newstate));
-       return newstate == CPU_DEAD;
-}
-
-#endif /* #ifdef CONFIG_HOTPLUG_CPU */
index 1b72551..807b34c 100644 (file)
@@ -80,21 +80,6 @@ static void wakeup_softirqd(void)
                wake_up_process(tsk);
 }
 
-/*
- * If ksoftirqd is scheduled, we do not want to process pending softirqs
- * right now. Let ksoftirqd handle this at its own rate, to get fairness,
- * unless we're doing some of the synchronous softirqs.
- */
-#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))
-static bool ksoftirqd_running(unsigned long pending)
-{
-       struct task_struct *tsk = __this_cpu_read(ksoftirqd);
-
-       if (pending & SOFTIRQ_NOW_MASK)
-               return false;
-       return tsk && task_is_running(tsk) && !__kthread_should_park(tsk);
-}
-
 #ifdef CONFIG_TRACE_IRQFLAGS
 DEFINE_PER_CPU(int, hardirqs_enabled);
 DEFINE_PER_CPU(int, hardirq_context);
@@ -236,7 +221,7 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
                goto out;
 
        pending = local_softirq_pending();
-       if (!pending || ksoftirqd_running(pending))
+       if (!pending)
                goto out;
 
        /*
@@ -432,9 +417,6 @@ static inline bool should_wake_ksoftirqd(void)
 
 static inline void invoke_softirq(void)
 {
-       if (ksoftirqd_running(local_softirq_pending()))
-               return;
-
        if (!force_irqthreads() || !__this_cpu_read(ksoftirqd)) {
 #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
                /*
@@ -468,7 +450,7 @@ asmlinkage __visible void do_softirq(void)
 
        pending = local_softirq_pending();
 
-       if (pending && !ksoftirqd_running(pending))
+       if (pending)
                do_softirq_own_stack();
 
        local_irq_restore(flags);
index 82b28ab..8d9f13d 100644 (file)
@@ -751,7 +751,7 @@ static int alarm_timer_create(struct k_itimer *new_timer)
 static enum alarmtimer_restart alarmtimer_nsleep_wakeup(struct alarm *alarm,
                                                                ktime_t now)
 {
-       struct task_struct *task = (struct task_struct *)alarm->data;
+       struct task_struct *task = alarm->data;
 
        alarm->data = NULL;
        if (task)
@@ -847,7 +847,7 @@ static int alarm_timer_nsleep(const clockid_t which_clock, int flags,
        struct restart_block *restart = &current->restart_block;
        struct alarm alarm;
        ktime_t exp;
-       int ret = 0;
+       int ret;
 
        if (!alarmtimer_get_rtcdev())
                return -EOPNOTSUPP;
index e8c0829..238262e 100644 (file)
@@ -164,6 +164,7 @@ static inline bool is_migration_base(struct hrtimer_clock_base *base)
 static
 struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
                                             unsigned long *flags)
+       __acquires(&timer->base->lock)
 {
        struct hrtimer_clock_base *base;
 
@@ -280,6 +281,7 @@ static inline bool is_migration_base(struct hrtimer_clock_base *base)
 
 static inline struct hrtimer_clock_base *
 lock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags)
+       __acquires(&timer->base->cpu_base->lock)
 {
        struct hrtimer_clock_base *base = timer->base;
 
@@ -1013,6 +1015,7 @@ void hrtimers_resume_local(void)
  */
 static inline
 void unlock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags)
+       __releases(&timer->base->cpu_base->lock)
 {
        raw_spin_unlock_irqrestore(&timer->base->cpu_base->lock, *flags);
 }
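
The __acquires()/__releases() annotations added above are sparse lock-context hints only; they generate no code. A minimal sketch of the pattern on a hypothetical object (struct foo, foo_lock() and foo_unlock() are illustrative names, not part of this patch):

#include <linux/spinlock.h>

struct foo {
        spinlock_t lock;
        int value;
};

/* Tell sparse that foo_lock() returns with f->lock held ... */
static void foo_lock(struct foo *f)
        __acquires(&f->lock)
{
        spin_lock(&f->lock);
}

/* ... and that foo_unlock() drops it, so unbalanced callers warn. */
static void foo_unlock(struct foo *f)
        __releases(&f->lock)
{
        spin_unlock(&f->lock);
}
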
index 808a247..b924f0f 100644 (file)
 #include "timekeeping.h"
 #include "posix-timers.h"
 
-/*
- * Management arrays for POSIX timers. Timers are now kept in static hash table
- * with 512 entries.
- * Timer ids are allocated by local routine, which selects proper hash head by
- * key, constructed from current->signal address and per signal struct counter.
- * This keeps timer ids unique per process, but now they can intersect between
- * processes.
- */
+static struct kmem_cache *posix_timers_cache;
 
 /*
- * Lets keep our timers in a slab cache :-)
+ * Timers are managed in a hash table for lockless lookup. The hash key is
+ * constructed from current::signal and the timer ID and the timer is
+ * matched against current::signal and the timer ID when walking the hash
+ * bucket list.
+ *
+ * This allows checkpoint/restore to reconstruct the exact timer IDs for
+ * a process.
  */
-static struct kmem_cache *posix_timers_cache;
-
 static DEFINE_HASHTABLE(posix_timers_hashtable, 9);
 static DEFINE_SPINLOCK(hash_lock);
 
@@ -56,52 +53,12 @@ static const struct k_clock * const posix_clocks[];
 static const struct k_clock *clockid_to_kclock(const clockid_t id);
 static const struct k_clock clock_realtime, clock_monotonic;
 
-/*
- * we assume that the new SIGEV_THREAD_ID shares no bits with the other
- * SIGEV values.  Here we put out an error if this assumption fails.
- */
+/* SIGEV_THREAD_ID cannot share a bit with the other SIGEV values. */
 #if SIGEV_THREAD_ID != (SIGEV_THREAD_ID & \
-                       ~(SIGEV_SIGNAL | SIGEV_NONE | SIGEV_THREAD))
+                       ~(SIGEV_SIGNAL | SIGEV_NONE | SIGEV_THREAD))
 #error "SIGEV_THREAD_ID must not share bit with other SIGEV values!"
 #endif
 
-/*
- * The timer ID is turned into a timer address by idr_find().
- * Verifying a valid ID consists of:
- *
- * a) checking that idr_find() returns other than -1.
- * b) checking that the timer id matches the one in the timer itself.
- * c) that the timer owner is in the callers thread group.
- */
-
-/*
- * CLOCKs: The POSIX standard calls for a couple of clocks and allows us
- *         to implement others.  This structure defines the various
- *         clocks.
- *
- * RESOLUTION: Clock resolution is used to round up timer and interval
- *         times, NOT to report clock times, which are reported with as
- *         much resolution as the system can muster.  In some cases this
- *         resolution may depend on the underlying clock hardware and
- *         may not be quantifiable until run time, and only then is the
- *         necessary code is written.  The standard says we should say
- *         something about this issue in the documentation...
- *
- * FUNCTIONS: The CLOCKs structure defines possible functions to
- *         handle various clock functions.
- *
- *         The standard POSIX timer management code assumes the
- *         following: 1.) The k_itimer struct (sched.h) is used for
- *         the timer.  2.) The list, it_lock, it_clock, it_id and
- *         it_pid fields are not modified by timer code.
- *
- * Permissions: It is assumed that the clock_settime() function defined
- *         for each clock will take care of permission checks.  Some
- *         clocks may be set able by any user (i.e. local process
- *         clocks) others not.  Currently the only set able clock we
- *         have is CLOCK_REALTIME and its high res counter part, both of
- *         which we beg off on and pass to do_sys_settimeofday().
- */
 static struct k_itimer *__lock_timer(timer_t timer_id, unsigned long *flags);
 
 #define lock_timer(tid, flags)                                            \
@@ -121,9 +78,9 @@ static struct k_itimer *__posix_timers_find(struct hlist_head *head,
 {
        struct k_itimer *timer;
 
-       hlist_for_each_entry_rcu(timer, head, t_hash,
-                                lockdep_is_held(&hash_lock)) {
-               if ((timer->it_signal == sig) && (timer->it_id == id))
+       hlist_for_each_entry_rcu(timer, head, t_hash, lockdep_is_held(&hash_lock)) {
+               /* timer->it_signal can be set concurrently */
+               if ((READ_ONCE(timer->it_signal) == sig) && (timer->it_id == id))
                        return timer;
        }
        return NULL;
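
__posix_timers_find() above walks one bucket of posix_timers_hashtable; the hash() helper which picks that bucket (also used by posix_timer_add() in the next hunk) is not part of this diff. A rough sketch of such a helper, sitting next to the hash table declared above, could look like this (illustrative only, not necessarily the literal upstream implementation; hash_32() and hash32_ptr() come from <linux/hash.h>, HASH_BITS() from <linux/hashtable.h>):

/* Mix the signal_struct pointer with the timer ID so that entries of
 * different processes spread over the 512 buckets of the table.
 */
static unsigned int example_hash(struct signal_struct *sig, unsigned int id)
{
        return hash_32(hash32_ptr(sig) ^ id, HASH_BITS(posix_timers_hashtable));
}
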
@@ -140,25 +97,30 @@ static struct k_itimer *posix_timer_by_id(timer_t id)
 static int posix_timer_add(struct k_itimer *timer)
 {
        struct signal_struct *sig = current->signal;
-       int first_free_id = sig->posix_timer_id;
        struct hlist_head *head;
-       int ret = -ENOENT;
+       unsigned int cnt, id;
 
-       do {
+       /*
+        * FIXME: Replace this by a per signal struct xarray once there is
+        * a plan to handle the resulting CRIU regression gracefully.
+        */
+       for (cnt = 0; cnt <= INT_MAX; cnt++) {
                spin_lock(&hash_lock);
-               head = &posix_timers_hashtable[hash(sig, sig->posix_timer_id)];
-               if (!__posix_timers_find(head, sig, sig->posix_timer_id)) {
+               id = sig->next_posix_timer_id;
+
+               /* Write the next ID back. Clamp it to the positive space */
+               sig->next_posix_timer_id = (id + 1) & INT_MAX;
+
+               head = &posix_timers_hashtable[hash(sig, id)];
+               if (!__posix_timers_find(head, sig, id)) {
                        hlist_add_head_rcu(&timer->t_hash, head);
-                       ret = sig->posix_timer_id;
+                       spin_unlock(&hash_lock);
+                       return id;
                }
-               if (++sig->posix_timer_id < 0)
-                       sig->posix_timer_id = 0;
-               if ((sig->posix_timer_id == first_free_id) && (ret == -ENOENT))
-                       /* Loop over all possible ids completed */
-                       ret = -EAGAIN;
                spin_unlock(&hash_lock);
-       } while (ret == -ENOENT);
-       return ret;
+       }
+       /* POSIX return code when no timer ID could be allocated */
+       return -EAGAIN;
 }
 
 static inline void unlock_timer(struct k_itimer *timr, unsigned long flags)
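The allocation loop in posix_timer_add() above hands out timer IDs per process in increasing order, wrapping within the positive int range; these sequential IDs are what checkpoint/restore relies on to reconstruct exact timer IDs, as the comment at the top of the file notes. A small user-space sketch to observe the behaviour (error handling omitted; how libc encodes the kernel ID inside timer_t is implementation specific):

#include <signal.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
        struct sigevent sev = { .sigev_notify = SIGEV_NONE };
        timer_t id;
        int i;

        /* Each timer_create() is expected to report the next sequential
         * per-process ID, even though the previous timer was deleted.
         */
        for (i = 0; i < 4; i++) {
                if (timer_create(CLOCK_MONOTONIC, &sev, &id))
                        return 1;
                printf("timer %d -> id %ld\n", i, (long)id);
                timer_delete(id);
        }
        return 0;
}
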
@@ -166,7 +128,6 @@ static inline void unlock_timer(struct k_itimer *timr, unsigned long flags)
        spin_unlock_irqrestore(&timr->it_lock, flags);
 }
 
-/* Get clock_realtime */
 static int posix_get_realtime_timespec(clockid_t which_clock, struct timespec64 *tp)
 {
        ktime_get_real_ts64(tp);
@@ -178,7 +139,6 @@ static ktime_t posix_get_realtime_ktime(clockid_t which_clock)
        return ktime_get_real();
 }
 
-/* Set clock_realtime */
 static int posix_clock_realtime_set(const clockid_t which_clock,
                                    const struct timespec64 *tp)
 {
@@ -191,9 +151,6 @@ static int posix_clock_realtime_adj(const clockid_t which_clock,
        return do_adjtimex(t);
 }
 
-/*
- * Get monotonic time for posix timers
- */
 static int posix_get_monotonic_timespec(clockid_t which_clock, struct timespec64 *tp)
 {
        ktime_get_ts64(tp);
@@ -206,9 +163,6 @@ static ktime_t posix_get_monotonic_ktime(clockid_t which_clock)
        return ktime_get();
 }
 
-/*
- * Get monotonic-raw time for posix timers
- */
 static int posix_get_monotonic_raw(clockid_t which_clock, struct timespec64 *tp)
 {
        ktime_get_raw_ts64(tp);
@@ -216,7 +170,6 @@ static int posix_get_monotonic_raw(clockid_t which_clock, struct timespec64 *tp)
        return 0;
 }
 
-
 static int posix_get_realtime_coarse(clockid_t which_clock, struct timespec64 *tp)
 {
        ktime_get_coarse_real_ts64(tp);
@@ -267,9 +220,6 @@ static int posix_get_hrtimer_res(clockid_t which_clock, struct timespec64 *tp)
        return 0;
 }
 
-/*
- * Initialize everything, well, just everything in Posix clocks/timers ;)
- */
 static __init int init_posix_timers(void)
 {
        posix_timers_cache = kmem_cache_create("posix_timers_cache",
@@ -300,15 +250,9 @@ static void common_hrtimer_rearm(struct k_itimer *timr)
 }
 
 /*
- * This function is exported for use by the signal deliver code.  It is
- * called just prior to the info block being released and passes that
- * block to us.  It's function is to update the overrun entry AND to
- * restart the timer.  It should only be called if the timer is to be
- * restarted (i.e. we have flagged this in the sys_private entry of the
- * info block).
- *
- * To protect against the timer going away while the interrupt is queued,
- * we require that the it_requeue_pending flag be set.
+ * This function is called from the signal delivery code if
+ * info->si_sys_private is not zero, which indicates that the timer has to
+ * be rearmed. Restart the timer and update info::si_overrun.
  */
 void posixtimer_rearm(struct kernel_siginfo *info)
 {
@@ -357,18 +301,18 @@ int posix_timer_event(struct k_itimer *timr, int si_private)
 }
 
 /*
- * This function gets called when a POSIX.1b interval timer expires.  It
- * is used as a callback from the kernel internal timer.  The
- * run_timer_list code ALWAYS calls with interrupts on.
-
- * This code is for CLOCK_REALTIME* and CLOCK_MONOTONIC* timers.
+ * This function gets called when a POSIX.1b interval timer expires from
+ * the HRTIMER interrupt (soft interrupt on RT kernels).
+ *
+ * Handles CLOCK_REALTIME, CLOCK_MONOTONIC, CLOCK_BOOTTIME and CLOCK_TAI
+ * based timers.
  */
 static enum hrtimer_restart posix_timer_fn(struct hrtimer *timer)
 {
+       enum hrtimer_restart ret = HRTIMER_NORESTART;
        struct k_itimer *timr;
        unsigned long flags;
        int si_private = 0;
-       enum hrtimer_restart ret = HRTIMER_NORESTART;
 
        timr = container_of(timer, struct k_itimer, it.real.timer);
        spin_lock_irqsave(&timr->it_lock, flags);
@@ -379,9 +323,10 @@ static enum hrtimer_restart posix_timer_fn(struct hrtimer *timer)
 
        if (posix_timer_event(timr, si_private)) {
                /*
-                * signal was not sent because of sig_ignor
-                * we will not get a call back to restart it AND
-                * it should be restarted.
+                * The signal was not queued due to SIG_IGN. As a
+                * consequence the timer is not going to be rearmed from
+                * the signal delivery path. But as a real signal handler
+                * can be installed later the timer must be rearmed here.
                 */
                if (timr->it_interval != 0) {
                        ktime_t now = hrtimer_cb_get_time(timer);
@@ -390,34 +335,35 @@ static enum hrtimer_restart posix_timer_fn(struct hrtimer *timer)
                         * FIXME: What we really want, is to stop this
                         * timer completely and restart it in case the
                         * SIG_IGN is removed. This is a non trivial
-                        * change which involves sighand locking
-                        * (sigh !), which we don't want to do late in
-                        * the release cycle.
+                        * change to the signal handling code.
+                        *
+                        * For now let timers with an interval less than a
+                        * jiffie expire every jiffie and recheck for a
+                        * valid signal handler.
+                        *
+                        * This avoids interrupt starvation in case of a
+                        * very small interval, which would expire the
+                        * timer immediately again.
+                        *
+                        * Moving now ahead of time by one jiffie tricks
+                        * hrtimer_forward() to expire the timer later,
+                        * while it still maintains the overrun accuracy
+                        * for the price of a slight inconsistency in the
+                        * timer_gettime() case. This is at least better
+                        * than a timer storm.
                         *
-                        * For now we just let timers with an interval
-                        * less than a jiffie expire every jiffie to
-                        * avoid softirq starvation in case of SIG_IGN
-                        * and a very small interval, which would put
-                        * the timer right back on the softirq pending
-                        * list. By moving now ahead of time we trick
-                        * hrtimer_forward() to expire the timer
-                        * later, while we still maintain the overrun
-                        * accuracy, but have some inconsistency in
-                        * the timer_gettime() case. This is at least
-                        * better than a starved softirq. A more
-                        * complex fix which solves also another related
-                        * inconsistency is already in the pipeline.
+                        * Only required when high resolution timers are
+                        * enabled as the periodic tick based timers are
+                        * automatically aligned to the next tick.
                         */
-#ifdef CONFIG_HIGH_RES_TIMERS
-                       {
-                               ktime_t kj = NSEC_PER_SEC / HZ;
+                       if (IS_ENABLED(CONFIG_HIGH_RES_TIMERS)) {
+                               ktime_t kj = TICK_NSEC;
 
                                if (timr->it_interval < kj)
                                        now = ktime_add(now, kj);
                        }
-#endif
-                       timr->it_overrun += hrtimer_forward(timer, now,
-                                                           timr->it_interval);
+
+                       timr->it_overrun += hrtimer_forward(timer, now, timr->it_interval);
                        ret = HRTIMER_RESTART;
                        ++timr->it_requeue_pending;
                        timr->it_active = 1;
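
A toy model of the hrtimer_forward() arithmetic shows why pushing 'now' one tick ahead stretches the re-arm from the tiny interval to roughly one tick. The helper below is a simplification, not the real hrtimer_forward(); the numbers assume a 100us interval and TICK_NSEC = 4ms:

#include <stdio.h>

/* Advance *expiry past 'now' in whole 'interval' steps and return the
 * step count - the same shape as hrtimer_forward() for expired timers.
 */
static unsigned long long forward(unsigned long long *expiry,
                                  unsigned long long now,
                                  unsigned long long interval)
{
        unsigned long long steps = 0;

        if (now >= *expiry) {
                steps = (now - *expiry) / interval + 1;
                *expiry += steps * interval;
        }
        return steps;
}

int main(void)
{
        unsigned long long expiry = 1000000, interval = 100000;
        unsigned long long plain = expiry, pushed = expiry;

        forward(&plain, 1000000, interval);             /* re-arms 100us out */
        forward(&pushed, 1000000 + 4000000, interval);  /* re-arms ~4ms out */

        printf("plain: %llu ns, pushed: %llu ns\n", plain, pushed);
        return 0;
}
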
@@ -454,8 +400,8 @@ static struct pid *good_sigevent(sigevent_t * event)
 
 static struct k_itimer * alloc_posix_timer(void)
 {
-       struct k_itimer *tmr;
-       tmr = kmem_cache_zalloc(posix_timers_cache, GFP_KERNEL);
+       struct k_itimer *tmr = kmem_cache_zalloc(posix_timers_cache, GFP_KERNEL);
+
        if (!tmr)
                return tmr;
        if (unlikely(!(tmr->sigq = sigqueue_alloc()))) {
@@ -473,21 +419,21 @@ static void k_itimer_rcu_free(struct rcu_head *head)
        kmem_cache_free(posix_timers_cache, tmr);
 }
 
-#define IT_ID_SET      1
-#define IT_ID_NOT_SET  0
-static void release_posix_timer(struct k_itimer *tmr, int it_id_set)
+static void posix_timer_free(struct k_itimer *tmr)
 {
-       if (it_id_set) {
-               unsigned long flags;
-               spin_lock_irqsave(&hash_lock, flags);
-               hlist_del_rcu(&tmr->t_hash);
-               spin_unlock_irqrestore(&hash_lock, flags);
-       }
        put_pid(tmr->it_pid);
        sigqueue_free(tmr->sigq);
        call_rcu(&tmr->rcu, k_itimer_rcu_free);
 }
 
+static void posix_timer_unhash_and_free(struct k_itimer *tmr)
+{
+       spin_lock(&hash_lock);
+       hlist_del_rcu(&tmr->t_hash);
+       spin_unlock(&hash_lock);
+       posix_timer_free(tmr);
+}
+
 static int common_timer_create(struct k_itimer *new_timer)
 {
        hrtimer_init(&new_timer->it.real.timer, new_timer->it_clock, 0);
@@ -501,7 +447,6 @@ static int do_timer_create(clockid_t which_clock, struct sigevent *event,
        const struct k_clock *kc = clockid_to_kclock(which_clock);
        struct k_itimer *new_timer;
        int error, new_timer_id;
-       int it_id_set = IT_ID_NOT_SET;
 
        if (!kc)
                return -EINVAL;
@@ -513,13 +458,18 @@ static int do_timer_create(clockid_t which_clock, struct sigevent *event,
                return -EAGAIN;
 
        spin_lock_init(&new_timer->it_lock);
+
+       /*
+        * Add the timer to the hash table. The timer is not yet valid
+        * because new_timer::it_signal is still NULL. The timer id is also
+        * not yet visible to user space.
+        */
        new_timer_id = posix_timer_add(new_timer);
        if (new_timer_id < 0) {
-               error = new_timer_id;
-               goto out;
+               posix_timer_free(new_timer);
+               return new_timer_id;
        }
 
-       it_id_set = IT_ID_SET;
        new_timer->it_id = (timer_t) new_timer_id;
        new_timer->it_clock = which_clock;
        new_timer->kclock = kc;
@@ -547,30 +497,33 @@ static int do_timer_create(clockid_t which_clock, struct sigevent *event,
        new_timer->sigq->info.si_tid   = new_timer->it_id;
        new_timer->sigq->info.si_code  = SI_TIMER;
 
-       if (copy_to_user(created_timer_id,
-                        &new_timer_id, sizeof (new_timer_id))) {
+       if (copy_to_user(created_timer_id, &new_timer_id, sizeof (new_timer_id))) {
                error = -EFAULT;
                goto out;
        }
-
+       /*
+        * After successful copy out, the timer ID is visible to user space
+        * now but not yet valid because new_timer::it_signal is still NULL.
+        *
+        * Complete the initialization with the clock specific create
+        * callback.
+        */
        error = kc->timer_create(new_timer);
        if (error)
                goto out;
 
        spin_lock_irq(&current->sighand->siglock);
-       new_timer->it_signal = current->signal;
+       /* This makes the timer valid in the hash table */
+       WRITE_ONCE(new_timer->it_signal, current->signal);
        list_add(&new_timer->list, &current->signal->posix_timers);
        spin_unlock_irq(&current->sighand->siglock);
-
-       return 0;
        /*
-        * In the case of the timer belonging to another task, after
-        * the task is unlocked, the timer is owned by the other task
-        * and may cease to exist at any time.  Don't use or modify
-        * new_timer after the unlock call.
+        * After unlocking sighand::siglock @new_timer is subject to
+        * concurrent removal and cannot be touched anymore
         */
+       return 0;
 out:
-       release_posix_timer(new_timer, it_id_set);
+       posix_timer_unhash_and_free(new_timer);
        return error;
 }
 
@@ -604,13 +557,6 @@ COMPAT_SYSCALL_DEFINE3(timer_create, clockid_t, which_clock,
 }
 #endif
 
-/*
- * Locking issues: We need to protect the result of the id look up until
- * we get the timer locked down so it is not deleted under us.  The
- * removal is done under the idr spinlock so we use that here to bridge
- * the find to the timer lock.  To avoid a dead lock, the timer id MUST
- * be release with out holding the timer lock.
- */
 static struct k_itimer *__lock_timer(timer_t timer_id, unsigned long *flags)
 {
        struct k_itimer *timr;
@@ -622,10 +568,35 @@ static struct k_itimer *__lock_timer(timer_t timer_id, unsigned long *flags)
        if ((unsigned long long)timer_id > INT_MAX)
                return NULL;
 
+       /*
+        * The hash lookup and the timers are RCU protected.
+        *
+        * Timers are added to the hash in invalid state where
+        * timr::it_signal == NULL. timer::it_signal is only set after the
+        * rest of the initialization succeeded.
+        *
+        * Timer destruction happens in steps:
+        *  1) Set timr::it_signal to NULL with timr::it_lock held
+        *  2) Release timr::it_lock
+        *  3) Remove from the hash under hash_lock
+        *  4) Call RCU for removal after the grace period
+        *
+        * Holding rcu_read_lock() across the lookup ensures that
+        * the timer cannot be freed.
+        *
+        * The lookup validates locklessly that timr::it_signal ==
+        * current::signal and timr::it_id == @timer_id. timr::it_id
+        * can't change, but timr::it_signal becomes NULL during
+        * destruction.
+        */
        rcu_read_lock();
        timr = posix_timer_by_id(timer_id);
        if (timr) {
                spin_lock_irqsave(&timr->it_lock, *flags);
+               /*
+                * Validate under timr::it_lock that timr::it_signal is
+                * still valid. Pairs with #1 above.
+                */
                if (timr->it_signal == current->signal) {
                        rcu_read_unlock();
                        return timr;
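
The lookup/validate scheme described in the new comment and mirrored by the code above is a common RCU pattern: the read-side critical section only guarantees that the object is not freed, while liveness has to be rechecked under the per-object lock. A generic sketch with made-up names (struct obj, lookup_and_lock(); not part of this patch):

#include <linux/rcupdate.h>
#include <linux/spinlock.h>

struct obj {
        spinlock_t lock;
        void *owner;    /* cleared under 'lock' when the object is destroyed */
};

static struct obj *lookup_and_lock(struct obj *(*find)(int id), int id,
                                   void *owner, unsigned long *flags)
{
        struct obj *o;

        rcu_read_lock();
        o = find(id);           /* lockless lookup, may race with removal */
        if (o) {
                spin_lock_irqsave(&o->lock, *flags);
                if (o->owner == owner) {
                        /* Still alive: the lock pins it, RCU no longer needed */
                        rcu_read_unlock();
                        return o;
                }
                spin_unlock_irqrestore(&o->lock, *flags);
        }
        rcu_read_unlock();
        return NULL;
}
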
@@ -652,20 +623,16 @@ static s64 common_hrtimer_forward(struct k_itimer *timr, ktime_t now)
 }
 
 /*
- * Get the time remaining on a POSIX.1b interval timer.  This function
- * is ALWAYS called with spin_lock_irq on the timer, thus it must not
- * mess with irq.
+ * Get the time remaining on a POSIX.1b interval timer.
  *
- * We have a couple of messes to clean up here.  First there is the case
- * of a timer that has a requeue pending.  These timers should appear to
- * be in the timer list with an expiry as if we were to requeue them
- * now.
+ * Two issues to handle here:
  *
- * The second issue is the SIGEV_NONE timer which may be active but is
- * not really ever put in the timer list (to save system resources).
- * This timer may be expired, and if so, we will do it here.  Otherwise
- * it is the same as a requeue pending timer WRT to what we should
- * report.
+ *  1) The timer has a requeue pending. The return value must appear as
+ *     if the timer has been requeued right now.
+ *
+ *  2) The timer is a SIGEV_NONE timer. These timers are never enqueued
+ *     into the hrtimer queue and therefore never expired. Emulate expiry
+ *     here taking #1 into account.
  */
 void common_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting)
 {
@@ -681,8 +648,12 @@ void common_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting)
                cur_setting->it_interval = ktime_to_timespec64(iv);
        } else if (!timr->it_active) {
                /*
-                * SIGEV_NONE oneshot timers are never queued. Check them
-                * below.
+                * SIGEV_NONE oneshot timers are never queued and therefore
+                * timr->it_active is always false. The check below
+                * vs. remaining time will handle this case.
+                *
+                * For all other timers there is nothing to update here, so
+                * return.
                 */
                if (!sig_none)
                        return;
@@ -691,18 +662,29 @@ void common_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting)
        now = kc->clock_get_ktime(timr->it_clock);
 
        /*
-        * When a requeue is pending or this is a SIGEV_NONE timer move the
-        * expiry time forward by intervals, so expiry is > now.
+        * If this is an interval timer and either has requeue pending or
+        * is a SIGEV_NONE timer move the expiry time forward by intervals,
+        * so expiry is > now.
         */
        if (iv && (timr->it_requeue_pending & REQUEUE_PENDING || sig_none))
                timr->it_overrun += kc->timer_forward(timr, now);
 
        remaining = kc->timer_remaining(timr, now);
-       /* Return 0 only, when the timer is expired and not pending */
+       /*
+        * As @now is retrieved before a possible timer_forward() and
+        * cannot be reevaluated by the compiler @remaining is based on the
+        * same @now value. Therefore @remaining is consistent vs. @now.
+        *
+        * Consequently all interval timers, i.e. @iv > 0, cannot have a
+        * remaining time <= 0 because timer_forward() guarantees to move
+        * them forward so that the next timer expiry is > @now.
+        */
        if (remaining <= 0) {
                /*
-                * A single shot SIGEV_NONE timer must return 0, when
-                * it is expired !
+                * A single shot SIGEV_NONE timer must return 0, when it is
+                * expired! Timers which have a real signal delivery mode
+                * must return a remaining time greater than 0 because the
+                * signal has not yet been delivered.
                 */
                if (!sig_none)
                        cur_setting->it_value.tv_nsec = 1;
@@ -711,11 +693,10 @@ void common_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting)
        }
 }
 
-/* Get the time remaining on a POSIX.1b interval timer. */
 static int do_timer_gettime(timer_t timer_id,  struct itimerspec64 *setting)
 {
-       struct k_itimer *timr;
        const struct k_clock *kc;
+       struct k_itimer *timr;
        unsigned long flags;
        int ret = 0;
 
@@ -765,20 +746,29 @@ SYSCALL_DEFINE2(timer_gettime32, timer_t, timer_id,
 
 #endif
 
-/*
- * Get the number of overruns of a POSIX.1b interval timer.  This is to
- * be the overrun of the timer last delivered.  At the same time we are
- * accumulating overruns on the next timer.  The overrun is frozen when
- * the signal is delivered, either at the notify time (if the info block
- * is not queued) or at the actual delivery time (as we are informed by
- * the call back to posixtimer_rearm().  So all we need to do is
- * to pick up the frozen overrun.
+/**
+ * sys_timer_getoverrun - Get the number of overruns of a POSIX.1b interval timer
+ * @timer_id:  The timer ID which identifies the timer
+ *
+ * The "overrun count" of a timer is one plus the number of expiration
+ * intervals which have elapsed between the first expiry, which queues the
+ * signal, and the actual signal delivery. On signal delivery the "overrun
+ * count" is calculated and cached, so it can be returned directly here.
+ *
+ * As this is relative to the last queued signal the returned overrun count
+ * is meaningless outside of the signal delivery path and even there it
+ * does not accurately reflect the current state when user space evaluates
+ * it.
+ *
+ * Returns:
+ *     -EINVAL         @timer_id is invalid
+ *     1..INT_MAX      The number of overruns related to the last delivered signal
  */
 SYSCALL_DEFINE1(timer_getoverrun, timer_t, timer_id)
 {
        struct k_itimer *timr;
-       int overrun;
        unsigned long flags;
+       int overrun;
 
        timr = lock_timer(timer_id, &flags);
        if (!timr)
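
A minimal user-space sketch of how the cached overrun count can be read back (error handling omitted; sleeping 50ms against a 1ms interval should report an overrun in the vicinity of 50):

#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
        struct sigevent sev = { .sigev_notify = SIGEV_SIGNAL,
                                .sigev_signo  = SIGRTMIN };
        struct itimerspec its = {
                .it_value    = { .tv_nsec = 1000000 },  /* 1ms */
                .it_interval = { .tv_nsec = 1000000 },
        };
        sigset_t set;
        timer_t id;

        sigemptyset(&set);
        sigaddset(&set, SIGRTMIN);
        sigprocmask(SIG_BLOCK, &set, NULL);

        timer_create(CLOCK_MONOTONIC, &sev, &id);
        timer_settime(id, 0, &its, NULL);

        usleep(50000);                  /* let ~50 intervals elapse */
        sigwaitinfo(&set, NULL);        /* deliver the queued signal */
        printf("overruns: %d\n", timer_getoverrun(id));
        return 0;
}
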
@@ -831,10 +821,18 @@ static void common_timer_wait_running(struct k_itimer *timer)
 }
 
 /*
- * On PREEMPT_RT this prevent priority inversion against softirq kthread in
- * case it gets preempted while executing a timer callback. See comments in
- * hrtimer_cancel_wait_running. For PREEMPT_RT=n this just results in a
- * cpu_relax().
+ * On PREEMPT_RT this prevents priority inversion and a potential livelock
+ * against the ksoftirqd thread in case that ksoftirqd gets preempted while
+ * executing a hrtimer callback.
+ *
+ * See the comments in hrtimer_cancel_wait_running(). For PREEMPT_RT=n this
+ * just results in a cpu_relax().
+ *
+ * For POSIX CPU timers with CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n this is
+ * just a cpu_relax(). With CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y this
+ * prevents spinning on an eventually scheduled out task and a livelock
+ * when the task which tries to delete or disarm the timer has preempted
+ * the task which runs the expiry in task work context.
  */
 static struct k_itimer *timer_wait_running(struct k_itimer *timer,
                                           unsigned long *flags)
@@ -943,8 +941,7 @@ SYSCALL_DEFINE4(timer_settime, timer_t, timer_id, int, flags,
                const struct __kernel_itimerspec __user *, new_setting,
                struct __kernel_itimerspec __user *, old_setting)
 {
-       struct itimerspec64 new_spec, old_spec;
-       struct itimerspec64 *rtn = old_setting ? &old_spec : NULL;
+       struct itimerspec64 new_spec, old_spec, *rtn;
        int error = 0;
 
        if (!new_setting)
@@ -953,6 +950,7 @@ SYSCALL_DEFINE4(timer_settime, timer_t, timer_id, int, flags,
        if (get_itimerspec64(&new_spec, new_setting))
                return -EFAULT;
 
+       rtn = old_setting ? &old_spec : NULL;
        error = do_timer_settime(timer_id, flags, &new_spec, rtn);
        if (!error && old_setting) {
                if (put_itimerspec64(&old_spec, old_setting))
@@ -1026,38 +1024,71 @@ retry_delete:
        list_del(&timer->list);
        spin_unlock(&current->sighand->siglock);
        /*
-        * This keeps any tasks waiting on the spin lock from thinking
-        * they got something (see the lock code above).
+        * A concurrent lookup could check timer::it_signal lockless. It
+        * will reevaluate with timer::it_lock held and observe the NULL.
         */
-       timer->it_signal = NULL;
+       WRITE_ONCE(timer->it_signal, NULL);
 
        unlock_timer(timer, flags);
-       release_posix_timer(timer, IT_ID_SET);
+       posix_timer_unhash_and_free(timer);
        return 0;
 }
 
 /*
- * return timer owned by the process, used by exit_itimers
+ * Delete a timer if it is armed, remove it from the hash and schedule it
+ * for RCU freeing.
  */
 static void itimer_delete(struct k_itimer *timer)
 {
-retry_delete:
-       spin_lock_irq(&timer->it_lock);
+       unsigned long flags;
 
+       /*
+        * irqsave is required to make timer_wait_running() work.
+        */
+       spin_lock_irqsave(&timer->it_lock, flags);
+
+retry_delete:
+       /*
+        * Even if the timer is no longer accessible from other tasks
+        * it still might be armed and queued in the underlying timer
+        * mechanism. Worse, that timer mechanism might run the expiry
+        * function concurrently.
+        */
        if (timer_delete_hook(timer) == TIMER_RETRY) {
-               spin_unlock_irq(&timer->it_lock);
+               /*
+                * Timer is expired concurrently, prevent livelocks
+                * and pointless spinning on RT.
+                *
+                * timer_wait_running() drops timer::it_lock, which opens
+                * the possibility for another task to delete the timer.
+                *
+                * That's not possible here because this is invoked from
+                * do_exit() only for the last thread of the thread group.
+                * So no other task can access and delete that timer.
+                */
+               if (WARN_ON_ONCE(timer_wait_running(timer, &flags) != timer))
+                       return;
+
                goto retry_delete;
        }
        list_del(&timer->list);
 
-       spin_unlock_irq(&timer->it_lock);
-       release_posix_timer(timer, IT_ID_SET);
+       /*
+        * Setting timer::it_signal to NULL is technically not required
+        * here as nothing can access the timer anymore legitimately via
+        * the hash table. Set it to NULL nevertheless so that all deletion
+        * paths are consistent.
+        */
+       WRITE_ONCE(timer->it_signal, NULL);
+
+       spin_unlock_irqrestore(&timer->it_lock, flags);
+       posix_timer_unhash_and_free(timer);
 }
 
 /*
- * This is called by do_exit or de_thread, only when nobody else can
- * modify the signal->posix_timers list. Yet we need sighand->siglock
- * to prevent the race with /proc/pid/timers.
+ * Invoked from do_exit() when the last thread of a thread group exits.
+ * At that point no other task can access the timers of the dying
+ * task anymore.
  */
 void exit_itimers(struct task_struct *tsk)
 {
@@ -1067,10 +1098,12 @@ void exit_itimers(struct task_struct *tsk)
        if (list_empty(&tsk->signal->posix_timers))
                return;
 
+       /* Protect against concurrent read via /proc/$PID/timers */
        spin_lock_irq(&tsk->sighand->siglock);
        list_replace_init(&tsk->signal->posix_timers, &timers);
        spin_unlock_irq(&tsk->sighand->siglock);
 
+       /* The timers are no longer accessible via tsk::signal */
        while (!list_empty(&timers)) {
                tmr = list_first_entry(&timers, struct k_itimer, list);
                itimer_delete(tmr);
@@ -1089,6 +1122,10 @@ SYSCALL_DEFINE2(clock_settime, const clockid_t, which_clock,
        if (get_timespec64(&new_tp, tp))
                return -EFAULT;
 
+       /*
+        * Permission checks have to be done inside the clock specific
+        * setter callback.
+        */
        return kc->clock_set(which_clock, &new_tp);
 }
 
@@ -1139,6 +1176,79 @@ SYSCALL_DEFINE2(clock_adjtime, const clockid_t, which_clock,
        return err;
 }
 
+/**
+ * sys_clock_getres - Get the resolution of a clock
+ * @which_clock:       The clock to get the resolution for
+ * @tp:                        Pointer to a user space timespec64 for storage
+ *
+ * POSIX defines:
+ *
+ * "The clock_getres() function shall return the resolution of any
+ * clock. Clock resolutions are implementation-defined and cannot be set by
+ * a process. If the argument res is not NULL, the resolution of the
+ * specified clock shall be stored in the location pointed to by res. If
+ * res is NULL, the clock resolution is not returned. If the time argument
+ * of clock_settime() is not a multiple of res, then the value is truncated
+ * to a multiple of res."
+ *
+ * Due to the various hardware constraints the real resolution can vary
+ * wildly and even change during runtime when the underlying devices are
+ * replaced. The kernel also can use hardware devices with different
+ * resolutions for reading the time and for arming timers.
+ *
+ * The kernel therefore deviates from the POSIX spec in various aspects:
+ *
+ * 1) The resolution returned to user space
+ *
+ *    For CLOCK_REALTIME, CLOCK_MONOTONIC, CLOCK_BOOTTIME, CLOCK_TAI,
+ *    CLOCK_REALTIME_ALARM, CLOCK_BOOTTIME_ALARM and CLOCK_MONOTONIC_RAW
+ *    the kernel differentiates only two cases:
+ *
+ *    I)  Low resolution mode:
+ *
+ *       When high resolution timers are disabled at compile or runtime
+ *       the resolution returned is nanoseconds per tick, which represents
+ *       the precision at which timers expire.
+ *
+ *    II) High resolution mode:
+ *
+ *       When high resolution timers are enabled the resolution returned
+ *       is always one nanosecond independent of the actual resolution of
+ *       the underlying hardware devices.
+ *
+ *       For CLOCK_*_ALARM the actual resolution depends on system
+ *       state. When system is running the resolution is the same as the
+ *       resolution of the other clocks. During suspend the actual
+ *       resolution is the resolution of the underlying RTC device which
+ *       might be way less precise than the clockevent device used during
+ *       running state.
+ *
+ *   For CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE the resolution
+ *   returned is always nanoseconds per tick.
+ *
+ *   For CLOCK_PROCESS_CPUTIME and CLOCK_THREAD_CPUTIME the resolution
+ *   returned is always one nanosecond under the assumption that the
+ *   underlying scheduler clock has a better resolution than nanoseconds
+ *   per tick.
+ *
+ *   For dynamic POSIX clocks (PTP devices) the resolution returned is
+ *   always one nanosecond.
+ *
+ * 2) Effect on sys_clock_settime()
+ *
+ *    The kernel does not truncate the time which is handed in to
+ *    sys_clock_settime(). The kernel internal timekeeping is always using
+ *    nanoseconds precision independent of the clocksource device which is
+ *    used to read the time from. The resolution of that device only
+ *    used to read the time from. The resolution of that device only
+ *    affects the precision of the time returned by sys_clock_gettime().
+ *
+ * Returns:
+ *     0               Success. @tp contains the resolution
+ *     -EINVAL         @which_clock is not a valid clock ID
+ *     -EFAULT         Copying the resolution to @tp faulted
+ *     -ENODEV         Dynamic POSIX clock is not backed by a device
+ *     -EOPNOTSUPP     Dynamic POSIX clock does not support getres()
+ */
 SYSCALL_DEFINE2(clock_getres, const clockid_t, which_clock,
                struct __kernel_timespec __user *, tp)
 {
@@ -1230,7 +1340,7 @@ SYSCALL_DEFINE2(clock_getres_time32, clockid_t, which_clock,
 #endif
 
 /*
- * nanosleep for monotonic and realtime clocks
+ * sys_clock_nanosleep() for CLOCK_REALTIME and CLOCK_TAI
  */
 static int common_nsleep(const clockid_t which_clock, int flags,
                         const struct timespec64 *rqtp)
@@ -1242,8 +1352,13 @@ static int common_nsleep(const clockid_t which_clock, int flags,
                                 which_clock);
 }
 
+/*
+ * sys_clock_nanosleep() for CLOCK_MONOTONIC and CLOCK_BOOTTIME
+ *
+ * Absolute nanosleeps for these clocks are time-namespace adjusted.
+ */
 static int common_nsleep_timens(const clockid_t which_clock, int flags,
-                        const struct timespec64 *rqtp)
+                               const struct timespec64 *rqtp)
 {
        ktime_t texp = timespec64_to_ktime(*rqtp);
 
index 8464c5a..68d6c11 100644 (file)
@@ -64,7 +64,7 @@ static struct clock_data cd ____cacheline_aligned = {
        .actual_read_sched_clock = jiffy_sched_clock_read,
 };
 
-static inline u64 notrace cyc_to_ns(u64 cyc, u32 mult, u32 shift)
+static __always_inline u64 cyc_to_ns(u64 cyc, u32 mult, u32 shift)
 {
        return (cyc * mult) >> shift;
 }
@@ -77,26 +77,36 @@ notrace struct clock_read_data *sched_clock_read_begin(unsigned int *seq)
 
 notrace int sched_clock_read_retry(unsigned int seq)
 {
-       return read_seqcount_latch_retry(&cd.seq, seq);
+       return raw_read_seqcount_latch_retry(&cd.seq, seq);
 }
 
-unsigned long long notrace sched_clock(void)
+unsigned long long noinstr sched_clock_noinstr(void)
 {
-       u64 cyc, res;
-       unsigned int seq;
        struct clock_read_data *rd;
+       unsigned int seq;
+       u64 cyc, res;
 
        do {
-               rd = sched_clock_read_begin(&seq);
+               seq = raw_read_seqcount_latch(&cd.seq);
+               rd = cd.read_data + (seq & 1);
 
                cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
                      rd->sched_clock_mask;
                res = rd->epoch_ns + cyc_to_ns(cyc, rd->mult, rd->shift);
-       } while (sched_clock_read_retry(seq));
+       } while (raw_read_seqcount_latch_retry(&cd.seq, seq));
 
        return res;
 }
 
+unsigned long long notrace sched_clock(void)
+{
+       unsigned long long ns;
+       preempt_disable_notrace();
+       ns = sched_clock_noinstr();
+       preempt_enable_notrace();
+       return ns;
+}
+
 /*
  * Updating the data required to read the clock.
  *
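
The read side above relies on the seqcount latch; the matching writer is not in this hunk. A minimal sketch of the latch pairing with illustrative names (struct latch, latch_update(), latch_read(); this is not the sched_clock update path itself):

#include <linux/seqlock.h>
#include <linux/types.h>

struct payload {
        u64 epoch_ns;
        u64 epoch_cyc;
};

struct latch {
        seqcount_latch_t seq;
        struct payload data[2];
};

/* Writer: bump the sequence before each copy so a reader either sees a
 * complete copy or retries.
 */
static void latch_update(struct latch *l, const struct payload *p)
{
        raw_write_seqcount_latch(&l->seq);
        l->data[0] = *p;
        raw_write_seqcount_latch(&l->seq);
        l->data[1] = *p;
}

/* Reader: same shape as sched_clock_noinstr() above. */
static u64 latch_read(struct latch *l)
{
        unsigned int seq;
        u64 ns;

        do {
                seq = raw_read_seqcount_latch(&l->seq);
                ns = l->data[seq & 1].epoch_ns;
        } while (raw_read_seqcount_latch_retry(&l->seq, seq));

        return ns;
}
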
index 93bf2b4..771d1e0 100644 (file)
@@ -35,14 +35,15 @@ static __cacheline_aligned_in_smp DEFINE_RAW_SPINLOCK(tick_broadcast_lock);
 #ifdef CONFIG_TICK_ONESHOT
 static DEFINE_PER_CPU(struct clock_event_device *, tick_oneshot_wakeup_device);
 
-static void tick_broadcast_setup_oneshot(struct clock_event_device *bc);
+static void tick_broadcast_setup_oneshot(struct clock_event_device *bc, bool from_periodic);
 static void tick_broadcast_clear_oneshot(int cpu);
 static void tick_resume_broadcast_oneshot(struct clock_event_device *bc);
 # ifdef CONFIG_HOTPLUG_CPU
 static void tick_broadcast_oneshot_offline(unsigned int cpu);
 # endif
 #else
-static inline void tick_broadcast_setup_oneshot(struct clock_event_device *bc) { BUG(); }
+static inline void
+tick_broadcast_setup_oneshot(struct clock_event_device *bc, bool from_periodic) { BUG(); }
 static inline void tick_broadcast_clear_oneshot(int cpu) { }
 static inline void tick_resume_broadcast_oneshot(struct clock_event_device *bc) { }
 # ifdef CONFIG_HOTPLUG_CPU
@@ -264,7 +265,7 @@ int tick_device_uses_broadcast(struct clock_event_device *dev, int cpu)
                if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
                        tick_broadcast_start_periodic(bc);
                else
-                       tick_broadcast_setup_oneshot(bc);
+                       tick_broadcast_setup_oneshot(bc, false);
                ret = 1;
        } else {
                /*
@@ -500,7 +501,7 @@ void tick_broadcast_control(enum tick_broadcast_mode mode)
                        if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
                                tick_broadcast_start_periodic(bc);
                        else
-                               tick_broadcast_setup_oneshot(bc);
+                               tick_broadcast_setup_oneshot(bc, false);
                }
        }
 out:
@@ -1020,48 +1021,101 @@ static inline ktime_t tick_get_next_period(void)
 /**
  * tick_broadcast_setup_oneshot - setup the broadcast device
  */
-static void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
+static void tick_broadcast_setup_oneshot(struct clock_event_device *bc,
+                                        bool from_periodic)
 {
        int cpu = smp_processor_id();
+       ktime_t nexttick = 0;
 
        if (!bc)
                return;
 
-       /* Set it up only once ! */
-       if (bc->event_handler != tick_handle_oneshot_broadcast) {
-               int was_periodic = clockevent_state_periodic(bc);
-
-               bc->event_handler = tick_handle_oneshot_broadcast;
-
+       /*
+        * When the broadcast device was switched to oneshot by the first
+        * CPU handling the NOHZ change, the other CPUs will reach this
+        * code via hrtimer_run_queues() -> tick_check_oneshot_change()
+        * too. Set up the broadcast device only once!
+        */
+       if (bc->event_handler == tick_handle_oneshot_broadcast) {
                /*
-                * We must be careful here. There might be other CPUs
-                * waiting for periodic broadcast. We need to set the
-                * oneshot_mask bits for those and program the
-                * broadcast device to fire.
+                * The CPU which switched from periodic to oneshot mode
+                * set the broadcast oneshot bit for all other CPUs which
+                * are in the general (periodic) broadcast mask to ensure
+                * that CPUs which wait for the periodic broadcast are
+                * woken up.
+                *
+                * Clear the bit for the local CPU as the set bit would
+                * prevent the first tick_broadcast_enter() after this CPU
+                * switched to oneshot state to program the broadcast
+                * device.
+                *
+                * This code can also be reached via tick_broadcast_control(),
+                * but this cannot avoid the tick_broadcast_clear_oneshot()
+                * as that would break the periodic to oneshot transition of
+                * secondary CPUs. But that's harmless as the below only
+                * clears already cleared bits.
                 */
+               tick_broadcast_clear_oneshot(cpu);
+               return;
+       }
+
+
+       bc->event_handler = tick_handle_oneshot_broadcast;
+       bc->next_event = KTIME_MAX;
+
+       /*
+        * When the tick mode is switched from periodic to oneshot it must
+        * be ensured that CPUs which are waiting for periodic broadcast
+        * get their wake-up at the next tick.  This is achieved by ORing
+        * tick_broadcast_mask into tick_broadcast_oneshot_mask.
+        *
+        * For other callers, e.g. broadcast device replacement,
+        * tick_broadcast_oneshot_mask must not be touched as this would
+        * set bits for CPUs which are already NOHZ, but not idle. Their
+        * next tick_broadcast_enter() would observe the bit set and fail
+        * to update the expiry time and the broadcast event device.
+        */
+       if (from_periodic) {
                cpumask_copy(tmpmask, tick_broadcast_mask);
+               /* Remove the local CPU as it is obviously not idle */
                cpumask_clear_cpu(cpu, tmpmask);
-               cpumask_or(tick_broadcast_oneshot_mask,
-                          tick_broadcast_oneshot_mask, tmpmask);
+               cpumask_or(tick_broadcast_oneshot_mask, tick_broadcast_oneshot_mask, tmpmask);
 
-               if (was_periodic && !cpumask_empty(tmpmask)) {
-                       ktime_t nextevt = tick_get_next_period();
+               /*
+                * Ensure that the oneshot broadcast handler will wake the
+                * CPUs which are still waiting for periodic broadcast.
+                */
+               nexttick = tick_get_next_period();
+               tick_broadcast_init_next_event(tmpmask, nexttick);
 
-                       clockevents_switch_state(bc, CLOCK_EVT_STATE_ONESHOT);
-                       tick_broadcast_init_next_event(tmpmask, nextevt);
-                       tick_broadcast_set_event(bc, cpu, nextevt);
-               } else
-                       bc->next_event = KTIME_MAX;
-       } else {
                /*
-                * The first cpu which switches to oneshot mode sets
-                * the bit for all other cpus which are in the general
-                * (periodic) broadcast mask. So the bit is set and
-                * would prevent the first broadcast enter after this
-                * to program the bc device.
+                * If the underlying broadcast clock event device is
+                * already in oneshot state, then there is nothing to do.
+                * The device was already armed for the next tick
+                * in tick_handle_periodic_broadcast()
                 */
-               tick_broadcast_clear_oneshot(cpu);
+               if (clockevent_state_oneshot(bc))
+                       return;
        }
+
+       /*
+        * When switching from periodic to oneshot mode arm the broadcast
+        * device for the next tick.
+        *
+        * If the broadcast device has been replaced in oneshot mode and
+        * the oneshot broadcast mask is not empty, then arm it to expire
+        * immediately in order to reevaluate the next expiring timer.
+        * @nexttick is 0 and therefore in the past which will cause the
+        * clockevent code to force an event.
+        *
+        * For both cases the programming can be avoided when the oneshot
+        * broadcast mask is empty.
+        *
+        * tick_broadcast_set_event() implicitly switches the broadcast
+        * device to oneshot state.
+        */
+       if (!cpumask_empty(tick_broadcast_oneshot_mask))
+               tick_broadcast_set_event(bc, cpu, nexttick);
 }
 
 /*
@@ -1070,14 +1124,16 @@ static void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 void tick_broadcast_switch_to_oneshot(void)
 {
        struct clock_event_device *bc;
+       enum tick_device_mode oldmode;
        unsigned long flags;
 
        raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
 
+       oldmode = tick_broadcast_device.mode;
        tick_broadcast_device.mode = TICKDEV_MODE_ONESHOT;
        bc = tick_broadcast_device.evtdev;
        if (bc)
-               tick_broadcast_setup_oneshot(bc);
+               tick_broadcast_setup_oneshot(bc, oldmode == TICKDEV_MODE_PERIODIC);
 
        raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 }
index 65b8658..e9138cd 100644 (file)
@@ -218,19 +218,8 @@ static void tick_setup_device(struct tick_device *td,
                 * this cpu:
                 */
                if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
-                       ktime_t next_p;
-                       u32 rem;
-
                        tick_do_timer_cpu = cpu;
-
-                       next_p = ktime_get();
-                       div_u64_rem(next_p, TICK_NSEC, &rem);
-                       if (rem) {
-                               next_p -= rem;
-                               next_p += TICK_NSEC;
-                       }
-
-                       tick_next_period = next_p;
+                       tick_next_period = ktime_get();
 #ifdef CONFIG_NO_HZ_FULL
                        /*
                         * The boot CPU may be nohz_full, in which case set
index 5225467..4df14db 100644 (file)
@@ -161,8 +161,19 @@ static ktime_t tick_init_jiffy_update(void)
        raw_spin_lock(&jiffies_lock);
        write_seqcount_begin(&jiffies_seq);
        /* Did we start the jiffies update yet ? */
-       if (last_jiffies_update == 0)
+       if (last_jiffies_update == 0) {
+               u32 rem;
+
+               /*
+                * Ensure that the tick is aligned to a multiple of
+                * TICK_NSEC.
+                */
+               div_u64_rem(tick_next_period, TICK_NSEC, &rem);
+               if (rem)
+                       tick_next_period += TICK_NSEC - rem;
+
                last_jiffies_update = tick_next_period;
+       }
        period = last_jiffies_update;
        write_seqcount_end(&jiffies_seq);
        raw_spin_unlock(&jiffies_lock);
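
Worked example of the alignment done above, assuming HZ=250 so TICK_NSEC is 4,000,000 ns; the same arithmetic as a plain user-space sketch:

#include <stdio.h>

int main(void)
{
        unsigned long long tick_nsec = 4000000ULL;      /* HZ=250 assumed */
        unsigned long long next = 12345678ULL;          /* hypothetical ktime */
        unsigned long long rem = next % tick_nsec;

        if (rem)
                next += tick_nsec - rem;
        printf("%llu\n", next);         /* 16000000, a multiple of TICK_NSEC */
        return 0;
}
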
@@ -1030,7 +1041,7 @@ static bool report_idle_softirq(void)
                        return false;
        }
 
-       if (ratelimit < 10)
+       if (ratelimit >= 10)
                return false;
 
        /* On RT, softirqs handling may be waiting on some lock */
index 09d5949..266d028 100644 (file)
@@ -450,7 +450,7 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
                tkr = tkf->base + (seq & 0x01);
                now = ktime_to_ns(tkr->base);
                now += fast_tk_get_delta_ns(tkr);
-       } while (read_seqcount_latch_retry(&tkf->seq, seq));
+       } while (raw_read_seqcount_latch_retry(&tkf->seq, seq));
 
        return now;
 }
@@ -566,7 +566,7 @@ static __always_inline u64 __ktime_get_real_fast(struct tk_fast *tkf, u64 *mono)
                basem = ktime_to_ns(tkr->base);
                baser = ktime_to_ns(tkr->base_real);
                delta = fast_tk_get_delta_ns(tkr);
-       } while (read_seqcount_latch_retry(&tkf->seq, seq));
+       } while (raw_read_seqcount_latch_retry(&tkf->seq, seq));
 
        if (mono)
                *mono = basem + delta;
index 9a050e3..1f4b07d 100644 (file)
@@ -900,13 +900,23 @@ static const struct bpf_func_proto bpf_send_signal_thread_proto = {
 
 BPF_CALL_3(bpf_d_path, struct path *, path, char *, buf, u32, sz)
 {
+       struct path copy;
        long len;
        char *p;
 
        if (!sz)
                return 0;
 
-       p = d_path(path, buf, sz);
+       /*
+        * The path pointer is verified as trusted and safe to use,
+        * but let's double check it's valid anyway to work around
+        * a potentially broken verifier.
+        */
+       len = copy_from_kernel_nofault(&copy, path, sizeof(*path));
+       if (len < 0)
+               return len;
+
+       p = d_path(&copy, buf, sz);
        if (IS_ERR(p)) {
                len = PTR_ERR(p);
        } else {
index 9abb390..18d3684 100644 (file)
 struct fprobe_rethook_node {
        struct rethook_node node;
        unsigned long entry_ip;
+       unsigned long entry_parent_ip;
        char data[];
 };
 
-static void fprobe_handler(unsigned long ip, unsigned long parent_ip,
-                          struct ftrace_ops *ops, struct ftrace_regs *fregs)
+static inline void __fprobe_handler(unsigned long ip, unsigned long parent_ip,
+                       struct ftrace_ops *ops, struct ftrace_regs *fregs)
 {
        struct fprobe_rethook_node *fpr;
        struct rethook_node *rh = NULL;
        struct fprobe *fp;
        void *entry_data = NULL;
-       int bit, ret;
+       int ret = 0;
 
        fp = container_of(ops, struct fprobe, ops);
-       if (fprobe_disabled(fp))
-               return;
-
-       bit = ftrace_test_recursion_trylock(ip, parent_ip);
-       if (bit < 0) {
-               fp->nmissed++;
-               return;
-       }
 
        if (fp->exit_handler) {
                rh = rethook_try_get(fp->rethook);
                if (!rh) {
                        fp->nmissed++;
-                       goto out;
+                       return;
                }
                fpr = container_of(rh, struct fprobe_rethook_node, node);
                fpr->entry_ip = ip;
+               fpr->entry_parent_ip = parent_ip;
                if (fp->entry_data_size)
                        entry_data = fpr->data;
        }
@@ -61,23 +55,60 @@ static void fprobe_handler(unsigned long ip, unsigned long parent_ip,
                else
                        rethook_hook(rh, ftrace_get_regs(fregs), true);
        }
-out:
+}
+
+static void fprobe_handler(unsigned long ip, unsigned long parent_ip,
+               struct ftrace_ops *ops, struct ftrace_regs *fregs)
+{
+       struct fprobe *fp;
+       int bit;
+
+       fp = container_of(ops, struct fprobe, ops);
+       if (fprobe_disabled(fp))
+               return;
+
+       /* recursion detection has to go before any traceable function and
+        * all functions before this point should be marked as notrace
+        */
+       bit = ftrace_test_recursion_trylock(ip, parent_ip);
+       if (bit < 0) {
+               fp->nmissed++;
+               return;
+       }
+       __fprobe_handler(ip, parent_ip, ops, fregs);
        ftrace_test_recursion_unlock(bit);
+
 }
 NOKPROBE_SYMBOL(fprobe_handler);
 
 static void fprobe_kprobe_handler(unsigned long ip, unsigned long parent_ip,
                                  struct ftrace_ops *ops, struct ftrace_regs *fregs)
 {
-       struct fprobe *fp = container_of(ops, struct fprobe, ops);
+       struct fprobe *fp;
+       int bit;
+
+       fp = container_of(ops, struct fprobe, ops);
+       if (fprobe_disabled(fp))
+               return;
+
+       /* recursion detection has to go before any traceable function and
+        * all functions called before this point should be marked as notrace
+        */
+       bit = ftrace_test_recursion_trylock(ip, parent_ip);
+       if (bit < 0) {
+               fp->nmissed++;
+               return;
+       }
 
        if (unlikely(kprobe_running())) {
                fp->nmissed++;
                return;
        }
+
        kprobe_busy_begin();
-       fprobe_handler(ip, parent_ip, ops, fregs);
+       __fprobe_handler(ip, parent_ip, ops, fregs);
        kprobe_busy_end();
+       ftrace_test_recursion_unlock(bit);
 }
 
 static void fprobe_exit_handler(struct rethook_node *rh, void *data,
@@ -85,14 +116,26 @@ static void fprobe_exit_handler(struct rethook_node *rh, void *data,
 {
        struct fprobe *fp = (struct fprobe *)data;
        struct fprobe_rethook_node *fpr;
+       int bit;
 
        if (!fp || fprobe_disabled(fp))
                return;
 
        fpr = container_of(rh, struct fprobe_rethook_node, node);
 
+       /*
+        * We need to ensure there are no calls to traceable functions between
+        * the end of fprobe_handler and the beginning of fprobe_exit_handler.
+        */
+       bit = ftrace_test_recursion_trylock(fpr->entry_ip, fpr->entry_parent_ip);
+       if (bit < 0) {
+               fp->nmissed++;
+               return;
+       }
+
        fp->exit_handler(fp, fpr->entry_ip, regs,
                         fp->entry_data_size ? (void *)fpr->data : NULL);
+       ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(fprobe_exit_handler);
 
index 32c3dfd..60f6cb2 100644 (file)
@@ -288,7 +288,7 @@ unsigned long rethook_trampoline_handler(struct pt_regs *regs,
         * These loops must be protected from rethook_free_rcu() because those
         * are accessing 'rhn->rethook'.
         */
-       preempt_disable();
+       preempt_disable_notrace();
 
        /*
         * Run the handler on the shadow stack. Do not unlink the list here because
@@ -321,7 +321,7 @@ unsigned long rethook_trampoline_handler(struct pt_regs *regs,
                first = first->next;
                rethook_recycle(rhn);
        }
-       preempt_enable();
+       preempt_enable_notrace();
 
        return correct_ret_addr;
 }
index ebc5978..5d2c567 100644 (file)
@@ -60,6 +60,7 @@
  */
 bool ring_buffer_expanded;
 
+#ifdef CONFIG_FTRACE_STARTUP_TEST
 /*
  * We need to change this state when a selftest is running.
  * A selftest will lurk into the ring-buffer to count the
@@ -75,7 +76,6 @@ static bool __read_mostly tracing_selftest_running;
  */
 bool __read_mostly tracing_selftest_disabled;
 
-#ifdef CONFIG_FTRACE_STARTUP_TEST
 void __init disable_tracing_selftest(const char *reason)
 {
        if (!tracing_selftest_disabled) {
@@ -83,6 +83,9 @@ void __init disable_tracing_selftest(const char *reason)
                pr_info("Ftrace startup test is disabled due to %s\n", reason);
        }
 }
+#else
+#define tracing_selftest_running       0
+#define tracing_selftest_disabled      0
 #endif
 
 /* Pipe tracepoints to printk */
@@ -1051,7 +1054,10 @@ int __trace_array_puts(struct trace_array *tr, unsigned long ip,
        if (!(tr->trace_flags & TRACE_ITER_PRINTK))
                return 0;
 
-       if (unlikely(tracing_selftest_running || tracing_disabled))
+       if (unlikely(tracing_selftest_running && tr == &global_trace))
+               return 0;
+
+       if (unlikely(tracing_disabled))
                return 0;
 
        alloc = sizeof(*entry) + size + 2; /* possible \n added */
@@ -2041,6 +2047,24 @@ static int run_tracer_selftest(struct tracer *type)
        return 0;
 }
 
+static int do_run_tracer_selftest(struct tracer *type)
+{
+       int ret;
+
+       /*
+        * Tests can take a long time, especially if they are run one after the
+        * other, as does happen during bootup when all the tracers are
+        * registered. This could cause the soft lockup watchdog to trigger.
+        */
+       cond_resched();
+
+       tracing_selftest_running = true;
+       ret = run_tracer_selftest(type);
+       tracing_selftest_running = false;
+
+       return ret;
+}
+
 static __init int init_trace_selftests(void)
 {
        struct trace_selftests *p, *n;
@@ -2092,6 +2116,10 @@ static inline int run_tracer_selftest(struct tracer *type)
 {
        return 0;
 }
+static inline int do_run_tracer_selftest(struct tracer *type)
+{
+       return 0;
+}
 #endif /* CONFIG_FTRACE_STARTUP_TEST */
 
 static void add_tracer_options(struct trace_array *tr, struct tracer *t);
@@ -2127,8 +2155,6 @@ int __init register_tracer(struct tracer *type)
 
        mutex_lock(&trace_types_lock);
 
-       tracing_selftest_running = true;
-
        for (t = trace_types; t; t = t->next) {
                if (strcmp(type->name, t->name) == 0) {
                        /* already found */
@@ -2157,7 +2183,7 @@ int __init register_tracer(struct tracer *type)
        /* store the tracer for __set_tracer_option */
        type->flags->trace = type;
 
-       ret = run_tracer_selftest(type);
+       ret = do_run_tracer_selftest(type);
        if (ret < 0)
                goto out;
 
@@ -2166,7 +2192,6 @@ int __init register_tracer(struct tracer *type)
        add_tracer_options(&global_trace, type);
 
  out:
-       tracing_selftest_running = false;
        mutex_unlock(&trace_types_lock);
 
        if (ret || !default_bootup_tracer)
@@ -3490,7 +3515,7 @@ __trace_array_vprintk(struct trace_buffer *buffer,
        unsigned int trace_ctx;
        char *tbuffer;
 
-       if (tracing_disabled || tracing_selftest_running)
+       if (tracing_disabled)
                return 0;
 
        /* Don't pollute graph traces with trace_vprintk internals */
@@ -3538,6 +3563,9 @@ __printf(3, 0)
 int trace_array_vprintk(struct trace_array *tr,
                        unsigned long ip, const char *fmt, va_list args)
 {
+       if (tracing_selftest_running && tr == &global_trace)
+               return 0;
+
        return __trace_array_vprintk(tr->array_buffer.buffer, ip, fmt, args);
 }
 
@@ -5171,7 +5199,7 @@ static const struct file_operations tracing_fops = {
        .open           = tracing_open,
        .read           = seq_read,
        .read_iter      = seq_read_iter,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = copy_splice_read,
        .write          = tracing_write_stub,
        .llseek         = tracing_lseek,
        .release        = tracing_release,
@@ -5752,7 +5780,7 @@ static const char readme_msg[] =
        "\t    table using the key(s) and value(s) named, and the value of a\n"
        "\t    sum called 'hitcount' is incremented.  Keys and values\n"
        "\t    correspond to fields in the event's format description.  Keys\n"
-       "\t    can be any field, or the special string 'stacktrace'.\n"
+       "\t    can be any field, or the special string 'common_stacktrace'.\n"
        "\t    Compound keys consisting of up to two fields can be specified\n"
        "\t    by the 'keys' keyword.  Values must correspond to numeric\n"
        "\t    fields.  Sort keys consisting of up to two fields can be\n"
index 654ffa4..57e539d 100644 (file)
@@ -194,6 +194,8 @@ static int trace_define_generic_fields(void)
        __generic_field(int, common_cpu, FILTER_CPU);
        __generic_field(char *, COMM, FILTER_COMM);
        __generic_field(char *, comm, FILTER_COMM);
+       __generic_field(char *, stacktrace, FILTER_STACKTRACE);
+       __generic_field(char *, STACKTRACE, FILTER_STACKTRACE);
 
        return ret;
 }
index 486cca3..b97d3ad 100644 (file)
@@ -1364,7 +1364,7 @@ static const char *hist_field_name(struct hist_field *field,
                if (field->field)
                        field_name = field->field->name;
                else
-                       field_name = "stacktrace";
+                       field_name = "common_stacktrace";
        } else if (field->flags & HIST_FIELD_FL_HITCOUNT)
                field_name = "hitcount";
 
@@ -2367,7 +2367,7 @@ parse_field(struct hist_trigger_data *hist_data, struct trace_event_file *file,
                hist_data->enable_timestamps = true;
                if (*flags & HIST_FIELD_FL_TIMESTAMP_USECS)
                        hist_data->attrs->ts_in_usecs = true;
-       } else if (strcmp(field_name, "stacktrace") == 0) {
+       } else if (strcmp(field_name, "common_stacktrace") == 0) {
                *flags |= HIST_FIELD_FL_STACKTRACE;
        } else if (strcmp(field_name, "common_cpu") == 0)
                *flags |= HIST_FIELD_FL_CPU;
@@ -2378,11 +2378,15 @@ parse_field(struct hist_trigger_data *hist_data, struct trace_event_file *file,
                if (!field || !field->size) {
                        /*
                         * For backward compatibility, if field_name
-                        * was "cpu", then we treat this the same as
-                        * common_cpu. This also works for "CPU".
+                        * was "cpu" or "stacktrace", then we treat this
+                        * the same as common_cpu and common_stacktrace
+                        * respectively. This also works for "CPU" and
+                        * "STACKTRACE".
                         */
                        if (field && field->filter_type == FILTER_CPU) {
                                *flags |= HIST_FIELD_FL_CPU;
+                       } else if (field && field->filter_type == FILTER_STACKTRACE) {
+                               *flags |= HIST_FIELD_FL_STACKTRACE;
                        } else {
                                hist_err(tr, HIST_ERR_FIELD_NOT_FOUND,
                                         errpos(field_name));
@@ -4238,13 +4242,19 @@ static int __create_val_field(struct hist_trigger_data *hist_data,
                goto out;
        }
 
-       /* Some types cannot be a value */
-       if (hist_field->flags & (HIST_FIELD_FL_GRAPH | HIST_FIELD_FL_PERCENT |
-                                HIST_FIELD_FL_BUCKET | HIST_FIELD_FL_LOG2 |
-                                HIST_FIELD_FL_SYM | HIST_FIELD_FL_SYM_OFFSET |
-                                HIST_FIELD_FL_SYSCALL | HIST_FIELD_FL_STACKTRACE)) {
-               hist_err(file->tr, HIST_ERR_BAD_FIELD_MODIFIER, errpos(field_str));
-               ret = -EINVAL;
+       /* Some modifiers are not allowed on values and variables */
+       if (hist_field->flags & HIST_FIELD_FL_VAR) {
+               /* Variable */
+               if (hist_field->flags & (HIST_FIELD_FL_GRAPH | HIST_FIELD_FL_PERCENT |
+                                        HIST_FIELD_FL_BUCKET | HIST_FIELD_FL_LOG2))
+                       goto err;
+       } else {
+               /* Value */
+               if (hist_field->flags & (HIST_FIELD_FL_GRAPH | HIST_FIELD_FL_PERCENT |
+                                        HIST_FIELD_FL_BUCKET | HIST_FIELD_FL_LOG2 |
+                                        HIST_FIELD_FL_SYM | HIST_FIELD_FL_SYM_OFFSET |
+                                        HIST_FIELD_FL_SYSCALL | HIST_FIELD_FL_STACKTRACE))
+                       goto err;
        }
 
        hist_data->fields[val_idx] = hist_field;
@@ -4256,6 +4266,9 @@ static int __create_val_field(struct hist_trigger_data *hist_data,
                ret = -EINVAL;
  out:
        return ret;
+ err:
+       hist_err(file->tr, HIST_ERR_BAD_FIELD_MODIFIER, errpos(field_str));
+       return -EINVAL;
 }
 
 static int create_val_field(struct hist_trigger_data *hist_data,
@@ -5385,7 +5398,7 @@ static void hist_trigger_print_key(struct seq_file *m,
                        if (key_field->field)
                                seq_printf(m, "%s.stacktrace", key_field->field->name);
                        else
-                               seq_puts(m, "stacktrace:\n");
+                               seq_puts(m, "common_stacktrace:\n");
                        hist_trigger_stacktrace_print(m,
                                                      key + key_field->offset,
                                                      HIST_STACKTRACE_DEPTH);
@@ -5968,7 +5981,7 @@ static int event_hist_trigger_print(struct seq_file *m,
                        if (field->field)
                                seq_printf(m, "%s.stacktrace", field->field->name);
                        else
-                               seq_puts(m, "stacktrace");
+                               seq_puts(m, "common_stacktrace");
                } else
                        hist_field_print(m, field);
        }
index b1ecd76..8df0550 100644 (file)
 #define EVENT_STATUS_OTHER BIT(7)
 
 /*
+ * User register flags are not allowed yet; keep them here until we are
+ * ready to expose them to the user ABI.
+ */
+enum user_reg_flag {
+       /* Event will not delete upon last reference closing */
+       USER_EVENT_REG_PERSIST          = 1U << 0,
+
+       /* This value or above is currently non-ABI */
+       USER_EVENT_REG_MAX              = 1U << 1,
+};
+
+/*
  * Stores the system name, tables, and locks for a group of events. This
  * allows isolation for events by various means.
  */
@@ -85,8 +97,10 @@ struct user_event {
        struct hlist_node               node;
        struct list_head                fields;
        struct list_head                validators;
+       struct work_struct              put_work;
        refcount_t                      refcnt;
        int                             min_size;
+       int                             reg_flags;
        char                            status;
 };
 
@@ -96,12 +110,12 @@ struct user_event {
  * these to track enablement sites that are tied to an event.
  */
 struct user_event_enabler {
-       struct list_head        link;
+       struct list_head        mm_enablers_link;
        struct user_event       *event;
        unsigned long           addr;
 
        /* Track enable bit, flags, etc. Aligned for bitops. */
-       unsigned int            values;
+       unsigned long           values;
 };
 
 /* Bits 0-5 are for the bit to update upon enable/disable (0-63 allowed) */
@@ -116,7 +130,9 @@ struct user_event_enabler {
 /* Only duplicate the bit value */
 #define ENABLE_VAL_DUP_MASK ENABLE_VAL_BIT_MASK
 
-#define ENABLE_BITOPS(e) ((unsigned long *)&(e)->values)
+#define ENABLE_BITOPS(e) (&(e)->values)
+
+#define ENABLE_BIT(e) ((int)((e)->values & ENABLE_VAL_BIT_MASK))
 
 /* Used for asynchronous faulting in of pages */
 struct user_event_enabler_fault {
@@ -153,7 +169,7 @@ struct user_event_file_info {
 #define VALIDATOR_REL (1 << 1)
 
 struct user_event_validator {
-       struct list_head        link;
+       struct list_head        user_event_link;
        int                     offset;
        int                     flags;
 };
@@ -163,76 +179,151 @@ typedef void (*user_event_func_t) (struct user_event *user, struct iov_iter *i,
 
 static int user_event_parse(struct user_event_group *group, char *name,
                            char *args, char *flags,
-                           struct user_event **newuser);
+                           struct user_event **newuser, int reg_flags);
 
 static struct user_event_mm *user_event_mm_get(struct user_event_mm *mm);
 static struct user_event_mm *user_event_mm_get_all(struct user_event *user);
 static void user_event_mm_put(struct user_event_mm *mm);
+static int destroy_user_event(struct user_event *user);
 
 static u32 user_event_key(char *name)
 {
        return jhash(name, strlen(name), 0);
 }
 
-static void user_event_group_destroy(struct user_event_group *group)
+static struct user_event *user_event_get(struct user_event *user)
 {
-       kfree(group->system_name);
-       kfree(group);
+       refcount_inc(&user->refcnt);
+
+       return user;
 }
 
-static char *user_event_group_system_name(struct user_namespace *user_ns)
+static void delayed_destroy_user_event(struct work_struct *work)
 {
-       char *system_name;
-       int len = sizeof(USER_EVENTS_SYSTEM) + 1;
+       struct user_event *user = container_of(
+               work, struct user_event, put_work);
+
+       mutex_lock(&event_mutex);
 
-       if (user_ns != &init_user_ns) {
+       if (!refcount_dec_and_test(&user->refcnt))
+               goto out;
+
+       if (destroy_user_event(user)) {
                /*
-                * Unexpected at this point:
-                * We only currently support init_user_ns.
-                * When we enable more, this will trigger a failure so log.
+                * The only reason this would fail here is if we cannot
+                * update the visibility of the event. In this case the
+                * event stays in the hashtable, waiting for someone to
+                * attempt to delete it later.
                 */
-               pr_warn("user_events: Namespace other than init_user_ns!\n");
-               return NULL;
+               pr_warn("user_events: Unable to delete event\n");
+               refcount_set(&user->refcnt, 1);
        }
+out:
+       mutex_unlock(&event_mutex);
+}
 
-       system_name = kmalloc(len, GFP_KERNEL);
+static void user_event_put(struct user_event *user, bool locked)
+{
+       bool delete;
 
-       if (!system_name)
-               return NULL;
+       if (unlikely(!user))
+               return;
 
-       snprintf(system_name, len, "%s", USER_EVENTS_SYSTEM);
+       /*
+        * When the event is not enabled for auto-delete there will always
+        * be at least one reference to the event. During event creation we
+        * initially set the refcnt to 2 to achieve this. In that case the
+        * caller must acquire event_mutex and, after the decrement, check
+        * whether the refcnt is 1, meaning this is the last reference. When
+        * auto-delete is enabled there is only one reference, i.e. the
+        * refcnt is set to 1 during creation so that the checks below go
+        * through on the last put. The last put must always be done with
+        * event_mutex held.
+        */
+       if (!locked) {
+               lockdep_assert_not_held(&event_mutex);
+               delete = refcount_dec_and_mutex_lock(&user->refcnt, &event_mutex);
+       } else {
+               lockdep_assert_held(&event_mutex);
+               delete = refcount_dec_and_test(&user->refcnt);
+       }
 
-       return system_name;
+       if (!delete)
+               return;
+
+       /*
+        * We now have the event_mutex in all cases, which ensures that
+        * no new references will be taken until event_mutex is released.
+        * New references come through find_user_event(), which requires
+        * the event_mutex to be held.
+        */
+
+       if (user->reg_flags & USER_EVENT_REG_PERSIST) {
+               /* We should not get here when persist flag is set */
+               pr_alert("BUG: Auto-delete engaged on persistent event\n");
+               goto out;
+       }
+
+       /*
+        * Unfortunately we have to attempt the actual destroy in a work
+        * queue. This is because not all cases handle a trace_event_call
+        * being removed within the class->reg() operation for unregister.
+        */
+       INIT_WORK(&user->put_work, delayed_destroy_user_event);
+
+       /*
+        * Since the event is still in the hashtable, we have to set the
+        * refcount back to 1. This count will be decremented and checked
+        * in the work queue to ensure it is still the last ref. This is
+        * needed because a user process could register the same event in
+        * between the time of event_mutex release and the work queue
+        * running the delayed destroy. If we removed the item now from
+        * the hashtable, this would result in a timing window where a
+        * user process would fail a register because the trace_event_call
+        * register would fail in the tracing layers.
+        */
+       refcount_set(&user->refcnt, 1);
+
+       if (WARN_ON_ONCE(!schedule_work(&user->put_work))) {
+               /*
+                * If we fail we must wait for an admin to attempt delete or
+                * another register/close of the event, whichever is first.
+                */
+               pr_warn("user_events: Unable to queue delayed destroy\n");
+       }
+out:
+       /* If we did not come in holding event_mutex, unlock it now */
+       if (!locked)
+               mutex_unlock(&event_mutex);
 }
 
-static inline struct user_event_group
-*user_event_group_from_user_ns(struct user_namespace *user_ns)
+static void user_event_group_destroy(struct user_event_group *group)
 {
-       if (user_ns == &init_user_ns)
-               return init_group;
-
-       return NULL;
+       kfree(group->system_name);
+       kfree(group);
 }
 
-static struct user_event_group *current_user_event_group(void)
+static char *user_event_group_system_name(void)
 {
-       struct user_namespace *user_ns = current_user_ns();
-       struct user_event_group *group = NULL;
+       char *system_name;
+       int len = sizeof(USER_EVENTS_SYSTEM) + 1;
 
-       while (user_ns) {
-               group = user_event_group_from_user_ns(user_ns);
+       system_name = kmalloc(len, GFP_KERNEL);
 
-               if (group)
-                       break;
+       if (!system_name)
+               return NULL;
 
-               user_ns = user_ns->parent;
-       }
+       snprintf(system_name, len, "%s", USER_EVENTS_SYSTEM);
 
-       return group;
+       return system_name;
+}
+
+static struct user_event_group *current_user_event_group(void)
+{
+       return init_group;
 }
 
-static struct user_event_group
-*user_event_group_create(struct user_namespace *user_ns)
+static struct user_event_group *user_event_group_create(void)
 {
        struct user_event_group *group;
 
@@ -241,7 +332,7 @@ static struct user_event_group
        if (!group)
                return NULL;
 
-       group->system_name = user_event_group_system_name(user_ns);
+       group->system_name = user_event_group_system_name();
 
        if (!group->system_name)
                goto error;
@@ -257,12 +348,13 @@ error:
        return NULL;
 };
 
-static void user_event_enabler_destroy(struct user_event_enabler *enabler)
+static void user_event_enabler_destroy(struct user_event_enabler *enabler,
+                                      bool locked)
 {
-       list_del_rcu(&enabler->link);
+       list_del_rcu(&enabler->mm_enablers_link);
 
        /* No longer tracking the event via the enabler */
-       refcount_dec(&enabler->event->refcnt);
+       user_event_put(enabler->event, locked);
 
        kfree(enabler);
 }
@@ -324,7 +416,7 @@ static void user_event_enabler_fault_fixup(struct work_struct *work)
 
        /* User asked for enabler to be removed during fault */
        if (test_bit(ENABLE_VAL_FREEING_BIT, ENABLE_BITOPS(enabler))) {
-               user_event_enabler_destroy(enabler);
+               user_event_enabler_destroy(enabler, true);
                goto out;
        }
 
@@ -423,9 +515,9 @@ static int user_event_enabler_write(struct user_event_mm *mm,
 
        /* Update bit atomically, user tracers must be atomic as well */
        if (enabler->event && enabler->event->status)
-               set_bit(enabler->values & ENABLE_VAL_BIT_MASK, ptr);
+               set_bit(ENABLE_BIT(enabler), ptr);
        else
-               clear_bit(enabler->values & ENABLE_VAL_BIT_MASK, ptr);
+               clear_bit(ENABLE_BIT(enabler), ptr);
 
        kunmap_local(kaddr);
        unpin_user_pages_dirty_lock(&page, 1, true);
@@ -437,11 +529,9 @@ static bool user_event_enabler_exists(struct user_event_mm *mm,
                                      unsigned long uaddr, unsigned char bit)
 {
        struct user_event_enabler *enabler;
-       struct user_event_enabler *next;
 
-       list_for_each_entry_safe(enabler, next, &mm->enablers, link) {
-               if (enabler->addr == uaddr &&
-                   (enabler->values & ENABLE_VAL_BIT_MASK) == bit)
+       list_for_each_entry(enabler, &mm->enablers, mm_enablers_link) {
+               if (enabler->addr == uaddr && ENABLE_BIT(enabler) == bit)
                        return true;
        }
 
@@ -451,23 +541,36 @@ static bool user_event_enabler_exists(struct user_event_mm *mm,
 static void user_event_enabler_update(struct user_event *user)
 {
        struct user_event_enabler *enabler;
-       struct user_event_mm *mm = user_event_mm_get_all(user);
        struct user_event_mm *next;
+       struct user_event_mm *mm;
        int attempt;
 
+       lockdep_assert_held(&event_mutex);
+
+       /*
+        * We need to build a one-shot list of all the mms that have an
+        * enabler for the user_event passed in. This list is only valid
+        * while holding the event_mutex. This is required because the
+        * global mm list is RCU protected and we use methods which can
+        * wait (mmap_read_lock and pin_user_pages_remote).
+        *
+        * NOTE: user_event_mm_get_all() increments the ref count of each
+        * mm that is added to the list to prevent removal timing windows.
+        * We must always put each mm after it is used, which may wait.
+        */
+       mm = user_event_mm_get_all(user);
+
        while (mm) {
                next = mm->next;
                mmap_read_lock(mm->mm);
-               rcu_read_lock();
 
-               list_for_each_entry_rcu(enabler, &mm->enablers, link) {
+               list_for_each_entry(enabler, &mm->enablers, mm_enablers_link) {
                        if (enabler->event == user) {
                                attempt = 0;
                                user_event_enabler_write(mm, enabler, true, &attempt);
                        }
                }
 
-               rcu_read_unlock();
                mmap_read_unlock(mm->mm);
                user_event_mm_put(mm);
                mm = next;
@@ -488,14 +591,14 @@ static bool user_event_enabler_dup(struct user_event_enabler *orig,
        if (!enabler)
                return false;
 
-       enabler->event = orig->event;
+       enabler->event = user_event_get(orig->event);
        enabler->addr = orig->addr;
 
        /* Only dup part of value (ignore future flags, etc) */
        enabler->values = orig->values & ENABLE_VAL_DUP_MASK;
 
-       refcount_inc(&enabler->event->refcnt);
-       list_add_rcu(&enabler->link, &mm->enablers);
+       /* Enablers not exposed yet, RCU not required */
+       list_add(&enabler->mm_enablers_link, &mm->enablers);
 
        return true;
 }
@@ -514,6 +617,14 @@ static struct user_event_mm *user_event_mm_get_all(struct user_event *user)
        struct user_event_mm *mm;
 
        /*
+        * We use the mm->next field to build a one-shot list from the global
+        * RCU protected list. To build this list the event_mutex must be held.
+        * This lets us build a list without requiring allocs that could fail
+        * when user-based events are most wanted for diagnostics.
+        */
+       lockdep_assert_held(&event_mutex);
+
+       /*
         * We do not want to block fork/exec while enablements are being
         * updated, so we use RCU to walk the current tasks that have used
         * user_events ABI for 1 or more events. Each enabler found in each
@@ -525,23 +636,24 @@ static struct user_event_mm *user_event_mm_get_all(struct user_event *user)
         */
        rcu_read_lock();
 
-       list_for_each_entry_rcu(mm, &user_event_mms, link)
-               list_for_each_entry_rcu(enabler, &mm->enablers, link)
+       list_for_each_entry_rcu(mm, &user_event_mms, mms_link) {
+               list_for_each_entry_rcu(enabler, &mm->enablers, mm_enablers_link) {
                        if (enabler->event == user) {
                                mm->next = found;
                                found = user_event_mm_get(mm);
                                break;
                        }
+               }
+       }
 
        rcu_read_unlock();
 
        return found;
 }
 
-static struct user_event_mm *user_event_mm_create(struct task_struct *t)
+static struct user_event_mm *user_event_mm_alloc(struct task_struct *t)
 {
        struct user_event_mm *user_mm;
-       unsigned long flags;
 
        user_mm = kzalloc(sizeof(*user_mm), GFP_KERNEL_ACCOUNT);
 
@@ -553,12 +665,6 @@ static struct user_event_mm *user_event_mm_create(struct task_struct *t)
        refcount_set(&user_mm->refcnt, 1);
        refcount_set(&user_mm->tasks, 1);
 
-       spin_lock_irqsave(&user_event_mms_lock, flags);
-       list_add_rcu(&user_mm->link, &user_event_mms);
-       spin_unlock_irqrestore(&user_event_mms_lock, flags);
-
-       t->user_event_mm = user_mm;
-
        /*
         * The lifetime of the memory descriptor can slightly outlast
         * the task lifetime if a ref to the user_event_mm is taken
@@ -572,6 +678,17 @@ static struct user_event_mm *user_event_mm_create(struct task_struct *t)
        return user_mm;
 }
 
+static void user_event_mm_attach(struct user_event_mm *user_mm, struct task_struct *t)
+{
+       unsigned long flags;
+
+       spin_lock_irqsave(&user_event_mms_lock, flags);
+       list_add_rcu(&user_mm->mms_link, &user_event_mms);
+       spin_unlock_irqrestore(&user_event_mms_lock, flags);
+
+       t->user_event_mm = user_mm;
+}
+
 static struct user_event_mm *current_user_event_mm(void)
 {
        struct user_event_mm *user_mm = current->user_event_mm;
@@ -579,10 +696,12 @@ static struct user_event_mm *current_user_event_mm(void)
        if (user_mm)
                goto inc;
 
-       user_mm = user_event_mm_create(current);
+       user_mm = user_event_mm_alloc(current);
 
        if (!user_mm)
                goto error;
+
+       user_event_mm_attach(user_mm, current);
 inc:
        refcount_inc(&user_mm->refcnt);
 error:
@@ -593,8 +712,8 @@ static void user_event_mm_destroy(struct user_event_mm *mm)
 {
        struct user_event_enabler *enabler, *next;
 
-       list_for_each_entry_safe(enabler, next, &mm->enablers, link)
-               user_event_enabler_destroy(enabler);
+       list_for_each_entry_safe(enabler, next, &mm->enablers, mm_enablers_link)
+               user_event_enabler_destroy(enabler, false);
 
        mmdrop(mm->mm);
        kfree(mm);
@@ -630,7 +749,7 @@ void user_event_mm_remove(struct task_struct *t)
 
        /* Remove the mm from the list, so it can no longer be enabled */
        spin_lock_irqsave(&user_event_mms_lock, flags);
-       list_del_rcu(&mm->link);
+       list_del_rcu(&mm->mms_link);
        spin_unlock_irqrestore(&user_event_mms_lock, flags);
 
        /*
@@ -670,7 +789,7 @@ void user_event_mm_remove(struct task_struct *t)
 
 void user_event_mm_dup(struct task_struct *t, struct user_event_mm *old_mm)
 {
-       struct user_event_mm *mm = user_event_mm_create(t);
+       struct user_event_mm *mm = user_event_mm_alloc(t);
        struct user_event_enabler *enabler;
 
        if (!mm)
@@ -678,16 +797,18 @@ void user_event_mm_dup(struct task_struct *t, struct user_event_mm *old_mm)
 
        rcu_read_lock();
 
-       list_for_each_entry_rcu(enabler, &old_mm->enablers, link)
+       list_for_each_entry_rcu(enabler, &old_mm->enablers, mm_enablers_link) {
                if (!user_event_enabler_dup(enabler, mm))
                        goto error;
+       }
 
        rcu_read_unlock();
 
+       user_event_mm_attach(mm, t);
        return;
 error:
        rcu_read_unlock();
-       user_event_mm_remove(t);
+       user_event_mm_destroy(mm);
 }
 
 static bool current_user_event_enabler_exists(unsigned long uaddr,
@@ -747,8 +868,8 @@ retry:
         * exit or run exec(), which includes forks and clones.
         */
        if (!*write_result) {
-               refcount_inc(&enabler->event->refcnt);
-               list_add_rcu(&enabler->link, &user_mm->enablers);
+               user_event_get(user);
+               list_add_rcu(&enabler->mm_enablers_link, &user_mm->enablers);
        }
 
        mutex_unlock(&event_mutex);
@@ -770,7 +891,12 @@ out:
 static __always_inline __must_check
 bool user_event_last_ref(struct user_event *user)
 {
-       return refcount_read(&user->refcnt) == 1;
+       int last = 0;
+
+       if (user->reg_flags & USER_EVENT_REG_PERSIST)
+               last = 1;
+
+       return refcount_read(&user->refcnt) == last;
 }
 
 static __always_inline __must_check
@@ -809,7 +935,8 @@ static struct list_head *user_event_get_fields(struct trace_event_call *call)
  * Upon success user_event has its ref count increased by 1.
  */
 static int user_event_parse_cmd(struct user_event_group *group,
-                               char *raw_command, struct user_event **newuser)
+                               char *raw_command, struct user_event **newuser,
+                               int reg_flags)
 {
        char *name = raw_command;
        char *args = strpbrk(name, " ");
@@ -823,7 +950,7 @@ static int user_event_parse_cmd(struct user_event_group *group,
        if (flags)
                *flags++ = '\0';
 
-       return user_event_parse(group, name, args, flags, newuser);
+       return user_event_parse(group, name, args, flags, newuser, reg_flags);
 }
 
 static int user_field_array_size(const char *type)
@@ -904,8 +1031,8 @@ static void user_event_destroy_validators(struct user_event *user)
        struct user_event_validator *validator, *next;
        struct list_head *head = &user->validators;
 
-       list_for_each_entry_safe(validator, next, head, link) {
-               list_del(&validator->link);
+       list_for_each_entry_safe(validator, next, head, user_event_link) {
+               list_del(&validator->user_event_link);
                kfree(validator);
        }
 }
@@ -959,7 +1086,7 @@ add_validator:
        validator->offset = offset;
 
        /* Want sequential access when validating */
-       list_add_tail(&validator->link, &user->validators);
+       list_add_tail(&validator->user_event_link, &user->validators);
 
 add_field:
        field->type = type;
@@ -1334,10 +1461,8 @@ static struct user_event *find_user_event(struct user_event_group *group,
        *outkey = key;
 
        hash_for_each_possible(group->register_table, user, node, key)
-               if (!strcmp(EVENT_NAME(user), name)) {
-                       refcount_inc(&user->refcnt);
-                       return user;
-               }
+               if (!strcmp(EVENT_NAME(user), name))
+                       return user_event_get(user);
 
        return NULL;
 }
@@ -1349,7 +1474,7 @@ static int user_event_validate(struct user_event *user, void *data, int len)
        void *pos, *end = data + len;
        u32 loc, offset, size;
 
-       list_for_each_entry(validator, head, link) {
+       list_for_each_entry(validator, head, user_event_link) {
                pos = data + validator->offset;
 
                /* Already done min_size check, no bounds check here */
@@ -1399,7 +1524,7 @@ static void user_event_ftrace(struct user_event *user, struct iov_iter *i,
        if (unlikely(!entry))
                return;
 
-       if (unlikely(!copy_nofault(entry + 1, i->count, i)))
+       if (unlikely(i->count != 0 && !copy_nofault(entry + 1, i->count, i)))
                goto discard;
 
        if (!list_empty(&user->validators) &&
@@ -1440,7 +1565,7 @@ static void user_event_perf(struct user_event *user, struct iov_iter *i,
 
                perf_fetch_caller_regs(regs);
 
-               if (unlikely(!copy_nofault(perf_entry + 1, i->count, i)))
+               if (unlikely(i->count != 0 && !copy_nofault(perf_entry + 1, i->count, i)))
                        goto discard;
 
                if (!list_empty(&user->validators) &&
@@ -1551,12 +1676,12 @@ static int user_event_reg(struct trace_event_call *call,
 
        return ret;
 inc:
-       refcount_inc(&user->refcnt);
+       user_event_get(user);
        update_enable_bit_for(user);
        return 0;
 dec:
        update_enable_bit_for(user);
-       refcount_dec(&user->refcnt);
+       user_event_put(user, true);
        return 0;
 }
 
@@ -1587,10 +1712,11 @@ static int user_event_create(const char *raw_command)
 
        mutex_lock(&group->reg_mutex);
 
-       ret = user_event_parse_cmd(group, name, &user);
+       /* Dyn events persist, otherwise they would clean up immediately */
+       ret = user_event_parse_cmd(group, name, &user, USER_EVENT_REG_PERSIST);
 
        if (!ret)
-               refcount_dec(&user->refcnt);
+               user_event_put(user, false);
 
        mutex_unlock(&group->reg_mutex);
 
@@ -1712,6 +1838,8 @@ static bool user_event_match(const char *system, const char *event,
 
        if (match && argc > 0)
                match = user_fields_match(user, argc, argv);
+       else if (match && argc == 0)
+               match = list_empty(&user->fields);
 
        return match;
 }
@@ -1748,11 +1876,17 @@ static int user_event_trace_register(struct user_event *user)
  */
 static int user_event_parse(struct user_event_group *group, char *name,
                            char *args, char *flags,
-                           struct user_event **newuser)
+                           struct user_event **newuser, int reg_flags)
 {
        int ret;
        u32 key;
        struct user_event *user;
+       int argc = 0;
+       char **argv;
+
+       /* User register flags are not ready yet */
+       if (reg_flags != 0 || flags != NULL)
+               return -EINVAL;
 
        /* Prevent dyn_event from racing */
        mutex_lock(&event_mutex);
@@ -1760,13 +1894,35 @@ static int user_event_parse(struct user_event_group *group, char *name,
        mutex_unlock(&event_mutex);
 
        if (user) {
-               *newuser = user;
-               /*
-                * Name is allocated by caller, free it since it already exists.
-                * Caller only worries about failure cases for freeing.
-                */
-               kfree(name);
+               if (args) {
+                       argv = argv_split(GFP_KERNEL, args, &argc);
+                       if (!argv) {
+                               ret = -ENOMEM;
+                               goto error;
+                       }
+
+                       ret = user_fields_match(user, argc, (const char **)argv);
+                       argv_free(argv);
+
+               } else
+                       ret = list_empty(&user->fields);
+
+               if (ret) {
+                       *newuser = user;
+                       /*
+                        * Name is allocated by caller, free it since it already exists.
+                        * Caller only worries about failure cases for freeing.
+                        */
+                       kfree(name);
+               } else {
+                       ret = -EADDRINUSE;
+                       goto error;
+               }
+
                return 0;
+error:
+               user_event_put(user, false);
+               return ret;
        }
 
        user = kzalloc(sizeof(*user), GFP_KERNEL_ACCOUNT);
@@ -1819,8 +1975,15 @@ static int user_event_parse(struct user_event_group *group, char *name,
        if (ret)
                goto put_user_lock;
 
-       /* Ensure we track self ref and caller ref (2) */
-       refcount_set(&user->refcnt, 2);
+       user->reg_flags = reg_flags;
+
+       if (user->reg_flags & USER_EVENT_REG_PERSIST) {
+               /* Ensure we track self ref and caller ref (2) */
+               refcount_set(&user->refcnt, 2);
+       } else {
+               /* Ensure we track only caller ref (1) */
+               refcount_set(&user->refcnt, 1);
+       }
 
        dyn_event_init(&user->devent, &user_event_dops);
        dyn_event_add(&user->devent, &user->call);
@@ -1852,7 +2015,7 @@ static int delete_user_event(struct user_event_group *group, char *name)
        if (!user)
                return -ENOENT;
 
-       refcount_dec(&user->refcnt);
+       user_event_put(user, true);
 
        if (!user_event_last_ref(user))
                return -EBUSY;
@@ -2011,9 +2174,7 @@ static int user_events_ref_add(struct user_event_file_info *info,
        for (i = 0; i < count; ++i)
                new_refs->events[i] = refs->events[i];
 
-       new_refs->events[i] = user;
-
-       refcount_inc(&user->refcnt);
+       new_refs->events[i] = user_event_get(user);
 
        rcu_assign_pointer(info->refs, new_refs);
 
@@ -2044,8 +2205,8 @@ static long user_reg_get(struct user_reg __user *ureg, struct user_reg *kreg)
        if (ret)
                return ret;
 
-       /* Ensure no flags, since we don't support any yet */
-       if (kreg->flags != 0)
+       /* Ensure only valid flags */
+       if (kreg->flags & ~(USER_EVENT_REG_MAX-1))
                return -EINVAL;
 
        /* Ensure supported size */
@@ -2117,7 +2278,7 @@ static long user_events_ioctl_reg(struct user_event_file_info *info,
                return ret;
        }
 
-       ret = user_event_parse_cmd(info->group, name, &user);
+       ret = user_event_parse_cmd(info->group, name, &user, reg.flags);
 
        if (ret) {
                kfree(name);
@@ -2127,7 +2288,7 @@ static long user_events_ioctl_reg(struct user_event_file_info *info,
        ret = user_events_ref_add(info, user);
 
        /* No longer need parse ref, ref_add either worked or not */
-       refcount_dec(&user->refcnt);
+       user_event_put(user, false);
 
        /* Positive number is index and valid */
        if (ret < 0)
@@ -2270,17 +2431,18 @@ static long user_events_ioctl_unreg(unsigned long uarg)
         */
        mutex_lock(&event_mutex);
 
-       list_for_each_entry_safe(enabler, next, &mm->enablers, link)
+       list_for_each_entry_safe(enabler, next, &mm->enablers, mm_enablers_link) {
                if (enabler->addr == reg.disable_addr &&
-                   (enabler->values & ENABLE_VAL_BIT_MASK) == reg.disable_bit) {
+                   ENABLE_BIT(enabler) == reg.disable_bit) {
                        set_bit(ENABLE_VAL_FREEING_BIT, ENABLE_BITOPS(enabler));
 
                        if (!test_bit(ENABLE_VAL_FAULTING_BIT, ENABLE_BITOPS(enabler)))
-                               user_event_enabler_destroy(enabler);
+                               user_event_enabler_destroy(enabler, true);
 
                        /* Removed at least one */
                        ret = 0;
                }
+       }
 
        mutex_unlock(&event_mutex);
 
@@ -2333,7 +2495,6 @@ static int user_events_release(struct inode *node, struct file *file)
        struct user_event_file_info *info = file->private_data;
        struct user_event_group *group;
        struct user_event_refs *refs;
-       struct user_event *user;
        int i;
 
        if (!info)
@@ -2357,12 +2518,9 @@ static int user_events_release(struct inode *node, struct file *file)
         * The underlying user_events are ref counted, and cannot be freed.
         * After this decrement, the user_events may be freed elsewhere.
         */
-       for (i = 0; i < refs->count; ++i) {
-               user = refs->events[i];
+       for (i = 0; i < refs->count; ++i)
+               user_event_put(refs->events[i], false);
 
-               if (user)
-                       refcount_dec(&user->refcnt);
-       }
 out:
        file->private_data = NULL;
 
@@ -2543,7 +2701,7 @@ static int __init trace_events_user_init(void)
        if (!fault_cache)
                return -ENOMEM;
 
-       init_group = user_event_group_create(&init_user_ns);
+       init_group = user_event_group_create();
 
        if (!init_group) {
                kmem_cache_destroy(fault_cache);
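
Much of the user_events rework above funnels reference drops through user_event_put(), which uses refcount_dec_and_mutex_lock() so that the final put always happens with event_mutex held. A minimal sketch of that idiom for a generic object (struct foo and foo_put() are made-up names, not part of the patch):

    #include <linux/mutex.h>
    #include <linux/refcount.h>
    #include <linux/slab.h>

    static DEFINE_MUTEX(foo_mutex);

    struct foo {
            refcount_t refcnt;
            /* ... */
    };

    static void foo_put(struct foo *f)
    {
            /* Returns true only on the final put, with foo_mutex then held. */
            if (!refcount_dec_and_mutex_lock(&f->refcnt, &foo_mutex))
                    return;

            /* Last reference: no new lookup can race while the lock is held. */
            kfree(f);
            mutex_unlock(&foo_mutex);
    }
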
index efbbec2..e97e3fa 100644 (file)
@@ -1652,6 +1652,8 @@ static enum hrtimer_restart timerlat_irq(struct hrtimer *timer)
                        osnoise_stop_tracing();
                        notify_new_max_latency(diff);
 
+                       wake_up_process(tlat->kthread);
+
                        return HRTIMER_NORESTART;
                }
        }
index 15f05fa..1e33f36 100644 (file)
@@ -847,7 +847,7 @@ static void print_fields(struct trace_iterator *iter, struct trace_event_call *c
        int ret;
        void *pos;
 
-       list_for_each_entry(field, head, link) {
+       list_for_each_entry_reverse(field, head, link) {
                trace_seq_printf(&iter->seq, " %s=", field->name);
                if (field->offset + field->size > iter->ent_size) {
                        trace_seq_puts(&iter->seq, "<OVERFLOW>");
index ef8ed3b..6a4ecfb 100644 (file)
@@ -308,7 +308,7 @@ trace_probe_primary_from_call(struct trace_event_call *call)
 {
        struct trace_probe_event *tpe = trace_probe_event_from_call(call);
 
-       return list_first_entry(&tpe->probes, struct trace_probe, list);
+       return list_first_entry_or_null(&tpe->probes, struct trace_probe, list);
 }
 
 static inline struct list_head *trace_probe_probe_list(struct trace_probe *tp)
index a931d9a..5295904 100644 (file)
@@ -848,6 +848,12 @@ trace_selftest_startup_function_graph(struct tracer *trace,
        }
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+       /*
+        * These tests can take some time to run. Make sure that on
+        * non-PREEMPT kernels we do not trigger the softlockup detector.
+        */
+       cond_resched();
+
        tracing_reset_online_cpus(&tr->array_buffer);
        set_graph_array(tr);
 
@@ -869,6 +875,8 @@ trace_selftest_startup_function_graph(struct tracer *trace,
        if (ret)
                goto out;
 
+       cond_resched();
+
        ret = register_ftrace_graph(&fgraph_ops);
        if (ret) {
                warn_failed_init_tracer(trace, ret);
@@ -891,6 +899,8 @@ trace_selftest_startup_function_graph(struct tracer *trace,
        if (ret)
                goto out;
 
+       cond_resched();
+
        tracing_start();
 
        if (!ret && !count) {
index b7cbd66..da35e5b 100644 (file)
@@ -12,58 +12,90 @@ enum vhost_task_flags {
        VHOST_TASK_FLAGS_STOP,
 };
 
+struct vhost_task {
+       bool (*fn)(void *data);
+       void *data;
+       struct completion exited;
+       unsigned long flags;
+       struct task_struct *task;
+};
+
 static int vhost_task_fn(void *data)
 {
        struct vhost_task *vtsk = data;
-       int ret;
+       bool dead = false;
+
+       for (;;) {
+               bool did_work;
+
+               if (!dead && signal_pending(current)) {
+                       struct ksignal ksig;
+                       /*
+                        * Calling get_signal will block in SIGSTOP,
+                        * or clear fatal_signal_pending, but remember
+                        * what was set.
+                        *
+                        * This thread won't actually exit until all
+                        * of the file descriptors are closed, and
+                        * the release function is called.
+                        */
+                       dead = get_signal(&ksig);
+                       if (dead)
+                               clear_thread_flag(TIF_SIGPENDING);
+               }
+
+               /* mb paired w/ vhost_task_stop */
+               set_current_state(TASK_INTERRUPTIBLE);
+
+               if (test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags)) {
+                       __set_current_state(TASK_RUNNING);
+                       break;
+               }
+
+               did_work = vtsk->fn(vtsk->data);
+               if (!did_work)
+                       schedule();
+       }
 
-       ret = vtsk->fn(vtsk->data);
        complete(&vtsk->exited);
-       do_exit(ret);
+       do_exit(0);
 }
 
 /**
+ * vhost_task_wake - wakeup the vhost_task
+ * @vtsk: vhost_task to wake
+ *
+ * wake up the vhost_task worker thread
+ */
+void vhost_task_wake(struct vhost_task *vtsk)
+{
+       wake_up_process(vtsk->task);
+}
+EXPORT_SYMBOL_GPL(vhost_task_wake);
+
+/**
  * vhost_task_stop - stop a vhost_task
  * @vtsk: vhost_task to stop
  *
- * Callers must call vhost_task_should_stop and return from their worker
- * function when it returns true;
+ * vhost_task_fn ensures the worker thread exits once
+ * VHOST_TASK_FLAGS_STOP is set.
  */
 void vhost_task_stop(struct vhost_task *vtsk)
 {
-       pid_t pid = vtsk->task->pid;
-
        set_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags);
-       wake_up_process(vtsk->task);
+       vhost_task_wake(vtsk);
        /*
         * Make sure vhost_task_fn is no longer accessing the vhost_task before
-        * freeing it below. If userspace crashed or exited without closing,
-        * then the vhost_task->task could already be marked dead so
-        * kernel_wait will return early.
+        * freeing it below.
         */
        wait_for_completion(&vtsk->exited);
-       /*
-        * If we are just closing/removing a device and the parent process is
-        * not exiting then reap the task.
-        */
-       kernel_wait4(pid, NULL, __WCLONE, NULL);
        kfree(vtsk);
 }
 EXPORT_SYMBOL_GPL(vhost_task_stop);
 
 /**
- * vhost_task_should_stop - should the vhost task return from the work function
- * @vtsk: vhost_task to stop
- */
-bool vhost_task_should_stop(struct vhost_task *vtsk)
-{
-       return test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags);
-}
-EXPORT_SYMBOL_GPL(vhost_task_should_stop);
-
-/**
- * vhost_task_create - create a copy of a process to be used by the kernel
- * @fn: thread stack
+ * vhost_task_create - create a copy of a task to be used by the kernel
+ * @fn: vhost worker function
  * @arg: data to be passed to fn
  * @name: the thread's name
  *
@@ -71,17 +103,17 @@ EXPORT_SYMBOL_GPL(vhost_task_should_stop);
  * failure. The returned task is inactive, and the caller must fire it up
  * through vhost_task_start().
  */
-struct vhost_task *vhost_task_create(int (*fn)(void *), void *arg,
+struct vhost_task *vhost_task_create(bool (*fn)(void *), void *arg,
                                     const char *name)
 {
        struct kernel_clone_args args = {
-               .flags          = CLONE_FS | CLONE_UNTRACED | CLONE_VM,
+               .flags          = CLONE_FS | CLONE_UNTRACED | CLONE_VM |
+                                 CLONE_THREAD | CLONE_SIGHAND,
                .exit_signal    = 0,
                .fn             = vhost_task_fn,
                .name           = name,
                .user_worker    = 1,
                .no_files       = 1,
-               .ignore_signals = 1,
        };
        struct vhost_task *vtsk;
        struct task_struct *tsk;
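
With the rework above, a vhost task's worker callback returns whether it did any work and the task sleeps otherwise; users wake it explicitly with vhost_task_wake() and tear it down with vhost_task_stop(). A minimal usage sketch under those assumptions (struct my_dev and the my_dev_* names are invented; vhost_task_start() is the existing call that actually starts the task):

    #include <linux/sched/vhost_task.h>

    struct my_dev {
            struct vhost_task *vtsk;
            /* ... pending work, queues, etc ... */
    };

    /* Return true if work was done, false to let the task sleep. */
    static bool my_dev_work_fn(void *data)
    {
            /* ... drain the pending work attached to data ... */
            return false;
    }

    static int my_dev_start(struct my_dev *dev)
    {
            dev->vtsk = vhost_task_create(my_dev_work_fn, dev, "vhost-my_dev");
            if (!dev->vtsk)
                    return -ENOMEM;

            vhost_task_start(dev->vtsk);
            return 0;
    }

    static void my_dev_stop(struct my_dev *dev)
    {
            /* Sets VHOST_TASK_FLAGS_STOP, wakes the task and waits for exit. */
            vhost_task_stop(dev->vtsk);
    }
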
index e91cb4c..d0b6b39 100644 (file)
@@ -42,7 +42,7 @@ MODULE_AUTHOR("Red Hat, Inc.");
 static inline bool lock_wqueue(struct watch_queue *wqueue)
 {
        spin_lock_bh(&wqueue->lock);
-       if (unlikely(wqueue->defunct)) {
+       if (unlikely(!wqueue->pipe)) {
                spin_unlock_bh(&wqueue->lock);
                return false;
        }
@@ -104,9 +104,6 @@ static bool post_one_notification(struct watch_queue *wqueue,
        unsigned int head, tail, mask, note, offset, len;
        bool done = false;
 
-       if (!pipe)
-               return false;
-
        spin_lock_irq(&pipe->rd_wait.lock);
 
        mask = pipe->ring_size - 1;
@@ -603,8 +600,11 @@ void watch_queue_clear(struct watch_queue *wqueue)
        rcu_read_lock();
        spin_lock_bh(&wqueue->lock);
 
-       /* Prevent new notifications from being stored. */
-       wqueue->defunct = true;
+       /*
+        * This pipe can be freed by callers like free_pipe_info().
+        * Removing this reference also prevents new notifications.
+        */
+       wqueue->pipe = NULL;
 
        while (!hlist_empty(&wqueue->watches)) {
                watch = hlist_entry(wqueue->watches.first, struct watch, queue_node);
index 4666a1a..02a8f40 100644 (file)
@@ -126,6 +126,12 @@ enum {
  *    cpu or grabbing pool->lock is enough for read access.  If
  *    POOL_DISASSOCIATED is set, it's identical to L.
  *
+ * K: Only modified by worker while holding pool->lock. Can be safely read by
+ *    self while holding pool->lock, or from IRQ context if %current is the
+ *    kworker.
+ *
+ * S: Only modified by worker self.
+ *
  * A: wq_pool_attach_mutex protected.
  *
  * PL: wq_pool_mutex protected.
@@ -200,6 +206,22 @@ struct worker_pool {
 };
 
 /*
+ * Per-pool_workqueue statistics. These can be monitored using
+ * tools/workqueue/wq_monitor.py.
+ */
+enum pool_workqueue_stats {
+       PWQ_STAT_STARTED,       /* work items started execution */
+       PWQ_STAT_COMPLETED,     /* work items completed execution */
+       PWQ_STAT_CPU_TIME,      /* total CPU time consumed */
+       PWQ_STAT_CPU_INTENSIVE, /* wq_cpu_intensive_thresh_us violations */
+       PWQ_STAT_CM_WAKEUP,     /* concurrency-management worker wakeups */
+       PWQ_STAT_MAYDAY,        /* maydays to rescuer */
+       PWQ_STAT_RESCUED,       /* linked work items executed by rescuer */
+
+       PWQ_NR_STATS,
+};
+
+/*
  * The per-pool workqueue.  While queued, the lower WORK_STRUCT_FLAG_BITS
  * of work_struct->data are used for flags and the remaining high bits
  * point to the pwq; thus, pwqs need to be aligned at two's power of the
@@ -236,6 +258,8 @@ struct pool_workqueue {
        struct list_head        pwqs_node;      /* WR: node on wq->pwqs */
        struct list_head        mayday_node;    /* MD: node on wq->maydays */
 
+       u64                     stats[PWQ_NR_STATS];
+
        /*
         * Release of unbound pwq is punted to system_wq.  See put_pwq()
         * and pwq_unbound_release_workfn() for details.  pool_workqueue
@@ -310,6 +334,14 @@ static struct kmem_cache *pwq_cache;
 static cpumask_var_t *wq_numa_possible_cpumask;
                                        /* possible CPUs of each node */
 
+/*
+ * Per-cpu work items which run for longer than the following threshold are
+ * automatically considered CPU intensive and excluded from concurrency
+ * management to prevent them from noticeably delaying other per-cpu work items.
+ */
+static unsigned long wq_cpu_intensive_thresh_us = 10000;
+module_param_named(cpu_intensive_thresh_us, wq_cpu_intensive_thresh_us, ulong, 0644);
+
 static bool wq_disable_numa;
 module_param_named(disable_numa, wq_disable_numa, bool, 0444);
 
@@ -705,12 +737,17 @@ static void clear_work_data(struct work_struct *work)
        set_work_data(work, WORK_STRUCT_NO_POOL, 0);
 }
 
+static inline struct pool_workqueue *work_struct_pwq(unsigned long data)
+{
+       return (struct pool_workqueue *)(data & WORK_STRUCT_WQ_DATA_MASK);
+}
+
 static struct pool_workqueue *get_work_pwq(struct work_struct *work)
 {
        unsigned long data = atomic_long_read(&work->data);
 
        if (data & WORK_STRUCT_PWQ)
-               return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
+               return work_struct_pwq(data);
        else
                return NULL;
 }
@@ -738,8 +775,7 @@ static struct worker_pool *get_work_pool(struct work_struct *work)
        assert_rcu_or_pool_mutex();
 
        if (data & WORK_STRUCT_PWQ)
-               return ((struct pool_workqueue *)
-                       (data & WORK_STRUCT_WQ_DATA_MASK))->pool;
+               return work_struct_pwq(data)->pool;
 
        pool_id = data >> WORK_OFFQ_POOL_SHIFT;
        if (pool_id == WORK_OFFQ_POOL_NONE)
@@ -760,8 +796,7 @@ static int get_work_pool_id(struct work_struct *work)
        unsigned long data = atomic_long_read(&work->data);
 
        if (data & WORK_STRUCT_PWQ)
-               return ((struct pool_workqueue *)
-                       (data & WORK_STRUCT_WQ_DATA_MASK))->pool->id;
+               return work_struct_pwq(data)->pool->id;
 
        return data >> WORK_OFFQ_POOL_SHIFT;
 }
@@ -864,6 +899,152 @@ static void wake_up_worker(struct worker_pool *pool)
 }
 
 /**
+ * worker_set_flags - set worker flags and adjust nr_running accordingly
+ * @worker: self
+ * @flags: flags to set
+ *
+ * Set @flags in @worker->flags and adjust nr_running accordingly.
+ *
+ * CONTEXT:
+ * raw_spin_lock_irq(pool->lock)
+ */
+static inline void worker_set_flags(struct worker *worker, unsigned int flags)
+{
+       struct worker_pool *pool = worker->pool;
+
+       WARN_ON_ONCE(worker->task != current);
+
+       /* If transitioning into NOT_RUNNING, adjust nr_running. */
+       if ((flags & WORKER_NOT_RUNNING) &&
+           !(worker->flags & WORKER_NOT_RUNNING)) {
+               pool->nr_running--;
+       }
+
+       worker->flags |= flags;
+}
+
+/**
+ * worker_clr_flags - clear worker flags and adjust nr_running accordingly
+ * @worker: self
+ * @flags: flags to clear
+ *
+ * Clear @flags in @worker->flags and adjust nr_running accordingly.
+ *
+ * CONTEXT:
+ * raw_spin_lock_irq(pool->lock)
+ */
+static inline void worker_clr_flags(struct worker *worker, unsigned int flags)
+{
+       struct worker_pool *pool = worker->pool;
+       unsigned int oflags = worker->flags;
+
+       WARN_ON_ONCE(worker->task != current);
+
+       worker->flags &= ~flags;
+
+       /*
+        * If transitioning out of NOT_RUNNING, increment nr_running.  Note
+        * that the nested NOT_RUNNING is not a noop.  NOT_RUNNING is a mask
+        * of multiple flags, not a single flag.
+        */
+       if ((flags & WORKER_NOT_RUNNING) && (oflags & WORKER_NOT_RUNNING))
+               if (!(worker->flags & WORKER_NOT_RUNNING))
+                       pool->nr_running++;
+}
+
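+
The note about nested NOT_RUNNING flags can be made concrete with a short sketch; the flag names below are assumed members of the WORKER_NOT_RUNNING mask (e.g. WORKER_PREP and WORKER_CPU_INTENSIVE):

	/* Sketch: clearing one NOT_RUNNING flag while another is still set must
	 * not bump nr_running; only clearing the last one does. */
	worker_set_flags(worker, WORKER_PREP | WORKER_CPU_INTENSIVE);	/* nr_running-- once */
	worker_clr_flags(worker, WORKER_CPU_INTENSIVE);			/* still NOT_RUNNING, no change */
	worker_clr_flags(worker, WORKER_PREP);				/* last one cleared, nr_running++ */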
+#ifdef CONFIG_WQ_CPU_INTENSIVE_REPORT
+
+/*
+ * Concurrency-managed per-cpu work items that hog CPU for longer than
+ * wq_cpu_intensive_thresh_us trigger the automatic CPU_INTENSIVE mechanism,
+ * which prevents them from stalling other concurrency-managed work items. If a
+ * work function keeps triggering this mechanism, it's likely that the work item
+ * should be using an unbound workqueue instead.
+ *
+ * wq_cpu_intensive_report() tracks work functions which trigger such conditions
+ * and reports them so that they can be examined and converted to use unbound
+ * workqueues as appropriate. To avoid flooding the console, each violating work
+ * function is tracked and reported with exponential backoff.
+ */
+#define WCI_MAX_ENTS 128
+
+struct wci_ent {
+       work_func_t             func;
+       atomic64_t              cnt;
+       struct hlist_node       hash_node;
+};
+
+static struct wci_ent wci_ents[WCI_MAX_ENTS];
+static int wci_nr_ents;
+static DEFINE_RAW_SPINLOCK(wci_lock);
+static DEFINE_HASHTABLE(wci_hash, ilog2(WCI_MAX_ENTS));
+
+static struct wci_ent *wci_find_ent(work_func_t func)
+{
+       struct wci_ent *ent;
+
+       hash_for_each_possible_rcu(wci_hash, ent, hash_node,
+                                  (unsigned long)func) {
+               if (ent->func == func)
+                       return ent;
+       }
+       return NULL;
+}
+
+static void wq_cpu_intensive_report(work_func_t func)
+{
+       struct wci_ent *ent;
+
+restart:
+       ent = wci_find_ent(func);
+       if (ent) {
+               u64 cnt;
+
+               /*
+                * Start reporting from the fourth time and back off
+                * exponentially.
+                */
+               cnt = atomic64_inc_return_relaxed(&ent->cnt);
+               if (cnt >= 4 && is_power_of_2(cnt))
+                       printk_deferred(KERN_WARNING "workqueue: %ps hogged CPU for >%luus %llu times, consider switching to WQ_UNBOUND\n",
+                                       ent->func, wq_cpu_intensive_thresh_us,
+                                       atomic64_read(&ent->cnt));
+               return;
+       }
+
+       /*
+        * @func is a new violation. Allocate a new entry for it. If wci_ents[]
+        * is exhausted, something went really wrong and we probably made enough
+        * noise already.
+        */
+       if (wci_nr_ents >= WCI_MAX_ENTS)
+               return;
+
+       raw_spin_lock(&wci_lock);
+
+       if (wci_nr_ents >= WCI_MAX_ENTS) {
+               raw_spin_unlock(&wci_lock);
+               return;
+       }
+
+       if (wci_find_ent(func)) {
+               raw_spin_unlock(&wci_lock);
+               goto restart;
+       }
+
+       ent = &wci_ents[wci_nr_ents++];
+       ent->func = func;
+       atomic64_set(&ent->cnt, 1);
+       hash_add_rcu(wci_hash, &ent->hash_node, (unsigned long)func);
+
+       raw_spin_unlock(&wci_lock);
+}
+
+#else  /* CONFIG_WQ_CPU_INTENSIVE_REPORT */
+static void wq_cpu_intensive_report(work_func_t func) {}
+#endif /* CONFIG_WQ_CPU_INTENSIVE_REPORT */
+
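The reporting cadence above follows from the "cnt >= 4 && is_power_of_2(cnt)" test in wq_cpu_intensive_report(): a violating work function is reported on its 4th, 8th, 16th, ... occurrence. A self-contained sketch of that backoff check, mirroring the logic rather than reusing the kernel helper:

	/* Sketch of the exponential-backoff condition used above. */
	static bool would_report(unsigned long long cnt)
	{
		/* for cnt >= 4, is_power_of_2() reduces to this bit trick */
		return cnt >= 4 && (cnt & (cnt - 1)) == 0;
	}
	/* would_report(): true for 4, 8, 16, 32, ...; false otherwise. */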
+/**
  * wq_worker_running - a worker is running again
  * @task: task waking up
  *
@@ -873,7 +1054,7 @@ void wq_worker_running(struct task_struct *task)
 {
        struct worker *worker = kthread_data(task);
 
-       if (!worker->sleeping)
+       if (!READ_ONCE(worker->sleeping))
                return;
 
        /*
@@ -886,7 +1067,14 @@ void wq_worker_running(struct task_struct *task)
        if (!(worker->flags & WORKER_NOT_RUNNING))
                worker->pool->nr_running++;
        preempt_enable();
-       worker->sleeping = 0;
+
+       /*
+        * CPU intensive auto-detection cares about how long a work item hogged
+        * CPU without sleeping. Reset the starting timestamp on wakeup.
+        */
+       worker->current_at = worker->task->se.sum_exec_runtime;
+
+       WRITE_ONCE(worker->sleeping, 0);
 }
 
 /**
@@ -912,10 +1100,10 @@ void wq_worker_sleeping(struct task_struct *task)
        pool = worker->pool;
 
        /* Return if preempted before wq_worker_running() was reached */
-       if (worker->sleeping)
+       if (READ_ONCE(worker->sleeping))
                return;
 
-       worker->sleeping = 1;
+       WRITE_ONCE(worker->sleeping, 1);
        raw_spin_lock_irq(&pool->lock);
 
        /*
@@ -929,12 +1117,66 @@ void wq_worker_sleeping(struct task_struct *task)
        }
 
        pool->nr_running--;
-       if (need_more_worker(pool))
+       if (need_more_worker(pool)) {
+               worker->current_pwq->stats[PWQ_STAT_CM_WAKEUP]++;
                wake_up_worker(pool);
+       }
        raw_spin_unlock_irq(&pool->lock);
 }
 
 /**
+ * wq_worker_tick - a scheduler tick occurred while a kworker is running
+ * @task: task currently running
+ *
+ * Called from scheduler_tick(). We're in IRQ context and the current
+ * worker's fields that follow the 'K' locking rule can be accessed safely.
+ */
+void wq_worker_tick(struct task_struct *task)
+{
+       struct worker *worker = kthread_data(task);
+       struct pool_workqueue *pwq = worker->current_pwq;
+       struct worker_pool *pool = worker->pool;
+
+       if (!pwq)
+               return;
+
+       pwq->stats[PWQ_STAT_CPU_TIME] += TICK_USEC;
+
+       if (!wq_cpu_intensive_thresh_us)
+               return;
+
+       /*
+        * If the current worker is concurrency managed and hogged the CPU for
+        * longer than wq_cpu_intensive_thresh_us, it's automatically marked
+        * CPU_INTENSIVE to avoid stalling other concurrency-managed work items.
+        *
+        * If @worker->sleeping is set, @worker is in the process of
+        * switching out voluntarily and won't be contributing to
+        * @pool->nr_running until it wakes up. As wq_worker_sleeping() also
+        * decrements ->nr_running, setting CPU_INTENSIVE here can lead to
+        * double decrements. The task is releasing the CPU anyway. Let's skip.
+        * We probably want to make this prettier in the future.
+        */
+       if ((worker->flags & WORKER_NOT_RUNNING) || READ_ONCE(worker->sleeping) ||
+           worker->task->se.sum_exec_runtime - worker->current_at <
+           wq_cpu_intensive_thresh_us * NSEC_PER_USEC)
+               return;
+
+       raw_spin_lock(&pool->lock);
+
+       worker_set_flags(worker, WORKER_CPU_INTENSIVE);
+       wq_cpu_intensive_report(worker->current_func);
+       pwq->stats[PWQ_STAT_CPU_INTENSIVE]++;
+
+       if (need_more_worker(pool)) {
+               pwq->stats[PWQ_STAT_CM_WAKEUP]++;
+               wake_up_worker(pool);
+       }
+
+       raw_spin_unlock(&pool->lock);
+}
+
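wq_worker_tick() only has an effect if the scheduler's tick path calls it for running kworkers; that wiring is outside this hunk. A hedged sketch of what the call site could look like (helper name and placement assumed, not taken from this diff; PF_WQ_WORKER marks kworker tasks):

	/* Hypothetical sketch: feed the currently running kworker into the
	 * CPU-intensive detection from the scheduler tick path. */
	static inline void tick_notify_workqueue(struct task_struct *curr)
	{
		if (curr->flags & PF_WQ_WORKER)
			wq_worker_tick(curr);
	}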
+/**
  * wq_worker_last_func - retrieve worker's last work function
  * @task: Task to retrieve last work function of.
  *
@@ -966,60 +1208,6 @@ work_func_t wq_worker_last_func(struct task_struct *task)
 }
 
 /**
- * worker_set_flags - set worker flags and adjust nr_running accordingly
- * @worker: self
- * @flags: flags to set
- *
- * Set @flags in @worker->flags and adjust nr_running accordingly.
- *
- * CONTEXT:
- * raw_spin_lock_irq(pool->lock)
- */
-static inline void worker_set_flags(struct worker *worker, unsigned int flags)
-{
-       struct worker_pool *pool = worker->pool;
-
-       WARN_ON_ONCE(worker->task != current);
-
-       /* If transitioning into NOT_RUNNING, adjust nr_running. */
-       if ((flags & WORKER_NOT_RUNNING) &&
-           !(worker->flags & WORKER_NOT_RUNNING)) {
-               pool->nr_running--;
-       }
-
-       worker->flags |= flags;
-}
-
-/**
- * worker_clr_flags - clear worker flags and adjust nr_running accordingly
- * @worker: self
- * @flags: flags to clear
- *
- * Clear @flags in @worker->flags and adjust nr_running accordingly.
- *
- * CONTEXT:
- * raw_spin_lock_irq(pool->lock)
- */
-static inline void worker_clr_flags(struct worker *worker, unsigned int flags)
-{
-       struct worker_pool *pool = worker->pool;
-       unsigned int oflags = worker->flags;
-
-       WARN_ON_ONCE(worker->task != current);
-
-       worker->flags &= ~flags;
-
-       /*
-        * If transitioning out of NOT_RUNNING, increment nr_running.  Note
-        * that the nested NOT_RUNNING is not a noop.  NOT_RUNNING is mask
-        * of multiple flags, not a single flag.
-        */
-       if ((flags & WORKER_NOT_RUNNING) && (oflags & WORKER_NOT_RUNNING))
-               if (!(worker->flags & WORKER_NOT_RUNNING))
-                       pool->nr_running++;
-}
-
-/**
  * find_worker_executing_work - find worker which is executing a work
  * @pool: pool of interest
  * @work: work to find worker for
@@ -1539,6 +1727,8 @@ out:
  * We queue the work to a specific CPU, the caller must ensure it
  * can't go away.  Callers that fail to ensure that the specified
  * CPU cannot go away will execute on a randomly chosen CPU.
+ * But note well that callers specifying a CPU that has never been
+ * online will get a splat.
  *
  * Return: %false if @work was already on a queue, %true otherwise.
  */
@@ -2163,6 +2353,7 @@ static void send_mayday(struct work_struct *work)
                get_pwq(pwq);
                list_add_tail(&pwq->mayday_node, &wq->maydays);
                wake_up_process(wq->rescuer->task);
+               pwq->stats[PWQ_STAT_MAYDAY]++;
        }
 }
 
@@ -2300,7 +2491,6 @@ __acquires(&pool->lock)
 {
        struct pool_workqueue *pwq = get_work_pwq(work);
        struct worker_pool *pool = worker->pool;
-       bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
        unsigned long work_data;
        struct worker *collision;
 #ifdef CONFIG_LOCKDEP
@@ -2337,6 +2527,7 @@ __acquires(&pool->lock)
        worker->current_work = work;
        worker->current_func = work->func;
        worker->current_pwq = pwq;
+       worker->current_at = worker->task->se.sum_exec_runtime;
        work_data = *work_data_bits(work);
        worker->current_color = get_work_color(work_data);
 
@@ -2354,7 +2545,7 @@ __acquires(&pool->lock)
         * of concurrency management and the next code block will chain
         * execution of the pending work items.
         */
-       if (unlikely(cpu_intensive))
+       if (unlikely(pwq->wq->flags & WQ_CPU_INTENSIVE))
                worker_set_flags(worker, WORKER_CPU_INTENSIVE);
 
        /*
@@ -2401,6 +2592,7 @@ __acquires(&pool->lock)
         * workqueues), so hiding them isn't a problem.
         */
        lockdep_invariant_state(true);
+       pwq->stats[PWQ_STAT_STARTED]++;
        trace_workqueue_execute_start(work);
        worker->current_func(work);
        /*
@@ -2408,6 +2600,7 @@ __acquires(&pool->lock)
         * point will only record its address.
         */
        trace_workqueue_execute_end(work, worker->current_func);
+       pwq->stats[PWQ_STAT_COMPLETED]++;
        lock_map_release(&lockdep_map);
        lock_map_release(&pwq->wq->lockdep_map);
 
@@ -2432,9 +2625,12 @@ __acquires(&pool->lock)
 
        raw_spin_lock_irq(&pool->lock);
 
-       /* clear cpu intensive status */
-       if (unlikely(cpu_intensive))
-               worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
+       /*
+        * In addition to %WQ_CPU_INTENSIVE, @worker may also have been marked
+        * CPU intensive by wq_worker_tick() if @work hogged CPU longer than
+        * wq_cpu_intensive_thresh_us. Clear it.
+        */
+       worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
 
        /* tag the worker for identification in schedule() */
        worker->last_func = worker->current_func;
@@ -2651,6 +2847,7 @@ repeat:
                                if (first)
                                        pool->watchdog_ts = jiffies;
                                move_linked_works(work, scheduled, &n);
+                               pwq->stats[PWQ_STAT_RESCUED]++;
                        }
                        first = false;
                }
index e00b120..6b1d66e 100644 (file)
@@ -28,13 +28,18 @@ struct worker {
                struct hlist_node       hentry; /* L: while busy */
        };
 
-       struct work_struct      *current_work;  /* L: work being processed */
-       work_func_t             current_func;   /* L: current_work's fn */
-       struct pool_workqueue   *current_pwq;   /* L: current_work's pwq */
-       unsigned int            current_color;  /* L: current_work's color */
-       struct list_head        scheduled;      /* L: scheduled works */
+       struct work_struct      *current_work;  /* K: work being processed and its */
+       work_func_t             current_func;   /* K: function */
+       struct pool_workqueue   *current_pwq;   /* K: pwq */
+       u64                     current_at;     /* K: runtime at start or last wakeup */
+       unsigned int            current_color;  /* K: color */
+
+       int                     sleeping;       /* S: is worker sleeping? */
 
-       /* 64 bytes boundary on 64bit, 32 on 32bit */
+       /* used by the scheduler to determine a worker's last known identity */
+       work_func_t             last_func;      /* K: last work's fn */
+
+       struct list_head        scheduled;      /* L: scheduled works */
 
        struct task_struct      *task;          /* I: worker task */
        struct worker_pool      *pool;          /* A: the associated pool */
@@ -42,10 +47,9 @@ struct worker {
        struct list_head        node;           /* A: anchored at pool->workers */
                                                /* A: runs through worker->node */
 
-       unsigned long           last_active;    /* L: last active timestamp */
+       unsigned long           last_active;    /* K: last active timestamp */
        unsigned int            flags;          /* X: flags */
        int                     id;             /* I: worker id */
-       int                     sleeping;       /* None */
 
        /*
         * Opaque string set with work_set_desc().  Printed out with task
@@ -55,9 +59,6 @@ struct worker {
 
        /* used only by rescuers to point to the target workqueue */
        struct workqueue_struct *rescue_wq;     /* I: the workqueue to rescue */
-
-       /* used by the scheduler to determine a worker's last known identity */
-       work_func_t             last_func;
 };
 
 /**
@@ -76,6 +77,7 @@ static inline struct worker *current_wq_worker(void)
  */
 void wq_worker_running(struct task_struct *task);
 void wq_worker_sleeping(struct task_struct *task);
+void wq_worker_tick(struct task_struct *task);
 work_func_t wq_worker_last_func(struct task_struct *task);
 
 #endif /* _KERNEL_WORKQUEUE_INTERNAL_H */
index ce51d4d..1d5c3bc 100644 (file)
@@ -1134,6 +1134,19 @@ config WQ_WATCHDOG
          state.  This can be configured through kernel parameter
          "workqueue.watchdog_thresh" and its sysfs counterpart.
 
+config WQ_CPU_INTENSIVE_REPORT
+       bool "Report per-cpu work items which hog CPU for too long"
+       depends on DEBUG_KERNEL
+       help
+         Say Y here to enable reporting of concurrency-managed per-cpu work
+         items that hog CPUs for longer than
+         workqueue.cpu_intensive_thresh_us. Workqueue automatically
+         detects and excludes them from concurrency management to prevent
+         them from stalling other per-cpu work items. Occasional
+         triggering may not necessarily indicate a problem. Repeated
+         triggering likely indicates that the work item should be switched
+         to use an unbound workqueue.
+
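Given the printk() format in wq_cpu_intensive_report(), a report enabled by this option looks roughly like the line below; the work function name is invented purely for illustration:

	workqueue: example_work_fn hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND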
 config TEST_LOCKUP
        tristate "Test module to generate lockups"
        depends on m
@@ -2453,6 +2466,23 @@ config BITFIELD_KUNIT
 
          If unsure, say N.
 
+config CHECKSUM_KUNIT
+       tristate "KUnit test checksum functions at runtime" if !KUNIT_ALL_TESTS
+       depends on KUNIT
+       default KUNIT_ALL_TESTS
+       help
+         Enable this option to test the checksum functions at boot.
+
+         KUnit tests run during boot and output the results to the debug log
+         in TAP format (http://testanything.org/). Only useful for kernel devs
+         running the KUnit test harness, and not intended for inclusion into a
+         production build.
+
+         For more information on KUnit and unit tests in general please refer
+         to the KUnit documentation in Documentation/dev-tools/kunit/.
+
+         If unsure, say N.
+
 config HASH_KUNIT_TEST
        tristate "KUnit Test for integer hash functions" if !KUNIT_ALL_TESTS
        depends on KUNIT
index 876fcde..cd37ec1 100644 (file)
@@ -377,6 +377,7 @@ obj-$(CONFIG_PLDMFW) += pldmfw/
 # KUnit tests
 CFLAGS_bitfield_kunit.o := $(DISABLE_STRUCTLEAK_PLUGIN)
 obj-$(CONFIG_BITFIELD_KUNIT) += bitfield_kunit.o
+obj-$(CONFIG_CHECKSUM_KUNIT) += checksum_kunit.o
 obj-$(CONFIG_LIST_KUNIT_TEST) += list-test.o
 obj-$(CONFIG_HASHTABLE_KUNIT_TEST) += hashtable_test.o
 obj-$(CONFIG_LINEAR_RANGES_TEST) += test_linear_ranges.o
diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
new file mode 100644 (file)
index 0000000..ace3c47
--- /dev/null
@@ -0,0 +1,334 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Test cases for csum_partial and csum_fold
+ */
+
+#include <kunit/test.h>
+#include <asm/checksum.h>
+
+#define MAX_LEN 512
+#define MAX_ALIGN 64
+#define TEST_BUFLEN (MAX_LEN + MAX_ALIGN)
+
+static const __wsum random_init_sum = 0x2847aab;
+static const u8 random_buf[] = {
+       0xac, 0xd7, 0x76, 0x69, 0x6e, 0xf2, 0x93, 0x2c, 0x1f, 0xe0, 0xde, 0x86,
+       0x8f, 0x54, 0x33, 0x90, 0x95, 0xbf, 0xff, 0xb9, 0xea, 0x62, 0x6e, 0xb5,
+       0xd3, 0x4f, 0xf5, 0x60, 0x50, 0x5c, 0xc7, 0xfa, 0x6d, 0x1a, 0xc7, 0xf0,
+       0xd2, 0x2c, 0x12, 0x3d, 0x88, 0xe3, 0x14, 0x21, 0xb1, 0x5e, 0x45, 0x31,
+       0xa2, 0x85, 0x36, 0x76, 0xba, 0xd8, 0xad, 0xbb, 0x9e, 0x49, 0x8f, 0xf7,
+       0xce, 0xea, 0xef, 0xca, 0x2c, 0x29, 0xf7, 0x15, 0x5c, 0x1d, 0x4d, 0x09,
+       0x1f, 0xe2, 0x14, 0x31, 0x8c, 0x07, 0x57, 0x23, 0x1f, 0x6f, 0x03, 0xe1,
+       0x93, 0x19, 0x53, 0x03, 0x45, 0x49, 0x9a, 0x3b, 0x8e, 0x0c, 0x12, 0x5d,
+       0x8a, 0xb8, 0x9b, 0x8c, 0x9a, 0x03, 0xe5, 0xa2, 0x43, 0xd2, 0x3b, 0x4e,
+       0x7e, 0x30, 0x3c, 0x22, 0x2d, 0xc5, 0xfc, 0x9e, 0xdb, 0xc6, 0xf9, 0x69,
+       0x12, 0x39, 0x1f, 0xa0, 0x11, 0x0c, 0x3f, 0xf5, 0x53, 0xc9, 0x30, 0xfb,
+       0xb0, 0xdd, 0x21, 0x1d, 0x34, 0xe2, 0x65, 0x30, 0xf1, 0xe8, 0x1b, 0xe7,
+       0x55, 0x0d, 0xeb, 0xbd, 0xcc, 0x9d, 0x24, 0xa4, 0xad, 0xa7, 0x93, 0x47,
+       0x19, 0x2e, 0xc4, 0x5c, 0x3b, 0xc7, 0x6d, 0x95, 0x0c, 0x47, 0x60, 0xaf,
+       0x5b, 0x47, 0xee, 0xdc, 0x31, 0x31, 0x14, 0x12, 0x7e, 0x9e, 0x45, 0xb1,
+       0xc1, 0x69, 0x4b, 0x84, 0xfc, 0x88, 0xc1, 0x9e, 0x46, 0xb4, 0xc2, 0x25,
+       0xc5, 0x6c, 0x4c, 0x22, 0x58, 0x5c, 0xbe, 0xff, 0xea, 0x88, 0x88, 0x7a,
+       0xcb, 0x1c, 0x5d, 0x63, 0xa1, 0xf2, 0x33, 0x0c, 0xa2, 0x16, 0x0b, 0x6e,
+       0x2b, 0x79, 0x58, 0xf7, 0xac, 0xd3, 0x6a, 0x3f, 0x81, 0x57, 0x48, 0x45,
+       0xe3, 0x7c, 0xdc, 0xd6, 0x34, 0x7e, 0xe6, 0x73, 0xfa, 0xcb, 0x31, 0x18,
+       0xa9, 0x0b, 0xee, 0x6b, 0x99, 0xb9, 0x2d, 0xde, 0x22, 0x0e, 0x71, 0x57,
+       0x0e, 0x9b, 0x11, 0xd1, 0x15, 0x41, 0xd0, 0x6b, 0x50, 0x8a, 0x23, 0x64,
+       0xe3, 0x9c, 0xb3, 0x55, 0x09, 0xe9, 0x32, 0x67, 0xf9, 0xe0, 0x73, 0xf1,
+       0x60, 0x66, 0x0b, 0x88, 0x79, 0x8d, 0x4b, 0x52, 0x83, 0x20, 0x26, 0x78,
+       0x49, 0x27, 0xe7, 0x3e, 0x29, 0xa8, 0x18, 0x82, 0x41, 0xdd, 0x1e, 0xcc,
+       0x3b, 0xc4, 0x65, 0xd1, 0x21, 0x40, 0x72, 0xb2, 0x87, 0x5e, 0x16, 0x10,
+       0x80, 0x3f, 0x4b, 0x58, 0x1c, 0xc2, 0x79, 0x20, 0xf0, 0xe0, 0x80, 0xd3,
+       0x52, 0xa5, 0x19, 0x6e, 0x47, 0x90, 0x08, 0xf5, 0x50, 0xe2, 0xd6, 0xae,
+       0xe9, 0x2e, 0xdc, 0xd5, 0xb4, 0x90, 0x1f, 0x79, 0x49, 0x82, 0x21, 0x84,
+       0xa0, 0xb5, 0x2f, 0xff, 0x30, 0x71, 0xed, 0x80, 0x68, 0xb1, 0x6d, 0xef,
+       0xf6, 0xcf, 0xb8, 0x41, 0x79, 0xf5, 0x01, 0xbc, 0x0c, 0x9b, 0x0e, 0x06,
+       0xf3, 0xb0, 0xbb, 0x97, 0xb8, 0xb1, 0xfd, 0x51, 0x4e, 0xef, 0x0a, 0x3d,
+       0x7a, 0x3d, 0xbd, 0x61, 0x00, 0xa2, 0xb3, 0xf0, 0x1d, 0x77, 0x7b, 0x6c,
+       0x01, 0x61, 0xa5, 0xa3, 0xdb, 0xd5, 0xd5, 0xf4, 0xb5, 0x28, 0x9f, 0x0a,
+       0xa3, 0x82, 0x5f, 0x4b, 0x40, 0x0f, 0x05, 0x0e, 0x78, 0xed, 0xbf, 0x17,
+       0xf6, 0x5a, 0x8a, 0x7d, 0xf9, 0x45, 0xc1, 0xd7, 0x1b, 0x9d, 0x6c, 0x07,
+       0x88, 0xf3, 0xbc, 0xf1, 0xea, 0x28, 0x1f, 0xb8, 0x7a, 0x60, 0x3c, 0xce,
+       0x3e, 0x50, 0xb2, 0x0b, 0xcf, 0xe5, 0x08, 0x1f, 0x48, 0x04, 0xf9, 0x35,
+       0x29, 0x15, 0xbe, 0x82, 0x96, 0xc2, 0x55, 0x04, 0x6c, 0x19, 0x45, 0x29,
+       0x0b, 0xb6, 0x49, 0x12, 0xfb, 0x8d, 0x1b, 0x75, 0x8b, 0xd9, 0x6a, 0x5c,
+       0xbe, 0x46, 0x2b, 0x41, 0xfe, 0x21, 0xad, 0x1f, 0x75, 0xe7, 0x90, 0x3d,
+       0xe1, 0xdf, 0x4b, 0xe1, 0x81, 0xe2, 0x17, 0x02, 0x7b, 0x58, 0x8b, 0x92,
+       0x1a, 0xac, 0x46, 0xdd, 0x2e, 0xce, 0x40, 0x09
+};
+static const __sum16 expected_results[] = {
+       0x82d0, 0x8224, 0xab23, 0xaaad, 0x41ad, 0x413f, 0x4f3e, 0x4eab, 0x22ab,
+       0x228c, 0x428b, 0x41ad, 0xbbac, 0xbb1d, 0x671d, 0x66ea, 0xd6e9, 0xd654,
+       0x1754, 0x1655, 0x5d54, 0x5c6a, 0xfa69, 0xf9fb, 0x44fb, 0x4428, 0xf527,
+       0xf432, 0x9432, 0x93e2, 0x37e2, 0x371b, 0x3d1a, 0x3cad, 0x22ad, 0x21e6,
+       0x31e5, 0x3113, 0x0513, 0x0501, 0xc800, 0xc778, 0xe477, 0xe463, 0xc363,
+       0xc2b2, 0x64b2, 0x646d, 0x336d, 0x32cb, 0xadca, 0xad94, 0x3794, 0x36da,
+       0x5ed9, 0x5e2c, 0xa32b, 0xa28d, 0x598d, 0x58fe, 0x61fd, 0x612f, 0x772e,
+       0x763f, 0xac3e, 0xac12, 0x8312, 0x821b, 0x6d1b, 0x6cbf, 0x4fbf, 0x4f72,
+       0x4672, 0x4653, 0x6452, 0x643e, 0x333e, 0x32b2, 0x2bb2, 0x2b5b, 0x085b,
+       0x083c, 0x993b, 0x9938, 0xb837, 0xb7a4, 0x9ea4, 0x9e51, 0x9b51, 0x9b0c,
+       0x520c, 0x5172, 0x1672, 0x15e4, 0x09e4, 0x09d2, 0xacd1, 0xac47, 0xf446,
+       0xf3ab, 0x67ab, 0x6711, 0x6411, 0x632c, 0xc12b, 0xc0e8, 0xeee7, 0xeeac,
+       0xa0ac, 0xa02e, 0x702e, 0x6ff2, 0x4df2, 0x4dc5, 0x88c4, 0x87c8, 0xe9c7,
+       0xe8ec, 0x22ec, 0x21f3, 0xb8f2, 0xb8e0, 0x7fe0, 0x7fc1, 0xdfc0, 0xdfaf,
+       0xd3af, 0xd370, 0xde6f, 0xde1c, 0x151c, 0x14ec, 0x19eb, 0x193b, 0x3c3a,
+       0x3c19, 0x1f19, 0x1ee5, 0x3ce4, 0x3c7f, 0x0c7f, 0x0b8e, 0x238d, 0x2372,
+       0x3c71, 0x3c1c, 0x2f1c, 0x2e31, 0x7130, 0x7064, 0xd363, 0xd33f, 0x2f3f,
+       0x2e92, 0x8791, 0x86fe, 0x3ffe, 0x3fe5, 0x11e5, 0x1121, 0xb520, 0xb4e5,
+       0xede4, 0xed77, 0x5877, 0x586b, 0x116b, 0x110b, 0x620a, 0x61af, 0x1aaf,
+       0x19c1, 0x3dc0, 0x3d8f, 0x0c8f, 0x0c7b, 0xfa7a, 0xf9fc, 0x5bfc, 0x5bb7,
+       0xaab6, 0xa9f5, 0x40f5, 0x40aa, 0xbca9, 0xbbad, 0x33ad, 0x32ec, 0x94eb,
+       0x94a5, 0xe0a4, 0xdfe2, 0xbae2, 0xba1d, 0x4e1d, 0x4dd1, 0x2bd1, 0x2b79,
+       0xcf78, 0xceba, 0xcfb9, 0xcecf, 0x46cf, 0x4647, 0xcc46, 0xcb7b, 0xaf7b,
+       0xaf1e, 0x4c1e, 0x4b7d, 0x597c, 0x5949, 0x4d49, 0x4ca7, 0x36a7, 0x369c,
+       0xc89b, 0xc870, 0x4f70, 0x4f18, 0x5817, 0x576b, 0x846a, 0x8400, 0x4500,
+       0x447f, 0xed7e, 0xed36, 0xa836, 0xa753, 0x2b53, 0x2a77, 0x5476, 0x5442,
+       0xd641, 0xd55b, 0x625b, 0x6161, 0x9660, 0x962f, 0x7e2f, 0x7d86, 0x7286,
+       0x7198, 0x0698, 0x05ff, 0x4cfe, 0x4cd1, 0x6ed0, 0x6eae, 0x60ae, 0x603d,
+       0x093d, 0x092f, 0x6e2e, 0x6e1d, 0x9d1c, 0x9d07, 0x5c07, 0x5b37, 0xf036,
+       0xefe6, 0x65e6, 0x65c3, 0x01c3, 0x00e0, 0x64df, 0x642c, 0x0f2c, 0x0f23,
+       0x2622, 0x25f0, 0xbeef, 0xbdf6, 0xddf5, 0xdd82, 0xec81, 0xec21, 0x8621,
+       0x8616, 0xfe15, 0xfd9c, 0x709c, 0x7051, 0x1e51, 0x1dce, 0xfdcd, 0xfda7,
+       0x85a7, 0x855e, 0x5e5e, 0x5d77, 0x1f77, 0x1f4e, 0x774d, 0x7735, 0xf534,
+       0xf4f3, 0x17f3, 0x17d5, 0x4bd4, 0x4b99, 0x8798, 0x8733, 0xb632, 0xb611,
+       0x7611, 0x759f, 0xc39e, 0xc317, 0x6517, 0x6501, 0x5501, 0x5481, 0x1581,
+       0x1536, 0xbd35, 0xbd19, 0xfb18, 0xfa9f, 0xda9f, 0xd9af, 0xf9ae, 0xf92e,
+       0x262e, 0x25dc, 0x80db, 0x80c2, 0x12c2, 0x127b, 0x827a, 0x8272, 0x8d71,
+       0x8d21, 0xab20, 0xaa4a, 0xfc49, 0xfb60, 0xcd60, 0xcc84, 0xf783, 0xf6cf,
+       0x66cf, 0x66b0, 0xedaf, 0xed66, 0x6b66, 0x6b45, 0xe744, 0xe6a4, 0x31a4,
+       0x3175, 0x3274, 0x3244, 0xc143, 0xc056, 0x4056, 0x3fee, 0x8eed, 0x8e80,
+       0x9f7f, 0x9e89, 0xcf88, 0xced0, 0x8dd0, 0x8d57, 0x9856, 0x9855, 0xdc54,
+       0xdc48, 0x4148, 0x413a, 0x3b3a, 0x3a47, 0x8a46, 0x898b, 0xf28a, 0xf1d2,
+       0x40d2, 0x3fd5, 0xeed4, 0xee86, 0xff85, 0xff7b, 0xc27b, 0xc201, 0x8501,
+       0x8444, 0x2344, 0x2344, 0x8143, 0x8090, 0x908f, 0x9072, 0x1972, 0x18f7,
+       0xacf6, 0xacf5, 0x4bf5, 0x4b50, 0xa84f, 0xa774, 0xd273, 0xd19e, 0xdd9d,
+       0xdce8, 0xb4e8, 0xb449, 0xaa49, 0xa9a6, 0x27a6, 0x2747, 0xdc46, 0xdc06,
+       0xcd06, 0xcd01, 0xbf01, 0xbe89, 0xd188, 0xd0c9, 0xb9c9, 0xb8d3, 0x5ed3,
+       0x5e49, 0xe148, 0xe04f, 0x9b4f, 0x9a8e, 0xc38d, 0xc372, 0x2672, 0x2606,
+       0x1f06, 0x1e7e, 0x2b7d, 0x2ac1, 0x39c0, 0x38d6, 0x10d6, 0x10b7, 0x58b6,
+       0x583c, 0xf83b, 0xf7ff, 0x29ff, 0x29c1, 0xd9c0, 0xd90e, 0xce0e, 0xcd3f,
+       0xe83e, 0xe836, 0xc936, 0xc8ee, 0xc4ee, 0xc3f5, 0x8ef5, 0x8ecc, 0x79cc,
+       0x790e, 0xf70d, 0xf677, 0x3477, 0x3422, 0x3022, 0x2fb6, 0x16b6, 0x1671,
+       0xed70, 0xed65, 0x3765, 0x371c, 0x251c, 0x2421, 0x9720, 0x9705, 0x2205,
+       0x217a, 0x4879, 0x480f, 0xec0e, 0xeb50, 0xa550, 0xa525, 0x6425, 0x6327,
+       0x4227, 0x417a, 0x227a, 0x2205, 0x3b04, 0x3a74, 0xfd73, 0xfc92, 0x1d92,
+       0x1d47, 0x3c46, 0x3bc5, 0x59c4, 0x59ad, 0x57ad, 0x5732, 0xff31, 0xfea6,
+       0x6ca6, 0x6c8c, 0xc08b, 0xc045, 0xe344, 0xe316, 0x1516, 0x14d6,
+};
+static const __wsum init_sums_no_overflow[] = {
+       0xffffffff, 0xfffffffb, 0xfffffbfb, 0xfffffbf7, 0xfffff7f7, 0xfffff7f3,
+       0xfffff3f3, 0xfffff3ef, 0xffffefef, 0xffffefeb, 0xffffebeb, 0xffffebe7,
+       0xffffe7e7, 0xffffe7e3, 0xffffe3e3, 0xffffe3df, 0xffffdfdf, 0xffffdfdb,
+       0xffffdbdb, 0xffffdbd7, 0xffffd7d7, 0xffffd7d3, 0xffffd3d3, 0xffffd3cf,
+       0xffffcfcf, 0xffffcfcb, 0xffffcbcb, 0xffffcbc7, 0xffffc7c7, 0xffffc7c3,
+       0xffffc3c3, 0xffffc3bf, 0xffffbfbf, 0xffffbfbb, 0xffffbbbb, 0xffffbbb7,
+       0xffffb7b7, 0xffffb7b3, 0xffffb3b3, 0xffffb3af, 0xffffafaf, 0xffffafab,
+       0xffffabab, 0xffffaba7, 0xffffa7a7, 0xffffa7a3, 0xffffa3a3, 0xffffa39f,
+       0xffff9f9f, 0xffff9f9b, 0xffff9b9b, 0xffff9b97, 0xffff9797, 0xffff9793,
+       0xffff9393, 0xffff938f, 0xffff8f8f, 0xffff8f8b, 0xffff8b8b, 0xffff8b87,
+       0xffff8787, 0xffff8783, 0xffff8383, 0xffff837f, 0xffff7f7f, 0xffff7f7b,
+       0xffff7b7b, 0xffff7b77, 0xffff7777, 0xffff7773, 0xffff7373, 0xffff736f,
+       0xffff6f6f, 0xffff6f6b, 0xffff6b6b, 0xffff6b67, 0xffff6767, 0xffff6763,
+       0xffff6363, 0xffff635f, 0xffff5f5f, 0xffff5f5b, 0xffff5b5b, 0xffff5b57,
+       0xffff5757, 0xffff5753, 0xffff5353, 0xffff534f, 0xffff4f4f, 0xffff4f4b,
+       0xffff4b4b, 0xffff4b47, 0xffff4747, 0xffff4743, 0xffff4343, 0xffff433f,
+       0xffff3f3f, 0xffff3f3b, 0xffff3b3b, 0xffff3b37, 0xffff3737, 0xffff3733,
+       0xffff3333, 0xffff332f, 0xffff2f2f, 0xffff2f2b, 0xffff2b2b, 0xffff2b27,
+       0xffff2727, 0xffff2723, 0xffff2323, 0xffff231f, 0xffff1f1f, 0xffff1f1b,
+       0xffff1b1b, 0xffff1b17, 0xffff1717, 0xffff1713, 0xffff1313, 0xffff130f,
+       0xffff0f0f, 0xffff0f0b, 0xffff0b0b, 0xffff0b07, 0xffff0707, 0xffff0703,
+       0xffff0303, 0xffff02ff, 0xfffffefe, 0xfffffefa, 0xfffffafa, 0xfffffaf6,
+       0xfffff6f6, 0xfffff6f2, 0xfffff2f2, 0xfffff2ee, 0xffffeeee, 0xffffeeea,
+       0xffffeaea, 0xffffeae6, 0xffffe6e6, 0xffffe6e2, 0xffffe2e2, 0xffffe2de,
+       0xffffdede, 0xffffdeda, 0xffffdada, 0xffffdad6, 0xffffd6d6, 0xffffd6d2,
+       0xffffd2d2, 0xffffd2ce, 0xffffcece, 0xffffceca, 0xffffcaca, 0xffffcac6,
+       0xffffc6c6, 0xffffc6c2, 0xffffc2c2, 0xffffc2be, 0xffffbebe, 0xffffbeba,
+       0xffffbaba, 0xffffbab6, 0xffffb6b6, 0xffffb6b2, 0xffffb2b2, 0xffffb2ae,
+       0xffffaeae, 0xffffaeaa, 0xffffaaaa, 0xffffaaa6, 0xffffa6a6, 0xffffa6a2,
+       0xffffa2a2, 0xffffa29e, 0xffff9e9e, 0xffff9e9a, 0xffff9a9a, 0xffff9a96,
+       0xffff9696, 0xffff9692, 0xffff9292, 0xffff928e, 0xffff8e8e, 0xffff8e8a,
+       0xffff8a8a, 0xffff8a86, 0xffff8686, 0xffff8682, 0xffff8282, 0xffff827e,
+       0xffff7e7e, 0xffff7e7a, 0xffff7a7a, 0xffff7a76, 0xffff7676, 0xffff7672,
+       0xffff7272, 0xffff726e, 0xffff6e6e, 0xffff6e6a, 0xffff6a6a, 0xffff6a66,
+       0xffff6666, 0xffff6662, 0xffff6262, 0xffff625e, 0xffff5e5e, 0xffff5e5a,
+       0xffff5a5a, 0xffff5a56, 0xffff5656, 0xffff5652, 0xffff5252, 0xffff524e,
+       0xffff4e4e, 0xffff4e4a, 0xffff4a4a, 0xffff4a46, 0xffff4646, 0xffff4642,
+       0xffff4242, 0xffff423e, 0xffff3e3e, 0xffff3e3a, 0xffff3a3a, 0xffff3a36,
+       0xffff3636, 0xffff3632, 0xffff3232, 0xffff322e, 0xffff2e2e, 0xffff2e2a,
+       0xffff2a2a, 0xffff2a26, 0xffff2626, 0xffff2622, 0xffff2222, 0xffff221e,
+       0xffff1e1e, 0xffff1e1a, 0xffff1a1a, 0xffff1a16, 0xffff1616, 0xffff1612,
+       0xffff1212, 0xffff120e, 0xffff0e0e, 0xffff0e0a, 0xffff0a0a, 0xffff0a06,
+       0xffff0606, 0xffff0602, 0xffff0202, 0xffff01fe, 0xfffffdfd, 0xfffffdf9,
+       0xfffff9f9, 0xfffff9f5, 0xfffff5f5, 0xfffff5f1, 0xfffff1f1, 0xfffff1ed,
+       0xffffeded, 0xffffede9, 0xffffe9e9, 0xffffe9e5, 0xffffe5e5, 0xffffe5e1,
+       0xffffe1e1, 0xffffe1dd, 0xffffdddd, 0xffffddd9, 0xffffd9d9, 0xffffd9d5,
+       0xffffd5d5, 0xffffd5d1, 0xffffd1d1, 0xffffd1cd, 0xffffcdcd, 0xffffcdc9,
+       0xffffc9c9, 0xffffc9c5, 0xffffc5c5, 0xffffc5c1, 0xffffc1c1, 0xffffc1bd,
+       0xffffbdbd, 0xffffbdb9, 0xffffb9b9, 0xffffb9b5, 0xffffb5b5, 0xffffb5b1,
+       0xffffb1b1, 0xffffb1ad, 0xffffadad, 0xffffada9, 0xffffa9a9, 0xffffa9a5,
+       0xffffa5a5, 0xffffa5a1, 0xffffa1a1, 0xffffa19d, 0xffff9d9d, 0xffff9d99,
+       0xffff9999, 0xffff9995, 0xffff9595, 0xffff9591, 0xffff9191, 0xffff918d,
+       0xffff8d8d, 0xffff8d89, 0xffff8989, 0xffff8985, 0xffff8585, 0xffff8581,
+       0xffff8181, 0xffff817d, 0xffff7d7d, 0xffff7d79, 0xffff7979, 0xffff7975,
+       0xffff7575, 0xffff7571, 0xffff7171, 0xffff716d, 0xffff6d6d, 0xffff6d69,
+       0xffff6969, 0xffff6965, 0xffff6565, 0xffff6561, 0xffff6161, 0xffff615d,
+       0xffff5d5d, 0xffff5d59, 0xffff5959, 0xffff5955, 0xffff5555, 0xffff5551,
+       0xffff5151, 0xffff514d, 0xffff4d4d, 0xffff4d49, 0xffff4949, 0xffff4945,
+       0xffff4545, 0xffff4541, 0xffff4141, 0xffff413d, 0xffff3d3d, 0xffff3d39,
+       0xffff3939, 0xffff3935, 0xffff3535, 0xffff3531, 0xffff3131, 0xffff312d,
+       0xffff2d2d, 0xffff2d29, 0xffff2929, 0xffff2925, 0xffff2525, 0xffff2521,
+       0xffff2121, 0xffff211d, 0xffff1d1d, 0xffff1d19, 0xffff1919, 0xffff1915,
+       0xffff1515, 0xffff1511, 0xffff1111, 0xffff110d, 0xffff0d0d, 0xffff0d09,
+       0xffff0909, 0xffff0905, 0xffff0505, 0xffff0501, 0xffff0101, 0xffff00fd,
+       0xfffffcfc, 0xfffffcf8, 0xfffff8f8, 0xfffff8f4, 0xfffff4f4, 0xfffff4f0,
+       0xfffff0f0, 0xfffff0ec, 0xffffecec, 0xffffece8, 0xffffe8e8, 0xffffe8e4,
+       0xffffe4e4, 0xffffe4e0, 0xffffe0e0, 0xffffe0dc, 0xffffdcdc, 0xffffdcd8,
+       0xffffd8d8, 0xffffd8d4, 0xffffd4d4, 0xffffd4d0, 0xffffd0d0, 0xffffd0cc,
+       0xffffcccc, 0xffffccc8, 0xffffc8c8, 0xffffc8c4, 0xffffc4c4, 0xffffc4c0,
+       0xffffc0c0, 0xffffc0bc, 0xffffbcbc, 0xffffbcb8, 0xffffb8b8, 0xffffb8b4,
+       0xffffb4b4, 0xffffb4b0, 0xffffb0b0, 0xffffb0ac, 0xffffacac, 0xffffaca8,
+       0xffffa8a8, 0xffffa8a4, 0xffffa4a4, 0xffffa4a0, 0xffffa0a0, 0xffffa09c,
+       0xffff9c9c, 0xffff9c98, 0xffff9898, 0xffff9894, 0xffff9494, 0xffff9490,
+       0xffff9090, 0xffff908c, 0xffff8c8c, 0xffff8c88, 0xffff8888, 0xffff8884,
+       0xffff8484, 0xffff8480, 0xffff8080, 0xffff807c, 0xffff7c7c, 0xffff7c78,
+       0xffff7878, 0xffff7874, 0xffff7474, 0xffff7470, 0xffff7070, 0xffff706c,
+       0xffff6c6c, 0xffff6c68, 0xffff6868, 0xffff6864, 0xffff6464, 0xffff6460,
+       0xffff6060, 0xffff605c, 0xffff5c5c, 0xffff5c58, 0xffff5858, 0xffff5854,
+       0xffff5454, 0xffff5450, 0xffff5050, 0xffff504c, 0xffff4c4c, 0xffff4c48,
+       0xffff4848, 0xffff4844, 0xffff4444, 0xffff4440, 0xffff4040, 0xffff403c,
+       0xffff3c3c, 0xffff3c38, 0xffff3838, 0xffff3834, 0xffff3434, 0xffff3430,
+       0xffff3030, 0xffff302c, 0xffff2c2c, 0xffff2c28, 0xffff2828, 0xffff2824,
+       0xffff2424, 0xffff2420, 0xffff2020, 0xffff201c, 0xffff1c1c, 0xffff1c18,
+       0xffff1818, 0xffff1814, 0xffff1414, 0xffff1410, 0xffff1010, 0xffff100c,
+       0xffff0c0c, 0xffff0c08, 0xffff0808, 0xffff0804, 0xffff0404, 0xffff0400,
+       0xffff0000, 0xfffffffb,
+};
+
+static u8 tmp_buf[TEST_BUFLEN];
+
+#define full_csum(buff, len, sum) csum_fold(csum_partial(buff, len, sum))
+
+#define CHECK_EQ(lhs, rhs) KUNIT_ASSERT_EQ(test, lhs, rhs)
+
+static void assert_setup_correct(struct kunit *test)
+{
+       CHECK_EQ(sizeof(random_buf) / sizeof(random_buf[0]), MAX_LEN);
+       CHECK_EQ(sizeof(expected_results) / sizeof(expected_results[0]),
+                MAX_LEN);
+       CHECK_EQ(sizeof(init_sums_no_overflow) /
+                        sizeof(init_sums_no_overflow[0]),
+                MAX_LEN);
+}
+
+/*
+ * Test with randomized input (predetermined random data with known results).
+ */
+static void test_csum_fixed_random_inputs(struct kunit *test)
+{
+       int len, align;
+       __wsum result, expec, sum;
+
+       assert_setup_correct(test);
+       for (align = 0; align < TEST_BUFLEN; ++align) {
+               memcpy(&tmp_buf[align], random_buf,
+                      min(MAX_LEN, TEST_BUFLEN - align));
+               for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
+                    ++len) {
+                       /*
+                        * Test the precomputed random input.
+                        */
+                       sum = random_init_sum;
+                       result = full_csum(&tmp_buf[align], len, sum);
+                       expec = expected_results[len];
+                       CHECK_EQ(result, expec);
+               }
+       }
+}
+
+/*
+ * All ones input test. If there are any missing carry operations, it fails.
+ */
+static void test_csum_all_carry_inputs(struct kunit *test)
+{
+       int len, align;
+       __wsum result, expec, sum;
+
+       assert_setup_correct(test);
+       memset(tmp_buf, 0xff, TEST_BUFLEN);
+       for (align = 0; align < TEST_BUFLEN; ++align) {
+               for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
+                    ++len) {
+                       /*
+                        * All carries from input and initial sum.
+                        */
+                       sum = 0xffffffff;
+                       result = full_csum(&tmp_buf[align], len, sum);
+                       expec = (len & 1) ? 0xff00 : 0;
+                       CHECK_EQ(result, expec);
+
+                       /*
+                        * All carries from input.
+                        */
+                       sum = 0;
+                       result = full_csum(&tmp_buf[align], len, sum);
+                       if (len & 1)
+                               expec = 0xff00;
+                       else if (len)
+                               expec = 0;
+                       else
+                               expec = 0xffff;
+                       CHECK_EQ(result, expec);
+               }
+       }
+}
+
+/*
+ * Test with input that alone doesn't cause any carries. By selecting the
+ * maximum initial sum, this allows us to test that there are no carries
+ * where there shouldn't be.
+ */
+static void test_csum_no_carry_inputs(struct kunit *test)
+{
+       int len, align;
+       __wsum result, expec, sum;
+
+       assert_setup_correct(test);
+       memset(tmp_buf, 0x4, TEST_BUFLEN);
+       for (align = 0; align < TEST_BUFLEN; ++align) {
+               for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
+                    ++len) {
+                       /*
+                        * Expect no carries.
+                        */
+                       sum = init_sums_no_overflow[len];
+                       result = full_csum(&tmp_buf[align], len, sum);
+                       expec = 0;
+                       CHECK_EQ(result, expec);
+
+                       /*
+                        * Expect one carry.
+                        */
+                       sum = init_sums_no_overflow[len] + 1;
+                       result = full_csum(&tmp_buf[align], len, sum);
+                       expec = len ? 0xfffe : 0xffff;
+                       CHECK_EQ(result, expec);
+               }
+       }
+}
+
+static struct kunit_case __refdata checksum_test_cases[] = {
+       KUNIT_CASE(test_csum_fixed_random_inputs),
+       KUNIT_CASE(test_csum_all_carry_inputs),
+       KUNIT_CASE(test_csum_no_carry_inputs),
+       {}
+};
+
+static struct kunit_suite checksum_test_suite = {
+       .name = "checksum",
+       .test_cases = checksum_test_cases,
+};
+
+kunit_test_suites(&checksum_test_suite);
+
+MODULE_AUTHOR("Noah Goldstein <goldstein.w.n@gmail.com>");
+MODULE_LICENSE("GPL");
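The full_csum() macro above simply chains the two interfaces under test; outside the test, the same composition would read as the following sketch (kernel context and <asm/checksum.h> assumed):

	/* Sketch: 16-bit Internet-style checksum of a buffer, chaining
	 * csum_partial() (32-bit running sum) into csum_fold() (fold to 16 bits). */
	static __sum16 buffer_checksum(const void *buf, int len, __wsum initial)
	{
		return csum_fold(csum_partial(buf, len, initial));
	}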
index 73c1636..4c34867 100644 (file)
@@ -280,8 +280,8 @@ static void irq_cpu_rmap_release(struct kref *ref)
        struct irq_glue *glue =
                container_of(ref, struct irq_glue, notify.kref);
 
-       cpu_rmap_put(glue->rmap);
        glue->rmap->obj[glue->index] = NULL;
+       cpu_rmap_put(glue->rmap);
        kfree(glue);
 }
 
index 771d82d..c40e5d9 100644 (file)
@@ -14,8 +14,6 @@
 #include <crypto/curve25519.h>
 #include <linux/string.h>
 
-typedef __uint128_t u128;
-
 static __always_inline u64 u64_eq_mask(u64 a, u64 b)
 {
        u64 x = a ^ b;
index d34cf40..988702c 100644 (file)
@@ -10,8 +10,6 @@
 #include <asm/unaligned.h>
 #include <crypto/internal/poly1305.h>
 
-typedef __uint128_t u128;
-
 void poly1305_core_setkey(struct poly1305_core_key *key,
                          const u8 raw_key[POLY1305_BLOCK_SIZE])
 {
index 003edc5..a517256 100644 (file)
@@ -126,7 +126,7 @@ static const char *obj_states[ODEBUG_STATE_MAX] = {
 
 static void fill_pool(void)
 {
-       gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
+       gfp_t gfp = __GFP_HIGH | __GFP_NOWARN;
        struct debug_obj *obj;
        unsigned long flags;
 
@@ -498,6 +498,15 @@ static void debug_print_object(struct debug_obj *obj, char *msg)
        const struct debug_obj_descr *descr = obj->descr;
        static int limit;
 
+       /*
+        * Don't report if lookup_object_or_alloc() by the current thread
+        * failed because lookup_object_or_alloc()/debug_objects_oom() by a
+        * concurrent thread turned off debug_objects_enabled and cleared
+        * the hash buckets.
+        */
+       if (!debug_objects_enabled)
+               return;
+
        if (limit < 5 && descr != descr_test) {
                void *hint = descr->debug_hint ?
                        descr->debug_hint(obj->object) : NULL;
@@ -591,10 +600,21 @@ static void debug_objects_fill_pool(void)
 {
        /*
         * On RT enabled kernels the pool refill must happen in preemptible
-        * context:
+        * context -- for !RT kernels we rely on the fact that spinlock_t and
+        * raw_spinlock_t are basically the same type and this lock-type
+        * inversion works just fine.
         */
-       if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible())
+       if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
+               /*
+                * Annotate away the spinlock_t inside raw_spinlock_t warning
+                * by temporarily raising the wait-type to WAIT_SLEEP, matching
+                * the preemptible() condition above.
+                */
+               static DEFINE_WAIT_OVERRIDE_MAP(fill_pool_map, LD_WAIT_SLEEP);
+               lock_map_acquire_try(&fill_pool_map);
                fill_pool();
+               lock_map_release(&fill_pool_map);
+       }
 }
 
 static void
index 38045d6..e89aaf0 100644 (file)
@@ -54,7 +54,7 @@ void dim_park_tired(struct dim *dim)
 }
 EXPORT_SYMBOL(dim_park_tired);
 
-void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
+bool dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
                    struct dim_stats *curr_stats)
 {
        /* u32 holds up to 71 minutes, should be enough */
@@ -66,7 +66,7 @@ void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
                             start->comp_ctr);
 
        if (!delta_us)
-               return;
+               return false;
 
        curr_stats->ppms = DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
        curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
@@ -79,5 +79,6 @@ void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
        else
                curr_stats->cpe_ratio = 0;
 
+       return true;
 }
 EXPORT_SYMBOL(dim_calc_stats);
index 53f6b9c..4e32f7a 100644 (file)
@@ -227,7 +227,8 @@ void net_dim(struct dim *dim, struct dim_sample end_sample)
                                  dim->start_sample.event_ctr);
                if (nevents < DIM_NEVENTS)
                        break;
-               dim_calc_stats(&dim->start_sample, &end_sample, &curr_stats);
+               if (!dim_calc_stats(&dim->start_sample, &end_sample, &curr_stats))
+                       break;
                if (net_dim_decision(&curr_stats, dim)) {
                        dim->state = DIM_APPLY_NEW_PROFILE;
                        schedule_work(&dim->work);
index 15462d5..88f7794 100644 (file)
@@ -88,7 +88,8 @@ void rdma_dim(struct dim *dim, u64 completions)
                nevents = curr_sample->event_ctr - dim->start_sample.event_ctr;
                if (nevents < DIM_NEVENTS)
                        break;
-               dim_calc_stats(&dim->start_sample, curr_sample, &curr_stats);
+               if (!dim_calc_stats(&dim->start_sample, curr_sample, &curr_stats))
+                       break;
                if (rdma_dim_decision(&curr_stats, dim)) {
                        dim->state = DIM_APPLY_NEW_PROFILE;
                        schedule_work(&dim->work);
index 960223e..b667b1e 100644 (file)
@@ -14,8 +14,6 @@
 #include <linux/scatterlist.h>
 #include <linux/instrumented.h>
 
-#define PIPE_PARANOIA /* for now */
-
 /* covers ubuf and kbuf alike */
 #define iterate_buf(i, n, base, len, off, __p, STEP) {         \
        size_t __maybe_unused off = 0;                          \
@@ -198,150 +196,6 @@ static int copyin(void *to, const void __user *from, size_t n)
        return res;
 }
 
-#ifdef PIPE_PARANOIA
-static bool sanity(const struct iov_iter *i)
-{
-       struct pipe_inode_info *pipe = i->pipe;
-       unsigned int p_head = pipe->head;
-       unsigned int p_tail = pipe->tail;
-       unsigned int p_occupancy = pipe_occupancy(p_head, p_tail);
-       unsigned int i_head = i->head;
-       unsigned int idx;
-
-       if (i->last_offset) {
-               struct pipe_buffer *p;
-               if (unlikely(p_occupancy == 0))
-                       goto Bad;       // pipe must be non-empty
-               if (unlikely(i_head != p_head - 1))
-                       goto Bad;       // must be at the last buffer...
-
-               p = pipe_buf(pipe, i_head);
-               if (unlikely(p->offset + p->len != abs(i->last_offset)))
-                       goto Bad;       // ... at the end of segment
-       } else {
-               if (i_head != p_head)
-                       goto Bad;       // must be right after the last buffer
-       }
-       return true;
-Bad:
-       printk(KERN_ERR "idx = %d, offset = %d\n", i_head, i->last_offset);
-       printk(KERN_ERR "head = %d, tail = %d, buffers = %d\n",
-                       p_head, p_tail, pipe->ring_size);
-       for (idx = 0; idx < pipe->ring_size; idx++)
-               printk(KERN_ERR "[%p %p %d %d]\n",
-                       pipe->bufs[idx].ops,
-                       pipe->bufs[idx].page,
-                       pipe->bufs[idx].offset,
-                       pipe->bufs[idx].len);
-       WARN_ON(1);
-       return false;
-}
-#else
-#define sanity(i) true
-#endif
-
-static struct page *push_anon(struct pipe_inode_info *pipe, unsigned size)
-{
-       struct page *page = alloc_page(GFP_USER);
-       if (page) {
-               struct pipe_buffer *buf = pipe_buf(pipe, pipe->head++);
-               *buf = (struct pipe_buffer) {
-                       .ops = &default_pipe_buf_ops,
-                       .page = page,
-                       .offset = 0,
-                       .len = size
-               };
-       }
-       return page;
-}
-
-static void push_page(struct pipe_inode_info *pipe, struct page *page,
-                       unsigned int offset, unsigned int size)
-{
-       struct pipe_buffer *buf = pipe_buf(pipe, pipe->head++);
-       *buf = (struct pipe_buffer) {
-               .ops = &page_cache_pipe_buf_ops,
-               .page = page,
-               .offset = offset,
-               .len = size
-       };
-       get_page(page);
-}
-
-static inline int last_offset(const struct pipe_buffer *buf)
-{
-       if (buf->ops == &default_pipe_buf_ops)
-               return buf->len;        // buf->offset is 0 for those
-       else
-               return -(buf->offset + buf->len);
-}
-
-static struct page *append_pipe(struct iov_iter *i, size_t size,
-                               unsigned int *off)
-{
-       struct pipe_inode_info *pipe = i->pipe;
-       int offset = i->last_offset;
-       struct pipe_buffer *buf;
-       struct page *page;
-
-       if (offset > 0 && offset < PAGE_SIZE) {
-               // some space in the last buffer; add to it
-               buf = pipe_buf(pipe, pipe->head - 1);
-               size = min_t(size_t, size, PAGE_SIZE - offset);
-               buf->len += size;
-               i->last_offset += size;
-               i->count -= size;
-               *off = offset;
-               return buf->page;
-       }
-       // OK, we need a new buffer
-       *off = 0;
-       size = min_t(size_t, size, PAGE_SIZE);
-       if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
-               return NULL;
-       page = push_anon(pipe, size);
-       if (!page)
-               return NULL;
-       i->head = pipe->head - 1;
-       i->last_offset = size;
-       i->count -= size;
-       return page;
-}
-
-static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t bytes,
-                        struct iov_iter *i)
-{
-       struct pipe_inode_info *pipe = i->pipe;
-       unsigned int head = pipe->head;
-
-       if (unlikely(bytes > i->count))
-               bytes = i->count;
-
-       if (unlikely(!bytes))
-               return 0;
-
-       if (!sanity(i))
-               return 0;
-
-       if (offset && i->last_offset == -offset) { // could we merge it?
-               struct pipe_buffer *buf = pipe_buf(pipe, head - 1);
-               if (buf->page == page) {
-                       buf->len += bytes;
-                       i->last_offset -= bytes;
-                       i->count -= bytes;
-                       return bytes;
-               }
-       }
-       if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
-               return 0;
-
-       push_page(pipe, page, offset, bytes);
-       i->last_offset = -(offset + bytes);
-       i->head = head;
-       i->count -= bytes;
-       return bytes;
-}
-
 /*
  * fault_in_iov_iter_readable - fault in iov iterator for reading
  * @i: iterator
@@ -446,46 +300,6 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
 }
 EXPORT_SYMBOL(iov_iter_init);
 
-// returns the offset in partial buffer (if any)
-static inline unsigned int pipe_npages(const struct iov_iter *i, int *npages)
-{
-       struct pipe_inode_info *pipe = i->pipe;
-       int used = pipe->head - pipe->tail;
-       int off = i->last_offset;
-
-       *npages = max((int)pipe->max_usage - used, 0);
-
-       if (off > 0 && off < PAGE_SIZE) { // anon and not full
-               (*npages)++;
-               return off;
-       }
-       return 0;
-}
-
-static size_t copy_pipe_to_iter(const void *addr, size_t bytes,
-                               struct iov_iter *i)
-{
-       unsigned int off, chunk;
-
-       if (unlikely(bytes > i->count))
-               bytes = i->count;
-       if (unlikely(!bytes))
-               return 0;
-
-       if (!sanity(i))
-               return 0;
-
-       for (size_t n = bytes; n; n -= chunk) {
-               struct page *page = append_pipe(i, n, &off);
-               chunk = min_t(size_t, n, PAGE_SIZE - off);
-               if (!page)
-                       return bytes - n;
-               memcpy_to_page(page, off, addr, chunk);
-               addr += chunk;
-       }
-       return bytes;
-}
-
 static __wsum csum_and_memcpy(void *to, const void *from, size_t len,
                              __wsum sum, size_t off)
 {
@@ -493,44 +307,10 @@ static __wsum csum_and_memcpy(void *to, const void *from, size_t len,
        return csum_block_add(sum, next, off);
 }
 
-static size_t csum_and_copy_to_pipe_iter(const void *addr, size_t bytes,
-                                        struct iov_iter *i, __wsum *sump)
-{
-       __wsum sum = *sump;
-       size_t off = 0;
-       unsigned int chunk, r;
-
-       if (unlikely(bytes > i->count))
-               bytes = i->count;
-       if (unlikely(!bytes))
-               return 0;
-
-       if (!sanity(i))
-               return 0;
-
-       while (bytes) {
-               struct page *page = append_pipe(i, bytes, &r);
-               char *p;
-
-               if (!page)
-                       break;
-               chunk = min_t(size_t, bytes, PAGE_SIZE - r);
-               p = kmap_local_page(page);
-               sum = csum_and_memcpy(p + r, addr + off, chunk, sum, off);
-               kunmap_local(p);
-               off += chunk;
-               bytes -= chunk;
-       }
-       *sump = sum;
-       return off;
-}
-
 size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 {
        if (WARN_ON_ONCE(i->data_source))
                return 0;
-       if (unlikely(iov_iter_is_pipe(i)))
-               return copy_pipe_to_iter(addr, bytes, i);
        if (user_backed_iter(i))
                might_fault();
        iterate_and_advance(i, bytes, base, len, off,
@@ -552,42 +332,6 @@ static int copyout_mc(void __user *to, const void *from, size_t n)
        return n;
 }
 
-static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
-                               struct iov_iter *i)
-{
-       size_t xfer = 0;
-       unsigned int off, chunk;
-
-       if (unlikely(bytes > i->count))
-               bytes = i->count;
-       if (unlikely(!bytes))
-               return 0;
-
-       if (!sanity(i))
-               return 0;
-
-       while (bytes) {
-               struct page *page = append_pipe(i, bytes, &off);
-               unsigned long rem;
-               char *p;
-
-               if (!page)
-                       break;
-               chunk = min_t(size_t, bytes, PAGE_SIZE - off);
-               p = kmap_local_page(page);
-               rem = copy_mc_to_kernel(p + off, addr + xfer, chunk);
-               chunk -= rem;
-               kunmap_local(p);
-               xfer += chunk;
-               bytes -= chunk;
-               if (rem) {
-                       iov_iter_revert(i, rem);
-                       break;
-               }
-       }
-       return xfer;
-}
-
 /**
  * _copy_mc_to_iter - copy to iter with source memory error exception handling
  * @addr: source kernel address
@@ -607,9 +351,8 @@ static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
  *   alignment and poison alignment assumptions to avoid re-triggering
  *   hardware exceptions.
  *
- * * ITER_KVEC, ITER_PIPE, and ITER_BVEC can return short copies.
- *   Compare to copy_to_iter() where only ITER_IOVEC attempts might return
- *   a short copy.
+ * * ITER_KVEC and ITER_BVEC can return short copies.  Compare to
+ *   copy_to_iter() where only ITER_IOVEC attempts might return a short copy.
  *
  * Return: number of bytes copied (may be %0)
  */
@@ -617,8 +360,6 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 {
        if (WARN_ON_ONCE(i->data_source))
                return 0;
-       if (unlikely(iov_iter_is_pipe(i)))
-               return copy_mc_pipe_to_iter(addr, bytes, i);
        if (user_backed_iter(i))
                might_fault();
        __iterate_and_advance(i, bytes, base, len, off,
@@ -732,8 +473,6 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
                return 0;
        if (WARN_ON_ONCE(i->data_source))
                return 0;
-       if (unlikely(iov_iter_is_pipe(i)))
-               return copy_page_to_iter_pipe(page, offset, bytes, i);
        page += offset / PAGE_SIZE; // first subpage
        offset %= PAGE_SIZE;
        while (1) {
@@ -764,8 +503,6 @@ size_t copy_page_to_iter_nofault(struct page *page, unsigned offset, size_t byte
                return 0;
        if (WARN_ON_ONCE(i->data_source))
                return 0;
-       if (unlikely(iov_iter_is_pipe(i)))
-               return copy_page_to_iter_pipe(page, offset, bytes, i);
        page += offset / PAGE_SIZE; // first subpage
        offset %= PAGE_SIZE;
        while (1) {
@@ -818,36 +555,8 @@ size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
 }
 EXPORT_SYMBOL(copy_page_from_iter);
 
-static size_t pipe_zero(size_t bytes, struct iov_iter *i)
-{
-       unsigned int chunk, off;
-
-       if (unlikely(bytes > i->count))
-               bytes = i->count;
-       if (unlikely(!bytes))
-               return 0;
-
-       if (!sanity(i))
-               return 0;
-
-       for (size_t n = bytes; n; n -= chunk) {
-               struct page *page = append_pipe(i, n, &off);
-               char *p;
-
-               if (!page)
-                       return bytes - n;
-               chunk = min_t(size_t, n, PAGE_SIZE - off);
-               p = kmap_local_page(page);
-               memset(p + off, 0, chunk);
-               kunmap_local(p);
-       }
-       return bytes;
-}
-
 size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
 {
-       if (unlikely(iov_iter_is_pipe(i)))
-               return pipe_zero(bytes, i);
        iterate_and_advance(i, bytes, base, len, count,
                clear_user(base, len),
                memset(base, 0, len)
@@ -878,32 +587,6 @@ size_t copy_page_from_iter_atomic(struct page *page, unsigned offset, size_t byt
 }
 EXPORT_SYMBOL(copy_page_from_iter_atomic);
 
-static void pipe_advance(struct iov_iter *i, size_t size)
-{
-       struct pipe_inode_info *pipe = i->pipe;
-       int off = i->last_offset;
-
-       if (!off && !size) {
-               pipe_discard_from(pipe, i->start_head); // discard everything
-               return;
-       }
-       i->count -= size;
-       while (1) {
-               struct pipe_buffer *buf = pipe_buf(pipe, i->head);
-               if (off) /* make it relative to the beginning of buffer */
-                       size += abs(off) - buf->offset;
-               if (size <= buf->len) {
-                       buf->len = size;
-                       i->last_offset = last_offset(buf);
-                       break;
-               }
-               size -= buf->len;
-               i->head++;
-               off = 0;
-       }
-       pipe_discard_from(pipe, i->head + 1); // discard everything past this one
-}
-
 static void iov_iter_bvec_advance(struct iov_iter *i, size_t size)
 {
        const struct bio_vec *bvec, *end;
@@ -955,8 +638,6 @@ void iov_iter_advance(struct iov_iter *i, size_t size)
                iov_iter_iovec_advance(i, size);
        } else if (iov_iter_is_bvec(i)) {
                iov_iter_bvec_advance(i, size);
-       } else if (iov_iter_is_pipe(i)) {
-               pipe_advance(i, size);
        } else if (iov_iter_is_discard(i)) {
                i->count -= size;
        }
@@ -970,26 +651,6 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll)
        if (WARN_ON(unroll > MAX_RW_COUNT))
                return;
        i->count += unroll;
-       if (unlikely(iov_iter_is_pipe(i))) {
-               struct pipe_inode_info *pipe = i->pipe;
-               unsigned int head = pipe->head;
-
-               while (head > i->start_head) {
-                       struct pipe_buffer *b = pipe_buf(pipe, --head);
-                       if (unroll < b->len) {
-                               b->len -= unroll;
-                               i->last_offset = last_offset(b);
-                               i->head = head;
-                               return;
-                       }
-                       unroll -= b->len;
-                       pipe_buf_release(pipe, b);
-                       pipe->head--;
-               }
-               i->last_offset = 0;
-               i->head = head;
-               return;
-       }
        if (unlikely(iov_iter_is_discard(i)))
                return;
        if (unroll <= i->iov_offset) {
@@ -1079,24 +740,6 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction,
 }
 EXPORT_SYMBOL(iov_iter_bvec);
 
-void iov_iter_pipe(struct iov_iter *i, unsigned int direction,
-                       struct pipe_inode_info *pipe,
-                       size_t count)
-{
-       BUG_ON(direction != READ);
-       WARN_ON(pipe_full(pipe->head, pipe->tail, pipe->ring_size));
-       *i = (struct iov_iter){
-               .iter_type = ITER_PIPE,
-               .data_source = false,
-               .pipe = pipe,
-               .head = pipe->head,
-               .start_head = pipe->head,
-               .last_offset = 0,
-               .count = count
-       };
-}
-EXPORT_SYMBOL(iov_iter_pipe);
-
 /**
  * iov_iter_xarray - Initialise an I/O iterator to use the pages in an xarray
  * @i: The iterator to initialise.
@@ -1224,19 +867,6 @@ bool iov_iter_is_aligned(const struct iov_iter *i, unsigned addr_mask,
        if (iov_iter_is_bvec(i))
                return iov_iter_aligned_bvec(i, addr_mask, len_mask);
 
-       if (iov_iter_is_pipe(i)) {
-               size_t size = i->count;
-
-               if (size & len_mask)
-                       return false;
-               if (size && i->last_offset > 0) {
-                       if (i->last_offset & addr_mask)
-                               return false;
-               }
-
-               return true;
-       }
-
        if (iov_iter_is_xarray(i)) {
                if (i->count & len_mask)
                        return false;
@@ -1307,14 +937,6 @@ unsigned long iov_iter_alignment(const struct iov_iter *i)
        if (iov_iter_is_bvec(i))
                return iov_iter_alignment_bvec(i);
 
-       if (iov_iter_is_pipe(i)) {
-               size_t size = i->count;
-
-               if (size && i->last_offset > 0)
-                       return size | i->last_offset;
-               return size;
-       }
-
        if (iov_iter_is_xarray(i))
                return (i->xarray_start + i->iov_offset) | i->count;
 
@@ -1367,36 +989,6 @@ static int want_pages_array(struct page ***res, size_t size,
        return count;
 }
 
-static ssize_t pipe_get_pages(struct iov_iter *i,
-                  struct page ***pages, size_t maxsize, unsigned maxpages,
-                  size_t *start)
-{
-       unsigned int npages, count, off, chunk;
-       struct page **p;
-       size_t left;
-
-       if (!sanity(i))
-               return -EFAULT;
-
-       *start = off = pipe_npages(i, &npages);
-       if (!npages)
-               return -EFAULT;
-       count = want_pages_array(pages, maxsize, off, min(npages, maxpages));
-       if (!count)
-               return -ENOMEM;
-       p = *pages;
-       for (npages = 0, left = maxsize ; npages < count; npages++, left -= chunk) {
-               struct page *page = append_pipe(i, left, &off);
-               if (!page)
-                       break;
-               chunk = min_t(size_t, left, PAGE_SIZE - off);
-               get_page(*p++ = page);
-       }
-       if (!npages)
-               return -EFAULT;
-       return maxsize - left;
-}
-
 static ssize_t iter_xarray_populate_pages(struct page **pages, struct xarray *xa,
                                          pgoff_t index, unsigned int nr_pages)
 {
@@ -1490,8 +1082,7 @@ static struct page *first_bvec_segment(const struct iov_iter *i,
 
 static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
                   struct page ***pages, size_t maxsize,
-                  unsigned int maxpages, size_t *start,
-                  iov_iter_extraction_t extraction_flags)
+                  unsigned int maxpages, size_t *start)
 {
        unsigned int n, gup_flags = 0;
 
@@ -1501,8 +1092,6 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
                return 0;
        if (maxsize > MAX_RW_COUNT)
                maxsize = MAX_RW_COUNT;
-       if (extraction_flags & ITER_ALLOW_P2PDMA)
-               gup_flags |= FOLL_PCI_P2PDMA;
 
        if (likely(user_backed_iter(i))) {
                unsigned long addr;
@@ -1547,56 +1136,36 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
                }
                return maxsize;
        }
-       if (iov_iter_is_pipe(i))
-               return pipe_get_pages(i, pages, maxsize, maxpages, start);
        if (iov_iter_is_xarray(i))
                return iter_xarray_get_pages(i, pages, maxsize, maxpages, start);
        return -EFAULT;
 }
 
-ssize_t iov_iter_get_pages(struct iov_iter *i,
-                  struct page **pages, size_t maxsize, unsigned maxpages,
-                  size_t *start, iov_iter_extraction_t extraction_flags)
+ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
+               size_t maxsize, unsigned maxpages, size_t *start)
 {
        if (!maxpages)
                return 0;
        BUG_ON(!pages);
 
-       return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages,
-                                         start, extraction_flags);
-}
-EXPORT_SYMBOL_GPL(iov_iter_get_pages);
-
-ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
-               size_t maxsize, unsigned maxpages, size_t *start)
-{
-       return iov_iter_get_pages(i, pages, maxsize, maxpages, start, 0);
+       return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages, start);
 }
 EXPORT_SYMBOL(iov_iter_get_pages2);
 
-ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
-                  struct page ***pages, size_t maxsize,
-                  size_t *start, iov_iter_extraction_t extraction_flags)
+ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
+               struct page ***pages, size_t maxsize, size_t *start)
 {
        ssize_t len;
 
        *pages = NULL;
 
-       len = __iov_iter_get_pages_alloc(i, pages, maxsize, ~0U, start,
-                                        extraction_flags);
+       len = __iov_iter_get_pages_alloc(i, pages, maxsize, ~0U, start);
        if (len <= 0) {
                kvfree(*pages);
                *pages = NULL;
        }
        return len;
 }
-EXPORT_SYMBOL_GPL(iov_iter_get_pages_alloc);
-
-ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
-               struct page ***pages, size_t maxsize, size_t *start)
-{
-       return iov_iter_get_pages_alloc(i, pages, maxsize, start, 0);
-}
 EXPORT_SYMBOL(iov_iter_get_pages_alloc2);
 
 size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
@@ -1638,9 +1207,7 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
        }
 
        sum = csum_shift(csstate->csum, csstate->off);
-       if (unlikely(iov_iter_is_pipe(i)))
-               bytes = csum_and_copy_to_pipe_iter(addr, bytes, i, &sum);
-       else iterate_and_advance(i, bytes, base, len, off, ({
+       iterate_and_advance(i, bytes, base, len, off, ({
                next = csum_and_copy_to_user(addr + off, base, len);
                sum = csum_block_add(sum, next, off);
                next ? 0 : len;
@@ -1725,15 +1292,6 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages)
                return iov_npages(i, maxpages);
        if (iov_iter_is_bvec(i))
                return bvec_npages(i, maxpages);
-       if (iov_iter_is_pipe(i)) {
-               int npages;
-
-               if (!sanity(i))
-                       return 0;
-
-               pipe_npages(i, &npages);
-               return min(npages, maxpages);
-       }
        if (iov_iter_is_xarray(i)) {
                unsigned offset = (i->xarray_start + i->iov_offset) % PAGE_SIZE;
                int npages = DIV_ROUND_UP(offset + i->count, PAGE_SIZE);
@@ -1746,10 +1304,6 @@ EXPORT_SYMBOL(iov_iter_npages);
 const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
 {
        *new = *old;
-       if (unlikely(iov_iter_is_pipe(new))) {
-               WARN_ON(1);
-               return NULL;
-       }
        if (iov_iter_is_bvec(new))
                return new->bvec = kmemdup(new->bvec,
                                    new->nr_segs * sizeof(struct bio_vec),
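
For context, a minimal caller sketch (hypothetical, not part of this series) of the surviving extraction entry point now that ITER_PIPE and the extraction-flags variants are gone; iov_iter_get_pages_alloc2() still advances the iterator, and the caller drops the page references and frees the array:

    static ssize_t demo_grab_pages(struct iov_iter *iter)
    {
            struct page **pages = NULL;
            size_t offset;
            ssize_t got;
            int i;

            got = iov_iter_get_pages_alloc2(iter, &pages, PAGE_SIZE, &offset);
            if (got <= 0)
                    return got;

            /* ... use the data at 'offset' into pages[0] for 'got' bytes ... */

            for (i = 0; i < DIV_ROUND_UP(offset + got, PAGE_SIZE); i++)
                    put_page(pages[i]);
            kvfree(pages);
            return got;
    }
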
index 0cea31c..ce6749a 100644 (file)
@@ -125,11 +125,6 @@ kunit_test_suites(&executor_test_suite);
 
 /* Test helpers */
 
-static void kfree_res_free(struct kunit_resource *res)
-{
-       kfree(res->data);
-}
-
 /* Use the resource API to register a call to kfree(to_free).
  * Since we never actually use the resource, it's safe to use on const data.
  */
@@ -138,8 +133,10 @@ static void kfree_at_end(struct kunit *test, const void *to_free)
        /* kfree() handles NULL already, but avoid allocating a no-op cleanup. */
        if (IS_ERR_OR_NULL(to_free))
                return;
-       kunit_alloc_resource(test, NULL, kfree_res_free, GFP_KERNEL,
-                            (void *)to_free);
+
+       kunit_add_action(test,
+                       (kunit_action_t *)kfree,
+                       (void *)to_free);
 }
 
 static struct kunit_suite *alloc_fake_suite(struct kunit *test,
index cd8b7e5..b69b689 100644 (file)
@@ -42,6 +42,16 @@ static int example_test_init(struct kunit *test)
 }
 
 /*
+ * This is run once after each test case, see the comment on
+ * example_test_suite for more information.
+ */
+static void example_test_exit(struct kunit *test)
+{
+       kunit_info(test, "cleaning up\n");
+}
+
+
+/*
  * This is run once before all test cases in the suite.
  * See the comment on example_test_suite for more information.
  */
@@ -53,6 +63,16 @@ static int example_test_init_suite(struct kunit_suite *suite)
 }
 
 /*
+ * This is run once after all test cases in the suite.
+ * See the comment on example_test_suite for more information.
+ */
+static void example_test_exit_suite(struct kunit_suite *suite)
+{
+       kunit_info(suite, "exiting suite\n");
+}
+
+
+/*
  * This test should always be skipped.
  */
 static void example_skip_test(struct kunit *test)
@@ -167,6 +187,39 @@ static void example_static_stub_test(struct kunit *test)
        KUNIT_EXPECT_EQ(test, add_one(1), 2);
 }
 
+static const struct example_param {
+       int value;
+} example_params_array[] = {
+       { .value = 2, },
+       { .value = 1, },
+       { .value = 0, },
+};
+
+static void example_param_get_desc(const struct example_param *p, char *desc)
+{
+       snprintf(desc, KUNIT_PARAM_DESC_SIZE, "example value %d", p->value);
+}
+
+KUNIT_ARRAY_PARAM(example, example_params_array, example_param_get_desc);
+
+/*
+ * This test shows the use of params.
+ */
+static void example_params_test(struct kunit *test)
+{
+       const struct example_param *param = test->param_value;
+
+       /* By design, param pointer will not be NULL */
+       KUNIT_ASSERT_NOT_NULL(test, param);
+
+       /* Test can be skipped on unsupported param values */
+       if (!param->value)
+               kunit_skip(test, "unsupported param value");
+
+       /* You can use param values for parameterized testing */
+       KUNIT_EXPECT_EQ(test, param->value % param->value, 0);
+}
+
 /*
  * Here we make a list of all the test cases we want to add to the test suite
  * below.
@@ -183,6 +236,7 @@ static struct kunit_case example_test_cases[] = {
        KUNIT_CASE(example_mark_skipped_test),
        KUNIT_CASE(example_all_expect_macros_test),
        KUNIT_CASE(example_static_stub_test),
+       KUNIT_CASE_PARAM(example_params_test, example_gen_params),
        {}
 };
 
@@ -211,7 +265,9 @@ static struct kunit_case example_test_cases[] = {
 static struct kunit_suite example_test_suite = {
        .name = "example",
        .init = example_test_init,
+       .exit = example_test_exit,
        .suite_init = example_test_init_suite,
+       .suite_exit = example_test_exit_suite,
        .test_cases = example_test_cases,
 };
 
index 42e44ca..83d8e90 100644 (file)
@@ -112,7 +112,7 @@ struct kunit_test_resource_context {
        struct kunit test;
        bool is_resource_initialized;
        int allocate_order[2];
-       int free_order[2];
+       int free_order[4];
 };
 
 static int fake_resource_init(struct kunit_resource *res, void *context)
@@ -403,6 +403,88 @@ static void kunit_resource_test_named(struct kunit *test)
        KUNIT_EXPECT_TRUE(test, list_empty(&test->resources));
 }
 
+static void increment_int(void *ctx)
+{
+       int *i = (int *)ctx;
+       (*i)++;
+}
+
+static void kunit_resource_test_action(struct kunit *test)
+{
+       int num_actions = 0;
+
+       kunit_add_action(test, increment_int, &num_actions);
+       KUNIT_EXPECT_EQ(test, num_actions, 0);
+       kunit_cleanup(test);
+       KUNIT_EXPECT_EQ(test, num_actions, 1);
+
+       /* Once we've cleaned up, the action queue is empty. */
+       kunit_cleanup(test);
+       KUNIT_EXPECT_EQ(test, num_actions, 1);
+
+       /* Check the same function can be deferred multiple times. */
+       kunit_add_action(test, increment_int, &num_actions);
+       kunit_add_action(test, increment_int, &num_actions);
+       kunit_cleanup(test);
+       KUNIT_EXPECT_EQ(test, num_actions, 3);
+}
+static void kunit_resource_test_remove_action(struct kunit *test)
+{
+       int num_actions = 0;
+
+       kunit_add_action(test, increment_int, &num_actions);
+       KUNIT_EXPECT_EQ(test, num_actions, 0);
+
+       kunit_remove_action(test, increment_int, &num_actions);
+       kunit_cleanup(test);
+       KUNIT_EXPECT_EQ(test, num_actions, 0);
+}
+static void kunit_resource_test_release_action(struct kunit *test)
+{
+       int num_actions = 0;
+
+       kunit_add_action(test, increment_int, &num_actions);
+       KUNIT_EXPECT_EQ(test, num_actions, 0);
+       /* Runs immediately on trigger. */
+       kunit_release_action(test, increment_int, &num_actions);
+       KUNIT_EXPECT_EQ(test, num_actions, 1);
+
+       /* Doesn't run again on test exit. */
+       kunit_cleanup(test);
+       KUNIT_EXPECT_EQ(test, num_actions, 1);
+}
+static void action_order_1(void *ctx)
+{
+       struct kunit_test_resource_context *res_ctx = (struct kunit_test_resource_context *)ctx;
+
+       KUNIT_RESOURCE_TEST_MARK_ORDER(res_ctx, free_order, 1);
+       kunit_log(KERN_INFO, current->kunit_test, "action_order_1");
+}
+static void action_order_2(void *ctx)
+{
+       struct kunit_test_resource_context *res_ctx = (struct kunit_test_resource_context *)ctx;
+
+       KUNIT_RESOURCE_TEST_MARK_ORDER(res_ctx, free_order, 2);
+       kunit_log(KERN_INFO, current->kunit_test, "action_order_2");
+}
+static void kunit_resource_test_action_ordering(struct kunit *test)
+{
+       struct kunit_test_resource_context *ctx = test->priv;
+
+       kunit_add_action(test, action_order_1, ctx);
+       kunit_add_action(test, action_order_2, ctx);
+       kunit_add_action(test, action_order_1, ctx);
+       kunit_add_action(test, action_order_2, ctx);
+       kunit_remove_action(test, action_order_1, ctx);
+       kunit_release_action(test, action_order_2, ctx);
+       kunit_cleanup(test);
+
+       /* [2 is triggered] [2], [(1 is cancelled)] [1] */
+       KUNIT_EXPECT_EQ(test, ctx->free_order[0], 2);
+       KUNIT_EXPECT_EQ(test, ctx->free_order[1], 2);
+       KUNIT_EXPECT_EQ(test, ctx->free_order[2], 1);
+}
+
 static int kunit_resource_test_init(struct kunit *test)
 {
        struct kunit_test_resource_context *ctx =
@@ -434,6 +516,10 @@ static struct kunit_case kunit_resource_test_cases[] = {
        KUNIT_CASE(kunit_resource_test_proper_free_ordering),
        KUNIT_CASE(kunit_resource_test_static),
        KUNIT_CASE(kunit_resource_test_named),
+       KUNIT_CASE(kunit_resource_test_action),
+       KUNIT_CASE(kunit_resource_test_remove_action),
+       KUNIT_CASE(kunit_resource_test_release_action),
+       KUNIT_CASE(kunit_resource_test_action_ordering),
        {}
 };
 
index c414df9..f020925 100644 (file)
@@ -77,3 +77,102 @@ int kunit_destroy_resource(struct kunit *test, kunit_resource_match_t match,
        return 0;
 }
 EXPORT_SYMBOL_GPL(kunit_destroy_resource);
+
+struct kunit_action_ctx {
+       struct kunit_resource res;
+       kunit_action_t *func;
+       void *ctx;
+};
+
+static void __kunit_action_free(struct kunit_resource *res)
+{
+       struct kunit_action_ctx *action_ctx = container_of(res, struct kunit_action_ctx, res);
+
+       action_ctx->func(action_ctx->ctx);
+}
+
+
+int kunit_add_action(struct kunit *test, void (*action)(void *), void *ctx)
+{
+       struct kunit_action_ctx *action_ctx;
+
+       KUNIT_ASSERT_NOT_NULL_MSG(test, action, "Tried to action a NULL function!");
+
+       action_ctx = kzalloc(sizeof(*action_ctx), GFP_KERNEL);
+       if (!action_ctx)
+               return -ENOMEM;
+
+       action_ctx->func = action;
+       action_ctx->ctx = ctx;
+
+       action_ctx->res.should_kfree = true;
+       /* As init is NULL, this cannot fail. */
+       __kunit_add_resource(test, NULL, __kunit_action_free, &action_ctx->res, action_ctx);
+
+       return 0;
+}
+EXPORT_SYMBOL_GPL(kunit_add_action);
+
+int kunit_add_action_or_reset(struct kunit *test, void (*action)(void *),
+                             void *ctx)
+{
+       int res = kunit_add_action(test, action, ctx);
+
+       if (res)
+               action(ctx);
+       return res;
+}
+EXPORT_SYMBOL_GPL(kunit_add_action_or_reset);
+
+static bool __kunit_action_match(struct kunit *test,
+                               struct kunit_resource *res, void *match_data)
+{
+       struct kunit_action_ctx *match_ctx = (struct kunit_action_ctx *)match_data;
+       struct kunit_action_ctx *res_ctx = container_of(res, struct kunit_action_ctx, res);
+
+       /* Make sure this is a free function. */
+       if (res->free != __kunit_action_free)
+               return false;
+
+       /* Both the function and context data should match. */
+       return (match_ctx->func == res_ctx->func) && (match_ctx->ctx == res_ctx->ctx);
+}
+
+void kunit_remove_action(struct kunit *test,
+                       kunit_action_t *action,
+                       void *ctx)
+{
+       struct kunit_action_ctx match_ctx;
+       struct kunit_resource *res;
+
+       match_ctx.func = action;
+       match_ctx.ctx = ctx;
+
+       res = kunit_find_resource(test, __kunit_action_match, &match_ctx);
+       if (res) {
+               /* Remove the free function so we don't run the action. */
+               res->free = NULL;
+               kunit_remove_resource(test, res);
+               kunit_put_resource(res);
+       }
+}
+EXPORT_SYMBOL_GPL(kunit_remove_action);
+
+void kunit_release_action(struct kunit *test,
+                        kunit_action_t *action,
+                        void *ctx)
+{
+       struct kunit_action_ctx match_ctx;
+       struct kunit_resource *res;
+
+       match_ctx.func = action;
+       match_ctx.ctx = ctx;
+
+       res = kunit_find_resource(test, __kunit_action_match, &match_ctx);
+       if (res) {
+               kunit_remove_resource(test, res);
+               /* We have to put() this here, else free won't be called. */
+               kunit_put_resource(res);
+       }
+}
+EXPORT_SYMBOL_GPL(kunit_release_action);
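
For context, a minimal test sketch (hypothetical) of the deferred-action API added above; actions run in reverse registration order when the test exits, much like devm_add_action():

    static void demo_action_test(struct kunit *test)
    {
            char *buf = kmalloc(32, GFP_KERNEL);

            KUNIT_ASSERT_NOT_NULL(test, buf);

            /* kfree(buf) is deferred until the test finishes (or kunit_cleanup()). */
            KUNIT_ASSERT_EQ(test, 0,
                            kunit_add_action(test, (kunit_action_t *)kfree, buf));
    }
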
index e2910b2..84e4666 100644 (file)
@@ -185,16 +185,28 @@ static void kunit_print_suite_start(struct kunit_suite *suite)
                  kunit_suite_num_test_cases(suite));
 }
 
-static void kunit_print_ok_not_ok(void *test_or_suite,
-                                 bool is_test,
+/* Currently supported test levels */
+enum {
+       KUNIT_LEVEL_SUITE = 0,
+       KUNIT_LEVEL_CASE,
+       KUNIT_LEVEL_CASE_PARAM,
+};
+
+static void kunit_print_ok_not_ok(struct kunit *test,
+                                 unsigned int test_level,
                                  enum kunit_status status,
                                  size_t test_number,
                                  const char *description,
                                  const char *directive)
 {
-       struct kunit_suite *suite = is_test ? NULL : test_or_suite;
-       struct kunit *test = is_test ? test_or_suite : NULL;
        const char *directive_header = (status == KUNIT_SKIPPED) ? " # SKIP " : "";
+       const char *directive_body = (status == KUNIT_SKIPPED) ? directive : "";
+
+       /*
+        * When test is NULL assume that results are from the suite
+        * and today suite results are expected at level 0 only.
+        */
+       WARN(!test && test_level, "suite test level can't be %u!\n", test_level);
 
        /*
         * We do not log the test suite results as doing so would
@@ -203,17 +215,18 @@ static void kunit_print_ok_not_ok(void *test_or_suite,
         * separately seq_printf() the suite results for the debugfs
         * representation.
         */
-       if (suite)
+       if (!test)
                pr_info("%s %zd %s%s%s\n",
                        kunit_status_to_ok_not_ok(status),
                        test_number, description, directive_header,
-                       (status == KUNIT_SKIPPED) ? directive : "");
+                       directive_body);
        else
                kunit_log(KERN_INFO, test,
-                         KUNIT_SUBTEST_INDENT "%s %zd %s%s%s",
+                         "%*s%s %zd %s%s%s",
+                         KUNIT_INDENT_LEN * test_level, "",
                          kunit_status_to_ok_not_ok(status),
                          test_number, description, directive_header,
-                         (status == KUNIT_SKIPPED) ? directive : "");
+                         directive_body);
 }
 
 enum kunit_status kunit_suite_has_succeeded(struct kunit_suite *suite)
@@ -239,7 +252,7 @@ static size_t kunit_suite_counter = 1;
 
 static void kunit_print_suite_end(struct kunit_suite *suite)
 {
-       kunit_print_ok_not_ok((void *)suite, false,
+       kunit_print_ok_not_ok(NULL, KUNIT_LEVEL_SUITE,
                              kunit_suite_has_succeeded(suite),
                              kunit_suite_counter++,
                              suite->name,
@@ -310,7 +323,7 @@ static void kunit_fail(struct kunit *test, const struct kunit_loc *loc,
        string_stream_destroy(stream);
 }
 
-static void __noreturn kunit_abort(struct kunit *test)
+void __noreturn __kunit_abort(struct kunit *test)
 {
        kunit_try_catch_throw(&test->try_catch); /* Does not return. */
 
@@ -322,8 +335,9 @@ static void __noreturn kunit_abort(struct kunit *test)
         */
        WARN_ONCE(true, "Throw could not abort from test!\n");
 }
+EXPORT_SYMBOL_GPL(__kunit_abort);
 
-void kunit_do_failed_assertion(struct kunit *test,
+void __kunit_do_failed_assertion(struct kunit *test,
                               const struct kunit_loc *loc,
                               enum kunit_assert_type type,
                               const struct kunit_assert *assert,
@@ -340,11 +354,8 @@ void kunit_do_failed_assertion(struct kunit *test,
        kunit_fail(test, loc, type, assert, assert_format, &message);
 
        va_end(args);
-
-       if (type == KUNIT_ASSERTION)
-               kunit_abort(test);
 }
-EXPORT_SYMBOL_GPL(kunit_do_failed_assertion);
+EXPORT_SYMBOL_GPL(__kunit_do_failed_assertion);
 
 void kunit_init_test(struct kunit *test, const char *name, char *log)
 {
@@ -419,15 +430,54 @@ static void kunit_try_run_case(void *data)
         * thread will resume control and handle any necessary clean up.
         */
        kunit_run_case_internal(test, suite, test_case);
-       /* This line may never be reached. */
+}
+
+static void kunit_try_run_case_cleanup(void *data)
+{
+       struct kunit_try_catch_context *ctx = data;
+       struct kunit *test = ctx->test;
+       struct kunit_suite *suite = ctx->suite;
+
+       current->kunit_test = test;
+
        kunit_run_case_cleanup(test, suite);
 }
 
+static void kunit_catch_run_case_cleanup(void *data)
+{
+       struct kunit_try_catch_context *ctx = data;
+       struct kunit *test = ctx->test;
+       int try_exit_code = kunit_try_catch_get_result(&test->try_catch);
+
+       /* It is always a failure if cleanup aborts. */
+       kunit_set_failure(test);
+
+       if (try_exit_code) {
+               /*
+                * Test case could not finish, we have no idea what state it is
+                * in, so don't do clean up.
+                */
+               if (try_exit_code == -ETIMEDOUT) {
+                       kunit_err(test, "test case cleanup timed out\n");
+               /*
+                * Unknown internal error occurred preventing test case from
+                * running, so there is nothing to clean up.
+                */
+               } else {
+                       kunit_err(test, "internal error occurred during test case cleanup: %d\n",
+                                 try_exit_code);
+               }
+               return;
+       }
+
+       kunit_err(test, "test aborted during cleanup. continuing without cleaning up\n");
+}
+
+
 static void kunit_catch_run_case(void *data)
 {
        struct kunit_try_catch_context *ctx = data;
        struct kunit *test = ctx->test;
-       struct kunit_suite *suite = ctx->suite;
        int try_exit_code = kunit_try_catch_get_result(&test->try_catch);
 
        if (try_exit_code) {
@@ -448,12 +498,6 @@ static void kunit_catch_run_case(void *data)
                }
                return;
        }
-
-       /*
-        * Test case was run, but aborted. It is the test case's business as to
-        * whether it failed or not, we just need to clean up.
-        */
-       kunit_run_case_cleanup(test, suite);
 }
 
 /*
@@ -478,6 +522,13 @@ static void kunit_run_case_catch_errors(struct kunit_suite *suite,
        context.test_case = test_case;
        kunit_try_catch_run(try_catch, &context);
 
+       /* Now run the cleanup */
+       kunit_try_catch_init(try_catch,
+                            test,
+                            kunit_try_run_case_cleanup,
+                            kunit_catch_run_case_cleanup);
+       kunit_try_catch_run(try_catch, &context);
+
        /* Propagate the parameter result to the test case. */
        if (test->status == KUNIT_FAILURE)
                test_case->status = KUNIT_FAILURE;
@@ -585,11 +636,11 @@ int kunit_run_tests(struct kunit_suite *suite)
                                                 "param-%d", test.param_index);
                                }
 
-                               kunit_log(KERN_INFO, &test,
-                                         KUNIT_SUBTEST_INDENT KUNIT_SUBTEST_INDENT
-                                         "%s %d %s",
-                                         kunit_status_to_ok_not_ok(test.status),
-                                         test.param_index + 1, param_desc);
+                               kunit_print_ok_not_ok(&test, KUNIT_LEVEL_CASE_PARAM,
+                                                     test.status,
+                                                     test.param_index + 1,
+                                                     param_desc,
+                                                     test.status_comment);
 
                                /* Get next param. */
                                param_desc[0] = '\0';
@@ -603,7 +654,7 @@ int kunit_run_tests(struct kunit_suite *suite)
 
                kunit_print_test_stats(&test, param_stats);
 
-               kunit_print_ok_not_ok(&test, true, test_case->status,
+               kunit_print_ok_not_ok(&test, KUNIT_LEVEL_CASE, test_case->status,
                                      kunit_test_case_num(suite, test_case),
                                      test_case->name,
                                      test.status_comment);
@@ -712,58 +763,28 @@ static struct notifier_block kunit_mod_nb = {
 };
 #endif
 
-struct kunit_kmalloc_array_params {
-       size_t n;
-       size_t size;
-       gfp_t gfp;
-};
-
-static int kunit_kmalloc_array_init(struct kunit_resource *res, void *context)
+void *kunit_kmalloc_array(struct kunit *test, size_t n, size_t size, gfp_t gfp)
 {
-       struct kunit_kmalloc_array_params *params = context;
+       void *data;
 
-       res->data = kmalloc_array(params->n, params->size, params->gfp);
-       if (!res->data)
-               return -ENOMEM;
+       data = kmalloc_array(n, size, gfp);
 
-       return 0;
-}
+       if (!data)
+               return NULL;
 
-static void kunit_kmalloc_array_free(struct kunit_resource *res)
-{
-       kfree(res->data);
-}
-
-void *kunit_kmalloc_array(struct kunit *test, size_t n, size_t size, gfp_t gfp)
-{
-       struct kunit_kmalloc_array_params params = {
-               .size = size,
-               .n = n,
-               .gfp = gfp
-       };
+       if (kunit_add_action_or_reset(test, (kunit_action_t *)kfree, data) != 0)
+               return NULL;
 
-       return kunit_alloc_resource(test,
-                                   kunit_kmalloc_array_init,
-                                   kunit_kmalloc_array_free,
-                                   gfp,
-                                   &params);
+       return data;
 }
 EXPORT_SYMBOL_GPL(kunit_kmalloc_array);
 
-static inline bool kunit_kfree_match(struct kunit *test,
-                                    struct kunit_resource *res, void *match_data)
-{
-       /* Only match resources allocated with kunit_kmalloc() and friends. */
-       return res->free == kunit_kmalloc_array_free && res->data == match_data;
-}
-
 void kunit_kfree(struct kunit *test, const void *ptr)
 {
        if (!ptr)
                return;
 
-       if (kunit_destroy_resource(test, kunit_kfree_match, (void *)ptr))
-               KUNIT_FAIL(test, "kunit_kfree: %px already freed or not allocated by kunit", ptr);
+       kunit_release_action(test, (kunit_action_t *)kfree, (void *)ptr);
 }
 EXPORT_SYMBOL_GPL(kunit_kfree);
 
index 110a364..8ebc43d 100644 (file)
@@ -5317,15 +5317,9 @@ int mas_empty_area(struct ma_state *mas, unsigned long min,
 
        mt = mte_node_type(mas->node);
        pivots = ma_pivots(mas_mn(mas), mt);
-       if (offset)
-               mas->min = pivots[offset - 1] + 1;
-
-       if (offset < mt_pivots[mt])
-               mas->max = pivots[offset];
-
-       if (mas->index < mas->min)
-               mas->index = mas->min;
-
+       min = mas_safe_min(mas, pivots, offset);
+       if (mas->index < min)
+               mas->index = min;
        mas->last = mas->index + size - 1;
        return 0;
 }
index 049ba13..1a31065 100644 (file)
@@ -27,6 +27,8 @@
 #include <linux/string.h>
 #include <linux/xarray.h>
 
+#include "radix-tree.h"
+
 /*
  * Radix tree node cache.
  */
diff --git a/lib/radix-tree.h b/lib/radix-tree.h
new file mode 100644 (file)
index 0000000..40d5c03
--- /dev/null
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0+
+/* radix-tree helpers that are only shared with xarray */
+
+struct kmem_cache;
+struct rcu_head;
+
+extern struct kmem_cache *radix_tree_node_cachep;
+extern void radix_tree_node_rcu_free(struct rcu_head *head);
diff --git a/lib/raid6/neon.h b/lib/raid6/neon.h
new file mode 100644 (file)
index 0000000..2ca41ee
--- /dev/null
@@ -0,0 +1,22 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+void raid6_neon1_gen_syndrome_real(int disks, unsigned long bytes, void **ptrs);
+void raid6_neon1_xor_syndrome_real(int disks, int start, int stop,
+                                   unsigned long bytes, void **ptrs);
+void raid6_neon2_gen_syndrome_real(int disks, unsigned long bytes, void **ptrs);
+void raid6_neon2_xor_syndrome_real(int disks, int start, int stop,
+                                   unsigned long bytes, void **ptrs);
+void raid6_neon4_gen_syndrome_real(int disks, unsigned long bytes, void **ptrs);
+void raid6_neon4_xor_syndrome_real(int disks, int start, int stop,
+                                   unsigned long bytes, void **ptrs);
+void raid6_neon8_gen_syndrome_real(int disks, unsigned long bytes, void **ptrs);
+void raid6_neon8_xor_syndrome_real(int disks, int start, int stop,
+                                   unsigned long bytes, void **ptrs);
+void __raid6_2data_recov_neon(int bytes, uint8_t *p, uint8_t *q, uint8_t *dp,
+                             uint8_t *dq, const uint8_t *pbmul,
+                             const uint8_t *qmul);
+
+void __raid6_datap_recov_neon(int bytes, uint8_t *p, uint8_t *q, uint8_t *dq,
+                             const uint8_t *qmul);
+
+
index b7c6803..355270a 100644 (file)
@@ -25,6 +25,7 @@
  */
 
 #include <arm_neon.h>
+#include "neon.h"
 
 typedef uint8x16_t unative_t;
 
index d6fba8b..1bfc141 100644 (file)
@@ -8,6 +8,7 @@
 
 #ifdef __KERNEL__
 #include <asm/neon.h>
+#include "neon.h"
 #else
 #define kernel_neon_begin()
 #define kernel_neon_end()
@@ -19,13 +20,6 @@ static int raid6_has_neon(void)
        return cpu_has_neon();
 }
 
-void __raid6_2data_recov_neon(int bytes, uint8_t *p, uint8_t *q, uint8_t *dp,
-                             uint8_t *dq, const uint8_t *pbmul,
-                             const uint8_t *qmul);
-
-void __raid6_datap_recov_neon(int bytes, uint8_t *p, uint8_t *q, uint8_t *dq,
-                             const uint8_t *qmul);
-
 static void raid6_2data_recov_neon(int disks, size_t bytes, int faila,
                int failb, void **ptrs)
 {
index 90eb80d..f9e7e8f 100644 (file)
@@ -5,6 +5,7 @@
  */
 
 #include <arm_neon.h>
+#include "neon.h"
 
 #ifdef CONFIG_ARM
 /*
index 05ed84c..1d7d480 100644 (file)
@@ -45,6 +45,7 @@ struct test_batched_req {
        bool sent;
        const struct firmware *fw;
        const char *name;
+       const char *fw_buf;
        struct completion completion;
        struct task_struct *task;
        struct device *dev;
@@ -175,8 +176,14 @@ static void __test_release_all_firmware(void)
 
        for (i = 0; i < test_fw_config->num_requests; i++) {
                req = &test_fw_config->reqs[i];
-               if (req->fw)
+               if (req->fw) {
+                       if (req->fw_buf) {
+                               kfree_const(req->fw_buf);
+                               req->fw_buf = NULL;
+                       }
                        release_firmware(req->fw);
+                       req->fw = NULL;
+               }
        }
 
        vfree(test_fw_config->reqs);
@@ -353,16 +360,26 @@ static ssize_t config_test_show_str(char *dst,
        return len;
 }
 
-static int test_dev_config_update_bool(const char *buf, size_t size,
+static inline int __test_dev_config_update_bool(const char *buf, size_t size,
                                       bool *cfg)
 {
        int ret;
 
-       mutex_lock(&test_fw_mutex);
        if (kstrtobool(buf, cfg) < 0)
                ret = -EINVAL;
        else
                ret = size;
+
+       return ret;
+}
+
+static int test_dev_config_update_bool(const char *buf, size_t size,
+                                      bool *cfg)
+{
+       int ret;
+
+       mutex_lock(&test_fw_mutex);
+       ret = __test_dev_config_update_bool(buf, size, cfg);
        mutex_unlock(&test_fw_mutex);
 
        return ret;
@@ -373,7 +390,8 @@ static ssize_t test_dev_config_show_bool(char *buf, bool val)
        return snprintf(buf, PAGE_SIZE, "%d\n", val);
 }
 
-static int test_dev_config_update_size_t(const char *buf,
+static int __test_dev_config_update_size_t(
+                                        const char *buf,
                                         size_t size,
                                         size_t *cfg)
 {
@@ -384,9 +402,7 @@ static int test_dev_config_update_size_t(const char *buf,
        if (ret)
                return ret;
 
-       mutex_lock(&test_fw_mutex);
        *(size_t *)cfg = new;
-       mutex_unlock(&test_fw_mutex);
 
        /* Always return full write size even if we didn't consume all */
        return size;
@@ -402,7 +418,7 @@ static ssize_t test_dev_config_show_int(char *buf, int val)
        return snprintf(buf, PAGE_SIZE, "%d\n", val);
 }
 
-static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+static int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
 {
        u8 val;
        int ret;
@@ -411,14 +427,23 @@ static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
        if (ret)
                return ret;
 
-       mutex_lock(&test_fw_mutex);
        *(u8 *)cfg = val;
-       mutex_unlock(&test_fw_mutex);
 
        /* Always return full write size even if we didn't consume all */
        return size;
 }
 
+static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+{
+       int ret;
+
+       mutex_lock(&test_fw_mutex);
+       ret = __test_dev_config_update_u8(buf, size, cfg);
+       mutex_unlock(&test_fw_mutex);
+
+       return ret;
+}
+
 static ssize_t test_dev_config_show_u8(char *buf, u8 val)
 {
        return snprintf(buf, PAGE_SIZE, "%u\n", val);
@@ -471,10 +496,10 @@ static ssize_t config_num_requests_store(struct device *dev,
                mutex_unlock(&test_fw_mutex);
                goto out;
        }
-       mutex_unlock(&test_fw_mutex);
 
-       rc = test_dev_config_update_u8(buf, count,
-                                      &test_fw_config->num_requests);
+       rc = __test_dev_config_update_u8(buf, count,
+                                        &test_fw_config->num_requests);
+       mutex_unlock(&test_fw_mutex);
 
 out:
        return rc;
@@ -518,10 +543,10 @@ static ssize_t config_buf_size_store(struct device *dev,
                mutex_unlock(&test_fw_mutex);
                goto out;
        }
-       mutex_unlock(&test_fw_mutex);
 
-       rc = test_dev_config_update_size_t(buf, count,
-                                          &test_fw_config->buf_size);
+       rc = __test_dev_config_update_size_t(buf, count,
+                                            &test_fw_config->buf_size);
+       mutex_unlock(&test_fw_mutex);
 
 out:
        return rc;
@@ -548,10 +573,10 @@ static ssize_t config_file_offset_store(struct device *dev,
                mutex_unlock(&test_fw_mutex);
                goto out;
        }
-       mutex_unlock(&test_fw_mutex);
 
-       rc = test_dev_config_update_size_t(buf, count,
-                                          &test_fw_config->file_offset);
+       rc = __test_dev_config_update_size_t(buf, count,
+                                            &test_fw_config->file_offset);
+       mutex_unlock(&test_fw_mutex);
 
 out:
        return rc;
@@ -652,6 +677,8 @@ static ssize_t trigger_request_store(struct device *dev,
 
        mutex_lock(&test_fw_mutex);
        release_firmware(test_firmware);
+       if (test_fw_config->reqs)
+               __test_release_all_firmware();
        test_firmware = NULL;
        rc = request_firmware(&test_firmware, name, dev);
        if (rc) {
@@ -752,6 +779,8 @@ static ssize_t trigger_async_request_store(struct device *dev,
        mutex_lock(&test_fw_mutex);
        release_firmware(test_firmware);
        test_firmware = NULL;
+       if (test_fw_config->reqs)
+               __test_release_all_firmware();
        rc = request_firmware_nowait(THIS_MODULE, 1, name, dev, GFP_KERNEL,
                                     NULL, trigger_async_request_cb);
        if (rc) {
@@ -794,6 +823,8 @@ static ssize_t trigger_custom_fallback_store(struct device *dev,
 
        mutex_lock(&test_fw_mutex);
        release_firmware(test_firmware);
+       if (test_fw_config->reqs)
+               __test_release_all_firmware();
        test_firmware = NULL;
        rc = request_firmware_nowait(THIS_MODULE, FW_ACTION_NOUEVENT, name,
                                     dev, GFP_KERNEL, NULL,
@@ -856,6 +887,8 @@ static int test_fw_run_batch_request(void *data)
                                                 test_fw_config->buf_size);
                if (!req->fw)
                        kfree(test_buf);
+               else
+                       req->fw_buf = test_buf;
        } else {
                req->rc = test_fw_config->req_firmware(&req->fw,
                                                       req->name,
@@ -895,6 +928,11 @@ static ssize_t trigger_batched_requests_store(struct device *dev,
 
        mutex_lock(&test_fw_mutex);
 
+       if (test_fw_config->reqs) {
+               rc = -EBUSY;
+               goto out_bail;
+       }
+
        test_fw_config->reqs =
                vzalloc(array3_size(sizeof(struct test_batched_req),
                                    test_fw_config->num_requests, 2));
@@ -911,6 +949,7 @@ static ssize_t trigger_batched_requests_store(struct device *dev,
                req->fw = NULL;
                req->idx = i;
                req->name = test_fw_config->name;
+               req->fw_buf = NULL;
                req->dev = dev;
                init_completion(&req->completion);
                req->task = kthread_run(test_fw_run_batch_request, req,
@@ -993,6 +1032,11 @@ ssize_t trigger_batched_requests_async_store(struct device *dev,
 
        mutex_lock(&test_fw_mutex);
 
+       if (test_fw_config->reqs) {
+               rc = -EBUSY;
+               goto out_bail;
+       }
+
        test_fw_config->reqs =
                vzalloc(array3_size(sizeof(struct test_batched_req),
                                    test_fw_config->num_requests, 2));
@@ -1010,6 +1054,7 @@ ssize_t trigger_batched_requests_async_store(struct device *dev,
        for (i = 0; i < test_fw_config->num_requests; i++) {
                req = &test_fw_config->reqs[i];
                req->name = test_fw_config->name;
+               req->fw_buf = NULL;
                req->fw = NULL;
                req->idx = i;
                init_completion(&req->completion);
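
For context, the conversion above follows the usual locked/unlocked helper split: the double-underscore variant assumes the caller already holds the mutex, and the plain variant takes it itself. A generic sketch of the pattern (names hypothetical):

    /* __demo_update() assumes the caller already holds the lock. */
    static int __demo_update(const char *buf, size_t size, bool *cfg)
    {
            if (kstrtobool(buf, cfg) < 0)
                    return -EINVAL;
            return size;
    }

    static int demo_update(struct mutex *lock, const char *buf,
                           size_t size, bool *cfg)
    {
            int ret;

            mutex_lock(lock);
            ret = __demo_update(buf, size, cfg);
            mutex_unlock(lock);
            return ret;
    }
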
index 9dd9745..3718d98 100644 (file)
@@ -369,7 +369,7 @@ vm_map_ram_test(void)
        int i;
 
        map_nr_pages = nr_pages > 0 ? nr_pages:1;
-       pages = kmalloc(map_nr_pages * sizeof(struct page), GFP_KERNEL);
+       pages = kcalloc(map_nr_pages, sizeof(struct page *), GFP_KERNEL);
        if (!pages)
                return -1;
 
index ea9ce1f..2071a37 100644 (file)
@@ -12,6 +12,8 @@
 #include <linux/slab.h>
 #include <linux/xarray.h>
 
+#include "radix-tree.h"
+
 /*
  * Coding conventions in this file:
  *
@@ -247,10 +249,6 @@ void *xas_load(struct xa_state *xas)
 }
 EXPORT_SYMBOL_GPL(xas_load);
 
-/* Move the radix tree node cache here */
-extern struct kmem_cache *radix_tree_node_cachep;
-extern void radix_tree_node_rcu_free(struct rcu_head *head);
-
 #define XA_RCU_FREE    ((struct xarray *)1)
 
 static void xa_node_free(struct xa_node *node)
index a925415..018a5bd 100644 (file)
@@ -98,6 +98,7 @@ config PAGE_OWNER
 config PAGE_TABLE_CHECK
        bool "Check for invalid mappings in user page tables"
        depends on ARCH_SUPPORTS_PAGE_TABLE_CHECK
+       depends on EXCLUSIVE_SYSTEM_RAM
        select PAGE_EXTENSION
        help
          Check that anonymous page is not being mapped twice with read write
index d9ef620..91cff7f 100644 (file)
@@ -551,6 +551,8 @@ int damon_set_attrs(struct damon_ctx *ctx, struct damon_attrs *attrs)
                return -EINVAL;
        if (attrs->min_nr_regions > attrs->max_nr_regions)
                return -EINVAL;
+       if (attrs->sample_interval > attrs->aggr_interval)
+               return -EINVAL;
 
        damon_update_monitoring_results(ctx, attrs);
        ctx->attrs = *attrs;
index b4c9bd3..00f01d8 100644 (file)
@@ -1760,7 +1760,9 @@ bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
  *
  * Return: The index of the gap if found, otherwise an index outside the
  * range specified (in which case 'return - index >= max_scan' will be true).
- * In the rare case of index wrap-around, 0 will be returned.
+ * In the rare case of index wrap-around, 0 will be returned.  0 will also
+ * be returned if index == 0 and there is a gap at the index.  We can not
+ * wrap-around if passed index == 0.
  */
 pgoff_t page_cache_next_miss(struct address_space *mapping,
                             pgoff_t index, unsigned long max_scan)
@@ -1770,12 +1772,13 @@ pgoff_t page_cache_next_miss(struct address_space *mapping,
        while (max_scan--) {
                void *entry = xas_next(&xas);
                if (!entry || xa_is_value(entry))
-                       break;
-               if (xas.xa_index == 0)
-                       break;
+                       return xas.xa_index;
+               if (xas.xa_index == 0 && index != 0)
+                       return xas.xa_index;
        }
 
-       return xas.xa_index;
+       /* No gaps in range and no wrap-around, return index beyond range */
+       return xas.xa_index + 1;
 }
 EXPORT_SYMBOL(page_cache_next_miss);
 
@@ -1796,7 +1799,9 @@ EXPORT_SYMBOL(page_cache_next_miss);
  *
  * Return: The index of the gap if found, otherwise an index outside the
  * range specified (in which case 'index - return >= max_scan' will be true).
- * In the rare case of wrap-around, ULONG_MAX will be returned.
+ * In the rare case of wrap-around, ULONG_MAX will be returned.  ULONG_MAX
+ * will also be returned if index == ULONG_MAX and there is a gap at the
+ * index.  We can not wrap-around if passed index == ULONG_MAX.
  */
 pgoff_t page_cache_prev_miss(struct address_space *mapping,
                             pgoff_t index, unsigned long max_scan)
@@ -1806,12 +1811,13 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
        while (max_scan--) {
                void *entry = xas_prev(&xas);
                if (!entry || xa_is_value(entry))
-                       break;
-               if (xas.xa_index == ULONG_MAX)
-                       break;
+                       return xas.xa_index;
+               if (xas.xa_index == ULONG_MAX && index != ULONG_MAX)
+                       return xas.xa_index;
        }
 
-       return xas.xa_index;
+       /* No gaps in range and no wrap-around, return index beyond range */
+       return xas.xa_index - 1;
 }
 EXPORT_SYMBOL(page_cache_prev_miss);
 
@@ -2687,8 +2693,7 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
                if (unlikely(iocb->ki_pos >= i_size_read(inode)))
                        break;
 
-               error = filemap_get_pages(iocb, iter->count, &fbatch,
-                                         iov_iter_is_pipe(iter));
+               error = filemap_get_pages(iocb, iter->count, &fbatch, false);
                if (error < 0)
                        break;
 
@@ -2872,9 +2877,24 @@ size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
        return spliced;
 }
 
-/*
- * Splice folios from the pagecache of a buffered (ie. non-O_DIRECT) file into
- * a pipe.
+/**
+ * filemap_splice_read -  Splice data from a file's pagecache into a pipe
+ * @in: The file to read from
+ * @ppos: Pointer to the file position to read from
+ * @pipe: The pipe to splice into
+ * @len: The amount to splice
+ * @flags: The SPLICE_F_* flags
+ *
+ * This function gets folios from a file's pagecache and splices them into the
+ * pipe.  Readahead will be called as necessary to fill more folios.  This may
+ * be used for blockdevs also.
+ *
+ * Return: On success, the number of bytes read will be returned and *@ppos
+ * will be updated if appropriate; 0 will be returned if there is no more data
+ * to be read; -EAGAIN will be returned if the pipe had no space, and some
+ * other negative error code will be returned on error.  A short read may occur
+ * if the pipe has insufficient space, we reach the end of the data or we hit a
+ * hole.
  */
 ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
                            struct pipe_inode_info *pipe,
@@ -2887,6 +2907,9 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
        bool writably_mapped;
        int i, error = 0;
 
+       if (unlikely(*ppos >= in->f_mapping->host->i_sb->s_maxbytes))
+               return 0;
+
        init_sync_kiocb(&iocb, in);
        iocb.ki_pos = *ppos;
 
@@ -2900,7 +2923,7 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
        do {
                cond_resched();
 
-               if (*ppos >= i_size_read(file_inode(in)))
+               if (*ppos >= i_size_read(in->f_mapping->host))
                        break;
 
                iocb.ki_pos = *ppos;
@@ -2916,7 +2939,7 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
                 * part of the page is not copied back to userspace (unless
                 * another truncate extends the file - this is desired though).
                 */
-               isize = i_size_read(file_inode(in));
+               isize = i_size_read(in->f_mapping->host);
                if (unlikely(*ppos >= isize))
                        break;
                end_offset = min_t(loff_t, isize, *ppos + len);
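
For context, a hypothetical wiring sketch: with filemap_splice_read() now documented as a general-purpose helper, a buffered filesystem can point its ->splice_read at it directly:

    static const struct file_operations demo_fops = {
            .llseek         = generic_file_llseek,
            .read_iter      = generic_file_read_iter,
            .mmap           = generic_file_mmap,
            .splice_read    = filemap_splice_read,
    };
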
index bbe4162..0814576 100644 (file)
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -51,7 +51,8 @@ static inline void sanity_check_pinned_pages(struct page **pages,
                struct page *page = *pages;
                struct folio *folio = page_folio(page);
 
-               if (!folio_test_anon(folio))
+               if (is_zero_page(page) ||
+                   !folio_test_anon(folio))
                        continue;
                if (!folio_test_large(folio) || folio_test_hugetlb(folio))
                        VM_BUG_ON_PAGE(!PageAnonExclusive(&folio->page), page);
@@ -132,6 +133,13 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
                struct folio *folio;
 
                /*
+                * Don't take a pin on the zero page - it's not going anywhere
+                * and it is used in a *lot* of places.
+                */
+               if (is_zero_page(page))
+                       return page_folio(page);
+
+               /*
                 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
                 * right zone, so fail and let the caller fall back to the slow
                 * path.
@@ -180,6 +188,8 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
 static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 {
        if (flags & FOLL_PIN) {
+               if (is_zero_folio(folio))
+                       return;
                node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
                if (folio_test_large(folio))
                        atomic_sub(refs, &folio->_pincount);
@@ -225,6 +235,13 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
                folio_ref_inc(folio);
        else if (flags & FOLL_PIN) {
                /*
+                * Don't take a pin on the zero page - it's not going anywhere
+                * and it is used in a *lot* of places.
+                */
+               if (is_zero_page(page))
+                       return 0;
+
+               /*
                 * Similar to try_grab_folio(): be sure to *also*
                 * increment the normal page refcount field at least once,
                 * so that the page really is pinned.
@@ -258,6 +275,33 @@ void unpin_user_page(struct page *page)
 }
 EXPORT_SYMBOL(unpin_user_page);
 
+/**
+ * folio_add_pin - Try to get an additional pin on a pinned folio
+ * @folio: The folio to be pinned
+ *
+ * Get an additional pin on a folio we already have a pin on.  Makes no change
+ * if the folio is a zero_page.
+ */
+void folio_add_pin(struct folio *folio)
+{
+       if (is_zero_folio(folio))
+               return;
+
+       /*
+        * Similar to try_grab_folio(): be sure to *also* increment the normal
+        * page refcount field at least once, so that the page really is
+        * pinned.
+        */
+       if (folio_test_large(folio)) {
+               WARN_ON_ONCE(atomic_read(&folio->_pincount) < 1);
+               folio_ref_inc(folio);
+               atomic_inc(&folio->_pincount);
+       } else {
+               WARN_ON_ONCE(folio_ref_count(folio) < GUP_PIN_COUNTING_BIAS);
+               folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
+       }
+}
+
 static inline struct folio *gup_folio_range_next(struct page *start,
                unsigned long npages, unsigned long i, unsigned int *ntails)
 {
@@ -3079,6 +3123,9 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast);
  *
  * FOLL_PIN means that the pages must be released via unpin_user_page(). Please
  * see Documentation/core-api/pin_user_pages.rst for further details.
+ *
+ * Note that if a zero_page is amongst the returned pages, it will not have
+ * pins in it and unpin_user_page() will not remove pins from it.
  */
 int pin_user_pages_fast(unsigned long start, int nr_pages,
                        unsigned int gup_flags, struct page **pages)
@@ -3110,6 +3157,9 @@ EXPORT_SYMBOL_GPL(pin_user_pages_fast);
  *
  * FOLL_PIN means that the pages must be released via unpin_user_page(). Please
  * see Documentation/core-api/pin_user_pages.rst for details.
+ *
+ * Note that if a zero_page is amongst the returned pages, it will not have
+ * pins in it and unpin_user_page*() will not remove pins from it.
  */
 long pin_user_pages_remote(struct mm_struct *mm,
                           unsigned long start, unsigned long nr_pages,
@@ -3143,6 +3193,9 @@ EXPORT_SYMBOL(pin_user_pages_remote);
  *
  * FOLL_PIN means that the pages must be released via unpin_user_page(). Please
  * see Documentation/core-api/pin_user_pages.rst for details.
+ *
+ * Note that if a zero_page is amongst the returned pages, it will not have
+ * pins in it and unpin_user_page*() will not remove pins from it.
  */
 long pin_user_pages(unsigned long start, unsigned long nr_pages,
                    unsigned int gup_flags, struct page **pages,
@@ -3161,6 +3214,9 @@ EXPORT_SYMBOL(pin_user_pages);
  * pin_user_pages_unlocked() is the FOLL_PIN variant of
  * get_user_pages_unlocked(). Behavior is the same, except that this one sets
  * FOLL_PIN and rejects FOLL_GET.
+ *
+ * Note that if a zero_page is amongst the returned pages, it will not have
+ * pins in it and unpin_user_page*() will not remove pins from it.
  */
 long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
                             struct page **pages, unsigned int gup_flags)
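
For context, a hypothetical caller sketch of the zero-page rule noted above: FOLL_PIN pinning takes no reference on the shared zero page, so the usual pin/unpin pairing stays balanced even when a read fault on untouched anonymous memory hands it back:

    static int demo_pin_one(unsigned long uaddr)
    {
            struct page *page;
            int got;

            got = pin_user_pages_fast(uaddr, 1, 0, &page);
            if (got != 1)
                    return got < 0 ? got : -EFAULT;

            /* If 'page' is the zero page, no pin was taken and none is dropped. */
            unpin_user_page(page);
            return 0;
    }
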
index 8ae7307..c0421b7 100644 (file)
@@ -381,6 +381,7 @@ static int gup_test_release(struct inode *inode, struct file *file)
 static const struct file_operations gup_test_fops = {
        .open = nonseekable_open,
        .unlocked_ioctl = gup_test_ioctl,
+       .compat_ioctl = compat_ptr_ioctl,
        .release = gup_test_release,
 };
 
index 68410c6..e6029d9 100644 (file)
@@ -179,12 +179,6 @@ extern unsigned long highest_memmap_pfn;
 #define MAX_RECLAIM_RETRIES 16
 
 /*
- * in mm/early_ioremap.c
- */
-pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
-                                       unsigned long size, pgprot_t prot);
-
-/*
  * in mm/vmscan.c:
  */
 bool isolate_lru_page(struct page *page);
index 2aafc46..392fb27 100644 (file)
@@ -29,7 +29,7 @@
  * canary of every 8 bytes is the same. 64-bit memory can be filled and checked
  * at a time instead of byte by byte to improve performance.
  */
-#define KFENCE_CANARY_PATTERN_U64 ((u64)0xaaaaaaaaaaaaaaaa ^ (u64)(0x0706050403020100))
+#define KFENCE_CANARY_PATTERN_U64 ((u64)0xaaaaaaaaaaaaaaaa ^ (u64)(le64_to_cpu(0x0706050403020100)))
 
 /* Maximum stack depth for reports. */
 #define KFENCE_STACK_DEPTH 64
index 6b9d39d..2d0d58f 100644 (file)
@@ -2070,7 +2070,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
                                        TTU_IGNORE_MLOCK | TTU_BATCH_FLUSH);
 
                xas_lock_irq(&xas);
-               xas_set(&xas, index);
 
                VM_BUG_ON_PAGE(page != xas_load(&xas), page);
 
index 3feafea..50b9211 100644 (file)
@@ -1436,6 +1436,15 @@ done:
                 */
                kmemleak_alloc_phys(found, size, 0);
 
+       /*
+        * Some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP,
+        * require memory to be accepted before it can be used by the
+        * guest.
+        *
+        * Accept the memory of the allocated buffer.
+        */
+       accept_memory(found, found + size);
+
        return found;
 }
 
index 69b90c3..e763e76 100644 (file)
@@ -371,12 +371,15 @@ SYSCALL_DEFINE2(memfd_create,
 
                inode->i_mode &= ~0111;
                file_seals = memfd_file_seals_ptr(file);
-               *file_seals &= ~F_SEAL_SEAL;
-               *file_seals |= F_SEAL_EXEC;
+               if (file_seals) {
+                       *file_seals &= ~F_SEAL_SEAL;
+                       *file_seals |= F_SEAL_EXEC;
+               }
        } else if (flags & MFD_ALLOW_SEALING) {
                /* MFD_EXEC and MFD_ALLOW_SEALING are set */
                file_seals = memfd_file_seals_ptr(file);
-               *file_seals &= ~F_SEAL_SEAL;
+               if (file_seals)
+                       *file_seals &= ~F_SEAL_SEAL;
        }
 
        fd_install(fd, file);
index 7f7f9c6..1cfc08e 100644 (file)
@@ -1375,6 +1375,10 @@ static void __meminit zone_init_free_lists(struct zone *zone)
                INIT_LIST_HEAD(&zone->free_area[order].free_list[t]);
                zone->free_area[order].nr_free = 0;
        }
+
+#ifdef CONFIG_UNACCEPTED_MEMORY
+       INIT_LIST_HEAD(&zone->unaccepted_pages);
+#endif
 }
 
 void __meminit init_currently_empty_zone(struct zone *zone,
@@ -1960,6 +1964,9 @@ static void __init deferred_free_range(unsigned long pfn,
                return;
        }
 
+       /* Accept chunks smaller than MAX_ORDER upfront */
+       accept_memory(PFN_PHYS(pfn), PFN_PHYS(pfn + nr_pages));
+
        for (i = 0; i < nr_pages; i++, page++, pfn++) {
                if (pageblock_aligned(pfn))
                        set_pageblock_migratetype(page, MIGRATE_MOVABLE);
index 13678ed..d600404 100644 (file)
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2318,21 +2318,6 @@ int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
        return __split_vma(vmi, vma, addr, new_below);
 }
 
-static inline int munmap_sidetree(struct vm_area_struct *vma,
-                                  struct ma_state *mas_detach)
-{
-       vma_start_write(vma);
-       mas_set_range(mas_detach, vma->vm_start, vma->vm_end - 1);
-       if (mas_store_gfp(mas_detach, vma, GFP_KERNEL))
-               return -ENOMEM;
-
-       vma_mark_detached(vma, true);
-       if (vma->vm_flags & VM_LOCKED)
-               vma->vm_mm->locked_vm -= vma_pages(vma);
-
-       return 0;
-}
-
 /*
  * do_vmi_align_munmap() - munmap the aligned region from @start to @end.
  * @vmi: The vma iterator
@@ -2354,6 +2339,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
        struct maple_tree mt_detach;
        int count = 0;
        int error = -ENOMEM;
+       unsigned long locked_vm = 0;
        MA_STATE(mas_detach, &mt_detach, 0, 0);
        mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
        mt_set_external_lock(&mt_detach, &mm->mmap_lock);
@@ -2399,9 +2385,13 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
                        if (error)
                                goto end_split_failed;
                }
-               error = munmap_sidetree(next, &mas_detach);
-               if (error)
-                       goto munmap_sidetree_failed;
+               vma_start_write(next);
+               mas_set_range(&mas_detach, next->vm_start, next->vm_end - 1);
+               if (mas_store_gfp(&mas_detach, next, GFP_KERNEL))
+                       goto munmap_gather_failed;
+               vma_mark_detached(next, true);
+               if (next->vm_flags & VM_LOCKED)
+                       locked_vm += vma_pages(next);
 
                count++;
 #ifdef CONFIG_DEBUG_VM_MAPLE_TREE
@@ -2447,10 +2437,12 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
        }
 #endif
        /* Point of no return */
+       error = -ENOMEM;
        vma_iter_set(vmi, start);
        if (vma_iter_clear_gfp(vmi, start, end, GFP_KERNEL))
-               return -ENOMEM;
+               goto clear_tree_failed;
 
+       mm->locked_vm -= locked_vm;
        mm->map_count -= count;
        /*
         * Do not downgrade mmap_lock if we are next to VM_GROWSDOWN or
@@ -2480,9 +2472,14 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
        validate_mm(mm);
        return downgrade ? 1 : 0;
 
+clear_tree_failed:
 userfaultfd_error:
-munmap_sidetree_failed:
+munmap_gather_failed:
 end_split_failed:
+       mas_set(&mas_detach, 0);
+       mas_for_each(&mas_detach, next, end)
+               vma_mark_detached(next, false);
+
        __mt_destroy(&mt_detach);
 start_split_failed:
 map_count_exceeded:
index 92d3d3c..c59e756 100644 (file)
@@ -867,7 +867,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
        }
        tlb_finish_mmu(&tlb);
 
-       if (!error && vma_iter_end(&vmi) < end)
+       if (!error && tmp < end)
                error = -ENOMEM;
 
 out:
index 47421be..d239fba 100644 (file)
@@ -387,6 +387,12 @@ EXPORT_SYMBOL(nr_node_ids);
 EXPORT_SYMBOL(nr_online_nodes);
 #endif
 
+static bool page_contains_unaccepted(struct page *page, unsigned int order);
+static void accept_page(struct page *page, unsigned int order);
+static bool try_to_accept_memory(struct zone *zone, unsigned int order);
+static inline bool has_unaccepted_memory(void);
+static bool __free_unaccepted(struct page *page);
+
 int page_group_by_mobility_disabled __read_mostly;
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
@@ -1481,6 +1487,13 @@ void __free_pages_core(struct page *page, unsigned int order)
 
        atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
 
+       if (page_contains_unaccepted(page, order)) {
+               if (order == MAX_ORDER && __free_unaccepted(page))
+                       return;
+
+               accept_page(page, order);
+       }
+
        /*
         * Bypass PCP and place fresh pages right to the tail, primarily
         * relevant for memory onlining.
@@ -3159,6 +3172,9 @@ static inline long __zone_watermark_unusable_free(struct zone *z,
        if (!(alloc_flags & ALLOC_CMA))
                unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
 #endif
+#ifdef CONFIG_UNACCEPTED_MEMORY
+       unusable_free += zone_page_state(z, NR_UNACCEPTED);
+#endif
 
        return unusable_free;
 }
@@ -3458,6 +3474,11 @@ retry:
                                       gfp_mask)) {
                        int ret;
 
+                       if (has_unaccepted_memory()) {
+                               if (try_to_accept_memory(zone, order))
+                                       goto try_this_zone;
+                       }
+
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
                        /*
                         * Watermark failed for this zone, but see if we can
@@ -3510,6 +3531,11 @@ try_this_zone:
 
                        return page;
                } else {
+                       if (has_unaccepted_memory()) {
+                               if (try_to_accept_memory(zone, order))
+                                       goto try_this_zone;
+                       }
+
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
                        /* Try again if zone has deferred pages */
                        if (deferred_pages_enabled()) {
@@ -7215,3 +7241,150 @@ bool has_managed_dma(void)
        return false;
 }
 #endif /* CONFIG_ZONE_DMA */
+
+#ifdef CONFIG_UNACCEPTED_MEMORY
+
+/* Counts number of zones with unaccepted pages. */
+static DEFINE_STATIC_KEY_FALSE(zones_with_unaccepted_pages);
+
+static bool lazy_accept = true;
+
+static int __init accept_memory_parse(char *p)
+{
+       if (!strcmp(p, "lazy")) {
+               lazy_accept = true;
+               return 0;
+       } else if (!strcmp(p, "eager")) {
+               lazy_accept = false;
+               return 0;
+       } else {
+               return -EINVAL;
+       }
+}
+early_param("accept_memory", accept_memory_parse);
+
+static bool page_contains_unaccepted(struct page *page, unsigned int order)
+{
+       phys_addr_t start = page_to_phys(page);
+       phys_addr_t end = start + (PAGE_SIZE << order);
+
+       return range_contains_unaccepted_memory(start, end);
+}
+
+static void accept_page(struct page *page, unsigned int order)
+{
+       phys_addr_t start = page_to_phys(page);
+
+       accept_memory(start, start + (PAGE_SIZE << order));
+}
+
+static bool try_to_accept_memory_one(struct zone *zone)
+{
+       unsigned long flags;
+       struct page *page;
+       bool last;
+
+       if (list_empty(&zone->unaccepted_pages))
+               return false;
+
+       spin_lock_irqsave(&zone->lock, flags);
+       page = list_first_entry_or_null(&zone->unaccepted_pages,
+                                       struct page, lru);
+       if (!page) {
+               spin_unlock_irqrestore(&zone->lock, flags);
+               return false;
+       }
+
+       list_del(&page->lru);
+       last = list_empty(&zone->unaccepted_pages);
+
+       __mod_zone_freepage_state(zone, -MAX_ORDER_NR_PAGES, MIGRATE_MOVABLE);
+       __mod_zone_page_state(zone, NR_UNACCEPTED, -MAX_ORDER_NR_PAGES);
+       spin_unlock_irqrestore(&zone->lock, flags);
+
+       accept_page(page, MAX_ORDER);
+
+       __free_pages_ok(page, MAX_ORDER, FPI_TO_TAIL);
+
+       if (last)
+               static_branch_dec(&zones_with_unaccepted_pages);
+
+       return true;
+}
+
+static bool try_to_accept_memory(struct zone *zone, unsigned int order)
+{
+       long to_accept;
+       int ret = false;
+
+       /* How much to accept to get to high watermark? */
+       to_accept = high_wmark_pages(zone) -
+                   (zone_page_state(zone, NR_FREE_PAGES) -
+                   __zone_watermark_unusable_free(zone, order, 0));
+
+       /* Accept at least one page */
+       do {
+               if (!try_to_accept_memory_one(zone))
+                       break;
+               ret = true;
+               to_accept -= MAX_ORDER_NR_PAGES;
+       } while (to_accept > 0);
+
+       return ret;
+}
+
+static inline bool has_unaccepted_memory(void)
+{
+       return static_branch_unlikely(&zones_with_unaccepted_pages);
+}
+
+static bool __free_unaccepted(struct page *page)
+{
+       struct zone *zone = page_zone(page);
+       unsigned long flags;
+       bool first = false;
+
+       if (!lazy_accept)
+               return false;
+
+       spin_lock_irqsave(&zone->lock, flags);
+       first = list_empty(&zone->unaccepted_pages);
+       list_add_tail(&page->lru, &zone->unaccepted_pages);
+       __mod_zone_freepage_state(zone, MAX_ORDER_NR_PAGES, MIGRATE_MOVABLE);
+       __mod_zone_page_state(zone, NR_UNACCEPTED, MAX_ORDER_NR_PAGES);
+       spin_unlock_irqrestore(&zone->lock, flags);
+
+       if (first)
+               static_branch_inc(&zones_with_unaccepted_pages);
+
+       return true;
+}
+
+#else
+
+static bool page_contains_unaccepted(struct page *page, unsigned int order)
+{
+       return false;
+}
+
+static void accept_page(struct page *page, unsigned int order)
+{
+}
+
+static bool try_to_accept_memory(struct zone *zone, unsigned int order)
+{
+       return false;
+}
+
+static inline bool has_unaccepted_memory(void)
+{
+       return false;
+}
+
+static bool __free_unaccepted(struct page *page)
+{
+       BUILD_BUG();
+       return false;
+}
+
+#endif /* CONFIG_UNACCEPTED_MEMORY */
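With accept_memory=lazy (the default), MAX_ORDER chunks are parked on the per-zone unaccepted_pages list and accepted on demand; accept_memory=eager makes __free_unaccepted() decline, so everything is accepted during boot. As a rough model of the lazy path, try_to_accept_memory() keeps accepting MAX_ORDER chunks until the free pages (minus unusable ones) reach the high watermark. An illustrative userspace sketch of that arithmetic (the page counts are made up, not the kernel's):

#include <stdio.h>

/* One MAX_ORDER chunk, assuming 4 MiB of 4 KiB pages. */
#define MAX_ORDER_NR_PAGES	(1UL << 10)

static unsigned long chunks_to_accept(long high_wmark, long free, long unusable)
{
	long to_accept = high_wmark - (free - unusable);
	unsigned long chunks = 0;

	/* Accept at least one chunk, then continue while still below the
	 * high watermark, mirroring the do/while in try_to_accept_memory(). */
	do {
		chunks++;
		to_accept -= MAX_ORDER_NR_PAGES;
	} while (to_accept > 0);

	return chunks;
}

int main(void)
{
	printf("%lu chunk(s) to accept\n", chunks_to_accept(16384, 8192, 2048));
	return 0;
}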
index 87b682d..684cd3c 100644 (file)
@@ -338,7 +338,7 @@ static void swap_writepage_bdev_sync(struct page *page,
        bio_init(&bio, sis->bdev, &bv, 1,
                 REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc));
        bio.bi_iter.bi_sector = swap_page_sector(page);
-       bio_add_page(&bio, page, thp_size(page), 0);
+       __bio_add_page(&bio, page, thp_size(page), 0);
 
        bio_associate_blkg_from_page(&bio, page);
        count_swpout_vm_event(page);
@@ -360,7 +360,7 @@ static void swap_writepage_bdev_async(struct page *page,
                        GFP_NOIO);
        bio->bi_iter.bi_sector = swap_page_sector(page);
        bio->bi_end_io = end_swap_bio_write;
-       bio_add_page(bio, page, thp_size(page), 0);
+       __bio_add_page(bio, page, thp_size(page), 0);
 
        bio_associate_blkg_from_page(bio, page);
        count_swpout_vm_event(page);
@@ -468,7 +468,7 @@ static void swap_readpage_bdev_sync(struct page *page,
 
        bio_init(&bio, sis->bdev, &bv, 1, REQ_OP_READ);
        bio.bi_iter.bi_sector = swap_page_sector(page);
-       bio_add_page(&bio, page, thp_size(page), 0);
+       __bio_add_page(&bio, page, thp_size(page), 0);
        /*
         * Keep this task valid during swap readpage because the oom killer may
         * attempt to access it in the page fault retry time check.
@@ -488,7 +488,7 @@ static void swap_readpage_bdev_async(struct page *page,
        bio = bio_alloc(sis->bdev, 1, REQ_OP_READ, GFP_KERNEL);
        bio->bi_iter.bi_sector = swap_page_sector(page);
        bio->bi_end_io = end_swap_bio_read;
-       bio_add_page(bio, page, thp_size(page), 0);
+       __bio_add_page(bio, page, thp_size(page), 0);
        count_vm_event(PSWPIN);
        submit_bio(bio);
 }
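These bios are initialised (or allocated) with room for exactly one segment and always receive exactly one page, so the capacity-checking bio_add_page() is unnecessary and the non-checking __bio_add_page() can be used. A hypothetical helper showing the same pattern (not part of the patch):

#include <linux/bio.h>

/* The caller guarantees the bio has space for one vec, so __bio_add_page()
 * can append the page unconditionally. */
static void write_one_page_sync(struct block_device *bdev, struct page *page,
				sector_t sector)
{
	struct bio_vec bv;
	struct bio bio;

	bio_init(&bio, bdev, &bv, 1, REQ_OP_WRITE);
	bio.bi_iter.bi_sector = sector;
	__bio_add_page(&bio, page, PAGE_SIZE, 0);
	submit_bio_wait(&bio);
}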
index 25d8610..f2baf97 100644 (file)
@@ -71,6 +71,8 @@ static void page_table_check_clear(struct mm_struct *mm, unsigned long addr,
 
        page = pfn_to_page(pfn);
        page_ext = page_ext_get(page);
+
+       BUG_ON(PageSlab(page));
        anon = PageAnon(page);
 
        for (i = 0; i < pgcnt; i++) {
@@ -107,6 +109,8 @@ static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
 
        page = pfn_to_page(pfn);
        page_ext = page_ext_get(page);
+
+       BUG_ON(PageSlab(page));
        anon = PageAnon(page);
 
        for (i = 0; i < pgcnt; i++) {
@@ -133,6 +137,8 @@ void __page_table_check_zero(struct page *page, unsigned int order)
        struct page_ext *page_ext;
        unsigned long i;
 
+       BUG_ON(PageSlab(page));
+
        page_ext = page_ext_get(page);
        BUG_ON(!page_ext);
        for (i = 0; i < (1ul << order); i++) {
index e40a08c..1f504ed 100644 (file)
@@ -2731,6 +2731,138 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
        return retval ? retval : error;
 }
 
+static bool zero_pipe_buf_get(struct pipe_inode_info *pipe,
+                             struct pipe_buffer *buf)
+{
+       return true;
+}
+
+static void zero_pipe_buf_release(struct pipe_inode_info *pipe,
+                                 struct pipe_buffer *buf)
+{
+}
+
+static bool zero_pipe_buf_try_steal(struct pipe_inode_info *pipe,
+                                   struct pipe_buffer *buf)
+{
+       return false;
+}
+
+static const struct pipe_buf_operations zero_pipe_buf_ops = {
+       .release        = zero_pipe_buf_release,
+       .try_steal      = zero_pipe_buf_try_steal,
+       .get            = zero_pipe_buf_get,
+};
+
+static size_t splice_zeropage_into_pipe(struct pipe_inode_info *pipe,
+                                       loff_t fpos, size_t size)
+{
+       size_t offset = fpos & ~PAGE_MASK;
+
+       size = min_t(size_t, size, PAGE_SIZE - offset);
+
+       if (!pipe_full(pipe->head, pipe->tail, pipe->max_usage)) {
+               struct pipe_buffer *buf = pipe_head_buf(pipe);
+
+               *buf = (struct pipe_buffer) {
+                       .ops    = &zero_pipe_buf_ops,
+                       .page   = ZERO_PAGE(0),
+                       .offset = offset,
+                       .len    = size,
+               };
+               pipe->head++;
+       }
+
+       return size;
+}
+
+static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
+                                     struct pipe_inode_info *pipe,
+                                     size_t len, unsigned int flags)
+{
+       struct inode *inode = file_inode(in);
+       struct address_space *mapping = inode->i_mapping;
+       struct folio *folio = NULL;
+       size_t total_spliced = 0, used, npages, n, part;
+       loff_t isize;
+       int error = 0;
+
+       /* Work out how much data we can actually add into the pipe */
+       used = pipe_occupancy(pipe->head, pipe->tail);
+       npages = max_t(ssize_t, pipe->max_usage - used, 0);
+       len = min_t(size_t, len, npages * PAGE_SIZE);
+
+       do {
+               if (*ppos >= i_size_read(inode))
+                       break;
+
+               error = shmem_get_folio(inode, *ppos / PAGE_SIZE, &folio, SGP_READ);
+               if (error) {
+                       if (error == -EINVAL)
+                               error = 0;
+                       break;
+               }
+               if (folio) {
+                       folio_unlock(folio);
+
+                       if (folio_test_hwpoison(folio)) {
+                               error = -EIO;
+                               break;
+                       }
+               }
+
+               /*
+                * i_size must be checked after we know the pages are Uptodate.
+                *
+                * Checking i_size after that check allows us to calculate
+                * the correct value for "part", so the zero-filled
+                * tail of the page is not copied back to userspace (unless
+                * another truncate extends the file - this is desired though).
+                */
+               isize = i_size_read(inode);
+               if (unlikely(*ppos >= isize))
+                       break;
+               part = min_t(loff_t, isize - *ppos, len);
+
+               if (folio) {
+                       /*
+                        * If users can be writing to this page using arbitrary
+                        * virtual addresses, take care about potential aliasing
+                        * before reading the page on the kernel side.
+                        */
+                       if (mapping_writably_mapped(mapping))
+                               flush_dcache_folio(folio);
+                       folio_mark_accessed(folio);
+                       /*
+                        * Ok, we have the page, and it's up-to-date, so we can
+                        * now splice it into the pipe.
+                        */
+                       n = splice_folio_into_pipe(pipe, folio, *ppos, part);
+                       folio_put(folio);
+                       folio = NULL;
+               } else {
+                       n = splice_zeropage_into_pipe(pipe, *ppos, len);
+               }
+
+               if (!n)
+                       break;
+               len -= n;
+               total_spliced += n;
+               *ppos += n;
+               in->f_ra.prev_pos = *ppos;
+               if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
+                       break;
+
+               cond_resched();
+       } while (len);
+
+       if (folio)
+               folio_put(folio);
+
+       file_accessed(in);
+       return total_spliced ? total_spliced : error;
+}
+
 static loff_t shmem_file_llseek(struct file *file, loff_t offset, int whence)
 {
        struct address_space *mapping = file->f_mapping;
@@ -3971,7 +4103,7 @@ static const struct file_operations shmem_file_operations = {
        .read_iter      = shmem_file_read_iter,
        .write_iter     = generic_file_write_iter,
        .fsync          = noop_fsync,
-       .splice_read    = generic_file_splice_read,
+       .splice_read    = shmem_file_splice_read,
        .splice_write   = iter_file_splice_write,
        .fallocate      = shmem_fallocate,
 #endif
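Holes in a tmpfs file are now spliced through the shared zero page instead of allocating and zeroing backing pages. A small userspace sketch that exercises this path (the file name and sizes are arbitrary):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int pipefd[2];
	/* Assumes /dev/shm is tmpfs; the file stays entirely sparse. */
	int fd = open("/dev/shm/splice-demo", O_CREAT | O_RDWR | O_TRUNC, 0600);

	if (fd < 0 || pipe(pipefd) < 0) {
		perror("setup");
		return 1;
	}
	ftruncate(fd, 1 << 20);		/* 1 MiB hole, no pages instantiated */

	ssize_t n = splice(fd, NULL, pipefd[1], NULL, 65536, 0);
	printf("spliced %zd bytes of zeroes\n", n);
	return 0;
}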
index 3f83b10..3ab53fa 100644 (file)
@@ -5,12 +5,10 @@
 #include <linux/seq_file.h>
 #include <linux/shrinker.h>
 #include <linux/memcontrol.h>
-#include <linux/srcu.h>
 
 /* defined in vmscan.c */
-extern struct mutex shrinker_mutex;
+extern struct rw_semaphore shrinker_rwsem;
 extern struct list_head shrinker_list;
-extern struct srcu_struct shrinker_srcu;
 
 static DEFINE_IDA(shrinker_debugfs_ida);
 static struct dentry *shrinker_debugfs_root;
@@ -51,13 +49,18 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
        struct mem_cgroup *memcg;
        unsigned long total;
        bool memcg_aware;
-       int ret = 0, nid, srcu_idx;
+       int ret, nid;
 
        count_per_node = kcalloc(nr_node_ids, sizeof(unsigned long), GFP_KERNEL);
        if (!count_per_node)
                return -ENOMEM;
 
-       srcu_idx = srcu_read_lock(&shrinker_srcu);
+       ret = down_read_killable(&shrinker_rwsem);
+       if (ret) {
+               kfree(count_per_node);
+               return ret;
+       }
+       rcu_read_lock();
 
        memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
 
@@ -88,7 +91,8 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
                }
        } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
 
-       srcu_read_unlock(&shrinker_srcu, srcu_idx);
+       rcu_read_unlock();
+       up_read(&shrinker_rwsem);
 
        kfree(count_per_node);
        return ret;
@@ -111,8 +115,9 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
                .gfp_mask = GFP_KERNEL,
        };
        struct mem_cgroup *memcg = NULL;
-       int nid, srcu_idx;
+       int nid;
        char kbuf[72];
+       ssize_t ret;
 
        read_len = size < (sizeof(kbuf) - 1) ? size : (sizeof(kbuf) - 1);
        if (copy_from_user(kbuf, buf, read_len))
@@ -141,7 +146,11 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
                return -EINVAL;
        }
 
-       srcu_idx = srcu_read_lock(&shrinker_srcu);
+       ret = down_read_killable(&shrinker_rwsem);
+       if (ret) {
+               mem_cgroup_put(memcg);
+               return ret;
+       }
 
        sc.nid = nid;
        sc.memcg = memcg;
@@ -150,7 +159,7 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
 
        shrinker->scan_objects(shrinker, &sc);
 
-       srcu_read_unlock(&shrinker_srcu, srcu_idx);
+       up_read(&shrinker_rwsem);
        mem_cgroup_put(memcg);
 
        return size;
@@ -168,7 +177,7 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
        char buf[128];
        int id;
 
-       lockdep_assert_held(&shrinker_mutex);
+       lockdep_assert_held(&shrinker_rwsem);
 
        /* debugfs isn't initialized yet, add debugfs entries later. */
        if (!shrinker_debugfs_root)
@@ -211,7 +220,7 @@ int shrinker_debugfs_rename(struct shrinker *shrinker, const char *fmt, ...)
        if (!new)
                return -ENOMEM;
 
-       mutex_lock(&shrinker_mutex);
+       down_write(&shrinker_rwsem);
 
        old = shrinker->name;
        shrinker->name = new;
@@ -229,7 +238,7 @@ int shrinker_debugfs_rename(struct shrinker *shrinker, const char *fmt, ...)
                        shrinker->debugfs_entry = entry;
        }
 
-       mutex_unlock(&shrinker_mutex);
+       up_write(&shrinker_rwsem);
 
        kfree_const(old);
 
@@ -237,23 +246,28 @@ int shrinker_debugfs_rename(struct shrinker *shrinker, const char *fmt, ...)
 }
 EXPORT_SYMBOL(shrinker_debugfs_rename);
 
-struct dentry *shrinker_debugfs_remove(struct shrinker *shrinker)
+struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker,
+                                      int *debugfs_id)
 {
        struct dentry *entry = shrinker->debugfs_entry;
 
-       lockdep_assert_held(&shrinker_mutex);
+       lockdep_assert_held(&shrinker_rwsem);
 
        kfree_const(shrinker->name);
        shrinker->name = NULL;
 
-       if (entry) {
-               ida_free(&shrinker_debugfs_ida, shrinker->debugfs_id);
-               shrinker->debugfs_entry = NULL;
-       }
+       *debugfs_id = entry ? shrinker->debugfs_id : -1;
+       shrinker->debugfs_entry = NULL;
 
        return entry;
 }
 
+void shrinker_debugfs_remove(struct dentry *debugfs_entry, int debugfs_id)
+{
+       debugfs_remove_recursive(debugfs_entry);
+       ida_free(&shrinker_debugfs_ida, debugfs_id);
+}
+
 static int __init shrinker_debugfs_init(void)
 {
        struct shrinker *shrinker;
@@ -266,14 +280,14 @@ static int __init shrinker_debugfs_init(void)
        shrinker_debugfs_root = dentry;
 
        /* Create debugfs entries for shrinkers registered at boot */
-       mutex_lock(&shrinker_mutex);
+       down_write(&shrinker_rwsem);
        list_for_each_entry(shrinker, &shrinker_list, list)
                if (!shrinker->debugfs_entry) {
                        ret = shrinker_debugfs_add(shrinker);
                        if (ret)
                                break;
                }
-       mutex_unlock(&shrinker_mutex);
+       up_write(&shrinker_rwsem);
 
        return ret;
 }
index f01ac25..bc36edd 100644 (file)
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -6,6 +6,38 @@
  */
 void __init kmem_cache_init(void);
 
+#ifdef CONFIG_64BIT
+# ifdef system_has_cmpxchg128
+# define system_has_freelist_aba()     system_has_cmpxchg128()
+# define try_cmpxchg_freelist          try_cmpxchg128
+# endif
+#define this_cpu_try_cmpxchg_freelist  this_cpu_try_cmpxchg128
+typedef u128 freelist_full_t;
+#else /* CONFIG_64BIT */
+# ifdef system_has_cmpxchg64
+# define system_has_freelist_aba()     system_has_cmpxchg64()
+# define try_cmpxchg_freelist          try_cmpxchg64
+# endif
+#define this_cpu_try_cmpxchg_freelist  this_cpu_try_cmpxchg64
+typedef u64 freelist_full_t;
+#endif /* CONFIG_64BIT */
+
+#if defined(system_has_freelist_aba) && !defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
+#undef system_has_freelist_aba
+#endif
+
+/*
+ * Freelist pointer and counter to cmpxchg together, avoids the typical ABA
+ * problems with cmpxchg of just a pointer.
+ */
+typedef union {
+       struct {
+               void *freelist;
+               unsigned long counter;
+       };
+       freelist_full_t full;
+} freelist_aba_t;
+
 /* Reuses the bits in struct page */
 struct slab {
        unsigned long __page_flags;
@@ -38,14 +70,21 @@ struct slab {
 #endif
                        };
                        /* Double-word boundary */
-                       void *freelist;         /* first free object */
                        union {
-                               unsigned long counters;
                                struct {
-                                       unsigned inuse:16;
-                                       unsigned objects:15;
-                                       unsigned frozen:1;
+                                       void *freelist;         /* first free object */
+                                       union {
+                                               unsigned long counters;
+                                               struct {
+                                                       unsigned inuse:16;
+                                                       unsigned objects:15;
+                                                       unsigned frozen:1;
+                                               };
+                                       };
                                };
+#ifdef system_has_freelist_aba
+                               freelist_aba_t freelist_counter;
+#endif
                        };
                };
                struct rcu_head rcu_head;
@@ -72,8 +111,8 @@ SLAB_MATCH(memcg_data, memcg_data);
 #endif
 #undef SLAB_MATCH
 static_assert(sizeof(struct slab) <= sizeof(struct page));
-#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && defined(CONFIG_SLUB)
-static_assert(IS_ALIGNED(offsetof(struct slab, freelist), 2*sizeof(void *)));
+#if defined(system_has_freelist_aba) && defined(CONFIG_SLUB)
+static_assert(IS_ALIGNED(offsetof(struct slab, freelist), sizeof(freelist_aba_t)));
 #endif
 
 /**
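Packing the freelist pointer and a counter into one unit lets both be updated by a single wide cmpxchg, so a pointer that was freed and reallocated between the read and the update (the classic ABA race) no longer matches because the counter has moved on. A reduced userspace model of the same idea, using a 32-bit slot index plus a 32-bit tag in one 64-bit compare-and-swap (illustrative only, not the kernel's types):

#include <stdbool.h>
#include <stdint.h>

/* Free-list head: slot index in the low half, generation tag in the high
 * half. Every successful update bumps the tag, so a stale (index, tag)
 * snapshot taken before a free/realloc cycle can never win the CAS. */
typedef union {
	struct {
		uint32_t index;	/* plays the role of the freelist pointer */
		uint32_t tag;	/* plays the role of the counter */
	};
	uint64_t full;
} tagged_head;

static bool pop_head(uint64_t *head, uint32_t next_index, tagged_head *out)
{
	tagged_head old, new;

	old.full = __atomic_load_n(head, __ATOMIC_ACQUIRE);
	new.index = next_index;
	new.tag = old.tag + 1;

	*out = old;
	return __atomic_compare_exchange_n(head, &old.full, new.full, false,
					   __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
}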
index c87628c..7529626 100644 (file)
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -292,7 +292,12 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
 /* Poison object */
 #define __OBJECT_POISON                ((slab_flags_t __force)0x80000000U)
 /* Use cmpxchg_double */
+
+#ifdef system_has_freelist_aba
 #define __CMPXCHG_DOUBLE       ((slab_flags_t __force)0x40000000U)
+#else
+#define __CMPXCHG_DOUBLE       ((slab_flags_t __force)0U)
+#endif
 
 /*
  * Tracking user of a slab.
@@ -512,6 +517,40 @@ static __always_inline void slab_unlock(struct slab *slab)
        __bit_spin_unlock(PG_locked, &page->flags);
 }
 
+static inline bool
+__update_freelist_fast(struct slab *slab,
+                     void *freelist_old, unsigned long counters_old,
+                     void *freelist_new, unsigned long counters_new)
+{
+#ifdef system_has_freelist_aba
+       freelist_aba_t old = { .freelist = freelist_old, .counter = counters_old };
+       freelist_aba_t new = { .freelist = freelist_new, .counter = counters_new };
+
+       return try_cmpxchg_freelist(&slab->freelist_counter.full, &old.full, new.full);
+#else
+       return false;
+#endif
+}
+
+static inline bool
+__update_freelist_slow(struct slab *slab,
+                     void *freelist_old, unsigned long counters_old,
+                     void *freelist_new, unsigned long counters_new)
+{
+       bool ret = false;
+
+       slab_lock(slab);
+       if (slab->freelist == freelist_old &&
+           slab->counters == counters_old) {
+               slab->freelist = freelist_new;
+               slab->counters = counters_new;
+               ret = true;
+       }
+       slab_unlock(slab);
+
+       return ret;
+}
+
 /*
  * Interrupts must be disabled (for the fallback code to work right), typically
  * by an _irqsave() lock variant. On PREEMPT_RT the preempt_disable(), which is
@@ -519,33 +558,25 @@ static __always_inline void slab_unlock(struct slab *slab)
  * allocation/ free operation in hardirq context. Therefore nothing can
  * interrupt the operation.
  */
-static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct slab *slab,
+static inline bool __slab_update_freelist(struct kmem_cache *s, struct slab *slab,
                void *freelist_old, unsigned long counters_old,
                void *freelist_new, unsigned long counters_new,
                const char *n)
 {
+       bool ret;
+
        if (USE_LOCKLESS_FAST_PATH())
                lockdep_assert_irqs_disabled();
-#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
-    defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
+
        if (s->flags & __CMPXCHG_DOUBLE) {
-               if (cmpxchg_double(&slab->freelist, &slab->counters,
-                                  freelist_old, counters_old,
-                                  freelist_new, counters_new))
-                       return true;
-       } else
-#endif
-       {
-               slab_lock(slab);
-               if (slab->freelist == freelist_old &&
-                                       slab->counters == counters_old) {
-                       slab->freelist = freelist_new;
-                       slab->counters = counters_new;
-                       slab_unlock(slab);
-                       return true;
-               }
-               slab_unlock(slab);
+               ret = __update_freelist_fast(slab, freelist_old, counters_old,
+                                           freelist_new, counters_new);
+       } else {
+               ret = __update_freelist_slow(slab, freelist_old, counters_old,
+                                           freelist_new, counters_new);
        }
+       if (likely(ret))
+               return true;
 
        cpu_relax();
        stat(s, CMPXCHG_DOUBLE_FAIL);
@@ -557,36 +588,26 @@ static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct slab *slab
        return false;
 }
 
-static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct slab *slab,
+static inline bool slab_update_freelist(struct kmem_cache *s, struct slab *slab,
                void *freelist_old, unsigned long counters_old,
                void *freelist_new, unsigned long counters_new,
                const char *n)
 {
-#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
-    defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
+       bool ret;
+
        if (s->flags & __CMPXCHG_DOUBLE) {
-               if (cmpxchg_double(&slab->freelist, &slab->counters,
-                                  freelist_old, counters_old,
-                                  freelist_new, counters_new))
-                       return true;
-       } else
-#endif
-       {
+               ret = __update_freelist_fast(slab, freelist_old, counters_old,
+                                           freelist_new, counters_new);
+       } else {
                unsigned long flags;
 
                local_irq_save(flags);
-               slab_lock(slab);
-               if (slab->freelist == freelist_old &&
-                                       slab->counters == counters_old) {
-                       slab->freelist = freelist_new;
-                       slab->counters = counters_new;
-                       slab_unlock(slab);
-                       local_irq_restore(flags);
-                       return true;
-               }
-               slab_unlock(slab);
+               ret = __update_freelist_slow(slab, freelist_old, counters_old,
+                                           freelist_new, counters_new);
                local_irq_restore(flags);
        }
+       if (likely(ret))
+               return true;
 
        cpu_relax();
        stat(s, CMPXCHG_DOUBLE_FAIL);
@@ -2228,7 +2249,7 @@ static inline void *acquire_slab(struct kmem_cache *s,
        VM_BUG_ON(new.frozen);
        new.frozen = 1;
 
-       if (!__cmpxchg_double_slab(s, slab,
+       if (!__slab_update_freelist(s, slab,
                        freelist, counters,
                        new.freelist, new.counters,
                        "acquire_slab"))
@@ -2554,7 +2575,7 @@ redo:
        }
 
 
-       if (!cmpxchg_double_slab(s, slab,
+       if (!slab_update_freelist(s, slab,
                                old.freelist, old.counters,
                                new.freelist, new.counters,
                                "unfreezing slab")) {
@@ -2611,7 +2632,7 @@ static void __unfreeze_partials(struct kmem_cache *s, struct slab *partial_slab)
 
                        new.frozen = 0;
 
-               } while (!__cmpxchg_double_slab(s, slab,
+               } while (!__slab_update_freelist(s, slab,
                                old.freelist, old.counters,
                                new.freelist, new.counters,
                                "unfreezing slab"));
@@ -3008,6 +3029,18 @@ static inline bool pfmemalloc_match(struct slab *slab, gfp_t gfpflags)
 }
 
 #ifndef CONFIG_SLUB_TINY
+static inline bool
+__update_cpu_freelist_fast(struct kmem_cache *s,
+                          void *freelist_old, void *freelist_new,
+                          unsigned long tid)
+{
+       freelist_aba_t old = { .freelist = freelist_old, .counter = tid };
+       freelist_aba_t new = { .freelist = freelist_new, .counter = next_tid(tid) };
+
+       return this_cpu_try_cmpxchg_freelist(s->cpu_slab->freelist_tid.full,
+                                            &old.full, new.full);
+}
+
 /*
  * Check the slab->freelist and either transfer the freelist to the
  * per cpu freelist or deactivate the slab.
@@ -3034,7 +3067,7 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
                new.inuse = slab->objects;
                new.frozen = freelist != NULL;
 
-       } while (!__cmpxchg_double_slab(s, slab,
+       } while (!__slab_update_freelist(s, slab,
                freelist, counters,
                NULL, new.counters,
                "get_freelist"));
@@ -3359,11 +3392,7 @@ redo:
                 * against code executing on this cpu *not* from access by
                 * other cpus.
                 */
-               if (unlikely(!this_cpu_cmpxchg_double(
-                               s->cpu_slab->freelist, s->cpu_slab->tid,
-                               object, tid,
-                               next_object, next_tid(tid)))) {
-
+               if (unlikely(!__update_cpu_freelist_fast(s, object, next_object, tid))) {
                        note_cmpxchg_failure("slab_alloc", s, tid);
                        goto redo;
                }
@@ -3631,7 +3660,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
                        }
                }
 
-       } while (!cmpxchg_double_slab(s, slab,
+       } while (!slab_update_freelist(s, slab,
                prior, counters,
                head, new.counters,
                "__slab_free"));
@@ -3736,11 +3765,7 @@ redo:
 
                set_freepointer(s, tail_obj, freelist);
 
-               if (unlikely(!this_cpu_cmpxchg_double(
-                               s->cpu_slab->freelist, s->cpu_slab->tid,
-                               freelist, tid,
-                               head, next_tid(tid)))) {
-
+               if (unlikely(!__update_cpu_freelist_fast(s, freelist, head, tid))) {
                        note_cmpxchg_failure("slab_free", s, tid);
                        goto redo;
                }
@@ -4505,11 +4530,11 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
                }
        }
 
-#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
-    defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
-       if (system_has_cmpxchg_double() && (s->flags & SLAB_NO_CMPXCHG) == 0)
+#ifdef system_has_freelist_aba
+       if (system_has_freelist_aba() && !(s->flags & SLAB_NO_CMPXCHG)) {
                /* Enable fast mode */
                s->flags |= __CMPXCHG_DOUBLE;
+       }
 #endif
 
        /*
index 274bbf7..6bc8306 100644 (file)
@@ -2539,7 +2539,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
                struct block_device *bdev = I_BDEV(inode);
 
                set_blocksize(bdev, old_block_size);
-               blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
+               blkdev_put(bdev, p);
        }
 
        inode_lock(inode);
@@ -2770,7 +2770,7 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
 
        if (S_ISBLK(inode->i_mode)) {
                p->bdev = blkdev_get_by_dev(inode->i_rdev,
-                                  FMODE_READ | FMODE_WRITE | FMODE_EXCL, p);
+                               BLK_OPEN_READ | BLK_OPEN_WRITE, p, NULL);
                if (IS_ERR(p->bdev)) {
                        error = PTR_ERR(p->bdev);
                        p->bdev = NULL;
@@ -3221,7 +3221,7 @@ bad_swap:
        p->cluster_next_cpu = NULL;
        if (inode && S_ISBLK(inode->i_mode) && p->bdev) {
                set_blocksize(p->bdev, p->old_block_size);
-               blkdev_put(p->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
+               blkdev_put(p->bdev, p);
        }
        inode = NULL;
        destroy_swap_extents(p);
index 9683573..1d13d71 100644 (file)
@@ -3098,11 +3098,20 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
         * allocation request, free them via vfree() if any.
         */
        if (area->nr_pages != nr_small_pages) {
-               /* vm_area_alloc_pages() can also fail due to a fatal signal */
-               if (!fatal_signal_pending(current))
+               /*
+                * vm_area_alloc_pages() can fail due to insufficient memory but
+                * also:
+                *
+                * - a pending fatal signal
+                * - insufficient huge page-order pages
+                *
+                * Since we always retry allocations at order-0 in the huge page
+                * case, a warning for either is spurious.
+                */
+               if (!fatal_signal_pending(current) && page_order == 0)
                        warn_alloc(gfp_mask, NULL,
-                               "vmalloc error: size %lu, page order %u, failed to allocate pages",
-                               area->nr_pages * PAGE_SIZE, page_order);
+                               "vmalloc error: size %lu, failed to allocate pages",
+                               area->nr_pages * PAGE_SIZE);
                goto fail;
        }
 
index d257916..5bf98d0 100644 (file)
@@ -35,7 +35,7 @@
 #include <linux/cpuset.h>
 #include <linux/compaction.h>
 #include <linux/notifier.h>
-#include <linux/mutex.h>
+#include <linux/rwsem.h>
 #include <linux/delay.h>
 #include <linux/kthread.h>
 #include <linux/freezer.h>
@@ -57,7 +57,6 @@
 #include <linux/khugepaged.h>
 #include <linux/rculist_nulls.h>
 #include <linux/random.h>
-#include <linux/srcu.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -190,9 +189,7 @@ struct scan_control {
 int vm_swappiness = 60;
 
 LIST_HEAD(shrinker_list);
-DEFINE_MUTEX(shrinker_mutex);
-DEFINE_SRCU(shrinker_srcu);
-static atomic_t shrinker_srcu_generation = ATOMIC_INIT(0);
+DECLARE_RWSEM(shrinker_rwsem);
 
 #ifdef CONFIG_MEMCG
 static int shrinker_nr_max;
@@ -211,21 +208,8 @@ static inline int shrinker_defer_size(int nr_items)
 static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg,
                                                     int nid)
 {
-       return srcu_dereference_check(memcg->nodeinfo[nid]->shrinker_info,
-                                     &shrinker_srcu,
-                                     lockdep_is_held(&shrinker_mutex));
-}
-
-static struct shrinker_info *shrinker_info_srcu(struct mem_cgroup *memcg,
-                                                    int nid)
-{
-       return srcu_dereference(memcg->nodeinfo[nid]->shrinker_info,
-                               &shrinker_srcu);
-}
-
-static void free_shrinker_info_rcu(struct rcu_head *head)
-{
-       kvfree(container_of(head, struct shrinker_info, rcu));
+       return rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_info,
+                                        lockdep_is_held(&shrinker_rwsem));
 }
 
 static int expand_one_shrinker_info(struct mem_cgroup *memcg,
@@ -266,7 +250,7 @@ static int expand_one_shrinker_info(struct mem_cgroup *memcg,
                       defer_size - old_defer_size);
 
                rcu_assign_pointer(pn->shrinker_info, new);
-               call_srcu(&shrinker_srcu, &old->rcu, free_shrinker_info_rcu);
+               kvfree_rcu(old, rcu);
        }
 
        return 0;
@@ -292,7 +276,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
        int nid, size, ret = 0;
        int map_size, defer_size = 0;
 
-       mutex_lock(&shrinker_mutex);
+       down_write(&shrinker_rwsem);
        map_size = shrinker_map_size(shrinker_nr_max);
        defer_size = shrinker_defer_size(shrinker_nr_max);
        size = map_size + defer_size;
@@ -308,7 +292,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
                info->map_nr_max = shrinker_nr_max;
                rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
        }
-       mutex_unlock(&shrinker_mutex);
+       up_write(&shrinker_rwsem);
 
        return ret;
 }
@@ -324,7 +308,7 @@ static int expand_shrinker_info(int new_id)
        if (!root_mem_cgroup)
                goto out;
 
-       lockdep_assert_held(&shrinker_mutex);
+       lockdep_assert_held(&shrinker_rwsem);
 
        map_size = shrinker_map_size(new_nr_max);
        defer_size = shrinker_defer_size(new_nr_max);
@@ -352,16 +336,15 @@ void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)
 {
        if (shrinker_id >= 0 && memcg && !mem_cgroup_is_root(memcg)) {
                struct shrinker_info *info;
-               int srcu_idx;
 
-               srcu_idx = srcu_read_lock(&shrinker_srcu);
-               info = shrinker_info_srcu(memcg, nid);
+               rcu_read_lock();
+               info = rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
                if (!WARN_ON_ONCE(shrinker_id >= info->map_nr_max)) {
                        /* Pairs with smp mb in shrink_slab() */
                        smp_mb__before_atomic();
                        set_bit(shrinker_id, info->map);
                }
-               srcu_read_unlock(&shrinker_srcu, srcu_idx);
+               rcu_read_unlock();
        }
 }
 
@@ -374,7 +357,8 @@ static int prealloc_memcg_shrinker(struct shrinker *shrinker)
        if (mem_cgroup_disabled())
                return -ENOSYS;
 
-       mutex_lock(&shrinker_mutex);
+       down_write(&shrinker_rwsem);
+       /* This may call shrinker, so it must use down_read_trylock() */
        id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL);
        if (id < 0)
                goto unlock;
@@ -388,7 +372,7 @@ static int prealloc_memcg_shrinker(struct shrinker *shrinker)
        shrinker->id = id;
        ret = 0;
 unlock:
-       mutex_unlock(&shrinker_mutex);
+       up_write(&shrinker_rwsem);
        return ret;
 }
 
@@ -398,7 +382,7 @@ static void unregister_memcg_shrinker(struct shrinker *shrinker)
 
        BUG_ON(id < 0);
 
-       lockdep_assert_held(&shrinker_mutex);
+       lockdep_assert_held(&shrinker_rwsem);
 
        idr_remove(&shrinker_idr, id);
 }
@@ -408,7 +392,7 @@ static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker,
 {
        struct shrinker_info *info;
 
-       info = shrinker_info_srcu(memcg, nid);
+       info = shrinker_info_protected(memcg, nid);
        return atomic_long_xchg(&info->nr_deferred[shrinker->id], 0);
 }
 
@@ -417,7 +401,7 @@ static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
 {
        struct shrinker_info *info;
 
-       info = shrinker_info_srcu(memcg, nid);
+       info = shrinker_info_protected(memcg, nid);
        return atomic_long_add_return(nr, &info->nr_deferred[shrinker->id]);
 }
 
@@ -433,7 +417,7 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
                parent = root_mem_cgroup;
 
        /* Prevent from concurrent shrinker_info expand */
-       mutex_lock(&shrinker_mutex);
+       down_read(&shrinker_rwsem);
        for_each_node(nid) {
                child_info = shrinker_info_protected(memcg, nid);
                parent_info = shrinker_info_protected(parent, nid);
@@ -442,7 +426,7 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
                        atomic_long_add(nr, &parent_info->nr_deferred[i]);
                }
        }
-       mutex_unlock(&shrinker_mutex);
+       up_read(&shrinker_rwsem);
 }
 
 static bool cgroup_reclaim(struct scan_control *sc)
@@ -743,9 +727,9 @@ void free_prealloced_shrinker(struct shrinker *shrinker)
        shrinker->name = NULL;
 #endif
        if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
-               mutex_lock(&shrinker_mutex);
+               down_write(&shrinker_rwsem);
                unregister_memcg_shrinker(shrinker);
-               mutex_unlock(&shrinker_mutex);
+               up_write(&shrinker_rwsem);
                return;
        }
 
@@ -755,11 +739,11 @@ void free_prealloced_shrinker(struct shrinker *shrinker)
 
 void register_shrinker_prepared(struct shrinker *shrinker)
 {
-       mutex_lock(&shrinker_mutex);
-       list_add_tail_rcu(&shrinker->list, &shrinker_list);
+       down_write(&shrinker_rwsem);
+       list_add_tail(&shrinker->list, &shrinker_list);
        shrinker->flags |= SHRINKER_REGISTERED;
        shrinker_debugfs_add(shrinker);
-       mutex_unlock(&shrinker_mutex);
+       up_write(&shrinker_rwsem);
 }
 
 static int __register_shrinker(struct shrinker *shrinker)
@@ -805,22 +789,20 @@ EXPORT_SYMBOL(register_shrinker);
 void unregister_shrinker(struct shrinker *shrinker)
 {
        struct dentry *debugfs_entry;
+       int debugfs_id;
 
        if (!(shrinker->flags & SHRINKER_REGISTERED))
                return;
 
-       mutex_lock(&shrinker_mutex);
-       list_del_rcu(&shrinker->list);
+       down_write(&shrinker_rwsem);
+       list_del(&shrinker->list);
        shrinker->flags &= ~SHRINKER_REGISTERED;
        if (shrinker->flags & SHRINKER_MEMCG_AWARE)
                unregister_memcg_shrinker(shrinker);
-       debugfs_entry = shrinker_debugfs_remove(shrinker);
-       mutex_unlock(&shrinker_mutex);
+       debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id);
+       up_write(&shrinker_rwsem);
 
-       atomic_inc(&shrinker_srcu_generation);
-       synchronize_srcu(&shrinker_srcu);
-
-       debugfs_remove_recursive(debugfs_entry);
+       shrinker_debugfs_remove(debugfs_entry, debugfs_id);
 
        kfree(shrinker->nr_deferred);
        shrinker->nr_deferred = NULL;
@@ -830,13 +812,15 @@ EXPORT_SYMBOL(unregister_shrinker);
 /**
  * synchronize_shrinkers - Wait for all running shrinkers to complete.
  *
- * This is useful to guarantee that all shrinker invocations have seen an
- * update, before freeing memory.
+ * This is equivalent to calling unregister_shrinker() and register_shrinker(),
+ * but atomically and with less overhead. This is useful to guarantee that all
+ * shrinker invocations have seen an update before freeing memory, similar to
+ * RCU.
  */
 void synchronize_shrinkers(void)
 {
-       atomic_inc(&shrinker_srcu_generation);
-       synchronize_srcu(&shrinker_srcu);
+       down_write(&shrinker_rwsem);
+       up_write(&shrinker_rwsem);
 }
 EXPORT_SYMBOL(synchronize_shrinkers);
 
@@ -945,20 +929,19 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
 {
        struct shrinker_info *info;
        unsigned long ret, freed = 0;
-       int srcu_idx, generation;
-       int i = 0;
+       int i;
 
        if (!mem_cgroup_online(memcg))
                return 0;
 
-again:
-       srcu_idx = srcu_read_lock(&shrinker_srcu);
-       info = shrinker_info_srcu(memcg, nid);
+       if (!down_read_trylock(&shrinker_rwsem))
+               return 0;
+
+       info = shrinker_info_protected(memcg, nid);
        if (unlikely(!info))
                goto unlock;
 
-       generation = atomic_read(&shrinker_srcu_generation);
-       for_each_set_bit_from(i, info->map, info->map_nr_max) {
+       for_each_set_bit(i, info->map, info->map_nr_max) {
                struct shrink_control sc = {
                        .gfp_mask = gfp_mask,
                        .nid = nid,
@@ -1004,14 +987,14 @@ again:
                                set_shrinker_bit(memcg, nid, i);
                }
                freed += ret;
-               if (atomic_read(&shrinker_srcu_generation) != generation) {
-                       srcu_read_unlock(&shrinker_srcu, srcu_idx);
-                       i++;
-                       goto again;
+
+               if (rwsem_is_contended(&shrinker_rwsem)) {
+                       freed = freed ? : 1;
+                       break;
                }
        }
 unlock:
-       srcu_read_unlock(&shrinker_srcu, srcu_idx);
+       up_read(&shrinker_rwsem);
        return freed;
 }
 #else /* CONFIG_MEMCG */
@@ -1048,7 +1031,6 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
 {
        unsigned long ret, freed = 0;
        struct shrinker *shrinker;
-       int srcu_idx, generation;
 
        /*
         * The root memcg might be allocated even though memcg is disabled
@@ -1060,11 +1042,10 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
        if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
                return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
 
-       srcu_idx = srcu_read_lock(&shrinker_srcu);
+       if (!down_read_trylock(&shrinker_rwsem))
+               goto out;
 
-       generation = atomic_read(&shrinker_srcu_generation);
-       list_for_each_entry_srcu(shrinker, &shrinker_list, list,
-                                srcu_read_lock_held(&shrinker_srcu)) {
+       list_for_each_entry(shrinker, &shrinker_list, list) {
                struct shrink_control sc = {
                        .gfp_mask = gfp_mask,
                        .nid = nid,
@@ -1075,14 +1056,19 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
                if (ret == SHRINK_EMPTY)
                        ret = 0;
                freed += ret;
-
-               if (atomic_read(&shrinker_srcu_generation) != generation) {
+               /*
+                * Bail out if someone wants to register a new shrinker to
+                * prevent the registration from being stalled for long periods
+                * by parallel ongoing shrinking.
+                */
+               if (rwsem_is_contended(&shrinker_rwsem)) {
                        freed = freed ? : 1;
                        break;
                }
        }
 
-       srcu_read_unlock(&shrinker_srcu, srcu_idx);
+       up_read(&shrinker_rwsem);
+out:
        cond_resched();
        return freed;
 }
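Registration and unregistration take shrinker_rwsem for write, while the reclaim paths above only try-lock it for read and bail out when it is contended. For context, a registration that these paths serialise against looks roughly like this (the callback bodies are placeholders):

#include <linux/init.h>
#include <linux/shrinker.h>

static unsigned long demo_count(struct shrinker *sh, struct shrink_control *sc)
{
	return 0;		/* nothing to reclaim in this sketch */
}

static unsigned long demo_scan(struct shrinker *sh, struct shrink_control *sc)
{
	return SHRINK_STOP;
}

static struct shrinker demo_shrinker = {
	.count_objects	= demo_count,
	.scan_objects	= demo_scan,
	.seeks		= DEFAULT_SEEKS,
};

static int __init demo_init(void)
{
	/* Takes shrinker_rwsem for write while adding to shrinker_list. */
	return register_shrinker(&demo_shrinker, "mm-demo");
}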
index c280463..282349c 100644 (file)
@@ -1180,6 +1180,9 @@ const char * const vmstat_text[] = {
        "nr_zspages",
 #endif
        "nr_free_cma",
+#ifdef CONFIG_UNACCEPTED_MEMORY
+       "nr_unaccepted",
+#endif
 
        /* enum numa_stat_item counters */
 #ifdef CONFIG_NUMA
index 44ddaf5..02f7f41 100644 (file)
@@ -1331,31 +1331,6 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
        obj_to_location(obj, &page, &obj_idx);
        zspage = get_zspage(page);
 
-#ifdef CONFIG_ZPOOL
-       /*
-        * Move the zspage to front of pool's LRU.
-        *
-        * Note that this is swap-specific, so by definition there are no ongoing
-        * accesses to the memory while the page is swapped out that would make
-        * it "hot". A new entry is hot, then ages to the tail until it gets either
-        * written back or swaps back in.
-        *
-        * Furthermore, map is also called during writeback. We must not put an
-        * isolated page on the LRU mid-reclaim.
-        *
-        * As a result, only update the LRU when the page is mapped for write
-        * when it's first instantiated.
-        *
-        * This is a deviation from the other backends, which perform this update
-        * in the allocation function (zbud_alloc, z3fold_alloc).
-        */
-       if (mm == ZS_MM_WO) {
-               if (!list_empty(&zspage->lru))
-                       list_del(&zspage->lru);
-               list_add(&zspage->lru, &pool->lru);
-       }
-#endif
-
        /*
         * migration cannot move any zpages in this zspage. Here, pool->lock
         * is too heavy since callers would take some time until they calls
@@ -1525,9 +1500,8 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
                fix_fullness_group(class, zspage);
                record_obj(handle, obj);
                class_stat_inc(class, ZS_OBJS_INUSE, 1);
-               spin_unlock(&pool->lock);
 
-               return handle;
+               goto out;
        }
 
        spin_unlock(&pool->lock);
@@ -1550,6 +1524,14 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 
        /* We completely set up zspage so mark them as movable */
        SetZsPageMovable(pool, zspage);
+out:
+#ifdef CONFIG_ZPOOL
+       /* Add/move zspage to beginning of LRU */
+       if (!list_empty(&zspage->lru))
+               list_del(&zspage->lru);
+       list_add(&zspage->lru, &pool->lru);
+#endif
+
        spin_unlock(&pool->lock);
 
        return handle;
index e1e621d..30092d9 100644 (file)
@@ -1020,6 +1020,22 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
                goto fail;
 
        case ZSWAP_SWAPCACHE_NEW: /* page is locked */
+               /*
+                * Having a local reference to the zswap entry doesn't exclude
+                * swapping from invalidating and recycling the swap slot. Once
+                * the swapcache is secured against concurrent swapping to and
+                * from the slot, recheck that the entry is still current before
+                * writing.
+                */
+               spin_lock(&tree->lock);
+               if (zswap_rb_search(&tree->rbroot, entry->offset) != entry) {
+                       spin_unlock(&tree->lock);
+                       delete_from_swap_cache(page_folio(page));
+                       ret = -ENOMEM;
+                       goto fail;
+               }
+               spin_unlock(&tree->lock);
+
                /* decompress */
                acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
                dlen = PAGE_SIZE;
@@ -1158,9 +1174,16 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
                goto reject;
        }
 
+       /*
+        * XXX: zswap reclaim does not work with cgroups yet. Without a
+        * cgroup-aware entry LRU, we will push out entries system-wide based on
+        * local cgroup limits.
+        */
        objcg = get_obj_cgroup_from_page(page);
-       if (objcg && !obj_cgroup_may_zswap(objcg))
-               goto shrink;
+       if (objcg && !obj_cgroup_may_zswap(objcg)) {
+               ret = -ENOMEM;
+               goto reject;
+       }
 
        /* reclaim space if needed */
        if (zswap_is_full()) {
index 870e493..b90781b 100644 (file)
@@ -109,8 +109,8 @@ static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb,
         * NOTE: THIS ASSUMES DIX ETHERNET, SPECIFICALLY NOT SUPPORTING
         * OTHER THINGS LIKE FDDI/TokenRing/802.3 SNAPs...
         */
-       if (veth->h_vlan_proto != vlan->vlan_proto ||
-           vlan->flags & VLAN_FLAG_REORDER_HDR) {
+       if (vlan->flags & VLAN_FLAG_REORDER_HDR ||
+           veth->h_vlan_proto != vlan->vlan_proto) {
                u16 vlan_tci;
                vlan_tci = vlan->vlan_id;
                vlan_tci |= vlan_dev_get_egress_qos_mask(dev, skb->priority);
index 2b2d33e..995d29e 100644 (file)
@@ -400,6 +400,7 @@ done:
        return error;
 }
 
+#ifdef CONFIG_PROC_FS
 void *atm_dev_seq_start(struct seq_file *seq, loff_t *pos)
 {
        mutex_lock(&atm_dev_mutex);
@@ -415,3 +416,4 @@ void *atm_dev_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
        return seq_list_next(v, &atm_devs, pos);
 }
+#endif
index 6968e55..28a939d 100644 (file)
@@ -101,7 +101,6 @@ static void batadv_dat_purge(struct work_struct *work);
  */
 static void batadv_dat_start_timer(struct batadv_priv *bat_priv)
 {
-       INIT_DELAYED_WORK(&bat_priv->dat.work, batadv_dat_purge);
        queue_delayed_work(batadv_event_workqueue, &bat_priv->dat.work,
                           msecs_to_jiffies(10000));
 }
@@ -819,6 +818,7 @@ int batadv_dat_init(struct batadv_priv *bat_priv)
        if (!bat_priv->dat.hash)
                return -ENOMEM;
 
+       INIT_DELAYED_WORK(&bat_priv->dat.work, batadv_dat_purge);
        batadv_dat_start_timer(bat_priv);
 
        batadv_tvlv_handler_register(bat_priv, batadv_dat_tvlv_ogm_handler_v1,
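
The batman-adv hunks move INIT_DELAYED_WORK() out of the timer-arming helper and into the one-time init path: re-initializing a work item that may already be queued corrupts workqueue state, while re-arming it is safe. A small sketch of the resulting split, with illustrative names:

#include <linux/workqueue.h>
#include <linux/jiffies.h>
#include <linux/container_of.h>

/* Illustrative state; not the batman-adv structures. */
struct purge_state {
	struct delayed_work work;
};

static void purge_work_fn(struct work_struct *work)
{
	struct purge_state *ps = container_of(to_delayed_work(work),
					      struct purge_state, work);

	/* ... do the periodic cleanup ... */

	/* Re-arm only: the work item was initialized once at setup time. */
	schedule_delayed_work(&ps->work, msecs_to_jiffies(10000));
}

static void purge_setup(struct purge_state *ps)
{
	INIT_DELAYED_WORK(&ps->work, purge_work_fn);
	schedule_delayed_work(&ps->work, msecs_to_jiffies(10000));
}
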
index 640b951..1ef952b 100644 (file)
@@ -947,8 +947,8 @@ static void find_cis(struct hci_conn *conn, void *data)
 {
        struct iso_list_data *d = data;
 
-       /* Ignore broadcast */
-       if (!bacmp(&conn->dst, BDADDR_ANY))
+       /* Ignore broadcast or if the CIG doesn't match */
+       if (!bacmp(&conn->dst, BDADDR_ANY) || d->cig != conn->iso_qos.ucast.cig)
                return;
 
        d->count++;
@@ -963,12 +963,17 @@ static void cis_cleanup(struct hci_conn *conn)
        struct hci_dev *hdev = conn->hdev;
        struct iso_list_data d;
 
+       if (conn->iso_qos.ucast.cig == BT_ISO_QOS_CIG_UNSET)
+               return;
+
        memset(&d, 0, sizeof(d));
        d.cig = conn->iso_qos.ucast.cig;
 
        /* Check if ISO connection is a CIS and remove CIG if there are
         * no other connections using it.
         */
+       hci_conn_hash_list_state(hdev, find_cis, ISO_LINK, BT_BOUND, &d);
+       hci_conn_hash_list_state(hdev, find_cis, ISO_LINK, BT_CONNECT, &d);
        hci_conn_hash_list_state(hdev, find_cis, ISO_LINK, BT_CONNECTED, &d);
        if (d.count)
                return;
@@ -1083,8 +1088,28 @@ static void hci_conn_unlink(struct hci_conn *conn)
        if (!conn->parent) {
                struct hci_link *link, *t;
 
-               list_for_each_entry_safe(link, t, &conn->link_list, list)
-                       hci_conn_unlink(link->conn);
+               list_for_each_entry_safe(link, t, &conn->link_list, list) {
+                       struct hci_conn *child = link->conn;
+
+                       hci_conn_unlink(child);
+
+                       /* If hdev is down it means
+                        * hci_dev_close_sync/hci_conn_hash_flush is in progress
+                        * and links don't need to be cleaned up because all
+                        * connections will be cleaned up anyway.
+                        */
+                       if (!test_bit(HCI_UP, &hdev->flags))
+                               continue;
+
+                       /* Due to a race, the SCO connection might not be
+                        * established yet at this point. Delete it now,
+                        * otherwise it can get stuck and cannot be deleted.
+                        */
+                       if ((child->type == SCO_LINK ||
+                            child->type == ESCO_LINK) &&
+                           child->handle == HCI_CONN_HANDLE_UNSET)
+                               hci_conn_del(child);
+               }
 
                return;
        }
@@ -1092,35 +1117,30 @@ static void hci_conn_unlink(struct hci_conn *conn)
        if (!conn->link)
                return;
 
-       hci_conn_put(conn->parent);
-       conn->parent = NULL;
-
        list_del_rcu(&conn->link->list);
        synchronize_rcu();
 
+       hci_conn_drop(conn->parent);
+       hci_conn_put(conn->parent);
+       conn->parent = NULL;
+
        kfree(conn->link);
        conn->link = NULL;
-
-       /* Due to race, SCO connection might be not established
-        * yet at this point. Delete it now, otherwise it is
-        * possible for it to be stuck and can't be deleted.
-        */
-       if (conn->handle == HCI_CONN_HANDLE_UNSET)
-               hci_conn_del(conn);
 }
 
-int hci_conn_del(struct hci_conn *conn)
+void hci_conn_del(struct hci_conn *conn)
 {
        struct hci_dev *hdev = conn->hdev;
 
        BT_DBG("%s hcon %p handle %d", hdev->name, conn, conn->handle);
 
+       hci_conn_unlink(conn);
+
        cancel_delayed_work_sync(&conn->disc_work);
        cancel_delayed_work_sync(&conn->auto_accept_work);
        cancel_delayed_work_sync(&conn->idle_work);
 
        if (conn->type == ACL_LINK) {
-               hci_conn_unlink(conn);
                /* Unacked frames */
                hdev->acl_cnt += conn->sent;
        } else if (conn->type == LE_LINK) {
@@ -1131,13 +1151,6 @@ int hci_conn_del(struct hci_conn *conn)
                else
                        hdev->acl_cnt += conn->sent;
        } else {
-               struct hci_conn *acl = conn->parent;
-
-               if (acl) {
-                       hci_conn_unlink(conn);
-                       hci_conn_drop(acl);
-               }
-
                /* Unacked ISO frames */
                if (conn->type == ISO_LINK) {
                        if (hdev->iso_pkts)
@@ -1160,8 +1173,6 @@ int hci_conn_del(struct hci_conn *conn)
         * rest of hci_conn_del.
         */
        hci_conn_cleanup(conn);
-
-       return 0;
 }
 
 struct hci_dev *hci_get_route(bdaddr_t *dst, bdaddr_t *src, uint8_t src_type)
@@ -1760,24 +1771,23 @@ static bool hci_le_set_cig_params(struct hci_conn *conn, struct bt_iso_qos *qos)
 
        memset(&data, 0, sizeof(data));
 
-       /* Allocate a CIG if not set */
+       /* Allocate the first still-reconfigurable CIG if not set */
        if (qos->ucast.cig == BT_ISO_QOS_CIG_UNSET) {
-               for (data.cig = 0x00; data.cig < 0xff; data.cig++) {
+               for (data.cig = 0x00; data.cig < 0xf0; data.cig++) {
                        data.count = 0;
-                       data.cis = 0xff;
 
-                       hci_conn_hash_list_state(hdev, cis_list, ISO_LINK,
-                                                BT_BOUND, &data);
+                       hci_conn_hash_list_state(hdev, find_cis, ISO_LINK,
+                                                BT_CONNECT, &data);
                        if (data.count)
                                continue;
 
-                       hci_conn_hash_list_state(hdev, cis_list, ISO_LINK,
+                       hci_conn_hash_list_state(hdev, find_cis, ISO_LINK,
                                                 BT_CONNECTED, &data);
                        if (!data.count)
                                break;
                }
 
-               if (data.cig == 0xff)
+               if (data.cig == 0xf0)
                        return false;
 
                /* Update CIG */
@@ -2462,22 +2472,21 @@ timer:
 /* Drop all connection on the device */
 void hci_conn_hash_flush(struct hci_dev *hdev)
 {
-       struct hci_conn_hash *h = &hdev->conn_hash;
-       struct hci_conn *c, *n;
+       struct list_head *head = &hdev->conn_hash.list;
+       struct hci_conn *conn;
 
        BT_DBG("hdev %s", hdev->name);
 
-       list_for_each_entry_safe(c, n, &h->list, list) {
-               c->state = BT_CLOSED;
-
-               hci_disconn_cfm(c, HCI_ERROR_LOCAL_HOST_TERM);
-
-               /* Unlink before deleting otherwise it is possible that
-                * hci_conn_del removes the link which may cause the list to
-                * contain items already freed.
-                */
-               hci_conn_unlink(c);
-               hci_conn_del(c);
+       /* We should not traverse the list here, because hci_conn_del
+        * can remove extra links, which may cause the list traversal
+        * to hit items that have already been released.
+        */
+       while ((conn = list_first_entry_or_null(head,
+                                               struct hci_conn,
+                                               list)) != NULL) {
+               conn->state = BT_CLOSED;
+               hci_disconn_cfm(conn, HCI_ERROR_LOCAL_HOST_TERM);
+               hci_conn_del(conn);
        }
 }
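
hci_conn_hash_flush() switches from list_for_each_entry_safe() to a loop that always re-reads the head, because hci_conn_del() now unlinks child connections and can free the entry the iterator had cached as "next". The general shape of that pattern, with illustrative types (item_teardown() is assumed to unlink the entry it is given, and possibly others):

#include <linux/list.h>

struct item {
	struct list_head list;
};

/* Unlinks @it from its list and may unlink/free other items as well. */
void item_teardown(struct item *it);

/*
 * list_for_each_entry_safe() only protects against deleting the current
 * entry; if the teardown can also remove the prefetched next entry, the
 * cached pointer goes stale.  Re-reading the head on every iteration
 * avoids that.
 */
static void flush_list(struct list_head *head)
{
	struct item *it;

	while ((it = list_first_entry_or_null(head, struct item, list)) != NULL)
		item_teardown(it);
}
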
 
index a856b10..48917c6 100644 (file)
@@ -1416,10 +1416,10 @@ int hci_remove_link_key(struct hci_dev *hdev, bdaddr_t *bdaddr)
 
 int hci_remove_ltk(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 bdaddr_type)
 {
-       struct smp_ltk *k;
+       struct smp_ltk *k, *tmp;
        int removed = 0;
 
-       list_for_each_entry_rcu(k, &hdev->long_term_keys, list) {
+       list_for_each_entry_safe(k, tmp, &hdev->long_term_keys, list) {
                if (bacmp(bdaddr, &k->bdaddr) || k->bdaddr_type != bdaddr_type)
                        continue;
 
@@ -1435,9 +1435,9 @@ int hci_remove_ltk(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 bdaddr_type)
 
 void hci_remove_irk(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 addr_type)
 {
-       struct smp_irk *k;
+       struct smp_irk *k, *tmp;
 
-       list_for_each_entry_rcu(k, &hdev->identity_resolving_keys, list) {
+       list_for_each_entry_safe(k, tmp, &hdev->identity_resolving_keys, list) {
                if (bacmp(bdaddr, &k->bdaddr) || k->addr_type != addr_type)
                        continue;
 
@@ -2686,7 +2686,9 @@ void hci_unregister_dev(struct hci_dev *hdev)
 {
        BT_DBG("%p name %s bus %d", hdev, hdev->name, hdev->bus);
 
+       mutex_lock(&hdev->unregister_lock);
        hci_dev_set_flag(hdev, HCI_UNREGISTER);
+       mutex_unlock(&hdev->unregister_lock);
 
        write_lock(&hci_dev_list_lock);
        list_del(&hdev->list);
index d00ef6e..09ba6d8 100644 (file)
@@ -3804,48 +3804,56 @@ static u8 hci_cc_le_set_cig_params(struct hci_dev *hdev, void *data,
                                   struct sk_buff *skb)
 {
        struct hci_rp_le_set_cig_params *rp = data;
+       struct hci_cp_le_set_cig_params *cp;
        struct hci_conn *conn;
-       int i = 0;
+       u8 status = rp->status;
+       int i;
 
        bt_dev_dbg(hdev, "status 0x%2.2x", rp->status);
 
+       cp = hci_sent_cmd_data(hdev, HCI_OP_LE_SET_CIG_PARAMS);
+       if (!cp || rp->num_handles != cp->num_cis || rp->cig_id != cp->cig_id) {
+               bt_dev_err(hdev, "unexpected Set CIG Parameters response data");
+               status = HCI_ERROR_UNSPECIFIED;
+       }
+
        hci_dev_lock(hdev);
 
-       if (rp->status) {
+       if (status) {
                while ((conn = hci_conn_hash_lookup_cig(hdev, rp->cig_id))) {
                        conn->state = BT_CLOSED;
-                       hci_connect_cfm(conn, rp->status);
+                       hci_connect_cfm(conn, status);
                        hci_conn_del(conn);
                }
                goto unlock;
        }
 
-       rcu_read_lock();
+       /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E page 2553
+        *
+        * If the Status return parameter is zero, then the Controller shall
+        * set the Connection_Handle arrayed return parameter to the connection
+        * handle(s) corresponding to the CIS configurations specified in
+        * the CIS_IDs command parameter, in the same order.
+        */
+       for (i = 0; i < rp->num_handles; ++i) {
+               conn = hci_conn_hash_lookup_cis(hdev, NULL, 0, rp->cig_id,
+                                               cp->cis[i].cis_id);
+               if (!conn || !bacmp(&conn->dst, BDADDR_ANY))
+                       continue;
 
-       list_for_each_entry_rcu(conn, &hdev->conn_hash.list, list) {
-               if (conn->type != ISO_LINK ||
-                   conn->iso_qos.ucast.cig != rp->cig_id ||
-                   conn->state == BT_CONNECTED)
+               if (conn->state != BT_BOUND && conn->state != BT_CONNECT)
                        continue;
 
-               conn->handle = __le16_to_cpu(rp->handle[i++]);
+               conn->handle = __le16_to_cpu(rp->handle[i]);
 
                bt_dev_dbg(hdev, "%p handle 0x%4.4x parent %p", conn,
                           conn->handle, conn->parent);
 
                /* Create CIS if LE is already connected */
-               if (conn->parent && conn->parent->state == BT_CONNECTED) {
-                       rcu_read_unlock();
+               if (conn->parent && conn->parent->state == BT_CONNECTED)
                        hci_le_create_cis(conn);
-                       rcu_read_lock();
-               }
-
-               if (i == rp->num_handles)
-                       break;
        }
 
-       rcu_read_unlock();
-
 unlock:
        hci_dev_unlock(hdev);
 
index 647a8ce..804cde4 100644 (file)
@@ -629,6 +629,7 @@ void hci_cmd_sync_init(struct hci_dev *hdev)
        INIT_WORK(&hdev->cmd_sync_work, hci_cmd_sync_work);
        INIT_LIST_HEAD(&hdev->cmd_sync_work_list);
        mutex_init(&hdev->cmd_sync_work_lock);
+       mutex_init(&hdev->unregister_lock);
 
        INIT_WORK(&hdev->cmd_sync_cancel_work, hci_cmd_sync_cancel_work);
        INIT_WORK(&hdev->reenable_adv_work, reenable_adv);
@@ -692,14 +693,19 @@ int hci_cmd_sync_submit(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
                        void *data, hci_cmd_sync_work_destroy_t destroy)
 {
        struct hci_cmd_sync_work_entry *entry;
+       int err = 0;
 
-       if (hci_dev_test_flag(hdev, HCI_UNREGISTER))
-               return -ENODEV;
+       mutex_lock(&hdev->unregister_lock);
+       if (hci_dev_test_flag(hdev, HCI_UNREGISTER)) {
+               err = -ENODEV;
+               goto unlock;
+       }
 
        entry = kmalloc(sizeof(*entry), GFP_KERNEL);
-       if (!entry)
-               return -ENOMEM;
-
+       if (!entry) {
+               err = -ENOMEM;
+               goto unlock;
+       }
        entry->func = func;
        entry->data = data;
        entry->destroy = destroy;
@@ -710,7 +716,9 @@ int hci_cmd_sync_submit(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
 
        queue_work(hdev->req_workqueue, &hdev->cmd_sync_work);
 
-       return 0;
+unlock:
+       mutex_unlock(&hdev->unregister_lock);
+       return err;
 }
 EXPORT_SYMBOL(hci_cmd_sync_submit);
 
@@ -4543,6 +4551,9 @@ static int hci_init_sync(struct hci_dev *hdev)
            !hci_dev_test_flag(hdev, HCI_CONFIG))
                return 0;
 
+       if (hci_dev_test_and_set_flag(hdev, HCI_DEBUGFS_CREATED))
+               return 0;
+
        hci_debugfs_create_common(hdev);
 
        if (lmp_bredr_capable(hdev))
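
The hci_cmd_sync_submit() hunk above pairs with the hci_unregister_dev() change earlier in this series: the HCI_UNREGISTER flag is now both set and tested under hdev->unregister_lock, so a submitter either observes the flag or finishes queueing before unregistration proceeds, closing the window where work could be queued against a dying device. A stripped-down sketch of the idiom, with illustrative names:

#include <linux/types.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>

/* Illustrative device object; not the real hci_dev layout. */
struct dev_obj {
	struct mutex unregister_lock;
	bool unregistering;
	struct workqueue_struct *wq;
};

static int dev_submit_work(struct dev_obj *d, struct work_struct *work)
{
	int err = 0;

	mutex_lock(&d->unregister_lock);
	if (d->unregistering) {
		err = -ENODEV;		/* device is going away */
		goto unlock;
	}
	queue_work(d->wq, work);	/* safe: unregister cannot have started */
unlock:
	mutex_unlock(&d->unregister_lock);
	return err;
}

static void dev_start_unregister(struct dev_obj *d)
{
	/* Set under the same lock, so no new work can slip in afterwards. */
	mutex_lock(&d->unregister_lock);
	d->unregistering = true;
	mutex_unlock(&d->unregister_lock);
}
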
index 376b523..c5e8798 100644 (file)
@@ -4306,6 +4306,10 @@ static int l2cap_connect_create_rsp(struct l2cap_conn *conn,
        result = __le16_to_cpu(rsp->result);
        status = __le16_to_cpu(rsp->status);
 
+       if (result == L2CAP_CR_SUCCESS && (dcid < L2CAP_CID_DYN_START ||
+                                          dcid > L2CAP_CID_DYN_END))
+               return -EPROTO;
+
        BT_DBG("dcid 0x%4.4x scid 0x%4.4x result 0x%2.2x status 0x%2.2x",
               dcid, scid, result, status);
 
@@ -4337,6 +4341,11 @@ static int l2cap_connect_create_rsp(struct l2cap_conn *conn,
 
        switch (result) {
        case L2CAP_CR_SUCCESS:
+               if (__l2cap_get_chan_by_dcid(conn, dcid)) {
+                       err = -EBADSLT;
+                       break;
+               }
+
                l2cap_state_change(chan, BT_CONFIG);
                chan->ident = 0;
                chan->dcid = dcid;
@@ -4663,7 +4672,9 @@ static inline int l2cap_disconnect_req(struct l2cap_conn *conn,
 
        chan->ops->set_shutdown(chan);
 
+       l2cap_chan_unlock(chan);
        mutex_lock(&conn->chan_lock);
+       l2cap_chan_lock(chan);
        l2cap_chan_del(chan, ECONNRESET);
        mutex_unlock(&conn->chan_lock);
 
@@ -4702,7 +4713,9 @@ static inline int l2cap_disconnect_rsp(struct l2cap_conn *conn,
                return 0;
        }
 
+       l2cap_chan_unlock(chan);
        mutex_lock(&conn->chan_lock);
+       l2cap_chan_lock(chan);
        l2cap_chan_del(chan, 0);
        mutex_unlock(&conn->chan_lock);
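
Both l2cap_disconnect_req() and l2cap_disconnect_rsp() now drop the channel lock before taking conn->chan_lock and re-take it afterwards, so the per-connection lock is always acquired before the per-channel lock. A sketch of that unlock-then-relock dance under an assumed outer/inner hierarchy (names are illustrative):

#include <linux/mutex.h>

struct conn { struct mutex chan_lock; };	/* outer lock */
struct chan { struct mutex lock; };		/* inner lock */

/* Called with ch->lock held; returns with it held again. */
static void chan_del_locked(struct conn *c, struct chan *ch)
{
	/*
	 * c->chan_lock must be taken before ch->lock, so temporarily drop
	 * the channel lock instead of acquiring the locks out of order.
	 */
	mutex_unlock(&ch->lock);
	mutex_lock(&c->chan_lock);
	mutex_lock(&ch->lock);

	/* ... unlink ch from c's channel list here ... */

	mutex_unlock(&c->chan_lock);
}
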
 
index 5774470..84d6dd5 100644 (file)
@@ -42,7 +42,7 @@ int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb
            eth_type_vlan(skb->protocol)) {
                int depth;
 
-               if (!__vlan_get_protocol(skb, skb->protocol, &depth))
+               if (!vlan_get_protocol_and_depth(skb, skb->protocol, &depth))
                        goto drop;
 
                skb_set_network_header(skb, depth);
index 2b05328..efb0960 100644 (file)
@@ -27,6 +27,10 @@ int br_process_vlan_tunnel_info(const struct net_bridge *br,
 int br_get_vlan_tunnel_info_size(struct net_bridge_vlan_group *vg);
 int br_fill_vlan_tunnel_info(struct sk_buff *skb,
                             struct net_bridge_vlan_group *vg);
+bool vlan_tunid_inrange(const struct net_bridge_vlan *v_curr,
+                       const struct net_bridge_vlan *v_last);
+int br_vlan_tunnel_info(const struct net_bridge_port *p, int cmd,
+                       u16 vid, u32 tun_id, bool *changed);
 
 #ifdef CONFIG_BRIDGE_VLAN_FILTERING
 /* br_vlan_tunnel.c */
@@ -43,10 +47,6 @@ void br_handle_ingress_vlan_tunnel(struct sk_buff *skb,
                                   struct net_bridge_vlan_group *vg);
 int br_handle_egress_vlan_tunnel(struct sk_buff *skb,
                                 struct net_bridge_vlan *vlan);
-bool vlan_tunid_inrange(const struct net_bridge_vlan *v_curr,
-                       const struct net_bridge_vlan *v_last);
-int br_vlan_tunnel_info(const struct net_bridge_port *p, int cmd,
-                       u16 vid, u32 tun_id, bool *changed);
 #else
 static inline int vlan_tunnel_init(struct net_bridge_vlan_group *vg)
 {
index a750259..84f9aba 100644 (file)
@@ -1139,7 +1139,7 @@ static int isotp_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
        struct isotp_sock *so = isotp_sk(sk);
        int ret = 0;
 
-       if (flags & ~(MSG_DONTWAIT | MSG_TRUNC | MSG_PEEK))
+       if (flags & ~(MSG_DONTWAIT | MSG_TRUNC | MSG_PEEK | MSG_CMSG_COMPAT))
                return -EINVAL;
 
        if (!so->bound)
index 821d4ff..ecff1c9 100644 (file)
@@ -126,7 +126,7 @@ static void j1939_can_recv(struct sk_buff *iskb, void *data)
 #define J1939_CAN_ID CAN_EFF_FLAG
 #define J1939_CAN_MASK (CAN_EFF_FLAG | CAN_RTR_FLAG)
 
-static DEFINE_SPINLOCK(j1939_netdev_lock);
+static DEFINE_MUTEX(j1939_netdev_lock);
 
 static struct j1939_priv *j1939_priv_create(struct net_device *ndev)
 {
@@ -220,7 +220,7 @@ static void __j1939_rx_release(struct kref *kref)
        j1939_can_rx_unregister(priv);
        j1939_ecu_unmap_all(priv);
        j1939_priv_set(priv->ndev, NULL);
-       spin_unlock(&j1939_netdev_lock);
+       mutex_unlock(&j1939_netdev_lock);
 }
 
 /* get pointer to priv without increasing ref counter */
@@ -248,9 +248,9 @@ static struct j1939_priv *j1939_priv_get_by_ndev(struct net_device *ndev)
 {
        struct j1939_priv *priv;
 
-       spin_lock(&j1939_netdev_lock);
+       mutex_lock(&j1939_netdev_lock);
        priv = j1939_priv_get_by_ndev_locked(ndev);
-       spin_unlock(&j1939_netdev_lock);
+       mutex_unlock(&j1939_netdev_lock);
 
        return priv;
 }
@@ -260,14 +260,14 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
        struct j1939_priv *priv, *priv_new;
        int ret;
 
-       spin_lock(&j1939_netdev_lock);
+       mutex_lock(&j1939_netdev_lock);
        priv = j1939_priv_get_by_ndev_locked(ndev);
        if (priv) {
                kref_get(&priv->rx_kref);
-               spin_unlock(&j1939_netdev_lock);
+               mutex_unlock(&j1939_netdev_lock);
                return priv;
        }
-       spin_unlock(&j1939_netdev_lock);
+       mutex_unlock(&j1939_netdev_lock);
 
        priv = j1939_priv_create(ndev);
        if (!priv)
@@ -277,29 +277,31 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
        spin_lock_init(&priv->j1939_socks_lock);
        INIT_LIST_HEAD(&priv->j1939_socks);
 
-       spin_lock(&j1939_netdev_lock);
+       mutex_lock(&j1939_netdev_lock);
        priv_new = j1939_priv_get_by_ndev_locked(ndev);
        if (priv_new) {
                /* Someone was faster than us, use their priv and roll
                 * back ours.
                 */
                kref_get(&priv_new->rx_kref);
-               spin_unlock(&j1939_netdev_lock);
+               mutex_unlock(&j1939_netdev_lock);
                dev_put(ndev);
                kfree(priv);
                return priv_new;
        }
        j1939_priv_set(ndev, priv);
-       spin_unlock(&j1939_netdev_lock);
 
        ret = j1939_can_rx_register(priv);
        if (ret < 0)
                goto out_priv_put;
 
+       mutex_unlock(&j1939_netdev_lock);
        return priv;
 
  out_priv_put:
        j1939_priv_set(ndev, NULL);
+       mutex_unlock(&j1939_netdev_lock);
+
        dev_put(ndev);
        kfree(priv);
 
@@ -308,7 +310,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
 
 void j1939_netdev_stop(struct j1939_priv *priv)
 {
-       kref_put_lock(&priv->rx_kref, __j1939_rx_release, &j1939_netdev_lock);
+       kref_put_mutex(&priv->rx_kref, __j1939_rx_release, &j1939_netdev_lock);
        j1939_priv_put(priv);
 }
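
With the j1939 netdev lock converted from a spinlock to a mutex, j1939_netdev_stop() uses kref_put_mutex(): the release callback runs with the mutex held and is responsible for dropping it. A compact sketch of that contract, with illustrative names:

#include <linux/kref.h>
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/container_of.h>

static DEFINE_MUTEX(registry_lock);	/* also guards lookups of struct obj */

struct obj {
	struct kref ref;
};

/* Invoked by kref_put_mutex() with registry_lock held; must release it. */
static void obj_release(struct kref *kref)
{
	struct obj *o = container_of(kref, struct obj, ref);

	/* ... remove @o from any lookup structures while still locked ... */
	mutex_unlock(&registry_lock);

	kfree(o);	/* sleeping teardown work is fine outside the lock */
}

static void obj_put(struct obj *o)
{
	/* Only takes registry_lock when this put drops the last reference. */
	kref_put_mutex(&o->ref, obj_release, &registry_lock);
}
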
 
index 7e90f9e..35970c2 100644 (file)
@@ -798,7 +798,7 @@ static int j1939_sk_recvmsg(struct socket *sock, struct msghdr *msg,
        struct j1939_sk_buff_cb *skcb;
        int ret = 0;
 
-       if (flags & ~(MSG_DONTWAIT | MSG_ERRQUEUE))
+       if (flags & ~(MSG_DONTWAIT | MSG_ERRQUEUE | MSG_CMSG_COMPAT))
                return -EINVAL;
 
        if (flags & MSG_ERRQUEUE)
@@ -1088,6 +1088,11 @@ void j1939_sk_errqueue(struct j1939_session *session,
 
 void j1939_sk_send_loop_abort(struct sock *sk, int err)
 {
+       struct j1939_sock *jsk = j1939_sk(sk);
+
+       if (jsk->state & J1939_SOCK_ERRQUEUE)
+               return;
+
        sk->sk_err = err;
 
        sk_error_report(sk);
index 5662dff..176eb58 100644 (file)
@@ -807,18 +807,21 @@ __poll_t datagram_poll(struct file *file, struct socket *sock,
 {
        struct sock *sk = sock->sk;
        __poll_t mask;
+       u8 shutdown;
 
        sock_poll_wait(file, sock, wait);
        mask = 0;
 
        /* exceptional events? */
-       if (sk->sk_err || !skb_queue_empty_lockless(&sk->sk_error_queue))
+       if (READ_ONCE(sk->sk_err) ||
+           !skb_queue_empty_lockless(&sk->sk_error_queue))
                mask |= EPOLLERR |
                        (sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? EPOLLPRI : 0);
 
-       if (sk->sk_shutdown & RCV_SHUTDOWN)
+       shutdown = READ_ONCE(sk->sk_shutdown);
+       if (shutdown & RCV_SHUTDOWN)
                mask |= EPOLLRDHUP | EPOLLIN | EPOLLRDNORM;
-       if (sk->sk_shutdown == SHUTDOWN_MASK)
+       if (shutdown == SHUTDOWN_MASK)
                mask |= EPOLLHUP;
 
        /* readable? */
@@ -827,10 +830,12 @@ __poll_t datagram_poll(struct file *file, struct socket *sock,
 
        /* Connection-based need to check for termination and startup */
        if (connection_based(sk)) {
-               if (sk->sk_state == TCP_CLOSE)
+               int state = READ_ONCE(sk->sk_state);
+
+               if (state == TCP_CLOSE)
                        mask |= EPOLLHUP;
                /* connection hasn't started yet? */
-               if (sk->sk_state == TCP_SYN_SENT)
+               if (state == TCP_SYN_SENT)
                        return mask;
        }
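
datagram_poll() runs locklessly, so the shutdown, error, and state fields it tests are now loaded with READ_ONCE(), and the matching writers (such as the inet_shutdown() hunk later in this series) use WRITE_ONCE(). A minimal sketch of the pairing on a single flag word (field and constant names are illustrative):

#include <linux/types.h>
#include <linux/compiler.h>

#define F_RCV_SHUTDOWN	0x1
#define F_SND_SHUTDOWN	0x2
#define F_SHUTDOWN_MASK	(F_RCV_SHUTDOWN | F_SND_SHUTDOWN)

struct endpoint {
	unsigned int shutdown;
};

/* Writer side: annotate the store so lockless readers see a sane value. */
static void endpoint_shutdown(struct endpoint *ep, unsigned int how)
{
	WRITE_ONCE(ep->shutdown, ep->shutdown | how);
}

/* Lockless reader: load the field once, then test the snapshot consistently. */
static bool endpoint_hung_up(const struct endpoint *ep)
{
	unsigned int shutdown = READ_ONCE(ep->shutdown);

	return shutdown == F_SHUTDOWN_MASK;
}
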
 
index 735096d..c29f3e1 100644 (file)
@@ -3335,7 +3335,7 @@ __be16 skb_network_protocol(struct sk_buff *skb, int *depth)
                type = eth->h_proto;
        }
 
-       return __vlan_get_protocol(skb, type, depth);
+       return vlan_get_protocol_and_depth(skb, type, depth);
 }
 
 /* openvswitch calls this on rx path, so we need a different check.
@@ -4471,8 +4471,10 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
                u32 next_cpu;
                u32 ident;
 
-               /* First check into global flow table if there is a match */
-               ident = sock_flow_table->ents[hash & sock_flow_table->mask];
+               /* First check the global flow table for a match.
+                * This READ_ONCE() pairs with WRITE_ONCE() from rps_record_sock_flow().
+                */
+               ident = READ_ONCE(sock_flow_table->ents[hash & sock_flow_table->mask]);
                if ((ident ^ hash) & ~rps_cpu_mask)
                        goto try_rps;
 
@@ -10541,7 +10543,7 @@ struct netdev_queue *dev_ingress_queue_create(struct net_device *dev)
                return NULL;
        netdev_init_one_queue(dev, queue, NULL);
        RCU_INIT_POINTER(queue->qdisc, &noop_qdisc);
-       queue->qdisc_sleeping = &noop_qdisc;
+       RCU_INIT_POINTER(queue->qdisc_sleeping, &noop_qdisc);
        rcu_assign_pointer(dev->ingress_queue, queue);
 #endif
        return queue;
index e212e9d..a3e12a6 100644 (file)
@@ -134,6 +134,29 @@ EXPORT_SYMBOL(page_pool_ethtool_stats_get);
 #define recycle_stat_add(pool, __stat, val)
 #endif
 
+static bool page_pool_producer_lock(struct page_pool *pool)
+       __acquires(&pool->ring.producer_lock)
+{
+       bool in_softirq = in_softirq();
+
+       if (in_softirq)
+               spin_lock(&pool->ring.producer_lock);
+       else
+               spin_lock_bh(&pool->ring.producer_lock);
+
+       return in_softirq;
+}
+
+static void page_pool_producer_unlock(struct page_pool *pool,
+                                     bool in_softirq)
+       __releases(&pool->ring.producer_lock)
+{
+       if (in_softirq)
+               spin_unlock(&pool->ring.producer_lock);
+       else
+               spin_unlock_bh(&pool->ring.producer_lock);
+}
+
 static int page_pool_init(struct page_pool *pool,
                          const struct page_pool_params *params)
 {
@@ -617,6 +640,7 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
                             int count)
 {
        int i, bulk_len = 0;
+       bool in_softirq;
 
        for (i = 0; i < count; i++) {
                struct page *page = virt_to_head_page(data[i]);
@@ -635,7 +659,7 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
                return;
 
        /* Bulk producer into ptr_ring page_pool cache */
-       page_pool_ring_lock(pool);
+       in_softirq = page_pool_producer_lock(pool);
        for (i = 0; i < bulk_len; i++) {
                if (__ptr_ring_produce(&pool->ring, data[i])) {
                        /* ring full */
@@ -644,7 +668,7 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
                }
        }
        recycle_stat_add(pool, ring, i);
-       page_pool_ring_unlock(pool);
+       page_pool_producer_unlock(pool, in_softirq);
 
        /* Hopefully all pages were returned into the ptr_ring */
        if (likely(i == bulk_len))
index 653901a..41de3a2 100644 (file)
@@ -2385,6 +2385,37 @@ static int validate_linkmsg(struct net_device *dev, struct nlattr *tb[],
                if (tb[IFLA_BROADCAST] &&
                    nla_len(tb[IFLA_BROADCAST]) < dev->addr_len)
                        return -EINVAL;
+
+               if (tb[IFLA_GSO_MAX_SIZE] &&
+                   nla_get_u32(tb[IFLA_GSO_MAX_SIZE]) > dev->tso_max_size) {
+                       NL_SET_ERR_MSG(extack, "too big gso_max_size");
+                       return -EINVAL;
+               }
+
+               if (tb[IFLA_GSO_MAX_SEGS] &&
+                   (nla_get_u32(tb[IFLA_GSO_MAX_SEGS]) > GSO_MAX_SEGS ||
+                    nla_get_u32(tb[IFLA_GSO_MAX_SEGS]) > dev->tso_max_segs)) {
+                       NL_SET_ERR_MSG(extack, "too big gso_max_segs");
+                       return -EINVAL;
+               }
+
+               if (tb[IFLA_GRO_MAX_SIZE] &&
+                   nla_get_u32(tb[IFLA_GRO_MAX_SIZE]) > GRO_MAX_SIZE) {
+                       NL_SET_ERR_MSG(extack, "too big gro_max_size");
+                       return -EINVAL;
+               }
+
+               if (tb[IFLA_GSO_IPV4_MAX_SIZE] &&
+                   nla_get_u32(tb[IFLA_GSO_IPV4_MAX_SIZE]) > dev->tso_max_size) {
+                       NL_SET_ERR_MSG(extack, "too big gso_ipv4_max_size");
+                       return -EINVAL;
+               }
+
+               if (tb[IFLA_GRO_IPV4_MAX_SIZE] &&
+                   nla_get_u32(tb[IFLA_GRO_IPV4_MAX_SIZE]) > GRO_MAX_SIZE) {
+                       NL_SET_ERR_MSG(extack, "too big gro_ipv4_max_size");
+                       return -EINVAL;
+               }
        }
 
        if (tb[IFLA_AF_SPEC]) {
@@ -2858,11 +2889,6 @@ static int do_setlink(const struct sk_buff *skb,
        if (tb[IFLA_GSO_MAX_SIZE]) {
                u32 max_size = nla_get_u32(tb[IFLA_GSO_MAX_SIZE]);
 
-               if (max_size > dev->tso_max_size) {
-                       err = -EINVAL;
-                       goto errout;
-               }
-
                if (dev->gso_max_size ^ max_size) {
                        netif_set_gso_max_size(dev, max_size);
                        status |= DO_SETLINK_MODIFIED;
@@ -2872,11 +2898,6 @@ static int do_setlink(const struct sk_buff *skb,
        if (tb[IFLA_GSO_MAX_SEGS]) {
                u32 max_segs = nla_get_u32(tb[IFLA_GSO_MAX_SEGS]);
 
-               if (max_segs > GSO_MAX_SEGS || max_segs > dev->tso_max_segs) {
-                       err = -EINVAL;
-                       goto errout;
-               }
-
                if (dev->gso_max_segs ^ max_segs) {
                        netif_set_gso_max_segs(dev, max_segs);
                        status |= DO_SETLINK_MODIFIED;
@@ -2895,11 +2916,6 @@ static int do_setlink(const struct sk_buff *skb,
        if (tb[IFLA_GSO_IPV4_MAX_SIZE]) {
                u32 max_size = nla_get_u32(tb[IFLA_GSO_IPV4_MAX_SIZE]);
 
-               if (max_size > dev->tso_max_size) {
-                       err = -EINVAL;
-                       goto errout;
-               }
-
                if (dev->gso_ipv4_max_size ^ max_size) {
                        netif_set_gso_ipv4_max_size(dev, max_size);
                        status |= DO_SETLINK_MODIFIED;
@@ -3285,6 +3301,7 @@ struct net_device *rtnl_create_link(struct net *net, const char *ifname,
        struct net_device *dev;
        unsigned int num_tx_queues = 1;
        unsigned int num_rx_queues = 1;
+       int err;
 
        if (tb[IFLA_NUM_TX_QUEUES])
                num_tx_queues = nla_get_u32(tb[IFLA_NUM_TX_QUEUES]);
@@ -3320,13 +3337,18 @@ struct net_device *rtnl_create_link(struct net *net, const char *ifname,
        if (!dev)
                return ERR_PTR(-ENOMEM);
 
+       err = validate_linkmsg(dev, tb, extack);
+       if (err < 0) {
+               free_netdev(dev);
+               return ERR_PTR(err);
+       }
+
        dev_net_set(dev, net);
        dev->rtnl_link_ops = ops;
        dev->rtnl_link_state = RTNL_LINK_INITIALIZING;
 
        if (tb[IFLA_MTU]) {
                u32 mtu = nla_get_u32(tb[IFLA_MTU]);
-               int err;
 
                err = dev_validate_mtu(dev, mtu, extack);
                if (err) {
index 26a5860..cea28d3 100644 (file)
@@ -5224,8 +5224,10 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
        } else {
                skb = skb_clone(orig_skb, GFP_ATOMIC);
 
-               if (skb_orphan_frags_rx(skb, GFP_ATOMIC))
+               if (skb_orphan_frags_rx(skb, GFP_ATOMIC)) {
+                       kfree_skb(skb);
                        return;
+               }
        }
        if (!skb)
                return;
@@ -5298,7 +5300,7 @@ bool skb_partial_csum_set(struct sk_buff *skb, u16 start, u16 off)
        u32 csum_end = (u32)start + (u32)off + sizeof(__sum16);
        u32 csum_start = skb_headroom(skb) + (u32)start;
 
-       if (unlikely(csum_start > U16_MAX || csum_end > skb_headlen(skb))) {
+       if (unlikely(csum_start >= U16_MAX || csum_end > skb_headlen(skb))) {
                net_warn_ratelimited("bad partial csum: csum=%u/%u headroom=%u headlen=%u\n",
                                     start, off, skb_headroom(skb), skb_headlen(skb));
                return false;
@@ -5306,7 +5308,7 @@ bool skb_partial_csum_set(struct sk_buff *skb, u16 start, u16 off)
        skb->ip_summed = CHECKSUM_PARTIAL;
        skb->csum_start = csum_start;
        skb->csum_offset = off;
-       skb_set_transport_header(skb, start);
+       skb->transport_header = csum_start;
        return true;
 }
 EXPORT_SYMBOL_GPL(skb_partial_csum_set);
index f818837..a29508e 100644 (file)
@@ -481,8 +481,6 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
                msg_rx = sk_psock_peek_msg(psock);
        }
 out:
-       if (psock->work_state.skb && copied > 0)
-               schedule_work(&psock->work);
        return copied;
 }
 EXPORT_SYMBOL_GPL(sk_msg_recvmsg);
@@ -624,42 +622,33 @@ static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb,
 
 static void sk_psock_skb_state(struct sk_psock *psock,
                               struct sk_psock_work_state *state,
-                              struct sk_buff *skb,
                               int len, int off)
 {
        spin_lock_bh(&psock->ingress_lock);
        if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
-               state->skb = skb;
                state->len = len;
                state->off = off;
-       } else {
-               sock_drop(psock->sk, skb);
        }
        spin_unlock_bh(&psock->ingress_lock);
 }
 
 static void sk_psock_backlog(struct work_struct *work)
 {
-       struct sk_psock *psock = container_of(work, struct sk_psock, work);
+       struct delayed_work *dwork = to_delayed_work(work);
+       struct sk_psock *psock = container_of(dwork, struct sk_psock, work);
        struct sk_psock_work_state *state = &psock->work_state;
        struct sk_buff *skb = NULL;
+       u32 len = 0, off = 0;
        bool ingress;
-       u32 len, off;
        int ret;
 
        mutex_lock(&psock->work_mutex);
-       if (unlikely(state->skb)) {
-               spin_lock_bh(&psock->ingress_lock);
-               skb = state->skb;
+       if (unlikely(state->len)) {
                len = state->len;
                off = state->off;
-               state->skb = NULL;
-               spin_unlock_bh(&psock->ingress_lock);
        }
-       if (skb)
-               goto start;
 
-       while ((skb = skb_dequeue(&psock->ingress_skb))) {
+       while ((skb = skb_peek(&psock->ingress_skb))) {
                len = skb->len;
                off = 0;
                if (skb_bpf_strparser(skb)) {
@@ -668,7 +657,6 @@ static void sk_psock_backlog(struct work_struct *work)
                        off = stm->offset;
                        len = stm->full_len;
                }
-start:
                ingress = skb_bpf_ingress(skb);
                skb_bpf_redirect_clear(skb);
                do {
@@ -678,22 +666,28 @@ start:
                                                          len, ingress);
                        if (ret <= 0) {
                                if (ret == -EAGAIN) {
-                                       sk_psock_skb_state(psock, state, skb,
-                                                          len, off);
+                                       sk_psock_skb_state(psock, state, len, off);
+
+                                       /* Delay slightly to prioritize any
+                                        * other work that might be pending.
+                                        */
+                                       if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED))
+                                               schedule_delayed_work(&psock->work, 1);
                                        goto end;
                                }
                                /* Hard errors break pipe and stop xmit. */
                                sk_psock_report_error(psock, ret ? -ret : EPIPE);
                                sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
-                               sock_drop(psock->sk, skb);
                                goto end;
                        }
                        off += ret;
                        len -= ret;
                } while (len);
 
-               if (!ingress)
+               skb = skb_dequeue(&psock->ingress_skb);
+               if (!ingress) {
                        kfree_skb(skb);
+               }
        }
 end:
        mutex_unlock(&psock->work_mutex);
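
The psock backlog is now a delayed_work: on -EAGAIN the saved length/offset is stashed and the work re-queues itself with a small delay instead of immediately re-running, and the skb stays on ingress_skb until it has been fully consumed. A tiny sketch of the requeue-with-backoff part, with illustrative names:

#include <linux/workqueue.h>
#include <linux/container_of.h>

/* Illustrative worker state; not the sk_psock layout. */
struct backlog {
	struct delayed_work work;
};

int push_one(struct backlog *b);	/* returns -EAGAIN on transient failure */

static void backlog_work_fn(struct work_struct *work)
{
	struct backlog *b = container_of(to_delayed_work(work),
					 struct backlog, work);

	if (push_one(b) == -EAGAIN) {
		/* Back off for one jiffy so other queued work can run first. */
		schedule_delayed_work(&b->work, 1);
		return;
	}
	/* otherwise either done, or a hard error is handled elsewhere */
}

static void backlog_teardown(struct backlog *b)
{
	/* Delayed work needs the delayed variant of the sync cancel. */
	cancel_delayed_work_sync(&b->work);
}
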
@@ -734,7 +728,7 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node)
        INIT_LIST_HEAD(&psock->link);
        spin_lock_init(&psock->link_lock);
 
-       INIT_WORK(&psock->work, sk_psock_backlog);
+       INIT_DELAYED_WORK(&psock->work, sk_psock_backlog);
        mutex_init(&psock->work_mutex);
        INIT_LIST_HEAD(&psock->ingress_msg);
        spin_lock_init(&psock->ingress_lock);
@@ -786,11 +780,6 @@ static void __sk_psock_zap_ingress(struct sk_psock *psock)
                skb_bpf_redirect_clear(skb);
                sock_drop(psock->sk, skb);
        }
-       kfree_skb(psock->work_state.skb);
-       /* We null the skb here to ensure that calls to sk_psock_backlog
-        * do not pick up the free'd skb.
-        */
-       psock->work_state.skb = NULL;
        __sk_psock_purge_ingress_msg(psock);
 }
 
@@ -809,7 +798,6 @@ void sk_psock_stop(struct sk_psock *psock)
        spin_lock_bh(&psock->ingress_lock);
        sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
        sk_psock_cork_free(psock);
-       __sk_psock_zap_ingress(psock);
        spin_unlock_bh(&psock->ingress_lock);
 }
 
@@ -823,7 +811,8 @@ static void sk_psock_destroy(struct work_struct *work)
 
        sk_psock_done_strp(psock);
 
-       cancel_work_sync(&psock->work);
+       cancel_delayed_work_sync(&psock->work);
+       __sk_psock_zap_ingress(psock);
        mutex_destroy(&psock->work_mutex);
 
        psock_progs_drop(&psock->progs);
@@ -938,7 +927,7 @@ static int sk_psock_skb_redirect(struct sk_psock *from, struct sk_buff *skb)
        }
 
        skb_queue_tail(&psock_other->ingress_skb, skb);
-       schedule_work(&psock_other->work);
+       schedule_delayed_work(&psock_other->work, 0);
        spin_unlock_bh(&psock_other->ingress_lock);
        return 0;
 }
@@ -990,10 +979,8 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
                err = -EIO;
                sk_other = psock->sk;
                if (sock_flag(sk_other, SOCK_DEAD) ||
-                   !sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
-                       skb_bpf_redirect_clear(skb);
+                   !sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED))
                        goto out_free;
-               }
 
                skb_bpf_set_ingress(skb);
 
@@ -1018,22 +1005,23 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
                        spin_lock_bh(&psock->ingress_lock);
                        if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
                                skb_queue_tail(&psock->ingress_skb, skb);
-                               schedule_work(&psock->work);
+                               schedule_delayed_work(&psock->work, 0);
                                err = 0;
                        }
                        spin_unlock_bh(&psock->ingress_lock);
-                       if (err < 0) {
-                               skb_bpf_redirect_clear(skb);
+                       if (err < 0)
                                goto out_free;
-                       }
                }
                break;
        case __SK_REDIRECT:
+               tcp_eat_skb(psock->sk, skb);
                err = sk_psock_skb_redirect(psock, skb);
                break;
        case __SK_DROP:
        default:
 out_free:
+               skb_bpf_redirect_clear(skb);
+               tcp_eat_skb(psock->sk, skb);
                sock_drop(psock->sk, skb);
        }
 
@@ -1049,7 +1037,7 @@ static void sk_psock_write_space(struct sock *sk)
        psock = sk_psock(sk);
        if (likely(psock)) {
                if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED))
-                       schedule_work(&psock->work);
+                       schedule_delayed_work(&psock->work, 0);
                write_space = psock->saved_write_space;
        }
        rcu_read_unlock();
@@ -1078,8 +1066,7 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb)
                skb_dst_drop(skb);
                skb_bpf_redirect_clear(skb);
                ret = bpf_prog_run_pin_on_cpu(prog, skb);
-               if (ret == SK_PASS)
-                       skb_bpf_set_strparser(skb);
+               skb_bpf_set_strparser(skb);
                ret = sk_psock_map_verd(ret, skb_bpf_redirect_fetch(skb));
                skb->sk = NULL;
        }
@@ -1183,12 +1170,11 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb)
        int ret = __SK_DROP;
        int len = skb->len;
 
-       skb_get(skb);
-
        rcu_read_lock();
        psock = sk_psock(sk);
        if (unlikely(!psock)) {
                len = 0;
+               tcp_eat_skb(sk, skb);
                sock_drop(sk, skb);
                goto out;
        }
@@ -1212,12 +1198,22 @@ out:
 static void sk_psock_verdict_data_ready(struct sock *sk)
 {
        struct socket *sock = sk->sk_socket;
+       int copied;
 
        trace_sk_data_ready(sk);
 
        if (unlikely(!sock || !sock->ops || !sock->ops->read_skb))
                return;
-       sock->ops->read_skb(sk, sk_psock_verdict_recv);
+       copied = sock->ops->read_skb(sk, sk_psock_verdict_recv);
+       if (copied >= 0) {
+               struct sk_psock *psock;
+
+               rcu_read_lock();
+               psock = sk_psock(sk);
+               if (psock)
+                       psock->saved_data_ready(sk);
+               rcu_read_unlock();
+       }
 }
 
 void sk_psock_start_verdict(struct sock *sk, struct sk_psock *psock)
index 5440e67..6e5662c 100644 (file)
@@ -1362,12 +1362,6 @@ set_sndbuf:
                __sock_set_mark(sk, val);
                break;
        case SO_RCVMARK:
-               if (!sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) &&
-                   !sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) {
-                       ret = -EPERM;
-                       break;
-               }
-
                sock_valbool_flag(sk, SOCK_RCVMARK, valbool);
                break;
 
@@ -2381,7 +2375,6 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
 {
        u32 max_segs = 1;
 
-       sk_dst_set(sk, dst);
        sk->sk_route_caps = dst->dev->features;
        if (sk_is_tcp(sk))
                sk->sk_route_caps |= NETIF_F_GSO;
@@ -2400,6 +2393,7 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
                }
        }
        sk->sk_gso_max_segs = max_segs;
+       sk_dst_set(sk, dst);
 }
 EXPORT_SYMBOL_GPL(sk_setup_caps);
 
index 7c189c2..00afb66 100644 (file)
@@ -1644,9 +1644,10 @@ void sock_map_close(struct sock *sk, long timeout)
                rcu_read_unlock();
                sk_psock_stop(psock);
                release_sock(sk);
-               cancel_work_sync(&psock->work);
+               cancel_delayed_work_sync(&psock->work);
                sk_psock_put(sk, psock);
        }
+
        /* Make sure we do not recurse. This is a bug.
         * Leak the socket instead of crashing on a stack overflow.
         */
index 434446a..f5c4e47 100644 (file)
@@ -73,8 +73,8 @@ int sk_stream_wait_connect(struct sock *sk, long *timeo_p)
                add_wait_queue(sk_sleep(sk), &wait);
                sk->sk_write_pending++;
                done = sk_wait_event(sk, timeo_p,
-                                    !sk->sk_err &&
-                                    !((1 << sk->sk_state) &
+                                    !READ_ONCE(sk->sk_err) &&
+                                    !((1 << READ_ONCE(sk->sk_state)) &
                                       ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)), &wait);
                remove_wait_queue(sk_sleep(sk), &wait);
                sk->sk_write_pending--;
@@ -87,9 +87,9 @@ EXPORT_SYMBOL(sk_stream_wait_connect);
  * sk_stream_closing - Return 1 if we still have things to send in our buffers.
  * @sk: socket to verify
  */
-static inline int sk_stream_closing(struct sock *sk)
+static int sk_stream_closing(const struct sock *sk)
 {
-       return (1 << sk->sk_state) &
+       return (1 << READ_ONCE(sk->sk_state)) &
               (TCPF_FIN_WAIT1 | TCPF_CLOSING | TCPF_LAST_ACK);
 }
 
@@ -142,8 +142,8 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
 
                set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
                sk->sk_write_pending++;
-               sk_wait_event(sk, &current_timeo, sk->sk_err ||
-                                                 (sk->sk_shutdown & SEND_SHUTDOWN) ||
+               sk_wait_event(sk, &current_timeo, READ_ONCE(sk->sk_err) ||
+                                                 (READ_ONCE(sk->sk_shutdown) & SEND_SHUTDOWN) ||
                                                  (sk_stream_memory_free(sk) &&
                                                  !vm_wait), &wait);
                sk->sk_write_pending--;
index a06b564..b0ebf85 100644 (file)
@@ -191,6 +191,9 @@ int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized)
        struct dccp_sock *dp = dccp_sk(sk);
        struct inet_connection_sock *icsk = inet_csk(sk);
 
+       pr_warn_once("DCCP is deprecated and scheduled to be removed in 2025, "
+                    "please contact the netdev mailing list\n");
+
        icsk->icsk_rto          = DCCP_TIMEOUT_INIT;
        icsk->icsk_syn_retries  = sysctl_dccp_request_retries;
        sk->sk_state            = DCCP_CLOSED;
index 777b091..c23ebab 100644 (file)
@@ -204,11 +204,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
        if (ret < 0)
                goto err_xa_alloc;
 
-       devlink->netdevice_nb.notifier_call = devlink_port_netdevice_event;
-       ret = register_netdevice_notifier(&devlink->netdevice_nb);
-       if (ret)
-               goto err_register_netdevice_notifier;
-
        devlink->dev = dev;
        devlink->ops = ops;
        xa_init_flags(&devlink->ports, XA_FLAGS_ALLOC);
@@ -233,8 +228,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
 
        return devlink;
 
-err_register_netdevice_notifier:
-       xa_erase(&devlinks, devlink->index);
 err_xa_alloc:
        kfree(devlink);
        return NULL;
@@ -266,8 +259,6 @@ void devlink_free(struct devlink *devlink)
        xa_destroy(&devlink->params);
        xa_destroy(&devlink->ports);
 
-       WARN_ON_ONCE(unregister_netdevice_notifier(&devlink->netdevice_nb));
-
        xa_erase(&devlinks, devlink->index);
 
        devlink_put(devlink);
@@ -303,6 +294,10 @@ static struct pernet_operations devlink_pernet_ops __net_initdata = {
        .pre_exit = devlink_pernet_pre_exit,
 };
 
+static struct notifier_block devlink_port_netdevice_nb = {
+       .notifier_call = devlink_port_netdevice_event,
+};
+
 static int __init devlink_init(void)
 {
        int err;
@@ -311,6 +306,9 @@ static int __init devlink_init(void)
        if (err)
                goto out;
        err = register_pernet_subsys(&devlink_pernet_ops);
+       if (err)
+               goto out;
+       err = register_netdevice_notifier(&devlink_port_netdevice_nb);
 
 out:
        WARN_ON(err);
index e133f42..62921b2 100644 (file)
@@ -50,7 +50,6 @@ struct devlink {
        u8 reload_failed:1;
        refcount_t refcount;
        struct rcu_work rwork;
-       struct notifier_block netdevice_nb;
        char priv[] __aligned(NETDEV_ALIGN);
 };
 
index dffca2f..cd02549 100644 (file)
@@ -7073,10 +7073,9 @@ int devlink_port_netdevice_event(struct notifier_block *nb,
        struct devlink_port *devlink_port = netdev->devlink_port;
        struct devlink *devlink;
 
-       devlink = container_of(nb, struct devlink, netdevice_nb);
-
-       if (!devlink_port || devlink_port->devlink != devlink)
+       if (!devlink_port)
                return NOTIFY_OK;
+       devlink = devlink_port->devlink;
 
        switch (event) {
        case NETDEV_POST_INIT:
index ab1afe6..1afed89 100644 (file)
@@ -403,6 +403,24 @@ static int dsa_tree_setup_default_cpu(struct dsa_switch_tree *dst)
        return 0;
 }
 
+static struct dsa_port *
+dsa_switch_preferred_default_local_cpu_port(struct dsa_switch *ds)
+{
+       struct dsa_port *cpu_dp;
+
+       if (!ds->ops->preferred_default_local_cpu_port)
+               return NULL;
+
+       cpu_dp = ds->ops->preferred_default_local_cpu_port(ds);
+       if (!cpu_dp)
+               return NULL;
+
+       if (WARN_ON(!dsa_port_is_cpu(cpu_dp) || cpu_dp->ds != ds))
+               return NULL;
+
+       return cpu_dp;
+}
+
 /* Perform initial assignment of CPU ports to user ports and DSA links in the
  * fabric, giving preference to CPU ports local to each switch. Default to
  * using the first CPU port in the switch tree if the port does not have a CPU
@@ -410,12 +428,16 @@ static int dsa_tree_setup_default_cpu(struct dsa_switch_tree *dst)
  */
 static int dsa_tree_setup_cpu_ports(struct dsa_switch_tree *dst)
 {
-       struct dsa_port *cpu_dp, *dp;
+       struct dsa_port *preferred_cpu_dp, *cpu_dp, *dp;
 
        list_for_each_entry(cpu_dp, &dst->ports, list) {
                if (!dsa_port_is_cpu(cpu_dp))
                        continue;
 
+               preferred_cpu_dp = dsa_switch_preferred_default_local_cpu_port(cpu_dp->ds);
+               if (preferred_cpu_dp && preferred_cpu_dp != cpu_dp)
+                       continue;
+
                /* Prefer a local CPU port */
                dsa_switch_for_each_port(dp, cpu_dp->ds) {
                        /* Prefer the first local CPU port found */
index e6adc5d..6d37bab 100644 (file)
@@ -102,7 +102,7 @@ struct handshake_req_alloc_test_param handshake_req_alloc_params[] = {
        {
                .desc                   = "handshake_req_alloc excessive privsize",
                .proto                  = &handshake_req_alloc_proto_6,
-               .gfp                    = GFP_KERNEL,
+               .gfp                    = GFP_KERNEL | __GFP_NOWARN,
                .expect_success         = false,
        },
        {
@@ -209,6 +209,7 @@ static void handshake_req_submit_test4(struct kunit *test)
 {
        struct handshake_req *req, *result;
        struct socket *sock;
+       struct file *filp;
        int err;
 
        /* Arrange */
@@ -218,9 +219,10 @@ static void handshake_req_submit_test4(struct kunit *test)
        err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
                            &sock, 1);
        KUNIT_ASSERT_EQ(test, err, 0);
-       sock->file = sock_alloc_file(sock, O_NONBLOCK, NULL);
-       KUNIT_ASSERT_NOT_ERR_OR_NULL(test, sock->file);
+       filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
+       KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filp);
        KUNIT_ASSERT_NOT_NULL(test, sock->sk);
+       sock->file = filp;
 
        err = handshake_req_submit(sock, req, GFP_KERNEL);
        KUNIT_ASSERT_EQ(test, err, 0);
@@ -241,6 +243,7 @@ static void handshake_req_submit_test5(struct kunit *test)
        struct handshake_req *req;
        struct handshake_net *hn;
        struct socket *sock;
+       struct file *filp;
        struct net *net;
        int saved, err;
 
@@ -251,9 +254,10 @@ static void handshake_req_submit_test5(struct kunit *test)
        err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
                            &sock, 1);
        KUNIT_ASSERT_EQ(test, err, 0);
-       sock->file = sock_alloc_file(sock, O_NONBLOCK, NULL);
-       KUNIT_ASSERT_NOT_ERR_OR_NULL(test, sock->file);
+       filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
+       KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filp);
        KUNIT_ASSERT_NOT_NULL(test, sock->sk);
+       sock->file = filp;
 
        net = sock_net(sock->sk);
        hn = handshake_pernet(net);
@@ -276,6 +280,7 @@ static void handshake_req_submit_test6(struct kunit *test)
 {
        struct handshake_req *req1, *req2;
        struct socket *sock;
+       struct file *filp;
        int err;
 
        /* Arrange */
@@ -287,9 +292,10 @@ static void handshake_req_submit_test6(struct kunit *test)
        err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
                            &sock, 1);
        KUNIT_ASSERT_EQ(test, err, 0);
-       sock->file = sock_alloc_file(sock, O_NONBLOCK, NULL);
-       KUNIT_ASSERT_NOT_ERR_OR_NULL(test, sock->file);
+       filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
+       KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filp);
        KUNIT_ASSERT_NOT_NULL(test, sock->sk);
+       sock->file = filp;
 
        /* Act */
        err = handshake_req_submit(sock, req1, GFP_KERNEL);
@@ -307,6 +313,7 @@ static void handshake_req_cancel_test1(struct kunit *test)
 {
        struct handshake_req *req;
        struct socket *sock;
+       struct file *filp;
        bool result;
        int err;
 
@@ -318,8 +325,9 @@ static void handshake_req_cancel_test1(struct kunit *test)
                            &sock, 1);
        KUNIT_ASSERT_EQ(test, err, 0);
 
-       sock->file = sock_alloc_file(sock, O_NONBLOCK, NULL);
-       KUNIT_ASSERT_NOT_ERR_OR_NULL(test, sock->file);
+       filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
+       KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filp);
+       sock->file = filp;
 
        err = handshake_req_submit(sock, req, GFP_KERNEL);
        KUNIT_ASSERT_EQ(test, err, 0);
@@ -340,6 +348,7 @@ static void handshake_req_cancel_test2(struct kunit *test)
        struct handshake_req *req, *next;
        struct handshake_net *hn;
        struct socket *sock;
+       struct file *filp;
        struct net *net;
        bool result;
        int err;
@@ -352,8 +361,9 @@ static void handshake_req_cancel_test2(struct kunit *test)
                            &sock, 1);
        KUNIT_ASSERT_EQ(test, err, 0);
 
-       sock->file = sock_alloc_file(sock, O_NONBLOCK, NULL);
-       KUNIT_ASSERT_NOT_ERR_OR_NULL(test, sock->file);
+       filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
+       KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filp);
+       sock->file = filp;
 
        err = handshake_req_submit(sock, req, GFP_KERNEL);
        KUNIT_ASSERT_EQ(test, err, 0);
@@ -380,6 +390,7 @@ static void handshake_req_cancel_test3(struct kunit *test)
        struct handshake_req *req, *next;
        struct handshake_net *hn;
        struct socket *sock;
+       struct file *filp;
        struct net *net;
        bool result;
        int err;
@@ -392,8 +403,9 @@ static void handshake_req_cancel_test3(struct kunit *test)
                            &sock, 1);
        KUNIT_ASSERT_EQ(test, err, 0);
 
-       sock->file = sock_alloc_file(sock, O_NONBLOCK, NULL);
-       KUNIT_ASSERT_NOT_ERR_OR_NULL(test, sock->file);
+       filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
+       KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filp);
+       sock->file = filp;
 
        err = handshake_req_submit(sock, req, GFP_KERNEL);
        KUNIT_ASSERT_EQ(test, err, 0);
@@ -436,6 +448,7 @@ static void handshake_req_destroy_test1(struct kunit *test)
 {
        struct handshake_req *req;
        struct socket *sock;
+       struct file *filp;
        int err;
 
        /* Arrange */
@@ -448,8 +461,9 @@ static void handshake_req_destroy_test1(struct kunit *test)
                            &sock, 1);
        KUNIT_ASSERT_EQ(test, err, 0);
 
-       sock->file = sock_alloc_file(sock, O_NONBLOCK, NULL);
-       KUNIT_ASSERT_NOT_ERR_OR_NULL(test, sock->file);
+       filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
+       KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filp);
+       sock->file = filp;
 
        err = handshake_req_submit(sock, req, GFP_KERNEL);
        KUNIT_ASSERT_EQ(test, err, 0);
index 35c9c44..1086653 100644 (file)
@@ -48,7 +48,7 @@ int handshake_genl_notify(struct net *net, const struct handshake_proto *proto,
                                proto->hp_handler_class))
                return -ESRCH;
 
-       msg = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+       msg = genlmsg_new(GENLMSG_DEFAULT_SIZE, flags);
        if (!msg)
                return -ENOMEM;
 
@@ -99,9 +99,6 @@ static int handshake_dup(struct socket *sock)
        struct file *file;
        int newfd;
 
-       if (!sock->file)
-               return -EBADF;
-
        file = get_file(sock->file);
        newfd = get_unused_fd_flags(O_CLOEXEC);
        if (newfd < 0) {
@@ -142,15 +139,16 @@ int handshake_nl_accept_doit(struct sk_buff *skb, struct genl_info *info)
                goto out_complete;
        }
        err = req->hr_proto->hp_accept(req, info, fd);
-       if (err)
+       if (err) {
+               fput(sock->file);
                goto out_complete;
+       }
 
        trace_handshake_cmd_accept(net, req, req->hr_sk, fd);
        return 0;
 
 out_complete:
        handshake_complete(req, -EIO, NULL);
-       fput(sock->file);
 out_status:
        trace_handshake_cmd_accept_err(net, req, NULL, err);
        return err;
@@ -159,8 +157,8 @@ out_status:
 int handshake_nl_done_doit(struct sk_buff *skb, struct genl_info *info)
 {
        struct net *net = sock_net(skb->sk);
+       struct handshake_req *req = NULL;
        struct socket *sock = NULL;
-       struct handshake_req *req;
        int fd, status, err;
 
        if (GENL_REQ_ATTR_CHECK(info, HANDSHAKE_A_DONE_SOCKFD))
index fcbeb63..b735f5c 100644 (file)
@@ -31,6 +31,7 @@ struct tls_handshake_req {
        int                     th_type;
        unsigned int            th_timeout_ms;
        int                     th_auth_mode;
+       const char              *th_peername;
        key_serial_t            th_keyring;
        key_serial_t            th_certificate;
        key_serial_t            th_privkey;
@@ -48,6 +49,7 @@ tls_handshake_req_init(struct handshake_req *req,
        treq->th_timeout_ms = args->ta_timeout_ms;
        treq->th_consumer_done = args->ta_done;
        treq->th_consumer_data = args->ta_data;
+       treq->th_peername = args->ta_peername;
        treq->th_keyring = args->ta_keyring;
        treq->th_num_peerids = 0;
        treq->th_certificate = TLS_NO_CERT;
@@ -214,6 +216,12 @@ static int tls_handshake_accept(struct handshake_req *req,
        ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_MESSAGE_TYPE, treq->th_type);
        if (ret < 0)
                goto out_cancel;
+       if (treq->th_peername) {
+               ret = nla_put_string(msg, HANDSHAKE_A_ACCEPT_PEERNAME,
+                                    treq->th_peername);
+               if (ret < 0)
+                       goto out_cancel;
+       }
        if (treq->th_timeout_ms) {
                ret = nla_put_u32(msg, HANDSHAKE_A_ACCEPT_TIMEOUT, treq->th_timeout_ms);
                if (ret < 0)
index e5d8439..c16db0b 100644 (file)
@@ -13,7 +13,7 @@
 
 #define MAXNAME                32
 #define WPAN_PHY_ENTRY __array(char, wpan_phy_name, MAXNAME)
-#define WPAN_PHY_ASSIGN        strlcpy(__entry->wpan_phy_name,  \
+#define WPAN_PHY_ASSIGN        strscpy(__entry->wpan_phy_name,  \
                                wpan_phy_name(wpan_phy), \
                                MAXNAME)
 #define WPAN_PHY_PR_FMT        "%s"
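
The one-line change above swaps strlcpy() for strscpy() in the wpan tracepoint macro. For readers unfamiliar with the difference, here is a small userspace sketch with both return conventions reimplemented from scratch (illustrative stand-ins, not the kernel helpers): strlcpy() must walk the entire source just to report its full length, while strscpy() stops at the destination bound, always NUL-terminates, and signals truncation with -E2BIG.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

static size_t my_strlcpy(char *dst, const char *src, size_t size)
{
        size_t len = strlen(src);               /* reads all of src */

        if (size) {
                size_t copy = len < size - 1 ? len : size - 1;

                memcpy(dst, src, copy);
                dst[copy] = '\0';
        }
        return len;                             /* caller compares with size */
}

static ssize_t my_strscpy(char *dst, const char *src, size_t size)
{
        size_t i;

        if (!size)
                return -E2BIG;
        for (i = 0; i < size - 1 && src[i]; i++)
                dst[i] = src[i];                /* never reads past the bound */
        dst[i] = '\0';
        return src[i] ? -E2BIG : (ssize_t)i;
}

int main(void)
{
        char name[8];

        printf("strlcpy-style: %zu\n",
               my_strlcpy(name, "wpan_phy_overly_long_name", sizeof(name)));
        printf("strscpy-style: %zd\n",
               my_strscpy(name, "wpan_phy_overly_long_name", sizeof(name)));
        return 0;
}
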
index 940062e..4a76ebf 100644 (file)
@@ -586,6 +586,7 @@ static long inet_wait_for_connect(struct sock *sk, long timeo, int writebias)
 
        add_wait_queue(sk_sleep(sk), &wait);
        sk->sk_write_pending += writebias;
+       sk->sk_wait_pending++;
 
        /* Basic assumption: if someone sets sk->sk_err, he _must_
         * change state of the socket from TCP_SYN_*.
@@ -601,6 +602,7 @@ static long inet_wait_for_connect(struct sock *sk, long timeo, int writebias)
        }
        remove_wait_queue(sk_sleep(sk), &wait);
        sk->sk_write_pending -= writebias;
+       sk->sk_wait_pending--;
        return timeo;
 }
 
@@ -894,7 +896,7 @@ int inet_shutdown(struct socket *sock, int how)
                   EPOLLHUP, even on eg. unconnected UDP sockets -- RR */
                fallthrough;
        default:
-               sk->sk_shutdown |= how;
+               WRITE_ONCE(sk->sk_shutdown, sk->sk_shutdown | how);
                if (sk->sk_prot->shutdown)
                        sk->sk_prot->shutdown(sk, how);
                break;
index 3969fa8..ee848be 100644 (file)
@@ -340,6 +340,9 @@ static int esp_xmit(struct xfrm_state *x, struct sk_buff *skb,  netdev_features_
 
        secpath_reset(skb);
 
+       if (skb_needs_linearize(skb, skb->dev->features) &&
+           __skb_linearize(skb))
+               return -ENOMEM;
        return 0;
 }
 
index 65ad425..1386787 100644 (file)
@@ -1142,6 +1142,7 @@ struct sock *inet_csk_clone_lock(const struct sock *sk,
        if (newsk) {
                struct inet_connection_sock *newicsk = inet_csk(newsk);
 
+               newsk->sk_wait_pending = 0;
                inet_sk_set_state(newsk, TCP_SYN_RECV);
                newicsk->icsk_bind_hash = NULL;
                newicsk->icsk_bind2_hash = NULL;
index b511ff0..8e97d8d 100644 (file)
@@ -317,7 +317,14 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
                        ipc->tos = val;
                        ipc->priority = rt_tos2priority(ipc->tos);
                        break;
-
+               case IP_PROTOCOL:
+                       if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
+                               return -EINVAL;
+                       val = *(int *)CMSG_DATA(cmsg);
+                       if (val < 1 || val > 255)
+                               return -EINVAL;
+                       ipc->protocol = val;
+                       break;
                default:
                        return -EINVAL;
                }
@@ -1761,6 +1768,9 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname,
        case IP_LOCAL_PORT_RANGE:
                val = inet->local_port_range.hi << 16 | inet->local_port_range.lo;
                break;
+       case IP_PROTOCOL:
+               val = inet_sk(sk)->inet_num;
+               break;
        default:
                sockopt_release_sock(sk);
                return -ENOPROTOOPT;
index ff712bf..eadf1c9 100644 (file)
@@ -532,6 +532,9 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
        }
 
        ipcm_init_sk(&ipc, inet);
+       /* Keep backward compat */
+       if (hdrincl)
+               ipc.protocol = IPPROTO_RAW;
 
        if (msg->msg_controllen) {
                err = ip_cmsg_send(sk, msg, &ipc, false);
@@ -599,7 +602,7 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
        flowi4_init_output(&fl4, ipc.oif, ipc.sockc.mark, tos,
                           RT_SCOPE_UNIVERSE,
-                          hdrincl ? IPPROTO_RAW : sk->sk_protocol,
+                          hdrincl ? ipc.protocol : sk->sk_protocol,
                           inet_sk_flowi_flags(sk) |
                            (hdrincl ? FLOWI_FLAG_KNOWN_NH : 0),
                           daddr, saddr, 0, 0, sk->sk_uid);
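
The ip_cmsg_send()/do_ip_getsockopt()/raw_sendmsg() hunks above let a sender on an IPPROTO_RAW socket name the real transport protocol per message, so the flow lookup keys on that value instead of the fixed IPPROTO_RAW. A rough userspace sketch of how the new ancillary data could be used; it assumes uapi headers new enough to define IP_PROTOCOL, requires CAP_NET_RAW, and the loopback addresses, ports and payload are arbitrary:

#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/udp.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
        int fd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW); /* implies IP_HDRINCL */
        if (fd < 0) {
                perror("socket (needs CAP_NET_RAW)");
                return 1;
        }

        /* Hand-built IPv4 + UDP datagram to 127.0.0.1:9; for IP_HDRINCL
         * senders the kernel fills in tot_len, id and the IP checksum. */
        struct iphdr ip = {
                .version = 4, .ihl = 5, .ttl = 64,
                .protocol = IPPROTO_UDP,
                .saddr = htonl(INADDR_LOOPBACK),
                .daddr = htonl(INADDR_LOOPBACK),
        };
        struct udphdr udp;
        unsigned char pkt[sizeof(ip) + sizeof(udp) + 4];

        memset(&udp, 0, sizeof(udp));
        udp.source = htons(40000);
        udp.dest = htons(9);
        udp.len = htons(sizeof(udp) + 4);       /* checksum left 0: none */
        memcpy(pkt, &ip, sizeof(ip));
        memcpy(pkt + sizeof(ip), &udp, sizeof(udp));
        memcpy(pkt + sizeof(ip) + sizeof(udp), "ping", 4);

        struct sockaddr_in dst = {
                .sin_family = AF_INET,
                .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
        };
        struct iovec iov = { .iov_base = pkt, .iov_len = sizeof(pkt) };

        /* Ancillary data: the flow is really UDP, so route and policy
         * lookups should key on IPPROTO_UDP rather than IPPROTO_RAW. */
        union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;
        struct msghdr msg = {
                .msg_name = &dst, .msg_namelen = sizeof(dst),
                .msg_iov = &iov, .msg_iovlen = 1,
                .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
        };
        struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
        int proto = IPPROTO_UDP;

        cm->cmsg_level = SOL_IP;
        cm->cmsg_type = IP_PROTOCOL;
        cm->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cm), &proto, sizeof(proto));

        if (sendmsg(fd, &msg, 0) < 0)
                perror("sendmsg");
        close(fd);
        return 0;
}

Without the cmsg, the hunk above keeps the old behaviour by defaulting ipc.protocol to IPPROTO_RAW for header-including senders.
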
index 40fe70f..88dfe51 100644 (file)
@@ -34,8 +34,8 @@ static int ip_ttl_min = 1;
 static int ip_ttl_max = 255;
 static int tcp_syn_retries_min = 1;
 static int tcp_syn_retries_max = MAX_TCP_SYNCNT;
-static int ip_ping_group_range_min[] = { 0, 0 };
-static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX };
+static unsigned long ip_ping_group_range_min[] = { 0, 0 };
+static unsigned long ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX };
 static u32 u32_max_div_HZ = UINT_MAX / HZ;
 static int one_day_secs = 24 * 3600;
 static u32 fib_multipath_hash_fields_all_mask __maybe_unused =
@@ -165,7 +165,7 @@ static int ipv4_ping_group_range(struct ctl_table *table, int write,
 {
        struct user_namespace *user_ns = current_user_ns();
        int ret;
-       gid_t urange[2];
+       unsigned long urange[2];
        kgid_t low, high;
        struct ctl_table tmp = {
                .data = &urange,
@@ -178,7 +178,7 @@ static int ipv4_ping_group_range(struct ctl_table *table, int write,
        inet_get_ping_group_range_table(table, &low, &high);
        urange[0] = from_kgid_munged(user_ns, low);
        urange[1] = from_kgid_munged(user_ns, high);
-       ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
+       ret = proc_doulongvec_minmax(&tmp, write, buffer, lenp, ppos);
 
        if (write && ret == 0) {
                low = make_kgid(user_ns, urange[0]);
index 20db115..8d20d92 100644 (file)
@@ -498,6 +498,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
        __poll_t mask;
        struct sock *sk = sock->sk;
        const struct tcp_sock *tp = tcp_sk(sk);
+       u8 shutdown;
        int state;
 
        sock_poll_wait(file, sock, wait);
@@ -540,9 +541,10 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
         * NOTE. Check for TCP_CLOSE is added. The goal is to prevent
         * blocking on fresh not-connected or disconnected socket. --ANK
         */
-       if (sk->sk_shutdown == SHUTDOWN_MASK || state == TCP_CLOSE)
+       shutdown = READ_ONCE(sk->sk_shutdown);
+       if (shutdown == SHUTDOWN_MASK || state == TCP_CLOSE)
                mask |= EPOLLHUP;
-       if (sk->sk_shutdown & RCV_SHUTDOWN)
+       if (shutdown & RCV_SHUTDOWN)
                mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
 
        /* Connected or passive Fast Open socket? */
@@ -559,7 +561,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
                if (tcp_stream_is_readable(sk, target))
                        mask |= EPOLLIN | EPOLLRDNORM;
 
-               if (!(sk->sk_shutdown & SEND_SHUTDOWN)) {
+               if (!(shutdown & SEND_SHUTDOWN)) {
                        if (__sk_stream_is_writeable(sk, 1)) {
                                mask |= EPOLLOUT | EPOLLWRNORM;
                        } else {  /* send SIGIO later */
@@ -1569,7 +1571,7 @@ static int tcp_peek_sndq(struct sock *sk, struct msghdr *msg, int len)
  * calculation of whether or not we must ACK for the sake of
  * a window update.
  */
-static void __tcp_cleanup_rbuf(struct sock *sk, int copied)
+void __tcp_cleanup_rbuf(struct sock *sk, int copied)
 {
        struct tcp_sock *tp = tcp_sk(sk);
        bool time_to_ack = false;
@@ -1771,7 +1773,6 @@ int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
                WARN_ON_ONCE(!skb_set_owner_sk_safe(skb, sk));
                tcp_flags = TCP_SKB_CB(skb)->tcp_flags;
                used = recv_actor(sk, skb);
-               consume_skb(skb);
                if (used < 0) {
                        if (!copied)
                                copied = used;
@@ -1785,14 +1786,6 @@ int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
                        break;
                }
        }
-       WRITE_ONCE(tp->copied_seq, seq);
-
-       tcp_rcv_space_adjust(sk);
-
-       /* Clean up data we have read: This will do ACK frames. */
-       if (copied > 0)
-               __tcp_cleanup_rbuf(sk, copied);
-
        return copied;
 }
 EXPORT_SYMBOL(tcp_read_skb);
@@ -2867,7 +2860,7 @@ void __tcp_close(struct sock *sk, long timeout)
        int data_was_unread = 0;
        int state;
 
-       sk->sk_shutdown = SHUTDOWN_MASK;
+       WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
 
        if (sk->sk_state == TCP_LISTEN) {
                tcp_set_state(sk, TCP_CLOSE);
@@ -3088,6 +3081,12 @@ int tcp_disconnect(struct sock *sk, int flags)
        int old_state = sk->sk_state;
        u32 seq;
 
+       /* Deny disconnect if other threads are blocked in sk_wait_event()
+        * or inet_wait_for_connect().
+        */
+       if (sk->sk_wait_pending)
+               return -EBUSY;
+
        if (old_state != TCP_CLOSE)
                tcp_set_state(sk, TCP_CLOSE);
 
@@ -3119,7 +3118,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 
        inet_bhash2_reset_saddr(sk);
 
-       sk->sk_shutdown = 0;
+       WRITE_ONCE(sk->sk_shutdown, 0);
        sock_reset_flag(sk, SOCK_DONE);
        tp->srtt_us = 0;
        tp->mdev_us = jiffies_to_usecs(TCP_TIMEOUT_INIT);
@@ -4079,7 +4078,8 @@ int do_tcp_getsockopt(struct sock *sk, int level,
        switch (optname) {
        case TCP_MAXSEG:
                val = tp->mss_cache;
-               if (!val && ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)))
+               if (tp->rx_opt.user_mss &&
+                   ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)))
                        val = tp->rx_opt.user_mss;
                if (tp->repair)
                        val = tp->rx_opt.mss_clamp;
@@ -4649,7 +4649,7 @@ void tcp_done(struct sock *sk)
        if (req)
                reqsk_fastopen_remove(sk, req, false);
 
-       sk->sk_shutdown = SHUTDOWN_MASK;
+       WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
 
        if (!sock_flag(sk, SOCK_DEAD))
                sk->sk_state_change(sk);
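
The sk_shutdown stores and loads above are converted to WRITE_ONCE()/READ_ONCE() pairs because paths such as tcp_poll() read the field without holding the socket lock. A minimal userspace sketch of what those annotations provide, using simplified stand-in macros rather than the kernel's: each access becomes a single volatile load or store that the compiler may not tear, duplicate or hoist; they add no memory barriers.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Simplified stand-ins for the kernel macros: exactly one volatile access,
 * so the compiler cannot tear, duplicate or cache the load/store. */
#define READ_ONCE(x)            (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)      \
        do { *(volatile __typeof__(x) *)&(x) = (val); } while (0)

static unsigned int shutdown_mask;      /* stands in for sk->sk_shutdown */

static void *writer(void *arg)
{
        (void)arg;
        usleep(1000);
        WRITE_ONCE(shutdown_mask, 3);   /* think SHUTDOWN_MASK */
        return NULL;
}

int main(void)
{
        pthread_t t;

        pthread_create(&t, NULL, writer, NULL);

        /* Lockless reader: without READ_ONCE() the compiler may hoist the
         * load out of the loop and spin on a stale value forever. */
        while (READ_ONCE(shutdown_mask) == 0)
                ;

        pthread_join(t, NULL);
        printf("observed shutdown mask %u\n", READ_ONCE(shutdown_mask));
        return 0;
}

Build with cc -pthread. The annotations document the intentional data race for tools such as KCSAN but impose no ordering; where ordering matters the kernel still relies on locks or explicit barriers.
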
index ebf9175..5f93918 100644 (file)
 #include <net/inet_common.h>
 #include <net/tls.h>
 
+void tcp_eat_skb(struct sock *sk, struct sk_buff *skb)
+{
+       struct tcp_sock *tcp;
+       int copied;
+
+       if (!skb || !skb->len || !sk_is_tcp(sk))
+               return;
+
+       if (skb_bpf_strparser(skb))
+               return;
+
+       tcp = tcp_sk(sk);
+       copied = tcp->copied_seq + skb->len;
+       WRITE_ONCE(tcp->copied_seq, copied);
+       tcp_rcv_space_adjust(sk);
+       __tcp_cleanup_rbuf(sk, skb->len);
+}
+
 static int bpf_tcp_ingress(struct sock *sk, struct sk_psock *psock,
                           struct sk_msg *msg, u32 apply_bytes, int flags)
 {
@@ -168,20 +186,40 @@ static int tcp_msg_wait_data(struct sock *sk, struct sk_psock *psock,
        sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
        ret = sk_wait_event(sk, &timeo,
                            !list_empty(&psock->ingress_msg) ||
-                           !skb_queue_empty(&sk->sk_receive_queue), &wait);
+                           !skb_queue_empty_lockless(&sk->sk_receive_queue), &wait);
        sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
        remove_wait_queue(sk_sleep(sk), &wait);
        return ret;
 }
 
+static bool is_next_msg_fin(struct sk_psock *psock)
+{
+       struct scatterlist *sge;
+       struct sk_msg *msg_rx;
+       int i;
+
+       msg_rx = sk_psock_peek_msg(psock);
+       i = msg_rx->sg.start;
+       sge = sk_msg_elem(msg_rx, i);
+       if (!sge->length) {
+               struct sk_buff *skb = msg_rx->skb;
+
+               if (skb && TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
+                       return true;
+       }
+       return false;
+}
+
 static int tcp_bpf_recvmsg_parser(struct sock *sk,
                                  struct msghdr *msg,
                                  size_t len,
                                  int flags,
                                  int *addr_len)
 {
+       struct tcp_sock *tcp = tcp_sk(sk);
+       u32 seq = tcp->copied_seq;
        struct sk_psock *psock;
-       int copied;
+       int copied = 0;
 
        if (unlikely(flags & MSG_ERRQUEUE))
                return inet_recv_error(sk, msg, len, addr_len);
@@ -194,8 +232,43 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk,
                return tcp_recvmsg(sk, msg, len, flags, addr_len);
 
        lock_sock(sk);
+
+       /* We may have received data on the sk_receive_queue pre-accept and
+        * then we can not use read_skb in this context because we haven't
+        * assigned a sk_socket yet so have no link to the ops. The work-around
+        * is to check the sk_receive_queue and in these cases read skbs off
+        * queue again. The read_skb hook is not running at this point because
+        * of lock_sock so we avoid having multiple runners in read_skb.
+        */
+       if (unlikely(!skb_queue_empty(&sk->sk_receive_queue))) {
+               tcp_data_ready(sk);
+               /* This handles the ENOMEM errors if we both receive data
+                * pre accept and are already under memory pressure. At least
+                * let user know to retry.
+                */
+               if (unlikely(!skb_queue_empty(&sk->sk_receive_queue))) {
+                       copied = -EAGAIN;
+                       goto out;
+               }
+       }
+
 msg_bytes_ready:
        copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
+       /* The typical case for EFAULT is the socket was gracefully
+        * shutdown with a FIN pkt. So check here the other case is
+        * some error on copy_page_to_iter which would be unexpected.
+        * On fin return correct return code to zero.
+        */
+       if (copied == -EFAULT) {
+               bool is_fin = is_next_msg_fin(psock);
+
+               if (is_fin) {
+                       copied = 0;
+                       seq++;
+                       goto out;
+               }
+       }
+       seq += copied;
        if (!copied) {
                long timeo;
                int data;
@@ -233,6 +306,10 @@ msg_bytes_ready:
                copied = -EAGAIN;
        }
 out:
+       WRITE_ONCE(tcp->copied_seq, seq);
+       tcp_rcv_space_adjust(sk);
+       if (copied > 0)
+               __tcp_cleanup_rbuf(sk, copied);
        release_sock(sk);
        sk_psock_put(sk, psock);
        return copied;
index a057330..bf8b222 100644 (file)
@@ -4362,7 +4362,7 @@ void tcp_fin(struct sock *sk)
 
        inet_csk_schedule_ack(sk);
 
-       sk->sk_shutdown |= RCV_SHUTDOWN;
+       WRITE_ONCE(sk->sk_shutdown, sk->sk_shutdown | RCV_SHUTDOWN);
        sock_set_flag(sk, SOCK_DONE);
 
        switch (sk->sk_state) {
@@ -4530,7 +4530,7 @@ static void tcp_sack_maybe_coalesce(struct tcp_sock *tp)
        }
 }
 
-static void tcp_sack_compress_send_ack(struct sock *sk)
+void tcp_sack_compress_send_ack(struct sock *sk)
 {
        struct tcp_sock *tp = tcp_sk(sk);
 
@@ -6599,7 +6599,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
                        break;
 
                tcp_set_state(sk, TCP_FIN_WAIT2);
-               sk->sk_shutdown |= SEND_SHUTDOWN;
+               WRITE_ONCE(sk->sk_shutdown, sk->sk_shutdown | SEND_SHUTDOWN);
 
                sk_dst_confirm(sk);
 
index 39bda2b..06d2573 100644 (file)
@@ -829,6 +829,9 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
                                   inet_twsk(sk)->tw_priority : sk->sk_priority;
                transmit_time = tcp_transmit_time(sk);
                xfrm_sk_clone_policy(ctl_sk, sk);
+       } else {
+               ctl_sk->sk_mark = 0;
+               ctl_sk->sk_priority = 0;
        }
        ip_send_unicast_reply(ctl_sk,
                              skb, &TCP_SKB_CB(skb)->header.h4.opt,
@@ -836,7 +839,6 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
                              &arg, arg.iov[0].iov_len,
                              transmit_time);
 
-       ctl_sk->sk_mark = 0;
        xfrm_sk_free_policy(ctl_sk);
        sock_net_set(ctl_sk, &init_net);
        __TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
@@ -935,7 +937,6 @@ static void tcp_v4_send_ack(const struct sock *sk,
                              &arg, arg.iov[0].iov_len,
                              transmit_time);
 
-       ctl_sk->sk_mark = 0;
        sock_net_set(ctl_sk, &init_net);
        __TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
        local_bh_enable();
index 45dda78..4851211 100644 (file)
@@ -60,12 +60,12 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
        struct tcphdr *th;
        unsigned int thlen;
        unsigned int seq;
-       __be32 delta;
        unsigned int oldlen;
        unsigned int mss;
        struct sk_buff *gso_skb = skb;
        __sum16 newcheck;
        bool ooo_okay, copy_destructor;
+       __wsum delta;
 
        th = tcp_hdr(skb);
        thlen = th->doff * 4;
@@ -75,7 +75,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
        if (!pskb_may_pull(skb, thlen))
                goto out;
 
-       oldlen = (u16)~skb->len;
+       oldlen = ~skb->len;
        __skb_pull(skb, thlen);
 
        mss = skb_shinfo(skb)->gso_size;
@@ -110,7 +110,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
        if (skb_is_gso(segs))
                mss *= skb_shinfo(segs)->gso_segs;
 
-       delta = htonl(oldlen + (thlen + mss));
+       delta = (__force __wsum)htonl(oldlen + thlen + mss);
 
        skb = segs;
        th = tcp_hdr(skb);
@@ -119,8 +119,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
        if (unlikely(skb_shinfo(gso_skb)->tx_flags & SKBTX_SW_TSTAMP))
                tcp_gso_tstamp(segs, skb_shinfo(gso_skb)->tskey, seq, mss);
 
-       newcheck = ~csum_fold((__force __wsum)((__force u32)th->check +
-                                              (__force u32)delta));
+       newcheck = ~csum_fold(csum_add(csum_unfold(th->check), delta));
 
        while (skb->next) {
                th->fin = th->psh = 0;
@@ -165,11 +164,11 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
                        WARN_ON_ONCE(refcount_sub_and_test(-delta, &skb->sk->sk_wmem_alloc));
        }
 
-       delta = htonl(oldlen + (skb_tail_pointer(skb) -
-                               skb_transport_header(skb)) +
-                     skb->data_len);
-       th->check = ~csum_fold((__force __wsum)((__force u32)th->check +
-                               (__force u32)delta));
+       delta = (__force __wsum)htonl(oldlen +
+                                     (skb_tail_pointer(skb) -
+                                      skb_transport_header(skb)) +
+                                     skb->data_len);
+       th->check = ~csum_fold(csum_add(csum_unfold(th->check), delta));
        if (skb->ip_summed == CHECKSUM_PARTIAL)
                gso_reset_checksum(skb, ~th->check);
        else
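
The tcp_gso_segment() changes above keep the length delta as an unfolded one's-complement sum (csum_add()/csum_unfold()) instead of truncating it through a u16, so the per-segment checksum fixup stays correct when lengths exceed 16 bits. The underlying arithmetic is the RFC 1624 incremental update; here is a standalone sketch (the helper names only echo the kernel's, the implementations are illustrative):

#include <stdint.h>
#include <stdio.h>

/* Standard Internet checksum over a buffer of 16-bit words. */
static uint16_t csum16(const uint16_t *p, size_t n)
{
        uint32_t sum = 0;

        while (n--)
                sum += *p++;
        while (sum >> 16)
                sum = (sum & 0xffff) + (sum >> 16);     /* end-around carry */
        return (uint16_t)~sum;
}

/* RFC 1624 eqn. 3: update a checksum when one covered 16-bit word changes,
 * without re-summing the rest of the data. */
static uint16_t csum_update(uint16_t check, uint16_t old_word, uint16_t new_word)
{
        uint32_t sum = (uint16_t)~check;

        sum += (uint16_t)~old_word;
        sum += new_word;
        while (sum >> 16)
                sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)~sum;
}

int main(void)
{
        uint16_t words[] = { 0x4500, 0x0054, 0x1c46, 0x4000, 0x4006 };
        uint16_t check = csum16(words, 5);

        /* Pretend the length word changes, as it does per GSO segment. */
        uint16_t incremental = csum_update(check, words[1], 0x0218);

        words[1] = 0x0218;
        printf("incremental %#06x, recomputed %#06x\n",
               incremental, csum16(words, 5));
        return 0;
}

Both printed values match; the point of the hunk is that the delta fed into this update must itself be carried as a full-width sum rather than squeezed through 16 bits first.
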
index b839c2f..39eb947 100644 (file)
@@ -290,9 +290,19 @@ static int tcp_write_timeout(struct sock *sk)
 void tcp_delack_timer_handler(struct sock *sk)
 {
        struct inet_connection_sock *icsk = inet_csk(sk);
+       struct tcp_sock *tp = tcp_sk(sk);
 
-       if (((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)) ||
-           !(icsk->icsk_ack.pending & ICSK_ACK_TIMER))
+       if ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN))
+               return;
+
+       /* Handling the sack compression case */
+       if (tp->compressed_ack) {
+               tcp_mstamp_refresh(tp);
+               tcp_sack_compress_send_ack(sk);
+               return;
+       }
+
+       if (!(icsk->icsk_ack.pending & ICSK_ACK_TIMER))
                return;
 
        if (time_after(icsk->icsk_ack.timeout, jiffies)) {
@@ -312,7 +322,7 @@ void tcp_delack_timer_handler(struct sock *sk)
                        inet_csk_exit_pingpong_mode(sk);
                        icsk->icsk_ack.ato      = TCP_ATO_MIN;
                }
-               tcp_mstamp_refresh(tcp_sk(sk));
+               tcp_mstamp_refresh(tp);
                tcp_send_ack(sk);
                __NET_INC_STATS(sock_net(sk), LINUX_MIB_DELAYEDACKS);
        }
index aa32afd..9482def 100644 (file)
@@ -1818,7 +1818,7 @@ EXPORT_SYMBOL(__skb_recv_udp);
 int udp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 {
        struct sk_buff *skb;
-       int err, copied;
+       int err;
 
 try_again:
        skb = skb_recv_udp(sk, MSG_DONTWAIT, &err);
@@ -1837,10 +1837,7 @@ try_again:
        }
 
        WARN_ON_ONCE(!skb_set_owner_sk_safe(skb, sk));
-       copied = recv_actor(sk, skb);
-       kfree_skb(skb);
-
-       return copied;
+       return recv_actor(sk, skb);
 }
 EXPORT_SYMBOL(udp_read_skb);
 
index e0c9cc3..143f93a 100644 (file)
@@ -22,6 +22,8 @@ static int udplite_sk_init(struct sock *sk)
 {
        udp_init_sock(sk);
        udp_sk(sk)->pcflag = UDPLITE_BIT;
+       pr_warn_once("UDP-Lite is deprecated and scheduled to be removed in 2025, "
+                    "please contact the netdev mailing list\n");
        return 0;
 }
 
@@ -64,6 +66,8 @@ struct proto  udplite_prot = {
        .per_cpu_fw_alloc  = &udp_memory_per_cpu_fw_alloc,
 
        .sysctl_mem        = sysctl_udp_mem,
+       .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_udp_wmem_min),
+       .sysctl_rmem_offset = offsetof(struct net, ipv4.sysctl_udp_rmem_min),
        .obj_size          = sizeof(struct udp_sock),
        .h.udp_table       = &udplite_table,
 };
index ad2afee..eac206a 100644 (file)
@@ -164,6 +164,7 @@ drop:
        kfree_skb(skb);
        return 0;
 }
+EXPORT_SYMBOL(xfrm4_udp_encap_rcv);
 
 int xfrm4_rcv(struct sk_buff *skb)
 {
index 75c0299..7723402 100644 (file)
@@ -374,6 +374,9 @@ static int esp6_xmit(struct xfrm_state *x, struct sk_buff *skb,  netdev_features
 
        secpath_reset(skb);
 
+       if (skb_needs_linearize(skb, skb->dev->features) &&
+           __skb_linearize(skb))
+               return -ENOMEM;
        return 0;
 }
 
index a8d961d..5fa0e37 100644 (file)
@@ -569,24 +569,6 @@ looped_back:
                return -1;
        }
 
-       if (skb_cloned(skb)) {
-               if (pskb_expand_head(skb, IPV6_RPL_SRH_WORST_SWAP_SIZE, 0,
-                                    GFP_ATOMIC)) {
-                       __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
-                                       IPSTATS_MIB_OUTDISCARDS);
-                       kfree_skb(skb);
-                       return -1;
-               }
-       } else {
-               err = skb_cow_head(skb, IPV6_RPL_SRH_WORST_SWAP_SIZE);
-               if (unlikely(err)) {
-                       kfree_skb(skb);
-                       return -1;
-               }
-       }
-
-       hdr = (struct ipv6_rpl_sr_hdr *)skb_transport_header(skb);
-
        if (!pskb_may_pull(skb, ipv6_rpl_srh_size(n, hdr->cmpri,
                                                  hdr->cmpre))) {
                kfree_skb(skb);
@@ -630,6 +612,17 @@ looped_back:
        skb_pull(skb, ((hdr->hdrlen + 1) << 3));
        skb_postpull_rcsum(skb, oldhdr,
                           sizeof(struct ipv6hdr) + ((hdr->hdrlen + 1) << 3));
+       if (unlikely(!hdr->segments_left)) {
+               if (pskb_expand_head(skb, sizeof(struct ipv6hdr) + ((chdr->hdrlen + 1) << 3), 0,
+                                    GFP_ATOMIC)) {
+                       __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_OUTDISCARDS);
+                       kfree_skb(skb);
+                       kfree(buf);
+                       return -1;
+               }
+
+               oldhdr = ipv6_hdr(skb);
+       }
        skb_push(skb, ((chdr->hdrlen + 1) << 3) + sizeof(struct ipv6hdr));
        skb_reset_network_header(skb);
        skb_mac_header_rebuild(skb);
index da46c42..49e31e4 100644 (file)
@@ -143,6 +143,8 @@ int ipv6_find_tlv(const struct sk_buff *skb, int offset, int type)
                        optlen = 1;
                        break;
                default:
+                       if (len < 2)
+                               goto bad;
                        optlen = nh[offset + 1] + 2;
                        if (optlen > len)
                                goto bad;
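
The ipv6_find_tlv() fix above refuses to read an option length byte that is not there: with fewer than two bytes remaining only Pad1 is valid, and the old code indexed past the end of the buffer. A standalone sketch of the same TLV walk with that bound enforced (the structure loosely follows the kernel function; it is not the kernel code):

#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* Return the offset of the first TLV of 'type', or -1 if not found or the
 * option area is malformed. Pad1 (type 0) is a single byte; every other
 * option is type, length, then 'length' bytes of value. */
static int find_tlv(const uint8_t *buf, size_t len, uint8_t type)
{
        size_t off = 0;

        while (off < len) {
                size_t remaining = len - off;
                size_t optlen;

                if (buf[off] == type)
                        return (int)off;

                switch (buf[off]) {
                case 0:                         /* Pad1: no length byte */
                        optlen = 1;
                        break;
                default:
                        if (remaining < 2)      /* the fix: need a length byte */
                                return -1;
                        optlen = (size_t)buf[off + 1] + 2;
                        if (optlen > remaining)
                                return -1;
                        break;
                }
                off += optlen;
        }
        return -1;
}

int main(void)
{
        /* PadN covering two bytes, then option 0x05, length 1, value 0xaa */
        const uint8_t opts[] = { 0x01, 0x02, 0x00, 0x00, 0x05, 0x01, 0xaa };

        printf("type 0x05 at offset %d\n", find_tlv(opts, sizeof(opts), 0x05));
        /* A buffer truncated right after a type byte is rejected instead of
         * being read past its end. */
        printf("truncated lookup: %d\n", find_tlv(opts, 5, 0x07));
        return 0;
}
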
index 2438da5..bac768d 100644 (file)
@@ -2491,7 +2491,7 @@ static int ipv6_route_native_seq_show(struct seq_file *seq, void *v)
        const struct net_device *dev;
 
        if (rt->nh)
-               fib6_nh = nexthop_fib6_nh_bh(rt->nh);
+               fib6_nh = nexthop_fib6_nh(rt->nh);
 
        seq_printf(seq, "%pi6 %02x ", &rt->fib6_dst.addr, rt->fib6_dst.plen);
 
@@ -2556,14 +2556,14 @@ static struct fib6_table *ipv6_route_seq_next_table(struct fib6_table *tbl,
 
        if (tbl) {
                h = (tbl->tb6_id & (FIB6_TABLE_HASHSZ - 1)) + 1;
-               node = rcu_dereference_bh(hlist_next_rcu(&tbl->tb6_hlist));
+               node = rcu_dereference(hlist_next_rcu(&tbl->tb6_hlist));
        } else {
                h = 0;
                node = NULL;
        }
 
        while (!node && h < FIB6_TABLE_HASHSZ) {
-               node = rcu_dereference_bh(
+               node = rcu_dereference(
                        hlist_first_rcu(&net->ipv6.fib_table_hash[h++]));
        }
        return hlist_entry_safe(node, struct fib6_table, tb6_hlist);
@@ -2593,7 +2593,7 @@ static void *ipv6_route_seq_next(struct seq_file *seq, void *v, loff_t *pos)
        if (!v)
                goto iter_table;
 
-       n = rcu_dereference_bh(((struct fib6_info *)v)->fib6_next);
+       n = rcu_dereference(((struct fib6_info *)v)->fib6_next);
        if (n)
                return n;
 
@@ -2619,12 +2619,12 @@ iter_table:
 }
 
 static void *ipv6_route_seq_start(struct seq_file *seq, loff_t *pos)
-       __acquires(RCU_BH)
+       __acquires(RCU)
 {
        struct net *net = seq_file_net(seq);
        struct ipv6_route_iter *iter = seq->private;
 
-       rcu_read_lock_bh();
+       rcu_read_lock();
        iter->tbl = ipv6_route_seq_next_table(NULL, net);
        iter->skip = *pos;
 
@@ -2645,7 +2645,7 @@ static bool ipv6_route_iter_active(struct ipv6_route_iter *iter)
 }
 
 static void ipv6_route_native_seq_stop(struct seq_file *seq, void *v)
-       __releases(RCU_BH)
+       __releases(RCU)
 {
        struct net *net = seq_file_net(seq);
        struct ipv6_route_iter *iter = seq->private;
@@ -2653,7 +2653,7 @@ static void ipv6_route_native_seq_stop(struct seq_file *seq, void *v)
        if (ipv6_route_iter_active(iter))
                fib6_walker_unlink(net, &iter->w);
 
-       rcu_read_unlock_bh();
+       rcu_read_unlock();
 }
 
 #if IS_BUILTIN(CONFIG_IPV6) && defined(CONFIG_BPF_SYSCALL)
index a4ecfc9..da80974 100644 (file)
@@ -1015,12 +1015,14 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
                                            ntohl(tun_id),
                                            ntohl(md->u.index), truncate,
                                            false);
+                       proto = htons(ETH_P_ERSPAN);
                } else if (md->version == 2) {
                        erspan_build_header_v2(skb,
                                               ntohl(tun_id),
                                               md->u.md2.dir,
                                               get_hwid(&md->u.md2),
                                               truncate, false);
+                       proto = htons(ETH_P_ERSPAN2);
                } else {
                        goto tx_err;
                }
@@ -1043,24 +1045,25 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
                        break;
                }
 
-               if (t->parms.erspan_ver == 1)
+               if (t->parms.erspan_ver == 1) {
                        erspan_build_header(skb, ntohl(t->parms.o_key),
                                            t->parms.index,
                                            truncate, false);
-               else if (t->parms.erspan_ver == 2)
+                       proto = htons(ETH_P_ERSPAN);
+               } else if (t->parms.erspan_ver == 2) {
                        erspan_build_header_v2(skb, ntohl(t->parms.o_key),
                                               t->parms.dir,
                                               t->parms.hwid,
                                               truncate, false);
-               else
+                       proto = htons(ETH_P_ERSPAN2);
+               } else {
                        goto tx_err;
+               }
 
                fl6.daddr = t->parms.raddr;
        }
 
        /* Push GRE header. */
-       proto = (t->parms.erspan_ver == 1) ? htons(ETH_P_ERSPAN)
-                                          : htons(ETH_P_ERSPAN2);
        gre_build_header(skb, 8, TUNNEL_SEQ, proto, 0, htonl(atomic_fetch_inc(&t->o_seqno)));
 
        /* TooBig packet may have updated dst->dev's mtu */
index c4835db..f804c11 100644 (file)
@@ -114,7 +114,8 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
        addr_type = ipv6_addr_type(daddr);
        if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
            (addr_type & IPV6_ADDR_MAPPED) ||
-           (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if))
+           (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if &&
+            l3mdev_master_ifindex_by_index(sock_net(sk), oif) != sk->sk_bound_dev_if))
                return -EINVAL;
 
        ipcm6_init_sk(&ipc6, np);
index 7d0adb6..44ee7a2 100644 (file)
@@ -793,7 +793,8 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
                if (!proto)
                        proto = inet->inet_num;
-               else if (proto != inet->inet_num)
+               else if (proto != inet->inet_num &&
+                        inet->inet_num != IPPROTO_RAW)
                        return -EINVAL;
 
                if (proto > 255)
index e3aec46..392aaa3 100644 (file)
@@ -6412,9 +6412,9 @@ static struct ctl_table ipv6_route_table_template[] = {
        {
                .procname       =       "skip_notify_on_dev_down",
                .data           =       &init_net.ipv6.sysctl.skip_notify_on_dev_down,
-               .maxlen         =       sizeof(int),
+               .maxlen         =       sizeof(u8),
                .mode           =       0644,
-               .proc_handler   =       proc_dointvec_minmax,
+               .proc_handler   =       proc_dou8vec_minmax,
                .extra1         =       SYSCTL_ZERO,
                .extra2         =       SYSCTL_ONE,
        },
index 67eaf3c..8e010d0 100644 (file)
@@ -8,6 +8,8 @@
  *  Changes:
  *  Fixes:
  */
+#define pr_fmt(fmt) "UDPLite6: " fmt
+
 #include <linux/export.h>
 #include <linux/proc_fs.h>
 #include "udp_impl.h"
@@ -16,6 +18,8 @@ static int udplitev6_sk_init(struct sock *sk)
 {
        udpv6_init_sock(sk);
        udp_sk(sk)->pcflag = UDPLITE_BIT;
+       pr_warn_once("UDP-Lite is deprecated and scheduled to be removed in 2025, "
+                    "please contact the netdev mailing list\n");
        return 0;
 }
 
@@ -60,6 +64,8 @@ struct proto udplitev6_prot = {
        .per_cpu_fw_alloc  = &udp_memory_per_cpu_fw_alloc,
 
        .sysctl_mem        = sysctl_udp_mem,
+       .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_udp_wmem_min),
+       .sysctl_rmem_offset = offsetof(struct net, ipv4.sysctl_udp_rmem_min),
        .obj_size          = sizeof(struct udp6_sock),
        .h.udp_table       = &udplite_table,
 };
index 04cbeef..4907ab2 100644 (file)
@@ -86,6 +86,9 @@ int xfrm6_udp_encap_rcv(struct sock *sk, struct sk_buff *skb)
        __be32 *udpdata32;
        __u16 encap_type = up->encap_type;
 
+       if (skb->protocol == htons(ETH_P_IP))
+               return xfrm4_udp_encap_rcv(sk, skb);
+
        /* if this is not encapsulated socket, then just return now */
        if (!encap_type)
                return 1;
index a815f5a..31ab12f 100644 (file)
@@ -1940,7 +1940,8 @@ static u32 gen_reqid(struct net *net)
 }
 
 static int
-parse_ipsecrequest(struct xfrm_policy *xp, struct sadb_x_ipsecrequest *rq)
+parse_ipsecrequest(struct xfrm_policy *xp, struct sadb_x_policy *pol,
+                  struct sadb_x_ipsecrequest *rq)
 {
        struct net *net = xp_net(xp);
        struct xfrm_tmpl *t = xp->xfrm_vec + xp->xfrm_nr;
@@ -1958,9 +1959,12 @@ parse_ipsecrequest(struct xfrm_policy *xp, struct sadb_x_ipsecrequest *rq)
        if ((mode = pfkey_mode_to_xfrm(rq->sadb_x_ipsecrequest_mode)) < 0)
                return -EINVAL;
        t->mode = mode;
-       if (rq->sadb_x_ipsecrequest_level == IPSEC_LEVEL_USE)
+       if (rq->sadb_x_ipsecrequest_level == IPSEC_LEVEL_USE) {
+               if ((mode == XFRM_MODE_TUNNEL || mode == XFRM_MODE_BEET) &&
+                   pol->sadb_x_policy_dir == IPSEC_DIR_OUTBOUND)
+                       return -EINVAL;
                t->optional = 1;
-       else if (rq->sadb_x_ipsecrequest_level == IPSEC_LEVEL_UNIQUE) {
+       } else if (rq->sadb_x_ipsecrequest_level == IPSEC_LEVEL_UNIQUE) {
                t->reqid = rq->sadb_x_ipsecrequest_reqid;
                if (t->reqid > IPSEC_MANUAL_REQID_MAX)
                        t->reqid = 0;
@@ -2002,7 +2006,7 @@ parse_ipsecrequests(struct xfrm_policy *xp, struct sadb_x_policy *pol)
                    rq->sadb_x_ipsecrequest_len < sizeof(*rq))
                        return -EINVAL;
 
-               if ((err = parse_ipsecrequest(xp, rq)) < 0)
+               if ((err = parse_ipsecrequest(xp, pol, rq)) < 0)
                        return err;
                len -= rq->sadb_x_ipsecrequest_len;
                rq = (void*)((u8*)rq + rq->sadb_x_ipsecrequest_len);
index da7fe94..9ffbc66 100644 (file)
@@ -583,7 +583,8 @@ static int llc_ui_wait_for_disc(struct sock *sk, long timeout)
 
        add_wait_queue(sk_sleep(sk), &wait);
        while (1) {
-               if (sk_wait_event(sk, &timeout, sk->sk_state == TCP_CLOSE, &wait))
+               if (sk_wait_event(sk, &timeout,
+                                 READ_ONCE(sk->sk_state) == TCP_CLOSE, &wait))
                        break;
                rc = -ERESTARTSYS;
                if (signal_pending(current))
@@ -603,7 +604,8 @@ static bool llc_ui_wait_for_conn(struct sock *sk, long timeout)
 
        add_wait_queue(sk_sleep(sk), &wait);
        while (1) {
-               if (sk_wait_event(sk, &timeout, sk->sk_state != TCP_SYN_SENT, &wait))
+               if (sk_wait_event(sk, &timeout,
+                                 READ_ONCE(sk->sk_state) != TCP_SYN_SENT, &wait))
                        break;
                if (signal_pending(current) || !timeout)
                        break;
@@ -622,7 +624,7 @@ static int llc_ui_wait_for_busy_core(struct sock *sk, long timeout)
        while (1) {
                rc = 0;
                if (sk_wait_event(sk, &timeout,
-                                 (sk->sk_shutdown & RCV_SHUTDOWN) ||
+                                 (READ_ONCE(sk->sk_shutdown) & RCV_SHUTDOWN) ||
                                  (!llc_data_accept_state(llc->state) &&
                                   !llc->remote_busy_flag &&
                                   !llc->p_flag), &wait))
index 7317e4a..f2d08db 100644 (file)
@@ -1578,9 +1578,10 @@ static int ieee80211_stop_ap(struct wiphy *wiphy, struct net_device *dev,
                sdata_dereference(link->u.ap.unsol_bcast_probe_resp,
                                  sdata);
 
-       /* abort any running channel switch */
+       /* abort any running channel switch or color change */
        mutex_lock(&local->mtx);
        link_conf->csa_active = false;
+       link_conf->color_change_active = false;
        if (link->csa_block_tx) {
                ieee80211_wake_vif_queues(local, sdata,
                                          IEEE80211_QUEUE_STOP_REASON_CSA);
@@ -3589,7 +3590,7 @@ void ieee80211_channel_switch_disconnect(struct ieee80211_vif *vif, bool block_t
 EXPORT_SYMBOL(ieee80211_channel_switch_disconnect);
 
 static int ieee80211_set_after_csa_beacon(struct ieee80211_sub_if_data *sdata,
-                                         u32 *changed)
+                                         u64 *changed)
 {
        int err;
 
@@ -3632,7 +3633,7 @@ static int ieee80211_set_after_csa_beacon(struct ieee80211_sub_if_data *sdata,
 static int __ieee80211_csa_finalize(struct ieee80211_sub_if_data *sdata)
 {
        struct ieee80211_local *local = sdata->local;
-       u32 changed = 0;
+       u64 changed = 0;
        int err;
 
        sdata_assert_lock(sdata);
@@ -4864,11 +4865,16 @@ static int ieee80211_add_intf_link(struct wiphy *wiphy,
                                   unsigned int link_id)
 {
        struct ieee80211_sub_if_data *sdata = IEEE80211_WDEV_TO_SUB_IF(wdev);
+       int res;
 
        if (wdev->use_4addr)
                return -EOPNOTSUPP;
 
-       return ieee80211_vif_set_links(sdata, wdev->valid_links);
+       mutex_lock(&sdata->local->mtx);
+       res = ieee80211_vif_set_links(sdata, wdev->valid_links);
+       mutex_unlock(&sdata->local->mtx);
+
+       return res;
 }
 
 static void ieee80211_del_intf_link(struct wiphy *wiphy,
@@ -4877,7 +4883,9 @@ static void ieee80211_del_intf_link(struct wiphy *wiphy,
 {
        struct ieee80211_sub_if_data *sdata = IEEE80211_WDEV_TO_SUB_IF(wdev);
 
+       mutex_lock(&sdata->local->mtx);
        ieee80211_vif_set_links(sdata, wdev->valid_links);
+       mutex_unlock(&sdata->local->mtx);
 }
 
 static int sta_add_link_station(struct ieee80211_local *local,
index dbc34fb..77c90ed 100644 (file)
@@ -258,7 +258,8 @@ ieee80211_get_max_required_bw(struct ieee80211_sub_if_data *sdata,
 
 static enum nl80211_chan_width
 ieee80211_get_chanctx_vif_max_required_bw(struct ieee80211_sub_if_data *sdata,
-                                         struct ieee80211_chanctx_conf *conf)
+                                         struct ieee80211_chanctx *ctx,
+                                         struct ieee80211_link_data *rsvd_for)
 {
        enum nl80211_chan_width max_bw = NL80211_CHAN_WIDTH_20_NOHT;
        struct ieee80211_vif *vif = &sdata->vif;
@@ -267,13 +268,14 @@ ieee80211_get_chanctx_vif_max_required_bw(struct ieee80211_sub_if_data *sdata,
        rcu_read_lock();
        for (link_id = 0; link_id < ARRAY_SIZE(sdata->link); link_id++) {
                enum nl80211_chan_width width = NL80211_CHAN_WIDTH_20_NOHT;
-               struct ieee80211_bss_conf *link_conf =
-                       rcu_dereference(sdata->vif.link_conf[link_id]);
+               struct ieee80211_link_data *link =
+                       rcu_dereference(sdata->link[link_id]);
 
-               if (!link_conf)
+               if (!link)
                        continue;
 
-               if (rcu_access_pointer(link_conf->chanctx_conf) != conf)
+               if (link != rsvd_for &&
+                   rcu_access_pointer(link->conf->chanctx_conf) != &ctx->conf)
                        continue;
 
                switch (vif->type) {
@@ -287,7 +289,7 @@ ieee80211_get_chanctx_vif_max_required_bw(struct ieee80211_sub_if_data *sdata,
                         * point, so take the width from the chandef, but
                         * account also for TDLS peers
                         */
-                       width = max(link_conf->chandef.width,
+                       width = max(link->conf->chandef.width,
                                    ieee80211_get_max_required_bw(sdata, link_id));
                        break;
                case NL80211_IFTYPE_P2P_DEVICE:
@@ -296,7 +298,7 @@ ieee80211_get_chanctx_vif_max_required_bw(struct ieee80211_sub_if_data *sdata,
                case NL80211_IFTYPE_ADHOC:
                case NL80211_IFTYPE_MESH_POINT:
                case NL80211_IFTYPE_OCB:
-                       width = link_conf->chandef.width;
+                       width = link->conf->chandef.width;
                        break;
                case NL80211_IFTYPE_WDS:
                case NL80211_IFTYPE_UNSPECIFIED:
@@ -316,7 +318,8 @@ ieee80211_get_chanctx_vif_max_required_bw(struct ieee80211_sub_if_data *sdata,
 
 static enum nl80211_chan_width
 ieee80211_get_chanctx_max_required_bw(struct ieee80211_local *local,
-                                     struct ieee80211_chanctx_conf *conf)
+                                     struct ieee80211_chanctx *ctx,
+                                     struct ieee80211_link_data *rsvd_for)
 {
        struct ieee80211_sub_if_data *sdata;
        enum nl80211_chan_width max_bw = NL80211_CHAN_WIDTH_20_NOHT;
@@ -328,7 +331,8 @@ ieee80211_get_chanctx_max_required_bw(struct ieee80211_local *local,
                if (!ieee80211_sdata_running(sdata))
                        continue;
 
-               width = ieee80211_get_chanctx_vif_max_required_bw(sdata, conf);
+               width = ieee80211_get_chanctx_vif_max_required_bw(sdata, ctx,
+                                                                 rsvd_for);
 
                max_bw = max(max_bw, width);
        }
@@ -336,8 +340,8 @@ ieee80211_get_chanctx_max_required_bw(struct ieee80211_local *local,
        /* use the configured bandwidth in case of monitor interface */
        sdata = rcu_dereference(local->monitor_sdata);
        if (sdata &&
-           rcu_access_pointer(sdata->vif.bss_conf.chanctx_conf) == conf)
-               max_bw = max(max_bw, conf->def.width);
+           rcu_access_pointer(sdata->vif.bss_conf.chanctx_conf) == &ctx->conf)
+               max_bw = max(max_bw, ctx->conf.def.width);
 
        rcu_read_unlock();
 
@@ -349,8 +353,10 @@ ieee80211_get_chanctx_max_required_bw(struct ieee80211_local *local,
  * the max of min required widths of all the interfaces bound to this
  * channel context.
  */
-static u32 _ieee80211_recalc_chanctx_min_def(struct ieee80211_local *local,
-                                            struct ieee80211_chanctx *ctx)
+static u32
+_ieee80211_recalc_chanctx_min_def(struct ieee80211_local *local,
+                                 struct ieee80211_chanctx *ctx,
+                                 struct ieee80211_link_data *rsvd_for)
 {
        enum nl80211_chan_width max_bw;
        struct cfg80211_chan_def min_def;
@@ -370,7 +376,7 @@ static u32 _ieee80211_recalc_chanctx_min_def(struct ieee80211_local *local,
                return 0;
        }
 
-       max_bw = ieee80211_get_chanctx_max_required_bw(local, &ctx->conf);
+       max_bw = ieee80211_get_chanctx_max_required_bw(local, ctx, rsvd_for);
 
        /* downgrade chandef up to max_bw */
        min_def = ctx->conf.def;
@@ -448,9 +454,10 @@ static void ieee80211_chan_bw_change(struct ieee80211_local *local,
  * channel context.
  */
 void ieee80211_recalc_chanctx_min_def(struct ieee80211_local *local,
-                                     struct ieee80211_chanctx *ctx)
+                                     struct ieee80211_chanctx *ctx,
+                                     struct ieee80211_link_data *rsvd_for)
 {
-       u32 changed = _ieee80211_recalc_chanctx_min_def(local, ctx);
+       u32 changed = _ieee80211_recalc_chanctx_min_def(local, ctx, rsvd_for);
 
        if (!changed)
                return;
@@ -464,10 +471,11 @@ void ieee80211_recalc_chanctx_min_def(struct ieee80211_local *local,
        ieee80211_chan_bw_change(local, ctx, false);
 }
 
-static void ieee80211_change_chanctx(struct ieee80211_local *local,
-                                    struct ieee80211_chanctx *ctx,
-                                    struct ieee80211_chanctx *old_ctx,
-                                    const struct cfg80211_chan_def *chandef)
+static void _ieee80211_change_chanctx(struct ieee80211_local *local,
+                                     struct ieee80211_chanctx *ctx,
+                                     struct ieee80211_chanctx *old_ctx,
+                                     const struct cfg80211_chan_def *chandef,
+                                     struct ieee80211_link_data *rsvd_for)
 {
        u32 changed;
 
@@ -492,7 +500,7 @@ static void ieee80211_change_chanctx(struct ieee80211_local *local,
        ieee80211_chan_bw_change(local, old_ctx, true);
 
        if (cfg80211_chandef_identical(&ctx->conf.def, chandef)) {
-               ieee80211_recalc_chanctx_min_def(local, ctx);
+               ieee80211_recalc_chanctx_min_def(local, ctx, rsvd_for);
                return;
        }
 
@@ -502,7 +510,7 @@ static void ieee80211_change_chanctx(struct ieee80211_local *local,
 
        /* check if min chanctx also changed */
        changed = IEEE80211_CHANCTX_CHANGE_WIDTH |
-                 _ieee80211_recalc_chanctx_min_def(local, ctx);
+                 _ieee80211_recalc_chanctx_min_def(local, ctx, rsvd_for);
        drv_change_chanctx(local, ctx, changed);
 
        if (!local->use_chanctx) {
@@ -514,6 +522,14 @@ static void ieee80211_change_chanctx(struct ieee80211_local *local,
        ieee80211_chan_bw_change(local, old_ctx, false);
 }
 
+static void ieee80211_change_chanctx(struct ieee80211_local *local,
+                                    struct ieee80211_chanctx *ctx,
+                                    struct ieee80211_chanctx *old_ctx,
+                                    const struct cfg80211_chan_def *chandef)
+{
+       _ieee80211_change_chanctx(local, ctx, old_ctx, chandef, NULL);
+}
+
 static struct ieee80211_chanctx *
 ieee80211_find_chanctx(struct ieee80211_local *local,
                       const struct cfg80211_chan_def *chandef,
@@ -638,7 +654,7 @@ ieee80211_alloc_chanctx(struct ieee80211_local *local,
        ctx->conf.rx_chains_dynamic = 1;
        ctx->mode = mode;
        ctx->conf.radar_enabled = false;
-       ieee80211_recalc_chanctx_min_def(local, ctx);
+       _ieee80211_recalc_chanctx_min_def(local, ctx, NULL);
 
        return ctx;
 }
@@ -855,6 +871,9 @@ static int ieee80211_assign_link_chanctx(struct ieee80211_link_data *link,
        }
 
        if (new_ctx) {
+               /* recalc considering the link we'll use it for now */
+               ieee80211_recalc_chanctx_min_def(local, new_ctx, link);
+
                ret = drv_assign_vif_chanctx(local, sdata, link->conf, new_ctx);
                if (ret)
                        goto out;
@@ -873,12 +892,12 @@ out:
                ieee80211_recalc_chanctx_chantype(local, curr_ctx);
                ieee80211_recalc_smps_chanctx(local, curr_ctx);
                ieee80211_recalc_radar_chanctx(local, curr_ctx);
-               ieee80211_recalc_chanctx_min_def(local, curr_ctx);
+               ieee80211_recalc_chanctx_min_def(local, curr_ctx, NULL);
        }
 
        if (new_ctx && ieee80211_chanctx_num_assigned(local, new_ctx) > 0) {
                ieee80211_recalc_txpower(sdata, false);
-               ieee80211_recalc_chanctx_min_def(local, new_ctx);
+               ieee80211_recalc_chanctx_min_def(local, new_ctx, NULL);
        }
 
        if (sdata->vif.type != NL80211_IFTYPE_P2P_DEVICE &&
@@ -1270,7 +1289,7 @@ ieee80211_link_use_reserved_reassign(struct ieee80211_link_data *link)
 
        ieee80211_link_update_chandef(link, &link->reserved_chandef);
 
-       ieee80211_change_chanctx(local, new_ctx, old_ctx, chandef);
+       _ieee80211_change_chanctx(local, new_ctx, old_ctx, chandef, link);
 
        vif_chsw[0].vif = &sdata->vif;
        vif_chsw[0].old_ctx = &old_ctx->conf;
@@ -1300,7 +1319,7 @@ ieee80211_link_use_reserved_reassign(struct ieee80211_link_data *link)
        if (ieee80211_chanctx_refcount(local, old_ctx) == 0)
                ieee80211_free_chanctx(local, old_ctx);
 
-       ieee80211_recalc_chanctx_min_def(local, new_ctx);
+       ieee80211_recalc_chanctx_min_def(local, new_ctx, NULL);
        ieee80211_recalc_smps_chanctx(local, new_ctx);
        ieee80211_recalc_radar_chanctx(local, new_ctx);
 
@@ -1665,7 +1684,7 @@ static int ieee80211_vif_use_reserved_switch(struct ieee80211_local *local)
                ieee80211_recalc_chanctx_chantype(local, ctx);
                ieee80211_recalc_smps_chanctx(local, ctx);
                ieee80211_recalc_radar_chanctx(local, ctx);
-               ieee80211_recalc_chanctx_min_def(local, ctx);
+               ieee80211_recalc_chanctx_min_def(local, ctx, NULL);
 
                list_for_each_entry_safe(link, link_tmp, &ctx->reserved_links,
                                         reserved_chanctx_list) {
index 729f261..0322aba 100644 (file)
@@ -3,7 +3,7 @@
  * HE handling
  *
  * Copyright(c) 2017 Intel Deutschland GmbH
- * Copyright(c) 2019 - 2022 Intel Corporation
+ * Copyright(c) 2019 - 2023 Intel Corporation
  */
 
 #include "ieee80211_i.h"
@@ -114,6 +114,7 @@ ieee80211_he_cap_ie_to_sta_he_cap(struct ieee80211_sub_if_data *sdata,
                                  struct link_sta_info *link_sta)
 {
        struct ieee80211_sta_he_cap *he_cap = &link_sta->pub->he_cap;
+       const struct ieee80211_sta_he_cap *own_he_cap_ptr;
        struct ieee80211_sta_he_cap own_he_cap;
        struct ieee80211_he_cap_elem *he_cap_ie_elem = (void *)he_cap_ie;
        u8 he_ppe_size;
@@ -123,12 +124,16 @@ ieee80211_he_cap_ie_to_sta_he_cap(struct ieee80211_sub_if_data *sdata,
 
        memset(he_cap, 0, sizeof(*he_cap));
 
-       if (!he_cap_ie ||
-           !ieee80211_get_he_iftype_cap(sband,
-                                        ieee80211_vif_type_p2p(&sdata->vif)))
+       if (!he_cap_ie)
                return;
 
-       own_he_cap = sband->iftype_data->he_cap;
+       own_he_cap_ptr =
+               ieee80211_get_he_iftype_cap(sband,
+                                           ieee80211_vif_type_p2p(&sdata->vif));
+       if (!own_he_cap_ptr)
+               return;
+
+       own_he_cap = *own_he_cap_ptr;
 
        /* Make sure size is OK */
        mcs_nss_size = ieee80211_he_mcs_nss_size(he_cap_ie_elem);
index a0a7839..4159fb6 100644 (file)
@@ -2312,7 +2312,7 @@ ieee802_11_parse_elems(const u8 *start, size_t len, bool action,
        return ieee802_11_parse_elems_crc(start, len, action, 0, 0, bss);
 }
 
-void ieee80211_fragment_element(struct sk_buff *skb, u8 *len_pos);
+void ieee80211_fragment_element(struct sk_buff *skb, u8 *len_pos, u8 frag_id);
 
 extern const int ieee802_1d_to_ac[8];
 
@@ -2537,7 +2537,8 @@ int ieee80211_chanctx_refcount(struct ieee80211_local *local,
 void ieee80211_recalc_smps_chanctx(struct ieee80211_local *local,
                                   struct ieee80211_chanctx *chanctx);
 void ieee80211_recalc_chanctx_min_def(struct ieee80211_local *local,
-                                     struct ieee80211_chanctx *ctx);
+                                     struct ieee80211_chanctx *ctx,
+                                     struct ieee80211_link_data *rsvd_for);
 bool ieee80211_is_radar_required(struct ieee80211_local *local);
 
 void ieee80211_dfs_cac_timer(unsigned long data);
index e82db88..40f030b 100644 (file)
@@ -2,7 +2,7 @@
 /*
  * MLO link handling
  *
- * Copyright (C) 2022 Intel Corporation
+ * Copyright (C) 2022-2023 Intel Corporation
  */
 #include <linux/slab.h>
 #include <linux/kernel.h>
@@ -409,6 +409,7 @@ static int _ieee80211_set_active_links(struct ieee80211_sub_if_data *sdata,
                                                 IEEE80211_CHANCTX_SHARED);
                WARN_ON_ONCE(ret);
 
+               ieee80211_mgd_set_link_qos_params(link);
                ieee80211_link_info_change_notify(sdata, link,
                                                  BSS_CHANGED_ERP_CTS_PROT |
                                                  BSS_CHANGED_ERP_PREAMBLE |
@@ -423,7 +424,6 @@ static int _ieee80211_set_active_links(struct ieee80211_sub_if_data *sdata,
                                                  BSS_CHANGED_TWT |
                                                  BSS_CHANGED_HE_OBSS_PD |
                                                  BSS_CHANGED_HE_BSS_COLOR);
-               ieee80211_mgd_set_link_qos_params(link);
        }
 
        old_active = sdata->vif.active_links;
index e13a035..5a43031 100644 (file)
@@ -1217,6 +1217,7 @@ static void ieee80211_add_non_inheritance_elem(struct sk_buff *skb,
                                               const u16 *inner)
 {
        unsigned int skb_len = skb->len;
+       bool at_extension = false;
        bool added = false;
        int i, j;
        u8 *len, *list_len = NULL;
@@ -1228,7 +1229,6 @@ static void ieee80211_add_non_inheritance_elem(struct sk_buff *skb,
        for (i = 0; i < PRESENT_ELEMS_MAX && outer[i]; i++) {
                u16 elem = outer[i];
                bool have_inner = false;
-               bool at_extension = false;
 
                /* should at least be sorted in the sense of normal -> ext */
                WARN_ON(at_extension && elem < PRESENT_ELEM_EXT_OFFS);
@@ -1257,8 +1257,14 @@ static void ieee80211_add_non_inheritance_elem(struct sk_buff *skb,
                }
                *list_len += 1;
                skb_put_u8(skb, (u8)elem);
+               added = true;
        }
 
+       /* if we added a list but no extension list, make a zero-len one */
+       if (added && (!at_extension || !list_len))
+               skb_put_u8(skb, 0);
+
+       /* if nothing added remove extension element completely */
        if (!added)
                skb_trim(skb, skb_len);
        else
@@ -1366,10 +1372,11 @@ static void ieee80211_assoc_add_ml_elem(struct ieee80211_sub_if_data *sdata,
                ieee80211_add_non_inheritance_elem(skb, outer_present_elems,
                                                   link_present_elems);
 
-               ieee80211_fragment_element(skb, subelem_len);
+               ieee80211_fragment_element(skb, subelem_len,
+                                          IEEE80211_MLE_SUBELEM_FRAGMENT);
        }
 
-       ieee80211_fragment_element(skb, ml_elem_len);
+       ieee80211_fragment_element(skb, ml_elem_len, WLAN_EID_FRAGMENT);
 }
 
 static int ieee80211_send_assoc(struct ieee80211_sub_if_data *sdata)
index 58222c0..fc6e130 100644 (file)
@@ -2110,7 +2110,7 @@ ieee80211_rx_h_decrypt(struct ieee80211_rx_data *rx)
        /* either the frame has been decrypted or will be dropped */
        status->flag |= RX_FLAG_DECRYPTED;
 
-       if (unlikely(ieee80211_is_beacon(fc) && result == RX_DROP_UNUSABLE &&
+       if (unlikely(ieee80211_is_beacon(fc) && (result & RX_DROP_UNUSABLE) &&
                     rx->sdata->dev))
                cfg80211_rx_unprot_mlme_mgmt(rx->sdata->dev,
                                             skb->data, skb->len);
@@ -4965,7 +4965,9 @@ static bool ieee80211_prepare_and_rx_handle(struct ieee80211_rx_data *rx,
        }
 
        if (unlikely(rx->sta && rx->sta->sta.mlo) &&
-           is_unicast_ether_addr(hdr->addr1)) {
+           is_unicast_ether_addr(hdr->addr1) &&
+           !ieee80211_is_probe_resp(hdr->frame_control) &&
+           !ieee80211_is_beacon(hdr->frame_control)) {
                /* translate to MLD addresses */
                if (ether_addr_equal(link->conf->addr, hdr->addr1))
                        ether_addr_copy(hdr->addr1, rx->sdata->vif.addr);
index de5d69f..db0d013 100644 (file)
@@ -67,7 +67,7 @@
                        __entry->min_freq_offset = (c)->chan ? (c)->chan->freq_offset : 0;      \
                        __entry->min_chan_width = (c)->width;                           \
                        __entry->min_center_freq1 = (c)->center_freq1;                  \
-                       __entry->freq1_offset = (c)->freq1_offset;                      \
+                       __entry->min_freq1_offset = (c)->freq1_offset;                  \
                        __entry->min_center_freq2 = (c)->center_freq2;
 #define MIN_CHANDEF_PR_FMT     " min_control:%d.%03d MHz min_width:%d min_center: %d.%03d/%d MHz"
 #define MIN_CHANDEF_PR_ARG     __entry->min_control_freq, __entry->min_freq_offset,    \
index 1a33274..13b522d 100644 (file)
@@ -3791,6 +3791,7 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
        ieee80211_tx_result r;
        struct ieee80211_vif *vif = txq->vif;
        int q = vif->hw_queue[txq->ac];
+       unsigned long flags;
        bool q_stopped;
 
        WARN_ON_ONCE(softirq_count() == 0);
@@ -3799,9 +3800,9 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
                return NULL;
 
 begin:
-       spin_lock(&local->queue_stop_reason_lock);
+       spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
        q_stopped = local->queue_stop_reasons[q];
-       spin_unlock(&local->queue_stop_reason_lock);
+       spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);
 
        if (unlikely(q_stopped)) {
                /* mark for waking later */
@@ -4444,7 +4445,7 @@ static void ieee80211_mlo_multicast_tx(struct net_device *dev,
                                       struct sk_buff *skb)
 {
        struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev);
-       unsigned long links = sdata->vif.valid_links;
+       unsigned long links = sdata->vif.active_links;
        unsigned int link;
        u32 ctrl_flags = IEEE80211_TX_CTRL_MCAST_MLO_FIRST_TX;
 
@@ -5527,7 +5528,7 @@ ieee80211_beacon_get_template_ema_list(struct ieee80211_hw *hw,
 {
        struct ieee80211_ema_beacons *ema_beacons = NULL;
 
-       WARN_ON(__ieee80211_beacon_get(hw, vif, NULL, false, link_id, 0,
+       WARN_ON(__ieee80211_beacon_get(hw, vif, NULL, true, link_id, 0,
                                       &ema_beacons));
 
        return ema_beacons;
@@ -6039,7 +6040,7 @@ void __ieee80211_tx_skb_tid_band(struct ieee80211_sub_if_data *sdata,
                rcu_read_unlock();
 
                if (WARN_ON_ONCE(link == ARRAY_SIZE(sdata->vif.link_conf)))
-                       link = ffs(sdata->vif.valid_links) - 1;
+                       link = ffs(sdata->vif.active_links) - 1;
        }
 
        IEEE80211_SKB_CB(skb)->control.flags |=
@@ -6075,7 +6076,7 @@ void ieee80211_tx_skb_tid(struct ieee80211_sub_if_data *sdata,
                band = chanctx_conf->def.chan->band;
        } else {
                WARN_ON(link_id >= 0 &&
-                       !(sdata->vif.valid_links & BIT(link_id)));
+                       !(sdata->vif.active_links & BIT(link_id)));
                /* MLD transmissions must not rely on the band */
                band = 0;
        }
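
Background on the queue_stop_reason_lock change above: plain spin_lock() is only safe if the lock is never taken from interrupt context on the same CPU, and the switch to the irqsave variant in the dequeue path is presumably because drivers may stop or wake queues from their IRQ handlers. A minimal sketch of that pattern with hypothetical names, not the mac80211 code itself:

#include <linux/spinlock.h>
#include <linux/bitops.h>

static DEFINE_SPINLOCK(queue_lock);	/* hypothetical stand-in for queue_stop_reason_lock */
static unsigned long queue_stopped;	/* hypothetical per-queue stop bits */

/* Dequeue-side check, running in softirq/process context: local interrupts
 * must be disabled while the lock is held, otherwise an IRQ handler taking
 * the same lock on this CPU would deadlock.
 */
static bool queue_is_stopped(int q)
{
	unsigned long flags;
	bool stopped;

	spin_lock_irqsave(&queue_lock, flags);
	stopped = test_bit(q, &queue_stopped);
	spin_unlock_irqrestore(&queue_lock, flags);

	return stopped;
}

/* May run from hardirq context, e.g. a driver waking a queue from its
 * interrupt handler.
 */
static void queue_wake(int q)
{
	unsigned long flags;

	spin_lock_irqsave(&queue_lock, flags);
	clear_bit(q, &queue_stopped);
	spin_unlock_irqrestore(&queue_lock, flags);
}
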
index 1527d6a..3bd07a0 100644 (file)
@@ -3015,7 +3015,7 @@ void ieee80211_recalc_min_chandef(struct ieee80211_sub_if_data *sdata,
 
                chanctx = container_of(chanctx_conf, struct ieee80211_chanctx,
                                       conf);
-               ieee80211_recalc_chanctx_min_def(local, chanctx);
+               ieee80211_recalc_chanctx_min_def(local, chanctx, NULL);
        }
  unlock:
        mutex_unlock(&local->chanctx_mtx);
@@ -5049,7 +5049,7 @@ u8 *ieee80211_ie_build_eht_cap(u8 *pos,
        return pos;
 }
 
-void ieee80211_fragment_element(struct sk_buff *skb, u8 *len_pos)
+void ieee80211_fragment_element(struct sk_buff *skb, u8 *len_pos, u8 frag_id)
 {
        unsigned int elem_len;
 
@@ -5069,7 +5069,7 @@ void ieee80211_fragment_element(struct sk_buff *skb, u8 *len_pos)
                memmove(len_pos + 255 + 3, len_pos + 255 + 1, elem_len);
                /* place the fragment ID */
                len_pos += 255 + 1;
-               *len_pos = WLAN_EID_FRAGMENT;
+               *len_pos = frag_id;
                /* and point to fragment length to update later */
                len_pos++;
        }
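
For reference, the helper above implements standard 802.11 element fragmentation: the first element carries at most 255 octets of content and any remainder is emitted as follow-on fragment elements tagged with the caller-supplied fragment ID (WLAN_EID_FRAGMENT at the top level, IEEE80211_MLE_SUBELEM_FRAGMENT inside a multi-link element). A self-contained sketch of the same layout written against a flat buffer, with hypothetical names and not the in-place skb logic used above:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Emit "id | len | data" elements into 'out', splitting 'data' into a first
 * element of at most 255 octets followed by fragment elements carrying
 * 'frag_id'. Returns the number of bytes written; assumes 'out' is large
 * enough. Illustrative only.
 */
static size_t put_fragmented_elem(uint8_t *out, uint8_t id, uint8_t frag_id,
				  const uint8_t *data, size_t len)
{
	size_t off = 0;
	uint8_t cur = id;

	do {
		size_t chunk = len > 255 ? 255 : len;

		out[off++] = cur;
		out[off++] = (uint8_t)chunk;
		memcpy(&out[off], data, chunk);
		off += chunk;
		data += chunk;
		len -= chunk;
		cur = frag_id;	/* continuation elements use the fragment ID */
	} while (len);

	return off;
}
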
index 689396d..1574ecc 100644 (file)
@@ -14,7 +14,7 @@
 
 #define MAXNAME                32
 #define LOCAL_ENTRY    __array(char, wpan_phy_name, MAXNAME)
-#define LOCAL_ASSIGN   strlcpy(__entry->wpan_phy_name, \
+#define LOCAL_ASSIGN   strscpy(__entry->wpan_phy_name, \
                                wpan_phy_name(local->hw.phy), MAXNAME)
 #define LOCAL_PR_FMT   "%s"
 #define LOCAL_PR_ARG   __entry->wpan_phy_name
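
The strlcpy() to strscpy() conversion above follows the kernel-wide deprecation of strlcpy(): strscpy() still NUL-terminates the destination, but returns the number of bytes copied, or -E2BIG on truncation, rather than the full source length. A small, hypothetical usage sketch:

#include <linux/string.h>
#include <linux/errno.h>

/* Hypothetical example: copy a name into a fixed-size buffer and report
 * truncation explicitly instead of relying on the source length that
 * strlcpy() used to return.
 */
static int copy_name(char *dst, size_t dst_size, const char *src)
{
	ssize_t ret = strscpy(dst, src, dst_size);

	if (ret == -E2BIG)
		return -ENAMETOOLONG;	/* destination too small, copy was truncated */

	return 0;
}
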
index 78c9245..76612bc 100644 (file)
@@ -87,8 +87,15 @@ bool mptcp_pm_allow_new_subflow(struct mptcp_sock *msk)
        unsigned int subflows_max;
        int ret = 0;
 
-       if (mptcp_pm_is_userspace(msk))
-               return mptcp_userspace_pm_active(msk);
+       if (mptcp_pm_is_userspace(msk)) {
+               if (mptcp_userspace_pm_active(msk)) {
+                       spin_lock_bh(&pm->lock);
+                       pm->subflows++;
+                       spin_unlock_bh(&pm->lock);
+                       return true;
+               }
+               return false;
+       }
 
        subflows_max = mptcp_pm_get_subflows_max(msk);
 
@@ -181,8 +188,16 @@ void mptcp_pm_subflow_check_next(struct mptcp_sock *msk, const struct sock *ssk,
        struct mptcp_pm_data *pm = &msk->pm;
        bool update_subflows;
 
-       update_subflows = (subflow->request_join || subflow->mp_join) &&
-                         mptcp_pm_is_kernel(msk);
+       update_subflows = subflow->request_join || subflow->mp_join;
+       if (mptcp_pm_is_userspace(msk)) {
+               if (update_subflows) {
+                       spin_lock_bh(&pm->lock);
+                       pm->subflows--;
+                       spin_unlock_bh(&pm->lock);
+               }
+               return;
+       }
+
        if (!READ_ONCE(pm->work_pending) && !update_subflows)
                return;
 
index bc343da..1224dfc 100644 (file)
@@ -1047,6 +1047,7 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
        if (err)
                return err;
 
+       inet_sk_state_store(newsk, TCP_LISTEN);
        err = kernel_listen(ssock, backlog);
        if (err)
                return err;
@@ -1558,6 +1559,24 @@ static int mptcp_nl_cmd_del_addr(struct sk_buff *skb, struct genl_info *info)
        return ret;
 }
 
+void mptcp_pm_remove_addrs(struct mptcp_sock *msk, struct list_head *rm_list)
+{
+       struct mptcp_rm_list alist = { .nr = 0 };
+       struct mptcp_pm_addr_entry *entry;
+
+       list_for_each_entry(entry, rm_list, list) {
+               remove_anno_list_by_saddr(msk, &entry->addr);
+               if (alist.nr < MPTCP_RM_IDS_MAX)
+                       alist.ids[alist.nr++] = entry->addr.id;
+       }
+
+       if (alist.nr) {
+               spin_lock_bh(&msk->pm.lock);
+               mptcp_pm_remove_addr(msk, &alist);
+               spin_unlock_bh(&msk->pm.lock);
+       }
+}
+
 void mptcp_pm_remove_addrs_and_subflows(struct mptcp_sock *msk,
                                        struct list_head *rm_list)
 {
index 27a2758..b06aa58 100644 (file)
@@ -69,6 +69,7 @@ static int mptcp_userspace_pm_append_new_local_addr(struct mptcp_sock *msk,
                                                        MPTCP_PM_MAX_ADDR_ID + 1,
                                                        1);
                list_add_tail_rcu(&e->list, &msk->pm.userspace_pm_local_addr_list);
+               msk->pm.local_addr_used++;
                ret = e->addr.id;
        } else if (match) {
                ret = entry->addr.id;
@@ -79,6 +80,31 @@ append_err:
        return ret;
 }
 
+/* If the subflow is closed from the other peer (not via a
+ * subflow destroy command), we want to keep the entry so as
+ * not to assign the same ID to another address and to be
+ * able to send RM_ADDR after the removal of the subflow.
+ */
+static int mptcp_userspace_pm_delete_local_addr(struct mptcp_sock *msk,
+                                               struct mptcp_pm_addr_entry *addr)
+{
+       struct mptcp_pm_addr_entry *entry, *tmp;
+
+       list_for_each_entry_safe(entry, tmp, &msk->pm.userspace_pm_local_addr_list, list) {
+               if (mptcp_addresses_equal(&entry->addr, &addr->addr, false)) {
+                       /* TODO: a refcount is needed because the entry can
+                        * be used multiple times (e.g. fullmesh mode).
+                        */
+                       list_del_rcu(&entry->list);
+                       kfree(entry);
+                       msk->pm.local_addr_used--;
+                       return 0;
+               }
+       }
+
+       return -EINVAL;
+}
+
 int mptcp_userspace_pm_get_flags_and_ifindex_by_id(struct mptcp_sock *msk,
                                                   unsigned int id,
                                                   u8 *flags, int *ifindex)
@@ -171,6 +197,7 @@ int mptcp_nl_cmd_announce(struct sk_buff *skb, struct genl_info *info)
        spin_lock_bh(&msk->pm.lock);
 
        if (mptcp_pm_alloc_anno_list(msk, &addr_val)) {
+               msk->pm.add_addr_signaled++;
                mptcp_pm_announce_addr(msk, &addr_val.addr, false);
                mptcp_pm_nl_addr_send_ack(msk);
        }
@@ -232,7 +259,7 @@ int mptcp_nl_cmd_remove(struct sk_buff *skb, struct genl_info *info)
 
        list_move(&match->list, &free_list);
 
-       mptcp_pm_remove_addrs_and_subflows(msk, &free_list);
+       mptcp_pm_remove_addrs(msk, &free_list);
 
        release_sock((struct sock *)msk);
 
@@ -251,6 +278,7 @@ int mptcp_nl_cmd_sf_create(struct sk_buff *skb, struct genl_info *info)
        struct nlattr *raddr = info->attrs[MPTCP_PM_ATTR_ADDR_REMOTE];
        struct nlattr *token = info->attrs[MPTCP_PM_ATTR_TOKEN];
        struct nlattr *laddr = info->attrs[MPTCP_PM_ATTR_ADDR];
+       struct mptcp_pm_addr_entry local = { 0 };
        struct mptcp_addr_info addr_r;
        struct mptcp_addr_info addr_l;
        struct mptcp_sock *msk;
@@ -302,12 +330,26 @@ int mptcp_nl_cmd_sf_create(struct sk_buff *skb, struct genl_info *info)
                goto create_err;
        }
 
+       local.addr = addr_l;
+       err = mptcp_userspace_pm_append_new_local_addr(msk, &local);
+       if (err < 0) {
+               GENL_SET_ERR_MSG(info, "did not match address and id");
+               goto create_err;
+       }
+
        lock_sock(sk);
 
        err = __mptcp_subflow_connect(sk, &addr_l, &addr_r);
 
        release_sock(sk);
 
+       spin_lock_bh(&msk->pm.lock);
+       if (err)
+               mptcp_userspace_pm_delete_local_addr(msk, &local);
+       else
+               msk->pm.subflows++;
+       spin_unlock_bh(&msk->pm.lock);
+
  create_err:
        sock_put((struct sock *)msk);
        return err;
@@ -420,7 +462,11 @@ int mptcp_nl_cmd_sf_destroy(struct sk_buff *skb, struct genl_info *info)
        ssk = mptcp_nl_find_ssk(msk, &addr_l, &addr_r);
        if (ssk) {
                struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+               struct mptcp_pm_addr_entry entry = { .addr = addr_l };
 
+               spin_lock_bh(&msk->pm.lock);
+               mptcp_userspace_pm_delete_local_addr(msk, &entry);
+               spin_unlock_bh(&msk->pm.lock);
                mptcp_subflow_shutdown(sk, ssk, RCV_SHUTDOWN | SEND_SHUTDOWN);
                mptcp_close_ssk(sk, ssk, subflow);
                MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RMSUBFLOW);
index 08dc53f..a6c7f2d 100644 (file)
@@ -44,7 +44,7 @@ enum {
 static struct percpu_counter mptcp_sockets_allocated ____cacheline_aligned_in_smp;
 
 static void __mptcp_destroy_sock(struct sock *sk);
-static void __mptcp_check_send_data_fin(struct sock *sk);
+static void mptcp_check_send_data_fin(struct sock *sk);
 
 DEFINE_PER_CPU(struct mptcp_delegated_action, mptcp_delegated_actions);
 static struct net_device mptcp_napi_dev;
@@ -90,8 +90,8 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
        if (err)
                return err;
 
-       msk->first = ssock->sk;
-       msk->subflow = ssock;
+       WRITE_ONCE(msk->first, ssock->sk);
+       WRITE_ONCE(msk->subflow, ssock);
        subflow = mptcp_subflow_ctx(ssock->sk);
        list_add(&subflow->node, &msk->conn_list);
        sock_hold(ssock->sk);
@@ -424,8 +424,7 @@ static bool mptcp_pending_data_fin_ack(struct sock *sk)
 {
        struct mptcp_sock *msk = mptcp_sk(sk);
 
-       return !__mptcp_check_fallback(msk) &&
-              ((1 << sk->sk_state) &
+       return ((1 << sk->sk_state) &
                (TCPF_FIN_WAIT1 | TCPF_CLOSING | TCPF_LAST_ACK)) &&
               msk->write_seq == READ_ONCE(msk->snd_una);
 }
@@ -583,9 +582,6 @@ static bool mptcp_check_data_fin(struct sock *sk)
        u64 rcv_data_fin_seq;
        bool ret = false;
 
-       if (__mptcp_check_fallback(msk))
-               return ret;
-
        /* Need to ack a DATA_FIN received from a peer while this side
         * of the connection is in ESTABLISHED, FIN_WAIT1, or FIN_WAIT2.
         * msk->rcv_data_fin was set when parsing the incoming options
@@ -603,7 +599,7 @@ static bool mptcp_check_data_fin(struct sock *sk)
                WRITE_ONCE(msk->ack_seq, msk->ack_seq + 1);
                WRITE_ONCE(msk->rcv_data_fin, 0);
 
-               sk->sk_shutdown |= RCV_SHUTDOWN;
+               WRITE_ONCE(sk->sk_shutdown, sk->sk_shutdown | RCV_SHUTDOWN);
                smp_mb__before_atomic(); /* SHUTDOWN must be visible first */
 
                switch (sk->sk_state) {
@@ -623,7 +619,8 @@ static bool mptcp_check_data_fin(struct sock *sk)
                }
 
                ret = true;
-               mptcp_send_ack(msk);
+               if (!__mptcp_check_fallback(msk))
+                       mptcp_send_ack(msk);
                mptcp_close_wake_up(sk);
        }
        return ret;
@@ -825,6 +822,13 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
        mptcp_data_unlock(sk);
 }
 
+static void mptcp_subflow_joined(struct mptcp_sock *msk, struct sock *ssk)
+{
+       mptcp_subflow_ctx(ssk)->map_seq = READ_ONCE(msk->ack_seq);
+       WRITE_ONCE(msk->allow_infinite_fallback, false);
+       mptcp_event(MPTCP_EVENT_SUB_ESTABLISHED, msk, ssk, GFP_ATOMIC);
+}
+
 static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
 {
        struct sock *sk = (struct sock *)msk;
@@ -839,15 +843,16 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
                mptcp_sock_graft(ssk, sk->sk_socket);
 
        mptcp_sockopt_sync_locked(msk, ssk);
+       mptcp_subflow_joined(msk, ssk);
        return true;
 }
 
-static void __mptcp_flush_join_list(struct sock *sk)
+static void __mptcp_flush_join_list(struct sock *sk, struct list_head *join_list)
 {
        struct mptcp_subflow_context *tmp, *subflow;
        struct mptcp_sock *msk = mptcp_sk(sk);
 
-       list_for_each_entry_safe(subflow, tmp, &msk->join_list, node) {
+       list_for_each_entry_safe(subflow, tmp, join_list, node) {
                struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
                bool slow = lock_sock_fast(ssk);
 
@@ -889,49 +894,6 @@ bool mptcp_schedule_work(struct sock *sk)
        return false;
 }
 
-void mptcp_subflow_eof(struct sock *sk)
-{
-       if (!test_and_set_bit(MPTCP_WORK_EOF, &mptcp_sk(sk)->flags))
-               mptcp_schedule_work(sk);
-}
-
-static void mptcp_check_for_eof(struct mptcp_sock *msk)
-{
-       struct mptcp_subflow_context *subflow;
-       struct sock *sk = (struct sock *)msk;
-       int receivers = 0;
-
-       mptcp_for_each_subflow(msk, subflow)
-               receivers += !subflow->rx_eof;
-       if (receivers)
-               return;
-
-       if (!(sk->sk_shutdown & RCV_SHUTDOWN)) {
-               /* hopefully temporary hack: propagate shutdown status
-                * to msk, when all subflows agree on it
-                */
-               sk->sk_shutdown |= RCV_SHUTDOWN;
-
-               smp_mb__before_atomic(); /* SHUTDOWN must be visible first */
-               sk->sk_data_ready(sk);
-       }
-
-       switch (sk->sk_state) {
-       case TCP_ESTABLISHED:
-               inet_sk_state_store(sk, TCP_CLOSE_WAIT);
-               break;
-       case TCP_FIN_WAIT1:
-               inet_sk_state_store(sk, TCP_CLOSING);
-               break;
-       case TCP_FIN_WAIT2:
-               inet_sk_state_store(sk, TCP_CLOSE);
-               break;
-       default:
-               return;
-       }
-       mptcp_close_wake_up(sk);
-}
-
 static struct sock *mptcp_subflow_recv_lookup(const struct mptcp_sock *msk)
 {
        struct mptcp_subflow_context *subflow;
@@ -1601,7 +1563,7 @@ out:
        if (!mptcp_timer_pending(sk))
                mptcp_reset_timer(sk);
        if (do_check_data_fin)
-               __mptcp_check_send_data_fin(sk);
+               mptcp_check_send_data_fin(sk);
 }
 
 static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk, bool first)
@@ -1702,7 +1664,6 @@ static int mptcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
 
        lock_sock(ssk);
        msg->msg_flags |= MSG_DONTWAIT;
-       msk->connect_flags = O_NONBLOCK;
        msk->fastopening = 1;
        ret = tcp_sendmsg_fastopen(ssk, msg, copied_syn, len, NULL);
        msk->fastopening = 0;
@@ -1720,7 +1681,13 @@ static int mptcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
                if (ret && ret != -EINPROGRESS && ret != -ERESTARTSYS && ret != -EINTR)
                        *copied_syn = 0;
        } else if (ret && ret != -EINPROGRESS) {
-               mptcp_disconnect(sk, 0);
+               /* The disconnect() op called by tcp_sendmsg_fastopen()/
+                * __inet_stream_connect() can fail due to the locking checks,
+                * see mptcp_disconnect().
+                * Attempt it again outside the problematic scope.
+                */
+               if (!mptcp_disconnect(sk, 0))
+                       sk->sk_socket->state = SS_UNCONNECTED;
        }
        inet_sk(sk)->defer_connect = 0;
 
@@ -2151,9 +2118,6 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
                                break;
                        }
 
-                       if (test_and_clear_bit(MPTCP_WORK_EOF, &msk->flags))
-                               mptcp_check_for_eof(msk);
-
                        if (sk->sk_shutdown & RCV_SHUTDOWN) {
                                /* race breaker: the shutdown could be after the
                                 * previous receive queue check
@@ -2283,7 +2247,7 @@ static void mptcp_dispose_initial_subflow(struct mptcp_sock *msk)
 {
        if (msk->subflow) {
                iput(SOCK_INODE(msk->subflow));
-               msk->subflow = NULL;
+               WRITE_ONCE(msk->subflow, NULL);
        }
 }
 
@@ -2382,7 +2346,10 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 
        need_push = (flags & MPTCP_CF_PUSH) && __mptcp_retransmit_pending_data(sk);
        if (!dispose_it) {
-               tcp_disconnect(ssk, 0);
+               /* The MPTCP code never waits on the subflow sockets, so the
+                * TCP-level disconnect should never fail.
+                */
+               WARN_ON_ONCE(tcp_disconnect(ssk, 0));
                msk->subflow->state = SS_UNCONNECTED;
                mptcp_subflow_ctx_reset(subflow);
                release_sock(ssk);
@@ -2401,13 +2368,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
                kfree_rcu(subflow, rcu);
        } else {
                /* otherwise tcp will dispose of the ssk and subflow ctx */
-               if (ssk->sk_state == TCP_LISTEN) {
-                       tcp_set_state(ssk, TCP_CLOSE);
-                       mptcp_subflow_queue_clean(sk, ssk);
-                       inet_csk_listen_stop(ssk);
-                       mptcp_event_pm_listener(ssk, MPTCP_EVENT_LISTENER_CLOSED);
-               }
-
                __tcp_close(ssk, 0);
 
                /* close acquired an extra ref */
@@ -2420,7 +2380,7 @@ out_release:
        sock_put(ssk);
 
        if (ssk == msk->first)
-               msk->first = NULL;
+               WRITE_ONCE(msk->first, NULL);
 
 out:
        if (ssk == msk->last_snd)
@@ -2527,7 +2487,7 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk)
        }
 
        inet_sk_state_store(sk, TCP_CLOSE);
-       sk->sk_shutdown = SHUTDOWN_MASK;
+       WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
        smp_mb__before_atomic(); /* SHUTDOWN must be visible first */
        set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags);
 
@@ -2664,16 +2624,12 @@ static void mptcp_worker(struct work_struct *work)
        if (unlikely((1 << state) & (TCPF_CLOSE | TCPF_LISTEN)))
                goto unlock;
 
-       mptcp_check_data_fin_ack(sk);
-
        mptcp_check_fastclose(msk);
 
        mptcp_pm_nl_work(msk);
 
-       if (test_and_clear_bit(MPTCP_WORK_EOF, &msk->flags))
-               mptcp_check_for_eof(msk);
-
-       __mptcp_check_send_data_fin(sk);
+       mptcp_check_send_data_fin(sk);
+       mptcp_check_data_fin_ack(sk);
        mptcp_check_data_fin(sk);
 
        if (test_and_clear_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags))
@@ -2721,7 +2677,7 @@ static int __mptcp_init_sock(struct sock *sk)
        WRITE_ONCE(msk->rmem_released, 0);
        msk->timer_ival = TCP_RTO_MIN;
 
-       msk->first = NULL;
+       WRITE_ONCE(msk->first, NULL);
        inet_csk(sk)->icsk_sync_mss = mptcp_sync_mss;
        WRITE_ONCE(msk->csum_enabled, mptcp_is_checksum_enabled(sock_net(sk)));
        WRITE_ONCE(msk->allow_infinite_fallback, true);
@@ -2805,13 +2761,19 @@ void mptcp_subflow_shutdown(struct sock *sk, struct sock *ssk, int how)
                        break;
                fallthrough;
        case TCP_SYN_SENT:
-               tcp_disconnect(ssk, O_NONBLOCK);
+               WARN_ON_ONCE(tcp_disconnect(ssk, O_NONBLOCK));
                break;
        default:
                if (__mptcp_check_fallback(mptcp_sk(sk))) {
                        pr_debug("Fallback");
                        ssk->sk_shutdown |= how;
                        tcp_shutdown(ssk, how);
+
+                       /* simulate the data_fin ack reception to let the state
+                        * machine move forward
+                        */
+                       WRITE_ONCE(mptcp_sk(sk)->snd_una, mptcp_sk(sk)->snd_nxt);
+                       mptcp_schedule_work(sk);
                } else {
                        pr_debug("Sending DATA_FIN on subflow %p", ssk);
                        tcp_send_ack(ssk);
@@ -2851,7 +2813,7 @@ static int mptcp_close_state(struct sock *sk)
        return next & TCP_ACTION_FIN;
 }
 
-static void __mptcp_check_send_data_fin(struct sock *sk)
+static void mptcp_check_send_data_fin(struct sock *sk)
 {
        struct mptcp_subflow_context *subflow;
        struct mptcp_sock *msk = mptcp_sk(sk);
@@ -2869,19 +2831,6 @@ static void __mptcp_check_send_data_fin(struct sock *sk)
 
        WRITE_ONCE(msk->snd_nxt, msk->write_seq);
 
-       /* fallback socket will not get data_fin/ack, can move to the next
-        * state now
-        */
-       if (__mptcp_check_fallback(msk)) {
-               WRITE_ONCE(msk->snd_una, msk->write_seq);
-               if ((1 << sk->sk_state) & (TCPF_CLOSING | TCPF_LAST_ACK)) {
-                       inet_sk_state_store(sk, TCP_CLOSE);
-                       mptcp_close_wake_up(sk);
-               } else if (sk->sk_state == TCP_FIN_WAIT1) {
-                       inet_sk_state_store(sk, TCP_FIN_WAIT2);
-               }
-       }
-
        mptcp_for_each_subflow(msk, subflow) {
                struct sock *tcp_sk = mptcp_subflow_tcp_sock(subflow);
 
@@ -2901,7 +2850,7 @@ static void __mptcp_wr_shutdown(struct sock *sk)
        WRITE_ONCE(msk->write_seq, msk->write_seq + 1);
        WRITE_ONCE(msk->snd_data_fin_enable, 1);
 
-       __mptcp_check_send_data_fin(sk);
+       mptcp_check_send_data_fin(sk);
 }
 
 static void __mptcp_destroy_sock(struct sock *sk)
@@ -2946,10 +2895,24 @@ static __poll_t mptcp_check_readable(struct mptcp_sock *msk)
        return EPOLLIN | EPOLLRDNORM;
 }
 
-static void mptcp_listen_inuse_dec(struct sock *sk)
+static void mptcp_check_listen_stop(struct sock *sk)
 {
-       if (inet_sk_state_load(sk) == TCP_LISTEN)
-               sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
+       struct sock *ssk;
+
+       if (inet_sk_state_load(sk) != TCP_LISTEN)
+               return;
+
+       sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
+       ssk = mptcp_sk(sk)->first;
+       if (WARN_ON_ONCE(!ssk || inet_sk_state_load(ssk) != TCP_LISTEN))
+               return;
+
+       lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
+       mptcp_subflow_queue_clean(sk, ssk);
+       inet_csk_listen_stop(ssk);
+       mptcp_event_pm_listener(ssk, MPTCP_EVENT_LISTENER_CLOSED);
+       tcp_set_state(ssk, TCP_CLOSE);
+       release_sock(ssk);
 }
 
 bool __mptcp_close(struct sock *sk, long timeout)
@@ -2959,10 +2922,10 @@ bool __mptcp_close(struct sock *sk, long timeout)
        bool do_cancel_work = false;
        int subflows_alive = 0;
 
-       sk->sk_shutdown = SHUTDOWN_MASK;
+       WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
 
        if ((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) {
-               mptcp_listen_inuse_dec(sk);
+               mptcp_check_listen_stop(sk);
                inet_sk_state_store(sk, TCP_CLOSE);
                goto cleanup;
        }
@@ -3039,7 +3002,7 @@ static void mptcp_close(struct sock *sk, long timeout)
        sock_put(sk);
 }
 
-void mptcp_copy_inaddrs(struct sock *msk, const struct sock *ssk)
+static void mptcp_copy_inaddrs(struct sock *msk, const struct sock *ssk)
 {
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
        const struct ipv6_pinfo *ssk6 = inet6_sk(ssk);
@@ -3066,15 +3029,20 @@ static int mptcp_disconnect(struct sock *sk, int flags)
 {
        struct mptcp_sock *msk = mptcp_sk(sk);
 
+       /* Deny disconnect if other threads are blocked in sk_wait_event()
+        * or inet_wait_for_connect().
+        */
+       if (sk->sk_wait_pending)
+               return -EBUSY;
+
        /* We are on the fastopen error path. We can't call straight into the
         * subflows cleanup code due to lock nesting (we are already under
-        * msk->firstsocket lock). Do nothing and leave the cleanup to the
-        * caller.
+        * msk->firstsocket lock).
         */
        if (msk->fastopening)
-               return 0;
+               return -EBUSY;
 
-       mptcp_listen_inuse_dec(sk);
+       mptcp_check_listen_stop(sk);
        inet_sk_state_store(sk, TCP_CLOSE);
 
        mptcp_stop_timer(sk);
@@ -3102,7 +3070,7 @@ static int mptcp_disconnect(struct sock *sk, int flags)
        mptcp_pm_data_reset(msk);
        mptcp_ca_reset(sk);
 
-       sk->sk_shutdown = 0;
+       WRITE_ONCE(sk->sk_shutdown, 0);
        sk_error_report(sk);
        return 0;
 }
@@ -3116,9 +3084,10 @@ static struct ipv6_pinfo *mptcp_inet6_sk(const struct sock *sk)
 }
 #endif
 
-struct sock *mptcp_sk_clone(const struct sock *sk,
-                           const struct mptcp_options_received *mp_opt,
-                           struct request_sock *req)
+struct sock *mptcp_sk_clone_init(const struct sock *sk,
+                                const struct mptcp_options_received *mp_opt,
+                                struct sock *ssk,
+                                struct request_sock *req)
 {
        struct mptcp_subflow_request_sock *subflow_req = mptcp_subflow_rsk(req);
        struct sock *nsk = sk_clone_lock(sk, GFP_ATOMIC);
@@ -3132,12 +3101,13 @@ struct sock *mptcp_sk_clone(const struct sock *sk,
                inet_sk(nsk)->pinet6 = mptcp_inet6_sk(nsk);
 #endif
 
+       nsk->sk_wait_pending = 0;
        __mptcp_init_sock(nsk);
 
        msk = mptcp_sk(nsk);
        msk->local_key = subflow_req->local_key;
        msk->token = subflow_req->token;
-       msk->subflow = NULL;
+       WRITE_ONCE(msk->subflow, NULL);
        msk->in_accept_queue = 1;
        WRITE_ONCE(msk->fully_established, false);
        if (mp_opt->suboptions & OPTION_MPTCP_CSUMREQD)
@@ -3150,10 +3120,30 @@ struct sock *mptcp_sk_clone(const struct sock *sk,
        msk->setsockopt_seq = mptcp_sk(sk)->setsockopt_seq;
 
        sock_reset_flag(nsk, SOCK_RCU_FREE);
-       /* will be fully established after successful MPC subflow creation */
-       inet_sk_state_store(nsk, TCP_SYN_RECV);
-
        security_inet_csk_clone(nsk, req);
+
+       /* this can't race with mptcp_close(), as the msk is
+        * not yet exposed to user-space
+        */
+       inet_sk_state_store(nsk, TCP_ESTABLISHED);
+
+       /* The msk maintains a ref to each subflow in the connections list */
+       WRITE_ONCE(msk->first, ssk);
+       list_add(&mptcp_subflow_ctx(ssk)->node, &msk->conn_list);
+       sock_hold(ssk);
+
+       /* new mpc subflow takes ownership of the newly
+        * created mptcp socket
+        */
+       mptcp_token_accept(subflow_req, msk);
+
+       /* set msk addresses early to ensure mptcp_pm_get_local_id()
+        * uses the correct data
+        */
+       mptcp_copy_inaddrs(nsk, ssk);
+       mptcp_propagate_sndbuf(nsk, ssk);
+
+       mptcp_rcv_space_init(msk, ssk);
        bh_unlock_sock(nsk);
 
        /* note: the newly allocated socket refcount is 2 now */
@@ -3185,7 +3175,7 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
        struct socket *listener;
        struct sock *newsk;
 
-       listener = msk->subflow;
+       listener = READ_ONCE(msk->subflow);
        if (WARN_ON_ONCE(!listener)) {
                *err = -EINVAL;
                return NULL;
@@ -3299,9 +3289,14 @@ static void mptcp_release_cb(struct sock *sk)
        for (;;) {
                unsigned long flags = (msk->cb_flags & MPTCP_FLAGS_PROCESS_CTX_NEED) |
                                      msk->push_pending;
+               struct list_head join_list;
+
                if (!flags)
                        break;
 
+               INIT_LIST_HEAD(&join_list);
+               list_splice_init(&msk->join_list, &join_list);
+
                /* the following actions acquire the subflow socket lock
                 *
                 * 1) can't be invoked in atomic scope
@@ -3312,8 +3307,9 @@ static void mptcp_release_cb(struct sock *sk)
                msk->push_pending = 0;
                msk->cb_flags &= ~flags;
                spin_unlock_bh(&sk->sk_lock.slock);
+
                if (flags & BIT(MPTCP_FLUSH_JOIN_LIST))
-                       __mptcp_flush_join_list(sk);
+                       __mptcp_flush_join_list(sk, &join_list);
                if (flags & BIT(MPTCP_PUSH_PENDING))
                        __mptcp_push_pending(sk, 0);
                if (flags & BIT(MPTCP_RETRANSMIT))
@@ -3465,14 +3461,16 @@ bool mptcp_finish_join(struct sock *ssk)
                return false;
        }
 
-       if (!list_empty(&subflow->node))
-               goto out;
+       /* active subflow, already present inside the conn_list */
+       if (!list_empty(&subflow->node)) {
+               mptcp_subflow_joined(msk, ssk);
+               return true;
+       }
 
        if (!mptcp_pm_allow_new_subflow(msk))
                goto err_prohibited;
 
-       /* active connections are already on conn_list.
-        * If we can't acquire msk socket lock here, let the release callback
+       /* If we can't acquire msk socket lock here, let the release callback
         * handle it
         */
        mptcp_data_lock(parent);
@@ -3495,11 +3493,6 @@ err_prohibited:
                return false;
        }
 
-       subflow->map_seq = READ_ONCE(msk->ack_seq);
-       WRITE_ONCE(msk->allow_infinite_fallback, false);
-
-out:
-       mptcp_event(MPTCP_EVENT_SUB_ESTABLISHED, msk, ssk, GFP_ATOMIC);
        return true;
 }
 
@@ -3617,9 +3610,9 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
         * acquired the subflow socket lock, too.
         */
        if (msk->fastopening)
-               err = __inet_stream_connect(ssock, uaddr, addr_len, msk->connect_flags, 1);
+               err = __inet_stream_connect(ssock, uaddr, addr_len, O_NONBLOCK, 1);
        else
-               err = inet_stream_connect(ssock, uaddr, addr_len, msk->connect_flags);
+               err = inet_stream_connect(ssock, uaddr, addr_len, O_NONBLOCK);
        inet_sk(sk)->defer_connect = inet_sk(ssock->sk)->defer_connect;
 
        /* on successful connect, the msk state will be moved to established by
@@ -3632,12 +3625,10 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 
        mptcp_copy_inaddrs(sk, ssock->sk);
 
-       /* unblocking connect, mptcp-level inet_stream_connect will error out
-        * without changing the socket state, update it here.
+       /* silence EINPROGRESS and let the caller inet_stream_connect
+        * handle the connection in progress
         */
-       if (err == -EINPROGRESS)
-               sk->sk_socket->state = ssock->state;
-       return err;
+       return 0;
 }
 
 static struct proto mptcp_prot = {
@@ -3696,18 +3687,6 @@ unlock:
        return err;
 }
 
-static int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr,
-                               int addr_len, int flags)
-{
-       int ret;
-
-       lock_sock(sock->sk);
-       mptcp_sk(sock->sk)->connect_flags = flags;
-       ret = __inet_stream_connect(sock, uaddr, addr_len, flags, 0);
-       release_sock(sock->sk);
-       return ret;
-}
-
 static int mptcp_listen(struct socket *sock, int backlog)
 {
        struct mptcp_sock *msk = mptcp_sk(sock->sk);
@@ -3751,10 +3730,10 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 
        pr_debug("msk=%p", msk);
 
-       /* buggy applications can call accept on socket states other then LISTEN
+       /* Buggy applications can call accept on socket states other than LISTEN
         * but no need to allocate the first subflow just to error out.
         */
-       ssock = msk->subflow;
+       ssock = READ_ONCE(msk->subflow);
        if (!ssock)
                return -EINVAL;
 
@@ -3800,9 +3779,6 @@ static __poll_t mptcp_check_writeable(struct mptcp_sock *msk)
 {
        struct sock *sk = (struct sock *)msk;
 
-       if (unlikely(sk->sk_shutdown & SEND_SHUTDOWN))
-               return EPOLLOUT | EPOLLWRNORM;
-
        if (sk_stream_is_writeable(sk))
                return EPOLLOUT | EPOLLWRNORM;
 
@@ -3820,6 +3796,7 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock,
        struct sock *sk = sock->sk;
        struct mptcp_sock *msk;
        __poll_t mask = 0;
+       u8 shutdown;
        int state;
 
        msk = mptcp_sk(sk);
@@ -3828,23 +3805,30 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock,
        state = inet_sk_state_load(sk);
        pr_debug("msk=%p state=%d flags=%lx", msk, state, msk->flags);
        if (state == TCP_LISTEN) {
-               if (WARN_ON_ONCE(!msk->subflow || !msk->subflow->sk))
+               struct socket *ssock = READ_ONCE(msk->subflow);
+
+               if (WARN_ON_ONCE(!ssock || !ssock->sk))
                        return 0;
 
-               return inet_csk_listen_poll(msk->subflow->sk);
+               return inet_csk_listen_poll(ssock->sk);
        }
 
+       shutdown = READ_ONCE(sk->sk_shutdown);
+       if (shutdown == SHUTDOWN_MASK || state == TCP_CLOSE)
+               mask |= EPOLLHUP;
+       if (shutdown & RCV_SHUTDOWN)
+               mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
+
        if (state != TCP_SYN_SENT && state != TCP_SYN_RECV) {
                mask |= mptcp_check_readable(msk);
-               mask |= mptcp_check_writeable(msk);
+               if (shutdown & SEND_SHUTDOWN)
+                       mask |= EPOLLOUT | EPOLLWRNORM;
+               else
+                       mask |= mptcp_check_writeable(msk);
        } else if (state == TCP_SYN_SENT && inet_sk(sk)->defer_connect) {
                /* cf tcp_poll() note about TFO */
                mask |= EPOLLOUT | EPOLLWRNORM;
        }
-       if (sk->sk_shutdown == SHUTDOWN_MASK || state == TCP_CLOSE)
-               mask |= EPOLLHUP;
-       if (sk->sk_shutdown & RCV_SHUTDOWN)
-               mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
 
        /* This barrier is coupled with smp_wmb() in __mptcp_error_report() */
        smp_rmb();
@@ -3859,7 +3843,7 @@ static const struct proto_ops mptcp_stream_ops = {
        .owner             = THIS_MODULE,
        .release           = inet_release,
        .bind              = mptcp_bind,
-       .connect           = mptcp_stream_connect,
+       .connect           = inet_stream_connect,
        .socketpair        = sock_no_socketpair,
        .accept            = mptcp_stream_accept,
        .getname           = inet_getname,
@@ -3954,7 +3938,7 @@ static const struct proto_ops mptcp_v6_stream_ops = {
        .owner             = THIS_MODULE,
        .release           = inet6_release,
        .bind              = mptcp_bind,
-       .connect           = mptcp_stream_connect,
+       .connect           = inet_stream_connect,
        .socketpair        = sock_no_socketpair,
        .accept            = mptcp_stream_accept,
        .getname           = inet6_getname,
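
Several hunks above wrap msk->first and msk->subflow accesses in WRITE_ONCE()/READ_ONCE(). That is the usual annotation for a pointer that is updated under a lock but read locklessly, here by paths such as mptcp_poll() and mptcp_stream_accept(), so that neither side can tear the load or store. A minimal sketch of the pattern with hypothetical names:

#include <linux/compiler.h>
#include <linux/spinlock.h>

struct socket;

struct conn {
	spinlock_t lock;		/* protects writers; assume spin_lock_init() was called */
	struct socket *listener;	/* also read without the lock */
};

/* Writer: update under the lock, but store with WRITE_ONCE() so lockless
 * readers never see a torn pointer.
 */
static void conn_set_listener(struct conn *c, struct socket *sock)
{
	spin_lock(&c->lock);
	WRITE_ONCE(c->listener, sock);
	spin_unlock(&c->lock);
}

/* Lockless reader: snapshot the pointer once with READ_ONCE() and only use
 * the local copy, as mptcp_poll()/mptcp_stream_accept() now do with
 * msk->subflow.
 */
static bool conn_has_listener(const struct conn *c)
{
	struct socket *sock = READ_ONCE(c->listener);

	return sock != NULL;
}
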
index 2d7b2c8..d3783a7 100644 (file)
 /* MPTCP socket atomic flags */
 #define MPTCP_NOSPACE          1
 #define MPTCP_WORK_RTX         2
-#define MPTCP_WORK_EOF         3
 #define MPTCP_FALLBACK_DONE    4
 #define MPTCP_WORK_CLOSE_SUBFLOW 5
 
@@ -297,7 +296,6 @@ struct mptcp_sock {
                        nodelay:1,
                        fastopening:1,
                        in_accept_queue:1;
-       int             connect_flags;
        struct work_struct work;
        struct sk_buff  *ooo_last_skb;
        struct rb_root  out_of_order_queue;
@@ -306,7 +304,11 @@ struct mptcp_sock {
        struct list_head rtx_queue;
        struct mptcp_data_frag *first_pending;
        struct list_head join_list;
-       struct socket   *subflow; /* outgoing connect/listener/!mp_capable */
+       struct socket   *subflow; /* outgoing connect/listener/!mp_capable
+                                  * The mptcp ops can safely dereference the subflow
+                                  * outside the socket lock, using a suitable ONCE
+                                  * annotation, as that socket is freed after close().
+                                  */
        struct sock     *first;
        struct mptcp_pm_data    pm;
        struct {
@@ -473,14 +475,13 @@ struct mptcp_subflow_context {
                send_mp_fail : 1,
                send_fastclose : 1,
                send_infinite_map : 1,
-               rx_eof : 1,
                remote_key_valid : 1,        /* received the peer key from */
                disposable : 1,     /* ctx can be free at ulp release time */
                stale : 1,          /* unable to snd/rcv data, do not use for xmit */
                local_id_valid : 1, /* local_id is correctly initialized */
                valid_csum_seen : 1,        /* at least one csum validated */
                is_mptfo : 1,       /* subflow is doing TFO */
-               __unused : 8;
+               __unused : 9;
        enum mptcp_data_avail data_avail;
        u32     remote_nonce;
        u64     thmac;
@@ -613,7 +614,6 @@ int mptcp_is_checksum_enabled(const struct net *net);
 int mptcp_allow_join_id0(const struct net *net);
 unsigned int mptcp_stale_loss_cnt(const struct net *net);
 int mptcp_get_pm_type(const struct net *net);
-void mptcp_copy_inaddrs(struct sock *msk, const struct sock *ssk);
 void mptcp_subflow_fully_established(struct mptcp_subflow_context *subflow,
                                     const struct mptcp_options_received *mp_opt);
 bool __mptcp_retransmit_pending_data(struct sock *sk);
@@ -683,9 +683,10 @@ void __init mptcp_proto_init(void);
 int __init mptcp_proto_v6_init(void);
 #endif
 
-struct sock *mptcp_sk_clone(const struct sock *sk,
-                           const struct mptcp_options_received *mp_opt,
-                           struct request_sock *req);
+struct sock *mptcp_sk_clone_init(const struct sock *sk,
+                                const struct mptcp_options_received *mp_opt,
+                                struct sock *ssk,
+                                struct request_sock *req);
 void mptcp_get_options(const struct sk_buff *skb,
                       struct mptcp_options_received *mp_opt);
 
@@ -717,7 +718,6 @@ static inline u64 mptcp_expand_seq(u64 old_seq, u64 cur_seq, bool use_64bit)
 void __mptcp_check_push(struct sock *sk, struct sock *ssk);
 void __mptcp_data_acked(struct sock *sk);
 void __mptcp_error_report(struct sock *sk);
-void mptcp_subflow_eof(struct sock *sk);
 bool mptcp_update_rcv_data_fin(struct mptcp_sock *msk, u64 data_fin_seq, bool use_64bit);
 static inline bool mptcp_data_fin_enabled(const struct mptcp_sock *msk)
 {
@@ -829,6 +829,7 @@ int mptcp_pm_announce_addr(struct mptcp_sock *msk,
                           bool echo);
 int mptcp_pm_remove_addr(struct mptcp_sock *msk, const struct mptcp_rm_list *rm_list);
 int mptcp_pm_remove_subflow(struct mptcp_sock *msk, const struct mptcp_rm_list *rm_list);
+void mptcp_pm_remove_addrs(struct mptcp_sock *msk, struct list_head *rm_list);
 void mptcp_pm_remove_addrs_and_subflows(struct mptcp_sock *msk,
                                        struct list_head *rm_list);
 
index 76952cf..8ff5c9f 100644 (file)
@@ -815,38 +815,12 @@ create_child:
                ctx->setsockopt_seq = listener->setsockopt_seq;
 
                if (ctx->mp_capable) {
-                       ctx->conn = mptcp_sk_clone(listener->conn, &mp_opt, req);
+                       ctx->conn = mptcp_sk_clone_init(listener->conn, &mp_opt, child, req);
                        if (!ctx->conn)
                                goto fallback;
 
                        owner = mptcp_sk(ctx->conn);
-
-                       /* this can't race with mptcp_close(), as the msk is
-                        * not yet exposted to user-space
-                        */
-                       inet_sk_state_store(ctx->conn, TCP_ESTABLISHED);
-
-                       /* record the newly created socket as the first msk
-                        * subflow, but don't link it yet into conn_list
-                        */
-                       WRITE_ONCE(owner->first, child);
-
-                       /* new mpc subflow takes ownership of the newly
-                        * created mptcp socket
-                        */
-                       owner->setsockopt_seq = ctx->setsockopt_seq;
                        mptcp_pm_new_connection(owner, child, 1);
-                       mptcp_token_accept(subflow_req, owner);
-
-                       /* set msk addresses early to ensure mptcp_pm_get_local_id()
-                        * uses the correct data
-                        */
-                       mptcp_copy_inaddrs(ctx->conn, child);
-                       mptcp_propagate_sndbuf(ctx->conn, child);
-
-                       mptcp_rcv_space_init(owner, child);
-                       list_add(&ctx->node, &owner->conn_list);
-                       sock_hold(child);
 
                        /* with OoO packets we can reach here without ingress
                         * mpc option
@@ -1781,14 +1755,16 @@ static void subflow_state_change(struct sock *sk)
 {
        struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
        struct sock *parent = subflow->conn;
+       struct mptcp_sock *msk;
 
        __subflow_state_change(sk);
 
+       msk = mptcp_sk(parent);
        if (subflow_simultaneous_connect(sk)) {
                mptcp_propagate_sndbuf(parent, sk);
                mptcp_do_fallback(sk);
-               mptcp_rcv_space_init(mptcp_sk(parent), sk);
-               pr_fallback(mptcp_sk(parent));
+               mptcp_rcv_space_init(msk, sk);
+               pr_fallback(msk);
                subflow->conn_finished = 1;
                mptcp_set_connected(parent);
        }
@@ -1804,11 +1780,12 @@ static void subflow_state_change(struct sock *sk)
 
        subflow_sched_work_if_closed(mptcp_sk(parent), sk);
 
-       if (__mptcp_check_fallback(mptcp_sk(parent)) &&
-           !subflow->rx_eof && subflow_is_done(sk)) {
-               subflow->rx_eof = 1;
-               mptcp_subflow_eof(parent);
-       }
+       /* when the fallback subflow closes the rx side, trigger a 'dummy'
+        * ingress data fin, so that the msk state will follow along
+        */
+       if (__mptcp_check_fallback(msk) && subflow_is_done(sk) && msk->first == sk &&
+           mptcp_update_rcv_data_fin(msk, READ_ONCE(msk->ack_seq), true))
+               mptcp_schedule_work(parent);
 }
 
 void mptcp_subflow_queue_clean(struct sock *listener_sk, struct sock *listener_ssk)
index f0783e4..5f76ae8 100644 (file)
@@ -711,9 +711,11 @@ void nf_conntrack_destroy(struct nf_conntrack *nfct)
 
        rcu_read_lock();
        ct_hook = rcu_dereference(nf_ct_hook);
-       BUG_ON(ct_hook == NULL);
-       ct_hook->destroy(nfct);
+       if (ct_hook)
+               ct_hook->destroy(nfct);
        rcu_read_unlock();
+
+       WARN_ON(!ct_hook);
 }
 EXPORT_SYMBOL(nf_conntrack_destroy);
 
index 46ebee9..9a6b647 100644 (file)
@@ -1694,6 +1694,14 @@ call_ad(struct net *net, struct sock *ctnl, struct sk_buff *skb,
        bool eexist = flags & IPSET_FLAG_EXIST, retried = false;
 
        do {
+               if (retried) {
+                       __ip_set_get(set);
+                       nfnl_unlock(NFNL_SUBSYS_IPSET);
+                       cond_resched();
+                       nfnl_lock(NFNL_SUBSYS_IPSET);
+                       __ip_set_put(set);
+               }
+
                ip_set_lock(set);
                ret = set->variant->uadt(set, tb, adt, &lineno, flags, retried);
                ip_set_unlock(set);
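
The retry branch added to call_ad() above takes a temporary reference on the set, drops the nfnetlink mutex, reschedules, and then re-acquires the mutex and drops the reference, presumably so long retry loops do not monopolise the mutex while the reference keeps the set alive across the unlocked window. A generic sketch of that pattern with hypothetical names:

#include <linux/kernel.h>
#include <linux/kref.h>
#include <linux/mutex.h>
#include <linux/sched.h>
#include <linux/slab.h>

struct resource {
	struct kref ref;
	struct mutex *subsys_lock;	/* hypothetical subsystem-wide mutex */
};

static void resource_release(struct kref *ref)
{
	kfree(container_of(ref, struct resource, ref));
}

/* Yield the subsystem mutex before retrying a long-running operation; the
 * temporary reference keeps the object from being freed while the mutex is
 * dropped.
 */
static void resource_reschedule_point(struct resource *res)
{
	kref_get(&res->ref);
	mutex_unlock(res->subsys_lock);
	cond_resched();
	mutex_lock(res->subsys_lock);
	kref_put(&res->ref, resource_release);
}
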
index feb1d7f..a80b960 100644 (file)
@@ -1207,6 +1207,7 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
        skb->transport_header = skb->network_header;
 
        skb_set_inner_ipproto(skb, next_protocol);
+       skb_set_inner_mac_header(skb, skb_inner_network_offset(skb));
 
        if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE) {
                bool check = false;
@@ -1349,6 +1350,7 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
        skb->transport_header = skb->network_header;
 
        skb_set_inner_ipproto(skb, next_protocol);
+       skb_set_inner_mac_header(skb, skb_inner_network_offset(skb));
 
        if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE) {
                bool check = false;
index c4ccfec..d119f1d 100644 (file)
@@ -2260,6 +2260,9 @@ static int nf_confirm_cthelper(struct sk_buff *skb, struct nf_conn *ct,
                return 0;
 
        helper = rcu_dereference(help->helper);
+       if (!helper)
+               return 0;
+
        if (!(helper->flags & NF_CT_HELPER_F_USERSPACE))
                return 0;
 
index d40544c..69c8c8c 100644 (file)
@@ -2976,7 +2976,9 @@ nla_put_failure:
        return -1;
 }
 
+#if IS_ENABLED(CONFIG_NF_NAT)
 static const union nf_inet_addr any_addr;
+#endif
 
 static __be32 nf_expect_get_id(const struct nf_conntrack_expect *exp)
 {
@@ -3460,10 +3462,12 @@ ctnetlink_change_expect(struct nf_conntrack_expect *x,
        return 0;
 }
 
+#if IS_ENABLED(CONFIG_NF_NAT)
 static const struct nla_policy exp_nat_nla_policy[CTA_EXPECT_NAT_MAX+1] = {
        [CTA_EXPECT_NAT_DIR]    = { .type = NLA_U32 },
        [CTA_EXPECT_NAT_TUPLE]  = { .type = NLA_NESTED },
 };
+#endif
 
 static int
 ctnetlink_parse_expect_nat(const struct nlattr *attr,
index 57f6724..169e16f 100644 (file)
@@ -1218,11 +1218,12 @@ static int __init nf_conntrack_standalone_init(void)
        nf_conntrack_htable_size_user = nf_conntrack_htable_size;
 #endif
 
+       nf_conntrack_init_end();
+
        ret = register_pernet_subsys(&nf_conntrack_net_ops);
        if (ret < 0)
                goto out_pernet;
 
-       nf_conntrack_init_end();
        return 0;
 
 out_pernet:
index 04bd0ed..b0ef48b 100644 (file)
@@ -317,12 +317,12 @@ int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
 EXPORT_SYMBOL_GPL(flow_offload_add);
 
 void flow_offload_refresh(struct nf_flowtable *flow_table,
-                         struct flow_offload *flow)
+                         struct flow_offload *flow, bool force)
 {
        u32 timeout;
 
        timeout = nf_flowtable_time_stamp + flow_offload_get_timeout(flow);
-       if (timeout - READ_ONCE(flow->timeout) > HZ)
+       if (force || timeout - READ_ONCE(flow->timeout) > HZ)
                WRITE_ONCE(flow->timeout, timeout);
        else
                return;
@@ -334,6 +334,12 @@ void flow_offload_refresh(struct nf_flowtable *flow_table,
 }
 EXPORT_SYMBOL_GPL(flow_offload_refresh);
 
+static bool nf_flow_is_outdated(const struct flow_offload *flow)
+{
+       return test_bit(IPS_SEEN_REPLY_BIT, &flow->ct->status) &&
+               !test_bit(NF_FLOW_HW_ESTABLISHED, &flow->flags);
+}
+
 static inline bool nf_flow_has_expired(const struct flow_offload *flow)
 {
        return nf_flow_timeout_delta(flow->timeout) <= 0;
@@ -423,7 +429,8 @@ static void nf_flow_offload_gc_step(struct nf_flowtable *flow_table,
                                    struct flow_offload *flow, void *data)
 {
        if (nf_flow_has_expired(flow) ||
-           nf_ct_is_dying(flow->ct))
+           nf_ct_is_dying(flow->ct) ||
+           nf_flow_is_outdated(flow))
                flow_offload_teardown(flow);
 
        if (test_bit(NF_FLOW_TEARDOWN, &flow->flags)) {
index 19efba1..3bbaf9c 100644 (file)
@@ -384,7 +384,7 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
        if (skb_try_make_writable(skb, thoff + hdrsize))
                return NF_DROP;
 
-       flow_offload_refresh(flow_table, flow);
+       flow_offload_refresh(flow_table, flow, false);
 
        nf_flow_encap_pop(skb, tuplehash);
        thoff -= offset;
@@ -650,7 +650,7 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
        if (skb_try_make_writable(skb, thoff + hdrsize))
                return NF_DROP;
 
-       flow_offload_refresh(flow_table, flow);
+       flow_offload_refresh(flow_table, flow, false);
 
        nf_flow_encap_pop(skb, tuplehash);
 
index 59fb832..4c7937f 100644 (file)
@@ -151,6 +151,7 @@ static struct nft_trans *nft_trans_alloc_gfp(const struct nft_ctx *ctx,
                return NULL;
 
        INIT_LIST_HEAD(&trans->list);
+       INIT_LIST_HEAD(&trans->binding_list);
        trans->msg_type = msg_type;
        trans->ctx      = *ctx;
 
@@ -163,13 +164,20 @@ static struct nft_trans *nft_trans_alloc(const struct nft_ctx *ctx,
        return nft_trans_alloc_gfp(ctx, msg_type, size, GFP_KERNEL);
 }
 
-static void nft_trans_destroy(struct nft_trans *trans)
+static void nft_trans_list_del(struct nft_trans *trans)
 {
        list_del(&trans->list);
+       list_del(&trans->binding_list);
+}
+
+static void nft_trans_destroy(struct nft_trans *trans)
+{
+       nft_trans_list_del(trans);
        kfree(trans);
 }
 
-static void nft_set_trans_bind(const struct nft_ctx *ctx, struct nft_set *set)
+static void __nft_set_trans_bind(const struct nft_ctx *ctx, struct nft_set *set,
+                                bool bind)
 {
        struct nftables_pernet *nft_net;
        struct net *net = ctx->net;
@@ -183,16 +191,80 @@ static void nft_set_trans_bind(const struct nft_ctx *ctx, struct nft_set *set)
                switch (trans->msg_type) {
                case NFT_MSG_NEWSET:
                        if (nft_trans_set(trans) == set)
-                               nft_trans_set_bound(trans) = true;
+                               nft_trans_set_bound(trans) = bind;
                        break;
                case NFT_MSG_NEWSETELEM:
                        if (nft_trans_elem_set(trans) == set)
-                               nft_trans_elem_set_bound(trans) = true;
+                               nft_trans_elem_set_bound(trans) = bind;
+                       break;
+               }
+       }
+}
+
+static void nft_set_trans_bind(const struct nft_ctx *ctx, struct nft_set *set)
+{
+       return __nft_set_trans_bind(ctx, set, true);
+}
+
+static void nft_set_trans_unbind(const struct nft_ctx *ctx, struct nft_set *set)
+{
+       return __nft_set_trans_bind(ctx, set, false);
+}
+
+static void __nft_chain_trans_bind(const struct nft_ctx *ctx,
+                                  struct nft_chain *chain, bool bind)
+{
+       struct nftables_pernet *nft_net;
+       struct net *net = ctx->net;
+       struct nft_trans *trans;
+
+       if (!nft_chain_binding(chain))
+               return;
+
+       nft_net = nft_pernet(net);
+       list_for_each_entry_reverse(trans, &nft_net->commit_list, list) {
+               switch (trans->msg_type) {
+               case NFT_MSG_NEWCHAIN:
+                       if (nft_trans_chain(trans) == chain)
+                               nft_trans_chain_bound(trans) = bind;
+                       break;
+               case NFT_MSG_NEWRULE:
+                       if (trans->ctx.chain == chain)
+                               nft_trans_rule_bound(trans) = bind;
                        break;
                }
        }
 }
 
+static void nft_chain_trans_bind(const struct nft_ctx *ctx,
+                                struct nft_chain *chain)
+{
+       __nft_chain_trans_bind(ctx, chain, true);
+}
+
+int nf_tables_bind_chain(const struct nft_ctx *ctx, struct nft_chain *chain)
+{
+       if (!nft_chain_binding(chain))
+               return 0;
+
+       if (nft_chain_binding(ctx->chain))
+               return -EOPNOTSUPP;
+
+       if (chain->bound)
+               return -EBUSY;
+
+       chain->bound = true;
+       chain->use++;
+       nft_chain_trans_bind(ctx, chain);
+
+       return 0;
+}
+
+void nf_tables_unbind_chain(const struct nft_ctx *ctx, struct nft_chain *chain)
+{
+       __nft_chain_trans_bind(ctx, chain, false);
+}
+
 static int nft_netdev_register_hooks(struct net *net,
                                     struct list_head *hook_list)
 {
@@ -292,6 +364,19 @@ static void nft_trans_commit_list_add_tail(struct net *net, struct nft_trans *tr
 {
        struct nftables_pernet *nft_net = nft_pernet(net);
 
+       switch (trans->msg_type) {
+       case NFT_MSG_NEWSET:
+               if (!nft_trans_set_update(trans) &&
+                   nft_set_is_anonymous(nft_trans_set(trans)))
+                       list_add_tail(&trans->binding_list, &nft_net->binding_list);
+               break;
+       case NFT_MSG_NEWCHAIN:
+               if (!nft_trans_chain_update(trans) &&
+                   nft_chain_binding(nft_trans_chain(trans)))
+                       list_add_tail(&trans->binding_list, &nft_net->binding_list);
+               break;
+       }
+
        list_add_tail(&trans->list, &nft_net->commit_list);
 }
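
The new binding_list above lets a transaction sit on the global commit list and, when it creates a bindable object (an anonymous set or a binding chain), on a second per-commit binding list at the same time; the extra list_head is always initialised so it can be deleted unconditionally in nft_trans_list_del(). A minimal sketch of keeping one object on two lists, with hypothetical names:

#include <linux/list.h>
#include <linux/slab.h>

/* Hypothetical transaction object that, like struct nft_trans after this
 * change, can be linked on the commit list and, for bindable objects, on a
 * second binding list via a dedicated list_head.
 */
struct txn {
	struct list_head list;		/* commit list linkage */
	struct list_head binding_list;	/* binding list linkage, may stay unused */
	bool bindable;
};

static LIST_HEAD(commit_list);
static LIST_HEAD(binding_list);

static struct txn *txn_alloc(bool bindable)
{
	struct txn *t = kzalloc(sizeof(*t), GFP_KERNEL);

	if (!t)
		return NULL;

	INIT_LIST_HEAD(&t->list);
	/* initialised even when never queued, so the unconditional
	 * list_del() in txn_free() stays safe
	 */
	INIT_LIST_HEAD(&t->binding_list);
	t->bindable = bindable;
	return t;
}

static void txn_queue(struct txn *t)
{
	if (t->bindable)
		list_add_tail(&t->binding_list, &binding_list);
	list_add_tail(&t->list, &commit_list);
}

static void txn_free(struct txn *t)
{
	list_del(&t->list);
	list_del(&t->binding_list);	/* harmless for a head that was never added */
	kfree(t);
}
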
 
@@ -338,8 +423,9 @@ static struct nft_trans *nft_trans_chain_add(struct nft_ctx *ctx, int msg_type)
                                ntohl(nla_get_be32(ctx->nla[NFTA_CHAIN_ID]));
                }
        }
-
+       nft_trans_chain(trans) = ctx->chain;
        nft_trans_commit_list_add_tail(ctx->net, trans);
+
        return trans;
 }
 
@@ -357,8 +443,7 @@ static int nft_delchain(struct nft_ctx *ctx)
        return 0;
 }
 
-static void nft_rule_expr_activate(const struct nft_ctx *ctx,
-                                  struct nft_rule *rule)
+void nft_rule_expr_activate(const struct nft_ctx *ctx, struct nft_rule *rule)
 {
        struct nft_expr *expr;
 
@@ -371,9 +456,8 @@ static void nft_rule_expr_activate(const struct nft_ctx *ctx,
        }
 }
 
-static void nft_rule_expr_deactivate(const struct nft_ctx *ctx,
-                                    struct nft_rule *rule,
-                                    enum nft_trans_phase phase)
+void nft_rule_expr_deactivate(const struct nft_ctx *ctx, struct nft_rule *rule,
+                             enum nft_trans_phase phase)
 {
        struct nft_expr *expr;
 
@@ -495,6 +579,58 @@ static int nft_trans_set_add(const struct nft_ctx *ctx, int msg_type,
        return __nft_trans_set_add(ctx, msg_type, set, NULL);
 }
 
+static void nft_setelem_data_deactivate(const struct net *net,
+                                       const struct nft_set *set,
+                                       struct nft_set_elem *elem);
+
+static int nft_mapelem_deactivate(const struct nft_ctx *ctx,
+                                 struct nft_set *set,
+                                 const struct nft_set_iter *iter,
+                                 struct nft_set_elem *elem)
+{
+       nft_setelem_data_deactivate(ctx->net, set, elem);
+
+       return 0;
+}
+
+struct nft_set_elem_catchall {
+       struct list_head        list;
+       struct rcu_head         rcu;
+       void                    *elem;
+};
+
+static void nft_map_catchall_deactivate(const struct nft_ctx *ctx,
+                                       struct nft_set *set)
+{
+       u8 genmask = nft_genmask_next(ctx->net);
+       struct nft_set_elem_catchall *catchall;
+       struct nft_set_elem elem;
+       struct nft_set_ext *ext;
+
+       list_for_each_entry(catchall, &set->catchall_list, list) {
+               ext = nft_set_elem_ext(set, catchall->elem);
+               if (!nft_set_elem_active(ext, genmask))
+                       continue;
+
+               elem.priv = catchall->elem;
+               nft_setelem_data_deactivate(ctx->net, set, &elem);
+               break;
+       }
+}
+
+static void nft_map_deactivate(const struct nft_ctx *ctx, struct nft_set *set)
+{
+       struct nft_set_iter iter = {
+               .genmask        = nft_genmask_next(ctx->net),
+               .fn             = nft_mapelem_deactivate,
+       };
+
+       set->ops->walk(ctx, set, &iter);
+       WARN_ON_ONCE(iter.err);
+
+       nft_map_catchall_deactivate(ctx, set);
+}
+
 static int nft_delset(const struct nft_ctx *ctx, struct nft_set *set)
 {
        int err;
@@ -503,6 +639,9 @@ static int nft_delset(const struct nft_ctx *ctx, struct nft_set *set)
        if (err < 0)
                return err;
 
+       if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT))
+               nft_map_deactivate(ctx, set);
+
        nft_deactivate_next(ctx->net, set);
        ctx->table->use--;
 
@@ -1600,6 +1739,8 @@ static int nft_dump_basechain_hook(struct sk_buff *skb, int family,
 
        if (nft_base_chain_netdev(family, ops->hooknum)) {
                nest_devs = nla_nest_start_noflag(skb, NFTA_HOOK_DEVS);
+               if (!nest_devs)
+                       goto nla_put_failure;
 
                if (!hook_list)
                        hook_list = &basechain->hook_list;
@@ -2224,7 +2365,7 @@ static int nft_basechain_init(struct nft_base_chain *basechain, u8 family,
        return 0;
 }
 
-static int nft_chain_add(struct nft_table *table, struct nft_chain *chain)
+int nft_chain_add(struct nft_table *table, struct nft_chain *chain)
 {
        int err;
 
@@ -2526,6 +2667,8 @@ static int nf_tables_updchain(struct nft_ctx *ctx, u8 genmask, u8 policy,
        nft_trans_basechain(trans) = basechain;
        INIT_LIST_HEAD(&nft_trans_chain_hooks(trans));
        list_splice(&hook.list, &nft_trans_chain_hooks(trans));
+       if (nla[NFTA_CHAIN_HOOK])
+               module_put(hook.type->owner);
 
        nft_trans_commit_list_add_tail(ctx->net, trans);
 
@@ -2668,21 +2811,18 @@ static int nf_tables_newchain(struct sk_buff *skb, const struct nfnl_info *info,
        return nf_tables_addchain(&ctx, family, genmask, policy, flags, extack);
 }
 
-static int nft_delchain_hook(struct nft_ctx *ctx, struct nft_chain *chain,
+static int nft_delchain_hook(struct nft_ctx *ctx,
+                            struct nft_base_chain *basechain,
                             struct netlink_ext_ack *extack)
 {
+       const struct nft_chain *chain = &basechain->chain;
        const struct nlattr * const *nla = ctx->nla;
        struct nft_chain_hook chain_hook = {};
-       struct nft_base_chain *basechain;
        struct nft_hook *this, *hook;
        LIST_HEAD(chain_del_list);
        struct nft_trans *trans;
        int err;
 
-       if (!nft_is_base_chain(chain))
-               return -EOPNOTSUPP;
-
-       basechain = nft_base_chain(chain);
        err = nft_chain_parse_hook(ctx->net, basechain, nla, &chain_hook,
                                   ctx->family, chain->flags, extack);
        if (err < 0)
@@ -2767,7 +2907,12 @@ static int nf_tables_delchain(struct sk_buff *skb, const struct nfnl_info *info,
                if (chain->flags & NFT_CHAIN_HW_OFFLOAD)
                        return -EOPNOTSUPP;
 
-               return nft_delchain_hook(&ctx, chain, extack);
+               if (nft_is_base_chain(chain)) {
+                       struct nft_base_chain *basechain = nft_base_chain(chain);
+
+                       if (nft_base_chain_netdev(table->family, basechain->ops.hooknum))
+                               return nft_delchain_hook(&ctx, basechain, extack);
+               }
        }
 
        if (info->nlh->nlmsg_flags & NLM_F_NONREC &&
@@ -3488,8 +3633,7 @@ err_fill_rule_info:
        return err;
 }
 
-static void nf_tables_rule_destroy(const struct nft_ctx *ctx,
-                                  struct nft_rule *rule)
+void nf_tables_rule_destroy(const struct nft_ctx *ctx, struct nft_rule *rule)
 {
        struct nft_expr *expr, *next;
 
@@ -3506,7 +3650,7 @@ static void nf_tables_rule_destroy(const struct nft_ctx *ctx,
        kfree(rule);
 }
 
-void nf_tables_rule_release(const struct nft_ctx *ctx, struct nft_rule *rule)
+static void nf_tables_rule_release(const struct nft_ctx *ctx, struct nft_rule *rule)
 {
        nft_rule_expr_deactivate(ctx, rule, NFT_TRANS_RELEASE);
        nf_tables_rule_destroy(ctx, rule);
@@ -3594,12 +3738,6 @@ int nft_setelem_validate(const struct nft_ctx *ctx, struct nft_set *set,
        return 0;
 }
 
-struct nft_set_elem_catchall {
-       struct list_head        list;
-       struct rcu_head         rcu;
-       void                    *elem;
-};
-
 int nft_set_catchall_validate(const struct nft_ctx *ctx, struct nft_set *set)
 {
        u8 genmask = nft_genmask_next(ctx->net);
@@ -3842,7 +3980,8 @@ err_destroy_flow_rule:
        if (flow)
                nft_flow_rule_destroy(flow);
 err_release_rule:
-       nf_tables_rule_release(&ctx, rule);
+       nft_rule_expr_deactivate(&ctx, rule, NFT_TRANS_PREPARE_ERROR);
+       nf_tables_rule_destroy(&ctx, rule);
 err_release_expr:
        for (i = 0; i < n; i++) {
                if (expr_info[i].ops) {
@@ -3865,12 +4004,10 @@ static struct nft_rule *nft_rule_lookup_byid(const struct net *net,
        struct nft_trans *trans;
 
        list_for_each_entry(trans, &nft_net->commit_list, list) {
-               struct nft_rule *rule = nft_trans_rule(trans);
-
                if (trans->msg_type == NFT_MSG_NEWRULE &&
                    trans->ctx.chain == chain &&
                    id == nft_trans_rule_id(trans))
-                       return rule;
+                       return nft_trans_rule(trans);
        }
        return ERR_PTR(-ENOENT);
 }
@@ -4776,6 +4913,9 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info,
                if (!(flags & NFT_SET_TIMEOUT))
                        return -EINVAL;
 
+               if (flags & NFT_SET_ANONYMOUS)
+                       return -EOPNOTSUPP;
+
                err = nf_msecs_to_jiffies64(nla[NFTA_SET_TIMEOUT], &desc.timeout);
                if (err)
                        return err;
@@ -4784,6 +4924,10 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info,
        if (nla[NFTA_SET_GC_INTERVAL] != NULL) {
                if (!(flags & NFT_SET_TIMEOUT))
                        return -EINVAL;
+
+               if (flags & NFT_SET_ANONYMOUS)
+                       return -EOPNOTSUPP;
+
                desc.gc_int = ntohl(nla_get_be32(nla[NFTA_SET_GC_INTERVAL]));
        }
 
@@ -4830,6 +4974,9 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info,
                if (info->nlh->nlmsg_flags & NLM_F_REPLACE)
                        return -EOPNOTSUPP;
 
+               if (nft_set_is_anonymous(set))
+                       return -EOPNOTSUPP;
+
                err = nft_set_expr_alloc(&ctx, set, nla, exprs, &num_exprs, flags);
                if (err < 0)
                        return err;
@@ -4919,6 +5066,7 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info,
 
        set->num_exprs = num_exprs;
        set->handle = nf_tables_alloc_handle(table);
+       INIT_LIST_HEAD(&set->pending_update);
 
        err = nft_trans_set_add(&ctx, NFT_MSG_NEWSET, set);
        if (err < 0)
@@ -4932,7 +5080,7 @@ err_set_expr_alloc:
        for (i = 0; i < set->num_exprs; i++)
                nft_expr_destroy(&ctx, set->exprs[i]);
 err_set_destroy:
-       ops->destroy(set);
+       ops->destroy(&ctx, set);
 err_set_init:
        kfree(set->name);
 err_set_name:
@@ -4947,7 +5095,7 @@ static void nft_set_catchall_destroy(const struct nft_ctx *ctx,
 
        list_for_each_entry_safe(catchall, next, &set->catchall_list, list) {
                list_del_rcu(&catchall->list);
-               nft_set_elem_destroy(set, catchall->elem, true);
+               nf_tables_set_elem_destroy(ctx, set, catchall->elem);
                kfree_rcu(catchall, rcu);
        }
 }
@@ -4962,7 +5110,7 @@ static void nft_set_destroy(const struct nft_ctx *ctx, struct nft_set *set)
        for (i = 0; i < set->num_exprs; i++)
                nft_expr_destroy(ctx, set->exprs[i]);
 
-       set->ops->destroy(set);
+       set->ops->destroy(ctx, set);
        nft_set_catchall_destroy(ctx, set);
        kfree(set->name);
        kvfree(set);
@@ -5127,10 +5275,60 @@ static void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
        }
 }
 
+static void nft_setelem_data_activate(const struct net *net,
+                                     const struct nft_set *set,
+                                     struct nft_set_elem *elem);
+
+static int nft_mapelem_activate(const struct nft_ctx *ctx,
+                               struct nft_set *set,
+                               const struct nft_set_iter *iter,
+                               struct nft_set_elem *elem)
+{
+       nft_setelem_data_activate(ctx->net, set, elem);
+
+       return 0;
+}
+
+static void nft_map_catchall_activate(const struct nft_ctx *ctx,
+                                     struct nft_set *set)
+{
+       u8 genmask = nft_genmask_next(ctx->net);
+       struct nft_set_elem_catchall *catchall;
+       struct nft_set_elem elem;
+       struct nft_set_ext *ext;
+
+       list_for_each_entry(catchall, &set->catchall_list, list) {
+               ext = nft_set_elem_ext(set, catchall->elem);
+               if (!nft_set_elem_active(ext, genmask))
+                       continue;
+
+               elem.priv = catchall->elem;
+               nft_setelem_data_activate(ctx->net, set, &elem);
+               break;
+       }
+}
+
+static void nft_map_activate(const struct nft_ctx *ctx, struct nft_set *set)
+{
+       struct nft_set_iter iter = {
+               .genmask        = nft_genmask_next(ctx->net),
+               .fn             = nft_mapelem_activate,
+       };
+
+       set->ops->walk(ctx, set, &iter);
+       WARN_ON_ONCE(iter.err);
+
+       nft_map_catchall_activate(ctx, set);
+}
+
 void nf_tables_activate_set(const struct nft_ctx *ctx, struct nft_set *set)
 {
-       if (nft_set_is_anonymous(set))
+       if (nft_set_is_anonymous(set)) {
+               if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT))
+                       nft_map_activate(ctx, set);
+
                nft_clear(ctx->net, set);
+       }
 
        set->use++;
 }
@@ -5141,14 +5339,28 @@ void nf_tables_deactivate_set(const struct nft_ctx *ctx, struct nft_set *set,
                              enum nft_trans_phase phase)
 {
        switch (phase) {
-       case NFT_TRANS_PREPARE:
+       case NFT_TRANS_PREPARE_ERROR:
+               nft_set_trans_unbind(ctx, set);
                if (nft_set_is_anonymous(set))
                        nft_deactivate_next(ctx->net, set);
 
                set->use--;
+               break;
+       case NFT_TRANS_PREPARE:
+               if (nft_set_is_anonymous(set)) {
+                       if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT))
+                               nft_map_deactivate(ctx, set);
+
+                       nft_deactivate_next(ctx->net, set);
+               }
+               set->use--;
                return;
        case NFT_TRANS_ABORT:
        case NFT_TRANS_RELEASE:
+               if (nft_set_is_anonymous(set) &&
+                   set->flags & (NFT_SET_MAP | NFT_SET_OBJECT))
+                       nft_map_deactivate(ctx, set);
+
                set->use--;
                fallthrough;
        default:
@@ -5901,6 +6113,7 @@ static void nft_set_elem_expr_destroy(const struct nft_ctx *ctx,
                __nft_set_elem_expr_destroy(ctx, expr);
 }
 
+/* Drop references and destroy. Called from gc, dynset and abort path. */
 void nft_set_elem_destroy(const struct nft_set *set, void *elem,
                          bool destroy_expr)
 {
@@ -5922,11 +6135,11 @@ void nft_set_elem_destroy(const struct nft_set *set, void *elem,
 }
 EXPORT_SYMBOL_GPL(nft_set_elem_destroy);
 
-/* Only called from commit path, nft_setelem_data_deactivate() already deals
- * with the refcounting from the preparation phase.
+/* Destroy element. References have been already dropped in the preparation
+ * path via nft_setelem_data_deactivate().
  */
-static void nf_tables_set_elem_destroy(const struct nft_ctx *ctx,
-                                      const struct nft_set *set, void *elem)
+void nf_tables_set_elem_destroy(const struct nft_ctx *ctx,
+                               const struct nft_set *set, void *elem)
 {
        struct nft_set_ext *ext = nft_set_elem_ext(set, elem);
 
@@ -6489,19 +6702,19 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
        if (flags)
                *nft_set_ext_flags(ext) = flags;
 
+       if (obj) {
+               *nft_set_ext_obj(ext) = obj;
+               obj->use++;
+       }
        if (ulen > 0) {
                if (nft_set_ext_check(&tmpl, NFT_SET_EXT_USERDATA, ulen) < 0) {
                        err = -EINVAL;
-                       goto err_elem_userdata;
+                       goto err_elem_free;
                }
                udata = nft_set_ext_userdata(ext);
                udata->len = ulen - 1;
                nla_memcpy(&udata->data, nla[NFTA_SET_ELEM_USERDATA], ulen);
        }
-       if (obj) {
-               *nft_set_ext_obj(ext) = obj;
-               obj->use++;
-       }
        err = nft_set_elem_expr_setup(ctx, &tmpl, ext, expr_array, num_exprs);
        if (err < 0)
                goto err_elem_free;
@@ -6556,10 +6769,7 @@ err_set_full:
 err_element_clash:
        kfree(trans);
 err_elem_free:
-       if (obj)
-               obj->use--;
-err_elem_userdata:
-       nf_tables_set_elem_destroy(ctx, set, elem.priv);
+       nft_set_elem_destroy(set, elem.priv, true);
 err_parse_data:
        if (nla[NFTA_SET_ELEM_DATA] != NULL)
                nft_data_release(&elem.data.val, desc.type);
@@ -6603,7 +6813,8 @@ static int nf_tables_newsetelem(struct sk_buff *skb,
        if (IS_ERR(set))
                return PTR_ERR(set);
 
-       if (!list_empty(&set->bindings) && set->flags & NFT_SET_CONSTANT)
+       if (!list_empty(&set->bindings) &&
+           (set->flags & (NFT_SET_CONSTANT | NFT_SET_ANONYMOUS)))
                return -EBUSY;
 
        nft_ctx_init(&ctx, net, skb, info->nlh, family, table, NULL, nla);
@@ -6636,7 +6847,6 @@ static int nf_tables_newsetelem(struct sk_buff *skb,
 void nft_data_hold(const struct nft_data *data, enum nft_data_types type)
 {
        struct nft_chain *chain;
-       struct nft_rule *rule;
 
        if (type == NFT_DATA_VERDICT) {
                switch (data->verdict.code) {
@@ -6644,15 +6854,6 @@ void nft_data_hold(const struct nft_data *data, enum nft_data_types type)
                case NFT_GOTO:
                        chain = data->verdict.chain;
                        chain->use++;
-
-                       if (!nft_chain_is_bound(chain))
-                               break;
-
-                       chain->table->use++;
-                       list_for_each_entry(rule, &chain->rules, list)
-                               chain->use++;
-
-                       nft_chain_add(chain->table, chain);
                        break;
                }
        }
@@ -6887,7 +7088,9 @@ static int nf_tables_delsetelem(struct sk_buff *skb,
        set = nft_set_lookup(table, nla[NFTA_SET_ELEM_LIST_SET], genmask);
        if (IS_ERR(set))
                return PTR_ERR(set);
-       if (!list_empty(&set->bindings) && set->flags & NFT_SET_CONSTANT)
+
+       if (!list_empty(&set->bindings) &&
+           (set->flags & (NFT_SET_CONSTANT | NFT_SET_ANONYMOUS)))
                return -EBUSY;
 
        nft_ctx_init(&ctx, net, skb, info->nlh, family, table, NULL, nla);
@@ -7669,6 +7872,7 @@ void nf_tables_deactivate_flowtable(const struct nft_ctx *ctx,
                                    enum nft_trans_phase phase)
 {
        switch (phase) {
+       case NFT_TRANS_PREPARE_ERROR:
        case NFT_TRANS_PREPARE:
        case NFT_TRANS_ABORT:
        case NFT_TRANS_RELEASE:
@@ -8941,7 +9145,7 @@ static void nf_tables_trans_destroy_work(struct work_struct *w)
        synchronize_rcu();
 
        list_for_each_entry_safe(trans, next, &head, list) {
-               list_del(&trans->list);
+               nft_trans_list_del(trans);
                nft_commit_release(trans);
        }
 }
@@ -9007,7 +9211,7 @@ static int nf_tables_commit_chain_prepare(struct net *net, struct nft_chain *cha
                                continue;
                        }
 
-                       if (WARN_ON_ONCE(data + expr->ops->size > data_boundary))
+                       if (WARN_ON_ONCE(data + size + expr->ops->size > data_boundary))
                                return -ENOMEM;
 
                        memcpy(data + size, expr, expr->ops->size);
@@ -9275,10 +9479,25 @@ static void nf_tables_commit_audit_log(struct list_head *adl, u32 generation)
        }
 }
 
+static void nft_set_commit_update(struct list_head *set_update_list)
+{
+       struct nft_set *set, *next;
+
+       list_for_each_entry_safe(set, next, set_update_list, pending_update) {
+               list_del_init(&set->pending_update);
+
+               if (!set->ops->commit)
+                       continue;
+
+               set->ops->commit(set);
+       }
+}
+
 static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 {
        struct nftables_pernet *nft_net = nft_pernet(net);
        struct nft_trans *trans, *next;
+       LIST_HEAD(set_update_list);
        struct nft_trans_elem *te;
        struct nft_chain *chain;
        struct nft_table *table;
@@ -9291,6 +9510,27 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
                return 0;
        }
 
+       list_for_each_entry(trans, &nft_net->binding_list, binding_list) {
+               switch (trans->msg_type) {
+               case NFT_MSG_NEWSET:
+                       if (!nft_trans_set_update(trans) &&
+                           nft_set_is_anonymous(nft_trans_set(trans)) &&
+                           !nft_trans_set_bound(trans)) {
+                               pr_warn_once("nftables ruleset with unbound set\n");
+                               return -EINVAL;
+                       }
+                       break;
+               case NFT_MSG_NEWCHAIN:
+                       if (!nft_trans_chain_update(trans) &&
+                           nft_chain_binding(nft_trans_chain(trans)) &&
+                           !nft_trans_chain_bound(trans)) {
+                               pr_warn_once("nftables ruleset with unbound chain\n");
+                               return -EINVAL;
+                       }
+                       break;
+               }
+       }
+
        /* 0. Validate ruleset, otherwise roll back for error reporting. */
        if (nf_tables_validate(net) < 0)
                return -EAGAIN;
@@ -9453,6 +9693,11 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
                        nf_tables_setelem_notify(&trans->ctx, te->set,
                                                 &te->elem,
                                                 NFT_MSG_NEWSETELEM);
+                       if (te->set->ops->commit &&
+                           list_empty(&te->set->pending_update)) {
+                               list_add_tail(&te->set->pending_update,
+                                             &set_update_list);
+                       }
                        nft_trans_destroy(trans);
                        break;
                case NFT_MSG_DELSETELEM:
@@ -9467,6 +9712,11 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
                                atomic_dec(&te->set->nelems);
                                te->set->ndeact--;
                        }
+                       if (te->set->ops->commit &&
+                           list_empty(&te->set->pending_update)) {
+                               list_add_tail(&te->set->pending_update,
+                                             &set_update_list);
+                       }
                        break;
                case NFT_MSG_NEWOBJ:
                        if (nft_trans_obj_update(trans)) {
@@ -9529,6 +9779,8 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
                }
        }
 
+       nft_set_commit_update(&set_update_list);
+
        nft_commit_notify(net, NETLINK_CB(skb).portid);
        nf_tables_gen_notify(net, skb, NFT_MSG_NEWGEN);
        nf_tables_commit_audit_log(&adl, nft_net->base_seq);
@@ -9588,10 +9840,25 @@ static void nf_tables_abort_release(struct nft_trans *trans)
        kfree(trans);
 }
 
+static void nft_set_abort_update(struct list_head *set_update_list)
+{
+       struct nft_set *set, *next;
+
+       list_for_each_entry_safe(set, next, set_update_list, pending_update) {
+               list_del_init(&set->pending_update);
+
+               if (!set->ops->abort)
+                       continue;
+
+               set->ops->abort(set);
+       }
+}
+
 static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
 {
        struct nftables_pernet *nft_net = nft_pernet(net);
        struct nft_trans *trans, *next;
+       LIST_HEAD(set_update_list);
        struct nft_trans_elem *te;
 
        if (action == NFNL_ABORT_VALIDATE &&
@@ -9633,7 +9900,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
                                kfree(nft_trans_chain_name(trans));
                                nft_trans_destroy(trans);
                        } else {
-                               if (nft_chain_is_bound(trans->ctx.chain)) {
+                               if (nft_trans_chain_bound(trans)) {
                                        nft_trans_destroy(trans);
                                        break;
                                }
@@ -9656,6 +9923,10 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
                        nft_trans_destroy(trans);
                        break;
                case NFT_MSG_NEWRULE:
+                       if (nft_trans_rule_bound(trans)) {
+                               nft_trans_destroy(trans);
+                               break;
+                       }
                        trans->ctx.chain->use--;
                        list_del_rcu(&nft_trans_rule(trans)->list);
                        nft_rule_expr_deactivate(&trans->ctx,
@@ -9690,6 +9961,9 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
                case NFT_MSG_DESTROYSET:
                        trans->ctx.table->use++;
                        nft_clear(trans->ctx.net, nft_trans_set(trans));
+                       if (nft_trans_set(trans)->flags & (NFT_SET_MAP | NFT_SET_OBJECT))
+                               nft_map_activate(&trans->ctx, nft_trans_set(trans));
+
                        nft_trans_destroy(trans);
                        break;
                case NFT_MSG_NEWSETELEM:
@@ -9701,6 +9975,12 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
                        nft_setelem_remove(net, te->set, &te->elem);
                        if (!nft_setelem_is_catchall(te->set, &te->elem))
                                atomic_dec(&te->set->nelems);
+
+                       if (te->set->ops->abort &&
+                           list_empty(&te->set->pending_update)) {
+                               list_add_tail(&te->set->pending_update,
+                                             &set_update_list);
+                       }
                        break;
                case NFT_MSG_DELSETELEM:
                case NFT_MSG_DESTROYSETELEM:
@@ -9711,6 +9991,11 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
                        if (!nft_setelem_is_catchall(te->set, &te->elem))
                                te->set->ndeact--;
 
+                       if (te->set->ops->abort &&
+                           list_empty(&te->set->pending_update)) {
+                               list_add_tail(&te->set->pending_update,
+                                             &set_update_list);
+                       }
                        nft_trans_destroy(trans);
                        break;
                case NFT_MSG_NEWOBJ:
@@ -9753,11 +10038,13 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
                }
        }
 
+       nft_set_abort_update(&set_update_list);
+
        synchronize_rcu();
 
        list_for_each_entry_safe_reverse(trans, next,
                                         &nft_net->commit_list, list) {
-               list_del(&trans->list);
+               nft_trans_list_del(trans);
                nf_tables_abort_release(trans);
        }
 
@@ -10206,22 +10493,12 @@ static int nft_verdict_init(const struct nft_ctx *ctx, struct nft_data *data,
 static void nft_verdict_uninit(const struct nft_data *data)
 {
        struct nft_chain *chain;
-       struct nft_rule *rule;
 
        switch (data->verdict.code) {
        case NFT_JUMP:
        case NFT_GOTO:
                chain = data->verdict.chain;
                chain->use--;
-
-               if (!nft_chain_is_bound(chain))
-                       break;
-
-               chain->table->use--;
-               list_for_each_entry(rule, &chain->rules, list)
-                       chain->use--;
-
-               nft_chain_del(chain);
                break;
        }
 }
@@ -10456,6 +10733,9 @@ static void __nft_release_table(struct net *net, struct nft_table *table)
        list_for_each_entry_safe(set, ns, &table->sets, list) {
                list_del(&set->list);
                table->use--;
+               if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT))
+                       nft_map_deactivate(&ctx, set);
+
                nft_set_destroy(&ctx, set);
        }
        list_for_each_entry_safe(obj, ne, &table->objects, list) {
@@ -10540,6 +10820,7 @@ static int __net_init nf_tables_init_net(struct net *net)
 
        INIT_LIST_HEAD(&nft_net->tables);
        INIT_LIST_HEAD(&nft_net->commit_list);
+       INIT_LIST_HEAD(&nft_net->binding_list);
        INIT_LIST_HEAD(&nft_net->module_list);
        INIT_LIST_HEAD(&nft_net->notify_list);
        mutex_init(&nft_net->commit_mutex);
index ae71464..c9fbe0f 100644 (file)
@@ -533,7 +533,8 @@ ack:
                         * processed, this avoids that the same error is
                         * reported several times when replaying the batch.
                         */
-                       if (nfnl_err_add(&err_list, nlh, err, &extack) < 0) {
+                       if (err == -ENOMEM ||
+                           nfnl_err_add(&err_list, nlh, err, &extack) < 0) {
                                /* We failed to enqueue an error, reset the
                                 * list of errors and send OOM to userspace
                                 * pointing to the batch header.
index ee6840b..8f1bfa6 100644 (file)
@@ -439,3 +439,4 @@ module_init(nfnl_osf_init);
 module_exit(nfnl_osf_fini);
 
 MODULE_LICENSE("GPL");
+MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_OSF);
index 84eae7c..2527a01 100644 (file)
@@ -323,7 +323,7 @@ static bool nft_bitwise_reduce(struct nft_regs_track *track,
        dreg = priv->dreg;
        regcount = DIV_ROUND_UP(priv->len, NFT_REG32_SIZE);
        for (i = 0; i < regcount; i++, dreg++)
-               track->regs[priv->dreg].bitwise = expr;
+               track->regs[dreg].bitwise = expr;
 
        return false;
 }
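The nft_bitwise hunk above fixes a register-tracking loop that advanced dreg but kept indexing the array with priv->dreg, so only the first destination register ever recorded the expression. A small standalone illustration of the difference; the names are invented and only mirror the shape of the loop.

    #include <stdio.h>

    #define NREGS 8

    int main(void)
    {
            int expr_obj;                           /* stand-in for the expression */
            const void *expr = &expr_obj;
            const void *regs_buggy[NREGS] = { 0 };
            const void *regs_fixed[NREGS] = { 0 };
            unsigned int base = 2, regcount = 3, dreg, i;

            /* Buggy form: dreg advances, but the array is indexed with 'base',
             * so only regs[2] ever gets the tracking pointer.
             */
            for (i = 0, dreg = base; i < regcount; i++, dreg++)
                    regs_buggy[base] = expr;

            /* Fixed form: index with the advancing destination register. */
            for (i = 0, dreg = base; i < regcount; i++, dreg++)
                    regs_fixed[dreg] = expr;

            for (i = 0; i < NREGS; i++)
                    printf("reg %u: buggy=%s fixed=%s\n", i,
                           regs_buggy[i] ? "set" : "-",
                           regs_fixed[i] ? "set" : "-");
            return 0;
    }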
index c3563f0..680fe55 100644 (file)
@@ -344,6 +344,12 @@ static void nft_netdev_event(unsigned long event, struct net_device *dev,
                return;
        }
 
+       /* UNREGISTER events are also happening on netns exit.
+        *
+        * Although nf_tables core releases all tables/chains, only this event
+        * handler provides guarantee that hook->ops.dev is still accessible,
+        * so we cannot skip exiting net namespaces.
+        */
        __nft_release_basechain(ctx);
 }
 
@@ -362,9 +368,6 @@ static int nf_tables_netdev_event(struct notifier_block *this,
            event != NETDEV_CHANGENAME)
                return NOTIFY_DONE;
 
-       if (!check_net(ctx.net))
-               return NOTIFY_DONE;
-
        nft_net = nft_pernet(ctx.net);
        mutex_lock(&nft_net->commit_mutex);
        list_for_each_entry(table, &nft_net->tables, list) {
index c9d2f7c..3d76ebf 100644 (file)
@@ -76,11 +76,9 @@ static int nft_immediate_init(const struct nft_ctx *ctx,
                switch (priv->data.verdict.code) {
                case NFT_JUMP:
                case NFT_GOTO:
-                       if (nft_chain_is_bound(chain)) {
-                               err = -EBUSY;
-                               goto err1;
-                       }
-                       chain->bound = true;
+                       err = nf_tables_bind_chain(ctx, chain);
+                       if (err < 0)
+                               return err;
                        break;
                default:
                        break;
@@ -98,6 +96,31 @@ static void nft_immediate_activate(const struct nft_ctx *ctx,
                                   const struct nft_expr *expr)
 {
        const struct nft_immediate_expr *priv = nft_expr_priv(expr);
+       const struct nft_data *data = &priv->data;
+       struct nft_ctx chain_ctx;
+       struct nft_chain *chain;
+       struct nft_rule *rule;
+
+       if (priv->dreg == NFT_REG_VERDICT) {
+               switch (data->verdict.code) {
+               case NFT_JUMP:
+               case NFT_GOTO:
+                       chain = data->verdict.chain;
+                       if (!nft_chain_binding(chain))
+                               break;
+
+                       chain_ctx = *ctx;
+                       chain_ctx.chain = chain;
+
+                       list_for_each_entry(rule, &chain->rules, list)
+                               nft_rule_expr_activate(&chain_ctx, rule);
+
+                       nft_clear(ctx->net, chain);
+                       break;
+               default:
+                       break;
+               }
+       }
 
        return nft_data_hold(&priv->data, nft_dreg_to_type(priv->dreg));
 }
@@ -107,6 +130,43 @@ static void nft_immediate_deactivate(const struct nft_ctx *ctx,
                                     enum nft_trans_phase phase)
 {
        const struct nft_immediate_expr *priv = nft_expr_priv(expr);
+       const struct nft_data *data = &priv->data;
+       struct nft_ctx chain_ctx;
+       struct nft_chain *chain;
+       struct nft_rule *rule;
+
+       if (priv->dreg == NFT_REG_VERDICT) {
+               switch (data->verdict.code) {
+               case NFT_JUMP:
+               case NFT_GOTO:
+                       chain = data->verdict.chain;
+                       if (!nft_chain_binding(chain))
+                               break;
+
+                       chain_ctx = *ctx;
+                       chain_ctx.chain = chain;
+
+                       list_for_each_entry(rule, &chain->rules, list)
+                               nft_rule_expr_deactivate(&chain_ctx, rule, phase);
+
+                       switch (phase) {
+                       case NFT_TRANS_PREPARE_ERROR:
+                               nf_tables_unbind_chain(ctx, chain);
+                               fallthrough;
+                       case NFT_TRANS_PREPARE:
+                               nft_deactivate_next(ctx->net, chain);
+                               break;
+                       default:
+                               nft_chain_del(chain);
+                               chain->bound = false;
+                               chain->table->use--;
+                               break;
+                       }
+                       break;
+               default:
+                       break;
+               }
+       }
 
        if (phase == NFT_TRANS_COMMIT)
                return;
@@ -131,15 +191,27 @@ static void nft_immediate_destroy(const struct nft_ctx *ctx,
        case NFT_GOTO:
                chain = data->verdict.chain;
 
-               if (!nft_chain_is_bound(chain))
+               if (!nft_chain_binding(chain))
+                       break;
+
+               /* Rule construction failed, but chain is already bound:
+                * let the transaction records release this chain and its rules.
+                */
+               if (chain->bound) {
+                       chain->use--;
                        break;
+               }
 
+               /* Rule has been deleted, release chain and its rules. */
                chain_ctx = *ctx;
                chain_ctx.chain = chain;
 
-               list_for_each_entry_safe(rule, n, &chain->rules, list)
-                       nf_tables_rule_release(&chain_ctx, rule);
-
+               chain->use--;
+               list_for_each_entry_safe(rule, n, &chain->rules, list) {
+                       chain->use--;
+                       list_del(&rule->list);
+                       nf_tables_rule_destroy(&chain_ctx, rule);
+               }
                nf_tables_chain_destroy(&chain_ctx);
                break;
        default:
index 96081ac..1e5e7a1 100644 (file)
@@ -271,13 +271,14 @@ static int nft_bitmap_init(const struct nft_set *set,
        return 0;
 }
 
-static void nft_bitmap_destroy(const struct nft_set *set)
+static void nft_bitmap_destroy(const struct nft_ctx *ctx,
+                              const struct nft_set *set)
 {
        struct nft_bitmap *priv = nft_set_priv(set);
        struct nft_bitmap_elem *be, *n;
 
        list_for_each_entry_safe(be, n, &priv->list, head)
-               nft_set_elem_destroy(set, be, true);
+               nf_tables_set_elem_destroy(ctx, set, be);
 }
 
 static bool nft_bitmap_estimate(const struct nft_set_desc *desc, u32 features,
index 76de6c8..0b73cb0 100644 (file)
@@ -400,19 +400,31 @@ static int nft_rhash_init(const struct nft_set *set,
        return 0;
 }
 
+struct nft_rhash_ctx {
+       const struct nft_ctx    ctx;
+       const struct nft_set    *set;
+};
+
 static void nft_rhash_elem_destroy(void *ptr, void *arg)
 {
-       nft_set_elem_destroy(arg, ptr, true);
+       struct nft_rhash_ctx *rhash_ctx = arg;
+
+       nf_tables_set_elem_destroy(&rhash_ctx->ctx, rhash_ctx->set, ptr);
 }
 
-static void nft_rhash_destroy(const struct nft_set *set)
+static void nft_rhash_destroy(const struct nft_ctx *ctx,
+                             const struct nft_set *set)
 {
        struct nft_rhash *priv = nft_set_priv(set);
+       struct nft_rhash_ctx rhash_ctx = {
+               .ctx    = *ctx,
+               .set    = set,
+       };
 
        cancel_delayed_work_sync(&priv->gc_work);
        rcu_barrier();
        rhashtable_free_and_destroy(&priv->ht, nft_rhash_elem_destroy,
-                                   (void *)set);
+                                   (void *)&rhash_ctx);
 }
 
 /* Number of buckets is stored in u32, so cap our result to 1U<<31 */
@@ -643,7 +655,8 @@ static int nft_hash_init(const struct nft_set *set,
        return 0;
 }
 
-static void nft_hash_destroy(const struct nft_set *set)
+static void nft_hash_destroy(const struct nft_ctx *ctx,
+                            const struct nft_set *set)
 {
        struct nft_hash *priv = nft_set_priv(set);
        struct nft_hash_elem *he;
@@ -653,7 +666,7 @@ static void nft_hash_destroy(const struct nft_set *set)
        for (i = 0; i < priv->buckets; i++) {
                hlist_for_each_entry_safe(he, next, &priv->table[i], node) {
                        hlist_del_rcu(&he->node);
-                       nft_set_elem_destroy(set, he, true);
+                       nf_tables_set_elem_destroy(ctx, set, he);
                }
        }
 }
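In the hash-set destroy path, rhashtable_free_and_destroy() hands the element callback a single opaque pointer, so the hunk above bundles the nft_ctx and the set into a small nft_rhash_ctx wrapper. The sketch below shows that generic pattern, packing extra state into a context struct passed through a void * callback argument; all names here are invented.

    #include <stdio.h>

    struct walk_ctx {
            const char *owner;      /* extra state the callback needs */
            int freed;              /* running counter owned by the caller */
    };

    /* Library-style iterator that only forwards an opaque pointer. */
    static void table_free_all(int *elems, int n,
                               void (*destroy)(int elem, void *arg), void *arg)
    {
            int i;

            for (i = 0; i < n; i++)
                    destroy(elems[i], arg);
    }

    static void elem_destroy(int elem, void *arg)
    {
            struct walk_ctx *ctx = arg;

            ctx->freed++;
            printf("%s destroys element %d\n", ctx->owner, elem);
    }

    int main(void)
    {
            int elems[] = { 1, 2, 3 };
            struct walk_ctx ctx = { .owner = "demo-set", .freed = 0 };

            table_free_all(elems, 3, elem_destroy, &ctx);
            printf("destroyed %d elements\n", ctx.freed);
            return 0;
    }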
index 06d46d1..0452ee5 100644 (file)
@@ -1600,17 +1600,10 @@ static void pipapo_free_fields(struct nft_pipapo_match *m)
        }
 }
 
-/**
- * pipapo_reclaim_match - RCU callback to free fields from old matching data
- * @rcu:       RCU head
- */
-static void pipapo_reclaim_match(struct rcu_head *rcu)
+static void pipapo_free_match(struct nft_pipapo_match *m)
 {
-       struct nft_pipapo_match *m;
        int i;
 
-       m = container_of(rcu, struct nft_pipapo_match, rcu);
-
        for_each_possible_cpu(i)
                kfree(*per_cpu_ptr(m->scratch, i));
 
@@ -1625,7 +1618,19 @@ static void pipapo_reclaim_match(struct rcu_head *rcu)
 }
 
 /**
- * pipapo_commit() - Replace lookup data with current working copy
+ * pipapo_reclaim_match - RCU callback to free fields from old matching data
+ * @rcu:       RCU head
+ */
+static void pipapo_reclaim_match(struct rcu_head *rcu)
+{
+       struct nft_pipapo_match *m;
+
+       m = container_of(rcu, struct nft_pipapo_match, rcu);
+       pipapo_free_match(m);
+}
+
+/**
+ * nft_pipapo_commit() - Replace lookup data with current working copy
  * @set:       nftables API set representation
  *
  * While at it, check if we should perform garbage collection on the working
@@ -1635,7 +1640,7 @@ static void pipapo_reclaim_match(struct rcu_head *rcu)
  * We also need to create a new working copy for subsequent insertions and
  * deletions.
  */
-static void pipapo_commit(const struct nft_set *set)
+static void nft_pipapo_commit(const struct nft_set *set)
 {
        struct nft_pipapo *priv = nft_set_priv(set);
        struct nft_pipapo_match *new_clone, *old;
@@ -1660,6 +1665,26 @@ static void pipapo_commit(const struct nft_set *set)
        priv->clone = new_clone;
 }
 
+static void nft_pipapo_abort(const struct nft_set *set)
+{
+       struct nft_pipapo *priv = nft_set_priv(set);
+       struct nft_pipapo_match *new_clone, *m;
+
+       if (!priv->dirty)
+               return;
+
+       m = rcu_dereference(priv->match);
+
+       new_clone = pipapo_clone(m);
+       if (IS_ERR(new_clone))
+               return;
+
+       priv->dirty = false;
+
+       pipapo_free_match(priv->clone);
+       priv->clone = new_clone;
+}
+
 /**
  * nft_pipapo_activate() - Mark element reference as active given key, commit
  * @net:       Network namespace
@@ -1667,8 +1692,7 @@ static void pipapo_commit(const struct nft_set *set)
  * @elem:      nftables API element representation containing key data
  *
  * On insertion, elements are added to a copy of the matching data currently
- * in use for lookups, and not directly inserted into current lookup data, so
- * we'll take care of that by calling pipapo_commit() here. Both
+ * in use for lookups, and not directly inserted into current lookup data. Both
  * nft_pipapo_insert() and nft_pipapo_activate() are called once for each
  * element, hence we can't purpose either one as a real commit operation.
  */
@@ -1684,8 +1708,6 @@ static void nft_pipapo_activate(const struct net *net,
 
        nft_set_elem_change_active(net, set, &e->ext);
        nft_set_elem_clear_busy(&e->ext);
-
-       pipapo_commit(set);
 }
 
 /**
@@ -1931,7 +1953,6 @@ static void nft_pipapo_remove(const struct net *net, const struct nft_set *set,
                if (i == m->field_count) {
                        priv->dirty = true;
                        pipapo_drop(m, rulemap);
-                       pipapo_commit(set);
                        return;
                }
 
@@ -1953,12 +1974,16 @@ static void nft_pipapo_walk(const struct nft_ctx *ctx, struct nft_set *set,
                            struct nft_set_iter *iter)
 {
        struct nft_pipapo *priv = nft_set_priv(set);
+       struct net *net = read_pnet(&set->net);
        struct nft_pipapo_match *m;
        struct nft_pipapo_field *f;
        int i, r;
 
        rcu_read_lock();
-       m = rcu_dereference(priv->match);
+       if (iter->genmask == nft_genmask_cur(net))
+               m = rcu_dereference(priv->match);
+       else
+               m = priv->clone;
 
        if (unlikely(!m))
                goto out;
@@ -2127,10 +2152,12 @@ out_scratch:
 
 /**
  * nft_set_pipapo_match_destroy() - Destroy elements from key mapping array
+ * @ctx:       context
  * @set:       nftables API set representation
  * @m:         matching data pointing to key mapping array
  */
-static void nft_set_pipapo_match_destroy(const struct nft_set *set,
+static void nft_set_pipapo_match_destroy(const struct nft_ctx *ctx,
+                                        const struct nft_set *set,
                                         struct nft_pipapo_match *m)
 {
        struct nft_pipapo_field *f;
@@ -2147,15 +2174,17 @@ static void nft_set_pipapo_match_destroy(const struct nft_set *set,
 
                e = f->mt[r].e;
 
-               nft_set_elem_destroy(set, e, true);
+               nf_tables_set_elem_destroy(ctx, set, e);
        }
 }
 
 /**
  * nft_pipapo_destroy() - Free private data for set and all committed elements
+ * @ctx:       context
  * @set:       nftables API set representation
  */
-static void nft_pipapo_destroy(const struct nft_set *set)
+static void nft_pipapo_destroy(const struct nft_ctx *ctx,
+                              const struct nft_set *set)
 {
        struct nft_pipapo *priv = nft_set_priv(set);
        struct nft_pipapo_match *m;
@@ -2165,7 +2194,7 @@ static void nft_pipapo_destroy(const struct nft_set *set)
        if (m) {
                rcu_barrier();
 
-               nft_set_pipapo_match_destroy(set, m);
+               nft_set_pipapo_match_destroy(ctx, set, m);
 
 #ifdef NFT_PIPAPO_ALIGN
                free_percpu(m->scratch_aligned);
@@ -2182,7 +2211,7 @@ static void nft_pipapo_destroy(const struct nft_set *set)
                m = priv->clone;
 
                if (priv->dirty)
-                       nft_set_pipapo_match_destroy(set, m);
+                       nft_set_pipapo_match_destroy(ctx, set, m);
 
 #ifdef NFT_PIPAPO_ALIGN
                free_percpu(priv->clone->scratch_aligned);
@@ -2230,6 +2259,8 @@ const struct nft_set_type nft_set_pipapo_type = {
                .init           = nft_pipapo_init,
                .destroy        = nft_pipapo_destroy,
                .gc_init        = nft_pipapo_gc_init,
+               .commit         = nft_pipapo_commit,
+               .abort          = nft_pipapo_abort,
                .elemsize       = offsetof(struct nft_pipapo_elem, ext),
        },
 };
@@ -2252,6 +2283,8 @@ const struct nft_set_type nft_set_pipapo_avx2_type = {
                .init           = nft_pipapo_init,
                .destroy        = nft_pipapo_destroy,
                .gc_init        = nft_pipapo_gc_init,
+               .commit         = nft_pipapo_commit,
+               .abort          = nft_pipapo_abort,
                .elemsize       = offsetof(struct nft_pipapo_elem, ext),
        },
 };
index 19ea4d3..5c05c9b 100644 (file)
@@ -221,7 +221,7 @@ static int nft_rbtree_gc_elem(const struct nft_set *__set,
 {
        struct nft_set *set = (struct nft_set *)__set;
        struct rb_node *prev = rb_prev(&rbe->node);
-       struct nft_rbtree_elem *rbe_prev;
+       struct nft_rbtree_elem *rbe_prev = NULL;
        struct nft_set_gc_batch *gcb;
 
        gcb = nft_set_gc_batch_check(set, NULL, GFP_ATOMIC);
@@ -229,17 +229,21 @@ static int nft_rbtree_gc_elem(const struct nft_set *__set,
                return -ENOMEM;
 
        /* search for expired end interval coming before this element. */
-       do {
+       while (prev) {
                rbe_prev = rb_entry(prev, struct nft_rbtree_elem, node);
                if (nft_rbtree_interval_end(rbe_prev))
                        break;
 
                prev = rb_prev(prev);
-       } while (prev != NULL);
+       }
+
+       if (rbe_prev) {
+               rb_erase(&rbe_prev->node, &priv->root);
+               atomic_dec(&set->nelems);
+       }
 
-       rb_erase(&rbe_prev->node, &priv->root);
        rb_erase(&rbe->node, &priv->root);
-       atomic_sub(2, &set->nelems);
+       atomic_dec(&set->nelems);
 
        nft_set_gc_batch_add(gcb, rbe);
        nft_set_gc_batch_complete(gcb);
@@ -268,7 +272,7 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,
                               struct nft_set_ext **ext)
 {
        struct nft_rbtree_elem *rbe, *rbe_le = NULL, *rbe_ge = NULL;
-       struct rb_node *node, *parent, **p, *first = NULL;
+       struct rb_node *node, *next, *parent, **p, *first = NULL;
        struct nft_rbtree *priv = nft_set_priv(set);
        u8 genmask = nft_genmask_next(net);
        int d, err;
@@ -307,7 +311,9 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,
         * Values stored in the tree are in reversed order, starting from
         * highest to lowest value.
         */
-       for (node = first; node != NULL; node = rb_next(node)) {
+       for (node = first; node != NULL; node = next) {
+               next = rb_next(node);
+
                rbe = rb_entry(node, struct nft_rbtree_elem, node);
 
                if (!nft_set_elem_active(&rbe->ext, genmask))
@@ -658,7 +664,8 @@ static int nft_rbtree_init(const struct nft_set *set,
        return 0;
 }
 
-static void nft_rbtree_destroy(const struct nft_set *set)
+static void nft_rbtree_destroy(const struct nft_ctx *ctx,
+                              const struct nft_set *set)
 {
        struct nft_rbtree *priv = nft_set_priv(set);
        struct nft_rbtree_elem *rbe;
@@ -669,7 +676,7 @@ static void nft_rbtree_destroy(const struct nft_set *set)
        while ((node = priv->root.rb_node) != NULL) {
                rb_erase(node, &priv->root);
                rbe = rb_entry(node, struct nft_rbtree_elem, node);
-               nft_set_elem_destroy(set, rbe, true);
+               nf_tables_set_elem_destroy(ctx, set, rbe);
        }
 }
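The rbtree hunks above harden the insert walk: rb_next() is taken before the loop body runs, because the garbage-collection helper may erase nodes while the walk is in progress, and the expired predecessor is only erased if one was actually found. The sketch below shows the take-next-before-possible-removal pattern on a plain singly linked list; the structures are invented stand-ins.

    #include <stdlib.h>
    #include <stdio.h>

    struct node {
            int key;
            int expired;
            struct node *next;
    };

    /* Walk the list and drop expired nodes; 'next' is read before the
     * current node may be freed, mirroring the rb_next()-before-erase fix.
     */
    static void drop_expired(struct node **head)
    {
            struct node **link = head;
            struct node *node, *next;

            for (node = *head; node != NULL; node = next) {
                    next = node->next;
                    if (node->expired) {
                            *link = next;
                            free(node);
                            continue;
                    }
                    link = &node->next;
            }
    }

    static struct node *push(struct node *head, int key, int expired)
    {
            struct node *n = malloc(sizeof(*n));

            if (!n)
                    exit(1);
            n->key = key;
            n->expired = expired;
            n->next = head;
            return n;
    }

    int main(void)
    {
            struct node *head = NULL, *n;

            head = push(head, 3, 0);
            head = push(head, 2, 1);
            head = push(head, 1, 0);

            drop_expired(&head);
            for (n = head; n; n = n->next)
                    printf("kept %d\n", n->key);

            while (head) {
                    n = head->next;
                    free(head);
                    head = n;
            }
            return 0;
    }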
 
index e1990ba..dc94858 100644 (file)
@@ -71,4 +71,3 @@ MODULE_AUTHOR("Evgeniy Polyakov <zbr@ioremap.net>");
 MODULE_DESCRIPTION("Passive OS fingerprint matching.");
 MODULE_ALIAS("ipt_osf");
 MODULE_ALIAS("ip6t_osf");
-MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_OSF);
index 54c0830..27511c9 100644 (file)
@@ -857,7 +857,8 @@ int netlbl_catmap_setlong(struct netlbl_lsm_catmap **catmap,
 
        offset -= iter->startbit;
        idx = offset / NETLBL_CATMAP_MAPSIZE;
-       iter->bitmap[idx] |= bitmap << (offset % NETLBL_CATMAP_MAPSIZE);
+       iter->bitmap[idx] |= (NETLBL_CATMAP_MAPTYPE)bitmap
+                            << (offset % NETLBL_CATMAP_MAPSIZE);
 
        return 0;
 }
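The netlabel fix above casts the 32-bit bitmap word to the catmap's wider map type before shifting; without the cast the shift is evaluated in 32 bits, so anything shifted past bit 31 is lost before the result is widened. A minimal reproduction of the difference, assuming a 64-bit map type:

    #include <stdint.h>
    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
            uint32_t bitmap = 0x80000001u;
            unsigned int shift = 16;

            /* Shift happens in 32 bits first, then widens: top bits are lost. */
            uint64_t lossy = (uint64_t)(bitmap << shift);

            /* Widen first, then shift in 64 bits: all bits survive. */
            uint64_t correct = (uint64_t)bitmap << shift;

            printf("lossy   = 0x%016" PRIx64 "\n", lossy);
            printf("correct = 0x%016" PRIx64 "\n", correct);
            return 0;
    }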
index 7ef8b9a..3a1e0fd 100644 (file)
@@ -1779,7 +1779,7 @@ static int netlink_getsockopt(struct socket *sock, int level, int optname,
                                break;
                        }
                }
-               if (put_user(ALIGN(nlk->ngroups / 8, sizeof(u32)), optlen))
+               if (put_user(ALIGN(BITS_TO_BYTES(nlk->ngroups), sizeof(u32)), optlen))
                        err = -EFAULT;
                netlink_unlock_table();
                return err;
@@ -1990,7 +1990,7 @@ static int netlink_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 
        skb_free_datagram(sk, skb);
 
-       if (nlk->cb_running &&
+       if (READ_ONCE(nlk->cb_running) &&
            atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) {
                ret = netlink_dump(sk);
                if (ret) {
@@ -2302,7 +2302,7 @@ static int netlink_dump(struct sock *sk)
        if (cb->done)
                cb->done(cb);
 
-       nlk->cb_running = false;
+       WRITE_ONCE(nlk->cb_running, false);
        module = cb->module;
        skb = cb->skb;
        mutex_unlock(nlk->cb_mutex);
@@ -2365,7 +2365,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
                        goto error_put;
        }
 
-       nlk->cb_running = true;
+       WRITE_ONCE(nlk->cb_running, true);
        nlk->dump_done_errno = INT_MAX;
 
        mutex_unlock(nlk->cb_mutex);
@@ -2703,7 +2703,7 @@ static int netlink_native_seq_show(struct seq_file *seq, void *v)
                           nlk->groups ? (u32)nlk->groups[0] : 0,
                           sk_rmem_alloc_get(s),
                           sk_wmem_alloc_get(s),
-                          nlk->cb_running,
+                          READ_ONCE(nlk->cb_running),
                           refcount_read(&s->sk_refcnt),
                           atomic_read(&s->sk_drops),
                           sock_i_ino(s)
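The af_netlink hunks above wrap cb_running in READ_ONCE()/WRITE_ONCE() because recvmsg and the /proc output read the flag without holding cb_mutex; the annotations prevent torn or compiler-cached accesses and document the intentionally lockless read. The kernel macros are volatile accesses rather than atomics; the userspace sketch below only approximates the idea with C11 relaxed atomics (build with -pthread).

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <pthread.h>
    #include <stdio.h>

    /* Rough userspace stand-ins for the kernel's WRITE_ONCE()/READ_ONCE(). */
    #define WRITE_ONCE(var, val) atomic_store_explicit(&(var), (val), memory_order_relaxed)
    #define READ_ONCE(var)       atomic_load_explicit(&(var), memory_order_relaxed)

    static atomic_bool cb_running;

    static void *dumper(void *arg)
    {
            (void)arg;
            WRITE_ONCE(cb_running, true);
            /* ... emit dump messages ... */
            WRITE_ONCE(cb_running, false);
            return NULL;
    }

    int main(void)
    {
            pthread_t t;

            pthread_create(&t, NULL, dumper, NULL);

            /* Lockless peek from another context, as recvmsg/seq_show do. */
            printf("cb_running seen as %d\n", READ_ONCE(cb_running));

            pthread_join(&t, NULL);
            return 0;
    }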
index 3f99b43..e2d2af9 100644 (file)
@@ -123,7 +123,7 @@ void nr_write_internal(struct sock *sk, int frametype)
        unsigned char  *dptr;
        int len, timeout;
 
-       len = NR_NETWORK_LEN + NR_TRANSPORT_LEN;
+       len = NR_TRANSPORT_LEN;
 
        switch (frametype & 0x0F) {
        case NR_CONNREQ:
@@ -141,7 +141,8 @@ void nr_write_internal(struct sock *sk, int frametype)
                return;
        }
 
-       if ((skb = alloc_skb(len, GFP_ATOMIC)) == NULL)
+       skb = alloc_skb(NR_NETWORK_LEN + len, GFP_ATOMIC);
+       if (!skb)
                return;
 
        /*
@@ -149,7 +150,7 @@ void nr_write_internal(struct sock *sk, int frametype)
         */
        skb_reserve(skb, NR_NETWORK_LEN);
 
-       dptr = skb_put(skb, skb_tailroom(skb));
+       dptr = skb_put(skb, len);
 
        switch (frametype & 0x0F) {
        case NR_CONNREQ:
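The NET/ROM fix above allocates NR_NETWORK_LEN + len, reserves the network headroom, and then puts exactly the transport length instead of claiming all remaining tailroom. The toy buffer below mimics the skb_reserve()/skb_put() layering to illustrate the idea; the sizes and structure are illustrative only, not the kernel API.

    #include <stdio.h>
    #include <string.h>

    #define NR_NETWORK_LEN   15
    #define NR_TRANSPORT_LEN 5

    /* A toy buffer with skb-like reserve/put semantics. */
    struct buf {
            unsigned char data[64];
            size_t head;    /* start of payload */
            size_t tail;    /* end of payload */
    };

    static void buf_reserve(struct buf *b, size_t len)
    {
            b->head += len;         /* leave headroom for lower-layer headers */
            b->tail += len;
    }

    static unsigned char *buf_put(struct buf *b, size_t len)
    {
            unsigned char *p = b->data + b->tail;

            b->tail += len;         /* claim exactly 'len' bytes for this layer */
            return p;
    }

    int main(void)
    {
            struct buf b = { { 0 }, 0, 0 };
            unsigned char *dptr;

            /* Room exists for both layers, but only the network part is reserved. */
            buf_reserve(&b, NR_NETWORK_LEN);

            /* Put exactly the transport length, not "whatever tailroom is left". */
            dptr = buf_put(&b, NR_TRANSPORT_LEN);
            memset(dptr, 0xab, NR_TRANSPORT_LEN);

            printf("payload spans [%zu, %zu)\n", b.head, b.tail);
            return 0;
    }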
index e9ca007..0f23e5e 100644 (file)
@@ -77,13 +77,12 @@ static struct sk_buff *nsh_gso_segment(struct sk_buff *skb,
                                       netdev_features_t features)
 {
        struct sk_buff *segs = ERR_PTR(-EINVAL);
+       u16 mac_offset = skb->mac_header;
        unsigned int nsh_len, mac_len;
        __be16 proto;
-       int nhoff;
 
        skb_reset_network_header(skb);
 
-       nhoff = skb->network_header - skb->mac_header;
        mac_len = skb->mac_len;
 
        if (unlikely(!pskb_may_pull(skb, NSH_BASE_HDR_LEN)))
@@ -108,15 +107,14 @@ static struct sk_buff *nsh_gso_segment(struct sk_buff *skb,
        segs = skb_mac_gso_segment(skb, features);
        if (IS_ERR_OR_NULL(segs)) {
                skb_gso_error_unwind(skb, htons(ETH_P_NSH), nsh_len,
-                                    skb->network_header - nhoff,
-                                    mac_len);
+                                    mac_offset, mac_len);
                goto out;
        }
 
        for (skb = segs; skb; skb = skb->next) {
                skb->protocol = htons(ETH_P_NSH);
                __skb_push(skb, nsh_len);
-               skb_set_mac_header(skb, -nhoff);
+               skb->mac_header = mac_offset;
                skb->network_header = skb->mac_header + mac_len;
                skb->mac_len = mac_len;
        }
index fcee601..58f530f 100644 (file)
@@ -236,9 +236,6 @@ void ovs_dp_detach_port(struct vport *p)
        /* First drop references to device. */
        hlist_del_rcu(&p->dp_hash_node);
 
-       /* Free percpu memory */
-       free_percpu(p->upcall_stats);
-
        /* Then destroy it. */
        ovs_vport_del(p);
 }
@@ -1858,12 +1855,6 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
                goto err_destroy_portids;
        }
 
-       vport->upcall_stats = netdev_alloc_pcpu_stats(struct vport_upcall_stats_percpu);
-       if (!vport->upcall_stats) {
-               err = -ENOMEM;
-               goto err_destroy_vport;
-       }
-
        err = ovs_dp_cmd_fill_info(dp, reply, info->snd_portid,
                                   info->snd_seq, 0, OVS_DP_CMD_NEW);
        BUG_ON(err < 0);
@@ -1876,8 +1867,6 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
        ovs_notify(&dp_datapath_genl_family, reply, info);
        return 0;
 
-err_destroy_vport:
-       ovs_dp_detach_port(vport);
 err_destroy_portids:
        kfree(rcu_dereference_raw(dp->upcall_portids));
 err_unlock_and_destroy_meters:
@@ -2322,12 +2311,6 @@ restart:
                goto exit_unlock_free;
        }
 
-       vport->upcall_stats = netdev_alloc_pcpu_stats(struct vport_upcall_stats_percpu);
-       if (!vport->upcall_stats) {
-               err = -ENOMEM;
-               goto exit_unlock_free_vport;
-       }
-
        err = ovs_vport_cmd_fill_info(vport, reply, genl_info_net(info),
                                      info->snd_portid, info->snd_seq, 0,
                                      OVS_VPORT_CMD_NEW, GFP_KERNEL);
@@ -2345,8 +2328,6 @@ restart:
        ovs_notify(&dp_vport_genl_family, reply, info);
        return 0;
 
-exit_unlock_free_vport:
-       ovs_dp_detach_port(vport);
 exit_unlock_free:
        ovs_unlock();
        kfree_skb(reply);
index 7e0f5c4..972ae01 100644 (file)
@@ -124,6 +124,7 @@ struct vport *ovs_vport_alloc(int priv_size, const struct vport_ops *ops,
 {
        struct vport *vport;
        size_t alloc_size;
+       int err;
 
        alloc_size = sizeof(struct vport);
        if (priv_size) {
@@ -135,17 +136,29 @@ struct vport *ovs_vport_alloc(int priv_size, const struct vport_ops *ops,
        if (!vport)
                return ERR_PTR(-ENOMEM);
 
+       vport->upcall_stats = netdev_alloc_pcpu_stats(struct vport_upcall_stats_percpu);
+       if (!vport->upcall_stats) {
+               err = -ENOMEM;
+               goto err_kfree_vport;
+       }
+
        vport->dp = parms->dp;
        vport->port_no = parms->port_no;
        vport->ops = ops;
        INIT_HLIST_NODE(&vport->dp_hash_node);
 
        if (ovs_vport_set_upcall_portids(vport, parms->upcall_portids)) {
-               kfree(vport);
-               return ERR_PTR(-EINVAL);
+               err = -EINVAL;
+               goto err_free_percpu;
        }
 
        return vport;
+
+err_free_percpu:
+       free_percpu(vport->upcall_stats);
+err_kfree_vport:
+       kfree(vport);
+       return ERR_PTR(err);
 }
 EXPORT_SYMBOL_GPL(ovs_vport_alloc);
 
@@ -165,6 +178,7 @@ void ovs_vport_free(struct vport *vport)
         * it is safe to use raw dereference.
         */
        kfree(rcu_dereference_raw(vport->upcall_portids));
+       free_percpu(vport->upcall_stats);
        kfree(vport);
 }
 EXPORT_SYMBOL_GPL(ovs_vport_free);
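The vport hunks above move the per-CPU upcall statistics allocation into ovs_vport_alloc() and restructure its failure handling into the usual goto-unwind ladder, where each label releases exactly what was set up before the failure point. A generic userspace sketch of that idiom with stand-in resources:

    #include <stdlib.h>
    #include <string.h>
    #include <stdio.h>

    struct vport_demo {
            char *name;
            long *stats;            /* stands in for the per-CPU upcall stats */
    };

    static struct vport_demo *vport_demo_alloc(const char *name)
    {
            struct vport_demo *vport;

            vport = calloc(1, sizeof(*vport));
            if (!vport)
                    return NULL;

            vport->stats = calloc(64, sizeof(*vport->stats));
            if (!vport->stats)
                    goto err_free_vport;

            vport->name = strdup(name);
            if (!vport->name)
                    goto err_free_stats;

            return vport;

    err_free_stats:
            free(vport->stats);     /* undo only what was already set up */
    err_free_vport:
            free(vport);
            return NULL;
    }

    static void vport_demo_free(struct vport_demo *vport)
    {
            free(vport->name);
            free(vport->stats);
            free(vport);
    }

    int main(void)
    {
            struct vport_demo *v = vport_demo_alloc("vport0");

            if (!v)
                    return 1;
            printf("allocated %s\n", v->name);
            vport_demo_free(v);
            return 0;
    }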
index 640d94e..a2dbeb2 100644 (file)
@@ -1934,10 +1934,8 @@ static void packet_parse_headers(struct sk_buff *skb, struct socket *sock)
        /* Move network header to the right position for VLAN tagged packets */
        if (likely(skb->dev->type == ARPHRD_ETHER) &&
            eth_type_vlan(skb->protocol) &&
-           __vlan_get_protocol(skb, skb->protocol, &depth) != 0) {
-               if (pskb_may_pull(skb, depth))
-                       skb_set_network_header(skb, depth);
-       }
+           vlan_get_protocol_and_depth(skb, skb->protocol, &depth) != 0)
+               skb_set_network_header(skb, depth);
 
        skb_probe_transport_header(skb);
 }
@@ -3203,6 +3201,9 @@ static int packet_do_bind(struct sock *sk, const char *name, int ifindex,
 
        lock_sock(sk);
        spin_lock(&po->bind_lock);
+       if (!proto)
+               proto = po->num;
+
        rcu_read_lock();
 
        if (po->fanout) {
@@ -3301,7 +3302,7 @@ static int packet_bind_spkt(struct socket *sock, struct sockaddr *uaddr,
        memcpy(name, uaddr->sa_data, sizeof(uaddr->sa_data_min));
        name[sizeof(uaddr->sa_data_min)] = 0;
 
-       return packet_do_bind(sk, name, 0, pkt_sk(sk)->num);
+       return packet_do_bind(sk, name, 0, 0);
 }
 
 static int packet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
@@ -3318,8 +3319,7 @@ static int packet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len
        if (sll->sll_family != AF_PACKET)
                return -EINVAL;
 
-       return packet_do_bind(sk, NULL, sll->sll_ifindex,
-                             sll->sll_protocol ? : pkt_sk(sk)->num);
+       return packet_do_bind(sk, NULL, sll->sll_ifindex, sll->sll_protocol);
 }
 
 static struct proto packet_proto = {
index d0c4eda..f6b200c 100644 (file)
@@ -143,7 +143,7 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb,
        rp = nlmsg_data(nlh);
        rp->pdiag_family = AF_PACKET;
        rp->pdiag_type = sk->sk_type;
-       rp->pdiag_num = ntohs(po->num);
+       rp->pdiag_num = ntohs(READ_ONCE(po->num));
        rp->pdiag_ino = sk_ino;
        sock_diag_save_cookie(sk, rp->pdiag_cookie);
 
index 0f25a38..0f7a729 100644 (file)
@@ -783,7 +783,7 @@ int qrtr_ns_init(void)
                goto err_sock;
        }
 
-       qrtr_ns.workqueue = alloc_workqueue("qrtr_ns_handler", WQ_UNBOUND, 1);
+       qrtr_ns.workqueue = alloc_ordered_workqueue("qrtr_ns_handler", 0);
        if (!qrtr_ns.workqueue) {
                ret = -ENOMEM;
                goto err_sock;
index 31f738d..4c471fa 100644 (file)
@@ -980,6 +980,7 @@ static int __init af_rxrpc_init(void)
        BUILD_BUG_ON(sizeof(struct rxrpc_skb_priv) > sizeof_field(struct sk_buff, cb));
 
        ret = -ENOMEM;
+       rxrpc_gen_version_string();
        rxrpc_call_jar = kmem_cache_create(
                "rxrpc_call_jar", sizeof(struct rxrpc_call), 0,
                SLAB_HWCACHE_ALIGN, NULL);
@@ -988,7 +989,7 @@ static int __init af_rxrpc_init(void)
                goto error_call_jar;
        }
 
-       rxrpc_workqueue = alloc_workqueue("krxrpcd", WQ_HIGHPRI | WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
+       rxrpc_workqueue = alloc_ordered_workqueue("krxrpcd", WQ_HIGHPRI | WQ_MEM_RECLAIM);
        if (!rxrpc_workqueue) {
                pr_notice("Failed to allocate work queue\n");
                goto error_work_queue;
index 5d44dc0..e8e14c6 100644 (file)
@@ -1068,6 +1068,7 @@ int rxrpc_get_server_data_key(struct rxrpc_connection *, const void *, time64_t,
 /*
  * local_event.c
  */
+void rxrpc_gen_version_string(void);
 void rxrpc_send_version_request(struct rxrpc_local *local,
                                struct rxrpc_host_header *hdr,
                                struct sk_buff *skb);
index 5e69ea6..993c69f 100644 (file)
 #include <generated/utsrelease.h>
 #include "ar-internal.h"
 
-static const char rxrpc_version_string[65] = "linux-" UTS_RELEASE " AF_RXRPC";
+static char rxrpc_version_string[65]; // "linux-" UTS_RELEASE " AF_RXRPC";
+
+/*
+ * Generate the VERSION packet string.
+ */
+void rxrpc_gen_version_string(void)
+{
+       snprintf(rxrpc_version_string, sizeof(rxrpc_version_string),
+                "linux-%.49s AF_RXRPC", UTS_RELEASE);
+}
 
 /*
  * Reply to a version request
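
The rxrpc_gen_version_string() change above stops relying on a compile-time string initializer and instead formats the VERSION reply payload once at init time; the %.49s precision keeps "linux-" plus the release string plus " AF_RXRPC" within the 65-byte buffer including the NUL. A standalone sketch of the same bounded formatting; the UTS_RELEASE value below is a placeholder:

#include <stdio.h>

#define UTS_RELEASE "6.4.0"     /* placeholder; the kernel takes this from
                                 * <generated/utsrelease.h> */

static char rxrpc_version_string[65];

static void rxrpc_gen_version_string(void)
{
        /* "linux-" (6) + at most 49 release chars + " AF_RXRPC" (9) + NUL = 65 */
        snprintf(rxrpc_version_string, sizeof(rxrpc_version_string),
                 "linux-%.49s AF_RXRPC", UTS_RELEASE);
}

int main(void)
{
        rxrpc_gen_version_string();
        puts(rxrpc_version_string);
        return 0;
}
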
index 9cc0bc7..abc71a0 100644 (file)
@@ -610,6 +610,7 @@ static bool tcf_ct_flow_table_lookup(struct tcf_ct_params *p,
        struct flow_offload_tuple tuple = {};
        enum ip_conntrack_info ctinfo;
        struct tcphdr *tcph = NULL;
+       bool force_refresh = false;
        struct flow_offload *flow;
        struct nf_conn *ct;
        u8 dir;
@@ -647,6 +648,7 @@ static bool tcf_ct_flow_table_lookup(struct tcf_ct_params *p,
                         * established state, then don't refresh.
                         */
                        return false;
+               force_refresh = true;
        }
 
        if (tcph && (unlikely(tcph->fin || tcph->rst))) {
@@ -660,7 +662,12 @@ static bool tcf_ct_flow_table_lookup(struct tcf_ct_params *p,
        else
                ctinfo = IP_CT_ESTABLISHED_REPLY;
 
-       flow_offload_refresh(nf_ft, flow);
+       flow_offload_refresh(nf_ft, flow, force_refresh);
+       if (!test_bit(IPS_ASSURED_BIT, &ct->status)) {
+               /* Process this flow in SW to allow promoting to ASSURED */
+               return false;
+       }
+
        nf_conntrack_get(&ct->ct_general);
        nf_ct_set(skb, ct, ctinfo);
        if (nf_ft->flags & NF_FLOWTABLE_COUNTER)
index fc945c7..c819b81 100644 (file)
 #include <linux/rtnetlink.h>
 #include <linux/module.h>
 #include <linux/init.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
 #include <linux/slab.h>
+#include <net/ipv6.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
 #include <linux/tc_act/tc_pedit.h>
@@ -327,28 +330,58 @@ static bool offset_valid(struct sk_buff *skb, int offset)
        return true;
 }
 
-static void pedit_skb_hdr_offset(struct sk_buff *skb,
+static int pedit_l4_skb_offset(struct sk_buff *skb, int *hoffset, const int header_type)
+{
+       const int noff = skb_network_offset(skb);
+       int ret = -EINVAL;
+       struct iphdr _iph;
+
+       switch (skb->protocol) {
+       case htons(ETH_P_IP): {
+               const struct iphdr *iph = skb_header_pointer(skb, noff, sizeof(_iph), &_iph);
+
+               if (!iph)
+                       goto out;
+               *hoffset = noff + iph->ihl * 4;
+               ret = 0;
+               break;
+       }
+       case htons(ETH_P_IPV6):
+               ret = ipv6_find_hdr(skb, hoffset, header_type, NULL, NULL) == header_type ? 0 : -EINVAL;
+               break;
+       }
+out:
+       return ret;
+}
+
+static int pedit_skb_hdr_offset(struct sk_buff *skb,
                                 enum pedit_header_type htype, int *hoffset)
 {
+       int ret = -EINVAL;
        /* 'htype' is validated in the netlink parsing */
        switch (htype) {
        case TCA_PEDIT_KEY_EX_HDR_TYPE_ETH:
-               if (skb_mac_header_was_set(skb))
+               if (skb_mac_header_was_set(skb)) {
                        *hoffset = skb_mac_offset(skb);
+                       ret = 0;
+               }
                break;
        case TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK:
        case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4:
        case TCA_PEDIT_KEY_EX_HDR_TYPE_IP6:
                *hoffset = skb_network_offset(skb);
+               ret = 0;
                break;
        case TCA_PEDIT_KEY_EX_HDR_TYPE_TCP:
+               ret = pedit_l4_skb_offset(skb, hoffset, IPPROTO_TCP);
+               break;
        case TCA_PEDIT_KEY_EX_HDR_TYPE_UDP:
-               if (skb_transport_header_was_set(skb))
-                       *hoffset = skb_transport_offset(skb);
+               ret = pedit_l4_skb_offset(skb, hoffset, IPPROTO_UDP);
                break;
        default:
                break;
        }
+       return ret;
 }
 
 TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
@@ -384,6 +417,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
                int hoffset = 0;
                u32 *ptr, hdata;
                u32 val;
+               int rc;
 
                if (tkey_ex) {
                        htype = tkey_ex->htype;
@@ -392,7 +426,11 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
                        tkey_ex++;
                }
 
-               pedit_skb_hdr_offset(skb, htype, &hoffset);
+               rc = pedit_skb_hdr_offset(skb, htype, &hoffset);
+               if (rc) {
+                       pr_info_ratelimited("tc action pedit unable to extract header offset for header type (0x%x)\n", htype);
+                       goto bad;
+               }
 
                if (tkey->offmask) {
                        u8 *d, _d;
index 227cba5..2e9dce0 100644 (file)
@@ -357,23 +357,23 @@ static int tcf_police_dump(struct sk_buff *skb, struct tc_action *a,
        opt.burst = PSCHED_NS2TICKS(p->tcfp_burst);
        if (p->rate_present) {
                psched_ratecfg_getrate(&opt.rate, &p->rate);
-               if ((police->params->rate.rate_bytes_ps >= (1ULL << 32)) &&
+               if ((p->rate.rate_bytes_ps >= (1ULL << 32)) &&
                    nla_put_u64_64bit(skb, TCA_POLICE_RATE64,
-                                     police->params->rate.rate_bytes_ps,
+                                     p->rate.rate_bytes_ps,
                                      TCA_POLICE_PAD))
                        goto nla_put_failure;
        }
        if (p->peak_present) {
                psched_ratecfg_getrate(&opt.peakrate, &p->peak);
-               if ((police->params->peak.rate_bytes_ps >= (1ULL << 32)) &&
+               if ((p->peak.rate_bytes_ps >= (1ULL << 32)) &&
                    nla_put_u64_64bit(skb, TCA_POLICE_PEAKRATE64,
-                                     police->params->peak.rate_bytes_ps,
+                                     p->peak.rate_bytes_ps,
                                      TCA_POLICE_PAD))
                        goto nla_put_failure;
        }
        if (p->pps_present) {
                if (nla_put_u64_64bit(skb, TCA_POLICE_PKTRATE64,
-                                     police->params->ppsrate.rate_pkts_ps,
+                                     p->ppsrate.rate_pkts_ps,
                                      TCA_POLICE_PAD))
                        goto nla_put_failure;
                if (nla_put_u64_64bit(skb, TCA_POLICE_PKTBURST64,
index 2621550..a193cc7 100644 (file)
@@ -43,8 +43,6 @@
 #include <net/flow_offload.h>
 #include <net/tc_wrapper.h>
 
-extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
-
 /* The list of all installed classifier types */
 static LIST_HEAD(tcf_proto_base);
 
@@ -659,8 +657,8 @@ static void __tcf_chain_put(struct tcf_chain *chain, bool by_act,
 {
        struct tcf_block *block = chain->block;
        const struct tcf_proto_ops *tmplt_ops;
+       unsigned int refcnt, non_act_refcnt;
        bool free_block = false;
-       unsigned int refcnt;
        void *tmplt_priv;
 
        mutex_lock(&block->lock);
@@ -680,13 +678,15 @@ static void __tcf_chain_put(struct tcf_chain *chain, bool by_act,
         * save these to temporary variables.
         */
        refcnt = --chain->refcnt;
+       non_act_refcnt = refcnt - chain->action_refcnt;
        tmplt_ops = chain->tmplt_ops;
        tmplt_priv = chain->tmplt_priv;
 
-       /* The last dropped non-action reference will trigger notification. */
-       if (refcnt - chain->action_refcnt == 0 && !by_act) {
-               tc_chain_notify_delete(tmplt_ops, tmplt_priv, chain->index,
-                                      block, NULL, 0, 0, false);
+       if (non_act_refcnt == chain->explicitly_created && !by_act) {
+               if (non_act_refcnt == 0)
+                       tc_chain_notify_delete(tmplt_ops, tmplt_priv,
+                                              chain->index, block, NULL, 0, 0,
+                                              false);
                /* Last reference to chain, no need to lock. */
                chain->flushing = false;
        }
@@ -2952,6 +2952,7 @@ static int tc_chain_tmplt_add(struct tcf_chain *chain, struct net *net,
                return PTR_ERR(ops);
        if (!ops->tmplt_create || !ops->tmplt_destroy || !ops->tmplt_dump) {
                NL_SET_ERR_MSG(extack, "Chain templates are not supported with specified classifier");
+               module_put(ops->owner);
                return -EOPNOTSUPP;
        }
 
index 9dbc433..815c3e4 100644 (file)
@@ -1153,6 +1153,9 @@ static int fl_set_geneve_opt(const struct nlattr *nla, struct fl_flow_key *key,
        if (option_len > sizeof(struct geneve_opt))
                data_len = option_len - sizeof(struct geneve_opt);
 
+       if (key->enc_opts.len > FLOW_DIS_TUN_OPTS_MAX - 4)
+               return -ERANGE;
+
        opt = (struct geneve_opt *)&key->enc_opts.data[key->enc_opts.len];
        memset(opt, 0xff, option_len);
        opt->length = data_len / 4;
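
The added check in fl_set_geneve_opt() rejects a further GENEVE option once the accumulated enc_opts.len no longer leaves room in the fixed-size buffer, before opt is written at &data[len]. The guard in isolation, with simplified sizes; the real limit and option-header size come from the flow-dissector headers, not the values used here:

#include <errno.h>
#include <stdint.h>
#include <string.h>

#define OPTS_MAX 255            /* stands in for FLOW_DIS_TUN_OPTS_MAX */

struct enc_opts_demo {
        uint8_t data[OPTS_MAX];
        int len;                /* bytes already filled */
};

/* Reject the append up front rather than writing past the array. */
static int enc_opts_append(struct enc_opts_demo *e, const void *opt, int opt_len)
{
        if (opt_len > OPTS_MAX - e->len)
                return -ERANGE;

        memcpy(&e->data[e->len], opt, opt_len);
        e->len += opt_len;
        return 0;
}
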
index 4e2e269..d15d50d 100644 (file)
@@ -718,13 +718,19 @@ static int u32_set_parms(struct net *net, struct tcf_proto *tp,
                         struct nlattr *est, u32 flags, u32 fl_flags,
                         struct netlink_ext_ack *extack)
 {
-       int err;
+       int err, ifindex = -1;
 
        err = tcf_exts_validate_ex(net, tp, tb, est, &n->exts, flags,
                                   fl_flags, extack);
        if (err < 0)
                return err;
 
+       if (tb[TCA_U32_INDEV]) {
+               ifindex = tcf_change_indev(net, tb[TCA_U32_INDEV], extack);
+               if (ifindex < 0)
+                       return -EINVAL;
+       }
+
        if (tb[TCA_U32_LINK]) {
                u32 handle = nla_get_u32(tb[TCA_U32_LINK]);
                struct tc_u_hnode *ht_down = NULL, *ht_old;
@@ -759,13 +765,9 @@ static int u32_set_parms(struct net *net, struct tcf_proto *tp,
                tcf_bind_filter(tp, &n->res, base);
        }
 
-       if (tb[TCA_U32_INDEV]) {
-               int ret;
-               ret = tcf_change_indev(net, tb[TCA_U32_INDEV], extack);
-               if (ret < 0)
-                       return -EINVAL;
-               n->ifindex = ret;
-       }
+       if (ifindex >= 0)
+               n->ifindex = ifindex;
+
        return 0;
 }
 
index fdb8f42..aa6b1fe 100644 (file)
@@ -309,7 +309,7 @@ struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle)
 
        if (dev_ingress_queue(dev))
                q = qdisc_match_from_root(
-                       dev_ingress_queue(dev)->qdisc_sleeping,
+                       rtnl_dereference(dev_ingress_queue(dev)->qdisc_sleeping),
                        handle);
 out:
        return q;
@@ -328,7 +328,8 @@ struct Qdisc *qdisc_lookup_rcu(struct net_device *dev, u32 handle)
 
        nq = dev_ingress_queue_rcu(dev);
        if (nq)
-               q = qdisc_match_from_root(nq->qdisc_sleeping, handle);
+               q = qdisc_match_from_root(rcu_dereference(nq->qdisc_sleeping),
+                                         handle);
 out:
        return q;
 }
@@ -634,8 +635,13 @@ EXPORT_SYMBOL(qdisc_watchdog_init);
 void qdisc_watchdog_schedule_range_ns(struct qdisc_watchdog *wd, u64 expires,
                                      u64 delta_ns)
 {
-       if (test_bit(__QDISC_STATE_DEACTIVATED,
-                    &qdisc_root_sleeping(wd->qdisc)->state))
+       bool deactivated;
+
+       rcu_read_lock();
+       deactivated = test_bit(__QDISC_STATE_DEACTIVATED,
+                              &qdisc_root_sleeping(wd->qdisc)->state);
+       rcu_read_unlock();
+       if (deactivated)
                return;
 
        if (hrtimer_is_queued(&wd->timer)) {
@@ -1073,17 +1079,29 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 
        if (parent == NULL) {
                unsigned int i, num_q, ingress;
+               struct netdev_queue *dev_queue;
 
                ingress = 0;
                num_q = dev->num_tx_queues;
                if ((q && q->flags & TCQ_F_INGRESS) ||
                    (new && new->flags & TCQ_F_INGRESS)) {
-                       num_q = 1;
                        ingress = 1;
-                       if (!dev_ingress_queue(dev)) {
+                       dev_queue = dev_ingress_queue(dev);
+                       if (!dev_queue) {
                                NL_SET_ERR_MSG(extack, "Device does not have an ingress queue");
                                return -ENOENT;
                        }
+
+                       q = rtnl_dereference(dev_queue->qdisc_sleeping);
+
+                       /* This is the counterpart of that qdisc_refcount_inc_nz() call in
+                        * __tcf_qdisc_find() for filter requests.
+                        */
+                       if (!qdisc_refcount_dec_if_one(q)) {
+                               NL_SET_ERR_MSG(extack,
+                                              "Current ingress or clsact Qdisc has ongoing filter requests");
+                               return -EBUSY;
+                       }
                }
 
                if (dev->flags & IFF_UP)
@@ -1094,18 +1112,26 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
                if (new && new->ops->attach && !ingress)
                        goto skip;
 
-               for (i = 0; i < num_q; i++) {
-                       struct netdev_queue *dev_queue = dev_ingress_queue(dev);
-
-                       if (!ingress)
+               if (!ingress) {
+                       for (i = 0; i < num_q; i++) {
                                dev_queue = netdev_get_tx_queue(dev, i);
+                               old = dev_graft_qdisc(dev_queue, new);
 
-                       old = dev_graft_qdisc(dev_queue, new);
-                       if (new && i > 0)
-                               qdisc_refcount_inc(new);
-
-                       if (!ingress)
+                               if (new && i > 0)
+                                       qdisc_refcount_inc(new);
                                qdisc_put(old);
+                       }
+               } else {
+                       old = dev_graft_qdisc(dev_queue, NULL);
+
+                       /* {ingress,clsact}_destroy() @old before grafting @new to avoid
+                        * unprotected concurrent accesses to net_device::miniq_{in,e}gress
+                        * pointer(s) in mini_qdisc_pair_swap().
+                        */
+                       qdisc_notify(net, skb, n, classid, old, new, extack);
+                       qdisc_destroy(old);
+
+                       dev_graft_qdisc(dev_queue, new);
                }
 
 skip:
@@ -1119,8 +1145,6 @@ skip:
 
                        if (new && new->ops->attach)
                                new->ops->attach(new);
-               } else {
-                       notify_and_destroy(net, skb, n, classid, old, new, extack);
                }
 
                if (dev->flags & IFF_UP)
@@ -1252,7 +1276,12 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
        sch->parent = parent;
 
        if (handle == TC_H_INGRESS) {
-               sch->flags |= TCQ_F_INGRESS;
+               if (!(sch->flags & TCQ_F_INGRESS)) {
+                       NL_SET_ERR_MSG(extack,
+                                      "Specified parent ID is reserved for ingress and clsact Qdiscs");
+                       err = -EINVAL;
+                       goto err_out3;
+               }
                handle = TC_H_MAKE(TC_H_INGRESS, 0);
        } else {
                if (handle == 0) {
@@ -1473,7 +1502,7 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
                                }
                                q = qdisc_leaf(p, clid);
                        } else if (dev_ingress_queue(dev)) {
-                               q = dev_ingress_queue(dev)->qdisc_sleeping;
+                               q = rtnl_dereference(dev_ingress_queue(dev)->qdisc_sleeping);
                        }
                } else {
                        q = rtnl_dereference(dev->qdisc);
@@ -1559,7 +1588,7 @@ replay:
                                }
                                q = qdisc_leaf(p, clid);
                        } else if (dev_ingress_queue_create(dev)) {
-                               q = dev_ingress_queue(dev)->qdisc_sleeping;
+                               q = rtnl_dereference(dev_ingress_queue(dev)->qdisc_sleeping);
                        }
                } else {
                        q = rtnl_dereference(dev->qdisc);
@@ -1591,11 +1620,20 @@ replay:
                                        NL_SET_ERR_MSG(extack, "Invalid qdisc name");
                                        return -EINVAL;
                                }
+                               if (q->flags & TCQ_F_INGRESS) {
+                                       NL_SET_ERR_MSG(extack,
+                                                      "Cannot regraft ingress or clsact Qdiscs");
+                                       return -EINVAL;
+                               }
                                if (q == p ||
                                    (p && check_loop(q, p, 0))) {
                                        NL_SET_ERR_MSG(extack, "Qdisc parent/child loop detected");
                                        return -ELOOP;
                                }
+                               if (clid == TC_H_INGRESS) {
+                                       NL_SET_ERR_MSG(extack, "Ingress cannot graft directly");
+                                       return -EINVAL;
+                               }
                                qdisc_refcount_inc(q);
                                goto graft;
                        } else {
@@ -1791,8 +1829,8 @@ static int tc_dump_qdisc(struct sk_buff *skb, struct netlink_callback *cb)
 
                dev_queue = dev_ingress_queue(dev);
                if (dev_queue &&
-                   tc_dump_qdisc_root(dev_queue->qdisc_sleeping, skb, cb,
-                                      &q_idx, s_q_idx, false,
+                   tc_dump_qdisc_root(rtnl_dereference(dev_queue->qdisc_sleeping),
+                                      skb, cb, &q_idx, s_q_idx, false,
                                       tca[TCA_DUMP_INVISIBLE]) < 0)
                        goto done;
 
@@ -2235,8 +2273,8 @@ static int tc_dump_tclass(struct sk_buff *skb, struct netlink_callback *cb)
 
        dev_queue = dev_ingress_queue(dev);
        if (dev_queue &&
-           tc_dump_tclass_root(dev_queue->qdisc_sleeping, skb, tcm, cb,
-                               &t, s_t, false) < 0)
+           tc_dump_tclass_root(rtnl_dereference(dev_queue->qdisc_sleeping),
+                               skb, tcm, cb, &t, s_t, false) < 0)
                goto done;
 
 done:
@@ -2288,7 +2326,9 @@ static struct pernet_operations psched_net_ops = {
        .exit = psched_net_exit,
 };
 
+#if IS_ENABLED(CONFIG_RETPOLINE)
 DEFINE_STATIC_KEY_FALSE(tc_skip_wrapper);
+#endif
 
 static int __init pktsched_init(void)
 {
index 6980796..591d87d 100644 (file)
@@ -201,6 +201,11 @@ out:
        return NET_XMIT_CN;
 }
 
+static struct netlink_range_validation fq_pie_q_range = {
+       .min = 1,
+       .max = 1 << 20,
+};
+
 static const struct nla_policy fq_pie_policy[TCA_FQ_PIE_MAX + 1] = {
        [TCA_FQ_PIE_LIMIT]              = {.type = NLA_U32},
        [TCA_FQ_PIE_FLOWS]              = {.type = NLA_U32},
@@ -208,7 +213,8 @@ static const struct nla_policy fq_pie_policy[TCA_FQ_PIE_MAX + 1] = {
        [TCA_FQ_PIE_TUPDATE]            = {.type = NLA_U32},
        [TCA_FQ_PIE_ALPHA]              = {.type = NLA_U32},
        [TCA_FQ_PIE_BETA]               = {.type = NLA_U32},
-       [TCA_FQ_PIE_QUANTUM]            = {.type = NLA_U32},
+       [TCA_FQ_PIE_QUANTUM]            =
+                       NLA_POLICY_FULL_RANGE(NLA_U32, &fq_pie_q_range),
        [TCA_FQ_PIE_MEMORY_LIMIT]       = {.type = NLA_U32},
        [TCA_FQ_PIE_ECN_PROB]           = {.type = NLA_U32},
        [TCA_FQ_PIE_ECN]                = {.type = NLA_U32},
@@ -373,6 +379,7 @@ static void fq_pie_timer(struct timer_list *t)
        spinlock_t *root_lock; /* to lock qdisc for probability calculations */
        u32 idx;
 
+       rcu_read_lock();
        root_lock = qdisc_lock(qdisc_root_sleeping(sch));
        spin_lock(root_lock);
 
@@ -385,6 +392,7 @@ static void fq_pie_timer(struct timer_list *t)
                mod_timer(&q->adapt_timer, jiffies + q->p_params.tupdate);
 
        spin_unlock(root_lock);
+       rcu_read_unlock();
 }
 
 static int fq_pie_init(struct Qdisc *sch, struct nlattr *opt,
index 37e41f9..5d7e23f 100644 (file)
@@ -648,7 +648,7 @@ struct Qdisc_ops noop_qdisc_ops __read_mostly = {
 
 static struct netdev_queue noop_netdev_queue = {
        RCU_POINTER_INITIALIZER(qdisc, &noop_qdisc),
-       .qdisc_sleeping =       &noop_qdisc,
+       RCU_POINTER_INITIALIZER(qdisc_sleeping, &noop_qdisc),
 };
 
 struct Qdisc noop_qdisc = {
@@ -1046,7 +1046,7 @@ static void qdisc_free_cb(struct rcu_head *head)
        qdisc_free(q);
 }
 
-static void qdisc_destroy(struct Qdisc *qdisc)
+static void __qdisc_destroy(struct Qdisc *qdisc)
 {
        const struct Qdisc_ops  *ops = qdisc->ops;
 
@@ -1070,6 +1070,14 @@ static void qdisc_destroy(struct Qdisc *qdisc)
        call_rcu(&qdisc->rcu, qdisc_free_cb);
 }
 
+void qdisc_destroy(struct Qdisc *qdisc)
+{
+       if (qdisc->flags & TCQ_F_BUILTIN)
+               return;
+
+       __qdisc_destroy(qdisc);
+}
+
 void qdisc_put(struct Qdisc *qdisc)
 {
        if (!qdisc)
@@ -1079,7 +1087,7 @@ void qdisc_put(struct Qdisc *qdisc)
            !refcount_dec_and_test(&qdisc->refcnt))
                return;
 
-       qdisc_destroy(qdisc);
+       __qdisc_destroy(qdisc);
 }
 EXPORT_SYMBOL(qdisc_put);
 
@@ -1094,7 +1102,7 @@ void qdisc_put_unlocked(struct Qdisc *qdisc)
            !refcount_dec_and_rtnl_lock(&qdisc->refcnt))
                return;
 
-       qdisc_destroy(qdisc);
+       __qdisc_destroy(qdisc);
        rtnl_unlock();
 }
 EXPORT_SYMBOL(qdisc_put_unlocked);
@@ -1103,7 +1111,7 @@ EXPORT_SYMBOL(qdisc_put_unlocked);
 struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
                              struct Qdisc *qdisc)
 {
-       struct Qdisc *oqdisc = dev_queue->qdisc_sleeping;
+       struct Qdisc *oqdisc = rtnl_dereference(dev_queue->qdisc_sleeping);
        spinlock_t *root_lock;
 
        root_lock = qdisc_lock(oqdisc);
@@ -1112,7 +1120,7 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
        /* ... and graft new one */
        if (qdisc == NULL)
                qdisc = &noop_qdisc;
-       dev_queue->qdisc_sleeping = qdisc;
+       rcu_assign_pointer(dev_queue->qdisc_sleeping, qdisc);
        rcu_assign_pointer(dev_queue->qdisc, &noop_qdisc);
 
        spin_unlock_bh(root_lock);
@@ -1125,12 +1133,12 @@ static void shutdown_scheduler_queue(struct net_device *dev,
                                     struct netdev_queue *dev_queue,
                                     void *_qdisc_default)
 {
-       struct Qdisc *qdisc = dev_queue->qdisc_sleeping;
+       struct Qdisc *qdisc = rtnl_dereference(dev_queue->qdisc_sleeping);
        struct Qdisc *qdisc_default = _qdisc_default;
 
        if (qdisc) {
                rcu_assign_pointer(dev_queue->qdisc, qdisc_default);
-               dev_queue->qdisc_sleeping = qdisc_default;
+               rcu_assign_pointer(dev_queue->qdisc_sleeping, qdisc_default);
 
                qdisc_put(qdisc);
        }
@@ -1154,7 +1162,7 @@ static void attach_one_default_qdisc(struct net_device *dev,
 
        if (!netif_is_multiqueue(dev))
                qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
-       dev_queue->qdisc_sleeping = qdisc;
+       rcu_assign_pointer(dev_queue->qdisc_sleeping, qdisc);
 }
 
 static void attach_default_qdiscs(struct net_device *dev)
@@ -1167,7 +1175,7 @@ static void attach_default_qdiscs(struct net_device *dev)
        if (!netif_is_multiqueue(dev) ||
            dev->priv_flags & IFF_NO_QUEUE) {
                netdev_for_each_tx_queue(dev, attach_one_default_qdisc, NULL);
-               qdisc = txq->qdisc_sleeping;
+               qdisc = rtnl_dereference(txq->qdisc_sleeping);
                rcu_assign_pointer(dev->qdisc, qdisc);
                qdisc_refcount_inc(qdisc);
        } else {
@@ -1186,7 +1194,7 @@ static void attach_default_qdiscs(struct net_device *dev)
                netdev_for_each_tx_queue(dev, shutdown_scheduler_queue, &noop_qdisc);
                dev->priv_flags |= IFF_NO_QUEUE;
                netdev_for_each_tx_queue(dev, attach_one_default_qdisc, NULL);
-               qdisc = txq->qdisc_sleeping;
+               qdisc = rtnl_dereference(txq->qdisc_sleeping);
                rcu_assign_pointer(dev->qdisc, qdisc);
                qdisc_refcount_inc(qdisc);
                dev->priv_flags ^= IFF_NO_QUEUE;
@@ -1202,7 +1210,7 @@ static void transition_one_qdisc(struct net_device *dev,
                                 struct netdev_queue *dev_queue,
                                 void *_need_watchdog)
 {
-       struct Qdisc *new_qdisc = dev_queue->qdisc_sleeping;
+       struct Qdisc *new_qdisc = rtnl_dereference(dev_queue->qdisc_sleeping);
        int *need_watchdog_p = _need_watchdog;
 
        if (!(new_qdisc->flags & TCQ_F_BUILTIN))
@@ -1272,7 +1280,7 @@ static void dev_reset_queue(struct net_device *dev,
        struct Qdisc *qdisc;
        bool nolock;
 
-       qdisc = dev_queue->qdisc_sleeping;
+       qdisc = rtnl_dereference(dev_queue->qdisc_sleeping);
        if (!qdisc)
                return;
 
@@ -1303,7 +1311,7 @@ static bool some_qdisc_is_busy(struct net_device *dev)
                int val;
 
                dev_queue = netdev_get_tx_queue(dev, i);
-               q = dev_queue->qdisc_sleeping;
+               q = rtnl_dereference(dev_queue->qdisc_sleeping);
 
                root_lock = qdisc_lock(q);
                spin_lock_bh(root_lock);
@@ -1379,7 +1387,7 @@ EXPORT_SYMBOL(dev_deactivate);
 static int qdisc_change_tx_queue_len(struct net_device *dev,
                                     struct netdev_queue *dev_queue)
 {
-       struct Qdisc *qdisc = dev_queue->qdisc_sleeping;
+       struct Qdisc *qdisc = rtnl_dereference(dev_queue->qdisc_sleeping);
        const struct Qdisc_ops *ops = qdisc->ops;
 
        if (ops->change_tx_queue_len)
@@ -1404,7 +1412,7 @@ void mq_change_real_num_tx(struct Qdisc *sch, unsigned int new_real_tx)
        unsigned int i;
 
        for (i = new_real_tx; i < dev->real_num_tx_queues; i++) {
-               qdisc = netdev_get_tx_queue(dev, i)->qdisc_sleeping;
+               qdisc = rtnl_dereference(netdev_get_tx_queue(dev, i)->qdisc_sleeping);
                /* Only update the default qdiscs we created,
                 * qdiscs with handles are always hashed.
                 */
@@ -1412,7 +1420,7 @@ void mq_change_real_num_tx(struct Qdisc *sch, unsigned int new_real_tx)
                        qdisc_hash_del(qdisc);
        }
        for (i = dev->real_num_tx_queues; i < new_real_tx; i++) {
-               qdisc = netdev_get_tx_queue(dev, i)->qdisc_sleeping;
+               qdisc = rtnl_dereference(netdev_get_tx_queue(dev, i)->qdisc_sleeping);
                if (qdisc != &noop_qdisc && !qdisc->handle)
                        qdisc_hash_add(qdisc, false);
        }
@@ -1449,7 +1457,7 @@ static void dev_init_scheduler_queue(struct net_device *dev,
        struct Qdisc *qdisc = _qdisc;
 
        rcu_assign_pointer(dev_queue->qdisc, qdisc);
-       dev_queue->qdisc_sleeping = qdisc;
+       rcu_assign_pointer(dev_queue->qdisc_sleeping, qdisc);
 }
 
 void dev_init_scheduler(struct net_device *dev)
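
Most of the qdisc hunks above (qdisc_lookup(), qdisc_graft(), dev_graft_qdisc(), the mq/mqprio/taprio dump helpers, and the teql restart loop) follow from one change: the netdev queue's qdisc_sleeping field became an __rcu pointer, so every access now needs the matching accessor: rcu_assign_pointer() to publish, rtnl_dereference() when RTNL is held, rcu_dereference()/rcu_access_pointer() on lockless paths. A kernel-context sketch of that discipline with illustrative names:

#include <linux/rcupdate.h>
#include <linux/rtnetlink.h>

struct demo_qdisc;

struct demo_queue {
        struct demo_qdisc __rcu *qdisc_sleeping;
};

/* Writer (RTNL held): publish the new pointer with the proper barrier. */
static void demo_graft(struct demo_queue *q, struct demo_qdisc *new)
{
        rcu_assign_pointer(q->qdisc_sleeping, new);
}

/* Reader that holds RTNL: rtnl_dereference() documents the protection. */
static struct demo_qdisc *demo_peek_rtnl(struct demo_queue *q)
{
        return rtnl_dereference(q->qdisc_sleeping);
}

/* Lockless reader: the caller must be inside rcu_read_lock()/unlock(). */
static struct demo_qdisc *demo_peek_rcu(struct demo_queue *q)
{
        return rcu_dereference(q->qdisc_sleeping);
}
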
index 8483812..e43a454 100644 (file)
@@ -80,6 +80,9 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
        struct net_device *dev = qdisc_dev(sch);
        int err;
 
+       if (sch->parent != TC_H_INGRESS)
+               return -EOPNOTSUPP;
+
        net_inc_ingress_queue();
 
        mini_qdisc_pair_init(&q->miniqp, sch, &dev->miniq_ingress);
@@ -101,6 +104,9 @@ static void ingress_destroy(struct Qdisc *sch)
 {
        struct ingress_sched_data *q = qdisc_priv(sch);
 
+       if (sch->parent != TC_H_INGRESS)
+               return;
+
        tcf_block_put_ext(q->block, sch, &q->block_info);
        net_dec_ingress_queue();
 }
@@ -134,7 +140,7 @@ static struct Qdisc_ops ingress_qdisc_ops __read_mostly = {
        .cl_ops                 =       &ingress_class_ops,
        .id                     =       "ingress",
        .priv_size              =       sizeof(struct ingress_sched_data),
-       .static_flags           =       TCQ_F_CPUSTATS,
+       .static_flags           =       TCQ_F_INGRESS | TCQ_F_CPUSTATS,
        .init                   =       ingress_init,
        .destroy                =       ingress_destroy,
        .dump                   =       ingress_dump,
@@ -219,6 +225,9 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
        struct net_device *dev = qdisc_dev(sch);
        int err;
 
+       if (sch->parent != TC_H_CLSACT)
+               return -EOPNOTSUPP;
+
        net_inc_ingress_queue();
        net_inc_egress_queue();
 
@@ -248,6 +257,9 @@ static void clsact_destroy(struct Qdisc *sch)
 {
        struct clsact_sched_data *q = qdisc_priv(sch);
 
+       if (sch->parent != TC_H_CLSACT)
+               return;
+
        tcf_block_put_ext(q->egress_block, sch, &q->egress_block_info);
        tcf_block_put_ext(q->ingress_block, sch, &q->ingress_block_info);
 
@@ -269,7 +281,7 @@ static struct Qdisc_ops clsact_qdisc_ops __read_mostly = {
        .cl_ops                 =       &clsact_class_ops,
        .id                     =       "clsact",
        .priv_size              =       sizeof(struct clsact_sched_data),
-       .static_flags           =       TCQ_F_CPUSTATS,
+       .static_flags           =       TCQ_F_INGRESS | TCQ_F_CPUSTATS,
        .init                   =       clsact_init,
        .destroy                =       clsact_destroy,
        .dump                   =       ingress_dump,
index d0bc660..c860119 100644 (file)
@@ -141,7 +141,7 @@ static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
         * qdisc totals are added at end.
         */
        for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
-               qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
+               qdisc = rtnl_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
                spin_lock_bh(qdisc_lock(qdisc));
 
                gnet_stats_add_basic(&sch->bstats, qdisc->cpu_bstats,
@@ -202,7 +202,7 @@ static struct Qdisc *mq_leaf(struct Qdisc *sch, unsigned long cl)
 {
        struct netdev_queue *dev_queue = mq_queue_get(sch, cl);
 
-       return dev_queue->qdisc_sleeping;
+       return rtnl_dereference(dev_queue->qdisc_sleeping);
 }
 
 static unsigned long mq_find(struct Qdisc *sch, u32 classid)
@@ -221,7 +221,7 @@ static int mq_dump_class(struct Qdisc *sch, unsigned long cl,
 
        tcm->tcm_parent = TC_H_ROOT;
        tcm->tcm_handle |= TC_H_MIN(cl);
-       tcm->tcm_info = dev_queue->qdisc_sleeping->handle;
+       tcm->tcm_info = rtnl_dereference(dev_queue->qdisc_sleeping)->handle;
        return 0;
 }
 
@@ -230,7 +230,7 @@ static int mq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 {
        struct netdev_queue *dev_queue = mq_queue_get(sch, cl);
 
-       sch = dev_queue->qdisc_sleeping;
+       sch = rtnl_dereference(dev_queue->qdisc_sleeping);
        if (gnet_stats_copy_basic(d, sch->cpu_bstats, &sch->bstats, true) < 0 ||
            qdisc_qstats_copy(d, sch) < 0)
                return -1;
index dc5a0ff..ab69ff7 100644 (file)
@@ -557,7 +557,7 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
         * qdisc totals are added at end.
         */
        for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
-               qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
+               qdisc = rtnl_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
                spin_lock_bh(qdisc_lock(qdisc));
 
                gnet_stats_add_basic(&sch->bstats, qdisc->cpu_bstats,
@@ -604,7 +604,7 @@ static struct Qdisc *mqprio_leaf(struct Qdisc *sch, unsigned long cl)
        if (!dev_queue)
                return NULL;
 
-       return dev_queue->qdisc_sleeping;
+       return rtnl_dereference(dev_queue->qdisc_sleeping);
 }
 
 static unsigned long mqprio_find(struct Qdisc *sch, u32 classid)
@@ -637,7 +637,7 @@ static int mqprio_dump_class(struct Qdisc *sch, unsigned long cl,
                tcm->tcm_parent = (tc < 0) ? 0 :
                        TC_H_MAKE(TC_H_MAJ(sch->handle),
                                  TC_H_MIN(tc + TC_H_MIN_PRIORITY));
-               tcm->tcm_info = dev_queue->qdisc_sleeping->handle;
+               tcm->tcm_info = rtnl_dereference(dev_queue->qdisc_sleeping)->handle;
        } else {
                tcm->tcm_parent = TC_H_ROOT;
                tcm->tcm_info = 0;
@@ -693,7 +693,7 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
        } else {
                struct netdev_queue *dev_queue = mqprio_queue_get(sch, cl);
 
-               sch = dev_queue->qdisc_sleeping;
+               sch = rtnl_dereference(dev_queue->qdisc_sleeping);
                if (gnet_stats_copy_basic(d, sch->cpu_bstats,
                                          &sch->bstats, true) < 0 ||
                    qdisc_qstats_copy(d, sch) < 0)
index 6ef3021..e79be1b 100644 (file)
@@ -966,6 +966,7 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
        if (ret < 0)
                return ret;
 
+       sch_tree_lock(sch);
        /* backup q->clg and q->loss_model */
        old_clg = q->clg;
        old_loss_model = q->loss_model;
@@ -974,7 +975,7 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
                ret = get_loss_clg(q, tb[TCA_NETEM_LOSS]);
                if (ret) {
                        q->loss_model = old_loss_model;
-                       return ret;
+                       goto unlock;
                }
        } else {
                q->loss_model = CLG_RANDOM;
@@ -1041,6 +1042,8 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
        /* capping jitter to the range acceptable by tabledist() */
        q->jitter = min_t(s64, abs(q->jitter), INT_MAX);
 
+unlock:
+       sch_tree_unlock(sch);
        return ret;
 
 get_table_failure:
@@ -1050,7 +1053,8 @@ get_table_failure:
         */
        q->clg = old_clg;
        q->loss_model = old_loss_model;
-       return ret;
+
+       goto unlock;
 }
 
 static int netem_init(struct Qdisc *sch, struct nlattr *opt,
index 2152a56..2da6250 100644 (file)
@@ -421,8 +421,10 @@ static void pie_timer(struct timer_list *t)
 {
        struct pie_sched_data *q = from_timer(q, t, adapt_timer);
        struct Qdisc *sch = q->sch;
-       spinlock_t *root_lock = qdisc_lock(qdisc_root_sleeping(sch));
+       spinlock_t *root_lock;
 
+       rcu_read_lock();
+       root_lock = qdisc_lock(qdisc_root_sleeping(sch));
        spin_lock(root_lock);
        pie_calculate_probability(&q->params, &q->vars, sch->qstats.backlog);
 
@@ -430,6 +432,7 @@ static void pie_timer(struct timer_list *t)
        if (q->params.tupdate)
                mod_timer(&q->adapt_timer, jiffies + q->params.tupdate);
        spin_unlock(root_lock);
+       rcu_read_unlock();
 }
 
 static int pie_init(struct Qdisc *sch, struct nlattr *opt,
index 9812932..16277b6 100644 (file)
@@ -321,12 +321,15 @@ static inline void red_adaptative_timer(struct timer_list *t)
 {
        struct red_sched_data *q = from_timer(q, t, adapt_timer);
        struct Qdisc *sch = q->sch;
-       spinlock_t *root_lock = qdisc_lock(qdisc_root_sleeping(sch));
+       spinlock_t *root_lock;
 
+       rcu_read_lock();
+       root_lock = qdisc_lock(qdisc_root_sleeping(sch));
        spin_lock(root_lock);
        red_adaptative_algo(&q->parms, &q->vars);
        mod_timer(&q->adapt_timer, jiffies + HZ/2);
        spin_unlock(root_lock);
+       rcu_read_unlock();
 }
 
 static int red_init(struct Qdisc *sch, struct nlattr *opt,
index abd4363..66dcb18 100644 (file)
@@ -606,10 +606,12 @@ static void sfq_perturbation(struct timer_list *t)
 {
        struct sfq_sched_data *q = from_timer(q, t, perturb_timer);
        struct Qdisc *sch = q->sch;
-       spinlock_t *root_lock = qdisc_lock(qdisc_root_sleeping(sch));
+       spinlock_t *root_lock;
        siphash_key_t nkey;
 
        get_random_bytes(&nkey, sizeof(nkey));
+       rcu_read_lock();
+       root_lock = qdisc_lock(qdisc_root_sleeping(sch));
        spin_lock(root_lock);
        q->perturbation = nkey;
        if (!q->filter_list && q->tail)
@@ -618,6 +620,7 @@ static void sfq_perturbation(struct timer_list *t)
 
        if (q->perturb_period)
                mod_timer(&q->perturb_timer, jiffies + q->perturb_period);
+       rcu_read_unlock();
 }
 
 static int sfq_change(struct Qdisc *sch, struct nlattr *opt)
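
The fq_pie_timer(), pie_timer(), red_adaptative_timer() and sfq_perturbation() hunks all apply the same fix: because the root qdisc reached through qdisc_root_sleeping() is now behind an RCU-managed pointer, the adaptation/perturbation timers take rcu_read_lock() around resolving the root lock. A sketch of the resulting timer shape; the private data struct is illustrative:

#include <linux/timer.h>
#include <net/sch_generic.h>

struct demo_sched_data {
        struct Qdisc *sch;
        struct timer_list adapt_timer;
};

static void demo_adapt_timer(struct timer_list *t)
{
        struct demo_sched_data *q = from_timer(q, t, adapt_timer);
        struct Qdisc *sch = q->sch;
        spinlock_t *root_lock;

        rcu_read_lock();        /* qdisc_root_sleeping() walks an __rcu pointer */
        root_lock = qdisc_lock(qdisc_root_sleeping(sch));
        spin_lock(root_lock);

        /* ... recompute state, possibly mod_timer() to re-arm ... */

        spin_unlock(root_lock);
        rcu_read_unlock();
}
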
index 76db9a1..cf0e61e 100644 (file)
@@ -797,6 +797,9 @@ static struct sk_buff *taprio_dequeue_tc_priority(struct Qdisc *sch,
 
                        taprio_next_tc_txq(dev, tc, &q->cur_txq[tc]);
 
+                       if (q->cur_txq[tc] >= dev->num_tx_queues)
+                               q->cur_txq[tc] = first_txq;
+
                        if (skb)
                                return skb;
                } while (q->cur_txq[tc] != first_txq);
@@ -2358,7 +2361,7 @@ static struct Qdisc *taprio_leaf(struct Qdisc *sch, unsigned long cl)
        if (!dev_queue)
                return NULL;
 
-       return dev_queue->qdisc_sleeping;
+       return rtnl_dereference(dev_queue->qdisc_sleeping);
 }
 
 static unsigned long taprio_find(struct Qdisc *sch, u32 classid)
@@ -2377,7 +2380,7 @@ static int taprio_dump_class(struct Qdisc *sch, unsigned long cl,
 
        tcm->tcm_parent = TC_H_ROOT;
        tcm->tcm_handle |= TC_H_MIN(cl);
-       tcm->tcm_info = dev_queue->qdisc_sleeping->handle;
+       tcm->tcm_info = rtnl_dereference(dev_queue->qdisc_sleeping)->handle;
 
        return 0;
 }
@@ -2389,7 +2392,7 @@ static int taprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 {
        struct netdev_queue *dev_queue = taprio_queue_get(sch, cl);
 
-       sch = dev_queue->qdisc_sleeping;
+       sch = rtnl_dereference(dev_queue->qdisc_sleeping);
        if (gnet_stats_copy_basic(d, NULL, &sch->bstats, true) < 0 ||
            qdisc_qstats_copy(d, sch) < 0)
                return -1;
index 16f9238..7721239 100644 (file)
@@ -297,7 +297,7 @@ restart:
                struct net_device *slave = qdisc_dev(q);
                struct netdev_queue *slave_txq = netdev_get_tx_queue(slave, 0);
 
-               if (slave_txq->qdisc_sleeping != q)
+               if (rcu_access_pointer(slave_txq->qdisc_sleeping) != q)
                        continue;
                if (netif_xmit_stopped(netdev_get_tx_queue(slave, subq)) ||
                    !netif_running(slave)) {
index 7fbeb99..23d6633 100644 (file)
@@ -1250,7 +1250,10 @@ static int sctp_side_effects(enum sctp_event_type event_type,
        default:
                pr_err("impossible disposition %d in state %d, event_type %d, event_id %d\n",
                       status, state, event_type, subtype.chunk);
-               BUG();
+               error = status;
+               if (error >= 0)
+                       error = -EINVAL;
+               WARN_ON_ONCE(1);
                break;
        }
 
index 97f1155..08fdf12 100644 (file)
@@ -4482,7 +4482,7 @@ enum sctp_disposition sctp_sf_eat_auth(struct net *net,
                                    SCTP_AUTH_NEW_KEY, GFP_ATOMIC);
 
                if (!ev)
-                       return -ENOMEM;
+                       return SCTP_DISPOSITION_NOMEM;
 
                sctp_add_cmd_sf(commands, SCTP_CMD_EVENT_ULP,
                                SCTP_ULPEVENT(ev));
index 2f66a20..2abe45a 100644 (file)
@@ -324,9 +324,12 @@ bool sctp_transport_pl_recv(struct sctp_transport *t)
                t->pl.probe_size += SCTP_PL_BIG_STEP;
        } else if (t->pl.state == SCTP_PL_SEARCH) {
                if (!t->pl.probe_high) {
-                       t->pl.probe_size = min(t->pl.probe_size + SCTP_PL_BIG_STEP,
-                                              SCTP_MAX_PLPMTU);
-                       return false;
+                       if (t->pl.probe_size < SCTP_MAX_PLPMTU) {
+                               t->pl.probe_size = min(t->pl.probe_size + SCTP_PL_BIG_STEP,
+                                                      SCTP_MAX_PLPMTU);
+                               return false;
+                       }
+                       t->pl.probe_high = SCTP_MAX_PLPMTU;
                }
                t->pl.probe_size += SCTP_PL_MIN_STEP;
                if (t->pl.probe_size >= t->pl.probe_high) {
@@ -341,7 +344,7 @@ bool sctp_transport_pl_recv(struct sctp_transport *t)
        } else if (t->pl.state == SCTP_PL_COMPLETE) {
                /* Raise probe_size again after 30 * interval in Search Complete */
                t->pl.state = SCTP_PL_SEARCH; /* Search Complete -> Search */
-               t->pl.probe_size += SCTP_PL_MIN_STEP;
+               t->pl.probe_size = min(t->pl.probe_size + SCTP_PL_MIN_STEP, SCTP_MAX_PLPMTU);
        }
 
        return t->pl.state == SCTP_PL_COMPLETE;
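
The sctp_transport_pl_recv() hunk keeps the PLPMTUD search from overshooting: the big-step growth is only taken while probe_size is still below SCTP_MAX_PLPMTU, otherwise probe_high is pinned to the maximum, and the later Search Complete raise is clamped with min() as well. The clamping arithmetic on its own; the step sizes and ceiling below are illustrative stand-ins for the SCTP constants:

#include <stdio.h>

#define PL_BIG_STEP   32
#define PL_MIN_STEP    4
#define MAX_PLPMTU  9000

static int clamp_add(int value, int step, int ceiling)
{
        return value + step < ceiling ? value + step : ceiling;
}

int main(void)
{
        int probe_size = 8980;

        /* The big step would overshoot, so the result is pinned to the ceiling. */
        probe_size = clamp_add(probe_size, PL_BIG_STEP, MAX_PLPMTU);
        printf("after big step: %d\n", probe_size);     /* 9000 */

        /* Later raises are clamped the same way, so the probe never exceeds it. */
        probe_size = clamp_add(probe_size, PL_MIN_STEP, MAX_PLPMTU);
        printf("after min step: %d\n", probe_size);     /* still 9000 */
        return 0;
}
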
index 50c38b6..538e9c6 100644 (file)
@@ -2000,8 +2000,10 @@ static int smc_listen_rdma_init(struct smc_sock *new_smc,
                return rc;
 
        /* create send buffer and rmb */
-       if (smc_buf_create(new_smc, false))
+       if (smc_buf_create(new_smc, false)) {
+               smc_conn_abort(new_smc, ini->first_contact_local);
                return SMC_CLC_DECL_MEM;
+       }
 
        return 0;
 }
@@ -2217,8 +2219,11 @@ static void smc_find_rdma_v2_device_serv(struct smc_sock *new_smc,
        smcr_version = ini->smcr_version;
        ini->smcr_version = SMC_V2;
        rc = smc_listen_rdma_init(new_smc, ini);
-       if (!rc)
+       if (!rc) {
                rc = smc_listen_rdma_reg(new_smc, ini->first_contact_local);
+               if (rc)
+                       smc_conn_abort(new_smc, ini->first_contact_local);
+       }
        if (!rc)
                return;
        ini->smcr_version = smcr_version;
index 31db743..dbdf03e 100644 (file)
@@ -67,8 +67,8 @@ static void smc_close_stream_wait(struct smc_sock *smc, long timeout)
 
                rc = sk_wait_event(sk, &timeout,
                                   !smc_tx_prepared_sends(&smc->conn) ||
-                                  sk->sk_err == ECONNABORTED ||
-                                  sk->sk_err == ECONNRESET ||
+                                  READ_ONCE(sk->sk_err) == ECONNABORTED ||
+                                  READ_ONCE(sk->sk_err) == ECONNRESET ||
                                   smc->conn.killed,
                                   &wait);
                if (rc)
index 4543567..3f465fa 100644 (file)
@@ -127,6 +127,7 @@ static int smcr_lgr_conn_assign_link(struct smc_connection *conn, bool first)
        int i, j;
 
        /* do link balancing */
+       conn->lnk = NULL;       /* reset conn->lnk first */
        for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
                struct smc_link *lnk = &conn->lgr->lnk[i];
 
index a0840b8..90f0b60 100644 (file)
@@ -578,7 +578,10 @@ static struct smc_buf_desc *smc_llc_get_next_rmb(struct smc_link_group *lgr,
 {
        struct smc_buf_desc *buf_next;
 
-       if (!buf_pos || list_is_last(&buf_pos->list, &lgr->rmbs[*buf_lst])) {
+       if (!buf_pos)
+               return _smc_llc_get_next_rmb(lgr, buf_lst);
+
+       if (list_is_last(&buf_pos->list, &lgr->rmbs[*buf_lst])) {
                (*buf_lst)++;
                return _smc_llc_get_next_rmb(lgr, buf_lst);
        }
@@ -614,6 +617,8 @@ static int smc_llc_fill_ext_v2(struct smc_llc_msg_add_link_v2_ext *ext,
                goto out;
        buf_pos = smc_llc_get_first_rmb(lgr, &buf_lst);
        for (i = 0; i < ext->num_rkeys; i++) {
+               while (buf_pos && !(buf_pos)->used)
+                       buf_pos = smc_llc_get_next_rmb(lgr, &buf_lst, buf_pos);
                if (!buf_pos)
                        break;
                rmb = buf_pos;
@@ -623,8 +628,6 @@ static int smc_llc_fill_ext_v2(struct smc_llc_msg_add_link_v2_ext *ext,
                        cpu_to_be64((uintptr_t)rmb->cpu_addr) :
                        cpu_to_be64((u64)sg_dma_address(rmb->sgt[lnk_idx].sgl));
                buf_pos = smc_llc_get_next_rmb(lgr, &buf_lst, buf_pos);
-               while (buf_pos && !(buf_pos)->used)
-                       buf_pos = smc_llc_get_next_rmb(lgr, &buf_lst, buf_pos);
        }
        len += i * sizeof(ext->rt[0]);
 out:
@@ -848,6 +851,8 @@ static int smc_llc_add_link_cont(struct smc_link *link,
        addc_llc->num_rkeys = *num_rkeys_todo;
        n = *num_rkeys_todo;
        for (i = 0; i < min_t(u8, n, SMC_LLC_RKEYS_PER_CONT_MSG); i++) {
+               while (*buf_pos && !(*buf_pos)->used)
+                       *buf_pos = smc_llc_get_next_rmb(lgr, buf_lst, *buf_pos);
                if (!*buf_pos) {
                        addc_llc->num_rkeys = addc_llc->num_rkeys -
                                              *num_rkeys_todo;
@@ -864,8 +869,6 @@ static int smc_llc_add_link_cont(struct smc_link *link,
 
                (*num_rkeys_todo)--;
                *buf_pos = smc_llc_get_next_rmb(lgr, buf_lst, *buf_pos);
-               while (*buf_pos && !(*buf_pos)->used)
-                       *buf_pos = smc_llc_get_next_rmb(lgr, buf_lst, *buf_pos);
        }
        addc_llc->hd.common.llc_type = SMC_LLC_ADD_LINK_CONT;
        addc_llc->hd.length = sizeof(struct smc_llc_msg_add_link_cont);
index 4380d32..9a2f363 100644 (file)
@@ -267,9 +267,9 @@ int smc_rx_wait(struct smc_sock *smc, long *timeo,
        sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
        add_wait_queue(sk_sleep(sk), &wait);
        rc = sk_wait_event(sk, timeo,
-                          sk->sk_err ||
+                          READ_ONCE(sk->sk_err) ||
                           cflags->peer_conn_abort ||
-                          sk->sk_shutdown & RCV_SHUTDOWN ||
+                          READ_ONCE(sk->sk_shutdown) & RCV_SHUTDOWN ||
                           conn->killed ||
                           fcrit(conn),
                           &wait);
index f4b6a71..4512844 100644 (file)
@@ -113,8 +113,8 @@ static int smc_tx_wait(struct smc_sock *smc, int flags)
                        break; /* at least 1 byte of free & no urgent data */
                set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
                sk_wait_event(sk, &timeo,
-                             sk->sk_err ||
-                             (sk->sk_shutdown & SEND_SHUTDOWN) ||
+                             READ_ONCE(sk->sk_err) ||
+                             (READ_ONCE(sk->sk_shutdown) & SEND_SHUTDOWN) ||
                              smc_cdc_rxed_any_close(conn) ||
                              (atomic_read(&conn->sndbuf_space) &&
                               !conn->urg_tx_pend),
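
The smc_close_stream_wait(), smc_rx_wait(), smc_tx_wait() and do_recvmmsg() hunks annotate lockless reads of sk_err / sk_shutdown with READ_ONCE() and the corresponding writer with WRITE_ONCE(), so the compiler must perform exactly one load or store and cannot tear or re-read the field while another CPU updates it. The core of the annotation, sketched with the classic volatile-cast definition; the in-kernel macros add further type handling on top of this:

#define READ_ONCE(x)        (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)  do { *(volatile __typeof__(x) *)&(x) = (val); } while (0)

struct demo_sock {
        int sk_err;
};

/* Reader side: one load per check, never re-fetched by the compiler. */
static int demo_sock_error(struct demo_sock *sk)
{
        return READ_ONCE(sk->sk_err);
}

/* Writer side: publish the new value with a single store. */
static void demo_sock_set_error(struct demo_sock *sk, int err)
{
        WRITE_ONCE(sk->sk_err, err);
}
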
index a7b4b37..e46b162 100644 (file)
@@ -471,6 +471,7 @@ struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname)
                return file;
        }
 
+       file->f_mode |= FMODE_NOWAIT;
        sock->file = file;
        file->private_data = sock;
        stream_open(SOCK_INODE(sock), file);
@@ -1093,7 +1094,7 @@ static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
        struct socket *sock = file->private_data;
 
        if (unlikely(!sock->ops->splice_read))
-               return generic_file_splice_read(file, ppos, pipe, len, flags);
+               return copy_splice_read(file, ppos, pipe, len, flags);
 
        return sock->ops->splice_read(sock, ppos, pipe, len, flags);
 }
@@ -2911,7 +2912,7 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
                 * error to return on the next call or if the
                 * app asks about it using getsockopt(SO_ERROR).
                 */
-               sock->sk->sk_err = -err;
+               WRITE_ONCE(sock->sk->sk_err, -err);
        }
 out_put:
        fput_light(sock->file, fput_needed);
index 212c5d5..9734e1d 100644 (file)
@@ -639,6 +639,16 @@ gss_krb5_cts_crypt(struct crypto_sync_skcipher *cipher, struct xdr_buf *buf,
 
        ret = write_bytes_to_xdr_buf(buf, offset, data, len);
 
+#if IS_ENABLED(CONFIG_KUNIT)
+       /*
+        * CBC-CTS does not define an output IV but RFC 3962 defines it as the
+        * penultimate block of ciphertext, so copy that into the IV buffer
+        * before returning.
+        */
+       if (encrypt)
+               memcpy(iv, data, crypto_sync_skcipher_ivsize(cipher));
+#endif
+
 out:
        kfree(data);
        return ret;
index c8321de..6debf4f 100644 (file)
@@ -927,11 +927,10 @@ static void __rpc_execute(struct rpc_task *task)
                 */
                do_action = task->tk_action;
                /* Tasks with an RPC error status should exit */
-               if (do_action != rpc_exit_task &&
+               if (do_action && do_action != rpc_exit_task &&
                    (status = READ_ONCE(task->tk_rpc_status)) != 0) {
                        task->tk_status = status;
-                       if (do_action != NULL)
-                               do_action = rpc_exit_task;
+                       do_action = rpc_exit_task;
                }
                /* Callbacks override all actions */
                if (task->tk_callback) {
index 26367cf..e7c1012 100644 (file)
@@ -109,15 +109,15 @@ param_get_pool_mode(char *buf, const struct kernel_param *kp)
        switch (*ip)
        {
        case SVC_POOL_AUTO:
-               return strlcpy(buf, "auto\n", 20);
+               return sysfs_emit(buf, "auto\n");
        case SVC_POOL_GLOBAL:
-               return strlcpy(buf, "global\n", 20);
+               return sysfs_emit(buf, "global\n");
        case SVC_POOL_PERCPU:
-               return strlcpy(buf, "percpu\n", 20);
+               return sysfs_emit(buf, "percpu\n");
        case SVC_POOL_PERNODE:
-               return strlcpy(buf, "pernode\n", 20);
+               return sysfs_emit(buf, "pernode\n");
        default:
-               return sprintf(buf, "%d\n", *ip);
+               return sysfs_emit(buf, "%d\n", *ip);
        }
 }
 
@@ -597,34 +597,25 @@ svc_destroy(struct kref *ref)
 }
 EXPORT_SYMBOL_GPL(svc_destroy);
 
-/*
- * Allocate an RPC server's buffer space.
- * We allocate pages and place them in rq_pages.
- */
-static int
+static bool
 svc_init_buffer(struct svc_rqst *rqstp, unsigned int size, int node)
 {
-       unsigned int pages, arghi;
+       unsigned long pages, ret;
 
        /* bc_xprt uses fore channel allocated buffers */
        if (svc_is_backchannel(rqstp))
-               return 1;
+               return true;
 
        pages = size / PAGE_SIZE + 1; /* extra page as we hold both request and reply.
                                       * We assume one is at most one page
                                       */
-       arghi = 0;
        WARN_ON_ONCE(pages > RPCSVC_MAXPAGES);
        if (pages > RPCSVC_MAXPAGES)
                pages = RPCSVC_MAXPAGES;
-       while (pages) {
-               struct page *p = alloc_pages_node(node, GFP_KERNEL, 0);
-               if (!p)
-                       break;
-               rqstp->rq_pages[arghi++] = p;
-               pages--;
-       }
-       return pages == 0;
+
+       ret = alloc_pages_bulk_array_node(GFP_KERNEL, node, pages,
+                                         rqstp->rq_pages);
+       return ret == pages;
 }
 
 /*
@@ -1052,7 +1043,7 @@ static int __svc_register(struct net *net, const char *progname,
 #endif
        }
 
-       trace_svc_register(progname, version, protocol, port, family, error);
+       trace_svc_register(progname, version, family, protocol, port, error);
        return error;
 }
 
@@ -1173,6 +1164,7 @@ static void __svc_unregister(struct net *net, const u32 program, const u32 versi
  */
 static void svc_unregister(const struct svc_serv *serv, struct net *net)
 {
+       struct sighand_struct *sighand;
        struct svc_program *progp;
        unsigned long flags;
        unsigned int i;
@@ -1189,9 +1181,12 @@ static void svc_unregister(const struct svc_serv *serv, struct net *net)
                }
        }
 
-       spin_lock_irqsave(&current->sighand->siglock, flags);
+       rcu_read_lock();
+       sighand = rcu_dereference(current->sighand);
+       spin_lock_irqsave(&sighand->siglock, flags);
        recalc_sigpending();
-       spin_unlock_irqrestore(&current->sighand->siglock, flags);
+       spin_unlock_irqrestore(&sighand->siglock, flags);
+       rcu_read_unlock();
 }
 
 /*
@@ -1416,7 +1411,7 @@ err_bad_rpc:
        /* Only RPCv2 supported */
        xdr_stream_encode_u32(xdr, RPC_VERSION);
        xdr_stream_encode_u32(xdr, RPC_VERSION);
-       goto sendit;
+       return 1;       /* don't wrap */
 
 err_bad_auth:
        dprintk("svc: authentication failed (%d)\n",
@@ -1432,7 +1427,7 @@ err_bad_auth:
 err_bad_prog:
        dprintk("svc: unknown program %d\n", rqstp->rq_prog);
        serv->sv_stats->rpcbadfmt++;
-       xdr_stream_encode_u32(xdr, RPC_PROG_UNAVAIL);
+       *rqstp->rq_accept_statp = rpc_prog_unavail;
        goto sendit;
 
 err_bad_vers:
@@ -1440,7 +1435,12 @@ err_bad_vers:
                       rqstp->rq_vers, rqstp->rq_prog, progp->pg_name);
 
        serv->sv_stats->rpcbadfmt++;
-       xdr_stream_encode_u32(xdr, RPC_PROG_MISMATCH);
+       *rqstp->rq_accept_statp = rpc_prog_mismatch;
+
+       /*
+        * svc_authenticate() has already added the verifier and
+        * advanced the stream just past rq_accept_statp.
+        */
        xdr_stream_encode_u32(xdr, process.mismatch.lovers);
        xdr_stream_encode_u32(xdr, process.mismatch.hivers);
        goto sendit;
@@ -1449,19 +1449,19 @@ err_bad_proc:
        svc_printk(rqstp, "unknown procedure (%d)\n", rqstp->rq_proc);
 
        serv->sv_stats->rpcbadfmt++;
-       xdr_stream_encode_u32(xdr, RPC_PROC_UNAVAIL);
+       *rqstp->rq_accept_statp = rpc_proc_unavail;
        goto sendit;
 
 err_garbage_args:
        svc_printk(rqstp, "failed to decode RPC header\n");
 
        serv->sv_stats->rpcbadfmt++;
-       xdr_stream_encode_u32(xdr, RPC_GARBAGE_ARGS);
+       *rqstp->rq_accept_statp = rpc_garbage_args;
        goto sendit;
 
 err_system_err:
        serv->sv_stats->rpcbadfmt++;
-       xdr_stream_encode_u32(xdr, RPC_SYSTEM_ERR);
+       *rqstp->rq_accept_statp = rpc_system_err;
        goto sendit;
 }
 
index 84e5d7d..62c7919 100644 (file)
@@ -74,13 +74,18 @@ static LIST_HEAD(svc_xprt_class_list);
  *               that no other thread will be using the transport or will
  *               try to set XPT_DEAD.
  */
+
+/**
+ * svc_reg_xprt_class - Register a server-side RPC transport class
+ * @xcl: New transport class to be registered
+ *
+ * Returns zero on success; otherwise a negative errno is returned.
+ */
 int svc_reg_xprt_class(struct svc_xprt_class *xcl)
 {
        struct svc_xprt_class *cl;
        int res = -EEXIST;
 
-       dprintk("svc: Adding svc transport class '%s'\n", xcl->xcl_name);
-
        INIT_LIST_HEAD(&xcl->xcl_list);
        spin_lock(&svc_xprt_class_lock);
        /* Make sure there isn't already a class with the same name */
@@ -96,9 +101,13 @@ out:
 }
 EXPORT_SYMBOL_GPL(svc_reg_xprt_class);
 
+/**
+ * svc_unreg_xprt_class - Unregister a server-side RPC transport class
+ * @xcl: Transport class to be unregistered
+ *
+ */
 void svc_unreg_xprt_class(struct svc_xprt_class *xcl)
 {
-       dprintk("svc: Removing svc transport class '%s'\n", xcl->xcl_name);
        spin_lock(&svc_xprt_class_lock);
        list_del_init(&xcl->xcl_list);
        spin_unlock(&svc_xprt_class_lock);
@@ -532,13 +541,23 @@ void svc_reserve(struct svc_rqst *rqstp, int space)
 }
 EXPORT_SYMBOL_GPL(svc_reserve);
 
+static void free_deferred(struct svc_xprt *xprt, struct svc_deferred_req *dr)
+{
+       if (!dr)
+               return;
+
+       xprt->xpt_ops->xpo_release_ctxt(xprt, dr->xprt_ctxt);
+       kfree(dr);
+}
+
 static void svc_xprt_release(struct svc_rqst *rqstp)
 {
        struct svc_xprt *xprt = rqstp->rq_xprt;
 
-       xprt->xpt_ops->xpo_release_rqst(rqstp);
+       xprt->xpt_ops->xpo_release_ctxt(xprt, rqstp->rq_xprt_ctxt);
+       rqstp->rq_xprt_ctxt = NULL;
 
-       kfree(rqstp->rq_deferred);
+       free_deferred(xprt, rqstp->rq_deferred);
        rqstp->rq_deferred = NULL;
 
        svc_rqst_release_pages(rqstp);
@@ -675,8 +694,9 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
        }
 
        for (filled = 0; filled < pages; filled = ret) {
-               ret = alloc_pages_bulk_array(GFP_KERNEL, pages,
-                                            rqstp->rq_pages);
+               ret = alloc_pages_bulk_array_node(GFP_KERNEL,
+                                                 rqstp->rq_pool->sp_id,
+                                                 pages, rqstp->rq_pages);
                if (ret > filled)
                        /* Made progress, don't sleep yet */
                        continue;
@@ -833,15 +853,11 @@ static int svc_handle_xprt(struct svc_rqst *rqstp, struct svc_xprt *xprt)
                svc_xprt_received(xprt);
        } else if (svc_xprt_reserve_slot(rqstp, xprt)) {
                /* XPT_DATA|XPT_DEFERRED case: */
-               dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
-                       rqstp, rqstp->rq_pool->sp_id, xprt,
-                       kref_read(&xprt->xpt_ref));
                rqstp->rq_deferred = svc_deferred_dequeue(xprt);
                if (rqstp->rq_deferred)
                        len = svc_deferred_recv(rqstp);
                else
                        len = xprt->xpt_ops->xpo_recvfrom(rqstp);
-               rqstp->rq_stime = ktime_get();
                rqstp->rq_reserved = serv->sv_max_mesg;
                atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
        } else
@@ -884,6 +900,7 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
        err = -EAGAIN;
        if (len <= 0)
                goto out_release;
+
        trace_svc_xdr_recvfrom(&rqstp->rq_arg);
 
        clear_bit(XPT_OLD, &xprt->xpt_flags);
@@ -892,6 +909,7 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
 
        if (serv->sv_stats)
                serv->sv_stats->netcnt++;
+       rqstp->rq_stime = ktime_get();
        return len;
 out_release:
        rqstp->rq_res.len = 0;
@@ -1054,7 +1072,7 @@ static void svc_delete_xprt(struct svc_xprt *xprt)
        spin_unlock_bh(&serv->sv_lock);
 
        while ((dr = svc_deferred_dequeue(xprt)) != NULL)
-               kfree(dr);
+               free_deferred(xprt, dr);
 
        call_xpt_users(xprt);
        svc_xprt_put(xprt);
@@ -1176,8 +1194,8 @@ static void svc_revisit(struct cache_deferred_req *dreq, int too_many)
        if (too_many || test_bit(XPT_DEAD, &xprt->xpt_flags)) {
                spin_unlock(&xprt->xpt_lock);
                trace_svc_defer_drop(dr);
+               free_deferred(xprt, dr);
                svc_xprt_put(xprt);
-               kfree(dr);
                return;
        }
        dr->xprt = NULL;
@@ -1222,14 +1240,14 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req)
                dr->addrlen = rqstp->rq_addrlen;
                dr->daddr = rqstp->rq_daddr;
                dr->argslen = rqstp->rq_arg.len >> 2;
-               dr->xprt_ctxt = rqstp->rq_xprt_ctxt;
-               rqstp->rq_xprt_ctxt = NULL;
 
                /* back up head to the start of the buffer and copy */
                skip = rqstp->rq_arg.len - rqstp->rq_arg.head[0].iov_len;
                memcpy(dr->args, rqstp->rq_arg.head[0].iov_base - skip,
                       dr->argslen << 2);
        }
+       dr->xprt_ctxt = rqstp->rq_xprt_ctxt;
+       rqstp->rq_xprt_ctxt = NULL;
        trace_svc_defer(rqstp);
        svc_xprt_get(rqstp->rq_xprt);
        dr->xprt = rqstp->rq_xprt;
@@ -1262,6 +1280,8 @@ static noinline int svc_deferred_recv(struct svc_rqst *rqstp)
        rqstp->rq_daddr       = dr->daddr;
        rqstp->rq_respages    = rqstp->rq_pages;
        rqstp->rq_xprt_ctxt   = dr->xprt_ctxt;
+
+       dr->xprt_ctxt = NULL;
        svc_xprt_received(rqstp->rq_xprt);
        return dr->argslen << 2;
 }
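
The svc_alloc_arg() hunk above fills the whole rq_pages array with
alloc_pages_bulk_array_node() so receive pages are allocated on the node
associated with the rqst's pool. Stripped of the sunrpc specifics, the retry
idiom looks roughly like this (want, pages and nid are illustrative local
names; the real code also checks for signals instead of looping forever):

	unsigned long filled = 0;

	while (filled < want) {
		unsigned long got;

		/* Populates only the still-NULL slots of pages[0..want-1]. */
		got = alloc_pages_bulk_array_node(GFP_KERNEL, nid, want, pages);
		if (got > filled) {
			filled = got;		/* progress, try again at once */
			continue;
		}
		/* No progress: give reclaim a moment before retrying. */
		schedule_timeout_uninterruptible(msecs_to_jiffies(500));
	}
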
index a51c9b9..5f519fc 100644 (file)
@@ -121,27 +121,27 @@ static void svc_reclassify_socket(struct socket *sock)
 #endif
 
 /**
- * svc_tcp_release_rqst - Release transport-related resources
- * @rqstp: request structure with resources to be released
+ * svc_tcp_release_ctxt - Release transport-related resources
+ * @xprt: the transport which owned the context
+ * @ctxt: the context from rqstp->rq_xprt_ctxt or dr->xprt_ctxt
  *
  */
-static void svc_tcp_release_rqst(struct svc_rqst *rqstp)
+static void svc_tcp_release_ctxt(struct svc_xprt *xprt, void *ctxt)
 {
 }
 
 /**
- * svc_udp_release_rqst - Release transport-related resources
- * @rqstp: request structure with resources to be released
+ * svc_udp_release_ctxt - Release transport-related resources
+ * @xprt: the transport which owned the context
+ * @ctxt: the context from rqstp->rq_xprt_ctxt or dr->xprt_ctxt
  *
  */
-static void svc_udp_release_rqst(struct svc_rqst *rqstp)
+static void svc_udp_release_ctxt(struct svc_xprt *xprt, void *ctxt)
 {
-       struct sk_buff *skb = rqstp->rq_xprt_ctxt;
+       struct sk_buff *skb = ctxt;
 
-       if (skb) {
-               rqstp->rq_xprt_ctxt = NULL;
+       if (skb)
                consume_skb(skb);
-       }
 }
 
 union svc_pktinfo_u {
@@ -696,7 +696,8 @@ static int svc_udp_sendto(struct svc_rqst *rqstp)
        unsigned int sent;
        int err;
 
-       svc_udp_release_rqst(rqstp);
+       svc_udp_release_ctxt(xprt, rqstp->rq_xprt_ctxt);
+       rqstp->rq_xprt_ctxt = NULL;
 
        svc_set_cmsg_data(rqstp, cmh);
 
@@ -768,7 +769,7 @@ static const struct svc_xprt_ops svc_udp_ops = {
        .xpo_recvfrom = svc_udp_recvfrom,
        .xpo_sendto = svc_udp_sendto,
        .xpo_result_payload = svc_sock_result_payload,
-       .xpo_release_rqst = svc_udp_release_rqst,
+       .xpo_release_ctxt = svc_udp_release_ctxt,
        .xpo_detach = svc_sock_detach,
        .xpo_free = svc_sock_free,
        .xpo_has_wspace = svc_udp_has_wspace,
@@ -825,12 +826,6 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
 
        trace_sk_data_ready(sk);
 
-       if (svsk) {
-               /* Refer to svc_setup_socket() for details. */
-               rmb();
-               svsk->sk_odata(sk);
-       }
-
        /*
         * This callback may be called twice when a new connection
         * is established as a child socket inherits everything
@@ -839,13 +834,18 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
         *    when one of child sockets become ESTABLISHED.
         * 2) data_ready method of the child socket may be called
         *    when it receives data before the socket is accepted.
-        * In case of 2, we should ignore it silently.
+        * In case of 2, we should ignore it silently and DO NOT
+        * dereference svsk.
         */
-       if (sk->sk_state == TCP_LISTEN) {
-               if (svsk) {
-                       set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
-                       svc_xprt_enqueue(&svsk->sk_xprt);
-               }
+       if (sk->sk_state != TCP_LISTEN)
+               return;
+
+       if (svsk) {
+               /* Refer to svc_setup_socket() for details. */
+               rmb();
+               svsk->sk_odata(sk);
+               set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
+               svc_xprt_enqueue(&svsk->sk_xprt);
        }
 }
 
@@ -886,15 +886,13 @@ static struct svc_xprt *svc_tcp_accept(struct svc_xprt *xprt)
        clear_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
        err = kernel_accept(sock, &newsock, O_NONBLOCK);
        if (err < 0) {
-               if (err == -ENOMEM)
-                       printk(KERN_WARNING "%s: no more sockets!\n",
-                              serv->sv_name);
-               else if (err != -EAGAIN)
-                       net_warn_ratelimited("%s: accept failed (err %d)!\n",
-                                            serv->sv_name, -err);
-               trace_svcsock_accept_err(xprt, serv->sv_name, err);
+               if (err != -EAGAIN)
+                       trace_svcsock_accept_err(xprt, serv->sv_name, err);
                return NULL;
        }
+       if (IS_ERR(sock_alloc_file(newsock, O_NONBLOCK, NULL)))
+               return NULL;
+
        set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
 
        err = kernel_getpeername(newsock, sin);
@@ -935,7 +933,7 @@ static struct svc_xprt *svc_tcp_accept(struct svc_xprt *xprt)
        return &newsvsk->sk_xprt;
 
 failed:
-       sock_release(newsock);
+       sockfd_put(newsock);
        return NULL;
 }
 
@@ -1298,7 +1296,8 @@ static int svc_tcp_sendto(struct svc_rqst *rqstp)
        unsigned int sent;
        int err;
 
-       svc_tcp_release_rqst(rqstp);
+       svc_tcp_release_ctxt(xprt, rqstp->rq_xprt_ctxt);
+       rqstp->rq_xprt_ctxt = NULL;
 
        atomic_inc(&svsk->sk_sendqlen);
        mutex_lock(&xprt->xpt_mutex);
@@ -1343,7 +1342,7 @@ static const struct svc_xprt_ops svc_tcp_ops = {
        .xpo_recvfrom = svc_tcp_recvfrom,
        .xpo_sendto = svc_tcp_sendto,
        .xpo_result_payload = svc_sock_result_payload,
-       .xpo_release_rqst = svc_tcp_release_rqst,
+       .xpo_release_ctxt = svc_tcp_release_ctxt,
        .xpo_detach = svc_tcp_sock_detach,
        .xpo_free = svc_sock_free,
        .xpo_has_wspace = svc_tcp_has_wspace,
@@ -1430,7 +1429,6 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
                                                struct socket *sock,
                                                int flags)
 {
-       struct file     *filp = NULL;
        struct svc_sock *svsk;
        struct sock     *inet;
        int             pmap_register = !(flags & SVC_SOCK_ANONYMOUS);
@@ -1439,14 +1437,6 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
        if (!svsk)
                return ERR_PTR(-ENOMEM);
 
-       if (!sock->file) {
-               filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
-               if (IS_ERR(filp)) {
-                       kfree(svsk);
-                       return ERR_CAST(filp);
-               }
-       }
-
        inet = sock->sk;
 
        if (pmap_register) {
@@ -1456,8 +1446,6 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
                                     inet->sk_protocol,
                                     ntohs(inet_sk(inet)->inet_sport));
                if (err < 0) {
-                       if (filp)
-                               fput(filp);
                        kfree(svsk);
                        return ERR_PTR(err);
                }
@@ -1470,7 +1458,7 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
        svsk->sk_owspace = inet->sk_write_space;
        /*
         * This barrier is necessary in order to prevent race condition
-        * with svc_data_ready(), svc_listen_data_ready() and others
+        * with svc_data_ready(), svc_tcp_listen_data_ready(), and others
         * when calling callbacks above.
         */
        wmb();
@@ -1482,29 +1470,14 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
        else
                svc_tcp_init(svsk, serv);
 
-       trace_svcsock_new_socket(sock);
+       trace_svcsock_new(svsk, sock);
        return svsk;
 }
 
-bool svc_alien_sock(struct net *net, int fd)
-{
-       int err;
-       struct socket *sock = sockfd_lookup(fd, &err);
-       bool ret = false;
-
-       if (!sock)
-               goto out;
-       if (sock_net(sock->sk) != net)
-               ret = true;
-       sockfd_put(sock);
-out:
-       return ret;
-}
-EXPORT_SYMBOL_GPL(svc_alien_sock);
-
 /**
  * svc_addsock - add a listener socket to an RPC service
  * @serv: pointer to RPC service to which to add a new listener
+ * @net: caller's network namespace
  * @fd: file descriptor of the new listener
  * @name_return: pointer to buffer to fill in with name of listener
  * @len: size of the buffer
@@ -1514,8 +1487,8 @@ EXPORT_SYMBOL_GPL(svc_alien_sock);
  * Name is terminated with '\n'.  On error, returns a negative errno
  * value.
  */
-int svc_addsock(struct svc_serv *serv, const int fd, char *name_return,
-               const size_t len, const struct cred *cred)
+int svc_addsock(struct svc_serv *serv, struct net *net, const int fd,
+               char *name_return, const size_t len, const struct cred *cred)
 {
        int err = 0;
        struct socket *so = sockfd_lookup(fd, &err);
@@ -1526,6 +1499,9 @@ int svc_addsock(struct svc_serv *serv, const int fd, char *name_return,
 
        if (!so)
                return err;
+       err = -EINVAL;
+       if (sock_net(so->sk) != net)
+               goto out;
        err = -EAFNOSUPPORT;
        if ((so->sk->sk_family != PF_INET) && (so->sk->sk_family != PF_INET6))
                goto out;
@@ -1675,6 +1651,8 @@ static void svc_sock_free(struct svc_xprt *xprt)
        struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
        struct socket *sock = svsk->sk_sock;
 
+       trace_svcsock_free(svsk, sock);
+
        tls_handshake_cancel(sock->sk);
        if (sock->file)
                sockfd_put(sock);
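
With svc_alien_sock() removed, svc_addsock() takes the caller's netns and does
the cross-namespace check itself. The general shape of validating an
fd-supplied socket against a known namespace (a condensed sketch, not the
exact svc_addsock() body):

	struct socket *sock;
	int err = 0;

	sock = sockfd_lookup(fd, &err);
	if (!sock)
		return err;			/* not a socket fd */

	err = -EINVAL;
	if (sock_net(sock->sk) != net)		/* wrong network namespace */
		goto out_put;

	err = 0;
	/* ... hand the socket over to the service ... */
out_put:
	sockfd_put(sock);
	return err;
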
index 36835b2..2a22e78 100644 (file)
@@ -1070,22 +1070,22 @@ __be32 * xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes)
 }
 EXPORT_SYMBOL_GPL(xdr_reserve_space);
 
-
 /**
  * xdr_reserve_space_vec - Reserves a large amount of buffer space for sending
  * @xdr: pointer to xdr_stream
- * @vec: pointer to a kvec array
  * @nbytes: number of bytes to reserve
  *
- * Reserves enough buffer space to encode 'nbytes' of data and stores the
- * pointers in 'vec'. The size argument passed to xdr_reserve_space() is
- * determined based on the number of bytes remaining in the current page to
- * avoid invalidating iov_base pointers when xdr_commit_encode() is called.
+ * The size argument passed to xdr_reserve_space() is determined based
+ * on the number of bytes remaining in the current page to avoid
+ * invalidating iov_base pointers when xdr_commit_encode() is called.
+ *
+ * Return values:
+ *   %0: success
+ *   %-EMSGSIZE: not enough space is available in @xdr
  */
-int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec, size_t nbytes)
+int xdr_reserve_space_vec(struct xdr_stream *xdr, size_t nbytes)
 {
-       int thislen;
-       int v = 0;
+       size_t thislen;
        __be32 *p;
 
        /*
@@ -1097,21 +1097,19 @@ int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec, size_t nbyte
                xdr->end = xdr->p;
        }
 
+       /* XXX: Let's find a way to make this more efficient */
        while (nbytes) {
                thislen = xdr->buf->page_len % PAGE_SIZE;
                thislen = min_t(size_t, nbytes, PAGE_SIZE - thislen);
 
                p = xdr_reserve_space(xdr, thislen);
                if (!p)
-                       return -EIO;
+                       return -EMSGSIZE;
 
-               vec[v].iov_base = p;
-               vec[v].iov_len = thislen;
-               v++;
                nbytes -= thislen;
        }
 
-       return v;
+       return 0;
 }
 EXPORT_SYMBOL_GPL(xdr_reserve_space_vec);
 
index aa2227a..7420a2c 100644 (file)
@@ -93,13 +93,7 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
         */
        get_page(virt_to_page(rqst->rq_buffer));
        sctxt->sc_send_wr.opcode = IB_WR_SEND;
-       ret = svc_rdma_send(rdma, sctxt);
-       if (ret < 0)
-               return ret;
-
-       ret = wait_for_completion_killable(&sctxt->sc_done);
-       svc_rdma_send_ctxt_put(rdma, sctxt);
-       return ret;
+       return svc_rdma_send(rdma, sctxt);
 }
 
 /* Server-side transport endpoint wants a whole page for its send
index 1c658fa..85c8bca 100644 (file)
@@ -125,14 +125,15 @@ static void svc_rdma_recv_cid_init(struct svcxprt_rdma *rdma,
 static struct svc_rdma_recv_ctxt *
 svc_rdma_recv_ctxt_alloc(struct svcxprt_rdma *rdma)
 {
+       int node = ibdev_to_node(rdma->sc_cm_id->device);
        struct svc_rdma_recv_ctxt *ctxt;
        dma_addr_t addr;
        void *buffer;
 
-       ctxt = kmalloc(sizeof(*ctxt), GFP_KERNEL);
+       ctxt = kmalloc_node(sizeof(*ctxt), GFP_KERNEL, node);
        if (!ctxt)
                goto fail0;
-       buffer = kmalloc(rdma->sc_max_req_size, GFP_KERNEL);
+       buffer = kmalloc_node(rdma->sc_max_req_size, GFP_KERNEL, node);
        if (!buffer)
                goto fail1;
        addr = ib_dma_map_single(rdma->sc_pd->device, buffer,
@@ -155,7 +156,6 @@ svc_rdma_recv_ctxt_alloc(struct svcxprt_rdma *rdma)
        ctxt->rc_recv_sge.length = rdma->sc_max_req_size;
        ctxt->rc_recv_sge.lkey = rdma->sc_pd->local_dma_lkey;
        ctxt->rc_recv_buf = buffer;
-       ctxt->rc_temp = false;
        return ctxt;
 
 fail2:
@@ -232,34 +232,30 @@ void svc_rdma_recv_ctxt_put(struct svcxprt_rdma *rdma,
        pcl_free(&ctxt->rc_write_pcl);
        pcl_free(&ctxt->rc_reply_pcl);
 
-       if (!ctxt->rc_temp)
-               llist_add(&ctxt->rc_node, &rdma->sc_recv_ctxts);
-       else
-               svc_rdma_recv_ctxt_destroy(rdma, ctxt);
+       llist_add(&ctxt->rc_node, &rdma->sc_recv_ctxts);
 }
 
 /**
- * svc_rdma_release_rqst - Release transport-specific per-rqst resources
- * @rqstp: svc_rqst being released
+ * svc_rdma_release_ctxt - Release transport-specific per-rqst resources
+ * @xprt: the transport which owned the context
+ * @vctxt: the context from rqstp->rq_xprt_ctxt or dr->xprt_ctxt
  *
  * Ensure that the recv_ctxt is released whether or not a Reply
  * was sent. For example, the client could close the connection,
  * or svc_process could drop an RPC, before the Reply is sent.
  */
-void svc_rdma_release_rqst(struct svc_rqst *rqstp)
+void svc_rdma_release_ctxt(struct svc_xprt *xprt, void *vctxt)
 {
-       struct svc_rdma_recv_ctxt *ctxt = rqstp->rq_xprt_ctxt;
-       struct svc_xprt *xprt = rqstp->rq_xprt;
+       struct svc_rdma_recv_ctxt *ctxt = vctxt;
        struct svcxprt_rdma *rdma =
                container_of(xprt, struct svcxprt_rdma, sc_xprt);
 
-       rqstp->rq_xprt_ctxt = NULL;
        if (ctxt)
                svc_rdma_recv_ctxt_put(rdma, ctxt);
 }
 
 static bool svc_rdma_refresh_recvs(struct svcxprt_rdma *rdma,
-                                  unsigned int wanted, bool temp)
+                                  unsigned int wanted)
 {
        const struct ib_recv_wr *bad_wr = NULL;
        struct svc_rdma_recv_ctxt *ctxt;
@@ -276,7 +272,6 @@ static bool svc_rdma_refresh_recvs(struct svcxprt_rdma *rdma,
                        break;
 
                trace_svcrdma_post_recv(ctxt);
-               ctxt->rc_temp = temp;
                ctxt->rc_recv_wr.next = recv_chain;
                recv_chain = &ctxt->rc_recv_wr;
                rdma->sc_pending_recvs++;
@@ -310,7 +305,7 @@ err_free:
  */
 bool svc_rdma_post_recvs(struct svcxprt_rdma *rdma)
 {
-       return svc_rdma_refresh_recvs(rdma, rdma->sc_max_requests, true);
+       return svc_rdma_refresh_recvs(rdma, rdma->sc_max_requests);
 }
 
 /**
@@ -344,7 +339,7 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
         * client reconnects.
         */
        if (rdma->sc_pending_recvs < rdma->sc_max_requests)
-               if (!svc_rdma_refresh_recvs(rdma, rdma->sc_recv_batch, false))
+               if (!svc_rdma_refresh_recvs(rdma, rdma->sc_recv_batch))
                        goto dropped;
 
        /* All wc fields are now known to be valid */
@@ -776,9 +771,6 @@ static bool svc_rdma_is_reverse_direction_reply(struct svc_xprt *xprt,
  *
  * The next ctxt is removed from the "receive" lists.
  *
- * - If the ctxt completes a Read, then finish assembling the Call
- *   message and return the number of bytes in the message.
- *
  * - If the ctxt completes a Receive, then construct the Call
  *   message from the contents of the Receive buffer.
  *
@@ -787,7 +779,8 @@ static bool svc_rdma_is_reverse_direction_reply(struct svc_xprt *xprt,
  *     in the message.
  *
  *   - If there are Read chunks in this message, post Read WRs to
- *     pull that payload and return 0.
+ *     pull that payload. When the Read WRs complete, build the
+ *     full message and return the number of bytes in it.
  */
 int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 {
@@ -797,6 +790,12 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
        struct svc_rdma_recv_ctxt *ctxt;
        int ret;
 
+       /* Prevent svc_xprt_release() from releasing pages in rq_pages
+        * when returning 0 or an error.
+        */
+       rqstp->rq_respages = rqstp->rq_pages;
+       rqstp->rq_next_page = rqstp->rq_respages;
+
        rqstp->rq_xprt_ctxt = NULL;
 
        ctxt = NULL;
@@ -820,12 +819,6 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
                                   DMA_FROM_DEVICE);
        svc_rdma_build_arg_xdr(rqstp, ctxt);
 
-       /* Prevent svc_xprt_release from releasing pages in rq_pages
-        * if we return 0 or an error.
-        */
-       rqstp->rq_respages = rqstp->rq_pages;
-       rqstp->rq_next_page = rqstp->rq_respages;
-
        ret = svc_rdma_xdr_decode_req(&rqstp->rq_arg, ctxt);
        if (ret < 0)
                goto out_err;
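
This and the following svcrdma hunks share one theme: per-connection buffers
are allocated on the NUMA node of the underlying RDMA device. The core of the
pattern, sketched with placeholder names (struct my_ctxt and cm_id are
illustrative):

	/* Resolve the device's home node once, then allocate on it. */
	int node = ibdev_to_node(cm_id->device);
	struct my_ctxt *ctxt;

	ctxt = kmalloc_node(sizeof(*ctxt), GFP_KERNEL, node);
	if (!ctxt)
		return NULL;

A listener has no device yet, which is why svc_rdma_create() further down
passes NUMA_NO_NODE instead.
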
index 11cf7c6..e460e25 100644 (file)
@@ -62,8 +62,8 @@ svc_rdma_get_rw_ctxt(struct svcxprt_rdma *rdma, unsigned int sges)
        if (node) {
                ctxt = llist_entry(node, struct svc_rdma_rw_ctxt, rw_node);
        } else {
-               ctxt = kmalloc(struct_size(ctxt, rw_first_sgl, SG_CHUNK_SIZE),
-                              GFP_KERNEL);
+               ctxt = kmalloc_node(struct_size(ctxt, rw_first_sgl, SG_CHUNK_SIZE),
+                                   GFP_KERNEL, ibdev_to_node(rdma->sc_cm_id->device));
                if (!ctxt)
                        goto out_noctx;
 
@@ -84,8 +84,7 @@ out_noctx:
        return NULL;
 }
 
-static void __svc_rdma_put_rw_ctxt(struct svcxprt_rdma *rdma,
-                                  struct svc_rdma_rw_ctxt *ctxt,
+static void __svc_rdma_put_rw_ctxt(struct svc_rdma_rw_ctxt *ctxt,
                                   struct llist_head *list)
 {
        sg_free_table_chained(&ctxt->rw_sg_table, SG_CHUNK_SIZE);
@@ -95,7 +94,7 @@ static void __svc_rdma_put_rw_ctxt(struct svcxprt_rdma *rdma,
 static void svc_rdma_put_rw_ctxt(struct svcxprt_rdma *rdma,
                                 struct svc_rdma_rw_ctxt *ctxt)
 {
-       __svc_rdma_put_rw_ctxt(rdma, ctxt, &rdma->sc_rw_ctxts);
+       __svc_rdma_put_rw_ctxt(ctxt, &rdma->sc_rw_ctxts);
 }
 
 /**
@@ -191,6 +190,8 @@ static void svc_rdma_cc_release(struct svc_rdma_chunk_ctxt *cc,
        struct svc_rdma_rw_ctxt *ctxt;
        LLIST_HEAD(free);
 
+       trace_svcrdma_cc_release(&cc->cc_cid, cc->cc_sqecount);
+
        first = last = NULL;
        while ((ctxt = svc_rdma_next_ctxt(&cc->cc_rwctxts)) != NULL) {
                list_del(&ctxt->rw_list);
@@ -198,7 +199,7 @@ static void svc_rdma_cc_release(struct svc_rdma_chunk_ctxt *cc,
                rdma_rw_ctx_destroy(&ctxt->rw_ctx, rdma->sc_qp,
                                    rdma->sc_port_num, ctxt->rw_sg_table.sgl,
                                    ctxt->rw_nents, dir);
-               __svc_rdma_put_rw_ctxt(rdma, ctxt, &free);
+               __svc_rdma_put_rw_ctxt(ctxt, &free);
 
                ctxt->rw_node.next = first;
                first = &ctxt->rw_node;
@@ -234,7 +235,8 @@ svc_rdma_write_info_alloc(struct svcxprt_rdma *rdma,
 {
        struct svc_rdma_write_info *info;
 
-       info = kmalloc(sizeof(*info), GFP_KERNEL);
+       info = kmalloc_node(sizeof(*info), GFP_KERNEL,
+                           ibdev_to_node(rdma->sc_cm_id->device));
        if (!info)
                return info;
 
@@ -304,7 +306,8 @@ svc_rdma_read_info_alloc(struct svcxprt_rdma *rdma)
 {
        struct svc_rdma_read_info *info;
 
-       info = kmalloc(sizeof(*info), GFP_KERNEL);
+       info = kmalloc_node(sizeof(*info), GFP_KERNEL,
+                           ibdev_to_node(rdma->sc_cm_id->device));
        if (!info)
                return info;
 
@@ -351,8 +354,7 @@ static void svc_rdma_wc_read_done(struct ib_cq *cq, struct ib_wc *wc)
        return;
 }
 
-/* This function sleeps when the transport's Send Queue is congested.
- *
+/*
  * Assumptions:
  * - If ib_post_send() succeeds, only one completion is expected,
  *   even if one or more WRs are flushed. This is true when posting
@@ -367,6 +369,8 @@ static int svc_rdma_post_chunk_ctxt(struct svc_rdma_chunk_ctxt *cc)
        struct ib_cqe *cqe;
        int ret;
 
+       might_sleep();
+
        if (cc->cc_sqecount > rdma->sc_sq_depth)
                return -EINVAL;
 
index 22a871e..c6644cc 100644 (file)
@@ -123,18 +123,17 @@ static void svc_rdma_send_cid_init(struct svcxprt_rdma *rdma,
 static struct svc_rdma_send_ctxt *
 svc_rdma_send_ctxt_alloc(struct svcxprt_rdma *rdma)
 {
+       int node = ibdev_to_node(rdma->sc_cm_id->device);
        struct svc_rdma_send_ctxt *ctxt;
        dma_addr_t addr;
        void *buffer;
-       size_t size;
        int i;
 
-       size = sizeof(*ctxt);
-       size += rdma->sc_max_send_sges * sizeof(struct ib_sge);
-       ctxt = kmalloc(size, GFP_KERNEL);
+       ctxt = kmalloc_node(struct_size(ctxt, sc_sges, rdma->sc_max_send_sges),
+                           GFP_KERNEL, node);
        if (!ctxt)
                goto fail0;
-       buffer = kmalloc(rdma->sc_max_req_size, GFP_KERNEL);
+       buffer = kmalloc_node(rdma->sc_max_req_size, GFP_KERNEL, node);
        if (!buffer)
                goto fail1;
        addr = ib_dma_map_single(rdma->sc_pd->device, buffer,
@@ -148,7 +147,6 @@ svc_rdma_send_ctxt_alloc(struct svcxprt_rdma *rdma)
        ctxt->sc_send_wr.wr_cqe = &ctxt->sc_cqe;
        ctxt->sc_send_wr.sg_list = ctxt->sc_sges;
        ctxt->sc_send_wr.send_flags = IB_SEND_SIGNALED;
-       init_completion(&ctxt->sc_done);
        ctxt->sc_cqe.done = svc_rdma_wc_send;
        ctxt->sc_xprt_buf = buffer;
        xdr_buf_init(&ctxt->sc_hdrbuf, ctxt->sc_xprt_buf,
@@ -214,6 +212,7 @@ out:
 
        ctxt->sc_send_wr.num_sge = 0;
        ctxt->sc_cur_sge_no = 0;
+       ctxt->sc_page_count = 0;
        return ctxt;
 
 out_empty:
@@ -228,6 +227,8 @@ out_empty:
  * svc_rdma_send_ctxt_put - Return send_ctxt to free list
  * @rdma: controlling svcxprt_rdma
  * @ctxt: object to return to the free list
+ *
+ * Pages left in sc_pages are DMA unmapped and released.
  */
 void svc_rdma_send_ctxt_put(struct svcxprt_rdma *rdma,
                            struct svc_rdma_send_ctxt *ctxt)
@@ -235,6 +236,9 @@ void svc_rdma_send_ctxt_put(struct svcxprt_rdma *rdma,
        struct ib_device *device = rdma->sc_cm_id->device;
        unsigned int i;
 
+       if (ctxt->sc_page_count)
+               release_pages(ctxt->sc_pages, ctxt->sc_page_count);
+
        /* The first SGE contains the transport header, which
         * remains mapped until @ctxt is destroyed.
         */
@@ -281,12 +285,12 @@ static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
                container_of(cqe, struct svc_rdma_send_ctxt, sc_cqe);
 
        svc_rdma_wake_send_waiters(rdma, 1);
-       complete(&ctxt->sc_done);
 
        if (unlikely(wc->status != IB_WC_SUCCESS))
                goto flushed;
 
        trace_svcrdma_wc_send(wc, &ctxt->sc_cid);
+       svc_rdma_send_ctxt_put(rdma, ctxt);
        return;
 
 flushed:
@@ -294,6 +298,7 @@ flushed:
                trace_svcrdma_wc_send_err(wc, &ctxt->sc_cid);
        else
                trace_svcrdma_wc_send_flush(wc, &ctxt->sc_cid);
+       svc_rdma_send_ctxt_put(rdma, ctxt);
        svc_xprt_deferred_close(&rdma->sc_xprt);
 }
 
@@ -310,7 +315,7 @@ int svc_rdma_send(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt)
        struct ib_send_wr *wr = &ctxt->sc_send_wr;
        int ret;
 
-       reinit_completion(&ctxt->sc_done);
+       might_sleep();
 
        /* Sync the transport header buffer */
        ib_dma_sync_single_for_device(rdma->sc_pd->device,
@@ -799,6 +804,25 @@ int svc_rdma_map_reply_msg(struct svcxprt_rdma *rdma,
                                       svc_rdma_xb_dma_map, &args);
 }
 
+/* The svc_rqst and all resources it owns are released as soon as
+ * svc_rdma_sendto returns. Transfer pages under I/O to the ctxt
+ * so they are released by the Send completion handler.
+ */
+static void svc_rdma_save_io_pages(struct svc_rqst *rqstp,
+                                  struct svc_rdma_send_ctxt *ctxt)
+{
+       int i, pages = rqstp->rq_next_page - rqstp->rq_respages;
+
+       ctxt->sc_page_count += pages;
+       for (i = 0; i < pages; i++) {
+               ctxt->sc_pages[i] = rqstp->rq_respages[i];
+               rqstp->rq_respages[i] = NULL;
+       }
+
+       /* Prevent svc_xprt_release from releasing pages in rq_pages */
+       rqstp->rq_next_page = rqstp->rq_respages;
+}
+
 /* Prepare the portion of the RPC Reply that will be transmitted
  * via RDMA Send. The RPC-over-RDMA transport header is prepared
  * in sc_sges[0], and the RPC xdr_buf is prepared in following sges.
@@ -828,6 +852,8 @@ static int svc_rdma_send_reply_msg(struct svcxprt_rdma *rdma,
        if (ret < 0)
                return ret;
 
+       svc_rdma_save_io_pages(rqstp, sctxt);
+
        if (rctxt->rc_inv_rkey) {
                sctxt->sc_send_wr.opcode = IB_WR_SEND_WITH_INV;
                sctxt->sc_send_wr.ex.invalidate_rkey = rctxt->rc_inv_rkey;
@@ -835,13 +861,7 @@ static int svc_rdma_send_reply_msg(struct svcxprt_rdma *rdma,
                sctxt->sc_send_wr.opcode = IB_WR_SEND;
        }
 
-       ret = svc_rdma_send(rdma, sctxt);
-       if (ret < 0)
-               return ret;
-
-       ret = wait_for_completion_killable(&sctxt->sc_done);
-       svc_rdma_send_ctxt_put(rdma, sctxt);
-       return ret;
+       return svc_rdma_send(rdma, sctxt);
 }
 
 /**
@@ -907,8 +927,7 @@ void svc_rdma_send_error_msg(struct svcxprt_rdma *rdma,
        sctxt->sc_sges[0].length = sctxt->sc_hdrbuf.len;
        if (svc_rdma_send(rdma, sctxt))
                goto put_ctxt;
-
-       wait_for_completion_killable(&sctxt->sc_done);
+       return;
 
 put_ctxt:
        svc_rdma_send_ctxt_put(rdma, sctxt);
@@ -976,17 +995,16 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
        ret = svc_rdma_send_reply_msg(rdma, sctxt, rctxt, rqstp);
        if (ret < 0)
                goto put_ctxt;
-
-       /* Prevent svc_xprt_release() from releasing the page backing
-        * rq_res.head[0].iov_base. It's no longer being accessed by
-        * the I/O device. */
-       rqstp->rq_respages++;
        return 0;
 
 reply_chunk:
        if (ret != -E2BIG && ret != -EINVAL)
                goto put_ctxt;
 
+       /* Send completion releases payload pages that were part
+        * of previously posted RDMA Writes.
+        */
+       svc_rdma_save_io_pages(rqstp, sctxt);
        svc_rdma_send_error_msg(rdma, sctxt, rctxt, ret);
        return 0;
 
index 416b298..2abd895 100644 (file)
@@ -64,7 +64,7 @@
 #define RPCDBG_FACILITY        RPCDBG_SVCXPRT
 
 static struct svcxprt_rdma *svc_rdma_create_xprt(struct svc_serv *serv,
-                                                struct net *net);
+                                                struct net *net, int node);
 static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
                                        struct net *net,
                                        struct sockaddr *sa, int salen,
@@ -80,7 +80,7 @@ static const struct svc_xprt_ops svc_rdma_ops = {
        .xpo_recvfrom = svc_rdma_recvfrom,
        .xpo_sendto = svc_rdma_sendto,
        .xpo_result_payload = svc_rdma_result_payload,
-       .xpo_release_rqst = svc_rdma_release_rqst,
+       .xpo_release_ctxt = svc_rdma_release_ctxt,
        .xpo_detach = svc_rdma_detach,
        .xpo_free = svc_rdma_free,
        .xpo_has_wspace = svc_rdma_has_wspace,
@@ -123,14 +123,14 @@ static void qp_event_handler(struct ib_event *event, void *context)
 }
 
 static struct svcxprt_rdma *svc_rdma_create_xprt(struct svc_serv *serv,
-                                                struct net *net)
+                                                struct net *net, int node)
 {
-       struct svcxprt_rdma *cma_xprt = kzalloc(sizeof *cma_xprt, GFP_KERNEL);
+       struct svcxprt_rdma *cma_xprt;
 
-       if (!cma_xprt) {
-               dprintk("svcrdma: failed to create new transport\n");
+       cma_xprt = kzalloc_node(sizeof(*cma_xprt), GFP_KERNEL, node);
+       if (!cma_xprt)
                return NULL;
-       }
+
        svc_xprt_init(net, &svc_rdma_class, &cma_xprt->sc_xprt, serv);
        INIT_LIST_HEAD(&cma_xprt->sc_accept_q);
        INIT_LIST_HEAD(&cma_xprt->sc_rq_dto_q);
@@ -193,9 +193,9 @@ static void handle_connect_req(struct rdma_cm_id *new_cma_id,
        struct svcxprt_rdma *newxprt;
        struct sockaddr *sa;
 
-       /* Create a new transport */
        newxprt = svc_rdma_create_xprt(listen_xprt->sc_xprt.xpt_server,
-                                      listen_xprt->sc_xprt.xpt_net);
+                                      listen_xprt->sc_xprt.xpt_net,
+                                      ibdev_to_node(new_cma_id->device));
        if (!newxprt)
                return;
        newxprt->sc_cm_id = new_cma_id;
@@ -304,7 +304,7 @@ static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
 
        if (sa->sa_family != AF_INET && sa->sa_family != AF_INET6)
                return ERR_PTR(-EAFNOSUPPORT);
-       cma_xprt = svc_rdma_create_xprt(serv, net);
+       cma_xprt = svc_rdma_create_xprt(serv, net, NUMA_NO_NODE);
        if (!cma_xprt)
                return ERR_PTR(-ENOMEM);
        set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags);
index 35cac77..cdcd273 100644 (file)
@@ -541,6 +541,19 @@ int tipc_bearer_mtu(struct net *net, u32 bearer_id)
        return mtu;
 }
 
+int tipc_bearer_min_mtu(struct net *net, u32 bearer_id)
+{
+       int mtu = TIPC_MIN_BEARER_MTU;
+       struct tipc_bearer *b;
+
+       rcu_read_lock();
+       b = bearer_get(net, bearer_id);
+       if (b)
+               mtu += b->encap_hlen;
+       rcu_read_unlock();
+       return mtu;
+}
+
 /* tipc_bearer_xmit_skb - sends buffer to destination over bearer
  */
 void tipc_bearer_xmit_skb(struct net *net, u32 bearer_id,
@@ -1138,8 +1151,8 @@ int __tipc_nl_bearer_set(struct sk_buff *skb, struct genl_info *info)
                                return -EINVAL;
                        }
 #ifdef CONFIG_TIPC_MEDIA_UDP
-                       if (tipc_udp_mtu_bad(nla_get_u32
-                                            (props[TIPC_NLA_PROP_MTU]))) {
+                       if (nla_get_u32(props[TIPC_NLA_PROP_MTU]) <
+                           b->encap_hlen + TIPC_MIN_BEARER_MTU) {
                                NL_SET_ERR_MSG(info->extack,
                                               "MTU value is out-of-range");
                                return -EINVAL;
@@ -1245,7 +1258,7 @@ int tipc_nl_media_get(struct sk_buff *skb, struct genl_info *info)
        struct tipc_nl_msg msg;
        struct tipc_media *media;
        struct sk_buff *rep;
-       struct nlattr *attrs[TIPC_NLA_BEARER_MAX + 1];
+       struct nlattr *attrs[TIPC_NLA_MEDIA_MAX + 1];
 
        if (!info->attrs[TIPC_NLA_MEDIA])
                return -EINVAL;
@@ -1294,7 +1307,7 @@ int __tipc_nl_media_set(struct sk_buff *skb, struct genl_info *info)
        int err;
        char *name;
        struct tipc_media *m;
-       struct nlattr *attrs[TIPC_NLA_BEARER_MAX + 1];
+       struct nlattr *attrs[TIPC_NLA_MEDIA_MAX + 1];
 
        if (!info->attrs[TIPC_NLA_MEDIA])
                return -EINVAL;
index 490ad6e..bd0cc5c 100644 (file)
@@ -146,6 +146,7 @@ struct tipc_media {
  * @identity: array index of this bearer within TIPC bearer array
  * @disc: ptr to link setup request
  * @net_plane: network plane ('A' through 'H') currently associated with bearer
+ * @encap_hlen: encap headers length
  * @up: bearer up flag (bit 0)
  * @refcnt: tipc_bearer reference counter
  *
@@ -170,6 +171,7 @@ struct tipc_bearer {
        u32 identity;
        struct tipc_discoverer *disc;
        char net_plane;
+       u16 encap_hlen;
        unsigned long up;
        refcount_t refcnt;
 };
@@ -232,6 +234,7 @@ int tipc_bearer_setup(void);
 void tipc_bearer_cleanup(void);
 void tipc_bearer_stop(struct net *net);
 int tipc_bearer_mtu(struct net *net, u32 bearer_id);
+int tipc_bearer_min_mtu(struct net *net, u32 bearer_id);
 bool tipc_bearer_bcast_support(struct net *net, u32 bearer_id);
 void tipc_bearer_xmit_skb(struct net *net, u32 bearer_id,
                          struct sk_buff *skb,
index b3ce248..2eff1c7 100644 (file)
@@ -2200,7 +2200,7 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
        struct tipc_msg *hdr = buf_msg(skb);
        struct tipc_gap_ack_blks *ga = NULL;
        bool reply = msg_probe(hdr), retransmitted = false;
-       u32 dlen = msg_data_sz(hdr), glen = 0;
+       u32 dlen = msg_data_sz(hdr), glen = 0, msg_max;
        u16 peers_snd_nxt =  msg_next_sent(hdr);
        u16 peers_tol = msg_link_tolerance(hdr);
        u16 peers_prio = msg_linkprio(hdr);
@@ -2239,6 +2239,9 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
        switch (mtyp) {
        case RESET_MSG:
        case ACTIVATE_MSG:
+               msg_max = msg_max_pkt(hdr);
+               if (msg_max < tipc_bearer_min_mtu(l->net, l->bearer_id))
+                       break;
                /* Complete own link name with peer's interface name */
                if_name =  strrchr(l->name, ':') + 1;
                if (sizeof(l->name) - (if_name - l->name) <= TIPC_MAX_IF_NAME)
@@ -2283,8 +2286,8 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
                l->peer_session = msg_session(hdr);
                l->in_session = true;
                l->peer_bearer_id = msg_bearer_id(hdr);
-               if (l->mtu > msg_max_pkt(hdr))
-                       l->mtu = msg_max_pkt(hdr);
+               if (l->mtu > msg_max)
+                       l->mtu = msg_max;
                break;
 
        case STATE_MSG:
index 37edfe1..dd73d71 100644 (file)
@@ -314,9 +314,9 @@ static void tsk_rej_rx_queue(struct sock *sk, int error)
                tipc_sk_respond(sk, skb, error);
 }
 
-static bool tipc_sk_connected(struct sock *sk)
+static bool tipc_sk_connected(const struct sock *sk)
 {
-       return sk->sk_state == TIPC_ESTABLISHED;
+       return READ_ONCE(sk->sk_state) == TIPC_ESTABLISHED;
 }
 
 /* tipc_sk_type_connectionless - check if the socket is datagram socket
index c2bb818..0a85244 100644 (file)
@@ -738,8 +738,8 @@ static int tipc_udp_enable(struct net *net, struct tipc_bearer *b,
                        udp_conf.local_ip.s_addr = local.ipv4.s_addr;
                udp_conf.use_udp_checksums = false;
                ub->ifindex = dev->ifindex;
-               if (tipc_mtu_bad(dev, sizeof(struct iphdr) +
-                                     sizeof(struct udphdr))) {
+               b->encap_hlen = sizeof(struct iphdr) + sizeof(struct udphdr);
+               if (tipc_mtu_bad(dev, b->encap_hlen)) {
                        err = -EINVAL;
                        goto err;
                }
@@ -760,6 +760,7 @@ static int tipc_udp_enable(struct net *net, struct tipc_bearer *b,
                else
                        udp_conf.local_ip6 = local.ipv6;
                ub->ifindex = dev->ifindex;
+               b->encap_hlen = sizeof(struct ipv6hdr) + sizeof(struct udphdr);
                b->mtu = 1280;
 #endif
        } else {
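
As a worked example of the new bookkeeping: an IPv4 UDP bearer records
encap_hlen = 20 + 8 = 28 bytes (IP plus UDP header), an IPv6 one 40 + 8 = 48
bytes, and tipc_bearer_min_mtu() then reports TIPC_MIN_BEARER_MTU plus that
overhead when a peer's advertised MTU is validated.
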
index 804c388..0672aca 100644 (file)
@@ -167,6 +167,11 @@ static inline bool tls_strp_msg_ready(struct tls_sw_context_rx *ctx)
        return ctx->strp.msg_ready;
 }
 
+static inline bool tls_strp_msg_mixed_decrypted(struct tls_sw_context_rx *ctx)
+{
+       return ctx->strp.mixed_decrypted;
+}
+
 #ifdef CONFIG_TLS_DEVICE
 int tls_device_init(void);
 void tls_device_cleanup(void);
index a7cc4f9..bf69c9d 100644 (file)
@@ -1007,20 +1007,14 @@ int tls_device_decrypted(struct sock *sk, struct tls_context *tls_ctx)
        struct tls_sw_context_rx *sw_ctx = tls_sw_ctx_rx(tls_ctx);
        struct sk_buff *skb = tls_strp_msg(sw_ctx);
        struct strp_msg *rxm = strp_msg(skb);
-       int is_decrypted = skb->decrypted;
-       int is_encrypted = !is_decrypted;
-       struct sk_buff *skb_iter;
-       int left;
-
-       left = rxm->full_len - skb->len;
-       /* Check if all the data is decrypted already */
-       skb_iter = skb_shinfo(skb)->frag_list;
-       while (skb_iter && left > 0) {
-               is_decrypted &= skb_iter->decrypted;
-               is_encrypted &= !skb_iter->decrypted;
-
-               left -= skb_iter->len;
-               skb_iter = skb_iter->next;
+       int is_decrypted, is_encrypted;
+
+       if (!tls_strp_msg_mixed_decrypted(sw_ctx)) {
+               is_decrypted = skb->decrypted;
+               is_encrypted = !is_decrypted;
+       } else {
+               is_decrypted = 0;
+               is_encrypted = 0;
        }
 
        trace_tls_device_decrypted(sk, tcp_sk(sk)->copied_seq - rxm->full_len,
index b32c112..f2e7302 100644 (file)
@@ -111,7 +111,8 @@ int wait_on_pending_writer(struct sock *sk, long *timeo)
                        break;
                }
 
-               if (sk_wait_event(sk, timeo, !sk->sk_write_pending, &wait))
+               if (sk_wait_event(sk, timeo,
+                                 !READ_ONCE(sk->sk_write_pending), &wait))
                        break;
        }
        remove_wait_queue(sk_sleep(sk), &wait);
index 955ac3e..f37f4a0 100644 (file)
@@ -20,7 +20,9 @@ static void tls_strp_abort_strp(struct tls_strparser *strp, int err)
        strp->stopped = 1;
 
        /* Report an error on the lower socket */
-       strp->sk->sk_err = -err;
+       WRITE_ONCE(strp->sk->sk_err, -err);
+       /* Paired with smp_rmb() in tcp_poll() */
+       smp_wmb();
        sk_error_report(strp->sk);
 }
 
@@ -29,34 +31,50 @@ static void tls_strp_anchor_free(struct tls_strparser *strp)
        struct skb_shared_info *shinfo = skb_shinfo(strp->anchor);
 
        DEBUG_NET_WARN_ON_ONCE(atomic_read(&shinfo->dataref) != 1);
-       shinfo->frag_list = NULL;
+       if (!strp->copy_mode)
+               shinfo->frag_list = NULL;
        consume_skb(strp->anchor);
        strp->anchor = NULL;
 }
 
-/* Create a new skb with the contents of input copied to its page frags */
-static struct sk_buff *tls_strp_msg_make_copy(struct tls_strparser *strp)
+static struct sk_buff *
+tls_strp_skb_copy(struct tls_strparser *strp, struct sk_buff *in_skb,
+                 int offset, int len)
 {
-       struct strp_msg *rxm;
        struct sk_buff *skb;
-       int i, err, offset;
+       int i, err;
 
-       skb = alloc_skb_with_frags(0, strp->stm.full_len, TLS_PAGE_ORDER,
+       skb = alloc_skb_with_frags(0, len, TLS_PAGE_ORDER,
                                   &err, strp->sk->sk_allocation);
        if (!skb)
                return NULL;
 
-       offset = strp->stm.offset;
        for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
                skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
 
-               WARN_ON_ONCE(skb_copy_bits(strp->anchor, offset,
+               WARN_ON_ONCE(skb_copy_bits(in_skb, offset,
                                           skb_frag_address(frag),
                                           skb_frag_size(frag)));
                offset += skb_frag_size(frag);
        }
 
-       skb_copy_header(skb, strp->anchor);
+       skb->len = len;
+       skb->data_len = len;
+       skb_copy_header(skb, in_skb);
+       return skb;
+}
+
+/* Create a new skb with the contents of input copied to its page frags */
+static struct sk_buff *tls_strp_msg_make_copy(struct tls_strparser *strp)
+{
+       struct strp_msg *rxm;
+       struct sk_buff *skb;
+
+       skb = tls_strp_skb_copy(strp, strp->anchor, strp->stm.offset,
+                               strp->stm.full_len);
+       if (!skb)
+               return NULL;
+
        rxm = strp_msg(skb);
        rxm->offset = 0;
        return skb;
@@ -180,22 +198,22 @@ static void tls_strp_flush_anchor_copy(struct tls_strparser *strp)
        for (i = 0; i < shinfo->nr_frags; i++)
                __skb_frag_unref(&shinfo->frags[i], false);
        shinfo->nr_frags = 0;
+       if (strp->copy_mode) {
+               kfree_skb_list(shinfo->frag_list);
+               shinfo->frag_list = NULL;
+       }
        strp->copy_mode = 0;
+       strp->mixed_decrypted = 0;
 }
 
-static int tls_strp_copyin(read_descriptor_t *desc, struct sk_buff *in_skb,
-                          unsigned int offset, size_t in_len)
+static int tls_strp_copyin_frag(struct tls_strparser *strp, struct sk_buff *skb,
+                               struct sk_buff *in_skb, unsigned int offset,
+                               size_t in_len)
 {
-       struct tls_strparser *strp = (struct tls_strparser *)desc->arg.data;
-       struct sk_buff *skb;
-       skb_frag_t *frag;
        size_t len, chunk;
+       skb_frag_t *frag;
        int sz;
 
-       if (strp->msg_ready)
-               return 0;
-
-       skb = strp->anchor;
        frag = &skb_shinfo(skb)->frags[skb->len / PAGE_SIZE];
 
        len = in_len;
@@ -208,19 +226,26 @@ static int tls_strp_copyin(read_descriptor_t *desc, struct sk_buff *in_skb,
                                           skb_frag_size(frag),
                                           chunk));
 
-               sz = tls_rx_msg_size(strp, strp->anchor);
-               if (sz < 0) {
-                       desc->error = sz;
-                       return 0;
-               }
-
-               /* We may have over-read, sz == 0 is guaranteed under-read */
-               if (sz > 0)
-                       chunk = min_t(size_t, chunk, sz - skb->len);
-
                skb->len += chunk;
                skb->data_len += chunk;
                skb_frag_size_add(frag, chunk);
+
+               sz = tls_rx_msg_size(strp, skb);
+               if (sz < 0)
+                       return sz;
+
+               /* We may have over-read, sz == 0 is guaranteed under-read */
+               if (unlikely(sz && sz < skb->len)) {
+                       int over = skb->len - sz;
+
+                       WARN_ON_ONCE(over > chunk);
+                       skb->len -= over;
+                       skb->data_len -= over;
+                       skb_frag_size_add(frag, -over);
+
+                       chunk -= over;
+               }
+
                frag++;
                len -= chunk;
                offset += chunk;
@@ -247,15 +272,99 @@ static int tls_strp_copyin(read_descriptor_t *desc, struct sk_buff *in_skb,
                offset += chunk;
        }
 
-       if (strp->stm.full_len == skb->len) {
+read_done:
+       return in_len - len;
+}
+
+static int tls_strp_copyin_skb(struct tls_strparser *strp, struct sk_buff *skb,
+                              struct sk_buff *in_skb, unsigned int offset,
+                              size_t in_len)
+{
+       struct sk_buff *nskb, *first, *last;
+       struct skb_shared_info *shinfo;
+       size_t chunk;
+       int sz;
+
+       if (strp->stm.full_len)
+               chunk = strp->stm.full_len - skb->len;
+       else
+               chunk = TLS_MAX_PAYLOAD_SIZE + PAGE_SIZE;
+       chunk = min(chunk, in_len);
+
+       nskb = tls_strp_skb_copy(strp, in_skb, offset, chunk);
+       if (!nskb)
+               return -ENOMEM;
+
+       shinfo = skb_shinfo(skb);
+       if (!shinfo->frag_list) {
+               shinfo->frag_list = nskb;
+               nskb->prev = nskb;
+       } else {
+               first = shinfo->frag_list;
+               last = first->prev;
+               last->next = nskb;
+               first->prev = nskb;
+       }
+
+       skb->len += chunk;
+       skb->data_len += chunk;
+
+       if (!strp->stm.full_len) {
+               sz = tls_rx_msg_size(strp, skb);
+               if (sz < 0)
+                       return sz;
+
+               /* We may have over-read, sz == 0 is guaranteed under-read */
+               if (unlikely(sz && sz < skb->len)) {
+                       int over = skb->len - sz;
+
+                       WARN_ON_ONCE(over > chunk);
+                       skb->len -= over;
+                       skb->data_len -= over;
+                       __pskb_trim(nskb, nskb->len - over);
+
+                       chunk -= over;
+               }
+
+               strp->stm.full_len = sz;
+       }
+
+       return chunk;
+}
+
+static int tls_strp_copyin(read_descriptor_t *desc, struct sk_buff *in_skb,
+                          unsigned int offset, size_t in_len)
+{
+       struct tls_strparser *strp = (struct tls_strparser *)desc->arg.data;
+       struct sk_buff *skb;
+       int ret;
+
+       if (strp->msg_ready)
+               return 0;
+
+       skb = strp->anchor;
+       if (!skb->len)
+               skb_copy_decrypted(skb, in_skb);
+       else
+               strp->mixed_decrypted |= !!skb_cmp_decrypted(skb, in_skb);
+
+       if (IS_ENABLED(CONFIG_TLS_DEVICE) && strp->mixed_decrypted)
+               ret = tls_strp_copyin_skb(strp, skb, in_skb, offset, in_len);
+       else
+               ret = tls_strp_copyin_frag(strp, skb, in_skb, offset, in_len);
+       if (ret < 0) {
+               desc->error = ret;
+               ret = 0;
+       }
+
+       if (strp->stm.full_len && strp->stm.full_len == skb->len) {
                desc->count = 0;
 
                strp->msg_ready = 1;
                tls_rx_msg_ready(strp);
        }
 
-read_done:
-       return in_len - len;
+       return ret;
 }
 
 static int tls_strp_read_copyin(struct tls_strparser *strp)
@@ -315,15 +424,19 @@ static int tls_strp_read_copy(struct tls_strparser *strp, bool qshort)
        return 0;
 }
 
-static bool tls_strp_check_no_dup(struct tls_strparser *strp)
+static bool tls_strp_check_queue_ok(struct tls_strparser *strp)
 {
        unsigned int len = strp->stm.offset + strp->stm.full_len;
-       struct sk_buff *skb;
+       struct sk_buff *first, *skb;
        u32 seq;
 
-       skb = skb_shinfo(strp->anchor)->frag_list;
-       seq = TCP_SKB_CB(skb)->seq;
+       first = skb_shinfo(strp->anchor)->frag_list;
+       skb = first;
+       seq = TCP_SKB_CB(first)->seq;
 
+       /* Make sure there's no duplicate data in the queue,
+        * and the decrypted status matches.
+        */
        while (skb->len < len) {
                seq += skb->len;
                len -= skb->len;
@@ -331,6 +444,8 @@ static bool tls_strp_check_no_dup(struct tls_strparser *strp)
 
                if (TCP_SKB_CB(skb)->seq != seq)
                        return false;
+               if (skb_cmp_decrypted(first, skb))
+                       return false;
        }
 
        return true;
@@ -411,7 +526,7 @@ static int tls_strp_read_sock(struct tls_strparser *strp)
                        return tls_strp_read_copy(strp, true);
        }
 
-       if (!tls_strp_check_no_dup(strp))
+       if (!tls_strp_check_queue_ok(strp))
                return tls_strp_read_copy(strp, false);
 
        strp->msg_ready = 1;
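
tls_strp_abort_strp() (like tls_err_abort() further down) now publishes sk_err
with an explicit write barrier so a concurrent poller cannot observe the
wakeup without also seeing the error. The intended pairing, sketched below;
the reader half mirrors what tcp_poll() already does and is shown only for
illustration:

	/* error-raising side: err is negative, sk_err stores the positive code */
	WRITE_ONCE(strp->sk->sk_err, -err);
	smp_wmb();			/* order the error store before the wakeup */
	sk_error_report(strp->sk);

	/* polling side */
	smp_rmb();			/* paired with the smp_wmb() above */
	if (READ_ONCE(sk->sk_err))
		mask |= EPOLLERR;
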
index 635b8bf..1a53c8f 100644 (file)
@@ -70,7 +70,9 @@ noinline void tls_err_abort(struct sock *sk, int err)
 {
        WARN_ON_ONCE(err >= 0);
        /* sk->sk_err should contain a positive error code. */
-       sk->sk_err = -err;
+       WRITE_ONCE(sk->sk_err, -err);
+       /* Paired with smp_rmb() in tcp_poll() */
+       smp_wmb();
        sk_error_report(sk);
 }
 
@@ -2304,10 +2306,14 @@ static void tls_data_ready(struct sock *sk)
        struct tls_context *tls_ctx = tls_get_ctx(sk);
        struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
        struct sk_psock *psock;
+       gfp_t alloc_save;
 
        trace_sk_data_ready(sk);
 
+       alloc_save = sk->sk_allocation;
+       sk->sk_allocation = GFP_ATOMIC;
        tls_strp_data_ready(&ctx->strp);
+       sk->sk_allocation = alloc_save;
 
        psock = sk_psock_get(sk);
        if (psock) {
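
tls_data_ready() can be invoked from softirq context, so the hunk above pins
sk_allocation to GFP_ATOMIC around the strparser call. The pattern in
isolation (do_rx_work() is a hypothetical stand-in for whatever may allocate
under the hood):

	gfp_t saved = sk->sk_allocation;

	sk->sk_allocation = GFP_ATOMIC;	/* no sleeping in ->sk_data_ready */
	do_rx_work(sk);
	sk->sk_allocation = saved;
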
index fb31e8a..e7728b5 100644 (file)
@@ -603,7 +603,7 @@ static void unix_release_sock(struct sock *sk, int embrion)
        /* Clear state */
        unix_state_lock(sk);
        sock_orphan(sk);
-       sk->sk_shutdown = SHUTDOWN_MASK;
+       WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
        path         = u->path;
        u->path.dentry = NULL;
        u->path.mnt = NULL;
@@ -628,7 +628,7 @@ static void unix_release_sock(struct sock *sk, int embrion)
                if (sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) {
                        unix_state_lock(skpair);
                        /* No more writes */
-                       skpair->sk_shutdown = SHUTDOWN_MASK;
+                       WRITE_ONCE(skpair->sk_shutdown, SHUTDOWN_MASK);
                        if (!skb_queue_empty(&sk->sk_receive_queue) || embrion)
                                WRITE_ONCE(skpair->sk_err, ECONNRESET);
                        unix_state_unlock(skpair);
@@ -1442,7 +1442,7 @@ static long unix_wait_for_peer(struct sock *other, long timeo)
 
        sched = !sock_flag(other, SOCK_DEAD) &&
                !(other->sk_shutdown & RCV_SHUTDOWN) &&
-               unix_recvq_full(other);
+               unix_recvq_full_lockless(other);
 
        unix_state_unlock(other);
 
@@ -2553,7 +2553,7 @@ static int unix_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 {
        struct unix_sock *u = unix_sk(sk);
        struct sk_buff *skb;
-       int err, copied;
+       int err;
 
        mutex_lock(&u->iolock);
        skb = skb_recv_datagram(sk, MSG_DONTWAIT, &err);
@@ -2561,10 +2561,7 @@ static int unix_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
        if (!skb)
                return err;
 
-       copied = recv_actor(sk, skb);
-       kfree_skb(skb);
-
-       return copied;
+       return recv_actor(sk, skb);
 }
 
 /*
@@ -3008,7 +3005,7 @@ static int unix_shutdown(struct socket *sock, int mode)
        ++mode;
 
        unix_state_lock(sk);
-       sk->sk_shutdown |= mode;
+       WRITE_ONCE(sk->sk_shutdown, sk->sk_shutdown | mode);
        other = unix_peer(sk);
        if (other)
                sock_hold(other);
@@ -3028,7 +3025,7 @@ static int unix_shutdown(struct socket *sock, int mode)
                if (mode&SEND_SHUTDOWN)
                        peer_mode |= RCV_SHUTDOWN;
                unix_state_lock(other);
-               other->sk_shutdown |= peer_mode;
+               WRITE_ONCE(other->sk_shutdown, other->sk_shutdown | peer_mode);
                unix_state_unlock(other);
                other->sk_state_change(other);
                if (peer_mode == SHUTDOWN_MASK)
@@ -3160,16 +3157,18 @@ static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wa
 {
        struct sock *sk = sock->sk;
        __poll_t mask;
+       u8 shutdown;
 
        sock_poll_wait(file, sock, wait);
        mask = 0;
+       shutdown = READ_ONCE(sk->sk_shutdown);
 
        /* exceptional events? */
        if (READ_ONCE(sk->sk_err))
                mask |= EPOLLERR;
-       if (sk->sk_shutdown == SHUTDOWN_MASK)
+       if (shutdown == SHUTDOWN_MASK)
                mask |= EPOLLHUP;
-       if (sk->sk_shutdown & RCV_SHUTDOWN)
+       if (shutdown & RCV_SHUTDOWN)
                mask |= EPOLLRDHUP | EPOLLIN | EPOLLRDNORM;
 
        /* readable? */
@@ -3203,9 +3202,11 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
        struct sock *sk = sock->sk, *other;
        unsigned int writable;
        __poll_t mask;
+       u8 shutdown;
 
        sock_poll_wait(file, sock, wait);
        mask = 0;
+       shutdown = READ_ONCE(sk->sk_shutdown);
 
        /* exceptional events? */
        if (READ_ONCE(sk->sk_err) ||
@@ -3213,9 +3214,9 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
                mask |= EPOLLERR |
                        (sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? EPOLLPRI : 0);
 
-       if (sk->sk_shutdown & RCV_SHUTDOWN)
+       if (shutdown & RCV_SHUTDOWN)
                mask |= EPOLLRDHUP | EPOLLIN | EPOLLRDNORM;
-       if (sk->sk_shutdown == SHUTDOWN_MASK)
+       if (shutdown == SHUTDOWN_MASK)
                mask |= EPOLLHUP;
 
        /* readable? */
index 413407b..efb8a09 100644
@@ -1462,7 +1462,7 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr,
                        vsock_transport_cancel_pkt(vsk);
                        vsock_remove_connected(vsk);
                        goto out_wait;
-               } else if (timeout == 0) {
+               } else if ((sk->sk_state != TCP_ESTABLISHED) && (timeout == 0)) {
                        err = -ETIMEDOUT;
                        sk->sk_state = TCP_CLOSE;
                        sock->state = SS_UNCONNECTED;
index e487855..b769fc2 100644
@@ -1441,7 +1441,6 @@ int virtio_transport_read_skb(struct vsock_sock *vsk, skb_read_actor_t recv_acto
        struct sock *sk = sk_vsock(vsk);
        struct sk_buff *skb;
        int off = 0;
-       int copied;
        int err;
 
        spin_lock_bh(&vvs->rx_lock);
@@ -1454,9 +1453,7 @@ int virtio_transport_read_skb(struct vsock_sock *vsk, skb_read_actor_t recv_acto
        if (!skb)
                return err;
 
-       copied = recv_actor(sk, skb);
-       kfree_skb(skb);
-       return copied;
+       return recv_actor(sk, skb);
 }
 EXPORT_SYMBOL_GPL(virtio_transport_read_skb);
 
index 5b0c4d5..b3ec9ea 100644
@@ -368,12 +368,12 @@ static void cfg80211_sched_scan_stop_wk(struct work_struct *work)
        rdev = container_of(work, struct cfg80211_registered_device,
                           sched_scan_stop_wk);
 
-       rtnl_lock();
+       wiphy_lock(&rdev->wiphy);
        list_for_each_entry_safe(req, tmp, &rdev->sched_scan_req_list, list) {
                if (req->nl_owner_dead)
                        cfg80211_stop_sched_scan_req(rdev, req, false);
        }
-       rtnl_unlock();
+       wiphy_unlock(&rdev->wiphy);
 }
 
 static void cfg80211_propagate_radar_detect_wk(struct work_struct *work)
index d95f805..087d60c 100644
@@ -10723,6 +10723,8 @@ static int nl80211_authenticate(struct sk_buff *skb, struct genl_info *info)
                if (!info->attrs[NL80211_ATTR_MLD_ADDR])
                        return -EINVAL;
                req.ap_mld_addr = nla_data(info->attrs[NL80211_ATTR_MLD_ADDR]);
+               if (!is_valid_ether_addr(req.ap_mld_addr))
+                       return -EINVAL;
        }
 
        req.bss = cfg80211_get_bss(&rdev->wiphy, chan, bssid, ssid, ssid_len,
index 2e497cf..69b5087 100644
@@ -2,7 +2,7 @@
 /*
  * Portions of this file
  * Copyright(c) 2016-2017 Intel Deutschland GmbH
- * Copyright (C) 2018, 2021-2022 Intel Corporation
+ * Copyright (C) 2018, 2021-2023 Intel Corporation
  */
 #ifndef __CFG80211_RDEV_OPS
 #define __CFG80211_RDEV_OPS
@@ -1441,8 +1441,8 @@ rdev_del_intf_link(struct cfg80211_registered_device *rdev,
                   unsigned int link_id)
 {
        trace_rdev_del_intf_link(&rdev->wiphy, wdev, link_id);
-       if (rdev->ops->add_intf_link)
-               rdev->ops->add_intf_link(&rdev->wiphy, wdev, link_id);
+       if (rdev->ops->del_intf_link)
+               rdev->ops->del_intf_link(&rdev->wiphy, wdev, link_id);
        trace_rdev_return_void(&rdev->wiphy);
 }
 
index 0d40d6a..26f11e4 100644
@@ -2404,11 +2404,8 @@ static bool reg_wdev_chan_valid(struct wiphy *wiphy, struct wireless_dev *wdev)
                case NL80211_IFTYPE_P2P_GO:
                case NL80211_IFTYPE_ADHOC:
                case NL80211_IFTYPE_MESH_POINT:
-                       wiphy_lock(wiphy);
                        ret = cfg80211_reg_can_beacon_relax(wiphy, &chandef,
                                                            iftype);
-                       wiphy_unlock(wiphy);
-
                        if (!ret)
                                return ret;
                        break;
@@ -2440,11 +2437,11 @@ static void reg_leave_invalid_chans(struct wiphy *wiphy)
        struct wireless_dev *wdev;
        struct cfg80211_registered_device *rdev = wiphy_to_rdev(wiphy);
 
-       ASSERT_RTNL();
-
+       wiphy_lock(wiphy);
        list_for_each_entry(wdev, &rdev->wiphy.wdev_list, list)
                if (!reg_wdev_chan_valid(wiphy, wdev))
                        cfg80211_leave(rdev, wdev);
+       wiphy_unlock(wiphy);
 }
 
 static void reg_check_chans_work(struct work_struct *work)
index a138225..c501db7 100644
@@ -5,7 +5,7 @@
  * Copyright 2008 Johannes Berg <johannes@sipsolutions.net>
  * Copyright 2013-2014  Intel Mobile Communications GmbH
  * Copyright 2016      Intel Deutschland GmbH
- * Copyright (C) 2018-2022 Intel Corporation
+ * Copyright (C) 2018-2023 Intel Corporation
  */
 #include <linux/kernel.h>
 #include <linux/slab.h>
@@ -540,6 +540,10 @@ static int cfg80211_parse_ap_info(struct cfg80211_colocated_ap *entry,
        /* skip the TBTT offset */
        pos++;
 
+       /* ignore entries with invalid BSSID */
+       if (!is_valid_ether_addr(pos))
+               return -EINVAL;
+
        memcpy(entry->bssid, pos, ETH_ALEN);
        pos += ETH_ALEN;
 
index 3bc0c30..9755ef2 100644
@@ -5,7 +5,7 @@
  * Copyright 2007-2009 Johannes Berg <johannes@sipsolutions.net>
  * Copyright 2013-2014  Intel Mobile Communications GmbH
  * Copyright 2017      Intel Deutschland GmbH
- * Copyright (C) 2018-2022 Intel Corporation
+ * Copyright (C) 2018-2023 Intel Corporation
  */
 #include <linux/export.h>
 #include <linux/bitops.h>
@@ -2558,6 +2558,13 @@ void cfg80211_remove_links(struct wireless_dev *wdev)
 {
        unsigned int link_id;
 
+       /*
+        * links are controlled by upper layers (userspace/cfg)
+        * only for AP mode, so only remove them here for AP
+        */
+       if (wdev->iftype != NL80211_IFTYPE_AP)
+               return;
+
        wdev_lock(wdev);
        if (wdev->valid_links) {
                for_each_valid_link(wdev, link_id)
index bef28c6..408f5e5 100644
@@ -378,7 +378,7 @@ int xfrm_dev_policy_add(struct net *net, struct xfrm_policy *xp,
                break;
        default:
                xdo->dev = NULL;
-               dev_put(dev);
+               netdev_put(dev, &xdo->dev_tracker);
                NL_SET_ERR_MSG(extack, "Unrecognized offload direction");
                return -EINVAL;
        }
index 39fb91f..815b380 100644
@@ -131,6 +131,7 @@ struct sec_path *secpath_set(struct sk_buff *skb)
        memset(sp->ovec, 0, sizeof(sp->ovec));
        sp->olen = 0;
        sp->len = 0;
+       sp->verified_cnt = 0;
 
        return sp;
 }
@@ -330,11 +331,10 @@ xfrm_inner_mode_encap_remove(struct xfrm_state *x,
 {
        switch (x->props.mode) {
        case XFRM_MODE_BEET:
-               switch (XFRM_MODE_SKB_CB(skb)->protocol) {
-               case IPPROTO_IPIP:
-               case IPPROTO_BEETPH:
+               switch (x->sel.family) {
+               case AF_INET:
                        return xfrm4_remove_beet_encap(x, skb);
-               case IPPROTO_IPV6:
+               case AF_INET6:
                        return xfrm6_remove_beet_encap(x, skb);
                }
                break;
index 5c61ec0..e7617c9 100644
@@ -1831,6 +1831,7 @@ again:
 
                __xfrm_policy_unlink(pol, dir);
                spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
+               xfrm_dev_policy_delete(pol);
                cnt++;
                xfrm_audit_policy_delete(pol, 1, task_valid);
                xfrm_policy_kill(pol);
@@ -1869,6 +1870,7 @@ again:
 
                __xfrm_policy_unlink(pol, dir);
                spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
+               xfrm_dev_policy_delete(pol);
                cnt++;
                xfrm_audit_policy_delete(pol, 1, task_valid);
                xfrm_policy_kill(pol);
@@ -3312,7 +3314,7 @@ xfrm_secpath_reject(int idx, struct sk_buff *skb, const struct flowi *fl)
 
 static inline int
 xfrm_state_ok(const struct xfrm_tmpl *tmpl, const struct xfrm_state *x,
-             unsigned short family)
+             unsigned short family, u32 if_id)
 {
        if (xfrm_state_kern(x))
                return tmpl->optional && !xfrm_state_addr_cmp(tmpl, x, tmpl->encap_family);
@@ -3323,7 +3325,8 @@ xfrm_state_ok(const struct xfrm_tmpl *tmpl, const struct xfrm_state *x,
                (tmpl->allalgs || (tmpl->aalgos & (1<<x->props.aalgo)) ||
                 !(xfrm_id_proto_match(tmpl->id.proto, IPSEC_PROTO_ANY))) &&
                !(x->props.mode != XFRM_MODE_TRANSPORT &&
-                 xfrm_state_addr_cmp(tmpl, x, family));
+                 xfrm_state_addr_cmp(tmpl, x, family)) &&
+               (if_id == 0 || if_id == x->if_id);
 }
 
 /*
@@ -3335,7 +3338,7 @@ xfrm_state_ok(const struct xfrm_tmpl *tmpl, const struct xfrm_state *x,
  */
 static inline int
 xfrm_policy_ok(const struct xfrm_tmpl *tmpl, const struct sec_path *sp, int start,
-              unsigned short family)
+              unsigned short family, u32 if_id)
 {
        int idx = start;
 
@@ -3345,9 +3348,16 @@ xfrm_policy_ok(const struct xfrm_tmpl *tmpl, const struct sec_path *sp, int star
        } else
                start = -1;
        for (; idx < sp->len; idx++) {
-               if (xfrm_state_ok(tmpl, sp->xvec[idx], family))
+               if (xfrm_state_ok(tmpl, sp->xvec[idx], family, if_id))
                        return ++idx;
                if (sp->xvec[idx]->props.mode != XFRM_MODE_TRANSPORT) {
+                       if (idx < sp->verified_cnt) {
+                               /* Secpath entry previously verified, consider optional and
+                                * continue searching
+                                */
+                               continue;
+                       }
+
                        if (start == -1)
                                start = -2-idx;
                        break;
@@ -3712,12 +3722,6 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb,
                }
                xfrm_nr = ti;
 
-               if (net->xfrm.policy_default[dir] == XFRM_USERPOLICY_BLOCK &&
-                   !xfrm_nr) {
-                       XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
-                       goto reject;
-               }
-
                if (npols > 1) {
                        xfrm_tmpl_sort(stp, tpp, xfrm_nr, family);
                        tpp = stp;
@@ -3728,9 +3732,12 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb,
                 * Order is _important_. Later we will implement
                 * some barriers, but at the moment barriers
                 * are implied between each two transformations.
+                * Upon success, marks secpath entries as having been
+                * verified to allow them to be skipped in future policy
+                * checks (e.g. nested tunnels).
                 */
                for (i = xfrm_nr-1, k = 0; i >= 0; i--) {
-                       k = xfrm_policy_ok(tpp[i], sp, k, family);
+                       k = xfrm_policy_ok(tpp[i], sp, k, family, if_id);
                        if (k < 0) {
                                if (k < -1)
                                        /* "-2 - errored_index" returned */
@@ -3745,10 +3752,9 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb,
                        goto reject;
                }
 
-               if (if_id)
-                       secpath_reset(skb);
-
                xfrm_pols_put(pols, npols);
+               sp->verified_cnt = k;
+
                return 1;
        }
        XFRM_INC_STATS(net, LINUX_MIB_XFRMINPOLBLOCK);
index d720e16..c34a2a0 100644
@@ -1770,7 +1770,7 @@ static void copy_templates(struct xfrm_policy *xp, struct xfrm_user_tmpl *ut,
 }
 
 static int validate_tmpl(int nr, struct xfrm_user_tmpl *ut, u16 family,
-                        struct netlink_ext_ack *extack)
+                        int dir, struct netlink_ext_ack *extack)
 {
        u16 prev_family;
        int i;
@@ -1796,6 +1796,10 @@ static int validate_tmpl(int nr, struct xfrm_user_tmpl *ut, u16 family,
                switch (ut[i].mode) {
                case XFRM_MODE_TUNNEL:
                case XFRM_MODE_BEET:
+                       if (ut[i].optional && dir == XFRM_POLICY_OUT) {
+                               NL_SET_ERR_MSG(extack, "Mode in optional template not allowed in outbound policy");
+                               return -EINVAL;
+                       }
                        break;
                default:
                        if (ut[i].family != prev_family) {
@@ -1833,7 +1837,7 @@ static int validate_tmpl(int nr, struct xfrm_user_tmpl *ut, u16 family,
 }
 
 static int copy_from_user_tmpl(struct xfrm_policy *pol, struct nlattr **attrs,
-                              struct netlink_ext_ack *extack)
+                              int dir, struct netlink_ext_ack *extack)
 {
        struct nlattr *rt = attrs[XFRMA_TMPL];
 
@@ -1844,7 +1848,7 @@ static int copy_from_user_tmpl(struct xfrm_policy *pol, struct nlattr **attrs,
                int nr = nla_len(rt) / sizeof(*utmpl);
                int err;
 
-               err = validate_tmpl(nr, utmpl, pol->family, extack);
+               err = validate_tmpl(nr, utmpl, pol->family, dir, extack);
                if (err)
                        return err;
 
@@ -1921,7 +1925,7 @@ static struct xfrm_policy *xfrm_policy_construct(struct net *net,
        if (err)
                goto error;
 
-       if (!(err = copy_from_user_tmpl(xp, attrs, extack)))
+       if (!(err = copy_from_user_tmpl(xp, attrs, p->dir, extack)))
                err = copy_from_user_sec_ctx(xp, attrs);
        if (err)
                goto error;
@@ -1980,6 +1984,7 @@ static int xfrm_add_policy(struct sk_buff *skb, struct nlmsghdr *nlh,
 
        if (err) {
                xfrm_dev_policy_delete(xp);
+               xfrm_dev_policy_free(xp);
                security_xfrm_policy_free(xp->security);
                kfree(xp);
                return err;
@@ -3499,7 +3504,7 @@ static struct xfrm_policy *xfrm_compile_policy(struct sock *sk, int opt,
                return NULL;
 
        nr = ((len - sizeof(*p)) / sizeof(*ut));
-       if (validate_tmpl(nr, ut, p->sel.family, NULL))
+       if (validate_tmpl(nr, ut, p->sel.family, p->dir, NULL))
                return NULL;
 
        if (p->dir > XFRM_POLICY_OUT)
index c89c753..eb6f22e 100644
@@ -10,6 +10,9 @@ upstream. In general, only additions should be performed (e.g. new
 methods). Eventually, changes should make it into upstream so that,
 at some point, this fork can be dropped from the kernel tree.
 
+The Rust upstream version on top of which these files are based matches
+the output of `scripts/min-tool-version.sh rustc`.
+
 
 ## Rationale
 
index ca224a5..acf22d4 100644
@@ -22,21 +22,24 @@ use core::marker::Destruct;
 mod tests;
 
 extern "Rust" {
-    // These are the magic symbols to call the global allocator.  rustc generates
+    // These are the magic symbols to call the global allocator. rustc generates
     // them to call `__rg_alloc` etc. if there is a `#[global_allocator]` attribute
     // (the code expanding that attribute macro generates those functions), or to call
-    // the default implementations in libstd (`__rdl_alloc` etc. in `library/std/src/alloc.rs`)
+    // the default implementations in std (`__rdl_alloc` etc. in `library/std/src/alloc.rs`)
     // otherwise.
-    // The rustc fork of LLVM also special-cases these function names to be able to optimize them
+    // The rustc fork of LLVM 14 and earlier also special-cases these function names to be able to optimize them
     // like `malloc`, `realloc`, and `free`, respectively.
     #[rustc_allocator]
-    #[rustc_allocator_nounwind]
+    #[rustc_nounwind]
     fn __rust_alloc(size: usize, align: usize) -> *mut u8;
-    #[rustc_allocator_nounwind]
+    #[rustc_deallocator]
+    #[rustc_nounwind]
     fn __rust_dealloc(ptr: *mut u8, size: usize, align: usize);
-    #[rustc_allocator_nounwind]
+    #[rustc_reallocator]
+    #[rustc_nounwind]
     fn __rust_realloc(ptr: *mut u8, old_size: usize, align: usize, new_size: usize) -> *mut u8;
-    #[rustc_allocator_nounwind]
+    #[rustc_allocator_zeroed]
+    #[rustc_nounwind]
     fn __rust_alloc_zeroed(size: usize, align: usize) -> *mut u8;
 }
 
@@ -72,11 +75,14 @@ pub use std::alloc::Global;
 /// # Examples
 ///
 /// ```
-/// use std::alloc::{alloc, dealloc, Layout};
+/// use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
 ///
 /// unsafe {
 ///     let layout = Layout::new::<u16>();
 ///     let ptr = alloc(layout);
+///     if ptr.is_null() {
+///         handle_alloc_error(layout);
+///     }
 ///
 ///     *(ptr as *mut u16) = 42;
 ///     assert_eq!(*(ptr as *mut u16), 42);
@@ -349,7 +355,7 @@ pub(crate) const unsafe fn box_free<T: ?Sized, A: ~const Allocator + ~const Dest
 
 #[cfg(not(no_global_oom_handling))]
 extern "Rust" {
-    // This is the magic symbol to call the global alloc error handler.  rustc generates
+    // This is the magic symbol to call the global alloc error handler. rustc generates
     // it to call `__rg_oom` if there is a `#[alloc_error_handler]`, or to call the
     // default implementations below (`__rdl_oom`) otherwise.
     fn __rust_alloc_error_handler(size: usize, align: usize) -> !;
@@ -394,25 +400,24 @@ pub use std::alloc::handle_alloc_error;
 #[allow(unused_attributes)]
 #[unstable(feature = "alloc_internals", issue = "none")]
 pub mod __alloc_error_handler {
-    use crate::alloc::Layout;
-
-    // called via generated `__rust_alloc_error_handler`
-
-    // if there is no `#[alloc_error_handler]`
+    // called via generated `__rust_alloc_error_handler` if there is no
+    // `#[alloc_error_handler]`.
     #[rustc_std_internal_symbol]
-    pub unsafe extern "C-unwind" fn __rdl_oom(size: usize, _align: usize) -> ! {
-        panic!("memory allocation of {size} bytes failed")
-    }
-
-    // if there is an `#[alloc_error_handler]`
-    #[rustc_std_internal_symbol]
-    pub unsafe extern "C-unwind" fn __rg_oom(size: usize, align: usize) -> ! {
-        let layout = unsafe { Layout::from_size_align_unchecked(size, align) };
+    pub unsafe fn __rdl_oom(size: usize, _align: usize) -> ! {
         extern "Rust" {
-            #[lang = "oom"]
-            fn oom_impl(layout: Layout) -> !;
+            // This symbol is emitted by rustc next to __rust_alloc_error_handler.
+            // Its value depends on the -Zoom={panic,abort} compiler option.
+            static __rust_alloc_error_handler_should_panic: u8;
+        }
+
+        #[allow(unused_unsafe)]
+        if unsafe { __rust_alloc_error_handler_should_panic != 0 } {
+            panic!("memory allocation of {size} bytes failed")
+        } else {
+            core::panicking::panic_nounwind_fmt(format_args!(
+                "memory allocation of {size} bytes failed"
+            ))
         }
-        unsafe { oom_impl(layout) }
     }
 }
 
index dcfe87b..14af986 100644
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: Apache-2.0 OR MIT
 
-//! A pointer type for heap allocation.
+//! The `Box<T>` type for heap allocation.
 //!
 //! [`Box<T>`], casually referred to as a 'box', provides the simplest form of
 //! heap allocation in Rust. Boxes provide ownership for this allocation, and
 //! definition is just using `T*` can lead to undefined behavior, as
 //! described in [rust-lang/unsafe-code-guidelines#198][ucg#198].
 //!
+//! # Considerations for unsafe code
+//!
+//! **Warning: This section is not normative and is subject to change, possibly
+//! being relaxed in the future! It is a simplified summary of the rules
+//! currently implemented in the compiler.**
+//!
+//! The aliasing rules for `Box<T>` are the same as for `&mut T`. `Box<T>`
+//! asserts uniqueness over its content. Using raw pointers derived from a box
+//! after that box has been mutated through, moved or borrowed as `&mut T`
+//! is not allowed. For more guidance on working with box from unsafe code, see
+//! [rust-lang/unsafe-code-guidelines#326][ucg#326].
+//!
+//!
 //! [ucg#198]: https://github.com/rust-lang/unsafe-code-guidelines/issues/198
+//! [ucg#326]: https://github.com/rust-lang/unsafe-code-guidelines/issues/326
 //! [dereferencing]: core::ops::Deref
 //! [`Box::<T>::from_raw(value)`]: Box::from_raw
 //! [`Global`]: crate::alloc::Global
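The aliasing rule described in the new doc section above can be seen in a small
standalone Rust sketch (illustrative code, not part of the patch); the
commented-out write is exactly what the rule forbids:

    fn main() {
        let mut b = Box::new(1u32);
        // A raw pointer derived from the box while it is still in place.
        let p: *mut u32 = &mut *b;
        // Moving the box reasserts its uniqueness, so `p` may no longer be used.
        let b2 = b;
        // unsafe { *p = 2 };   // undefined behaviour under the rule quoted above
        drop(b2);
        let _ = p;
    }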
@@ -139,12 +153,14 @@ use core::async_iter::AsyncIterator;
 use core::borrow;
 use core::cmp::Ordering;
 use core::convert::{From, TryFrom};
+use core::error::Error;
 use core::fmt;
 use core::future::Future;
 use core::hash::{Hash, Hasher};
 #[cfg(not(no_global_oom_handling))]
 use core::iter::FromIterator;
 use core::iter::{FusedIterator, Iterator};
+use core::marker::Tuple;
 use core::marker::{Destruct, Unpin, Unsize};
 use core::mem;
 use core::ops::{
@@ -163,6 +179,8 @@ use crate::raw_vec::RawVec;
 #[cfg(not(no_global_oom_handling))]
 use crate::str::from_boxed_utf8_unchecked;
 #[cfg(not(no_global_oom_handling))]
+use crate::string::String;
+#[cfg(not(no_global_oom_handling))]
 use crate::vec::Vec;
 
 #[cfg(not(no_thin))]
@@ -172,7 +190,7 @@ pub use thin::ThinBox;
 #[cfg(not(no_thin))]
 mod thin;
 
-/// A pointer type for heap allocation.
+/// A pointer type that uniquely owns a heap allocation of type `T`.
 ///
 /// See the [module-level documentation](../../std/boxed/index.html) for more.
 #[lang = "owned_box"]
@@ -196,12 +214,13 @@ impl<T> Box<T> {
     /// ```
     /// let five = Box::new(5);
     /// ```
-    #[cfg(not(no_global_oom_handling))]
+    #[cfg(all(not(no_global_oom_handling)))]
     #[inline(always)]
     #[stable(feature = "rust1", since = "1.0.0")]
     #[must_use]
     pub fn new(x: T) -> Self {
-        box x
+        #[rustc_box]
+        Box::new(x)
     }
 
     /// Constructs a new box with uninitialized contents.
@@ -256,14 +275,21 @@ impl<T> Box<T> {
         Self::new_zeroed_in(Global)
     }
 
-    /// Constructs a new `Pin<Box<T>>`. If `T` does not implement `Unpin`, then
+    /// Constructs a new `Pin<Box<T>>`. If `T` does not implement [`Unpin`], then
     /// `x` will be pinned in memory and unable to be moved.
+    ///
+    /// Constructing and pinning of the `Box` can also be done in two steps: `Box::pin(x)`
+    /// does the same as <code>[Box::into_pin]\([Box::new]\(x))</code>. Consider using
+    /// [`into_pin`](Box::into_pin) if you already have a `Box<T>`, or if you want to
+    /// construct a (pinned) `Box` in a different way than with [`Box::new`].
     #[cfg(not(no_global_oom_handling))]
     #[stable(feature = "pin", since = "1.33.0")]
     #[must_use]
     #[inline(always)]
     pub fn pin(x: T) -> Pin<Box<T>> {
-        (box x).into()
+        (#[rustc_box]
+        Box::new(x))
+        .into()
     }
 
     /// Allocates memory on the heap then places `x` into it,
@@ -543,8 +569,13 @@ impl<T, A: Allocator> Box<T, A> {
         unsafe { Ok(Box::from_raw_in(ptr.as_ptr(), alloc)) }
     }
 
-    /// Constructs a new `Pin<Box<T, A>>`. If `T` does not implement `Unpin`, then
+    /// Constructs a new `Pin<Box<T, A>>`. If `T` does not implement [`Unpin`], then
     /// `x` will be pinned in memory and unable to be moved.
+    ///
+    /// Constructing and pinning of the `Box` can also be done in two steps: `Box::pin_in(x, alloc)`
+    /// does the same as <code>[Box::into_pin]\([Box::new_in]\(x, alloc))</code>. Consider using
+    /// [`into_pin`](Box::into_pin) if you already have a `Box<T, A>`, or if you want to
+    /// construct a (pinned) `Box` in a different way than with [`Box::new_in`].
     #[cfg(not(no_global_oom_handling))]
     #[unstable(feature = "allocator_api", issue = "32838")]
     #[rustc_const_unstable(feature = "const_box", issue = "92521")]
@@ -926,6 +957,7 @@ impl<T: ?Sized> Box<T> {
     /// [`Layout`]: crate::Layout
     #[stable(feature = "box_raw", since = "1.4.0")]
     #[inline]
+    #[must_use = "call `drop(Box::from_raw(ptr))` if you intend to drop the `Box`"]
     pub unsafe fn from_raw(raw: *mut T) -> Self {
         unsafe { Self::from_raw_in(raw, Global) }
     }
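The `#[must_use]` note added above is easiest to see in the usual raw-pointer
round trip; a minimal standalone sketch (not part of the patch):

    fn main() {
        let b = Box::new(41u32);
        // Hand the allocation off as a raw pointer; the Box no longer manages it.
        let raw: *mut u32 = Box::into_raw(b);
        unsafe {
            *raw += 1;
            // Rebuilding the Box is what frees the allocation again, so the
            // reconstructed value is dropped on purpose here.
            drop(Box::from_raw(raw));
        }
    }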
@@ -1160,19 +1192,44 @@ impl<T: ?Sized, A: Allocator> Box<T, A> {
         unsafe { &mut *mem::ManuallyDrop::new(b).0.as_ptr() }
     }
 
-    /// Converts a `Box<T>` into a `Pin<Box<T>>`
+    /// Converts a `Box<T>` into a `Pin<Box<T>>`. If `T` does not implement [`Unpin`], then
+    /// `*boxed` will be pinned in memory and unable to be moved.
     ///
     /// This conversion does not allocate on the heap and happens in place.
     ///
     /// This is also available via [`From`].
-    #[unstable(feature = "box_into_pin", issue = "62370")]
+    ///
+    /// Constructing and pinning a `Box` with <code>Box::into_pin([Box::new]\(x))</code>
+    /// can also be written more concisely using <code>[Box::pin]\(x)</code>.
+    /// This `into_pin` method is useful if you already have a `Box<T>`, or you are
+    /// constructing a (pinned) `Box` in a different way than with [`Box::new`].
+    ///
+    /// # Notes
+    ///
+    /// It's not recommended that crates add an impl like `From<Box<T>> for Pin<T>`,
+    /// as it'll introduce an ambiguity when calling `Pin::from`.
+    /// A demonstration of such a poor impl is shown below.
+    ///
+    /// ```compile_fail
+    /// # use std::pin::Pin;
+    /// struct Foo; // A type defined in this crate.
+    /// impl From<Box<()>> for Pin<Foo> {
+    ///     fn from(_: Box<()>) -> Pin<Foo> {
+    ///         Pin::new(Foo)
+    ///     }
+    /// }
+    ///
+    /// let foo = Box::new(());
+    /// let bar = Pin::from(foo);
+    /// ```
+    #[stable(feature = "box_into_pin", since = "1.63.0")]
     #[rustc_const_unstable(feature = "const_box", issue = "92521")]
     pub const fn into_pin(boxed: Self) -> Pin<Self>
     where
         A: 'static,
     {
         // It's not possible to move or replace the insides of a `Pin<Box<T>>`
-        // when `T: !Unpin`,  so it's safe to pin it directly without any
+        // when `T: !Unpin`, so it's safe to pin it directly without any
         // additional requirements.
         unsafe { Pin::new_unchecked(boxed) }
     }
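The equivalence spelled out for `pin`, `pin_in` and `into_pin` above can be
exercised with a short standalone sketch (plain `u32` values chosen only for
illustration); the `From` conversion shown in the following hunk routes through
`into_pin` as well:

    use std::pin::Pin;

    fn main() {
        // Two-step construction...
        let a: Pin<Box<u32>> = Box::into_pin(Box::new(5u32));
        // ...matches the one-step convenience constructor...
        let b: Pin<Box<u32>> = Box::pin(5u32);
        // ...and the `From` conversion, whose body is `Box::into_pin(boxed)`.
        let c: Pin<Box<u32>> = Pin::from(Box::new(5u32));
        assert_eq!((*a, *b, *c), (5, 5, 5));
    }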
@@ -1190,7 +1247,8 @@ unsafe impl<#[may_dangle] T: ?Sized, A: Allocator> Drop for Box<T, A> {
 impl<T: Default> Default for Box<T> {
     /// Creates a `Box<T>`, with the `Default` value for T.
     fn default() -> Self {
-        box T::default()
+        #[rustc_box]
+        Box::new(T::default())
     }
 }
 
@@ -1408,9 +1466,17 @@ impl<T: ?Sized, A: Allocator> const From<Box<T, A>> for Pin<Box<T, A>>
 where
     A: 'static,
 {
-    /// Converts a `Box<T>` into a `Pin<Box<T>>`
+    /// Converts a `Box<T>` into a `Pin<Box<T>>`. If `T` does not implement [`Unpin`], then
+    /// `*boxed` will be pinned in memory and unable to be moved.
     ///
     /// This conversion does not allocate on the heap and happens in place.
+    ///
+    /// This is also available via [`Box::into_pin`].
+    ///
+    /// Constructing and pinning a `Box` with <code><Pin<Box\<T>>>::from([Box::new]\(x))</code>
+    /// can also be written more concisely using <code>[Box::pin]\(x)</code>.
+    /// This `From` implementation is useful if you already have a `Box<T>`, or you are
+    /// constructing a (pinned) `Box` in a different way than with [`Box::new`].
     fn from(boxed: Box<T, A>) -> Self {
         Box::into_pin(boxed)
     }
@@ -1422,7 +1488,7 @@ impl<T: Copy> From<&[T]> for Box<[T]> {
     /// Converts a `&[T]` into a `Box<[T]>`
     ///
     /// This conversion allocates on the heap
-    /// and performs a copy of `slice`.
+    /// and performs a copy of `slice` and its contents.
     ///
     /// # Examples
     /// ```rust
@@ -1554,10 +1620,27 @@ impl<T, const N: usize> From<[T; N]> for Box<[T]> {
     /// println!("{boxed:?}");
     /// ```
     fn from(array: [T; N]) -> Box<[T]> {
-        box array
+        #[rustc_box]
+        Box::new(array)
     }
 }
 
+/// Casts a boxed slice to a boxed array.
+///
+/// # Safety
+///
+/// `boxed_slice.len()` must be exactly `N`.
+unsafe fn boxed_slice_as_array_unchecked<T, A: Allocator, const N: usize>(
+    boxed_slice: Box<[T], A>,
+) -> Box<[T; N], A> {
+    debug_assert_eq!(boxed_slice.len(), N);
+
+    let (ptr, alloc) = Box::into_raw_with_allocator(boxed_slice);
+    // SAFETY: Pointer and allocator came from an existing box,
+    // and our safety condition requires that the length is exactly `N`
+    unsafe { Box::from_raw_in(ptr as *mut [T; N], alloc) }
+}
+
 #[stable(feature = "boxed_slice_try_from", since = "1.43.0")]
 impl<T, const N: usize> TryFrom<Box<[T]>> for Box<[T; N]> {
     type Error = Box<[T]>;
@@ -1573,13 +1656,46 @@ impl<T, const N: usize> TryFrom<Box<[T]>> for Box<[T; N]> {
     /// `boxed_slice.len()` does not equal `N`.
     fn try_from(boxed_slice: Box<[T]>) -> Result<Self, Self::Error> {
         if boxed_slice.len() == N {
-            Ok(unsafe { Box::from_raw(Box::into_raw(boxed_slice) as *mut [T; N]) })
+            Ok(unsafe { boxed_slice_as_array_unchecked(boxed_slice) })
         } else {
             Err(boxed_slice)
         }
     }
 }
 
+#[cfg(not(no_global_oom_handling))]
+#[stable(feature = "boxed_array_try_from_vec", since = "1.66.0")]
+impl<T, const N: usize> TryFrom<Vec<T>> for Box<[T; N]> {
+    type Error = Vec<T>;
+
+    /// Attempts to convert a `Vec<T>` into a `Box<[T; N]>`.
+    ///
+    /// Like [`Vec::into_boxed_slice`], this is in-place if `vec.capacity() == N`,
+    /// but will require a reallocation otherwise.
+    ///
+    /// # Errors
+    ///
+    /// Returns the original `Vec<T>` in the `Err` variant if
+    /// `vec.len()` does not equal `N`.
+    ///
+    /// # Examples
+    ///
+    /// This can be used with [`vec!`] to create an array on the heap:
+    ///
+    /// ```
+    /// let state: Box<[f32; 100]> = vec![1.0; 100].try_into().unwrap();
+    /// assert_eq!(state.len(), 100);
+    /// ```
+    fn try_from(vec: Vec<T>) -> Result<Self, Self::Error> {
+        if vec.len() == N {
+            let boxed_slice = vec.into_boxed_slice();
+            Ok(unsafe { boxed_slice_as_array_unchecked(boxed_slice) })
+        } else {
+            Err(vec)
+        }
+    }
+}
+
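A standalone usage sketch for the two array conversions above, assuming a
toolchain recent enough to provide the `Vec<T>` impl added here (edition 2021,
so `TryInto` is in the prelude); the lengths are illustrative only:

    fn main() {
        // Boxed slice -> boxed array: succeeds only when the lengths match.
        let arr: Box<[i32; 3]> = vec![1, 2, 3].into_boxed_slice().try_into().unwrap();
        assert_eq!(*arr, [1, 2, 3]);

        // Vec -> boxed array: a length mismatch hands the original Vec back in `Err`.
        let res: Result<Box<[i32; 3]>, Vec<i32>> = vec![1, 2].try_into();
        assert_eq!(res.unwrap_err(), vec![1, 2]);
    }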
 impl<A: Allocator> Box<dyn Any, A> {
     /// Attempt to downcast the box to a concrete type.
     ///
@@ -1869,7 +1985,7 @@ impl<I: ExactSizeIterator + ?Sized, A: Allocator> ExactSizeIterator for Box<I, A
 impl<I: FusedIterator + ?Sized, A: Allocator> FusedIterator for Box<I, A> {}
 
 #[stable(feature = "boxed_closure_impls", since = "1.35.0")]
-impl<Args, F: FnOnce<Args> + ?Sized, A: Allocator> FnOnce<Args> for Box<F, A> {
+impl<Args: Tuple, F: FnOnce<Args> + ?Sized, A: Allocator> FnOnce<Args> for Box<F, A> {
     type Output = <F as FnOnce<Args>>::Output;
 
     extern "rust-call" fn call_once(self, args: Args) -> Self::Output {
@@ -1878,20 +1994,20 @@ impl<Args, F: FnOnce<Args> + ?Sized, A: Allocator> FnOnce<Args> for Box<F, A> {
 }
 
 #[stable(feature = "boxed_closure_impls", since = "1.35.0")]
-impl<Args, F: FnMut<Args> + ?Sized, A: Allocator> FnMut<Args> for Box<F, A> {
+impl<Args: Tuple, F: FnMut<Args> + ?Sized, A: Allocator> FnMut<Args> for Box<F, A> {
     extern "rust-call" fn call_mut(&mut self, args: Args) -> Self::Output {
         <F as FnMut<Args>>::call_mut(self, args)
     }
 }
 
 #[stable(feature = "boxed_closure_impls", since = "1.35.0")]
-impl<Args, F: Fn<Args> + ?Sized, A: Allocator> Fn<Args> for Box<F, A> {
+impl<Args: Tuple, F: Fn<Args> + ?Sized, A: Allocator> Fn<Args> for Box<F, A> {
     extern "rust-call" fn call(&self, args: Args) -> Self::Output {
         <F as Fn<Args>>::call(self, args)
     }
 }
 
-#[unstable(feature = "coerce_unsized", issue = "27732")]
+#[unstable(feature = "coerce_unsized", issue = "18598")]
 impl<T: ?Sized + Unsize<U>, U: ?Sized, A: Allocator> CoerceUnsized<Box<U, A>> for Box<T, A> {}
 
 #[unstable(feature = "dispatch_from_dyn", issue = "none")]
@@ -1973,8 +2089,7 @@ impl<T: ?Sized, A: Allocator> AsMut<T> for Box<T, A> {
  *  could have a method to project a Pin<T> from it.
  */
 #[stable(feature = "pin", since = "1.33.0")]
-#[rustc_const_unstable(feature = "const_box", issue = "92521")]
-impl<T: ?Sized, A: Allocator> const Unpin for Box<T, A> where A: 'static {}
+impl<T: ?Sized, A: Allocator> Unpin for Box<T, A> where A: 'static {}
 
 #[unstable(feature = "generator_trait", issue = "43122")]
 impl<G: ?Sized + Generator<R> + Unpin, R, A: Allocator> Generator<R> for Box<G, A>
@@ -2026,3 +2141,292 @@ impl<S: ?Sized + AsyncIterator + Unpin> AsyncIterator for Box<S> {
         (**self).size_hint()
     }
 }
+
+impl dyn Error {
+    #[inline]
+    #[stable(feature = "error_downcast", since = "1.3.0")]
+    #[rustc_allow_incoherent_impl]
+    /// Attempts to downcast the box to a concrete type.
+    pub fn downcast<T: Error + 'static>(self: Box<Self>) -> Result<Box<T>, Box<dyn Error>> {
+        if self.is::<T>() {
+            unsafe {
+                let raw: *mut dyn Error = Box::into_raw(self);
+                Ok(Box::from_raw(raw as *mut T))
+            }
+        } else {
+            Err(self)
+        }
+    }
+}
+
+impl dyn Error + Send {
+    #[inline]
+    #[stable(feature = "error_downcast", since = "1.3.0")]
+    #[rustc_allow_incoherent_impl]
+    /// Attempts to downcast the box to a concrete type.
+    pub fn downcast<T: Error + 'static>(self: Box<Self>) -> Result<Box<T>, Box<dyn Error + Send>> {
+        let err: Box<dyn Error> = self;
+        <dyn Error>::downcast(err).map_err(|s| unsafe {
+            // Reapply the `Send` marker.
+            mem::transmute::<Box<dyn Error>, Box<dyn Error + Send>>(s)
+        })
+    }
+}
+
+impl dyn Error + Send + Sync {
+    #[inline]
+    #[stable(feature = "error_downcast", since = "1.3.0")]
+    #[rustc_allow_incoherent_impl]
+    /// Attempts to downcast the box to a concrete type.
+    pub fn downcast<T: Error + 'static>(self: Box<Self>) -> Result<Box<T>, Box<Self>> {
+        let err: Box<dyn Error> = self;
+        <dyn Error>::downcast(err).map_err(|s| unsafe {
+            // Reapply the `Send + Sync` marker.
+            mem::transmute::<Box<dyn Error>, Box<dyn Error + Send + Sync>>(s)
+        })
+    }
+}
+
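A standalone sketch of the downcast path provided by the `dyn Error` impls
above; `MyError` is a made-up type used only for illustration:

    use std::error::Error;
    use std::fmt;

    #[derive(Debug)]
    struct MyError;

    impl fmt::Display for MyError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "my error")
        }
    }

    impl Error for MyError {}

    fn main() {
        let boxed: Box<dyn Error> = Box::new(MyError);
        // `downcast` consumes the box; on a type mismatch the original boxed
        // error comes back in `Err` so nothing is lost.
        match boxed.downcast::<MyError>() {
            Ok(concrete) => println!("recovered {concrete:?}"),
            Err(other) => println!("still opaque: {other}"),
        }
    }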
+#[cfg(not(no_global_oom_handling))]
+#[stable(feature = "rust1", since = "1.0.0")]
+impl<'a, E: Error + 'a> From<E> for Box<dyn Error + 'a> {
+    /// Converts a type of [`Error`] into a box of dyn [`Error`].
+    ///
+    /// # Examples
+    ///
+    /// ```
+    /// use std::error::Error;
+    /// use std::fmt;
+    /// use std::mem;
+    ///
+    /// #[derive(Debug)]
+    /// struct AnError;
+    ///
+    /// impl fmt::Display for AnError {
+    ///     fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+    ///         write!(f, "An error")
+    ///     }
+    /// }
+    ///
+    /// impl Error for AnError {}
+    ///
+    /// let an_error = AnError;
+    /// assert!(0 == mem::size_of_val(&an_error));
+    /// let a_boxed_error = Box::<dyn Error>::from(an_error);
+    /// assert!(mem::size_of::<Box<dyn Error>>() == mem::size_of_val(&a_boxed_error))
+    /// ```
+    fn from(err: E) -> Box<dyn Error + 'a> {
+        Box::new(err)
+    }
+}
+
+#[cfg(not(no_global_oom_handling))]
+#[stable(feature = "rust1", since = "1.0.0")]
+impl<'a, E: Error + Send + Sync + 'a> From<E> for Box<dyn Error + Send + Sync + 'a> {
+    /// Converts a type of [`Error`] + [`Send`] + [`Sync`] into a box of
+    /// dyn [`Error`] + [`Send`] + [`Sync`].
+    ///
+    /// # Examples
+    ///
+    /// ```
+    /// use std::error::Error;
+    /// use std::fmt;
+    /// use std::mem;
+    ///
+    /// #[derive(Debug)]
+    /// struct AnError;
+    ///
+    /// impl fmt::Display for AnError {
+    ///     fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+    ///         write!(f, "An error")
+    ///     }
+    /// }
+    ///
+    /// impl Error for AnError {}
+    ///
+    /// unsafe impl Send for AnError {}
+    ///
+    /// unsafe impl Sync for AnError {}
+    ///
+    /// let an_error = AnError;
+    /// assert!(0 == mem::size_of_val(&an_error));
+    /// let a_boxed_error = Box::<dyn Error + Send + Sync>::from(an_error);
+    /// assert!(
+    ///     mem::size_of::<Box<dyn Error + Send + Sync>>() == mem::size_of_val(&a_boxed_error))
+    /// ```
+    fn from(err: E) -> Box<dyn Error + Send + Sync + 'a> {
+        Box::new(err)
+    }
+}
+
+#[cfg(not(no_global_oom_handling))]
+#[stable(feature = "rust1", since = "1.0.0")]
+impl From<String> for Box<dyn Error + Send + Sync> {
+    /// Converts a [`String`] into a box of dyn [`Error`] + [`Send`] + [`Sync`].
+    ///
+    /// # Examples
+    ///
+    /// ```
+    /// use std::error::Error;
+    /// use std::mem;
+    ///
+    /// let a_string_error = "a string error".to_string();
+    /// let a_boxed_error = Box::<dyn Error + Send + Sync>::from(a_string_error);
+    /// assert!(
+    ///     mem::size_of::<Box<dyn Error + Send + Sync>>() == mem::size_of_val(&a_boxed_error))
+    /// ```
+    #[inline]
+    fn from(err: String) -> Box<dyn Error + Send + Sync> {
+        struct StringError(String);
+
+        impl Error for StringError {
+            #[allow(deprecated)]
+            fn description(&self) -> &str {
+                &self.0
+            }
+        }
+
+        impl fmt::Display for StringError {
+            fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+                fmt::Display::fmt(&self.0, f)
+            }
+        }
+
+        // Purposefully skip printing "StringError(..)"
+        impl fmt::Debug for StringError {
+            fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+                fmt::Debug::fmt(&self.0, f)
+            }
+        }
+
+        Box::new(StringError(err))
+    }
+}
+
+#[cfg(not(no_global_oom_handling))]
+#[stable(feature = "string_box_error", since = "1.6.0")]
+impl From<String> for Box<dyn Error> {
+    /// Converts a [`String`] into a box of dyn [`Error`].
+    ///
+    /// # Examples
+    ///
+    /// ```
+    /// use std::error::Error;
+    /// use std::mem;
+    ///
+    /// let a_string_error = "a string error".to_string();
+    /// let a_boxed_error = Box::<dyn Error>::from(a_string_error);
+    /// assert!(mem::size_of::<Box<dyn Error>>() == mem::size_of_val(&a_boxed_error))
+    /// ```
+    fn from(str_err: String) -> Box<dyn Error> {
+        let err1: Box<dyn Error + Send + Sync> = From::from(str_err);
+        let err2: Box<dyn Error> = err1;
+        err2
+    }
+}
+
+#[cfg(not(no_global_oom_handling))]
+#[stable(feature = "rust1", since = "1.0.0")]
+impl<'a> From<&str> for Box<dyn Error + Send + Sync + 'a> {
+    /// Converts a [`str`] into a box of dyn [`Error`] + [`Send`] + [`Sync`].
+    ///
+    /// [`str`]: prim@str
+    ///
+    /// # Examples
+    ///
+    /// ```
+    /// use std::error::Error;
+    /// use std::mem;
+    ///
+    /// let a_str_error = "a str error";
+    /// let a_boxed_error = Box::<dyn Error + Send + Sync>::from(a_str_error);
+    /// assert!(
+    ///     mem::size_of::<Box<dyn Error + Send + Sync>>() == mem::size_of_val(&a_boxed_error))
+    /// ```
+    #[inline]
+    fn from(err: &str) -> Box<dyn Error + Send + Sync + 'a> {
+        From::from(String::from(err))
+    }
+}
+
+#[cfg(not(no_global_oom_handling))]
+#[stable(feature = "string_box_error", since = "1.6.0")]
+impl From<&str> for Box<dyn Error> {
+    /// Converts a [`str`] into a box of dyn [`Error`].
+    ///
+    /// [`str`]: prim@str
+    ///
+    /// # Examples
+    ///
+    /// ```
+    /// use std::error::Error;
+    /// use std::mem;
+    ///
+    /// let a_str_error = "a str error";
+    /// let a_boxed_error = Box::<dyn Error>::from(a_str_error);
+    /// assert!(mem::size_of::<Box<dyn Error>>() == mem::size_of_val(&a_boxed_error))
+    /// ```
+    fn from(err: &str) -> Box<dyn Error> {
+        From::from(String::from(err))
+    }
+}
+
+#[cfg(not(no_global_oom_handling))]
+#[stable(feature = "cow_box_error", since = "1.22.0")]
+impl<'a, 'b> From<Cow<'b, str>> for Box<dyn Error + Send + Sync + 'a> {
+    /// Converts a [`Cow`] into a box of dyn [`Error`] + [`Send`] + [`Sync`].
+    ///
+    /// # Examples
+    ///
+    /// ```
+    /// use std::error::Error;
+    /// use std::mem;
+    /// use std::borrow::Cow;
+    ///
+    /// let a_cow_str_error = Cow::from("a str error");
+    /// let a_boxed_error = Box::<dyn Error + Send + Sync>::from(a_cow_str_error);
+    /// assert!(
+    ///     mem::size_of::<Box<dyn Error + Send + Sync>>() == mem::size_of_val(&a_boxed_error))
+    /// ```
+    fn from(err: Cow<'b, str>) -> Box<dyn Error + Send + Sync + 'a> {
+        From::from(String::from(err))
+    }
+}
+
+#[cfg(not(no_global_oom_handling))]
+#[stable(feature = "cow_box_error", since = "1.22.0")]
+impl<'a> From<Cow<'a, str>> for Box<dyn Error> {
+    /// Converts a [`Cow`] into a box of dyn [`Error`].
+    ///
+    /// # Examples
+    ///
+    /// ```
+    /// use std::error::Error;
+    /// use std::mem;
+    /// use std::borrow::Cow;
+    ///
+    /// let a_cow_str_error = Cow::from("a str error");
+    /// let a_boxed_error = Box::<dyn Error>::from(a_cow_str_error);
+    /// assert!(mem::size_of::<Box<dyn Error>>() == mem::size_of_val(&a_boxed_error))
+    /// ```
+    fn from(err: Cow<'a, str>) -> Box<dyn Error> {
+        From::from(String::from(err))
+    }
+}
+
+#[stable(feature = "box_error", since = "1.8.0")]
+impl<T: core::error::Error> core::error::Error for Box<T> {
+    #[allow(deprecated, deprecated_in_future)]
+    fn description(&self) -> &str {
+        core::error::Error::description(&**self)
+    }
+
+    #[allow(deprecated)]
+    fn cause(&self) -> Option<&dyn core::error::Error> {
+        core::error::Error::cause(&**self)
+    }
+
+    fn source(&self) -> Option<&(dyn core::error::Error + 'static)> {
+        core::error::Error::source(&**self)
+    }
+}
index 1eec265..2506065 100644
@@ -141,7 +141,7 @@ impl Display for TryReserveError {
                 " because the computed capacity exceeded the collection's maximum"
             }
             TryReserveErrorKind::AllocError { .. } => {
-                " because the memory allocator returned a error"
+                " because the memory allocator returned an error"
             }
         };
         fmt.write_str(reason)
@@ -154,3 +154,6 @@ trait SpecExtend<I: IntoIterator> {
     /// Extends `self` with the contents of the given iterator.
     fn spec_extend(&mut self, iter: I);
 }
+
+#[stable(feature = "try_reserve", since = "1.57.0")]
+impl core::error::Error for TryReserveError {}
index 3aebf83..5f37437 100644
@@ -5,7 +5,7 @@
 //! This library provides smart pointers and collections for managing
 //! heap-allocated values.
 //!
-//! This library, like libcore, normally doesn’t need to be used directly
+//! This library, like core, normally doesn’t need to be used directly
 //! since its contents are re-exported in the [`std` crate](../std/index.html).
 //! Crates that use the `#![no_std]` attribute however will typically
 //! not depend on `std`, so they’d use this crate instead.
 //! [`Rc`]: rc
 //! [`RefCell`]: core::cell
 
-// To run liballoc tests without x.py without ending up with two copies of liballoc, Miri needs to be
-// able to "empty" this crate. See <https://github.com/rust-lang/miri-test-libstd/issues/4>.
-// rustc itself never sets the feature, so this line has no affect there.
-#![cfg(any(not(feature = "miri-test-libstd"), test, doctest))]
 #![allow(unused_attributes)]
 #![stable(feature = "alloc", since = "1.36.0")]
 #![doc(
     any(not(feature = "miri-test-libstd"), test, doctest),
     no_global_oom_handling,
     not(no_global_oom_handling),
+    not(no_rc),
+    not(no_sync),
     target_has_atomic = "ptr"
 ))]
 #![no_std]
 #![needs_allocator]
+// To run alloc tests without x.py without ending up with two copies of alloc, Miri needs to be
+// able to "empty" this crate. See <https://github.com/rust-lang/miri-test-libstd/issues/4>.
+// rustc itself never sets the feature, so this line has no affect there.
+#![cfg(any(not(feature = "miri-test-libstd"), test, doctest))]
 //
 // Lints:
 #![deny(unsafe_op_in_unsafe_fn)]
+#![deny(fuzzy_provenance_casts)]
 #![warn(deprecated_in_future)]
 #![warn(missing_debug_implementations)]
 #![warn(missing_docs)]
 #![allow(explicit_outlives_requirements)]
 //
 // Library features:
-#![cfg_attr(not(no_global_oom_handling), feature(alloc_c_string))]
 #![feature(alloc_layout_extra)]
 #![feature(allocator_api)]
 #![feature(array_chunks)]
+#![feature(array_into_iter_constructors)]
 #![feature(array_methods)]
 #![feature(array_windows)]
 #![feature(assert_matches)]
 #![feature(coerce_unsized)]
 #![cfg_attr(not(no_global_oom_handling), feature(const_alloc_error))]
 #![feature(const_box)]
-#![cfg_attr(not(no_global_oom_handling), feature(const_btree_new))]
+#![cfg_attr(not(no_global_oom_handling), feature(const_btree_len))]
 #![cfg_attr(not(no_borrow), feature(const_cow_is_borrowed))]
 #![feature(const_convert)]
 #![feature(const_size_of_val)]
 #![feature(const_align_of_val)]
 #![feature(const_ptr_read)]
+#![feature(const_maybe_uninit_zeroed)]
 #![feature(const_maybe_uninit_write)]
 #![feature(const_maybe_uninit_as_mut_ptr)]
 #![feature(const_refs_to_cell)]
-#![feature(core_c_str)]
 #![feature(core_intrinsics)]
-#![feature(core_ffi_c)]
+#![feature(core_panic)]
 #![feature(const_eval_select)]
 #![feature(const_pin)]
+#![feature(const_waker)]
 #![feature(cstr_from_bytes_until_nul)]
 #![feature(dispatch_from_dyn)]
+#![feature(error_generic_member_access)]
+#![feature(error_in_core)]
 #![feature(exact_size_is_empty)]
 #![feature(extend_one)]
 #![feature(fmt_internals)]
 #![feature(fn_traits)]
 #![feature(hasher_prefixfree_extras)]
+#![feature(inline_const)]
 #![feature(inplace_iteration)]
+#![cfg_attr(test, feature(is_sorted))]
 #![feature(iter_advance_by)]
+#![feature(iter_next_chunk)]
+#![feature(iter_repeat_n)]
 #![feature(layout_for_ptr)]
 #![feature(maybe_uninit_slice)]
+#![feature(maybe_uninit_uninit_array)]
+#![feature(maybe_uninit_uninit_array_transpose)]
 #![cfg_attr(test, feature(new_uninit))]
 #![feature(nonnull_slice_from_raw_parts)]
 #![feature(pattern)]
+#![feature(pointer_byte_offsets)]
+#![feature(provide_any)]
 #![feature(ptr_internals)]
 #![feature(ptr_metadata)]
 #![feature(ptr_sub_ptr)]
 #![feature(receiver_trait)]
+#![feature(saturating_int_impl)]
 #![feature(set_ptr_value)]
+#![feature(sized_type_properties)]
+#![feature(slice_from_ptr_range)]
 #![feature(slice_group_by)]
 #![feature(slice_ptr_get)]
 #![feature(slice_ptr_len)]
 #![feature(trusted_len)]
 #![feature(trusted_random_access)]
 #![feature(try_trait_v2)]
+#![feature(tuple_trait)]
 #![feature(unchecked_math)]
 #![feature(unicode_internals)]
 #![feature(unsize)]
+#![feature(utf8_chunks)]
+#![feature(std_internals)]
 //
 // Language features:
 #![feature(allocator_internals)]
 #![feature(allow_internal_unstable)]
 #![feature(associated_type_bounds)]
-#![feature(box_syntax)]
 #![feature(cfg_sanitize)]
 #![feature(const_deref)]
 #![feature(const_mut_refs)]
 #![cfg_attr(not(test), feature(generator_trait))]
 #![feature(hashmap_internals)]
 #![feature(lang_items)]
-#![feature(let_else)]
 #![feature(min_specialization)]
 #![feature(negative_impls)]
 #![feature(never_type)]
-#![feature(nll)] // Not necessary, but here to test the `nll` feature.
 #![feature(rustc_allow_const_fn_unstable)]
 #![feature(rustc_attrs)]
+#![feature(pointer_is_aligned)]
 #![feature(slice_internals)]
 #![feature(staged_api)]
+#![feature(stmt_expr_attributes)]
 #![cfg_attr(test, feature(test))]
 #![feature(unboxed_closures)]
 #![feature(unsized_fn_params)]
 #![feature(c_unwind)]
+#![feature(with_negative_coherence)]
+#![cfg_attr(test, feature(panic_update_hook))]
 //
 // Rustdoc features:
 #![feature(doc_cfg)]
 extern crate std;
 #[cfg(test)]
 extern crate test;
+#[cfg(test)]
+mod testing;
 
 // Module with internal macros used by other modules (needs to be included before other modules).
 #[cfg(not(no_macros))]
@@ -218,7 +241,7 @@ mod boxed {
 #[cfg(not(no_borrow))]
 pub mod borrow;
 pub mod collections;
-#[cfg(not(no_global_oom_handling))]
+#[cfg(all(not(no_rc), not(no_sync), not(no_global_oom_handling)))]
 pub mod ffi;
 #[cfg(not(no_fmt))]
 pub mod fmt;
@@ -229,10 +252,9 @@ pub mod slice;
 pub mod str;
 #[cfg(not(no_string))]
 pub mod string;
-#[cfg(not(no_sync))]
-#[cfg(target_has_atomic = "ptr")]
+#[cfg(all(not(no_rc), not(no_sync), target_has_atomic = "ptr"))]
 pub mod sync;
-#[cfg(all(not(no_global_oom_handling), target_has_atomic = "ptr"))]
+#[cfg(all(not(no_global_oom_handling), not(no_rc), not(no_sync), target_has_atomic = "ptr"))]
 pub mod task;
 #[cfg(test)]
 mod tests;
@@ -243,3 +265,20 @@ pub mod vec;
 pub mod __export {
     pub use core::format_args;
 }
+
+#[cfg(test)]
+#[allow(dead_code)] // Not used in all configurations
+pub(crate) mod test_helpers {
+    /// Copied from `std::test_helpers::test_rng`, since these tests rely on the
+    /// seed not being the same for every RNG invocation too.
+    pub(crate) fn test_rng() -> rand_xorshift::XorShiftRng {
+        use std::hash::{BuildHasher, Hash, Hasher};
+        let mut hasher = std::collections::hash_map::RandomState::new().build_hasher();
+        std::panic::Location::caller().hash(&mut hasher);
+        let hc64 = hasher.finish();
+        let seed_vec =
+            hc64.to_le_bytes().into_iter().chain(0u8..8).collect::<crate::vec::Vec<u8>>();
+        let seed: [u8; 16] = seed_vec.as_slice().try_into().unwrap();
+        rand::SeedableRng::from_seed(seed)
+    }
+}
index eb77db5..5db87ea 100644
@@ -5,7 +5,7 @@
 use core::alloc::LayoutError;
 use core::cmp;
 use core::intrinsics;
-use core::mem::{self, ManuallyDrop, MaybeUninit};
+use core::mem::{self, ManuallyDrop, MaybeUninit, SizedTypeProperties};
 use core::ops::Drop;
 use core::ptr::{self, NonNull, Unique};
 use core::slice;
@@ -177,7 +177,7 @@ impl<T, A: Allocator> RawVec<T, A> {
     #[cfg(not(no_global_oom_handling))]
     fn allocate_in(capacity: usize, init: AllocInit, alloc: A) -> Self {
         // Don't allocate here because `Drop` will not deallocate when `capacity` is 0.
-        if mem::size_of::<T>() == 0 || capacity == 0 {
+        if T::IS_ZST || capacity == 0 {
             Self::new_in(alloc)
         } else {
             // We avoid `unwrap_or_else` here because it bloats the amount of
@@ -212,7 +212,7 @@ impl<T, A: Allocator> RawVec<T, A> {
 
     fn try_allocate_in(capacity: usize, init: AllocInit, alloc: A) -> Result<Self, TryReserveError> {
         // Don't allocate here because `Drop` will not deallocate when `capacity` is 0.
-        if mem::size_of::<T>() == 0 || capacity == 0 {
+        if T::IS_ZST || capacity == 0 {
             return Ok(Self::new_in(alloc));
         }
 
@@ -262,7 +262,7 @@ impl<T, A: Allocator> RawVec<T, A> {
     /// This will always be `usize::MAX` if `T` is zero-sized.
     #[inline(always)]
     pub fn capacity(&self) -> usize {
-        if mem::size_of::<T>() == 0 { usize::MAX } else { self.cap }
+        if T::IS_ZST { usize::MAX } else { self.cap }
     }
 
     /// Returns a shared reference to the allocator backing this `RawVec`.
@@ -271,7 +271,7 @@ impl<T, A: Allocator> RawVec<T, A> {
     }
 
     fn current_memory(&self) -> Option<(NonNull<u8>, Layout)> {
-        if mem::size_of::<T>() == 0 || self.cap == 0 {
+        if T::IS_ZST || self.cap == 0 {
             None
         } else {
             // We have an allocated chunk of memory, so we can bypass runtime
@@ -419,7 +419,7 @@ impl<T, A: Allocator> RawVec<T, A> {
         // This is ensured by the calling contexts.
         debug_assert!(additional > 0);
 
-        if mem::size_of::<T>() == 0 {
+        if T::IS_ZST {
             // Since we return a capacity of `usize::MAX` when `elem_size` is
             // 0, getting to here necessarily means the `RawVec` is overfull.
             return Err(CapacityOverflow.into());
@@ -445,7 +445,7 @@ impl<T, A: Allocator> RawVec<T, A> {
     // `grow_amortized`, but this method is usually instantiated less often so
     // it's less critical.
     fn grow_exact(&mut self, len: usize, additional: usize) -> Result<(), TryReserveError> {
-        if mem::size_of::<T>() == 0 {
+        if T::IS_ZST {
             // Since we return a capacity of `usize::MAX` when the type size is
             // 0, getting to here necessarily means the `RawVec` is overfull.
             return Err(CapacityOverflow.into());
@@ -460,7 +460,7 @@ impl<T, A: Allocator> RawVec<T, A> {
         Ok(())
     }
 
-    #[allow(dead_code)]
+    #[cfg(not(no_global_oom_handling))]
     fn shrink(&mut self, cap: usize) -> Result<(), TryReserveError> {
         assert!(cap <= self.capacity(), "Tried to shrink to a larger capacity");
 
index e444e97..245e015 100644
@@ -1,84 +1,14 @@
 // SPDX-License-Identifier: Apache-2.0 OR MIT
 
-//! A dynamically-sized view into a contiguous sequence, `[T]`.
+//! Utilities for the slice primitive type.
 //!
 //! *[See also the slice primitive type](slice).*
 //!
-//! Slices are a view into a block of memory represented as a pointer and a
-//! length.
+//! Most of the structs in this module are iterator types which can only be created
+//! using a certain function. For example, `slice.iter()` yields an [`Iter`].
 //!
-//! ```
-//! // slicing a Vec
-//! let vec = vec![1, 2, 3];
-//! let int_slice = &vec[..];
-//! // coercing an array to a slice
-//! let str_slice: &[&str] = &["one", "two", "three"];
-//! ```
-//!
-//! Slices are either mutable or shared. The shared slice type is `&[T]`,
-//! while the mutable slice type is `&mut [T]`, where `T` represents the element
-//! type. For example, you can mutate the block of memory that a mutable slice
-//! points to:
-//!
-//! ```
-//! let x = &mut [1, 2, 3];
-//! x[1] = 7;
-//! assert_eq!(x, &[1, 7, 3]);
-//! ```
-//!
-//! Here are some of the things this module contains:
-//!
-//! ## Structs
-//!
-//! There are several structs that are useful for slices, such as [`Iter`], which
-//! represents iteration over a slice.
-//!
-//! ## Trait Implementations
-//!
-//! There are several implementations of common traits for slices. Some examples
-//! include:
-//!
-//! * [`Clone`]
-//! * [`Eq`], [`Ord`] - for slices whose element type are [`Eq`] or [`Ord`].
-//! * [`Hash`] - for slices whose element type is [`Hash`].
-//!
-//! ## Iteration
-//!
-//! The slices implement `IntoIterator`. The iterator yields references to the
-//! slice elements.
-//!
-//! ```
-//! let numbers = &[0, 1, 2];
-//! for n in numbers {
-//!     println!("{n} is a number!");
-//! }
-//! ```
-//!
-//! The mutable slice yields mutable references to the elements:
-//!
-//! ```
-//! let mut scores = [7, 8, 9];
-//! for score in &mut scores[..] {
-//!     *score += 1;
-//! }
-//! ```
-//!
-//! This iterator yields mutable references to the slice's elements, so while
-//! the element type of the slice is `i32`, the element type of the iterator is
-//! `&mut i32`.
-//!
-//! * [`.iter`] and [`.iter_mut`] are the explicit methods to return the default
-//!   iterators.
-//! * Further methods that return iterators are [`.split`], [`.splitn`],
-//!   [`.chunks`], [`.windows`] and more.
-//!
-//! [`Hash`]: core::hash::Hash
-//! [`.iter`]: slice::iter
-//! [`.iter_mut`]: slice::iter_mut
-//! [`.split`]: slice::split
-//! [`.splitn`]: slice::splitn
-//! [`.chunks`]: slice::chunks
-//! [`.windows`]: slice::windows
+//! A few functions are provided to create a slice from a value reference
+//! or from a raw pointer.
 #![stable(feature = "rust1", since = "1.0.0")]
 // Many of the usings in this module are only used in the test configuration.
 // It's cleaner to just turn off the unused_imports warning than to fix them.
@@ -88,20 +18,23 @@ use core::borrow::{Borrow, BorrowMut};
 #[cfg(not(no_global_oom_handling))]
 use core::cmp::Ordering::{self, Less};
 #[cfg(not(no_global_oom_handling))]
-use core::mem;
-#[cfg(not(no_global_oom_handling))]
-use core::mem::size_of;
+use core::mem::{self, SizedTypeProperties};
 #[cfg(not(no_global_oom_handling))]
 use core::ptr;
+#[cfg(not(no_global_oom_handling))]
+use core::slice::sort;
 
 use crate::alloc::Allocator;
 #[cfg(not(no_global_oom_handling))]
-use crate::alloc::Global;
+use crate::alloc::{self, Global};
 #[cfg(not(no_global_oom_handling))]
 use crate::borrow::ToOwned;
 use crate::boxed::Box;
 use crate::vec::Vec;
 
+#[cfg(test)]
+mod tests;
+
 #[unstable(feature = "slice_range", issue = "76393")]
 pub use core::slice::range;
 #[unstable(feature = "array_chunks", issue = "74985")]
@@ -116,6 +49,8 @@ pub use core::slice::EscapeAscii;
 pub use core::slice::SliceIndex;
 #[stable(feature = "from_ref", since = "1.28.0")]
 pub use core::slice::{from_mut, from_ref};
+#[unstable(feature = "slice_from_ptr_range", issue = "89792")]
+pub use core::slice::{from_mut_ptr_range, from_ptr_range};
 #[stable(feature = "rust1", since = "1.0.0")]
 pub use core::slice::{from_raw_parts, from_raw_parts_mut};
 #[stable(feature = "rust1", since = "1.0.0")]
@@ -275,7 +210,7 @@ impl<T> [T] {
     where
         T: Ord,
     {
-        merge_sort(self, |a, b| a.lt(b));
+        stable_sort(self, T::lt);
     }
 
     /// Sorts the slice with a comparator function.
@@ -331,7 +266,7 @@ impl<T> [T] {
     where
         F: FnMut(&T, &T) -> Ordering,
     {
-        merge_sort(self, |a, b| compare(a, b) == Less);
+        stable_sort(self, |a, b| compare(a, b) == Less);
     }
 
     /// Sorts the slice with a key extraction function.
@@ -374,7 +309,7 @@ impl<T> [T] {
         F: FnMut(&T) -> K,
         K: Ord,
     {
-        merge_sort(self, |a, b| f(a).lt(&f(b)));
+        stable_sort(self, |a, b| f(a).lt(&f(b)));
     }
 
     /// Sorts the slice with a key extraction function.
@@ -530,7 +465,7 @@ impl<T> [T] {
         hack::into_vec(self)
     }
 
-    /// Creates a vector by repeating a slice `n` times.
+    /// Creates a vector by copying a slice `n` times.
     ///
     /// # Panics
     ///
@@ -725,7 +660,7 @@ impl [u8] {
 ///
 /// ```error
 /// error[E0207]: the type parameter `T` is not constrained by the impl trait, self type, or predica
-///    --> src/liballoc/slice.rs:608:6
+///    --> library/alloc/src/slice.rs:608:6
 ///     |
 /// 608 | impl<T: Clone, V: Borrow<[T]>> Concat for [V] {
 ///     |      ^ unconstrained type parameter
@@ -836,14 +771,14 @@ impl<T: Clone, V: Borrow<[T]>> Join<&[T]> for [V] {
 ////////////////////////////////////////////////////////////////////////////////
 
 #[stable(feature = "rust1", since = "1.0.0")]
-impl<T> Borrow<[T]> for Vec<T> {
+impl<T, A: Allocator> Borrow<[T]> for Vec<T, A> {
     fn borrow(&self) -> &[T] {
         &self[..]
     }
 }
 
 #[stable(feature = "rust1", since = "1.0.0")]
-impl<T> BorrowMut<[T]> for Vec<T> {
+impl<T, A: Allocator> BorrowMut<[T]> for Vec<T, A> {
     fn borrow_mut(&mut self) -> &mut [T] {
         &mut self[..]
     }
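Making the `Borrow`/`BorrowMut` impls generic over the allocator means allocator-aware vectors can be borrowed as slices too. A rough sketch, assuming a nightly userspace toolchain with `allocator_api` and using `System` purely as an example allocator:

```rust
#![feature(allocator_api)]

use std::alloc::System;
use std::borrow::{Borrow, BorrowMut};

let mut v: Vec<u8, System> = Vec::new_in(System);
v.extend_from_slice(&[1, 2, 3]);

// Both impls now apply to Vec<T, A> for any allocator A, not just Global.
let shared: &[u8] = v.borrow();
assert_eq!(shared, &[1, 2, 3][..]);

let exclusive: &mut [u8] = v.borrow_mut();
exclusive[0] = 7;
assert_eq!(v, vec![7, 2, 3]);
```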
@@ -881,324 +816,52 @@ impl<T: Clone> ToOwned for [T] {
 // Sorting
 ////////////////////////////////////////////////////////////////////////////////
 
-/// Inserts `v[0]` into pre-sorted sequence `v[1..]` so that whole `v[..]` becomes sorted.
-///
-/// This is the integral subroutine of insertion sort.
+#[inline]
 #[cfg(not(no_global_oom_handling))]
-fn insert_head<T, F>(v: &mut [T], is_less: &mut F)
+fn stable_sort<T, F>(v: &mut [T], mut is_less: F)
 where
     F: FnMut(&T, &T) -> bool,
 {
-    if v.len() >= 2 && is_less(&v[1], &v[0]) {
-        unsafe {
-            // There are three ways to implement insertion here:
-            //
-            // 1. Swap adjacent elements until the first one gets to its final destination.
-            //    However, this way we copy data around more than is necessary. If elements are big
-            //    structures (costly to copy), this method will be slow.
-            //
-            // 2. Iterate until the right place for the first element is found. Then shift the
-            //    elements succeeding it to make room for it and finally place it into the
-            //    remaining hole. This is a good method.
-            //
-            // 3. Copy the first element into a temporary variable. Iterate until the right place
-            //    for it is found. As we go along, copy every traversed element into the slot
-            //    preceding it. Finally, copy data from the temporary variable into the remaining
-            //    hole. This method is very good. Benchmarks demonstrated slightly better
-            //    performance than with the 2nd method.
-            //
-            // All methods were benchmarked, and the 3rd showed best results. So we chose that one.
-            let tmp = mem::ManuallyDrop::new(ptr::read(&v[0]));
-
-            // Intermediate state of the insertion process is always tracked by `hole`, which
-            // serves two purposes:
-            // 1. Protects integrity of `v` from panics in `is_less`.
-            // 2. Fills the remaining hole in `v` in the end.
-            //
-            // Panic safety:
-            //
-            // If `is_less` panics at any point during the process, `hole` will get dropped and
-            // fill the hole in `v` with `tmp`, thus ensuring that `v` still holds every object it
-            // initially held exactly once.
-            let mut hole = InsertionHole { src: &*tmp, dest: &mut v[1] };
-            ptr::copy_nonoverlapping(&v[1], &mut v[0], 1);
-
-            for i in 2..v.len() {
-                if !is_less(&v[i], &*tmp) {
-                    break;
-                }
-                ptr::copy_nonoverlapping(&v[i], &mut v[i - 1], 1);
-                hole.dest = &mut v[i];
-            }
-            // `hole` gets dropped and thus copies `tmp` into the remaining hole in `v`.
-        }
-    }
-
-    // When dropped, copies from `src` into `dest`.
-    struct InsertionHole<T> {
-        src: *const T,
-        dest: *mut T,
-    }
-
-    impl<T> Drop for InsertionHole<T> {
-        fn drop(&mut self) {
-            unsafe {
-                ptr::copy_nonoverlapping(self.src, self.dest, 1);
-            }
-        }
+    if T::IS_ZST {
+        // Sorting has no meaningful behavior on zero-sized types. Do nothing.
+        return;
     }
-}
-
-/// Merges non-decreasing runs `v[..mid]` and `v[mid..]` using `buf` as temporary storage, and
-/// stores the result into `v[..]`.
-///
-/// # Safety
-///
-/// The two slices must be non-empty and `mid` must be in bounds. Buffer `buf` must be long enough
-/// to hold a copy of the shorter slice. Also, `T` must not be a zero-sized type.
-#[cfg(not(no_global_oom_handling))]
-unsafe fn merge<T, F>(v: &mut [T], mid: usize, buf: *mut T, is_less: &mut F)
-where
-    F: FnMut(&T, &T) -> bool,
-{
-    let len = v.len();
-    let v = v.as_mut_ptr();
-    let (v_mid, v_end) = unsafe { (v.add(mid), v.add(len)) };
 
-    // The merge process first copies the shorter run into `buf`. Then it traces the newly copied
-    // run and the longer run forwards (or backwards), comparing their next unconsumed elements and
-    // copying the lesser (or greater) one into `v`.
-    //
-    // As soon as the shorter run is fully consumed, the process is done. If the longer run gets
-    // consumed first, then we must copy whatever is left of the shorter run into the remaining
-    // hole in `v`.
-    //
-    // Intermediate state of the process is always tracked by `hole`, which serves two purposes:
-    // 1. Protects integrity of `v` from panics in `is_less`.
-    // 2. Fills the remaining hole in `v` if the longer run gets consumed first.
-    //
-    // Panic safety:
-    //
-    // If `is_less` panics at any point during the process, `hole` will get dropped and fill the
-    // hole in `v` with the unconsumed range in `buf`, thus ensuring that `v` still holds every
-    // object it initially held exactly once.
-    let mut hole;
+    let elem_alloc_fn = |len: usize| -> *mut T {
+        // SAFETY: Creating the layout is safe as long as merge_sort never calls this with len >
+        // v.len(). In general the allocation is only used as a 'shadow region' to store
+        // temporary swap elements.
+        unsafe { alloc::alloc(alloc::Layout::array::<T>(len).unwrap_unchecked()) as *mut T }
+    };
 
-    if mid <= len - mid {
-        // The left run is shorter.
+    let elem_dealloc_fn = |buf_ptr: *mut T, len: usize| {
+        // SAFETY: Creating the layout is safe as long as merge_sort never calls this with len >
+        // v.len(). The caller must ensure that buf_ptr was created by elem_alloc_fn with the same
+        // len.
         unsafe {
-            ptr::copy_nonoverlapping(v, buf, mid);
-            hole = MergeHole { start: buf, end: buf.add(mid), dest: v };
+            alloc::dealloc(buf_ptr as *mut u8, alloc::Layout::array::<T>(len).unwrap_unchecked());
         }
+    };
 
-        // Initially, these pointers point to the beginnings of their arrays.
-        let left = &mut hole.start;
-        let mut right = v_mid;
-        let out = &mut hole.dest;
-
-        while *left < hole.end && right < v_end {
-            // Consume the lesser side.
-            // If equal, prefer the left run to maintain stability.
-            unsafe {
-                let to_copy = if is_less(&*right, &**left) {
-                    get_and_increment(&mut right)
-                } else {
-                    get_and_increment(left)
-                };
-                ptr::copy_nonoverlapping(to_copy, get_and_increment(out), 1);
-            }
-        }
-    } else {
-        // The right run is shorter.
+    let run_alloc_fn = |len: usize| -> *mut sort::TimSortRun {
+        // SAFETY: Creating the layout is safe as long as merge_sort never calls this with an
+        // obscene length or 0.
         unsafe {
-            ptr::copy_nonoverlapping(v_mid, buf, len - mid);
-            hole = MergeHole { start: buf, end: buf.add(len - mid), dest: v_mid };
+            alloc::alloc(alloc::Layout::array::<sort::TimSortRun>(len).unwrap_unchecked())
+                as *mut sort::TimSortRun
         }
+    };
 
-        // Initially, these pointers point past the ends of their arrays.
-        let left = &mut hole.dest;
-        let right = &mut hole.end;
-        let mut out = v_end;
-
-        while v < *left && buf < *right {
-            // Consume the greater side.
-            // If equal, prefer the right run to maintain stability.
-            unsafe {
-                let to_copy = if is_less(&*right.offset(-1), &*left.offset(-1)) {
-                    decrement_and_get(left)
-                } else {
-                    decrement_and_get(right)
-                };
-                ptr::copy_nonoverlapping(to_copy, decrement_and_get(&mut out), 1);
-            }
-        }
-    }
-    // Finally, `hole` gets dropped. If the shorter run was not fully consumed, whatever remains of
-    // it will now be copied into the hole in `v`.
-
-    unsafe fn get_and_increment<T>(ptr: &mut *mut T) -> *mut T {
-        let old = *ptr;
-        *ptr = unsafe { ptr.offset(1) };
-        old
-    }
-
-    unsafe fn decrement_and_get<T>(ptr: &mut *mut T) -> *mut T {
-        *ptr = unsafe { ptr.offset(-1) };
-        *ptr
-    }
-
-    // When dropped, copies the range `start..end` into `dest..`.
-    struct MergeHole<T> {
-        start: *mut T,
-        end: *mut T,
-        dest: *mut T,
-    }
-
-    impl<T> Drop for MergeHole<T> {
-        fn drop(&mut self) {
-            // `T` is not a zero-sized type, and these are pointers into a slice's elements.
-            unsafe {
-                let len = self.end.sub_ptr(self.start);
-                ptr::copy_nonoverlapping(self.start, self.dest, len);
-            }
-        }
-    }
-}
-
-/// This merge sort borrows some (but not all) ideas from TimSort, which is described in detail
-/// [here](https://github.com/python/cpython/blob/main/Objects/listsort.txt).
-///
-/// The algorithm identifies strictly descending and non-descending subsequences, which are called
-/// natural runs. There is a stack of pending runs yet to be merged. Each newly found run is pushed
-/// onto the stack, and then some pairs of adjacent runs are merged until these two invariants are
-/// satisfied:
-///
-/// 1. for every `i` in `1..runs.len()`: `runs[i - 1].len > runs[i].len`
-/// 2. for every `i` in `2..runs.len()`: `runs[i - 2].len > runs[i - 1].len + runs[i].len`
-///
-/// The invariants ensure that the total running time is *O*(*n* \* log(*n*)) worst-case.
-#[cfg(not(no_global_oom_handling))]
-fn merge_sort<T, F>(v: &mut [T], mut is_less: F)
-where
-    F: FnMut(&T, &T) -> bool,
-{
-    // Slices of up to this length get sorted using insertion sort.
-    const MAX_INSERTION: usize = 20;
-    // Very short runs are extended using insertion sort to span at least this many elements.
-    const MIN_RUN: usize = 10;
-
-    // Sorting has no meaningful behavior on zero-sized types.
-    if size_of::<T>() == 0 {
-        return;
-    }
-
-    let len = v.len();
-
-    // Short arrays get sorted in-place via insertion sort to avoid allocations.
-    if len <= MAX_INSERTION {
-        if len >= 2 {
-            for i in (0..len - 1).rev() {
-                insert_head(&mut v[i..], &mut is_less);
-            }
-        }
-        return;
-    }
-
-    // Allocate a buffer to use as scratch memory. We keep the length 0 so we can keep in it
-    // shallow copies of the contents of `v` without risking the dtors running on copies if
-    // `is_less` panics. When merging two sorted runs, this buffer holds a copy of the shorter run,
-    // which will always have length at most `len / 2`.
-    let mut buf = Vec::with_capacity(len / 2);
-
-    // In order to identify natural runs in `v`, we traverse it backwards. That might seem like a
-    // strange decision, but consider the fact that merges more often go in the opposite direction
-    // (forwards). According to benchmarks, merging forwards is slightly faster than merging
-    // backwards. To conclude, identifying runs by traversing backwards improves performance.
-    let mut runs = vec![];
-    let mut end = len;
-    while end > 0 {
-        // Find the next natural run, and reverse it if it's strictly descending.
-        let mut start = end - 1;
-        if start > 0 {
-            start -= 1;
-            unsafe {
-                if is_less(v.get_unchecked(start + 1), v.get_unchecked(start)) {
-                    while start > 0 && is_less(v.get_unchecked(start), v.get_unchecked(start - 1)) {
-                        start -= 1;
-                    }
-                    v[start..end].reverse();
-                } else {
-                    while start > 0 && !is_less(v.get_unchecked(start), v.get_unchecked(start - 1))
-                    {
-                        start -= 1;
-                    }
-                }
-            }
-        }
-
-        // Insert some more elements into the run if it's too short. Insertion sort is faster than
-        // merge sort on short sequences, so this significantly improves performance.
-        while start > 0 && end - start < MIN_RUN {
-            start -= 1;
-            insert_head(&mut v[start..end], &mut is_less);
-        }
-
-        // Push this run onto the stack.
-        runs.push(Run { start, len: end - start });
-        end = start;
-
-        // Merge some pairs of adjacent runs to satisfy the invariants.
-        while let Some(r) = collapse(&runs) {
-            let left = runs[r + 1];
-            let right = runs[r];
-            unsafe {
-                merge(
-                    &mut v[left.start..right.start + right.len],
-                    left.len,
-                    buf.as_mut_ptr(),
-                    &mut is_less,
-                );
-            }
-            runs[r] = Run { start: left.start, len: left.len + right.len };
-            runs.remove(r + 1);
-        }
-    }
-
-    // Finally, exactly one run must remain in the stack.
-    debug_assert!(runs.len() == 1 && runs[0].start == 0 && runs[0].len == len);
-
-    // Examines the stack of runs and identifies the next pair of runs to merge. More specifically,
-    // if `Some(r)` is returned, that means `runs[r]` and `runs[r + 1]` must be merged next. If the
-    // algorithm should continue building a new run instead, `None` is returned.
-    //
-    // TimSort is infamous for its buggy implementations, as described here:
-    // http://envisage-project.eu/timsort-specification-and-verification/
-    //
-    // The gist of the story is: we must enforce the invariants on the top four runs on the stack.
-    // Enforcing them on just top three is not sufficient to ensure that the invariants will still
-    // hold for *all* runs in the stack.
-    //
-    // This function correctly checks invariants for the top four runs. Additionally, if the top
-    // run starts at index 0, it will always demand a merge operation until the stack is fully
-    // collapsed, in order to complete the sort.
-    #[inline]
-    fn collapse(runs: &[Run]) -> Option<usize> {
-        let n = runs.len();
-        if n >= 2
-            && (runs[n - 1].start == 0
-                || runs[n - 2].len <= runs[n - 1].len
-                || (n >= 3 && runs[n - 3].len <= runs[n - 2].len + runs[n - 1].len)
-                || (n >= 4 && runs[n - 4].len <= runs[n - 3].len + runs[n - 2].len))
-        {
-            if n >= 3 && runs[n - 3].len < runs[n - 1].len { Some(n - 3) } else { Some(n - 2) }
-        } else {
-            None
+    let run_dealloc_fn = |buf_ptr: *mut sort::TimSortRun, len: usize| {
+        // SAFETY: The caller must ensure that buf_ptr was created by run_alloc_fn with the same
+        // len.
+        unsafe {
+            alloc::dealloc(
+                buf_ptr as *mut u8,
+                alloc::Layout::array::<sort::TimSortRun>(len).unwrap_unchecked(),
+            );
         }
-    }
+    };
 
-    #[derive(Clone, Copy)]
-    struct Run {
-        start: usize,
-        len: usize,
-    }
+    sort::merge_sort(v, &mut is_less, elem_alloc_fn, elem_dealloc_fn, run_alloc_fn, run_dealloc_fn);
 }
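The observable behaviour of the callers above is unchanged: `sort`, `sort_by` and `sort_by_key` remain stable sorts; only the driver now lives in `core::slice::sort` and obtains its scratch buffers through these closures. A small sketch of the stability property callers keep relying on:

```rust
// Equal keys keep their original relative order under a stable sort.
let mut v = vec![(2, 'a'), (1, 'b'), (2, 'c'), (1, 'd')];
v.sort_by_key(|&(k, _)| k);
assert_eq!(v, [(1, 'b'), (1, 'd'), (2, 'a'), (2, 'c')]);
```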
index b6a5f98..d503d2f 100644 (file)
@@ -3,7 +3,7 @@
 use crate::alloc::{Allocator, Global};
 use core::fmt;
 use core::iter::{FusedIterator, TrustedLen};
-use core::mem;
+use core::mem::{self, ManuallyDrop, SizedTypeProperties};
 use core::ptr::{self, NonNull};
 use core::slice::{self};
 
@@ -67,6 +67,77 @@ impl<'a, T, A: Allocator> Drain<'a, T, A> {
     pub fn allocator(&self) -> &A {
         unsafe { self.vec.as_ref().allocator() }
     }
+
+    /// Keep unyielded elements in the source `Vec`.
+    ///
+    /// # Examples
+    ///
+    /// ```
+    /// #![feature(drain_keep_rest)]
+    ///
+    /// let mut vec = vec!['a', 'b', 'c'];
+    /// let mut drain = vec.drain(..);
+    ///
+    /// assert_eq!(drain.next().unwrap(), 'a');
+    ///
+    /// // This call keeps 'b' and 'c' in the vec.
+    /// drain.keep_rest();
+    ///
+    /// // If we hadn't called `keep_rest()`,
+    /// // `vec` would be empty.
+    /// assert_eq!(vec, ['b', 'c']);
+    /// ```
+    #[unstable(feature = "drain_keep_rest", issue = "101122")]
+    pub fn keep_rest(self) {
+        // At this moment layout looks like this:
+        //
+        // [head] [yielded by next] [unyielded] [yielded by next_back] [tail]
+        //        ^-- start         \_________/-- unyielded_len        \____/-- self.tail_len
+        //                          ^-- unyielded_ptr                  ^-- tail
+        //
+        // Normally the `Drop` impl would drop [unyielded] and then move [tail] to `start`.
+        // Here we want to
+        // 1. Move [unyielded] to `start`
+        // 2. Move [tail] to a new start at `start + len(unyielded)`
+        // 3. Update length of the original vec to `len(head) + len(unyielded) + len(tail)`
+        //    a. In case of ZST, this is the only thing we want to do
+        // 4. Do *not* drop self: everything is already in a consistent state and there is nothing left to do
+        let mut this = ManuallyDrop::new(self);
+
+        unsafe {
+            let source_vec = this.vec.as_mut();
+
+            let start = source_vec.len();
+            let tail = this.tail_start;
+
+            let unyielded_len = this.iter.len();
+            let unyielded_ptr = this.iter.as_slice().as_ptr();
+
+            // ZSTs have no identity, so we don't need to move them around.
+            let needs_move = mem::size_of::<T>() != 0;
+
+            if needs_move {
+                let start_ptr = source_vec.as_mut_ptr().add(start);
+
+                // memmove back unyielded elements
+                if unyielded_ptr != start_ptr {
+                    let src = unyielded_ptr;
+                    let dst = start_ptr;
+
+                    ptr::copy(src, dst, unyielded_len);
+                }
+
+                // memmove back untouched tail
+                if tail != (start + unyielded_len) {
+                    let src = source_vec.as_ptr().add(tail);
+                    let dst = start_ptr.add(unyielded_len);
+                    ptr::copy(src, dst, this.tail_len);
+                }
+            }
+
+            source_vec.set_len(start + unyielded_len + this.tail_len);
+        }
+    }
 }
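`keep_rest` also composes with double-ended iteration: elements already yielded from either end stay removed and only the unyielded middle is written back. A brief sketch under the same unstable `drain_keep_rest` feature:

```rust
#![feature(drain_keep_rest)]

let mut v = vec![1, 2, 3, 4, 5];
let mut drain = v.drain(..);

assert_eq!(drain.next(), Some(1));      // yielded from the front
assert_eq!(drain.next_back(), Some(5)); // yielded from the back

// Only the unyielded middle survives.
drain.keep_rest();
assert_eq!(v, [2, 3, 4]);
```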
 
 #[stable(feature = "vec_drain_as_slice", since = "1.46.0")]
@@ -133,7 +204,7 @@ impl<T, A: Allocator> Drop for Drain<'_, T, A> {
 
         let mut vec = self.vec;
 
-        if mem::size_of::<T>() == 0 {
+        if T::IS_ZST {
             // ZSTs have no identity, so we don't need to move them around, we only need to drop the correct amount.
             // this can be achieved by manipulating the Vec length instead of moving values out from `iter`.
             unsafe {
@@ -154,9 +225,9 @@ impl<T, A: Allocator> Drop for Drain<'_, T, A> {
         }
 
         // as_slice() must only be called when iter.len() is > 0 because
-        // vec::Splice modifies vec::Drain fields and may grow the vec which would invalidate
-        // the iterator's internal pointers. Creating a reference to deallocated memory
-        // is invalid even when it is zero-length
+        // it also gets touched by vec::Splice, which may turn it into a dangling pointer;
+        // that would make it and the vec pointer point to different allocations, which would
+        // lead to invalid pointer arithmetic below.
         let drop_ptr = iter.as_slice().as_ptr();
 
         unsafe {
index b04fce0..4b01922 100644 (file)
@@ -1,8 +1,9 @@
 // SPDX-License-Identifier: Apache-2.0 OR MIT
 
 use crate::alloc::{Allocator, Global};
-use core::ptr::{self};
-use core::slice::{self};
+use core::mem::{self, ManuallyDrop};
+use core::ptr;
+use core::slice;
 
 use super::Vec;
 
@@ -56,6 +57,61 @@ where
     pub fn allocator(&self) -> &A {
         self.vec.allocator()
     }
+
+    /// Keep unyielded elements in the source `Vec`.
+    ///
+    /// # Examples
+    ///
+    /// ```
+    /// #![feature(drain_filter)]
+    /// #![feature(drain_keep_rest)]
+    ///
+    /// let mut vec = vec!['a', 'b', 'c'];
+    /// let mut drain = vec.drain_filter(|_| true);
+    ///
+    /// assert_eq!(drain.next().unwrap(), 'a');
+    ///
+    /// // This call keeps 'b' and 'c' in the vec.
+    /// drain.keep_rest();
+    ///
+    /// // If we hadn't called `keep_rest()`,
+    /// // `vec` would be empty.
+    /// assert_eq!(vec, ['b', 'c']);
+    /// ```
+    #[unstable(feature = "drain_keep_rest", issue = "101122")]
+    pub fn keep_rest(self) {
+        // At this moment layout looks like this:
+        //
+        //  _____________________/-- old_len
+        // /                     \
+        // [kept] [yielded] [tail]
+        //        \_______/ ^-- idx
+        //                \-- del
+        //
+        // Normally the `Drop` impl would drop [tail] (via .for_each(drop), i.e. still calling `pred`)
+        //
+        // 1. Move [tail] after [kept]
+        // 2. Update length of the original vec to `old_len - del`
+        //    a. In case of ZST, this is the only thing we want to do
+        // 3. Do *not* drop self: everything is already in a consistent state and there is nothing left to do
+        let mut this = ManuallyDrop::new(self);
+
+        unsafe {
+            // ZSTs have no identity, so we don't need to move them around.
+            let needs_move = mem::size_of::<T>() != 0;
+
+            if needs_move && this.idx < this.old_len && this.del > 0 {
+                let ptr = this.vec.as_mut_ptr();
+                let src = ptr.add(this.idx);
+                let dst = src.sub(this.del);
+                let tail_len = this.old_len - this.idx;
+                src.copy_to(dst, tail_len);
+            }
+
+            let new_len = this.old_len - this.del;
+            this.vec.set_len(new_len);
+        }
+    }
 }
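For `DrainFilter`, `keep_rest` stops the filtering early: elements the predicate already removed stay removed, and everything not yet visited is kept without being tested. A short sketch under the same unstable features:

```rust
#![feature(drain_filter)]
#![feature(drain_keep_rest)]

let mut v = vec![1, 2, 3, 4, 5];
let mut drain = v.drain_filter(|x| *x % 2 == 0);

// The first match (2) is yielded and removed from the vec.
assert_eq!(drain.next(), Some(2));

// Stop filtering here: 4 is never tested, so it stays in place.
drain.keep_rest();
assert_eq!(v, [1, 3, 4, 5]);
```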
 
 #[unstable(feature = "drain_filter", reason = "recently added", issue = "43244")]
index f7a50e7..34a2a70 100644 (file)
@@ -3,14 +3,16 @@
 #[cfg(not(no_global_oom_handling))]
 use super::AsVecIntoIter;
 use crate::alloc::{Allocator, Global};
+#[cfg(not(no_global_oom_handling))]
+use crate::collections::VecDeque;
 use crate::raw_vec::RawVec;
+use core::array;
 use core::fmt;
-use core::intrinsics::arith_offset;
 use core::iter::{
     FusedIterator, InPlaceIterable, SourceIter, TrustedLen, TrustedRandomAccessNoCoerce,
 };
 use core::marker::PhantomData;
-use core::mem::{self, ManuallyDrop};
+use core::mem::{self, ManuallyDrop, MaybeUninit, SizedTypeProperties};
 #[cfg(not(no_global_oom_handling))]
 use core::ops::Deref;
 use core::ptr::{self, NonNull};
@@ -40,7 +42,9 @@ pub struct IntoIter<
     // to avoid dropping the allocator twice we need to wrap it into ManuallyDrop
     pub(super) alloc: ManuallyDrop<A>,
     pub(super) ptr: *const T,
-    pub(super) end: *const T,
+    pub(super) end: *const T, // If T is a ZST, this is actually ptr+len. This encoding is picked so that
+                              // ptr == end is a quick test for the iterator being empty, one that works
+                              // for both ZST and non-ZST.
 }
 
 #[stable(feature = "vec_intoiter_debug", since = "1.13.0")]
@@ -97,13 +101,16 @@ impl<T, A: Allocator> IntoIter<T, A> {
     }
 
     /// Drops remaining elements and relinquishes the backing allocation.
+    /// This method guarantees it won't panic before relinquishing
+    /// the backing allocation.
     ///
     /// This is roughly equivalent to the following, but more efficient
     ///
     /// ```
     /// # let mut into_iter = Vec::<u8>::with_capacity(10).into_iter();
+    /// let mut into_iter = std::mem::replace(&mut into_iter, Vec::new().into_iter());
     /// (&mut into_iter).for_each(core::mem::drop);
-    /// unsafe { core::ptr::write(&mut into_iter, Vec::new().into_iter()); }
+    /// std::mem::forget(into_iter);
     /// ```
     ///
     /// This method is used by in-place iteration, refer to the vec::in_place_collect
@@ -120,15 +127,45 @@ impl<T, A: Allocator> IntoIter<T, A> {
         self.ptr = self.buf.as_ptr();
         self.end = self.buf.as_ptr();
 
+        // Dropping the remaining elements can panic, so this needs to be
+        // done only after updating the other fields.
         unsafe {
             ptr::drop_in_place(remaining);
         }
     }
 
     /// Forgets to Drop the remaining elements while still allowing the backing allocation to be freed.
-    #[allow(dead_code)]
     pub(crate) fn forget_remaining_elements(&mut self) {
-        self.ptr = self.end;
+        // For the ZST case, it is crucial that we mutate `end` here, not `ptr`.
+        // `ptr` must stay aligned, while `end` may be unaligned.
+        self.end = self.ptr;
+    }
+
+    #[cfg(not(no_global_oom_handling))]
+    #[inline]
+    pub(crate) fn into_vecdeque(self) -> VecDeque<T, A> {
+        // Keep our `Drop` impl from dropping the elements and the allocator
+        let mut this = ManuallyDrop::new(self);
+
+        // SAFETY: This allocation originally came from a `Vec`, so it passes
+        // all those checks. We have `this.buf` ≤ `this.ptr` ≤ `this.end`,
+        // so the `sub_ptr`s below cannot wrap, and will produce a well-formed
+        // range. `end` ≤ `buf + cap`, so the range will be in-bounds.
+        // Taking `alloc` is ok because nothing else is going to look at it,
+        // since our `Drop` impl isn't going to run so there's no more code.
+        unsafe {
+            let buf = this.buf.as_ptr();
+            let initialized = if T::IS_ZST {
+                // All the pointers are the same for ZSTs, so it's fine to
+                // say that they're all at the beginning of the "allocation".
+                0..this.len()
+            } else {
+                this.ptr.sub_ptr(buf)..this.end.sub_ptr(buf)
+            };
+            let cap = this.cap;
+            let alloc = ManuallyDrop::take(&mut this.alloc);
+            VecDeque::from_contiguous_raw_parts_in(buf, initialized, cap, alloc)
+        }
     }
 }
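`into_vecdeque` lets the remaining buffer of a `vec::IntoIter` be handed to a `VecDeque` without copying element by element, presumably for the `VecDeque` conversion and collection paths. From the caller's side it is just the usual conversion; the buffer reuse is an internal optimization, not a documented guarantee:

```rust
use std::collections::VecDeque;

let v = vec![1, 2, 3];
// Collecting a vec::IntoIter into a VecDeque; internally this can reuse the
// Vec's allocation via from_contiguous_raw_parts_in.
let dq: VecDeque<i32> = v.into_iter().collect();
assert_eq!(dq, [1, 2, 3]);
```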
 
@@ -150,19 +187,18 @@ impl<T, A: Allocator> Iterator for IntoIter<T, A> {
 
     #[inline]
     fn next(&mut self) -> Option<T> {
-        if self.ptr as *const _ == self.end {
+        if self.ptr == self.end {
             None
-        } else if mem::size_of::<T>() == 0 {
-            // purposefully don't use 'ptr.offset' because for
-            // vectors with 0-size elements this would return the
-            // same pointer.
-            self.ptr = unsafe { arith_offset(self.ptr as *const i8, 1) as *mut T };
+        } else if T::IS_ZST {
+            // `ptr` has to stay where it is to remain aligned, so we reduce the length by 1 by
+            // reducing the `end`.
+            self.end = self.end.wrapping_byte_sub(1);
 
             // Make up a value of this ZST.
             Some(unsafe { mem::zeroed() })
         } else {
             let old = self.ptr;
-            self.ptr = unsafe { self.ptr.offset(1) };
+            self.ptr = unsafe { self.ptr.add(1) };
 
             Some(unsafe { ptr::read(old) })
         }
@@ -170,7 +206,7 @@ impl<T, A: Allocator> Iterator for IntoIter<T, A> {
 
     #[inline]
     fn size_hint(&self) -> (usize, Option<usize>) {
-        let exact = if mem::size_of::<T>() == 0 {
+        let exact = if T::IS_ZST {
             self.end.addr().wrapping_sub(self.ptr.addr())
         } else {
             unsafe { self.end.sub_ptr(self.ptr) }
@@ -182,11 +218,9 @@ impl<T, A: Allocator> Iterator for IntoIter<T, A> {
     fn advance_by(&mut self, n: usize) -> Result<(), usize> {
         let step_size = self.len().min(n);
         let to_drop = ptr::slice_from_raw_parts_mut(self.ptr as *mut T, step_size);
-        if mem::size_of::<T>() == 0 {
-            // SAFETY: due to unchecked casts of unsigned amounts to signed offsets the wraparound
-            // effectively results in unsigned pointers representing positions 0..usize::MAX,
-            // which is valid for ZSTs.
-            self.ptr = unsafe { arith_offset(self.ptr as *const i8, step_size as isize) as *mut T }
+        if T::IS_ZST {
+            // See `next` for why we sub `end` here.
+            self.end = self.end.wrapping_byte_sub(step_size);
         } else {
             // SAFETY: the min() above ensures that step_size is in bounds
             self.ptr = unsafe { self.ptr.add(step_size) };
@@ -206,6 +240,43 @@ impl<T, A: Allocator> Iterator for IntoIter<T, A> {
         self.len()
     }
 
+    #[inline]
+    fn next_chunk<const N: usize>(&mut self) -> Result<[T; N], core::array::IntoIter<T, N>> {
+        let mut raw_ary = MaybeUninit::uninit_array();
+
+        let len = self.len();
+
+        if T::IS_ZST {
+            if len < N {
+                self.forget_remaining_elements();
+                // Safety: ZSTs can be conjured ex nihilo; only the amount has to be correct
+                return Err(unsafe { array::IntoIter::new_unchecked(raw_ary, 0..len) });
+            }
+
+            self.end = self.end.wrapping_byte_sub(N);
+            // Safety: ditto
+            return Ok(unsafe { raw_ary.transpose().assume_init() });
+        }
+
+        if len < N {
+            // Safety: `len` indicates that this many elements are available and we just checked that
+            // it fits into the array.
+            unsafe {
+                ptr::copy_nonoverlapping(self.ptr, raw_ary.as_mut_ptr() as *mut T, len);
+                self.forget_remaining_elements();
+                return Err(array::IntoIter::new_unchecked(raw_ary, 0..len));
+            }
+        }
+
+        // Safety: `len` is at least as large as the array size. Copy a fixed amount here to fully initialize
+        // the array.
+        return unsafe {
+            ptr::copy_nonoverlapping(self.ptr, raw_ary.as_mut_ptr() as *mut T, N);
+            self.ptr = self.ptr.add(N);
+            Ok(raw_ary.transpose().assume_init())
+        };
+    }
+
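`next_chunk` copies a whole array's worth of elements out of the buffer at once; if fewer than `N` elements remain, the leftovers come back inside the `Err` variant. A usage sketch, assuming a nightly toolchain with the unstable `iter_next_chunk` feature:

```rust
#![feature(iter_next_chunk)]

let mut iter = vec![1, 2, 3, 4, 5].into_iter();

// A full chunk is copied out in one go.
let chunk: [i32; 2] = iter.next_chunk().unwrap();
assert_eq!(chunk, [1, 2]);

// Fewer than N = 4 elements are left: the remainder is returned as the error.
let rest = iter.next_chunk::<4>().unwrap_err();
assert_eq!(rest.as_slice(), &[3, 4, 5]);
```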
     unsafe fn __iterator_get_unchecked(&mut self, i: usize) -> Self::Item
     where
         Self: TrustedRandomAccessNoCoerce,
@@ -219,7 +290,7 @@ impl<T, A: Allocator> Iterator for IntoIter<T, A> {
         // that `T: Copy` so reading elements from the buffer doesn't invalidate
         // them for `Drop`.
         unsafe {
-            if mem::size_of::<T>() == 0 { mem::zeroed() } else { ptr::read(self.ptr.add(i)) }
+            if T::IS_ZST { mem::zeroed() } else { ptr::read(self.ptr.add(i)) }
         }
     }
 }
@@ -230,14 +301,14 @@ impl<T, A: Allocator> DoubleEndedIterator for IntoIter<T, A> {
     fn next_back(&mut self) -> Option<T> {
         if self.end == self.ptr {
             None
-        } else if mem::size_of::<T>() == 0 {
+        } else if T::IS_ZST {
             // See above for why 'ptr.offset' isn't used
-            self.end = unsafe { arith_offset(self.end as *const i8, -1) as *mut T };
+            self.end = self.end.wrapping_byte_sub(1);
 
             // Make up a value of this ZST.
             Some(unsafe { mem::zeroed() })
         } else {
-            self.end = unsafe { self.end.offset(-1) };
+            self.end = unsafe { self.end.sub(1) };
 
             Some(unsafe { ptr::read(self.end) })
         }
@@ -246,14 +317,12 @@ impl<T, A: Allocator> DoubleEndedIterator for IntoIter<T, A> {
     #[inline]
     fn advance_back_by(&mut self, n: usize) -> Result<(), usize> {
         let step_size = self.len().min(n);
-        if mem::size_of::<T>() == 0 {
+        if T::IS_ZST {
             // SAFETY: same as for advance_by()
-            self.end = unsafe {
-                arith_offset(self.end as *const i8, step_size.wrapping_neg() as isize) as *mut T
-            }
+            self.end = self.end.wrapping_byte_sub(step_size);
         } else {
             // SAFETY: same as for advance_by()
-            self.end = unsafe { self.end.offset(step_size.wrapping_neg() as isize) };
+            self.end = unsafe { self.end.sub(step_size) };
         }
         let to_drop = ptr::slice_from_raw_parts_mut(self.end as *mut T, step_size);
         // SAFETY: same as for advance_by()
index 377f3d1..d928dcf 100644 (file)
@@ -1,10 +1,13 @@
 // SPDX-License-Identifier: Apache-2.0 OR MIT
 
+use core::num::{Saturating, Wrapping};
+
 use crate::boxed::Box;
 
 #[rustc_specialization_trait]
 pub(super) unsafe trait IsZero {
-    /// Whether this value's representation is all zeros
+    /// Whether this value's representation is all zeros,
+    /// or can be represented with all zeroes.
     fn is_zero(&self) -> bool;
 }
 
@@ -19,12 +22,14 @@ macro_rules! impl_is_zero {
     };
 }
 
+impl_is_zero!(i8, |x| x == 0); // Needed so that arrays and tuples of i8 can implement IsZero.
 impl_is_zero!(i16, |x| x == 0);
 impl_is_zero!(i32, |x| x == 0);
 impl_is_zero!(i64, |x| x == 0);
 impl_is_zero!(i128, |x| x == 0);
 impl_is_zero!(isize, |x| x == 0);
 
+impl_is_zero!(u8, |x| x == 0); // Needed so that arrays and tuples of u8 can implement IsZero.
 impl_is_zero!(u16, |x| x == 0);
 impl_is_zero!(u32, |x| x == 0);
 impl_is_zero!(u64, |x| x == 0);
@@ -55,16 +60,42 @@ unsafe impl<T: IsZero, const N: usize> IsZero for [T; N] {
     #[inline]
     fn is_zero(&self) -> bool {
         // Because this is generated as a runtime check, it's not obvious that
-        // it's worth doing if the array is really long.  The threshold here
-        // is largely arbitrary, but was picked because as of 2022-05-01 LLVM
-        // can const-fold the check in `vec![[0; 32]; n]` but not in
-        // `vec![[0; 64]; n]`: https://godbolt.org/z/WTzjzfs5b
+        // it's worth doing if the array is really long. The threshold here
+        // is largely arbitrary, but was picked because as of 2022-07-01 LLVM
+        // fails to const-fold the check in `vec![[1; 32]; n]`
+        // See https://github.com/rust-lang/rust/pull/97581#issuecomment-1166628022
         // Feel free to tweak if you have better evidence.
 
-        N <= 32 && self.iter().all(IsZero::is_zero)
+        N <= 16 && self.iter().all(IsZero::is_zero)
+    }
+}
+
+// This is a recursive macro.
+macro_rules! impl_for_tuples {
+    // Stopper
+    () => {
+        // No point implementing this for the empty tuple, because it is a ZST.
+    };
+    ($first_arg:ident $(,$rest:ident)*) => {
+        unsafe impl <$first_arg: IsZero, $($rest: IsZero,)*> IsZero for ($first_arg, $($rest,)*){
+            #[inline]
+            fn is_zero(&self) -> bool{
+                // Destructure the tuple into N references.
+                // Rust allows local variable names to reuse the generic parameter names.
+                #[allow(non_snake_case)]
+                let ($first_arg, $($rest,)*) = self;
+
+                $first_arg.is_zero()
+                    $( && $rest.is_zero() )*
+            }
+        }
+
+        impl_for_tuples!($($rest),*);
     }
 }
 
+impl_for_tuples!(A, B, C, D, E, F, G, H);
+
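Together with the new `u8`/`i8` impls above, these tuple impls widen the set of element types whose `vec![elem; n]` construction can internally use a zeroed allocation instead of writing every element. A rough illustration of values that now qualify; the speedup is not observable through the API:

```rust
// Both of these can take the zeroed-allocation fast path via IsZero.
let pairs = vec![(0u8, 0u32); 1024]; // tuples of IsZero element types
let rows = vec![[0u8; 8]; 1024];     // short arrays of u8, enabled by the new u8 impl
assert_eq!((pairs.len(), rows.len()), (1024, 1024));
```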
 // `Option<&T>` and `Option<Box<T>>` are guaranteed to represent `None` as null.
 // For fat pointers, the bytes that would be the pointer metadata in the `Some`
 // variant are padding in the `None` variant, so ignoring them and
@@ -118,3 +149,56 @@ impl_is_zero_option_of_nonzero!(
     NonZeroUsize,
     NonZeroIsize,
 );
+
+macro_rules! impl_is_zero_option_of_num {
+    ($($t:ty,)+) => {$(
+        unsafe impl IsZero for Option<$t> {
+            #[inline]
+            fn is_zero(&self) -> bool {
+                const {
+                    let none: Self = unsafe { core::mem::MaybeUninit::zeroed().assume_init() };
+                    assert!(none.is_none());
+                }
+                self.is_none()
+            }
+        }
+    )+};
+}
+
+impl_is_zero_option_of_num!(u8, u16, u32, u64, u128, i8, i16, i32, i64, i128, usize, isize,);
+
+unsafe impl<T: IsZero> IsZero for Wrapping<T> {
+    #[inline]
+    fn is_zero(&self) -> bool {
+        self.0.is_zero()
+    }
+}
+
+unsafe impl<T: IsZero> IsZero for Saturating<T> {
+    #[inline]
+    fn is_zero(&self) -> bool {
+        self.0.is_zero()
+    }
+}
+
+macro_rules! impl_for_optional_bool {
+    ($($t:ty,)+) => {$(
+        unsafe impl IsZero for $t {
+            #[inline]
+            fn is_zero(&self) -> bool {
+                // SAFETY: This is *not* a stable layout guarantee, but
+                // inside `core` we're allowed to rely on the current rustc
+                // behaviour that options of bools will be one byte with
+                // no padding, so long as they're nested less than 254 deep.
+                let raw: u8 = unsafe { core::mem::transmute(*self) };
+                raw == 0
+            }
+        }
+    )+};
+}
+impl_for_optional_bool! {
+    Option<bool>,
+    Option<Option<bool>>,
+    Option<Option<Option<bool>>>,
+    // Could go further, but not worth the metadata overhead
+}
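The `Option` impls depend on `None` being the all-zero bit pattern: the integer case verifies it with the `const` assertion, while the nested `Option<bool>` case leans on layout behaviour that the comment above explicitly flags as not a stable guarantee. Illustrative element values that qualify for the same zeroed-allocation path:

```rust
// `None` is the all-zero representation for these element types.
let a: Vec<Option<u32>> = vec![None; 256];
let b: Vec<Option<bool>> = vec![None; 256];
assert!(a.iter().all(Option::is_none) && b.iter().all(Option::is_none));
```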
index fe4fff5..9499591 100644 (file)
@@ -61,12 +61,12 @@ use core::cmp::Ordering;
 use core::convert::TryFrom;
 use core::fmt;
 use core::hash::{Hash, Hasher};
-use core::intrinsics::{arith_offset, assume};
+use core::intrinsics::assume;
 use core::iter;
 #[cfg(not(no_global_oom_handling))]
 use core::iter::FromIterator;
 use core::marker::PhantomData;
-use core::mem::{self, ManuallyDrop, MaybeUninit};
+use core::mem::{self, ManuallyDrop, MaybeUninit, SizedTypeProperties};
 use core::ops::{self, Index, IndexMut, Range, RangeBounds};
 use core::ptr::{self, NonNull};
 use core::slice::{self, SliceIndex};
@@ -75,7 +75,7 @@ use crate::alloc::{Allocator, Global};
 #[cfg(not(no_borrow))]
 use crate::borrow::{Cow, ToOwned};
 use crate::boxed::Box;
-use crate::collections::TryReserveError;
+use crate::collections::{TryReserveError, TryReserveErrorKind};
 use crate::raw_vec::RawVec;
 
 #[unstable(feature = "drain_filter", reason = "recently added", issue = "43244")]
@@ -127,7 +127,7 @@ use self::set_len_on_drop::SetLenOnDrop;
 mod set_len_on_drop;
 
 #[cfg(not(no_global_oom_handling))]
-use self::in_place_drop::InPlaceDrop;
+use self::in_place_drop::{InPlaceDrop, InPlaceDstBufDrop};
 
 #[cfg(not(no_global_oom_handling))]
 mod in_place_drop;
@@ -169,7 +169,7 @@ mod spec_extend;
 /// vec[0] = 7;
 /// assert_eq!(vec[0], 7);
 ///
-/// vec.extend([1, 2, 3].iter().copied());
+/// vec.extend([1, 2, 3]);
 ///
 /// for x in &vec {
 ///     println!("{x}");
@@ -428,17 +428,25 @@ impl<T> Vec<T> {
         Vec { buf: RawVec::NEW, len: 0 }
     }
 
-    /// Constructs a new, empty `Vec<T>` with the specified capacity.
+    /// Constructs a new, empty `Vec<T>` with at least the specified capacity.
     ///
-    /// The vector will be able to hold exactly `capacity` elements without
-    /// reallocating. If `capacity` is 0, the vector will not allocate.
+    /// The vector will be able to hold at least `capacity` elements without
+    /// reallocating. This method is allowed to allocate for more elements than
+    /// `capacity`. If `capacity` is 0, the vector will not allocate.
     ///
     /// It is important to note that although the returned vector has the
-    /// *capacity* specified, the vector will have a zero *length*. For an
-    /// explanation of the difference between length and capacity, see
+    /// minimum *capacity* specified, the vector will have a zero *length*. For
+    /// an explanation of the difference between length and capacity, see
     /// *[Capacity and reallocation]*.
     ///
+    /// If it is important to know the exact allocated capacity of a `Vec`,
+    /// always use the [`capacity`] method after construction.
+    ///
+    /// For `Vec<T>` where `T` is a zero-sized type, there will be no allocation
+    /// and the capacity will always be `usize::MAX`.
+    ///
     /// [Capacity and reallocation]: #capacity-and-reallocation
+    /// [`capacity`]: Vec::capacity
     ///
     /// # Panics
     ///
@@ -451,19 +459,24 @@ impl<T> Vec<T> {
     ///
     /// // The vector contains no items, even though it has capacity for more
     /// assert_eq!(vec.len(), 0);
-    /// assert_eq!(vec.capacity(), 10);
+    /// assert!(vec.capacity() >= 10);
     ///
     /// // These are all done without reallocating...
     /// for i in 0..10 {
     ///     vec.push(i);
     /// }
     /// assert_eq!(vec.len(), 10);
-    /// assert_eq!(vec.capacity(), 10);
+    /// assert!(vec.capacity() >= 10);
     ///
     /// // ...but this may make the vector reallocate
     /// vec.push(11);
     /// assert_eq!(vec.len(), 11);
     /// assert!(vec.capacity() >= 11);
+    ///
+    /// // A vector of a zero-sized type will always over-allocate, since no
+    /// // allocation is necessary
+    /// let vec_units = Vec::<()>::with_capacity(10);
+    /// assert_eq!(vec_units.capacity(), usize::MAX);
     /// ```
     #[cfg(not(no_global_oom_handling))]
     #[inline]
@@ -473,17 +486,25 @@ impl<T> Vec<T> {
         Self::with_capacity_in(capacity, Global)
     }
 
-    /// Tries to construct a new, empty `Vec<T>` with the specified capacity.
+    /// Tries to construct a new, empty `Vec<T>` with at least the specified capacity.
     ///
-    /// The vector will be able to hold exactly `capacity` elements without
-    /// reallocating. If `capacity` is 0, the vector will not allocate.
+    /// The vector will be able to hold at least `capacity` elements without
+    /// reallocating. This method is allowed to allocate for more elements than
+    /// `capacity`. If `capacity` is 0, the vector will not allocate.
     ///
     /// It is important to note that although the returned vector has the
-    /// *capacity* specified, the vector will have a zero *length*. For an
-    /// explanation of the difference between length and capacity, see
+    /// minimum *capacity* specified, the vector will have a zero *length*. For
+    /// an explanation of the difference between length and capacity, see
     /// *[Capacity and reallocation]*.
     ///
+    /// If it is important to know the exact allocated capacity of a `Vec`,
+    /// always use the [`capacity`] method after construction.
+    ///
+    /// For `Vec<T>` where `T` is a zero-sized type, there will be no allocation
+    /// and the capacity will always be `usize::MAX`.
+    ///
     /// [Capacity and reallocation]: #capacity-and-reallocation
+    /// [`capacity`]: Vec::capacity
     ///
     /// # Examples
     ///
@@ -492,14 +513,14 @@ impl<T> Vec<T> {
     ///
     /// // The vector contains no items, even though it has capacity for more
     /// assert_eq!(vec.len(), 0);
-    /// assert_eq!(vec.capacity(), 10);
+    /// assert!(vec.capacity() >= 10);
     ///
     /// // These are all done without reallocating...
     /// for i in 0..10 {
     ///     vec.push(i);
     /// }
     /// assert_eq!(vec.len(), 10);
-    /// assert_eq!(vec.capacity(), 10);
+    /// assert!(vec.capacity() >= 10);
     ///
     /// // ...but this may make the vector reallocate
     /// vec.push(11);
@@ -508,6 +529,11 @@ impl<T> Vec<T> {
     ///
     /// let mut result = Vec::try_with_capacity(usize::MAX);
     /// assert!(result.is_err());
+    ///
+    /// // A vector of a zero-sized type will always over-allocate, since no
+    /// // allocation is necessary
+    /// let vec_units = Vec::<()>::try_with_capacity(10).unwrap();
+    /// assert_eq!(vec_units.capacity(), usize::MAX);
     /// ```
     #[inline]
     #[stable(feature = "kernel", since = "1.0.0")]
@@ -515,15 +541,15 @@ impl<T> Vec<T> {
         Self::try_with_capacity_in(capacity, Global)
     }
 
-    /// Creates a `Vec<T>` directly from the raw components of another vector.
+    /// Creates a `Vec<T>` directly from a pointer, a capacity, and a length.
     ///
     /// # Safety
     ///
     /// This is highly unsafe, due to the number of invariants that aren't
     /// checked:
     ///
-    /// * `ptr` needs to have been previously allocated via [`String`]/`Vec<T>`
-    ///   (at least, it's highly likely to be incorrect if it wasn't).
+    /// * `ptr` must have been allocated using the global allocator, such as via
+    ///   the [`alloc::alloc`] function.
     /// * `T` needs to have the same alignment as what `ptr` was allocated with.
     ///   (`T` having a less strict alignment is not sufficient, the alignment really
     ///   needs to be equal to satisfy the [`dealloc`] requirement that memory must be
@@ -532,6 +558,14 @@ impl<T> Vec<T> {
     ///   to be the same size as the pointer was allocated with. (Because similar to
     ///   alignment, [`dealloc`] must be called with the same layout `size`.)
     /// * `length` needs to be less than or equal to `capacity`.
+    /// * The first `length` values must be properly initialized values of type `T`.
+    /// * `capacity` needs to be the capacity that the pointer was allocated with.
+    /// * The allocated size in bytes must be no larger than `isize::MAX`.
+    ///   See the safety documentation of [`pointer::offset`].
+    ///
+    /// These requirements are always upheld by any `ptr` that has been allocated
+    /// via `Vec<T>`. Other allocation sources are allowed if the invariants are
+    /// upheld.
     ///
     /// Violating these may cause problems like corrupting the allocator's
     /// internal data structures. For example it is normally **not** safe
@@ -552,6 +586,7 @@ impl<T> Vec<T> {
     /// function.
     ///
     /// [`String`]: crate::string::String
+    /// [`alloc::alloc`]: crate::alloc::alloc
     /// [`dealloc`]: crate::alloc::GlobalAlloc::dealloc
     ///
     /// # Examples
@@ -574,8 +609,8 @@ impl<T> Vec<T> {
     ///
     /// unsafe {
     ///     // Overwrite memory with 4, 5, 6
-    ///     for i in 0..len as isize {
-    ///         ptr::write(p.offset(i), 4 + i);
+    ///     for i in 0..len {
+    ///         ptr::write(p.add(i), 4 + i);
     ///     }
     ///
     ///     // Put everything back together into a Vec
@@ -583,6 +618,32 @@ impl<T> Vec<T> {
     ///     assert_eq!(rebuilt, [4, 5, 6]);
     /// }
     /// ```
+    ///
+    /// Using memory that was allocated elsewhere:
+    ///
+    /// ```rust
+    /// #![feature(allocator_api)]
+    ///
+    /// use std::alloc::{AllocError, Allocator, Global, Layout};
+    ///
+    /// fn main() {
+    ///     let layout = Layout::array::<u32>(16).expect("overflow cannot happen");
+    ///
+    ///     let vec = unsafe {
+    ///         let mem = match Global.allocate(layout) {
+    ///             Ok(mem) => mem.cast::<u32>().as_ptr(),
+    ///             Err(AllocError) => return,
+    ///         };
+    ///
+    ///         mem.write(1_000_000);
+    ///
+    ///         Vec::from_raw_parts_in(mem, 1, 16, Global)
+    ///     };
+    ///
+    ///     assert_eq!(vec, &[1_000_000]);
+    ///     assert_eq!(vec.capacity(), 16);
+    /// }
+    /// ```
     #[inline]
     #[stable(feature = "rust1", since = "1.0.0")]
     pub unsafe fn from_raw_parts(ptr: *mut T, length: usize, capacity: usize) -> Self {
@@ -611,18 +672,26 @@ impl<T, A: Allocator> Vec<T, A> {
         Vec { buf: RawVec::new_in(alloc), len: 0 }
     }
 
-    /// Constructs a new, empty `Vec<T, A>` with the specified capacity with the provided
-    /// allocator.
+    /// Constructs a new, empty `Vec<T, A>` with at least the specified capacity
+    /// with the provided allocator.
     ///
-    /// The vector will be able to hold exactly `capacity` elements without
-    /// reallocating. If `capacity` is 0, the vector will not allocate.
+    /// The vector will be able to hold at least `capacity` elements without
+    /// reallocating. This method is allowed to allocate for more elements than
+    /// `capacity`. If `capacity` is 0, the vector will not allocate.
     ///
     /// It is important to note that although the returned vector has the
-    /// *capacity* specified, the vector will have a zero *length*. For an
-    /// explanation of the difference between length and capacity, see
+    /// minimum *capacity* specified, the vector will have a zero *length*. For
+    /// an explanation of the difference between length and capacity, see
     /// *[Capacity and reallocation]*.
     ///
+    /// If it is important to know the exact allocated capacity of a `Vec`,
+    /// always use the [`capacity`] method after construction.
+    ///
+    /// For `Vec<T, A>` where `T` is a zero-sized type, there will be no allocation
+    /// and the capacity will always be `usize::MAX`.
+    ///
     /// [Capacity and reallocation]: #capacity-and-reallocation
+    /// [`capacity`]: Vec::capacity
     ///
     /// # Panics
     ///
@@ -652,6 +721,11 @@ impl<T, A: Allocator> Vec<T, A> {
     /// vec.push(11);
     /// assert_eq!(vec.len(), 11);
     /// assert!(vec.capacity() >= 11);
+    ///
+    /// // A vector of a zero-sized type will always over-allocate, since no
+    /// // allocation is necessary
+    /// let vec_units = Vec::<(), System>::with_capacity_in(10, System);
+    /// assert_eq!(vec_units.capacity(), usize::MAX);
     /// ```
     #[cfg(not(no_global_oom_handling))]
     #[inline]
@@ -660,18 +734,26 @@ impl<T, A: Allocator> Vec<T, A> {
         Vec { buf: RawVec::with_capacity_in(capacity, alloc), len: 0 }
     }
 
-    /// Tries to construct a new, empty `Vec<T, A>` with the specified capacity
+    /// Tries to construct a new, empty `Vec<T, A>` with at least the specified capacity
     /// with the provided allocator.
     ///
-    /// The vector will be able to hold exactly `capacity` elements without
-    /// reallocating. If `capacity` is 0, the vector will not allocate.
+    /// The vector will be able to hold at least `capacity` elements without
+    /// reallocating. This method is allowed to allocate for more elements than
+    /// `capacity`. If `capacity` is 0, the vector will not allocate.
     ///
     /// It is important to note that although the returned vector has the
-    /// *capacity* specified, the vector will have a zero *length*. For an
-    /// explanation of the difference between length and capacity, see
+    /// minimum *capacity* specified, the vector will have a zero *length*. For
+    /// an explanation of the difference between length and capacity, see
     /// *[Capacity and reallocation]*.
     ///
+    /// If it is important to know the exact allocated capacity of a `Vec`,
+    /// always use the [`capacity`] method after construction.
+    ///
+    /// For `Vec<T, A>` where `T` is a zero-sized type, there will be no allocation
+    /// and the capacity will always be `usize::MAX`.
+    ///
     /// [Capacity and reallocation]: #capacity-and-reallocation
+    /// [`capacity`]: Vec::capacity
     ///
     /// # Examples
     ///
@@ -700,6 +782,11 @@ impl<T, A: Allocator> Vec<T, A> {
     ///
     /// let mut result = Vec::try_with_capacity_in(usize::MAX, System);
     /// assert!(result.is_err());
+    ///
+    /// // A vector of a zero-sized type will always over-allocate, since no
+    /// // allocation is necessary
+    /// let vec_units = Vec::<(), System>::try_with_capacity_in(10, System).unwrap();
+    /// assert_eq!(vec_units.capacity(), usize::MAX);
     /// ```
     #[inline]
     #[stable(feature = "kernel", since = "1.0.0")]
@@ -707,21 +794,31 @@ impl<T, A: Allocator> Vec<T, A> {
         Ok(Vec { buf: RawVec::try_with_capacity_in(capacity, alloc)?, len: 0 })
     }
 
-    /// Creates a `Vec<T, A>` directly from the raw components of another vector.
+    /// Creates a `Vec<T, A>` directly from a pointer, a capacity, a length,
+    /// and an allocator.
     ///
     /// # Safety
     ///
     /// This is highly unsafe, due to the number of invariants that aren't
     /// checked:
     ///
-    /// * `ptr` needs to have been previously allocated via [`String`]/`Vec<T>`
-    ///   (at least, it's highly likely to be incorrect if it wasn't).
-    /// * `T` needs to have the same size and alignment as what `ptr` was allocated with.
+    /// * `ptr` must be [*currently allocated*] via the given allocator `alloc`.
+    /// * `T` needs to have the same alignment as what `ptr` was allocated with.
     ///   (`T` having a less strict alignment is not sufficient, the alignment really
     ///   needs to be equal to satisfy the [`dealloc`] requirement that memory must be
     ///   allocated and deallocated with the same layout.)
+    /// * The size of `T` times the `capacity` (ie. the allocated size in bytes) needs
+    ///   to be the same size as the pointer was allocated with. (Because similar to
+    ///   alignment, [`dealloc`] must be called with the same layout `size`.)
     /// * `length` needs to be less than or equal to `capacity`.
-    /// * `capacity` needs to be the capacity that the pointer was allocated with.
+    /// * The first `length` values must be properly initialized values of type `T`.
+    /// * `capacity` needs to [*fit*] the layout size that the pointer was allocated with.
+    /// * The allocated size in bytes must be no larger than `isize::MAX`.
+    ///   See the safety documentation of [`pointer::offset`].
+    ///
+    /// These requirements are always upheld by any `ptr` that has been allocated
+    /// via `Vec<T, A>`. Other allocation sources are allowed if the invariants are
+    /// upheld.
     ///
     /// Violating these may cause problems like corrupting the allocator's
     /// internal data structures. For example it is **not** safe
@@ -739,6 +836,8 @@ impl<T, A: Allocator> Vec<T, A> {
     ///
     /// [`String`]: crate::string::String
     /// [`dealloc`]: crate::alloc::GlobalAlloc::dealloc
+    /// [*currently allocated*]: crate::alloc::Allocator#currently-allocated-memory
+    /// [*fit*]: crate::alloc::Allocator#memory-fitting
     ///
     /// # Examples
     ///
@@ -768,8 +867,8 @@ impl<T, A: Allocator> Vec<T, A> {
     ///
     /// unsafe {
     ///     // Overwrite memory with 4, 5, 6
-    ///     for i in 0..len as isize {
-    ///         ptr::write(p.offset(i), 4 + i);
+    ///     for i in 0..len {
+    ///         ptr::write(p.add(i), 4 + i);
     ///     }
     ///
     ///     // Put everything back together into a Vec
@@ -777,6 +876,29 @@ impl<T, A: Allocator> Vec<T, A> {
     ///     assert_eq!(rebuilt, [4, 5, 6]);
     /// }
     /// ```
+    ///
+    /// Using memory that was allocated elsewhere:
+    ///
+    /// ```rust
+    /// use std::alloc::{alloc, Layout};
+    ///
+    /// fn main() {
+    ///     let layout = Layout::array::<u32>(16).expect("overflow cannot happen");
+    ///     let vec = unsafe {
+    ///         let mem = alloc(layout).cast::<u32>();
+    ///         if mem.is_null() {
+    ///             return;
+    ///         }
+    ///
+    ///         mem.write(1_000_000);
+    ///
+    ///         Vec::from_raw_parts(mem, 1, 16)
+    ///     };
+    ///
+    ///     assert_eq!(vec, &[1_000_000]);
+    ///     assert_eq!(vec.capacity(), 16);
+    /// }
+    /// ```
     #[inline]
     #[unstable(feature = "allocator_api", issue = "32838")]
     pub unsafe fn from_raw_parts_in(ptr: *mut T, length: usize, capacity: usize, alloc: A) -> Self {
@@ -869,13 +991,14 @@ impl<T, A: Allocator> Vec<T, A> {
         (ptr, len, capacity, alloc)
     }
 
-    /// Returns the number of elements the vector can hold without
+    /// Returns the total number of elements the vector can hold without
     /// reallocating.
     ///
     /// # Examples
     ///
     /// ```
-    /// let vec: Vec<i32> = Vec::with_capacity(10);
+    /// let mut vec: Vec<i32> = Vec::with_capacity(10);
+    /// vec.push(42);
     /// assert_eq!(vec.capacity(), 10);
     /// ```
     #[inline]
@@ -885,10 +1008,10 @@ impl<T, A: Allocator> Vec<T, A> {
     }
 
     /// Reserves capacity for at least `additional` more elements to be inserted
-    /// in the given `Vec<T>`. The collection may reserve more space to avoid
-    /// frequent reallocations. After calling `reserve`, capacity will be
-    /// greater than or equal to `self.len() + additional`. Does nothing if
-    /// capacity is already sufficient.
+    /// in the given `Vec<T>`. The collection may reserve more space to
+    /// speculatively avoid frequent reallocations. After calling `reserve`,
+    /// capacity will be greater than or equal to `self.len() + additional`.
+    /// Does nothing if capacity is already sufficient.
     ///
     /// # Panics
     ///
@@ -907,10 +1030,12 @@ impl<T, A: Allocator> Vec<T, A> {
         self.buf.reserve(self.len, additional);
     }
 
-    /// Reserves the minimum capacity for exactly `additional` more elements to
-    /// be inserted in the given `Vec<T>`. After calling `reserve_exact`,
-    /// capacity will be greater than or equal to `self.len() + additional`.
-    /// Does nothing if the capacity is already sufficient.
+    /// Reserves the minimum capacity for at least `additional` more elements to
+    /// be inserted in the given `Vec<T>`. Unlike [`reserve`], this will not
+    /// deliberately over-allocate to speculatively avoid frequent allocations.
+    /// After calling `reserve_exact`, capacity will be greater than or equal to
+    /// `self.len() + additional`. Does nothing if the capacity is already
+    /// sufficient.
     ///
     /// Note that the allocator may give the collection more space than it
     /// requests. Therefore, capacity can not be relied upon to be precisely
@@ -936,10 +1061,11 @@ impl<T, A: Allocator> Vec<T, A> {
     }
 
     /// Tries to reserve capacity for at least `additional` more elements to be inserted
-    /// in the given `Vec<T>`. The collection may reserve more space to avoid
+    /// in the given `Vec<T>`. The collection may reserve more space to speculatively avoid
     /// frequent reallocations. After calling `try_reserve`, capacity will be
-    /// greater than or equal to `self.len() + additional`. Does nothing if
-    /// capacity is already sufficient.
+    /// greater than or equal to `self.len() + additional` if it returns
+    /// `Ok(())`. Does nothing if capacity is already sufficient. This method
+    /// preserves the contents even if an error occurs.
     ///
     /// # Errors
     ///
@@ -971,10 +1097,11 @@ impl<T, A: Allocator> Vec<T, A> {
         self.buf.try_reserve(self.len, additional)
     }
 
-    /// Tries to reserve the minimum capacity for exactly `additional`
-    /// elements to be inserted in the given `Vec<T>`. After calling
-    /// `try_reserve_exact`, capacity will be greater than or equal to
-    /// `self.len() + additional` if it returns `Ok(())`.
+    /// Tries to reserve the minimum capacity for at least `additional`
+    /// elements to be inserted in the given `Vec<T>`. Unlike [`try_reserve`],
+    /// this will not deliberately over-allocate to speculatively avoid frequent
+    /// allocations. After calling `try_reserve_exact`, capacity will be greater
+    /// than or equal to `self.len() + additional` if it returns `Ok(())`.
     /// Does nothing if the capacity is already sufficient.
     ///
     /// Note that the allocator may give the collection more space than it
@@ -1066,7 +1193,8 @@ impl<T, A: Allocator> Vec<T, A> {
 
     /// Converts the vector into [`Box<[T]>`][owned slice].
     ///
-    /// Note that this will drop any excess capacity.
+    /// If the vector has excess capacity, its items will be moved into a
+    /// newly-allocated buffer with exactly the right capacity.
     ///
     /// [owned slice]: Box
     ///
@@ -1199,7 +1327,8 @@ impl<T, A: Allocator> Vec<T, A> {
         self
     }
 
-    /// Returns a raw pointer to the vector's buffer.
+    /// Returns a raw pointer to the vector's buffer, or a dangling raw pointer
+    /// valid for zero sized reads if the vector didn't allocate.
     ///
     /// The caller must ensure that the vector outlives the pointer this
     /// function returns, or else it will end up pointing to garbage.
@@ -1236,7 +1365,8 @@ impl<T, A: Allocator> Vec<T, A> {
         ptr
     }
 
-    /// Returns an unsafe mutable pointer to the vector's buffer.
+    /// Returns an unsafe mutable pointer to the vector's buffer, or a dangling
+    /// raw pointer valid for zero sized reads if the vector didn't allocate.
     ///
     /// The caller must ensure that the vector outlives the pointer this
     /// function returns, or else it will end up pointing to garbage.
@@ -1440,9 +1570,6 @@ impl<T, A: Allocator> Vec<T, A> {
         }
 
         let len = self.len();
-        if index > len {
-            assert_failed(index, len);
-        }
 
         // space for the new element
         if len == self.buf.capacity() {
@@ -1454,9 +1581,15 @@ impl<T, A: Allocator> Vec<T, A> {
             // The spot to put the new value
             {
                 let p = self.as_mut_ptr().add(index);
-                // Shift everything over to make space. (Duplicating the
-                // `index`th element into two consecutive places.)
-                ptr::copy(p, p.offset(1), len - index);
+                if index < len {
+                    // Shift everything over to make space. (Duplicating the
+                    // `index`th element into two consecutive places.)
+                    ptr::copy(p, p.add(1), len - index);
+                } else if index == len {
+                    // No elements need shifting.
+                } else {
+                    assert_failed(index, len);
+                }
                 // Write it in, overwriting the first copy of the `index`th
                 // element.
                 ptr::write(p, element);
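
The hunk above folds the `index > len` bounds check into the shift logic so the `index == len` case is a plain append. A minimal sketch of the observable behaviour (standard `Vec::insert`, unchanged by this rework):

```rust
fn main() {
    let mut v = vec![1, 2, 4];
    v.insert(2, 3);      // shifts `4` one slot to the right
    v.insert(4, 5);      // index == len: nothing to shift, plain append
    assert_eq!(v, [1, 2, 3, 4, 5]);
    // v.insert(7, 9);   // index > len would panic via `assert_failed`
}
```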
@@ -1513,7 +1646,7 @@ impl<T, A: Allocator> Vec<T, A> {
                 ret = ptr::read(ptr);
 
                 // Shift everything down to fill in that spot.
-                ptr::copy(ptr.offset(1), ptr, len - index - 1);
+                ptr::copy(ptr.add(1), ptr, len - index - 1);
             }
             self.set_len(len - 1);
             ret
@@ -1562,11 +1695,11 @@ impl<T, A: Allocator> Vec<T, A> {
     ///
     /// ```
     /// let mut vec = vec![1, 2, 3, 4];
-    /// vec.retain_mut(|x| if *x > 3 {
-    ///     false
-    /// } else {
+    /// vec.retain_mut(|x| if *x <= 3 {
     ///     *x += 1;
     ///     true
+    /// } else {
+    ///     false
     /// });
     /// assert_eq!(vec, [2, 3, 4]);
     /// ```
@@ -1854,6 +1987,51 @@ impl<T, A: Allocator> Vec<T, A> {
         Ok(())
     }
 
+    /// Appends an element if there is sufficient spare capacity, otherwise an error is returned
+    /// with the element.
+    ///
+    /// Unlike [`push`] this method will not reallocate when there's insufficient capacity.
+    /// The caller should use [`reserve`] or [`try_reserve`] to ensure that there is enough capacity.
+    ///
+    /// [`push`]: Vec::push
+    /// [`reserve`]: Vec::reserve
+    /// [`try_reserve`]: Vec::try_reserve
+    ///
+    /// # Examples
+    ///
+    /// A manual, panic-free alternative to [`FromIterator`]:
+    ///
+    /// ```
+    /// #![feature(vec_push_within_capacity)]
+    ///
+    /// use std::collections::TryReserveError;
+    /// fn from_iter_fallible<T>(iter: impl Iterator<Item=T>) -> Result<Vec<T>, TryReserveError> {
+    ///     let mut vec = Vec::new();
+    ///     for value in iter {
+    ///         if let Err(value) = vec.push_within_capacity(value) {
+    ///             vec.try_reserve(1)?;
+    ///             // this cannot fail, the previous line either returned or added at least 1 free slot
+    ///             let _ = vec.push_within_capacity(value);
+    ///         }
+    ///     }
+    ///     Ok(vec)
+    /// }
+    /// assert_eq!(from_iter_fallible(0..100), Ok(Vec::from_iter(0..100)));
+    /// ```
+    #[inline]
+    #[unstable(feature = "vec_push_within_capacity", issue = "100486")]
+    pub fn push_within_capacity(&mut self, value: T) -> Result<(), T> {
+        if self.len == self.buf.capacity() {
+            return Err(value);
+        }
+        unsafe {
+            let end = self.as_mut_ptr().add(self.len);
+            ptr::write(end, value);
+            self.len += 1;
+        }
+        Ok(())
+    }
+
     /// Removes the last element from a vector and returns it, or [`None`] if it
     /// is empty.
     ///
@@ -1886,7 +2064,7 @@ impl<T, A: Allocator> Vec<T, A> {
     ///
     /// # Panics
     ///
-    /// Panics if the number of elements in the vector overflows a `usize`.
+    /// Panics if the new capacity exceeds `isize::MAX` bytes.
     ///
     /// # Examples
     ///
@@ -1980,9 +2158,7 @@ impl<T, A: Allocator> Vec<T, A> {
         unsafe {
             // set self.vec length's to start, to be safe in case Drain is leaked
             self.set_len(start);
-            // Use the borrow in the IterMut to indicate borrowing behavior of the
-            // whole Drain iterator (like &mut T).
-            let range_slice = slice::from_raw_parts_mut(self.as_mut_ptr().add(start), end - start);
+            let range_slice = slice::from_raw_parts(self.as_ptr().add(start), end - start);
             Drain {
                 tail_start: end,
                 tail_len: len - end,
@@ -2145,7 +2321,7 @@ impl<T, A: Allocator> Vec<T, A> {
     {
         let len = self.len();
         if new_len > len {
-            self.extend_with(new_len - len, ExtendFunc(f));
+            self.extend_trusted(iter::repeat_with(f).take(new_len - len));
         } else {
             self.truncate(new_len);
         }
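
The `resize_with` change above only swaps the internal growth path to `extend_trusted(iter::repeat_with(f).take(n))`; the public behaviour is unchanged. A small sketch:

```rust
fn main() {
    let mut v = vec![1, 2, 3];
    let mut next = 3;
    // Growing: the closure is called once per missing element, in order.
    v.resize_with(5, || { next += 1; next });
    assert_eq!(v, [1, 2, 3, 4, 5]);

    // Shrinking: the closure is not called, the vector is truncated.
    v.resize_with(2, Default::default);
    assert_eq!(v, [1, 2]);
}
```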
@@ -2174,7 +2350,6 @@ impl<T, A: Allocator> Vec<T, A> {
     /// static_ref[0] += 1;
     /// assert_eq!(static_ref, &[2, 2, 3]);
     /// ```
-    #[cfg(not(no_global_oom_handling))]
     #[stable(feature = "vec_leak", since = "1.47.0")]
     #[inline]
     pub fn leak<'a>(self) -> &'a mut [T]
@@ -2469,7 +2644,7 @@ impl<T: Clone, A: Allocator> Vec<T, A> {
         self.reserve(range.len());
 
         // SAFETY:
-        // - `slice::range` guarantees  that the given range is valid for indexing self
+        // - `slice::range` guarantees that the given range is valid for indexing self
         unsafe {
             self.spec_extend_from_within(range);
         }
@@ -2501,7 +2676,7 @@ impl<T, A: Allocator, const N: usize> Vec<[T; N], A> {
     #[unstable(feature = "slice_flatten", issue = "95629")]
     pub fn into_flattened(self) -> Vec<T, A> {
         let (ptr, len, cap, alloc) = self.into_raw_parts_with_alloc();
-        let (new_len, new_cap) = if mem::size_of::<T>() == 0 {
+        let (new_len, new_cap) = if T::IS_ZST {
             (len.checked_mul(N).expect("vec len overflow"), usize::MAX)
         } else {
             // SAFETY:
@@ -2537,16 +2712,6 @@ impl<T: Clone> ExtendWith<T> for ExtendElement<T> {
     }
 }
 
-struct ExtendFunc<F>(F);
-impl<T, F: FnMut() -> T> ExtendWith<T> for ExtendFunc<F> {
-    fn next(&mut self) -> T {
-        (self.0)()
-    }
-    fn last(mut self) -> T {
-        (self.0)()
-    }
-}
-
 impl<T, A: Allocator> Vec<T, A> {
     #[cfg(not(no_global_oom_handling))]
     /// Extend the vector by `n` values, using the given generator.
@@ -2563,7 +2728,7 @@ impl<T, A: Allocator> Vec<T, A> {
             // Write all elements except the last one
             for _ in 1..n {
                 ptr::write(ptr, value.next());
-                ptr = ptr.offset(1);
+                ptr = ptr.add(1);
                 // Increment the length in every step in case next() panics
                 local_len.increment_len(1);
             }
@@ -2592,7 +2757,7 @@ impl<T, A: Allocator> Vec<T, A> {
             // Write all elements except the last one
             for _ in 1..n {
                 ptr::write(ptr, value.next());
-                ptr = ptr.offset(1);
+                ptr = ptr.add(1);
                 // Increment the length in every step in case next() panics
                 local_len.increment_len(1);
             }
@@ -2664,7 +2829,7 @@ impl<T: Clone, A: Allocator> ExtendFromWithinSpec for Vec<T, A> {
         let (this, spare, len) = unsafe { self.split_at_spare_mut_with_len() };
 
         // SAFETY:
-        // - caller guaratees that src is a valid index
+        // - caller guarantees that src is a valid index
         let to_clone = unsafe { this.get_unchecked(src) };
 
         iter::zip(to_clone, spare)
@@ -2683,13 +2848,13 @@ impl<T: Copy, A: Allocator> ExtendFromWithinSpec for Vec<T, A> {
             let (init, spare) = self.split_at_spare_mut();
 
             // SAFETY:
-            // - caller guaratees that `src` is a valid index
+            // - caller guarantees that `src` is a valid index
             let source = unsafe { init.get_unchecked(src) };
 
             // SAFETY:
             // - Both pointers are created from unique slice references (`&mut [_]`)
             //   so they are valid and do not overlap.
-            // - Elements are :Copy so it's OK to to copy them, without doing
+            // - Elements are :Copy so it's OK to copy them, without doing
             //   anything with the original values
             // - `count` is equal to the len of `source`, so source is valid for
             //   `count` reads
@@ -2712,6 +2877,7 @@ impl<T: Copy, A: Allocator> ExtendFromWithinSpec for Vec<T, A> {
 impl<T, A: Allocator> ops::Deref for Vec<T, A> {
     type Target = [T];
 
+    #[inline]
     fn deref(&self) -> &[T] {
         unsafe { slice::from_raw_parts(self.as_ptr(), self.len) }
     }
@@ -2719,6 +2885,7 @@ impl<T, A: Allocator> ops::Deref for Vec<T, A> {
 
 #[stable(feature = "rust1", since = "1.0.0")]
 impl<T, A: Allocator> ops::DerefMut for Vec<T, A> {
+    #[inline]
     fn deref_mut(&mut self) -> &mut [T] {
         unsafe { slice::from_raw_parts_mut(self.as_mut_ptr(), self.len) }
     }
@@ -2764,7 +2931,7 @@ impl<T: Clone, A: Allocator + Clone> Clone for Vec<T, A> {
 
     // HACK(japaric): with cfg(test) the inherent `[T]::to_vec` method, which is
     // required for this method definition, is not available. Instead use the
-    // `slice::to_vec`  function which is only available with cfg(test)
+    // `slice::to_vec` function which is only available with cfg(test)
     // NB see the slice::hack module in slice.rs for more information
     #[cfg(test)]
     fn clone(&self) -> Self {
@@ -2845,19 +3012,22 @@ impl<T, A: Allocator> IntoIterator for Vec<T, A> {
     ///
     /// ```
     /// let v = vec!["a".to_string(), "b".to_string()];
-    /// for s in v.into_iter() {
-    ///     // s has type String, not &String
-    ///     println!("{s}");
-    /// }
+    /// let mut v_iter = v.into_iter();
+    ///
+    /// let first_element: Option<String> = v_iter.next();
+    ///
+    /// assert_eq!(first_element, Some("a".to_string()));
+    /// assert_eq!(v_iter.next(), Some("b".to_string()));
+    /// assert_eq!(v_iter.next(), None);
     /// ```
     #[inline]
-    fn into_iter(self) -> IntoIter<T, A> {
+    fn into_iter(self) -> Self::IntoIter {
         unsafe {
             let mut me = ManuallyDrop::new(self);
             let alloc = ManuallyDrop::new(ptr::read(me.allocator()));
             let begin = me.as_mut_ptr();
-            let end = if mem::size_of::<T>() == 0 {
-                arith_offset(begin as *const i8, me.len() as isize) as *const T
+            let end = if T::IS_ZST {
+                begin.wrapping_byte_add(me.len())
             } else {
                 begin.add(me.len()) as *const T
             };
@@ -2879,7 +3049,7 @@ impl<'a, T, A: Allocator> IntoIterator for &'a Vec<T, A> {
     type Item = &'a T;
     type IntoIter = slice::Iter<'a, T>;
 
-    fn into_iter(self) -> slice::Iter<'a, T> {
+    fn into_iter(self) -> Self::IntoIter {
         self.iter()
     }
 }
@@ -2889,7 +3059,7 @@ impl<'a, T, A: Allocator> IntoIterator for &'a mut Vec<T, A> {
     type Item = &'a mut T;
     type IntoIter = slice::IterMut<'a, T>;
 
-    fn into_iter(self) -> slice::IterMut<'a, T> {
+    fn into_iter(self) -> Self::IntoIter {
         self.iter_mut()
     }
 }
@@ -2969,6 +3139,69 @@ impl<T, A: Allocator> Vec<T, A> {
         Ok(())
     }
 
+    // specific extend for `TrustedLen` iterators, called both by the specializations
+    // and internal places where resolving specialization makes compilation slower
+    #[cfg(not(no_global_oom_handling))]
+    fn extend_trusted(&mut self, iterator: impl iter::TrustedLen<Item = T>) {
+        let (low, high) = iterator.size_hint();
+        if let Some(additional) = high {
+            debug_assert_eq!(
+                low,
+                additional,
+                "TrustedLen iterator's size hint is not exact: {:?}",
+                (low, high)
+            );
+            self.reserve(additional);
+            unsafe {
+                let ptr = self.as_mut_ptr();
+                let mut local_len = SetLenOnDrop::new(&mut self.len);
+                iterator.for_each(move |element| {
+                    ptr::write(ptr.add(local_len.current_len()), element);
+                    // Since the loop executes user code which can panic we have to update
+                    // the length every step to correctly drop what we've written.
+                    // NB can't overflow since we would have had to alloc the address space
+                    local_len.increment_len(1);
+                });
+            }
+        } else {
+            // Per TrustedLen contract a `None` upper bound means that the iterator length
+            // truly exceeds usize::MAX, which would eventually lead to a capacity overflow anyway.
+            // Since the other branch already panics eagerly (via `reserve()`) we do the same here.
+            // This avoids additional codegen for a fallback code path which would eventually
+            // panic anyway.
+            panic!("capacity overflow");
+        }
+    }
+
+    // specific extend for `TrustedLen` iterators, called both by the specializations
+    // and internal places where resolving specialization makes compilation slower
+    fn try_extend_trusted(&mut self, iterator: impl iter::TrustedLen<Item = T>) -> Result<(), TryReserveError> {
+        let (low, high) = iterator.size_hint();
+        if let Some(additional) = high {
+            debug_assert_eq!(
+                low,
+                additional,
+                "TrustedLen iterator's size hint is not exact: {:?}",
+                (low, high)
+            );
+            self.try_reserve(additional)?;
+            unsafe {
+                let ptr = self.as_mut_ptr();
+                let mut local_len = SetLenOnDrop::new(&mut self.len);
+                iterator.for_each(move |element| {
+                    ptr::write(ptr.add(local_len.current_len()), element);
+                    // Since the loop executes user code which can panic we have to update
+                    // the length every step to correctly drop what we've written.
+                    // NB can't overflow since we would have had to alloc the address space
+                    local_len.increment_len(1);
+                });
+            }
+            Ok(())
+        } else {
+            Err(TryReserveErrorKind::CapacityOverflow.into())
+        }
+    }
+
     /// Creates a splicing iterator that replaces the specified range in the vector
     /// with the given `replace_with` iterator and yields the removed items.
     /// `replace_with` does not need to be the same length as `range`.
@@ -3135,6 +3368,8 @@ unsafe impl<#[may_dangle] T, A: Allocator> Drop for Vec<T, A> {
 #[rustc_const_unstable(feature = "const_default_impls", issue = "87864")]
 impl<T> const Default for Vec<T> {
     /// Creates an empty `Vec<T>`.
+    ///
+    /// The vector will not allocate until elements are pushed onto it.
     fn default() -> Vec<T> {
         Vec::new()
     }
@@ -3227,12 +3462,15 @@ impl<T, const N: usize> From<[T; N]> for Vec<T> {
     /// ```
     #[cfg(not(test))]
     fn from(s: [T; N]) -> Vec<T> {
-        <[T]>::into_vec(box s)
+        <[T]>::into_vec(
+            #[rustc_box]
+            Box::new(s),
+        )
     }
 
     #[cfg(test)]
     fn from(s: [T; N]) -> Vec<T> {
-        crate::slice::into_vec(box s)
+        crate::slice::into_vec(Box::new(s))
     }
 }
 
@@ -3261,7 +3499,7 @@ where
     }
 }
 
-// note: test pulls in libstd, which causes errors here
+// note: test pulls in std, which causes errors here
 #[cfg(not(test))]
 #[stable(feature = "vec_from_box", since = "1.18.0")]
 impl<T, A: Allocator> From<Box<[T], A>> for Vec<T, A> {
@@ -3279,7 +3517,7 @@ impl<T, A: Allocator> From<Box<[T], A>> for Vec<T, A> {
     }
 }
 
-// note: test pulls in libstd, which causes errors here
+// note: test pulls in std, which causes errors here
 #[cfg(not(no_global_oom_handling))]
 #[cfg(not(test))]
 #[stable(feature = "box_from_vec", since = "1.20.0")]
@@ -3294,6 +3532,14 @@ impl<T, A: Allocator> From<Vec<T, A>> for Box<[T], A> {
     /// ```
     /// assert_eq!(Box::from(vec![1, 2, 3]), vec![1, 2, 3].into_boxed_slice());
     /// ```
+    ///
+    /// Any excess capacity is removed:
+    /// ```
+    /// let mut vec = Vec::with_capacity(10);
+    /// vec.extend([1, 2, 3]);
+    ///
+    /// assert_eq!(Box::from(vec), vec![1, 2, 3].into_boxed_slice());
+    /// ```
     fn from(v: Vec<T, A>) -> Self {
         v.into_boxed_slice()
     }
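
Several hunks earlier in this file reword the `reserve`/`reserve_exact`/`try_reserve` docs around speculative over-allocation versus exact requests. A short sketch of the contrast; note that anything beyond the guaranteed minimum is allocator-dependent, so only the lower bound is asserted:

```rust
fn main() {
    let mut a: Vec<u8> = Vec::new();
    a.reserve(10);
    assert!(a.capacity() >= 10); // may speculatively over-allocate for future growth

    let mut b: Vec<u8> = Vec::new();
    b.reserve_exact(10);
    assert!(b.capacity() >= 10); // asks for exactly 10; the allocator may still round up

    let mut c: Vec<u8> = Vec::new();
    assert!(c.try_reserve(10).is_ok()); // fallible variant: reports failure instead of aborting
    assert!(c.capacity() >= 10);
}
```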
index 448bf50..d3c7297 100644 (file)
@@ -20,6 +20,11 @@ impl<'a> SetLenOnDrop<'a> {
     pub(super) fn increment_len(&mut self, increment: usize) {
         self.local_len += increment;
     }
+
+    #[inline]
+    pub(super) fn current_len(&self) -> usize {
+        self.local_len
+    }
 }
 
 impl Drop for SetLenOnDrop<'_> {
index 5ce2d00..a6a7352 100644 (file)
@@ -1,12 +1,11 @@
 // SPDX-License-Identifier: Apache-2.0 OR MIT
 
 use crate::alloc::Allocator;
-use crate::collections::{TryReserveError, TryReserveErrorKind};
+use crate::collections::TryReserveError;
 use core::iter::TrustedLen;
-use core::ptr::{self};
 use core::slice::{self};
 
-use super::{IntoIter, SetLenOnDrop, Vec};
+use super::{IntoIter, Vec};
 
 // Specialization trait used for Vec::extend
 #[cfg(not(no_global_oom_handling))]
@@ -44,36 +43,7 @@ where
     I: TrustedLen<Item = T>,
 {
     default fn spec_extend(&mut self, iterator: I) {
-        // This is the case for a TrustedLen iterator.
-        let (low, high) = iterator.size_hint();
-        if let Some(additional) = high {
-            debug_assert_eq!(
-                low,
-                additional,
-                "TrustedLen iterator's size hint is not exact: {:?}",
-                (low, high)
-            );
-            self.reserve(additional);
-            unsafe {
-                let mut ptr = self.as_mut_ptr().add(self.len());
-                let mut local_len = SetLenOnDrop::new(&mut self.len);
-                iterator.for_each(move |element| {
-                    ptr::write(ptr, element);
-                    ptr = ptr.offset(1);
-                    // Since the loop executes user code which can panic we have to bump the pointer
-                    // after each step.
-                    // NB can't overflow since we would have had to alloc the address space
-                    local_len.increment_len(1);
-                });
-            }
-        } else {
-            // Per TrustedLen contract a `None` upper bound means that the iterator length
-            // truly exceeds usize::MAX, which would eventually lead to a capacity overflow anyway.
-            // Since the other branch already panics eagerly (via `reserve()`) we do the same here.
-            // This avoids additional codegen for a fallback code path which would eventually
-            // panic anyway.
-            panic!("capacity overflow");
-        }
+        self.extend_trusted(iterator)
     }
 }
 
@@ -82,32 +52,7 @@ where
     I: TrustedLen<Item = T>,
 {
     default fn try_spec_extend(&mut self, iterator: I) -> Result<(), TryReserveError> {
-        // This is the case for a TrustedLen iterator.
-        let (low, high) = iterator.size_hint();
-        if let Some(additional) = high {
-            debug_assert_eq!(
-                low,
-                additional,
-                "TrustedLen iterator's size hint is not exact: {:?}",
-                (low, high)
-            );
-            self.try_reserve(additional)?;
-            unsafe {
-                let mut ptr = self.as_mut_ptr().add(self.len());
-                let mut local_len = SetLenOnDrop::new(&mut self.len);
-                iterator.for_each(move |element| {
-                    ptr::write(ptr, element);
-                    ptr = ptr.offset(1);
-                    // Since the loop executes user code which can panic we have to bump the pointer
-                    // after each step.
-                    // NB can't overflow since we would have had to alloc the address space
-                    local_len.increment_len(1);
-                });
-            }
-            Ok(())
-        } else {
-            Err(TryReserveErrorKind::CapacityOverflow.into())
-        }
+        self.try_extend_trusted(iterator)
     }
 }
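
Both specializations now simply delegate to the shared `extend_trusted`/`try_extend_trusted` helpers added in `mod.rs` above. The user-visible entry points are unchanged; for example, extending from a range or an array iterator should still take the `TrustedLen` fast path, reserving once up front and then writing elements in place:

```rust
fn main() {
    let mut v = vec![0u32];
    v.extend(1..5);          // `Range` is `TrustedLen`: one reserve, then raw writes
    v.extend([5, 6, 7]);     // array `IntoIter` is `TrustedLen` as well
    assert_eq!(v, [0, 1, 2, 3, 4, 5, 6, 7]);
}
```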
 
index 50e7a76..3e601ce 100644 (file)
@@ -6,6 +6,7 @@
  * Sorted alphabetically.
  */
 
+#include <linux/errname.h>
 #include <linux/slab.h>
 #include <linux/refcount.h>
 #include <linux/wait.h>
index 7b24645..9bcbea0 100644 (file)
@@ -9,7 +9,6 @@
 //! using this crate.
 
 #![no_std]
-#![feature(core_ffi_c)]
 // See <https://github.com/rust-lang/rust-bindgen/issues/1651>.
 #![cfg_attr(test, allow(deref_nullptr))]
 #![cfg_attr(test, allow(unaligned_references))]
index 81e8026..bb594da 100644 (file)
@@ -21,6 +21,7 @@
 #include <linux/bug.h>
 #include <linux/build_bug.h>
 #include <linux/err.h>
+#include <linux/errname.h>
 #include <linux/refcount.h>
 #include <linux/mutex.h>
 #include <linux/spinlock.h>
@@ -110,6 +111,12 @@ long rust_helper_PTR_ERR(__force const void *ptr)
 }
 EXPORT_SYMBOL_GPL(rust_helper_PTR_ERR);
 
+const char *rust_helper_errname(int err)
+{
+       return errname(err);
+}
+EXPORT_SYMBOL_GPL(rust_helper_errname);
+
 struct task_struct *rust_helper_get_current(void)
 {
        return current;
index 6595423..9e37120 100644 (file)
@@ -67,6 +67,8 @@ macro_rules! build_error {
 ///     assert!(n > 1); // Run-time check
 /// }
 /// ```
+///
+/// [`static_assert!`]: crate::static_assert!
 #[macro_export]
 macro_rules! build_assert {
     ($cond:expr $(,)?) => {{
index 5f4114b..05fcab6 100644 (file)
@@ -4,16 +4,20 @@
 //!
 //! C header: [`include/uapi/asm-generic/errno-base.h`](../../../include/uapi/asm-generic/errno-base.h)
 
+use crate::str::CStr;
+
 use alloc::{
     alloc::{AllocError, LayoutError},
     collections::TryReserveError,
 };
 
 use core::convert::From;
+use core::fmt;
 use core::num::TryFromIntError;
 use core::str::Utf8Error;
 
 /// Contains the C-compatible error codes.
+#[rustfmt::skip]
 pub mod code {
     macro_rules! declare_err {
         ($err:tt $(,)? $($doc:expr),+) => {
@@ -58,6 +62,25 @@ pub mod code {
     declare_err!(EPIPE, "Broken pipe.");
     declare_err!(EDOM, "Math argument out of domain of func.");
     declare_err!(ERANGE, "Math result not representable.");
+    declare_err!(ERESTARTSYS, "Restart the system call.");
+    declare_err!(ERESTARTNOINTR, "System call was interrupted by a signal and will be restarted.");
+    declare_err!(ERESTARTNOHAND, "Restart if no handler.");
+    declare_err!(ENOIOCTLCMD, "No ioctl command.");
+    declare_err!(ERESTART_RESTARTBLOCK, "Restart by calling sys_restart_syscall.");
+    declare_err!(EPROBE_DEFER, "Driver requests probe retry.");
+    declare_err!(EOPENSTALE, "Open found a stale dentry.");
+    declare_err!(ENOPARAM, "Parameter not supported.");
+    declare_err!(EBADHANDLE, "Illegal NFS file handle.");
+    declare_err!(ENOTSYNC, "Update synchronization mismatch.");
+    declare_err!(EBADCOOKIE, "Cookie is stale.");
+    declare_err!(ENOTSUPP, "Operation is not supported.");
+    declare_err!(ETOOSMALL, "Buffer or request is too small.");
+    declare_err!(ESERVERFAULT, "An untranslatable error occurred.");
+    declare_err!(EBADTYPE, "Type not supported by server.");
+    declare_err!(EJUKEBOX, "Request initiated, but will not complete before timeout.");
+    declare_err!(EIOCBQUEUED, "iocb queued, will get completion event.");
+    declare_err!(ERECALLCONFLICT, "Conflict with recalled state.");
+    declare_err!(ENOGRACE, "NFS file lock reclaim refused.");
 }
 
 /// Generic integer kernel error.
@@ -113,6 +136,42 @@ impl Error {
         // SAFETY: self.0 is a valid error due to its invariant.
         unsafe { bindings::ERR_PTR(self.0.into()) as *mut _ }
     }
+
+    /// Returns a string representing the error, if one exists.
+    #[cfg(not(testlib))]
+    pub fn name(&self) -> Option<&'static CStr> {
+        // SAFETY: Just an FFI call, there are no extra safety requirements.
+        let ptr = unsafe { bindings::errname(-self.0) };
+        if ptr.is_null() {
+            None
+        } else {
+            // SAFETY: The string returned by `errname` is static and `NUL`-terminated.
+            Some(unsafe { CStr::from_char_ptr(ptr) })
+        }
+    }
+
+    /// Returns a string representing the error, if one exists.
+    ///
+    /// When `testlib` is configured, this always returns `None` to avoid the dependency on a
+    /// kernel function so that tests that use this (e.g., by calling [`Result::unwrap`]) can still
+    /// run in userspace.
+    #[cfg(testlib)]
+    pub fn name(&self) -> Option<&'static CStr> {
+        None
+    }
+}
+
+impl fmt::Debug for Error {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        match self.name() {
+            // Print out number if no name can be found.
+            None => f.debug_tuple("Error").field(&-self.0).finish(),
+            // SAFETY: These strings are ASCII-only.
+            Some(name) => f
+                .debug_tuple(unsafe { core::str::from_utf8_unchecked(name) })
+                .finish(),
+        }
+    }
 }
 
 impl From<AllocError> for Error {
@@ -177,7 +236,7 @@ impl From<core::convert::Infallible> for Error {
 /// Note that even if a function does not return anything when it succeeds,
 /// it should still be modeled as returning a `Result` rather than
 /// just an [`Error`].
-pub type Result<T = ()> = core::result::Result<T, Error>;
+pub type Result<T = (), E = Error> = core::result::Result<T, E>;
 
 /// Converts an integer as returned by a C kernel function to an error if it's negative, and
 /// `Ok(())` otherwise.
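
Taken together, the `errname()` binding, `Error::name()` and the new `Debug` impl mean kernel `Result` errors now format symbolically. A hedged sketch of how driver code might rely on this; the exact import paths (`kernel::error::code::EINVAL`, `kernel::pr_info`) and the surrounding function are assumptions, not part of the hunks above:

```rust
// Illustrative only: inside a module using the `kernel` crate.
use kernel::error::{code::EINVAL, Result};
use kernel::pr_info;

fn parse_flags(raw: u32) -> Result<u32> {
    if raw & !0b111 != 0 {
        // With the new `Debug` impl this formats as `EINVAL` rather than `Error(-22)`,
        // falling back to the raw number when `errname()` has no entry.
        return Err(EINVAL);
    }
    Ok(raw)
}

fn demo() {
    if let Err(e) = parse_flags(0xffff_ffff) {
        pr_info!("parse_flags failed: {:?}\n", e);
    }
}
```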
index 4ebfb08..b4332a4 100644 (file)
 //! [`Opaque`]: kernel::types::Opaque
 //! [`Opaque::ffi_init`]: kernel::types::Opaque::ffi_init
 //! [`pin_data`]: ::macros::pin_data
+//! [`pin_init!`]: crate::pin_init!
 
 use crate::{
     error::{self, Error},
@@ -255,6 +256,8 @@ pub mod macros;
 /// A normal `let` binding with optional type annotation. The expression is expected to implement
 /// [`PinInit`]/[`Init`] with the error type [`Infallible`]. If you want to use a different error
 /// type, then use [`stack_try_pin_init!`].
+///
+/// [`stack_try_pin_init!`]: crate::stack_try_pin_init!
 #[macro_export]
 macro_rules! stack_pin_init {
     (let $var:ident $(: $t:ty)? = $val:expr) => {
@@ -804,6 +807,8 @@ macro_rules! try_pin_init {
 ///
 /// This initializer is for initializing data in-place that might later be moved. If you want to
 /// pin-initialize, use [`pin_init!`].
+///
+/// [`try_init!`]: crate::try_init!
 // For a detailed example of how this macro works, see the module documentation of the hidden
 // module `__internal` inside of `init/__internal.rs`.
 #[macro_export]
index 541cfad..00aa4e9 100644 (file)
@@ -16,8 +16,9 @@
 //!
 //! We will look at the following example:
 //!
-//! ```rust
+//! ```rust,ignore
 //! # use kernel::init::*;
+//! # use core::pin::Pin;
 //! #[pin_data]
 //! #[repr(C)]
 //! struct Bar<T> {
 //!
 //! Here is the definition of `Bar` from our example:
 //!
-//! ```rust
+//! ```rust,ignore
 //! # use kernel::init::*;
 //! #[pin_data]
 //! #[repr(C)]
 //! struct Bar<T> {
+//!     #[pin]
 //!     t: T,
 //!     pub x: usize,
 //! }
@@ -83,7 +85,7 @@
 //!
 //! This expands to the following code:
 //!
-//! ```rust
+//! ```rust,ignore
 //! // Firstly the normal definition of the struct, attributes are preserved:
 //! #[repr(C)]
 //! struct Bar<T> {
 //!         unsafe fn t<E>(
 //!             self,
 //!             slot: *mut T,
-//!             init: impl ::kernel::init::Init<T, E>,
+//!             // Since `t` is `#[pin]`, this is `PinInit`.
+//!             init: impl ::kernel::init::PinInit<T, E>,
 //!         ) -> ::core::result::Result<(), E> {
-//!             unsafe { ::kernel::init::Init::__init(init, slot) }
+//!             unsafe { ::kernel::init::PinInit::__pinned_init(init, slot) }
 //!         }
 //!         pub unsafe fn x<E>(
 //!             self,
 //!             slot: *mut usize,
+//!             // Since `x` is not `#[pin]`, this is `Init`.
 //!             init: impl ::kernel::init::Init<usize, E>,
 //!         ) -> ::core::result::Result<(), E> {
 //!             unsafe { ::kernel::init::Init::__init(init, slot) }
 //!         }
 //!     }
 //!     // Implement the internal `HasPinData` trait that associates `Bar` with the pin-data struct
-//!     // that we constructed beforehand.
+//!     // that we constructed above.
 //!     unsafe impl<T> ::kernel::init::__internal::HasPinData for Bar<T> {
 //!         type PinData = __ThePinData<T>;
 //!         unsafe fn __pin_data() -> Self::PinData {
 //!     struct __Unpin<'__pin, T> {
 //!         __phantom_pin: ::core::marker::PhantomData<fn(&'__pin ()) -> &'__pin ()>,
 //!         __phantom: ::core::marker::PhantomData<fn(Bar<T>) -> Bar<T>>,
+//!         // Our only `#[pin]` field is `t`.
+//!         t: T,
 //!     }
 //!     #[doc(hidden)]
 //!     impl<'__pin, T>
 //!
 //! Here is the impl on `Bar` defining the new function:
 //!
-//! ```rust
+//! ```rust,ignore
 //! impl<T> Bar<T> {
 //!     fn new(t: T) -> impl PinInit<Self> {
 //!         pin_init!(Self { t, x: 0 })
 //!
 //! This expands to the following code:
 //!
-//! ```rust
+//! ```rust,ignore
 //! impl<T> Bar<T> {
 //!     fn new(t: T) -> impl PinInit<Self> {
 //!         {
 //!                     // that will refer to this struct instead of the one defined above.
 //!                     struct __InitOk;
 //!                     // This is the expansion of `t,`, which is syntactic sugar for `t: t,`.
-//!                     unsafe { ::core::ptr::write(&raw mut (*slot).t, t) };
+//!                     unsafe { ::core::ptr::write(::core::addr_of_mut!((*slot).t), t) };
 //!                     // Since initialization could fail later (not in this case, since the error
-//!                     // type is `Infallible`) we will need to drop this field if it fails. This
-//!                     // `DropGuard` will drop the field when it gets dropped and has not yet
-//!                     // been forgotten. We make a reference to it, so users cannot `mem::forget`
-//!                     // it from the initializer, since the name is the same as the field.
+//!                     // type is `Infallible`) we will need to drop this field if there is an
+//!                     // error later. This `DropGuard` will drop the field when it gets dropped
+//!                     // and has not yet been forgotten. We make a reference to it, so users
+//!                     // cannot `mem::forget` it from the initializer, since the name is the same
+//!                     // as the field (including hygiene).
 //!                     let t = &unsafe {
-//!                         ::kernel::init::__internal::DropGuard::new(&raw mut (*slot).t)
+//!                         ::kernel::init::__internal::DropGuard::new(
+//!                             ::core::addr_of_mut!((*slot).t),
+//!                         )
 //!                     };
 //!                     // Expansion of `x: 0,`:
 //!                     // Since this can be an arbitrary expression we cannot place it inside of
 //!                     // the `unsafe` block, so we bind it here.
 //!                     let x = 0;
-//!                     unsafe { ::core::ptr::write(&raw mut (*slot).x, x) };
+//!                     unsafe { ::core::ptr::write(::core::addr_of_mut!((*slot).x), x) };
+//!                     // We again create a `DropGuard`.
 //!                     let x = &unsafe {
-//!                         ::kernel::init::__internal::DropGuard::new(&raw mut (*slot).x)
+//!                         ::kernel::init::__internal::DropGuard::new(
+//!                             ::core::addr_of_mut!((*slot).x),
+//!                         )
 //!                     };
 //!
-//!                     // Here we use the type checker to ensuer that every field has been
+//!                     // Here we use the type checker to ensure that every field has been
 //!                     // initialized exactly once, since this is `if false` it will never get
 //!                     // executed, but still type-checked.
 //!                     // Additionally we abuse `slot` to automatically infer the correct type for
 //!                         };
 //!                     }
 //!                     // Since initialization has successfully completed, we can now forget the
-//!                     // guards.
+//!                     // guards. This is not `mem::forget`, since we only have `&DropGuard`.
 //!                     unsafe { ::kernel::init::__internal::DropGuard::forget(t) };
 //!                     unsafe { ::kernel::init::__internal::DropGuard::forget(x) };
 //!                 }
 //!                 // `__InitOk` that we need to return.
 //!                 Ok(__InitOk)
 //!             });
-//!             // Change the return type of the closure.
+//!             // Change the return type from `__InitOk` to `()`.
 //!             let init = move |slot| -> ::core::result::Result<(), ::core::convert::Infallible> {
 //!                 init(slot).map(|__InitOk| ())
 //!             };
 //! Since we already took a look at `#[pin_data]` on `Bar`, this section will only explain the
 //! differences/new things in the expansion of the `Foo` definition:
 //!
-//! ```rust
+//! ```rust,ignore
 //! #[pin_data(PinnedDrop)]
 //! struct Foo {
 //!     a: usize,
 //!
 //! This expands to the following code:
 //!
-//! ```rust
+//! ```rust,ignore
 //! struct Foo {
 //!     a: usize,
 //!     b: Bar<u32>,
 //!         unsafe fn b<E>(
 //!             self,
 //!             slot: *mut Bar<u32>,
-//!             // Note that this is `PinInit` instead of `Init`, this is because `b` is
-//!             // structurally pinned, as marked by the `#[pin]` attribute.
 //!             init: impl ::kernel::init::PinInit<Bar<u32>, E>,
 //!         ) -> ::core::result::Result<(), E> {
 //!             unsafe { ::kernel::init::PinInit::__pinned_init(init, slot) }
 //!     struct __Unpin<'__pin> {
 //!         __phantom_pin: ::core::marker::PhantomData<fn(&'__pin ()) -> &'__pin ()>,
 //!         __phantom: ::core::marker::PhantomData<fn(Foo) -> Foo>,
-//!         // Since this field is `#[pin]`, it is listed here.
 //!         b: Bar<u32>,
 //!     }
 //!     #[doc(hidden)]
 //!     impl<'__pin> ::core::marker::Unpin for Foo where __Unpin<'__pin>: ::core::marker::Unpin {}
 //!     // Since we specified `PinnedDrop` as the argument to `#[pin_data]`, we expect `Foo` to
 //!     // implement `PinnedDrop`. Thus we do not need to prevent `Drop` implementations like
-//!     // before, instead we implement it here and delegate to `PinnedDrop`.
+//!     // before, instead we implement `Drop` here and delegate to `PinnedDrop`.
 //!     impl ::core::ops::Drop for Foo {
 //!         fn drop(&mut self) {
 //!             // Since we are getting dropped, no one else has a reference to `self` and thus we
 //!
 //! Here is the `PinnedDrop` impl for `Foo`:
 //!
-//! ```rust
+//! ```rust,ignore
 //! #[pinned_drop]
 //! impl PinnedDrop for Foo {
 //!     fn drop(self: Pin<&mut Self>) {
 //!
 //! This expands to the following code:
 //!
-//! ```rust
+//! ```rust,ignore
 //! // `unsafe`, full path and the token parameter are added, everything else stays the same.
 //! unsafe impl ::kernel::init::PinnedDrop for Foo {
 //!     fn drop(self: Pin<&mut Self>, _: ::kernel::init::__internal::OnlyCallFromDrop) {
 //!
 //! ## `pin_init!` on `Foo`
 //!
-//! Since we already took a look at `pin_init!` on `Bar`, this section will only explain the
-//! differences/new things in the expansion of `pin_init!` on `Foo`:
+//! Since we already took a look at `pin_init!` on `Bar`, this section will only show the expansion
+//! of `pin_init!` on `Foo`:
 //!
-//! ```rust
+//! ```rust,ignore
 //! let a = 42;
 //! let initializer = pin_init!(Foo {
 //!     a,
 //!
 //! This expands to the following code:
 //!
-//! ```rust
+//! ```rust,ignore
 //! let a = 42;
 //! let initializer = {
 //!     struct __InitOk;
 //!     >(data, move |slot| {
 //!         {
 //!             struct __InitOk;
-//!             unsafe { ::core::ptr::write(&raw mut (*slot).a, a) };
-//!             let a = &unsafe { ::kernel::init::__internal::DropGuard::new(&raw mut (*slot).a) };
+//!             unsafe { ::core::ptr::write(::core::addr_of_mut!((*slot).a), a) };
+//!             let a = &unsafe {
+//!                 ::kernel::init::__internal::DropGuard::new(::core::addr_of_mut!((*slot).a))
+//!             };
 //!             let b = Bar::new(36);
-//!             // Here we use `data` to access the correct field and require that `b` is of type
-//!             // `PinInit<Bar<u32>, Infallible>`.
-//!             unsafe { data.b(&raw mut (*slot).b, b)? };
-//!             let b = &unsafe { ::kernel::init::__internal::DropGuard::new(&raw mut (*slot).b) };
+//!             unsafe { data.b(::core::addr_of_mut!((*slot).b), b)? };
+//!             let b = &unsafe {
+//!                 ::kernel::init::__internal::DropGuard::new(::core::addr_of_mut!((*slot).b))
+//!             };
 //!
 //!             #[allow(unreachable_code, clippy::diverging_sub_expression)]
 //!             if false {
index 676995d..85b2612 100644 (file)
 #![no_std]
 #![feature(allocator_api)]
 #![feature(coerce_unsized)]
-#![feature(core_ffi_c)]
 #![feature(dispatch_from_dyn)]
-#![feature(explicit_generic_args_with_impl_trait)]
-#![feature(generic_associated_types)]
 #![feature(new_uninit)]
-#![feature(pin_macro)]
 #![feature(receiver_trait)]
 #![feature(unsize)]
 
index b3e68b2..388d6a5 100644 (file)
 /// [`std::dbg`]: https://doc.rust-lang.org/std/macro.dbg.html
 /// [`eprintln`]: https://doc.rust-lang.org/std/macro.eprintln.html
 /// [`printk`]: https://www.kernel.org/doc/html/latest/core-api/printk-basics.html
+/// [`pr_info`]: crate::pr_info!
+/// [`pr_debug`]: crate::pr_debug!
 #[macro_export]
 macro_rules! dbg {
     // NOTE: We cannot use `concat!` to make a static string as a format argument
index cd3d2a6..c9dd3bf 100644 (file)
@@ -2,6 +2,7 @@
 
 //! String representations.
 
+use alloc::alloc::AllocError;
 use alloc::vec::Vec;
 use core::fmt::{self, Write};
 use core::ops::{self, Deref, Index};
@@ -199,6 +200,12 @@ impl CStr {
     pub unsafe fn as_str_unchecked(&self) -> &str {
         unsafe { core::str::from_utf8_unchecked(self.as_bytes()) }
     }
+
+    /// Convert this [`CStr`] into a [`CString`] by allocating memory and
+    /// copying over the string data.
+    pub fn to_cstring(&self) -> Result<CString, AllocError> {
+        CString::try_from(self)
+    }
 }
 
 impl fmt::Display for CStr {
@@ -584,6 +591,21 @@ impl Deref for CString {
     }
 }
 
+impl<'a> TryFrom<&'a CStr> for CString {
+    type Error = AllocError;
+
+    fn try_from(cstr: &'a CStr) -> Result<CString, AllocError> {
+        let mut buf = Vec::new();
+
+        buf.try_extend_from_slice(cstr.as_bytes_with_nul())
+            .map_err(|_| AllocError)?;
+
+        // INVARIANT: The `CStr` and `CString` types have the same invariants for
+        // the string data, and we copied it over without changes.
+        Ok(CString { buf })
+    }
+}
+
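
A compact sketch of the new conversion path. The hunk itself only guarantees `CStr::to_cstring()` and the `TryFrom<&CStr>` impl; the `c_str!` macro and the `core::alloc::AllocError` spelling are assumptions about the surrounding crate:

```rust
// Illustrative only.
use core::alloc::AllocError;
use kernel::c_str;
use kernel::str::{CStr, CString};

fn duplicate_name(name: &CStr) -> Result<CString, AllocError> {
    // `to_cstring()` is a convenience wrapper over `CString::try_from(name)`.
    name.to_cstring()
}

fn demo() -> Result<(), AllocError> {
    let owned = duplicate_name(c_str!("rust_minimal"))?;
    assert_eq!(owned.as_bytes_with_nul(), b"rust_minimal\0");
    Ok(())
}
```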
 /// A convenience alias for [`core::format_args`].
 #[macro_export]
 macro_rules! fmt {
index e6d2062..a89843c 100644 (file)
@@ -146,13 +146,15 @@ impl<T: ?Sized + Unsize<U>, U: ?Sized> core::ops::DispatchFromDyn<Arc<U>> for Ar
 
 // SAFETY: It is safe to send `Arc<T>` to another thread when the underlying `T` is `Sync` because
 // it effectively means sharing `&T` (which is safe because `T` is `Sync`); additionally, it needs
-// `T` to be `Send` because any thread that has an `Arc<T>` may ultimately access `T` directly, for
-// example, when the reference count reaches zero and `T` is dropped.
+// `T` to be `Send` because any thread that has an `Arc<T>` may ultimately access `T` using a
+// mutable reference when the reference count reaches zero and `T` is dropped.
 unsafe impl<T: ?Sized + Sync + Send> Send for Arc<T> {}
 
-// SAFETY: It is safe to send `&Arc<T>` to another thread when the underlying `T` is `Sync` for the
-// same reason as above. `T` needs to be `Send` as well because a thread can clone an `&Arc<T>`
-// into an `Arc<T>`, which may lead to `T` being accessed by the same reasoning as above.
+// SAFETY: It is safe to send `&Arc<T>` to another thread when the underlying `T` is `Sync`
+// because it effectively means sharing `&T` (which is safe because `T` is `Sync`); additionally,
+// it needs `T` to be `Send` because any thread that has a `&Arc<T>` may clone it and get an
+// `Arc<T>` on that thread, so the thread may ultimately access `T` using a mutable reference when
+// the reference count reaches zero and `T` is dropped.
 unsafe impl<T: ?Sized + Sync + Send> Sync for Arc<T> {}
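
The reworded safety comments hinge on the observation that a thread holding only `&Arc<T>` can still end up dropping `T`, by cloning and then being the last owner. The same reasoning is easy to see in userspace with the standard library's `Arc`, shown here purely as an analogy:

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    let shared = Arc::new(String::from("kernel"));
    thread::scope(|s| {
        s.spawn(|| {
            // A shared reference is enough to mint an owned clone on this thread.
            let owned = Arc::clone(&shared);
            // If this clone were the last owner, the String would be dropped here,
            // which is why `T: Send` is needed on top of `T: Sync`.
            assert_eq!(owned.as_str(), "kernel");
        });
    });
    assert_eq!(Arc::strong_count(&shared), 1);
}
```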
 
 impl<T> Arc<T> {
@@ -185,7 +187,7 @@ impl<T> Arc<T> {
 
     /// Use the given initializer to in-place initialize a `T`.
     ///
-    /// This is equivalent to [`pin_init`], since an [`Arc`] is always pinned.
+    /// This is equivalent to [`Arc<T>::pin_init`], since an [`Arc`] is always pinned.
     #[inline]
     pub fn init<E>(init: impl Init<T, E>) -> error::Result<Self>
     where
@@ -221,6 +223,11 @@ impl<T: ?Sized> Arc<T> {
         // reference can be created.
         unsafe { ArcBorrow::new(self.ptr) }
     }
+
+    /// Compare whether two [`Arc`] pointers reference the same underlying object.
+    pub fn ptr_eq(this: &Self, other: &Self) -> bool {
+        core::ptr::eq(this.ptr.as_ptr(), other.ptr.as_ptr())
+    }
 }
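
A quick sketch of the new `ptr_eq` helper; like `alloc::sync::Arc::ptr_eq`, it compares the managed allocations rather than the pointed-to values. The `Arc::try_new` constructor used here is assumed from the kernel crate, not shown in this hunk:

```rust
// Illustrative only.
use kernel::error::Error;
use kernel::sync::Arc;

fn demo() -> Result<(), Error> {
    let a = Arc::try_new(42)?;
    let b = a.clone();
    let c = Arc::try_new(42)?;

    assert!(Arc::ptr_eq(&a, &b));   // same underlying allocation
    assert!(!Arc::ptr_eq(&a, &c));  // equal values, different allocations
    Ok(())
}
```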
 
 impl<T: 'static> ForeignOwnable for Arc<T> {
@@ -259,6 +266,12 @@ impl<T: ?Sized> Deref for Arc<T> {
     }
 }
 
+impl<T: ?Sized> AsRef<T> for Arc<T> {
+    fn as_ref(&self) -> &T {
+        self.deref()
+    }
+}
+
 impl<T: ?Sized> Clone for Arc<T> {
     fn clone(&self) -> Self {
         // INVARIANT: C `refcount_inc` saturates the refcount, so it cannot overflow to zero.
index 526d29a..7eda15e 100644 (file)
@@ -64,8 +64,14 @@ macro_rules! current {
 #[repr(transparent)]
 pub struct Task(pub(crate) Opaque<bindings::task_struct>);
 
-// SAFETY: It's OK to access `Task` through references from other threads because we're either
-// accessing properties that don't change (e.g., `pid`, `group_leader`) or that are properly
+// SAFETY: By design, the only way to access a `Task` is via the `current` function or via an
+// `ARef<Task>` obtained through the `AlwaysRefCounted` impl. This means that the only situation in
+// which a `Task` can be accessed mutably is when the refcount drops to zero and the destructor
+// runs. It is safe for that to happen on any thread, so it is ok for this type to be `Send`.
+unsafe impl Send for Task {}
+
+// SAFETY: It's OK to access `Task` through shared references from other threads because we're
+// either accessing properties that don't change (e.g., `pid`, `group_leader`) or that are properly
 // synchronised by C code (e.g., `signal_pending`).
 unsafe impl Sync for Task {}
 
index 29db59d..1e5380b 100644 (file)
@@ -321,6 +321,19 @@ pub struct ARef<T: AlwaysRefCounted> {
     _p: PhantomData<T>,
 }
 
+// SAFETY: It is safe to send `ARef<T>` to another thread when the underlying `T` is `Sync` because
+// it effectively means sharing `&T` (which is safe because `T` is `Sync`); additionally, it needs
+// `T` to be `Send` because any thread that has an `ARef<T>` may ultimately access `T` using a
+// mutable reference, for example, when the reference count reaches zero and `T` is dropped.
+unsafe impl<T: AlwaysRefCounted + Sync + Send> Send for ARef<T> {}
+
+// SAFETY: It is safe to send `&ARef<T>` to another thread when the underlying `T` is `Sync`
+// because it effectively means sharing `&T` (which is safe because `T` is `Sync`); additionally,
+// it needs `T` to be `Send` because any thread that has a `&ARef<T>` may clone it and get an
+// `ARef<T>` on that thread, so the thread may ultimately access `T` using a mutable reference, for
+// example, when the reference count reaches zero and `T` is dropped.
+unsafe impl<T: AlwaysRefCounted + Sync + Send> Sync for ARef<T> {}
+
 impl<T: AlwaysRefCounted> ARef<T> {
     /// Creates a new instance of [`ARef`].
     ///
index b2bdd4d..afb0f2e 100644 (file)
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 
-use proc_macro::{token_stream, Group, TokenTree};
+use proc_macro::{token_stream, Group, Punct, Spacing, TokenStream, TokenTree};
 
 pub(crate) fn try_ident(it: &mut token_stream::IntoIter) -> Option<String> {
     if let Some(TokenTree::Ident(ident)) = it.next() {
@@ -69,3 +69,87 @@ pub(crate) fn expect_end(it: &mut token_stream::IntoIter) {
         panic!("Expected end");
     }
 }
+
+pub(crate) struct Generics {
+    pub(crate) impl_generics: Vec<TokenTree>,
+    pub(crate) ty_generics: Vec<TokenTree>,
+}
+
+/// Parses the given `TokenStream` into `Generics` and the rest.
+///
+/// The generics are not present in the rest, but a where clause might remain.
+pub(crate) fn parse_generics(input: TokenStream) -> (Generics, Vec<TokenTree>) {
+    // `impl_generics`, the declared generics with their bounds.
+    let mut impl_generics = vec![];
+    // Only the names of the generics, without any bounds.
+    let mut ty_generics = vec![];
+    // Tokens not related to the generics, e.g. the `where` token and definition.
+    let mut rest = vec![];
+    // The current level of `<`.
+    let mut nesting = 0;
+    let mut toks = input.into_iter();
+    // If we are at the beginning of a generic parameter.
+    let mut at_start = true;
+    for tt in &mut toks {
+        match tt.clone() {
+            TokenTree::Punct(p) if p.as_char() == '<' => {
+                if nesting >= 1 {
+                    // This is inside of the generics and part of some bound.
+                    impl_generics.push(tt);
+                }
+                nesting += 1;
+            }
+            TokenTree::Punct(p) if p.as_char() == '>' => {
+                // This is a parsing error, so we just end it here.
+                if nesting == 0 {
+                    break;
+                } else {
+                    nesting -= 1;
+                    if nesting >= 1 {
+                        // We are still inside of the generics and part of some bound.
+                        impl_generics.push(tt);
+                    }
+                    if nesting == 0 {
+                        break;
+                    }
+                }
+            }
+            tt => {
+                if nesting == 1 {
+                    // Here depending on the token, it might be a generic variable name.
+                    match &tt {
+                        // Ignore const.
+                        TokenTree::Ident(i) if i.to_string() == "const" => {}
+                        TokenTree::Ident(_) if at_start => {
+                            ty_generics.push(tt.clone());
+                            // We also already push the `,` token, this makes it easier to append
+                            // generics.
+                            ty_generics.push(TokenTree::Punct(Punct::new(',', Spacing::Alone)));
+                            at_start = false;
+                        }
+                        TokenTree::Punct(p) if p.as_char() == ',' => at_start = true,
+                        // Lifetimes begin with `'`.
+                        TokenTree::Punct(p) if p.as_char() == '\'' && at_start => {
+                            ty_generics.push(tt.clone());
+                        }
+                        _ => {}
+                    }
+                }
+                if nesting >= 1 {
+                    impl_generics.push(tt);
+                } else if nesting == 0 {
+                    // If we haven't entered the generics yet, we still want to keep these tokens.
+                    rest.push(tt);
+                }
+            }
+        }
+    }
+    rest.extend(toks);
+    (
+        Generics {
+            impl_generics,
+            ty_generics,
+        },
+        rest,
+    )
+}
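
Roughly, for a declaration such as `struct Foo<'a, T: Bound, const N: usize> where T: Default { .. }`, the helper splits the token stream as sketched below. This is an illustration of the parsing logic above, not a runnable test, since `proc_macro::TokenStream` only exists inside a proc-macro invocation:

```rust
// Input tokens:   struct Foo<'a, T: Bound, const N: usize> where T: Default { .. }
//
// impl_generics:  'a, T: Bound, const N: usize        (declared generics, bounds kept)
// ty_generics:    'a, T, N,                           (names only, trailing comma appended)
// rest:           struct Foo where T: Default { .. }  (generics stripped, where clause kept)
```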
index 954149d..6d58cfd 100644 (file)
 // SPDX-License-Identifier: Apache-2.0 OR MIT
 
-use proc_macro::{Punct, Spacing, TokenStream, TokenTree};
+use crate::helpers::{parse_generics, Generics};
+use proc_macro::{Group, Punct, Spacing, TokenStream, TokenTree};
 
 pub(crate) fn pin_data(args: TokenStream, input: TokenStream) -> TokenStream {
     // This proc-macro only does some pre-parsing and then delegates the actual parsing to
     // `kernel::__pin_data!`.
-    //
-    // In here we only collect the generics, since parsing them in declarative macros is very
-    // elaborate. We also do not need to analyse their structure, we only need to collect them.
 
-    // `impl_generics`, the declared generics with their bounds.
-    let mut impl_generics = vec![];
-    // Only the names of the generics, without any bounds.
-    let mut ty_generics = vec![];
-    // Tokens not related to the generics e.g. the `impl` token.
-    let mut rest = vec![];
-    // The current level of `<`.
-    let mut nesting = 0;
-    let mut toks = input.into_iter();
-    // If we are at the beginning of a generic parameter.
-    let mut at_start = true;
-    for tt in &mut toks {
-        match tt.clone() {
-            TokenTree::Punct(p) if p.as_char() == '<' => {
-                if nesting >= 1 {
-                    impl_generics.push(tt);
-                }
-                nesting += 1;
-            }
-            TokenTree::Punct(p) if p.as_char() == '>' => {
-                if nesting == 0 {
-                    break;
-                } else {
-                    nesting -= 1;
-                    if nesting >= 1 {
-                        impl_generics.push(tt);
-                    }
-                    if nesting == 0 {
-                        break;
-                    }
+    let (
+        Generics {
+            impl_generics,
+            ty_generics,
+        },
+        rest,
+    ) = parse_generics(input);
+    // The struct definition might contain the `Self` type. Since `__pin_data!` will define a new
+    // type with the same generics and bounds, this poses a problem, since `Self` will refer to the
+    // new type as opposed to this struct definition. Therefore we have to replace `Self` with the
+    // concrete name.
+
+    // Errors that occur when replacing `Self` with `struct_name`.
+    let mut errs = TokenStream::new();
+    // The name of the struct with ty_generics.
+    let struct_name = rest
+        .iter()
+        .skip_while(|tt| !matches!(tt, TokenTree::Ident(i) if i.to_string() == "struct"))
+        .nth(1)
+        .and_then(|tt| match tt {
+            TokenTree::Ident(_) => {
+                let tt = tt.clone();
+                let mut res = vec![tt];
+                if !ty_generics.is_empty() {
+                    // We add this, so it is maximally compatible with e.g. `Self::CONST` which
+                    // will be replaced by `StructName::<$generics>::CONST`.
+                    res.push(TokenTree::Punct(Punct::new(':', Spacing::Joint)));
+                    res.push(TokenTree::Punct(Punct::new(':', Spacing::Alone)));
+                    res.push(TokenTree::Punct(Punct::new('<', Spacing::Alone)));
+                    res.extend(ty_generics.iter().cloned());
+                    res.push(TokenTree::Punct(Punct::new('>', Spacing::Alone)));
                 }
+                Some(res)
             }
-            tt => {
-                if nesting == 1 {
-                    match &tt {
-                        TokenTree::Ident(i) if i.to_string() == "const" => {}
-                        TokenTree::Ident(_) if at_start => {
-                            ty_generics.push(tt.clone());
-                            ty_generics.push(TokenTree::Punct(Punct::new(',', Spacing::Alone)));
-                            at_start = false;
-                        }
-                        TokenTree::Punct(p) if p.as_char() == ',' => at_start = true,
-                        TokenTree::Punct(p) if p.as_char() == '\'' && at_start => {
-                            ty_generics.push(tt.clone());
-                        }
-                        _ => {}
-                    }
-                }
-                if nesting >= 1 {
-                    impl_generics.push(tt);
-                } else if nesting == 0 {
-                    rest.push(tt);
-                }
+            _ => None,
+        })
+        .unwrap_or_else(|| {
+            // If we did not find the name of the struct then we will use `Self` as the replacement
+            // and add a compile error to ensure it does not compile.
+            errs.extend(
+                "::core::compile_error!(\"Could not locate type name.\");"
+                    .parse::<TokenStream>()
+                    .unwrap(),
+            );
+            "Self".parse::<TokenStream>().unwrap().into_iter().collect()
+        });
+    let impl_generics = impl_generics
+        .into_iter()
+        .flat_map(|tt| replace_self_and_deny_type_defs(&struct_name, tt, &mut errs))
+        .collect::<Vec<_>>();
+    let mut rest = rest
+        .into_iter()
+        .flat_map(|tt| {
+            // Pass top-level `struct` tokens through unchanged; replacing them below would emit a compile error.
+            if matches!(&tt, TokenTree::Ident(i) if i.to_string() == "struct") {
+                vec![tt]
+            } else {
+                replace_self_and_deny_type_defs(&struct_name, tt, &mut errs)
             }
-        }
-    }
-    rest.extend(toks);
+        })
+        .collect::<Vec<_>>();
     // This should be the body of the struct `{...}`.
     let last = rest.pop();
-    quote!(::kernel::__pin_data! {
+    let mut quoted = quote!(::kernel::__pin_data! {
         parse_input:
         @args(#args),
         @sig(#(#rest)*),
         @impl_generics(#(#impl_generics)*),
         @ty_generics(#(#ty_generics)*),
         @body(#last),
-    })
+    });
+    quoted.extend(errs);
+    quoted
+}
+
+/// Replaces `Self` with `struct_name` and errors on the `enum`, `trait`, `struct`, `union` and
+/// `impl` keywords.
+///
+/// The error is appended to `errs` to allow normal parsing to continue.
+fn replace_self_and_deny_type_defs(
+    struct_name: &Vec<TokenTree>,
+    tt: TokenTree,
+    errs: &mut TokenStream,
+) -> Vec<TokenTree> {
+    match tt {
+        TokenTree::Ident(ref i)
+            if i.to_string() == "enum"
+                || i.to_string() == "trait"
+                || i.to_string() == "struct"
+                || i.to_string() == "union"
+                || i.to_string() == "impl" =>
+        {
+            errs.extend(
+                format!(
+                    "::core::compile_error!(\"Cannot use `{i}` inside of struct definition with \
+                        `#[pin_data]`.\");"
+                )
+                .parse::<TokenStream>()
+                .unwrap()
+                .into_iter()
+                .map(|mut tok| {
+                    tok.set_span(tt.span());
+                    tok
+                }),
+            );
+            vec![tt]
+        }
+        TokenTree::Ident(i) if i.to_string() == "Self" => struct_name.clone(),
+        TokenTree::Literal(_) | TokenTree::Punct(_) | TokenTree::Ident(_) => vec![tt],
+        TokenTree::Group(g) => vec![TokenTree::Group(Group::new(
+            g.delimiter(),
+            g.stream()
+                .into_iter()
+                .flat_map(|tt| replace_self_and_deny_type_defs(struct_name, tt, errs))
+                .collect(),
+        ))],
+    }
 }
index c8e08b3..dddbb4e 100644 (file)
@@ -39,12 +39,14 @@ impl ToTokens for TokenStream {
 /// [`quote_spanned!`](https://docs.rs/quote/latest/quote/macro.quote_spanned.html) macro from the
 /// `quote` crate but provides only just enough functionality needed by the current `macros` crate.
 macro_rules! quote_spanned {
-    ($span:expr => $($tt:tt)*) => {
-    #[allow(clippy::vec_init_then_push)]
-    {
-        let mut tokens = ::std::vec::Vec::new();
-        let span = $span;
-        quote_spanned!(@proc tokens span $($tt)*);
+    ($span:expr => $($tt:tt)*) => {{
+        let mut tokens;
+        #[allow(clippy::vec_init_then_push)]
+        {
+            tokens = ::std::vec::Vec::new();
+            let span = $span;
+            quote_spanned!(@proc tokens span $($tt)*);
+        }
         ::proc_macro::TokenStream::from_iter(tokens)
     }};
     (@proc $v:ident $span:ident) => {};
index 29f69f3..0caad90 100644 (file)
@@ -8,7 +8,6 @@
 //! userspace APIs.
 
 #![no_std]
-#![feature(core_ffi_c)]
 // See <https://github.com/rust-lang/rust-bindgen/issues/1651>.
 #![cfg_attr(test, allow(deref_nullptr))]
 #![cfg_attr(test, allow(unaligned_references))]
index 6448b78..bf66277 100644 (file)
@@ -498,7 +498,6 @@ int main(int argc, char **argv)
                                        "Option -%c requires an argument.\n\n",
                                        optopt);
                case 'h':
-                       __fallthrough;
                default:
                        Usage();
                        return 0;
index 9f94fc8..7817523 100644 (file)
@@ -277,7 +277,7 @@ $(obj)/%.lst: $(src)/%.c FORCE
 # Compile Rust sources (.rs)
 # ---------------------------------------------------------------------------
 
-rust_allowed_features := core_ffi_c,explicit_generic_args_with_impl_trait,new_uninit,pin_macro
+rust_allowed_features := new_uninit
 
 rust_common_cmd = \
        RUST_MODFILE=$(modfile) $(RUSTC_OR_CLIPPY) $(rust_flags) \
index 81d5c32..608ff39 100755 (executable)
@@ -36,9 +36,16 @@ meta_has_relaxed()
        meta_in "$1" "BFIR"
 }
 
-#find_fallback_template(pfx, name, sfx, order)
-find_fallback_template()
+#meta_is_implicitly_relaxed(meta)
+meta_is_implicitly_relaxed()
 {
+       meta_in "$1" "vls"
+}
+
+#find_template(tmpltype, pfx, name, sfx, order)
+find_template()
+{
+       local tmpltype="$1"; shift
        local pfx="$1"; shift
        local name="$1"; shift
        local sfx="$1"; shift
@@ -52,8 +59,8 @@ find_fallback_template()
        #
        # Start at the most specific, and fall back to the most general. Once
        # we find a specific fallback, don't bother looking for more.
-       for base in "${pfx}${name}${sfx}${order}" "${name}"; do
-               file="${ATOMICDIR}/fallbacks/${base}"
+       for base in "${pfx}${name}${sfx}${order}" "${pfx}${name}${sfx}" "${name}"; do
+               file="${ATOMICDIR}/${tmpltype}/${base}"
 
                if [ -f "${file}" ]; then
                        printf "${file}"
@@ -62,6 +69,18 @@ find_fallback_template()
        done
 }
 
+#find_fallback_template(pfx, name, sfx, order)
+find_fallback_template()
+{
+       find_template "fallbacks" "$@"
+}
+
+#find_kerneldoc_template(pfx, name, sfx, order)
+find_kerneldoc_template()
+{
+       find_template "kerneldoc" "$@"
+}
+
 #gen_ret_type(meta, int)
 gen_ret_type() {
        local meta="$1"; shift
@@ -142,6 +161,91 @@ gen_args()
        done
 }
 
+#gen_desc_return(meta)
+gen_desc_return()
+{
+       local meta="$1"; shift
+
+       case "${meta}" in
+       [v])
+               printf "Return: Nothing."
+               ;;
+       [Ff])
+               printf "Return: The original value of @v."
+               ;;
+       [R])
+               printf "Return: The updated value of @v."
+               ;;
+       [l])
+               printf "Return: The value of @v."
+               ;;
+       esac
+}
+
+#gen_template_kerneldoc(template, class, meta, pfx, name, sfx, order, atomic, int, args...)
+gen_template_kerneldoc()
+{
+       local template="$1"; shift
+       local class="$1"; shift
+       local meta="$1"; shift
+       local pfx="$1"; shift
+       local name="$1"; shift
+       local sfx="$1"; shift
+       local order="$1"; shift
+       local atomic="$1"; shift
+       local int="$1"; shift
+
+       local atomicname="${atomic}_${pfx}${name}${sfx}${order}"
+
+       local ret="$(gen_ret_type "${meta}" "${int}")"
+       local retstmt="$(gen_ret_stmt "${meta}")"
+       local params="$(gen_params "${int}" "${atomic}" "$@")"
+       local args="$(gen_args "$@")"
+       local desc_order=""
+       local desc_instrumentation=""
+       local desc_return=""
+
+       if [ ! -z "${order}" ]; then
+               desc_order="${order##_}"
+       elif meta_is_implicitly_relaxed "${meta}"; then
+               desc_order="relaxed"
+       else
+               desc_order="full"
+       fi
+
+       if [ -z "${class}" ]; then
+               desc_noinstr="Unsafe to use in noinstr code; use raw_${atomicname}() there."
+       else
+               desc_noinstr="Safe to use in noinstr code; prefer ${atomicname}() elsewhere."
+       fi
+
+       desc_return="$(gen_desc_return "${meta}")"
+
+       . ${template}
+}
+
+#gen_kerneldoc(class, meta, pfx, name, sfx, order, atomic, int, args...)
+gen_kerneldoc()
+{
+       local class="$1"; shift
+       local meta="$1"; shift
+       local pfx="$1"; shift
+       local name="$1"; shift
+       local sfx="$1"; shift
+       local order="$1"; shift
+
+       local atomicname="${atomic}_${pfx}${name}${sfx}${order}"
+
+       local tmpl="$(find_kerneldoc_template "${pfx}" "${name}" "${sfx}" "${order}")"
+       if [ -z "${tmpl}" ]; then
+               printf "/*\n"
+               printf " * No kerneldoc available for ${class}${atomicname}\n"
+               printf " */\n"
+       else
+               gen_template_kerneldoc "${tmpl}" "${class}" "${meta}" "${pfx}" "${name}" "${sfx}" "${order}" "$@"
+       fi
+}
+
 #gen_proto_order_variants(meta, pfx, name, sfx, ...)
 gen_proto_order_variants()
 {
index 85ca8d9..903946c 100644 (file)
@@ -27,7 +27,7 @@ and                   vF      i       v
 andnot                 vF      i       v
 or                     vF      i       v
 xor                    vF      i       v
-xchg                   I       v       i
+xchg                   I       v       i:new
 cmpxchg                        I       v       i:old   i:new
 try_cmpxchg            B       v       p:old   i:new
 sub_and_test           b       i       v
index ef76408..4da0cab 100755 (executable)
@@ -1,9 +1,5 @@
 cat <<EOF
-static __always_inline ${ret}
-arch_${atomic}_${pfx}${name}${sfx}_acquire(${params})
-{
        ${ret} ret = arch_${atomic}_${pfx}${name}${sfx}_relaxed(${args});
        __atomic_acquire_fence();
        return ret;
-}
 EOF
index e5980ab..1d3d4ab 100755 (executable)
@@ -1,15 +1,3 @@
 cat <<EOF
-/**
- * arch_${atomic}_add_negative${order} - Add and test if negative
- * @i: integer value to add
- * @v: pointer of type ${atomic}_t
- *
- * Atomically adds @i to @v and returns true if the result is negative,
- * or false when the result is greater than or equal to zero.
- */
-static __always_inline bool
-arch_${atomic}_add_negative${order}(${int} i, ${atomic}_t *v)
-{
-       return arch_${atomic}_add_return${order}(i, v) < 0;
-}
+       return raw_${atomic}_add_return${order}(i, v) < 0;
 EOF
index 9e5159c..95ecb2b 100755 (executable)
@@ -1,16 +1,3 @@
 cat << EOF
-/**
- * arch_${atomic}_add_unless - add unless the number is already a given value
- * @v: pointer of type ${atomic}_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
- *
- * Atomically adds @a to @v, if @v was not already @u.
- * Returns true if the addition was done.
- */
-static __always_inline bool
-arch_${atomic}_add_unless(${atomic}_t *v, ${int} a, ${int} u)
-{
-       return arch_${atomic}_fetch_add_unless(v, a, u) != u;
-}
+       return raw_${atomic}_fetch_add_unless(v, a, u) != u;
 EOF
index 5a42f54..6676045 100755 (executable)
@@ -1,7 +1,3 @@
 cat <<EOF
-static __always_inline ${ret}
-arch_${atomic}_${pfx}andnot${sfx}${order}(${int} i, ${atomic}_t *v)
-{
-       ${retstmt}arch_${atomic}_${pfx}and${sfx}${order}(~i, v);
-}
+       ${retstmt}raw_${atomic}_${pfx}and${sfx}${order}(~i, v);
 EOF
diff --git a/scripts/atomic/fallbacks/cmpxchg b/scripts/atomic/fallbacks/cmpxchg
new file mode 100644 (file)
index 0000000..1c8507f
--- /dev/null
@@ -0,0 +1,3 @@
+cat <<EOF
+       return raw_cmpxchg${order}(&v->counter, old, new);
+EOF
index 8c144c8..60d286d 100755 (executable)
@@ -1,7 +1,3 @@
 cat <<EOF
-static __always_inline ${ret}
-arch_${atomic}_${pfx}dec${sfx}${order}(${atomic}_t *v)
-{
-       ${retstmt}arch_${atomic}_${pfx}sub${sfx}${order}(1, v);
-}
+       ${retstmt}raw_${atomic}_${pfx}sub${sfx}${order}(1, v);
 EOF
index 8549f35..3a0278e 100755 (executable)
@@ -1,15 +1,3 @@
 cat <<EOF
-/**
- * arch_${atomic}_dec_and_test - decrement and test
- * @v: pointer of type ${atomic}_t
- *
- * Atomically decrements @v by 1 and
- * returns true if the result is 0, or false for all other
- * cases.
- */
-static __always_inline bool
-arch_${atomic}_dec_and_test(${atomic}_t *v)
-{
-       return arch_${atomic}_dec_return(v) == 0;
-}
+       return raw_${atomic}_dec_return(v) == 0;
 EOF
index 86bdced..f65c11b 100755 (executable)
@@ -1,15 +1,11 @@
 cat <<EOF
-static __always_inline ${ret}
-arch_${atomic}_dec_if_positive(${atomic}_t *v)
-{
-       ${int} dec, c = arch_${atomic}_read(v);
+       ${int} dec, c = raw_${atomic}_read(v);
 
        do {
                dec = c - 1;
                if (unlikely(dec < 0))
                        break;
-       } while (!arch_${atomic}_try_cmpxchg(v, &c, dec));
+       } while (!raw_${atomic}_try_cmpxchg(v, &c, dec));
 
        return dec;
-}
 EOF
index c531d5a..d025361 100755 (executable)
@@ -1,14 +1,10 @@
 cat <<EOF
-static __always_inline bool
-arch_${atomic}_dec_unless_positive(${atomic}_t *v)
-{
-       ${int} c = arch_${atomic}_read(v);
+       ${int} c = raw_${atomic}_read(v);
 
        do {
                if (unlikely(c > 0))
                        return false;
-       } while (!arch_${atomic}_try_cmpxchg(v, &c, c - 1));
+       } while (!raw_${atomic}_try_cmpxchg(v, &c, c - 1));
 
        return true;
-}
 EOF
index 07757d8..40d5b39 100755 (executable)
@@ -1,11 +1,7 @@
 cat <<EOF
-static __always_inline ${ret}
-arch_${atomic}_${pfx}${name}${sfx}(${params})
-{
        ${ret} ret;
        __atomic_pre_full_fence();
        ret = arch_${atomic}_${pfx}${name}${sfx}_relaxed(${args});
        __atomic_post_full_fence();
        return ret;
-}
 EOF
index 68ce13c..8db7e9e 100755 (executable)
@@ -1,23 +1,10 @@
 cat << EOF
-/**
- * arch_${atomic}_fetch_add_unless - add unless the number is already a given value
- * @v: pointer of type ${atomic}_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
- *
- * Atomically adds @a to @v, so long as @v was not already @u.
- * Returns original value of @v
- */
-static __always_inline ${int}
-arch_${atomic}_fetch_add_unless(${atomic}_t *v, ${int} a, ${int} u)
-{
-       ${int} c = arch_${atomic}_read(v);
+       ${int} c = raw_${atomic}_read(v);
 
        do {
                if (unlikely(c == u))
                        break;
-       } while (!arch_${atomic}_try_cmpxchg(v, &c, c + a));
+       } while (!raw_${atomic}_try_cmpxchg(v, &c, c + a));
 
        return c;
-}
 EOF
index 3c2c373..56c770f 100755 (executable)
@@ -1,7 +1,3 @@
 cat <<EOF
-static __always_inline ${ret}
-arch_${atomic}_${pfx}inc${sfx}${order}(${atomic}_t *v)
-{
-       ${retstmt}arch_${atomic}_${pfx}add${sfx}${order}(1, v);
-}
+       ${retstmt}raw_${atomic}_${pfx}add${sfx}${order}(1, v);
 EOF
index 0cf23fe..7d16a10 100755 (executable)
@@ -1,15 +1,3 @@
 cat <<EOF
-/**
- * arch_${atomic}_inc_and_test - increment and test
- * @v: pointer of type ${atomic}_t
- *
- * Atomically increments @v by 1
- * and returns true if the result is zero, or false for all
- * other cases.
- */
-static __always_inline bool
-arch_${atomic}_inc_and_test(${atomic}_t *v)
-{
-       return arch_${atomic}_inc_return(v) == 0;
-}
+       return raw_${atomic}_inc_return(v) == 0;
 EOF
index ed8a1f5..1fcef1e 100755 (executable)
@@ -1,14 +1,3 @@
 cat <<EOF
-/**
- * arch_${atomic}_inc_not_zero - increment unless the number is zero
- * @v: pointer of type ${atomic}_t
- *
- * Atomically increments @v by 1, if @v is non-zero.
- * Returns true if the increment was done.
- */
-static __always_inline bool
-arch_${atomic}_inc_not_zero(${atomic}_t *v)
-{
-       return arch_${atomic}_add_unless(v, 1, 0);
-}
+       return raw_${atomic}_add_unless(v, 1, 0);
 EOF
index 95d8ce4..7b4b098 100755 (executable)
@@ -1,14 +1,10 @@
 cat <<EOF
-static __always_inline bool
-arch_${atomic}_inc_unless_negative(${atomic}_t *v)
-{
-       ${int} c = arch_${atomic}_read(v);
+       ${int} c = raw_${atomic}_read(v);
 
        do {
                if (unlikely(c < 0))
                        return false;
-       } while (!arch_${atomic}_try_cmpxchg(v, &c, c + 1));
+       } while (!raw_${atomic}_try_cmpxchg(v, &c, c + 1));
 
        return true;
-}
 EOF
index a0ea1d2..e319862 100755 (executable)
@@ -1,16 +1,12 @@
 cat <<EOF
-static __always_inline ${ret}
-arch_${atomic}_read_acquire(const ${atomic}_t *v)
-{
        ${int} ret;
 
        if (__native_word(${atomic}_t)) {
                ret = smp_load_acquire(&(v)->counter);
        } else {
-               ret = arch_${atomic}_read(v);
+               ret = raw_${atomic}_read(v);
                __atomic_acquire_fence();
        }
 
        return ret;
-}
 EOF
index b46feb5..1e6daf5 100755 (executable)
@@ -1,8 +1,4 @@
 cat <<EOF
-static __always_inline ${ret}
-arch_${atomic}_${pfx}${name}${sfx}_release(${params})
-{
        __atomic_release_fence();
        ${retstmt}arch_${atomic}_${pfx}${name}${sfx}_relaxed(${args});
-}
 EOF
index 05cdb7f..16a374a 100755 (executable)
@@ -1,12 +1,8 @@
 cat <<EOF
-static __always_inline void
-arch_${atomic}_set_release(${atomic}_t *v, ${int} i)
-{
        if (__native_word(${atomic}_t)) {
                smp_store_release(&(v)->counter, i);
        } else {
                __atomic_release_fence();
-               arch_${atomic}_set(v, i);
+               raw_${atomic}_set(v, i);
        }
-}
 EOF
index 260f373..d1f746f 100755 (executable)
@@ -1,16 +1,3 @@
 cat <<EOF
-/**
- * arch_${atomic}_sub_and_test - subtract value from variable and test result
- * @i: integer value to subtract
- * @v: pointer of type ${atomic}_t
- *
- * Atomically subtracts @i from @v and returns
- * true if the result is zero, or false for all
- * other cases.
- */
-static __always_inline bool
-arch_${atomic}_sub_and_test(${int} i, ${atomic}_t *v)
-{
-       return arch_${atomic}_sub_return(i, v) == 0;
-}
+       return raw_${atomic}_sub_return(i, v) == 0;
 EOF
index 890f850..d4da820 100755 (executable)
@@ -1,11 +1,7 @@
 cat <<EOF
-static __always_inline bool
-arch_${atomic}_try_cmpxchg${order}(${atomic}_t *v, ${int} *old, ${int} new)
-{
        ${int} r, o = *old;
-       r = arch_${atomic}_cmpxchg${order}(v, o, new);
+       r = raw_${atomic}_cmpxchg${order}(v, o, new);
        if (unlikely(r != o))
                *old = r;
        return likely(r == o);
-}
 EOF
diff --git a/scripts/atomic/fallbacks/xchg b/scripts/atomic/fallbacks/xchg
new file mode 100644 (file)
index 0000000..e4def1e
--- /dev/null
@@ -0,0 +1,3 @@
+cat <<EOF
+       return raw_xchg${order}(&v->counter, new);
+EOF
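
Taken together with the ops-table change above (xchg now declares its value argument as `i:new`), these two new templates let the generators express the atomic_t xchg/cmpxchg operations directly in terms of the plain raw_xchg()/raw_cmpxchg() primitives on the counter field. A hedged sketch of the fallback body this produces, shown here as a standalone function for the relaxed int variant (the real generated header wraps it in the preprocessor cascade emitted by gen-atomic-fallback.sh):

static __always_inline int
raw_atomic_xchg_relaxed(atomic_t *v, int new)
{
	/* fallback used only when the architecture provides no arch_atomic_xchg*() */
	return raw_xchg_relaxed(&v->counter, new);
}
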
index 6e853f0..c0c8a85 100755 (executable)
@@ -17,23 +17,16 @@ gen_template_fallback()
        local atomic="$1"; shift
        local int="$1"; shift
 
-       local atomicname="arch_${atomic}_${pfx}${name}${sfx}${order}"
-
        local ret="$(gen_ret_type "${meta}" "${int}")"
        local retstmt="$(gen_ret_stmt "${meta}")"
        local params="$(gen_params "${int}" "${atomic}" "$@")"
        local args="$(gen_args "$@")"
 
-       if [ ! -z "${template}" ]; then
-               printf "#ifndef ${atomicname}\n"
-               . ${template}
-               printf "#define ${atomicname} ${atomicname}\n"
-               printf "#endif\n\n"
-       fi
+       . ${template}
 }
 
-#gen_proto_fallback(meta, pfx, name, sfx, order, atomic, int, args...)
-gen_proto_fallback()
+#gen_order_fallback(meta, pfx, name, sfx, order, atomic, int, args...)
+gen_order_fallback()
 {
        local meta="$1"; shift
        local pfx="$1"; shift
@@ -41,87 +34,124 @@ gen_proto_fallback()
        local sfx="$1"; shift
        local order="$1"; shift
 
-       local tmpl="$(find_fallback_template "${pfx}" "${name}" "${sfx}" "${order}")"
+       local tmpl_order=${order#_}
+       local tmpl="${ATOMICDIR}/fallbacks/${tmpl_order:-fence}"
        gen_template_fallback "${tmpl}" "${meta}" "${pfx}" "${name}" "${sfx}" "${order}" "$@"
 }
 
-#gen_basic_fallbacks(basename)
-gen_basic_fallbacks()
-{
-       local basename="$1"; shift
-cat << EOF
-#define ${basename}_acquire ${basename}
-#define ${basename}_release ${basename}
-#define ${basename}_relaxed ${basename}
-EOF
-}
-
-gen_proto_order_variant()
+#gen_proto_fallback(meta, pfx, name, sfx, order, atomic, int, args...)
+gen_proto_fallback()
 {
        local meta="$1"; shift
        local pfx="$1"; shift
        local name="$1"; shift
        local sfx="$1"; shift
        local order="$1"; shift
-       local atomic="$1"
 
-       local basename="arch_${atomic}_${pfx}${name}${sfx}"
-
-       printf "#define ${basename}${order} ${basename}${order}\n"
+       local tmpl="$(find_fallback_template "${pfx}" "${name}" "${sfx}" "${order}")"
+       gen_template_fallback "${tmpl}" "${meta}" "${pfx}" "${name}" "${sfx}" "${order}" "$@"
 }
 
-#gen_proto_order_variants(meta, pfx, name, sfx, atomic, int, args...)
-gen_proto_order_variants()
+#gen_proto_order_variant(meta, pfx, name, sfx, order, atomic, int, args...)
+gen_proto_order_variant()
 {
        local meta="$1"; shift
        local pfx="$1"; shift
        local name="$1"; shift
        local sfx="$1"; shift
-       local atomic="$1"
+       local order="$1"; shift
+       local atomic="$1"; shift
+       local int="$1"; shift
 
-       local basename="arch_${atomic}_${pfx}${name}${sfx}"
+       local atomicname="${atomic}_${pfx}${name}${sfx}${order}"
+       local basename="${atomic}_${pfx}${name}${sfx}"
 
        local template="$(find_fallback_template "${pfx}" "${name}" "${sfx}" "${order}")"
 
-       # If we don't have relaxed atomics, then we don't bother with ordering fallbacks
-       # read_acquire and set_release need to be templated, though
-       if ! meta_has_relaxed "${meta}"; then
-               gen_proto_fallback "${meta}" "${pfx}" "${name}" "${sfx}" "" "$@"
+       local ret="$(gen_ret_type "${meta}" "${int}")"
+       local retstmt="$(gen_ret_stmt "${meta}")"
+       local params="$(gen_params "${int}" "${atomic}" "$@")"
+       local args="$(gen_args "$@")"
 
-               if meta_has_acquire "${meta}"; then
-                       gen_proto_fallback "${meta}" "${pfx}" "${name}" "${sfx}" "_acquire" "$@"
-               fi
+       gen_kerneldoc "raw_" "${meta}" "${pfx}" "${name}" "${sfx}" "${order}" "${atomic}" "${int}" "$@"
+
+       printf "static __always_inline ${ret}\n"
+       printf "raw_${atomicname}(${params})\n"
+       printf "{\n"
+
+       # Where there is no possible fallback, this order variant is mandatory
+       # and must be provided by arch code. Add a comment to the header to
+       # make this obvious.
+       #
+       # Ideally we'd error on a missing definition, but arch code might
+       # define this order variant as a C function without a preprocessor
+       # symbol.
+       if [ -z ${template} ] && [ -z "${order}" ] && ! meta_has_relaxed "${meta}"; then
+               printf "\t${retstmt}arch_${atomicname}(${args});\n"
+               printf "}\n\n"
+               return
+       fi
 
-               if meta_has_release "${meta}"; then
-                       gen_proto_fallback "${meta}" "${pfx}" "${name}" "${sfx}" "_release" "$@"
-               fi
+       printf "#if defined(arch_${atomicname})\n"
+       printf "\t${retstmt}arch_${atomicname}(${args});\n"
 
-               return
+       # Allow FULL/ACQUIRE/RELEASE ops to be defined in terms of RELAXED ops
+       if [ "${order}" != "_relaxed" ] && meta_has_relaxed "${meta}"; then
+               printf "#elif defined(arch_${basename}_relaxed)\n"
+               gen_order_fallback "${meta}" "${pfx}" "${name}" "${sfx}" "${order}" "${atomic}" "${int}" "$@"
        fi
 
-       printf "#ifndef ${basename}_relaxed\n"
+       # Allow ACQUIRE/RELEASE/RELAXED ops to be defined in terms of FULL ops
+       if [ ! -z "${order}" ]; then
+               printf "#elif defined(arch_${basename})\n"
+               printf "\t${retstmt}arch_${basename}(${args});\n"
+       fi
 
+       printf "#else\n"
        if [ ! -z "${template}" ]; then
-               printf "#ifdef ${basename}\n"
+               gen_proto_fallback "${meta}" "${pfx}" "${name}" "${sfx}" "${order}" "${atomic}" "${int}" "$@"
+       else
+               printf "#error \"Unable to define raw_${atomicname}\"\n"
        fi
 
-       gen_basic_fallbacks "${basename}"
+       printf "#endif\n"
+       printf "}\n\n"
+}
 
-       if [ ! -z "${template}" ]; then
-               printf "#endif /* ${basename} */\n\n"
-               gen_proto_fallback "${meta}" "${pfx}" "${name}" "${sfx}" "" "$@"
-               gen_proto_fallback "${meta}" "${pfx}" "${name}" "${sfx}" "_acquire" "$@"
-               gen_proto_fallback "${meta}" "${pfx}" "${name}" "${sfx}" "_release" "$@"
-               gen_proto_fallback "${meta}" "${pfx}" "${name}" "${sfx}" "_relaxed" "$@"
+
+#gen_proto_order_variants(meta, pfx, name, sfx, atomic, int, args...)
+gen_proto_order_variants()
+{
+       local meta="$1"; shift
+       local pfx="$1"; shift
+       local name="$1"; shift
+       local sfx="$1"; shift
+       local atomic="$1"
+
+       gen_proto_order_variant "${meta}" "${pfx}" "${name}" "${sfx}" "" "$@"
+
+       if meta_has_acquire "${meta}"; then
+               gen_proto_order_variant "${meta}" "${pfx}" "${name}" "${sfx}" "_acquire" "$@"
        fi
 
-       printf "#else /* ${basename}_relaxed */\n\n"
+       if meta_has_release "${meta}"; then
+               gen_proto_order_variant "${meta}" "${pfx}" "${name}" "${sfx}" "_release" "$@"
+       fi
 
-       gen_template_fallback "${ATOMICDIR}/fallbacks/acquire"  "${meta}" "${pfx}" "${name}" "${sfx}" "_acquire" "$@"
-       gen_template_fallback "${ATOMICDIR}/fallbacks/release"  "${meta}" "${pfx}" "${name}" "${sfx}" "_release" "$@"
-       gen_template_fallback "${ATOMICDIR}/fallbacks/fence"  "${meta}" "${pfx}" "${name}" "${sfx}" "" "$@"
+       if meta_has_relaxed "${meta}"; then
+               gen_proto_order_variant "${meta}" "${pfx}" "${name}" "${sfx}" "_relaxed" "$@"
+       fi
+}
 
-       printf "#endif /* ${basename}_relaxed */\n\n"
+#gen_basic_fallbacks(basename)
+gen_basic_fallbacks()
+{
+       local basename="$1"; shift
+cat << EOF
+#define raw_${basename}_acquire arch_${basename}
+#define raw_${basename}_release arch_${basename}
+#define raw_${basename}_relaxed arch_${basename}
+EOF
 }
 
 gen_order_fallbacks()
@@ -130,36 +160,65 @@ gen_order_fallbacks()
 
 cat <<EOF
 
-#ifndef ${xchg}_acquire
-#define ${xchg}_acquire(...) \\
-       __atomic_op_acquire(${xchg}, __VA_ARGS__)
+#define raw_${xchg}_relaxed arch_${xchg}_relaxed
+
+#ifdef arch_${xchg}_acquire
+#define raw_${xchg}_acquire arch_${xchg}_acquire
+#else
+#define raw_${xchg}_acquire(...) \\
+       __atomic_op_acquire(arch_${xchg}, __VA_ARGS__)
 #endif
 
-#ifndef ${xchg}_release
-#define ${xchg}_release(...) \\
-       __atomic_op_release(${xchg}, __VA_ARGS__)
+#ifdef arch_${xchg}_release
+#define raw_${xchg}_release arch_${xchg}_release
+#else
+#define raw_${xchg}_release(...) \\
+       __atomic_op_release(arch_${xchg}, __VA_ARGS__)
 #endif
 
-#ifndef ${xchg}
-#define ${xchg}(...) \\
-       __atomic_op_fence(${xchg}, __VA_ARGS__)
+#ifdef arch_${xchg}
+#define raw_${xchg} arch_${xchg}
+#else
+#define raw_${xchg}(...) \\
+       __atomic_op_fence(arch_${xchg}, __VA_ARGS__)
 #endif
 
 EOF
 }
 
-gen_xchg_fallbacks()
+gen_xchg_order_fallback()
 {
        local xchg="$1"; shift
-       printf "#ifndef ${xchg}_relaxed\n"
+       local order="$1"; shift
+       local forder="${order:-_fence}"
 
-       gen_basic_fallbacks ${xchg}
+       printf "#if defined(arch_${xchg}${order})\n"
+       printf "#define raw_${xchg}${order} arch_${xchg}${order}\n"
 
-       printf "#else /* ${xchg}_relaxed */\n"
+       if [ "${order}" != "_relaxed" ]; then
+               printf "#elif defined(arch_${xchg}_relaxed)\n"
+               printf "#define raw_${xchg}${order}(...) \\\\\n"
+               printf "        __atomic_op${forder}(arch_${xchg}, __VA_ARGS__)\n"
+       fi
+
+       if [ ! -z "${order}" ]; then
+               printf "#elif defined(arch_${xchg})\n"
+               printf "#define raw_${xchg}${order} arch_${xchg}\n"
+       fi
 
-       gen_order_fallbacks ${xchg}
+       printf "#else\n"
+       printf "extern void raw_${xchg}${order}_not_implemented(void);\n"
+       printf "#define raw_${xchg}${order}(...) raw_${xchg}${order}_not_implemented()\n"
+       printf "#endif\n\n"
+}
+
+gen_xchg_fallbacks()
+{
+       local xchg="$1"; shift
 
-       printf "#endif /* ${xchg}_relaxed */\n\n"
+       for order in "" "_acquire" "_release" "_relaxed"; do
+               gen_xchg_order_fallback "${xchg}" "${order}"
+       done
 }
 
 gen_try_cmpxchg_fallback()
@@ -168,40 +227,61 @@ gen_try_cmpxchg_fallback()
        local order="$1"; shift;
 
 cat <<EOF
-#ifndef arch_try_${cmpxchg}${order}
-#define arch_try_${cmpxchg}${order}(_ptr, _oldp, _new) \\
+#define raw_try_${cmpxchg}${order}(_ptr, _oldp, _new) \\
 ({ \\
        typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \\
-       ___r = arch_${cmpxchg}${order}((_ptr), ___o, (_new)); \\
+       ___r = raw_${cmpxchg}${order}((_ptr), ___o, (_new)); \\
        if (unlikely(___r != ___o)) \\
                *___op = ___r; \\
        likely(___r == ___o); \\
 })
-#endif /* arch_try_${cmpxchg}${order} */
-
 EOF
 }
 
-gen_try_cmpxchg_fallbacks()
+gen_try_cmpxchg_order_fallback()
 {
-       local cmpxchg="$1"; shift;
+       local cmpxchg="$1"; shift
+       local order="$1"; shift
+       local forder="${order:-_fence}"
 
-       printf "#ifndef arch_try_${cmpxchg}_relaxed\n"
-       printf "#ifdef arch_try_${cmpxchg}\n"
+       printf "#if defined(arch_try_${cmpxchg}${order})\n"
+       printf "#define raw_try_${cmpxchg}${order} arch_try_${cmpxchg}${order}\n"
+
+       if [ "${order}" != "_relaxed" ]; then
+               printf "#elif defined(arch_try_${cmpxchg}_relaxed)\n"
+               printf "#define raw_try_${cmpxchg}${order}(...) \\\\\n"
+               printf "        __atomic_op${forder}(arch_try_${cmpxchg}, __VA_ARGS__)\n"
+       fi
 
-       gen_basic_fallbacks "arch_try_${cmpxchg}"
+       if [ ! -z "${order}" ]; then
+               printf "#elif defined(arch_try_${cmpxchg})\n"
+               printf "#define raw_try_${cmpxchg}${order} arch_try_${cmpxchg}\n"
+       fi
 
-       printf "#endif /* arch_try_${cmpxchg} */\n\n"
+       printf "#else\n"
+       gen_try_cmpxchg_fallback "${cmpxchg}" "${order}"
+       printf "#endif\n\n"
+}
+
+gen_try_cmpxchg_fallbacks()
+{
+       local cmpxchg="$1"; shift;
 
        for order in "" "_acquire" "_release" "_relaxed"; do
-               gen_try_cmpxchg_fallback "${cmpxchg}" "${order}"
+               gen_try_cmpxchg_order_fallback "${cmpxchg}" "${order}"
        done
+}
 
-       printf "#else /* arch_try_${cmpxchg}_relaxed */\n"
-
-       gen_order_fallbacks "arch_try_${cmpxchg}"
+gen_cmpxchg_local_fallbacks()
+{
+       local cmpxchg="$1"; shift
 
-       printf "#endif /* arch_try_${cmpxchg}_relaxed */\n\n"
+       printf "#define raw_${cmpxchg} arch_${cmpxchg}\n\n"
+       printf "#ifdef arch_try_${cmpxchg}\n"
+       printf "#define raw_try_${cmpxchg} arch_try_${cmpxchg}\n"
+       printf "#else\n"
+       gen_try_cmpxchg_fallback "${cmpxchg}" ""
+       printf "#endif\n\n"
 }
 
 cat << EOF
@@ -217,16 +297,20 @@ cat << EOF
 
 EOF
 
-for xchg in "arch_xchg" "arch_cmpxchg" "arch_cmpxchg64"; do
+for xchg in "xchg" "cmpxchg" "cmpxchg64" "cmpxchg128"; do
        gen_xchg_fallbacks "${xchg}"
 done
 
-for cmpxchg in "cmpxchg" "cmpxchg64"; do
+for cmpxchg in "cmpxchg" "cmpxchg64" "cmpxchg128"; do
        gen_try_cmpxchg_fallbacks "${cmpxchg}"
 done
 
-for cmpxchg in "cmpxchg_local" "cmpxchg64_local"; do
-       gen_try_cmpxchg_fallback "${cmpxchg}" ""
+for cmpxchg in "cmpxchg_local" "cmpxchg64_local" "cmpxchg128_local"; do
+       gen_cmpxchg_local_fallbacks "${cmpxchg}" ""
+done
+
+for cmpxchg in "sync_cmpxchg"; do
+       printf "#define raw_${cmpxchg} arch_${cmpxchg}\n\n"
 done
 
 grep '^[a-z]' "$1" | while read name meta args; do
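
For orientation, the reworked gen_proto_order_variant() above now emits one raw_*() wrapper per ordering variant, preferring an arch_*() definition, then an ordering fallback built from the arch_*_relaxed() form, then the full-ordered arch_*() op, and finally the generic fallback template or a hard #error. A hedged, illustrative sketch of what one generated variant might look like (the real output is also preceded by a generated kerneldoc comment, and exact formatting differs):

static __always_inline int
raw_atomic_fetch_add_acquire(int i, atomic_t *v)
{
#if defined(arch_atomic_fetch_add_acquire)
	return arch_atomic_fetch_add_acquire(i, v);
#elif defined(arch_atomic_fetch_add_relaxed)
	/* ordering fallback: relaxed op plus an explicit acquire fence */
	int ret = arch_atomic_fetch_add_relaxed(i, v);
	__atomic_acquire_fence();
	return ret;
#elif defined(arch_atomic_fetch_add)
	return arch_atomic_fetch_add(i, v);
#else
#error "Unable to define raw_atomic_fetch_add_acquire"
#endif
}
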
index d9ffd74..8f8f8e3 100755 (executable)
@@ -68,12 +68,14 @@ gen_proto_order_variant()
        local args="$(gen_args "$@")"
        local retstmt="$(gen_ret_stmt "${meta}")"
 
+       gen_kerneldoc "" "${meta}" "${pfx}" "${name}" "${sfx}" "${order}" "${atomic}" "${int}" "$@"
+
 cat <<EOF
 static __always_inline ${ret}
 ${atomicname}(${params})
 {
 ${checks}
-       ${retstmt}arch_${atomicname}(${args});
+       ${retstmt}raw_${atomicname}(${args});
 }
 EOF
 
@@ -84,7 +86,6 @@ gen_xchg()
 {
        local xchg="$1"; shift
        local order="$1"; shift
-       local mult="$1"; shift
 
        kcsan_barrier=""
        if [ "${xchg%_local}" = "${xchg}" ]; then
@@ -104,9 +105,9 @@ cat <<EOF
 EOF
 [ -n "$kcsan_barrier" ] && printf "\t${kcsan_barrier}; \\\\\n"
 cat <<EOF
-       instrument_atomic_read_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\
-       instrument_read_write(__ai_oldp, ${mult}sizeof(*__ai_oldp)); \\
-       arch_${xchg}${order}(__ai_ptr, __ai_oldp, __VA_ARGS__); \\
+       instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \\
+       instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \\
+       raw_${xchg}${order}(__ai_ptr, __ai_oldp, __VA_ARGS__); \\
 })
 EOF
 
@@ -119,8 +120,8 @@ cat <<EOF
 EOF
 [ -n "$kcsan_barrier" ] && printf "\t${kcsan_barrier}; \\\\\n"
 cat <<EOF
-       instrument_atomic_read_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\
-       arch_${xchg}${order}(__ai_ptr, __VA_ARGS__); \\
+       instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \\
+       raw_${xchg}${order}(__ai_ptr, __VA_ARGS__); \\
 })
 EOF
 
@@ -134,15 +135,10 @@ cat << EOF
 // DO NOT MODIFY THIS FILE DIRECTLY
 
 /*
- * This file provides wrappers with KASAN instrumentation for atomic operations.
- * To use this functionality an arch's atomic.h file needs to define all
- * atomic operations with arch_ prefix (e.g. arch_atomic_read()) and include
- * this file at the end. This file provides atomic_read() that forwards to
- * arch_atomic_read() for actual atomic operation.
- * Note: if an arch atomic operation is implemented by means of other atomic
- * operations (e.g. atomic_read()/atomic_cmpxchg() loop), then it needs to use
- * arch_ variants (i.e. arch_atomic_read()/arch_atomic_cmpxchg()) to avoid
- * double instrumentation.
+ * This file provides atomic operations with explicit instrumentation (e.g.
+ * KASAN, KCSAN), which should be used unless it is necessary to avoid
+ * instrumentation. Where it is necessary to avoid instrumentation, the
+ * raw_atomic*() operations should be used.
  */
 #ifndef _LINUX_ATOMIC_INSTRUMENTED_H
 #define _LINUX_ATOMIC_INSTRUMENTED_H
@@ -166,24 +162,18 @@ grep '^[a-z]' "$1" | while read name meta args; do
 done
 
 
-for xchg in "xchg" "cmpxchg" "cmpxchg64" "try_cmpxchg" "try_cmpxchg64"; do
+for xchg in "xchg" "cmpxchg" "cmpxchg64" "cmpxchg128" "try_cmpxchg" "try_cmpxchg64" "try_cmpxchg128"; do
        for order in "" "_acquire" "_release" "_relaxed"; do
-               gen_xchg "${xchg}" "${order}" ""
+               gen_xchg "${xchg}" "${order}"
                printf "\n"
        done
 done
 
-for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg" "try_cmpxchg_local" "try_cmpxchg64_local" ; do
-       gen_xchg "${xchg}" "" ""
+for xchg in "cmpxchg_local" "cmpxchg64_local" "cmpxchg128_local" "sync_cmpxchg" "try_cmpxchg_local" "try_cmpxchg64_local" "try_cmpxchg128_local"; do
+       gen_xchg "${xchg}" ""
        printf "\n"
 done
 
-gen_xchg "cmpxchg_double" "" "2 * "
-
-printf "\n\n"
-
-gen_xchg "cmpxchg_double_local" "" "2 * "
-
 cat <<EOF
 
 #endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
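
The instrumented wrappers generated by this script keep their public names but now forward to the raw_*() forms rather than arch_*(). A rough, hedged sketch of the shape of one generated wrapper (the exact instrumentation calls come from parts of the script not shown in this diff):

static __always_inline void
atomic_inc(atomic_t *v)
{
	/* instrumentation check emitted by the script, followed by the raw op */
	instrument_atomic_read_write(v, sizeof(*v));
	raw_atomic_inc(v);
}
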
index eda89ce..9826be3 100755 (executable)
@@ -32,24 +32,34 @@ gen_args_cast()
        done
 }
 
-#gen_proto_order_variant(meta, pfx, name, sfx, order, atomic, int, arg...)
+#gen_proto_order_variant(meta, pfx, name, sfx, order, arg...)
 gen_proto_order_variant()
 {
        local meta="$1"; shift
-       local name="$1$2$3$4"; shift; shift; shift; shift
-       local atomic="$1"; shift
-       local int="$1"; shift
+       local pfx="$1"; shift
+       local name="$1"; shift
+       local sfx="$1"; shift
+       local order="$1"; shift
+
+       local atomicname="${pfx}${name}${sfx}${order}"
 
        local ret="$(gen_ret_type "${meta}" "long")"
        local params="$(gen_params "long" "atomic_long" "$@")"
-       local argscast="$(gen_args_cast "${int}" "${atomic}" "$@")"
+       local argscast_32="$(gen_args_cast "int" "atomic" "$@")"
+       local argscast_64="$(gen_args_cast "s64" "atomic64" "$@")"
        local retstmt="$(gen_ret_stmt "${meta}")"
 
+       gen_kerneldoc "raw_" "${meta}" "${pfx}" "${name}" "${sfx}" "${order}" "atomic_long" "long" "$@"
+
 cat <<EOF
 static __always_inline ${ret}
-arch_atomic_long_${name}(${params})
+raw_atomic_long_${atomicname}(${params})
 {
-       ${retstmt}arch_${atomic}_${name}(${argscast});
+#ifdef CONFIG_64BIT
+       ${retstmt}raw_atomic64_${atomicname}(${argscast_64});
+#else
+       ${retstmt}raw_atomic_${atomicname}(${argscast_32});
+#endif
 }
 
 EOF
@@ -79,24 +89,12 @@ typedef atomic_t atomic_long_t;
 #define atomic_long_cond_read_relaxed  atomic_cond_read_relaxed
 #endif
 
-#ifdef CONFIG_64BIT
-
-EOF
-
-grep '^[a-z]' "$1" | while read name meta args; do
-       gen_proto "${meta}" "${name}" "atomic64" "s64" ${args}
-done
-
-cat <<EOF
-#else /* CONFIG_64BIT */
-
 EOF
 
 grep '^[a-z]' "$1" | while read name meta args; do
-       gen_proto "${meta}" "${name}" "atomic" "int" ${args}
+       gen_proto "${meta}" "${name}" ${args}
 done
 
 cat <<EOF
-#endif /* CONFIG_64BIT */
 #endif /* _LINUX_ATOMIC_LONG_H */
 EOF
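
With the CONFIG_64BIT split moved inside each generated function, the atomic_long wrappers are emitted once and dispatch at compile time between the atomic64 and atomic implementations. A hedged sketch of one such wrapper (any argument casts inserted by gen_args_cast() are omitted here):

static __always_inline void
raw_atomic_long_add(long i, atomic_long_t *v)
{
#ifdef CONFIG_64BIT
	raw_atomic64_add(i, v);
#else
	raw_atomic_add(i, v);
#endif
}
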
diff --git a/scripts/atomic/kerneldoc/add b/scripts/atomic/kerneldoc/add
new file mode 100644 (file)
index 0000000..991f3da
--- /dev/null
@@ -0,0 +1,13 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic add with ${desc_order} ordering
+ * @i: ${int} value to add
+ * @v: pointer to ${atomic}_t
+ *
+ * Atomically updates @v to (@v + @i) with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * ${desc_return}
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/add_negative b/scripts/atomic/kerneldoc/add_negative
new file mode 100644 (file)
index 0000000..f4ca1f0
--- /dev/null
@@ -0,0 +1,13 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic add and test if negative with ${desc_order} ordering
+ * @i: ${int} value to add
+ * @v: pointer to ${atomic}_t
+ *
+ * Atomically updates @v to (@v + @i) with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * Return: @true if the resulting value of @v is negative, @false otherwise.
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/add_unless b/scripts/atomic/kerneldoc/add_unless
new file mode 100644 (file)
index 0000000..f828e5f
--- /dev/null
@@ -0,0 +1,18 @@
+if [ -z "${pfx}" ]; then
+       desc_return="Return: @true if @v was updated, @false otherwise."
+fi
+
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic add unless value with ${desc_order} ordering
+ * @v: pointer to ${atomic}_t
+ * @a: ${int} value to add
+ * @u: ${int} value to compare with
+ *
+ * If (@v != @u), atomically updates @v to (@v + @a) with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * ${desc_return}
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/and b/scripts/atomic/kerneldoc/and
new file mode 100644 (file)
index 0000000..a923574
--- /dev/null
@@ -0,0 +1,13 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic bitwise AND with ${desc_order} ordering
+ * @i: ${int} value
+ * @v: pointer to ${atomic}_t
+ *
+ * Atomically updates @v to (@v & @i) with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * ${desc_return}
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/andnot b/scripts/atomic/kerneldoc/andnot
new file mode 100644 (file)
index 0000000..64bb509
--- /dev/null
@@ -0,0 +1,13 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic bitwise AND NOT with ${desc_order} ordering
+ * @i: ${int} value
+ * @v: pointer to ${atomic}_t
+ *
+ * Atomically updates @v to (@v & ~@i) with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * ${desc_return}
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/cmpxchg b/scripts/atomic/kerneldoc/cmpxchg
new file mode 100644 (file)
index 0000000..3bce328
--- /dev/null
@@ -0,0 +1,14 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic compare and exchange with ${desc_order} ordering
+ * @v: pointer to ${atomic}_t
+ * @old: ${int} value to compare with
+ * @new: ${int} value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * Return: The original value of @v.
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/dec b/scripts/atomic/kerneldoc/dec
new file mode 100644 (file)
index 0000000..bbeecbc
--- /dev/null
@@ -0,0 +1,12 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic decrement with ${desc_order} ordering
+ * @v: pointer to ${atomic}_t
+ *
+ * Atomically updates @v to (@v - 1) with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * ${desc_return}
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/dec_and_test b/scripts/atomic/kerneldoc/dec_and_test
new file mode 100644 (file)
index 0000000..71bbd23
--- /dev/null
@@ -0,0 +1,12 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic decrement and test if zero with ${desc_order} ordering
+ * @v: pointer to ${atomic}_t
+ *
+ * Atomically updates @v to (@v - 1) with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/dec_if_positive b/scripts/atomic/kerneldoc/dec_if_positive
new file mode 100644 (file)
index 0000000..04f1aed
--- /dev/null
@@ -0,0 +1,12 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic decrement if positive with ${desc_order} ordering
+ * @v: pointer to ${atomic}_t
+ *
+ * If (@v > 0), atomically updates @v to (@v - 1) with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * Return: The old value of (@v - 1), regardless of whether @v was updated.
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/dec_unless_positive b/scripts/atomic/kerneldoc/dec_unless_positive
new file mode 100644 (file)
index 0000000..ee73612
--- /dev/null
@@ -0,0 +1,12 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic decrement unless positive with ${desc_order} ordering
+ * @v: pointer to ${atomic}_t
+ *
+ * If (@v <= 0), atomically updates @v to (@v - 1) with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/inc b/scripts/atomic/kerneldoc/inc
new file mode 100644 (file)
index 0000000..9f14f1b
--- /dev/null
@@ -0,0 +1,12 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic increment with ${desc_order} ordering
+ * @v: pointer to ${atomic}_t
+ *
+ * Atomically updates @v to (@v + 1) with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * ${desc_return}
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/inc_and_test b/scripts/atomic/kerneldoc/inc_and_test
new file mode 100644 (file)
index 0000000..971694d
--- /dev/null
@@ -0,0 +1,12 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic increment and test if zero with ${desc_order} ordering
+ * @v: pointer to ${atomic}_t
+ *
+ * Atomically updates @v to (@v + 1) with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/inc_not_zero b/scripts/atomic/kerneldoc/inc_not_zero
new file mode 100644 (file)
index 0000000..618be08
--- /dev/null
@@ -0,0 +1,12 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic increment unless zero with ${desc_order} ordering
+ * @v: pointer to ${atomic}_t
+ *
+ * If (@v != 0), atomically updates @v to (@v + 1) with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/inc_unless_negative b/scripts/atomic/kerneldoc/inc_unless_negative
new file mode 100644 (file)
index 0000000..597f23d
--- /dev/null
@@ -0,0 +1,12 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic increment unless negative with ${desc_order} ordering
+ * @v: pointer to ${atomic}_t
+ *
+ * If (@v >= 0), atomically updates @v to (@v + 1) with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * Return: @true if @v was updated, @false otherwise.
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/or b/scripts/atomic/kerneldoc/or
new file mode 100644 (file)
index 0000000..55b33de
--- /dev/null
@@ -0,0 +1,13 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic bitwise OR with ${desc_order} ordering
+ * @i: ${int} value
+ * @v: pointer to ${atomic}_t
+ *
+ * Atomically updates @v to (@v | @i) with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * ${desc_return}
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/read b/scripts/atomic/kerneldoc/read
new file mode 100644 (file)
index 0000000..89fe614
--- /dev/null
@@ -0,0 +1,12 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic load with ${desc_order} ordering
+ * @v: pointer to ${atomic}_t
+ *
+ * Atomically loads the value of @v with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * Return: The value loaded from @v.
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/set b/scripts/atomic/kerneldoc/set
new file mode 100644 (file)
index 0000000..e82cb9e
--- /dev/null
@@ -0,0 +1,13 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic set with ${desc_order} ordering
+ * @v: pointer to ${atomic}_t
+ * @i: ${int} value to assign
+ *
+ * Atomically sets @v to @i with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * Return: Nothing.
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/sub b/scripts/atomic/kerneldoc/sub
new file mode 100644 (file)
index 0000000..3ba642d
--- /dev/null
@@ -0,0 +1,13 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic subtract with ${desc_order} ordering
+ * @i: ${int} value to subtract
+ * @v: pointer to ${atomic}_t
+ *
+ * Atomically updates @v to (@v - @i) with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * ${desc_return}
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/sub_and_test b/scripts/atomic/kerneldoc/sub_and_test
new file mode 100644 (file)
index 0000000..d3760f7
--- /dev/null
@@ -0,0 +1,13 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic subtract and test if zero with ${desc_order} ordering
+ * @i: ${int} value to subtract
+ * @v: pointer to ${atomic}_t
+ *
+ * Atomically updates @v to (@v - @i) with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * Return: @true if the resulting value of @v is zero, @false otherwise.
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/try_cmpxchg b/scripts/atomic/kerneldoc/try_cmpxchg
new file mode 100644 (file)
index 0000000..2965532
--- /dev/null
@@ -0,0 +1,15 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic compare and exchange with ${desc_order} ordering
+ * @v: pointer to ${atomic}_t
+ * @old: pointer to ${int} value to compare with
+ * @new: ${int} value to assign
+ *
+ * If (@v == @old), atomically updates @v to @new with ${desc_order} ordering.
+ * Otherwise, updates @old to the current value of @v.
+ *
+ * ${desc_noinstr}
+ *
+ * Return: @true if the exchange occurred, @false otherwise.
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/xchg b/scripts/atomic/kerneldoc/xchg
new file mode 100644 (file)
index 0000000..75f04c0
--- /dev/null
@@ -0,0 +1,13 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic exchange with ${desc_order} ordering
+ * @v: pointer to ${atomic}_t
+ * @new: ${int} value to assign
+ *
+ * Atomically updates @v to @new with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * Return: The original value of @v.
+ */
+EOF
diff --git a/scripts/atomic/kerneldoc/xor b/scripts/atomic/kerneldoc/xor
new file mode 100644 (file)
index 0000000..8837270
--- /dev/null
@@ -0,0 +1,13 @@
+cat <<EOF
+/**
+ * ${class}${atomicname}() - atomic bitwise XOR with ${desc_order} ordering
+ * @i: ${int} value
+ * @v: pointer to ${atomic}_t
+ *
+ * Atomically updates @v to (@v ^ @i) with ${desc_order} ordering.
+ *
+ * ${desc_noinstr}
+ *
+ * ${desc_return}
+ */
+EOF
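
Fed through gen_template_kerneldoc() above, these per-operation templates expand into ordinary kerneldoc blocks in the generated headers. As an illustration, the comment generated for the instrumented atomic_add() would read roughly as follows (the ordering and return wording are derived from the op's metadata, so other ops differ):

/**
 * atomic_add() - atomic add with relaxed ordering
 * @i: int value to add
 * @v: pointer to atomic_t
 *
 * Atomically updates @v to (@v + @i) with relaxed ordering.
 *
 * Unsafe to use in noinstr code; use raw_atomic_add() there.
 *
 * Return: Nothing.
 */
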
index 471300b..50a92c4 100644 (file)
@@ -48,12 +48,12 @@ if IS_BUILTIN(CONFIG_COMMON_CLK):
     LX_GDBPARSED(CLK_GET_RATE_NOCACHE)
 
 /* linux/fs.h */
-LX_VALUE(SB_RDONLY)
-LX_VALUE(SB_SYNCHRONOUS)
-LX_VALUE(SB_MANDLOCK)
-LX_VALUE(SB_DIRSYNC)
-LX_VALUE(SB_NOATIME)
-LX_VALUE(SB_NODIRATIME)
+LX_GDBPARSED(SB_RDONLY)
+LX_GDBPARSED(SB_SYNCHRONOUS)
+LX_GDBPARSED(SB_MANDLOCK)
+LX_GDBPARSED(SB_DIRSYNC)
+LX_GDBPARSED(SB_NOATIME)
+LX_GDBPARSED(SB_NODIRATIME)
 
 /* linux/htimer.h */
 LX_GDBPARSED(hrtimer_resolution)
index b2ce416..6c9aed1 100755 (executable)
@@ -63,11 +63,11 @@ fi
 
 # Extract GFP flags from the kernel source
 TMPFILE=`mktemp -t gfptranslate-XXXXXX` || exit 1
-grep -q ___GFP $SOURCE/include/linux/gfp.h
+grep -q ___GFP $SOURCE/include/linux/gfp_types.h
 if [ $? -eq 0 ]; then
-       grep "^#define ___GFP" $SOURCE/include/linux/gfp.h | sed -e 's/u$//' | grep -v GFP_BITS > $TMPFILE
+       grep "^#define ___GFP" $SOURCE/include/linux/gfp_types.h | sed -e 's/u$//' | grep -v GFP_BITS > $TMPFILE
 else
-       grep "^#define __GFP" $SOURCE/include/linux/gfp.h | sed -e 's/(__force gfp_t)//' | sed -e 's/u)/)/' | grep -v GFP_BITS | sed -e 's/)\//) \//' > $TMPFILE
+       grep "^#define __GFP" $SOURCE/include/linux/gfp_types.h | sed -e 's/(__force gfp_t)//' | sed -e 's/u)/)/' | grep -v GFP_BITS | sed -e 's/)\//) \//' > $TMPFILE
 fi
 
 # Parse the flags
index 2486689..eb70c1f 100755 (executable)
@@ -64,7 +64,7 @@ my $type_constant = '\b``([^\`]+)``\b';
 my $type_constant2 = '\%([-_\w]+)';
 my $type_func = '(\w+)\(\)';
 my $type_param = '\@(\w*((\.\w+)|(->\w+))*(\.\.\.)?)';
-my $type_param_ref = '([\!]?)\@(\w*((\.\w+)|(->\w+))*(\.\.\.)?)';
+my $type_param_ref = '([\!~]?)\@(\w*((\.\w+)|(->\w+))*(\.\.\.)?)';
 my $type_fp_param = '\@(\w+)\(\)';  # Special RST handling for func ptr params
 my $type_fp_param2 = '\@(\w+->\S+)\(\)';  # Special RST handling for structs with func ptr params
 my $type_env = '(\$\w+)';
index 20d483e..dfd1863 100755 (executable)
@@ -17,7 +17,11 @@ binutils)
        echo 2.25.0
        ;;
 gcc)
-       echo 5.1.0
+       if [ "$SRCARCH" = parisc ]; then
+               echo 11.0.0
+       else
+               echo 5.1.0
+       fi
        ;;
 llvm)
        if [ "$SRCARCH" = s390 ]; then
@@ -27,7 +31,7 @@ llvm)
        fi
        ;;
 rustc)
-       echo 1.62.0
+       echo 1.68.2
        ;;
 bindgen)
        echo 0.56.0
index d4531d0..c12150f 100644 (file)
@@ -1979,6 +1979,11 @@ static void add_header(struct buffer *b, struct module *mod)
        buf_printf(b, "#include <linux/vermagic.h>\n");
        buf_printf(b, "#include <linux/compiler.h>\n");
        buf_printf(b, "\n");
+       buf_printf(b, "#ifdef CONFIG_UNWINDER_ORC\n");
+       buf_printf(b, "#include <asm/orc_header.h>\n");
+       buf_printf(b, "ORC_HEADER;\n");
+       buf_printf(b, "#endif\n");
+       buf_printf(b, "\n");
        buf_printf(b, "BUILD_SALT;\n");
        buf_printf(b, "BUILD_LTO_INFO;\n");
        buf_printf(b, "\n");
diff --git a/scripts/orc_hash.sh b/scripts/orc_hash.sh
new file mode 100644 (file)
index 0000000..466611a
--- /dev/null
@@ -0,0 +1,16 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+
+set -e
+
+printf '%s' '#define ORC_HASH '
+
+awk '
+/^#define ORC_(REG|TYPE)_/ { print }
+/^struct orc_entry {$/ { p=1 }
+p { print }
+/^}/ { p=0 }' |
+       sha1sum |
+       cut -d " " -f 1 |
+       sed 's/\([0-9a-f]\{2\}\)/0x\1,/g'
index d3662f4..ce541b0 100644 (file)
@@ -202,19 +202,19 @@ int ima_get_action(struct mnt_idmap *idmap, struct inode *inode,
                                allowed_algos);
 }
 
-static int ima_get_verity_digest(struct integrity_iint_cache *iint,
-                                struct ima_max_digest_data *hash)
+static bool ima_get_verity_digest(struct integrity_iint_cache *iint,
+                                 struct ima_max_digest_data *hash)
 {
-       enum hash_algo verity_alg;
-       int ret;
+       enum hash_algo alg;
+       int digest_len;
 
        /*
         * On failure, 'measure' policy rules will result in a file data
         * hash containing 0's.
         */
-       ret = fsverity_get_digest(iint->inode, hash->digest, &verity_alg);
-       if (ret)
-               return ret;
+       digest_len = fsverity_get_digest(iint->inode, hash->digest, NULL, &alg);
+       if (digest_len == 0)
+               return false;
 
        /*
         * Unlike in the case of actually calculating the file hash, in
@@ -223,9 +223,9 @@ static int ima_get_verity_digest(struct integrity_iint_cache *iint,
         * mismatch between the verity algorithm and the xattr signature
         * algorithm, if one exists, will be detected later.
         */
-       hash->hdr.algo = verity_alg;
-       hash->hdr.length = hash_digest_size[verity_alg];
-       return 0;
+       hash->hdr.algo = alg;
+       hash->hdr.length = digest_len;
+       return true;
 }
 
 /*
@@ -276,16 +276,9 @@ int ima_collect_measurement(struct integrity_iint_cache *iint,
        memset(&hash.digest, 0, sizeof(hash.digest));
 
        if (iint->flags & IMA_VERITY_REQUIRED) {
-               result = ima_get_verity_digest(iint, &hash);
-               switch (result) {
-               case 0:
-                       break;
-               case -ENODATA:
+               if (!ima_get_verity_digest(iint, &hash)) {
                        audit_cause = "no-verity-digest";
-                       break;
-               default:
-                       audit_cause = "invalid-verity-digest";
-                       break;
+                       result = -ENODATA;
                }
        } else if (buf) {
                result = ima_calc_buffer_hash(buf, size, &hash.hdr);
index 8e33c4e..c1e862a 100644 (file)
@@ -2,7 +2,7 @@
 
 config SECURITY_LANDLOCK
        bool "Landlock support"
-       depends on SECURITY && !ARCH_EPHEMERAL_INODES
+       depends on SECURITY
        select SECURITY_PATH
        help
          Landlock is a sandboxing mechanism that enables processes to restrict
index 46e273b..50a6b50 100644 (file)
@@ -141,6 +141,14 @@ int snd_pcm_area_copy(const struct snd_pcm_channel_area *src_channel,
 
 void *snd_pcm_plug_buf_alloc(struct snd_pcm_substream *plug, snd_pcm_uframes_t size);
 void snd_pcm_plug_buf_unlock(struct snd_pcm_substream *plug, void *ptr);
+#else
+
+static inline snd_pcm_sframes_t snd_pcm_plug_client_size(struct snd_pcm_substream *handle, snd_pcm_uframes_t drv_size) { return drv_size; }
+static inline snd_pcm_sframes_t snd_pcm_plug_slave_size(struct snd_pcm_substream *handle, snd_pcm_uframes_t clt_size) { return clt_size; }
+static inline int snd_pcm_plug_slave_format(int format, const struct snd_mask *format_mask) { return format; }
+
+#endif
+
 snd_pcm_sframes_t snd_pcm_oss_write3(struct snd_pcm_substream *substream,
                                     const char *ptr, snd_pcm_uframes_t size,
                                     int in_kernel);
@@ -151,14 +159,6 @@ snd_pcm_sframes_t snd_pcm_oss_writev3(struct snd_pcm_substream *substream,
 snd_pcm_sframes_t snd_pcm_oss_readv3(struct snd_pcm_substream *substream,
                                     void **bufs, snd_pcm_uframes_t frames);
 
-#else
-
-static inline snd_pcm_sframes_t snd_pcm_plug_client_size(struct snd_pcm_substream *handle, snd_pcm_uframes_t drv_size) { return drv_size; }
-static inline snd_pcm_sframes_t snd_pcm_plug_slave_size(struct snd_pcm_substream *handle, snd_pcm_uframes_t clt_size) { return clt_size; }
-static inline int snd_pcm_plug_slave_format(int format, const struct snd_mask *format_mask) { return format; }
-
-#endif
-
 #ifdef PLUGIN_DEBUG
 #define pdprintf(fmt, args...) printk(KERN_DEBUG "plugin: " fmt, ##args)
 #else
index 07efb38..f2940b2 100644 (file)
@@ -37,6 +37,7 @@ struct seq_oss_midi {
        struct snd_midi_event *coder;   /* MIDI event coder */
        struct seq_oss_devinfo *devinfo;        /* assigned OSS sequencer device */
        snd_use_lock_t use_lock;
+       struct mutex open_mutex;
 };
 
 
@@ -172,6 +173,7 @@ snd_seq_oss_midi_check_new_port(struct snd_seq_port_info *pinfo)
        mdev->flags = pinfo->capability;
        mdev->opened = 0;
        snd_use_lock_init(&mdev->use_lock);
+       mutex_init(&mdev->open_mutex);
 
        /* copy and truncate the name of synth device */
        strscpy(mdev->name, pinfo->name, sizeof(mdev->name));
@@ -322,15 +324,17 @@ snd_seq_oss_midi_open(struct seq_oss_devinfo *dp, int dev, int fmode)
        int perm;
        struct seq_oss_midi *mdev;
        struct snd_seq_port_subscribe subs;
+       int err;
 
        mdev = get_mididev(dp, dev);
        if (!mdev)
                return -ENODEV;
 
+       mutex_lock(&mdev->open_mutex);
        /* already used? */
        if (mdev->opened && mdev->devinfo != dp) {
-               snd_use_lock_free(&mdev->use_lock);
-               return -EBUSY;
+               err = -EBUSY;
+               goto unlock;
        }
 
        perm = 0;
@@ -340,14 +344,14 @@ snd_seq_oss_midi_open(struct seq_oss_devinfo *dp, int dev, int fmode)
                perm |= PERM_READ;
        perm &= mdev->flags;
        if (perm == 0) {
-               snd_use_lock_free(&mdev->use_lock);
-               return -ENXIO;
+               err = -ENXIO;
+               goto unlock;
        }
 
        /* already opened? */
        if ((mdev->opened & perm) == perm) {
-               snd_use_lock_free(&mdev->use_lock);
-               return 0;
+               err = 0;
+               goto unlock;
        }
 
        perm &= ~mdev->opened;
@@ -372,13 +376,17 @@ snd_seq_oss_midi_open(struct seq_oss_devinfo *dp, int dev, int fmode)
        }
 
        if (! mdev->opened) {
-               snd_use_lock_free(&mdev->use_lock);
-               return -ENXIO;
+               err = -ENXIO;
+               goto unlock;
        }
 
        mdev->devinfo = dp;
+       err = 0;
+
+ unlock:
+       mutex_unlock(&mdev->open_mutex);
        snd_use_lock_free(&mdev->use_lock);
-       return 0;
+       return err;
 }
 
 /*
@@ -393,10 +401,9 @@ snd_seq_oss_midi_close(struct seq_oss_devinfo *dp, int dev)
        mdev = get_mididev(dp, dev);
        if (!mdev)
                return -ENODEV;
-       if (! mdev->opened || mdev->devinfo != dp) {
-               snd_use_lock_free(&mdev->use_lock);
-               return 0;
-       }
+       mutex_lock(&mdev->open_mutex);
+       if (!mdev->opened || mdev->devinfo != dp)
+               goto unlock;
 
        memset(&subs, 0, sizeof(subs));
        if (mdev->opened & PERM_WRITE) {
@@ -415,6 +422,8 @@ snd_seq_oss_midi_close(struct seq_oss_devinfo *dp, int dev)
        mdev->opened = 0;
        mdev->devinfo = NULL;
 
+ unlock:
+       mutex_unlock(&mdev->open_mutex);
        snd_use_lock_free(&mdev->use_lock);
        return 0;
 }
index a15f55b..295163b 100644 (file)
@@ -259,8 +259,10 @@ int snd_dg00x_stream_init_duplex(struct snd_dg00x *dg00x)
                return err;
 
        err = init_stream(dg00x, &dg00x->tx_stream);
-       if (err < 0)
+       if (err < 0) {
                destroy_stream(dg00x, &dg00x->rx_stream);
+               return err;
+       }
 
        err = amdtp_domain_init(&dg00x->domain);
        if (err < 0) {
index accc9d2..6c043fb 100644 (file)
@@ -611,7 +611,7 @@ EXPORT_SYMBOL_GPL(snd_hdac_power_up_pm);
 int snd_hdac_keep_power_up(struct hdac_device *codec)
 {
        if (!atomic_inc_not_zero(&codec->in_pm)) {
-               int ret = pm_runtime_get_if_in_use(&codec->dev);
+               int ret = pm_runtime_get_if_active(&codec->dev, true);
                if (!ret)
                        return -1;
                if (ret < 0)
index 230f65a..388db5f 100644 (file)
@@ -892,10 +892,10 @@ int snd_gf1_pcm_new(struct snd_gus_card *gus, int pcm_dev, int control_index)
                kctl = snd_ctl_new1(&snd_gf1_pcm_volume_control1, gus);
        else
                kctl = snd_ctl_new1(&snd_gf1_pcm_volume_control, gus);
+       kctl->id.index = control_index;
        err = snd_ctl_add(card, kctl);
        if (err < 0)
                return err;
-       kctl->id.index = control_index;
 
        return 0;
 }
index 727db6d..6d25c12 100644 (file)
@@ -2688,20 +2688,20 @@ static int snd_cmipci_mixer_new(struct cmipci *cm, int pcm_spdif_device)
                }
                if (cm->can_ac3_hw) {
                        kctl = snd_ctl_new1(&snd_cmipci_spdif_default, cm);
+                       kctl->id.device = pcm_spdif_device;
                        err = snd_ctl_add(card, kctl);
                        if (err < 0)
                                return err;
-                       kctl->id.device = pcm_spdif_device;
                        kctl = snd_ctl_new1(&snd_cmipci_spdif_mask, cm);
+                       kctl->id.device = pcm_spdif_device;
                        err = snd_ctl_add(card, kctl);
                        if (err < 0)
                                return err;
-                       kctl->id.device = pcm_spdif_device;
                        kctl = snd_ctl_new1(&snd_cmipci_spdif_stream, cm);
+                       kctl->id.device = pcm_spdif_device;
                        err = snd_ctl_add(card, kctl);
                        if (err < 0)
                                return err;
-                       kctl->id.device = pcm_spdif_device;
                }
                if (cm->chip_version <= 37) {
                        sw = snd_cmipci_old_mixer_switches;
index 62f4584..7d882b3 100644 (file)
@@ -531,7 +531,7 @@ static int load_firmware(struct snd_cs46xx *chip)
        return err;
 }
 
-int snd_cs46xx_download_image(struct snd_cs46xx *chip)
+static __maybe_unused int snd_cs46xx_download_image(struct snd_cs46xx *chip)
 {
        int idx, err;
        unsigned int offset = 0;
index 9f79c0a..bd19f92 100644 (file)
@@ -2458,10 +2458,14 @@ int snd_hda_create_dig_out_ctls(struct hda_codec *codec,
                   type == HDA_PCM_TYPE_HDMI) {
                /* suppose a single SPDIF device */
                for (dig_mix = dig_mixes; dig_mix->name; dig_mix++) {
+                       struct snd_ctl_elem_id id;
+
                        kctl = find_mixer_ctl(codec, dig_mix->name, 0, 0);
                        if (!kctl)
                                break;
-                       kctl->id.index = spdif_index;
+                       id = kctl->id;
+                       id.index = spdif_index;
+                       snd_ctl_rename_id(codec->card, &kctl->id, &id);
                }
                bus->primary_dig_out_type = HDA_PCM_TYPE_HDMI;
        }
index fc114e5..dbf7aa8 100644 (file)
@@ -1155,8 +1155,8 @@ static bool path_has_mixer(struct hda_codec *codec, int path_idx, int ctl_type)
        return path && path->ctls[ctl_type];
 }
 
-static const char * const channel_name[4] = {
-       "Front", "Surround", "CLFE", "Side"
+static const char * const channel_name[] = {
+       "Front", "Surround", "CLFE", "Side", "Back",
 };
 
 /* give some appropriate ctl name prefix for the given line out channel */
@@ -1182,7 +1182,7 @@ static const char *get_line_out_pfx(struct hda_codec *codec, int ch,
 
        /* multi-io channels */
        if (ch >= cfg->line_outs)
-               return channel_name[ch];
+               goto fixed_name;
 
        switch (cfg->line_out_type) {
        case AUTO_PIN_SPEAKER_OUT:
@@ -1234,6 +1234,7 @@ static const char *get_line_out_pfx(struct hda_codec *codec, int ch,
        if (cfg->line_outs == 1 && !spec->multi_ios)
                return "Line Out";
 
+ fixed_name:
        if (ch >= ARRAY_SIZE(channel_name)) {
                snd_BUG();
                return "PCM";
index 099722e..748a3c4 100644 (file)
@@ -1306,6 +1306,7 @@ static const struct snd_pci_quirk ca0132_quirks[] = {
        SND_PCI_QUIRK(0x1458, 0xA026, "Gigabyte G1.Sniper Z97", QUIRK_R3DI),
        SND_PCI_QUIRK(0x1458, 0xA036, "Gigabyte GA-Z170X-Gaming 7", QUIRK_R3DI),
        SND_PCI_QUIRK(0x3842, 0x1038, "EVGA X99 Classified", QUIRK_R3DI),
+       SND_PCI_QUIRK(0x3842, 0x104b, "EVGA X299 Dark", QUIRK_R3DI),
        SND_PCI_QUIRK(0x3842, 0x1055, "EVGA Z390 DARK", QUIRK_R3DI),
        SND_PCI_QUIRK(0x1102, 0x0013, "Recon3D", QUIRK_R3D),
        SND_PCI_QUIRK(0x1102, 0x0018, "Recon3D", QUIRK_R3D),
index 64a9440..5c0b1a0 100644 (file)
@@ -4589,6 +4589,11 @@ HDA_CODEC_ENTRY(0x10de009d, "GPU 9d HDMI/DP",    patch_nvhdmi),
 HDA_CODEC_ENTRY(0x10de009e, "GPU 9e HDMI/DP",  patch_nvhdmi),
 HDA_CODEC_ENTRY(0x10de009f, "GPU 9f HDMI/DP",  patch_nvhdmi),
 HDA_CODEC_ENTRY(0x10de00a0, "GPU a0 HDMI/DP",  patch_nvhdmi),
+HDA_CODEC_ENTRY(0x10de00a3, "GPU a3 HDMI/DP",  patch_nvhdmi),
+HDA_CODEC_ENTRY(0x10de00a4, "GPU a4 HDMI/DP",  patch_nvhdmi),
+HDA_CODEC_ENTRY(0x10de00a5, "GPU a5 HDMI/DP",  patch_nvhdmi),
+HDA_CODEC_ENTRY(0x10de00a6, "GPU a6 HDMI/DP",  patch_nvhdmi),
+HDA_CODEC_ENTRY(0x10de00a7, "GPU a7 HDMI/DP",  patch_nvhdmi),
 HDA_CODEC_ENTRY(0x10de8001, "MCP73 HDMI",      patch_nvhdmi_2ch),
 HDA_CODEC_ENTRY(0x10de8067, "MCP67/68 HDMI",   patch_nvhdmi_2ch),
 HDA_CODEC_ENTRY(0x67663d82, "Arise 82 HDMI/DP",        patch_gf_hdmi),
index 172ffc2..dabfdec 100644 (file)
@@ -7063,6 +7063,8 @@ enum {
        ALC225_FIXUP_DELL1_MIC_NO_PRESENCE,
        ALC295_FIXUP_DISABLE_DAC3,
        ALC285_FIXUP_SPEAKER2_TO_DAC1,
+       ALC285_FIXUP_ASUS_SPEAKER2_TO_DAC1,
+       ALC285_FIXUP_ASUS_HEADSET_MIC,
        ALC280_FIXUP_HP_HEADSET_MIC,
        ALC221_FIXUP_HP_FRONT_MIC,
        ALC292_FIXUP_TPT460,
@@ -8033,6 +8035,22 @@ static const struct hda_fixup alc269_fixups[] = {
                .chained = true,
                .chain_id = ALC269_FIXUP_THINKPAD_ACPI
        },
+       [ALC285_FIXUP_ASUS_SPEAKER2_TO_DAC1] = {
+               .type = HDA_FIXUP_FUNC,
+               .v.func = alc285_fixup_speaker2_to_dac1,
+               .chained = true,
+               .chain_id = ALC245_FIXUP_CS35L41_SPI_2
+       },
+       [ALC285_FIXUP_ASUS_HEADSET_MIC] = {
+               .type = HDA_FIXUP_PINS,
+               .v.pins = (const struct hda_pintbl[]) {
+                       { 0x19, 0x03a11050 },
+                       { 0x1b, 0x03a11c30 },
+                       { }
+               },
+               .chained = true,
+               .chain_id = ALC285_FIXUP_ASUS_SPEAKER2_TO_DAC1
+       },
        [ALC256_FIXUP_DELL_INSPIRON_7559_SUBWOOFER] = {
                .type = HDA_FIXUP_PINS,
                .v.pins = (const struct hda_pintbl[]) {
@@ -9363,7 +9381,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x103c, 0x802f, "HP Z240", ALC221_FIXUP_HP_MIC_NO_PRESENCE),
        SND_PCI_QUIRK(0x103c, 0x8077, "HP", ALC256_FIXUP_HP_HEADSET_MIC),
        SND_PCI_QUIRK(0x103c, 0x8158, "HP", ALC256_FIXUP_HP_HEADSET_MIC),
-       SND_PCI_QUIRK(0x103c, 0x820d, "HP Pavilion 15", ALC269_FIXUP_HP_MUTE_LED_MIC3),
+       SND_PCI_QUIRK(0x103c, 0x820d, "HP Pavilion 15", ALC295_FIXUP_HP_X360),
        SND_PCI_QUIRK(0x103c, 0x8256, "HP", ALC221_FIXUP_HP_FRONT_MIC),
        SND_PCI_QUIRK(0x103c, 0x827e, "HP x360", ALC295_FIXUP_HP_X360),
        SND_PCI_QUIRK(0x103c, 0x827f, "HP x360", ALC269_FIXUP_HP_MUTE_LED_MIC3),
@@ -9458,7 +9476,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x103c, 0x8aa3, "HP ProBook 450 G9 (MB 8AA1)", ALC236_FIXUP_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8aa8, "HP EliteBook 640 G9 (MB 8AA6)", ALC236_FIXUP_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8aab, "HP EliteBook 650 G9 (MB 8AA9)", ALC236_FIXUP_HP_GPIO_LED),
-        SND_PCI_QUIRK(0x103c, 0x8abb, "HP ZBook Firefly 14 G9", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
+       SND_PCI_QUIRK(0x103c, 0x8abb, "HP ZBook Firefly 14 G9", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8ad1, "HP EliteBook 840 14 inch G9 Notebook PC", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8ad2, "HP EliteBook 860 16 inch G9 Notebook PC", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8b42, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
@@ -9469,18 +9487,25 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x103c, 0x8b47, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8b5d, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
        SND_PCI_QUIRK(0x103c, 0x8b5e, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
+       SND_PCI_QUIRK(0x103c, 0x8b63, "HP Elite Dragonfly 13.5 inch G4", ALC245_FIXUP_CS35L41_SPI_4_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8b65, "HP ProBook 455 15.6 inch G10 Notebook PC", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
        SND_PCI_QUIRK(0x103c, 0x8b66, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
+       SND_PCI_QUIRK(0x103c, 0x8b70, "HP EliteBook 835 G10", ALC287_FIXUP_CS35L41_I2C_2),
+       SND_PCI_QUIRK(0x103c, 0x8b72, "HP EliteBook 845 G10", ALC287_FIXUP_CS35L41_I2C_2),
+       SND_PCI_QUIRK(0x103c, 0x8b74, "HP EliteBook 845W G10", ALC287_FIXUP_CS35L41_I2C_2),
+       SND_PCI_QUIRK(0x103c, 0x8b77, "HP EliteBook 865 G10", ALC287_FIXUP_CS35L41_I2C_2),
        SND_PCI_QUIRK(0x103c, 0x8b7a, "HP", ALC236_FIXUP_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8b7d, "HP", ALC236_FIXUP_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8b87, "HP", ALC236_FIXUP_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8b8a, "HP", ALC236_FIXUP_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8b8b, "HP", ALC236_FIXUP_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8b8d, "HP", ALC236_FIXUP_HP_GPIO_LED),
-       SND_PCI_QUIRK(0x103c, 0x8b8f, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
+       SND_PCI_QUIRK(0x103c, 0x8b8f, "HP", ALC245_FIXUP_CS35L41_SPI_4_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8b92, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8b96, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
+       SND_PCI_QUIRK(0x103c, 0x8b97, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
        SND_PCI_QUIRK(0x103c, 0x8bf0, "HP", ALC236_FIXUP_HP_GPIO_LED),
+       SND_PCI_QUIRK(0x103c, 0x8c26, "HP EliteBook 800 G11", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
        SND_PCI_QUIRK(0x1043, 0x103e, "ASUS X540SA", ALC256_FIXUP_ASUS_MIC),
        SND_PCI_QUIRK(0x1043, 0x103f, "ASUS TX300", ALC282_FIXUP_ASUS_TX300),
        SND_PCI_QUIRK(0x1043, 0x106d, "Asus K53BE", ALC269_FIXUP_LIMIT_INT_MIC_BOOST),
@@ -9500,6 +9525,9 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x1043, 0x1313, "Asus K42JZ", ALC269VB_FIXUP_ASUS_MIC_NO_PRESENCE),
        SND_PCI_QUIRK(0x1043, 0x13b0, "ASUS Z550SA", ALC256_FIXUP_ASUS_MIC),
        SND_PCI_QUIRK(0x1043, 0x1427, "Asus Zenbook UX31E", ALC269VB_FIXUP_ASUS_ZENBOOK),
+       SND_PCI_QUIRK(0x1043, 0x1473, "ASUS GU604V", ALC285_FIXUP_ASUS_HEADSET_MIC),
+       SND_PCI_QUIRK(0x1043, 0x1483, "ASUS GU603V", ALC285_FIXUP_ASUS_HEADSET_MIC),
+       SND_PCI_QUIRK(0x1043, 0x1493, "ASUS GV601V", ALC285_FIXUP_ASUS_HEADSET_MIC),
        SND_PCI_QUIRK(0x1043, 0x1517, "Asus Zenbook UX31A", ALC269VB_FIXUP_ASUS_ZENBOOK_UX31A),
        SND_PCI_QUIRK(0x1043, 0x1662, "ASUS GV301QH", ALC294_FIXUP_ASUS_DUAL_SPK),
        SND_PCI_QUIRK(0x1043, 0x1683, "ASUS UM3402YAR", ALC287_FIXUP_CS35L41_I2C_2),
@@ -9520,9 +9548,12 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x1043, 0x1a8f, "ASUS UX582ZS", ALC245_FIXUP_CS35L41_SPI_2),
        SND_PCI_QUIRK(0x1043, 0x1b11, "ASUS UX431DA", ALC294_FIXUP_ASUS_COEF_1B),
        SND_PCI_QUIRK(0x1043, 0x1b13, "Asus U41SV", ALC269_FIXUP_INV_DMIC),
+       SND_PCI_QUIRK(0x1043, 0x1b93, "ASUS G614JVR/JIR", ALC245_FIXUP_CS35L41_SPI_2),
        SND_PCI_QUIRK(0x1043, 0x1bbd, "ASUS Z550MA", ALC255_FIXUP_ASUS_MIC_NO_PRESENCE),
        SND_PCI_QUIRK(0x1043, 0x1c23, "Asus X55U", ALC269_FIXUP_LIMIT_INT_MIC_BOOST),
+       SND_PCI_QUIRK(0x1043, 0x1c62, "ASUS GU603", ALC289_FIXUP_ASUS_GA401),
        SND_PCI_QUIRK(0x1043, 0x1c92, "ASUS ROG Strix G15", ALC285_FIXUP_ASUS_G533Z_PINS),
+       SND_PCI_QUIRK(0x1043, 0x1caf, "ASUS G634JYR/JZR", ALC285_FIXUP_ASUS_HEADSET_MIC),
        SND_PCI_QUIRK(0x1043, 0x1ccd, "ASUS X555UB", ALC256_FIXUP_ASUS_MIC),
        SND_PCI_QUIRK(0x1043, 0x1d42, "ASUS Zephyrus G14 2022", ALC289_FIXUP_ASUS_GA401),
        SND_PCI_QUIRK(0x1043, 0x1d4e, "ASUS TM420", ALC256_FIXUP_ASUS_HPE),
@@ -9537,6 +9568,11 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x1043, 0x1f12, "ASUS UM5302", ALC287_FIXUP_CS35L41_I2C_2),
        SND_PCI_QUIRK(0x1043, 0x1f92, "ASUS ROG Flow X16", ALC289_FIXUP_ASUS_GA401),
        SND_PCI_QUIRK(0x1043, 0x3030, "ASUS ZN270IE", ALC256_FIXUP_ASUS_AIO_GPIO2),
+       SND_PCI_QUIRK(0x1043, 0x3a20, "ASUS G614JZR", ALC245_FIXUP_CS35L41_SPI_2),
+       SND_PCI_QUIRK(0x1043, 0x3a30, "ASUS G814JVR/JIR", ALC245_FIXUP_CS35L41_SPI_2),
+       SND_PCI_QUIRK(0x1043, 0x3a40, "ASUS G814JZR", ALC245_FIXUP_CS35L41_SPI_2),
+       SND_PCI_QUIRK(0x1043, 0x3a50, "ASUS G834JYR/JZR", ALC245_FIXUP_CS35L41_SPI_2),
+       SND_PCI_QUIRK(0x1043, 0x3a60, "ASUS G634JYR/JZR", ALC245_FIXUP_CS35L41_SPI_2),
        SND_PCI_QUIRK(0x1043, 0x831a, "ASUS P901", ALC269_FIXUP_STEREO_DMIC),
        SND_PCI_QUIRK(0x1043, 0x834a, "ASUS S101", ALC269_FIXUP_STEREO_DMIC),
        SND_PCI_QUIRK(0x1043, 0x8398, "ASUS P1005", ALC269_FIXUP_STEREO_DMIC),
@@ -9560,6 +9596,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x10ec, 0x124c, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK),
        SND_PCI_QUIRK(0x10ec, 0x1252, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK),
        SND_PCI_QUIRK(0x10ec, 0x1254, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK),
+       SND_PCI_QUIRK(0x10ec, 0x12cc, "Intel Reference board", ALC225_FIXUP_HEADSET_JACK),
        SND_PCI_QUIRK(0x10f7, 0x8338, "Panasonic CF-SZ6", ALC269_FIXUP_HEADSET_MODE),
        SND_PCI_QUIRK(0x144d, 0xc109, "Samsung Ativ book 9 (NP900X3G)", ALC269_FIXUP_INV_DMIC),
        SND_PCI_QUIRK(0x144d, 0xc169, "Samsung Notebook 9 Pen (NP930SBE-K01US)", ALC298_FIXUP_SAMSUNG_AMP),
@@ -9608,6 +9645,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x1558, 0x5101, "Clevo S510WU", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
        SND_PCI_QUIRK(0x1558, 0x5157, "Clevo W517GU1", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
        SND_PCI_QUIRK(0x1558, 0x51a1, "Clevo NS50MU", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
+       SND_PCI_QUIRK(0x1558, 0x51b1, "Clevo NS50AU", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
        SND_PCI_QUIRK(0x1558, 0x5630, "Clevo NP50RNJS", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
        SND_PCI_QUIRK(0x1558, 0x70a1, "Clevo NB70T[HJK]", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
        SND_PCI_QUIRK(0x1558, 0x70b3, "Clevo NK70SB", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
@@ -9618,6 +9656,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x1558, 0x7716, "Clevo NS50PU", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
        SND_PCI_QUIRK(0x1558, 0x7717, "Clevo NS70PU", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
        SND_PCI_QUIRK(0x1558, 0x7718, "Clevo L140PU", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
+       SND_PCI_QUIRK(0x1558, 0x7724, "Clevo L140AU", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
        SND_PCI_QUIRK(0x1558, 0x8228, "Clevo NR40BU", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
        SND_PCI_QUIRK(0x1558, 0x8520, "Clevo NH50D[CD]", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
        SND_PCI_QUIRK(0x1558, 0x8521, "Clevo NH77D[CD]", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
@@ -9778,6 +9817,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x8086, 0x2074, "Intel NUC 8", ALC233_FIXUP_INTEL_NUC8_DMIC),
        SND_PCI_QUIRK(0x8086, 0x2080, "Intel NUC 8 Rugged", ALC256_FIXUP_INTEL_NUC8_RUGGED),
        SND_PCI_QUIRK(0x8086, 0x2081, "Intel NUC 10", ALC256_FIXUP_INTEL_NUC10),
+       SND_PCI_QUIRK(0x8086, 0x3038, "Intel NUC 13", ALC225_FIXUP_HEADSET_JACK),
        SND_PCI_QUIRK(0xf111, 0x0001, "Framework Laptop", ALC295_FIXUP_FRAMEWORK_LAPTOP_MIC_NO_PRESENCE),
 
 #if 0
@@ -11663,7 +11703,9 @@ static const struct snd_pci_quirk alc662_fixup_tbl[] = {
        SND_PCI_QUIRK(0x103c, 0x1632, "HP RP5800", ALC662_FIXUP_HP_RP5800),
        SND_PCI_QUIRK(0x103c, 0x870c, "HP", ALC897_FIXUP_HP_HSMIC_VERB),
        SND_PCI_QUIRK(0x103c, 0x8719, "HP", ALC897_FIXUP_HP_HSMIC_VERB),
+       SND_PCI_QUIRK(0x103c, 0x872b, "HP", ALC897_FIXUP_HP_HSMIC_VERB),
        SND_PCI_QUIRK(0x103c, 0x873e, "HP", ALC671_FIXUP_HP_HEADSET_MIC2),
+       SND_PCI_QUIRK(0x103c, 0x8768, "HP Slim Desktop S01", ALC671_FIXUP_HP_HEADSET_MIC2),
        SND_PCI_QUIRK(0x103c, 0x877e, "HP 288 Pro G6", ALC671_FIXUP_HP_HEADSET_MIC2),
        SND_PCI_QUIRK(0x103c, 0x885f, "HP 288 Pro G8", ALC671_FIXUP_HP_HEADSET_MIC2),
        SND_PCI_QUIRK(0x1043, 0x1080, "Asus UX501VW", ALC668_FIXUP_HEADSET_MODE),
@@ -11685,10 +11727,13 @@ static const struct snd_pci_quirk alc662_fixup_tbl[] = {
        SND_PCI_QUIRK(0x14cd, 0x5003, "USI", ALC662_FIXUP_USI_HEADSET_MODE),
        SND_PCI_QUIRK(0x17aa, 0x1036, "Lenovo P520", ALC662_FIXUP_LENOVO_MULTI_CODECS),
        SND_PCI_QUIRK(0x17aa, 0x1057, "Lenovo P360", ALC897_FIXUP_HEADSET_MIC_PIN),
+       SND_PCI_QUIRK(0x17aa, 0x1064, "Lenovo P3 Tower", ALC897_FIXUP_HEADSET_MIC_PIN),
        SND_PCI_QUIRK(0x17aa, 0x32ca, "Lenovo ThinkCentre M80", ALC897_FIXUP_HEADSET_MIC_PIN),
        SND_PCI_QUIRK(0x17aa, 0x32cb, "Lenovo ThinkCentre M70", ALC897_FIXUP_HEADSET_MIC_PIN),
        SND_PCI_QUIRK(0x17aa, 0x32cf, "Lenovo ThinkCentre M950", ALC897_FIXUP_HEADSET_MIC_PIN),
        SND_PCI_QUIRK(0x17aa, 0x32f7, "Lenovo ThinkCentre M90", ALC897_FIXUP_HEADSET_MIC_PIN),
+       SND_PCI_QUIRK(0x17aa, 0x3321, "Lenovo ThinkCentre M70 Gen4", ALC897_FIXUP_HEADSET_MIC_PIN),
+       SND_PCI_QUIRK(0x17aa, 0x331b, "Lenovo ThinkCentre M90 Gen4", ALC897_FIXUP_HEADSET_MIC_PIN),
        SND_PCI_QUIRK(0x17aa, 0x3742, "Lenovo TianYi510Pro-14IOB", ALC897_FIXUP_HEADSET_MIC_PIN2),
        SND_PCI_QUIRK(0x17aa, 0x38af, "Lenovo Ideapad Y550P", ALC662_FIXUP_IDEAPAD),
        SND_PCI_QUIRK(0x17aa, 0x3a0d, "Lenovo Ideapad Y550", ALC662_FIXUP_IDEAPAD),
@@ -11697,6 +11742,7 @@ static const struct snd_pci_quirk alc662_fixup_tbl[] = {
        SND_PCI_QUIRK(0x1b0a, 0x01b8, "ACER Veriton", ALC662_FIXUP_ACER_VERITON),
        SND_PCI_QUIRK(0x1b35, 0x1234, "CZC ET26", ALC662_FIXUP_CZC_ET26),
        SND_PCI_QUIRK(0x1b35, 0x2206, "CZC P10T", ALC662_FIXUP_CZC_P10T),
+       SND_PCI_QUIRK(0x1c6c, 0x1239, "Compaq N14JP6-V2", ALC897_FIXUP_HP_HSMIC_VERB),
 
 #if 0
        /* Below is a quirk table taken from the old code.
index 24b9782..0278493 100644 (file)
@@ -1899,11 +1899,12 @@ static int aureon_add_controls(struct snd_ice1712 *ice)
                else {
                        for (i = 0; i < ARRAY_SIZE(cs8415_controls); i++) {
                                struct snd_kcontrol *kctl;
-                               err = snd_ctl_add(ice->card, (kctl = snd_ctl_new1(&cs8415_controls[i], ice)));
-                               if (err < 0)
-                                       return err;
+                               kctl = snd_ctl_new1(&cs8415_controls[i], ice);
                                if (i > 1)
                                        kctl->id.device = ice->pcm->device;
+                               err = snd_ctl_add(ice->card, kctl);
+                               if (err < 0)
+                                       return err;
                        }
                }
        }
index a5241a2..3b0c3e7 100644 (file)
@@ -2371,22 +2371,26 @@ int snd_ice1712_spdif_build_controls(struct snd_ice1712 *ice)
 
        if (snd_BUG_ON(!ice->pcm_pro))
                return -EIO;
-       err = snd_ctl_add(ice->card, kctl = snd_ctl_new1(&snd_ice1712_spdif_default, ice));
+       kctl = snd_ctl_new1(&snd_ice1712_spdif_default, ice);
+       kctl->id.device = ice->pcm_pro->device;
+       err = snd_ctl_add(ice->card, kctl);
        if (err < 0)
                return err;
+       kctl = snd_ctl_new1(&snd_ice1712_spdif_maskc, ice);
        kctl->id.device = ice->pcm_pro->device;
-       err = snd_ctl_add(ice->card, kctl = snd_ctl_new1(&snd_ice1712_spdif_maskc, ice));
+       err = snd_ctl_add(ice->card, kctl);
        if (err < 0)
                return err;
+       kctl = snd_ctl_new1(&snd_ice1712_spdif_maskp, ice);
        kctl->id.device = ice->pcm_pro->device;
-       err = snd_ctl_add(ice->card, kctl = snd_ctl_new1(&snd_ice1712_spdif_maskp, ice));
+       err = snd_ctl_add(ice->card, kctl);
        if (err < 0)
                return err;
+       kctl = snd_ctl_new1(&snd_ice1712_spdif_stream, ice);
        kctl->id.device = ice->pcm_pro->device;
-       err = snd_ctl_add(ice->card, kctl = snd_ctl_new1(&snd_ice1712_spdif_stream, ice));
+       err = snd_ctl_add(ice->card, kctl);
        if (err < 0)
                return err;
-       kctl->id.device = ice->pcm_pro->device;
        ice->spdif.stream_ctl = kctl;
        return 0;
 }
index 6fab2ad..1dc776a 100644 (file)
@@ -2392,23 +2392,27 @@ static int snd_vt1724_spdif_build_controls(struct snd_ice1712 *ice)
        if (err < 0)
                return err;
 
-       err = snd_ctl_add(ice->card, kctl = snd_ctl_new1(&snd_vt1724_spdif_default, ice));
+       kctl = snd_ctl_new1(&snd_vt1724_spdif_default, ice);
+       kctl->id.device = ice->pcm->device;
+       err = snd_ctl_add(ice->card, kctl);
        if (err < 0)
                return err;
+       kctl = snd_ctl_new1(&snd_vt1724_spdif_maskc, ice);
        kctl->id.device = ice->pcm->device;
-       err = snd_ctl_add(ice->card, kctl = snd_ctl_new1(&snd_vt1724_spdif_maskc, ice));
+       err = snd_ctl_add(ice->card, kctl);
        if (err < 0)
                return err;
+       kctl = snd_ctl_new1(&snd_vt1724_spdif_maskp, ice);
        kctl->id.device = ice->pcm->device;
-       err = snd_ctl_add(ice->card, kctl = snd_ctl_new1(&snd_vt1724_spdif_maskp, ice));
+       err = snd_ctl_add(ice->card, kctl);
        if (err < 0)
                return err;
-       kctl->id.device = ice->pcm->device;
 #if 0 /* use default only */
-       err = snd_ctl_add(ice->card, kctl = snd_ctl_new1(&snd_vt1724_spdif_stream, ice));
+       kctl = snd_ctl_new1(&snd_vt1724_spdif_stream, ice);
+       kctl->id.device = ice->pcm->device;
+       err = snd_ctl_add(ice->card, kctl);
        if (err < 0)
                return err;
-       kctl->id.device = ice->pcm->device;
        ice->spdif.stream_ctl = kctl;
 #endif
        return 0;
index 6971eec..6b8d869 100644 (file)
@@ -1822,20 +1822,20 @@ int snd_ymfpci_mixer(struct snd_ymfpci *chip, int rear_switch)
        if (snd_BUG_ON(!chip->pcm_spdif))
                return -ENXIO;
        kctl = snd_ctl_new1(&snd_ymfpci_spdif_default, chip);
+       kctl->id.device = chip->pcm_spdif->device;
        err = snd_ctl_add(chip->card, kctl);
        if (err < 0)
                return err;
-       kctl->id.device = chip->pcm_spdif->device;
        kctl = snd_ctl_new1(&snd_ymfpci_spdif_mask, chip);
+       kctl->id.device = chip->pcm_spdif->device;
        err = snd_ctl_add(chip->card, kctl);
        if (err < 0)
                return err;
-       kctl->id.device = chip->pcm_spdif->device;
        kctl = snd_ctl_new1(&snd_ymfpci_spdif_stream, chip);
+       kctl->id.device = chip->pcm_spdif->device;
        err = snd_ctl_add(chip->card, kctl);
        if (err < 0)
                return err;
-       kctl->id.device = chip->pcm_spdif->device;
        chip->spdif_pcm_ctl = kctl;
 
        /* direct recording source */
index afddb9a..b1337b9 100644 (file)
@@ -211,8 +211,7 @@ static int create_acp63_platform_devs(struct pci_dev *pci, struct acp63_dev_data
        case ACP63_PDM_DEV_MASK:
                adata->pdm_dev_index  = 0;
                acp63_fill_platform_dev_info(&pdevinfo[0], parent, NULL, "acp_ps_pdm_dma",
-                                            0, adata->res, 1, &adata->acp_lock,
-                                            sizeof(adata->acp_lock));
+                                            0, adata->res, 1, NULL, 0);
                acp63_fill_platform_dev_info(&pdevinfo[1], parent, NULL, "dmic-codec",
                                             0, NULL, 0, NULL, 0);
                acp63_fill_platform_dev_info(&pdevinfo[2], parent, NULL, "acp_ps_mach",
index 46b9132..3a83dc1 100644 (file)
@@ -361,12 +361,12 @@ static int acp63_pdm_audio_probe(struct platform_device *pdev)
 {
        struct resource *res;
        struct pdm_dev_data *adata;
+       struct acp63_dev_data *acp_data;
+       struct device *parent;
        int status;
 
-       if (!pdev->dev.platform_data) {
-               dev_err(&pdev->dev, "platform_data not retrieved\n");
-               return -ENODEV;
-       }
+       parent = pdev->dev.parent;
+       acp_data = dev_get_drvdata(parent);
        res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
        if (!res) {
                dev_err(&pdev->dev, "IORESOURCE_MEM FAILED\n");
@@ -382,7 +382,7 @@ static int acp63_pdm_audio_probe(struct platform_device *pdev)
                return -ENOMEM;
 
        adata->capture_stream = NULL;
-       adata->acp_lock = pdev->dev.platform_data;
+       adata->acp_lock = &acp_data->acp_lock;
        dev_set_drvdata(&pdev->dev, adata);
        status = devm_snd_soc_register_component(&pdev->dev,
                                                 &acp63_pdm_component,
index 0bc6e40..246299a 100644 (file)
@@ -175,6 +175,13 @@ static const struct dmi_system_id yc_acp_quirk_table[] = {
                .driver_data = &acp6x_card,
                .matches = {
                        DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
+                       DMI_MATCH(DMI_PRODUCT_NAME, "21EF"),
+               }
+       },
+       {
+               .driver_data = &acp6x_card,
+               .matches = {
+                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
                        DMI_MATCH(DMI_PRODUCT_NAME, "21EM"),
                }
        },
@@ -311,6 +318,13 @@ static const struct dmi_system_id yc_acp_quirk_table[] = {
                        DMI_MATCH(DMI_BOARD_NAME, "8A22"),
                }
        },
+       {
+               .driver_data = &acp6x_card,
+               .matches = {
+                       DMI_MATCH(DMI_BOARD_VENDOR, "System76"),
+                       DMI_MATCH(DMI_PRODUCT_VERSION, "pang12"),
+               }
+       },
        {}
 };
 
index 8538e28..1e42052 100644 (file)
@@ -46,7 +46,7 @@ static const struct reg_default cs35l41_reg[] = {
        { CS35L41_DSP1_RX5_SRC,                 0x00000020 },
        { CS35L41_DSP1_RX6_SRC,                 0x00000021 },
        { CS35L41_DSP1_RX7_SRC,                 0x0000003A },
-       { CS35L41_DSP1_RX8_SRC,                 0x00000001 },
+       { CS35L41_DSP1_RX8_SRC,                 0x0000003B },
        { CS35L41_NGATE1_SRC,                   0x00000008 },
        { CS35L41_NGATE2_SRC,                   0x00000009 },
        { CS35L41_AMP_DIG_VOL_CTRL,             0x00008000 },
@@ -58,8 +58,8 @@ static const struct reg_default cs35l41_reg[] = {
        { CS35L41_IRQ1_MASK2,                   0xFFFFFFFF },
        { CS35L41_IRQ1_MASK3,                   0xFFFF87FF },
        { CS35L41_IRQ1_MASK4,                   0xFEFFFFFF },
-       { CS35L41_GPIO1_CTRL1,                  0xE1000001 },
-       { CS35L41_GPIO2_CTRL1,                  0xE1000001 },
+       { CS35L41_GPIO1_CTRL1,                  0x81000001 },
+       { CS35L41_GPIO2_CTRL1,                  0x81000001 },
        { CS35L41_MIXER_NGATE_CFG,              0x00000000 },
        { CS35L41_MIXER_NGATE_CH1_CFG,          0x00000303 },
        { CS35L41_MIXER_NGATE_CH2_CFG,          0x00000303 },
index 46762f7..e0d2b9b 100644 (file)
@@ -704,9 +704,6 @@ static int cs35l56_sdw_dai_hw_free(struct snd_pcm_substream *substream,
 static int cs35l56_sdw_dai_set_stream(struct snd_soc_dai *dai,
                                      void *sdw_stream, int direction)
 {
-       if (!sdw_stream)
-               return 0;
-
        snd_soc_dai_dma_data_set(dai, direction, sdw_stream);
 
        return 0;
@@ -852,10 +849,11 @@ static void cs35l56_dsp_work(struct work_struct *work)
         */
        if (cs35l56->sdw_peripheral) {
                cs35l56->sdw_irq_no_unmask = true;
-               cancel_work_sync(&cs35l56->sdw_irq_work);
+               flush_work(&cs35l56->sdw_irq_work);
                sdw_write_no_pm(cs35l56->sdw_peripheral, CS35L56_SDW_GEN_INT_MASK_1, 0);
                sdw_read_no_pm(cs35l56->sdw_peripheral, CS35L56_SDW_GEN_INT_STAT_1);
                sdw_write_no_pm(cs35l56->sdw_peripheral, CS35L56_SDW_GEN_INT_STAT_1, 0xFF);
+               flush_work(&cs35l56->sdw_irq_work);
        }
 
        ret = cs35l56_mbox_send(cs35l56, CS35L56_MBOX_CMD_SHUTDOWN);
index da6fcf7..de978c3 100644 (file)
@@ -746,6 +746,8 @@ static int tx_macro_put_dec_enum(struct snd_kcontrol *kcontrol,
        struct tx_macro *tx = snd_soc_component_get_drvdata(component);
 
        val = ucontrol->value.enumerated.item[0];
+       if (val >= e->items)
+               return -EINVAL;
 
        switch (e->reg) {
        case CDC_TX_INP_MUX_ADC_MUX0_CFG0:
@@ -772,6 +774,9 @@ static int tx_macro_put_dec_enum(struct snd_kcontrol *kcontrol,
        case CDC_TX_INP_MUX_ADC_MUX7_CFG0:
                mic_sel_reg = CDC_TX7_TX_PATH_CFG0;
                break;
+       default:
+               dev_err(component->dev, "Error in configuration!!\n");
+               return -EINVAL;
        }
 
        if (val != 0) {
index dcce06b..e6b84e2 100644 (file)
@@ -211,7 +211,7 @@ static int max98363_io_init(struct sdw_slave *slave)
 }
 
 #define MAX98363_RATES SNDRV_PCM_RATE_8000_192000
-#define MAX98363_FORMATS (SNDRV_PCM_FMTBIT_S32_LE)
+#define MAX98363_FORMATS (SNDRV_PCM_FMTBIT_S16_LE | SNDRV_PCM_FMTBIT_S24_LE)
 
 static int max98363_sdw_dai_hw_params(struct snd_pcm_substream *substream,
                                      struct snd_pcm_hw_params *params,
@@ -246,7 +246,7 @@ static int max98363_sdw_dai_hw_params(struct snd_pcm_substream *substream,
        stream_config.frame_rate = params_rate(params);
        stream_config.bps = snd_pcm_format_width(params_format(params));
        stream_config.direction = direction;
-       stream_config.ch_count = params_channels(params);
+       stream_config.ch_count = 1;
 
        if (stream_config.ch_count > runtime->hw.channels_max) {
                stream_config.ch_count = runtime->hw.channels_max;
index 4f19fd9..5a4db89 100644 (file)
@@ -1903,6 +1903,30 @@ static const struct dmi_system_id nau8824_quirk_table[] = {
                },
                .driver_data = (void *)(NAU8824_MONO_SPEAKER),
        },
+       {
+               /* Positivo CW14Q01P */
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "Positivo Tecnologia SA"),
+                       DMI_MATCH(DMI_BOARD_NAME, "CW14Q01P"),
+               },
+               .driver_data = (void *)(NAU8824_JD_ACTIVE_HIGH),
+       },
+       {
+               /* Positivo K1424G */
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "Positivo Tecnologia SA"),
+                       DMI_MATCH(DMI_BOARD_NAME, "K1424G"),
+               },
+               .driver_data = (void *)(NAU8824_JD_ACTIVE_HIGH),
+       },
+       {
+               /* Positivo N14ZP74G */
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "Positivo Tecnologia SA"),
+                       DMI_MATCH(DMI_BOARD_NAME, "N14ZP74G"),
+               },
+               .driver_data = (void *)(NAU8824_JD_ACTIVE_HIGH),
+       },
        {}
 };
 
index 2935c1b..5bc46b0 100644 (file)
@@ -267,7 +267,9 @@ static int rt5682_i2c_probe(struct i2c_client *i2c)
                ret = devm_request_threaded_irq(&i2c->dev, i2c->irq, NULL,
                        rt5682_irq, IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING
                        | IRQF_ONESHOT, "rt5682", rt5682);
-               if (ret)
+               if (!ret)
+                       rt5682->irq = i2c->irq;
+               else
                        dev_err(&i2c->dev, "Failed to reguest IRQ: %d\n", ret);
        }
 
index f6c798b..5d99254 100644 (file)
@@ -2959,6 +2959,9 @@ static int rt5682_suspend(struct snd_soc_component *component)
        if (rt5682->is_sdw)
                return 0;
 
+       if (rt5682->irq)
+               disable_irq(rt5682->irq);
+
        cancel_delayed_work_sync(&rt5682->jack_detect_work);
        cancel_delayed_work_sync(&rt5682->jd_check_work);
        if (rt5682->hs_jack && (rt5682->jack_type & SND_JACK_HEADSET) == SND_JACK_HEADSET) {
@@ -3027,6 +3030,9 @@ static int rt5682_resume(struct snd_soc_component *component)
        mod_delayed_work(system_power_efficient_wq,
                &rt5682->jack_detect_work, msecs_to_jiffies(0));
 
+       if (rt5682->irq)
+               enable_irq(rt5682->irq);
+
        return 0;
 }
 #else
index d568c69..e8efd8a 100644 (file)
@@ -1462,6 +1462,7 @@ struct rt5682_priv {
        int pll_out[RT5682_PLLS];
 
        int jack_type;
+       int irq;
        int irq_work_delay_time;
 };
 
index 00b6036..c293244 100644 (file)
@@ -53,6 +53,18 @@ static const struct reg_default ssm2602_reg[SSM2602_CACHEREGNUM] = {
        { .reg = 0x09, .def = 0x0000 }
 };
 
+/*
+ * ssm2602 register patch
+ * Workaround for playback distortions after power up: activates digital
+ * core, and then powers on output, DAC, and whole chip at the same time
+ */
+
+static const struct reg_sequence ssm2602_patch[] = {
+       { SSM2602_ACTIVE, 0x01 },
+       { SSM2602_PWR,    0x07 },
+       { SSM2602_RESET,  0x00 },
+};
+
 
 /*Appending several "None"s just for OSS mixer use*/
 static const char *ssm2602_input_select[] = {
@@ -598,6 +610,9 @@ static int ssm260x_component_probe(struct snd_soc_component *component)
                return ret;
        }
 
+       regmap_register_patch(ssm2602->regmap, ssm2602_patch,
+                             ARRAY_SIZE(ssm2602_patch));
+
        /* set the update bits */
        regmap_update_bits(ssm2602->regmap, SSM2602_LINVOL,
                            LINVOL_LRIN_BOTH, LINVOL_LRIN_BOTH);
index 402286d..9c10200 100644 (file)
@@ -1190,7 +1190,6 @@ static const struct regmap_config wcd938x_regmap_config = {
        .readable_reg = wcd938x_readable_register,
        .writeable_reg = wcd938x_writeable_register,
        .volatile_reg = wcd938x_volatile_register,
-       .can_multi_write = true,
 };
 
 static const struct sdw_slave_ops wcd9380_slave_ops = {
index f709231..97f6873 100644 (file)
@@ -645,7 +645,6 @@ static struct regmap_config wsa881x_regmap_config = {
        .readable_reg = wsa881x_readable_register,
        .reg_format_endian = REGMAP_ENDIAN_NATIVE,
        .val_format_endian = REGMAP_ENDIAN_NATIVE,
-       .can_multi_write = true,
 };
 
 enum {
index c609cb6..e80b531 100644 (file)
@@ -946,7 +946,6 @@ static struct regmap_config wsa883x_regmap_config = {
        .writeable_reg = wsa883x_writeable_register,
        .reg_format_endian = REGMAP_ENDIAN_NATIVE,
        .val_format_endian = REGMAP_ENDIAN_NATIVE,
-       .can_multi_write = true,
        .use_single_read = true,
 };
 
index acdf98b..399a489 100644 (file)
@@ -132,13 +132,13 @@ static irqreturn_t i2s_irq_handler(int irq, void *dev_id)
 
                /* Error Handling: TX */
                if (isr[i] & ISR_TXFO) {
-                       dev_err(dev->dev, "TX overrun (ch_id=%d)\n", i);
+                       dev_err_ratelimited(dev->dev, "TX overrun (ch_id=%d)\n", i);
                        irq_valid = true;
                }
 
                /* Error Handling: TX */
                if (isr[i] & ISR_RXFO) {
-                       dev_err(dev->dev, "RX overrun (ch_id=%d)\n", i);
+                       dev_err_ratelimited(dev->dev, "RX overrun (ch_id=%d)\n", i);
                        irq_valid = true;
                }
        }
@@ -183,30 +183,6 @@ static void i2s_stop(struct dw_i2s_dev *dev,
        }
 }
 
-static int dw_i2s_startup(struct snd_pcm_substream *substream,
-               struct snd_soc_dai *cpu_dai)
-{
-       struct dw_i2s_dev *dev = snd_soc_dai_get_drvdata(cpu_dai);
-       union dw_i2s_snd_dma_data *dma_data = NULL;
-
-       if (!(dev->capability & DWC_I2S_RECORD) &&
-                       (substream->stream == SNDRV_PCM_STREAM_CAPTURE))
-               return -EINVAL;
-
-       if (!(dev->capability & DWC_I2S_PLAY) &&
-                       (substream->stream == SNDRV_PCM_STREAM_PLAYBACK))
-               return -EINVAL;
-
-       if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
-               dma_data = &dev->play_dma_data;
-       else if (substream->stream == SNDRV_PCM_STREAM_CAPTURE)
-               dma_data = &dev->capture_dma_data;
-
-       snd_soc_dai_set_dma_data(cpu_dai, substream, (void *)dma_data);
-
-       return 0;
-}
-
 static void dw_i2s_config(struct dw_i2s_dev *dev, int stream)
 {
        u32 ch_reg;
@@ -305,12 +281,6 @@ static int dw_i2s_hw_params(struct snd_pcm_substream *substream,
        return 0;
 }
 
-static void dw_i2s_shutdown(struct snd_pcm_substream *substream,
-               struct snd_soc_dai *dai)
-{
-       snd_soc_dai_set_dma_data(dai, substream, NULL);
-}
-
 static int dw_i2s_prepare(struct snd_pcm_substream *substream,
                          struct snd_soc_dai *dai)
 {
@@ -382,8 +352,6 @@ static int dw_i2s_set_fmt(struct snd_soc_dai *cpu_dai, unsigned int fmt)
 }
 
 static const struct snd_soc_dai_ops dw_i2s_dai_ops = {
-       .startup        = dw_i2s_startup,
-       .shutdown       = dw_i2s_shutdown,
        .hw_params      = dw_i2s_hw_params,
        .prepare        = dw_i2s_prepare,
        .trigger        = dw_i2s_trigger,
@@ -625,6 +593,14 @@ static int dw_configure_dai_by_dt(struct dw_i2s_dev *dev,
 
 }
 
+static int dw_i2s_dai_probe(struct snd_soc_dai *dai)
+{
+       struct dw_i2s_dev *dev = snd_soc_dai_get_drvdata(dai);
+
+       snd_soc_dai_init_dma_data(dai, &dev->play_dma_data, &dev->capture_dma_data);
+       return 0;
+}
+
 static int dw_i2s_probe(struct platform_device *pdev)
 {
        const struct i2s_platform_data *pdata = pdev->dev.platform_data;
@@ -643,6 +619,7 @@ static int dw_i2s_probe(struct platform_device *pdev)
                return -ENOMEM;
 
        dw_i2s_dai->ops = &dw_i2s_dai_ops;
+       dw_i2s_dai->probe = dw_i2s_dai_probe;
 
        dev->i2s_base = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
        if (IS_ERR(dev->i2s_base))
index 94341e4..3f08082 100644 (file)
@@ -1159,7 +1159,7 @@ static int fsl_micfil_probe(struct platform_device *pdev)
        ret = devm_snd_dmaengine_pcm_register(&pdev->dev, NULL, 0);
        if (ret) {
                dev_err(&pdev->dev, "failed to pcm register\n");
-               return ret;
+               goto err_pm_disable;
        }
 
        fsl_micfil_dai.capture.formats = micfil->soc->formats;
@@ -1169,9 +1169,20 @@ static int fsl_micfil_probe(struct platform_device *pdev)
        if (ret) {
                dev_err(&pdev->dev, "failed to register component %s\n",
                        fsl_micfil_component.name);
+               goto err_pm_disable;
        }
 
        return ret;
+
+err_pm_disable:
+       pm_runtime_disable(&pdev->dev);
+
+       return ret;
+}
+
+static void fsl_micfil_remove(struct platform_device *pdev)
+{
+       pm_runtime_disable(&pdev->dev);
 }
 
 static int __maybe_unused fsl_micfil_runtime_suspend(struct device *dev)
@@ -1232,6 +1243,7 @@ static const struct dev_pm_ops fsl_micfil_pm_ops = {
 
 static struct platform_driver fsl_micfil_driver = {
        .probe = fsl_micfil_probe,
+       .remove_new = fsl_micfil_remove,
        .driver = {
                .name = "fsl-micfil-dai",
                .pm = &fsl_micfil_pm_ops,
index abdaffb..e3105d4 100644 (file)
@@ -491,14 +491,21 @@ static int fsl_sai_set_bclk(struct snd_soc_dai *dai, bool tx, u32 freq)
        regmap_update_bits(sai->regmap, reg, FSL_SAI_CR2_MSEL_MASK,
                           FSL_SAI_CR2_MSEL(sai->mclk_id[tx]));
 
-       if (savediv == 1)
+       if (savediv == 1) {
                regmap_update_bits(sai->regmap, reg,
                                   FSL_SAI_CR2_DIV_MASK | FSL_SAI_CR2_BYP,
                                   FSL_SAI_CR2_BYP);
-       else
+               if (fsl_sai_dir_is_synced(sai, adir))
+                       regmap_update_bits(sai->regmap, FSL_SAI_xCR2(tx, ofs),
+                                          FSL_SAI_CR2_BCI, FSL_SAI_CR2_BCI);
+               else
+                       regmap_update_bits(sai->regmap, FSL_SAI_xCR2(tx, ofs),
+                                          FSL_SAI_CR2_BCI, 0);
+       } else {
                regmap_update_bits(sai->regmap, reg,
                                   FSL_SAI_CR2_DIV_MASK | FSL_SAI_CR2_BYP,
                                   savediv / 2 - 1);
+       }
 
        if (sai->soc_data->max_register >= FSL_SAI_MCTL) {
                /* SAI is in master mode at this point, so enable MCLK */
index 197748a..a53c4f0 100644 (file)
 
 /* SAI Transmit and Receive Configuration 2 Register */
 #define FSL_SAI_CR2_SYNC       BIT(30)
+#define FSL_SAI_CR2_BCI                BIT(28)
 #define FSL_SAI_CR2_MSEL_MASK  (0x3 << 26)
 #define FSL_SAI_CR2_MSEL_BUS   0
 #define FSL_SAI_CR2_MSEL_MCLK1 BIT(26)
index 467edd9..e5ff61c 100644 (file)
@@ -314,7 +314,7 @@ int asoc_simple_startup(struct snd_pcm_substream *substream)
                }
                ret = snd_pcm_hw_constraint_minmax(substream->runtime, SNDRV_PCM_HW_PARAM_RATE,
                        fixed_rate, fixed_rate);
-               if (ret)
+               if (ret < 0)
                        goto codec_err;
        }
 
index 6f044cc..5a5e4ec 100644 (file)
@@ -416,6 +416,7 @@ static int __simple_for_each_link(struct asoc_simple_priv *priv,
 
                        if (ret < 0) {
                                of_node_put(codec);
+                               of_node_put(plat);
                                of_node_put(np);
                                goto error;
                        }
index 02683dc..1860099 100644 (file)
@@ -169,6 +169,7 @@ static bool apl_lp_streaming(struct avs_dev *adev)
 {
        struct avs_path *path;
 
+       spin_lock(&adev->path_list_lock);
        /* Any gateway without buffer allocated in LP area disqualifies D0IX. */
        list_for_each_entry(path, &adev->path_list, node) {
                struct avs_path_pipeline *ppl;
@@ -188,11 +189,14 @@ static bool apl_lp_streaming(struct avs_dev *adev)
                                if (cfg->copier.dma_type == INVALID_OBJECT_ID)
                                        continue;
 
-                               if (!mod->gtw_attrs.lp_buffer_alloc)
+                               if (!mod->gtw_attrs.lp_buffer_alloc) {
+                                       spin_unlock(&adev->path_list_lock);
                                        return false;
+                               }
                        }
                }
        }
+       spin_unlock(&adev->path_list_lock);
 
        return true;
 }
index d7fccdc..0cf38c9 100644 (file)
@@ -283,8 +283,8 @@ void avs_release_firmwares(struct avs_dev *adev);
 
 int avs_dsp_init_module(struct avs_dev *adev, u16 module_id, u8 ppl_instance_id,
                        u8 core_id, u8 domain, void *param, u32 param_size,
-                       u16 *instance_id);
-void avs_dsp_delete_module(struct avs_dev *adev, u16 module_id, u16 instance_id,
+                       u8 *instance_id);
+void avs_dsp_delete_module(struct avs_dev *adev, u16 module_id, u8 instance_id,
                           u8 ppl_instance_id, u8 core_id);
 int avs_dsp_create_pipeline(struct avs_dev *adev, u16 req_size, u8 priority,
                            bool lp, u16 attributes, u8 *instance_id);
index b2823c2..60f8fb0 100644 (file)
@@ -443,7 +443,7 @@ static int avs_register_i2s_boards(struct avs_dev *adev)
        }
 
        for (mach = boards->machs; mach->id[0]; mach++) {
-               if (!acpi_dev_present(mach->id, NULL, -1))
+               if (!acpi_dev_present(mach->id, mach->uid, -1))
                        continue;
 
                if (mach->machine_quirk)
index a8b14b7..3dfa2e9 100644 (file)
@@ -21,17 +21,25 @@ static struct avs_dev *avs_get_kcontrol_adev(struct snd_kcontrol *kcontrol)
        return to_avs_dev(w->dapm->component->dev);
 }
 
-static struct avs_path_module *avs_get_kcontrol_module(struct avs_dev *adev, u32 id)
+static struct avs_path_module *avs_get_volume_module(struct avs_dev *adev, u32 id)
 {
        struct avs_path *path;
        struct avs_path_pipeline *ppl;
        struct avs_path_module *mod;
 
-       list_for_each_entry(path, &adev->path_list, node)
-               list_for_each_entry(ppl, &path->ppl_list, node)
-                       list_for_each_entry(mod, &ppl->mod_list, node)
-                               if (mod->template->ctl_id && mod->template->ctl_id == id)
+       spin_lock(&adev->path_list_lock);
+       list_for_each_entry(path, &adev->path_list, node) {
+               list_for_each_entry(ppl, &path->ppl_list, node) {
+                       list_for_each_entry(mod, &ppl->mod_list, node) {
+                               if (guid_equal(&mod->template->cfg_ext->type, &AVS_PEAKVOL_MOD_UUID)
+                                   && mod->template->ctl_id == id) {
+                                       spin_unlock(&adev->path_list_lock);
                                        return mod;
+                               }
+                       }
+               }
+       }
+       spin_unlock(&adev->path_list_lock);
 
        return NULL;
 }
@@ -49,7 +57,7 @@ int avs_control_volume_get(struct snd_kcontrol *kcontrol, struct snd_ctl_elem_va
        /* prevent access to modules while path is being constructed */
        mutex_lock(&adev->path_mutex);
 
-       active_module = avs_get_kcontrol_module(adev, ctl_data->id);
+       active_module = avs_get_volume_module(adev, ctl_data->id);
        if (active_module) {
                ret = avs_ipc_peakvol_get_volume(adev, active_module->module_id,
                                                 active_module->instance_id, &dspvols,
@@ -89,7 +97,7 @@ int avs_control_volume_put(struct snd_kcontrol *kcontrol, struct snd_ctl_elem_va
                changed = 1;
        }
 
-       active_module = avs_get_kcontrol_module(adev, ctl_data->id);
+       active_module = avs_get_volume_module(adev, ctl_data->id);
        if (active_module) {
                dspvol.channel_id = AVS_ALL_CHANNELS_MASK;
                dspvol.target_volume = *volume;
index b881100..aa03af4 100644 (file)
@@ -225,7 +225,7 @@ err:
 
 int avs_dsp_init_module(struct avs_dev *adev, u16 module_id, u8 ppl_instance_id,
                        u8 core_id, u8 domain, void *param, u32 param_size,
-                       u16 *instance_id)
+                       u8 *instance_id)
 {
        struct avs_module_entry mentry;
        bool was_loaded = false;
@@ -272,7 +272,7 @@ err_mod_entry:
        return ret;
 }
 
-void avs_dsp_delete_module(struct avs_dev *adev, u16 module_id, u16 instance_id,
+void avs_dsp_delete_module(struct avs_dev *adev, u16 module_id, u8 instance_id,
                           u8 ppl_instance_id, u8 core_id)
 {
        struct avs_module_entry mentry;
index d3b60ae..7f23a30 100644 (file)
@@ -619,7 +619,7 @@ enum avs_channel_config {
        AVS_CHANNEL_CONFIG_DUAL_MONO = 9,
        AVS_CHANNEL_CONFIG_I2S_DUAL_STEREO_0 = 10,
        AVS_CHANNEL_CONFIG_I2S_DUAL_STEREO_1 = 11,
-       AVS_CHANNEL_CONFIG_4_CHANNEL = 12,
+       AVS_CHANNEL_CONFIG_7_1 = 12,
        AVS_CHANNEL_CONFIG_INVALID
 };
 
index 197222c..657f7b0 100644 (file)
@@ -37,7 +37,7 @@ struct avs_path_pipeline {
 
 struct avs_path_module {
        u16 module_id;
-       u16 instance_id;
+       u8 instance_id;
        union avs_gtw_attributes gtw_attrs;
 
        struct avs_tplg_module *template;
index 31c032a..1fbb2c2 100644 (file)
@@ -468,21 +468,34 @@ static int avs_dai_fe_startup(struct snd_pcm_substream *substream, struct snd_so
 
        host_stream = snd_hdac_ext_stream_assign(bus, substream, HDAC_EXT_STREAM_TYPE_HOST);
        if (!host_stream) {
-               kfree(data);
-               return -EBUSY;
+               ret = -EBUSY;
+               goto err;
        }
 
        data->host_stream = host_stream;
-       snd_pcm_hw_constraint_integer(runtime, SNDRV_PCM_HW_PARAM_PERIODS);
+       ret = snd_pcm_hw_constraint_integer(runtime, SNDRV_PCM_HW_PARAM_PERIODS);
+       if (ret < 0)
+               goto err;
+
        /* avoid wrap-around with wall-clock */
-       snd_pcm_hw_constraint_minmax(runtime, SNDRV_PCM_HW_PARAM_BUFFER_TIME, 20, 178000000);
-       snd_pcm_hw_constraint_list(runtime, 0, SNDRV_PCM_HW_PARAM_RATE, &hw_rates);
+       ret = snd_pcm_hw_constraint_minmax(runtime, SNDRV_PCM_HW_PARAM_BUFFER_TIME, 20, 178000000);
+       if (ret < 0)
+               goto err;
+
+       ret = snd_pcm_hw_constraint_list(runtime, 0, SNDRV_PCM_HW_PARAM_RATE, &hw_rates);
+       if (ret < 0)
+               goto err;
+
        snd_pcm_set_sync(substream);
 
        dev_dbg(dai->dev, "%s fe STARTUP tag %d str %p",
                __func__, hdac_stream(host_stream)->stream_tag, substream);
 
        return 0;
+
+err:
+       kfree(data);
+       return ret;
 }
 
 static void avs_dai_fe_shutdown(struct snd_pcm_substream *substream, struct snd_soc_dai *dai)
index 70a9420..2759282 100644 (file)
@@ -18,7 +18,7 @@ static int avs_dsp_init_probe(struct avs_dev *adev, union avs_connector_node_id
 {
        struct avs_probe_cfg cfg = {{0}};
        struct avs_module_entry mentry;
-       u16 dummy;
+       u8 dummy;
 
        avs_get_module_entry(adev, &AVS_PROBE_MOD_UUID, &mentry);
 
index 6faf4a4..144f082 100644 (file)
@@ -1347,7 +1347,7 @@ static int sof_card_dai_links_create(struct device *dev,
                                if ((SDW_PART_ID(adr_link->adr_d[i].adr) !=
                                    SDW_PART_ID(adr_link->adr_d[j].adr)) ||
                                    (SDW_MFG_ID(adr_link->adr_d[i].adr) !=
-                                   SDW_MFG_ID(adr_link->adr_d[i].adr))) {
+                                   SDW_MFG_ID(adr_link->adr_d[j].adr))) {
                                        append_codec_type = true;
                                        goto out;
                                }
index 6d9cfe0..d0f6c94 100644 (file)
@@ -218,18 +218,48 @@ static int jz4740_i2s_set_fmt(struct snd_soc_dai *dai, unsigned int fmt)
        return 0;
 }
 
+static int jz4740_i2s_get_i2sdiv(unsigned long mclk, unsigned long rate,
+                                unsigned long i2sdiv_max)
+{
+       unsigned long div, rate1, rate2, err1, err2;
+
+       div = mclk / (64 * rate);
+       if (div == 0)
+               div = 1;
+
+       rate1 = mclk / (64 * div);
+       rate2 = mclk / (64 * (div + 1));
+
+       err1 = abs(rate1 - rate);
+       err2 = abs(rate2 - rate);
+
+       /*
+        * Choose the divider that produces the smallest error in the
+        * output rate and reject dividers with a 5% or higher error.
+        * In the event that both dividers are outside the acceptable
+        * error margin, reject the rate to prevent distorted audio.
+        * (The number 5% is arbitrary.)
+        */
+       if (div <= i2sdiv_max && err1 <= err2 && err1 < rate/20)
+               return div;
+       if (div < i2sdiv_max && err2 < rate/20)
+               return div + 1;
+
+       return -EINVAL;
+}
+
 static int jz4740_i2s_hw_params(struct snd_pcm_substream *substream,
        struct snd_pcm_hw_params *params, struct snd_soc_dai *dai)
 {
        struct jz4740_i2s *i2s = snd_soc_dai_get_drvdata(dai);
        struct regmap_field *div_field;
+       unsigned long i2sdiv_max;
        unsigned int sample_size;
-       uint32_t ctrl;
-       int div;
+       uint32_t ctrl, conf;
+       int div = 1;
 
        regmap_read(i2s->regmap, JZ_REG_AIC_CTRL, &ctrl);
-
-       div = clk_get_rate(i2s->clk_i2s) / (64 * params_rate(params));
+       regmap_read(i2s->regmap, JZ_REG_AIC_CONF, &conf);
 
        switch (params_format(params)) {
        case SNDRV_PCM_FORMAT_S8:
@@ -258,11 +288,27 @@ static int jz4740_i2s_hw_params(struct snd_pcm_substream *substream,
                        ctrl &= ~JZ_AIC_CTRL_MONO_TO_STEREO;
 
                div_field = i2s->field_i2sdiv_playback;
+               i2sdiv_max = GENMASK(i2s->soc_info->field_i2sdiv_playback.msb,
+                                    i2s->soc_info->field_i2sdiv_playback.lsb);
        } else {
                ctrl &= ~JZ_AIC_CTRL_INPUT_SAMPLE_SIZE;
                ctrl |= FIELD_PREP(JZ_AIC_CTRL_INPUT_SAMPLE_SIZE, sample_size);
 
                div_field = i2s->field_i2sdiv_capture;
+               i2sdiv_max = GENMASK(i2s->soc_info->field_i2sdiv_capture.msb,
+                                    i2s->soc_info->field_i2sdiv_capture.lsb);
+       }
+
+       /*
+        * Only calculate I2SDIV if we're supplying the bit or frame clock.
+        * If the codec is supplying both clocks then the divider output is
+        * unused, and we don't want it to limit the allowed sample rates.
+        */
+       if (conf & (JZ_AIC_CONF_BIT_CLK_MASTER | JZ_AIC_CONF_SYNC_CLK_MASTER)) {
+               div = jz4740_i2s_get_i2sdiv(clk_get_rate(i2s->clk_i2s),
+                                           params_rate(params), i2sdiv_max);
+               if (div < 0)
+                       return div;
        }
 
        regmap_write(i2s->regmap, JZ_REG_AIC_CTRL, ctrl);
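
The divider selection added above can be checked by hand; the following standalone sketch applies the same rule outside the driver (the 12.288 MHz master clock, the 44.1 kHz rate and the divider limit of 511 below are illustrative values, not taken from the patch):

#include <stdio.h>

/* Same selection rule as jz4740_i2s_get_i2sdiv(): try div and div + 1,
 * keep whichever gives the smaller rate error, and reject errors of 5%
 * or more of the requested rate. */
static long pick_i2sdiv(unsigned long mclk, unsigned long rate,
			unsigned long div_max)
{
	unsigned long div = mclk / (64 * rate);
	unsigned long rate1, rate2, err1, err2;

	if (div == 0)
		div = 1;

	rate1 = mclk / (64 * div);
	rate2 = mclk / (64 * (div + 1));
	err1 = rate1 > rate ? rate1 - rate : rate - rate1;
	err2 = rate2 > rate ? rate2 - rate : rate - rate2;

	if (div <= div_max && err1 <= err2 && err1 < rate / 20)
		return div;
	if (div < div_max && err2 < rate / 20)
		return div + 1;
	return -1;
}

int main(void)
{
	/* 12288000 / (64 * 44100) truncates to 4: div 4 yields 48000 Hz
	 * (error 3900), div 5 yields 38400 Hz (error 5700); both exceed
	 * 44100 / 20 = 2205, so the rate is rejected, matching the
	 * -EINVAL path in the driver. */
	printf("%ld\n", pick_i2sdiv(12288000, 44100, 511));
	return 0;
}
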
index a6b4f29..539e3a0 100644 (file)
@@ -644,9 +644,3 @@ int mt8186_init_clock(struct mtk_base_afe *afe)
 
        return 0;
 }
-
-void mt8186_deinit_clock(void *priv)
-{
-       struct mtk_base_afe *afe = priv;
-       mt8186_audsys_clk_unregister(afe);
-}
index d598871..a9d59e5 100644 (file)
@@ -81,7 +81,6 @@ enum {
 struct mtk_base_afe;
 int mt8186_set_audio_int_bus_parent(struct mtk_base_afe *afe, int clk_id);
 int mt8186_init_clock(struct mtk_base_afe *afe);
-void mt8186_deinit_clock(void *priv);
 int mt8186_afe_enable_cgs(struct mtk_base_afe *afe);
 void mt8186_afe_disable_cgs(struct mtk_base_afe *afe);
 int mt8186_afe_enable_clock(struct mtk_base_afe *afe);
index 41172a8..a868a04 100644 (file)
@@ -2848,10 +2848,6 @@ static int mt8186_afe_pcm_dev_probe(struct platform_device *pdev)
                return ret;
        }
 
-       ret = devm_add_action_or_reset(dev, mt8186_deinit_clock, (void *)afe);
-       if (ret)
-               return ret;
-
        /* init memif */
        afe->memif_32bit_supported = 0;
        afe->memif_size = MT8186_MEMIF_NUM;
index 578969c..5666be6 100644 (file)
@@ -84,6 +84,29 @@ static const struct afe_gate aud_clks[CLK_AUD_NR_CLK] = {
        GATE_AUD2(CLK_AUD_ETDM_OUT1_BCLK, "aud_etdm_out1_bclk", "top_audio", 24),
 };
 
+static void mt8186_audsys_clk_unregister(void *data)
+{
+       struct mtk_base_afe *afe = data;
+       struct mt8186_afe_private *afe_priv = afe->platform_priv;
+       struct clk *clk;
+       struct clk_lookup *cl;
+       int i;
+
+       if (!afe_priv)
+               return;
+
+       for (i = 0; i < CLK_AUD_NR_CLK; i++) {
+               cl = afe_priv->lookup[i];
+               if (!cl)
+                       continue;
+
+               clk = cl->clk;
+               clk_unregister_gate(clk);
+
+               clkdev_drop(cl);
+       }
+}
+
 int mt8186_audsys_clk_register(struct mtk_base_afe *afe)
 {
        struct mt8186_afe_private *afe_priv = afe->platform_priv;
@@ -124,27 +147,6 @@ int mt8186_audsys_clk_register(struct mtk_base_afe *afe)
                afe_priv->lookup[i] = cl;
        }
 
-       return 0;
+       return devm_add_action_or_reset(afe->dev, mt8186_audsys_clk_unregister, afe);
 }
 
-void mt8186_audsys_clk_unregister(struct mtk_base_afe *afe)
-{
-       struct mt8186_afe_private *afe_priv = afe->platform_priv;
-       struct clk *clk;
-       struct clk_lookup *cl;
-       int i;
-
-       if (!afe_priv)
-               return;
-
-       for (i = 0; i < CLK_AUD_NR_CLK; i++) {
-               cl = afe_priv->lookup[i];
-               if (!cl)
-                       continue;
-
-               clk = cl->clk;
-               clk_unregister_gate(clk);
-
-               clkdev_drop(cl);
-       }
-}
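
This change, and the matching mt8188/mt8195 changes further down, all follow the same pattern: instead of exporting a *_clk_unregister() helper that the platform driver must remember to call, the teardown logic becomes a static callback handed to devm_add_action_or_reset(), so it runs automatically (in reverse registration order) on probe failure or driver unbind. A minimal sketch of the pattern, using a hypothetical my_widget resource rather than anything from the patch:

#include <linux/device.h>
#include <linux/slab.h>

struct my_widget { int id; };	/* stand-in resource, not from the patch */

static void my_widget_unregister(void *data)
{
	struct my_widget *w = data;

	kfree(w);	/* teardown counterpart of the allocation below */
}

static int my_widget_register(struct device *dev)
{
	struct my_widget *w = kzalloc(sizeof(*w), GFP_KERNEL);

	if (!w)
		return -ENOMEM;

	/* If registering the action fails, it is invoked immediately
	 * (freeing w); otherwise devres calls it on probe failure or
	 * unbind, so no explicit unregister export is needed. */
	return devm_add_action_or_reset(dev, my_widget_unregister, w);
}
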
index b8d6a06..897a291 100644 (file)
@@ -10,6 +10,5 @@
 #define _MT8186_AUDSYS_CLK_H_
 
 int mt8186_audsys_clk_register(struct mtk_base_afe *afe);
-void mt8186_audsys_clk_unregister(struct mtk_base_afe *afe);
 
 #endif
index 743d6a1..0fb9751 100644 (file)
@@ -418,13 +418,6 @@ int mt8188_afe_init_clock(struct mtk_base_afe *afe)
        return 0;
 }
 
-void mt8188_afe_deinit_clock(void *priv)
-{
-       struct mtk_base_afe *afe = priv;
-
-       mt8188_audsys_clk_unregister(afe);
-}
-
 int mt8188_afe_enable_clk(struct mtk_base_afe *afe, struct clk *clk)
 {
        int ret;
index 084fdfb..a4203a8 100644 (file)
@@ -100,7 +100,6 @@ int mt8188_afe_get_mclk_source_clk_id(int sel);
 int mt8188_afe_get_mclk_source_rate(struct mtk_base_afe *afe, int apll);
 int mt8188_afe_get_default_mclk_source_by_rate(int rate);
 int mt8188_afe_init_clock(struct mtk_base_afe *afe);
-void mt8188_afe_deinit_clock(void *priv);
 int mt8188_afe_enable_clk(struct mtk_base_afe *afe, struct clk *clk);
 void mt8188_afe_disable_clk(struct mtk_base_afe *afe, struct clk *clk);
 int mt8188_afe_set_clk_rate(struct mtk_base_afe *afe, struct clk *clk,
index e5f9373..bcf7025 100644 (file)
@@ -3185,10 +3185,6 @@ static int mt8188_afe_pcm_dev_probe(struct platform_device *pdev)
        if (ret)
                return dev_err_probe(dev, ret, "init clock error");
 
-       ret = devm_add_action_or_reset(dev, mt8188_afe_deinit_clock, (void *)afe);
-       if (ret)
-               return ret;
-
        spin_lock_init(&afe_priv->afe_ctrl_lock);
 
        mutex_init(&afe->irq_alloc_lock);
index be1c53b..c796ad8 100644 (file)
@@ -138,6 +138,29 @@ static const struct afe_gate aud_clks[CLK_AUD_NR_CLK] = {
        GATE_AUD6(CLK_AUD_GASRC11, "aud_gasrc11", "top_asm_h", 11),
 };
 
+static void mt8188_audsys_clk_unregister(void *data)
+{
+       struct mtk_base_afe *afe = data;
+       struct mt8188_afe_private *afe_priv = afe->platform_priv;
+       struct clk *clk;
+       struct clk_lookup *cl;
+       int i;
+
+       if (!afe_priv)
+               return;
+
+       for (i = 0; i < CLK_AUD_NR_CLK; i++) {
+               cl = afe_priv->lookup[i];
+               if (!cl)
+                       continue;
+
+               clk = cl->clk;
+               clk_unregister_gate(clk);
+
+               clkdev_drop(cl);
+       }
+}
+
 int mt8188_audsys_clk_register(struct mtk_base_afe *afe)
 {
        struct mt8188_afe_private *afe_priv = afe->platform_priv;
@@ -179,27 +202,5 @@ int mt8188_audsys_clk_register(struct mtk_base_afe *afe)
                afe_priv->lookup[i] = cl;
        }
 
-       return 0;
-}
-
-void mt8188_audsys_clk_unregister(struct mtk_base_afe *afe)
-{
-       struct mt8188_afe_private *afe_priv = afe->platform_priv;
-       struct clk *clk;
-       struct clk_lookup *cl;
-       int i;
-
-       if (!afe_priv)
-               return;
-
-       for (i = 0; i < CLK_AUD_NR_CLK; i++) {
-               cl = afe_priv->lookup[i];
-               if (!cl)
-                       continue;
-
-               clk = cl->clk;
-               clk_unregister_gate(clk);
-
-               clkdev_drop(cl);
-       }
+       return devm_add_action_or_reset(afe->dev, mt8188_audsys_clk_unregister, afe);
 }
index 6c5f463..45b0948 100644 (file)
@@ -10,6 +10,5 @@
 #define _MT8188_AUDSYS_CLK_H_
 
 int mt8188_audsys_clk_register(struct mtk_base_afe *afe);
-void mt8188_audsys_clk_unregister(struct mtk_base_afe *afe);
 
 #endif
index 9ca2cb8..f35318a 100644 (file)
@@ -410,11 +410,6 @@ int mt8195_afe_init_clock(struct mtk_base_afe *afe)
        return 0;
 }
 
-void mt8195_afe_deinit_clock(struct mtk_base_afe *afe)
-{
-       mt8195_audsys_clk_unregister(afe);
-}
-
 int mt8195_afe_enable_clk(struct mtk_base_afe *afe, struct clk *clk)
 {
        int ret;
index 40663e3..a08c0ee 100644 (file)
@@ -101,7 +101,6 @@ int mt8195_afe_get_mclk_source_clk_id(int sel);
 int mt8195_afe_get_mclk_source_rate(struct mtk_base_afe *afe, int apll);
 int mt8195_afe_get_default_mclk_source_by_rate(int rate);
 int mt8195_afe_init_clock(struct mtk_base_afe *afe);
-void mt8195_afe_deinit_clock(struct mtk_base_afe *afe);
 int mt8195_afe_enable_clk(struct mtk_base_afe *afe, struct clk *clk);
 void mt8195_afe_disable_clk(struct mtk_base_afe *afe, struct clk *clk);
 int mt8195_afe_prepare_clk(struct mtk_base_afe *afe, struct clk *clk);
index 9e45efe..03dabc0 100644 (file)
@@ -3255,15 +3255,11 @@ err_pm_put:
 
 static void mt8195_afe_pcm_dev_remove(struct platform_device *pdev)
 {
-       struct mtk_base_afe *afe = platform_get_drvdata(pdev);
-
        snd_soc_unregister_component(&pdev->dev);
 
        pm_runtime_disable(&pdev->dev);
        if (!pm_runtime_status_suspended(&pdev->dev))
                mt8195_afe_runtime_suspend(&pdev->dev);
-
-       mt8195_afe_deinit_clock(afe);
 }
 
 static const struct of_device_id mt8195_afe_pcm_dt_match[] = {
index e0670e0..38594bc 100644 (file)
@@ -148,6 +148,29 @@ static const struct afe_gate aud_clks[CLK_AUD_NR_CLK] = {
        GATE_AUD6(CLK_AUD_GASRC19, "aud_gasrc19", "top_asm_h", 19),
 };
 
+static void mt8195_audsys_clk_unregister(void *data)
+{
+       struct mtk_base_afe *afe = data;
+       struct mt8195_afe_private *afe_priv = afe->platform_priv;
+       struct clk *clk;
+       struct clk_lookup *cl;
+       int i;
+
+       if (!afe_priv)
+               return;
+
+       for (i = 0; i < CLK_AUD_NR_CLK; i++) {
+               cl = afe_priv->lookup[i];
+               if (!cl)
+                       continue;
+
+               clk = cl->clk;
+               clk_unregister_gate(clk);
+
+               clkdev_drop(cl);
+       }
+}
+
 int mt8195_audsys_clk_register(struct mtk_base_afe *afe)
 {
        struct mt8195_afe_private *afe_priv = afe->platform_priv;
@@ -188,27 +211,5 @@ int mt8195_audsys_clk_register(struct mtk_base_afe *afe)
                afe_priv->lookup[i] = cl;
        }
 
-       return 0;
-}
-
-void mt8195_audsys_clk_unregister(struct mtk_base_afe *afe)
-{
-       struct mt8195_afe_private *afe_priv = afe->platform_priv;
-       struct clk *clk;
-       struct clk_lookup *cl;
-       int i;
-
-       if (!afe_priv)
-               return;
-
-       for (i = 0; i < CLK_AUD_NR_CLK; i++) {
-               cl = afe_priv->lookup[i];
-               if (!cl)
-                       continue;
-
-               clk = cl->clk;
-               clk_unregister_gate(clk);
-
-               clkdev_drop(cl);
-       }
+       return devm_add_action_or_reset(afe->dev, mt8195_audsys_clk_unregister, afe);
 }
index 239d310..69db2dd 100644 (file)
@@ -10,6 +10,5 @@
 #define _MT8195_AUDSYS_CLK_H_
 
 int mt8195_audsys_clk_register(struct mtk_base_afe *afe);
-void mt8195_audsys_clk_unregister(struct mtk_base_afe *afe);
 
 #endif
index adb69d7..4fb1ac8 100644 (file)
@@ -2405,6 +2405,9 @@ int dpcm_be_dai_prepare(struct snd_soc_pcm_runtime *fe, int stream)
                if (!snd_soc_dpcm_be_can_update(fe, be, stream))
                        continue;
 
+               if (!snd_soc_dpcm_can_be_prepared(fe, be, stream))
+                       continue;
+
                if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_HW_PARAMS) &&
                    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_STOP) &&
                    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_SUSPEND) &&
@@ -3042,3 +3045,20 @@ int snd_soc_dpcm_can_be_params(struct snd_soc_pcm_runtime *fe,
        return snd_soc_dpcm_check_state(fe, be, stream, state, ARRAY_SIZE(state));
 }
 EXPORT_SYMBOL_GPL(snd_soc_dpcm_can_be_params);
+
+/*
+ * We can only prepare a BE DAI if none of its FEs are prepared,
+ * running or paused for the specified stream direction.
+ */
+int snd_soc_dpcm_can_be_prepared(struct snd_soc_pcm_runtime *fe,
+                                struct snd_soc_pcm_runtime *be, int stream)
+{
+       const enum snd_soc_dpcm_state state[] = {
+               SND_SOC_DPCM_STATE_START,
+               SND_SOC_DPCM_STATE_PAUSED,
+               SND_SOC_DPCM_STATE_PREPARE,
+       };
+
+       return snd_soc_dpcm_check_state(fe, be, stream, state, ARRAY_SIZE(state));
+}
+EXPORT_SYMBOL_GPL(snd_soc_dpcm_can_be_prepared);
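
With this helper in place, dpcm_be_dai_prepare() (see the hunk above) skips a back end whose other front ends are still in the START, PAUSED or PREPARE state for that stream direction, so a BE shared by several FEs is not re-prepared underneath an already active stream; that is the reading suggested by the state list and by the existing snd_soc_dpcm_can_be_params() helper it mirrors, not a claim taken from the patch description.
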
index 4e0c48a..749e856 100644 (file)
@@ -209,7 +209,12 @@ int acp_sof_ipc_msg_data(struct snd_sof_dev *sdev, struct snd_sof_pcm_stream *sp
                acp_mailbox_read(sdev, offset, p, sz);
        } else {
                struct snd_pcm_substream *substream = sps->substream;
-               struct acp_dsp_stream *stream = substream->runtime->private_data;
+               struct acp_dsp_stream *stream;
+
+               if (!substream || !substream->runtime)
+                       return -ESTRPIPE;
+
+               stream = substream->runtime->private_data;
 
                if (!stream)
                        return -ESTRPIPE;
index b42b598..d547318 100644 (file)
@@ -438,8 +438,8 @@ void snd_sof_handle_fw_exception(struct snd_sof_dev *sdev, const char *msg)
                /* should we prevent DSP entering D3 ? */
                if (!sdev->ipc_dump_printed)
                        dev_info(sdev->dev,
-                                "preventing DSP entering D3 state to preserve context\n");
-               pm_runtime_get_noresume(sdev->dev);
+                                "Attempting to prevent DSP from entering D3 state to preserve context\n");
+               pm_runtime_get_if_in_use(sdev->dev);
        }
 
        /* dump vital information to the logs */
index 775582a..b7cbf66 100644 (file)
@@ -19,6 +19,9 @@
 
 #if IS_ENABLED(CONFIG_SND_SOC_SOF_HDA_MLINK)
 
+/* worst-case number of sublinks is used for sublink refcount array allocation only */
+#define HDAML_MAX_SUBLINKS (AZX_ML_LCTL_CPA_SHIFT - AZX_ML_LCTL_SPA_SHIFT)
+
 /**
  * struct hdac_ext2_link - HDAudio extended+alternate link
  *
@@ -33,6 +36,7 @@
  * @leptr:             extended link pointer
  * @eml_lock:          mutual exclusion to access shared registers e.g. CPA/SPA bits
  * in LCTL register
+ * @sublink_ref_count: array of refcounts, required to power-manage sublinks independently
  * @base_ptr:          pointer to shim/ip/shim_vs space
  * @instance_offset:   offset between each of @slcount instances managed by link
  * @shim_offset:       offset to SHIM register base
@@ -53,6 +57,7 @@ struct hdac_ext2_link {
        u32 leptr;
 
        struct mutex eml_lock; /* prevent concurrent access to e.g. CPA/SPA */
+       int sublink_ref_count[HDAML_MAX_SUBLINKS];
 
        /* internal values computed from LCAP contents */
        void __iomem *base_ptr;
@@ -68,6 +73,7 @@ struct hdac_ext2_link {
 #define AZX_REG_SDW_SHIM_OFFSET                                0x0
 #define AZX_REG_SDW_IP_OFFSET                          0x100
 #define AZX_REG_SDW_VS_SHIM_OFFSET                     0x6000
+#define AZX_REG_SDW_SHIM_PCMSyCM(y)                    (0x16 + 0x4 * (y))
 
 /* only one instance supported */
 #define AZX_REG_INTEL_DMIC_SHIM_OFFSET                 0x0
@@ -91,7 +97,7 @@ struct hdac_ext2_link {
  */
 
 static int hdaml_lnk_enum(struct device *dev, struct hdac_ext2_link *h2link,
-                         void __iomem *ml_addr, int link_idx)
+                         void __iomem *remap_addr, void __iomem *ml_addr, int link_idx)
 {
        struct hdac_ext_link *hlink = &h2link->hext_link;
        u32 base_offset;
@@ -126,15 +132,16 @@ static int hdaml_lnk_enum(struct device *dev, struct hdac_ext2_link *h2link,
                link_idx, h2link->slcount);
 
        /* find IP ID and offsets */
-       h2link->leptr = readl(hlink->ml_addr + AZX_REG_ML_LEPTR);
+       h2link->leptr = readl(ml_addr + AZX_REG_ML_LEPTR);
 
        h2link->elid = FIELD_GET(AZX_REG_ML_LEPTR_ID, h2link->leptr);
 
        base_offset = FIELD_GET(AZX_REG_ML_LEPTR_PTR, h2link->leptr);
-       h2link->base_ptr = hlink->ml_addr + base_offset;
+       h2link->base_ptr = remap_addr + base_offset;
 
        switch (h2link->elid) {
        case AZX_REG_ML_LEPTR_ID_SDW:
+               h2link->instance_offset = AZX_REG_SDW_INSTANCE_OFFSET;
                h2link->shim_offset = AZX_REG_SDW_SHIM_OFFSET;
                h2link->ip_offset = AZX_REG_SDW_IP_OFFSET;
                h2link->shim_vs_offset = AZX_REG_SDW_VS_SHIM_OFFSET;
@@ -149,6 +156,7 @@ static int hdaml_lnk_enum(struct device *dev, struct hdac_ext2_link *h2link,
                        link_idx, base_offset);
                break;
        case AZX_REG_ML_LEPTR_ID_INTEL_SSP:
+               h2link->instance_offset = AZX_REG_INTEL_SSP_INSTANCE_OFFSET;
                h2link->shim_offset = AZX_REG_INTEL_SSP_SHIM_OFFSET;
                h2link->ip_offset = AZX_REG_INTEL_SSP_IP_OFFSET;
                h2link->shim_vs_offset = AZX_REG_INTEL_SSP_VS_SHIM_OFFSET;
@@ -333,6 +341,21 @@ static void hdaml_link_set_lsdiid(u32 __iomem *lsdiid, int dev_num)
        writel(val, lsdiid);
 }
 
+static void hdaml_shim_map_stream_ch(u16 __iomem *pcmsycm, int lchan, int hchan,
+                                    int stream_id, int dir)
+{
+       u16 val;
+
+       val = readw(pcmsycm);
+
+       u16p_replace_bits(&val, lchan, GENMASK(3, 0));
+       u16p_replace_bits(&val, hchan, GENMASK(7, 4));
+       u16p_replace_bits(&val, stream_id, GENMASK(13, 8));
+       u16p_replace_bits(&val, dir, BIT(15));
+
+       writew(val, pcmsycm);
+}
+
 static void hdaml_lctl_offload_enable(u32 __iomem *lctl, bool enable)
 {
        u32 val = readl(lctl);
@@ -364,7 +387,7 @@ static int hda_ml_alloc_h2link(struct hdac_bus *bus, int index)
        hlink->bus = bus;
        hlink->ml_addr = bus->mlcap + AZX_ML_BASE + (AZX_ML_INTERVAL * index);
 
-       ret = hdaml_lnk_enum(bus->dev, h2link, hlink->ml_addr, index);
+       ret = hdaml_lnk_enum(bus->dev, h2link, bus->remap_addr, hlink->ml_addr, index);
        if (ret < 0) {
                kfree(h2link);
                return ret;
@@ -641,8 +664,13 @@ static int hdac_bus_eml_power_up_base(struct hdac_bus *bus, bool alt, int elid,
        if (eml_lock)
                mutex_lock(&h2link->eml_lock);
 
-       if (++hlink->ref_count > 1)
-               goto skip_init;
+       if (!alt) {
+               if (++hlink->ref_count > 1)
+                       goto skip_init;
+       } else {
+               if (++h2link->sublink_ref_count[sublink] > 1)
+                       goto skip_init;
+       }
 
        ret = hdaml_link_init(hlink->ml_addr + AZX_REG_ML_LCTL, sublink);
 
@@ -684,9 +712,13 @@ static int hdac_bus_eml_power_down_base(struct hdac_bus *bus, bool alt, int elid
        if (eml_lock)
                mutex_lock(&h2link->eml_lock);
 
-       if (--hlink->ref_count > 0)
-               goto skip_shutdown;
-
+       if (!alt) {
+               if (--hlink->ref_count > 0)
+                       goto skip_shutdown;
+       } else {
+               if (--h2link->sublink_ref_count[sublink] > 0)
+                       goto skip_shutdown;
+       }
        ret = hdaml_link_shutdown(hlink->ml_addr + AZX_REG_ML_LCTL, sublink);
 
 skip_shutdown:
@@ -740,6 +772,40 @@ int hdac_bus_eml_sdw_set_lsdiid(struct hdac_bus *bus, int sublink, int dev_num)
        return 0;
 } EXPORT_SYMBOL_NS(hdac_bus_eml_sdw_set_lsdiid, SND_SOC_SOF_HDA_MLINK);
 
+/*
+ * the 'y' parameter comes from the PCMSyCM hardware register naming. 'y' refers to the
+ * PDI index, i.e. the FIFO used for RX or TX
+ */
+int hdac_bus_eml_sdw_map_stream_ch(struct hdac_bus *bus, int sublink, int y,
+                                  int channel_mask, int stream_id, int dir)
+{
+       struct hdac_ext2_link *h2link;
+       u16 __iomem *pcmsycm;
+       u16 val;
+
+       h2link = find_ext2_link(bus, true, AZX_REG_ML_LEPTR_ID_SDW);
+       if (!h2link)
+               return -ENODEV;
+
+       pcmsycm = h2link->base_ptr + h2link->shim_offset +
+               h2link->instance_offset * sublink +
+               AZX_REG_SDW_SHIM_PCMSyCM(y);
+
+       mutex_lock(&h2link->eml_lock);
+
+       hdaml_shim_map_stream_ch(pcmsycm, 0, hweight32(channel_mask),
+                                stream_id, dir);
+
+       mutex_unlock(&h2link->eml_lock);
+
+       val = readw(pcmsycm);
+
+       dev_dbg(bus->dev, "channel_mask %#x stream_id %d dir %d pcmscm %#x\n",
+               channel_mask, stream_id, dir, val);
+
+       return 0;
+} EXPORT_SYMBOL_NS(hdac_bus_eml_sdw_map_stream_ch, SND_SOC_SOF_HDA_MLINK);
+
 void hda_bus_ml_put_all(struct hdac_bus *bus)
 {
        struct hdac_ext_link *hlink;
@@ -836,6 +902,18 @@ struct hdac_ext_link *hdac_bus_eml_dmic_get_hlink(struct hdac_bus *bus)
 }
 EXPORT_SYMBOL_NS(hdac_bus_eml_dmic_get_hlink, SND_SOC_SOF_HDA_MLINK);
 
+struct hdac_ext_link *hdac_bus_eml_sdw_get_hlink(struct hdac_bus *bus)
+{
+       struct hdac_ext2_link *h2link;
+
+       h2link = find_ext2_link(bus, true, AZX_REG_ML_LEPTR_ID_SDW);
+       if (!h2link)
+               return NULL;
+
+       return &h2link->hext_link;
+}
+EXPORT_SYMBOL_NS(hdac_bus_eml_sdw_get_hlink, SND_SOC_SOF_HDA_MLINK);
+
 int hdac_bus_eml_enable_offload(struct hdac_bus *bus, bool alt, int elid, bool enable)
 {
        struct hdac_ext2_link *h2link;
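
For reference, the PCMSyCM word written by hdaml_shim_map_stream_ch() packs four fields whose positions can be read off the GENMASK()/BIT() arguments above; the sketch below reproduces that packing with plain shifts and masks (the channel count, stream tag and direction in main() are made-up example values):

#include <stdint.h>
#include <stdio.h>

static uint16_t pack_pcmsycm(uint16_t old, unsigned int lchan, unsigned int hchan,
			     unsigned int stream_id, unsigned int dir)
{
	uint16_t val = old;

	val = (val & ~0x000fu) | ((lchan     & 0x0fu) << 0);	/* low channel,  bits 3:0  */
	val = (val & ~0x00f0u) | ((hchan     & 0x0fu) << 4);	/* high channel, bits 7:4  */
	val = (val & ~0x3f00u) | ((stream_id & 0x3fu) << 8);	/* stream tag,   bits 13:8 */
	val = (val & ~0x8000u) | ((dir       & 0x01u) << 15);	/* direction,    bit 15    */

	return val;
}

int main(void)
{
	/* as in the caller above: lchan 0, hchan = hweight32(channel_mask);
	 * here a hypothetical 2-channel playback stream with tag 3 */
	printf("0x%04x\n", pack_pcmsycm(0, 0, 2, 3, 0));	/* prints 0x0320 */
	return 0;
}
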
index fc1eb8e..ba4ef29 100644 (file)
@@ -2103,10 +2103,13 @@ static int sof_ipc3_dai_config(struct snd_sof_dev *sdev, struct snd_sof_widget *
         * For the case of PAUSE/HW_FREE, since there are no quirks, flags can be used as is.
         */
 
-       if (flags & SOF_DAI_CONFIG_FLAGS_HW_PARAMS)
+       if (flags & SOF_DAI_CONFIG_FLAGS_HW_PARAMS) {
+               /* Clear stale command */
+               config->flags &= ~SOF_DAI_CONFIG_FLAGS_CMD_MASK;
                config->flags |= flags;
-       else
+       } else {
                config->flags = flags;
+       }
 
        /* only send the IPC if the widget is set up in the DSP */
        if (swidget->use_count > 0) {
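
The masking matters because config->flags carries the current command in a dedicated bit field: if a previous call left, say, the PAUSE command set and HW_PARAMS were simply OR-ed on top, the DSP would receive two commands at once. Clearing SOF_DAI_CONFIG_FLAGS_CMD_MASK first (the reading suggested by the patch's own "Clear stale command" comment) keeps only the new command while preserving the quirk bits that the HW_PARAMS path is allowed to accumulate.
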
index 059eebf..5abe616 100644 (file)
@@ -59,7 +59,7 @@ static const struct sof_topology_token ipc4_in_audio_format_tokens[] = {
                audio_fmt.interleaving_style)},
        {SOF_TKN_CAVS_AUDIO_FORMAT_IN_FMT_CFG, SND_SOC_TPLG_TUPLE_TYPE_WORD, get_token_u32,
                offsetof(struct sof_ipc4_pin_format, audio_fmt.fmt_cfg)},
-       {SOF_TKN_CAVS_AUDIO_FORMAT_PIN_INDEX, SND_SOC_TPLG_TUPLE_TYPE_WORD, get_token_u32,
+       {SOF_TKN_CAVS_AUDIO_FORMAT_INPUT_PIN_INDEX, SND_SOC_TPLG_TUPLE_TYPE_WORD, get_token_u32,
                offsetof(struct sof_ipc4_pin_format, pin_index)},
        {SOF_TKN_CAVS_AUDIO_FORMAT_IBS, SND_SOC_TPLG_TUPLE_TYPE_WORD, get_token_u32,
                offsetof(struct sof_ipc4_pin_format, buffer_size)},
@@ -79,7 +79,7 @@ static const struct sof_topology_token ipc4_out_audio_format_tokens[] = {
                audio_fmt.interleaving_style)},
        {SOF_TKN_CAVS_AUDIO_FORMAT_OUT_FMT_CFG, SND_SOC_TPLG_TUPLE_TYPE_WORD, get_token_u32,
                offsetof(struct sof_ipc4_pin_format, audio_fmt.fmt_cfg)},
-       {SOF_TKN_CAVS_AUDIO_FORMAT_PIN_INDEX, SND_SOC_TPLG_TUPLE_TYPE_WORD, get_token_u32,
+       {SOF_TKN_CAVS_AUDIO_FORMAT_OUTPUT_PIN_INDEX, SND_SOC_TPLG_TUPLE_TYPE_WORD, get_token_u32,
                offsetof(struct sof_ipc4_pin_format, pin_index)},
        {SOF_TKN_CAVS_AUDIO_FORMAT_OBS, SND_SOC_TPLG_TUPLE_TYPE_WORD, get_token_u32,
                offsetof(struct sof_ipc4_pin_format, buffer_size)},
index 567db32..d0ab6f3 100644 (file)
@@ -643,16 +643,17 @@ static int sof_pcm_probe(struct snd_soc_component *component)
                                       "%s/%s",
                                       plat_data->tplg_filename_prefix,
                                       plat_data->tplg_filename);
-       if (!tplg_filename)
-               return -ENOMEM;
+       if (!tplg_filename) {
+               ret = -ENOMEM;
+               goto pm_error;
+       }
 
        ret = snd_sof_load_topology(component, tplg_filename);
-       if (ret < 0) {
+       if (ret < 0)
                dev_err(component->dev, "error: failed to load DSP topology %d\n",
                        ret);
-               return ret;
-       }
 
+pm_error:
        pm_runtime_mark_last_busy(component->dev);
        pm_runtime_put_autosuspend(component->dev);
 
index 2fdbc53..2b23244 100644 (file)
@@ -164,7 +164,7 @@ static int sof_resume(struct device *dev, bool runtime_resume)
                ret = tplg_ops->set_up_all_pipelines(sdev, false);
                if (ret < 0) {
                        dev_err(sdev->dev, "Failed to restore pipeline after resume %d\n", ret);
-                       return ret;
+                       goto setup_fail;
                }
        }
 
@@ -178,6 +178,18 @@ static int sof_resume(struct device *dev, bool runtime_resume)
                        dev_err(sdev->dev, "ctx_restore IPC error during resume: %d\n", ret);
        }
 
+setup_fail:
+#if IS_ENABLED(CONFIG_SND_SOC_SOF_DEBUG_ENABLE_DEBUGFS_CACHE)
+       if (ret < 0) {
+               /*
+                * Debugfs cannot be read in runtime suspend, so cache
+                * the contents upon failure. This allows capturing
+                * possible DSP coredump information.
+                */
+               sof_cache_debugfs(sdev);
+       }
+#endif
+
        return ret;
 }
 
index fff1268..8d9e9d5 100644 (file)
@@ -218,12 +218,7 @@ static ssize_t sof_probes_dfs_points_read(struct file *file, char __user *to,
 
        ret = ipc->points_info(cdev, &desc, &num_desc);
        if (ret < 0)
-               goto exit;
-
-       pm_runtime_mark_last_busy(dev);
-       err = pm_runtime_put_autosuspend(dev);
-       if (err < 0)
-               dev_err_ratelimited(dev, "debugfs read failed to idle %d\n", err);
+               goto pm_error;
 
        for (i = 0; i < num_desc; i++) {
                offset = strlen(buf);
@@ -241,6 +236,13 @@ static ssize_t sof_probes_dfs_points_read(struct file *file, char __user *to,
        ret = simple_read_from_buffer(to, count, ppos, buf, strlen(buf));
 
        kfree(desc);
+
+pm_error:
+       pm_runtime_mark_last_busy(dev);
+       err = pm_runtime_put_autosuspend(dev);
+       if (err < 0)
+               dev_err_ratelimited(dev, "debugfs read failed to idle %d\n", err);
+
 exit:
        kfree(buf);
        return ret;
index d3d536b..f160dc4 100644 (file)
@@ -586,6 +586,10 @@ static int sof_copy_tuples(struct snd_sof_dev *sdev, struct snd_soc_tplg_vendor_
                                if (*num_copied_tuples == tuples_size)
                                        return 0;
                        }
+
+                       /* stop when we've found the required token instances */
+                       if (found == num_tokens * token_instance_num)
+                               return 0;
                }
 
                /* next array */
@@ -1261,7 +1265,7 @@ static int sof_widget_parse_tokens(struct snd_soc_component *scomp, struct snd_s
                if (num_sets > 1) {
                        struct snd_sof_tuple *new_tuples;
 
-                       num_tuples += token_list[object_token_list[i]].count * num_sets;
+                       num_tuples += token_list[object_token_list[i]].count * (num_sets - 1);
                        new_tuples = krealloc(swidget->tuples,
                                              sizeof(*new_tuples) * num_tuples, GFP_KERNEL);
                        if (!new_tuples) {
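
The switch from count * num_sets to count * (num_sets - 1) is presumably because num_tuples already accounts for one instance of each token list before this branch is reached, so only the additional sets need extra room: with count = 4 and num_sets = 3, for example, the reallocation now grows the array by 8 entries instead of 12.
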
index 468c8e7..0b69ceb 100644 (file)
@@ -117,6 +117,9 @@ int tegra_pcm_open(struct snd_soc_component *component,
                return ret;
        }
 
+       /* Set wait time to 500ms by default */
+       substream->wait_time = 500;
+
        return 0;
 }
 EXPORT_SYMBOL_GPL(tegra_pcm_open);
index 4b1c5ba..ab5fed9 100644 (file)
@@ -423,6 +423,7 @@ static int line6_parse_audio_format_rates_quirk(struct snd_usb_audio *chip,
        case USB_ID(0x0e41, 0x4248): /* Line6 Helix >= fw 2.82 */
        case USB_ID(0x0e41, 0x4249): /* Line6 Helix Rack >= fw 2.82 */
        case USB_ID(0x0e41, 0x424a): /* Line6 Helix LT >= fw 2.82 */
+       case USB_ID(0x0e41, 0x424b): /* Line6 Pod Go */
        case USB_ID(0x19f7, 0x0011): /* Rode Rodecaster Pro */
                return set_fixed_rate(fp, 48000, SNDRV_PCM_RATE_48000);
        }
index eec5232..08bf535 100644 (file)
@@ -650,6 +650,10 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
                goto unlock;
        }
 
+       ret = snd_usb_pcm_change_state(subs, UAC3_PD_STATE_D0);
+       if (ret < 0)
+               goto unlock;
+
  again:
        if (subs->sync_endpoint) {
                ret = snd_usb_endpoint_prepare(chip, subs->sync_endpoint);
index 3ecd1ba..6cf55b7 100644 (file)
@@ -2191,6 +2191,8 @@ static const struct usb_audio_quirk_flags_table quirk_flags_table[] = {
                   QUIRK_FLAG_DSD_RAW),
        VENDOR_FLG(0x2ab6, /* T+A devices */
                   QUIRK_FLAG_DSD_RAW),
+       VENDOR_FLG(0x3336, /* HEM devices */
+                  QUIRK_FLAG_DSD_RAW),
        VENDOR_FLG(0x3353, /* Khadas devices */
                   QUIRK_FLAG_DSD_RAW),
        VENDOR_FLG(0x3842, /* EVGA */
index f8129c6..f7ddd73 100644 (file)
@@ -198,6 +198,15 @@ struct kvm_arm_copy_mte_tags {
        __u64 reserved[2];
 };
 
+/*
+ * Counter/Timer offset structure. Describes the virtual/physical offset.
+ * To be used with KVM_ARM_SET_COUNTER_OFFSET.
+ */
+struct kvm_arm_counter_offset {
+       __u64 counter_offset;
+       __u64 reserved;
+};
+
 #define KVM_ARM_TAGS_TO_GUEST          0
 #define KVM_ARM_TAGS_FROM_GUEST                1
 
@@ -372,6 +381,10 @@ enum {
 #endif
 };
 
+/* Device Control API on vm fd */
+#define KVM_ARM_VM_SMCCC_CTRL          0
+#define   KVM_ARM_VM_SMCCC_FILTER      0
+
 /* Device Control API: ARM VGIC */
 #define KVM_DEV_ARM_VGIC_GRP_ADDR      0
 #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS 1
@@ -411,6 +424,8 @@ enum {
 #define KVM_ARM_VCPU_TIMER_CTRL                1
 #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER                0
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER                1
+#define   KVM_ARM_VCPU_TIMER_IRQ_HVTIMER       2
+#define   KVM_ARM_VCPU_TIMER_IRQ_HPTIMER       3
 #define KVM_ARM_VCPU_PVTIME_CTRL       2
 #define   KVM_ARM_VCPU_PVTIME_IPA      0
 
@@ -469,6 +484,27 @@ enum {
 /* run->fail_entry.hardware_entry_failure_reason codes. */
 #define KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED    (1ULL << 0)
 
+enum kvm_smccc_filter_action {
+       KVM_SMCCC_FILTER_HANDLE = 0,
+       KVM_SMCCC_FILTER_DENY,
+       KVM_SMCCC_FILTER_FWD_TO_USER,
+
+#ifdef __KERNEL__
+       NR_SMCCC_FILTER_ACTIONS
+#endif
+};
+
+struct kvm_smccc_filter {
+       __u32 base;
+       __u32 nr_functions;
+       __u8 action;
+       __u8 pad[15];
+};
+
+/* arm64-specific KVM_EXIT_HYPERCALL flags */
+#define KVM_HYPERCALL_EXIT_SMC         (1U << 0)
+#define KVM_HYPERCALL_EXIT_16BIT       (1U << 1)
+
 #endif
 
 #endif /* __ARM_KVM_H__ */
index b890058..cb8ca46 100644 (file)
@@ -97,7 +97,7 @@
 #define X86_FEATURE_SYSENTER32         ( 3*32+15) /* "" sysenter in IA32 userspace */
 #define X86_FEATURE_REP_GOOD           ( 3*32+16) /* REP microcode works well */
 #define X86_FEATURE_AMD_LBR_V2         ( 3*32+17) /* AMD Last Branch Record Extension Version 2 */
-#define X86_FEATURE_LFENCE_RDTSC       ( 3*32+18) /* "" LFENCE synchronizes RDTSC */
+/* FREE, was #define X86_FEATURE_LFENCE_RDTSC          ( 3*32+18) "" LFENCE synchronizes RDTSC */
 #define X86_FEATURE_ACC_POWER          ( 3*32+19) /* AMD Accumulated Power Mechanism */
 #define X86_FEATURE_NOPL               ( 3*32+20) /* The NOPL (0F 1F) instructions */
 #define X86_FEATURE_ALWAYS             ( 3*32+21) /* "" Always-present feature */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW         ( 8*32+ 0) /* Intel TPR Shadow */
-#define X86_FEATURE_VNMI               ( 8*32+ 1) /* Intel Virtual NMI */
-#define X86_FEATURE_FLEXPRIORITY       ( 8*32+ 2) /* Intel FlexPriority */
-#define X86_FEATURE_EPT                        ( 8*32+ 3) /* Intel Extended Page Table */
-#define X86_FEATURE_VPID               ( 8*32+ 4) /* Intel Virtual Processor ID */
+#define X86_FEATURE_FLEXPRIORITY       ( 8*32+ 1) /* Intel FlexPriority */
+#define X86_FEATURE_EPT                        ( 8*32+ 2) /* Intel Extended Page Table */
+#define X86_FEATURE_VPID               ( 8*32+ 3) /* Intel Virtual Processor ID */
 
 #define X86_FEATURE_VMMCALL            ( 8*32+15) /* Prefer VMMCALL to VMCALL */
 #define X86_FEATURE_XENPV              ( 8*32+16) /* "" Xen paravirtual guest */
 #define X86_FEATURE_SGX_EDECCSSA       (11*32+18) /* "" SGX EDECCSSA user leaf function */
 #define X86_FEATURE_CALL_DEPTH         (11*32+19) /* "" Call depth tracking for RSB stuffing */
 #define X86_FEATURE_MSR_TSX_CTRL       (11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
+#define X86_FEATURE_SMBA               (11*32+21) /* "" Slow Memory Bandwidth Allocation */
+#define X86_FEATURE_BMEC               (11*32+22) /* "" Bandwidth Monitoring Event Configuration */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI           (12*32+ 4) /* AVX VNNI instructions */
 #define X86_FEATURE_AVX512_BF16                (12*32+ 5) /* AVX512 BFLOAT16 instructions */
 #define X86_FEATURE_CMPCCXADD           (12*32+ 7) /* "" CMPccXADD instructions */
+#define X86_FEATURE_ARCH_PERFMON_EXT   (12*32+ 8) /* "" Intel Architectural PerfMon Extension */
+#define X86_FEATURE_FZRM               (12*32+10) /* "" Fast zero-length REP MOVSB */
+#define X86_FEATURE_FSRS               (12*32+11) /* "" Fast short REP STOSB */
+#define X86_FEATURE_FSRC               (12*32+12) /* "" Fast short REP {CMPSB,SCASB} */
 #define X86_FEATURE_LKGS               (12*32+18) /* "" Load "kernel" (userspace) GS */
 #define X86_FEATURE_AMX_FP16           (12*32+21) /* "" AMX fp16 Support */
 #define X86_FEATURE_AVX_IFMA            (12*32+23) /* "" Support for VPMADD52[H,L]UQ */
+#define X86_FEATURE_LAM                        (12*32+26) /* Linear Address Masking */
 
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO             (13*32+ 0) /* CLZERO instruction */
 #define X86_FEATURE_VIRT_SSBD          (13*32+25) /* Virtualized Speculative Store Bypass Disable */
 #define X86_FEATURE_AMD_SSB_NO         (13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */
 #define X86_FEATURE_CPPC               (13*32+27) /* Collaborative Processor Performance Control */
+#define X86_FEATURE_AMD_PSFD            (13*32+28) /* "" Predictive Store Forwarding Disable */
 #define X86_FEATURE_BTC_NO             (13*32+29) /* "" Not vulnerable to Branch Type Confusion */
 #define X86_FEATURE_BRS                        (13*32+31) /* Branch Sampling available */
 
 #define X86_FEATURE_VGIF               (15*32+16) /* Virtual GIF */
 #define X86_FEATURE_X2AVIC             (15*32+18) /* Virtual x2apic */
 #define X86_FEATURE_V_SPEC_CTRL                (15*32+20) /* Virtual SPEC_CTRL */
+#define X86_FEATURE_VNMI               (15*32+25) /* Virtual NMI */
 #define X86_FEATURE_SVME_ADDR_CHK      (15*32+28) /* "" SVME addr check */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), word 16 */
 #define X86_FEATURE_V_TSC_AUX          (19*32+ 9) /* "" Virtual TSC_AUX */
 #define X86_FEATURE_SME_COHERENT       (19*32+10) /* "" AMD hardware-enforced cache coherency */
 
+/* AMD-defined Extended Feature 2 EAX, CPUID level 0x80000021 (EAX), word 20 */
+#define X86_FEATURE_NO_NESTED_DATA_BP  (20*32+ 0) /* "" No Nested Data Breakpoints */
+#define X86_FEATURE_LFENCE_RDTSC       (20*32+ 2) /* "" LFENCE always serializing / synchronizes RDTSC */
+#define X86_FEATURE_NULL_SEL_CLR_BASE  (20*32+ 6) /* "" Null Selector Clears Base */
+#define X86_FEATURE_AUTOIBRS           (20*32+ 8) /* "" Automatic IBRS */
+#define X86_FEATURE_NO_SMM_CTL_MSR     (20*32+ 9) /* "" SMM_CTL MSR is not present */
+
 /*
  * BUG word(s)
  */
 #define X86_BUG_MMIO_UNKNOWN           X86_BUG(26) /* CPU is too old and its MMIO Stale Data status is unknown */
 #define X86_BUG_RETBLEED               X86_BUG(27) /* CPU is affected by RETBleed */
 #define X86_BUG_EIBRS_PBRSB            X86_BUG(28) /* EIBRS is vulnerable to Post Barrier RSB Predictions */
+#define X86_BUG_SMT_RSB                        X86_BUG(29) /* CPU is vulnerable to Cross-Thread Return Address Predictions */
 
 #endif /* _ASM_X86_CPUFEATURES_H */
index 5dfa4fb..fafe9be 100644 (file)
 # define DISABLE_CALL_DEPTH_TRACKING   (1 << (X86_FEATURE_CALL_DEPTH & 31))
 #endif
 
+#ifdef CONFIG_ADDRESS_MASKING
+# define DISABLE_LAM           0
+#else
+# define DISABLE_LAM           (1 << (X86_FEATURE_LAM & 31))
+#endif
+
 #ifdef CONFIG_INTEL_IOMMU_SVM
 # define DISABLE_ENQCMD                0
 #else
 #define DISABLED_MASK10        0
 #define DISABLED_MASK11        (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
                         DISABLE_CALL_DEPTH_TRACKING)
-#define DISABLED_MASK12        0
+#define DISABLED_MASK12        (DISABLE_LAM)
 #define DISABLED_MASK13        0
 #define DISABLED_MASK14        0
 #define DISABLED_MASK15        0
index ad35355..3aedae6 100644 (file)
 
 /* Abbreviated from Intel SDM name IA32_INTEGRITY_CAPABILITIES */
 #define MSR_INTEGRITY_CAPS                     0x000002d9
+#define MSR_INTEGRITY_CAPS_ARRAY_BIST_BIT      2
+#define MSR_INTEGRITY_CAPS_ARRAY_BIST          BIT(MSR_INTEGRITY_CAPS_ARRAY_BIST_BIT)
 #define MSR_INTEGRITY_CAPS_PERIODIC_BIST_BIT   4
 #define MSR_INTEGRITY_CAPS_PERIODIC_BIST       BIT(MSR_INTEGRITY_CAPS_PERIODIC_BIST_BIT)
 
index c5573ea..1c1b755 100644 (file)
@@ -34,6 +34,8 @@
 #define BYTES_NOP7     0x8d,0xb4,0x26,0x00,0x00,0x00,0x00
 #define BYTES_NOP8     0x3e,BYTES_NOP7
 
+#define ASM_NOP_MAX 8
+
 #else
 
 /*
@@ -47,6 +49,9 @@
  * 6: osp nopl 0x00(%eax,%eax,1)
  * 7: nopl 0x00000000(%eax)
  * 8: nopl 0x00000000(%eax,%eax,1)
+ * 9: cs nopl 0x00000000(%eax,%eax,1)
+ * 10: osp cs nopl 0x00000000(%eax,%eax,1)
+ * 11: osp osp cs nopl 0x00000000(%eax,%eax,1)
  */
 #define BYTES_NOP1     0x90
 #define BYTES_NOP2     0x66,BYTES_NOP1
 #define BYTES_NOP6     0x66,BYTES_NOP5
 #define BYTES_NOP7     0x0f,0x1f,0x80,0x00,0x00,0x00,0x00
 #define BYTES_NOP8     0x0f,0x1f,0x84,0x00,0x00,0x00,0x00,0x00
+#define BYTES_NOP9     0x2e,BYTES_NOP8
+#define BYTES_NOP10    0x66,BYTES_NOP9
+#define BYTES_NOP11    0x66,BYTES_NOP10
+
+#define ASM_NOP9  _ASM_BYTES(BYTES_NOP9)
+#define ASM_NOP10 _ASM_BYTES(BYTES_NOP10)
+#define ASM_NOP11 _ASM_BYTES(BYTES_NOP11)
+
+#define ASM_NOP_MAX 11
 
 #endif /* CONFIG_64BIT */
 
@@ -68,8 +82,6 @@
 #define ASM_NOP7 _ASM_BYTES(BYTES_NOP7)
 #define ASM_NOP8 _ASM_BYTES(BYTES_NOP8)
 
-#define ASM_NOP_MAX 8
-
 #ifndef __ASSEMBLY__
 extern const unsigned char * const x86_nops[];
 #endif
index 7f467fe..1a6a1f9 100644 (file)
@@ -559,4 +559,7 @@ struct kvm_pmu_event_filter {
 #define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */
 #define   KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */
 
+/* x86-specific KVM_EXIT_HYPERCALL flags. */
+#define KVM_EXIT_HYPERCALL_LONG_MODE   BIT(0)
+
 #endif /* _ASM_X86_KVM_H */
index 500b96e..e8d7ebb 100644 (file)
 #define ARCH_GET_XCOMP_GUEST_PERM      0x1024
 #define ARCH_REQ_XCOMP_GUEST_PERM      0x1025
 
+#define ARCH_XCOMP_TILECFG             17
+#define ARCH_XCOMP_TILEDATA            18
+
 #define ARCH_MAP_VDSO_X32              0x2001
 #define ARCH_MAP_VDSO_32               0x2002
 #define ARCH_MAP_VDSO_64               0x2003
 
+#define ARCH_GET_UNTAG_MASK            0x4001
+#define ARCH_ENABLE_TAGGED_ADDR                0x4002
+#define ARCH_GET_MAX_TAG_BITS          0x4003
+#define ARCH_FORCE_TAGGED_SVA          0x4004
+
 #endif /* _ASM_X86_PRCTL_H */
index b8ddfc4..bc48a4d 100644 (file)
@@ -2,6 +2,9 @@
 #ifndef __NR_fork
 #define __NR_fork 2
 #endif
+#ifndef __NR_execve
+#define __NR_execve 11
+#endif
 #ifndef __NR_getppid
 #define __NR_getppid 64
 #endif
diff --git a/tools/arch/x86/kcpuid/.gitignore b/tools/arch/x86/kcpuid/.gitignore
new file mode 100644 (file)
index 0000000..1b8541b
--- /dev/null
@@ -0,0 +1 @@
+kcpuid
index 416f5b3..24b7d01 100644 (file)
@@ -517,15 +517,16 @@ static void show_range(struct cpuid_range *range)
 static inline struct cpuid_func *index_to_func(u32 index)
 {
        struct cpuid_range *range;
+       u32 func_idx;
 
        range = (index & 0x80000000) ? leafs_ext : leafs_basic;
-       index &= 0x7FFFFFFF;
+       func_idx = index & 0xffff;
 
-       if (((index & 0xFFFF) + 1) > (u32)range->nr) {
+       if ((func_idx + 1) > (u32)range->nr) {
                printf("ERR: invalid input index (0x%x)\n", index);
                return NULL;
        }
-       return &range->funcs[index];
+       return &range->funcs[func_idx];
 }
 
 static void show_info(void)
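
The practical difference shows up for inputs whose upper bits do not fit kcpuid's two tables: with an input of 0xc0000001, for instance, the old code masked off only the top bit, passed the bounds check using just the low 16 bits (0x0001), and then indexed funcs[0x40000001], far past the end of the array; deriving func_idx from the low 16 bits up front keeps the check and the lookup consistent.
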
index a91ac66..d055b82 100644 (file)
 .section .noinstr.text, "ax"
 
 /*
- * We build a jump to memcpy_orig by default which gets NOPped out on
- * the majority of x86 CPUs which set REP_GOOD. In addition, CPUs which
- * have the enhanced REP MOVSB/STOSB feature (ERMS), change those NOPs
- * to a jmp to memcpy_erms which does the REP; MOVSB mem copy.
- */
-
-/*
  * memcpy - Copy a memory block.
  *
  * Input:
  *
  * Output:
  * rax original destination
+ *
+ * The FSRM alternative should be done inline (avoiding the call and
+ * the disgusting return handling), but that would require some help
+ * from the compiler for better calling conventions.
+ *
+ * The 'rep movsb' itself is small enough to replace the call, but the
+ * two register moves blow up the code. And one of them is "needed"
+ * only for the return value that is the same as the source input,
+ * which the compiler could/should do much better anyway.
  */
 SYM_TYPED_FUNC_START(__memcpy)
-       ALTERNATIVE_2 "jmp memcpy_orig", "", X86_FEATURE_REP_GOOD, \
-                     "jmp memcpy_erms", X86_FEATURE_ERMS
+       ALTERNATIVE "jmp memcpy_orig", "", X86_FEATURE_FSRM
 
        movq %rdi, %rax
        movq %rdx, %rcx
-       shrq $3, %rcx
-       andl $7, %edx
-       rep movsq
-       movl %edx, %ecx
        rep movsb
        RET
 SYM_FUNC_END(__memcpy)
@@ -45,17 +42,6 @@ EXPORT_SYMBOL(__memcpy)
 SYM_FUNC_ALIAS(memcpy, __memcpy)
 EXPORT_SYMBOL(memcpy)
 
-/*
- * memcpy_erms() - enhanced fast string memcpy. This is faster and
- * simpler than memcpy. Use memcpy_erms when possible.
- */
-SYM_FUNC_START_LOCAL(memcpy_erms)
-       movq %rdi, %rax
-       movq %rdx, %rcx
-       rep movsb
-       RET
-SYM_FUNC_END(memcpy_erms)
-
 SYM_FUNC_START_LOCAL(memcpy_orig)
        movq %rdi, %rax
 
index 6143b1a..7c59a70 100644 (file)
  * rdx   count (bytes)
  *
  * rax   original destination
+ *
+ * The FSRS alternative should be done inline (avoiding the call and
+ * the disgusting return handling), but that would require some help
+ * from the compiler for better calling conventions.
+ *
+ * The 'rep stosb' itself is small enough to replace the call, but all
+ * the register moves blow up the code. And two of them are "needed"
+ * only for the return value that is the same as the source input,
+ * which the compiler could/should do much better anyway.
  */
 SYM_FUNC_START(__memset)
-       /*
-        * Some CPUs support enhanced REP MOVSB/STOSB feature. It is recommended
-        * to use it when possible. If not available, use fast string instructions.
-        *
-        * Otherwise, use original memset function.
-        */
-       ALTERNATIVE_2 "jmp memset_orig", "", X86_FEATURE_REP_GOOD, \
-                     "jmp memset_erms", X86_FEATURE_ERMS
+       ALTERNATIVE "jmp memset_orig", "", X86_FEATURE_FSRS
 
        movq %rdi,%r9
+       movb %sil,%al
        movq %rdx,%rcx
-       andl $7,%edx
-       shrq $3,%rcx
-       /* expand byte value  */
-       movzbl %sil,%esi
-       movabs $0x0101010101010101,%rax
-       imulq %rsi,%rax
-       rep stosq
-       movl %edx,%ecx
        rep stosb
        movq %r9,%rax
        RET
@@ -48,26 +43,6 @@ EXPORT_SYMBOL(__memset)
 SYM_FUNC_ALIAS(memset, __memset)
 EXPORT_SYMBOL(memset)
 
-/*
- * ISO C memset - set a memory block to a byte value. This function uses
- * enhanced rep stosb to override the fast string function.
- * The code is simpler and shorter than the fast string function as well.
- *
- * rdi   destination
- * rsi   value (char)
- * rdx   count (bytes)
- *
- * rax   original destination
- */
-SYM_FUNC_START_LOCAL(memset_erms)
-       movq %rdi,%r9
-       movb %sil,%al
-       movq %rdx,%rcx
-       rep stosb
-       movq %r9,%rax
-       RET
-SYM_FUNC_END(memset_erms)
-
 SYM_FUNC_START_LOCAL(memset_orig)
        movq %rdi,%r10
 
index c61d061..52a0be4 100644 (file)
@@ -94,7 +94,7 @@ static void print_attributes(struct gpio_v2_line_info *info)
        for (i = 0; i < info->num_attrs; i++) {
                if (info->attrs[i].id == GPIO_V2_LINE_ATTR_ID_DEBOUNCE)
                        fprintf(stdout, ", debounce_period=%dusec",
-                               info->attrs[0].debounce_period_us);
+                               info->attrs[i].debounce_period_us);
        }
 }
 
index b54bd86..7ce02a2 100644 (file)
@@ -4,7 +4,6 @@
 
 /* Just disable it so we can build arch/x86/lib/memcpy_64.S for perf bench: */
 
-#define altinstruction_entry #
-#define ALTERNATIVE_2 #
+#define ALTERNATIVE #
 
 #endif
index cef3b1c..51ac441 100644 (file)
  */
 #define CORESIGHT_LEGACY_CPU_TRACE_ID(cpu)  (0x10 + (cpu * 2))
 
-/* CoreSight trace ID is currently the bottom 7 bits of the value */
-#define CORESIGHT_TRACE_ID_VAL_MASK    GENMASK(6, 0)
-
-/*
- * perf record will set the legacy meta data values as unused initially.
- * This allows perf report to manage the decoders created when dynamic
- * allocation in operation.
- */
-#define CORESIGHT_TRACE_ID_UNUSED_FLAG BIT(31)
-
-/* Value to set for unused trace ID values */
-#define CORESIGHT_TRACE_ID_UNUSED_VAL  0x7F
-
 /*
  * Below are the definition of bit offsets for perf option, and works as
  * arbitrary values for all ETM versions.
index 9839fea..64d67b0 100644 (file)
@@ -25,8 +25,23 @@ endif
 
 nolibc_arch := $(patsubst arm64,aarch64,$(ARCH))
 arch_file := arch-$(nolibc_arch).h
-all_files := ctype.h errno.h nolibc.h signal.h stackprotector.h std.h stdint.h \
-             stdio.h stdlib.h string.h sys.h time.h types.h unistd.h
+all_files := \
+               compiler.h \
+               ctype.h \
+               errno.h \
+               nolibc.h \
+               signal.h \
+               stackprotector.h \
+               std.h \
+               stdint.h \
+               stdlib.h \
+               string.h \
+               sys.h \
+               time.h \
+               types.h \
+               unistd.h \
+               stdio.h \
+
 
 # install all headers needed to support a bare-metal compiler
 all: headers
index 383badd..11f294a 100644 (file)
@@ -7,6 +7,8 @@
 #ifndef _NOLIBC_ARCH_AARCH64_H
 #define _NOLIBC_ARCH_AARCH64_H
 
+#include "compiler.h"
+
 /* The struct returned by the newfstatat() syscall. Differs slightly from the
  * x86_64's stat one by field ordering, so be careful.
  */
@@ -173,27 +175,30 @@ char **environ __attribute__((weak));
 const unsigned long *_auxv __attribute__((weak));
 
 /* startup code */
-void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
+void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __no_stack_protector _start(void)
 {
        __asm__ volatile (
-               "ldr x0, [sp]\n"     // argc (x0) was in the stack
-               "add x1, sp, 8\n"    // argv (x1) = sp
-               "lsl x2, x0, 3\n"    // envp (x2) = 8*argc ...
-               "add x2, x2, 8\n"    //           + 8 (skip null)
-               "add x2, x2, x1\n"   //           + argv
-               "adrp x3, environ\n"          // x3 = &environ (high bits)
-               "str x2, [x3, #:lo12:environ]\n" // store envp into environ
-               "mov x4, x2\n"       // search for auxv (follows NULL after last env)
+#ifdef _NOLIBC_STACKPROTECTOR
+               "bl __stack_chk_init\n"   /* initialize stack protector                     */
+#endif
+               "ldr x0, [sp]\n"     /* argc (x0) was in the stack                          */
+               "add x1, sp, 8\n"    /* argv (x1) = sp                                      */
+               "lsl x2, x0, 3\n"    /* envp (x2) = 8*argc ...                              */
+               "add x2, x2, 8\n"    /*           + 8 (skip null)                           */
+               "add x2, x2, x1\n"   /*           + argv                                    */
+               "adrp x3, environ\n"          /* x3 = &environ (high bits)                  */
+               "str x2, [x3, #:lo12:environ]\n" /* store envp into environ                 */
+               "mov x4, x2\n"       /* search for auxv (follows NULL after last env)       */
                "0:\n"
-               "ldr x5, [x4], 8\n"  // x5 = *x4; x4 += 8
-               "cbnz x5, 0b\n"      // and stop at NULL after last env
-               "adrp x3, _auxv\n"   // x3 = &_auxv (high bits)
-               "str x4, [x3, #:lo12:_auxv]\n" // store x4 into _auxv
-               "and sp, x1, -16\n"  // sp must be 16-byte aligned in the callee
-               "bl main\n"          // main() returns the status code, we'll exit with it.
-               "mov x8, 93\n"       // NR_exit == 93
+               "ldr x5, [x4], 8\n"  /* x5 = *x4; x4 += 8                                   */
+               "cbnz x5, 0b\n"      /* and stop at NULL after last env                     */
+               "adrp x3, _auxv\n"   /* x3 = &_auxv (high bits)                             */
+               "str x4, [x3, #:lo12:_auxv]\n" /* store x4 into _auxv                       */
+               "and sp, x1, -16\n"  /* sp must be 16-byte aligned in the callee            */
+               "bl main\n"          /* main() returns the status code, we'll exit with it. */
+               "mov x8, 93\n"       /* NR_exit == 93                                       */
                "svc #0\n"
        );
        __builtin_unreachable();
 }
-#endif // _NOLIBC_ARCH_AARCH64_H
+#endif /* _NOLIBC_ARCH_AARCH64_H */
index 42499f2..ca4c669 100644 (file)
@@ -7,6 +7,8 @@
 #ifndef _NOLIBC_ARCH_ARM_H
 #define _NOLIBC_ARCH_ARM_H
 
+#include "compiler.h"
+
 /* The struct returned by the stat() syscall, 32-bit only, the syscall returns
  * exactly 56 bytes (stops before the unused array). In big endian, the format
  * differs as devices are returned as short only.
@@ -196,41 +198,67 @@ struct sys_stat_struct {
        _arg1;                                                                \
 })
 
+#define my_syscall6(num, arg1, arg2, arg3, arg4, arg5, arg6)                  \
+({                                                                            \
+       register long _num  __asm__(_NOLIBC_SYSCALL_REG) = (num);             \
+       register long _arg1 __asm__ ("r0") = (long)(arg1);                    \
+       register long _arg2 __asm__ ("r1") = (long)(arg2);                    \
+       register long _arg3 __asm__ ("r2") = (long)(arg3);                    \
+       register long _arg4 __asm__ ("r3") = (long)(arg4);                    \
+       register long _arg5 __asm__ ("r4") = (long)(arg5);                    \
+       register long _arg6 __asm__ ("r5") = (long)(arg6);                    \
+                                                                             \
+       __asm__  volatile (                                                   \
+               _NOLIBC_THUMB_SET_R7                                          \
+               "svc #0\n"                                                    \
+               _NOLIBC_THUMB_RESTORE_R7                                      \
+               : "=r"(_arg1), "=r" (_num)                                    \
+               : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \
+                 "r"(_arg6), "r"(_num)                                       \
+               : "memory", "cc", "lr"                                        \
+       );                                                                    \
+       _arg1;                                                                \
+})
+
+
 char **environ __attribute__((weak));
 const unsigned long *_auxv __attribute__((weak));
 
 /* startup code */
-void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
+void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __no_stack_protector _start(void)
 {
        __asm__ volatile (
-               "pop {%r0}\n"                 // argc was in the stack
-               "mov %r1, %sp\n"              // argv = sp
+#ifdef _NOLIBC_STACKPROTECTOR
+               "bl __stack_chk_init\n"       /* initialize stack protector                          */
+#endif
+               "pop {%r0}\n"                 /* argc was in the stack                               */
+               "mov %r1, %sp\n"              /* argv = sp                                           */
 
-               "add %r2, %r0, $1\n"          // envp = (argc + 1) ...
-               "lsl %r2, %r2, $2\n"          //        * 4        ...
-               "add %r2, %r2, %r1\n"         //        + argv
-               "ldr %r3, 1f\n"               // r3 = &environ (see below)
-               "str %r2, [r3]\n"             // store envp into environ
+               "add %r2, %r0, $1\n"          /* envp = (argc + 1) ...                               */
+               "lsl %r2, %r2, $2\n"          /*        * 4        ...                               */
+               "add %r2, %r2, %r1\n"         /*        + argv                                       */
+               "ldr %r3, 1f\n"               /* r3 = &environ (see below)                           */
+               "str %r2, [r3]\n"             /* store envp into environ                             */
 
-               "mov r4, r2\n"                // search for auxv (follows NULL after last env)
+               "mov r4, r2\n"                /* search for auxv (follows NULL after last env)       */
                "0:\n"
-               "mov r5, r4\n"                // r5 = r4
-               "add r4, r4, #4\n"            // r4 += 4
-               "ldr r5,[r5]\n"               // r5 = *r5 = *(r4-4)
-               "cmp r5, #0\n"                // and stop at NULL after last env
+               "mov r5, r4\n"                /* r5 = r4                                             */
+               "add r4, r4, #4\n"            /* r4 += 4                                             */
+               "ldr r5,[r5]\n"               /* r5 = *r5 = *(r4-4)                                  */
+               "cmp r5, #0\n"                /* and stop at NULL after last env                     */
                "bne 0b\n"
-               "ldr %r3, 2f\n"               // r3 = &_auxv (low bits)
-               "str r4, [r3]\n"              // store r4 into _auxv
+               "ldr %r3, 2f\n"               /* r3 = &_auxv (low bits)                              */
+               "str r4, [r3]\n"              /* store r4 into _auxv                                 */
 
-               "mov %r3, $8\n"               // AAPCS : sp must be 8-byte aligned in the
-               "neg %r3, %r3\n"              //         callee, and bl doesn't push (lr=pc)
-               "and %r3, %r3, %r1\n"         // so we do sp = r1(=sp) & r3(=-8);
-               "mov %sp, %r3\n"              //
+               "mov %r3, $8\n"               /* AAPCS : sp must be 8-byte aligned in the            */
+               "neg %r3, %r3\n"              /*         callee, and bl doesn't push (lr=pc)         */
+               "and %r3, %r3, %r1\n"         /* so we do sp = r1(=sp) & r3(=-8);                    */
+               "mov %sp, %r3\n"
 
-               "bl main\n"                   // main() returns the status code, we'll exit with it.
-               "movs r7, $1\n"               // NR_exit == 1
+               "bl main\n"                   /* main() returns the status code, we'll exit with it. */
+               "movs r7, $1\n"               /* NR_exit == 1                                        */
                "svc $0x00\n"
-               ".align 2\n"                  // below are the pointers to a few variables
+               ".align 2\n"                  /* below are the pointers to a few variables           */
                "1:\n"
                ".word environ\n"
                "2:\n"
@@ -239,4 +267,4 @@ void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
        __builtin_unreachable();
 }
 
-#endif // _NOLIBC_ARCH_ARM_H
+#endif /* _NOLIBC_ARCH_ARM_H */
index 2d98d78..3d672d9 100644 (file)
@@ -7,6 +7,8 @@
 #ifndef _NOLIBC_ARCH_I386_H
 #define _NOLIBC_ARCH_I386_H
 
+#include "compiler.h"
+
 /* The struct returned by the stat() syscall, 32-bit only, the syscall returns
  * exactly 56 bytes (stops before the unused array).
  */
@@ -181,8 +183,6 @@ struct sys_stat_struct {
 char **environ __attribute__((weak));
 const unsigned long *_auxv __attribute__((weak));
 
-#define __ARCH_SUPPORTS_STACK_PROTECTOR
-
 /* startup code */
 /*
  * i386 System V ABI mandates:
@@ -190,35 +190,35 @@ const unsigned long *_auxv __attribute__((weak));
  * 2) The deepest stack frame should be set to zero
  *
  */
-void __attribute__((weak,noreturn,optimize("omit-frame-pointer"),no_stack_protector)) _start(void)
+void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __no_stack_protector _start(void)
 {
        __asm__ volatile (
-#ifdef NOLIBC_STACKPROTECTOR
-               "call __stack_chk_init\n"   // initialize stack protector
+#ifdef _NOLIBC_STACKPROTECTOR
+               "call __stack_chk_init\n"   /* initialize stack protector                    */
 #endif
-               "pop %eax\n"                // argc   (first arg, %eax)
-               "mov %esp, %ebx\n"          // argv[] (second arg, %ebx)
-               "lea 4(%ebx,%eax,4),%ecx\n" // then a NULL then envp (third arg, %ecx)
-               "mov %ecx, environ\n"       // save environ
-               "xor %ebp, %ebp\n"          // zero the stack frame
-               "mov %ecx, %edx\n"          // search for auxv (follows NULL after last env)
+               "pop %eax\n"                /* argc   (first arg, %eax)                      */
+               "mov %esp, %ebx\n"          /* argv[] (second arg, %ebx)                     */
+               "lea 4(%ebx,%eax,4),%ecx\n" /* then a NULL then envp (third arg, %ecx)       */
+               "mov %ecx, environ\n"       /* save environ                                  */
+               "xor %ebp, %ebp\n"          /* zero the stack frame                          */
+               "mov %ecx, %edx\n"          /* search for auxv (follows NULL after last env) */
                "0:\n"
-               "add $4, %edx\n"            // search for auxv using edx, it follows the
-               "cmp -4(%edx), %ebp\n"      // ... NULL after last env (ebp is zero here)
+               "add $4, %edx\n"            /* search for auxv using edx, it follows the     */
+               "cmp -4(%edx), %ebp\n"      /* ... NULL after last env (ebp is zero here)    */
                "jnz 0b\n"
-               "mov %edx, _auxv\n"         // save it into _auxv
-               "and $-16, %esp\n"          // x86 ABI : esp must be 16-byte aligned before
-               "sub $4, %esp\n"            // the call instruction (args are aligned)
-               "push %ecx\n"               // push all registers on the stack so that we
-               "push %ebx\n"               // support both regparm and plain stack modes
+               "mov %edx, _auxv\n"         /* save it into _auxv                            */
+               "and $-16, %esp\n"          /* x86 ABI : esp must be 16-byte aligned before  */
+               "sub $4, %esp\n"            /* the call instruction (args are aligned)       */
+               "push %ecx\n"               /* push all registers on the stack so that we    */
+               "push %ebx\n"               /* support both regparm and plain stack modes    */
                "push %eax\n"
-               "call main\n"               // main() returns the status code in %eax
-               "mov %eax, %ebx\n"          // retrieve exit code (32-bit int)
-               "movl $1, %eax\n"           // NR_exit == 1
-               "int $0x80\n"               // exit now
-               "hlt\n"                     // ensure it does not
+               "call main\n"               /* main() returns the status code in %eax        */
+               "mov %eax, %ebx\n"          /* retrieve exit code (32-bit int)               */
+               "movl $1, %eax\n"           /* NR_exit == 1                                  */
+               "int $0x80\n"               /* exit now                                      */
+               "hlt\n"                     /* ensure it does not                            */
        );
        __builtin_unreachable();
 }
 
-#endif // _NOLIBC_ARCH_I386_H
+#endif /* _NOLIBC_ARCH_I386_H */
index 029ee3c..ad3f266 100644 (file)
@@ -7,6 +7,8 @@
 #ifndef _NOLIBC_ARCH_LOONGARCH_H
 #define _NOLIBC_ARCH_LOONGARCH_H
 
+#include "compiler.h"
+
 /* Syscalls for LoongArch :
  *   - stack is 16-byte aligned
  *   - syscall number is passed in a7
@@ -158,7 +160,7 @@ const unsigned long *_auxv __attribute__((weak));
 #define LONG_ADDI    "addi.w"
 #define LONG_SLL     "slli.w"
 #define LONG_BSTRINS "bstrins.w"
-#else // __loongarch_grlen == 64
+#else /* __loongarch_grlen == 64 */
 #define LONGLOG      "3"
 #define SZREG        "8"
 #define REG_L        "ld.d"
@@ -170,31 +172,34 @@ const unsigned long *_auxv __attribute__((weak));
 #endif
 
 /* startup code */
-void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
+void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __no_stack_protector _start(void)
 {
        __asm__ volatile (
-               REG_L        " $a0, $sp, 0\n"         // argc (a0) was in the stack
-               LONG_ADDI    " $a1, $sp, "SZREG"\n"   // argv (a1) = sp + SZREG
-               LONG_SLL     " $a2, $a0, "LONGLOG"\n" // envp (a2) = SZREG*argc ...
-               LONG_ADDI    " $a2, $a2, "SZREG"\n"   //             + SZREG (skip null)
-               LONG_ADD     " $a2, $a2, $a1\n"       //             + argv
-
-               "move          $a3, $a2\n"            // iterate a3 over envp to find auxv (after NULL)
-               "0:\n"                                // do {
-               REG_L        " $a4, $a3, 0\n"         //   a4 = *a3;
-               LONG_ADDI    " $a3, $a3, "SZREG"\n"   //   a3 += sizeof(void*);
-               "bne           $a4, $zero, 0b\n"      // } while (a4);
-               "la.pcrel      $a4, _auxv\n"          // a4 = &_auxv
-               LONG_S       " $a3, $a4, 0\n"         // store a3 into _auxv
-
-               "la.pcrel      $a3, environ\n"        // a3 = &environ
-               LONG_S       " $a2, $a3, 0\n"         // store envp(a2) into environ
-               LONG_BSTRINS " $sp, $zero, 3, 0\n"    // sp must be 16-byte aligned
-               "bl            main\n"                // main() returns the status code, we'll exit with it.
-               "li.w          $a7, 93\n"             // NR_exit == 93
+#ifdef _NOLIBC_STACKPROTECTOR
+               "bl __stack_chk_init\n"               /* initialize stack protector                          */
+#endif
+               REG_L        " $a0, $sp, 0\n"         /* argc (a0) was in the stack                          */
+               LONG_ADDI    " $a1, $sp, "SZREG"\n"   /* argv (a1) = sp + SZREG                              */
+               LONG_SLL     " $a2, $a0, "LONGLOG"\n" /* envp (a2) = SZREG*argc ...                          */
+               LONG_ADDI    " $a2, $a2, "SZREG"\n"   /*             + SZREG (skip null)                     */
+               LONG_ADD     " $a2, $a2, $a1\n"       /*             + argv                                  */
+
+               "move          $a3, $a2\n"            /* iterate a3 over envp to find auxv (after NULL)      */
+               "0:\n"                                /* do {                                                */
+               REG_L        " $a4, $a3, 0\n"         /*   a4 = *a3;                                         */
+               LONG_ADDI    " $a3, $a3, "SZREG"\n"   /*   a3 += sizeof(void*);                              */
+               "bne           $a4, $zero, 0b\n"      /* } while (a4);                                       */
+               "la.pcrel      $a4, _auxv\n"          /* a4 = &_auxv                                         */
+               LONG_S       " $a3, $a4, 0\n"         /* store a3 into _auxv                                 */
+
+               "la.pcrel      $a3, environ\n"        /* a3 = &environ                                       */
+               LONG_S       " $a2, $a3, 0\n"         /* store envp(a2) into environ                         */
+               LONG_BSTRINS " $sp, $zero, 3, 0\n"    /* sp must be 16-byte aligned                          */
+               "bl            main\n"                /* main() returns the status code, we'll exit with it. */
+               "li.w          $a7, 93\n"             /* NR_exit == 93                                       */
                "syscall       0\n"
        );
        __builtin_unreachable();
 }
 
-#endif // _NOLIBC_ARCH_LOONGARCH_H
+#endif /* _NOLIBC_ARCH_LOONGARCH_H */
index bf83432..db24e08 100644 (file)
@@ -7,6 +7,8 @@
 #ifndef _NOLIBC_ARCH_MIPS_H
 #define _NOLIBC_ARCH_MIPS_H
 
+#include "compiler.h"
+
 /* The struct returned by the stat() syscall. 88 bytes are returned by the
  * syscall.
  */
@@ -180,45 +182,49 @@ char **environ __attribute__((weak));
 const unsigned long *_auxv __attribute__((weak));
 
 /* startup code, note that it's called __start on MIPS */
-void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __start(void)
+void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __no_stack_protector __start(void)
 {
        __asm__ volatile (
-               //".set nomips16\n"
+               /*".set nomips16\n"*/
                ".set push\n"
                ".set    noreorder\n"
                ".option pic0\n"
-               //".ent __start\n"
-               //"__start:\n"
-               "lw $a0,($sp)\n"        // argc was in the stack
-               "addiu  $a1, $sp, 4\n"  // argv = sp + 4
-               "sll $a2, $a0, 2\n"     // a2 = argc * 4
-               "add   $a2, $a2, $a1\n" // envp = argv + 4*argc ...
-               "addiu $a2, $a2, 4\n"   //        ... + 4
-               "lui $a3, %hi(environ)\n"     // load environ into a3 (hi)
-               "addiu $a3, %lo(environ)\n"   // load environ into a3 (lo)
-               "sw $a2,($a3)\n"              // store envp(a2) into environ
-
-               "move $t0, $a2\n"             // iterate t0 over envp, look for NULL
-               "0:"                          // do {
-               "lw $a3, ($t0)\n"             //   a3=*(t0);
-               "bne $a3, $0, 0b\n"           // } while (a3);
-               "addiu $t0, $t0, 4\n"         // delayed slot: t0+=4;
-               "lui $a3, %hi(_auxv)\n"       // load _auxv into a3 (hi)
-               "addiu $a3, %lo(_auxv)\n"     // load _auxv into a3 (lo)
-               "sw $t0, ($a3)\n"             // store t0 into _auxv
+#ifdef _NOLIBC_STACKPROTECTOR
+               "jal __stack_chk_init\n" /* initialize stack protector                         */
+               "nop\n"                  /* delayed slot                                       */
+#endif
+               /*".ent __start\n"*/
+               /*"__start:\n"*/
+               "lw $a0,($sp)\n"        /* argc was in the stack                               */
+               "addiu  $a1, $sp, 4\n"  /* argv = sp + 4                                       */
+               "sll $a2, $a0, 2\n"     /* a2 = argc * 4                                       */
+               "add   $a2, $a2, $a1\n" /* envp = argv + 4*argc ...                            */
+               "addiu $a2, $a2, 4\n"   /*        ... + 4                                      */
+               "lui $a3, %hi(environ)\n"     /* load environ into a3 (hi)                     */
+               "addiu $a3, %lo(environ)\n"   /* load environ into a3 (lo)                     */
+               "sw $a2,($a3)\n"              /* store envp(a2) into environ                   */
+
+               "move $t0, $a2\n"             /* iterate t0 over envp, look for NULL           */
+               "0:"                          /* do {                                          */
+               "lw $a3, ($t0)\n"             /*   a3=*(t0);                                   */
+               "bne $a3, $0, 0b\n"           /* } while (a3);                                 */
+               "addiu $t0, $t0, 4\n"         /* delayed slot: t0+=4;                          */
+               "lui $a3, %hi(_auxv)\n"       /* load _auxv into a3 (hi)                       */
+               "addiu $a3, %lo(_auxv)\n"     /* load _auxv into a3 (lo)                       */
+               "sw $t0, ($a3)\n"             /* store t0 into _auxv                           */
 
                "li $t0, -8\n"
-               "and $sp, $sp, $t0\n"   // sp must be 8-byte aligned
-               "addiu $sp,$sp,-16\n"   // the callee expects to save a0..a3 there!
-               "jal main\n"            // main() returns the status code, we'll exit with it.
-               "nop\n"                 // delayed slot
-               "move $a0, $v0\n"       // retrieve 32-bit exit code from v0
-               "li $v0, 4001\n"        // NR_exit == 4001
+               "and $sp, $sp, $t0\n"   /* sp must be 8-byte aligned                           */
+               "addiu $sp,$sp,-16\n"   /* the callee expects to save a0..a3 there!            */
+               "jal main\n"            /* main() returns the status code, we'll exit with it. */
+               "nop\n"                 /* delayed slot                                        */
+               "move $a0, $v0\n"       /* retrieve 32-bit exit code from v0                   */
+               "li $v0, 4001\n"        /* NR_exit == 4001                                     */
                "syscall\n"
-               //".end __start\n"
+               /*".end __start\n"*/
                ".set pop\n"
        );
        __builtin_unreachable();
 }
 
-#endif // _NOLIBC_ARCH_MIPS_H
+#endif /* _NOLIBC_ARCH_MIPS_H */
index e197fcb..a2e8564 100644 (file)
@@ -7,6 +7,8 @@
 #ifndef _NOLIBC_ARCH_RISCV_H
 #define _NOLIBC_ARCH_RISCV_H
 
+#include "compiler.h"
+
 struct sys_stat_struct {
        unsigned long   st_dev;         /* Device.  */
        unsigned long   st_ino;         /* File serial number.  */
@@ -33,9 +35,13 @@ struct sys_stat_struct {
 #if   __riscv_xlen == 64
 #define PTRLOG "3"
 #define SZREG  "8"
+#define REG_L  "ld"
+#define REG_S  "sd"
 #elif __riscv_xlen == 32
 #define PTRLOG "2"
 #define SZREG  "4"
+#define REG_L  "lw"
+#define REG_S  "sw"
 #endif
 
 /* Syscalls for RISCV :
@@ -174,35 +180,38 @@ char **environ __attribute__((weak));
 const unsigned long *_auxv __attribute__((weak));
 
 /* startup code */
-void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
+void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __no_stack_protector _start(void)
 {
        __asm__ volatile (
                ".option push\n"
                ".option norelax\n"
                "lla   gp, __global_pointer$\n"
                ".option pop\n"
-               "lw    a0, 0(sp)\n"          // argc (a0) was in the stack
-               "add   a1, sp, "SZREG"\n"    // argv (a1) = sp
-               "slli  a2, a0, "PTRLOG"\n"   // envp (a2) = SZREG*argc ...
-               "add   a2, a2, "SZREG"\n"    //             + SZREG (skip null)
-               "add   a2,a2,a1\n"           //             + argv
-
-               "add   a3, a2, zero\n"       // iterate a3 over envp to find auxv (after NULL)
-               "0:\n"                       // do {
-               "ld    a4, 0(a3)\n"          //   a4 = *a3;
-               "add   a3, a3, "SZREG"\n"    //   a3 += sizeof(void*);
-               "bne   a4, zero, 0b\n"       // } while (a4);
-               "lui   a4, %hi(_auxv)\n"     // a4 = &_auxv (high bits)
-               "sd    a3, %lo(_auxv)(a4)\n" // store a3 into _auxv
-
-               "lui a3, %hi(environ)\n"     // a3 = &environ (high bits)
-               "sd a2,%lo(environ)(a3)\n"   // store envp(a2) into environ
-               "andi  sp,a1,-16\n"          // sp must be 16-byte aligned
-               "call  main\n"               // main() returns the status code, we'll exit with it.
-               "li a7, 93\n"                // NR_exit == 93
+#ifdef _NOLIBC_STACKPROTECTOR
+               "call __stack_chk_init\n"    /* initialize stack protector                          */
+#endif
+               REG_L" a0, 0(sp)\n"          /* argc (a0) was in the stack                          */
+               "add   a1, sp, "SZREG"\n"    /* argv (a1) = sp                                      */
+               "slli  a2, a0, "PTRLOG"\n"   /* envp (a2) = SZREG*argc ...                          */
+               "add   a2, a2, "SZREG"\n"    /*             + SZREG (skip null)                     */
+               "add   a2,a2,a1\n"           /*             + argv                                  */
+
+               "add   a3, a2, zero\n"       /* iterate a3 over envp to find auxv (after NULL)      */
+               "0:\n"                       /* do {                                                */
+               REG_L" a4, 0(a3)\n"          /*   a4 = *a3;                                         */
+               "add   a3, a3, "SZREG"\n"    /*   a3 += sizeof(void*);                              */
+               "bne   a4, zero, 0b\n"       /* } while (a4);                                       */
+               "lui   a4, %hi(_auxv)\n"     /* a4 = &_auxv (high bits)                             */
+               REG_S" a3, %lo(_auxv)(a4)\n" /* store a3 into _auxv                                 */
+
+               "lui   a3, %hi(environ)\n"   /* a3 = &environ (high bits)                           */
+               REG_S" a2,%lo(environ)(a3)\n"/* store envp(a2) into environ                         */
+               "andi  sp,a1,-16\n"          /* sp must be 16-byte aligned                          */
+               "call  main\n"               /* main() returns the status code, we'll exit with it. */
+               "li a7, 93\n"                /* NR_exit == 93                                       */
                "ecall\n"
        );
        __builtin_unreachable();
 }
 
-#endif // _NOLIBC_ARCH_RISCV_H
+#endif /* _NOLIBC_ARCH_RISCV_H */
index 6b0e54e..516dff5 100644 (file)
@@ -5,8 +5,11 @@
 
 #ifndef _NOLIBC_ARCH_S390_H
 #define _NOLIBC_ARCH_S390_H
+#include <asm/signal.h>
 #include <asm/unistd.h>
 
+#include "compiler.h"
+
 /* The struct returned by the stat() syscall, equivalent to stat64(). The
  * syscall returns 116 bytes and stops in the middle of __unused.
  */
@@ -163,7 +166,7 @@ char **environ __attribute__((weak));
 const unsigned long *_auxv __attribute__((weak));
 
 /* startup code */
-void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
+void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __no_stack_protector _start(void)
 {
        __asm__ volatile (
                "lg     %r2,0(%r15)\n"          /* argument count */
@@ -223,4 +226,12 @@ void *sys_mmap(void *addr, size_t length, int prot, int flags, int fd,
        return (void *)my_syscall1(__NR_mmap, &args);
 }
 #define sys_mmap sys_mmap
-#endif // _NOLIBC_ARCH_S390_H
+
+static __attribute__((unused))
+pid_t sys_fork(void)
+{
+       return my_syscall5(__NR_clone, 0, SIGCHLD, 0, 0, 0);
+}
+#define sys_fork sys_fork
+
+#endif /* _NOLIBC_ARCH_S390_H */
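The s390-specific sys_fork() above is needed because s390's clone syscall takes the new stack pointer as its first argument and the clone flags second, the reverse of most other architectures, so the generic fork-via-clone path in sys.h cannot be shared. A rough sketch of how the override mechanism composes (simplified, not a literal copy of either file):

	/* arch-s390.h: arch-specific implementation plus a marker macro */
	static pid_t sys_fork(void)
	{
		/* newsp = 0 (reuse parent stack), flags = SIGCHLD */
		return my_syscall5(__NR_clone, 0, SIGCHLD, 0, 0, 0);
	}
	#define sys_fork sys_fork

	/* sys.h: generic clone()/fork() based version, now compiled only
	 * when no architecture override was provided */
	#ifndef sys_fork
	static pid_t sys_fork(void)
	{
		/* ... clone(SIGCHLD, ...) or __NR_fork, see the sys.h hunk further down ... */
	}
	#endif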
index f7f2a11..6fc4d83 100644 (file)
@@ -7,6 +7,8 @@
 #ifndef _NOLIBC_ARCH_X86_64_H
 #define _NOLIBC_ARCH_X86_64_H
 
+#include "compiler.h"
+
 /* The struct returned by the stat() syscall, equivalent to stat64(). The
  * syscall returns 116 bytes and stops in the middle of __unused.
  */
@@ -181,8 +183,6 @@ struct sys_stat_struct {
 char **environ __attribute__((weak));
 const unsigned long *_auxv __attribute__((weak));
 
-#define __ARCH_SUPPORTS_STACK_PROTECTOR
-
 /* startup code */
 /*
  * x86-64 System V ABI mandates:
@@ -190,31 +190,31 @@ const unsigned long *_auxv __attribute__((weak));
  * 2) The deepest stack frame should be zero (the %rbp).
  *
  */
-void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
+void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __no_stack_protector _start(void)
 {
        __asm__ volatile (
-#ifdef NOLIBC_STACKPROTECTOR
-               "call __stack_chk_init\n"   // initialize stack protector
+#ifdef _NOLIBC_STACKPROTECTOR
+               "call __stack_chk_init\n"   /* initialize stack protector                          */
 #endif
-               "pop %rdi\n"                // argc   (first arg, %rdi)
-               "mov %rsp, %rsi\n"          // argv[] (second arg, %rsi)
-               "lea 8(%rsi,%rdi,8),%rdx\n" // then a NULL then envp (third arg, %rdx)
-               "mov %rdx, environ\n"       // save environ
-               "xor %ebp, %ebp\n"          // zero the stack frame
-               "mov %rdx, %rax\n"          // search for auxv (follows NULL after last env)
+               "pop %rdi\n"                /* argc   (first arg, %rdi)                            */
+               "mov %rsp, %rsi\n"          /* argv[] (second arg, %rsi)                           */
+               "lea 8(%rsi,%rdi,8),%rdx\n" /* then a NULL then envp (third arg, %rdx)             */
+               "mov %rdx, environ\n"       /* save environ                                        */
+               "xor %ebp, %ebp\n"          /* zero the stack frame                                */
+               "mov %rdx, %rax\n"          /* search for auxv (follows NULL after last env)       */
                "0:\n"
-               "add $8, %rax\n"            // search for auxv using rax, it follows the
-               "cmp -8(%rax), %rbp\n"      // ... NULL after last env (rbp is zero here)
+               "add $8, %rax\n"            /* search for auxv using rax, it follows the           */
+               "cmp -8(%rax), %rbp\n"      /* ... NULL after last env (rbp is zero here)          */
                "jnz 0b\n"
-               "mov %rax, _auxv\n"         // save it into _auxv
-               "and $-16, %rsp\n"          // x86 ABI : esp must be 16-byte aligned before call
-               "call main\n"               // main() returns the status code, we'll exit with it.
-               "mov %eax, %edi\n"          // retrieve exit code (32 bit)
-               "mov $60, %eax\n"           // NR_exit == 60
-               "syscall\n"                 // really exit
-               "hlt\n"                     // ensure it does not return
+               "mov %rax, _auxv\n"         /* save it into _auxv                                  */
+               "and $-16, %rsp\n"          /* x86 ABI : esp must be 16-byte aligned before call   */
+               "call main\n"               /* main() returns the status code, we'll exit with it. */
+               "mov %eax, %edi\n"          /* retrieve exit code (32 bit)                         */
+               "mov $60, %eax\n"           /* NR_exit == 60                                       */
+               "syscall\n"                 /* really exit                                         */
+               "hlt\n"                     /* ensure it does not return                           */
        );
        __builtin_unreachable();
 }
 
-#endif // _NOLIBC_ARCH_X86_64_H
+#endif /* _NOLIBC_ARCH_X86_64_H */
index 2d5386a..82b4393 100644 (file)
@@ -7,7 +7,7 @@
  * the syscall declarations and the _start code definition. This is the only
  * global part. On all architectures the kernel puts everything in the stack
  * before jumping to _start just above us, without any return address (_start
- * is not a function but an entry pint). So at the stack pointer we find argc.
+ * is not a function but an entry point). So at the stack pointer we find argc.
  * Then argv[] begins, and ends at the first NULL. Then we have envp which
  * starts and ends with a NULL as well. So envp=argv+argc+1.
  */
diff --git a/tools/include/nolibc/compiler.h b/tools/include/nolibc/compiler.h
new file mode 100644 (file)
index 0000000..beddc36
--- /dev/null
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
+/*
+ * NOLIBC compiler support header
+ * Copyright (C) 2023 Thomas Weißschuh <linux@weissschuh.net>
+ */
+#ifndef _NOLIBC_COMPILER_H
+#define _NOLIBC_COMPILER_H
+
+#if defined(__SSP__) || defined(__SSP_STRONG__) || defined(__SSP_ALL__) || defined(__SSP_EXPLICIT__)
+
+#define _NOLIBC_STACKPROTECTOR
+
+#endif /* defined(__SSP__) ... */
+
+#if defined(__has_attribute)
+#  if __has_attribute(no_stack_protector)
+#    define __no_stack_protector __attribute__((no_stack_protector))
+#  else
+#    define __no_stack_protector __attribute__((__optimize__("-fno-stack-protector")))
+#  endif
+#else
+#  define __no_stack_protector __attribute__((__optimize__("-fno-stack-protector")))
+#endif /* defined(__has_attribute) */
+
+#endif /* _NOLIBC_COMPILER_H */
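compiler.h now owns two pieces of compiler glue: it derives _NOLIBC_STACKPROTECTOR from the __SSP*__ macros that the compiler predefines whenever -fstack-protector* is active (so architectures no longer opt in by hand), and it provides a __no_stack_protector attribute with a fallback to the optimize("-fno-stack-protector") form for compilers lacking the dedicated attribute. A minimal, hypothetical consumer of both macros (the function name is illustrative only):

	#include "compiler.h"

	/* startup-style code must not itself be instrumented, because the
	 * stack guard is only initialized from inside this very function */
	__attribute__((weak)) __no_stack_protector
	void example_entry(void)
	{
	#ifdef _NOLIBC_STACKPROTECTOR
		__stack_chk_init();	/* defined in stackprotector.h */
	#endif
	}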
index 04739a6..05a228a 100644 (file)
 #include "sys.h"
 #include "ctype.h"
 #include "signal.h"
+#include "unistd.h"
 #include "stdio.h"
 #include "stdlib.h"
 #include "string.h"
 #include "time.h"
-#include "unistd.h"
 #include "stackprotector.h"
 
 /* Used by programs to avoid std includes */
index d119cbb..88f7b2d 100644 (file)
@@ -7,13 +7,9 @@
 #ifndef _NOLIBC_STACKPROTECTOR_H
 #define _NOLIBC_STACKPROTECTOR_H
 
-#include "arch.h"
+#include "compiler.h"
 
-#if defined(NOLIBC_STACKPROTECTOR)
-
-#if !defined(__ARCH_SUPPORTS_STACK_PROTECTOR)
-#error "nolibc does not support stack protectors on this arch"
-#endif
+#if defined(_NOLIBC_STACKPROTECTOR)
 
 #include "sys.h"
 #include "stdlib.h"
@@ -41,13 +37,14 @@ void __stack_chk_fail_local(void)
 __attribute__((weak,section(".data.nolibc_stack_chk")))
 uintptr_t __stack_chk_guard;
 
-__attribute__((weak,no_stack_protector,section(".text.nolibc_stack_chk")))
+__attribute__((weak,section(".text.nolibc_stack_chk"))) __no_stack_protector
 void __stack_chk_init(void)
 {
        my_syscall3(__NR_getrandom, &__stack_chk_guard, sizeof(__stack_chk_guard), 0);
-       /* a bit more randomness in case getrandom() fails */
-       __stack_chk_guard ^= (uintptr_t) &__stack_chk_guard;
+       /* a bit more randomness in case getrandom() fails, ensure the guard is never 0 */
+       if (__stack_chk_guard != (uintptr_t) &__stack_chk_guard)
+               __stack_chk_guard ^= (uintptr_t) &__stack_chk_guard;
 }
-#endif // defined(NOLIBC_STACKPROTECTOR)
+#endif /* defined(_NOLIBC_STACKPROTECTOR) */
 
-#endif // _NOLIBC_STACKPROTECTOR_H
+#endif /* _NOLIBC_STACKPROTECTOR_H */
index c1ce4f5..4b28243 100644 (file)
@@ -36,8 +36,8 @@ typedef  ssize_t       int_fast16_t;
 typedef   size_t      uint_fast16_t;
 typedef  ssize_t       int_fast32_t;
 typedef   size_t      uint_fast32_t;
-typedef  ssize_t       int_fast64_t;
-typedef   size_t      uint_fast64_t;
+typedef  int64_t       int_fast64_t;
+typedef uint64_t      uint_fast64_t;
 
 typedef  int64_t           intmax_t;
 typedef uint64_t          uintmax_t;
@@ -84,16 +84,30 @@ typedef uint64_t          uintmax_t;
 #define  INT_FAST8_MIN   INT8_MIN
 #define INT_FAST16_MIN   INTPTR_MIN
 #define INT_FAST32_MIN   INTPTR_MIN
-#define INT_FAST64_MIN   INTPTR_MIN
+#define INT_FAST64_MIN   INT64_MIN
 
 #define  INT_FAST8_MAX   INT8_MAX
 #define INT_FAST16_MAX   INTPTR_MAX
 #define INT_FAST32_MAX   INTPTR_MAX
-#define INT_FAST64_MAX   INTPTR_MAX
+#define INT_FAST64_MAX   INT64_MAX
 
 #define  UINT_FAST8_MAX  UINT8_MAX
 #define UINT_FAST16_MAX  SIZE_MAX
 #define UINT_FAST32_MAX  SIZE_MAX
-#define UINT_FAST64_MAX  SIZE_MAX
+#define UINT_FAST64_MAX  UINT64_MAX
+
+#ifndef INT_MIN
+#define INT_MIN          (-__INT_MAX__ - 1)
+#endif
+#ifndef INT_MAX
+#define INT_MAX          __INT_MAX__
+#endif
+
+#ifndef LONG_MIN
+#define LONG_MIN         (-__LONG_MAX__ - 1)
+#endif
+#ifndef LONG_MAX
+#define LONG_MAX         __LONG_MAX__
+#endif
 
 #endif /* _NOLIBC_STDINT_H */
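The stdint.h change fixes the 64-bit "fast" types: ssize_t and size_t are only 32 bits wide on 32-bit targets, so int_fast64_t, uint_fast64_t and their limits must be based on the fixed-width 64-bit types instead. A compile-time check along these lines (standard C11 _Static_assert, not part of the patch) captures the requirement:

	#include "stdint.h"	/* nolibc's stdint.h */

	/* the fast64 types must be able to hold at least 64 bits everywhere */
	_Static_assert(sizeof(int_fast64_t)  >= 8, "int_fast64_t too narrow");
	_Static_assert(sizeof(uint_fast64_t) >= 8, "uint_fast64_t too narrow");
	_Static_assert(INT_FAST64_MAX  >= INT64_MAX,  "INT_FAST64_MAX too small");
	_Static_assert(UINT_FAST64_MAX >= UINT64_MAX, "UINT_FAST64_MAX too small");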
index 6cbbb52..0eef91d 100644 (file)
 #define EOF (-1)
 #endif
 
-/* just define FILE as a non-empty type */
+/* just define FILE as a non-empty type. The value of the pointer gives
+ * the FD: FILE=~fd for fd>=0 or NULL for fd<0. This way positive FILE
+ * are immediately identified as abnormal entries (i.e. possible copies
+ * of valid pointers to something else).
+ */
 typedef struct FILE {
        char dummy[1];
 } FILE;
 
-/* We define the 3 common stdio files as constant invalid pointers that
- * are easily recognized.
- */
-static __attribute__((unused)) FILE* const stdin  = (FILE*)-3;
-static __attribute__((unused)) FILE* const stdout = (FILE*)-2;
-static __attribute__((unused)) FILE* const stderr = (FILE*)-1;
+static __attribute__((unused)) FILE* const stdin  = (FILE*)(intptr_t)~STDIN_FILENO;
+static __attribute__((unused)) FILE* const stdout = (FILE*)(intptr_t)~STDOUT_FILENO;
+static __attribute__((unused)) FILE* const stderr = (FILE*)(intptr_t)~STDERR_FILENO;
+
+/* provides a FILE* equivalent of fd. The mode is ignored. */
+static __attribute__((unused))
+FILE *fdopen(int fd, const char *mode __attribute__((unused)))
+{
+       if (fd < 0) {
+               SET_ERRNO(EBADF);
+               return NULL;
+       }
+       return (FILE*)(intptr_t)~fd;
+}
+
+/* provides the fd of stream. */
+static __attribute__((unused))
+int fileno(FILE *stream)
+{
+       intptr_t i = (intptr_t)stream;
+
+       if (i >= 0) {
+               SET_ERRNO(EBADF);
+               return -1;
+       }
+       return ~i;
+}
+
+/* flush a stream. */
+static __attribute__((unused))
+int fflush(FILE *stream)
+{
+       intptr_t i = (intptr_t)stream;
+
+       /* NULL is valid here. */
+       if (i > 0) {
+               SET_ERRNO(EBADF);
+               return -1;
+       }
+
+       /* Don't do anything, nolibc does not support buffering. */
+       return 0;
+}
+
+/* close a stream. */
+static __attribute__((unused))
+int fclose(FILE *stream)
+{
+       intptr_t i = (intptr_t)stream;
+
+       if (i >= 0) {
+               SET_ERRNO(EBADF);
+               return -1;
+       }
+
+       if (close(~i))
+               return EOF;
+
+       return 0;
+}
 
 /* getc(), fgetc(), getchar() */
 
@@ -41,14 +99,8 @@ static __attribute__((unused))
 int fgetc(FILE* stream)
 {
        unsigned char ch;
-       int fd;
 
-       if (stream < stdin || stream > stderr)
-               return EOF;
-
-       fd = 3 + (long)stream;
-
-       if (read(fd, &ch, 1) <= 0)
+       if (read(fileno(stream), &ch, 1) <= 0)
                return EOF;
        return ch;
 }
@@ -68,14 +120,8 @@ static __attribute__((unused))
 int fputc(int c, FILE* stream)
 {
        unsigned char ch = c;
-       int fd;
-
-       if (stream < stdin || stream > stderr)
-               return EOF;
-
-       fd = 3 + (long)stream;
 
-       if (write(fd, &ch, 1) <= 0)
+       if (write(fileno(stream), &ch, 1) <= 0)
                return EOF;
        return ch;
 }
@@ -96,12 +142,7 @@ static __attribute__((unused))
 int _fwrite(const void *buf, size_t size, FILE *stream)
 {
        ssize_t ret;
-       int fd;
-
-       if (stream < stdin || stream > stderr)
-               return EOF;
-
-       fd = 3 + (long)stream;
+       int fd = fileno(stream);
 
        while (size) {
                ret = write(fd, buf, size);
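Under the new scheme a FILE * is simply the bitwise complement of its file descriptor (stdin is ~0, stdout is ~1, stderr is ~2), so any descriptor can be wrapped without allocating anything and every stream helper reduces to a read()/write() on fileno(stream). A small usage sketch of the behaviour implemented above (the helper name is illustrative, not part of the patch):

	/* wrap an already-open fd and write a string through the FILE API */
	static int log_to_fd(int fd, const char *msg)
	{
		FILE *f = fdopen(fd, "w");	/* the mode string is ignored */

		if (!f)
			return -1;		/* fd < 0: errno set to EBADF */
		while (*msg)
			fputc(*msg++, f);	/* one write(fd, ..., 1) per byte */
		fflush(f);			/* no-op: nolibc never buffers */
		return fclose(f);		/* closes the underlying fd */
	}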
index 894c955..902162f 100644 (file)
@@ -102,7 +102,7 @@ char *_getenv(const char *name, char **environ)
        return NULL;
 }
 
-static inline __attribute__((unused,always_inline))
+static __inline__ __attribute__((unused,always_inline))
 char *getenv(const char *name)
 {
        extern char **environ;
@@ -231,7 +231,7 @@ int utoh_r(unsigned long in, char *buffer)
 /* converts unsigned long <in> to an hex string using the static itoa_buffer
  * and returns the pointer to that string.
  */
-static inline __attribute__((unused))
+static __inline__ __attribute__((unused))
 char *utoh(unsigned long in)
 {
        utoh_r(in, itoa_buffer);
@@ -293,7 +293,7 @@ int itoa_r(long in, char *buffer)
 /* for historical compatibility, same as above but returns the pointer to the
  * buffer.
  */
-static inline __attribute__((unused))
+static __inline__ __attribute__((unused))
 char *ltoa_r(long in, char *buffer)
 {
        itoa_r(in, buffer);
@@ -303,7 +303,7 @@ char *ltoa_r(long in, char *buffer)
 /* converts long integer <in> to a string using the static itoa_buffer and
  * returns the pointer to that string.
  */
-static inline __attribute__((unused))
+static __inline__ __attribute__((unused))
 char *itoa(long in)
 {
        itoa_r(in, itoa_buffer);
@@ -313,7 +313,7 @@ char *itoa(long in)
 /* converts long integer <in> to a string using the static itoa_buffer and
  * returns the pointer to that string. Same as above, for compatibility.
  */
-static inline __attribute__((unused))
+static __inline__ __attribute__((unused))
 char *ltoa(long in)
 {
        itoa_r(in, itoa_buffer);
@@ -323,7 +323,7 @@ char *ltoa(long in)
 /* converts unsigned long integer <in> to a string using the static itoa_buffer
  * and returns the pointer to that string.
  */
-static inline __attribute__((unused))
+static __inline__ __attribute__((unused))
 char *utoa(unsigned long in)
 {
        utoa_r(in, itoa_buffer);
@@ -367,7 +367,7 @@ int u64toh_r(uint64_t in, char *buffer)
 /* converts uint64_t <in> to an hex string using the static itoa_buffer and
  * returns the pointer to that string.
  */
-static inline __attribute__((unused))
+static __inline__ __attribute__((unused))
 char *u64toh(uint64_t in)
 {
        u64toh_r(in, itoa_buffer);
@@ -429,7 +429,7 @@ int i64toa_r(int64_t in, char *buffer)
 /* converts int64_t <in> to a string using the static itoa_buffer and returns
  * the pointer to that string.
  */
-static inline __attribute__((unused))
+static __inline__ __attribute__((unused))
 char *i64toa(int64_t in)
 {
        i64toa_r(in, itoa_buffer);
@@ -439,7 +439,7 @@ char *i64toa(int64_t in)
 /* converts uint64_t <in> to a string using the static itoa_buffer and returns
  * the pointer to that string.
  */
-static inline __attribute__((unused))
+static __inline__ __attribute__((unused))
 char *u64toa(uint64_t in)
 {
        u64toa_r(in, itoa_buffer);
index fffdaf6..0c2e06c 100644 (file)
@@ -90,7 +90,7 @@ void *memset(void *dst, int b, size_t len)
 
        while (len--) {
                /* prevent gcc from recognizing memset() here */
-               asm volatile("");
+               __asm__ volatile("");
                *(p++) = b;
        }
        return dst;
@@ -139,7 +139,7 @@ size_t strlen(const char *str)
        size_t len;
 
        for (len = 0; str[len]; len++)
-               asm("");
+               __asm__("");
        return len;
 }
 
index 5d624dc..856249a 100644 (file)
 
 /* system includes */
 #include <asm/unistd.h>
-#include <asm/signal.h>  // for SIGCHLD
+#include <asm/signal.h>  /* for SIGCHLD */
 #include <asm/ioctls.h>
 #include <asm/mman.h>
 #include <linux/fs.h>
 #include <linux/loop.h>
 #include <linux/time.h>
 #include <linux/auxvec.h>
-#include <linux/fcntl.h> // for O_* and AT_*
-#include <linux/stat.h>  // for statx()
+#include <linux/fcntl.h> /* for O_* and AT_* */
+#include <linux/stat.h>  /* for statx() */
+#include <linux/reboot.h> /* for LINUX_REBOOT_* */
+#include <linux/prctl.h>
 
 #include "arch.h"
 #include "errno.h"
@@ -322,7 +324,7 @@ static __attribute__((noreturn,unused))
 void sys_exit(int status)
 {
        my_syscall1(__NR_exit, status & 255);
-       while(1); // shut the "noreturn" warnings.
+       while(1); /* shut the "noreturn" warnings. */
 }
 
 static __attribute__((noreturn,unused))
@@ -336,6 +338,7 @@ void exit(int status)
  * pid_t fork(void);
  */
 
+#ifndef sys_fork
 static __attribute__((unused))
 pid_t sys_fork(void)
 {
@@ -351,6 +354,7 @@ pid_t sys_fork(void)
 #error Neither __NR_clone nor __NR_fork defined, cannot implement sys_fork()
 #endif
 }
+#endif
 
 static __attribute__((unused))
 pid_t fork(void)
@@ -858,7 +862,7 @@ int open(const char *path, int flags, ...)
                va_list args;
 
                va_start(args, flags);
-               mode = va_arg(args, mode_t);
+               mode = va_arg(args, int);
                va_end(args);
        }
 
@@ -873,6 +877,32 @@ int open(const char *path, int flags, ...)
 
 
 /*
+ * int prctl(int option, unsigned long arg2, unsigned long arg3,
+ *                       unsigned long arg4, unsigned long arg5);
+ */
+
+static __attribute__((unused))
+int sys_prctl(int option, unsigned long arg2, unsigned long arg3,
+                         unsigned long arg4, unsigned long arg5)
+{
+       return my_syscall5(__NR_prctl, option, arg2, arg3, arg4, arg5);
+}
+
+static __attribute__((unused))
+int prctl(int option, unsigned long arg2, unsigned long arg3,
+                     unsigned long arg4, unsigned long arg5)
+{
+       int ret = sys_prctl(option, arg2, arg3, arg4, arg5);
+
+       if (ret < 0) {
+               SET_ERRNO(-ret);
+               ret = -1;
+       }
+       return ret;
+}
+
+
+/*
  * int pivot_root(const char *new, const char *old);
  */
 
@@ -909,7 +939,7 @@ int sys_poll(struct pollfd *fds, int nfds, int timeout)
                t.tv_sec  = timeout / 1000;
                t.tv_nsec = (timeout % 1000) * 1000000;
        }
-       return my_syscall4(__NR_ppoll, fds, nfds, (timeout >= 0) ? &t : NULL, NULL);
+       return my_syscall5(__NR_ppoll, fds, nfds, (timeout >= 0) ? &t : NULL, NULL, 0);
 #elif defined(__NR_poll)
        return my_syscall3(__NR_poll, fds, nfds, timeout);
 #else
@@ -1131,23 +1161,26 @@ int sys_stat(const char *path, struct stat *buf)
        long ret;
 
        ret = sys_statx(AT_FDCWD, path, AT_NO_AUTOMOUNT, STATX_BASIC_STATS, &statx);
-       buf->st_dev     = ((statx.stx_dev_minor & 0xff)
-                         | (statx.stx_dev_major << 8)
-                         | ((statx.stx_dev_minor & ~0xff) << 12));
-       buf->st_ino     = statx.stx_ino;
-       buf->st_mode    = statx.stx_mode;
-       buf->st_nlink   = statx.stx_nlink;
-       buf->st_uid     = statx.stx_uid;
-       buf->st_gid     = statx.stx_gid;
-       buf->st_rdev    = ((statx.stx_rdev_minor & 0xff)
-                         | (statx.stx_rdev_major << 8)
-                         | ((statx.stx_rdev_minor & ~0xff) << 12));
-       buf->st_size    = statx.stx_size;
-       buf->st_blksize = statx.stx_blksize;
-       buf->st_blocks  = statx.stx_blocks;
-       buf->st_atime   = statx.stx_atime.tv_sec;
-       buf->st_mtime   = statx.stx_mtime.tv_sec;
-       buf->st_ctime   = statx.stx_ctime.tv_sec;
+       buf->st_dev          = ((statx.stx_dev_minor & 0xff)
+                              | (statx.stx_dev_major << 8)
+                              | ((statx.stx_dev_minor & ~0xff) << 12));
+       buf->st_ino          = statx.stx_ino;
+       buf->st_mode         = statx.stx_mode;
+       buf->st_nlink        = statx.stx_nlink;
+       buf->st_uid          = statx.stx_uid;
+       buf->st_gid          = statx.stx_gid;
+       buf->st_rdev         = ((statx.stx_rdev_minor & 0xff)
+                              | (statx.stx_rdev_major << 8)
+                              | ((statx.stx_rdev_minor & ~0xff) << 12));
+       buf->st_size         = statx.stx_size;
+       buf->st_blksize      = statx.stx_blksize;
+       buf->st_blocks       = statx.stx_blocks;
+       buf->st_atim.tv_sec  = statx.stx_atime.tv_sec;
+       buf->st_atim.tv_nsec = statx.stx_atime.tv_nsec;
+       buf->st_mtim.tv_sec  = statx.stx_mtime.tv_sec;
+       buf->st_mtim.tv_nsec = statx.stx_mtime.tv_nsec;
+       buf->st_ctim.tv_sec  = statx.stx_ctime.tv_sec;
+       buf->st_ctim.tv_nsec = statx.stx_ctime.tv_nsec;
        return ret;
 }
 #else
@@ -1165,19 +1198,22 @@ int sys_stat(const char *path, struct stat *buf)
 #else
 #error Neither __NR_newfstatat nor __NR_stat defined, cannot implement sys_stat()
 #endif
-       buf->st_dev     = stat.st_dev;
-       buf->st_ino     = stat.st_ino;
-       buf->st_mode    = stat.st_mode;
-       buf->st_nlink   = stat.st_nlink;
-       buf->st_uid     = stat.st_uid;
-       buf->st_gid     = stat.st_gid;
-       buf->st_rdev    = stat.st_rdev;
-       buf->st_size    = stat.st_size;
-       buf->st_blksize = stat.st_blksize;
-       buf->st_blocks  = stat.st_blocks;
-       buf->st_atime   = stat.st_atime;
-       buf->st_mtime   = stat.st_mtime;
-       buf->st_ctime   = stat.st_ctime;
+       buf->st_dev          = stat.st_dev;
+       buf->st_ino          = stat.st_ino;
+       buf->st_mode         = stat.st_mode;
+       buf->st_nlink        = stat.st_nlink;
+       buf->st_uid          = stat.st_uid;
+       buf->st_gid          = stat.st_gid;
+       buf->st_rdev         = stat.st_rdev;
+       buf->st_size         = stat.st_size;
+       buf->st_blksize      = stat.st_blksize;
+       buf->st_blocks       = stat.st_blocks;
+       buf->st_atim.tv_sec  = stat.st_atime;
+       buf->st_atim.tv_nsec = stat.st_atime_nsec;
+       buf->st_mtim.tv_sec  = stat.st_mtime;
+       buf->st_mtim.tv_nsec = stat.st_mtime_nsec;
+       buf->st_ctim.tv_sec  = stat.st_ctime;
+       buf->st_ctim.tv_nsec = stat.st_ctime_nsec;
        return ret;
 }
 #endif
@@ -1365,6 +1401,29 @@ ssize_t write(int fd, const void *buf, size_t count)
        return ret;
 }
 
+
+/*
+ * int memfd_create(const char *name, unsigned int flags);
+ */
+
+static __attribute__((unused))
+int sys_memfd_create(const char *name, unsigned int flags)
+{
+       return my_syscall2(__NR_memfd_create, name, flags);
+}
+
+static __attribute__((unused))
+int memfd_create(const char *name, unsigned int flags)
+{
+       ssize_t ret = sys_memfd_create(name, flags);
+
+       if (ret < 0) {
+               SET_ERRNO(-ret);
+               ret = -1;
+       }
+       return ret;
+}
+
 /* make sure to include all global symbols */
 #include "nolibc.h"
 
index aedd7d9..f96e28b 100644 (file)
 #define SEEK_CUR       1
 #define SEEK_END       2
 
-/* cmd for reboot() */
-#define LINUX_REBOOT_MAGIC1         0xfee1dead
-#define LINUX_REBOOT_MAGIC2         0x28121969
-#define LINUX_REBOOT_CMD_HALT       0xcdef0123
-#define LINUX_REBOOT_CMD_POWER_OFF  0x4321fedc
-#define LINUX_REBOOT_CMD_RESTART    0x01234567
-#define LINUX_REBOOT_CMD_SW_SUSPEND 0xd000fce2
-
 /* Macros used on waitpid()'s return status */
 #define WEXITSTATUS(status) (((status) & 0xff00) >> 8)
 #define WIFEXITED(status)   (((status) & 0x7f) == 0)
@@ -206,9 +198,9 @@ struct stat {
        off_t     st_size;    /* total size, in bytes */
        blksize_t st_blksize; /* blocksize for file system I/O */
        blkcnt_t  st_blocks;  /* number of 512B blocks allocated */
-       time_t    st_atime;   /* time of last access */
-       time_t    st_mtime;   /* time of last modification */
-       time_t    st_ctime;   /* time of last status change */
+       union { time_t st_atime; struct timespec st_atim; }; /* time of last access */
+       union { time_t st_mtime; struct timespec st_mtim; }; /* time of last modification */
+       union { time_t st_ctime; struct timespec st_ctim; }; /* time of last status change */
 };
 
 /* WARNING, it only deals with the 4096 first majors and 256 first minors */
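The anonymous unions keep the traditional st_atime/st_mtime/st_ctime member names valid while adding the POSIX.1-2008 st_atim/st_mtim/st_ctim timespec members that sys_stat() now fills with nanosecond precision; both spellings alias the same seconds field. For example (path and variable names are illustrative):

	struct stat st;

	if (stat("/etc/hostname", &st) == 0) {
		time_t sec  = st.st_mtime;		/* legacy name still works */
		long   nsec = st.st_mtim.tv_nsec;	/* new: sub-second part */
		/* st.st_mtime and st.st_mtim.tv_sec read the same storage */
	}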
index ac7d53d..0e832e1 100644 (file)
@@ -56,6 +56,21 @@ int tcsetpgrp(int fd, pid_t pid)
        return ioctl(fd, TIOCSPGRP, &pid);
 }
 
+#define _syscall(N, ...)                                                      \
+({                                                                            \
+       long _ret = my_syscall##N(__VA_ARGS__);                               \
+       if (_ret < 0) {                                                       \
+               SET_ERRNO(-_ret);                                             \
+               _ret = -1;                                                    \
+       }                                                                     \
+       _ret;                                                                 \
+})
+
+#define _syscall_narg(...) __syscall_narg(__VA_ARGS__, 6, 5, 4, 3, 2, 1, 0)
+#define __syscall_narg(_0, _1, _2, _3, _4, _5, _6, N, ...) N
+#define _syscall_n(N, ...) _syscall(N, __VA_ARGS__)
+#define syscall(...) _syscall_n(_syscall_narg(__VA_ARGS__), ##__VA_ARGS__)
+
 /* make sure to include all global symbols */
 #include "nolibc.h"
 
index 6428085..a87bbbb 100644 (file)
@@ -972,6 +972,19 @@ extern "C" {
 #define DRM_IOCTL_GET_STATS             DRM_IOR( 0x06, struct drm_stats)
 #define DRM_IOCTL_SET_VERSION          DRM_IOWR(0x07, struct drm_set_version)
 #define DRM_IOCTL_MODESET_CTL           DRM_IOW(0x08, struct drm_modeset_ctl)
+/**
+ * DRM_IOCTL_GEM_CLOSE - Close a GEM handle.
+ *
+ * GEM handles are not reference-counted by the kernel. User-space is
+ * responsible for managing their lifetime. For example, if user-space imports
+ * the same memory object twice on the same DRM file description, the same GEM
+ * handle is returned by both imports, and user-space needs to ensure
+ * &DRM_IOCTL_GEM_CLOSE is performed once only. The same situation can happen
+ * when a memory object is allocated, then exported and imported again on the
+ * same DRM file description. The &DRM_IOCTL_MODE_GETFB2 IOCTL is an exception
+ * and always returns fresh new GEM handles even if an existing GEM handle
+ * already refers to the same memory object before the IOCTL is performed.
+ */
 #define DRM_IOCTL_GEM_CLOSE            DRM_IOW (0x09, struct drm_gem_close)
 #define DRM_IOCTL_GEM_FLINK            DRM_IOWR(0x0a, struct drm_gem_flink)
 #define DRM_IOCTL_GEM_OPEN             DRM_IOWR(0x0b, struct drm_gem_open)
@@ -1012,7 +1025,37 @@ extern "C" {
 #define DRM_IOCTL_UNLOCK               DRM_IOW( 0x2b, struct drm_lock)
 #define DRM_IOCTL_FINISH               DRM_IOW( 0x2c, struct drm_lock)
 
+/**
+ * DRM_IOCTL_PRIME_HANDLE_TO_FD - Convert a GEM handle to a DMA-BUF FD.
+ *
+ * User-space sets &drm_prime_handle.handle with the GEM handle to export and
+ * &drm_prime_handle.flags, and gets back a DMA-BUF file descriptor in
+ * &drm_prime_handle.fd.
+ *
+ * The export can fail for any driver-specific reason, e.g. because export is
+ * not supported for this specific GEM handle (but might be for others).
+ *
+ * Support for exporting DMA-BUFs is advertised via &DRM_PRIME_CAP_EXPORT.
+ */
 #define DRM_IOCTL_PRIME_HANDLE_TO_FD    DRM_IOWR(0x2d, struct drm_prime_handle)
+/**
+ * DRM_IOCTL_PRIME_FD_TO_HANDLE - Convert a DMA-BUF FD to a GEM handle.
+ *
+ * User-space sets &drm_prime_handle.fd with a DMA-BUF file descriptor to
+ * import, and gets back a GEM handle in &drm_prime_handle.handle.
+ * &drm_prime_handle.flags is unused.
+ *
+ * If an existing GEM handle refers to the memory object backing the DMA-BUF,
+ * that GEM handle is returned. Therefore user-space which needs to handle
+ * arbitrary DMA-BUFs must have a user-space lookup data structure to manually
+ * reference-count duplicated GEM handles. For more information see
+ * &DRM_IOCTL_GEM_CLOSE.
+ *
+ * The import can fail for any driver-specific reason, e.g. because import is
+ * only supported for DMA-BUFs allocated on this DRM device.
+ *
+ * Support for importing DMA-BUFs is advertised via &DRM_PRIME_CAP_IMPORT.
+ */
 #define DRM_IOCTL_PRIME_FD_TO_HANDLE    DRM_IOWR(0x2e, struct drm_prime_handle)
 
 #define DRM_IOCTL_AGP_ACQUIRE          DRM_IO(  0x30)
@@ -1104,8 +1147,13 @@ extern "C" {
  * struct as the output.
  *
  * If the client is DRM master or has &CAP_SYS_ADMIN, &drm_mode_fb_cmd2.handles
- * will be filled with GEM buffer handles. Planes are valid until one has a
- * zero handle -- this can be used to compute the number of planes.
+ * will be filled with GEM buffer handles. Fresh new GEM handles are always
+ * returned, even if another GEM handle referring to the same memory object
+ * already exists on the DRM file description. The caller is responsible for
+ * removing the new handles, e.g. via the &DRM_IOCTL_GEM_CLOSE IOCTL. The same
+ * new handle will be returned for multiple planes in case they use the same
+ * memory object. Planes are valid until one has a zero handle -- this can be
+ * used to compute the number of planes.
  *
  * Otherwise, &drm_mode_fb_cmd2.handles will be zeroed and planes are valid
  * until one has a zero &drm_mode_fb_cmd2.pitches.
@@ -1113,6 +1161,11 @@ extern "C" {
  * If the framebuffer has a format modifier, &DRM_MODE_FB_MODIFIERS will be set
  * in &drm_mode_fb_cmd2.flags and &drm_mode_fb_cmd2.modifier will contain the
  * modifier. Otherwise, user-space must ignore &drm_mode_fb_cmd2.modifier.
+ *
+ * To obtain DMA-BUF FDs for each plane without leaking GEM handles, user-space
+ * can export each handle via &DRM_IOCTL_PRIME_HANDLE_TO_FD, then immediately
+ * close each unique handle via &DRM_IOCTL_GEM_CLOSE, making sure to not
+ * double-close handles which are specified multiple times in the array.
  */
 #define DRM_IOCTL_MODE_GETFB2          DRM_IOWR(0xCE, struct drm_mode_fb_cmd2)
 
index 8df261c..dba7c5a 100644 (file)
@@ -2491,7 +2491,7 @@ struct i915_context_param_engines {
 #define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0 /* see i915_context_engines_load_balance */
 #define I915_CONTEXT_ENGINES_EXT_BOND 1 /* see i915_context_engines_bond */
 #define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
-       struct i915_engine_class_instance engines[0];
+       struct i915_engine_class_instance engines[];
 } __attribute__((packed));
 
 #define I915_DEFINE_CONTEXT_PARAM_ENGINES(name__, N__) struct { \
@@ -2676,6 +2676,10 @@ enum drm_i915_oa_format {
        I915_OAR_FORMAT_A32u40_A4u32_B8_C8,
        I915_OA_FORMAT_A24u40_A14u32_B8_C8,
 
+       /* MTL OAM */
+       I915_OAM_FORMAT_MPEC8u64_B8_C8,
+       I915_OAM_FORMAT_MPEC8u32_B8_C8,
+
        I915_OA_FORMAT_MAX          /* non-ABI */
 };
 
@@ -2758,6 +2762,25 @@ enum drm_i915_perf_property_id {
         */
        DRM_I915_PERF_PROP_POLL_OA_PERIOD,
 
+       /**
+        * Multiple engines may be mapped to the same OA unit. The OA unit is
+        * identified by class:instance of any engine mapped to it.
+        *
+        * This parameter specifies the engine class and must be passed along
+        * with DRM_I915_PERF_PROP_OA_ENGINE_INSTANCE.
+        *
+        * This property is available in perf revision 6.
+        */
+       DRM_I915_PERF_PROP_OA_ENGINE_CLASS,
+
+       /**
+        * This parameter specifies the engine instance and must be passed along
+        * with DRM_I915_PERF_PROP_OA_ENGINE_CLASS.
+        *
+        * This property is available in perf revision 6.
+        */
+       DRM_I915_PERF_PROP_OA_ENGINE_INSTANCE,
+
        DRM_I915_PERF_PROP_MAX /* non-ABI */
 };
 
index 1bb11a6..c994ff5 100644 (file)
@@ -1035,6 +1035,7 @@ enum bpf_attach_type {
        BPF_TRACE_KPROBE_MULTI,
        BPF_LSM_CGROUP,
        BPF_STRUCT_OPS,
+       BPF_NETFILTER,
        __MAX_BPF_ATTACH_TYPE
 };
 
index af2a44c..a429381 100644 (file)
@@ -28,7 +28,7 @@
 #define _BITUL(x)      (_UL(1) << (x))
 #define _BITULL(x)     (_ULL(1) << (x))
 
-#define __ALIGN_KERNEL(x, a)           __ALIGN_KERNEL_MASK(x, (typeof(x))(a) - 1)
+#define __ALIGN_KERNEL(x, a)           __ALIGN_KERNEL_MASK(x, (__typeof__(x))(a) - 1)
 #define __ALIGN_KERNEL_MASK(x, mask)   (((x) + (mask)) & ~(mask))
 
 #define __KERNEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
index 07a4cb1..e682ab6 100644 (file)
@@ -162,6 +162,8 @@ struct in_addr {
 #define MCAST_MSFILTER                 48
 #define IP_MULTICAST_ALL               49
 #define IP_UNICAST_IF                  50
+#define IP_LOCAL_PORT_RANGE            51
+#define IP_PROTOCOL                    52
 
 #define MCAST_EXCLUDE  0
 #define MCAST_INCLUDE  1
index 4003a16..737318b 100644 (file)
@@ -341,8 +341,13 @@ struct kvm_run {
                        __u64 nr;
                        __u64 args[6];
                        __u64 ret;
-                       __u32 longmode;
-                       __u32 pad;
+
+                       union {
+#ifndef __KERNEL__
+                               __u32 longmode;
+#endif
+                               __u64 flags;
+                       };
                } hypercall;
                /* KVM_EXIT_TPR_ACCESS */
                struct {
@@ -1184,6 +1189,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
 #define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
 #define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226
+#define KVM_CAP_COUNTER_OFFSET 227
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1543,6 +1549,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct kvm_pmu_event_filter)
 #define KVM_PPC_SVM_OFF                  _IO(KVMIO,  0xb3)
 #define KVM_ARM_MTE_COPY_TAGS    _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
+/* Available with KVM_CAP_COUNTER_OFFSET */
+#define KVM_ARM_SET_COUNTER_OFFSET _IOW(KVMIO,  0xb5, struct kvm_arm_counter_offset)
 
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE        _IOWR(KVMIO,  0xe0, struct kvm_create_device)
index 759b3f5..f23d9a1 100644 (file)
@@ -290,6 +290,8 @@ struct prctl_mm_map {
 #define PR_SET_VMA             0x53564d41
 # define PR_SET_VMA_ANON_NAME          0
 
+#define PR_GET_AUXV                    0x41555856
+
 #define PR_SET_MEMORY_MERGE            67
 #define PR_GET_MEMORY_MERGE            68
 #endif /* _LINUX_PRCTL_H */
index de6810e..0aa955a 100644 (file)
@@ -429,9 +429,14 @@ struct snd_pcm_sw_params {
        snd_pcm_uframes_t avail_min;            /* min avail frames for wakeup */
        snd_pcm_uframes_t xfer_align;           /* obsolete: xfer size need to be a multiple */
        snd_pcm_uframes_t start_threshold;      /* min hw_avail frames for automatic start */
-       snd_pcm_uframes_t stop_threshold;       /* min avail frames for automatic stop */
-       snd_pcm_uframes_t silence_threshold;    /* min distance from noise for silence filling */
-       snd_pcm_uframes_t silence_size;         /* silence block size */
+       /*
+        * The following two thresholds alleviate playback buffer underruns; when
+        * hw_avail drops below the threshold, the respective action is triggered:
+        */
+       snd_pcm_uframes_t stop_threshold;       /* - stop playback */
+       snd_pcm_uframes_t silence_threshold;    /* - pre-fill buffer with silence */
+       snd_pcm_uframes_t silence_size;         /* max size of silence pre-fill; when >= boundary,
+                                                * fill played area with silence immediately */
        snd_pcm_uframes_t boundary;             /* pointers wrap point */
        unsigned int proto;                     /* protocol version */
        unsigned int tstamp_type;               /* timestamp type (req. proto >= 2.0.12) */
@@ -570,7 +575,8 @@ struct __snd_pcm_mmap_status64 {
 struct __snd_pcm_mmap_control64 {
        __pad_before_uframe __pad1;
        snd_pcm_uframes_t appl_ptr;      /* RW: appl ptr (0...boundary-1) */
-       __pad_before_uframe __pad2;
+       __pad_before_uframe __pad2;      // This should be __pad_after_uframe, but binary
+                                        // backwards compatibility constraints prevent a fix.
 
        __pad_before_uframe __pad3;
        snd_pcm_uframes_t  avail_min;    /* RW: min available frames for wakeup */
index ad1ec89..a27f6e9 100644 (file)
@@ -117,6 +117,7 @@ static const char * const attach_type_name[] = {
        [BPF_PERF_EVENT]                = "perf_event",
        [BPF_TRACE_KPROBE_MULTI]        = "trace_kprobe_multi",
        [BPF_STRUCT_OPS]                = "struct_ops",
+       [BPF_NETFILTER]                 = "netfilter",
 };
 
 static const char * const link_type_name[] = {
@@ -8712,7 +8713,7 @@ static const struct bpf_sec_def section_defs[] = {
        SEC_DEF("struct_ops+",          STRUCT_OPS, 0, SEC_NONE),
        SEC_DEF("struct_ops.s+",        STRUCT_OPS, 0, SEC_SLEEPABLE),
        SEC_DEF("sk_lookup",            SK_LOOKUP, BPF_SK_LOOKUP, SEC_ATTACHABLE),
-       SEC_DEF("netfilter",            NETFILTER, 0, SEC_NONE),
+       SEC_DEF("netfilter",            NETFILTER, BPF_NETFILTER, SEC_NONE),
 };
 
 static size_t custom_sec_def_cnt;
index 6065f40..b7d4431 100644 (file)
@@ -180,7 +180,9 @@ static int probe_prog_load(enum bpf_prog_type prog_type,
        case BPF_PROG_TYPE_SK_REUSEPORT:
        case BPF_PROG_TYPE_FLOW_DISSECTOR:
        case BPF_PROG_TYPE_CGROUP_SYSCTL:
+               break;
        case BPF_PROG_TYPE_NETFILTER:
+               opts.expected_attach_type = BPF_NETFILTER;
                break;
        default:
                return -EOPNOTSUPP;
index 41b9b94..8e91473 100644 (file)
@@ -6,10 +6,6 @@
 #include <stdbool.h>
 #include <stdint.h>
 
-#ifndef NORETURN
-#define NORETURN __attribute__((__noreturn__))
-#endif
-
 enum parse_opt_type {
        /* special types */
        OPTION_END,
@@ -183,9 +179,9 @@ extern int parse_options_subcommand(int argc, const char **argv,
                                const char *const subcommands[],
                                const char *usagestr[], int flags);
 
-extern NORETURN void usage_with_options(const char * const *usagestr,
+extern __noreturn void usage_with_options(const char * const *usagestr,
                                         const struct option *options);
-extern NORETURN __attribute__((format(printf,3,4)))
+extern __noreturn __attribute__((format(printf,3,4)))
 void usage_with_options_msg(const char * const *usagestr,
                            const struct option *options,
                            const char *fmt, ...);
index b2aec04..dfac76e 100644 (file)
@@ -5,8 +5,7 @@
 #include <stdarg.h>
 #include <stdlib.h>
 #include <stdio.h>
-
-#define NORETURN __attribute__((__noreturn__))
+#include <linux/compiler.h>
 
 static inline void report(const char *prefix, const char *err, va_list params)
 {
@@ -15,7 +14,7 @@ static inline void report(const char *prefix, const char *err, va_list params)
        fprintf(stderr, " %s%s\n", prefix, msg);
 }
 
-static NORETURN inline void die(const char *err, ...)
+static __noreturn inline void die(const char *err, ...)
 {
        va_list params;
 
index aa77bca..3144f33 100644 (file)
@@ -591,8 +591,9 @@ class YnlFamily(SpecFamily):
                         print('Unexpected message: ' + repr(gm))
                         continue
 
-                rsp.append(self._decode(gm.raw_attrs, op.attr_set.name)
-                           | gm.fixed_header_attrs)
+                rsp_msg = self._decode(gm.raw_attrs, op.attr_set.name)
+                rsp_msg.update(gm.fixed_header_attrs)
+                rsp.append(rsp_msg)
 
         if not rsp:
             return None
index 744db42..fe39c2a 100644 (file)
@@ -244,6 +244,11 @@ To achieve the validation, objtool enforces the following rules:
 Objtool warnings
 ----------------
 
+NOTE: When requesting help with an objtool warning, please recreate with
+OBJTOOL_VERBOSE=1 (e.g., "make OBJTOOL_VERBOSE=1") and send the full
+output, including any disassembly or backtrace below the warning, to the
+objtool maintainers.
+
 For asm files, if you're getting an error which doesn't make sense,
 first make sure that the affected code follows the above rules.
 
@@ -298,6 +303,11 @@ the objtool maintainers.
    If it's not actually in a callable function (e.g. kernel entry code),
    change ENDPROC to END.
 
+3. file.o: warning: objtool: foo+0x48c: bar() is missing a __noreturn annotation
+
+   The call from foo() to bar() doesn't return, but bar() is missing the
+   __noreturn annotation.  NOTE: In addition to annotating the function
+   with __noreturn, please also add it to tools/objtool/noreturns.h.
 
 4. file.o: warning: objtool: func(): can't find starting instruction
    or
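
For warning 3 above the fix is two-sided, because the __noreturn attribute never reaches the ELF file: annotate the callee in the kernel source and list it for objtool (the check.c change further below generates objtool's internal list from tools/objtool/noreturns.h). A hedged example with a made-up function name:

    /* in the kernel source: */
    void __noreturn example_halt(void)
    {
            for (;;)
                    cpu_relax();
    }

    /* in tools/objtool/noreturns.h, using its NORETURN() entry format: */
    NORETURN(example_halt)
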
index 73f9ae1..66814fa 100644 (file)
@@ -1,10 +1,13 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later */
-
 #ifndef _OBJTOOL_ARCH_ELF
 #define _OBJTOOL_ARCH_ELF
 
-#define R_NONE R_PPC_NONE
-#define R_ABS64 R_PPC64_ADDR64
-#define R_ABS32 R_PPC_ADDR32
+#define R_NONE         R_PPC_NONE
+#define R_ABS64                R_PPC64_ADDR64
+#define R_ABS32                R_PPC_ADDR32
+#define R_DATA32       R_PPC_REL32
+#define R_DATA64       R_PPC64_REL64
+#define R_TEXT32       R_PPC_REL32
+#define R_TEXT64       R_PPC64_REL32
 
 #endif /* _OBJTOOL_ARCH_ELF */
index 9ef024f..2e1caab 100644 (file)
@@ -84,7 +84,7 @@ bool arch_pc_relative_reloc(struct reloc *reloc)
         * All relocation types where P (the address of the target)
         * is included in the computation.
         */
-       switch (reloc->type) {
+       switch (reloc_type(reloc)) {
        case R_X86_64_PC8:
        case R_X86_64_PC16:
        case R_X86_64_PC32:
@@ -623,11 +623,11 @@ int arch_decode_instruction(struct objtool_file *file, const struct section *sec
                        if (!immr || strcmp(immr->sym->name, "pv_ops"))
                                break;
 
-                       idx = (immr->addend + 8) / sizeof(void *);
+                       idx = (reloc_addend(immr) + 8) / sizeof(void *);
 
                        func = disp->sym;
                        if (disp->sym->type == STT_SECTION)
-                               func = find_symbol_by_offset(disp->sym->sec, disp->addend);
+                               func = find_symbol_by_offset(disp->sym->sec, reloc_addend(disp));
                        if (!func) {
                                WARN("no func for pv_ops[]");
                                return -1;
index ac14987..7131f7f 100644 (file)
@@ -1,8 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
 #ifndef _OBJTOOL_ARCH_ELF
 #define _OBJTOOL_ARCH_ELF
 
-#define R_NONE R_X86_64_NONE
-#define R_ABS64 R_X86_64_64
-#define R_ABS32 R_X86_64_32
+#define R_NONE         R_X86_64_NONE
+#define R_ABS32                R_X86_64_32
+#define R_ABS64                R_X86_64_64
+#define R_DATA32       R_X86_64_PC32
+#define R_DATA64       R_X86_64_PC32
+#define R_TEXT32       R_X86_64_PC32
+#define R_TEXT64       R_X86_64_PC32
 
 #endif /* _OBJTOOL_ARCH_ELF */
index 7c97b73..29e9495 100644 (file)
@@ -42,13 +42,7 @@ bool arch_support_alt_relocation(struct special_alt *special_alt,
                                 struct instruction *insn,
                                 struct reloc *reloc)
 {
-       /*
-        * The x86 alternatives code adjusts the offsets only when it
-        * encounters a branch instruction at the very beginning of the
-        * replacement group.
-        */
-       return insn->offset == special_alt->new_off &&
-              (insn->type == INSN_CALL || is_jump(insn));
+       return true;
 }
 
 /*
@@ -105,10 +99,10 @@ struct reloc *arch_find_switch_table(struct objtool_file *file,
            !text_reloc->sym->sec->rodata)
                return NULL;
 
-       table_offset = text_reloc->addend;
+       table_offset = reloc_addend(text_reloc);
        table_sec = text_reloc->sym->sec;
 
-       if (text_reloc->type == R_X86_64_PC32)
+       if (reloc_type(text_reloc) == R_X86_64_PC32)
                table_offset += 4;
 
        /*
@@ -138,7 +132,7 @@ struct reloc *arch_find_switch_table(struct objtool_file *file,
         * indicates a rare GCC quirk/bug which can leave dead
         * code behind.
         */
-       if (text_reloc->type == R_X86_64_PC32)
+       if (reloc_type(text_reloc) == R_X86_64_PC32)
                file->ignore_unreachables = true;
 
        return rodata_reloc;
index 7c17519..5e21cfb 100644 (file)
@@ -93,6 +93,7 @@ static const struct option check_options[] = {
        OPT_BOOLEAN(0, "no-unreachable", &opts.no_unreachable, "skip 'unreachable instruction' warnings"),
        OPT_BOOLEAN(0, "sec-address", &opts.sec_address, "print section addresses in warnings"),
        OPT_BOOLEAN(0, "stats", &opts.stats, "print statistics"),
+       OPT_BOOLEAN('v', "verbose", &opts.verbose, "verbose warnings"),
 
        OPT_END(),
 };
@@ -118,6 +119,10 @@ int cmd_parse_options(int argc, const char **argv, const char * const usage[])
                parse_options(envc, envv, check_options, env_usage, 0);
        }
 
+       env = getenv("OBJTOOL_VERBOSE");
+       if (env && !strcmp(env, "1"))
+               opts.verbose = true;
+
        argc = parse_options(argc, argv, check_options, usage, 0);
        if (argc != 1)
                usage_with_options(usage, check_options);
index 0fcf99c..8936a05 100644 (file)
@@ -8,7 +8,6 @@
 #include <inttypes.h>
 #include <sys/mman.h>
 
-#include <arch/elf.h>
 #include <objtool/builtin.h>
 #include <objtool/cfi.h>
 #include <objtool/arch.h>
@@ -33,6 +32,7 @@ static unsigned long nr_cfi, nr_cfi_reused, nr_cfi_cache;
 static struct cfi_init_state initial_func_cfi;
 static struct cfi_state init_cfi;
 static struct cfi_state func_cfi;
+static struct cfi_state force_undefined_cfi;
 
 struct instruction *find_insn(struct objtool_file *file,
                              struct section *sec, unsigned long offset)
@@ -192,51 +192,11 @@ static bool __dead_end_function(struct objtool_file *file, struct symbol *func,
        struct instruction *insn;
        bool empty = true;
 
-       /*
-        * Unfortunately these have to be hard coded because the noreturn
-        * attribute isn't provided in ELF data. Keep 'em sorted.
-        */
+#define NORETURN(func) __stringify(func),
        static const char * const global_noreturns[] = {
-               "__invalid_creds",
-               "__module_put_and_kthread_exit",
-               "__reiserfs_panic",
-               "__stack_chk_fail",
-               "__ubsan_handle_builtin_unreachable",
-               "arch_call_rest_init",
-               "arch_cpu_idle_dead",
-               "btrfs_assertfail",
-               "cpu_bringup_and_idle",
-               "cpu_startup_entry",
-               "do_exit",
-               "do_group_exit",
-               "do_task_dead",
-               "ex_handler_msr_mce",
-               "fortify_panic",
-               "hlt_play_dead",
-               "hv_ghcb_terminate",
-               "kthread_complete_and_exit",
-               "kthread_exit",
-               "kunit_try_catch_throw",
-               "lbug_with_loc",
-               "machine_real_restart",
-               "make_task_dead",
-               "mpt_halt_firmware",
-               "nmi_panic_self_stop",
-               "panic",
-               "panic_smp_self_stop",
-               "rest_init",
-               "resume_play_dead",
-               "rewind_stack_and_make_dead",
-               "sev_es_terminate",
-               "snp_abort",
-               "start_kernel",
-               "stop_this_cpu",
-               "usercopy_abort",
-               "x86_64_start_kernel",
-               "x86_64_start_reservations",
-               "xen_cpu_bringup_again",
-               "xen_start_kernel",
+#include "noreturns.h"
        };
+#undef NORETURN
 
        if (!func)
                return false;
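
The hunk above replaces the hard-coded string list with an X-macro style include: noreturns.h carries bare NORETURN(name) entries, and each includer defines NORETURN() to the expansion it needs before including the file. A minimal sketch of the pattern (hypothetical reuse; the two names shown are entries from the removed list):

    #define NORETURN(func) #func,
    static const char * const names[] = {
            NORETURN(panic)
            NORETURN(do_exit)
    };
    #undef NORETURN

Keeping the names in one header means any future consumer only has to redefine NORETURN() rather than duplicate the list.
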
@@ -533,7 +493,7 @@ static int add_pv_ops(struct objtool_file *file, const char *symname)
 {
        struct symbol *sym, *func;
        unsigned long off, end;
-       struct reloc *rel;
+       struct reloc *reloc;
        int idx;
 
        sym = find_symbol_by_name(file->elf, symname);
@@ -543,19 +503,20 @@ static int add_pv_ops(struct objtool_file *file, const char *symname)
        off = sym->offset;
        end = off + sym->len;
        for (;;) {
-               rel = find_reloc_by_dest_range(file->elf, sym->sec, off, end - off);
-               if (!rel)
+               reloc = find_reloc_by_dest_range(file->elf, sym->sec, off, end - off);
+               if (!reloc)
                        break;
 
-               func = rel->sym;
+               func = reloc->sym;
                if (func->type == STT_SECTION)
-                       func = find_symbol_by_offset(rel->sym->sec, rel->addend);
+                       func = find_symbol_by_offset(reloc->sym->sec,
+                                                    reloc_addend(reloc));
 
-               idx = (rel->offset - sym->offset) / sizeof(unsigned long);
+               idx = (reloc_offset(reloc) - sym->offset) / sizeof(unsigned long);
 
                objtool_pv_add(file, idx, func);
 
-               off = rel->offset + 1;
+               off = reloc_offset(reloc) + 1;
                if (off > end)
                        break;
        }
@@ -620,35 +581,40 @@ static struct instruction *find_last_insn(struct objtool_file *file,
  */
 static int add_dead_ends(struct objtool_file *file)
 {
-       struct section *sec;
+       struct section *rsec;
        struct reloc *reloc;
        struct instruction *insn;
+       s64 addend;
 
        /*
         * Check for manually annotated dead ends.
         */
-       sec = find_section_by_name(file->elf, ".rela.discard.unreachable");
-       if (!sec)
+       rsec = find_section_by_name(file->elf, ".rela.discard.unreachable");
+       if (!rsec)
                goto reachable;
 
-       list_for_each_entry(reloc, &sec->reloc_list, list) {
+       for_each_reloc(rsec, reloc) {
+
                if (reloc->sym->type != STT_SECTION) {
-                       WARN("unexpected relocation symbol type in %s", sec->name);
+                       WARN("unexpected relocation symbol type in %s", rsec->name);
                        return -1;
                }
-               insn = find_insn(file, reloc->sym->sec, reloc->addend);
+
+               addend = reloc_addend(reloc);
+
+               insn = find_insn(file, reloc->sym->sec, addend);
                if (insn)
                        insn = prev_insn_same_sec(file, insn);
-               else if (reloc->addend == reloc->sym->sec->sh.sh_size) {
+               else if (addend == reloc->sym->sec->sh.sh_size) {
                        insn = find_last_insn(file, reloc->sym->sec);
                        if (!insn) {
                                WARN("can't find unreachable insn at %s+0x%" PRIx64,
-                                    reloc->sym->sec->name, reloc->addend);
+                                    reloc->sym->sec->name, addend);
                                return -1;
                        }
                } else {
                        WARN("can't find unreachable insn at %s+0x%" PRIx64,
-                            reloc->sym->sec->name, reloc->addend);
+                            reloc->sym->sec->name, addend);
                        return -1;
                }
 
@@ -662,28 +628,32 @@ reachable:
         * GCC doesn't know the "ud2" is fatal, so it generates code as if it's
         * not a dead end.
         */
-       sec = find_section_by_name(file->elf, ".rela.discard.reachable");
-       if (!sec)
+       rsec = find_section_by_name(file->elf, ".rela.discard.reachable");
+       if (!rsec)
                return 0;
 
-       list_for_each_entry(reloc, &sec->reloc_list, list) {
+       for_each_reloc(rsec, reloc) {
+
                if (reloc->sym->type != STT_SECTION) {
-                       WARN("unexpected relocation symbol type in %s", sec->name);
+                       WARN("unexpected relocation symbol type in %s", rsec->name);
                        return -1;
                }
-               insn = find_insn(file, reloc->sym->sec, reloc->addend);
+
+               addend = reloc_addend(reloc);
+
+               insn = find_insn(file, reloc->sym->sec, addend);
                if (insn)
                        insn = prev_insn_same_sec(file, insn);
-               else if (reloc->addend == reloc->sym->sec->sh.sh_size) {
+               else if (addend == reloc->sym->sec->sh.sh_size) {
                        insn = find_last_insn(file, reloc->sym->sec);
                        if (!insn) {
                                WARN("can't find reachable insn at %s+0x%" PRIx64,
-                                    reloc->sym->sec->name, reloc->addend);
+                                    reloc->sym->sec->name, addend);
                                return -1;
                        }
                } else {
                        WARN("can't find reachable insn at %s+0x%" PRIx64,
-                            reloc->sym->sec->name, reloc->addend);
+                            reloc->sym->sec->name, addend);
                        return -1;
                }
 
@@ -695,8 +665,8 @@ reachable:
 
 static int create_static_call_sections(struct objtool_file *file)
 {
-       struct section *sec;
        struct static_call_site *site;
+       struct section *sec;
        struct instruction *insn;
        struct symbol *key_sym;
        char *key_name, *tmp;
@@ -716,22 +686,21 @@ static int create_static_call_sections(struct objtool_file *file)
        list_for_each_entry(insn, &file->static_call_list, call_node)
                idx++;
 
-       sec = elf_create_section(file->elf, ".static_call_sites", SHF_WRITE,
-                                sizeof(struct static_call_site), idx);
+       sec = elf_create_section_pair(file->elf, ".static_call_sites",
+                                     sizeof(*site), idx, idx * 2);
        if (!sec)
                return -1;
 
+       /* Allow modules to modify the low bits of static_call_site::key */
+       sec->sh.sh_flags |= SHF_WRITE;
+
        idx = 0;
        list_for_each_entry(insn, &file->static_call_list, call_node) {
 
-               site = (struct static_call_site *)sec->data->d_buf + idx;
-               memset(site, 0, sizeof(struct static_call_site));
-
                /* populate reloc for 'addr' */
-               if (elf_add_reloc_to_insn(file->elf, sec,
-                                         idx * sizeof(struct static_call_site),
-                                         R_X86_64_PC32,
-                                         insn->sec, insn->offset))
+               if (!elf_init_reloc_text_sym(file->elf, sec,
+                                            idx * sizeof(*site), idx * 2,
+                                            insn->sec, insn->offset))
                        return -1;
 
                /* find key symbol */
@@ -771,10 +740,10 @@ static int create_static_call_sections(struct objtool_file *file)
                free(key_name);
 
                /* populate reloc for 'key' */
-               if (elf_add_reloc(file->elf, sec,
-                                 idx * sizeof(struct static_call_site) + 4,
-                                 R_X86_64_PC32, key_sym,
-                                 is_sibling_call(insn) * STATIC_CALL_SITE_TAIL))
+               if (!elf_init_reloc_data_sym(file->elf, sec,
+                                            idx * sizeof(*site) + 4,
+                                            (idx * 2) + 1, key_sym,
+                                            is_sibling_call(insn) * STATIC_CALL_SITE_TAIL))
                        return -1;
 
                idx++;
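
The static-call hunk above is representative of the new ELF helper pattern used throughout this file: elf_create_section_pair() creates a section together with a reloc section of known size, and elf_init_reloc_text_sym()/elf_init_reloc_data_sym() fill a reloc slot by index instead of appending one. A condensed sketch, with signatures inferred only from this diff and nr_sites standing in for the counted idx:

    /* one entry plus two relocs (addr + key) per static call site */
    sec = elf_create_section_pair(file->elf, ".static_call_sites",
                                  sizeof(*site), nr_sites, nr_sites * 2);
    if (!sec)
            return -1;

    /* reloc slot 2*idx points at the call instruction (text)... */
    if (!elf_init_reloc_text_sym(file->elf, sec, idx * sizeof(*site), idx * 2,
                                 insn->sec, insn->offset))
            return -1;

    /* ...and slot 2*idx + 1 at the key symbol (data) */
    if (!elf_init_reloc_data_sym(file->elf, sec, idx * sizeof(*site) + 4,
                                 idx * 2 + 1, key_sym,
                                 is_sibling_call(insn) * STATIC_CALL_SITE_TAIL))
            return -1;

The later .retpoline_sites, .return_sites, .ibt_endbr_seal, .cfi_sites and .call_sites hunks follow the same shape with a single text reloc per entry.
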
@@ -802,26 +771,18 @@ static int create_retpoline_sites_sections(struct objtool_file *file)
        if (!idx)
                return 0;
 
-       sec = elf_create_section(file->elf, ".retpoline_sites", 0,
-                                sizeof(int), idx);
-       if (!sec) {
-               WARN("elf_create_section: .retpoline_sites");
+       sec = elf_create_section_pair(file->elf, ".retpoline_sites",
+                                     sizeof(int), idx, idx);
+       if (!sec)
                return -1;
-       }
 
        idx = 0;
        list_for_each_entry(insn, &file->retpoline_call_list, call_node) {
 
-               int *site = (int *)sec->data->d_buf + idx;
-               *site = 0;
-
-               if (elf_add_reloc_to_insn(file->elf, sec,
-                                         idx * sizeof(int),
-                                         R_X86_64_PC32,
-                                         insn->sec, insn->offset)) {
-                       WARN("elf_add_reloc_to_insn: .retpoline_sites");
+               if (!elf_init_reloc_text_sym(file->elf, sec,
+                                            idx * sizeof(int), idx,
+                                            insn->sec, insn->offset))
                        return -1;
-               }
 
                idx++;
        }
@@ -848,26 +809,18 @@ static int create_return_sites_sections(struct objtool_file *file)
        if (!idx)
                return 0;
 
-       sec = elf_create_section(file->elf, ".return_sites", 0,
-                                sizeof(int), idx);
-       if (!sec) {
-               WARN("elf_create_section: .return_sites");
+       sec = elf_create_section_pair(file->elf, ".return_sites",
+                                     sizeof(int), idx, idx);
+       if (!sec)
                return -1;
-       }
 
        idx = 0;
        list_for_each_entry(insn, &file->return_thunk_list, call_node) {
 
-               int *site = (int *)sec->data->d_buf + idx;
-               *site = 0;
-
-               if (elf_add_reloc_to_insn(file->elf, sec,
-                                         idx * sizeof(int),
-                                         R_X86_64_PC32,
-                                         insn->sec, insn->offset)) {
-                       WARN("elf_add_reloc_to_insn: .return_sites");
+               if (!elf_init_reloc_text_sym(file->elf, sec,
+                                            idx * sizeof(int), idx,
+                                            insn->sec, insn->offset))
                        return -1;
-               }
 
                idx++;
        }
@@ -900,12 +853,10 @@ static int create_ibt_endbr_seal_sections(struct objtool_file *file)
        if (!idx)
                return 0;
 
-       sec = elf_create_section(file->elf, ".ibt_endbr_seal", 0,
-                                sizeof(int), idx);
-       if (!sec) {
-               WARN("elf_create_section: .ibt_endbr_seal");
+       sec = elf_create_section_pair(file->elf, ".ibt_endbr_seal",
+                                     sizeof(int), idx, idx);
+       if (!sec)
                return -1;
-       }
 
        idx = 0;
        list_for_each_entry(insn, &file->endbr_list, call_node) {
@@ -920,13 +871,10 @@ static int create_ibt_endbr_seal_sections(struct objtool_file *file)
                     !strcmp(sym->name, "cleanup_module")))
                        WARN("%s(): not an indirect call target", sym->name);
 
-               if (elf_add_reloc_to_insn(file->elf, sec,
-                                         idx * sizeof(int),
-                                         R_X86_64_PC32,
-                                         insn->sec, insn->offset)) {
-                       WARN("elf_add_reloc_to_insn: .ibt_endbr_seal");
+               if (!elf_init_reloc_text_sym(file->elf, sec,
+                                            idx * sizeof(int), idx,
+                                            insn->sec, insn->offset))
                        return -1;
-               }
 
                idx++;
        }
@@ -938,7 +886,6 @@ static int create_cfi_sections(struct objtool_file *file)
 {
        struct section *sec;
        struct symbol *sym;
-       unsigned int *loc;
        int idx;
 
        sec = find_section_by_name(file->elf, ".cfi_sites");
@@ -959,7 +906,8 @@ static int create_cfi_sections(struct objtool_file *file)
                idx++;
        }
 
-       sec = elf_create_section(file->elf, ".cfi_sites", 0, sizeof(unsigned int), idx);
+       sec = elf_create_section_pair(file->elf, ".cfi_sites",
+                                     sizeof(unsigned int), idx, idx);
        if (!sec)
                return -1;
 
@@ -971,13 +919,9 @@ static int create_cfi_sections(struct objtool_file *file)
                if (strncmp(sym->name, "__cfi_", 6))
                        continue;
 
-               loc = (unsigned int *)sec->data->d_buf + idx;
-               memset(loc, 0, sizeof(unsigned int));
-
-               if (elf_add_reloc_to_insn(file->elf, sec,
-                                         idx * sizeof(unsigned int),
-                                         R_X86_64_PC32,
-                                         sym->sec, sym->offset))
+               if (!elf_init_reloc_text_sym(file->elf, sec,
+                                            idx * sizeof(unsigned int), idx,
+                                            sym->sec, sym->offset))
                        return -1;
 
                idx++;
@@ -988,7 +932,7 @@ static int create_cfi_sections(struct objtool_file *file)
 
 static int create_mcount_loc_sections(struct objtool_file *file)
 {
-       int addrsize = elf_class_addrsize(file->elf);
+       size_t addr_size = elf_addr_size(file->elf);
        struct instruction *insn;
        struct section *sec;
        int idx;
@@ -1007,25 +951,26 @@ static int create_mcount_loc_sections(struct objtool_file *file)
        list_for_each_entry(insn, &file->mcount_loc_list, call_node)
                idx++;
 
-       sec = elf_create_section(file->elf, "__mcount_loc", 0, addrsize, idx);
+       sec = elf_create_section_pair(file->elf, "__mcount_loc", addr_size,
+                                     idx, idx);
        if (!sec)
                return -1;
 
-       sec->sh.sh_addralign = addrsize;
+       sec->sh.sh_addralign = addr_size;
 
        idx = 0;
        list_for_each_entry(insn, &file->mcount_loc_list, call_node) {
-               void *loc;
 
-               loc = sec->data->d_buf + idx;
-               memset(loc, 0, addrsize);
+               struct reloc *reloc;
 
-               if (elf_add_reloc_to_insn(file->elf, sec, idx,
-                                         addrsize == sizeof(u64) ? R_ABS64 : R_ABS32,
-                                         insn->sec, insn->offset))
+               reloc = elf_init_reloc_text_sym(file->elf, sec, idx * addr_size, idx,
+                                              insn->sec, insn->offset);
+               if (!reloc)
                        return -1;
 
-               idx += addrsize;
+               set_reloc_type(file->elf, reloc, addr_size == 8 ? R_ABS64 : R_ABS32);
+
+               idx++;
        }
 
        return 0;
@@ -1035,7 +980,6 @@ static int create_direct_call_sections(struct objtool_file *file)
 {
        struct instruction *insn;
        struct section *sec;
-       unsigned int *loc;
        int idx;
 
        sec = find_section_by_name(file->elf, ".call_sites");
@@ -1052,20 +996,17 @@ static int create_direct_call_sections(struct objtool_file *file)
        list_for_each_entry(insn, &file->call_list, call_node)
                idx++;
 
-       sec = elf_create_section(file->elf, ".call_sites", 0, sizeof(unsigned int), idx);
+       sec = elf_create_section_pair(file->elf, ".call_sites",
+                                     sizeof(unsigned int), idx, idx);
        if (!sec)
                return -1;
 
        idx = 0;
        list_for_each_entry(insn, &file->call_list, call_node) {
 
-               loc = (unsigned int *)sec->data->d_buf + idx;
-               memset(loc, 0, sizeof(unsigned int));
-
-               if (elf_add_reloc_to_insn(file->elf, sec,
-                                         idx * sizeof(unsigned int),
-                                         R_X86_64_PC32,
-                                         insn->sec, insn->offset))
+               if (!elf_init_reloc_text_sym(file->elf, sec,
+                                            idx * sizeof(unsigned int), idx,
+                                            insn->sec, insn->offset))
                        return -1;
 
                idx++;
@@ -1080,28 +1021,29 @@ static int create_direct_call_sections(struct objtool_file *file)
 static void add_ignores(struct objtool_file *file)
 {
        struct instruction *insn;
-       struct section *sec;
+       struct section *rsec;
        struct symbol *func;
        struct reloc *reloc;
 
-       sec = find_section_by_name(file->elf, ".rela.discard.func_stack_frame_non_standard");
-       if (!sec)
+       rsec = find_section_by_name(file->elf, ".rela.discard.func_stack_frame_non_standard");
+       if (!rsec)
                return;
 
-       list_for_each_entry(reloc, &sec->reloc_list, list) {
+       for_each_reloc(rsec, reloc) {
                switch (reloc->sym->type) {
                case STT_FUNC:
                        func = reloc->sym;
                        break;
 
                case STT_SECTION:
-                       func = find_func_by_offset(reloc->sym->sec, reloc->addend);
+                       func = find_func_by_offset(reloc->sym->sec, reloc_addend(reloc));
                        if (!func)
                                continue;
                        break;
 
                default:
-                       WARN("unexpected relocation symbol type in %s: %d", sec->name, reloc->sym->type);
+                       WARN("unexpected relocation symbol type in %s: %d",
+                            rsec->name, reloc->sym->type);
                        continue;
                }
 
@@ -1320,21 +1262,21 @@ static void add_uaccess_safe(struct objtool_file *file)
  */
 static int add_ignore_alternatives(struct objtool_file *file)
 {
-       struct section *sec;
+       struct section *rsec;
        struct reloc *reloc;
        struct instruction *insn;
 
-       sec = find_section_by_name(file->elf, ".rela.discard.ignore_alts");
-       if (!sec)
+       rsec = find_section_by_name(file->elf, ".rela.discard.ignore_alts");
+       if (!rsec)
                return 0;
 
-       list_for_each_entry(reloc, &sec->reloc_list, list) {
+       for_each_reloc(rsec, reloc) {
                if (reloc->sym->type != STT_SECTION) {
-                       WARN("unexpected relocation symbol type in %s", sec->name);
+                       WARN("unexpected relocation symbol type in %s", rsec->name);
                        return -1;
                }
 
-               insn = find_insn(file, reloc->sym->sec, reloc->addend);
+               insn = find_insn(file, reloc->sym->sec, reloc_addend(reloc));
                if (!insn) {
                        WARN("bad .discard.ignore_alts entry");
                        return -1;
@@ -1421,10 +1363,8 @@ static void annotate_call_site(struct objtool_file *file,
         * noinstr text.
         */
        if (opts.hack_noinstr && insn->sec->noinstr && sym->profiling_func) {
-               if (reloc) {
-                       reloc->type = R_NONE;
-                       elf_write_reloc(file->elf, reloc);
-               }
+               if (reloc)
+                       set_reloc_type(file->elf, reloc, R_NONE);
 
                elf_write_insn(file->elf, insn->sec,
                               insn->offset, insn->len,
@@ -1450,10 +1390,8 @@ static void annotate_call_site(struct objtool_file *file,
                if (sibling)
                        WARN_INSN(insn, "tail call to __fentry__ !?!?");
                if (opts.mnop) {
-                       if (reloc) {
-                               reloc->type = R_NONE;
-                               elf_write_reloc(file->elf, reloc);
-                       }
+                       if (reloc)
+                               set_reloc_type(file->elf, reloc, R_NONE);
 
                        elf_write_insn(file->elf, insn->sec,
                                       insn->offset, insn->len,
@@ -1610,7 +1548,7 @@ static int add_jump_destinations(struct objtool_file *file)
                        dest_off = arch_jump_destination(insn);
                } else if (reloc->sym->type == STT_SECTION) {
                        dest_sec = reloc->sym->sec;
-                       dest_off = arch_dest_reloc_offset(reloc->addend);
+                       dest_off = arch_dest_reloc_offset(reloc_addend(reloc));
                } else if (reloc->sym->retpoline_thunk) {
                        add_retpoline_call(file, insn);
                        continue;
@@ -1627,7 +1565,7 @@ static int add_jump_destinations(struct objtool_file *file)
                } else if (reloc->sym->sec->idx) {
                        dest_sec = reloc->sym->sec;
                        dest_off = reloc->sym->sym.st_value +
-                                  arch_dest_reloc_offset(reloc->addend);
+                                  arch_dest_reloc_offset(reloc_addend(reloc));
                } else {
                        /* non-func asm code jumping to another file */
                        continue;
@@ -1744,7 +1682,7 @@ static int add_call_destinations(struct objtool_file *file)
                        }
 
                } else if (reloc->sym->type == STT_SECTION) {
-                       dest_off = arch_dest_reloc_offset(reloc->addend);
+                       dest_off = arch_dest_reloc_offset(reloc_addend(reloc));
                        dest = find_call_destination(reloc->sym->sec, dest_off);
                        if (!dest) {
                                WARN_INSN(insn, "can't find call dest symbol at %s+0x%lx",
@@ -1932,10 +1870,8 @@ static int handle_jump_alt(struct objtool_file *file,
        if (opts.hack_jump_label && special_alt->key_addend & 2) {
                struct reloc *reloc = insn_reloc(file, orig_insn);
 
-               if (reloc) {
-                       reloc->type = R_NONE;
-                       elf_write_reloc(file->elf, reloc);
-               }
+               if (reloc)
+                       set_reloc_type(file->elf, reloc, R_NONE);
                elf_write_insn(file->elf, orig_insn->sec,
                               orig_insn->offset, orig_insn->len,
                               arch_nop_insn(orig_insn->len));
@@ -2047,34 +1983,35 @@ out:
 }
 
 static int add_jump_table(struct objtool_file *file, struct instruction *insn,
-                           struct reloc *table)
+                         struct reloc *next_table)
 {
-       struct reloc *reloc = table;
-       struct instruction *dest_insn;
-       struct alternative *alt;
        struct symbol *pfunc = insn_func(insn)->pfunc;
+       struct reloc *table = insn_jump_table(insn);
+       struct instruction *dest_insn;
        unsigned int prev_offset = 0;
+       struct reloc *reloc = table;
+       struct alternative *alt;
 
        /*
         * Each @reloc is a switch table relocation which points to the target
         * instruction.
         */
-       list_for_each_entry_from(reloc, &table->sec->reloc_list, list) {
+       for_each_reloc_from(table->sec, reloc) {
 
                /* Check for the end of the table: */
-               if (reloc != table && reloc->jump_table_start)
+               if (reloc != table && reloc == next_table)
                        break;
 
                /* Make sure the table entries are consecutive: */
-               if (prev_offset && reloc->offset != prev_offset + 8)
+               if (prev_offset && reloc_offset(reloc) != prev_offset + 8)
                        break;
 
                /* Detect function pointers from contiguous objects: */
                if (reloc->sym->sec == pfunc->sec &&
-                   reloc->addend == pfunc->offset)
+                   reloc_addend(reloc) == pfunc->offset)
                        break;
 
-               dest_insn = find_insn(file, reloc->sym->sec, reloc->addend);
+               dest_insn = find_insn(file, reloc->sym->sec, reloc_addend(reloc));
                if (!dest_insn)
                        break;
 
@@ -2091,7 +2028,7 @@ static int add_jump_table(struct objtool_file *file, struct instruction *insn,
                alt->insn = dest_insn;
                alt->next = insn->alts;
                insn->alts = alt;
-               prev_offset = reloc->offset;
+               prev_offset = reloc_offset(reloc);
        }
 
        if (!prev_offset) {
@@ -2135,7 +2072,7 @@ static struct reloc *find_jump_table(struct objtool_file *file,
                table_reloc = arch_find_switch_table(file, insn);
                if (!table_reloc)
                        continue;
-               dest_insn = find_insn(file, table_reloc->sym->sec, table_reloc->addend);
+               dest_insn = find_insn(file, table_reloc->sym->sec, reloc_addend(table_reloc));
                if (!dest_insn || !insn_func(dest_insn) || insn_func(dest_insn)->pfunc != func)
                        continue;
 
@@ -2177,29 +2114,39 @@ static void mark_func_jump_tables(struct objtool_file *file,
                        continue;
 
                reloc = find_jump_table(file, func, insn);
-               if (reloc) {
-                       reloc->jump_table_start = true;
+               if (reloc)
                        insn->_jump_table = reloc;
-               }
        }
 }
 
 static int add_func_jump_tables(struct objtool_file *file,
                                  struct symbol *func)
 {
-       struct instruction *insn;
-       int ret;
+       struct instruction *insn, *insn_t1 = NULL, *insn_t2;
+       int ret = 0;
 
        func_for_each_insn(file, func, insn) {
                if (!insn_jump_table(insn))
                        continue;
 
-               ret = add_jump_table(file, insn, insn_jump_table(insn));
+               if (!insn_t1) {
+                       insn_t1 = insn;
+                       continue;
+               }
+
+               insn_t2 = insn;
+
+               ret = add_jump_table(file, insn_t1, insn_jump_table(insn_t2));
                if (ret)
                        return ret;
+
+               insn_t1 = insn_t2;
        }
 
-       return 0;
+       if (insn_t1)
+               ret = add_jump_table(file, insn_t1, NULL);
+
+       return ret;
 }
 
 /*
@@ -2240,7 +2187,7 @@ static void set_func_state(struct cfi_state *state)
 static int read_unwind_hints(struct objtool_file *file)
 {
        struct cfi_state cfi = init_cfi;
-       struct section *sec, *relocsec;
+       struct section *sec;
        struct unwind_hint *hint;
        struct instruction *insn;
        struct reloc *reloc;
@@ -2250,8 +2197,7 @@ static int read_unwind_hints(struct objtool_file *file)
        if (!sec)
                return 0;
 
-       relocsec = sec->reloc;
-       if (!relocsec) {
+       if (!sec->rsec) {
                WARN("missing .rela.discard.unwind_hints section");
                return -1;
        }
@@ -2272,7 +2218,7 @@ static int read_unwind_hints(struct objtool_file *file)
                        return -1;
                }
 
-               insn = find_insn(file, reloc->sym->sec, reloc->addend);
+               insn = find_insn(file, reloc->sym->sec, reloc_addend(reloc));
                if (!insn) {
                        WARN("can't find insn for unwind_hints[%d]", i);
                        return -1;
@@ -2280,6 +2226,11 @@ static int read_unwind_hints(struct objtool_file *file)
 
                insn->hint = true;
 
+               if (hint->type == UNWIND_HINT_TYPE_UNDEFINED) {
+                       insn->cfi = &force_undefined_cfi;
+                       continue;
+               }
+
                if (hint->type == UNWIND_HINT_TYPE_SAVE) {
                        insn->hint = false;
                        insn->save = true;
@@ -2326,16 +2277,17 @@ static int read_unwind_hints(struct objtool_file *file)
 
 static int read_noendbr_hints(struct objtool_file *file)
 {
-       struct section *sec;
        struct instruction *insn;
+       struct section *rsec;
        struct reloc *reloc;
 
-       sec = find_section_by_name(file->elf, ".rela.discard.noendbr");
-       if (!sec)
+       rsec = find_section_by_name(file->elf, ".rela.discard.noendbr");
+       if (!rsec)
                return 0;
 
-       list_for_each_entry(reloc, &sec->reloc_list, list) {
-               insn = find_insn(file, reloc->sym->sec, reloc->sym->offset + reloc->addend);
+       for_each_reloc(rsec, reloc) {
+               insn = find_insn(file, reloc->sym->sec,
+                                reloc->sym->offset + reloc_addend(reloc));
                if (!insn) {
                        WARN("bad .discard.noendbr entry");
                        return -1;
@@ -2349,21 +2301,21 @@ static int read_noendbr_hints(struct objtool_file *file)
 
 static int read_retpoline_hints(struct objtool_file *file)
 {
-       struct section *sec;
+       struct section *rsec;
        struct instruction *insn;
        struct reloc *reloc;
 
-       sec = find_section_by_name(file->elf, ".rela.discard.retpoline_safe");
-       if (!sec)
+       rsec = find_section_by_name(file->elf, ".rela.discard.retpoline_safe");
+       if (!rsec)
                return 0;
 
-       list_for_each_entry(reloc, &sec->reloc_list, list) {
+       for_each_reloc(rsec, reloc) {
                if (reloc->sym->type != STT_SECTION) {
-                       WARN("unexpected relocation symbol type in %s", sec->name);
+                       WARN("unexpected relocation symbol type in %s", rsec->name);
                        return -1;
                }
 
-               insn = find_insn(file, reloc->sym->sec, reloc->addend);
+               insn = find_insn(file, reloc->sym->sec, reloc_addend(reloc));
                if (!insn) {
                        WARN("bad .discard.retpoline_safe entry");
                        return -1;
@@ -2385,21 +2337,21 @@ static int read_retpoline_hints(struct objtool_file *file)
 
 static int read_instr_hints(struct objtool_file *file)
 {
-       struct section *sec;
+       struct section *rsec;
        struct instruction *insn;
        struct reloc *reloc;
 
-       sec = find_section_by_name(file->elf, ".rela.discard.instr_end");
-       if (!sec)
+       rsec = find_section_by_name(file->elf, ".rela.discard.instr_end");
+       if (!rsec)
                return 0;
 
-       list_for_each_entry(reloc, &sec->reloc_list, list) {
+       for_each_reloc(rsec, reloc) {
                if (reloc->sym->type != STT_SECTION) {
-                       WARN("unexpected relocation symbol type in %s", sec->name);
+                       WARN("unexpected relocation symbol type in %s", rsec->name);
                        return -1;
                }
 
-               insn = find_insn(file, reloc->sym->sec, reloc->addend);
+               insn = find_insn(file, reloc->sym->sec, reloc_addend(reloc));
                if (!insn) {
                        WARN("bad .discard.instr_end entry");
                        return -1;
@@ -2408,17 +2360,17 @@ static int read_instr_hints(struct objtool_file *file)
                insn->instr--;
        }
 
-       sec = find_section_by_name(file->elf, ".rela.discard.instr_begin");
-       if (!sec)
+       rsec = find_section_by_name(file->elf, ".rela.discard.instr_begin");
+       if (!rsec)
                return 0;
 
-       list_for_each_entry(reloc, &sec->reloc_list, list) {
+       for_each_reloc(rsec, reloc) {
                if (reloc->sym->type != STT_SECTION) {
-                       WARN("unexpected relocation symbol type in %s", sec->name);
+                       WARN("unexpected relocation symbol type in %s", rsec->name);
                        return -1;
                }
 
-               insn = find_insn(file, reloc->sym->sec, reloc->addend);
+               insn = find_insn(file, reloc->sym->sec, reloc_addend(reloc));
                if (!insn) {
                        WARN("bad .discard.instr_begin entry");
                        return -1;
@@ -2432,21 +2384,21 @@ static int read_instr_hints(struct objtool_file *file)
 
 static int read_validate_unret_hints(struct objtool_file *file)
 {
-       struct section *sec;
+       struct section *rsec;
        struct instruction *insn;
        struct reloc *reloc;
 
-       sec = find_section_by_name(file->elf, ".rela.discard.validate_unret");
-       if (!sec)
+       rsec = find_section_by_name(file->elf, ".rela.discard.validate_unret");
+       if (!rsec)
                return 0;
 
-       list_for_each_entry(reloc, &sec->reloc_list, list) {
+       for_each_reloc(rsec, reloc) {
                if (reloc->sym->type != STT_SECTION) {
-                       WARN("unexpected relocation symbol type in %s", sec->name);
+                       WARN("unexpected relocation symbol type in %s", rsec->name);
                        return -1;
                }
 
-               insn = find_insn(file, reloc->sym->sec, reloc->addend);
+               insn = find_insn(file, reloc->sym->sec, reloc_addend(reloc));
                if (!insn) {
                        WARN("bad .discard.instr_end entry");
                        return -1;
@@ -2461,23 +2413,23 @@ static int read_validate_unret_hints(struct objtool_file *file)
 static int read_intra_function_calls(struct objtool_file *file)
 {
        struct instruction *insn;
-       struct section *sec;
+       struct section *rsec;
        struct reloc *reloc;
 
-       sec = find_section_by_name(file->elf, ".rela.discard.intra_function_calls");
-       if (!sec)
+       rsec = find_section_by_name(file->elf, ".rela.discard.intra_function_calls");
+       if (!rsec)
                return 0;
 
-       list_for_each_entry(reloc, &sec->reloc_list, list) {
+       for_each_reloc(rsec, reloc) {
                unsigned long dest_off;
 
                if (reloc->sym->type != STT_SECTION) {
                        WARN("unexpected relocation symbol type in %s",
-                            sec->name);
+                            rsec->name);
                        return -1;
                }
 
-               insn = find_insn(file, reloc->sym->sec, reloc->addend);
+               insn = find_insn(file, reloc->sym->sec, reloc_addend(reloc));
                if (!insn) {
                        WARN("bad .discard.intra_function_call entry");
                        return -1;
@@ -2833,6 +2785,10 @@ static int update_cfi_state(struct instruction *insn,
        struct cfi_reg *cfa = &cfi->cfa;
        struct cfi_reg *regs = cfi->regs;
 
+       /* ignore UNWIND_HINT_UNDEFINED regions */
+       if (cfi->force_undefined)
+               return 0;
+
        /* stack operations don't make sense with an undefined CFA */
        if (cfa->base == CFI_UNDEFINED) {
                if (insn_func(insn)) {
@@ -3369,15 +3325,15 @@ static inline bool func_uaccess_safe(struct symbol *func)
 static inline const char *call_dest_name(struct instruction *insn)
 {
        static char pvname[19];
-       struct reloc *rel;
+       struct reloc *reloc;
        int idx;
 
        if (insn_call_dest(insn))
                return insn_call_dest(insn)->name;
 
-       rel = insn_reloc(NULL, insn);
-       if (rel && !strcmp(rel->sym->name, "pv_ops")) {
-               idx = (rel->addend / sizeof(void *));
+       reloc = insn_reloc(NULL, insn);
+       if (reloc && !strcmp(reloc->sym->name, "pv_ops")) {
+               idx = (reloc_addend(reloc) / sizeof(void *));
                snprintf(pvname, sizeof(pvname), "pv_ops[%d]", idx);
                return pvname;
        }
@@ -3388,14 +3344,14 @@ static inline const char *call_dest_name(struct instruction *insn)
 static bool pv_call_dest(struct objtool_file *file, struct instruction *insn)
 {
        struct symbol *target;
-       struct reloc *rel;
+       struct reloc *reloc;
        int idx;
 
-       rel = insn_reloc(file, insn);
-       if (!rel || strcmp(rel->sym->name, "pv_ops"))
+       reloc = insn_reloc(file, insn);
+       if (!reloc || strcmp(reloc->sym->name, "pv_ops"))
                return false;
 
-       idx = (arch_dest_reloc_offset(rel->addend) / sizeof(void *));
+       idx = (arch_dest_reloc_offset(reloc_addend(reloc)) / sizeof(void *));
 
        if (file->pv_ops[idx].clean)
                return true;
@@ -3657,8 +3613,7 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
 
                                ret = validate_branch(file, func, alt->insn, state);
                                if (ret) {
-                                       if (opts.backtrace)
-                                               BT_FUNC("(alt)", insn);
+                                       BT_INSN(insn, "(alt)");
                                        return ret;
                                }
                        }
@@ -3703,8 +3658,7 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
                                ret = validate_branch(file, func,
                                                      insn->jump_dest, state);
                                if (ret) {
-                                       if (opts.backtrace)
-                                               BT_FUNC("(branch)", insn);
+                                       BT_INSN(insn, "(branch)");
                                        return ret;
                                }
                        }
@@ -3802,8 +3756,8 @@ static int validate_unwind_hint(struct objtool_file *file,
 {
        if (insn->hint && !insn->visited && !insn->ignore) {
                int ret = validate_branch(file, insn_func(insn), insn, *state);
-               if (ret && opts.backtrace)
-                       BT_FUNC("<=== (hint)", insn);
+               if (ret)
+                       BT_INSN(insn, "<=== (hint)");
                return ret;
        }
 
@@ -3841,7 +3795,7 @@ static int validate_unwind_hints(struct objtool_file *file, struct section *sec)
 static int validate_unret(struct objtool_file *file, struct instruction *insn)
 {
        struct instruction *next, *dest;
-       int ret, warnings = 0;
+       int ret;
 
        for (;;) {
                next = next_insn_to_validate(file, insn);
@@ -3861,8 +3815,7 @@ static int validate_unret(struct objtool_file *file, struct instruction *insn)
 
                                ret = validate_unret(file, alt->insn);
                                if (ret) {
-                                       if (opts.backtrace)
-                                               BT_FUNC("(alt)", insn);
+                                       BT_INSN(insn, "(alt)");
                                        return ret;
                                }
                        }
@@ -3888,10 +3841,8 @@ static int validate_unret(struct objtool_file *file, struct instruction *insn)
                                }
                                ret = validate_unret(file, insn->jump_dest);
                                if (ret) {
-                                       if (opts.backtrace) {
-                                               BT_FUNC("(branch%s)", insn,
-                                                       insn->type == INSN_JUMP_CONDITIONAL ? "-cond" : "");
-                                       }
+                                       BT_INSN(insn, "(branch%s)",
+                                               insn->type == INSN_JUMP_CONDITIONAL ? "-cond" : "");
                                        return ret;
                                }
 
@@ -3913,8 +3864,7 @@ static int validate_unret(struct objtool_file *file, struct instruction *insn)
 
                        ret = validate_unret(file, dest);
                        if (ret) {
-                               if (opts.backtrace)
-                                       BT_FUNC("(call)", insn);
+                               BT_INSN(insn, "(call)");
                                return ret;
                        }
                        /*
@@ -3943,7 +3893,7 @@ static int validate_unret(struct objtool_file *file, struct instruction *insn)
                insn = next;
        }
 
-       return warnings;
+       return 0;
 }
 
 /*
@@ -4178,7 +4128,6 @@ static int add_prefix_symbols(struct objtool_file *file)
 {
        struct section *sec;
        struct symbol *func;
-       int warnings = 0;
 
        for_each_sec(file, sec) {
                if (!(sec->sh.sh_flags & SHF_EXECINSTR))
@@ -4192,7 +4141,7 @@ static int add_prefix_symbols(struct objtool_file *file)
                }
        }
 
-       return warnings;
+       return 0;
 }
 
 static int validate_symbol(struct objtool_file *file, struct section *sec,
@@ -4216,8 +4165,8 @@ static int validate_symbol(struct objtool_file *file, struct section *sec,
        state->uaccess = sym->uaccess_safe;
 
        ret = validate_branch(file, insn_func(insn), insn, *state);
-       if (ret && opts.backtrace)
-               BT_FUNC("<=== (sym)", insn);
+       if (ret)
+               BT_INSN(insn, "<=== (sym)");
        return ret;
 }
 
@@ -4333,8 +4282,8 @@ static int validate_ibt_insn(struct objtool_file *file, struct instruction *insn
        for (reloc = insn_reloc(file, insn);
             reloc;
             reloc = find_reloc_by_dest_range(file->elf, insn->sec,
-                                             reloc->offset + 1,
-                                             (insn->offset + insn->len) - (reloc->offset + 1))) {
+                                             reloc_offset(reloc) + 1,
+                                             (insn->offset + insn->len) - (reloc_offset(reloc) + 1))) {
 
                /*
                 * static_call_update() references the trampoline, which
@@ -4344,10 +4293,11 @@ static int validate_ibt_insn(struct objtool_file *file, struct instruction *insn
                        continue;
 
                off = reloc->sym->offset;
-               if (reloc->type == R_X86_64_PC32 || reloc->type == R_X86_64_PLT32)
-                       off += arch_dest_reloc_offset(reloc->addend);
+               if (reloc_type(reloc) == R_X86_64_PC32 ||
+                   reloc_type(reloc) == R_X86_64_PLT32)
+                       off += arch_dest_reloc_offset(reloc_addend(reloc));
                else
-                       off += reloc->addend;
+                       off += reloc_addend(reloc);
 
                dest = find_insn(file, reloc->sym->sec, off);
                if (!dest)
@@ -4404,7 +4354,7 @@ static int validate_ibt_data_reloc(struct objtool_file *file,
        struct instruction *dest;
 
        dest = find_insn(file, reloc->sym->sec,
-                        reloc->sym->offset + reloc->addend);
+                        reloc->sym->offset + reloc_addend(reloc));
        if (!dest)
                return 0;
 
@@ -4417,7 +4367,7 @@ static int validate_ibt_data_reloc(struct objtool_file *file,
                return 0;
 
        WARN_FUNC("data relocation to !ENDBR: %s",
-                 reloc->sec->base, reloc->offset,
+                 reloc->sec->base, reloc_offset(reloc),
                  offstr(dest->sec, dest->offset));
 
        return 1;
@@ -4444,7 +4394,7 @@ static int validate_ibt(struct objtool_file *file)
                if (sec->sh.sh_flags & SHF_EXECINSTR)
                        continue;
 
-               if (!sec->reloc)
+               if (!sec->rsec)
                        continue;
 
                /*
@@ -4471,7 +4421,7 @@ static int validate_ibt(struct objtool_file *file)
                    strstr(sec->name, "__patchable_function_entries"))
                        continue;
 
-               list_for_each_entry(reloc, &sec->reloc->reloc_list, list)
+               for_each_reloc(sec->rsec, reloc)
                        warnings += validate_ibt_data_reloc(file, reloc);
        }
 
@@ -4511,9 +4461,40 @@ static int validate_sls(struct objtool_file *file)
        return warnings;
 }
 
+static bool ignore_noreturn_call(struct instruction *insn)
+{
+       struct symbol *call_dest = insn_call_dest(insn);
+
+       /*
+        * FIXME: hack, we need a real noreturn solution
+        *
+        * Problem is, exc_double_fault() may or may not return, depending on
+        * whether CONFIG_X86_ESPFIX64 is set.  But objtool has no visibility
+        * to the kernel config.
+        *
+        * Other potential ways to fix it:
+        *
+        *   - have compiler communicate __noreturn functions somehow
+        *   - remove CONFIG_X86_ESPFIX64
+        *   - read the .config file
+        *   - add a cmdline option
+        *   - create a generic objtool annotation format (vs a bunch of custom
+        *     formats) and annotate it
+        */
+       if (!strcmp(call_dest->name, "exc_double_fault")) {
+               /* prevent further unreachable warnings for the caller */
+               insn->sym->warned = 1;
+               return true;
+       }
+
+       return false;
+}
+
 static int validate_reachable_instructions(struct objtool_file *file)
 {
-       struct instruction *insn;
+       struct instruction *insn, *prev_insn;
+       struct symbol *call_dest;
+       int warnings = 0;
 
        if (file->ignore_unreachables)
                return 0;
@@ -4522,13 +4503,127 @@ static int validate_reachable_instructions(struct objtool_file *file)
                if (insn->visited || ignore_unreachable_insn(file, insn))
                        continue;
 
+               prev_insn = prev_insn_same_sec(file, insn);
+               if (prev_insn && prev_insn->dead_end) {
+                       call_dest = insn_call_dest(prev_insn);
+                       if (call_dest && !ignore_noreturn_call(prev_insn)) {
+                               WARN_INSN(insn, "%s() is missing a __noreturn annotation",
+                                         call_dest->name);
+                               warnings++;
+                               continue;
+                       }
+               }
+
                WARN_INSN(insn, "unreachable instruction");
-               return 1;
+               warnings++;
+       }
+
+       return warnings;
+}
+
+/* 'funcs' is a space-separated list of function names */
+static int disas_funcs(const char *funcs)
+{
+       const char *objdump_str, *cross_compile;
+       int size, ret;
+       char *cmd;
+
+       cross_compile = getenv("CROSS_COMPILE");
+
+       objdump_str = "%sobjdump -wdr %s | gawk -M -v _funcs='%s' '"
+                       "BEGIN { split(_funcs, funcs); }"
+                       "/^$/ { func_match = 0; }"
+                       "/<.*>:/ { "
+                               "f = gensub(/.*<(.*)>:/, \"\\\\1\", 1);"
+                               "for (i in funcs) {"
+                                       "if (funcs[i] == f) {"
+                                               "func_match = 1;"
+                                               "base = strtonum(\"0x\" $1);"
+                                               "break;"
+                                       "}"
+                               "}"
+                       "}"
+                       "{"
+                               "if (func_match) {"
+                                       "addr = strtonum(\"0x\" $1);"
+                                       "printf(\"%%04x \", addr - base);"
+                                       "print;"
+                               "}"
+                       "}' 1>&2";
+
+       /* fake snprintf() to calculate the size */
+       size = snprintf(NULL, 0, objdump_str, cross_compile, objname, funcs) + 1;
+       if (size <= 0) {
+               WARN("objdump string size calculation failed");
+               return -1;
+       }
+
+       cmd = malloc(size);
+
+       /* real snprintf() */
+       snprintf(cmd, size, objdump_str, cross_compile, objname, funcs);
+       ret = system(cmd);
+       if (ret) {
+               WARN("disassembly failed: %d", ret);
+               return -1;
        }
 
        return 0;
 }
 
+static int disas_warned_funcs(struct objtool_file *file)
+{
+       struct symbol *sym;
+       char *funcs = NULL, *tmp;
+
+       for_each_sym(file, sym) {
+               if (sym->warned) {
+                       if (!funcs) {
+                               funcs = malloc(strlen(sym->name) + 1);
+                               strcpy(funcs, sym->name);
+                       } else {
+                               tmp = malloc(strlen(funcs) + strlen(sym->name) + 2);
+                               sprintf(tmp, "%s %s", funcs, sym->name);
+                               free(funcs);
+                               funcs = tmp;
+                       }
+               }
+       }
+
+       if (funcs)
+               disas_funcs(funcs);
+
+       return 0;
+}
+
+struct insn_chunk {
+       void *addr;
+       struct insn_chunk *next;
+};
+
+/*
+ * Reduce peak RSS usage by freeing insns memory before writing the ELF file,
+ * which can trigger more allocations for .debug_* sections whose data hasn't
+ * been read yet.
+ */
+static void free_insns(struct objtool_file *file)
+{
+       struct instruction *insn;
+       struct insn_chunk *chunks = NULL, *chunk;
+
+       for_each_insn(file, insn) {
+               if (!insn->idx) {
+                       chunk = malloc(sizeof(*chunk));
+                       chunk->addr = insn;
+                       chunk->next = chunks;
+                       chunks = chunk;
+               }
+       }
+
+       for (chunk = chunks; chunk; chunk = chunk->next)
+               free(chunk->addr);
+}
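
free_insns() leans on a convention of the instruction allocator: instructions are assumed to be handed out from bulk-allocated arrays, insn->idx records the position inside the array, and only the entry with idx == 0 is an address that actually came from the allocator and may be passed to free(). The chunk heads are collected first and freed in a second pass so the for_each_insn() walk never touches memory it has already released. A toy sketch of that assumed convention (chunk size and struct are illustrative, not objtool's):

    #include <stdlib.h>

    #define CHUNK_NR 256                    /* illustrative chunk size */

    struct toy_insn {
            unsigned int idx;               /* position within its chunk */
            /* ... decoded fields ... */
    };

    /* hand out instructions CHUNK_NR at a time; idx == 0 marks the chunk
     * base, the only pointer that may later be passed to free() */
    static struct toy_insn *alloc_insn_chunk(void)
    {
            struct toy_insn *chunk = calloc(CHUNK_NR, sizeof(*chunk));
            unsigned int i;

            if (chunk)
                    for (i = 0; i < CHUNK_NR; i++)
                            chunk[i].idx = i;
            return chunk;
    }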
+
 int check(struct objtool_file *file)
 {
        int ret, warnings = 0;
@@ -4537,6 +4632,8 @@ int check(struct objtool_file *file)
        init_cfi_state(&init_cfi);
        init_cfi_state(&func_cfi);
        set_func_state(&func_cfi);
+       init_cfi_state(&force_undefined_cfi);
+       force_undefined_cfi.force_undefined = true;
 
        if (!cfi_hash_alloc(1UL << (file->elf->symbol_bits - 3)))
                goto out;
@@ -4673,6 +4770,10 @@ int check(struct objtool_file *file)
                warnings += ret;
        }
 
+       free_insns(file);
+
+       if (opts.verbose)
+               disas_warned_funcs(file);
 
        if (opts.stats) {
                printf("nr_insns_visited: %ld\n", nr_insns_visited);
index 500e929..d420b5d 100644 (file)
@@ -32,16 +32,52 @@ static inline u32 str_hash(const char *str)
 #define __elf_table(name)      (elf->name##_hash)
 #define __elf_bits(name)       (elf->name##_bits)
 
-#define elf_hash_add(name, node, key) \
-       hlist_add_head(node, &__elf_table(name)[hash_min(key, __elf_bits(name))])
+#define __elf_table_entry(name, key) \
+       __elf_table(name)[hash_min(key, __elf_bits(name))]
+
+#define elf_hash_add(name, node, key)                                  \
+({                                                                     \
+       struct elf_hash_node *__node = node;                            \
+       __node->next = __elf_table_entry(name, key);                    \
+       __elf_table_entry(name, key) = __node;                          \
+})
+
+static inline void __elf_hash_del(struct elf_hash_node *node,
+                                 struct elf_hash_node **head)
+{
+       struct elf_hash_node *cur, *prev;
+
+       if (node == *head) {
+               *head = node->next;
+               return;
+       }
+
+       for (prev = NULL, cur = *head; cur; prev = cur, cur = cur->next) {
+               if (cur == node) {
+                       prev->next = cur->next;
+                       break;
+               }
+       }
+}
 
-#define elf_hash_for_each_possible(name, obj, member, key) \
-       hlist_for_each_entry(obj, &__elf_table(name)[hash_min(key, __elf_bits(name))], member)
+#define elf_hash_del(name, node, key) \
+       __elf_hash_del(node, &__elf_table_entry(name, key))
+
+#define elf_list_entry(ptr, type, member)                              \
+({                                                                     \
+       typeof(ptr) __ptr = (ptr);                                      \
+       __ptr ? container_of(__ptr, type, member) : NULL;               \
+})
+
+#define elf_hash_for_each_possible(name, obj, member, key)             \
+       for (obj = elf_list_entry(__elf_table_entry(name, key), typeof(*obj), member); \
+            obj;                                                       \
+            obj = elf_list_entry(obj->member.next, typeof(*(obj)), member))
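
The hlist-based buckets are gone: each hashed object now embeds a single-pointer struct elf_hash_node, elf_hash_add() pushes it onto the bucket head, and lookups recover the containing object with container_of()-style arithmetic via elf_list_entry(). That saves one pointer per object compared with hlist_node (no pprev back-pointer), at the price of __elf_hash_del() having to walk the bucket to unlink. A standalone sketch of the same intrusive pattern (toy names, not objtool's):

    #include <stddef.h>

    struct hnode { struct hnode *next; };

    struct item {
            struct hnode hash;
            int key;
    };

    #define NBUCKETS 16
    static struct hnode *table[NBUCKETS];

    static void hash_add(struct item *it)
    {
            struct hnode **head = &table[(unsigned int)it->key % NBUCKETS];

            it->hash.next = *head;
            *head = &it->hash;
    }

    static struct item *hash_find(int key)
    {
            struct hnode *n;

            for (n = table[(unsigned int)key % NBUCKETS]; n; n = n->next) {
                    struct item *it =
                            (struct item *)((char *)n - offsetof(struct item, hash));

                    if (it->key == key)
                            return it;
            }
            return NULL;
    }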
 
 #define elf_alloc_hash(name, size) \
 ({ \
        __elf_bits(name) = max(10, ilog2(size)); \
-       __elf_table(name) = mmap(NULL, sizeof(struct hlist_head) << __elf_bits(name), \
+       __elf_table(name) = mmap(NULL, sizeof(struct elf_hash_node *) << __elf_bits(name), \
                                 PROT_READ|PROT_WRITE, \
                                 MAP_PRIVATE|MAP_ANON, -1, 0); \
        if (__elf_table(name) == (void *)-1L) { \
@@ -233,21 +269,22 @@ struct reloc *find_reloc_by_dest_range(const struct elf *elf, struct section *se
                                     unsigned long offset, unsigned int len)
 {
        struct reloc *reloc, *r = NULL;
+       struct section *rsec;
        unsigned long o;
 
-       if (!sec->reloc)
+       rsec = sec->rsec;
+       if (!rsec)
                return NULL;
 
-       sec = sec->reloc;
-
        for_offset_range(o, offset, offset + len) {
                elf_hash_for_each_possible(reloc, reloc, hash,
-                                          sec_offset_hash(sec, o)) {
-                       if (reloc->sec != sec)
+                                          sec_offset_hash(rsec, o)) {
+                       if (reloc->sec != rsec)
                                continue;
 
-                       if (reloc->offset >= offset && reloc->offset < offset + len) {
-                               if (!r || reloc->offset < r->offset)
+                       if (reloc_offset(reloc) >= offset &&
+                           reloc_offset(reloc) < offset + len) {
+                               if (!r || reloc_offset(reloc) < reloc_offset(r))
                                        r = reloc;
                        }
                }
@@ -263,6 +300,11 @@ struct reloc *find_reloc_by_dest(const struct elf *elf, struct section *sec, uns
        return find_reloc_by_dest_range(elf, sec, offset, 1);
 }
 
+static bool is_dwarf_section(struct section *sec)
+{
+       return !strncmp(sec->name, ".debug_", 7);
+}
+
 static int read_sections(struct elf *elf)
 {
        Elf_Scn *s = NULL;
@@ -293,7 +335,6 @@ static int read_sections(struct elf *elf)
                sec = &elf->section_data[i];
 
                INIT_LIST_HEAD(&sec->symbol_list);
-               INIT_LIST_HEAD(&sec->reloc_list);
 
                s = elf_getscn(elf->elf, i);
                if (!s) {
@@ -314,7 +355,7 @@ static int read_sections(struct elf *elf)
                        return -1;
                }
 
-               if (sec->sh.sh_size != 0) {
+               if (sec->sh.sh_size != 0 && !is_dwarf_section(sec)) {
                        sec->data = elf_getdata(s, NULL);
                        if (!sec->data) {
                                WARN_ELF("elf_getdata");
@@ -328,12 +369,12 @@ static int read_sections(struct elf *elf)
                        }
                }
 
-               if (sec->sh.sh_flags & SHF_EXECINSTR)
-                       elf->text_size += sec->sh.sh_size;
-
                list_add_tail(&sec->list, &elf->sections);
                elf_hash_add(section, &sec->hash, sec->idx);
                elf_hash_add(section_name, &sec->name_hash, str_hash(sec->name));
+
+               if (is_reloc_sec(sec))
+                       elf->num_relocs += sec_num_entries(sec);
        }
 
        if (opts.stats) {
@@ -356,7 +397,6 @@ static void elf_add_symbol(struct elf *elf, struct symbol *sym)
        struct rb_node *pnode;
        struct symbol *iter;
 
-       INIT_LIST_HEAD(&sym->reloc_list);
        INIT_LIST_HEAD(&sym->pv_target);
        sym->alias = sym;
 
@@ -407,7 +447,7 @@ static int read_symbols(struct elf *elf)
                if (symtab_shndx)
                        shndx_data = symtab_shndx->data;
 
-               symbols_nr = symtab->sh.sh_size / symtab->sh.sh_entsize;
+               symbols_nr = sec_num_entries(symtab);
        } else {
                /*
                 * A missing symbol table is actually possible if it's an empty
@@ -533,52 +573,17 @@ err:
        return -1;
 }
 
-static struct section *elf_create_reloc_section(struct elf *elf,
-                                               struct section *base,
-                                               int reltype);
-
-int elf_add_reloc(struct elf *elf, struct section *sec, unsigned long offset,
-                 unsigned int type, struct symbol *sym, s64 addend)
-{
-       struct reloc *reloc;
-
-       if (!sec->reloc && !elf_create_reloc_section(elf, sec, SHT_RELA))
-               return -1;
-
-       reloc = malloc(sizeof(*reloc));
-       if (!reloc) {
-               perror("malloc");
-               return -1;
-       }
-       memset(reloc, 0, sizeof(*reloc));
-
-       reloc->sec = sec->reloc;
-       reloc->offset = offset;
-       reloc->type = type;
-       reloc->sym = sym;
-       reloc->addend = addend;
-
-       list_add_tail(&reloc->sym_reloc_entry, &sym->reloc_list);
-       list_add_tail(&reloc->list, &sec->reloc->reloc_list);
-       elf_hash_add(reloc, &reloc->hash, reloc_hash(reloc));
-
-       sec->reloc->sh.sh_size += sec->reloc->sh.sh_entsize;
-       sec->reloc->changed = true;
-
-       return 0;
-}
-
 /*
- * Ensure that any reloc section containing references to @sym is marked
- * changed such that it will get re-generated in elf_rebuild_reloc_sections()
- * with the new symbol index.
+ * @sym's idx has changed.  Update the relocs which reference it.
  */
-static void elf_dirty_reloc_sym(struct elf *elf, struct symbol *sym)
+static int elf_update_sym_relocs(struct elf *elf, struct symbol *sym)
 {
        struct reloc *reloc;
 
-       list_for_each_entry(reloc, &sym->reloc_list, sym_reloc_entry)
-               reloc->sec->changed = true;
+       for (reloc = sym->relocs; reloc; reloc = reloc->sym_next_reloc)
+               set_reloc_sym(elf, reloc, reloc->sym->idx);
+
+       return 0;
 }
 
 /*
@@ -655,7 +660,7 @@ static int elf_update_symbol(struct elf *elf, struct section *symtab,
                        symtab_data->d_align = 1;
                        symtab_data->d_type = ELF_T_SYM;
 
-                       symtab->changed = true;
+                       mark_sec_changed(elf, symtab, true);
                        symtab->truncate = true;
 
                        if (t) {
@@ -670,7 +675,7 @@ static int elf_update_symbol(struct elf *elf, struct section *symtab,
                                shndx_data->d_align = sizeof(Elf32_Word);
                                shndx_data->d_type = ELF_T_WORD;
 
-                               symtab_shndx->changed = true;
+                               mark_sec_changed(elf, symtab_shndx, true);
                                symtab_shndx->truncate = true;
                        }
 
@@ -734,7 +739,7 @@ __elf_create_symbol(struct elf *elf, struct symbol *sym)
                return NULL;
        }
 
-       new_idx = symtab->sh.sh_size / symtab->sh.sh_entsize;
+       new_idx = sec_num_entries(symtab);
 
        if (GELF_ST_BIND(sym->sym.st_info) != STB_LOCAL)
                goto non_local;
@@ -746,18 +751,19 @@ __elf_create_symbol(struct elf *elf, struct symbol *sym)
        first_non_local = symtab->sh.sh_info;
        old = find_symbol_by_index(elf, first_non_local);
        if (old) {
-               old->idx = new_idx;
 
-               hlist_del(&old->hash);
-               elf_hash_add(symbol, &old->hash, old->idx);
-
-               elf_dirty_reloc_sym(elf, old);
+               elf_hash_del(symbol, &old->hash, old->idx);
+               elf_hash_add(symbol, &old->hash, new_idx);
+               old->idx = new_idx;
 
                if (elf_update_symbol(elf, symtab, symtab_shndx, old)) {
                        WARN("elf_update_symbol move");
                        return NULL;
                }
 
+               if (elf_update_sym_relocs(elf, old))
+                       return NULL;
+
                new_idx = first_non_local;
        }
 
@@ -774,11 +780,11 @@ non_local:
        }
 
        symtab->sh.sh_size += symtab->sh.sh_entsize;
-       symtab->changed = true;
+       mark_sec_changed(elf, symtab, true);
 
        if (symtab_shndx) {
                symtab_shndx->sh.sh_size += sizeof(Elf32_Word);
-               symtab_shndx->changed = true;
+               mark_sec_changed(elf, symtab_shndx, true);
        }
 
        return sym;
@@ -841,13 +847,57 @@ elf_create_prefix_symbol(struct elf *elf, struct symbol *orig, long size)
        return sym;
 }
 
-int elf_add_reloc_to_insn(struct elf *elf, struct section *sec,
-                         unsigned long offset, unsigned int type,
-                         struct section *insn_sec, unsigned long insn_off)
+static struct reloc *elf_init_reloc(struct elf *elf, struct section *rsec,
+                                   unsigned int reloc_idx,
+                                   unsigned long offset, struct symbol *sym,
+                                   s64 addend, unsigned int type)
+{
+       struct reloc *reloc, empty = { 0 };
+
+       if (reloc_idx >= sec_num_entries(rsec)) {
+               WARN("%s: bad reloc_idx %u for %s with %d relocs",
+                    __func__, reloc_idx, rsec->name, sec_num_entries(rsec));
+               return NULL;
+       }
+
+       reloc = &rsec->relocs[reloc_idx];
+
+       if (memcmp(reloc, &empty, sizeof(empty))) {
+               WARN("%s: %s: reloc %d already initialized!",
+                    __func__, rsec->name, reloc_idx);
+               return NULL;
+       }
+
+       reloc->sec = rsec;
+       reloc->sym = sym;
+
+       set_reloc_offset(elf, reloc, offset);
+       set_reloc_sym(elf, reloc, sym->idx);
+       set_reloc_type(elf, reloc, type);
+       set_reloc_addend(elf, reloc, addend);
+
+       elf_hash_add(reloc, &reloc->hash, reloc_hash(reloc));
+       reloc->sym_next_reloc = sym->relocs;
+       sym->relocs = reloc;
+
+       return reloc;
+}
+
+struct reloc *elf_init_reloc_text_sym(struct elf *elf, struct section *sec,
+                                     unsigned long offset,
+                                     unsigned int reloc_idx,
+                                     struct section *insn_sec,
+                                     unsigned long insn_off)
 {
        struct symbol *sym = insn_sec->sym;
        int addend = insn_off;
 
+       if (!(insn_sec->sh.sh_flags & SHF_EXECINSTR)) {
+               WARN("bad call to %s() for data symbol %s",
+                    __func__, sym->name);
+               return NULL;
+       }
+
        if (!sym) {
                /*
                 * Due to how weak functions work, we must use section based
@@ -857,108 +907,86 @@ int elf_add_reloc_to_insn(struct elf *elf, struct section *sec,
                 */
                sym = elf_create_section_symbol(elf, insn_sec);
                if (!sym)
-                       return -1;
+                       return NULL;
 
                insn_sec->sym = sym;
        }
 
-       return elf_add_reloc(elf, sec, offset, type, sym, addend);
+       return elf_init_reloc(elf, sec->rsec, reloc_idx, offset, sym, addend,
+                             elf_text_rela_type(elf));
 }
 
-static int read_rel_reloc(struct section *sec, int i, struct reloc *reloc, unsigned int *symndx)
+struct reloc *elf_init_reloc_data_sym(struct elf *elf, struct section *sec,
+                                     unsigned long offset,
+                                     unsigned int reloc_idx,
+                                     struct symbol *sym,
+                                     s64 addend)
 {
-       if (!gelf_getrel(sec->data, i, &reloc->rel)) {
-               WARN_ELF("gelf_getrel");
-               return -1;
+       if (sym->sec && (sec->sh.sh_flags & SHF_EXECINSTR)) {
+               WARN("bad call to %s() for text symbol %s",
+                    __func__, sym->name);
+               return NULL;
        }
-       reloc->type = GELF_R_TYPE(reloc->rel.r_info);
-       reloc->addend = 0;
-       reloc->offset = reloc->rel.r_offset;
-       *symndx = GELF_R_SYM(reloc->rel.r_info);
-       return 0;
-}
 
-static int read_rela_reloc(struct section *sec, int i, struct reloc *reloc, unsigned int *symndx)
-{
-       if (!gelf_getrela(sec->data, i, &reloc->rela)) {
-               WARN_ELF("gelf_getrela");
-               return -1;
-       }
-       reloc->type = GELF_R_TYPE(reloc->rela.r_info);
-       reloc->addend = reloc->rela.r_addend;
-       reloc->offset = reloc->rela.r_offset;
-       *symndx = GELF_R_SYM(reloc->rela.r_info);
-       return 0;
+       return elf_init_reloc(elf, sec->rsec, reloc_idx, offset, sym, addend,
+                             elf_data_rela_type(elf));
 }
 
 static int read_relocs(struct elf *elf)
 {
-       unsigned long nr_reloc, max_reloc = 0, tot_reloc = 0;
-       struct section *sec;
+       unsigned long nr_reloc, max_reloc = 0;
+       struct section *rsec;
        struct reloc *reloc;
        unsigned int symndx;
        struct symbol *sym;
        int i;
 
-       if (!elf_alloc_hash(reloc, elf->text_size / 16))
+       if (!elf_alloc_hash(reloc, elf->num_relocs))
                return -1;
 
-       list_for_each_entry(sec, &elf->sections, list) {
-               if ((sec->sh.sh_type != SHT_RELA) &&
-                   (sec->sh.sh_type != SHT_REL))
+       list_for_each_entry(rsec, &elf->sections, list) {
+               if (!is_reloc_sec(rsec))
                        continue;
 
-               sec->base = find_section_by_index(elf, sec->sh.sh_info);
-               if (!sec->base) {
+               rsec->base = find_section_by_index(elf, rsec->sh.sh_info);
+               if (!rsec->base) {
                        WARN("can't find base section for reloc section %s",
-                            sec->name);
+                            rsec->name);
                        return -1;
                }
 
-               sec->base->reloc = sec;
+               rsec->base->rsec = rsec;
 
                nr_reloc = 0;
-               sec->reloc_data = calloc(sec->sh.sh_size / sec->sh.sh_entsize, sizeof(*reloc));
-               if (!sec->reloc_data) {
+               rsec->relocs = calloc(sec_num_entries(rsec), sizeof(*reloc));
+               if (!rsec->relocs) {
                        perror("calloc");
                        return -1;
                }
-               for (i = 0; i < sec->sh.sh_size / sec->sh.sh_entsize; i++) {
-                       reloc = &sec->reloc_data[i];
-                       switch (sec->sh.sh_type) {
-                       case SHT_REL:
-                               if (read_rel_reloc(sec, i, reloc, &symndx))
-                                       return -1;
-                               break;
-                       case SHT_RELA:
-                               if (read_rela_reloc(sec, i, reloc, &symndx))
-                                       return -1;
-                               break;
-                       default: return -1;
-                       }
+               for (i = 0; i < sec_num_entries(rsec); i++) {
+                       reloc = &rsec->relocs[i];
 
-                       reloc->sec = sec;
-                       reloc->idx = i;
+                       reloc->sec = rsec;
+                       symndx = reloc_sym(reloc);
                        reloc->sym = sym = find_symbol_by_index(elf, symndx);
                        if (!reloc->sym) {
                                WARN("can't find reloc entry symbol %d for %s",
-                                    symndx, sec->name);
+                                    symndx, rsec->name);
                                return -1;
                        }
 
-                       list_add_tail(&reloc->sym_reloc_entry, &sym->reloc_list);
-                       list_add_tail(&reloc->list, &sec->reloc_list);
                        elf_hash_add(reloc, &reloc->hash, reloc_hash(reloc));
+                       reloc->sym_next_reloc = sym->relocs;
+                       sym->relocs = reloc;
 
                        nr_reloc++;
                }
                max_reloc = max(max_reloc, nr_reloc);
-               tot_reloc += nr_reloc;
        }
 
        if (opts.stats) {
                printf("max_reloc: %lu\n", max_reloc);
-               printf("tot_reloc: %lu\n", tot_reloc);
+               printf("num_relocs: %lu\n", elf->num_relocs);
                printf("reloc_bits: %d\n", elf->reloc_bits);
        }
 
@@ -1053,13 +1081,14 @@ static int elf_add_string(struct elf *elf, struct section *strtab, char *str)
 
        len = strtab->sh.sh_size;
        strtab->sh.sh_size += data->d_size;
-       strtab->changed = true;
+
+       mark_sec_changed(elf, strtab, true);
 
        return len;
 }
 
 struct section *elf_create_section(struct elf *elf, const char *name,
-                                  unsigned int sh_flags, size_t entsize, int nr)
+                                  size_t entsize, unsigned int nr)
 {
        struct section *sec, *shstrtab;
        size_t size = entsize * nr;
@@ -1073,7 +1102,6 @@ struct section *elf_create_section(struct elf *elf, const char *name,
        memset(sec, 0, sizeof(*sec));
 
        INIT_LIST_HEAD(&sec->symbol_list);
-       INIT_LIST_HEAD(&sec->reloc_list);
 
        s = elf_newscn(elf->elf);
        if (!s) {
@@ -1088,7 +1116,6 @@ struct section *elf_create_section(struct elf *elf, const char *name,
        }
 
        sec->idx = elf_ndxscn(s);
-       sec->changed = true;
 
        sec->data = elf_newdata(s);
        if (!sec->data) {
@@ -1117,7 +1144,7 @@ struct section *elf_create_section(struct elf *elf, const char *name,
        sec->sh.sh_entsize = entsize;
        sec->sh.sh_type = SHT_PROGBITS;
        sec->sh.sh_addralign = 1;
-       sec->sh.sh_flags = SHF_ALLOC | sh_flags;
+       sec->sh.sh_flags = SHF_ALLOC;
 
        /* Add section name to .shstrtab (or .strtab for Clang) */
        shstrtab = find_section_by_name(elf, ".shstrtab");
@@ -1135,158 +1162,66 @@ struct section *elf_create_section(struct elf *elf, const char *name,
        elf_hash_add(section, &sec->hash, sec->idx);
        elf_hash_add(section_name, &sec->name_hash, str_hash(sec->name));
 
-       elf->changed = true;
+       mark_sec_changed(elf, sec, true);
 
        return sec;
 }
 
-static struct section *elf_create_rel_reloc_section(struct elf *elf, struct section *base)
+static struct section *elf_create_rela_section(struct elf *elf,
+                                              struct section *sec,
+                                              unsigned int reloc_nr)
 {
-       char *relocname;
-       struct section *sec;
+       struct section *rsec;
+       char *rsec_name;
 
-       relocname = malloc(strlen(base->name) + strlen(".rel") + 1);
-       if (!relocname) {
+       rsec_name = malloc(strlen(sec->name) + strlen(".rela") + 1);
+       if (!rsec_name) {
                perror("malloc");
                return NULL;
        }
-       strcpy(relocname, ".rel");
-       strcat(relocname, base->name);
+       strcpy(rsec_name, ".rela");
+       strcat(rsec_name, sec->name);
 
-       sec = elf_create_section(elf, relocname, 0, sizeof(GElf_Rel), 0);
-       free(relocname);
-       if (!sec)
+       rsec = elf_create_section(elf, rsec_name, elf_rela_size(elf), reloc_nr);
+       free(rsec_name);
+       if (!rsec)
                return NULL;
 
-       base->reloc = sec;
-       sec->base = base;
+       rsec->data->d_type = ELF_T_RELA;
+       rsec->sh.sh_type = SHT_RELA;
+       rsec->sh.sh_addralign = elf_addr_size(elf);
+       rsec->sh.sh_link = find_section_by_name(elf, ".symtab")->idx;
+       rsec->sh.sh_info = sec->idx;
+       rsec->sh.sh_flags = SHF_INFO_LINK;
+
+       rsec->relocs = calloc(sec_num_entries(rsec), sizeof(struct reloc));
+       if (!rsec->relocs) {
+               perror("calloc");
+               return NULL;
+       }
 
-       sec->sh.sh_type = SHT_REL;
-       sec->sh.sh_addralign = 8;
-       sec->sh.sh_link = find_section_by_name(elf, ".symtab")->idx;
-       sec->sh.sh_info = base->idx;
-       sec->sh.sh_flags = SHF_INFO_LINK;
+       sec->rsec = rsec;
+       rsec->base = sec;
 
-       return sec;
+       return rsec;
 }
 
-static struct section *elf_create_rela_reloc_section(struct elf *elf, struct section *base)
+struct section *elf_create_section_pair(struct elf *elf, const char *name,
+                                       size_t entsize, unsigned int nr,
+                                       unsigned int reloc_nr)
 {
-       char *relocname;
        struct section *sec;
-       int addrsize = elf_class_addrsize(elf);
-
-       relocname = malloc(strlen(base->name) + strlen(".rela") + 1);
-       if (!relocname) {
-               perror("malloc");
-               return NULL;
-       }
-       strcpy(relocname, ".rela");
-       strcat(relocname, base->name);
 
-       if (addrsize == sizeof(u32))
-               sec = elf_create_section(elf, relocname, 0, sizeof(Elf32_Rela), 0);
-       else
-               sec = elf_create_section(elf, relocname, 0, sizeof(GElf_Rela), 0);
-       free(relocname);
+       sec = elf_create_section(elf, name, entsize, nr);
        if (!sec)
                return NULL;
 
-       base->reloc = sec;
-       sec->base = base;
-
-       sec->sh.sh_type = SHT_RELA;
-       sec->sh.sh_addralign = addrsize;
-       sec->sh.sh_link = find_section_by_name(elf, ".symtab")->idx;
-       sec->sh.sh_info = base->idx;
-       sec->sh.sh_flags = SHF_INFO_LINK;
+       if (!elf_create_rela_section(elf, sec, reloc_nr))
+               return NULL;
 
        return sec;
 }
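
elf_create_section_pair() replaces the old create-a-reloc-section-and-grow-it flow: the caller declares up front how many entries and how many relocs the new section will carry, both the section and its .rela twin are sized once, and individual reloc slots are later filled in place with elf_init_reloc_text_sym() or elf_init_reloc_data_sym(). A hedged usage sketch (the section name and helper are invented; the API calls are the ones introduced in this patch):

    #include <objtool/elf.h>

    /* emit one 32-bit slot per marker, each carrying a reloc to @sym */
    static int emit_markers(struct elf *elf, struct symbol *sym, unsigned int nr)
    {
            struct section *sec;
            unsigned int i;

            sec = elf_create_section_pair(elf, ".discard.example",
                                          sizeof(int), nr, nr);
            if (!sec)
                    return -1;

            for (i = 0; i < nr; i++) {
                    if (!elf_init_reloc_data_sym(elf, sec, i * sizeof(int),
                                                 i, sym, 0))
                            return -1;
            }

            return 0;
    }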
 
-static struct section *elf_create_reloc_section(struct elf *elf,
-                                        struct section *base,
-                                        int reltype)
-{
-       switch (reltype) {
-       case SHT_REL:  return elf_create_rel_reloc_section(elf, base);
-       case SHT_RELA: return elf_create_rela_reloc_section(elf, base);
-       default:       return NULL;
-       }
-}
-
-static int elf_rebuild_rel_reloc_section(struct section *sec)
-{
-       struct reloc *reloc;
-       int idx = 0;
-       void *buf;
-
-       /* Allocate a buffer for relocations */
-       buf = malloc(sec->sh.sh_size);
-       if (!buf) {
-               perror("malloc");
-               return -1;
-       }
-
-       sec->data->d_buf = buf;
-       sec->data->d_size = sec->sh.sh_size;
-       sec->data->d_type = ELF_T_REL;
-
-       idx = 0;
-       list_for_each_entry(reloc, &sec->reloc_list, list) {
-               reloc->rel.r_offset = reloc->offset;
-               reloc->rel.r_info = GELF_R_INFO(reloc->sym->idx, reloc->type);
-               if (!gelf_update_rel(sec->data, idx, &reloc->rel)) {
-                       WARN_ELF("gelf_update_rel");
-                       return -1;
-               }
-               idx++;
-       }
-
-       return 0;
-}
-
-static int elf_rebuild_rela_reloc_section(struct section *sec)
-{
-       struct reloc *reloc;
-       int idx = 0;
-       void *buf;
-
-       /* Allocate a buffer for relocations with addends */
-       buf = malloc(sec->sh.sh_size);
-       if (!buf) {
-               perror("malloc");
-               return -1;
-       }
-
-       sec->data->d_buf = buf;
-       sec->data->d_size = sec->sh.sh_size;
-       sec->data->d_type = ELF_T_RELA;
-
-       idx = 0;
-       list_for_each_entry(reloc, &sec->reloc_list, list) {
-               reloc->rela.r_offset = reloc->offset;
-               reloc->rela.r_addend = reloc->addend;
-               reloc->rela.r_info = GELF_R_INFO(reloc->sym->idx, reloc->type);
-               if (!gelf_update_rela(sec->data, idx, &reloc->rela)) {
-                       WARN_ELF("gelf_update_rela");
-                       return -1;
-               }
-               idx++;
-       }
-
-       return 0;
-}
-
-static int elf_rebuild_reloc_section(struct elf *elf, struct section *sec)
-{
-       switch (sec->sh.sh_type) {
-       case SHT_REL:  return elf_rebuild_rel_reloc_section(sec);
-       case SHT_RELA: return elf_rebuild_rela_reloc_section(sec);
-       default:       return -1;
-       }
-}
-
 int elf_write_insn(struct elf *elf, struct section *sec,
                   unsigned long offset, unsigned int len,
                   const char *insn)
@@ -1299,37 +1234,8 @@ int elf_write_insn(struct elf *elf, struct section *sec,
        }
 
        memcpy(data->d_buf + offset, insn, len);
-       elf_flagdata(data, ELF_C_SET, ELF_F_DIRTY);
 
-       elf->changed = true;
-
-       return 0;
-}
-
-int elf_write_reloc(struct elf *elf, struct reloc *reloc)
-{
-       struct section *sec = reloc->sec;
-
-       if (sec->sh.sh_type == SHT_REL) {
-               reloc->rel.r_info = GELF_R_INFO(reloc->sym->idx, reloc->type);
-               reloc->rel.r_offset = reloc->offset;
-
-               if (!gelf_update_rel(sec->data, reloc->idx, &reloc->rel)) {
-                       WARN_ELF("gelf_update_rel");
-                       return -1;
-               }
-       } else {
-               reloc->rela.r_info = GELF_R_INFO(reloc->sym->idx, reloc->type);
-               reloc->rela.r_addend = reloc->addend;
-               reloc->rela.r_offset = reloc->offset;
-
-               if (!gelf_update_rela(sec->data, reloc->idx, &reloc->rela)) {
-                       WARN_ELF("gelf_update_rela");
-                       return -1;
-               }
-       }
-
-       elf->changed = true;
+       mark_sec_changed(elf, sec, true);
 
        return 0;
 }
@@ -1401,25 +1307,20 @@ int elf_write(struct elf *elf)
                if (sec->truncate)
                        elf_truncate_section(elf, sec);
 
-               if (sec->changed) {
+               if (sec_changed(sec)) {
                        s = elf_getscn(elf->elf, sec->idx);
                        if (!s) {
                                WARN_ELF("elf_getscn");
                                return -1;
                        }
+
+                       /* Note this also flags the section dirty */
                        if (!gelf_update_shdr(s, &sec->sh)) {
                                WARN_ELF("gelf_update_shdr");
                                return -1;
                        }
 
-                       if (sec->base &&
-                           elf_rebuild_reloc_section(elf, sec)) {
-                               WARN("elf_rebuild_reloc_section");
-                               return -1;
-                       }
-
-                       sec->changed = false;
-                       elf->changed = true;
+                       mark_sec_changed(elf, sec, false);
                }
        }
 
@@ -1439,30 +1340,14 @@ int elf_write(struct elf *elf)
 
 void elf_close(struct elf *elf)
 {
-       struct section *sec, *tmpsec;
-       struct symbol *sym, *tmpsym;
-       struct reloc *reloc, *tmpreloc;
-
        if (elf->elf)
                elf_end(elf->elf);
 
        if (elf->fd > 0)
                close(elf->fd);
 
-       list_for_each_entry_safe(sec, tmpsec, &elf->sections, list) {
-               list_for_each_entry_safe(sym, tmpsym, &sec->symbol_list, list) {
-                       list_del(&sym->list);
-                       hash_del(&sym->hash);
-               }
-               list_for_each_entry_safe(reloc, tmpreloc, &sec->reloc_list, list) {
-                       list_del(&reloc->list);
-                       hash_del(&reloc->hash);
-               }
-               list_del(&sec->list);
-               free(sec->reloc_data);
-       }
-
-       free(elf->symbol_data);
-       free(elf->section_data);
-       free(elf);
+       /*
+        * NOTE: All remaining allocations are leaked on purpose.  Objtool is
+        * about to exit anyway.
+        */
 }
index 2a108e6..fcca666 100644 (file)
@@ -37,6 +37,7 @@ struct opts {
        bool no_unreachable;
        bool sec_address;
        bool stats;
+       bool verbose;
 };
 
 extern struct opts opts;
index b1258e7..c8a6bec 100644 (file)
@@ -36,6 +36,7 @@ struct cfi_state {
        bool drap;
        bool signal;
        bool end;
+       bool force_undefined;
 };
 
 #endif /* _OBJTOOL_CFI_H */
index e1ca588..c532d70 100644 (file)
@@ -12,6 +12,7 @@
 #include <linux/hashtable.h>
 #include <linux/rbtree.h>
 #include <linux/jhash.h>
+#include <arch/elf.h>
 
 #ifdef LIBELF_USE_DEPRECATED
 # define elf_getshdrnum    elf_getshnum
 #define ELF_C_READ_MMAP ELF_C_READ
 #endif
 
+struct elf_hash_node {
+       struct elf_hash_node *next;
+};
+
 struct section {
        struct list_head list;
-       struct hlist_node hash;
-       struct hlist_node name_hash;
+       struct elf_hash_node hash;
+       struct elf_hash_node name_hash;
        GElf_Shdr sh;
        struct rb_root_cached symbol_tree;
        struct list_head symbol_list;
-       struct list_head reloc_list;
-       struct section *base, *reloc;
+       struct section *base, *rsec;
        struct symbol *sym;
        Elf_Data *data;
        char *name;
        int idx;
-       bool changed, text, rodata, noinstr, init, truncate;
-       struct reloc *reloc_data;
+       bool _changed, text, rodata, noinstr, init, truncate;
+       struct reloc *relocs;
 };
 
 struct symbol {
        struct list_head list;
        struct rb_node node;
-       struct hlist_node hash;
-       struct hlist_node name_hash;
+       struct elf_hash_node hash;
+       struct elf_hash_node name_hash;
        GElf_Sym sym;
        struct section *sec;
        char *name;
@@ -61,37 +65,27 @@ struct symbol {
        u8 return_thunk      : 1;
        u8 fentry            : 1;
        u8 profiling_func    : 1;
+       u8 warned            : 1;
        struct list_head pv_target;
-       struct list_head reloc_list;
+       struct reloc *relocs;
 };
 
 struct reloc {
-       struct list_head list;
-       struct hlist_node hash;
-       union {
-               GElf_Rela rela;
-               GElf_Rel  rel;
-       };
+       struct elf_hash_node hash;
        struct section *sec;
        struct symbol *sym;
-       struct list_head sym_reloc_entry;
-       unsigned long offset;
-       unsigned int type;
-       s64 addend;
-       int idx;
-       bool jump_table_start;
+       struct reloc *sym_next_reloc;
 };
 
-#define ELF_HASH_BITS  20
-
 struct elf {
        Elf *elf;
        GElf_Ehdr ehdr;
        int fd;
        bool changed;
        char *name;
-       unsigned int text_size, num_files;
+       unsigned int num_files;
        struct list_head sections;
+       unsigned long num_relocs;
 
        int symbol_bits;
        int symbol_name_bits;
@@ -99,44 +93,54 @@ struct elf {
        int section_name_bits;
        int reloc_bits;
 
-       struct hlist_head *symbol_hash;
-       struct hlist_head *symbol_name_hash;
-       struct hlist_head *section_hash;
-       struct hlist_head *section_name_hash;
-       struct hlist_head *reloc_hash;
+       struct elf_hash_node **symbol_hash;
+       struct elf_hash_node **symbol_name_hash;
+       struct elf_hash_node **section_hash;
+       struct elf_hash_node **section_name_hash;
+       struct elf_hash_node **reloc_hash;
 
        struct section *section_data;
        struct symbol *symbol_data;
 };
 
-#define OFFSET_STRIDE_BITS     4
-#define OFFSET_STRIDE          (1UL << OFFSET_STRIDE_BITS)
-#define OFFSET_STRIDE_MASK     (~(OFFSET_STRIDE - 1))
-
-#define for_offset_range(_offset, _start, _end)                        \
-       for (_offset = ((_start) & OFFSET_STRIDE_MASK);         \
-            _offset >= ((_start) & OFFSET_STRIDE_MASK) &&      \
-            _offset <= ((_end) & OFFSET_STRIDE_MASK);          \
-            _offset += OFFSET_STRIDE)
+struct elf *elf_open_read(const char *name, int flags);
 
-static inline u32 sec_offset_hash(struct section *sec, unsigned long offset)
-{
-       u32 ol, oh, idx = sec->idx;
+struct section *elf_create_section(struct elf *elf, const char *name,
+                                  size_t entsize, unsigned int nr);
+struct section *elf_create_section_pair(struct elf *elf, const char *name,
+                                       size_t entsize, unsigned int nr,
+                                       unsigned int reloc_nr);
 
-       offset &= OFFSET_STRIDE_MASK;
+struct symbol *elf_create_prefix_symbol(struct elf *elf, struct symbol *orig, long size);
 
-       ol = offset;
-       oh = (offset >> 16) >> 16;
+struct reloc *elf_init_reloc_text_sym(struct elf *elf, struct section *sec,
+                                     unsigned long offset,
+                                     unsigned int reloc_idx,
+                                     struct section *insn_sec,
+                                     unsigned long insn_off);
 
-       __jhash_mix(ol, oh, idx);
+struct reloc *elf_init_reloc_data_sym(struct elf *elf, struct section *sec,
+                                     unsigned long offset,
+                                     unsigned int reloc_idx,
+                                     struct symbol *sym,
+                                     s64 addend);
 
-       return ol;
-}
+int elf_write_insn(struct elf *elf, struct section *sec,
+                  unsigned long offset, unsigned int len,
+                  const char *insn);
+int elf_write(struct elf *elf);
+void elf_close(struct elf *elf);
 
-static inline u32 reloc_hash(struct reloc *reloc)
-{
-       return sec_offset_hash(reloc->sec, reloc->offset);
-}
+struct section *find_section_by_name(const struct elf *elf, const char *name);
+struct symbol *find_func_by_offset(struct section *sec, unsigned long offset);
+struct symbol *find_symbol_by_offset(struct section *sec, unsigned long offset);
+struct symbol *find_symbol_by_name(const struct elf *elf, const char *name);
+struct symbol *find_symbol_containing(const struct section *sec, unsigned long offset);
+int find_symbol_hole_containing(const struct section *sec, unsigned long offset);
+struct reloc *find_reloc_by_dest(const struct elf *elf, struct section *sec, unsigned long offset);
+struct reloc *find_reloc_by_dest_range(const struct elf *elf, struct section *sec,
+                                    unsigned long offset, unsigned int len);
+struct symbol *find_func_containing(struct section *sec, unsigned long offset);
 
 /*
  * Try to see if it's a whole archive (vmlinux.o or module).
@@ -148,42 +152,147 @@ static inline bool has_multiple_files(struct elf *elf)
        return elf->num_files > 1;
 }
 
-static inline int elf_class_addrsize(struct elf *elf)
+static inline size_t elf_addr_size(struct elf *elf)
 {
-       if (elf->ehdr.e_ident[EI_CLASS] == ELFCLASS32)
-               return sizeof(u32);
-       else
-               return sizeof(u64);
+       return elf->ehdr.e_ident[EI_CLASS] == ELFCLASS32 ? 4 : 8;
 }
 
-struct elf *elf_open_read(const char *name, int flags);
-struct section *elf_create_section(struct elf *elf, const char *name, unsigned int sh_flags, size_t entsize, int nr);
+static inline size_t elf_rela_size(struct elf *elf)
+{
+       return elf_addr_size(elf) == 4 ? sizeof(Elf32_Rela) : sizeof(Elf64_Rela);
+}
 
-struct symbol *elf_create_prefix_symbol(struct elf *elf, struct symbol *orig, long size);
+static inline unsigned int elf_data_rela_type(struct elf *elf)
+{
+       return elf_addr_size(elf) == 4 ? R_DATA32 : R_DATA64;
+}
 
-int elf_add_reloc(struct elf *elf, struct section *sec, unsigned long offset,
-                 unsigned int type, struct symbol *sym, s64 addend);
-int elf_add_reloc_to_insn(struct elf *elf, struct section *sec,
-                         unsigned long offset, unsigned int type,
-                         struct section *insn_sec, unsigned long insn_off);
+static inline unsigned int elf_text_rela_type(struct elf *elf)
+{
+       return elf_addr_size(elf) == 4 ? R_TEXT32 : R_TEXT64;
+}
 
-int elf_write_insn(struct elf *elf, struct section *sec,
-                  unsigned long offset, unsigned int len,
-                  const char *insn);
-int elf_write_reloc(struct elf *elf, struct reloc *reloc);
-int elf_write(struct elf *elf);
-void elf_close(struct elf *elf);
+static inline bool is_reloc_sec(struct section *sec)
+{
+       return sec->sh.sh_type == SHT_RELA || sec->sh.sh_type == SHT_REL;
+}
 
-struct section *find_section_by_name(const struct elf *elf, const char *name);
-struct symbol *find_func_by_offset(struct section *sec, unsigned long offset);
-struct symbol *find_symbol_by_offset(struct section *sec, unsigned long offset);
-struct symbol *find_symbol_by_name(const struct elf *elf, const char *name);
-struct symbol *find_symbol_containing(const struct section *sec, unsigned long offset);
-int find_symbol_hole_containing(const struct section *sec, unsigned long offset);
-struct reloc *find_reloc_by_dest(const struct elf *elf, struct section *sec, unsigned long offset);
-struct reloc *find_reloc_by_dest_range(const struct elf *elf, struct section *sec,
-                                    unsigned long offset, unsigned int len);
-struct symbol *find_func_containing(struct section *sec, unsigned long offset);
+static inline bool sec_changed(struct section *sec)
+{
+       return sec->_changed;
+}
+
+static inline void mark_sec_changed(struct elf *elf, struct section *sec,
+                                   bool changed)
+{
+       sec->_changed = changed;
+       elf->changed |= changed;
+}
+
+static inline unsigned int sec_num_entries(struct section *sec)
+{
+       return sec->sh.sh_size / sec->sh.sh_entsize;
+}
+
+static inline unsigned int reloc_idx(struct reloc *reloc)
+{
+       return reloc - reloc->sec->relocs;
+}
+
+static inline void *reloc_rel(struct reloc *reloc)
+{
+       struct section *rsec = reloc->sec;
+
+       return rsec->data->d_buf + (reloc_idx(reloc) * rsec->sh.sh_entsize);
+}
+
+static inline bool is_32bit_reloc(struct reloc *reloc)
+{
+       /*
+        * Elf32_Rel:   8 bytes
+        * Elf32_Rela: 12 bytes
+        * Elf64_Rel:  16 bytes
+        * Elf64_Rela: 24 bytes
+        */
+       return reloc->sec->sh.sh_entsize < 16;
+}
+
+#define __get_reloc_field(reloc, field)                                        \
+({                                                                     \
+       is_32bit_reloc(reloc) ?                                         \
+               ((Elf32_Rela *)reloc_rel(reloc))->field :               \
+               ((Elf64_Rela *)reloc_rel(reloc))->field;                \
+})
+
+#define __set_reloc_field(reloc, field, val)                           \
+({                                                                     \
+       if (is_32bit_reloc(reloc))                                      \
+               ((Elf32_Rela *)reloc_rel(reloc))->field = val;          \
+       else                                                            \
+               ((Elf64_Rela *)reloc_rel(reloc))->field = val;          \
+})
+
+static inline u64 reloc_offset(struct reloc *reloc)
+{
+       return __get_reloc_field(reloc, r_offset);
+}
+
+static inline void set_reloc_offset(struct elf *elf, struct reloc *reloc, u64 offset)
+{
+       __set_reloc_field(reloc, r_offset, offset);
+       mark_sec_changed(elf, reloc->sec, true);
+}
+
+static inline s64 reloc_addend(struct reloc *reloc)
+{
+       return __get_reloc_field(reloc, r_addend);
+}
+
+static inline void set_reloc_addend(struct elf *elf, struct reloc *reloc, s64 addend)
+{
+       __set_reloc_field(reloc, r_addend, addend);
+       mark_sec_changed(elf, reloc->sec, true);
+}
+
+static inline unsigned int reloc_sym(struct reloc *reloc)
+{
+       u64 info = __get_reloc_field(reloc, r_info);
+
+       return is_32bit_reloc(reloc) ?
+               ELF32_R_SYM(info) :
+               ELF64_R_SYM(info);
+}
+
+static inline unsigned int reloc_type(struct reloc *reloc)
+{
+       u64 info = __get_reloc_field(reloc, r_info);
+
+       return is_32bit_reloc(reloc) ?
+               ELF32_R_TYPE(info) :
+               ELF64_R_TYPE(info);
+}
+
+static inline void set_reloc_sym(struct elf *elf, struct reloc *reloc, unsigned int sym)
+{
+       u64 info = is_32bit_reloc(reloc) ?
+               ELF32_R_INFO(sym, reloc_type(reloc)) :
+               ELF64_R_INFO(sym, reloc_type(reloc));
+
+       __set_reloc_field(reloc, r_info, info);
+
+       mark_sec_changed(elf, reloc->sec, true);
+}
+
+static inline void set_reloc_type(struct elf *elf, struct reloc *reloc, unsigned int type)
+{
+       u64 info = is_32bit_reloc(reloc) ?
+               ELF32_R_INFO(reloc_sym(reloc), type) :
+               ELF64_R_INFO(reloc_sym(reloc), type);
+
+       __set_reloc_field(reloc, r_info, info);
+
+       mark_sec_changed(elf, reloc->sec, true);
+}
 
 #define for_each_sec(file, sec)                                                \
        list_for_each_entry(sec, &file->elf->sections, list)
@@ -197,4 +306,44 @@ struct symbol *find_func_containing(struct section *sec, unsigned long offset);
                for_each_sec(file, __sec)                               \
                        sec_for_each_sym(__sec, sym)
 
+#define for_each_reloc(rsec, reloc)                                    \
+       for (int __i = 0, __fake = 1; __fake; __fake = 0)               \
+               for (reloc = rsec->relocs;                              \
+                    __i < sec_num_entries(rsec);                       \
+                    __i++, reloc++)
+
+#define for_each_reloc_from(rsec, reloc)                               \
+       for (int __i = reloc_idx(reloc);                                \
+            __i < sec_num_entries(rsec);                               \
+            __i++, reloc++)
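
Because relocs now live in a flat per-section array, iteration is plain pointer arithmetic. Together with the accessors above, dumping an existing reloc section could look roughly like this (dump_relocs() is an invented helper; the iterator and accessors are the ones from this header):

    #include <stdio.h>
    #include <objtool/elf.h>

    static void dump_relocs(struct section *rsec)
    {
            struct reloc *reloc;

            for_each_reloc(rsec, reloc)
                    printf("%016llx  %-24s %+lld\n",
                           (unsigned long long)reloc_offset(reloc),
                           reloc->sym ? reloc->sym->name : "<none>",
                           (long long)reloc_addend(reloc));
    }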
+
+#define OFFSET_STRIDE_BITS     4
+#define OFFSET_STRIDE          (1UL << OFFSET_STRIDE_BITS)
+#define OFFSET_STRIDE_MASK     (~(OFFSET_STRIDE - 1))
+
+#define for_offset_range(_offset, _start, _end)                        \
+       for (_offset = ((_start) & OFFSET_STRIDE_MASK);         \
+            _offset >= ((_start) & OFFSET_STRIDE_MASK) &&      \
+            _offset <= ((_end) & OFFSET_STRIDE_MASK);          \
+            _offset += OFFSET_STRIDE)
+
+static inline u32 sec_offset_hash(struct section *sec, unsigned long offset)
+{
+       u32 ol, oh, idx = sec->idx;
+
+       offset &= OFFSET_STRIDE_MASK;
+
+       ol = offset;
+       oh = (offset >> 16) >> 16;
+
+       __jhash_mix(ol, oh, idx);
+
+       return ol;
+}
+
+static inline u32 reloc_hash(struct reloc *reloc)
+{
+       return sec_offset_hash(reloc->sec, reloc_offset(reloc));
+}
+
 #endif /* _OBJTOOL_ELF_H */
index b1c920d..ac04d3f 100644 (file)
@@ -55,15 +55,22 @@ static inline char *offstr(struct section *sec, unsigned long offset)
 
 #define WARN_INSN(insn, format, ...)                                   \
 ({                                                                     \
-       WARN_FUNC(format, insn->sec, insn->offset,  ##__VA_ARGS__);     \
+       struct instruction *_insn = (insn);                             \
+       if (!_insn->sym || !_insn->sym->warned)                         \
+               WARN_FUNC(format, _insn->sec, _insn->offset,            \
+                         ##__VA_ARGS__);                               \
+       if (_insn->sym)                                                 \
+               _insn->sym->warned = 1;                                 \
 })
 
-#define BT_FUNC(format, insn, ...)                     \
-({                                                     \
-       struct instruction *_insn = (insn);             \
-       char *_str = offstr(_insn->sec, _insn->offset); \
-       WARN("  %s: " format, _str, ##__VA_ARGS__);     \
-       free(_str);                                     \
+#define BT_INSN(insn, format, ...)                             \
+({                                                             \
+       if (opts.verbose || opts.backtrace) {                   \
+               struct instruction *_insn = (insn);             \
+               char *_str = offstr(_insn->sec, _insn->offset); \
+               WARN("  %s: " format, _str, ##__VA_ARGS__);     \
+               free(_str);                                     \
+       }                                                       \
 })
 
 #define WARN_ELF(format, ...)                          \
diff --git a/tools/objtool/noreturns.h b/tools/objtool/noreturns.h
new file mode 100644 (file)
index 0000000..1514e84
--- /dev/null
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * This is a (sorted!) list of all known __noreturn functions in the kernel.
+ * It's needed for objtool to properly reverse-engineer the control flow graph.
+ *
+ * Yes, this is unfortunate.  A better solution is in the works.
+ */
+NORETURN(__invalid_creds)
+NORETURN(__kunit_abort)
+NORETURN(__module_put_and_kthread_exit)
+NORETURN(__reiserfs_panic)
+NORETURN(__stack_chk_fail)
+NORETURN(__ubsan_handle_builtin_unreachable)
+NORETURN(arch_call_rest_init)
+NORETURN(arch_cpu_idle_dead)
+NORETURN(btrfs_assertfail)
+NORETURN(cpu_bringup_and_idle)
+NORETURN(cpu_startup_entry)
+NORETURN(do_exit)
+NORETURN(do_group_exit)
+NORETURN(do_task_dead)
+NORETURN(ex_handler_msr_mce)
+NORETURN(fortify_panic)
+NORETURN(hlt_play_dead)
+NORETURN(hv_ghcb_terminate)
+NORETURN(kthread_complete_and_exit)
+NORETURN(kthread_exit)
+NORETURN(kunit_try_catch_throw)
+NORETURN(machine_real_restart)
+NORETURN(make_task_dead)
+NORETURN(mpt_halt_firmware)
+NORETURN(nmi_panic_self_stop)
+NORETURN(panic)
+NORETURN(panic_smp_self_stop)
+NORETURN(rest_init)
+NORETURN(rewind_stack_and_make_dead)
+NORETURN(sev_es_terminate)
+NORETURN(snp_abort)
+NORETURN(start_kernel)
+NORETURN(stop_this_cpu)
+NORETURN(usercopy_abort)
+NORETURN(x86_64_start_kernel)
+NORETURN(x86_64_start_reservations)
+NORETURN(xen_cpu_bringup_again)
+NORETURN(xen_start_kernel)
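
noreturns.h deliberately contains nothing but NORETURN(...) lines so it can be consumed X-macro style: the includer defines NORETURN to whatever expansion it needs before pulling the header in. A sketch of the likely consumption pattern (the exact usage in check.c may differ slightly):

    /* expand the list into an array of function names */
    static const char * const global_noreturns[] = {
    #define NORETURN(func) #func,
    #include "noreturns.h"
    #undef NORETURN
    };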
index 48efd1e..bae3439 100644 (file)
@@ -118,8 +118,8 @@ static int write_orc_entry(struct elf *elf, struct section *orc_sec,
        orc->bp_offset = bswap_if_needed(elf, orc->bp_offset);
 
        /* populate reloc for ip */
-       if (elf_add_reloc_to_insn(elf, ip_sec, idx * sizeof(int), R_X86_64_PC32,
-                                 insn_sec, insn_off))
+       if (!elf_init_reloc_text_sym(elf, ip_sec, idx * sizeof(int), idx,
+                                    insn_sec, insn_off))
                return -1;
 
        return 0;
@@ -237,12 +237,12 @@ int orc_create(struct objtool_file *file)
                WARN("file already has .orc_unwind section, skipping");
                return -1;
        }
-       orc_sec = elf_create_section(file->elf, ".orc_unwind", 0,
+       orc_sec = elf_create_section(file->elf, ".orc_unwind",
                                     sizeof(struct orc_entry), nr);
        if (!orc_sec)
                return -1;
 
-       sec = elf_create_section(file->elf, ".orc_unwind_ip", 0, sizeof(int), nr);
+       sec = elf_create_section_pair(file->elf, ".orc_unwind_ip", sizeof(int), nr, nr);
        if (!sec)
                return -1;
 
index baa85c3..91b1950 100644 (file)
@@ -62,7 +62,7 @@ static void reloc_to_sec_off(struct reloc *reloc, struct section **sec,
                             unsigned long *off)
 {
        *sec = reloc->sym->sec;
-       *off = reloc->sym->offset + reloc->addend;
+       *off = reloc->sym->offset + reloc_addend(reloc);
 }
 
 static int get_alt_entry(struct elf *elf, const struct special_entry *entry,
@@ -126,7 +126,7 @@ static int get_alt_entry(struct elf *elf, const struct special_entry *entry,
                                  sec, offset + entry->key);
                        return -1;
                }
-               alt->key_addend = key_reloc->addend;
+               alt->key_addend = reloc_addend(key_reloc);
        }
 
        return 0;
index 4884520..a794d9e 100644 (file)
@@ -216,6 +216,12 @@ ifeq ($(call get-executable,$(BISON)),)
   dummy := $(error Error: $(BISON) is missing on this system, please install it)
 endif
 
+ifeq ($(BUILD_BPF_SKEL),1)
+  ifeq ($(call get-executable,$(CLANG)),)
+    dummy := $(error $(CLANG) is missing on this system, please install it to be able to build with BUILD_BPF_SKEL=1)
+  endif
+endif
+
 ifneq ($(OUTPUT),)
   ifeq ($(shell expr $(shell $(BISON) --version | grep bison | sed -e 's/.\+ \([0-9]\+\).\([0-9]\+\).\([0-9]\+\)/\1\2\3/g') \>\= 371), 1)
     BISON_FILE_PREFIX_MAP := --file-prefix-map=$(OUTPUT)=
@@ -921,6 +927,7 @@ ifndef NO_DEMANGLE
     EXTLIBS += -lstdc++
     CFLAGS += -DHAVE_CXA_DEMANGLE_SUPPORT
     CXXFLAGS += -DHAVE_CXA_DEMANGLE_SUPPORT
+    $(call detected,CONFIG_CXX_DEMANGLE)
   endif
   ifdef BUILD_NONDISTRO
     ifeq ($(filter -liberty,$(EXTLIBS)),)
index a42a6a9..f487948 100644 (file)
@@ -181,7 +181,6 @@ HOSTCC  ?= gcc
 HOSTLD  ?= ld
 HOSTAR  ?= ar
 CLANG   ?= clang
-LLVM_STRIP ?= llvm-strip
 
 PKG_CONFIG = $(CROSS_COMPILE)pkg-config
 
@@ -1057,15 +1056,33 @@ $(SKEL_TMP_OUT) $(LIBAPI_OUTPUT) $(LIBBPF_OUTPUT) $(LIBPERF_OUTPUT) $(LIBSUBCMD_
 
 ifdef BUILD_BPF_SKEL
 BPFTOOL := $(SKEL_TMP_OUT)/bootstrap/bpftool
-BPF_INCLUDE := -I$(SKEL_TMP_OUT)/.. -I$(LIBBPF_INCLUDE)
+# Get Clang's default includes on this system, as opposed to those seen by
+# '-target bpf'. This fixes "missing" files on some architectures/distros,
+# such as asm/byteorder.h, asm/socket.h, asm/sockios.h, sys/cdefs.h etc.
+#
+# Use '-idirafter': Don't interfere with include mechanics except where the
+# build would have failed anyway.
+define get_sys_includes
+$(shell $(1) $(2) -v -E - </dev/null 2>&1 \
+       | sed -n '/<...> search starts here:/,/End of search list./{ s| \(/.*\)|-idirafter \1|p }') \
+$(shell $(1) $(2) -dM -E - </dev/null | grep '__riscv_xlen ' | awk '{printf("-D__riscv_xlen=%d -D__BITS_PER_LONG=%d", $$3, $$3)}')
+endef
+
+ifneq ($(CROSS_COMPILE),)
+CLANG_TARGET_ARCH = --target=$(notdir $(CROSS_COMPILE:%-=%))
+endif
+
+CLANG_SYS_INCLUDES = $(call get_sys_includes,$(CLANG),$(CLANG_TARGET_ARCH))
+BPF_INCLUDE := -I$(SKEL_TMP_OUT)/.. -I$(LIBBPF_INCLUDE) $(CLANG_SYS_INCLUDES)
+TOOLS_UAPI_INCLUDE := -I$(srctree)/tools/include/uapi
 
 $(BPFTOOL): | $(SKEL_TMP_OUT)
        $(Q)CFLAGS= $(MAKE) -C ../bpf/bpftool \
                OUTPUT=$(SKEL_TMP_OUT)/ bootstrap
 
 $(SKEL_TMP_OUT)/%.bpf.o: util/bpf_skel/%.bpf.c $(LIBBPF) | $(SKEL_TMP_OUT)
-       $(QUIET_CLANG)$(CLANG) -g -O2 -target bpf -Wall -Werror $(BPF_INCLUDE) \
-         -c $(filter util/bpf_skel/%.bpf.c,$^) -o $@ && $(LLVM_STRIP) -g $@
+       $(QUIET_CLANG)$(CLANG) -g -O2 -target bpf -Wall -Werror $(BPF_INCLUDE) $(TOOLS_UAPI_INCLUDE) \
+         -c $(filter util/bpf_skel/%.bpf.c,$^) -o $@
 
 $(SKEL_OUT)/%.skel.h: $(SKEL_TMP_OUT)/%.bpf.o | $(BPFTOOL)
        $(QUIET_GENSKEL)$(BPFTOOL) gen skeleton $< > $@
index 77cb03e..9ca040b 100644 (file)
@@ -78,9 +78,9 @@ static int cs_etm_validate_context_id(struct auxtrace_record *itr,
        char path[PATH_MAX];
        int err;
        u32 val;
-       u64 contextid =
-               evsel->core.attr.config &
-               (perf_pmu__format_bits(&cs_etm_pmu->format, "contextid1") |
+       u64 contextid = evsel->core.attr.config &
+               (perf_pmu__format_bits(&cs_etm_pmu->format, "contextid") |
+                perf_pmu__format_bits(&cs_etm_pmu->format, "contextid1") |
                 perf_pmu__format_bits(&cs_etm_pmu->format, "contextid2"));
 
        if (!contextid)
@@ -114,8 +114,7 @@ static int cs_etm_validate_context_id(struct auxtrace_record *itr,
                 *  0b00100 Maximum of 32-bit Context ID size.
                 *  All other values are reserved.
                 */
-               val = BMVAL(val, 5, 9);
-               if (!val || val != 0x4) {
+               if (BMVAL(val, 5, 9) != 0x4) {
                        pr_err("%s: CONTEXTIDR_EL1 isn't supported, disable with %s/contextid1=0/\n",
                               CORESIGHT_ETM_PMU_NAME, CORESIGHT_ETM_PMU_NAME);
                        return -EINVAL;
index 860a8b4..a9623b1 100644 (file)
@@ -12,7 +12,7 @@
 #include "arm-spe.h"
 #include "hisi-ptt.h"
 #include "../../../util/pmu.h"
-#include "../cs-etm.h"
+#include "../../../util/cs-etm.h"
 
 struct perf_event_attr
 *perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
index d730666..80b9f62 100644 (file)
@@ -29,8 +29,8 @@ static int _get_cpuid(char *buf, size_t sz, struct perf_cpu_map *cpus)
                char path[PATH_MAX];
                FILE *file;
 
-               scnprintf(path, PATH_MAX, "%s/devices/system/cpu/cpu%d"MIDR,
-                               sysfs, cpus->map[cpu]);
+               scnprintf(path, PATH_MAX, "%s/devices/system/cpu/cpu%d" MIDR,
+                         sysfs, RC_CHK_ACCESS(cpus)->map[cpu].cpu);
 
                file = fopen(path, "r");
                if (!file) {
index fa143ac..ef1ed64 100644 (file)
@@ -18,7 +18,7 @@ static struct perf_pmu *pmu__find_core_pmu(void)
                 * The cpumap should cover all CPUs. Otherwise, some CPUs may
                 * not support some events or have different event IDs.
                 */
-               if (pmu->cpus->nr != cpu__max_cpu().cpu)
+               if (RC_CHK_ACCESS(pmu->cpus)->nr != cpu__max_cpu().cpu)
                        return NULL;
 
                return pmu;
index 7991476..b68f475 100644 (file)
 444  common    landlock_create_ruleset sys_landlock_create_ruleset     sys_landlock_create_ruleset
 445  common    landlock_add_rule       sys_landlock_add_rule           sys_landlock_add_rule
 446  common    landlock_restrict_self  sys_landlock_restrict_self      sys_landlock_restrict_self
-# 447 reserved for memfd_secret
+447  common    memfd_secret            sys_memfd_secret                sys_memfd_secret
 448  common    process_mrelease        sys_process_mrelease            sys_process_mrelease
 449  common    futex_waitv             sys_futex_waitv                 sys_futex_waitv
 450  common    set_mempolicy_home_node sys_set_mempolicy_home_node     sys_set_mempolicy_home_node
index 902e9ea..93d3b88 100644 (file)
@@ -11,6 +11,7 @@ int test__intel_pt_pkt_decoder(struct test_suite *test, int subtest);
 int test__intel_pt_hybrid_compat(struct test_suite *test, int subtest);
 int test__bp_modify(struct test_suite *test, int subtest);
 int test__x86_sample_parsing(struct test_suite *test, int subtest);
+int test__amd_ibs_via_core_pmu(struct test_suite *test, int subtest);
 
 extern struct test_suite *arch_tests[];
 
index 6f4e863..fd02d81 100644 (file)
@@ -5,3 +5,4 @@ perf-y += arch-tests.o
 perf-y += sample-parsing.o
 perf-$(CONFIG_AUXTRACE) += insn-x86.o intel-pt-test.o
 perf-$(CONFIG_X86_64) += bp-modify.o
+perf-y += amd-ibs-via-core-pmu.o
diff --git a/tools/perf/arch/x86/tests/amd-ibs-via-core-pmu.c b/tools/perf/arch/x86/tests/amd-ibs-via-core-pmu.c
new file mode 100644 (file)
index 0000000..2902798
--- /dev/null
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "arch-tests.h"
+#include "linux/perf_event.h"
+#include "tests/tests.h"
+#include "pmu.h"
+#include "pmus.h"
+#include "../perf-sys.h"
+#include "debug.h"
+
+#define NR_SUB_TESTS 5
+
+static struct sub_tests {
+       int type;
+       unsigned long config;
+       bool valid;
+} sub_tests[NR_SUB_TESTS] = {
+       { PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES, true },
+       { PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS, false },
+       { PERF_TYPE_RAW, 0x076, true },
+       { PERF_TYPE_RAW, 0x0C1, true },
+       { PERF_TYPE_RAW, 0x012, false },
+};
+
+static int event_open(int type, unsigned long config)
+{
+       struct perf_event_attr attr;
+
+       memset(&attr, 0, sizeof(struct perf_event_attr));
+       attr.type = type;
+       attr.size = sizeof(struct perf_event_attr);
+       attr.config = config;
+       attr.disabled = 1;
+       attr.precise_ip = 1;
+       attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID;
+       attr.sample_period = 100000;
+
+       return sys_perf_event_open(&attr, -1, 0, -1, 0);
+}
+
+int test__amd_ibs_via_core_pmu(struct test_suite *test __maybe_unused,
+                              int subtest __maybe_unused)
+{
+       struct perf_pmu *ibs_pmu;
+       int ret = TEST_OK;
+       int fd, i;
+
+       if (list_empty(&pmus))
+               perf_pmu__scan(NULL);
+
+       ibs_pmu = perf_pmu__find("ibs_op");
+       if (!ibs_pmu)
+               return TEST_SKIP;
+
+       for (i = 0; i < NR_SUB_TESTS; i++) {
+               fd = event_open(sub_tests[i].type, sub_tests[i].config);
+               pr_debug("type: 0x%x, config: 0x%lx, fd: %d  -  ", sub_tests[i].type,
+                        sub_tests[i].config, fd);
+               if ((sub_tests[i].valid && fd == -1) ||
+                   (!sub_tests[i].valid && fd > 0)) {
+                       pr_debug("Fail\n");
+                       ret = TEST_FAIL;
+               } else {
+                       pr_debug("Pass\n");
+               }
+
+               if (fd > 0)
+                       close(fd);
+       }
+
+       return ret;
+}
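
Editor's note: the new test exercises precise (precise_ip = 1) sampling through the core PMU, which on AMD is backed by IBS and therefore only supports a subset of event encodings; per the sub_tests table, cycles and the raw events 0x076/0x0C1 are expected to open while instructions and raw 0x012 are expected to fail. A minimal standalone version of the same probe using raw perf_event_open(2); the event choice and error handling are illustrative, and opening CPU 0 system-wide needs the usual perf permissions:

/* sketch: open one precise cycles event on CPU 0, any task, and report
 * whether the kernel accepted it.  Mirrors event_open() in the test above. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

int main(void)
{
	struct perf_event_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_HARDWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.disabled = 1;
	attr.precise_ip = 1;		/* ask for precise, IBS-backed sampling */
	attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID;
	attr.sample_period = 100000;

	fd = syscall(SYS_perf_event_open, &attr, -1, 0, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}
	printf("precise cycles event opened, fd=%d\n", fd);
	close(fd);
	return 0;
}

The suite itself is registered in arch_tests[] in the hunks that follow and runs as part of 'perf test' on AMD systems that expose the ibs_op PMU.
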
index aae6ea0..b5c85ab 100644 (file)
@@ -22,6 +22,7 @@ struct test_suite suite__intel_pt = {
 DEFINE_SUITE("x86 bp modify", bp_modify);
 #endif
 DEFINE_SUITE("x86 Sample parsing", x86_sample_parsing);
+DEFINE_SUITE("AMD IBS via core pmu", amd_ibs_via_core_pmu);
 
 struct test_suite *arch_tests[] = {
 #ifdef HAVE_DWARF_UNWIND_SUPPORT
@@ -35,5 +36,6 @@ struct test_suite *arch_tests[] = {
        &suite__bp_modify,
 #endif
        &suite__x86_sample_parsing,
+       &suite__amd_ibs_via_core_pmu,
        NULL,
 };
index 50ae8bd..6188e19 100644 (file)
@@ -7,7 +7,3 @@ MEMCPY_FN(memcpy_orig,
 MEMCPY_FN(__memcpy,
        "x86-64-movsq",
        "movsq-based memcpy() in arch/x86/lib/memcpy_64.S")
-
-MEMCPY_FN(memcpy_erms,
-       "x86-64-movsb",
-       "movsb-based memcpy() in arch/x86/lib/memcpy_64.S")
index 6eb45a2..1b9fef7 100644 (file)
@@ -2,7 +2,7 @@
 
 /* Various wrappers to make the kernel .S file build in user-space: */
 
-// memcpy_orig and memcpy_erms are being defined as SYM_L_LOCAL but we need it
+// memcpy_orig is being defined as SYM_L_LOCAL but we need it
 #define SYM_FUNC_START_LOCAL(name)                      \
         SYM_START(name, SYM_L_GLOBAL, SYM_A_ALIGN)
 #define memcpy MEMCPY /* don't hide glibc's memcpy() */
index dac6d2b..247c72f 100644 (file)
@@ -7,7 +7,3 @@ MEMSET_FN(memset_orig,
 MEMSET_FN(__memset,
        "x86-64-stosq",
        "movsq-based memset() in arch/x86/lib/memset_64.S")
-
-MEMSET_FN(memset_erms,
-       "x86-64-stosb",
-       "movsb-based memset() in arch/x86/lib/memset_64.S")
index 6f093c4..abd26c9 100644 (file)
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-// memset_orig and memset_erms are being defined as SYM_L_LOCAL but we need it
+// memset_orig is being defined as SYM_L_LOCAL but we need it
 #define SYM_FUNC_START_LOCAL(name)                      \
         SYM_START(name, SYM_L_GLOBAL, SYM_A_ALIGN)
 #define memset MEMSET /* don't hide glibc's memset() */
index 810e337..f9906f5 100644 (file)
@@ -1175,7 +1175,7 @@ int cmd_ftrace(int argc, const char **argv)
        OPT_BOOLEAN('b', "use-bpf", &ftrace.target.use_bpf,
                    "Use BPF to measure function latency"),
 #endif
-       OPT_BOOLEAN('n', "--use-nsec", &ftrace.use_nsec,
+       OPT_BOOLEAN('n', "use-nsec", &ftrace.use_nsec,
                    "Use nano-second histogram"),
        OPT_PARENT(common_options),
        };
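
Editor's note: the fix above removes the stray "--" from the long-option name; option tables store bare names and the parser adds the dashes when matching, so with "--use-nsec" in the table the flag could not be matched as --use-nsec on the command line. Plain getopt_long(3) follows the same convention; a small sketch with an illustrative program and option name:

/* sketch: long option names are declared without leading dashes;
 * getopt_long() supplies the "--" when matching argv. */
#include <stdio.h>
#include <getopt.h>

int main(int argc, char **argv)
{
	int use_nsec = 0;
	static const struct option opts[] = {
		{ "use-nsec", no_argument, NULL, 'n' },	/* not "--use-nsec" */
		{ NULL, 0, NULL, 0 },
	};
	int c;

	while ((c = getopt_long(argc, argv, "n", opts, NULL)) != -1) {
		if (c == 'n')
			use_nsec = 1;
	}
	printf("use_nsec=%d\n", use_nsec);
	return 0;
}
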
index 006f522..c57be48 100644 (file)
@@ -3647,6 +3647,13 @@ static int process_stat_config_event(struct perf_session *session __maybe_unused
                                     union perf_event *event)
 {
        perf_event__read_stat_config(&stat_config, &event->stat_config);
+
+       /*
+        * Aggregation modes are not used since post-processing scripts are
+        * supposed to take care of such requirements
+        */
+       stat_config.aggr_mode = AGGR_NONE;
+
        return 0;
 }
 
index cc9fa48..b9ad32f 100644 (file)
@@ -667,6 +667,13 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
                        evsel_list->core.threads->err_thread = -1;
                        return COUNTER_RETRY;
                }
+       } else if (counter->skippable) {
+               if (verbose > 0)
+                       ui__warning("skipping event %s that kernel failed to open.\n",
+                                   evsel__name(counter));
+               counter->supported = false;
+               counter->errored = true;
+               return COUNTER_SKIP;
        }
 
        evsel__open_strerror(counter, &target, errno, msg, sizeof(msg));
@@ -1890,15 +1897,28 @@ static int add_default_attributes(void)
                 * caused by exposing latent bugs. This is fixed properly in:
                 * https://lore.kernel.org/lkml/bff481ba-e60a-763f-0aa0-3ee53302c480@linux.intel.com/
                 */
-               if (metricgroup__has_metric("TopdownL1") && !perf_pmu__has_hybrid() &&
-                   metricgroup__parse_groups(evsel_list, "TopdownL1",
-                                           /*metric_no_group=*/false,
-                                           /*metric_no_merge=*/false,
-                                           /*metric_no_threshold=*/true,
-                                           stat_config.user_requested_cpu_list,
-                                           stat_config.system_wide,
-                                           &stat_config.metric_events) < 0)
-                       return -1;
+               if (metricgroup__has_metric("TopdownL1") && !perf_pmu__has_hybrid()) {
+                       struct evlist *metric_evlist = evlist__new();
+                       struct evsel *metric_evsel;
+
+                       if (!metric_evlist)
+                               return -1;
+
+                       if (metricgroup__parse_groups(metric_evlist, "TopdownL1",
+                                                       /*metric_no_group=*/false,
+                                                       /*metric_no_merge=*/false,
+                                                       /*metric_no_threshold=*/true,
+                                                       stat_config.user_requested_cpu_list,
+                                                       stat_config.system_wide,
+                                                       &stat_config.metric_events) < 0)
+                               return -1;
+
+                       evlist__for_each_entry(metric_evlist, metric_evsel) {
+                               metric_evsel->skippable = true;
+                       }
+                       evlist__splice_list_tail(evsel_list, &metric_evlist->core.entries);
+                       evlist__delete(metric_evlist);
+               }
 
                /* Platform specific attrs */
                if (evlist__add_default_attrs(evsel_list, default_null_attrs) < 0)
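
Editor's note: the two hunks above work together. TopdownL1 metric events are now parsed into a temporary evlist, every evsel in it is flagged skippable, and the list is spliced onto the main evsel_list; stat_handle_error() then treats an open failure on a skippable counter as something to warn about and drop rather than a fatal error. A compact sketch of that error-handling policy; the struct and the sample counters are illustrative, not perf's internal types:

/* sketch: decide what to do when opening a counter fails, tolerating
 * failures only for counters marked skippable. */
#include <stdio.h>
#include <stdbool.h>

enum counter_recovery { COUNTER_FATAL, COUNTER_SKIP, COUNTER_RETRY };

struct counter {
	const char *name;
	bool skippable;
	bool supported;
};

static enum counter_recovery handle_open_error(struct counter *c)
{
	if (c->skippable) {
		fprintf(stderr, "skipping event %s that kernel failed to open\n",
			c->name);
		c->supported = false;
		return COUNTER_SKIP;
	}
	return COUNTER_FATAL;
}

int main(void)
{
	struct counter topdown = { "tma_retiring", true, true };
	struct counter cycles  = { "cycles", false, true };

	printf("topdown -> %d, cycles -> %d\n",
	       handle_open_error(&topdown), handle_open_error(&cycles));
	return 0;
}
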
index 75d80e7..1f90475 100644 (file)
         "MetricGroup": "TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.1",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "Counts the total number of issue slots  that were not consumed by the backend due to backend stalls.  Note that uops must be available for consumption in order for this event to count.  If a uop is not available (IQ is empty), this event will not count.   The rest of these subevents count backend stalls, in cycles, due to an outstanding request which is memory bound vs core bound.   The subevents are not slot based events and therefore can not be precisely added or subtracted from the Backend_Bound_Aux subevents which are slot based.",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
         "MetricGroup": "TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound_aux",
         "MetricThreshold": "tma_backend_bound_aux > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "Counts the total number of issue slots  that were not consumed by the backend due to backend stalls.  Note that UOPS must be available for consumption in order for this event to count.  If a uop is not available (IQ is empty), this event will not count.  All of these subevents count backend stalls, in slots, due to a resource limitation.   These are not cycle based events and therefore can not be precisely added or subtracted from the Backend_Bound subevents which are cycle based.  These subevents are supplementary to Backend_Bound and can be used to analyze results from a resource perspective at allocation.",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
         "MetricGroup": "TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear. Only issue slots wasted due to fast nukes such as memory ordering nukes are counted. Other nukes are not accounted for. Counts all issue slots blocked during this recovery window including relevant microcode flows and while uops are not yet available in the instruction queue (IQ). Also includes the issue slots that were consumed by the backend but were thrown away because they were younger than the mispredict or machine clear.",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
         "MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_base",
         "MetricThreshold": "tma_base > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
     },
         "MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.05",
+        "MetricgroupNoGroup": "TopdownL2",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
     },
         "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
     },
         "MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
     },
         "MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
     },
         "MetricGroup": "TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
     },
         "MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.05",
+        "MetricgroupNoGroup": "TopdownL2",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
     },
         "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
     },
         "MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_ms_uops",
         "MetricThreshold": "tma_ms_uops > 0.05",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "Counts the number of uops that are from the complex flows issued by the micro-sequencer (MS).  This includes uops from flows due to complex instructions, faults, assists, and inserted flows.",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
         "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_aux_group",
         "MetricName": "tma_resource_bound",
         "MetricThreshold": "tma_resource_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "Counts the total number of issue slots  that were not consumed by the backend due to backend stalls.  Note that uops must be available for consumption in order for this event to count.  If a uop is not available (IQ is empty), this event will not count.",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
         "MetricGroup": "TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.75",
+        "MetricgroupNoGroup": "TopdownL1",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
         "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction.  These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: TOPDOWN.BR_MISPREDICT_SLOTS. Related metrics: tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredicts_resteers",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
         "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
         "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 6 > 0.35",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
         "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
         "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_heavy_operations",
         "MetricThreshold": "tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences. Sample with: UOPS_RETIRED.HEAVY",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_light_operations",
         "MetricThreshold": "tma_light_operations > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
         "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears.  These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
         "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category.  Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved.  Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance.  For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
index 1a85d93..0402adb 100644 (file)
@@ -98,6 +98,7 @@
         "MetricGroup": "TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.1",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "Counts the total number of issue slots  that were not consumed by the backend due to backend stalls.  Note that uops must be available for consumption in order for this event to count.  If a uop is not available (IQ is empty), this event will not count.   The rest of these subevents count backend stalls, in cycles, due to an outstanding request which is memory bound vs core bound.   The subevents are not slot based events and therefore can not be precisely added or subtracted from the Backend_Bound_Aux subevents which are slot based.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound_aux",
         "MetricThreshold": "tma_backend_bound_aux > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "Counts the total number of issue slots  that were not consumed by the backend due to backend stalls.  Note that UOPS must be available for consumption in order for this event to count.  If a uop is not available (IQ is empty), this event will not count.  All of these subevents count backend stalls, in slots, due to a resource limitation.   These are not cycle based events and therefore can not be precisely added or subtracted from the Backend_Bound subevents which are cycle based.  These subevents are supplementary to Backend_Bound and can be used to analyze results from a resource perspective at allocation.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear. Only issue slots wasted due to fast nukes such as memory ordering nukes are counted. Other nukes are not accounted for. Counts all issue slots blocked during this recovery window including relevant microcode flows and while uops are not yet available in the instruction queue (IQ). Also includes the issue slots that were consumed by the backend but were thrown away because they were younger than the mispredict or machine clear.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_base",
         "MetricThreshold": "tma_base > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
         "ScaleUnit": "100%"
     },
     {
         "MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.05",
+        "MetricgroupNoGroup": "TopdownL2",
         "ScaleUnit": "100%"
     },
     {
         "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "ScaleUnit": "100%"
     },
     {
         "MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "ScaleUnit": "100%"
     },
     {
         "MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "ScaleUnit": "100%"
     },
     {
         "MetricGroup": "TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "ScaleUnit": "100%"
     },
     {
         "MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.05",
+        "MetricgroupNoGroup": "TopdownL2",
         "ScaleUnit": "100%"
     },
     {
         "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "ScaleUnit": "100%"
     },
     {
         "MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_ms_uops",
         "MetricThreshold": "tma_ms_uops > 0.05",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "Counts the number of uops that are from the complex flows issued by the micro-sequencer (MS).  This includes uops from flows due to complex instructions, faults, assists, and inserted flows.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_aux_group",
         "MetricName": "tma_resource_bound",
         "MetricThreshold": "tma_resource_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "Counts the total number of issue slots  that were not consumed by the backend due to backend stalls.  Note that uops must be available for consumption in order for this event to count.  If a uop is not available (IQ is empty), this event will not count.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.75",
+        "MetricgroupNoGroup": "TopdownL1",
         "ScaleUnit": "100%"
     },
     {
index 51cf856..f9e2316 100644 (file)
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction.  These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_mispredicts_resteers",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_lcp",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_heavy_operations",
         "MetricThreshold": "tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_light_operations",
         "MetricThreshold": "tma_light_operations > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears.  These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category.  Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved.  Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance.  For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
         "ScaleUnit": "100%"
     },
index fb57c73..e9c46d3 100644 (file)
@@ -97,6 +97,7 @@
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction.  These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: TOPDOWN.BR_MISPREDICT_SLOTS. Related metrics: tma_info_branch_misprediction_cost, tma_mispredicts_resteers",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_lcp",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_heavy_operations",
         "MetricThreshold": "tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_light_operations",
         "MetricThreshold": "tma_light_operations > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears.  These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category.  Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved.  Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance.  For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
         "ScaleUnit": "100%"
     },
index 65ec0c9..437b986 100644 (file)
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction.  These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_mispredicts_resteers",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_lcp",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_heavy_operations",
         "MetricThreshold": "tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_light_operations",
         "MetricThreshold": "tma_light_operations > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears.  These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category.  Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved.  Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance.  For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
         "ScaleUnit": "100%"
     },
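
Each hunk above adds the same new "MetricgroupNoGroup" key alongside the existing "MetricGroup", "MetricName" and "MetricThreshold" fields of a Topdown metric entry. A minimal Python sketch of how such a change could be sanity-checked outside of perf (the file name and helper name are illustrative assumptions, not part of this diff; the only assumption about the format is that each metrics file is a JSON array of objects with the keys shown above):

import json

# Illustrative file name only; the patched metric files live under
# tools/perf/pmu-events/arch/x86/<model>/ in the kernel source tree.
METRICS_FILE = "some-model-metrics.json"

def topdown_entries_missing_key(path):
    """Return the names of TopdownL1/L2 metrics that lack MetricgroupNoGroup."""
    with open(path) as f:
        entries = json.load(f)
    missing = []
    for entry in entries:
        groups = entry.get("MetricGroup", "").split(";")
        if ("TopdownL1" in groups or "TopdownL2" in groups) and "MetricgroupNoGroup" not in entry:
            missing.append(entry.get("MetricName", "<unnamed>"))
    return missing

print(topdown_entries_missing_key(METRICS_FILE))
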
index 8f7dc72..875c766 100644 (file)
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction.  These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredicts_resteers",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_heavy_operations",
         "MetricThreshold": "tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_light_operations",
         "MetricThreshold": "tma_light_operations > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears.  These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category.  Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved.  Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance.  For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
         "ScaleUnit": "100%"
     },
index 2528418..9570a88 100644 (file)
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction.  These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_mispredicts_resteers",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_lcp",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_heavy_operations",
         "MetricThreshold": "tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_light_operations",
         "MetricThreshold": "tma_light_operations > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears.  These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category.  Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved.  Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance.  For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
         "ScaleUnit": "100%"
     },
index 11f152c..a522202 100644 (file)
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction.  These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_mispredicts_resteers",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_lcp",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_heavy_operations",
         "MetricThreshold": "tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_light_operations",
         "MetricThreshold": "tma_light_operations > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears.  These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category.  Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved.  Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance.  For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
         "ScaleUnit": "100%"
     },
index f45ae34..1a2154f 100644 (file)
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction.  These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredicts_resteers",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 5 > 0.35",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_heavy_operations",
         "MetricThreshold": "tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_light_operations",
         "MetricThreshold": "tma_light_operations > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears.  These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category.  Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved.  Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance.  For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
         "ScaleUnit": "100%"
     },
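
The "MetricThreshold" strings are small expressions over other metrics, with '&' and '|' reading as logical AND/OR, so a metric such as tma_fetch_bandwidth is only flagged when tma_frontend_bound and the IPC-derived term also cross their limits. A rough, self-contained illustration with made-up sample values (the rewrite of '&'/'|' into Python operators is an assumption about the expression grammar; perf evaluates these expressions with its own parser):

# Made-up sample values for the metrics referenced by one threshold above.
values = {"tma_fetch_bandwidth": 0.22, "tma_frontend_bound": 0.30, "tma_info_ipc": 2.1}

threshold = "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 5 > 0.35"

# Map the expression operators onto Python's boolean operators and evaluate.
expr = threshold.replace("&", " and ").replace("|", " or ")
print(eval(expr, {"__builtins__": {}}, values))  # True: all three conditions hold
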
index 0f9b174..1ef772b 100644 (file)
@@ -80,6 +80,7 @@
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
         "ScaleUnit": "100%"
     },
@@ -89,6 +90,7 @@
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction.  These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredicts_resteers",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 5 > 0.35",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_heavy_operations",
         "MetricThreshold": "tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_light_operations",
         "MetricThreshold": "tma_light_operations > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears.  These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category.  Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved.  Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance.  For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
         "ScaleUnit": "100%"
     },
index 5247f69..11080cc 100644 (file)
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction.  These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_mispredicts_resteers",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_lcp",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_heavy_operations",
         "MetricThreshold": "tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_light_operations",
         "MetricThreshold": "tma_light_operations > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears.  These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category.  Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved.  Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance.  For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
         "ScaleUnit": "100%"
     },
index 89469b1..65a46d6 100644
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction.  These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_mispredicts_resteers",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_lcp",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_heavy_operations",
         "MetricThreshold": "tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_light_operations",
         "MetricThreshold": "tma_light_operations > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears.  These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category.  Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved.  Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance.  For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
         "ScaleUnit": "100%"
     },
index e8f4e5c..66a6f65 100644
@@ -76,6 +76,7 @@
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
         "ScaleUnit": "100%"
     },
@@ -85,6 +86,7 @@
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
@@ -95,6 +97,7 @@
         "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction.  These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_mispredicts_resteers",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_lcp",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_heavy_operations",
         "MetricThreshold": "tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_light_operations",
         "MetricThreshold": "tma_light_operations > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears.  These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category.  Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved.  Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance.  For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
         "ScaleUnit": "100%"
     },
index 4a99fe5..4b8bc19 100644
@@ -76,6 +76,7 @@
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
         "ScaleUnit": "100%"
     },
@@ -85,6 +86,7 @@
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
@@ -95,6 +97,7 @@
         "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction.  These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_mispredicts_resteers",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_lcp",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_heavy_operations",
         "MetricThreshold": "tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_light_operations",
         "MetricThreshold": "tma_light_operations > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears.  These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category.  Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved.  Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance.  For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
         "ScaleUnit": "100%"
     },
index 126300b..620fc5b 100644
@@ -87,6 +87,7 @@
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
         "ScaleUnit": "100%"
     },
@@ -96,6 +97,7 @@
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction.  These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: TOPDOWN.BR_MISPREDICT_SLOTS. Related metrics: tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredicts_resteers",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 6 > 0.35",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_heavy_operations",
         "MetricThreshold": "tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences. Sample with: UOPS_RETIRED.HEAVY",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_light_operations",
         "MetricThreshold": "tma_light_operations > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears.  These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category.  Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved.  Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance.  For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
         "ScaleUnit": "100%"
     },
index a6d212b..21ef6c9 100644
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction.  These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredicts_resteers",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_heavy_operations",
         "MetricThreshold": "tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_light_operations",
         "MetricThreshold": "tma_light_operations > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears.  These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category.  Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved.  Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance.  For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
         "ScaleUnit": "100%"
     },
index fa2f7f1..eb6f12c 100644 (file)
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction.  These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredicts_resteers",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_heavy_operations",
         "MetricThreshold": "tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_light_operations",
         "MetricThreshold": "tma_light_operations > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears.  These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category.  Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved.  Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance.  For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
         "ScaleUnit": "100%"
     },
index 4c80d6b..b442ed4 100644 (file)
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction.  These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredicts_resteers",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_core_bound",
         "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 5 > 0.35",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_heavy_operations",
         "MetricThreshold": "tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_light_operations",
         "MetricThreshold": "tma_light_operations > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears.  These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
         "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
         "ScaleUnit": "100%"
     },
         "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL1",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category.  Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved.  Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance.  For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
         "ScaleUnit": "100%"
     },
index ca99b9c..f57a8f2 100755 (executable)
@@ -52,7 +52,8 @@ _json_event_attributes = [
 # Attributes that are in pmu_metric rather than pmu_event.
 _json_metric_attributes = [
     'metric_name', 'metric_group', 'metric_expr', 'metric_threshold', 'desc',
-    'long_desc', 'unit', 'compat', 'aggr_mode', 'event_grouping'
+    'long_desc', 'unit', 'compat', 'metricgroup_no_group', 'aggr_mode',
+    'event_grouping'
 ]
 # Attributes that are bools or enum int values, encoded as '0', '1',...
 _json_enum_attributes = ['aggr_mode', 'deprecated', 'event_grouping', 'perpkg']
@@ -303,6 +304,7 @@ class JsonEvent:
     self.deprecated = jd.get('Deprecated')
     self.metric_name = jd.get('MetricName')
     self.metric_group = jd.get('MetricGroup')
+    self.metricgroup_no_group = jd.get('MetricgroupNoGroup')
     self.event_grouping = convert_metric_constraint(jd.get('MetricConstraint'))
     self.metric_expr = None
     if 'MetricExpr' in jd:
index b7dff8f..8034968 100644 (file)
@@ -59,6 +59,7 @@ struct pmu_metric {
        const char *compat;
        const char *desc;
        const char *long_desc;
+       const char *metricgroup_no_group;
        enum aggr_mode_class aggr_mode;
        enum metric_event_groups event_grouping;
 };
index ccfef86..e890c26 100644 (file)
@@ -152,7 +152,7 @@ def parse_version(version):
 #   - expected values assignments
 class Test(object):
     def __init__(self, path, options):
-        parser = configparser.SafeConfigParser()
+        parser = configparser.ConfigParser()
         parser.read(path)
 
         log.warning("running '%s'" % path)
@@ -247,7 +247,7 @@ class Test(object):
         return True
 
     def load_events(self, path, events):
-        parser_event = configparser.SafeConfigParser()
+        parser_event = configparser.ConfigParser()
         parser_event.read(path)
 
         # The event record section header contains 'event' word,
@@ -261,7 +261,7 @@ class Test(object):
             # Read parent event if there's any
             if (':' in section):
                 base = section[section.index(':') + 1:]
-                parser_base = configparser.SafeConfigParser()
+                parser_base = configparser.ConfigParser()
                 parser_base.read(self.test_dir + '/' + base)
                 base_items = parser_base.items('event')
 
index a21fb65..fccd8ec 100644 (file)
@@ -16,7 +16,7 @@ pinned=0
 exclusive=0
 exclude_user=0
 exclude_kernel=0|1
-exclude_hv=0
+exclude_hv=0|1
 exclude_idle=0
 mmap=0
 comm=0
index d8ea6a8..a1e2da0 100644 (file)
@@ -40,7 +40,6 @@ fd=6
 type=0
 config=7
 optional=1
-
 # PERF_TYPE_HARDWARE / PERF_COUNT_HW_STALLED_CYCLES_BACKEND
 [event7:base-stat]
 fd=7
@@ -89,79 +88,98 @@ enable_on_exec=0
 read_format=15
 optional=1
 
-# PERF_TYPE_RAW / topdown-bad-spec (0x8100)
+# PERF_TYPE_RAW / topdown-fe-bound (0x8200)
 [event13:base-stat]
 fd=13
 group_fd=11
 type=4
-config=33024
+config=33280
 disabled=0
 enable_on_exec=0
 read_format=15
 optional=1
 
-# PERF_TYPE_RAW / topdown-fe-bound (0x8200)
+# PERF_TYPE_RAW / topdown-be-bound (0x8300)
 [event14:base-stat]
 fd=14
 group_fd=11
 type=4
-config=33280
+config=33536
 disabled=0
 enable_on_exec=0
 read_format=15
 optional=1
 
-# PERF_TYPE_RAW / topdown-be-bound (0x8300)
+# PERF_TYPE_RAW / topdown-bad-spec (0x8100)
 [event15:base-stat]
 fd=15
 group_fd=11
 type=4
-config=33536
+config=33024
 disabled=0
 enable_on_exec=0
 read_format=15
 optional=1
 
-# PERF_TYPE_RAW / topdown-heavy-ops (0x8400)
+# PERF_TYPE_RAW / INT_MISC.UOP_DROPPING
 [event16:base-stat]
 fd=16
-group_fd=11
 type=4
-config=33792
-disabled=0
-enable_on_exec=0
-read_format=15
+config=4109
 optional=1
 
-# PERF_TYPE_RAW / topdown-br-mispredict (0x8500)
+# PERF_TYPE_RAW / cpu/INT_MISC.RECOVERY_CYCLES,cmask=1,edge/
 [event17:base-stat]
 fd=17
-group_fd=11
 type=4
-config=34048
-disabled=0
-enable_on_exec=0
-read_format=15
+config=17039629
 optional=1
 
-# PERF_TYPE_RAW / topdown-fetch-lat (0x8600)
+# PERF_TYPE_RAW / CPU_CLK_UNHALTED.THREAD
 [event18:base-stat]
 fd=18
-group_fd=11
 type=4
-config=34304
-disabled=0
-enable_on_exec=0
-read_format=15
+config=60
 optional=1
 
-# PERF_TYPE_RAW / topdown-mem-bound (0x8700)
+# PERF_TYPE_RAW / INT_MISC.RECOVERY_CYCLES_ANY
 [event19:base-stat]
 fd=19
-group_fd=11
 type=4
-config=34560
-disabled=0
-enable_on_exec=0
-read_format=15
+config=2097421
+optional=1
+
+# PERF_TYPE_RAW / CPU_CLK_UNHALTED.REF_XCLK
+[event20:base-stat]
+fd=20
+type=4
+config=316
+optional=1
+
+# PERF_TYPE_RAW / IDQ_UOPS_NOT_DELIVERED.CORE
+[event21:base-stat]
+fd=21
+type=4
+config=412
+optional=1
+
+# PERF_TYPE_RAW / CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE
+[event22:base-stat]
+fd=22
+type=4
+config=572
+optional=1
+
+# PERF_TYPE_RAW / UOPS_RETIRED.RETIRE_SLOTS
+[event23:base-stat]
+fd=23
+type=4
+config=706
+optional=1
+
+# PERF_TYPE_RAW / UOPS_ISSUED.ANY
+[event24:base-stat]
+fd=24
+type=4
+config=270
 optional=1
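
A quick aside on reading these expected-attr fixtures: the decimal "config" values are simply the raw x86 event encodings named in the comments. A small, self-contained check of the values used above (the umask/event split shown for INT_MISC.UOP_DROPPING is an assumption based on the usual raw-event layout, not something stated in this diff):

    #include <stdio.h>

    int main(void)
    {
            printf("%d\n", 0x8100);  /* 33024: topdown-bad-spec                */
            printf("%d\n", 0x8200);  /* 33280: topdown-fe-bound                */
            printf("%d\n", 0x8300);  /* 33536: topdown-be-bound                */
            printf("%d\n", 0x100d);  /* 4109:  INT_MISC.UOP_DROPPING           */
            return 0;                /*        (umask 0x10 << 8 | event 0x0d)  */
    }
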
index b656ab9..1c52cb0 100644 (file)
@@ -90,89 +90,108 @@ enable_on_exec=0
 read_format=15
 optional=1
 
-# PERF_TYPE_RAW / topdown-bad-spec (0x8100)
+# PERF_TYPE_RAW / topdown-fe-bound (0x8200)
 [event13:base-stat]
 fd=13
 group_fd=11
 type=4
-config=33024
+config=33280
 disabled=0
 enable_on_exec=0
 read_format=15
 optional=1
 
-# PERF_TYPE_RAW / topdown-fe-bound (0x8200)
+# PERF_TYPE_RAW / topdown-be-bound (0x8300)
 [event14:base-stat]
 fd=14
 group_fd=11
 type=4
-config=33280
+config=33536
 disabled=0
 enable_on_exec=0
 read_format=15
 optional=1
 
-# PERF_TYPE_RAW / topdown-be-bound (0x8300)
+# PERF_TYPE_RAW / topdown-bad-spec (0x8100)
 [event15:base-stat]
 fd=15
 group_fd=11
 type=4
-config=33536
+config=33024
 disabled=0
 enable_on_exec=0
 read_format=15
 optional=1
 
-# PERF_TYPE_RAW / topdown-heavy-ops (0x8400)
+# PERF_TYPE_RAW / INT_MISC.UOP_DROPPING
 [event16:base-stat]
 fd=16
-group_fd=11
 type=4
-config=33792
-disabled=0
-enable_on_exec=0
-read_format=15
+config=4109
 optional=1
 
-# PERF_TYPE_RAW / topdown-br-mispredict (0x8500)
+# PERF_TYPE_RAW / cpu/INT_MISC.RECOVERY_CYCLES,cmask=1,edge/
 [event17:base-stat]
 fd=17
-group_fd=11
 type=4
-config=34048
-disabled=0
-enable_on_exec=0
-read_format=15
+config=17039629
 optional=1
 
-# PERF_TYPE_RAW / topdown-fetch-lat (0x8600)
+# PERF_TYPE_RAW / CPU_CLK_UNHALTED.THREAD
 [event18:base-stat]
 fd=18
-group_fd=11
 type=4
-config=34304
-disabled=0
-enable_on_exec=0
-read_format=15
+config=60
 optional=1
 
-# PERF_TYPE_RAW / topdown-mem-bound (0x8700)
+# PERF_TYPE_RAW / INT_MISC.RECOVERY_CYCLES_ANY
 [event19:base-stat]
 fd=19
-group_fd=11
 type=4
-config=34560
-disabled=0
-enable_on_exec=0
-read_format=15
+config=2097421
+optional=1
+
+# PERF_TYPE_RAW / CPU_CLK_UNHALTED.REF_XCLK
+[event20:base-stat]
+fd=20
+type=4
+config=316
+optional=1
+
+# PERF_TYPE_RAW / IDQ_UOPS_NOT_DELIVERED.CORE
+[event21:base-stat]
+fd=21
+type=4
+config=412
+optional=1
+
+# PERF_TYPE_RAW / CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE
+[event22:base-stat]
+fd=22
+type=4
+config=572
+optional=1
+
+# PERF_TYPE_RAW / UOPS_RETIRED.RETIRE_SLOTS
+[event23:base-stat]
+fd=23
+type=4
+config=706
+optional=1
+
+# PERF_TYPE_RAW / UOPS_ISSUED.ANY
+[event24:base-stat]
+fd=24
+type=4
+config=270
 optional=1
 
 # PERF_TYPE_HW_CACHE /
 #  PERF_COUNT_HW_CACHE_L1D                <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_ACCESS      << 16)
-[event20:base-stat]
-fd=20
+[event25:base-stat]
+fd=25
 type=3
 config=0
 optional=1
@@ -181,8 +200,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_L1D                <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_MISS        << 16)
-[event21:base-stat]
-fd=21
+[event26:base-stat]
+fd=26
 type=3
 config=65536
 optional=1
@@ -191,8 +210,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_LL                 <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_ACCESS      << 16)
-[event22:base-stat]
-fd=22
+[event27:base-stat]
+fd=27
 type=3
 config=2
 optional=1
@@ -201,8 +220,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_LL                 <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_MISS        << 16)
-[event23:base-stat]
-fd=23
+[event28:base-stat]
+fd=28
 type=3
 config=65538
 optional=1
index 9762509..7e961d2 100644 (file)
@@ -90,89 +90,108 @@ enable_on_exec=0
 read_format=15
 optional=1
 
-# PERF_TYPE_RAW / topdown-bad-spec (0x8100)
+# PERF_TYPE_RAW / topdown-fe-bound (0x8200)
 [event13:base-stat]
 fd=13
 group_fd=11
 type=4
-config=33024
+config=33280
 disabled=0
 enable_on_exec=0
 read_format=15
 optional=1
 
-# PERF_TYPE_RAW / topdown-fe-bound (0x8200)
+# PERF_TYPE_RAW / topdown-be-bound (0x8300)
 [event14:base-stat]
 fd=14
 group_fd=11
 type=4
-config=33280
+config=33536
 disabled=0
 enable_on_exec=0
 read_format=15
 optional=1
 
-# PERF_TYPE_RAW / topdown-be-bound (0x8300)
+# PERF_TYPE_RAW / topdown-bad-spec (0x8100)
 [event15:base-stat]
 fd=15
 group_fd=11
 type=4
-config=33536
+config=33024
 disabled=0
 enable_on_exec=0
 read_format=15
 optional=1
 
-# PERF_TYPE_RAW / topdown-heavy-ops (0x8400)
+# PERF_TYPE_RAW / INT_MISC.UOP_DROPPING
 [event16:base-stat]
 fd=16
-group_fd=11
 type=4
-config=33792
-disabled=0
-enable_on_exec=0
-read_format=15
+config=4109
 optional=1
 
-# PERF_TYPE_RAW / topdown-br-mispredict (0x8500)
+# PERF_TYPE_RAW / cpu/INT_MISC.RECOVERY_CYCLES,cmask=1,edge/
 [event17:base-stat]
 fd=17
-group_fd=11
 type=4
-config=34048
-disabled=0
-enable_on_exec=0
-read_format=15
+config=17039629
 optional=1
 
-# PERF_TYPE_RAW / topdown-fetch-lat (0x8600)
+# PERF_TYPE_RAW / CPU_CLK_UNHALTED.THREAD
 [event18:base-stat]
 fd=18
-group_fd=11
 type=4
-config=34304
-disabled=0
-enable_on_exec=0
-read_format=15
+config=60
 optional=1
 
-# PERF_TYPE_RAW / topdown-mem-bound (0x8700)
+# PERF_TYPE_RAW / INT_MISC.RECOVERY_CYCLES_ANY
 [event19:base-stat]
 fd=19
-group_fd=11
 type=4
-config=34560
-disabled=0
-enable_on_exec=0
-read_format=15
+config=2097421
+optional=1
+
+# PERF_TYPE_RAW / CPU_CLK_UNHALTED.REF_XCLK
+[event20:base-stat]
+fd=20
+type=4
+config=316
+optional=1
+
+# PERF_TYPE_RAW / IDQ_UOPS_NOT_DELIVERED.CORE
+[event21:base-stat]
+fd=21
+type=4
+config=412
+optional=1
+
+# PERF_TYPE_RAW / CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE
+[event22:base-stat]
+fd=22
+type=4
+config=572
+optional=1
+
+# PERF_TYPE_RAW / UOPS_RETIRED.RETIRE_SLOTS
+[event23:base-stat]
+fd=23
+type=4
+config=706
+optional=1
+
+# PERF_TYPE_RAW / UOPS_ISSUED.ANY
+[event24:base-stat]
+fd=24
+type=4
+config=270
 optional=1
 
 # PERF_TYPE_HW_CACHE /
 #  PERF_COUNT_HW_CACHE_L1D                <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_ACCESS      << 16)
-[event20:base-stat]
-fd=20
+[event25:base-stat]
+fd=25
 type=3
 config=0
 optional=1
@@ -181,8 +200,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_L1D                <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_MISS        << 16)
-[event21:base-stat]
-fd=21
+[event26:base-stat]
+fd=26
 type=3
 config=65536
 optional=1
@@ -191,8 +210,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_LL                 <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_ACCESS      << 16)
-[event22:base-stat]
-fd=22
+[event27:base-stat]
+fd=27
 type=3
 config=2
 optional=1
@@ -201,8 +220,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_LL                 <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_MISS        << 16)
-[event23:base-stat]
-fd=23
+[event28:base-stat]
+fd=28
 type=3
 config=65538
 optional=1
@@ -211,8 +230,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_L1I                <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_ACCESS      << 16)
-[event24:base-stat]
-fd=24
+[event29:base-stat]
+fd=29
 type=3
 config=1
 optional=1
@@ -221,8 +240,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_L1I                <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_MISS        << 16)
-[event25:base-stat]
-fd=25
+[event30:base-stat]
+fd=30
 type=3
 config=65537
 optional=1
@@ -231,8 +250,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_DTLB               <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_ACCESS      << 16)
-[event26:base-stat]
-fd=26
+[event31:base-stat]
+fd=31
 type=3
 config=3
 optional=1
@@ -241,8 +260,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_DTLB               <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_MISS        << 16)
-[event27:base-stat]
-fd=27
+[event32:base-stat]
+fd=32
 type=3
 config=65539
 optional=1
@@ -251,8 +270,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_ITLB               <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_ACCESS      << 16)
-[event28:base-stat]
-fd=28
+[event33:base-stat]
+fd=33
 type=3
 config=4
 optional=1
@@ -261,8 +280,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_ITLB               <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_MISS        << 16)
-[event29:base-stat]
-fd=29
+[event34:base-stat]
+fd=34
 type=3
 config=65540
 optional=1
index d555042..e50535f 100644 (file)
@@ -90,89 +90,108 @@ enable_on_exec=0
 read_format=15
 optional=1
 
-# PERF_TYPE_RAW / topdown-bad-spec (0x8100)
+# PERF_TYPE_RAW / topdown-fe-bound (0x8200)
 [event13:base-stat]
 fd=13
 group_fd=11
 type=4
-config=33024
+config=33280
 disabled=0
 enable_on_exec=0
 read_format=15
 optional=1
 
-# PERF_TYPE_RAW / topdown-fe-bound (0x8200)
+# PERF_TYPE_RAW / topdown-be-bound (0x8300)
 [event14:base-stat]
 fd=14
 group_fd=11
 type=4
-config=33280
+config=33536
 disabled=0
 enable_on_exec=0
 read_format=15
 optional=1
 
-# PERF_TYPE_RAW / topdown-be-bound (0x8300)
+# PERF_TYPE_RAW / topdown-bad-spec (0x8100)
 [event15:base-stat]
 fd=15
 group_fd=11
 type=4
-config=33536
+config=33024
 disabled=0
 enable_on_exec=0
 read_format=15
 optional=1
 
-# PERF_TYPE_RAW / topdown-heavy-ops (0x8400)
+# PERF_TYPE_RAW / INT_MISC.UOP_DROPPING
 [event16:base-stat]
 fd=16
-group_fd=11
 type=4
-config=33792
-disabled=0
-enable_on_exec=0
-read_format=15
+config=4109
 optional=1
 
-# PERF_TYPE_RAW / topdown-br-mispredict (0x8500)
+# PERF_TYPE_RAW / cpu/INT_MISC.RECOVERY_CYCLES,cmask=1,edge/
 [event17:base-stat]
 fd=17
-group_fd=11
 type=4
-config=34048
-disabled=0
-enable_on_exec=0
-read_format=15
+config=17039629
 optional=1
 
-# PERF_TYPE_RAW / topdown-fetch-lat (0x8600)
+# PERF_TYPE_RAW / CPU_CLK_UNHALTED.THREAD
 [event18:base-stat]
 fd=18
-group_fd=11
 type=4
-config=34304
-disabled=0
-enable_on_exec=0
-read_format=15
+config=60
 optional=1
 
-# PERF_TYPE_RAW / topdown-mem-bound (0x8700)
+# PERF_TYPE_RAW / INT_MISC.RECOVERY_CYCLES_ANY
 [event19:base-stat]
 fd=19
-group_fd=11
 type=4
-config=34560
-disabled=0
-enable_on_exec=0
-read_format=15
+config=2097421
+optional=1
+
+# PERF_TYPE_RAW / CPU_CLK_UNHALTED.REF_XCLK
+[event20:base-stat]
+fd=20
+type=4
+config=316
+optional=1
+
+# PERF_TYPE_RAW / IDQ_UOPS_NOT_DELIVERED.CORE
+[event21:base-stat]
+fd=21
+type=4
+config=412
+optional=1
+
+# PERF_TYPE_RAW / CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE
+[event22:base-stat]
+fd=22
+type=4
+config=572
+optional=1
+
+# PERF_TYPE_RAW / UOPS_RETIRED.RETIRE_SLOTS
+[event23:base-stat]
+fd=23
+type=4
+config=706
+optional=1
+
+# PERF_TYPE_RAW / UOPS_ISSUED.ANY
+[event24:base-stat]
+fd=24
+type=4
+config=270
 optional=1
 
 # PERF_TYPE_HW_CACHE /
 #  PERF_COUNT_HW_CACHE_L1D                <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_ACCESS      << 16)
-[event20:base-stat]
-fd=20
+[event25:base-stat]
+fd=25
 type=3
 config=0
 optional=1
@@ -181,8 +200,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_L1D                <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_MISS        << 16)
-[event21:base-stat]
-fd=21
+[event26:base-stat]
+fd=26
 type=3
 config=65536
 optional=1
@@ -191,8 +210,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_LL                 <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_ACCESS      << 16)
-[event22:base-stat]
-fd=22
+[event27:base-stat]
+fd=27
 type=3
 config=2
 optional=1
@@ -201,8 +220,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_LL                 <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_MISS        << 16)
-[event23:base-stat]
-fd=23
+[event28:base-stat]
+fd=28
 type=3
 config=65538
 optional=1
@@ -211,8 +230,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_L1I                <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_ACCESS      << 16)
-[event24:base-stat]
-fd=24
+[event29:base-stat]
+fd=29
 type=3
 config=1
 optional=1
@@ -221,8 +240,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_L1I                <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_MISS        << 16)
-[event25:base-stat]
-fd=25
+[event30:base-stat]
+fd=30
 type=3
 config=65537
 optional=1
@@ -231,8 +250,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_DTLB               <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_ACCESS      << 16)
-[event26:base-stat]
-fd=26
+[event31:base-stat]
+fd=31
 type=3
 config=3
 optional=1
@@ -241,8 +260,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_DTLB               <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_MISS        << 16)
-[event27:base-stat]
-fd=27
+[event32:base-stat]
+fd=32
 type=3
 config=65539
 optional=1
@@ -251,8 +270,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_ITLB               <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_ACCESS      << 16)
-[event28:base-stat]
-fd=28
+[event33:base-stat]
+fd=33
 type=3
 config=4
 optional=1
@@ -261,8 +280,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_ITLB               <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_READ            <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_MISS        << 16)
-[event29:base-stat]
-fd=29
+[event34:base-stat]
+fd=34
 type=3
 config=65540
 optional=1
@@ -271,8 +290,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_L1D                <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_PREFETCH        <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_ACCESS      << 16)
-[event30:base-stat]
-fd=30
+[event35:base-stat]
+fd=35
 type=3
 config=512
 optional=1
@@ -281,8 +300,8 @@ optional=1
 #  PERF_COUNT_HW_CACHE_L1D                <<  0  |
 # (PERF_COUNT_HW_CACHE_OP_PREFETCH        <<  8) |
 # (PERF_COUNT_HW_CACHE_RESULT_MISS        << 16)
-[event31:base-stat]
-fd=31
+[event36:base-stat]
+fd=36
 type=3
 config=66048
 optional=1
index cbf0e0c..733ead1 100644 (file)
@@ -120,7 +120,8 @@ static int test__expr(struct test_suite *t __maybe_unused, int subtest __maybe_u
 
        p = "FOO/0";
        ret = expr__parse(&val, ctx, p);
-       TEST_ASSERT_VAL("division by zero", ret == -1);
+       TEST_ASSERT_VAL("division by zero", ret == 0);
+       TEST_ASSERT_VAL("division by zero", isnan(val));
 
        p = "BAR/";
        ret = expr__parse(&val, ctx, p);
index 1185b79..c05148e 100644 (file)
@@ -38,6 +38,7 @@ static void load_runtime_stat(struct evlist *evlist, struct value *vals)
        evlist__alloc_aggr_stats(evlist, 1);
        evlist__for_each_entry(evlist, evsel) {
                count = find_value(evsel->name, vals);
+               evsel->supported = true;
                evsel->stats->aggr->counts.val = count;
                if (evsel__name_is(evsel, "duration_time"))
                        update_stats(&walltime_nsecs_stats, count);
index 2c1d3f7..b154fbb 100755 (executable)
@@ -28,6 +28,18 @@ test_stat_record_report() {
   echo "stat record and report test [Success]"
 }
 
+test_stat_record_script() {
+  echo "stat record and script test"
+  if ! perf stat record -o - true | perf script -i - 2>&1 | \
+    grep -E -q "CPU[[:space:]]+THREAD[[:space:]]+VAL[[:space:]]+ENA[[:space:]]+RUN[[:space:]]+TIME[[:space:]]+EVENT"
+  then
+    echo "stat record and script test [Failed]"
+    err=1
+    return
+  fi
+  echo "stat record and script test [Success]"
+}
+
 test_stat_repeat_weak_groups() {
   echo "stat repeat weak groups test"
   if ! perf stat -e '{cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles}' \
@@ -93,6 +105,7 @@ test_topdown_weak_groups() {
 
 test_default_stat
 test_stat_record_report
+test_stat_record_script
 test_stat_repeat_weak_groups
 test_topdown_groups
 test_topdown_weak_groups
index 4ddb17c..3a8b9bf 100755 (executable)
@@ -506,6 +506,13 @@ test_sample()
                echo "perf record failed with --aux-sample"
                return 1
        fi
+       # Check with event with PMU name
+       if perf_record_no_decode -o "${perfdatafile}" -e br_misp_retired.all_branches:u uname ; then
+               if ! perf_record_no_decode -o "${perfdatafile}" -e '{intel_pt//,br_misp_retired.all_branches/aux-sample-size=8192/}:u' uname ; then
+                       echo "perf record failed with --aux-sample-size"
+                       return 1
+               fi
+       fi
        echo OK
        return 0
 }
index 90cea88..499539d 100755 (executable)
@@ -56,7 +56,7 @@ if [ $? -ne 0 ]; then
        exit 1
 fi
 
-if ! perf inject -i $PERF_DATA -o $PERF_INJ_DATA -j; then
+if ! DEBUGINFOD_URLS='' perf inject -i $PERF_DATA -o $PERF_INJ_DATA -j; then
        echo "Fail to inject samples"
        exit 1
 fi
index fe022ca..a211348 100644 (file)
 
 static DEFINE_STRARRAY_OFFSET(x86_arch_prctl_codes_1, "ARCH_", x86_arch_prctl_codes_1_offset);
 static DEFINE_STRARRAY_OFFSET(x86_arch_prctl_codes_2, "ARCH_", x86_arch_prctl_codes_2_offset);
+static DEFINE_STRARRAY_OFFSET(x86_arch_prctl_codes_3, "ARCH_", x86_arch_prctl_codes_3_offset);
 
 static struct strarray *x86_arch_prctl_codes[] = {
        &strarray__x86_arch_prctl_codes_1,
        &strarray__x86_arch_prctl_codes_2,
+       &strarray__x86_arch_prctl_codes_3,
 };
 
 static DEFINE_STRARRAYS(x86_arch_prctl_codes);
index 57fa6aa..fd5c740 100755 (executable)
@@ -24,3 +24,4 @@ print_range () {
 
 print_range 1 0x1 0x1001
 print_range 2 0x2 0x2001
+print_range 3 0x4 0x4001
index bd18fe5..f9df1df 100644 (file)
@@ -214,7 +214,7 @@ perf-$(CONFIG_ZSTD) += zstd.o
 
 perf-$(CONFIG_LIBCAP) += cap.o
 
-perf-y += demangle-cxx.o
+perf-$(CONFIG_CXX_DEMANGLE) += demangle-cxx.o
 perf-y += demangle-ocaml.o
 perf-y += demangle-java.o
 perf-y += demangle-rust.o
index 8d3cfbb..1d48226 100644 (file)
@@ -416,6 +416,8 @@ int contention_end(u64 *ctx)
        return 0;
 }
 
+struct rq {};
+
 extern struct rq runqueues __ksym;
 
 struct rq___old {
index cffe493..fb94f52 100644 (file)
@@ -25,7 +25,7 @@ struct perf_sample_data___new {
 } __attribute__((preserve_access_index));
 
 /* new kernel perf_mem_data_src definition */
-union perf_mem_data_src__new {
+union perf_mem_data_src___new {
        __u64 val;
        struct {
                __u64   mem_op:5,       /* type of opcode */
@@ -108,7 +108,7 @@ static inline __u64 perf_get_sample(struct bpf_perf_event_data_kern *kctx,
                if (entry->part == 7)
                        return kctx->data->data_src.mem_blk;
                if (entry->part == 8) {
-                       union perf_mem_data_src__new *data = (void *)&kctx->data->data_src;
+                       union perf_mem_data_src___new *data = (void *)&kctx->data->data_src;
 
                        if (bpf_core_field_exists(data->mem_hops))
                                return data->mem_hops;
index 449b1ea..c7ed51b 100644 (file)
@@ -1,6 +1,7 @@
 #ifndef __VMLINUX_H
 #define __VMLINUX_H
 
+#include <linux/stddef.h> // for the __always_inline definition
 #include <linux/bpf.h>
 #include <linux/types.h>
 #include <linux/perf_event.h>
index 70cac03..ecca407 100644 (file)
@@ -227,6 +227,19 @@ struct cs_etm_packet_queue {
 #define INFO_HEADER_SIZE (sizeof(((struct perf_record_auxtrace_info *)0)->type) + \
                          sizeof(((struct perf_record_auxtrace_info *)0)->reserved__))
 
+/* CoreSight trace ID is currently the bottom 7 bits of the value */
+#define CORESIGHT_TRACE_ID_VAL_MASK    GENMASK(6, 0)
+
+/*
+ * perf record will set the legacy metadata values as unused initially.
+ * This allows perf report to manage the decoders created when dynamic
+ * allocation is in operation.
+ */
+#define CORESIGHT_TRACE_ID_UNUSED_FLAG BIT(31)
+
+/* Value to set for unused trace ID values */
+#define CORESIGHT_TRACE_ID_UNUSED_VAL  0x7F
+
 int cs_etm__process_auxtrace_info(union perf_event *event,
                                  struct perf_session *session);
 struct perf_event_attr *cs_etm_get_default_config(struct perf_pmu *pmu);
index 356c07f..c2dbb56 100644 (file)
@@ -282,6 +282,7 @@ void evsel__init(struct evsel *evsel,
        evsel->bpf_fd      = -1;
        INIT_LIST_HEAD(&evsel->config_terms);
        INIT_LIST_HEAD(&evsel->bpf_counter_list);
+       INIT_LIST_HEAD(&evsel->bpf_filters);
        perf_evsel__object.init(evsel);
        evsel->sample_size = __evsel__sample_size(attr->sample_type);
        evsel__calc_id_pos(evsel);
@@ -290,6 +291,7 @@ void evsel__init(struct evsel *evsel,
        evsel->per_pkg_mask  = NULL;
        evsel->collect_stat  = false;
        evsel->pmu_name      = NULL;
+       evsel->skippable     = false;
 }
 
 struct evsel *evsel__new_idx(struct perf_event_attr *attr, int idx)
@@ -828,26 +830,26 @@ bool evsel__name_is(struct evsel *evsel, const char *name)
 
 const char *evsel__group_pmu_name(const struct evsel *evsel)
 {
-       const struct evsel *leader;
+       struct evsel *leader = evsel__leader(evsel);
+       struct evsel *pos;
 
-       /* If the pmu_name is set use it. pmu_name isn't set for CPU and software events. */
-       if (evsel->pmu_name)
-               return evsel->pmu_name;
        /*
         * Software events may be in a group with other uncore PMU events. Use
-        * the pmu_name of the group leader to avoid breaking the software event
-        * out of the group.
+        * the pmu_name of the first non-software event to avoid breaking the
+        * software event out of the group.
         *
         * Aux event leaders, like intel_pt, expect a group with events from
         * other PMUs, so substitute the AUX event's PMU in this case.
         */
-       leader  = evsel__leader(evsel);
-       if ((evsel->core.attr.type == PERF_TYPE_SOFTWARE || evsel__is_aux_event(leader)) &&
-           leader->pmu_name) {
-               return leader->pmu_name;
+       if (evsel->core.attr.type == PERF_TYPE_SOFTWARE || evsel__is_aux_event(leader)) {
+               /* Starting with the leader, find the first event with a named PMU. */
+               for_each_group_evsel(pos, leader) {
+                       if (pos->pmu_name)
+                               return pos->pmu_name;
+               }
        }
 
-       return "cpu";
+       return evsel->pmu_name ?: "cpu";
 }
 
 const char *evsel__metric_id(const struct evsel *evsel)
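
With this change, a software event grouped with uncore events takes its group PMU name from the first group member that has an explicit PMU name, starting at the leader, instead of consulting only the leader. A minimal sketch of that lookup order (simplified: it ignores the software/aux-event precondition and uses illustrative types):

    #include <stddef.h>

    struct member { const char *pmu_name; };

    /* Walk the group, leader first; fall back to the event's own PMU name,
     * then to "cpu", mirroring the order used above. */
    static const char *group_pmu_name(const struct member *grp, size_t n,
                                      const char *own_pmu_name)
    {
            for (size_t i = 0; i < n; i++)
                    if (grp[i].pmu_name)
                            return grp[i].pmu_name;
            return own_pmu_name ? own_pmu_name : "cpu";
    }
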
@@ -1725,9 +1727,13 @@ static int get_group_fd(struct evsel *evsel, int cpu_map_idx, int thread)
                return -1;
 
        fd = FD(leader, cpu_map_idx, thread);
-       BUG_ON(fd == -1);
+       BUG_ON(fd == -1 && !leader->skippable);
 
-       return fd;
+       /*
+        * When the leader has been skipped, return -2 to distinguish from the
+        * no-group-leader case.
+        */
+       return fd == -1 ? -2 : fd;
 }
 
 static void evsel__remove_fd(struct evsel *pos, int nr_cpus, int nr_threads, int thread_idx)
@@ -2109,6 +2115,12 @@ retry_open:
 
                        group_fd = get_group_fd(evsel, idx, thread);
 
+                       if (group_fd == -2) {
+                               pr_debug("broken group leader for %s\n", evsel->name);
+                               err = -EINVAL;
+                               goto out_close;
+                       }
+
                        test_attr__ready();
 
                        /* Debug message used by test scripts */
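
The new evsel->skippable flag and the -2 return from get_group_fd() work together: when a group leader could not be opened but was marked skippable, the member events now fail cleanly with -EINVAL instead of tripping the BUG_ON(). A sketch of the convention as it appears to the caller (the helper name is illustrative, not from the source):

    #include <errno.h>

    /* >= 0 : leader fd to pass as group_fd to perf_event_open()
     *  -1  : no group leader; open the event stand-alone
     *  -2  : leader was skippable and failed to open; give up on this member */
    static int check_group_fd(int group_fd)
    {
            if (group_fd == -2)
                    return -EINVAL;
            return 0;
    }
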
index d575390..0f54f28 100644 (file)
@@ -95,6 +95,7 @@ struct evsel {
                bool                    weak_group;
                bool                    bpf_counter;
                bool                    use_config_name;
+               bool                    skippable;
                int                     bpf_fd;
                struct bpf_object       *bpf_obj;
                struct list_head        config_terms;
@@ -150,10 +151,8 @@ struct evsel {
         */
        struct bpf_counter_ops  *bpf_counter_ops;
 
-       union {
-               struct list_head        bpf_counter_list; /* for perf-stat -b */
-               struct list_head        bpf_filters; /* for perf-record --filter */
-       };
+       struct list_head        bpf_counter_list; /* for perf-stat -b */
+       struct list_head        bpf_filters; /* for perf-record --filter */
 
        /* for perf-stat --use-bpf */
        int                     bperf_leader_prog_fd;
index 250e444..4ce931c 100644 (file)
@@ -225,7 +225,11 @@ expr: NUMBER
 {
        if (fpclassify($3.val) == FP_ZERO) {
                pr_debug("division by zero\n");
-               YYABORT;
+               assert($3.ids == NULL);
+               if (compute_ids)
+                       ids__free($1.ids);
+               $$.val = NAN;
+               $$.ids = NULL;
        } else if (!compute_ids || (is_const($1.val) && is_const($3.val))) {
                assert($1.ids == NULL);
                assert($3.ids == NULL);
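
After this change a division by zero inside a metric expression evaluates to NaN instead of aborting the parse, which is exactly what the updated tests/expr.c assertions earlier in this series expect. A minimal sketch of the new semantics, assuming only <math.h>:

    #include <math.h>

    /* The real change lives in the bison action for '/' in util/expr.y;
     * this only illustrates the resulting behaviour. */
    static double metric_div(double lhs, double rhs)
    {
            if (fpclassify(rhs) == FP_ZERO)
                    return NAN;     /* previously: YYABORT (hard parse error) */
            return lhs / rhs;
    }
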
index c566c68..5e9c657 100644 (file)
@@ -1144,12 +1144,12 @@ static int metricgroup__add_metric_callback(const struct pmu_metric *pm,
        struct metricgroup__add_metric_data *data = vdata;
        int ret = 0;
 
-       if (pm->metric_expr &&
-               (match_metric(pm->metric_group, data->metric_name) ||
-                match_metric(pm->metric_name, data->metric_name))) {
+       if (pm->metric_expr && match_pm_metric(pm, data->metric_name)) {
+               bool metric_no_group = data->metric_no_group ||
+                       match_metric(data->metric_name, pm->metricgroup_no_group);
 
                data->has_match = true;
-               ret = add_metric(data->list, pm, data->modifier, data->metric_no_group,
+               ret = add_metric(data->list, pm, data->modifier, metric_no_group,
                                 data->metric_no_threshold, data->user_requested_cpu_list,
                                 data->system_wide, /*root_metric=*/NULL,
                                 /*visited_metrics=*/NULL, table);
@@ -1672,7 +1672,7 @@ static int metricgroup__topdown_max_level_callback(const struct pmu_metric *pm,
 {
        unsigned int *max_level = data;
        unsigned int level;
-       const char *p = strstr(pm->metric_group, "TopdownL");
+       const char *p = strstr(pm->metric_group ?: "", "TopdownL");
 
        if (!p || p[8] == '\0')
                return 0;
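
This is where the new JSON key pays off: the "MetricgroupNoGroup" value plumbed through jevents.py and struct pmu_metric earlier in the diff names the Topdown level(s) for which perf stat should behave as if --metric-no-group had been passed, so those metrics are not forced into a single weak event group. A self-contained sketch of the intended check (the list handling here is an assumption; the real code goes through perf's match_metric() helper):

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* Does "name" appear in a semicolon-separated list such as "TopdownL1"? */
    static bool in_group_list(const char *list, const char *name)
    {
            size_t len = strlen(name);

            while (list && *list) {
                    const char *end = strchr(list, ';');
                    size_t cur = end ? (size_t)(end - list) : strlen(list);

                    if (cur == len && !strncmp(list, name, len))
                            return true;
                    list = end ? end + 1 : NULL;
            }
            return false;
    }

    static bool metric_no_group(bool cmdline_no_group, const char *requested,
                                const char *metricgroup_no_group)
    {
            return cmdline_no_group ||
                   (metricgroup_no_group &&
                    in_group_list(metricgroup_no_group, requested));
    }
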
index d71019d..34ba840 100644 (file)
@@ -2140,25 +2140,32 @@ static int evlist__cmp(void *state, const struct list_head *l, const struct list
        int *leader_idx = state;
        int lhs_leader_idx = *leader_idx, rhs_leader_idx = *leader_idx, ret;
        const char *lhs_pmu_name, *rhs_pmu_name;
+       bool lhs_has_group = false, rhs_has_group = false;
 
        /*
         * First sort by grouping/leader. Read the leader idx only if the evsel
         * is part of a group, as -1 indicates no group.
         */
-       if (lhs_core->leader != lhs_core || lhs_core->nr_members > 1)
+       if (lhs_core->leader != lhs_core || lhs_core->nr_members > 1) {
+               lhs_has_group = true;
                lhs_leader_idx = lhs_core->leader->idx;
-       if (rhs_core->leader != rhs_core || rhs_core->nr_members > 1)
+       }
+       if (rhs_core->leader != rhs_core || rhs_core->nr_members > 1) {
+               rhs_has_group = true;
                rhs_leader_idx = rhs_core->leader->idx;
+       }
 
        if (lhs_leader_idx != rhs_leader_idx)
                return lhs_leader_idx - rhs_leader_idx;
 
-       /* Group by PMU. Groups can't span PMUs. */
-       lhs_pmu_name = evsel__group_pmu_name(lhs);
-       rhs_pmu_name = evsel__group_pmu_name(rhs);
-       ret = strcmp(lhs_pmu_name, rhs_pmu_name);
-       if (ret)
-               return ret;
+       /* Group by PMU if there is a group. Groups can't span PMUs. */
+       if (lhs_has_group && rhs_has_group) {
+               lhs_pmu_name = evsel__group_pmu_name(lhs);
+               rhs_pmu_name = evsel__group_pmu_name(rhs);
+               ret = strcmp(lhs_pmu_name, rhs_pmu_name);
+               if (ret)
+                       return ret;
+       }
 
        /* Architecture specific sorting. */
        return arch_evlist__cmp(lhs, rhs);
index 73b2ff2..bf5a6c1 100644 (file)
@@ -431,7 +431,7 @@ static void print_metric_json(struct perf_stat_config *config __maybe_unused,
        struct outstate *os = ctx;
        FILE *out = os->fh;
 
-       fprintf(out, "\"metric-value\" : %f, ", val);
+       fprintf(out, "\"metric-value\" : \"%f\", ", val);
        fprintf(out, "\"metric-unit\" : \"%s\"", unit);
        if (!config->metric_only)
                fprintf(out, "}");
index eeccab6..1566a20 100644 (file)
@@ -403,12 +403,25 @@ static int prepare_metric(struct evsel **metric_events,
                        if (!aggr)
                                break;
 
-                       /*
-                        * If an event was scaled during stat gathering, reverse
-                        * the scale before computing the metric.
-                        */
-                       val = aggr->counts.val * (1.0 / metric_events[i]->scale);
-                       source_count = evsel__source_count(metric_events[i]);
+                        if (!metric_events[i]->supported) {
+                               /*
+                                * Not supported events will have a count of 0,
+                                * which can be confusing in a
+                                * metric. Explicitly set the value to NAN. Not
+                                * counted events (enable time of 0) are read as
+                                * 0.
+                                */
+                               val = NAN;
+                               source_count = 0;
+                       } else {
+                               /*
+                                * If an event was scaled during stat gathering,
+                                * reverse the scale before computing the
+                                * metric.
+                                */
+                               val = aggr->counts.val * (1.0 / metric_events[i]->scale);
+                               source_count = evsel__source_count(metric_events[i]);
+                       }
                }
                n = strdup(evsel__metric_id(metric_events[i]));
                if (!n)
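
Marking the synthetic evsels as supported in the test above and checking evsel->supported here serve the same goal: an event the kernel refused to open no longer contributes a silent 0 to a metric but shows up as NaN, and the JSON printer change earlier quotes the metric value so that "nan" remains valid JSON. A sketch of the intent (names are illustrative):

    #include <math.h>

    static double metric_input(int supported, double raw_count, double scale)
    {
            if (!supported)
                    return NAN;                 /* make the missing event visible      */
            return raw_count * (1.0 / scale);   /* undo scaling applied while counting */
    }
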
index b2ed9cc..63882a4 100644 (file)
 #include <bfd.h>
 #endif
 
+#if defined(HAVE_LIBBFD_SUPPORT) || defined(HAVE_CPLUS_DEMANGLE_SUPPORT)
+#ifndef DMGL_PARAMS
+#define DMGL_PARAMS     (1 << 0)  /* Include function args */
+#define DMGL_ANSI       (1 << 1)  /* Include const, volatile, etc */
+#endif
+#endif
+
 #ifndef EM_AARCH64
 #define EM_AARCH64     183  /* ARM 64 bit */
 #endif
@@ -271,6 +278,26 @@ static bool want_demangle(bool is_kernel_sym)
        return is_kernel_sym ? symbol_conf.demangle_kernel : symbol_conf.demangle;
 }
 
+/*
+ * Demangle C++ function signature, typically replaced by demangle-cxx.cpp
+ * version.
+ */
+__weak char *cxx_demangle_sym(const char *str __maybe_unused, bool params __maybe_unused,
+                             bool modifiers __maybe_unused)
+{
+#ifdef HAVE_LIBBFD_SUPPORT
+       int flags = (params ? DMGL_PARAMS : 0) | (modifiers ? DMGL_ANSI : 0);
+
+       return bfd_demangle(NULL, str, flags);
+#elif defined(HAVE_CPLUS_DEMANGLE_SUPPORT)
+       int flags = (params ? DMGL_PARAMS : 0) | (modifiers ? DMGL_ANSI : 0);
+
+       return cplus_demangle(str, flags);
+#else
+       return NULL;
+#endif
+}
+
 static char *demangle_sym(struct dso *dso, int kmodule, const char *elf_name)
 {
        char *demangled = NULL;
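
The __weak annotation is what lets this C fallback coexist with the C++ implementation the comment mentions: if a strong definition of cxx_demangle_sym() is linked in from a demangle-cxx translation unit, it silently wins; otherwise the weak definition above is used and falls back to bfd_demangle(), cplus_demangle(), or NULL. A tiny sketch of the linker behaviour with an invented function name, backend_name(), split across hypothetical files:

    /* weak_default.c: always compiled */
    __attribute__((weak)) const char *backend_name(void)
    {
            return "fallback";
    }

    /* strong_override.c: only compiled and linked when the optional backend exists */
    const char *backend_name(void)
    {
            return "real backend";
    }

    /* main.c */
    #include <stdio.h>
    const char *backend_name(void);

    int main(void)
    {
            /* Prints "real backend" when strong_override.o is linked,
             * "fallback" otherwise; no #ifdefs are needed at the call site. */
            printf("%s\n", backend_name());
            return 0;
    }
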
index 0ce29ee..a7a59c6 100644 (file)
@@ -40,25 +40,34 @@ static int sysfs_get_enabled(char *path, int *mode)
 {
        int fd;
        char yes_no;
+       int ret = 0;
 
        *mode = 0;
 
        fd = open(path, O_RDONLY);
-       if (fd == -1)
-               return -1;
+       if (fd == -1) {
+               ret = -1;
+               goto out;
+       }
 
        if (read(fd, &yes_no, 1) != 1) {
-               close(fd);
-               return -1;
+               ret = -1;
+               goto out_close;
        }
 
        if (yes_no == '1') {
                *mode = 1;
-               return 0;
+               goto out_close;
        } else if (yes_no == '0') {
-               return 0;
+               goto out_close;
+       } else {
+               ret = -1;
+               goto out_close;
        }
-       return -1;
+out_close:
+       close(fd);
+out:
+       return ret;
 }
 
 int powercap_get_enabled(int *mode)
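
The rewrite above funnels every exit after the open() through a single close() call, the usual way to make sure the descriptor cannot leak as new branches get added. A compact sketch of the same pattern with a hypothetical helper, read_first_byte():

    #include <fcntl.h>
    #include <unistd.h>

    /* Returns 0 on success, -1 on failure; fd is closed on every path. */
    static int read_first_byte(const char *path, char *out)
    {
            int ret = 0;
            int fd;

            fd = open(path, O_RDONLY);
            if (fd == -1)
                    return -1;

            if (read(fd, out, 1) != 1) {
                    ret = -1;
                    goto out_close;
            }

            /* Further processing steps would go here. */
    out_close:
            close(fd);
            return ret;
    }

    int main(void)
    {
            char c;

            return read_first_byte("/proc/version", &c) == 0 ? 0 : 1;
    }
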
index e7d48cb..ae6af35 100644 (file)
@@ -70,8 +70,8 @@ static int max_freq_mode;
  */
 static unsigned long max_frequency;
 
-static unsigned long long tsc_at_measure_start;
-static unsigned long long tsc_at_measure_end;
+static unsigned long long *tsc_at_measure_start;
+static unsigned long long *tsc_at_measure_end;
 static unsigned long long *mperf_previous_count;
 static unsigned long long *aperf_previous_count;
 static unsigned long long *mperf_current_count;
@@ -169,7 +169,7 @@ static int mperf_get_count_percent(unsigned int id, double *percent,
        aperf_diff = aperf_current_count[cpu] - aperf_previous_count[cpu];
 
        if (max_freq_mode == MAX_FREQ_TSC_REF) {
-               tsc_diff = tsc_at_measure_end - tsc_at_measure_start;
+               tsc_diff = tsc_at_measure_end[cpu] - tsc_at_measure_start[cpu];
                *percent = 100.0 * mperf_diff / tsc_diff;
                dprint("%s: TSC Ref - mperf_diff: %llu, tsc_diff: %llu\n",
                       mperf_cstates[id].name, mperf_diff, tsc_diff);
@@ -206,7 +206,7 @@ static int mperf_get_count_freq(unsigned int id, unsigned long long *count,
 
        if (max_freq_mode == MAX_FREQ_TSC_REF) {
                /* Calculate max_freq from TSC count */
-               tsc_diff = tsc_at_measure_end - tsc_at_measure_start;
+               tsc_diff = tsc_at_measure_end[cpu] - tsc_at_measure_start[cpu];
                time_diff = timespec_diff_us(time_start, time_end);
                max_frequency = tsc_diff / time_diff;
        }
@@ -225,33 +225,27 @@ static int mperf_get_count_freq(unsigned int id, unsigned long long *count,
 static int mperf_start(void)
 {
        int cpu;
-       unsigned long long dbg;
 
        clock_gettime(CLOCK_REALTIME, &time_start);
-       mperf_get_tsc(&tsc_at_measure_start);
 
-       for (cpu = 0; cpu < cpu_count; cpu++)
+       for (cpu = 0; cpu < cpu_count; cpu++) {
+               mperf_get_tsc(&tsc_at_measure_start[cpu]);
                mperf_init_stats(cpu);
+       }
 
-       mperf_get_tsc(&dbg);
-       dprint("TSC diff: %llu\n", dbg - tsc_at_measure_start);
        return 0;
 }
 
 static int mperf_stop(void)
 {
-       unsigned long long dbg;
        int cpu;
 
-       for (cpu = 0; cpu < cpu_count; cpu++)
+       for (cpu = 0; cpu < cpu_count; cpu++) {
                mperf_measure_stats(cpu);
+               mperf_get_tsc(&tsc_at_measure_end[cpu]);
+       }
 
-       mperf_get_tsc(&tsc_at_measure_end);
        clock_gettime(CLOCK_REALTIME, &time_end);
-
-       mperf_get_tsc(&dbg);
-       dprint("TSC diff: %llu\n", dbg - tsc_at_measure_end);
-
        return 0;
 }
 
@@ -353,7 +347,8 @@ struct cpuidle_monitor *mperf_register(void)
        aperf_previous_count = calloc(cpu_count, sizeof(unsigned long long));
        mperf_current_count = calloc(cpu_count, sizeof(unsigned long long));
        aperf_current_count = calloc(cpu_count, sizeof(unsigned long long));
-
+       tsc_at_measure_start = calloc(cpu_count, sizeof(unsigned long long));
+       tsc_at_measure_end = calloc(cpu_count, sizeof(unsigned long long));
        mperf_monitor.name_len = strlen(mperf_monitor.name);
        return &mperf_monitor;
 }
@@ -364,6 +359,8 @@ void mperf_unregister(void)
        free(aperf_previous_count);
        free(mperf_current_count);
        free(aperf_current_count);
+       free(tsc_at_measure_start);
+       free(tsc_at_measure_end);
        free(is_valid);
 }
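
With the change above, each CPU gets its own TSC snapshot taken in the same pass as its MPERF/APERF statistics, so the tsc_diff used in the percent and frequency calculations covers exactly the interval in which that CPU's counters were read, instead of one global start/end pair shared by all CPUs. A self-contained sketch of the bracketing idea with invented stand-ins (read_counter() instead of the real MSR/TSC reads):

    #include <stdio.h>

    #define NCPU 2

    /* Fake monotonically increasing per-CPU counters so the sketch runs
     * anywhere; the real code reads the TSC and the MPERF/APERF MSRs. */
    static unsigned long long fake_counter[NCPU];

    static unsigned long long read_counter(int cpu)
    {
            return fake_counter[cpu] += 1000;
    }

    static unsigned long long tsc_start[NCPU], tsc_end[NCPU];
    static unsigned long long mperf_start[NCPU], mperf_end[NCPU];

    static void measure_start(void)
    {
            for (int cpu = 0; cpu < NCPU; cpu++) {
                    /* Snapshot TSC and MPERF back to back, per CPU. */
                    tsc_start[cpu] = read_counter(cpu);
                    mperf_start[cpu] = read_counter(cpu);
            }
    }

    static void measure_stop(void)
    {
            for (int cpu = 0; cpu < NCPU; cpu++) {
                    mperf_end[cpu] = read_counter(cpu);
                    tsc_end[cpu] = read_counter(cpu);
            }
    }

    int main(void)
    {
            measure_start();
            measure_stop();
            for (int cpu = 0; cpu < NCPU; cpu++) {
                    unsigned long long tsc_diff = tsc_end[cpu] - tsc_start[cpu];
                    unsigned long long mperf_diff = mperf_end[cpu] - mperf_start[cpu];

                    /* Both deltas now describe the same CPU over the same window. */
                    printf("cpu%d: %.1f%% busy\n", cpu, 100.0 * mperf_diff / tsc_diff);
            }
            return 0;
    }
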
 
index fba7bec..6f9347a 100644 (file)
@@ -6,6 +6,7 @@ ldflags-y += --wrap=acpi_pci_find_root
 ldflags-y += --wrap=nvdimm_bus_register
 ldflags-y += --wrap=devm_cxl_port_enumerate_dports
 ldflags-y += --wrap=devm_cxl_setup_hdm
+ldflags-y += --wrap=devm_cxl_enable_hdm
 ldflags-y += --wrap=devm_cxl_add_passthrough_decoder
 ldflags-y += --wrap=devm_cxl_enumerate_decoders
 ldflags-y += --wrap=cxl_await_media_ready
index ba572d0..34b4802 100644 (file)
@@ -1256,6 +1256,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
        if (rc)
                return rc;
 
+       cxlds->media_ready = true;
        rc = cxl_dev_state_identify(cxlds);
        if (rc)
                return rc;
index c4e53f2..2844165 100644 (file)
@@ -19,7 +19,7 @@ void register_cxl_mock_ops(struct cxl_mock_ops *ops)
 }
 EXPORT_SYMBOL_GPL(register_cxl_mock_ops);
 
-static DEFINE_SRCU(cxl_mock_srcu);
+DEFINE_STATIC_SRCU(cxl_mock_srcu);
 
 void unregister_cxl_mock_ops(struct cxl_mock_ops *ops)
 {
@@ -149,6 +149,21 @@ struct cxl_hdm *__wrap_devm_cxl_setup_hdm(struct cxl_port *port,
 }
 EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_setup_hdm, CXL);
 
+int __wrap_devm_cxl_enable_hdm(struct cxl_port *port, struct cxl_hdm *cxlhdm)
+{
+       int index, rc;
+       struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
+
+       if (ops && ops->is_mock_port(port->uport))
+               rc = 0;
+       else
+               rc = devm_cxl_enable_hdm(port, cxlhdm);
+       put_cxl_mock_ops(index);
+
+       return rc;
+}
+EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_enable_hdm, CXL);
+
 int __wrap_devm_cxl_add_passthrough_decoder(struct cxl_port *port)
 {
        int rc, index;
index f01f941..7f64880 100644 (file)
@@ -92,7 +92,7 @@ class LinuxSourceTreeOperations:
                if stderr:  # likely only due to build warnings
                        print(stderr.decode())
 
-       def start(self, params: List[str], build_dir: str) -> subprocess.Popen[str]:
+       def start(self, params: List[str], build_dir: str) -> subprocess.Popen:
                raise RuntimeError('not implemented!')
 
 
@@ -113,7 +113,7 @@ class LinuxSourceTreeOperationsQemu(LinuxSourceTreeOperations):
                kconfig.merge_in_entries(base_kunitconfig)
                return kconfig
 
-       def start(self, params: List[str], build_dir: str) -> subprocess.Popen[str]:
+       def start(self, params: List[str], build_dir: str) -> subprocess.Popen:
                kernel_path = os.path.join(build_dir, self._kernel_path)
                qemu_command = ['qemu-system-' + self._qemu_arch,
                                '-nodefaults',
@@ -142,7 +142,7 @@ class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
                kconfig.merge_in_entries(base_kunitconfig)
                return kconfig
 
-       def start(self, params: List[str], build_dir: str) -> subprocess.Popen[str]:
+       def start(self, params: List[str], build_dir: str) -> subprocess.Popen:
                """Runs the Linux UML binary. Must be named 'linux'."""
                linux_bin = os.path.join(build_dir, 'linux')
                params.extend(['mem=1G', 'console=tty', 'kunit_shutdown=halt'])
diff --git a/tools/testing/kunit/mypy.ini b/tools/testing/kunit/mypy.ini
new file mode 100644 (file)
index 0000000..ddd2883
--- /dev/null
@@ -0,0 +1,6 @@
+[mypy]
+strict = True
+
+# E.g. we can't write subprocess.Popen[str] until Python 3.9+.
+# But kunit.py tries to support Python 3.7+, so let's disable it.
+disable_error_code = type-arg
index 8208c3b..c6d494e 100755 (executable)
@@ -23,7 +23,7 @@ commands: Dict[str, Sequence[str]] = {
        'kunit_tool_test.py': ['./kunit_tool_test.py'],
        'kunit smoke test': ['./kunit.py', 'run', '--kunitconfig=lib/kunit', '--build_dir=kunit_run_checks'],
        'pytype': ['/bin/sh', '-c', 'pytype *.py'],
-       'mypy': ['mypy', '--strict', '--exclude', '_test.py$', '--exclude', 'qemu_configs/', '.'],
+       'mypy': ['mypy', '--config-file', 'mypy.ini', '--exclude', '_test.py$', '--exclude', 'qemu_configs/', '.'],
 }
 
 # The user might not have mypy or pytype installed, skip them if so.
index caf32a9..7527f73 100644 (file)
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 
-CFLAGS += -I. -I../../include -g -Og -Wall -D_LGPL_SOURCE -fsanitize=address \
-         -fsanitize=undefined
+CFLAGS += -I. -I../../include -I../../../lib -g -Og -Wall \
+         -D_LGPL_SOURCE -fsanitize=address -fsanitize=undefined
 LDFLAGS += -fsanitize=address -fsanitize=undefined
 LDLIBS+= -lpthread -lurcu
 TARGETS = main idr-test multiorder xarray maple
@@ -49,6 +49,7 @@ $(OFILES): Makefile *.h */*.h generated/map-shift.h generated/bit-length.h \
        ../../../include/linux/xarray.h \
        ../../../include/linux/maple_tree.h \
        ../../../include/linux/radix-tree.h \
+       ../../../lib/radix-tree.h \
        ../../../include/linux/idr.h
 
 radix-tree.c: ../../../lib/radix-tree.c
index 3e390fe..b7eef32 100644 (file)
@@ -381,7 +381,7 @@ __format:
                goto __close;
        }
        if (rrate != rate) {
-               snprintf(msg, sizeof(msg), "rate mismatch %ld != %ld", rate, rrate);
+               snprintf(msg, sizeof(msg), "rate mismatch %ld != %d", rate, rrate);
                goto __close;
        }
        rperiod_size = period_size;
@@ -447,24 +447,24 @@ __format:
                        frames = snd_pcm_writei(handle, samples, rate);
                        if (frames < 0) {
                                snprintf(msg, sizeof(msg),
-                                        "Write failed: expected %d, wrote %li", rate, frames);
+                                        "Write failed: expected %ld, wrote %li", rate, frames);
                                goto __close;
                        }
                        if (frames < rate) {
                                snprintf(msg, sizeof(msg),
-                                        "expected %d, wrote %li", rate, frames);
+                                        "expected %ld, wrote %li", rate, frames);
                                goto __close;
                        }
                } else {
                        frames = snd_pcm_readi(handle, samples, rate);
                        if (frames < 0) {
                                snprintf(msg, sizeof(msg),
-                                        "expected %d, wrote %li", rate, frames);
+                                        "expected %ld, wrote %li", rate, frames);
                                goto __close;
                        }
                        if (frames < rate) {
                                snprintf(msg, sizeof(msg),
-                                        "expected %d, wrote %li", rate, frames);
+                                        "expected %ld, wrote %li", rate, frames);
                                goto __close;
                        }
                }
index 93333a9..d4ad813 100644 (file)
@@ -39,6 +39,20 @@ static void cssc_sigill(void)
        asm volatile(".inst 0xdac01c00" : : : "x0");
 }
 
+static void mops_sigill(void)
+{
+       char dst[1], src[1];
+       register char *dstp asm ("x0") = dst;
+       register char *srcp asm ("x1") = src;
+       register long size asm ("x2") = 1;
+
+       /* CPYP [x0]!, [x1]!, x2! */
+       asm volatile(".inst 0x1d010440"
+                    : "+r" (dstp), "+r" (srcp), "+r" (size)
+                    :
+                    : "cc", "memory");
+}
+
 static void rng_sigill(void)
 {
        asm volatile("mrs x0, S3_3_C2_C4_0" : : : "x0");
@@ -210,6 +224,14 @@ static const struct hwcap_data {
                .sigill_fn = cssc_sigill,
        },
        {
+               .name = "MOPS",
+               .at_hwcap = AT_HWCAP2,
+               .hwcap_bit = HWCAP2_MOPS,
+               .cpuinfo = "mops",
+               .sigill_fn = mops_sigill,
+               .sigill_reliable = true,
+       },
+       {
                .name = "RNG",
                .at_hwcap = AT_HWCAP2,
                .hwcap_bit = HWCAP2_RNG,
index be95251..abe4d58 100644 (file)
@@ -20,7 +20,7 @@
 
 #include "../../kselftest.h"
 
-#define EXPECTED_TESTS 7
+#define EXPECTED_TESTS 11
 
 #define MAX_TPIDRS 2
 
@@ -132,6 +132,34 @@ static void test_tpidr(pid_t child)
        }
 }
 
+static void test_hw_debug(pid_t child, int type, const char *type_name)
+{
+       struct user_hwdebug_state state;
+       struct iovec iov;
+       int slots, arch, ret;
+
+       iov.iov_len = sizeof(state);
+       iov.iov_base = &state;
+
+       /* Should be able to read the values */
+       ret = ptrace(PTRACE_GETREGSET, child, type, &iov);
+       ksft_test_result(ret == 0, "read_%s\n", type_name);
+
+       if (ret == 0) {
+               /* Low 8 bits is the number of slots, next 4 bits the arch */
+               slots = state.dbg_info & 0xff;
+               arch = (state.dbg_info >> 8) & 0xf;
+
+               ksft_print_msg("%s version %d with %d slots\n", type_name,
+                              arch, slots);
+
+               /* Zero is not currently architecturally valid */
+               ksft_test_result(arch, "%s_arch_set\n", type_name);
+       } else {
+               ksft_test_result_skip("%s_arch_set\n", type_name);
+       }
+}
+
 static int do_child(void)
 {
        if (ptrace(PTRACE_TRACEME, -1, NULL, NULL))
@@ -207,6 +235,8 @@ static int do_parent(pid_t child)
        ksft_print_msg("Parent is %d, child is %d\n", getpid(), child);
 
        test_tpidr(child);
+       test_hw_debug(child, NT_ARM_HW_WATCH, "NT_ARM_HW_WATCH");
+       test_hw_debug(child, NT_ARM_HW_BREAK, "NT_ARM_HW_BREAK");
 
        ret = EXIT_SUCCESS;
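
For reference, dbg_info is a packed field: the low byte reports how many hardware breakpoint or watchpoint slots the CPU exposes and the next four bits the debug architecture version, which is what test_hw_debug() above unpacks. A standalone decoding sketch with a made-up example value:

    #include <stdio.h>

    static void decode_dbg_info(unsigned int dbg_info)
    {
            unsigned int slots = dbg_info & 0xff;        /* low 8 bits */
            unsigned int arch  = (dbg_info >> 8) & 0xf;  /* next 4 bits */

            printf("debug arch %u, %u slots\n", arch, slots);
    }

    int main(void)
    {
            decode_dbg_info(0x610);   /* hypothetical value: arch 6, 16 slots */
            return 0;
    }
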
 
index 8ab4c86..839e3a2 100644 (file)
@@ -4,7 +4,7 @@ fake_sigreturn_*
 sme_*
 ssve_*
 sve_*
-tpidr2_siginfo
+tpidr2_*
 za_*
 zt_*
 !*.[ch]
index 40be844..0dc948d 100644 (file)
@@ -249,7 +249,8 @@ static void default_handler(int signum, siginfo_t *si, void *uc)
                        fprintf(stderr, "-- Timeout !\n");
                } else {
                        fprintf(stderr,
-                               "-- RX UNEXPECTED SIGNAL: %d\n", signum);
+                               "-- RX UNEXPECTED SIGNAL: %d code %d address %p\n",
+                               signum, si->si_code, si->si_addr);
                }
                default_result(current, 1);
        }
diff --git a/tools/testing/selftests/arm64/signal/testcases/tpidr2_restore.c b/tools/testing/selftests/arm64/signal/testcases/tpidr2_restore.c
new file mode 100644 (file)
index 0000000..f9a86c0
--- /dev/null
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023 ARM Limited
+ *
+ * Verify that the TPIDR2 register context in signal frames is restored.
+ */
+
+#include <signal.h>
+#include <ucontext.h>
+#include <sys/auxv.h>
+#include <sys/prctl.h>
+#include <unistd.h>
+#include <asm/sigcontext.h>
+
+#include "test_signals_utils.h"
+#include "testcases.h"
+
+#define SYS_TPIDR2 "S3_3_C13_C0_5"
+
+static uint64_t get_tpidr2(void)
+{
+       uint64_t val;
+
+       asm volatile (
+               "mrs    %0, " SYS_TPIDR2 "\n"
+               : "=r"(val)
+               :
+               : "cc");
+
+       return val;
+}
+
+static void set_tpidr2(uint64_t val)
+{
+       asm volatile (
+               "msr    " SYS_TPIDR2 ", %0\n"
+               :
+               : "r"(val)
+               : "cc");
+}
+
+
+static uint64_t initial_tpidr2;
+
+static bool save_tpidr2(struct tdescr *td)
+{
+       initial_tpidr2 = get_tpidr2();
+       fprintf(stderr, "Initial TPIDR2: %lx\n", initial_tpidr2);
+
+       return true;
+}
+
+static int modify_tpidr2(struct tdescr *td, siginfo_t *si, ucontext_t *uc)
+{
+       uint64_t my_tpidr2 = get_tpidr2();
+
+       my_tpidr2++;
+       fprintf(stderr, "Setting TPIDR2 to %lx\n", my_tpidr2);
+       set_tpidr2(my_tpidr2);
+
+       return 0;
+}
+
+static void check_tpidr2(struct tdescr *td)
+{
+       uint64_t tpidr2 = get_tpidr2();
+
+       td->pass = tpidr2 == initial_tpidr2;
+
+       if (td->pass)
+               fprintf(stderr, "TPIDR2 restored\n");
+       else
+               fprintf(stderr, "TPIDR2 was %lx but is now %lx\n",
+                       initial_tpidr2, tpidr2);
+}
+
+struct tdescr tde = {
+       .name = "TPIDR2 restore",
+       .descr = "Validate that TPIDR2 is restored from the sigframe",
+       .feats_required = FEAT_SME,
+       .timeout = 3,
+       .sig_trig = SIGUSR1,
+       .init = save_tpidr2,
+       .run = modify_tpidr2,
+       .check_result = check_tpidr2,
+};
index c49e540..28d2c77 100644 (file)
@@ -197,7 +197,7 @@ $(OUTPUT)/urandom_read: urandom_read.c urandom_read_aux.c $(OUTPUT)/liburandom_r
 
 $(OUTPUT)/sign-file: ../../../../scripts/sign-file.c
        $(call msg,SIGN-FILE,,$@)
-       $(Q)$(CC) $(shell $(HOSTPKG_CONFIG)--cflags libcrypto 2> /dev/null) \
+       $(Q)$(CC) $(shell $(HOSTPKG_CONFIG) --cflags libcrypto 2> /dev/null) \
                  $< -o $@ \
                  $(shell $(HOSTPKG_CONFIG) --libs libcrypto 2> /dev/null || echo -lcrypto)
 
diff --git a/tools/testing/selftests/bpf/prog_tests/inner_array_lookup.c b/tools/testing/selftests/bpf/prog_tests/inner_array_lookup.c
new file mode 100644 (file)
index 0000000..9ab4cd1
--- /dev/null
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <test_progs.h>
+
+#include "inner_array_lookup.skel.h"
+
+void test_inner_array_lookup(void)
+{
+       int map1_fd, err;
+       int key = 3;
+       int val = 1;
+       struct inner_array_lookup *skel;
+
+       skel = inner_array_lookup__open_and_load();
+       if (!ASSERT_OK_PTR(skel, "open_load_skeleton"))
+               return;
+
+       err = inner_array_lookup__attach(skel);
+       if (!ASSERT_OK(err, "skeleton_attach"))
+               goto cleanup;
+
+       map1_fd = bpf_map__fd(skel->maps.inner_map1);
+       bpf_map_update_elem(map1_fd, &key, &val, 0);
+
+       /* Probe should have set the element at index 3 to 2 */
+       bpf_map_lookup_elem(map1_fd, &key, &val);
+       ASSERT_EQ(val, 2, "value_is_2");
+
+cleanup:
+       inner_array_lookup__destroy(skel);
+}
index 0ce25a9..064cc5e 100644 (file)
@@ -2,6 +2,7 @@
 // Copyright (c) 2020 Cloudflare
 #include <error.h>
 #include <netinet/tcp.h>
+#include <sys/epoll.h>
 
 #include "test_progs.h"
 #include "test_skmsg_load_helpers.skel.h"
@@ -9,8 +10,12 @@
 #include "test_sockmap_invalid_update.skel.h"
 #include "test_sockmap_skb_verdict_attach.skel.h"
 #include "test_sockmap_progs_query.skel.h"
+#include "test_sockmap_pass_prog.skel.h"
+#include "test_sockmap_drop_prog.skel.h"
 #include "bpf_iter_sockmap.skel.h"
 
+#include "sockmap_helpers.h"
+
 #define TCP_REPAIR             19      /* TCP sock is under repair right now */
 
 #define TCP_REPAIR_ON          1
@@ -350,6 +355,126 @@ out:
        test_sockmap_progs_query__destroy(skel);
 }
 
+#define MAX_EVENTS 10
+static void test_sockmap_skb_verdict_shutdown(void)
+{
+       struct epoll_event ev, events[MAX_EVENTS];
+       int n, err, map, verdict, s, c1, p1;
+       struct test_sockmap_pass_prog *skel;
+       int epollfd;
+       int zero = 0;
+       char b;
+
+       skel = test_sockmap_pass_prog__open_and_load();
+       if (!ASSERT_OK_PTR(skel, "open_and_load"))
+               return;
+
+       verdict = bpf_program__fd(skel->progs.prog_skb_verdict);
+       map = bpf_map__fd(skel->maps.sock_map_rx);
+
+       err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0);
+       if (!ASSERT_OK(err, "bpf_prog_attach"))
+               goto out;
+
+       s = socket_loopback(AF_INET, SOCK_STREAM);
+       if (s < 0)
+               goto out;
+       err = create_pair(s, AF_INET, SOCK_STREAM, &c1, &p1);
+       if (err < 0)
+               goto out;
+
+       err = bpf_map_update_elem(map, &zero, &c1, BPF_NOEXIST);
+       if (err < 0)
+               goto out_close;
+
+       shutdown(p1, SHUT_WR);
+
+       ev.events = EPOLLIN;
+       ev.data.fd = c1;
+
+       epollfd = epoll_create1(0);
+       if (!ASSERT_GT(epollfd, -1, "epoll_create(0)"))
+               goto out_close;
+       err = epoll_ctl(epollfd, EPOLL_CTL_ADD, c1, &ev);
+       if (!ASSERT_OK(err, "epoll_ctl(EPOLL_CTL_ADD)"))
+               goto out_close;
+       err = epoll_wait(epollfd, events, MAX_EVENTS, -1);
+       if (!ASSERT_EQ(err, 1, "epoll_wait(fd)"))
+               goto out_close;
+
+       n = recv(c1, &b, 1, SOCK_NONBLOCK);
+       ASSERT_EQ(n, 0, "recv_timeout(fin)");
+out_close:
+       close(c1);
+       close(p1);
+out:
+       test_sockmap_pass_prog__destroy(skel);
+}
+
+static void test_sockmap_skb_verdict_fionread(bool pass_prog)
+{
+       int expected, zero = 0, sent, recvd, avail;
+       int err, map, verdict, s, c0, c1, p0, p1;
+       struct test_sockmap_pass_prog *pass;
+       struct test_sockmap_drop_prog *drop;
+       char buf[256] = "0123456789";
+
+       if (pass_prog) {
+               pass = test_sockmap_pass_prog__open_and_load();
+               if (!ASSERT_OK_PTR(pass, "open_and_load"))
+                       return;
+               verdict = bpf_program__fd(pass->progs.prog_skb_verdict);
+               map = bpf_map__fd(pass->maps.sock_map_rx);
+               expected = sizeof(buf);
+       } else {
+               drop = test_sockmap_drop_prog__open_and_load();
+               if (!ASSERT_OK_PTR(drop, "open_and_load"))
+                       return;
+               verdict = bpf_program__fd(drop->progs.prog_skb_verdict);
+               map = bpf_map__fd(drop->maps.sock_map_rx);
+               /* On drop data is consumed immediately and copied_seq inc'd */
+               expected = 0;
+       }
+
+       err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0);
+       if (!ASSERT_OK(err, "bpf_prog_attach"))
+               goto out;
+
+       s = socket_loopback(AF_INET, SOCK_STREAM);
+       if (!ASSERT_GT(s, -1, "socket_loopback(s)"))
+               goto out;
+       err = create_socket_pairs(s, AF_INET, SOCK_STREAM, &c0, &c1, &p0, &p1);
+       if (!ASSERT_OK(err, "create_socket_pairs(s)"))
+               goto out;
+
+       err = bpf_map_update_elem(map, &zero, &c1, BPF_NOEXIST);
+       if (!ASSERT_OK(err, "bpf_map_update_elem(c1)"))
+               goto out_close;
+
+       sent = xsend(p1, &buf, sizeof(buf), 0);
+       ASSERT_EQ(sent, sizeof(buf), "xsend(p0)");
+       err = ioctl(c1, FIONREAD, &avail);
+       ASSERT_OK(err, "ioctl(FIONREAD) error");
+       ASSERT_EQ(avail, expected, "ioctl(FIONREAD)");
+       /* On DROP test there will be no data to read */
+       if (pass_prog) {
+               recvd = recv_timeout(c1, &buf, sizeof(buf), SOCK_NONBLOCK, IO_TIMEOUT_SEC);
+               ASSERT_EQ(recvd, sizeof(buf), "recv_timeout(c0)");
+       }
+
+out_close:
+       close(c0);
+       close(p0);
+       close(c1);
+       close(p1);
+out:
+       if (pass_prog)
+               test_sockmap_pass_prog__destroy(pass);
+       else
+               test_sockmap_drop_prog__destroy(drop);
+}
+
 void test_sockmap_basic(void)
 {
        if (test__start_subtest("sockmap create_update_free"))
@@ -384,4 +509,10 @@ void test_sockmap_basic(void)
                test_sockmap_progs_query(BPF_SK_SKB_STREAM_VERDICT);
        if (test__start_subtest("sockmap skb_verdict progs query"))
                test_sockmap_progs_query(BPF_SK_SKB_VERDICT);
+       if (test__start_subtest("sockmap skb_verdict shutdown"))
+               test_sockmap_skb_verdict_shutdown();
+       if (test__start_subtest("sockmap skb_verdict fionread"))
+               test_sockmap_skb_verdict_fionread(true);
+       if (test__start_subtest("sockmap skb_verdict fionread on drop"))
+               test_sockmap_skb_verdict_fionread(false);
 }
diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h b/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h
new file mode 100644 (file)
index 0000000..d126654
--- /dev/null
@@ -0,0 +1,390 @@
+#ifndef __SOCKMAP_HELPERS__
+#define __SOCKMAP_HELPERS__
+
+#include <linux/vm_sockets.h>
+
+#define IO_TIMEOUT_SEC 30
+#define MAX_STRERR_LEN 256
+#define MAX_TEST_NAME 80
+
+/* workaround for older vm_sockets.h */
+#ifndef VMADDR_CID_LOCAL
+#define VMADDR_CID_LOCAL 1
+#endif
+
+#define __always_unused        __attribute__((__unused__))
+
+#define _FAIL(errnum, fmt...)                                                  \
+       ({                                                                     \
+               error_at_line(0, (errnum), __func__, __LINE__, fmt);           \
+               CHECK_FAIL(true);                                              \
+       })
+#define FAIL(fmt...) _FAIL(0, fmt)
+#define FAIL_ERRNO(fmt...) _FAIL(errno, fmt)
+#define FAIL_LIBBPF(err, msg)                                                  \
+       ({                                                                     \
+               char __buf[MAX_STRERR_LEN];                                    \
+               libbpf_strerror((err), __buf, sizeof(__buf));                  \
+               FAIL("%s: %s", (msg), __buf);                                  \
+       })
+
+/* Wrappers that fail the test on error and report it. */
+
+#define xaccept_nonblock(fd, addr, len)                                        \
+       ({                                                                     \
+               int __ret =                                                    \
+                       accept_timeout((fd), (addr), (len), IO_TIMEOUT_SEC);   \
+               if (__ret == -1)                                               \
+                       FAIL_ERRNO("accept");                                  \
+               __ret;                                                         \
+       })
+
+#define xbind(fd, addr, len)                                                   \
+       ({                                                                     \
+               int __ret = bind((fd), (addr), (len));                         \
+               if (__ret == -1)                                               \
+                       FAIL_ERRNO("bind");                                    \
+               __ret;                                                         \
+       })
+
+#define xclose(fd)                                                             \
+       ({                                                                     \
+               int __ret = close((fd));                                       \
+               if (__ret == -1)                                               \
+                       FAIL_ERRNO("close");                                   \
+               __ret;                                                         \
+       })
+
+#define xconnect(fd, addr, len)                                                \
+       ({                                                                     \
+               int __ret = connect((fd), (addr), (len));                      \
+               if (__ret == -1)                                               \
+                       FAIL_ERRNO("connect");                                 \
+               __ret;                                                         \
+       })
+
+#define xgetsockname(fd, addr, len)                                            \
+       ({                                                                     \
+               int __ret = getsockname((fd), (addr), (len));                  \
+               if (__ret == -1)                                               \
+                       FAIL_ERRNO("getsockname");                             \
+               __ret;                                                         \
+       })
+
+#define xgetsockopt(fd, level, name, val, len)                                 \
+       ({                                                                     \
+               int __ret = getsockopt((fd), (level), (name), (val), (len));   \
+               if (__ret == -1)                                               \
+                       FAIL_ERRNO("getsockopt(" #name ")");                   \
+               __ret;                                                         \
+       })
+
+#define xlisten(fd, backlog)                                                   \
+       ({                                                                     \
+               int __ret = listen((fd), (backlog));                           \
+               if (__ret == -1)                                               \
+                       FAIL_ERRNO("listen");                                  \
+               __ret;                                                         \
+       })
+
+#define xsetsockopt(fd, level, name, val, len)                                 \
+       ({                                                                     \
+               int __ret = setsockopt((fd), (level), (name), (val), (len));   \
+               if (__ret == -1)                                               \
+                       FAIL_ERRNO("setsockopt(" #name ")");                   \
+               __ret;                                                         \
+       })
+
+#define xsend(fd, buf, len, flags)                                             \
+       ({                                                                     \
+               ssize_t __ret = send((fd), (buf), (len), (flags));             \
+               if (__ret == -1)                                               \
+                       FAIL_ERRNO("send");                                    \
+               __ret;                                                         \
+       })
+
+#define xrecv_nonblock(fd, buf, len, flags)                                    \
+       ({                                                                     \
+               ssize_t __ret = recv_timeout((fd), (buf), (len), (flags),      \
+                                            IO_TIMEOUT_SEC);                  \
+               if (__ret == -1)                                               \
+                       FAIL_ERRNO("recv");                                    \
+               __ret;                                                         \
+       })
+
+#define xsocket(family, sotype, flags)                                         \
+       ({                                                                     \
+               int __ret = socket(family, sotype, flags);                     \
+               if (__ret == -1)                                               \
+                       FAIL_ERRNO("socket");                                  \
+               __ret;                                                         \
+       })
+
+#define xbpf_map_delete_elem(fd, key)                                          \
+       ({                                                                     \
+               int __ret = bpf_map_delete_elem((fd), (key));                  \
+               if (__ret < 0)                                               \
+                       FAIL_ERRNO("map_delete");                              \
+               __ret;                                                         \
+       })
+
+#define xbpf_map_lookup_elem(fd, key, val)                                     \
+       ({                                                                     \
+               int __ret = bpf_map_lookup_elem((fd), (key), (val));           \
+               if (__ret < 0)                                               \
+                       FAIL_ERRNO("map_lookup");                              \
+               __ret;                                                         \
+       })
+
+#define xbpf_map_update_elem(fd, key, val, flags)                              \
+       ({                                                                     \
+               int __ret = bpf_map_update_elem((fd), (key), (val), (flags));  \
+               if (__ret < 0)                                               \
+                       FAIL_ERRNO("map_update");                              \
+               __ret;                                                         \
+       })
+
+#define xbpf_prog_attach(prog, target, type, flags)                            \
+       ({                                                                     \
+               int __ret =                                                    \
+                       bpf_prog_attach((prog), (target), (type), (flags));    \
+               if (__ret < 0)                                               \
+                       FAIL_ERRNO("prog_attach(" #type ")");                  \
+               __ret;                                                         \
+       })
+
+#define xbpf_prog_detach2(prog, target, type)                                  \
+       ({                                                                     \
+               int __ret = bpf_prog_detach2((prog), (target), (type));        \
+               if (__ret < 0)                                               \
+                       FAIL_ERRNO("prog_detach2(" #type ")");                 \
+               __ret;                                                         \
+       })
+
+#define xpthread_create(thread, attr, func, arg)                               \
+       ({                                                                     \
+               int __ret = pthread_create((thread), (attr), (func), (arg));   \
+               errno = __ret;                                                 \
+               if (__ret)                                                     \
+                       FAIL_ERRNO("pthread_create");                          \
+               __ret;                                                         \
+       })
+
+#define xpthread_join(thread, retval)                                          \
+       ({                                                                     \
+               int __ret = pthread_join((thread), (retval));                  \
+               errno = __ret;                                                 \
+               if (__ret)                                                     \
+                       FAIL_ERRNO("pthread_join");                            \
+               __ret;                                                         \
+       })
+
+static inline int poll_read(int fd, unsigned int timeout_sec)
+{
+       struct timeval timeout = { .tv_sec = timeout_sec };
+       fd_set rfds;
+       int r;
+
+       FD_ZERO(&rfds);
+       FD_SET(fd, &rfds);
+
+       r = select(fd + 1, &rfds, NULL, NULL, &timeout);
+       if (r == 0)
+               errno = ETIME;
+
+       return r == 1 ? 0 : -1;
+}
+
+static inline int accept_timeout(int fd, struct sockaddr *addr, socklen_t *len,
+                                unsigned int timeout_sec)
+{
+       if (poll_read(fd, timeout_sec))
+               return -1;
+
+       return accept(fd, addr, len);
+}
+
+static inline int recv_timeout(int fd, void *buf, size_t len, int flags,
+                              unsigned int timeout_sec)
+{
+       if (poll_read(fd, timeout_sec))
+               return -1;
+
+       return recv(fd, buf, len, flags);
+}
+
+static inline void init_addr_loopback4(struct sockaddr_storage *ss,
+                                      socklen_t *len)
+{
+       struct sockaddr_in *addr4 = memset(ss, 0, sizeof(*ss));
+
+       addr4->sin_family = AF_INET;
+       addr4->sin_port = 0;
+       addr4->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+       *len = sizeof(*addr4);
+}
+
+static inline void init_addr_loopback6(struct sockaddr_storage *ss,
+                                      socklen_t *len)
+{
+       struct sockaddr_in6 *addr6 = memset(ss, 0, sizeof(*ss));
+
+       addr6->sin6_family = AF_INET6;
+       addr6->sin6_port = 0;
+       addr6->sin6_addr = in6addr_loopback;
+       *len = sizeof(*addr6);
+}
+
+static inline void init_addr_loopback_vsock(struct sockaddr_storage *ss,
+                                           socklen_t *len)
+{
+       struct sockaddr_vm *addr = memset(ss, 0, sizeof(*ss));
+
+       addr->svm_family = AF_VSOCK;
+       addr->svm_port = VMADDR_PORT_ANY;
+       addr->svm_cid = VMADDR_CID_LOCAL;
+       *len = sizeof(*addr);
+}
+
+static inline void init_addr_loopback(int family, struct sockaddr_storage *ss,
+                                     socklen_t *len)
+{
+       switch (family) {
+       case AF_INET:
+               init_addr_loopback4(ss, len);
+               return;
+       case AF_INET6:
+               init_addr_loopback6(ss, len);
+               return;
+       case AF_VSOCK:
+               init_addr_loopback_vsock(ss, len);
+               return;
+       default:
+               FAIL("unsupported address family %d", family);
+       }
+}
+
+static inline struct sockaddr *sockaddr(struct sockaddr_storage *ss)
+{
+       return (struct sockaddr *)ss;
+}
+
+static inline int add_to_sockmap(int sock_mapfd, int fd1, int fd2)
+{
+       u64 value;
+       u32 key;
+       int err;
+
+       key = 0;
+       value = fd1;
+       err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+       if (err)
+               return err;
+
+       key = 1;
+       value = fd2;
+       return xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+}
+
+static inline int create_pair(int s, int family, int sotype, int *c, int *p)
+{
+       struct sockaddr_storage addr;
+       socklen_t len;
+       int err = 0;
+
+       len = sizeof(addr);
+       err = xgetsockname(s, sockaddr(&addr), &len);
+       if (err)
+               return err;
+
+       *c = xsocket(family, sotype, 0);
+       if (*c < 0)
+               return errno;
+       err = xconnect(*c, sockaddr(&addr), len);
+       if (err) {
+               err = errno;
+               goto close_cli0;
+       }
+
+       *p = xaccept_nonblock(s, NULL, NULL);
+       if (*p < 0) {
+               err = errno;
+               goto close_cli0;
+       }
+       return err;
+close_cli0:
+       close(*c);
+       return err;
+}
+
+static inline int create_socket_pairs(int s, int family, int sotype,
+                                     int *c0, int *c1, int *p0, int *p1)
+{
+       int err;
+
+       err = create_pair(s, family, sotype, c0, p0);
+       if (err)
+               return err;
+
+       err = create_pair(s, family, sotype, c1, p1);
+       if (err) {
+               close(*c0);
+               close(*p0);
+       }
+       return err;
+}
+
+static inline int enable_reuseport(int s, int progfd)
+{
+       int err, one = 1;
+
+       err = xsetsockopt(s, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
+       if (err)
+               return -1;
+       err = xsetsockopt(s, SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF, &progfd,
+                         sizeof(progfd));
+       if (err)
+               return -1;
+
+       return 0;
+}
+
+static inline int socket_loopback_reuseport(int family, int sotype, int progfd)
+{
+       struct sockaddr_storage addr;
+       socklen_t len;
+       int err, s;
+
+       init_addr_loopback(family, &addr, &len);
+
+       s = xsocket(family, sotype, 0);
+       if (s == -1)
+               return -1;
+
+       if (progfd >= 0)
+               enable_reuseport(s, progfd);
+
+       err = xbind(s, sockaddr(&addr), len);
+       if (err)
+               goto close;
+
+       if (sotype & SOCK_DGRAM)
+               return s;
+
+       err = xlisten(s, SOMAXCONN);
+       if (err)
+               goto close;
+
+       return s;
+close:
+       xclose(s);
+       return -1;
+}
+
+static inline int socket_loopback(int family, int sotype)
+{
+       return socket_loopback_reuseport(family, sotype, -1);
+}
+
+#endif // __SOCKMAP_HELPERS__
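
Taken together, the helpers above cover the setup sequence the sockmap tests repeat: create a bound and listening loopback socket, connect a client/peer pair against it, then publish the sockets in a sockmap. A hedged usage sketch (hypothetical example_setup(), assumed to be compiled alongside the prog_tests so test_progs.h and libbpf are available):

    #include "sockmap_helpers.h"

    /* Not a real test in the tree; just the typical setup flow. */
    static int example_setup(int sock_mapfd)
    {
            int s, c, p, err;

            s = socket_loopback(AF_INET, SOCK_STREAM);      /* bound + listening */
            if (s < 0)
                    return -1;

            err = create_pair(s, AF_INET, SOCK_STREAM, &c, &p);
            if (err)
                    goto close_srv;

            /* Keys 0 and 1 now hold the two connected sockets. */
            err = add_to_sockmap(sock_mapfd, c, p);

            /* ... send/recv assertions against c and p would go here ... */

            xclose(c);
            xclose(p);
    close_srv:
            xclose(s);
            return err;
    }
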
index 141c1e5..b4f6f3a 100644 (file)
 #include <unistd.h>
 #include <linux/vm_sockets.h>
 
-/* workaround for older vm_sockets.h */
-#ifndef VMADDR_CID_LOCAL
-#define VMADDR_CID_LOCAL 1
-#endif
-
 #include <bpf/bpf.h>
 #include <bpf/libbpf.h>
 
 #include "test_progs.h"
 #include "test_sockmap_listen.skel.h"
 
-#define IO_TIMEOUT_SEC 30
-#define MAX_STRERR_LEN 256
-#define MAX_TEST_NAME 80
-
-#define __always_unused        __attribute__((__unused__))
-
-#define _FAIL(errnum, fmt...)                                                  \
-       ({                                                                     \
-               error_at_line(0, (errnum), __func__, __LINE__, fmt);           \
-               CHECK_FAIL(true);                                              \
-       })
-#define FAIL(fmt...) _FAIL(0, fmt)
-#define FAIL_ERRNO(fmt...) _FAIL(errno, fmt)
-#define FAIL_LIBBPF(err, msg)                                                  \
-       ({                                                                     \
-               char __buf[MAX_STRERR_LEN];                                    \
-               libbpf_strerror((err), __buf, sizeof(__buf));                  \
-               FAIL("%s: %s", (msg), __buf);                                  \
-       })
-
-/* Wrappers that fail the test on error and report it. */
-
-#define xaccept_nonblock(fd, addr, len)                                        \
-       ({                                                                     \
-               int __ret =                                                    \
-                       accept_timeout((fd), (addr), (len), IO_TIMEOUT_SEC);   \
-               if (__ret == -1)                                               \
-                       FAIL_ERRNO("accept");                                  \
-               __ret;                                                         \
-       })
-
-#define xbind(fd, addr, len)                                                   \
-       ({                                                                     \
-               int __ret = bind((fd), (addr), (len));                         \
-               if (__ret == -1)                                               \
-                       FAIL_ERRNO("bind");                                    \
-               __ret;                                                         \
-       })
-
-#define xclose(fd)                                                             \
-       ({                                                                     \
-               int __ret = close((fd));                                       \
-               if (__ret == -1)                                               \
-                       FAIL_ERRNO("close");                                   \
-               __ret;                                                         \
-       })
-
-#define xconnect(fd, addr, len)                                                \
-       ({                                                                     \
-               int __ret = connect((fd), (addr), (len));                      \
-               if (__ret == -1)                                               \
-                       FAIL_ERRNO("connect");                                 \
-               __ret;                                                         \
-       })
-
-#define xgetsockname(fd, addr, len)                                            \
-       ({                                                                     \
-               int __ret = getsockname((fd), (addr), (len));                  \
-               if (__ret == -1)                                               \
-                       FAIL_ERRNO("getsockname");                             \
-               __ret;                                                         \
-       })
-
-#define xgetsockopt(fd, level, name, val, len)                                 \
-       ({                                                                     \
-               int __ret = getsockopt((fd), (level), (name), (val), (len));   \
-               if (__ret == -1)                                               \
-                       FAIL_ERRNO("getsockopt(" #name ")");                   \
-               __ret;                                                         \
-       })
-
-#define xlisten(fd, backlog)                                                   \
-       ({                                                                     \
-               int __ret = listen((fd), (backlog));                           \
-               if (__ret == -1)                                               \
-                       FAIL_ERRNO("listen");                                  \
-               __ret;                                                         \
-       })
-
-#define xsetsockopt(fd, level, name, val, len)                                 \
-       ({                                                                     \
-               int __ret = setsockopt((fd), (level), (name), (val), (len));   \
-               if (__ret == -1)                                               \
-                       FAIL_ERRNO("setsockopt(" #name ")");                   \
-               __ret;                                                         \
-       })
-
-#define xsend(fd, buf, len, flags)                                             \
-       ({                                                                     \
-               ssize_t __ret = send((fd), (buf), (len), (flags));             \
-               if (__ret == -1)                                               \
-                       FAIL_ERRNO("send");                                    \
-               __ret;                                                         \
-       })
-
-#define xrecv_nonblock(fd, buf, len, flags)                                    \
-       ({                                                                     \
-               ssize_t __ret = recv_timeout((fd), (buf), (len), (flags),      \
-                                            IO_TIMEOUT_SEC);                  \
-               if (__ret == -1)                                               \
-                       FAIL_ERRNO("recv");                                    \
-               __ret;                                                         \
-       })
-
-#define xsocket(family, sotype, flags)                                         \
-       ({                                                                     \
-               int __ret = socket(family, sotype, flags);                     \
-               if (__ret == -1)                                               \
-                       FAIL_ERRNO("socket");                                  \
-               __ret;                                                         \
-       })
-
-#define xbpf_map_delete_elem(fd, key)                                          \
-       ({                                                                     \
-               int __ret = bpf_map_delete_elem((fd), (key));                  \
-               if (__ret < 0)                                               \
-                       FAIL_ERRNO("map_delete");                              \
-               __ret;                                                         \
-       })
-
-#define xbpf_map_lookup_elem(fd, key, val)                                     \
-       ({                                                                     \
-               int __ret = bpf_map_lookup_elem((fd), (key), (val));           \
-               if (__ret < 0)                                               \
-                       FAIL_ERRNO("map_lookup");                              \
-               __ret;                                                         \
-       })
-
-#define xbpf_map_update_elem(fd, key, val, flags)                              \
-       ({                                                                     \
-               int __ret = bpf_map_update_elem((fd), (key), (val), (flags));  \
-               if (__ret < 0)                                               \
-                       FAIL_ERRNO("map_update");                              \
-               __ret;                                                         \
-       })
-
-#define xbpf_prog_attach(prog, target, type, flags)                            \
-       ({                                                                     \
-               int __ret =                                                    \
-                       bpf_prog_attach((prog), (target), (type), (flags));    \
-               if (__ret < 0)                                               \
-                       FAIL_ERRNO("prog_attach(" #type ")");                  \
-               __ret;                                                         \
-       })
-
-#define xbpf_prog_detach2(prog, target, type)                                  \
-       ({                                                                     \
-               int __ret = bpf_prog_detach2((prog), (target), (type));        \
-               if (__ret < 0)                                               \
-                       FAIL_ERRNO("prog_detach2(" #type ")");                 \
-               __ret;                                                         \
-       })
-
-#define xpthread_create(thread, attr, func, arg)                               \
-       ({                                                                     \
-               int __ret = pthread_create((thread), (attr), (func), (arg));   \
-               errno = __ret;                                                 \
-               if (__ret)                                                     \
-                       FAIL_ERRNO("pthread_create");                          \
-               __ret;                                                         \
-       })
-
-#define xpthread_join(thread, retval)                                          \
-       ({                                                                     \
-               int __ret = pthread_join((thread), (retval));                  \
-               errno = __ret;                                                 \
-               if (__ret)                                                     \
-                       FAIL_ERRNO("pthread_join");                            \
-               __ret;                                                         \
-       })
-
-static int poll_read(int fd, unsigned int timeout_sec)
-{
-       struct timeval timeout = { .tv_sec = timeout_sec };
-       fd_set rfds;
-       int r;
-
-       FD_ZERO(&rfds);
-       FD_SET(fd, &rfds);
-
-       r = select(fd + 1, &rfds, NULL, NULL, &timeout);
-       if (r == 0)
-               errno = ETIME;
-
-       return r == 1 ? 0 : -1;
-}
-
-static int accept_timeout(int fd, struct sockaddr *addr, socklen_t *len,
-                         unsigned int timeout_sec)
-{
-       if (poll_read(fd, timeout_sec))
-               return -1;
-
-       return accept(fd, addr, len);
-}
-
-static int recv_timeout(int fd, void *buf, size_t len, int flags,
-                       unsigned int timeout_sec)
-{
-       if (poll_read(fd, timeout_sec))
-               return -1;
-
-       return recv(fd, buf, len, flags);
-}
-
-static void init_addr_loopback4(struct sockaddr_storage *ss, socklen_t *len)
-{
-       struct sockaddr_in *addr4 = memset(ss, 0, sizeof(*ss));
-
-       addr4->sin_family = AF_INET;
-       addr4->sin_port = 0;
-       addr4->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
-       *len = sizeof(*addr4);
-}
-
-static void init_addr_loopback6(struct sockaddr_storage *ss, socklen_t *len)
-{
-       struct sockaddr_in6 *addr6 = memset(ss, 0, sizeof(*ss));
-
-       addr6->sin6_family = AF_INET6;
-       addr6->sin6_port = 0;
-       addr6->sin6_addr = in6addr_loopback;
-       *len = sizeof(*addr6);
-}
-
-static void init_addr_loopback_vsock(struct sockaddr_storage *ss, socklen_t *len)
-{
-       struct sockaddr_vm *addr = memset(ss, 0, sizeof(*ss));
-
-       addr->svm_family = AF_VSOCK;
-       addr->svm_port = VMADDR_PORT_ANY;
-       addr->svm_cid = VMADDR_CID_LOCAL;
-       *len = sizeof(*addr);
-}
-
-static void init_addr_loopback(int family, struct sockaddr_storage *ss,
-                              socklen_t *len)
-{
-       switch (family) {
-       case AF_INET:
-               init_addr_loopback4(ss, len);
-               return;
-       case AF_INET6:
-               init_addr_loopback6(ss, len);
-               return;
-       case AF_VSOCK:
-               init_addr_loopback_vsock(ss, len);
-               return;
-       default:
-               FAIL("unsupported address family %d", family);
-       }
-}
-
-static inline struct sockaddr *sockaddr(struct sockaddr_storage *ss)
-{
-       return (struct sockaddr *)ss;
-}
-
-static int enable_reuseport(int s, int progfd)
-{
-       int err, one = 1;
-
-       err = xsetsockopt(s, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
-       if (err)
-               return -1;
-       err = xsetsockopt(s, SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF, &progfd,
-                         sizeof(progfd));
-       if (err)
-               return -1;
-
-       return 0;
-}
-
-static int socket_loopback_reuseport(int family, int sotype, int progfd)
-{
-       struct sockaddr_storage addr;
-       socklen_t len;
-       int err, s;
-
-       init_addr_loopback(family, &addr, &len);
-
-       s = xsocket(family, sotype, 0);
-       if (s == -1)
-               return -1;
-
-       if (progfd >= 0)
-               enable_reuseport(s, progfd);
-
-       err = xbind(s, sockaddr(&addr), len);
-       if (err)
-               goto close;
-
-       if (sotype & SOCK_DGRAM)
-               return s;
-
-       err = xlisten(s, SOMAXCONN);
-       if (err)
-               goto close;
-
-       return s;
-close:
-       xclose(s);
-       return -1;
-}
-
-static int socket_loopback(int family, int sotype)
-{
-       return socket_loopback_reuseport(family, sotype, -1);
-}
+#include "sockmap_helpers.h"
 
 static void test_insert_invalid(struct test_sockmap_listen *skel __always_unused,
                                int family, int sotype, int mapfd)
@@ -984,31 +671,12 @@ static const char *redir_mode_str(enum redir_mode mode)
        }
 }
 
-static int add_to_sockmap(int sock_mapfd, int fd1, int fd2)
-{
-       u64 value;
-       u32 key;
-       int err;
-
-       key = 0;
-       value = fd1;
-       err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
-       if (err)
-               return err;
-
-       key = 1;
-       value = fd2;
-       return xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
-}
-
 static void redir_to_connected(int family, int sotype, int sock_mapfd,
                               int verd_mapfd, enum redir_mode mode)
 {
        const char *log_prefix = redir_mode_str(mode);
-       struct sockaddr_storage addr;
        int s, c0, c1, p0, p1;
        unsigned int pass;
-       socklen_t len;
        int err, n;
        u32 key;
        char b;
@@ -1019,36 +687,13 @@ static void redir_to_connected(int family, int sotype, int sock_mapfd,
        if (s < 0)
                return;
 
-       len = sizeof(addr);
-       err = xgetsockname(s, sockaddr(&addr), &len);
+       err = create_socket_pairs(s, family, sotype, &c0, &c1, &p0, &p1);
        if (err)
                goto close_srv;
 
-       c0 = xsocket(family, sotype, 0);
-       if (c0 < 0)
-               goto close_srv;
-       err = xconnect(c0, sockaddr(&addr), len);
-       if (err)
-               goto close_cli0;
-
-       p0 = xaccept_nonblock(s, NULL, NULL);
-       if (p0 < 0)
-               goto close_cli0;
-
-       c1 = xsocket(family, sotype, 0);
-       if (c1 < 0)
-               goto close_peer0;
-       err = xconnect(c1, sockaddr(&addr), len);
-       if (err)
-               goto close_cli1;
-
-       p1 = xaccept_nonblock(s, NULL, NULL);
-       if (p1 < 0)
-               goto close_cli1;
-
        err = add_to_sockmap(sock_mapfd, p0, p1);
        if (err)
-               goto close_peer1;
+               goto close;
 
        n = write(mode == REDIR_INGRESS ? c1 : p1, "a", 1);
        if (n < 0)
@@ -1056,12 +701,12 @@ static void redir_to_connected(int family, int sotype, int sock_mapfd,
        if (n == 0)
                FAIL("%s: incomplete write", log_prefix);
        if (n < 1)
-               goto close_peer1;
+               goto close;
 
        key = SK_PASS;
        err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass);
        if (err)
-               goto close_peer1;
+               goto close;
        if (pass != 1)
                FAIL("%s: want pass count 1, have %d", log_prefix, pass);
        n = recv_timeout(c0, &b, 1, 0, IO_TIMEOUT_SEC);
@@ -1070,13 +715,10 @@ static void redir_to_connected(int family, int sotype, int sock_mapfd,
        if (n == 0)
                FAIL("%s: incomplete recv", log_prefix);
 
-close_peer1:
+close:
        xclose(p1);
-close_cli1:
        xclose(c1);
-close_peer0:
        xclose(p0);
-close_cli0:
        xclose(c0);
 close_srv:
        xclose(s);
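
The open-coded setup deleted above (getsockname on the listener, two connects, two accepts) is exactly what the new shared helper is expected to provide. A rough sketch of what create_socket_pairs() in sockmap_helpers.h presumably looks like follows — the signature and cleanup labels are assumptions for illustration, not the verbatim helper, and it reuses the x*() wrappers already visible in the removed code:

/* Sketch only: connect two clients to listening socket s and accept
 * their peer sockets, mirroring the boilerplate removed above.
 */
static int create_socket_pairs(int s, int family, int sotype,
			       int *c0, int *c1, int *p0, int *p1)
{
	struct sockaddr_storage addr;
	socklen_t len = sizeof(addr);
	int err;

	err = xgetsockname(s, sockaddr(&addr), &len);
	if (err)
		return -1;

	*c0 = xsocket(family, sotype, 0);
	if (*c0 < 0)
		return -1;
	err = xconnect(*c0, sockaddr(&addr), len);
	if (err)
		goto close_c0;

	*p0 = xaccept_nonblock(s, NULL, NULL);
	if (*p0 < 0)
		goto close_c0;

	*c1 = xsocket(family, sotype, 0);
	if (*c1 < 0)
		goto close_p0;
	err = xconnect(*c1, sockaddr(&addr), len);
	if (err)
		goto close_c1;

	*p1 = xaccept_nonblock(s, NULL, NULL);
	if (*p1 < 0)
		goto close_c1;

	return 0;
close_c1:
	xclose(*c1);
close_p0:
	xclose(*p0);
close_c0:
	xclose(*c0);
	return -1;
}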
index 4512dd8..05d0e07 100644 (file)
@@ -209,7 +209,7 @@ static int getsetsockopt(void)
                        err, errno);
                goto err;
        }
-       ASSERT_EQ(optlen, 4, "Unexpected NETLINK_LIST_MEMBERSHIPS value");
+       ASSERT_EQ(optlen, 8, "Unexpected NETLINK_LIST_MEMBERSHIPS value");
 
        free(big_buf);
        close(fd);
diff --git a/tools/testing/selftests/bpf/prog_tests/subprogs_extable.c b/tools/testing/selftests/bpf/prog_tests/subprogs_extable.c
new file mode 100644 (file)
index 0000000..3afd9f7
--- /dev/null
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+#include "test_subprogs_extable.skel.h"
+
+void test_subprogs_extable(void)
+{
+       const int read_sz = 456;
+       struct test_subprogs_extable *skel;
+       int err;
+
+       skel = test_subprogs_extable__open_and_load();
+       if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+               return;
+
+       err = test_subprogs_extable__attach(skel);
+       if (!ASSERT_OK(err, "skel_attach"))
+               goto cleanup;
+
+       /* trigger tracepoint */
+       ASSERT_OK(trigger_module_test_read(read_sz), "trigger_read");
+
+       ASSERT_NEQ(skel->bss->triggered, 0, "verify at least one program ran");
+
+       test_subprogs_extable__detach(skel);
+
+cleanup:
+       test_subprogs_extable__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/inner_array_lookup.c b/tools/testing/selftests/bpf/progs/inner_array_lookup.c
new file mode 100644 (file)
index 0000000..c2c8f2f
--- /dev/null
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+struct inner_map {
+       __uint(type, BPF_MAP_TYPE_ARRAY);
+       __uint(max_entries, 5);
+       __type(key, int);
+       __type(value, int);
+} inner_map1 SEC(".maps");
+
+struct outer_map {
+       __uint(type, BPF_MAP_TYPE_HASH_OF_MAPS);
+       __uint(max_entries, 3);
+       __type(key, int);
+       __array(values, struct inner_map);
+} outer_map1 SEC(".maps") = {
+       .values = {
+               [2] = &inner_map1,
+       },
+};
+
+SEC("raw_tp/sys_enter")
+int handle__sys_enter(void *ctx)
+{
+       int outer_key = 2, inner_key = 3;
+       int *val;
+       void *map;
+
+       map = bpf_map_lookup_elem(&outer_map1, &outer_key);
+       if (!map)
+               return 1;
+
+       val = bpf_map_lookup_elem(map, &inner_key);
+       if (!val)
+               return 1;
+
+       if (*val == 1)
+               *val = 2;
+
+       return 0;
+}
+
+char _license[] SEC("license") = "GPL";
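
The BPF side above only flips inner_map1[3] from 1 to 2 when the raw tracepoint fires. A hypothetical userspace harness — file and skeleton names below follow the usual selftest naming but are assumptions, not the patch itself — could drive it roughly like this:

/* Hypothetical harness sketch; assumes the conventional
 * inner_array_lookup.skel.h skeleton generated from the program above.
 */
#include <unistd.h>
#include <bpf/bpf.h>
#include <test_progs.h>
#include "inner_array_lookup.skel.h"

void test_inner_array_lookup(void)
{
	struct inner_array_lookup *skel;
	int map_fd, key = 3, val = 1;

	skel = inner_array_lookup__open_and_load();
	if (!ASSERT_OK_PTR(skel, "open_and_load"))
		return;

	/* Seed the inner array so the program has something to update. */
	map_fd = bpf_map__fd(skel->maps.inner_map1);
	bpf_map_update_elem(map_fd, &key, &val, BPF_ANY);

	if (!ASSERT_OK(inner_array_lookup__attach(skel), "attach"))
		goto cleanup;

	usleep(1); /* any syscall triggers raw_tp/sys_enter */

	bpf_map_lookup_elem(map_fd, &key, &val);
	ASSERT_EQ(val, 2, "inner map value updated through outer lookup");

cleanup:
	inner_array_lookup__destroy(skel);
}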
diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_drop_prog.c b/tools/testing/selftests/bpf/progs/test_sockmap_drop_prog.c
new file mode 100644 (file)
index 0000000..2931480
--- /dev/null
@@ -0,0 +1,32 @@
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+struct {
+       __uint(type, BPF_MAP_TYPE_SOCKMAP);
+       __uint(max_entries, 20);
+       __type(key, int);
+       __type(value, int);
+} sock_map_rx SEC(".maps");
+
+struct {
+       __uint(type, BPF_MAP_TYPE_SOCKMAP);
+       __uint(max_entries, 20);
+       __type(key, int);
+       __type(value, int);
+} sock_map_tx SEC(".maps");
+
+struct {
+       __uint(type, BPF_MAP_TYPE_SOCKMAP);
+       __uint(max_entries, 20);
+       __type(key, int);
+       __type(value, int);
+} sock_map_msg SEC(".maps");
+
+SEC("sk_skb")
+int prog_skb_verdict(struct __sk_buff *skb)
+{
+       return SK_DROP;
+}
+
+char _license[] SEC("license") = "GPL";
index baf9ebc..99d2ea9 100644 (file)
@@ -191,7 +191,7 @@ SEC("sockops")
 int bpf_sockmap(struct bpf_sock_ops *skops)
 {
        __u32 lport, rport;
-       int op, err, ret;
+       int op, ret;
 
        op = (int) skops->op;
 
@@ -203,10 +203,10 @@ int bpf_sockmap(struct bpf_sock_ops *skops)
                if (lport == 10000) {
                        ret = 1;
 #ifdef SOCKMAP
-                       err = bpf_sock_map_update(skops, &sock_map, &ret,
+                       bpf_sock_map_update(skops, &sock_map, &ret,
                                                  BPF_NOEXIST);
 #else
-                       err = bpf_sock_hash_update(skops, &sock_map, &ret,
+                       bpf_sock_hash_update(skops, &sock_map, &ret,
                                                   BPF_NOEXIST);
 #endif
                }
@@ -218,10 +218,10 @@ int bpf_sockmap(struct bpf_sock_ops *skops)
                if (bpf_ntohl(rport) == 10001) {
                        ret = 10;
 #ifdef SOCKMAP
-                       err = bpf_sock_map_update(skops, &sock_map, &ret,
+                       bpf_sock_map_update(skops, &sock_map, &ret,
                                                  BPF_NOEXIST);
 #else
-                       err = bpf_sock_hash_update(skops, &sock_map, &ret,
+                       bpf_sock_hash_update(skops, &sock_map, &ret,
                                                   BPF_NOEXIST);
 #endif
                }
@@ -230,8 +230,6 @@ int bpf_sockmap(struct bpf_sock_ops *skops)
                break;
        }
 
-       __sink(err);
-
        return 0;
 }
 
diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_pass_prog.c b/tools/testing/selftests/bpf/progs/test_sockmap_pass_prog.c
new file mode 100644 (file)
index 0000000..1d86a71
--- /dev/null
@@ -0,0 +1,32 @@
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+struct {
+       __uint(type, BPF_MAP_TYPE_SOCKMAP);
+       __uint(max_entries, 20);
+       __type(key, int);
+       __type(value, int);
+} sock_map_rx SEC(".maps");
+
+struct {
+       __uint(type, BPF_MAP_TYPE_SOCKMAP);
+       __uint(max_entries, 20);
+       __type(key, int);
+       __type(value, int);
+} sock_map_tx SEC(".maps");
+
+struct {
+       __uint(type, BPF_MAP_TYPE_SOCKMAP);
+       __uint(max_entries, 20);
+       __type(key, int);
+       __type(value, int);
+} sock_map_msg SEC(".maps");
+
+SEC("sk_skb")
+int prog_skb_verdict(struct __sk_buff *skb)
+{
+       return SK_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
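
These minimal SK_PASS/SK_DROP verdict programs (and their three sockmaps) are presumably wired up by the userspace side of the sockmap tests. As a sketch of how an sk_skb verdict program is typically attached to a sockmap with libbpf — the object path and the map/program names here are assumptions for illustration:

/* Sketch: attach an sk_skb verdict program to a sockmap. */
#include <bpf/bpf.h>
#include <bpf/libbpf.h>

static int attach_skb_verdict(const char *obj_path)
{
	struct bpf_object *obj;
	struct bpf_program *prog;
	int map_fd, prog_fd;

	obj = bpf_object__open_file(obj_path, NULL);
	if (!obj || bpf_object__load(obj))
		return -1;

	prog = bpf_object__find_program_by_name(obj, "prog_skb_verdict");
	map_fd = bpf_object__find_map_fd_by_name(obj, "sock_map_rx");
	if (!prog || map_fd < 0)
		return -1;
	prog_fd = bpf_program__fd(prog);

	/* Every skb on sockets held in sock_map_rx now gets this verdict. */
	return bpf_prog_attach(prog_fd, map_fd, BPF_SK_SKB_STREAM_VERDICT, 0);
}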
diff --git a/tools/testing/selftests/bpf/progs/test_subprogs_extable.c b/tools/testing/selftests/bpf/progs/test_subprogs_extable.c
new file mode 100644 (file)
index 0000000..e2a21fb
--- /dev/null
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+struct {
+       __uint(type, BPF_MAP_TYPE_ARRAY);
+       __uint(max_entries, 8);
+       __type(key, __u32);
+       __type(value, __u64);
+} test_array SEC(".maps");
+
+unsigned int triggered;
+
+static __u64 test_cb(struct bpf_map *map, __u32 *key, __u64 *val, void *data)
+{
+       return 1;
+}
+
+SEC("fexit/bpf_testmod_return_ptr")
+int BPF_PROG(handle_fexit_ret_subprogs, int arg, struct file *ret)
+{
+       *(volatile long *)ret;
+       *(volatile int *)&ret->f_mode;
+       bpf_for_each_map_elem(&test_array, test_cb, NULL, 0);
+       triggered++;
+       return 0;
+}
+
+SEC("fexit/bpf_testmod_return_ptr")
+int BPF_PROG(handle_fexit_ret_subprogs2, int arg, struct file *ret)
+{
+       *(volatile long *)ret;
+       *(volatile int *)&ret->f_mode;
+       bpf_for_each_map_elem(&test_array, test_cb, NULL, 0);
+       triggered++;
+       return 0;
+}
+
+SEC("fexit/bpf_testmod_return_ptr")
+int BPF_PROG(handle_fexit_ret_subprogs3, int arg, struct file *ret)
+{
+       *(volatile long *)ret;
+       *(volatile int *)&ret->f_mode;
+       bpf_for_each_map_elem(&test_array, test_cb, NULL, 0);
+       triggered++;
+       return 0;
+}
+
+char _license[] SEC("license") = "GPL";
index 136e553..6115520 100644 (file)
@@ -371,4 +371,83 @@ __naked void and_then_at_fp_8(void)
 "      ::: __clobber_all);
 }
 
+SEC("xdp")
+__description("32-bit spill of 64-bit reg should clear ID")
+__failure __msg("math between ctx pointer and 4294967295 is not allowed")
+__naked void spill_32bit_of_64bit_fail(void)
+{
+       asm volatile ("                                 \
+       r6 = r1;                                        \
+       /* Roll one bit to force the verifier to track both branches. */\
+       call %[bpf_get_prandom_u32];                    \
+       r0 &= 0x8;                                      \
+       /* Put a large number into r1. */               \
+       r1 = 0xffffffff;                                \
+       r1 <<= 32;                                      \
+       r1 += r0;                                       \
+       /* Assign an ID to r1. */                       \
+       r2 = r1;                                        \
+       /* 32-bit spill r1 to stack - should clear the ID! */\
+       *(u32*)(r10 - 8) = r1;                          \
+       /* 32-bit fill r2 from stack. */                \
+       r2 = *(u32*)(r10 - 8);                          \
+       /* Compare r2 with another register to trigger find_equal_scalars.\
+        * Having one random bit is important here, otherwise the verifier cuts\
+        * the corners. If the ID was mistakenly preserved on spill, this would\
+        * cause the verifier to think that r1 is also equal to zero in one of\
+        * the branches, and equal to eight on the other branch.\
+        */                                             \
+       r3 = 0;                                         \
+       if r2 != r3 goto l0_%=;                         \
+l0_%=: r1 >>= 32;                                      \
+       /* At this point, if the verifier thinks that r1 is 0, an out-of-bounds\
+        * read will happen, because it actually contains 0xffffffff.\
+        */                                             \
+       r6 += r1;                                       \
+       r0 = *(u32*)(r6 + 0);                           \
+       exit;                                           \
+"      :
+       : __imm(bpf_get_prandom_u32)
+       : __clobber_all);
+}
+
+SEC("xdp")
+__description("16-bit spill of 32-bit reg should clear ID")
+__failure __msg("dereference of modified ctx ptr R6 off=65535 disallowed")
+__naked void spill_16bit_of_32bit_fail(void)
+{
+       asm volatile ("                                 \
+       r6 = r1;                                        \
+       /* Roll one bit to force the verifier to track both branches. */\
+       call %[bpf_get_prandom_u32];                    \
+       r0 &= 0x8;                                      \
+       /* Put a large number into r1. */               \
+       w1 = 0xffff0000;                                \
+       r1 += r0;                                       \
+       /* Assign an ID to r1. */                       \
+       r2 = r1;                                        \
+       /* 16-bit spill r1 to stack - should clear the ID! */\
+       *(u16*)(r10 - 8) = r1;                          \
+       /* 16-bit fill r2 from stack. */                \
+       r2 = *(u16*)(r10 - 8);                          \
+       /* Compare r2 with another register to trigger find_equal_scalars.\
+        * Having one random bit is important here, otherwise the verifier cuts\
+        * the corners. If the ID was mistakenly preserved on spill, this would\
+        * cause the verifier to think that r1 is also equal to zero in one of\
+        * the branches, and equal to eight on the other branch.\
+        */                                             \
+       r3 = 0;                                         \
+       if r2 != r3 goto l0_%=;                         \
+l0_%=: r1 >>= 16;                                      \
+       /* At this point, if the verifier thinks that r1 is 0, an out-of-bounds\
+        * read will happen, because it actually contains 0xffff.\
+        */                                             \
+       r6 += r1;                                       \
+       r0 = *(u32*)(r6 + 0);                           \
+       exit;                                           \
+"      :
+       : __imm(bpf_get_prandom_u32)
+       : __clobber_all);
+}
+
 char _license[] SEC("license") = "GPL";
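
The inline comments above carry the verifier-level reasoning; as a plain C illustration of the underlying arithmetic (not part of the test itself), a 32-bit spill preserves only the low half of the register, so any equality learned about the spilled value must not be propagated back to the 64-bit source via a shared ID:

#include <stdio.h>

int main(void)
{
	unsigned long long r0 = 0;                        /* one of the two branches */
	unsigned long long r1 = 0xffffffff00000000ULL + r0;
	unsigned int spilled = (unsigned int)r1;          /* 32-bit spill/fill */

	/* spilled == 0 here, yet r1 >> 32 is 0xffffffff, not 0: treating r1
	 * as known-zero (by keeping the ID across the narrow spill) is what
	 * lets the bogus offset reach the ctx pointer in the tests above.
	 */
	printf("spilled=%u high=%llx\n", spilled, r1 >> 32);
	return 0;
}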
index e495f89..e60cf4d 100644 (file)
@@ -129,7 +129,7 @@ int main(int argc, char *argv[])
        uid_t uid = getuid();
 
        ksft_print_header();
-       ksft_set_plan(18);
+       ksft_set_plan(19);
        test_clone3_supported();
 
        /* Just a simple clone3() should return 0.*/
@@ -198,5 +198,8 @@ int main(int argc, char *argv[])
        /* Do a clone3() in a new time namespace */
        test_clone3(CLONE_NEWTIME, 0, 0, CLONE3_ARGS_NO_TEST);
 
+       /* Do a clone3() with exit signal (SIGCHLD) in flags */
+       test_clone3(SIGCHLD, 0, -EINVAL, CLONE3_ARGS_NO_TEST);
+
        ksft_finished();
 }
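
The new case relies on clone3() rejecting exit-signal bits passed in flags: unlike legacy clone(), clone3() carries the exit signal in its own struct clone_args field. A hedged sketch (not part of the selftest) of the usage the -EINVAL case is contrasting against:

/* Sketch: with clone3() the exit signal goes in clone_args.exit_signal;
 * putting SIGCHLD into .flags is what the new test expects to fail.
 */
#define _GNU_SOURCE
#include <linux/sched.h>   /* struct clone_args */
#include <sys/syscall.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static pid_t clone3_with_sigchld(void)
{
	struct clone_args args;

	memset(&args, 0, sizeof(args));
	args.exit_signal = SIGCHLD;   /* correct placement of the exit signal */

	return syscall(__NR_clone3, &args, sizeof(args));
}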
index 75e9007..ce5068f 100644 (file)
@@ -5,11 +5,3 @@ CONFIG_CPU_FREQ_GOV_USERSPACE=y
 CONFIG_CPU_FREQ_GOV_ONDEMAND=y
 CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
 CONFIG_CPU_FREQ_GOV_SCHEDUTIL=y
-CONFIG_DEBUG_RT_MUTEXES=y
-CONFIG_DEBUG_PLIST=y
-CONFIG_DEBUG_SPINLOCK=y
-CONFIG_DEBUG_MUTEXES=y
-CONFIG_DEBUG_LOCK_ALLOC=y
-CONFIG_PROVE_LOCKING=y
-CONFIG_LOCKDEP=y
-CONFIG_DEBUG_ATOMIC_SLEEP=y
index db29a31..607ba5c 100755 (executable)
@@ -6,6 +6,7 @@
 ALL_TESTS="
        prio
        arp_validate
+       num_grat_arp
 "
 
 REQUIRE_MZ=no
@@ -255,6 +256,55 @@ arp_validate()
        arp_validate_ns "active-backup"
 }
 
+garp_test()
+{
+       local param="$1"
+       local active_slave exp_num real_num i
+       RET=0
+
+       # create bond
+       bond_reset "${param}"
+
+       bond_check_connection
+       [ $RET -ne 0 ] && log_test "num_grat_arp" "$retmsg"
+
+
+       # Add tc rules to count GARP number
+       for i in $(seq 0 2); do
+               tc -n ${g_ns} filter add dev s$i ingress protocol arp pref 1 handle 101 \
+                       flower skip_hw arp_op request arp_sip ${s_ip4} arp_tip ${s_ip4} action pass
+       done
+
+       # Do failover
+       active_slave=$(cmd_jq "ip -n ${s_ns} -d -j link show bond0" ".[].linkinfo.info_data.active_slave")
+       ip -n ${s_ns} link set ${active_slave} down
+
+       exp_num=$(echo "${param}" | cut -f6 -d ' ')
+       sleep $((exp_num + 2))
+
+       active_slave=$(cmd_jq "ip -n ${s_ns} -d -j link show bond0" ".[].linkinfo.info_data.active_slave")
+
+       # check result
+       real_num=$(tc_rule_handle_stats_get "dev s${active_slave#eth} ingress" 101 ".packets" "-n ${g_ns}")
+       if [ "${real_num}" -ne "${exp_num}" ]; then
+               echo "$real_num garp packets sent on active slave ${active_slave}"
+               RET=1
+       fi
+
+       for i in $(seq 0 2); do
+               tc -n ${g_ns} filter del dev s$i ingress
+       done
+}
+
+num_grat_arp()
+{
+       local val
+       for val in 10 20 30 50; do
+               garp_test "mode active-backup miimon 100 num_grat_arp $val peer_notify_delay 1000"
+               log_test "num_grat_arp" "active-backup miimon num_grat_arp $val"
+       done
+}
+
 trap cleanup EXIT
 
 setup_prepare
index 4045ca9..69ab99a 100644 (file)
@@ -61,6 +61,8 @@ server_create()
                ip -n ${g_ns} link set s${i} up
                ip -n ${g_ns} link set s${i} master br0
                ip -n ${s_ns} link set eth${i} master bond0
+
+               tc -n ${g_ns} qdisc add dev s${i} clsact
        done
 
        ip -n ${s_ns} link set bond0 up
index d6e106f..a1e955d 100644 (file)
@@ -1,7 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0
 all:
 
-TEST_PROGS := ftracetest
+TEST_PROGS_EXTENDED := ftracetest
+TEST_PROGS := ftracetest-ktap
 TEST_FILES := test.d settings
 EXTRA_CLEAN := $(OUTPUT)/logs/*
 
index c3311c8..cb5f18c 100755 (executable)
@@ -13,6 +13,7 @@ echo "Usage: ftracetest [options] [testcase(s)] [testcase-directory(s)]"
 echo " Options:"
 echo "         -h|--help  Show help message"
 echo "         -k|--keep  Keep passed test logs"
+echo "         -K|--ktap  Output in KTAP format"
 echo "         -v|--verbose Increase verbosity of test messages"
 echo "         -vv        Alias of -v -v (Show all results in stdout)"
 echo "         -vvv       Alias of -v -v -v (Show all commands immediately)"
@@ -85,6 +86,10 @@ parse_opts() { # opts
       KEEP_LOG=1
       shift 1
     ;;
+    --ktap|-K)
+      KTAP=1
+      shift 1
+    ;;
     --verbose|-v|-vv|-vvv)
       if [ $VERBOSE -eq -1 ]; then
        usage "--console can not use with --verbose"
@@ -178,6 +183,7 @@ TEST_DIR=$TOP_DIR/test.d
 TEST_CASES=`find_testcases $TEST_DIR`
 LOG_DIR=$TOP_DIR/logs/`date +%Y%m%d-%H%M%S`/
 KEEP_LOG=0
+KTAP=0
 DEBUG=0
 VERBOSE=0
 UNSUPPORTED_RESULT=0
@@ -229,7 +235,7 @@ prlog() { # messages
     newline=
     shift
   fi
-  printf "$*$newline"
+  [ "$KTAP" != "1" ] && printf "$*$newline"
   [ "$LOG_FILE" ] && printf "$*$newline" | strip_esc >> $LOG_FILE
 }
 catlog() { #file
@@ -260,11 +266,11 @@ TOTAL_RESULT=0
 
 INSTANCE=
 CASENO=0
+CASENAME=
 
 testcase() { # testfile
   CASENO=$((CASENO+1))
-  desc=`grep "^#[ \t]*description:" $1 | cut -f2- -d:`
-  prlog -n "[$CASENO]$INSTANCE$desc"
+  CASENAME=`grep "^#[ \t]*description:" $1 | cut -f2- -d:`
 }
 
 checkreq() { # testfile
@@ -277,40 +283,68 @@ test_on_instance() { # testfile
   grep -q "^#[ \t]*flags:.*instance" $1
 }
 
+ktaptest() { # result comment
+  if [ "$KTAP" != "1" ]; then
+    return
+  fi
+
+  local result=
+  if [ "$1" = "1" ]; then
+    result="ok"
+  else
+    result="not ok"
+  fi
+  shift
+
+  local comment=$*
+  if [ "$comment" != "" ]; then
+    comment="# $comment"
+  fi
+
+  echo $result $CASENO $INSTANCE$CASENAME $comment
+}
+
 eval_result() { # sigval
   case $1 in
     $PASS)
       prlog "  [${color_green}PASS${color_reset}]"
+      ktaptest 1
       PASSED_CASES="$PASSED_CASES $CASENO"
       return 0
     ;;
     $FAIL)
       prlog "  [${color_red}FAIL${color_reset}]"
+      ktaptest 0
       FAILED_CASES="$FAILED_CASES $CASENO"
       return 1 # this is a bug.
     ;;
     $UNRESOLVED)
       prlog "  [${color_blue}UNRESOLVED${color_reset}]"
+      ktaptest 0 UNRESOLVED
       UNRESOLVED_CASES="$UNRESOLVED_CASES $CASENO"
       return $UNRESOLVED_RESULT # depends on use case
     ;;
     $UNTESTED)
       prlog "  [${color_blue}UNTESTED${color_reset}]"
+      ktaptest 1 SKIP
       UNTESTED_CASES="$UNTESTED_CASES $CASENO"
       return 0
     ;;
     $UNSUPPORTED)
       prlog "  [${color_blue}UNSUPPORTED${color_reset}]"
+      ktaptest 1 SKIP
       UNSUPPORTED_CASES="$UNSUPPORTED_CASES $CASENO"
       return $UNSUPPORTED_RESULT # depends on use case
     ;;
     $XFAIL)
       prlog "  [${color_green}XFAIL${color_reset}]"
+      ktaptest 1 XFAIL
       XFAILED_CASES="$XFAILED_CASES $CASENO"
       return 0
     ;;
     *)
       prlog "  [${color_blue}UNDEFINED${color_reset}]"
+      ktaptest 0 error
       UNDEFINED_CASES="$UNDEFINED_CASES $CASENO"
       return 1 # this must be a test bug
     ;;
@@ -371,6 +405,7 @@ __run_test() { # testfile
 run_test() { # testfile
   local testname=`basename $1`
   testcase $1
+  prlog -n "[$CASENO]$INSTANCE$CASENAME"
   if [ ! -z "$LOG_FILE" ] ; then
     local testlog=`mktemp $LOG_DIR/${CASENO}-${testname}-log.XXXXXX`
   else
@@ -405,6 +440,17 @@ run_test() { # testfile
 # load in the helper functions
 . $TEST_DIR/functions
 
+if [ "$KTAP" = "1" ]; then
+  echo "TAP version 13"
+
+  casecount=`echo $TEST_CASES | wc -w`
+  for t in $TEST_CASES; do
+    test_on_instance $t || continue
+    casecount=$((casecount+1))
+  done
+  echo "1..${casecount}"
+fi
+
 # Main loop
 for t in $TEST_CASES; do
   run_test $t
@@ -439,6 +485,17 @@ prlog "# of unsupported: " `echo $UNSUPPORTED_CASES | wc -w`
 prlog "# of xfailed: " `echo $XFAILED_CASES | wc -w`
 prlog "# of undefined(test bug): " `echo $UNDEFINED_CASES | wc -w`
 
+if [ "$KTAP" = "1" ]; then
+  echo -n "# Totals:"
+  echo -n " pass:"`echo $PASSED_CASES | wc -w`
+  echo -n " faii:"`echo $FAILED_CASES | wc -w`
+  echo -n " xfail:"`echo $XFAILED_CASES | wc -w`
+  echo -n " xpass:0"
+  echo -n " skip:"`echo $UNTESTED_CASES $UNSUPPORTED_CASES | wc -w`
+  echo -n " error:"`echo $UNRESOLVED_CASES $UNDEFINED_CASES | wc -w`
+  echo
+fi
+
 cleanup
 
 # if no error, return 0
diff --git a/tools/testing/selftests/ftrace/ftracetest-ktap b/tools/testing/selftests/ftrace/ftracetest-ktap
new file mode 100755 (executable)
index 0000000..b328467
--- /dev/null
@@ -0,0 +1,8 @@
+#!/bin/sh -e
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# ftracetest-ktap: Wrapper to integrate ftracetest with the kselftest runner
+#
+# Copyright (C) Arm Ltd., 2023
+
+./ftracetest -K
index e2ff3bf..2de7c61 100644 (file)
@@ -9,18 +9,33 @@ fail() { #msg
     exit_fail
 }
 
-echo "Test event filter function name"
+sample_events() {
+    echo > trace
+    echo 1 > events/kmem/kmem_cache_free/enable
+    echo 1 > tracing_on
+    ls > /dev/null
+    echo 0 > tracing_on
+    echo 0 > events/kmem/kmem_cache_free/enable
+}
+
 echo 0 > tracing_on
 echo 0 > events/enable
+
+echo "Get the most frequently calling function"
+sample_events
+
+target_func=`cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'`
+if [ -z "$target_func" ]; then
+    exit_fail
+fi
 echo > trace
-echo 'call_site.function == exit_mmap' > events/kmem/kmem_cache_free/filter
-echo 1 > events/kmem/kmem_cache_free/enable
-echo 1 > tracing_on
-ls > /dev/null
-echo 0 > events/kmem/kmem_cache_free/enable
 
-hitcnt=`grep kmem_cache_free trace| grep exit_mmap | wc -l`
-misscnt=`grep kmem_cache_free trace| grep -v exit_mmap | wc -l`
+echo "Test event filter function name"
+echo "call_site.function == $target_func" > events/kmem/kmem_cache_free/filter
+sample_events
+
+hitcnt=`grep kmem_cache_free trace| grep $target_func | wc -l`
+misscnt=`grep kmem_cache_free trace| grep -v $target_func | wc -l`
 
 if [ $hitcnt -eq 0 ]; then
        exit_fail
@@ -30,20 +45,14 @@ if [ $misscnt -gt 0 ]; then
        exit_fail
 fi
 
-address=`grep ' exit_mmap$' /proc/kallsyms | cut -d' ' -f1`
+address=`grep " ${target_func}\$" /proc/kallsyms | cut -d' ' -f1`
 
 echo "Test event filter function address"
-echo 0 > tracing_on
-echo 0 > events/enable
-echo > trace
 echo "call_site.function == 0x$address" > events/kmem/kmem_cache_free/filter
-echo 1 > events/kmem/kmem_cache_free/enable
-echo 1 > tracing_on
-sleep 1
-echo 0 > events/kmem/kmem_cache_free/enable
+sample_events
 
-hitcnt=`grep kmem_cache_free trace| grep exit_mmap | wc -l`
-misscnt=`grep kmem_cache_free trace| grep -v exit_mmap | wc -l`
+hitcnt=`grep kmem_cache_free trace| grep $target_func | wc -l`
+misscnt=`grep kmem_cache_free trace| grep -v $target_func | wc -l`
 
 if [ $hitcnt -eq 0 ]; then
        exit_fail
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_opt_types.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_opt_types.tc
new file mode 100644 (file)
index 0000000..9f5d993
--- /dev/null
@@ -0,0 +1,34 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Copyright (C) 2023 Akanksha J N, IBM corporation
+# description: Register/unregister optimized probe
+# requires: kprobe_events
+
+case `uname -m` in
+x86_64)
+;;
+arm*)
+;;
+ppc*)
+;;
+*)
+  echo "Please implement other architecture here"
+  exit_unsupported
+esac
+
+DEFAULT=$(cat /proc/sys/debug/kprobes-optimization)
+echo 1 > /proc/sys/debug/kprobes-optimization
+for i in `seq 0 255`; do
+        echo  "p:testprobe $FUNCTION_FORK+${i}" > kprobe_events || continue
+        echo 1 > events/kprobes/enable || continue
+        (echo "forked")
+       PROBE=$(grep $FUNCTION_FORK /sys/kernel/debug/kprobes/list)
+        echo 0 > events/kprobes/enable
+        echo > kprobe_events
+       if echo $PROBE | grep -q OPTIMIZED; then
+                echo "$DEFAULT" >  /proc/sys/debug/kprobes-optimization
+                exit_pass
+        fi
+done
+echo "$DEFAULT" >  /proc/sys/debug/kprobes-optimization
+exit_unresolved
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-stack-legacy.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-stack-legacy.tc
new file mode 100644 (file)
index 0000000..d0cd91a
--- /dev/null
@@ -0,0 +1,24 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: event trigger - test inter-event histogram trigger trace action with dynamic string param (legacy stack)
+# requires: set_event synthetic_events events/sched/sched_process_exec/hist "long[] stack' >> synthetic_events":README
+
+fail() { #msg
+    echo $1
+    exit_fail
+}
+
+echo "Test create synthetic event with stack"
+
+# Test the old stacktrace keyword (for backward compatibility)
+echo 's:wake_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events
+echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace  if prev_state == 1||prev_state == 2' >> events/sched/sched_switch/trigger
+echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(wake_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger
+echo 1 > events/synthetic/wake_lat/enable
+sleep 1
+
+if ! grep -q "=>.*sched" trace; then
+    fail "Failed to create synthetic event with stack"
+fi
+
+exit 0
index 755dbe9..8f1cc9a 100644 (file)
@@ -1,7 +1,7 @@
 #!/bin/sh
 # SPDX-License-Identifier: GPL-2.0
 # description: event trigger - test inter-event histogram trigger trace action with dynamic string param
-# requires: set_event synthetic_events events/sched/sched_process_exec/hist "long[]' >> synthetic_events":README
+# requires: set_event synthetic_events events/sched/sched_process_exec/hist "can be any field, or the special string 'common_stacktrace'":README
 
 fail() { #msg
     echo $1
@@ -10,9 +10,8 @@ fail() { #msg
 
 echo "Test create synthetic event with stack"
 
-
 echo 's:wake_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events
-echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace  if prev_state == 1||prev_state == 2' >> events/sched/sched_switch/trigger
+echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=common_stacktrace  if prev_state == 1||prev_state == 2' >> events/sched/sched_switch/trigger
 echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(wake_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger
 echo 1 > events/synthetic/wake_lat/enable
 sleep 1
index 9f539d4..fa2ce2b 100755 (executable)
@@ -389,6 +389,9 @@ create_chip chip
 create_bank chip bank
 set_num_lines chip bank 8
 enable_chip chip
+DEVNAME=`configfs_dev_name chip`
+CHIPNAME=`configfs_chip_name chip bank`
+SYSFS_PATH="/sys/devices/platform/$DEVNAME/$CHIPNAME/sim_gpio0/value"
 $BASE_DIR/gpio-mockup-cdev -b pull-up /dev/`configfs_chip_name chip bank` 0
 test `cat $SYSFS_PATH` = "1" || fail "bias setting does not work"
 remove_chip chip
index 294619a..1c952d1 100644 (file)
@@ -8,7 +8,8 @@ export logfile=/dev/stdout
 export per_test_logging=
 
 # Defaults for "settings" file fields:
-# "timeout" how many seconds to let each test run before failing.
+# "timeout" how many seconds to let each test run before running
+# over our soft timeout limit.
 export kselftest_default_timeout=45
 
 # There isn't a shell-agnostic way to find the path of a sourced file,
@@ -90,6 +91,14 @@ run_one()
                done < "$settings"
        fi
 
+       # Command line timeout overrides the settings file
+       if [ -n "$kselftest_override_timeout" ]; then
+               kselftest_timeout="$kselftest_override_timeout"
+               echo "# overriding timeout to $kselftest_timeout" >> "$logfile"
+       else
+               echo "# timeout set to $kselftest_timeout" >> "$logfile"
+       fi
+
        TEST_HDR_MSG="selftests: $DIR: $BASENAME_TEST"
        echo "# $TEST_HDR_MSG"
        if [ ! -e "$TEST" ]; then
index d8bff20..5fd49ad 100644 (file)
 
 /**
  * FIXTURE_SETUP() - Prepares the setup function for the fixture.
- * *_metadata* is included so that EXPECT_* and ASSERT_* work correctly.
+ * *_metadata* is included so that EXPECT_*, ASSERT_* etc. work correctly.
  *
  * @fixture_name: fixture name
  *
 
 /**
  * FIXTURE_TEARDOWN()
- * *_metadata* is included so that EXPECT_* and ASSERT_* work correctly.
+ * *_metadata* is included so that EXPECT_*, ASSERT_* etc. work correctly.
  *
  * @fixture_name: fixture name
  *
                if (setjmp(_metadata->env) == 0) { \
                        fixture_name##_setup(_metadata, &self, variant->data); \
                        /* Let setup failure terminate early. */ \
-                       if (!_metadata->passed) \
+                       if (!_metadata->passed || _metadata->skip) \
                                return; \
                        _metadata->setup_completed = true; \
                        fixture_name##_##test_name(_metadata, &self, variant->data); \
index 7a5ff64..4761b76 100644 (file)
@@ -116,6 +116,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/sev_migrate_tests
 TEST_GEN_PROGS_x86_64 += x86_64/amx_test
 TEST_GEN_PROGS_x86_64 += x86_64/max_vcpuid_cap_test
 TEST_GEN_PROGS_x86_64 += x86_64/triple_fault_event_test
+TEST_GEN_PROGS_x86_64 += x86_64/recalc_apic_map_test
 TEST_GEN_PROGS_x86_64 += access_tracking_perf_test
 TEST_GEN_PROGS_x86_64 += demand_paging_test
 TEST_GEN_PROGS_x86_64 += dirty_log_test
index d4e1f4a..4f10055 100644 (file)
@@ -48,6 +48,34 @@ struct reg_sublist {
        __u64 rejects_set_n;
 };
 
+struct feature_id_reg {
+       __u64 reg;
+       __u64 id_reg;
+       __u64 feat_shift;
+       __u64 feat_min;
+};
+
+static struct feature_id_reg feat_id_regs[] = {
+       {
+               ARM64_SYS_REG(3, 0, 2, 0, 3),   /* TCR2_EL1 */
+               ARM64_SYS_REG(3, 0, 0, 7, 3),   /* ID_AA64MMFR3_EL1 */
+               0,
+               1
+       },
+       {
+               ARM64_SYS_REG(3, 0, 10, 2, 2),  /* PIRE0_EL1 */
+               ARM64_SYS_REG(3, 0, 0, 7, 3),   /* ID_AA64MMFR3_EL1 */
+               4,
+               1
+       },
+       {
+               ARM64_SYS_REG(3, 0, 10, 2, 3),  /* PIR_EL1 */
+               ARM64_SYS_REG(3, 0, 0, 7, 3),   /* ID_AA64MMFR3_EL1 */
+               4,
+               1
+       }
+};
+
 struct vcpu_config {
        char *name;
        struct reg_sublist sublists[];
@@ -68,7 +96,8 @@ static int vcpu_configs_n;
 
 #define for_each_missing_reg(i)                                                        \
        for ((i) = 0; (i) < blessed_n; ++(i))                                   \
-               if (!find_reg(reg_list->reg, reg_list->n, blessed_reg[i]))
+               if (!find_reg(reg_list->reg, reg_list->n, blessed_reg[i]))      \
+                       if (check_supported_feat_reg(vcpu, blessed_reg[i]))
 
 #define for_each_new_reg(i)                                                    \
        for_each_reg_filtered(i)                                                \
@@ -132,6 +161,25 @@ static bool find_reg(__u64 regs[], __u64 nr_regs, __u64 reg)
        return false;
 }
 
+static bool check_supported_feat_reg(struct kvm_vcpu *vcpu, __u64 reg)
+{
+       int i, ret;
+       __u64 data, feat_val;
+
+       for (i = 0; i < ARRAY_SIZE(feat_id_regs); i++) {
+               if (feat_id_regs[i].reg == reg) {
+                       ret = __vcpu_get_reg(vcpu, feat_id_regs[i].id_reg, &data);
+                       if (ret < 0)
+                               return false;
+
+                       feat_val = ((data >> feat_id_regs[i].feat_shift) & 0xf);
+                       return feat_val >= feat_id_regs[i].feat_min;
+               }
+       }
+
+       return true;
+}
+
 static const char *str_with_index(const char *template, __u64 index)
 {
        char *str, *p;
@@ -843,12 +891,15 @@ static __u64 base_regs[] = {
        ARM64_SYS_REG(3, 0, 2, 0, 0),   /* TTBR0_EL1 */
        ARM64_SYS_REG(3, 0, 2, 0, 1),   /* TTBR1_EL1 */
        ARM64_SYS_REG(3, 0, 2, 0, 2),   /* TCR_EL1 */
+       ARM64_SYS_REG(3, 0, 2, 0, 3),   /* TCR2_EL1 */
        ARM64_SYS_REG(3, 0, 5, 1, 0),   /* AFSR0_EL1 */
        ARM64_SYS_REG(3, 0, 5, 1, 1),   /* AFSR1_EL1 */
        ARM64_SYS_REG(3, 0, 5, 2, 0),   /* ESR_EL1 */
        ARM64_SYS_REG(3, 0, 6, 0, 0),   /* FAR_EL1 */
        ARM64_SYS_REG(3, 0, 7, 4, 0),   /* PAR_EL1 */
        ARM64_SYS_REG(3, 0, 10, 2, 0),  /* MAIR_EL1 */
+       ARM64_SYS_REG(3, 0, 10, 2, 2),  /* PIRE0_EL1 */
+       ARM64_SYS_REG(3, 0, 10, 2, 3),  /* PIR_EL1 */
        ARM64_SYS_REG(3, 0, 10, 3, 0),  /* AMAIR_EL1 */
        ARM64_SYS_REG(3, 0, 12, 0, 0),  /* VBAR_EL1 */
        ARM64_SYS_REG(3, 0, 12, 1, 1),  /* DISR_EL1 */
diff --git a/tools/testing/selftests/kvm/x86_64/recalc_apic_map_test.c b/tools/testing/selftests/kvm/x86_64/recalc_apic_map_test.c
new file mode 100644 (file)
index 0000000..4c416eb
--- /dev/null
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Test edge cases and race conditions in kvm_recalculate_apic_map().
+ */
+
+#include <sys/ioctl.h>
+#include <pthread.h>
+#include <time.h>
+
+#include "processor.h"
+#include "test_util.h"
+#include "kvm_util.h"
+#include "apic.h"
+
+#define TIMEOUT                5       /* seconds */
+
+#define LAPIC_DISABLED 0
+#define LAPIC_X2APIC   (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)
+#define MAX_XAPIC_ID   0xff
+
+static void *race(void *arg)
+{
+       struct kvm_lapic_state lapic = {};
+       struct kvm_vcpu *vcpu = arg;
+
+       while (1) {
+               /* Trigger kvm_recalculate_apic_map(). */
+               vcpu_ioctl(vcpu, KVM_SET_LAPIC, &lapic);
+               pthread_testcancel();
+       }
+
+       return NULL;
+}
+
+int main(void)
+{
+       struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
+       struct kvm_vcpu *vcpuN;
+       struct kvm_vm *vm;
+       pthread_t thread;
+       time_t t;
+       int i;
+
+       kvm_static_assert(KVM_MAX_VCPUS > MAX_XAPIC_ID);
+
+       /*
+        * Create the max number of vCPUs supported by selftests so that KVM
+        * has decent amount of work to do when recalculating the map, i.e. to
+        * make the problematic window large enough to hit.
+        */
+       vm = vm_create_with_vcpus(KVM_MAX_VCPUS, NULL, vcpus);
+
+       /*
+        * Enable x2APIC on all vCPUs so that KVM doesn't bail from the recalc
+        * due to vCPUs having aliased xAPIC IDs (truncated to 8 bits).
+        */
+       for (i = 0; i < KVM_MAX_VCPUS; i++)
+               vcpu_set_msr(vcpus[i], MSR_IA32_APICBASE, LAPIC_X2APIC);
+
+       ASSERT_EQ(pthread_create(&thread, NULL, race, vcpus[0]), 0);
+
+       vcpuN = vcpus[KVM_MAX_VCPUS - 1];
+       for (t = time(NULL) + TIMEOUT; time(NULL) < t;) {
+               vcpu_set_msr(vcpuN, MSR_IA32_APICBASE, LAPIC_X2APIC);
+               vcpu_set_msr(vcpuN, MSR_IA32_APICBASE, LAPIC_DISABLED);
+       }
+
+       ASSERT_EQ(pthread_cancel(thread), 0);
+       ASSERT_EQ(pthread_join(thread, NULL), 0);
+
+       kvm_vm_free(vm);
+
+       return 0;
+}
index 0f0a652..3dc9e43 100644 (file)
@@ -1,7 +1,10 @@
+CONFIG_CGROUPS=y
+CONFIG_CGROUP_SCHED=y
 CONFIG_OVERLAY_FS=y
-CONFIG_SECURITY_LANDLOCK=y
-CONFIG_SECURITY_PATH=y
+CONFIG_PROC_FS=y
 CONFIG_SECURITY=y
+CONFIG_SECURITY_LANDLOCK=y
 CONFIG_SHMEM=y
-CONFIG_TMPFS_XATTR=y
+CONFIG_SYSFS=y
 CONFIG_TMPFS=y
+CONFIG_TMPFS_XATTR=y
diff --git a/tools/testing/selftests/landlock/config.um b/tools/testing/selftests/landlock/config.um
new file mode 100644 (file)
index 0000000..40937c0
--- /dev/null
@@ -0,0 +1 @@
+CONFIG_HOSTFS=y
index b6c4be3..83d5655 100644 (file)
@@ -10,6 +10,7 @@
 #define _GNU_SOURCE
 #include <fcntl.h>
 #include <linux/landlock.h>
+#include <linux/magic.h>
 #include <sched.h>
 #include <stdio.h>
 #include <string.h>
@@ -19,6 +20,7 @@
 #include <sys/sendfile.h>
 #include <sys/stat.h>
 #include <sys/sysmacros.h>
+#include <sys/vfs.h>
 #include <unistd.h>
 
 #include "common.h"
@@ -107,8 +109,10 @@ static bool fgrep(FILE *const inf, const char *const str)
        return false;
 }
 
-static bool supports_overlayfs(void)
+static bool supports_filesystem(const char *const filesystem)
 {
+       char str[32];
+       int len;
        bool res;
        FILE *const inf = fopen("/proc/filesystems", "r");
 
@@ -119,11 +123,33 @@ static bool supports_overlayfs(void)
        if (!inf)
                return true;
 
-       res = fgrep(inf, "nodev\toverlay\n");
+       /* filesystem can be null for bind mounts. */
+       if (!filesystem)
+               return true;
+
+       len = snprintf(str, sizeof(str), "nodev\t%s\n", filesystem);
+       if (len >= sizeof(str))
+               /* Ignores too-long filesystem names. */
+               return true;
+
+       res = fgrep(inf, str);
        fclose(inf);
        return res;
 }
 
+static bool cwd_matches_fs(unsigned int fs_magic)
+{
+       struct statfs statfs_buf;
+
+       if (!fs_magic)
+               return true;
+
+       if (statfs(".", &statfs_buf))
+               return true;
+
+       return statfs_buf.f_type == fs_magic;
+}
+
 static void mkdir_parents(struct __test_metadata *const _metadata,
                          const char *const path)
 {
@@ -206,7 +232,26 @@ out:
        return err;
 }
 
-static void prepare_layout(struct __test_metadata *const _metadata)
+struct mnt_opt {
+       const char *const source;
+       const char *const type;
+       const unsigned long flags;
+       const char *const data;
+};
+
+const struct mnt_opt mnt_tmp = {
+       .type = "tmpfs",
+       .data = "size=4m,mode=700",
+};
+
+static int mount_opt(const struct mnt_opt *const mnt, const char *const target)
+{
+       return mount(mnt->source ?: mnt->type, target, mnt->type, mnt->flags,
+                    mnt->data);
+}
+
+static void prepare_layout_opt(struct __test_metadata *const _metadata,
+                              const struct mnt_opt *const mnt)
 {
        disable_caps(_metadata);
        umask(0077);
@@ -217,12 +262,28 @@ static void prepare_layout(struct __test_metadata *const _metadata)
         * for tests relying on pivot_root(2) and move_mount(2).
         */
        set_cap(_metadata, CAP_SYS_ADMIN);
-       ASSERT_EQ(0, unshare(CLONE_NEWNS));
-       ASSERT_EQ(0, mount("tmp", TMP_DIR, "tmpfs", 0, "size=4m,mode=700"));
+       ASSERT_EQ(0, unshare(CLONE_NEWNS | CLONE_NEWCGROUP));
+       ASSERT_EQ(0, mount_opt(mnt, TMP_DIR))
+       {
+               TH_LOG("Failed to mount the %s filesystem: %s", mnt->type,
+                      strerror(errno));
+               /*
+                * FIXTURE_TEARDOWN() is not called when FIXTURE_SETUP()
+                * failed, so we need to explicitly do a minimal cleanup to
+                * avoid cascading errors with other tests that don't depend on
+                * the same filesystem.
+                */
+               remove_path(TMP_DIR);
+       }
        ASSERT_EQ(0, mount(NULL, TMP_DIR, NULL, MS_PRIVATE | MS_REC, NULL));
        clear_cap(_metadata, CAP_SYS_ADMIN);
 }
 
+static void prepare_layout(struct __test_metadata *const _metadata)
+{
+       prepare_layout_opt(_metadata, &mnt_tmp);
+}
+
 static void cleanup_layout(struct __test_metadata *const _metadata)
 {
        set_cap(_metadata, CAP_SYS_ADMIN);
@@ -231,6 +292,20 @@ static void cleanup_layout(struct __test_metadata *const _metadata)
        EXPECT_EQ(0, remove_path(TMP_DIR));
 }
 
+/* clang-format off */
+FIXTURE(layout0) {};
+/* clang-format on */
+
+FIXTURE_SETUP(layout0)
+{
+       prepare_layout(_metadata);
+}
+
+FIXTURE_TEARDOWN(layout0)
+{
+       cleanup_layout(_metadata);
+}
+
 static void create_layout1(struct __test_metadata *const _metadata)
 {
        create_file(_metadata, file1_s1d1);
@@ -248,7 +323,7 @@ static void create_layout1(struct __test_metadata *const _metadata)
        create_file(_metadata, file1_s3d1);
        create_directory(_metadata, dir_s3d2);
        set_cap(_metadata, CAP_SYS_ADMIN);
-       ASSERT_EQ(0, mount("tmp", dir_s3d2, "tmpfs", 0, "size=4m,mode=700"));
+       ASSERT_EQ(0, mount_opt(&mnt_tmp, dir_s3d2));
        clear_cap(_metadata, CAP_SYS_ADMIN);
 
        ASSERT_EQ(0, mkdir(dir_s3d3, 0700));
@@ -262,11 +337,13 @@ static void remove_layout1(struct __test_metadata *const _metadata)
        EXPECT_EQ(0, remove_path(file1_s1d3));
        EXPECT_EQ(0, remove_path(file1_s1d2));
        EXPECT_EQ(0, remove_path(file1_s1d1));
+       EXPECT_EQ(0, remove_path(dir_s1d3));
 
        EXPECT_EQ(0, remove_path(file2_s2d3));
        EXPECT_EQ(0, remove_path(file1_s2d3));
        EXPECT_EQ(0, remove_path(file1_s2d2));
        EXPECT_EQ(0, remove_path(file1_s2d1));
+       EXPECT_EQ(0, remove_path(dir_s2d2));
 
        EXPECT_EQ(0, remove_path(file1_s3d1));
        EXPECT_EQ(0, remove_path(dir_s3d3));
@@ -510,7 +587,7 @@ TEST_F_FORK(layout1, file_and_dir_access_rights)
        ASSERT_EQ(0, close(ruleset_fd));
 }
 
-TEST_F_FORK(layout1, unknown_access_rights)
+TEST_F_FORK(layout0, unknown_access_rights)
 {
        __u64 access_mask;
 
@@ -608,7 +685,7 @@ static void enforce_ruleset(struct __test_metadata *const _metadata,
        }
 }
 
-TEST_F_FORK(layout1, proc_nsfs)
+TEST_F_FORK(layout0, proc_nsfs)
 {
        const struct rule rules[] = {
                {
@@ -657,11 +734,11 @@ TEST_F_FORK(layout1, proc_nsfs)
        ASSERT_EQ(0, close(path_beneath.parent_fd));
 }
 
-TEST_F_FORK(layout1, unpriv)
+TEST_F_FORK(layout0, unpriv)
 {
        const struct rule rules[] = {
                {
-                       .path = dir_s1d2,
+                       .path = TMP_DIR,
                        .access = ACCESS_RO,
                },
                {},
@@ -1301,12 +1378,12 @@ TEST_F_FORK(layout1, inherit_superset)
        ASSERT_EQ(0, test_open(file1_s1d3, O_RDONLY));
 }
 
-TEST_F_FORK(layout1, max_layers)
+TEST_F_FORK(layout0, max_layers)
 {
        int i, err;
        const struct rule rules[] = {
                {
-                       .path = dir_s1d2,
+                       .path = TMP_DIR,
                        .access = ACCESS_RO,
                },
                {},
@@ -4030,21 +4107,24 @@ static const char (*merge_sub_files[])[] = {
  *         └── work
  */
 
-/* clang-format off */
-FIXTURE(layout2_overlay) {};
-/* clang-format on */
+FIXTURE(layout2_overlay)
+{
+       bool skip_test;
+};
 
 FIXTURE_SETUP(layout2_overlay)
 {
-       if (!supports_overlayfs())
-               SKIP(return, "overlayfs is not supported");
+       if (!supports_filesystem("overlay")) {
+               self->skip_test = true;
+               SKIP(return, "overlayfs is not supported (setup)");
+       }
 
        prepare_layout(_metadata);
 
        create_directory(_metadata, LOWER_BASE);
        set_cap(_metadata, CAP_SYS_ADMIN);
        /* Creates tmpfs mount points to get deterministic overlayfs. */
-       ASSERT_EQ(0, mount("tmp", LOWER_BASE, "tmpfs", 0, "size=4m,mode=700"));
+       ASSERT_EQ(0, mount_opt(&mnt_tmp, LOWER_BASE));
        clear_cap(_metadata, CAP_SYS_ADMIN);
        create_file(_metadata, lower_fl1);
        create_file(_metadata, lower_dl1_fl2);
@@ -4054,7 +4134,7 @@ FIXTURE_SETUP(layout2_overlay)
 
        create_directory(_metadata, UPPER_BASE);
        set_cap(_metadata, CAP_SYS_ADMIN);
-       ASSERT_EQ(0, mount("tmp", UPPER_BASE, "tmpfs", 0, "size=4m,mode=700"));
+       ASSERT_EQ(0, mount_opt(&mnt_tmp, UPPER_BASE));
        clear_cap(_metadata, CAP_SYS_ADMIN);
        create_file(_metadata, upper_fu1);
        create_file(_metadata, upper_du1_fu2);
@@ -4075,8 +4155,8 @@ FIXTURE_SETUP(layout2_overlay)
 
 FIXTURE_TEARDOWN(layout2_overlay)
 {
-       if (!supports_overlayfs())
-               SKIP(return, "overlayfs is not supported");
+       if (self->skip_test)
+               SKIP(return, "overlayfs is not supported (teardown)");
 
        EXPECT_EQ(0, remove_path(lower_do1_fl3));
        EXPECT_EQ(0, remove_path(lower_dl1_fl2));
@@ -4109,8 +4189,8 @@ FIXTURE_TEARDOWN(layout2_overlay)
 
 TEST_F_FORK(layout2_overlay, no_restriction)
 {
-       if (!supports_overlayfs())
-               SKIP(return, "overlayfs is not supported");
+       if (self->skip_test)
+               SKIP(return, "overlayfs is not supported (test)");
 
        ASSERT_EQ(0, test_open(lower_fl1, O_RDONLY));
        ASSERT_EQ(0, test_open(lower_dl1, O_RDONLY));
@@ -4275,8 +4355,8 @@ TEST_F_FORK(layout2_overlay, same_content_different_file)
        size_t i;
        const char *path_entry;
 
-       if (!supports_overlayfs())
-               SKIP(return, "overlayfs is not supported");
+       if (self->skip_test)
+               SKIP(return, "overlayfs is not supported (test)");
 
        /* Sets rules on base directories (i.e. outside overlay scope). */
        ruleset_fd = create_ruleset(_metadata, ACCESS_RW, layer1_base);
@@ -4423,4 +4503,261 @@ TEST_F_FORK(layout2_overlay, same_content_different_file)
        }
 }
 
+FIXTURE(layout3_fs)
+{
+       bool has_created_dir;
+       bool has_created_file;
+       char *dir_path;
+       bool skip_test;
+};
+
+FIXTURE_VARIANT(layout3_fs)
+{
+       const struct mnt_opt mnt;
+       const char *const file_path;
+       unsigned int cwd_fs_magic;
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout3_fs, tmpfs) {
+       /* clang-format on */
+       .mnt = mnt_tmp,
+       .file_path = file1_s1d1,
+};
+
+FIXTURE_VARIANT_ADD(layout3_fs, ramfs) {
+       .mnt = {
+               .type = "ramfs",
+               .data = "mode=700",
+       },
+       .file_path = TMP_DIR "/dir/file",
+};
+
+FIXTURE_VARIANT_ADD(layout3_fs, cgroup2) {
+       .mnt = {
+               .type = "cgroup2",
+       },
+       .file_path = TMP_DIR "/test/cgroup.procs",
+};
+
+FIXTURE_VARIANT_ADD(layout3_fs, proc) {
+       .mnt = {
+               .type = "proc",
+       },
+       .file_path = TMP_DIR "/self/status",
+};
+
+FIXTURE_VARIANT_ADD(layout3_fs, sysfs) {
+       .mnt = {
+               .type = "sysfs",
+       },
+       .file_path = TMP_DIR "/kernel/notes",
+};
+
+FIXTURE_VARIANT_ADD(layout3_fs, hostfs) {
+       .mnt = {
+               .source = TMP_DIR,
+               .flags = MS_BIND,
+       },
+       .file_path = TMP_DIR "/dir/file",
+       .cwd_fs_magic = HOSTFS_SUPER_MAGIC,
+};
+
+FIXTURE_SETUP(layout3_fs)
+{
+       struct stat statbuf;
+       const char *slash;
+       size_t dir_len;
+
+       if (!supports_filesystem(variant->mnt.type) ||
+           !cwd_matches_fs(variant->cwd_fs_magic)) {
+               self->skip_test = true;
+               SKIP(return, "this filesystem is not supported (setup)");
+       }
+
+       slash = strrchr(variant->file_path, '/');
+       ASSERT_NE(slash, NULL);
+       dir_len = (size_t)slash - (size_t)variant->file_path;
+       ASSERT_LT(0, dir_len);
+       self->dir_path = malloc(dir_len + 1);
+       self->dir_path[dir_len] = '\0';
+       strncpy(self->dir_path, variant->file_path, dir_len);
+
+       prepare_layout_opt(_metadata, &variant->mnt);
+
+       /* Creates directory when required. */
+       if (stat(self->dir_path, &statbuf)) {
+               set_cap(_metadata, CAP_DAC_OVERRIDE);
+               EXPECT_EQ(0, mkdir(self->dir_path, 0700))
+               {
+                       TH_LOG("Failed to create directory \"%s\": %s",
+                              self->dir_path, strerror(errno));
+                       free(self->dir_path);
+                       self->dir_path = NULL;
+               }
+               self->has_created_dir = true;
+               clear_cap(_metadata, CAP_DAC_OVERRIDE);
+       }
+
+       /* Creates file when required. */
+       if (stat(variant->file_path, &statbuf)) {
+               int fd;
+
+               set_cap(_metadata, CAP_DAC_OVERRIDE);
+               fd = creat(variant->file_path, 0600);
+               EXPECT_LE(0, fd)
+               {
+                       TH_LOG("Failed to create file \"%s\": %s",
+                              variant->file_path, strerror(errno));
+               }
+               EXPECT_EQ(0, close(fd));
+               self->has_created_file = true;
+               clear_cap(_metadata, CAP_DAC_OVERRIDE);
+       }
+}
+
+FIXTURE_TEARDOWN(layout3_fs)
+{
+       if (self->skip_test)
+               SKIP(return, "this filesystem is not supported (teardown)");
+
+       if (self->has_created_file) {
+               set_cap(_metadata, CAP_DAC_OVERRIDE);
+               /*
+                * Don't check for error because the file might already
+                * have been removed (cf. release_inode test).
+                */
+               unlink(variant->file_path);
+               clear_cap(_metadata, CAP_DAC_OVERRIDE);
+       }
+
+       if (self->has_created_dir) {
+               set_cap(_metadata, CAP_DAC_OVERRIDE);
+               /*
+                * Don't check for error because the directory might already
+                * have been removed (cf. release_inode test).
+                */
+               rmdir(self->dir_path);
+               clear_cap(_metadata, CAP_DAC_OVERRIDE);
+       }
+       free(self->dir_path);
+       self->dir_path = NULL;
+
+       cleanup_layout(_metadata);
+}
+
+static void layer3_fs_tag_inode(struct __test_metadata *const _metadata,
+                               FIXTURE_DATA(layout3_fs) * self,
+                               const FIXTURE_VARIANT(layout3_fs) * variant,
+                               const char *const rule_path)
+{
+       const struct rule layer1_allow_read_file[] = {
+               {
+                       .path = rule_path,
+                       .access = LANDLOCK_ACCESS_FS_READ_FILE,
+               },
+               {},
+       };
+       const struct landlock_ruleset_attr layer2_deny_everything_attr = {
+               .handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
+       };
+       const char *const dev_null_path = "/dev/null";
+       int ruleset_fd;
+
+       if (self->skip_test)
+               SKIP(return, "this filesystem is not supported (test)");
+
+       /* Checks without Landlock. */
+       EXPECT_EQ(0, test_open(dev_null_path, O_RDONLY | O_CLOEXEC));
+       EXPECT_EQ(0, test_open(variant->file_path, O_RDONLY | O_CLOEXEC));
+
+       ruleset_fd = create_ruleset(_metadata, LANDLOCK_ACCESS_FS_READ_FILE,
+                                   layer1_allow_read_file);
+       EXPECT_LE(0, ruleset_fd);
+       enforce_ruleset(_metadata, ruleset_fd);
+       EXPECT_EQ(0, close(ruleset_fd));
+
+       EXPECT_EQ(EACCES, test_open(dev_null_path, O_RDONLY | O_CLOEXEC));
+       EXPECT_EQ(0, test_open(variant->file_path, O_RDONLY | O_CLOEXEC));
+
+       /* Forbids directory reading. */
+       ruleset_fd =
+               landlock_create_ruleset(&layer2_deny_everything_attr,
+                                       sizeof(layer2_deny_everything_attr), 0);
+       EXPECT_LE(0, ruleset_fd);
+       enforce_ruleset(_metadata, ruleset_fd);
+       EXPECT_EQ(0, close(ruleset_fd));
+
+       /* Checks with Landlock and forbidden access. */
+       EXPECT_EQ(EACCES, test_open(dev_null_path, O_RDONLY | O_CLOEXEC));
+       EXPECT_EQ(EACCES, test_open(variant->file_path, O_RDONLY | O_CLOEXEC));
+}
+
+/* Matrix of tests to check file hierarchy evaluation. */
+
+TEST_F_FORK(layout3_fs, tag_inode_dir_parent)
+{
+       /* The current directory must not be the root for this test. */
+       layer3_fs_tag_inode(_metadata, self, variant, ".");
+}
+
+TEST_F_FORK(layout3_fs, tag_inode_dir_mnt)
+{
+       layer3_fs_tag_inode(_metadata, self, variant, TMP_DIR);
+}
+
+TEST_F_FORK(layout3_fs, tag_inode_dir_child)
+{
+       layer3_fs_tag_inode(_metadata, self, variant, self->dir_path);
+}
+
+TEST_F_FORK(layout3_fs, tag_inode_file)
+{
+       layer3_fs_tag_inode(_metadata, self, variant, variant->file_path);
+}
+
+/* Light version of layout1.release_inodes */
+TEST_F_FORK(layout3_fs, release_inodes)
+{
+       const struct rule layer1[] = {
+               {
+                       .path = TMP_DIR,
+                       .access = LANDLOCK_ACCESS_FS_READ_DIR,
+               },
+               {},
+       };
+       int ruleset_fd;
+
+       if (self->skip_test)
+               SKIP(return, "this filesystem is not supported (test)");
+
+       /* Clean up for the teardown to not fail. */
+       if (self->has_created_file)
+               EXPECT_EQ(0, remove_path(variant->file_path));
+
+       if (self->has_created_dir)
+               /* Don't check for error because of cgroup specificities. */
+               remove_path(self->dir_path);
+
+       ruleset_fd =
+               create_ruleset(_metadata, LANDLOCK_ACCESS_FS_READ_DIR, layer1);
+       ASSERT_LE(0, ruleset_fd);
+
+       /* Unmount the filesystem while it is being used by a ruleset. */
+       set_cap(_metadata, CAP_SYS_ADMIN);
+       ASSERT_EQ(0, umount(TMP_DIR));
+       clear_cap(_metadata, CAP_SYS_ADMIN);
+
+       /* Replaces with a new mount point to simplify FIXTURE_TEARDOWN. */
+       set_cap(_metadata, CAP_SYS_ADMIN);
+       ASSERT_EQ(0, mount_opt(&mnt_tmp, TMP_DIR));
+       clear_cap(_metadata, CAP_SYS_ADMIN);
+
+       enforce_ruleset(_metadata, ruleset_fd);
+       ASSERT_EQ(0, close(ruleset_fd));
+
+       /* Checks that access to the new mount point is denied. */
+       ASSERT_EQ(EACCES, test_open(TMP_DIR, O_RDONLY));
+}
+
 TEST_HARNESS_MAIN
index 0f6aef2..2c44e11 100644 (file)
 #include <time.h>
 #include <linux/videodev2.h>
 
-int main(int argc, char **argv)
+#define PRIORITY_MAX 4
+
+int priority_test(int fd)
 {
-       int opt;
-       char video_dev[256];
-       int count;
-       struct v4l2_tuner vtuner;
-       struct v4l2_capability vcap;
+       /* This test will try to update the priority associated with a file descriptor */
+
+       enum v4l2_priority old_priority, new_priority, priority_to_compare;
        int ret;
-       int fd;
+       int result = 0;
 
-       if (argc < 2) {
-               printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
-               exit(-1);
+       ret = ioctl(fd, VIDIOC_G_PRIORITY, &old_priority);
+       if (ret < 0) {
+               printf("Failed to get priority: %s\n", strerror(errno));
+               return -1;
+       }
+       new_priority = (old_priority + 1) % PRIORITY_MAX;
+       ret = ioctl(fd, VIDIOC_S_PRIORITY, &new_priority);
+       if (ret < 0) {
+               printf("Failed to set priority: %s\n", strerror(errno));
+               return -1;
+       }
+       ret = ioctl(fd, VIDIOC_G_PRIORITY, &priority_to_compare);
+       if (ret < 0) {
+               printf("Failed to get new priority: %s\n", strerror(errno));
+               result = -1;
+               goto cleanup;
+       }
+       if (priority_to_compare != new_priority) {
+               printf("Priority wasn't set - test failed\n");
+               result = -1;
        }
 
-       /* Process arguments */
-       while ((opt = getopt(argc, argv, "d:")) != -1) {
-               switch (opt) {
-               case 'd':
-                       strncpy(video_dev, optarg, sizeof(video_dev) - 1);
-                       video_dev[sizeof(video_dev)-1] = '\0';
-                       break;
-               default:
-                       printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
-                       exit(-1);
-               }
+cleanup:
+       ret = ioctl(fd, VIDIOC_S_PRIORITY, &old_priority);
+       if (ret < 0) {
+               printf("Failed to restore priority: %s\n", strerror(errno));
+               return -1;
        }
+       return result;
+}
+
+int loop_test(int fd)
+{
+       int count;
+       struct v4l2_tuner vtuner;
+       struct v4l2_capability vcap;
+       int ret;
 
        /* Generate random number of iterations */
        srand((unsigned int) time(NULL));
        count = rand();
 
-       /* Open Video device and keep it open */
-       fd = open(video_dev, O_RDWR);
-       if (fd == -1) {
-               printf("Video Device open errno %s\n", strerror(errno));
-               exit(-1);
-       }
-
        printf("\nNote:\n"
               "While test is running, remove the device or unbind\n"
               "driver and ensure there are no use after free errors\n"
@@ -98,4 +111,46 @@ int main(int argc, char **argv)
                sleep(10);
                count--;
        }
+       return 0;
+}
+
+int main(int argc, char **argv)
+{
+       int opt;
+       char video_dev[256];
+       int fd;
+       int test_result;
+
+       if (argc < 2) {
+               printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
+               exit(-1);
+       }
+
+       /* Process arguments */
+       while ((opt = getopt(argc, argv, "d:")) != -1) {
+               switch (opt) {
+               case 'd':
+                       strncpy(video_dev, optarg, sizeof(video_dev) - 1);
+                       video_dev[sizeof(video_dev)-1] = '\0';
+                       break;
+               default:
+                       printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
+                       exit(-1);
+               }
+       }
+
+       /* Open Video device and keep it open */
+       fd = open(video_dev, O_RDWR);
+       if (fd == -1) {
+               printf("Video Device open errno %s\n", strerror(errno));
+               exit(-1);
+       }
+
+       test_result = priority_test(fd);
+       if (!test_result)
+               printf("Priority test - PASSED\n");
+       else
+               printf("Priority test - FAILED\n");
+
+       loop_test(fd);
 }
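
The reworked v4l2 test keeps the original command-line interface, so both the new priority check and the existing open/close loop are driven the same way as before. A usage sketch; the binary name video_device_test under media_tests and the /dev/video0 node are assumptions to adapt to the local build and hardware:

    # Build the media selftests and run against a removable/unbindable video device.
    make -C tools/testing/selftests/media_tests
    ./tools/testing/selftests/media_tests/video_device_test -d /dev/video0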
index 23af463..4f0c50c 100644 (file)
@@ -5,12 +5,15 @@ LOCAL_HDRS += $(selfdir)/mm/local_config.h $(top_srcdir)/mm/gup_test.h
 
 include local_config.mk
 
+ifeq ($(ARCH),)
+
 ifeq ($(CROSS_COMPILE),)
 uname_M := $(shell uname -m 2>/dev/null || echo not)
 else
 uname_M := $(shell echo $(CROSS_COMPILE) | grep -o '^[a-z0-9]\+')
 endif
-MACHINE ?= $(shell echo $(uname_M) | sed -e 's/aarch64.*/arm64/' -e 's/ppc64.*/ppc64/')
+ARCH ?= $(shell echo $(uname_M) | sed -e 's/aarch64.*/arm64/' -e 's/ppc64.*/ppc64/')
+endif
 
 # Without this, failed build products remain, with up-to-date timestamps,
 # thus tricking Make (and you!) into believing that All Is Well, in subsequent
@@ -65,7 +68,7 @@ TEST_GEN_PROGS += ksm_tests
 TEST_GEN_PROGS += ksm_functional_tests
 TEST_GEN_PROGS += mdwe_test
 
-ifeq ($(MACHINE),x86_64)
+ifeq ($(ARCH),x86_64)
 CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_32bit_program.c -m32)
 CAN_BUILD_X86_64 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_64bit_program.c)
 CAN_BUILD_WITH_NOPIE := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_program.c -no-pie)
@@ -87,13 +90,13 @@ TEST_GEN_PROGS += $(BINARIES_64)
 endif
 else
 
-ifneq (,$(findstring $(MACHINE),ppc64))
+ifneq (,$(findstring $(ARCH),ppc64))
 TEST_GEN_PROGS += protection_keys
 endif
 
 endif
 
-ifneq (,$(filter $(MACHINE),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sparc64 x86_64))
+ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sparc64 x86_64))
 TEST_GEN_PROGS += va_high_addr_switch
 TEST_GEN_PROGS += virtual_address_range
 TEST_GEN_PROGS += write_to_hugetlbfs
@@ -112,7 +115,7 @@ $(TEST_GEN_PROGS): vm_util.c
 $(OUTPUT)/uffd-stress: uffd-common.c
 $(OUTPUT)/uffd-unit-tests: uffd-common.c
 
-ifeq ($(MACHINE),x86_64)
+ifeq ($(ARCH),x86_64)
 BINARIES_32 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_32))
 BINARIES_64 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_64))
 
index 80f06aa..f27a733 100644 (file)
@@ -8,8 +8,10 @@ diag_uid
 fin_ack_lat
 gro
 hwtstamp_config
+io_uring_zerocopy_tx
 ioam6_parser
 ip_defrag
+ip_local_port_range
 ipsec
 ipv6_flowlabel
 ipv6_flowlabel_mgr
@@ -26,6 +28,7 @@ reuseport_bpf_cpu
 reuseport_bpf_numa
 reuseport_dualstack
 rxtimestamp
+sctp_hello
 sk_bind_sendto_listen
 sk_connect_zero_addr
 socket
index 21ca914..ee6880a 100755 (executable)
@@ -92,6 +92,13 @@ NSC_CMD="ip netns exec ${NSC}"
 
 which ping6 > /dev/null 2>&1 && ping6=$(which ping6) || ping6=$(which ping)
 
+# Check if FIPS mode is enabled
+if [ -f /proc/sys/crypto/fips_enabled ]; then
+       fips_enabled=`cat /proc/sys/crypto/fips_enabled`
+else
+       fips_enabled=0
+fi
+
 ################################################################################
 # utilities
 
@@ -1216,7 +1223,7 @@ ipv4_tcp_novrf()
        run_cmd nettest -d ${NSA_DEV} -r ${a}
        log_test_addr ${a} $? 1 "No server, device client, local conn"
 
-       ipv4_tcp_md5_novrf
+       [ "$fips_enabled" = "1" ] || ipv4_tcp_md5_novrf
 }
 
 ipv4_tcp_vrf()
@@ -1270,9 +1277,11 @@ ipv4_tcp_vrf()
        log_test_addr ${a} $? 1 "Global server, local connection"
 
        # run MD5 tests
-       setup_vrf_dup
-       ipv4_tcp_md5
-       cleanup_vrf_dup
+       if [ "$fips_enabled" = "0" ]; then
+               setup_vrf_dup
+               ipv4_tcp_md5
+               cleanup_vrf_dup
+       fi
 
        #
        # enable VRF global server
@@ -2772,7 +2781,7 @@ ipv6_tcp_novrf()
                log_test_addr ${a} $? 1 "No server, device client, local conn"
        done
 
-       ipv6_tcp_md5_novrf
+       [ "$fips_enabled" = "1" ] || ipv6_tcp_md5_novrf
 }
 
 ipv6_tcp_vrf()
@@ -2842,9 +2851,11 @@ ipv6_tcp_vrf()
        log_test_addr ${a} $? 1 "Global server, local connection"
 
        # run MD5 tests
-       setup_vrf_dup
-       ipv6_tcp_md5
-       cleanup_vrf_dup
+       if [ "$fips_enabled" = "0" ]; then
+               setup_vrf_dup
+               ipv6_tcp_md5
+               cleanup_vrf_dup
+       fi
 
        #
        # enable VRF global server
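
The FIPS gate added above reads /proc/sys/crypto/fips_enabled once and then skips the TCP MD5 cases, since RFC 2385 MD5 signatures cannot be used in FIPS mode. A standalone sketch of the same pattern for reuse in other scripts; the helper name is illustrative:

    fips_enabled=0
    [ -f /proc/sys/crypto/fips_enabled ] && fips_enabled=$(cat /proc/sys/crypto/fips_enabled)

    run_md5_cases() {
            if [ "$fips_enabled" = "1" ]; then
                    echo "SKIP: TCP MD5 cases disabled in FIPS mode"
                    return 0
            fi
            # ... invoke the MD5 test cases here ...
    }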
index a47b26a..0f5e88c 100755 (executable)
@@ -2283,7 +2283,7 @@ EOF
 ################################################################################
 # main
 
-while getopts :t:pP46hv:w: o
+while getopts :t:pP46hvw: o
 do
        case $o in
                t) TESTS=$OPTARG;;
index 7da8ec8..35d89df 100755 (executable)
@@ -68,7 +68,7 @@ setup()
 cleanup()
 {
        $IP link del dev dummy0 &> /dev/null
-       ip netns del ns1
+       ip netns del ns1 &> /dev/null
        ip netns del ns2 &> /dev/null
 }
 
index 432fe84..48584a5 100755 (executable)
@@ -84,8 +84,9 @@ h2_destroy()
 
 router_rp1_200_create()
 {
-       ip link add name $rp1.200 up \
-               link $rp1 addrgenmode eui64 type vlan id 200
+       ip link add name $rp1.200 link $rp1 type vlan id 200
+       ip link set dev $rp1.200 addrgenmode eui64
+       ip link set dev $rp1.200 up
        ip address add dev $rp1.200 192.0.2.2/28
        ip address add dev $rp1.200 2001:db8:1::2/64
        ip stats set dev $rp1.200 l3_stats on
@@ -256,9 +257,11 @@ reapply_config()
 
        router_rp1_200_destroy
 
-       ip link add name $rp1.200 link $rp1 addrgenmode none type vlan id 200
+       ip link add name $rp1.200 link $rp1 type vlan id 200
+       ip link set dev $rp1.200 addrgenmode none
        ip stats set dev $rp1.200 l3_stats on
-       ip link set dev $rp1.200 up addrgenmode eui64
+       ip link set dev $rp1.200 addrgenmode eui64
+       ip link set dev $rp1.200 up
        ip address add dev $rp1.200 192.0.2.2/28
        ip address add dev $rp1.200 2001:db8:1::2/64
 }
index 057c3d0..9ddb68d 100755 (executable)
@@ -791,8 +791,9 @@ tc_rule_handle_stats_get()
        local id=$1; shift
        local handle=$1; shift
        local selector=${1:-.packets}; shift
+       local netns=${1:-""}; shift
 
-       tc -j -s filter show $id \
+       tc $netns -j -s filter show $id \
            | jq ".[] | select(.options.handle == $handle) | \
                  .options.actions[0].stats$selector"
 }
index c5095da..aec752a 100755 (executable)
@@ -93,12 +93,16 @@ cleanup()
 
 test_gretap()
 {
+       ip neigh replace 192.0.2.130 lladdr $(mac_get $h3) \
+                nud permanent dev br2
        full_test_span_gre_dir gt4 ingress 8 0 "mirror to gretap"
        full_test_span_gre_dir gt4 egress 0 8 "mirror to gretap"
 }
 
 test_ip6gretap()
 {
+       ip neigh replace 2001:db8:2::2 lladdr $(mac_get $h3) \
+               nud permanent dev br2
        full_test_span_gre_dir gt6 ingress 8 0 "mirror to ip6gretap"
        full_test_span_gre_dir gt6 egress 0 8 "mirror to ip6gretap"
 }
index 9ff22f2..0cf4c47 100755 (executable)
@@ -90,12 +90,16 @@ cleanup()
 
 test_gretap()
 {
+       ip neigh replace 192.0.2.130 lladdr $(mac_get $h3) \
+                nud permanent dev br1
        full_test_span_gre_dir gt4 ingress 8 0 "mirror to gretap"
        full_test_span_gre_dir gt4 egress 0 8 "mirror to gretap"
 }
 
 test_ip6gretap()
 {
+       ip neigh replace 2001:db8:2::2 lladdr $(mac_get $h3) \
+               nud permanent dev br1
        full_test_span_gre_dir gt6 ingress 8 0 "mirror to ip6gretap"
        full_test_span_gre_dir gt6 egress 0 8 "mirror to ip6gretap"
 }
index 43a7236..7b936a9 100644 (file)
@@ -9,7 +9,7 @@ TEST_PROGS := mptcp_connect.sh pm_netlink.sh mptcp_join.sh diag.sh \
 
 TEST_GEN_FILES = mptcp_connect pm_nl_ctl mptcp_sockopt mptcp_inq
 
-TEST_FILES := settings
+TEST_FILES := mptcp_lib.sh settings
 
 EXTRA_CLEAN := *.pcap
 
index 38021a0..6032f9b 100644 (file)
@@ -1,3 +1,4 @@
+CONFIG_KALLSYMS=y
 CONFIG_MPTCP=y
 CONFIG_IPV6=y
 CONFIG_MPTCP_IPV6=y
index ef628b1..fa9e09a 100755 (executable)
@@ -1,6 +1,8 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
+. "$(dirname "${0}")/mptcp_lib.sh"
+
 sec=$(date +%s)
 rndh=$(printf %x $sec)-$(mktemp -u XXXXXX)
 ns="ns1-$rndh"
@@ -31,6 +33,8 @@ cleanup()
        ip netns del $ns
 }
 
+mptcp_lib_check_mptcp
+
 ip -Version > /dev/null 2>&1
 if [ $? -ne 0 ];then
        echo "SKIP: Could not run test without ip tool"
@@ -51,16 +55,20 @@ __chk_nr()
 {
        local command="$1"
        local expected=$2
-       local msg nr
+       local msg="$3"
+       local skip="${4:-SKIP}"
+       local nr
 
-       shift 2
-       msg=$*
        nr=$(eval $command)
 
        printf "%-50s" "$msg"
        if [ $nr != $expected ]; then
-               echo "[ fail ] expected $expected found $nr"
-               ret=$test_cnt
+               if [ $nr = "$skip" ] && ! mptcp_lib_expect_all_features; then
+                       echo "[ skip ] Feature probably not supported"
+               else
+                       echo "[ fail ] expected $expected found $nr"
+                       ret=$test_cnt
+               fi
        else
                echo "[  ok  ]"
        fi
@@ -72,12 +80,12 @@ __chk_msk_nr()
        local condition=$1
        shift 1
 
-       __chk_nr "ss -inmHMN $ns | $condition" $*
+       __chk_nr "ss -inmHMN $ns | $condition" "$@"
 }
 
 chk_msk_nr()
 {
-       __chk_msk_nr "grep -c token:" $*
+       __chk_msk_nr "grep -c token:" "$@"
 }
 
 wait_msk_nr()
@@ -115,37 +123,26 @@ wait_msk_nr()
 
 chk_msk_fallback_nr()
 {
-               __chk_msk_nr "grep -c fallback" $*
+       __chk_msk_nr "grep -c fallback" "$@"
 }
 
 chk_msk_remote_key_nr()
 {
-               __chk_msk_nr "grep -c remote_key" $*
+       __chk_msk_nr "grep -c remote_key" "$@"
 }
 
 __chk_listen()
 {
        local filter="$1"
        local expected=$2
+       local msg="$3"
 
-       shift 2
-       msg=$*
-
-       nr=$(ss -N $ns -Ml "$filter" | grep -c LISTEN)
-       printf "%-50s" "$msg"
-
-       if [ $nr != $expected ]; then
-               echo "[ fail ] expected $expected found $nr"
-               ret=$test_cnt
-       else
-               echo "[  ok  ]"
-       fi
+       __chk_nr "ss -N $ns -Ml '$filter' | grep -c LISTEN" "$expected" "$msg" 0
 }
 
 chk_msk_listen()
 {
        lport=$1
-       local msg="check for listen socket"
 
        # destination port search should always return empty list
        __chk_listen "dport $lport" 0 "listen match for dport $lport"
@@ -163,10 +160,9 @@ chk_msk_listen()
 chk_msk_inuse()
 {
        local expected=$1
+       local msg="$2"
        local listen_nr
 
-       shift 1
-
        listen_nr=$(ss -N "${ns}" -Ml | grep -c LISTEN)
        expected=$((expected + listen_nr))
 
@@ -177,7 +173,7 @@ chk_msk_inuse()
                sleep 0.1
        done
 
-       __chk_nr get_msk_inuse $expected $*
+       __chk_nr get_msk_inuse $expected "$msg" 0
 }
 
 # $1: ns, $2: port
index a43d3e2..773dd77 100755 (executable)
@@ -1,6 +1,8 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
+. "$(dirname "${0}")/mptcp_lib.sh"
+
 time_start=$(date +%s)
 
 optstring="S:R:d:e:l:r:h4cm:f:tC"
@@ -141,6 +143,9 @@ cleanup()
        done
 }
 
+mptcp_lib_check_mptcp
+mptcp_lib_check_kallsyms
+
 ip -Version > /dev/null 2>&1
 if [ $? -ne 0 ];then
        echo "SKIP: Could not run test without ip tool"
@@ -691,6 +696,15 @@ run_test_transparent()
                return 0
        fi
 
+       # IP(V6)_TRANSPARENT was added after TOS support, which brought the
+       # required infrastructure to the MPTCP sockopt code. To support TOS, the
+       # following function was exported (T). Not great, but better than
+       # checking for a specific kernel version.
+       if ! mptcp_lib_kallsyms_has "T __ip_sock_set_tos$"; then
+               echo "INFO: ${msg} not supported by the kernel: SKIP"
+               return
+       fi
+
 ip netns exec "$listener_ns" nft -f /dev/stdin <<"EOF"
 flush ruleset
 table inet mangle {
@@ -763,6 +777,11 @@ run_tests_peekmode()
 
 run_tests_mptfo()
 {
+       if ! mptcp_lib_kallsyms_has "mptcp_fastopen_"; then
+               echo "INFO: TFO not supported by the kernel: SKIP"
+               return
+       fi
+
        echo "INFO: with MPTFO start"
        ip netns exec "$ns1" sysctl -q net.ipv4.tcp_fastopen=2
        ip netns exec "$ns2" sysctl -q net.ipv4.tcp_fastopen=1
@@ -783,6 +802,11 @@ run_tests_disconnect()
        local old_cin=$cin
        local old_sin=$sin
 
+       if ! mptcp_lib_kallsyms_has "mptcp_pm_data_reset$"; then
+               echo "INFO: Full disconnect not supported: SKIP"
+               return
+       fi
+
        cat $cin $cin $cin > "$cin".disconnect
 
        # force do_transfer to cope with the multiple transmissions
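
The new checks probe /proc/kallsyms through the mptcp_lib.sh helpers instead of comparing kernel versions, so the scripts degrade to SKIP on older or backported kernels. The helper is part of the new mptcp_lib.sh file added in this series and is not shown in full here; a plausible minimal sketch of the lookup, under that assumption:

    # Sketch only: the real implementation lives in mptcp_lib.sh.
    mptcp_lib_kallsyms_has() {
            local sym="${1}"

            # kallsyms lines look like "<addr> T __ip_sock_set_tos", so the
            # argument may carry a type letter and a trailing '$' anchor.
            grep -q " ${sym}" /proc/kallsyms
    }

    mptcp_lib_kallsyms_has "T __ip_sock_set_tos$" && echo "TOS infrastructure present"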
index 26310c1..0ae8caf 100755 (executable)
@@ -10,6 +10,8 @@
 # because it's invoked by variable name, see how the "tests" array is used
 #shellcheck disable=SC2317
 
+. "$(dirname "${0}")/mptcp_lib.sh"
+
 ret=0
 sin=""
 sinfail=""
@@ -17,11 +19,14 @@ sout=""
 cin=""
 cinfail=""
 cinsent=""
+tmpfile=""
 cout=""
 capout=""
 ns1=""
 ns2=""
 ksft_skip=4
+iptables="iptables"
+ip6tables="ip6tables"
 timeout_poll=30
 timeout_test=$((timeout_poll * 2 + 1))
 capture=0
@@ -79,7 +84,7 @@ init_partial()
                ip netns add $netns || exit $ksft_skip
                ip -net $netns link set lo up
                ip netns exec $netns sysctl -q net.mptcp.enabled=1
-               ip netns exec $netns sysctl -q net.mptcp.pm_type=0
+               ip netns exec $netns sysctl -q net.mptcp.pm_type=0 2>/dev/null || true
                ip netns exec $netns sysctl -q net.ipv4.conf.all.rp_filter=0
                ip netns exec $netns sysctl -q net.ipv4.conf.default.rp_filter=0
                if [ $checksum -eq 1 ]; then
@@ -136,12 +141,19 @@ cleanup_partial()
 
 check_tools()
 {
+       mptcp_lib_check_mptcp
+       mptcp_lib_check_kallsyms
+
        if ! ip -Version &> /dev/null; then
                echo "SKIP: Could not run test without ip tool"
                exit $ksft_skip
        fi
 
-       if ! iptables -V &> /dev/null; then
+       # Use the legacy version if available to support old kernel versions
+       if iptables-legacy -V &> /dev/null; then
+               iptables="iptables-legacy"
+               ip6tables="ip6tables-legacy"
+       elif ! iptables -V &> /dev/null; then
                echo "SKIP: Could not run all tests without iptables tool"
                exit $ksft_skip
        fi
@@ -175,10 +187,37 @@ cleanup()
 {
        rm -f "$cin" "$cout" "$sinfail"
        rm -f "$sin" "$sout" "$cinsent" "$cinfail"
+       rm -f "$tmpfile"
        rm -rf $evts_ns1 $evts_ns2
        cleanup_partial
 }
 
+# $1: msg
+print_title()
+{
+       printf "%03u %-36s %s" "${TEST_COUNT}" "${TEST_NAME}" "${1}"
+}
+
+# [ $1: fail msg ]
+mark_as_skipped()
+{
+       local msg="${1:-"Feature not supported"}"
+
+       mptcp_lib_fail_if_expected_feature "${msg}"
+
+       print_title "[ skip ] ${msg}"
+       printf "\n"
+}
+
+# $@: condition
+continue_if()
+{
+       if ! "${@}"; then
+               mark_as_skipped
+               return 1
+       fi
+}
+
 skip_test()
 {
        if [ "${#only_tests_ids[@]}" -eq 0 ] && [ "${#only_tests_names[@]}" -eq 0 ]; then
@@ -222,6 +261,19 @@ reset()
        return 0
 }
 
+# $1: test name ; $2: counter to check
+reset_check_counter()
+{
+       reset "${1}" || return 1
+
+       local counter="${2}"
+
+       if ! nstat -asz "${counter}" | grep -wq "${counter}"; then
+               mark_as_skipped "counter '${counter}' is not available"
+               return 1
+       fi
+}
+
 # $1: test name
 reset_with_cookies()
 {
@@ -241,17 +293,21 @@ reset_with_add_addr_timeout()
 
        reset "${1}" || return 1
 
-       tables="iptables"
+       tables="${iptables}"
        if [ $ip -eq 6 ]; then
-               tables="ip6tables"
+               tables="${ip6tables}"
        fi
 
        ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1
-       ip netns exec $ns2 $tables -A OUTPUT -p tcp \
-               -m tcp --tcp-option 30 \
-               -m bpf --bytecode \
-               "$CBPF_MPTCP_SUBOPTION_ADD_ADDR" \
-               -j DROP
+
+       if ! ip netns exec $ns2 $tables -A OUTPUT -p tcp \
+                       -m tcp --tcp-option 30 \
+                       -m bpf --bytecode \
+                       "$CBPF_MPTCP_SUBOPTION_ADD_ADDR" \
+                       -j DROP; then
+               mark_as_skipped "unable to set the 'add addr' rule"
+               return 1
+       fi
 }
 
 # $1: test name
@@ -295,22 +351,17 @@ reset_with_allow_join_id0()
 #     tc action pedit offset 162 out of bounds
 #
 # Netfilter is used to mark packets with enough data.
-reset_with_fail()
+setup_fail_rules()
 {
-       reset "${1}" || return 1
-
-       ip netns exec $ns1 sysctl -q net.mptcp.checksum_enabled=1
-       ip netns exec $ns2 sysctl -q net.mptcp.checksum_enabled=1
-
        check_invert=1
        validate_checksum=1
-       local i="$2"
-       local ip="${3:-4}"
+       local i="$1"
+       local ip="${2:-4}"
        local tables
 
-       tables="iptables"
+       tables="${iptables}"
        if [ $ip -eq 6 ]; then
-               tables="ip6tables"
+               tables="${ip6tables}"
        fi
 
        ip netns exec $ns2 $tables \
@@ -320,15 +371,32 @@ reset_with_fail()
                -p tcp \
                -m length --length 150:9999 \
                -m statistic --mode nth --packet 1 --every 99999 \
-               -j MARK --set-mark 42 || exit 1
+               -j MARK --set-mark 42 || return ${ksft_skip}
 
-       tc -n $ns2 qdisc add dev ns2eth$i clsact || exit 1
+       tc -n $ns2 qdisc add dev ns2eth$i clsact || return ${ksft_skip}
        tc -n $ns2 filter add dev ns2eth$i egress \
                protocol ip prio 1000 \
                handle 42 fw \
                action pedit munge offset 148 u8 invert \
                pipe csum tcp \
-               index 100 || exit 1
+               index 100 || return ${ksft_skip}
+}
+
+reset_with_fail()
+{
+       reset_check_counter "${1}" "MPTcpExtInfiniteMapTx" || return 1
+       shift
+
+       ip netns exec $ns1 sysctl -q net.mptcp.checksum_enabled=1
+       ip netns exec $ns2 sysctl -q net.mptcp.checksum_enabled=1
+
+       local rc=0
+       setup_fail_rules "${@}" || rc=$?
+
+       if [ ${rc} -eq ${ksft_skip} ]; then
+               mark_as_skipped "unable to set the 'fail' rules"
+               return 1
+       fi
 }
 
 reset_with_events()
@@ -343,6 +411,25 @@ reset_with_events()
        evts_ns2_pid=$!
 }
 
+reset_with_tcp_filter()
+{
+       reset "${1}" || return 1
+       shift
+
+       local ns="${!1}"
+       local src="${2}"
+       local target="${3}"
+
+       if ! ip netns exec "${ns}" ${iptables} \
+                       -A INPUT \
+                       -s "${src}" \
+                       -p tcp \
+                       -j "${target}"; then
+               mark_as_skipped "unable to set the filter rules"
+               return 1
+       fi
+}
+
 fail_test()
 {
        ret=1
@@ -383,9 +470,16 @@ check_transfer()
                        fail_test
                        return 1
                fi
-               bytes="--bytes=${bytes}"
+
+               # note: BusyBox's "cmp" command doesn't support --bytes
+               tmpfile=$(mktemp)
+               head --bytes="$bytes" "$in" > "$tmpfile"
+               mv "$tmpfile" "$in"
+               head --bytes="$bytes" "$out" > "$tmpfile"
+               mv "$tmpfile" "$out"
+               tmpfile=""
        fi
-       cmp -l "$in" "$out" ${bytes} | while read -r i a b; do
+       cmp -l "$in" "$out" | while read -r i a b; do
                local sum=$((0${a} + 0${b}))
                if [ $check_invert -eq 0 ] || [ $sum -ne $((0xff)) ]; then
                        echo "[ FAIL ] $what does not match (in, out):"
@@ -454,11 +548,25 @@ wait_local_port_listen()
        done
 }
 
-rm_addr_count()
+# $1: ns ; $2: counter
+get_counter()
 {
-       local ns=${1}
+       local ns="${1}"
+       local counter="${2}"
+       local count
 
-       ip netns exec ${ns} nstat -as | grep MPTcpExtRmAddr | awk '{print $2}'
+       count=$(ip netns exec ${ns} nstat -asz "${counter}" | awk 'NR==1 {next} {print $2}')
+       if [ -z "${count}" ]; then
+               mptcp_lib_fail_if_expected_feature "${counter} counter"
+               return 1
+       fi
+
+       echo "${count}"
+}
+
+rm_addr_count()
+{
+       get_counter "${1}" "MPTcpExtRmAddr"
 }
 
 # $1: ns, $2: old rm_addr counter in $ns
@@ -481,11 +589,11 @@ wait_mpj()
        local ns="${1}"
        local cnt old_cnt
 
-       old_cnt=$(ip netns exec ${ns} nstat -as | grep MPJoinAckRx | awk '{print $2}')
+       old_cnt=$(get_counter ${ns} "MPTcpExtMPJoinAckRx")
 
        local i
        for i in $(seq 10); do
-               cnt=$(ip netns exec ${ns} nstat -as | grep MPJoinAckRx | awk '{print $2}')
+               cnt=$(get_counter ${ns} "MPTcpExtMPJoinAckRx")
                [ "$cnt" = "${old_cnt}" ] || break
                sleep 0.1
        done
@@ -685,15 +793,6 @@ pm_nl_check_endpoint()
        fi
 }
 
-filter_tcp_from()
-{
-       local ns="${1}"
-       local src="${2}"
-       local target="${3}"
-
-       ip netns exec "${ns}" iptables -A INPUT -s "${src}" -p tcp -j "${target}"
-}
-
 do_transfer()
 {
        local listener_ns="$1"
@@ -849,7 +948,15 @@ do_transfer()
                                     sed -n 's/.*\(token:\)\([[:digit:]]*\).*$/\2/p;q')
                                ip netns exec ${listener_ns} ./pm_nl_ctl ann $addr token $tk id $id
                                sleep 1
+                               sp=$(grep "type:10" "$evts_ns1" |
+                                    sed -n 's/.*\(sport:\)\([[:digit:]]*\).*$/\2/p;q')
+                               da=$(grep "type:10" "$evts_ns1" |
+                                    sed -n 's/.*\(daddr6:\)\([0-9a-f:.]*\).*$/\2/p;q')
+                               dp=$(grep "type:10" "$evts_ns1" |
+                                    sed -n 's/.*\(dport:\)\([[:digit:]]*\).*$/\2/p;q')
                                ip netns exec ${listener_ns} ./pm_nl_ctl rem token $tk id $id
+                               ip netns exec ${listener_ns} ./pm_nl_ctl dsf lip "::ffff:$addr" \
+                                                       lport $sp rip $da rport $dp token $tk
                        fi
 
                        counter=$((counter + 1))
@@ -915,6 +1022,7 @@ do_transfer()
                                sleep 1
                                sp=$(grep "type:10" "$evts_ns2" |
                                     sed -n 's/.*\(sport:\)\([[:digit:]]*\).*$/\2/p;q')
+                               ip netns exec ${connector_ns} ./pm_nl_ctl rem token $tk id $id
                                ip netns exec ${connector_ns} ./pm_nl_ctl dsf lip $addr lport $sp \
                                                                        rip $da rport $dp token $tk
                        fi
@@ -1135,12 +1243,13 @@ chk_csum_nr()
        fi
 
        printf "%-${nr_blank}s %s" " " "sum"
-       count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtDataCsumErr | awk '{print $2}')
-       [ -z "$count" ] && count=0
+       count=$(get_counter ${ns1} "MPTcpExtDataCsumErr")
        if [ "$count" != "$csum_ns1" ]; then
                extra_msg="$extra_msg ns1=$count"
        fi
-       if { [ "$count" != $csum_ns1 ] && [ $allow_multi_errors_ns1 -eq 0 ]; } ||
+       if [ -z "$count" ]; then
+               echo -n "[skip]"
+       elif { [ "$count" != $csum_ns1 ] && [ $allow_multi_errors_ns1 -eq 0 ]; } ||
           { [ "$count" -lt $csum_ns1 ] && [ $allow_multi_errors_ns1 -eq 1 ]; }; then
                echo "[fail] got $count data checksum error[s] expected $csum_ns1"
                fail_test
@@ -1149,12 +1258,13 @@ chk_csum_nr()
                echo -n "[ ok ]"
        fi
        echo -n " - csum  "
-       count=$(ip netns exec $ns2 nstat -as | grep MPTcpExtDataCsumErr | awk '{print $2}')
-       [ -z "$count" ] && count=0
+       count=$(get_counter ${ns2} "MPTcpExtDataCsumErr")
        if [ "$count" != "$csum_ns2" ]; then
                extra_msg="$extra_msg ns2=$count"
        fi
-       if { [ "$count" != $csum_ns2 ] && [ $allow_multi_errors_ns2 -eq 0 ]; } ||
+       if [ -z "$count" ]; then
+               echo -n "[skip]"
+       elif { [ "$count" != $csum_ns2 ] && [ $allow_multi_errors_ns2 -eq 0 ]; } ||
           { [ "$count" -lt $csum_ns2 ] && [ $allow_multi_errors_ns2 -eq 1 ]; }; then
                echo "[fail] got $count data checksum error[s] expected $csum_ns2"
                fail_test
@@ -1196,12 +1306,13 @@ chk_fail_nr()
        fi
 
        printf "%-${nr_blank}s %s" " " "ftx"
-       count=$(ip netns exec $ns_tx nstat -as | grep MPTcpExtMPFailTx | awk '{print $2}')
-       [ -z "$count" ] && count=0
+       count=$(get_counter ${ns_tx} "MPTcpExtMPFailTx")
        if [ "$count" != "$fail_tx" ]; then
                extra_msg="$extra_msg,tx=$count"
        fi
-       if { [ "$count" != "$fail_tx" ] && [ $allow_tx_lost -eq 0 ]; } ||
+       if [ -z "$count" ]; then
+               echo -n "[skip]"
+       elif { [ "$count" != "$fail_tx" ] && [ $allow_tx_lost -eq 0 ]; } ||
           { [ "$count" -gt "$fail_tx" ] && [ $allow_tx_lost -eq 1 ]; }; then
                echo "[fail] got $count MP_FAIL[s] TX expected $fail_tx"
                fail_test
@@ -1211,12 +1322,13 @@ chk_fail_nr()
        fi
 
        echo -n " - failrx"
-       count=$(ip netns exec $ns_rx nstat -as | grep MPTcpExtMPFailRx | awk '{print $2}')
-       [ -z "$count" ] && count=0
+       count=$(get_counter ${ns_rx} "MPTcpExtMPFailRx")
        if [ "$count" != "$fail_rx" ]; then
                extra_msg="$extra_msg,rx=$count"
        fi
-       if { [ "$count" != "$fail_rx" ] && [ $allow_rx_lost -eq 0 ]; } ||
+       if [ -z "$count" ]; then
+               echo -n "[skip]"
+       elif { [ "$count" != "$fail_rx" ] && [ $allow_rx_lost -eq 0 ]; } ||
           { [ "$count" -gt "$fail_rx" ] && [ $allow_rx_lost -eq 1 ]; }; then
                echo "[fail] got $count MP_FAIL[s] RX expected $fail_rx"
                fail_test
@@ -1248,10 +1360,11 @@ chk_fclose_nr()
        fi
 
        printf "%-${nr_blank}s %s" " " "ctx"
-       count=$(ip netns exec $ns_tx nstat -as | grep MPTcpExtMPFastcloseTx | awk '{print $2}')
-       [ -z "$count" ] && count=0
-       [ "$count" != "$fclose_tx" ] && extra_msg="$extra_msg,tx=$count"
-       if [ "$count" != "$fclose_tx" ]; then
+       count=$(get_counter ${ns_tx} "MPTcpExtMPFastcloseTx")
+       if [ -z "$count" ]; then
+               echo -n "[skip]"
+       elif [ "$count" != "$fclose_tx" ]; then
+               extra_msg="$extra_msg,tx=$count"
                echo "[fail] got $count MP_FASTCLOSE[s] TX expected $fclose_tx"
                fail_test
                dump_stats=1
@@ -1260,10 +1373,11 @@ chk_fclose_nr()
        fi
 
        echo -n " - fclzrx"
-       count=$(ip netns exec $ns_rx nstat -as | grep MPTcpExtMPFastcloseRx | awk '{print $2}')
-       [ -z "$count" ] && count=0
-       [ "$count" != "$fclose_rx" ] && extra_msg="$extra_msg,rx=$count"
-       if [ "$count" != "$fclose_rx" ]; then
+       count=$(get_counter ${ns_rx} "MPTcpExtMPFastcloseRx")
+       if [ -z "$count" ]; then
+               echo -n "[skip]"
+       elif [ "$count" != "$fclose_rx" ]; then
+               extra_msg="$extra_msg,rx=$count"
                echo "[fail] got $count MP_FASTCLOSE[s] RX expected $fclose_rx"
                fail_test
                dump_stats=1
@@ -1294,9 +1408,10 @@ chk_rst_nr()
        fi
 
        printf "%-${nr_blank}s %s" " " "rtx"
-       count=$(ip netns exec $ns_tx nstat -as | grep MPTcpExtMPRstTx | awk '{print $2}')
-       [ -z "$count" ] && count=0
-       if [ $count -lt $rst_tx ]; then
+       count=$(get_counter ${ns_tx} "MPTcpExtMPRstTx")
+       if [ -z "$count" ]; then
+               echo -n "[skip]"
+       elif [ $count -lt $rst_tx ]; then
                echo "[fail] got $count MP_RST[s] TX expected $rst_tx"
                fail_test
                dump_stats=1
@@ -1305,9 +1420,10 @@ chk_rst_nr()
        fi
 
        echo -n " - rstrx "
-       count=$(ip netns exec $ns_rx nstat -as | grep MPTcpExtMPRstRx | awk '{print $2}')
-       [ -z "$count" ] && count=0
-       if [ "$count" -lt "$rst_rx" ]; then
+       count=$(get_counter ${ns_rx} "MPTcpExtMPRstRx")
+       if [ -z "$count" ]; then
+               echo -n "[skip]"
+       elif [ "$count" -lt "$rst_rx" ]; then
                echo "[fail] got $count MP_RST[s] RX expected $rst_rx"
                fail_test
                dump_stats=1
@@ -1328,9 +1444,10 @@ chk_infi_nr()
        local dump_stats
 
        printf "%-${nr_blank}s %s" " " "itx"
-       count=$(ip netns exec $ns2 nstat -as | grep InfiniteMapTx | awk '{print $2}')
-       [ -z "$count" ] && count=0
-       if [ "$count" != "$infi_tx" ]; then
+       count=$(get_counter ${ns2} "MPTcpExtInfiniteMapTx")
+       if [ -z "$count" ]; then
+               echo -n "[skip]"
+       elif [ "$count" != "$infi_tx" ]; then
                echo "[fail] got $count infinite map[s] TX expected $infi_tx"
                fail_test
                dump_stats=1
@@ -1339,9 +1456,10 @@ chk_infi_nr()
        fi
 
        echo -n " - infirx"
-       count=$(ip netns exec $ns1 nstat -as | grep InfiniteMapRx | awk '{print $2}')
-       [ -z "$count" ] && count=0
-       if [ "$count" != "$infi_rx" ]; then
+       count=$(get_counter ${ns1} "MPTcpExtInfiniteMapRx")
+       if [ -z "$count" ]; then
+               echo "[skip]"
+       elif [ "$count" != "$infi_rx" ]; then
                echo "[fail] got $count infinite map[s] RX expected $infi_rx"
                fail_test
                dump_stats=1
@@ -1373,9 +1491,10 @@ chk_join_nr()
        fi
 
        printf "%03u %-36s %s" "${TEST_COUNT}" "${title}" "syn"
-       count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMPJoinSynRx | awk '{print $2}')
-       [ -z "$count" ] && count=0
-       if [ "$count" != "$syn_nr" ]; then
+       count=$(get_counter ${ns1} "MPTcpExtMPJoinSynRx")
+       if [ -z "$count" ]; then
+               echo -n "[skip]"
+       elif [ "$count" != "$syn_nr" ]; then
                echo "[fail] got $count JOIN[s] syn expected $syn_nr"
                fail_test
                dump_stats=1
@@ -1385,9 +1504,10 @@ chk_join_nr()
 
        echo -n " - synack"
        with_cookie=$(ip netns exec $ns2 sysctl -n net.ipv4.tcp_syncookies)
-       count=$(ip netns exec $ns2 nstat -as | grep MPTcpExtMPJoinSynAckRx | awk '{print $2}')
-       [ -z "$count" ] && count=0
-       if [ "$count" != "$syn_ack_nr" ]; then
+       count=$(get_counter ${ns2} "MPTcpExtMPJoinSynAckRx")
+       if [ -z "$count" ]; then
+               echo -n "[skip]"
+       elif [ "$count" != "$syn_ack_nr" ]; then
                # simult connections exceeding the limit with cookie enabled could go up to
                # synack validation as the conn limit can be enforced reliably only after
                # the subflow creation
@@ -1403,9 +1523,10 @@ chk_join_nr()
        fi
 
        echo -n " - ack"
-       count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMPJoinAckRx | awk '{print $2}')
-       [ -z "$count" ] && count=0
-       if [ "$count" != "$ack_nr" ]; then
+       count=$(get_counter ${ns1} "MPTcpExtMPJoinAckRx")
+       if [ -z "$count" ]; then
+               echo "[skip]"
+       elif [ "$count" != "$ack_nr" ]; then
                echo "[fail] got $count JOIN[s] ack expected $ack_nr"
                fail_test
                dump_stats=1
@@ -1437,12 +1558,12 @@ chk_stale_nr()
        local recover_nr
 
        printf "%-${nr_blank}s %-18s" " " "stale"
-       stale_nr=$(ip netns exec $ns nstat -as | grep MPTcpExtSubflowStale | awk '{print $2}')
-       [ -z "$stale_nr" ] && stale_nr=0
-       recover_nr=$(ip netns exec $ns nstat -as | grep MPTcpExtSubflowRecover | awk '{print $2}')
-       [ -z "$recover_nr" ] && recover_nr=0
 
-       if [ $stale_nr -lt $stale_min ] ||
+       stale_nr=$(get_counter ${ns} "MPTcpExtSubflowStale")
+       recover_nr=$(get_counter ${ns} "MPTcpExtSubflowRecover")
+       if [ -z "$stale_nr" ] || [ -z "$recover_nr" ]; then
+               echo "[skip]"
+       elif [ $stale_nr -lt $stale_min ] ||
           { [ $stale_max -gt 0 ] && [ $stale_nr -gt $stale_max ]; } ||
           [ $((stale_nr - recover_nr)) -ne $stale_delta ]; then
                echo "[fail] got $stale_nr stale[s] $recover_nr recover[s], " \
@@ -1478,12 +1599,12 @@ chk_add_nr()
        timeout=$(ip netns exec $ns1 sysctl -n net.mptcp.add_addr_timeout)
 
        printf "%-${nr_blank}s %s" " " "add"
-       count=$(ip netns exec $ns2 nstat -as MPTcpExtAddAddr | grep MPTcpExtAddAddr | awk '{print $2}')
-       [ -z "$count" ] && count=0
-
+       count=$(get_counter ${ns2} "MPTcpExtAddAddr")
+       if [ -z "$count" ]; then
+               echo -n "[skip]"
        # if the test configured a short timeout, tolerate more than the expected
        # number of ADD_ADDR options, due to retransmissions
-       if [ "$count" != "$add_nr" ] && { [ "$timeout" -gt 1 ] || [ "$count" -lt "$add_nr" ]; }; then
+       elif [ "$count" != "$add_nr" ] && { [ "$timeout" -gt 1 ] || [ "$count" -lt "$add_nr" ]; }; then
                echo "[fail] got $count ADD_ADDR[s] expected $add_nr"
                fail_test
                dump_stats=1
@@ -1492,9 +1613,10 @@ chk_add_nr()
        fi
 
        echo -n " - echo  "
-       count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtEchoAdd | awk '{print $2}')
-       [ -z "$count" ] && count=0
-       if [ "$count" != "$echo_nr" ]; then
+       count=$(get_counter ${ns1} "MPTcpExtEchoAdd")
+       if [ -z "$count" ]; then
+               echo -n "[skip]"
+       elif [ "$count" != "$echo_nr" ]; then
                echo "[fail] got $count ADD_ADDR echo[s] expected $echo_nr"
                fail_test
                dump_stats=1
@@ -1504,9 +1626,10 @@ chk_add_nr()
 
        if [ $port_nr -gt 0 ]; then
                echo -n " - pt "
-               count=$(ip netns exec $ns2 nstat -as | grep MPTcpExtPortAdd | awk '{print $2}')
-               [ -z "$count" ] && count=0
-               if [ "$count" != "$port_nr" ]; then
+               count=$(get_counter ${ns2} "MPTcpExtPortAdd")
+               if [ -z "$count" ]; then
+                       echo "[skip]"
+               elif [ "$count" != "$port_nr" ]; then
                        echo "[fail] got $count ADD_ADDR[s] with a port-number expected $port_nr"
                        fail_test
                        dump_stats=1
@@ -1515,10 +1638,10 @@ chk_add_nr()
                fi
 
                printf "%-${nr_blank}s %s" " " "syn"
-               count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMPJoinPortSynRx |
-                       awk '{print $2}')
-               [ -z "$count" ] && count=0
-               if [ "$count" != "$syn_nr" ]; then
+               count=$(get_counter ${ns1} "MPTcpExtMPJoinPortSynRx")
+               if [ -z "$count" ]; then
+                       echo -n "[skip]"
+               elif [ "$count" != "$syn_nr" ]; then
                        echo "[fail] got $count JOIN[s] syn with a different \
                                port-number expected $syn_nr"
                        fail_test
@@ -1528,10 +1651,10 @@ chk_add_nr()
                fi
 
                echo -n " - synack"
-               count=$(ip netns exec $ns2 nstat -as | grep MPTcpExtMPJoinPortSynAckRx |
-                       awk '{print $2}')
-               [ -z "$count" ] && count=0
-               if [ "$count" != "$syn_ack_nr" ]; then
+               count=$(get_counter ${ns2} "MPTcpExtMPJoinPortSynAckRx")
+               if [ -z "$count" ]; then
+                       echo -n "[skip]"
+               elif [ "$count" != "$syn_ack_nr" ]; then
                        echo "[fail] got $count JOIN[s] synack with a different \
                                port-number expected $syn_ack_nr"
                        fail_test
@@ -1541,10 +1664,10 @@ chk_add_nr()
                fi
 
                echo -n " - ack"
-               count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMPJoinPortAckRx |
-                       awk '{print $2}')
-               [ -z "$count" ] && count=0
-               if [ "$count" != "$ack_nr" ]; then
+               count=$(get_counter ${ns1} "MPTcpExtMPJoinPortAckRx")
+               if [ -z "$count" ]; then
+                       echo "[skip]"
+               elif [ "$count" != "$ack_nr" ]; then
                        echo "[fail] got $count JOIN[s] ack with a different \
                                port-number expected $ack_nr"
                        fail_test
@@ -1554,10 +1677,10 @@ chk_add_nr()
                fi
 
                printf "%-${nr_blank}s %s" " " "syn"
-               count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMismatchPortSynRx |
-                       awk '{print $2}')
-               [ -z "$count" ] && count=0
-               if [ "$count" != "$mis_syn_nr" ]; then
+               count=$(get_counter ${ns1} "MPTcpExtMismatchPortSynRx")
+               if [ -z "$count" ]; then
+                       echo -n "[skip]"
+               elif [ "$count" != "$mis_syn_nr" ]; then
                        echo "[fail] got $count JOIN[s] syn with a mismatched \
                                port-number expected $mis_syn_nr"
                        fail_test
@@ -1567,10 +1690,10 @@ chk_add_nr()
                fi
 
                echo -n " - ack   "
-               count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMismatchPortAckRx |
-                       awk '{print $2}')
-               [ -z "$count" ] && count=0
-               if [ "$count" != "$mis_ack_nr" ]; then
+               count=$(get_counter ${ns1} "MPTcpExtMismatchPortAckRx")
+               if [ -z "$count" ]; then
+                       echo "[skip]"
+               elif [ "$count" != "$mis_ack_nr" ]; then
                        echo "[fail] got $count JOIN[s] ack with a mismatched \
                                port-number expected $mis_ack_nr"
                        fail_test
@@ -1614,9 +1737,10 @@ chk_rm_nr()
        fi
 
        printf "%-${nr_blank}s %s" " " "rm "
-       count=$(ip netns exec $addr_ns nstat -as | grep MPTcpExtRmAddr | awk '{print $2}')
-       [ -z "$count" ] && count=0
-       if [ "$count" != "$rm_addr_nr" ]; then
+       count=$(get_counter ${addr_ns} "MPTcpExtRmAddr")
+       if [ -z "$count" ]; then
+               echo -n "[skip]"
+       elif [ "$count" != "$rm_addr_nr" ]; then
                echo "[fail] got $count RM_ADDR[s] expected $rm_addr_nr"
                fail_test
                dump_stats=1
@@ -1625,29 +1749,27 @@ chk_rm_nr()
        fi
 
        echo -n " - rmsf  "
-       count=$(ip netns exec $subflow_ns nstat -as | grep MPTcpExtRmSubflow | awk '{print $2}')
-       [ -z "$count" ] && count=0
-       if [ -n "$simult" ]; then
+       count=$(get_counter ${subflow_ns} "MPTcpExtRmSubflow")
+       if [ -z "$count" ]; then
+               echo -n "[skip]"
+       elif [ -n "$simult" ]; then
                local cnt suffix
 
-               cnt=$(ip netns exec $addr_ns nstat -as | grep MPTcpExtRmSubflow | awk '{print $2}')
+               cnt=$(get_counter ${addr_ns} "MPTcpExtRmSubflow")
 
                # in case of simult flush, the subflow removal count on each side is
                # unreliable
-               [ -z "$cnt" ] && cnt=0
                count=$((count + cnt))
                [ "$count" != "$rm_subflow_nr" ] && suffix="$count in [$rm_subflow_nr:$((rm_subflow_nr*2))]"
                if [ $count -ge "$rm_subflow_nr" ] && \
                   [ "$count" -le "$((rm_subflow_nr *2 ))" ]; then
-                       echo "[ ok ] $suffix"
+                       echo -n "[ ok ] $suffix"
                else
                        echo "[fail] got $count RM_SUBFLOW[s] expected in range [$rm_subflow_nr:$((rm_subflow_nr*2))]"
                        fail_test
                        dump_stats=1
                fi
-               return
-       fi
-       if [ "$count" != "$rm_subflow_nr" ]; then
+       elif [ "$count" != "$rm_subflow_nr" ]; then
                echo "[fail] got $count RM_SUBFLOW[s] expected $rm_subflow_nr"
                fail_test
                dump_stats=1
@@ -1668,9 +1790,10 @@ chk_prio_nr()
        local dump_stats
 
        printf "%-${nr_blank}s %s" " " "ptx"
-       count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMPPrioTx | awk '{print $2}')
-       [ -z "$count" ] && count=0
-       if [ "$count" != "$mp_prio_nr_tx" ]; then
+       count=$(get_counter ${ns1} "MPTcpExtMPPrioTx")
+       if [ -z "$count" ]; then
+               echo -n "[skip]"
+       elif [ "$count" != "$mp_prio_nr_tx" ]; then
                echo "[fail] got $count MP_PRIO[s] TX expected $mp_prio_nr_tx"
                fail_test
                dump_stats=1
@@ -1679,9 +1802,10 @@ chk_prio_nr()
        fi
 
        echo -n " - prx   "
-       count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMPPrioRx | awk '{print $2}')
-       [ -z "$count" ] && count=0
-       if [ "$count" != "$mp_prio_nr_rx" ]; then
+       count=$(get_counter ${ns1} "MPTcpExtMPPrioRx")
+       if [ -z "$count" ]; then
+               echo "[skip]"
+       elif [ "$count" != "$mp_prio_nr_rx" ]; then
                echo "[fail] got $count MP_PRIO[s] RX expected $mp_prio_nr_rx"
                fail_test
                dump_stats=1
@@ -1797,7 +1921,7 @@ wait_attempt_fail()
        while [ $time -lt $timeout_ms ]; do
                local cnt
 
-               cnt=$(ip netns exec $ns nstat -as TcpAttemptFails | grep TcpAttemptFails | awk '{print $2}')
+               cnt=$(get_counter ${ns} "TcpAttemptFails")
 
                [ "$cnt" = 1 ] && return 1
                time=$((time + 100))
@@ -1890,23 +2014,23 @@ subflows_error_tests()
        fi
 
        # multiple subflows, with subflow creation error
-       if reset "multi subflows, with failing subflow"; then
+       if reset_with_tcp_filter "multi subflows, with failing subflow" ns1 10.0.3.2 REJECT &&
+          continue_if mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then
                pm_nl_set_limits $ns1 0 2
                pm_nl_set_limits $ns2 0 2
                pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
                pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow
-               filter_tcp_from $ns1 10.0.3.2 REJECT
                run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
                chk_join_nr 1 1 1
        fi
 
        # multiple subflows, with subflow timeout on MPJ
-       if reset "multi subflows, with subflow timeout"; then
+       if reset_with_tcp_filter "multi subflows, with subflow timeout" ns1 10.0.3.2 DROP &&
+          continue_if mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then
                pm_nl_set_limits $ns1 0 2
                pm_nl_set_limits $ns2 0 2
                pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
                pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow
-               filter_tcp_from $ns1 10.0.3.2 DROP
                run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
                chk_join_nr 1 1 1
        fi
@@ -1914,11 +2038,11 @@ subflows_error_tests()
        # multiple subflows, check that the endpoint corresponding to
        # closed subflow (due to reset) is not reused if additional
        # subflows are added later
-       if reset "multi subflows, fair usage on close"; then
+       if reset_with_tcp_filter "multi subflows, fair usage on close" ns1 10.0.3.2 REJECT &&
+          continue_if mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then
                pm_nl_set_limits $ns1 0 1
                pm_nl_set_limits $ns2 0 1
                pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
-               filter_tcp_from $ns1 10.0.3.2 REJECT
                run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow &
 
                # mpj subflow will be in TW after the reset
@@ -2018,11 +2142,18 @@ signal_address_tests()
                # the peer could possibly miss some addr notification, allow retransmission
                ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1
                run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
-               chk_join_nr 3 3 3
 
-               # the server will not signal the address terminating
-               # the MPC subflow
-               chk_add_nr 3 3
+               # This check is not directly linked to the commit introducing
+               # this symbol, but to its parent commit, which is linked anyway.
+               if ! mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then
+                       chk_join_nr 3 3 2
+                       chk_add_nr 4 4
+               else
+                       chk_join_nr 3 3 3
+                       # the server will not signal the address terminating
+                       # the MPC subflow
+                       chk_add_nr 3 3
+               fi
        fi
 }
 
@@ -2263,7 +2394,12 @@ remove_tests()
                pm_nl_add_endpoint $ns2 10.0.4.2 flags subflow
                run_tests $ns1 $ns2 10.0.1.1 0 -8 -8 slow
                chk_join_nr 3 3 3
-               chk_rm_nr 0 3 simult
+
+               if mptcp_lib_kversion_ge 5.18; then
+                       chk_rm_nr 0 3 simult
+               else
+                       chk_rm_nr 3 3
+               fi
        fi
 
        # addresses flush
@@ -2501,7 +2637,8 @@ v4mapped_tests()
 
 mixed_tests()
 {
-       if reset "IPv4 sockets do not use IPv6 addresses"; then
+       if reset "IPv4 sockets do not use IPv6 addresses" &&
+          continue_if mptcp_lib_kversion_ge 6.3; then
                pm_nl_set_limits $ns1 0 1
                pm_nl_set_limits $ns2 1 1
                pm_nl_add_endpoint $ns1 dead:beef:2::1 flags signal
@@ -2510,7 +2647,8 @@ mixed_tests()
        fi
 
        # Need an IPv6 mptcp socket to allow subflows of both families
-       if reset "simult IPv4 and IPv6 subflows"; then
+       if reset "simult IPv4 and IPv6 subflows" &&
+          continue_if mptcp_lib_kversion_ge 6.3; then
                pm_nl_set_limits $ns1 0 1
                pm_nl_set_limits $ns2 1 1
                pm_nl_add_endpoint $ns1 10.0.1.1 flags signal
@@ -2519,7 +2657,8 @@ mixed_tests()
        fi
 
        # cross families subflows will not be created even in fullmesh mode
-       if reset "simult IPv4 and IPv6 subflows, fullmesh 1x1"; then
+       if reset "simult IPv4 and IPv6 subflows, fullmesh 1x1" &&
+          continue_if mptcp_lib_kversion_ge 6.3; then
                pm_nl_set_limits $ns1 0 4
                pm_nl_set_limits $ns2 1 4
                pm_nl_add_endpoint $ns2 dead:beef:2::2 flags subflow,fullmesh
@@ -2530,7 +2669,8 @@ mixed_tests()
 
        # fullmesh still tries to create all the possible subflows with
        # matching family
-       if reset "simult IPv4 and IPv6 subflows, fullmesh 2x2"; then
+       if reset "simult IPv4 and IPv6 subflows, fullmesh 2x2" &&
+          continue_if mptcp_lib_kversion_ge 6.3; then
                pm_nl_set_limits $ns1 0 4
                pm_nl_set_limits $ns2 2 4
                pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
@@ -2543,7 +2683,8 @@ mixed_tests()
 backup_tests()
 {
        # single subflow, backup
-       if reset "single subflow, backup"; then
+       if reset "single subflow, backup" &&
+          continue_if mptcp_lib_kallsyms_has "subflow_rebuild_header$"; then
                pm_nl_set_limits $ns1 0 1
                pm_nl_set_limits $ns2 0 1
                pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow,backup
@@ -2553,7 +2694,8 @@ backup_tests()
        fi
 
        # single address, backup
-       if reset "single address, backup"; then
+       if reset "single address, backup" &&
+          continue_if mptcp_lib_kallsyms_has "subflow_rebuild_header$"; then
                pm_nl_set_limits $ns1 0 1
                pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
                pm_nl_set_limits $ns2 1 1
@@ -2564,7 +2706,8 @@ backup_tests()
        fi
 
        # single address with port, backup
-       if reset "single address with port, backup"; then
+       if reset "single address with port, backup" &&
+          continue_if mptcp_lib_kallsyms_has "subflow_rebuild_header$"; then
                pm_nl_set_limits $ns1 0 1
                pm_nl_add_endpoint $ns1 10.0.2.1 flags signal port 10100
                pm_nl_set_limits $ns2 1 1
@@ -2574,14 +2717,16 @@ backup_tests()
                chk_prio_nr 1 1
        fi
 
-       if reset "mpc backup"; then
+       if reset "mpc backup" &&
+          continue_if mptcp_lib_kallsyms_doesnt_have "mptcp_subflow_send_ack$"; then
                pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow,backup
                run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
                chk_join_nr 0 0 0
                chk_prio_nr 0 1
        fi
 
-       if reset "mpc backup both sides"; then
+       if reset "mpc backup both sides" &&
+          continue_if mptcp_lib_kallsyms_doesnt_have "mptcp_subflow_send_ack$"; then
                pm_nl_add_endpoint $ns1 10.0.1.1 flags subflow,backup
                pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow,backup
                run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
@@ -2589,14 +2734,16 @@ backup_tests()
                chk_prio_nr 1 1
        fi
 
-       if reset "mpc switch to backup"; then
+       if reset "mpc switch to backup" &&
+          continue_if mptcp_lib_kallsyms_doesnt_have "mptcp_subflow_send_ack$"; then
                pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow
                run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow backup
                chk_join_nr 0 0 0
                chk_prio_nr 0 1
        fi
 
-       if reset "mpc switch to backup both sides"; then
+       if reset "mpc switch to backup both sides" &&
+          continue_if mptcp_lib_kallsyms_doesnt_have "mptcp_subflow_send_ack$"; then
                pm_nl_add_endpoint $ns1 10.0.1.1 flags subflow
                pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow
                run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow backup
@@ -2622,38 +2769,41 @@ verify_listener_events()
        local family
        local saddr
        local sport
+       local name
 
        if [ $e_type = $LISTENER_CREATED ]; then
-               stdbuf -o0 -e0 printf "\t\t\t\t\t CREATE_LISTENER %s:%s"\
-                       $e_saddr $e_sport
+               name="LISTENER_CREATED"
        elif [ $e_type = $LISTENER_CLOSED ]; then
-               stdbuf -o0 -e0 printf "\t\t\t\t\t CLOSE_LISTENER %s:%s "\
-                       $e_saddr $e_sport
+               name="LISTENER_CLOSED"
+       else
+               name="$e_type"
        fi
 
-       type=$(grep "type:$e_type," $evt |
-              sed --unbuffered -n 's/.*\(type:\)\([[:digit:]]*\).*$/\2/p;q')
-       family=$(grep "type:$e_type," $evt |
-                sed --unbuffered -n 's/.*\(family:\)\([[:digit:]]*\).*$/\2/p;q')
-       sport=$(grep "type:$e_type," $evt |
-               sed --unbuffered -n 's/.*\(sport:\)\([[:digit:]]*\).*$/\2/p;q')
+       printf "%-${nr_blank}s %s %s:%s " " " "$name" "$e_saddr" "$e_sport"
+
+       if ! mptcp_lib_kallsyms_has "mptcp_event_pm_listener$"; then
+               printf "[skip]: event not supported\n"
+               return
+       fi
+
+       type=$(grep "type:$e_type," $evt | sed -n 's/.*\(type:\)\([[:digit:]]*\).*$/\2/p;q')
+       family=$(grep "type:$e_type," $evt | sed -n 's/.*\(family:\)\([[:digit:]]*\).*$/\2/p;q')
+       sport=$(grep "type:$e_type," $evt | sed -n 's/.*\(sport:\)\([[:digit:]]*\).*$/\2/p;q')
        if [ $family ] && [ $family = $AF_INET6 ]; then
-               saddr=$(grep "type:$e_type," $evt |
-                       sed --unbuffered -n 's/.*\(saddr6:\)\([0-9a-f:.]*\).*$/\2/p;q')
+               saddr=$(grep "type:$e_type," $evt | sed -n 's/.*\(saddr6:\)\([0-9a-f:.]*\).*$/\2/p;q')
        else
-               saddr=$(grep "type:$e_type," $evt |
-                       sed --unbuffered -n 's/.*\(saddr4:\)\([0-9.]*\).*$/\2/p;q')
+               saddr=$(grep "type:$e_type," $evt | sed -n 's/.*\(saddr4:\)\([0-9.]*\).*$/\2/p;q')
        fi
 
        if [ $type ] && [ $type = $e_type ] &&
           [ $family ] && [ $family = $e_family ] &&
           [ $saddr ] && [ $saddr = $e_saddr ] &&
           [ $sport ] && [ $sport = $e_sport ]; then
-               stdbuf -o0 -e0 printf "[ ok ]\n"
+               echo "[ ok ]"
                return 0
        fi
        fail_test
-       stdbuf -o0 -e0 printf "[fail]\n"
+       echo "[fail]"
 }
 
 add_addr_ports_tests()
@@ -2959,7 +3109,8 @@ fullmesh_tests()
        fi
 
        # set fullmesh flag
-       if reset "set fullmesh flag test"; then
+       if reset "set fullmesh flag test" &&
+          continue_if mptcp_lib_kversion_ge 5.18; then
                pm_nl_set_limits $ns1 4 4
                pm_nl_add_endpoint $ns1 10.0.2.1 flags subflow
                pm_nl_set_limits $ns2 4 4
@@ -2969,7 +3120,8 @@ fullmesh_tests()
        fi
 
        # set nofullmesh flag
-       if reset "set nofullmesh flag test"; then
+       if reset "set nofullmesh flag test" &&
+          continue_if mptcp_lib_kversion_ge 5.18; then
                pm_nl_set_limits $ns1 4 4
                pm_nl_add_endpoint $ns1 10.0.2.1 flags subflow,fullmesh
                pm_nl_set_limits $ns2 4 4
@@ -2979,7 +3131,8 @@ fullmesh_tests()
        fi
 
        # set backup,fullmesh flags
-       if reset "set backup,fullmesh flags test"; then
+       if reset "set backup,fullmesh flags test" &&
+          continue_if mptcp_lib_kversion_ge 5.18; then
                pm_nl_set_limits $ns1 4 4
                pm_nl_add_endpoint $ns1 10.0.2.1 flags subflow
                pm_nl_set_limits $ns2 4 4
@@ -2990,7 +3143,8 @@ fullmesh_tests()
        fi
 
        # set nobackup,nofullmesh flags
-       if reset "set nobackup,nofullmesh flags test"; then
+       if reset "set nobackup,nofullmesh flags test" &&
+          continue_if mptcp_lib_kversion_ge 5.18; then
                pm_nl_set_limits $ns1 4 4
                pm_nl_set_limits $ns2 4 4
                pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow,backup,fullmesh
@@ -3003,14 +3157,14 @@ fullmesh_tests()
 
 fastclose_tests()
 {
-       if reset "fastclose test"; then
+       if reset_check_counter "fastclose test" "MPTcpExtMPFastcloseTx"; then
                run_tests $ns1 $ns2 10.0.1.1 1024 0 fastclose_client
                chk_join_nr 0 0 0
                chk_fclose_nr 1 1
                chk_rst_nr 1 1 invert
        fi
 
-       if reset "fastclose server test"; then
+       if reset_check_counter "fastclose server test" "MPTcpExtMPFastcloseRx"; then
                run_tests $ns1 $ns2 10.0.1.1 1024 0 fastclose_server
                chk_join_nr 0 0 0
                chk_fclose_nr 1 1 invert
@@ -3048,7 +3202,8 @@ fail_tests()
 userspace_tests()
 {
        # userspace pm type prevents add_addr
-       if reset "userspace pm type prevents add_addr"; then
+       if reset "userspace pm type prevents add_addr" &&
+          continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
                set_userspace_pm $ns1
                pm_nl_set_limits $ns1 0 2
                pm_nl_set_limits $ns2 0 2
@@ -3059,7 +3214,8 @@ userspace_tests()
        fi
 
        # userspace pm type does not echo add_addr without daemon
-       if reset "userspace pm no echo w/o daemon"; then
+       if reset "userspace pm no echo w/o daemon" &&
+          continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
                set_userspace_pm $ns2
                pm_nl_set_limits $ns1 0 2
                pm_nl_set_limits $ns2 0 2
@@ -3070,7 +3226,8 @@ userspace_tests()
        fi
 
        # userspace pm type rejects join
-       if reset "userspace pm type rejects join"; then
+       if reset "userspace pm type rejects join" &&
+          continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
                set_userspace_pm $ns1
                pm_nl_set_limits $ns1 1 1
                pm_nl_set_limits $ns2 1 1
@@ -3080,7 +3237,8 @@ userspace_tests()
        fi
 
        # userspace pm type does not send join
-       if reset "userspace pm type does not send join"; then
+       if reset "userspace pm type does not send join" &&
+          continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
                set_userspace_pm $ns2
                pm_nl_set_limits $ns1 1 1
                pm_nl_set_limits $ns2 1 1
@@ -3090,7 +3248,8 @@ userspace_tests()
        fi
 
        # userspace pm type prevents mp_prio
-       if reset "userspace pm type prevents mp_prio"; then
+       if reset "userspace pm type prevents mp_prio" &&
+          continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
                set_userspace_pm $ns1
                pm_nl_set_limits $ns1 1 1
                pm_nl_set_limits $ns2 1 1
@@ -3101,7 +3260,8 @@ userspace_tests()
        fi
 
        # userspace pm type prevents rm_addr
-       if reset "userspace pm type prevents rm_addr"; then
+       if reset "userspace pm type prevents rm_addr" &&
+          continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
                set_userspace_pm $ns1
                set_userspace_pm $ns2
                pm_nl_set_limits $ns1 0 1
@@ -3113,7 +3273,8 @@ userspace_tests()
        fi
 
        # userspace pm add & remove address
-       if reset_with_events "userspace pm add & remove address"; then
+       if reset_with_events "userspace pm add & remove address" &&
+          continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
                set_userspace_pm $ns1
                pm_nl_set_limits $ns2 1 1
                run_tests $ns1 $ns2 10.0.1.1 0 userspace_1 0 slow
@@ -3124,20 +3285,23 @@ userspace_tests()
        fi
 
        # userspace pm create destroy subflow
-       if reset_with_events "userspace pm create destroy subflow"; then
+       if reset_with_events "userspace pm create destroy subflow" &&
+          continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
                set_userspace_pm $ns2
                pm_nl_set_limits $ns1 0 1
                run_tests $ns1 $ns2 10.0.1.1 0 0 userspace_1 slow
                chk_join_nr 1 1 1
-               chk_rm_nr 0 1
+               chk_rm_nr 1 1
                kill_events_pids
        fi
 }
 
 endpoint_tests()
 {
+       # subflow_rebuild_header is needed to support the implicit flag
        # implicit endpoint handling by the in-kernel path-manager
-       if reset "implicit EP"; then
+       if reset "implicit EP" &&
+          mptcp_lib_kallsyms_has "subflow_rebuild_header$"; then
                pm_nl_set_limits $ns1 2 2
                pm_nl_set_limits $ns2 2 2
                pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
@@ -3157,7 +3321,8 @@ endpoint_tests()
                kill_tests_wait
        fi
 
-       if reset "delete and re-add"; then
+       if reset "delete and re-add" &&
+          mptcp_lib_kallsyms_has "subflow_rebuild_header$"; then
                pm_nl_set_limits $ns1 1 1
                pm_nl_set_limits $ns2 1 1
                pm_nl_add_endpoint $ns2 10.0.2.2 id 2 dev ns2eth2 flags subflow
diff --git a/tools/testing/selftests/net/mptcp/mptcp_lib.sh b/tools/testing/selftests/net/mptcp/mptcp_lib.sh
new file mode 100644 (file)
index 0000000..f32045b
--- /dev/null
@@ -0,0 +1,104 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+readonly KSFT_FAIL=1
+readonly KSFT_SKIP=4
+
+# SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var can be set when validating all
+# features using the latest version of the kernel and the selftests to make sure
+# a test is not being skipped by mistake.
+mptcp_lib_expect_all_features() {
+       [ "${SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES:-}" = "1" ]
+}
+
+# $1: msg
+mptcp_lib_fail_if_expected_feature() {
+       if mptcp_lib_expect_all_features; then
+               echo "ERROR: missing feature: ${*}"
+               exit ${KSFT_FAIL}
+       fi
+
+       return 1
+}
+
+# $1: file
+mptcp_lib_has_file() {
+       local f="${1}"
+
+       if [ -f "${f}" ]; then
+               return 0
+       fi
+
+       mptcp_lib_fail_if_expected_feature "${f} file not found"
+}
+
+mptcp_lib_check_mptcp() {
+       if ! mptcp_lib_has_file "/proc/sys/net/mptcp/enabled"; then
+               echo "SKIP: MPTCP support is not available"
+               exit ${KSFT_SKIP}
+       fi
+}
+
+mptcp_lib_check_kallsyms() {
+       if ! mptcp_lib_has_file "/proc/kallsyms"; then
+               echo "SKIP: CONFIG_KALLSYMS is missing"
+               exit ${KSFT_SKIP}
+       fi
+}
+
+# Internal: use mptcp_lib_kallsyms_has() instead
+__mptcp_lib_kallsyms_has() {
+       local sym="${1}"
+
+       mptcp_lib_check_kallsyms
+
+       grep -q " ${sym}" /proc/kallsyms
+}
+
+# $1: part of a symbol to look at, add '$' at the end for full name
+mptcp_lib_kallsyms_has() {
+       local sym="${1}"
+
+       if __mptcp_lib_kallsyms_has "${sym}"; then
+               return 0
+       fi
+
+       mptcp_lib_fail_if_expected_feature "${sym} symbol not found"
+}
+
+# $1: part of a symbol to look at, add '$' at the end for full name
+mptcp_lib_kallsyms_doesnt_have() {
+       local sym="${1}"
+
+       if ! __mptcp_lib_kallsyms_has "${sym}"; then
+               return 0
+       fi
+
+       mptcp_lib_fail_if_expected_feature "${sym} symbol has been found"
+}
+
+# !!!AVOID USING THIS!!!
+# Features might not land in the expected version and features can be backported
+#
+# $1: kernel version, e.g. 6.3
+mptcp_lib_kversion_ge() {
+       local exp_maj="${1%.*}"
+       local exp_min="${1#*.}"
+       local v maj min
+
+       # If the kernel has backported features, set this env var to 1:
+       if [ "${SELFTESTS_MPTCP_LIB_NO_KVERSION_CHECK:-}" = "1" ]; then
+               return 0
+       fi
+
+       v=$(uname -r | cut -d'.' -f1,2)
+       maj=${v%.*}
+       min=${v#*.}
+
+       if   [ "${maj}" -gt "${exp_maj}" ] ||
+          { [ "${maj}" -eq "${exp_maj}" ] && [ "${min}" -ge "${exp_min}" ]; }; then
+               return 0
+       fi
+
+       mptcp_lib_fail_if_expected_feature "kernel version ${v} lower than ${1}"
+}
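
The helpers above are meant to be sourced by every MPTCP selftest so that a
missing kernel feature turns into a skip (or, on a fully featured kernel, into
an explicit failure) instead of a bogus test error. Below is a minimal sketch
of how a test script might consume them; the test body is illustrative and
only reuses symbols and paths already probed elsewhere in this series:

    #!/bin/bash
    # SPDX-License-Identifier: GPL-2.0
    # illustrative consumer of mptcp_lib.sh, not an actual selftest
    . "$(dirname "${0}")/mptcp_lib.sh"

    mptcp_lib_check_mptcp      # exits with KSFT_SKIP if MPTCP is not enabled
    mptcp_lib_check_kallsyms   # exits with KSFT_SKIP if /proc/kallsyms is missing

    if ! mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
            echo "SKIP: userspace PM not supported by this kernel"
            exit ${KSFT_SKIP}
    fi

    # the trailing '$' anchors the symbol name, avoiding prefix matches
    if ! mptcp_lib_kallsyms_has "mptcp_ioctl$"; then
            echo "INFO: TCP_INQ not supported: skipping that part"
    fi

    # last resort, see the warning above: gate on the kernel version
    mptcp_lib_kversion_ge 5.18 || exit ${KSFT_SKIP}
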
index ae61f39..b35148e 100644 (file)
@@ -87,6 +87,10 @@ struct so_state {
        uint64_t tcpi_rcv_delta;
 };
 
+#ifndef MIN
+#define MIN(a, b) ((a) < (b) ? (a) : (b))
+#endif
+
 static void die_perror(const char *msg)
 {
        perror(msg);
@@ -349,13 +353,14 @@ static void do_getsockopt_tcp_info(struct so_state *s, int fd, size_t r, size_t
                        xerror("getsockopt MPTCP_TCPINFO (tries %d, %m)");
 
                assert(olen <= sizeof(ti));
-               assert(ti.d.size_user == ti.d.size_kernel);
-               assert(ti.d.size_user == sizeof(struct tcp_info));
+               assert(ti.d.size_kernel > 0);
+               assert(ti.d.size_user ==
+                      MIN(ti.d.size_kernel, sizeof(struct tcp_info)));
                assert(ti.d.num_subflows == 1);
 
                assert(olen > (socklen_t)sizeof(struct mptcp_subflow_data));
                olen -= sizeof(struct mptcp_subflow_data);
-               assert(olen == sizeof(struct tcp_info));
+               assert(olen == ti.d.size_user);
 
                if (ti.ti[0].tcpi_bytes_sent == w &&
                    ti.ti[0].tcpi_bytes_received == r)
@@ -401,13 +406,14 @@ static void do_getsockopt_subflow_addrs(int fd)
                die_perror("getsockopt MPTCP_SUBFLOW_ADDRS");
 
        assert(olen <= sizeof(addrs));
-       assert(addrs.d.size_user == addrs.d.size_kernel);
-       assert(addrs.d.size_user == sizeof(struct mptcp_subflow_addrs));
+       assert(addrs.d.size_kernel > 0);
+       assert(addrs.d.size_user ==
+              MIN(addrs.d.size_kernel, sizeof(struct mptcp_subflow_addrs)));
        assert(addrs.d.num_subflows == 1);
 
        assert(olen > (socklen_t)sizeof(struct mptcp_subflow_data));
        olen -= sizeof(struct mptcp_subflow_data);
-       assert(olen == sizeof(struct mptcp_subflow_addrs));
+       assert(olen == addrs.d.size_user);
 
        llen = sizeof(local);
        ret = getsockname(fd, (struct sockaddr *)&local, &llen);
index 1b70c0a..f295a37 100755 (executable)
@@ -1,6 +1,8 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
+. "$(dirname "${0}")/mptcp_lib.sh"
+
 ret=0
 sin=""
 sout=""
@@ -84,6 +86,9 @@ cleanup()
        rm -f "$sin" "$sout"
 }
 
+mptcp_lib_check_mptcp
+mptcp_lib_check_kallsyms
+
 ip -Version > /dev/null 2>&1
 if [ $? -ne 0 ];then
        echo "SKIP: Could not run test without ip tool"
@@ -182,9 +187,14 @@ do_transfer()
                local_addr="0.0.0.0"
        fi
 
+       cmsg="TIMESTAMPNS"
+       if mptcp_lib_kallsyms_has "mptcp_ioctl$"; then
+               cmsg+=",TCPINQ"
+       fi
+
        timeout ${timeout_test} \
                ip netns exec ${listener_ns} \
-                       $mptcp_connect -t ${timeout_poll} -l -M 1 -p $port -s ${srv_proto} -c TIMESTAMPNS,TCPINQ \
+                       $mptcp_connect -t ${timeout_poll} -l -M 1 -p $port -s ${srv_proto} -c "${cmsg}" \
                                ${local_addr} < "$sin" > "$sout" &
        local spid=$!
 
@@ -192,7 +202,7 @@ do_transfer()
 
        timeout ${timeout_test} \
                ip netns exec ${connector_ns} \
-                       $mptcp_connect -t ${timeout_poll} -M 2 -p $port -s ${cl_proto} -c TIMESTAMPNS,TCPINQ \
+                       $mptcp_connect -t ${timeout_poll} -M 2 -p $port -s ${cl_proto} -c "${cmsg}" \
                                $connect_addr < "$cin" > "$cout" &
 
        local cpid=$!
@@ -249,6 +259,11 @@ do_mptcp_sockopt_tests()
 {
        local lret=0
 
+       if ! mptcp_lib_kallsyms_has "mptcp_diag_fill_info$"; then
+               echo "INFO: MPTCP sockopt not supported: SKIP"
+               return
+       fi
+
        ip netns exec "$ns_sbox" ./mptcp_sockopt
        lret=$?
 
@@ -303,6 +318,11 @@ do_tcpinq_tests()
 {
        local lret=0
 
+       if ! mptcp_lib_kallsyms_has "mptcp_ioctl$"; then
+               echo "INFO: TCP_INQ not supported: SKIP"
+               return
+       fi
+
        local args
        for args in "-t tcp" "-r tcp"; do
                do_tcpinq_test $args
index 89839d1..d02e0d6 100755 (executable)
@@ -1,6 +1,8 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
+. "$(dirname "${0}")/mptcp_lib.sh"
+
 ksft_skip=4
 ret=0
 
@@ -34,6 +36,8 @@ cleanup()
        ip netns del $ns1
 }
 
+mptcp_lib_check_mptcp
+
 ip -Version > /dev/null 2>&1
 if [ $? -ne 0 ];then
        echo "SKIP: Could not run test without ip tool"
@@ -69,8 +73,12 @@ check()
 }
 
 check "ip netns exec $ns1 ./pm_nl_ctl dump" "" "defaults addr list"
-check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0
+
+default_limits="$(ip netns exec $ns1 ./pm_nl_ctl limits)"
+if mptcp_lib_expect_all_features; then
+       check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0
 subflows 2" "defaults limits"
+fi
 
 ip netns exec $ns1 ./pm_nl_ctl add 10.0.1.1
 ip netns exec $ns1 ./pm_nl_ctl add 10.0.1.2 flags subflow dev lo
@@ -117,12 +125,10 @@ ip netns exec $ns1 ./pm_nl_ctl flush
 check "ip netns exec $ns1 ./pm_nl_ctl dump" "" "flush addrs"
 
 ip netns exec $ns1 ./pm_nl_ctl limits 9 1
-check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0
-subflows 2" "rcv addrs above hard limit"
+check "ip netns exec $ns1 ./pm_nl_ctl limits" "$default_limits" "rcv addrs above hard limit"
 
 ip netns exec $ns1 ./pm_nl_ctl limits 1 9
-check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0
-subflows 2" "subflows above hard limit"
+check "ip netns exec $ns1 ./pm_nl_ctl limits" "$default_limits" "subflows above hard limit"
 
 ip netns exec $ns1 ./pm_nl_ctl limits 8 8
 check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 8
@@ -172,14 +178,19 @@ subflow,backup 10.0.1.1" "set flags (backup)"
 ip netns exec $ns1 ./pm_nl_ctl set 10.0.1.1 flags nobackup
 check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags \
 subflow 10.0.1.1" "          (nobackup)"
+
+# fullmesh support has been added later
 ip netns exec $ns1 ./pm_nl_ctl set id 1 flags fullmesh
-check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags \
+if ip netns exec $ns1 ./pm_nl_ctl dump | grep -q "fullmesh" ||
+   mptcp_lib_expect_all_features; then
+       check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags \
 subflow,fullmesh 10.0.1.1" "          (fullmesh)"
-ip netns exec $ns1 ./pm_nl_ctl set id 1 flags nofullmesh
-check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags \
+       ip netns exec $ns1 ./pm_nl_ctl set id 1 flags nofullmesh
+       check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags \
 subflow 10.0.1.1" "          (nofullmesh)"
-ip netns exec $ns1 ./pm_nl_ctl set id 1 flags backup,fullmesh
-check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags \
+       ip netns exec $ns1 ./pm_nl_ctl set id 1 flags backup,fullmesh
+       check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags \
 subflow,backup,fullmesh 10.0.1.1" "          (backup,fullmesh)"
+fi
 
 exit $ret
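
With the probe above, a kernel without per-endpoint fullmesh support simply
skips those checks. When validating against a kernel that is expected to have
every feature, the library's environment switches turn silent skips into
errors. A possible invocation, assuming the scripts are run from their
selftests directory:

    # normal run: missing features are reported and skipped
    ./pm_netlink.sh

    # validation run: any feature probe that fails is treated as an error
    SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES=1 ./pm_netlink.sh

    # on kernels with backported MPTCP features, bypass the version gate
    # used by mptcp_lib_kversion_ge()
    SELFTESTS_MPTCP_LIB_NO_KVERSION_CHECK=1 ./mptcp_join.sh
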
index 9f22f7e..36a3c9d 100755 (executable)
@@ -1,6 +1,8 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
+. "$(dirname "${0}")/mptcp_lib.sh"
+
 sec=$(date +%s)
 rndh=$(printf %x $sec)-$(mktemp -u XXXXXX)
 ns1="ns1-$rndh"
@@ -34,6 +36,8 @@ cleanup()
        done
 }
 
+mptcp_lib_check_mptcp
+
 ip -Version > /dev/null 2>&1
 if [ $? -ne 0 ];then
        echo "SKIP: Could not run test without ip tool"
index b1eb7bc..98d9e4d 100755 (executable)
@@ -1,10 +1,20 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
+. "$(dirname "${0}")/mptcp_lib.sh"
+
+mptcp_lib_check_mptcp
+mptcp_lib_check_kallsyms
+
+if ! mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
+       echo "userspace pm tests are not supported by the kernel: SKIP"
+       exit ${KSFT_SKIP}
+fi
+
 ip -Version > /dev/null 2>&1
 if [ $? -ne 0 ];then
        echo "SKIP: Cannot not run test without ip tool"
-       exit 1
+       exit ${KSFT_SKIP}
 fi
 
 ANNOUNCED=6        # MPTCP_EVENT_ANNOUNCED
@@ -905,6 +915,11 @@ test_listener()
 {
        print_title "Listener tests"
 
+       if ! mptcp_lib_kallsyms_has "mptcp_event_pm_listener$"; then
+               stdbuf -o0 -e0 printf "LISTENER events                                            \t[SKIP] Not supported\n"
+               return
+       fi
+
        # Capture events on the network namespace running the client
        :>$client_evts
 
index 1003119..f962823 100755 (executable)
@@ -232,10 +232,14 @@ setup_rt_networking()
        local nsname=rt-${rt}
 
        ip netns add ${nsname}
+
+       ip netns exec ${nsname} sysctl -wq net.ipv6.conf.all.accept_dad=0
+       ip netns exec ${nsname} sysctl -wq net.ipv6.conf.default.accept_dad=0
+
        ip link set veth-rt-${rt} netns ${nsname}
        ip -netns ${nsname} link set veth-rt-${rt} name veth0
 
-       ip -netns ${nsname} addr add ${IPv6_RT_NETWORK}::${rt}/64 dev veth0
+       ip -netns ${nsname} addr add ${IPv6_RT_NETWORK}::${rt}/64 dev veth0 nodad
        ip -netns ${nsname} link set veth0 up
        ip -netns ${nsname} link set lo up
 
@@ -254,6 +258,12 @@ setup_hs()
 
        # set the networking for the host
        ip netns add ${hsname}
+
+       # disable the rp_filter otherwise the kernel gets confused about how
+       # to route decap ipv4 packets.
+       ip netns exec ${rtname} sysctl -wq net.ipv4.conf.all.rp_filter=0
+       ip netns exec ${rtname} sysctl -wq net.ipv4.conf.default.rp_filter=0
+
        ip -netns ${hsname} link add veth0 type veth peer name ${rtveth}
        ip -netns ${hsname} link set ${rtveth} netns ${rtname}
        ip -netns ${hsname} addr add ${IPv4_HS_NETWORK}.${hs}/24 dev veth0
@@ -272,11 +282,6 @@ setup_hs()
 
        ip netns exec ${rtname} sysctl -wq net.ipv4.conf.${rtveth}.proxy_arp=1
 
-       # disable the rp_filter otherwise the kernel gets confused about how
-       # to route decap ipv4 packets.
-       ip netns exec ${rtname} sysctl -wq net.ipv4.conf.all.rp_filter=0
-       ip netns exec ${rtname} sysctl -wq net.ipv4.conf.${rtveth}.rp_filter=0
-
        ip netns exec ${rtname} sh -c "echo 1 > /proc/sys/net/vrf/strict_mode"
 }
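
Disabling DAD (via accept_dad=0 and the nodad flag) matters here because a
freshly added IPv6 address stays "tentative" for about a second and cannot yet
be used as a source address, which races with the first packets the test
sends. A rough way to observe that window outside of the test, using a
throwaway namespace and a dummy device:

    # illustrative only, not part of the selftest
    ip netns add dad-demo
    ip -netns dad-demo link add dum0 type dummy
    ip -netns dad-demo link set dum0 up
    ip -netns dad-demo addr add 2001:db8::1/64 dev dum0
    ip -netns dad-demo -6 addr show dev dum0 | grep tentative   # present for ~1s
    ip -netns dad-demo addr add 2001:db8::2/64 dev dum0 nodad   # usable immediately
    ip netns del dad-demo
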
 
index e699548..ff36844 100644 (file)
@@ -25,6 +25,8 @@
 #define TLS_PAYLOAD_MAX_LEN 16384
 #define SOL_TLS 282
 
+static int fips_enabled;
+
 struct tls_crypto_info_keys {
        union {
                struct tls12_crypto_info_aes_gcm_128 aes128;
@@ -235,7 +237,7 @@ FIXTURE_VARIANT(tls)
 {
        uint16_t tls_version;
        uint16_t cipher_type;
-       bool nopad;
+       bool nopad, fips_non_compliant;
 };
 
 FIXTURE_VARIANT_ADD(tls, 12_aes_gcm)
@@ -254,24 +256,28 @@ FIXTURE_VARIANT_ADD(tls, 12_chacha)
 {
        .tls_version = TLS_1_2_VERSION,
        .cipher_type = TLS_CIPHER_CHACHA20_POLY1305,
+       .fips_non_compliant = true,
 };
 
 FIXTURE_VARIANT_ADD(tls, 13_chacha)
 {
        .tls_version = TLS_1_3_VERSION,
        .cipher_type = TLS_CIPHER_CHACHA20_POLY1305,
+       .fips_non_compliant = true,
 };
 
 FIXTURE_VARIANT_ADD(tls, 13_sm4_gcm)
 {
        .tls_version = TLS_1_3_VERSION,
        .cipher_type = TLS_CIPHER_SM4_GCM,
+       .fips_non_compliant = true,
 };
 
 FIXTURE_VARIANT_ADD(tls, 13_sm4_ccm)
 {
        .tls_version = TLS_1_3_VERSION,
        .cipher_type = TLS_CIPHER_SM4_CCM,
+       .fips_non_compliant = true,
 };
 
 FIXTURE_VARIANT_ADD(tls, 12_aes_ccm)
@@ -311,6 +317,9 @@ FIXTURE_SETUP(tls)
        int one = 1;
        int ret;
 
+       if (fips_enabled && variant->fips_non_compliant)
+               SKIP(return, "Unsupported cipher in FIPS mode");
+
        tls_crypto_info_init(variant->tls_version, variant->cipher_type,
                             &tls12);
 
@@ -1865,4 +1874,17 @@ TEST(prequeue) {
        close(cfd);
 }
 
+static void __attribute__((constructor)) fips_check(void) {
+       int res;
+       FILE *f;
+
+       f = fopen("/proc/sys/crypto/fips_enabled", "r");
+       if (f) {
+               res = fscanf(f, "%d", &fips_enabled);
+               if (res != 1)
+                       ksft_print_msg("ERROR: Couldn't read /proc/sys/crypto/fips_enabled\n");
+               fclose(f);
+       }
+}
+
 TEST_HARNESS_MAIN
index 184da81..452638a 100755 (executable)
@@ -264,60 +264,60 @@ setup_xfrm()
        ip -netns host1 xfrm state add src ${HOST1_4} dst ${HOST2_4} \
            proto esp spi ${SPI_1} reqid 0 mode tunnel \
            replay-window 4 replay-oseq 0x4 \
-           auth-trunc 'hmac(md5)' ${AUTH_1} 96 \
-           enc 'cbc(des3_ede)' ${ENC_1} \
+           auth-trunc 'hmac(sha1)' ${AUTH_1} 96 \
+           enc 'cbc(aes)' ${ENC_1} \
            sel src ${h1_4} dst ${h2_4} ${devarg}
 
        ip -netns host2 xfrm state add src ${HOST1_4} dst ${HOST2_4} \
            proto esp spi ${SPI_1} reqid 0 mode tunnel \
            replay-window 4 replay-oseq 0x4 \
-           auth-trunc 'hmac(md5)' ${AUTH_1} 96 \
-           enc 'cbc(des3_ede)' ${ENC_1} \
+           auth-trunc 'hmac(sha1)' ${AUTH_1} 96 \
+           enc 'cbc(aes)' ${ENC_1} \
            sel src ${h1_4} dst ${h2_4}
 
 
        ip -netns host1 xfrm state add src ${HOST2_4} dst ${HOST1_4} \
            proto esp spi ${SPI_2} reqid 0 mode tunnel \
            replay-window 4 replay-oseq 0x4 \
-           auth-trunc 'hmac(md5)' ${AUTH_2} 96 \
-           enc 'cbc(des3_ede)' ${ENC_2} \
+           auth-trunc 'hmac(sha1)' ${AUTH_2} 96 \
+           enc 'cbc(aes)' ${ENC_2} \
            sel src ${h2_4} dst ${h1_4} ${devarg}
 
        ip -netns host2 xfrm state add src ${HOST2_4} dst ${HOST1_4} \
            proto esp spi ${SPI_2} reqid 0 mode tunnel \
            replay-window 4 replay-oseq 0x4 \
-           auth-trunc 'hmac(md5)' ${AUTH_2} 96 \
-           enc 'cbc(des3_ede)' ${ENC_2} \
+           auth-trunc 'hmac(sha1)' ${AUTH_2} 96 \
+           enc 'cbc(aes)' ${ENC_2} \
            sel src ${h2_4} dst ${h1_4}
 
 
        ip -6 -netns host1 xfrm state add src ${HOST1_6} dst ${HOST2_6} \
            proto esp spi ${SPI_1} reqid 0 mode tunnel \
            replay-window 4 replay-oseq 0x4 \
-           auth-trunc 'hmac(md5)' ${AUTH_1} 96 \
-           enc 'cbc(des3_ede)' ${ENC_1} \
+           auth-trunc 'hmac(sha1)' ${AUTH_1} 96 \
+           enc 'cbc(aes)' ${ENC_1} \
            sel src ${h1_6} dst ${h2_6} ${devarg}
 
        ip -6 -netns host2 xfrm state add src ${HOST1_6} dst ${HOST2_6} \
            proto esp spi ${SPI_1} reqid 0 mode tunnel \
            replay-window 4 replay-oseq 0x4 \
-           auth-trunc 'hmac(md5)' ${AUTH_1} 96 \
-           enc 'cbc(des3_ede)' ${ENC_1} \
+           auth-trunc 'hmac(sha1)' ${AUTH_1} 96 \
+           enc 'cbc(aes)' ${ENC_1} \
            sel src ${h1_6} dst ${h2_6}
 
 
        ip -6 -netns host1 xfrm state add src ${HOST2_6} dst ${HOST1_6} \
            proto esp spi ${SPI_2} reqid 0 mode tunnel \
            replay-window 4 replay-oseq 0x4 \
-           auth-trunc 'hmac(md5)' ${AUTH_2} 96 \
-           enc 'cbc(des3_ede)' ${ENC_2} \
+           auth-trunc 'hmac(sha1)' ${AUTH_2} 96 \
+           enc 'cbc(aes)' ${ENC_2} \
            sel src ${h2_6} dst ${h1_6} ${devarg}
 
        ip -6 -netns host2 xfrm state add src ${HOST2_6} dst ${HOST1_6} \
            proto esp spi ${SPI_2} reqid 0 mode tunnel \
            replay-window 4 replay-oseq 0x4 \
-           auth-trunc 'hmac(md5)' ${AUTH_2} 96 \
-           enc 'cbc(des3_ede)' ${ENC_2} \
+           auth-trunc 'hmac(sha1)' ${AUTH_2} 96 \
+           enc 'cbc(aes)' ${ENC_2} \
            sel src ${h2_6} dst ${h1_6}
 }
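
The algorithm switch from hmac(md5)/cbc(des3_ede) to hmac(sha1)/cbc(aes) keeps
the test usable on FIPS-enabled kernels, where the crypto layer rejects the
former. A sketch of how a script could detect that condition, using the same
procfs knob the TLS selftest further below reads:

    # illustrative: detect FIPS mode before choosing ESP algorithms
    fips=0
    if [ -f /proc/sys/crypto/fips_enabled ]; then
            fips=$(cat /proc/sys/crypto/fips_enabled)
    fi
    if [ "$fips" = "1" ]; then
            echo "FIPS mode: md5/des3_ede unavailable, using hmac(sha1)/cbc(aes)"
    fi
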
 
index 7060bae..a32f490 100755 (executable)
@@ -188,6 +188,26 @@ if [ $? -ne 0 ]; then
        exit $ksft_skip
 fi
 
+ip netns exec $ns2 nft -f - <<EOF
+table inet filter {
+   counter ip4dscp0 { }
+   counter ip4dscp3 { }
+
+   chain input {
+      type filter hook input priority 0; policy accept;
+      meta l4proto tcp goto {
+             ip dscp cs3 counter name ip4dscp3 accept
+             ip dscp 0 counter name ip4dscp0 accept
+      }
+   }
+}
+EOF
+
+if [ $? -ne 0 ]; then
+       echo "SKIP: Could not load nft ruleset"
+       exit $ksft_skip
+fi
+
 # test basic connectivity
 if ! ip netns exec $ns1 ping -c 1 -q 10.0.2.99 > /dev/null; then
   echo "ERROR: $ns1 cannot reach ns2" 1>&2
@@ -255,6 +275,60 @@ check_counters()
        fi
 }
 
+check_dscp()
+{
+       local what=$1
+       local ok=1
+
+       local counter=$(ip netns exec $ns2 nft reset counter inet filter ip4dscp3 | grep packets)
+
+       local pc4=${counter%*bytes*}
+       local pc4=${pc4#*packets}
+
+       local counter=$(ip netns exec $ns2 nft reset counter inet filter ip4dscp0 | grep packets)
+       local pc4z=${counter%*bytes*}
+       local pc4z=${pc4z#*packets}
+
+       case "$what" in
+       "dscp_none")
+               if [ $pc4 -gt 0 ] || [ $pc4z -eq 0 ]; then
+                       echo "FAIL: dscp counters do not match, expected dscp3 == 0, dscp0 > 0, but got $pc4,$pc4z" 1>&2
+                       ret=1
+                       ok=0
+               fi
+               ;;
+       "dscp_fwd")
+               if [ $pc4 -eq 0 ] || [ $pc4z -eq 0 ]; then
+                       echo "FAIL: dscp counters do not match, expected dscp3 and dscp0 > 0 but got $pc4,$pc4z" 1>&2
+                       ret=1
+                       ok=0
+               fi
+               ;;
+       "dscp_ingress")
+               if [ $pc4 -eq 0 ] || [ $pc4z -gt 0 ]; then
+                       echo "FAIL: dscp counters do not match, expected dscp3 > 0, dscp0 == 0 but got $pc4,$pc4z" 1>&2
+                       ret=1
+                       ok=0
+               fi
+               ;;
+       "dscp_egress")
+               if [ $pc4 -eq 0 ] || [ $pc4z -gt 0 ]; then
+                       echo "FAIL: dscp counters do not match, expected dscp3 > 0, dscp0 == 0 but got $pc4,$pc4z" 1>&2
+                       ret=1
+                       ok=0
+               fi
+               ;;
+       *)
+               echo "FAIL: Unknown DSCP check" 1>&2
+               ret=1
+               ok=0
+       esac
+
+       if [ $ok -eq 1 ] ;then
+               echo "PASS: $what: dscp packet counters match"
+       fi
+}
+
 check_transfer()
 {
        in=$1
@@ -286,17 +360,26 @@ test_tcp_forwarding_ip()
        ip netns exec $nsa nc -w 4 "$dstip" "$dstport" < "$nsin" > "$ns1out" &
        cpid=$!
 
-       sleep 3
+       sleep 1
 
-       if ps -p $lpid > /dev/null;then
+       prev="$(ls -l $ns1out $ns2out)"
+       sleep 1
+
+       while [[ "$prev" != "$(ls -l $ns1out $ns2out)" ]]; do
+               sleep 1;
+               prev="$(ls -l $ns1out $ns2out)"
+       done
+
+       if test -d /proc/"$lpid"/; then
                kill $lpid
        fi
 
-       if ps -p $cpid > /dev/null;then
+       if test -d /proc/"$cpid"/; then
                kill $cpid
        fi
 
-       wait
+       wait $lpid
+       wait $cpid
 
        if ! check_transfer "$nsin" "$ns2out" "ns1 -> ns2"; then
                lret=1
@@ -316,6 +399,51 @@ test_tcp_forwarding()
        return $?
 }
 
+test_tcp_forwarding_set_dscp()
+{
+       check_dscp "dscp_none"
+
+ip netns exec $nsr1 nft -f - <<EOF
+table netdev dscpmangle {
+   chain setdscp0 {
+      type filter hook ingress device "veth0" priority 0; policy accept
+       ip dscp set cs3
+  }
+}
+EOF
+if [ $? -eq 0 ]; then
+       test_tcp_forwarding_ip "$1" "$2"  10.0.2.99 12345
+       check_dscp "dscp_ingress"
+
+       ip netns exec $nsr1 nft delete table netdev dscpmangle
+else
+       echo "SKIP: Could not load netdev:ingress for veth0"
+fi
+
+ip netns exec $nsr1 nft -f - <<EOF
+table netdev dscpmangle {
+   chain setdscp0 {
+      type filter hook egress device "veth1" priority 0; policy accept
+      ip dscp set cs3
+  }
+}
+EOF
+if [ $? -eq 0 ]; then
+       test_tcp_forwarding_ip "$1" "$2"  10.0.2.99 12345
+       check_dscp "dscp_egress"
+
+       ip netns exec $nsr1 nft flush table netdev dscpmangle
+else
+       echo "SKIP: Could not load netdev:egress for veth1"
+fi
+
+       # partial.  If flowtable really works, then both dscp-is-0 and dscp-is-cs3
+       # counters should have seen packets (before and after ft offload kicks in).
+       ip netns exec $nsr1 nft -a insert rule inet filter forward ip dscp set cs3
+       test_tcp_forwarding_ip "$1" "$2"  10.0.2.99 12345
+       check_dscp "dscp_fwd"
+}
+
 test_tcp_forwarding_nat()
 {
        local lret
@@ -385,6 +513,11 @@ table ip nat {
 }
 EOF
 
+if ! test_tcp_forwarding_set_dscp $ns1 $ns2 0 ""; then
+       echo "FAIL: flow offload for ns1/ns2 with dscp update" 1>&2
+       exit 0
+fi
+
 if ! test_tcp_forwarding_nat $ns1 $ns2 0 ""; then
        echo "FAIL: flow offload for ns1/ns2 with NAT" 1>&2
        ip netns exec $nsr1 nft list ruleset
@@ -489,8 +622,8 @@ ip -net $nsr1 addr add 10.0.1.1/24 dev veth0
 ip -net $nsr1 addr add dead:1::1/64 dev veth0
 ip -net $nsr1 link set up dev veth0
 
-KEY_SHA="0x"$(ps -xaf | sha1sum | cut -d " " -f 1)
-KEY_AES="0x"$(ps -xaf | md5sum | cut -d " " -f 1)
+KEY_SHA="0x"$(ps -af | sha1sum | cut -d " " -f 1)
+KEY_AES="0x"$(ps -af | md5sum | cut -d " " -f 1)
 SPI1=$RANDOM
 SPI2=$RANDOM
 
index bbce574..1b7b3c8 100644 (file)
@@ -64,7 +64,7 @@ QEMU_ARGS_mips       = -M malta -append "panic=-1 $(TEST:%=NOLIBC_TEST=%)"
 QEMU_ARGS_riscv      = -M virt -append "console=ttyS0 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
 QEMU_ARGS_s390       = -M s390-ccw-virtio -m 1G -append "console=ttyS0 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
 QEMU_ARGS_loongarch  = -M virt -append "console=ttyS0,115200 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
-QEMU_ARGS            = $(QEMU_ARGS_$(ARCH))
+QEMU_ARGS            = $(QEMU_ARGS_$(ARCH)) $(QEMU_ARGS_EXTRA)
 
 # OUTPUT is only set when run from the main makefile, otherwise
 # it defaults to this nolibc directory.
@@ -76,16 +76,12 @@ else
 Q=@
 endif
 
-CFLAGS_STACKPROTECTOR = -DNOLIBC_STACKPROTECTOR \
-                       $(call cc-option,-mstack-protector-guard=global) \
-                       $(call cc-option,-fstack-protector-all)
-CFLAGS_STKP_i386 = $(CFLAGS_STACKPROTECTOR)
-CFLAGS_STKP_x86_64 = $(CFLAGS_STACKPROTECTOR)
-CFLAGS_STKP_x86 = $(CFLAGS_STACKPROTECTOR)
 CFLAGS_s390 = -m64
-CFLAGS  ?= -Os -fno-ident -fno-asynchronous-unwind-tables \
+CFLAGS_mips = -EL
+CFLAGS_STACKPROTECTOR ?= $(call cc-option,-mstack-protector-guard=global $(call cc-option,-fstack-protector-all))
+CFLAGS  ?= -Os -fno-ident -fno-asynchronous-unwind-tables -std=c89 \
                $(call cc-option,-fno-stack-protector) \
-               $(CFLAGS_STKP_$(ARCH)) $(CFLAGS_$(ARCH))
+               $(CFLAGS_$(ARCH)) $(CFLAGS_STACKPROTECTOR)
 LDFLAGS := -s
 
 help:
@@ -94,6 +90,7 @@ help:
        @echo "  help         this help"
        @echo "  sysroot      create the nolibc sysroot here (uses \$$ARCH)"
        @echo "  nolibc-test  build the executable (uses \$$CC and \$$CROSS_COMPILE)"
+       @echo "  libc-test    build an executable using the compiler's default libc instead"
        @echo "  run-user     runs the executable under QEMU (uses \$$ARCH, \$$TEST)"
        @echo "  initramfs    prepare the initramfs with nolibc-test"
        @echo "  defconfig    create a fresh new default config (uses \$$ARCH)"
@@ -128,10 +125,16 @@ nolibc-test: nolibc-test.c sysroot/$(ARCH)/include
        $(QUIET_CC)$(CC) $(CFLAGS) $(LDFLAGS) -o $@ \
          -nostdlib -static -Isysroot/$(ARCH)/include $< -lgcc
 
+libc-test: nolibc-test.c
+       $(QUIET_CC)$(CC) -o $@ $<
+
 # qemu user-land test
 run-user: nolibc-test
        $(Q)qemu-$(QEMU_ARCH) ./nolibc-test > "$(CURDIR)/run.out" || :
-       $(Q)grep -w FAIL "$(CURDIR)/run.out" && echo "See all results in $(CURDIR)/run.out" || echo "$$(grep -c ^[0-9].*OK $(CURDIR)/run.out) test(s) passed."
+       $(Q)awk '/\[OK\][\r]*$$/{p++} /\[FAIL\][\r]*$$/{f++} /\[SKIPPED\][\r]*$$/{s++} \
+                END{ printf("%d test(s) passed, %d skipped, %d failed.", p, s, f); \
+                if (s+f > 0) printf(" See all results in %s\n", ARGV[1]); else print; }' \
+                $(CURDIR)/run.out
 
 initramfs: nolibc-test
        $(QUIET_MKDIR)mkdir -p initramfs
@@ -147,18 +150,26 @@ kernel: initramfs
 # run the tests after building the kernel
 run: kernel
        $(Q)qemu-system-$(QEMU_ARCH) -display none -no-reboot -kernel "$(srctree)/$(IMAGE)" -serial stdio $(QEMU_ARGS) > "$(CURDIR)/run.out"
-       $(Q)grep -w FAIL "$(CURDIR)/run.out" && echo "See all results in $(CURDIR)/run.out" || echo "$$(grep -c ^[0-9].*OK $(CURDIR)/run.out) test(s) passed."
+       $(Q)awk '/\[OK\][\r]*$$/{p++} /\[FAIL\][\r]*$$/{f++} /\[SKIPPED\][\r]*$$/{s++} \
+                END{ printf("%d test(s) passed, %d skipped, %d failed.", p, s, f); \
+                if (s+f > 0) printf(" See all results in %s\n", ARGV[1]); else print; }' \
+                $(CURDIR)/run.out
 
 # re-run the tests from an existing kernel
 rerun:
        $(Q)qemu-system-$(QEMU_ARCH) -display none -no-reboot -kernel "$(srctree)/$(IMAGE)" -serial stdio $(QEMU_ARGS) > "$(CURDIR)/run.out"
-       $(Q)grep -w FAIL "$(CURDIR)/run.out" && echo "See all results in $(CURDIR)/run.out" || echo "$$(grep -c ^[0-9].*OK $(CURDIR)/run.out) test(s) passed."
+       $(Q)awk '/\[OK\][\r]*$$/{p++} /\[FAIL\][\r]*$$/{f++} /\[SKIPPED\][\r]*$$/{s++} \
+                END{ printf("%d test(s) passed, %d skipped, %d failed.", p, s, f); \
+                if (s+f > 0) printf(" See all results in %s\n", ARGV[1]); else print; }' \
+                $(CURDIR)/run.out
 
 clean:
        $(call QUIET_CLEAN, sysroot)
        $(Q)rm -rf sysroot
        $(call QUIET_CLEAN, nolibc-test)
        $(Q)rm -f nolibc-test
+       $(call QUIET_CLEAN, libc-test)
+       $(Q)rm -f libc-test
        $(call QUIET_CLEAN, initramfs)
        $(Q)rm -rf initramfs
        $(call QUIET_CLEAN, run.out)
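
The awk one-liner used by the run targets counts [OK], [FAIL] and [SKIPPED]
markers instead of only grepping for FAIL, so skipped tests are no longer
folded into the pass count. As a rough worked example, given a run.out with
made-up result lines such as:

    14 chdir_root = 0                                               [OK]
    15 chdir_dot = 0                                                [OK]
    16 chroot_exe = -1 EPERM                                   [SKIPPED]
    17 link_dir = -1 EPERM                                        [FAIL]

the summary printed at the end of "make run-user" would be:

    2 test(s) passed, 1 skipped, 1 failed. See all results in <CURDIR>/run.out
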
index 21bacc9..4863349 100644 (file)
@@ -1,10 +1,7 @@
-// SPDX-License-Identifier: GPL-2.0
+/* SPDX-License-Identifier: GPL-2.0 */
 
 #define _GNU_SOURCE
 
-/* platform-specific include files coming from the compiler */
-#include <limits.h>
-
 /* libc-specific include files
  * The program may be built in 3 ways:
  *   $(CC) -nostdlib -include /path/to/nolibc.h => NOLIBC already defined
@@ -20,7 +17,9 @@
 #include <linux/reboot.h>
 #include <sys/io.h>
 #include <sys/ioctl.h>
+#include <sys/mman.h>
 #include <sys/mount.h>
+#include <sys/prctl.h>
 #include <sys/reboot.h>
 #include <sys/stat.h>
 #include <sys/syscall.h>
 #include <sched.h>
 #include <signal.h>
 #include <stdarg.h>
+#include <stddef.h>
+#include <stdint.h>
 #include <unistd.h>
+#include <limits.h>
 #endif
 #endif
 
@@ -43,8 +45,8 @@ char **environ;
 
 /* definition of a series of tests */
 struct test {
-       const char *name;              // test name
-       int (*func)(int min, int max); // handler
+       const char *name;              /* test name */
+       int (*func)(int min, int max); /* handler */
 };
 
 #ifndef _NOLIBC_STDLIB_H
@@ -103,24 +105,32 @@ const char *errorname(int err)
        CASE_ERR(EDOM);
        CASE_ERR(ERANGE);
        CASE_ERR(ENOSYS);
+       CASE_ERR(EOVERFLOW);
        default:
                return itoa(err);
        }
 }
 
+static void putcharn(char c, size_t n)
+{
+       char buf[64];
+
+       memset(buf, c, n);
+       buf[n] = '\0';
+       fputs(buf, stdout);
+}
+
 static int pad_spc(int llen, int cnt, const char *fmt, ...)
 {
        va_list args;
-       int len;
        int ret;
 
-       for (len = 0; len < cnt - llen; len++)
-               putchar(' ');
+       putcharn(' ', cnt - llen);
 
        va_start(args, fmt);
        ret = vfprintf(stdout, fmt, args);
        va_end(args);
-       return ret < 0 ? ret : ret + len;
+       return ret < 0 ? ret : ret + cnt - llen;
 }
 
 /* The tests below are intended to be used by the macroes, which evaluate
@@ -162,7 +172,7 @@ static int expect_eq(uint64_t expr, int llen, uint64_t val)
 {
        int ret = !(expr == val);
 
-       llen += printf(" = %lld ", expr);
+       llen += printf(" = %lld ", (long long)expr);
        pad_spc(llen, 64, ret ? "[FAIL]\n" : " [OK]\n");
        return ret;
 }
@@ -290,18 +300,24 @@ static int expect_sysne(int expr, int llen, int val)
 }
 
 
+#define EXPECT_SYSER2(cond, expr, expret, experr1, experr2)            \
+       do { if (!cond) pad_spc(llen, 64, "[SKIPPED]\n"); else ret += expect_syserr2(expr, expret, experr1, experr2, llen); } while (0)
+
 #define EXPECT_SYSER(cond, expr, expret, experr)                       \
-       do { if (!cond) pad_spc(llen, 64, "[SKIPPED]\n"); else ret += expect_syserr(expr, expret, experr, llen); } while (0)
+       EXPECT_SYSER2(cond, expr, expret, experr, 0)
 
-static int expect_syserr(int expr, int expret, int experr, int llen)
+static int expect_syserr2(int expr, int expret, int experr1, int experr2, int llen)
 {
        int ret = 0;
        int _errno = errno;
 
        llen += printf(" = %d %s ", expr, errorname(_errno));
-       if (expr != expret || _errno != experr) {
+       if (expr != expret || (_errno != experr1 && _errno != experr2)) {
                ret = 1;
-               llen += printf(" != (%d %s) ", expret, errorname(experr));
+               if (experr2 == 0)
+                       llen += printf(" != (%d %s) ", expret, errorname(experr1));
+               else
+                       llen += printf(" != (%d %s %s) ", expret, errorname(experr1), errorname(experr2));
                llen += pad_spc(llen, 64, "[FAIL]\n");
        } else {
                llen += pad_spc(llen, 64, " [OK]\n");
@@ -471,11 +487,60 @@ static int test_getpagesize(void)
        return !c;
 }
 
+static int test_fork(void)
+{
+       int status;
+       pid_t pid;
+
+       /* flush the printf buffer to avoid child flush it */
+       fflush(stdout);
+       fflush(stderr);
+
+       pid = fork();
+
+       switch (pid) {
+       case -1:
+               return 1;
+
+       case 0:
+               exit(123);
+
+       default:
+               pid = waitpid(pid, &status, 0);
+
+               return pid == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 123;
+       }
+}
+
+static int test_stat_timestamps(void)
+{
+       struct stat st;
+
+       if (sizeof(st.st_atim.tv_sec) != sizeof(st.st_atime))
+               return 1;
+
+       if (stat("/proc/self/", &st))
+               return 1;
+
+       if (st.st_atim.tv_sec != st.st_atime || st.st_atim.tv_nsec > 1000000000)
+               return 1;
+
+       if (st.st_mtim.tv_sec != st.st_mtime || st.st_mtim.tv_nsec > 1000000000)
+               return 1;
+
+       if (st.st_ctim.tv_sec != st.st_ctime || st.st_ctim.tv_nsec > 1000000000)
+               return 1;
+
+       return 0;
+}
+
 /* Run syscall tests between IDs <min> and <max>.
  * Return 0 on success, non-zero on failure.
  */
 int run_syscall(int min, int max)
 {
+       struct timeval tv;
+       struct timezone tz;
        struct stat stat_buf;
        int euid0;
        int proc;
@@ -491,7 +556,7 @@ int run_syscall(int min, int max)
        euid0 = geteuid() == 0;
 
        for (test = min; test >= 0 && test <= max; test++) {
-               int llen = 0; // line length
+               int llen = 0; /* line length */
 
                /* avoid leaving empty lines below, this will insert holes into
                 * test numbers.
@@ -527,14 +592,11 @@ int run_syscall(int min, int max)
                CASE_TEST(dup3_0);            tmp = dup3(0, 100, 0);  EXPECT_SYSNE(1, tmp, -1); close(tmp); break;
                CASE_TEST(dup3_m1);           tmp = dup3(-1, 100, 0); EXPECT_SYSER(1, tmp, -1, EBADF); if (tmp != -1) close(tmp); break;
                CASE_TEST(execve_root);       EXPECT_SYSER(1, execve("/", (char*[]){ [0] = "/", [1] = NULL }, NULL), -1, EACCES); break;
+               CASE_TEST(fork);              EXPECT_SYSZR(1, test_fork()); break;
                CASE_TEST(getdents64_root);   EXPECT_SYSNE(1, test_getdents64("/"), -1); break;
                CASE_TEST(getdents64_null);   EXPECT_SYSER(1, test_getdents64("/dev/null"), -1, ENOTDIR); break;
-               CASE_TEST(gettimeofday_null); EXPECT_SYSZR(1, gettimeofday(NULL, NULL)); break;
-#ifdef NOLIBC
-               CASE_TEST(gettimeofday_bad1); EXPECT_SYSER(1, gettimeofday((void *)1, NULL), -1, EFAULT); break;
-               CASE_TEST(gettimeofday_bad2); EXPECT_SYSER(1, gettimeofday(NULL, (void *)1), -1, EFAULT); break;
-               CASE_TEST(gettimeofday_bad2); EXPECT_SYSER(1, gettimeofday(NULL, (void *)1), -1, EFAULT); break;
-#endif
+               CASE_TEST(gettimeofday_tv);   EXPECT_SYSZR(1, gettimeofday(&tv, NULL)); break;
+               CASE_TEST(gettimeofday_tv_tz);EXPECT_SYSZR(1, gettimeofday(&tv, &tz)); break;
                CASE_TEST(getpagesize);       EXPECT_SYSZR(1, test_getpagesize()); break;
                CASE_TEST(ioctl_tiocinq);     EXPECT_SYSZR(1, ioctl(0, TIOCINQ, &tmp)); break;
                CASE_TEST(ioctl_tiocinq);     EXPECT_SYSZR(1, ioctl(0, TIOCINQ, &tmp)); break;
@@ -550,6 +612,7 @@ int run_syscall(int min, int max)
                CASE_TEST(poll_null);         EXPECT_SYSZR(1, poll(NULL, 0, 0)); break;
                CASE_TEST(poll_stdout);       EXPECT_SYSNE(1, ({ struct pollfd fds = { 1, POLLOUT, 0}; poll(&fds, 1, 0); }), -1); break;
                CASE_TEST(poll_fault);        EXPECT_SYSER(1, poll((void *)1, 1, 0), -1, EFAULT); break;
+               CASE_TEST(prctl);             EXPECT_SYSER(1, prctl(PR_SET_NAME, (unsigned long)NULL, 0, 0, 0), -1, EFAULT); break;
                CASE_TEST(read_badf);         EXPECT_SYSER(1, read(-1, &tmp, 1), -1, EBADF); break;
                CASE_TEST(sched_yield);       EXPECT_SYSZR(1, sched_yield()); break;
                CASE_TEST(select_null);       EXPECT_SYSZR(1, ({ struct timeval tv = { 0 }; select(0, NULL, NULL, NULL, &tv); })); break;
@@ -557,6 +620,7 @@ int run_syscall(int min, int max)
                CASE_TEST(select_fault);      EXPECT_SYSER(1, select(1, (void *)1, NULL, NULL, 0), -1, EFAULT); break;
                CASE_TEST(stat_blah);         EXPECT_SYSER(1, stat("/proc/self/blah", &stat_buf), -1, ENOENT); break;
                CASE_TEST(stat_fault);        EXPECT_SYSER(1, stat(NULL, &stat_buf), -1, EFAULT); break;
+               CASE_TEST(stat_timestamps);   EXPECT_SYSZR(1, test_stat_timestamps()); break;
                CASE_TEST(symlink_root);      EXPECT_SYSER(1, symlink("/", "/"), -1, EEXIST); break;
                CASE_TEST(unlink_root);       EXPECT_SYSER(1, unlink("/"), -1, EISDIR); break;
                CASE_TEST(unlink_blah);       EXPECT_SYSER(1, unlink("/proc/self/blah"), -1, ENOENT); break;
@@ -565,6 +629,8 @@ int run_syscall(int min, int max)
                CASE_TEST(waitpid_child);     EXPECT_SYSER(1, waitpid(getpid(), &tmp, WNOHANG), -1, ECHILD); break;
                CASE_TEST(write_badf);        EXPECT_SYSER(1, write(-1, &tmp, 1), -1, EBADF); break;
                CASE_TEST(write_zero);        EXPECT_SYSZR(1, write(1, &tmp, 0)); break;
+               CASE_TEST(syscall_noargs);    EXPECT_SYSEQ(1, syscall(__NR_getpid), getpid()); break;
+               CASE_TEST(syscall_args);      EXPECT_SYSER(1, syscall(__NR_statx, 0, NULL, 0, 0, NULL), -1, EFAULT); break;
                case __LINE__:
                        return ret; /* must be last */
                /* note: do not set any defaults so as to permit holes above */
@@ -581,7 +647,7 @@ int run_stdlib(int min, int max)
        void *p1, *p2;
 
        for (test = min; test >= 0 && test <= max; test++) {
-               int llen = 0; // line length
+               int llen = 0; /* line length */
 
                /* avoid leaving empty lines below, this will insert holes into
                 * test numbers.
@@ -639,9 +705,9 @@ int run_stdlib(int min, int max)
                CASE_TEST(limit_int_fast32_min);    EXPECT_EQ(1, INT_FAST32_MIN,   (int_fast32_t)    INTPTR_MIN); break;
                CASE_TEST(limit_int_fast32_max);    EXPECT_EQ(1, INT_FAST32_MAX,   (int_fast32_t)    INTPTR_MAX); break;
                CASE_TEST(limit_uint_fast32_max);   EXPECT_EQ(1, UINT_FAST32_MAX,  (uint_fast32_t)   UINTPTR_MAX); break;
-               CASE_TEST(limit_int_fast64_min);    EXPECT_EQ(1, INT_FAST64_MIN,   (int_fast64_t)    INTPTR_MIN); break;
-               CASE_TEST(limit_int_fast64_max);    EXPECT_EQ(1, INT_FAST64_MAX,   (int_fast64_t)    INTPTR_MAX); break;
-               CASE_TEST(limit_uint_fast64_max);   EXPECT_EQ(1, UINT_FAST64_MAX,  (uint_fast64_t)   UINTPTR_MAX); break;
+               CASE_TEST(limit_int_fast64_min);    EXPECT_EQ(1, INT_FAST64_MIN,   (int_fast64_t)    INT64_MIN); break;
+               CASE_TEST(limit_int_fast64_max);    EXPECT_EQ(1, INT_FAST64_MAX,   (int_fast64_t)    INT64_MAX); break;
+               CASE_TEST(limit_uint_fast64_max);   EXPECT_EQ(1, UINT_FAST64_MAX,  (uint_fast64_t)   UINT64_MAX); break;
 #if __SIZEOF_LONG__ == 8
                CASE_TEST(limit_intptr_min);        EXPECT_EQ(1, INTPTR_MIN,       (intptr_t)        0x8000000000000000LL); break;
                CASE_TEST(limit_intptr_max);        EXPECT_EQ(1, INTPTR_MAX,       (intptr_t)        0x7fffffffffffffffLL); break;
@@ -667,17 +733,98 @@ int run_stdlib(int min, int max)
        return ret;
 }
 
-#if defined(__clang__)
-__attribute__((optnone))
-#elif defined(__GNUC__)
-__attribute__((optimize("O0")))
-#endif
+#define EXPECT_VFPRINTF(c, expected, fmt, ...)                         \
+       ret += expect_vfprintf(llen, c, expected, fmt, ##__VA_ARGS__)
+
+static int expect_vfprintf(int llen, size_t c, const char *expected, const char *fmt, ...)
+{
+       int ret, fd, w, r;
+       char buf[100];
+       FILE *memfile;
+       va_list args;
+
+       fd = memfd_create("vfprintf", 0);
+       if (fd == -1) {
+               pad_spc(llen, 64, "[FAIL]\n");
+               return 1;
+       }
+
+       memfile = fdopen(fd, "w+");
+       if (!memfile) {
+               pad_spc(llen, 64, "[FAIL]\n");
+               return 1;
+       }
+
+       va_start(args, fmt);
+       w = vfprintf(memfile, fmt, args);
+       va_end(args);
+
+       if (w != c) {
+               llen += printf(" written(%d) != %d", w, (int) c);
+               pad_spc(llen, 64, "[FAIL]\n");
+               return 1;
+       }
+
+       fflush(memfile);
+       lseek(fd, 0, SEEK_SET);
+
+       r = read(fd, buf, sizeof(buf) - 1);
+       buf[r] = '\0';
+
+       fclose(memfile);
+
+       if (r != w) {
+               llen += printf(" written(%d) != read(%d)", w, r);
+               pad_spc(llen, 64, "[FAIL]\n");
+               return 1;
+       }
+
+       llen += printf(" \"%s\" = \"%s\"", expected, buf);
+       ret = strncmp(expected, buf, c);
+
+       pad_spc(llen, 64, ret ? "[FAIL]\n" : " [OK]\n");
+       return ret;
+}
+
+static int run_vfprintf(int min, int max)
+{
+       int test;
+       int tmp;
+       int ret = 0;
+       void *p1, *p2;
+
+       for (test = min; test >= 0 && test <= max; test++) {
+               int llen = 0; /* line length */
+
+               /* avoid leaving empty lines below, this will insert holes into
+                * test numbers.
+                */
+               switch (test + __LINE__ + 1) {
+               CASE_TEST(empty);        EXPECT_VFPRINTF(0, "", ""); break;
+               CASE_TEST(simple);       EXPECT_VFPRINTF(3, "foo", "foo"); break;
+               CASE_TEST(string);       EXPECT_VFPRINTF(3, "foo", "%s", "foo"); break;
+               CASE_TEST(number);       EXPECT_VFPRINTF(4, "1234", "%d", 1234); break;
+               CASE_TEST(negnumber);    EXPECT_VFPRINTF(5, "-1234", "%d", -1234); break;
+               CASE_TEST(unsigned);     EXPECT_VFPRINTF(5, "12345", "%u", 12345); break;
+               CASE_TEST(char);         EXPECT_VFPRINTF(1, "c", "%c", 'c'); break;
+               CASE_TEST(hex);          EXPECT_VFPRINTF(1, "f", "%x", 0xf); break;
+               CASE_TEST(pointer);      EXPECT_VFPRINTF(3, "0x1", "%p", (void *) 0x1); break;
+               case __LINE__:
+                       return ret; /* must be last */
+               /* note: do not set any defaults so as to permit holes above */
+               }
+       }
+       return ret;
+}
+
 static int smash_stack(void)
 {
        char buf[100];
+       volatile char *ptr = buf;
+       size_t i;
 
-       for (size_t i = 0; i < 200; i++)
-               buf[i] = 'P';
+       for (i = 0; i < 200; i++)
+               ptr[i] = 'P';
 
        return 1;
 }
@@ -689,12 +836,20 @@ static int run_protection(int min, int max)
 
        llen += printf("0 -fstackprotector ");
 
-#if !defined(NOLIBC_STACKPROTECTOR)
+#if !defined(_NOLIBC_STACKPROTECTOR)
        llen += printf("not supported");
        pad_spc(llen, 64, "[SKIPPED]\n");
        return 0;
 #endif
 
+#if defined(_NOLIBC_STACKPROTECTOR)
+       if (!__stack_chk_guard) {
+               llen += printf("__stack_chk_guard not initialized");
+               pad_spc(llen, 64, "[FAIL]\n");
+               return 1;
+       }
+#endif
+
        pid = -1;
        pid = fork();
 
@@ -708,6 +863,7 @@ static int run_protection(int min, int max)
                close(STDOUT_FILENO);
                close(STDERR_FILENO);
 
+               prctl(PR_SET_DUMPABLE, 0, 0, 0, 0);
                smash_stack();
                return 1;
 
@@ -778,6 +934,7 @@ static const struct test test_names[] = {
        /* add new tests here */
        { .name = "syscall",    .func = run_syscall    },
        { .name = "stdlib",     .func = run_stdlib     },
+       { .name = "vfprintf",   .func = run_vfprintf   },
        { .name = "protection", .func = run_protection },
        { 0 }
 };
@@ -785,7 +942,7 @@ static const struct test test_names[] = {
 int main(int argc, char **argv, char **envp)
 {
        int min = 0;
-       int max = __INT_MAX__;
+       int max = INT_MAX;
        int ret = 0;
        int err;
        int idx;
@@ -833,7 +990,7 @@ int main(int argc, char **argv, char **envp)
                                 * here, which defaults to the full range.
                                 */
                                do {
-                                       min = 0; max = __INT_MAX__;
+                                       min = 0; max = INT_MAX;
                                        value = colon;
                                        if (value && *value) {
                                                colon = strchr(value, ':');
@@ -899,7 +1056,7 @@ int main(int argc, char **argv, char **envp)
 #else
                else if (ioperm(0x501, 1, 1) == 0)
 #endif
-                       asm volatile ("outb %%al, %%dx" :: "d"(0x501), "a"(0));
+                       __asm__ volatile ("outb %%al, %%dx" :: "d"(0x501), "a"(0));
                /* if it does nothing, fall back to the regular panic */
 #endif
        }
index 6922d64..88d6830 100644 (file)
@@ -90,7 +90,6 @@ again:
        }
 
        ret = WEXITSTATUS(status);
-       ksft_print_msg("waitpid WEXITSTATUS=%d\n", ret);
        return ret;
 }
 
index 3fd8e90..4e86f92 100644 (file)
@@ -143,6 +143,7 @@ static inline int child_join(struct child *child, struct error *err)
                r = -1;
        }
 
+       ksft_print_msg("waitpid WEXITSTATUS=%d\n", r);
        return r;
 }
 
index e2dd4ed..00a07e7 100644 (file)
@@ -115,7 +115,8 @@ static int test_pidfd_send_signal_exited_fail(void)
 
        pidfd = open(buf, O_DIRECTORY | O_CLOEXEC);
 
-       (void)wait_for_pid(pid);
+       ret = wait_for_pid(pid);
+       ksft_print_msg("waitpid WEXITSTATUS=%d\n", ret);
 
        if (pidfd < 0)
                ksft_exit_fail_msg(
index 26d853c..4275cb2 100644 (file)
@@ -97,7 +97,7 @@ TEST_F(vma, renaming) {
        TH_LOG("Try to pass invalid name (with non-printable character \\1) to rename the VMA");
        EXPECT_EQ(rename_vma((unsigned long)self->ptr_anon, AREA_SIZE, BAD_NAME), -EINVAL);
 
-       TH_LOG("Try to rename non-anonynous VMA");
+       TH_LOG("Try to rename non-anonymous VMA");
        EXPECT_EQ(rename_vma((unsigned long) self->ptr_not_anon, AREA_SIZE, GOOD_NAME), -EINVAL);
 }
 
index 198ad5f..cfa9562 100644 (file)
@@ -502,11 +502,11 @@ int main(int argc, char *argv[])
                        interval = t2 - t1;
                        offset = (t2 + t1) / 2 - tp;
 
-                       printf("system time: %lld.%u\n",
+                       printf("system time: %lld.%09u\n",
                                (pct+2*i)->sec, (pct+2*i)->nsec);
-                       printf("phc    time: %lld.%u\n",
+                       printf("phc    time: %lld.%09u\n",
                                (pct+2*i+1)->sec, (pct+2*i+1)->nsec);
-                       printf("system time: %lld.%u\n",
+                       printf("system time: %lld.%09u\n",
                                (pct+2*i+2)->sec, (pct+2*i+2)->nsec);
                        printf("system/phc clock time offset is %" PRId64 " ns\n"
                               "system     clock time delay  is %" PRId64 " ns\n",
index b52d506..48b9147 100644 (file)
@@ -250,7 +250,7 @@ identify_qemu_args () {
                echo -machine virt,gic-version=host -cpu host
                ;;
        qemu-system-ppc64)
-               echo -enable-kvm -M pseries -nodefaults
+               echo -M pseries -nodefaults
                echo -device spapr-vscsi
                if test -n "$TORTURE_QEMU_INTERACTIVE" -a -n "$TORTURE_QEMU_MAC"
                then
index f57720c..84f6bb9 100644 (file)
@@ -5,4 +5,4 @@ rcutree.gp_init_delay=3
 rcutree.gp_cleanup_delay=3
 rcutree.kthread_prio=2
 threadirqs
-tree.use_softirq=0
+rcutree.use_softirq=0
index 64f864f..8e50bfd 100644 (file)
@@ -4,4 +4,4 @@ rcutree.gp_init_delay=3
 rcutree.gp_cleanup_delay=3
 rcutree.kthread_prio=2
 threadirqs
-tree.use_softirq=0
+rcutree.use_softirq=0
index 97165a8..9274398 100755 (executable)
@@ -26,6 +26,7 @@ Usage: $0 [OPTIONS]
   -l | --list                  List the available collection:test entries
   -d | --dry-run               Don't actually run any tests
   -h | --help                  Show this usage info
+  -o | --override-timeout      Number of seconds after which we timeout
 EOF
        exit $1
 }
@@ -33,6 +34,7 @@ EOF
 COLLECTIONS=""
 TESTS=""
 dryrun=""
+kselftest_override_timeout=""
 while true; do
        case "$1" in
                -s | --summary)
@@ -51,6 +53,9 @@ while true; do
                -d | --dry-run)
                        dryrun="echo"
                        shift ;;
+               -o | --override-timeout)
+                       kselftest_override_timeout="$2"
+                       shift 2 ;;
                -h | --help)
                        usage 0 ;;
                "")
@@ -85,7 +90,7 @@ if [ -n "$TESTS" ]; then
        available="$(echo "$valid" | sed -e 's/ /\n/g')"
 fi
 
-collections=$(echo "$available" | cut -d: -f1 | uniq)
+collections=$(echo "$available" | cut -d: -f1 | sort | uniq)
 for collection in $collections ; do
        [ -w /dev/kmsg ] && echo "kselftest: Running tests in $collection" >> /dev/kmsg
        tests=$(echo "$available" | grep "^$collection:" | cut -d: -f2)
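
The new option stores its argument in kselftest_override_timeout, which the
runner can use in place of a collection's settings-file timeout, so slow
targets (emulators, heavily loaded CI) can raise the limit without editing the
tests. A sketch of the intended usage; the collection and test names are only
placeholders:

    # give every selected test up to 300 seconds
    ./run_kselftest.sh --override-timeout 300 -c net

    # same, short form, restricted to a single collection:test entry
    ./run_kselftest.sh -o 300 -t net:socket
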
index 75af864..50aab6b 100644 (file)
@@ -17,6 +17,7 @@ ENCL_CFLAGS := -Wall -Werror -static -nostdlib -nostartfiles -fPIC \
               -fno-stack-protector -mrdrnd $(INCLUDES)
 
 TEST_CUSTOM_PROGS := $(OUTPUT)/test_sgx
+TEST_FILES := $(OUTPUT)/test_encl.elf
 
 ifeq ($(CAN_BUILD_X86_64), 1)
 all: $(TEST_CUSTOM_PROGS) $(OUTPUT)/test_encl.elf
index 4638c63..6e73b09 100644 (file)
@@ -6,20 +6,18 @@ CONFIG_NF_CONNTRACK_MARK=y
 CONFIG_NF_CONNTRACK_ZONES=y
 CONFIG_NF_CONNTRACK_LABELS=y
 CONFIG_NF_NAT=m
+CONFIG_NETFILTER_XT_TARGET_LOG=m
 
 CONFIG_NET_SCHED=y
 
 #
 # Queueing/Scheduling
 #
-CONFIG_NET_SCH_ATM=m
 CONFIG_NET_SCH_CAKE=m
-CONFIG_NET_SCH_CBQ=m
 CONFIG_NET_SCH_CBS=m
 CONFIG_NET_SCH_CHOKE=m
 CONFIG_NET_SCH_CODEL=m
 CONFIG_NET_SCH_DRR=m
-CONFIG_NET_SCH_DSMARK=m
 CONFIG_NET_SCH_ETF=m
 CONFIG_NET_SCH_FQ=m
 CONFIG_NET_SCH_FQ_CODEL=m
@@ -57,8 +55,6 @@ CONFIG_NET_CLS_FLOW=m
 CONFIG_NET_CLS_FLOWER=m
 CONFIG_NET_CLS_MATCHALL=m
 CONFIG_NET_CLS_ROUTE4=m
-CONFIG_NET_CLS_RSVP=m
-CONFIG_NET_CLS_TCINDEX=m
 CONFIG_NET_EMATCH=y
 CONFIG_NET_EMATCH_STACK=32
 CONFIG_NET_EMATCH_CMP=m
index ba2f5e7..e21c7f2 100644 (file)
         "setup": [
             "$IP link add dev $DUMMY type dummy || /bin/true"
         ],
-        "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root sfb db 10",
+        "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root sfb db 100",
         "expExitCode": "0",
         "verifyCmd": "$TC qdisc show dev $DUMMY",
-        "matchPattern": "qdisc sfb 1: root refcnt [0-9]+ rehash 600s db 10ms",
+        "matchPattern": "qdisc sfb 1: root refcnt [0-9]+ rehash 600s db 100ms",
         "matchCount": "1",
         "teardown": [
             "$TC qdisc del dev $DUMMY handle 1: root",
index afb0cd8..eb357bd 100755 (executable)
@@ -2,5 +2,6 @@
 # SPDX-License-Identifier: GPL-2.0
 
 modprobe netdevsim
+modprobe sch_teql
 ./tdc.py -c actions --nobuildebpf
 ./tdc.py -c qdisc
index 8879a7b..d6979a4 100644 (file)
 
 #include "../kselftest_harness.h"
 
-const char *dyn_file = "/sys/kernel/tracing/dynamic_events";
-const char *clear = "!u:__test_event";
+const char *abi_file = "/sys/kernel/tracing/user_events_data";
+const char *enable_file = "/sys/kernel/tracing/events/user_events/__test_event/enable";
 
-static int Append(const char *value)
+static bool wait_for_delete(void)
 {
-       int fd = open(dyn_file, O_RDWR | O_APPEND);
-       int ret = write(fd, value, strlen(value));
+       int i;
+
+       for (i = 0; i < 1000; ++i) {
+               int fd = open(enable_file, O_RDONLY);
+
+               if (fd == -1)
+                       return true;
+
+               close(fd);
+               usleep(1000);
+       }
+
+       return false;
+}
+
+static int reg_event(int fd, int *check, int bit, const char *value)
+{
+       struct user_reg reg = {0};
+
+       reg.size = sizeof(reg);
+       reg.name_args = (__u64)value;
+       reg.enable_bit = bit;
+       reg.enable_addr = (__u64)check;
+       reg.enable_size = sizeof(*check);
+
+       if (ioctl(fd, DIAG_IOCSREG, &reg) == -1)
+               return -1;
+
+       return 0;
+}
+
+static int unreg_event(int fd, int *check, int bit)
+{
+       struct user_unreg unreg = {0};
+
+       unreg.size = sizeof(unreg);
+       unreg.disable_bit = bit;
+       unreg.disable_addr = (__u64)check;
+
+       return ioctl(fd, DIAG_IOCSUNREG, &unreg);
+}
+
+static int parse(int *check, const char *value)
+{
+       int fd = open(abi_file, O_RDWR);
+       int ret;
+
+       if (fd == -1)
+               return -1;
+
+       /* Until we have persist flags via dynamic events, use the base name */
+       if (value[0] != 'u' || value[1] != ':') {
+               close(fd);
+               return -1;
+       }
+
+       ret = reg_event(fd, check, 31, value + 2);
+
+       if (ret != -1) {
+               if (unreg_event(fd, check, 31) == -1)
+                       printf("WARN: Couldn't unreg event\n");
+       }
 
        close(fd);
+
        return ret;
 }
 
-#define CLEAR() \
+static int check_match(int *check, const char *first, const char *second, bool *match)
+{
+       int fd = open(abi_file, O_RDWR);
+       int ret = -1;
+
+       if (fd == -1)
+               return -1;
+
+       if (reg_event(fd, check, 31, first) == -1)
+               goto cleanup;
+
+       if (reg_event(fd, check, 30, second) == -1) {
+               if (errno == EADDRINUSE) {
+                       /* Name is in use, with different fields */
+                       *match = false;
+                       ret = 0;
+               }
+
+               goto cleanup;
+       }
+
+       *match = true;
+       ret = 0;
+cleanup:
+       unreg_event(fd, check, 31);
+       unreg_event(fd, check, 30);
+
+       close(fd);
+
+       wait_for_delete();
+
+       return ret;
+}
+
+#define TEST_MATCH(x, y) \
 do { \
-       int ret = Append(clear); \
-       if (ret == -1) \
-               ASSERT_EQ(ENOENT, errno); \
+       bool match; \
+       ASSERT_NE(-1, check_match(&self->check, x, y, &match)); \
+       ASSERT_EQ(true, match); \
 } while (0)
 
-#define TEST_PARSE(x) \
+#define TEST_NMATCH(x, y) \
 do { \
-       ASSERT_NE(-1, Append(x)); \
-       CLEAR(); \
+       bool match; \
+       ASSERT_NE(-1, check_match(&self->check, x, y, &match)); \
+       ASSERT_EQ(false, match); \
 } while (0)
 
-#define TEST_NPARSE(x) ASSERT_EQ(-1, Append(x))
+#define TEST_PARSE(x) ASSERT_NE(-1, parse(&self->check, x))
+
+#define TEST_NPARSE(x) ASSERT_EQ(-1, parse(&self->check, x))
 
 FIXTURE(user) {
+       int check;
 };
 
 FIXTURE_SETUP(user) {
-       CLEAR();
 }
 
 FIXTURE_TEARDOWN(user) {
-       CLEAR();
+       wait_for_delete();
 }
 
 TEST_F(user, basic_types) {
@@ -95,33 +193,30 @@ TEST_F(user, size_types) {
        TEST_NPARSE("u:__test_event char a 20");
 }
 
-TEST_F(user, flags) {
-       /* Should work */
-       TEST_PARSE("u:__test_event:BPF_ITER u32 a");
-       /* Forward compat */
-       TEST_PARSE("u:__test_event:BPF_ITER,FLAG_FUTURE u32 a");
-}
-
 TEST_F(user, matching) {
-       /* Register */
-       ASSERT_NE(-1, Append("u:__test_event struct custom a 20"));
-       /* Should not match */
-       TEST_NPARSE("!u:__test_event struct custom b");
-       /* Should match */
-       TEST_PARSE("!u:__test_event struct custom a");
-       /* Multi field reg */
-       ASSERT_NE(-1, Append("u:__test_event u32 a; u32 b"));
-       /* Non matching cases */
-       TEST_NPARSE("!u:__test_event u32 a");
-       TEST_NPARSE("!u:__test_event u32 b");
-       TEST_NPARSE("!u:__test_event u32 a; u32 ");
-       TEST_NPARSE("!u:__test_event u32 a; u32 a");
-       /* Matching case */
-       TEST_PARSE("!u:__test_event u32 a; u32 b");
-       /* Register */
-       ASSERT_NE(-1, Append("u:__test_event u32 a; u32 b"));
-       /* Ensure trailing semi-colon case */
-       TEST_PARSE("!u:__test_event u32 a; u32 b;");
+       /* Single name matches */
+       TEST_MATCH("__test_event u32 a",
+                  "__test_event u32 a");
+
+       /* Multiple names match */
+       TEST_MATCH("__test_event u32 a; u32 b",
+                  "__test_event u32 a; u32 b");
+
+       /* Multiple names match with dangling ; */
+       TEST_MATCH("__test_event u32 a; u32 b",
+                  "__test_event u32 a; u32 b;");
+
+       /* Single name doesn't match */
+       TEST_NMATCH("__test_event u32 a",
+                   "__test_event u32 b");
+
+       /* Multiple names don't match */
+       TEST_NMATCH("__test_event u32 a; u32 b",
+                   "__test_event u32 b; u32 a");
+
+       /* Types don't match */
+       TEST_NMATCH("__test_event u64 a; u64 b",
+                   "__test_event u32 a; u32 b");
 }
 
 int main(int argc, char **argv)
index 7c99cef..eb6904d 100644 (file)
@@ -102,30 +102,56 @@ err:
        return -1;
 }
 
+static bool wait_for_delete(void)
+{
+       int i;
+
+       for (i = 0; i < 1000; ++i) {
+               int fd = open(enable_file, O_RDONLY);
+
+               if (fd == -1)
+                       return true;
+
+               close(fd);
+               usleep(1000);
+       }
+
+       return false;
+}
+
 static int clear(int *check)
 {
        struct user_unreg unreg = {0};
+       int fd;
 
        unreg.size = sizeof(unreg);
        unreg.disable_bit = 31;
        unreg.disable_addr = (__u64)check;
 
-       int fd = open(data_file, O_RDWR);
+       fd = open(data_file, O_RDWR);
 
        if (fd == -1)
                return -1;
 
        if (ioctl(fd, DIAG_IOCSUNREG, &unreg) == -1)
                if (errno != ENOENT)
-                       return -1;
-
-       if (ioctl(fd, DIAG_IOCSDEL, "__test_event") == -1)
-               if (errno != ENOENT)
-                       return -1;
+                       goto fail;
+
+       if (ioctl(fd, DIAG_IOCSDEL, "__test_event") == -1) {
+               if (errno == EBUSY) {
+                       if (!wait_for_delete())
+                               goto fail;
+               } else if (errno != ENOENT)
+                       goto fail;
+       }
 
        close(fd);
 
        return 0;
+fail:
+       close(fd);
+
+       return -1;
 }
 
 static int check_print_fmt(const char *event, const char *expected, int *check)
@@ -155,9 +181,8 @@ static int check_print_fmt(const char *event, const char *expected, int *check)
        /* Register should work */
        ret = ioctl(fd, DIAG_IOCSREG, &reg);
 
-       close(fd);
-
        if (ret != 0) {
+               close(fd);
                printf("Reg failed in fmt\n");
                return ret;
        }
@@ -165,6 +190,8 @@ static int check_print_fmt(const char *event, const char *expected, int *check)
        /* Ensure correct print_fmt */
        ret = get_print_fmt(print_fmt, sizeof(print_fmt));
 
+       close(fd);
+
        if (ret != 0)
                return ret;
 
@@ -228,6 +255,12 @@ TEST_F(user, register_events) {
        ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg));
        ASSERT_EQ(0, reg.write_index);
 
+       /* Multiple registers to same name but different args should fail */
+       reg.enable_bit = 29;
+       reg.name_args = (__u64)"__test_event u32 field1;";
+       ASSERT_EQ(-1, ioctl(self->data_fd, DIAG_IOCSREG, &reg));
+       ASSERT_EQ(EADDRINUSE, errno);
+
        /* Ensure disabled */
        self->enable_fd = open(enable_file, O_RDWR);
        ASSERT_NE(-1, self->enable_fd);
@@ -250,10 +283,10 @@ TEST_F(user, register_events) {
        unreg.disable_bit = 30;
        ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSUNREG, &unreg));
 
-       /* Delete should work only after close and unregister */
+       /* Delete should have been auto-done after close and unregister */
        close(self->data_fd);
-       self->data_fd = open(data_file, O_RDWR);
-       ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSDEL, "__test_event"));
+
+       ASSERT_EQ(true, wait_for_delete());
 }
 
 TEST_F(user, write_events) {
@@ -310,6 +343,39 @@ TEST_F(user, write_events) {
        ASSERT_EQ(EINVAL, errno);
 }
 
+TEST_F(user, write_empty_events) {
+       struct user_reg reg = {0};
+       struct iovec io[1];
+       int before = 0, after = 0;
+
+       reg.size = sizeof(reg);
+       reg.name_args = (__u64)"__test_event";
+       reg.enable_bit = 31;
+       reg.enable_addr = (__u64)&self->check;
+       reg.enable_size = sizeof(self->check);
+
+       io[0].iov_base = &reg.write_index;
+       io[0].iov_len = sizeof(reg.write_index);
+
+       /* Register should work */
+       ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg));
+       ASSERT_EQ(0, reg.write_index);
+       ASSERT_EQ(0, self->check);
+
+       /* Enable event */
+       self->enable_fd = open(enable_file, O_RDWR);
+       ASSERT_NE(-1, write(self->enable_fd, "1", sizeof("1")));
+
+       /* Event should now be enabled */
+       ASSERT_EQ(1 << reg.enable_bit, self->check);
+
+       /* Write should make it out to ftrace buffers */
+       before = trace_bytes();
+       ASSERT_NE(-1, writev(self->data_fd, (const struct iovec *)io, 1));
+       after = trace_bytes();
+       ASSERT_GT(after, before);
+}
+
 TEST_F(user, write_fault) {
        struct user_reg reg = {0};
        struct iovec io[2];
index a070258..8b09be5 100644 (file)
@@ -81,6 +81,32 @@ static int get_offset(void)
        return offset;
 }
 
+static int clear(int *check)
+{
+       struct user_unreg unreg = {0};
+
+       unreg.size = sizeof(unreg);
+       unreg.disable_bit = 31;
+       unreg.disable_addr = (__u64)check;
+
+       int fd = open(data_file, O_RDWR);
+
+       if (fd == -1)
+               return -1;
+
+       if (ioctl(fd, DIAG_IOCSUNREG, &unreg) == -1)
+               if (errno != ENOENT)
+                       return -1;
+
+       if (ioctl(fd, DIAG_IOCSDEL, "__test_event") == -1)
+               if (errno != ENOENT)
+                       return -1;
+
+       close(fd);
+
+       return 0;
+}
+
 FIXTURE(user) {
        int data_fd;
        int check;
@@ -93,6 +119,9 @@ FIXTURE_SETUP(user) {
 
 FIXTURE_TEARDOWN(user) {
        close(self->data_fd);
+
+       if (clear(&self->check) != 0)
+               printf("WARNING: Clear didn't work!\n");
 }
 
 TEST_F(user, perf_write) {
@@ -160,6 +189,59 @@ TEST_F(user, perf_write) {
        ASSERT_EQ(0, self->check);
 }
 
+TEST_F(user, perf_empty_events) {
+       struct perf_event_attr pe = {0};
+       struct user_reg reg = {0};
+       struct perf_event_mmap_page *perf_page;
+       int page_size = sysconf(_SC_PAGESIZE);
+       int id, fd;
+       __u32 *val;
+
+       reg.size = sizeof(reg);
+       reg.name_args = (__u64)"__test_event";
+       reg.enable_bit = 31;
+       reg.enable_addr = (__u64)&self->check;
+       reg.enable_size = sizeof(self->check);
+
+       /* Register should work */
+       ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg));
+       ASSERT_EQ(0, reg.write_index);
+       ASSERT_EQ(0, self->check);
+
+       /* Id should be there */
+       id = get_id();
+       ASSERT_NE(-1, id);
+
+       pe.type = PERF_TYPE_TRACEPOINT;
+       pe.size = sizeof(pe);
+       pe.config = id;
+       pe.sample_type = PERF_SAMPLE_RAW;
+       pe.sample_period = 1;
+       pe.wakeup_events = 1;
+
+       /* Tracepoint attach should work */
+       fd = perf_event_open(&pe, 0, -1, -1, 0);
+       ASSERT_NE(-1, fd);
+
+       perf_page = mmap(NULL, page_size * 2, PROT_READ, MAP_SHARED, fd, 0);
+       ASSERT_NE(MAP_FAILED, perf_page);
+
+       /* Status should be updated */
+       ASSERT_EQ(1 << reg.enable_bit, self->check);
+
+       /* Ensure write shows up at correct offset */
+       ASSERT_NE(-1, write(self->data_fd, &reg.write_index,
+                                       sizeof(reg.write_index)));
+       val = (void *)(((char *)perf_page) + perf_page->data_offset);
+       ASSERT_EQ(PERF_RECORD_SAMPLE, *val);
+
+       munmap(perf_page, page_size * 2);
+       close(fd);
+
+       /* Status should be updated */
+       ASSERT_EQ(0, self->check);
+}
+
 int main(int argc, char **argv)
 {
        return test_harness_run(argc, argv);
index 15dcee1..38d46a8 100644 (file)
@@ -84,12 +84,12 @@ static inline int vdso_test_clock(unsigned int clock_id)
 
 int main(int argc, char **argv)
 {
-       int ret;
+       int ret = 0;
 
 #if _POSIX_TIMERS > 0
 
 #ifdef CLOCK_REALTIME
-       ret = vdso_test_clock(CLOCK_REALTIME);
+       ret += vdso_test_clock(CLOCK_REALTIME);
 #endif
 
 #ifdef CLOCK_BOOTTIME
diff --git a/tools/virtio/ringtest/.gitignore b/tools/virtio/ringtest/.gitignore
new file mode 100644 (file)
index 0000000..100b9e3
--- /dev/null
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0-only
+/noring
+/ptr_ring
+/ring
+/virtio_ring_0_9
+/virtio_ring_inorder
+/virtio_ring_poll
index b68920d..d18dd31 100644 (file)
@@ -8,6 +8,7 @@
 #ifndef MAIN_H
 #define MAIN_H
 
+#include <assert.h>
 #include <stdbool.h>
 
 extern int param;
@@ -95,6 +96,8 @@ extern unsigned ring_size;
 #define cpu_relax() asm ("rep; nop" ::: "memory")
 #elif defined(__s390x__)
 #define cpu_relax() barrier()
+#elif defined(__aarch64__)
+#define cpu_relax() asm ("yield" ::: "memory")
 #else
 #define cpu_relax() assert(0)
 #endif
@@ -112,6 +115,8 @@ static inline void busy_wait(void)
 
 #if defined(__x86_64__) || defined(__i386__)
 #define smp_mb()     asm volatile("lock; addl $0,-132(%%rsp)" ::: "memory", "cc")
+#elif defined(__aarch64__)
+#define smp_mb()     asm volatile("dmb ish" ::: "memory")
 #else
 /*
  * Not using __ATOMIC_SEQ_CST since gcc docs say they are only synchronized
@@ -136,10 +141,16 @@ static inline void busy_wait(void)
 
 #if defined(__i386__) || defined(__x86_64__) || defined(__s390x__)
 #define smp_wmb() barrier()
+#elif defined(__aarch64__)
+#define smp_wmb() asm volatile("dmb ishst" ::: "memory")
 #else
 #define smp_wmb() smp_release()
 #endif
 
+#ifndef __always_inline
+#define __always_inline inline __attribute__((always_inline))
+#endif
+
 static __always_inline
 void __read_once_size(const volatile void *p, void *res, int size)
 {
index 4fb9368..0127ff0 100644 (file)
@@ -95,7 +95,7 @@ Run
 
 1) Enable ftrace in the guest
  <Example>
-       # echo 1 > /sys/kernel/debug/tracing/events/sched/enable
+       # echo 1 > /sys/kernel/tracing/events/sched/enable
 
 2) Run trace agent in the guest
  This agent must be operated as root.
index cdfe77c..7e2d9bb 100644 (file)
@@ -18,8 +18,9 @@
 #define PIPE_DEF_BUFS          16
 #define PIPE_MIN_SIZE          (PAGE_SIZE*PIPE_DEF_BUFS)
 #define PIPE_MAX_SIZE          (1024*1024)
-#define READ_PATH_FMT  \
-               "/sys/kernel/debug/tracing/per_cpu/cpu%d/trace_pipe_raw"
+#define TRACEFS                "/sys/kernel/tracing"
+#define DEBUGFS                "/sys/kernel/debug/tracing"
+#define READ_PATH_FMT          "%s/per_cpu/cpu%d/trace_pipe_raw"
 #define WRITE_PATH_FMT         "/dev/virtio-ports/trace-path-cpu%d"
 #define CTL_PATH               "/dev/virtio-ports/agent-ctl-path"
 
@@ -120,9 +121,12 @@ static const char *make_path(int cpu_num, bool this_is_write_path)
        if (this_is_write_path)
                /* write(output) path */
                ret = snprintf(buf, PATH_MAX, WRITE_PATH_FMT, cpu_num);
-       else
+       else {
                /* read(input) path */
-               ret = snprintf(buf, PATH_MAX, READ_PATH_FMT, cpu_num);
+               ret = snprintf(buf, PATH_MAX, READ_PATH_FMT, TRACEFS, cpu_num);
+               if (ret > 0 && access(buf, F_OK) != 0)
+                       ret = snprintf(buf, PATH_MAX, READ_PATH_FMT, DEBUGFS, cpu_num);
+       }
 
        if (ret <= 0) {
                pr_err("Failed to generate %s path(CPU#%d):%d\n",
diff --git a/tools/workqueue/wq_monitor.py b/tools/workqueue/wq_monitor.py
new file mode 100644 (file)
index 0000000..6e258d1
--- /dev/null
@@ -0,0 +1,168 @@
+#!/usr/bin/env drgn
+#
+# Copyright (C) 2023 Tejun Heo <tj@kernel.org>
+# Copyright (C) 2023 Meta Platforms, Inc. and affiliates.
+
+desc = """
+This is a drgn script to monitor workqueues. For more info on drgn, visit
+https://github.com/osandov/drgn.
+
+  total    Total number of work items executed by the workqueue.
+
+  infl     The number of currently in-flight work items.
+
+  CPUtime  Total CPU time consumed by the workqueue in seconds. This is
+           sampled from scheduler ticks and provides only a ballpark
+           measurement. "nohz_full=" CPUs are excluded from measurement.
+
+  CPUitsv  The number of times a concurrency-managed work item hogged CPU
+           longer than the threshold (workqueue.cpu_intensive_thresh_us)
+           and got excluded from concurrency management to avoid stalling
+           other work items.
+
+  CMwake   The number of concurrency-management wake-ups while executing a
+           work item of the workqueue.
+
+  mayday   The number of times the rescuer was requested while waiting for
+           new worker creation.
+
+  rescued  The number of work items executed by the rescuer.
+"""
+
+import sys
+import signal
+import os
+import re
+import time
+import json
+
+import drgn
+from drgn.helpers.linux.list import list_for_each_entry,list_empty
+from drgn.helpers.linux.cpumask import for_each_possible_cpu
+
+import argparse
+parser = argparse.ArgumentParser(description=desc,
+                                 formatter_class=argparse.RawTextHelpFormatter)
+parser.add_argument('workqueue', metavar='REGEX', nargs='*',
+                    help='Target workqueue name patterns (all if empty)')
+parser.add_argument('-i', '--interval', metavar='SECS', type=float, default=1,
+                    help='Monitoring interval (0 to print once and exit)')
+parser.add_argument('-j', '--json', action='store_true',
+                    help='Output in json')
+args = parser.parse_args()
+
+def err(s):
+    print(s, file=sys.stderr, flush=True)
+    sys.exit(1)
+
+workqueues              = prog['workqueues']
+
+WQ_UNBOUND              = prog['WQ_UNBOUND']
+WQ_MEM_RECLAIM          = prog['WQ_MEM_RECLAIM']
+
+PWQ_STAT_STARTED        = prog['PWQ_STAT_STARTED']      # work items started execution
+PWQ_STAT_COMPLETED      = prog['PWQ_STAT_COMPLETED']   # work items completed execution
+PWQ_STAT_CPU_TIME       = prog['PWQ_STAT_CPU_TIME']     # total CPU time consumed
+PWQ_STAT_CPU_INTENSIVE  = prog['PWQ_STAT_CPU_INTENSIVE'] # wq_cpu_intensive_thresh_us violations
+PWQ_STAT_CM_WAKEUP      = prog['PWQ_STAT_CM_WAKEUP']    # concurrency-management worker wakeups
+PWQ_STAT_MAYDAY         = prog['PWQ_STAT_MAYDAY']      # maydays to rescuer
+PWQ_STAT_RESCUED        = prog['PWQ_STAT_RESCUED']     # linked work items executed by rescuer
+PWQ_NR_STATS            = prog['PWQ_NR_STATS']
+
+class WqStats:
+    def __init__(self, wq):
+        self.name = wq.name.string_().decode()
+        self.unbound = wq.flags & WQ_UNBOUND != 0
+        self.mem_reclaim = wq.flags & WQ_MEM_RECLAIM != 0
+        self.stats = [0] * PWQ_NR_STATS
+        for pwq in list_for_each_entry('struct pool_workqueue', wq.pwqs.address_of_(), 'pwqs_node'):
+            for i in range(PWQ_NR_STATS):
+                self.stats[i] += int(pwq.stats[i])
+
+    def dict(self, now):
+        return { 'timestamp'            : now,
+                 'name'                 : self.name,
+                 'unbound'              : self.unbound,
+                 'mem_reclaim'          : self.mem_reclaim,
+                 'started'              : self.stats[PWQ_STAT_STARTED],
+                 'completed'            : self.stats[PWQ_STAT_COMPLETED],
+                 'cpu_time'             : self.stats[PWQ_STAT_CPU_TIME],
+                 'cpu_intensive'        : self.stats[PWQ_STAT_CPU_INTENSIVE],
+                 'cm_wakeup'            : self.stats[PWQ_STAT_CM_WAKEUP],
+                 'mayday'               : self.stats[PWQ_STAT_MAYDAY],
+                 'rescued'              : self.stats[PWQ_STAT_RESCUED], }
+
+    def table_header_str():
+        return f'{"":>24} {"total":>8} {"infl":>5} {"CPUtime":>8} '\
+            f'{"CPUitsv":>7} {"CMwake":>7} {"mayday":>7} {"rescued":>7}'
+
+    def table_row_str(self):
+        cpu_intensive = '-'
+        cm_wakeup = '-'
+        mayday = '-'
+        rescued = '-'
+
+        if not self.unbound:
+            cpu_intensive = str(self.stats[PWQ_STAT_CPU_INTENSIVE])
+            cm_wakeup = str(self.stats[PWQ_STAT_CM_WAKEUP])
+
+        if self.mem_reclaim:
+            mayday = str(self.stats[PWQ_STAT_MAYDAY])
+            rescued = str(self.stats[PWQ_STAT_RESCUED])
+
+        out = f'{self.name[-24:]:24} ' \
+              f'{self.stats[PWQ_STAT_STARTED]:8} ' \
+              f'{max(self.stats[PWQ_STAT_STARTED] - self.stats[PWQ_STAT_COMPLETED], 0):5} ' \
+              f'{self.stats[PWQ_STAT_CPU_TIME] / 1000000:8.1f} ' \
+              f'{cpu_intensive:>7} ' \
+              f'{cm_wakeup:>7} ' \
+              f'{mayday:>7} ' \
+              f'{rescued:>7} '
+        return out.rstrip(':')
+
+exit_req = False
+
+def sigint_handler(signr, frame):
+    global exit_req
+    exit_req = True
+
+def main():
+    # handle args
+    table_fmt = not args.json
+    interval = args.interval
+
+    re_str = None
+    if args.workqueue:
+        for r in args.workqueue:
+            if re_str is None:
+                re_str = r
+            else:
+                re_str += '|' + r
+
+    filter_re = re.compile(re_str) if re_str else None
+
+    # monitoring loop
+    signal.signal(signal.SIGINT, sigint_handler)
+
+    while not exit_req:
+        now = time.time()
+
+        if table_fmt:
+            print()
+            print(WqStats.table_header_str())
+
+        for wq in list_for_each_entry('struct workqueue_struct', workqueues.address_of_(), 'list'):
+            stats = WqStats(wq)
+            if filter_re and not filter_re.search(stats.name):
+                continue
+            if table_fmt:
+                print(stats.table_row_str())
+            else:
+                print(stats.dict(now))
+
+        if interval == 0:
+            break
+        time.sleep(interval)
+
+if __name__ == "__main__":
+    main()
index cb5c13e..65f94f5 100644 (file)
@@ -686,6 +686,24 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn
 
        return __kvm_handle_hva_range(kvm, &range);
 }
+
+static bool kvm_change_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+       /*
+        * Skipping invalid memslots is correct if and only if change_pte() is
+        * surrounded by invalidate_range_{start,end}(), which is currently
+        * guaranteed by the primary MMU.  If that ever changes, KVM needs to
+        * unmap the memslot instead of skipping the memslot to ensure that KVM
+        * doesn't hold references to the old PFN.
+        */
+       WARN_ON_ONCE(!READ_ONCE(kvm->mn_active_invalidate_count));
+
+       if (range->slot->flags & KVM_MEMSLOT_INVALID)
+               return false;
+
+       return kvm_set_spte_gfn(kvm, range);
+}
+
 static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
                                        struct mm_struct *mm,
                                        unsigned long address,
@@ -707,7 +725,7 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
        if (!READ_ONCE(kvm->mmu_invalidate_in_progress))
                return;
 
-       kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
+       kvm_handle_hva_range(mn, address, address + 1, pte, kvm_change_spte_gfn);
 }
 
 void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
@@ -3962,18 +3980,19 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
        }
 
        vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
-       r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
-       BUG_ON(r == -EBUSY);
+       r = xa_reserve(&kvm->vcpu_array, vcpu->vcpu_idx, GFP_KERNEL_ACCOUNT);
        if (r)
                goto unlock_vcpu_destroy;
 
        /* Now it's all set up, let userspace reach it */
        kvm_get_kvm(kvm);
        r = create_vcpu_fd(vcpu);
-       if (r < 0) {
-               xa_erase(&kvm->vcpu_array, vcpu->vcpu_idx);
-               kvm_put_kvm_no_destroy(kvm);
-               goto unlock_vcpu_destroy;
+       if (r < 0)
+               goto kvm_put_xa_release;
+
+       if (KVM_BUG_ON(!!xa_store(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, 0), kvm)) {
+               r = -EINVAL;
+               goto kvm_put_xa_release;
        }
 
        /*
@@ -3988,6 +4007,9 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
        kvm_create_vcpu_debugfs(vcpu);
        return r;
 
+kvm_put_xa_release:
+       kvm_put_kvm_no_destroy(kvm);
+       xa_release(&kvm->vcpu_array, vcpu->vcpu_idx);
 unlock_vcpu_destroy:
        mutex_unlock(&kvm->lock);
        kvm_dirty_ring_free(&vcpu->dirty_ring);
@@ -5184,7 +5206,20 @@ static void hardware_disable_all(void)
 static int hardware_enable_all(void)
 {
        atomic_t failed = ATOMIC_INIT(0);
-       int r = 0;
+       int r;
+
+       /*
+        * Do not enable hardware virtualization if the system is going down.
+        * If userspace initiated a forced reboot, e.g. reboot -f, then it's
+        * possible for an in-flight KVM_CREATE_VM to trigger hardware enabling
+        * after kvm_reboot() is called.  Note, this relies on system_state
+        * being set _before_ kvm_reboot(), which is why KVM uses a syscore ops
+        * hook instead of registering a dedicated reboot notifier (the latter
+        * runs before system_state is updated).
+        */
+       if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF ||
+           system_state == SYSTEM_RESTART)
+               return -EBUSY;
 
        /*
         * When onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
@@ -5197,6 +5232,8 @@ static int hardware_enable_all(void)
        cpus_read_lock();
        mutex_lock(&kvm_lock);
 
+       r = 0;
+
        kvm_usage_count++;
        if (kvm_usage_count == 1) {
                on_each_cpu(hardware_enable_nolock, &failed, 1);
@@ -5213,26 +5250,24 @@ static int hardware_enable_all(void)
        return r;
 }
 
-static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
-                     void *v)
+static void kvm_shutdown(void)
 {
        /*
-        * Some (well, at least mine) BIOSes hang on reboot if
-        * in vmx root mode.
-        *
-        * And Intel TXT required VMX off for all cpu when system shutdown.
+        * Disable hardware virtualization and set kvm_rebooting to indicate
+        * that KVM has asynchronously disabled hardware virtualization, i.e.
+        * that relevant errors and exceptions aren't entirely unexpected.
+        * Some flavors of hardware virtualization need to be disabled before
+        * transferring control to firmware (to perform shutdown/reboot), e.g.
+        * on x86, virtualization can block INIT interrupts, which are used by
+        * firmware to pull APs back under firmware control.  Note, this path
+        * is used for both shutdown and reboot scenarios, i.e. neither name is
+        * 100% comprehensive.
         */
        pr_info("kvm: exiting hardware virtualization\n");
        kvm_rebooting = true;
        on_each_cpu(hardware_disable_nolock, NULL, 1);
-       return NOTIFY_OK;
 }
 
-static struct notifier_block kvm_reboot_notifier = {
-       .notifier_call = kvm_reboot,
-       .priority = 0,
-};
-
 static int kvm_suspend(void)
 {
        /*
@@ -5263,6 +5298,7 @@ static void kvm_resume(void)
 static struct syscore_ops kvm_syscore_ops = {
        .suspend = kvm_suspend,
        .resume = kvm_resume,
+       .shutdown = kvm_shutdown,
 };
 #else /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
 static int hardware_enable_all(void)
@@ -5967,7 +6003,6 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
        if (r)
                return r;
 
-       register_reboot_notifier(&kvm_reboot_notifier);
        register_syscore_ops(&kvm_syscore_ops);
 #endif
 
@@ -6039,7 +6074,6 @@ err_cpu_kick_mask:
 err_vcpu_cache:
 #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
        unregister_syscore_ops(&kvm_syscore_ops);
-       unregister_reboot_notifier(&kvm_reboot_notifier);
        cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
 #endif
        return r;
@@ -6065,7 +6099,6 @@ void kvm_exit(void)
        kvm_async_pf_deinit();
 #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
        unregister_syscore_ops(&kvm_syscore_ops);
-       unregister_reboot_notifier(&kvm_reboot_notifier);
        cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
 #endif
        kvm_irqfd_exit();